# **MI50 + vLLM + Proxmox LXC Setup Guide**
### *End-to-End Field Manual for gfx906 LLM Serving*
**Version:** 1.0
**Last updated:** 2025-11-17
---
## **📌 Overview**
This guide documents how to run a **vLLM OpenAI-compatible server** on an
**AMD Instinct MI50 (gfx906)** inside a **Proxmox LXC container**, expose it over LAN,
and wire it into **Project Lyra's Cortex reasoning layer**.
This file is long, specific, and intentionally leaves *nothing* out so you never have to rediscover ROCm pain rituals again.
---
## **1. What This Stack Looks Like**
```
Proxmox Host
├─ AMD Instinct MI50 (gfx906)
├─ AMDGPU + ROCm stack
└─ LXC Container (CT 201: cortex-gpu)
   ├─ Ubuntu 24.04
   ├─ Docker + docker compose
   ├─ vLLM inside Docker (nalanzeyu/vllm-gfx906)
   ├─ GPU passthrough via /dev/kfd + /dev/dri + PCI bind
   └─ vLLM API exposed on :8000

Lyra Cortex (VM/Server)
└─ LLM_PRIMARY_URL=http://10.0.0.43:8000
```
---
## **2. Proxmox Host — GPU Setup**
### **2.1 Confirm MI50 exists**
```bash
lspci -nn | grep -i 'vega\|instinct\|radeon'
```
You should see something like:
```
0a:00.0 Display controller: AMD Instinct MI50 (gfx906)
```
### **2.2 Load AMDGPU driver**
Loading the driver is the main pitfall after **any host reboot**.
```bash
modprobe amdgpu
```
If you skip this, the LXC container won't see the GPU.
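To skip the manual step on future boots, you can register the module with systemd's module loader. A minimal sketch, assuming a stock Proxmox (systemd) host; the reboot ritual in section 10 remains a reliable fallback:
```bash
# Ask systemd-modules-load to load amdgpu at every boot
echo amdgpu > /etc/modules-load.d/amdgpu.conf

# Confirm the module is loaded right now
lsmod | grep -w amdgpu
```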
---
## **3. LXC Container Configuration (CT 201)**
The container ID is **201**.
Its config file lives at:
```
/etc/pve/lxc/201.conf
```
### **3.1 Working 201.conf**
Paste this *exact* version:
```ini
arch: amd64
cores: 4
hostname: cortex-gpu
memory: 16384
swap: 512
ostype: ubuntu
onboot: 1
startup: order=2,up=10,down=10
net0: name=eth0,bridge=vmbr0,hwaddr=BC:24:11:C6:3E:88,ip=dhcp,type=veth
rootfs: local-lvm:vm-201-disk-0,size=200G
unprivileged: 0
# Docker in LXC requires this
features: keyctl=1,nesting=1
lxc.apparmor.profile: unconfined
lxc.cap.drop:
# --- GPU passthrough for ROCm (MI50) ---
lxc.mount.entry: /dev/kfd dev/kfd none bind,optional,create=file,mode=0666
lxc.mount.entry: /dev/dri dev/dri none bind,optional,create=dir
lxc.mount.entry: /sys/class/drm sys/class/drm none bind,ro,optional,create=dir
lxc.mount.entry: /opt/rocm /opt/rocm none bind,ro,optional,create=dir
# Bind the MI50 PCI device
lxc.mount.entry: /dev/bus/pci/0000:0a:00.0 dev/bus/pci/0000:0a:00.0 none bind,optional,create=file
# Allow GPU-related character devices
lxc.cgroup2.devices.allow: c 226:* rwm
lxc.cgroup2.devices.allow: c 29:* rwm
lxc.cgroup2.devices.allow: c 189:* rwm
lxc.cgroup2.devices.allow: c 238:* rwm
lxc.cgroup2.devices.allow: c 241:* rwm
lxc.cgroup2.devices.allow: c 242:* rwm
lxc.cgroup2.devices.allow: c 243:* rwm
lxc.cgroup2.devices.allow: c 244:* rwm
lxc.cgroup2.devices.allow: c 245:* rwm
lxc.cgroup2.devices.allow: c 246:* rwm
lxc.cgroup2.devices.allow: c 247:* rwm
lxc.cgroup2.devices.allow: c 248:* rwm
lxc.cgroup2.devices.allow: c 249:* rwm
lxc.cgroup2.devices.allow: c 250:* rwm
lxc.cgroup2.devices.allow: c 510:0 rwm
```
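The block of `lxc.cgroup2.devices.allow` rules whitelists character-device majors, and some of these majors are assigned dynamically, so they can differ across kernel versions. If the container can't open the GPU nodes despite the config above, verify the majors on your host and adjust the rules (the `c 510:0` entry corresponds to `/dev/kfd` on this particular host):
```bash
# On the host: the number before the comma is the device major.
# Example: "crw-rw-rw- 1 root render 510, 0 ... /dev/kfd" -> "c 510:0 rwm"
ls -l /dev/kfd /dev/dri/
```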
### **3.2 Restart sequence**
```bash
pct stop 201
modprobe amdgpu
pct start 201
pct enter 201
```
---
## **4. Inside CT 201 — Verifying ROCm + GPU Visibility**
### **4.1 Check device nodes**
```bash
ls -l /dev/kfd
ls -l /dev/dri
ls -l /opt/rocm
```
All must exist.
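A one-shot loop makes the check scriptable (a convenience sketch, not required):
```bash
# Anything reported MISSING means the bind mounts in 201.conf did not apply
for p in /dev/kfd /dev/dri /opt/rocm; do
  [ -e "$p" ] && echo "ok: $p" || echo "MISSING: $p"
done
```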
### **4.2 Validate GPU via rocminfo**
```bash
/opt/rocm/bin/rocminfo | grep -i gfx
```
You need to see:
```
gfx906
```
If you see **nothing**, the GPU isn't passed through; go back and re-check the host steps, then restart the container.
---
## **5. Install Docker in the LXC (Ubuntu 24.04)**
This container runs Docker inside LXC (nesting enabled).
```bash
apt update
apt install -y ca-certificates curl gnupg
install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg \
| gpg --dearmor -o /etc/apt/keyrings/docker.gpg
chmod a+r /etc/apt/keyrings/docker.gpg
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo $VERSION_CODENAME) stable" \
> /etc/apt/sources.list.d/docker.list
apt update
apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
```
Check:
```bash
docker --version
docker compose version
```
---
## **6. Running vLLM Inside CT 201 via Docker**
### **6.1 Create directory**
```bash
mkdir -p /root/vllm
cd /root/vllm
```
### **6.2 docker-compose.yml**
Save this exact file as `/root/vllm/docker-compose.yml`:
```yaml
version: "3.9"
services:
vllm-mi50:
image: nalanzeyu/vllm-gfx906:latest
container_name: vllm-mi50
restart: unless-stopped
ports:
- "8000:8000"
environment:
VLLM_ROLE: "APIServer"
VLLM_MODEL: "/model"
VLLM_LOGGING_LEVEL: "INFO"
command: >
vllm serve /model
--host 0.0.0.0
--port 8000
--dtype float16
--max-model-len 4096
--api-type openai
devices:
- "/dev/kfd:/dev/kfd"
- "/dev/dri:/dev/dri"
volumes:
- /opt/rocm:/opt/rocm:ro
```
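Compose can validate the file before anything starts, which catches YAML indentation slips early:
```bash
# Parse and validate docker-compose.yml without starting containers
docker compose config --quiet && echo "compose file OK"
```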
### **6.3 Start vLLM**
```bash
docker compose up -d
docker compose logs -f
```
When healthy, you'll see:
```
(APIServer) Application startup complete.
```
and periodic throughput logs.
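Loading a model onto the MI50 can take a while, so scripts shouldn't assume the API is ready the moment the container starts. One way to wait, polling the OpenAI-compatible `/v1/models` endpoint (adjust host/port if you changed them):
```bash
# Block until vLLM finishes startup and the API answers
until curl -sf http://localhost:8000/v1/models > /dev/null; do
  echo "waiting for vLLM..."
  sleep 5
done
echo "vLLM is up"
```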
---
## **7. Test vLLM API**
### **7.1 From Proxmox host**
```bash
curl -X POST http://10.0.0.43:8000/v1/completions \
-H "Content-Type: application/json" \
-d '{"model":"/model","prompt":"ping","max_tokens":5}'
```
It should respond with something like:
```json
{"choices":[{"text":"-pong"}]}
```
### **7.2 From Cortex machine**
```bash
curl -X POST http://10.0.0.43:8000/v1/completions \
-H "Content-Type: application/json" \
-d '{"model":"/model","prompt":"ping from cortex","max_tokens":5}'
```
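For scripting, you can strip the JSON envelope and keep only the generated text (assumes `jq` is installed on the caller):
```bash
# Same request as above, printing only the completion text
curl -s -X POST http://10.0.0.43:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"/model","prompt":"ping","max_tokens":5}' \
  | jq -r '.choices[0].text'
```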
---
## **8. Wiring into Lyra Cortex**
In the `cortex` container's `docker-compose.yml`:
```yaml
environment:
LLM_PRIMARY_URL: http://10.0.0.43:8000
```
Do not append `/v1/completions` here; the router adds it automatically.
In `cortex/.env`:
```env
LLM_FORCE_BACKEND=primary
LLM_MODEL=/model
```
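Before exercising the full `/reason` path, it can help to confirm the Cortex container itself can reach the backend. A sketch, assuming the compose service is named `cortex` and its image ships `curl`:
```bash
# From the Cortex machine: call vLLM from inside the cortex container
docker compose exec cortex curl -s -X POST http://10.0.0.43:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"/model","prompt":"ping","max_tokens":5}'
```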
Test:
```bash
curl -X POST http://10.0.0.41:7081/reason \
-H "Content-Type: application/json" \
-d '{"prompt":"test vllm","session_id":"dev"}'
```
If you get a meaningful response: **Cortex → vLLM is online**.
---
## **9. Common Failure Modes (And Fixes)**
### **9.1 “Failed to infer device type”**
vLLM cannot see any ROCm devices.
Fix:
```bash
# On host
modprobe amdgpu
pct stop 201
pct start 201
# In container
/opt/rocm/bin/rocminfo | grep -i gfx
docker compose up -d
```
### **9.2 GPU disappears after reboot**
Same fix:
```bash
modprobe amdgpu
pct stop 201
pct start 201
```
### **9.3 Invalid image name**
If you see pull errors like:
```
pull access denied for nalanzeuy...
```
the image name is misspelled. The correct name is:
```
image: nalanzeyu/vllm-gfx906
```
### **9.4 Double `/v1` in URL**
Ensure:
```
LLM_PRIMARY_URL=http://10.0.0.43:8000
```
Router appends `/v1/completions`.
---
## **10. Daily / Reboot Ritual**
### **On Proxmox host**
```bash
modprobe amdgpu
pct stop 201
pct start 201
```
### **Inside CT 201**
```bash
/opt/rocm/bin/rocminfo | grep -i gfx
cd /root/vllm
docker compose up -d
docker compose logs -f
```
### **Test API**
```bash
curl -X POST http://10.0.0.43:8000/v1/completions \
-H "Content-Type: application/json" \
-d '{"model":"/model","prompt":"ping","max_tokens":5}'
```
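The whole ritual also bundles neatly into one host-side script. A hypothetical convenience wrapper (name and path are illustrative); the CT ID and paths match this guide:
```bash
#!/usr/bin/env bash
# /root/bin/mi50-up.sh -- hypothetical one-shot reboot ritual
set -euo pipefail

# 1. Ensure the kernel driver is loaded (idempotent)
modprobe amdgpu

# 2. Bounce the container so it picks up the device nodes
pct stop 201 || true   # tolerate "already stopped"
pct start 201

# 3. Verify GPU visibility, then (re)start vLLM inside CT 201
pct exec 201 -- bash -c '/opt/rocm/bin/rocminfo | grep -i gfx'
pct exec 201 -- bash -c 'cd /root/vllm && docker compose up -d'
```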
---
## **11. Summary**
You now have:
* **MI50 (gfx906)** correctly passed into LXC
* **ROCm** inside the container via bind mounts
* **vLLM** running inside Docker in the LXC
* **OpenAI-compatible API** on port 8000
* **Lyra Cortex** using it automatically as primary backend
This is a complete, reproducible setup that survives reboots (with the modprobe ritual) and allows you to upgrade/replace models anytime.