# **MI50 + vLLM + Proxmox LXC Setup Guide**

### *End-to-End Field Manual for gfx906 LLM Serving*

**Version:** 1.0

**Last updated:** 2025-11-17

---

## **📌 Overview**

This guide documents how to run a **vLLM OpenAI-compatible server** on an **AMD Instinct MI50 (gfx906)** inside a **Proxmox LXC container**, expose it over the LAN, and wire it into **Project Lyra's Cortex reasoning layer**.

This file is long, specific, and intentionally leaves *nothing* out so you never have to rediscover ROCm pain rituals again.

---

## **1. What This Stack Looks Like**

```
Proxmox Host
├─ AMD Instinct MI50 (gfx906)
├─ AMDGPU + ROCm stack
└─ LXC Container (CT 201: cortex-gpu)
   ├─ Ubuntu 24.04
   ├─ Docker + docker compose
   ├─ vLLM inside Docker (nalanzeyu/vllm-gfx906)
   ├─ GPU passthrough via /dev/kfd + /dev/dri + PCI bind
   └─ vLLM API exposed on :8000

Lyra Cortex (VM/Server)
└─ LLM_PRIMARY_URL=http://10.0.0.43:8000
```

---

## **2. Proxmox Host: GPU Setup**

### **2.1 Confirm MI50 exists**

```bash
lspci -nn | grep -i 'vega\|instinct\|radeon'
```

You should see something like:

```
0a:00.0 Display controller: AMD Instinct MI50 (gfx906)
```

### **2.2 Load AMDGPU driver**

Forgetting this step is the main pitfall after **any host reboot**.

```bash
modprobe amdgpu
```

If you skip this, the host never creates the GPU device nodes and the LXC container won't see the GPU.

---

## **3. LXC Container Configuration (CT 201)**

The container ID is **201**.
The config file is at:

```
/etc/pve/lxc/201.conf
```

### **3.1 Working 201.conf**

Paste this *exact* version:

```ini
arch: amd64
cores: 4
hostname: cortex-gpu
memory: 16384
swap: 512
ostype: ubuntu
onboot: 1
startup: order=2,up=10,down=10
net0: name=eth0,bridge=vmbr0,hwaddr=BC:24:11:C6:3E:88,ip=dhcp,type=veth
rootfs: local-lvm:vm-201-disk-0,size=200G
unprivileged: 0

# Docker in LXC requires this
features: keyctl=1,nesting=1
lxc.apparmor.profile: unconfined
lxc.cap.drop:

# --- GPU passthrough for ROCm (MI50) ---
lxc.mount.entry: /dev/kfd dev/kfd none bind,optional,create=file,mode=0666
lxc.mount.entry: /dev/dri dev/dri none bind,optional,create=dir
lxc.mount.entry: /sys/class/drm sys/class/drm none bind,ro,optional,create=dir
lxc.mount.entry: /opt/rocm opt/rocm none bind,ro,optional,create=dir

# Bind the MI50 PCI device
lxc.mount.entry: /dev/bus/pci/0000:0a:00.0 dev/bus/pci/0000:0a:00.0 none bind,optional,create=file

# Allow GPU-related character devices
lxc.cgroup2.devices.allow: c 226:* rwm
lxc.cgroup2.devices.allow: c 29:* rwm
lxc.cgroup2.devices.allow: c 189:* rwm
lxc.cgroup2.devices.allow: c 238:* rwm
lxc.cgroup2.devices.allow: c 241:* rwm
lxc.cgroup2.devices.allow: c 242:* rwm
lxc.cgroup2.devices.allow: c 243:* rwm
lxc.cgroup2.devices.allow: c 244:* rwm
lxc.cgroup2.devices.allow: c 245:* rwm
lxc.cgroup2.devices.allow: c 246:* rwm
lxc.cgroup2.devices.allow: c 247:* rwm
lxc.cgroup2.devices.allow: c 248:* rwm
lxc.cgroup2.devices.allow: c 249:* rwm
lxc.cgroup2.devices.allow: c 250:* rwm
lxc.cgroup2.devices.allow: c 510:0 rwm
```

### **3.2 Restart sequence**

Run this on the Proxmox host. The last command drops you into a shell inside the container.

```bash
pct stop 201
modprobe amdgpu
pct start 201
pct enter 201
```

---

## **4. Inside CT 201: Verifying ROCm + GPU Visibility**

### **4.1 Check device nodes**

```bash
ls -l /dev/kfd
ls -l /dev/dri
ls -l /opt/rocm
```

All three must exist.

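The three checks above can also be folded into one helper that reports every missing path at once. This is only a minimal sketch: the function name `check_paths` and the wording of its messages are illustrative assumptions, not part of ROCm or Proxmox.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of ROCm or Proxmox): report which of the
# given paths exist and return non-zero if any of them is missing.
check_paths() {
  local missing=0 p
  for p in "$@"; do
    if [ -e "$p" ]; then
      echo "ok: $p"
    else
      echo "MISSING: $p" >&2
      missing=1
    fi
  done
  return "$missing"
}

# The three paths this guide binds into CT 201.
check_paths /dev/kfd /dev/dri /opt/rocm \
  || echo "GPU passthrough incomplete: re-run the host restart sequence" >&2
```

If any path is reported as missing, go back to the host, reload `amdgpu`, and restart the container before continuing.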
### **4.2 Validate GPU via rocminfo**

```bash
/opt/rocm/bin/rocminfo | grep -i gfx
```

You need to see:

```
gfx906
```

If you see **nothing**, the GPU isn't passed through. Restart the container and re-check the host steps.

---

## **5. Install Docker in the LXC (Ubuntu 24.04)**

This container runs Docker inside LXC (nesting enabled).

```bash
apt update
apt install -y ca-certificates curl gnupg

install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg \
  | gpg --dearmor -o /etc/apt/keyrings/docker.gpg
chmod a+r /etc/apt/keyrings/docker.gpg

echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
  https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo "$VERSION_CODENAME") stable" \
  > /etc/apt/sources.list.d/docker.list

apt update
apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
```

Check:

```bash
docker --version
docker compose version
```

---

## **6. Running vLLM Inside CT 201 via Docker**

### **6.1 Create directory**

```bash
mkdir -p /root/vllm
cd /root/vllm
```

### **6.2 docker-compose.yml**

Save this file as `/root/vllm/docker-compose.yml`:

```yaml
version: "3.9"

services:
  vllm-mi50:
    image: nalanzeyu/vllm-gfx906:latest
    container_name: vllm-mi50
    restart: unless-stopped
    ports:
      - "8000:8000"
    environment:
      VLLM_ROLE: "APIServer"
      VLLM_MODEL: "/model"
      VLLM_LOGGING_LEVEL: "INFO"
    command: >
      vllm serve /model
      --host 0.0.0.0
      --port 8000
      --dtype float16
      --max-model-len 4096
    devices:
      - "/dev/kfd:/dev/kfd"
      - "/dev/dri:/dev/dri"
    volumes:
      - /opt/rocm:/opt/rocm:ro
```

The file as shown mounts only `/opt/rocm`. Because the command serves the model from `/model`, make sure your model directory is also mounted at `/model` with an extra `volumes:` entry.

### **6.3 Start vLLM**

```bash
docker compose up -d
docker compose logs -f
```

When healthy, you'll see:

```
(APIServer) Application startup complete.
```

and periodic throughput logs.

---

## **7. Test vLLM API**

### **7.1 From Proxmox host**

```bash
curl -X POST http://10.0.0.43:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"/model","prompt":"ping","max_tokens":5}'
```

The response should look like this (abbreviated to the relevant field):

```json
{"choices":[{"text":"-pong"}]}
```

### **7.2 From Cortex machine**

```bash
curl -X POST http://10.0.0.43:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"/model","prompt":"ping from cortex","max_tokens":5}'
```

---

## **8. Wiring into Lyra Cortex**

In the `cortex` container's `docker-compose.yml`:

```yaml
environment:
  LLM_PRIMARY_URL: http://10.0.0.43:8000
```

Do not include `/v1/completions` in the URL; the router appends it automatically.

In `cortex/.env`:

```env
LLM_FORCE_BACKEND=primary
LLM_MODEL=/model
```

Test:

```bash
curl -X POST http://10.0.0.41:7081/reason \
  -H "Content-Type: application/json" \
  -d '{"prompt":"test vllm","session_id":"dev"}'
```

If you get a meaningful response: **Cortex → vLLM is online**.

---

## **9. Common Failure Modes (And Fixes)**

### **9.1 “Failed to infer device type”**

vLLM cannot see any ROCm devices.

Fix:

```bash
# On host
modprobe amdgpu
pct stop 201
pct start 201

# In container
/opt/rocm/bin/rocminfo | grep -i gfx
cd /root/vllm
docker compose up -d
```

### **9.2 GPU disappears after reboot**

Same fix, run on the host:

```bash
modprobe amdgpu
pct stop 201
pct start 201
```

### **9.3 Invalid image name**

If you see pull errors such as:

```
pull access denied for nalanzeuy...
```

the image name is misspelled (here `nalanzeuy` instead of `nalanzeyu`). Use:

```
image: nalanzeyu/vllm-gfx906
```

### **9.4 Double `/v1` in URL**

Ensure the base URL has no path component:

```
LLM_PRIMARY_URL=http://10.0.0.43:8000
```

The router appends `/v1/completions` itself.

---

## **10. Daily / Reboot Ritual**

### **On Proxmox host**

```bash
modprobe amdgpu
pct stop 201
pct start 201
```

### **Inside CT 201**

```bash
/opt/rocm/bin/rocminfo | grep -i gfx
cd /root/vllm
docker compose up -d
docker compose logs -f
```

### **Test API**

```bash
curl -X POST http://10.0.0.43:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"/model","prompt":"ping","max_tokens":5}'
```

---

## **11. Summary**

You now have:

* **MI50 (gfx906)** correctly passed into the LXC
* **ROCm** inside the container via bind mounts
* **vLLM** running inside Docker in the LXC
* **OpenAI-compatible API** on port 8000
* **Lyra Cortex** using it automatically as the primary backend

This is a complete, reproducible setup that survives reboots (with the modprobe ritual) and lets you upgrade or replace models at any time.
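
The test request in the reboot ritual (section 10) can fail simply because it is sent before vLLM has finished loading the model. One way to handle that is a small retry wrapper, sketched below. The function names `retry` and `ping_vllm` are hypothetical helpers introduced for illustration; the URL and request body are the ones already used in this guide.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command until it succeeds, for at most
# ATTEMPTS tries, sleeping DELAY seconds between tries.
# Usage: retry ATTEMPTS DELAY command [args...]
retry() {
  local attempts=$1 delay=$2 i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Only sleep when another attempt remains.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Same request as the "Test API" step; -f makes curl fail on HTTP errors.
ping_vllm() {
  curl -fsS -X POST http://10.0.0.43:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{"model":"/model","prompt":"ping","max_tokens":5}'
}
```

For example, `retry 30 5 ping_vllm` tries the request up to 30 times, 5 seconds apart, and returns non-zero if the server never answers. The attempt count and delay are arbitrary starting points, not measured values.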