# MI50 + vLLM + Proxmox LXC Setup Guide

*End-to-End Field Manual for gfx906 LLM Serving*

**Version:** 1.0 | **Last updated:** 2025-11-17
## 📌 Overview

This guide documents how to run a vLLM OpenAI-compatible server on an AMD Instinct MI50 (gfx906) inside a Proxmox LXC container, expose it over the LAN, and wire it into Project Lyra's Cortex reasoning layer.

This file is long and specific, and it intentionally leaves nothing out, so you never have to rediscover the ROCm pain rituals again.
## 1. What This Stack Looks Like

```text
Proxmox Host
├─ AMD Instinct MI50 (gfx906)
├─ AMDGPU + ROCm stack
└─ LXC Container (CT 201: cortex-gpu)
   ├─ Ubuntu 24.04
   ├─ Docker + docker compose
   ├─ vLLM inside Docker (nalanzeyu/vllm-gfx906)
   ├─ GPU passthrough via /dev/kfd + /dev/dri + PCI bind
   └─ vLLM API exposed on :8000

Lyra Cortex (VM/Server)
└─ LLM_PRIMARY_URL=http://10.0.0.43:8000
```
## 2. Proxmox Host — GPU Setup

### 2.1 Confirm the MI50 exists

```bash
lspci -nn | grep -i 'vega\|instinct\|radeon'
```

You should see something like:

```text
0a:00.0 Display controller: AMD Instinct MI50 (gfx906)
```

### 2.2 Load the AMDGPU driver

This is the main pitfall after any host reboot:

```bash
modprobe amdgpu
```

If you skip this, the LXC container won't see the GPU.
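If you'd rather not repeat this by hand, one option is a modules-load.d entry. A minimal sketch, assuming stock systemd module loading and no deliberate amdgpu blacklisting on the host:

```bash
# Ask systemd-modules-load to load amdgpu at every boot
echo amdgpu > /etc/modules-load.d/amdgpu.conf

# After the next reboot, confirm it loaded
lsmod | grep -i amdgpu
```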
## 3. LXC Container Configuration (CT 201)

The container ID is 201. The config file is at:

```text
/etc/pve/lxc/201.conf
```

### 3.1 Working 201.conf

Paste this exact version:
```ini
arch: amd64
cores: 4
hostname: cortex-gpu
memory: 16384
swap: 512
ostype: ubuntu
onboot: 1
startup: order=2,up=10,down=10
net0: name=eth0,bridge=vmbr0,hwaddr=BC:24:11:C6:3E:88,ip=dhcp,type=veth
rootfs: local-lvm:vm-201-disk-0,size=200G
unprivileged: 0

# Docker in LXC requires this
features: keyctl=1,nesting=1
lxc.apparmor.profile: unconfined
lxc.cap.drop:

# --- GPU passthrough for ROCm (MI50) ---
lxc.mount.entry: /dev/kfd dev/kfd none bind,optional,create=file,mode=0666
lxc.mount.entry: /dev/dri dev/dri none bind,optional,create=dir
lxc.mount.entry: /sys/class/drm sys/class/drm none bind,ro,optional,create=dir
lxc.mount.entry: /opt/rocm /opt/rocm none bind,ro,optional,create=dir

# Bind the MI50 PCI device
lxc.mount.entry: /dev/bus/pci/0000:0a:00.0 dev/bus/pci/0000:0a:00.0 none bind,optional,create=file

# Allow GPU-related character devices
lxc.cgroup2.devices.allow: c 226:* rwm
lxc.cgroup2.devices.allow: c 29:* rwm
lxc.cgroup2.devices.allow: c 189:* rwm
lxc.cgroup2.devices.allow: c 238:* rwm
lxc.cgroup2.devices.allow: c 241:* rwm
lxc.cgroup2.devices.allow: c 242:* rwm
lxc.cgroup2.devices.allow: c 243:* rwm
lxc.cgroup2.devices.allow: c 244:* rwm
lxc.cgroup2.devices.allow: c 245:* rwm
lxc.cgroup2.devices.allow: c 246:* rwm
lxc.cgroup2.devices.allow: c 247:* rwm
lxc.cgroup2.devices.allow: c 248:* rwm
lxc.cgroup2.devices.allow: c 249:* rwm
lxc.cgroup2.devices.allow: c 250:* rwm
lxc.cgroup2.devices.allow: c 510:0 rwm
```
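The PCI address 0000:0a:00.0, the MAC address, and the disk name are specific to this host. Before pasting, confirm the MI50's address on your machine and adjust the bind line to match:

```bash
# Full domain:bus:device.function for the MI50 (matches the lxc.mount.entry above)
lspci -Dnn | grep -Ei 'vega|instinct'
```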
### 3.2 Restart sequence

```bash
pct stop 201
modprobe amdgpu
pct start 201
pct enter 201
```
## 4. Inside CT 201 — Verifying ROCm + GPU Visibility

### 4.1 Check device nodes

```bash
ls -l /dev/kfd
ls -l /dev/dri
ls -l /opt/rocm
```

All must exist.

### 4.2 Validate the GPU via rocminfo

```bash
/opt/rocm/bin/rocminfo | grep -i gfx
```

You need to see:

```text
gfx906
```

If you see nothing, the GPU isn't passed through — restart and re-check the host steps.
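As a second check, rocm-smi should report the card's temperature and VRAM (assuming the host ROCm install bundles it, which standard packages do):

```bash
# Shows GPU utilization, temperature, and memory for the MI50
/opt/rocm/bin/rocm-smi
```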
## 5. Install Docker in the LXC (Ubuntu 24.04)

This container runs Docker inside LXC (nesting enabled).

```bash
apt update
apt install -y ca-certificates curl gnupg
install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg \
  | gpg --dearmor -o /etc/apt/keyrings/docker.gpg
chmod a+r /etc/apt/keyrings/docker.gpg
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
  https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo $VERSION_CODENAME) stable" \
  > /etc/apt/sources.list.d/docker.list
apt update
apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
```

Check:

```bash
docker --version
docker compose version
```
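For a functional check beyond version strings, run the standard hello-world container:

```bash
# Verifies the daemon can pull images and run containers inside the nested LXC
docker run --rm hello-world
```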
## 6. Running vLLM Inside CT 201 via Docker

### 6.1 Create the directory

```bash
mkdir -p /root/vllm
cd /root/vllm
```

### 6.2 docker-compose.yml

Save this exact file as /root/vllm/docker-compose.yml. Note that `vllm serve /model` expects model weights at /model inside the container, so a bind mount must provide them; the host path below is an assumption, so point it at wherever your weights actually live.

```yaml
version: "3.9"

services:
  vllm-mi50:
    image: nalanzeyu/vllm-gfx906:latest
    container_name: vllm-mi50
    restart: unless-stopped
    ports:
      - "8000:8000"
    environment:
      VLLM_ROLE: "APIServer"
      VLLM_MODEL: "/model"
      VLLM_LOGGING_LEVEL: "INFO"
    command: >
      vllm serve /model
      --host 0.0.0.0
      --port 8000
      --dtype float16
      --max-model-len 4096
    devices:
      - "/dev/kfd:/dev/kfd"
      - "/dev/dri:/dev/dri"
    volumes:
      - /opt/rocm:/opt/rocm:ro
      # ASSUMPTION: host directory holding the model weights; adjust to your path
      - /root/vllm/model:/model:ro
```

`vllm serve` exposes the OpenAI-compatible API by default, so no extra API-type flag is needed.
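If the host model directory is empty, one way to populate it is the Hugging Face CLI. The model name here is a placeholder, not a recommendation; pick something that fits the MI50's VRAM:

```bash
# Hypothetical example: download weights into the bind-mounted host directory
pip install -U "huggingface_hub[cli]"
huggingface-cli download Qwen/Qwen2.5-7B-Instruct --local-dir /root/vllm/model
```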
### 6.3 Start vLLM

```bash
docker compose up -d
docker compose logs -f
```

When healthy, you'll see:

```text
(APIServer) Application startup complete.
```

and periodic throughput logs.
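You can also hit the model-list endpoint of the OpenAI-compatible API; a response naming /model confirms the server is serving:

```bash
# From inside CT 201: list the models vLLM is serving
curl -s http://localhost:8000/v1/models
```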
## 7. Test the vLLM API

### 7.1 From the Proxmox host

```bash
curl -X POST http://10.0.0.43:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"/model","prompt":"ping","max_tokens":5}'
```

It should respond with something like:

```json
{"choices":[{"text":"-pong"}]}
```

### 7.2 From the Cortex machine

```bash
curl -X POST http://10.0.0.43:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"/model","prompt":"ping from cortex","max_tokens":5}'
```
## 8. Wiring into Lyra Cortex

In the Cortex container's docker-compose.yml:

```yaml
environment:
  LLM_PRIMARY_URL: http://10.0.0.43:8000
```

Do not include /v1/completions here; the router appends it automatically.

In cortex/.env:

```ini
LLM_FORCE_BACKEND=primary
LLM_MODEL=/model
```

Test:

```bash
curl -X POST http://10.0.0.41:7081/reason \
  -H "Content-Type: application/json" \
  -d '{"prompt":"test vllm","session_id":"dev"}'
```

If you get a meaningful response, Cortex → vLLM is online.
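To confirm the variables actually reached the running container, check its environment. The service name `cortex` here is an assumption; substitute whatever your compose file calls it:

```bash
# Run from the Cortex compose directory
docker compose exec cortex env | grep '^LLM_'
```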
## 9. Common Failure Modes (And Fixes)

### 9.1 “Failed to infer device type”

vLLM cannot see any ROCm devices.

Fix:

```bash
# On host
modprobe amdgpu
pct stop 201
pct start 201

# In container
/opt/rocm/bin/rocminfo | grep -i gfx
docker compose up -d
```

### 9.2 GPU disappears after reboot

Same fix:

```bash
modprobe amdgpu
pct stop 201
pct start 201
```

### 9.3 Invalid image name

If you see pull errors like:

```text
pull access denied for nalanzeuy...
```

the image name is misspelled. Use:

```yaml
image: nalanzeyu/vllm-gfx906
```

### 9.4 Double /v1 in the URL

Ensure:

```ini
LLM_PRIMARY_URL=http://10.0.0.43:8000
```

The router appends /v1/completions itself.
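To confirm the double prefix is really the issue (assuming the server from section 6 is up), the doubled path should return 404 while the correct one returns 200:

```bash
# Doubled prefix: expect 404
curl -s -o /dev/null -w '%{http_code}\n' \
  -X POST http://10.0.0.43:8000/v1/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"/model","prompt":"ping","max_tokens":1}'

# Correct path: expect 200
curl -s -o /dev/null -w '%{http_code}\n' \
  -X POST http://10.0.0.43:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"/model","prompt":"ping","max_tokens":1}'
```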
## 10. Daily / Reboot Ritual

On the Proxmox host:

```bash
modprobe amdgpu
pct stop 201
pct start 201
```

Inside CT 201:

```bash
/opt/rocm/bin/rocminfo | grep -i gfx
cd /root/vllm
docker compose up -d
docker compose logs -f
```

Test the API:

```bash
curl -X POST http://10.0.0.43:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"/model","prompt":"ping","max_tokens":5}'
```
## 11. Summary

You now have:

- MI50 (gfx906) correctly passed into the LXC
- ROCm inside the container via bind mounts
- vLLM running inside Docker in the LXC
- An OpenAI-compatible API on port 8000
- Lyra Cortex using it automatically as the primary backend

This is a complete, reproducible setup that survives reboots (with the modprobe ritual) and allows you to upgrade or replace models at any time.