# **MI50 + vLLM + Proxmox LXC Setup Guide**

### *End-to-End Field Manual for gfx906 LLM Serving*

**Version:** 1.0

**Last updated:** 2025-11-17

---

## **📌 Overview**

This guide documents how to run a **vLLM OpenAI-compatible server** on an **AMD Instinct MI50 (gfx906)** inside a **Proxmox LXC container**, expose it over the LAN, and wire it into **Project Lyra's Cortex reasoning layer**.

This file is long and specific, and it intentionally leaves *nothing* out so you never have to rediscover the ROCm pain rituals again.

---

## **1. What This Stack Looks Like**

```
Proxmox Host
├─ AMD Instinct MI50 (gfx906)
├─ AMDGPU + ROCm stack
└─ LXC Container (CT 201: cortex-gpu)
   ├─ Ubuntu 24.04
   ├─ Docker + docker compose
   ├─ vLLM inside Docker (nalanzeyu/vllm-gfx906)
   ├─ GPU passthrough via /dev/kfd + /dev/dri + PCI bind
   └─ vLLM API exposed on :8000

Lyra Cortex (VM/Server)
└─ LLM_PRIMARY_URL=http://10.0.0.43:8000
```
---

## **2. Proxmox Host: GPU Setup**

### **2.1 Confirm the MI50 is visible**

```bash
lspci -nn | grep -i 'vega\|instinct\|radeon'
```
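You should see something like the line below (the exact device wording depends on the host's PCI ID database). Note the PCI address (`0a:00.0` here): the PCI bind in `201.conf` (section 3.1) uses it.

```
0a:00.0 Display controller: AMD Instinct MI50 (gfx906)
```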
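To pull that address out without copying it by hand, here is a minimal sketch of a shell helper. `mi50_pci_addr` is a hypothetical name and not part of any tool. The helper assumes the MI50's `lspci` line contains `MI50` or `Vega 20`, which depends on the PCI ID database. It prints the address with the `0000:` domain prefix that the PCI bind in `201.conf` uses. On the host, run `lspci -nn | mi50_pci_addr`; for the sample line above it prints `0000:0a:00.0`.

```bash
# Print the PCI address of the first MI50 found in `lspci` output on stdin,
# in the domain-qualified form used by the PCI bind in 201.conf.
# Assumption: the MI50's line mentions "MI50" or "Vega 20" (depends on pci.ids).
mi50_pci_addr() {
  local addr
  addr=$(awk 'tolower($0) ~ /mi50|vega 20/ { print $1; exit }')
  [ -n "$addr" ] || return 1
  case "$addr" in
    *:*:*) printf '%s\n' "$addr" ;;        # already has a domain (lspci -D)
    *)     printf '0000:%s\n' "$addr" ;;   # lspci omits the default domain 0000
  esac
}
```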
### **2.2 Load the AMDGPU driver**

This is the main pitfall after **any host reboot**: the driver is not necessarily loaded, so load it explicitly before starting the container.

```bash
modprobe amdgpu
```
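You can tell whether the driver is loaded by checking for the device nodes it creates. This is a minimal sketch. It assumes `/dev/kfd` (a character device) and `/dev/dri` (a directory) are the nodes that matter, as in the bind mounts of section 3. `gpu_nodes_present` and its optional root argument are hypothetical; the argument exists only so the check can be tested against a scratch directory. On the host, `gpu_nodes_present || modprobe amdgpu` loads the driver only when needed.

```bash
# Succeed if the ROCm device nodes exist under ROOT (default: the real root).
# /dev/kfd is the telling one: only the amdgpu driver creates it, while other
# GPUs (e.g. a BMC's VGA) can populate /dev/dri on their own.
# ROOT is a hypothetical parameter that only exists so the check can be tested.
gpu_nodes_present() {
  local root="${1:-}"
  [ -c "$root/dev/kfd" ] && [ -d "$root/dev/dri" ]
}
```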
If you skip loading `amdgpu`, the host has no `/dev/kfd` node (and no MI50 entries under `/dev/dri`) to bind, so the LXC container won't see the GPU.

---

## **3. LXC Container Configuration (CT 201)**

The container ID is **201**. Its config file is at:

```
/etc/pve/lxc/201.conf
```
### **3.1 Working 201.conf**

Paste this version. Adjust `hwaddr`, the `rootfs` storage, and the PCI address (from section 2.1) to match your host:

```ini
arch: amd64
cores: 4
hostname: cortex-gpu
memory: 16384
swap: 512
ostype: ubuntu
onboot: 1
startup: order=2,up=10,down=10
net0: name=eth0,bridge=vmbr0,hwaddr=BC:24:11:C6:3E:88,ip=dhcp,type=veth
rootfs: local-lvm:vm-201-disk-0,size=200G
unprivileged: 0

# Docker in LXC requires this: nesting + keyctl, no AppArmor confinement,
# and an empty lxc.cap.drop so no capabilities are dropped
features: keyctl=1,nesting=1
lxc.apparmor.profile: unconfined
lxc.cap.drop:

# --- GPU passthrough for ROCm (MI50) ---
lxc.mount.entry: /dev/kfd dev/kfd none bind,optional,create=file,mode=0666
lxc.mount.entry: /dev/dri dev/dri none bind,optional,create=dir
lxc.mount.entry: /sys/class/drm sys/class/drm none bind,ro,optional,create=dir
lxc.mount.entry: /opt/rocm /opt/rocm none bind,ro,optional,create=dir

# Bind the MI50 PCI device (address from lspci in section 2.1).
# With "optional", LXC silently skips this entry if the host path is missing.
lxc.mount.entry: /dev/bus/pci/0000:0a:00.0 dev/bus/pci/0000:0a:00.0 none bind,optional,create=file

# Allow GPU-related character devices (226 is the DRM major; /dev/kfd gets a
# dynamically assigned major, which the 238-250 range presumably covers)
lxc.cgroup2.devices.allow: c 226:* rwm
lxc.cgroup2.devices.allow: c 29:* rwm
lxc.cgroup2.devices.allow: c 189:* rwm
lxc.cgroup2.devices.allow: c 238:* rwm
lxc.cgroup2.devices.allow: c 241:* rwm
lxc.cgroup2.devices.allow: c 242:* rwm
lxc.cgroup2.devices.allow: c 243:* rwm
lxc.cgroup2.devices.allow: c 244:* rwm
lxc.cgroup2.devices.allow: c 245:* rwm
lxc.cgroup2.devices.allow: c 246:* rwm
lxc.cgroup2.devices.allow: c 247:* rwm
lxc.cgroup2.devices.allow: c 248:* rwm
lxc.cgroup2.devices.allow: c 249:* rwm
lxc.cgroup2.devices.allow: c 250:* rwm
lxc.cgroup2.devices.allow: c 510:0 rwm
```
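Rather than trusting a fixed range, you can print the majors your host actually uses for `/dev/kfd` and `/dev/dri/*` and compare them with the allow lines above. DRM devices use major 226, while `/dev/kfd`'s major is assigned dynamically by the kernel. This is a minimal sketch assuming GNU coreutils `stat`. `dev_major` and `cgroup_allow_lines` are hypothetical helper names. On the host, run `cgroup_allow_lines /dev/kfd /dev/dri`.

```bash
# Print the decimal major number of a device node (GNU stat's %t is hex).
dev_major() {
  local hex
  hex=$(stat -L -c '%t' "$1") || return 1
  printf '%d\n' "0x$hex"
}

# Print one lxc.cgroup2.devices.allow line per distinct character-device major
# among the given paths; directories (e.g. /dev/dri) are expanded one level.
cgroup_allow_lines() {
  local p f
  for p in "$@"; do
    if [ -d "$p" ]; then
      for f in "$p"/*; do
        if [ -c "$f" ]; then dev_major "$f"; fi
      done
    elif [ -c "$p" ]; then
      dev_major "$p"
    fi
  done | sort -n -u | while read -r major; do
    printf 'lxc.cgroup2.devices.allow: c %s:* rwm\n' "$major"
  done
}
```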
### **3.2 Restart sequence**

Run on the Proxmox host:

```bash
pct stop 201
modprobe amdgpu
pct start 201
pct enter 201   # opens a shell inside the container
```
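Here is the same sequence as a reusable function, written as a minimal sketch. `restart_ct` is a hypothetical name. It must run as root on the Proxmox host, and it assumes `pct status` prints `status: running` for a running container. It stops the container only if it is running, so it also works right after a host reboot. It leaves out `pct enter`, which is interactive.

```bash
# Host-side restart sequence for the GPU container (run as root on Proxmox).
# Assumption: `pct status` prints "status: running" for a running container.
restart_ct() {
  local ctid="${1:-201}"
  # Stop only if running, so this also works right after a host reboot.
  if pct status "$ctid" | grep -q 'status: running'; then
    pct stop "$ctid" || return 1
  fi
  # Load the driver before start so /dev/kfd and /dev/dri exist for the binds.
  modprobe amdgpu || return 1
  pct start "$ctid"
}
```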
---

## **4. Inside CT 201: Verifying ROCm + GPU Visibility**

### **4.1 Check device nodes**

```bash
ls -l /dev/kfd
ls -l /dev/dri
ls -l /opt/rocm
```
All three paths must exist.

### **4.2 Validate the GPU with rocminfo**

```bash
/opt/rocm/bin/rocminfo | grep -i gfx
```
You need to see `gfx906` in the output:

```
gfx906
```
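With more than one GPU in the host, the `grep` above can't tell you how many gfx906 agents ROCm actually enumerated. This is a minimal counting sketch. It assumes each GPU agent in `rocminfo` output has its own `Name:` line whose value is exactly `gfx906`. ISA names such as `amdgcn-amd-amdhsa--gfx906:...` and `Marketing Name:` lines are not counted. `count_gfx906_agents` is a hypothetical helper. Under that assumption, `/opt/rocm/bin/rocminfo | count_gfx906_agents` prints `1` for a single MI50.

```bash
# Count GPU agents reported as gfx906 in `rocminfo` output read from stdin.
# Assumption: each agent has a line like "  Name:   gfx906"; ISA-name lines
# and "Marketing Name:" lines do not match the anchored pattern.
count_gfx906_agents() {
  grep -c '^[[:space:]]*Name:[[:space:]]*gfx906[[:space:]]*$'
}
```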
If you see **nothing** in the `rocminfo` output, the GPU isn't passed through. Re-run the host steps (section 2.2 and the restart sequence in 3.2), then check again.

---

## **5. Install Docker in the LXC (Ubuntu 24.04)**

This container runs Docker inside LXC, which is why `201.conf` enables `nesting=1` and `keyctl=1`.

```bash
# Prerequisites for adding Docker's apt repository
apt update
apt install -y ca-certificates curl gnupg

# Docker's GPG key
install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg \
  | gpg --dearmor -o /etc/apt/keyrings/docker.gpg
chmod a+r /etc/apt/keyrings/docker.gpg

# Docker's apt source for this Ubuntu release
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
  https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo $VERSION_CODENAME) stable" \
  > /etc/apt/sources.list.d/docker.list

# Docker Engine plus the buildx and compose plugins
apt update
apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
```
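The `echo` above writes a single apt source line. To see exactly what it expands to, here is the same line built by a small function. `docker_apt_line` is a hypothetical helper. Its defaults mirror the `dpkg --print-architecture` and `/etc/os-release` lookups in the original command. On amd64 Ubuntu 24.04 (codename `noble`), `docker_apt_line` prints `deb [arch=amd64 signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu noble stable`.

```bash
# Build the Docker apt source line written by the `echo` in the install steps.
# ARCH and CODENAME default to the values detected on the running system.
docker_apt_line() {
  local arch="${1:-$(dpkg --print-architecture)}"
  local codename="${2:-$(. /etc/os-release && echo "$VERSION_CODENAME")}"
  printf 'deb [arch=%s signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu %s stable\n' \
    "$arch" "$codename"
}
```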
Check the installation:

```bash
docker --version
docker compose version
```
---

## **6. Running vLLM Inside CT 201 via Docker**

### **6.1 Create a directory**

```bash
mkdir -p /root/vllm
cd /root/vllm
```
### **6.2 docker-compose.yml**

Save this file as `/root/vllm/docker-compose.yml`. vLLM loads the model from `/model` inside the container, so the model directory has to be mounted there; the host path under `volumes` is a placeholder. The top-level `version:` key is omitted because Docker Compose v2 ignores it and only prints an obsolescence warning.

```yaml
services:
  vllm-mi50:
    image: nalanzeyu/vllm-gfx906:latest
    container_name: vllm-mi50
    restart: unless-stopped
    ports:
      - "8000:8000"
    environment:
      VLLM_ROLE: "APIServer"
      VLLM_MODEL: "/model"
      VLLM_LOGGING_LEVEL: "INFO"
    # `vllm serve` exposes the OpenAI-compatible API by default,
    # so no extra API-type flag is needed.
    command: >
      vllm serve /model
      --host 0.0.0.0
      --port 8000
      --dtype float16
      --max-model-len 4096
    devices:
      - "/dev/kfd:/dev/kfd"
      - "/dev/dri:/dev/dri"
    volumes:
      - /opt/rocm:/opt/rocm:ro
      # Placeholder: replace with the host directory that holds your model.
      - /path/to/your/model:/model:ro
```
### **6.3 Start vLLM**

```bash
docker compose up -d     # start in the background
docker compose logs -f   # follow the logs (Ctrl+C stops following, not the container)
```
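Model loading can take a while, so instead of watching the logs you can poll the server. vLLM's OpenAI-compatible server exposes a `GET /health` endpoint; this sketch assumes the `nalanzeyu/vllm-gfx906` build keeps it. `wait_for_vllm` is a hypothetical helper. The default base URL is this guide's LAN address; use `http://localhost:8000` from inside CT 201. The default budget of 120 one-second attempts is an arbitrary choice.

```bash
# Poll vLLM's /health endpoint until it answers or the attempt budget runs out.
# Defaults: this guide's LAN address and 120 attempts, one second apart.
wait_for_vllm() {
  local base="${1:-http://10.0.0.43:8000}" tries="${2:-120}"
  while [ "$tries" -gt 0 ]; do
    if curl -fsS -o /dev/null "$base/health"; then
      return 0
    fi
    tries=$((tries - 1))
    # Sleep only if another attempt follows.
    [ "$tries" -gt 0 ] && sleep 1
  done
  return 1
}
```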
When the server is healthy, the logs show:

```
(APIServer) Application startup complete.
```
Once the server is up, vLLM also prints periodic throughput logs.

---

## **7. Test vLLM API**

### **7.1 From the Proxmox host**

`10.0.0.43` is CT 201's LAN address in this setup. Because `201.conf` uses `ip=dhcp`, confirm it with `pct exec 201 -- ip -4 addr show eth0`.

```bash
curl -X POST http://10.0.0.43:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"/model","prompt":"ping","max_tokens":5}'
```
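For prompts other than `ping`, hand-editing the JSON gets error-prone. This is a minimal sketch of a payload builder. `completion_payload` is hypothetical. It escapes double quotes and backslashes in the prompt but not newlines or other control characters, so keep prompts to a single line. Usage: `curl -X POST http://10.0.0.43:8000/v1/completions -H "Content-Type: application/json" -d "$(completion_payload /model 'ping from cortex' 5)"`.

```bash
# Build the JSON body for POST /v1/completions.
# Only double quotes and backslashes in PROMPT are escaped; control characters
# (including newlines) are not, and MODEL is inserted as-is.
completion_payload() {
  local model="$1" prompt="$2" max_tokens="${3:-5}"
  prompt=$(printf '%s' "$prompt" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '{"model":"%s","prompt":"%s","max_tokens":%d}\n' "$model" "$prompt" "$max_tokens"
}
```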
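The server should respond with something like this. The example is abbreviated: the real response also includes fields such as `id`, `model`, and `usage`, and the completion text depends on the model.

```json
{"choices":[{"text":"-pong"}]}
```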
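To pull just the completion text out of that response without `jq`, here is a rough `sed`-based sketch. `first_completion_text` is hypothetical. It assumes the response is compact, single-line JSON. It stops at the first double quote, so completions containing escaped quotes come back truncated. If `jq` is installed, `jq -r '.choices[0].text'` is the exact equivalent. To use it, append `| first_completion_text` to the `curl` command above.

```bash
# Print the "text" of the first choice from a /v1/completions JSON response
# on stdin. Approximation: assumes single-line JSON and no escaped quotes
# inside the completion text.
first_completion_text() {
  sed -n 's/.*"choices":[[:space:]]*\[[[:space:]]*{[^}]*"text":[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}
```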
### **7.2 From the Cortex machine**

```bash
curl -X POST http://10.0.0.43:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"/model","prompt":"ping from cortex","max_tokens":5}'
```
---

## **8. Wiring into Lyra Cortex**

In the `cortex` container's `docker-compose.yml`:

```yaml
environment:
  LLM_PRIMARY_URL: http://10.0.0.43:8000
```
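Because the router appends `/v1/completions` itself, `LLM_PRIMARY_URL` must be a bare base URL. If the value comes from somewhere else, such as another config file or a copy-pasted endpoint, a small normalizer avoids the double-`/v1` failure described in section 9.4. This is a sketch; `normalize_base_url` is a hypothetical helper.

```bash
# Strip a trailing "/", "/v1" or "/v1/completions" so the result is a bare
# base URL; the Cortex router appends /v1/completions itself.
normalize_base_url() {
  local url="$1"
  url="${url%/}"
  url="${url%/v1/completions}"
  url="${url%/v1}"
  printf '%s\n' "${url%/}"
}
```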
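Set only the base URL, without `/v1/completions`: the Cortex router appends that path automatically.

In `cortex/.env` (`LLM_MODEL` must match the model name vLLM serves, which defaults to the path passed to `vllm serve`, here `/model`):

```env
LLM_FORCE_BACKEND=primary
LLM_MODEL=/model
```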
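Here is a quick way to confirm from the shell that `cortex/.env` sets `LLM_MODEL` to `/model`. This minimal sketch assumes a plain `KEY=VALUE` file without quotes or `export` prefixes, like the one above. `env_value` is a hypothetical helper. Usage: `[ "$(env_value cortex/.env LLM_MODEL)" = /model ] && echo ok`.

```bash
# Print the value of KEY from a simple KEY=VALUE .env file (last one wins).
# Exits non-zero if KEY is absent. No support for quoting or `export`.
env_value() {
  local file="$1" key="$2"
  awk -F= -v k="$key" '
    $1 == k { sub(/^[^=]*=/, ""); v = $0; found = 1 }
    END { if (found) print v; else exit 1 }
  ' "$file"
}
```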
Test the full path through Cortex (its `/reason` endpoint is at `10.0.0.41:7081` in this setup):

```bash
curl -X POST http://10.0.0.41:7081/reason \
  -H "Content-Type: application/json" \
  -d '{"prompt":"test vllm","session_id":"dev"}'
```
If you get a meaningful response, **Cortex → vLLM is online**.

---

## **9. Common Failure Modes (And Fixes)**

### **9.1 “Failed to infer device type”**

vLLM cannot see any ROCm devices.

Fix:

```bash
# On the Proxmox host: load the driver and restart the container
modprobe amdgpu
pct stop 201
pct start 201

# Inside CT 201: confirm gfx906 is visible, then start vLLM again
/opt/rocm/bin/rocminfo | grep -i gfx
cd /root/vllm
docker compose up -d
```
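To triage a failed start without scrolling through the whole log, this sketch maps the messages quoted in this guide to their fixes. `diagnose_log` is a hypothetical helper. The patterns are only the strings shown here; anything else is reported as unknown. Inside CT 201, run `cd /root/vllm && docker compose logs --no-color 2>&1 | diagnose_log`. Pipe `docker compose up -d 2>&1` into it to catch pull errors.

```bash
# Map known log or pull-output lines on stdin to the fixes in this section.
# Only the messages quoted in this guide are recognized.
diagnose_log() {
  awk '
    /Failed to infer device type/  { print "no ROCm device: modprobe amdgpu on host, restart CT 201 (9.1)"; hit = 1 }
    /pull access denied/           { print "bad image name: use nalanzeyu/vllm-gfx906 (9.3)"; hit = 1 }
    /Application startup complete/ { print "healthy: API server is up"; hit = 1 }
    END { if (!hit) print "unknown: read the full log" }
  ' | sort -u
}
```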
### **9.2 GPU disappears after reboot**

Same fix, on the Proxmox host:

```bash
modprobe amdgpu
pct stop 201
pct start 201
```
### **9.3 Invalid image name**

If you see pull errors like the one below, the image name is misspelled (note `nalanzeuy` instead of `nalanzeyu`):

```
pull access denied for nalanzeuy...
```
Use the correct name in `docker-compose.yml`:

```yaml
image: nalanzeyu/vllm-gfx906
```
### **9.4 Double `/v1` in URL**

Make sure the base URL has no path:

```env
LLM_PRIMARY_URL=http://10.0.0.43:8000
```
The router appends `/v1/completions`, so a URL that already ends in `/v1` turns into `/v1/v1/completions`.

---

## **10. Daily / Reboot Ritual**

### **On the Proxmox host**

```bash
modprobe amdgpu
pct stop 201
pct start 201
```
### **Inside CT 201**

```bash
/opt/rocm/bin/rocminfo | grep -i gfx
cd /root/vllm
docker compose up -d
docker compose logs -f
```
### **Test the API**

```bash
curl -X POST http://10.0.0.43:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"/model","prompt":"ping","max_tokens":5}'
```
---

## **11. Summary**

You now have:

* **MI50 (gfx906)** correctly passed into the LXC container
* **ROCm** available inside the container via bind mounts
* **vLLM** running inside Docker in the LXC container
* An **OpenAI-compatible API** on port 8000
* **Lyra Cortex** using it as its primary backend

This is a complete, reproducible setup. It survives reboots (with the `modprobe` ritual) and lets you upgrade or replace models at any time.