# **MI50 + vLLM + Proxmox LXC Setup Guide**
### *End-to-End Field Manual for gfx906 LLM Serving*
**Version:** 1.0
**Last updated:** 2025-11-17
---
## **📌 Overview**
This guide documents how to run a **vLLM OpenAI-compatible server** on an
**AMD Instinct MI50 (gfx906)** inside a **Proxmox LXC container**, expose it over LAN,
and wire it into **Project Lyra's Cortex reasoning layer**.
This file is long, specific, and intentionally leaves *nothing* out so you never have to rediscover ROCm pain rituals again.
---
## **1. What This Stack Looks Like**
```
Proxmox Host
├─ AMD Instinct MI50 (gfx906)
├─ AMDGPU + ROCm stack
└─ LXC Container (CT 201: cortex-gpu)
   ├─ Ubuntu 24.04
   ├─ Docker + docker compose
   ├─ vLLM inside Docker (nalanzeyu/vllm-gfx906)
   ├─ GPU passthrough via /dev/kfd + /dev/dri + PCI bind
   └─ vLLM API exposed on :8000

Lyra Cortex (VM/Server)
└─ LLM_PRIMARY_URL=http://10.0.0.43:8000
```
---
## **2. Proxmox Host — GPU Setup**
### **2.1 Confirm MI50 exists**
```bash
lspci -nn | grep -i 'vega\|instinct\|radeon'
```
You should see something like:
```
0a:00.0 Display controller: AMD Instinct MI50 (gfx906)
```
### **2.2 Load AMDGPU driver**
Loading the driver is the main pitfall after **any host reboot**.
```bash
modprobe amdgpu
```
If you skip this, the LXC container won't see the GPU.
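To skip the manual step on future boots, you can register the module with systemd's module loader. A minimal sketch, assuming a stock Proxmox (systemd) host; the reboot ritual in section 10 remains a reliable fallback:
```bash
# Ask systemd-modules-load to load amdgpu at every boot
echo amdgpu > /etc/modules-load.d/amdgpu.conf

# Confirm the module is loaded right now
lsmod | grep -w amdgpu
```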
---
## **3. LXC Container Configuration (CT 201)**
The container ID is **201**.
Its config file lives at:
```
/etc/pve/lxc/201.conf
```
### **3.1 Working 201.conf**
Paste this *exact* version:
```ini
arch: amd64
cores: 4
hostname: cortex-gpu
memory: 16384
swap: 512
ostype: ubuntu
onboot: 1
startup: order=2,up=10,down=10
net0: name=eth0,bridge=vmbr0,hwaddr=BC:24:11:C6:3E:88,ip=dhcp,type=veth
rootfs: local-lvm:vm-201-disk-0,size=200G
unprivileged: 0
# Docker in LXC requires this
features: keyctl=1,nesting=1
lxc.apparmor.profile: unconfined
lxc.cap.drop:
# --- GPU passthrough for ROCm (MI50) ---
lxc.mount.entry: /dev/kfd dev/kfd none bind,optional,create=file,mode=0666
lxc.mount.entry: /dev/dri dev/dri none bind,optional,create=dir
lxc.mount.entry: /sys/class/drm sys/class/drm none bind,ro,optional,create=dir
lxc.mount.entry: /opt/rocm /opt/rocm none bind,ro,optional,create=dir
# Bind the MI50 PCI device
lxc.mount.entry: /dev/bus/pci/0000:0a:00.0 dev/bus/pci/0000:0a:00.0 none bind,optional,create=file
# Allow GPU-related character devices
lxc.cgroup2.devices.allow: c 226:* rwm
lxc.cgroup2.devices.allow: c 29:* rwm
lxc.cgroup2.devices.allow: c 189:* rwm
lxc.cgroup2.devices.allow: c 238:* rwm
lxc.cgroup2.devices.allow: c 241:* rwm
lxc.cgroup2.devices.allow: c 242:* rwm
lxc.cgroup2.devices.allow: c 243:* rwm
lxc.cgroup2.devices.allow: c 244:* rwm
lxc.cgroup2.devices.allow: c 245:* rwm
lxc.cgroup2.devices.allow: c 246:* rwm
lxc.cgroup2.devices.allow: c 247:* rwm
lxc.cgroup2.devices.allow: c 248:* rwm
lxc.cgroup2.devices.allow: c 249:* rwm
lxc.cgroup2.devices.allow: c 250:* rwm
lxc.cgroup2.devices.allow: c 510:0 rwm
```
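The block of `lxc.cgroup2.devices.allow` rules whitelists character-device majors, and some of these majors are assigned dynamically, so they can differ across kernel versions. If the container can't open the GPU nodes despite the config above, verify the majors on your host and adjust the rules (the `c 510:0` entry corresponds to `/dev/kfd` on this particular host):
```bash
# On the host: the number before the comma is the device major.
# Example: "crw-rw-rw- 1 root render 510, 0 ... /dev/kfd" -> "c 510:0 rwm"
ls -l /dev/kfd /dev/dri/
```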
### **3.2 Restart sequence**
```bash
pct stop 201
modprobe amdgpu
pct start 201
pct enter 201
```
---
## **4. Inside CT 201 — Verifying ROCm + GPU Visibility**
### **4.1 Check device nodes**
```bash
ls -l /dev/kfd
ls -l /dev/dri
ls -l /opt/rocm
```
All must exist.
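A one-shot loop makes the check scriptable (a convenience sketch, not required):
```bash
# Anything reported MISSING means the bind mounts in 201.conf did not apply
for p in /dev/kfd /dev/dri /opt/rocm; do
  [ -e "$p" ] && echo "ok: $p" || echo "MISSING: $p"
done
```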
### **4.2 Validate GPU via rocminfo**
```bash
/opt/rocm/bin/rocminfo | grep -i gfx
```
You need to see:
```
gfx906
```
If you see **nothing**, the GPU isn't passed through; go back and re-check the host steps, then restart the container.
---
## **5. Install Docker in the LXC (Ubuntu 24.04)**
This container runs Docker inside LXC (nesting enabled).
```bash
apt update
apt install -y ca-certificates curl gnupg
install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg \
| gpg --dearmor -o /etc/apt/keyrings/docker.gpg
chmod a+r /etc/apt/keyrings/docker.gpg
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo $VERSION_CODENAME) stable" \
> /etc/apt/sources.list.d/docker.list
apt update
apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
```
Check:
```bash
docker --version
docker compose version
```
---
## **6. Running vLLM Inside CT 201 via Docker**
### **6.1 Create directory**
```bash
mkdir -p /root/vllm
cd /root/vllm
```
### **6.2 docker-compose.yml**
Save this exact file as `/root/vllm/docker-compose.yml`:
```yaml
version: "3.9"
services:
vllm-mi50:
image: nalanzeyu/vllm-gfx906:latest
container_name: vllm-mi50
restart: unless-stopped
ports:
- "8000:8000"
environment:
VLLM_ROLE: "APIServer"
VLLM_MODEL: "/model"
VLLM_LOGGING_LEVEL: "INFO"
command: >
vllm serve /model
--host 0.0.0.0
--port 8000
--dtype float16
--max-model-len 4096
--api-type openai
devices:
- "/dev/kfd:/dev/kfd"
- "/dev/dri:/dev/dri"
volumes:
- /opt/rocm:/opt/rocm:ro
```
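Compose can validate the file before anything starts, which catches YAML indentation slips early:
```bash
# Parse and validate docker-compose.yml without starting containers
docker compose config --quiet && echo "compose file OK"
```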
### **6.3 Start vLLM**
```bash
docker compose up -d
docker compose logs -f
```
When healthy, you'll see:
```
(APIServer) Application startup complete.
```
and periodic throughput logs.
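Loading a model onto the MI50 can take a while, so scripts shouldn't assume the API is ready the moment the container starts. One way to wait, polling the OpenAI-compatible `/v1/models` endpoint (adjust host/port if you changed them):
```bash
# Block until vLLM finishes startup and the API answers
until curl -sf http://localhost:8000/v1/models > /dev/null; do
  echo "waiting for vLLM..."
  sleep 5
done
echo "vLLM is up"
```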
---
## **7. Test vLLM API**
### **7.1 From Proxmox host**
```bash
curl -X POST http://10.0.0.43:8000/v1/completions \
-H "Content-Type: application/json" \
-d '{"model":"/model","prompt":"ping","max_tokens":5}'
```
It should respond with something like:
```json
{"choices":[{"text":"-pong"}]}
```
### **7.2 From Cortex machine**
```bash
curl -X POST http://10.0.0.43:8000/v1/completions \
-H "Content-Type: application/json" \
-d '{"model":"/model","prompt":"ping from cortex","max_tokens":5}'
```
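For scripting, you can strip the JSON envelope and keep only the generated text (assumes `jq` is installed on the caller):
```bash
# Same request as above, printing only the completion text
curl -s -X POST http://10.0.0.43:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"/model","prompt":"ping","max_tokens":5}' \
  | jq -r '.choices[0].text'
```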
---
## **8. Wiring into Lyra Cortex**
In the `cortex` container's `docker-compose.yml`:
```yaml
environment:
LLM_PRIMARY_URL: http://10.0.0.43:8000
```
Do not append `/v1/completions` here; the router adds it automatically.
In `cortex/.env`:
```env
LLM_FORCE_BACKEND=primary
LLM_MODEL=/model
```
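Before exercising the full `/reason` path, it can help to confirm the Cortex container itself can reach the backend. A sketch, assuming the compose service is named `cortex` and its image ships `curl`:
```bash
# From the Cortex machine: call vLLM from inside the cortex container
docker compose exec cortex curl -s -X POST http://10.0.0.43:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"/model","prompt":"ping","max_tokens":5}'
```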
Test:
```bash
curl -X POST http://10.0.0.41:7081/reason \
-H "Content-Type: application/json" \
-d '{"prompt":"test vllm","session_id":"dev"}'
```
If you get a meaningful response: **Cortex → vLLM is online**.
---
## **9. Common Failure Modes (And Fixes)**
### **9.1 “Failed to infer device type”**
vLLM cannot see any ROCm devices.
Fix:
```bash
# On host
modprobe amdgpu
pct stop 201
pct start 201
# In container
/opt/rocm/bin/rocminfo | grep -i gfx
docker compose up -d
```
### **9.2 GPU disappears after reboot**
Same fix:
```bash
modprobe amdgpu
pct stop 201
pct start 201
```
### **9.3 Invalid image name**
If you see pull errors like:
```
pull access denied for nalanzeuy...
```
the image name is misspelled. The correct name is:
```
image: nalanzeyu/vllm-gfx906
```
### **9.4 Double `/v1` in URL**
Ensure:
```
LLM_PRIMARY_URL=http://10.0.0.43:8000
```
Router appends `/v1/completions`.
---
## **10. Daily / Reboot Ritual**
### **On Proxmox host**
```bash
modprobe amdgpu
pct stop 201
pct start 201
```
### **Inside CT 201**
```bash
/opt/rocm/bin/rocminfo | grep -i gfx
cd /root/vllm
docker compose up -d
docker compose logs -f
```
### **Test API**
```bash
curl -X POST http://10.0.0.43:8000/v1/completions \
-H "Content-Type: application/json" \
-d '{"model":"/model","prompt":"ping","max_tokens":5}'
```
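The whole ritual also bundles neatly into one host-side script. A hypothetical convenience wrapper (name and path are illustrative); the CT ID and paths match this guide:
```bash
#!/usr/bin/env bash
# /root/bin/mi50-up.sh -- hypothetical one-shot reboot ritual
set -euo pipefail

# 1. Ensure the kernel driver is loaded (idempotent)
modprobe amdgpu

# 2. Bounce the container so it picks up the device nodes
pct stop 201 || true   # tolerate "already stopped"
pct start 201

# 3. Verify GPU visibility, then (re)start vLLM inside CT 201
pct exec 201 -- bash -c '/opt/rocm/bin/rocminfo | grep -i gfx'
pct exec 201 -- bash -c 'cd /root/vllm && docker compose up -d'
```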
---
## **11. Summary**
You now have:
* **MI50 (gfx906)** correctly passed into LXC
* **ROCm** inside the container via bind mounts
* **vLLM** running inside Docker in the LXC
* **OpenAI-compatible API** on port 8000
* **Lyra Cortex** using it automatically as primary backend
This is a complete, reproducible setup that survives reboots (with the modprobe ritual) and allows you to upgrade/replace models anytime.