Add MI50 + vLLM full setup guide

2025-11-17 03:34:23 -05:00
parent 180af9eb63
commit e5e32f2683
3 changed files with 1324 additions and 908 deletions
@@ -0,0 +1,416 @@
 # **MI50 + vLLM + Proxmox LXC Setup Guide**
 ### *End-to-End Field Manual for gfx906 LLM Serving*
 **Version:** 1.0
 **Last updated:** 2025-11-17
 ---
 ## **📌 Overview**
 This guide documents how to run a **vLLM OpenAI-compatible server** on an
 **AMD Instinct MI50 (gfx906)** inside a **Proxmox LXC container**, expose it over LAN,
 and wire it into **Project Lyra's Cortex reasoning layer**.
 This file is long, specific, and intentionally leaves *nothing* out so you never have to rediscover ROCm pain rituals again.
 ---
 ## **1. What This Stack Looks Like**
 ```
 Proxmox Host
 ├─ AMD Instinct MI50 (gfx906)
 ├─ AMDGPU + ROCm stack
 └─ LXC Container (CT 201: cortex-gpu)
      ├─ Ubuntu 24.04
      ├─ Docker + docker compose
      ├─ vLLM inside Docker (nalanzeyu/vllm-gfx906)
      ├─ GPU passthrough via /dev/kfd + /dev/dri + PCI bind
      └─ vLLM API exposed on :8000
 Lyra Cortex (VM/Server)
 └─ LLM_PRIMARY_URL=http://10.0.0.43:8000
 ```
 ---
 ## **2. Proxmox Host — GPU Setup**
 ### **2.1 Confirm MI50 exists**
 ```bash
 lspci -nn | grep -i 'vega\|instinct\|radeon'
 ```
 You should see a line for the card, something like the following (the exact device string depends on your PCI ID database):
 ```
 0a:00.0 Display controller: AMD Instinct MI50 (gfx906)
 ```
 ### **2.2 Load AMDGPU driver**
 This is the main pitfall after **any host reboot**: the `amdgpu` kernel module may not be loaded, so load it by hand.
 ```bash
 modprobe amdgpu
 ```
 If you skip this, the LXC container won't see the GPU.
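To avoid running `modprobe` blindly, the host can first check whether the module is already loaded. The following is a minimal sketch, not part of the original setup: the helper names `module_loaded` and `ensure_amdgpu` are hypothetical, and the optional second argument exists only so the check can be pointed at a test file instead of `/proc/modules`.

```shell
#!/usr/bin/env bash
# Sketch: check a /proc/modules-style listing for a kernel module.
# The function names and the optional file argument are illustrative assumptions.

module_loaded() {
  local name="$1"
  local modules_file="${2:-/proc/modules}"
  # /proc/modules has one module per line; the module name is the first field.
  awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' "$modules_file"
}

ensure_amdgpu() {
  if module_loaded amdgpu; then
    echo "amdgpu already loaded"
  else
    echo "amdgpu not loaded, running modprobe"
    modprobe amdgpu
  fi
}

# Usage on the Proxmox host (as root):
#   ensure_amdgpu
```

On systemd-based hosts, listing `amdgpu` in a file under `/etc/modules-load.d/` is the usual way to have the module loaded at boot, which may make the manual step unnecessary; verify this on your own host before relying on it.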
 ---
 ## **3. LXC Container Configuration (CT 201)**
 The container ID is **201**.
 Config file is at:
 ```
 /etc/pve/lxc/201.conf
 ```
 ### **3.1 Working 201.conf**
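 Paste this version, adjusting the machine-specific values (the `hwaddr` MAC address, the `rootfs` storage name, and the PCI address `0000:0a:00.0`) to match your host: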
 ```ini
 arch: amd64
 cores: 4
 hostname: cortex-gpu
 memory: 16384
 swap: 512
 ostype: ubuntu
 onboot: 1
 startup: order=2,up=10,down=10
 net0: name=eth0,bridge=vmbr0,hwaddr=BC:24:11:C6:3E:88,ip=dhcp,type=veth
 rootfs: local-lvm:vm-201-disk-0,size=200G
 unprivileged: 0
 # Docker in LXC requires this
 features: keyctl=1,nesting=1
 lxc.apparmor.profile: unconfined
 lxc.cap.drop:
 # --- GPU passthrough for ROCm (MI50) ---
 lxc.mount.entry: /dev/kfd dev/kfd none bind,optional,create=file,mode=0666
 lxc.mount.entry: /dev/dri dev/dri none bind,optional,create=dir
 lxc.mount.entry: /sys/class/drm sys/class/drm none bind,ro,optional,create=dir
 lxc.mount.entry: /opt/rocm opt/rocm none bind,ro,optional,create=dir
 # Bind the MI50 PCI device
 lxc.mount.entry: /dev/bus/pci/0000:0a:00.0 dev/bus/pci/0000:0a:00.0 none bind,optional,create=file
 # Allow GPU-related character devices
 lxc.cgroup2.devices.allow: c 226:* rwm
 lxc.cgroup2.devices.allow: c 29:* rwm
 lxc.cgroup2.devices.allow: c 189:* rwm
 lxc.cgroup2.devices.allow: c 238:* rwm
 lxc.cgroup2.devices.allow: c 241:* rwm
 lxc.cgroup2.devices.allow: c 242:* rwm
 lxc.cgroup2.devices.allow: c 243:* rwm
 lxc.cgroup2.devices.allow: c 244:* rwm
 lxc.cgroup2.devices.allow: c 245:* rwm
 lxc.cgroup2.devices.allow: c 246:* rwm
 lxc.cgroup2.devices.allow: c 247:* rwm
 lxc.cgroup2.devices.allow: c 248:* rwm
 lxc.cgroup2.devices.allow: c 249:* rwm
 lxc.cgroup2.devices.allow: c 250:* rwm
 lxc.cgroup2.devices.allow: c 510:0 rwm
 ```
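The long list of `lxc.cgroup2.devices.allow` lines exists because the major number of `/dev/kfd` is not fixed across kernels. A small sketch like the one below can confirm that a given major is covered by the config. The helper names `allowed_majors` and `major_allowed` are hypothetical, and the parsing assumes the one-rule-per-line layout shown above.

```shell
#!/usr/bin/env bash
# Sketch: list the character-device majors allowed by an LXC config file.
# Helper names are illustrative assumptions, not part of any Proxmox tooling.

allowed_majors() {
  local conf="$1"
  # Prints the MAJOR part of lines such as "lxc.cgroup2.devices.allow: c 226:* rwm".
  sed -n 's/^[[:space:]]*lxc\.cgroup2\.devices\.allow:[[:space:]]*c[[:space:]]*\([0-9][0-9]*\):.*/\1/p' "$conf"
}

major_allowed() {
  local major="$1"
  local conf="$2"
  allowed_majors "$conf" | grep -qx "$major"
}

# Usage on the Proxmox host (GNU stat prints the major in hex with %t):
#   kfd_major=$((16#$(stat -c '%t' /dev/kfd)))
#   major_allowed "$kfd_major" /etc/pve/lxc/201.conf || echo "add: c ${kfd_major}:* rwm"
```

If the check reports a missing major, adding one more `allow` line for it follows the same pattern as the existing entries.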
 ### **3.2 Restart sequence**
 ```bash
 pct stop 201
 modprobe amdgpu
 pct start 201
 pct enter 201
 ```
 ---
 ## **4. Inside CT 201 — Verifying ROCm + GPU Visibility**
 ### **4.1 Check device nodes**
 ```bash
 ls -l /dev/kfd
 ls -l /dev/dri
 ls -l /opt/rocm
 ```
 All must exist.
 ### **4.2 Validate GPU via rocminfo**
 ```bash
 /opt/rocm/bin/rocminfo | grep -i gfx
 ```
 You need to see:
 ```
 gfx906
 ```
 If you see **nothing**, the GPU isn’t passed through — restart and re-check the host steps.
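The `grep` above can be turned into a pass/fail check for scripts. This is a rough sketch: the function names `gfx_targets` and `has_gfx_target` are hypothetical, and it only assumes that `rocminfo` prints the target name (for example `gfx906`) somewhere in its output.

```shell
#!/usr/bin/env bash
# Sketch: read rocminfo output on stdin and test for a specific gfx target.
# Function names are illustrative assumptions.

gfx_targets() {
  # Extract every token that looks like a gfx target and de-duplicate.
  grep -oE 'gfx[0-9a-f]+' | sort -u
}

has_gfx_target() {
  local want="$1"
  gfx_targets | grep -qx "$want"
}

# Usage inside CT 201:
#   /opt/rocm/bin/rocminfo | has_gfx_target gfx906 && echo "MI50 visible"
```

Reading from stdin keeps the check independent of where `rocminfo` is installed.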
 ---
 ## **5. Install Docker in the LXC (Ubuntu 24.04)**
 This container runs Docker inside LXC (nesting enabled).
 ```bash
 apt update
 apt install -y ca-certificates curl gnupg
 install -m 0755 -d /etc/apt/keyrings
 curl -fsSL https://download.docker.com/linux/ubuntu/gpg \
  | gpg --dearmor -o /etc/apt/keyrings/docker.gpg
 chmod a+r /etc/apt/keyrings/docker.gpg
 echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
  https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo $VERSION_CODENAME) stable" \
  > /etc/apt/sources.list.d/docker.list
 apt update
 apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
 ```
 Check:
 ```bash
 docker --version
 docker compose version
 ```
 ---
 ## **6. Running vLLM Inside CT 201 via Docker**
 ### **6.1 Create directory**
 ```bash
 mkdir -p /root/vllm
 cd /root/vllm
 ```
 ### **6.2 docker-compose.yml**
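 Save this file as `/root/vllm/docker-compose.yml`. Note that it serves the model from `/model` inside the Docker container but lists only `/opt/rocm` under `volumes`, so add a volume entry that maps your host model directory to `/model` before starting: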
 ```yaml
 version: "3.9"
 services:
  vllm-mi50:
    image: nalanzeyu/vllm-gfx906:latest
    container_name: vllm-mi50
    restart: unless-stopped
    ports:
      - "8000:8000"
    environment:
      VLLM_ROLE: "APIServer"
      VLLM_MODEL: "/model"
      VLLM_LOGGING_LEVEL: "INFO"
    command: >
      vllm serve /model
      --host 0.0.0.0
      --port 8000
      --dtype float16
      --max-model-len 4096
    devices:
      - "/dev/kfd:/dev/kfd"
      - "/dev/dri:/dev/dri"
    volumes:
      - /opt/rocm:/opt/rocm:ro
 ```
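Because the server is told to load `/model`, it is worth checking that the compose file actually maps a host directory onto that path before starting it. The following is a small sketch under stated assumptions: `has_mount_for` is a hypothetical helper, and it only recognises short-syntax volume entries of the form `- host:container[:options]`.

```shell
#!/usr/bin/env bash
# Sketch: succeed if a compose file has a short-syntax entry mapping onto a container path.
# has_mount_for is an illustrative helper, not a Docker feature.

has_mount_for() {
  local target="$1"
  local compose_file="$2"
  # Matches lines such as "- /host/path:/model" or "- /host/path:/model:ro".
  grep -Eq "^[[:space:]]*-[[:space:]]*\"?[^:\"]+:${target}(:[a-z,]+)?\"?[[:space:]]*$" "$compose_file"
}

# Usage inside CT 201:
#   has_mount_for /model /root/vllm/docker-compose.yml || echo "no volume mapped to /model"
```

For a full syntax check of the file, `docker compose config` prints the resolved configuration or an error.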
 ### **6.3 Start vLLM**
 ```bash
 docker compose up -d
 docker compose logs -f
 ```
 When healthy, you’ll see:
 ```
 (APIServer) Application startup complete.
 ```
 and periodic throughput logs.
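Model loading can take a while, so instead of watching the logs a script can poll until the API answers. This is a generic retry sketch: `wait_until` is a hypothetical helper, and the `/v1/models` endpoint in the usage comment is assumed to be served by this image, as it is by the standard vLLM OpenAI-compatible server.

```shell
#!/usr/bin/env bash
# Sketch: retry a command until it succeeds or the attempts run out.
# wait_until is an illustrative helper name.

wait_until() {
  local tries="$1"
  local delay="$2"
  shift 2
  local i
  for ((i = 1; i <= tries; i++)); do
    # Run the probe command quietly; stop as soon as it succeeds.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    sleep "$delay"
  done
  return 1
}

# Usage: poll every 5 seconds, up to 60 times.
#   wait_until 60 5 curl -fsS http://10.0.0.43:8000/v1/models && echo "vLLM is up"
```

Passing the probe as a command keeps the loop reusable for other checks, such as the `rocminfo` test in section 4.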
 ---
 ## **7. Test vLLM API**
 ### **7.1 From Proxmox host**
 ```bash
 curl -X POST http://10.0.0.43:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"/model","prompt":"ping","max_tokens":5}'
 ```
 The response should include a `choices` array (abbreviated here):
 ```json
 {"choices":[{"text":"-pong"}]}
 ```
 ### **7.2 From Cortex machine**
 ```bash
 curl -X POST http://10.0.0.43:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"/model","prompt":"ping from cortex","max_tokens":5}'
 ```
 ---
 ## **8. Wiring into Lyra Cortex**
 In the `cortex` container's `docker-compose.yml`:
 ```yaml
 environment:
  LLM_PRIMARY_URL: http://10.0.0.43:8000
 ```
 Do not include `/v1/completions` in the URL; the router appends it automatically.
 In `cortex/.env`:
 ```env
 LLM_FORCE_BACKEND=primary
 LLM_MODEL=/model
 ```
 Test:
 ```bash
 curl -X POST http://10.0.0.41:7081/reason \
  -H "Content-Type: application/json" \
  -d '{"prompt":"test vllm","session_id":"dev"}'
 ```
 If you get a meaningful response: **Cortex → vLLM is online**.
 ---
 ## **9. Common Failure Modes (And Fixes)**
 ### **9.1 “Failed to infer device type”**
 vLLM cannot see any ROCm devices.
 Fix:
 ```bash
 # On host
 modprobe amdgpu
 pct stop 201
 pct start 201
 # In container
 /opt/rocm/bin/rocminfo | grep -i gfx
 docker compose up -d
 ```
 ### **9.2 GPU disappears after reboot**
 Same fix as 9.1, run on the host:
 ```bash
 modprobe amdgpu
 pct stop 201
 pct start 201
 ```
 ### **9.3 Invalid image name**
 If you see a pull error like the one below, the image name is misspelled (`nalanzeuy` instead of `nalanzeyu`):
 ```
 pull access denied for nalanzeuy...
 ```
 Use:
 ```
 image: nalanzeyu/vllm-gfx906
 ```
 ### **9.4 Double `/v1` in URL**
 Make sure the base URL carries no path component:
 ```
 LLM_PRIMARY_URL=http://10.0.0.43:8000
 ```
 The router appends `/v1/completions` itself.
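If the URL comes from several places (compose file, `.env`, shell), a small normaliser can strip any accidental suffix before it is used. This is a minimal sketch: `normalize_base_url` is a hypothetical helper and handles only the trailing `/`, `/v1`, and `/v1/completions` cases discussed here.

```shell
#!/usr/bin/env bash
# Sketch: reduce an LLM endpoint URL to its bare base (scheme://host:port).
# normalize_base_url is an illustrative helper name.

normalize_base_url() {
  local url="$1"
  url="${url%/}"             # drop one trailing slash
  url="${url%/completions}"  # drop a trailing /completions
  url="${url%/v1}"           # drop a trailing /v1
  printf '%s\n' "$url"
}

# Usage:
#   LLM_PRIMARY_URL=$(normalize_base_url "$LLM_PRIMARY_URL")
```

The function leaves an already clean base URL unchanged, so it is safe to apply unconditionally.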
 ---
 ## **10. Daily / Reboot Ritual**
 ### **On Proxmox host**
 ```bash
 modprobe amdgpu
 pct stop 201
 pct start 201
 ```
 ### **Inside CT 201**
 ```bash
 /opt/rocm/bin/rocminfo | grep -i gfx
 cd /root/vllm
 docker compose up -d
 docker compose logs -f
 ```
 ### **Test API**
 ```bash
 curl -X POST http://10.0.0.43:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"/model","prompt":"ping","max_tokens":5}'
 ```
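The host and container steps above can be collected into one host-side script. The sketch below is one way to do it, not the author's own tooling: `run` and `reboot_ritual` are hypothetical names, the `DRY_RUN` switch is an assumption added so the sequence can be previewed safely, and it relies on `pct exec` to run the compose command inside the container.

```shell
#!/usr/bin/env bash
# Sketch: the reboot ritual as a single host-side function.
# run, reboot_ritual, and DRY_RUN are illustrative assumptions.

run() {
  # With DRY_RUN=1, print the command instead of executing it.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

reboot_ritual() {
  local ctid="${1:-201}"
  run modprobe amdgpu
  run pct stop "$ctid"
  run pct start "$ctid"
  run pct exec "$ctid" -- sh -c 'cd /root/vllm && docker compose up -d'
}

# Usage on the Proxmox host (as root):
#   DRY_RUN=1 reboot_ritual 201   # preview the commands
#   reboot_ritual 201             # actually run them
```

The function does not stop on the first failing step, so check the output if the container was already stopped or the module was already loaded.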
 ---
 ## **11. Summary**
 You now have:
 * **MI50 (gfx906)** correctly passed into LXC
 * **ROCm** inside the container via bind mounts
 * **vLLM** running inside Docker in the LXC
 * **OpenAI-compatible API** on port 8000
 * **Lyra Cortex** using it automatically as primary backend
 This is a complete, reproducible setup that survives reboots (with the modprobe ritual) and allows you to upgrade/replace models anytime.