Merge branch 'main' of https://github.com/serversdwn/project-lyra

2025-11-17 03:41:51 -05:00
parent a19231abd0 e5e32f2683
commit b5fe47074a
3 changed files with 1324 additions and 908 deletions
@@ -0,0 +1,416 @@
+# **MI50 + vLLM + Proxmox LXC Setup Guide**
+
+### *End-to-End Field Manual for gfx906 LLM Serving*
+
+**Version:** 1.0
+**Last updated:** 2025-11-17
+
+---
+
+## **📌 Overview**
+
+This guide documents how to run a **vLLM OpenAI-compatible server** on an
+**AMD Instinct MI50 (gfx906)** inside a **Proxmox LXC container**, expose it over LAN,
+and wire it into **Project Lyra's Cortex reasoning layer**.
+
+This file is long, specific, and intentionally leaves *nothing* out so you never have to rediscover ROCm pain rituals again.
+
+---
+
+## **1. What This Stack Looks Like**
+
+```
+Proxmox Host
+ ├─ AMD Instinct MI50 (gfx906)
+ ├─ AMDGPU + ROCm stack
+ └─ LXC Container (CT 201: cortex-gpu)
+      ├─ Ubuntu 24.04
+      ├─ Docker + docker compose
+      ├─ vLLM inside Docker (nalanzeyu/vllm-gfx906)
+      ├─ GPU passthrough via /dev/kfd + /dev/dri + PCI bind
+      └─ vLLM API exposed on :8000
+Lyra Cortex (VM/Server)
+ └─ LLM_PRIMARY_URL=http://10.0.0.43:8000
+```
+
+---
+
+## **2. Proxmox Host — GPU Setup**
+
+### **2.1 Confirm MI50 exists**
+
+```bash
+lspci -nn | grep -i 'vega\|instinct\|radeon'
+```
+
+You should see something like:
+
+```
+0a:00.0 Display controller: AMD Instinct MI50 (gfx906)
+```
+
+### **2.2 Load AMDGPU driver**
+
+Skipping this step is the main pitfall after **any host reboot**.
+
+```bash
+modprobe amdgpu
+```
+
+If you skip this, the LXC container won't see the GPU.
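
If running `modprobe` by hand after every reboot becomes tedious, one option is to list the module in `/etc/modules-load.d/` so that systemd loads it at boot. The sketch below is an assumption about how that could be automated and has not been verified against this particular host. `persist_module` is a hypothetical helper name, and its optional directory argument exists only so the function can be tried against a scratch directory first.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list a kernel module in a modules-load.d file so that
# systemd-modules-load picks it up at boot. Idempotent: calling it twice does
# not duplicate the entry.
persist_module() {
  local module="$1"
  local conf_dir="${2:-/etc/modules-load.d}"   # override for dry runs
  local conf_file="${conf_dir}/${module}.conf"

  # Nothing to do if the module is already listed on its own line.
  if [ -f "$conf_file" ] && grep -qxF "$module" "$conf_file"; then
    return 0
  fi
  printf '%s\n' "$module" >> "$conf_file"
}

# Usage on the Proxmox host (as root):
#   persist_module amdgpu
```

Even with this in place, it is worth keeping the manual `modprobe amdgpu` in the reboot checklist until you have confirmed the module really loads on its own.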
+
+---
+
+## **3. LXC Container Configuration (CT 201)**
+
+The container ID is **201**.
+Its config file is at:
+
+```
+/etc/pve/lxc/201.conf
+```
+
+### **3.1 Working 201.conf**
+
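+Paste this version, adjusting host-specific values such as the MAC address (`hwaddr`) and the PCI address (`0a:00.0`) if they differ on your system: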
+
+```ini
+arch: amd64
+cores: 4
+hostname: cortex-gpu
+memory: 16384
+swap: 512
+ostype: ubuntu
+onboot: 1
+startup: order=2,up=10,down=10
+net0: name=eth0,bridge=vmbr0,hwaddr=BC:24:11:C6:3E:88,ip=dhcp,type=veth
+rootfs: local-lvm:vm-201-disk-0,size=200G
+unprivileged: 0
+
+# Docker in LXC requires this
+features: keyctl=1,nesting=1
+lxc.apparmor.profile: unconfined
+lxc.cap.drop:
+
+# --- GPU passthrough for ROCm (MI50) ---
+lxc.mount.entry: /dev/kfd dev/kfd none bind,optional,create=file,mode=0666
+lxc.mount.entry: /dev/dri dev/dri none bind,optional,create=dir
+lxc.mount.entry: /sys/class/drm sys/class/drm none bind,ro,optional,create=dir
+lxc.mount.entry: /opt/rocm /opt/rocm none bind,ro,optional,create=dir
+
+# Bind the MI50 PCI device
+lxc.mount.entry: /dev/bus/pci/0000:0a:00.0 dev/bus/pci/0000:0a:00.0 none bind,optional,create=file
+
+# Allow GPU-related character devices
+lxc.cgroup2.devices.allow: c 226:* rwm
+lxc.cgroup2.devices.allow: c 29:* rwm
+lxc.cgroup2.devices.allow: c 189:* rwm
+lxc.cgroup2.devices.allow: c 238:* rwm
+lxc.cgroup2.devices.allow: c 241:* rwm
+lxc.cgroup2.devices.allow: c 242:* rwm
+lxc.cgroup2.devices.allow: c 243:* rwm
+lxc.cgroup2.devices.allow: c 244:* rwm
+lxc.cgroup2.devices.allow: c 245:* rwm
+lxc.cgroup2.devices.allow: c 246:* rwm
+lxc.cgroup2.devices.allow: c 247:* rwm
+lxc.cgroup2.devices.allow: c 248:* rwm
+lxc.cgroup2.devices.allow: c 249:* rwm
+lxc.cgroup2.devices.allow: c 250:* rwm
+lxc.cgroup2.devices.allow: c 510:0 rwm
+```
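
The `lxc.cgroup2.devices.allow` lines whitelist character devices by major number. To check which major number a given device node uses on your host, `stat` can report it. The sketch below prints a matching allow line for a device node. `cgroup_allow_line` is a hypothetical helper, and it assumes GNU `stat` and bash.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print an lxc.cgroup2.devices.allow line covering the
# major number of the given character device.
cgroup_allow_line() {
  local dev="$1" major_hex
  if [ ! -c "$dev" ]; then
    echo "not a character device: $dev" >&2
    return 1
  fi
  # GNU stat prints the major device number in hex with %t.
  major_hex=$(stat -c '%t' "$dev")
  # bash's printf accepts 0x-prefixed hex and converts it to decimal.
  printf 'lxc.cgroup2.devices.allow: c %d:* rwm\n' "0x${major_hex}"
}

# Usage on the Proxmox host:
#   cgroup_allow_line /dev/kfd
#   cgroup_allow_line /dev/dri/renderD128
```

Comparing the output against the list in `201.conf` shows whether the majors your host actually uses are covered.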
+
+### **3.2 Restart sequence**
+
+```bash
+pct stop 201
+modprobe amdgpu
+pct start 201
+pct enter 201
+```
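
The same sequence can be wrapped in a small script. This is a minimal sketch, assuming the device node may take a moment to appear after `modprobe`; `wait_for_path` and `restart_gpu_container` are hypothetical helper names, and the 15-second timeout is an arbitrary choice.

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll until a path exists or the timeout (in seconds)
# expires. Returns 0 if the path appeared, 1 otherwise.
wait_for_path() {
  local path="$1" timeout="${2:-10}" waited=0
  while [ ! -e "$path" ]; do
    [ "$waited" -ge "$timeout" ] && return 1
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}

# Sketch of the restart sequence; run on the Proxmox host as root.
restart_gpu_container() {
  local ctid="${1:-201}"
  modprobe amdgpu || return 1
  if ! wait_for_path /dev/kfd 15; then
    echo "/dev/kfd did not appear; check the amdgpu driver" >&2
    return 1
  fi
  pct stop "$ctid" 2>/dev/null || true   # ignore errors if already stopped
  pct start "$ctid"
}

# Usage:
#   restart_gpu_container 201 && pct enter 201
```

Failing early when `/dev/kfd` is absent avoids starting a container that cannot see the GPU.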
+
+---
+
+## **4. Inside CT 201 — Verifying ROCm + GPU Visibility**
+
+### **4.1 Check device nodes**
+
+```bash
+ls -l /dev/kfd
+ls -l /dev/dri
+ls -l /opt/rocm
+```
+
+All must exist.
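
To turn the three `ls` calls into a single pass/fail check, something like the following could be used. `missing_paths` is a hypothetical helper; it reports each absent path and returns non-zero if any is missing.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print every argument that does not exist as a path
# and return non-zero if at least one was missing.
missing_paths() {
  local path rc=0
  for path in "$@"; do
    if [ ! -e "$path" ]; then
      echo "missing: $path"
      rc=1
    fi
  done
  return "$rc"
}

# Usage inside CT 201:
#   missing_paths /dev/kfd /dev/dri /opt/rocm && echo "all GPU paths present"
```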
+
+### **4.2 Validate GPU via rocminfo**
+
+```bash
+/opt/rocm/bin/rocminfo | grep -i gfx
+```
+
+You need to see:
+
+```
+gfx906
+```
+
+If you see **nothing**, the GPU isn’t passed through — restart and re-check the host steps.
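
For scripted checks, the `grep` can be tightened so that it succeeds only when the expected target is present. The sketch below reads `rocminfo`-style text on stdin; both function names are hypothetical, and the pattern simply assumes targets look like `gfx` followed by hex digits.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read rocminfo-style text on stdin and print each
# distinct gfx target once, lowercased.
list_gfx_targets() {
  grep -Eio 'gfx[0-9a-f]+' | tr '[:upper:]' '[:lower:]' | sort -u
}

# Hypothetical helper: succeed only if the given target appears on stdin.
has_gfx_target() {
  list_gfx_targets | grep -qxF "$1"
}

# Usage inside CT 201:
#   /opt/rocm/bin/rocminfo | has_gfx_target gfx906 && echo "MI50 visible"
```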
+
+---
+
+## **5. Install Docker in the LXC (Ubuntu 24.04)**
+
+This container runs Docker inside LXC (nesting enabled).
+
+```bash
+apt update
+apt install -y ca-certificates curl gnupg
+
+install -m 0755 -d /etc/apt/keyrings
+curl -fsSL https://download.docker.com/linux/ubuntu/gpg \
+  | gpg --dearmor -o /etc/apt/keyrings/docker.gpg
+chmod a+r /etc/apt/keyrings/docker.gpg
+
+echo \
+  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
+  https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo $VERSION_CODENAME) stable" \
+  > /etc/apt/sources.list.d/docker.list
+
+apt update
+apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
+```
+
+Check:
+
+```bash
+docker --version
+docker compose version
+```
+
+---
+
+## **6. Running vLLM Inside CT 201 via Docker**
+
+### **6.1 Create directory**
+
+```bash
+mkdir -p /root/vllm
+cd /root/vllm
+```
+
+### **6.2 docker-compose.yml**
+
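+Save this file as `/root/vllm/docker-compose.yml`, replacing the placeholder model path with the host directory that holds your model:
+
+```yaml
+version: "3.9"
+
+services:
+  vllm-mi50:
+    image: nalanzeyu/vllm-gfx906:latest
+    container_name: vllm-mi50
+    restart: unless-stopped
+    ports:
+      - "8000:8000"
+    environment:
+      VLLM_ROLE: "APIServer"
+      VLLM_MODEL: "/model"
+      VLLM_LOGGING_LEVEL: "INFO"
+    # `vllm serve` starts the OpenAI-compatible API server by default,
+    # so no extra flag is needed to select the API type.
+    command: >
+      vllm serve /model
+      --host 0.0.0.0
+      --port 8000
+      --dtype float16
+      --max-model-len 4096
+    devices:
+      - "/dev/kfd:/dev/kfd"
+      - "/dev/dri:/dev/dri"
+    volumes:
+      - /opt/rocm:/opt/rocm:ro
+      # Placeholder (assumption): host directory containing the model weights.
+      # The served path /model must exist inside the container.
+      - /path/to/your/model:/model:ro
+```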
+
+### **6.3 Start vLLM**
+
+```bash
+docker compose up -d
+docker compose logs -f
+```
+
+When healthy, you’ll see:
+
+```
+(APIServer) Application startup complete.
+```
+
+and periodic throughput logs.
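
Instead of watching the logs, a script can poll the API until it answers. The sketch below assumes `curl` is installed and uses vLLM's `/v1/models` endpoint as the readiness probe; `wait_for_http` is a hypothetical helper, and the timeout is approximate because it counts only the sleep intervals.

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll a URL until it answers with a successful HTTP
# status or the timeout (in seconds) expires.
wait_for_http() {
  local url="$1" timeout="${2:-300}" waited=0
  # -f makes curl fail on HTTP errors (4xx/5xx); --max-time bounds each try.
  until curl -fsS -o /dev/null --max-time 2 "$url" 2>/dev/null; do
    [ "$waited" -ge "$timeout" ] && return 1
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}

# Usage inside CT 201, after `docker compose up -d`:
#   wait_for_http http://localhost:8000/v1/models 600 && echo "vLLM is ready"
```

Model loading can take a while, so a generous timeout is a safer default than a short one.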
+
+---
+
+## **7. Test vLLM API**
+
+### **7.1 From Proxmox host**
+
+```bash
+curl -X POST http://10.0.0.43:8000/v1/completions \
+  -H "Content-Type: application/json" \
+  -d '{"model":"/model","prompt":"ping","max_tokens":5}'
+```
+
+The response should contain a `choices` array (abbreviated here):
+
+```json
+{"choices":[{"text":"-pong"}]}
+```
+
+### **7.2 From Cortex machine**
+
+```bash
+curl -X POST http://10.0.0.43:8000/v1/completions \
+  -H "Content-Type: application/json" \
+  -d '{"model":"/model","prompt":"ping from cortex","max_tokens":5}'
+```
+
+---
+
+## **8. Wiring into Lyra Cortex**
+
+In the `cortex` container’s `docker-compose.yml`:
+
+```yaml
+environment:
+  LLM_PRIMARY_URL: http://10.0.0.43:8000
+```
+
+Do not include `/v1/completions` in the URL; the router appends it automatically.
+
+In `cortex/.env`:
+
+```env
+LLM_FORCE_BACKEND=primary
+LLM_MODEL=/model
+```
+
+Test:
+
+```bash
+curl -X POST http://10.0.0.41:7081/reason \
+  -H "Content-Type: application/json" \
+  -d '{"prompt":"test vllm","session_id":"dev"}'
+```
+
+If you get a meaningful response: **Cortex → vLLM is online**.
+
+---
+
+## **9. Common Failure Modes (And Fixes)**
+
+### **9.1 “Failed to infer device type”**
+
+vLLM cannot see any ROCm devices.
+
+Fix:
+
+```bash
+# On host
+modprobe amdgpu
+pct stop 201
+pct start 201
+# In container
+/opt/rocm/bin/rocminfo | grep -i gfx
+docker compose up -d
+```
+
+### **9.2 GPU disappears after reboot**
+
+Same fix:
+
+```bash
+modprobe amdgpu
+pct stop 201
+pct start 201
+```
+
+### **9.3 Invalid image name**
+
+If you see pull errors like the following, check the image name for a typo:
+
+```
+pull access denied for nalanzeuy...
+```
+
+Use:
+
+```
+image: nalanzeyu/vllm-gfx906
+```
+
+### **9.4 Double `/v1` in URL**
+
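+Make sure the base URL has no path suffix:
+
+```
+LLM_PRIMARY_URL=http://10.0.0.43:8000
+```
+
+The router appends `/v1/completions` itself, so a URL that already ends in `/v1` yields a doubled path.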
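
A quick way to guard against this mistake is to normalize the value before writing it into the environment. The following is an illustrative sketch only; `normalize_base_url` is a hypothetical helper and not part of Cortex. It strips a trailing slash and any `/v1` or `/v1/completions` suffix.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reduce an endpoint URL to its bare base so that a
# router appending /v1/completions does not produce a doubled path.
normalize_base_url() {
  local url="${1%/}"            # drop one trailing slash, if present
  url="${url%/v1/completions}"  # drop a full completions suffix
  url="${url%/v1}"              # drop a bare /v1 suffix
  printf '%s\n' "$url"
}

# Usage:
#   LLM_PRIMARY_URL=$(normalize_base_url "http://10.0.0.43:8000/v1/")
```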
+
+---
+
+## **10. Daily / Reboot Ritual**
+
+### **On Proxmox host**
+
+```bash
+modprobe amdgpu
+pct stop 201
+pct start 201
+```
+
+### **Inside CT 201**
+
+```bash
+/opt/rocm/bin/rocminfo | grep -i gfx
+cd /root/vllm
+docker compose up -d
+docker compose logs -f
+```
+
+### **Test API**
+
+```bash
+curl -X POST http://10.0.0.43:8000/v1/completions \
+  -H "Content-Type: application/json" \
+  -d '{"model":"/model","prompt":"ping","max_tokens":5}'
+```
+
+---
+
+## **11. Summary**
+
+You now have:
+
+* **MI50 (gfx906)** correctly passed into LXC
+* **ROCm** inside the container via bind mounts
+* **vLLM** running inside Docker in the LXC
+* **OpenAI-compatible API** on port 8000
+* **Lyra Cortex** using it automatically as primary backend
+
+This is a complete, reproducible setup that survives reboots (with the modprobe ritual) and allows you to upgrade/replace models anytime.
+