
MI50 + vLLM + Proxmox LXC Setup Guide

End-to-End Field Manual for gfx906 LLM Serving

Version: 1.0 Last updated: 2025-11-17


📌 Overview

This guide documents how to run a vLLM OpenAI-compatible server on an AMD Instinct MI50 (gfx906) inside a Proxmox LXC container, expose it over LAN, and wire it into Project Lyra's Cortex reasoning layer.

This file is long, specific, and intentionally leaves nothing out so you never have to rediscover ROCm pain rituals again.


1. What This Stack Looks Like

Proxmox Host
 ├─ AMD Instinct MI50 (gfx906)
 ├─ AMDGPU + ROCm stack
 └─ LXC Container (CT 201: cortex-gpu)
      ├─ Ubuntu 24.04
      ├─ Docker + docker compose
      ├─ vLLM inside Docker (nalanzeyu/vllm-gfx906)
      ├─ GPU passthrough via /dev/kfd + /dev/dri + PCI bind
      └─ vLLM API exposed on :8000
Lyra Cortex (VM/Server)
 └─ LLM_PRIMARY_URL=http://10.0.0.43:8000

2. Proxmox Host — GPU Setup

2.1 Confirm MI50 exists

lspci -nn | grep -i 'vega\|instinct\|radeon'

You should see something like:

0a:00.0 Display controller: AMD Instinct MI50 (gfx906)

2.2 Load AMDGPU driver

This is the main pitfall after any host reboot.

modprobe amdgpu

If you skip this, the LXC container won't see the GPU.
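
If you'd rather not rely on remembering this after every reboot, systemd's standard modules-load.d mechanism can do it for you. A minimal sketch (assumes a systemd-based Proxmox host, which is the default):

# On the Proxmox host: load amdgpu automatically at every boot
echo amdgpu > /etc/modules-load.d/amdgpu.conf

The manual modprobe in the reboot ritual (section 10) then becomes a no-op safety check rather than a requirement.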


3. LXC Container Configuration (CT 201)

The container ID is 201. Config file is at:

/etc/pve/lxc/201.conf

3.1 Working 201.conf

Paste this exact version:

arch: amd64
cores: 4
hostname: cortex-gpu
memory: 16384
swap: 512
ostype: ubuntu
onboot: 1
startup: order=2,up=10,down=10
net0: name=eth0,bridge=vmbr0,hwaddr=BC:24:11:C6:3E:88,ip=dhcp,type=veth
rootfs: local-lvm:vm-201-disk-0,size=200G
unprivileged: 0

# Docker in LXC requires this
features: keyctl=1,nesting=1
lxc.apparmor.profile: unconfined
# Empty on purpose: clears the default capability drop list
lxc.cap.drop:

# --- GPU passthrough for ROCm (MI50) ---
lxc.mount.entry: /dev/kfd dev/kfd none bind,optional,create=file,mode=0666
lxc.mount.entry: /dev/dri dev/dri none bind,optional,create=dir
lxc.mount.entry: /sys/class/drm sys/class/drm none bind,ro,optional,create=dir
lxc.mount.entry: /opt/rocm /opt/rocm none bind,ro,optional,create=dir

# Bind the MI50 PCI device
lxc.mount.entry: /dev/bus/pci/0000:0a:00.0 dev/bus/pci/0000:0a:00.0 none bind,optional,create=file

# Allow GPU-related character devices
lxc.cgroup2.devices.allow: c 226:* rwm
lxc.cgroup2.devices.allow: c 29:* rwm
lxc.cgroup2.devices.allow: c 189:* rwm
lxc.cgroup2.devices.allow: c 238:* rwm
lxc.cgroup2.devices.allow: c 241:* rwm
lxc.cgroup2.devices.allow: c 242:* rwm
lxc.cgroup2.devices.allow: c 243:* rwm
lxc.cgroup2.devices.allow: c 244:* rwm
lxc.cgroup2.devices.allow: c 245:* rwm
lxc.cgroup2.devices.allow: c 246:* rwm
lxc.cgroup2.devices.allow: c 247:* rwm
lxc.cgroup2.devices.allow: c 248:* rwm
lxc.cgroup2.devices.allow: c 249:* rwm
lxc.cgroup2.devices.allow: c 250:* rwm
lxc.cgroup2.devices.allow: c 510:0 rwm
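
The DRM nodes under /dev/dri are character major 226; /dev/kfd gets a dynamically assigned major, which is why the allow list sweeps a range. To confirm what your host actually assigned (the 238 below is an example, not a guarantee):

# On the Proxmox host: the number before the comma is the major
ls -l /dev/kfd /dev/dri/*
# e.g. crw-rw-rw- 1 root render 238, 0 Nov 17 03:00 /dev/kfd  -> major 238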

3.2 Restart sequence

pct stop 201
modprobe amdgpu
pct start 201
pct enter 201

4. Inside CT 201 — Verifying ROCm + GPU Visibility

4.1 Check device nodes

ls -l /dev/kfd
ls -l /dev/dri
ls -l /opt/rocm

All must exist.

4.2 Validate GPU via rocminfo

/opt/rocm/bin/rocminfo | grep -i gfx

You need to see:

gfx906

If you see nothing, the GPU isn't passed through — restart the container and re-check the host steps in section 2.
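
If rocminfo does report gfx906, rocm-smi (shipped in the same /opt/rocm bind mount) is a quick way to watch the card while serving:

/opt/rocm/bin/rocm-smi

Expect a single row for the MI50 showing temperature, power draw, and VRAM usage.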


5. Install Docker in the LXC (Ubuntu 24.04)

This container runs Docker inside LXC (nesting enabled).

apt update
apt install -y ca-certificates curl gnupg

install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg \
  | gpg --dearmor -o /etc/apt/keyrings/docker.gpg
chmod a+r /etc/apt/keyrings/docker.gpg

echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
  https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo $VERSION_CODENAME) stable" \
  > /etc/apt/sources.list.d/docker.list

apt update
apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

Check:

docker --version
docker compose version
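
Before moving on, it's worth proving that nested Docker actually runs containers here; this is where a missing keyctl=1 or nesting=1 in 201.conf would first bite:

docker run --rm hello-world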

6. Running vLLM Inside CT 201 via Docker

6.1 Create directory

mkdir -p /root/vllm
cd /root/vllm

6.2 docker-compose.yml

Save this exact file as /root/vllm/docker-compose.yml:

version: "3.9"

services:
  vllm-mi50:
    image: nalanzeyu/vllm-gfx906:latest
    container_name: vllm-mi50
    restart: unless-stopped
    ports:
      - "8000:8000"
    environment:
      VLLM_LOGGING_LEVEL: "INFO"
    # vllm serve speaks the OpenAI-compatible API by default; no extra flag needed
    command: >
      vllm serve /model
      --host 0.0.0.0
      --port 8000
      --dtype float16
      --max-model-len 4096
    devices:
      - "/dev/kfd:/dev/kfd"
      - "/dev/dri:/dev/dri"
    volumes:
      - /opt/rocm:/opt/rocm:ro
      # The model weights must be visible at /model inside the container.
      # The host path below is an example; point it at your actual download.
      - /root/vllm/model:/model:ro

6.3 Start vLLM

docker compose up -d
docker compose logs -f

When healthy, you'll see:

(APIServer) Application startup complete.

and periodic throughput logs.
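
For a quick liveness probe from inside CT 201 before testing over the LAN, the OpenAI-compatible server lists its loaded model at /v1/models:

curl -s http://localhost:8000/v1/models

The response should include "/model" in its data list.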


7. Test vLLM API

7.1 From Proxmox host

curl -X POST http://10.0.0.43:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"/model","prompt":"ping","max_tokens":5}'

It should respond with something like:

{"choices":[{"text":"-pong"}]}
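
To extract just the generated text from that JSON (assumes jq is installed on the host):

curl -s -X POST http://10.0.0.43:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"/model","prompt":"ping","max_tokens":5}' \
  | jq -r '.choices[0].text'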

7.2 From Cortex machine

curl -X POST http://10.0.0.43:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"/model","prompt":"ping from cortex","max_tokens":5}'
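
If the model you mounted ships a chat template, the chat endpoint works as well; base models without one will reject this, so treat it as optional:

curl -X POST http://10.0.0.43:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"/model","messages":[{"role":"user","content":"ping from cortex"}],"max_tokens":5}'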

8. Wiring into Lyra Cortex

In the Cortex container's docker-compose.yml:

environment:
  LLM_PRIMARY_URL: http://10.0.0.43:8000

Do not include /v1/completions here; the router appends that path automatically.

In cortex/.env:

LLM_FORCE_BACKEND=primary
LLM_MODEL=/model

Test:

curl -X POST http://10.0.0.41:7081/reason \
  -H "Content-Type: application/json" \
  -d '{"prompt":"test vllm","session_id":"dev"}'

If you get a meaningful response, the Cortex → vLLM path is online.


9. Common Failure Modes (And Fixes)

9.1 “Failed to infer device type”

vLLM cannot see any ROCm devices.

Fix:

# On host
modprobe amdgpu
pct stop 201
pct start 201
# In container
/opt/rocm/bin/rocminfo | grep -i gfx
docker compose up -d
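
To tell up front which side failed, check whether the driver is even loaded on the host before bouncing the container:

# On the Proxmox host
lsmod | grep '^amdgpu' || echo "amdgpu not loaded; run: modprobe amdgpu"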

9.2 GPU disappears after reboot

Same fix:

modprobe amdgpu
pct stop 201
pct start 201

9.3 Invalid image name

If you see a pull error like:

pull access denied for nalanzeuy...

the image name is misspelled (nalanzeuy vs. nalanzeyu). Use:

image: nalanzeyu/vllm-gfx906

9.4 Double /v1 in URL

Ensure:

LLM_PRIMARY_URL=http://10.0.0.43:8000

Router appends /v1/completions.
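
You can verify the failure mode by hand; a doubled prefix 404s where the correct path succeeds:

# Wrong: base URL already contained /v1, so the request hits /v1/v1/... -> 404
curl -s -o /dev/null -w "%{http_code}\n" \
  -X POST http://10.0.0.43:8000/v1/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"/model","prompt":"x","max_tokens":1}'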


10. Daily / Reboot Ritual

On Proxmox host

modprobe amdgpu
pct stop 201
pct start 201

Inside CT 201

/opt/rocm/bin/rocminfo | grep -i gfx
cd /root/vllm
docker compose up -d
docker compose logs -f

Test API

curl -X POST http://10.0.0.43:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"/model","prompt":"ping","max_tokens":5}'
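
If the ritual gets old, the host-side half bundles into a small script. A sketch (path and filename are illustrative, not part of the existing setup):

#!/usr/bin/env bash
# Save as /root/bin/mi50-up.sh on the Proxmox host, then chmod +x it.
set -euo pipefail
modprobe amdgpu              # no-op if the driver is already loaded
pct stop 201 || true         # tolerate an already-stopped container
pct start 201
# Confirm the container can see gfx906 before starting vLLM
pct exec 201 -- /opt/rocm/bin/rocminfo | grep -i gfx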

11. Summary

You now have:

  • MI50 (gfx906) correctly passed into LXC
  • ROCm inside the container via bind mounts
  • vLLM running inside Docker in the LXC
  • OpenAI-compatible API on port 8000
  • Lyra Cortex using it automatically as primary backend

This is a complete, reproducible setup that survives reboots (with the modprobe ritual) and allows you to upgrade/replace models anytime.

