Update to v0.9.1 #1

Merged
serversdown merged 44 commits from dev into main 2026-01-18 02:46:25 -05:00
4 changed files with 123 additions and 185 deletions
Showing only changes of commit 4acaddfd12

README.md

@@ -2,7 +2,7 @@
Lyra is a modular persistent AI companion system with advanced reasoning capabilities.
It provides memory-backed chat using **NeoMem** + **Relay** + **Cortex**,
-with multi-stage reasoning pipeline powered by distributed LLM backends.
+with multi-stage reasoning pipeline powered by HTTP-based LLM backends.
## Mission Statement
@@ -12,9 +12,9 @@ The point of Project Lyra is to give an AI chatbot more abilities than a typical
## Architecture Overview
-Project Lyra operates as a series of Docker containers networked together in a microservices architecture. Like how the brain has regions, Lyra has modules:
+Project Lyra operates as a **single docker-compose deployment** with multiple Docker containers networked together in a microservices architecture. Like how the brain has regions, Lyra has modules:
-### A. VM 100 - lyra-core (Core Services)
+### Core Services
**1. Relay** (Node.js/Express) - Port 7078
- Main orchestrator and message router
@@ -26,7 +26,7 @@ Project Lyra operates as a series of Docker containers networked together in a m
**2. UI** (Static HTML)
- Browser-based chat interface with cyberpunk theme
-- Connects to Relay at `http://10.0.0.40:7078`
+- Connects to Relay
- Saves and loads sessions
- OpenAI-compatible message format
@@ -37,7 +37,7 @@ Project Lyra operates as a series of Docker containers networked together in a m
- Semantic memory updates and retrieval
- No external SDK dependencies - fully local
-### B. VM 101 - lyra-cortex (Reasoning Layer)
+### Reasoning Layer
**4. Cortex** (Python/FastAPI) - Port 7081
- Primary reasoning engine with multi-stage pipeline
@@ -47,7 +47,7 @@ Project Lyra operates as a series of Docker containers networked together in a m
3. **Refinement** - Polishes and improves the draft
4. **Persona** - Applies Lyra's personality and speaking style
- Integrates with Intake for short-term context
-- Flexible LLM router supporting multiple backends
+- Flexible LLM router supporting multiple backends via HTTP
**5. Intake v0.2** (Python/FastAPI) - Port 7080
- Simplified short-term memory summarization
@@ -60,14 +60,16 @@ Project Lyra operates as a series of Docker containers networked together in a m
- `GET /summaries?session_id={id}` - Retrieve session summary
- `POST /close_session/{id}` - Close and cleanup session
-### C. LLM Backends (Remote/Local APIs)
+### LLM Backends (HTTP-based)
-**Multi-Backend Strategy:**
+**All LLM communication is done via HTTP APIs:**
-- **PRIMARY**: vLLM on AMD MI50 GPU (`http://10.0.0.43:8000`) - Cortex reasoning, Intake
-- **SECONDARY**: Ollama on RTX 3090 (`http://10.0.0.3:11434`) - Configurable per-module
-- **CLOUD**: OpenAI API (`https://api.openai.com/v1`) - Cortex persona layer
+- **PRIMARY**: vLLM server (`http://10.0.0.43:8000`) - AMD MI50 GPU backend
+- **SECONDARY**: Ollama server (`http://10.0.0.3:11434`) - RTX 3090 backend
+- **CLOUD**: OpenAI API (`https://api.openai.com/v1`) - Cloud-based models
- **FALLBACK**: Local backup (`http://10.0.0.41:11435`) - Emergency fallback
+Each module can be configured to use a different backend via environment variables.
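The per-module backend selection added here is driven entirely by environment variables. A minimal sketch of how a module might resolve its backend at startup; the variable names (`CORTEX_LLM_URL`, and so on) and the defaults are illustrative assumptions, not taken from the repo:

```python
# Sketch only: per-module backend resolution from environment variables.
# Variable names and defaults below are assumptions for illustration.
import os

ASSUMED_DEFAULTS = {
    "cortex": "http://10.0.0.43:8000",       # PRIMARY: vLLM on the MI50
    "intake": "http://10.0.0.3:11434",       # SECONDARY: Ollama on the 3090
    "persona": "https://api.openai.com/v1",  # CLOUD: OpenAI
}

def resolve_backend(module: str) -> dict:
    """Return the base URL, model, and API key a given module should use."""
    prefix = module.upper()
    return {
        "base_url": os.getenv(f"{prefix}_LLM_URL",
                              ASSUMED_DEFAULTS.get(module, "http://10.0.0.41:11435")),  # FALLBACK
        "model": os.getenv(f"{prefix}_LLM_MODEL", "qwen2.5:7b-instruct-q4_K_M"),
        "api_key": os.getenv(f"{prefix}_LLM_API_KEY", ""),
    }

print(resolve_backend("cortex"))
```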
---
## Data Flow Architecture (v0.5.0)
@@ -101,22 +103,22 @@ Relay → UI (returns final response)
### Cortex 4-Stage Reasoning Pipeline:
-1. **Reflection** (`reflection.py`) - Cloud backend (OpenAI)
+1. **Reflection** (`reflection.py`) - Configurable LLM via HTTP
- Analyzes user intent and conversation context
- Generates meta-awareness notes
- "What is the user really asking?"
-2. **Reasoning** (`reasoning.py`) - Primary backend (vLLM)
+2. **Reasoning** (`reasoning.py`) - Configurable LLM via HTTP
- Retrieves short-term context from Intake
- Creates initial draft answer
- Integrates context, reflection notes, and user prompt
-3. **Refinement** (`refine.py`) - Primary backend (vLLM)
+3. **Refinement** (`refine.py`) - Configurable LLM via HTTP
- Polishes the draft answer
- Improves clarity and coherence
- Ensures factual consistency
-4. **Persona** (`speak.py`) - Cloud backend (OpenAI)
+4. **Persona** (`speak.py`) - Configurable LLM via HTTP
- Applies Lyra's personality and speaking style
- Natural, conversational output
- Final answer returned to user
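Control flow through the four stages is a straight chain, each stage consuming the previous stage's output. A rough async sketch, assuming each stage module exposes a single coroutine; the real signatures in `reflection.py`, `reasoning.py`, `refine.py`, and `speak.py` may differ, and the stage bodies here are stand-in stubs:

```python
# Sketch of the reflection → reasoning → refine → persona chain.
# Stage bodies are stubs; only the chaining is meant to mirror the pipeline.
import asyncio

async def reflect_notes(user_msg: str) -> str:
    return f"(meta notes about: {user_msg})"         # 1. Reflection: intent analysis

async def draft_answer(user_msg: str, notes: str, ctx: str) -> str:
    return f"draft built from {notes} and {ctx}"     # 2. Reasoning: initial draft (name assumed)

async def refine_answer(draft: str) -> str:
    return draft + " [refined]"                      # 3. Refinement: polish the draft

async def speak(refined: str) -> str:
    return refined + " [in Lyra's voice]"            # 4. Persona: apply speaking style

async def reason(session_id: str, user_msg: str) -> str:
    ctx = "(short-term summary from Intake)"         # fetched from Intake in the real pipeline
    notes = await reflect_notes(user_msg)
    draft = await draft_answer(user_msg, notes, ctx)
    refined = await refine_answer(draft)
    return await speak(refined)

print(asyncio.run(reason("demo", "How does Lyra remember things?")))
```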
@@ -125,7 +127,7 @@ Relay → UI (returns final response)
## Features
-### Lyra-Core (VM 100)
+### Core Services
**Relay**:
- Main orchestrator and message router
@@ -150,11 +152,11 @@ Relay → UI (returns final response)
- Session save/load functionality
- OpenAI message format support
-### Cortex (VM 101)
+### Reasoning Layer
**Cortex** (v0.5):
- Multi-stage reasoning pipeline (reflection → reasoning → refine → persona)
-- Flexible LLM backend routing
+- Flexible LLM backend routing via HTTP
- Per-stage backend selection
- Async processing throughout
- IntakeClient integration for short-term context
@@ -169,7 +171,7 @@ Relay → UI (returns final response)
- **Breaking change from v0.1**: Removed cascading summaries (L1, L2, L5, L10, L20, L30)
**LLM Router**:
-- Dynamic backend selection
+- Dynamic backend selection via HTTP
- Environment-driven configuration
- Support for vLLM, Ollama, OpenAI, custom endpoints
- Per-module backend preferences
@@ -220,49 +222,44 @@ Relay → UI (returns final response)
"imported_at": "2025-11-07T03:55:00Z"
}```
-# Cortex VM (VM101, CT201)
-- **CT201 main reasoning orchestrator.**
-- This is the internal brain of Lyra.
-- Running in a privellaged LXC.
-- Currently a locally served LLM running on a Radeon Instinct HI50, using a customized version of vLLM that lets it use ROCm.
-- Accessible via 10.0.0.43:8000/v1/completions.
-- **Intake v0.1.1 **
-- Recieves messages from relay and summarizes them in a cascading format.
-- Continues to summarize smaller amounts of exhanges while also generating large scale conversational summaries. (L20)
-- Intake then sends to cortex for self reflection, neomem for memory consolidation.
-- **Reflect **
--TBD
-# Self hosted vLLM server #
-- **CT201 main reasoning orchestrator.**
-- This is the internal brain of Lyra.
-- Running in a privellaged LXC.
-- Currently a locally served LLM running on a Radeon Instinct HI50, using a customized version of vLLM that lets it use ROCm.
-- Accessible via 10.0.0.43:8000/v1/completions.
-- **Stack Flow**
-- [Proxmox Host]
-└── loads AMDGPU driver
-└── boots CT201 (order=2)
-[CT201 GPU Container]
-├── lyra-start-vllm.sh → starts vLLM ROCm model server
-├── lyra-vllm.service → runs the above automatically
-├── lyra-core.service → launches Cortex + Intake Docker stack
-└── Docker Compose → runs Cortex + Intake containers
-[Cortex Container]
-├── Listens on port 7081
-├── Talks to NVGRAM (mem API) + Intake
-└── Main relay between Lyra UI ↔ memory ↔ model
-[Intake Container]
-├── Listens on port 7080
-├── Summarizes every few exchanges
-├── Writes summaries to /app/logs/summaries.log
-└── Future: sends summaries → Cortex for reflection
+---
+## Docker Deployment
+All services run in a single docker-compose stack with the following containers:
+- **neomem-postgres** - PostgreSQL with pgvector extension (port 5432)
+- **neomem-neo4j** - Neo4j graph database (ports 7474, 7687)
+- **neomem-api** - NeoMem memory service (port 7077)
+- **relay** - Main orchestrator (port 7078)
+- **cortex** - Reasoning engine (port 7081)
+- **intake** - Short-term memory summarization (port 7080) - currently disabled
+- **rag** - RAG search service (port 7090) - currently disabled
+All containers communicate via the `lyra_net` Docker bridge network.
+## External LLM Services
+The following LLM backends are accessed via HTTP (not part of docker-compose):
+- **vLLM Server** (`http://10.0.0.43:8000`)
+- AMD MI50 GPU-accelerated inference
+- Custom ROCm-enabled vLLM build
+- Primary backend for reasoning and refinement stages
+- **Ollama Server** (`http://10.0.0.3:11434`)
+- RTX 3090 GPU-accelerated inference
+- Secondary/configurable backend
+- Model: qwen2.5:7b-instruct-q4_K_M
+- **OpenAI API** (`https://api.openai.com/v1`)
+- Cloud-based inference
+- Used for reflection and persona stages
+- Model: gpt-4o-mini
+- **Fallback Server** (`http://10.0.0.41:11435`)
+- Emergency backup endpoint
+- Local llama-3.2-8b-instruct model
---
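Since every external model in the list above is reached over plain HTTP with an OpenAI-compatible API, a quick connectivity check needs nothing beyond the standard library. A sketch against the Ollama backend listed above; the endpoint and model name come from that list, and any of the other servers should accept the same request shape:

```python
# Sketch: send one OpenAI-compatible chat request to an external backend over HTTP.
# Endpoint and model come from the backend list above; standard library only.
import json
import urllib.request

def chat_once(base_url: str, model: str, prompt: str) -> str:
    payload = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

print(chat_once("http://10.0.0.3:11434", "qwen2.5:7b-instruct-q4_K_M", "Reply with one word: ready?"))
```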
@@ -292,6 +289,7 @@ Relay → UI (returns final response)
### Non-Critical
- Session management endpoints not fully implemented in Relay
+- Intake service currently disabled in docker-compose.yml
- RAG service currently disabled in docker-compose.yml
- Cortex `/ingest` endpoint is a stub
@@ -307,14 +305,19 @@ Relay → UI (returns final response)
### Prerequisites
- Docker + Docker Compose
-- PostgreSQL 13+, Neo4j 4.4+ (for NeoMem)
-- At least one LLM API endpoint (vLLM, Ollama, or OpenAI)
+- At least one HTTP-accessible LLM endpoint (vLLM, Ollama, or OpenAI API key)
### Setup
-1. Configure environment variables in `.env` files
-2. Start services: `docker-compose up -d`
-3. Check health: `curl http://localhost:7078/_health`
-4. Access UI: `http://localhost:7078`
+1. Copy `.env.example` to `.env` and configure your LLM backend URLs and API keys
+2. Start all services with docker-compose:
+```bash
+docker-compose up -d
+```
+3. Check service health:
+```bash
+curl http://localhost:7078/_health
+```
+4. Access the UI at `http://localhost:7078`
### Test
```bash
@@ -326,6 +329,8 @@ curl -X POST http://localhost:7078/v1/chat/completions \
}'
```
+All backend databases (PostgreSQL and Neo4j) are automatically started as part of the docker-compose stack.
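The Test request above can also be issued from Python. A minimal equivalent of the curl call plus the health check, assuming the Relay is exposed on `localhost:7078` as in the Setup steps; the `model` field and the health-check response format are assumptions, and the reply is read using the standard OpenAI chat-completion shape:

```python
# Sketch: hit the Relay's health check and OpenAI-compatible chat endpoint,
# mirroring the curl-based Test section above.
import json
import urllib.request

BASE = "http://localhost:7078"

def health() -> str:
    with urllib.request.urlopen(f"{BASE}/_health", timeout=10) as resp:
        return resp.read().decode()

def chat(prompt: str) -> str:
    payload = json.dumps({
        "model": "lyra",  # placeholder; assumed to be ignored since Relay routes to Cortex
        "messages": [{"role": "user", "content": prompt}],  # OpenAI message format
    }).encode("utf-8")
    req = urllib.request.Request(
        f"{BASE}/v1/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(health())
    print(chat("Hello Lyra, what can you do?"))
```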
---
## Documentation
@@ -345,104 +350,44 @@ NeoMem is a derivative work based on Mem0 OSS (Apache 2.0).
---
-## 📦 Requirements
-- Docker + Docker Compose
-- Postgres + Neo4j (for NeoMem)
-- Access to an open AI or ollama style API.
-- OpenAI API key (for Relay fallback LLMs)
-**Dependencies:**
-- fastapi==0.115.8
-- uvicorn==0.34.0
-- pydantic==2.10.4
-- python-dotenv==1.0.1
-- psycopg>=3.2.8
-- ollama
+## Integration Notes
+- NeoMem API is compatible with Mem0 OSS endpoints (`/memories`, `/search`)
+- All services communicate via Docker internal networking on the `lyra_net` bridge
+- History and entity graphs are managed via PostgreSQL + Neo4j
+- LLM backends are accessed via HTTP and configured in `.env`
---
-🔌 Integration Notes
-Lyra-Core connects to neomem-api:8000 inside Docker or localhost:7077 locally.
-API endpoints remain identical to Mem0 (/memories, /search).
-History and entity graphs managed internally via Postgres + Neo4j.
----
-🧱 Architecture Snapshot
-User → Relay → Cortex
-[RAG Search]
-[Reflection Loop]
-Intake (async summaries)
-NeoMem (persistent memory)
-**Cortex v0.4.1 introduces the first fully integrated reasoning loop.**
-- Data Flow:
-- User message enters Cortex via /reason.
-- Cortex assembles context:
-- Intake summaries (short-term memory)
-- RAG contextual data (knowledge base)
-- LLM generates initial draft (call_llm).
-- Reflection loop critiques and refines the answer.
-- Intake asynchronously summarizes and sends snapshots to NeoMem.
-RAG API Configuration:
-Set RAG_API_URL in .env (default: http://localhost:7090).
----
-## Setup and Operation ##
-## Beta Lyrae - RAG memory system ##
-**Requirements**
--Env= python 3.10+
--Dependences: pip install chromadb openai tqdm python-dotenv fastapi uvicorn jq
--Persistent storage path: ./chromadb (can be moved to /mnt/data/lyra_rag_db)
-**Import Chats**
-- Chats need to be formatted into the correct format of
-```
-"messages": [
-{
-"role:" "user",
-"content": "Message here"
-},
-"messages": [
-{
-"role:" "assistant",
-"content": "Message here"
-},```
-- Organize the chats into categorical folders. This step is optional, but it helped me keep it straight.
-- run "python3 rag_chat_import.py", chats will then be imported automatically. For reference, it took 32 Minutes to import 68 Chat logs (aprox 10.3MB).
-**Build API Server**
-- Run: rag_build.py, this automatically builds the chromaDB using data saved in the /chatlogs/ folder. (docs folder to be added in future.)
-- Run: rag_api.py or ```uvicorn rag_api:app --host 0.0.0.0 --port 7090```
-**Query**
-- Run: python3 rag_query.py "Question here?"
-- For testing a curl command can reach it too
-```
-curl -X POST http://127.0.0.1:7090/rag/search \
--H "Content-Type: application/json" \
--d '{
-"query": "What is the current state of Cortex and Project Lyra?",
-"where": {"category": "lyra"}
-}'
-```
-# Beta Lyrae - RAG System
-## 📖 License
-NeoMem is a derivative work based on the Mem0 OSS project (Apache 2.0).
-This fork retains the original Apache 2.0 license and adds local modifications.
-© 2025 Terra-Mechanics / ServersDown Labs. All modifications released under Apache 2.0.
+## Beta Lyrae - RAG Memory System (Currently Disabled)
+**Note:** The RAG service is currently disabled in docker-compose.yml
+### Requirements
+- Python 3.10+
+- Dependencies: `chromadb openai tqdm python-dotenv fastapi uvicorn`
+- Persistent storage: `./chromadb` or `/mnt/data/lyra_rag_db`
+### Setup
+1. Import chat logs (must be in OpenAI message format):
+```bash
+python3 rag/rag_chat_import.py
+```
+2. Build and start the RAG API server:
+```bash
+cd rag
+python3 rag_build.py
+uvicorn rag_api:app --host 0.0.0.0 --port 7090
+```
+3. Query the RAG system:
+```bash
+curl -X POST http://127.0.0.1:7090/rag/search \
+-H "Content-Type: application/json" \
+-d '{
+"query": "What is the current state of Cortex?",
+"where": {"category": "lyra"}
+}'
+```

View File

@@ -1,3 +1,6 @@
+// relay v0.3.0
+// Core relay server for Lyra project
+// Handles incoming chat requests and forwards them to Cortex services
import express from "express";
import dotenv from "dotenv";
import cors from "cors";
@@ -10,9 +13,8 @@ app.use(express.json());
const PORT = Number(process.env.PORT || 7078);
-// core endpoints
+// Cortex endpoints (only these are used now)
const CORTEX_REASON = process.env.CORTEX_REASON_URL || "http://cortex:7081/reason";
-const CORTEX_INGEST = process.env.CORTEX_INGEST_URL || "http://cortex:7081/ingest";
// -----------------------------------------------------
// Helper request wrapper
@@ -27,7 +29,6 @@ async function postJSON(url, data) {
const raw = await resp.text();
let json;
-// Try to parse JSON safely
try {
json = raw ? JSON.parse(raw) : null;
} catch (e) {
@@ -42,11 +43,12 @@
}
// -----------------------------------------------------
-// Shared chat handler logic
+// The unified chat handler
// -----------------------------------------------------
async function handleChatRequest(session_id, user_msg) {
-// 1. → Cortex.reason: the main pipeline
let reason;
+// 1. → Cortex.reason (main pipeline)
try {
reason = await postJSON(CORTEX_REASON, {
session_id,
@@ -57,19 +59,13 @@ async function handleChatRequest(session_id, user_msg) {
throw new Error(`cortex_reason_failed: ${e.message}`);
}
-const persona = reason.final_output || reason.persona || "(no persona text)";
-// 2. → Cortex.ingest (async, non-blocking)
-// Cortex might still want this for separate ingestion pipeline.
-postJSON(CORTEX_INGEST, {
-session_id,
-user_msg,
-assistant_msg: persona
-}).catch(e =>
-console.warn("Relay → Cortex.ingest failed:", e.message)
-);
-// 3. Return corrected result
+// Correct persona field
+const persona =
+reason.persona ||
+reason.final_output ||
+"(no persona text)";
+// Return final answer
return {
session_id,
reply: persona
@@ -84,7 +80,7 @@ app.get("/_health", (_, res) => {
});
// -----------------------------------------------------
-// OPENAI-COMPATIBLE ENDPOINT (for UI & clients)
+// OPENAI-COMPATIBLE ENDPOINT
// -----------------------------------------------------
app.post("/v1/chat/completions", async (req, res) => {
try {
@@ -101,7 +97,7 @@ app.post("/v1/chat/completions", async (req, res) => {
const result = await handleChatRequest(session_id, user_msg);
-return res.json({
+res.json({
id: `chatcmpl-${Date.now()}`,
object: "chat.completion",
created: Math.floor(Date.now() / 1000),
@@ -134,7 +130,7 @@
});
// -----------------------------------------------------
-// MAIN ENDPOINT (canonical Lyra UI entrance)
+// MAIN ENDPOINT (Lyra-native UI)
// -----------------------------------------------------
app.post("/chat", async (req, res) => {
try {
@@ -144,7 +140,7 @@ app.post("/chat", async (req, res) => {
console.log(`Relay → received: "${user_msg}"`);
const result = await handleChatRequest(session_id, user_msg);
-return res.json(result);
+res.json(result);
} catch (err) {
console.error("Relay fatal:", err);
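Since `handleChatRequest` now resolves the persona text itself and returns a plain `{ session_id, reply }` object, a client of the native `/chat` route stays very small. A sketch in Python; the request field names (`session_id`, `user_msg`) follow the handler above and should be treated as assumptions if the UI contract changes:

```python
# Sketch: call the Relay's native /chat endpoint and read the { session_id, reply } result.
# Field names follow the handler code above; adjust if the actual contract differs.
import json
import urllib.request

def lyra_chat(session_id: str, user_msg: str, base: str = "http://localhost:7078") -> str:
    payload = json.dumps({"session_id": session_id, "user_msg": user_msg}).encode("utf-8")
    req = urllib.request.Request(
        f"{base}/chat",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.load(resp)["reply"]

print(lyra_chat("demo-session", "What changed in v0.9.1?"))
```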

View File

@@ -1,6 +1,8 @@
import os
from datetime import datetime
from typing import List, Dict, Any, TYPE_CHECKING
+from collections import deque
if TYPE_CHECKING:
from collections import deque as _deque

View File

@@ -10,7 +10,6 @@ from reasoning.reflection import reflect_notes
from reasoning.refine import refine_answer
from persona.speak import speak
from persona.identity import load_identity
-from ingest.intake_client import IntakeClient
from context import collect_context, update_last_assistant_message
from intake.intake import add_exchange_internal
@@ -50,9 +49,6 @@ if VERBOSE_DEBUG:
# -----------------------------
cortex_router = APIRouter()
-# Initialize Intake client once
-intake_client = IntakeClient()
# -----------------------------
# Pydantic models
@@ -202,11 +198,10 @@ class IngestPayload(BaseModel):
assistant_msg: str
@cortex_router.post("/ingest")
-async def ingest(payload: IngestPayload):
-"""
-Relay calls this after /reason.
-We update Cortex state AND feed Intake's internal buffer.
-"""
+async def ingest_stub():
+# Intake is internal now — this endpoint is only for compatibility.
+return {"status": "ok", "note": "intake is internal now"}
# 1. Update Cortex session state
update_last_assistant_message(payload.session_id, payload.assistant_msg)
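With `IntakeClient` removed and `add_exchange_internal` imported directly, the bookkeeping that `/ingest` used to do can happen in-process after the persona stage. A rough sketch of that pattern; the real signature of `add_exchange_internal` is not shown in this diff, so the keyword arguments are assumptions:

```python
# Sketch: record an exchange in-process instead of via the external /ingest round-trip.
# The add_exchange_internal signature is assumed for illustration; check intake/intake.py.
from intake.intake import add_exchange_internal
from context import update_last_assistant_message

def record_exchange(session_id: str, user_msg: str, assistant_msg: str) -> None:
    # Keep Cortex's per-session state current (same call the old /ingest body used).
    update_last_assistant_message(session_id, assistant_msg)
    # Feed Intake's internal buffer directly, with no HTTP hop.
    add_exchange_internal(session_id=session_id,
                          user_msg=user_msg,
                          assistant_msg=assistant_msg)
```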