Update to v0.9.1 #1

Merged
serversdown merged 44 commits from dev into main 2026-01-18 02:46:25 -05:00
4 changed files with 123 additions and 185 deletions
Showing only changes of commit 4acaddfd12

README.md
View File

@@ -2,19 +2,19 @@
Lyra is a modular persistent AI companion system with advanced reasoning capabilities.
It provides memory-backed chat using **NeoMem** + **Relay** + **Cortex**,
-with multi-stage reasoning pipeline powered by distributed LLM backends.
+with multi-stage reasoning pipeline powered by HTTP-based LLM backends.
## Mission Statement
The point of Project Lyra is to give an AI chatbot more abilities than a typical chatbot. Typical chatbots are essentially amnesic and forget everything about your project. Lyra helps keep projects organized and remembers everything you have done. Think of her abilities as a notepad/schedule/database/co-creator/collaborator all with its own executive function. Say something in passing, Lyra remembers it then reminds you of it later.
---
## Architecture Overview
-Project Lyra operates as a series of Docker containers networked together in a microservices architecture. Like how the brain has regions, Lyra has modules:
+Project Lyra operates as a **single docker-compose deployment** with multiple Docker containers networked together in a microservices architecture. Like how the brain has regions, Lyra has modules:
-### A. VM 100 - lyra-core (Core Services)
+### Core Services
**1. Relay** (Node.js/Express) - Port 7078
- Main orchestrator and message router
@@ -26,7 +26,7 @@ Project Lyra operates as a series of Docker containers networked together in a m
**2. UI** (Static HTML)
- Browser-based chat interface with cyberpunk theme
-- Connects to Relay at `http://10.0.0.40:7078`
+- Connects to Relay
- Saves and loads sessions
- OpenAI-compatible message format
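For readers unfamiliar with the format, a minimal illustration of the OpenAI-style message shape the UI sends to Relay; the model label and message contents here are illustrative, not taken from the UI code:

```python
# Illustrative only: an OpenAI-style chat payload as the UI would send it to Relay.
payload = {
    "model": "lyra",  # hypothetical label; actual routing is decided server-side
    "messages": [
        {"role": "user", "content": "What did we finish yesterday?"},
    ],
}
print(payload["messages"][-1]["content"])
```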
@@ -37,7 +37,7 @@ Project Lyra operates as a series of Docker containers networked together in a m
- Semantic memory updates and retrieval
- No external SDK dependencies - fully local
-### B. VM 101 - lyra-cortex (Reasoning Layer)
+### Reasoning Layer
**4. Cortex** (Python/FastAPI) - Port 7081
- Primary reasoning engine with multi-stage pipeline
@@ -47,7 +47,7 @@ Project Lyra operates as a series of Docker containers networked together in a m
3. **Refinement** - Polishes and improves the draft
4. **Persona** - Applies Lyra's personality and speaking style
- Integrates with Intake for short-term context
-- Flexible LLM router supporting multiple backends
+- Flexible LLM router supporting multiple backends via HTTP
**5. Intake v0.2** (Python/FastAPI) - Port 7080
- Simplified short-term memory summarization
@@ -60,13 +60,15 @@ Project Lyra operates as a series of Docker containers networked together in a m
- `GET /summaries?session_id={id}` - Retrieve session summary
- `POST /close_session/{id}` - Close and cleanup session
-### C. LLM Backends (Remote/Local APIs)
+### LLM Backends (HTTP-based)
-**Multi-Backend Strategy:**
+**All LLM communication is done via HTTP APIs:**
-- **PRIMARY**: vLLM on AMD MI50 GPU (`http://10.0.0.43:8000`) - Cortex reasoning, Intake
+- **PRIMARY**: vLLM server (`http://10.0.0.43:8000`) - AMD MI50 GPU backend
-- **SECONDARY**: Ollama on RTX 3090 (`http://10.0.0.3:11434`) - Configurable per-module
+- **SECONDARY**: Ollama server (`http://10.0.0.3:11434`) - RTX 3090 backend
-- **CLOUD**: OpenAI API (`https://api.openai.com/v1`) - Cortex persona layer
+- **CLOUD**: OpenAI API (`https://api.openai.com/v1`) - Cloud-based models
- **FALLBACK**: Local backup (`http://10.0.0.41:11435`) - Emergency fallback
+Each module can be configured to use a different backend via environment variables.
---
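The Intake routes listed above can be exercised directly. A minimal sketch using Python `requests`, assuming the default port from the service list and a hypothetical session id; the response bodies are not documented in this README:

```python
import requests

INTAKE = "http://localhost:7080"   # Intake port from the service list above
SESSION = "demo-session"           # hypothetical session id

# Routes come from the README; the response shapes are assumptions.
r = requests.get(f"{INTAKE}/summaries", params={"session_id": SESSION}, timeout=10)
print("summary:", r.status_code, r.text[:200])

r = requests.post(f"{INTAKE}/close_session/{SESSION}", timeout=10)
print("close:", r.status_code, r.text[:200])
```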
@@ -101,22 +103,22 @@ Relay → UI (returns final response)
### Cortex 4-Stage Reasoning Pipeline:
-1. **Reflection** (`reflection.py`) - Cloud backend (OpenAI)
+1. **Reflection** (`reflection.py`) - Configurable LLM via HTTP
- Analyzes user intent and conversation context
- Generates meta-awareness notes
- "What is the user really asking?"
-2. **Reasoning** (`reasoning.py`) - Primary backend (vLLM)
+2. **Reasoning** (`reasoning.py`) - Configurable LLM via HTTP
- Retrieves short-term context from Intake
- Creates initial draft answer
- Integrates context, reflection notes, and user prompt
-3. **Refinement** (`refine.py`) - Primary backend (vLLM)
+3. **Refinement** (`refine.py`) - Configurable LLM via HTTP
- Polishes the draft answer
- Improves clarity and coherence
- Ensures factual consistency
-4. **Persona** (`speak.py`) - Cloud backend (OpenAI)
+4. **Persona** (`speak.py`) - Configurable LLM via HTTP
- Applies Lyra's personality and speaking style
- Natural, conversational output
- Final answer returned to user
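As a rough illustration of how the four stages chain, here is a self-contained sketch; the functions and prompts are stand-ins, not the actual Cortex code:

```python
import asyncio

async def call_llm(prompt: str) -> str:
    # Stand-in for an HTTP call to whichever backend the stage is configured to use.
    return f"[llm output for: {prompt.splitlines()[0][:40]}...]"

async def reason_pipeline(user_msg: str, intake_context: str) -> str:
    # 1. Reflection: meta-notes about what the user is really asking
    notes = await call_llm(f"Reflect on intent.\nUser: {user_msg}")
    # 2. Reasoning: first draft, grounded in Intake context + reflection notes
    draft = await call_llm(f"Draft an answer.\nContext: {intake_context}\nNotes: {notes}\nUser: {user_msg}")
    # 3. Refinement: polish for clarity, coherence, factual consistency
    refined = await call_llm(f"Refine this draft:\n{draft}")
    # 4. Persona: apply Lyra's voice before returning to Relay
    return await call_llm(f"Rewrite in Lyra's voice:\n{refined}")

print(asyncio.run(reason_pipeline("How do I restart Relay?", "(intake summary here)")))
```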
@@ -125,7 +127,7 @@ Relay → UI (returns final response)
## Features
-### Lyra-Core (VM 100)
+### Core Services
**Relay**:
- Main orchestrator and message router
@@ -150,11 +152,11 @@ Relay → UI (returns final response)
- Session save/load functionality
- OpenAI message format support
-### Cortex (VM 101)
+### Reasoning Layer
**Cortex** (v0.5):
- Multi-stage reasoning pipeline (reflection → reasoning → refine → persona)
-- Flexible LLM backend routing
+- Flexible LLM backend routing via HTTP
- Per-stage backend selection
- Async processing throughout
- IntakeClient integration for short-term context
@@ -169,7 +171,7 @@ Relay → UI (returns final response)
- **Breaking change from v0.1**: Removed cascading summaries (L1, L2, L5, L10, L20, L30)
**LLM Router**:
-- Dynamic backend selection
+- Dynamic backend selection via HTTP
- Environment-driven configuration
- Support for vLLM, Ollama, OpenAI, custom endpoints
- Per-module backend preferences
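A minimal sketch of what environment-driven, per-stage backend selection could look like; the variable names are hypothetical, not the keys the router actually reads:

```python
import os

# Hypothetical defaults; the real keys live in each module's .env file.
DEFAULT_BACKENDS = {
    "reflection": "https://api.openai.com/v1",
    "reasoning":  "http://10.0.0.43:8000/v1",   # vLLM primary
    "refine":     "http://10.0.0.43:8000/v1",
    "persona":    "https://api.openai.com/v1",
}

def backend_for(stage: str) -> str:
    """Resolve the base URL for a stage, letting an env var override the default."""
    return os.getenv(f"LLM_BACKEND_{stage.upper()}", DEFAULT_BACKENDS[stage])

print(backend_for("reasoning"))
```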
@@ -220,49 +222,44 @@ Relay → UI (returns final response)
"imported_at": "2025-11-07T03:55:00Z" "imported_at": "2025-11-07T03:55:00Z"
}``` }```
# Cortex VM (VM101, CT201) ---
- **CT201 main reasoning orchestrator.**
- This is the internal brain of Lyra.
- Running in a privellaged LXC.
- Currently a locally served LLM running on a Radeon Instinct HI50, using a customized version of vLLM that lets it use ROCm.
- Accessible via 10.0.0.43:8000/v1/completions.
- **Intake v0.1.1 ** ## Docker Deployment
- Recieves messages from relay and summarizes them in a cascading format.
- Continues to summarize smaller amounts of exhanges while also generating large scale conversational summaries. (L20)
- Intake then sends to cortex for self reflection, neomem for memory consolidation.
- **Reflect **
-TBD
# Self hosted vLLM server # All services run in a single docker-compose stack with the following containers:
- **CT201 main reasoning orchestrator.**
- This is the internal brain of Lyra.
- Running in a privellaged LXC.
- Currently a locally served LLM running on a Radeon Instinct HI50, using a customized version of vLLM that lets it use ROCm.
- Accessible via 10.0.0.43:8000/v1/completions.
- **Stack Flow**
- [Proxmox Host]
└── loads AMDGPU driver
└── boots CT201 (order=2)
[CT201 GPU Container] - **neomem-postgres** - PostgreSQL with pgvector extension (port 5432)
├── lyra-start-vllm.sh → starts vLLM ROCm model server - **neomem-neo4j** - Neo4j graph database (ports 7474, 7687)
├── lyra-vllm.service → runs the above automatically - **neomem-api** - NeoMem memory service (port 7077)
├── lyra-core.service → launches Cortex + Intake Docker stack - **relay** - Main orchestrator (port 7078)
└── Docker Compose → runs Cortex + Intake containers - **cortex** - Reasoning engine (port 7081)
- **intake** - Short-term memory summarization (port 7080) - currently disabled
- **rag** - RAG search service (port 7090) - currently disabled
[Cortex Container] All containers communicate via the `lyra_net` Docker bridge network.
├── Listens on port 7081
├── Talks to NVGRAM (mem API) + Intake
└── Main relay between Lyra UI ↔ memory ↔ model
[Intake Container] ## External LLM Services
├── Listens on port 7080
├── Summarizes every few exchanges
├── Writes summaries to /app/logs/summaries.log
└── Future: sends summaries → Cortex for reflection
The following LLM backends are accessed via HTTP (not part of docker-compose):
- **vLLM Server** (`http://10.0.0.43:8000`)
- AMD MI50 GPU-accelerated inference
- Custom ROCm-enabled vLLM build
- Primary backend for reasoning and refinement stages
- **Ollama Server** (`http://10.0.0.3:11434`)
- RTX 3090 GPU-accelerated inference
- Secondary/configurable backend
- Model: qwen2.5:7b-instruct-q4_K_M
- **OpenAI API** (`https://api.openai.com/v1`)
- Cloud-based inference
- Used for reflection and persona stages
- Model: gpt-4o-mini
- **Fallback Server** (`http://10.0.0.41:11435`)
- Emergency backup endpoint
- Local llama-3.2-8b-instruct model
--- ---
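To sanity-check that the external backends listed above are reachable, a small probe script can help. The probe paths are assumptions based on typical vLLM and Ollama deployments (`/v1/models` and `/api/tags` respectively), and the fallback server is assumed to be Ollama-style:

```python
import requests

# Probe paths are assumptions based on typical vLLM / Ollama deployments.
BACKENDS = {
    "vllm (primary)":     "http://10.0.0.43:8000/v1/models",
    "ollama (secondary)": "http://10.0.0.3:11434/api/tags",
    "fallback":           "http://10.0.0.41:11435/api/tags",
}

for name, url in BACKENDS.items():
    try:
        resp = requests.get(url, timeout=5)
        print(f"{name:20s} HTTP {resp.status_code}")
    except requests.RequestException as exc:
        print(f"{name:20s} unreachable ({exc.__class__.__name__})")
```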
@@ -292,6 +289,7 @@ Relay → UI (returns final response)
### Non-Critical
- Session management endpoints not fully implemented in Relay
+- Intake service currently disabled in docker-compose.yml
- RAG service currently disabled in docker-compose.yml
- Cortex `/ingest` endpoint is a stub
@@ -307,14 +305,19 @@ Relay → UI (returns final response)
### Prerequisites
- Docker + Docker Compose
-- PostgreSQL 13+, Neo4j 4.4+ (for NeoMem)
+- At least one HTTP-accessible LLM endpoint (vLLM, Ollama, or OpenAI API key)
-- At least one LLM API endpoint (vLLM, Ollama, or OpenAI)
### Setup
-1. Configure environment variables in `.env` files
+1. Copy `.env.example` to `.env` and configure your LLM backend URLs and API keys
-2. Start services: `docker-compose up -d`
+2. Start all services with docker-compose:
-3. Check health: `curl http://localhost:7078/_health`
+```bash
-4. Access UI: `http://localhost:7078`
+docker-compose up -d
+```
+3. Check service health:
+```bash
+curl http://localhost:7078/_health
+```
+4. Access the UI at `http://localhost:7078`
### Test
```bash
@@ -326,6 +329,8 @@ curl -X POST http://localhost:7078/v1/chat/completions \
}'
```
+All backend databases (PostgreSQL and Neo4j) are automatically started as part of the docker-compose stack.
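Because Relay exposes an OpenAI-compatible endpoint, a stock OpenAI client pointed at it should also work. A sketch assuming the response carries the usual `choices` array and that the model name is only advisory:

```python
from openai import OpenAI

# api_key is required by the client but assumed not to be checked by Relay in this setup.
client = OpenAI(base_url="http://localhost:7078/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="lyra",  # advisory only; Cortex decides which backend actually runs
    messages=[{"role": "user", "content": "Summarize what we changed in v0.9.1."}],
)
print(resp.choices[0].message.content)
```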
---
## Documentation
@@ -345,104 +350,44 @@ NeoMem is a derivative work based on Mem0 OSS (Apache 2.0).
---
-## 📦 Requirements
+## Integration Notes
-- Docker + Docker Compose
+- NeoMem API is compatible with Mem0 OSS endpoints (`/memories`, `/search`)
-- Postgres + Neo4j (for NeoMem)
+- All services communicate via Docker internal networking on the `lyra_net` bridge
-- Access to an OpenAI- or Ollama-style API.
+- History and entity graphs are managed via PostgreSQL + Neo4j
-- OpenAI API key (for Relay fallback LLMs)
+- LLM backends are accessed via HTTP and configured in `.env`
-**Dependencies:**
-- fastapi==0.115.8
-- uvicorn==0.34.0
-- pydantic==2.10.4
-- python-dotenv==1.0.1
-- psycopg>=3.2.8
-- ollama
---
-🔌 Integration Notes
+## Beta Lyrae - RAG Memory System (Currently Disabled)
-Lyra-Core connects to neomem-api:8000 inside Docker or localhost:7077 locally.
+**Note:** The RAG service is currently disabled in docker-compose.yml
-API endpoints remain identical to Mem0 (/memories, /search).
+### Requirements
+- Python 3.10+
+- Dependencies: `chromadb openai tqdm python-dotenv fastapi uvicorn`
+- Persistent storage: `./chromadb` or `/mnt/data/lyra_rag_db`
-History and entity graphs managed internally via Postgres + Neo4j.
+### Setup
+1. Import chat logs (must be in OpenAI message format):
+```bash
+python3 rag/rag_chat_import.py
+```
----
+2. Build and start the RAG API server:
+```bash
+cd rag
+python3 rag_build.py
+uvicorn rag_api:app --host 0.0.0.0 --port 7090
+```
-🧱 Architecture Snapshot
+3. Query the RAG system:
+```bash
-User → Relay → Cortex
+curl -X POST http://127.0.0.1:7090/rag/search \
+-H "Content-Type: application/json" \
-[RAG Search]
+-d '{
+"query": "What is the current state of Cortex?",
-[Reflection Loop]
+"where": {"category": "lyra"}
+}'
-Intake (async summaries)
+```
-NeoMem (persistent memory)
-**Cortex v0.4.1 introduces the first fully integrated reasoning loop.**
-- Data Flow:
-- User message enters Cortex via /reason.
-- Cortex assembles context:
-- Intake summaries (short-term memory)
-- RAG contextual data (knowledge base)
-- LLM generates initial draft (call_llm).
-- Reflection loop critiques and refines the answer.
-- Intake asynchronously summarizes and sends snapshots to NeoMem.
-RAG API Configuration:
-Set RAG_API_URL in .env (default: http://localhost:7090).
----
-## Setup and Operation ##
-## Beta Lyrae - RAG memory system ##
-**Requirements**
-- Env: Python 3.10+
-- Dependencies: pip install chromadb openai tqdm python-dotenv fastapi uvicorn jq
-- Persistent storage path: ./chromadb (can be moved to /mnt/data/lyra_rag_db)
-**Import Chats**
-- Chats need to be formatted into the correct format of:
-```
-"messages": [
-  {
-    "role": "user",
-    "content": "Message here"
-  },
-  {
-    "role": "assistant",
-    "content": "Message here"
-  }
-]
-```
-- Organize the chats into categorical folders. This step is optional, but it helped me keep it straight.
-- Run `python3 rag_chat_import.py`; chats will then be imported automatically. For reference, it took 32 minutes to import 68 chat logs (approx. 10.3 MB).
-**Build API Server**
-- Run `rag_build.py`; this automatically builds the ChromaDB index from data saved in the /chatlogs/ folder. (docs folder to be added in future.)
-- Run `rag_api.py` or `uvicorn rag_api:app --host 0.0.0.0 --port 7090`
-**Query**
-- Run `python3 rag_query.py "Question here?"`
-- For testing, a curl command can reach it too:
-```
-curl -X POST http://127.0.0.1:7090/rag/search \
--H "Content-Type: application/json" \
--d '{
-"query": "What is the current state of Cortex and Project Lyra?",
-"where": {"category": "lyra"}
-}'
-```
-# Beta Lyrae - RAG System
-## 📖 License
-NeoMem is a derivative work based on the Mem0 OSS project (Apache 2.0).
-This fork retains the original Apache 2.0 license and adds local modifications.
-© 2025 Terra-Mechanics / ServersDown Labs. All modifications released under Apache 2.0.
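For the Mem0-compatible NeoMem endpoints mentioned in the new Integration Notes, a hedged example of a search call; the payload fields follow Mem0 OSS conventions and are assumptions, not confirmed NeoMem schema:

```python
import requests

# Field names are assumptions based on Mem0 OSS conventions; check NeoMem's own schema.
resp = requests.post(
    "http://localhost:7077/search",
    json={"query": "decisions about the Intake redesign", "user_id": "default"},
    timeout=10,
)
print(resp.status_code, resp.json())
```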

View File

@@ -1,3 +1,6 @@
+// relay v0.3.0
+// Core relay server for Lyra project
+// Handles incoming chat requests and forwards them to Cortex services
import express from "express";
import dotenv from "dotenv";
import cors from "cors";
@@ -10,9 +13,8 @@ app.use(express.json());
const PORT = Number(process.env.PORT || 7078);
-// core endpoints
+// Cortex endpoints (only these are used now)
const CORTEX_REASON = process.env.CORTEX_REASON_URL || "http://cortex:7081/reason";
-const CORTEX_INGEST = process.env.CORTEX_INGEST_URL || "http://cortex:7081/ingest";
// -----------------------------------------------------
// Helper request wrapper
@@ -27,7 +29,6 @@ async function postJSON(url, data) {
const raw = await resp.text();
let json;
-// Try to parse JSON safely
try {
json = raw ? JSON.parse(raw) : null;
} catch (e) {
@@ -42,11 +43,12 @@ async function postJSON(url, data) {
}
// -----------------------------------------------------
-// Shared chat handler logic
+// The unified chat handler
// -----------------------------------------------------
async function handleChatRequest(session_id, user_msg) {
-// 1. → Cortex.reason: the main pipeline
let reason;
+// 1. → Cortex.reason (main pipeline)
try {
reason = await postJSON(CORTEX_REASON, {
session_id,
@@ -57,19 +59,13 @@ async function handleChatRequest(session_id, user_msg) {
throw new Error(`cortex_reason_failed: ${e.message}`);
}
-const persona = reason.final_output || reason.persona || "(no persona text)";
+// Correct persona field
+const persona =
+reason.persona ||
+reason.final_output ||
+"(no persona text)";
-// 2. → Cortex.ingest (async, non-blocking)
+// Return final answer
-// Cortex might still want this for separate ingestion pipeline.
-postJSON(CORTEX_INGEST, {
-session_id,
-user_msg,
-assistant_msg: persona
-}).catch(e =>
-console.warn("Relay → Cortex.ingest failed:", e.message)
-);
-// 3. Return corrected result
return {
session_id,
reply: persona
@@ -84,7 +80,7 @@ app.get("/_health", (_, res) => {
});
// -----------------------------------------------------
-// OPENAI-COMPATIBLE ENDPOINT (for UI & clients)
+// OPENAI-COMPATIBLE ENDPOINT
// -----------------------------------------------------
app.post("/v1/chat/completions", async (req, res) => {
try {
@@ -101,7 +97,7 @@ app.post("/v1/chat/completions", async (req, res) => {
const result = await handleChatRequest(session_id, user_msg);
-return res.json({
+res.json({
id: `chatcmpl-${Date.now()}`,
object: "chat.completion",
created: Math.floor(Date.now() / 1000),
@@ -134,7 +130,7 @@ app.post("/v1/chat/completions", async (req, res) => {
});
// -----------------------------------------------------
-// MAIN ENDPOINT (canonical Lyra UI entrance)
+// MAIN ENDPOINT (Lyra-native UI)
// -----------------------------------------------------
app.post("/chat", async (req, res) => {
try {
@@ -144,7 +140,7 @@ app.post("/chat", async (req, res) => {
console.log(`Relay → received: "${user_msg}"`);
const result = await handleChatRequest(session_id, user_msg);
-return res.json(result);
+res.json(result);
} catch (err) {
console.error("Relay fatal:", err);
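A quick sketch of calling the Lyra-native `/chat` endpoint shown above; the request field names are assumptions, since only the `{session_id, reply}` response shape is visible in this diff:

```python
import requests

# Request field names are assumptions; only the response shape is shown in the diff.
resp = requests.post(
    "http://localhost:7078/chat",
    json={"session_id": "demo-session", "user_msg": "Remind me what we shipped last week."},
    timeout=30,
)
print(resp.json())  # expected shape: {"session_id": "...", "reply": "..."}
```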

View File

@@ -1,6 +1,8 @@
import os
from datetime import datetime
from typing import List, Dict, Any, TYPE_CHECKING
+from collections import deque
if TYPE_CHECKING:
from collections import deque as _deque
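For context on this hunk: a runtime `deque` import is added alongside the TYPE_CHECKING-only alias, presumably because deques are now constructed at runtime. A small illustration of the difference (not code from Intake itself):

```python
from typing import TYPE_CHECKING
from collections import deque  # runtime import: deque objects can actually be constructed

if TYPE_CHECKING:
    # only seen by type checkers; erased at runtime, so it cannot be used to build objects
    from collections import deque as _deque

buffer: deque = deque(maxlen=10)
buffer.append("latest exchange")
print(list(buffer))
```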

View File

@@ -10,7 +10,6 @@ from reasoning.reflection import reflect_notes
from reasoning.refine import refine_answer
from persona.speak import speak
from persona.identity import load_identity
-from ingest.intake_client import IntakeClient
from context import collect_context, update_last_assistant_message
from intake.intake import add_exchange_internal
@@ -50,9 +49,6 @@ if VERBOSE_DEBUG:
# -----------------------------
cortex_router = APIRouter()
-# Initialize Intake client once
-intake_client = IntakeClient()
# -----------------------------
# Pydantic models
@@ -202,11 +198,10 @@ class IngestPayload(BaseModel):
assistant_msg: str
@cortex_router.post("/ingest")
-async def ingest(payload: IngestPayload):
+async def ingest_stub():
-"""
+# Intake is internal now — this endpoint is only for compatibility.
-Relay calls this after /reason.
+return {"status": "ok", "note": "intake is internal now"}
-We update Cortex state AND feed Intake's internal buffer.
-"""
# 1. Update Cortex session state
update_last_assistant_message(payload.session_id, payload.assistant_msg)
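With `/ingest` reduced to a stub, callers should only expect the acknowledgement payload shown above. A minimal check, assuming Cortex is published on port 7081:

```python
import requests

# Assumes Cortex is reachable on its published port 7081.
resp = requests.post("http://localhost:7081/ingest", json={}, timeout=5)
print(resp.json())  # {"status": "ok", "note": "intake is internal now"}
```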