# Project Lyra - README v0.5.0

Lyra is a modular, persistent AI companion system with advanced reasoning capabilities. It provides memory-backed chat using **NeoMem** + **Relay** + **Cortex**, with a multi-stage reasoning pipeline powered by distributed LLM backends.

## Mission Statement

The point of Project Lyra is to give an AI chatbot more abilities than a typical chatbot. Typical chatbots are essentially amnesiac and forget everything about your project. Lyra keeps projects organized and remembers everything you have done. Think of her as a notepad, schedule, database, co-creator, and collaborator with its own executive function: mention something in passing, and Lyra remembers it and reminds you of it later.

---
## Architecture Overview

Project Lyra operates as a series of Docker containers networked together in a microservices architecture. Just as the brain has regions, Lyra has modules:

### A. VM 100 - lyra-core (Core Services)

**1. Relay** (Node.js/Express) - Port 7078
- Main orchestrator and message router
- Coordinates all module interactions
- OpenAI-compatible endpoint: `POST /v1/chat/completions`
- Internal endpoint: `POST /chat`
- Routes messages through the Cortex reasoning pipeline
- Manages async calls to Intake and NeoMem

**2. UI** (Static HTML)
- Browser-based chat interface with cyberpunk theme
- Connects to Relay at `http://10.0.0.40:7078`
- Saves and loads sessions
- OpenAI-compatible message format

**3. NeoMem** (Python/FastAPI) - Port 7077
- Long-term memory database (fork of Mem0 OSS)
- Vector storage (PostgreSQL + pgvector) + graph storage (Neo4j)
- RESTful API: `/memories`, `/search` (usage sketch below)
- Semantic memory updates and retrieval
- No external SDK dependencies - fully local
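
Since NeoMem is drop-in compatible with the Mem0 OSS REST API, calling it from another service looks roughly like the sketch below. The payload field names are assumptions based on that compatibility, not a verified schema; check the NeoMem source for the authoritative contract.

```python
# Minimal sketch of talking to NeoMem's REST API (port 7077).
# Payload fields are assumed from Mem0-style compatibility and may differ.
import requests

NEOMEM_URL = "http://localhost:7077"

# Store a memory derived from a conversation exchange
requests.post(
    f"{NEOMEM_URL}/memories",
    json={
        "messages": [
            {"role": "user", "content": "The vLLM box lives at 10.0.0.43."},
            {"role": "assistant", "content": "Noted - the MI50 vLLM host."},
        ],
        "user_id": "demo-user",
    },
    timeout=30,
).raise_for_status()

# Later, search for relevant memories before reasoning
hits = requests.post(
    f"{NEOMEM_URL}/search",
    json={"query": "Where does the vLLM server run?", "user_id": "demo-user"},
    timeout=30,
).json()
print(hits)
```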
### B. VM 101 - lyra-cortex (Reasoning Layer)

**4. Cortex** (Python/FastAPI) - Port 7081
- Primary reasoning engine with multi-stage pipeline
- **4-Stage Processing:**
  1. **Reflection** - Generates meta-awareness notes about the conversation
  2. **Reasoning** - Creates an initial draft answer using context
  3. **Refinement** - Polishes and improves the draft
  4. **Persona** - Applies Lyra's personality and speaking style
- Integrates with Intake for short-term context
- Flexible LLM router supporting multiple backends

**5. Intake v0.2** (Python/FastAPI) - Port 7080
- Simplified short-term memory summarization
- Session-based circular buffer (deque, maxlen=200)
- Single-level summarization (no cascading)
- Background async processing with FastAPI BackgroundTasks
- Pushes summaries to NeoMem automatically
- **API Endpoints** (see the usage sketch below):
  - `POST /add_exchange` - Add a conversation exchange
  - `GET /summaries?session_id={id}` - Retrieve session summary
  - `POST /close_session/{id}` - Close and clean up a session
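
A rough usage sketch for those Intake endpoints follows. The `/add_exchange` body fields are assumptions (the README does not spell out the schema), so treat the field names as illustrative.

```python
# Illustrative calls against Intake (port 7080); field names in the
# /add_exchange payload are assumed, not taken from the actual schema.
import requests

INTAKE_URL = "http://localhost:7080"
SESSION = "demo-session"

# Record one user/assistant exchange for this session
requests.post(
    f"{INTAKE_URL}/add_exchange",
    json={
        "session_id": SESSION,
        "user": "Remind me what we decided about the GPU passthrough.",
        "assistant": "We kept the MI50 in CT201 and exposed vLLM on port 8000.",
    },
    timeout=30,
).raise_for_status()

# Fetch the rolling summary Cortex would use as short-term context
summary = requests.get(
    f"{INTAKE_URL}/summaries", params={"session_id": SESSION}, timeout=30
).json()
print(summary)

# Close the session once the conversation is over
requests.post(f"{INTAKE_URL}/close_session/{SESSION}", timeout=30)
```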
### C. LLM Backends (Remote/Local APIs)

**Multi-Backend Strategy:**
- **PRIMARY**: vLLM on AMD MI50 GPU (`http://10.0.0.43:8000`) - Cortex reasoning, Intake
- **SECONDARY**: Ollama on RTX 3090 (`http://10.0.0.3:11434`) - Configurable per-module
- **CLOUD**: OpenAI API (`https://api.openai.com/v1`) - Cortex persona layer
- **FALLBACK**: Local backup (`http://10.0.0.41:11435`) - Emergency fallback

---
## Data Flow Architecture (v0.5.0)

### Normal Message Flow:

```
User (UI) → POST /v1/chat/completions
        ↓
Relay (7078)
        ↓ POST /reason
Cortex (7081)
        ↓ GET /summaries?session_id=xxx
Intake (7080) [RETURNS SUMMARY]
        ↓
Cortex processes (4 stages):
  1. reflection.py     → meta-awareness notes
  2. reasoning.py      → draft answer (uses LLM)
  3. refine.py         → refined answer (uses LLM)
  4. persona/speak.py  → Lyra personality (uses LLM)
        ↓
Returns persona answer to Relay
        ↓
Relay → Cortex /ingest (async, stub)
Relay → Intake /add_exchange (async)
        ↓
Intake → Background summarize → NeoMem
        ↓
Relay → UI (returns final response)
```
### Cortex 4-Stage Reasoning Pipeline:

1. **Reflection** (`reflection.py`) - Cloud backend (OpenAI)
   - Analyzes user intent and conversation context
   - Generates meta-awareness notes
   - "What is the user really asking?"

2. **Reasoning** (`reasoning.py`) - Primary backend (vLLM)
   - Retrieves short-term context from Intake
   - Creates an initial draft answer
   - Integrates context, reflection notes, and the user prompt

3. **Refinement** (`refine.py`) - Primary backend (vLLM)
   - Polishes the draft answer
   - Improves clarity and coherence
   - Ensures factual consistency

4. **Persona** (`speak.py`) - Cloud backend (OpenAI)
   - Applies Lyra's personality and speaking style
   - Natural, conversational output
   - Final answer returned to the user
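
To make the stage ordering concrete, here is a minimal sketch of how the four stages could compose, assuming each stage is reduced to a single async call through a shared LLM helper. The function names and signatures are illustrative, not the actual Cortex module APIs.

```python
# Illustrative composition of the 4-stage pipeline; the real reflection.py,
# reasoning.py, refine.py and speak.py interfaces may differ.
import asyncio


async def call_llm(backend: str, prompt: str) -> str:
    """Placeholder for the LLM router; would POST to vLLM/Ollama/OpenAI."""
    return f"[{backend}] {prompt[:60]}..."


async def run_pipeline(user_msg: str, intake_summary: str) -> str:
    # 1. Reflection (cloud backend): meta-notes about what the user wants
    notes = await call_llm("openai", f"Reflect on: {user_msg}")

    # 2. Reasoning (primary backend): draft answer from context + notes
    draft = await call_llm(
        "vllm", f"Context:\n{intake_summary}\nNotes:\n{notes}\nUser:\n{user_msg}"
    )

    # 3. Refinement (primary backend): polish the draft
    refined = await call_llm("vllm", f"Refine this draft:\n{draft}")

    # 4. Persona (cloud backend): apply Lyra's voice
    return await call_llm("openai", f"Rewrite in Lyra's voice:\n{refined}")


if __name__ == "__main__":
    print(asyncio.run(run_pipeline("Hello Lyra!", "No prior context.")))
```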
---

## Features

### Lyra-Core (VM 100)

**Relay**:
- Main orchestrator and message router
- OpenAI-compatible endpoint: `POST /v1/chat/completions`
- Internal endpoint: `POST /chat`
- Health check: `GET /_health`
- Async non-blocking calls to Cortex and Intake
- Shared request handler for code reuse
- Comprehensive error handling

**NeoMem (Memory Engine)**:
- Forked from Mem0 OSS - fully independent
- Drop-in compatible API (`/memories`, `/search`)
- Local-first: runs on FastAPI with Postgres + Neo4j
- No external SDK dependencies
- Semantic memory updates - compares embeddings and performs in-place updates
- Default service: `neomem-api` (port 7077)
**UI**:
- Lightweight static HTML chat interface
- Cyberpunk theme
- Session save/load functionality
- OpenAI message format support

### Cortex (VM 101)

**Cortex** (v0.5):
- Multi-stage reasoning pipeline (reflection → reasoning → refine → persona)
- Flexible LLM backend routing
- Per-stage backend selection
- Async processing throughout
- IntakeClient integration for short-term context
- `/reason`, `/ingest` (stub), `/health` endpoints
**Intake** (v0.2):
- Simplified single-level summarization
- Session-based circular buffer (200 exchanges max); see the buffer sketch below
- Background async summarization
- Automatic NeoMem push
- No persistent log files (memory-only)
- **Breaking change from v0.1**: Removed cascading summaries (L1, L2, L5, L10, L20, L30)
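
The circular buffer and background summarization described above can be pictured roughly as follows. This is a simplified sketch, not the actual Intake implementation, and the summarizer call is stubbed.

```python
# Simplified sketch of Intake's per-session buffer + background summarization.
# Not the real implementation; the summarizer below is a stub.
from collections import deque

from fastapi import BackgroundTasks, FastAPI

app = FastAPI()
buffers: dict[str, deque] = {}  # session_id -> rolling window of exchanges


def summarize_and_push(session_id: str) -> None:
    window = list(buffers.get(session_id, []))
    summary = f"{len(window)} exchanges in session {session_id}"  # stub
    # The real Intake would call the LLM here and POST the result to NeoMem.
    print("summary:", summary)


@app.post("/add_exchange")
async def add_exchange(payload: dict, background: BackgroundTasks):
    session_id = payload.get("session_id", "default")
    buf = buffers.setdefault(session_id, deque(maxlen=200))  # circular buffer
    buf.append(payload)
    background.add_task(summarize_and_push, session_id)  # async, non-blocking
    return {"ok": True, "buffered": len(buf)}
```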
**LLM Router**:
- Dynamic backend selection
- Environment-driven configuration
- Support for vLLM, Ollama, OpenAI, and custom endpoints
- Per-module backend preferences (see the routing sketch below)
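
A minimal sketch of environment-driven backend routing is shown below. The environment variable names (`CORTEX_REASONING_BACKEND`, etc.) and the backend table are illustrative assumptions; the real router's configuration keys may differ.

```python
# Illustrative env-driven backend selection; variable names are hypothetical.
import os

BACKENDS = {
    "vllm":   {"base_url": "http://10.0.0.43:8000/v1", "kind": "openai-compatible"},
    "ollama": {"base_url": "http://10.0.0.3:11434",     "kind": "ollama"},
    "openai": {"base_url": "https://api.openai.com/v1", "kind": "openai"},
    "backup": {"base_url": "http://10.0.0.41:11435",    "kind": "ollama"},
}


def pick_backend(stage: str) -> dict:
    """Resolve a stage (e.g. 'reasoning', 'persona') to a backend config."""
    env_key = f"CORTEX_{stage.upper()}_BACKEND"   # hypothetical variable name
    name = os.getenv(env_key, "vllm")             # default to the primary GPU
    return BACKENDS.get(name, BACKENDS["vllm"])


print(pick_backend("persona"))  # set CORTEX_PERSONA_BACKEND=openai to route to cloud
```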
# Beta Lyrae (RAG Memory DB) - added 11-3-25

- **RAG Knowledge DB - Beta Lyrae (sheliak)**
- This module implements the **Retrieval-Augmented Generation (RAG)** layer for Project Lyra.
- It serves as the long-term searchable memory store that Cortex and Relay can query for relevant context before reasoning or response generation.

The system uses:
- **ChromaDB** for persistent vector storage
- **OpenAI Embeddings (`text-embedding-3-small`)** for semantic similarity
- **FastAPI** (port 7090) for the `/rag/search` REST endpoint

**Directory layout:**
```
rag/
├── rag_chat_import.py   # imports JSON chat logs
├── rag_docs_import.py   # (planned) PDF/EPUB/manual importer
├── rag_build.py         # legacy single-folder builder
├── rag_query.py         # command-line query helper
├── rag_api.py           # FastAPI service providing /rag/search
├── chromadb/            # persistent vector store
├── chatlogs/            # organized source data
│   ├── poker/
│   ├── work/
│   ├── lyra/
│   ├── personal/
│   └── ...
└── import.log           # progress log for batch runs
```
**OpenAI chatlog importer** (`rag_chat_import.py`):
- Takes JSON-formatted chat logs and imports them into the RAG store.
- **Features include:**
  - Recursive folder indexing with **category detection** from the directory name
  - Smart chunking for long messages (5,000 chars per slice); see the sketch after the metadata example
  - Automatic deduplication using a SHA-1 hash of file + chunk
  - Timestamps for both file modification and import time
  - Full progress logging via tqdm
  - Safe to run in the background with `nohup … &`
- Metadata per chunk:
```json
{
  "chat_id": "<sha1 of filename>",
  "chunk_index": 0,
  "source": "chatlogs/lyra/0002_cortex_LLMs_11-1-25.json",
  "title": "cortex LLMs 11-1-25",
  "role": "assistant",
  "category": "lyra",
  "type": "chat",
  "file_modified": "2025-11-06T23:41:02",
  "imported_at": "2025-11-07T03:55:00Z"
}
```
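
The chunking and deduplication behavior listed above can be sketched roughly as below. The slice size matches the importer's 5,000-character chunks, while the ID scheme is an assumption about how file + chunk might be hashed.

```python
# Rough sketch of 5,000-char chunking with SHA-1 based dedup IDs.
# The exact ID scheme in rag_chat_import.py may differ.
import hashlib

CHUNK_SIZE = 5_000


def chunk_message(text: str) -> list[str]:
    """Split a long message into fixed-size slices."""
    return [text[i:i + CHUNK_SIZE] for i in range(0, len(text), CHUNK_SIZE)] or [""]


def chunk_id(source_path: str, index: int) -> str:
    """Deterministic ID so re-importing the same file/chunk is a no-op."""
    return hashlib.sha1(f"{source_path}:{index}".encode("utf-8")).hexdigest()


chunks = chunk_message("hello " * 2000)
ids = [chunk_id("chatlogs/lyra/example.json", i) for i, _ in enumerate(chunks)]
print(len(chunks), ids[0])
```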
# Cortex VM (VM101, CT201)

- **CT201 main reasoning orchestrator.**
  - This is the internal brain of Lyra.
  - Runs in a privileged LXC.
  - Currently backed by a locally served LLM running on a Radeon Instinct MI50, using a customized build of vLLM that supports ROCm.
  - Accessible via `10.0.0.43:8000/v1/completions`.

- **Intake v0.1.1**
  - Receives messages from Relay and summarizes them in a cascading format.
  - Continues to summarize small batches of exchanges while also generating large-scale conversational summaries (L20).
  - Intake then sends results to Cortex for self-reflection and to NeoMem for memory consolidation.

- **Reflect**
  - TBD
# Self-hosted vLLM server #

- **CT201 also hosts the self-hosted vLLM model server.**
  - Runs in the same privileged LXC as the reasoning stack above.
  - Serves a local LLM on the Radeon Instinct MI50 via a customized ROCm build of vLLM.
  - Accessible via `10.0.0.43:8000/v1/completions` (an example request follows the stack flow below).
- **Stack Flow**

```
[Proxmox Host]
 └── loads AMDGPU driver
     └── boots CT201 (order=2)

[CT201 GPU Container]
 ├── lyra-start-vllm.sh   → starts vLLM ROCm model server
 ├── lyra-vllm.service    → runs the above automatically
 ├── lyra-core.service    → launches Cortex + Intake Docker stack
 └── Docker Compose       → runs Cortex + Intake containers

[Cortex Container]
 ├── Listens on port 7081
 ├── Talks to NVGRAM (mem API) + Intake
 └── Main relay between Lyra UI ↔ memory ↔ model

[Intake Container]
 ├── Listens on port 7080
 ├── Summarizes every few exchanges
 ├── Writes summaries to /app/logs/summaries.log
 └── Future: sends summaries → Cortex for reflection
```
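
Since vLLM exposes an OpenAI-compatible API, a quick way to verify the server from another machine looks like this. The model name is a placeholder, as the README does not state which model is loaded.

```python
# Quick smoke test against the vLLM OpenAI-compatible /v1/completions endpoint.
# "MODEL_NAME" is a placeholder; use whatever model CT201 actually serves.
import requests

resp = requests.post(
    "http://10.0.0.43:8000/v1/completions",
    json={
        "model": "MODEL_NAME",
        "prompt": "Say hello from CT201.",
        "max_tokens": 32,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])
```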
---

## Version History

### v0.5.0 (2025-11-28) - Current Release
- ✅ Fixed all critical API wiring issues
- ✅ Added OpenAI-compatible endpoint to Relay (`/v1/chat/completions`)
- ✅ Fixed Cortex → Intake integration
- ✅ Added missing Python package `__init__.py` files
- ✅ End-to-end message flow verified and working

### v0.4.x (Major Rewire)
- Cortex multi-stage reasoning pipeline
- Intake v0.2 simplification
- LLM router with multi-backend support
- Major architectural restructuring

### v0.3.x
- Beta Lyrae RAG system
- NeoMem integration
- Basic Cortex reasoning loop

---
## Known Issues (v0.5.0)

### Non-Critical
- Session management endpoints not fully implemented in Relay
- RAG service currently disabled in docker-compose.yml
- Cortex `/ingest` endpoint is a stub

### Future Enhancements
- Re-enable RAG service integration
- Implement full session persistence
- Add request correlation IDs for tracing
- Comprehensive health checks

---
## Quick Start

### Prerequisites
- Docker + Docker Compose
- PostgreSQL 13+, Neo4j 4.4+ (for NeoMem)
- At least one LLM API endpoint (vLLM, Ollama, or OpenAI)

### Setup
1. Configure environment variables in `.env` files
2. Start services: `docker-compose up -d`
3. Check health: `curl http://localhost:7078/_health`
4. Access UI: `http://localhost:7078`

### Test
```bash
curl -X POST http://localhost:7078/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Hello Lyra!"}],
    "session_id": "test"
  }'
```
---

## Documentation

- See [CHANGELOG.md](CHANGELOG.md) for detailed version history
- See `ENVIRONMENT_VARIABLES.md` for the environment variable reference
- Additional information is available in the Trilium docs

---
## License

NeoMem is a derivative work based on the Mem0 OSS project (Apache 2.0). This fork retains the original Apache 2.0 license and adds local modifications.

© 2025 Terra-Mechanics / ServersDown Labs. All modifications released under Apache 2.0.

**Built with Claude Code**

---
## 📦 Requirements

- Docker + Docker Compose
- Postgres + Neo4j (for NeoMem)
- Access to an OpenAI- or Ollama-style API
- OpenAI API key (for Relay fallback LLMs)

**Dependencies:**
- fastapi==0.115.8
- uvicorn==0.34.0
- pydantic==2.10.4
- python-dotenv==1.0.1
- psycopg>=3.2.8
- ollama

---
## 🔌 Integration Notes

Lyra-Core connects to `neomem-api:8000` inside Docker, or `localhost:7077` locally.

API endpoints remain identical to Mem0 (`/memories`, `/search`).

History and entity graphs are managed internally via Postgres + Neo4j.

---
## 🧱 Architecture Snapshot

```
User → Relay → Cortex
                 ↓
           [RAG Search]
                 ↓
         [Reflection Loop]
                 ↓
     Intake (async summaries)
                 ↓
     NeoMem (persistent memory)
```
**Cortex v0.4.1 introduces the first fully integrated reasoning loop.**

- Data flow:
  - A user message enters Cortex via `/reason`.
  - Cortex assembles context from:
    - Intake summaries (short-term memory)
    - RAG contextual data (knowledge base)
  - The LLM generates an initial draft (`call_llm`).
  - A reflection loop critiques and refines the answer.
  - Intake asynchronously summarizes and sends snapshots to NeoMem.

RAG API configuration: set `RAG_API_URL` in `.env` (default: `http://localhost:7090`).
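
A minimal sketch of that context-assembly step is shown below, assuming Intake and the RAG service are reachable at their default ports; the prompt layout and field names are illustrative rather than the actual Cortex code.

```python
# Illustrative context assembly for /reason: short-term summary from Intake
# plus knowledge-base hits from the RAG service. Field names are assumptions.
import requests

INTAKE_URL = "http://localhost:7080"
RAG_API_URL = "http://localhost:7090"


def build_context(session_id: str, user_msg: str) -> str:
    summary = requests.get(
        f"{INTAKE_URL}/summaries", params={"session_id": session_id}, timeout=30
    ).json()

    rag_hits = requests.post(
        f"{RAG_API_URL}/rag/search",
        json={"query": user_msg, "where": {"category": "lyra"}},
        timeout=30,
    ).json()

    return (
        f"Short-term summary:\n{summary}\n\n"
        f"Knowledge base hits:\n{rag_hits}\n\n"
        f"User message:\n{user_msg}"
    )


print(build_context("demo-session", "What is the current state of Cortex?"))
```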
---

## Setup and Operation ##

## Beta Lyrae - RAG memory system ##

**Requirements**
- Environment: Python 3.10+
- Dependencies: `pip install chromadb openai tqdm python-dotenv fastapi uvicorn jq`
- Persistent storage path: `./chromadb` (can be moved to `/mnt/data/lyra_rag_db`)
**Import Chats**
- Chats need to be formatted as a JSON object with a `messages` array:

```json
{
  "messages": [
    {
      "role": "user",
      "content": "Message here"
    },
    {
      "role": "assistant",
      "content": "Message here"
    }
  ]
}
```

- Organize the chats into categorical folders. This step is optional, but it helps keep things straight.
- Run `python3 rag_chat_import.py`; chats will then be imported automatically. For reference, it took 32 minutes to import 68 chat logs (approx. 10.3 MB).
**Build API Server**
- Run `rag_build.py`; this automatically builds the ChromaDB store from the data saved in the `/chatlogs/` folder. (A docs folder is planned for the future.)
- Run `rag_api.py`, or `uvicorn rag_api:app --host 0.0.0.0 --port 7090`. A rough sketch of what such a service looks like follows.
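
For orientation, here is a condensed sketch of how a `/rag/search` endpoint can be wired together from ChromaDB and OpenAI embeddings; the real `rag_api.py` may structure things differently, and the collection name used here is an assumption.

```python
# Condensed sketch of a /rag/search service over ChromaDB + OpenAI embeddings.
# Collection name and response shape are assumptions, not the actual rag_api.py.
import chromadb
from fastapi import FastAPI
from openai import OpenAI
from pydantic import BaseModel

app = FastAPI()
chroma = chromadb.PersistentClient(path="./chromadb")
collection = chroma.get_or_create_collection("chatlogs")  # assumed name
oai = OpenAI()  # reads OPENAI_API_KEY from the environment


class SearchRequest(BaseModel):
    query: str
    where: dict | None = None
    n_results: int = 5


@app.post("/rag/search")
def rag_search(req: SearchRequest):
    # Embed the query with the same model used at import time
    emb = oai.embeddings.create(
        model="text-embedding-3-small", input=[req.query]
    ).data[0].embedding

    # Nearest-neighbor lookup, optionally filtered by metadata (e.g. category)
    results = collection.query(
        query_embeddings=[emb], n_results=req.n_results, where=req.where
    )
    return {"documents": results["documents"], "metadatas": results["metadatas"]}
```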
**Query**
- Run `python3 rag_query.py "Question here?"`
- For testing, a curl command can reach it too:

```
curl -X POST http://127.0.0.1:7090/rag/search \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What is the current state of Cortex and Project Lyra?",
    "where": {"category": "lyra"}
  }'
```