# Project Lyra - README v0.5.0
Lyra is a modular, persistent AI companion system with advanced reasoning capabilities.
It provides memory-backed chat using **NeoMem** + **Relay** + **Cortex**,
with a multi-stage reasoning pipeline powered by HTTP-based LLM backends.
## Mission Statement
The point of Project Lyra is to give an AI chatbot more abilities than a typical chatbot. Typical chatbots are essentially amnesiac: they forget everything about your project between sessions. Lyra keeps your projects organized and remembers everything you have done. Think of her abilities as a notepad, schedule, database, co-creator, and collaborator, all driven by her own executive function. Say something in passing, and Lyra remembers it and reminds you of it later.
---
## Architecture Overview
Project Lyra operates as a **single docker-compose deployment** with multiple Docker containers networked together in a microservices architecture. Just as the brain has regions, Lyra has modules:
### Core Services
**1. Relay** (Node.js/Express) - Port 7078
- Main orchestrator and message router
- Coordinates all module interactions
- OpenAI-compatible endpoint: `POST /v1/chat/completions`
- Internal endpoint: `POST /chat`
- Routes messages through Cortex reasoning pipeline
- Manages async calls to Intake and NeoMem
**2. UI** (Static HTML)
- Browser-based chat interface with cyberpunk theme
- Connects to Relay
- Saves and loads sessions
- OpenAI-compatible message format
**3. NeoMem** (Python/FastAPI) - Port 7077
- Long-term memory database (fork of Mem0 OSS)
- Vector storage (PostgreSQL + pgvector) + Graph storage (Neo4j)
- RESTful API: `/memories`, `/search`
- Semantic memory updates and retrieval
- No external SDK dependencies - fully local
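A minimal sketch of calling the NeoMem API from Python is shown below. The endpoints come from this README, but the payload fields are assumed to follow Mem0-style conventions and may differ from the actual schema:

```python
# Minimal NeoMem client sketch (payload fields assumed Mem0-compatible; verify against the API).
import requests

NEOMEM_URL = "http://localhost:7077"

# Store a memory for a user (field names illustrative).
requests.post(f"{NEOMEM_URL}/memories", json={
    "messages": [{"role": "user", "content": "I prefer dark mode in every UI."}],
    "user_id": "demo-user",
}).raise_for_status()

# Semantic search over stored memories.
hits = requests.post(f"{NEOMEM_URL}/search", json={
    "query": "UI preferences",
    "user_id": "demo-user",
}).json()
print(hits)
```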
### Reasoning Layer
**4. Cortex** (Python/FastAPI) - Port 7081
- Primary reasoning engine with multi-stage pipeline
- **4-Stage Processing:**
1. **Reflection** - Generates meta-awareness notes about the conversation
2. **Reasoning** - Creates initial draft answer using context
3. **Refinement** - Polishes and improves the draft
4. **Persona** - Applies Lyra's personality and speaking style
- Integrates with Intake for short-term context
- Flexible LLM router supporting multiple backends via HTTP
**5. Intake v0.2** (Python/FastAPI) - Port 7080
- Simplified short-term memory summarization
- Session-based circular buffer (deque, maxlen=200)
- Single-level simple summarization (no cascading)
- Background async processing with FastAPI BackgroundTasks
- Pushes summaries to NeoMem automatically
- **API Endpoints:**
- `POST /add_exchange` - Add conversation exchange
- `GET /summaries?session_id={id}` - Retrieve session summary
- `POST /close_session/{id}` - Close and cleanup session
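For illustration, a hedged sketch of driving these endpoints from Python; the exchange payload fields are assumptions, not the verified schema:

```python
# Sketch of one Intake session round-trip (exchange payload shape is an assumption).
import requests

INTAKE_URL = "http://localhost:7080"
SESSION = "demo-session"

# Append one user/assistant exchange to the session's circular buffer.
requests.post(f"{INTAKE_URL}/add_exchange", json={
    "session_id": SESSION,
    "user": "What did we decide about the vLLM backend?",
    "assistant": "We kept it as the primary backend on the MI50 box.",
}).raise_for_status()

# Fetch the current rolling summary for the session.
summary = requests.get(f"{INTAKE_URL}/summaries", params={"session_id": SESSION}).json()
print(summary)

# Close the session when the conversation ends.
requests.post(f"{INTAKE_URL}/close_session/{SESSION}")
```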
### LLM Backends (HTTP-based)
**All LLM communication is done via HTTP APIs:**
- **PRIMARY**: vLLM server (`http://10.0.0.43:8000`) - AMD MI50 GPU backend
- **SECONDARY**: Ollama server (`http://10.0.0.3:11434`) - RTX 3090 backend
- **CLOUD**: OpenAI API (`https://api.openai.com/v1`) - Cloud-based models
- **FALLBACK**: Local backup (`http://10.0.0.41:11435`) - Emergency fallback
Each module can be configured to use a different backend via environment variables.
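As a rough illustration of that pattern, the sketch below routes a stage to a backend chosen by environment variable. The variable names and defaults are illustrative, not Lyra's actual configuration keys:

```python
# Illustrative env-driven LLM router (env var names and defaults are assumptions).
import os
import requests

BACKENDS = {
    "vllm":   os.getenv("VLLM_URL",   "http://10.0.0.43:8000"),
    "ollama": os.getenv("OLLAMA_URL", "http://10.0.0.3:11434"),
    "openai": os.getenv("OPENAI_URL", "https://api.openai.com/v1"),
}

def chat(stage: str, messages: list[dict], model: str) -> str:
    # Each stage picks its backend via env var, e.g. STAGE_REASONING_BACKEND=vllm.
    name = os.getenv(f"STAGE_{stage.upper()}_BACKEND", "vllm")
    base = BACKENDS[name].rstrip("/")
    # vLLM, Ollama, and OpenAI all expose an OpenAI-compatible /v1/chat/completions.
    url = f"{base}/chat/completions" if base.endswith("/v1") else f"{base}/v1/chat/completions"
    headers = {}
    if name == "openai":
        headers["Authorization"] = f"Bearer {os.environ['OPENAI_API_KEY']}"
    resp = requests.post(url, json={"model": model, "messages": messages},
                         headers=headers, timeout=120)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]
```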
---
## Data Flow Architecture (v0.5.0)
### Normal Message Flow:
```
User (UI) → POST /v1/chat/completions
      ↓
Relay (7078)
      ↓ POST /reason
Cortex (7081)
      ↓ GET /summaries?session_id=xxx
Intake (7080) [returns summary]
      ↓
Cortex processes (4 stages):
  1. reflection.py     → meta-awareness notes
  2. reasoning.py      → draft answer (uses LLM)
  3. refine.py         → refined answer (uses LLM)
  4. persona/speak.py  → Lyra personality (uses LLM)
      ↓
Cortex returns persona answer to Relay
      ↓
Relay → Cortex /ingest (async, stub)
Relay → Intake /add_exchange (async)
Intake → background summarize → NeoMem
Relay → UI (returns final response)
```
### Cortex 4-Stage Reasoning Pipeline:
1. **Reflection** (`reflection.py`) - Configurable LLM via HTTP
- Analyzes user intent and conversation context
- Generates meta-awareness notes
- "What is the user really asking?"
2. **Reasoning** (`reasoning.py`) - Configurable LLM via HTTP
- Retrieves short-term context from Intake
- Creates initial draft answer
- Integrates context, reflection notes, and user prompt
3. **Refinement** (`refine.py`) - Configurable LLM via HTTP
- Polishes the draft answer
- Improves clarity and coherence
- Ensures factual consistency
4. **Persona** (`speak.py`) - Configurable LLM via HTTP
- Applies Lyra's personality and speaking style
- Natural, conversational output
- Final answer returned to user
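Conceptually, the pipeline chains the four stages as async calls. The sketch below is illustrative, with stubbed stages and assumed signatures that mirror the stage modules named above:

```python
# Conceptual 4-stage chain with stubbed stages (real versions call LLMs over HTTP).
import asyncio

async def get_summary(session_id: str) -> str:   # stand-in for the IntakeClient call
    return f"(summary for {session_id})"

async def reflect(prompt: str, context: str) -> str:
    return "user wants a status overview"        # stage 1: meta-awareness notes

async def reason(prompt: str, context: str, notes: str) -> str:
    return f"Draft answer to: {prompt}"          # stage 2: initial draft (LLM)

async def refine(draft: str) -> str:
    return draft + " (refined)"                  # stage 3: clarity/consistency pass (LLM)

async def speak(refined: str) -> str:
    return refined + " (in Lyra's voice)"        # stage 4: persona applied (LLM)

async def reason_pipeline(prompt: str, session_id: str) -> str:
    context = await get_summary(session_id)
    notes = await reflect(prompt, context)
    draft = await reason(prompt, context, notes)
    return await speak(await refine(draft))

print(asyncio.run(reason_pipeline("Hello Lyra!", "test")))
```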
---
## Features
### Core Services
**Relay**:
- Main orchestrator and message router
- OpenAI-compatible endpoint: `POST /v1/chat/completions`
- Internal endpoint: `POST /chat`
- Health check: `GET /_health`
- Async non-blocking calls to Cortex and Intake
- Shared request handler for code reuse
- Comprehensive error handling
**NeoMem (Memory Engine)**:
- Forked from Mem0 OSS - fully independent
- Drop-in compatible API (`/memories`, `/search`)
- Local-first: runs on FastAPI with Postgres + Neo4j
- No external SDK dependencies
- Semantic memory updates - compares embeddings and performs in-place updates
- Default service: `neomem-api` (port 7077)
**UI**:
- Lightweight static HTML chat interface
- Cyberpunk theme
- Session save/load functionality
- OpenAI message format support
### Reasoning Layer
**Cortex** (v0.5):
- Multi-stage reasoning pipeline (reflection → reasoning → refine → persona)
- Flexible LLM backend routing via HTTP
- Per-stage backend selection
- Async processing throughout
- IntakeClient integration for short-term context
- `/reason`, `/ingest` (stub), `/health` endpoints
**Intake** (v0.2):
- Simplified single-level summarization
- Session-based circular buffer (200 exchanges max)
- Background async summarization
- Automatic NeoMem push
- No persistent log files (memory-only)
- **Breaking change from v0.1**: Removed cascading summaries (L1, L2, L5, L10, L20, L30)
**LLM Router**:
- Dynamic backend selection via HTTP
- Environment-driven configuration
- Support for vLLM, Ollama, OpenAI, custom endpoints
- Per-module backend preferences
## Beta Lyrae (RAG Memory DB) - added 11-3-25
**RAG Knowledge DB - Beta Lyrae (sheliak)**

This module implements the **Retrieval-Augmented Generation (RAG)** layer for Project Lyra. It serves as the long-term searchable memory store that Cortex and Relay can query for relevant context before reasoning or response generation.
The system uses:
- **ChromaDB** for persistent vector storage
- **OpenAI Embeddings (`text-embedding-3-small`)** for semantic similarity
- **FastAPI** (port 7090) for the `/rag/search` REST endpoint
**Directory layout:**
```
rag/
├── rag_chat_import.py   # imports JSON chat logs
├── rag_docs_import.py   # (planned) PDF/EPUB/manual importer
├── rag_build.py         # legacy single-folder builder
├── rag_query.py         # command-line query helper
├── rag_api.py           # FastAPI service providing /rag/search
├── chromadb/            # persistent vector store
├── chatlogs/            # organized source data
│   ├── poker/
│   ├── work/
│   ├── lyra/
│   ├── personal/
│   └── ...
└── import.log           # progress log for batch runs
```
**OpenAI chatlog importer** (`rag_chat_import.py`):
- Takes JSON-formatted chat logs and imports them into the RAG store.
- **Features include:**
  - Recursive folder indexing with **category detection** from the directory name
  - Smart chunking for long messages (5,000 chars per slice)
  - Automatic deduplication using a SHA-1 hash of file + chunk
  - Timestamps for both file modification and import time
  - Full progress logging via tqdm
  - Safe to run in the background with `nohup … &`
Metadata per chunk:
```json
{
  "chat_id": "<sha1 of filename>",
  "chunk_index": 0,
  "source": "chatlogs/lyra/0002_cortex_LLMs_11-1-25.json",
  "title": "cortex LLMs 11-1-25",
  "role": "assistant",
  "category": "lyra",
  "type": "chat",
  "file_modified": "2025-11-06T23:41:02",
  "imported_at": "2025-11-07T03:55:00Z"
}
```
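The deduplication scheme described above can be sketched as follows; the exact hashing recipe is an assumption based on the "SHA-1 hash of file + chunk" description:

```python
# Sketch of chunking + SHA-1 dedup IDs (exact recipe assumed from the description above).
import hashlib

CHUNK_SIZE = 5000  # characters per slice

def chunk_records(path: str, text: str) -> list[tuple[str, str]]:
    chat_id = hashlib.sha1(path.encode()).hexdigest()  # "<sha1 of filename>"
    chunks = [text[i:i + CHUNK_SIZE] for i in range(0, len(text), CHUNK_SIZE)]
    # Hashing file identity + chunk content lets re-imports of identical data be skipped.
    return [(hashlib.sha1(f"{chat_id}:{c}".encode()).hexdigest(), c) for c in chunks]

print(chunk_records("chatlogs/lyra/example.json", "x" * 12000)[0][0])
```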
---
## Docker Deployment
All services run in a single docker-compose stack with the following containers:
- **neomem-postgres** - PostgreSQL with pgvector extension (port 5432)
- **neomem-neo4j** - Neo4j graph database (ports 7474, 7687)
- **neomem-api** - NeoMem memory service (port 7077)
- **relay** - Main orchestrator (port 7078)
- **cortex** - Reasoning engine (port 7081)
- **intake** - Short-term memory summarization (port 7080) - currently disabled
- **rag** - RAG search service (port 7090) - currently disabled
All containers communicate via the `lyra_net` Docker bridge network.
## External LLM Services
The following LLM backends are accessed via HTTP (not part of docker-compose):
- **vLLM Server** (`http://10.0.0.43:8000`)
- AMD MI50 GPU-accelerated inference
- Custom ROCm-enabled vLLM build
- Primary backend for reasoning and refinement stages
- **Ollama Server** (`http://10.0.0.3:11434`)
- RTX 3090 GPU-accelerated inference
- Secondary/configurable backend
- Model: qwen2.5:7b-instruct-q4_K_M
- **OpenAI API** (`https://api.openai.com/v1`)
- Cloud-based inference
- Used for reflection and persona stages
- Model: gpt-4o-mini
- **Fallback Server** (`http://10.0.0.41:11435`)
- Emergency backup endpoint
- Local llama-3.2-8b-instruct model
---
## Version History
### v0.5.0 (2025-11-28) - Current Release
- ✅ Fixed all critical API wiring issues
- ✅ Added OpenAI-compatible endpoint to Relay (`/v1/chat/completions`)
- ✅ Fixed Cortex → Intake integration
- ✅ Added missing Python package `__init__.py` files
- ✅ End-to-end message flow verified and working
### v0.4.x (Major Rewire)
- Cortex multi-stage reasoning pipeline
- Intake v0.2 simplification
- LLM router with multi-backend support
- Major architectural restructuring
### v0.3.x
- Beta Lyrae RAG system
- NeoMem integration
- Basic Cortex reasoning loop
---
## Known Issues (v0.5.0)
### Non-Critical
- Session management endpoints not fully implemented in Relay
- Intake service currently disabled in docker-compose.yml
- RAG service currently disabled in docker-compose.yml
- Cortex `/ingest` endpoint is a stub
### Future Enhancements
- Re-enable RAG service integration
- Implement full session persistence
- Add request correlation IDs for tracing
- Comprehensive health checks
---
## Quick Start
### Prerequisites
- Docker + Docker Compose
- At least one HTTP-accessible LLM endpoint (vLLM, Ollama, or OpenAI API key)
### Setup
1. Copy `.env.example` to `.env` and configure your LLM backend URLs and API keys.
2. Start all services with docker-compose:
   ```bash
   docker-compose up -d
   ```
3. Check service health:
   ```bash
   curl http://localhost:7078/_health
   ```
4. Access the UI at `http://localhost:7078`.
### Test
```bash
curl -X POST http://localhost:7078/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Hello Lyra!"}],
    "session_id": "test"
  }'
```
All backend databases (PostgreSQL and Neo4j) are automatically started as part of the docker-compose stack.
---
## Documentation
- See [CHANGELOG.md](CHANGELOG.md) for detailed version history
- See `ENVIRONMENT_VARIABLES.md` for environment variable reference
- Additional information available in the Trilium docs
---
## License
NeoMem is a derivative work based on Mem0 OSS (Apache 2.0).
© 2025 Terra-Mechanics / ServersDown Labs. All modifications released under Apache 2.0.
**Built with Claude Code**
---
## Integration Notes
- NeoMem API is compatible with Mem0 OSS endpoints (`/memories`, `/search`)
- All services communicate via Docker internal networking on the `lyra_net` bridge
- History and entity graphs are managed via PostgreSQL + Neo4j
- LLM backends are accessed via HTTP and configured in `.env`
---
## Beta Lyrae - RAG Memory System (Currently Disabled)
**Note:** The RAG service is currently disabled in docker-compose.yml
### Requirements
- Python 3.10+
- Dependencies: `chromadb openai tqdm python-dotenv fastapi uvicorn`
- Persistent storage: `./chromadb` or `/mnt/data/lyra_rag_db`
### Setup
1. Import chat logs (must be in OpenAI message format):
   ```bash
   python3 rag/rag_chat_import.py
   ```
2. Build and start the RAG API server:
   ```bash
   cd rag
   python3 rag_build.py
   uvicorn rag_api:app --host 0.0.0.0 --port 7090
   ```
3. Query the RAG system:
   ```bash
   curl -X POST http://127.0.0.1:7090/rag/search \
     -H "Content-Type: application/json" \
     -d '{
       "query": "What is the current state of Cortex?",
       "where": {"category": "lyra"}
     }'
   ```
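For reference, a minimal sketch of what a `/rag/search` service like `rag_api.py` can look like with ChromaDB + FastAPI. The collection name and response shape are assumptions, not the actual implementation:

```python
# Minimal /rag/search sketch (collection name and response shape are assumptions).
import os

import chromadb
from chromadb.utils import embedding_functions
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
client = chromadb.PersistentClient(path="./chromadb")
embed = embedding_functions.OpenAIEmbeddingFunction(
    api_key=os.environ["OPENAI_API_KEY"], model_name="text-embedding-3-small"
)
collection = client.get_or_create_collection("chatlogs", embedding_function=embed)

class SearchRequest(BaseModel):
    query: str
    where: dict | None = None  # optional metadata filter, e.g. {"category": "lyra"}
    n_results: int = 5

@app.post("/rag/search")
def search(req: SearchRequest):
    # Embed the query and return the nearest chunks plus their metadata.
    res = collection.query(query_texts=[req.query], n_results=req.n_results, where=req.where)
    return {"documents": res["documents"][0], "metadatas": res["metadatas"][0]}
```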