# Project Lyra - README v0.5.0

Lyra is a modular persistent AI companion system with advanced reasoning capabilities. It provides memory-backed chat using **NeoMem** + **Relay** + **Cortex**, with a multi-stage reasoning pipeline powered by distributed LLM backends.

## Mission Statement

The point of Project Lyra is to give an AI chatbot more abilities than a typical chatbot. Typical chatbots are essentially amnesic and forget everything about your project. Lyra helps keep projects organized and remembers everything you have done. Think of her abilities as a notepad/schedule/database/co-creator/collaborator, all with its own executive function. Say something in passing and Lyra remembers it, then reminds you of it later.

---
## Architecture Overview

Project Lyra operates as a set of Docker containers networked together in a microservices architecture. Just as the brain has regions, Lyra has modules:

### A. VM 100 - lyra-core (Core Services)

**1. Relay** (Node.js/Express) - Port 7078
- Main orchestrator and message router
- Coordinates all module interactions
- OpenAI-compatible endpoint: `POST /v1/chat/completions`
- Internal endpoint: `POST /chat`
- Routes messages through the Cortex reasoning pipeline
- Manages async calls to Intake and NeoMem

**2. UI** (Static HTML)
- Browser-based chat interface with a cyberpunk theme
- Connects to Relay at `http://10.0.0.40:7078`
- Saves and loads sessions
- OpenAI-compatible message format

**3. NeoMem** (Python/FastAPI) - Port 7077
- Long-term memory database (fork of Mem0 OSS)
- Vector storage (PostgreSQL + pgvector) + graph storage (Neo4j)
- RESTful API: `/memories`, `/search` (client sketch below)
- Semantic memory updates and retrieval
- No external SDK dependencies - fully local

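Since NeoMem keeps Mem0's drop-in API shape, a client call might look like the following minimal sketch. The exact request fields (`messages`, `user_id`, `query`) are assumptions borrowed from Mem0 OSS, not a documented contract:

```python
# Hypothetical NeoMem client calls - field names are assumptions
# based on Mem0 OSS, not the verified schema for this fork.
import requests

NEOMEM_URL = "http://localhost:7077"

# Store a new memory for a user
resp = requests.post(f"{NEOMEM_URL}/memories", json={
    "messages": [{"role": "user", "content": "I prefer dark mode."}],
    "user_id": "demo-user",
})
print(resp.json())

# Semantic search over stored memories
resp = requests.post(f"{NEOMEM_URL}/search", json={
    "query": "What UI theme does the user like?",
    "user_id": "demo-user",
})
print(resp.json())
```
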
### B. VM 101 - lyra-cortex (Reasoning Layer)

**4. Cortex** (Python/FastAPI) - Port 7081
- Primary reasoning engine with a multi-stage pipeline
- **4-Stage Processing:**
  1. **Reflection** - Generates meta-awareness notes about the conversation
  2. **Reasoning** - Creates an initial draft answer using context
  3. **Refinement** - Polishes and improves the draft
  4. **Persona** - Applies Lyra's personality and speaking style
- Integrates with Intake for short-term context
- Flexible LLM router supporting multiple backends

**5. Intake v0.2** (Python/FastAPI) - Port 7080
- Simplified short-term memory summarization
- Session-based circular buffer (deque, maxlen=200)
- Single-level summarization (no cascading)
- Background async processing with FastAPI BackgroundTasks
- Pushes summaries to NeoMem automatically
- **API Endpoints** (client sketch below):
  - `POST /add_exchange` - Add a conversation exchange
  - `GET /summaries?session_id={id}` - Retrieve the session summary
  - `POST /close_session/{id}` - Close and clean up a session

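A minimal client-side sketch of those three endpoints; the JSON field names (`user`, `assistant`, etc.) are assumptions for illustration, not the verified schema:

```python
# Hypothetical Intake client calls - payload field names are
# assumptions; check the service code for the real schema.
import requests

INTAKE_URL = "http://localhost:7080"
SESSION = "demo-session"

# Add one user/assistant exchange to the session buffer
requests.post(f"{INTAKE_URL}/add_exchange", json={
    "session_id": SESSION,
    "user": "Remind me to back up the NAS tomorrow.",
    "assistant": "Noted - I'll keep that in mind.",
})

# Fetch the current rolling summary for the session
summary = requests.get(f"{INTAKE_URL}/summaries",
                       params={"session_id": SESSION}).json()
print(summary)

# Close the session and free its buffer
requests.post(f"{INTAKE_URL}/close_session/{SESSION}")
```
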
### C. LLM Backends (Remote/Local APIs)

**Multi-Backend Strategy:**
- **PRIMARY**: vLLM on AMD MI50 GPU (`http://10.0.0.43:8000`) - Cortex reasoning, Intake
- **SECONDARY**: Ollama on RTX 3090 (`http://10.0.0.3:11434`) - Configurable per-module
- **CLOUD**: OpenAI API (`https://api.openai.com/v1`) - Cortex persona layer
- **FALLBACK**: Local backup (`http://10.0.0.41:11435`) - Emergency fallback

---

## Data Flow Architecture (v0.5.0)

### Normal Message Flow:

```
User (UI) → POST /v1/chat/completions
        ↓
Relay (7078)
        ↓ POST /reason
Cortex (7081)
        ↓ GET /summaries?session_id=xxx
Intake (7080) [RETURNS SUMMARY]
        ↓
Cortex processes (4 stages):
  1. reflection.py    → meta-awareness notes
  2. reasoning.py     → draft answer (uses LLM)
  3. refine.py        → refined answer (uses LLM)
  4. persona/speak.py → Lyra personality (uses LLM)
        ↓
Returns persona answer to Relay
        ↓
Relay → Cortex /ingest (async, stub)
Relay → Intake /add_exchange (async)
        ↓
Intake → Background summarize → NeoMem
        ↓
Relay → UI (returns final response)
```

### Cortex 4-Stage Reasoning Pipeline:

1. **Reflection** (`reflection.py`) - Cloud backend (OpenAI)
   - Analyzes user intent and conversation context
   - Generates meta-awareness notes
   - "What is the user really asking?"

2. **Reasoning** (`reasoning.py`) - Primary backend (vLLM)
   - Retrieves short-term context from Intake
   - Creates the initial draft answer
   - Integrates context, reflection notes, and the user prompt

3. **Refinement** (`refine.py`) - Primary backend (vLLM)
   - Polishes the draft answer
   - Improves clarity and coherence
   - Ensures factual consistency

4. **Persona** (`speak.py`) - Cloud backend (OpenAI)
   - Applies Lyra's personality and speaking style
   - Natural, conversational output
   - Final answer returned to user

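To make the stage hand-offs concrete, here is a condensed sketch of how the four calls might chain. The `call_llm` helper, prompt wording, and backend parameters are illustrative assumptions, not Cortex's actual source:

```python
# Illustrative sketch of the 4-stage chain - helper names and
# prompt text are assumptions, not the actual Cortex code.
import httpx

async def call_llm(backend: str, prompt: str) -> str:
    """Send a prompt to an OpenAI-compatible /v1/chat/completions endpoint."""
    async with httpx.AsyncClient() as client:
        r = await client.post(f"{backend}/v1/chat/completions", json={
            "messages": [{"role": "user", "content": prompt}],
        })
        return r.json()["choices"][0]["message"]["content"]

async def reason(user_msg: str, summary: str, cloud: str, primary: str) -> str:
    # 1. Reflection (cloud): what is the user really asking?
    notes = await call_llm(cloud, f"Analyze intent:\n{user_msg}")
    # 2. Reasoning (primary): draft an answer from context + notes
    draft = await call_llm(
        primary, f"Context: {summary}\nNotes: {notes}\nAnswer: {user_msg}")
    # 3. Refinement (primary): polish the draft
    refined = await call_llm(primary, f"Improve clarity and consistency:\n{draft}")
    # 4. Persona (cloud): apply Lyra's voice before returning to Relay
    return await call_llm(cloud, f"Rewrite in Lyra's voice:\n{refined}")
```
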
---

## Features

### Lyra-Core (VM 100)

**Relay**:
- Main orchestrator and message router
- OpenAI-compatible endpoint: `POST /v1/chat/completions`
- Internal endpoint: `POST /chat`
- Health check: `GET /_health`
- Async non-blocking calls to Cortex and Intake
- Shared request handler for code reuse
- Comprehensive error handling

**NeoMem (Memory Engine)**:
- Forked from Mem0 OSS - fully independent
- Drop-in compatible API (`/memories`, `/search`)
- Local-first: runs on FastAPI with Postgres + Neo4j
- No external SDK dependencies
- Semantic memory updates - compares embeddings and performs in-place updates (sketch below)
- Default service: `neomem-api` (port 7077)

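The in-place update can be pictured as an embedding-similarity test: if a new memory is close enough to an existing one, edit it rather than insert a duplicate. A conceptual sketch, with an assumed similarity threshold and helper names rather than NeoMem's real internals:

```python
# Conceptual sketch of semantic in-place updates - the threshold and
# helpers are assumptions, not NeoMem's actual implementation.
import numpy as np

SIMILARITY_THRESHOLD = 0.85  # assumed cutoff for "same memory"

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def upsert_memory(new_text: str, new_vec: np.ndarray, existing: list[dict]) -> str:
    """Update the closest existing memory in place, or insert a new one."""
    best = max(existing, key=lambda m: cosine(new_vec, m["vec"]), default=None)
    if best is not None and cosine(new_vec, best["vec"]) >= SIMILARITY_THRESHOLD:
        best["text"] = new_text  # semantic match: edit in place
        return "updated"
    existing.append({"text": new_text, "vec": new_vec})  # otherwise add
    return "added"
```
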

**UI**:
- Lightweight static HTML chat interface
- Cyberpunk theme
- Session save/load functionality
- OpenAI message format support

### Cortex (VM 101)

**Cortex** (v0.5):
- Multi-stage reasoning pipeline (reflection → reasoning → refine → persona)
- Flexible LLM backend routing
- Per-stage backend selection
- Async processing throughout
- IntakeClient integration for short-term context
- `/reason`, `/ingest` (stub), `/health` endpoints

**Intake** (v0.2):
- Simplified single-level summarization (buffer sketch below)
- Session-based circular buffer (200 exchanges max)
- Background async summarization
- Automatic NeoMem push
- No persistent log files (memory-only)
- **Breaking change from v0.1**: Removed cascading summaries (L1, L2, L5, L10, L20, L30)

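A minimal sketch of that buffer-plus-background-task pattern, assuming a plain dict of deques and a stubbed summarizer; the route and payload shapes are illustrative, not the actual service code:

```python
# Minimal sketch of Intake's buffer pattern - route shapes and the
# summarize/push helpers are assumptions, not the actual service code.
from collections import defaultdict, deque

from fastapi import BackgroundTasks, FastAPI

app = FastAPI()
# One circular buffer per session; old exchanges fall off past 200
buffers: dict[str, deque] = defaultdict(lambda: deque(maxlen=200))

def summarize_and_push(session_id: str) -> None:
    exchanges = list(buffers[session_id])
    summary = f"{len(exchanges)} exchanges (LLM summary goes here)"  # stub
    # ...POST the summary to NeoMem's /memories endpoint here...

@app.post("/add_exchange")
async def add_exchange(payload: dict, background_tasks: BackgroundTasks):
    buffers[payload["session_id"]].append(payload)
    # Summarize off the request path so the caller isn't blocked
    background_tasks.add_task(summarize_and_push, payload["session_id"])
    return {"ok": True}
```
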
**LLM Router**:

- Dynamic backend selection
- Environment-driven configuration (sketch below)
- Support for vLLM, Ollama, OpenAI, and custom endpoints
- Per-module backend preferences

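A sketch of what environment-driven, per-module selection can look like; the variable names and defaults here are assumptions for illustration, not the router's actual configuration keys:

```python
# Sketch of env-driven backend routing - the variable names
# (CORTEX_REASONING_BACKEND, etc.) are assumptions for illustration.
import os

DEFAULTS = {
    "reasoning": "http://10.0.0.43:8000",      # PRIMARY: vLLM
    "persona":   "https://api.openai.com/v1",  # CLOUD: OpenAI
    "intake":    "http://10.0.0.43:8000",      # PRIMARY: vLLM
}

def backend_for(module: str) -> str:
    """Pick a backend URL per module, overridable via environment."""
    env_key = f"CORTEX_{module.upper()}_BACKEND"
    return os.environ.get(env_key,
                          DEFAULTS.get(module, "http://10.0.0.41:11435"))  # FALLBACK

# e.g. backend_for("reasoning") -> vLLM unless CORTEX_REASONING_BACKEND overrides it
```
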
## Beta Lyrae (RAG Memory DB) - added 11-3-25

- **RAG Knowledge DB - Beta Lyrae (Sheliak)**
- Future: sends summaries → Cortex for reflection

---

## Version History

### v0.5.0 (2025-11-28) - Current Release
- ✅ Fixed all critical API wiring issues
- ✅ Added OpenAI-compatible endpoint to Relay (`/v1/chat/completions`)
- ✅ Fixed Cortex → Intake integration
- ✅ Added missing Python package `__init__.py` files
- ✅ End-to-end message flow verified and working

### v0.4.x (Major Rewire)
- Cortex multi-stage reasoning pipeline
- Intake v0.2 simplification
- LLM router with multi-backend support
- Major architectural restructuring

### v0.3.x
- Beta Lyrae RAG system
- NeoMem integration
- Basic Cortex reasoning loop

---

## Known Issues (v0.5.0)

### Non-Critical
- Session management endpoints not fully implemented in Relay
- RAG service currently disabled in docker-compose.yml
- Cortex `/ingest` endpoint is a stub

### Future Enhancements
- Re-enable RAG service integration
- Implement full session persistence
- Add request correlation IDs for tracing
- Comprehensive health checks

---

## Quick Start

### Prerequisites
- Docker + Docker Compose
- PostgreSQL 13+, Neo4j 4.4+ (for NeoMem)
- At least one LLM API endpoint (vLLM, Ollama, or OpenAI)

### Setup
1. Configure environment variables in `.env` files
2. Start services: `docker-compose up -d`
3. Check health: `curl http://localhost:7078/_health`
4. Access UI: `http://localhost:7078`

### Test
```bash
curl -X POST http://localhost:7078/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Hello Lyra!"}],
    "session_id": "test"
  }'
```

---

---
## Documentation

- See [CHANGELOG.md](CHANGELOG.md) for detailed version history
- See `ENVIRONMENT_VARIABLES.md` for environment variable reference
- Additional information available in the Trilium docs

---

## License

NeoMem is a derivative work based on Mem0 OSS (Apache 2.0).
© 2025 Terra-Mechanics / ServersDown Labs. All modifications released under Apache 2.0.

**Built with Claude Code**