# Project Lyra - README v0.5.0

Lyra is a modular persistent AI companion system with advanced reasoning capabilities. It provides memory-backed chat using **NeoMem** + **Relay** + **Cortex**, with a multi-stage reasoning pipeline powered by distributed LLM backends.

## Mission Statement

The point of Project Lyra is to give an AI chatbot more abilities than a typical chatbot. Typical chatbots are essentially amnesic and forget everything about your project. Lyra helps keep projects organized and remembers everything you have done. Think of her abilities as a notepad/schedule/database/co-creator/collaborator, all with its own executive function. Say something in passing and Lyra remembers it, then reminds you of it later.

---
## Architecture Overview

Project Lyra operates as a set of Docker containers networked together in a microservices architecture. Just as the brain has regions, Lyra has modules:

### A. VM 100 - lyra-core (Core Services)

**1. Relay** (Node.js/Express) - Port 7078
- Main orchestrator and message router
- Coordinates all module interactions
- OpenAI-compatible endpoint: `POST /v1/chat/completions`
- Internal endpoint: `POST /chat`
- Routes messages through the Cortex reasoning pipeline
- Manages async calls to Intake and NeoMem

**2. UI** (Static HTML)
- Browser-based chat interface with a cyberpunk theme
- Connects to Relay at `http://10.0.0.40:7078`
- Saves and loads sessions
- OpenAI-compatible message format

**3. NeoMem** (Python/FastAPI) - Port 7077
- Long-term memory database (fork of Mem0 OSS)
- Vector storage (PostgreSQL + pgvector) + graph storage (Neo4j)
- RESTful API: `/memories`, `/search` (client sketch below)
- Semantic memory updates and retrieval
- No external SDK dependencies - fully local

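Since NeoMem keeps Mem0's drop-in API shape, a client call might look like the following minimal sketch. The exact request fields (`messages`, `user_id`, `query`) are assumptions borrowed from Mem0 OSS, not a documented contract:

```python
# Hypothetical NeoMem client calls - field names are assumptions
# based on Mem0 OSS, not the verified schema for this fork.
import requests

NEOMEM_URL = "http://localhost:7077"

# Store a new memory for a user
resp = requests.post(f"{NEOMEM_URL}/memories", json={
    "messages": [{"role": "user", "content": "I prefer dark mode."}],
    "user_id": "demo-user",
})
print(resp.json())

# Semantic search over stored memories
resp = requests.post(f"{NEOMEM_URL}/search", json={
    "query": "What UI theme does the user like?",
    "user_id": "demo-user",
})
print(resp.json())
```
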
### B. VM 101 - lyra-cortex (Reasoning Layer)

**4. Cortex** (Python/FastAPI) - Port 7081
- Primary reasoning engine with a multi-stage pipeline
- **4-Stage Processing:**
  1. **Reflection** - Generates meta-awareness notes about the conversation
  2. **Reasoning** - Creates an initial draft answer using context
  3. **Refinement** - Polishes and improves the draft
  4. **Persona** - Applies Lyra's personality and speaking style
- Integrates with Intake for short-term context
- Flexible LLM router supporting multiple backends

**5. Intake v0.2** (Python/FastAPI) - Port 7080
- Simplified short-term memory summarization
- Session-based circular buffer (deque, maxlen=200)
- Single-level summarization (no cascading)
- Background async processing with FastAPI BackgroundTasks
- Pushes summaries to NeoMem automatically
- **API Endpoints** (client sketch below):
  - `POST /add_exchange` - Add a conversation exchange
  - `GET /summaries?session_id={id}` - Retrieve the session summary
  - `POST /close_session/{id}` - Close and clean up a session

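A minimal client-side sketch of those three endpoints; the JSON field names (`user`, `assistant`, etc.) are assumptions for illustration, not the verified schema:

```python
# Hypothetical Intake client calls - payload field names are
# assumptions; check the service code for the real schema.
import requests

INTAKE_URL = "http://localhost:7080"
SESSION = "demo-session"

# Add one user/assistant exchange to the session buffer
requests.post(f"{INTAKE_URL}/add_exchange", json={
    "session_id": SESSION,
    "user": "Remind me to back up the NAS tomorrow.",
    "assistant": "Noted - I'll keep that in mind.",
})

# Fetch the current rolling summary for the session
summary = requests.get(f"{INTAKE_URL}/summaries",
                       params={"session_id": SESSION}).json()
print(summary)

# Close the session and free its buffer
requests.post(f"{INTAKE_URL}/close_session/{SESSION}")
```
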
### C. LLM Backends (Remote/Local APIs)

**Multi-Backend Strategy:**
- **PRIMARY**: vLLM on AMD MI50 GPU (`http://10.0.0.43:8000`) - Cortex reasoning, Intake
- **SECONDARY**: Ollama on RTX 3090 (`http://10.0.0.3:11434`) - Configurable per-module
- **CLOUD**: OpenAI API (`https://api.openai.com/v1`) - Cortex persona layer
- **FALLBACK**: Local backup (`http://10.0.0.41:11435`) - Emergency fallback

---

## Data Flow Architecture (v0.5.0)

### Normal Message Flow:

```
User (UI) → POST /v1/chat/completions
        ↓
Relay (7078)
        ↓ POST /reason
Cortex (7081)
        ↓ GET /summaries?session_id=xxx
Intake (7080) [RETURNS SUMMARY]
        ↓
Cortex processes (4 stages):
  1. reflection.py    → meta-awareness notes
  2. reasoning.py     → draft answer (uses LLM)
  3. refine.py        → refined answer (uses LLM)
  4. persona/speak.py → Lyra personality (uses LLM)
        ↓
Returns persona answer to Relay
        ↓
Relay → Cortex /ingest (async, stub)
Relay → Intake /add_exchange (async)
        ↓
Intake → Background summarize → NeoMem
        ↓
Relay → UI (returns final response)
```

### Cortex 4-Stage Reasoning Pipeline:

1. **Reflection** (`reflection.py`) - Cloud backend (OpenAI)
   - Analyzes user intent and conversation context
   - Generates meta-awareness notes
   - "What is the user really asking?"

2. **Reasoning** (`reasoning.py`) - Primary backend (vLLM)
   - Retrieves short-term context from Intake
   - Creates the initial draft answer
   - Integrates context, reflection notes, and the user prompt

3. **Refinement** (`refine.py`) - Primary backend (vLLM)
   - Polishes the draft answer
   - Improves clarity and coherence
   - Ensures factual consistency

4. **Persona** (`speak.py`) - Cloud backend (OpenAI)
   - Applies Lyra's personality and speaking style
   - Natural, conversational output
   - Final answer returned to user

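To make the stage hand-offs concrete, here is a condensed sketch of how the four calls might chain. The `call_llm` helper, prompt wording, and backend parameters are illustrative assumptions, not Cortex's actual source:

```python
# Illustrative sketch of the 4-stage chain - helper names and
# prompt text are assumptions, not the actual Cortex code.
import httpx

async def call_llm(backend: str, prompt: str) -> str:
    """Send a prompt to an OpenAI-compatible /v1/chat/completions endpoint."""
    async with httpx.AsyncClient() as client:
        r = await client.post(f"{backend}/v1/chat/completions", json={
            "messages": [{"role": "user", "content": prompt}],
        })
        return r.json()["choices"][0]["message"]["content"]

async def reason(user_msg: str, summary: str, cloud: str, primary: str) -> str:
    # 1. Reflection (cloud): what is the user really asking?
    notes = await call_llm(cloud, f"Analyze intent:\n{user_msg}")
    # 2. Reasoning (primary): draft an answer from context + notes
    draft = await call_llm(
        primary, f"Context: {summary}\nNotes: {notes}\nAnswer: {user_msg}")
    # 3. Refinement (primary): polish the draft
    refined = await call_llm(primary, f"Improve clarity and consistency:\n{draft}")
    # 4. Persona (cloud): apply Lyra's voice before returning to Relay
    return await call_llm(cloud, f"Rewrite in Lyra's voice:\n{refined}")
```
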
---

## Features

### Lyra-Core (VM 100)

**Relay**:
- Main orchestrator and message router
- OpenAI-compatible endpoint: `POST /v1/chat/completions`
- Internal endpoint: `POST /chat`
- Health check: `GET /_health`
- Async non-blocking calls to Cortex and Intake
- Shared request handler for code reuse
- Comprehensive error handling

**NeoMem (Memory Engine)**:
- Forked from Mem0 OSS - fully independent
- Drop-in compatible API (`/memories`, `/search`)
- Local-first: runs on FastAPI with Postgres + Neo4j
- No external SDK dependencies
- Semantic memory updates - compares embeddings and performs in-place updates (sketch below)
- Default service: `neomem-api` (port 7077)

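The in-place update can be pictured as an embedding-similarity test: if a new memory is close enough to an existing one, edit it rather than insert a duplicate. A conceptual sketch, with an assumed similarity threshold and helper names rather than NeoMem's real internals:

```python
# Conceptual sketch of semantic in-place updates - the threshold and
# helpers are assumptions, not NeoMem's actual implementation.
import numpy as np

SIMILARITY_THRESHOLD = 0.85  # assumed cutoff for "same memory"

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def upsert_memory(new_text: str, new_vec: np.ndarray, existing: list[dict]) -> str:
    """Update the closest existing memory in place, or insert a new one."""
    best = max(existing, key=lambda m: cosine(new_vec, m["vec"]), default=None)
    if best is not None and cosine(new_vec, best["vec"]) >= SIMILARITY_THRESHOLD:
        best["text"] = new_text  # semantic match: edit in place
        return "updated"
    existing.append({"text": new_text, "vec": new_vec})  # otherwise add
    return "added"
```
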

**UI**:
- Lightweight static HTML chat interface
- Cyberpunk theme
- Session save/load functionality
- OpenAI message format support

### Cortex (VM 101)

**Cortex** (v0.5):
- Multi-stage reasoning pipeline (reflection → reasoning → refine → persona)
- Flexible LLM backend routing
- Per-stage backend selection
- Async processing throughout
- IntakeClient integration for short-term context
- `/reason`, `/ingest` (stub), `/health` endpoints

**Intake** (v0.2):
- Simplified single-level summarization (buffer sketch below)
- Session-based circular buffer (200 exchanges max)
- Background async summarization
- Automatic NeoMem push
- No persistent log files (memory-only)
- **Breaking change from v0.1**: Removed cascading summaries (L1, L2, L5, L10, L20, L30)

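A minimal sketch of that buffer-plus-background-task pattern, assuming a plain dict of deques and a stubbed summarizer; the route and payload shapes are illustrative, not the actual service code:

```python
# Minimal sketch of Intake's buffer pattern - route shapes and the
# summarize/push helpers are assumptions, not the actual service code.
from collections import defaultdict, deque

from fastapi import BackgroundTasks, FastAPI

app = FastAPI()
# One circular buffer per session; old exchanges fall off past 200
buffers: dict[str, deque] = defaultdict(lambda: deque(maxlen=200))

def summarize_and_push(session_id: str) -> None:
    exchanges = list(buffers[session_id])
    summary = f"{len(exchanges)} exchanges (LLM summary goes here)"  # stub
    # ...POST the summary to NeoMem's /memories endpoint here...

@app.post("/add_exchange")
async def add_exchange(payload: dict, background_tasks: BackgroundTasks):
    buffers[payload["session_id"]].append(payload)
    # Summarize off the request path so the caller isn't blocked
    background_tasks.add_task(summarize_and_push, payload["session_id"])
    return {"ok": True}
```
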
**LLM Router**:

- Dynamic backend selection
- Environment-driven configuration (sketch below)
- Support for vLLM, Ollama, OpenAI, and custom endpoints
- Per-module backend preferences

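A sketch of what environment-driven, per-module selection can look like; the variable names and defaults here are assumptions for illustration, not the router's actual configuration keys:

```python
# Sketch of env-driven backend routing - the variable names
# (CORTEX_REASONING_BACKEND, etc.) are assumptions for illustration.
import os

DEFAULTS = {
    "reasoning": "http://10.0.0.43:8000",      # PRIMARY: vLLM
    "persona":   "https://api.openai.com/v1",  # CLOUD: OpenAI
    "intake":    "http://10.0.0.43:8000",      # PRIMARY: vLLM
}

def backend_for(module: str) -> str:
    """Pick a backend URL per module, overridable via environment."""
    env_key = f"CORTEX_{module.upper()}_BACKEND"
    return os.environ.get(env_key,
                          DEFAULTS.get(module, "http://10.0.0.41:11435"))  # FALLBACK

# e.g. backend_for("reasoning") -> vLLM unless CORTEX_REASONING_BACKEND overrides it
```
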
## Beta Lyrae (RAG Memory DB) - added 11-3-25

- **RAG Knowledge DB - Beta Lyrae (Sheliak)**
- Future: sends summaries → Cortex for reflection

---

## Version History

### v0.5.0 (2025-11-28) - Current Release
- ✅ Fixed all critical API wiring issues
- ✅ Added OpenAI-compatible endpoint to Relay (`/v1/chat/completions`)
- ✅ Fixed Cortex → Intake integration
- ✅ Added missing Python package `__init__.py` files
- ✅ End-to-end message flow verified and working

### v0.4.x (Major Rewire)
- Cortex multi-stage reasoning pipeline
- Intake v0.2 simplification
- LLM router with multi-backend support
- Major architectural restructuring

### v0.3.x
- Beta Lyrae RAG system
- NeoMem integration
- Basic Cortex reasoning loop

---

## Known Issues (v0.5.0)

### Non-Critical
- Session management endpoints not fully implemented in Relay
- RAG service currently disabled in docker-compose.yml
- Cortex `/ingest` endpoint is a stub

### Future Enhancements
- Re-enable RAG service integration
- Implement full session persistence
- Add request correlation IDs for tracing
- Comprehensive health checks

---

## Quick Start

### Prerequisites
- Docker + Docker Compose
- PostgreSQL 13+, Neo4j 4.4+ (for NeoMem)
- At least one LLM API endpoint (vLLM, Ollama, or OpenAI)

### Setup
1. Configure environment variables in `.env` files
2. Start services: `docker-compose up -d`
3. Check health: `curl http://localhost:7078/_health`
4. Access UI: `http://localhost:7078`

### Test
```bash
curl -X POST http://localhost:7078/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Hello Lyra!"}],
    "session_id": "test"
  }'
```

---

---
## Documentation

- See [CHANGELOG.md](CHANGELOG.md) for detailed version history
- See `ENVIRONMENT_VARIABLES.md` for environment variable reference
- Additional information available in the Trilium docs

---

## License

NeoMem is a derivative work based on Mem0 OSS (Apache 2.0).
© 2025 Terra-Mechanics / ServersDown Labs. All modifications released under Apache 2.0.

**Built with Claude Code**