docs updated for v0.5.1
@@ -1,71 +1,925 @@
|
||||
# Lyra Core — Project Summary
|
||||
# Project Lyra — Comprehensive AI Context Summary
|
||||
|
||||
## v0.4 (2025-10-03)
|
||||
**Version:** v0.5.1 (2025-12-11)
|
||||
**Status:** Production-ready modular AI companion system
|
||||
**Purpose:** Memory-backed conversational AI with multi-stage reasoning, persistent context, and modular LLM backend architecture
|
||||
|
||||
### 🧠 High-Level Architecture
|
||||
- **Lyra Core (v0.3.1)** — Orchestration layer.
|
||||
- Accepts chat requests (`/v1/chat/completions`).
|
||||
- Routes through Cortex for subconscious annotation.
|
||||
- Stores everything in Mem0 (no discard).
|
||||
- Fetches persona + relevant memories.
|
||||
- Injects context back into LLM.
|
||||
---
|
||||
|
||||
- **Cortex (v0.3.0)** — Subconscious annotator.
|
||||
- Runs locally via `llama.cpp` (Phi-3.5-mini Q4_K_M).
|
||||
- Strict JSON schema:
|
||||
```json
|
||||
{
|
||||
"sentiment": "positive" | "neutral" | "negative",
|
||||
"novelty": 0.0–1.0,
|
||||
"tags": ["keyword", "keyword"],
|
||||
"notes": "short string"
|
||||
## Executive Summary
|
||||
|
||||
Project Lyra is a **self-hosted AI companion system** designed to overcome the limitations of typical chatbots by providing:
|
||||
- **Persistent long-term memory** (NeoMem: PostgreSQL + Neo4j graph storage)
|
||||
- **Multi-stage reasoning pipeline** (Cortex: reflection → reasoning → refinement → persona)
|
||||
- **Short-term context management** (Intake: session-based summarization embedded in Cortex)
|
||||
- **Flexible LLM backend routing** (supports llama.cpp, Ollama, OpenAI, custom endpoints)
|
||||
- **OpenAI-compatible API** (drop-in replacement for chat applications)
|
||||
|
||||
**Core Philosophy:** Like a human brain has different regions for different functions, Lyra has specialized modules that work together. She's not just a chatbot—she's a notepad, schedule, database, co-creator, and collaborator with her own executive function.
|
||||
|
||||
---
|
||||
|
||||
## Quick Context for AI Assistants
|
||||
|
||||
If you're an AI being given this project to work on, here's what you need to know:
|
||||
|
||||
### What This Project Does
|
||||
Lyra is a conversational AI system that **remembers everything** across sessions. When a user says something in passing, Lyra stores it, contextualizes it, and can recall it later. She can:
|
||||
- Track project progress over time
|
||||
- Remember user preferences and past conversations
|
||||
- Reason through complex questions using multiple LLM calls
|
||||
- Apply a consistent personality across all interactions
|
||||
- Integrate with multiple LLM backends (local and cloud)
|
||||
|
||||
### Current Architecture (v0.5.1)
|
||||
```
User → Relay (Express/Node.js, port 7078)
         ↓
       Cortex (FastAPI/Python, port 7081)
         ├─ Intake module (embedded, in-memory SESSIONS)
         ├─ 4-stage reasoning pipeline
         └─ Multi-backend LLM router
         ↓
       NeoMem (FastAPI/Python, port 7077)
         ├─ PostgreSQL (vector storage)
         └─ Neo4j (graph relationships)
```
|
||||
|
||||
### Key Files You'll Work With
|
||||
|
||||
**Backend Services:**
|
||||
- [cortex/router.py](cortex/router.py) - Main Cortex routing logic (306 lines, `/reason`, `/ingest` endpoints)
|
||||
- [cortex/intake/intake.py](cortex/intake/intake.py) - Short-term memory module (367 lines, SESSIONS management)
|
||||
- [cortex/reasoning/reasoning.py](cortex/reasoning/reasoning.py) - Draft answer generation
|
||||
- [cortex/reasoning/refine.py](cortex/reasoning/refine.py) - Answer refinement
|
||||
- [cortex/reasoning/reflection.py](cortex/reasoning/reflection.py) - Meta-awareness notes
|
||||
- [cortex/persona/speak.py](cortex/persona/speak.py) - Personality layer
|
||||
- [cortex/llm/llm_router.py](cortex/llm/llm_router.py) - LLM backend selector
|
||||
- [core/relay/server.js](core/relay/server.js) - Main orchestrator (Node.js)
|
||||
- [neomem/main.py](neomem/main.py) - Long-term memory API
|
||||
|
||||
**Configuration:**
|
||||
- [.env](.env) - Root environment variables (LLM backends, databases, API keys)
|
||||
- [cortex/.env](cortex/.env) - Cortex-specific overrides
|
||||
- [docker-compose.yml](docker-compose.yml) - Service definitions (152 lines)
|
||||
|
||||
**Documentation:**
|
||||
- [CHANGELOG.md](CHANGELOG.md) - Complete version history (836 lines, chronological format)
|
||||
- [README.md](README.md) - User-facing documentation (610 lines)
|
||||
- [PROJECT_SUMMARY.md](PROJECT_SUMMARY.md) - This file
|
||||
|
||||
### Recent Critical Fixes (v0.5.1)
|
||||
The most recent work fixed a critical bug where Intake's SESSIONS buffer wasn't persisting:
|
||||
1. **Fixed**: `bg_summarize()` was only a TYPE_CHECKING stub → implemented as logging stub
|
||||
2. **Fixed**: `/ingest` endpoint had unreachable code → removed early return, added lenient error handling
|
||||
3. **Added**: `cortex/intake/__init__.py` → proper Python package structure
|
||||
4. **Added**: Diagnostic endpoints `/debug/sessions` and `/debug/summary` for troubleshooting
|
||||
|
||||
**Key Insight**: Intake is no longer a standalone service—it's embedded in Cortex as a Python module. SESSIONS must persist in a single Uvicorn worker (no multi-worker support without Redis).
|
||||
|
||||
---
|
||||
|
||||
## Architecture Deep Dive
|
||||
|
||||
### Service Topology (Docker Compose)
|
||||
|
||||
**Active Containers:**
|
||||
1. **relay** (Node.js/Express, port 7078)
|
||||
- Entry point for all user requests
|
||||
- OpenAI-compatible `/v1/chat/completions` endpoint
|
||||
- Routes to Cortex for reasoning
|
||||
- Async calls to Cortex `/ingest` after response
|
||||
|
||||
2. **cortex** (Python/FastAPI, port 7081)
|
||||
- Multi-stage reasoning pipeline
|
||||
- Embedded Intake module (no HTTP, direct Python imports)
|
||||
- Endpoints: `/reason`, `/ingest`, `/health`, `/debug/sessions`, `/debug/summary`
|
||||
|
||||
3. **neomem-api** (Python/FastAPI, port 7077)
|
||||
- Long-term memory storage
|
||||
- Fork of Mem0 OSS (fully local, no external SDK)
|
||||
- Endpoints: `/memories`, `/search`, `/health`
|
||||
|
||||
4. **neomem-postgres** (PostgreSQL + pgvector, port 5432)
|
||||
- Vector embeddings storage
|
||||
- Memory history records
|
||||
|
||||
5. **neomem-neo4j** (Neo4j, ports 7474/7687)
|
||||
- Graph relationships between memories
|
||||
- Entity extraction and linking
|
||||
|
||||
**Disabled Services:**
|
||||
- `intake` - No longer needed (embedded in Cortex as of v0.5.1)
|
||||
- `rag` - Beta Lyrae RAG service (planned re-enablement)
|
||||
|
||||
### External LLM Backends (HTTP APIs)
|
||||
|
||||
**PRIMARY Backend** - llama.cpp @ `http://10.0.0.44:8080`
|
||||
- AMD MI50 GPU-accelerated inference
|
||||
- Model: `/model` (path-based routing)
|
||||
- Used for: Reasoning, refinement, summarization
|
||||
|
||||
**SECONDARY Backend** - Ollama @ `http://10.0.0.3:11434`
|
||||
- RTX 3090 GPU-accelerated inference
|
||||
- Model: `qwen2.5:7b-instruct-q4_K_M`
|
||||
- Used for: Configurable per-module
|
||||
|
||||
**CLOUD Backend** - OpenAI @ `https://api.openai.com/v1`
|
||||
- Cloud-based inference
|
||||
- Model: `gpt-4o-mini`
|
||||
- Used for: Reflection, persona layers
|
||||
|
||||
**FALLBACK Backend** - Local @ `http://10.0.0.41:11435`
|
||||
- CPU-based inference
|
||||
- Model: `llama-3.2-8b-instruct`
|
||||
- Used for: Emergency fallback
|
||||
|
||||
### Data Flow (Request Lifecycle)
|
||||
|
||||
```
|
||||
1. User sends message → Relay (/v1/chat/completions)
|
||||
↓
|
||||
2. Relay → Cortex (/reason)
|
||||
↓
|
||||
3. Cortex calls Intake module (internal Python)
|
||||
- Intake.summarize_context(session_id, exchanges)
|
||||
- Returns L1/L5/L10/L20/L30 summaries
|
||||
↓
|
||||
4. Cortex 4-stage pipeline:
|
||||
a. reflection.py → Meta-awareness notes (CLOUD backend)
|
||||
- "What is the user really asking?"
|
||||
- Returns JSON: {"notes": [...]}
|
||||
|
||||
b. reasoning.py → Draft answer (PRIMARY backend)
|
||||
- Uses context from Intake
|
||||
- Integrates reflection notes
|
||||
- Returns draft text
|
||||
|
||||
c. refine.py → Refined answer (PRIMARY backend)
|
||||
- Polishes draft for clarity
|
||||
- Ensures factual consistency
|
||||
- Returns refined text
|
||||
|
||||
d. speak.py → Persona layer (CLOUD backend)
|
||||
- Applies Lyra's personality
|
||||
- Natural, conversational tone
|
||||
- Returns final answer
|
||||
↓
|
||||
5. Cortex → Relay (returns persona answer)
|
||||
↓
|
||||
6. Relay → Cortex (/ingest) [async, non-blocking]
|
||||
- Sends (session_id, user_msg, assistant_msg)
|
||||
- Cortex calls add_exchange_internal()
|
||||
- Appends to SESSIONS[session_id]["buffer"]
|
||||
↓
|
||||
7. Relay → User (returns final response)
|
||||
↓
|
||||
8. [Planned] Relay → NeoMem (/memories) [async]
|
||||
- Store conversation in long-term memory
|
||||
```
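The stage order and backend assignment in the lifecycle above can be sketched in Python. This is a minimal illustration, not the code in `cortex/router.py`: `fake_llm`, `run_pipeline`, and the prompt strings are hypothetical stand-ins, and the real stages call live backends through `llm_router.py`.

```python
import asyncio

# Hypothetical stand-in for an LLM call; it records which backend each stage used.
calls: list[tuple[str, str]] = []


async def fake_llm(stage: str, backend: str, prompt: str) -> str:
    calls.append((stage, backend))
    return f"[{stage}] {prompt}"


async def run_pipeline(user_prompt: str, context: str) -> str:
    """Run the four stages in order: reflection -> reasoning -> refine -> persona."""
    notes = await fake_llm("reflection", "CLOUD", user_prompt)
    draft = await fake_llm("reasoning", "PRIMARY", f"{context}\n{notes}\n{user_prompt}")
    refined = await fake_llm("refine", "PRIMARY", draft)
    return await fake_llm("persona", "CLOUD", refined)


final_answer = asyncio.run(run_pipeline("How is the project going?", "L1 summary: ..."))
```

Each stage consumes the previous stage's output, so the stages cannot run in parallel; only the `/ingest` call in step 6 is fire-and-forget.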
|
||||
|
||||
### Intake Module Architecture (v0.5.1)
|
||||
|
||||
**Location:** `cortex/intake/`
|
||||
|
||||
**Key Change:** Intake is now **embedded in Cortex** as a Python module, not a standalone service.
|
||||
|
||||
**Import Pattern:**
|
||||
```python
|
||||
from intake.intake import add_exchange_internal, SESSIONS, summarize_context
|
||||
```
|
||||
|
||||
**Core Data Structure:**
|
||||
```python
SESSIONS: dict[str, dict] = {}

# Structure:
SESSIONS[session_id] = {
    "buffer": deque(maxlen=200),  # Circular buffer of exchanges
    "created_at": datetime
}

# Each exchange in buffer:
{
    "session_id": "...",
    "user_msg": "...",
    "assistant_msg": "...",
    "timestamp": "2025-12-11T..."
}
```
|
||||
|
||||
**Functions:**
|
||||
1. **`add_exchange_internal(exchange: dict)`**
|
||||
- Adds exchange to SESSIONS buffer
|
||||
- Creates new session if needed
|
||||
- Calls `bg_summarize()` stub
|
||||
- Returns `{"ok": True, "session_id": "..."}`
|
||||
|
||||
2. **`summarize_context(session_id: str, exchanges: list[dict])`** [async]
|
||||
- Generates L1/L5/L10/L20/L30 summaries via LLM
|
||||
- Called during `/reason` endpoint
|
||||
- Returns multi-level summary dict
|
||||
|
||||
3. **`bg_summarize(session_id: str)`**
|
||||
- **Stub function** - logs only, no actual work
|
||||
- Defers summarization to `/reason` call
|
||||
- Exists to prevent NameError
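The behaviour described for `add_exchange_internal()` and `bg_summarize()` can be sketched as below. This is a simplified illustration based only on the description above, not the contents of `cortex/intake/intake.py`; the `"default"` session fallback and the logger name are assumptions.

```python
import logging
from collections import deque
from datetime import datetime, timezone

logger = logging.getLogger("intake")  # logger name is an assumption

SESSIONS: dict[str, dict] = {}  # module-level singleton: one copy per process


def bg_summarize(session_id: str) -> None:
    """Stub: logs only; real summarization is deferred to the /reason call."""
    logger.debug("bg_summarize stub called for %s", session_id)


def add_exchange_internal(exchange: dict) -> dict:
    """Append one exchange to its session buffer, creating the session on first use."""
    session_id = exchange.get("session_id") or "default"  # fallback is an assumption
    if session_id not in SESSIONS:
        SESSIONS[session_id] = {
            "buffer": deque(maxlen=200),  # oldest exchanges drop off automatically
            "created_at": datetime.now(timezone.utc),
        }
    SESSIONS[session_id]["buffer"].append(exchange)
    bg_summarize(session_id)
    return {"ok": True, "session_id": session_id}


# Usage: push 205 exchanges; the deque keeps only the newest 200.
for i in range(205):
    result = add_exchange_internal(
        {"session_id": "demo", "user_msg": f"msg {i}", "assistant_msg": "ok"}
    )
```

Because `SESSIONS` lives in module scope, every Uvicorn worker process would get its own independent copy, which is the reason for the single-worker constraint noted below.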
|
||||
|
||||
**Critical Constraint:** SESSIONS is a module-level global dict. This requires **single-worker Uvicorn** mode. Multi-worker deployments need Redis or shared storage.
|
||||
|
||||
**Diagnostic Endpoints:**
|
||||
- `GET /debug/sessions` - Inspect all SESSIONS (object ID, buffer sizes, recent exchanges)
|
||||
- `GET /debug/summary?session_id=X` - Test summarization for a session
|
||||
|
||||
---
|
||||
|
||||
## Environment Configuration
|
||||
|
||||
### LLM Backend Registry (Multi-Backend Strategy)
|
||||
|
||||
**Root `.env` defines all backend OPTIONS:**
|
||||
```bash
|
||||
# PRIMARY Backend (llama.cpp)
|
||||
LLM_PRIMARY_PROVIDER=llama.cpp
|
||||
LLM_PRIMARY_URL=http://10.0.0.44:8080
|
||||
LLM_PRIMARY_MODEL=/model
|
||||
|
||||
# SECONDARY Backend (Ollama)
|
||||
LLM_SECONDARY_PROVIDER=ollama
|
||||
LLM_SECONDARY_URL=http://10.0.0.3:11434
|
||||
LLM_SECONDARY_MODEL=qwen2.5:7b-instruct-q4_K_M
|
||||
|
||||
# CLOUD Backend (OpenAI)
|
||||
LLM_OPENAI_PROVIDER=openai
|
||||
LLM_OPENAI_URL=https://api.openai.com/v1
|
||||
LLM_OPENAI_MODEL=gpt-4o-mini
|
||||
OPENAI_API_KEY=sk-proj-...
|
||||
|
||||
# FALLBACK Backend
|
||||
LLM_FALLBACK_PROVIDER=openai_completions
|
||||
LLM_FALLBACK_URL=http://10.0.0.41:11435
|
||||
LLM_FALLBACK_MODEL=llama-3.2-8b-instruct
|
||||
```
|
||||
|
||||
**Module-specific backend selection:**
|
||||
```bash
|
||||
CORTEX_LLM=SECONDARY # Cortex uses Ollama
|
||||
INTAKE_LLM=PRIMARY # Intake uses llama.cpp
|
||||
SPEAK_LLM=OPENAI # Persona uses OpenAI
|
||||
NEOMEM_LLM=PRIMARY # NeoMem uses llama.cpp
|
||||
UI_LLM=OPENAI # UI uses OpenAI
|
||||
RELAY_LLM=PRIMARY # Relay uses llama.cpp
|
||||
```
|
||||
|
||||
**Philosophy:** Root `.env` provides all backend OPTIONS. Each service chooses which backend to USE via `{MODULE}_LLM` variable. This eliminates URL duplication while preserving flexibility.
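As an illustration of that lookup, a resolver could map a module's `{MODULE}_LLM` value onto the matching `LLM_{NAME}_*` entries. This is a sketch under assumptions: `resolve_backend` is a hypothetical helper, not a function from `llm_router.py`, and the `PRIMARY` default for unset modules is a guess.

```python
import os


def resolve_backend(module: str, env=os.environ) -> dict:
    """Map {MODULE}_LLM to the matching LLM_{NAME}_PROVIDER/URL/MODEL settings."""
    # Defaulting to PRIMARY when the variable is unset is an assumption.
    name = env.get(f"{module.upper()}_LLM", "PRIMARY").upper()
    try:
        return {
            "name": name,
            "provider": env[f"LLM_{name}_PROVIDER"],
            "url": env[f"LLM_{name}_URL"],
            "model": env[f"LLM_{name}_MODEL"],
        }
    except KeyError as missing:
        raise RuntimeError(f"{module.upper()}_LLM={name} but {missing} is not set") from None


# Usage with values taken from the registry above.
example_env = {
    "LLM_PRIMARY_PROVIDER": "llama.cpp",
    "LLM_PRIMARY_URL": "http://10.0.0.44:8080",
    "LLM_PRIMARY_MODEL": "/model",
    "LLM_OPENAI_PROVIDER": "openai",
    "LLM_OPENAI_URL": "https://api.openai.com/v1",
    "LLM_OPENAI_MODEL": "gpt-4o-mini",
    "SPEAK_LLM": "OPENAI",
}
speak_backend = resolve_backend("speak", example_env)
```

Failing loudly on an unknown backend name keeps a typo in `.env` from silently routing a module to the wrong server.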
|
||||
|
||||
### Database Configuration
|
||||
```bash
|
||||
# PostgreSQL (vector storage)
|
||||
POSTGRES_USER=neomem
|
||||
POSTGRES_PASSWORD=neomempass
|
||||
POSTGRES_DB=neomem
|
||||
POSTGRES_HOST=neomem-postgres
|
||||
POSTGRES_PORT=5432
|
||||
|
||||
# Neo4j (graph storage)
|
||||
NEO4J_URI=bolt://neomem-neo4j:7687
|
||||
NEO4J_USERNAME=neo4j
|
||||
NEO4J_PASSWORD=neomemgraph
|
||||
```
|
||||
|
||||
### Service URLs (Docker Internal Network)
|
||||
```bash
|
||||
NEOMEM_API=http://neomem-api:7077
|
||||
CORTEX_API=http://cortex:7081
|
||||
CORTEX_REASON_URL=http://cortex:7081/reason
|
||||
CORTEX_INGEST_URL=http://cortex:7081/ingest
|
||||
RELAY_URL=http://relay:7078
|
||||
```
|
||||
|
||||
### Feature Flags
|
||||
```bash
|
||||
CORTEX_ENABLED=true
|
||||
MEMORY_ENABLED=true
|
||||
PERSONA_ENABLED=false
|
||||
DEBUG_PROMPT=true
|
||||
VERBOSE_DEBUG=true
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Code Structure Overview
|
||||
|
||||
### Cortex Service (`cortex/`)
|
||||
|
||||
**Main Files:**
|
||||
- `main.py` - FastAPI app initialization
|
||||
- `router.py` - Route definitions (`/reason`, `/ingest`, `/health`, `/debug/*`)
|
||||
- `context.py` - Context aggregation (Intake summaries, session state)
|
||||
|
||||
**Reasoning Pipeline (`reasoning/`):**
|
||||
- `reflection.py` - Meta-awareness notes (Cloud LLM)
|
||||
- `reasoning.py` - Draft answer generation (Primary LLM)
|
||||
- `refine.py` - Answer refinement (Primary LLM)
|
||||
|
||||
**Persona Layer (`persona/`):**
|
||||
- `speak.py` - Personality application (Cloud LLM)
|
||||
- `identity.py` - Persona loader
|
||||
|
||||
**Intake Module (`intake/`):**
|
||||
- `__init__.py` - Package exports (SESSIONS, add_exchange_internal, summarize_context)
|
||||
- `intake.py` - Core logic (367 lines)
|
||||
- SESSIONS dictionary
|
||||
- add_exchange_internal()
|
||||
- summarize_context()
|
||||
- bg_summarize() stub
|
||||
|
||||
**LLM Integration (`llm/`):**
|
||||
- `llm_router.py` - Backend selector and HTTP client
|
||||
- call_llm() function
|
||||
- Environment-based routing
|
||||
- Payload formatting per backend type
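
Per-backend payload formatting might look roughly like the following. This is a sketch, not the logic in `llm_router.py`: `build_request` is a hypothetical helper, and the choice of URL path per provider is an assumption. The request shapes themselves follow the public Ollama generate API, the llama.cpp server completion API, and the OpenAI chat and completions APIs.

```python
def build_request(provider: str, base_url: str, model: str, prompt: str,
                  temperature: float = 0.7) -> tuple[str, dict]:
    """Return (url, json_body) for one prompt, shaped for the given provider type."""
    base = base_url.rstrip("/")
    if provider == "ollama":
        # Ollama native API: sampling settings go under "options".
        return f"{base}/api/generate", {
            "model": model,
            "prompt": prompt,
            "stream": False,
            "options": {"temperature": temperature},
        }
    if provider == "llama.cpp":
        # llama.cpp server native API: the loaded model is implied by the server.
        return f"{base}/completion", {"prompt": prompt, "temperature": temperature}
    if provider == "openai":
        # Chat API; base_url is assumed to already end in /v1.
        return f"{base}/chat/completions", {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
            "temperature": temperature,
        }
    if provider == "openai_completions":
        # OpenAI-style text completions exposed by a local server.
        return f"{base}/v1/completions", {
            "model": model,
            "prompt": prompt,
            "temperature": temperature,
        }
    raise ValueError(f"unknown provider: {provider}")


ollama_url, ollama_body = build_request(
    "ollama", "http://10.0.0.3:11434/", "qwen2.5:7b-instruct-q4_K_M", "hi"
)
```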
|
||||
|
||||
**Utilities (`utils/`):**
|
||||
- Helper functions for common operations
|
||||
|
||||
**Configuration:**
|
||||
- `Dockerfile` - Single-worker constraint documented
|
||||
- `requirements.txt` - Python dependencies
|
||||
- `.env` - Service-specific overrides
|
||||
|
||||
### Relay Service (`core/relay/`)
|
||||
|
||||
**Main Files:**
|
||||
- `server.js` - Express.js server (Node.js)
|
||||
- `/v1/chat/completions` - OpenAI-compatible endpoint
|
||||
- `/chat` - Internal endpoint
|
||||
- `/_health` - Health check
|
||||
- `package.json` - Node.js dependencies
|
||||
|
||||
**Key Logic:**
|
||||
- Receives user messages
|
||||
- Routes to Cortex `/reason`
|
||||
- Async calls to Cortex `/ingest` after response
|
||||
- Returns final answer to user
|
||||
|
||||
### NeoMem Service (`neomem/`)
|
||||
|
||||
**Main Files:**
|
||||
- `main.py` - FastAPI app (memory API)
|
||||
- `memory.py` - Memory management logic
|
||||
- `embedder.py` - Embedding generation
|
||||
- `graph.py` - Neo4j graph operations
|
||||
- `Dockerfile` - Container definition
|
||||
- `requirements.txt` - Python dependencies
|
||||
|
||||
**API Endpoints:**
|
||||
- `POST /memories` - Add new memory
|
||||
- `POST /search` - Semantic search
|
||||
- `GET /health` - Service health
|
||||
|
||||
---
|
||||
|
||||
## Common Development Tasks
|
||||
|
||||
### Adding a New Endpoint to Cortex
|
||||
|
||||
**Example: Add `/debug/buffer` endpoint**
|
||||
|
||||
1. **Edit `cortex/router.py`:**
|
||||
```python
@cortex_router.get("/debug/buffer")
async def debug_buffer(session_id: str, limit: int = 10):
    """Return last N exchanges from a session buffer."""
    from intake.intake import SESSIONS

    session = SESSIONS.get(session_id)
    if not session:
        return {"error": "session not found", "session_id": session_id}

    buffer = session["buffer"]
    # Guard limit <= 0: list(buffer)[-0:] would return the whole buffer
    recent = list(buffer)[-limit:] if limit > 0 else []

    return {
        "session_id": session_id,
        "total_exchanges": len(buffer),
        "recent_exchanges": recent
    }
```
|
||||
- Normalizes keys (lowercase).
|
||||
- Strips Markdown fences before parsing.
|
||||
- Configurable via `.env` (`CORTEX_ENABLED=true|false`).
|
||||
- Currently generates annotations, but not yet persisted into Mem0 payloads (stored as empty `{cortex:{}}`).
|
||||
```
|
||||
|
||||
- **Mem0 (v0.4.0)** — Persistent memory layer.
|
||||
- Handles embeddings, graph storage, and retrieval.
|
||||
- Dual embedder support:
|
||||
- **OpenAI Cloud** (`text-embedding-3-small`, 1536-dim).
|
||||
- **HuggingFace TEI** (gte-Qwen2-1.5B-instruct, 1536-dim, hosted on 3090).
|
||||
- Environment toggle for provider (`.env.openai` vs `.env.3090`).
|
||||
- Memory persistence in Postgres (`payload` JSON).
|
||||
- CSV export pipeline confirmed (id, user_id, data, created_at).
|
||||
2. **Restart Cortex:**
|
||||
```bash
|
||||
docker-compose restart cortex
|
||||
```
|
||||
|
||||
- **Persona Sidecar**
|
||||
- Provides personality, style, and protocol instructions.
|
||||
- Injected at runtime into Core prompt building.
|
||||
3. **Test:**
|
||||
```bash
|
||||
curl "http://localhost:7081/debug/buffer?session_id=test&limit=5"
|
||||
```
|
||||
|
||||
### Modifying LLM Backend for a Module
|
||||
|
||||
**Example: Switch Cortex to use PRIMARY backend**
|
||||
|
||||
1. **Edit `.env`:**
|
||||
```bash
|
||||
CORTEX_LLM=PRIMARY # Change from SECONDARY to PRIMARY
|
||||
```
|
||||
|
||||
2. **Restart Cortex:**
|
||||
```bash
|
||||
docker-compose up -d cortex  # recreate the container; a plain "restart" keeps the old environment values
|
||||
```
|
||||
|
||||
3. **Verify in logs:**
|
||||
```bash
|
||||
docker logs cortex | grep "Backend"
|
||||
```
|
||||
|
||||
### Adding Diagnostic Logging
|
||||
|
||||
**Example: Log every exchange addition**
|
||||
|
||||
1. **Edit `cortex/intake/intake.py`:**
|
||||
```python
def add_exchange_internal(exchange: dict):
    session_id = exchange.get("session_id")

    # Add detailed logging
    print(f"[DEBUG] Adding exchange to {session_id}")
    print(f"[DEBUG] User msg: {exchange.get('user_msg', '')[:100]}")
    print(f"[DEBUG] Assistant msg: {exchange.get('assistant_msg', '')[:100]}")

    # ... rest of function
```
|
||||
|
||||
2. **View logs:**
|
||||
```bash
|
||||
docker logs cortex -f | grep DEBUG
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### 🚀 Recent Changes
|
||||
- **Mem0**
|
||||
- Added HuggingFace TEI integration (local 3090 embedder).
|
||||
- Enabled dual-mode environment switch (OpenAI cloud ↔ local TEI).
|
||||
- Fixed `.env` line ending mismatch (CRLF vs LF).
|
||||
- Added memory dump/export commands for Postgres.
|
||||
## Debugging Guide
|
||||
|
||||
- **Core/Relay**
|
||||
- No major changes since v0.3.1 (still routing input → Cortex → Mem0).
|
||||
### Problem: SESSIONS Not Persisting
|
||||
|
||||
- **Cortex**
|
||||
- Still outputs annotations, but not yet persisted into Mem0 payloads.
|
||||
**Symptoms:**
|
||||
- `/debug/sessions` shows empty or only 1 exchange
|
||||
- Summaries always return empty
|
||||
- Buffer size doesn't increase
|
||||
|
||||
**Diagnosis Steps:**
|
||||
1. Check Cortex logs for SESSIONS object ID:
|
||||
```bash
|
||||
docker logs cortex | grep "SESSIONS object id"
|
||||
```
|
||||
- Should show same ID across all calls
|
||||
- If IDs differ → module reloading issue
|
||||
|
||||
2. Verify single-worker mode:
|
||||
```bash
|
||||
docker exec cortex cat Dockerfile | grep uvicorn
|
||||
```
|
||||
- The `uvicorn` command should either omit the `--workers` flag or set `--workers 1`
|
||||
|
||||
3. Check `/debug/sessions` endpoint:
|
||||
```bash
|
||||
curl http://localhost:7081/debug/sessions | jq
|
||||
```
|
||||
- Should show sessions_object_id and current sessions
|
||||
|
||||
4. Inspect `__init__.py` exists:
|
||||
```bash
|
||||
docker exec cortex ls -la intake/__init__.py
|
||||
```
|
||||
|
||||
**Solution (Fixed in v0.5.1):**
|
||||
- Ensure `cortex/intake/__init__.py` exists with proper exports
|
||||
- Verify `bg_summarize()` is implemented (not just TYPE_CHECKING stub)
|
||||
- Check `/ingest` endpoint doesn't have early return
|
||||
- Rebuild and recreate the Cortex container: `docker-compose build cortex && docker-compose up -d cortex` (`restart` alone keeps running the old image)
|
||||
|
||||
### Problem: LLM Backend Timeout
|
||||
|
||||
**Symptoms:**
|
||||
- Cortex `/reason` hangs
|
||||
- 504 Gateway Timeout errors
|
||||
- Logs show "waiting for LLM response"
|
||||
|
||||
**Diagnosis Steps:**
|
||||
1. Test backend directly:
|
||||
```bash
|
||||
# llama.cpp
|
||||
curl http://10.0.0.44:8080/health
|
||||
|
||||
# Ollama
|
||||
curl http://10.0.0.3:11434/api/tags
|
||||
|
||||
# OpenAI
|
||||
curl https://api.openai.com/v1/models \
  -H "Authorization: Bearer $OPENAI_API_KEY"
|
||||
```
|
||||
|
||||
2. Check network connectivity:
|
||||
```bash
|
||||
docker exec cortex ping -c 3 10.0.0.44
|
||||
```
|
||||
|
||||
3. Review Cortex logs:
|
||||
```bash
|
||||
docker logs cortex -f | grep "LLM"
|
||||
```
|
||||
|
||||
**Solutions:**
|
||||
- Verify backend URL in `.env` is correct and accessible
|
||||
- Check firewall rules for backend ports
|
||||
- Increase timeout in `cortex/llm/llm_router.py`
|
||||
- Switch to a different backend temporarily, e.g. `CORTEX_LLM=OPENAI` (the cloud backend is registered under the `OPENAI` name in `.env`)
|
||||
|
||||
### Problem: Docker Compose Won't Start
|
||||
|
||||
**Symptoms:**
|
||||
- `docker-compose up -d` fails
|
||||
- Container exits immediately
|
||||
- "port already in use" errors
|
||||
|
||||
**Diagnosis Steps:**
|
||||
1. Check port conflicts:
|
||||
```bash
|
||||
netstat -tulpn | grep -E '7078|7081|7077|5432'
|
||||
```
|
||||
|
||||
2. Check container logs:
|
||||
```bash
|
||||
docker-compose logs --tail=50
|
||||
```
|
||||
|
||||
3. Verify environment file:
|
||||
```bash
|
||||
cat .env | grep -v "^#" | grep -v "^$"
|
||||
```
|
||||
|
||||
**Solutions:**
|
||||
- Stop conflicting services: `docker-compose down`
|
||||
- Check `.env` syntax (no quotes unless necessary)
|
||||
- Rebuild containers: `docker-compose build --no-cache`
|
||||
- Check Docker daemon: `systemctl status docker`
|
||||
|
||||
---
|
||||
|
||||
### 📈 Versioning
|
||||
- **Lyra Core** → v0.3.1
|
||||
- **Cortex** → v0.3.0
|
||||
- **Mem0** → v0.4.0
|
||||
## Testing Checklist
|
||||
|
||||
### After Making Changes to Cortex
|
||||
|
||||
**1. Build and restart:**
|
||||
```bash
|
||||
docker-compose build cortex
|
||||
docker-compose up -d cortex  # recreates the container from the rebuilt image; "restart" would keep the old one
|
||||
```
|
||||
|
||||
**2. Verify service health:**
|
||||
```bash
|
||||
curl http://localhost:7081/health
|
||||
```
|
||||
|
||||
**3. Test /ingest endpoint:**
|
||||
```bash
|
||||
curl -X POST http://localhost:7081/ingest \
  -H "Content-Type: application/json" \
  -d '{
    "session_id": "test",
    "user_msg": "Hello",
    "assistant_msg": "Hi there!"
  }'
|
||||
```
|
||||
|
||||
**4. Verify SESSIONS updated:**
|
||||
```bash
|
||||
curl http://localhost:7081/debug/sessions | jq '.sessions.test.buffer_size'
|
||||
```
|
||||
- Should show 1 (or increment if already populated)
|
||||
|
||||
**5. Test summarization:**
|
||||
```bash
|
||||
curl "http://localhost:7081/debug/summary?session_id=test" | jq '.summary'
|
||||
```
|
||||
- Should return L1/L5/L10/L20/L30 summaries
|
||||
|
||||
**6. Test full pipeline:**
|
||||
```bash
|
||||
curl -X POST http://localhost:7078/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Test message"}],
    "session_id": "test"
  }' | jq '.choices[0].message.content'
|
||||
```
|
||||
|
||||
**7. Check logs for errors:**
|
||||
```bash
|
||||
docker logs cortex --tail=50
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### 📋 Next Steps
|
||||
- [ ] Wire Cortex annotations into Mem0 payloads (`cortex` object).
|
||||
- [ ] Add “export all memories” script to standard workflow.
|
||||
- [ ] Consider async embedding for faster `mem.add`.
|
||||
- [ ] Build visual diagram of data flow (Core ↔ Cortex ↔ Mem0 ↔ Persona).
|
||||
- [ ] Explore larger LLMs for Cortex (Qwen2-7B, etc.) for richer subconscious annotation.
|
||||
## Project History & Context
|
||||
|
||||
### Evolution Timeline
|
||||
|
||||
**v0.1.x (2025-09-23 to 2025-09-25)**
|
||||
- Initial MVP: Relay + Mem0 + Ollama
|
||||
- Basic memory storage and retrieval
|
||||
- Simple UI with session support
|
||||
|
||||
**v0.2.x (2025-09-24 to 2025-09-30)**
|
||||
- Migrated to mem0ai SDK
|
||||
- Added sessionId support
|
||||
- Created standalone Lyra-Mem0 stack
|
||||
|
||||
**v0.3.x (2025-09-26 to 2025-10-28)**
|
||||
- Forked Mem0 → NVGRAM → NeoMem
|
||||
- Added salience filtering
|
||||
- Integrated Cortex reasoning VM
|
||||
- Built RAG system (Beta Lyrae)
|
||||
- Established multi-backend LLM support
|
||||
|
||||
**v0.4.x (2025-11-05 to 2025-11-13)**
|
||||
- Major architectural rewire
|
||||
- Implemented 4-stage reasoning pipeline
|
||||
- Added reflection, refinement stages
|
||||
- RAG integration
|
||||
- LLM router with per-stage backend selection
|
||||
|
||||
**Infrastructure v1.0.0 (2025-11-26)**
|
||||
- Consolidated 9 `.env` files into single source of truth
|
||||
- Multi-backend LLM strategy
|
||||
- Docker Compose consolidation
|
||||
- Created security templates
|
||||
|
||||
**v0.5.0 (2025-11-28)**
|
||||
- Fixed all critical API wiring issues
|
||||
- Added OpenAI-compatible Relay endpoint
|
||||
- Fixed Cortex → Intake integration
|
||||
- End-to-end flow verification
|
||||
|
||||
**v0.5.1 (2025-12-11) - CURRENT**
|
||||
- **Critical fix**: SESSIONS persistence bug
|
||||
- Implemented `bg_summarize()` stub
|
||||
- Fixed `/ingest` unreachable code
|
||||
- Added `cortex/intake/__init__.py`
|
||||
- Embedded Intake in Cortex (no longer standalone)
|
||||
- Added diagnostic endpoints
|
||||
- Lenient error handling
|
||||
- Documented single-worker constraint
|
||||
|
||||
### Architectural Philosophy
|
||||
|
||||
**Modular Design:**
|
||||
- Each service has a single, clear responsibility
|
||||
- Services communicate via well-defined HTTP APIs
|
||||
- Configuration is centralized but allows per-service overrides
|
||||
|
||||
**Local-First:**
|
||||
- No reliance on external services (except optional OpenAI)
|
||||
- All data stored locally (PostgreSQL + Neo4j)
|
||||
- Can run entirely air-gapped with local LLMs
|
||||
|
||||
**Flexible LLM Backend:**
|
||||
- Not tied to any single LLM provider
|
||||
- Can mix local and cloud models
|
||||
- Per-stage backend selection for optimal performance/cost
|
||||
|
||||
**Error Handling:**
|
||||
- Lenient mode: Never fail the chat pipeline
|
||||
- Log errors but continue processing
|
||||
- Graceful degradation
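
One way to express the lenient pattern is a decorator that logs the failure and returns a fallback value. This is a sketch of the idea only, not the project's implementation: `lenient`, `store_exchange`, and the fallback shape are hypothetical, and an async endpoint would need an `async` variant of the wrapper.

```python
import functools
import logging

logger = logging.getLogger("cortex.lenient")  # logger name is an assumption


def lenient(fallback):
    """Decorator: log any exception from the wrapped function and return `fallback`."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except Exception:
                # Log the full traceback so the failure is never silent.
                logger.exception("%s failed; continuing with fallback", func.__name__)
                return fallback
        return wrapper
    return decorate


@lenient(fallback={"ok": True, "stored": False})
def store_exchange(exchange: dict) -> dict:
    """Hypothetical storage step that fails on malformed input."""
    if "session_id" not in exchange:
        raise KeyError("session_id")
    return {"ok": True, "stored": True}
```

The caller always gets a success-shaped response, while the log still records the traceback.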
|
||||
|
||||
**Observability:**
|
||||
- Diagnostic endpoints for debugging
|
||||
- Verbose logging mode
|
||||
- Object ID tracking for singleton verification
|
||||
|
||||
---
|
||||
|
||||
## Known Issues & Limitations
|
||||
|
||||
### Fixed in v0.5.1
|
||||
- ✅ Intake SESSIONS not persisting → **FIXED**
|
||||
- ✅ `bg_summarize()` NameError → **FIXED**
|
||||
- ✅ `/ingest` endpoint unreachable code → **FIXED**
|
||||
|
||||
### Current Limitations
|
||||
|
||||
**1. Single-Worker Constraint**
|
||||
- Cortex must run with single Uvicorn worker
|
||||
- SESSIONS is in-memory module-level global
|
||||
- Multi-worker support requires Redis or shared storage
|
||||
- Documented in `cortex/Dockerfile` lines 7-8
|
||||
|
||||
**2. NeoMem Integration Incomplete**
|
||||
- Relay doesn't yet push to NeoMem after responses
|
||||
- Memory storage planned for v0.5.2
|
||||
- Currently all memory is short-term (SESSIONS only)
|
||||
|
||||
**3. RAG Service Disabled**
|
||||
- Beta Lyrae (RAG) commented out in docker-compose.yml
|
||||
- Awaiting re-enablement after Intake stabilization
|
||||
- Code exists but not currently integrated
|
||||
|
||||
**4. Session Management**
|
||||
- No session cleanup/expiration
|
||||
- SESSIONS grows without bound: each buffer is capped at 200 exchanges (`maxlen=200`), but the number of sessions is not limited
|
||||
- No session list endpoint in Relay
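
A cleanup pass for idle sessions could be as small as the sketch below. Nothing like this exists in v0.5.1; `expire_idle_sessions` and the `last_seen` field are hypothetical, and the function falls back to the documented `created_at` field when `last_seen` is absent.

```python
from collections import deque
from datetime import datetime, timedelta, timezone
from typing import Optional


def expire_idle_sessions(sessions: dict, max_idle: timedelta,
                         now: Optional[datetime] = None) -> list:
    """Delete sessions idle for longer than max_idle and return their IDs."""
    now = now or datetime.now(timezone.utc)
    # Collect IDs first so the dict is not mutated while iterating over it.
    expired = [
        session_id
        for session_id, session in sessions.items()
        if now - session.get("last_seen", session["created_at"]) > max_idle
    ]
    for session_id in expired:
        del sessions[session_id]
    return expired


# Usage with a fixed clock so the result is reproducible.
now = datetime(2025, 12, 11, 12, 0, tzinfo=timezone.utc)
sessions = {
    "old": {"buffer": deque(maxlen=200), "created_at": now - timedelta(hours=48)},
    "new": {"buffer": deque(maxlen=200), "created_at": now - timedelta(minutes=5)},
}
removed = expire_idle_sessions(sessions, timedelta(hours=24), now=now)
```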
|
||||
|
||||
**5. Persona Integration**
|
||||
- `PERSONA_ENABLED=false` in `.env`
|
||||
- Persona Sidecar not fully wired
|
||||
- Identity loaded but not consistently applied
|
||||
|
||||
### Future Enhancements
|
||||
|
||||
**Short-term (v0.5.2):**
|
||||
- Enable NeoMem integration in Relay
|
||||
- Add session cleanup/expiration
|
||||
- Session list endpoint
|
||||
- NeoMem health monitoring
|
||||
|
||||
**Medium-term (v0.6.x):**
|
||||
- Re-enable RAG service
|
||||
- Migrate SESSIONS to Redis for multi-worker support
|
||||
- Add request correlation IDs
|
||||
- Comprehensive health checks
|
||||
|
||||
**Long-term (v0.7.x+):**
|
||||
- Persona Sidecar full integration
|
||||
- Autonomous "dream" cycles (self-reflection)
|
||||
- Verifier module for factual grounding
|
||||
- Advanced RAG with hybrid search
|
||||
- Memory consolidation strategies
|
||||
|
||||
---
|
||||
|
||||
## Troubleshooting Quick Reference
|
||||
|
||||
| Problem | Quick Check | Solution |
|---------|-------------|----------|
| SESSIONS empty | `curl localhost:7081/debug/sessions` | Rebuild Cortex, verify `__init__.py` exists |
| LLM timeout | `curl http://10.0.0.44:8080/health` | Check backend connectivity, increase timeout |
| Port conflict | `netstat -tulpn \| grep 7078` | Stop conflicting service or change port |
| Container crash | `docker logs cortex` | Check logs for Python errors, verify .env syntax |
| Missing package | `docker exec cortex pip list` | Rebuild container, check requirements.txt |
| 502 from Relay | `curl localhost:7081/health` | Verify Cortex is running, check docker network |
|
||||
|
||||
---
|
||||
|
||||
## API Reference (Quick)
|
||||
|
||||
### Relay (Port 7078)
|
||||
|
||||
**POST /v1/chat/completions** - OpenAI-compatible chat
|
||||
```json
|
||||
{
|
||||
"messages": [{"role": "user", "content": "..."}],
|
||||
"session_id": "..."
|
||||
}
|
||||
```
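
A Python client for this endpoint can be built with the standard library alone. The sketch below only constructs the request; `build_chat_request` is a hypothetical helper, and the default `base_url` assumes the Relay is reachable on `localhost:7078`.

```python
import json
import urllib.request


def build_chat_request(content: str, session_id: str,
                       base_url: str = "http://localhost:7078") -> urllib.request.Request:
    """Build a POST request for the Relay's OpenAI-compatible chat endpoint."""
    payload = {
        "messages": [{"role": "user", "content": content}],
        "session_id": session_id,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_chat_request("Test message", "test")

# To send it against a running Relay:
# with urllib.request.urlopen(request, timeout=60) as response:
#     reply = json.load(response)["choices"][0]["message"]["content"]
```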
|
||||
|
||||
**GET /_health** - Service health
|
||||
|
||||
### Cortex (Port 7081)
|
||||
|
||||
**POST /reason** - Main reasoning pipeline (`temperature` is optional)
```json
{
  "session_id": "...",
  "user_prompt": "...",
  "temperature": 0.7
}
```
|
||||
|
||||
**POST /ingest** - Add exchange to SESSIONS
|
||||
```json
|
||||
{
|
||||
"session_id": "...",
|
||||
"user_msg": "...",
|
||||
"assistant_msg": "..."
|
||||
}
|
||||
```
|
||||
|
||||
**GET /debug/sessions** - Inspect SESSIONS state
|
||||
|
||||
**GET /debug/summary?session_id=X** - Test summarization
|
||||
|
||||
**GET /health** - Service health
|
||||
|
||||
### NeoMem (Port 7077)
|
||||
|
||||
**POST /memories** - Add memory
|
||||
```json
|
||||
{
|
||||
"messages": [{"role": "...", "content": "..."}],
|
||||
"user_id": "...",
|
||||
"metadata": {}
|
||||
}
|
||||
```
|
||||
|
||||
**POST /search** - Semantic search
|
||||
```json
|
||||
{
|
||||
"query": "...",
|
||||
"user_id": "...",
|
||||
"limit": 10
|
||||
}
|
||||
```
|
||||
|
||||
**GET /health** - Service health
|
||||
|
||||
---
|
||||
|
||||
## File Manifest (Key Files Only)
|
||||
|
||||
```
|
||||
project-lyra/
|
||||
├── .env # Root environment variables
|
||||
├── docker-compose.yml # Service definitions (152 lines)
|
||||
├── CHANGELOG.md # Version history (836 lines)
|
||||
├── README.md # User documentation (610 lines)
|
||||
├── PROJECT_SUMMARY.md # This file (AI context)
|
||||
│
|
||||
├── cortex/ # Reasoning engine
|
||||
│ ├── Dockerfile # Single-worker constraint documented
|
||||
│ ├── requirements.txt
|
||||
│ ├── .env # Cortex overrides
|
||||
│ ├── main.py # FastAPI initialization
|
||||
│ ├── router.py # Routes (306 lines)
|
||||
│ ├── context.py # Context aggregation
|
||||
│ │
|
||||
│ ├── intake/ # Short-term memory (embedded)
|
||||
│ │ ├── __init__.py # Package exports
|
||||
│ │ └── intake.py # Core logic (367 lines)
|
||||
│ │
|
||||
│ ├── reasoning/ # Reasoning pipeline
|
||||
│ │ ├── reflection.py # Meta-awareness
|
||||
│ │ ├── reasoning.py # Draft generation
|
||||
│ │ └── refine.py # Refinement
|
||||
│ │
|
||||
│ ├── persona/ # Personality layer
|
||||
│ │ ├── speak.py # Persona application
|
||||
│ │ └── identity.py # Persona loader
|
||||
│ │
|
||||
│ └── llm/ # LLM integration
|
||||
│ └── llm_router.py # Backend selector
|
||||
│
|
||||
├── core/relay/ # Orchestrator
|
||||
│ ├── server.js # Express server (Node.js)
|
||||
│ └── package.json
|
||||
│
|
||||
├── neomem/ # Long-term memory
|
||||
│ ├── Dockerfile
|
||||
│ ├── requirements.txt
|
||||
│ ├── .env # NeoMem overrides
|
||||
│ └── main.py # Memory API
|
||||
│
|
||||
└── rag/ # RAG system (disabled)
|
||||
├── rag_api.py
|
||||
├── rag_chat_import.py
|
||||
└── chromadb/
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Final Notes for AI Assistants
|
||||
|
||||
### What You Should Know Before Making Changes
|
||||
|
||||
1. **SESSIONS is sacred** - It's a module-level global in `cortex/intake/intake.py`. Don't move it, don't duplicate it, don't make it a class attribute. It must remain a singleton.
|
||||
|
||||
2. **Single-worker is mandatory** - Until SESSIONS is migrated to Redis, Cortex MUST run with a single Uvicorn worker. Multi-worker will cause SESSIONS to be inconsistent.
|
||||
|
||||
3. **Lenient error handling** - The `/ingest` endpoint and other parts of the pipeline use lenient error handling: log errors but always return success. Never fail the chat pipeline.
|
||||
|
||||
4. **Backend routing is environment-driven** - Don't hardcode LLM URLs. Use the `{MODULE}_LLM` environment variables and the llm_router.py system.
|
||||
|
||||
5. **Intake is embedded** - Don't try to make HTTP calls to Intake. Use direct Python imports: `from intake.intake import ...`
|
||||
|
||||
6. **Test with diagnostic endpoints** - Always use `/debug/sessions` and `/debug/summary` to verify SESSIONS behavior after changes.
|
||||
|
||||
7. **Follow the changelog format** - When documenting changes, use the chronological format established in CHANGELOG.md v0.5.1. Group by version, then by change type (Fixed, Added, Changed, etc.).
|
||||
|
||||
### When You Need Help
|
||||
|
||||
- **SESSIONS issues**: Check `cortex/intake/intake.py` lines 11-14 for initialization, lines 325-366 for `add_exchange_internal()`
|
||||
- **Routing issues**: Check `cortex/router.py` lines 65-189 for `/reason`, lines 201-233 for `/ingest`
|
||||
- **LLM backend issues**: Check `cortex/llm/llm_router.py` for backend selection logic
|
||||
- **Environment variables**: Check `.env` lines 13-40 for LLM backends, lines 28-34 for module selection
|
||||
|
||||
### Most Important Thing
|
||||
|
||||
**This project values reliability over features.** It's better to have a simple, working system than a complex, broken one. When in doubt, keep it simple, log everything, and never fail silently.
|
||||
|
||||
---
|
||||
|
||||
**End of AI Context Summary**
|
||||
|
||||
*This document is maintained to provide complete context for AI assistants working on Project Lyra. Last updated: v0.5.1 (2025-12-11)*
|
||||
|
||||