# Project Lyra — Comprehensive AI Context Summary

**Version:** v0.5.1 (2025-12-11)
**Status:** Production-ready modular AI companion system
**Purpose:** Memory-backed conversational AI with multi-stage reasoning, persistent context, and modular LLM backend architecture

---

## Executive Summary

Project Lyra is a **self-hosted AI companion system** designed to overcome the limitations of typical chatbots by providing:

- **Persistent long-term memory** (NeoMem: PostgreSQL + Neo4j graph storage)
- **Multi-stage reasoning pipeline** (Cortex: reflection → reasoning → refinement → persona)
- **Short-term context management** (Intake: session-based summarization embedded in Cortex)
- **Flexible LLM backend routing** (supports llama.cpp, Ollama, OpenAI, custom endpoints)
- **OpenAI-compatible API** (drop-in replacement for chat applications)

**Core Philosophy:** Like a human brain has different regions for different functions, Lyra has specialized modules that work together. She's not just a chatbot—she's a notepad, schedule, database, co-creator, and collaborator with her own executive function.

---

## Quick Context for AI Assistants

If you're an AI being given this project to work on, here's what you need to know:

### What This Project Does

Lyra is a conversational AI system that **remembers everything** across sessions. When a user says something in passing, Lyra stores it, contextualizes it, and can recall it later. She can:

- Track project progress over time
- Remember user preferences and past conversations
- Reason through complex questions using multiple LLM calls
- Apply a consistent personality across all interactions
- Integrate with multiple LLM backends (local and cloud)

### Current Architecture (v0.5.1)

```
User → Relay (Express/Node.js, port 7078)
         ↓
       Cortex (FastAPI/Python, port 7081)
         ├─ Intake module (embedded, in-memory SESSIONS)
         ├─ 4-stage reasoning pipeline
         └─ Multi-backend LLM router
         ↓
       NeoMem (FastAPI/Python, port 7077)
         ├─ PostgreSQL (vector storage)
         └─ Neo4j (graph relationships)
```

### Key Files You'll Work With

**Backend Services:**

- [cortex/router.py](cortex/router.py) - Main Cortex routing logic (306 lines, `/reason`, `/ingest` endpoints)
- [cortex/intake/intake.py](cortex/intake/intake.py) - Short-term memory module (367 lines, SESSIONS management)
- [cortex/reasoning/reasoning.py](cortex/reasoning/reasoning.py) - Draft answer generation
- [cortex/reasoning/refine.py](cortex/reasoning/refine.py) - Answer refinement
- [cortex/reasoning/reflection.py](cortex/reasoning/reflection.py) - Meta-awareness notes
- [cortex/persona/speak.py](cortex/persona/speak.py) - Personality layer
- [cortex/llm/llm_router.py](cortex/llm/llm_router.py) - LLM backend selector
- [core/relay/server.js](core/relay/server.js) - Main orchestrator (Node.js)
- [neomem/main.py](neomem/main.py) - Long-term memory API

**Configuration:**

- [.env](.env) - Root environment variables (LLM backends, databases, API keys)
- [cortex/.env](cortex/.env) - Cortex-specific overrides
- [docker-compose.yml](docker-compose.yml) - Service definitions (152 lines)

**Documentation:**

- [CHANGELOG.md](CHANGELOG.md) - Complete version history (836 lines, chronological format)
- [README.md](README.md) - User-facing documentation (610 lines)
- [PROJECT_SUMMARY.md](PROJECT_SUMMARY.md) - This file

### Recent Critical Fixes (v0.5.1)

The most recent work fixed a critical bug where Intake's SESSIONS buffer wasn't persisting:

1. **Fixed**: `bg_summarize()` was only a TYPE_CHECKING stub → implemented as a logging stub
2. **Fixed**: `/ingest` endpoint had unreachable code → removed early return, added lenient error handling
3. **Added**: `cortex/intake/__init__.py` → proper Python package structure
4. **Added**: Diagnostic endpoints `/debug/sessions` and `/debug/summary` for troubleshooting

**Key Insight**: Intake is no longer a standalone service—it's embedded in Cortex as a Python module. SESSIONS must persist in a single Uvicorn worker (no multi-worker support without Redis).

---

## Architecture Deep Dive

### Service Topology (Docker Compose)

**Active Containers:**

1. **relay** (Node.js/Express, port 7078)
   - Entry point for all user requests
   - OpenAI-compatible `/v1/chat/completions` endpoint
   - Routes to Cortex for reasoning
   - Async calls to Cortex `/ingest` after response
2. **cortex** (Python/FastAPI, port 7081)
   - Multi-stage reasoning pipeline
   - Embedded Intake module (no HTTP, direct Python imports)
   - Endpoints: `/reason`, `/ingest`, `/health`, `/debug/sessions`, `/debug/summary`
3. **neomem-api** (Python/FastAPI, port 7077)
   - Long-term memory storage
   - Fork of Mem0 OSS (fully local, no external SDK)
   - Endpoints: `/memories`, `/search`, `/health`
4. **neomem-postgres** (PostgreSQL + pgvector, port 5432)
   - Vector embeddings storage
   - Memory history records
5. **neomem-neo4j** (Neo4j, ports 7474/7687)
   - Graph relationships between memories
   - Entity extraction and linking

**Disabled Services:**

- `intake` - No longer needed (embedded in Cortex as of v0.5.1)
- `rag` - Beta Lyrae RAG service (planned re-enablement)

### External LLM Backends (HTTP APIs)

**PRIMARY Backend** - llama.cpp @ `http://10.0.0.44:8080`
- AMD MI50 GPU-accelerated inference
- Model: `/model` (path-based routing)
- Used for: Reasoning, refinement, summarization

**SECONDARY Backend** - Ollama @ `http://10.0.0.3:11434`
- RTX 3090 GPU-accelerated inference
- Model: `qwen2.5:7b-instruct-q4_K_M`
- Used for: Configurable per-module

**CLOUD Backend** - OpenAI @ `https://api.openai.com/v1`
- Cloud-based inference
- Model: `gpt-4o-mini`
- Used for: Reflection, persona layers

**FALLBACK Backend** - Local @ `http://10.0.0.41:11435`
- CPU-based inference
- Model: `llama-3.2-8b-instruct`
- Used for: Emergency fallback

### Data Flow (Request Lifecycle)

```
1. User sends message → Relay (/v1/chat/completions)
   ↓
2. Relay → Cortex (/reason)
   ↓
3. Cortex calls Intake module (internal Python)
   - Intake.summarize_context(session_id, exchanges)
   - Returns L1/L5/L10/L20/L30 summaries
   ↓
4. Cortex 4-stage pipeline:
   a. reflection.py → Meta-awareness notes (CLOUD backend)
      - "What is the user really asking?"
      - Returns JSON: {"notes": [...]}
   b. reasoning.py → Draft answer (PRIMARY backend)
      - Uses context from Intake
      - Integrates reflection notes
      - Returns draft text
   c. refine.py → Refined answer (PRIMARY backend)
      - Polishes draft for clarity
      - Ensures factual consistency
      - Returns refined text
   d. speak.py → Persona layer (CLOUD backend)
      - Applies Lyra's personality
      - Natural, conversational tone
      - Returns final answer
   ↓
5. Cortex → Relay (returns persona answer)
   ↓
6. Relay → Cortex (/ingest) [async, non-blocking]
   - Sends (session_id, user_msg, assistant_msg)
   - Cortex calls add_exchange_internal()
   - Appends to SESSIONS[session_id]["buffer"]
   ↓
7. Relay → User (returns final response)
   ↓
8. [Planned] Relay → NeoMem (/memories) [async]
   - Store conversation in long-term memory
```
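The four stages in step 4 chain into one async call path inside Cortex. Below is a minimal sketch of that chaining with placeholder stage functions — the names, signatures, and return values here are assumptions for illustration, not the actual APIs in `cortex/reasoning/` or `cortex/persona/`:

```python
# Illustrative only: placeholder stage functions standing in for the real
# modules (reflection.py, reasoning.py, refine.py, speak.py).
import asyncio


async def run_reflection(prompt: str, context: dict) -> list[str]:
    return ["placeholder meta-awareness note"]   # real stage: reflection.py (CLOUD backend)


async def run_reasoning(prompt: str, context: dict, notes: list[str]) -> str:
    return f"draft answer to: {prompt}"          # real stage: reasoning.py (PRIMARY backend)


async def run_refine(draft: str, context: dict) -> str:
    return draft                                 # real stage: refine.py (PRIMARY backend)


async def run_speak(refined: str) -> str:
    return refined                               # real stage: speak.py (CLOUD backend)


async def reason_pipeline(session_id: str, user_prompt: str, context: dict) -> str:
    """Chain reflection → reasoning → refinement → persona for one request."""
    notes = await run_reflection(user_prompt, context)
    draft = await run_reasoning(user_prompt, context, notes)
    refined = await run_refine(draft, context)
    return await run_speak(refined)


if __name__ == "__main__":
    print(asyncio.run(reason_pipeline("test", "Hello, Lyra", context={})))
```

Each stage is an independent LLM call routed to its own backend, which is why a single `/reason` request can mix local and cloud inference.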
### Intake Module Architecture (v0.5.1)

**Location:** `cortex/intake/`

**Key Change:** Intake is now **embedded in Cortex** as a Python module, not a standalone service.

**Import Pattern:**

```python
from intake.intake import add_exchange_internal, SESSIONS, summarize_context
```

**Core Data Structure:**

```python
SESSIONS: dict[str, dict] = {}

# Structure:
SESSIONS[session_id] = {
    "buffer": deque(maxlen=200),  # Circular buffer of exchanges
    "created_at": datetime
}

# Each exchange in buffer:
{
    "session_id": "...",
    "user_msg": "...",
    "assistant_msg": "...",
    "timestamp": "2025-12-11T..."
}
```

**Functions:**

1. **`add_exchange_internal(exchange: dict)`**
   - Adds exchange to SESSIONS buffer
   - Creates new session if needed
   - Calls `bg_summarize()` stub
   - Returns `{"ok": True, "session_id": "..."}`
2. **`summarize_context(session_id: str, exchanges: list[dict])`** [async]
   - Generates L1/L5/L10/L20/L30 summaries via LLM
   - Called during `/reason` endpoint
   - Returns multi-level summary dict
3. **`bg_summarize(session_id: str)`**
   - **Stub function** - logs only, no actual work
   - Defers summarization to `/reason` call
   - Exists to prevent NameError

**Critical Constraint:** SESSIONS is a module-level global dict. This requires **single-worker Uvicorn** mode. Multi-worker deployments need Redis or shared storage.

**Diagnostic Endpoints:**

- `GET /debug/sessions` - Inspect all SESSIONS (object ID, buffer sizes, recent exchanges)
- `GET /debug/summary?session_id=X` - Test summarization for a session

---

## Environment Configuration

### LLM Backend Registry (Multi-Backend Strategy)

**Root `.env` defines all backend OPTIONS:**

```bash
# PRIMARY Backend (llama.cpp)
LLM_PRIMARY_PROVIDER=llama.cpp
LLM_PRIMARY_URL=http://10.0.0.44:8080
LLM_PRIMARY_MODEL=/model

# SECONDARY Backend (Ollama)
LLM_SECONDARY_PROVIDER=ollama
LLM_SECONDARY_URL=http://10.0.0.3:11434
LLM_SECONDARY_MODEL=qwen2.5:7b-instruct-q4_K_M

# CLOUD Backend (OpenAI)
LLM_OPENAI_PROVIDER=openai
LLM_OPENAI_URL=https://api.openai.com/v1
LLM_OPENAI_MODEL=gpt-4o-mini
OPENAI_API_KEY=sk-proj-...

# FALLBACK Backend
LLM_FALLBACK_PROVIDER=openai_completions
LLM_FALLBACK_URL=http://10.0.0.41:11435
LLM_FALLBACK_MODEL=llama-3.2-8b-instruct
```

**Module-specific backend selection:**

```bash
CORTEX_LLM=SECONDARY   # Cortex uses Ollama
INTAKE_LLM=PRIMARY     # Intake uses llama.cpp
SPEAK_LLM=OPENAI       # Persona uses OpenAI
NEOMEM_LLM=PRIMARY     # NeoMem uses llama.cpp
UI_LLM=OPENAI          # UI uses OpenAI
RELAY_LLM=PRIMARY      # Relay uses llama.cpp
```

**Philosophy:** Root `.env` provides all backend OPTIONS. Each service chooses which backend to USE via its `{MODULE}_LLM` variable. This eliminates URL duplication while preserving flexibility.
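To make the selection concrete, here is a minimal sketch of how a module could resolve its backend from these variables. The function name `resolve_backend` is illustrative only; the real selection logic lives in `cortex/llm/llm_router.py`.

```python
import os


def resolve_backend(module: str) -> dict:
    """Resolve which backend a module selected via its {MODULE}_LLM variable.

    Illustrative sketch — not the actual API of cortex/llm/llm_router.py.
    """
    choice = os.getenv(f"{module}_LLM", "PRIMARY")  # e.g. CORTEX_LLM=SECONDARY
    return {
        "provider": os.getenv(f"LLM_{choice}_PROVIDER"),
        "url": os.getenv(f"LLM_{choice}_URL"),
        "model": os.getenv(f"LLM_{choice}_MODEL"),
    }


# With CORTEX_LLM=SECONDARY this returns the Ollama entry:
# {"provider": "ollama", "url": "http://10.0.0.3:11434", "model": "qwen2.5:7b-instruct-q4_K_M"}
```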
### Database Configuration

```bash
# PostgreSQL (vector storage)
POSTGRES_USER=neomem
POSTGRES_PASSWORD=neomempass
POSTGRES_DB=neomem
POSTGRES_HOST=neomem-postgres
POSTGRES_PORT=5432

# Neo4j (graph storage)
NEO4J_URI=bolt://neomem-neo4j:7687
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=neomemgraph
```

### Service URLs (Docker Internal Network)

```bash
NEOMEM_API=http://neomem-api:7077
CORTEX_API=http://cortex:7081
CORTEX_REASON_URL=http://cortex:7081/reason
CORTEX_INGEST_URL=http://cortex:7081/ingest
RELAY_URL=http://relay:7078
```

### Feature Flags

```bash
CORTEX_ENABLED=true
MEMORY_ENABLED=true
PERSONA_ENABLED=false
DEBUG_PROMPT=true
VERBOSE_DEBUG=true
```

---

## Code Structure Overview

### Cortex Service (`cortex/`)

**Main Files:**

- `main.py` - FastAPI app initialization
- `router.py` - Route definitions (`/reason`, `/ingest`, `/health`, `/debug/*`)
- `context.py` - Context aggregation (Intake summaries, session state)

**Reasoning Pipeline (`reasoning/`):**

- `reflection.py` - Meta-awareness notes (Cloud LLM)
- `reasoning.py` - Draft answer generation (Primary LLM)
- `refine.py` - Answer refinement (Primary LLM)

**Persona Layer (`persona/`):**

- `speak.py` - Personality application (Cloud LLM)
- `identity.py` - Persona loader

**Intake Module (`intake/`):**

- `__init__.py` - Package exports (SESSIONS, add_exchange_internal, summarize_context)
- `intake.py` - Core logic (367 lines)
  - SESSIONS dictionary
  - add_exchange_internal()
  - summarize_context()
  - bg_summarize() stub

**LLM Integration (`llm/`):**

- `llm_router.py` - Backend selector and HTTP client
  - call_llm() function
  - Environment-based routing
  - Payload formatting per backend type

**Utilities (`utils/`):**

- Helper functions for common operations

**Configuration:**

- `Dockerfile` - Single-worker constraint documented
- `requirements.txt` - Python dependencies
- `.env` - Service-specific overrides

### Relay Service (`core/relay/`)

**Main Files:**

- `server.js` - Express.js server (Node.js)
  - `/v1/chat/completions` - OpenAI-compatible endpoint
  - `/chat` - Internal endpoint
  - `/_health` - Health check
- `package.json` - Node.js dependencies

**Key Logic:**

- Receives user messages
- Routes to Cortex `/reason`
- Async calls to Cortex `/ingest` after response
- Returns final answer to user

### NeoMem Service (`neomem/`)

**Main Files:**

- `main.py` - FastAPI app (memory API)
- `memory.py` - Memory management logic
- `embedder.py` - Embedding generation
- `graph.py` - Neo4j graph operations
- `Dockerfile` - Container definition
- `requirements.txt` - Python dependencies

**API Endpoints:**

- `POST /memories` - Add new memory
- `POST /search` - Semantic search
- `GET /health` - Service health

---

## Common Development Tasks

### Adding a New Endpoint to Cortex

**Example: Add `/debug/buffer` endpoint**

1. **Edit `cortex/router.py`:**

   ```python
   @cortex_router.get("/debug/buffer")
   async def debug_buffer(session_id: str, limit: int = 10):
       """Return last N exchanges from a session buffer."""
       from intake.intake import SESSIONS

       session = SESSIONS.get(session_id)
       if not session:
           return {"error": "session not found", "session_id": session_id}

       buffer = session["buffer"]
       recent = list(buffer)[-limit:]
       return {
           "session_id": session_id,
           "total_exchanges": len(buffer),
           "recent_exchanges": recent
       }
   ```

2. **Restart Cortex:**

   ```bash
   docker-compose restart cortex
   ```

3. **Test:**

   ```bash
   curl "http://localhost:7081/debug/buffer?session_id=test&limit=5"
   ```

### Modifying LLM Backend for a Module

**Example: Switch Cortex to use PRIMARY backend**
1. **Edit `.env`:**

   ```bash
   CORTEX_LLM=PRIMARY   # Change from SECONDARY to PRIMARY
   ```

2. **Restart Cortex:**

   ```bash
   docker-compose restart cortex
   ```

3. **Verify in logs:**

   ```bash
   docker logs cortex | grep "Backend"
   ```

### Adding Diagnostic Logging

**Example: Log every exchange addition**

1. **Edit `cortex/intake/intake.py`:**

   ```python
   def add_exchange_internal(exchange: dict):
       session_id = exchange.get("session_id")

       # Add detailed logging
       print(f"[DEBUG] Adding exchange to {session_id}")
       print(f"[DEBUG] User msg: {exchange.get('user_msg', '')[:100]}")
       print(f"[DEBUG] Assistant msg: {exchange.get('assistant_msg', '')[:100]}")

       # ... rest of function
   ```

2. **View logs:**

   ```bash
   docker logs cortex -f | grep DEBUG
   ```

---

## Debugging Guide

### Problem: SESSIONS Not Persisting

**Symptoms:**

- `/debug/sessions` shows empty or only 1 exchange
- Summaries always return empty
- Buffer size doesn't increase

**Diagnosis Steps:**

1. Check Cortex logs for SESSIONS object ID:

   ```bash
   docker logs cortex | grep "SESSIONS object id"
   ```

   - Should show the same ID across all calls
   - If IDs differ → module reloading issue

2. Verify single-worker mode:

   ```bash
   docker exec cortex cat Dockerfile | grep uvicorn
   ```

   - The command should either omit the `--workers` flag or set `--workers 1`; more than one worker breaks SESSIONS

3. Check the `/debug/sessions` endpoint:

   ```bash
   curl http://localhost:7081/debug/sessions | jq
   ```

   - Should show sessions_object_id and current sessions

4. Check that `__init__.py` exists:

   ```bash
   docker exec cortex ls -la intake/__init__.py
   ```

**Solution (Fixed in v0.5.1):**

- Ensure `cortex/intake/__init__.py` exists with proper exports
- Verify `bg_summarize()` is implemented (not just a TYPE_CHECKING stub)
- Check that the `/ingest` endpoint doesn't have an early return
- Rebuild the Cortex container: `docker-compose build cortex && docker-compose restart cortex`

### Problem: LLM Backend Timeout

**Symptoms:**

- Cortex `/reason` hangs
- 504 Gateway Timeout errors
- Logs show "waiting for LLM response"

**Diagnosis Steps:**

1. Test the backend directly:

   ```bash
   # llama.cpp
   curl http://10.0.0.44:8080/health

   # Ollama
   curl http://10.0.0.3:11434/api/tags

   # OpenAI
   curl https://api.openai.com/v1/models \
     -H "Authorization: Bearer $OPENAI_API_KEY"
   ```

2. Check network connectivity:

   ```bash
   docker exec cortex ping -c 3 10.0.0.44
   ```

3. Review Cortex logs:

   ```bash
   docker logs cortex -f | grep "LLM"
   ```

**Solutions:**

- Verify the backend URL in `.env` is correct and accessible
- Check firewall rules for backend ports
- Increase the timeout in `cortex/llm/llm_router.py`
- Switch to a different backend temporarily: `CORTEX_LLM=CLOUD`

### Problem: Docker Compose Won't Start

**Symptoms:**

- `docker-compose up -d` fails
- Container exits immediately
- "port already in use" errors

**Diagnosis Steps:**

1. Check port conflicts:

   ```bash
   netstat -tulpn | grep -E '7078|7081|7077|5432'
   ```

2. Check container logs:

   ```bash
   docker-compose logs --tail=50
   ```

3. Verify the environment file:

   ```bash
   cat .env | grep -v "^#" | grep -v "^$"
   ```

**Solutions:**

- Stop conflicting services: `docker-compose down`
- Check `.env` syntax (no quotes unless necessary)
- Rebuild containers: `docker-compose build --no-cache`
- Check the Docker daemon: `systemctl status docker`

---

## Testing Checklist

### After Making Changes to Cortex

**1. Build and restart:**

```bash
docker-compose build cortex
docker-compose restart cortex
```

**2. Verify service health:**

```bash
curl http://localhost:7081/health
```
**3. Test /ingest endpoint:**

```bash
curl -X POST http://localhost:7081/ingest \
  -H "Content-Type: application/json" \
  -d '{
    "session_id": "test",
    "user_msg": "Hello",
    "assistant_msg": "Hi there!"
  }'
```

**4. Verify SESSIONS updated:**

```bash
curl http://localhost:7081/debug/sessions | jq '.sessions.test.buffer_size'
```

- Should show 1 (or increment if already populated)

**5. Test summarization:**

```bash
curl "http://localhost:7081/debug/summary?session_id=test" | jq '.summary'
```

- Should return L1/L5/L10/L20/L30 summaries

**6. Test full pipeline:**

```bash
curl -X POST http://localhost:7078/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Test message"}],
    "session_id": "test"
  }' | jq '.choices[0].message.content'
```

**7. Check logs for errors:**

```bash
docker logs cortex --tail=50
```

---

## Project History & Context

### Evolution Timeline

**v0.1.x (2025-09-23 to 2025-09-25)**
- Initial MVP: Relay + Mem0 + Ollama
- Basic memory storage and retrieval
- Simple UI with session support

**v0.2.x (2025-09-24 to 2025-09-30)**
- Migrated to mem0ai SDK
- Added sessionId support
- Created standalone Lyra-Mem0 stack

**v0.3.x (2025-09-26 to 2025-10-28)**
- Forked Mem0 → NVGRAM → NeoMem
- Added salience filtering
- Integrated Cortex reasoning VM
- Built RAG system (Beta Lyrae)
- Established multi-backend LLM support

**v0.4.x (2025-11-05 to 2025-11-13)**
- Major architectural rewire
- Implemented 4-stage reasoning pipeline
- Added reflection, refinement stages
- RAG integration
- LLM router with per-stage backend selection

**Infrastructure v1.0.0 (2025-11-26)**
- Consolidated 9 `.env` files into single source of truth
- Multi-backend LLM strategy
- Docker Compose consolidation
- Created security templates

**v0.5.0 (2025-11-28)**
- Fixed all critical API wiring issues
- Added OpenAI-compatible Relay endpoint
- Fixed Cortex → Intake integration
- End-to-end flow verification

**v0.5.1 (2025-12-11) - CURRENT**
- **Critical fix**: SESSIONS persistence bug
- Implemented `bg_summarize()` stub
- Fixed `/ingest` unreachable code
- Added `cortex/intake/__init__.py`
- Embedded Intake in Cortex (no longer standalone)
- Added diagnostic endpoints
- Lenient error handling
- Documented single-worker constraint

### Architectural Philosophy

**Modular Design:**
- Each service has a single, clear responsibility
- Services communicate via well-defined HTTP APIs
- Configuration is centralized but allows per-service overrides

**Local-First:**
- No reliance on external services (except optional OpenAI)
- All data stored locally (PostgreSQL + Neo4j)
- Can run entirely air-gapped with local LLMs

**Flexible LLM Backend:**
- Not tied to any single LLM provider
- Can mix local and cloud models
- Per-stage backend selection for optimal performance/cost

**Error Handling:**
- Lenient mode: Never fail the chat pipeline
- Log errors but continue processing
- Graceful degradation

**Observability:**
- Diagnostic endpoints for debugging
- Verbose logging mode
- Object ID tracking for singleton verification

---

## Known Issues & Limitations

### Fixed in v0.5.1

- ✅ Intake SESSIONS not persisting → **FIXED**
- ✅ `bg_summarize()` NameError → **FIXED**
- ✅ `/ingest` endpoint unreachable code → **FIXED**

### Current Limitations

**1. Single-Worker Constraint**
- Cortex must run with a single Uvicorn worker
- SESSIONS is an in-memory module-level global
- Multi-worker support requires Redis or shared storage
- Documented in `cortex/Dockerfile` lines 7-8
**2. NeoMem Integration Incomplete**
- Relay doesn't yet push to NeoMem after responses
- Memory storage planned for v0.5.2
- Currently all memory is short-term (SESSIONS only)

**3. RAG Service Disabled**
- Beta Lyrae (RAG) commented out in docker-compose.yml
- Awaiting re-enablement after Intake stabilization
- Code exists but not currently integrated

**4. Session Management**
- No session cleanup/expiration
- SESSIONS grows unbounded (maxlen=200 per session, but the number of sessions is unbounded)
- No session list endpoint in Relay

**5. Persona Integration**
- `PERSONA_ENABLED=false` in `.env`
- Persona Sidecar not fully wired
- Identity loaded but not consistently applied

### Future Enhancements

**Short-term (v0.5.2):**
- Enable NeoMem integration in Relay
- Add session cleanup/expiration
- Session list endpoint
- NeoMem health monitoring

**Medium-term (v0.6.x):**
- Re-enable RAG service
- Migrate SESSIONS to Redis for multi-worker support
- Add request correlation IDs
- Comprehensive health checks

**Long-term (v0.7.x+):**
- Persona Sidecar full integration
- Autonomous "dream" cycles (self-reflection)
- Verifier module for factual grounding
- Advanced RAG with hybrid search
- Memory consolidation strategies

---

## Troubleshooting Quick Reference

| Problem | Quick Check | Solution |
|---------|-------------|----------|
| SESSIONS empty | `curl localhost:7081/debug/sessions` | Rebuild Cortex, verify `__init__.py` exists |
| LLM timeout | `curl http://10.0.0.44:8080/health` | Check backend connectivity, increase timeout |
| Port conflict | `netstat -tulpn \| grep 7078` | Stop conflicting service or change port |
| Container crash | `docker logs cortex` | Check logs for Python errors, verify .env syntax |
| Missing package | `docker exec cortex pip list` | Rebuild container, check requirements.txt |
| 502 from Relay | `curl localhost:7081/health` | Verify Cortex is running, check docker network |

---

## API Reference (Quick)

### Relay (Port 7078)

**POST /v1/chat/completions** - OpenAI-compatible chat

```json
{
  "messages": [{"role": "user", "content": "..."}],
  "session_id": "..."
}
```

**GET /_health** - Service health
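Because this endpoint is OpenAI-compatible, any HTTP client can call it. A minimal usage sketch in Python using `requests` (assumed available); the `session_id` field is the Lyra-specific extension, and the response shape matches the testing checklist above:

```python
import requests

resp = requests.post(
    "http://localhost:7078/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": "What did we work on yesterday?"}],
        "session_id": "demo",  # Lyra-specific field used for SESSIONS / memory
    },
    timeout=120,  # the 4-stage pipeline can be slow on local backends
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```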
### Cortex (Port 7081)

**POST /reason** - Main reasoning pipeline

```json
{
  "session_id": "...",
  "user_prompt": "...",
  "temperature": 0.7 // optional
}
```

**POST /ingest** - Add exchange to SESSIONS

```json
{
  "session_id": "...",
  "user_msg": "...",
  "assistant_msg": "..."
}
```

**GET /debug/sessions** - Inspect SESSIONS state

**GET /debug/summary?session_id=X** - Test summarization

**GET /health** - Service health

### NeoMem (Port 7077)

**POST /memories** - Add memory

```json
{
  "messages": [{"role": "...", "content": "..."}],
  "user_id": "...",
  "metadata": {}
}
```

**POST /search** - Semantic search

```json
{
  "query": "...",
  "user_id": "...",
  "limit": 10
}
```

**GET /health** - Service health

---

## File Manifest (Key Files Only)

```
project-lyra/
├── .env                    # Root environment variables
├── docker-compose.yml      # Service definitions (152 lines)
├── CHANGELOG.md            # Version history (836 lines)
├── README.md               # User documentation (610 lines)
├── PROJECT_SUMMARY.md      # This file (AI context)
│
├── cortex/                 # Reasoning engine
│   ├── Dockerfile          # Single-worker constraint documented
│   ├── requirements.txt
│   ├── .env                # Cortex overrides
│   ├── main.py             # FastAPI initialization
│   ├── router.py           # Routes (306 lines)
│   ├── context.py          # Context aggregation
│   │
│   ├── intake/             # Short-term memory (embedded)
│   │   ├── __init__.py     # Package exports
│   │   └── intake.py       # Core logic (367 lines)
│   │
│   ├── reasoning/          # Reasoning pipeline
│   │   ├── reflection.py   # Meta-awareness
│   │   ├── reasoning.py    # Draft generation
│   │   └── refine.py       # Refinement
│   │
│   ├── persona/            # Personality layer
│   │   ├── speak.py        # Persona application
│   │   └── identity.py     # Persona loader
│   │
│   └── llm/                # LLM integration
│       └── llm_router.py   # Backend selector
│
├── core/relay/             # Orchestrator
│   ├── server.js           # Express server (Node.js)
│   └── package.json
│
├── neomem/                 # Long-term memory
│   ├── Dockerfile
│   ├── requirements.txt
│   ├── .env                # NeoMem overrides
│   └── main.py             # Memory API
│
└── rag/                    # RAG system (disabled)
    ├── rag_api.py
    ├── rag_chat_import.py
    └── chromadb/
```

---

## Final Notes for AI Assistants

### What You Should Know Before Making Changes

1. **SESSIONS is sacred** - It's a module-level global in `cortex/intake/intake.py`. Don't move it, don't duplicate it, don't make it a class attribute. It must remain a singleton.
2. **Single-worker is mandatory** - Until SESSIONS is migrated to Redis, Cortex MUST run with a single Uvicorn worker. Multi-worker will cause SESSIONS to be inconsistent.
3. **Lenient error handling** - The `/ingest` endpoint and other parts of the pipeline use lenient error handling: log errors but always return success. Never fail the chat pipeline (see the sketch after this list).
4. **Backend routing is environment-driven** - Don't hardcode LLM URLs. Use the `{MODULE}_LLM` environment variables and the llm_router.py system.
5. **Intake is embedded** - Don't try to make HTTP calls to Intake. Use direct Python imports: `from intake.intake import ...`
6. **Test with diagnostic endpoints** - Always use `/debug/sessions` and `/debug/summary` to verify SESSIONS behavior after changes.
7. **Follow the changelog format** - When documenting changes, use the chronological format established in CHANGELOG.md v0.5.1. Group by version, then by change type (Fixed, Added, Changed, etc.).
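For reference, a minimal sketch of the lenient pattern from point 3, written as a standalone FastAPI route. The handler body is a placeholder, not the actual `/ingest` implementation in `cortex/router.py`:

```python
import logging

from fastapi import APIRouter, Request

log = logging.getLogger("cortex.ingest")
router = APIRouter()


@router.post("/ingest")
async def ingest(request: Request) -> dict:
    """Lenient ingest: log failures, but never break the chat pipeline."""
    try:
        exchange = await request.json()
        # In Cortex this is where add_exchange_internal(exchange) would be called.
        return {"ok": True, "session_id": exchange.get("session_id")}
    except Exception as exc:  # broad on purpose — lenient by design
        log.warning("ingest failed, continuing anyway: %s", exc)
        return {"ok": True, "error": str(exc)}
```

The key point is the shape of the `except` branch: the error is logged and surfaced in the payload, but the response still reports success so the calling chat flow never fails.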
### When You Need Help

- **SESSIONS issues**: Check `cortex/intake/intake.py` lines 11-14 for initialization, lines 325-366 for `add_exchange_internal()`
- **Routing issues**: Check `cortex/router.py` lines 65-189 for `/reason`, lines 201-233 for `/ingest`
- **LLM backend issues**: Check `cortex/llm/llm_router.py` for backend selection logic
- **Environment variables**: Check `.env` lines 13-40 for LLM backends, lines 28-34 for module selection

### Most Important Thing

**This project values reliability over features.** It's better to have a simple, working system than a complex, broken one. When in doubt, keep it simple, log everything, and never fail silently.

---

**End of AI Context Summary**

*This document is maintained to provide complete context for AI assistants working on Project Lyra. Last updated: v0.5.1 (2025-12-11)*