Project Lyra — Comprehensive AI Context Summary
Version: v0.5.1 (2025-12-11)
Status: Production-ready modular AI companion system
Purpose: Memory-backed conversational AI with multi-stage reasoning, persistent context, and modular LLM backend architecture
Executive Summary
Project Lyra is a self-hosted AI companion system designed to overcome the limitations of typical chatbots by providing:
- Persistent long-term memory (NeoMem: PostgreSQL + Neo4j graph storage)
- Multi-stage reasoning pipeline (Cortex: reflection → reasoning → refinement → persona)
- Short-term context management (Intake: session-based summarization embedded in Cortex)
- Flexible LLM backend routing (supports llama.cpp, Ollama, OpenAI, custom endpoints)
- OpenAI-compatible API (drop-in replacement for chat applications)
Core Philosophy: Just as a human brain has different regions for different functions, Lyra has specialized modules that work together. She's not just a chatbot; she's a notepad, schedule, database, co-creator, and collaborator with her own executive function.
Quick Context for AI Assistants
If you're an AI being given this project to work on, here's what you need to know:
What This Project Does
Lyra is a conversational AI system that remembers everything across sessions. When a user says something in passing, Lyra stores it, contextualizes it, and can recall it later. She can:
- Track project progress over time
- Remember user preferences and past conversations
- Reason through complex questions using multiple LLM calls
- Apply a consistent personality across all interactions
- Integrate with multiple LLM backends (local and cloud)
Current Architecture (v0.5.1)
User → Relay (Express/Node.js, port 7078)
↓
Cortex (FastAPI/Python, port 7081)
├─ Intake module (embedded, in-memory SESSIONS)
├─ 4-stage reasoning pipeline
└─ Multi-backend LLM router
↓
NeoMem (FastAPI/Python, port 7077)
├─ PostgreSQL (vector storage)
└─ Neo4j (graph relationships)
Key Files You'll Work With
Backend Services:
- cortex/router.py - Main Cortex routing logic (306 lines, /reason and /ingest endpoints)
- cortex/intake/intake.py - Short-term memory module (367 lines, SESSIONS management)
- cortex/reasoning/reasoning.py - Draft answer generation
- cortex/reasoning/refine.py - Answer refinement
- cortex/reasoning/reflection.py - Meta-awareness notes
- cortex/persona/speak.py - Personality layer
- cortex/llm/llm_router.py - LLM backend selector
- core/relay/server.js - Main orchestrator (Node.js)
- neomem/main.py - Long-term memory API
Configuration:
- .env - Root environment variables (LLM backends, databases, API keys)
- cortex/.env - Cortex-specific overrides
- docker-compose.yml - Service definitions (152 lines)
Documentation:
- CHANGELOG.md - Complete version history (836 lines, chronological format)
- README.md - User-facing documentation (610 lines)
- PROJECT_SUMMARY.md - This file
Recent Critical Fixes (v0.5.1)
The most recent work fixed a critical bug where Intake's SESSIONS buffer wasn't persisting:
- Fixed: bg_summarize() was only a TYPE_CHECKING stub → implemented as a logging stub
- Fixed: /ingest endpoint had unreachable code → removed early return, added lenient error handling
- Added: cortex/intake/__init__.py → proper Python package structure
- Added: Diagnostic endpoints /debug/sessions and /debug/summary for troubleshooting
Key Insight: Intake is no longer a standalone service—it's embedded in Cortex as a Python module. SESSIONS must persist in a single Uvicorn worker (no multi-worker support without Redis).
Architecture Deep Dive
Service Topology (Docker Compose)
Active Containers:
- relay (Node.js/Express, port 7078)
  - Entry point for all user requests
  - OpenAI-compatible /v1/chat/completions endpoint
  - Routes to Cortex for reasoning
  - Async calls to Cortex /ingest after the response
- cortex (Python/FastAPI, port 7081)
  - Multi-stage reasoning pipeline
  - Embedded Intake module (no HTTP, direct Python imports)
  - Endpoints: /reason, /ingest, /health, /debug/sessions, /debug/summary
- neomem-api (Python/FastAPI, port 7077)
  - Long-term memory storage
  - Fork of Mem0 OSS (fully local, no external SDK)
  - Endpoints: /memories, /search, /health
- neomem-postgres (PostgreSQL + pgvector, port 5432)
  - Vector embeddings storage
  - Memory history records
- neomem-neo4j (Neo4j, ports 7474/7687)
  - Graph relationships between memories
  - Entity extraction and linking
Disabled Services:
- intake - No longer needed (embedded in Cortex as of v0.5.1)
- rag - Beta Lyrae RAG service (planned re-enablement)
External LLM Backends (HTTP APIs)
PRIMARY Backend - llama.cpp @ http://10.0.0.44:8080
- AMD MI50 GPU-accelerated inference
- Model: /model (path-based routing)
- Used for: Reasoning, refinement, summarization
SECONDARY Backend - Ollama @ http://10.0.0.3:11434
- RTX 3090 GPU-accelerated inference
- Model: qwen2.5:7b-instruct-q4_K_M
- Used for: Configurable per-module
CLOUD Backend - OpenAI @ https://api.openai.com/v1
- Cloud-based inference
- Model: gpt-4o-mini
- Used for: Reflection and persona layers
FALLBACK Backend - Local @ http://10.0.0.41:11435
- CPU-based inference
- Model: llama-3.2-8b-instruct
- Used for: Emergency fallback
Data Flow (Request Lifecycle)
1. User sends message → Relay (/v1/chat/completions)
↓
2. Relay → Cortex (/reason)
↓
3. Cortex calls Intake module (internal Python)
- Intake.summarize_context(session_id, exchanges)
- Returns L1/L5/L10/L20/L30 summaries
↓
4. Cortex 4-stage pipeline:
a. reflection.py → Meta-awareness notes (CLOUD backend)
- "What is the user really asking?"
- Returns JSON: {"notes": [...]}
b. reasoning.py → Draft answer (PRIMARY backend)
- Uses context from Intake
- Integrates reflection notes
- Returns draft text
c. refine.py → Refined answer (PRIMARY backend)
- Polishes draft for clarity
- Ensures factual consistency
- Returns refined text
d. speak.py → Persona layer (CLOUD backend)
- Applies Lyra's personality
- Natural, conversational tone
- Returns final answer
↓
5. Cortex → Relay (returns persona answer)
↓
6. Relay → Cortex (/ingest) [async, non-blocking]
- Sends (session_id, user_msg, assistant_msg)
- Cortex calls add_exchange_internal()
- Appends to SESSIONS[session_id]["buffer"]
↓
7. Relay → User (returns final response)
↓
8. [Planned] Relay → NeoMem (/memories) [async]
- Store conversation in long-term memory
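To make the stage ordering in step 4 concrete, here is a minimal, self-contained sketch of the hand-off between the four stages. The inline placeholder functions (reflect, draft, refine, speak) are illustrative stand-ins, not the project's real code; the actual logic lives in reflection.py, reasoning.py, refine.py, and speak.py and is orchestrated by cortex/router.py.

import asyncio

# Minimal sketch of the 4-stage flow described above. These placeholders only
# show the ordering and data hand-off between stages; they are not the real
# implementations.

async def reflect(prompt: str) -> dict:          # stage a (CLOUD backend)
    return {"notes": [f"user intent behind: {prompt!r}"]}

async def draft(prompt: str, context: dict, notes: dict) -> str:  # stage b (PRIMARY backend)
    return f"Draft answer to {prompt!r} using {len(context)} context levels."

async def refine(text: str) -> str:              # stage c (PRIMARY backend)
    return text.strip()

async def speak(text: str) -> str:               # stage d (CLOUD backend)
    return f"[Lyra] {text}"

async def reason_pipeline(session_id: str, prompt: str) -> str:
    # In the real service this comes from intake.summarize_context()
    context = {"L1": "...", "L5": "...", "L10": "..."}
    notes = await reflect(prompt)
    return await speak(await refine(await draft(prompt, context, notes)))

print(asyncio.run(reason_pipeline("test", "What did we decide yesterday?")))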
Intake Module Architecture (v0.5.1)
Location: cortex/intake/
Key Change: Intake is now embedded in Cortex as a Python module, not a standalone service.
Import Pattern:
from intake.intake import add_exchange_internal, SESSIONS, summarize_context
Core Data Structure:
SESSIONS: dict[str, dict] = {}
# Structure:
SESSIONS[session_id] = {
    "buffer": deque(maxlen=200),  # Circular buffer of exchanges
    "created_at": datetime
}
# Each exchange in the buffer:
{
    "session_id": "...",
    "user_msg": "...",
    "assistant_msg": "...",
    "timestamp": "2025-12-11T..."
}
Functions:
- add_exchange_internal(exchange: dict)
  - Adds an exchange to the SESSIONS buffer
  - Creates a new session if needed
  - Calls the bg_summarize() stub
  - Returns {"ok": True, "session_id": "..."}
- summarize_context(session_id: str, exchanges: list[dict]) [async]
  - Generates L1/L5/L10/L20/L30 summaries via LLM
  - Called during the /reason endpoint
  - Returns a multi-level summary dict
- bg_summarize(session_id: str)
  - Stub function: logs only, no actual work
  - Defers summarization to the /reason call
  - Exists to prevent NameError
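A minimal sketch of the add_exchange_internal() behavior described above, assuming the buffer layout shown earlier (an illustration, not a copy of cortex/intake/intake.py; logging and validation are simplified):

from collections import deque
from datetime import datetime, timezone

SESSIONS: dict[str, dict] = {}

def bg_summarize(session_id: str) -> None:
    # Stub: logs only; real summarization is deferred to the /reason call.
    print(f"[intake] bg_summarize noop for {session_id}")

def add_exchange_internal(exchange: dict) -> dict:
    session_id = exchange["session_id"]
    # Create the session lazily with a bounded circular buffer.
    session = SESSIONS.setdefault(
        session_id,
        {"buffer": deque(maxlen=200), "created_at": datetime.now(timezone.utc)},
    )
    session["buffer"].append(exchange)
    bg_summarize(session_id)
    return {"ok": True, "session_id": session_id}

# Example:
print(add_exchange_internal({
    "session_id": "demo",
    "user_msg": "Hello",
    "assistant_msg": "Hi there!",
    "timestamp": datetime.now(timezone.utc).isoformat(),
}))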
Critical Constraint: SESSIONS is a module-level global dict. This requires single-worker Uvicorn mode. Multi-worker deployments need Redis or shared storage.
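For reference, a rough sketch of what the Redis-backed alternative mentioned above could look like, assuming redis-py with one list per session and the same 200-exchange cap (not implemented in v0.5.1; the host name and key scheme are hypothetical):

import json
import redis

r = redis.Redis(host="redis", port=6379, decode_responses=True)  # hypothetical service name

def add_exchange_redis(exchange: dict, maxlen: int = 200) -> dict:
    """Shared-storage variant of add_exchange_internal(), safe for multiple workers."""
    key = f"sessions:{exchange['session_id']}:buffer"
    pipe = r.pipeline()
    pipe.rpush(key, json.dumps(exchange))   # append newest exchange
    pipe.ltrim(key, -maxlen, -1)            # keep only the last `maxlen` entries
    pipe.execute()
    return {"ok": True, "session_id": exchange["session_id"]}

def get_buffer(session_id: str) -> list[dict]:
    return [json.loads(x) for x in r.lrange(f"sessions:{session_id}:buffer", 0, -1)]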
Diagnostic Endpoints:
- GET /debug/sessions - Inspect all SESSIONS (object ID, buffer sizes, recent exchanges)
- GET /debug/summary?session_id=X - Test summarization for a session
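As an illustration of what these endpoints report, a hedged sketch of a /debug/sessions handler is shown below; the actual implementation in cortex/router.py may differ in shape:

from fastapi import APIRouter
from intake.intake import SESSIONS   # embedded module, direct import

debug_router = APIRouter()

@debug_router.get("/debug/sessions")
async def debug_sessions():
    # Expose the singleton's identity so logs can confirm a single SESSIONS object.
    return {
        "sessions_object_id": id(SESSIONS),
        "sessions": {
            sid: {
                "buffer_size": len(s["buffer"]),
                "recent_exchanges": list(s["buffer"])[-3:],
                "created_at": str(s.get("created_at")),
            }
            for sid, s in SESSIONS.items()
        },
    }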
Environment Configuration
LLM Backend Registry (Multi-Backend Strategy)
Root .env defines all backend OPTIONS:
# PRIMARY Backend (llama.cpp)
LLM_PRIMARY_PROVIDER=llama.cpp
LLM_PRIMARY_URL=http://10.0.0.44:8080
LLM_PRIMARY_MODEL=/model
# SECONDARY Backend (Ollama)
LLM_SECONDARY_PROVIDER=ollama
LLM_SECONDARY_URL=http://10.0.0.3:11434
LLM_SECONDARY_MODEL=qwen2.5:7b-instruct-q4_K_M
# CLOUD Backend (OpenAI)
LLM_OPENAI_PROVIDER=openai
LLM_OPENAI_URL=https://api.openai.com/v1
LLM_OPENAI_MODEL=gpt-4o-mini
OPENAI_API_KEY=sk-proj-...
# FALLBACK Backend
LLM_FALLBACK_PROVIDER=openai_completions
LLM_FALLBACK_URL=http://10.0.0.41:11435
LLM_FALLBACK_MODEL=llama-3.2-8b-instruct
Module-specific backend selection:
CORTEX_LLM=SECONDARY # Cortex uses Ollama
INTAKE_LLM=PRIMARY # Intake uses llama.cpp
SPEAK_LLM=OPENAI # Persona uses OpenAI
NEOMEM_LLM=PRIMARY # NeoMem uses llama.cpp
UI_LLM=OPENAI # UI uses OpenAI
RELAY_LLM=PRIMARY # Relay uses llama.cpp
Philosophy: Root .env provides all backend OPTIONS. Each service chooses which backend to USE via {MODULE}_LLM variable. This eliminates URL duplication while preserving flexibility.
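A short sketch of how this resolution can work, reading {MODULE}_LLM and the matching LLM_{NAME}_* variables (illustrative only; the real logic lives in cortex/llm/llm_router.py, and the CLOUD→OPENAI aliasing is an assumption based on the variable names above):

import os

def resolve_backend(module: str) -> dict:
    """Map a module name (e.g. 'CORTEX') to its backend settings via {MODULE}_LLM."""
    backend = os.getenv(f"{module}_LLM", "PRIMARY")          # e.g. SECONDARY
    # The cloud backend is registered under the LLM_OPENAI_* keys (assumption).
    key = "OPENAI" if backend in ("OPENAI", "CLOUD") else backend
    return {
        "name": backend,
        "provider": os.getenv(f"LLM_{key}_PROVIDER"),
        "url": os.getenv(f"LLM_{key}_URL"),
        "model": os.getenv(f"LLM_{key}_MODEL"),
    }

# Example: with CORTEX_LLM=SECONDARY this returns the Ollama settings.
print(resolve_backend("CORTEX"))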
Database Configuration
# PostgreSQL (vector storage)
POSTGRES_USER=neomem
POSTGRES_PASSWORD=neomempass
POSTGRES_DB=neomem
POSTGRES_HOST=neomem-postgres
POSTGRES_PORT=5432
# Neo4j (graph storage)
NEO4J_URI=bolt://neomem-neo4j:7687
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=neomemgraph
Service URLs (Docker Internal Network)
NEOMEM_API=http://neomem-api:7077
CORTEX_API=http://cortex:7081
CORTEX_REASON_URL=http://cortex:7081/reason
CORTEX_INGEST_URL=http://cortex:7081/ingest
RELAY_URL=http://relay:7078
Feature Flags
CORTEX_ENABLED=true
MEMORY_ENABLED=true
PERSONA_ENABLED=false
DEBUG_PROMPT=true
VERBOSE_DEBUG=true
Code Structure Overview
Cortex Service (cortex/)
Main Files:
- main.py - FastAPI app initialization
- router.py - Route definitions (/reason, /ingest, /health, /debug/*)
- context.py - Context aggregation (Intake summaries, session state)
Reasoning Pipeline (reasoning/):
- reflection.py - Meta-awareness notes (Cloud LLM)
- reasoning.py - Draft answer generation (Primary LLM)
- refine.py - Answer refinement (Primary LLM)
Persona Layer (persona/):
- speak.py - Personality application (Cloud LLM)
- identity.py - Persona loader
Intake Module (intake/):
- __init__.py - Package exports (SESSIONS, add_exchange_internal, summarize_context)
- intake.py - Core logic (367 lines)
  - SESSIONS dictionary
  - add_exchange_internal()
  - summarize_context()
  - bg_summarize() stub
LLM Integration (llm/):
- llm_router.py - Backend selector and HTTP client
  - call_llm() function
  - Environment-based routing
  - Payload formatting per backend type
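As a rough illustration of per-backend payload formatting, the sketch below dispatches a prompt according to the provider value. The request shapes follow the public Ollama and OpenAI-compatible chat APIs (which llama.cpp's server also exposes); the real call_llm() may be structured differently, and retries, streaming, and the FALLBACK openai_completions provider are omitted:

import os
import httpx

async def call_llm(backend: dict, prompt: str, temperature: float = 0.7) -> str:
    """Send a single-turn prompt to the selected backend (sketch only).

    'backend' is the dict returned by a resolver like the one sketched earlier
    (keys: provider, url, model).
    """
    provider, url, model = backend["provider"], backend["url"], backend["model"]
    messages = [{"role": "user", "content": prompt}]
    async with httpx.AsyncClient(timeout=120) as client:
        if provider == "ollama":
            r = await client.post(f"{url}/api/chat", json={
                "model": model, "messages": messages, "stream": False,
                "options": {"temperature": temperature},
            })
            return r.json()["message"]["content"]
        # llama.cpp's server and OpenAI both speak the OpenAI chat-completions schema.
        headers = {}
        if provider == "openai":
            headers["Authorization"] = f"Bearer {os.environ['OPENAI_API_KEY']}"
        base = url.rstrip("/")
        path = "/chat/completions" if base.endswith("/v1") else "/v1/chat/completions"
        r = await client.post(base + path, json={
            "model": model, "messages": messages, "temperature": temperature,
        }, headers=headers)
        return r.json()["choices"][0]["message"]["content"]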
Utilities (utils/):
- Helper functions for common operations
Configuration:
- Dockerfile - Single-worker constraint documented
- requirements.txt - Python dependencies
- .env - Service-specific overrides
Relay Service (core/relay/)
Main Files:
- server.js - Express.js server (Node.js)
  - /v1/chat/completions - OpenAI-compatible endpoint
  - /chat - Internal endpoint
  - /_health - Health check
- package.json - Node.js dependencies
Key Logic:
- Receives user messages
- Routes to Cortex /reason
- Async calls to Cortex /ingest after the response
- Returns the final answer to the user
NeoMem Service (neomem/)
Main Files:
- main.py - FastAPI app (memory API)
- memory.py - Memory management logic
- embedder.py - Embedding generation
- graph.py - Neo4j graph operations
- Dockerfile - Container definition
- requirements.txt - Python dependencies
API Endpoints:
- POST /memories - Add new memory
- POST /search - Semantic search
- GET /health - Service health
Common Development Tasks
Adding a New Endpoint to Cortex
Example: Add /debug/buffer endpoint
- Edit cortex/router.py:
@cortex_router.get("/debug/buffer")
async def debug_buffer(session_id: str, limit: int = 10):
    """Return last N exchanges from a session buffer."""
    from intake.intake import SESSIONS
    session = SESSIONS.get(session_id)
    if not session:
        return {"error": "session not found", "session_id": session_id}
    buffer = session["buffer"]
    recent = list(buffer)[-limit:]
    return {
        "session_id": session_id,
        "total_exchanges": len(buffer),
        "recent_exchanges": recent
    }
- Restart Cortex:
docker-compose restart cortex
- Test:
curl "http://localhost:7081/debug/buffer?session_id=test&limit=5"
Modifying LLM Backend for a Module
Example: Switch Cortex to use PRIMARY backend
- Edit .env:
CORTEX_LLM=PRIMARY # Change from SECONDARY to PRIMARY
- Restart Cortex:
docker-compose restart cortex
- Verify in logs:
docker logs cortex | grep "Backend"
Adding Diagnostic Logging
Example: Log every exchange addition
- Edit cortex/intake/intake.py:
def add_exchange_internal(exchange: dict):
    session_id = exchange.get("session_id")
    # Add detailed logging
    print(f"[DEBUG] Adding exchange to {session_id}")
    print(f"[DEBUG] User msg: {exchange.get('user_msg', '')[:100]}")
    print(f"[DEBUG] Assistant msg: {exchange.get('assistant_msg', '')[:100]}")
    # ... rest of function
- View logs:
docker logs cortex -f | grep DEBUG
Debugging Guide
Problem: SESSIONS Not Persisting
Symptoms:
- /debug/sessions shows empty or only 1 exchange
- Summaries always return empty
- Buffer size doesn't increase
Diagnosis Steps:
1. Check Cortex logs for the SESSIONS object ID:
   docker logs cortex | grep "SESSIONS object id"
   - Should show the same ID across all calls
   - If IDs differ → module reloading issue
2. Verify single-worker mode:
   docker exec cortex cat Dockerfile | grep uvicorn
   - Should NOT pass a --workers flag (or only --workers 1)
3. Check the /debug/sessions endpoint:
   curl http://localhost:7081/debug/sessions | jq
   - Should show sessions_object_id and the current sessions
4. Inspect that __init__.py exists:
   docker exec cortex ls -la intake/__init__.py
Solution (Fixed in v0.5.1):
- Ensure cortex/intake/__init__.py exists with proper exports
- Verify bg_summarize() is implemented (not just a TYPE_CHECKING stub)
- Check that the /ingest endpoint doesn't have an early return
- Rebuild the Cortex container:
  docker-compose build cortex && docker-compose restart cortex
Problem: LLM Backend Timeout
Symptoms:
- Cortex /reason hangs
- 504 Gateway Timeout errors
- Logs show "waiting for LLM response"
Diagnosis Steps:
1. Test the backend directly:
   # llama.cpp
   curl http://10.0.0.44:8080/health
   # Ollama
   curl http://10.0.0.3:11434/api/tags
   # OpenAI
   curl https://api.openai.com/v1/models \
     -H "Authorization: Bearer $OPENAI_API_KEY"
2. Check network connectivity:
   docker exec cortex ping -c 3 10.0.0.44
3. Review Cortex logs:
   docker logs cortex -f | grep "LLM"
Solutions:
- Verify the backend URL in .env is correct and accessible
- Check firewall rules for backend ports
- Increase the timeout in cortex/llm/llm_router.py
- Switch to a different backend temporarily: CORTEX_LLM=CLOUD
Problem: Docker Compose Won't Start
Symptoms:
- docker-compose up -d fails
- Container exits immediately
- "port already in use" errors
Diagnosis Steps:
1. Check for port conflicts:
   netstat -tulpn | grep -E '7078|7081|7077|5432'
2. Check container logs:
   docker-compose logs --tail=50
3. Verify the environment file:
   cat .env | grep -v "^#" | grep -v "^$"
Solutions:
- Stop conflicting services: docker-compose down
- Check .env syntax (no quotes unless necessary)
- Rebuild containers: docker-compose build --no-cache
- Check the Docker daemon: systemctl status docker
Testing Checklist
After Making Changes to Cortex
1. Build and restart:
docker-compose build cortex
docker-compose restart cortex
2. Verify service health:
curl http://localhost:7081/health
3. Test /ingest endpoint:
curl -X POST http://localhost:7081/ingest \
-H "Content-Type: application/json" \
-d '{
"session_id": "test",
"user_msg": "Hello",
"assistant_msg": "Hi there!"
}'
4. Verify SESSIONS updated:
curl http://localhost:7081/debug/sessions | jq '.sessions.test.buffer_size'
- Should show 1 (or increment if already populated)
5. Test summarization:
curl "http://localhost:7081/debug/summary?session_id=test" | jq '.summary'
- Should return L1/L5/L10/L20/L30 summaries
6. Test full pipeline:
curl -X POST http://localhost:7078/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"messages": [{"role": "user", "content": "Test message"}],
"session_id": "test"
}' | jq '.choices[0].message.content'
7. Check logs for errors:
docker logs cortex --tail=50
Project History & Context
Evolution Timeline
v0.1.x (2025-09-23 to 2025-09-25)
- Initial MVP: Relay + Mem0 + Ollama
- Basic memory storage and retrieval
- Simple UI with session support
v0.2.x (2025-09-24 to 2025-09-30)
- Migrated to mem0ai SDK
- Added sessionId support
- Created standalone Lyra-Mem0 stack
v0.3.x (2025-09-26 to 2025-10-28)
- Forked Mem0 → NVGRAM → NeoMem
- Added salience filtering
- Integrated Cortex reasoning VM
- Built RAG system (Beta Lyrae)
- Established multi-backend LLM support
v0.4.x (2025-11-05 to 2025-11-13)
- Major architectural rewire
- Implemented 4-stage reasoning pipeline
- Added reflection, refinement stages
- RAG integration
- LLM router with per-stage backend selection
Infrastructure v1.0.0 (2025-11-26)
- Consolidated 9 .env files into a single source of truth
- Multi-backend LLM strategy
- Docker Compose consolidation
- Created security templates
v0.5.0 (2025-11-28)
- Fixed all critical API wiring issues
- Added OpenAI-compatible Relay endpoint
- Fixed Cortex → Intake integration
- End-to-end flow verification
v0.5.1 (2025-12-11) - CURRENT
- Critical fix: SESSIONS persistence bug
- Implemented bg_summarize() stub
- Fixed /ingest unreachable code
- Added cortex/intake/__init__.py
- Added diagnostic endpoints
- Lenient error handling
- Documented single-worker constraint
Architectural Philosophy
Modular Design:
- Each service has a single, clear responsibility
- Services communicate via well-defined HTTP APIs
- Configuration is centralized but allows per-service overrides
Local-First:
- No reliance on external services (except optional OpenAI)
- All data stored locally (PostgreSQL + Neo4j)
- Can run entirely air-gapped with local LLMs
Flexible LLM Backend:
- Not tied to any single LLM provider
- Can mix local and cloud models
- Per-stage backend selection for optimal performance/cost
Error Handling:
- Lenient mode: Never fail the chat pipeline
- Log errors but continue processing
- Graceful degradation
Observability:
- Diagnostic endpoints for debugging
- Verbose logging mode
- Object ID tracking for singleton verification
Known Issues & Limitations
Fixed in v0.5.1
- ✅ Intake SESSIONS not persisting → FIXED
- ✅ bg_summarize() NameError → FIXED
- ✅ /ingest endpoint unreachable code → FIXED
Current Limitations
1. Single-Worker Constraint
- Cortex must run with single Uvicorn worker
- SESSIONS is in-memory module-level global
- Multi-worker support requires Redis or shared storage
- Documented in cortex/Dockerfile lines 7-8
2. NeoMem Integration Incomplete
- Relay doesn't yet push to NeoMem after responses
- Memory storage planned for v0.5.2
- Currently all memory is short-term (SESSIONS only)
3. RAG Service Disabled
- Beta Lyrae (RAG) commented out in docker-compose.yml
- Awaiting re-enablement after Intake stabilization
- Code exists but not currently integrated
4. Session Management
- No session cleanup/expiration
- SESSIONS grows unbounded (maxlen=200 per session, but infinite sessions)
- No session list endpoint in Relay
5. Persona Integration
- PERSONA_ENABLED=false in .env
- Persona Sidecar not fully wired
- Identity loaded but not consistently applied
Future Enhancements
Short-term (v0.5.2):
- Enable NeoMem integration in Relay
- Add session cleanup/expiration
- Session list endpoint
- NeoMem health monitoring
Medium-term (v0.6.x):
- Re-enable RAG service
- Migrate SESSIONS to Redis for multi-worker support
- Add request correlation IDs
- Comprehensive health checks
Long-term (v0.7.x+):
- Persona Sidecar full integration
- Autonomous "dream" cycles (self-reflection)
- Verifier module for factual grounding
- Advanced RAG with hybrid search
- Memory consolidation strategies
Troubleshooting Quick Reference
| Problem | Quick Check | Solution |
|---|---|---|
| SESSIONS empty | curl localhost:7081/debug/sessions | Rebuild Cortex, verify __init__.py exists |
| LLM timeout | curl http://10.0.0.44:8080/health | Check backend connectivity, increase timeout |
| Port conflict | netstat -tulpn \| grep 7078 | Stop conflicting service or change port |
| Container crash | docker logs cortex | Check logs for Python errors, verify .env syntax |
| Missing package | docker exec cortex pip list | Rebuild container, check requirements.txt |
| 502 from Relay | curl localhost:7081/health | Verify Cortex is running, check docker network |
API Reference (Quick)
Relay (Port 7078)
POST /v1/chat/completions - OpenAI-compatible chat
{
"messages": [{"role": "user", "content": "..."}],
"session_id": "..."
}
GET /_health - Service health
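Because the endpoint is OpenAI-compatible, any OpenAI-style client can point at the Relay. A minimal sketch with plain requests (session_id is Lyra's extension to the standard schema; host and port assume the default compose setup):

import requests

resp = requests.post(
    "http://localhost:7078/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": "What are we working on this week?"}],
        "session_id": "demo",
    },
    timeout=120,
)
resp.raise_for_status()
# Response follows the OpenAI chat-completions shape used in the testing checklist.
print(resp.json()["choices"][0]["message"]["content"])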
Cortex (Port 7081)
POST /reason - Main reasoning pipeline
{
"session_id": "...",
"user_prompt": "...",
"temperature": 0.7 // optional
}
POST /ingest - Add exchange to SESSIONS
{
"session_id": "...",
"user_msg": "...",
"assistant_msg": "..."
}
GET /debug/sessions - Inspect SESSIONS state
GET /debug/summary?session_id=X - Test summarization
GET /health - Service health
NeoMem (Port 7077)
POST /memories - Add memory
{
"messages": [{"role": "...", "content": "..."}],
"user_id": "...",
"metadata": {}
}
POST /search - Semantic search
{
"query": "...",
"user_id": "...",
"limit": 10
}
GET /health - Service health
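A minimal client-side sketch for exercising these endpoints with requests, using the request bodies above (field values are placeholders; the exact response shape of /search is not specified here):

import requests

NEOMEM = "http://localhost:7077"   # neomem-api published port

# Store a conversation snippet as a memory
requests.post(f"{NEOMEM}/memories", json={
    "messages": [
        {"role": "user", "content": "My cat is named Ember."},
        {"role": "assistant", "content": "Noted - Ember it is."},
    ],
    "user_id": "example-user",
    "metadata": {},
}, timeout=60).raise_for_status()

# Later: semantic search over stored memories
hits = requests.post(f"{NEOMEM}/search", json={
    "query": "what is the cat's name?",
    "user_id": "example-user",
    "limit": 10,
}, timeout=60).json()
print(hits)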
File Manifest (Key Files Only)
project-lyra/
├── .env # Root environment variables
├── docker-compose.yml # Service definitions (152 lines)
├── CHANGELOG.md # Version history (836 lines)
├── README.md # User documentation (610 lines)
├── PROJECT_SUMMARY.md # This file (AI context)
│
├── cortex/ # Reasoning engine
│ ├── Dockerfile # Single-worker constraint documented
│ ├── requirements.txt
│ ├── .env # Cortex overrides
│ ├── main.py # FastAPI initialization
│ ├── router.py # Routes (306 lines)
│ ├── context.py # Context aggregation
│ │
│ ├── intake/ # Short-term memory (embedded)
│ │ ├── __init__.py # Package exports
│ │ └── intake.py # Core logic (367 lines)
│ │
│ ├── reasoning/ # Reasoning pipeline
│ │ ├── reflection.py # Meta-awareness
│ │ ├── reasoning.py # Draft generation
│ │ └── refine.py # Refinement
│ │
│ ├── persona/ # Personality layer
│ │ ├── speak.py # Persona application
│ │ └── identity.py # Persona loader
│ │
│ └── llm/ # LLM integration
│ └── llm_router.py # Backend selector
│
├── core/relay/ # Orchestrator
│ ├── server.js # Express server (Node.js)
│ └── package.json
│
├── neomem/ # Long-term memory
│ ├── Dockerfile
│ ├── requirements.txt
│ ├── .env # NeoMem overrides
│ └── main.py # Memory API
│
└── rag/ # RAG system (disabled)
├── rag_api.py
├── rag_chat_import.py
└── chromadb/
Final Notes for AI Assistants
What You Should Know Before Making Changes
- SESSIONS is sacred - It's a module-level global in cortex/intake/intake.py. Don't move it, don't duplicate it, and don't make it a class attribute. It must remain a singleton.
- Single-worker is mandatory - Until SESSIONS is migrated to Redis, Cortex MUST run with a single Uvicorn worker. Multi-worker will cause SESSIONS to be inconsistent.
- Lenient error handling - The /ingest endpoint and other parts of the pipeline use lenient error handling: log errors but always return success. Never fail the chat pipeline.
- Backend routing is environment-driven - Don't hardcode LLM URLs. Use the {MODULE}_LLM environment variables and the llm_router.py system.
- Intake is embedded - Don't make HTTP calls to Intake. Use direct Python imports: from intake.intake import ...
- Test with diagnostic endpoints - Always use /debug/sessions and /debug/summary to verify SESSIONS behavior after changes.
- Follow the changelog format - When documenting changes, use the chronological format established in CHANGELOG.md v0.5.1. Group by version, then by change type (Fixed, Added, Changed, etc.).
When You Need Help
- SESSIONS issues: Check cortex/intake/intake.py lines 11-14 for initialization, lines 325-366 for add_exchange_internal()
- Routing issues: Check cortex/router.py lines 65-189 for /reason, lines 201-233 for /ingest
- LLM backend issues: Check cortex/llm/llm_router.py for backend selection logic
- Environment variables: Check .env lines 13-40 for LLM backends, lines 28-34 for module selection
Most Important Thing
This project values reliability over features. It's better to have a simple, working system than a complex, broken one. When in doubt, keep it simple, log everything, and never fail silently.
End of AI Context Summary
This document is maintained to provide complete context for AI assistants working on Project Lyra. Last updated: v0.5.1 (2025-12-11)