
Project Lyra — Comprehensive AI Context Summary

Version: v0.5.1 (2025-12-11)
Status: Production-ready modular AI companion system
Purpose: Memory-backed conversational AI with multi-stage reasoning, persistent context, and modular LLM backend architecture


Executive Summary

Project Lyra is a self-hosted AI companion system designed to overcome the limitations of typical chatbots by providing:

  • Persistent long-term memory (NeoMem: PostgreSQL + Neo4j graph storage)
  • Multi-stage reasoning pipeline (Cortex: reflection → reasoning → refinement → persona)
  • Short-term context management (Intake: session-based summarization embedded in Cortex)
  • Flexible LLM backend routing (supports llama.cpp, Ollama, OpenAI, custom endpoints)
  • OpenAI-compatible API (drop-in replacement for chat applications)

Core Philosophy: Just as a human brain has different regions for different functions, Lyra has specialized modules that work together. She's not just a chatbot—she's a notepad, schedule, database, co-creator, and collaborator with her own executive function.


Quick Context for AI Assistants

If you're an AI being given this project to work on, here's what you need to know:

What This Project Does

Lyra is a conversational AI system that remembers everything across sessions. When a user says something in passing, Lyra stores it, contextualizes it, and can recall it later. She can:

  • Track project progress over time
  • Remember user preferences and past conversations
  • Reason through complex questions using multiple LLM calls
  • Apply a consistent personality across all interactions
  • Integrate with multiple LLM backends (local and cloud)

Current Architecture (v0.5.1)

User → Relay (Express/Node.js, port 7078)
  ↓
Cortex (FastAPI/Python, port 7081)
  ├─ Intake module (embedded, in-memory SESSIONS)
  ├─ 4-stage reasoning pipeline
  └─ Multi-backend LLM router
  ↓
NeoMem (FastAPI/Python, port 7077)
  ├─ PostgreSQL (vector storage)
  └─ Neo4j (graph relationships)

Key Files You'll Work With

Backend Services:

  • cortex/main.py, cortex/router.py, cortex/intake/intake.py - reasoning pipeline and embedded Intake
  • core/relay/server.js - Relay orchestrator (OpenAI-compatible endpoint)
  • neomem/main.py - long-term memory API

Configuration:

  • .env (root) - backend registry, module-to-backend selection, feature flags
  • docker-compose.yml - service definitions
  • cortex/Dockerfile - documents the single-worker constraint

Documentation:

  • CHANGELOG.md - version history
  • README.md - user documentation
  • PROJECT_SUMMARY.md - this file (AI context)

Recent Critical Fixes (v0.5.1)

The most recent work fixed a critical bug where Intake's SESSIONS buffer wasn't persisting:

  1. Fixed: bg_summarize() was only a TYPE_CHECKING stub → implemented as a logging stub
  2. Fixed: /ingest endpoint had unreachable code → removed early return, added lenient error handling
  3. Added: cortex/intake/__init__.py → proper Python package structure
  4. Added: Diagnostic endpoints /debug/sessions and /debug/summary for troubleshooting

Key Insight: Intake is no longer a standalone service—it's embedded in Cortex as a Python module. SESSIONS must persist in a single Uvicorn worker (no multi-worker support without Redis).
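For reference, a single-worker launch looks like the minimal sketch below. This is illustrative only, assuming main:app is the FastAPI entry point; the repo's actual Dockerfile CMD may differ.

# Illustrative single-worker launch; not the repo's actual entrypoint.
import uvicorn

if __name__ == "__main__":
    # SESSIONS lives in process memory, so exactly one worker may serve requests.
    uvicorn.run("main:app", host="0.0.0.0", port=7081, workers=1)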


Architecture Deep Dive

Service Topology (Docker Compose)

Active Containers:

  1. relay (Node.js/Express, port 7078)

    • Entry point for all user requests
    • OpenAI-compatible /v1/chat/completions endpoint
    • Routes to Cortex for reasoning
    • Async calls to Cortex /ingest after response
  2. cortex (Python/FastAPI, port 7081)

    • Multi-stage reasoning pipeline
    • Embedded Intake module (no HTTP, direct Python imports)
    • Endpoints: /reason, /ingest, /health, /debug/sessions, /debug/summary
  3. neomem-api (Python/FastAPI, port 7077)

    • Long-term memory storage
    • Fork of Mem0 OSS (fully local, no external SDK)
    • Endpoints: /memories, /search, /health
  4. neomem-postgres (PostgreSQL + pgvector, port 5432)

    • Vector embeddings storage
    • Memory history records
  5. neomem-neo4j (Neo4j, ports 7474/7687)

    • Graph relationships between memories
    • Entity extraction and linking

Disabled Services:

  • intake - No longer needed (embedded in Cortex as of v0.5.1)
  • rag - Beta Lyrae RAG service (planned re-enablement)

External LLM Backends (HTTP APIs)

PRIMARY Backend - llama.cpp @ http://10.0.0.44:8080

  • AMD MI50 GPU-accelerated inference
  • Model: /model (path-based routing)
  • Used for: Reasoning, refinement, summarization

SECONDARY Backend - Ollama @ http://10.0.0.3:11434

  • RTX 3090 GPU-accelerated inference
  • Model: qwen2.5:7b-instruct-q4_K_M
  • Used for: whichever modules select the SECONDARY backend (configurable per module)

CLOUD Backend - OpenAI @ https://api.openai.com/v1

  • Cloud-based inference
  • Model: gpt-4o-mini
  • Used for: Reflection, persona layers

FALLBACK Backend - Local @ http://10.0.0.41:11435

  • CPU-based inference
  • Model: llama-3.2-8b-instruct
  • Used for: Emergency fallback

Data Flow (Request Lifecycle)

1. User sends message → Relay (/v1/chat/completions)
   ↓
2. Relay → Cortex (/reason)
   ↓
3. Cortex calls Intake module (internal Python)
   - Intake.summarize_context(session_id, exchanges)
   - Returns L1/L5/L10/L20/L30 summaries
   ↓
4. Cortex 4-stage pipeline:
   a. reflection.py → Meta-awareness notes (CLOUD backend)
      - "What is the user really asking?"
      - Returns JSON: {"notes": [...]}

   b. reasoning.py → Draft answer (PRIMARY backend)
      - Uses context from Intake
      - Integrates reflection notes
      - Returns draft text

   c. refine.py → Refined answer (PRIMARY backend)
      - Polishes draft for clarity
      - Ensures factual consistency
      - Returns refined text

   d. speak.py → Persona layer (CLOUD backend)
      - Applies Lyra's personality
      - Natural, conversational tone
      - Returns final answer
   ↓
5. Cortex → Relay (returns persona answer)
   ↓
6. Relay → Cortex (/ingest) [async, non-blocking]
   - Sends (session_id, user_msg, assistant_msg)
   - Cortex calls add_exchange_internal()
   - Appends to SESSIONS[session_id]["buffer"]
   ↓
7. Relay → User (returns final response)
   ↓
8. [Planned] Relay → NeoMem (/memories) [async]
   - Store conversation in long-term memory
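Conceptually, steps 3-5 above reduce to a single async handler. The following is an illustrative sketch only: the Intake import matches this document, but the stage helper names (reflect, draft, refine, speak) are placeholders, and the real signatures in cortex/router.py and the reasoning/ modules may differ.

# Illustrative sketch of the /reason flow; the real code lives in cortex/router.py.
from intake.intake import SESSIONS, summarize_context   # embedded Intake (documented import)
from reasoning.reflection import reflect                 # placeholder names; actual
from reasoning.reasoning import draft                    # function names may differ
from reasoning.refine import refine
from persona.speak import speak

async def reason(session_id: str, user_prompt: str) -> str:
    exchanges = list(SESSIONS.get(session_id, {}).get("buffer", []))
    context = await summarize_context(session_id, exchanges)    # L1/L5/L10/L20/L30 summaries
    notes = await reflect(user_prompt, context)                  # stage a: CLOUD backend
    draft_answer = await draft(user_prompt, context, notes)      # stage b: PRIMARY backend
    refined = await refine(draft_answer)                         # stage c: PRIMARY backend
    return await speak(refined)                                  # stage d: CLOUD backend (persona)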

Intake Module Architecture (v0.5.1)

Location: cortex/intake/

Key Change: Intake is now embedded in Cortex as a Python module, not a standalone service.

Import Pattern:

from intake.intake import add_exchange_internal, SESSIONS, summarize_context

Core Data Structure:

SESSIONS: dict[str, dict] = {}

# Structure:
SESSIONS[session_id] = {
    "buffer": deque(maxlen=200),  # Circular buffer of exchanges
    "created_at": datetime
}

# Each exchange in buffer:
{
    "session_id": "...",
    "user_msg": "...",
    "assistant_msg": "...",
    "timestamp": "2025-12-11T..."
}

Functions:

  1. add_exchange_internal(exchange: dict)

    • Adds exchange to SESSIONS buffer
    • Creates new session if needed
    • Calls bg_summarize() stub
    • Returns {"ok": True, "session_id": "..."}
  2. summarize_context(session_id: str, exchanges: list[dict]) [async]

    • Generates L1/L5/L10/L20/L30 summaries via LLM
    • Called during /reason endpoint
    • Returns multi-level summary dict
  3. bg_summarize(session_id: str)

    • Stub function - logs only, no actual work
    • Defers summarization to /reason call
    • Exists to prevent NameError

Critical Constraint: SESSIONS is a module-level global dict. This requires single-worker Uvicorn mode. Multi-worker deployments need Redis or shared storage.
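For orientation, the core of the module can be pictured as the sketch below. It is simplified and illustrative; the actual cortex/intake/intake.py is roughly 367 lines and differs in detail.

# Simplified sketch of cortex/intake/intake.py; illustrative, not the actual source.
from collections import deque
from datetime import datetime, timezone

# Module-level singleton: safe only because Cortex runs a single Uvicorn worker.
SESSIONS: dict[str, dict] = {}

def add_exchange_internal(exchange: dict) -> dict:
    session_id = exchange["session_id"]
    if session_id not in SESSIONS:
        SESSIONS[session_id] = {
            "buffer": deque(maxlen=200),        # circular buffer of exchanges
            "created_at": datetime.now(timezone.utc),
        }
    SESSIONS[session_id]["buffer"].append(exchange)
    bg_summarize(session_id)                    # logging stub; real work happens during /reason
    return {"ok": True, "session_id": session_id}

def bg_summarize(session_id: str) -> None:
    # Stub: summarization is deferred to the /reason call.
    print(f"[intake] bg_summarize deferred for session {session_id}")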

Diagnostic Endpoints:

  • GET /debug/sessions - Inspect all SESSIONS (object ID, buffer sizes, recent exchanges)
  • GET /debug/summary?session_id=X - Test summarization for a session

Environment Configuration

LLM Backend Registry (Multi-Backend Strategy)

Root .env defines all backend OPTIONS:

# PRIMARY Backend (llama.cpp)
LLM_PRIMARY_PROVIDER=llama.cpp
LLM_PRIMARY_URL=http://10.0.0.44:8080
LLM_PRIMARY_MODEL=/model

# SECONDARY Backend (Ollama)
LLM_SECONDARY_PROVIDER=ollama
LLM_SECONDARY_URL=http://10.0.0.3:11434
LLM_SECONDARY_MODEL=qwen2.5:7b-instruct-q4_K_M

# CLOUD Backend (OpenAI)
LLM_OPENAI_PROVIDER=openai
LLM_OPENAI_URL=https://api.openai.com/v1
LLM_OPENAI_MODEL=gpt-4o-mini
OPENAI_API_KEY=sk-proj-...

# FALLBACK Backend
LLM_FALLBACK_PROVIDER=openai_completions
LLM_FALLBACK_URL=http://10.0.0.41:11435
LLM_FALLBACK_MODEL=llama-3.2-8b-instruct

Module-specific backend selection:

CORTEX_LLM=SECONDARY      # Cortex uses Ollama
INTAKE_LLM=PRIMARY        # Intake uses llama.cpp
SPEAK_LLM=OPENAI          # Persona uses OpenAI
NEOMEM_LLM=PRIMARY        # NeoMem uses llama.cpp
UI_LLM=OPENAI             # UI uses OpenAI
RELAY_LLM=PRIMARY         # Relay uses llama.cpp

Philosophy: The root .env provides all backend OPTIONS. Each service chooses which backend to USE via its {MODULE}_LLM variable. This eliminates URL duplication while preserving flexibility.
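A resolution helper in this style might look like the following sketch (illustrative; the real selection logic lives in cortex/llm/llm_router.py and may differ):

# Illustrative env-driven backend resolution, e.g. resolve_backend("CORTEX").
import os

def resolve_backend(module: str) -> dict:
    choice = os.environ.get(f"{module}_LLM", "PRIMARY")   # e.g. CORTEX_LLM=SECONDARY
    prefix = f"LLM_{choice}"                              # registry keys: LLM_<CHOICE>_*
    return {
        "provider": os.environ[f"{prefix}_PROVIDER"],
        "url": os.environ[f"{prefix}_URL"],
        "model": os.environ[f"{prefix}_MODEL"],
    }

With the values above, resolve_backend("SPEAK") would return the OpenAI entry (LLM_OPENAI_*), while resolve_backend("CORTEX") would return the Ollama entry (LLM_SECONDARY_*).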

Database Configuration

# PostgreSQL (vector storage)
POSTGRES_USER=neomem
POSTGRES_PASSWORD=neomempass
POSTGRES_DB=neomem
POSTGRES_HOST=neomem-postgres
POSTGRES_PORT=5432

# Neo4j (graph storage)
NEO4J_URI=bolt://neomem-neo4j:7687
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=neomemgraph

Service URLs (Docker Internal Network)

NEOMEM_API=http://neomem-api:7077
CORTEX_API=http://cortex:7081
CORTEX_REASON_URL=http://cortex:7081/reason
CORTEX_INGEST_URL=http://cortex:7081/ingest
RELAY_URL=http://relay:7078

Feature Flags

CORTEX_ENABLED=true
MEMORY_ENABLED=true
PERSONA_ENABLED=false
DEBUG_PROMPT=true
VERBOSE_DEBUG=true

Code Structure Overview

Cortex Service (cortex/)

Main Files:

  • main.py - FastAPI app initialization
  • router.py - Route definitions (/reason, /ingest, /health, /debug/*)
  • context.py - Context aggregation (Intake summaries, session state)

Reasoning Pipeline (reasoning/):

  • reflection.py - Meta-awareness notes (Cloud LLM)
  • reasoning.py - Draft answer generation (Primary LLM)
  • refine.py - Answer refinement (Primary LLM)

Persona Layer (persona/):

  • speak.py - Personality application (Cloud LLM)
  • identity.py - Persona loader

Intake Module (intake/):

  • __init__.py - Package exports (SESSIONS, add_exchange_internal, summarize_context)
  • intake.py - Core logic (367 lines)
    • SESSIONS dictionary
    • add_exchange_internal()
    • summarize_context()
    • bg_summarize() stub

LLM Integration (llm/):

  • llm_router.py - Backend selector and HTTP client
    • call_llm() function
    • Environment-based routing
    • Payload formatting per backend type

Utilities (utils/):

  • Helper functions for common operations

Configuration:

  • Dockerfile - Single-worker constraint documented
  • requirements.txt - Python dependencies
  • .env - Service-specific overrides

Relay Service (core/relay/)

Main Files:

  • server.js - Express.js server (Node.js)
    • /v1/chat/completions - OpenAI-compatible endpoint
    • /chat - Internal endpoint
    • /_health - Health check
  • package.json - Node.js dependencies

Key Logic:

  • Receives user messages
  • Routes to Cortex /reason
  • Async calls to Cortex /ingest after response
  • Returns final answer to user

NeoMem Service (neomem/)

Main Files:

  • main.py - FastAPI app (memory API)
  • memory.py - Memory management logic
  • embedder.py - Embedding generation
  • graph.py - Neo4j graph operations
  • Dockerfile - Container definition
  • requirements.txt - Python dependencies

API Endpoints:

  • POST /memories - Add new memory
  • POST /search - Semantic search
  • GET /health - Service health

Common Development Tasks

Adding a New Endpoint to Cortex

Example: Add /debug/buffer endpoint

  1. Edit cortex/router.py:
@cortex_router.get("/debug/buffer")
async def debug_buffer(session_id: str, limit: int = 10):
    """Return last N exchanges from a session buffer."""
    from intake.intake import SESSIONS

    session = SESSIONS.get(session_id)
    if not session:
        return {"error": "session not found", "session_id": session_id}

    buffer = session["buffer"]
    recent = list(buffer)[-limit:]

    return {
        "session_id": session_id,
        "total_exchanges": len(buffer),
        "recent_exchanges": recent
    }
  2. Restart Cortex:
docker-compose restart cortex
  3. Test:
curl "http://localhost:7081/debug/buffer?session_id=test&limit=5"

Modifying LLM Backend for a Module

Example: Switch Cortex to use PRIMARY backend

  1. Edit .env:
CORTEX_LLM=PRIMARY  # Change from SECONDARY to PRIMARY
  2. Restart Cortex:
docker-compose restart cortex
  3. Verify in logs:
docker logs cortex | grep "Backend"

Adding Diagnostic Logging

Example: Log every exchange addition

  1. Edit cortex/intake/intake.py:
def add_exchange_internal(exchange: dict):
    session_id = exchange.get("session_id")

    # Add detailed logging
    print(f"[DEBUG] Adding exchange to {session_id}")
    print(f"[DEBUG] User msg: {exchange.get('user_msg', '')[:100]}")
    print(f"[DEBUG] Assistant msg: {exchange.get('assistant_msg', '')[:100]}")

    # ... rest of function
  2. View logs:
docker logs cortex -f | grep DEBUG

Debugging Guide

Problem: SESSIONS Not Persisting

Symptoms:

  • /debug/sessions shows empty or only 1 exchange
  • Summaries always return empty
  • Buffer size doesn't increase

Diagnosis Steps:

  1. Check Cortex logs for SESSIONS object ID:

    docker logs cortex | grep "SESSIONS object id"
    
    • Should show same ID across all calls
    • If IDs differ → module reloading issue
  2. Verify single-worker mode:

    docker exec cortex cat Dockerfile | grep uvicorn
    
    • Should either omit the --workers flag entirely or set --workers 1 (never more than one worker)
  3. Check /debug/sessions endpoint:

    curl http://localhost:7081/debug/sessions | jq
    
    • Should show sessions_object_id and current sessions
  4. Inspect __init__.py exists:

    docker exec cortex ls -la intake/__init__.py
    

Solution (Fixed in v0.5.1):

  • Ensure cortex/intake/__init__.py exists with proper exports
  • Verify bg_summarize() is implemented (not just TYPE_CHECKING stub)
  • Check /ingest endpoint doesn't have early return
  • Rebuild Cortex container: docker-compose build cortex && docker-compose restart cortex

Problem: LLM Backend Timeout

Symptoms:

  • Cortex /reason hangs
  • 504 Gateway Timeout errors
  • Logs show "waiting for LLM response"

Diagnosis Steps:

  1. Test backend directly:

    # llama.cpp
    curl http://10.0.0.44:8080/health
    
    # Ollama
    curl http://10.0.0.3:11434/api/tags
    
    # OpenAI
    curl https://api.openai.com/v1/models \
      -H "Authorization: Bearer $OPENAI_API_KEY"
    
  2. Check network connectivity:

    docker exec cortex ping -c 3 10.0.0.44
    
  3. Review Cortex logs:

    docker logs cortex -f | grep "LLM"
    

Solutions:

  • Verify backend URL in .env is correct and accessible
  • Check firewall rules for backend ports
  • Increase timeout in cortex/llm/llm_router.py
  • Switch to different backend temporarily: CORTEX_LLM=CLOUD

Problem: Docker Compose Won't Start

Symptoms:

  • docker-compose up -d fails
  • Container exits immediately
  • "port already in use" errors

Diagnosis Steps:

  1. Check port conflicts:

    netstat -tulpn | grep -E '7078|7081|7077|5432'
    
  2. Check container logs:

    docker-compose logs --tail=50
    
  3. Verify environment file:

    cat .env | grep -v "^#" | grep -v "^$"
    

Solutions:

  • Stop conflicting services: docker-compose down
  • Check .env syntax (no quotes unless necessary)
  • Rebuild containers: docker-compose build --no-cache
  • Check Docker daemon: systemctl status docker

Testing Checklist

After Making Changes to Cortex

1. Build and restart:

docker-compose build cortex
docker-compose restart cortex

2. Verify service health:

curl http://localhost:7081/health

3. Test /ingest endpoint:

curl -X POST http://localhost:7081/ingest \
  -H "Content-Type: application/json" \
  -d '{
    "session_id": "test",
    "user_msg": "Hello",
    "assistant_msg": "Hi there!"
  }'

4. Verify SESSIONS updated:

curl http://localhost:7081/debug/sessions | jq '.sessions.test.buffer_size'
  • Should show 1 (or increment if already populated)

5. Test summarization:

curl "http://localhost:7081/debug/summary?session_id=test" | jq '.summary'
  • Should return L1/L5/L10/L20/L30 summaries

6. Test full pipeline:

curl -X POST http://localhost:7078/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Test message"}],
    "session_id": "test"
  }' | jq '.choices[0].message.content'

7. Check logs for errors:

docker logs cortex --tail=50

Project History & Context

Evolution Timeline

v0.1.x (2025-09-23 to 2025-09-25)

  • Initial MVP: Relay + Mem0 + Ollama
  • Basic memory storage and retrieval
  • Simple UI with session support

v0.2.x (2025-09-24 to 2025-09-30)

  • Migrated to mem0ai SDK
  • Added sessionId support
  • Created standalone Lyra-Mem0 stack

v0.3.x (2025-09-26 to 2025-10-28)

  • Forked Mem0 → NVGRAM → NeoMem
  • Added salience filtering
  • Integrated Cortex reasoning VM
  • Built RAG system (Beta Lyrae)
  • Established multi-backend LLM support

v0.4.x (2025-11-05 to 2025-11-13)

  • Major architectural rewire
  • Implemented 4-stage reasoning pipeline
  • Added reflection, refinement stages
  • RAG integration
  • LLM router with per-stage backend selection

Infrastructure v1.0.0 (2025-11-26)

  • Consolidated 9 .env files into a single source of truth
  • Multi-backend LLM strategy
  • Docker Compose consolidation
  • Created security templates

v0.5.0 (2025-11-28)

  • Fixed all critical API wiring issues
  • Added OpenAI-compatible Relay endpoint
  • Fixed Cortex → Intake integration
  • End-to-end flow verification

v0.5.1 (2025-12-11) - CURRENT

  • Critical fix: SESSIONS persistence bug
  • Implemented bg_summarize() stub
  • Fixed /ingest unreachable code
  • Added cortex/intake/__init__.py
  • Embedded Intake in Cortex (no longer standalone)
  • Added diagnostic endpoints
  • Lenient error handling
  • Documented single-worker constraint

Architectural Philosophy

Modular Design:

  • Each service has a single, clear responsibility
  • Services communicate via well-defined HTTP APIs
  • Configuration is centralized but allows per-service overrides

Local-First:

  • No reliance on external services (except optional OpenAI)
  • All data stored locally (PostgreSQL + Neo4j)
  • Can run entirely air-gapped with local LLMs

Flexible LLM Backend:

  • Not tied to any single LLM provider
  • Can mix local and cloud models
  • Per-stage backend selection for optimal performance/cost

Error Handling:

  • Lenient mode: Never fail the chat pipeline
  • Log errors but continue processing
  • Graceful degradation

Observability:

  • Diagnostic endpoints for debugging
  • Verbose logging mode
  • Object ID tracking for singleton verification

Known Issues & Limitations

Fixed in v0.5.1

  • Intake SESSIONS not persisting → FIXED
  • bg_summarize() NameError → FIXED
  • /ingest endpoint unreachable code → FIXED

Current Limitations

1. Single-Worker Constraint

  • Cortex must run with single Uvicorn worker
  • SESSIONS is in-memory module-level global
  • Multi-worker support requires Redis or shared storage
  • Documented in cortex/Dockerfile lines 7-8

2. NeoMem Integration Incomplete

  • Relay doesn't yet push to NeoMem after responses
  • Memory storage planned for v0.5.2
  • Currently all memory is short-term (SESSIONS only)

3. RAG Service Disabled

  • Beta Lyrae (RAG) commented out in docker-compose.yml
  • Awaiting re-enablement after Intake stabilization
  • Code exists but not currently integrated

4. Session Management

  • No session cleanup/expiration
  • SESSIONS grows unbounded (each session's buffer is capped at maxlen=200, but the number of sessions is not)
  • No session list endpoint in Relay

5. Persona Integration

  • PERSONA_ENABLED=false in .env
  • Persona Sidecar not fully wired
  • Identity loaded but not consistently applied

Future Enhancements

Short-term (v0.5.2):

  • Enable NeoMem integration in Relay
  • Add session cleanup/expiration
  • Session list endpoint
  • NeoMem health monitoring

Medium-term (v0.6.x):

  • Re-enable RAG service
  • Migrate SESSIONS to Redis for multi-worker support
  • Add request correlation IDs
  • Comprehensive health checks

Long-term (v0.7.x+):

  • Persona Sidecar full integration
  • Autonomous "dream" cycles (self-reflection)
  • Verifier module for factual grounding
  • Advanced RAG with hybrid search
  • Memory consolidation strategies

Troubleshooting Quick Reference

  • SESSIONS empty
    • Quick check: curl localhost:7081/debug/sessions
    • Solution: Rebuild Cortex, verify __init__.py exists
  • LLM timeout
    • Quick check: curl http://10.0.0.44:8080/health
    • Solution: Check backend connectivity, increase timeout
  • Port conflict
    • Quick check: netstat -tulpn | grep 7078
    • Solution: Stop conflicting service or change port
  • Container crash
    • Quick check: docker logs cortex
    • Solution: Check logs for Python errors, verify .env syntax
  • Missing package
    • Quick check: docker exec cortex pip list
    • Solution: Rebuild container, check requirements.txt
  • 502 from Relay
    • Quick check: curl localhost:7081/health
    • Solution: Verify Cortex is running, check docker network

API Reference (Quick)

Relay (Port 7078)

POST /v1/chat/completions - OpenAI-compatible chat

{
  "messages": [{"role": "user", "content": "..."}],
  "session_id": "..."
}

GET /_health - Service health
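Because the endpoint is OpenAI-compatible, it can be called from any OpenAI-style client. Below is a minimal Python sketch using the requests library (assumed installed); the session_id field is Lyra-specific.

# Minimal call to Relay's OpenAI-compatible endpoint (requests assumed installed).
import requests

resp = requests.post(
    "http://localhost:7078/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": "What did I tell you about my project?"}],
        "session_id": "demo",   # Lyra-specific field: selects the short-term context in Cortex
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])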

Cortex (Port 7081)

POST /reason - Main reasoning pipeline

{
  "session_id": "...",
  "user_prompt": "...",
  "temperature": 0.7  // optional
}

POST /ingest - Add exchange to SESSIONS

{
  "session_id": "...",
  "user_msg": "...",
  "assistant_msg": "..."
}

GET /debug/sessions - Inspect SESSIONS state

GET /debug/summary?session_id=X - Test summarization

GET /health - Service health

NeoMem (Port 7077)

POST /memories - Add memory

{
  "messages": [{"role": "...", "content": "..."}],
  "user_id": "...",
  "metadata": {}
}

POST /search - Semantic search

{
  "query": "...",
  "user_id": "...",
  "limit": 10
}

GET /health - Service health
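A minimal Python client for these endpoints, using the request bodies documented above (illustrative sketch; requests assumed installed):

# Minimal NeoMem client sketch based on the documented payloads.
import requests

NEOMEM = "http://localhost:7077"

def add_memory(user_id: str, role: str, content: str) -> dict:
    payload = {"messages": [{"role": role, "content": content}], "user_id": user_id, "metadata": {}}
    return requests.post(f"{NEOMEM}/memories", json=payload, timeout=30).json()

def search_memories(user_id: str, query: str, limit: int = 10) -> dict:
    payload = {"query": query, "user_id": user_id, "limit": limit}
    return requests.post(f"{NEOMEM}/search", json=payload, timeout=30).json()

add_memory("demo-user", "user", "I prefer dark roast coffee.")
print(search_memories("demo-user", "coffee preferences"))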


File Manifest (Key Files Only)

project-lyra/
├── .env                           # Root environment variables
├── docker-compose.yml             # Service definitions (152 lines)
├── CHANGELOG.md                   # Version history (836 lines)
├── README.md                      # User documentation (610 lines)
├── PROJECT_SUMMARY.md             # This file (AI context)
│
├── cortex/                        # Reasoning engine
│   ├── Dockerfile                 # Single-worker constraint documented
│   ├── requirements.txt
│   ├── .env                       # Cortex overrides
│   ├── main.py                    # FastAPI initialization
│   ├── router.py                  # Routes (306 lines)
│   ├── context.py                 # Context aggregation
│   │
│   ├── intake/                    # Short-term memory (embedded)
│   │   ├── __init__.py           # Package exports
│   │   └── intake.py             # Core logic (367 lines)
│   │
│   ├── reasoning/                 # Reasoning pipeline
│   │   ├── reflection.py         # Meta-awareness
│   │   ├── reasoning.py          # Draft generation
│   │   └── refine.py             # Refinement
│   │
│   ├── persona/                   # Personality layer
│   │   ├── speak.py              # Persona application
│   │   └── identity.py           # Persona loader
│   │
│   └── llm/                       # LLM integration
│       └── llm_router.py         # Backend selector
│
├── core/relay/                    # Orchestrator
│   ├── server.js                 # Express server (Node.js)
│   └── package.json
│
├── neomem/                        # Long-term memory
│   ├── Dockerfile
│   ├── requirements.txt
│   ├── .env                       # NeoMem overrides
│   └── main.py                   # Memory API
│
└── rag/                           # RAG system (disabled)
    ├── rag_api.py
    ├── rag_chat_import.py
    └── chromadb/

Final Notes for AI Assistants

What You Should Know Before Making Changes

  1. SESSIONS is sacred - It's a module-level global in cortex/intake/intake.py. Don't move it, don't duplicate it, don't make it a class attribute. It must remain a singleton.

  2. Single-worker is mandatory - Until SESSIONS is migrated to Redis, Cortex MUST run with a single Uvicorn worker. Multi-worker will cause SESSIONS to be inconsistent.

  3. Lenient error handling - The /ingest endpoint and other parts of the pipeline use lenient error handling: log errors but always return success. Never fail the chat pipeline.

  4. Backend routing is environment-driven - Don't hardcode LLM URLs. Use the {MODULE}_LLM environment variables and the llm_router.py system.

  5. Intake is embedded - Don't try to make HTTP calls to Intake. Use direct Python imports: from intake.intake import ...

  6. Test with diagnostic endpoints - Always use /debug/sessions and /debug/summary to verify SESSIONS behavior after changes.

  7. Follow the changelog format - When documenting changes, use the chronological format established in CHANGELOG.md v0.5.1. Group by version, then by change type (Fixed, Added, Changed, etc.).

When You Need Help

  • SESSIONS issues: Check cortex/intake/intake.py lines 11-14 for initialization, lines 325-366 for add_exchange_internal()
  • Routing issues: Check cortex/router.py lines 65-189 for /reason, lines 201-233 for /ingest
  • LLM backend issues: Check cortex/llm/llm_router.py for backend selection logic
  • Environment variables: Check .env lines 13-40 for LLM backends, lines 28-34 for module selection

Most Important Thing

This project values reliability over features. It's better to have a simple, working system than a complex, broken one. When in doubt, keep it simple, log everything, and never fail silently.


End of AI Context Summary

This document is maintained to provide complete context for AI assistants working on Project Lyra. Last updated: v0.5.1 (2025-12-11)