
Project Lyra — Comprehensive AI Context Summary

Version: v0.5.1 (2025-12-11)
Status: Production-ready modular AI companion system
Purpose: Memory-backed conversational AI with multi-stage reasoning, persistent context, and modular LLM backend architecture


Executive Summary

Project Lyra is a self-hosted AI companion system designed to overcome the limitations of typical chatbots by providing:

  • Persistent long-term memory (NeoMem: PostgreSQL + Neo4j graph storage)
  • Multi-stage reasoning pipeline (Cortex: reflection → reasoning → refinement → persona)
  • Short-term context management (Intake: session-based summarization embedded in Cortex)
  • Flexible LLM backend routing (supports llama.cpp, Ollama, OpenAI, custom endpoints)
  • OpenAI-compatible API (drop-in replacement for chat applications)

Core Philosophy: Just as a human brain has different regions for different functions, Lyra has specialized modules that work together. She's not just a chatbot—she's a notepad, schedule, database, co-creator, and collaborator with her own executive function.


Quick Context for AI Assistants

If you're an AI being given this project to work on, here's what you need to know:

What This Project Does

Lyra is a conversational AI system that remembers everything across sessions. When a user says something in passing, Lyra stores it, contextualizes it, and can recall it later. She can:

  • Track project progress over time
  • Remember user preferences and past conversations
  • Reason through complex questions using multiple LLM calls
  • Apply a consistent personality across all interactions
  • Integrate with multiple LLM backends (local and cloud)

Current Architecture (v0.5.1)

User → Relay (Express/Node.js, port 7078)
  ↓
Cortex (FastAPI/Python, port 7081)
  ├─ Intake module (embedded, in-memory SESSIONS)
  ├─ 4-stage reasoning pipeline
  └─ Multi-backend LLM router
  ↓
NeoMem (FastAPI/Python, port 7077)
  ├─ PostgreSQL (vector storage)
  └─ Neo4j (graph relationships)

Key Files You'll Work With

Backend Services:

  • cortex/main.py, cortex/router.py, cortex/intake/intake.py - reasoning pipeline and embedded Intake
  • core/relay/server.js - Relay orchestrator (OpenAI-compatible endpoint)
  • neomem/main.py - long-term memory API

Configuration:

  • .env (root) - backend registry, module-to-backend selection, feature flags
  • docker-compose.yml - service definitions
  • cortex/Dockerfile - documents the single-worker constraint

Documentation:

  • CHANGELOG.md - version history
  • README.md - user documentation
  • PROJECT_SUMMARY.md - this file (AI context)

Recent Critical Fixes (v0.5.1)

The most recent work fixed a critical bug where Intake's SESSIONS buffer wasn't persisting:

  1. Fixed: bg_summarize() was only a TYPE_CHECKING stub → implemented as a logging stub
  2. Fixed: /ingest endpoint had unreachable code → removed early return, added lenient error handling
  3. Added: cortex/intake/__init__.py → proper Python package structure
  4. Added: Diagnostic endpoints /debug/sessions and /debug/summary for troubleshooting

Key Insight: Intake is no longer a standalone service—it's embedded in Cortex as a Python module. SESSIONS must persist in a single Uvicorn worker (no multi-worker support without Redis).
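For reference, a single-worker launch looks like the minimal sketch below. This is illustrative only, assuming main:app is the FastAPI entry point; the repo's actual Dockerfile CMD may differ.

# Illustrative single-worker launch; not the repo's actual entrypoint.
import uvicorn

if __name__ == "__main__":
    # SESSIONS lives in process memory, so exactly one worker may serve requests.
    uvicorn.run("main:app", host="0.0.0.0", port=7081, workers=1)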


Architecture Deep Dive

Service Topology (Docker Compose)

Active Containers:

  1. relay (Node.js/Express, port 7078)

    • Entry point for all user requests
    • OpenAI-compatible /v1/chat/completions endpoint
    • Routes to Cortex for reasoning
    • Async calls to Cortex /ingest after response
  2. cortex (Python/FastAPI, port 7081)

    • Multi-stage reasoning pipeline
    • Embedded Intake module (no HTTP, direct Python imports)
    • Endpoints: /reason, /ingest, /health, /debug/sessions, /debug/summary
  3. neomem-api (Python/FastAPI, port 7077)

    • Long-term memory storage
    • Fork of Mem0 OSS (fully local, no external SDK)
    • Endpoints: /memories, /search, /health
  4. neomem-postgres (PostgreSQL + pgvector, port 5432)

    • Vector embeddings storage
    • Memory history records
  5. neomem-neo4j (Neo4j, ports 7474/7687)

    • Graph relationships between memories
    • Entity extraction and linking

Disabled Services:

  • intake - No longer needed (embedded in Cortex as of v0.5.1)
  • rag - Beta Lyrae RAG service (planned re-enablement)

External LLM Backends (HTTP APIs)

PRIMARY Backend - llama.cpp @ http://10.0.0.44:8080

  • AMD MI50 GPU-accelerated inference
  • Model: /model (path-based routing)
  • Used for: Reasoning, refinement, summarization

SECONDARY Backend - Ollama @ http://10.0.0.3:11434

  • RTX 3090 GPU-accelerated inference
  • Model: qwen2.5:7b-instruct-q4_K_M
  • Used for: whichever modules select the SECONDARY backend (configurable per module)

CLOUD Backend - OpenAI @ https://api.openai.com/v1

  • Cloud-based inference
  • Model: gpt-4o-mini
  • Used for: Reflection, persona layers

FALLBACK Backend - Local @ http://10.0.0.41:11435

  • CPU-based inference
  • Model: llama-3.2-8b-instruct
  • Used for: Emergency fallback

Data Flow (Request Lifecycle)

1. User sends message → Relay (/v1/chat/completions)
   ↓
2. Relay → Cortex (/reason)
   ↓
3. Cortex calls Intake module (internal Python)
   - Intake.summarize_context(session_id, exchanges)
   - Returns L1/L5/L10/L20/L30 summaries
   ↓
4. Cortex 4-stage pipeline:
   a. reflection.py → Meta-awareness notes (CLOUD backend)
      - "What is the user really asking?"
      - Returns JSON: {"notes": [...]}

   b. reasoning.py → Draft answer (PRIMARY backend)
      - Uses context from Intake
      - Integrates reflection notes
      - Returns draft text

   c. refine.py → Refined answer (PRIMARY backend)
      - Polishes draft for clarity
      - Ensures factual consistency
      - Returns refined text

   d. speak.py → Persona layer (CLOUD backend)
      - Applies Lyra's personality
      - Natural, conversational tone
      - Returns final answer
   ↓
5. Cortex → Relay (returns persona answer)
   ↓
6. Relay → Cortex (/ingest) [async, non-blocking]
   - Sends (session_id, user_msg, assistant_msg)
   - Cortex calls add_exchange_internal()
   - Appends to SESSIONS[session_id]["buffer"]
   ↓
7. Relay → User (returns final response)
   ↓
8. [Planned] Relay → NeoMem (/memories) [async]
   - Store conversation in long-term memory
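Conceptually, steps 3-5 above reduce to a single async handler. The following is an illustrative sketch only: the Intake import matches this document, but the stage helper names (reflect, draft, refine, speak) are placeholders, and the real signatures in cortex/router.py and the reasoning/ modules may differ.

# Illustrative sketch of the /reason flow; the real code lives in cortex/router.py.
from intake.intake import SESSIONS, summarize_context   # embedded Intake (documented import)
from reasoning.reflection import reflect                 # placeholder names; actual
from reasoning.reasoning import draft                    # function names may differ
from reasoning.refine import refine
from persona.speak import speak

async def reason(session_id: str, user_prompt: str) -> str:
    exchanges = list(SESSIONS.get(session_id, {}).get("buffer", []))
    context = await summarize_context(session_id, exchanges)    # L1/L5/L10/L20/L30 summaries
    notes = await reflect(user_prompt, context)                  # stage a: CLOUD backend
    draft_answer = await draft(user_prompt, context, notes)      # stage b: PRIMARY backend
    refined = await refine(draft_answer)                         # stage c: PRIMARY backend
    return await speak(refined)                                  # stage d: CLOUD backend (persona)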

Intake Module Architecture (v0.5.1)

Location: cortex/intake/

Key Change: Intake is now embedded in Cortex as a Python module, not a standalone service.

Import Pattern:

from intake.intake import add_exchange_internal, SESSIONS, summarize_context

Core Data Structure:

SESSIONS: dict[str, dict] = {}

# Structure:
SESSIONS[session_id] = {
    "buffer": deque(maxlen=200),  # Circular buffer of exchanges
    "created_at": datetime
}

# Each exchange in buffer:
{
    "session_id": "...",
    "user_msg": "...",
    "assistant_msg": "...",
    "timestamp": "2025-12-11T..."
}

Functions:

  1. add_exchange_internal(exchange: dict)

    • Adds exchange to SESSIONS buffer
    • Creates new session if needed
    • Calls bg_summarize() stub
    • Returns {"ok": True, "session_id": "..."}
  2. summarize_context(session_id: str, exchanges: list[dict]) [async]

    • Generates L1/L5/L10/L20/L30 summaries via LLM
    • Called during /reason endpoint
    • Returns multi-level summary dict
  3. bg_summarize(session_id: str)

    • Stub function - logs only, no actual work
    • Defers summarization to /reason call
    • Exists to prevent NameError

Critical Constraint: SESSIONS is a module-level global dict. This requires single-worker Uvicorn mode. Multi-worker deployments need Redis or shared storage.
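For orientation, the core of the module can be pictured as the sketch below. It is simplified and illustrative; the actual cortex/intake/intake.py is roughly 367 lines and differs in detail.

# Simplified sketch of cortex/intake/intake.py; illustrative, not the actual source.
from collections import deque
from datetime import datetime, timezone

# Module-level singleton: safe only because Cortex runs a single Uvicorn worker.
SESSIONS: dict[str, dict] = {}

def add_exchange_internal(exchange: dict) -> dict:
    session_id = exchange["session_id"]
    if session_id not in SESSIONS:
        SESSIONS[session_id] = {
            "buffer": deque(maxlen=200),        # circular buffer of exchanges
            "created_at": datetime.now(timezone.utc),
        }
    SESSIONS[session_id]["buffer"].append(exchange)
    bg_summarize(session_id)                    # logging stub; real work happens during /reason
    return {"ok": True, "session_id": session_id}

def bg_summarize(session_id: str) -> None:
    # Stub: summarization is deferred to the /reason call.
    print(f"[intake] bg_summarize deferred for session {session_id}")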

Diagnostic Endpoints:

  • GET /debug/sessions - Inspect all SESSIONS (object ID, buffer sizes, recent exchanges)
  • GET /debug/summary?session_id=X - Test summarization for a session

Environment Configuration

LLM Backend Registry (Multi-Backend Strategy)

Root .env defines all backend OPTIONS:

# PRIMARY Backend (llama.cpp)
LLM_PRIMARY_PROVIDER=llama.cpp
LLM_PRIMARY_URL=http://10.0.0.44:8080
LLM_PRIMARY_MODEL=/model

# SECONDARY Backend (Ollama)
LLM_SECONDARY_PROVIDER=ollama
LLM_SECONDARY_URL=http://10.0.0.3:11434
LLM_SECONDARY_MODEL=qwen2.5:7b-instruct-q4_K_M

# CLOUD Backend (OpenAI)
LLM_OPENAI_PROVIDER=openai
LLM_OPENAI_URL=https://api.openai.com/v1
LLM_OPENAI_MODEL=gpt-4o-mini
OPENAI_API_KEY=sk-proj-...

# FALLBACK Backend
LLM_FALLBACK_PROVIDER=openai_completions
LLM_FALLBACK_URL=http://10.0.0.41:11435
LLM_FALLBACK_MODEL=llama-3.2-8b-instruct

Module-specific backend selection:

CORTEX_LLM=SECONDARY      # Cortex uses Ollama
INTAKE_LLM=PRIMARY        # Intake uses llama.cpp
SPEAK_LLM=OPENAI          # Persona uses OpenAI
NEOMEM_LLM=PRIMARY        # NeoMem uses llama.cpp
UI_LLM=OPENAI             # UI uses OpenAI
RELAY_LLM=PRIMARY         # Relay uses llama.cpp

Philosophy: The root .env provides all backend OPTIONS. Each service chooses which backend to USE via its {MODULE}_LLM variable. This eliminates URL duplication while preserving flexibility.
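A resolution helper in this style might look like the following sketch (illustrative; the real selection logic lives in cortex/llm/llm_router.py and may differ):

# Illustrative env-driven backend resolution, e.g. resolve_backend("CORTEX").
import os

def resolve_backend(module: str) -> dict:
    choice = os.environ.get(f"{module}_LLM", "PRIMARY")   # e.g. CORTEX_LLM=SECONDARY
    prefix = f"LLM_{choice}"                              # registry keys: LLM_<CHOICE>_*
    return {
        "provider": os.environ[f"{prefix}_PROVIDER"],
        "url": os.environ[f"{prefix}_URL"],
        "model": os.environ[f"{prefix}_MODEL"],
    }

With the values above, resolve_backend("SPEAK") would return the OpenAI entry (LLM_OPENAI_*), while resolve_backend("CORTEX") would return the Ollama entry (LLM_SECONDARY_*).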

Database Configuration

# PostgreSQL (vector storage)
POSTGRES_USER=neomem
POSTGRES_PASSWORD=neomempass
POSTGRES_DB=neomem
POSTGRES_HOST=neomem-postgres
POSTGRES_PORT=5432

# Neo4j (graph storage)
NEO4J_URI=bolt://neomem-neo4j:7687
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=neomemgraph

Service URLs (Docker Internal Network)

NEOMEM_API=http://neomem-api:7077
CORTEX_API=http://cortex:7081
CORTEX_REASON_URL=http://cortex:7081/reason
CORTEX_INGEST_URL=http://cortex:7081/ingest
RELAY_URL=http://relay:7078

Feature Flags

CORTEX_ENABLED=true
MEMORY_ENABLED=true
PERSONA_ENABLED=false
DEBUG_PROMPT=true
VERBOSE_DEBUG=true

Code Structure Overview

Cortex Service (cortex/)

Main Files:

  • main.py - FastAPI app initialization
  • router.py - Route definitions (/reason, /ingest, /health, /debug/*)
  • context.py - Context aggregation (Intake summaries, session state)

Reasoning Pipeline (reasoning/):

  • reflection.py - Meta-awareness notes (Cloud LLM)
  • reasoning.py - Draft answer generation (Primary LLM)
  • refine.py - Answer refinement (Primary LLM)

Persona Layer (persona/):

  • speak.py - Personality application (Cloud LLM)
  • identity.py - Persona loader

Intake Module (intake/):

  • __init__.py - Package exports (SESSIONS, add_exchange_internal, summarize_context)
  • intake.py - Core logic (367 lines)
    • SESSIONS dictionary
    • add_exchange_internal()
    • summarize_context()
    • bg_summarize() stub

LLM Integration (llm/):

  • llm_router.py - Backend selector and HTTP client
    • call_llm() function
    • Environment-based routing
    • Payload formatting per backend type

Utilities (utils/):

  • Helper functions for common operations

Configuration:

  • Dockerfile - Single-worker constraint documented
  • requirements.txt - Python dependencies
  • .env - Service-specific overrides

Relay Service (core/relay/)

Main Files:

  • server.js - Express.js server (Node.js)
    • /v1/chat/completions - OpenAI-compatible endpoint
    • /chat - Internal endpoint
    • /_health - Health check
  • package.json - Node.js dependencies

Key Logic:

  • Receives user messages
  • Routes to Cortex /reason
  • Async calls to Cortex /ingest after response
  • Returns final answer to user

NeoMem Service (neomem/)

Main Files:

  • main.py - FastAPI app (memory API)
  • memory.py - Memory management logic
  • embedder.py - Embedding generation
  • graph.py - Neo4j graph operations
  • Dockerfile - Container definition
  • requirements.txt - Python dependencies

API Endpoints:

  • POST /memories - Add new memory
  • POST /search - Semantic search
  • GET /health - Service health

Common Development Tasks

Adding a New Endpoint to Cortex

Example: Add /debug/buffer endpoint

  1. Edit cortex/router.py:
@cortex_router.get("/debug/buffer")
async def debug_buffer(session_id: str, limit: int = 10):
    """Return last N exchanges from a session buffer."""
    from intake.intake import SESSIONS

    session = SESSIONS.get(session_id)
    if not session:
        return {"error": "session not found", "session_id": session_id}

    buffer = session["buffer"]
    recent = list(buffer)[-limit:]

    return {
        "session_id": session_id,
        "total_exchanges": len(buffer),
        "recent_exchanges": recent
    }
  2. Restart Cortex:
docker-compose restart cortex
  3. Test:
curl "http://localhost:7081/debug/buffer?session_id=test&limit=5"

Modifying LLM Backend for a Module

Example: Switch Cortex to use PRIMARY backend

  1. Edit .env:
CORTEX_LLM=PRIMARY  # Change from SECONDARY to PRIMARY
  2. Restart Cortex:
docker-compose restart cortex
  3. Verify in logs:
docker logs cortex | grep "Backend"

Adding Diagnostic Logging

Example: Log every exchange addition

  1. Edit cortex/intake/intake.py:
def add_exchange_internal(exchange: dict):
    session_id = exchange.get("session_id")

    # Add detailed logging
    print(f"[DEBUG] Adding exchange to {session_id}")
    print(f"[DEBUG] User msg: {exchange.get('user_msg', '')[:100]}")
    print(f"[DEBUG] Assistant msg: {exchange.get('assistant_msg', '')[:100]}")

    # ... rest of function
  2. View logs:
docker logs cortex -f | grep DEBUG

Debugging Guide

Problem: SESSIONS Not Persisting

Symptoms:

  • /debug/sessions shows empty or only 1 exchange
  • Summaries always return empty
  • Buffer size doesn't increase

Diagnosis Steps:

  1. Check Cortex logs for SESSIONS object ID:

    docker logs cortex | grep "SESSIONS object id"
    
    • Should show same ID across all calls
    • If IDs differ → module reloading issue
  2. Verify single-worker mode:

    docker exec cortex cat Dockerfile | grep uvicorn
    
    • Should either omit the --workers flag entirely or set --workers 1 (never more than one worker)
  3. Check /debug/sessions endpoint:

    curl http://localhost:7081/debug/sessions | jq
    
    • Should show sessions_object_id and current sessions
  4. Inspect __init__.py exists:

    docker exec cortex ls -la intake/__init__.py
    

Solution (Fixed in v0.5.1):

  • Ensure cortex/intake/__init__.py exists with proper exports
  • Verify bg_summarize() is implemented (not just TYPE_CHECKING stub)
  • Check /ingest endpoint doesn't have early return
  • Rebuild Cortex container: docker-compose build cortex && docker-compose restart cortex

Problem: LLM Backend Timeout

Symptoms:

  • Cortex /reason hangs
  • 504 Gateway Timeout errors
  • Logs show "waiting for LLM response"

Diagnosis Steps:

  1. Test backend directly:

    # llama.cpp
    curl http://10.0.0.44:8080/health
    
    # Ollama
    curl http://10.0.0.3:11434/api/tags
    
    # OpenAI
    curl https://api.openai.com/v1/models \
      -H "Authorization: Bearer $OPENAI_API_KEY"
    
  2. Check network connectivity:

    docker exec cortex ping -c 3 10.0.0.44
    
  3. Review Cortex logs:

    docker logs cortex -f | grep "LLM"
    

Solutions:

  • Verify backend URL in .env is correct and accessible
  • Check firewall rules for backend ports
  • Increase timeout in cortex/llm/llm_router.py
  • Switch to different backend temporarily: CORTEX_LLM=CLOUD

Problem: Docker Compose Won't Start

Symptoms:

  • docker-compose up -d fails
  • Container exits immediately
  • "port already in use" errors

Diagnosis Steps:

  1. Check port conflicts:

    netstat -tulpn | grep -E '7078|7081|7077|5432'
    
  2. Check container logs:

    docker-compose logs --tail=50
    
  3. Verify environment file:

    cat .env | grep -v "^#" | grep -v "^$"
    

Solutions:

  • Stop conflicting services: docker-compose down
  • Check .env syntax (no quotes unless necessary)
  • Rebuild containers: docker-compose build --no-cache
  • Check Docker daemon: systemctl status docker

Testing Checklist

After Making Changes to Cortex

1. Build and restart:

docker-compose build cortex
docker-compose restart cortex

2. Verify service health:

curl http://localhost:7081/health

3. Test /ingest endpoint:

curl -X POST http://localhost:7081/ingest \
  -H "Content-Type: application/json" \
  -d '{
    "session_id": "test",
    "user_msg": "Hello",
    "assistant_msg": "Hi there!"
  }'

4. Verify SESSIONS updated:

curl http://localhost:7081/debug/sessions | jq '.sessions.test.buffer_size'
  • Should show 1 (or increment if already populated)

5. Test summarization:

curl "http://localhost:7081/debug/summary?session_id=test" | jq '.summary'
  • Should return L1/L5/L10/L20/L30 summaries

6. Test full pipeline:

curl -X POST http://localhost:7078/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Test message"}],
    "session_id": "test"
  }' | jq '.choices[0].message.content'

7. Check logs for errors:

docker logs cortex --tail=50

Project History & Context

Evolution Timeline

v0.1.x (2025-09-23 to 2025-09-25)

  • Initial MVP: Relay + Mem0 + Ollama
  • Basic memory storage and retrieval
  • Simple UI with session support

v0.2.x (2025-09-24 to 2025-09-30)

  • Migrated to mem0ai SDK
  • Added sessionId support
  • Created standalone Lyra-Mem0 stack

v0.3.x (2025-09-26 to 2025-10-28)

  • Forked Mem0 → NVGRAM → NeoMem
  • Added salience filtering
  • Integrated Cortex reasoning VM
  • Built RAG system (Beta Lyrae)
  • Established multi-backend LLM support

v0.4.x (2025-11-05 to 2025-11-13)

  • Major architectural rewire
  • Implemented 4-stage reasoning pipeline
  • Added reflection, refinement stages
  • RAG integration
  • LLM router with per-stage backend selection

Infrastructure v1.0.0 (2025-11-26)

  • Consolidated 9 .env files into a single source of truth
  • Multi-backend LLM strategy
  • Docker Compose consolidation
  • Created security templates

v0.5.0 (2025-11-28)

  • Fixed all critical API wiring issues
  • Added OpenAI-compatible Relay endpoint
  • Fixed Cortex → Intake integration
  • End-to-end flow verification

v0.5.1 (2025-12-11) - CURRENT

  • Critical fix: SESSIONS persistence bug
  • Implemented bg_summarize() stub
  • Fixed /ingest unreachable code
  • Added cortex/intake/__init__.py
  • Embedded Intake in Cortex (no longer standalone)
  • Added diagnostic endpoints
  • Lenient error handling
  • Documented single-worker constraint

Architectural Philosophy

Modular Design:

  • Each service has a single, clear responsibility
  • Services communicate via well-defined HTTP APIs
  • Configuration is centralized but allows per-service overrides

Local-First:

  • No reliance on external services (except optional OpenAI)
  • All data stored locally (PostgreSQL + Neo4j)
  • Can run entirely air-gapped with local LLMs

Flexible LLM Backend:

  • Not tied to any single LLM provider
  • Can mix local and cloud models
  • Per-stage backend selection for optimal performance/cost

Error Handling:

  • Lenient mode: Never fail the chat pipeline
  • Log errors but continue processing
  • Graceful degradation

Observability:

  • Diagnostic endpoints for debugging
  • Verbose logging mode
  • Object ID tracking for singleton verification

Known Issues & Limitations

Fixed in v0.5.1

  • Intake SESSIONS not persisting → FIXED
  • bg_summarize() NameError → FIXED
  • /ingest endpoint unreachable code → FIXED

Current Limitations

1. Single-Worker Constraint

  • Cortex must run with single Uvicorn worker
  • SESSIONS is in-memory module-level global
  • Multi-worker support requires Redis or shared storage
  • Documented in cortex/Dockerfile lines 7-8

2. NeoMem Integration Incomplete

  • Relay doesn't yet push to NeoMem after responses
  • Memory storage planned for v0.5.2
  • Currently all memory is short-term (SESSIONS only)

3. RAG Service Disabled

  • Beta Lyrae (RAG) commented out in docker-compose.yml
  • Awaiting re-enablement after Intake stabilization
  • Code exists but not currently integrated

4. Session Management

  • No session cleanup/expiration
  • SESSIONS grows unbounded (each session's buffer is capped at maxlen=200, but the number of sessions is not)
  • No session list endpoint in Relay

5. Persona Integration

  • PERSONA_ENABLED=false in .env
  • Persona Sidecar not fully wired
  • Identity loaded but not consistently applied

Future Enhancements

Short-term (v0.5.2):

  • Enable NeoMem integration in Relay
  • Add session cleanup/expiration
  • Session list endpoint
  • NeoMem health monitoring

Medium-term (v0.6.x):

  • Re-enable RAG service
  • Migrate SESSIONS to Redis for multi-worker support
  • Add request correlation IDs
  • Comprehensive health checks

Long-term (v0.7.x+):

  • Persona Sidecar full integration
  • Autonomous "dream" cycles (self-reflection)
  • Verifier module for factual grounding
  • Advanced RAG with hybrid search
  • Memory consolidation strategies

Troubleshooting Quick Reference

  • SESSIONS empty
    • Quick check: curl localhost:7081/debug/sessions
    • Solution: Rebuild Cortex, verify __init__.py exists
  • LLM timeout
    • Quick check: curl http://10.0.0.44:8080/health
    • Solution: Check backend connectivity, increase timeout
  • Port conflict
    • Quick check: netstat -tulpn | grep 7078
    • Solution: Stop conflicting service or change port
  • Container crash
    • Quick check: docker logs cortex
    • Solution: Check logs for Python errors, verify .env syntax
  • Missing package
    • Quick check: docker exec cortex pip list
    • Solution: Rebuild container, check requirements.txt
  • 502 from Relay
    • Quick check: curl localhost:7081/health
    • Solution: Verify Cortex is running, check docker network

API Reference (Quick)

Relay (Port 7078)

POST /v1/chat/completions - OpenAI-compatible chat

{
  "messages": [{"role": "user", "content": "..."}],
  "session_id": "..."
}

GET /_health - Service health
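Because the endpoint is OpenAI-compatible, it can be called from any OpenAI-style client. Below is a minimal Python sketch using the requests library (assumed installed); the session_id field is Lyra-specific.

# Minimal call to Relay's OpenAI-compatible endpoint (requests assumed installed).
import requests

resp = requests.post(
    "http://localhost:7078/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": "What did I tell you about my project?"}],
        "session_id": "demo",   # Lyra-specific field: selects the short-term context in Cortex
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])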

Cortex (Port 7081)

POST /reason - Main reasoning pipeline

{
  "session_id": "...",
  "user_prompt": "...",
  "temperature": 0.7  // optional
}

POST /ingest - Add exchange to SESSIONS

{
  "session_id": "...",
  "user_msg": "...",
  "assistant_msg": "..."
}

GET /debug/sessions - Inspect SESSIONS state

GET /debug/summary?session_id=X - Test summarization

GET /health - Service health

NeoMem (Port 7077)

POST /memories - Add memory

{
  "messages": [{"role": "...", "content": "..."}],
  "user_id": "...",
  "metadata": {}
}

POST /search - Semantic search

{
  "query": "...",
  "user_id": "...",
  "limit": 10
}

GET /health - Service health
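A minimal Python client for these endpoints, using the request bodies documented above (illustrative sketch; requests assumed installed):

# Minimal NeoMem client sketch based on the documented payloads.
import requests

NEOMEM = "http://localhost:7077"

def add_memory(user_id: str, role: str, content: str) -> dict:
    payload = {"messages": [{"role": role, "content": content}], "user_id": user_id, "metadata": {}}
    return requests.post(f"{NEOMEM}/memories", json=payload, timeout=30).json()

def search_memories(user_id: str, query: str, limit: int = 10) -> dict:
    payload = {"query": query, "user_id": user_id, "limit": limit}
    return requests.post(f"{NEOMEM}/search", json=payload, timeout=30).json()

add_memory("demo-user", "user", "I prefer dark roast coffee.")
print(search_memories("demo-user", "coffee preferences"))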


File Manifest (Key Files Only)

project-lyra/
├── .env                           # Root environment variables
├── docker-compose.yml             # Service definitions (152 lines)
├── CHANGELOG.md                   # Version history (836 lines)
├── README.md                      # User documentation (610 lines)
├── PROJECT_SUMMARY.md             # This file (AI context)
│
├── cortex/                        # Reasoning engine
│   ├── Dockerfile                 # Single-worker constraint documented
│   ├── requirements.txt
│   ├── .env                       # Cortex overrides
│   ├── main.py                    # FastAPI initialization
│   ├── router.py                  # Routes (306 lines)
│   ├── context.py                 # Context aggregation
│   │
│   ├── intake/                    # Short-term memory (embedded)
│   │   ├── __init__.py           # Package exports
│   │   └── intake.py             # Core logic (367 lines)
│   │
│   ├── reasoning/                 # Reasoning pipeline
│   │   ├── reflection.py         # Meta-awareness
│   │   ├── reasoning.py          # Draft generation
│   │   └── refine.py             # Refinement
│   │
│   ├── persona/                   # Personality layer
│   │   ├── speak.py              # Persona application
│   │   └── identity.py           # Persona loader
│   │
│   └── llm/                       # LLM integration
│       └── llm_router.py         # Backend selector
│
├── core/relay/                    # Orchestrator
│   ├── server.js                 # Express server (Node.js)
│   └── package.json
│
├── neomem/                        # Long-term memory
│   ├── Dockerfile
│   ├── requirements.txt
│   ├── .env                       # NeoMem overrides
│   └── main.py                   # Memory API
│
└── rag/                           # RAG system (disabled)
    ├── rag_api.py
    ├── rag_chat_import.py
    └── chromadb/

Final Notes for AI Assistants

What You Should Know Before Making Changes

  1. SESSIONS is sacred - It's a module-level global in cortex/intake/intake.py. Don't move it, don't duplicate it, don't make it a class attribute. It must remain a singleton.

  2. Single-worker is mandatory - Until SESSIONS is migrated to Redis, Cortex MUST run with a single Uvicorn worker. Multi-worker will cause SESSIONS to be inconsistent.

  3. Lenient error handling - The /ingest endpoint and other parts of the pipeline use lenient error handling: log errors but always return success. Never fail the chat pipeline.

  4. Backend routing is environment-driven - Don't hardcode LLM URLs. Use the {MODULE}_LLM environment variables and the llm_router.py system.

  5. Intake is embedded - Don't try to make HTTP calls to Intake. Use direct Python imports: from intake.intake import ...

  6. Test with diagnostic endpoints - Always use /debug/sessions and /debug/summary to verify SESSIONS behavior after changes.

  7. Follow the changelog format - When documenting changes, use the chronological format established in CHANGELOG.md v0.5.1. Group by version, then by change type (Fixed, Added, Changed, etc.).

When You Need Help

  • SESSIONS issues: Check cortex/intake/intake.py lines 11-14 for initialization, lines 325-366 for add_exchange_internal()
  • Routing issues: Check cortex/router.py lines 65-189 for /reason, lines 201-233 for /ingest
  • LLM backend issues: Check cortex/llm/llm_router.py for backend selection logic
  • Environment variables: Check .env lines 13-40 for LLM backends, lines 28-34 for module selection

Most Important Thing

This project values reliability over features. It's better to have a simple, working system than a complex, broken one. When in doubt, keep it simple, log everything, and never fail silently.


End of AI Context Summary

This document is maintained to provide complete context for AI assistants working on Project Lyra. Last updated: v0.5.1 (2025-12-11)