# Project Lyra

**A streamlined AI conversation system with intelligent summarization and memory**

Lyra is a unified conversational AI system that processes your thoughts, summarizes conversations at multiple levels, and prepares them for semantic memory storage. Think of it as your personal thought processor: you dump ideas, it makes sense of them, and stores both the raw conversation and progressive summaries.

**Current Version:** v1.0.0 (2026-02-23)

---

## Mission Statement

Project Lyra is designed to be your **external brain**. Unlike typical chatbots that forget everything, Lyra:

- **Captures** everything you say in raw form
- **Summarizes** conversations at multiple granularities (L1-L30)
- **Stores** both raw and summarized data for future retrieval
- **Prepares** everything for semantic search via vector embeddings (Nebula, coming soon)

You can vomit ideas at it, and Lyra will organize, summarize, and remember.

---

## Architecture Overview

Lyra runs as a **unified Docker container** with a clean separation of concerns:

```
┌─────────────────────────────────────────────┐
│          Unified Container (lyra)           │
│                                             │
│  ┌──────────────┐  ┌──────────────────────┐ │
│  │ Relay :7078  │  │ Cortex :7081         │ │
│  │ (Node.js)    │→ │ (Python FastAPI)     │ │
│  │              │  │                      │ │
│  │ - API Gateway│  │ - /reason (full)     │ │
│  │ - Sessions   │  │ - /simple (fast)     │ │
│  │ - OpenAI API │  │ - /ingest (intake)   │ │
│  └──────────────┘  └──────────────────────┘ │
│                            │                │
│                            ↓                │
│                     ┌──────────────┐        │
│                     │    Intake    │        │
│                     │  (embedded)  │        │
│                     │              │        │
│                     │  - L1-L30    │        │
│                     │  - Summary   │        │
│                     │  - Buffer    │        │
│                     └──────────────┘        │
│                            │                │
└────────────────────────────┼────────────────┘
                             ↓
                      ┌─────────────┐
                      │   Nebula    │ (coming soon)
                      │   (vector   │
                      │   storage)  │
                      └─────────────┘
```

### Components

**1. Relay (Node.js - Port 7078)**

- User-facing API gateway
- OpenAI-compatible endpoint: `POST /v1/chat/completions`
- Session management (save, load, rename, delete)
- Proxies requests to Cortex

**2. Cortex (Python - Port 7081)**

- Main reasoning and processing brain
- Multi-stage reasoning pipeline
- LLM routing to different backends
- Embedded Intake module

**3. Intake (Python Module - Embedded)**

- Short-term memory buffer (200 messages per session)
- Multi-level summarization:
  - **L1** (5 messages): Ultra-short summary
  - **L5** (10 messages): Short overview
  - **L10** (10 messages): "Reality Check" - tone, intent, direction
  - **L20** (merged L10s): "Session Overview" - progress and themes
  - **L30** (merged L20s): "Continuity Report" - high-level reflection
- Sends summaries to Nebula (HTTP POST with disk fallback)

**4. Nebula (Future - Port 7090)**

- Vector database for semantic memory
- RAG (Retrieval-Augmented Generation)
- Memory resurfacing based on similarity

---

## What Makes Lyra Different?

### Progressive Summarization

Most chatbots either keep raw history (expensive) or forget everything (useless). Lyra does both:

- **Raw storage**: Every conversation turn saved
- **L1-L30 summaries**: Multiple granularities for different use cases
  - L1: "What just happened?" (immediate context)
  - L10: "What's the vibe?" (tone and direction)
  - L20: "What did we accomplish?" (session overview)
  - L30: "What's the big picture?" (continuity across sessions)

### Nebula-Ready Architecture

Summaries are sent via HTTP to Nebula (when available), with automatic disk fallback:

```
.nebula_fallback/
└── {session_id}/
    ├── L10_20260223_203045.json
    ├── L20_20260223_204512.json
    └── L30_20260223_210030.json
```

### Dual Mode Operation

- **Simple Mode** (`/simple`): Fast, direct LLM responses
- **Cortex Mode** (`/reason`): Full 4-stage reasoning pipeline
  1. Reflection (meta-awareness)
  2. Reasoning (draft)
  3. Refinement (polish)
  4. Persona (Lyra's voice)

---

## Quick Start

### Prerequisites

- Docker + Docker Compose
- At least one LLM backend (llama.cpp, Ollama, OpenAI API)

### Run It

```bash
# 1. Create .env file with your LLM backend
cp .env.example .env
# Edit .env with your LLM URLs and API keys

# 2. Build and start
docker-compose up -d --build

# 3. Check health
curl http://localhost:7078/_health  # Relay
curl http://localhost:7081/_health  # Cortex

# 4. Open UI
open http://localhost:8081
```

### Test It

```bash
# Simple chat
curl -X POST http://localhost:7078/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "mode": "standard",
    "messages": [{"role": "user", "content": "Hello!"}],
    "sessionId": "test"
  }'

# Full reasoning pipeline
curl -X POST http://localhost:7078/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "mode": "cortex",
    "messages": [{"role": "user", "content": "Explain quantum computing"}],
    "sessionId": "test"
  }'
```

---

## Data Flow

### Simple Mode (Fast Path)

```
User → Relay → Cortex (/simple) → Direct LLM → Response
                    ↓
            Intake (buffer + summarize on triggers)
                    ↓
            Nebula (summaries only)
```

### Cortex Mode (Full Pipeline)

```
User → Relay → Cortex (/reason)
                    ↓
            1. Reflection (what's being asked?)
                    ↓
            2. Reasoning (draft answer)
                    ↓
            3. Refinement (polish)
                    ↓
            4. Persona (Lyra's voice)
                    ↓
            Intake (buffer + multi-level summaries)
                    ↓
            Nebula (raw + summaries)
                    ↓
            Response
```

---

## Configuration

### Environment Variables

**LLM Backends:**

```bash
# Primary backend (llama.cpp on AMD MI50)
LLM_PRIMARY_URL=http://10.0.0.44:8080
LLM_PRIMARY_MODEL=/model

# Secondary backend (Ollama on RTX 3090)
LLM_SECONDARY_URL=http://10.0.0.3:11434
LLM_SECONDARY_MODEL=qwen2.5:7b-instruct-q4_K_M

# Cloud backend (OpenAI)
LLM_OPENAI_URL=https://api.openai.com/v1
LLM_OPENAI_MODEL=gpt-4o-mini
OPENAI_API_KEY=sk-...
```

**Module-Specific Backend Selection:**

```bash
CORTEX_LLM=PRIMARY            # Reasoning engine
INTAKE_LLM=PRIMARY            # Summarization
SPEAK_LLM=OPENAI              # Persona (final voice)
STANDARD_MODE_LLM=SECONDARY   # Simple mode default
```

**Nebula Integration:**

```bash
NEBULA_API=http://localhost:7090  # When Nebula is running
NEBULA_KEY=your-api-key           # Optional auth
```

**Intake Settings:**

```bash
INTAKE_LLM=PRIMARY
SUMMARY_MAX_TOKENS=200
SUMMARY_TEMPERATURE=0.3
```

---

## API Reference

### Relay Endpoints (Port 7078)

**Chat (OpenAI-compatible):**

```bash
POST /v1/chat/completions
{
  "mode": "standard" | "cortex",
  "messages": [{"role": "user", "content": "..."}],
  "sessionId": "session-123"
}
```

**Sessions:**

```bash
GET    /sessions                # List all sessions
GET    /sessions/:id            # Get session history
POST   /sessions/:id            # Save session
PATCH  /sessions/:id/metadata   # Rename session
DELETE /sessions/:id            # Delete session
```

**Health:**

```bash
GET /_health
```

### Cortex Endpoints (Port 7081)

**Reasoning:**

```bash
POST /reason
{
  "session_id": "session-123",
  "user_prompt": "Your question here"
}
```

**Simple Mode:**

```bash
POST /simple
{
  "session_id": "session-123",
  "user_prompt": "Your question here",
  "backend": "SECONDARY"  # Optional
}
```

**Intake:**

```bash
POST /ingest
{
  "session_id": "session-123",
  "user_msg": "User message",
  "assistant_msg": "Assistant response"
}
```

**Health:**

```bash
GET /_health
```

---

## File Structure

```
project-lyra/
├── Dockerfile              # Unified container (Node + Python)
├── docker-compose.yml      # Single lyra service + UI
├── start.sh                # Startup script (Cortex → Relay)
├── .dockerignore
├── QUICKSTART.md           # Quick reference
│
├── core/
│   └── relay/              # Node.js API gateway
│       ├── server.js
│       ├── lib/
│       │   ├── cortex.js   # Cortex HTTP client
│       │   └── llm.js      # LLM routing
│       └── sessions/       # Session storage (volume)
│
├── cortex/                 # Python reasoning engine
│   ├── main.py             # FastAPI app
│   ├── router.py           # /reason, /simple, /ingest
│   ├── context.py          # Session context
│   ├── llm/
│   │   └── llm_router.py   # Multi-backend LLM routing
│   ├── intake/
│   │   └── intake.py       # Summarization module
│   ├── reasoning/
│   │   ├── reflection.py
│   │   ├── reasoning.py
│   │   └── refine.py
│   └── persona/
│       └── speak.py
│
└── .nebula_fallback/       # Disk storage until Nebula runs
    └── {session_id}/
        ├── L10_*.json
        ├── L20_*.json
        └── L30_*.json
```

---

## Roadmap

### ✅ Phase 1 (Complete)

- Unified container architecture
- Multi-level summarization (L1-L30)
- HTTP client for Nebula (with disk fallback)
- Session management
- Dual-mode operation

### 🚧 Phase 2 (In Progress)

- Build Nebula vector database
- RAG integration
- Memory resurfacing based on semantic similarity

### 📋 Phase 3 (Planned)

- Entity extraction from summaries
- Topic clustering
- Automatic knowledge graph generation
- Temporal memory (what happened when)

---

## Troubleshooting

### Container won't start

```bash
# Check logs
docker-compose logs lyra

# Common issues:
# - Missing .env file
# - Invalid LLM backend URLs
# - Port conflicts (7078, 7081)
```

### Summaries not appearing

```bash
# Check Nebula fallback directory
ls -la .nebula_fallback/

# Verify Cortex is processing
docker-compose logs lyra | grep "Nebula"
```

### Sessions not persisting

```bash
# Check volume mount
docker-compose exec lyra ls -la /app/relay/sessions/

# Verify session save calls
curl http://localhost:7078/sessions
```

---

## Development

### Making Changes

**Code changes (hot reload):**

```bash
docker-compose restart lyra
```

**Dependency changes (rebuild):**

```bash
docker-compose up -d --build lyra
```

**View logs:**

```bash
docker-compose logs -f lyra
```

### Adding a New LLM Backend

1. Add to `.env`:

   ```bash
   LLM_CUSTOM_URL=http://your-backend:port
   LLM_CUSTOM_MODEL=model-name
   ```

2. Configure module:

   ```bash
   CORTEX_LLM=CUSTOM
   ```

3. Restart:

   ```bash
   docker-compose restart lyra
   ```

---

## Version History

### v1.0.0 (2026-02-23) - The Great Simplification

**Major Refactor:**

- ✅ Unified Relay + Cortex into single container
- ✅ Removed NeoMem (replaced by upcoming Nebula)
- ✅ Removed old ingest_handler and RAG services
- ✅ Simplified to core flow: intake → summarize → store
- ✅ Added HTTP client for Nebula with disk fallback
- ✅ Cleaned docker-compose (2 services instead of 7)
- ✅ Updated documentation to reflect new architecture

**Architecture Changes:**

- Intake now sends summaries to Nebula (HTTP POST)
- Disk fallback writes JSON files to `.nebula_fallback/`
- Relay and Cortex communicate via localhost (faster)
- Single build, single deploy, single log stream

---

## License

© 2026 Terra-Mechanics / ServersDown Labs. Apache 2.0.

**Built with Claude Code**

---

## Credits

Built by Brian with assistance from Claude (Anthropic).

Special thanks to the open source community:

- FastAPI
- Express.js
- Docker
- llama.cpp
- Ollama
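
---

## Appendix: Nebula Fallback Sketch

The "HTTP POST with disk fallback" behavior described under Nebula-Ready Architecture could look roughly like the sketch below. It is not taken from `intake.py`. The function names, the `/summaries` path, the payload fields, and the `Bearer` auth header are all assumptions made for illustration. Only the `NEBULA_API` and `NEBULA_KEY` variables and the `.nebula_fallback/{session_id}/L10_YYYYMMDD_HHMMSS.json` layout come from this README.

```python
import json
import os
import urllib.request
from datetime import datetime
from pathlib import Path

# Directory name taken from the fallback layout shown in this README.
FALLBACK_DIR = Path(".nebula_fallback")


def fallback_path(session_id, level, now, root=FALLBACK_DIR):
    """Build a path such as .nebula_fallback/<session_id>/L10_20260223_203045.json."""
    return Path(root) / session_id / f"{level}_{now:%Y%m%d_%H%M%S}.json"


def send_summary(session_id, level, text, now=None, root=FALLBACK_DIR, timeout=5.0):
    """Try to POST a summary to Nebula; write it to disk if that is not possible.

    Returns "nebula" when the POST succeeds, otherwise the fallback file path.
    """
    now = now or datetime.now()
    # Hypothetical payload shape; the real schema is not documented here.
    payload = {"session_id": session_id, "level": level, "summary": text}

    api = os.environ.get("NEBULA_API")
    if api:
        headers = {"Content-Type": "application/json"}
        key = os.environ.get("NEBULA_KEY")
        if key:
            headers["Authorization"] = f"Bearer {key}"  # assumed auth scheme
        try:
            request = urllib.request.Request(
                api.rstrip("/") + "/summaries",  # hypothetical endpoint path
                data=json.dumps(payload).encode("utf-8"),
                headers=headers,
                method="POST",
            )
            with urllib.request.urlopen(request, timeout=timeout):
                return "nebula"
        except (OSError, ValueError):
            # Connection refused, timeout, HTTP error, or malformed URL:
            # fall through to the disk fallback below.
            pass

    path = fallback_path(session_id, level, now, root)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return str(path)
```

With `NEBULA_API` unset, `send_summary("test", "L10", "...")` skips the network call and writes straight to `.nebula_fallback/test/`, which matches how the README describes operation before Nebula exists.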