# Project Lyra - README v0.9.1

Lyra is a modular persistent AI companion system with advanced reasoning capabilities and autonomous decision-making. It provides memory-backed chat using **Relay** + **Cortex** with an integrated **Autonomy System**, featuring a multi-stage reasoning pipeline powered by HTTP-based LLM backends.

**NEW in v0.9.0:** Trilium Notes integration - search and create notes from conversations

**Current Version:** v0.9.1 (2025-12-29)

> **Note:** As of v0.6.0, NeoMem is **disabled by default** while we work out integration hiccups in the pipeline. The autonomy system is being refined independently before full memory integration.

## Mission Statement

Project Lyra gives an AI chatbot abilities a typical chatbot lacks. Typical chatbots are essentially amnesiac and forget everything about your project. Lyra keeps projects organized and remembers everything you have done. Think of her as a notepad, scheduler, database, co-creator, and collaborator with her own executive function: say something in passing, and Lyra remembers it and reminds you of it later.

---

## Architecture Overview

Project Lyra operates as a **single docker-compose deployment** with multiple Docker containers networked together in a microservices architecture. Just as the brain has regions, Lyra has modules:

### Core Services

**1. Relay** (Node.js/Express) - Port 7078
- Main orchestrator and message router
- Coordinates all module interactions
- OpenAI-compatible endpoint: `POST /v1/chat/completions`
- Internal endpoint: `POST /chat`
- Dual-mode routing: Standard Mode (simple chat) or Cortex Mode (full reasoning)
- Server-side session persistence with file-based storage
- Session management API: `GET/POST/PATCH/DELETE /sessions`
- Manages async calls to Cortex ingest
- *(NeoMem integration currently disabled in v0.6.0)*

**2. UI** (Static HTML) - Port 8081 (nginx)
- Browser-based chat interface with cyberpunk theme
- Mode selector (Standard/Cortex) in header
- Settings modal with backend selection and session management
- Light/Dark mode toggle (dark by default)
- **NEW in v0.8.0:** "🧠 Show Work" button for real-time thinking stream
  - Opens popup window with live SSE connection
  - Color-coded events: thinking, tool calls, results, completion
  - Auto-scrolling with animations
  - Session-aware (matches current chat session)
- Server-synced session management (persists across browsers and reboots)
- OpenAI-compatible message format

**3. NeoMem** (Python/FastAPI) - Port 7077 - **DISABLED IN v0.6.0**
- Long-term memory database (fork of Mem0 OSS)
- Vector storage (PostgreSQL + pgvector) + graph storage (Neo4j)
- RESTful API: `/memories`, `/search`
- Semantic memory updates and retrieval
- No external SDK dependencies - fully local
- **Status:** Currently disabled while pipeline integration is refined
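For orientation, this is what a request to Relay's OpenAI-compatible endpoint looks like from a client's point of view. The field names (`mode`, `backend`, `sessionId`) follow the curl tests in the Quick Start section below; the snippet is a minimal sketch, assuming Relay is exposed on `localhost:7078` as in the default compose setup.

```python
# Minimal client sketch for Relay's OpenAI-compatible endpoint.
# Field names mirror the Quick Start curl examples; this script is
# illustrative and not part of the Lyra codebase.
import requests

def chat(prompt: str, mode: str = "standard", backend: str = "SECONDARY",
         session_id: str = "test") -> dict:
    payload = {
        "mode": mode,                      # "standard" or "cortex"
        "backend": backend,                # used by Standard Mode only
        "messages": [{"role": "user", "content": prompt}],
        "sessionId": session_id,
    }
    resp = requests.post("http://localhost:7078/v1/chat/completions",
                         json=payload, timeout=120)
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    print(chat("Hello Lyra!", mode="cortex"))
```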
### Reasoning Layer

**4. Cortex** (Python/FastAPI) - Port 7081
- Primary reasoning engine with multi-stage pipeline and autonomy system
- **Includes embedded Intake module** (no separate service as of v0.5.1)
- **Integrated Autonomy System** (NEW in v0.6.0) - see the Autonomy System section below
- **Tool Calling System** (NEW in v0.8.0) - agentic execution for Standard Mode
  - Sandboxed code execution (Python, JavaScript, Bash)
  - Web search via Tavily API
  - **Trilium knowledge base integration** (NEW in v0.9.0)
  - Multi-iteration autonomous tool use (max 5 iterations)
  - Real-time thinking stream via SSE
- **Dual Operating Modes:**
  - **Standard Mode** (v0.7.0) - Simple chatbot with context retention + tool calling (v0.8.0)
    - Bypasses reflection, reasoning, refinement stages
    - Direct LLM call with conversation history
    - User-selectable backend (SECONDARY, OPENAI, or custom)
    - **NEW:** Autonomous tool calling for code execution, web search, knowledge queries
    - **NEW:** "Show Your Work" real-time thinking stream
    - Faster responses for coding and practical tasks
  - **Cortex Mode** - Full 4-stage reasoning pipeline
    1. **Reflection** - Generates meta-awareness notes about conversation
    2. **Reasoning** - Creates initial draft answer using context
    3. **Refinement** - Polishes and improves the draft
    4. **Persona** - Applies Lyra's personality and speaking style
- Integrates with Intake for short-term context via internal Python imports
- Flexible LLM router supporting multiple backends via HTTP
- **Endpoints:**
  - `POST /reason` - Main reasoning pipeline (Cortex Mode)
  - `POST /simple` - Direct LLM chat with tool calling (Standard Mode)
  - `GET /stream/thinking/{session_id}` - SSE stream for thinking events (**NEW in v0.8.0**)
  - `POST /ingest` - Receives conversation exchanges from Relay
  - `GET /health` - Service health check
  - `GET /debug/sessions` - Inspect in-memory SESSIONS state
  - `GET /debug/summary` - Test summarization for a session

**5. Intake** (Python Module) - **Embedded in Cortex**
- **No longer a standalone service** - runs as a Python module inside the Cortex container
- Short-term memory management with session-based circular buffer
- In-memory SESSIONS dictionary: `session_id → {buffer: deque(maxlen=200), created_at: timestamp}`
- Multi-level summarization (L1/L5/L10/L20/L30) produced by `summarize_context()`
- Deferred summarization - actual summary generation happens during the `/reason` call
- Internal Python API:
  - `add_exchange_internal(exchange)` - Direct function call from Cortex
  - `summarize_context(session_id, exchanges)` - Async LLM-based summarization
  - `SESSIONS` - Module-level global state (requires single Uvicorn worker)

### LLM Backends (HTTP-based)

**All LLM communication is done via HTTP APIs:**

- **PRIMARY**: llama.cpp server (`http://10.0.0.44:8080`) - AMD MI50 GPU backend
- **SECONDARY**: Ollama server (`http://10.0.0.3:11434`) - RTX 3090 backend
  - Model: qwen2.5:7b-instruct-q4_K_M
- **CLOUD**: OpenAI API (`https://api.openai.com/v1`) - Cloud-based models
  - Model: gpt-4o-mini
- **FALLBACK**: Local backup (`http://10.0.0.41:11435`) - Emergency fallback
  - Model: llama-3.2-8b-instruct

Each module can be configured to use a different backend via environment variables.
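As a rough illustration of that per-module selection, a module such as Cortex reads its backend name from an environment variable (e.g. `CORTEX_LLM=SECONDARY`) and resolves it to a URL/model pair. A minimal sketch follows; variable names come from the Environment Variables section, but the real `llm_router.py` may resolve backends differently.

```python
# Hedged sketch of env-driven backend resolution. Variable names follow the
# Environment Variables section; the actual llm_router.py may differ.
import os

BACKENDS = {
    "PRIMARY":   ("LLM_PRIMARY_URL",   "LLM_PRIMARY_MODEL"),
    "SECONDARY": ("LLM_SECONDARY_URL", "LLM_SECONDARY_MODEL"),
    "OPENAI":    ("LLM_OPENAI_URL",    "LLM_OPENAI_MODEL"),
}

def resolve_backend(module: str = "CORTEX") -> tuple[str, str]:
    """Return (base_url, model) for a module, e.g. CORTEX_LLM=SECONDARY."""
    name = os.getenv(f"{module}_LLM", "PRIMARY").upper()
    url_var, model_var = BACKENDS.get(name, BACKENDS["PRIMARY"])
    return os.environ[url_var], os.environ[model_var]

# Example: with CORTEX_LLM=SECONDARY this yields the Ollama URL and qwen2.5 model.
```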
### Autonomy System (NEW in v0.6.0)

**Cortex Autonomy Subsystems** - Multi-layered autonomous decision-making and learning

- **Executive Layer** [cortex/autonomy/executive/](cortex/autonomy/executive/)
  - High-level planning and goal setting
  - Multi-step reasoning for complex objectives
  - Strategic decision making
- **Decision Engine** [cortex/autonomy/tools/decision_engine.py](cortex/autonomy/tools/decision_engine.py)
  - Autonomous decision-making framework
  - Option evaluation and selection
  - Coordinated decision orchestration
- **Autonomous Actions** [cortex/autonomy/actions/](cortex/autonomy/actions/)
  - Self-initiated action execution
  - Context-aware behavior implementation
  - Action logging and tracking
- **Pattern Learning** [cortex/autonomy/learning/](cortex/autonomy/learning/)
  - Learns from interaction patterns
  - Identifies recurring user needs
  - Adaptive behavior refinement
- **Proactive Monitoring** [cortex/autonomy/proactive/](cortex/autonomy/proactive/)
  - System state monitoring
  - Intervention opportunity detection
  - Background awareness capabilities
- **Self-Analysis** [cortex/autonomy/self/](cortex/autonomy/self/)
  - Performance tracking and analysis
  - Cognitive pattern identification
  - Self-state persistence in [cortex/data/self_state.json](cortex/data/self_state.json)
- **Orchestrator** [cortex/autonomy/tools/orchestrator.py](cortex/autonomy/tools/orchestrator.py)
  - Coordinates all autonomy subsystems
  - Manages tool selection and execution
  - Handles external integrations (with enable/disable controls)

**Autonomy Architecture:**

The autonomy system operates in coordinated layers, all maintaining state in `self_state.json`:

1. Executive Layer → Planning and goals
2. Decision Layer → Evaluation and choices
3. Action Layer → Execution
4. Learning Layer → Pattern adaptation
5. Monitoring Layer → Proactive awareness

---

## Data Flow Architecture (v0.7.0)

### Standard Mode Flow (NEW in v0.7.0):

```
User (UI) → POST /v1/chat/completions {mode: "standard", backend: "SECONDARY"}
    ↓
Relay (7078)
    ↓ POST /simple
Cortex (7081)
    ↓ (internal Python call)
Intake module → get_recent_messages() (last 20 messages)
    ↓
Direct LLM call (user-selected backend: SECONDARY/OPENAI/custom)
    ↓
Returns simple response to Relay
    ↓
Relay → POST /ingest (async)
    ↓
Cortex → add_exchange_internal() → SESSIONS buffer
    ↓
Relay → POST /sessions/:id (save session to file)
    ↓
Relay → UI (returns final response)

Note: Bypasses reflection, reasoning, refinement, persona stages
```
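The `add_exchange_internal() → SESSIONS` step in the diagram above is an in-process call, not an HTTP hop. Below is a simplified sketch of that buffering structure, assuming the exchange payload fields documented for `POST /ingest` (`session_id`, `user_msg`, `assistant_msg`); the real module in `cortex/intake/` may differ in detail.

```python
# Simplified sketch of the Intake session buffer described above;
# the actual implementation in cortex/intake/ may differ.
import time
from collections import deque

# session_id → {"buffer": deque(maxlen=200), "created_at": timestamp}
SESSIONS: dict[str, dict] = {}

def add_exchange_internal(exchange: dict) -> None:
    """Append a user/assistant exchange to the session's circular buffer."""
    session_id = exchange["session_id"]
    session = SESSIONS.setdefault(
        session_id, {"buffer": deque(maxlen=200), "created_at": time.time()}
    )
    session["buffer"].append(
        {"user": exchange["user_msg"], "assistant": exchange["assistant_msg"]}
    )

def get_recent_messages(session_id: str, limit: int = 20) -> list[dict]:
    """Return the most recent exchanges for Standard Mode context."""
    buffer = SESSIONS.get(session_id, {}).get("buffer", deque())
    return list(buffer)[-limit:]
```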
### Cortex Mode Flow (Full Reasoning):

```
User (UI) → POST /v1/chat/completions {mode: "cortex"}
    ↓
Relay (7078)
    ↓ POST /reason
Cortex (7081)
    ↓ (internal Python call)
Intake module → summarize_context()
    ↓
Autonomy System → Decision evaluation & pattern learning
    ↓
Cortex processes (4 stages):
  1. reflection.py → meta-awareness notes (CLOUD backend)
  2. reasoning.py → draft answer (PRIMARY backend, autonomy-aware)
  3. refine.py → refined answer (PRIMARY backend)
  4. persona/speak.py → Lyra personality (CLOUD backend, autonomy-aware)
    ↓
Returns persona answer to Relay
    ↓
Relay → POST /ingest (async)
    ↓
Cortex → add_exchange_internal() → SESSIONS buffer
    ↓
Autonomy System → Update self_state.json (pattern tracking)
    ↓
Relay → POST /sessions/:id (save session to file)
    ↓
Relay → UI (returns final response)

Note: NeoMem integration disabled in v0.6.0
```

### Session Persistence Flow (NEW in v0.7.0):

```
UI loads → GET /sessions → Relay → List all sessions from files → UI dropdown
User sends message → POST /sessions/:id → Relay → Save to sessions/*.json
User renames session → PATCH /sessions/:id/metadata → Relay → Update *.meta.json
User deletes session → DELETE /sessions/:id → Relay → Remove session files

Sessions stored in: core/relay/sessions/
- {sessionId}.json (conversation history)
- {sessionId}.meta.json (name, timestamps, metadata)
```

### Cortex 4-Stage Reasoning Pipeline:

1. **Reflection** (`reflection.py`) - Cloud LLM (OpenAI)
   - Analyzes user intent and conversation context
   - Generates meta-awareness notes
   - "What is the user really asking?"
2. **Reasoning** (`reasoning.py`) - Primary LLM (llama.cpp)
   - Retrieves short-term context from Intake module
   - Creates initial draft answer
   - Integrates context, reflection notes, and user prompt
3. **Refinement** (`refine.py`) - Primary LLM (llama.cpp)
   - Polishes the draft answer
   - Improves clarity and coherence
   - Ensures factual consistency
4. **Persona** (`speak.py`) - Cloud LLM (OpenAI)
   - Applies Lyra's personality and speaking style
   - Natural, conversational output
   - Final answer returned to user
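Conceptually, the four stages are sequential async LLM calls, each feeding its output to the next. The sketch below is purely illustrative: the stage functions and prompts are hypothetical stand-ins, not the actual signatures in `reflection.py`, `reasoning.py`, `refine.py`, or `speak.py`.

```python
# Conceptual sketch of the 4-stage pipeline; function names and prompts are
# hypothetical, not the project's actual code.
import asyncio

async def call_llm(backend: str, prompt: str) -> str:
    # Placeholder for the HTTP LLM router call (see LLM Backends section).
    return f"[{backend} response to: {prompt[:40]}...]"

async def run_pipeline(user_prompt: str, context_summary: str) -> str:
    # 1. Reflection (CLOUD): what is the user really asking?
    notes = await call_llm("OPENAI", f"Reflect on: {user_prompt}")
    # 2. Reasoning (PRIMARY): draft answer from context + reflection notes
    draft = await call_llm("PRIMARY", f"{context_summary}\n{notes}\n{user_prompt}")
    # 3. Refinement (PRIMARY): polish the draft
    refined = await call_llm("PRIMARY", f"Refine this draft:\n{draft}")
    # 4. Persona (CLOUD): apply Lyra's voice
    return await call_llm("OPENAI", f"Answer in Lyra's voice:\n{refined}")

if __name__ == "__main__":
    print(asyncio.run(run_pipeline("What changed in v0.9.0?", "(context summary)")))
```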
---

## Features

### Core Services

**Relay**:
- Main orchestrator and message router
- OpenAI-compatible endpoint: `POST /v1/chat/completions`
- Internal endpoint: `POST /chat`
- Health check: `GET /_health`
- **NEW:** Dual-mode routing (Standard/Cortex)
- **NEW:** Server-side session persistence with CRUD API
- **NEW:** Session management endpoints:
  - `GET /sessions` - List all sessions
  - `GET /sessions/:id` - Retrieve session history
  - `POST /sessions/:id` - Save session history
  - `PATCH /sessions/:id/metadata` - Update session metadata
  - `DELETE /sessions/:id` - Delete session
- Async non-blocking calls to Cortex
- Shared request handler for code reuse
- Comprehensive error handling

**NeoMem (Memory Engine)**:
- Forked from Mem0 OSS - fully independent
- Drop-in compatible API (`/memories`, `/search`)
- Local-first: runs on FastAPI with Postgres + Neo4j
- No external SDK dependencies
- Semantic memory updates - compares embeddings and performs in-place updates
- Default service: `neomem-api` (port 7077)

**UI**:
- Lightweight static HTML chat interface
- Cyberpunk theme with light/dark mode toggle
- **NEW:** Mode selector (Standard/Cortex) in header
- **NEW:** Settings modal (⚙ button) with:
  - Backend selection for Standard Mode (SECONDARY/OPENAI/custom)
  - Session management (view, delete sessions)
  - Theme toggle (dark mode default)
- **NEW:** Server-synced session management
  - Sessions persist across browsers and reboots
  - Rename sessions with custom names
  - Delete sessions with confirmation
  - Automatic session save on every message
- OpenAI message format support

### Reasoning Layer

**Cortex** (v0.7.0):
- **NEW:** Dual operating modes:
  - **Standard Mode** - Simple chat with context (`/simple` endpoint)
    - User-selectable backend (SECONDARY, OPENAI, or custom)
    - Full conversation history via Intake integration
    - Bypasses reasoning pipeline for faster responses
  - **Cortex Mode** - Full reasoning pipeline (`/reason` endpoint)
    - Multi-stage processing: reflection → reasoning → refine → persona
    - Per-stage backend selection
    - Autonomy system integration
- Flexible LLM backend routing via HTTP
- Async processing throughout
- Embedded Intake module for short-term context
- `/reason`, `/simple`, `/ingest`, `/health`, `/debug/sessions`, `/debug/summary` endpoints
- Lenient error handling - never fails the chat pipeline

**Intake** (Embedded Module):
- **Architectural change**: Now runs as a Python module inside the Cortex container
- In-memory SESSIONS management (session_id → buffer)
- Multi-level summarization: L1 (ultra-short), L5 (short), L10 (medium), L20 (detailed), L30 (full)
- Deferred summarization strategy - summaries generated during the `/reason` call
- `bg_summarize()` is a logging stub - actual work deferred
- **Single-worker constraint**: SESSIONS requires a single Uvicorn worker or Redis/shared storage

**LLM Router**:
- Dynamic backend selection via HTTP
- Environment-driven configuration
- Support for llama.cpp, Ollama, OpenAI, custom endpoints
- Per-module backend preferences:
  - `CORTEX_LLM=SECONDARY` (Ollama for reasoning)
  - `INTAKE_LLM=PRIMARY` (llama.cpp for summarization)
  - `SPEAK_LLM=OPENAI` (Cloud for persona)
  - `NEOMEM_LLM=PRIMARY` (llama.cpp for memory operations)

### Beta Lyrae (RAG Memory DB) - Currently Disabled

**RAG Knowledge DB - Beta Lyrae (sheliak)**
- This module implements the **Retrieval-Augmented Generation (RAG)** layer for Project Lyra.
- It serves as the long-term searchable memory store that Cortex and Relay can query for relevant context before reasoning or response generation.
- **Status**: Disabled in docker-compose.yml (v0.5.1)

The system uses:
- **ChromaDB** for persistent vector storage
- **OpenAI Embeddings (`text-embedding-3-small`)** for semantic similarity
- **FastAPI** (port 7090) for the `/rag/search` REST endpoint

Directory Layout:

```
rag/
├── rag_chat_import.py   # imports JSON chat logs
├── rag_docs_import.py   # (planned) PDF/EPUB/manual importer
├── rag_build.py         # legacy single-folder builder
├── rag_query.py         # command-line query helper
├── rag_api.py           # FastAPI service providing /rag/search
├── chromadb/            # persistent vector store
├── chatlogs/            # organized source data
│   ├── poker/
│   ├── work/
│   ├── lyra/
│   ├── personal/
│   └── ...
└── import.log           # progress log for batch runs
```

**OpenAI chatlog importer features:**
- Recursive folder indexing with **category detection** from directory name
- Smart chunking for long messages (5,000 chars per slice)
- Automatic deduplication using SHA-1 hash of file + chunk
- Timestamps for both file modification and import time
- Full progress logging via tqdm
- Safe to run in background with `nohup … &`
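The chunking-plus-dedup approach is simple: long messages are sliced into fixed-size chunks and each chunk is keyed by a hash of its source file and content, so re-running the importer skips anything already indexed. A minimal sketch of that idea follows; the actual `rag_chat_import.py` may differ.

```python
# Illustrative sketch of the importer's chunking + SHA-1 dedup idea;
# the real rag_chat_import.py implementation may differ.
import hashlib

CHUNK_SIZE = 5_000  # chars per slice, as described above

def chunk_message(text: str, size: int = CHUNK_SIZE) -> list[str]:
    """Slice a long message into fixed-size chunks."""
    return [text[i:i + size] for i in range(0, len(text), size)] or [""]

def chunk_id(file_path: str, chunk: str) -> str:
    """Stable dedup key: SHA-1 over source file path + chunk content."""
    return hashlib.sha1(f"{file_path}\n{chunk}".encode("utf-8")).hexdigest()

seen: set[str] = set()  # in practice this would be the IDs already in ChromaDB

def import_chunks(file_path: str, text: str) -> list[tuple[str, str]]:
    """Return (id, chunk) pairs that are not already indexed."""
    new = []
    for chunk in chunk_message(text):
        cid = chunk_id(file_path, chunk)
        if cid not in seen:
            seen.add(cid)
            new.append((cid, chunk))
    return new
```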
---

## Docker Deployment

All services run in a single docker-compose stack with the following containers:

**Active Services:**
- **relay** - Main orchestrator (port 7078)
- **cortex** - Reasoning engine with embedded Intake and Autonomy System (port 7081)

**Disabled Services (v0.6.0):**
- **neomem-postgres** - PostgreSQL with pgvector extension (port 5432) - *disabled while refining pipeline*
- **neomem-neo4j** - Neo4j graph database (ports 7474, 7687) - *disabled while refining pipeline*
- **neomem-api** - NeoMem memory service (port 7077) - *disabled while refining pipeline*
- **intake** - No longer needed (embedded in Cortex as of v0.5.1)
- **rag** - Beta Lyrae RAG service (port 7090) - currently disabled

All containers communicate via the `lyra_net` Docker bridge network.

## External LLM Services

The following LLM backends are accessed via HTTP (not part of docker-compose):

- **llama.cpp Server** (`http://10.0.0.44:8080`)
  - AMD MI50 GPU-accelerated inference
  - Primary backend for reasoning and refinement stages
  - Model path: `/model`
- **Ollama Server** (`http://10.0.0.3:11434`)
  - RTX 3090 GPU-accelerated inference
  - Secondary/configurable backend
  - Model: qwen2.5:7b-instruct-q4_K_M
- **OpenAI API** (`https://api.openai.com/v1`)
  - Cloud-based inference
  - Used for reflection and persona stages
  - Model: gpt-4o-mini
- **Fallback Server** (`http://10.0.0.41:11435`)
  - Emergency backup endpoint
  - Local llama-3.2-8b-instruct model

---

## Version History

### v0.9.0 (2025-12-29)

**Major Feature: Trilium Notes Integration**
- ✅ Added Trilium ETAPI integration for knowledge base access
- ✅ `search_notes()` tool for searching personal notes during conversations
- ✅ `create_note()` tool for capturing insights and information
- ✅ ETAPI authentication with secure token management
- ✅ Complete setup documentation and API reference
- ✅ Environment configuration with feature flag (`ENABLE_TRILIUM`)
- ✅ Automatic parent note handling (defaults to "root")
- ✅ Connection error handling and user-friendly messages

**Key Capabilities:**
- Search your Trilium notes during conversations for context
- Create new notes from conversation insights automatically
- Cross-reference information between chat and knowledge base
- Future: Find duplicates, suggest organization, summarize notes

**Documentation:**
- Added [TRILIUM_SETUP.md](TRILIUM_SETUP.md) - Complete setup guide
- Added [docs/TRILIUM_API.md](docs/TRILIUM_API.md) - Full API reference

### v0.8.0 (2025-12-26)

**Major Feature: Agentic Tool Calling + "Show Your Work"**
- ✅ Added tool calling system for Standard Mode
- ✅ Real-time thinking stream visualization
- ✅ Sandboxed code execution (Python, JavaScript, Bash)
- ✅ Web search integration via Tavily API
- ✅ Server-Sent Events (SSE) for live tool execution updates

### v0.7.0 (2025-12-21)

**Major Features: Standard Mode + Backend Selection + Session Persistence**
- ✅ Added Standard Mode for simple chatbot functionality
- ✅ UI mode selector (Standard/Cortex) in header
- ✅ Settings modal with backend selection for Standard Mode
- ✅ Server-side session persistence with file-based storage
- ✅ Session management UI (view, rename, delete sessions)
- ✅ Light/Dark mode toggle (dark by default)
- ✅ Context retention in Standard Mode via Intake integration
- ✅ Fixed modal positioning and z-index issues
- ✅ Cortex `/simple` endpoint for direct LLM calls
- ✅ Session CRUD API in Relay
- ✅ Full backward compatibility - Cortex Mode unchanged

**Key Changes:**
- Standard Mode bypasses 6 of 7 reasoning stages for faster responses
- Sessions now sync across browsers and survive container restarts
- User can select SECONDARY (Ollama), OPENAI, or custom backend for Standard Mode
- Theme preference and backend selection persisted in localStorage
- Session files stored in `core/relay/sessions/` directory

### v0.6.0 (2025-12-18)

**Major Feature: Autonomy System (Phase 1, 2, and 2.5)**
- ✅ Added autonomous decision-making framework
- ✅ Implemented executive planning and goal-setting layer
- ✅ Added pattern learning system for adaptive behavior
- ✅ Implemented proactive monitoring capabilities
- ✅ Created self-analysis and performance tracking system
- ✅ Integrated self-state persistence (`cortex/data/self_state.json`)
- ✅ Built decision engine with orchestrator coordination
- ✅ Added autonomous action execution framework
- ✅ Integrated autonomy into reasoning and persona layers
- ✅ Created comprehensive test suites for autonomy features
- ✅ Added complete system breakdown documentation

**Architecture Changes:**
- Autonomy system integrated into Cortex reasoning pipeline
- Multi-layered autonomous decision-making architecture
- Self-state tracking across sessions
- NeoMem disabled by default while refining pipeline integration
- Enhanced orchestrator with flexible service controls

**Documentation:**
- Added [PROJECT_LYRA_COMPLETE_BREAKDOWN.md](docs/PROJECT_LYRA_COMPLETE_BREAKDOWN.md)
- Updated changelog with comprehensive autonomy system details

### v0.5.1 (2025-12-11)

**Critical Intake Integration Fixes:**
- ✅ Fixed `bg_summarize()` NameError preventing SESSIONS persistence
- ✅ Fixed `/ingest` endpoint unreachable code
- ✅ Added `cortex/intake/__init__.py` for proper package structure
- ✅ Added diagnostic logging to verify SESSIONS singleton behavior
- ✅ Added `/debug/sessions` and `/debug/summary` endpoints
- ✅ Documented single-worker constraint in Dockerfile
- ✅ Implemented lenient error handling (never fails chat pipeline)
- ✅ Intake now embedded in Cortex - no longer a standalone service

**Architecture Changes:**
- Intake module runs inside the Cortex container as a pure Python import
- No HTTP calls between Cortex and Intake (internal function calls)
- SESSIONS persist correctly in the Uvicorn worker
- Deferred summarization strategy (summaries generated during `/reason`)

### v0.5.0 (2025-11-28)
- ✅ Fixed all critical API wiring issues
- ✅ Added OpenAI-compatible endpoint to Relay (`/v1/chat/completions`)
- ✅ Fixed Cortex → Intake integration
- ✅ Added missing Python package `__init__.py` files
- ✅ End-to-end message flow verified and working

### Infrastructure v1.0.0 (2025-11-26)
- Consolidated 9 scattered `.env` files into a single source of truth
- Multi-backend LLM strategy implemented
- Docker Compose consolidation
- Created `.env.example` security templates

### v0.4.x (Major Rewire)
- Cortex multi-stage reasoning pipeline
- LLM router with multi-backend support
- Major architectural restructuring

### v0.3.x
- Beta Lyrae RAG system
- NeoMem integration
- Basic Cortex reasoning loop

---

## Known Issues (v0.7.0)

### Temporarily Disabled
- **NeoMem disabled by default** - Being refined independently before full integration
  - PostgreSQL + pgvector storage inactive
  - Neo4j graph database inactive
  - Memory persistence endpoints not active
- RAG service (Beta Lyrae) currently disabled in docker-compose.yml

### Standard Mode Limitations
- No reflection, reasoning, or refinement stages (by design)
- DeepSeek R1 not recommended for Standard Mode (generates reasoning artifacts)
- No RAG integration (same as Cortex Mode - currently disabled)
- No NeoMem memory storage (same as Cortex Mode - currently disabled)

### Session Management Limitations
- Sessions stored in the container filesystem - requires a volume mount for true persistence
- No session import/export functionality yet
- No session search or filtering
- Old localStorage sessions don't automatically migrate to the server

### Operational Notes
- **Single-worker constraint**: Cortex must run with a single Uvicorn worker to maintain SESSIONS state
  - Multi-worker scaling requires migrating SESSIONS to Redis or shared storage
- Diagnostic endpoints (`/debug/sessions`, `/debug/summary`) available for troubleshooting
- Backend selection only affects Standard Mode - Cortex Mode uses environment-configured backends

### Future Enhancements
- Re-enable NeoMem integration after pipeline refinement
- Full autonomy system maturation and optimization
- Re-enable RAG service integration
- Session import/export functionality
- Session search and filtering UI
- Migrate SESSIONS to Redis for multi-worker support
- Add request correlation IDs for tracing
- Comprehensive health checks across all services
- Enhanced pattern learning with long-term memory integration

---

## Quick Start

### Prerequisites
- Docker + Docker Compose
- At least one HTTP-accessible LLM endpoint (llama.cpp, Ollama, or an OpenAI API key)

### Setup

1. Copy `.env.example` to `.env` and configure your LLM backend URLs and API keys:

```bash
# Required: Configure at least one LLM backend
LLM_PRIMARY_URL=http://10.0.0.44:8080    # llama.cpp
LLM_SECONDARY_URL=http://10.0.0.3:11434  # Ollama
OPENAI_API_KEY=sk-...                    # OpenAI
```

2. Start all services with docker-compose:

```bash
docker-compose up -d
```

3. Check service health:

```bash
# Relay health
curl http://localhost:7078/_health

# Cortex health
curl http://localhost:7081/health

# NeoMem health (only if NeoMem is enabled)
curl http://localhost:7077/health
```

4. Access the UI at `http://localhost:8081`

### Using the UI

**Mode Selection:**
- Use the **Mode** dropdown in the header to switch between:
  - **Standard** - Simple chatbot for coding and practical tasks
  - **Cortex** - Full reasoning pipeline with autonomy features

**Settings Menu:**
1. Click the **⚙ Settings** button in the header
2. **Backend Selection** (Standard Mode only):
   - Choose **SECONDARY** (Ollama/Qwen on 3090) - fast, local
   - Choose **OPENAI** (GPT-4o-mini) - cloud-based, high quality
   - Enter a custom backend name for advanced configurations
3. **Session Management**:
   - View all saved sessions with message counts and timestamps
   - Click 🗑️ to delete unwanted sessions
4. **Theme Toggle**:
   - Click **🌙 Dark Mode** or **☀️ Light Mode** to switch themes

**Session Management:**
- Sessions automatically save on every message
- Use the **Session** dropdown to switch between sessions
- Click **➕ New** to create a new session
- Click **✏️ Rename** to rename the current session
- Sessions persist across browsers and container restarts

### Test

**Test Standard Mode:**

```bash
curl -X POST http://localhost:7078/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "mode": "standard",
    "backend": "SECONDARY",
    "messages": [{"role": "user", "content": "Hello!"}],
    "sessionId": "test"
  }'
```

**Test Cortex Mode (Full Reasoning):**

```bash
curl -X POST http://localhost:7078/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "mode": "cortex",
    "messages": [{"role": "user", "content": "Hello Lyra!"}],
    "sessionId": "test"
  }'
```

**Test Cortex /ingest endpoint:**

```bash
curl -X POST http://localhost:7081/ingest \
  -H "Content-Type: application/json" \
  -d '{
    "session_id": "test",
    "user_msg": "Hello",
    "assistant_msg": "Hi there!"
  }'
```

**Inspect SESSIONS state:**

```bash
curl http://localhost:7081/debug/sessions
```

**Get summary for a session:**

```bash
curl "http://localhost:7081/debug/summary?session_id=test"
```

**List all sessions:**

```bash
curl http://localhost:7078/sessions
```

**Get session history:**

```bash
curl http://localhost:7078/sessions/sess-abc123
```

**Delete a session:**

```bash
curl -X DELETE http://localhost:7078/sessions/sess-abc123
```

When NeoMem is enabled, its backend databases (PostgreSQL and Neo4j) start automatically as part of the docker-compose stack.
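In addition to the curl tests above, the v0.8.0 thinking stream can be followed programmatically. Below is a minimal Python sketch that connects to `GET /stream/thinking/{session_id}`; the exact event payload schema is not documented here, so the sketch simply prints each SSE data line as it arrives.

```python
# Minimal SSE client sketch for the "Show Your Work" thinking stream.
# The endpoint path is documented above; the event payload format is assumed,
# so raw data lines are printed as-is.
import requests

def watch_thinking(session_id: str = "test") -> None:
    url = f"http://localhost:7081/stream/thinking/{session_id}"
    with requests.get(url, stream=True, timeout=None) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines(decode_unicode=True):
            if line and line.startswith("data:"):
                print(line[len("data:"):].strip())

if __name__ == "__main__":
    watch_thinking("test")  # run while sending a Standard Mode message
```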
---

## Environment Variables

### LLM Backend Configuration

**Backend URLs (full API endpoints):**

```bash
LLM_PRIMARY_URL=http://10.0.0.44:8080      # llama.cpp
LLM_PRIMARY_MODEL=/model

LLM_SECONDARY_URL=http://10.0.0.3:11434    # Ollama
LLM_SECONDARY_MODEL=qwen2.5:7b-instruct-q4_K_M

LLM_OPENAI_URL=https://api.openai.com/v1
LLM_OPENAI_MODEL=gpt-4o-mini
OPENAI_API_KEY=sk-...
```

**Module-specific backend selection:**

```bash
CORTEX_LLM=SECONDARY        # Use Ollama for reasoning
INTAKE_LLM=PRIMARY          # Use llama.cpp for summarization
SPEAK_LLM=OPENAI            # Use OpenAI for persona
NEOMEM_LLM=PRIMARY          # Use llama.cpp for memory
UI_LLM=OPENAI               # Use OpenAI for UI
RELAY_LLM=PRIMARY           # Use llama.cpp for relay
STANDARD_MODE_LLM=SECONDARY # Default backend for Standard Mode (NEW in v0.7.0)
```

### Database Configuration

```bash
POSTGRES_USER=neomem
POSTGRES_PASSWORD=neomempass
POSTGRES_DB=neomem
POSTGRES_HOST=neomem-postgres
POSTGRES_PORT=5432

NEO4J_URI=bolt://neomem-neo4j:7687
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=neomemgraph
```

### Service URLs (Internal Docker Network)

```bash
NEOMEM_API=http://neomem-api:7077
CORTEX_API=http://cortex:7081
CORTEX_REASON_URL=http://cortex:7081/reason
CORTEX_SIMPLE_URL=http://cortex:7081/simple   # NEW in v0.7.0
CORTEX_INGEST_URL=http://cortex:7081/ingest
RELAY_URL=http://relay:7078
```

### Feature Flags

```bash
CORTEX_ENABLED=true
MEMORY_ENABLED=true
PERSONA_ENABLED=false
DEBUG_PROMPT=true
VERBOSE_DEBUG=true
ENABLE_TRILIUM=true   # NEW in v0.9.0
```

For the complete environment variable reference, see [ENVIRONMENT_VARIABLES.md](ENVIRONMENT_VARIABLES.md).

---

## Documentation

- [CHANGELOG.md](CHANGELOG.md) - Detailed version history
- [PROJECT_SUMMARY.md](PROJECT_SUMMARY.md) - Comprehensive project overview for AI context
- [ENVIRONMENT_VARIABLES.md](ENVIRONMENT_VARIABLES.md) - Environment variable reference
- [DEPRECATED_FILES.md](DEPRECATED_FILES.md) - Deprecated files and migration guide

---

## Troubleshooting

### SESSIONS not persisting

**Symptom:** Intake buffer always shows 0 exchanges, summaries always empty.

**Solution (fixed in v0.5.1):**
- Ensure `cortex/intake/__init__.py` exists
- Check Cortex logs for the `[Intake Module Init]` message showing the SESSIONS object ID
- Verify single-worker mode (Dockerfile: `uvicorn main:app --workers 1`)
- Use the `/debug/sessions` endpoint to inspect current state

### Cortex connection errors

**Symptom:** Relay can't reach Cortex, 502 errors.

**Solution:**
- Verify the Cortex container is running: `docker ps | grep cortex`
- Check Cortex health: `curl http://localhost:7081/health`
- Verify environment variables: `CORTEX_REASON_URL=http://cortex:7081/reason`
- Check the Docker network: `docker network inspect lyra_net`

### LLM backend timeouts

**Symptom:** Reasoning stage hangs or times out.

**Solution:**
- Verify the LLM backend is running and accessible
- Check LLM backend health: `curl http://10.0.0.44:8080/health`
- Increase the timeout in llm_router.py if using slow models
- Check logs for specific backend errors

---

## License

NeoMem is a derivative work based on Mem0 OSS (Apache 2.0).
© 2025 Terra-Mechanics / ServersDown Labs. All modifications released under Apache 2.0.
**Built with Claude Code**

---

## Integration Notes

- NeoMem API is compatible with Mem0 OSS endpoints (`/memories`, `/search`)
- All services communicate via Docker internal networking on the `lyra_net` bridge
- History and entity graphs are managed via PostgreSQL + Neo4j
- LLM backends are accessed via HTTP and configured in `.env`
- Intake module is imported internally by Cortex (no HTTP communication)
- SESSIONS state is maintained in-memory within the Cortex container

---

## Beta Lyrae - RAG Memory System (Currently Disabled)

**Note:** The RAG service is currently disabled in docker-compose.yml.

### Requirements
- Python 3.10+
- Dependencies: `chromadb openai tqdm python-dotenv fastapi uvicorn`
- Persistent storage: `./chromadb` or `/mnt/data/lyra_rag_db`

### Setup

1. Import chat logs (must be in OpenAI message format):

```bash
python3 rag/rag_chat_import.py
```

2. Build and start the RAG API server:

```bash
cd rag
python3 rag_build.py
uvicorn rag_api:app --host 0.0.0.0 --port 7090
```

3. Query the RAG system:

```bash
curl -X POST http://127.0.0.1:7090/rag/search \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What is the current state of Cortex?",
    "where": {"category": "lyra"}
  }'
```

---

## Development Notes

### Cortex Architecture (v0.6.0)

- Cortex contains the embedded Intake module at `cortex/intake/`
- Intake is imported as: `from intake.intake import add_exchange_internal, SESSIONS`
- SESSIONS is a module-level global dictionary (singleton pattern)
- Single-worker constraint required to maintain SESSIONS state
- Diagnostic endpoints available for debugging: `/debug/sessions`, `/debug/summary`
- **NEW:** Autonomy system integrated at `cortex/autonomy/`
  - Executive, decision, action, learning, and monitoring layers
  - Self-state persistence in `cortex/data/self_state.json`
  - Coordinated via the orchestrator with flexible service controls

### Adding New LLM Backends

1. Add the backend URL to `.env`:

```bash
LLM_CUSTOM_URL=http://your-backend:port
LLM_CUSTOM_MODEL=model-name
```

2. Configure the module to use the new backend:

```bash
CORTEX_LLM=CUSTOM
```

3. Restart the Cortex container:

```bash
docker-compose restart cortex
```

### Debugging Tips

- Enable verbose logging: `VERBOSE_DEBUG=true` in `.env`
- Check Cortex logs: `docker logs cortex -f`
- Check Relay logs: `docker logs relay -f`
- Inspect SESSIONS: `curl http://localhost:7081/debug/sessions`
- Test summarization: `curl "http://localhost:7081/debug/summary?session_id=test"`
- List sessions: `curl http://localhost:7078/sessions`
- Test Standard Mode: `curl -X POST http://localhost:7078/v1/chat/completions -H "Content-Type: application/json" -d '{"mode":"standard","backend":"SECONDARY","messages":[{"role":"user","content":"test"}],"sessionId":"test"}'`
- Monitor the Docker network: `docker network inspect lyra_net`
- Check session files: `ls -la core/relay/sessions/`