chore: nuke legacy code, keep design docs for restart

Preserved on the archive branch. Keeping only the architecture and design thinking that survives the rewrite:

- docs/ARCH_v0-6-1.md (Inner Self / Executive / Chat / Persona model)
- docs/ARCHITECTURE_v0-6-0.md (predecessor architecture)
- docs/PROJECT_SUMMARY.md (project history and rationale)
- docs/PROJECT_LYRA_COMPLETE_BREAKDOWN.md (detailed design notes)
- docs/ENVIRONMENT_VARIABLES.md (multi-backend env conventions)
- docs/LLMS.md
- docs/TRILLIUM_API.md (for future tool integration)

Removed: all service code (cortex, core/relay, neomem, rag, sandbox, persona-sidecar), docker-compose, migration/logging docs, stale root test scripts, CHANGELOG.
2026-05-16 05:57:07 +00:00
parent 4b951f3be8
commit faf4e8a1aa
265 changed files with 0 additions and 47927 deletions
@@ -1,902 +0,0 @@
-# Project Lyra - README v0.9.0
-
-Lyra is a modular persistent AI companion system with advanced reasoning capabilities and autonomous decision-making.
-It provides memory-backed chat using **Relay** + **Cortex** with integrated **Autonomy System**,
-featuring a multi-stage reasoning pipeline powered by HTTP-based LLM backends.
-
-**NEW in v0.9.0:** Trilium Notes integration - Search and create notes from conversations
-
-**Current Version:** v0.9.0 (2025-12-29)
-
-> **Note:** As of v0.6.0, NeoMem is **disabled by default** while we work out integration hiccups in the pipeline. The autonomy system is being refined independently before full memory integration.
-
-## Mission Statement
-
-The point of Project Lyra is to give an AI chatbot more abilities than a typical chatbot. Typical chatbots are essentially amnesic and forget everything about your project. Lyra keeps projects organized and remembers what you have done. Think of her abilities as a notepad, schedule, database, co-creator, and collaborator, all with her own executive function. Say something in passing, and Lyra remembers it and reminds you of it later.
-
---
-
-## Architecture Overview
-
-Project Lyra operates as a **single docker-compose deployment** with multiple Docker containers networked together in a microservices architecture. Like how the brain has regions, Lyra has modules:
-
-### Core Services
-
-**1. Relay** (Node.js/Express) - Port 7078
- Main orchestrator and message router
- Coordinates all module interactions
- OpenAI-compatible endpoint: `POST /v1/chat/completions`
- Internal endpoint: `POST /chat`
- Dual-mode routing: Standard Mode (simple chat) or Cortex Mode (full reasoning)
- Server-side session persistence with file-based storage
- Session management API: `GET/POST/PATCH/DELETE /sessions`
- Manages async calls to Cortex ingest
- *(NeoMem integration currently disabled in v0.6.0)*
-
-**2. UI** (Static HTML) - Port 8081 (nginx)
- Browser-based chat interface with cyberpunk theme
- Mode selector (Standard/Cortex) in header
- Settings modal with backend selection and session management
- Light/Dark mode toggle (dark by default)
- **NEW in v0.8.0:** "🧠 Show Work" button for real-time thinking stream
-  - Opens popup window with live SSE connection
-  - Color-coded events: thinking, tool calls, results, completion
-  - Auto-scrolling with animations
-  - Session-aware (matches current chat session)
- Server-synced session management (persists across browsers and reboots)
- OpenAI-compatible message format
-
-**3. NeoMem** (Python/FastAPI) - Port 7077 - **DISABLED IN v0.6.0**
- Long-term memory database (fork of Mem0 OSS)
- Vector storage (PostgreSQL + pgvector) + Graph storage (Neo4j)
- RESTful API: `/memories`, `/search`
- Semantic memory updates and retrieval
- No external SDK dependencies - fully local
- **Status:** Currently disabled while pipeline integration is refined
-
-### Reasoning Layer
-
-**4. Cortex** (Python/FastAPI) - Port 7081
- Primary reasoning engine with multi-stage pipeline and autonomy system
- **Includes embedded Intake module** (no separate service as of v0.5.1)
- **Integrated Autonomy System** (NEW in v0.6.0) - See Autonomy System section below
- **Tool Calling System** (NEW in v0.8.0) - Agentic execution for Standard Mode
-  - Sandboxed code execution (Python, JavaScript, Bash)
-  - Web search via Tavily API
-  - **Trilium knowledge base integration** (NEW in v0.9.0)
-  - Multi-iteration autonomous tool use (max 5 iterations)
-  - Real-time thinking stream via SSE
- **Dual Operating Modes:**
-  - **Standard Mode** (v0.7.0) - Simple chatbot with context retention + tool calling (v0.8.0)
-    - Bypasses reflection, reasoning, refinement stages
-    - Direct LLM call with conversation history
-    - User-selectable backend (SECONDARY, OPENAI, or custom)
-    - **NEW:** Autonomous tool calling for code execution, web search, knowledge queries
-    - **NEW:** "Show Your Work" real-time thinking stream
-    - Faster responses for coding and practical tasks
-  - **Cortex Mode** - Full 4-stage reasoning pipeline
-    1. **Reflection** - Generates meta-awareness notes about conversation
-    2. **Reasoning** - Creates initial draft answer using context
-    3. **Refinement** - Polishes and improves the draft
-    4. **Persona** - Applies Lyra's personality and speaking style
- Integrates with Intake for short-term context via internal Python imports
- Flexible LLM router supporting multiple backends via HTTP
- **Endpoints:**
-  - `POST /reason` - Main reasoning pipeline (Cortex Mode)
-  - `POST /simple` - Direct LLM chat with tool calling (Standard Mode)
-  - `GET /stream/thinking/{session_id}` - SSE stream for thinking events **NEW in v0.8.0**
-  - `POST /ingest` - Receives conversation exchanges from Relay
-  - `GET /health` - Service health check
-  - `GET /debug/sessions` - Inspect in-memory SESSIONS state
-  - `GET /debug/summary` - Test summarization for a session
-
-**5. Intake** (Python Module) - **Embedded in Cortex**
- **No longer a standalone service** - runs as Python module inside Cortex container
- Short-term memory management with session-based circular buffer
- In-memory SESSIONS dictionary: `session_id → {buffer: deque(maxlen=200), created_at: timestamp}`
- Multi-level summarization (L1/L5/L10/L20/L30) produced by `summarize_context()`
- Deferred summarization - actual summary generation happens during `/reason` call
- Internal Python API:
-  - `add_exchange_internal(exchange)` - Direct function call from Cortex
-  - `summarize_context(session_id, exchanges)` - Async LLM-based summarization
-  - `SESSIONS` - Module-level global state (requires single Uvicorn worker)
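
The session buffer described in the Intake list can be pictured with a short sketch. This is an illustration only, not the removed Intake code: `add_exchange`, `recent`, and `MAX_EXCHANGES` are hypothetical names, and only the `SESSIONS` layout (`buffer` as a `deque(maxlen=200)` plus `created_at`) comes from the README text.

```python
from collections import deque
from datetime import datetime, timezone

# Stand-in for the module-level SESSIONS state (one dict per process,
# which is why a single Uvicorn worker is required).
SESSIONS: dict = {}
MAX_EXCHANGES = 200  # mirrors deque(maxlen=200) from the README


def add_exchange(session_id: str, user_msg: str, assistant_msg: str) -> int:
    """Append one exchange, creating the session on first use.

    Returns the buffer size after the append. (Hypothetical helper.)
    """
    session = SESSIONS.get(session_id)
    if session is None:
        session = {
            "buffer": deque(maxlen=MAX_EXCHANGES),
            "created_at": datetime.now(timezone.utc),
        }
        SESSIONS[session_id] = session
    # A bounded deque silently drops the oldest entry once it is full.
    session["buffer"].append({"user_msg": user_msg, "assistant_msg": assistant_msg})
    return len(session["buffer"])


def recent(session_id: str, n: int = 20) -> list:
    """Return up to the last n exchanges, or an empty list for unknown sessions."""
    session = SESSIONS.get(session_id)
    if session is None:
        return []
    return list(session["buffer"])[-n:]
```

Because the buffer is bounded, memory use per session stays fixed, at the cost of losing the oldest exchanges unless they were summarized first.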
-
-### LLM Backends (HTTP-based)
-
-**All LLM communication is done via HTTP APIs:**
- **PRIMARY**: llama.cpp server (`http://10.0.0.44:8080`) - AMD MI50 GPU backend
- **SECONDARY**: Ollama server (`http://10.0.0.3:11434`) - RTX 3090 backend
-  - Model: qwen2.5:7b-instruct-q4_K_M
- **CLOUD**: OpenAI API (`https://api.openai.com/v1`) - Cloud-based models
-  - Model: gpt-4o-mini
- **FALLBACK**: Local backup (`http://10.0.0.41:11435`) - Emergency fallback
-  - Model: llama-3.2-8b-instruct
-
-Each module can be configured to use a different backend via environment variables.
-
-### Autonomy System (NEW in v0.6.0)
-
-**Cortex Autonomy Subsystems** - Multi-layered autonomous decision-making and learning
- **Executive Layer** [cortex/autonomy/executive/](cortex/autonomy/executive/)
-  - High-level planning and goal setting
-  - Multi-step reasoning for complex objectives
-  - Strategic decision making
- **Decision Engine** [cortex/autonomy/tools/decision_engine.py](cortex/autonomy/tools/decision_engine.py)
-  - Autonomous decision-making framework
-  - Option evaluation and selection
-  - Coordinated decision orchestration
- **Autonomous Actions** [cortex/autonomy/actions/](cortex/autonomy/actions/)
-  - Self-initiated action execution
-  - Context-aware behavior implementation
-  - Action logging and tracking
- **Pattern Learning** [cortex/autonomy/learning/](cortex/autonomy/learning/)
-  - Learns from interaction patterns
-  - Identifies recurring user needs
-  - Adaptive behavior refinement
- **Proactive Monitoring** [cortex/autonomy/proactive/](cortex/autonomy/proactive/)
-  - System state monitoring
-  - Intervention opportunity detection
-  - Background awareness capabilities
- **Self-Analysis** [cortex/autonomy/self/](cortex/autonomy/self/)
-  - Performance tracking and analysis
-  - Cognitive pattern identification
-  - Self-state persistence in [cortex/data/self_state.json](cortex/data/self_state.json)
- **Orchestrator** [cortex/autonomy/tools/orchestrator.py](cortex/autonomy/tools/orchestrator.py)
-  - Coordinates all autonomy subsystems
-  - Manages tool selection and execution
-  - Handles external integrations (with enable/disable controls)
-
-**Autonomy Architecture:**
-The autonomy system operates in coordinated layers, all maintaining state in `self_state.json`:
-1. Executive Layer → Planning and goals
-2. Decision Layer → Evaluation and choices
-3. Action Layer → Execution
-4. Learning Layer → Pattern adaptation
-5. Monitoring Layer → Proactive awareness
-
---
-
-## Data Flow Architecture (v0.7.0)
-
-### Standard Mode Flow (NEW in v0.7.0):
-
-```
-User (UI) → POST /v1/chat/completions {mode: "standard", backend: "SECONDARY"}
-  ↓
-Relay (7078)
-  ↓ POST /simple
-Cortex (7081)
-  ↓ (internal Python call)
-Intake module → get_recent_messages() (last 20 messages)
-  ↓
-Direct LLM call (user-selected backend: SECONDARY/OPENAI/custom)
-  ↓
-Returns simple response to Relay
-  ↓
-Relay → POST /ingest (async)
-  ↓
-Cortex → add_exchange_internal() → SESSIONS buffer
-  ↓
-Relay → POST /sessions/:id (save session to file)
-  ↓
-Relay → UI (returns final response)
-
-Note: Bypasses reflection, reasoning, refinement, persona stages
-```
-
-### Cortex Mode Flow (Full Reasoning):
-
-```
-User (UI) → POST /v1/chat/completions {mode: "cortex"}
-  ↓
-Relay (7078)
-  ↓ POST /reason
-Cortex (7081)
-  ↓ (internal Python call)
-Intake module → summarize_context()
-  ↓
-Autonomy System → Decision evaluation & pattern learning
-  ↓
-Cortex processes (4 stages):
-  1. reflection.py → meta-awareness notes (CLOUD backend)
-  2. reasoning.py → draft answer (PRIMARY backend, autonomy-aware)
-  3. refine.py → refined answer (PRIMARY backend)
-  4. persona/speak.py → Lyra personality (CLOUD backend, autonomy-aware)
-  ↓
-Returns persona answer to Relay
-  ↓
-Relay → POST /ingest (async)
-  ↓
-Cortex → add_exchange_internal() → SESSIONS buffer
-  ↓
-Autonomy System → Update self_state.json (pattern tracking)
-  ↓
-Relay → POST /sessions/:id (save session to file)
-  ↓
-Relay → UI (returns final response)
-
-Note: NeoMem integration disabled in v0.6.0
-```
-
-### Session Persistence Flow (NEW in v0.7.0):
-
-```
-UI loads → GET /sessions → Relay → List all sessions from files → UI dropdown
-User sends message → POST /sessions/:id → Relay → Save to sessions/*.json
-User renames session → PATCH /sessions/:id/metadata → Relay → Update *.meta.json
-User deletes session → DELETE /sessions/:id → Relay → Remove session files
-
-Sessions stored in: core/relay/sessions/
- {sessionId}.json (conversation history)
- {sessionId}.meta.json (name, timestamps, metadata)
-```
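
The file layout above (`{sessionId}.json` for history, `{sessionId}.meta.json` for metadata) could be modelled roughly as follows. Relay itself was a Node.js service, so this Python sketch is only an approximation of the idea; `SessionStore`, its method names, and the metadata keys (`name`, `created_at`, `updated_at`) are assumptions, and session IDs are not validated here.

```python
import json
import time
from pathlib import Path


class SessionStore:
    """File-backed store: <id>.json holds history, <id>.meta.json holds metadata."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _history_path(self, session_id):
        return self.root / f"{session_id}.json"

    def _meta_path(self, session_id):
        return self.root / f"{session_id}.meta.json"

    def _load_meta(self, session_id):
        path = self._meta_path(session_id)
        return json.loads(path.read_text()) if path.exists() else {}

    def save(self, session_id, messages):
        """Overwrite the history file and refresh timestamps in the metadata."""
        now = time.time()
        self._history_path(session_id).write_text(json.dumps(messages))
        meta = self._load_meta(session_id)
        meta.setdefault("name", session_id)
        meta.setdefault("created_at", now)
        meta["updated_at"] = now
        self._meta_path(session_id).write_text(json.dumps(meta))

    def load(self, session_id):
        path = self._history_path(session_id)
        return json.loads(path.read_text()) if path.exists() else []

    def rename(self, session_id, name):
        meta = self._load_meta(session_id)
        meta["name"] = name
        self._meta_path(session_id).write_text(json.dumps(meta))

    def list_sessions(self):
        # *.json also matches *.meta.json, so filter those out.
        ids = [
            p.name[: -len(".json")]
            for p in self.root.glob("*.json")
            if not p.name.endswith(".meta.json")
        ]
        return sorted(ids)

    def delete(self, session_id):
        self._history_path(session_id).unlink(missing_ok=True)
        self._meta_path(session_id).unlink(missing_ok=True)
```

Keeping metadata in a separate file means a rename touches only the small `.meta.json` file and never rewrites the conversation history.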
-
-### Cortex 4-Stage Reasoning Pipeline:
-
-1. **Reflection** (`reflection.py`) - Cloud LLM (OpenAI)
-   - Analyzes user intent and conversation context
-   - Generates meta-awareness notes
-   - "What is the user really asking?"
-
-2. **Reasoning** (`reasoning.py`) - Primary LLM (llama.cpp)
-   - Retrieves short-term context from Intake module
-   - Creates initial draft answer
-   - Integrates context, reflection notes, and user prompt
-
-3. **Refinement** (`refine.py`) - Primary LLM (llama.cpp)
-   - Polishes the draft answer
-   - Improves clarity and coherence
-   - Ensures factual consistency
-
-4. **Persona** (`speak.py`) - Cloud LLM (OpenAI)
-   - Applies Lyra's personality and speaking style
-   - Natural, conversational output
-   - Final answer returned to user
-
---
-
-## Features
-
-### Core Services
-
-**Relay**:
- Main orchestrator and message router
- OpenAI-compatible endpoint: `POST /v1/chat/completions`
- Internal endpoint: `POST /chat`
- Health check: `GET /_health`
- **NEW:** Dual-mode routing (Standard/Cortex)
- **NEW:** Server-side session persistence with CRUD API
- **NEW:** Session management endpoints:
-  - `GET /sessions` - List all sessions
-  - `GET /sessions/:id` - Retrieve session history
-  - `POST /sessions/:id` - Save session history
-  - `PATCH /sessions/:id/metadata` - Update session metadata
-  - `DELETE /sessions/:id` - Delete session
- Async non-blocking calls to Cortex
- Shared request handler for code reuse
- Comprehensive error handling
-
-**NeoMem (Memory Engine)**:
- Forked from Mem0 OSS - fully independent
- Drop-in compatible API (`/memories`, `/search`)
- Local-first: runs on FastAPI with Postgres + Neo4j
- No external SDK dependencies
- Semantic memory updates - compares embeddings and performs in-place updates
- Default service: `neomem-api` (port 7077)
-
-**UI**:
- Lightweight static HTML chat interface
- Cyberpunk theme with light/dark mode toggle
- **NEW:** Mode selector (Standard/Cortex) in header
- **NEW:** Settings modal (⚙ button) with:
-  - Backend selection for Standard Mode (SECONDARY/OPENAI/custom)
-  - Session management (view, delete sessions)
-  - Theme toggle (dark mode default)
- **NEW:** Server-synced session management
-  - Sessions persist across browsers and reboots
-  - Rename sessions with custom names
-  - Delete sessions with confirmation
-  - Automatic session save on every message
- OpenAI message format support
-
-### Reasoning Layer
-
-**Cortex** (v0.7.0):
- **NEW:** Dual operating modes:
-  - **Standard Mode** - Simple chat with context (`/simple` endpoint)
-    - User-selectable backend (SECONDARY, OPENAI, or custom)
-    - Full conversation history via Intake integration
-    - Bypasses reasoning pipeline for faster responses
-  - **Cortex Mode** - Full reasoning pipeline (`/reason` endpoint)
-    - Multi-stage processing: reflection → reasoning → refine → persona
-    - Per-stage backend selection
-    - Autonomy system integration
- Flexible LLM backend routing via HTTP
- Async processing throughout
- Embedded Intake module for short-term context
- `/reason`, `/simple`, `/ingest`, `/health`, `/debug/sessions`, `/debug/summary` endpoints
- Lenient error handling - never fails the chat pipeline
-
-**Intake** (Embedded Module):
- **Architectural change**: Now runs as Python module inside Cortex container
- In-memory SESSIONS management (session_id → buffer)
- Multi-level summarization: L1 (ultra-short), L5 (short), L10 (medium), L20 (detailed), L30 (full)
- Deferred summarization strategy - summaries generated during `/reason` call
- `bg_summarize()` is a logging stub - actual work deferred
- **Single-worker constraint**: SESSIONS requires single Uvicorn worker or Redis/shared storage
-
-**LLM Router**:
- Dynamic backend selection via HTTP
- Environment-driven configuration
- Support for llama.cpp, Ollama, OpenAI, custom endpoints
- Per-module backend preferences:
-  - `CORTEX_LLM=SECONDARY` (Ollama for reasoning)
-  - `INTAKE_LLM=PRIMARY` (llama.cpp for summarization)
-  - `SPEAK_LLM=OPENAI` (Cloud for persona)
-  - `NEOMEM_LLM=PRIMARY` (llama.cpp for memory operations)
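
One way to read the per-module preferences above is as a two-step lookup: `<MODULE>_LLM` names a backend, and `LLM_<NAME>_URL` / `LLM_<NAME>_MODEL` describe it. The sketch below assumes exactly that; `resolve_backend`, the `Backend` tuple, and the `PRIMARY` default are hypothetical and not taken from the removed `llm_router.py`.

```python
import os
from typing import Mapping, NamedTuple, Optional


class Backend(NamedTuple):
    name: str
    url: str
    model: Optional[str]


def resolve_backend(
    module: str,
    env: Mapping[str, str] = os.environ,
    default: str = "PRIMARY",  # assumed fallback when <MODULE>_LLM is unset
) -> Backend:
    """Map a module name (e.g. "cortex") to its configured backend."""
    name = env.get(f"{module.upper()}_LLM", default).upper()
    url = env.get(f"LLM_{name}_URL")
    if not url:
        raise KeyError(f"LLM_{name}_URL is not set for module {module!r}")
    return Backend(name=name, url=url, model=env.get(f"LLM_{name}_MODEL"))


if __name__ == "__main__":
    example_env = {
        "CORTEX_LLM": "SECONDARY",
        "LLM_SECONDARY_URL": "http://10.0.0.3:11434",
        "LLM_SECONDARY_MODEL": "qwen2.5:7b-instruct-q4_K_M",
    }
    print(resolve_backend("cortex", example_env))
```

Passing the environment as a parameter keeps the lookup testable without mutating `os.environ`.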
-
-### Beta Lyrae (RAG Memory DB) - Currently Disabled
-
- **RAG Knowledge DB - Beta Lyrae (sheliak)**
-  - This module implements the **Retrieval-Augmented Generation (RAG)** layer for Project Lyra.
-  - It serves as the long-term searchable memory store that Cortex and Relay can query for relevant context before reasoning or response generation.
-  - **Status**: Disabled in docker-compose.yml (v0.5.1)
-
-The system uses:
- **ChromaDB** for persistent vector storage
- **OpenAI Embeddings (`text-embedding-3-small`)** for semantic similarity
- **FastAPI** (port 7090) for the `/rag/search` REST endpoint
-
-Directory Layout:
-```
-rag/
-├── rag_chat_import.py    # imports JSON chat logs
-├── rag_docs_import.py    # (planned) PDF/EPUB/manual importer
-├── rag_build.py          # legacy single-folder builder
-├── rag_query.py          # command-line query helper
-├── rag_api.py            # FastAPI service providing /rag/search
-├── chromadb/             # persistent vector store
-├── chatlogs/             # organized source data
-│   ├── poker/
-│   ├── work/
-│   ├── lyra/
-│   ├── personal/
-│   └── ...
-└── import.log            # progress log for batch runs
-```
-
-**OpenAI chatlog importer features:**
- Recursive folder indexing with **category detection** from directory name
- Smart chunking for long messages (5,000 chars per slice)
- Automatic deduplication using SHA-1 hash of file + chunk
- Timestamps for both file modification and import time
- Full progress logging via tqdm
- Safe to run in background with `nohup … &`
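
The chunking and deduplication features listed above might look something like this. It is a sketch under assumptions: "SHA-1 hash of file + chunk" is read here as hashing the file path together with the chunk text, and `chunk_text`, `chunk_id`, and `new_chunks` are hypothetical names rather than functions from `rag_chat_import.py`.

```python
import hashlib

CHUNK_SIZE = 5000  # characters per slice, as stated in the feature list


def chunk_text(text: str, size: int = CHUNK_SIZE) -> list:
    """Split text into consecutive slices of at most `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def chunk_id(file_path: str, chunk: str) -> str:
    """SHA-1 over the file path and chunk text (assumed interpretation)."""
    digest = hashlib.sha1()
    digest.update(file_path.encode("utf-8"))
    digest.update(b"\0")  # separator so path and text cannot run together
    digest.update(chunk.encode("utf-8"))
    return digest.hexdigest()


def new_chunks(file_path: str, text: str, seen: set) -> list:
    """Return (id, chunk) pairs not imported before, recording them in `seen`."""
    fresh = []
    for chunk in chunk_text(text):
        cid = chunk_id(file_path, chunk)
        if cid in seen:
            continue  # already imported: skip on re-runs
        seen.add(cid)
        fresh.append((cid, chunk))
    return fresh
```

With IDs derived from content, re-running the importer over the same folder adds nothing new, which is what makes background batch runs safe to repeat.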
-
---
-
-## Docker Deployment
-
-All services run in a single docker-compose stack with the following containers:
-
-**Active Services:**
- **relay** - Main orchestrator (port 7078)
- **cortex** - Reasoning engine with embedded Intake and Autonomy System (port 7081)
-
-**Disabled Services (v0.6.0):**
- **neomem-postgres** - PostgreSQL with pgvector extension (port 5432) - *disabled while refining pipeline*
- **neomem-neo4j** - Neo4j graph database (ports 7474, 7687) - *disabled while refining pipeline*
- **neomem-api** - NeoMem memory service (port 7077) - *disabled while refining pipeline*
- **intake** - No longer needed (embedded in Cortex as of v0.5.1)
- **rag** - Beta Lyrae RAG service (port 7090) - currently disabled
-
-All containers communicate via the `lyra_net` Docker bridge network.
-
-## External LLM Services
-
-The following LLM backends are accessed via HTTP (not part of docker-compose):
-
- **llama.cpp Server** (`http://10.0.0.44:8080`)
-  - AMD MI50 GPU-accelerated inference
-  - Primary backend for reasoning and refinement stages
-  - Model path: `/model`
-
- **Ollama Server** (`http://10.0.0.3:11434`)
-  - RTX 3090 GPU-accelerated inference
-  - Secondary/configurable backend
-  - Model: qwen2.5:7b-instruct-q4_K_M
-
- **OpenAI API** (`https://api.openai.com/v1`)
-  - Cloud-based inference
-  - Used for reflection and persona stages
-  - Model: gpt-4o-mini
-
- **Fallback Server** (`http://10.0.0.41:11435`)
-  - Emergency backup endpoint
-  - Local llama-3.2-8b-instruct model
-
---
-
-## Version History
-
-### v0.9.0 (2025-12-29) - Current Release
-**Major Feature: Trilium Notes Integration**
- ✅ Added Trilium ETAPI integration for knowledge base access
- ✅ `search_notes()` tool for searching personal notes during conversations
- ✅ `create_note()` tool for capturing insights and information
- ✅ ETAPI authentication with secure token management
- ✅ Complete setup documentation and API reference
- ✅ Environment configuration with feature flag (`ENABLE_TRILIUM`)
- ✅ Automatic parent note handling (defaults to "root")
- ✅ Connection error handling and user-friendly messages
-
-**Key Capabilities:**
- Search your Trilium notes during conversations for context
- Create new notes from conversation insights automatically
- Cross-reference information between chat and knowledge base
- Future: Find duplicates, suggest organization, summarize notes
-
-**Documentation:**
- Added [TRILIUM_SETUP.md](TRILIUM_SETUP.md) - Complete setup guide
- Added [docs/TRILIUM_API.md](docs/TRILIUM_API.md) - Full API reference
-
-### v0.8.0 (2025-12-26)
-**Major Feature: Agentic Tool Calling + "Show Your Work"**
- ✅ Added tool calling system for Standard Mode
- ✅ Real-time thinking stream visualization
- ✅ Sandboxed code execution (Python, JavaScript, Bash)
- ✅ Web search integration via Tavily API
- ✅ Server-Sent Events (SSE) for live tool execution updates
-
-### v0.7.0 (2025-12-21)
-**Major Features: Standard Mode + Backend Selection + Session Persistence**
- ✅ Added Standard Mode for simple chatbot functionality
- ✅ UI mode selector (Standard/Cortex) in header
- ✅ Settings modal with backend selection for Standard Mode
- ✅ Server-side session persistence with file-based storage
- ✅ Session management UI (view, rename, delete sessions)
- ✅ Light/Dark mode toggle (dark by default)
- ✅ Context retention in Standard Mode via Intake integration
- ✅ Fixed modal positioning and z-index issues
- ✅ Cortex `/simple` endpoint for direct LLM calls
- ✅ Session CRUD API in Relay
- ✅ Full backward compatibility - Cortex Mode unchanged
-
-**Key Changes:**
- Standard Mode bypasses the reflection, reasoning, refinement, and persona stages for faster responses
- Sessions now sync across browsers and survive container restarts
- User can select SECONDARY (Ollama), OPENAI, or custom backend for Standard Mode
- Theme preference and backend selection persisted in localStorage
- Session files stored in `core/relay/sessions/` directory
-
-### v0.6.0 (2025-12-18)
-**Major Feature: Autonomy System (Phase 1, 2, and 2.5)**
- ✅ Added autonomous decision-making framework
- ✅ Implemented executive planning and goal-setting layer
- ✅ Added pattern learning system for adaptive behavior
- ✅ Implemented proactive monitoring capabilities
- ✅ Created self-analysis and performance tracking system
- ✅ Integrated self-state persistence (`cortex/data/self_state.json`)
- ✅ Built decision engine with orchestrator coordination
- ✅ Added autonomous action execution framework
- ✅ Integrated autonomy into reasoning and persona layers
- ✅ Created comprehensive test suites for autonomy features
- ✅ Added complete system breakdown documentation
-
-**Architecture Changes:**
- Autonomy system integrated into Cortex reasoning pipeline
- Multi-layered autonomous decision-making architecture
- Self-state tracking across sessions
- NeoMem disabled by default while refining pipeline integration
- Enhanced orchestrator with flexible service controls
-
-**Documentation:**
- Added [PROJECT_LYRA_COMPLETE_BREAKDOWN.md](docs/PROJECT_LYRA_COMPLETE_BREAKDOWN.md)
- Updated changelog with comprehensive autonomy system details
-
-### v0.5.1 (2025-12-11)
-**Critical Intake Integration Fixes:**
- ✅ Fixed `bg_summarize()` NameError preventing SESSIONS persistence
- ✅ Fixed `/ingest` endpoint unreachable code
- ✅ Added `cortex/intake/__init__.py` for proper package structure
- ✅ Added diagnostic logging to verify SESSIONS singleton behavior
- ✅ Added `/debug/sessions` and `/debug/summary` endpoints
- ✅ Documented single-worker constraint in Dockerfile
- ✅ Implemented lenient error handling (never fails chat pipeline)
- ✅ Intake now embedded in Cortex - no longer standalone service
-
-**Architecture Changes:**
- Intake module runs inside Cortex container as pure Python import
- No HTTP calls between Cortex and Intake (internal function calls)
- SESSIONS persist correctly in Uvicorn worker
- Deferred summarization strategy (summaries generated during `/reason`)
-
-### v0.5.0 (2025-11-28)
- ✅ Fixed all critical API wiring issues
- ✅ Added OpenAI-compatible endpoint to Relay (`/v1/chat/completions`)
- ✅ Fixed Cortex → Intake integration
- ✅ Added missing Python package `__init__.py` files
- ✅ End-to-end message flow verified and working
-
-### Infrastructure v1.0.0 (2025-11-26)
- Consolidated 9 scattered `.env` files into single source of truth
- Multi-backend LLM strategy implemented
- Docker Compose consolidation
- Created `.env.example` security templates
-
-### v0.4.x (Major Rewire)
- Cortex multi-stage reasoning pipeline
- LLM router with multi-backend support
- Major architectural restructuring
-
-### v0.3.x
- Beta Lyrae RAG system
- NeoMem integration
- Basic Cortex reasoning loop
-
---
-
-## Known Issues (v0.7.0)
-
-### Temporarily Disabled
- **NeoMem disabled by default** - Being refined independently before full integration
-  - PostgreSQL + pgvector storage inactive
-  - Neo4j graph database inactive
-  - Memory persistence endpoints not active
- RAG service (Beta Lyrae) currently disabled in docker-compose.yml
-
-### Standard Mode Limitations
- No reflection, reasoning, or refinement stages (by design)
- DeepSeek R1 not recommended for Standard Mode (generates reasoning artifacts)
- No RAG integration (same as Cortex Mode - currently disabled)
- No NeoMem memory storage (same as Cortex Mode - currently disabled)
-
-### Session Management Limitations
- Sessions stored in container filesystem - requires volume mount for true persistence
- No session import/export functionality yet
- No session search or filtering
- Old localStorage sessions don't automatically migrate to server
-
-### Operational Notes
- **Single-worker constraint**: Cortex must run with single Uvicorn worker to maintain SESSIONS state
-  - Multi-worker scaling requires migrating SESSIONS to Redis or shared storage
- Diagnostic endpoints (`/debug/sessions`, `/debug/summary`) available for troubleshooting
- Backend selection only affects Standard Mode - Cortex Mode uses environment-configured backends
-
-### Future Enhancements
- Re-enable NeoMem integration after pipeline refinement
- Full autonomy system maturation and optimization
- Re-enable RAG service integration
- Session import/export functionality
- Session search and filtering UI
- Migrate SESSIONS to Redis for multi-worker support
- Add request correlation IDs for tracing
- Comprehensive health checks across all services
- Enhanced pattern learning with long-term memory integration
-
---
-
-## Quick Start
-
-### Prerequisites
- Docker + Docker Compose
- At least one HTTP-accessible LLM endpoint (llama.cpp, Ollama, or OpenAI API key)
-
-### Setup
-1. Copy `.env.example` to `.env` and configure your LLM backend URLs and API keys:
-   ```bash
-   # Required: Configure at least one LLM backend
-   LLM_PRIMARY_URL=http://10.0.0.44:8080       # llama.cpp
-   LLM_SECONDARY_URL=http://10.0.0.3:11434     # Ollama
-   OPENAI_API_KEY=sk-...                        # OpenAI
-   ```
-
-2. Start all services with docker-compose:
-   ```bash
-   docker-compose up -d
-   ```
-
-3. Check service health:
-   ```bash
-   # Relay health
-   curl http://localhost:7078/_health
-
-   # Cortex health
-   curl http://localhost:7081/health
-
-   # NeoMem health (only when NeoMem is enabled; it is disabled by default as of v0.6.0)
-   curl http://localhost:7077/health
-   ```
-
-4. Access the UI at `http://localhost:8081`
-
-### Using the UI
-
-**Mode Selection:**
- Use the **Mode** dropdown in the header to switch between:
-  - **Standard** - Simple chatbot for coding and practical tasks
-  - **Cortex** - Full reasoning pipeline with autonomy features
-
-**Settings Menu:**
-1. Click the **⚙ Settings** button in the header
-2. **Backend Selection** (Standard Mode only):
-   - Choose **SECONDARY** (Ollama/Qwen on 3090) - Fast, local
-   - Choose **OPENAI** (GPT-4o-mini) - Cloud-based, high quality
-   - Enter custom backend name for advanced configurations
-3. **Session Management**:
-   - View all saved sessions with message counts and timestamps
-   - Click 🗑️ to delete unwanted sessions
-4. **Theme Toggle**:
-   - Click **🌙 Dark Mode** or **☀️ Light Mode** to switch themes
-
-**Session Management:**
- Sessions automatically save on every message
- Use the **Session** dropdown to switch between sessions
- Click **➕ New** to create a new session
- Click **✏️ Rename** to rename the current session
- Sessions persist across browsers and container restarts
-
-### Test
-
-**Test Standard Mode:**
-```bash
-curl -X POST http://localhost:7078/v1/chat/completions \
-  -H "Content-Type: application/json" \
-  -d '{
-    "mode": "standard",
-    "backend": "SECONDARY",
-    "messages": [{"role": "user", "content": "Hello!"}],
-    "sessionId": "test"
-  }'
-```
-
-**Test Cortex Mode (Full Reasoning):**
-```bash
-curl -X POST http://localhost:7078/v1/chat/completions \
-  -H "Content-Type: application/json" \
-  -d '{
-    "mode": "cortex",
-    "messages": [{"role": "user", "content": "Hello Lyra!"}],
-    "sessionId": "test"
-  }'
-```
-
-**Test Cortex /ingest endpoint:**
-```bash
-curl -X POST http://localhost:7081/ingest \
-  -H "Content-Type: application/json" \
-  -d '{
-    "session_id": "test",
-    "user_msg": "Hello",
-    "assistant_msg": "Hi there!"
-  }'
-```
-
-**Inspect SESSIONS state:**
-```bash
-curl http://localhost:7081/debug/sessions
-```
-
-**Get summary for a session:**
-```bash
-curl "http://localhost:7081/debug/summary?session_id=test"
-```
-
-**List all sessions:**
-```bash
-curl http://localhost:7078/sessions
-```
-
-**Get session history:**
-```bash
-curl http://localhost:7078/sessions/sess-abc123
-```
-
-**Delete a session:**
-```bash
-curl -X DELETE http://localhost:7078/sessions/sess-abc123
-```
-
-When NeoMem is enabled, its backend databases (PostgreSQL and Neo4j) start automatically as part of the docker-compose stack. They are disabled by default as of v0.6.0.
-
---
-
-## Environment Variables
-
-### LLM Backend Configuration
-
-**Backend URLs (Full API endpoints):**
-```bash
-LLM_PRIMARY_URL=http://10.0.0.44:8080           # llama.cpp
-LLM_PRIMARY_MODEL=/model
-
-LLM_SECONDARY_URL=http://10.0.0.3:11434         # Ollama
-LLM_SECONDARY_MODEL=qwen2.5:7b-instruct-q4_K_M
-
-LLM_OPENAI_URL=https://api.openai.com/v1
-LLM_OPENAI_MODEL=gpt-4o-mini
-OPENAI_API_KEY=sk-...
-```
-
-**Module-specific backend selection:**
-```bash
-CORTEX_LLM=SECONDARY         # Use Ollama for reasoning
-INTAKE_LLM=PRIMARY           # Use llama.cpp for summarization
-SPEAK_LLM=OPENAI             # Use OpenAI for persona
-NEOMEM_LLM=PRIMARY           # Use llama.cpp for memory
-UI_LLM=OPENAI                # Use OpenAI for UI
-RELAY_LLM=PRIMARY            # Use llama.cpp for relay
-STANDARD_MODE_LLM=SECONDARY  # Default backend for Standard Mode (NEW in v0.7.0)
-```
-
-### Database Configuration
-```bash
-POSTGRES_USER=neomem
-POSTGRES_PASSWORD=neomempass
-POSTGRES_DB=neomem
-POSTGRES_HOST=neomem-postgres
-POSTGRES_PORT=5432
-
-NEO4J_URI=bolt://neomem-neo4j:7687
-NEO4J_USERNAME=neo4j
-NEO4J_PASSWORD=neomemgraph
-```
-
-### Service URLs (Internal Docker Network)
-```bash
-NEOMEM_API=http://neomem-api:7077
-CORTEX_API=http://cortex:7081
-CORTEX_REASON_URL=http://cortex:7081/reason
-CORTEX_SIMPLE_URL=http://cortex:7081/simple      # NEW in v0.7.0
-CORTEX_INGEST_URL=http://cortex:7081/ingest
-RELAY_URL=http://relay:7078
-```
-
-### Feature Flags
-```bash
-CORTEX_ENABLED=true
-MEMORY_ENABLED=true
-PERSONA_ENABLED=false
-DEBUG_PROMPT=true
-VERBOSE_DEBUG=true
-ENABLE_TRILIUM=true          # NEW in v0.9.0
-```
-
-For complete environment variable reference, see [ENVIRONMENT_VARIABLES.md](ENVIRONMENT_VARIABLES.md).
-
---
-
-## Documentation
-
- [CHANGELOG.md](CHANGELOG.md) - Detailed version history
- [PROJECT_SUMMARY.md](PROJECT_SUMMARY.md) - Comprehensive project overview for AI context
- [ENVIRONMENT_VARIABLES.md](ENVIRONMENT_VARIABLES.md) - Environment variable reference
- [DEPRECATED_FILES.md](DEPRECATED_FILES.md) - Deprecated files and migration guide
-
---
-
-## Troubleshooting
-
-### SESSIONS not persisting
-**Symptom:** Intake buffer always shows 0 exchanges, summaries always empty.
-
-**Solution (Fixed in v0.5.1):**
- Ensure `cortex/intake/__init__.py` exists
- Check Cortex logs for `[Intake Module Init]` message showing SESSIONS object ID
- Verify single-worker mode (Dockerfile: `uvicorn main:app --workers 1`)
- Use `/debug/sessions` endpoint to inspect current state
-
-### Cortex connection errors
-**Symptom:** Relay can't reach Cortex, 502 errors.
-
-**Solution:**
- Verify Cortex container is running: `docker ps | grep cortex`
- Check Cortex health: `curl http://localhost:7081/health`
- Verify environment variables: `CORTEX_REASON_URL=http://cortex:7081/reason`
- Check docker network: `docker network inspect lyra_net`
-
-### LLM backend timeouts
-**Symptom:** Reasoning stage hangs or times out.
-
-**Solution:**
- Verify LLM backend is running and accessible
- Check LLM backend health: `curl http://10.0.0.44:8080/health`
- Increase timeout in llm_router.py if using slow models
- Check logs for specific backend errors
-
---
-
-## License
-
-NeoMem is a derivative work based on Mem0 OSS (Apache 2.0).
-© 2025 Terra-Mechanics / ServersDown Labs. All modifications released under Apache 2.0.
-
-**Built with Claude Code**
-
---
-
-## Integration Notes
-
- NeoMem API is compatible with Mem0 OSS endpoints (`/memories`, `/search`)
- All services communicate via Docker internal networking on the `lyra_net` bridge
- History and entity graphs are managed via PostgreSQL + Neo4j
- LLM backends are accessed via HTTP and configured in `.env`
- Intake module is imported internally by Cortex (no HTTP communication)
- SESSIONS state is maintained in-memory within Cortex container
-
---
-
-## Beta Lyrae - RAG Memory System (Currently Disabled)
-
-**Note:** The RAG service is currently disabled in docker-compose.yml
-
-### Requirements
- Python 3.10+
- Dependencies: `chromadb openai tqdm python-dotenv fastapi uvicorn`
- Persistent storage: `./chromadb` or `/mnt/data/lyra_rag_db`
-
-### Setup
-1. Import chat logs (must be in OpenAI message format):
-   ```bash
-   python3 rag/rag_chat_import.py
-   ```
-
-2. Build and start the RAG API server:
-   ```bash
-   cd rag
-   python3 rag_build.py
-   uvicorn rag_api:app --host 0.0.0.0 --port 7090
-   ```
-
-3. Query the RAG system:
-   ```bash
-   curl -X POST http://127.0.0.1:7090/rag/search \
-     -H "Content-Type: application/json" \
-     -d '{
-       "query": "What is the current state of Cortex?",
-       "where": {"category": "lyra"}
-     }'
-   ```
-
---
-
-## Development Notes
-
-### Cortex Architecture (v0.6.0)
- Cortex contains embedded Intake module at `cortex/intake/`
- Intake is imported as: `from intake.intake import add_exchange_internal, SESSIONS`
- SESSIONS is a module-level global dictionary (singleton pattern)
- Single-worker constraint required to maintain SESSIONS state
- Diagnostic endpoints available for debugging: `/debug/sessions`, `/debug/summary`
- **NEW:** Autonomy system integrated at `cortex/autonomy/`
-  - Executive, decision, action, learning, and monitoring layers
-  - Self-state persistence in `cortex/data/self_state.json`
-  - Coordinated via orchestrator with flexible service controls
-
-### Adding New LLM Backends
-1. Add backend URL to `.env`:
-   ```bash
-   LLM_CUSTOM_URL=http://your-backend:port
-   LLM_CUSTOM_MODEL=model-name
-   ```
-
-2. Configure module to use new backend:
-   ```bash
-   CORTEX_LLM=CUSTOM
-   ```
-
-3. Restart Cortex container:
-   ```bash
-   docker-compose restart cortex
-   ```
-
-### Debugging Tips
- Enable verbose logging: `VERBOSE_DEBUG=true` in `.env`
- Check Cortex logs: `docker logs cortex -f`
- Check Relay logs: `docker logs relay -f`
- Inspect SESSIONS: `curl http://localhost:7081/debug/sessions`
- Test summarization: `curl "http://localhost:7081/debug/summary?session_id=test"`
- List sessions: `curl http://localhost:7078/sessions`
- Test Standard Mode: `curl -X POST http://localhost:7078/v1/chat/completions -H "Content-Type: application/json" -d '{"mode":"standard","backend":"SECONDARY","messages":[{"role":"user","content":"test"}],"sessionId":"test"}'`
- Monitor Docker network: `docker network inspect lyra_net`
- Check session files: `ls -la core/relay/sessions/`