diff --git a/CHANGELOG.md b/CHANGELOG.md
index ab30ad6..c895d52 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -9,6 +9,55 @@ Format based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/) and [Se
 
 ---
 
+## [0.5.2] - 2025-12-12
+
+### Fixed - LLM Router & Async HTTP
+- **Critical**: Replaced synchronous `requests` with async `httpx` in LLM router [cortex/llm/llm_router.py](cortex/llm/llm_router.py)
+  - Event loop blocking was causing timeouts and empty responses
+  - All three providers (MI50, Ollama, OpenAI) now use `await http_client.post()`
+  - Fixes "Expecting value: line 1 column 1 (char 0)" JSON parsing errors in intake
+- **Critical**: Fixed missing `backend` parameter in intake summarization [cortex/intake/intake.py:285](cortex/intake/intake.py#L285)
+  - Was defaulting to PRIMARY (MI50) instead of respecting `INTAKE_LLM=SECONDARY`
+  - Now correctly uses configured backend (Ollama on 3090)
+- **Relay**: Fixed session ID case mismatch [core/relay/server.js:87](core/relay/server.js#L87)
+  - UI sends `sessionId` (camelCase) but relay expected `session_id` (snake_case)
+  - Now accepts both variants: `req.body.session_id || req.body.sessionId`
+  - Custom session IDs now properly tracked instead of defaulting to "default"
+
+### Added - Error Handling & Diagnostics
+- Added comprehensive error handling in LLM router for all providers
+  - HTTPError, JSONDecodeError, KeyError, and generic Exception handling
+  - Detailed error messages with exception type and description
+  - Provider-specific error logging (mi50, ollama, openai)
+- Added debug logging in intake summarization
+  - Logs LLM response length and preview
+  - Validates non-empty responses before JSON parsing
+  - Helps diagnose empty or malformed responses
+
+### Added - Session Management
+- Added session persistence endpoints in relay [core/relay/server.js:160-171](core/relay/server.js#L160-L171)
+  - `GET /sessions/:id` - Retrieve session history
+  - `POST /sessions/:id` - Save session history
+  - In-memory storage using Map (ephemeral, resets on container restart)
+  - Fixes UI "Failed to load session" errors
+
+### Changed - Provider Configuration
+- Added `mi50` provider support for llama.cpp server [cortex/llm/llm_router.py:62-81](cortex/llm/llm_router.py#L62-L81)
+  - Uses `/completion` endpoint with `n_predict` parameter
+  - Extracts `content` field from response
+  - Configured for MI50 GPU with DeepSeek model
+- Increased memory retrieval threshold from 0.78 to 0.90 [cortex/.env:20](cortex/.env#L20)
+  - Filters out low-relevance memories (only returns 90%+ similarity)
+  - Reduces noise in context retrieval
+
+### Technical Improvements
+- Unified async HTTP handling across all LLM providers
+- Better separation of concerns between provider implementations
+- Improved error messages for debugging LLM API failures
+- Consistent timeout handling (120 seconds for all providers)
+
+---
+
 ## [0.5.1] - 2025-12-11
 
 ### Fixed - Intake Integration
diff --git a/PROJECT_SUMMARY.md b/PROJECT_SUMMARY.md
deleted file mode 100644
index 551170e..0000000
--- a/PROJECT_SUMMARY.md
+++ /dev/null
@@ -1,71 +0,0 @@
-# Lyra Core — Project Summary
-
-## v0.4 (2025-10-03)
-
-### 🧠 High-Level Architecture
-- **Lyra Core (v0.3.1)** — Orchestration layer.  
-  - Accepts chat requests (`/v1/chat/completions`).  
-  - Routes through Cortex for subconscious annotation.  
-  - Stores everything in Mem0 (no discard).  
-  - Fetches persona + relevant memories.  
-  - Injects context back into LLM.  
-
-- **Cortex (v0.3.0)** — Subconscious annotator.  
-  - Runs locally via `llama.cpp` (Phi-3.5-mini Q4_K_M).  
-  - Strict JSON schema:  
-    ```json
-    {
-      "sentiment": "positive" | "neutral" | "negative",
-      "novelty": 0.0–1.0,
-      "tags": ["keyword", "keyword"],
-      "notes": "short string"
-    }
-    ```  
-  - Normalizes keys (lowercase).  
-  - Strips Markdown fences before parsing.  
-  - Configurable via `.env` (`CORTEX_ENABLED=true|false`).  
-  - Currently generates annotations, but not yet persisted into Mem0 payloads (stored as empty `{cortex:{}}`).  
-
-- **Mem0 (v0.4.0)** — Persistent memory layer.  
-  - Handles embeddings, graph storage, and retrieval.  
-  - Dual embedder support:  
-    - **OpenAI Cloud** (`text-embedding-3-small`, 1536-dim).  
-    - **HuggingFace TEI** (gte-Qwen2-1.5B-instruct, 1536-dim, hosted on 3090).  
-  - Environment toggle for provider (`.env.openai` vs `.env.3090`).  
-  - Memory persistence in Postgres (`payload` JSON).  
-  - CSV export pipeline confirmed (id, user_id, data, created_at).  
-
-- **Persona Sidecar**  
-  - Provides personality, style, and protocol instructions.  
-  - Injected at runtime into Core prompt building.  
-
----
-
-### 🚀 Recent Changes
-- **Mem0**  
-  - Added HuggingFace TEI integration (local 3090 embedder).  
-  - Enabled dual-mode environment switch (OpenAI cloud ↔ local TEI).  
-  - Fixed `.env` line ending mismatch (CRLF vs LF).  
-  - Added memory dump/export commands for Postgres.  
-
-- **Core/Relay**  
-  - No major changes since v0.3.1 (still routing input → Cortex → Mem0).  
-
-- **Cortex**  
-  - Still outputs annotations, but not yet persisted into Mem0 payloads.  
-
----
-
-### 📈 Versioning
-- **Lyra Core** → v0.3.1  
-- **Cortex** → v0.3.0  
-- **Mem0** → v0.4.0  
-
----
-
-### 📋 Next Steps
-- [ ] Wire Cortex annotations into Mem0 payloads (`cortex` object).  
-- [ ] Add “export all memories” script to standard workflow.  
-- [ ] Consider async embedding for faster `mem.add`.  
-- [ ] Build visual diagram of data flow (Core ↔ Cortex ↔ Mem0 ↔ Persona).  
-- [ ] Explore larger LLMs for Cortex (Qwen2-7B, etc.) for richer subconscious annotation.  
diff --git a/README.md b/README.md
index 072f3e0..15ea23d 100644
--- a/README.md
+++ b/README.md
@@ -1,12 +1,14 @@
-# Project Lyra - README v0.5.0
+# Project Lyra - README v0.5.1
 
 Lyra is a modular persistent AI companion system with advanced reasoning capabilities.
 It provides memory-backed chat using **NeoMem** + **Relay** + **Cortex**,
 with multi-stage reasoning pipeline powered by HTTP-based LLM backends.
 
+**Current Version:** v0.5.1 (2025-12-11)
+
 ## Mission Statement
 
-The point of Project Lyra is to give an AI chatbot more abilities than a typical chatbot. Typical chatbots are essentially amnesic and forget everything about your project. Lyra helps keep projects organized and remembers everything you have done. Think of her abilities as a notepad/schedule/database/co-creator/collaborator all with its own executive function. Say something in passing, Lyra remembers it then reminds you of it later.
+The point of Project Lyra is to give an AI chatbot more abilities than a typical chatbot. Typical chatbots are essentially amnesic and forget everything about your project. Lyra helps keep projects organized and remembers everything you have done. Think of her abilities as a notepad/schedule/database/co-creator/collaborator all with its own executive function. Say something in passing, Lyra remembers it then reminds you of it later.
 
 ---
 
@@ -22,7 +24,7 @@ Project Lyra operates as a **single docker-compose deployment** with multiple Do
 - OpenAI-compatible endpoint: `POST /v1/chat/completions`
 - Internal endpoint: `POST /chat`
 - Routes messages through Cortex reasoning pipeline
-- Manages async calls to Intake and NeoMem
+- Manages async calls to NeoMem and Cortex ingest
 
 **2. UI** (Static HTML)
 - Browser-based chat interface with cyberpunk theme
@@ -41,38 +43,48 @@ Project Lyra operates as a **single docker-compose deployment** with multiple Do
 
 **4. Cortex** (Python/FastAPI) - Port 7081
 - Primary reasoning engine with multi-stage pipeline
+- **Includes embedded Intake module** (no separate service as of v0.5.1)
 - **4-Stage Processing:**
   1. **Reflection** - Generates meta-awareness notes about conversation
   2. **Reasoning** - Creates initial draft answer using context
   3. **Refinement** - Polishes and improves the draft
   4. **Persona** - Applies Lyra's personality and speaking style
-- Integrates with Intake for short-term context
+- Integrates with Intake for short-term context via internal Python imports
 - Flexible LLM router supporting multiple backends via HTTP
+- **Endpoints:**
+  - `POST /reason` - Main reasoning pipeline
+  - `POST /ingest` - Receives conversation exchanges from Relay
+  - `GET /health` - Service health check
+  - `GET /debug/sessions` - Inspect in-memory SESSIONS state
+  - `GET /debug/summary` - Test summarization for a session
 
-**5. Intake v0.2** (Python/FastAPI) - Port 7080
-- Simplified short-term memory summarization
-- Session-based circular buffer (deque, maxlen=200)
-- Single-level simple summarization (no cascading)
-- Background async processing with FastAPI BackgroundTasks
-- Pushes summaries to NeoMem automatically
-- **API Endpoints:**
-  - `POST /add_exchange` - Add conversation exchange
-  - `GET /summaries?session_id={id}` - Retrieve session summary
-  - `POST /close_session/{id}` - Close and cleanup session
+**5. Intake** (Python Module) - **Embedded in Cortex**
+- **No longer a standalone service** - runs as Python module inside Cortex container
+- Short-term memory management with session-based circular buffer
+- In-memory SESSIONS dictionary: `session_id → {buffer: deque(maxlen=200), created_at: timestamp}`
+- Multi-level summarization (L1/L5/L10/L20/L30) produced by `summarize_context()`
+- Deferred summarization - actual summary generation happens during `/reason` call
+- Internal Python API:
+  - `add_exchange_internal(exchange)` - Direct function call from Cortex
+  - `summarize_context(session_id, exchanges)` - Async LLM-based summarization
+  - `SESSIONS` - Module-level global state (requires single Uvicorn worker)
 
 ### LLM Backends (HTTP-based)
 
 **All LLM communication is done via HTTP APIs:**
-- **PRIMARY**: vLLM server (`http://10.0.0.43:8000`) - AMD MI50 GPU backend
+- **PRIMARY**: llama.cpp server (`http://10.0.0.44:8080`) - AMD MI50 GPU backend
 - **SECONDARY**: Ollama server (`http://10.0.0.3:11434`) - RTX 3090 backend
+  - Model: qwen2.5:7b-instruct-q4_K_M
 - **CLOUD**: OpenAI API (`https://api.openai.com/v1`) - Cloud-based models
+  - Model: gpt-4o-mini
 - **FALLBACK**: Local backup (`http://10.0.0.41:11435`) - Emergency fallback
+  - Model: llama-3.2-8b-instruct
+
+Each module can be configured to use a different backend via environment variables.
 
-Each module can be configured to use a different backend via environment variables. 
-			
 ---
 
-## Data Flow Architecture (v0.5.0)
+## Data Flow Architecture (v0.5.1)
 
 ### Normal Message Flow:
 
@@ -82,43 +94,44 @@ User (UI) → POST /v1/chat/completions
 Relay (7078)
   ↓ POST /reason
 Cortex (7081)
-  ↓ GET /summaries?session_id=xxx
-Intake (7080) [RETURNS SUMMARY]
+  ↓ (internal Python call)
+Intake module → summarize_context()
   ↓
 Cortex processes (4 stages):
-  1. reflection.py → meta-awareness notes
-  2. reasoning.py → draft answer (uses LLM)
-  3. refine.py → refined answer (uses LLM)
-  4. persona/speak.py → Lyra personality (uses LLM)
+  1. reflection.py → meta-awareness notes (CLOUD backend)
+  2. reasoning.py → draft answer (PRIMARY backend)
+  3. refine.py → refined answer (PRIMARY backend)
+  4. persona/speak.py → Lyra personality (CLOUD backend)
   ↓
 Returns persona answer to Relay
   ↓
-Relay → Cortex /ingest (async, stub)
-Relay → Intake /add_exchange (async)
+Relay → POST /ingest (async)
   ↓
-Intake → Background summarize → NeoMem
+Cortex → add_exchange_internal() → SESSIONS buffer
+  ↓
+Relay → NeoMem /memories (async, planned)
   ↓
 Relay → UI (returns final response)
 ```
 
 ### Cortex 4-Stage Reasoning Pipeline:
 
-1. **Reflection** (`reflection.py`) - Configurable LLM via HTTP
+1. **Reflection** (`reflection.py`) - Cloud LLM (OpenAI)
    - Analyzes user intent and conversation context
    - Generates meta-awareness notes
    - "What is the user really asking?"
 
-2. **Reasoning** (`reasoning.py`) - Configurable LLM via HTTP
-   - Retrieves short-term context from Intake
+2. **Reasoning** (`reasoning.py`) - Primary LLM (llama.cpp)
+   - Retrieves short-term context from Intake module
    - Creates initial draft answer
    - Integrates context, reflection notes, and user prompt
 
-3. **Refinement** (`refine.py`) - Configurable LLM via HTTP
+3. **Refinement** (`refine.py`) - Primary LLM (llama.cpp)
    - Polishes the draft answer
    - Improves clarity and coherence
    - Ensures factual consistency
 
-4. **Persona** (`speak.py`) - Configurable LLM via HTTP
+4. **Persona** (`speak.py`) - Cloud LLM (OpenAI)
    - Applies Lyra's personality and speaking style
    - Natural, conversational output
    - Final answer returned to user
@@ -134,7 +147,7 @@ Relay → UI (returns final response)
 - OpenAI-compatible endpoint: `POST /v1/chat/completions`
 - Internal endpoint: `POST /chat`
 - Health check: `GET /_health`
-- Async non-blocking calls to Cortex and Intake
+- Async non-blocking calls to Cortex
 - Shared request handler for code reuse
 - Comprehensive error handling
 
@@ -154,73 +167,70 @@ Relay → UI (returns final response)
 
 ### Reasoning Layer
 
-**Cortex** (v0.5):
+**Cortex** (v0.5.1):
 - Multi-stage reasoning pipeline (reflection → reasoning → refine → persona)
 - Flexible LLM backend routing via HTTP
 - Per-stage backend selection
 - Async processing throughout
-- IntakeClient integration for short-term context
-- `/reason`, `/ingest` (stub), `/health` endpoints
+- Embedded Intake module for short-term context
+- `/reason`, `/ingest`, `/health`, `/debug/sessions`, `/debug/summary` endpoints
+- Lenient error handling - never fails the chat pipeline
 
-**Intake** (v0.2):
-- Simplified single-level summarization
-- Session-based circular buffer (200 exchanges max)
-- Background async summarization
-- Automatic NeoMem push
-- No persistent log files (memory-only)
-- **Breaking change from v0.1**: Removed cascading summaries (L1, L2, L5, L10, L20, L30)
+**Intake** (Embedded Module):
+- **Architectural change**: Now runs as Python module inside Cortex container
+- In-memory SESSIONS management (session_id → buffer)
+- Multi-level summarization: L1 (ultra-short), L5 (short), L10 (medium), L20 (detailed), L30 (full)
+- Deferred summarization strategy - summaries generated during `/reason` call
+- `bg_summarize()` is a logging stub - actual work deferred
+- **Single-worker constraint**: SESSIONS requires a single Uvicorn worker; multi-worker setups need Redis or other shared storage
 
 **LLM Router**:
 - Dynamic backend selection via HTTP
 - Environment-driven configuration
-- Support for vLLM, Ollama, OpenAI, custom endpoints
-- Per-module backend preferences
+- Support for llama.cpp, Ollama, OpenAI, custom endpoints
+- Per-module backend preferences:
+  - `CORTEX_LLM=SECONDARY` (Ollama for reasoning)
+  - `INTAKE_LLM=PRIMARY` (llama.cpp for summarization)
+  - `SPEAK_LLM=OPENAI` (Cloud for persona)
+  - `NEOMEM_LLM=PRIMARY` (llama.cpp for memory operations)
+
+### Beta Lyrae (RAG Memory DB) - Currently Disabled
 
-# Beta Lyrae (RAG Memory DB) - added 11-3-25
 - **RAG Knowledge DB - Beta Lyrae (sheliak)**
-  - This module implements the **Retrieval-Augmented Generation (RAG)** layer for Project Lyra.  
+  - This module implements the **Retrieval-Augmented Generation (RAG)** layer for Project Lyra.
   - It serves as the long-term searchable memory store that Cortex and Relay can query for relevant context before reasoning or response generation.
-		The system uses:
-  - **ChromaDB** for persistent vector storage  
-  - **OpenAI Embeddings (`text-embedding-3-small`)** for semantic similarity  
-  - **FastAPI** (port 7090) for the `/rag/search` REST endpoint  
-  - Directory Layout
-		rag/
-		├── rag_chat_import.py # imports JSON chat logs
-		├── rag_docs_import.py # (planned) PDF/EPUB/manual importer
-		├── rag_build.py # legacy single-folder builder
-		├── rag_query.py # command-line query helper
-		├── rag_api.py # FastAPI service providing /rag/search
-		├── chromadb/ # persistent vector store
-		├── chatlogs/ # organized source data
-		│ ├── poker/
-		│ ├── work/
-		│ ├── lyra/
-		│ ├── personal/
-		│ └── ...
-		└── import.log # progress log for batch runs
-  - **OpenAI chatlog importer.
-	  - Takes JSON formatted chat logs and imports it to the RAG.
-	  - **fetures include:**
-	    - Recursive folder indexing with **category detection** from directory name  
-		- Smart chunking for long messages (5 000 chars per slice)  
-		- Automatic deduplication using SHA-1 hash of file + chunk
-		- Timestamps for both file modification and import time
-		- Full progress logging via tqdm
-		- Safe to run in background with nohup … &
-		- Metadata per chunk:
-		  ```json
-		  {
-			"chat_id": "<sha1 of filename>",
-			"chunk_index": 0,
-			"source": "chatlogs/lyra/0002_cortex_LLMs_11-1-25.json",
-			"title": "cortex LLMs 11-1-25",
-			"role": "assistant",
-			"category": "lyra",
-			"type": "chat",
-			"file_modified": "2025-11-06T23:41:02",
-			"imported_at": "2025-11-07T03:55:00Z"
-		  }```
+  - **Status**: Disabled in docker-compose.yml (v0.5.1)
+
+The system uses:
+- **ChromaDB** for persistent vector storage
+- **OpenAI Embeddings (`text-embedding-3-small`)** for semantic similarity
+- **FastAPI** (port 7090) for the `/rag/search` REST endpoint
+
+Directory Layout:
+```
+rag/
+├── rag_chat_import.py    # imports JSON chat logs
+├── rag_docs_import.py    # (planned) PDF/EPUB/manual importer
+├── rag_build.py          # legacy single-folder builder
+├── rag_query.py          # command-line query helper
+├── rag_api.py            # FastAPI service providing /rag/search
+├── chromadb/             # persistent vector store
+├── chatlogs/             # organized source data
+│   ├── poker/
+│   ├── work/
+│   ├── lyra/
+│   ├── personal/
+│   └── ...
+└── import.log            # progress log for batch runs
+```
+
+**OpenAI chatlog importer features:**
+- Recursive folder indexing with **category detection** from directory name
+- Smart chunking for long messages (5,000 chars per slice)
+- Automatic deduplication using SHA-1 hash of file + chunk
+- Timestamps for both file modification and import time
+- Full progress logging via tqdm
+- Safe to run in background with `nohup … &`
 
 ---
 
@@ -228,13 +238,16 @@ Relay → UI (returns final response)
 
 All services run in a single docker-compose stack with the following containers:
 
+**Active Services:**
 - **neomem-postgres** - PostgreSQL with pgvector extension (port 5432)
 - **neomem-neo4j** - Neo4j graph database (ports 7474, 7687)
 - **neomem-api** - NeoMem memory service (port 7077)
 - **relay** - Main orchestrator (port 7078)
-- **cortex** - Reasoning engine (port 7081)
-- **intake** - Short-term memory summarization (port 7080) - currently disabled
-- **rag** - RAG search service (port 7090) - currently disabled
+- **cortex** - Reasoning engine with embedded Intake (port 7081)
+
+**Disabled Services:**
+- **intake** - No longer needed (embedded in Cortex as of v0.5.1)
+- **rag** - Beta Lyrae RAG service (port 7090) - currently disabled
 
 All containers communicate via the `lyra_net` Docker bridge network.
 
@@ -242,10 +255,10 @@ All containers communicate via the `lyra_net` Docker bridge network.
 
 The following LLM backends are accessed via HTTP (not part of docker-compose):
 
-- **vLLM Server** (`http://10.0.0.43:8000`)
+- **llama.cpp Server** (`http://10.0.0.44:8080`)
   - AMD MI50 GPU-accelerated inference
-  - Custom ROCm-enabled vLLM build
   - Primary backend for reasoning and refinement stages
+  - Model path: `/model`
 
 - **Ollama Server** (`http://10.0.0.3:11434`)
   - RTX 3090 GPU-accelerated inference
@@ -265,16 +278,38 @@ The following LLM backends are accessed via HTTP (not part of docker-compose):
 
 ## Version History
 
-### v0.5.0 (2025-11-28) - Current Release
+### v0.5.1 (2025-12-11) - Current Release
+**Critical Intake Integration Fixes:**
+- ✅ Fixed `bg_summarize()` NameError preventing SESSIONS persistence
+- ✅ Fixed `/ingest` endpoint unreachable code
+- ✅ Added `cortex/intake/__init__.py` for proper package structure
+- ✅ Added diagnostic logging to verify SESSIONS singleton behavior
+- ✅ Added `/debug/sessions` and `/debug/summary` endpoints
+- ✅ Documented single-worker constraint in Dockerfile
+- ✅ Implemented lenient error handling (never fails chat pipeline)
+- ✅ Intake is now embedded in Cortex and is no longer a standalone service
+
+**Architecture Changes:**
+- Intake module runs inside Cortex container as pure Python import
+- No HTTP calls between Cortex and Intake (internal function calls)
+- SESSIONS persist correctly in Uvicorn worker
+- Deferred summarization strategy (summaries generated during `/reason`)
+
+### v0.5.0 (2025-11-28)
 - ✅ Fixed all critical API wiring issues
 - ✅ Added OpenAI-compatible endpoint to Relay (`/v1/chat/completions`)
 - ✅ Fixed Cortex → Intake integration
 - ✅ Added missing Python package `__init__.py` files
 - ✅ End-to-end message flow verified and working
 
+### Infrastructure v1.0.0 (2025-11-26)
+- Consolidated 9 scattered `.env` files into single source of truth
+- Multi-backend LLM strategy implemented
+- Docker Compose consolidation
+- Created `.env.example` security templates
+
 ### v0.4.x (Major Rewire)
 - Cortex multi-stage reasoning pipeline
-- Intake v0.2 simplification
 - LLM router with multi-backend support
 - Major architectural restructuring
 
@@ -285,19 +320,30 @@ The following LLM backends are accessed via HTTP (not part of docker-compose):
 
 ---
 
-## Known Issues (v0.5.0)
+## Known Issues (v0.5.1)
+
+### Critical (Fixed in v0.5.1)
+- ~~Intake SESSIONS not persisting~~ ✅ **FIXED**
+- ~~`bg_summarize()` NameError~~ ✅ **FIXED**
+- ~~`/ingest` endpoint unreachable code~~ ✅ **FIXED**
 
 ### Non-Critical
 - Session management endpoints not fully implemented in Relay
-- Intake service currently disabled in docker-compose.yml
 - RAG service currently disabled in docker-compose.yml
-- Cortex `/ingest` endpoint is a stub
+- NeoMem integration in Relay not yet active (planned for v0.5.2)
+
+### Operational Notes
+- **Single-worker constraint**: Cortex must run with a single Uvicorn worker to maintain SESSIONS state
+  - Multi-worker scaling requires migrating SESSIONS to Redis or shared storage
+- Diagnostic endpoints (`/debug/sessions`, `/debug/summary`) available for troubleshooting
 
 ### Future Enhancements
 - Re-enable RAG service integration
 - Implement full session persistence
+- Migrate SESSIONS to Redis for multi-worker support
 - Add request correlation IDs for tracing
-- Comprehensive health checks
+- Comprehensive health checks across all services
+- NeoMem integration in Relay
 
 ---
 
@@ -305,21 +351,39 @@ The following LLM backends are accessed via HTTP (not part of docker-compose):
 
 ### Prerequisites
 - Docker + Docker Compose
-- At least one HTTP-accessible LLM endpoint (vLLM, Ollama, or OpenAI API key)
+- At least one HTTP-accessible LLM endpoint (llama.cpp, Ollama, or OpenAI API key)
 
 ### Setup
-1. Copy `.env.example` to `.env` and configure your LLM backend URLs and API keys
+1. Copy `.env.example` to `.env` and configure your LLM backend URLs and API keys:
+   ```bash
+   # Required: Configure at least one LLM backend
+   LLM_PRIMARY_URL=http://10.0.0.44:8080       # llama.cpp
+   LLM_SECONDARY_URL=http://10.0.0.3:11434     # Ollama
+   OPENAI_API_KEY=sk-...                        # OpenAI
+   ```
+
 2. Start all services with docker-compose:
    ```bash
    docker-compose up -d
    ```
+
 3. Check service health:
    ```bash
+   # Relay health
    curl http://localhost:7078/_health
+
+   # Cortex health
+   curl http://localhost:7081/health
+
+   # NeoMem health
+   curl http://localhost:7077/health
    ```
+
 4. Access the UI at `http://localhost:7078`
 
 ### Test
+
+**Test Relay → Cortex pipeline:**
 ```bash
 curl -X POST http://localhost:7078/v1/chat/completions \
   -H "Content-Type: application/json" \
@@ -329,15 +393,130 @@ curl -X POST http://localhost:7078/v1/chat/completions \
   }'
 ```
 
+**Test Cortex /ingest endpoint:**
+```bash
+curl -X POST http://localhost:7081/ingest \
+  -H "Content-Type: application/json" \
+  -d '{
+    "session_id": "test",
+    "user_msg": "Hello",
+    "assistant_msg": "Hi there!"
+  }'
+```
+
+**Inspect SESSIONS state:**
+```bash
+curl http://localhost:7081/debug/sessions
+```
+
+**Get summary for a session:**
+```bash
+curl "http://localhost:7081/debug/summary?session_id=test"
+```
+
 All backend databases (PostgreSQL and Neo4j) are automatically started as part of the docker-compose stack.
 
 ---
 
+## Environment Variables
+
+### LLM Backend Configuration
+
+**Backend URLs (Full API endpoints):**
+```bash
+LLM_PRIMARY_URL=http://10.0.0.44:8080           # llama.cpp
+LLM_PRIMARY_MODEL=/model
+
+LLM_SECONDARY_URL=http://10.0.0.3:11434         # Ollama
+LLM_SECONDARY_MODEL=qwen2.5:7b-instruct-q4_K_M
+
+LLM_OPENAI_URL=https://api.openai.com/v1
+LLM_OPENAI_MODEL=gpt-4o-mini
+OPENAI_API_KEY=sk-...
+```
+
+**Module-specific backend selection:**
+```bash
+CORTEX_LLM=SECONDARY      # Use Ollama for reasoning
+INTAKE_LLM=PRIMARY        # Use llama.cpp for summarization
+SPEAK_LLM=OPENAI          # Use OpenAI for persona
+NEOMEM_LLM=PRIMARY        # Use llama.cpp for memory
+UI_LLM=OPENAI             # Use OpenAI for UI
+RELAY_LLM=PRIMARY         # Use llama.cpp for relay
+```
+
+### Database Configuration
+```bash
+POSTGRES_USER=neomem
+POSTGRES_PASSWORD=neomempass
+POSTGRES_DB=neomem
+POSTGRES_HOST=neomem-postgres
+POSTGRES_PORT=5432
+
+NEO4J_URI=bolt://neomem-neo4j:7687
+NEO4J_USERNAME=neo4j
+NEO4J_PASSWORD=neomemgraph
+```
+
+### Service URLs (Internal Docker Network)
+```bash
+NEOMEM_API=http://neomem-api:7077
+CORTEX_API=http://cortex:7081
+CORTEX_REASON_URL=http://cortex:7081/reason
+CORTEX_INGEST_URL=http://cortex:7081/ingest
+RELAY_URL=http://relay:7078
+```
+
+### Feature Flags
+```bash
+CORTEX_ENABLED=true
+MEMORY_ENABLED=true
+PERSONA_ENABLED=false
+DEBUG_PROMPT=true
+VERBOSE_DEBUG=true
+```
+
+For complete environment variable reference, see [ENVIRONMENT_VARIABLES.md](ENVIRONMENT_VARIABLES.md).
+
+---
+
 ## Documentation
 
-- See [CHANGELOG.md](CHANGELOG.md) for detailed version history
-- See `ENVIRONMENT_VARIABLES.md` for environment variable reference
-- Additional information available in the Trilium docs
+- [CHANGELOG.md](CHANGELOG.md) - Detailed version history
+- [PROJECT_SUMMARY.md](PROJECT_SUMMARY.md) - Comprehensive project overview for AI context
+- [ENVIRONMENT_VARIABLES.md](ENVIRONMENT_VARIABLES.md) - Environment variable reference
+- [DEPRECATED_FILES.md](DEPRECATED_FILES.md) - Deprecated files and migration guide
+
+---
+
+## Troubleshooting
+
+### SESSIONS not persisting
+**Symptom:** Intake buffer always shows 0 exchanges, summaries always empty.
+
+**Solution (Fixed in v0.5.1):**
+- Ensure `cortex/intake/__init__.py` exists
+- Check Cortex logs for `[Intake Module Init]` message showing SESSIONS object ID
+- Verify single-worker mode (Dockerfile: `uvicorn main:app --workers 1`)
+- Use `/debug/sessions` endpoint to inspect current state
+
+### Cortex connection errors
+**Symptom:** Relay can't reach Cortex, 502 errors.
+
+**Solution:**
+- Verify Cortex container is running: `docker ps | grep cortex`
+- Check Cortex health: `curl http://localhost:7081/health`
+- Verify environment variables: `CORTEX_REASON_URL=http://cortex:7081/reason`
+- Check docker network: `docker network inspect lyra_net`
+
+### LLM backend timeouts
+**Symptom:** Reasoning stage hangs or times out.
+
+**Solution:**
+- Verify LLM backend is running and accessible
+- Check LLM backend health: `curl http://10.0.0.44:8080/health`
+- Increase the `httpx` client timeout in `cortex/llm/llm_router.py` (currently 120 seconds) if using slow models
+- Check logs for specific backend errors
 
 ---
 
@@ -356,6 +535,8 @@ NeoMem is a derivative work based on Mem0 OSS (Apache 2.0).
 - All services communicate via Docker internal networking on the `lyra_net` bridge
 - History and entity graphs are managed via PostgreSQL + Neo4j
 - LLM backends are accessed via HTTP and configured in `.env`
+- Intake module is imported internally by Cortex (no HTTP communication)
+- SESSIONS state is maintained in-memory within Cortex container
 
 ---
 
@@ -391,3 +572,38 @@ NeoMem is a derivative work based on Mem0 OSS (Apache 2.0).
      }'
    ```
 
+---
+
+## Development Notes
+
+### Cortex Architecture (v0.5.1)
+- Cortex contains embedded Intake module at `cortex/intake/`
+- Intake is imported as: `from intake.intake import add_exchange_internal, SESSIONS`
+- SESSIONS is a module-level global dictionary (singleton pattern)
+- Single-worker constraint required to maintain SESSIONS state
+- Diagnostic endpoints available for debugging: `/debug/sessions`, `/debug/summary`
+
+### Adding New LLM Backends
+1. Add backend URL to `.env`:
+   ```bash
+   LLM_CUSTOM_URL=http://your-backend:port
+   LLM_CUSTOM_MODEL=model-name
+   ```
+
+2. Configure module to use new backend:
+   ```bash
+   CORTEX_LLM=CUSTOM
+   ```
+
+3. Restart Cortex container:
+   ```bash
+   docker-compose restart cortex
+   ```
+
+### Debugging Tips
+- Enable verbose logging: `VERBOSE_DEBUG=true` in `.env`
+- Check Cortex logs: `docker logs cortex -f`
+- Inspect SESSIONS: `curl http://localhost:7081/debug/sessions`
+- Test summarization: `curl "http://localhost:7081/debug/summary?session_id=test"`
+- Check Relay logs: `docker logs relay -f`
+- Monitor Docker network: `docker network inspect lyra_net`
diff --git a/autonomy/autonomy_core.py b/autonomy/autonomy_core.py
new file mode 100644
index 0000000..e69de29
diff --git a/autonomy/inner_self.py b/autonomy/inner_self.py
new file mode 100644
index 0000000..e69de29
diff --git a/autonomy/prompts/inner_monologue_prompt.txt b/autonomy/prompts/inner_monologue_prompt.txt
new file mode 100644
index 0000000..e69de29
diff --git a/autonomy/prompts/state_interp_prompt.txt b/autonomy/prompts/state_interp_prompt.txt
new file mode 100644
index 0000000..e69de29
diff --git a/autonomy/state/self_state.json b/autonomy/state/self_state.json
new file mode 100644
index 0000000..e69de29
diff --git a/core/relay/server.js b/core/relay/server.js
index db706d8..c0e7c2a 100644
--- a/core/relay/server.js
+++ b/core/relay/server.js
@@ -84,7 +84,7 @@ app.get("/_health", (_, res) => {
 // -----------------------------------------------------
 app.post("/v1/chat/completions", async (req, res) => {
   try {
-    const session_id = req.body.session_id || req.body.user || "default";
+    const session_id = req.body.session_id || req.body.sessionId || req.body.user || "default";
     const messages = req.body.messages || [];
     const lastMessage = messages[messages.length - 1];
     const user_msg = lastMessage?.content || "";
@@ -151,6 +151,25 @@ app.post("/chat", async (req, res) => {
   }
 });
 
+// -----------------------------------------------------
+// SESSION ENDPOINTS (for UI)
+// -----------------------------------------------------
+// In-memory session storage (could be replaced with a database)
+const sessions = new Map();
+
+app.get("/sessions/:id", (req, res) => {
+  const sessionId = req.params.id;
+  const history = sessions.get(sessionId) || [];
+  res.json(history);
+});
+
+app.post("/sessions/:id", (req, res) => {
+  const sessionId = req.params.id;
+  const history = req.body;
+  sessions.set(sessionId, history);
+  res.json({ ok: true, saved: Array.isArray(history) ? history.length : 0 });
+});
+
 // -----------------------------------------------------
 app.listen(PORT, () => {
   console.log(`Relay is online on port ${PORT}`);
diff --git a/core/ui/index.html b/core/ui/index.html
index 299c193..ca37a7b 100644
--- a/core/ui/index.html
+++ b/core/ui/index.html
@@ -51,7 +51,7 @@
   </div>
 
   <script>
-    const RELAY_BASE = "http://10.0.0.40:7078";
+    const RELAY_BASE = "http://10.0.0.41:7078";
     const API_URL = `${RELAY_BASE}/v1/chat/completions`;
 
 	function generateSessionId() {
diff --git a/cortex/intake/intake.py b/cortex/intake/intake.py
index 50b192d..f5d9cba 100644
--- a/cortex/intake/intake.py
+++ b/cortex/intake/intake.py
@@ -282,11 +282,17 @@ JSON only. No text outside JSON.
     try:
         llm_response = await call_llm(
             prompt,
+            backend=INTAKE_LLM,
             temperature=0.2
         )
 
+        print(f"[Intake] LLM response length: {len(llm_response) if llm_response else 0}")
+        print(f"[Intake] LLM response preview: {llm_response[:200] if llm_response else '(empty)'}")
 
         # LLM should return JSON, parse it
+        if not llm_response or not llm_response.strip():
+            raise ValueError("Empty response from LLM")
+
         summary = json.loads(llm_response)
 
         return {
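The guard added in this hunk can be read in isolation as follows. This is a minimal sketch, not the code in `intake.py`: the helper name `parse_llm_json` is hypothetical, and only the empty-response check and the `json.loads` call come from the diff. It shows why the check matters: without it, an empty reply surfaces as the opaque "Expecting value: line 1 column 1 (char 0)" decode error rather than a clear message.

```python
import json


def parse_llm_json(llm_response):
    """Validate a raw LLM reply, then parse it as JSON.

    Hypothetical helper mirroring the guard in this hunk: an empty or
    whitespace-only reply raises ValueError before json.loads() runs.
    """
    if not llm_response or not llm_response.strip():
        raise ValueError("Empty response from LLM")
    # Malformed (non-empty) JSON still raises json.JSONDecodeError here,
    # which is itself a subclass of ValueError.
    return json.loads(llm_response)
```

Because `json.JSONDecodeError` subclasses `ValueError`, a caller that catches `ValueError` handles both the empty and the malformed case.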
diff --git a/cortex/llm/llm_router.py b/cortex/llm/llm_router.py
index cd164bf..7b7c173 100644
--- a/cortex/llm/llm_router.py
+++ b/cortex/llm/llm_router.py
@@ -1,7 +1,10 @@
 # llm_router.py
 import os
-import requests
+import httpx
 import json
+import logging
+
+logger = logging.getLogger(__name__)
 
 # ------------------------------------------------------------
 # Load backend registry from root .env
@@ -33,6 +36,9 @@ BACKENDS = {
 
 DEFAULT_BACKEND = "PRIMARY"
 
+# Reusable async HTTP client
+http_client = httpx.AsyncClient(timeout=120.0)
+
 
 # ------------------------------------------------------------
 # Public call
@@ -57,18 +63,28 @@ async def call_llm(
         raise RuntimeError(f"Backend '{backend}' missing url/model in env")
 
     # -------------------------------
-    # Provider: VLLM (your MI50)
+    # Provider: MI50 (llama.cpp server)
     # -------------------------------
-    if provider == "vllm":
+    if provider == "mi50":
         payload = {
-            "model": model,
             "prompt": prompt,
-            "max_tokens": max_tokens,
+            "n_predict": max_tokens,
             "temperature": temperature
         }
-        r = requests.post(url, json=payload, timeout=120)
-        data = r.json()
-        return data["choices"][0]["text"]
+        try:
+            r = await http_client.post(f"{url}/completion", json=payload)
+            r.raise_for_status()
+            data = r.json()
+            return data.get("content", "")
+        except httpx.HTTPError as e:
+            logger.error(f"HTTP error calling mi50: {type(e).__name__}: {str(e)}")
+            raise RuntimeError(f"LLM API error (mi50): {type(e).__name__}: {str(e)}")
+        except (KeyError, json.JSONDecodeError) as e:
+            logger.error(f"Response parsing error from mi50: {e}")
+            raise RuntimeError(f"Invalid response format (mi50): {e}")
+        except Exception as e:
+            logger.error(f"Unexpected error calling mi50: {type(e).__name__}: {str(e)}")
+            raise RuntimeError(f"Unexpected error (mi50): {type(e).__name__}: {str(e)}")
 
     # -------------------------------
     # Provider: OLLAMA (your 3090)
@@ -79,13 +95,22 @@ async def call_llm(
             "messages": [
                 {"role": "user", "content": prompt}
             ],
-            "stream": False        # <-- critical fix
+            "stream": False
         }
-
-        r = requests.post(f"{url}/api/chat", json=payload, timeout=120)
-        data = r.json()
-
-        return data["message"]["content"]
+        try:
+            r = await http_client.post(f"{url}/api/chat", json=payload)
+            r.raise_for_status()
+            data = r.json()
+            return data["message"]["content"]
+        except httpx.HTTPError as e:
+            logger.error(f"HTTP error calling ollama: {type(e).__name__}: {str(e)}")
+            raise RuntimeError(f"LLM API error (ollama): {type(e).__name__}: {str(e)}")
+        except (KeyError, json.JSONDecodeError) as e:
+            logger.error(f"Response parsing error from ollama: {e}")
+            raise RuntimeError(f"Invalid response format (ollama): {e}")
+        except Exception as e:
+            logger.error(f"Unexpected error calling ollama: {type(e).__name__}: {str(e)}")
+            raise RuntimeError(f"Unexpected error (ollama): {type(e).__name__}: {str(e)}")
 
 
     # -------------------------------
@@ -104,9 +129,20 @@ async def call_llm(
             "temperature": temperature,
             "max_tokens": max_tokens,
         }
-        r = requests.post(f"{url}/chat/completions", json=payload, headers=headers, timeout=120)
-        data = r.json()
-        return data["choices"][0]["message"]["content"]
+        try:
+            r = await http_client.post(f"{url}/chat/completions", json=payload, headers=headers)
+            r.raise_for_status()
+            data = r.json()
+            return data["choices"][0]["message"]["content"]
+        except httpx.HTTPError as e:
+            logger.error(f"HTTP error calling openai: {type(e).__name__}: {str(e)}")
+            raise RuntimeError(f"LLM API error (openai): {type(e).__name__}: {str(e)}")
+        except (KeyError, json.JSONDecodeError) as e:
+            logger.error(f"Response parsing error from openai: {e}")
+            raise RuntimeError(f"Invalid response format (openai): {e}")
+        except Exception as e:
+            logger.error(f"Unexpected error calling openai: {type(e).__name__}: {str(e)}")
+            raise RuntimeError(f"Unexpected error (openai): {type(e).__name__}: {str(e)}")
 
     # -------------------------------
     # Unknown provider
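The motivation for replacing `requests` with `httpx.AsyncClient` is that a synchronous call inside an `async def` stalls the whole event loop. The sketch below illustrates that effect using only the standard library: `time.sleep` stands in for the blocking `requests.post()` and `asyncio.sleep` stands in for `await http_client.post()`. The function names and the delay values are assumptions made for illustration; no real HTTP traffic is involved.

```python
import asyncio
import time


async def blocking_call(delay):
    # Stand-in for requests.post(): holds the event loop for the full delay.
    time.sleep(delay)


async def non_blocking_call(delay):
    # Stand-in for `await http_client.post()`: yields to the loop while waiting.
    await asyncio.sleep(delay)


async def timed(call, delay, n):
    """Run n calls concurrently and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    await asyncio.gather(*(call(delay) for _ in range(n)))
    return time.perf_counter() - start


def compare(delay=0.1, n=3):
    """Return (blocked, overlapped) timings for the two call styles."""
    blocked = asyncio.run(timed(blocking_call, delay, n))
    overlapped = asyncio.run(timed(non_blocking_call, delay, n))
    return blocked, overlapped
```

With the blocking stand-in the three calls run back to back, so the total grows with `n`; with the awaitable stand-in they overlap and finish in roughly one delay.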
diff --git a/docker-compose.yml b/docker-compose.yml
index ecd5f0e..4a63308 100644
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -97,6 +97,21 @@ services:
     networks:
       - lyra_net
 
+  # ============================================================
+  # UI Server
+  # ============================================================
+  lyra-ui:
+    image: nginx:alpine
+    container_name: lyra-ui
+    restart: unless-stopped
+    ports:
+      - "8081:80"
+    volumes:
+      - ./core/ui:/usr/share/nginx/html:ro
+    networks:
+      - lyra_net
+
+
   # ============================================================
   # Cortex
   # ============================================================
diff --git a/docs/ARCHITECTURE_v0-6-0.md b/docs/ARCHITECTURE_v0-6-0.md
new file mode 100644
index 0000000..6bd9a27
--- /dev/null
+++ b/docs/ARCHITECTURE_v0-6-0.md
@@ -0,0 +1,280 @@
+# **Cortex v0.6.0 — Cognitive Architecture Overview**
+
+*Last updated: Dec 2025*
+
+## **Summary**
+
+Cortex v0.6.0 evolves from a linear “reflection → reasoning → refine → persona” pipeline into a **three-layer cognitive system** modeled after human cognition:
+
+1. **Autonomy Core** — Lyra’s self-model (identity, mood, long-term goals)
+2. **Inner Monologue** — Lyra’s private narrator (self-talk + internal reflection)
+3. **Executive Agent (DeepSeek)** — Lyra’s task-oriented decision-maker
+
+Cortex itself now becomes the **central orchestrator**, not the whole mind. It routes user messages through these layers and produces the final outward response via the persona system.
+
+---
+
+# **Chain concept**
+    User → Relay → Cortex Intake → Inner Self → Cortex → Executive (DeepSeek) → Cortex → Persona → Relay → User (and Inner Self)
+
+                         USER
+                           │
+                           ▼
+                         RELAY
+             (sessions, logging, routing)
+                           │
+                           ▼
+    ┌──────────────────────────────────────────────┐
+    │                    CORTEX                    │
+    │ Intake → Reflection → Exec → Reason → Refine │
+    └──────────────────────┬───────────────────────┘
+                           │ self_state
+                           ▼
+                INNER SELF (monologue)
+                           │
+                           ▼
+                     AUTONOMY CORE
+                 (long-term identity)
+                           ▲
+                           │
+                 Persona Layer (speak)
+                           │
+                           ▼
+                         RELAY
+                           │
+                           ▼
+                         USER
+
+# **High-level Architecture**
+
+```
+               Autonomy Core (Self-Model)
+      ┌───────────────────────────────────────────┐
+      │ mood, identity, goals, emotional state    │
+      │ updated outside Cortex by inner monologue │
+      └─────────────────────┬─────────────────────┘
+                            │
+                            ▼
+               Inner Monologue (Self-Talk Loop)
+      ┌────────────────────────────────────────┐
+      │ Interprets events in language          │
+      │ Updates Autonomy Core                  │
+      │ Sends state-signals INTO Cortex        │
+      └─────────────────────┬──────────────────┘
+                            │
+                            ▼
+              Cortex (Task Brain / Router)
+   ┌────────────────────────────────────────────────────────┐
+   │ Intake → Reflection → Exec Agent → Reason → Refinement │
+   │            ↑                                  │        │
+   │            │                                  ▼        │
+   │     Receives state from                Persona Output  │
+   │        inner self                       (Lyra’s voice) │
+   └────────────────────────────────────────────────────────┘
+```
+
+The **user interacts only with the Persona layer**.
+Inner Monologue and Autonomy Core never speak directly to the user.
+
+---
+
+# **Component Breakdown**
+
+## **1. Autonomy Core (Self-Model)**
+
+*Not inside Cortex.*
+
+A persistent JSON/state machine representing Lyra’s ongoing inner life:
+
+* `mood`
+* `focus_mode`
+* `confidence`
+* `identity_traits`
+* `relationship_memory`
+* `long_term_goals`
+* `emotional_baseline`
+
+The Autonomy Core:
+
+* Is updated by Inner Monologue
+* Exposes its state to Cortex via a simple `get_state()` API
+* Never speaks to the user directly
+* Does not run LLMs itself
+
+It is the **structure** of self, not the thoughts.
+
+---
+
+## **2. Inner Monologue (Narrating, Private Mind)**
+
+*New subsystem in v0.6.0.*
+
+This module:
+
+* Reads Cortex summaries (intake, reflection, persona output)
+* Generates private self-talk (using an LLM, typically DeepSeek)
+* Updates the Autonomy Core
+* Produces a **self-state packet** for Cortex to use during task execution
+
+Inner Monologue is like:
+
+> “Brian is asking about X.
+> I should shift into a focused, serious tone.
+> I feel confident about this area.”
+
+It **never** outputs directly to the user.
+
+### Output schema (example):
+
+```json
+{
+  "mood": "focused",
+  "persona_bias": "clear",
+  "confidence_delta": 0.05,
+  "stance": "analytical",
+  "notes_to_cortex": [
+     "Reduce playfulness",
+     "Prioritize clarity",
+     "Recall project memory"
+  ]
+}
+```
+
+---
+
+## **3. Executive Agent (DeepSeek Director Mode)**
+
+Inside Cortex.
+
+This is Lyra’s **prefrontal cortex** — the task-oriented planner that decides how to respond to the current user message.
+
+Input to Executive Agent:
+
+* User message
+* Intake summary
+* Reflection notes
+* **Self-state packet** from Inner Monologue
+
+It outputs a **plan**, not a final answer:
+
+```json
+{
+  "action": "WRITE_NOTE",
+  "tools": ["memory_search"],
+  "tone": "focused",
+  "steps": [
+     "Search relevant project notes",
+     "Synthesize into summary",
+     "Draft actionable update"
+  ]
+}
+```
+
+Cortex then executes this plan.
+
+---
+
+# **Cortex Pipeline (v0.6.0)**
+
+Cortex becomes the orchestrator for the entire sequence:
+
+### **0. Intake**
+
+Parse the user message, extract relevant features.
+
+### **1. Reflection**
+
+Lightweight summarization (unchanged).
+Output used by both Inner Monologue and Executive Agent.
+
+### **2. Inner Monologue Update (parallel)**
+
+Reflection summary is sent to Inner Self, which:
+
+* updates Autonomy Core
+* returns `self_state` to Cortex
+
+### **3. Executive Agent (DeepSeek)**
+
+Given:
+
+* user message
+* reflection summary
+* autonomy self_state
+
+→ produces a **task plan**
+
+### **4. Reasoning**
+
+Carries out the plan:
+
+* tool calls
+* retrieval
+* synthesis
+
+### **5. Refinement**
+
+Polish the draft, ensure quality, follow constraints.
+
+### **6. Persona (speak.py)**
+
+Final transformation into Lyra’s voice.
+Persona now uses:
+
+* self_state (mood, tone)
+* constraints from Executive Agent
+
+### **7. User Response**
+
+Persona output is delivered to the user.
+
+### **8. Inner Monologue Post-Update**
+
+Cortex sends the final answer BACK to inner self for:
+
+* narrative continuity
+* emotional adjustment
+* identity update
+
+---
+
+# **Key Conceptual Separation**
+
+These five layers must remain distinct:
+
+| Layer               | Purpose                                                 |
+| ------------------- | ------------------------------------------------------- |
+| **Autonomy Core**   | Lyra’s identity + emotional continuity                  |
+| **Inner Monologue** | Lyra’s private thoughts, interpretation, meaning-making |
+| **Executive Agent** | Deciding what to *do* for the user message              |
+| **Cortex**          | Executing the plan                                      |
+| **Persona**         | Outward voice (what the user actually hears)            |
+
+The **user only interacts with Persona.**
+Inner Monologue and Autonomy Core are internal cognitive machinery.
+
+---
+
+# **What This Architecture Enables**
+
+* Emotional continuity
+* Identity stability
+* Agentic decision-making
+* Multi-model routing
+* Context-aware tone
+* Internal narrative
+* Proactive behavioral shifts
+* Human-like cognition
+
+This design turns Cortex from a simple pipeline into the **center of a functional artificial mind**.
\ No newline at end of file
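The v0.6.0 document describes the Autonomy Core as a persistent state object that Inner Monologue updates and Cortex reads through `get_state()`. The following is a minimal sketch of that contract under stated assumptions: the class layout, the `apply_packet` method name, the default values, and the clamping of `confidence` to the range 0 to 1 are all hypothetical. Only the field names and the `get_state()` accessor come from the document.

```python
from dataclasses import dataclass


@dataclass
class AutonomyCore:
    """Hypothetical in-memory self-model; field names follow the doc."""

    mood: str = "neutral"      # assumed default
    confidence: float = 0.5    # assumed default and scale

    def apply_packet(self, packet):
        """Fold an Inner Monologue self-state packet into the stored state."""
        self.mood = packet.get("mood", self.mood)
        delta = packet.get("confidence_delta", 0.0)
        # Clamping is an assumption; the doc does not define a range.
        self.confidence = min(1.0, max(0.0, self.confidence + delta))

    def get_state(self):
        """Read-only snapshot for Cortex; no LLM call is involved."""
        return {"mood": self.mood, "confidence": self.confidence}
```

Keeping the update path (`apply_packet`) separate from the read path (`get_state`) matches the document's split: Inner Monologue writes, Cortex only reads.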
diff --git a/docs/ARCH_v0-6-1.md b/docs/ARCH_v0-6-1.md
new file mode 100644
index 0000000..ef67ab3
--- /dev/null
+++ b/docs/ARCH_v0-6-1.md
@@ -0,0 +1,354 @@
+# **ARCHITECTURE_v0.6.1 — Lyra Cognitive System**
+
+> **Core change from v0.6.0 → v0.6.1:**
+> **Inner Self becomes the primary conversational agent**
+> (the model the user is *actually* talking to),
+> while Executive and Cortex models support the Self rather than drive it.
+
+---
+
+# **1. High-Level Overview**
+
+Lyra v0.6.1 is composed of **three cognitive layers** and **one expression layer**, plus an autonomy module for ongoing identity continuity.
+
+```
+USER
+  ↓
+Relay (I/O)
+  ↓
+Cortex Intake (context snapshot)
+  ↓
+INNER SELF  ←→  EXECUTIVE MODEL (DeepSeek)
+  ↓
+Cortex Chat Model (draft language)
+  ↓
+Persona Model (Lyra’s voice)
+  ↓
+Relay → USER
+  ↓
+Inner Self updates Autonomy Core (self-state)
+```
+
+---
+
+# **2. Roles of Each Layer**
+
+---
+
+## **2.1 Inner Self (Primary Conversational Agent)**
+
+The Self is Lyra’s “seat of consciousness.”
+
+This layer:
+
+* Interprets every user message
+* Maintains internal monologue
+* Chooses emotional stance (warm, blunt, focused, chaotic)
+* Decides whether to think deeply or reply quickly
+* Decides whether to consult the Executive model
+* Forms a **response intent**
+* Provides tone and meta-guidance to the Persona layer
+* Updates self-state (mood, trust, narrative identity)
+
+Inner Self is the thing the **user is actually talking to.**
+
+Inner Self does **NOT** generate paragraphs of text —
+it generates *intent*:
+
+```
+{
+  "intent": "comfort Brian and explain the error simply",
+  "tone": "gentle",
+  "depth": "medium",
+  "consult_exec": true
+}
+```
+
+---
+
+## **2.2 Executive Model (DeepSeek Reasoner)**
+
+This model is the **thinking engine** Inner Self consults when necessary.
+
+It performs:
+
+* planning
+* deep reasoning
+* tool selection
+* multi-step logic
+* explanation chains
+
+It never speaks directly to the user.
+
+It returns a **plan**, not a message:
+
+```
+{
+  "plan": [
+    "Identify error",
+    "Recommend restart",
+    "Reassure user"
+  ],
+  "confidence": 0.86
+}
+```
+
+Inner Self can follow or override the plan.
+
+---
+
+## **2.3 Cortex Chat Model (Draft Generator)**
+
+This is the **linguistic engine**.
+
+It converts Inner Self’s intent (plus Executive’s plan if provided) into actual language:
+
+Input:
+
+```
+intent + optional plan + context snapshot
+```
+
+Output:
+
+```
+structured draft paragraph
+```
+
+This model must be:
+
+* instruction-tuned
+* coherent
+* factual
+* friendly
+
+Examples: GPT-4o-mini, Qwen-14B-instruct, Mixtral chat, etc.
+
+---
+
+## **2.4 Persona Model (Lyra’s Voice)**
+
+This is the **expression layer** — the mask, the tone, the identity.
+
+It takes:
+
+* the draft language
+* the Self’s tone instructions
+* the narrative state (from Autonomy Core)
+* prior persona shaping rules
+
+And transforms the text into:
+
+* Lyra’s voice
+* Lyra’s humor
+* Lyra’s emotional texture
+* Lyra’s personality consistency
+
+Persona does not change the *meaning* — only the *presentation*.
+
+---
+
+# **3. Message Flow (Full Pipeline)**
+
+A clean version, step-by-step:
+
+---
+
+### **1. USER → Relay**
+
+Relay attaches metadata (session, timestamp) and forwards to Cortex.
+
+---
+
+### **2. Intake → Context Snapshot**
+
+Cortex creates:
+
+* cleaned message
+* recent context summary
+* memory matches (RAG)
+* time-since-last
+* conversation mode
+
+---
+
+### **3. Inner Self Receives Snapshot**
+
+Inner Self:
+
+* interprets the user’s intent
+* updates internal monologue
+* decides how Lyra *feels* about the input
+* chooses whether to consult Executive
+* produces an **intent packet**
+
+---
+
+### **4. (Optional) Inner Self Consults Executive Model**
+
+Inner Self sends the situation to DeepSeek:
+
+```
+"Given Brian's message and my context, what is the best plan?"
+```
+
+DeepSeek returns:
+
+* a plan
+* recommended steps
+* rationale
+* optional tool suggestions
+
+Inner Self integrates the plan or overrides it.
+
+---
+
+### **5. Inner Self → Cortex Chat Model**
+
+Self creates an **instruction packet**:
+
+```
+{
+  "intent": "...",
+  "tone": "...",
+  "plan": [...],
+  "context_summary": {...}
+}
+```
+
+Cortex chat model produces the draft text.
+
+---
+
+### **6. Persona Model Transforms the Draft**
+
+Persona takes draft → produces final Lyra-styled output.
+
+Persona ensures:
+
+* emotional fidelity
+* humor when appropriate
+* warmth / sharpness depending on state
+* consistent narrative identity
+
+---
+
+### **7. Relay Sends Response to USER**
+
+---
+
+### **8. Inner Self Updates Autonomy Core**
+
+Inner Self receives:
+
+* the action taken
+* the emotional tone used
+* any RAG results
+* narrative significance
+
+And updates:
+
+* mood
+* trust memory
+* identity drift
+* ongoing narrative
+* stable traits
+
+This becomes part of her evolving self.
+
+---
+
+# **4. Cognitive Ownership Summary**
+
+### Inner Self
+
+**Owns:**
+
+* decision-making
+* feeling
+* interpreting
+* intent
+* tone
+* continuity of self
+* mood
+* monologue
+* overrides
+
+### Executive (DeepSeek)
+
+**Owns:**
+
+* logic
+* planning
+* structure
+* analysis
+* tool selection
+
+### Cortex Chat Model
+
+**Owns:**
+
+* language generation
+* factual content
+* clarity
+
+### Persona
+
+**Owns:**
+
+* voice
+* flavor
+* style
+* emotional texture
+* social expression
+
+---
+
+# **5. Why v0.6.1 is Better**
+
+* More human
+* More natural
+* Allows spontaneous responses
+* Allows deep thinking when needed
+* Separates “thought” from “speech”
+* Gives Lyra a *real self*
+* Allows much more autonomy later
+* Loosely mirrors the layered structure of human cognition
+
+---
+
+# **6. Migration Notes from v0.6.0**
+
+Nothing is deleted.
+Everything is **rearranged** so that meaning, intent, and tone flow correctly.
+
+Main changes:
+
+* Inner Self now initiates the response, rather than merely influencing it.
+* Executive is secondary, not primary.
+* Persona becomes an expression layer, not a content layer.
+* Cortex Chat Model handles drafting, not cognition.
+
+The whole system becomes both more powerful and easier to reason about.
+
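Steps 3 to 5 of the v0.6.1 message flow (intent packet, optional Executive consultation, instruction packet) can be sketched as below. This is an illustration under assumptions, not code from the repository: `build_instruction_packet`, `stub_executive`, and the `consult_executive` callable are hypothetical names, while the packet keys and the example plan are taken from the document.

```python
def build_instruction_packet(intent_packet, context_summary, consult_executive):
    """Assemble the packet handed to the Cortex chat model.

    consult_executive is a hypothetical callable standing in for the
    DeepSeek call; it is invoked only when the intent asks for it.
    """
    packet = {
        "intent": intent_packet["intent"],
        "tone": intent_packet["tone"],
        "plan": [],
        "context_summary": context_summary,
    }
    if intent_packet.get("consult_exec"):
        result = consult_executive(intent_packet, context_summary)
        packet["plan"] = list(result.get("plan", []))
    return packet


def stub_executive(intent_packet, context_summary):
    # Placeholder for the Executive model; returns the doc's example plan.
    return {
        "plan": ["Identify error", "Recommend restart", "Reassure user"],
        "confidence": 0.86,
    }
```

Passing the Executive in as a callable keeps Inner Self in control of whether it is consulted, which is the ownership split the document describes.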
diff --git a/ENVIRONMENT_VARIABLES.md b/docs/ENVIRONMENT_VARIABLES.md
similarity index 100%
rename from ENVIRONMENT_VARIABLES.md
rename to docs/ENVIRONMENT_VARIABLES.md
diff --git a/docs/LLMS.md b/docs/LLMS.md
new file mode 100644
index 0000000..6439a18
--- /dev/null
+++ b/docs/LLMS.md
@@ -0,0 +1,39 @@
+Request Flow Chain
+1. UI (Frontend)
+   ↓ sends HTTP POST to
+   
+2. Relay Service (Node.js - server.js)
+   Location: /home/serversdown/project-lyra/core/relay/server.js
+   Port: 7078
+   Endpoint: POST /v1/chat/completions
+   ↓ calls handleChatRequest() which posts to
+   
+3. Cortex Service - Reason Endpoint (Python FastAPI - router.py)
+   Location: /home/serversdown/project-lyra/cortex/router.py
+   Port: 7081
+   Endpoint: POST /reason
+   Function: run_reason() at line 126
+   ↓ calls
+   
+4. Cortex Reasoning Module (reasoning.py)
+   Location: /home/serversdown/project-lyra/cortex/reasoning/reasoning.py
+   Function: reason_check() at line 188
+   ↓ calls
+   
+5. LLM Router (llm_router.py)
+   Location: /home/serversdown/project-lyra/cortex/llm/llm_router.py
+   Function: call_llm()
+   - Gets backend from env: CORTEX_LLM=PRIMARY (from .env line 29)
+   - Looks up PRIMARY config which has provider="mi50" (from .env line 13)
+   - Routes to the mi50 provider handler (line 62-70)
+   ↓ makes HTTP POST to
+   
+6. MI50 LLM Server (llama.cpp)
+   Location: http://10.0.0.44:8080
+   Endpoint: POST /completion
+   Hardware: AMD MI50 GPU running DeepSeek model
+
+Key Configuration Points
+- Backend Selection: `.env:29` sets `CORTEX_LLM=PRIMARY`
+- Provider Name: `.env:13` sets `LLM_PRIMARY_PROVIDER=mi50`
+- Server URL: `.env:14` sets `LLM_PRIMARY_URL=http://10.0.0.44:8080`
+- Provider Handler: `llm_router.py:62-70` implements the mi50 provider
\ No newline at end of file
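The lookup described in step 5 of `LLMS.md` (module variable, then backend name, then provider settings) can be sketched as a pure function over an environment mapping. This is a hedged illustration, not the router's actual registry code: `resolve_backend` and `example_env` are hypothetical names, and the `/model` value is borrowed from `PROJECT_SUMMARY.md` in this same diff. The `PRIMARY` default and the error message follow `llm_router.py`.

```python
DEFAULT_BACKEND = "PRIMARY"


def resolve_backend(module, env):
    """Map a module name (e.g. "CORTEX") to its backend settings.

    Reads {MODULE}_LLM to pick a backend, then LLM_{BACKEND}_* for details.
    """
    backend = env.get(f"{module}_LLM", DEFAULT_BACKEND)
    prefix = f"LLM_{backend}_"
    config = {
        "backend": backend,
        "provider": env.get(prefix + "PROVIDER"),
        "url": env.get(prefix + "URL"),
        "model": env.get(prefix + "MODEL"),
    }
    if not config["url"] or not config["model"]:
        raise RuntimeError(f"Backend '{backend}' missing url/model in env")
    return config


# Example values taken from the docs in this diff.
example_env = {
    "CORTEX_LLM": "PRIMARY",
    "LLM_PRIMARY_PROVIDER": "mi50",
    "LLM_PRIMARY_URL": "http://10.0.0.44:8080",
    "LLM_PRIMARY_MODEL": "/model",
}
```

Calling `resolve_backend("CORTEX", example_env)` yields the mi50 settings; a module with no `{MODULE}_LLM` entry falls back to `PRIMARY`.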
diff --git a/docs/PROJECT_SUMMARY.md b/docs/PROJECT_SUMMARY.md
new file mode 100644
index 0000000..7395e46
--- /dev/null
+++ b/docs/PROJECT_SUMMARY.md
@@ -0,0 +1,925 @@
+# Project Lyra — Comprehensive AI Context Summary
+
+**Version:** v0.5.1 (2025-12-11)
+**Status:** Production-ready modular AI companion system
+**Purpose:** Memory-backed conversational AI with multi-stage reasoning, persistent context, and modular LLM backend architecture
+
+---
+
+## Executive Summary
+
+Project Lyra is a **self-hosted AI companion system** designed to overcome the limitations of typical chatbots by providing:
+- **Persistent long-term memory** (NeoMem: PostgreSQL + Neo4j graph storage)
+- **Multi-stage reasoning pipeline** (Cortex: reflection → reasoning → refinement → persona)
+- **Short-term context management** (Intake: session-based summarization embedded in Cortex)
+- **Flexible LLM backend routing** (supports llama.cpp, Ollama, OpenAI, custom endpoints)
+- **OpenAI-compatible API** (drop-in replacement for chat applications)
+
+**Core Philosophy:** Like a human brain has different regions for different functions, Lyra has specialized modules that work together. She's not just a chatbot—she's a notepad, schedule, database, co-creator, and collaborator with her own executive function.
+
+---
+
+## Quick Context for AI Assistants
+
+If you're an AI being given this project to work on, here's what you need to know:
+
+### What This Project Does
+Lyra is a conversational AI system that **remembers everything** across sessions. When a user says something in passing, Lyra stores it, contextualizes it, and can recall it later. She can:
+- Track project progress over time
+- Remember user preferences and past conversations
+- Reason through complex questions using multiple LLM calls
+- Apply a consistent personality across all interactions
+- Integrate with multiple LLM backends (local and cloud)
+
+### Current Architecture (v0.5.1)
+```
+User → Relay (Express/Node.js, port 7078)
+  ↓
+Cortex (FastAPI/Python, port 7081)
+  ├─ Intake module (embedded, in-memory SESSIONS)
+  ├─ 4-stage reasoning pipeline
+  └─ Multi-backend LLM router
+  ↓
+NeoMem (FastAPI/Python, port 7077)
+  ├─ PostgreSQL (vector storage)
+  └─ Neo4j (graph relationships)
+```
+
+### Key Files You'll Work With
+
+**Backend Services:**
+- [cortex/router.py](cortex/router.py) - Main Cortex routing logic (306 lines, `/reason`, `/ingest` endpoints)
+- [cortex/intake/intake.py](cortex/intake/intake.py) - Short-term memory module (367 lines, SESSIONS management)
+- [cortex/reasoning/reasoning.py](cortex/reasoning/reasoning.py) - Draft answer generation
+- [cortex/reasoning/refine.py](cortex/reasoning/refine.py) - Answer refinement
+- [cortex/reasoning/reflection.py](cortex/reasoning/reflection.py) - Meta-awareness notes
+- [cortex/persona/speak.py](cortex/persona/speak.py) - Personality layer
+- [cortex/llm/llm_router.py](cortex/llm/llm_router.py) - LLM backend selector
+- [core/relay/server.js](core/relay/server.js) - Main orchestrator (Node.js)
+- [neomem/main.py](neomem/main.py) - Long-term memory API
+
+**Configuration:**
+- [.env](.env) - Root environment variables (LLM backends, databases, API keys)
+- [cortex/.env](cortex/.env) - Cortex-specific overrides
+- [docker-compose.yml](docker-compose.yml) - Service definitions (152 lines)
+
+**Documentation:**
+- [CHANGELOG.md](CHANGELOG.md) - Complete version history (836 lines, chronological format)
+- [README.md](README.md) - User-facing documentation (610 lines)
+- [PROJECT_SUMMARY.md](PROJECT_SUMMARY.md) - This file
+
+### Recent Critical Fixes (v0.5.1)
+The most recent work fixed a critical bug where Intake's SESSIONS buffer wasn't persisting:
+1. **Fixed**: `bg_summarize()` was only a TYPE_CHECKING stub → implemented as logging stub
+2. **Fixed**: `/ingest` endpoint had unreachable code → removed early return, added lenient error handling
+3. **Added**: `cortex/intake/__init__.py` → proper Python package structure
+4. **Added**: Diagnostic endpoints `/debug/sessions` and `/debug/summary` for troubleshooting
+
+**Key Insight**: Intake is no longer a standalone service—it's embedded in Cortex as a Python module. SESSIONS must persist in a single Uvicorn worker (no multi-worker support without Redis).
+
+---
+
+## Architecture Deep Dive
+
+### Service Topology (Docker Compose)
+
+**Active Containers:**
+1. **relay** (Node.js/Express, port 7078)
+   - Entry point for all user requests
+   - OpenAI-compatible `/v1/chat/completions` endpoint
+   - Routes to Cortex for reasoning
+   - Async calls to Cortex `/ingest` after response
+
+2. **cortex** (Python/FastAPI, port 7081)
+   - Multi-stage reasoning pipeline
+   - Embedded Intake module (no HTTP, direct Python imports)
+   - Endpoints: `/reason`, `/ingest`, `/health`, `/debug/sessions`, `/debug/summary`
+
+3. **neomem-api** (Python/FastAPI, port 7077)
+   - Long-term memory storage
+   - Fork of Mem0 OSS (fully local, no external SDK)
+   - Endpoints: `/memories`, `/search`, `/health`
+
+4. **neomem-postgres** (PostgreSQL + pgvector, port 5432)
+   - Vector embeddings storage
+   - Memory history records
+
+5. **neomem-neo4j** (Neo4j, ports 7474/7687)
+   - Graph relationships between memories
+   - Entity extraction and linking
+
+**Disabled Services:**
+- `intake` - No longer needed (embedded in Cortex as of v0.5.1)
+- `rag` - Beta Lyrae RAG service (planned re-enablement)
+
+### External LLM Backends (HTTP APIs)
+
+**PRIMARY Backend** - llama.cpp @ `http://10.0.0.44:8080`
+- AMD MI50 GPU-accelerated inference
+- Model: `/model` (path-based routing)
+- Used for: Reasoning, refinement, summarization
+
+**SECONDARY Backend** - Ollama @ `http://10.0.0.3:11434`
+- RTX 3090 GPU-accelerated inference
+- Model: `qwen2.5:7b-instruct-q4_K_M`
+- Used for: Configurable per-module
+
+**CLOUD Backend** - OpenAI @ `https://api.openai.com/v1`
+- Cloud-based inference
+- Model: `gpt-4o-mini`
+- Used for: Reflection, persona layers
+
+**FALLBACK Backend** - Local @ `http://10.0.0.41:11435`
+- CPU-based inference
+- Model: `llama-3.2-8b-instruct`
+- Used for: Emergency fallback
+
+### Data Flow (Request Lifecycle)
+
+```
+1. User sends message → Relay (/v1/chat/completions)
+   ↓
+2. Relay → Cortex (/reason)
+   ↓
+3. Cortex calls Intake module (internal Python)
+   - Intake.summarize_context(session_id, exchanges)
+   - Returns L1/L5/L10/L20/L30 summaries
+   ↓
+4. Cortex 4-stage pipeline:
+   a. reflection.py → Meta-awareness notes (CLOUD backend)
+      - "What is the user really asking?"
+      - Returns JSON: {"notes": [...]}
+
+   b. reasoning.py → Draft answer (PRIMARY backend)
+      - Uses context from Intake
+      - Integrates reflection notes
+      - Returns draft text
+
+   c. refine.py → Refined answer (PRIMARY backend)
+      - Polishes draft for clarity
+      - Ensures factual consistency
+      - Returns refined text
+
+   d. speak.py → Persona layer (CLOUD backend)
+      - Applies Lyra's personality
+      - Natural, conversational tone
+      - Returns final answer
+   ↓
+5. Cortex → Relay (returns persona answer)
+   ↓
+6. Relay → Cortex (/ingest) [async, non-blocking]
+   - Sends (session_id, user_msg, assistant_msg)
+   - Cortex calls add_exchange_internal()
+   - Appends to SESSIONS[session_id]["buffer"]
+   ↓
+7. Relay → User (returns final response)
+   ↓
+8. [Planned] Relay → NeoMem (/memories) [async]
+   - Store conversation in long-term memory
+```
+
+### Intake Module Architecture (v0.5.1)
+
+**Location:** `cortex/intake/`
+
+**Key Change:** Intake is now **embedded in Cortex** as a Python module, not a standalone service.
+
+**Import Pattern:**
+```python
+from intake.intake import add_exchange_internal, SESSIONS, summarize_context
+```
+
+**Core Data Structure:**
+```python
+SESSIONS: dict[str, dict] = {}
+
+# Structure:
+SESSIONS[session_id] = {
+    "buffer": deque(maxlen=200),  # Circular buffer of exchanges
+    "created_at": datetime
+}
+
+# Each exchange in buffer:
+{
+    "session_id": "...",
+    "user_msg": "...",
+    "assistant_msg": "...",
+    "timestamp": "2025-12-11T..."
+}
+```
+
+**Functions:**
+1. **`add_exchange_internal(exchange: dict)`**
+   - Adds exchange to SESSIONS buffer
+   - Creates new session if needed
+   - Calls `bg_summarize()` stub
+   - Returns `{"ok": True, "session_id": "..."}`
+
+2. **`summarize_context(session_id: str, exchanges: list[dict])`** [async]
+   - Generates L1/L5/L10/L20/L30 summaries via LLM
+   - Called during `/reason` endpoint
+   - Returns multi-level summary dict
+
+3. **`bg_summarize(session_id: str)`**
+   - **Stub function** - logs only, no actual work
+   - Defers summarization to `/reason` call
+   - Exists to prevent NameError
+
+**Critical Constraint:** SESSIONS is a module-level global dict. This requires **single-worker Uvicorn** mode. Multi-worker deployments need Redis or shared storage.
+
+**Diagnostic Endpoints:**
+- `GET /debug/sessions` - Inspect all SESSIONS (object ID, buffer sizes, recent exchanges)
+- `GET /debug/summary?session_id=X` - Test summarization for a session
+
+---
+
+## Environment Configuration
+
+### LLM Backend Registry (Multi-Backend Strategy)
+
+**Root `.env` defines all backend OPTIONS:**
+```bash
+# PRIMARY Backend (llama.cpp server on the MI50)
+LLM_PRIMARY_PROVIDER=mi50
+LLM_PRIMARY_URL=http://10.0.0.44:8080
+LLM_PRIMARY_MODEL=/model
+
+# SECONDARY Backend (Ollama)
+LLM_SECONDARY_PROVIDER=ollama
+LLM_SECONDARY_URL=http://10.0.0.3:11434
+LLM_SECONDARY_MODEL=qwen2.5:7b-instruct-q4_K_M
+
+# CLOUD Backend (OpenAI)
+LLM_OPENAI_PROVIDER=openai
+LLM_OPENAI_URL=https://api.openai.com/v1
+LLM_OPENAI_MODEL=gpt-4o-mini
+OPENAI_API_KEY=sk-proj-...
+
+# FALLBACK Backend
+LLM_FALLBACK_PROVIDER=openai_completions
+LLM_FALLBACK_URL=http://10.0.0.41:11435
+LLM_FALLBACK_MODEL=llama-3.2-8b-instruct
+```
+
+**Module-specific backend selection:**
+```bash
+CORTEX_LLM=SECONDARY      # Cortex uses Ollama
+INTAKE_LLM=PRIMARY        # Intake uses llama.cpp
+SPEAK_LLM=OPENAI          # Persona uses OpenAI
+NEOMEM_LLM=PRIMARY        # NeoMem uses llama.cpp
+UI_LLM=OPENAI             # UI uses OpenAI
+RELAY_LLM=PRIMARY         # Relay uses llama.cpp
+```
+
+**Philosophy:** Root `.env` provides all backend OPTIONS. Each service chooses which backend to USE via `{MODULE}_LLM` variable. This eliminates URL duplication while preserving flexibility.
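+
+The lookup this implies can be sketched as follows. `resolve_backend` is a hypothetical name, and falling back to `PRIMARY` when a module variable is unset is an assumption; the real logic lives in `llm_router.py` and may differ:
+
+```python
+import os
+
+def resolve_backend(module: str, env=os.environ) -> dict:
+    """Hypothetical helper: map {MODULE}_LLM to the matching LLM_* registry entry."""
+    name = env.get(f"{module.upper()}_LLM", "PRIMARY").upper()  # assumed default
+    prefix = f"LLM_{name}_"
+    url = env.get(prefix + "URL")
+    if url is None:
+        raise KeyError(f"no backend named {name!r} in the registry")
+    return {
+        "name": name,
+        "provider": env.get(prefix + "PROVIDER"),
+        "url": url,
+        "model": env.get(prefix + "MODEL"),
+    }
+
+# A plain dict stands in for the process environment.
+example_env = {
+    "LLM_SECONDARY_PROVIDER": "ollama",
+    "LLM_SECONDARY_URL": "http://10.0.0.3:11434",
+    "LLM_SECONDARY_MODEL": "qwen2.5:7b-instruct-q4_K_M",
+    "CORTEX_LLM": "SECONDARY",
+}
+print(resolve_backend("cortex", example_env))
+```
+
+Because each module only names a backend, changing a URL means editing one `LLM_*_URL` line rather than every service's configuration.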
+
+### Database Configuration
+```bash
+# PostgreSQL (vector storage)
+POSTGRES_USER=neomem
+POSTGRES_PASSWORD=neomempass
+POSTGRES_DB=neomem
+POSTGRES_HOST=neomem-postgres
+POSTGRES_PORT=5432
+
+# Neo4j (graph storage)
+NEO4J_URI=bolt://neomem-neo4j:7687
+NEO4J_USERNAME=neo4j
+NEO4J_PASSWORD=neomemgraph
+```
+
+### Service URLs (Docker Internal Network)
+```bash
+NEOMEM_API=http://neomem-api:7077
+CORTEX_API=http://cortex:7081
+CORTEX_REASON_URL=http://cortex:7081/reason
+CORTEX_INGEST_URL=http://cortex:7081/ingest
+RELAY_URL=http://relay:7078
+```
+
+### Feature Flags
+```bash
+CORTEX_ENABLED=true
+MEMORY_ENABLED=true
+PERSONA_ENABLED=false
+DEBUG_PROMPT=true
+VERBOSE_DEBUG=true
+```
+
+---
+
+## Code Structure Overview
+
+### Cortex Service (`cortex/`)
+
+**Main Files:**
+- `main.py` - FastAPI app initialization
+- `router.py` - Route definitions (`/reason`, `/ingest`, `/health`, `/debug/*`)
+- `context.py` - Context aggregation (Intake summaries, session state)
+
+**Reasoning Pipeline (`reasoning/`):**
+- `reflection.py` - Meta-awareness notes (Cloud LLM)
+- `reasoning.py` - Draft answer generation (Primary LLM)
+- `refine.py` - Answer refinement (Primary LLM)
+
+**Persona Layer (`persona/`):**
+- `speak.py` - Personality application (Cloud LLM)
+- `identity.py` - Persona loader
+
+**Intake Module (`intake/`):**
+- `__init__.py` - Package exports (SESSIONS, add_exchange_internal, summarize_context)
+- `intake.py` - Core logic (367 lines)
+  - SESSIONS dictionary
+  - add_exchange_internal()
+  - summarize_context()
+  - bg_summarize() stub
+
+**LLM Integration (`llm/`):**
+- `llm_router.py` - Backend selector and HTTP client
+  - call_llm() function
+  - Environment-based routing
+  - Payload formatting per backend type
+
+**Utilities (`utils/`):**
+- Helper functions for common operations
+
+**Configuration:**
+- `Dockerfile` - Single-worker constraint documented
+- `requirements.txt` - Python dependencies
+- `.env` - Service-specific overrides
+
+### Relay Service (`core/relay/`)
+
+**Main Files:**
+- `server.js` - Express.js server (Node.js)
+  - `/v1/chat/completions` - OpenAI-compatible endpoint
+  - `/chat` - Internal endpoint
+  - `/_health` - Health check
+- `package.json` - Node.js dependencies
+
+**Key Logic:**
+- Receives user messages
+- Routes to Cortex `/reason`
+- Calls Cortex `/ingest` asynchronously after the response
+- Returns final answer to user
+
+### NeoMem Service (`neomem/`)
+
+**Main Files:**
+- `main.py` - FastAPI app (memory API)
+- `memory.py` - Memory management logic
+- `embedder.py` - Embedding generation
+- `graph.py` - Neo4j graph operations
+- `Dockerfile` - Container definition
+- `requirements.txt` - Python dependencies
+
+**API Endpoints:**
+- `POST /memories` - Add new memory
+- `POST /search` - Semantic search
+- `GET /health` - Service health
+
+---
+
+## Common Development Tasks
+
+### Adding a New Endpoint to Cortex
+
+**Example: Add `/debug/buffer` endpoint**
+
+1. **Edit `cortex/router.py`:**
+```python
+@cortex_router.get("/debug/buffer")
+async def debug_buffer(session_id: str, limit: int = 10):
+    """Return last N exchanges from a session buffer."""
+    from intake.intake import SESSIONS
+
+    session = SESSIONS.get(session_id)
+    if not session:
+        return {"error": "session not found", "session_id": session_id}
+
+    buffer = session["buffer"]
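+    # Guard against limit <= 0: a bare [-0:] slice would return the whole buffer
+    recent = list(buffer)[-limit:] if limit > 0 else []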
+
+    return {
+        "session_id": session_id,
+        "total_exchanges": len(buffer),
+        "recent_exchanges": recent
+    }
+```
+
+2. **Restart Cortex:**
+```bash
+docker-compose restart cortex
+```
+
+3. **Test:**
+```bash
+curl "http://localhost:7081/debug/buffer?session_id=test&limit=5"
+```
+
+### Modifying LLM Backend for a Module
+
+**Example: Switch Cortex to use PRIMARY backend**
+
+1. **Edit `.env`:**
+```bash
+CORTEX_LLM=PRIMARY  # Change from SECONDARY to PRIMARY
+```
+
+2. **Restart Cortex:**
+```bash
+docker-compose up -d --force-recreate cortex  # a plain restart does not reload .env
+```
+
+3. **Verify in logs:**
+```bash
+docker logs cortex | grep "Backend"
+```
+
+### Adding Diagnostic Logging
+
+**Example: Log every exchange addition**
+
+1. **Edit `cortex/intake/intake.py`:**
+```python
+def add_exchange_internal(exchange: dict):
+    session_id = exchange.get("session_id")
+
+    # Add detailed logging
+    print(f"[DEBUG] Adding exchange to {session_id}")
+    print(f"[DEBUG] User msg: {exchange.get('user_msg', '')[:100]}")
+    print(f"[DEBUG] Assistant msg: {exchange.get('assistant_msg', '')[:100]}")
+
+    # ... rest of function
+```
+
+2. **View logs:**
+```bash
+docker logs cortex -f | grep DEBUG
+```
+
+---
+
+## Debugging Guide
+
+### Problem: SESSIONS Not Persisting
+
+**Symptoms:**
+- `/debug/sessions` shows no sessions, or only one exchange
+- Summaries always return empty
+- Buffer size doesn't increase
+
+**Diagnosis Steps:**
+1. Check Cortex logs for SESSIONS object ID:
+   ```bash
+   docker logs cortex | grep "SESSIONS object id"
+   ```
+   - Should show same ID across all calls
+   - If IDs differ → module reloading issue
+
+2. Verify single-worker mode:
+   ```bash
+   docker exec cortex cat Dockerfile | grep uvicorn
+   ```
+   - Should have either no `--workers` flag or `--workers 1`
+
+3. Check `/debug/sessions` endpoint:
+   ```bash
+   curl http://localhost:7081/debug/sessions | jq
+   ```
+   - Should show sessions_object_id and current sessions
+
+4. Verify that `__init__.py` exists:
+   ```bash
+   docker exec cortex ls -la intake/__init__.py
+   ```
+
+**Solution (Fixed in v0.5.1):**
+- Ensure `cortex/intake/__init__.py` exists with proper exports
+- Verify `bg_summarize()` is defined at runtime (not only as a `TYPE_CHECKING` stub)
+- Check `/ingest` endpoint doesn't have early return
+- Rebuild and recreate the Cortex container: `docker-compose build cortex && docker-compose up -d cortex`
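+
+The object-ID check in step 1 relies on the fact that mutating a dict in place keeps its identity, while a second copy of the module (or a rebinding) produces a different one. A minimal sketch, with `sessions_fingerprint` as a hypothetical helper name:
+
+```python
+SESSIONS: dict[str, dict] = {}
+
+def sessions_fingerprint() -> dict:
+    """Hypothetical helper: report the identity and size of the SESSIONS dict."""
+    return {"sessions_object_id": id(SESSIONS), "session_count": len(SESSIONS)}
+
+before = sessions_fingerprint()
+SESSIONS["test"] = {"buffer": []}  # mutate in place: same object, new content
+after = sessions_fingerprint()
+
+# Identical IDs mean every caller is looking at the same dict.
+print(before["sessions_object_id"] == after["sessions_object_id"])  # True
+print(after["session_count"])  # 1
+```
+
+If two log lines report different IDs within one process, the module was most likely imported under two names or the dict was reassigned.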
+
+### Problem: LLM Backend Timeout
+
+**Symptoms:**
+- Cortex `/reason` hangs
+- 504 Gateway Timeout errors
+- Logs show "waiting for LLM response"
+
+**Diagnosis Steps:**
+1. Test backend directly:
+   ```bash
+   # llama.cpp
+   curl http://10.0.0.44:8080/health
+
+   # Ollama
+   curl http://10.0.0.3:11434/api/tags
+
+   # OpenAI
+   curl https://api.openai.com/v1/models \
+     -H "Authorization: Bearer $OPENAI_API_KEY"
+   ```
+
+2. Check network connectivity:
+   ```bash
+   docker exec cortex ping -c 3 10.0.0.44
+   ```
+
+3. Review Cortex logs:
+   ```bash
+   docker logs cortex -f | grep "LLM"
+   ```
+
+**Solutions:**
+- Verify backend URL in `.env` is correct and accessible
+- Check firewall rules for backend ports
+- Increase timeout in `cortex/llm/llm_router.py`
+- Switch to a different backend temporarily: `CORTEX_LLM=OPENAI`
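+
+The last two solutions can be combined in code. The sketch below is an assumption about how a per-call timeout with a backend fallback could look; `call_with_timeout` and `fake_llm` are hypothetical names, and the router in `llm_router.py` may not work this way:
+
+```python
+import asyncio
+
+async def call_with_timeout(call, backends, timeout_s: float):
+    """Hypothetical helper: try each backend in order, skipping ones that time out."""
+    for name in backends:
+        try:
+            return name, await asyncio.wait_for(call(name), timeout=timeout_s)
+        except asyncio.TimeoutError:
+            print(f"[LLM] {name} timed out after {timeout_s}s, trying next backend")
+    raise TimeoutError("all backends timed out")
+
+async def fake_llm(name: str) -> str:
+    # Stand-in for a real request: PRIMARY is slow, OPENAI answers quickly.
+    await asyncio.sleep(0.5 if name == "PRIMARY" else 0.01)
+    return f"reply from {name}"
+
+result = asyncio.run(call_with_timeout(fake_llm, ["PRIMARY", "OPENAI"], timeout_s=0.1))
+print(result)  # ('OPENAI', 'reply from OPENAI')
+```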
+
+### Problem: Docker Compose Won't Start
+
+**Symptoms:**
+- `docker-compose up -d` fails
+- Container exits immediately
+- "port already in use" errors
+
+**Diagnosis Steps:**
+1. Check port conflicts:
+   ```bash
+   netstat -tulpn | grep -E '7078|7081|7077|5432'
+   ```
+
+2. Check container logs:
+   ```bash
+   docker-compose logs --tail=50
+   ```
+
+3. Verify environment file:
+   ```bash
+   cat .env | grep -v "^#" | grep -v "^$"
+   ```
+
+**Solutions:**
+- Stop conflicting services: `docker-compose down`
+- Check `.env` syntax (no quotes unless necessary)
+- Rebuild containers: `docker-compose build --no-cache`
+- Check Docker daemon: `systemctl status docker`
+
+---
+
+## Testing Checklist
+
+### After Making Changes to Cortex
+
+**1. Build and restart:**
+```bash
+docker-compose build cortex
+docker-compose up -d cortex  # recreate the container from the new image
+```
+
+**2. Verify service health:**
+```bash
+curl http://localhost:7081/health
+```
+
+**3. Test /ingest endpoint:**
+```bash
+curl -X POST http://localhost:7081/ingest \
+  -H "Content-Type: application/json" \
+  -d '{
+    "session_id": "test",
+    "user_msg": "Hello",
+    "assistant_msg": "Hi there!"
+  }'
+```
+
+**4. Verify SESSIONS updated:**
+```bash
+curl http://localhost:7081/debug/sessions | jq '.sessions.test.buffer_size'
+```
+- Should show 1 (or increment if already populated)
+
+**5. Test summarization:**
+```bash
+curl "http://localhost:7081/debug/summary?session_id=test" | jq '.summary'
+```
+- Should return L1/L5/L10/L20/L30 summaries
+
+**6. Test full pipeline:**
+```bash
+curl -X POST http://localhost:7078/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{
+    "messages": [{"role": "user", "content": "Test message"}],
+    "session_id": "test"
+  }' | jq '.choices[0].message.content'
+```
+
+**7. Check logs for errors:**
+```bash
+docker logs cortex --tail=50
+```
+
+---
+
+## Project History & Context
+
+### Evolution Timeline
+
+**v0.1.x (2025-09-23 to 2025-09-25)**
+- Initial MVP: Relay + Mem0 + Ollama
+- Basic memory storage and retrieval
+- Simple UI with session support
+
+**v0.2.x (2025-09-24 to 2025-09-30)**
+- Migrated to mem0ai SDK
+- Added sessionId support
+- Created standalone Lyra-Mem0 stack
+
+**v0.3.x (2025-09-26 to 2025-10-28)**
+- Forked Mem0 → NVGRAM → NeoMem
+- Added salience filtering
+- Integrated Cortex reasoning VM
+- Built RAG system (Beta Lyrae)
+- Established multi-backend LLM support
+
+**v0.4.x (2025-11-05 to 2025-11-13)**
+- Major architectural rewire
+- Implemented 4-stage reasoning pipeline
+- Added reflection and refinement stages
+- RAG integration
+- LLM router with per-stage backend selection
+
+**Infrastructure v1.0.0 (2025-11-26)**
+- Consolidated 9 `.env` files into single source of truth
+- Multi-backend LLM strategy
+- Docker Compose consolidation
+- Created security templates
+
+**v0.5.0 (2025-11-28)**
+- Fixed all critical API wiring issues
+- Added OpenAI-compatible Relay endpoint
+- Fixed Cortex → Intake integration
+- End-to-end flow verification
+
+**v0.5.1 (2025-12-11) - CURRENT**
+- **Critical fix**: SESSIONS persistence bug
+- Implemented `bg_summarize()` stub
+- Fixed `/ingest` unreachable code
+- Added `cortex/intake/__init__.py`
+- Embedded Intake in Cortex (no longer standalone)
+- Added diagnostic endpoints
+- Lenient error handling
+- Documented single-worker constraint
+
+### Architectural Philosophy
+
+**Modular Design:**
+- Each service has a single, clear responsibility
+- Services communicate via well-defined HTTP APIs
+- Configuration is centralized but allows per-service overrides
+
+**Local-First:**
+- No reliance on external services (except optional OpenAI)
+- All data stored locally (PostgreSQL + Neo4j)
+- Can run entirely air-gapped with local LLMs
+
+**Flexible LLM Backend:**
+- Not tied to any single LLM provider
+- Can mix local and cloud models
+- Per-stage backend selection for optimal performance/cost
+
+**Error Handling:**
+- Lenient mode: Never fail the chat pipeline
+- Log errors but continue processing
+- Graceful degradation
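+
+A minimal sketch of the lenient pattern, assuming a hypothetical `lenient` decorator; the actual endpoints may handle errors inline instead, and may report success even when a step fails:
+
+```python
+import functools
+import logging
+
+logger = logging.getLogger("cortex.sketch")
+
+def lenient(step):
+    """Hypothetical decorator: log any exception and report it instead of raising."""
+    @functools.wraps(step)
+    def wrapper(*args, **kwargs):
+        try:
+            return {"ok": True, "result": step(*args, **kwargs)}
+        except Exception as exc:  # deliberately broad: the pipeline must not fail
+            logger.error("%s failed: %s: %s", step.__name__, type(exc).__name__, exc)
+            return {"ok": False, "error": f"{type(exc).__name__}: {exc}"}
+    return wrapper
+
+@lenient
+def parse_length(payload: dict) -> int:
+    return len(payload["user_msg"])
+
+print(parse_length({"user_msg": "Hello"}))  # {'ok': True, 'result': 5}
+print(parse_length({}))                     # {'ok': False, 'error': "KeyError: 'user_msg'"}
+```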
+
+**Observability:**
+- Diagnostic endpoints for debugging
+- Verbose logging mode
+- Object ID tracking for singleton verification
+
+---
+
+## Known Issues & Limitations
+
+### Fixed in v0.5.1
+- ✅ Intake SESSIONS not persisting → **FIXED**
+- ✅ `bg_summarize()` NameError → **FIXED**
+- ✅ `/ingest` endpoint unreachable code → **FIXED**
+
+### Current Limitations
+
+**1. Single-Worker Constraint**
+- Cortex must run with single Uvicorn worker
+- SESSIONS is in-memory module-level global
+- Multi-worker support requires Redis or shared storage
+- Documented in `cortex/Dockerfile` lines 7-8
+
+**2. NeoMem Integration Incomplete**
+- Relay doesn't yet push to NeoMem after responses
+- Memory storage planned for v0.5.2
+- Currently all memory is short-term (SESSIONS only)
+
+**3. RAG Service Disabled**
+- Beta Lyrae (RAG) commented out in docker-compose.yml
+- Awaiting re-enablement after Intake stabilization
+- Code exists but not currently integrated
+
+**4. Session Management**
+- No session cleanup/expiration
+- SESSIONS grows without bound: each buffer is capped at `maxlen=200`, but the number of sessions is not
+- No session list endpoint in Relay
+
+**5. Persona Integration**
+- `PERSONA_ENABLED=false` in `.env`
+- Persona Sidecar not fully wired
+- Identity loaded but not consistently applied
+
+### Future Enhancements
+
+**Short-term (v0.5.2):**
+- Enable NeoMem integration in Relay
+- Add session cleanup/expiration
+- Session list endpoint
+- NeoMem health monitoring
+
+**Medium-term (v0.6.x):**
+- Re-enable RAG service
+- Migrate SESSIONS to Redis for multi-worker support
+- Add request correlation IDs
+- Comprehensive health checks
+
+**Long-term (v0.7.x+):**
+- Persona Sidecar full integration
+- Autonomous "dream" cycles (self-reflection)
+- Verifier module for factual grounding
+- Advanced RAG with hybrid search
+- Memory consolidation strategies
+
+---
+
+## Troubleshooting Quick Reference
+
+| Problem | Quick Check | Solution |
+|---------|-------------|----------|
+| SESSIONS empty | `curl localhost:7081/debug/sessions` | Rebuild Cortex, verify `__init__.py` exists |
+| LLM timeout | `curl http://10.0.0.44:8080/health` | Check backend connectivity, increase timeout |
+| Port conflict | `netstat -tulpn \| grep 7078` | Stop conflicting service or change port |
+| Container crash | `docker logs cortex` | Check logs for Python errors, verify .env syntax |
+| Missing package | `docker exec cortex pip list` | Rebuild container, check requirements.txt |
+| 502 from Relay | `curl localhost:7081/health` | Verify Cortex is running, check docker network |
+
+---
+
+## API Reference (Quick)
+
+### Relay (Port 7078)
+
+**POST /v1/chat/completions** - OpenAI-compatible chat
+```json
+{
+  "messages": [{"role": "user", "content": "..."}],
+  "session_id": "..."
+}
+```
+
+**GET /_health** - Service health
+
+### Cortex (Port 7081)
+
+**POST /reason** - Main reasoning pipeline
+```json
+{
+  "session_id": "...",
+  "user_prompt": "...",
+  "temperature": 0.7
+}
+```
+`temperature` is optional.
+
+**POST /ingest** - Add exchange to SESSIONS
+```json
+{
+  "session_id": "...",
+  "user_msg": "...",
+  "assistant_msg": "..."
+}
+```
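+
+For scripted checks, the same payload can be prepared from Python with the standard library. `build_ingest_request` is a hypothetical helper and the base URL is the local port used elsewhere in this document; the sketch builds the request without sending it:
+
+```python
+import json
+import urllib.request
+
+def build_ingest_request(base_url: str, session_id: str,
+                         user_msg: str, assistant_msg: str) -> urllib.request.Request:
+    """Hypothetical helper: prepare (but do not send) a POST /ingest request."""
+    payload = {
+        "session_id": session_id,
+        "user_msg": user_msg,
+        "assistant_msg": assistant_msg,
+    }
+    return urllib.request.Request(
+        url=f"{base_url.rstrip('/')}/ingest",
+        data=json.dumps(payload).encode("utf-8"),
+        headers={"Content-Type": "application/json"},
+        method="POST",
+    )
+
+req = build_ingest_request("http://localhost:7081", "test", "Hello", "Hi there!")
+print(req.get_method(), req.full_url)  # POST http://localhost:7081/ingest
+# To actually send it: urllib.request.urlopen(req, timeout=10)
+```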
+
+**GET /debug/sessions** - Inspect SESSIONS state
+
+**GET /debug/summary?session_id=X** - Test summarization
+
+**GET /health** - Service health
+
+### NeoMem (Port 7077)
+
+**POST /memories** - Add memory
+```json
+{
+  "messages": [{"role": "...", "content": "..."}],
+  "user_id": "...",
+  "metadata": {}
+}
+```
+
+**POST /search** - Semantic search
+```json
+{
+  "query": "...",
+  "user_id": "...",
+  "limit": 10
+}
+```
+
+**GET /health** - Service health
+
+---
+
+## File Manifest (Key Files Only)
+
+```
+project-lyra/
+├── .env                           # Root environment variables
+├── docker-compose.yml             # Service definitions (152 lines)
+├── CHANGELOG.md                   # Version history (836 lines)
+├── README.md                      # User documentation (610 lines)
+├── PROJECT_SUMMARY.md             # This file (AI context)
+│
+├── cortex/                        # Reasoning engine
+│   ├── Dockerfile                 # Single-worker constraint documented
+│   ├── requirements.txt
+│   ├── .env                       # Cortex overrides
+│   ├── main.py                    # FastAPI initialization
+│   ├── router.py                  # Routes (306 lines)
+│   ├── context.py                 # Context aggregation
+│   │
+│   ├── intake/                    # Short-term memory (embedded)
+│   │   ├── __init__.py           # Package exports
+│   │   └── intake.py             # Core logic (367 lines)
+│   │
+│   ├── reasoning/                 # Reasoning pipeline
+│   │   ├── reflection.py         # Meta-awareness
+│   │   ├── reasoning.py          # Draft generation
+│   │   └── refine.py             # Refinement
+│   │
+│   ├── persona/                   # Personality layer
+│   │   ├── speak.py              # Persona application
+│   │   └── identity.py           # Persona loader
+│   │
+│   └── llm/                       # LLM integration
+│       └── llm_router.py         # Backend selector
+│
+├── core/relay/                    # Orchestrator
+│   ├── server.js                 # Express server (Node.js)
+│   └── package.json
+│
+├── neomem/                        # Long-term memory
+│   ├── Dockerfile
+│   ├── requirements.txt
+│   ├── .env                       # NeoMem overrides
+│   └── main.py                   # Memory API
+│
+└── rag/                           # RAG system (disabled)
+    ├── rag_api.py
+    ├── rag_chat_import.py
+    └── chromadb/
+```
+
+---
+
+## Final Notes for AI Assistants
+
+### What You Should Know Before Making Changes
+
+1. **SESSIONS is sacred** - It's a module-level global in `cortex/intake/intake.py`. Don't move it, don't duplicate it, don't make it a class attribute. It must remain a singleton.
+
+2. **Single-worker is mandatory** - Until SESSIONS is migrated to Redis, Cortex MUST run with a single Uvicorn worker. Multi-worker will cause SESSIONS to be inconsistent.
+
+3. **Lenient error handling** - The `/ingest` endpoint and other parts of the pipeline use lenient error handling: log errors but always return success. Never fail the chat pipeline.
+
+4. **Backend routing is environment-driven** - Don't hardcode LLM URLs. Use the `{MODULE}_LLM` environment variables and the llm_router.py system.
+
+5. **Intake is embedded** - Don't try to make HTTP calls to Intake. Use direct Python imports: `from intake.intake import ...`
+
+6. **Test with diagnostic endpoints** - Always use `/debug/sessions` and `/debug/summary` to verify SESSIONS behavior after changes.
+
+7. **Follow the changelog format** - When documenting changes, use the chronological format established in CHANGELOG.md v0.5.1. Group by version, then by change type (Fixed, Added, Changed, etc.).
+
+### When You Need Help
+
+- **SESSIONS issues**: Check `cortex/intake/intake.py` lines 11-14 for initialization, lines 325-366 for `add_exchange_internal()`
+- **Routing issues**: Check `cortex/router.py` lines 65-189 for `/reason`, lines 201-233 for `/ingest`
+- **LLM backend issues**: Check `cortex/llm/llm_router.py` for backend selection logic
+- **Environment variables**: Check `.env` lines 13-40 for LLM backends, lines 28-34 for module selection
+
+### Most Important Thing
+
+**This project values reliability over features.** It's better to have a simple, working system than a complex, broken one. When in doubt, keep it simple, log everything, and never fail silently.
+
+---
+
+**End of AI Context Summary**
+
+*This document is maintained to provide complete context for AI assistants working on Project Lyra. Last updated: v0.5.1 (2025-12-11)*
diff --git a/docs/lyra_tree.txt b/docs/lyra_tree.txt
new file mode 100644
index 0000000..f0b6df3
--- /dev/null
+++ b/docs/lyra_tree.txt
@@ -0,0 +1,441 @@
+├── CHANGELOG.md
+├── core
+│   ├── env experiments
+│   ├── persona-sidecar
+│   │   ├── Dockerfile
+│   │   ├── package.json
+│   │   ├── persona-server.js
+│   │   └── personas.json
+│   ├── relay
+│   │   ├── Dockerfile
+│   │   ├── lib
+│   │   │   ├── cortex.js
+│   │   │   └── llm.js
+│   │   ├── package.json
+│   │   ├── package-lock.json
+│   │   ├── server.js
+│   │   ├── sessions
+│   │   │   ├── default.jsonl
+│   │   │   ├── sess-6rxu7eia.json
+│   │   │   ├── sess-6rxu7eia.jsonl
+│   │   │   ├── sess-l08ndm60.json
+│   │   │   └── sess-l08ndm60.jsonl
+│   │   └── test-llm.js
+│   ├── relay-backup
+│   └── ui
+│       ├── index.html
+│       ├── manifest.json
+│       └── style.css
+├── cortex
+│   ├── context.py
+│   ├── Dockerfile
+│   ├── ingest
+│   │   ├── ingest_handler.py
+│   │   ├── __init__.py
+│   │   └── intake_client.py
+│   ├── intake
+│   │   ├── __init__.py
+│   │   ├── intake.py
+│   │   └── logs
+│   ├── llm
+│   │   ├── __init__.py
+│   │   └── llm_router.py
+│   ├── logs
+│   │   ├── cortex_verbose_debug.log
+│   │   └── reflections.log
+│   ├── main.py
+│   ├── neomem_client.py
+│   ├── persona
+│   │   ├── identity.py
+│   │   ├── __init__.py
+│   │   └── speak.py
+│   ├── rag.py
+│   ├── reasoning
+│   │   ├── __init__.py
+│   │   ├── reasoning.py
+│   │   ├── refine.py
+│   │   └── reflection.py
+│   ├── requirements.txt
+│   ├── router.py
+│   ├── tests
+│   └── utils
+│       ├── config.py
+│       ├── __init__.py
+│       ├── log_utils.py
+│       └── schema.py
+├── deprecated.env.txt
+├── DEPRECATED_FILES.md
+├── docker-compose.yml
+├── docs
+│   ├── ARCHITECTURE_v0-6-0.md
+│   ├── ENVIRONMENT_VARIABLES.md
+│   ├── lyra_tree.txt
+│   └── PROJECT_SUMMARY.md
+├── intake-logs
+│   └── summaries.log
+├── neomem
+│   ├── _archive
+│   │   └── old_servers
+│   │       ├── main_backup.py
+│   │       └── main_dev.py
+│   ├── docker-compose.yml
+│   ├── Dockerfile
+│   ├── neomem
+│   │   ├── api
+│   │   ├── client
+│   │   │   ├── __init__.py
+│   │   │   ├── main.py
+│   │   │   ├── project.py
+│   │   │   └── utils.py
+│   │   ├── configs
+│   │   │   ├── base.py
+│   │   │   ├── embeddings
+│   │   │   │   ├── base.py
+│   │   │   │   └── __init__.py
+│   │   │   ├── enums.py
+│   │   │   ├── __init__.py
+│   │   │   ├── llms
+│   │   │   │   ├── anthropic.py
+│   │   │   │   ├── aws_bedrock.py
+│   │   │   │   ├── azure.py
+│   │   │   │   ├── base.py
+│   │   │   │   ├── deepseek.py
+│   │   │   │   ├── __init__.py
+│   │   │   │   ├── lmstudio.py
+│   │   │   │   ├── ollama.py
+│   │   │   │   ├── openai.py
+│   │   │   │   └── vllm.py
+│   │   │   ├── prompts.py
+│   │   │   └── vector_stores
+│   │   │       ├── azure_ai_search.py
+│   │   │       ├── azure_mysql.py
+│   │   │       ├── baidu.py
+│   │   │       ├── chroma.py
+│   │   │       ├── databricks.py
+│   │   │       ├── elasticsearch.py
+│   │   │       ├── faiss.py
+│   │   │       ├── __init__.py
+│   │   │       ├── langchain.py
+│   │   │       ├── milvus.py
+│   │   │       ├── mongodb.py
+│   │   │       ├── neptune.py
+│   │   │       ├── opensearch.py
+│   │   │       ├── pgvector.py
+│   │   │       ├── pinecone.py
+│   │   │       ├── qdrant.py
+│   │   │       ├── redis.py
+│   │   │       ├── s3_vectors.py
+│   │   │       ├── supabase.py
+│   │   │       ├── upstash_vector.py
+│   │   │       ├── valkey.py
+│   │   │       ├── vertex_ai_vector_search.py
+│   │   │       └── weaviate.py
+│   │   ├── core
+│   │   ├── embeddings
+│   │   │   ├── aws_bedrock.py
+│   │   │   ├── azure_openai.py
+│   │   │   ├── base.py
+│   │   │   ├── configs.py
+│   │   │   ├── gemini.py
+│   │   │   ├── huggingface.py
+│   │   │   ├── __init__.py
+│   │   │   ├── langchain.py
+│   │   │   ├── lmstudio.py
+│   │   │   ├── mock.py
+│   │   │   ├── ollama.py
+│   │   │   ├── openai.py
+│   │   │   ├── together.py
+│   │   │   └── vertexai.py
+│   │   ├── exceptions.py
+│   │   ├── graphs
+│   │   │   ├── configs.py
+│   │   │   ├── __init__.py
+│   │   │   ├── neptune
+│   │   │   │   ├── base.py
+│   │   │   │   ├── __init__.py
+│   │   │   │   ├── neptunedb.py
+│   │   │   │   └── neptunegraph.py
+│   │   │   ├── tools.py
+│   │   │   └── utils.py
+│   │   ├── __init__.py
+│   │   ├── LICENSE
+│   │   ├── llms
+│   │   │   ├── anthropic.py
+│   │   │   ├── aws_bedrock.py
+│   │   │   ├── azure_openai.py
+│   │   │   ├── azure_openai_structured.py
+│   │   │   ├── base.py
+│   │   │   ├── configs.py
+│   │   │   ├── deepseek.py
+│   │   │   ├── gemini.py
+│   │   │   ├── groq.py
+│   │   │   ├── __init__.py
+│   │   │   ├── langchain.py
+│   │   │   ├── litellm.py
+│   │   │   ├── lmstudio.py
+│   │   │   ├── ollama.py
+│   │   │   ├── openai.py
+│   │   │   ├── openai_structured.py
+│   │   │   ├── sarvam.py
+│   │   │   ├── together.py
+│   │   │   ├── vllm.py
+│   │   │   └── xai.py
+│   │   ├── memory
+│   │   │   ├── base.py
+│   │   │   ├── graph_memory.py
+│   │   │   ├── __init__.py
+│   │   │   ├── kuzu_memory.py
+│   │   │   ├── main.py
+│   │   │   ├── memgraph_memory.py
+│   │   │   ├── setup.py
+│   │   │   ├── storage.py
+│   │   │   ├── telemetry.py
+│   │   │   └── utils.py
+│   │   ├── proxy
+│   │   │   ├── __init__.py
+│   │   │   └── main.py
+│   │   ├── server
+│   │   │   ├── dev.Dockerfile
+│   │   │   ├── docker-compose.yaml
+│   │   │   ├── Dockerfile
+│   │   │   ├── main_old.py
+│   │   │   ├── main.py
+│   │   │   ├── Makefile
+│   │   │   ├── README.md
+│   │   │   └── requirements.txt
+│   │   ├── storage
+│   │   ├── utils
+│   │   │   └── factory.py
+│   │   └── vector_stores
+│   │       ├── azure_ai_search.py
+│   │       ├── azure_mysql.py
+│   │       ├── baidu.py
+│   │       ├── base.py
+│   │       ├── chroma.py
+│   │       ├── configs.py
+│   │       ├── databricks.py
+│   │       ├── elasticsearch.py
+│   │       ├── faiss.py
+│   │       ├── __init__.py
+│   │       ├── langchain.py
+│   │       ├── milvus.py
+│   │       ├── mongodb.py
+│   │       ├── neptune_analytics.py
+│   │       ├── opensearch.py
+│   │       ├── pgvector.py
+│   │       ├── pinecone.py
+│   │       ├── qdrant.py
+│   │       ├── redis.py
+│   │       ├── s3_vectors.py
+│   │       ├── supabase.py
+│   │       ├── upstash_vector.py
+│   │       ├── valkey.py
+│   │       ├── vertex_ai_vector_search.py
+│   │       └── weaviate.py
+│   ├── neomem_history
+│   │   └── history.db
+│   ├── pyproject.toml
+│   ├── README.md
+│   └── requirements.txt
+├── neomem_history
+│   └── history.db
+├── rag
+│   ├── chatlogs
+│   │   └── lyra
+│   │       ├── 0000_Wire_ROCm_to_Cortex.json
+│   │       ├── 0001_Branch___10_22_ct201branch-ssh_tut.json
+│   │       ├── 0002_cortex_LLMs_11-1-25.json
+│   │       ├── 0003_RAG_beta.json
+│   │       ├── 0005_Cortex_v0_4_0_planning.json
+│   │       ├── 0006_Cortex_v0_4_0_Refinement.json
+│   │       ├── 0009_Branch___Cortex_v0_4_0_planning.json
+│   │       ├── 0012_Cortex_4_-_neomem_11-1-25.json
+│   │       ├── 0016_Memory_consolidation_concept.json
+│   │       ├── 0017_Model_inventory_review.json
+│   │       ├── 0018_Branch___Memory_consolidation_concept.json
+│   │       ├── 0022_Branch___Intake_conversation_summaries.json
+│   │       ├── 0026_Intake_conversation_summaries.json
+│   │       ├── 0027_Trilium_AI_LLM_setup.json
+│   │       ├── 0028_LLMs_and_sycophancy_levels.json
+│   │       ├── 0031_UI_improvement_plan.json
+│   │       ├── 0035_10_27-neomem_update.json
+│   │       ├── 0044_Install_llama_cpp_on_ct201.json
+│   │       ├── 0045_AI_task_assistant.json
+│   │       ├── 0047_Project_scope_creation.json
+│   │       ├── 0052_View_docker_container_logs.json
+│   │       ├── 0053_10_21-Proxmox_fan_control.json
+│   │       ├── 0054_10_21-pytorch_branch_Quant_experiments.json
+│   │       ├── 0055_10_22_ct201branch-ssh_tut.json
+│   │       ├── 0060_Lyra_project_folder_issue.json
+│   │       ├── 0062_Build_pytorch_API.json
+│   │       ├── 0063_PokerBrain_dataset_structure.json
+│   │       ├── 0065_Install_PyTorch_setup.json
+│   │       ├── 0066_ROCm_PyTorch_setup_quirks.json
+│   │       ├── 0067_VM_model_setup_steps.json
+│   │       ├── 0070_Proxmox_disk_error_fix.json
+│   │       ├── 0072_Docker_Compose_vs_Portainer.json
+│   │       ├── 0073_Check_system_temps_Proxmox.json
+│   │       ├── 0075_Cortex_gpu_progress.json
+│   │       ├── 0076_Backup_Proxmox_before_upgrade.json
+│   │       ├── 0077_Storage_cleanup_advice.json
+│   │       ├── 0082_Install_ROCm_on_Proxmox.json
+│   │       ├── 0088_Thalamus_program_summary.json
+│   │       ├── 0094_Cortex_blueprint_development.json
+│   │       ├── 0095_mem0_advancments.json
+│   │       ├── 0096_Embedding_provider_swap.json
+│   │       ├── 0097_Update_git_commit_steps.json
+│   │       ├── 0098_AI_software_description.json
+│   │       ├── 0099_Seed_memory_process.json
+│   │       ├── 0100_Set_up_Git_repo.json
+│   │       ├── 0101_Customize_embedder_setup.json
+│   │       ├── 0102_Seeding_Local_Lyra_memory.json
+│   │       ├── 0103_Mem0_seeding_part_3.json
+│   │       ├── 0104_Memory_build_prompt.json
+│   │       ├── 0105_Git_submodule_setup_guide.json
+│   │       ├── 0106_Serve_UI_on_LAN.json
+│   │       ├── 0107_AI_name_suggestion.json
+│   │       ├── 0108_Room_X_planning_update.json
+│   │       ├── 0109_Salience_filtering_design.json
+│   │       ├── 0110_RoomX_Cortex_build.json
+│   │       ├── 0119_Explain_Lyra_cortex_idea.json
+│   │       ├── 0120_Git_submodule_organization.json
+│   │       ├── 0121_Web_UI_fix_guide.json
+│   │       ├── 0122_UI_development_planning.json
+│   │       ├── 0123_NVGRAM_debugging_steps.json
+│   │       ├── 0124_NVGRAM_setup_troubleshooting.json
+│   │       ├── 0125_NVGRAM_development_update.json
+│   │       ├── 0126_RX_-_NeVGRAM_New_Features.json
+│   │       ├── 0127_Error_troubleshooting_steps.json
+│   │       ├── 0135_Proxmox_backup_with_ABB.json
+│   │       ├── 0151_Auto-start_Lyra-Core_VM.json
+│   │       ├── 0156_AI_GPU_benchmarks_comparison.json
+│   │       └── 0251_Lyra_project_handoff.json
+│   ├── chromadb
+│   │   ├── c4f701ee-1978-44a1-9df4-3e865b5d33c1
+│   │   │   ├── data_level0.bin
+│   │   │   ├── header.bin
+│   │   │   ├── index_metadata.pickle
+│   │   │   ├── length.bin
+│   │   │   └── link_lists.bin
+│   │   └── chroma.sqlite3
+│   ├── import.log
+│   ├── lyra-chatlogs
+│   │   ├── 0000_Wire_ROCm_to_Cortex.json
+│   │   ├── 0001_Branch___10_22_ct201branch-ssh_tut.json
+│   │   ├── 0002_cortex_LLMs_11-1-25.json
+│   │   └── 0003_RAG_beta.json
+│   ├── rag_api.py
+│   ├── rag_build.py
+│   ├── rag_chat_import.py
+│   └── rag_query.py
+├── README.md
+└── volumes
+    ├── neo4j_data
+    │   ├── databases
+    │   │   ├── neo4j
+    │   │   │   ├── database_lock
+    │   │   │   ├── id-buffer.tmp.0
+    │   │   │   ├── neostore
+    │   │   │   ├── neostore.counts.db
+    │   │   │   ├── neostore.indexstats.db
+    │   │   │   ├── neostore.labeltokenstore.db
+    │   │   │   ├── neostore.labeltokenstore.db.id
+    │   │   │   ├── neostore.labeltokenstore.db.names
+    │   │   │   ├── neostore.labeltokenstore.db.names.id
+    │   │   │   ├── neostore.nodestore.db
+    │   │   │   ├── neostore.nodestore.db.id
+    │   │   │   ├── neostore.nodestore.db.labels
+    │   │   │   ├── neostore.nodestore.db.labels.id
+    │   │   │   ├── neostore.propertystore.db
+    │   │   │   ├── neostore.propertystore.db.arrays
+    │   │   │   ├── neostore.propertystore.db.arrays.id
+    │   │   │   ├── neostore.propertystore.db.id
+    │   │   │   ├── neostore.propertystore.db.index
+    │   │   │   ├── neostore.propertystore.db.index.id
+    │   │   │   ├── neostore.propertystore.db.index.keys
+    │   │   │   ├── neostore.propertystore.db.index.keys.id
+    │   │   │   ├── neostore.propertystore.db.strings
+    │   │   │   ├── neostore.propertystore.db.strings.id
+    │   │   │   ├── neostore.relationshipgroupstore.db
+    │   │   │   ├── neostore.relationshipgroupstore.db.id
+    │   │   │   ├── neostore.relationshipgroupstore.degrees.db
+    │   │   │   ├── neostore.relationshipstore.db
+    │   │   │   ├── neostore.relationshipstore.db.id
+    │   │   │   ├── neostore.relationshiptypestore.db
+    │   │   │   ├── neostore.relationshiptypestore.db.id
+    │   │   │   ├── neostore.relationshiptypestore.db.names
+    │   │   │   ├── neostore.relationshiptypestore.db.names.id
+    │   │   │   ├── neostore.schemastore.db
+    │   │   │   ├── neostore.schemastore.db.id
+    │   │   │   └── schema
+    │   │   │       └── index
+    │   │   │           └── token-lookup-1.0
+    │   │   │               ├── 1
+    │   │   │               │   └── index-1
+    │   │   │               └── 2
+    │   │   │                   └── index-2
+    │   │   ├── store_lock
+    │   │   └── system
+    │   │       ├── database_lock
+    │   │       ├── id-buffer.tmp.0
+    │   │       ├── neostore
+    │   │       ├── neostore.counts.db
+    │   │       ├── neostore.indexstats.db
+    │   │       ├── neostore.labeltokenstore.db
+    │   │       ├── neostore.labeltokenstore.db.id
+    │   │       ├── neostore.labeltokenstore.db.names
+    │   │       ├── neostore.labeltokenstore.db.names.id
+    │   │       ├── neostore.nodestore.db
+    │   │       ├── neostore.nodestore.db.id
+    │   │       ├── neostore.nodestore.db.labels
+    │   │       ├── neostore.nodestore.db.labels.id
+    │   │       ├── neostore.propertystore.db
+    │   │       ├── neostore.propertystore.db.arrays
+    │   │       ├── neostore.propertystore.db.arrays.id
+    │   │       ├── neostore.propertystore.db.id
+    │   │       ├── neostore.propertystore.db.index
+    │   │       ├── neostore.propertystore.db.index.id
+    │   │       ├── neostore.propertystore.db.index.keys
+    │   │       ├── neostore.propertystore.db.index.keys.id
+    │   │       ├── neostore.propertystore.db.strings
+    │   │       ├── neostore.propertystore.db.strings.id
+    │   │       ├── neostore.relationshipgroupstore.db
+    │   │       ├── neostore.relationshipgroupstore.db.id
+    │   │       ├── neostore.relationshipgroupstore.degrees.db
+    │   │       ├── neostore.relationshipstore.db
+    │   │       ├── neostore.relationshipstore.db.id
+    │   │       ├── neostore.relationshiptypestore.db
+    │   │       ├── neostore.relationshiptypestore.db.id
+    │   │       ├── neostore.relationshiptypestore.db.names
+    │   │       ├── neostore.relationshiptypestore.db.names.id
+    │   │       ├── neostore.schemastore.db
+    │   │       ├── neostore.schemastore.db.id
+    │   │       └── schema
+    │   │           └── index
+    │   │               ├── range-1.0
+    │   │               │   ├── 3
+    │   │               │   │   └── index-3
+    │   │               │   ├── 4
+    │   │               │   │   └── index-4
+    │   │               │   ├── 7
+    │   │               │   │   └── index-7
+    │   │               │   ├── 8
+    │   │               │   │   └── index-8
+    │   │               │   └── 9
+    │   │               │       └── index-9
+    │   │               └── token-lookup-1.0
+    │   │                   ├── 1
+    │   │                   │   └── index-1
+    │   │                   └── 2
+    │   │                       └── index-2
+    │   ├── dbms
+    │   │   └── auth.ini
+    │   ├── server_id
+    │   └── transactions
+    │       ├── neo4j
+    │       │   ├── checkpoint.0
+    │       │   └── neostore.transaction.db.0
+    │       └── system
+    │           ├── checkpoint.0
+    │           └── neostore.transaction.db.0
+    └── postgres_data  [error opening dir]
\ No newline at end of file
diff --git a/lyra_tree.txt b/lyra_tree.txt
deleted file mode 100644
index 289c8b6..0000000
--- a/lyra_tree.txt
+++ /dev/null
@@ -1,460 +0,0 @@
-/home/serversdown/project-lyra
-├── CHANGELOG.md
-├── core
-│   ├── backups
-│   │   ├── mem0_20250927_221040.sql
-│   │   └── mem0_history_20250927_220925.tgz
-│   ├── docker-compose.yml
-│   ├── .env
-│   ├── env experiments
-│   │   ├── .env
-│   │   ├── .env.local
-│   │   └── .env.openai
-│   ├── persona-sidecar
-│   │   ├── Dockerfile
-│   │   ├── package.json
-│   │   ├── persona-server.js
-│   │   └── personas.json
-│   ├── PROJECT_SUMMARY.md
-│   ├── relay
-│   │   ├── Dockerfile
-│   │   ├── .dockerignore
-│   │   ├── lib
-│   │   │   ├── cortex.js
-│   │   │   └── llm.js
-│   │   ├── package.json
-│   │   ├── package-lock.json
-│   │   ├── server.js
-│   │   ├── sessions
-│   │   │   ├── sess-6rxu7eia.json
-│   │   │   ├── sess-6rxu7eia.jsonl
-│   │   │   ├── sess-l08ndm60.json
-│   │   │   └── sess-l08ndm60.jsonl
-│   │   └── test-llm.js
-│   └── ui
-│       ├── index.html
-│       ├── manifest.json
-│       └── style.css
-├── cortex
-│   ├── Dockerfile
-│   ├── .env
-│   ├── ingest
-│   │   ├── ingest_handler.py
-│   │   └── intake_client.py
-│   ├── llm
-│   │   ├── llm_router.py
-│   │   └── resolve_llm_url.py
-│   ├── logs
-│   │   └── reflections.log
-│   ├── main.py
-│   ├── neomem_client.py
-│   ├── persona
-│   │   └── speak.py
-│   ├── rag.py
-│   ├── reasoning
-│   │   ├── reasoning.py
-│   │   ├── refine.py
-│   │   └── reflection.py
-│   ├── requirements.txt
-│   ├── router.py
-│   ├── tests
-│   └── utils
-│       ├── config.py
-│       ├── log_utils.py
-│       └── schema.py
-├── deprecated.env.txt
-├── docker-compose.yml
-├── .env
-├── .gitignore
-├── intake
-│   ├── Dockerfile
-│   ├── .env
-│   ├── intake.py
-│   ├── logs
-│   ├── requirements.txt
-│   └── venv
-│       ├── bin
-│       │   ├── python -> python3
-│       │   ├── python3 -> /usr/bin/python3
-│       │   └── python3.10 -> python3
-│       ├── include
-│       ├── lib
-│       │   └── python3.10
-│       │       └── site-packages
-│       ├── lib64 -> lib
-│       └── pyvenv.cfg
-├── intake-logs
-│   └── summaries.log
-├── lyra_tree.txt
-├── neomem
-│   ├── _archive
-│   │   └── old_servers
-│   │       ├── main_backup.py
-│   │       └── main_dev.py
-│   ├── docker-compose.yml
-│   ├── Dockerfile
-│   ├── .env
-│   ├── .gitignore
-│   ├── neomem
-│   │   ├── api
-│   │   ├── client
-│   │   │   ├── __init__.py
-│   │   │   ├── main.py
-│   │   │   ├── project.py
-│   │   │   └── utils.py
-│   │   ├── configs
-│   │   │   ├── base.py
-│   │   │   ├── embeddings
-│   │   │   │   ├── base.py
-│   │   │   │   └── __init__.py
-│   │   │   ├── enums.py
-│   │   │   ├── __init__.py
-│   │   │   ├── llms
-│   │   │   │   ├── anthropic.py
-│   │   │   │   ├── aws_bedrock.py
-│   │   │   │   ├── azure.py
-│   │   │   │   ├── base.py
-│   │   │   │   ├── deepseek.py
-│   │   │   │   ├── __init__.py
-│   │   │   │   ├── lmstudio.py
-│   │   │   │   ├── ollama.py
-│   │   │   │   ├── openai.py
-│   │   │   │   └── vllm.py
-│   │   │   ├── prompts.py
-│   │   │   └── vector_stores
-│   │   │       ├── azure_ai_search.py
-│   │   │       ├── azure_mysql.py
-│   │   │       ├── baidu.py
-│   │   │       ├── chroma.py
-│   │   │       ├── databricks.py
-│   │   │       ├── elasticsearch.py
-│   │   │       ├── faiss.py
-│   │   │       ├── __init__.py
-│   │   │       ├── langchain.py
-│   │   │       ├── milvus.py
-│   │   │       ├── mongodb.py
-│   │   │       ├── neptune.py
-│   │   │       ├── opensearch.py
-│   │   │       ├── pgvector.py
-│   │   │       ├── pinecone.py
-│   │   │       ├── qdrant.py
-│   │   │       ├── redis.py
-│   │   │       ├── s3_vectors.py
-│   │   │       ├── supabase.py
-│   │   │       ├── upstash_vector.py
-│   │   │       ├── valkey.py
-│   │   │       ├── vertex_ai_vector_search.py
-│   │   │       └── weaviate.py
-│   │   ├── core
-│   │   ├── embeddings
-│   │   │   ├── aws_bedrock.py
-│   │   │   ├── azure_openai.py
-│   │   │   ├── base.py
-│   │   │   ├── configs.py
-│   │   │   ├── gemini.py
-│   │   │   ├── huggingface.py
-│   │   │   ├── __init__.py
-│   │   │   ├── langchain.py
-│   │   │   ├── lmstudio.py
-│   │   │   ├── mock.py
-│   │   │   ├── ollama.py
-│   │   │   ├── openai.py
-│   │   │   ├── together.py
-│   │   │   └── vertexai.py
-│   │   ├── exceptions.py
-│   │   ├── graphs
-│   │   │   ├── configs.py
-│   │   │   ├── __init__.py
-│   │   │   ├── neptune
-│   │   │   │   ├── base.py
-│   │   │   │   ├── __init__.py
-│   │   │   │   ├── neptunedb.py
-│   │   │   │   └── neptunegraph.py
-│   │   │   ├── tools.py
-│   │   │   └── utils.py
-│   │   ├── __init__.py
-│   │   ├── LICENSE
-│   │   ├── llms
-│   │   │   ├── anthropic.py
-│   │   │   ├── aws_bedrock.py
-│   │   │   ├── azure_openai.py
-│   │   │   ├── azure_openai_structured.py
-│   │   │   ├── base.py
-│   │   │   ├── configs.py
-│   │   │   ├── deepseek.py
-│   │   │   ├── gemini.py
-│   │   │   ├── groq.py
-│   │   │   ├── __init__.py
-│   │   │   ├── langchain.py
-│   │   │   ├── litellm.py
-│   │   │   ├── lmstudio.py
-│   │   │   ├── ollama.py
-│   │   │   ├── openai.py
-│   │   │   ├── openai_structured.py
-│   │   │   ├── sarvam.py
-│   │   │   ├── together.py
-│   │   │   ├── vllm.py
-│   │   │   └── xai.py
-│   │   ├── memory
-│   │   │   ├── base.py
-│   │   │   ├── graph_memory.py
-│   │   │   ├── __init__.py
-│   │   │   ├── kuzu_memory.py
-│   │   │   ├── main.py
-│   │   │   ├── memgraph_memory.py
-│   │   │   ├── setup.py
-│   │   │   ├── storage.py
-│   │   │   ├── telemetry.py
-│   │   │   └── utils.py
-│   │   ├── proxy
-│   │   │   ├── __init__.py
-│   │   │   └── main.py
-│   │   ├── server
-│   │   │   ├── dev.Dockerfile
-│   │   │   ├── docker-compose.yaml
-│   │   │   ├── Dockerfile
-│   │   │   ├── main_old.py
-│   │   │   ├── main.py
-│   │   │   ├── Makefile
-│   │   │   ├── README.md
-│   │   │   └── requirements.txt
-│   │   ├── storage
-│   │   ├── utils
-│   │   │   └── factory.py
-│   │   └── vector_stores
-│   │       ├── azure_ai_search.py
-│   │       ├── azure_mysql.py
-│   │       ├── baidu.py
-│   │       ├── base.py
-│   │       ├── chroma.py
-│   │       ├── configs.py
-│   │       ├── databricks.py
-│   │       ├── elasticsearch.py
-│   │       ├── faiss.py
-│   │       ├── __init__.py
-│   │       ├── langchain.py
-│   │       ├── milvus.py
-│   │       ├── mongodb.py
-│   │       ├── neptune_analytics.py
-│   │       ├── opensearch.py
-│   │       ├── pgvector.py
-│   │       ├── pinecone.py
-│   │       ├── qdrant.py
-│   │       ├── redis.py
-│   │       ├── s3_vectors.py
-│   │       ├── supabase.py
-│   │       ├── upstash_vector.py
-│   │       ├── valkey.py
-│   │       ├── vertex_ai_vector_search.py
-│   │       └── weaviate.py
-│   ├── neomem_history
-│   │   └── history.db
-│   ├── pyproject.toml
-│   ├── README.md
-│   └── requirements.txt
-├── neomem_history
-│   └── history.db
-├── rag
-│   ├── chatlogs
-│   │   └── lyra
-│   │       ├── 0000_Wire_ROCm_to_Cortex.json
-│   │       ├── 0001_Branch___10_22_ct201branch-ssh_tut.json
-│   │       ├── 0002_cortex_LLMs_11-1-25.json
-│   │       ├── 0003_RAG_beta.json
-│   │       ├── 0005_Cortex_v0_4_0_planning.json
-│   │       ├── 0006_Cortex_v0_4_0_Refinement.json
-│   │       ├── 0009_Branch___Cortex_v0_4_0_planning.json
-│   │       ├── 0012_Cortex_4_-_neomem_11-1-25.json
-│   │       ├── 0016_Memory_consolidation_concept.json
-│   │       ├── 0017_Model_inventory_review.json
-│   │       ├── 0018_Branch___Memory_consolidation_concept.json
-│   │       ├── 0022_Branch___Intake_conversation_summaries.json
-│   │       ├── 0026_Intake_conversation_summaries.json
-│   │       ├── 0027_Trilium_AI_LLM_setup.json
-│   │       ├── 0028_LLMs_and_sycophancy_levels.json
-│   │       ├── 0031_UI_improvement_plan.json
-│   │       ├── 0035_10_27-neomem_update.json
-│   │       ├── 0044_Install_llama_cpp_on_ct201.json
-│   │       ├── 0045_AI_task_assistant.json
-│   │       ├── 0047_Project_scope_creation.json
-│   │       ├── 0052_View_docker_container_logs.json
-│   │       ├── 0053_10_21-Proxmox_fan_control.json
-│   │       ├── 0054_10_21-pytorch_branch_Quant_experiments.json
-│   │       ├── 0055_10_22_ct201branch-ssh_tut.json
-│   │       ├── 0060_Lyra_project_folder_issue.json
-│   │       ├── 0062_Build_pytorch_API.json
-│   │       ├── 0063_PokerBrain_dataset_structure.json
-│   │       ├── 0065_Install_PyTorch_setup.json
-│   │       ├── 0066_ROCm_PyTorch_setup_quirks.json
-│   │       ├── 0067_VM_model_setup_steps.json
-│   │       ├── 0070_Proxmox_disk_error_fix.json
-│   │       ├── 0072_Docker_Compose_vs_Portainer.json
-│   │       ├── 0073_Check_system_temps_Proxmox.json
-│   │       ├── 0075_Cortex_gpu_progress.json
-│   │       ├── 0076_Backup_Proxmox_before_upgrade.json
-│   │       ├── 0077_Storage_cleanup_advice.json
-│   │       ├── 0082_Install_ROCm_on_Proxmox.json
-│   │       ├── 0088_Thalamus_program_summary.json
-│   │       ├── 0094_Cortex_blueprint_development.json
-│   │       ├── 0095_mem0_advancments.json
-│   │       ├── 0096_Embedding_provider_swap.json
-│   │       ├── 0097_Update_git_commit_steps.json
-│   │       ├── 0098_AI_software_description.json
-│   │       ├── 0099_Seed_memory_process.json
-│   │       ├── 0100_Set_up_Git_repo.json
-│   │       ├── 0101_Customize_embedder_setup.json
-│   │       ├── 0102_Seeding_Local_Lyra_memory.json
-│   │       ├── 0103_Mem0_seeding_part_3.json
-│   │       ├── 0104_Memory_build_prompt.json
-│   │       ├── 0105_Git_submodule_setup_guide.json
-│   │       ├── 0106_Serve_UI_on_LAN.json
-│   │       ├── 0107_AI_name_suggestion.json
-│   │       ├── 0108_Room_X_planning_update.json
-│   │       ├── 0109_Salience_filtering_design.json
-│   │       ├── 0110_RoomX_Cortex_build.json
-│   │       ├── 0119_Explain_Lyra_cortex_idea.json
-│   │       ├── 0120_Git_submodule_organization.json
-│   │       ├── 0121_Web_UI_fix_guide.json
-│   │       ├── 0122_UI_development_planning.json
-│   │       ├── 0123_NVGRAM_debugging_steps.json
-│   │       ├── 0124_NVGRAM_setup_troubleshooting.json
-│   │       ├── 0125_NVGRAM_development_update.json
-│   │       ├── 0126_RX_-_NeVGRAM_New_Features.json
-│   │       ├── 0127_Error_troubleshooting_steps.json
-│   │       ├── 0135_Proxmox_backup_with_ABB.json
-│   │       ├── 0151_Auto-start_Lyra-Core_VM.json
-│   │       ├── 0156_AI_GPU_benchmarks_comparison.json
-│   │       └── 0251_Lyra_project_handoff.json
-│   ├── chromadb
-│   │   ├── c4f701ee-1978-44a1-9df4-3e865b5d33c1
-│   │   │   ├── data_level0.bin
-│   │   │   ├── header.bin
-│   │   │   ├── index_metadata.pickle
-│   │   │   ├── length.bin
-│   │   │   └── link_lists.bin
-│   │   └── chroma.sqlite3
-│   ├── .env
-│   ├── import.log
-│   ├── lyra-chatlogs
-│   │   ├── 0000_Wire_ROCm_to_Cortex.json
-│   │   ├── 0001_Branch___10_22_ct201branch-ssh_tut.json
-│   │   ├── 0002_cortex_LLMs_11-1-25.json
-│   │   └── 0003_RAG_beta.json
-│   ├── rag_api.py
-│   ├── rag_build.py
-│   ├── rag_chat_import.py
-│   └── rag_query.py
-├── README.md
-├── vllm-mi50.md
-└── volumes
-    ├── neo4j_data
-    │   ├── databases
-    │   │   ├── neo4j
-    │   │   │   ├── database_lock
-    │   │   │   ├── id-buffer.tmp.0
-    │   │   │   ├── neostore
-    │   │   │   ├── neostore.counts.db
-    │   │   │   ├── neostore.indexstats.db
-    │   │   │   ├── neostore.labeltokenstore.db
-    │   │   │   ├── neostore.labeltokenstore.db.id
-    │   │   │   ├── neostore.labeltokenstore.db.names
-    │   │   │   ├── neostore.labeltokenstore.db.names.id
-    │   │   │   ├── neostore.nodestore.db
-    │   │   │   ├── neostore.nodestore.db.id
-    │   │   │   ├── neostore.nodestore.db.labels
-    │   │   │   ├── neostore.nodestore.db.labels.id
-    │   │   │   ├── neostore.propertystore.db
-    │   │   │   ├── neostore.propertystore.db.arrays
-    │   │   │   ├── neostore.propertystore.db.arrays.id
-    │   │   │   ├── neostore.propertystore.db.id
-    │   │   │   ├── neostore.propertystore.db.index
-    │   │   │   ├── neostore.propertystore.db.index.id
-    │   │   │   ├── neostore.propertystore.db.index.keys
-    │   │   │   ├── neostore.propertystore.db.index.keys.id
-    │   │   │   ├── neostore.propertystore.db.strings
-    │   │   │   ├── neostore.propertystore.db.strings.id
-    │   │   │   ├── neostore.relationshipgroupstore.db
-    │   │   │   ├── neostore.relationshipgroupstore.db.id
-    │   │   │   ├── neostore.relationshipgroupstore.degrees.db
-    │   │   │   ├── neostore.relationshipstore.db
-    │   │   │   ├── neostore.relationshipstore.db.id
-    │   │   │   ├── neostore.relationshiptypestore.db
-    │   │   │   ├── neostore.relationshiptypestore.db.id
-    │   │   │   ├── neostore.relationshiptypestore.db.names
-    │   │   │   ├── neostore.relationshiptypestore.db.names.id
-    │   │   │   ├── neostore.schemastore.db
-    │   │   │   ├── neostore.schemastore.db.id
-    │   │   │   └── schema
-    │   │   │       └── index
-    │   │   │           └── token-lookup-1.0
-    │   │   │               ├── 1
-    │   │   │               │   └── index-1
-    │   │   │               └── 2
-    │   │   │                   └── index-2
-    │   │   ├── store_lock
-    │   │   └── system
-    │   │       ├── database_lock
-    │   │       ├── id-buffer.tmp.0
-    │   │       ├── neostore
-    │   │       ├── neostore.counts.db
-    │   │       ├── neostore.indexstats.db
-    │   │       ├── neostore.labeltokenstore.db
-    │   │       ├── neostore.labeltokenstore.db.id
-    │   │       ├── neostore.labeltokenstore.db.names
-    │   │       ├── neostore.labeltokenstore.db.names.id
-    │   │       ├── neostore.nodestore.db
-    │   │       ├── neostore.nodestore.db.id
-    │   │       ├── neostore.nodestore.db.labels
-    │   │       ├── neostore.nodestore.db.labels.id
-    │   │       ├── neostore.propertystore.db
-    │   │       ├── neostore.propertystore.db.arrays
-    │   │       ├── neostore.propertystore.db.arrays.id
-    │   │       ├── neostore.propertystore.db.id
-    │   │       ├── neostore.propertystore.db.index
-    │   │       ├── neostore.propertystore.db.index.id
-    │   │       ├── neostore.propertystore.db.index.keys
-    │   │       ├── neostore.propertystore.db.index.keys.id
-    │   │       ├── neostore.propertystore.db.strings
-    │   │       ├── neostore.propertystore.db.strings.id
-    │   │       ├── neostore.relationshipgroupstore.db
-    │   │       ├── neostore.relationshipgroupstore.db.id
-    │   │       ├── neostore.relationshipgroupstore.degrees.db
-    │   │       ├── neostore.relationshipstore.db
-    │   │       ├── neostore.relationshipstore.db.id
-    │   │       ├── neostore.relationshiptypestore.db
-    │   │       ├── neostore.relationshiptypestore.db.id
-    │   │       ├── neostore.relationshiptypestore.db.names
-    │   │       ├── neostore.relationshiptypestore.db.names.id
-    │   │       ├── neostore.schemastore.db
-    │   │       ├── neostore.schemastore.db.id
-    │   │       └── schema
-    │   │           └── index
-    │   │               ├── range-1.0
-    │   │               │   ├── 3
-    │   │               │   │   └── index-3
-    │   │               │   ├── 4
-    │   │               │   │   └── index-4
-    │   │               │   ├── 7
-    │   │               │   │   └── index-7
-    │   │               │   ├── 8
-    │   │               │   │   └── index-8
-    │   │               │   └── 9
-    │   │               │       └── index-9
-    │   │               └── token-lookup-1.0
-    │   │                   ├── 1
-    │   │                   │   └── index-1
-    │   │                   └── 2
-    │   │                       └── index-2
-    │   ├── dbms
-    │   │   └── auth.ini
-    │   ├── server_id
-    │   └── transactions
-    │       ├── neo4j
-    │       │   ├── checkpoint.0
-    │       │   └── neostore.transaction.db.0
-    │       └── system
-    │           ├── checkpoint.0
-    │           └── neostore.transaction.db.0
-    └── postgres_data  [error opening dir]
-
-81 directories, 376 files