autonomy, initial scaffold

2025-12-11 13:12:44 -05:00
parent d5d7ea3469
commit 30f6c1a3da
9 changed files with 0 additions and 0 deletions
--- a/docs/ARCHITECTURE_v0-6-0.md
+++ b/docs/ARCHITECTURE_v0-6-0.md
--- a/docs/ENVIRONMENT_VARIABLES.md
+++ b/docs/ENVIRONMENT_VARIABLES.md
@@ -0,0 +1,250 @@
+# Environment Variables Reference
+
+This document describes all environment variables used across Project Lyra services.
+
+## Quick Start
+
+1. Copy environment templates:
+   ```bash
+   cp .env.example .env
+   cp cortex/.env.example cortex/.env
+   cp neomem/.env.example neomem/.env
+   cp intake/.env.example intake/.env
+   ```
+
+2. Edit `.env` and add your credentials:
+   - `OPENAI_API_KEY`: Your OpenAI API key
+   - `POSTGRES_PASSWORD`: Database password
+   - `NEO4J_PASSWORD`: Graph database password
+   - `NEOMEM_API_KEY`: Generate a secure token
+
+3. Update service URLs if your infrastructure differs from defaults
+
+## File Structure
+
+### Root `.env` - Shared Infrastructure
+Contains all shared configuration used by multiple services:
+- LLM backend options (PRIMARY, SECONDARY, CLOUD, FALLBACK)
+- Database credentials (Postgres, Neo4j)
+- API keys (OpenAI)
+- Internal service URLs
+- Feature flags
+
+### Service-Specific `.env` Files
+Each service has minimal overrides for service-specific parameters:
+- **`cortex/.env`**: Cortex operational parameters
+- **`neomem/.env`**: NeoMem LLM naming convention mappings
+- **`intake/.env`**: Intake summarization parameters
+
+## Environment Loading Order
+
+Docker Compose loads environment files in this order (later overrides earlier):
+1. Root `.env`
+2. Service-specific `.env` (e.g., `cortex/.env`)
+
+This means service-specific files can override root values when needed.
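+
+As a rough illustration of that precedence, the fragment below sets the same variable in both files. The override value `0.3` is hypothetical and is not a documented default. The result also assumes that `docker-compose.yml` lists `cortex/.env` so that it takes precedence over the root file.
+
+```bash
+# Root .env (shared default)
+LLM_TEMPERATURE=0.7
+
+# cortex/.env (hypothetical override, for illustration only)
+LLM_TEMPERATURE=0.3
+
+# Value expected inside the cortex container under the assumption above:
+#   LLM_TEMPERATURE=0.3
+```
+
+Other services that do not define `LLM_TEMPERATURE` in their own file would keep the root value.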
+
+## Global Variables (Root `.env`)
+
+### Global Configuration
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `LOCAL_TZ_LABEL` | `America/New_York` | Timezone for logs and timestamps |
+| `DEFAULT_SESSION_ID` | `default` | Default chat session identifier |
+
+### LLM Backend Options
+Each service chooses which backend to use from these available options.
+
+#### Primary Backend (vLLM on MI50 GPU)
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `LLM_PRIMARY_PROVIDER` | `vllm` | Provider type |
+| `LLM_PRIMARY_URL` | `http://10.0.0.43:8000` | vLLM server endpoint |
+| `LLM_PRIMARY_MODEL` | `/model` | Model path for vLLM |
+
+#### Secondary Backend (Ollama on 3090 GPU)
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `LLM_SECONDARY_PROVIDER` | `ollama` | Provider type |
+| `LLM_SECONDARY_URL` | `http://10.0.0.3:11434` | Ollama server endpoint |
+| `LLM_SECONDARY_MODEL` | `qwen2.5:7b-instruct-q4_K_M` | Ollama model name |
+
+#### Cloud Backend (OpenAI)
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `LLM_CLOUD_PROVIDER` | `openai_chat` | Provider type |
+| `LLM_CLOUD_URL` | `https://api.openai.com/v1` | OpenAI API endpoint |
+| `LLM_CLOUD_MODEL` | `gpt-4o-mini` | OpenAI model to use |
+| `OPENAI_API_KEY` | *required* | OpenAI API authentication key |
+
+#### Fallback Backend (llama.cpp/LM Studio)
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `LLM_FALLBACK_PROVIDER` | `openai_completions` | Provider type (llama.cpp mimics OpenAI) |
+| `LLM_FALLBACK_URL` | `http://10.0.0.41:11435` | Fallback server endpoint |
+| `LLM_FALLBACK_MODEL` | `llama-3.2-8b-instruct` | Fallback model name |
+
+#### LLM Global Settings
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `LLM_TEMPERATURE` | `0.7` | Sampling temperature (0.0-2.0) |
+
+### Database Configuration
+
+#### PostgreSQL (with pgvector)
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `POSTGRES_USER` | `neomem` | PostgreSQL username |
+| `POSTGRES_PASSWORD` | *required* | PostgreSQL password |
+| `POSTGRES_DB` | `neomem` | Database name |
+| `POSTGRES_HOST` | `neomem-postgres` | Container name/hostname |
+| `POSTGRES_PORT` | `5432` | PostgreSQL port |
+
+#### Neo4j Graph Database
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `NEO4J_URI` | `bolt://neomem-neo4j:7687` | Neo4j connection URI |
+| `NEO4J_USERNAME` | `neo4j` | Neo4j username |
+| `NEO4J_PASSWORD` | *required* | Neo4j password |
+| `NEO4J_AUTH` | `neo4j/<password>` | Neo4j auth string |
+
+### Memory Services (NeoMem)
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `NEOMEM_API` | `http://neomem-api:7077` | NeoMem API endpoint |
+| `NEOMEM_API_KEY` | *required* | NeoMem API authentication token |
+| `NEOMEM_HISTORY_DB` | `postgresql://...` | PostgreSQL connection string for history |
+| `EMBEDDER_PROVIDER` | `openai` | Embedding provider (used by NeoMem) |
+| `EMBEDDER_MODEL` | `text-embedding-3-small` | Embedding model name |
+
+### Internal Service URLs
+These defaults use Docker container names as hostnames for communication over the internal Docker network:
+
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `INTAKE_API_URL` | `http://intake:7080` | Intake summarizer service |
+| `CORTEX_API` | `http://cortex:7081` | Cortex reasoning service |
+| `CORTEX_URL` | `http://cortex:7081/reflect` | Cortex reflection endpoint |
+| `CORTEX_URL_INGEST` | `http://cortex:7081/ingest` | Cortex ingest endpoint |
+| `RAG_API_URL` | `http://rag:7090` | RAG service (if enabled) |
+| `RELAY_URL` | `http://relay:7078` | Relay orchestration service |
+| `PERSONA_URL` | `http://persona-sidecar:7080/current` | Persona service (optional) |
+
+### Feature Flags
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `CORTEX_ENABLED` | `true` | Enable Cortex autonomous reflection |
+| `MEMORY_ENABLED` | `true` | Enable NeoMem long-term memory |
+| `PERSONA_ENABLED` | `false` | Enable persona sidecar |
+| `DEBUG_PROMPT` | `true` | Enable debug logging for prompts |
+
+## Service-Specific Variables
+
+### Cortex (`cortex/.env`)
+Cortex operational parameters:
+
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `CORTEX_MODE` | `autonomous` | Operation mode (autonomous/manual) |
+| `CORTEX_LOOP_INTERVAL` | `300` | Seconds between reflection loops |
+| `CORTEX_REFLECTION_INTERVAL` | `86400` | Seconds between deep reflections (24h) |
+| `CORTEX_LOG_LEVEL` | `debug` | Logging verbosity |
+| `NEOMEM_HEALTH_CHECK_INTERVAL` | `300` | NeoMem health check frequency |
+| `REFLECTION_NOTE_TARGET` | `trilium` | Where to store reflection notes |
+| `REFLECTION_NOTE_PATH` | `/app/logs/reflections.log` | Reflection output path |
+| `RELEVANCE_THRESHOLD` | `0.78` | Memory retrieval relevance threshold |
+
+**Note**: Cortex uses `LLM_PRIMARY` (vLLM on MI50) by default from root `.env`.
+
+### NeoMem (`neomem/.env`)
+NeoMem uses different variable naming conventions:
+
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `LLM_PROVIDER` | `ollama` | NeoMem's LLM provider name |
+| `LLM_MODEL` | `qwen2.5:7b-instruct-q4_K_M` | NeoMem's LLM model |
+| `LLM_API_BASE` | `http://10.0.0.3:11434` | NeoMem's LLM endpoint (Ollama) |
+
+**Note**: NeoMem uses Ollama (SECONDARY) for reasoning and OpenAI for embeddings. Database credentials and `OPENAI_API_KEY` are inherited from root `.env`.
+
+### Intake (`intake/.env`)
+Intake summarization parameters:
+
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `SUMMARY_MODEL_NAME` | `/model` | Model path for summarization |
+| `SUMMARY_API_URL` | `http://10.0.0.43:8000` | LLM endpoint for summaries |
+| `SUMMARY_MAX_TOKENS` | `400` | Max tokens for summary generation |
+| `SUMMARY_TEMPERATURE` | `0.4` | Temperature for summaries (lower = more focused) |
+| `SUMMARY_INTERVAL` | `300` | Seconds between summary checks |
+| `INTAKE_LOG_PATH` | `/app/logs/intake.log` | Log file location |
+| `INTAKE_LOG_LEVEL` | `info` | Logging verbosity |
+
+**Note**: Intake uses `LLM_PRIMARY` (vLLM) by default.
+
+## Multi-Backend LLM Strategy
+
+Project Lyra supports flexible backend selection per service:
+
+**Root `.env` provides backend OPTIONS**:
+- PRIMARY: vLLM on MI50 GPU (high performance)
+- SECONDARY: Ollama on 3090 GPU (local inference)
+- CLOUD: OpenAI API (cloud fallback)
+- FALLBACK: llama.cpp/LM Studio (CPU-only)
+
+**Services choose which backend to USE**:
+- **Cortex** → vLLM (PRIMARY) for autonomous reasoning
+- **NeoMem** → Ollama (SECONDARY) + OpenAI embeddings
+- **Intake** → vLLM (PRIMARY) for summarization
+- **Relay** → Implements fallback cascade with user preference
+
+This design eliminates URL duplication while preserving per-service flexibility.
+
+## Security Best Practices
+
+1. **Never commit `.env` files to git** - they contain secrets
+2. **Use `.env.example` templates** for documentation and onboarding
+3. **Rotate credentials regularly**, especially:
+   - `OPENAI_API_KEY`
+   - `NEOMEM_API_KEY`
+   - Database passwords
+4. **Use strong passwords** for production databases
+5. **Restrict network access** to LLM backends and databases
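+
+Rules 1 and 2 can be backed by ignore patterns along the following lines. This is only a sketch of a `.gitignore`, assuming the repository does not already ship an equivalent one. The `.env-backups/` entry refers to the backup directory mentioned in the migration notes.
+
+```
+# Ignore real environment files at any depth (root, cortex/, neomem/, intake/)
+.env
+# Ignore the backup directory that holds the original .env files
+.env-backups/
+# .env.example templates are not matched by the patterns above, so they stay tracked
+```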
+
+## Troubleshooting
+
+### Services can't connect to each other
+- Verify container names match in service URLs
+- Check all services are on the `lyra_net` Docker network
+- Use `docker-compose ps` to verify all services are running
+
+### LLM calls failing
+- Verify backend URLs are correct for your infrastructure
+- Check if LLM servers are running and accessible
+- Test with `curl <LLM_URL>/v1/models` (OpenAI-compatible APIs)
+
+### Database connection errors
+- Verify database credentials match in all locations
+- Check if database containers are healthy: `docker-compose ps`
+- Review database logs: `docker-compose logs neomem-postgres`
+
+### Environment variables not loading
+- Verify env_file paths in docker-compose.yml
+- Check file permissions: `.env` files must be readable
+- Remember loading order: service `.env` overrides root `.env`
+
+## Migration from Old Setup
+
+If you have the old multi-file setup with duplicated variables:
+
+1. **Back up existing files**: All original `.env` files are in `.env-backups/`
+2. **Copy new templates**: Use `.env.example` files as base
+3. **Merge credentials**: Transfer your actual keys/passwords to new root `.env`
+4. **Test thoroughly**: Verify all services start and communicate correctly
+
+## Support
+
+For issues or questions:
+- Check logs: `docker-compose logs <service>`
+- Verify configuration: `docker exec <container> env | grep <VAR>`
+- Review this documentation for variable descriptions
--- a/docs/PROJECT_SUMMARY.md
+++ b/docs/PROJECT_SUMMARY.md
@@ -0,0 +1,925 @@
+# Project Lyra — Comprehensive AI Context Summary
+
+**Version:** v0.5.1 (2025-12-11)
+**Status:** Production-ready modular AI companion system
+**Purpose:** Memory-backed conversational AI with multi-stage reasoning, persistent context, and modular LLM backend architecture
+
+---
+
+## Executive Summary
+
+Project Lyra is a **self-hosted AI companion system** designed to overcome the limitations of typical chatbots by providing:
+- **Persistent long-term memory** (NeoMem: PostgreSQL + Neo4j graph storage)
+- **Multi-stage reasoning pipeline** (Cortex: reflection → reasoning → refinement → persona)
+- **Short-term context management** (Intake: session-based summarization embedded in Cortex)
+- **Flexible LLM backend routing** (supports llama.cpp, Ollama, OpenAI, custom endpoints)
+- **OpenAI-compatible API** (drop-in replacement for chat applications)
+
+**Core Philosophy:** Just as the human brain has different regions for different functions, Lyra has specialized modules that work together. She is more than a chatbot: she is a notepad, schedule, database, co-creator, and collaborator with her own executive function.
+
+---
+
+## Quick Context for AI Assistants
+
+If you're an AI being given this project to work on, here's what you need to know:
+
+### What This Project Does
+Lyra is a conversational AI system that **remembers everything** across sessions. When a user says something in passing, Lyra stores it, contextualizes it, and can recall it later. She can:
+- Track project progress over time
+- Remember user preferences and past conversations
+- Reason through complex questions using multiple LLM calls
+- Apply a consistent personality across all interactions
+- Integrate with multiple LLM backends (local and cloud)
+
+### Current Architecture (v0.5.1)
+```
+User → Relay (Express/Node.js, port 7078)
+  ↓
+Cortex (FastAPI/Python, port 7081)
+  ├─ Intake module (embedded, in-memory SESSIONS)
+  ├─ 4-stage reasoning pipeline
+  └─ Multi-backend LLM router
+  ↓
+NeoMem (FastAPI/Python, port 7077)
+  ├─ PostgreSQL (vector storage)
+  └─ Neo4j (graph relationships)
+```
+
+### Key Files You'll Work With
+
+**Backend Services:**
+- [cortex/router.py](cortex/router.py) - Main Cortex routing logic (306 lines, `/reason`, `/ingest` endpoints)
+- [cortex/intake/intake.py](cortex/intake/intake.py) - Short-term memory module (367 lines, SESSIONS management)
+- [cortex/reasoning/reasoning.py](cortex/reasoning/reasoning.py) - Draft answer generation
+- [cortex/reasoning/refine.py](cortex/reasoning/refine.py) - Answer refinement
+- [cortex/reasoning/reflection.py](cortex/reasoning/reflection.py) - Meta-awareness notes
+- [cortex/persona/speak.py](cortex/persona/speak.py) - Personality layer
+- [cortex/llm/llm_router.py](cortex/llm/llm_router.py) - LLM backend selector
+- [core/relay/server.js](core/relay/server.js) - Main orchestrator (Node.js)
+- [neomem/main.py](neomem/main.py) - Long-term memory API
+
+**Configuration:**
+- [.env](.env) - Root environment variables (LLM backends, databases, API keys)
+- [cortex/.env](cortex/.env) - Cortex-specific overrides
+- [docker-compose.yml](docker-compose.yml) - Service definitions (152 lines)
+
+**Documentation:**
+- [CHANGELOG.md](CHANGELOG.md) - Complete version history (836 lines, chronological format)
+- [README.md](README.md) - User-facing documentation (610 lines)
+- [PROJECT_SUMMARY.md](PROJECT_SUMMARY.md) - This file
+
+### Recent Critical Fixes (v0.5.1)
+The most recent work fixed a critical bug where Intake's SESSIONS buffer wasn't persisting:
+1. **Fixed**: `bg_summarize()` was only a TYPE_CHECKING stub → implemented as logging stub
+2. **Fixed**: `/ingest` endpoint had unreachable code → removed early return, added lenient error handling
+3. **Added**: `cortex/intake/__init__.py` → proper Python package structure
+4. **Added**: Diagnostic endpoints `/debug/sessions` and `/debug/summary` for troubleshooting
+
+**Key Insight**: Intake is no longer a standalone service—it's embedded in Cortex as a Python module. SESSIONS must persist in a single Uvicorn worker (no multi-worker support without Redis).
+
+---
+
+## Architecture Deep Dive
+
+### Service Topology (Docker Compose)
+
+**Active Containers:**
+1. **relay** (Node.js/Express, port 7078)
+   - Entry point for all user requests
+   - OpenAI-compatible `/v1/chat/completions` endpoint
+   - Routes to Cortex for reasoning
+   - Async calls to Cortex `/ingest` after response
+
+2. **cortex** (Python/FastAPI, port 7081)
+   - Multi-stage reasoning pipeline
+   - Embedded Intake module (no HTTP, direct Python imports)
+   - Endpoints: `/reason`, `/ingest`, `/health`, `/debug/sessions`, `/debug/summary`
+
+3. **neomem-api** (Python/FastAPI, port 7077)
+   - Long-term memory storage
+   - Fork of Mem0 OSS (fully local, no external SDK)
+   - Endpoints: `/memories`, `/search`, `/health`
+
+4. **neomem-postgres** (PostgreSQL + pgvector, port 5432)
+   - Vector embeddings storage
+   - Memory history records
+
+5. **neomem-neo4j** (Neo4j, ports 7474/7687)
+   - Graph relationships between memories
+   - Entity extraction and linking
+
+**Disabled Services:**
+- `intake` - No longer needed (embedded in Cortex as of v0.5.1)
+- `rag` - Beta Lyrae RAG service (planned re-enablement)
+
+### External LLM Backends (HTTP APIs)
+
+**PRIMARY Backend** - llama.cpp @ `http://10.0.0.44:8080`
+- AMD MI50 GPU-accelerated inference
+- Model: `/model` (path-based routing)
+- Used for: Reasoning, refinement, summarization
+
+**SECONDARY Backend** - Ollama @ `http://10.0.0.3:11434`
+- RTX 3090 GPU-accelerated inference
+- Model: `qwen2.5:7b-instruct-q4_K_M`
+- Used for: Configurable per-module
+
+**CLOUD Backend** - OpenAI @ `https://api.openai.com/v1`
+- Cloud-based inference
+- Model: `gpt-4o-mini`
+- Used for: Reflection, persona layers
+
+**FALLBACK Backend** - Local @ `http://10.0.0.41:11435`
+- CPU-based inference
+- Model: `llama-3.2-8b-instruct`
+- Used for: Emergency fallback
+
+### Data Flow (Request Lifecycle)
+
+```
+1. User sends message → Relay (/v1/chat/completions)
+   ↓
+2. Relay → Cortex (/reason)
+   ↓
+3. Cortex calls Intake module (internal Python)
+   - Intake.summarize_context(session_id, exchanges)
+   - Returns L1/L5/L10/L20/L30 summaries
+   ↓
+4. Cortex 4-stage pipeline:
+   a. reflection.py → Meta-awareness notes (CLOUD backend)
+      - "What is the user really asking?"
+      - Returns JSON: {"notes": [...]}
+
+   b. reasoning.py → Draft answer (PRIMARY backend)
+      - Uses context from Intake
+      - Integrates reflection notes
+      - Returns draft text
+
+   c. refine.py → Refined answer (PRIMARY backend)
+      - Polishes draft for clarity
+      - Ensures factual consistency
+      - Returns refined text
+
+   d. speak.py → Persona layer (CLOUD backend)
+      - Applies Lyra's personality
+      - Natural, conversational tone
+      - Returns final answer
+   ↓
+5. Cortex → Relay (returns persona answer)
+   ↓
+6. Relay → Cortex (/ingest) [async, non-blocking]
+   - Sends (session_id, user_msg, assistant_msg)
+   - Cortex calls add_exchange_internal()
+   - Appends to SESSIONS[session_id]["buffer"]
+   ↓
+7. Relay → User (returns final response)
+   ↓
+8. [Planned] Relay → NeoMem (/memories) [async]
+   - Store conversation in long-term memory
+```
+
+### Intake Module Architecture (v0.5.1)
+
+**Location:** `cortex/intake/`
+
+**Key Change:** Intake is now **embedded in Cortex** as a Python module, not a standalone service.
+
+**Import Pattern:**
+```python
+from intake.intake import add_exchange_internal, SESSIONS, summarize_context
+```
+
+**Core Data Structure:**
+```python
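+from collections import deque
+from datetime import datetime
+
+SESSIONS: dict[str, dict] = {}
+
+# Structure of one entry ("example" is a placeholder session ID):
+session_id = "example"
+SESSIONS[session_id] = {
+    "buffer": deque(maxlen=200),  # Bounded buffer; the oldest exchange is dropped when full
+    "created_at": datetime.now(),
+}
+
+# Shape of each exchange stored in the buffer:
+exchange = {
+    "session_id": "...",
+    "user_msg": "...",
+    "assistant_msg": "...",
+    "timestamp": "2025-12-11T...",
+}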
+```
+
+**Functions:**
+1. **`add_exchange_internal(exchange: dict)`**
+   - Adds exchange to SESSIONS buffer
+   - Creates new session if needed
+   - Calls `bg_summarize()` stub
+   - Returns `{"ok": True, "session_id": "..."}`
+
+2. **`summarize_context(session_id: str, exchanges: list[dict])`** [async]
+   - Generates L1/L5/L10/L20/L30 summaries via LLM
+   - Called during `/reason` endpoint
+   - Returns multi-level summary dict
+
+3. **`bg_summarize(session_id: str)`**
+   - **Stub function** - logs only, no actual work
+   - Defers summarization to `/reason` call
+   - Exists to prevent NameError
+
+**Critical Constraint:** SESSIONS is a module-level global dict. This requires **single-worker Uvicorn** mode. Multi-worker deployments need Redis or shared storage.
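+
+A launch command that respects this constraint might look like the sketch below. The `main:app` module path and the host binding are assumptions for illustration. The actual command lives in `cortex/Dockerfile` and may differ.
+
+```bash
+# One worker process, so every request sees the same in-memory SESSIONS dict.
+# Leaving out --workers is equivalent, because Uvicorn runs one worker by default.
+uvicorn main:app --host 0.0.0.0 --port 7081 --workers 1
+```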
+
+**Diagnostic Endpoints:**
+- `GET /debug/sessions` - Inspect all SESSIONS (object ID, buffer sizes, recent exchanges)
+- `GET /debug/summary?session_id=X` - Test summarization for a session
+
+---
+
+## Environment Configuration
+
+### LLM Backend Registry (Multi-Backend Strategy)
+
+**Root `.env` defines all backend OPTIONS:**
+```bash
+# PRIMARY Backend (llama.cpp)
+LLM_PRIMARY_PROVIDER=llama.cpp
+LLM_PRIMARY_URL=http://10.0.0.44:8080
+LLM_PRIMARY_MODEL=/model
+
+# SECONDARY Backend (Ollama)
+LLM_SECONDARY_PROVIDER=ollama
+LLM_SECONDARY_URL=http://10.0.0.3:11434
+LLM_SECONDARY_MODEL=qwen2.5:7b-instruct-q4_K_M
+
+# CLOUD Backend (OpenAI)
+LLM_OPENAI_PROVIDER=openai
+LLM_OPENAI_URL=https://api.openai.com/v1
+LLM_OPENAI_MODEL=gpt-4o-mini
+OPENAI_API_KEY=sk-proj-...
+
+# FALLBACK Backend
+LLM_FALLBACK_PROVIDER=openai_completions
+LLM_FALLBACK_URL=http://10.0.0.41:11435
+LLM_FALLBACK_MODEL=llama-3.2-8b-instruct
+```
+
+**Module-specific backend selection:**
+```bash
+CORTEX_LLM=SECONDARY      # Cortex uses Ollama
+INTAKE_LLM=PRIMARY        # Intake uses llama.cpp
+SPEAK_LLM=OPENAI          # Persona uses OpenAI
+NEOMEM_LLM=PRIMARY        # NeoMem uses llama.cpp
+UI_LLM=OPENAI             # UI uses OpenAI
+RELAY_LLM=PRIMARY         # Relay uses llama.cpp
+```
+
+**Philosophy:** Root `.env` provides all backend OPTIONS. Each service chooses which backend to USE via `{MODULE}_LLM` variable. This eliminates URL duplication while preserving flexibility.
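+
+Read as a lookup, each selector names a prefix in the backend registry above. The fragment below sketches that mapping for one module. It assumes the router builds variable names as `LLM_<SELECTOR>_*`, which is inferred from the naming pattern and has not been checked against `llm_router.py`.
+
+```bash
+# Selector chosen for the module (taken from the block above)
+INTAKE_LLM=PRIMARY
+
+# Registry entries that selector is assumed to resolve to:
+#   LLM_PRIMARY_PROVIDER=llama.cpp
+#   LLM_PRIMARY_URL=http://10.0.0.44:8080
+#   LLM_PRIMARY_MODEL=/model
+```
+
+Under that assumption, pointing a module at another backend means changing only its selector, with no URL or model name repeated in the service's own file.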
+
+### Database Configuration
+```bash
+# PostgreSQL (vector storage)
+POSTGRES_USER=neomem
+POSTGRES_PASSWORD=neomempass
+POSTGRES_DB=neomem
+POSTGRES_HOST=neomem-postgres
+POSTGRES_PORT=5432
+
+# Neo4j (graph storage)
+NEO4J_URI=bolt://neomem-neo4j:7687
+NEO4J_USERNAME=neo4j
+NEO4J_PASSWORD=neomemgraph
+```
+
+### Service URLs (Docker Internal Network)
+```bash
+NEOMEM_API=http://neomem-api:7077
+CORTEX_API=http://cortex:7081
+CORTEX_REASON_URL=http://cortex:7081/reason
+CORTEX_INGEST_URL=http://cortex:7081/ingest
+RELAY_URL=http://relay:7078
+```
+
+### Feature Flags
+```bash
+CORTEX_ENABLED=true
+MEMORY_ENABLED=true
+PERSONA_ENABLED=false
+DEBUG_PROMPT=true
+VERBOSE_DEBUG=true
+```
+
+---
+
+## Code Structure Overview
+
+### Cortex Service (`cortex/`)
+
+**Main Files:**
+- `main.py` - FastAPI app initialization
+- `router.py` - Route definitions (`/reason`, `/ingest`, `/health`, `/debug/*`)
+- `context.py` - Context aggregation (Intake summaries, session state)
+
+**Reasoning Pipeline (`reasoning/`):**
+- `reflection.py` - Meta-awareness notes (Cloud LLM)
+- `reasoning.py` - Draft answer generation (Primary LLM)
+- `refine.py` - Answer refinement (Primary LLM)
+
+**Persona Layer (`persona/`):**
+- `speak.py` - Personality application (Cloud LLM)
+- `identity.py` - Persona loader
+
+**Intake Module (`intake/`):**
+- `__init__.py` - Package exports (SESSIONS, add_exchange_internal, summarize_context)
+- `intake.py` - Core logic (367 lines)
+  - SESSIONS dictionary
+  - add_exchange_internal()
+  - summarize_context()
+  - bg_summarize() stub
+
+**LLM Integration (`llm/`):**
+- `llm_router.py` - Backend selector and HTTP client
+  - call_llm() function
+  - Environment-based routing
+  - Payload formatting per backend type
+
+**Utilities (`utils/`):**
+- Helper functions for common operations
+
+**Configuration:**
+- `Dockerfile` - Single-worker constraint documented
+- `requirements.txt` - Python dependencies
+- `.env` - Service-specific overrides
+
+### Relay Service (`core/relay/`)
+
+**Main Files:**
+- `server.js` - Express.js server (Node.js)
+  - `/v1/chat/completions` - OpenAI-compatible endpoint
+  - `/chat` - Internal endpoint
+  - `/_health` - Health check
+- `package.json` - Node.js dependencies
+
+**Key Logic:**
+- Receives user messages
+- Routes to Cortex `/reason`
+- Async calls to Cortex `/ingest` after response
+- Returns final answer to user
+
+### NeoMem Service (`neomem/`)
+
+**Main Files:**
+- `main.py` - FastAPI app (memory API)
+- `memory.py` - Memory management logic
+- `embedder.py` - Embedding generation
+- `graph.py` - Neo4j graph operations
+- `Dockerfile` - Container definition
+- `requirements.txt` - Python dependencies
+
+**API Endpoints:**
+- `POST /memories` - Add new memory
+- `POST /search` - Semantic search
+- `GET /health` - Service health
+
+---
+
+## Common Development Tasks
+
+### Adding a New Endpoint to Cortex
+
+**Example: Add `/debug/buffer` endpoint**
+
+1. **Edit `cortex/router.py`:**
+```python
+@cortex_router.get("/debug/buffer")
+async def debug_buffer(session_id: str, limit: int = 10):
+    """Return last N exchanges from a session buffer."""
+    from intake.intake import SESSIONS
+
+    session = SESSIONS.get(session_id)
+    if not session:
+        return {"error": "session not found", "session_id": session_id}
+
+    buffer = session["buffer"]
+    recent = list(buffer)[-limit:] if limit > 0 else []  # [-0:] would return the whole buffer
+
+    return {
+        "session_id": session_id,
+        "total_exchanges": len(buffer),
+        "recent_exchanges": recent
+    }
+```
+
+2. **Restart Cortex:**
+```bash
+docker-compose restart cortex
+```
+
+3. **Test:**
+```bash
+curl "http://localhost:7081/debug/buffer?session_id=test&limit=5"
+```
+
+### Modifying LLM Backend for a Module
+
+**Example: Switch Cortex to use PRIMARY backend**
+
+1. **Edit `.env`:**
+```bash
+CORTEX_LLM=PRIMARY  # Change from SECONDARY to PRIMARY
+```
+
+2. **Recreate the Cortex container** (a plain `restart` does not re-read `.env`):
+```bash
+docker-compose up -d cortex
+```
+
+3. **Verify in logs:**
+```bash
+docker logs cortex | grep "Backend"
+```
+
+### Adding Diagnostic Logging
+
+**Example: Log every exchange addition**
+
+1. **Edit `cortex/intake/intake.py`:**
+```python
+def add_exchange_internal(exchange: dict):
+    session_id = exchange.get("session_id")
+
+    # Add detailed logging
+    print(f"[DEBUG] Adding exchange to {session_id}")
+    print(f"[DEBUG] User msg: {exchange.get('user_msg', '')[:100]}")
+    print(f"[DEBUG] Assistant msg: {exchange.get('assistant_msg', '')[:100]}")
+
+    # ... rest of function
+```
+
+2. **View logs:**
+```bash
+docker logs cortex -f | grep DEBUG
+```
+
+---
+
+## Debugging Guide
+
+### Problem: SESSIONS Not Persisting
+
+**Symptoms:**
+- `/debug/sessions` shows empty or only 1 exchange
+- Summaries always return empty
+- Buffer size doesn't increase
+
+**Diagnosis Steps:**
+1. Check Cortex logs for SESSIONS object ID:
+   ```bash
+   docker logs cortex | grep "SESSIONS object id"
+   ```
+   - Should show same ID across all calls
+   - If IDs differ → module reloading issue
+
+2. Verify single-worker mode:
+   ```bash
+   docker exec cortex cat Dockerfile | grep uvicorn
+   ```
+   - The command should either omit the `--workers` flag or set `--workers 1`
+
+3. Check `/debug/sessions` endpoint:
+   ```bash
+   curl http://localhost:7081/debug/sessions | jq
+   ```
+   - Should show sessions_object_id and current sessions
+
+4. Inspect `__init__.py` exists:
+   ```bash
+   docker exec cortex ls -la intake/__init__.py
+   ```
+
+**Solution (Fixed in v0.5.1):**
+- Ensure `cortex/intake/__init__.py` exists with proper exports
+- Verify `bg_summarize()` is implemented (not just TYPE_CHECKING stub)
+- Check `/ingest` endpoint doesn't have early return
+- Rebuild and recreate the Cortex container: `docker-compose build cortex && docker-compose up -d cortex`
+
+### Problem: LLM Backend Timeout
+
+**Symptoms:**
+- Cortex `/reason` hangs
+- 504 Gateway Timeout errors
+- Logs show "waiting for LLM response"
+
+**Diagnosis Steps:**
+1. Test backend directly:
+   ```bash
+   # llama.cpp
+   curl http://10.0.0.44:8080/health
+
+   # Ollama
+   curl http://10.0.0.3:11434/api/tags
+
+   # OpenAI
+   curl https://api.openai.com/v1/models \
+     -H "Authorization: Bearer $OPENAI_API_KEY"
+   ```
+
+2. Check network connectivity:
+   ```bash
+   docker exec cortex ping -c 3 10.0.0.44
+   ```
+
+3. Review Cortex logs:
+   ```bash
+   docker logs cortex -f | grep "LLM"
+   ```
+
+**Solutions:**
+- Verify backend URL in `.env` is correct and accessible
+- Check firewall rules for backend ports
+- Increase timeout in `cortex/llm/llm_router.py`
+- Switch to a different backend temporarily, e.g. `CORTEX_LLM=OPENAI` (the cloud backend)
+
+### Problem: Docker Compose Won't Start
+
+**Symptoms:**
+- `docker-compose up -d` fails
+- Container exits immediately
+- "port already in use" errors
+
+**Diagnosis Steps:**
+1. Check port conflicts:
+   ```bash
+   netstat -tulpn | grep -E '7078|7081|7077|5432'
+   ```
+
+2. Check container logs:
+   ```bash
+   docker-compose logs --tail=50
+   ```
+
+3. Verify environment file:
+   ```bash
+   grep -Ev '^(#|$)' .env
+   ```
+
+**Solutions:**
+- Stop conflicting services: `docker-compose down`
+- Check `.env` syntax (no quotes unless necessary)
+- Rebuild containers: `docker-compose build --no-cache`
+- Check Docker daemon: `systemctl status docker`
+
+---
+
+## Testing Checklist
+
+### After Making Changes to Cortex
+
+**1. Build and restart:**
+```bash
+docker-compose build cortex
+docker-compose up -d cortex  # recreate the container so it runs the rebuilt image
+```
+
+**2. Verify service health:**
+```bash
+curl http://localhost:7081/health
+```
+
+**3. Test /ingest endpoint:**
+```bash
+curl -X POST http://localhost:7081/ingest \
+  -H "Content-Type: application/json" \
+  -d '{
+    "session_id": "test",
+    "user_msg": "Hello",
+    "assistant_msg": "Hi there!"
+  }'
+```
+
+**4. Verify SESSIONS updated:**
+```bash
+curl http://localhost:7081/debug/sessions | jq '.sessions.test.buffer_size'
+```
+- Should show 1 (or increment if already populated)
+
+**5. Test summarization:**
+```bash
+curl "http://localhost:7081/debug/summary?session_id=test" | jq '.summary'
+```
+- Should return L1/L5/L10/L20/L30 summaries
+
+**6. Test full pipeline:**
+```bash
+curl -X POST http://localhost:7078/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{
+    "messages": [{"role": "user", "content": "Test message"}],
+    "session_id": "test"
+  }' | jq '.choices[0].message.content'
+```
+
+**7. Check logs for errors:**
+```bash
+docker logs cortex --tail=50
+```
+
+---
+
+## Project History & Context
+
+### Evolution Timeline
+
+**v0.1.x (2025-09-23 to 2025-09-25)**
+- Initial MVP: Relay + Mem0 + Ollama
+- Basic memory storage and retrieval
+- Simple UI with session support
+
+**v0.2.x (2025-09-24 to 2025-09-30)**
+- Migrated to mem0ai SDK
+- Added sessionId support
+- Created standalone Lyra-Mem0 stack
+
+**v0.3.x (2025-09-26 to 2025-10-28)**
+- Forked Mem0 → NVGRAM → NeoMem
+- Added salience filtering
+- Integrated Cortex reasoning VM
+- Built RAG system (Beta Lyrae)
+- Established multi-backend LLM support
+
+**v0.4.x (2025-11-05 to 2025-11-13)**
+- Major architectural rewire
+- Implemented 4-stage reasoning pipeline
+- Added reflection, refinement stages
+- RAG integration
+- LLM router with per-stage backend selection
+
+**Infrastructure v1.0.0 (2025-11-26)**
+- Consolidated 9 `.env` files into single source of truth
+- Multi-backend LLM strategy
+- Docker Compose consolidation
+- Created security templates
+
+**v0.5.0 (2025-11-28)**
+- Fixed all critical API wiring issues
+- Added OpenAI-compatible Relay endpoint
+- Fixed Cortex → Intake integration
+- End-to-end flow verification
+
+**v0.5.1 (2025-12-11) - CURRENT**
+- **Critical fix**: SESSIONS persistence bug
+- Implemented `bg_summarize()` stub
+- Fixed `/ingest` unreachable code
+- Added `cortex/intake/__init__.py`
+- Embedded Intake in Cortex (no longer standalone)
+- Added diagnostic endpoints
+- Lenient error handling
+- Documented single-worker constraint
+
+### Architectural Philosophy
+
+**Modular Design:**
+- Each service has a single, clear responsibility
+- Services communicate via well-defined HTTP APIs
+- Configuration is centralized but allows per-service overrides
+
+**Local-First:**
+- No reliance on external services (except optional OpenAI)
+- All data stored locally (PostgreSQL + Neo4j)
+- Can run entirely air-gapped with local LLMs
+
+**Flexible LLM Backend:**
+- Not tied to any single LLM provider
+- Can mix local and cloud models
+- Per-stage backend selection for optimal performance/cost
+
+**Error Handling:**
+- Lenient mode: Never fail the chat pipeline
+- Log errors but continue processing
+- Graceful degradation
+
+**Observability:**
+- Diagnostic endpoints for debugging
+- Verbose logging mode
+- Object ID tracking for singleton verification
+
+---
+
+## Known Issues & Limitations
+
+### Fixed in v0.5.1
+- ✅ Intake SESSIONS not persisting → **FIXED**
+- ✅ `bg_summarize()` NameError → **FIXED**
+- ✅ `/ingest` endpoint unreachable code → **FIXED**
+
+### Current Limitations
+
+**1. Single-Worker Constraint**
+- Cortex must run with a single Uvicorn worker
+- SESSIONS is an in-memory, module-level global, so each worker process would hold its own copy
+- Multi-worker support requires Redis or another shared store
+- Documented in `cortex/Dockerfile` lines 7-8
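+
+The constraint above follows from how a module-level global behaves. The sketch below assumes SESSIONS maps a session ID to a bounded deque (the `maxlen=200` cap is quoted elsewhere in this document); `add_exchange` and `sessions_object_id` are hypothetical names, not the project's `add_exchange_internal()`.
+
```python
from collections import defaultdict, deque
from typing import Deque, Dict

MAX_EXCHANGES = 200  # per-session cap quoted in this document

# Module-level global: built once per Python process at import time.
# A second worker process would import the module again and get its own,
# initially empty, copy of this object.
SESSIONS: Dict[str, Deque[dict]] = defaultdict(lambda: deque(maxlen=MAX_EXCHANGES))


def add_exchange(session_id: str, user_msg: str, assistant_msg: str) -> int:
    """Append one exchange and return the session's current length."""
    SESSIONS[session_id].append({"user_msg": user_msg, "assistant_msg": assistant_msg})
    return len(SESSIONS[session_id])


def sessions_object_id() -> int:
    """Identity of the store; stays constant within one process if it is a true singleton."""
    return id(SESSIONS)
```
+
+Logging `sessions_object_id()` from different call sites is one way to confirm that every import path resolves to the same object.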
+
+**2. NeoMem Integration Incomplete**
+- Relay doesn't yet push to NeoMem after responses
+- Memory storage planned for v0.5.2
+- Currently all memory is short-term (SESSIONS only)
+
+**3. RAG Service Disabled**
+- Beta Lyrae (RAG) commented out in docker-compose.yml
+- Awaiting re-enablement after Intake stabilization
+- Code exists but not currently integrated
+
+**4. Session Management**
+- No session cleanup/expiration
+- The number of sessions grows without bound (each session is capped at maxlen=200, but sessions are never removed)
+- No session list endpoint in Relay
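+
+One possible shape for the missing cleanup, as a sketch only: it assumes a separate `last_seen` mapping of session ID to last-activity timestamp, which does not exist in the project today. The function name and parameters are hypothetical.
+
```python
import time
from typing import Dict, List, MutableMapping, Optional


def expire_sessions(
    last_seen: Dict[str, float],
    sessions: MutableMapping[str, object],
    ttl_seconds: float,
    now: Optional[float] = None,
) -> List[str]:
    """Remove sessions idle for longer than ttl_seconds; return the removed IDs."""
    now = time.time() if now is None else now
    # Collect first, then delete, to avoid mutating a dict while iterating it.
    stale = [sid for sid, ts in last_seen.items() if now - ts > ttl_seconds]
    for sid in stale:
        sessions.pop(sid, None)
        last_seen.pop(sid, None)
    return stale
```
+
+Passing `now` explicitly keeps the function deterministic in tests; in a service it would be called periodically with the default clock.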
+
+**5. Persona Integration**
+- `PERSONA_ENABLED=false` in `.env`
+- Persona Sidecar not fully wired
+- Identity loaded but not consistently applied
+
+### Future Enhancements
+
+**Short-term (v0.5.2):**
+- Enable NeoMem integration in Relay
+- Add session cleanup/expiration
+- Session list endpoint
+- NeoMem health monitoring
+
+**Medium-term (v0.6.x):**
+- Re-enable RAG service
+- Migrate SESSIONS to Redis for multi-worker support
+- Add request correlation IDs
+- Comprehensive health checks
+
+**Long-term (v0.7.x+):**
+- Persona Sidecar full integration
+- Autonomous "dream" cycles (self-reflection)
+- Verifier module for factual grounding
+- Advanced RAG with hybrid search
+- Memory consolidation strategies
+
+---
+
+## Troubleshooting Quick Reference
+
+| Problem | Quick Check | Solution |
+|---------|-------------|----------|
+| SESSIONS empty | `curl localhost:7081/debug/sessions` | Rebuild Cortex, verify `cortex/intake/__init__.py` exists |
+| LLM timeout | `curl http://10.0.0.44:8080/health` | Check backend connectivity, increase timeout |
+| Port conflict | `netstat -tulpn \| grep 7078` | Stop conflicting service or change port |
+| Container crash | `docker logs cortex` | Check logs for Python errors, verify .env syntax |
+| Missing package | `docker exec cortex pip list` | Rebuild container, check requirements.txt |
+| 502 from Relay | `curl localhost:7081/health` | Verify Cortex is running, check docker network |
+
+---
+
+## API Reference (Quick)
+
+### Relay (Port 7078)
+
+**POST /v1/chat/completions** - OpenAI-compatible chat
+```json
+{
+  "messages": [{"role": "user", "content": "..."}],
+  "session_id": "..."
+}
+```
+
+**GET /_health** - Service health
+
+### Cortex (Port 7081)
+
+**POST /reason** - Main reasoning pipeline (`temperature` is optional)
+```json
+{
+  "session_id": "...",
+  "user_prompt": "...",
+  "temperature": 0.7
+}
+```
+
+**POST /ingest** - Add exchange to SESSIONS
+```json
+{
+  "session_id": "...",
+  "user_msg": "...",
+  "assistant_msg": "..."
+}
+```
+
+**GET /debug/sessions** - Inspect SESSIONS state
+
+**GET /debug/summary?session_id=X** - Test summarization
+
+**GET /health** - Service health
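+
+The `/reason` payload above can be assembled from Python with the standard library alone. This is a sketch: the `localhost` host, the helper names, and the assumption that Cortex answers with a JSON body are not confirmed by this document; only the port, path, and field names are.
+
```python
import json
import urllib.request
from typing import Dict, Optional

CORTEX_URL = "http://localhost:7081"  # assumed host; port taken from this reference


def build_reason_request(
    session_id: str,
    user_prompt: str,
    temperature: Optional[float] = None,
    base_url: str = CORTEX_URL,
) -> urllib.request.Request:
    """Build (but do not send) a POST /reason request."""
    payload: Dict[str, object] = {"session_id": session_id, "user_prompt": user_prompt}
    if temperature is not None:
        payload["temperature"] = temperature  # optional field
    return urllib.request.Request(
        f"{base_url}/reason",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the body, assuming a JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```
+
+Keeping construction separate from sending makes the payload easy to inspect without a running Cortex container.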
+
+### NeoMem (Port 7077)
+
+**POST /memories** - Add memory
+```json
+{
+  "messages": [{"role": "...", "content": "..."}],
+  "user_id": "...",
+  "metadata": {}
+}
+```
+
+**POST /search** - Semantic search
+```json
+{
+  "query": "...",
+  "user_id": "...",
+  "limit": 10
+}
+```
+
+**GET /health** - Service health
+
+---
+
+## File Manifest (Key Files Only)
+
+```
+project-lyra/
+├── .env                           # Root environment variables
+├── docker-compose.yml             # Service definitions (152 lines)
+├── CHANGELOG.md                   # Version history (836 lines)
+├── README.md                      # User documentation (610 lines)
+├── PROJECT_SUMMARY.md             # This file (AI context)
+│
+├── cortex/                        # Reasoning engine
+│   ├── Dockerfile                 # Single-worker constraint documented
+│   ├── requirements.txt
+│   ├── .env                       # Cortex overrides
+│   ├── main.py                    # FastAPI initialization
+│   ├── router.py                  # Routes (306 lines)
+│   ├── context.py                 # Context aggregation
+│   │
+│   ├── intake/                    # Short-term memory (embedded)
+│   │   ├── __init__.py           # Package exports
+│   │   └── intake.py             # Core logic (367 lines)
+│   │
+│   ├── reasoning/                 # Reasoning pipeline
+│   │   ├── reflection.py         # Meta-awareness
+│   │   ├── reasoning.py          # Draft generation
+│   │   └── refine.py             # Refinement
+│   │
+│   ├── persona/                   # Personality layer
+│   │   ├── speak.py              # Persona application
+│   │   └── identity.py           # Persona loader
+│   │
+│   └── llm/                       # LLM integration
+│       └── llm_router.py         # Backend selector
+│
+├── core/relay/                    # Orchestrator
+│   ├── server.js                 # Express server (Node.js)
+│   └── package.json
+│
+├── neomem/                        # Long-term memory
+│   ├── Dockerfile
+│   ├── requirements.txt
+│   ├── .env                       # NeoMem overrides
+│   └── main.py                   # Memory API
+│
+└── rag/                           # RAG system (disabled)
+    ├── rag_api.py
+    ├── rag_chat_import.py
+    └── chromadb/
+```
+
+---
+
+## Final Notes for AI Assistants
+
+### What You Should Know Before Making Changes
+
+1. **SESSIONS is sacred** - It's a module-level global in `cortex/intake/intake.py`. Don't move it, don't duplicate it, don't make it a class attribute. It must remain a singleton.
+
+2. **Single-worker is mandatory** - Until SESSIONS is migrated to Redis, Cortex MUST run with a single Uvicorn worker. Each additional worker is a separate process with its own copy of SESSIONS, so session state would differ from one request to the next.
+
+3. **Lenient error handling** - The `/ingest` endpoint and other parts of the pipeline use lenient error handling: log errors but always return success. Never fail the chat pipeline.
+
+4. **Backend routing is environment-driven** - Don't hardcode LLM URLs. Use the `{MODULE}_LLM` environment variables and the llm_router.py system.
+
+5. **Intake is embedded** - Don't try to make HTTP calls to Intake. Use direct Python imports: `from intake.intake import ...`
+
+6. **Test with diagnostic endpoints** - Always use `/debug/sessions` and `/debug/summary` to verify SESSIONS behavior after changes.
+
+7. **Follow the changelog format** - When documenting changes, use the chronological format established in CHANGELOG.md v0.5.1. Group by version, then by change type (Fixed, Added, Changed, etc.).
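+
+Point 4 in the list above (environment-driven backend routing) could look roughly like the sketch below. Only the `{MODULE}_LLM` pattern and the backend names PRIMARY, SECONDARY, CLOUD, and FALLBACK come from the project documentation; the `LLM_<BACKEND>_URL` variable names, the PRIMARY default, and the function itself are assumptions for illustration, not the behaviour of `llm_router.py`.
+
```python
import os
from typing import Mapping, Optional


def resolve_backend_url(module: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Map a module name to a backend URL using only environment variables."""
    env = os.environ if env is None else env
    # e.g. CORTEX_LLM=SECONDARY selects the SECONDARY backend for Cortex.
    backend = env.get(f"{module.upper()}_LLM", "PRIMARY").upper()  # default is assumed
    url_var = f"LLM_{backend}_URL"  # hypothetical variable naming
    try:
        return env[url_var]
    except KeyError:
        raise RuntimeError(f"{url_var} is not set (needed by {module})") from None
```
+
+Failing loudly on a missing URL variable keeps a misconfigured backend from being mistaken for a slow one.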
+
+### When You Need Help
+
+- **SESSIONS issues**: Check `cortex/intake/intake.py` lines 11-14 for initialization, lines 325-366 for `add_exchange_internal()`
+- **Routing issues**: Check `cortex/router.py` lines 65-189 for `/reason`, lines 201-233 for `/ingest`
+- **LLM backend issues**: Check `cortex/llm/llm_router.py` for backend selection logic
+- **Environment variables**: Check `.env` lines 13-40 for LLM backends, lines 28-34 for module selection
+
+### Most Important Thing
+
+**This project values reliability over features.** It's better to have a simple, working system than a complex, broken one. When in doubt, keep it simple, log everything, and never fail silently.
+
+---
+
+**End of AI Context Summary**
+
+*This document is maintained to provide complete context for AI assistants working on Project Lyra. Last updated: v0.5.1 (2025-12-11)*
--- a/docs/lyra_tree.txt
+++ b/docs/lyra_tree.txt
@@ -0,0 +1,460 @@
+/home/serversdown/project-lyra
+├── CHANGELOG.md
+├── core
+│   ├── backups
+│   │   ├── mem0_20250927_221040.sql
+│   │   └── mem0_history_20250927_220925.tgz
+│   ├── docker-compose.yml
+│   ├── .env
+│   ├── env experiments
+│   │   ├── .env
+│   │   ├── .env.local
+│   │   └── .env.openai
+│   ├── persona-sidecar
+│   │   ├── Dockerfile
+│   │   ├── package.json
+│   │   ├── persona-server.js
+│   │   └── personas.json
+│   ├── PROJECT_SUMMARY.md
+│   ├── relay
+│   │   ├── Dockerfile
+│   │   ├── .dockerignore
+│   │   ├── lib
+│   │   │   ├── cortex.js
+│   │   │   └── llm.js
+│   │   ├── package.json
+│   │   ├── package-lock.json
+│   │   ├── server.js
+│   │   ├── sessions
+│   │   │   ├── sess-6rxu7eia.json
+│   │   │   ├── sess-6rxu7eia.jsonl
+│   │   │   ├── sess-l08ndm60.json
+│   │   │   └── sess-l08ndm60.jsonl
+│   │   └── test-llm.js
+│   └── ui
+│       ├── index.html
+│       ├── manifest.json
+│       └── style.css
+├── cortex
+│   ├── Dockerfile
+│   ├── .env
+│   ├── ingest
+│   │   ├── ingest_handler.py
+│   │   └── intake_client.py
+│   ├── llm
+│   │   ├── llm_router.py
+│   │   └── resolve_llm_url.py
+│   ├── logs
+│   │   └── reflections.log
+│   ├── main.py
+│   ├── neomem_client.py
+│   ├── persona
+│   │   └── speak.py
+│   ├── rag.py
+│   ├── reasoning
+│   │   ├── reasoning.py
+│   │   ├── refine.py
+│   │   └── reflection.py
+│   ├── requirements.txt
+│   ├── router.py
+│   ├── tests
+│   └── utils
+│       ├── config.py
+│       ├── log_utils.py
+│       └── schema.py
+├── deprecated.env.txt
+├── docker-compose.yml
+├── .env
+├── .gitignore
+├── intake
+│   ├── Dockerfile
+│   ├── .env
+│   ├── intake.py
+│   ├── logs
+│   ├── requirements.txt
+│   └── venv
+│       ├── bin
+│       │   ├── python -> python3
+│       │   ├── python3 -> /usr/bin/python3
+│       │   └── python3.10 -> python3
+│       ├── include
+│       ├── lib
+│       │   └── python3.10
+│       │       └── site-packages
+│       ├── lib64 -> lib
+│       └── pyvenv.cfg
+├── intake-logs
+│   └── summaries.log
+├── lyra_tree.txt
+├── neomem
+│   ├── _archive
+│   │   └── old_servers
+│   │       ├── main_backup.py
+│   │       └── main_dev.py
+│   ├── docker-compose.yml
+│   ├── Dockerfile
+│   ├── .env
+│   ├── .gitignore
+│   ├── neomem
+│   │   ├── api
+│   │   ├── client
+│   │   │   ├── __init__.py
+│   │   │   ├── main.py
+│   │   │   ├── project.py
+│   │   │   └── utils.py
+│   │   ├── configs
+│   │   │   ├── base.py
+│   │   │   ├── embeddings
+│   │   │   │   ├── base.py
+│   │   │   │   └── __init__.py
+│   │   │   ├── enums.py
+│   │   │   ├── __init__.py
+│   │   │   ├── llms
+│   │   │   │   ├── anthropic.py
+│   │   │   │   ├── aws_bedrock.py
+│   │   │   │   ├── azure.py
+│   │   │   │   ├── base.py
+│   │   │   │   ├── deepseek.py
+│   │   │   │   ├── __init__.py
+│   │   │   │   ├── lmstudio.py
+│   │   │   │   ├── ollama.py
+│   │   │   │   ├── openai.py
+│   │   │   │   └── vllm.py
+│   │   │   ├── prompts.py
+│   │   │   └── vector_stores
+│   │   │       ├── azure_ai_search.py
+│   │   │       ├── azure_mysql.py
+│   │   │       ├── baidu.py
+│   │   │       ├── chroma.py
+│   │   │       ├── databricks.py
+│   │   │       ├── elasticsearch.py
+│   │   │       ├── faiss.py
+│   │   │       ├── __init__.py
+│   │   │       ├── langchain.py
+│   │   │       ├── milvus.py
+│   │   │       ├── mongodb.py
+│   │   │       ├── neptune.py
+│   │   │       ├── opensearch.py
+│   │   │       ├── pgvector.py
+│   │   │       ├── pinecone.py
+│   │   │       ├── qdrant.py
+│   │   │       ├── redis.py
+│   │   │       ├── s3_vectors.py
+│   │   │       ├── supabase.py
+│   │   │       ├── upstash_vector.py
+│   │   │       ├── valkey.py
+│   │   │       ├── vertex_ai_vector_search.py
+│   │   │       └── weaviate.py
+│   │   ├── core
+│   │   ├── embeddings
+│   │   │   ├── aws_bedrock.py
+│   │   │   ├── azure_openai.py
+│   │   │   ├── base.py
+│   │   │   ├── configs.py
+│   │   │   ├── gemini.py
+│   │   │   ├── huggingface.py
+│   │   │   ├── __init__.py
+│   │   │   ├── langchain.py
+│   │   │   ├── lmstudio.py
+│   │   │   ├── mock.py
+│   │   │   ├── ollama.py
+│   │   │   ├── openai.py
+│   │   │   ├── together.py
+│   │   │   └── vertexai.py
+│   │   ├── exceptions.py
+│   │   ├── graphs
+│   │   │   ├── configs.py
+│   │   │   ├── __init__.py
+│   │   │   ├── neptune
+│   │   │   │   ├── base.py
+│   │   │   │   ├── __init__.py
+│   │   │   │   ├── neptunedb.py
+│   │   │   │   └── neptunegraph.py
+│   │   │   ├── tools.py
+│   │   │   └── utils.py
+│   │   ├── __init__.py
+│   │   ├── LICENSE
+│   │   ├── llms
+│   │   │   ├── anthropic.py
+│   │   │   ├── aws_bedrock.py
+│   │   │   ├── azure_openai.py
+│   │   │   ├── azure_openai_structured.py
+│   │   │   ├── base.py
+│   │   │   ├── configs.py
+│   │   │   ├── deepseek.py
+│   │   │   ├── gemini.py
+│   │   │   ├── groq.py
+│   │   │   ├── __init__.py
+│   │   │   ├── langchain.py
+│   │   │   ├── litellm.py
+│   │   │   ├── lmstudio.py
+│   │   │   ├── ollama.py
+│   │   │   ├── openai.py
+│   │   │   ├── openai_structured.py
+│   │   │   ├── sarvam.py
+│   │   │   ├── together.py
+│   │   │   ├── vllm.py
+│   │   │   └── xai.py
+│   │   ├── memory
+│   │   │   ├── base.py
+│   │   │   ├── graph_memory.py
+│   │   │   ├── __init__.py
+│   │   │   ├── kuzu_memory.py
+│   │   │   ├── main.py
+│   │   │   ├── memgraph_memory.py
+│   │   │   ├── setup.py
+│   │   │   ├── storage.py
+│   │   │   ├── telemetry.py
+│   │   │   └── utils.py
+│   │   ├── proxy
+│   │   │   ├── __init__.py
+│   │   │   └── main.py
+│   │   ├── server
+│   │   │   ├── dev.Dockerfile
+│   │   │   ├── docker-compose.yaml
+│   │   │   ├── Dockerfile
+│   │   │   ├── main_old.py
+│   │   │   ├── main.py
+│   │   │   ├── Makefile
+│   │   │   ├── README.md
+│   │   │   └── requirements.txt
+│   │   ├── storage
+│   │   ├── utils
+│   │   │   └── factory.py
+│   │   └── vector_stores
+│   │       ├── azure_ai_search.py
+│   │       ├── azure_mysql.py
+│   │       ├── baidu.py
+│   │       ├── base.py
+│   │       ├── chroma.py
+│   │       ├── configs.py
+│   │       ├── databricks.py
+│   │       ├── elasticsearch.py
+│   │       ├── faiss.py
+│   │       ├── __init__.py
+│   │       ├── langchain.py
+│   │       ├── milvus.py
+│   │       ├── mongodb.py
+│   │       ├── neptune_analytics.py
+│   │       ├── opensearch.py
+│   │       ├── pgvector.py
+│   │       ├── pinecone.py
+│   │       ├── qdrant.py
+│   │       ├── redis.py
+│   │       ├── s3_vectors.py
+│   │       ├── supabase.py
+│   │       ├── upstash_vector.py
+│   │       ├── valkey.py
+│   │       ├── vertex_ai_vector_search.py
+│   │       └── weaviate.py
+│   ├── neomem_history
+│   │   └── history.db
+│   ├── pyproject.toml
+│   ├── README.md
+│   └── requirements.txt
+├── neomem_history
+│   └── history.db
+├── rag
+│   ├── chatlogs
+│   │   └── lyra
+│   │       ├── 0000_Wire_ROCm_to_Cortex.json
+│   │       ├── 0001_Branch___10_22_ct201branch-ssh_tut.json
+│   │       ├── 0002_cortex_LLMs_11-1-25.json
+│   │       ├── 0003_RAG_beta.json
+│   │       ├── 0005_Cortex_v0_4_0_planning.json
+│   │       ├── 0006_Cortex_v0_4_0_Refinement.json
+│   │       ├── 0009_Branch___Cortex_v0_4_0_planning.json
+│   │       ├── 0012_Cortex_4_-_neomem_11-1-25.json
+│   │       ├── 0016_Memory_consolidation_concept.json
+│   │       ├── 0017_Model_inventory_review.json
+│   │       ├── 0018_Branch___Memory_consolidation_concept.json
+│   │       ├── 0022_Branch___Intake_conversation_summaries.json
+│   │       ├── 0026_Intake_conversation_summaries.json
+│   │       ├── 0027_Trilium_AI_LLM_setup.json
+│   │       ├── 0028_LLMs_and_sycophancy_levels.json
+│   │       ├── 0031_UI_improvement_plan.json
+│   │       ├── 0035_10_27-neomem_update.json
+│   │       ├── 0044_Install_llama_cpp_on_ct201.json
+│   │       ├── 0045_AI_task_assistant.json
+│   │       ├── 0047_Project_scope_creation.json
+│   │       ├── 0052_View_docker_container_logs.json
+│   │       ├── 0053_10_21-Proxmox_fan_control.json
+│   │       ├── 0054_10_21-pytorch_branch_Quant_experiments.json
+│   │       ├── 0055_10_22_ct201branch-ssh_tut.json
+│   │       ├── 0060_Lyra_project_folder_issue.json
+│   │       ├── 0062_Build_pytorch_API.json
+│   │       ├── 0063_PokerBrain_dataset_structure.json
+│   │       ├── 0065_Install_PyTorch_setup.json
+│   │       ├── 0066_ROCm_PyTorch_setup_quirks.json
+│   │       ├── 0067_VM_model_setup_steps.json
+│   │       ├── 0070_Proxmox_disk_error_fix.json
+│   │       ├── 0072_Docker_Compose_vs_Portainer.json
+│   │       ├── 0073_Check_system_temps_Proxmox.json
+│   │       ├── 0075_Cortex_gpu_progress.json
+│   │       ├── 0076_Backup_Proxmox_before_upgrade.json
+│   │       ├── 0077_Storage_cleanup_advice.json
+│   │       ├── 0082_Install_ROCm_on_Proxmox.json
+│   │       ├── 0088_Thalamus_program_summary.json
+│   │       ├── 0094_Cortex_blueprint_development.json
+│   │       ├── 0095_mem0_advancments.json
+│   │       ├── 0096_Embedding_provider_swap.json
+│   │       ├── 0097_Update_git_commit_steps.json
+│   │       ├── 0098_AI_software_description.json
+│   │       ├── 0099_Seed_memory_process.json
+│   │       ├── 0100_Set_up_Git_repo.json
+│   │       ├── 0101_Customize_embedder_setup.json
+│   │       ├── 0102_Seeding_Local_Lyra_memory.json
+│   │       ├── 0103_Mem0_seeding_part_3.json
+│   │       ├── 0104_Memory_build_prompt.json
+│   │       ├── 0105_Git_submodule_setup_guide.json
+│   │       ├── 0106_Serve_UI_on_LAN.json
+│   │       ├── 0107_AI_name_suggestion.json
+│   │       ├── 0108_Room_X_planning_update.json
+│   │       ├── 0109_Salience_filtering_design.json
+│   │       ├── 0110_RoomX_Cortex_build.json
+│   │       ├── 0119_Explain_Lyra_cortex_idea.json
+│   │       ├── 0120_Git_submodule_organization.json
+│   │       ├── 0121_Web_UI_fix_guide.json
+│   │       ├── 0122_UI_development_planning.json
+│   │       ├── 0123_NVGRAM_debugging_steps.json
+│   │       ├── 0124_NVGRAM_setup_troubleshooting.json
+│   │       ├── 0125_NVGRAM_development_update.json
+│   │       ├── 0126_RX_-_NeVGRAM_New_Features.json
+│   │       ├── 0127_Error_troubleshooting_steps.json
+│   │       ├── 0135_Proxmox_backup_with_ABB.json
+│   │       ├── 0151_Auto-start_Lyra-Core_VM.json
+│   │       ├── 0156_AI_GPU_benchmarks_comparison.json
+│   │       └── 0251_Lyra_project_handoff.json
+│   ├── chromadb
+│   │   ├── c4f701ee-1978-44a1-9df4-3e865b5d33c1
+│   │   │   ├── data_level0.bin
+│   │   │   ├── header.bin
+│   │   │   ├── index_metadata.pickle
+│   │   │   ├── length.bin
+│   │   │   └── link_lists.bin
+│   │   └── chroma.sqlite3
+│   ├── .env
+│   ├── import.log
+│   ├── lyra-chatlogs
+│   │   ├── 0000_Wire_ROCm_to_Cortex.json
+│   │   ├── 0001_Branch___10_22_ct201branch-ssh_tut.json
+│   │   ├── 0002_cortex_LLMs_11-1-25.json
+│   │   └── 0003_RAG_beta.json
+│   ├── rag_api.py
+│   ├── rag_build.py
+│   ├── rag_chat_import.py
+│   └── rag_query.py
+├── README.md
+├── vllm-mi50.md
+└── volumes
+    ├── neo4j_data
+    │   ├── databases
+    │   │   ├── neo4j
+    │   │   │   ├── database_lock
+    │   │   │   ├── id-buffer.tmp.0
+    │   │   │   ├── neostore
+    │   │   │   ├── neostore.counts.db
+    │   │   │   ├── neostore.indexstats.db
+    │   │   │   ├── neostore.labeltokenstore.db
+    │   │   │   ├── neostore.labeltokenstore.db.id
+    │   │   │   ├── neostore.labeltokenstore.db.names
+    │   │   │   ├── neostore.labeltokenstore.db.names.id
+    │   │   │   ├── neostore.nodestore.db
+    │   │   │   ├── neostore.nodestore.db.id
+    │   │   │   ├── neostore.nodestore.db.labels
+    │   │   │   ├── neostore.nodestore.db.labels.id
+    │   │   │   ├── neostore.propertystore.db
+    │   │   │   ├── neostore.propertystore.db.arrays
+    │   │   │   ├── neostore.propertystore.db.arrays.id
+    │   │   │   ├── neostore.propertystore.db.id
+    │   │   │   ├── neostore.propertystore.db.index
+    │   │   │   ├── neostore.propertystore.db.index.id
+    │   │   │   ├── neostore.propertystore.db.index.keys
+    │   │   │   ├── neostore.propertystore.db.index.keys.id
+    │   │   │   ├── neostore.propertystore.db.strings
+    │   │   │   ├── neostore.propertystore.db.strings.id
+    │   │   │   ├── neostore.relationshipgroupstore.db
+    │   │   │   ├── neostore.relationshipgroupstore.db.id
+    │   │   │   ├── neostore.relationshipgroupstore.degrees.db
+    │   │   │   ├── neostore.relationshipstore.db
+    │   │   │   ├── neostore.relationshipstore.db.id
+    │   │   │   ├── neostore.relationshiptypestore.db
+    │   │   │   ├── neostore.relationshiptypestore.db.id
+    │   │   │   ├── neostore.relationshiptypestore.db.names
+    │   │   │   ├── neostore.relationshiptypestore.db.names.id
+    │   │   │   ├── neostore.schemastore.db
+    │   │   │   ├── neostore.schemastore.db.id
+    │   │   │   └── schema
+    │   │   │       └── index
+    │   │   │           └── token-lookup-1.0
+    │   │   │               ├── 1
+    │   │   │               │   └── index-1
+    │   │   │               └── 2
+    │   │   │                   └── index-2
+    │   │   ├── store_lock
+    │   │   └── system
+    │   │       ├── database_lock
+    │   │       ├── id-buffer.tmp.0
+    │   │       ├── neostore
+    │   │       ├── neostore.counts.db
+    │   │       ├── neostore.indexstats.db
+    │   │       ├── neostore.labeltokenstore.db
+    │   │       ├── neostore.labeltokenstore.db.id
+    │   │       ├── neostore.labeltokenstore.db.names
+    │   │       ├── neostore.labeltokenstore.db.names.id
+    │   │       ├── neostore.nodestore.db
+    │   │       ├── neostore.nodestore.db.id
+    │   │       ├── neostore.nodestore.db.labels
+    │   │       ├── neostore.nodestore.db.labels.id
+    │   │       ├── neostore.propertystore.db
+    │   │       ├── neostore.propertystore.db.arrays
+    │   │       ├── neostore.propertystore.db.arrays.id
+    │   │       ├── neostore.propertystore.db.id
+    │   │       ├── neostore.propertystore.db.index
+    │   │       ├── neostore.propertystore.db.index.id
+    │   │       ├── neostore.propertystore.db.index.keys
+    │   │       ├── neostore.propertystore.db.index.keys.id
+    │   │       ├── neostore.propertystore.db.strings
+    │   │       ├── neostore.propertystore.db.strings.id
+    │   │       ├── neostore.relationshipgroupstore.db
+    │   │       ├── neostore.relationshipgroupstore.db.id
+    │   │       ├── neostore.relationshipgroupstore.degrees.db
+    │   │       ├── neostore.relationshipstore.db
+    │   │       ├── neostore.relationshipstore.db.id
+    │   │       ├── neostore.relationshiptypestore.db
+    │   │       ├── neostore.relationshiptypestore.db.id
+    │   │       ├── neostore.relationshiptypestore.db.names
+    │   │       ├── neostore.relationshiptypestore.db.names.id
+    │   │       ├── neostore.schemastore.db
+    │   │       ├── neostore.schemastore.db.id
+    │   │       └── schema
+    │   │           └── index
+    │   │               ├── range-1.0
+    │   │               │   ├── 3
+    │   │               │   │   └── index-3
+    │   │               │   ├── 4
+    │   │               │   │   └── index-4
+    │   │               │   ├── 7
+    │   │               │   │   └── index-7
+    │   │               │   ├── 8
+    │   │               │   │   └── index-8
+    │   │               │   └── 9
+    │   │               │       └── index-9
+    │   │               └── token-lookup-1.0
+    │   │                   ├── 1
+    │   │                   │   └── index-1
+    │   │                   └── 2
+    │   │                       └── index-2
+    │   ├── dbms
+    │   │   └── auth.ini
+    │   ├── server_id
+    │   └── transactions
+    │       ├── neo4j
+    │       │   ├── checkpoint.0
+    │       │   └── neostore.transaction.db.0
+    │       └── system
+    │           ├── checkpoint.0
+    │           └── neostore.transaction.db.0
+    └── postgres_data  [error opening dir]
+
+81 directories, 376 files