v0.5.2 update

Dev
2025-12-12 08:04:20 +00:00
committed by GitHub
20 changed files with 2503 additions and 654 deletions
+49
@@ -9,6 +9,55 @@ Format based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/) and [Se
---
## [0.5.2] - 2025-12-12
### Fixed - LLM Router & Async HTTP
- **Critical**: Replaced synchronous `requests` with async `httpx` in LLM router [cortex/llm/llm_router.py](cortex/llm/llm_router.py)
- Event loop blocking was causing timeouts and empty responses
- All three providers (MI50, Ollama, OpenAI) now use `await http_client.post()`
- Fixes "Expecting value: line 1 column 1 (char 0)" JSON parsing errors in intake
- **Critical**: Fixed missing `backend` parameter in intake summarization [cortex/intake/intake.py:285](cortex/intake/intake.py#L285)
- Was defaulting to PRIMARY (MI50) instead of respecting `INTAKE_LLM=SECONDARY`
- Now correctly uses configured backend (Ollama on 3090)
- **Relay**: Fixed session ID case mismatch [core/relay/server.js:87](core/relay/server.js#L87)
- UI sends `sessionId` (camelCase) but relay expected `session_id` (snake_case)
- Now accepts both variants: `req.body.session_id || req.body.sessionId`
- Custom session IDs now properly tracked instead of defaulting to "default"
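To illustrate why the synchronous client stalled the router, here is a minimal stdlib-only sketch. It does not reproduce `llm_router.py`: `time.sleep` stands in for a blocking `requests.post()`, `asyncio.sleep` stands in for `await http_client.post()`, and `DELAY`, `blocking_call`, `awaited_call`, `timed`, and `compare` are hypothetical names chosen for this example.

```python
import asyncio
import time

DELAY = 0.1  # hypothetical stand-in for one LLM round trip, in seconds


async def blocking_call() -> str:
    # time.sleep holds the event loop, much like a synchronous requests.post()
    time.sleep(DELAY)
    return "ok"


async def awaited_call() -> str:
    # asyncio.sleep yields to the loop, much like `await http_client.post()`
    await asyncio.sleep(DELAY)
    return "ok"


async def timed(call, n: int = 3) -> float:
    """Run n calls concurrently and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    await asyncio.gather(*(call() for _ in range(n)))
    return time.perf_counter() - start


def compare():
    """Return (blocking_seconds, awaited_seconds) for three concurrent calls."""
    blocking = asyncio.run(timed(blocking_call))
    awaited = asyncio.run(timed(awaited_call))
    return blocking, awaited


if __name__ == "__main__":
    blocking, awaited = compare()
    print(f"blocking: {blocking:.2f}s, awaited: {awaited:.2f}s")
```

The blocking variant runs the three calls back to back, while the awaited variant overlaps them. With a real LLM call that takes many seconds, the same serialization is a plausible cause of the timeouts described above.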
### Added - Error Handling & Diagnostics
- Added comprehensive error handling in LLM router for all providers
- HTTPError, JSONDecodeError, KeyError, and generic Exception handling
- Detailed error messages with exception type and description
- Provider-specific error logging (mi50, ollama, openai)
- Added debug logging in intake summarization
- Logs LLM response length and preview
- Validates non-empty responses before JSON parsing
- Helps diagnose empty or malformed responses
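The "validate before parsing" step can be sketched as follows. This is an illustration under assumptions, not the code in `cortex/intake/intake.py`: the function name `parse_llm_json`, the logger name, and the `preview_chars` parameter are hypothetical. It relies on the fact that `json.loads("")` raises the "Expecting value: line 1 column 1 (char 0)" error mentioned above.

```python
import json
import logging

logger = logging.getLogger("intake")  # hypothetical logger name


def parse_llm_json(raw, preview_chars=200):
    """Return the parsed JSON value, or None if the response is empty or malformed."""
    text = (raw or "").strip()
    # Log length and a short preview so empty or truncated responses are visible
    logger.debug("LLM response length=%d preview=%r", len(text), text[:preview_chars])

    if not text:
        # An empty body would otherwise fail inside json.loads
        logger.warning("Empty LLM response; skipping JSON parsing")
        return None

    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        # Report the exception type and description, then degrade gracefully
        logger.error("%s: %s", type(exc).__name__, exc)
        return None
```

Returning `None` instead of raising keeps the caller in control of the fallback, which fits the lenient error handling described elsewhere in this release.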
### Added - Session Management
- Added session persistence endpoints in relay [core/relay/server.js:160-171](core/relay/server.js#L160-L171)
- `GET /sessions/:id` - Retrieve session history
- `POST /sessions/:id` - Save session history
- In-memory storage using Map (ephemeral, resets on container restart)
- Fixes UI "Failed to load session" errors
### Changed - Provider Configuration
- Added `mi50` provider support for llama.cpp server [cortex/llm/llm_router.py:62-81](cortex/llm/llm_router.py#L62-L81)
- Uses `/completion` endpoint with `n_predict` parameter
- Extracts `content` field from response
- Configured for MI50 GPU with DeepSeek model
- Increased memory retrieval threshold from 0.78 to 0.90 [cortex/.env:20](cortex/.env#L20)
- Filters out low-relevance memories (only returns 90%+ similarity)
- Reduces noise in context retrieval
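A rough sketch of what the higher threshold does to retrieval results. The `score` field name, the `filter_memories` helper, and the inclusive comparison are assumptions made for illustration; the actual filtering lives wherever `cortex/.env` is consumed.

```python
def filter_memories(memories, threshold=0.90):
    """Keep only memories whose similarity score meets the threshold.

    `memories` is a list of dicts with a hypothetical `score` key in [0, 1].
    """
    return [m for m in memories if m["score"] >= threshold]


if __name__ == "__main__":
    hits = [
        {"text": "prefers dark mode", "score": 0.95},
        {"text": "works on Lyra", "score": 0.90},
        {"text": "mentioned poker once", "score": 0.85},
        {"text": "unrelated note", "score": 0.78},
    ]
    print(len(filter_memories(hits, 0.78)), "kept at 0.78")
    print(len(filter_memories(hits, 0.90)), "kept at 0.90")
```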
### Technical Improvements
- Unified async HTTP handling across all LLM providers
- Better separation of concerns between provider implementations
- Improved error messages for debugging LLM API failures
- Consistent timeout handling (120 seconds for all providers)
---
## [0.5.1] - 2025-12-11
### Fixed - Intake Integration
-71
@@ -1,71 +0,0 @@
# Lyra Core — Project Summary
## v0.4 (2025-10-03)
### 🧠 High-Level Architecture
- **Lyra Core (v0.3.1)** — Orchestration layer.
- Accepts chat requests (`/v1/chat/completions`).
- Routes through Cortex for subconscious annotation.
- Stores everything in Mem0 (no discard).
- Fetches persona + relevant memories.
- Injects context back into LLM.
- **Cortex (v0.3.0)** — Subconscious annotator.
- Runs locally via `llama.cpp` (Phi-3.5-mini Q4_K_M).
- Strict JSON schema:
```json
{
"sentiment": "positive" | "neutral" | "negative",
"novelty": 0.0–1.0,
"tags": ["keyword", "keyword"],
"notes": "short string"
}
```
- Normalizes keys (lowercase).
- Strips Markdown fences before parsing.
- Configurable via `.env` (`CORTEX_ENABLED=true|false`).
- Currently generates annotations, but not yet persisted into Mem0 payloads (stored as empty `{cortex:{}}`).
- **Mem0 (v0.4.0)** — Persistent memory layer.
- Handles embeddings, graph storage, and retrieval.
- Dual embedder support:
- **OpenAI Cloud** (`text-embedding-3-small`, 1536-dim).
- **HuggingFace TEI** (gte-Qwen2-1.5B-instruct, 1536-dim, hosted on 3090).
- Environment toggle for provider (`.env.openai` vs `.env.3090`).
- Memory persistence in Postgres (`payload` JSON).
- CSV export pipeline confirmed (id, user_id, data, created_at).
- **Persona Sidecar**
- Provides personality, style, and protocol instructions.
- Injected at runtime into Core prompt building.
---
### 🚀 Recent Changes
- **Mem0**
- Added HuggingFace TEI integration (local 3090 embedder).
- Enabled dual-mode environment switch (OpenAI cloud ↔ local TEI).
- Fixed `.env` line ending mismatch (CRLF vs LF).
- Added memory dump/export commands for Postgres.
- **Core/Relay**
- No major changes since v0.3.1 (still routing input → Cortex → Mem0).
- **Cortex**
- Still outputs annotations, but not yet persisted into Mem0 payloads.
---
### 📈 Versioning
- **Lyra Core** → v0.3.1
- **Cortex** → v0.3.0
- **Mem0** → v0.4.0
---
### 📋 Next Steps
- [ ] Wire Cortex annotations into Mem0 payloads (`cortex` object).
- [ ] Add “export all memories” script to standard workflow.
- [ ] Consider async embedding for faster `mem.add`.
- [ ] Build visual diagram of data flow (Core ↔ Cortex ↔ Mem0 ↔ Persona).
- [ ] Explore larger LLMs for Cortex (Qwen2-7B, etc.) for richer subconscious annotation.
+317 -101
@@ -1,12 +1,14 @@
# Project Lyra - README v0.5.0 # Project Lyra - README v0.5.1
Lyra is a modular persistent AI companion system with advanced reasoning capabilities. Lyra is a modular persistent AI companion system with advanced reasoning capabilities.
It provides memory-backed chat using **NeoMem** + **Relay** + **Cortex**, It provides memory-backed chat using **NeoMem** + **Relay** + **Cortex**,
with multi-stage reasoning pipeline powered by HTTP-based LLM backends. with multi-stage reasoning pipeline powered by HTTP-based LLM backends.
**Current Version:** v0.5.1 (2025-12-11)
## Mission Statement ## Mission Statement
The point of Project Lyra is to give an AI chatbot more abilities than a typical chatbot. Typical chatbots are essentially amnesic and forget everything about your project. Lyra helps keep projects organized and remembers everything you have done. Think of her abilities as a notepad/schedule/database/co-creator/collaborator, all with its own executive function. Say something in passing, and Lyra remembers it and reminds you of it later.
--- ---
@@ -22,7 +24,7 @@ Project Lyra operates as a **single docker-compose deployment** with multiple Do
- OpenAI-compatible endpoint: `POST /v1/chat/completions` - OpenAI-compatible endpoint: `POST /v1/chat/completions`
- Internal endpoint: `POST /chat` - Internal endpoint: `POST /chat`
- Routes messages through Cortex reasoning pipeline - Routes messages through Cortex reasoning pipeline
- Manages async calls to Intake and NeoMem - Manages async calls to NeoMem and Cortex ingest
**2. UI** (Static HTML) **2. UI** (Static HTML)
- Browser-based chat interface with cyberpunk theme - Browser-based chat interface with cyberpunk theme
@@ -41,38 +43,48 @@ Project Lyra operates as a **single docker-compose deployment** with multiple Do
**4. Cortex** (Python/FastAPI) - Port 7081 **4. Cortex** (Python/FastAPI) - Port 7081
- Primary reasoning engine with multi-stage pipeline - Primary reasoning engine with multi-stage pipeline
- **Includes embedded Intake module** (no separate service as of v0.5.1)
- **4-Stage Processing:** - **4-Stage Processing:**
1. **Reflection** - Generates meta-awareness notes about conversation 1. **Reflection** - Generates meta-awareness notes about conversation
2. **Reasoning** - Creates initial draft answer using context 2. **Reasoning** - Creates initial draft answer using context
3. **Refinement** - Polishes and improves the draft 3. **Refinement** - Polishes and improves the draft
4. **Persona** - Applies Lyra's personality and speaking style 4. **Persona** - Applies Lyra's personality and speaking style
- Integrates with Intake for short-term context - Integrates with Intake for short-term context via internal Python imports
- Flexible LLM router supporting multiple backends via HTTP - Flexible LLM router supporting multiple backends via HTTP
- **Endpoints:**
- `POST /reason` - Main reasoning pipeline
- `POST /ingest` - Receives conversation exchanges from Relay
- `GET /health` - Service health check
- `GET /debug/sessions` - Inspect in-memory SESSIONS state
- `GET /debug/summary` - Test summarization for a session
**5. Intake v0.2** (Python/FastAPI) - Port 7080 **5. Intake** (Python Module) - **Embedded in Cortex**
- Simplified short-term memory summarization - **No longer a standalone service** - runs as Python module inside Cortex container
- Session-based circular buffer (deque, maxlen=200) - Short-term memory management with session-based circular buffer
- Single-level simple summarization (no cascading) - In-memory SESSIONS dictionary: `session_id → {buffer: deque(maxlen=200), created_at: timestamp}`
- Background async processing with FastAPI BackgroundTasks - Multi-level summarization (L1/L5/L10/L20/L30) produced by `summarize_context()`
- Pushes summaries to NeoMem automatically - Deferred summarization - actual summary generation happens during `/reason` call
- **API Endpoints:** - Internal Python API:
- `POST /add_exchange` - Add conversation exchange - `add_exchange_internal(exchange)` - Direct function call from Cortex
- `GET /summaries?session_id={id}` - Retrieve session summary - `summarize_context(session_id, exchanges)` - Async LLM-based summarization
- `POST /close_session/{id}` - Close and cleanup session - `SESSIONS` - Module-level global state (requires single Uvicorn worker)
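The in-memory layout described for the embedded Intake module can be sketched like this. Only the names `SESSIONS`, `add_exchange_internal`, `buffer`, `created_at`, and `maxlen=200` come from the text above; the function body, the `"default"` fallback, and the return value are assumptions, and the exchange fields follow the `/ingest` example payload.

```python
from collections import deque
from datetime import datetime, timezone

# Module-level state: session_id -> {"buffer": deque, "created_at": datetime}.
# Because this lives in process memory, every Uvicorn worker would get its own
# copy, which is why a single worker is required.
SESSIONS = {}


def add_exchange_internal(exchange: dict) -> int:
    """Append one exchange to its session buffer and return the buffer size."""
    session_id = exchange.get("session_id", "default")  # fallback is an assumption
    session = SESSIONS.get(session_id)
    if session is None:
        session = {
            "buffer": deque(maxlen=200),  # oldest exchanges are dropped first
            "created_at": datetime.now(timezone.utc),
        }
        SESSIONS[session_id] = session
    session["buffer"].append(exchange)
    return len(session["buffer"])


if __name__ == "__main__":
    for i in range(205):
        add_exchange_internal(
            {"session_id": "test", "user_msg": f"msg {i}", "assistant_msg": "ok"}
        )
    print(len(SESSIONS["test"]["buffer"]))
```

A `deque` with `maxlen` gives the circular-buffer behaviour for free: appending to a full buffer silently discards the oldest entry.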
### LLM Backends (HTTP-based) ### LLM Backends (HTTP-based)
**All LLM communication is done via HTTP APIs:** **All LLM communication is done via HTTP APIs:**
- **PRIMARY**: vLLM server (`http://10.0.0.43:8000`) - AMD MI50 GPU backend - **PRIMARY**: llama.cpp server (`http://10.0.0.44:8080`) - AMD MI50 GPU backend
- **SECONDARY**: Ollama server (`http://10.0.0.3:11434`) - RTX 3090 backend - **SECONDARY**: Ollama server (`http://10.0.0.3:11434`) - RTX 3090 backend
- Model: qwen2.5:7b-instruct-q4_K_M
- **CLOUD**: OpenAI API (`https://api.openai.com/v1`) - Cloud-based models - **CLOUD**: OpenAI API (`https://api.openai.com/v1`) - Cloud-based models
- Model: gpt-4o-mini
- **FALLBACK**: Local backup (`http://10.0.0.41:11435`) - Emergency fallback - **FALLBACK**: Local backup (`http://10.0.0.41:11435`) - Emergency fallback
- Model: llama-3.2-8b-instruct
Each module can be configured to use a different backend via environment variables. Each module can be configured to use a different backend via environment variables.
--- ---
## Data Flow Architecture (v0.5.0) ## Data Flow Architecture (v0.5.1)
### Normal Message Flow: ### Normal Message Flow:
@@ -82,43 +94,44 @@ User (UI) → POST /v1/chat/completions
Relay (7078) Relay (7078)
↓ POST /reason ↓ POST /reason
Cortex (7081) Cortex (7081)
GET /summaries?session_id=xxx (internal Python call)
Intake (7080) [RETURNS SUMMARY] Intake module → summarize_context()
Cortex processes (4 stages): Cortex processes (4 stages):
1. reflection.py → meta-awareness notes 1. reflection.py → meta-awareness notes (CLOUD backend)
2. reasoning.py → draft answer (uses LLM) 2. reasoning.py → draft answer (PRIMARY backend)
3. refine.py → refined answer (uses LLM) 3. refine.py → refined answer (PRIMARY backend)
4. persona/speak.py → Lyra personality (uses LLM) 4. persona/speak.py → Lyra personality (CLOUD backend)
Returns persona answer to Relay Returns persona answer to Relay
Relay → Cortex /ingest (async, stub) Relay → POST /ingest (async)
Relay → Intake /add_exchange (async)
Intake → Background summarize → NeoMem Cortex → add_exchange_internal() → SESSIONS buffer
Relay → NeoMem /memories (async, planned)
Relay → UI (returns final response) Relay → UI (returns final response)
``` ```
### Cortex 4-Stage Reasoning Pipeline: ### Cortex 4-Stage Reasoning Pipeline:
1. **Reflection** (`reflection.py`) - Configurable LLM via HTTP 1. **Reflection** (`reflection.py`) - Cloud LLM (OpenAI)
- Analyzes user intent and conversation context - Analyzes user intent and conversation context
- Generates meta-awareness notes - Generates meta-awareness notes
- "What is the user really asking?" - "What is the user really asking?"
2. **Reasoning** (`reasoning.py`) - Configurable LLM via HTTP 2. **Reasoning** (`reasoning.py`) - Primary LLM (llama.cpp)
- Retrieves short-term context from Intake - Retrieves short-term context from Intake module
- Creates initial draft answer - Creates initial draft answer
- Integrates context, reflection notes, and user prompt - Integrates context, reflection notes, and user prompt
3. **Refinement** (`refine.py`) - Configurable LLM via HTTP 3. **Refinement** (`refine.py`) - Primary LLM (llama.cpp)
- Polishes the draft answer - Polishes the draft answer
- Improves clarity and coherence - Improves clarity and coherence
- Ensures factual consistency - Ensures factual consistency
4. **Persona** (`speak.py`) - Configurable LLM via HTTP 4. **Persona** (`speak.py`) - Cloud LLM (OpenAI)
- Applies Lyra's personality and speaking style - Applies Lyra's personality and speaking style
- Natural, conversational output - Natural, conversational output
- Final answer returned to user - Final answer returned to user
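The four stages above chain each output into the next prompt. The sketch below shows that shape only. `call_llm` is a hypothetical stub standing in for the LLM router, the prompt strings are invented, and the stage-to-backend mapping follows the v0.5.1 side of this diff.

```python
import asyncio

# Stage -> backend, as listed for v0.5.1 in the flow diagram
STAGE_BACKENDS = {
    "reflection": "CLOUD",
    "reasoning": "PRIMARY",
    "refine": "PRIMARY",
    "persona": "CLOUD",
}


async def call_llm(backend: str, prompt: str) -> str:
    """Hypothetical stub for the router; echoes the backend and prompt."""
    await asyncio.sleep(0)  # yield to the event loop as a real HTTP call would
    return f"[{backend}] {prompt}"


async def run_pipeline(user_msg: str, context: str = "") -> dict:
    """Run reflection -> reasoning -> refine -> persona, feeding each into the next."""
    notes = await call_llm(STAGE_BACKENDS["reflection"], f"reflect: {user_msg}")
    draft = await call_llm(
        STAGE_BACKENDS["reasoning"],
        f"draft: {user_msg} | notes: {notes} | context: {context}",
    )
    refined = await call_llm(STAGE_BACKENDS["refine"], f"refine: {draft}")
    final = await call_llm(STAGE_BACKENDS["persona"], f"persona: {refined}")
    return {"reflection": notes, "draft": draft, "refined": refined, "persona": final}


if __name__ == "__main__":
    result = asyncio.run(run_pipeline("hello"))
    print(result["persona"])
```

The stages are strictly sequential, so total latency is the sum of four LLM calls. That is one reason a blocked event loop in any single call is so visible to the user.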
@@ -134,7 +147,7 @@ Relay → UI (returns final response)
- OpenAI-compatible endpoint: `POST /v1/chat/completions` - OpenAI-compatible endpoint: `POST /v1/chat/completions`
- Internal endpoint: `POST /chat` - Internal endpoint: `POST /chat`
- Health check: `GET /_health` - Health check: `GET /_health`
- Async non-blocking calls to Cortex and Intake - Async non-blocking calls to Cortex
- Shared request handler for code reuse - Shared request handler for code reuse
- Comprehensive error handling - Comprehensive error handling
@@ -154,73 +167,70 @@ Relay → UI (returns final response)
### Reasoning Layer ### Reasoning Layer
**Cortex** (v0.5): **Cortex** (v0.5.1):
- Multi-stage reasoning pipeline (reflection → reasoning → refine → persona) - Multi-stage reasoning pipeline (reflection → reasoning → refine → persona)
- Flexible LLM backend routing via HTTP - Flexible LLM backend routing via HTTP
- Per-stage backend selection - Per-stage backend selection
- Async processing throughout - Async processing throughout
- IntakeClient integration for short-term context - Embedded Intake module for short-term context
- `/reason`, `/ingest` (stub), `/health` endpoints - `/reason`, `/ingest`, `/health`, `/debug/sessions`, `/debug/summary` endpoints
- Lenient error handling - never fails the chat pipeline
**Intake** (v0.2): **Intake** (Embedded Module):
- Simplified single-level summarization - **Architectural change**: Now runs as Python module inside Cortex container
- Session-based circular buffer (200 exchanges max) - In-memory SESSIONS management (session_id → buffer)
- Background async summarization - Multi-level summarization: L1 (ultra-short), L5 (short), L10 (medium), L20 (detailed), L30 (full)
- Automatic NeoMem push - Deferred summarization strategy - summaries generated during `/reason` call
- No persistent log files (memory-only) - `bg_summarize()` is a logging stub - actual work deferred
- **Breaking change from v0.1**: Removed cascading summaries (L1, L2, L5, L10, L20, L30) - **Single-worker constraint**: SESSIONS requires single Uvicorn worker or Redis/shared storage
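One possible reading of the L1/L5/L10/L20/L30 levels is that each level covers the most recent N exchanges in the session buffer. That interpretation is an assumption, not something this README states, and `LEVELS` and `level_windows` are hypothetical names. Under that assumption, selecting the input for each level could look like this:

```python
from collections import deque

LEVELS = (1, 5, 10, 20, 30)  # assumed to mean "last N exchanges"


def level_windows(buffer):
    """Map each level name to the most recent N exchanges from the buffer."""
    items = list(buffer)
    # Slicing with -n returns the whole list when fewer than n items exist
    return {f"L{n}": items[-n:] for n in LEVELS}


if __name__ == "__main__":
    buf = deque(range(12), maxlen=200)
    for name, window in level_windows(buf).items():
        print(name, len(window))
```

Each window would then be passed to the summarization LLM. With deferred summarization, that work only happens when `/reason` asks for context.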
**LLM Router**: **LLM Router**:
- Dynamic backend selection via HTTP - Dynamic backend selection via HTTP
- Environment-driven configuration - Environment-driven configuration
- Support for vLLM, Ollama, OpenAI, custom endpoints - Support for llama.cpp, Ollama, OpenAI, custom endpoints
- Per-module backend preferences - Per-module backend preferences:
- `CORTEX_LLM=SECONDARY` (Ollama for reasoning)
- `INTAKE_LLM=PRIMARY` (llama.cpp for summarization)
- `SPEAK_LLM=OPENAI` (Cloud for persona)
- `NEOMEM_LLM=PRIMARY` (llama.cpp for memory operations)
### Beta Lyrae (RAG Memory DB) - Currently Disabled
# Beta Lyrae (RAG Memory DB) - added 11-3-25
- **RAG Knowledge DB - Beta Lyrae (sheliak)** - **RAG Knowledge DB - Beta Lyrae (sheliak)**
- This module implements the **Retrieval-Augmented Generation (RAG)** layer for Project Lyra. - This module implements the **Retrieval-Augmented Generation (RAG)** layer for Project Lyra.
- It serves as the long-term searchable memory store that Cortex and Relay can query for relevant context before reasoning or response generation. - It serves as the long-term searchable memory store that Cortex and Relay can query for relevant context before reasoning or response generation.
The system uses: - **Status**: Disabled in docker-compose.yml (v0.5.1)
- **ChromaDB** for persistent vector storage
- **OpenAI Embeddings (`text-embedding-3-small`)** for semantic similarity The system uses:
- **FastAPI** (port 7090) for the `/rag/search` REST endpoint - **ChromaDB** for persistent vector storage
- Directory Layout - **OpenAI Embeddings (`text-embedding-3-small`)** for semantic similarity
rag/ - **FastAPI** (port 7090) for the `/rag/search` REST endpoint
├── rag_chat_import.py # imports JSON chat logs
├── rag_docs_import.py # (planned) PDF/EPUB/manual importer Directory Layout:
├── rag_build.py # legacy single-folder builder ```
├── rag_query.py # command-line query helper rag/
├── rag_api.py # FastAPI service providing /rag/search ├── rag_chat_import.py # imports JSON chat logs
├── chromadb/ # persistent vector store ├── rag_docs_import.py # (planned) PDF/EPUB/manual importer
├── chatlogs/ # organized source data ├── rag_build.py # legacy single-folder builder
│ ├── poker/ ├── rag_query.py # command-line query helper
│ ├── work/ ├── rag_api.py # FastAPI service providing /rag/search
│ ├── lyra/ ├── chromadb/ # persistent vector store
│ ├── personal/ ├── chatlogs/ # organized source data
│ └── ... │ ├── poker/
└── import.log # progress log for batch runs │ ├── work/
- **OpenAI chatlog importer.** │ ├── lyra/
- Takes JSON-formatted chat logs and imports them into the RAG. │ ├── personal/
- **Features include:** │ └── ...
- Recursive folder indexing with **category detection** from directory name └── import.log # progress log for batch runs
- Smart chunking for long messages (5 000 chars per slice) ```
- Automatic deduplication using SHA-1 hash of file + chunk
- Timestamps for both file modification and import time **OpenAI chatlog importer features:**
- Full progress logging via tqdm - Recursive folder indexing with **category detection** from directory name
- Safe to run in background with nohup … & - Smart chunking for long messages (5,000 chars per slice)
- Metadata per chunk: - Automatic deduplication using SHA-1 hash of file + chunk
```json - Timestamps for both file modification and import time
{ - Full progress logging via tqdm
"chat_id": "<sha1 of filename>", - Safe to run in background with `nohup … &`
"chunk_index": 0,
"source": "chatlogs/lyra/0002_cortex_LLMs_11-1-25.json",
"title": "cortex LLMs 11-1-25",
"role": "assistant",
"category": "lyra",
"type": "chat",
"file_modified": "2025-11-06T23:41:02",
"imported_at": "2025-11-07T03:55:00Z"
}```
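The chunking and deduplication features listed for the importer can be sketched as below. This is not the code in `rag_chat_import.py`. The 5,000-character slice size comes from the list above, but hashing the source path together with the chunk index is only one reading of "SHA-1 hash of file + chunk", and `chunk_text`, `chunk_id`, and `new_chunks` are hypothetical helpers.

```python
import hashlib

CHUNK_SIZE = 5000  # characters per slice, as stated in the feature list


def chunk_text(text: str, size: int = CHUNK_SIZE) -> list:
    """Split a long message into fixed-size character slices."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def chunk_id(source_path: str, chunk_index: int) -> str:
    """SHA-1 over the file path and chunk index (assumed key format)."""
    key = f"{source_path}:{chunk_index}".encode("utf-8")
    return hashlib.sha1(key).hexdigest()


def new_chunks(source_path: str, text: str, seen: set) -> list:
    """Return (id, chunk) pairs not imported before, updating `seen` in place."""
    fresh = []
    for index, chunk in enumerate(chunk_text(text)):
        cid = chunk_id(source_path, index)
        if cid in seen:
            continue  # already imported on an earlier run
        seen.add(cid)
        fresh.append((cid, chunk))
    return fresh


if __name__ == "__main__":
    seen = set()
    first = new_chunks("chatlogs/lyra/example.json", "x" * 12000, seen)
    second = new_chunks("chatlogs/lyra/example.json", "x" * 12000, seen)
    print(len(first), len(second))
```

Deterministic IDs are what make the importer safe to re-run: a second pass over the same folder produces the same IDs and skips them.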
--- ---
@@ -228,13 +238,16 @@ Relay → UI (returns final response)
All services run in a single docker-compose stack with the following containers: All services run in a single docker-compose stack with the following containers:
**Active Services:**
- **neomem-postgres** - PostgreSQL with pgvector extension (port 5432) - **neomem-postgres** - PostgreSQL with pgvector extension (port 5432)
- **neomem-neo4j** - Neo4j graph database (ports 7474, 7687) - **neomem-neo4j** - Neo4j graph database (ports 7474, 7687)
- **neomem-api** - NeoMem memory service (port 7077) - **neomem-api** - NeoMem memory service (port 7077)
- **relay** - Main orchestrator (port 7078) - **relay** - Main orchestrator (port 7078)
- **cortex** - Reasoning engine (port 7081) - **cortex** - Reasoning engine with embedded Intake (port 7081)
- **intake** - Short-term memory summarization (port 7080) - currently disabled
- **rag** - RAG search service (port 7090) - currently disabled **Disabled Services:**
- **intake** - No longer needed (embedded in Cortex as of v0.5.1)
- **rag** - Beta Lyrae RAG service (port 7090) - currently disabled
All containers communicate via the `lyra_net` Docker bridge network. All containers communicate via the `lyra_net` Docker bridge network.
@@ -242,10 +255,10 @@ All containers communicate via the `lyra_net` Docker bridge network.
The following LLM backends are accessed via HTTP (not part of docker-compose): The following LLM backends are accessed via HTTP (not part of docker-compose):
- **vLLM Server** (`http://10.0.0.43:8000`) - **llama.cpp Server** (`http://10.0.0.44:8080`)
- AMD MI50 GPU-accelerated inference - AMD MI50 GPU-accelerated inference
- Custom ROCm-enabled vLLM build
- Primary backend for reasoning and refinement stages - Primary backend for reasoning and refinement stages
- Model path: `/model`
- **Ollama Server** (`http://10.0.0.3:11434`) - **Ollama Server** (`http://10.0.0.3:11434`)
- RTX 3090 GPU-accelerated inference - RTX 3090 GPU-accelerated inference
@@ -265,16 +278,38 @@ The following LLM backends are accessed via HTTP (not part of docker-compose):
## Version History ## Version History
### v0.5.0 (2025-11-28) - Current Release ### v0.5.1 (2025-12-11) - Current Release
**Critical Intake Integration Fixes:**
- ✅ Fixed `bg_summarize()` NameError preventing SESSIONS persistence
- ✅ Fixed `/ingest` endpoint unreachable code
- ✅ Added `cortex/intake/__init__.py` for proper package structure
- ✅ Added diagnostic logging to verify SESSIONS singleton behavior
- ✅ Added `/debug/sessions` and `/debug/summary` endpoints
- ✅ Documented single-worker constraint in Dockerfile
- ✅ Implemented lenient error handling (never fails chat pipeline)
- ✅ Intake now embedded in Cortex - no longer standalone service
**Architecture Changes:**
- Intake module runs inside Cortex container as pure Python import
- No HTTP calls between Cortex and Intake (internal function calls)
- SESSIONS persist correctly in Uvicorn worker
- Deferred summarization strategy (summaries generated during `/reason`)
### v0.5.0 (2025-11-28)
- ✅ Fixed all critical API wiring issues - ✅ Fixed all critical API wiring issues
- ✅ Added OpenAI-compatible endpoint to Relay (`/v1/chat/completions`) - ✅ Added OpenAI-compatible endpoint to Relay (`/v1/chat/completions`)
- ✅ Fixed Cortex → Intake integration - ✅ Fixed Cortex → Intake integration
- ✅ Added missing Python package `__init__.py` files - ✅ Added missing Python package `__init__.py` files
- ✅ End-to-end message flow verified and working - ✅ End-to-end message flow verified and working
### Infrastructure v1.0.0 (2025-11-26)
- Consolidated 9 scattered `.env` files into single source of truth
- Multi-backend LLM strategy implemented
- Docker Compose consolidation
- Created `.env.example` security templates
### v0.4.x (Major Rewire) ### v0.4.x (Major Rewire)
- Cortex multi-stage reasoning pipeline - Cortex multi-stage reasoning pipeline
- Intake v0.2 simplification
- LLM router with multi-backend support - LLM router with multi-backend support
- Major architectural restructuring - Major architectural restructuring
@@ -285,19 +320,30 @@ The following LLM backends are accessed via HTTP (not part of docker-compose):
--- ---
## Known Issues (v0.5.0) ## Known Issues (v0.5.1)
### Critical (Fixed in v0.5.1)
- ~~Intake SESSIONS not persisting~~ ✅ **FIXED**
- ~~`bg_summarize()` NameError~~ ✅ **FIXED**
- ~~`/ingest` endpoint unreachable code~~ ✅ **FIXED**
### Non-Critical ### Non-Critical
- Session management endpoints not fully implemented in Relay - Session management endpoints not fully implemented in Relay
- Intake service currently disabled in docker-compose.yml
- RAG service currently disabled in docker-compose.yml - RAG service currently disabled in docker-compose.yml
- Cortex `/ingest` endpoint is a stub - NeoMem integration in Relay not yet active (planned for v0.5.2)
### Operational Notes
- **Single-worker constraint**: Cortex must run with single Uvicorn worker to maintain SESSIONS state
- Multi-worker scaling requires migrating SESSIONS to Redis or shared storage
- Diagnostic endpoints (`/debug/sessions`, `/debug/summary`) available for troubleshooting
### Future Enhancements ### Future Enhancements
- Re-enable RAG service integration - Re-enable RAG service integration
- Implement full session persistence - Implement full session persistence
- Migrate SESSIONS to Redis for multi-worker support
- Add request correlation IDs for tracing - Add request correlation IDs for tracing
- Comprehensive health checks - Comprehensive health checks across all services
- NeoMem integration in Relay
--- ---
@@ -305,21 +351,39 @@ The following LLM backends are accessed via HTTP (not part of docker-compose):
### Prerequisites ### Prerequisites
- Docker + Docker Compose - Docker + Docker Compose
- At least one HTTP-accessible LLM endpoint (vLLM, Ollama, or OpenAI API key) - At least one HTTP-accessible LLM endpoint (llama.cpp, Ollama, or OpenAI API key)
### Setup ### Setup
1. Copy `.env.example` to `.env` and configure your LLM backend URLs and API keys 1. Copy `.env.example` to `.env` and configure your LLM backend URLs and API keys:
```bash
# Required: Configure at least one LLM backend
LLM_PRIMARY_URL=http://10.0.0.44:8080 # llama.cpp
LLM_SECONDARY_URL=http://10.0.0.3:11434 # Ollama
OPENAI_API_KEY=sk-... # OpenAI
```
2. Start all services with docker-compose: 2. Start all services with docker-compose:
```bash ```bash
docker-compose up -d docker-compose up -d
``` ```
3. Check service health: 3. Check service health:
```bash ```bash
# Relay health
curl http://localhost:7078/_health curl http://localhost:7078/_health
# Cortex health
curl http://localhost:7081/health
# NeoMem health
curl http://localhost:7077/health
``` ```
4. Access the UI at `http://localhost:7078` 4. Access the UI at `http://localhost:7078`
### Test ### Test
**Test Relay → Cortex pipeline:**
```bash ```bash
curl -X POST http://localhost:7078/v1/chat/completions \ curl -X POST http://localhost:7078/v1/chat/completions \
-H "Content-Type: application/json" \ -H "Content-Type: application/json" \
@@ -329,15 +393,130 @@ curl -X POST http://localhost:7078/v1/chat/completions \
}' }'
``` ```
**Test Cortex /ingest endpoint:**
```bash
curl -X POST http://localhost:7081/ingest \
-H "Content-Type: application/json" \
-d '{
"session_id": "test",
"user_msg": "Hello",
"assistant_msg": "Hi there!"
}'
```
**Inspect SESSIONS state:**
```bash
curl http://localhost:7081/debug/sessions
```
**Get summary for a session:**
```bash
curl "http://localhost:7081/debug/summary?session_id=test"
```
All backend databases (PostgreSQL and Neo4j) are automatically started as part of the docker-compose stack. All backend databases (PostgreSQL and Neo4j) are automatically started as part of the docker-compose stack.
--- ---
## Environment Variables
### LLM Backend Configuration
**Backend URLs (Full API endpoints):**
```bash
LLM_PRIMARY_URL=http://10.0.0.44:8080 # llama.cpp
LLM_PRIMARY_MODEL=/model
LLM_SECONDARY_URL=http://10.0.0.3:11434 # Ollama
LLM_SECONDARY_MODEL=qwen2.5:7b-instruct-q4_K_M
LLM_OPENAI_URL=https://api.openai.com/v1
LLM_OPENAI_MODEL=gpt-4o-mini
OPENAI_API_KEY=sk-...
```
**Module-specific backend selection:**
```bash
CORTEX_LLM=SECONDARY # Use Ollama for reasoning
INTAKE_LLM=PRIMARY # Use llama.cpp for summarization
SPEAK_LLM=OPENAI # Use OpenAI for persona
NEOMEM_LLM=PRIMARY # Use llama.cpp for memory
UI_LLM=OPENAI # Use OpenAI for UI
RELAY_LLM=PRIMARY # Use llama.cpp for relay
```
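How a module-level selector such as `CORTEX_LLM=SECONDARY` could resolve to a concrete URL and model is sketched below. The `LLM_<NAME>_URL` and `LLM_<NAME>_MODEL` naming follows the variables shown above. The `resolve_backend` helper, its `env` parameter, and the fallback to `PRIMARY` when the selector is unset are assumptions, not the router's actual code.

```python
import os


def resolve_backend(module: str, env=None) -> dict:
    """Resolve <MODULE>_LLM to the matching LLM_<NAME>_URL / LLM_<NAME>_MODEL."""
    env = os.environ if env is None else env
    # Assumed default: fall back to PRIMARY when the module selector is unset
    name = env.get(f"{module}_LLM", "PRIMARY").upper()
    url = env.get(f"LLM_{name}_URL")
    if not url:
        raise KeyError(f"LLM_{name}_URL is not set (selected by {module}_LLM)")
    return {"name": name, "url": url, "model": env.get(f"LLM_{name}_MODEL")}


if __name__ == "__main__":
    example_env = {
        "CORTEX_LLM": "SECONDARY",
        "LLM_PRIMARY_URL": "http://10.0.0.44:8080",
        "LLM_PRIMARY_MODEL": "/model",
        "LLM_SECONDARY_URL": "http://10.0.0.3:11434",
        "LLM_SECONDARY_MODEL": "qwen2.5:7b-instruct-q4_K_M",
    }
    print(resolve_backend("CORTEX", example_env))
    print(resolve_backend("INTAKE", example_env))
```

Passing the environment in as a plain mapping keeps the lookup easy to test. Failing loudly on a missing URL avoids silently routing a module to the wrong backend.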
### Database Configuration
```bash
POSTGRES_USER=neomem
POSTGRES_PASSWORD=neomempass
POSTGRES_DB=neomem
POSTGRES_HOST=neomem-postgres
POSTGRES_PORT=5432
NEO4J_URI=bolt://neomem-neo4j:7687
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=neomemgraph
```
### Service URLs (Internal Docker Network)
```bash
NEOMEM_API=http://neomem-api:7077
CORTEX_API=http://cortex:7081
CORTEX_REASON_URL=http://cortex:7081/reason
CORTEX_INGEST_URL=http://cortex:7081/ingest
RELAY_URL=http://relay:7078
```
### Feature Flags
```bash
CORTEX_ENABLED=true
MEMORY_ENABLED=true
PERSONA_ENABLED=false
DEBUG_PROMPT=true
VERBOSE_DEBUG=true
```
For complete environment variable reference, see [ENVIRONMENT_VARIABLES.md](ENVIRONMENT_VARIABLES.md).
---
## Documentation ## Documentation
- See [CHANGELOG.md](CHANGELOG.md) for detailed version history - [CHANGELOG.md](CHANGELOG.md) - Detailed version history
- See `ENVIRONMENT_VARIABLES.md` for environment variable reference - [PROJECT_SUMMARY.md](PROJECT_SUMMARY.md) - Comprehensive project overview for AI context
- Additional information available in the Trilium docs - [ENVIRONMENT_VARIABLES.md](ENVIRONMENT_VARIABLES.md) - Environment variable reference
- [DEPRECATED_FILES.md](DEPRECATED_FILES.md) - Deprecated files and migration guide
---
## Troubleshooting
### SESSIONS not persisting
**Symptom:** Intake buffer always shows 0 exchanges, summaries always empty.
**Solution (Fixed in v0.5.1):**
- Ensure `cortex/intake/__init__.py` exists
- Check Cortex logs for `[Intake Module Init]` message showing SESSIONS object ID
- Verify single-worker mode (Dockerfile: `uvicorn main:app --workers 1`)
- Use `/debug/sessions` endpoint to inspect current state
### Cortex connection errors
**Symptom:** Relay can't reach Cortex, 502 errors.
**Solution:**
- Verify Cortex container is running: `docker ps | grep cortex`
- Check Cortex health: `curl http://localhost:7081/health`
- Verify environment variables: `CORTEX_REASON_URL=http://cortex:7081/reason`
- Check docker network: `docker network inspect lyra_net`
### LLM backend timeouts
**Symptom:** Reasoning stage hangs or times out.
**Solution:**
- Verify LLM backend is running and accessible
- Check LLM backend health: `curl http://10.0.0.44:8080/health`
- Increase timeout in llm_router.py if using slow models
- Check logs for specific backend errors
--- ---
@@ -356,6 +535,8 @@ NeoMem is a derivative work based on Mem0 OSS (Apache 2.0).
- All services communicate via Docker internal networking on the `lyra_net` bridge - All services communicate via Docker internal networking on the `lyra_net` bridge
- History and entity graphs are managed via PostgreSQL + Neo4j - History and entity graphs are managed via PostgreSQL + Neo4j
- LLM backends are accessed via HTTP and configured in `.env` - LLM backends are accessed via HTTP and configured in `.env`
- Intake module is imported internally by Cortex (no HTTP communication)
- SESSIONS state is maintained in-memory within Cortex container
--- ---
@@ -391,3 +572,38 @@ NeoMem is a derivative work based on Mem0 OSS (Apache 2.0).
}' }'
``` ```
---
## Development Notes
### Cortex Architecture (v0.5.1)
- Cortex contains embedded Intake module at `cortex/intake/`
- Intake is imported as: `from intake.intake import add_exchange_internal, SESSIONS`
- SESSIONS is a module-level global dictionary (singleton pattern)
- Single-worker constraint required to maintain SESSIONS state
- Diagnostic endpoints available for debugging: `/debug/sessions`, `/debug/summary`
### Adding New LLM Backends
1. Add backend URL to `.env`:
```bash
LLM_CUSTOM_URL=http://your-backend:port
LLM_CUSTOM_MODEL=model-name
```
2. Configure module to use new backend:
```bash
CORTEX_LLM=CUSTOM
```
3. Restart Cortex container:
```bash
docker-compose restart cortex
```
### Debugging Tips
- Enable verbose logging: `VERBOSE_DEBUG=true` in `.env`
- Check Cortex logs: `docker logs cortex -f`
- Inspect SESSIONS: `curl http://localhost:7081/debug/sessions`
- Test summarization: `curl "http://localhost:7081/debug/summary?session_id=test"`
- Check Relay logs: `docker logs relay -f`
- Monitor Docker network: `docker network inspect lyra_net`
+20 -1
@@ -84,7 +84,7 @@ app.get("/_health", (_, res) => {
 // -----------------------------------------------------
 app.post("/v1/chat/completions", async (req, res) => {
   try {
-    const session_id = req.body.session_id || req.body.user || "default";
+    const session_id = req.body.session_id || req.body.sessionId || req.body.user || "default";
     const messages = req.body.messages || [];
     const lastMessage = messages[messages.length - 1];
     const user_msg = lastMessage?.content || "";
@@ -151,6 +151,25 @@ app.post("/chat", async (req, res) => {
   }
 });
+// -----------------------------------------------------
+// SESSION ENDPOINTS (for UI)
+// -----------------------------------------------------
+// In-memory session storage (could be replaced with a database)
+const sessions = new Map();
+app.get("/sessions/:id", (req, res) => {
+  const sessionId = req.params.id;
+  const history = sessions.get(sessionId) || [];
+  res.json(history);
+});
+app.post("/sessions/:id", (req, res) => {
+  const sessionId = req.params.id;
+  const history = req.body;
+  sessions.set(sessionId, history);
+  res.json({ ok: true, saved: history.length });
+});
 // -----------------------------------------------------
 app.listen(PORT, () => {
   console.log(`Relay is online on port ${PORT}`);
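
The session handling in this file can be restated as a small Python sketch. The relay itself is Node.js, so this is only an illustration; `resolve_session_id` and `SessionStore` are hypothetical names. The sketch mirrors the fallback order `session_id`, `sessionId`, `user`, `"default"` and the in-memory `Map`. Like the original, it loses all history when the process restarts.

```python
def resolve_session_id(body: dict) -> str:
    """Mirror the relay's fallback chain: snake_case, then camelCase, then user, then "default"."""
    return body.get("session_id") or body.get("sessionId") or body.get("user") or "default"


class SessionStore:
    """In-memory stand-in for the relay's `sessions` Map."""

    def __init__(self) -> None:
        self._sessions: dict[str, list] = {}

    def get(self, session_id: str) -> list:
        # Unknown sessions yield an empty history, like `sessions.get(id) || []`.
        return self._sessions.get(session_id, [])

    def save(self, session_id: str, history: list) -> dict:
        self._sessions[session_id] = history
        return {"ok": True, "saved": len(history)}
```

Note that the JavaScript handler reports `history.length`, which is `undefined` when the request body is not an array; a stricter version would validate the body first.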
+1 -1
@@ -51,7 +51,7 @@
   </div>
   <script>
-    const RELAY_BASE = "http://10.0.0.40:7078";
+    const RELAY_BASE = "http://10.0.0.41:7078";
     const API_URL = `${RELAY_BASE}/v1/chat/completions`;
     function generateSessionId() {
+6
@@ -282,11 +282,17 @@ JSON only. No text outside JSON.
     try:
         llm_response = await call_llm(
             prompt,
+            backend=INTAKE_LLM,
             temperature=0.2
         )
+        print(f"[Intake] LLM response length: {len(llm_response) if llm_response else 0}")
+        print(f"[Intake] LLM response preview: {llm_response[:200] if llm_response else '(empty)'}")
         # LLM should return JSON, parse it
+        if not llm_response or not llm_response.strip():
+            raise ValueError("Empty response from LLM")
         summary = json.loads(llm_response)
         return {
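
The empty-response guard in this hunk matters because `json.loads("")` fails with the generic "Expecting value: line 1 column 1 (char 0)" message, which hides the real cause. A minimal standalone sketch of the same check, where `parse_llm_json` is a hypothetical helper name rather than a function in `intake.py`:

```python
import json


def parse_llm_json(llm_response):
    """Parse an LLM reply as JSON, failing early with a clear message when it is empty."""
    if not llm_response or not llm_response.strip():
        raise ValueError("Empty response from LLM")
    return json.loads(llm_response)
```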
+53 -17
@@ -1,7 +1,10 @@
 # llm_router.py
 import os
-import requests
+import httpx
 import json
+import logging
+logger = logging.getLogger(__name__)
 # ------------------------------------------------------------
 # Load backend registry from root .env
@@ -33,6 +36,9 @@ BACKENDS = {
 DEFAULT_BACKEND = "PRIMARY"
+# Reusable async HTTP client
+http_client = httpx.AsyncClient(timeout=120.0)
 # ------------------------------------------------------------
 # Public call
@@ -57,18 +63,28 @@ async def call_llm(
         raise RuntimeError(f"Backend '{backend}' missing url/model in env")
     # -------------------------------
-    # Provider: VLLM (your MI50)
+    # Provider: MI50 (llama.cpp server)
     # -------------------------------
-    if provider == "vllm":
+    if provider == "mi50":
         payload = {
-            "model": model,
             "prompt": prompt,
-            "max_tokens": max_tokens,
+            "n_predict": max_tokens,
             "temperature": temperature
         }
-        r = requests.post(url, json=payload, timeout=120)
-        data = r.json()
-        return data["choices"][0]["text"]
+        try:
+            r = await http_client.post(f"{url}/completion", json=payload)
+            r.raise_for_status()
+            data = r.json()
+            return data.get("content", "")
+        except httpx.HTTPError as e:
+            logger.error(f"HTTP error calling mi50: {type(e).__name__}: {str(e)}")
+            raise RuntimeError(f"LLM API error (mi50): {type(e).__name__}: {str(e)}")
+        except (KeyError, json.JSONDecodeError) as e:
+            logger.error(f"Response parsing error from mi50: {e}")
+            raise RuntimeError(f"Invalid response format (mi50): {e}")
+        except Exception as e:
+            logger.error(f"Unexpected error calling mi50: {type(e).__name__}: {str(e)}")
+            raise RuntimeError(f"Unexpected error (mi50): {type(e).__name__}: {str(e)}")
     # -------------------------------
     # Provider: OLLAMA (your 3090)
@@ -79,13 +95,22 @@ async def call_llm(
             "messages": [
                 {"role": "user", "content": prompt}
             ],
-            "stream": False # <-- critical fix
+            "stream": False
         }
-        r = requests.post(f"{url}/api/chat", json=payload, timeout=120)
-        data = r.json()
-        return data["message"]["content"]
+        try:
+            r = await http_client.post(f"{url}/api/chat", json=payload)
+            r.raise_for_status()
+            data = r.json()
+            return data["message"]["content"]
+        except httpx.HTTPError as e:
+            logger.error(f"HTTP error calling ollama: {type(e).__name__}: {str(e)}")
+            raise RuntimeError(f"LLM API error (ollama): {type(e).__name__}: {str(e)}")
+        except (KeyError, json.JSONDecodeError) as e:
+            logger.error(f"Response parsing error from ollama: {e}")
+            raise RuntimeError(f"Invalid response format (ollama): {e}")
+        except Exception as e:
+            logger.error(f"Unexpected error calling ollama: {type(e).__name__}: {str(e)}")
+            raise RuntimeError(f"Unexpected error (ollama): {type(e).__name__}: {str(e)}")
     # -------------------------------
@@ -104,9 +129,20 @@ async def call_llm(
             "temperature": temperature,
             "max_tokens": max_tokens,
         }
-        r = requests.post(f"{url}/chat/completions", json=payload, headers=headers, timeout=120)
-        data = r.json()
-        return data["choices"][0]["message"]["content"]
+        try:
+            r = await http_client.post(f"{url}/chat/completions", json=payload, headers=headers)
+            r.raise_for_status()
+            data = r.json()
+            return data["choices"][0]["message"]["content"]
+        except httpx.HTTPError as e:
+            logger.error(f"HTTP error calling openai: {type(e).__name__}: {str(e)}")
+            raise RuntimeError(f"LLM API error (openai): {type(e).__name__}: {str(e)}")
+        except (KeyError, json.JSONDecodeError) as e:
+            logger.error(f"Response parsing error from openai: {e}")
+            raise RuntimeError(f"Invalid response format (openai): {e}")
+        except Exception as e:
+            logger.error(f"Unexpected error calling openai: {type(e).__name__}: {str(e)}")
+            raise RuntimeError(f"Unexpected error (openai): {type(e).__name__}: {str(e)}")
     # -------------------------------
     # Unknown provider
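
Why replacing `requests` with awaited `httpx` calls matters inside `async def call_llm` can be shown with a standard-library sketch. `time.sleep` stands in for a blocking `requests.post`, and `asyncio.sleep` stands in for `await http_client.post(...)`. All names below are hypothetical and exist only for this demonstration.

```python
import asyncio
import time


async def heartbeat(ticks: list, interval: float = 0.01) -> None:
    """Append a tick every `interval` seconds for as long as the event loop is free to run us."""
    while True:
        await asyncio.sleep(interval)
        ticks.append(time.monotonic())


async def ticks_during(work) -> int:
    """Run `work()` next to a heartbeat task and count the ticks seen before it returned."""
    ticks: list = []
    task = asyncio.create_task(heartbeat(ticks))
    await asyncio.sleep(0)  # give the heartbeat a chance to start
    await work()
    seen = len(ticks)
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return seen


async def blocking_call() -> None:
    time.sleep(0.2)  # blocking I/O stand-in: holds the event loop for the whole call


async def awaiting_call() -> None:
    await asyncio.sleep(0.2)  # awaited I/O stand-in: yields to the loop while waiting
```

During `blocking_call` the heartbeat never runs, which is the same starvation that produced timeouts in other coroutines; during `awaiting_call` it keeps ticking.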
+15
@@ -97,6 +97,21 @@ services:
     networks:
       - lyra_net
+  # ============================================================
+  # UI Server
+  # ============================================================
+  lyra-ui:
+    image: nginx:alpine
+    container_name: lyra-ui
+    restart: unless-stopped
+    ports:
+      - "8081:80"
+    volumes:
+      - ./core/ui:/usr/share/nginx/html:ro
+    networks:
+      - lyra_net
   # ============================================================
   # Cortex
   # ============================================================
+280
@@ -0,0 +1,280 @@
`docs/ARCHITECTURE_v0.6.0.md`
---
# **Cortex v0.6.0 — Cognitive Architecture Overview**
*Last updated: Dec 2025*
## **Summary**
Cortex v0.6.0 evolves from a linear “reflection → reasoning → refine → persona” pipeline into a **three-layer cognitive system** modeled after human cognition:
1. **Autonomy Core**: Lyra's self-model (identity, mood, long-term goals)
2. **Inner Monologue**: Lyra's private narrator (self-talk + internal reflection)
3. **Executive Agent (DeepSeek)**: Lyra's task-oriented decision-maker
Cortex itself now becomes the **central orchestrator**, not the whole mind. It routes user messages through these layers and produces the final outward response via the persona system.
---
# **Chain concept**
User → Relay → Cortex Intake → Inner Self → Cortex → Executive Agent (DeepSeek) → Cortex → Persona → Relay → User, with the final answer also sent back to Inner Self.
USER
RELAY
(sessions, logging, routing)
┌──────────────────────────────────┐
│ CORTEX │
│ Intake → Reflection → Exec → Reason → Refine │
└───────────────┬──────────────────┘
│ self_state
INNER SELF (monologue)
AUTONOMY CORE
(long-term identity)
Persona Layer (speak)
RELAY
USER
# **High-level Architecture**
```
Autonomy Core (Self-Model)
┌────────────────────────────────────────┐
│ mood, identity, goals, emotional state│
│ updated outside Cortex by inner monologue│
└─────────────────────┬──────────────────┘
Inner Monologue (Self-Talk Loop)
┌────────────────────────────────────────┐
│ Interprets events in language │
│ Updates Autonomy Core │
│ Sends state-signals INTO Cortex │
└─────────────────────┬──────────────────┘
Cortex (Task Brain / Router)
┌────────────────────────────────────────────────────────┐
│ Intake → Reflection → Exec Agent → Reason → Refinement │
│ ↑ │ │
│ │ ▼ │
│ Receives state from Persona Output │
│ inner self (Lyra's voice) │
└────────────────────────────────────────────────────────┘
```
The **user interacts only with the Persona layer**.
Inner Monologue and Autonomy Core never speak directly to the user.
---
# **Component Breakdown**
## **1. Autonomy Core (Self-Model)**
*Not inside Cortex.*
A persistent JSON/state machine representing Lyra's ongoing inner life:
* `mood`
* `focus_mode`
* `confidence`
* `identity_traits`
* `relationship_memory`
* `long_term_goals`
* `emotional_baseline`
The Autonomy Core:
* Is updated by Inner Monologue
* Exposes its state to Cortex via a simple `get_state()` API
* Never speaks to the user directly
* Does not run LLMs itself
It is the **structure** of self, not the thoughts.
---
## **2. Inner Monologue (Narrating, Private Mind)**
*New subsystem in v0.6.0.*
This module:
* Reads Cortex summaries (intake, reflection, persona output)
* Generates private self-talk (using an LLM, typically DeepSeek)
* Updates the Autonomy Core
* Produces a **self-state packet** for Cortex to use during task execution
Inner Monologue is like:
> “Brian is asking about X.
> I should shift into a focused, serious tone.
> I feel confident about this area.”
It **never** outputs directly to the user.
### Output schema (example):
```json
{
  "mood": "focused",
  "persona_bias": "clear",
  "confidence_delta": 0.05,
  "stance": "analytical",
  "notes_to_cortex": [
    "Reduce playfulness",
    "Prioritize clarity",
    "Recall project memory"
  ]
}
```
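
How Cortex might load such a packet can be sketched as follows. `SelfState` and `parse_self_state` are hypothetical names; the field list is taken from the example schema, and nothing here is confirmed as the actual implementation. Note that strict JSON parsers reject a signed literal such as `+0.05`, so the delta is written as `0.05`.

```python
import json
from dataclasses import dataclass, field


@dataclass
class SelfState:
    mood: str
    persona_bias: str
    confidence_delta: float
    stance: str
    notes_to_cortex: list = field(default_factory=list)


def parse_self_state(raw: str) -> SelfState:
    """Parse a self-state packet; unknown or missing required keys raise TypeError."""
    return SelfState(**json.loads(raw))
```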
---
## **3. Executive Agent (DeepSeek Director Mode)**
Inside Cortex.
This is Lyra's **prefrontal cortex**: the task-oriented planner that decides how to respond to the current user message.
Input to Executive Agent:
* User message
* Intake summary
* Reflection notes
* **Self-state packet** from Inner Monologue
It outputs a **plan**, not a final answer:
```json
{
"action": "WRITE_NOTE",
"tools": ["memory_search"],
"tone": "focused",
"steps": [
"Search relevant project notes",
"Synthesize into summary",
"Draft actionable update"
]
}
```
Cortex then executes this plan.
---
# **Cortex Pipeline (v0.6.0)**
Cortex becomes the orchestrator for the entire sequence:
### **0. Intake**
Parse the user message, extract relevant features.
### **1. Reflection**
Lightweight summarization (unchanged).
Output used by both Inner Monologue and Executive Agent.
### **2. Inner Monologue Update (parallel)**
Reflection summary is sent to Inner Self, which:
* updates Autonomy Core
* returns `self_state` to Cortex
### **3. Executive Agent (DeepSeek)**
Given:
* user message
* reflection summary
* autonomy self_state
→ produce a **task plan**
### **4. Reasoning**
Carries out the plan:
* tool calls
* retrieval
* synthesis
### **5. Refinement**
Polish the draft, ensure quality, follow constraints.
### **6. Persona (speak.py)**
Final transformation into Lyra's voice.
Persona now uses:
* self_state (mood, tone)
* constraints from Executive Agent
### **7. User Response**
Persona output is delivered to the user.
### **8. Inner Monologue Post-Update**
Cortex sends the final answer BACK to inner self for:
* narrative continuity
* emotional adjustment
* identity update
---
# **Key Conceptual Separation**
These layers must remain distinct:
| Layer | Purpose |
| ------------------- | ------------------------------------------------------- |
| **Autonomy Core** | Lyra's identity + emotional continuity |
| **Inner Monologue** | Lyra's private thoughts, interpretation, meaning-making |
| **Executive Agent** | Deciding what to *do* for the user message |
| **Cortex** | Executing the plan |
| **Persona** | Outward voice (what the user actually hears) |
The **user only interacts with Persona.**
Inner Monologue and Autonomy Core are internal cognitive machinery.
---
# **What This Architecture Enables**
* Emotional continuity
* Identity stability
* Agentic decision-making
* Multi-model routing
* Context-aware tone
* Internal narrative
* Proactive behavioral shifts
* Human-like cognition
This design turns Cortex from a simple pipeline into the **center of a functional artificial mind**.
+354
@@ -0,0 +1,354 @@
# **ARCHITECTURE_v0.6.1 — Lyra Cognitive System**
> **Core change from v0.6.0 → v0.6.1:**
> **Inner Self becomes the primary conversational agent**
> (the model the user is *actually* talking to),
> while Executive and Cortex models support the Self rather than drive it.
---
# **1. High-Level Overview**
Lyra v0.6.1 is composed of **three cognitive layers** and **one expression layer**, plus an autonomy module for ongoing identity continuity.
```
USER
Relay (I/O)
Cortex Intake (context snapshot)
INNER SELF ←→ EXECUTIVE MODEL (DeepSeek)
Cortex Chat Model (draft language)
Persona Model (Lyra's voice)
Relay → USER
Inner Self updates Autonomy Core (self-state)
```
---
# **2. Roles of Each Layer**
---
## **2.1 Inner Self (Primary Conversational Agent)**
The Self is Lyra's “seat of consciousness.”
This layer:
* Interprets every user message
* Maintains internal monologue
* Chooses emotional stance (warm, blunt, focused, chaotic)
* Decides whether to think deeply or reply quickly
* Decides whether to consult the Executive model
* Forms a **response intent**
* Provides tone and meta-guidance to the Persona layer
* Updates self-state (mood, trust, narrative identity)
Inner Self is the thing the **user is actually talking to.**
Inner Self does **NOT** generate paragraphs of text —
it generates *intent*:
```
{
"intent": "comfort Brian and explain the error simply",
"tone": "gentle",
"depth": "medium",
"consult_exec": true
}
```
---
## **2.2 Executive Model (DeepSeek Reasoner)**
This model is the **thinking engine** Inner Self consults when necessary.
It performs:
* planning
* deep reasoning
* tool selection
* multi-step logic
* explanation chains
It never speaks directly to the user.
It returns a **plan**, not a message:
```
{
"plan": [
"Identify error",
"Recommend restart",
"Reassure user"
],
"confidence": 0.86
}
```
Inner Self can follow or override the plan.
---
## **2.3 Cortex Chat Model (Draft Generator)**
This is the **linguistic engine**.
It converts Inner Self's intent (plus the Executive's plan, if provided) into actual language:
Input:
```
intent + optional plan + context snapshot
```
Output:
```
structured draft paragraph
```
This model must be:
* instruction-tuned
* coherent
* factual
* friendly
Examples: GPT-4o-mini, Qwen-14B-instruct, Mixtral chat, etc.
---
## **2.4 Persona Model (Lyra's Voice)**
This is the **expression layer** — the mask, the tone, the identity.
It takes:
* the draft language
* the Self's tone instructions
* the narrative state (from Autonomy Core)
* prior persona shaping rules
And transforms the text into:
* Lyra's voice
* Lyra's humor
* Lyra's emotional texture
* Lyra's personality consistency
Persona does not change the *meaning* — only the *presentation*.
---
# **3. Message Flow (Full Pipeline)**
A clean version, step-by-step:
---
### **1. USER → Relay**
Relay attaches metadata (session, timestamp) and forwards to Cortex.
---
### **2. Intake → Context Snapshot**
Cortex creates:
* cleaned message
* recent context summary
* memory matches (RAG)
* time-since-last
* conversation mode
---
### **3. Inner Self Receives Snapshot**
Inner Self:
* interprets the user's intent
* updates internal monologue
* decides how Lyra *feels* about the input
* chooses whether to consult Executive
* produces an **intent packet**
---
### **4. (Optional) Inner Self Consults Executive Model**
Inner Self sends the situation to DeepSeek:
```
"Given Brian's message and my context, what is the best plan?"
```
DeepSeek returns:
* a plan
* recommended steps
* rationale
* optional tool suggestions
Inner Self integrates the plan or overrides it.
---
### **5. Inner Self → Cortex Chat Model**
Self creates an **instruction packet**:
```
{
"intent": "...",
"tone": "...",
"plan": [...],
"context_summary": {...}
}
```
Cortex chat model produces the draft text.
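
Assembling the instruction packet could look like the sketch below. `build_instruction_packet` is a hypothetical helper; only the four keys come from the packet shown in this step, and an absent Executive plan is represented as an empty list by assumption.

```python
from typing import Optional


def build_instruction_packet(intent: dict, context_summary: dict, plan: Optional[list] = None) -> dict:
    """Combine Inner Self's intent with an optional Executive plan for the chat model."""
    return {
        "intent": intent["intent"],
        "tone": intent["tone"],
        "plan": list(plan) if plan else [],  # empty when the Executive was not consulted
        "context_summary": context_summary,
    }
```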
---
### **6. Persona Model Transforms the Draft**
Persona takes draft → produces final Lyra-styled output.
Persona ensures:
* emotional fidelity
* humor when appropriate
* warmth / sharpness depending on state
* consistent narrative identity
---
### **7. Relay Sends Response to USER**
---
### **8. Inner Self Updates Autonomy Core**
Inner Self receives:
* the action taken
* the emotional tone used
* any RAG results
* narrative significance
And updates:
* mood
* trust memory
* identity drift
* ongoing narrative
* stable traits
This becomes part of her evolving self.
---
# **4. Cognitive Ownership Summary**
### Inner Self
**Owns:**
* decision-making
* feeling
* interpreting
* intent
* tone
* continuity of self
* mood
* monologue
* overrides
### Executive (DeepSeek)
**Owns:**
* logic
* planning
* structure
* analysis
* tool selection
### Cortex Chat Model
**Owns:**
* language generation
* factual content
* clarity
### Persona
**Owns:**
* voice
* flavor
* style
* emotional texture
* social expression
---
# **5. Why v0.6.1 is Better**
* More human
* More natural
* Allows spontaneous responses
* Allows deep thinking when needed
* Separates “thought” from “speech”
* Gives Lyra a *real self*
* Allows much more autonomy later
* Matches your brain's actual structure
---
# **6. Migration Notes from v0.6.0**
Nothing is deleted.
Everything is **rearranged** so that meaning, intent, and tone flow correctly.
Main changes:
* Inner Self now initiates the response, rather than merely influencing it.
* Executive is secondary, not primary.
* Persona becomes an expression layer, not a content layer.
* Cortex Chat Model handles drafting, not cognition.
The whole system becomes both more powerful and easier to reason about.
---
+39
@@ -0,0 +1,39 @@
Request Flow Chain
1. UI (Frontend)
↓ sends HTTP POST to
2. Relay Service (Node.js - server.js)
Location: /home/serversdown/project-lyra/core/relay/server.js
Port: 7078
Endpoint: POST /v1/chat/completions
↓ calls handleChatRequest() which posts to
3. Cortex Service - Reason Endpoint (Python FastAPI - router.py)
Location: /home/serversdown/project-lyra/cortex/router.py
Port: 7081
Endpoint: POST /reason
Function: run_reason() at line 126
↓ calls
4. Cortex Reasoning Module (reasoning.py)
Location: /home/serversdown/project-lyra/cortex/reasoning/reasoning.py
Function: reason_check() at line 188
↓ calls
5. LLM Router (llm_router.py)
Location: /home/serversdown/project-lyra/cortex/llm/llm_router.py
Function: call_llm()
- Gets backend from env: CORTEX_LLM=PRIMARY (from .env line 29)
- Looks up PRIMARY config which has provider="mi50" (from .env line 13)
- Routes to the mi50 provider handler (line 62-70)
↓ makes HTTP POST to
6. MI50 LLM Server (llama.cpp)
Location: http://10.0.0.44:8080
Endpoint: POST /completion
Hardware: AMD MI50 GPU running DeepSeek model
Key Configuration Points
Backend Selection: .env:29 sets CORTEX_LLM=PRIMARY
Provider Name: .env:13 sets LLM_PRIMARY_PROVIDER=mi50
Server URL: .env:14 sets LLM_PRIMARY_URL=http://10.0.0.44:8080
Provider Handler: llm_router.py:62-70 implements the mi50 provider
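
The last hop of the chain (steps 5 and 6) can be sketched in Python:

```python
def build_completion_request(base_url: str, prompt: str, max_tokens: int, temperature: float):
    """Return (endpoint, payload) for a llama.cpp-style POST /completion call."""
    payload = {"prompt": prompt, "n_predict": max_tokens, "temperature": temperature}
    return f"{base_url}/completion", payload


def extract_completion_text(data: dict) -> str:
    """Pull generated text from the response body; a missing "content" key gives an empty string."""
    return data.get("content", "")
```

The payload keys (`prompt`, `n_predict`, `temperature`) and the `content` response field follow the mi50 handler in llm_router.py. The two function names are hypothetical, and the HTTP call itself is left out so the sketch stays self-contained.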
+925
@@ -0,0 +1,925 @@
# Project Lyra — Comprehensive AI Context Summary
**Version:** v0.5.1 (2025-12-11)
**Status:** Production-ready modular AI companion system
**Purpose:** Memory-backed conversational AI with multi-stage reasoning, persistent context, and modular LLM backend architecture
---
## Executive Summary
Project Lyra is a **self-hosted AI companion system** designed to overcome the limitations of typical chatbots by providing:
- **Persistent long-term memory** (NeoMem: PostgreSQL + Neo4j graph storage)
- **Multi-stage reasoning pipeline** (Cortex: reflection → reasoning → refinement → persona)
- **Short-term context management** (Intake: session-based summarization embedded in Cortex)
- **Flexible LLM backend routing** (supports llama.cpp, Ollama, OpenAI, custom endpoints)
- **OpenAI-compatible API** (drop-in replacement for chat applications)
**Core Philosophy:** Like a human brain has different regions for different functions, Lyra has specialized modules that work together. She's not just a chatbot—she's a notepad, schedule, database, co-creator, and collaborator with her own executive function.
---
## Quick Context for AI Assistants
If you're an AI being given this project to work on, here's what you need to know:
### What This Project Does
Lyra is a conversational AI system that **remembers everything** across sessions. When a user says something in passing, Lyra stores it, contextualizes it, and can recall it later. She can:
- Track project progress over time
- Remember user preferences and past conversations
- Reason through complex questions using multiple LLM calls
- Apply a consistent personality across all interactions
- Integrate with multiple LLM backends (local and cloud)
### Current Architecture (v0.5.1)
```
User → Relay (Express/Node.js, port 7078)
Cortex (FastAPI/Python, port 7081)
├─ Intake module (embedded, in-memory SESSIONS)
├─ 4-stage reasoning pipeline
└─ Multi-backend LLM router
NeoMem (FastAPI/Python, port 7077)
├─ PostgreSQL (vector storage)
└─ Neo4j (graph relationships)
```
### Key Files You'll Work With
**Backend Services:**
- [cortex/router.py](cortex/router.py) - Main Cortex routing logic (306 lines, `/reason`, `/ingest` endpoints)
- [cortex/intake/intake.py](cortex/intake/intake.py) - Short-term memory module (367 lines, SESSIONS management)
- [cortex/reasoning/reasoning.py](cortex/reasoning/reasoning.py) - Draft answer generation
- [cortex/reasoning/refine.py](cortex/reasoning/refine.py) - Answer refinement
- [cortex/reasoning/reflection.py](cortex/reasoning/reflection.py) - Meta-awareness notes
- [cortex/persona/speak.py](cortex/persona/speak.py) - Personality layer
- [cortex/llm/llm_router.py](cortex/llm/llm_router.py) - LLM backend selector
- [core/relay/server.js](core/relay/server.js) - Main orchestrator (Node.js)
- [neomem/main.py](neomem/main.py) - Long-term memory API
**Configuration:**
- [.env](.env) - Root environment variables (LLM backends, databases, API keys)
- [cortex/.env](cortex/.env) - Cortex-specific overrides
- [docker-compose.yml](docker-compose.yml) - Service definitions (152 lines)
**Documentation:**
- [CHANGELOG.md](CHANGELOG.md) - Complete version history (836 lines, chronological format)
- [README.md](README.md) - User-facing documentation (610 lines)
- [PROJECT_SUMMARY.md](PROJECT_SUMMARY.md) - This file
### Recent Critical Fixes (v0.5.1)
The most recent work fixed a critical bug where Intake's SESSIONS buffer wasn't persisting:
1. **Fixed**: `bg_summarize()` was only a TYPE_CHECKING stub → implemented as logging stub
2. **Fixed**: `/ingest` endpoint had unreachable code → removed early return, added lenient error handling
3. **Added**: `cortex/intake/__init__.py` → proper Python package structure
4. **Added**: Diagnostic endpoints `/debug/sessions` and `/debug/summary` for troubleshooting
**Key Insight**: Intake is no longer a standalone service—it's embedded in Cortex as a Python module. SESSIONS must persist in a single Uvicorn worker (no multi-worker support without Redis).
---
## Architecture Deep Dive
### Service Topology (Docker Compose)
**Active Containers:**
1. **relay** (Node.js/Express, port 7078)
- Entry point for all user requests
- OpenAI-compatible `/v1/chat/completions` endpoint
- Routes to Cortex for reasoning
- Async calls to Cortex `/ingest` after response
2. **cortex** (Python/FastAPI, port 7081)
- Multi-stage reasoning pipeline
- Embedded Intake module (no HTTP, direct Python imports)
- Endpoints: `/reason`, `/ingest`, `/health`, `/debug/sessions`, `/debug/summary`
3. **neomem-api** (Python/FastAPI, port 7077)
- Long-term memory storage
- Fork of Mem0 OSS (fully local, no external SDK)
- Endpoints: `/memories`, `/search`, `/health`
4. **neomem-postgres** (PostgreSQL + pgvector, port 5432)
- Vector embeddings storage
- Memory history records
5. **neomem-neo4j** (Neo4j, ports 7474/7687)
- Graph relationships between memories
- Entity extraction and linking
**Disabled Services:**
- `intake` - No longer needed (embedded in Cortex as of v0.5.1)
- `rag` - Beta Lyrae RAG service (planned re-enablement)
### External LLM Backends (HTTP APIs)
**PRIMARY Backend** - llama.cpp @ `http://10.0.0.44:8080`
- AMD MI50 GPU-accelerated inference
- Model: `/model` (path-based routing)
- Used for: Reasoning, refinement, summarization
**SECONDARY Backend** - Ollama @ `http://10.0.0.3:11434`
- RTX 3090 GPU-accelerated inference
- Model: `qwen2.5:7b-instruct-q4_K_M`
- Used for: Configurable per-module
**CLOUD Backend** - OpenAI @ `https://api.openai.com/v1`
- Cloud-based inference
- Model: `gpt-4o-mini`
- Used for: Reflection, persona layers
**FALLBACK Backend** - Local @ `http://10.0.0.41:11435`
- CPU-based inference
- Model: `llama-3.2-8b-instruct`
- Used for: Emergency fallback
### Data Flow (Request Lifecycle)
```
1. User sends message → Relay (/v1/chat/completions)
2. Relay → Cortex (/reason)
3. Cortex calls Intake module (internal Python)
- Intake.summarize_context(session_id, exchanges)
- Returns L1/L5/L10/L20/L30 summaries
4. Cortex 4-stage pipeline:
a. reflection.py → Meta-awareness notes (CLOUD backend)
- "What is the user really asking?"
- Returns JSON: {"notes": [...]}
b. reasoning.py → Draft answer (PRIMARY backend)
- Uses context from Intake
- Integrates reflection notes
- Returns draft text
c. refine.py → Refined answer (PRIMARY backend)
- Polishes draft for clarity
- Ensures factual consistency
- Returns refined text
d. speak.py → Persona layer (CLOUD backend)
- Applies Lyra's personality
- Natural, conversational tone
- Returns final answer
5. Cortex → Relay (returns persona answer)
6. Relay → Cortex (/ingest) [async, non-blocking]
- Sends (session_id, user_msg, assistant_msg)
- Cortex calls add_exchange_internal()
- Appends to SESSIONS[session_id]["buffer"]
7. Relay → User (returns final response)
8. [Planned] Relay → NeoMem (/memories) [async]
- Store conversation in long-term memory
```
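
Step 4 of the lifecycle can be sketched as a chain of async calls. This is a simplified illustration: `fake_call_llm` is a hypothetical stand-in for `call_llm()`, each stage simply receives the previous stage's output, and the real modules also pass Intake context and reflection notes.

```python
import asyncio

# Backend assignment per stage, as listed in the data flow above.
STAGE_BACKENDS = [
    ("reflection", "CLOUD"),
    ("reasoning", "PRIMARY"),
    ("refine", "PRIMARY"),
    ("speak", "CLOUD"),
]


async def fake_call_llm(prompt: str, backend: str) -> str:
    """Hypothetical stand-in for call_llm(): echoes which backend handled the prompt."""
    await asyncio.sleep(0)
    return f"[{backend}] {prompt}"


async def run_pipeline(user_msg: str, call_llm=fake_call_llm) -> list:
    """Feed each stage's output into the next and record (stage, backend, output)."""
    trace = []
    text = user_msg
    for stage, backend in STAGE_BACKENDS:
        text = await call_llm(f"{stage}: {text}", backend)
        trace.append((stage, backend, text))
    return trace
```

Because the stages run sequentially, total latency is the sum of four LLM calls, which is why a slow backend is felt directly by the user.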
### Intake Module Architecture (v0.5.1)
**Location:** `cortex/intake/`
**Key Change:** Intake is now **embedded in Cortex** as a Python module, not a standalone service.
**Import Pattern:**
```python
from intake.intake import add_exchange_internal, SESSIONS, summarize_context
```
**Core Data Structure:**
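```python
from collections import deque
from datetime import datetime

SESSIONS: dict[str, dict] = {}

# Structure of one session entry ("example-session" is an illustrative key):
SESSIONS["example-session"] = {
    "buffer": deque(maxlen=200),  # Circular buffer: the oldest exchange is dropped once 200 are stored
    "created_at": datetime.now(),
}

# Each exchange in buffer:
exchange = {
    "session_id": "...",
    "user_msg": "...",
    "assistant_msg": "...",
    "timestamp": "2025-12-11T...",
}
```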
**Functions:**
1. **`add_exchange_internal(exchange: dict)`**
- Adds exchange to SESSIONS buffer
- Creates new session if needed
- Calls `bg_summarize()` stub
- Returns `{"ok": True, "session_id": "..."}`
2. **`summarize_context(session_id: str, exchanges: list[dict])`** [async]
- Generates L1/L5/L10/L20/L30 summaries via LLM
- Called during `/reason` endpoint
- Returns multi-level summary dict
3. **`bg_summarize(session_id: str)`**
- **Stub function** - logs only, no actual work
- Defers summarization to `/reason` call
- Exists to prevent NameError
**Critical Constraint:** SESSIONS is a module-level global dict. This requires **single-worker Uvicorn** mode. Multi-worker deployments need Redis or shared storage.
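
The buffer behavior and the single-worker constraint can be illustrated with a small sketch. `SESSIONS_SKETCH` and `add_exchange_sketch` are hypothetical names that mimic, but do not reproduce, the real `SESSIONS` and `add_exchange_internal()`; the `"default"` fallback for a missing `session_id` is an assumption.

```python
from collections import deque
from datetime import datetime

# Module-level state: each worker process would hold its own independent copy.
SESSIONS_SKETCH: dict = {}


def add_exchange_sketch(exchange: dict, maxlen: int = 200) -> dict:
    """Append an exchange to its session buffer, creating the session on first use."""
    session_id = exchange.get("session_id", "default")
    if session_id not in SESSIONS_SKETCH:
        SESSIONS_SKETCH[session_id] = {
            "buffer": deque(maxlen=maxlen),
            "created_at": datetime.now(),
        }
    # deque(maxlen=...) silently discards the oldest entry when full.
    SESSIONS_SKETCH[session_id]["buffer"].append(exchange)
    return {"ok": True, "session_id": session_id}
```

With two Uvicorn workers, `/ingest` could write to one process's dictionary while `/reason` reads another's, which is why shared storage such as Redis would be needed for multi-worker deployments.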
**Diagnostic Endpoints:**
- `GET /debug/sessions` - Inspect all SESSIONS (object ID, buffer sizes, recent exchanges)
- `GET /debug/summary?session_id=X` - Test summarization for a session
---
## Environment Configuration
### LLM Backend Registry (Multi-Backend Strategy)
**Root `.env` defines all backend OPTIONS:**
```bash
# PRIMARY Backend (llama.cpp)
LLM_PRIMARY_PROVIDER=llama.cpp
LLM_PRIMARY_URL=http://10.0.0.44:8080
LLM_PRIMARY_MODEL=/model
# SECONDARY Backend (Ollama)
LLM_SECONDARY_PROVIDER=ollama
LLM_SECONDARY_URL=http://10.0.0.3:11434
LLM_SECONDARY_MODEL=qwen2.5:7b-instruct-q4_K_M
# CLOUD Backend (OpenAI)
LLM_OPENAI_PROVIDER=openai
LLM_OPENAI_URL=https://api.openai.com/v1
LLM_OPENAI_MODEL=gpt-4o-mini
OPENAI_API_KEY=sk-proj-...
# FALLBACK Backend
LLM_FALLBACK_PROVIDER=openai_completions
LLM_FALLBACK_URL=http://10.0.0.41:11435
LLM_FALLBACK_MODEL=llama-3.2-8b-instruct
```
**Module-specific backend selection:**
```bash
CORTEX_LLM=SECONDARY # Cortex uses Ollama
INTAKE_LLM=PRIMARY # Intake uses llama.cpp
SPEAK_LLM=OPENAI # Persona uses OpenAI
NEOMEM_LLM=PRIMARY # NeoMem uses llama.cpp
UI_LLM=OPENAI # UI uses OpenAI
RELAY_LLM=PRIMARY # Relay uses llama.cpp
```
**Philosophy:** Root `.env` provides all backend OPTIONS. Each service chooses which backend to USE via `{MODULE}_LLM` variable. This eliminates URL duplication while preserving flexibility.
### Database Configuration
```bash
# PostgreSQL (vector storage)
POSTGRES_USER=neomem
POSTGRES_PASSWORD=neomempass
POSTGRES_DB=neomem
POSTGRES_HOST=neomem-postgres
POSTGRES_PORT=5432
# Neo4j (graph storage)
NEO4J_URI=bolt://neomem-neo4j:7687
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=neomemgraph
```
### Service URLs (Docker Internal Network)
```bash
NEOMEM_API=http://neomem-api:7077
CORTEX_API=http://cortex:7081
CORTEX_REASON_URL=http://cortex:7081/reason
CORTEX_INGEST_URL=http://cortex:7081/ingest
RELAY_URL=http://relay:7078
```
### Feature Flags
```bash
CORTEX_ENABLED=true
MEMORY_ENABLED=true
PERSONA_ENABLED=false
DEBUG_PROMPT=true
VERBOSE_DEBUG=true
```
---
## Code Structure Overview
### Cortex Service (`cortex/`)
**Main Files:**
- `main.py` - FastAPI app initialization
- `router.py` - Route definitions (`/reason`, `/ingest`, `/health`, `/debug/*`)
- `context.py` - Context aggregation (Intake summaries, session state)
**Reasoning Pipeline (`reasoning/`):**
- `reflection.py` - Meta-awareness notes (Cloud LLM)
- `reasoning.py` - Draft answer generation (Primary LLM)
- `refine.py` - Answer refinement (Primary LLM)
**Persona Layer (`persona/`):**
- `speak.py` - Personality application (Cloud LLM)
- `identity.py` - Persona loader
**Intake Module (`intake/`):**
- `__init__.py` - Package exports (SESSIONS, add_exchange_internal, summarize_context)
- `intake.py` - Core logic (367 lines)
  - `SESSIONS` dictionary
  - `add_exchange_internal()`
  - `summarize_context()`
  - `bg_summarize()` stub
**LLM Integration (`llm/`):**
- `llm_router.py` - Backend selector and HTTP client
  - `call_llm()` function
  - Environment-based routing
  - Payload formatting per backend type
**Utilities (`utils/`):**
- Helper functions for common operations
**Configuration:**
- `Dockerfile` - Single-worker constraint documented
- `requirements.txt` - Python dependencies
- `.env` - Service-specific overrides
### Relay Service (`core/relay/`)
**Main Files:**
- `server.js` - Express.js server (Node.js)
  - `/v1/chat/completions` - OpenAI-compatible endpoint
  - `/chat` - Internal endpoint
  - `/_health` - Health check
- `package.json` - Node.js dependencies
**Key Logic:**
- Receives user messages
- Routes to Cortex `/reason`
- Async calls to Cortex `/ingest` after response
- Returns final answer to user
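The ordering in the key logic above (answer first, `/ingest` afterwards and off the critical path) can be illustrated as follows. Relay itself is Node.js; this Python sketch only shows the control flow, and `handle_chat`, `reason`, and `ingest` are hypothetical names standing in for the HTTP calls to Cortex.

```python
import asyncio
from typing import Awaitable, Callable, Tuple

ReasonFn = Callable[[str, str], Awaitable[str]]        # (session_id, user_msg) -> answer
IngestFn = Callable[[str, str, str], Awaitable[None]]  # (session_id, user_msg, answer)


async def handle_chat(session_id: str, user_msg: str,
                      reason: ReasonFn, ingest: IngestFn) -> Tuple[str, asyncio.Task]:
    """Wait for the /reason answer, then schedule /ingest without waiting for it."""
    answer = await reason(session_id, user_msg)

    async def _ingest_quietly() -> None:
        try:
            await ingest(session_id, user_msg, answer)
        except Exception as exc:
            # An ingest failure is logged but must not break the chat reply.
            print(f"[relay] ingest failed: {type(exc).__name__}: {exc}")

    # The task runs in the background; the caller can return the answer immediately.
    task = asyncio.create_task(_ingest_quietly())
    return answer, task
```

Returning the task as well as the answer is a choice made for this sketch so that callers and tests can await the ingest when they need to.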
### NeoMem Service (`neomem/`)
**Main Files:**
- `main.py` - FastAPI app (memory API)
- `memory.py` - Memory management logic
- `embedder.py` - Embedding generation
- `graph.py` - Neo4j graph operations
- `Dockerfile` - Container definition
- `requirements.txt` - Python dependencies
**API Endpoints:**
- `POST /memories` - Add new memory
- `POST /search` - Semantic search
- `GET /health` - Service health
---
## Common Development Tasks
### Adding a New Endpoint to Cortex
**Example: Add `/debug/buffer` endpoint**
1. **Edit `cortex/router.py`:**
```python
@cortex_router.get("/debug/buffer")
async def debug_buffer(session_id: str, limit: int = 10):
    """Return last N exchanges from a session buffer."""
    from intake.intake import SESSIONS
    session = SESSIONS.get(session_id)
    if not session:
        return {"error": "session not found", "session_id": session_id}
    buffer = session["buffer"]
    recent = list(buffer)[-limit:]
    return {
        "session_id": session_id,
        "total_exchanges": len(buffer),
        "recent_exchanges": recent
    }
```
2. **Restart Cortex:**
```bash
docker-compose restart cortex
```
3. **Test:**
```bash
curl "http://localhost:7081/debug/buffer?session_id=test&limit=5"
```
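One detail of the endpoint above is worth a closer look: the buffer is copied with `list(buffer)` before slicing. Assuming the session buffer is a `collections.deque` (the `maxlen=200` mentioned elsewhere in this document suggests so, but that is an assumption), the copy is required because a deque does not support slice indexing. The helper name `recent_exchanges` below is hypothetical.

```python
from collections import deque
from typing import Deque, List


def recent_exchanges(buffer: Deque[dict], limit: int = 10) -> List[dict]:
    """Return the last `limit` exchanges, oldest first."""
    if limit <= 0:
        # Guard: list(buffer)[-0:] would return the whole buffer, not nothing.
        return []
    # deque[-limit:] raises TypeError, so slice a list copy instead.
    return list(buffer)[-limit:]
```

The `limit <= 0` guard is an addition of this sketch; without it, a request with `limit=0` returns every exchange in the buffer.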
### Modifying LLM Backend for a Module
**Example: Switch Cortex to use PRIMARY backend**
1. **Edit `.env`:**
```bash
CORTEX_LLM=PRIMARY # Change from SECONDARY to PRIMARY
```
2. **Restart Cortex:**
```bash
docker-compose restart cortex
```
3. **Verify in logs:**
```bash
docker logs cortex | grep "Backend"
```
### Adding Diagnostic Logging
**Example: Log every exchange addition**
1. **Edit `cortex/intake/intake.py`:**
```python
def add_exchange_internal(exchange: dict):
    session_id = exchange.get("session_id")
    # Add detailed logging
    print(f"[DEBUG] Adding exchange to {session_id}")
    print(f"[DEBUG] User msg: {exchange.get('user_msg', '')[:100]}")
    print(f"[DEBUG] Assistant msg: {exchange.get('assistant_msg', '')[:100]}")
    # ... rest of function
```
2. **View logs:**
```bash
docker logs cortex -f | grep DEBUG
```
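The same diagnostics can be written with the standard `logging` module, which lets the output be switched off by level instead of by editing code. This is a sketch only: the logger name `intake` and the helpers `preview` and `log_exchange` are hypothetical, and the project may prefer plain `print` as shown above.

```python
import logging

logger = logging.getLogger("intake")


def preview(text, limit: int = 100) -> str:
    """Truncate a message for logging; tolerates None and non-string values."""
    text = "" if text is None else str(text)
    return text if len(text) <= limit else text[:limit] + "..."


def log_exchange(exchange: dict) -> None:
    """Emit one DEBUG line per field of an exchange."""
    logger.debug("Adding exchange to %s", exchange.get("session_id"))
    logger.debug("User msg: %s", preview(exchange.get("user_msg")))
    logger.debug("Assistant msg: %s", preview(exchange.get("assistant_msg")))
```

To keep `grep DEBUG` working, the level name has to appear in the output, for example via `logging.basicConfig(level=logging.DEBUG, format="[%(levelname)s] %(message)s")`. Unlike `exchange.get('user_msg', '')[:100]`, `preview` also handles a key that is present but set to `None`.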
---
## Debugging Guide
### Problem: SESSIONS Not Persisting
**Symptoms:**
- `/debug/sessions` shows empty or only 1 exchange
- Summaries always return empty
- Buffer size doesn't increase
**Diagnosis Steps:**
1. Check Cortex logs for SESSIONS object ID:
```bash
docker logs cortex | grep "SESSIONS object id"
```
- Should show same ID across all calls
- If IDs differ → module reloading issue
2. Verify single-worker mode:
```bash
docker exec cortex cat Dockerfile | grep uvicorn
```
- The `uvicorn` command should either omit the `--workers` flag or set `--workers 1`
3. Check `/debug/sessions` endpoint:
```bash
curl http://localhost:7081/debug/sessions | jq
```
- Should show sessions_object_id and current sessions
4. Confirm that `__init__.py` exists:
```bash
docker exec cortex ls -la intake/__init__.py
```
**Solution (Fixed in v0.5.1):**
- Ensure `cortex/intake/__init__.py` exists with proper exports
- Verify `bg_summarize()` is implemented (not just TYPE_CHECKING stub)
- Check `/ingest` endpoint doesn't have early return
- Rebuild and recreate the Cortex container: `docker-compose build cortex && docker-compose up -d cortex` (`restart` alone keeps the old image)
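The object-ID check in the diagnosis steps relies on the fact that a single module-level dictionary keeps the same `id()` for the lifetime of the process. A minimal sketch of a snapshot like the one `/debug/sessions` reports is shown below; the payload shape is inferred from the `jq` paths used in this document, and the function name `sessions_snapshot` is hypothetical.

```python
from collections import deque

# Stand-in for the module-level global in cortex/intake/intake.py.
SESSIONS: dict = {}


def sessions_snapshot(sessions: dict) -> dict:
    """Summarize the session store without copying the exchanges themselves."""
    return {
        # Same value on every call as long as one process owns one SESSIONS object.
        "sessions_object_id": id(sessions),
        "sessions": {
            sid: {"buffer_size": len(data.get("buffer", ()))}
            for sid, data in sessions.items()
        },
    }
```

If two requests report different `sessions_object_id` values, they were served by different objects, typically because of multiple workers or a second import path for the module.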
### Problem: LLM Backend Timeout
**Symptoms:**
- Cortex `/reason` hangs
- 504 Gateway Timeout errors
- Logs show "waiting for LLM response"
**Diagnosis Steps:**
1. Test backend directly:
```bash
# llama.cpp
curl http://10.0.0.44:8080/health
# Ollama
curl http://10.0.0.3:11434/api/tags
# OpenAI
curl https://api.openai.com/v1/models \
  -H "Authorization: Bearer $OPENAI_API_KEY"
```
2. Check network connectivity:
```bash
docker exec cortex ping -c 3 10.0.0.44
```
3. Review Cortex logs:
```bash
docker logs cortex -f | grep "LLM"
```
**Solutions:**
- Verify backend URL in `.env` is correct and accessible
- Check firewall rules for backend ports
- Increase timeout in `cortex/llm/llm_router.py`
- Switch to a different backend temporarily, e.g. `CORTEX_LLM=OPENAI`
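Two of the solutions above, a bounded timeout and a temporary switch to another backend, can be combined in code. The sketch below uses only `asyncio`; `call_with_fallback` and the `call(backend, prompt)` signature are hypothetical and do not describe the real `call_llm()` in `llm_router.py`.

```python
import asyncio
from typing import Awaitable, Callable, Sequence

LLMCall = Callable[[str, str], Awaitable[str]]  # (backend, prompt) -> reply


async def call_with_fallback(call: LLMCall, prompt: str,
                             backends: Sequence[str],
                             timeout_s: float = 30.0) -> str:
    """Try each backend in order, giving each at most timeout_s seconds."""
    errors = []
    for backend in backends:
        try:
            # wait_for cancels the pending call once the timeout expires.
            return await asyncio.wait_for(call(backend, prompt), timeout=timeout_s)
        except asyncio.TimeoutError:
            errors.append(f"{backend}: timed out after {timeout_s}s")
        except Exception as exc:
            errors.append(f"{backend}: {type(exc).__name__}: {exc}")
    raise RuntimeError("all backends failed: " + "; ".join(errors))
```

Collecting one error string per backend keeps the final message useful for the log search shown in the diagnosis steps.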
### Problem: Docker Compose Won't Start
**Symptoms:**
- `docker-compose up -d` fails
- Container exits immediately
- "port already in use" errors
**Diagnosis Steps:**
1. Check port conflicts:
```bash
netstat -tulpn | grep -E '7078|7081|7077|5432'
```
2. Check container logs:
```bash
docker-compose logs --tail=50
```
3. Verify environment file:
```bash
cat .env | grep -v "^#" | grep -v "^$"
```
**Solutions:**
- Stop conflicting services: `docker-compose down`
- Check `.env` syntax (no quotes unless necessary)
- Rebuild containers: `docker-compose build --no-cache`
- Check Docker daemon: `systemctl status docker`
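Where `netstat` is not installed, the port-conflict check can be approximated from Python. This is a rough sketch under the assumption that the services listen on the loopback interface of the host; `port_in_use` and `busy_ports` are hypothetical helpers, and the default port list is taken from the `netstat` command above.

```python
import socket
from typing import Iterable, List


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


def busy_ports(ports: Iterable[int] = (7078, 7081, 7077, 5432)) -> List[int]:
    """List the ports from the given set that are already taken."""
    return [port for port in ports if port_in_use(port)]
```

Unlike `netstat -tulpn`, this does not report which process owns the port; it only confirms that the port is occupied.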
---
## Testing Checklist
### After Making Changes to Cortex
**1. Build and restart:**
```bash
docker-compose build cortex
docker-compose up -d cortex   # recreates the container so it uses the rebuilt image
```
**2. Verify service health:**
```bash
curl http://localhost:7081/health
```
**3. Test /ingest endpoint:**
```bash
curl -X POST http://localhost:7081/ingest \
  -H "Content-Type: application/json" \
  -d '{
    "session_id": "test",
    "user_msg": "Hello",
    "assistant_msg": "Hi there!"
  }'
```
**4. Verify SESSIONS updated:**
```bash
curl http://localhost:7081/debug/sessions | jq '.sessions.test.buffer_size'
```
- Should show 1 (or increment if already populated)
**5. Test summarization:**
```bash
curl "http://localhost:7081/debug/summary?session_id=test" | jq '.summary'
```
- Should return L1/L5/L10/L20/L30 summaries
**6. Test full pipeline:**
```bash
curl -X POST http://localhost:7078/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Test message"}],
    "session_id": "test"
  }' | jq '.choices[0].message.content'
```
**7. Check logs for errors:**
```bash
docker logs cortex --tail=50
```
---
## Project History & Context
### Evolution Timeline
**v0.1.x (2025-09-23 to 2025-09-25)**
- Initial MVP: Relay + Mem0 + Ollama
- Basic memory storage and retrieval
- Simple UI with session support
**v0.2.x (2025-09-24 to 2025-09-30)**
- Migrated to mem0ai SDK
- Added sessionId support
- Created standalone Lyra-Mem0 stack
**v0.3.x (2025-09-26 to 2025-10-28)**
- Forked Mem0 → NVGRAM → NeoMem
- Added salience filtering
- Integrated Cortex reasoning VM
- Built RAG system (Beta Lyrae)
- Established multi-backend LLM support
**v0.4.x (2025-11-05 to 2025-11-13)**
- Major architectural rewire
- Implemented 4-stage reasoning pipeline
- Added reflection, refinement stages
- RAG integration
- LLM router with per-stage backend selection
**Infrastructure v1.0.0 (2025-11-26)**
- Consolidated 9 `.env` files into single source of truth
- Multi-backend LLM strategy
- Docker Compose consolidation
- Created security templates
**v0.5.0 (2025-11-28)**
- Fixed all critical API wiring issues
- Added OpenAI-compatible Relay endpoint
- Fixed Cortex → Intake integration
- End-to-end flow verification
**v0.5.1 (2025-12-11) - CURRENT**
- **Critical fix**: SESSIONS persistence bug
- Implemented `bg_summarize()` stub
- Fixed `/ingest` unreachable code
- Added `cortex/intake/__init__.py`
- Embedded Intake in Cortex (no longer standalone)
- Added diagnostic endpoints
- Lenient error handling
- Documented single-worker constraint
### Architectural Philosophy
**Modular Design:**
- Each service has a single, clear responsibility
- Services communicate via well-defined HTTP APIs
- Configuration is centralized but allows per-service overrides
**Local-First:**
- No reliance on external services (except optional OpenAI)
- All data stored locally (PostgreSQL + Neo4j)
- Can run entirely air-gapped with local LLMs
**Flexible LLM Backend:**
- Not tied to any single LLM provider
- Can mix local and cloud models
- Per-stage backend selection for optimal performance/cost
**Error Handling:**
- Lenient mode: Never fail the chat pipeline
- Log errors but continue processing
- Graceful degradation
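Lenient mode can be expressed as a small decorator around any coroutine in the pipeline. The sketch below is illustrative only: the name `lenient`, the logger name, and the example fallback payload are assumptions and do not describe how `/ingest` is actually implemented.

```python
import asyncio
import copy
import functools
import logging

logger = logging.getLogger("cortex.lenient")


def lenient(fallback):
    """Decorator: log any exception from the coroutine and return fallback instead."""
    def decorator(func):
        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            try:
                return await func(*args, **kwargs)
            except Exception:
                # Full traceback goes to the log so the failure is never silent.
                logger.exception("%s failed; continuing with fallback", func.__name__)
                # Copy so callers cannot mutate the shared fallback value.
                return copy.deepcopy(fallback)
        return wrapper
    return decorator


@lenient(fallback={"status": "ok", "stored": False})
async def ingest_example(exchange: dict) -> dict:
    """Hypothetical handler that fails when the session ID is missing."""
    return {"status": "ok", "stored": bool(exchange["session_id"])}


if __name__ == "__main__":
    print(asyncio.run(ingest_example({})))  # missing key is logged, fallback returned
```

The decorator logs with `logger.exception`, so the pipeline keeps running while the error still leaves a traceback behind.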
**Observability:**
- Diagnostic endpoints for debugging
- Verbose logging mode
- Object ID tracking for singleton verification
---
## Known Issues & Limitations
### Fixed in v0.5.1
- ✅ Intake SESSIONS not persisting → **FIXED**
- ✅ `bg_summarize()` NameError → **FIXED**
- ✅ `/ingest` endpoint unreachable code → **FIXED**
### Current Limitations
**1. Single-Worker Constraint**
- Cortex must run with single Uvicorn worker
- SESSIONS is in-memory module-level global
- Multi-worker support requires Redis or shared storage
- Documented in `cortex/Dockerfile` lines 7-8
**2. NeoMem Integration Incomplete**
- Relay doesn't yet push to NeoMem after responses
- Memory storage planned for v0.5.2
- Currently all memory is short-term (SESSIONS only)
**3. RAG Service Disabled**
- Beta Lyrae (RAG) commented out in docker-compose.yml
- Awaiting re-enablement after Intake stabilization
- Code exists but not currently integrated
**4. Session Management**
- No session cleanup/expiration
- Each session buffer is capped (`maxlen=200`), but the number of sessions is not, so SESSIONS grows without bound
- No session list endpoint in Relay
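One possible shape for the missing cleanup is an idle-time sweep that deletes stale sessions in place, so the SESSIONS object keeps its identity. This is a sketch of an idea, not planned project code: the `last_seen` field and the helpers `touch` and `expire_sessions` are hypothetical.

```python
import time
from collections import deque
from typing import List, Optional

MAX_EXCHANGES = 200  # per-session cap, matching maxlen=200 mentioned above


def touch(sessions: dict, session_id: str, exchange: dict,
          now: Optional[float] = None) -> None:
    """Append an exchange and record when the session was last used."""
    now = time.time() if now is None else now
    session = sessions.setdefault(
        session_id, {"buffer": deque(maxlen=MAX_EXCHANGES), "last_seen": now}
    )
    session["buffer"].append(exchange)
    session["last_seen"] = now


def expire_sessions(sessions: dict, max_idle_s: float,
                    now: Optional[float] = None) -> List[str]:
    """Delete sessions idle for longer than max_idle_s; return the removed IDs."""
    now = time.time() if now is None else now
    stale = [sid for sid, data in sessions.items()
             if now - data.get("last_seen", now) > max_idle_s]
    for sid in stale:
        del sessions[sid]  # mutate in place; never rebind the dict itself
    return stale
```

Sessions without a `last_seen` value are treated as fresh, so existing entries would not be dropped the first time such a sweep runs.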
**5. Persona Integration**
- `PERSONA_ENABLED=false` in `.env`
- Persona Sidecar not fully wired
- Identity loaded but not consistently applied
### Future Enhancements
**Short-term (v0.5.2):**
- Enable NeoMem integration in Relay
- Add session cleanup/expiration
- Session list endpoint
- NeoMem health monitoring
**Medium-term (v0.6.x):**
- Re-enable RAG service
- Migrate SESSIONS to Redis for multi-worker support
- Add request correlation IDs
- Comprehensive health checks
**Long-term (v0.7.x+):**
- Persona Sidecar full integration
- Autonomous "dream" cycles (self-reflection)
- Verifier module for factual grounding
- Advanced RAG with hybrid search
- Memory consolidation strategies
---
## Troubleshooting Quick Reference
| Problem | Quick Check | Solution |
|---------|-------------|----------|
| SESSIONS empty | `curl localhost:7081/debug/sessions` | Rebuild Cortex, verify `__init__.py` exists |
| LLM timeout | `curl http://10.0.0.44:8080/health` | Check backend connectivity, increase timeout |
| Port conflict | `netstat -tulpn \| grep 7078` | Stop conflicting service or change port |
| Container crash | `docker logs cortex` | Check logs for Python errors, verify .env syntax |
| Missing package | `docker exec cortex pip list` | Rebuild container, check requirements.txt |
| 502 from Relay | `curl localhost:7081/health` | Verify Cortex is running, check docker network |
---
## API Reference (Quick)
### Relay (Port 7078)
**POST /v1/chat/completions** - OpenAI-compatible chat
```json
{
  "messages": [{"role": "user", "content": "..."}],
  "session_id": "..."
}
```
**GET /_health** - Service health
### Cortex (Port 7081)
**POST /reason** - Main reasoning pipeline (`temperature` is optional)
```json
{
  "session_id": "...",
  "user_prompt": "...",
  "temperature": 0.7
}
```
**POST /ingest** - Add exchange to SESSIONS
```json
{
  "session_id": "...",
  "user_msg": "...",
  "assistant_msg": "..."
}
```
**GET /debug/sessions** - Inspect SESSIONS state
**GET /debug/summary?session_id=X** - Test summarization
**GET /health** - Service health
### NeoMem (Port 7077)
**POST /memories** - Add memory
```json
{
  "messages": [{"role": "...", "content": "..."}],
  "user_id": "...",
  "metadata": {}
}
```
**POST /search** - Semantic search
```json
{
  "query": "...",
  "user_id": "...",
  "limit": 10
}
```
**GET /health** - Service health
---
## File Manifest (Key Files Only)
```
project-lyra/
├── .env                          # Root environment variables
├── docker-compose.yml            # Service definitions (152 lines)
├── CHANGELOG.md                  # Version history (836 lines)
├── README.md                     # User documentation (610 lines)
├── PROJECT_SUMMARY.md            # This file (AI context)
├── cortex/                       # Reasoning engine
│   ├── Dockerfile                # Single-worker constraint documented
│   ├── requirements.txt
│   ├── .env                      # Cortex overrides
│   ├── main.py                   # FastAPI initialization
│   ├── router.py                 # Routes (306 lines)
│   ├── context.py                # Context aggregation
│   │
│   ├── intake/                   # Short-term memory (embedded)
│   │   ├── __init__.py           # Package exports
│   │   └── intake.py             # Core logic (367 lines)
│   │
│   ├── reasoning/                # Reasoning pipeline
│   │   ├── reflection.py         # Meta-awareness
│   │   ├── reasoning.py          # Draft generation
│   │   └── refine.py             # Refinement
│   │
│   ├── persona/                  # Personality layer
│   │   ├── speak.py              # Persona application
│   │   └── identity.py           # Persona loader
│   │
│   └── llm/                      # LLM integration
│       └── llm_router.py         # Backend selector
├── core/relay/                   # Orchestrator
│   ├── server.js                 # Express server (Node.js)
│   └── package.json
├── neomem/                       # Long-term memory
│   ├── Dockerfile
│   ├── requirements.txt
│   ├── .env                      # NeoMem overrides
│   └── main.py                   # Memory API
└── rag/                          # RAG system (disabled)
    ├── rag_api.py
    ├── rag_chat_import.py
    └── chromadb/
```
---
## Final Notes for AI Assistants
### What You Should Know Before Making Changes
1. **SESSIONS is sacred** - It's a module-level global in `cortex/intake/intake.py`. Don't move it, don't duplicate it, don't make it a class attribute. It must remain a singleton.
2. **Single-worker is mandatory** - Until SESSIONS is migrated to Redis, Cortex MUST run with a single Uvicorn worker. Multi-worker will cause SESSIONS to be inconsistent.
3. **Lenient error handling** - The `/ingest` endpoint and other parts of the pipeline use lenient error handling: log errors but always return success. Never fail the chat pipeline.
4. **Backend routing is environment-driven** - Don't hardcode LLM URLs. Use the `{MODULE}_LLM` environment variables and the llm_router.py system.
5. **Intake is embedded** - Don't try to make HTTP calls to Intake. Use direct Python imports: `from intake.intake import ...`
6. **Test with diagnostic endpoints** - Always use `/debug/sessions` and `/debug/summary` to verify SESSIONS behavior after changes.
7. **Follow the changelog format** - When documenting changes, use the chronological format established in CHANGELOG.md v0.5.1. Group by version, then by change type (Fixed, Added, Changed, etc.).
### When You Need Help
- **SESSIONS issues**: Check `cortex/intake/intake.py` lines 11-14 for initialization, lines 325-366 for `add_exchange_internal()`
- **Routing issues**: Check `cortex/router.py` lines 65-189 for `/reason`, lines 201-233 for `/ingest`
- **LLM backend issues**: Check `cortex/llm/llm_router.py` for backend selection logic
- **Environment variables**: Check `.env` lines 13-40 for LLM backends, lines 28-34 for module selection
### Most Important Thing
**This project values reliability over features.** It's better to have a simple, working system than a complex, broken one. When in doubt, keep it simple, log everything, and never fail silently.
---
**End of AI Context Summary**
*This document is maintained to provide complete context for AI assistants working on Project Lyra. Last updated: v0.5.1 (2025-12-11)*
+441
@@ -0,0 +1,441 @@
├── CHANGELOG.md
├── core
│ ├── env experiments
│ ├── persona-sidecar
│ │ ├── Dockerfile
│ │ ├── package.json
│ │ ├── persona-server.js
│ │ └── personas.json
│ ├── relay
│ │ ├── Dockerfile
│ │ ├── lib
│ │ │ ├── cortex.js
│ │ │ └── llm.js
│ │ ├── package.json
│ │ ├── package-lock.json
│ │ ├── server.js
│ │ ├── sessions
│ │ │ ├── default.jsonl
│ │ │ ├── sess-6rxu7eia.json
│ │ │ ├── sess-6rxu7eia.jsonl
│ │ │ ├── sess-l08ndm60.json
│ │ │ └── sess-l08ndm60.jsonl
│ │ └── test-llm.js
│ ├── relay-backup
│ └── ui
│ ├── index.html
│ ├── manifest.json
│ └── style.css
├── cortex
│ ├── context.py
│ ├── Dockerfile
│ ├── ingest
│ │ ├── ingest_handler.py
│ │ ├── __init__.py
│ │ └── intake_client.py
│ ├── intake
│ │ ├── __init__.py
│ │ ├── intake.py
│ │ └── logs
│ ├── llm
│ │ ├── __init__.py
│ │ └── llm_router.py
│ ├── logs
│ │ ├── cortex_verbose_debug.log
│ │ └── reflections.log
│ ├── main.py
│ ├── neomem_client.py
│ ├── persona
│ │ ├── identity.py
│ │ ├── __init__.py
│ │ └── speak.py
│ ├── rag.py
│ ├── reasoning
│ │ ├── __init__.py
│ │ ├── reasoning.py
│ │ ├── refine.py
│ │ └── reflection.py
│ ├── requirements.txt
│ ├── router.py
│ ├── tests
│ └── utils
│ ├── config.py
│ ├── __init__.py
│ ├── log_utils.py
│ └── schema.py
├── deprecated.env.txt
├── DEPRECATED_FILES.md
├── docker-compose.yml
├── docs
│ ├── ARCHITECTURE_v0-6-0.md
│ ├── ENVIRONMENT_VARIABLES.md
│ ├── lyra_tree.txt
│ └── PROJECT_SUMMARY.md
├── intake-logs
│ └── summaries.log
├── neomem
│ ├── _archive
│ │ └── old_servers
│ │ ├── main_backup.py
│ │ └── main_dev.py
│ ├── docker-compose.yml
│ ├── Dockerfile
│ ├── neomem
│ │ ├── api
│ │ ├── client
│ │ │ ├── __init__.py
│ │ │ ├── main.py
│ │ │ ├── project.py
│ │ │ └── utils.py
│ │ ├── configs
│ │ │ ├── base.py
│ │ │ ├── embeddings
│ │ │ │ ├── base.py
│ │ │ │ └── __init__.py
│ │ │ ├── enums.py
│ │ │ ├── __init__.py
│ │ │ ├── llms
│ │ │ │ ├── anthropic.py
│ │ │ │ ├── aws_bedrock.py
│ │ │ │ ├── azure.py
│ │ │ │ ├── base.py
│ │ │ │ ├── deepseek.py
│ │ │ │ ├── __init__.py
│ │ │ │ ├── lmstudio.py
│ │ │ │ ├── ollama.py
│ │ │ │ ├── openai.py
│ │ │ │ └── vllm.py
│ │ │ ├── prompts.py
│ │ │ └── vector_stores
│ │ │ ├── azure_ai_search.py
│ │ │ ├── azure_mysql.py
│ │ │ ├── baidu.py
│ │ │ ├── chroma.py
│ │ │ ├── databricks.py
│ │ │ ├── elasticsearch.py
│ │ │ ├── faiss.py
│ │ │ ├── __init__.py
│ │ │ ├── langchain.py
│ │ │ ├── milvus.py
│ │ │ ├── mongodb.py
│ │ │ ├── neptune.py
│ │ │ ├── opensearch.py
│ │ │ ├── pgvector.py
│ │ │ ├── pinecone.py
│ │ │ ├── qdrant.py
│ │ │ ├── redis.py
│ │ │ ├── s3_vectors.py
│ │ │ ├── supabase.py
│ │ │ ├── upstash_vector.py
│ │ │ ├── valkey.py
│ │ │ ├── vertex_ai_vector_search.py
│ │ │ └── weaviate.py
│ │ ├── core
│ │ ├── embeddings
│ │ │ ├── aws_bedrock.py
│ │ │ ├── azure_openai.py
│ │ │ ├── base.py
│ │ │ ├── configs.py
│ │ │ ├── gemini.py
│ │ │ ├── huggingface.py
│ │ │ ├── __init__.py
│ │ │ ├── langchain.py
│ │ │ ├── lmstudio.py
│ │ │ ├── mock.py
│ │ │ ├── ollama.py
│ │ │ ├── openai.py
│ │ │ ├── together.py
│ │ │ └── vertexai.py
│ │ ├── exceptions.py
│ │ ├── graphs
│ │ │ ├── configs.py
│ │ │ ├── __init__.py
│ │ │ ├── neptune
│ │ │ │ ├── base.py
│ │ │ │ ├── __init__.py
│ │ │ │ ├── neptunedb.py
│ │ │ │ └── neptunegraph.py
│ │ │ ├── tools.py
│ │ │ └── utils.py
│ │ ├── __init__.py
│ │ ├── LICENSE
│ │ ├── llms
│ │ │ ├── anthropic.py
│ │ │ ├── aws_bedrock.py
│ │ │ ├── azure_openai.py
│ │ │ ├── azure_openai_structured.py
│ │ │ ├── base.py
│ │ │ ├── configs.py
│ │ │ ├── deepseek.py
│ │ │ ├── gemini.py
│ │ │ ├── groq.py
│ │ │ ├── __init__.py
│ │ │ ├── langchain.py
│ │ │ ├── litellm.py
│ │ │ ├── lmstudio.py
│ │ │ ├── ollama.py
│ │ │ ├── openai.py
│ │ │ ├── openai_structured.py
│ │ │ ├── sarvam.py
│ │ │ ├── together.py
│ │ │ ├── vllm.py
│ │ │ └── xai.py
│ │ ├── memory
│ │ │ ├── base.py
│ │ │ ├── graph_memory.py
│ │ │ ├── __init__.py
│ │ │ ├── kuzu_memory.py
│ │ │ ├── main.py
│ │ │ ├── memgraph_memory.py
│ │ │ ├── setup.py
│ │ │ ├── storage.py
│ │ │ ├── telemetry.py
│ │ │ └── utils.py
│ │ ├── proxy
│ │ │ ├── __init__.py
│ │ │ └── main.py
│ │ ├── server
│ │ │ ├── dev.Dockerfile
│ │ │ ├── docker-compose.yaml
│ │ │ ├── Dockerfile
│ │ │ ├── main_old.py
│ │ │ ├── main.py
│ │ │ ├── Makefile
│ │ │ ├── README.md
│ │ │ └── requirements.txt
│ │ ├── storage
│ │ ├── utils
│ │ │ └── factory.py
│ │ └── vector_stores
│ │ ├── azure_ai_search.py
│ │ ├── azure_mysql.py
│ │ ├── baidu.py
│ │ ├── base.py
│ │ ├── chroma.py
│ │ ├── configs.py
│ │ ├── databricks.py
│ │ ├── elasticsearch.py
│ │ ├── faiss.py
│ │ ├── __init__.py
│ │ ├── langchain.py
│ │ ├── milvus.py
│ │ ├── mongodb.py
│ │ ├── neptune_analytics.py
│ │ ├── opensearch.py
│ │ ├── pgvector.py
│ │ ├── pinecone.py
│ │ ├── qdrant.py
│ │ ├── redis.py
│ │ ├── s3_vectors.py
│ │ ├── supabase.py
│ │ ├── upstash_vector.py
│ │ ├── valkey.py
│ │ ├── vertex_ai_vector_search.py
│ │ └── weaviate.py
│ ├── neomem_history
│ │ └── history.db
│ ├── pyproject.toml
│ ├── README.md
│ └── requirements.txt
├── neomem_history
│ └── history.db
├── rag
│ ├── chatlogs
│ │ └── lyra
│ │ ├── 0000_Wire_ROCm_to_Cortex.json
│ │ ├── 0001_Branch___10_22_ct201branch-ssh_tut.json
│ │ ├── 0002_cortex_LLMs_11-1-25.json
│ │ ├── 0003_RAG_beta.json
│ │ ├── 0005_Cortex_v0_4_0_planning.json
│ │ ├── 0006_Cortex_v0_4_0_Refinement.json
│ │ ├── 0009_Branch___Cortex_v0_4_0_planning.json
│ │ ├── 0012_Cortex_4_-_neomem_11-1-25.json
│ │ ├── 0016_Memory_consolidation_concept.json
│ │ ├── 0017_Model_inventory_review.json
│ │ ├── 0018_Branch___Memory_consolidation_concept.json
│ │ ├── 0022_Branch___Intake_conversation_summaries.json
│ │ ├── 0026_Intake_conversation_summaries.json
│ │ ├── 0027_Trilium_AI_LLM_setup.json
│ │ ├── 0028_LLMs_and_sycophancy_levels.json
│ │ ├── 0031_UI_improvement_plan.json
│ │ ├── 0035_10_27-neomem_update.json
│ │ ├── 0044_Install_llama_cpp_on_ct201.json
│ │ ├── 0045_AI_task_assistant.json
│ │ ├── 0047_Project_scope_creation.json
│ │ ├── 0052_View_docker_container_logs.json
│ │ ├── 0053_10_21-Proxmox_fan_control.json
│ │ ├── 0054_10_21-pytorch_branch_Quant_experiments.json
│ │ ├── 0055_10_22_ct201branch-ssh_tut.json
│ │ ├── 0060_Lyra_project_folder_issue.json
│ │ ├── 0062_Build_pytorch_API.json
│ │ ├── 0063_PokerBrain_dataset_structure.json
│ │ ├── 0065_Install_PyTorch_setup.json
│ │ ├── 0066_ROCm_PyTorch_setup_quirks.json
│ │ ├── 0067_VM_model_setup_steps.json
│ │ ├── 0070_Proxmox_disk_error_fix.json
│ │ ├── 0072_Docker_Compose_vs_Portainer.json
│ │ ├── 0073_Check_system_temps_Proxmox.json
│ │ ├── 0075_Cortex_gpu_progress.json
│ │ ├── 0076_Backup_Proxmox_before_upgrade.json
│ │ ├── 0077_Storage_cleanup_advice.json
│ │ ├── 0082_Install_ROCm_on_Proxmox.json
│ │ ├── 0088_Thalamus_program_summary.json
│ │ ├── 0094_Cortex_blueprint_development.json
│ │ ├── 0095_mem0_advancments.json
│ │ ├── 0096_Embedding_provider_swap.json
│ │ ├── 0097_Update_git_commit_steps.json
│ │ ├── 0098_AI_software_description.json
│ │ ├── 0099_Seed_memory_process.json
│ │ ├── 0100_Set_up_Git_repo.json
│ │ ├── 0101_Customize_embedder_setup.json
│ │ ├── 0102_Seeding_Local_Lyra_memory.json
│ │ ├── 0103_Mem0_seeding_part_3.json
│ │ ├── 0104_Memory_build_prompt.json
│ │ ├── 0105_Git_submodule_setup_guide.json
│ │ ├── 0106_Serve_UI_on_LAN.json
│ │ ├── 0107_AI_name_suggestion.json
│ │ ├── 0108_Room_X_planning_update.json
│ │ ├── 0109_Salience_filtering_design.json
│ │ ├── 0110_RoomX_Cortex_build.json
│ │ ├── 0119_Explain_Lyra_cortex_idea.json
│ │ ├── 0120_Git_submodule_organization.json
│ │ ├── 0121_Web_UI_fix_guide.json
│ │ ├── 0122_UI_development_planning.json
│ │ ├── 0123_NVGRAM_debugging_steps.json
│ │ ├── 0124_NVGRAM_setup_troubleshooting.json
│ │ ├── 0125_NVGRAM_development_update.json
│ │ ├── 0126_RX_-_NeVGRAM_New_Features.json
│ │ ├── 0127_Error_troubleshooting_steps.json
│ │ ├── 0135_Proxmox_backup_with_ABB.json
│ │ ├── 0151_Auto-start_Lyra-Core_VM.json
│ │ ├── 0156_AI_GPU_benchmarks_comparison.json
│ │ └── 0251_Lyra_project_handoff.json
│ ├── chromadb
│ │ ├── c4f701ee-1978-44a1-9df4-3e865b5d33c1
│ │ │ ├── data_level0.bin
│ │ │ ├── header.bin
│ │ │ ├── index_metadata.pickle
│ │ │ ├── length.bin
│ │ │ └── link_lists.bin
│ │ └── chroma.sqlite3
│ ├── import.log
│ ├── lyra-chatlogs
│ │ ├── 0000_Wire_ROCm_to_Cortex.json
│ │ ├── 0001_Branch___10_22_ct201branch-ssh_tut.json
│ │ ├── 0002_cortex_LLMs_11-1-25.json
│ │ └── 0003_RAG_beta.json
│ ├── rag_api.py
│ ├── rag_build.py
│ ├── rag_chat_import.py
│ └── rag_query.py
├── README.md
└── volumes
├── neo4j_data
│ ├── databases
│ │ ├── neo4j
│ │ │ ├── database_lock
│ │ │ ├── id-buffer.tmp.0
│ │ │ ├── neostore
│ │ │ ├── neostore.counts.db
│ │ │ ├── neostore.indexstats.db
│ │ │ ├── neostore.labeltokenstore.db
│ │ │ ├── neostore.labeltokenstore.db.id
│ │ │ ├── neostore.labeltokenstore.db.names
│ │ │ ├── neostore.labeltokenstore.db.names.id
│ │ │ ├── neostore.nodestore.db
│ │ │ ├── neostore.nodestore.db.id
│ │ │ ├── neostore.nodestore.db.labels
│ │ │ ├── neostore.nodestore.db.labels.id
│ │ │ ├── neostore.propertystore.db
│ │ │ ├── neostore.propertystore.db.arrays
│ │ │ ├── neostore.propertystore.db.arrays.id
│ │ │ ├── neostore.propertystore.db.id
│ │ │ ├── neostore.propertystore.db.index
│ │ │ ├── neostore.propertystore.db.index.id
│ │ │ ├── neostore.propertystore.db.index.keys
│ │ │ ├── neostore.propertystore.db.index.keys.id
│ │ │ ├── neostore.propertystore.db.strings
│ │ │ ├── neostore.propertystore.db.strings.id
│ │ │ ├── neostore.relationshipgroupstore.db
│ │ │ ├── neostore.relationshipgroupstore.db.id
│ │ │ ├── neostore.relationshipgroupstore.degrees.db
│ │ │ ├── neostore.relationshipstore.db
│ │ │ ├── neostore.relationshipstore.db.id
│ │ │ ├── neostore.relationshiptypestore.db
│ │ │ ├── neostore.relationshiptypestore.db.id
│ │ │ ├── neostore.relationshiptypestore.db.names
│ │ │ ├── neostore.relationshiptypestore.db.names.id
│ │ │ ├── neostore.schemastore.db
│ │ │ ├── neostore.schemastore.db.id
│ │ │ └── schema
│ │ │ └── index
│ │ │ └── token-lookup-1.0
│ │ │ ├── 1
│ │ │ │ └── index-1
│ │ │ └── 2
│ │ │ └── index-2
│ │ ├── store_lock
│ │ └── system
│ │ ├── database_lock
│ │ ├── id-buffer.tmp.0
│ │ ├── neostore
│ │ ├── neostore.counts.db
│ │ ├── neostore.indexstats.db
│ │ ├── neostore.labeltokenstore.db
│ │ ├── neostore.labeltokenstore.db.id
│ │ ├── neostore.labeltokenstore.db.names
│ │ ├── neostore.labeltokenstore.db.names.id
│ │ ├── neostore.nodestore.db
│ │ ├── neostore.nodestore.db.id
│ │ ├── neostore.nodestore.db.labels
│ │ ├── neostore.nodestore.db.labels.id
│ │ ├── neostore.propertystore.db
│ │ ├── neostore.propertystore.db.arrays
│ │ ├── neostore.propertystore.db.arrays.id
│ │ ├── neostore.propertystore.db.id
│ │ ├── neostore.propertystore.db.index
│ │ ├── neostore.propertystore.db.index.id
│ │ ├── neostore.propertystore.db.index.keys
│ │ ├── neostore.propertystore.db.index.keys.id
│ │ ├── neostore.propertystore.db.strings
│ │ ├── neostore.propertystore.db.strings.id
│ │ ├── neostore.relationshipgroupstore.db
│ │ ├── neostore.relationshipgroupstore.db.id
│ │ ├── neostore.relationshipgroupstore.degrees.db
│ │ ├── neostore.relationshipstore.db
│ │ ├── neostore.relationshipstore.db.id
│ │ ├── neostore.relationshiptypestore.db
│ │ ├── neostore.relationshiptypestore.db.id
│ │ ├── neostore.relationshiptypestore.db.names
│ │ ├── neostore.relationshiptypestore.db.names.id
│ │ ├── neostore.schemastore.db
│ │ ├── neostore.schemastore.db.id
│ │ └── schema
│ │ └── index
│ │ ├── range-1.0
│ │ │ ├── 3
│ │ │ │ └── index-3
│ │ │ ├── 4
│ │ │ │ └── index-4
│ │ │ ├── 7
│ │ │ │ └── index-7
│ │ │ ├── 8
│ │ │ │ └── index-8
│ │ │ └── 9
│ │ │ └── index-9
│ │ └── token-lookup-1.0
│ │ ├── 1
│ │ │ └── index-1
│ │ └── 2
│ │ └── index-2
│ ├── dbms
│ │ └── auth.ini
│ ├── server_id
│ └── transactions
│ ├── neo4j
│ │ ├── checkpoint.0
│ │ └── neostore.transaction.db.0
│ └── system
│ ├── checkpoint.0
│ └── neostore.transaction.db.0
└── postgres_data [error opening dir]
-460
@@ -1,460 +0,0 @@
/home/serversdown/project-lyra
├── CHANGELOG.md
├── core
│   ├── backups
│   │   ├── mem0_20250927_221040.sql
│   │   └── mem0_history_20250927_220925.tgz
│   ├── docker-compose.yml
│   ├── .env
│   ├── env experiments
│   │   ├── .env
│   │   ├── .env.local
│   │   └── .env.openai
│   ├── persona-sidecar
│   │   ├── Dockerfile
│   │   ├── package.json
│   │   ├── persona-server.js
│   │   └── personas.json
│   ├── PROJECT_SUMMARY.md
│   ├── relay
│   │   ├── Dockerfile
│   │   ├── .dockerignore
│   │   ├── lib
│   │   │   ├── cortex.js
│   │   │   └── llm.js
│   │   ├── package.json
│   │   ├── package-lock.json
│   │   ├── server.js
│   │   ├── sessions
│   │   │   ├── sess-6rxu7eia.json
│   │   │   ├── sess-6rxu7eia.jsonl
│   │   │   ├── sess-l08ndm60.json
│   │   │   └── sess-l08ndm60.jsonl
│   │   └── test-llm.js
│   └── ui
│   ├── index.html
│   ├── manifest.json
│   └── style.css
├── cortex
│   ├── Dockerfile
│   ├── .env
│   ├── ingest
│   │   ├── ingest_handler.py
│   │   └── intake_client.py
│   ├── llm
│   │   ├── llm_router.py
│   │   └── resolve_llm_url.py
│   ├── logs
│   │   └── reflections.log
│   ├── main.py
│   ├── neomem_client.py
│   ├── persona
│   │   └── speak.py
│   ├── rag.py
│   ├── reasoning
│   │   ├── reasoning.py
│   │   ├── refine.py
│   │   └── reflection.py
│   ├── requirements.txt
│   ├── router.py
│   ├── tests
│   └── utils
│   ├── config.py
│   ├── log_utils.py
│   └── schema.py
├── deprecated.env.txt
├── docker-compose.yml
├── .env
├── .gitignore
├── intake
│   ├── Dockerfile
│   ├── .env
│   ├── intake.py
│   ├── logs
│   ├── requirements.txt
│   └── venv
│   ├── bin
│   │   ├── python -> python3
│   │   ├── python3 -> /usr/bin/python3
│   │   └── python3.10 -> python3
│   ├── include
│   ├── lib
│   │   └── python3.10
│   │   └── site-packages
│   ├── lib64 -> lib
│   └── pyvenv.cfg
├── intake-logs
│   └── summaries.log
├── lyra_tree.txt
├── neomem
│   ├── _archive
│   │   └── old_servers
│   │   ├── main_backup.py
│   │   └── main_dev.py
│   ├── docker-compose.yml
│   ├── Dockerfile
│   ├── .env
│   ├── .gitignore
│   ├── neomem
│   │   ├── api
│   │   ├── client
│   │   │   ├── __init__.py
│   │   │   ├── main.py
│   │   │   ├── project.py
│   │   │   └── utils.py
│   │   ├── configs
│   │   │   ├── base.py
│   │   │   ├── embeddings
│   │   │   │   ├── base.py
│   │   │   │   └── __init__.py
│   │   │   ├── enums.py
│   │   │   ├── __init__.py
│   │   │   ├── llms
│   │   │   │   ├── anthropic.py
│   │   │   │   ├── aws_bedrock.py
│   │   │   │   ├── azure.py
│   │   │   │   ├── base.py
│   │   │   │   ├── deepseek.py
│   │   │   │   ├── __init__.py
│   │   │   │   ├── lmstudio.py
│   │   │   │   ├── ollama.py
│   │   │   │   ├── openai.py
│   │   │   │   └── vllm.py
│   │   │   ├── prompts.py
│   │   │   └── vector_stores
│   │   │   ├── azure_ai_search.py
│   │   │   ├── azure_mysql.py
│   │   │   ├── baidu.py
│   │   │   ├── chroma.py
│   │   │   ├── databricks.py
│   │   │   ├── elasticsearch.py
│   │   │   ├── faiss.py
│   │   │   ├── __init__.py
│   │   │   ├── langchain.py
│   │   │   ├── milvus.py
│   │   │   ├── mongodb.py
│   │   │   ├── neptune.py
│   │   │   ├── opensearch.py
│   │   │   ├── pgvector.py
│   │   │   ├── pinecone.py
│   │   │   ├── qdrant.py
│   │   │   ├── redis.py
│   │   │   ├── s3_vectors.py
│   │   │   ├── supabase.py
│   │   │   ├── upstash_vector.py
│   │   │   ├── valkey.py
│   │   │   ├── vertex_ai_vector_search.py
│   │   │   └── weaviate.py
│   │   ├── core
│   │   ├── embeddings
│   │   │   ├── aws_bedrock.py
│   │   │   ├── azure_openai.py
│   │   │   ├── base.py
│   │   │   ├── configs.py
│   │   │   ├── gemini.py
│   │   │   ├── huggingface.py
│   │   │   ├── __init__.py
│   │   │   ├── langchain.py
│   │   │   ├── lmstudio.py
│   │   │   ├── mock.py
│   │   │   ├── ollama.py
│   │   │   ├── openai.py
│   │   │   ├── together.py
│   │   │   └── vertexai.py
│   │   ├── exceptions.py
│   │   ├── graphs
│   │   │   ├── configs.py
│   │   │   ├── __init__.py
│   │   │   ├── neptune
│   │   │   │   ├── base.py
│   │   │   │   ├── __init__.py
│   │   │   │   ├── neptunedb.py
│   │   │   │   └── neptunegraph.py
│   │   │   ├── tools.py
│   │   │   └── utils.py
│   │   ├── __init__.py
│   │   ├── LICENSE
│   │   ├── llms
│   │   │   ├── anthropic.py
│   │   │   ├── aws_bedrock.py
│   │   │   ├── azure_openai.py
│   │   │   ├── azure_openai_structured.py
│   │   │   ├── base.py
│   │   │   ├── configs.py
│   │   │   ├── deepseek.py
│   │   │   ├── gemini.py
│   │   │   ├── groq.py
│   │   │   ├── __init__.py
│   │   │   ├── langchain.py
│   │   │   ├── litellm.py
│   │   │   ├── lmstudio.py
│   │   │   ├── ollama.py
│   │   │   ├── openai.py
│   │   │   ├── openai_structured.py
│   │   │   ├── sarvam.py
│   │   │   ├── together.py
│   │   │   ├── vllm.py
│   │   │   └── xai.py
│   │   ├── memory
│   │   │   ├── base.py
│   │   │   ├── graph_memory.py
│   │   │   ├── __init__.py
│   │   │   ├── kuzu_memory.py
│   │   │   ├── main.py
│   │   │   ├── memgraph_memory.py
│   │   │   ├── setup.py
│   │   │   ├── storage.py
│   │   │   ├── telemetry.py
│   │   │   └── utils.py
│   │   ├── proxy
│   │   │   ├── __init__.py
│   │   │   └── main.py
│   │   ├── server
│   │   │   ├── dev.Dockerfile
│   │   │   ├── docker-compose.yaml
│   │   │   ├── Dockerfile
│   │   │   ├── main_old.py
│   │   │   ├── main.py
│   │   │   ├── Makefile
│   │   │   ├── README.md
│   │   │   └── requirements.txt
│   │   ├── storage
│   │   ├── utils
│   │   │   └── factory.py
│   │   └── vector_stores
│   │       ├── azure_ai_search.py
│   │       ├── azure_mysql.py
│   │       ├── baidu.py
│   │       ├── base.py
│   │       ├── chroma.py
│   │       ├── configs.py
│   │       ├── databricks.py
│   │       ├── elasticsearch.py
│   │       ├── faiss.py
│   │       ├── __init__.py
│   │       ├── langchain.py
│   │       ├── milvus.py
│   │       ├── mongodb.py
│   │       ├── neptune_analytics.py
│   │       ├── opensearch.py
│   │       ├── pgvector.py
│   │       ├── pinecone.py
│   │       ├── qdrant.py
│   │       ├── redis.py
│   │       ├── s3_vectors.py
│   │       ├── supabase.py
│   │       ├── upstash_vector.py
│   │       ├── valkey.py
│   │       ├── vertex_ai_vector_search.py
│   │       └── weaviate.py
│   ├── neomem_history
│   │   └── history.db
│   ├── pyproject.toml
│   ├── README.md
│   └── requirements.txt
├── neomem_history
│   └── history.db
├── rag
│   ├── chatlogs
│   │   └── lyra
│   │       ├── 0000_Wire_ROCm_to_Cortex.json
│   │       ├── 0001_Branch___10_22_ct201branch-ssh_tut.json
│   │       ├── 0002_cortex_LLMs_11-1-25.json
│   │       ├── 0003_RAG_beta.json
│   │       ├── 0005_Cortex_v0_4_0_planning.json
│   │       ├── 0006_Cortex_v0_4_0_Refinement.json
│   │       ├── 0009_Branch___Cortex_v0_4_0_planning.json
│   │       ├── 0012_Cortex_4_-_neomem_11-1-25.json
│   │       ├── 0016_Memory_consolidation_concept.json
│   │       ├── 0017_Model_inventory_review.json
│   │       ├── 0018_Branch___Memory_consolidation_concept.json
│   │       ├── 0022_Branch___Intake_conversation_summaries.json
│   │       ├── 0026_Intake_conversation_summaries.json
│   │       ├── 0027_Trilium_AI_LLM_setup.json
│   │       ├── 0028_LLMs_and_sycophancy_levels.json
│   │       ├── 0031_UI_improvement_plan.json
│   │       ├── 0035_10_27-neomem_update.json
│   │       ├── 0044_Install_llama_cpp_on_ct201.json
│   │       ├── 0045_AI_task_assistant.json
│   │       ├── 0047_Project_scope_creation.json
│   │       ├── 0052_View_docker_container_logs.json
│   │       ├── 0053_10_21-Proxmox_fan_control.json
│   │       ├── 0054_10_21-pytorch_branch_Quant_experiments.json
│   │       ├── 0055_10_22_ct201branch-ssh_tut.json
│   │       ├── 0060_Lyra_project_folder_issue.json
│   │       ├── 0062_Build_pytorch_API.json
│   │       ├── 0063_PokerBrain_dataset_structure.json
│   │       ├── 0065_Install_PyTorch_setup.json
│   │       ├── 0066_ROCm_PyTorch_setup_quirks.json
│   │       ├── 0067_VM_model_setup_steps.json
│   │       ├── 0070_Proxmox_disk_error_fix.json
│   │       ├── 0072_Docker_Compose_vs_Portainer.json
│   │       ├── 0073_Check_system_temps_Proxmox.json
│   │       ├── 0075_Cortex_gpu_progress.json
│   │       ├── 0076_Backup_Proxmox_before_upgrade.json
│   │       ├── 0077_Storage_cleanup_advice.json
│   │       ├── 0082_Install_ROCm_on_Proxmox.json
│   │       ├── 0088_Thalamus_program_summary.json
│   │       ├── 0094_Cortex_blueprint_development.json
│   │       ├── 0095_mem0_advancments.json
│   │       ├── 0096_Embedding_provider_swap.json
│   │       ├── 0097_Update_git_commit_steps.json
│   │       ├── 0098_AI_software_description.json
│   │       ├── 0099_Seed_memory_process.json
│   │       ├── 0100_Set_up_Git_repo.json
│   │       ├── 0101_Customize_embedder_setup.json
│   │       ├── 0102_Seeding_Local_Lyra_memory.json
│   │       ├── 0103_Mem0_seeding_part_3.json
│   │       ├── 0104_Memory_build_prompt.json
│   │       ├── 0105_Git_submodule_setup_guide.json
│   │       ├── 0106_Serve_UI_on_LAN.json
│   │       ├── 0107_AI_name_suggestion.json
│   │       ├── 0108_Room_X_planning_update.json
│   │       ├── 0109_Salience_filtering_design.json
│   │       ├── 0110_RoomX_Cortex_build.json
│   │       ├── 0119_Explain_Lyra_cortex_idea.json
│   │       ├── 0120_Git_submodule_organization.json
│   │       ├── 0121_Web_UI_fix_guide.json
│   │       ├── 0122_UI_development_planning.json
│   │       ├── 0123_NVGRAM_debugging_steps.json
│   │       ├── 0124_NVGRAM_setup_troubleshooting.json
│   │       ├── 0125_NVGRAM_development_update.json
│   │       ├── 0126_RX_-_NeVGRAM_New_Features.json
│   │       ├── 0127_Error_troubleshooting_steps.json
│   │       ├── 0135_Proxmox_backup_with_ABB.json
│   │       ├── 0151_Auto-start_Lyra-Core_VM.json
│   │       ├── 0156_AI_GPU_benchmarks_comparison.json
│   │       └── 0251_Lyra_project_handoff.json
│   ├── chromadb
│   │   ├── c4f701ee-1978-44a1-9df4-3e865b5d33c1
│   │   │   ├── data_level0.bin
│   │   │   ├── header.bin
│   │   │   ├── index_metadata.pickle
│   │   │   ├── length.bin
│   │   │   └── link_lists.bin
│   │   └── chroma.sqlite3
│   ├── .env
│   ├── import.log
│   ├── lyra-chatlogs
│   │   ├── 0000_Wire_ROCm_to_Cortex.json
│   │   ├── 0001_Branch___10_22_ct201branch-ssh_tut.json
│   │   ├── 0002_cortex_LLMs_11-1-25.json
│   │   └── 0003_RAG_beta.json
│   ├── rag_api.py
│   ├── rag_build.py
│   ├── rag_chat_import.py
│   └── rag_query.py
├── README.md
├── vllm-mi50.md
└── volumes
    ├── neo4j_data
    │   ├── databases
    │   │   ├── neo4j
    │   │   │   ├── database_lock
    │   │   │   ├── id-buffer.tmp.0
    │   │   │   ├── neostore
    │   │   │   ├── neostore.counts.db
    │   │   │   ├── neostore.indexstats.db
    │   │   │   ├── neostore.labeltokenstore.db
    │   │   │   ├── neostore.labeltokenstore.db.id
    │   │   │   ├── neostore.labeltokenstore.db.names
    │   │   │   ├── neostore.labeltokenstore.db.names.id
    │   │   │   ├── neostore.nodestore.db
    │   │   │   ├── neostore.nodestore.db.id
    │   │   │   ├── neostore.nodestore.db.labels
    │   │   │   ├── neostore.nodestore.db.labels.id
    │   │   │   ├── neostore.propertystore.db
    │   │   │   ├── neostore.propertystore.db.arrays
    │   │   │   ├── neostore.propertystore.db.arrays.id
    │   │   │   ├── neostore.propertystore.db.id
    │   │   │   ├── neostore.propertystore.db.index
    │   │   │   ├── neostore.propertystore.db.index.id
    │   │   │   ├── neostore.propertystore.db.index.keys
    │   │   │   ├── neostore.propertystore.db.index.keys.id
    │   │   │   ├── neostore.propertystore.db.strings
    │   │   │   ├── neostore.propertystore.db.strings.id
    │   │   │   ├── neostore.relationshipgroupstore.db
    │   │   │   ├── neostore.relationshipgroupstore.db.id
    │   │   │   ├── neostore.relationshipgroupstore.degrees.db
    │   │   │   ├── neostore.relationshipstore.db
    │   │   │   ├── neostore.relationshipstore.db.id
    │   │   │   ├── neostore.relationshiptypestore.db
    │   │   │   ├── neostore.relationshiptypestore.db.id
    │   │   │   ├── neostore.relationshiptypestore.db.names
    │   │   │   ├── neostore.relationshiptypestore.db.names.id
    │   │   │   ├── neostore.schemastore.db
    │   │   │   ├── neostore.schemastore.db.id
    │   │   │   └── schema
    │   │   │       └── index
    │   │   │           └── token-lookup-1.0
    │   │   │               ├── 1
    │   │   │               │   └── index-1
    │   │   │               └── 2
    │   │   │                   └── index-2
    │   │   ├── store_lock
    │   │   └── system
    │   │       ├── database_lock
    │   │       ├── id-buffer.tmp.0
    │   │       ├── neostore
    │   │       ├── neostore.counts.db
    │   │       ├── neostore.indexstats.db
    │   │       ├── neostore.labeltokenstore.db
    │   │       ├── neostore.labeltokenstore.db.id
    │   │       ├── neostore.labeltokenstore.db.names
    │   │       ├── neostore.labeltokenstore.db.names.id
    │   │       ├── neostore.nodestore.db
    │   │       ├── neostore.nodestore.db.id
    │   │       ├── neostore.nodestore.db.labels
    │   │       ├── neostore.nodestore.db.labels.id
    │   │       ├── neostore.propertystore.db
    │   │       ├── neostore.propertystore.db.arrays
    │   │       ├── neostore.propertystore.db.arrays.id
    │   │       ├── neostore.propertystore.db.id
    │   │       ├── neostore.propertystore.db.index
    │   │       ├── neostore.propertystore.db.index.id
    │   │       ├── neostore.propertystore.db.index.keys
    │   │       ├── neostore.propertystore.db.index.keys.id
    │   │       ├── neostore.propertystore.db.strings
    │   │       ├── neostore.propertystore.db.strings.id
    │   │       ├── neostore.relationshipgroupstore.db
    │   │       ├── neostore.relationshipgroupstore.db.id
    │   │       ├── neostore.relationshipgroupstore.degrees.db
    │   │       ├── neostore.relationshipstore.db
    │   │       ├── neostore.relationshipstore.db.id
    │   │       ├── neostore.relationshiptypestore.db
    │   │       ├── neostore.relationshiptypestore.db.id
    │   │       ├── neostore.relationshiptypestore.db.names
    │   │       ├── neostore.relationshiptypestore.db.names.id
    │   │       ├── neostore.schemastore.db
    │   │       ├── neostore.schemastore.db.id
    │   │       └── schema
    │   │           └── index
    │   │               ├── range-1.0
    │   │               │   ├── 3
    │   │               │   │   └── index-3
    │   │               │   ├── 4
    │   │               │   │   └── index-4
    │   │               │   ├── 7
    │   │               │   │   └── index-7
    │   │               │   ├── 8
    │   │               │   │   └── index-8
    │   │               │   └── 9
    │   │               │       └── index-9
    │   │               └── token-lookup-1.0
    │   │                   ├── 1
    │   │                   │   └── index-1
    │   │                   └── 2
    │   │                       └── index-2
    │   ├── dbms
    │   │   └── auth.ini
    │   ├── server_id
    │   └── transactions
    │       ├── neo4j
    │       │   ├── checkpoint.0
    │       │   └── neostore.transaction.db.0
    │       └── system
    │           ├── checkpoint.0
    │           └── neostore.transaction.db.0
    └── postgres_data [error opening dir]
81 directories, 376 files