chore: nuke legacy code, keep design docs for restart
Preserved on the archive branch. Keeping only the architecture and design thinking that survives the rewrite:

- docs/ARCH_v0-6-1.md (Inner Self / Executive / Chat / Persona model)
- docs/ARCHITECTURE_v0-6-0.md (predecessor architecture)
- docs/PROJECT_SUMMARY.md (project history and rationale)
- docs/PROJECT_LYRA_COMPLETE_BREAKDOWN.md (detailed design notes)
- docs/ENVIRONMENT_VARIABLES.md (multi-backend env conventions)
- docs/LLMS.md
- docs/TRILLIUM_API.md (for future tool integration)

Removed: all service code (cortex, core/relay, neomem, rag, sandbox, persona-sidecar), docker-compose, migration/logging docs, stale root test scripts, CHANGELOG.
@@ -1,902 +0,0 @@
|
||||
# Project Lyra - README v0.9.0
|
||||
|
||||
Lyra is a modular persistent AI companion system with advanced reasoning capabilities and autonomous decision-making.
It provides memory-backed chat using **Relay** + **Cortex** with an integrated **Autonomy System**,
featuring a multi-stage reasoning pipeline powered by HTTP-based LLM backends.
|
||||
|
||||
**NEW in v0.9.0:** Trilium Notes integration - Search and create notes from conversations
|
||||
|
||||
**Current Version:** v0.9.0 (2025-12-29)
|
||||
|
||||
> **Note:** As of v0.6.0, NeoMem is **disabled by default** while we work out integration hiccups in the pipeline. The autonomy system is being refined independently before full memory integration.
|
||||
|
||||
## Mission Statement
|
||||
|
||||
The point of Project Lyra is to give an AI chatbot more abilities than a typical chatbot. Typical chatbots are essentially amnesic and forget everything about your project. Lyra helps keep projects organized and remembers everything you have done. Think of her abilities as a notepad/schedule/database/co-creator/collaborator, all with its own executive function. Say something in passing, and Lyra remembers it, then reminds you of it later.
|
||||
|
||||
---
|
||||
|
||||
## Architecture Overview
|
||||
|
||||
Project Lyra operates as a **single docker-compose deployment** with multiple Docker containers networked together in a microservices architecture. Just as the brain has regions, Lyra has modules:
|
||||
|
||||
### Core Services
|
||||
|
||||
**1. Relay** (Node.js/Express) - Port 7078
|
||||
- Main orchestrator and message router
|
||||
- Coordinates all module interactions
|
||||
- OpenAI-compatible endpoint: `POST /v1/chat/completions`
|
||||
- Internal endpoint: `POST /chat`
|
||||
- Dual-mode routing: Standard Mode (simple chat) or Cortex Mode (full reasoning)
|
||||
- Server-side session persistence with file-based storage
|
||||
- Session management API: `GET/POST/PATCH/DELETE /sessions`
|
||||
- Manages async calls to Cortex ingest
|
||||
- *(NeoMem integration currently disabled in v0.6.0)*
|
||||
|
||||
**2. UI** (Static HTML) - Port 8081 (nginx)
- Browser-based chat interface with cyberpunk theme
- Mode selector (Standard/Cortex) in header
- Settings modal with backend selection and session management
- Light/Dark mode toggle (dark by default)
- **NEW in v0.8.0:** "🧠 Show Work" button for real-time thinking stream
  - Opens popup window with live SSE connection
  - Color-coded events: thinking, tool calls, results, completion
  - Auto-scrolling with animations
  - Session-aware (matches current chat session)
- Server-synced session management (persists across browsers and reboots)
- OpenAI-compatible message format
|
||||
|
||||
**3. NeoMem** (Python/FastAPI) - Port 7077 - **DISABLED IN v0.6.0**
|
||||
- Long-term memory database (fork of Mem0 OSS)
|
||||
- Vector storage (PostgreSQL + pgvector) + Graph storage (Neo4j)
|
||||
- RESTful API: `/memories`, `/search`
|
||||
- Semantic memory updates and retrieval
|
||||
- No external SDK dependencies - fully local
|
||||
- **Status:** Currently disabled while pipeline integration is refined
|
||||
|
||||
### Reasoning Layer
|
||||
|
||||
**4. Cortex** (Python/FastAPI) - Port 7081
- Primary reasoning engine with multi-stage pipeline and autonomy system
- **Includes embedded Intake module** (no separate service as of v0.5.1)
- **Integrated Autonomy System** (NEW in v0.6.0) - See Autonomy System section below
- **Tool Calling System** (NEW in v0.8.0) - Agentic execution for Standard Mode
  - Sandboxed code execution (Python, JavaScript, Bash)
  - Web search via Tavily API
  - **Trilium knowledge base integration** (NEW in v0.9.0)
  - Multi-iteration autonomous tool use (max 5 iterations)
  - Real-time thinking stream via SSE
- **Dual Operating Modes:**
  - **Standard Mode** (v0.7.0) - Simple chatbot with context retention + tool calling (v0.8.0)
    - Bypasses reflection, reasoning, refinement stages
    - Direct LLM call with conversation history
    - User-selectable backend (SECONDARY, OPENAI, or custom)
    - **NEW:** Autonomous tool calling for code execution, web search, knowledge queries
    - **NEW:** "Show Your Work" real-time thinking stream
    - Faster responses for coding and practical tasks
  - **Cortex Mode** - Full 4-stage reasoning pipeline
    1. **Reflection** - Generates meta-awareness notes about conversation
    2. **Reasoning** - Creates initial draft answer using context
    3. **Refinement** - Polishes and improves the draft
    4. **Persona** - Applies Lyra's personality and speaking style
- Integrates with Intake for short-term context via internal Python imports
- Flexible LLM router supporting multiple backends via HTTP
- **Endpoints:**
  - `POST /reason` - Main reasoning pipeline (Cortex Mode)
  - `POST /simple` - Direct LLM chat with tool calling (Standard Mode)
  - `GET /stream/thinking/{session_id}` - SSE stream for thinking events **NEW in v0.8.0**
  - `POST /ingest` - Receives conversation exchanges from Relay
  - `GET /health` - Service health check
  - `GET /debug/sessions` - Inspect in-memory SESSIONS state
  - `GET /debug/summary` - Test summarization for a session
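
The multi-iteration tool use listed for Cortex (max 5 iterations) can be pictured as a bounded loop: ask the model, run any tool it requests, feed the result back, and stop at the cap. The sketch below is only an illustration under assumptions. The `ask_llm` callable, the reply shapes (`{"content": ...}` or `{"tool": ..., "args": ...}`), and the `tools` registry are hypothetical and are not taken from the Cortex code.

```python
MAX_ITERATIONS = 5  # cap stated in the feature list


def run_tool_loop(ask_llm, tools, user_message, max_iterations=MAX_ITERATIONS):
    """Run a bounded tool-calling loop.

    ask_llm(messages) is assumed to return either {"content": str} for a
    final answer or {"tool": name, "args": dict} to request a tool call.
    """
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_iterations):
        reply = ask_llm(messages)
        if "tool" not in reply:
            return reply["content"]  # final answer, stop iterating
        # Execute the requested tool and hand its result back to the model.
        result = tools[reply["tool"]](**reply.get("args", {}))
        messages.append(
            {"role": "tool", "name": reply["tool"], "content": str(result)}
        )
    return "Stopped: tool iteration limit reached."
```

The hard cap keeps a model that keeps requesting tools from looping indefinitely. What Cortex actually does when the limit is hit is not documented here, so the fallback string is a placeholder.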
|
||||
|
||||
**5. Intake** (Python Module) - **Embedded in Cortex**
- **No longer a standalone service** - runs as Python module inside Cortex container
- Short-term memory management with session-based circular buffer
- In-memory SESSIONS dictionary: `session_id → {buffer: deque(maxlen=200), created_at: timestamp}`
- Multi-level summarization (L1/L5/L10/L20/L30) produced by `summarize_context()`
- Deferred summarization - actual summary generation happens during `/reason` call
- Internal Python API:
  - `add_exchange_internal(exchange)` - Direct function call from Cortex
  - `summarize_context(session_id, exchanges)` - Async LLM-based summarization
  - `SESSIONS` - Module-level global state (requires single Uvicorn worker)
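
A minimal sketch of the SESSIONS layout described above, assuming the exchange dict carries the `session_id`, `user_msg`, and `assistant_msg` fields used by the `/ingest` example later in this README. It illustrates the circular-buffer behaviour and is not the actual Intake implementation.

```python
import time
from collections import deque

# Module-level state: session_id -> {"buffer": deque, "created_at": float}.
# Because this lives in process memory, every worker process would get its
# own copy, which is why a single Uvicorn worker is required.
SESSIONS = {}


def add_exchange_internal(exchange, maxlen=200):
    """Append one exchange to its session's circular buffer.

    Returns the number of exchanges currently held for that session.
    """
    session_id = exchange["session_id"]
    session = SESSIONS.setdefault(
        session_id,
        {"buffer": deque(maxlen=maxlen), "created_at": time.time()},
    )
    # deque(maxlen=...) silently drops the oldest entry once it is full.
    session["buffer"].append(exchange)
    return len(session["buffer"])
```

With `maxlen=200`, the 201st exchange evicts the first one, so memory per session stays bounded without explicit trimming.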
|
||||
|
||||
### LLM Backends (HTTP-based)
|
||||
|
||||
**All LLM communication is done via HTTP APIs:**
- **PRIMARY**: llama.cpp server (`http://10.0.0.44:8080`) - AMD MI50 GPU backend
- **SECONDARY**: Ollama server (`http://10.0.0.3:11434`) - RTX 3090 backend
  - Model: qwen2.5:7b-instruct-q4_K_M
- **CLOUD**: OpenAI API (`https://api.openai.com/v1`) - Cloud-based models
  - Model: gpt-4o-mini
- **FALLBACK**: Local backup (`http://10.0.0.41:11435`) - Emergency fallback
  - Model: llama-3.2-8b-instruct
|
||||
|
||||
Each module can be configured to use a different backend via environment variables.
|
||||
|
||||
### Autonomy System (NEW in v0.6.0)
|
||||
|
||||
**Cortex Autonomy Subsystems** - Multi-layered autonomous decision-making and learning
|
||||
- **Executive Layer** [cortex/autonomy/executive/](cortex/autonomy/executive/)
|
||||
- High-level planning and goal setting
|
||||
- Multi-step reasoning for complex objectives
|
||||
- Strategic decision making
|
||||
- **Decision Engine** [cortex/autonomy/tools/decision_engine.py](cortex/autonomy/tools/decision_engine.py)
|
||||
- Autonomous decision-making framework
|
||||
- Option evaluation and selection
|
||||
- Coordinated decision orchestration
|
||||
- **Autonomous Actions** [cortex/autonomy/actions/](cortex/autonomy/actions/)
|
||||
- Self-initiated action execution
|
||||
- Context-aware behavior implementation
|
||||
- Action logging and tracking
|
||||
- **Pattern Learning** [cortex/autonomy/learning/](cortex/autonomy/learning/)
|
||||
- Learns from interaction patterns
|
||||
- Identifies recurring user needs
|
||||
- Adaptive behavior refinement
|
||||
- **Proactive Monitoring** [cortex/autonomy/proactive/](cortex/autonomy/proactive/)
|
||||
- System state monitoring
|
||||
- Intervention opportunity detection
|
||||
- Background awareness capabilities
|
||||
- **Self-Analysis** [cortex/autonomy/self/](cortex/autonomy/self/)
|
||||
- Performance tracking and analysis
|
||||
- Cognitive pattern identification
|
||||
- Self-state persistence in [cortex/data/self_state.json](cortex/data/self_state.json)
|
||||
- **Orchestrator** [cortex/autonomy/tools/orchestrator.py](cortex/autonomy/tools/orchestrator.py)
|
||||
- Coordinates all autonomy subsystems
|
||||
- Manages tool selection and execution
|
||||
- Handles external integrations (with enable/disable controls)
|
||||
|
||||
**Autonomy Architecture:**
|
||||
The autonomy system operates in coordinated layers, all maintaining state in `self_state.json`:
|
||||
1. Executive Layer → Planning and goals
|
||||
2. Decision Layer → Evaluation and choices
|
||||
3. Action Layer → Execution
|
||||
4. Learning Layer → Pattern adaptation
|
||||
5. Monitoring Layer → Proactive awareness
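
Since every layer keeps its state in `self_state.json`, one way to picture the shared file is a JSON object with one key per layer, updated atomically. The schema below (one sub-object per layer, free-form fields) and the `update_self_state` helper are assumptions for illustration. The real file format is not described in this README.

```python
import json
import os
import tempfile
from pathlib import Path


def update_self_state(path, layer, **fields):
    """Merge fields into one layer's section of the state file.

    Hypothetical schema: {"<layer>": {"<field>": value, ...}, ...}.
    """
    path = Path(path)
    state = json.loads(path.read_text()) if path.exists() else {}
    state.setdefault(layer, {}).update(fields)
    # Write to a temp file in the same directory, then rename over the
    # target so a reader never sees a half-written file.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle, indent=2)
    os.replace(tmp_name, path)
    return state
```

The write-then-rename step matters when several subsystems touch the same file. It does not protect against lost updates between concurrent writers, which would need a lock.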
|
||||
|
||||
---
|
||||
|
||||
## Data Flow Architecture (v0.7.0)
|
||||
|
||||
### Standard Mode Flow (NEW in v0.7.0):
|
||||
|
||||
```
User (UI) → POST /v1/chat/completions {mode: "standard", backend: "SECONDARY"}
  ↓
Relay (7078)
  ↓ POST /simple
Cortex (7081)
  ↓ (internal Python call)
Intake module → get_recent_messages() (last 20 messages)
  ↓
Direct LLM call (user-selected backend: SECONDARY/OPENAI/custom)
  ↓
Returns simple response to Relay
  ↓
Relay → POST /ingest (async)
  ↓
Cortex → add_exchange_internal() → SESSIONS buffer
  ↓
Relay → POST /sessions/:id (save session to file)
  ↓
Relay → UI (returns final response)

Note: Bypasses reflection, reasoning, refinement, persona stages
```
|
||||
|
||||
### Cortex Mode Flow (Full Reasoning):
|
||||
|
||||
```
User (UI) → POST /v1/chat/completions {mode: "cortex"}
  ↓
Relay (7078)
  ↓ POST /reason
Cortex (7081)
  ↓ (internal Python call)
Intake module → summarize_context()
  ↓
Autonomy System → Decision evaluation & pattern learning
  ↓
Cortex processes (4 stages):
  1. reflection.py → meta-awareness notes (CLOUD backend)
  2. reasoning.py → draft answer (PRIMARY backend, autonomy-aware)
  3. refine.py → refined answer (PRIMARY backend)
  4. persona/speak.py → Lyra personality (CLOUD backend, autonomy-aware)
  ↓
Returns persona answer to Relay
  ↓
Relay → POST /ingest (async)
  ↓
Cortex → add_exchange_internal() → SESSIONS buffer
  ↓
Autonomy System → Update self_state.json (pattern tracking)
  ↓
Relay → POST /sessions/:id (save session to file)
  ↓
Relay → UI (returns final response)

Note: NeoMem integration disabled in v0.6.0
```
|
||||
|
||||
### Session Persistence Flow (NEW in v0.7.0):
|
||||
|
||||
```
UI loads → GET /sessions → Relay → List all sessions from files → UI dropdown
User sends message → POST /sessions/:id → Relay → Save to sessions/*.json
User renames session → PATCH /sessions/:id/metadata → Relay → Update *.meta.json
User deletes session → DELETE /sessions/:id → Relay → Remove session files

Sessions stored in: core/relay/sessions/
- {sessionId}.json (conversation history)
- {sessionId}.meta.json (name, timestamps, metadata)
```
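
The two-file layout above (`{sessionId}.json` for history, `{sessionId}.meta.json` for metadata) can be sketched as follows. Relay itself is Node.js, so this Python version only illustrates the on-disk convention. The function names and the `created_at` / `updated_at` / `name` metadata fields are assumptions.

```python
import json
import tempfile
import time
from pathlib import Path


def save_session(root, session_id, history, name=None):
    """Write the conversation history and refresh the metadata file."""
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    (root / f"{session_id}.json").write_text(json.dumps(history))
    meta_path = root / f"{session_id}.meta.json"
    if meta_path.exists():
        meta = json.loads(meta_path.read_text())
    else:
        meta = {"created_at": time.time()}
    if name is not None:
        meta["name"] = name
    meta["updated_at"] = time.time()
    meta_path.write_text(json.dumps(meta))


def list_sessions(root):
    """Return session ids, derived from the *.meta.json files present."""
    suffix = ".meta.json"
    return sorted(p.name[: -len(suffix)] for p in Path(root).glob("*" + suffix))


def delete_session(root, session_id):
    """Remove both files belonging to a session, ignoring missing ones."""
    for suffix in (".json", ".meta.json"):
        (Path(root) / f"{session_id}{suffix}").unlink(missing_ok=True)
```

Keeping metadata in a separate file means a rename only rewrites the small `.meta.json`, not the full conversation history.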
|
||||
|
||||
### Cortex 4-Stage Reasoning Pipeline:
|
||||
|
||||
1. **Reflection** (`reflection.py`) - Cloud LLM (OpenAI)
|
||||
- Analyzes user intent and conversation context
|
||||
- Generates meta-awareness notes
|
||||
- "What is the user really asking?"
|
||||
|
||||
2. **Reasoning** (`reasoning.py`) - Primary LLM (llama.cpp)
|
||||
- Retrieves short-term context from Intake module
|
||||
- Creates initial draft answer
|
||||
- Integrates context, reflection notes, and user prompt
|
||||
|
||||
3. **Refinement** (`refine.py`) - Primary LLM (llama.cpp)
|
||||
- Polishes the draft answer
|
||||
- Improves clarity and coherence
|
||||
- Ensures factual consistency
|
||||
|
||||
4. **Persona** (`speak.py`) - Cloud LLM (OpenAI)
|
||||
- Applies Lyra's personality and speaking style
|
||||
- Natural, conversational output
|
||||
- Final answer returned to user
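
The four stages above chain together roughly as shown in this sketch. It assumes a hypothetical async `call_llm(backend, stage, prompt)` helper standing in for the HTTP calls, and it simplifies how context and reflection notes are combined into the reasoning prompt. The stage-to-backend mapping follows the list above.

```python
import asyncio


async def call_llm(backend, stage, prompt):
    """Hypothetical stand-in for an HTTP request to the chosen backend."""
    return f"[{stage}@{backend}] {prompt}"


async def run_pipeline(user_prompt, context="", llm=call_llm):
    """Run reflection -> reasoning -> refinement -> persona in order."""
    # 1. Reflection (cloud): meta-awareness notes about the request.
    notes = await llm("CLOUD", "reflection", user_prompt)
    # 2. Reasoning (primary): draft from context, notes, and the prompt.
    draft = await llm(
        "PRIMARY", "reasoning", f"{context}\n{notes}\n{user_prompt}"
    )
    # 3. Refinement (primary): polish the draft.
    refined = await llm("PRIMARY", "refinement", draft)
    # 4. Persona (cloud): apply Lyra's voice to the refined answer.
    return await llm("CLOUD", "persona", refined)
```

Calling `asyncio.run(run_pipeline("Hello Lyra!"))` exercises the chain with the stub. Swapping `llm` for a real client would leave the stage order unchanged.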
|
||||
|
||||
---
|
||||
|
||||
## Features
|
||||
|
||||
### Core Services
|
||||
|
||||
**Relay**:
|
||||
- Main orchestrator and message router
|
||||
- OpenAI-compatible endpoint: `POST /v1/chat/completions`
|
||||
- Internal endpoint: `POST /chat`
|
||||
- Health check: `GET /_health`
|
||||
- **NEW:** Dual-mode routing (Standard/Cortex)
|
||||
- **NEW:** Server-side session persistence with CRUD API
|
||||
- **NEW:** Session management endpoints:
|
||||
- `GET /sessions` - List all sessions
|
||||
- `GET /sessions/:id` - Retrieve session history
|
||||
- `POST /sessions/:id` - Save session history
|
||||
- `PATCH /sessions/:id/metadata` - Update session metadata
|
||||
- `DELETE /sessions/:id` - Delete session
|
||||
- Async non-blocking calls to Cortex
|
||||
- Shared request handler for code reuse
|
||||
- Comprehensive error handling
|
||||
|
||||
**NeoMem (Memory Engine)**:
|
||||
- Forked from Mem0 OSS - fully independent
|
||||
- Drop-in compatible API (`/memories`, `/search`)
|
||||
- Local-first: runs on FastAPI with Postgres + Neo4j
|
||||
- No external SDK dependencies
|
||||
- Semantic memory updates - compares embeddings and performs in-place updates
|
||||
- Default service: `neomem-api` (port 7077)
|
||||
|
||||
**UI**:
|
||||
- Lightweight static HTML chat interface
|
||||
- Cyberpunk theme with light/dark mode toggle
|
||||
- **NEW:** Mode selector (Standard/Cortex) in header
|
||||
- **NEW:** Settings modal (⚙ button) with:
|
||||
- Backend selection for Standard Mode (SECONDARY/OPENAI/custom)
|
||||
- Session management (view, delete sessions)
|
||||
- Theme toggle (dark mode default)
|
||||
- **NEW:** Server-synced session management
|
||||
- Sessions persist across browsers and reboots
|
||||
- Rename sessions with custom names
|
||||
- Delete sessions with confirmation
|
||||
- Automatic session save on every message
|
||||
- OpenAI message format support
|
||||
|
||||
### Reasoning Layer
|
||||
|
||||
**Cortex** (v0.7.0):
|
||||
- **NEW:** Dual operating modes:
|
||||
- **Standard Mode** - Simple chat with context (`/simple` endpoint)
|
||||
- User-selectable backend (SECONDARY, OPENAI, or custom)
|
||||
- Full conversation history via Intake integration
|
||||
- Bypasses reasoning pipeline for faster responses
|
||||
- **Cortex Mode** - Full reasoning pipeline (`/reason` endpoint)
|
||||
- Multi-stage processing: reflection → reasoning → refine → persona
|
||||
- Per-stage backend selection
|
||||
- Autonomy system integration
|
||||
- Flexible LLM backend routing via HTTP
|
||||
- Async processing throughout
|
||||
- Embedded Intake module for short-term context
|
||||
- `/reason`, `/simple`, `/ingest`, `/health`, `/debug/sessions`, `/debug/summary` endpoints
|
||||
- Lenient error handling - never fails the chat pipeline
|
||||
|
||||
**Intake** (Embedded Module):
|
||||
- **Architectural change**: Now runs as Python module inside Cortex container
|
||||
- In-memory SESSIONS management (session_id → buffer)
|
||||
- Multi-level summarization: L1 (ultra-short), L5 (short), L10 (medium), L20 (detailed), L30 (full)
|
||||
- Deferred summarization strategy - summaries generated during `/reason` call
|
||||
- `bg_summarize()` is a logging stub - actual work deferred
|
||||
- **Single-worker constraint**: SESSIONS requires single Uvicorn worker or Redis/shared storage
|
||||
|
||||
**LLM Router**:
|
||||
- Dynamic backend selection via HTTP
|
||||
- Environment-driven configuration
|
||||
- Support for llama.cpp, Ollama, OpenAI, custom endpoints
|
||||
- Per-module backend preferences:
|
||||
- `CORTEX_LLM=SECONDARY` (Ollama for reasoning)
|
||||
- `INTAKE_LLM=PRIMARY` (llama.cpp for summarization)
|
||||
- `SPEAK_LLM=OPENAI` (Cloud for persona)
|
||||
- `NEOMEM_LLM=PRIMARY` (llama.cpp for memory operations)
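
As a rough illustration of environment-driven routing, the sketch below resolves a module's backend from `<MODULE>_LLM` and then looks up `LLM_<BACKEND>_URL` and `LLM_<BACKEND>_MODEL`, following the variable names in the Environment Variables section. The `resolve_backend` function and its default of `PRIMARY` are assumptions, not the router's actual code.

```python
import os


def resolve_backend(module, env=None, default="PRIMARY"):
    """Return (backend_name, url, model) for a module such as 'SPEAK'.

    Reads <MODULE>_LLM to pick the backend, then LLM_<BACKEND>_URL and
    LLM_<BACKEND>_MODEL for its endpoint and model name.
    """
    env = os.environ if env is None else env
    backend = env.get(f"{module}_LLM", default).upper()
    url = env.get(f"LLM_{backend}_URL")
    if url is None:
        raise KeyError(f"LLM_{backend}_URL is not set")
    model = env.get(f"LLM_{backend}_MODEL")  # may be None if unset
    return backend, url, model
```

Passing an explicit `env` dict makes the lookup easy to test without touching the real process environment.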
|
||||
|
||||
### Beta Lyrae (RAG Memory DB) - Currently Disabled
|
||||
|
||||
- **RAG Knowledge DB - Beta Lyrae (sheliak)**
|
||||
- This module implements the **Retrieval-Augmented Generation (RAG)** layer for Project Lyra.
|
||||
- It serves as the long-term searchable memory store that Cortex and Relay can query for relevant context before reasoning or response generation.
|
||||
- **Status**: Disabled in docker-compose.yml (v0.5.1)
|
||||
|
||||
The system uses:
|
||||
- **ChromaDB** for persistent vector storage
|
||||
- **OpenAI Embeddings (`text-embedding-3-small`)** for semantic similarity
|
||||
- **FastAPI** (port 7090) for the `/rag/search` REST endpoint
|
||||
|
||||
Directory Layout:
|
||||
```
rag/
├── rag_chat_import.py    # imports JSON chat logs
├── rag_docs_import.py    # (planned) PDF/EPUB/manual importer
├── rag_build.py          # legacy single-folder builder
├── rag_query.py          # command-line query helper
├── rag_api.py            # FastAPI service providing /rag/search
├── chromadb/             # persistent vector store
├── chatlogs/             # organized source data
│   ├── poker/
│   ├── work/
│   ├── lyra/
│   ├── personal/
│   └── ...
└── import.log            # progress log for batch runs
```
|
||||
|
||||
**OpenAI chatlog importer features:**
|
||||
- Recursive folder indexing with **category detection** from directory name
|
||||
- Smart chunking for long messages (5,000 chars per slice)
|
||||
- Automatic deduplication using SHA-1 hash of file + chunk
|
||||
- Timestamps for both file modification and import time
|
||||
- Full progress logging via tqdm
|
||||
- Safe to run in background with `nohup … &`
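
The chunking and deduplication steps listed above can be sketched like this. It assumes the SHA-1 is taken over the file path plus the chunk text. The exact hash input used by `rag_chat_import.py` is not specified here, and the helper names are hypothetical.

```python
import hashlib

CHUNK_SIZE = 5000  # characters per slice, as stated in the feature list


def chunk_text(text, size=CHUNK_SIZE):
    """Split text into consecutive slices of at most `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def chunk_id(file_path, chunk):
    """SHA-1 over file path + chunk (assumed input), as a hex digest."""
    digest = hashlib.sha1()
    digest.update(file_path.encode("utf-8"))
    digest.update(b"\0")  # separator so path and chunk cannot blur together
    digest.update(chunk.encode("utf-8"))
    return digest.hexdigest()


def new_chunks(file_path, text, seen):
    """Return (id, chunk) pairs not yet in `seen`, and record them there."""
    fresh = []
    for chunk in chunk_text(text):
        cid = chunk_id(file_path, chunk)
        if cid not in seen:
            seen.add(cid)
            fresh.append((cid, chunk))
    return fresh
```

Because the id depends only on the path and the content, re-running an import over the same files yields no new chunks, which is what makes background batch runs safe to repeat.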
|
||||
|
||||
---
|
||||
|
||||
## Docker Deployment
|
||||
|
||||
All services run in a single docker-compose stack with the following containers:
|
||||
|
||||
**Active Services:**
|
||||
- **relay** - Main orchestrator (port 7078)
|
||||
- **cortex** - Reasoning engine with embedded Intake and Autonomy System (port 7081)
|
||||
|
||||
**Disabled Services (v0.6.0):**
|
||||
- **neomem-postgres** - PostgreSQL with pgvector extension (port 5432) - *disabled while refining pipeline*
|
||||
- **neomem-neo4j** - Neo4j graph database (ports 7474, 7687) - *disabled while refining pipeline*
|
||||
- **neomem-api** - NeoMem memory service (port 7077) - *disabled while refining pipeline*
|
||||
- **intake** - No longer needed (embedded in Cortex as of v0.5.1)
|
||||
- **rag** - Beta Lyrae RAG service (port 7090) - currently disabled
|
||||
|
||||
All containers communicate via the `lyra_net` Docker bridge network.
|
||||
|
||||
## External LLM Services
|
||||
|
||||
The following LLM backends are accessed via HTTP (not part of docker-compose):
|
||||
|
||||
- **llama.cpp Server** (`http://10.0.0.44:8080`)
|
||||
- AMD MI50 GPU-accelerated inference
|
||||
- Primary backend for reasoning and refinement stages
|
||||
- Model path: `/model`
|
||||
|
||||
- **Ollama Server** (`http://10.0.0.3:11434`)
|
||||
- RTX 3090 GPU-accelerated inference
|
||||
- Secondary/configurable backend
|
||||
- Model: qwen2.5:7b-instruct-q4_K_M
|
||||
|
||||
- **OpenAI API** (`https://api.openai.com/v1`)
|
||||
- Cloud-based inference
|
||||
- Used for reflection and persona stages
|
||||
- Model: gpt-4o-mini
|
||||
|
||||
- **Fallback Server** (`http://10.0.0.41:11435`)
|
||||
- Emergency backup endpoint
|
||||
- Local llama-3.2-8b-instruct model
|
||||
|
||||
---
|
||||
|
||||
## Version History
|
||||
|
||||
### v0.9.0 (2025-12-29) - Current Release
|
||||
**Major Feature: Trilium Notes Integration**
|
||||
- ✅ Added Trilium ETAPI integration for knowledge base access
|
||||
- ✅ `search_notes()` tool for searching personal notes during conversations
|
||||
- ✅ `create_note()` tool for capturing insights and information
|
||||
- ✅ ETAPI authentication with secure token management
|
||||
- ✅ Complete setup documentation and API reference
|
||||
- ✅ Environment configuration with feature flag (`ENABLE_TRILIUM`)
|
||||
- ✅ Automatic parent note handling (defaults to "root")
|
||||
- ✅ Connection error handling and user-friendly messages
|
||||
|
||||
**Key Capabilities:**
|
||||
- Search your Trilium notes during conversations for context
|
||||
- Create new notes from conversation insights automatically
|
||||
- Cross-reference information between chat and knowledge base
|
||||
- Future: Find duplicates, suggest organization, summarize notes
|
||||
|
||||
**Documentation:**
|
||||
- Added [TRILIUM_SETUP.md](TRILIUM_SETUP.md) - Complete setup guide
|
||||
- Added [docs/TRILIUM_API.md](docs/TRILIUM_API.md) - Full API reference
|
||||
|
||||
### v0.8.0 (2025-12-26)
|
||||
**Major Feature: Agentic Tool Calling + "Show Your Work"**
|
||||
- ✅ Added tool calling system for Standard Mode
|
||||
- ✅ Real-time thinking stream visualization
|
||||
- ✅ Sandboxed code execution (Python, JavaScript, Bash)
|
||||
- ✅ Web search integration via Tavily API
|
||||
- ✅ Server-Sent Events (SSE) for live tool execution updates
|
||||
|
||||
### v0.7.0 (2025-12-21)
|
||||
**Major Features: Standard Mode + Backend Selection + Session Persistence**
|
||||
- ✅ Added Standard Mode for simple chatbot functionality
|
||||
- ✅ UI mode selector (Standard/Cortex) in header
|
||||
- ✅ Settings modal with backend selection for Standard Mode
|
||||
- ✅ Server-side session persistence with file-based storage
|
||||
- ✅ Session management UI (view, rename, delete sessions)
|
||||
- ✅ Light/Dark mode toggle (dark by default)
|
||||
- ✅ Context retention in Standard Mode via Intake integration
|
||||
- ✅ Fixed modal positioning and z-index issues
|
||||
- ✅ Cortex `/simple` endpoint for direct LLM calls
|
||||
- ✅ Session CRUD API in Relay
|
||||
- ✅ Full backward compatibility - Cortex Mode unchanged
|
||||
|
||||
**Key Changes:**
|
||||
- Standard Mode bypasses the multi-stage reasoning pipeline for faster responses
|
||||
- Sessions now sync across browsers and survive container restarts
|
||||
- User can select SECONDARY (Ollama), OPENAI, or custom backend for Standard Mode
|
||||
- Theme preference and backend selection persisted in localStorage
|
||||
- Session files stored in `core/relay/sessions/` directory
|
||||
|
||||
### v0.6.0 (2025-12-18)
|
||||
**Major Feature: Autonomy System (Phase 1, 2, and 2.5)**
|
||||
- ✅ Added autonomous decision-making framework
|
||||
- ✅ Implemented executive planning and goal-setting layer
|
||||
- ✅ Added pattern learning system for adaptive behavior
|
||||
- ✅ Implemented proactive monitoring capabilities
|
||||
- ✅ Created self-analysis and performance tracking system
|
||||
- ✅ Integrated self-state persistence (`cortex/data/self_state.json`)
|
||||
- ✅ Built decision engine with orchestrator coordination
|
||||
- ✅ Added autonomous action execution framework
|
||||
- ✅ Integrated autonomy into reasoning and persona layers
|
||||
- ✅ Created comprehensive test suites for autonomy features
|
||||
- ✅ Added complete system breakdown documentation
|
||||
|
||||
**Architecture Changes:**
|
||||
- Autonomy system integrated into Cortex reasoning pipeline
|
||||
- Multi-layered autonomous decision-making architecture
|
||||
- Self-state tracking across sessions
|
||||
- NeoMem disabled by default while refining pipeline integration
|
||||
- Enhanced orchestrator with flexible service controls
|
||||
|
||||
**Documentation:**
|
||||
- Added [PROJECT_LYRA_COMPLETE_BREAKDOWN.md](docs/PROJECT_LYRA_COMPLETE_BREAKDOWN.md)
|
||||
- Updated changelog with comprehensive autonomy system details
|
||||
|
||||
### v0.5.1 (2025-12-11)
|
||||
**Critical Intake Integration Fixes:**
|
||||
- ✅ Fixed `bg_summarize()` NameError preventing SESSIONS persistence
|
||||
- ✅ Fixed `/ingest` endpoint unreachable code
|
||||
- ✅ Added `cortex/intake/__init__.py` for proper package structure
|
||||
- ✅ Added diagnostic logging to verify SESSIONS singleton behavior
|
||||
- ✅ Added `/debug/sessions` and `/debug/summary` endpoints
|
||||
- ✅ Documented single-worker constraint in Dockerfile
|
||||
- ✅ Implemented lenient error handling (never fails chat pipeline)
|
||||
- ✅ Intake now embedded in Cortex - no longer standalone service
|
||||
|
||||
**Architecture Changes:**
|
||||
- Intake module runs inside Cortex container as pure Python import
|
||||
- No HTTP calls between Cortex and Intake (internal function calls)
|
||||
- SESSIONS persist correctly in Uvicorn worker
|
||||
- Deferred summarization strategy (summaries generated during `/reason`)
|
||||
|
||||
### v0.5.0 (2025-11-28)
|
||||
- ✅ Fixed all critical API wiring issues
|
||||
- ✅ Added OpenAI-compatible endpoint to Relay (`/v1/chat/completions`)
|
||||
- ✅ Fixed Cortex → Intake integration
|
||||
- ✅ Added missing Python package `__init__.py` files
|
||||
- ✅ End-to-end message flow verified and working
|
||||
|
||||
### Infrastructure v1.0.0 (2025-11-26)
|
||||
- Consolidated 9 scattered `.env` files into single source of truth
|
||||
- Multi-backend LLM strategy implemented
|
||||
- Docker Compose consolidation
|
||||
- Created `.env.example` security templates
|
||||
|
||||
### v0.4.x (Major Rewire)
|
||||
- Cortex multi-stage reasoning pipeline
|
||||
- LLM router with multi-backend support
|
||||
- Major architectural restructuring
|
||||
|
||||
### v0.3.x
|
||||
- Beta Lyrae RAG system
|
||||
- NeoMem integration
|
||||
- Basic Cortex reasoning loop
|
||||
|
||||
---
|
||||
|
||||
## Known Issues (v0.7.0)
|
||||
|
||||
### Temporarily Disabled
|
||||
- **NeoMem disabled by default** - Being refined independently before full integration
|
||||
- PostgreSQL + pgvector storage inactive
|
||||
- Neo4j graph database inactive
|
||||
- Memory persistence endpoints not active
|
||||
- RAG service (Beta Lyrae) currently disabled in docker-compose.yml
|
||||
|
||||
### Standard Mode Limitations
|
||||
- No reflection, reasoning, or refinement stages (by design)
|
||||
- DeepSeek R1 not recommended for Standard Mode (generates reasoning artifacts)
|
||||
- No RAG integration (same as Cortex Mode - currently disabled)
|
||||
- No NeoMem memory storage (same as Cortex Mode - currently disabled)
|
||||
|
||||
### Session Management Limitations
|
||||
- Sessions stored in container filesystem - requires volume mount for true persistence
|
||||
- No session import/export functionality yet
|
||||
- No session search or filtering
|
||||
- Old localStorage sessions don't automatically migrate to server
|
||||
|
||||
### Operational Notes
|
||||
- **Single-worker constraint**: Cortex must run with single Uvicorn worker to maintain SESSIONS state
|
||||
- Multi-worker scaling requires migrating SESSIONS to Redis or shared storage
|
||||
- Diagnostic endpoints (`/debug/sessions`, `/debug/summary`) available for troubleshooting
|
||||
- Backend selection only affects Standard Mode - Cortex Mode uses environment-configured backends
|
||||
|
||||
### Future Enhancements
|
||||
- Re-enable NeoMem integration after pipeline refinement
|
||||
- Full autonomy system maturation and optimization
|
||||
- Re-enable RAG service integration
|
||||
- Session import/export functionality
|
||||
- Session search and filtering UI
|
||||
- Migrate SESSIONS to Redis for multi-worker support
|
||||
- Add request correlation IDs for tracing
|
||||
- Comprehensive health checks across all services
|
||||
- Enhanced pattern learning with long-term memory integration
|
||||
|
||||
---
|
||||
|
||||
## Quick Start
|
||||
|
||||
### Prerequisites
|
||||
- Docker + Docker Compose
|
||||
- At least one HTTP-accessible LLM endpoint (llama.cpp, Ollama, or OpenAI API key)
|
||||
|
||||
### Setup
|
||||
1. Copy `.env.example` to `.env` and configure your LLM backend URLs and API keys:
   ```bash
   # Required: Configure at least one LLM backend
   LLM_PRIMARY_URL=http://10.0.0.44:8080   # llama.cpp
   LLM_SECONDARY_URL=http://10.0.0.3:11434 # Ollama
   OPENAI_API_KEY=sk-...                   # OpenAI
   ```

2. Start all services with docker-compose:
   ```bash
   docker-compose up -d
   ```

3. Check service health:
   ```bash
   # Relay health
   curl http://localhost:7078/_health

   # Cortex health
   curl http://localhost:7081/health

   # NeoMem health (only when NeoMem is enabled; it is disabled by default as of v0.6.0)
   curl http://localhost:7077/health
   ```

4. Access the UI at `http://localhost:8081`
|
||||
|
||||
### Using the UI
|
||||
|
||||
**Mode Selection:**
|
||||
- Use the **Mode** dropdown in the header to switch between:
|
||||
- **Standard** - Simple chatbot for coding and practical tasks
|
||||
- **Cortex** - Full reasoning pipeline with autonomy features
|
||||
|
||||
**Settings Menu:**
|
||||
1. Click the **⚙ Settings** button in the header
|
||||
2. **Backend Selection** (Standard Mode only):
|
||||
- Choose **SECONDARY** (Ollama/Qwen on 3090) - Fast, local
|
||||
- Choose **OPENAI** (GPT-4o-mini) - Cloud-based, high quality
|
||||
- Enter custom backend name for advanced configurations
|
||||
3. **Session Management**:
|
||||
- View all saved sessions with message counts and timestamps
|
||||
- Click 🗑️ to delete unwanted sessions
|
||||
4. **Theme Toggle**:
|
||||
- Click **🌙 Dark Mode** or **☀️ Light Mode** to switch themes
|
||||
|
||||
**Session Management:**
|
||||
- Sessions automatically save on every message
|
||||
- Use the **Session** dropdown to switch between sessions
|
||||
- Click **➕ New** to create a new session
|
||||
- Click **✏️ Rename** to rename the current session
|
||||
- Sessions persist across browsers and container restarts
|
||||
|
||||
### Test
|
||||
|
||||
**Test Standard Mode:**
|
||||
```bash
curl -X POST http://localhost:7078/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "mode": "standard",
    "backend": "SECONDARY",
    "messages": [{"role": "user", "content": "Hello!"}],
    "sessionId": "test"
  }'
```
|
||||
|
||||
**Test Cortex Mode (Full Reasoning):**
|
||||
```bash
curl -X POST http://localhost:7078/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "mode": "cortex",
    "messages": [{"role": "user", "content": "Hello Lyra!"}],
    "sessionId": "test"
  }'
```
|
||||
|
||||
**Test Cortex /ingest endpoint:**
|
||||
```bash
curl -X POST http://localhost:7081/ingest \
  -H "Content-Type: application/json" \
  -d '{
    "session_id": "test",
    "user_msg": "Hello",
    "assistant_msg": "Hi there!"
  }'
```
|
||||
|
||||
**Inspect SESSIONS state:**
```bash
curl http://localhost:7081/debug/sessions
```

**Get summary for a session:**
```bash
curl "http://localhost:7081/debug/summary?session_id=test"
```

**List all sessions:**
```bash
curl http://localhost:7078/sessions
```

**Get session history:**
```bash
curl http://localhost:7078/sessions/sess-abc123
```

**Delete a session:**
```bash
curl -X DELETE http://localhost:7078/sessions/sess-abc123
```
|
||||
|
||||
All backend databases (PostgreSQL and Neo4j) are automatically started as part of the docker-compose stack.
|
||||
|
||||
---
|
||||
|
||||
## Environment Variables

### LLM Backend Configuration

**Backend URLs (Full API endpoints):**
```bash
LLM_PRIMARY_URL=http://10.0.0.44:8080          # llama.cpp
LLM_PRIMARY_MODEL=/model

LLM_SECONDARY_URL=http://10.0.0.3:11434        # Ollama
LLM_SECONDARY_MODEL=qwen2.5:7b-instruct-q4_K_M

LLM_OPENAI_URL=https://api.openai.com/v1
LLM_OPENAI_MODEL=gpt-4o-mini
OPENAI_API_KEY=sk-...
```

**Module-specific backend selection:**
```bash
CORTEX_LLM=SECONDARY          # Use Ollama for reasoning
INTAKE_LLM=PRIMARY            # Use llama.cpp for summarization
SPEAK_LLM=OPENAI              # Use OpenAI for persona
NEOMEM_LLM=PRIMARY            # Use llama.cpp for memory
UI_LLM=OPENAI                 # Use OpenAI for UI
RELAY_LLM=PRIMARY             # Use llama.cpp for relay
STANDARD_MODE_LLM=SECONDARY   # Default backend for Standard Mode (NEW in v0.7.0)
```

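The naming convention above suggests that a selector such as `CORTEX_LLM=SECONDARY` points at the matching `LLM_SECONDARY_URL` and `LLM_SECONDARY_MODEL` pair. The sketch below shows one way that lookup could work. It is an illustration under that assumption, not the project's actual router, and `resolve_backend` is a hypothetical name.

```python
import os


def resolve_backend(module, env=None):
    """Map a <MODULE>_LLM selector to its LLM_<NAME>_URL / LLM_<NAME>_MODEL pair."""
    env = os.environ if env is None else env
    selector_key = f"{module}_LLM"
    if selector_key not in env:
        raise KeyError(f"{selector_key} is not set")
    name = env[selector_key].strip().upper()
    url_key = f"LLM_{name}_URL"
    if url_key not in env:
        raise KeyError(f"{selector_key}={name} but {url_key} is not set")
    # The model entry is treated as optional in this sketch.
    return {"name": name, "url": env[url_key], "model": env.get(f"LLM_{name}_MODEL")}


# Values copied from the examples above.
example_env = {
    "LLM_PRIMARY_URL": "http://10.0.0.44:8080",
    "LLM_PRIMARY_MODEL": "/model",
    "LLM_SECONDARY_URL": "http://10.0.0.3:11434",
    "LLM_SECONDARY_MODEL": "qwen2.5:7b-instruct-q4_K_M",
    "CORTEX_LLM": "SECONDARY",
    "INTAKE_LLM": "PRIMARY",
}

print(resolve_backend("CORTEX", example_env))
print(resolve_backend("INTAKE", example_env))
```

Failing loudly on a missing URL makes a mistyped selector visible at startup instead of at the first request.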
### Database Configuration
```bash
POSTGRES_USER=neomem
POSTGRES_PASSWORD=neomempass
POSTGRES_DB=neomem
POSTGRES_HOST=neomem-postgres
POSTGRES_PORT=5432

NEO4J_URI=bolt://neomem-neo4j:7687
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=neomemgraph
```

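When checking connectivity by hand, it can help to combine the PostgreSQL variables above into a single connection URL. The sketch below assumes the standard `postgresql://user:password@host:port/db` URL form. How NeoMem itself builds its connection is not shown in this README, and `postgres_dsn` is a hypothetical helper.

```python
from urllib.parse import quote


def postgres_dsn(env):
    """Combine the POSTGRES_* variables into a postgresql:// connection URL."""
    # Percent-encode credentials so characters such as '@' or ':' stay unambiguous.
    user = quote(env["POSTGRES_USER"], safe="")
    password = quote(env["POSTGRES_PASSWORD"], safe="")
    database = quote(env["POSTGRES_DB"], safe="")
    host = env["POSTGRES_HOST"]
    port = env.get("POSTGRES_PORT", "5432")
    return f"postgresql://{user}:{password}@{host}:{port}/{database}"


# Values copied from the example configuration above.
db_env = {
    "POSTGRES_USER": "neomem",
    "POSTGRES_PASSWORD": "neomempass",
    "POSTGRES_DB": "neomem",
    "POSTGRES_HOST": "neomem-postgres",
    "POSTGRES_PORT": "5432",
}

print(postgres_dsn(db_env))
```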
### Service URLs (Internal Docker Network)
```bash
NEOMEM_API=http://neomem-api:7077
CORTEX_API=http://cortex:7081
CORTEX_REASON_URL=http://cortex:7081/reason
CORTEX_SIMPLE_URL=http://cortex:7081/simple   # NEW in v0.7.0
CORTEX_INGEST_URL=http://cortex:7081/ingest
RELAY_URL=http://relay:7078
```

### Feature Flags
```bash
CORTEX_ENABLED=true
MEMORY_ENABLED=true
PERSONA_ENABLED=false
DEBUG_PROMPT=true
VERBOSE_DEBUG=true
ENABLE_TRILIUM=true   # NEW in v0.9.0
```

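Environment variables are always strings, so `PERSONA_ENABLED=false` is truthy unless it is parsed explicitly. The sketch below shows one common way to read such flags. The set of accepted spellings and the helper name `flag_enabled` are assumptions, not taken from the Lyra code.

```python
TRUE_VALUES = {"1", "true", "yes", "on"}  # accepted spellings are an assumption


def flag_enabled(name, env, default=False):
    """Interpret an environment string as a boolean feature flag."""
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in TRUE_VALUES


# Values copied from the example configuration above.
flags_env = {
    "CORTEX_ENABLED": "true",
    "MEMORY_ENABLED": "true",
    "PERSONA_ENABLED": "false",
    "ENABLE_TRILIUM": "true",
}

for flag in sorted(flags_env):
    print(flag, flag_enabled(flag, flags_env))
```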
For the complete environment variable reference, see [ENVIRONMENT_VARIABLES.md](ENVIRONMENT_VARIABLES.md).

---

## Documentation

- [CHANGELOG.md](CHANGELOG.md) - Detailed version history
- [PROJECT_SUMMARY.md](PROJECT_SUMMARY.md) - Comprehensive project overview for AI context
- [ENVIRONMENT_VARIABLES.md](ENVIRONMENT_VARIABLES.md) - Environment variable reference
- [DEPRECATED_FILES.md](DEPRECATED_FILES.md) - Deprecated files and migration guide

---

## Troubleshooting

### SESSIONS not persisting
**Symptom:** Intake buffer always shows 0 exchanges, summaries always empty.

**Solution (Fixed in v0.5.1):**
- Ensure `cortex/intake/__init__.py` exists
- Check Cortex logs for `[Intake Module Init]` message showing SESSIONS object ID
- Verify single-worker mode (Dockerfile: `uvicorn main:app --workers 1`)
- Use `/debug/sessions` endpoint to inspect current state

### Cortex connection errors
**Symptom:** Relay can't reach Cortex, 502 errors.

**Solution:**
- Verify Cortex container is running: `docker ps | grep cortex`
- Check Cortex health: `curl http://localhost:7081/health`
- Verify environment variables: `CORTEX_REASON_URL=http://cortex:7081/reason`
- Check docker network: `docker network inspect lyra_net`

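The health check above can also be scripted. The sketch below uses only the Python standard library and treats any connection failure, timeout, or non-2xx response as unhealthy. The function name `is_healthy` is hypothetical, and the only endpoint detail taken from this README is the `/health` path.

```python
import http.client
import urllib.request


def is_healthy(url, timeout=2.0):
    """Return True if a GET to the URL answers with a 2xx status, else False."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError, refused connections, and timeouts are all OSError subclasses.
        return False


if __name__ == "__main__":
    # Host and port follow the curl example above.
    print("cortex healthy:", is_healthy("http://localhost:7081/health"))
```

Run it from the host for the `localhost` URL, or from inside another container on `lyra_net` with `http://cortex:7081/health` to test the internal route that Relay uses.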
### LLM backend timeouts
**Symptom:** Reasoning stage hangs or times out.

**Solution:**
- Verify LLM backend is running and accessible
- Check LLM backend health: `curl http://10.0.0.44:8080/health`
- Increase timeout in `llm_router.py` if using slow models
- Check logs for specific backend errors

---

## License

NeoMem is a derivative work based on Mem0 OSS (Apache 2.0).
© 2025 Terra-Mechanics / ServersDown Labs. All modifications released under Apache 2.0.

**Built with Claude Code**

---

## Integration Notes

- NeoMem API is compatible with Mem0 OSS endpoints (`/memories`, `/search`)
- All services communicate via Docker internal networking on the `lyra_net` bridge
- History and entity graphs are managed via PostgreSQL + Neo4j
- LLM backends are accessed via HTTP and configured in `.env`
- Intake module is imported internally by Cortex (no HTTP communication)
- SESSIONS state is maintained in-memory within Cortex container

---

## Beta Lyrae - RAG Memory System (Currently Disabled)

**Note:** The RAG service is currently disabled in `docker-compose.yml`.

### Requirements
- Python 3.10+
- Dependencies: `chromadb openai tqdm python-dotenv fastapi uvicorn`
- Persistent storage: `./chromadb` or `/mnt/data/lyra_rag_db`

### Setup
1. Import chat logs (must be in OpenAI message format):
   ```bash
   python3 rag/rag_chat_import.py
   ```

2. Build and start the RAG API server:
   ```bash
   cd rag
   python3 rag_build.py
   uvicorn rag_api:app --host 0.0.0.0 --port 7090
   ```

3. Query the RAG system:
   ```bash
   curl -X POST http://127.0.0.1:7090/rag/search \
     -H "Content-Type: application/json" \
     -d '{
       "query": "What is the current state of Cortex?",
       "where": {"category": "lyra"}
     }'
   ```

---

## Development Notes

### Cortex Architecture (v0.6.0)
- Cortex contains embedded Intake module at `cortex/intake/`
- Intake is imported as: `from intake.intake import add_exchange_internal, SESSIONS`
- SESSIONS is a module-level global dictionary (singleton pattern)
- Single-worker constraint required to maintain SESSIONS state
- Diagnostic endpoints available for debugging: `/debug/sessions`, `/debug/summary`
- **NEW:** Autonomy system integrated at `cortex/autonomy/`
  - Executive, decision, action, learning, and monitoring layers
  - Self-state persistence in `cortex/data/self_state.json`
  - Coordinated via orchestrator with flexible service controls

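The single-worker constraint follows from how module-level globals behave: each worker process imports the module separately and gets its own dictionary. The sketch below illustrates the pattern with a simplified buffer. It is not the real `add_exchange_internal`, and `MAX_EXCHANGES` and the field names (borrowed from the `/ingest` example) are assumptions.

```python
import os
from collections import deque

MAX_EXCHANGES = 50  # hypothetical cap on buffered exchanges per session

# Module-level state: created once per Python process when the module is imported.
SESSIONS = {}


def add_exchange(session_id, user_msg, assistant_msg):
    """Append one exchange to the session's in-memory buffer and return its size."""
    buffer = SESSIONS.setdefault(session_id, deque(maxlen=MAX_EXCHANGES))
    buffer.append({"user_msg": user_msg, "assistant_msg": assistant_msg})
    return len(buffer)


add_exchange("test", "Hello", "Hi there!")

# With several workers, each process would print a different pid and hold its own
# SESSIONS, so a request served by another worker would see an empty buffer.
print(f"pid={os.getpid()} sessions={list(SESSIONS)} exchanges={len(SESSIONS['test'])}")
```

Because the state lives only in process memory, it is also lost whenever the Cortex container restarts.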
### Adding New LLM Backends
1. Add backend URL to `.env`:
   ```bash
   LLM_CUSTOM_URL=http://your-backend:port
   LLM_CUSTOM_MODEL=model-name
   ```

2. Configure module to use new backend:
   ```bash
   CORTEX_LLM=CUSTOM
   ```

3. Restart Cortex container:
   ```bash
   docker-compose restart cortex
   ```

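After adding a backend, a quick consistency check can catch a selector that points at a backend with no URL. The sketch below scans a parsed `.env` mapping for `*_LLM` selectors and reports any whose `LLM_<NAME>_URL` is missing. It assumes the naming convention shown in the steps above, and `find_unconfigured_selectors` is a hypothetical helper.

```python
def find_unconfigured_selectors(env):
    """Return {selector: backend} for every *_LLM selector lacking an LLM_<NAME>_URL."""
    problems = {}
    for key, value in env.items():
        # Selectors look like CORTEX_LLM; backend definitions look like LLM_CUSTOM_URL.
        if key.endswith("_LLM") and not key.startswith("LLM_"):
            backend = value.strip().upper()
            if f"LLM_{backend}_URL" not in env:
                problems[key] = backend
    return problems


# Example: CUSTOM is defined, but PRIMARY was never given a URL.
new_backend_env = {
    "LLM_CUSTOM_URL": "http://your-backend:port",
    "LLM_CUSTOM_MODEL": "model-name",
    "CORTEX_LLM": "CUSTOM",
    "INTAKE_LLM": "PRIMARY",
}

print(find_unconfigured_selectors(new_backend_env))
```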
### Debugging Tips
- Enable verbose logging: `VERBOSE_DEBUG=true` in `.env`
- Check Cortex logs: `docker logs cortex -f`
- Check Relay logs: `docker logs relay -f`
- Inspect SESSIONS: `curl http://localhost:7081/debug/sessions`
- Test summarization: `curl "http://localhost:7081/debug/summary?session_id=test"`
- List sessions: `curl http://localhost:7078/sessions`
- Test Standard Mode: `curl -X POST http://localhost:7078/v1/chat/completions -H "Content-Type: application/json" -d '{"mode":"standard","backend":"SECONDARY","messages":[{"role":"user","content":"test"}],"sessionId":"test"}'`
- Monitor Docker network: `docker network inspect lyra_net`
- Check session files: `ls -la core/relay/sessions/`