# Project Lyra - README v0.5.1

Lyra is a modular persistent AI companion system with advanced reasoning capabilities.
It provides memory-backed chat using **NeoMem** + **Relay** + **Cortex**,
with a multi-stage reasoning pipeline powered by HTTP-based LLM backends.

**Current Version:** v0.5.1 (2025-12-11)

## Mission Statement

The point of Project Lyra is to give an AI chatbot more abilities than a typical chatbot. Typical chatbots are essentially amnesiac and forget everything about your project. Lyra keeps your projects organized and remembers everything you have done. Think of her as a notepad, schedule, database, co-creator, and collaborator rolled into one, with an executive function of its own. Say something in passing, and Lyra remembers it and reminds you of it later.

---

## Architecture Overview

Project Lyra operates as a **single docker-compose deployment** with multiple Docker containers networked together in a microservices architecture. Just as the brain has regions, Lyra has modules:

### Core Services

**1. Relay** (Node.js/Express) - Port 7078
- Main orchestrator and message router
- Coordinates all module interactions
- OpenAI-compatible endpoint: `POST /v1/chat/completions`
- Internal endpoint: `POST /chat`
- Routes messages through the Cortex reasoning pipeline
- Manages async calls to NeoMem and Cortex ingest

**2. UI** (Static HTML)
- Browser-based chat interface with cyberpunk theme
- Connects to Relay
- Saves and loads sessions
- OpenAI-compatible message format

**3. NeoMem** (Python/FastAPI) - Port 7077
- Long-term memory database (fork of Mem0 OSS)
- Vector storage (PostgreSQL + pgvector) + Graph storage (Neo4j)
- RESTful API: `/memories`, `/search`
- Semantic memory updates and retrieval
- No external SDK dependencies - fully local

### Reasoning Layer

**4. Cortex** (Python/FastAPI) - Port 7081
- Primary reasoning engine with multi-stage pipeline
- **Includes embedded Intake module** (no separate service as of v0.5.1)
- **4-Stage Processing:**
  1. **Reflection** - Generates meta-awareness notes about the conversation
  2. **Reasoning** - Creates an initial draft answer using context
  3. **Refinement** - Polishes and improves the draft
  4. **Persona** - Applies Lyra's personality and speaking style
- Integrates with Intake for short-term context via internal Python imports
- Flexible LLM router supporting multiple backends via HTTP
- **Endpoints:**
  - `POST /reason` - Main reasoning pipeline
  - `POST /ingest` - Receives conversation exchanges from Relay
  - `GET /health` - Service health check
  - `GET /debug/sessions` - Inspect in-memory SESSIONS state
  - `GET /debug/summary` - Test summarization for a session

**5. Intake** (Python Module) - **Embedded in Cortex**
- **No longer a standalone service** - runs as a Python module inside the Cortex container
- Short-term memory management with a session-based circular buffer
- In-memory SESSIONS dictionary: `session_id → {buffer: deque(maxlen=200), created_at: timestamp}`
- Multi-level summarization (L1/L5/L10/L20/L30) produced by `summarize_context()`
- Deferred summarization - actual summary generation happens during the `/reason` call
- Internal Python API (see the sketch below):
  - `add_exchange_internal(exchange)` - Direct function call from Cortex
  - `summarize_context(session_id, exchanges)` - Async LLM-based summarization
  - `SESSIONS` - Module-level global state (requires a single Uvicorn worker)
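
The internal API above boils down to a small amount of module-level state. A minimal sketch of the structures described here, assuming the documented shapes (illustrative only; the real `cortex/intake` code may differ in its details):

```python
# Minimal sketch of the Intake state described above (not the actual module).
import time
from collections import deque

# session_id → {"buffer": deque(maxlen=200), "created_at": timestamp}
SESSIONS: dict[str, dict] = {}

def add_exchange_internal(exchange: dict) -> None:
    """Append one user/assistant exchange to the session's circular buffer."""
    session_id = exchange["session_id"]
    session = SESSIONS.setdefault(
        session_id,
        {"buffer": deque(maxlen=200), "created_at": time.time()},
    )
    session["buffer"].append(exchange)
```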

### LLM Backends (HTTP-based)

**All LLM communication is done via HTTP APIs:**
- **PRIMARY**: llama.cpp server (`http://10.0.0.44:8080`) - AMD MI50 GPU backend
- **SECONDARY**: Ollama server (`http://10.0.0.3:11434`) - RTX 3090 backend
  - Model: qwen2.5:7b-instruct-q4_K_M
- **CLOUD**: OpenAI API (`https://api.openai.com/v1`) - Cloud-based models
  - Model: gpt-4o-mini
- **FALLBACK**: Local backup (`http://10.0.0.41:11435`) - Emergency fallback
  - Model: llama-3.2-8b-instruct

Each module can be configured to use a different backend via environment variables.
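
A hedged sketch of how that per-module selection can be resolved from the environment (illustrative; the actual `llm_router.py` may use different names and defaults, and the `OPENAI` special case is an assumption based on the variables listed later in this README):

```python
# Resolve (base_url, model) for a module from env vars such as CORTEX_LLM=SECONDARY,
# LLM_SECONDARY_URL, LLM_SECONDARY_MODEL. Illustrative sketch only.
import os

def resolve_backend(module: str) -> tuple[str, str]:
    choice = os.getenv(f"{module}_LLM", "PRIMARY")  # PRIMARY / SECONDARY / OPENAI / ...
    if choice == "OPENAI":
        return (os.getenv("LLM_OPENAI_URL", "https://api.openai.com/v1"),
                os.getenv("LLM_OPENAI_MODEL", "gpt-4o-mini"))
    return (os.getenv(f"LLM_{choice}_URL", ""),
            os.getenv(f"LLM_{choice}_MODEL", ""))

# Example: with CORTEX_LLM=SECONDARY set, this returns the Ollama URL and model.
url, model = resolve_backend("CORTEX")
```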

---

## Data Flow Architecture (v0.5.1)

### Normal Message Flow:

```
User (UI) → POST /v1/chat/completions
      ↓
Relay (7078)
      ↓ POST /reason
Cortex (7081)
      ↓ (internal Python call)
Intake module → summarize_context()
      ↓
Cortex processes (4 stages):
  1. reflection.py     → meta-awareness notes (CLOUD backend)
  2. reasoning.py      → draft answer (PRIMARY backend)
  3. refine.py         → refined answer (PRIMARY backend)
  4. persona/speak.py  → Lyra personality (CLOUD backend)
      ↓
Returns persona answer to Relay
      ↓
Relay → POST /ingest (async)
      ↓
Cortex → add_exchange_internal() → SESSIONS buffer
      ↓
Relay → NeoMem /memories (async, planned)
      ↓
Relay → UI (returns final response)
```
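
Relay is Node.js, but the shape of that round trip is easy to illustrate: the `/reason` call is awaited because the reply depends on it, while the `/ingest` call is fired asynchronously so the user never waits on memory writes. A Python/httpx sketch of the same flow (not Relay's actual code; the request and response field names are assumptions for illustration):

```python
# Mirrors the flow above: await /reason, then fire /ingest without blocking the reply.
import asyncio
import httpx

CORTEX_REASON_URL = "http://cortex:7081/reason"
CORTEX_INGEST_URL = "http://cortex:7081/ingest"

async def handle_chat(session_id: str, user_msg: str) -> str:
    async with httpx.AsyncClient(timeout=120) as client:
        r = await client.post(CORTEX_REASON_URL,
                              json={"session_id": session_id, "user_msg": user_msg})
        answer = r.json().get("answer", "")  # field name assumed for illustration

    # Fire-and-forget ingest: short-term memory is updated off the critical path.
    asyncio.create_task(_ingest(session_id, user_msg, answer))
    return answer

async def _ingest(session_id: str, user_msg: str, assistant_msg: str) -> None:
    async with httpx.AsyncClient(timeout=30) as client:
        await client.post(CORTEX_INGEST_URL, json={
            "session_id": session_id,
            "user_msg": user_msg,
            "assistant_msg": assistant_msg,
        })
```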

### Cortex 4-Stage Reasoning Pipeline:

1. **Reflection** (`reflection.py`) - Cloud LLM (OpenAI)
   - Analyzes user intent and conversation context
   - Generates meta-awareness notes
   - "What is the user really asking?"

2. **Reasoning** (`reasoning.py`) - Primary LLM (llama.cpp)
   - Retrieves short-term context from the Intake module
   - Creates the initial draft answer
   - Integrates context, reflection notes, and the user prompt

3. **Refinement** (`refine.py`) - Primary LLM (llama.cpp)
   - Polishes the draft answer
   - Improves clarity and coherence
   - Ensures factual consistency

4. **Persona** (`speak.py`) - Cloud LLM (OpenAI)
   - Applies Lyra's personality and speaking style
   - Natural, conversational output
   - Final answer returned to the user
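
Put together, the pipeline is a simple chain in which each stage's output feeds the next and each stage can target a different backend. A minimal sketch, assuming a generic `call_llm(backend, prompt)` helper standing in for the HTTP LLM router (the real stage modules may structure their prompts and signatures differently):

```python
# Illustrative chaining of the four stages; not the actual Cortex source.
async def reason_pipeline(user_msg: str, context: str, call_llm) -> str:
    # 1. Reflection (CLOUD): meta-awareness notes about what is really being asked.
    notes = await call_llm("CLOUD", f"Analyze the user's intent:\n{user_msg}")

    # 2. Reasoning (PRIMARY): first draft, grounded in Intake's short-term context.
    draft = await call_llm(
        "PRIMARY", f"Context:\n{context}\n\nNotes:\n{notes}\n\nUser:\n{user_msg}")

    # 3. Refinement (PRIMARY): improve clarity, coherence, factual consistency.
    refined = await call_llm("PRIMARY", f"Refine this draft:\n{draft}")

    # 4. Persona (CLOUD): apply Lyra's voice; this is the answer the user sees.
    return await call_llm("CLOUD", f"Rewrite this in Lyra's voice:\n{refined}")
```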

---

## Features

### Core Services

**Relay**:
- Main orchestrator and message router
- OpenAI-compatible endpoint: `POST /v1/chat/completions`
- Internal endpoint: `POST /chat`
- Health check: `GET /_health`
- Async non-blocking calls to Cortex
- Shared request handler for code reuse
- Comprehensive error handling

**NeoMem (Memory Engine)**:
- Forked from Mem0 OSS - fully independent
- Drop-in compatible API (`/memories`, `/search`)
- Local-first: runs on FastAPI with Postgres + Neo4j
- No external SDK dependencies
- Semantic memory updates - compares embeddings and performs in-place updates
- Default service: `neomem-api` (port 7077)
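
Because the API is drop-in compatible with Mem0 OSS, client calls look like standard Mem0-style requests. A hedged example (the field names below follow the Mem0 OSS convention and are assumptions here; confirm the exact schema against the running NeoMem service):

```python
# Hypothetical NeoMem client calls; verify payload shapes against the service.
import httpx

NEOMEM_API = "http://localhost:7077"

def store_memory(user_id: str, text: str) -> dict:
    r = httpx.post(f"{NEOMEM_API}/memories", json={
        "messages": [{"role": "user", "content": text}],
        "user_id": user_id,
    })
    r.raise_for_status()
    return r.json()

def search_memories(user_id: str, query: str) -> dict:
    r = httpx.post(f"{NEOMEM_API}/search", json={"query": query, "user_id": user_id})
    r.raise_for_status()
    return r.json()
```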

**UI**:
- Lightweight static HTML chat interface
- Cyberpunk theme
- Session save/load functionality
- OpenAI message format support

### Reasoning Layer

**Cortex** (v0.5.1):
- Multi-stage reasoning pipeline (reflection → reasoning → refine → persona)
- Flexible LLM backend routing via HTTP
- Per-stage backend selection
- Async processing throughout
- Embedded Intake module for short-term context
- `/reason`, `/ingest`, `/health`, `/debug/sessions`, `/debug/summary` endpoints
- Lenient error handling - never fails the chat pipeline (see the sketch below)
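
"Lenient" here means that optional stages degrade gracefully instead of breaking the chat. A minimal sketch of that policy, assuming a hypothetical `safe_stage` wrapper (not the actual Cortex error-handling code):

```python
# If an optional stage fails, log it and fall back rather than raising,
# so the chat pipeline always returns something to Relay. Illustrative only.
import logging

logger = logging.getLogger("cortex")

async def safe_stage(name: str, coro, fallback: str = "") -> str:
    try:
        return await coro
    except Exception:  # deliberately broad: the chat must keep flowing
        logger.exception("stage %s failed; using fallback output", name)
        return fallback
```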

**Intake** (Embedded Module):
- **Architectural change**: now runs as a Python module inside the Cortex container
- In-memory SESSIONS management (session_id → buffer)
- Multi-level summarization: L1 (ultra-short), L5 (short), L10 (medium), L20 (detailed), L30 (full)
- Deferred summarization strategy - summaries are generated during the `/reason` call, as sketched below
- `bg_summarize()` is a logging stub - actual work is deferred
- **Single-worker constraint**: SESSIONS requires a single Uvicorn worker or Redis/shared storage
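
A sketch of that deferred flow, using the documented import path and the `summarize_context(session_id, exchanges)` signature (the helper below and the return shape of the summary are assumptions, not the actual Cortex code):

```python
# /ingest only buffers the exchange; the L1/L5/L10/L20/L30 summaries are produced
# later, inside the /reason path. Illustrative sketch.
from intake.intake import SESSIONS, summarize_context

async def short_term_context(session_id: str):
    session = SESSIONS.get(session_id)
    if not session:
        return None
    exchanges = list(session["buffer"])
    # Summarization happens here, during /reason, not at ingest time.
    return await summarize_context(session_id, exchanges)
```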

**LLM Router**:
- Dynamic backend selection via HTTP
- Environment-driven configuration
- Support for llama.cpp, Ollama, OpenAI, custom endpoints
- Per-module backend preferences:
  - `CORTEX_LLM=SECONDARY` (Ollama for reasoning)
  - `INTAKE_LLM=PRIMARY` (llama.cpp for summarization)
  - `SPEAK_LLM=OPENAI` (Cloud for persona)
  - `NEOMEM_LLM=PRIMARY` (llama.cpp for memory operations)

### Beta Lyrae (RAG Memory DB) - Currently Disabled

- **RAG Knowledge DB - Beta Lyrae (sheliak)**
- This module implements the **Retrieval-Augmented Generation (RAG)** layer for Project Lyra.
- It serves as the long-term searchable memory store that Cortex and Relay can query for relevant context before reasoning or response generation.
- **Status**: Disabled in docker-compose.yml (v0.5.1)

The system uses:
- **ChromaDB** for persistent vector storage
- **OpenAI Embeddings (`text-embedding-3-small`)** for semantic similarity
- **FastAPI** (port 7090) for the `/rag/search` REST endpoint

Directory Layout:
```
rag/
├── rag_chat_import.py   # imports JSON chat logs
├── rag_docs_import.py   # (planned) PDF/EPUB/manual importer
├── rag_build.py         # legacy single-folder builder
├── rag_query.py         # command-line query helper
├── rag_api.py           # FastAPI service providing /rag/search
├── chromadb/            # persistent vector store
├── chatlogs/            # organized source data
│   ├── poker/
│   ├── work/
│   ├── lyra/
│   ├── personal/
│   └── ...
└── import.log           # progress log for batch runs
```

**OpenAI chatlog importer features:**
- Recursive folder indexing with **category detection** from directory name
- Smart chunking for long messages (5,000 chars per slice)
- Automatic deduplication using SHA-1 hash of file + chunk
- Timestamps for both file modification and import time
- Full progress logging via tqdm
- Safe to run in background with `nohup … &`
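
The chunking and deduplication scheme is simple enough to sketch: long messages are cut into 5,000-character slices, and each slice gets an ID derived from a SHA-1 hash over the source file plus the chunk text. The helper below is illustrative; the real `rag_chat_import.py` may derive its IDs differently:

```python
# Sketch of the importer's chunking + dedup scheme described above.
import hashlib

CHUNK_SIZE = 5_000  # characters per slice

def chunk_message(text: str, size: int = CHUNK_SIZE) -> list[str]:
    return [text[i:i + size] for i in range(0, len(text), size)]

def chunk_id(file_path: str, chunk: str) -> str:
    # Identical chunks from the same file hash to the same ID, which makes
    # re-imports easy to deduplicate.
    return hashlib.sha1(f"{file_path}\n{chunk}".encode("utf-8")).hexdigest()
```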

---

## Docker Deployment

All services run in a single docker-compose stack with the following containers:

**Active Services:**
- **neomem-postgres** - PostgreSQL with pgvector extension (port 5432)
- **neomem-neo4j** - Neo4j graph database (ports 7474, 7687)
- **neomem-api** - NeoMem memory service (port 7077)
- **relay** - Main orchestrator (port 7078)
- **cortex** - Reasoning engine with embedded Intake (port 7081)

**Disabled Services:**
- **intake** - No longer needed (embedded in Cortex as of v0.5.1)
- **rag** - Beta Lyrae RAG service (port 7090) - currently disabled

All containers communicate via the `lyra_net` Docker bridge network.

## External LLM Services

The following LLM backends are accessed via HTTP (not part of docker-compose):

- **llama.cpp Server** (`http://10.0.0.44:8080`)
  - AMD MI50 GPU-accelerated inference
  - Primary backend for reasoning and refinement stages
  - Model path: `/model`

- **Ollama Server** (`http://10.0.0.3:11434`)
  - RTX 3090 GPU-accelerated inference
  - Secondary/configurable backend
  - Model: qwen2.5:7b-instruct-q4_K_M

- **OpenAI API** (`https://api.openai.com/v1`)
  - Cloud-based inference
  - Used for reflection and persona stages
  - Model: gpt-4o-mini

- **Fallback Server** (`http://10.0.0.41:11435`)
  - Emergency backup endpoint
  - Local llama-3.2-8b-instruct model

---

## Version History

### v0.5.1 (2025-12-11) - Current Release

**Critical Intake Integration Fixes:**
- ✅ Fixed `bg_summarize()` NameError preventing SESSIONS persistence
- ✅ Fixed `/ingest` endpoint unreachable code
- ✅ Added `cortex/intake/__init__.py` for proper package structure
- ✅ Added diagnostic logging to verify SESSIONS singleton behavior
- ✅ Added `/debug/sessions` and `/debug/summary` endpoints
- ✅ Documented single-worker constraint in Dockerfile
- ✅ Implemented lenient error handling (never fails chat pipeline)
- ✅ Intake now embedded in Cortex - no longer standalone service

**Architecture Changes:**
- Intake module runs inside Cortex container as pure Python import
- No HTTP calls between Cortex and Intake (internal function calls)
- SESSIONS persist correctly in Uvicorn worker
- Deferred summarization strategy (summaries generated during `/reason`)

### v0.5.0 (2025-11-28)
- ✅ Fixed all critical API wiring issues
- ✅ Added OpenAI-compatible endpoint to Relay (`/v1/chat/completions`)
- ✅ Fixed Cortex → Intake integration
- ✅ Added missing Python package `__init__.py` files
- ✅ End-to-end message flow verified and working

### Infrastructure v1.0.0 (2025-11-26)
- Consolidated 9 scattered `.env` files into single source of truth
- Multi-backend LLM strategy implemented
- Docker Compose consolidation
- Created `.env.example` security templates

### v0.4.x (Major Rewire)
- Cortex multi-stage reasoning pipeline
- LLM router with multi-backend support
- Major architectural restructuring

### v0.3.x
- Beta Lyrae RAG system
- NeoMem integration
- Basic Cortex reasoning loop

---

## Known Issues (v0.5.1)

### Critical (Fixed in v0.5.1)
- ~~Intake SESSIONS not persisting~~ ✅ **FIXED**
- ~~`bg_summarize()` NameError~~ ✅ **FIXED**
- ~~`/ingest` endpoint unreachable code~~ ✅ **FIXED**

### Non-Critical
- Session management endpoints not fully implemented in Relay
- RAG service currently disabled in docker-compose.yml
- NeoMem integration in Relay not yet active (planned for v0.5.2)

### Operational Notes
- **Single-worker constraint**: Cortex must run with a single Uvicorn worker to maintain SESSIONS state
- Multi-worker scaling requires migrating SESSIONS to Redis or shared storage
- Diagnostic endpoints (`/debug/sessions`, `/debug/summary`) available for troubleshooting

### Future Enhancements
- Re-enable RAG service integration
- Implement full session persistence
- Migrate SESSIONS to Redis for multi-worker support
- Add request correlation IDs for tracing
- Comprehensive health checks across all services
- NeoMem integration in Relay

---

## Quick Start

### Prerequisites
- Docker + Docker Compose
- At least one HTTP-accessible LLM endpoint (llama.cpp, Ollama, or OpenAI API key)

### Setup

1. Copy `.env.example` to `.env` and configure your LLM backend URLs and API keys:

```bash
# Required: Configure at least one LLM backend
LLM_PRIMARY_URL=http://10.0.0.44:8080      # llama.cpp
LLM_SECONDARY_URL=http://10.0.0.3:11434    # Ollama
OPENAI_API_KEY=sk-...                      # OpenAI
```

2. Start all services with docker-compose:

```bash
docker-compose up -d
```

3. Check service health:

```bash
# Relay health
curl http://localhost:7078/_health

# Cortex health
curl http://localhost:7081/health

# NeoMem health
curl http://localhost:7077/health
```

4. Access the UI at `http://localhost:7078`

### Test

**Test Relay → Cortex pipeline:**
```bash
curl -X POST http://localhost:7078/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Hello Lyra!"}],
    "session_id": "test"
  }'
```
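
Because the endpoint is OpenAI-compatible, an OpenAI-style client should work as well. A hedged example using the official `openai` Python package (the placeholder API key, the `model` value, and passing `session_id` via `extra_body` are assumptions about how Relay handles those fields):

```python
# Drive Relay through the OpenAI Python client (v1.x); mirrors the curl example above.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:7078/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="lyra",  # illustrative; Relay routes to its configured backends
    messages=[{"role": "user", "content": "Hello Lyra!"}],
    extra_body={"session_id": "test"},
)
print(resp.choices[0].message.content)
```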

**Test Cortex /ingest endpoint:**
```bash
curl -X POST http://localhost:7081/ingest \
  -H "Content-Type: application/json" \
  -d '{
    "session_id": "test",
    "user_msg": "Hello",
    "assistant_msg": "Hi there!"
  }'
```

**Inspect SESSIONS state:**
```bash
curl http://localhost:7081/debug/sessions
```

**Get summary for a session:**
```bash
curl "http://localhost:7081/debug/summary?session_id=test"
```

All backend databases (PostgreSQL and Neo4j) are automatically started as part of the docker-compose stack.

---

## Environment Variables

### LLM Backend Configuration

**Backend URLs (Full API endpoints):**
```bash
LLM_PRIMARY_URL=http://10.0.0.44:8080      # llama.cpp
LLM_PRIMARY_MODEL=/model

LLM_SECONDARY_URL=http://10.0.0.3:11434    # Ollama
LLM_SECONDARY_MODEL=qwen2.5:7b-instruct-q4_K_M

LLM_OPENAI_URL=https://api.openai.com/v1
LLM_OPENAI_MODEL=gpt-4o-mini
OPENAI_API_KEY=sk-...
```

**Module-specific backend selection:**
```bash
CORTEX_LLM=SECONDARY   # Use Ollama for reasoning
INTAKE_LLM=PRIMARY     # Use llama.cpp for summarization
SPEAK_LLM=OPENAI       # Use OpenAI for persona
NEOMEM_LLM=PRIMARY     # Use llama.cpp for memory
UI_LLM=OPENAI          # Use OpenAI for UI
RELAY_LLM=PRIMARY      # Use llama.cpp for relay
```

### Database Configuration
```bash
POSTGRES_USER=neomem
POSTGRES_PASSWORD=neomempass
POSTGRES_DB=neomem
POSTGRES_HOST=neomem-postgres
POSTGRES_PORT=5432

NEO4J_URI=bolt://neomem-neo4j:7687
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=neomemgraph
```

### Service URLs (Internal Docker Network)
```bash
NEOMEM_API=http://neomem-api:7077
CORTEX_API=http://cortex:7081
CORTEX_REASON_URL=http://cortex:7081/reason
CORTEX_INGEST_URL=http://cortex:7081/ingest
RELAY_URL=http://relay:7078
```

### Feature Flags
```bash
CORTEX_ENABLED=true
MEMORY_ENABLED=true
PERSONA_ENABLED=false
DEBUG_PROMPT=true
VERBOSE_DEBUG=true
```

For complete environment variable reference, see [ENVIRONMENT_VARIABLES.md](ENVIRONMENT_VARIABLES.md).

---

## Documentation

- [CHANGELOG.md](CHANGELOG.md) - Detailed version history
- [PROJECT_SUMMARY.md](PROJECT_SUMMARY.md) - Comprehensive project overview for AI context
- [ENVIRONMENT_VARIABLES.md](ENVIRONMENT_VARIABLES.md) - Environment variable reference
- [DEPRECATED_FILES.md](DEPRECATED_FILES.md) - Deprecated files and migration guide

---

## Troubleshooting

### SESSIONS not persisting
**Symptom:** Intake buffer always shows 0 exchanges, summaries always empty.

**Solution (Fixed in v0.5.1):**
- Ensure `cortex/intake/__init__.py` exists
- Check Cortex logs for the `[Intake Module Init]` message showing the SESSIONS object ID
- Verify single-worker mode (Dockerfile: `uvicorn main:app --workers 1`)
- Use the `/debug/sessions` endpoint to inspect current state

### Cortex connection errors
**Symptom:** Relay can't reach Cortex; 502 errors.

**Solution:**
- Verify the Cortex container is running: `docker ps | grep cortex`
- Check Cortex health: `curl http://localhost:7081/health`
- Verify environment variables: `CORTEX_REASON_URL=http://cortex:7081/reason`
- Check the Docker network: `docker network inspect lyra_net`

### LLM backend timeouts
**Symptom:** Reasoning stage hangs or times out.

**Solution:**
- Verify the LLM backend is running and accessible
- Check LLM backend health: `curl http://10.0.0.44:8080/health`
- Increase the timeout in `llm_router.py` if using slow models
- Check logs for specific backend errors

---

## License

NeoMem is a derivative work based on Mem0 OSS (Apache 2.0).
© 2025 Terra-Mechanics / ServersDown Labs. All modifications released under Apache 2.0.

**Built with Claude Code**

---

## Integration Notes

- NeoMem API is compatible with Mem0 OSS endpoints (`/memories`, `/search`)
- All services communicate via Docker internal networking on the `lyra_net` bridge
- History and entity graphs are managed via PostgreSQL + Neo4j
- LLM backends are accessed via HTTP and configured in `.env`
- Intake module is imported internally by Cortex (no HTTP communication)
- SESSIONS state is maintained in-memory within the Cortex container

---

## Beta Lyrae - RAG Memory System (Currently Disabled)

**Note:** The RAG service is currently disabled in docker-compose.yml.

### Requirements
- Python 3.10+
- Dependencies: `chromadb openai tqdm python-dotenv fastapi uvicorn`
- Persistent storage: `./chromadb` or `/mnt/data/lyra_rag_db`

### Setup

1. Import chat logs (must be in OpenAI message format):

```bash
python3 rag/rag_chat_import.py
```

2. Build and start the RAG API server:

```bash
cd rag
python3 rag_build.py
uvicorn rag_api:app --host 0.0.0.0 --port 7090
```

3. Query the RAG system:

```bash
curl -X POST http://127.0.0.1:7090/rag/search \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What is the current state of Cortex?",
    "where": {"category": "lyra"}
  }'
```

---

## Development Notes

### Cortex Architecture (v0.5.1)
- Cortex contains the embedded Intake module at `cortex/intake/`
- Intake is imported as: `from intake.intake import add_exchange_internal, SESSIONS`
- SESSIONS is a module-level global dictionary (singleton pattern)
- Single-worker constraint required to maintain SESSIONS state
- Diagnostic endpoints available for debugging: `/debug/sessions`, `/debug/summary`
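
For orientation, a minimal sketch of how a Cortex route can use that import directly (illustrative; the actual FastAPI app and request model in Cortex will differ):

```python
# Hypothetical route body showing the in-process call: no HTTP hop to Intake.
from fastapi import FastAPI
from intake.intake import SESSIONS, add_exchange_internal

app = FastAPI()

@app.post("/ingest")
async def ingest(exchange: dict):
    # exchange is expected to carry session_id / user_msg / assistant_msg
    add_exchange_internal(exchange)
    return {"ok": True, "sessions": len(SESSIONS)}
```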

### Adding New LLM Backends

1. Add backend URL to `.env`:

```bash
LLM_CUSTOM_URL=http://your-backend:port
LLM_CUSTOM_MODEL=model-name
```

2. Configure module to use new backend:

```bash
CORTEX_LLM=CUSTOM
```

3. Restart Cortex container:

```bash
docker-compose restart cortex
```

### Debugging Tips
- Enable verbose logging: `VERBOSE_DEBUG=true` in `.env`
- Check Cortex logs: `docker logs cortex -f`
- Inspect SESSIONS: `curl http://localhost:7081/debug/sessions`
- Test summarization: `curl "http://localhost:7081/debug/summary?session_id=test"`
- Check Relay logs: `docker logs relay -f`
- Monitor Docker network: `docker network inspect lyra_net`