# Project Lyra

**A streamlined AI conversation system with intelligent summarization and memory**

Lyra is a unified conversational AI system that processes your thoughts, summarizes conversations at multiple levels, and prepares them for semantic memory storage. Think of it as your personal thought processor: you dump ideas, it makes sense of them, and it stores both the raw conversation and progressive summaries.

**Current Version:** v1.0.0 (2026-02-23)

---

## Mission Statement

Project Lyra is designed to be your **external brain**. Unlike typical chatbots that forget everything, Lyra:
- **Captures** everything you say in raw form
- **Summarizes** conversations at multiple granularities (L1-L30)
- **Stores** both raw and summarized data for future retrieval
- **Prepares** everything for semantic search via vector embeddings (Nebula, coming soon)

You can vomit ideas at it, and Lyra will organize, summarize, and remember.

---

## Architecture Overview

Lyra runs as a **unified Docker container** with a clean separation of concerns:

```
┌─────────────────────────────────────────────┐
│          Unified Container (lyra)           │
│                                             │
│ ┌──────────────┐  ┌──────────────────────┐  │
│ │ Relay :7078  │  │ Cortex :7081         │  │
│ │ (Node.js)    │→ │ (Python FastAPI)     │  │
│ │              │  │                      │  │
│ │ - API Gateway│  │ - /reason (full)     │  │
│ │ - Sessions   │  │ - /simple (fast)     │  │
│ │ - OpenAI API │  │ - /ingest (intake)   │  │
│ └──────────────┘  └──────────────────────┘  │
│                            │                │
│                            ↓                │
│                     ┌──────────────┐        │
│                     │ Intake       │        │
│                     │ (embedded)   │        │
│                     │              │        │
│                     │ - L1-L30     │        │
│                     │ - Summary    │        │
│                     │ - Buffer     │        │
│                     └──────────────┘        │
│                            │                │
└────────────────────────────┼────────────────┘
                             ↓
                      ┌─────────────┐
                      │   Nebula    │ (coming soon)
                      │  (vector    │
                      │   storage)  │
                      └─────────────┘
```

### Components

**1. Relay (Node.js - Port 7078)**
- User-facing API gateway
- OpenAI-compatible endpoint: `POST /v1/chat/completions`
- Session management (save, load, rename, delete)
- Proxies requests to Cortex

**2. Cortex (Python - Port 7081)**
- Main reasoning and processing brain
- Multi-stage reasoning pipeline
- LLM routing to different backends
- Embedded Intake module

**3. Intake (Python Module - Embedded)**
- Short-term memory buffer (200 messages per session)
- Multi-level summarization:
  - **L1** (5 messages): Ultra-short summary
  - **L5** (10 messages): Short overview
  - **L10** (10 messages): "Reality Check" - tone, intent, direction
  - **L20** (merged L10s): "Session Overview" - progress and themes
  - **L30** (merged L20s): "Continuity Report" - high-level reflection
- Sends summaries to Nebula (HTTP POST with disk fallback)

**4. Nebula (Future - Port 7090)**
- Vector database for semantic memory
- RAG (Retrieval-Augmented Generation)
- Memory resurfacing based on similarity

---

## What Makes Lyra Different?

### Progressive Summarization
Most chatbots either keep raw history (expensive) or forget everything (useless). Lyra keeps the raw history and also builds layered summaries on top of it:
- **Raw storage**: Every conversation turn saved
- **L1-L30 summaries**: Multiple granularities for different use cases
  - L1: "What just happened?" (immediate context)
  - L10: "What's the vibe?" (tone and direction)
  - L20: "What did we accomplish?" (session overview)
  - L30: "What's the big picture?" (continuity across sessions)

### Nebula-Ready Architecture
Summaries are sent via HTTP to Nebula (when available), with automatic disk fallback:
```
.nebula_fallback/
└── {session_id}/
    ├── L10_20260223_203045.json
    ├── L20_20260223_204512.json
    └── L30_20260223_210030.json
```
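
The POST-then-fallback behaviour described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the Intake module's actual code: the `/summaries` route, the payload field names, and the `send_summary` and `fallback_path` helpers are hypothetical. Only the directory layout and the `L10_20260223_203045.json` file naming are taken from the tree above.

```python
import json
import urllib.request
from datetime import datetime
from pathlib import Path
from typing import Optional


def fallback_path(root: Path, session_id: str, level: str, now: datetime) -> Path:
    """Build {root}/{session_id}/{level}_{YYYYMMDD_HHMMSS}.json."""
    return root / session_id / f"{level}_{now:%Y%m%d_%H%M%S}.json"


def send_summary(
    session_id: str,
    level: str,
    summary: str,
    nebula_api: Optional[str] = None,
    root: Path = Path(".nebula_fallback"),
    now: Optional[datetime] = None,
) -> str:
    """POST a summary to Nebula; if that is not possible, write it to disk.

    Returns "nebula" or "disk" so the caller can log which path was taken.
    """
    now = now or datetime.now()
    payload = {  # field names are illustrative, not Nebula's real schema
        "session_id": session_id,
        "level": level,
        "summary": summary,
        "created_at": now.isoformat(),
    }

    if nebula_api:
        request = urllib.request.Request(
            f"{nebula_api}/summaries",  # hypothetical route
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        try:
            with urllib.request.urlopen(request, timeout=5):
                return "nebula"
        except OSError:
            pass  # unreachable, timed out, or HTTP error: fall back to disk

    path = fallback_path(root, session_id, level, now)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return "disk"
```

With no `nebula_api` configured, every call takes the disk branch, which matches the state of the project while Nebula is still listed as "coming soon".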

### Dual Mode Operation
- **Simple Mode** (`/simple`): Fast, direct LLM responses
- **Cortex Mode** (`/reason`): Full 4-stage reasoning pipeline
  1. Reflection (meta-awareness)
  2. Reasoning (draft)
  3. Refinement (polish)
  4. Persona (Lyra's voice)

---

## Quick Start

### Prerequisites
- Docker + Docker Compose
- At least one LLM backend (llama.cpp, Ollama, OpenAI API)

### Run It

```bash
# 1. Create .env file with your LLM backend
cp .env.example .env
# Edit .env with your LLM URLs and API keys

# 2. Build and start
docker-compose up -d --build

# 3. Check health
curl http://localhost:7078/_health  # Relay
curl http://localhost:7081/_health  # Cortex

# 4. Open UI
open http://localhost:8081
```
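
For scripted setups, the two health checks in step 3 can be polled until both services answer. The following is a rough sketch rather than part of the project: the `is_healthy` and `wait_until_healthy` helpers are hypothetical, and it assumes a healthy service replies to `GET /_health` with HTTP 200, which the README does not state explicitly.

```python
import time
import urllib.request

# URLs taken from the Quick Start steps above
HEALTH_URLS = {
    "relay": "http://localhost:7078/_health",
    "cortex": "http://localhost:7081/_health",
}


def is_healthy(url: str, timeout: float = 2.0) -> bool:
    """Return True if the endpoint answers with HTTP 200 (assumed success code)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:  # connection refused, timeout, or HTTP error status
        return False


def wait_until_healthy(urls=HEALTH_URLS, attempts=30, delay=1.0, check=is_healthy):
    """Poll until every service responds, or give up after `attempts` rounds."""
    pending = dict(urls)
    for _ in range(attempts):
        # Keep only the services that still fail their check
        pending = {name: url for name, url in pending.items() if not check(url)}
        if not pending:
            return True
        time.sleep(delay)
    return False
```

Calling `wait_until_healthy()` right after `docker-compose up -d --build` returns `True` once both Relay and Cortex respond, or `False` after roughly 30 seconds with the defaults shown.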

### Test It

```bash
# Simple chat
curl -X POST http://localhost:7078/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "mode": "standard",
    "messages": [{"role": "user", "content": "Hello!"}],
    "sessionId": "test"
  }'

# Full reasoning pipeline
curl -X POST http://localhost:7078/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "mode": "cortex",
    "messages": [{"role": "user", "content": "Explain quantum computing"}],
    "sessionId": "test"
  }'
```
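
The same two requests can be issued from Python using only the standard library. This is a sketch, assuming the request body shown in the curl examples. The `build_chat_request` and `chat` helpers are hypothetical names, and because the README does not document the response shape, `chat` simply returns the parsed JSON.

```python
import json
import urllib.request

RELAY_URL = "http://localhost:7078/v1/chat/completions"


def build_chat_request(content: str, session_id: str, mode: str = "standard"):
    """Build the POST request that the curl examples above send."""
    if mode not in ("standard", "cortex"):
        raise ValueError(f"unknown mode: {mode!r}")
    body = {
        "mode": mode,
        "messages": [{"role": "user", "content": content}],
        "sessionId": session_id,
    }
    return urllib.request.Request(
        RELAY_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(content: str, session_id: str, mode: str = "standard", timeout: float = 120.0):
    """Send one user message through Relay and return the decoded JSON reply."""
    request = build_chat_request(content, session_id, mode)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Requires a running Lyra container
    print(chat("Hello!", "test"))
    print(chat("Explain quantum computing", "test", mode="cortex"))
```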

---

## Data Flow

### Simple Mode (Fast Path)
```
User → Relay → Cortex (/simple) → Direct LLM → Response
                    ↓
               Intake (buffer + summarize on triggers)
                    ↓
               Nebula (summaries only)
```

### Cortex Mode (Full Pipeline)
```
User → Relay → Cortex (/reason)
                    ↓
               1. Reflection (what's being asked?)
                    ↓
               2. Reasoning (draft answer)
                    ↓
               3. Refinement (polish)
                    ↓
               4. Persona (Lyra's voice)
                    ↓
               Intake (buffer + multi-level summaries)
                    ↓
               Nebula (raw + summaries)
                    ↓
               Response
```
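
The Cortex Mode flow above is a chain in which each stage consumes the previous stage's output. The sketch below only illustrates that chaining. The stage functions are placeholders that wrap strings, whereas the real stages presumably call an LLM, and `run_pipeline`, `Stage`, and `STAGES` are hypothetical names that do not come from the Cortex source.

```python
from typing import Callable, Dict, List, Tuple

# A stage receives the original prompt and the previous stage's output
Stage = Callable[[str, str], str]


def reflection(prompt: str, _previous: str) -> str:
    return f"reflect({prompt})"       # 1. what is being asked?


def reasoning(_prompt: str, previous: str) -> str:
    return f"reason({previous})"      # 2. draft answer


def refinement(_prompt: str, previous: str) -> str:
    return f"refine({previous})"      # 3. polish


def persona(_prompt: str, previous: str) -> str:
    return f"persona({previous})"     # 4. Lyra's voice


STAGES: List[Tuple[str, Stage]] = [
    ("reflection", reflection),
    ("reasoning", reasoning),
    ("refinement", refinement),
    ("persona", persona),
]


def run_pipeline(user_prompt: str, stages: List[Tuple[str, Stage]] = STAGES):
    """Run the stages in order; return the final text and a per-stage trace."""
    trace: Dict[str, str] = {}
    output = ""
    for name, stage in stages:
        output = stage(user_prompt, output)
        trace[name] = output
    return output, trace
```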

---

## Configuration

### Environment Variables

**LLM Backends:**
```bash
# Primary backend (llama.cpp on AMD MI50)
LLM_PRIMARY_URL=http://10.0.0.44:8080
LLM_PRIMARY_MODEL=/model

# Secondary backend (Ollama on RTX 3090)
LLM_SECONDARY_URL=http://10.0.0.3:11434
LLM_SECONDARY_MODEL=qwen2.5:7b-instruct-q4_K_M

# Cloud backend (OpenAI)
LLM_OPENAI_URL=https://api.openai.com/v1
LLM_OPENAI_MODEL=gpt-4o-mini
OPENAI_API_KEY=sk-...
```

**Module-Specific Backend Selection:**
```bash
CORTEX_LLM=PRIMARY           # Reasoning engine
INTAKE_LLM=PRIMARY           # Summarization
SPEAK_LLM=OPENAI             # Persona (final voice)
STANDARD_MODE_LLM=SECONDARY  # Simple mode default
```
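
The naming convention above suggests that a module variable such as `CORTEX_LLM=PRIMARY` selects the matching `LLM_PRIMARY_URL` and `LLM_PRIMARY_MODEL` pair. The sketch below shows one way such a lookup could work. It is an assumption about the mapping, not the logic in `llm_router.py`: `Backend`, `resolve_backend`, and the `PRIMARY` default are hypothetical.

```python
import os
from typing import Mapping, NamedTuple


class Backend(NamedTuple):
    name: str
    url: str
    model: str


def resolve_backend(
    module_var: str,
    env: Mapping[str, str] = os.environ,
    default: str = "PRIMARY",  # assumed fallback when the module variable is unset
) -> Backend:
    """Map e.g. CORTEX_LLM=PRIMARY to LLM_PRIMARY_URL / LLM_PRIMARY_MODEL."""
    name = env.get(module_var, default).upper()
    try:
        return Backend(name, env[f"LLM_{name}_URL"], env[f"LLM_{name}_MODEL"])
    except KeyError as missing:
        # Report which variable the selected backend still needs
        raise KeyError(f"{module_var}={name} requires {missing.args[0]}") from None


if __name__ == "__main__":
    example_env = {
        "CORTEX_LLM": "PRIMARY",
        "LLM_PRIMARY_URL": "http://10.0.0.44:8080",
        "LLM_PRIMARY_MODEL": "/model",
    }
    print(resolve_backend("CORTEX_LLM", example_env))
```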

**Nebula Integration:**
```bash
NEBULA_API=http://localhost:7090  # When Nebula is running
NEBULA_KEY=your-api-key           # Optional auth
```

**Intake Settings:**
```bash
INTAKE_LLM=PRIMARY
SUMMARY_MAX_TOKENS=200
SUMMARY_TEMPERATURE=0.3
```

---

## API Reference

### Relay Endpoints (Port 7078)

**Chat (OpenAI-compatible):**
```bash
POST /v1/chat/completions
{
  "mode": "standard" | "cortex",
  "messages": [{"role": "user", "content": "..."}],
  "sessionId": "session-123"
}
```

**Sessions:**
```bash
GET    /sessions               # List all sessions
GET    /sessions/:id           # Get session history
POST   /sessions/:id           # Save session
PATCH  /sessions/:id/metadata  # Rename session
DELETE /sessions/:id           # Delete session
```

**Health:**
```bash
GET /_health
```

### Cortex Endpoints (Port 7081)

**Reasoning:**
```bash
POST /reason
{
  "session_id": "session-123",
  "user_prompt": "Your question here"
}
```

**Simple Mode:**
```bash
POST /simple
{
  "session_id": "session-123",
  "user_prompt": "Your question here",
  "backend": "SECONDARY"  # Optional
}
```

**Intake:**
```bash
POST /ingest
{
  "session_id": "session-123",
  "user_msg": "User message",
  "assistant_msg": "Assistant response"
}
```
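
Each `/ingest` call carries one user/assistant exchange, which Intake keeps in a short-term buffer capped at 200 messages per session. A bounded per-session buffer of that kind might look like the sketch below. `SessionBuffer` is a hypothetical class, and whether the real module counts an exchange as one message or two is not stated in the README, so this version stores one entry per exchange.

```python
from collections import defaultdict, deque

MAX_MESSAGES = 200  # per-session cap quoted in the Components section


class SessionBuffer:
    """Keep the most recent exchanges per session, dropping the oldest first."""

    def __init__(self, max_messages: int = MAX_MESSAGES):
        # deque(maxlen=...) discards the oldest entry automatically when full
        self._sessions = defaultdict(lambda: deque(maxlen=max_messages))

    def ingest(self, session_id: str, user_msg: str, assistant_msg: str) -> int:
        """Append one exchange; return how many exchanges are now buffered."""
        buffer = self._sessions[session_id]
        buffer.append({"user_msg": user_msg, "assistant_msg": assistant_msg})
        return len(buffer)

    def recent(self, session_id: str, n: int) -> list:
        """Return up to the last `n` exchanges, oldest first."""
        buffer = self._sessions[session_id]
        return list(buffer)[max(len(buffer) - n, 0):]


if __name__ == "__main__":
    buffer = SessionBuffer()
    buffer.ingest("session-123", "User message", "Assistant response")
    print(buffer.recent("session-123", 5))
```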

**Health:**
```bash
GET /_health
```

---

## File Structure

```
project-lyra/
├── Dockerfile                  # Unified container (Node + Python)
├── docker-compose.yml          # Single lyra service + UI
├── start.sh                    # Startup script (Cortex → Relay)
├── .dockerignore
├── QUICKSTART.md               # Quick reference
│
├── core/
│   └── relay/                  # Node.js API gateway
│       ├── server.js
│       ├── lib/
│       │   ├── cortex.js       # Cortex HTTP client
│       │   └── llm.js          # LLM routing
│       └── sessions/           # Session storage (volume)
│
├── cortex/                     # Python reasoning engine
│   ├── main.py                 # FastAPI app
│   ├── router.py               # /reason, /simple, /ingest
│   ├── context.py              # Session context
│   ├── llm/
│   │   └── llm_router.py       # Multi-backend LLM routing
│   ├── intake/
│   │   └── intake.py           # Summarization module
│   ├── reasoning/
│   │   ├── reflection.py
│   │   ├── reasoning.py
│   │   └── refine.py
│   └── persona/
│       └── speak.py
│
└── .nebula_fallback/           # Disk storage until Nebula runs
    └── {session_id}/
        ├── L10_*.json
        ├── L20_*.json
        └── L30_*.json
```

---

## Roadmap

### ✅ Phase 1 (Complete)
- Unified container architecture
- Multi-level summarization (L1-L30)
- HTTP client for Nebula (with disk fallback)
- Session management
- Dual-mode operation

### 🚧 Phase 2 (In Progress)
- Build Nebula vector database
- RAG integration
- Memory resurfacing based on semantic similarity

### 📋 Phase 3 (Planned)
- Entity extraction from summaries
- Topic clustering
- Automatic knowledge graph generation
- Temporal memory (what happened when)

---

## Troubleshooting

### Container won't start
```bash
# Check logs
docker-compose logs lyra

# Common issues:
# - Missing .env file
# - Invalid LLM backend URLs
# - Port conflicts (7078, 7081)
```

### Summaries not appearing
```bash
# Check Nebula fallback directory
ls -la .nebula_fallback/

# Verify Cortex is processing
docker-compose logs lyra | grep "Nebula"
```
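
When `ls` output gets long, a small script can summarize what has been written to the fallback directory. This is a convenience sketch, not a tool shipped with Lyra: `count_fallback_files` is a hypothetical helper, and it assumes the `{session_id}/{level}_{timestamp}.json` layout shown under Nebula-Ready Architecture.

```python
from collections import Counter
from pathlib import Path
from typing import Dict


def count_fallback_files(root: Path = Path(".nebula_fallback")) -> Dict[str, Counter]:
    """Return {session_id: Counter({"L10": n, ...})} for files found on disk."""
    report: Dict[str, Counter] = {}
    for path in sorted(Path(root).glob("*/L*_*.json")):
        level = path.name.split("_", 1)[0]  # "L10_20260223_203045.json" -> "L10"
        report.setdefault(path.parent.name, Counter())[level] += 1
    return report


if __name__ == "__main__":
    for session, levels in count_fallback_files().items():
        print(session, dict(levels))
```

A session that has been chatting for a while but shows no `L10` entries here points at the summarization step rather than at Nebula delivery.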

### Sessions not persisting
```bash
# Check volume mount
docker-compose exec lyra ls -la /app/relay/sessions/

# Verify session save calls
curl http://localhost:7078/sessions
```

---

## Development

### Making Changes

**Code changes (hot reload):**
```bash
docker-compose restart lyra
```

**Dependency changes (rebuild):**
```bash
docker-compose up -d --build lyra
```

**View logs:**
```bash
docker-compose logs -f lyra
```

### Adding a New LLM Backend

1. Add to `.env`:
   ```bash
   LLM_CUSTOM_URL=http://your-backend:port
   LLM_CUSTOM_MODEL=model-name
   ```

2. Configure module:
   ```bash
   CORTEX_LLM=CUSTOM
   ```

3. Restart:
   ```bash
   docker-compose restart lyra
   ```

---

## Version History

### v1.0.0 (2026-02-23) - The Great Simplification
**Major Refactor:**
- ✅ Unified Relay + Cortex into single container
- ✅ Removed NeoMem (replaced by upcoming Nebula)
- ✅ Removed old ingest_handler and RAG services
- ✅ Simplified to core flow: intake → summarize → store
- ✅ Added HTTP client for Nebula with disk fallback
- ✅ Cleaned docker-compose (2 services instead of 7)
- ✅ Updated documentation to reflect new architecture

**Architecture Changes:**
- Intake now sends summaries to Nebula (HTTP POST)
- Disk fallback writes JSON files to `.nebula_fallback/`
- Relay and Cortex communicate via localhost (faster)
- Single build, single deploy, single log stream

---

## License

© 2026 Terra-Mechanics / ServersDown Labs. Apache 2.0.

**Built with Claude Code**

---

## Credits

Built by Brian with assistance from Claude (Anthropic).

Special thanks to the open source community:
- FastAPI
- Express.js
- Docker
- llama.cpp
- Ollama