Project Lyra - README v0.6.0

Lyra is a modular persistent AI companion system with advanced reasoning capabilities and autonomous decision-making. It provides memory-backed chat using Relay + Cortex with an integrated Autonomy System, built around a multi-stage reasoning pipeline powered by HTTP-based LLM backends.

Current Version: v0.6.0 (2025-12-18)

Note: As of v0.6.0, NeoMem is disabled by default while we work out integration hiccups in the pipeline. The autonomy system is being refined independently before full memory integration.

Mission Statement

The point of Project Lyra is to give an AI chatbot more abilities than a typical chatbot. Typical chatbots are essentially amnesic and forget everything about your project. Lyra keeps projects organized and remembers what you have done. Think of her as a notepad, schedule, database, co-creator, and collaborator with an executive function of her own: mention something in passing, and Lyra remembers it and reminds you of it later.


Architecture Overview

Project Lyra operates as a single docker-compose deployment with multiple Docker containers networked together in a microservices architecture. Just as the brain has regions, Lyra has modules:

Core Services

1. Relay (Node.js/Express) - Port 7078

  • Main orchestrator and message router
  • Coordinates all module interactions
  • OpenAI-compatible endpoint: POST /v1/chat/completions
  • Internal endpoint: POST /chat
  • Routes messages through Cortex reasoning pipeline
  • Manages async calls to Cortex ingest
  • (NeoMem integration currently disabled in v0.6.0)

2. UI (Static HTML)

  • Browser-based chat interface with cyberpunk theme
  • Connects to Relay
  • Saves and loads sessions
  • OpenAI-compatible message format

3. NeoMem (Python/FastAPI) - Port 7077 - DISABLED IN v0.6.0

  • Long-term memory database (fork of Mem0 OSS)
  • Vector storage (PostgreSQL + pgvector) + Graph storage (Neo4j)
  • RESTful API: /memories, /search
  • Semantic memory updates and retrieval
  • No external SDK dependencies - fully local
  • Status: Currently disabled while pipeline integration is refined

Reasoning Layer

4. Cortex (Python/FastAPI) - Port 7081

  • Primary reasoning engine with multi-stage pipeline and autonomy system
  • Includes embedded Intake module (no separate service as of v0.5.1)
  • Integrated Autonomy System (NEW in v0.6.0) - See Autonomy System section below
  • 4-Stage Processing:
    1. Reflection - Generates meta-awareness notes about conversation
    2. Reasoning - Creates initial draft answer using context
    3. Refinement - Polishes and improves the draft
    4. Persona - Applies Lyra's personality and speaking style
  • Integrates with Intake for short-term context via internal Python imports
  • Flexible LLM router supporting multiple backends via HTTP
  • Endpoints:
    • POST /reason - Main reasoning pipeline
    • POST /ingest - Receives conversation exchanges from Relay
    • GET /health - Service health check
    • GET /debug/sessions - Inspect in-memory SESSIONS state
    • GET /debug/summary - Test summarization for a session

5. Intake (Python Module) - Embedded in Cortex

  • No longer a standalone service - runs as Python module inside Cortex container
  • Short-term memory management with session-based circular buffer
  • In-memory SESSIONS dictionary: session_id → {buffer: deque(maxlen=200), created_at: timestamp}
  • Multi-level summarization (L1/L5/L10/L20/L30) produced by summarize_context()
  • Deferred summarization - actual summary generation happens during /reason call
  • Internal Python API:
    • add_exchange_internal(exchange) - Direct function call from Cortex
    • summarize_context(session_id, exchanges) - Async LLM-based summarization
    • SESSIONS - Module-level global state (requires single Uvicorn worker)
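
For illustration, a minimal sketch of how Cortex-side code might call this internal API. Only add_exchange_internal, summarize_context, and SESSIONS are documented above; the exchange field names are assumptions borrowed from the /ingest example later in this README.

# Hypothetical usage sketch, not the actual Cortex code.
# The exchange field names (session_id, user_msg, assistant_msg) are assumptions.
import asyncio
from intake.intake import add_exchange_internal, summarize_context, SESSIONS

# What the /ingest path does internally: buffer one exchange.
add_exchange_internal({
    "session_id": "test",
    "user_msg": "Hello",
    "assistant_msg": "Hi there!",
})

# What the /reason path does later: deferred summarization of the buffer.
exchanges = list(SESSIONS["test"]["buffer"])
summary = asyncio.run(summarize_context("test", exchanges))
print(summary)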

LLM Backends (HTTP-based)

All LLM communication is done via HTTP APIs:

  • PRIMARY: llama.cpp server (http://10.0.0.44:8080) - AMD MI50 GPU backend
  • SECONDARY: Ollama server (http://10.0.0.3:11434) - RTX 3090 backend
    • Model: qwen2.5:7b-instruct-q4_K_M
  • CLOUD: OpenAI API (https://api.openai.com/v1) - Cloud-based models
    • Model: gpt-4o-mini
  • FALLBACK: Local backup (http://10.0.0.41:11435) - Emergency fallback
    • Model: llama-3.2-8b-instruct

Each module can be configured to use a different backend via environment variables.
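
For example, a direct HTTP call to the PRIMARY backend might look like the sketch below, assuming the llama.cpp server exposes an OpenAI-compatible /v1/chat/completions route; the routes and payloads Lyra's router actually uses may differ.

# Minimal sketch of an HTTP chat-completion request against the PRIMARY
# backend, assuming an OpenAI-compatible route. Lyra's llm_router may
# use different routes or payload shapes.
import requests

resp = requests.post(
    "http://10.0.0.44:8080/v1/chat/completions",
    json={
        "model": "/model",
        "messages": [{"role": "user", "content": "Hello Lyra!"}],
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])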

Autonomy System (NEW in v0.6.0)

Cortex Autonomy Subsystems - Multi-layered autonomous decision-making and learning

Autonomy Architecture: The autonomy system operates in coordinated layers, all maintaining state in self_state.json:

  1. Executive Layer → Planning and goals
  2. Decision Layer → Evaluation and choices
  3. Action Layer → Execution
  4. Learning Layer → Pattern adaptation
  5. Monitoring Layer → Proactive awareness
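
The schema of self_state.json is not specified in this README; the sketch below only illustrates the shared read-modify-write pattern, and every key name in it is hypothetical.

# Illustrative read-modify-write of the shared state file. The path is
# documented above (cortex/data/self_state.json); all keys below are
# hypothetical placeholders, not the real schema.
import json
from pathlib import Path

STATE_PATH = Path("cortex/data/self_state.json")

state = json.loads(STATE_PATH.read_text()) if STATE_PATH.exists() else {}
state.setdefault("patterns", []).append({"event": "example_interaction", "count": 1})
STATE_PATH.write_text(json.dumps(state, indent=2))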

Data Flow Architecture (v0.6.0)

Normal Message Flow:

User (UI) → POST /v1/chat/completions
  ↓
Relay (7078)
  ↓ POST /reason
Cortex (7081)
  ↓ (internal Python call)
Intake module → summarize_context()
  ↓
Autonomy System → Decision evaluation & pattern learning
  ↓
Cortex processes (4 stages):
  1. reflection.py → meta-awareness notes (CLOUD backend)
  2. reasoning.py → draft answer (PRIMARY backend, autonomy-aware)
  3. refine.py → refined answer (PRIMARY backend)
  4. persona/speak.py → Lyra personality (CLOUD backend, autonomy-aware)
  ↓
Returns persona answer to Relay
  ↓
Relay → POST /ingest (async)
  ↓
Cortex → add_exchange_internal() → SESSIONS buffer
  ↓
Autonomy System → Update self_state.json (pattern tracking)
  ↓
Relay → UI (returns final response)

Note: NeoMem integration disabled in v0.6.0

Cortex 4-Stage Reasoning Pipeline:

  1. Reflection (reflection.py) - Cloud LLM (OpenAI)

    • Analyzes user intent and conversation context
    • Generates meta-awareness notes
    • "What is the user really asking?"
  2. Reasoning (reasoning.py) - Primary LLM (llama.cpp)

    • Retrieves short-term context from Intake module
    • Creates initial draft answer
    • Integrates context, reflection notes, and user prompt
  3. Refinement (refine.py) - Primary LLM (llama.cpp)

    • Polishes the draft answer
    • Improves clarity and coherence
    • Ensures factual consistency
  4. Persona (speak.py) - Cloud LLM (OpenAI)

    • Applies Lyra's personality and speaking style
    • Natural, conversational output
    • Final answer returned to user
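
Conceptually, the /reason handler chains these four stages in order. The sketch below shows only that data flow; the llm() helper, the prompts, and the function signatures are hypothetical stand-ins for the real reflection.py / reasoning.py / refine.py / speak.py code.

# Conceptual sketch of the 4-stage chain. Only the stage order and the
# backend assignments come from this README; everything else here is a
# hypothetical stand-in for the real stage modules.
import asyncio

async def llm(backend: str, prompt: str) -> str:
    # Placeholder for the real HTTP call made by the LLM router.
    return f"[{backend}] {prompt[:40]}..."

async def run_pipeline(user_prompt: str, context: str) -> str:
    notes = await llm("CLOUD", f"Reflect: {user_prompt}\n{context}")             # reflection.py
    draft = await llm("PRIMARY", f"Draft answer: {user_prompt}\nNotes: {notes}")  # reasoning.py
    refined = await llm("PRIMARY", f"Refine: {draft}")                            # refine.py
    return await llm("CLOUD", f"Speak as Lyra: {refined}")                        # persona/speak.py

print(asyncio.run(run_pipeline("Hello Lyra!", "(short-term context)")))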

Features

Core Services

Relay:

  • Main orchestrator and message router
  • OpenAI-compatible endpoint: POST /v1/chat/completions
  • Internal endpoint: POST /chat
  • Health check: GET /_health
  • Async non-blocking calls to Cortex
  • Shared request handler for code reuse
  • Comprehensive error handling

NeoMem (Memory Engine):

  • Forked from Mem0 OSS - fully independent
  • Drop-in compatible API (/memories, /search)
  • Local-first: runs on FastAPI with Postgres + Neo4j
  • No external SDK dependencies
  • Semantic memory updates - compares embeddings and performs in-place updates
  • Default service: neomem-api (port 7077)

UI:

  • Lightweight static HTML chat interface
  • Cyberpunk theme
  • Session save/load functionality
  • OpenAI message format support

Reasoning Layer

Cortex:

  • Multi-stage reasoning pipeline (reflection → reasoning → refine → persona)
  • Flexible LLM backend routing via HTTP
  • Per-stage backend selection
  • Async processing throughout
  • Embedded Intake module for short-term context
  • /reason, /ingest, /health, /debug/sessions, /debug/summary endpoints
  • Lenient error handling - never fails the chat pipeline

Intake (Embedded Module):

  • Architectural change: Now runs as Python module inside Cortex container
  • In-memory SESSIONS management (session_id → buffer)
  • Multi-level summarization: L1 (ultra-short), L5 (short), L10 (medium), L20 (detailed), L30 (full)
  • Deferred summarization strategy - summaries generated during /reason call
  • bg_summarize() is a logging stub - actual work deferred
  • Single-worker constraint: SESSIONS requires single Uvicorn worker or Redis/shared storage
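
One plausible reading of the L-levels, for illustration only: each level summarizes the most recent N exchanges from the session buffer. The real summarize_context() may define the levels differently.

# Hypothetical illustration of deferred multi-level summarization,
# assuming L1/L5/L10/L20/L30 mean "condense the last N exchanges".
# The real summarize_context() may define the levels differently.
from collections import deque

buffer = deque(maxlen=200)   # mirrors the documented SESSIONS buffer
buffer.extend({"user": f"msg {i}", "assistant": f"reply {i}"} for i in range(40))

def condense(exchanges):
    return f"{len(exchanges)} exchanges condensed"   # stand-in for the LLM call

levels = {f"L{n}": condense(list(buffer)[-n:]) for n in (1, 5, 10, 20, 30)}
print(levels)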

LLM Router:

  • Dynamic backend selection via HTTP
  • Environment-driven configuration
  • Support for llama.cpp, Ollama, OpenAI, custom endpoints
  • Per-module backend preferences:
    • CORTEX_LLM=SECONDARY (Ollama for reasoning)
    • INTAKE_LLM=PRIMARY (llama.cpp for summarization)
    • SPEAK_LLM=OPENAI (Cloud for persona)
    • NEOMEM_LLM=PRIMARY (llama.cpp for memory operations)
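
A minimal sketch of the environment-driven lookup this implies is shown below; the real llm_router.py may be organized differently, but the variable naming pattern (MODULE_LLM → LLM_CHOICE_URL / LLM_CHOICE_MODEL) follows the examples above.

# Sketch of env-driven backend resolution, e.g. CORTEX_LLM=SECONDARY
# resolves to LLM_SECONDARY_URL / LLM_SECONDARY_MODEL. Only the variable
# names come from this README; the real llm_router.py may differ.
import os

def resolve_backend(module: str) -> tuple[str, str]:
    choice = os.environ.get(f"{module}_LLM", "PRIMARY")      # e.g. "SECONDARY"
    url = os.environ[f"LLM_{choice}_URL"]
    model = os.environ.get(f"LLM_{choice}_MODEL", "")
    return url, model

url, model = resolve_backend("CORTEX")
print(f"CORTEX uses {model or '(default model)'} at {url}")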

Beta Lyrae (RAG Memory DB) - Currently Disabled

  • RAG Knowledge DB - Beta Lyrae (sheliak)
  • Implements the Retrieval-Augmented Generation (RAG) layer for Project Lyra
  • Serves as the long-term searchable memory store that Cortex and Relay can query for relevant context before reasoning or response generation
  • Status: disabled in docker-compose.yml (since v0.5.1)

The system uses:

  • ChromaDB for persistent vector storage
  • OpenAI Embeddings (text-embedding-3-small) for semantic similarity
  • FastAPI (port 7090) for the /rag/search REST endpoint
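
For illustration, a standalone query against the store might look like the sketch below. The collection name is an assumption; rag_query.py and the /rag/search endpoint are the canonical interfaces.

# Sketch of a direct ChromaDB query using OpenAI embeddings, matching
# the stack described above. The collection name "chatlogs" is an
# assumption; use rag_query.py or /rag/search in practice.
import chromadb
from openai import OpenAI

oai = OpenAI()  # reads OPENAI_API_KEY from the environment
query = "What is the current state of Cortex?"
embedding = oai.embeddings.create(
    model="text-embedding-3-small", input=query
).data[0].embedding

client = chromadb.PersistentClient(path="rag/chromadb")
collection = client.get_or_create_collection("chatlogs")
results = collection.query(
    query_embeddings=[embedding],
    n_results=5,
    where={"category": "lyra"},
)
print(results["documents"][0])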

Directory Layout:

rag/
├── rag_chat_import.py    # imports JSON chat logs
├── rag_docs_import.py    # (planned) PDF/EPUB/manual importer
├── rag_build.py          # legacy single-folder builder
├── rag_query.py          # command-line query helper
├── rag_api.py            # FastAPI service providing /rag/search
├── chromadb/             # persistent vector store
├── chatlogs/             # organized source data
│   ├── poker/
│   ├── work/
│   ├── lyra/
│   ├── personal/
│   └── ...
└── import.log            # progress log for batch runs

OpenAI chatlog importer features:

  • Recursive folder indexing with category detection from directory name
  • Smart chunking for long messages (5,000 chars per slice)
  • Automatic deduplication using SHA-1 hash of file + chunk
  • Timestamps for both file modification and import time
  • Full progress logging via tqdm
  • Safe to run in background with nohup … &
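
The chunking-plus-dedup idea translates roughly to the sketch below (5,000-character slices, SHA-1 over file path plus chunk as a stable ID); the real rag_chat_import.py may differ in detail.

# Sketch of 5,000-char chunking with SHA-1 dedup IDs, as described
# above. Details of the real rag_chat_import.py may differ.
import hashlib

CHUNK_SIZE = 5000

def chunk_with_ids(file_path: str, text: str):
    for start in range(0, len(text), CHUNK_SIZE):
        chunk = text[start:start + CHUNK_SIZE]
        doc_id = hashlib.sha1(f"{file_path}:{chunk}".encode("utf-8")).hexdigest()
        yield doc_id, chunk   # skip the chunk if this ID already exists in the store

for doc_id, chunk in chunk_with_ids("chatlogs/lyra/example.json", "x" * 12000):
    print(doc_id, len(chunk))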

Docker Deployment

All services run in a single docker-compose stack with the following containers:

Active Services:

  • relay - Main orchestrator (port 7078)
  • cortex - Reasoning engine with embedded Intake and Autonomy System (port 7081)

Disabled Services (v0.6.0):

  • neomem-postgres - PostgreSQL with pgvector extension (port 5432) - disabled while refining pipeline
  • neomem-neo4j - Neo4j graph database (ports 7474, 7687) - disabled while refining pipeline
  • neomem-api - NeoMem memory service (port 7077) - disabled while refining pipeline
  • intake - No longer needed (embedded in Cortex as of v0.5.1)
  • rag - Beta Lyrae RAG service (port 7090) - currently disabled

All containers communicate via the lyra_net Docker bridge network.

External LLM Services

The following LLM backends are accessed via HTTP (not part of docker-compose):

  • llama.cpp Server (http://10.0.0.44:8080)

    • AMD MI50 GPU-accelerated inference
    • Primary backend for reasoning and refinement stages
    • Model path: /model
  • Ollama Server (http://10.0.0.3:11434)

    • RTX 3090 GPU-accelerated inference
    • Secondary/configurable backend
    • Model: qwen2.5:7b-instruct-q4_K_M
  • OpenAI API (https://api.openai.com/v1)

    • Cloud-based inference
    • Used for reflection and persona stages
    • Model: gpt-4o-mini
  • Fallback Server (http://10.0.0.41:11435)

    • Emergency backup endpoint
    • Local llama-3.2-8b-instruct model

Version History

v0.6.0 (2025-12-18) - Current Release

Major Feature: Autonomy System (Phase 1, 2, and 2.5)

  • Added autonomous decision-making framework
  • Implemented executive planning and goal-setting layer
  • Added pattern learning system for adaptive behavior
  • Implemented proactive monitoring capabilities
  • Created self-analysis and performance tracking system
  • Integrated self-state persistence (cortex/data/self_state.json)
  • Built decision engine with orchestrator coordination
  • Added autonomous action execution framework
  • Integrated autonomy into reasoning and persona layers
  • Created comprehensive test suites for autonomy features
  • Added complete system breakdown documentation

Architecture Changes:

  • Autonomy system integrated into Cortex reasoning pipeline
  • Multi-layered autonomous decision-making architecture
  • Self-state tracking across sessions
  • NeoMem disabled by default while refining pipeline integration
  • Enhanced orchestrator with flexible service controls

v0.5.1 (2025-12-11)

Critical Intake Integration Fixes:

  • Fixed bg_summarize() NameError preventing SESSIONS persistence
  • Fixed /ingest endpoint unreachable code
  • Added cortex/intake/__init__.py for proper package structure
  • Added diagnostic logging to verify SESSIONS singleton behavior
  • Added /debug/sessions and /debug/summary endpoints
  • Documented single-worker constraint in Dockerfile
  • Implemented lenient error handling (never fails chat pipeline)
  • Intake now embedded in Cortex - no longer standalone service

Architecture Changes:

  • Intake module runs inside Cortex container as pure Python import
  • No HTTP calls between Cortex and Intake (internal function calls)
  • SESSIONS persist correctly in Uvicorn worker
  • Deferred summarization strategy (summaries generated during /reason)

v0.5.0 (2025-11-28)

  • Fixed all critical API wiring issues
  • Added OpenAI-compatible endpoint to Relay (/v1/chat/completions)
  • Fixed Cortex → Intake integration
  • Added missing Python package __init__.py files
  • End-to-end message flow verified and working

Infrastructure v1.0.0 (2025-11-26)

  • Consolidated 9 scattered .env files into single source of truth
  • Multi-backend LLM strategy implemented
  • Docker Compose consolidation
  • Created .env.example security templates

v0.4.x (Major Rewire)

  • Cortex multi-stage reasoning pipeline
  • LLM router with multi-backend support
  • Major architectural restructuring

v0.3.x

  • Beta Lyrae RAG system
  • NeoMem integration
  • Basic Cortex reasoning loop

Known Issues (v0.6.0)

Temporarily Disabled (v0.6.0)

  • NeoMem disabled by default - Being refined independently before full integration
    • PostgreSQL + pgvector storage inactive
    • Neo4j graph database inactive
    • Memory persistence endpoints not active
  • RAG service (Beta Lyrae) currently disabled in docker-compose.yml

Non-Critical

  • Session management endpoints not fully implemented in Relay
  • Full autonomy system integration still being refined
  • Memory retrieval integration pending NeoMem re-enablement

Operational Notes

  • Single-worker constraint: Cortex must run with single Uvicorn worker to maintain SESSIONS state
    • Multi-worker scaling requires migrating SESSIONS to Redis or shared storage
  • Diagnostic endpoints (/debug/sessions, /debug/summary) available for troubleshooting

Future Enhancements

  • Re-enable NeoMem integration after pipeline refinement
  • Full autonomy system maturation and optimization
  • Re-enable RAG service integration
  • Implement full session persistence
  • Migrate SESSIONS to Redis for multi-worker support
  • Add request correlation IDs for tracing
  • Comprehensive health checks across all services
  • Enhanced pattern learning with long-term memory integration

Quick Start

Prerequisites

  • Docker + Docker Compose
  • At least one HTTP-accessible LLM endpoint (llama.cpp, Ollama, or OpenAI API key)

Setup

  1. Copy .env.example to .env and configure your LLM backend URLs and API keys:

    # Required: Configure at least one LLM backend
    LLM_PRIMARY_URL=http://10.0.0.44:8080       # llama.cpp
    LLM_SECONDARY_URL=http://10.0.0.3:11434     # Ollama
    OPENAI_API_KEY=sk-...                        # OpenAI
    
  2. Start all services with docker-compose:

    docker-compose up -d
    
  3. Check service health:

    # Relay health
    curl http://localhost:7078/_health
    
    # Cortex health
    curl http://localhost:7081/health
    
    # NeoMem health (only if NeoMem has been re-enabled; disabled by default in v0.6.0)
    curl http://localhost:7077/health
    
  4. Access the UI at http://localhost:7078

Test

Test Relay → Cortex pipeline:

curl -X POST http://localhost:7078/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Hello Lyra!"}],
    "session_id": "test"
  }'

Test Cortex /ingest endpoint:

curl -X POST http://localhost:7081/ingest \
  -H "Content-Type: application/json" \
  -d '{
    "session_id": "test",
    "user_msg": "Hello",
    "assistant_msg": "Hi there!"
  }'

Inspect SESSIONS state:

curl http://localhost:7081/debug/sessions

Get summary for a session:

curl "http://localhost:7081/debug/summary?session_id=test"

When NeoMem is enabled, its backend databases (PostgreSQL and Neo4j) are started automatically as part of the docker-compose stack; in v0.6.0 they are disabled by default.


Environment Variables

LLM Backend Configuration

Backend URLs (Full API endpoints):

LLM_PRIMARY_URL=http://10.0.0.44:8080           # llama.cpp
LLM_PRIMARY_MODEL=/model

LLM_SECONDARY_URL=http://10.0.0.3:11434         # Ollama
LLM_SECONDARY_MODEL=qwen2.5:7b-instruct-q4_K_M

LLM_OPENAI_URL=https://api.openai.com/v1
LLM_OPENAI_MODEL=gpt-4o-mini
OPENAI_API_KEY=sk-...

Module-specific backend selection:

CORTEX_LLM=SECONDARY      # Use Ollama for reasoning
INTAKE_LLM=PRIMARY        # Use llama.cpp for summarization
SPEAK_LLM=OPENAI          # Use OpenAI for persona
NEOMEM_LLM=PRIMARY        # Use llama.cpp for memory
UI_LLM=OPENAI             # Use OpenAI for UI
RELAY_LLM=PRIMARY         # Use llama.cpp for relay

Database Configuration

POSTGRES_USER=neomem
POSTGRES_PASSWORD=neomempass
POSTGRES_DB=neomem
POSTGRES_HOST=neomem-postgres
POSTGRES_PORT=5432

NEO4J_URI=bolt://neomem-neo4j:7687
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=neomemgraph

Service URLs (Internal Docker Network)

NEOMEM_API=http://neomem-api:7077
CORTEX_API=http://cortex:7081
CORTEX_REASON_URL=http://cortex:7081/reason
CORTEX_INGEST_URL=http://cortex:7081/ingest
RELAY_URL=http://relay:7078

Feature Flags

CORTEX_ENABLED=true
MEMORY_ENABLED=true
PERSONA_ENABLED=false
DEBUG_PROMPT=true
VERBOSE_DEBUG=true

For complete environment variable reference, see ENVIRONMENT_VARIABLES.md.


Troubleshooting

SESSIONS not persisting

Symptom: Intake buffer always shows 0 exchanges, summaries always empty.

Solution (Fixed in v0.5.1):

  • Ensure cortex/intake/__init__.py exists
  • Check Cortex logs for [Intake Module Init] message showing SESSIONS object ID
  • Verify single-worker mode (Dockerfile: uvicorn main:app --workers 1)
  • Use /debug/sessions endpoint to inspect current state

Cortex connection errors

Symptom: Relay can't reach Cortex, 502 errors.

Solution:

  • Verify Cortex container is running: docker ps | grep cortex
  • Check Cortex health: curl http://localhost:7081/health
  • Verify environment variables: CORTEX_REASON_URL=http://cortex:7081/reason
  • Check docker network: docker network inspect lyra_net

LLM backend timeouts

Symptom: Reasoning stage hangs or times out.

Solution:

  • Verify LLM backend is running and accessible
  • Check LLM backend health: curl http://10.0.0.44:8080/health
  • Increase timeout in llm_router.py if using slow models
  • Check logs for specific backend errors

License

NeoMem is a derivative work based on Mem0 OSS (Apache 2.0). © 2025 Terra-Mechanics / ServersDown Labs. All modifications released under Apache 2.0.

Built with Claude Code


Integration Notes

  • NeoMem API is compatible with Mem0 OSS endpoints (/memories, /search)
  • All services communicate via Docker internal networking on the lyra_net bridge
  • History and entity graphs are managed via PostgreSQL + Neo4j
  • LLM backends are accessed via HTTP and configured in .env
  • Intake module is imported internally by Cortex (no HTTP communication)
  • SESSIONS state is maintained in-memory within Cortex container

Beta Lyrae - RAG Memory System (Currently Disabled)

Note: The RAG service is currently disabled in docker-compose.yml

Requirements

  • Python 3.10+
  • Dependencies: chromadb openai tqdm python-dotenv fastapi uvicorn
  • Persistent storage: ./chromadb or /mnt/data/lyra_rag_db

Setup

  1. Import chat logs (must be in OpenAI message format):

    python3 rag/rag_chat_import.py
    
  2. Build and start the RAG API server:

    cd rag
    python3 rag_build.py
    uvicorn rag_api:app --host 0.0.0.0 --port 7090
    
  3. Query the RAG system:

    curl -X POST http://127.0.0.1:7090/rag/search \
      -H "Content-Type: application/json" \
      -d '{
        "query": "What is the current state of Cortex?",
        "where": {"category": "lyra"}
      }'
    

Development Notes

Cortex Architecture (v0.6.0)

  • Cortex contains embedded Intake module at cortex/intake/
  • Intake is imported as: from intake.intake import add_exchange_internal, SESSIONS
  • SESSIONS is a module-level global dictionary (singleton pattern)
  • Single-worker constraint required to maintain SESSIONS state
  • Diagnostic endpoints available for debugging: /debug/sessions, /debug/summary
  • NEW: Autonomy system integrated at cortex/autonomy/
    • Executive, decision, action, learning, and monitoring layers
    • Self-state persistence in cortex/data/self_state.json
    • Coordinated via orchestrator with flexible service controls

Adding New LLM Backends

  1. Add backend URL to .env:

    LLM_CUSTOM_URL=http://your-backend:port
    LLM_CUSTOM_MODEL=model-name
    
  2. Configure module to use new backend:

    CORTEX_LLM=CUSTOM
    
  3. Restart Cortex container:

    docker-compose restart cortex
    

Debugging Tips

  • Enable verbose logging: VERBOSE_DEBUG=true in .env
  • Check Cortex logs: docker logs cortex -f
  • Inspect SESSIONS: curl http://localhost:7081/debug/sessions
  • Test summarization: curl "http://localhost:7081/debug/summary?session_id=test"
  • Check Relay logs: docker logs relay -f
  • Monitor Docker network: docker network inspect lyra_net