
Project Lyra - README v0.5.0

Lyra is a modular, persistent AI companion system with advanced reasoning capabilities. It provides memory-backed chat using NeoMem + Relay + Cortex, with a multi-stage reasoning pipeline powered by distributed LLM backends.

Mission Statement

The point of Project Lyra is to give an AI chatbot more abilities than a typical chatbot. Typical chatbots are essentially amnesiac and forget everything about your project. Lyra keeps projects organized and remembers everything you have done. Think of her as a notepad, schedule, database, co-creator, and collaborator, all with its own executive function: say something in passing, and Lyra remembers it and reminds you of it later.


Architecture Overview

Project Lyra operates as a series of Docker containers networked together in a microservices architecture. Just as the brain has regions, Lyra has modules:

A. VM 100 - lyra-core (Core Services)

1. Relay (Node.js/Express) - Port 7078

  • Main orchestrator and message router
  • Coordinates all module interactions
  • OpenAI-compatible endpoint: POST /v1/chat/completions
  • Internal endpoint: POST /chat
  • Routes messages through Cortex reasoning pipeline
  • Manages async calls to Intake and NeoMem
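
A minimal client sketch in Python (using requests) against the OpenAI-compatible endpoint; the host, port, and payload shape mirror the Quick Start test later in this README:

```python
import requests

# POST to Relay's OpenAI-compatible endpoint (default port 7078).
resp = requests.post(
    "http://localhost:7078/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": "Hello Lyra!"}],
        "session_id": "test",
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json())
```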

2. UI (Static HTML)

  • Browser-based chat interface with cyberpunk theme
  • Connects to Relay at http://10.0.0.40:7078
  • Saves and loads sessions
  • OpenAI-compatible message format

3. NeoMem (Python/FastAPI) - Port 7077

  • Long-term memory database (fork of Mem0 OSS)
  • Vector storage (PostgreSQL + pgvector) + Graph storage (Neo4j)
  • RESTful API: /memories, /search
  • Semantic memory updates and retrieval
  • No external SDK dependencies - fully local
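
Since the API is drop-in compatible with Mem0 OSS, client calls might look like the sketch below; the exact payload fields are assumptions based on Mem0-style /memories and /search routes, not verified against the NeoMem source:

```python
import requests

NEOMEM = "http://localhost:7077"

# Store a memory (Mem0-style payload; field names are illustrative).
requests.post(f"{NEOMEM}/memories", json={
    "messages": [{"role": "user", "content": "My project deadline is Friday."}],
    "user_id": "demo-user",
}).raise_for_status()

# Semantic search over stored memories.
hits = requests.post(f"{NEOMEM}/search", json={
    "query": "When is the deadline?",
    "user_id": "demo-user",
}).json()
print(hits)
```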

B. VM 101 - lyra-cortex (Reasoning Layer)

4. Cortex (Python/FastAPI) - Port 7081

  • Primary reasoning engine with multi-stage pipeline
  • 4-Stage Processing:
    1. Reflection - Generates meta-awareness notes about conversation
    2. Reasoning - Creates initial draft answer using context
    3. Refinement - Polishes and improves the draft
    4. Persona - Applies Lyra's personality and speaking style
  • Integrates with Intake for short-term context
  • Flexible LLM router supporting multiple backends

5. Intake v0.2 (Python/FastAPI) - Port 7080

  • Simplified short-term memory summarization
  • Session-based circular buffer (deque, maxlen=200)
  • Single-level simple summarization (no cascading)
  • Background async processing with FastAPI BackgroundTasks
  • Pushes summaries to NeoMem automatically
  • API Endpoints:
    • POST /add_exchange - Add conversation exchange
    • GET /summaries?session_id={id} - Retrieve session summary
    • POST /close_session/{id} - Close and cleanup session
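
A usage sketch against the three routes above; the JSON body for /add_exchange is an assumption (this README only names the routes), shown here as a user/assistant pair plus a session id:

```python
import requests

INTAKE = "http://localhost:7080"
SESSION = "demo-session"

# Add one conversation exchange (payload shape is illustrative).
requests.post(f"{INTAKE}/add_exchange", json={
    "session_id": SESSION,
    "user": "What did we decide about the schema?",
    "assistant": "We agreed to add a pgvector column.",
}).raise_for_status()

# Retrieve the current rolling summary for the session.
summary = requests.get(f"{INTAKE}/summaries",
                       params={"session_id": SESSION}).json()
print(summary)

# Close and clean up the session buffer.
requests.post(f"{INTAKE}/close_session/{SESSION}").raise_for_status()
```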

C. LLM Backends (Remote/Local APIs)

Multi-Backend Strategy:

  • PRIMARY: vLLM on AMD MI50 GPU (http://10.0.0.43:8000) - Cortex reasoning, Intake
  • SECONDARY: Ollama on RTX 3090 (http://10.0.0.3:11434) - Configurable per-module
  • CLOUD: OpenAI API (https://api.openai.com/v1) - Cortex persona layer
  • FALLBACK: Local backup (http://10.0.0.41:11435) - Emergency fallback
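
One way a caller might walk this priority list is to try each backend in order and fall through on connection errors; the ordering and URLs come from the list above, while the function itself is an illustrative sketch (OpenAI is omitted since it needs an API key):

```python
import requests

# Priority order from the list above.
BACKENDS = [
    "http://10.0.0.43:8000/v1",   # PRIMARY: vLLM on AMD MI50
    "http://10.0.0.3:11434/v1",   # SECONDARY: Ollama on RTX 3090
    "http://10.0.0.41:11435/v1",  # FALLBACK: local backup
]

def complete(prompt: str, model: str) -> str:
    """Try each backend in priority order; fall through on failure."""
    for base in BACKENDS:
        try:
            r = requests.post(f"{base}/completions",
                              json={"model": model, "prompt": prompt},
                              timeout=60)
            r.raise_for_status()
            return r.json()["choices"][0]["text"]
        except requests.RequestException:
            continue  # backend down or erroring: try the next one
    raise RuntimeError("all LLM backends unavailable")
```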

Data Flow Architecture (v0.5.0)

Normal Message Flow:

User (UI) → POST /v1/chat/completions
  ↓
Relay (7078)
  ↓ POST /reason
Cortex (7081)
  ↓ GET /summaries?session_id=xxx
Intake (7080) [RETURNS SUMMARY]
  ↓
Cortex processes (4 stages):
  1. reflection.py → meta-awareness notes
  2. reasoning.py → draft answer (uses LLM)
  3. refine.py → refined answer (uses LLM)
  4. persona/speak.py → Lyra personality (uses LLM)
  ↓
Returns persona answer to Relay
  ↓
Relay → Cortex /ingest (async, stub)
Relay → Intake /add_exchange (async)
  ↓
Intake → Background summarize → NeoMem
  ↓
Relay → UI (returns final response)

Cortex 4-Stage Reasoning Pipeline:

  1. Reflection (reflection.py) - Cloud backend (OpenAI)

    • Analyzes user intent and conversation context
    • Generates meta-awareness notes
    • "What is the user really asking?"
  2. Reasoning (reasoning.py) - Primary backend (vLLM)

    • Retrieves short-term context from Intake
    • Creates initial draft answer
    • Integrates context, reflection notes, and user prompt
  3. Refinement (refine.py) - Primary backend (vLLM)

    • Polishes the draft answer
    • Improves clarity and coherence
    • Ensures factual consistency
  4. Persona (speak.py) - Cloud backend (OpenAI)

    • Applies Lyra's personality and speaking style
    • Natural, conversational output
    • Final answer returned to user
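
In outline, the orchestration reads as four sequential awaits, each stage feeding the next. This is a structural sketch, not the actual Cortex source; each stage is stubbed out here, where the real pipeline would call its configured LLM backend:

```python
import asyncio

async def reflect(prompt: str, context: str) -> str:
    return f"meta-notes about {prompt!r}"                    # 1. reflection (cloud)

async def draft_answer(prompt: str, context: str, notes: str) -> str:
    return f"draft using context={context!r}, notes={notes!r}"  # 2. reasoning (vLLM)

async def refine(draft: str, notes: str) -> str:
    return draft + " (refined)"                              # 3. refinement (vLLM)

async def speak(refined: str) -> str:
    return refined + " (in Lyra's voice)"                    # 4. persona (cloud)

async def reason(prompt: str, context: str) -> str:
    """Run the four stages in order, feeding each output forward."""
    notes = await reflect(prompt, context)
    draft = await draft_answer(prompt, context, notes)
    refined = await refine(draft, notes)
    return await speak(refined)

print(asyncio.run(reason("Hello Lyra!", "short-term summary from Intake")))
```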

Features

Lyra-Core (VM 100)

Relay:

  • Main orchestrator and message router
  • OpenAI-compatible endpoint: POST /v1/chat/completions
  • Internal endpoint: POST /chat
  • Health check: GET /_health
  • Async non-blocking calls to Cortex and Intake
  • Shared request handler for code reuse
  • Comprehensive error handling

NeoMem (Memory Engine):

  • Forked from Mem0 OSS - fully independent
  • Drop-in compatible API (/memories, /search)
  • Local-first: runs on FastAPI with Postgres + Neo4j
  • No external SDK dependencies
  • Semantic memory updates - compares embeddings and performs in-place updates
  • Default service: neomem-api (port 7077)
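
The in-place update can be pictured as an embedding comparison: if a new memory is close enough to an existing one, the row is updated rather than inserted. A minimal sketch of that decision, where the threshold and helper names are illustrative rather than NeoMem's actual values:

```python
import numpy as np

SIMILARITY_THRESHOLD = 0.85  # illustrative cutoff

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def upsert_memory(new_text: str, new_vec: np.ndarray, existing: list[dict]) -> str:
    """Update the nearest memory in place if similar enough, else insert."""
    best = max(existing, key=lambda m: cosine(new_vec, m["vec"]), default=None)
    if best is not None and cosine(new_vec, best["vec"]) >= SIMILARITY_THRESHOLD:
        best["text"] = new_text  # in-place semantic update
        return "updated"
    existing.append({"text": new_text, "vec": new_vec})
    return "inserted"
```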

UI:

  • Lightweight static HTML chat interface
  • Cyberpunk theme
  • Session save/load functionality
  • OpenAI message format support

Cortex (VM 101)

Cortex (v0.5):

  • Multi-stage reasoning pipeline (reflection → reasoning → refine → persona)
  • Flexible LLM backend routing
  • Per-stage backend selection
  • Async processing throughout
  • IntakeClient integration for short-term context
  • /reason, /ingest (stub), /health endpoints

Intake (v0.2):

  • Simplified single-level summarization
  • Session-based circular buffer (200 exchanges max)
  • Background async summarization
  • Automatic NeoMem push
  • No persistent log files (memory-only)
  • Breaking change from v0.1: Removed cascading summaries (L1, L2, L5, L10, L20, L30)

LLM Router:

  • Dynamic backend selection
  • Environment-driven configuration
  • Support for vLLM, Ollama, OpenAI, custom endpoints
  • Per-module backend preferences
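
Environment-driven, per-stage selection could look like the sketch below; variable names such as CORTEX_REASONING_BACKEND are hypothetical, chosen only to illustrate the per-module preference described above:

```python
import os

# Hypothetical env vars: one backend URL per pipeline stage, with a default.
DEFAULT_BACKEND = os.getenv("LLM_BACKEND", "http://10.0.0.43:8000")

def backend_for(stage: str) -> str:
    """Resolve a stage name like 'reasoning' to its configured backend URL."""
    return os.getenv(f"CORTEX_{stage.upper()}_BACKEND", DEFAULT_BACKEND)

# e.g. with CORTEX_PERSONA_BACKEND=https://api.openai.com/v1 set in .env,
# backend_for("persona") routes the persona stage to OpenAI while the
# reasoning stage stays on the default vLLM server.
print(backend_for("reasoning"))
```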

Beta Lyrae (RAG Memory DB) - added 11-3-25

  • RAG Knowledge DB - Beta Lyrae (sheliak)
    • This module implements the Retrieval-Augmented Generation (RAG) layer for Project Lyra.
    • It serves as the long-term searchable memory store that Cortex and Relay can query for relevant context before reasoning or response generation. The system uses:
    • ChromaDB for persistent vector storage
    • OpenAI Embeddings (text-embedding-3-small) for semantic similarity
    • FastAPI (port 7090) for the /rag/search REST endpoint
    • Directory layout:

        rag/
        ├── rag_chat_import.py   # imports JSON chat logs
        ├── rag_docs_import.py   # (planned) PDF/EPUB/manual importer
        ├── rag_build.py         # legacy single-folder builder
        ├── rag_query.py         # command-line query helper
        ├── rag_api.py           # FastAPI service providing /rag/search
        ├── chromadb/            # persistent vector store
        ├── chatlogs/            # organized source data
        │   ├── poker/
        │   ├── work/
        │   ├── lyra/
        │   ├── personal/
        │   └── ...
        └── import.log           # progress log for batch runs
    • OpenAI chatlog importer
      • Takes JSON-formatted chat logs and imports them into the RAG store.
      • Features include:
        • Recursive folder indexing with category detection from the directory name
        • Smart chunking for long messages (5,000 chars per slice)
        • Automatic deduplication using a SHA-1 hash of file + chunk (see the sketch after the metadata example below)
        • Timestamps for both file modification and import time
        • Full progress logging via tqdm
        • Safe to run in the background with nohup … &
        • Metadata per chunk:
          {
            "chat_id": "<sha1 of filename>",
            "chunk_index": 0,
            "source": "chatlogs/lyra/0002_cortex_LLMs_11-1-25.json",
            "title": "cortex LLMs 11-1-25",
            "role": "assistant",
            "category": "lyra",
            "type": "chat",
            "file_modified": "2025-11-06T23:41:02",
            "imported_at": "2025-11-07T03:55:00Z"
          }
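
The chunking and deduplication steps can be sketched as follows. This mirrors the behavior described above (5,000-char slices, SHA-1 of file + chunk) but is one plausible reading, not the actual rag_chat_import.py source:

```python
import hashlib

CHUNK_SIZE = 5_000  # characters per slice, per the importer notes above

def chunk_message(text: str) -> list[str]:
    """Slice long messages into fixed-size chunks."""
    return [text[i:i + CHUNK_SIZE] for i in range(0, len(text), CHUNK_SIZE)]

def chunk_id(source_path: str, chunk_index: int) -> str:
    """SHA-1 over file path + chunk index, used to skip duplicates on re-import."""
    return hashlib.sha1(f"{source_path}:{chunk_index}".encode()).hexdigest()

seen: set[str] = set()  # in a real run, the IDs already present in ChromaDB
for idx, chunk in enumerate(chunk_message("lorem ipsum " * 1000)):
    cid = chunk_id("chatlogs/lyra/example.json", idx)
    if cid in seen:
        continue  # deduplicated: this chunk was already imported
    seen.add(cid)
    # collection.add(ids=[cid], documents=[chunk], metadatas=[...])  # ChromaDB write
```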
          
          

Cortex VM (VM101, CT201)

  • CT201 main reasoning orchestrator.

    • This is the internal brain of Lyra.
    • Running in a privileged LXC.
    • Currently a locally served LLM running on a Radeon Instinct MI50, using a customized build of vLLM that lets it use ROCm.
    • Accessible via 10.0.0.43:8000/v1/completions.
  • Intake v0.1.1

    • Receives messages from Relay and summarizes them in a cascading format.
    • Continues to summarize small batches of exchanges while also generating large-scale conversational summaries (L20).
    • Intake then sends output to Cortex for self-reflection and to NeoMem for memory consolidation.
  • Reflect - TBD

Self hosted vLLM server

  • CT201 serves the model for this stack (see the Cortex VM section above for hardware details).
  • Stack Flow
    • [Proxmox Host]
      └── loads AMDGPU driver
          └── boots CT201 (order=2)

      [CT201 GPU Container]
      ├── lyra-start-vllm.sh → starts vLLM ROCm model server
      ├── lyra-vllm.service  → runs the above automatically
      ├── lyra-core.service  → launches Cortex + Intake Docker stack
      └── Docker Compose     → runs Cortex + Intake containers

      [Cortex Container]
      ├── Listens on port 7081
      ├── Talks to NVGRAM (mem API) + Intake
      └── Main relay between Lyra UI ↔ memory ↔ model

      [Intake Container]
      ├── Listens on port 7080
      ├── Summarizes every few exchanges
      ├── Writes summaries to /app/logs/summaries.log
      └── Future: sends summaries → Cortex for reflection


Version History

v0.5.0 (2025-11-28) - Current Release

  • Fixed all critical API wiring issues
  • Added OpenAI-compatible endpoint to Relay (/v1/chat/completions)
  • Fixed Cortex → Intake integration
  • Added missing Python package __init__.py files
  • End-to-end message flow verified and working

v0.4.x (Major Rewire)

  • Cortex multi-stage reasoning pipeline
  • Intake v0.2 simplification
  • LLM router with multi-backend support
  • Major architectural restructuring

v0.3.x

  • Beta Lyrae RAG system
  • NeoMem integration
  • Basic Cortex reasoning loop

Known Issues (v0.5.0)

Non-Critical

  • Session management endpoints not fully implemented in Relay
  • RAG service currently disabled in docker-compose.yml
  • Cortex /ingest endpoint is a stub

Future Enhancements

  • Re-enable RAG service integration
  • Implement full session persistence
  • Add request correlation IDs for tracing
  • Comprehensive health checks

Quick Start

Prerequisites

  • Docker + Docker Compose
  • PostgreSQL 13+, Neo4j 4.4+ (for NeoMem)
  • At least one LLM API endpoint (vLLM, Ollama, or OpenAI)

Setup

  1. Configure environment variables in .env files
  2. Start services: docker-compose up -d
  3. Check health: curl http://localhost:7078/_health
  4. Access UI: http://localhost:7078

Test

curl -X POST http://localhost:7078/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Hello Lyra!"}],
    "session_id": "test"
  }'

Documentation

  • See CHANGELOG.md for detailed version history
  • See ENVIRONMENT_VARIABLES.md for environment variable reference
  • Additional information available in the Trilium docs

License

NeoMem is a derivative work based on the Mem0 OSS project (Apache 2.0). This fork retains the original Apache 2.0 license and adds local modifications. © 2025 Terra-Mechanics / ServersDown Labs. All modifications released under Apache 2.0.

Built with Claude Code


📦 Requirements

  • Docker + Docker Compose
  • Postgres + Neo4j (for NeoMem)
  • Access to an OpenAI- or Ollama-style API
  • OpenAI API key (for Relay fallback LLMs)

Dependencies:

  • fastapi==0.115.8
  • uvicorn==0.34.0
  • pydantic==2.10.4
  • python-dotenv==1.0.1
  • psycopg>=3.2.8
  • ollama


🔌 Integration Notes

Lyra-Core connects to neomem-api:8000 inside Docker or localhost:7077 locally.

API endpoints remain identical to Mem0 (/memories, /search).

History and entity graphs managed internally via Postgres + Neo4j.


🧱 Architecture Snapshot

User → Relay → Cortex
		 ↓
	 [RAG Search]
		 ↓
	 [Reflection Loop]
		 ↓
	 Intake (async summaries)
		 ↓
	 NeoMem (persistent memory)

Cortex v0.4.1 introduces the first fully integrated reasoning loop.

  • Data Flow:
    • User message enters Cortex via /reason.
    • Cortex assembles context:
      • Intake summaries (short-term memory)
      • RAG contextual data (knowledge base)
    • LLM generates initial draft (call_llm).
    • Reflection loop critiques and refines the answer.
    • Intake asynchronously summarizes and sends snapshots to NeoMem.

RAG API Configuration: Set RAG_API_URL in .env (default: http://localhost:7090).
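
A client-side sketch of that query path, reading RAG_API_URL from the environment and posting the same payload shape as the curl example in the Setup section below; it assumes the Beta Lyrae service is running:

```python
import os
import requests

RAG_API_URL = os.getenv("RAG_API_URL", "http://localhost:7090")

def rag_search(query: str, category: str | None = None) -> dict:
    """Query Beta Lyrae's /rag/search endpoint for relevant chunks."""
    payload: dict = {"query": query}
    if category:
        payload["where"] = {"category": category}  # metadata filter
    r = requests.post(f"{RAG_API_URL}/rag/search", json=payload, timeout=30)
    r.raise_for_status()
    return r.json()

print(rag_search("What is the current state of Cortex?", category="lyra"))
```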


Setup and Operation

Beta Lyrae - RAG memory system

Requirements:

  • Env: Python 3.10+
  • Dependencies: pip install chromadb openai tqdm python-dotenv fastapi uvicorn jq
  • Persistent storage path: ./chromadb (can be moved to /mnt/data/lyra_rag_db)

Import Chats

  • Chats need to be formatted into the correct format of:

      {
        "messages": [
          {
            "role": "user",
            "content": "Message here"
          },
          {
            "role": "assistant",
            "content": "Message here"
          }
        ]
      }
    
  • Organize the chats into categorical folders. This step is optional, but it helps keep things straight.
  • Run python3 rag_chat_import.py; the chats will then be imported automatically. For reference, it took 32 minutes to import 68 chat logs (approx. 10.3 MB).

Build API Server

  • Run rag_build.py; this automatically builds the ChromaDB from data saved in the /chatlogs/ folder. (A docs folder is to be added in the future.)
  • Run rag_api.py, or uvicorn rag_api:app --host 0.0.0.0 --port 7090

Query

  • Run: python3 rag_query.py "Question here?"
  • For testing, a curl command can reach it too:

      curl -X POST http://127.0.0.1:7090/rag/search \
        -H "Content-Type: application/json" \
        -d '{
          "query": "What is the current state of Cortex and Project Lyra?",
          "where": {"category": "lyra"}
        }'
    

