# Project Lyra - README v0.3.0

Lyra is a modular persistent AI companion system.
It provides memory-backed chat using NeoMem + Relay + Persona Sidecar,
with optional subconscious annotation powered by the Cortex VM running local LLMs.

Mission Statement

The point of Project Lyra is to give an AI chatbot more abilities than a typical chatbot. Typical chatbots are essentially amnesiac and forget everything about your project. Lyra helps keep projects organized and remembers everything you have done. Think of her abilities as a notepad, schedule, database, co-creator, and collaborator, all with its own executive function. Say something in passing and Lyra remembers it, then reminds you of it later.

Structure

Project Lyra exists as a series of Docker containers that run independently of each other but are all networked together. Just as the brain has regions, Lyra has modules:
## A. VM 100 - lyra-core
	1. **Core v0.3.1 - Docker stack**
		- Relay (Docker container) - The main harness that connects the modules together and accepts input from the user.
		- UI (HTML) - How the user communicates with Lyra. At the moment it is a typical instant-message interface, but plans are to make it much more than that.
		- Persona (Docker container) - The personality of Lyra: set how you want her to behave and give specific instructions for output. Essentially prompt injection (a small sketch of this follows below).
		- All of this is built and controlled by a single .env and docker-compose.lyra.yml.
	2. **NeoMem v0.1.0 - Docker stack**
		- NeoMem is Lyra's main long-term memory database. It is a fork of Mem0 OSS and uses vector and graph databases.
		- NeoMem launches with a single, separate docker-compose.neomem.yml.
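To make the Relay + Persona + NeoMem split concrete, here is a minimal sketch of how a persona block and retrieved memories could be merged into the message list sent to the backend LLM. The function names and message layout are illustrative assumptions, not the actual Relay code.

```python
# Illustrative sketch only: merging a persona block and retrieved memories
# into an OpenAI-style message list before the LLM call.
# build_messages and its inputs are hypothetical names, not Relay internals.

def build_messages(user_text: str, persona_text: str, memories: list[str]) -> list[dict]:
    """Assemble the system + user messages sent to the backend LLM."""
    memory_block = "\n".join(f"- {m}" for m in memories) or "(no relevant memories)"
    system_prompt = f"{persona_text}\n\nRelevant long-term memories:\n{memory_block}"
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_text},
    ]

# Example with placeholder data:
messages = build_messages(
    "What did we decide about GPU passthrough?",
    "You are Lyra, a concise and friendly engineering companion.",
    ["CT201 passes the Radeon Instinct through to vLLM."],
)
```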
		
## B. VM 101 - lyra-cortex
	3. **Cortex - VM containing a Docker stack**
	- This is the working reasoning layer of Lyra.
	- Built to be flexible in deployment: run it locally or remotely (via LAN/WAN).
	- Intake v0.1.0 (Docker container) - gives conversations context and purpose.
		- Intake takes the last N exchanges and summarizes them into coherent short-term memories.
		- Uses a cascading summarization setup that quantizes the exchanges; summaries occur at levels L2, L5, L10, L15, L20, etc. (see the sketch after this list).
		- Keeps the bot aware of what is going on without having to send it the whole chat every time.
	- Cortex (Docker container) containing:
		- Reasoning layer
			- TBD
		- Reflect (Docker container) - Not yet implemented; on the roadmap.
			- Calls back to NeoMem after N exchanges and N summaries and edits the memories created during the initial messaging step. This keeps memories contained to coherent thoughts and reduces noise.
			- Can run actively and asynchronously, or on a schedule (think human sleep and dreams).
			- This stage is not yet built; it is just an idea.
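The exact cascade logic lives inside Intake; the sketch below only illustrates the tiered levels named above (L2, L5, L10, L15, L20). The trigger rule and the summarize() helper are assumptions made for illustration.

```python
# Illustrative sketch of cascading summary tiers (L2/L5/L10/L15/L20).
# The modulo trigger rule and summarize() are placeholders, not Intake's code.

SUMMARY_LEVELS = [2, 5, 10, 15, 20]  # exchanges covered by each tier

def due_levels(exchange_count: int) -> list[int]:
    """Tiers that fire after this many exchanges (assumed rule: every Nth exchange)."""
    return [level for level in SUMMARY_LEVELS if exchange_count % level == 0]

def summarize(window: list[str]) -> str:
    # Placeholder: a real implementation would call the configured LLM backend.
    return f"summary of last {len(window)} exchanges"

def on_new_exchange(history: list[str]) -> dict[int, str]:
    """Produce a summary for every tier that is due, from that tier's window."""
    return {level: summarize(history[-level:]) for level in due_levels(len(history))}

# After 10 exchanges, the L2, L5, and L10 tiers would all fire:
print(on_new_exchange([f"exchange {i}" for i in range(1, 11)]))
```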
	
## C. Remote LLM APIs
	4. **AI backends**
		- Lyra doesn't run models herself; she calls out to APIs.
		- Endlessly customizable, as long as the backend outputs to the same schema.

🚀 Features

Lyra-Core VM (VM100)

  • **Relay**:

    • The main harness and orchestrator of Lyra.
    • OpenAI-compatible endpoint: POST /v1/chat/completions (example request after this feature list)
    • Injects persona + relevant memories into every LLM call
    • Routes all memory storage/retrieval through NeoMem
    • Logs spans (neomem.add, neomem.search, persona.fetch, llm.generate)
  • NeoMem (Memory Engine):

    • Forked from Mem0 OSS and fully independent.
    • Drop-in compatible API (/memories, /search).
    • Local-first: runs on FastAPI with Postgres + Neo4j.
    • No external SDK dependencies.
    • Default service: neomem-api (port 7077).
    • Capable of adding new memories and updating previous ones: compares existing embeddings and performs in-place updates when a memory is judged to be a semantic match.
  • UI:

    • Lightweight static HTML chat page.
    • Connects to Relay at http://<host>:7078.
    • Nice cyberpunk theme!
    • Saves and loads sessions, which are then sent to Relay.
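Since Relay speaks the OpenAI chat schema on port 7078 (the same port the UI uses), it can be called from any OpenAI-style client. The request below is a minimal sketch; the model name is a placeholder and the host should be adjusted to your deployment.

```python
# Minimal request against Relay's OpenAI-compatible endpoint.
# Host/port follow the UI default above; the model name is a placeholder.
import requests

payload = {
    "model": "lyra",  # placeholder; Relay routes to whichever backend is configured
    "messages": [
        {"role": "user", "content": "Remind me what we planned for NeoMem."},
    ],
}

resp = requests.post("http://localhost:7078/v1/chat/completions", json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```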

Beta Lyrae (RAG Memory DB) - added 11-3-25

  • RAG Knowledge DB - Beta Lyrae (sheliak)
    • This module implements the Retrieval-Augmented Generation (RAG) layer for Project Lyra.
    • It serves as the long-term searchable memory store that Cortex and Relay can query for relevant context before reasoning or response generation. The system uses:
    • ChromaDB for persistent vector storage
    • OpenAI Embeddings (text-embedding-3-small) for semantic similarity
    • FastAPI (port 7090) for the /rag/search REST endpoint
    • Directory layout:

      rag/
      ├── rag_chat_import.py   # imports JSON chat logs
      ├── rag_docs_import.py   # (planned) PDF/EPUB/manual importer
      ├── rag_build.py         # legacy single-folder builder
      ├── rag_query.py         # command-line query helper
      ├── rag_api.py           # FastAPI service providing /rag/search
      ├── chromadb/            # persistent vector store
      ├── chatlogs/            # organized source data
      │   ├── poker/
      │   ├── work/
      │   ├── lyra/
      │   ├── personal/
      │   └── ...
      └── import.log           # progress log for batch runs
    • **OpenAI chatlog importer**
      • Takes JSON-formatted chat logs and imports them into the RAG store.
      • Features include:
        • Recursive folder indexing with category detection from the directory name
        • Smart chunking for long messages (5,000 chars per slice)
        • Automatic deduplication using a SHA-1 hash of file + chunk (a short sketch of the chunking and hashing follows the metadata example below)
        • Timestamps for both file modification and import time
        • Full progress logging via tqdm
        • Safe to run in the background with nohup … &
        • Metadata per chunk:
          ```json
          {
            "chat_id": "<sha1 of filename>",
            "chunk_index": 0,
            "source": "chatlogs/lyra/0002_cortex_LLMs_11-1-25.json",
            "title": "cortex LLMs 11-1-25",
            "role": "assistant",
            "category": "lyra",
            "type": "chat",
            "file_modified": "2025-11-06T23:41:02",
            "imported_at": "2025-11-07T03:55:00Z"
          }
          ```
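As a rough illustration of the chunking and deduplication described above (5,000-character slices, SHA-1 IDs), here is a minimal sketch; the helper names and the exact hash input are assumptions, not rag_chat_import.py internals.

```python
# Sketch of 5,000-character chunking with SHA-1-based, deduplicating IDs.
# chunk_text/chunk_id are illustrative; the real importer may hash differently.
import hashlib

CHUNK_SIZE = 5_000  # characters per slice

def chunk_text(text: str, size: int = CHUNK_SIZE) -> list[str]:
    return [text[i:i + size] for i in range(0, len(text), size)]

def chunk_id(source_path: str, index: int) -> str:
    """Stable ID: re-importing the same file/chunk yields the same hash."""
    return hashlib.sha1(f"{source_path}:{index}".encode()).hexdigest()

chunks = chunk_text("example message text " * 500)   # placeholder content
ids = [chunk_id("chatlogs/lyra/example.json", i) for i in range(len(chunks))]
```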
          
          

Cortex VM (VM101, CT201)

  • CT201 - main reasoning orchestrator.

    • This is the internal brain of Lyra.
    • Runs in a privileged LXC.
    • Currently a locally served LLM running on a Radeon Instinct MI50, using a customized build of vLLM that lets it use ROCm.
    • Accessible via 10.0.0.43:8000/v1/completions.
  • **Intake v0.1.1**

    • Receives messages from Relay and summarizes them in a cascading format.
    • Continues to summarize smaller batches of exchanges while also generating large-scale conversational summaries (L20).
    • Intake then sends results to Cortex for self-reflection and to NeoMem for memory consolidation.
  • **Reflect** - TBD

Self-hosted vLLM server

  • Stack Flow
    • [Proxmox Host]
      └── loads AMDGPU driver
      └── boots CT201 (order=2)

      [CT201 GPU Container]
      ├── lyra-start-vllm.sh → starts vLLM ROCm model server
      ├── lyra-vllm.service → runs the above automatically
      ├── lyra-core.service → launches Cortex + Intake Docker stack
      └── Docker Compose → runs Cortex + Intake containers

      [Cortex Container]
      ├── Listens on port 7081
      ├── Talks to NVGRAM (mem API) + Intake
      └── Main relay between Lyra UI ↔ memory ↔ model

      [Intake Container]
      ├── Listens on port 7080
      ├── Summarizes every few exchanges
      ├── Writes summaries to /app/logs/summaries.log
      └── Future: sends summaries → Cortex for reflection

Additional information is available in the Trilium docs.


📦 Requirements

  • Docker + Docker Compose
  • Postgres + Neo4j (for NeoMem)
  • Access to an OpenAI- or Ollama-style API
  • OpenAI API key (for Relay fallback LLMs)

Dependencies:
  • fastapi==0.115.8
  • uvicorn==0.34.0
  • pydantic==2.10.4
  • python-dotenv==1.0.1
  • psycopg>=3.2.8
  • ollama


🔌 Integration Notes

Lyra-Core connects to neomem-api:8000 inside Docker or localhost:7077 locally.

API endpoints remain identical to Mem0 (/memories, /search).

History and entity graphs are managed internally via Postgres + Neo4j.
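For reference, a hedged example against those Mem0-compatible endpoints is shown below. The paths and port come from this README; the payload field names follow Mem0 OSS conventions and should be verified against the NeoMem source.

```python
# Hedged example against NeoMem's Mem0-compatible API (/memories, /search).
# Port 7077 is the README default; payload field names are assumed Mem0-style.
import requests

BASE = "http://localhost:7077"

# Store a memory (assumed Mem0-style payload).
add = requests.post(f"{BASE}/memories", json={
    "messages": [{"role": "user", "content": "CT201 boots second on the Proxmox host."}],
    "user_id": "demo-user",
}, timeout=30)
add.raise_for_status()

# Search it back.
hits = requests.post(f"{BASE}/search", json={
    "query": "When does CT201 boot?",
    "user_id": "demo-user",
}, timeout=30)
hits.raise_for_status()
print(hits.json())
```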


🧱 Architecture Snapshot

User → Relay → Cortex
		 ↓
	 [RAG Search]
		 ↓
	 [Reflection Loop]
		 ↓
	 Intake (async summaries)
		 ↓
	 NeoMem (persistent memory)

Cortex v0.4.1 introduces the first fully integrated reasoning loop.

  • Data Flow (a condensed sketch follows below):
    • User message enters Cortex via /reason.
    • Cortex assembles context:
      • Intake summaries (short-term memory)
      • RAG contextual data (knowledge base)
    • The LLM generates an initial draft (call_llm).
    • The reflection loop critiques and refines the answer.
    • Intake asynchronously summarizes and sends snapshots to NeoMem.
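A condensed sketch of that flow is below. The /reason route and the RAG_API_URL default come from this README; the FastAPI wiring and helper functions are illustrative placeholders, not Cortex's actual code.

```python
# Condensed sketch of the /reason data flow. The route and RAG_API_URL default
# mirror this README; helpers and wiring are placeholders, not Cortex internals.
import os
import requests
from fastapi import FastAPI
from pydantic import BaseModel

RAG_API_URL = os.getenv("RAG_API_URL", "http://localhost:7090")
app = FastAPI()

class ReasonRequest(BaseModel):
    message: str

def fetch_intake_summaries() -> list[str]:
    return []              # placeholder for the Intake call (short-term memory)

def call_llm(message: str, summaries: list[str], rag: dict) -> str:
    return "draft answer"  # placeholder for the backend LLM call

def reflect(draft: str) -> str:
    return draft           # placeholder critique-and-refine pass

@app.post("/reason")
def reason(req: ReasonRequest) -> dict:
    summaries = fetch_intake_summaries()
    rag = requests.post(f"{RAG_API_URL}/rag/search",
                        json={"query": req.message}, timeout=30).json()
    draft = call_llm(req.message, summaries, rag)   # initial draft
    return {"answer": reflect(draft)}               # reflection loop refines it
```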

RAG API Configuration: Set RAG_API_URL in .env (default: http://localhost:7090).


Setup and Operation

Beta Lyrae - RAG memory system

Requirements:
  • Env: Python 3.10+
  • Dependencies: pip install chromadb openai tqdm python-dotenv fastapi uvicorn jq
  • Persistent storage path: ./chromadb (can be moved to /mnt/data/lyra_rag_db)

Import Chats

  • Chats need to be formatted as:
    ```json
    {
      "messages": [
        {
          "role": "user",
          "content": "Message here"
        },
        {
          "role": "assistant",
          "content": "Message here"
        }
      ]
    }
    ```
    
  • Organize the chats into categorical folders. This step is optional, but it helped me keep it straight.
  • run "python3 rag_chat_import.py", chats will then be imported automatically. For reference, it took 32 Minutes to import 68 Chat logs (aprox 10.3MB).

Build API Server

  • Run rag_build.py; this automatically builds the ChromaDB store from data saved in the /chatlogs/ folder. (A docs folder is to be added in the future.)
  • Run: rag_api.py or uvicorn rag_api:app --host 0.0.0.0 --port 7090

Query

  • Run: python3 rag_query.py "Question here?"
  • For testing, a curl command can reach it too (a Python equivalent follows this example):
    curl -X POST http://127.0.0.1:7090/rag/search \
      -H "Content-Type: application/json" \
      -d '{
    		"query": "What is the current state of Cortex and Project Lyra?",
    		"where": {"category": "lyra"}
    	  }'
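The same request can be issued from Python; the response is printed as-is since its exact schema isn't documented here.

```python
# Python equivalent of the curl test above against /rag/search.
import requests

resp = requests.post(
    "http://127.0.0.1:7090/rag/search",
    json={
        "query": "What is the current state of Cortex and Project Lyra?",
        "where": {"category": "lyra"},
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json())  # response schema not documented in this README
```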
    


📖 License

NeoMem is a derivative work based on the Mem0 OSS project (Apache 2.0).
This fork retains the original Apache 2.0 license and adds local modifications.
© 2025 Terra-Mechanics / ServersDown Labs. All modifications released under Apache 2.0.