# Project Lyra - README v0.3.0

Lyra is a modular persistent AI companion system. It provides memory-backed chat using **NeoMem** + **Relay** + **Persona Sidecar**, with optional subconscious annotation powered by the **Cortex VM** running local LLMs.

## Mission Statement

The point of Project Lyra is to give an AI chatbot more abilities than a typical chatbot. Typical chatbots are essentially amnesic and forget everything about your project. Lyra keeps projects organized and remembers everything you have done. Think of her as a notepad, schedule, database, co-creator, and collaborator with its own executive function: say something in passing, and Lyra remembers it and reminds you of it later.

---

## Structure

Project Lyra runs as a series of Docker containers that operate independently of each other but are all networked together. Just as the brain has regions, Lyra has modules:

### A. VM 100 - lyra-core

1. **Core v0.3.1 - Docker stack**
   - Relay (Docker container) - The main harness that connects the modules together and accepts input from the user.
   - UI (HTML) - How the user communicates with Lyra. At the moment it is a typical instant-message interface, but the plan is to make it much more than that.
   - Persona (Docker container) - The personality of Lyra: set how you want her to behave and give specific instructions for output. Basically prompt injection.
   - All of this is built and controlled by a single `.env` and `docker-compose.lyra.yml`.
2. **NeoMem v0.1.0 - Docker stack**
   - NeoMem is Lyra's main long-term memory database. It is a fork of Mem0 OSS and uses vector and graph databases.
   - NeoMem launches from its own `docker-compose.neomem.yml`.

### B. VM 101 - lyra-cortex

3. **Cortex - VM containing a Docker stack**
   - The working reasoning layer of Lyra.
   - Built to be flexible in deployment: run it locally or remotely (via LAN/WAN).
   - Intake v0.1.0 (Docker container) - Gives conversations context and purpose.
     - Intake takes the last N exchanges and summarizes them into coherent short-term memories.
     - Uses a cascading summarization setup that quantizes the exchanges; summaries occur at L2, L5, L10, L15, L20, and so on.
     - Keeps the bot aware of what is going on without having to send it the whole chat every time.
   - Cortex (Docker container) containing:
     - Reasoning layer - TBD
   - Reflect (Docker container) - Not yet implemented; on the roadmap.
     - Calls back to NeoMem after N exchanges and N summaries and edits the memories created during the initial messaging step. This keeps memories contained to coherent thoughts and reduces noise.
     - Can run actively and asynchronously, or on a timed basis (think human sleep and dreams).
     - This stage is not yet built; it is just an idea.

### C. Remote LLM APIs

4. **AI backends**
   - Lyra doesn't run models herself; she calls out to APIs.
   - Endlessly customizable, as long as the backend outputs to the same schema.

---

## 🚀 Features

### Lyra-Core VM (VM100)

- **Relay**:
  - The main harness and orchestrator of Lyra.
  - OpenAI-compatible endpoint: `POST /v1/chat/completions` (example call after this list).
  - Injects persona + relevant memories into every LLM call.
  - Routes all memory storage/retrieval through **NeoMem**.
  - Logs spans (`neomem.add`, `neomem.search`, `persona.fetch`, `llm.generate`).
- **NeoMem (Memory Engine)**:
  - Forked from Mem0 OSS and fully independent.
  - Drop-in compatible API (`/memories`, `/search`) - see the sketch after this list.
  - Local-first: runs on FastAPI with Postgres + Neo4j.
  - No external SDK dependencies.
  - Default service: `neomem-api` (port 7077).
  - Capable of adding new memories and updating previous ones: it compares existing embeddings and performs in-place updates when a memory is judged to be a semantic match.
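Since Relay exposes an OpenAI-compatible chat-completions endpoint, it can be exercised with a plain HTTP client. The snippet below is a minimal sketch, not the project's own tooling: it assumes Relay is reachable on `localhost:7078` (the port the UI connects to) and that it accepts the standard chat-completions request body; the model name is a placeholder.

```python
# Minimal sketch of a chat turn sent through Relay's OpenAI-compatible endpoint.
# Assumptions: Relay listens on localhost:7078 and accepts a standard
# chat-completions payload; adjust host/port/model to your deployment.
import requests

RELAY_URL = "http://localhost:7078/v1/chat/completions"  # assumed host/port

payload = {
    "model": "lyra",  # placeholder; Relay routes to its configured backend
    "messages": [
        {"role": "user", "content": "Remind me what we decided about the Intake cascade."}
    ],
}

resp = requests.post(RELAY_URL, json=payload, timeout=60)
resp.raise_for_status()

# A standard chat-completions response carries the reply under choices[0].message.
print(resp.json()["choices"][0]["message"]["content"])
```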
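Because NeoMem keeps the Mem0 OSS request surface, it can also be poked directly for debugging. The sketch below assumes Mem0-style request bodies for `/memories` and `/search` and the default `neomem-api` port 7077; treat the exact field names (`messages`, `user_id`, `query`) as assumptions rather than a fixed contract.

```python
# Minimal sketch of storing and searching a memory against NeoMem's
# Mem0-compatible API. Assumptions: neomem-api is reachable on localhost:7077
# and accepts Mem0 OSS-style bodies (messages/user_id for adds, query/user_id
# for searches).
import requests

NEOMEM_URL = "http://localhost:7077"  # default neomem-api port

# Store a new memory from a short exchange.
add_resp = requests.post(
    f"{NEOMEM_URL}/memories",
    json={
        "messages": [
            {"role": "user", "content": "The MI50 box hosts the vLLM server."}
        ],
        "user_id": "lyra",  # hypothetical id used for scoping
    },
    timeout=30,
)
add_resp.raise_for_status()

# Search for relevant memories before composing a reply.
search_resp = requests.post(
    f"{NEOMEM_URL}/search",
    json={"query": "Where does the vLLM server run?", "user_id": "lyra"},
    timeout=30,
)
search_resp.raise_for_status()
print(search_resp.json())
```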
- **UI**:
  - Lightweight static HTML chat page.
  - Connects to Relay at `http://<relay-host>:7078`.
  - Nice cyberpunk theme!
  - Saves and loads sessions, which are then sent to Relay.

### Beta Lyrae (RAG Memory DB) - added 11-3-25

- **RAG Knowledge DB - Beta Lyrae (Sheliak)**
  - This module implements the **Retrieval-Augmented Generation (RAG)** layer for Project Lyra.
  - It serves as the long-term searchable memory store that Cortex and Relay can query for relevant context before reasoning or response generation.
- The system uses:
  - **ChromaDB** for persistent vector storage
  - **OpenAI Embeddings (`text-embedding-3-small`)** for semantic similarity
  - **FastAPI** (port 7090) for the `/rag/search` REST endpoint
- Directory layout:

```
rag/
├── rag_chat_import.py   # imports JSON chat logs
├── rag_docs_import.py   # (planned) PDF/EPUB/manual importer
├── rag_build.py         # legacy single-folder builder
├── rag_query.py         # command-line query helper
├── rag_api.py           # FastAPI service providing /rag/search
├── chromadb/            # persistent vector store
├── chatlogs/            # organized source data
│   ├── poker/
│   ├── work/
│   ├── lyra/
│   ├── personal/
│   └── ...
└── import.log           # progress log for batch runs
```

- **OpenAI chatlog importer**
  - Takes JSON-formatted chat logs and imports them into the RAG store.
  - **Features include:**
    - Recursive folder indexing with **category detection** from the directory name
    - Smart chunking for long messages (5,000 chars per slice)
    - Automatic deduplication using a SHA-1 hash of file + chunk
    - Timestamps for both file modification and import time
    - Full progress logging via tqdm
    - Safe to run in the background with `nohup … &`
  - Metadata per chunk:

```json
{
  "chat_id": "",
  "chunk_index": 0,
  "source": "chatlogs/lyra/0002_cortex_LLMs_11-1-25.json",
  "title": "cortex LLMs 11-1-25",
  "role": "assistant",
  "category": "lyra",
  "type": "chat",
  "file_modified": "2025-11-06T23:41:02",
  "imported_at": "2025-11-07T03:55:00Z"
}
```

### Cortex VM (VM101, CT201)

- **CT201 main reasoning orchestrator.**
  - This is the internal brain of Lyra.
  - Runs in a privileged LXC.
  - Hosts the self-served vLLM model server described below.
- **Intake v0.1.1**
  - Receives messages from Relay and summarizes them in a cascading format.
  - Continues to summarize small batches of exchanges while also generating large-scale conversational summaries (L20).
  - Intake then sends the results to Cortex for self-reflection and to NeoMem for memory consolidation.
- **Reflect** - TBD

### Self-hosted vLLM server

CT201 currently serves a local LLM on a Radeon Instinct MI50, using a customized build of vLLM that runs on ROCm. The server exposes an OpenAI-style completions API at `10.0.0.43:8000/v1/completions`; a minimal client sketch follows.
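The sketch below shows a direct test call against the CT201 model server, assuming the address given above and vLLM's OpenAI-style completions payload; the model name is a placeholder that depends on what the server was launched with.

```python
# Minimal sketch of a direct call to the CT201 vLLM server's OpenAI-style
# completions endpoint. Assumptions: the server is reachable at 10.0.0.43:8000
# and the model id matches whatever lyra-start-vllm.sh loaded (placeholder here).
import requests

VLLM_URL = "http://10.0.0.43:8000/v1/completions"

payload = {
    "model": "local-model",  # placeholder; use the model id vLLM was started with
    "prompt": "Summarize the last five exchanges in two sentences.",
    "max_tokens": 128,
    "temperature": 0.7,
}

resp = requests.post(VLLM_URL, json=payload, timeout=120)
resp.raise_for_status()

# OpenAI-style completions return generated text under choices[0].text.
print(resp.json()["choices"][0]["text"])
```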
- **Stack Flow**

```
[Proxmox Host]
 └── loads AMDGPU driver
 └── boots CT201 (order=2)

[CT201 GPU Container]
 ├── lyra-start-vllm.sh  → starts vLLM ROCm model server
 ├── lyra-vllm.service   → runs the above automatically
 ├── lyra-core.service   → launches Cortex + Intake Docker stack
 └── Docker Compose      → runs Cortex + Intake containers

[Cortex Container]
 ├── Listens on port 7081
 ├── Talks to NVGRAM (mem API) + Intake
 └── Main relay between Lyra UI ↔ memory ↔ model

[Intake Container]
 ├── Listens on port 7080
 ├── Summarizes every few exchanges
 ├── Writes summaries to /app/logs/summaries.log
 └── Future: sends summaries → Cortex for reflection
```

Additional information is available in the Trilium docs.

---

## 📦 Requirements

- Docker + Docker Compose
- Postgres + Neo4j (for NeoMem)
- Access to an OpenAI- or Ollama-style API
- OpenAI API key (for Relay fallback LLMs)

**Dependencies:**

- fastapi==0.115.8
- uvicorn==0.34.0
- pydantic==2.10.4
- python-dotenv==1.0.1
- psycopg>=3.2.8
- ollama

---

## 🔌 Integration Notes

- Lyra-Core connects to `neomem-api:8000` inside Docker, or `localhost:7077` locally.
- API endpoints remain identical to Mem0 (`/memories`, `/search`).
- History and entity graphs are managed internally via Postgres + Neo4j.

---

## 🧱 Architecture Snapshot

```
User → Relay → Cortex
                 ↓
            [RAG Search]
                 ↓
         [Reflection Loop]
                 ↓
      Intake (async summaries)
                 ↓
      NeoMem (persistent memory)
```

**Cortex v0.4.1 introduces the first fully integrated reasoning loop.**

- Data flow:
  1. A user message enters Cortex via `/reason`.
  2. Cortex assembles context:
     - Intake summaries (short-term memory)
     - RAG contextual data (knowledge base)
  3. The LLM generates an initial draft (`call_llm`).
  4. The reflection loop critiques and refines the answer.
  5. Intake asynchronously summarizes and sends snapshots to NeoMem.

RAG API configuration: set `RAG_API_URL` in `.env` (default: `http://localhost:7090`).

---

## Setup and Operation

### Beta Lyrae - RAG memory system

**Requirements**

- Environment: Python 3.10+
- Dependencies: `pip install chromadb openai tqdm python-dotenv fastapi uvicorn jq`
- Persistent storage path: `./chromadb` (can be moved to `/mnt/data/lyra_rag_db`)

**Import Chats**

- Chats need to be formatted as:

```json
{
  "messages": [
    { "role": "user", "content": "Message here" },
    { "role": "assistant", "content": "Message here" }
  ]
}
```

- Organize the chats into categorical folders. This step is optional, but it helps keep things straight.
- Run `python3 rag_chat_import.py`; chats will then be imported automatically. For reference, it took 32 minutes to import 68 chat logs (approx. 10.3 MB).

**Build API Server**

- Run `rag_build.py`; this automatically builds the ChromaDB from data saved in the `/chatlogs/` folder. (A docs folder is planned.)
- Run `rag_api.py`, or:

```
uvicorn rag_api:app --host 0.0.0.0 --port 7090
```

**Query**

- Run `python3 rag_query.py "Question here?"`
- For testing, a curl command can reach it too:

```
curl -X POST http://127.0.0.1:7090/rag/search \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What is the current state of Cortex and Project Lyra?",
    "where": {"category": "lyra"}
  }'
```

## 📖 License

NeoMem is a derivative work based on the Mem0 OSS project (Apache 2.0). This fork retains the original Apache 2.0 license and adds local modifications.

© 2025 Terra-Mechanics / ServersDown Labs. All modifications are released under Apache 2.0.