Update to v0.9.1 #1

Merged
serversdown merged 44 commits from dev into main 2026-01-18 02:46:25 -05:00
12 changed files with 557 additions and 477 deletions
Showing only changes of commit d9281a1816

View File

@@ -2,11 +2,106 @@
All notable changes to Project Lyra are organized by component.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/)
and adheres to [Semantic Versioning](https://semver.org/).
# Last Updated: 11-28-25
---
## 🧠 Lyra-Core ##############################################################################
## [Project Lyra v0.5.0] - 2025-11-28
### 🔧 Fixed - Critical API Wiring & Integration
After the major architectural rewire (v0.4.x), this release fixes all critical endpoint mismatches and ensures end-to-end system connectivity.
#### Cortex → Intake Integration ✅
- **Fixed** `IntakeClient` to use the correct Intake v0.2 API endpoints (see the sketch after this list)
- Changed `GET /context/{session_id}` → `GET /summaries?session_id={session_id}`
- Updated JSON response parsing to extract the `summary_text` field
- Fixed environment variable name: `INTAKE_API` → `INTAKE_API_URL`
- Corrected default port: `7083` → `7080`
- Added deprecation warning to `summarize_turn()` method (endpoint removed in Intake v0.2)
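A minimal sketch of the corrected context fetch, assuming the Intake service is reachable at its default in-network URL; the endpoint and `summary_text` field follow the Intake v0.2 API described above:
```python
import httpx

async def fetch_context(session_id: str, base_url: str = "http://intake:7080") -> str:
    """Fetch the current session summary from Intake v0.2 via GET /summaries."""
    async with httpx.AsyncClient(timeout=15) as client:
        r = await client.get(f"{base_url}/summaries", params={"session_id": session_id})
        r.raise_for_status()
        # v0.2 responds with {"summary_text": "...", "session_id": "..."}
        return r.json().get("summary_text", "")
```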
#### Relay → UI Compatibility ✅
- **Added** OpenAI-compatible endpoint `POST /v1/chat/completions` (client usage sketched after this list)
- Accepts standard OpenAI format with `messages[]` array
- Returns OpenAI-compatible response structure with `choices[]`
- Extracts last message content from messages array
- Includes usage metadata (stub values for compatibility)
- **Refactored** Relay to use shared `handleChatRequest()` function
- Both `/chat` and `/v1/chat/completions` use same core logic
- Eliminates code duplication
- Consistent error handling across endpoints
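For illustration, a small client call against the new endpoint, assuming Relay is published on `localhost:7078`; the request and response shapes mirror the OpenAI-compatible format described above:
```python
import requests

resp = requests.post(
    "http://localhost:7078/v1/chat/completions",  # assumes the default Relay port mapping
    json={
        "session_id": "demo",  # optional; Relay falls back to "default"
        "messages": [{"role": "user", "content": "Hello Lyra!"}],
    },
    timeout=120,
)
resp.raise_for_status()
# Relay returns an OpenAI-style body; the reply lives in choices[0].message.content
print(resp.json()["choices"][0]["message"]["content"])
```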
#### Relay → Intake Connection ✅
- **Fixed** Intake URL fallback in Relay server configuration
- Corrected port: `7082` → `7080`
- Updated endpoint: `/summary` → `/add_exchange`
- Now properly sends exchanges to Intake for summarization (see the sketch after this list)
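The exchange payload Relay now posts is sketched below in Python for clarity; the field names are taken from the Relay handler shown later in this diff, and the call is treated as fire-and-forget just as Relay does:
```python
import requests

def send_exchange(session_id: str, user_msg: str, assistant_msg: str,
                  intake_url: str = "http://intake:7080/add_exchange") -> None:
    """Mirror of Relay's non-blocking POST to Intake v0.2."""
    payload = {
        "session_id": session_id,
        "user_msg": user_msg,
        "assistant_msg": assistant_msg,
    }
    try:
        # Failures are logged, not raised, so the chat path is never blocked
        requests.post(intake_url, json=payload, timeout=20)
    except requests.RequestException as e:
        print(f"Intake push failed: {e}")
```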
#### Code Quality & Python Package Structure ✅
- **Added** missing `__init__.py` files to all Cortex subdirectories
- `cortex/llm/__init__.py`
- `cortex/reasoning/__init__.py`
- `cortex/persona/__init__.py`
- `cortex/ingest/__init__.py`
- `cortex/utils/__init__.py`
- Improves package imports and IDE support
- **Removed** unused import in `cortex/router.py`: `from unittest import result`
- **Deleted** empty file `cortex/llm/resolve_llm_url.py` (was 0 bytes, never implemented)
### ✅ Verified Working
Complete end-to-end message flow now operational:
```
UI → Relay (/v1/chat/completions)
Relay → Cortex (/reason)
Cortex → Intake (/summaries) [retrieves context]
Cortex 4-stage pipeline:
1. reflection.py → meta-awareness notes
2. reasoning.py → draft answer
3. refine.py → polished answer
4. persona/speak.py → Lyra personality
Cortex → Relay (returns persona response)
Relay → Intake (/add_exchange) [async summary]
Intake → NeoMem (background memory storage)
Relay → UI (final response)
```
### 📝 Documentation
- **Added** this CHANGELOG entry with comprehensive v0.5.0 notes
- **Updated** README.md to reflect v0.5.0 architecture
- Documented new endpoints
- Updated data flow diagrams
- Clarified Intake v0.2 changes
- Corrected service descriptions
### 🐛 Issues Resolved
- ❌ Cortex could not retrieve context from Intake (wrong endpoint)
- ❌ UI could not send messages to Relay (endpoint mismatch)
- ❌ Relay could not send summaries to Intake (wrong port/endpoint)
- ❌ Python package imports were implicit (missing `__init__.py` files)
### ⚠️ Known Issues (Non-Critical)
- Session management endpoints not implemented in Relay (`GET/POST /sessions/:id`)
- RAG service currently disabled in docker-compose.yml
- Cortex `/ingest` endpoint is a stub returning `{"status": "ok"}`
### 🎯 Migration Notes
If upgrading from v0.4.x:
1. Pull latest changes from git
2. Verify environment variables in `.env` files:
- Check `INTAKE_API_URL=http://intake:7080` (not `INTAKE_API`)
- Verify all service URLs use correct ports
3. Restart Docker containers: `docker-compose down && docker-compose up -d`
4. Test with a simple message through the UI (or run the health-check sketch below)
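A quick post-migration check is to poll each service's health endpoint; a sketch assuming the default published ports (adjust hosts if the services run on separate VMs):
```python
import requests

CHECKS = {
    "relay":  "http://localhost:7078/_health",
    "cortex": "http://localhost:7081/health",
    "intake": "http://localhost:7080/health",
}

for name, url in CHECKS.items():
    try:
        r = requests.get(url, timeout=5)
        print(f"{name}: {r.status_code} {r.json()}")
    except Exception as e:
        print(f"{name}: DOWN ({e})")
```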
---
## [Infrastructure v1.0.0] - 2025-11-26
### Changed

README.md
View File

@@ -1,73 +1,178 @@
# Project Lyra - README v0.5.0
Lyra is a modular persistent AI companion system with advanced reasoning capabilities.
It provides memory-backed chat using **NeoMem** + **Relay** + **Cortex**,
with a multi-stage reasoning pipeline powered by distributed LLM backends.
## Mission Statement
The point of Project Lyra is to give an AI chatbot more abilities than a typical chatbot. Typical chatbots are essentially amnesic and forget everything about your project. Lyra helps keep projects organized and remembers everything you have done. Think of her abilities as a notepad/schedule/database/co-creator/collaborator, all with its own executive function. Say something in passing, and Lyra remembers it, then reminds you of it later.
---
## Architecture Overview
Project Lyra operates as a series of Docker containers networked together in a microservices architecture. Just as the brain has regions, Lyra has modules:
### A. VM 100 - lyra-core (Core Services)
**1. Relay** (Node.js/Express) - Port 7078
- Main orchestrator and message router
- Coordinates all module interactions
- OpenAI-compatible endpoint: `POST /v1/chat/completions`
- Internal endpoint: `POST /chat`
- Routes messages through Cortex reasoning pipeline
- Manages async calls to Intake and NeoMem
**2. UI** (Static HTML)
- Browser-based chat interface with cyberpunk theme
- Connects to Relay at `http://10.0.0.40:7078`
- Saves and loads sessions
- OpenAI-compatible message format
**3. NeoMem** (Python/FastAPI) - Port 7077
- Long-term memory database (fork of Mem0 OSS)
- Vector storage (PostgreSQL + pgvector) + Graph storage (Neo4j)
- RESTful API: `/memories`, `/search`
- Semantic memory updates and retrieval
- No external SDK dependencies - fully local
### B. VM 101 - lyra-cortex (Reasoning Layer)
**4. Cortex** (Python/FastAPI) - Port 7081
- Primary reasoning engine with multi-stage pipeline
- **4-Stage Processing:**
1. **Reflection** - Generates meta-awareness notes about conversation
2. **Reasoning** - Creates initial draft answer using context
3. **Refinement** - Polishes and improves the draft
4. **Persona** - Applies Lyra's personality and speaking style
- Integrates with Intake for short-term context
- Flexible LLM router supporting multiple backends
**5. Intake v0.2** (Python/FastAPI) - Port 7080
- Simplified short-term memory summarization
- Session-based circular buffer (deque, maxlen=200)
- Single-level simple summarization (no cascading)
- Background async processing with FastAPI BackgroundTasks
- Pushes summaries to NeoMem automatically
- **API Endpoints:**
- `POST /add_exchange` - Add conversation exchange
- `GET /summaries?session_id={id}` - Retrieve session summary
- `POST /close_session/{id}` - Close and cleanup session
### C. LLM Backends (Remote/Local APIs)
**Multi-Backend Strategy:**
- **PRIMARY**: vLLM on AMD MI50 GPU (`http://10.0.0.43:8000`) - Cortex reasoning, Intake
- **SECONDARY**: Ollama on RTX 3090 (`http://10.0.0.3:11434`) - Configurable per-module
- **CLOUD**: OpenAI API (`https://api.openai.com/v1`) - Cortex persona layer
- **FALLBACK**: Local backup (`http://10.0.0.41:11435`) - Emergency fallback
---
## Data Flow Architecture (v0.5.0)
### Normal Message Flow:
```
User (UI) → POST /v1/chat/completions
Relay (7078)
↓ POST /reason
Cortex (7081)
↓ GET /summaries?session_id=xxx
Intake (7080) [RETURNS SUMMARY]
Cortex processes (4 stages):
1. reflection.py → meta-awareness notes
2. reasoning.py → draft answer (uses LLM)
3. refine.py → refined answer (uses LLM)
4. persona/speak.py → Lyra personality (uses LLM)
Returns persona answer to Relay
Relay → Cortex /ingest (async, stub)
Relay → Intake /add_exchange (async)
Intake → Background summarize → NeoMem
Relay → UI (returns final response)
```
### Cortex 4-Stage Reasoning Pipeline:
1. **Reflection** (`reflection.py`) - Cloud backend (OpenAI)
- Analyzes user intent and conversation context
- Generates meta-awareness notes
- "What is the user really asking?"
2. **Reasoning** (`reasoning.py`) - Primary backend (vLLM)
- Retrieves short-term context from Intake
- Creates initial draft answer
- Integrates context, reflection notes, and user prompt
3. **Refinement** (`refine.py`) - Primary backend (vLLM)
- Polishes the draft answer
- Improves clarity and coherence
- Ensures factual consistency
4. **Persona** (`speak.py`) - Cloud backend (OpenAI)
- Applies Lyra's personality and speaking style
- Natural, conversational output
- Final answer returned to user (see the pipeline sketch after this list)
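The stage order above can be read as a simple async composition. The sketch below is illustrative only: the callables stand in for the real functions in `reflection.py`, `reasoning.py`, `refine.py`, and `persona/speak.py`, whose actual signatures and prompts differ:
```python
from typing import Awaitable, Callable

Stage = Callable[[str], Awaitable[str]]

async def reason_pipeline(
    user_prompt: str,
    context: str,      # short-term summary fetched from Intake
    reflect: Stage,    # 1. meta-awareness notes (cloud backend)
    draft: Stage,      # 2. initial draft answer (primary/vLLM backend)
    refine: Stage,     # 3. polish and consistency pass (primary/vLLM backend)
    persona: Stage,    # 4. Lyra voice (cloud backend)
) -> str:
    notes = await reflect(f"Context:\n{context}\n\nUser: {user_prompt}")
    draft_answer = await draft(f"Context:\n{context}\n\nNotes: {notes}\n\nUser: {user_prompt}")
    refined = await refine(draft_answer)
    return await persona(refined)
```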
---
## Features
### Lyra-Core (VM 100)
**Relay**:
- Main orchestrator and message router
- OpenAI-compatible endpoint: `POST /v1/chat/completions`
- Internal endpoint: `POST /chat`
- Health check: `GET /_health`
- Async non-blocking calls to Cortex and Intake
- Shared request handler for code reuse
- Comprehensive error handling
**NeoMem (Memory Engine)**:
- Forked from Mem0 OSS - fully independent
- Drop-in compatible API (`/memories`, `/search`)
- Local-first: runs on FastAPI with Postgres + Neo4j
- No external SDK dependencies
- Semantic memory updates - compares embeddings and performs in-place updates
- Default service: `neomem-api` (port 7077)
**UI**:
- Lightweight static HTML chat interface
- Cyberpunk theme
- Session save/load functionality
- OpenAI message format support
### Cortex (VM 101)
**Cortex** (v0.5):
- Multi-stage reasoning pipeline (reflection → reasoning → refine → persona)
- Flexible LLM backend routing
- Per-stage backend selection
- Async processing throughout
- IntakeClient integration for short-term context
- `/reason`, `/ingest` (stub), `/health` endpoints
**Intake** (v0.2):
- Simplified single-level summarization
- Session-based circular buffer (200 exchanges max)
- Background async summarization
- Automatic NeoMem push
- No persistent log files (memory-only)
- **Breaking change from v0.1**: Removed cascading summaries (L1, L2, L5, L10, L20, L30)
**LLM Router**:
- Dynamic backend selection
- Environment-driven configuration (see the sketch after this list)
- Support for vLLM, Ollama, OpenAI, custom endpoints
- Per-module backend preferences
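A rough sketch of what environment-driven, per-stage backend selection can look like. The variable names below are hypothetical (the real configuration keys live in `cortex/llm` and may differ); the backend URLs match the multi-backend list above:
```python
import os

BACKENDS = {
    "primary":   os.getenv("LLM_PRIMARY_URL",   "http://10.0.0.43:8000"),     # vLLM on the MI50
    "secondary": os.getenv("LLM_SECONDARY_URL", "http://10.0.0.3:11434"),     # Ollama on the 3090
    "cloud":     os.getenv("LLM_CLOUD_URL",     "https://api.openai.com/v1"), # OpenAI
    "fallback":  os.getenv("LLM_FALLBACK_URL",  "http://10.0.0.41:11435"),    # emergency backup
}

def backend_for(stage: str) -> str:
    """Per-stage selection: reflection/persona default to cloud, reasoning/refine to primary."""
    override = os.getenv(f"{stage.upper()}_BACKEND")  # hypothetical, e.g. REASONING_BACKEND=secondary
    default = "cloud" if stage in ("reflection", "persona") else "primary"
    return BACKENDS.get(override or default, BACKENDS["fallback"])
```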
# Beta Lyrae (RAG Memory DB) - added 11-3-25
- **RAG Knowledge DB - Beta Lyrae (sheliak)**
@@ -159,7 +264,85 @@ with optional subconscious annotation powered by **Cortex VM** running local LLM
└── Future: sends summaries → Cortex for reflection
---
## Version History
### v0.5.0 (2025-11-28) - Current Release
- ✅ Fixed all critical API wiring issues
- ✅ Added OpenAI-compatible endpoint to Relay (`/v1/chat/completions`)
- ✅ Fixed Cortex → Intake integration
- ✅ Added missing Python package `__init__.py` files
- ✅ End-to-end message flow verified and working
### v0.4.x (Major Rewire)
- Cortex multi-stage reasoning pipeline
- Intake v0.2 simplification
- LLM router with multi-backend support
- Major architectural restructuring
### v0.3.x
- Beta Lyrae RAG system
- NeoMem integration
- Basic Cortex reasoning loop
---
## Known Issues (v0.5.0)
### Non-Critical
- Session management endpoints not fully implemented in Relay
- RAG service currently disabled in docker-compose.yml
- Cortex `/ingest` endpoint is a stub
### Future Enhancements
- Re-enable RAG service integration
- Implement full session persistence
- Add request correlation IDs for tracing
- Comprehensive health checks
---
## Quick Start
### Prerequisites
- Docker + Docker Compose
- PostgreSQL 13+, Neo4j 4.4+ (for NeoMem)
- At least one LLM API endpoint (vLLM, Ollama, or OpenAI)
### Setup
1. Configure environment variables in `.env` files
2. Start services: `docker-compose up -d`
3. Check health: `curl http://localhost:7078/_health`
4. Access UI: `http://localhost:7078`
### Test
```bash
curl -X POST http://localhost:7078/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"messages": [{"role": "user", "content": "Hello Lyra!"}],
"session_id": "test"
}'
```
---
## Documentation
- See [CHANGELOG.md](CHANGELOG.md) for detailed version history
- See `ENVIRONMENT_VARIABLES.md` for environment variable reference
- Additional information available in the Trilium docs
---
## License
NeoMem is a derivative work based on Mem0 OSS (Apache 2.0).
© 2025 Terra-Mechanics / ServersDown Labs. All modifications released under Apache 2.0.
**Built with Claude Code**
---
## 📦 Requirements

View File

@@ -13,7 +13,7 @@ const PORT = Number(process.env.PORT || 7078);
// core endpoints
const CORTEX_REASON = process.env.CORTEX_REASON_URL || "http://cortex:7081/reason";
const CORTEX_INGEST = process.env.CORTEX_INGEST_URL || "http://cortex:7081/ingest";
const INTAKE_URL = process.env.INTAKE_URL || "http://intake:7080/add_exchange";
// -----------------------------------------------------
// Helper request wrapper
@@ -41,6 +41,45 @@ async function postJSON(url, data) {
  return json;
}
// -----------------------------------------------------
// Shared chat handler logic
// -----------------------------------------------------
async function handleChatRequest(session_id, user_msg) {
  // 1. → Cortex.reason
  let reason;
  try {
    reason = await postJSON(CORTEX_REASON, {
      session_id,
      user_prompt: user_msg
    });
  } catch (e) {
    console.error("Relay → Cortex.reason error:", e.message);
    throw new Error(`cortex_reason_failed: ${e.message}`);
  }
  const persona = reason.final_output || reason.persona || "(no persona text)";
  // 2. → Cortex.ingest (async, non-blocking)
  postJSON(CORTEX_INGEST, {
    session_id,
    user_msg,
    assistant_msg: persona
  }).catch(e => console.warn("Relay → Cortex.ingest failed:", e.message));
  // 3. → Intake summary (async, non-blocking)
  postJSON(INTAKE_URL, {
    session_id,
    user_msg,
    assistant_msg: persona
  }).catch(e => console.warn("Relay → Intake failed:", e.message));
  // 4. Return result
  return {
    session_id,
    reply: persona
  };
}
// -----------------------------------------------------
// HEALTHCHECK
// -----------------------------------------------------
@@ -48,6 +87,59 @@ app.get("/_health", (_, res) => {
  res.json({ ok: true });
});
// -----------------------------------------------------
// OPENAI-COMPATIBLE ENDPOINT (for UI)
// -----------------------------------------------------
app.post("/v1/chat/completions", async (req, res) => {
try {
// Extract from OpenAI format
const session_id = req.body.session_id || req.body.user || "default";
const messages = req.body.messages || [];
const lastMessage = messages[messages.length - 1];
const user_msg = lastMessage?.content || "";
if (!user_msg) {
return res.status(400).json({ error: "No message content provided" });
}
console.log(`Relay (v1) → received: "${user_msg}"`);
// Call the same logic as /chat
const result = await handleChatRequest(session_id, user_msg);
// Return in OpenAI format
return res.json({
id: `chatcmpl-${Date.now()}`,
object: "chat.completion",
created: Math.floor(Date.now() / 1000),
model: "lyra",
choices: [{
index: 0,
message: {
role: "assistant",
content: result.reply
},
finish_reason: "stop"
}],
usage: {
prompt_tokens: 0,
completion_tokens: 0,
total_tokens: 0
}
});
} catch (err) {
console.error("Relay v1 endpoint fatal:", err);
res.status(500).json({
error: {
message: err.message || String(err),
type: "server_error",
code: "relay_failed"
}
});
}
});
// -----------------------------------------------------
// MAIN ENDPOINT (new canonical)
// -----------------------------------------------------
@@ -58,39 +150,8 @@ app.post("/chat", async (req, res) => {
    console.log(`Relay → received: "${user_msg}"`);
    const result = await handleChatRequest(session_id, user_msg);
    return res.json(result);
  } catch (err) {
    console.error("Relay fatal:", err);

View File

@@ -0,0 +1 @@
# Ingest module - handles communication with Intake service

View File

@@ -8,9 +8,14 @@ class IntakeClient:
"""Handles short-term / episodic summaries from Intake service.""" """Handles short-term / episodic summaries from Intake service."""
def __init__(self): def __init__(self):
self.base_url = os.getenv("INTAKE_API", "http://intake:7083") self.base_url = os.getenv("INTAKE_API_URL", "http://intake:7080")
async def summarize_turn(self, session_id: str, user_msg: str, assistant_msg: Optional[str] = None) -> Dict[str, Any]: async def summarize_turn(self, session_id: str, user_msg: str, assistant_msg: Optional[str] = None) -> Dict[str, Any]:
"""
DEPRECATED: Intake v0.2 removed the /summarize endpoint.
Use add_exchange() instead, which auto-summarizes in the background.
This method is kept for backwards compatibility but will fail.
"""
payload = { payload = {
"session_id": session_id, "session_id": session_id,
"turns": [{"role": "user", "content": user_msg}] "turns": [{"role": "user", "content": user_msg}]
@@ -24,15 +29,17 @@ class IntakeClient:
                r.raise_for_status()
                return r.json()
            except Exception as e:
                logger.warning(f"Intake summarize_turn failed (endpoint removed in v0.2): {e}")
                return {}

    async def get_context(self, session_id: str) -> str:
        """Get summarized context for a session from Intake."""
        async with httpx.AsyncClient(timeout=15) as client:
            try:
                r = await client.get(f"{self.base_url}/summaries", params={"session_id": session_id})
                r.raise_for_status()
                data = r.json()
                return data.get("summary_text", "")
            except Exception as e:
                logger.warning(f"Intake get_context failed: {e}")
                return ""

cortex/llm/__init__.py Normal file
View File

@@ -0,0 +1 @@
# LLM module - provides LLM routing and backend abstraction

View File

@@ -0,0 +1 @@
# Persona module - applies Lyra's personality and speaking style

View File

@@ -0,0 +1 @@
# Reasoning module - multi-stage reasoning pipeline

View File

@@ -1,6 +1,5 @@
# router.py
from fastapi import APIRouter, HTTPException
from pydantic import BaseModel

cortex/utils/__init__.py Normal file
View File

@@ -0,0 +1 @@
# Utilities module

View File

@@ -1,101 +1,39 @@
from fastapi import FastAPI, Body, Query, BackgroundTasks
from collections import deque
from datetime import datetime
from uuid import uuid4
import requests
import os
import sys

# ─────────────────────────────
# Config
# ─────────────────────────────
SUMMARY_MODEL = os.getenv("SUMMARY_MODEL_NAME", "mistral-7b-instruct-v0.2.Q4_K_M.gguf")
SUMMARY_URL = os.getenv("SUMMARY_API_URL", "http://localhost:8080/v1/completions")
SUMMARY_MAX_TOKENS = int(os.getenv("SUMMARY_MAX_TOKENS", "200"))
SUMMARY_TEMPERATURE = float(os.getenv("SUMMARY_TEMPERATURE", "0.3"))

NEOMEM_API = os.getenv("NEOMEM_API")
NEOMEM_KEY = os.getenv("NEOMEM_KEY")

# ─────────────────────────────
# App + session buffer
# ─────────────────────────────
app = FastAPI()
SESSIONS = {}

@app.on_event("startup")
def banner():
    print("🧩 Intake v0.2 booting...")
    print(f" Model: {SUMMARY_MODEL}")
    print(f" API: {SUMMARY_URL}")
    sys.stdout.flush()

# ─────────────────────────────
# Helper: summarize exchanges
# ─────────────────────────────
def llm(prompt: str):
    try:
        resp = requests.post(
            SUMMARY_URL,
@@ -105,326 +43,118 @@ def summarize(exchanges, level):
"max_tokens": SUMMARY_MAX_TOKENS, "max_tokens": SUMMARY_MAX_TOKENS,
"temperature": SUMMARY_TEMPERATURE, "temperature": SUMMARY_TEMPERATURE,
}, },
timeout=180, timeout=30,
) )
resp.raise_for_status() resp.raise_for_status()
data = resp.json() return resp.json().get("choices", [{}])[0].get("text", "").strip()
return data.get("choices", [{}])[0].get("text", "").strip()
except Exception as e: except Exception as e:
return f"[Error summarizing: {e}]" return f"[Error summarizing: {e}]"
# ───── L10: local “Reality Check” block ───── def summarize_simple(exchanges):
if level == 10: """Simple factual summary of recent exchanges."""
text = ""
for e in exchanges:
text += f"User: {e['user_msg']}\nAssistant: {e['assistant_msg']}\n\n"
prompt = f""" prompt = f"""
You are Lyra Intake performing a 'Reality Check' for the last {len(exchanges)} exchanges. Summarize the following conversation between Brian (user) and Lyra (assistant).
Summarize this block as one coherent paragraph describing the users focus, progress, and tone. Focus only on factual content. Avoid names, examples, story tone, or invented details.
Avoid bullet points.
Exchanges:
{text}
Reality Check Summary:
"""
summary = query_llm(prompt)
SUMMARIES_CACHE["L10"].append(summary)
# ───── L20: merge L10s ─────
elif level == 20:
# 1⃣ create fresh L10 for 1120
l10_prompt = f"""
You are Lyra Intake generating a second Reality Check for the most recent {len(exchanges)} exchanges.
Summarize them as one paragraph describing what's new or changed since the last block.
Avoid bullet points.
Exchanges:
{text}
Reality Check Summary:
"""
new_l10 = query_llm(l10_prompt)
SUMMARIES_CACHE["L10"].append(new_l10)
# 2⃣ merge all L10s into a Session Overview
joined_l10s = "\n\n".join(SUMMARIES_CACHE["L10"])
l20_prompt = f"""
You are Lyra Intake merging multiple 'Reality Checks' into a single Session Overview.
Summarize the following Reality Checks into one short paragraph capturing the ongoing goals,
patterns, and overall progress.
Reality Checks:
{joined_l10s}
Session Overview:
"""
l20_summary = query_llm(l20_prompt)
SUMMARIES_CACHE["L20"].append(l20_summary)
summary = new_l10 + "\n\n" + l20_summary
# ───── L30: continuity synthesis ─────
elif level == 30:
# 1⃣ create new L10 for 2130
new_l10 = query_llm(f"""
You are Lyra Intake creating a new Reality Check for exchanges 2130.
Summarize this block in one cohesive paragraph, describing any shifts in focus or tone.
Exchanges:
{text}
Reality Check Summary:
""")
SUMMARIES_CACHE["L10"].append(new_l10)
# 2⃣ merge all lower levels for continuity
joined = "\n\n".join(SUMMARIES_CACHE["L10"] + SUMMARIES_CACHE["L20"])
continuity_prompt = f"""
You are Lyra Intake performing a 'Continuity Report' — a high-level reflection combining all Reality Checks
and Session Overviews so far. Describe how the conversation has evolved, the key insights, and remaining threads.
Reality Checks and Overviews:
{joined}
Continuity Report:
"""
l30_summary = query_llm(continuity_prompt)
SUMMARIES_CACHE["L30"].append(l30_summary)
summary = new_l10 + "\n\n" + l30_summary
# ───── L1L5 (standard factual summaries) ─────
else:
prompt = f"""
You are Lyra Intake, a background summarization module for an AI assistant.
Your job is to compress recent chat exchanges between a user and an assistant
into a short, factual summary. The user's name is Brian, and the assistant's name is Lyra.
Focus only on the real conversation content.
Do NOT invent names, people, or examples. Avoid speculation or storytelling.
Summarize clearly what topics were discussed and what conclusions were reached.
Avoid speculation, names, or bullet points.
Exchanges:
{text} {text}
Summary: Summary:
""" """
summary = query_llm(prompt) return llm(prompt)
return f"[L{level} Summary of {len(exchanges)} exchanges]: {summary}" # ─────────────────────────────
# NeoMem push
# ─────────────────────────────
def push_to_neomem(summary: str, session_id: str):
if not NEOMEM_API:
return
from datetime import datetime headers = {"Content-Type": "application/json"}
if NEOMEM_KEY:
headers["Authorization"] = f"Bearer {NEOMEM_KEY}"
LOG_DIR = "/app/logs" payload = {
os.makedirs(LOG_DIR, exist_ok=True) "messages": [{"role": "assistant", "content": summary}],
"user_id": "brian",
"metadata": {
"source": "intake",
"session_id": session_id
}
}
def log_to_file(level: str, summary: str): try:
"""Append each summary to a persistent .txt log file.""" requests.post(
timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S") f"{NEOMEM_API}/memories",
filename = os.path.join(LOG_DIR, "summaries.log") json=payload,
with open(filename, "a", encoding="utf-8") as f: headers=headers,
f.write(f"[{timestamp}] {level}\n{summary}\n{'='*60}\n\n") timeout=20
).raise_for_status()
print(f"🧠 NeoMem updated for {session_id}")
except Exception as e:
print(f"NeoMem push failed: {e}")
# ─────────────────────────────────────────────── # ─────────────────────────────
# 🔁 Background summarization helper # Background summarizer
# ─────────────────────────────────────────────── # ─────────────────────────────
def run_summarization_task(exchange, session_id): def bg_summarize(session_id: str):
"""Async-friendly wrapper for slow summarization work."""
try: try:
hopper = SESSIONS.get(session_id) hopper = SESSIONS.get(session_id)
if not hopper: if not hopper:
print(f"⚠️ No hopper found for {session_id}")
return return
buffer = hopper["buffer"] buf = list(hopper["buffer"])
count = len(buffer) summary = summarize_simple(buf)
summaries = {} push_to_neomem(summary, session_id)
if count < 30:
for lvl in LEVELS:
if lvl <= count:
s_text = summarize(list(buffer)[-lvl:], lvl)
log_to_file(f"L{lvl}", s_text)
push_summary_to_neomem(s_text, f"L{lvl}", session_id)
summaries[f"L{lvl}"] = s_text
else:
# optional: include your existing 30+ logic here
pass
if summaries:
print(f"🧩 [BG] Summaries generated asynchronously at count={count}: {list(summaries.keys())}")
print(f"🧩 Summary generated for {session_id}")
except Exception as e: except Exception as e:
print(f"💥 [BG] Async summarization failed: {e}") print(f"Summarizer error: {e}")
# ─────────────────────────────
# Routes
# ─────────────────────────────
# ───────────────────────────────────────────────
# 📨 Routes
# ───────────────────────────────────────────────
@app.post("/add_exchange") @app.post("/add_exchange")
def add_exchange(exchange: dict = Body(...), background_tasks: BackgroundTasks = None): def add_exchange(exchange: dict = Body(...), background_tasks: BackgroundTasks = None):
session_id = exchange.get("session_id") or f"sess-{uuid4().hex[:8]}" session_id = exchange.get("session_id") or f"sess-{uuid4().hex[:8]}"
exchange["session_id"] = session_id exchange["session_id"] = session_id
exchange["timestamp"] = datetime.now().isoformat()
if session_id not in SESSIONS: if session_id not in SESSIONS:
SESSIONS[session_id] = {"buffer": deque(maxlen=100), "last_update": datetime.now()} SESSIONS[session_id] = {
"buffer": deque(maxlen=200),
"created_at": datetime.now()
}
print(f"🆕 Hopper created: {session_id}") print(f"🆕 Hopper created: {session_id}")
hopper = SESSIONS[session_id] SESSIONS[session_id]["buffer"].append(exchange)
hopper["buffer"].append(exchange)
hopper["last_update"] = datetime.now()
count = len(hopper["buffer"])
# 🚀 queue background summarization
if background_tasks: if background_tasks:
background_tasks.add_task(run_summarization_task, exchange, session_id) background_tasks.add_task(bg_summarize, session_id)
print(f"Queued async summarization for {session_id}") print(f"Summarization queued for {session_id}")
return {"ok": True, "exchange_count": count, "queued": True} return {"ok": True, "session_id": session_id}
# # ── Normal tiered behavior up to 30 ── commented out for aysnc addon
# if count < 30:
# if count in LEVELS:
# for lvl in LEVELS:
# if lvl <= count:
# summaries[f"L{lvl}"] = summarize(list(buffer)[-lvl:], lvl)
# log_to_file(f"L{lvl}", summaries[f"L{lvl}"])
# push_summary_to_neomem(summaries[f"L{lvl}"], f"L{lvl}", session_id)
# # 🚀 Launch summarization in the background (non-blocking)
# if background_tasks:
# background_tasks.add_task(run_summarization_task, exchange, session_id)
# print(f"⏩ Queued async summarization for {session_id}")
# # ── Beyond 30: keep summarizing every +15 exchanges ──
# else:
# # Find next milestone after 30 (45, 60, 75, ...)
# milestone = 30 + ((count - 30) // 15) * 15
# if count == milestone:
# summaries[f"L{milestone}"] = summarize(list(buffer)[-15:], milestone)
# log_to_file(f"L{milestone}", summaries[f"L{milestone}"])
# push_summary_to_neomem(summaries[f"L{milestone}"], f"L{milestone}", session_id)
# # Optional: merge all continuity summaries so far into a running meta-summary
# joined = "\n\n".join(
# [s for key, s in summaries.items() if key.startswith("L")]
# )
# meta_prompt = f"""
# You are Lyra Intake composing an 'Ongoing Continuity Report' that merges
# all prior continuity summaries into one living narrative.
# Focus on major themes, changes, and lessons so far.
# Continuity Summaries:
# {joined}
# Ongoing Continuity Report:
# """
# meta_summary = f"[L∞ Ongoing Continuity Report]: {query_llm(meta_prompt)}"
# summaries["L∞"] = meta_summary
# log_to_file("L∞", meta_summary)
# push_summary_to_neomem(meta_summary, "L∞", session_id)
# print(f"🌀 L{milestone} continuity summary created (messages {count-14}-{count})")
# # ── Log summaries ──
# if summaries:
# print(f"🧩 Summaries generated at count={count}: {list(summaries.keys())}")
# return {
# "ok": True,
# "exchange_count": len(buffer),
# "queued": True
# }
# ───────────────────────────────────────────────
# Clear rubbish from hopper.
# ───────────────────────────────────────────────
def close_session(session_id: str):
"""Run a final summary for the given hopper, post it to NeoMem, then delete it."""
hopper = SESSIONS.get(session_id)
if not hopper:
print(f"⚠️ No active hopper for {session_id}")
return
buffer = hopper["buffer"]
if not buffer:
print(f"⚠️ Hopper {session_id} is empty, skipping closure")
del SESSIONS[session_id]
return
try:
print(f"🔒 Closing hopper {session_id} ({len(buffer)} exchanges)")
# Summarize everything left in the buffer
final_summary = summarize(list(buffer), 30) # level 30 = continuity synthesis
log_to_file("LFinal", final_summary)
push_summary_to_neomem(final_summary, "LFinal", session_id)
# Optionally: mark this as a special 'closure' memory
closure_note = f"[Session {session_id} closed with {len(buffer)} exchanges]"
push_summary_to_neomem(closure_note, "LFinalNote", session_id)
print(f"🧹 Hopper {session_id} closed and deleted")
except Exception as e:
print(f"💥 Error closing hopper {session_id}: {e}")
finally:
del SESSIONS[session_id]
@app.post("/close_session/{session_id}") @app.post("/close_session/{session_id}")
def close_session_endpoint(session_id: str): def close_session(session_id: str):
close_session(session_id) if session_id in SESSIONS:
del SESSIONS[session_id]
return {"ok": True, "closed": session_id} return {"ok": True, "closed": session_id}
# ───────────────────────────────────────────────
# 🧾 Provide recent summary for Cortex /reason calls
# ───────────────────────────────────────────────
@app.get("/summaries") @app.get("/summaries")
def get_summary(session_id: str = Query(..., description="Active session ID")): def get_summary(session_id: str = Query(...)):
""" hopper = SESSIONS.get(session_id)
Return the most recent summary (L10→L30→LFinal) for a given session. if not hopper:
If none exist yet, return a placeholder summary. return {"summary_text": "(none)", "session_id": session_id}
"""
try:
# Find the most recent file entry in summaries.log
log_path = os.path.join(LOG_DIR, "summaries.log")
if not os.path.exists(log_path):
return {
"summary_text": "(none)",
"last_message_ts": datetime.now().isoformat(),
"session_id": session_id,
"exchange_count": 0,
}
with open(log_path, "r", encoding="utf-8") as f: summary = summarize_simple(list(hopper["buffer"]))
lines = f.readlines() return {"summary_text": summary, "session_id": session_id}
# Grab the last summary section that mentions this session_id
recent_lines = [ln for ln in lines if session_id in ln or ln.startswith("[L")]
if recent_lines:
# Find the last non-empty summary text
snippet = "".join(recent_lines[-8:]).strip()
else:
snippet = "(no summaries yet)"
return {
"summary_text": snippet[-1000:], # truncate to avoid huge block
"last_message_ts": datetime.now().isoformat(),
"session_id": session_id,
"exchange_count": len(SESSIONS.get(session_id, {}).get("buffer", [])),
}
except Exception as e:
print(f"⚠️ /summaries failed for {session_id}: {e}")
return {
"summary_text": f"(error fetching summaries: {e})",
"last_message_ts": datetime.now().isoformat(),
"session_id": session_id,
"exchange_count": 0,
}
# ───────────────────────────────────────────────
# ✅ Health check
# ───────────────────────────────────────────────
@app.get("/health") @app.get("/health")
def health(): def health():
return {"ok": True, "model": SUMMARY_MODEL, "url": SUMMARY_URL} return {"ok": True, "model": SUMMARY_MODEL, "url": SUMMARY_URL}