docs updated

CHANGELOG.md (97)

@@ -2,11 +2,106 @@
All notable changes to Project Lyra are organized by component.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/)
and adheres to [Semantic Versioning](https://semver.org/).

# Last Updated: 11-28-25

---

## 🧠 Lyra-Core ##############################################################################

## [Project Lyra v0.5.0] - 2025-11-28

### 🔧 Fixed - Critical API Wiring & Integration

After the major architectural rewire (v0.4.x), this release fixes all critical endpoint mismatches and ensures end-to-end system connectivity.

#### Cortex → Intake Integration ✅

- **Fixed** `IntakeClient` to use correct Intake v0.2 API endpoints (see the sketch below)
  - Changed `GET /context/{session_id}` → `GET /summaries?session_id={session_id}`
  - Updated JSON response parsing to extract `summary_text` field
  - Fixed environment variable name: `INTAKE_API` → `INTAKE_API_URL`
  - Corrected default port: `7083` → `7080`
  - Added deprecation warning to `summarize_turn()` method (endpoint removed in Intake v0.2)
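In practice the corrected call looks like the snippet below: a minimal sketch using `httpx` that mirrors the `IntakeClient.get_context()` change further down in this commit (the standalone function name is illustrative, not repository code).

```python
import httpx

INTAKE_API_URL = "http://intake:7080"  # was INTAKE_API with port 7083 before this release

async def fetch_session_context(session_id: str) -> str:
    """Illustrative helper mirroring IntakeClient.get_context() after the fix."""
    async with httpx.AsyncClient(timeout=15) as client:
        # Intake v0.2: GET /summaries?session_id=...  (replaces GET /context/{session_id})
        r = await client.get(f"{INTAKE_API_URL}/summaries", params={"session_id": session_id})
        r.raise_for_status()
        # The summary lives in the "summary_text" field of the JSON response.
        return r.json().get("summary_text", "")
```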

#### Relay → UI Compatibility ✅

- **Added** OpenAI-compatible endpoint `POST /v1/chat/completions` (example request below)
  - Accepts standard OpenAI format with `messages[]` array
  - Returns OpenAI-compatible response structure with `choices[]`
  - Extracts last message content from messages array
  - Includes usage metadata (stub values for compatibility)
- **Refactored** Relay to use shared `handleChatRequest()` function
  - Both `/chat` and `/v1/chat/completions` use same core logic
  - Eliminates code duplication
  - Consistent error handling across endpoints
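From a client's point of view the new endpoint behaves like a standard chat-completions API. A minimal sketch with plain `requests` (the host/port is the default Relay address used elsewhere in these docs; adjust for your deployment):

```python
import requests

RELAY_URL = "http://localhost:7078/v1/chat/completions"  # default Relay port in this repo

resp = requests.post(RELAY_URL, json={
    "session_id": "demo",                                 # optional; Relay falls back to "default"
    "messages": [{"role": "user", "content": "Hello Lyra!"}],
}, timeout=120)
resp.raise_for_status()
data = resp.json()

# Relay returns an OpenAI-style envelope; the answer is in choices[0].message.content
print(data["choices"][0]["message"]["content"])
```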

#### Relay → Intake Connection ✅

- **Fixed** Intake URL fallback in Relay server configuration
  - Corrected port: `7082` → `7080`
  - Updated endpoint: `/summary` → `/add_exchange`
  - Now properly sends exchanges to Intake for summarization

#### Code Quality & Python Package Structure ✅

- **Added** missing `__init__.py` files to all Cortex subdirectories
  - `cortex/llm/__init__.py`
  - `cortex/reasoning/__init__.py`
  - `cortex/persona/__init__.py`
  - `cortex/ingest/__init__.py`
  - `cortex/utils/__init__.py`
  - Improves package imports and IDE support
- **Removed** unused import in `cortex/router.py`: `from unittest import result`
- **Deleted** empty file `cortex/llm/resolve_llm_url.py` (was 0 bytes, never implemented)

### ✅ Verified Working

Complete end-to-end message flow now operational:

```
UI → Relay (/v1/chat/completions)
  ↓
Relay → Cortex (/reason)
  ↓
Cortex → Intake (/summaries)   [retrieves context]
  ↓
Cortex 4-stage pipeline:
  1. reflection.py    → meta-awareness notes
  2. reasoning.py     → draft answer
  3. refine.py        → polished answer
  4. persona/speak.py → Lyra personality
  ↓
Cortex → Relay (returns persona response)
  ↓
Relay → Intake (/add_exchange)   [async summary]
  ↓
Intake → NeoMem (background memory storage)
  ↓
Relay → UI (final response)
```

### 📝 Documentation

- **Added** this CHANGELOG entry with comprehensive v0.5.0 notes
- **Updated** README.md to reflect v0.5.0 architecture
  - Documented new endpoints
  - Updated data flow diagrams
  - Clarified Intake v0.2 changes
  - Corrected service descriptions

### 🐛 Issues Resolved

- ❌ Cortex could not retrieve context from Intake (wrong endpoint)
- ❌ UI could not send messages to Relay (endpoint mismatch)
- ❌ Relay could not send summaries to Intake (wrong port/endpoint)
- ❌ Python package imports were implicit (missing `__init__.py`)

### ⚠️ Known Issues (Non-Critical)

- Session management endpoints not implemented in Relay (`GET/POST /sessions/:id`)
- RAG service currently disabled in docker-compose.yml
- Cortex `/ingest` endpoint is a stub returning `{"status": "ok"}`

### 🎯 Migration Notes

If upgrading from v0.4.x:

1. Pull latest changes from git
2. Verify environment variables in `.env` files:
   - Check `INTAKE_API_URL=http://intake:7080` (not `INTAKE_API`)
   - Verify all service URLs use correct ports
3. Restart Docker containers: `docker-compose down && docker-compose up -d`
4. Test with a simple message through the UI (see the smoke-test sketch below)
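A quick way to confirm the wiring after restarting is to hit each service's health endpoint and push one message through Relay. A hedged sketch (Python, run from the Docker host; hostnames and ports follow the defaults documented in this repo, so adjust if yours differ):

```python
import requests

CHECKS = {
    "Relay": "http://localhost:7078/_health",   # Relay health endpoint
    "Cortex": "http://localhost:7081/health",   # Cortex health endpoint
    "Intake": "http://localhost:7080/health",   # Intake v0.2 health endpoint
}

for name, url in CHECKS.items():
    try:
        ok = requests.get(url, timeout=5).ok
    except requests.RequestException:
        ok = False
    print(f"{name}: {'up' if ok else 'DOWN'} ({url})")

# One end-to-end message through the OpenAI-compatible endpoint
r = requests.post("http://localhost:7078/v1/chat/completions", json={
    "session_id": "migration-test",
    "messages": [{"role": "user", "content": "Hello Lyra!"}],
}, timeout=120)
print(r.json()["choices"][0]["message"]["content"])
```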

---

## [Infrastructure v1.0.0] - 2025-11-26

### Changed

README.md (297)

@@ -1,73 +1,178 @@
# Project Lyra - README v0.5.0

Lyra is a modular persistent AI companion system with advanced reasoning capabilities.
It provides memory-backed chat using **NeoMem** + **Relay** + **Cortex**,
with a multi-stage reasoning pipeline powered by distributed LLM backends.

## Mission Statement

The point of Project Lyra is to give an AI chatbot more abilities than a typical chatbot. Typical chatbots are essentially amnesic and forget everything about your project. Lyra helps keep projects organized and remembers everything you have done. Think of her abilities as a notepad/schedule/database/co-creator/collaborator all with its own executive function. Say something in passing, Lyra remembers it then reminds you of it later.

---

## Architecture Overview

Project Lyra operates as a series of Docker containers networked together in a microservices architecture. Just as the brain has regions, Lyra has modules:

### A. VM 100 - lyra-core (Core Services)

**1. Relay** (Node.js/Express) - Port 7078
- Main orchestrator and message router
- Coordinates all module interactions
- OpenAI-compatible endpoint: `POST /v1/chat/completions`
- Internal endpoint: `POST /chat`
- Routes messages through Cortex reasoning pipeline
- Manages async calls to Intake and NeoMem

**2. UI** (Static HTML)
- Browser-based chat interface with cyberpunk theme
- Connects to Relay at `http://10.0.0.40:7078`
- Saves and loads sessions
- OpenAI-compatible message format

**3. NeoMem** (Python/FastAPI) - Port 7077
- Long-term memory database (fork of Mem0 OSS)
- Vector storage (PostgreSQL + pgvector) + Graph storage (Neo4j)
- RESTful API: `/memories`, `/search` (example calls below)
- Semantic memory updates and retrieval
- No external SDK dependencies - fully local
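NeoMem is normally called by Intake and Relay rather than directly, but it can be exercised by hand for debugging. A rough sketch follows: the `/memories` payload shape matches what Intake pushes elsewhere in this commit, while the `/search` request body is an assumption and may differ from the actual NeoMem schema.

```python
import requests

NEOMEM_API = "http://localhost:7077"  # default neomem-api port

# Store a memory (same payload shape Intake uses when pushing summaries)
requests.post(f"{NEOMEM_API}/memories", json={
    "messages": [{"role": "assistant", "content": "Brian is wiring Relay to Intake v0.2."}],
    "user_id": "brian",
    "metadata": {"source": "manual-test"},
}, timeout=20).raise_for_status()

# Query memories back (request body here is assumed, not taken from NeoMem docs)
hits = requests.post(f"{NEOMEM_API}/search", json={
    "query": "What is Brian working on?",
    "user_id": "brian",
}, timeout=20).json()
print(hits)
```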

### B. VM 101 - lyra-cortex (Reasoning Layer)

**4. Cortex** (Python/FastAPI) - Port 7081
- Primary reasoning engine with multi-stage pipeline
- **4-Stage Processing:**
  1. **Reflection** - Generates meta-awareness notes about conversation
  2. **Reasoning** - Creates initial draft answer using context
  3. **Refinement** - Polishes and improves the draft
  4. **Persona** - Applies Lyra's personality and speaking style
- Integrates with Intake for short-term context
- Flexible LLM router supporting multiple backends

**5. Intake v0.2** (Python/FastAPI) - Port 7080
- Simplified short-term memory summarization
- Session-based circular buffer (deque, maxlen=200)
- Single-level simple summarization (no cascading)
- Background async processing with FastAPI BackgroundTasks
- Pushes summaries to NeoMem automatically
- **API Endpoints:** (round-trip example below)
  - `POST /add_exchange` - Add conversation exchange
  - `GET /summaries?session_id={id}` - Retrieve session summary
  - `POST /close_session/{id}` - Close and cleanup session
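A minimal round trip against those endpoints looks roughly like this (Python `requests`; the `user_msg`/`assistant_msg` field names follow the exchange shape used in `intake.py` later in this commit):

```python
import requests

INTAKE = "http://localhost:7080"  # Intake v0.2 default port

# 1. Record one user/assistant exchange (queues a background summary + NeoMem push)
r = requests.post(f"{INTAKE}/add_exchange", json={
    "session_id": "demo",
    "user_msg": "Let's plan the v0.5.0 release notes.",
    "assistant_msg": "Sure - I'll track the endpoint fixes and doc updates.",
}, timeout=30)
session_id = r.json()["session_id"]

# 2. Ask for the current rolling summary of that session
summary = requests.get(f"{INTAKE}/summaries",
                       params={"session_id": session_id}, timeout=60).json()
print(summary["summary_text"])

# 3. Close the session when the conversation ends
requests.post(f"{INTAKE}/close_session/{session_id}", timeout=10)
```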

### C. LLM Backends (Remote/Local APIs)

**Multi-Backend Strategy:**
- **PRIMARY**: vLLM on AMD MI50 GPU (`http://10.0.0.43:8000`) - Cortex reasoning, Intake
- **SECONDARY**: Ollama on RTX 3090 (`http://10.0.0.3:11434`) - Configurable per-module
- **CLOUD**: OpenAI API (`https://api.openai.com/v1`) - Cortex persona layer
- **FALLBACK**: Local backup (`http://10.0.0.41:11435`) - Emergency fallback

---

## Data Flow Architecture (v0.5.0)

### Normal Message Flow:

```
User (UI) → POST /v1/chat/completions
  ↓
Relay (7078)
  ↓ POST /reason
Cortex (7081)
  ↓ GET /summaries?session_id=xxx
Intake (7080)   [RETURNS SUMMARY]
  ↓
Cortex processes (4 stages):
  1. reflection.py    → meta-awareness notes
  2. reasoning.py     → draft answer (uses LLM)
  3. refine.py        → refined answer (uses LLM)
  4. persona/speak.py → Lyra personality (uses LLM)
  ↓
Returns persona answer to Relay
  ↓
Relay → Cortex /ingest (async, stub)
Relay → Intake /add_exchange (async)
  ↓
Intake → Background summarize → NeoMem
  ↓
Relay → UI (returns final response)
```

### Cortex 4-Stage Reasoning Pipeline:

1. **Reflection** (`reflection.py`) - Cloud backend (OpenAI)
   - Analyzes user intent and conversation context
   - Generates meta-awareness notes
   - "What is the user really asking?"

2. **Reasoning** (`reasoning.py`) - Primary backend (vLLM)
   - Retrieves short-term context from Intake
   - Creates initial draft answer
   - Integrates context, reflection notes, and user prompt

3. **Refinement** (`refine.py`) - Primary backend (vLLM)
   - Polishes the draft answer
   - Improves clarity and coherence
   - Ensures factual consistency

4. **Persona** (`speak.py`) - Cloud backend (OpenAI)
   - Applies Lyra's personality and speaking style
   - Natural, conversational output
   - Final answer returned to user (see the orchestration sketch below)
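Conceptually the pipeline is four awaited LLM calls feeding into each other. A simplified orchestration sketch is shown below; it is illustrative only, since the real stage functions live in `reflection.py`, `reasoning.py`, `refine.py`, and `persona/speak.py` and their actual signatures may differ.

```python
# Hypothetical outline of how Cortex's /reason handler chains the stages.
async def run_pipeline(session_id: str, user_prompt: str,
                       get_context, reflect, reason, refine, speak) -> str:
    # Short-term context comes from Intake (GET /summaries?session_id=...)
    context = await get_context(session_id)

    # 1. Reflection: meta-awareness notes about what the user is really asking (cloud backend)
    notes = await reflect(user_prompt, context)

    # 2. Reasoning: first draft combining context + notes + prompt (primary vLLM backend)
    draft = await reason(user_prompt, context, notes)

    # 3. Refinement: polish the draft for clarity and factual consistency (primary vLLM backend)
    refined = await refine(draft)

    # 4. Persona: apply Lyra's voice and return the final answer (cloud backend)
    return await speak(refined)
```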

---

## Features

### Lyra-Core (VM 100)

**Relay**:
- Main orchestrator and message router
- OpenAI-compatible endpoint: `POST /v1/chat/completions`
- Internal endpoint: `POST /chat`
- Health check: `GET /_health`
- Async non-blocking calls to Cortex and Intake
- Shared request handler for code reuse
- Comprehensive error handling

**NeoMem (Memory Engine)**:
- Forked from Mem0 OSS - fully independent
- Drop-in compatible API (`/memories`, `/search`)
- Local-first: runs on FastAPI with Postgres + Neo4j
- No external SDK dependencies
- Semantic memory updates - compares embeddings and performs in-place updates
- Default service: `neomem-api` (port 7077)

**UI**:
- Lightweight static HTML chat interface
- Cyberpunk theme
- Session save/load functionality
- OpenAI message format support

### Cortex (VM 101)

**Cortex** (v0.5):
- Multi-stage reasoning pipeline (reflection → reasoning → refine → persona)
- Flexible LLM backend routing
- Per-stage backend selection
- Async processing throughout
- IntakeClient integration for short-term context
- `/reason`, `/ingest` (stub), `/health` endpoints

**Intake** (v0.2):
- Simplified single-level summarization
- Session-based circular buffer (200 exchanges max)
- Background async summarization
- Automatic NeoMem push
- No persistent log files (memory-only)
- **Breaking change from v0.1**: Removed cascading summaries (L1, L2, L5, L10, L20, L30)

**LLM Router**:
- Dynamic backend selection
- Environment-driven configuration
- Support for vLLM, Ollama, OpenAI, custom endpoints
- Per-module backend preferences (see the sketch below)
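The router's code is not part of this commit, but the idea is environment-driven selection of a backend URL per module or pipeline stage. A purely illustrative sketch, with hypothetical environment variable names (the real names in `cortex/llm/` may differ):

```python
import os

# Illustrative defaults taken from the backend list above; not the repository's actual router.
DEFAULT_BACKENDS = {
    "primary": "http://10.0.0.43:8000",        # vLLM on the MI50
    "secondary": "http://10.0.0.3:11434",      # Ollama on the RTX 3090
    "cloud": "https://api.openai.com/v1",      # OpenAI API
    "fallback": "http://10.0.0.41:11435",      # emergency local backup
}

def resolve_backend(stage: str) -> str:
    """Pick an LLM base URL for a stage from the environment, else a sane default."""
    # e.g. REASONING_BACKEND=primary, PERSONA_BACKEND=cloud (hypothetical variable names)
    choice = os.getenv(f"{stage.upper()}_BACKEND", "primary")
    return os.getenv(f"{stage.upper()}_BACKEND_URL",
                     DEFAULT_BACKENDS.get(choice, DEFAULT_BACKENDS["primary"]))

print(resolve_backend("reasoning"))   # -> primary vLLM URL unless overridden
```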

# Beta Lyrae (RAG Memory DB) - added 11-3-25
- **RAG Knowledge DB - Beta Lyrae (sheliak)**

@@ -159,7 +264,85 @@ with optional subconscious annotation powered by **Cortex VM** running local LLM

└── Future: sends summaries → Cortex for reflection

---

## Version History

### v0.5.0 (2025-11-28) - Current Release
- ✅ Fixed all critical API wiring issues
- ✅ Added OpenAI-compatible endpoint to Relay (`/v1/chat/completions`)
- ✅ Fixed Cortex → Intake integration
- ✅ Added missing Python package `__init__.py` files
- ✅ End-to-end message flow verified and working

### v0.4.x (Major Rewire)
- Cortex multi-stage reasoning pipeline
- Intake v0.2 simplification
- LLM router with multi-backend support
- Major architectural restructuring

### v0.3.x
- Beta Lyrae RAG system
- NeoMem integration
- Basic Cortex reasoning loop

---

## Known Issues (v0.5.0)

### Non-Critical
- Session management endpoints not fully implemented in Relay
- RAG service currently disabled in docker-compose.yml
- Cortex `/ingest` endpoint is a stub

### Future Enhancements
- Re-enable RAG service integration
- Implement full session persistence
- Add request correlation IDs for tracing
- Comprehensive health checks

---

## Quick Start

### Prerequisites
- Docker + Docker Compose
- PostgreSQL 13+, Neo4j 4.4+ (for NeoMem)
- At least one LLM API endpoint (vLLM, Ollama, or OpenAI)

### Setup
1. Configure environment variables in `.env` files
2. Start services: `docker-compose up -d`
3. Check health: `curl http://localhost:7078/_health`
4. Access UI: `http://localhost:7078`

### Test

```bash
curl -X POST http://localhost:7078/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Hello Lyra!"}],
    "session_id": "test"
  }'
```

---

## Documentation

- See [CHANGELOG.md](CHANGELOG.md) for detailed version history
- See `ENVIRONMENT_VARIABLES.md` for environment variable reference
- Additional information available in the Trilium docs

---

## License

NeoMem is a derivative work based on Mem0 OSS (Apache 2.0).
© 2025 Terra-Mechanics / ServersDown Labs. All modifications released under Apache 2.0.

**Built with Claude Code**

---

## 📦 Requirements

@@ -13,7 +13,7 @@ const PORT = Number(process.env.PORT || 7078);

// core endpoints
const CORTEX_REASON = process.env.CORTEX_REASON_URL || "http://cortex:7081/reason";
const CORTEX_INGEST = process.env.CORTEX_INGEST_URL || "http://cortex:7081/ingest";
const INTAKE_URL = process.env.INTAKE_URL || "http://intake:7080/add_exchange";

// -----------------------------------------------------
// Helper request wrapper
// -----------------------------------------------------

@@ -41,6 +41,45 @@ async function postJSON(url, data) {

  return json;
}

// -----------------------------------------------------
// Shared chat handler logic
// -----------------------------------------------------
async function handleChatRequest(session_id, user_msg) {
  // 1. → Cortex.reason
  let reason;
  try {
    reason = await postJSON(CORTEX_REASON, {
      session_id,
      user_prompt: user_msg
    });
  } catch (e) {
    console.error("Relay → Cortex.reason error:", e.message);
    throw new Error(`cortex_reason_failed: ${e.message}`);
  }

  const persona = reason.final_output || reason.persona || "(no persona text)";

  // 2. → Cortex.ingest (async, non-blocking)
  postJSON(CORTEX_INGEST, {
    session_id,
    user_msg,
    assistant_msg: persona
  }).catch(e => console.warn("Relay → Cortex.ingest failed:", e.message));

  // 3. → Intake summary (async, non-blocking)
  postJSON(INTAKE_URL, {
    session_id,
    user_msg,
    assistant_msg: persona
  }).catch(e => console.warn("Relay → Intake failed:", e.message));

  // 4. Return result
  return {
    session_id,
    reply: persona
  };
}

// -----------------------------------------------------
// HEALTHCHECK
// -----------------------------------------------------

@@ -48,6 +87,59 @@ app.get("/_health", (_, res) => {

  res.json({ ok: true });
});

// -----------------------------------------------------
// OPENAI-COMPATIBLE ENDPOINT (for UI)
// -----------------------------------------------------
app.post("/v1/chat/completions", async (req, res) => {
  try {
    // Extract from OpenAI format
    const session_id = req.body.session_id || req.body.user || "default";
    const messages = req.body.messages || [];
    const lastMessage = messages[messages.length - 1];
    const user_msg = lastMessage?.content || "";

    if (!user_msg) {
      return res.status(400).json({ error: "No message content provided" });
    }

    console.log(`Relay (v1) → received: "${user_msg}"`);

    // Call the same logic as /chat
    const result = await handleChatRequest(session_id, user_msg);

    // Return in OpenAI format
    return res.json({
      id: `chatcmpl-${Date.now()}`,
      object: "chat.completion",
      created: Math.floor(Date.now() / 1000),
      model: "lyra",
      choices: [{
        index: 0,
        message: {
          role: "assistant",
          content: result.reply
        },
        finish_reason: "stop"
      }],
      usage: {
        prompt_tokens: 0,
        completion_tokens: 0,
        total_tokens: 0
      }
    });

  } catch (err) {
    console.error("Relay v1 endpoint fatal:", err);
    res.status(500).json({
      error: {
        message: err.message || String(err),
        type: "server_error",
        code: "relay_failed"
      }
    });
  }
});

// -----------------------------------------------------
// MAIN ENDPOINT (new canonical)
// -----------------------------------------------------

@@ -58,39 +150,8 @@ app.post("/chat", async (req, res) => {

    console.log(`Relay → received: "${user_msg}"`);

    const result = await handleChatRequest(session_id, user_msg);
    return res.json(result);

  } catch (err) {
    console.error("Relay fatal:", err);

cortex/ingest/__init__.py (new file)

@@ -0,0 +1 @@

# Ingest module - handles communication with Intake service

@@ -8,9 +8,14 @@ class IntakeClient:

    """Handles short-term / episodic summaries from Intake service."""

    def __init__(self):
        self.base_url = os.getenv("INTAKE_API_URL", "http://intake:7080")

    async def summarize_turn(self, session_id: str, user_msg: str, assistant_msg: Optional[str] = None) -> Dict[str, Any]:
        """
        DEPRECATED: Intake v0.2 removed the /summarize endpoint.
        Use add_exchange() instead, which auto-summarizes in the background.
        This method is kept for backwards compatibility but will fail.
        """
        payload = {
            "session_id": session_id,
            "turns": [{"role": "user", "content": user_msg}]

@@ -24,15 +29,17 @@ class IntakeClient:

            r.raise_for_status()
            return r.json()
        except Exception as e:
            logger.warning(f"Intake summarize_turn failed (endpoint removed in v0.2): {e}")
            return {}

    async def get_context(self, session_id: str) -> str:
        """Get summarized context for a session from Intake."""
        async with httpx.AsyncClient(timeout=15) as client:
            try:
                r = await client.get(f"{self.base_url}/summaries", params={"session_id": session_id})
                r.raise_for_status()
                data = r.json()
                return data.get("summary_text", "")
            except Exception as e:
                logger.warning(f"Intake get_context failed: {e}")
                return ""
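The deprecation notice above points callers at `add_exchange()`, which is not part of this diff. A sketch of what such a companion method could look like on the same client is below; it is a hypothetical helper, not the repository's actual code, and simply forwards to Intake's `POST /add_exchange`.

```python
# Hypothetical companion method for IntakeClient; not part of this commit's diff.
import httpx

async def add_exchange(self, session_id: str, user_msg: str, assistant_msg: str) -> dict:
    """Forward one exchange to Intake v0.2, which summarizes it in the background."""
    async with httpx.AsyncClient(timeout=15) as client:
        r = await client.post(f"{self.base_url}/add_exchange", json={
            "session_id": session_id,
            "user_msg": user_msg,
            "assistant_msg": assistant_msg,
        })
        r.raise_for_status()
        return r.json()
```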

cortex/llm/__init__.py (new file)

@@ -0,0 +1 @@

# LLM module - provides LLM routing and backend abstraction

cortex/persona/__init__.py (new file)

@@ -0,0 +1 @@

# Persona module - applies Lyra's personality and speaking style

cortex/reasoning/__init__.py (new file)

@@ -0,0 +1 @@

# Reasoning module - multi-stage reasoning pipeline

cortex/router.py

@@ -1,6 +1,5 @@

# router.py

from fastapi import APIRouter, HTTPException
from pydantic import BaseModel

cortex/utils/__init__.py (new file)

@@ -0,0 +1 @@

# Utilities module

intake/intake.py (484)

@@ -1,430 +1,160 @@

from fastapi import FastAPI, Body, Query, BackgroundTasks
from collections import deque
from datetime import datetime
from uuid import uuid4
import requests
import os
import sys

# ─────────────────────────────
# Config
# ─────────────────────────────
SUMMARY_MODEL = os.getenv("SUMMARY_MODEL_NAME", "mistral-7b-instruct-v0.2.Q4_K_M.gguf")
SUMMARY_URL = os.getenv("SUMMARY_API_URL", "http://localhost:8080/v1/completions")
SUMMARY_MAX_TOKENS = int(os.getenv("SUMMARY_MAX_TOKENS", "200"))
SUMMARY_TEMPERATURE = float(os.getenv("SUMMARY_TEMPERATURE", "0.3"))

NEOMEM_API = os.getenv("NEOMEM_API")
NEOMEM_KEY = os.getenv("NEOMEM_KEY")

# ─────────────────────────────
# App + session buffer
# ─────────────────────────────
app = FastAPI()
SESSIONS = {}

@app.on_event("startup")
def banner():
    print("🧩 Intake v0.2 booting...")
    print(f" Model: {SUMMARY_MODEL}")
    print(f" API: {SUMMARY_URL}")
    sys.stdout.flush()

# ─────────────────────────────
# Helper: summarize exchanges
# ─────────────────────────────
def llm(prompt: str):
    try:
        resp = requests.post(
            SUMMARY_URL,
            json={
                "model": SUMMARY_MODEL,
                "prompt": prompt,
                "max_tokens": SUMMARY_MAX_TOKENS,
                "temperature": SUMMARY_TEMPERATURE,
            },
            timeout=30,
        )
        resp.raise_for_status()
        return resp.json().get("choices", [{}])[0].get("text", "").strip()
    except Exception as e:
        return f"[Error summarizing: {e}]"

def summarize_simple(exchanges):
    """Simple factual summary of recent exchanges."""
    text = ""
    for e in exchanges:
        text += f"User: {e['user_msg']}\nAssistant: {e['assistant_msg']}\n\n"

    prompt = f"""
Summarize the following conversation between Brian (user) and Lyra (assistant).
Focus only on factual content. Avoid names, examples, story tone, or invented details.

{text}

Summary:
"""
    return llm(prompt)

# ─────────────────────────────
# NeoMem push
# ─────────────────────────────
def push_to_neomem(summary: str, session_id: str):
    if not NEOMEM_API:
        return

    headers = {"Content-Type": "application/json"}
    if NEOMEM_KEY:
        headers["Authorization"] = f"Bearer {NEOMEM_KEY}"

    payload = {
        "messages": [{"role": "assistant", "content": summary}],
        "user_id": "brian",
        "metadata": {
            "source": "intake",
            "session_id": session_id
        }
    }

    try:
        requests.post(
            f"{NEOMEM_API}/memories",
            json=payload,
            headers=headers,
            timeout=20
        ).raise_for_status()
        print(f"🧠 NeoMem updated for {session_id}")
    except Exception as e:
        print(f"NeoMem push failed: {e}")

# ─────────────────────────────
# Background summarizer
# ─────────────────────────────
def bg_summarize(session_id: str):
    try:
        hopper = SESSIONS.get(session_id)
        if not hopper:
            return

        buf = list(hopper["buffer"])
        summary = summarize_simple(buf)
        push_to_neomem(summary, session_id)

        print(f"🧩 Summary generated for {session_id}")
    except Exception as e:
        print(f"Summarizer error: {e}")

# ─────────────────────────────
# Routes
# ─────────────────────────────
@app.post("/add_exchange")
def add_exchange(exchange: dict = Body(...), background_tasks: BackgroundTasks = None):
    session_id = exchange.get("session_id") or f"sess-{uuid4().hex[:8]}"
    exchange["session_id"] = session_id
    exchange["timestamp"] = datetime.now().isoformat()

    if session_id not in SESSIONS:
        SESSIONS[session_id] = {
            "buffer": deque(maxlen=200),
            "created_at": datetime.now()
        }
        print(f"🆕 Hopper created: {session_id}")

    SESSIONS[session_id]["buffer"].append(exchange)

    if background_tasks:
        background_tasks.add_task(bg_summarize, session_id)
        print(f"⏩ Summarization queued for {session_id}")

    return {"ok": True, "session_id": session_id}


@app.post("/close_session/{session_id}")
def close_session(session_id: str):
    if session_id in SESSIONS:
        del SESSIONS[session_id]
    return {"ok": True, "closed": session_id}


@app.get("/summaries")
def get_summary(session_id: str = Query(...)):
    hopper = SESSIONS.get(session_id)
    if not hopper:
        return {"summary_text": "(none)", "session_id": session_id}

    summary = summarize_simple(list(hopper["buffer"]))
    return {"summary_text": summary, "session_id": session_id}


@app.get("/health")
def health():
    return {"ok": True, "model": SUMMARY_MODEL, "url": SUMMARY_URL}