# Project Lyra Changelog

All notable changes to Project Lyra. Format based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/) and [Semantic Versioning](https://semver.org/).

---

## [Unreleased]

---

## [0.9.1] - 2025-12-29

### Fixed
- Chat auto-scroll now works.
- Session names no longer revert to auto-generated UIDs.

## [0.9.0] - 2025-12-29

### Added - Trilium Notes Integration

**Trilium ETAPI Knowledge Base Integration**
- **Trilium Tool Executor** [cortex/autonomy/tools/executors/trilium.py](cortex/autonomy/tools/executors/trilium.py) (see the sketch at the end of this section)
  - `search_notes(query, limit)` - Search through Trilium notes via ETAPI
  - `create_note(title, content, parent_note_id)` - Create new notes in Trilium knowledge base
  - Full ETAPI authentication and error handling
  - Automatic parentNoteId defaulting to "root" for root-level notes
  - Connection error handling with user-friendly messages
- **Tool Registry Integration** [cortex/autonomy/tools/registry.py](cortex/autonomy/tools/registry.py)
  - Added `ENABLE_TRILIUM` feature flag
  - Tool definitions with schema validation
  - Provider-agnostic tool calling support
- **Setup Documentation** [TRILIUM_SETUP.md](TRILIUM_SETUP.md)
  - Step-by-step ETAPI token generation guide
  - Environment configuration instructions
  - Troubleshooting section for common issues
  - Security best practices for token management
- **API Reference Documentation** [docs/TRILIUM_API.md](docs/TRILIUM_API.md)
  - Complete ETAPI endpoint reference
  - Authentication and request/response examples
  - Search syntax and advanced query patterns

**Environment Configuration**
- **New Environment Variables** [.env](.env)
  - `ENABLE_TRILIUM=true` - Enable/disable Trilium integration
  - `TRILIUM_URL=http://10.0.0.2:4292` - Trilium instance URL
  - `TRILIUM_ETAPI_TOKEN` - ETAPI authentication token

**Capabilities Unlocked**
- Personal knowledge base search during conversations
- Automatic note creation from conversation insights
- Cross-reference information between chat and notes
- Context-aware responses using stored knowledge
- Future: Find duplicates, suggest organization, summarize notes
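For reference, a minimal sketch of the kind of ETAPI calls behind `search_notes()` and `create_note()`, assuming Trilium's standard ETAPI routes (`/etapi/notes`, `/etapi/create-note`), the raw token in the `Authorization` header, and the environment variables listed above. The shipped executor adds fuller error handling and user-friendly messages; names and response handling here are illustrative only.

```python
# Hypothetical sketch of the ETAPI calls behind search_notes()/create_note().
# Route names and response shape are assumptions; see docs/TRILIUM_API.md.
import os
import httpx

TRILIUM_URL = os.getenv("TRILIUM_URL", "http://10.0.0.2:4292")
TRILIUM_ETAPI_TOKEN = os.getenv("TRILIUM_ETAPI_TOKEN", "")


def _headers() -> dict:
    # ETAPI authenticates with the raw token in the Authorization header.
    return {"Authorization": TRILIUM_ETAPI_TOKEN}


def search_notes(query: str, limit: int = 10) -> list[dict]:
    """Search Trilium notes via ETAPI and return the raw result list."""
    resp = httpx.get(
        f"{TRILIUM_URL}/etapi/notes",
        params={"search": query, "limit": limit},
        headers=_headers(),
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json().get("results", [])


def create_note(title: str, content: str, parent_note_id: str = "root") -> dict:
    """Create a text note; parentNoteId defaults to "root" for root-level notes."""
    resp = httpx.post(
        f"{TRILIUM_URL}/etapi/create-note",
        json={
            "parentNoteId": parent_note_id,
            "title": title,
            "type": "text",
            "content": content,
        },
        headers=_headers(),
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()
```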
### Changed - Spelling Corrections

**Module Naming**
- Renamed `trillium.py` to `trilium.py` (corrected spelling)
- Updated all imports and references across codebase
- Fixed environment variable names (TRILLIUM → TRILIUM)
- Updated documentation to use correct "Trilium" spelling

---

## [0.8.0] - 2025-12-26

### Added - Tool Calling & "Show Your Work" Transparency Feature

**Tool Calling System (Standard Mode)**
- **Function Calling Infrastructure** [cortex/autonomy/tools/](cortex/autonomy/tools/)
  - Implemented agentic tool calling for Standard Mode with autonomous multi-step execution
  - Tool registry system with JSON schema definitions
  - Adapter pattern for provider-agnostic tool calling (OpenAI, Ollama, llama.cpp)
  - Maximum 5 iterations per request to prevent runaway loops
- **Available Tools**
  - `execute_code` - Sandboxed Python/JavaScript/Bash execution via Docker
  - `web_search` - Tavily API integration for real-time web queries
  - `trilium_search` - Internal Trilium knowledge base queries
- **Provider Adapters** [cortex/autonomy/tools/adapters/](cortex/autonomy/tools/adapters/)
  - `OpenAIAdapter` - Native function calling support
  - `OllamaAdapter` - XML-based tool calling for local models
  - `LlamaCppAdapter` - XML-based tool calling for llama.cpp backend
  - Automatic tool call parsing and result formatting
- **Code Execution Sandbox** [cortex/autonomy/tools/code_executor.py](cortex/autonomy/tools/code_executor.py)
  - Docker-based isolated execution environment
  - Support for Python, JavaScript (Node.js), and Bash
  - 30-second timeout with automatic cleanup
  - Returns stdout, stderr, exit code, and execution time
  - Prevents filesystem access outside sandbox

**"Show Your Work" - Real-Time Thinking Stream**
- **Server-Sent Events (SSE) Streaming** [cortex/router.py:478-527](cortex/router.py#L478-L527)
  - New `/stream/thinking/{session_id}` endpoint for real-time event streaming
  - Broadcasts internal thinking process during tool calling operations
  - 30-second keepalive with automatic reconnection support
  - Events: `connected`, `thinking`, `tool_call`, `tool_result`, `done`, `error`
- **Stream Manager** [cortex/autonomy/tools/stream_events.py](cortex/autonomy/tools/stream_events.py)
  - Pub/sub system for managing SSE subscriptions per session (see the sketch after this list)
  - Multiple clients can connect to same session stream
  - Automatic cleanup of dead queues and closed connections
  - Zero overhead when no subscribers active
- **FunctionCaller Integration** [cortex/autonomy/tools/function_caller.py](cortex/autonomy/tools/function_caller.py)
  - Enhanced with event emission at each step:
    - "thinking" events before each LLM call
    - "tool_call" events when invoking tools
    - "tool_result" events after tool execution
    - "done" event with final answer
    - "error" events on failures
  - Session-aware streaming (only emits when subscribers exist)
  - Provider-agnostic implementation works with all backends
- **Thinking Stream UI** [core/ui/thinking-stream.html](core/ui/thinking-stream.html)
  - Dedicated popup window for real-time thinking visualization
  - Color-coded events: green (thinking), orange (tool calls), blue (results), purple (done), red (errors)
  - Auto-scrolling event feed with animations
  - Connection status indicator with green/red dot
  - Clear events button and session info display
  - Mobile-friendly responsive design
- **UI Integration** [core/ui/index.html](core/ui/index.html)
  - "🧠 Show Work" button in session selector
  - Opens thinking stream in popup window
  - Session ID passed via URL parameter for stream association
  - Purple/violet button styling to match cyberpunk theme
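A minimal sketch of the pub/sub pattern described above, assuming FastAPI and asyncio. `ToolStreamManager` and the endpoint shape follow the names used in this changelog, but the actual `stream_events.py` and `router.py` implementations may differ.

```python
# Hypothetical sketch of the per-session pub/sub behind "Show Your Work" streaming.
import asyncio
import json
from collections import defaultdict

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()


class ToolStreamManager:
    """Fans out thinking/tool events to any SSE subscribers of a session."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[asyncio.Queue]] = defaultdict(list)

    def subscribe(self, session_id: str) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue()
        self._subscribers[session_id].append(queue)
        return queue

    def unsubscribe(self, session_id: str, queue: asyncio.Queue) -> None:
        self._subscribers[session_id].remove(queue)

    def has_subscribers(self, session_id: str) -> bool:
        # Zero overhead when nobody is listening: emitters check this first.
        return bool(self._subscribers.get(session_id))

    async def emit(self, session_id: str, event_type: str, data: dict) -> None:
        for queue in self._subscribers.get(session_id, []):
            await queue.put({"type": event_type, **data})


stream_manager = ToolStreamManager()


@app.get("/stream/thinking/{session_id}")
async def thinking_stream(session_id: str) -> StreamingResponse:
    queue = stream_manager.subscribe(session_id)

    async def event_source():
        # Initial event is JSON-serialized with json.dumps() (see Fixed below).
        yield f"data: {json.dumps({'type': 'connected', 'session_id': session_id})}\n\n"
        try:
            while True:
                try:
                    event = await asyncio.wait_for(queue.get(), timeout=30)
                    yield f"data: {json.dumps(event)}\n\n"
                    if event.get("type") == "done":
                        break
                except asyncio.TimeoutError:
                    yield ": keepalive\n\n"  # SSE comment keeps the connection open
        finally:
            stream_manager.unsubscribe(session_id, queue)

    return StreamingResponse(event_source(), media_type="text/event-stream")
```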
**Tool Calling Configuration**
- **Environment Variables** [.env](.env)
  - `STANDARD_MODE_ENABLE_TOOLS=true` - Enable/disable tool calling
  - `TAVILY_API_KEY` - API key for web search tool
  - `TRILLIUM_API_URL` - URL for Trillium knowledge base
- **Standard Mode Tools Toggle** [cortex/router.py:389-470](cortex/router.py#L389-L470)
  - `/simple` endpoint checks `STANDARD_MODE_ENABLE_TOOLS` environment variable
  - Falls back to non-tool mode if disabled
  - Logs tool usage statistics (iterations, tools used)

### Changed - CORS & Architecture

**CORS Support for SSE**
- **Added CORS Middleware** [cortex/main.py](cortex/main.py)
  - FastAPI CORSMiddleware with wildcard origins for development
  - Allows cross-origin SSE connections from nginx UI (port 8081) to cortex (port 7081)
  - Credentials support enabled for authenticated requests
  - All methods and headers permitted

**Tool Calling Pipeline**
- **Standard Mode Enhancement** [cortex/router.py:389-470](cortex/router.py#L389-L470)
  - `/simple` endpoint now supports optional tool calling
  - Multi-iteration agentic loop with LLM + tool execution
  - Tool results injected back into conversation for next iteration
  - Graceful degradation to non-tool mode if tools disabled

**JSON Response Formatting**
- **SSE Event Structure** [cortex/router.py:497-499](cortex/router.py#L497-L499)
  - Fixed initial "connected" event to use proper JSON serialization
  - Changed from f-string with nested quotes to `json.dumps()`
  - Ensures valid JSON for all event types

### Fixed - Critical JavaScript & SSE Issues

**JavaScript Variable Scoping Bug**
- **Root cause**: `eventSource` variable used before declaration in [thinking-stream.html:218](core/ui/thinking-stream.html#L218)
- **Symptom**: `Uncaught ReferenceError: can't access lexical declaration 'eventSource' before initialization`
- **Solution**: Moved variable declarations before `connectStream()` call
- **Impact**: Thinking stream page now loads without errors and establishes SSE connection

**SSE Connection Not Establishing**
- **Root cause**: CORS blocked cross-origin SSE requests from nginx (8081) to cortex (7081)
- **Symptom**: Browser silently blocked EventSource connection, no errors in console
- **Solution**: Added CORSMiddleware to cortex FastAPI app
- **Impact**: SSE streams now connect successfully across ports

**Invalid JSON in SSE Events**
- **Root cause**: Initial "connected" event used f-string with nested quotes: `f"data: {{'type': 'connected', 'session_id': '{session_id}'}}\n\n"`
- **Symptom**: Browser couldn't parse malformed JSON, connection appeared stuck on "Connecting..."
- **Solution**: Used `json.dumps()` for proper JSON serialization
- **Impact**: Connected event now parsed correctly, status updates to green dot

### Technical Improvements

**Agentic Architecture**
- Multi-iteration reasoning loop with tool execution
- Provider-agnostic tool calling via adapter pattern
- Automatic tool result injection into conversation context
- Iteration limits to prevent infinite loops
- Comprehensive logging at each step

**Event Streaming Performance**
- Zero overhead when no subscribers (check before emit)
- Efficient pub/sub with asyncio queues
- Automatic cleanup of disconnected clients
- 30-second keepalive prevents timeout issues
- Session-isolated streams prevent cross-talk

**Code Quality**
- Clean separation: tool execution, adapters, streaming, UI
- Comprehensive error handling with fallbacks
- Detailed logging for debugging tool calls
- Type hints and docstrings throughout
- Modular design for easy extension

**Security**
- Sandboxed code execution prevents filesystem access
- Timeout limits prevent resource exhaustion
- Docker isolation for untrusted code
- No code execution without explicit user request

### Architecture - Tool Calling Flow

**Standard Mode with Tools:**
```
User (UI) → Relay → Cortex /simple
    ↓
Check STANDARD_MODE_ENABLE_TOOLS
    ↓
LLM generates tool call → FunctionCaller
    ↓
Execute tool (Docker sandbox / API call)
    ↓
Inject result → LLM (next iteration)
    ↓
Repeat until done or max iterations
    ↓
Return final answer → UI
```

**Thinking Stream Flow:**
```
Browser → nginx:8081 → thinking-stream.html
    ↓
EventSource connects to cortex:7081/stream/thinking/{session_id}
    ↓
ToolStreamManager.subscribe(session_id) → asyncio.Queue
    ↓
User sends message → /simple endpoint
    ↓
FunctionCaller emits events:
  - emit("thinking")    → Queue → SSE → Browser
  - emit("tool_call")   → Queue → SSE → Browser
  - emit("tool_result") → Queue → SSE → Browser
  - emit("done")        → Queue → SSE → Browser
    ↓
Browser displays color-coded events in real-time
```
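THINKING_STREAM.md (listed under Documentation below) covers curl usage of the stream; purely as an illustration, a small Python client sketch that tails the same endpoint, assuming `httpx` is installed and cortex is reachable on port 7081:

```python
# Illustrative client for the thinking stream; the shipped docs use curl.
# The base URL and session id here are hypothetical.
import json
import httpx


def tail_thinking_stream(session_id: str, base_url: str = "http://localhost:7081") -> None:
    url = f"{base_url}/stream/thinking/{session_id}"
    with httpx.stream("GET", url, timeout=None) as response:
        for line in response.iter_lines():
            if not line.startswith("data: "):
                continue  # skip keepalive comments and blank lines
            event = json.loads(line[len("data: "):])
            print(f"[{event.get('type')}] {event}")
            if event.get("type") == "done":
                break


if __name__ == "__main__":
    tail_thinking_stream("demo-session")
```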
### Documentation
- **Added** [THINKING_STREAM.md](THINKING_STREAM.md) - Complete guide to "Show Your Work" feature
  - Usage examples with curl
  - Event type reference
  - Architecture diagrams
  - Demo page instructions
- **Added** [UI_THINKING_STREAM.md](UI_THINKING_STREAM.md) - UI integration documentation
  - Button placement and styling
  - Popup window behavior
  - Session association logic

### Known Limitations

**Tool Calling:**
- Limited to 5 iterations per request (prevents runaway loops)
- Python sandbox has no filesystem persistence (temporary only)
- Web search requires Tavily API key (not free tier unlimited)
- Trillium search requires separate knowledge base setup

**Thinking Stream:**
- CORS wildcard (`*`) is development-only (should restrict in production)
- Stream ends after "done" event (must reconnect for new request)
- No historical replay (only shows real-time events)
- Single session per stream window

### Migration Notes

**For Users Upgrading:**
1. New environment variable: `STANDARD_MODE_ENABLE_TOOLS=true` (default: enabled)
2. Thinking stream accessible via "🧠 Show Work" button in UI
3. Tool calling works automatically in Standard Mode when enabled
4. No changes required to existing Standard Mode usage

**For Developers:**
1. Cortex now includes CORS middleware for SSE
2. New `/stream/thinking/{session_id}` endpoint available
3. FunctionCaller requires `session_id` parameter for streaming
4. Tool adapters can be extended by adding to `AVAILABLE_TOOLS` registry

---

## [0.7.0] - 2025-12-21

### Added - Standard Mode & UI Enhancements

**Standard Mode Implementation**
- Added "Standard Mode" chat option that bypasses complex cortex reasoning pipeline
- Provides simple chatbot functionality for coding and practical tasks
- Maintains full conversation context across messages
- Backend-agnostic - works with SECONDARY (Ollama), OPENAI, or custom backends
- Created `/simple` endpoint in Cortex router [cortex/router.py:389](cortex/router.py#L389)
- Mode selector in UI with toggle between Standard and Cortex modes
  - Standard Mode: Direct LLM chat with context retention
  - Cortex Mode: Full 7-stage reasoning pipeline (unchanged)

**Backend Selection System**
- UI settings modal with LLM backend selection for Standard Mode
- Radio button selector: SECONDARY (Ollama/Qwen), OPENAI (GPT-4o-mini), or custom
- Backend preference persisted in localStorage
- Custom backend text input for advanced users
- Backend parameter routing through entire stack:
  - UI sends `backend` parameter in request body
  - Relay forwards backend selection to Cortex
  - Cortex `/simple` endpoint respects user's backend choice
- Environment-based fallback: Uses `STANDARD_MODE_LLM` if no backend specified

**Session Management Overhaul**
- Complete rewrite of session system to use server-side persistence
  - File-based storage in `core/relay/sessions/` directory
  - Session files: `{sessionId}.json` for history, `{sessionId}.meta.json` for metadata
  - Server is source of truth - sessions sync across browsers and reboots
- Session metadata system for friendly names
  - Sessions display custom names instead of random IDs
  - Rename functionality in session dropdown
  - Last modified timestamps and message counts
- Full CRUD API for sessions in Relay:
  - `GET /sessions` - List all sessions with metadata
  - `GET /sessions/:id` - Retrieve session history
  - `POST /sessions/:id` - Save session history
  - `PATCH /sessions/:id/metadata` - Update session name/metadata
  - `DELETE /sessions/:id` - Delete session and metadata
- Session management UI in settings modal:
  - List of all sessions with message counts and timestamps
  - Delete button for each session with confirmation
  - Automatic session cleanup when deleting current session

**UI Improvements**
- Settings modal with hamburger menu (⚙ Settings button)
  - Backend selection section for Standard Mode
  - Session management section with delete
functionality - Clean modal overlay with cyberpunk theme - ESC key and click-outside to close - Light/Dark mode toggle with dark mode as default - Theme preference persisted in localStorage - CSS variables for seamless theme switching - Toggle button shows current mode (πŸŒ™ Dark Mode / β˜€οΈ Light Mode) - Removed redundant model selector dropdown from header - Fixed modal positioning and z-index layering - Modal moved outside #chat container for proper rendering - Fixed z-index: overlay (999), modal content (1001) - Centered modal with proper backdrop blur **Context Retention for Standard Mode** - Integration with Intake module for conversation history - Added `get_recent_messages()` function in intake.py - Standard Mode retrieves last 20 messages from session buffer - Full context sent to LLM on each request - Message array format support in LLM router: - Updated Ollama provider to accept `messages` parameter - Updated OpenAI provider to accept `messages` parameter - Automatic conversion from messages to prompt string for non-chat APIs ### Changed - Architecture & Routing **Relay Server Updates** [core/relay/server.js](core/relay/server.js) - ES module migration for session persistence: - Imported `fs/promises`, `path`, `fileURLToPath` for file operations - Created `SESSIONS_DIR` constant for session storage location - Mode-based routing in both `/chat` and `/v1/chat/completions` endpoints: - Extracts `mode` parameter from request body (default: "cortex") - Routes to `CORTEX_SIMPLE` for Standard Mode, `CORTEX_REASON` for Cortex Mode - Backend parameter only used in Standard Mode - Session persistence functions: - `ensureSessionsDir()` - Creates sessions directory if needed - `loadSession(sessionId)` - Reads session history from file - `saveSession(sessionId, history, metadata)` - Writes session to file - `loadSessionMetadata(sessionId)` - Reads session metadata - `saveSessionMetadata(sessionId, metadata)` - Updates session metadata - `listSessions()` - Returns all sessions with metadata, sorted by last modified - `deleteSession(sessionId)` - Removes session and metadata files **Cortex Router Updates** [cortex/router.py](cortex/router.py) - Added `backend` field to `ReasonRequest` Pydantic model (optional) - Created `/simple` endpoint for Standard Mode: - Bypasses reflection, reasoning, refinement stages - Direct LLM call with conversation context - Uses backend from request or falls back to `STANDARD_MODE_LLM` env variable - Returns simple response structure without reasoning artifacts - Backend selection logic in `/simple`: - Normalizes backend names to uppercase - Maps UI backend names to system backend names - Validates backend availability before calling **Intake Integration** [cortex/intake/intake.py](cortex/intake/intake.py) - Added `get_recent_messages(session_id, limit)` function: - Retrieves last N messages from session buffer - Returns empty list if session doesn't exist - Used by `/simple` endpoint for context retrieval **LLM Router Enhancements** [cortex/llm/llm_router.py](cortex/llm/llm_router.py) - Added `messages` parameter support across all providers - Automatic message-to-prompt conversion for legacy APIs - Chat completion format for Ollama and OpenAI providers - Stop sequences for MI50/DeepSeek R1 to prevent runaway generation: - `"User:"`, `"\nUser:"`, `"Assistant:"`, `"\n\n\n"` **Environment Configuration** [.env](.env) - Added `STANDARD_MODE_LLM=SECONDARY` for default Standard Mode backend - Added `CORTEX_SIMPLE_URL=http://cortex:7081/simple` for routing **UI 
Architecture** [core/ui/index.html](core/ui/index.html) - Server-based session loading system: - `loadSessionsFromServer()` - Fetches sessions from Relay API - `renderSessions()` - Populates session dropdown from server data - Session state synchronized with server on every change - Backend selection persistence: - Loads saved backend from localStorage on page load - Includes backend parameter in request body when in Standard Mode - Settings modal pre-selects current backend choice - Dark mode by default: - Checks localStorage for theme preference - Sets dark theme if no preference found - Toggle button updates localStorage and applies theme **CSS Styling** [core/ui/style.css](core/ui/style.css) - Light mode CSS variables: - `--bg-dark: #f5f5f5` (light background) - `--text-main: #1a1a1a` (dark text) - `--text-fade: #666` (dimmed text) - Dark mode CSS variables (default): - `--bg-dark: #0a0a0a` (dark background) - `--text-main: #e6e6e6` (light text) - `--text-fade: #999` (dimmed text) - Modal positioning fixes: - `position: fixed` with `top: 50%`, `left: 50%`, `transform: translate(-50%, -50%)` - Z-index layering: overlay (999), content (1001) - Backdrop blur effect on modal overlay - Session list styling: - Session item cards with hover effects - Delete button with red hover state - Message count and timestamp display ### Fixed - Critical Issues **DeepSeek R1 Runaway Generation** - Root cause: R1 reasoning model generates thinking process and hallucinates conversations - Solution: - Changed `STANDARD_MODE_LLM` to SECONDARY (Ollama/Qwen) instead of PRIMARY (MI50/R1) - Added stop sequences to MI50 provider to prevent continuation - Documented R1 limitations for Standard Mode usage **Context Not Maintained in Standard Mode** - Root cause: `/simple` endpoint didn't retrieve conversation history from Intake - Solution: - Created `get_recent_messages()` function in intake.py - Standard Mode now pulls last 20 messages from session buffer - Full context sent to LLM with each request - User feedback: "it's saying it hasn't received any other messages from me, so it looks like the standard mode llm isn't getting the full chat" **OpenAI Backend 400 Errors** - Root cause: OpenAI provider only accepted prompt strings, not messages arrays - Solution: Updated OpenAI provider to support messages parameter like Ollama - Now handles chat completion format correctly **Modal Formatting Issues** - Root cause: Settings modal inside #chat container with overflow constraints - Symptoms: Modal appearing at bottom, jumbled layout, couldn't close - Solution: - Moved modal outside #chat container to be direct child of body - Changed positioning from absolute to fixed - Added proper z-index layering (overlay: 999, content: 1001) - Removed old model selector from header - User feedback: "the formating for the settings is all off. 
Its at the bottom and all jumbling together, i cant get it to go away" **Session Persistence Broken** - Root cause: Sessions stored only in localStorage, not synced with server - Symptoms: Sessions didn't persist across browsers or reboots, couldn't load messages - Solution: Complete rewrite of session system - Implemented server-side file persistence in Relay - Created CRUD API endpoints for session management - Updated UI to load sessions from server instead of localStorage - Added metadata system for session names - Sessions now survive container restarts and sync across browsers - User feedback: "sessions seem to exist locally only, i cant get them to actually load any messages and there is now way to delete them. If i open the ui in a different browser those arent there." ### Technical Improvements **Backward Compatibility** - All changes include defaults to maintain existing behavior - Cortex Mode completely unchanged - still uses full 7-stage pipeline - Standard Mode is opt-in via UI mode selector - If no backend specified, falls back to `STANDARD_MODE_LLM` env variable - Existing requests without mode parameter default to "cortex" **Code Quality** - Consistent async/await patterns throughout stack - Proper error handling with fallbacks - Clean separation between Standard and Cortex modes - Session persistence abstracted into helper functions - Modular UI code with clear event handlers **Performance** - Standard Mode bypasses 6 of 7 reasoning stages for faster responses - Session loading optimized with file-based caching - Backend selection happens once per message, not per LLM call - Minimal overhead for mode detection and routing ### Architecture - Dual-Mode Chat System **Standard Mode Flow:** ``` User (UI) β†’ Relay β†’ Cortex /simple β†’ Intake (get_recent_messages) β†’ LLM (direct call with context) β†’ Relay β†’ UI ``` **Cortex Mode Flow (Unchanged):** ``` User (UI) β†’ Relay β†’ Cortex /reason β†’ Reflection β†’ Reasoning β†’ Refinement β†’ Persona β†’ Relay β†’ UI ``` **Session Persistence:** ``` UI β†’ POST /sessions/:id β†’ Relay β†’ File system (sessions/*.json) UI β†’ GET /sessions β†’ Relay β†’ List all sessions β†’ UI dropdown ``` ### Known Limitations **Standard Mode:** - No reflection, reasoning, or refinement stages - No RAG integration (same as Cortex Mode - currently disabled) - No NeoMem memory storage (same as Cortex Mode - currently disabled) - DeepSeek R1 not recommended for Standard Mode (generates reasoning artifacts) **Session Management:** - Sessions stored in container filesystem - need volume mount for true persistence - No session import/export functionality yet - No session search or filtering ### Migration Notes **For Users Upgrading:** 1. Existing sessions in localStorage will not automatically migrate to server 2. Create new sessions after upgrade for server-side persistence 3. Theme preference (light/dark) will be preserved from localStorage 4. Backend preference will default to SECONDARY if not previously set **For Developers:** 1. Relay now requires `fs/promises` for session persistence 2. Cortex `/simple` endpoint expects `backend` parameter (optional) 3. UI sends `mode` and `backend` parameters in request body 4. 
Session files stored in `core/relay/sessions/` directory --- ## [0.6.0] - 2025-12-18 ### Added - Autonomy System (Phase 1 & 2) **Autonomy Phase 1** - Self-Awareness & Planning Foundation - **Executive Planning Module** [cortex/autonomy/executive/planner.py](cortex/autonomy/executive/planner.py) - Autonomous goal setting and task planning capabilities - Multi-step reasoning for complex objectives - Integration with self-state tracking - **Self-State Management** [cortex/data/self_state.json](cortex/data/self_state.json) - Persistent state tracking across sessions - Memory of past actions and outcomes - Self-awareness metadata storage - **Self Analyzer** [cortex/autonomy/self/analyzer.py](cortex/autonomy/self/analyzer.py) - Analyzes own performance and decision patterns - Identifies areas for improvement - Tracks cognitive patterns over time - **Test Suite** [cortex/tests/test_autonomy_phase1.py](cortex/tests/test_autonomy_phase1.py) - Unit tests for phase 1 autonomy features **Autonomy Phase 2** - Decision Making & Proactive Behavior - **Autonomous Actions Module** [cortex/autonomy/actions/autonomous_actions.py](cortex/autonomy/actions/autonomous_actions.py) - Self-initiated action execution - Context-aware decision implementation - Action logging and tracking - **Pattern Learning System** [cortex/autonomy/learning/pattern_learner.py](cortex/autonomy/learning/pattern_learner.py) - Learns from interaction patterns - Identifies recurring user needs - Adapts behavior based on learned patterns - **Proactive Monitor** [cortex/autonomy/proactive/monitor.py](cortex/autonomy/proactive/monitor.py) - Monitors system state for intervention opportunities - Detects patterns requiring proactive response - Background monitoring capabilities - **Decision Engine** [cortex/autonomy/tools/decision_engine.py](cortex/autonomy/tools/decision_engine.py) - Autonomous decision-making framework - Weighs options and selects optimal actions - Integrates with orchestrator for coordinated decisions - **Orchestrator** [cortex/autonomy/tools/orchestrator.py](cortex/autonomy/tools/orchestrator.py) - Coordinates multiple autonomy subsystems - Manages tool selection and execution - Handles NeoMem integration (with disable capability) - **Test Suite** [cortex/tests/test_autonomy_phase2.py](cortex/tests/test_autonomy_phase2.py) - Unit tests for phase 2 autonomy features **Autonomy Phase 2.5** - Pipeline Refinement - Tightened integration between autonomy modules and reasoning pipeline - Enhanced self-state persistence and tracking - Improved orchestrator reliability - NeoMem integration refinements in vector store handling [neomem/neomem/vector_stores/qdrant.py](neomem/neomem/vector_stores/qdrant.py) ### Added - Documentation - **Complete AI Agent Breakdown** [docs/PROJECT_LYRA_COMPLETE_BREAKDOWN.md](docs/PROJECT_LYRA_COMPLETE_BREAKDOWN.md) - Comprehensive system architecture documentation - Detailed component descriptions - Data flow diagrams - Integration points and API specifications ### Changed - Core Integration - **Router Updates** [cortex/router.py](cortex/router.py) - Integrated autonomy subsystems into main routing logic - Added endpoints for autonomous decision-making - Enhanced state management across requests - **Reasoning Pipeline** [cortex/reasoning/reasoning.py](cortex/reasoning/reasoning.py) - Integrated autonomy-aware reasoning - Self-state consideration in reasoning process - **Persona Layer** [cortex/persona/speak.py](cortex/persona/speak.py) - Autonomy-aware response generation - Self-state reflection in 
personality expression - **Context Handling** [cortex/context.py](cortex/context.py) - NeoMem disable capability for flexible deployment ### Changed - Development Environment - Updated [.gitignore](.gitignore) for better workspace management - Cleaned up VSCode settings - Removed [.vscode/settings.json](.vscode/settings.json) from repository ### Technical Improvements - Modular autonomy architecture with clear separation of concerns - Test-driven development for new autonomy features - Enhanced state persistence across system restarts - Flexible NeoMem integration with enable/disable controls ### Architecture - Autonomy System Design The autonomy system operates in layers: 1. **Executive Layer** - High-level planning and goal setting 2. **Decision Layer** - Evaluates options and makes choices 3. **Action Layer** - Executes autonomous decisions 4. **Learning Layer** - Adapts behavior based on patterns 5. **Monitoring Layer** - Proactive awareness of system state All layers coordinate through the orchestrator and maintain state in `self_state.json`. --- ## [0.5.2] - 2025-12-12 ### Fixed - LLM Router & Async HTTP - **Critical**: Replaced synchronous `requests` with async `httpx` in LLM router [cortex/llm/llm_router.py](cortex/llm/llm_router.py) - Event loop blocking was causing timeouts and empty responses - All three providers (MI50, Ollama, OpenAI) now use `await http_client.post()` - Fixes "Expecting value: line 1 column 1 (char 0)" JSON parsing errors in intake - **Critical**: Fixed missing `backend` parameter in intake summarization [cortex/intake/intake.py:285](cortex/intake/intake.py#L285) - Was defaulting to PRIMARY (MI50) instead of respecting `INTAKE_LLM=SECONDARY` - Now correctly uses configured backend (Ollama on 3090) - **Relay**: Fixed session ID case mismatch [core/relay/server.js:87](core/relay/server.js#L87) - UI sends `sessionId` (camelCase) but relay expected `session_id` (snake_case) - Now accepts both variants: `req.body.session_id || req.body.sessionId` - Custom session IDs now properly tracked instead of defaulting to "default" ### Added - Error Handling & Diagnostics - Added comprehensive error handling in LLM router for all providers - HTTPError, JSONDecodeError, KeyError, and generic Exception handling - Detailed error messages with exception type and description - Provider-specific error logging (mi50, ollama, openai) - Added debug logging in intake summarization - Logs LLM response length and preview - Validates non-empty responses before JSON parsing - Helps diagnose empty or malformed responses ### Added - Session Management - Added session persistence endpoints in relay [core/relay/server.js:160-171](core/relay/server.js#L160-L171) - `GET /sessions/:id` - Retrieve session history - `POST /sessions/:id` - Save session history - In-memory storage using Map (ephemeral, resets on container restart) - Fixes UI "Failed to load session" errors ### Changed - Provider Configuration - Added `mi50` provider support for llama.cpp server [cortex/llm/llm_router.py:62-81](cortex/llm/llm_router.py#L62-L81) - Uses `/completion` endpoint with `n_predict` parameter - Extracts `content` field from response - Configured for MI50 GPU with DeepSeek model - Increased memory retrieval threshold from 0.78 to 0.90 [cortex/.env:20](cortex/.env#L20) - Filters out low-relevance memories (only returns 90%+ similarity) - Reduces noise in context retrieval ### Technical Improvements - Unified async HTTP handling across all LLM providers - Better separation of concerns between provider 
implementations - Improved error messages for debugging LLM API failures - Consistent timeout handling (120 seconds for all providers) --- ## [0.5.1] - 2025-12-11 ### Fixed - Intake Integration - **Critical**: Fixed `bg_summarize()` function not defined error - Was only a `TYPE_CHECKING` stub, now implemented as logging stub - Eliminated `NameError` preventing SESSIONS from persisting correctly - Function now logs exchange additions and defers summarization to `/reason` endpoint - **Critical**: Fixed `/ingest` endpoint unreachable code in [router.py:201-233](cortex/router.py#L201-L233) - Removed early return that prevented `update_last_assistant_message()` from executing - Removed duplicate `add_exchange_internal()` call - Implemented lenient error handling (each operation wrapped in try/except) - **Intake**: Added missing `__init__.py` to make intake a proper Python package [cortex/intake/__init__.py](cortex/intake/__init__.py) - Prevents namespace package issues - Enables proper module imports - Exports `SESSIONS`, `add_exchange_internal`, `summarize_context` ### Added - Diagnostics & Debugging - Added diagnostic logging to verify SESSIONS singleton behavior - Module initialization logs SESSIONS object ID [intake.py:14](cortex/intake/intake.py#L14) - Each `add_exchange_internal()` call logs object ID and buffer state [intake.py:343-358](cortex/intake/intake.py#L343-L358) - Added `/debug/sessions` HTTP endpoint [router.py:276-305](cortex/router.py#L276-L305) - Inspect SESSIONS from within running Uvicorn worker - Shows total sessions, session count, buffer sizes, recent exchanges - Returns SESSIONS object ID for verification - Added `/debug/summary` HTTP endpoint [router.py:238-271](cortex/router.py#L238-L271) - Test `summarize_context()` for any session - Returns L1/L5/L10/L20/L30 summaries - Includes buffer size and exchange preview ### Changed - Intake Architecture - **Intake no longer standalone service** - runs inside Cortex container as pure Python module - Imported as `from intake.intake import add_exchange_internal, SESSIONS` - No HTTP calls between Cortex and Intake - Eliminates network latency and dependency on Intake service being up - **Deferred summarization**: `bg_summarize()` is now a no-op stub [intake.py:318-325](cortex/intake/intake.py#L318-L325) - Actual summarization happens during `/reason` call via `summarize_context()` - Simplifies async/sync complexity - Prevents NameError when called from `add_exchange_internal()` - **Lenient error handling**: `/ingest` endpoint always returns success [router.py:201-233](cortex/router.py#L201-L233) - Each operation wrapped in try/except - Logs errors but never fails to avoid breaking chat pipeline - User requirement: never fail chat pipeline ### Documentation - Added single-worker constraint note in [cortex/Dockerfile:7-8](cortex/Dockerfile#L7-L8) - Documents that SESSIONS requires single Uvicorn worker - Notes that multi-worker scaling requires Redis or shared storage - Updated plan documentation with root cause analysis --- ## [0.5.0] - 2025-11-28 ### Fixed - Critical API Wiring & Integration After the major architectural rewire (v0.4.x), this release fixes all critical endpoint mismatches and ensures end-to-end system connectivity. 
#### Cortex β†’ Intake Integration - **Fixed** `IntakeClient` to use correct Intake v0.2 API endpoints - Changed `GET /context/{session_id}` β†’ `GET /summaries?session_id={session_id}` - Updated JSON response parsing to extract `summary_text` field - Fixed environment variable name: `INTAKE_API` β†’ `INTAKE_API_URL` - Corrected default port: `7083` β†’ `7080` - Added deprecation warning to `summarize_turn()` method (endpoint removed in Intake v0.2) #### Relay β†’ UI Compatibility - **Added** OpenAI-compatible endpoint `POST /v1/chat/completions` - Accepts standard OpenAI format with `messages[]` array - Returns OpenAI-compatible response structure with `choices[]` - Extracts last message content from messages array - Includes usage metadata (stub values for compatibility) - **Refactored** Relay to use shared `handleChatRequest()` function - Both `/chat` and `/v1/chat/completions` use same core logic - Eliminates code duplication - Consistent error handling across endpoints #### Relay β†’ Intake Connection - **Fixed** Intake URL fallback in Relay server configuration - Corrected port: `7082` β†’ `7080` - Updated endpoint: `/summary` β†’ `/add_exchange` - Now properly sends exchanges to Intake for summarization #### Code Quality & Python Package Structure - **Added** missing `__init__.py` files to all Cortex subdirectories - `cortex/llm/__init__.py` - `cortex/reasoning/__init__.py` - `cortex/persona/__init__.py` - `cortex/ingest/__init__.py` - `cortex/utils/__init__.py` - Improves package imports and IDE support - **Removed** unused import in `cortex/router.py`: `from unittest import result` - **Deleted** empty file `cortex/llm/resolve_llm_url.py` (was 0 bytes, never implemented) ### Verified Working Complete end-to-end message flow now operational: ``` UI β†’ Relay (/v1/chat/completions) ↓ Relay β†’ Cortex (/reason) ↓ Cortex β†’ Intake (/summaries) [retrieves context] ↓ Cortex 4-stage pipeline: 1. reflection.py β†’ meta-awareness notes 2. reasoning.py β†’ draft answer 3. refine.py β†’ polished answer 4. persona/speak.py β†’ Lyra personality ↓ Cortex β†’ Relay (returns persona response) ↓ Relay β†’ Intake (/add_exchange) [async summary] ↓ Intake β†’ NeoMem (background memory storage) ↓ Relay β†’ UI (final response) ``` ### Documentation - **Added** comprehensive v0.5.0 changelog entry - **Updated** README.md to reflect v0.5.0 architecture - Documented new endpoints - Updated data flow diagrams - Clarified Intake v0.2 changes - Corrected service descriptions ### Issues Resolved - ❌ Cortex could not retrieve context from Intake (wrong endpoint) - ❌ UI could not send messages to Relay (endpoint mismatch) - ❌ Relay could not send summaries to Intake (wrong port/endpoint) - ❌ Python package imports were implicit (missing __init__.py) ### Known Issues (Non-Critical) - Session management endpoints not implemented in Relay (`GET/POST /sessions/:id`) - RAG service currently disabled in docker-compose.yml - Cortex `/ingest` endpoint is a stub returning `{"status": "ok"}` ### Migration Notes If upgrading from v0.4.x: 1. Pull latest changes from git 2. Verify environment variables in `.env` files: - Check `INTAKE_API_URL=http://intake:7080` (not `INTAKE_API`) - Verify all service URLs use correct ports 3. Restart Docker containers: `docker-compose down && docker-compose up -d` 4. 
Test with a simple message through the UI --- ## [Infrastructure v1.0.0] - 2025-11-26 ### Changed - Environment Variable Consolidation **Major reorganization to eliminate duplication and improve maintainability** - Consolidated 9 scattered `.env` files into single source of truth architecture - Root `.env` now contains all shared infrastructure (LLM backends, databases, API keys, service URLs) - Service-specific `.env` files minimized to only essential overrides: - `cortex/.env`: Reduced from 42 to 22 lines (operational parameters only) - `neomem/.env`: Reduced from 26 to 14 lines (LLM naming conventions only) - `intake/.env`: Kept at 8 lines (already minimal) - **Result**: ~24% reduction in total configuration lines (197 β†’ ~150) **Docker Compose Consolidation** - All services now defined in single root `docker-compose.yml` - Relay service updated with complete configuration (env_file, volumes) - Removed redundant `core/docker-compose.yml` (marked as DEPRECATED) - Standardized network communication to use Docker container names **Service URL Standardization** - Internal services use container names: `http://neomem-api:7077`, `http://cortex:7081` - External services use IP addresses: `http://10.0.0.43:8000` (vLLM), `http://10.0.0.3:11434` (Ollama) - Removed IP/container name inconsistencies across files ### Added - Security & Documentation **Security Templates** - Created `.env.example` files for all services - Root `.env.example` with sanitized credentials - Service-specific templates: `cortex/.env.example`, `neomem/.env.example`, `intake/.env.example`, `rag/.env.example` - All `.env.example` files safe to commit to version control **Documentation** - `ENVIRONMENT_VARIABLES.md`: Comprehensive reference for all environment variables - Variable descriptions, defaults, and usage examples - Multi-backend LLM strategy documentation - Troubleshooting guide - Security best practices - `DEPRECATED_FILES.md`: Deletion guide for deprecated files with verification steps **Enhanced .gitignore** - Ignores all `.env` files (including subdirectories) - Tracks `.env.example` templates for documentation - Ignores `.env-backups/` directory ### Removed - `core/.env` - Redundant with root `.env`, now deleted - `core/docker-compose.yml` - Consolidated into main compose file (marked DEPRECATED) ### Fixed - Eliminated duplicate `OPENAI_API_KEY` across 5+ files - Eliminated duplicate LLM backend URLs across 4+ files - Eliminated duplicate database credentials across 3+ files - Resolved Cortex `environment:` section override in docker-compose (now uses env_file) ### Architecture - Multi-Backend LLM Strategy Root `.env` provides all backend OPTIONS (PRIMARY, SECONDARY, CLOUD, FALLBACK), services choose which to USE: - **Cortex** β†’ vLLM (PRIMARY) for autonomous reasoning - **NeoMem** β†’ Ollama (SECONDARY) + OpenAI embeddings - **Intake** β†’ vLLM (PRIMARY) for summarization - **Relay** β†’ Fallback chain with user preference Preserves per-service flexibility while eliminating URL duplication. 
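To make the pattern concrete, a hypothetical sketch of how a service might resolve its backend from the consolidated root `.env`: the root file defines the options once (`LLM_PRIMARY_URL`, `LLM_SECONDARY_URL`, `LLM_CLOUD_URL`, and, by analogy, an assumed `LLM_FALLBACK_URL`), and each service's own `.env` only names which option it uses (for example `INTAKE_LLM=SECONDARY` or `STANDARD_MODE_LLM=SECONDARY`). The actual resolution code in each service may be structured differently.

```python
# Hypothetical sketch of per-service backend resolution against the root .env.
# Variable names mirror those used elsewhere in this changelog; LLM_FALLBACK_URL
# is an assumed analogue of the documented *_URL variables.
import os

# Root .env: every backend OPTION is defined exactly once.
BACKENDS = {
    "PRIMARY": {  # vLLM on the MI50
        "url": os.getenv("LLM_PRIMARY_URL"),
        "model": os.getenv("LLM_PRIMARY_MODEL"),
    },
    "SECONDARY": {  # Ollama on the 3090
        "url": os.getenv("LLM_SECONDARY_URL"),
        "model": os.getenv("LLM_SECONDARY_MODEL"),
    },
    "CLOUD": {  # OpenAI
        "url": os.getenv("LLM_CLOUD_URL"),
        "model": os.getenv("LLM_CLOUD_MODEL"),
    },
    "FALLBACK": {  # llama.cpp CPU
        "url": os.getenv("LLM_FALLBACK_URL"),
        "model": os.getenv("LLM_FALLBACK_MODEL"),
    },
}


def resolve_backend(service_var: str, default: str = "PRIMARY") -> dict:
    """Service .env only says WHICH option to use, e.g. INTAKE_LLM=SECONDARY."""
    choice = os.getenv(service_var, default).upper()
    return BACKENDS.get(choice, BACKENDS[default])


# Example: Intake summarization picks its backend without duplicating any URLs.
intake_llm = resolve_backend("INTAKE_LLM", default="PRIMARY")
```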
### Migration - All original `.env` files backed up to `.env-backups/` with timestamp `20251126_025334` - Rollback plan documented in `ENVIRONMENT_VARIABLES.md` - Verification steps provided in `DEPRECATED_FILES.md` --- ## [0.4.x] - 2025-11-13 ### Added - Multi-Stage Reasoning Pipeline **Cortex v0.5 - Complete architectural overhaul** - **New `reasoning.py` module** - Async reasoning engine - Accepts user prompt, identity, RAG block, and reflection notes - Produces draft internal answers - Uses primary backend (vLLM) - **New `reflection.py` module** - Fully async meta-awareness layer - Produces actionable JSON "internal notes" - Enforces strict JSON schema and fallback parsing - Forces cloud backend (`backend_override="cloud"`) - **Integrated `refine.py` into pipeline** - New stage between reflection and persona - Runs exclusively on primary vLLM backend (MI50) - Produces final, internally consistent output for downstream persona layer - **Backend override system** - Each LLM call can now select its own backend - Enables multi-LLM cognition: Reflection β†’ cloud, Reasoning β†’ primary - **Identity loader** - Added `identity.py` with `load_identity()` for consistent persona retrieval - **Ingest handler** - Async stub created for future Intake β†’ NeoMem β†’ RAG pipeline **Cortex v0.4.1 - RAG Integration** - **RAG integration** - Added `rag.py` with `query_rag()` and `format_rag_block()` - Cortex now queries local RAG API (`http://10.0.0.41:7090/rag/search`) - Synthesized answers and top excerpts injected into reasoning prompt ### Changed - Unified LLM Architecture **Cortex v0.5** - **Unified LLM backend URL handling across Cortex** - ENV variables must now contain FULL API endpoints - Removed all internal path-appending (e.g. `.../v1/completions`) - `llm_router.py` rewritten to use env-provided URLs as-is - Ensures consistent behavior between draft, reflection, refine, and persona - **Rebuilt `main.py`** - Removed old annotation/analysis logic - New structure: load identity β†’ get RAG β†’ reflect β†’ reason β†’ return draft+notes - Routes now clean and minimal (`/reason`, `/ingest`, `/health`) - Async path throughout Cortex - **Refactored `llm_router.py`** - Removed old fallback logic during overrides - OpenAI requests now use `/v1/chat/completions` - Added proper OpenAI Authorization headers - Distinct payload format for vLLM vs OpenAI - Unified, correct parsing across models - **Simplified Cortex architecture** - Removed deprecated "context.py" and old reasoning code - Relay completely decoupled from smart behavior - **Updated environment specification** - `LLM_PRIMARY_URL` now set to `http://10.0.0.43:8000/v1/completions` - `LLM_SECONDARY_URL` remains `http://10.0.0.3:11434/api/generate` (Ollama) - `LLM_CLOUD_URL` set to `https://api.openai.com/v1/chat/completions` **Cortex v0.4.1** - **Revised `/reason` endpoint** - Now builds unified context blocks: [Intake] β†’ recent summaries, [RAG] β†’ contextual knowledge, [User Message] β†’ current input - Calls `call_llm()` for first pass, then `reflection_loop()` for meta-evaluation - Returns `cortex_prompt`, `draft_output`, `final_output`, and normalized reflection - **Reflection Pipeline Stability** - Cleaned parsing to normalize JSON vs. 
text reflections - Added fallback handling for malformed or non-JSON outputs - Log system improved to show raw JSON, extracted fields, and normalized summary - **Async Summarization (Intake v0.2.1)** - Intake summaries now run in background threads to avoid blocking Cortex - Summaries (L1–L∞) logged asynchronously with [BG] tags - **Environment & Networking Fixes** - Verified `.env` variables propagate correctly inside Cortex container - Confirmed Docker network connectivity between Cortex, Intake, NeoMem, and RAG - Adjusted localhost calls to service-IP mapping - **Behavioral Updates** - Cortex now performs conversation reflection (on user intent) and self-reflection (on its own answers) - RAG context successfully grounds reasoning outputs - Intake and NeoMem confirmed receiving summaries via `/add_exchange` - Log clarity pass: all reflective and contextual blocks clearly labeled ### Fixed **Cortex v0.5** - Resolved endpoint conflict where router expected base URLs and refine expected full URLs - Fixed by standardizing full-URL behavior across entire system - Reflection layer no longer fails silently (previously returned `[""]` due to MythoMax) - Resolved 404/401 errors caused by incorrect OpenAI URL endpoints - No more double-routing through vLLM during reflection - Corrected async/sync mismatch in multiple locations - Eliminated double-path bug (`/v1/completions/v1/completions`) caused by previous router logic ### Removed **Cortex v0.5** - Legacy `annotate`, `reason_check` glue logic from old architecture - Old backend probing junk code - Stale imports and unused modules leftover from previous prototype ### Verified **Cortex v0.5** - Cortex β†’ vLLM (MI50) β†’ refine β†’ final_output now functioning correctly - Refine shows `used_primary_backend: true` and no fallback - Manual curl test confirms endpoint accuracy ### Known Issues **Cortex v0.5** - Refine sometimes prefixes output with `"Final Answer:"`; next version will sanitize this - Hallucinations in draft_output persist due to weak grounding (fix in reasoning + RAG planned) **Cortex v0.4.1** - NeoMem tuning needed - improve retrieval latency and relevance - Need dedicated `/reflections/recent` endpoint for Cortex - Migrate to Cortex-first ingestion (Relay β†’ Cortex β†’ NeoMem) - Add persistent reflection recall (use prior reflections as meta-context) - Improve reflection JSON structure ("insight", "evaluation", "next_action" β†’ guaranteed fields) - Tighten temperature and prompt control for factual consistency - RAG optimization: add source ranking, filtering, multi-vector hybrid search - Cache RAG responses per session to reduce duplicate calls ### Notes **Cortex v0.5** This is the largest structural change to Cortex so far. 
It establishes: - Multi-model cognition - Clean layering - Identity + reflection separation - Correct async code - Deterministic backend routing - Predictable JSON reflection The system is now ready for: - Refinement loops - Persona-speaking layer - Containerized RAG - Long-term memory integration - True emergent-behavior experiments --- ## [0.3.x] - 2025-10-28 to 2025-09-26 ### Added **[Lyra Core v0.3.2 + Web UI v0.2.0] - 2025-10-28** - **New UI** - Cleaned up UI look and feel - **Sessions** - Sessions now persist over time - Ability to create new sessions or load sessions from previous instance - When changing session, updates what the prompt sends to relay (doesn't prompt with messages from other sessions) - Relay correctly wired in **[Lyra-Core 0.3.1] - 2025-10-09** - **NVGRAM Integration (Full Pipeline Reconnected)** - Replaced legacy Mem0 service with NVGRAM microservice (`nvgram-api` @ port 7077) - Updated `server.js` in Relay to route all memory ops via `${NVGRAM_API}/memories` and `/search` - Added `.env` variable: `NVGRAM_API=http://nvgram-api:7077` - Verified end-to-end Lyra conversation persistence: `relay β†’ nvgram-api β†’ postgres/neo4j β†’ relay β†’ ollama β†’ ui` - βœ… Memories stored, retrieved, and re-injected successfully **[Lyra-Core v0.3.0] - 2025-09-26** - **Salience filtering** in Relay - `.env` configurable: `SALIENCE_ENABLED`, `SALIENCE_MODE`, `SALIENCE_MODEL`, `SALIENCE_API_URL` - Supports `heuristic` and `llm` classification modes - LLM-based salience filter integrated with Cortex VM running `llama-server` - Logging improvements - Added debug logs for salience mode, raw LLM output, and unexpected outputs - Fail-closed behavior for unexpected LLM responses - Successfully tested with **Phi-3.5-mini** and **Qwen2-0.5B-Instruct** as salience classifiers - Verified end-to-end flow: Relay β†’ salience filter β†’ Mem0 add/search β†’ Persona injection β†’ LLM reply **[Cortex v0.3.0] - 2025-10-31** - **Cortex Service (FastAPI)** - New standalone reasoning engine (`cortex/main.py`) with endpoints: - `GET /health` – reports active backend + NeoMem status - `POST /reason` – evaluates `{prompt, response}` pairs - `POST /annotate` – experimental text analysis - Background NeoMem health monitor (5-minute interval) - **Multi-Backend Reasoning Support** - Environment-driven backend selection via `LLM_FORCE_BACKEND` - Supports: Primary (vLLM MI50), Secondary (Ollama 3090), Cloud (OpenAI), Fallback (llama.cpp CPU) - Per-backend model variables: `LLM_PRIMARY_MODEL`, `LLM_SECONDARY_MODEL`, `LLM_CLOUD_MODEL`, `LLM_FALLBACK_MODEL` - **Response Normalization Layer** - Implemented `normalize_llm_response()` to merge streamed outputs and repair malformed JSON - Handles Ollama's multi-line streaming and Mythomax's missing punctuation issues - Prints concise debug previews of merged content - **Environment Simplification** - Each service (`intake`, `cortex`, `neomem`) now maintains its own `.env` file - Removed reliance on shared/global env file to prevent cross-contamination - Verified Docker Compose networking across containers **[NeoMem 0.1.2] - 2025-10-27** (formerly NVGRAM) - **Renamed NVGRAM to NeoMem** - All future updates under name NeoMem - Features unchanged **[NVGRAM 0.1.1] - 2025-10-08** - **Async Memory Rewrite (Stability + Safety Patch)** - Introduced `AsyncMemory` class with fully asynchronous vector and graph store writes - Added input sanitation to prevent embedding errors (`'list' object has no attribute 'replace'`) - Implemented `flatten_messages()` helper in API layer 
to clean malformed payloads - Added structured request logging via `RequestLoggingMiddleware` (FastAPI middleware) - Health endpoint (`/health`) returns structured JSON `{status, version, service}` - Startup logs include sanitized embedder config with masked API keys **[NVGRAM 0.1.0] - 2025-10-07** - **Initial fork of Mem0 β†’ NVGRAM** - Created fully independent local-first memory engine based on Mem0 OSS - Renamed all internal modules, Docker services, environment variables from `mem0` β†’ `nvgram` - New service name: `nvgram-api`, default port 7077 - Maintains same API endpoints (`/memories`, `/search`) for drop-in compatibility - Uses FastAPI, Postgres, and Neo4j as persistent backends **[Lyra-Mem0 0.3.2] - 2025-10-05** - **Ollama LLM reasoning** alongside OpenAI embeddings - Introduced `LLM_PROVIDER=ollama`, `LLM_MODEL`, and `OLLAMA_HOST` in `.env.3090` - Verified local 3090 setup using `qwen2.5:7b-instruct-q4_K_M` - Split processing: Embeddings β†’ OpenAI `text-embedding-3-small`, LLM β†’ Local Ollama - Added `.env.3090` template for self-hosted inference nodes - Integrated runtime diagnostics and seeder progress tracking - File-level + message-level progress bars - Retry/back-off logic for timeouts (3 attempts) - Event logging (`ADD / UPDATE / NONE`) for every memory record - Expanded Docker health checks for Postgres, Qdrant, and Neo4j containers - Added GPU-friendly long-run configuration for continuous seeding (validated on RTX 3090) **[Lyra-Mem0 0.3.1] - 2025-10-03** - HuggingFace TEI integration (local 3090 embedder) - Dual-mode environment switch between OpenAI cloud and local - CSV export of memories from Postgres (`payload->>'data'`) **[Lyra-Mem0 0.3.0]** - **Ollama embeddings** in Mem0 OSS container - Configure `EMBEDDER_PROVIDER=ollama`, `EMBEDDER_MODEL`, `OLLAMA_HOST` via `.env` - Mounted `main.py` override from host into container to load custom `DEFAULT_CONFIG` - Installed `ollama` Python client into custom API container image - `.env.3090` file for external embedding mode (3090 machine) - Workflow for multiple embedding modes: LAN-based 3090/Ollama, Local-only CPU, OpenAI fallback **[Lyra-Mem0 v0.2.1]** - **Seeding pipeline** - Built Python seeder script to bulk-insert raw Cloud Lyra exports into Mem0 - Implemented incremental seeding option (skip existing memories, only add new ones) - Verified insert process with Postgres-backed history DB **[Intake v0.1.0] - 2025-10-27** - Receives messages from relay and summarizes them in cascading format - Continues to summarize smaller amounts of exchanges while generating large-scale conversational summaries (L20) - Currently logs summaries to .log file in `/project-lyra/intake-logs/` **[Lyra-Cortex v0.2.0] - 2025-09-26** - Integrated **llama-server** on dedicated Cortex VM (Proxmox) - Verified Phi-3.5-mini-instruct_Uncensored-Q4_K_M running with 8 vCPUs - Benchmarked Phi-3.5-mini performance: ~18 tokens/sec CPU-only on Ryzen 7 7800X - Salience classification functional but sometimes inconsistent - Tested **Qwen2-0.5B-Instruct GGUF** as alternative salience classifier - Much faster throughput (~350 tokens/sec prompt, ~100 tokens/sec eval) - More responsive but over-classifies messages as "salient" - Established `.env` integration for model ID (`SALIENCE_MODEL`), enabling hot-swap between models ### Changed **[Lyra-Core 0.3.1] - 2025-10-09** - Renamed `MEM0_URL` β†’ `NVGRAM_API` across all relay environment configs - Updated Docker Compose service dependency order - `relay` now depends on `nvgram-api` healthcheck - Removed 
`mem0` references and volumes - Minor cleanup to Persona fetch block (null-checks and safer default persona string) **[Lyra-Core v0.3.1] - 2025-09-27** - Removed salience filter logic; Cortex is now default annotator - All user messages stored in Mem0; no discard tier applied - Cortex annotations (`metadata.cortex`) now attached to memories - Debug logging improvements - Pretty-print Cortex annotations - Injected prompt preview - Memory search hit list with scores - `.env` toggle (`CORTEX_ENABLED`) to bypass Cortex when needed **[Lyra-Core v0.3.0] - 2025-09-26** - Refactored `server.js` to gate `mem.add()` calls behind salience filter - Updated `.env` to support `SALIENCE_MODEL` **[Cortex v0.3.0] - 2025-10-31** - Refactored `reason_check()` to dynamically switch between **prompt** and **chat** mode depending on backend - Enhanced startup logs to announce active backend, model, URL, and mode - Improved error handling with clearer "Reasoning error" messages **[NVGRAM 0.1.1] - 2025-10-08** - Replaced synchronous `Memory.add()` with async-safe version supporting concurrent vector + graph writes - Normalized indentation and cleaned duplicate `main.py` references - Removed redundant `FastAPI()` app reinitialization - Updated internal logging to INFO-level timing format - Deprecated `@app.on_event("startup")` β†’ will migrate to `lifespan` handler in v0.1.2 **[NVGRAM 0.1.0] - 2025-10-07** - Removed dependency on external `mem0ai` SDK β€” all logic now local - Re-pinned requirements: fastapi==0.115.8, uvicorn==0.34.0, pydantic==2.10.4, python-dotenv==1.0.1, psycopg>=3.2.8, ollama - Adjusted `docker-compose` and `.env` templates to use new NVGRAM naming **[Lyra-Mem0 0.3.2] - 2025-10-05** - Updated `main.py` configuration block to load `LLM_PROVIDER`, `LLM_MODEL`, `OLLAMA_BASE_URL` - Fallback to OpenAI if Ollama unavailable - Adjusted `docker-compose.yml` mount paths to correctly map `/app/main.py` - Normalized `.env` loading so `mem0-api` and host environment share identical values - Improved seeder logging and progress telemetry - Added explicit `temperature` field to `DEFAULT_CONFIG['llm']['config']` **[Lyra-Mem0 0.3.0]** - `docker-compose.yml` updated to mount local `main.py` and `.env.3090` - Built custom Dockerfile (`mem0-api-server:latest`) extending base image with `pip install ollama` - Updated `requirements.txt` to include `ollama` package - Adjusted Mem0 container config so `main.py` pulls environment variables with `dotenv` - Tested new embeddings path with curl `/memories` API call **[Lyra-Mem0 v0.2.1]** - Updated `main.py` to load configuration from `.env` using `dotenv` and support multiple embedder backends - Mounted host `main.py` into container so local edits persist across rebuilds - Updated `docker-compose.yml` to mount `.env.3090` and support swap between profiles - Built custom Dockerfile (`mem0-api-server:latest`) including `pip install ollama` - Updated `requirements.txt` with `ollama` dependency - Adjusted startup flow so container automatically connects to external Ollama host (LAN IP) - Added logging to confirm model pulls and embedding requests ### Fixed **[Lyra-Core 0.3.1] - 2025-10-09** - Relay startup no longer crashes when NVGRAM is unavailable β€” deferred connection handling - `/memories` POST failures no longer crash Relay; now logged gracefully as `relay error Error: memAdd failed: 500` - Improved injected prompt debugging (`DEBUG_PROMPT=true` now prints clean JSON) **[Lyra-Core v0.3.1] - 2025-09-27** - Parsing failures from Markdown-wrapped Cortex JSON via fence 
**[Cortex v0.3.0] - 2025-10-31**
- Corrected broken vLLM endpoint routing (`/v1/completions`)
- Stabilized cross-container health reporting for NeoMem
- Resolved JSON parse failures caused by streaming chunk delimiters

**[NVGRAM 0.1.1] - 2025-10-08**
- Eliminated a repeating 500 error from the OpenAI embedder caused by non-string message content
- Masked API key leaks in boot logs
- Ensured Neo4j reconnects gracefully on the first retry

**[Lyra-Mem0 0.3.2] - 2025-10-05**
- Resolved a crash during startup: `TypeError: OpenAIConfig.__init__() got an unexpected keyword argument 'ollama_base_url'`
- Corrected a mount type mismatch (file vs. directory) causing `OCI runtime create failed` errors
- Prevented duplicate or partial postings when retry logic triggered multiple concurrent requests
- "Unknown event" warnings are now safely ignored (they no longer break the seeding loop)
- Confirmed full dual-provider operation in logs (`api.openai.com` + `10.0.0.3:11434/api/chat`)

**[Lyra-Mem0 0.3.1] - 2025-10-03**
- Resolved `.env` CRLF vs. LF line-ending issues
- Local seeding is now possible via a Hugging Face server

**[Lyra-Mem0 0.3.0]**
- Resolved a container boot failure caused by the missing `ollama` dependency (`ModuleNotFoundError`)
- Fixed a config overwrite issue where rebuilding the container restored the stock `main.py`
- Worked around a Neo4j error (`vector.similarity.cosine(): mismatched vector dimensions`) by confirming OpenAI vs. Ollama embedding vector sizes

**[Lyra-Mem0 v0.2.1]**
- Seeder process originally failed on old memories — it now skips duplicates and continues the batch
- Resolved a container boot error (`ModuleNotFoundError: ollama`) by extending the image
- Fixed an overwrite issue where the stock `main.py` replaced the custom config during rebuild
- Worked around the Neo4j `vector.similarity.cosine()` dimension mismatch (see the sketch below)
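
Both dimension workarounds above reduce to one invariant: the vector length returned by the active embedder must match the dimension the Neo4j/pgvector index was built with. A minimal sketch of that sanity check, with hypothetical names (not the actual Lyra-Mem0 code); the 1536/1024 figures come from the v0.2.1 observations recorded below:

```python
def check_embedding_dim(provider: str, vector: list[float], index_dim: int) -> None:
    """Illustrative guard (not the actual Lyra-Mem0 code): fail fast before a
    write instead of surfacing as vector.similarity.cosine() errors in Neo4j."""
    if len(vector) != index_dim:
        raise ValueError(
            f"{provider} returned a {len(vector)}-dim vector but the index "
            f"expects {index_dim}-dim; rebuild the index or switch embedder models."
        )


# OpenAI's schema is 1536-dim, while mxbai-embed-large returns 1024-dim vectors
# (see the Lyra-Mem0 v0.2.1 observations below), so mixing them trips the guard.
check_embedding_dim("openai", [0.0] * 1536, 1536)                    # ok
check_embedding_dim("ollama/mxbai-embed-large", [0.0] * 1024, 1536)  # raises ValueError
```
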
### Known Issues

**[Lyra-Core v0.3.0] - 2025-09-26**
- Small models (e.g. Qwen2-0.5B) tend to over-classify as "salient"
- Phi-3.5-mini sometimes returns truncated tokens ("sali", "fi")
- CPU-only inference is functional but limited; larger models recommended once a GPU is available

**[Lyra-Cortex v0.2.0] - 2025-09-26**
- Small models tend to drift or over-classify
- CPU-only 7B+ models are expected to be slow; GPU passthrough recommended for larger models
- Need to set up a `systemd` service for `llama-server` to auto-start on VM reboot

### Observations

**[Lyra-Mem0 0.3.2] - 2025-10-05**
- Stable GPU utilization: ~8 GB VRAM @ 92% load, ≈ 67°C under sustained seeding
- Next revision will reformat seed JSON to preserve `role` context (user vs. assistant)

**[Lyra-Mem0 v0.2.1]**
- To fully unify embedding modes, a Hugging Face / local model with **1536-dim embeddings** will be needed (to match OpenAI's schema)
- The current Ollama model (`mxbai-embed-large`) works, but returns 1024-dim vectors
- Seeder workflow validated, but it should be wrapped in a repeatable weekly run for full Cloud→Local sync

### Next Steps

**[Lyra-Core 0.3.1] - 2025-10-09**
- Add salience visualization (e.g., memory weights displayed in the injected system message)
- Begin schema alignment with NVGRAM v0.1.2 for confidence scoring
- Add Relay auto-retry for transient 500 responses from NVGRAM

**[NVGRAM 0.1.1] - 2025-10-08**
- Integrate salience scoring and embedding confidence weight fields into the Postgres schema
- Begin testing with the full Lyra Relay + Persona Sidecar pipeline for live session memory recall
- Migrate from the deprecated `on_event` → `lifespan` pattern in 0.1.2

**[NVGRAM 0.1.0] - 2025-10-07**
- Integrate NVGRAM as the new default backend in Lyra Relay
- Deprecate remaining Mem0 references and archive old configs
- Begin versioning as a standalone project (`nvgram-core`, `nvgram-api`, etc.)

**[Intake v0.1.0] - 2025-10-27**
- Feed intake into NeoMem
- Generate daily/hourly overall summaries (e.g., "Today Brian and Lyra worked on x, y, and z")
- Generate session-aware summaries with their own intake hopper

---

## [0.2.x] - 2025-09-30 to 2025-09-24

### Added

**[Lyra-Mem0 v0.2.0] - 2025-09-30**
- Standalone **Lyra-Mem0** stack created at `~/lyra-mem0/`
- Includes Postgres (pgvector), Qdrant, Neo4j, and SQLite for history tracking
- Added a working `docker-compose.mem0.yml` and a custom `Dockerfile` for building the Mem0 API server
- Verified REST API functionality (see the sketch after this list)
  - `POST /memories` works for adding memories
  - `POST /search` works for semantic search
- Successful end-to-end test with a persisted memory: *"Likes coffee in the morning"* → retrievable via search ✅
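
A minimal sketch of the REST verification above, assuming the Mem0 API server is published on `http://localhost:8000` (the port and payload field names follow common Mem0 OSS examples and are not confirmed from this stack's config):

```python
import requests

BASE_URL = "http://localhost:8000"  # assumption: wherever docker-compose publishes the Mem0 API

# Add a memory, then confirm it comes back via semantic search.
requests.post(
    f"{BASE_URL}/memories",
    json={"messages": [{"role": "user", "content": "Likes coffee in the morning"}],
          "user_id": "brian"},
    timeout=30,
).raise_for_status()

hits = requests.post(
    f"{BASE_URL}/search",
    json={"query": "what does he drink in the morning?", "user_id": "brian"},
    timeout=30,
)
hits.raise_for_status()
print(hits.json())  # expect the coffee memory among the top results
```
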
**[Lyra-Core v0.2.0] - 2025-09-24**
- Migrated Relay to use the `mem0ai` SDK instead of raw fetch calls
- Implemented `sessionId` support (client-supplied, falls back to `default`)
- Added debug logs for memory add/search
- Cleaned up the Relay structure for clarity

### Changed

**[Lyra-Mem0 v0.2.0] - 2025-09-30**
- Split the architecture into modular stacks:
  - `~/lyra-core` (Relay, Persona-Sidecar, etc.)
  - `~/lyra-mem0` (Mem0 OSS memory stack)
- Removed old embedded mem0 containers from the Lyra-Core compose file
- Added a Lyra-Mem0 section to README.md

### Next Steps

**[Lyra-Mem0 v0.2.0] - 2025-09-30**
- Wire **Relay → Mem0 API** (integration not yet complete)
- Add integration tests to verify persistence and retrieval from within Lyra-Core

---

## [0.1.x] - 2025-09-25 to 2025-09-23

### Added

**[Lyra_RAG v0.1.0] - 2025-11-07**
- Initial standalone RAG module for Project Lyra
- Persistent ChromaDB vector store (`./chromadb`)
- Importer `rag_chat_import.py` with:
  - Recursive folder scanning and category tagging
  - Smart chunking (~5k chars)
  - SHA-1 deduplication and chat-ID metadata
  - Timestamp fields (`file_modified`, `imported_at`)
  - Background-safe operation (`nohup`/`tmux`)
- 68 Lyra-category chats imported:
  - 6,556 new chunks added
  - 1,493 duplicates skipped
  - 7,997 total vectors stored

**[Lyra_RAG v0.1.0 API] - 2025-11-07**
- `/rag/search` FastAPI endpoint implemented (port 7090); see the usage sketch at the end of this file
- Supports natural-language queries and returns the top related excerpts
- Added an answer synthesis step using `gpt-4o-mini`

**[Lyra-Core v0.1.0] - 2025-09-23**
- First working MVP of the **Lyra Core Relay**
- Relay service accepts `POST /v1/chat/completions` (OpenAI-compatible)
- Memory integration with Mem0:
  - `POST /memories` on each user message
  - `POST /search` before the LLM call
- Persona Sidecar integration (`GET /current`)
- OpenAI GPT + Ollama (Mythomax) support in Relay
- Simple browser-based chat UI (talks to Relay at `http://:7078`)
- `.env` standardization for Relay + Mem0 + Postgres + Neo4j
- Working Neo4j + Postgres backing stores for Mem0
- Initial MVP relay service with raw fetch calls to Mem0
- Dockerized with a basic healthcheck

**[Lyra-Cortex v0.1.0] - 2025-09-25**
- First deployment as a dedicated Proxmox VM (5 vCPU / 18 GB RAM / 100 GB SSD)
- Built **llama.cpp** with the `llama-server` target via CMake
- Integrated the **Phi-3.5 Mini Instruct (Uncensored, Q4_K_M GGUF)** model
- Verified API compatibility at `/v1/chat/completions`
- Local test successful via `curl` → ~523-token response generated
- Performance benchmark: ~11.5 tokens/sec (CPU-only on Ryzen 7800X)
- Confirmed usable for salience scoring, summarization, and lightweight reasoning

### Fixed

**[Lyra-Core v0.1.0] - 2025-09-23**
- Resolved a crash loop in Neo4j by restricting env vars (`NEO4J_AUTH` only)
- Relay now correctly reads `MEM0_URL` and `MEM0_API_KEY` from `.env`

### Verified

**[Lyra_RAG v0.1.0] - 2025-11-07**
- Successful recall of Lyra-Core development history (v0.3.0 snapshot)
- Correct metadata and category tagging for all new imports

### Known Issues

**[Lyra-Core v0.1.0] - 2025-09-23**
- No feedback loop (thumbs up/down) yet
- Forget/delete flow is manual (via memory IDs)
- Memory latency is ~1–4 s depending on the embedding model

### Next Planned

**[Lyra_RAG v0.1.0] - 2025-11-07**
- Optional `where` filter parameter for category/date queries
- Graceful "no results" handler for empty retrievals
- `rag_docs_import.py` for PDFs and other document types

---
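
For reference, a minimal sketch of querying the `/rag/search` endpoint listed under [Lyra_RAG v0.1.0 API] above, assuming it is reachable on port 7090; the JSON field names here are illustrative rather than taken from the route's actual schema:

```python
import requests

# Assumption: host and field names ("query", "n_results", "answer", "results")
# are illustrative; check the FastAPI route for the real request/response schema.
resp = requests.post(
    "http://localhost:7090/rag/search",
    json={"query": "When did Relay first integrate Mem0?", "n_results": 5},
    timeout=60,
)
resp.raise_for_status()
payload = resp.json()
print(payload.get("answer"))        # synthesized answer (gpt-4o-mini step)
for hit in payload.get("results", []):
    print(hit)                      # top related excerpts with metadata
```
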