
Project Lyra Changelog

All notable changes to Project Lyra. Format based on Keep a Changelog and Semantic Versioning.


[Unreleased]


[0.9.1] - 2025-12-29

Fixed

  • Chat auto-scrolling now works.
  • Session names no longer change to an auto-generated UID.

[0.9.0] - 2025-12-29

Added - Trilium Notes Integration

Trilium ETAPI Knowledge Base Integration

  • Trilium Tool Executor cortex/autonomy/tools/executors/trilium.py
    • search_notes(query, limit) - Search through Trilium notes via ETAPI
    • create_note(title, content, parent_note_id) - Create new notes in Trilium knowledge base
    • Full ETAPI authentication and error handling
    • Automatic parentNoteId defaulting to "root" for root-level notes
    • Connection error handling with user-friendly messages
  • Tool Registry Integration cortex/autonomy/tools/registry.py
    • Added ENABLE_TRILIUM feature flag
    • Tool definitions with schema validation
    • Provider-agnostic tool calling support
  • Setup Documentation TRILIUM_SETUP.md
    • Step-by-step ETAPI token generation guide
    • Environment configuration instructions
    • Troubleshooting section for common issues
    • Security best practices for token management
  • API Reference Documentation docs/TRILIUM_API.md
    • Complete ETAPI endpoint reference
    • Authentication and request/response examples
    • Search syntax and advanced query patterns

Environment Configuration

  • New Environment Variables .env
    • ENABLE_TRILIUM=true - Enable/disable Trilium integration
    • TRILIUM_URL=http://10.0.0.2:4292 - Trilium instance URL
    • TRILIUM_ETAPI_TOKEN - ETAPI authentication token
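
A minimal sketch of the two executor calls, assuming the stock Trilium ETAPI routes (GET /etapi/notes for search, POST /etapi/create-note) and the environment variables above; the real cortex/autonomy/tools/executors/trilium.py may structure authentication and error handling differently.

```python
# Illustrative sketch only; endpoint shapes assume the standard Trilium ETAPI.
import os
import requests

TRILIUM_URL = os.getenv("TRILIUM_URL", "http://10.0.0.2:4292")
TRILIUM_ETAPI_TOKEN = os.getenv("TRILIUM_ETAPI_TOKEN", "")
HEADERS = {"Authorization": TRILIUM_ETAPI_TOKEN}


def search_notes(query: str, limit: int = 5) -> list[dict]:
    """Search Trilium notes via ETAPI and return note summaries."""
    resp = requests.get(
        f"{TRILIUM_URL}/etapi/notes",
        headers=HEADERS,
        params={"search": query, "limit": limit},
        timeout=15,
    )
    resp.raise_for_status()
    return resp.json().get("results", [])


def create_note(title: str, content: str, parent_note_id: str = "root") -> dict:
    """Create a text note; parentNoteId defaults to 'root' for root-level notes."""
    resp = requests.post(
        f"{TRILIUM_URL}/etapi/create-note",
        headers=HEADERS,
        json={
            "parentNoteId": parent_note_id,
            "title": title,
            "type": "text",
            "content": content,
        },
        timeout=15,
    )
    resp.raise_for_status()
    return resp.json()
```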

Capabilities Unlocked

  • Personal knowledge base search during conversations
  • Automatic note creation from conversation insights
  • Cross-reference information between chat and notes
  • Context-aware responses using stored knowledge
  • Future: Find duplicates, suggest organization, summarize notes

Changed - Spelling Corrections

Module Naming

  • Renamed trillium.py to trilium.py (corrected spelling)
  • Updated all imports and references across codebase
  • Fixed environment variable names (TRILLIUM → TRILIUM)
  • Updated documentation to use correct "Trilium" spelling

[0.8.0] - 2025-12-26

Added - Tool Calling & "Show Your Work" Transparency Feature

Tool Calling System (Standard Mode)

  • Function Calling Infrastructure cortex/autonomy/tools/
    • Implemented agentic tool calling for Standard Mode with autonomous multi-step execution
    • Tool registry system with JSON schema definitions
    • Adapter pattern for provider-agnostic tool calling (OpenAI, Ollama, llama.cpp)
    • Maximum 5 iterations per request to prevent runaway loops
  • Available Tools
    • execute_code - Sandboxed Python/JavaScript/Bash execution via Docker
    • web_search - Tavily API integration for real-time web queries
    • trilium_search - Internal Trilium knowledge base queries
  • Provider Adapters cortex/autonomy/tools/adapters/
    • OpenAIAdapter - Native function calling support
    • OllamaAdapter - XML-based tool calling for local models
    • LlamaCppAdapter - XML-based tool calling for llama.cpp backend
    • Automatic tool call parsing and result formatting
  • Code Execution Sandbox cortex/autonomy/tools/code_executor.py
    • Docker-based isolated execution environment
    • Support for Python, JavaScript (Node.js), and Bash
    • 30-second timeout with automatic cleanup
    • Returns stdout, stderr, exit code, and execution time
    • Prevents filesystem access outside sandbox
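
An illustrative sketch of the sandbox pattern described above (throwaway container, 30-second timeout, stdout/stderr/exit code/time captured). Image names and resource flags are assumptions for the example, not necessarily what code_executor.py uses.

```python
# Assumed images and flags; the real sandbox configuration may differ.
import subprocess
import time

LANG_COMMANDS = {
    "python": ("python:3.11-slim", ["python", "-c"]),
    "javascript": ("node:20-slim", ["node", "-e"]),
    "bash": ("bash:5", ["bash", "-c"]),
}


def execute_code(language: str, code: str, timeout: int = 30) -> dict:
    """Run code in a throwaway Docker container and report the outcome."""
    image, runner = LANG_COMMANDS[language]
    cmd = [
        "docker", "run", "--rm",
        "--network", "none",   # assumption: no network inside the sandbox
        "--memory", "256m",    # assumption: cap memory use
        image, *runner, code,
    ]
    start = time.monotonic()
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
        return {
            "stdout": proc.stdout,
            "stderr": proc.stderr,
            "exit_code": proc.returncode,
            "execution_time": round(time.monotonic() - start, 3),
        }
    except subprocess.TimeoutExpired:
        return {"stdout": "", "stderr": f"timed out after {timeout}s",
                "exit_code": -1, "execution_time": timeout}
```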

"Show Your Work" - Real-Time Thinking Stream

  • Server-Sent Events (SSE) Streaming cortex/router.py:478-527
    • New /stream/thinking/{session_id} endpoint for real-time event streaming
    • Broadcasts internal thinking process during tool calling operations
    • 30-second keepalive with automatic reconnection support
    • Events: connected, thinking, tool_call, tool_result, done, error (endpoint sketch after this list)
  • Stream Manager cortex/autonomy/tools/stream_events.py
    • Pub/sub system for managing SSE subscriptions per session
    • Multiple clients can connect to same session stream
    • Automatic cleanup of dead queues and closed connections
    • Zero overhead when no subscribers active
  • FunctionCaller Integration cortex/autonomy/tools/function_caller.py
    • Enhanced with event emission at each step:
      • "thinking" events before each LLM call
      • "tool_call" events when invoking tools
      • "tool_result" events after tool execution
      • "done" event with final answer
      • "error" events on failures
    • Session-aware streaming (only emits when subscribers exist)
    • Provider-agnostic implementation works with all backends
  • Thinking Stream UI core/ui/thinking-stream.html
    • Dedicated popup window for real-time thinking visualization
    • Color-coded events: green (thinking), orange (tool calls), blue (results), purple (done), red (errors)
    • Auto-scrolling event feed with animations
    • Connection status indicator with green/red dot
    • Clear events button and session info display
    • Mobile-friendly responsive design
  • UI Integration core/ui/index.html
    • "🧠 Show Work" button in session selector
    • Opens thinking stream in popup window
    • Session ID passed via URL parameter for stream association
    • Purple/violet button styling to match cyberpunk theme
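
A self-contained sketch of an SSE endpoint shaped like /stream/thinking/{session_id}, with the initial connected event, 30-second keepalive, and cleanup on disconnect; the module-level queue registry here stands in for the real ToolStreamManager.

```python
# Sketch only; see cortex/router.py:478-527 for the actual endpoint.
import asyncio
import json
from collections import defaultdict

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()
_subscribers: dict[str, set[asyncio.Queue]] = defaultdict(set)  # session_id -> queues


@app.get("/stream/thinking/{session_id}")
async def stream_thinking(session_id: str):
    queue: asyncio.Queue = asyncio.Queue()
    _subscribers[session_id].add(queue)

    async def event_generator():
        # Tell the client the stream is live before any tool events arrive.
        yield f"data: {json.dumps({'type': 'connected', 'session_id': session_id})}\n\n"
        try:
            while True:
                try:
                    event = await asyncio.wait_for(queue.get(), timeout=30)
                except asyncio.TimeoutError:
                    yield ": keepalive\n\n"   # comment frame keeps the connection open
                    continue
                yield f"data: {json.dumps(event)}\n\n"
                if event.get("type") in ("done", "error"):
                    break
        finally:
            _subscribers[session_id].discard(queue)

    return StreamingResponse(event_generator(), media_type="text/event-stream")
```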

Tool Calling Configuration

  • Environment Variables .env
    • STANDARD_MODE_ENABLE_TOOLS=true - Enable/disable tool calling
    • TAVILY_API_KEY - API key for web search tool
    • TRILLIUM_API_URL - URL for Trillium knowledge base
  • Standard Mode Tools Toggle cortex/router.py:389-470
    • /simple endpoint checks STANDARD_MODE_ENABLE_TOOLS environment variable
    • Falls back to non-tool mode if disabled
    • Logs tool usage statistics (iterations, tools used)

Changed - CORS & Architecture

CORS Support for SSE

  • Added CORS Middleware cortex/main.py
    • FastAPI CORSMiddleware with wildcard origins for development
    • Allows cross-origin SSE connections from nginx UI (port 8081) to cortex (port 7081)
    • Credentials support enabled for authenticated requests
    • All methods and headers permitted

Tool Calling Pipeline

  • Standard Mode Enhancement cortex/router.py:389-470
    • /simple endpoint now supports optional tool calling
    • Multi-iteration agentic loop with LLM + tool execution
    • Tool results injected back into conversation for next iteration
    • Graceful degradation to non-tool mode if tools disabled

JSON Response Formatting

  • SSE Event Structure cortex/router.py:497-499
    • Fixed initial "connected" event to use proper JSON serialization
    • Changed from f-string with nested quotes to json.dumps()
    • Ensures valid JSON for all event types
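
A small illustration of the serialization fix: the f-string produces Python-style quoting that is not valid JSON, while json.dumps() is.

```python
import json

session_id = "demo-session"

# Broken: single quotes are not valid JSON, so EventSource clients reject the event.
bad = f"data: {{'type': 'connected', 'session_id': '{session_id}'}}\n\n"

# Fixed: json.dumps() guarantees valid JSON for every event payload.
good = "data: " + json.dumps({"type": "connected", "session_id": session_id}) + "\n\n"

print(bad)
print(good)
```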

Fixed - Critical JavaScript & SSE Issues

JavaScript Variable Scoping Bug

  • Root cause: eventSource variable used before declaration in thinking-stream.html:218
  • Symptom: Uncaught ReferenceError: can't access lexical declaration 'eventSource' before initialization
  • Solution: Moved variable declarations before connectStream() call
  • Impact: Thinking stream page now loads without errors and establishes SSE connection

SSE Connection Not Establishing

  • Root cause: CORS blocked cross-origin SSE requests from nginx (8081) to cortex (7081)
  • Symptom: Browser silently blocked EventSource connection, no errors in console
  • Solution: Added CORSMiddleware to cortex FastAPI app
  • Impact: SSE streams now connect successfully across ports

Invalid JSON in SSE Events

  • Root cause: Initial "connected" event used f-string with nested quotes: f"data: {{'type': 'connected', 'session_id': '{session_id}'}}\n\n"
  • Symptom: Browser couldn't parse malformed JSON, connection appeared stuck on "Connecting..."
  • Solution: Used json.dumps() for proper JSON serialization
  • Impact: Connected event now parsed correctly, status updates to green dot

Technical Improvements

Agentic Architecture

  • Multi-iteration reasoning loop with tool execution
  • Provider-agnostic tool calling via adapter pattern
  • Automatic tool result injection into conversation context
  • Iteration limits to prevent infinite loops
  • Comprehensive logging at each step

Event Streaming Performance

  • Zero overhead when no subscribers (check before emit)
  • Efficient pub/sub with asyncio queues
  • Automatic cleanup of disconnected clients
  • 30-second keepalive prevents timeout issues
  • Session-isolated streams prevent cross-talk
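
A sketch of the pub/sub shape described above, with the check-before-emit guard that keeps overhead at zero when nothing is subscribed; names and details are illustrative rather than a copy of stream_events.py.

```python
import asyncio
from collections import defaultdict


class ToolStreamManager:
    """Per-session fan-out of tool-calling events to SSE subscribers (sketch)."""

    def __init__(self) -> None:
        self._queues: dict[str, set[asyncio.Queue]] = defaultdict(set)

    def subscribe(self, session_id: str) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue()
        self._queues[session_id].add(queue)
        return queue

    def unsubscribe(self, session_id: str, queue: asyncio.Queue) -> None:
        self._queues[session_id].discard(queue)
        if not self._queues[session_id]:
            del self._queues[session_id]   # drop empty sessions entirely

    def has_subscribers(self, session_id: str) -> bool:
        return bool(self._queues.get(session_id))

    async def emit(self, session_id: str, event: dict) -> None:
        # Check before emit: zero overhead when nobody is listening.
        if not self.has_subscribers(session_id):
            return
        for queue in list(self._queues[session_id]):
            queue.put_nowait(event)
```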

Code Quality

  • Clean separation: tool execution, adapters, streaming, UI
  • Comprehensive error handling with fallbacks
  • Detailed logging for debugging tool calls
  • Type hints and docstrings throughout
  • Modular design for easy extension

Security

  • Sandboxed code execution prevents filesystem access
  • Timeout limits prevent resource exhaustion
  • Docker isolation for untrusted code
  • No code execution without explicit user request

Architecture - Tool Calling Flow

Standard Mode with Tools:

User (UI) → Relay → Cortex /simple
  ↓
  Check STANDARD_MODE_ENABLE_TOOLS
  ↓
  LLM generates tool call → FunctionCaller
  ↓
  Execute tool (Docker sandbox / API call)
  ↓
  Inject result → LLM (next iteration)
  ↓
  Repeat until done or max iterations
  ↓
  Return final answer → UI
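
The same flow as a condensed, illustrative coroutine. The adapter and registry objects and their methods are assumed interfaces for the sketch, not the actual FunctionCaller API.

```python
# Sketch of the agentic loop: max 5 iterations, tool results injected back
# into the conversation before the next LLM call.
MAX_ITERATIONS = 5


async def run_with_tools(messages: list[dict], adapter, registry) -> str:
    for _ in range(MAX_ITERATIONS):
        reply = await adapter.chat(messages, tools=registry.definitions())
        tool_calls = adapter.parse_tool_calls(reply)
        if not tool_calls:
            return reply.content   # model answered directly: done
        for call in tool_calls:
            result = await registry.execute(call.name, call.arguments)
            messages.append({"role": "assistant", "content": "", "tool_calls": [call]})
            messages.append({"role": "tool", "name": call.name, "content": str(result)})
    return "Stopped after reaching the iteration limit."
```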

Thinking Stream Flow:

Browser → nginx:8081 → thinking-stream.html
  ↓
EventSource connects to cortex:7081/stream/thinking/{session_id}
  ↓
ToolStreamManager.subscribe(session_id) → asyncio.Queue
  ↓
User sends message → /simple endpoint
  ↓
FunctionCaller emits events:
  - emit("thinking") → Queue → SSE → Browser
  - emit("tool_call") → Queue → SSE → Browser
  - emit("tool_result") → Queue → SSE → Browser
  - emit("done") → Queue → SSE → Browser
  ↓
Browser displays color-coded events in real-time

Documentation

  • Added THINKING_STREAM.md - Complete guide to "Show Your Work" feature
    • Usage examples with curl
    • Event type reference
    • Architecture diagrams
    • Demo page instructions
  • Added UI_THINKING_STREAM.md - UI integration documentation
    • Button placement and styling
    • Popup window behavior
    • Session association logic

Known Limitations

Tool Calling:

  • Limited to 5 iterations per request (prevents runaway loops)
  • Python sandbox has no filesystem persistence (temporary only)
  • Web search requires a Tavily API key (the free tier is not unlimited)
  • Trillium search requires separate knowledge base setup

Thinking Stream:

  • CORS wildcard (*) is development-only (should restrict in production)
  • Stream ends after "done" event (must reconnect for new request)
  • No historical replay (only shows real-time events)
  • Single session per stream window

Migration Notes

For Users Upgrading:

  1. New environment variable: STANDARD_MODE_ENABLE_TOOLS=true (default: enabled)
  2. Thinking stream accessible via "🧠 Show Work" button in UI
  3. Tool calling works automatically in Standard Mode when enabled
  4. No changes required to existing Standard Mode usage

For Developers:

  1. Cortex now includes CORS middleware for SSE
  2. New /stream/thinking/{session_id} endpoint available
  3. FunctionCaller requires session_id parameter for streaming
  4. Tool adapters can be extended by adding to AVAILABLE_TOOLS registry

[0.7.0] - 2025-12-21

Added - Standard Mode & UI Enhancements

Standard Mode Implementation

  • Added "Standard Mode" chat option that bypasses complex cortex reasoning pipeline
    • Provides simple chatbot functionality for coding and practical tasks
    • Maintains full conversation context across messages
    • Backend-agnostic - works with SECONDARY (Ollama), OPENAI, or custom backends
    • Created /simple endpoint in Cortex router cortex/router.py:389
  • Mode selector in UI with toggle between Standard and Cortex modes
    • Standard Mode: Direct LLM chat with context retention
    • Cortex Mode: Full 7-stage reasoning pipeline (unchanged)

Backend Selection System

  • UI settings modal with LLM backend selection for Standard Mode
    • Radio button selector: SECONDARY (Ollama/Qwen), OPENAI (GPT-4o-mini), or custom
    • Backend preference persisted in localStorage
    • Custom backend text input for advanced users
  • Backend parameter routing through entire stack:
    • UI sends backend parameter in request body
    • Relay forwards backend selection to Cortex
    • Cortex /simple endpoint respects user's backend choice
  • Environment-based fallback: Uses STANDARD_MODE_LLM if no backend specified

Session Management Overhaul

  • Complete rewrite of session system to use server-side persistence
    • File-based storage in core/relay/sessions/ directory
    • Session files: {sessionId}.json for history, {sessionId}.meta.json for metadata
    • Server is source of truth - sessions sync across browsers and reboots
  • Session metadata system for friendly names
    • Sessions display custom names instead of random IDs
    • Rename functionality in session dropdown
    • Last modified timestamps and message counts
  • Full CRUD API for sessions in Relay (usage sketch after this list):
    • GET /sessions - List all sessions with metadata
    • GET /sessions/:id - Retrieve session history
    • POST /sessions/:id - Save session history
    • PATCH /sessions/:id/metadata - Update session name/metadata
    • DELETE /sessions/:id - Delete session and metadata
  • Session management UI in settings modal:
    • List of all sessions with message counts and timestamps
    • Delete button for each session with confirmation
    • Automatic session cleanup when deleting current session
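
Example client calls against the session API above, assuming Relay listens on port 7078; the request body shapes (history, name) are illustrative, not a guaranteed schema.

```python
import requests

RELAY = "http://localhost:7078"   # adjust host/port to your deployment
sid = "demo-session"

# Save history for a session (body shape assumed for illustration).
requests.post(f"{RELAY}/sessions/{sid}", json={"history": [
    {"role": "user", "content": "hello"},
    {"role": "assistant", "content": "hi there"},
]})

# Rename the session via its metadata.
requests.patch(f"{RELAY}/sessions/{sid}/metadata", json={"name": "Demo chat"})

# List all sessions, fetch this one, then delete it.
print(requests.get(f"{RELAY}/sessions").json())
print(requests.get(f"{RELAY}/sessions/{sid}").json())
requests.delete(f"{RELAY}/sessions/{sid}")
```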

UI Improvements

  • Settings modal with hamburger menu (⚙ Settings button)
    • Backend selection section for Standard Mode
    • Session management section with delete functionality
    • Clean modal overlay with cyberpunk theme
    • ESC key and click-outside to close
  • Light/Dark mode toggle with dark mode as default
    • Theme preference persisted in localStorage
    • CSS variables for seamless theme switching
    • Toggle button shows current mode (🌙 Dark Mode / ☀️ Light Mode)
  • Removed redundant model selector dropdown from header
  • Fixed modal positioning and z-index layering
    • Modal moved outside #chat container for proper rendering
    • Fixed z-index: overlay (999), modal content (1001)
    • Centered modal with proper backdrop blur

Context Retention for Standard Mode

  • Integration with Intake module for conversation history
    • Added get_recent_messages() function in intake.py
    • Standard Mode retrieves last 20 messages from session buffer
    • Full context sent to LLM on each request
  • Message array format support in LLM router:
    • Updated Ollama provider to accept messages parameter
    • Updated OpenAI provider to accept messages parameter
    • Automatic conversion from messages to prompt string for non-chat APIs

Changed - Architecture & Routing

Relay Server Updates core/relay/server.js

  • ES module migration for session persistence:
    • Imported fs/promises, path, fileURLToPath for file operations
    • Created SESSIONS_DIR constant for session storage location
  • Mode-based routing in both /chat and /v1/chat/completions endpoints:
    • Extracts mode parameter from request body (default: "cortex")
    • Routes to CORTEX_SIMPLE for Standard Mode, CORTEX_REASON for Cortex Mode
    • Backend parameter only used in Standard Mode
  • Session persistence functions:
    • ensureSessionsDir() - Creates sessions directory if needed
    • loadSession(sessionId) - Reads session history from file
    • saveSession(sessionId, history, metadata) - Writes session to file
    • loadSessionMetadata(sessionId) - Reads session metadata
    • saveSessionMetadata(sessionId, metadata) - Updates session metadata
    • listSessions() - Returns all sessions with metadata, sorted by last modified
    • deleteSession(sessionId) - Removes session and metadata files

Cortex Router Updates cortex/router.py

  • Added backend field to ReasonRequest Pydantic model (optional)
  • Created /simple endpoint for Standard Mode:
    • Bypasses reflection, reasoning, refinement stages
    • Direct LLM call with conversation context
    • Uses backend from request or falls back to STANDARD_MODE_LLM env variable
    • Returns simple response structure without reasoning artifacts
  • Backend selection logic in /simple:
    • Normalizes backend names to uppercase
    • Maps UI backend names to system backend names
    • Validates backend availability before calling

Intake Integration cortex/intake/intake.py

  • Added get_recent_messages(session_id, limit) function:
    • Retrieves last N messages from session buffer
    • Returns empty list if session doesn't exist
    • Used by /simple endpoint for context retrieval
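
A self-contained sketch of the helper's behavior; the real get_recent_messages() reads from the shared SESSIONS buffer in cortex/intake/intake.py rather than the stand-in dict used here.

```python
SESSIONS: dict[str, list[dict]] = {}   # stand-in: session_id -> [{"role", "content"}, ...]


def get_recent_messages(session_id: str, limit: int = 20) -> list[dict]:
    """Return up to the last `limit` messages for a session, or [] if unknown."""
    return SESSIONS.get(session_id, [])[-limit:]


# Example: Standard Mode pulls up to 20 recent messages for LLM context.
SESSIONS["demo"] = [
    {"role": "user", "content": "hi"},
    {"role": "assistant", "content": "hello"},
]
print(get_recent_messages("demo", limit=20))
```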

LLM Router Enhancements cortex/llm/llm_router.py

  • Added messages parameter support across all providers
  • Automatic message-to-prompt conversion for legacy APIs
  • Chat completion format for Ollama and OpenAI providers
  • Stop sequences for MI50/DeepSeek R1 to prevent runaway generation:
    • "User:", "\nUser:", "Assistant:", "\n\n\n"

Environment Configuration .env

  • Added STANDARD_MODE_LLM=SECONDARY for default Standard Mode backend
  • Added CORTEX_SIMPLE_URL=http://cortex:7081/simple for routing

UI Architecture core/ui/index.html

  • Server-based session loading system:
    • loadSessionsFromServer() - Fetches sessions from Relay API
    • renderSessions() - Populates session dropdown from server data
    • Session state synchronized with server on every change
  • Backend selection persistence:
    • Loads saved backend from localStorage on page load
    • Includes backend parameter in request body when in Standard Mode
    • Settings modal pre-selects current backend choice
  • Dark mode by default:
    • Checks localStorage for theme preference
    • Sets dark theme if no preference found
    • Toggle button updates localStorage and applies theme

CSS Styling core/ui/style.css

  • Light mode CSS variables:
    • --bg-dark: #f5f5f5 (light background)
    • --text-main: #1a1a1a (dark text)
    • --text-fade: #666 (dimmed text)
  • Dark mode CSS variables (default):
    • --bg-dark: #0a0a0a (dark background)
    • --text-main: #e6e6e6 (light text)
    • --text-fade: #999 (dimmed text)
  • Modal positioning fixes:
    • position: fixed with top: 50%, left: 50%, transform: translate(-50%, -50%)
    • Z-index layering: overlay (999), content (1001)
    • Backdrop blur effect on modal overlay
  • Session list styling:
    • Session item cards with hover effects
    • Delete button with red hover state
    • Message count and timestamp display

Fixed - Critical Issues

DeepSeek R1 Runaway Generation

  • Root cause: R1 reasoning model generates thinking process and hallucinates conversations
  • Solution:
    • Changed STANDARD_MODE_LLM to SECONDARY (Ollama/Qwen) instead of PRIMARY (MI50/R1)
    • Added stop sequences to MI50 provider to prevent continuation
    • Documented R1 limitations for Standard Mode usage

Context Not Maintained in Standard Mode

  • Root cause: /simple endpoint didn't retrieve conversation history from Intake
  • Solution:
    • Created get_recent_messages() function in intake.py
    • Standard Mode now pulls last 20 messages from session buffer
    • Full context sent to LLM with each request
  • User feedback: "it's saying it hasn't received any other messages from me, so it looks like the standard mode llm isn't getting the full chat"

OpenAI Backend 400 Errors

  • Root cause: OpenAI provider only accepted prompt strings, not messages arrays
  • Solution: Updated OpenAI provider to support messages parameter like Ollama
  • Now handles chat completion format correctly

Modal Formatting Issues

  • Root cause: Settings modal inside #chat container with overflow constraints
  • Symptoms: Modal appearing at bottom, jumbled layout, couldn't close
  • Solution:
    • Moved modal outside #chat container to be direct child of body
    • Changed positioning from absolute to fixed
    • Added proper z-index layering (overlay: 999, content: 1001)
    • Removed old model selector from header
  • User feedback: "the formating for the settings is all off. Its at the bottom and all jumbling together, i cant get it to go away"

Session Persistence Broken

  • Root cause: Sessions stored only in localStorage, not synced with server
  • Symptoms: Sessions didn't persist across browsers or reboots, couldn't load messages
  • Solution: Complete rewrite of session system
    • Implemented server-side file persistence in Relay
    • Created CRUD API endpoints for session management
    • Updated UI to load sessions from server instead of localStorage
    • Added metadata system for session names
    • Sessions now survive container restarts and sync across browsers
  • User feedback: "sessions seem to exist locally only, i cant get them to actually load any messages and there is now way to delete them. If i open the ui in a different browser those arent there."

Technical Improvements

Backward Compatibility

  • All changes include defaults to maintain existing behavior
  • Cortex Mode completely unchanged - still uses full 7-stage pipeline
  • Standard Mode is opt-in via UI mode selector
  • If no backend specified, falls back to STANDARD_MODE_LLM env variable
  • Existing requests without mode parameter default to "cortex"

Code Quality

  • Consistent async/await patterns throughout stack
  • Proper error handling with fallbacks
  • Clean separation between Standard and Cortex modes
  • Session persistence abstracted into helper functions
  • Modular UI code with clear event handlers

Performance

  • Standard Mode bypasses 6 of 7 reasoning stages for faster responses
  • Session loading optimized with file-based caching
  • Backend selection happens once per message, not per LLM call
  • Minimal overhead for mode detection and routing

Architecture - Dual-Mode Chat System

Standard Mode Flow:

User (UI) → Relay → Cortex /simple → Intake (get_recent_messages)
→ LLM (direct call with context) → Relay → UI

Cortex Mode Flow (Unchanged):

User (UI) → Relay → Cortex /reason → Reflection → Reasoning
→ Refinement → Persona → Relay → UI

Session Persistence:

UI → POST /sessions/:id → Relay → File system (sessions/*.json)
UI → GET /sessions → Relay → List all sessions → UI dropdown

Known Limitations

Standard Mode:

  • No reflection, reasoning, or refinement stages
  • No RAG integration (same as Cortex Mode - currently disabled)
  • No NeoMem memory storage (same as Cortex Mode - currently disabled)
  • DeepSeek R1 not recommended for Standard Mode (generates reasoning artifacts)

Session Management:

  • Sessions stored in container filesystem - need volume mount for true persistence
  • No session import/export functionality yet
  • No session search or filtering

Migration Notes

For Users Upgrading:

  1. Existing sessions in localStorage will not automatically migrate to server
  2. Create new sessions after upgrade for server-side persistence
  3. Theme preference (light/dark) will be preserved from localStorage
  4. Backend preference will default to SECONDARY if not previously set

For Developers:

  1. Relay now requires fs/promises for session persistence
  2. Cortex /simple endpoint expects backend parameter (optional)
  3. UI sends mode and backend parameters in request body
  4. Session files stored in core/relay/sessions/ directory

[0.6.0] - 2025-12-18

Added - Autonomy System (Phase 1 & 2)

Autonomy Phase 1 - Self-Awareness & Planning Foundation

Autonomy Phase 2 - Decision Making & Proactive Behavior

Autonomy Phase 2.5 - Pipeline Refinement

  • Tightened integration between autonomy modules and reasoning pipeline
  • Enhanced self-state persistence and tracking
  • Improved orchestrator reliability
  • NeoMem integration refinements in vector store handling neomem/neomem/vector_stores/qdrant.py

Added - Documentation

  • Complete AI Agent Breakdown docs/PROJECT_LYRA_COMPLETE_BREAKDOWN.md
    • Comprehensive system architecture documentation
    • Detailed component descriptions
    • Data flow diagrams
    • Integration points and API specifications

Changed - Core Integration

  • Router Updates cortex/router.py
    • Integrated autonomy subsystems into main routing logic
    • Added endpoints for autonomous decision-making
    • Enhanced state management across requests
  • Reasoning Pipeline cortex/reasoning/reasoning.py
    • Integrated autonomy-aware reasoning
    • Self-state consideration in reasoning process
  • Persona Layer cortex/persona/speak.py
    • Autonomy-aware response generation
    • Self-state reflection in personality expression
  • Context Handling cortex/context.py
    • NeoMem disable capability for flexible deployment

Changed - Development Environment

Technical Improvements

  • Modular autonomy architecture with clear separation of concerns
  • Test-driven development for new autonomy features
  • Enhanced state persistence across system restarts
  • Flexible NeoMem integration with enable/disable controls

Architecture - Autonomy System Design

The autonomy system operates in layers:

  1. Executive Layer - High-level planning and goal setting
  2. Decision Layer - Evaluates options and makes choices
  3. Action Layer - Executes autonomous decisions
  4. Learning Layer - Adapts behavior based on patterns
  5. Monitoring Layer - Proactive awareness of system state

All layers coordinate through the orchestrator and maintain state in self_state.json.


[0.5.2] - 2025-12-12

Fixed - LLM Router & Async HTTP

  • Critical: Replaced synchronous requests with async httpx in LLM router cortex/llm/llm_router.py
    • Event loop blocking was causing timeouts and empty responses
    • All three providers (MI50, Ollama, OpenAI) now use await http_client.post()
    • Fixes "Expecting value: line 1 column 1 (char 0)" JSON parsing errors in intake
  • Critical: Fixed missing backend parameter in intake summarization cortex/intake/intake.py:285
    • Was defaulting to PRIMARY (MI50) instead of respecting INTAKE_LLM=SECONDARY
    • Now correctly uses configured backend (Ollama on 3090)
  • Relay: Fixed session ID case mismatch core/relay/server.js:87
    • UI sends sessionId (camelCase) but relay expected session_id (snake_case)
    • Now accepts both variants: req.body.session_id || req.body.sessionId
    • Custom session IDs now properly tracked instead of defaulting to "default"

Added - Error Handling & Diagnostics

  • Added comprehensive error handling in LLM router for all providers
    • HTTPError, JSONDecodeError, KeyError, and generic Exception handling
    • Detailed error messages with exception type and description
    • Provider-specific error logging (mi50, ollama, openai)
  • Added debug logging in intake summarization
    • Logs LLM response length and preview
    • Validates non-empty responses before JSON parsing
    • Helps diagnose empty or malformed responses
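
A sketch of the async call-and-error-handling pattern described in this release, assuming httpx and the 120-second timeout noted later in this entry; payload and response field names vary by provider and are illustrative here.

```python
import json

import httpx

http_client = httpx.AsyncClient(timeout=120)   # consistent timeout across providers


async def call_provider(url: str, payload: dict, provider: str) -> str:
    try:
        resp = await http_client.post(url, json=payload)
        resp.raise_for_status()
        data = resp.json()
        return data["response"]   # field name differs per provider; assumed here
    except httpx.HTTPError as exc:
        return f"[{provider}] HTTP error: {type(exc).__name__}: {exc}"
    except json.JSONDecodeError as exc:
        return f"[{provider}] invalid JSON in response: {exc}"
    except KeyError as exc:
        return f"[{provider}] unexpected response shape, missing {exc}"
    except Exception as exc:   # last-resort catch so the pipeline never crashes
        return f"[{provider}] unexpected error: {type(exc).__name__}: {exc}"
```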

Added - Session Management

  • Added session persistence endpoints in relay core/relay/server.js:160-171
    • GET /sessions/:id - Retrieve session history
    • POST /sessions/:id - Save session history
    • In-memory storage using Map (ephemeral, resets on container restart)
    • Fixes UI "Failed to load session" errors

Changed - Provider Configuration

  • Added mi50 provider support for llama.cpp server cortex/llm/llm_router.py:62-81
    • Uses /completion endpoint with n_predict parameter
    • Extracts content field from response
    • Configured for MI50 GPU with DeepSeek model
  • Increased memory retrieval threshold from 0.78 to 0.90 cortex/.env:20
    • Filters out low-relevance memories (only returns 90%+ similarity)
    • Reduces noise in context retrieval

Technical Improvements

  • Unified async HTTP handling across all LLM providers
  • Better separation of concerns between provider implementations
  • Improved error messages for debugging LLM API failures
  • Consistent timeout handling (120 seconds for all providers)

[0.5.1] - 2025-12-11

Fixed - Intake Integration

  • Critical: Fixed bg_summarize() function not defined error
    • Was only a TYPE_CHECKING stub, now implemented as logging stub
    • Eliminated NameError preventing SESSIONS from persisting correctly
    • Function now logs exchange additions and defers summarization to /reason endpoint
  • Critical: Fixed /ingest endpoint unreachable code in router.py:201-233
    • Removed early return that prevented update_last_assistant_message() from executing
    • Removed duplicate add_exchange_internal() call
    • Implemented lenient error handling (each operation wrapped in try/except)
  • Intake: Added missing __init__.py to make intake a proper Python package cortex/intake/__init__.py
    • Prevents namespace package issues
    • Enables proper module imports
    • Exports SESSIONS, add_exchange_internal, summarize_context

Added - Diagnostics & Debugging

  • Added diagnostic logging to verify SESSIONS singleton behavior
  • Added /debug/sessions HTTP endpoint router.py:276-305
    • Inspect SESSIONS from within running Uvicorn worker
    • Shows total sessions, session count, buffer sizes, recent exchanges
    • Returns SESSIONS object ID for verification
  • Added /debug/summary HTTP endpoint router.py:238-271
    • Test summarize_context() for any session
    • Returns L1/L5/L10/L20/L30 summaries
    • Includes buffer size and exchange preview

Changed - Intake Architecture

  • Intake no longer standalone service - runs inside Cortex container as pure Python module
    • Imported as from intake.intake import add_exchange_internal, SESSIONS
    • No HTTP calls between Cortex and Intake
    • Eliminates network latency and dependency on Intake service being up
  • Deferred summarization: bg_summarize() is now a no-op stub intake.py:318-325
    • Actual summarization happens during /reason call via summarize_context()
    • Simplifies async/sync complexity
    • Prevents NameError when called from add_exchange_internal()
  • Lenient error handling: /ingest endpoint always returns success router.py:201-233
    • Each operation wrapped in try/except
    • Logs errors but never fails to avoid breaking chat pipeline
    • User requirement: never fail chat pipeline
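
A sketch of the lenient /ingest pattern: each step wrapped in try/except, errors logged, and a success response returned regardless. The two helpers are stand-ins for the functions named in this entry; see cortex/router.py:201-233 for the real handler.

```python
import logging

from fastapi import APIRouter, Request

logger = logging.getLogger("cortex.ingest")
router = APIRouter()


def add_exchange_internal(payload: dict) -> None:
    """Stand-in for the intake helper named in this changelog."""


def update_last_assistant_message(payload: dict) -> None:
    """Stand-in for the follow-up step named in this changelog."""


@router.post("/ingest")
async def ingest(request: Request) -> dict:
    payload = await request.json()
    try:
        add_exchange_internal(payload)
    except Exception as exc:
        logger.error("ingest: add_exchange_internal failed: %s", exc)
    try:
        update_last_assistant_message(payload)
    except Exception as exc:
        logger.error("ingest: update_last_assistant_message failed: %s", exc)
    return {"status": "ok"}   # lenient: the chat pipeline never sees a failure here
```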

Documentation

  • Added single-worker constraint note in cortex/Dockerfile:7-8
    • Documents that SESSIONS requires single Uvicorn worker
    • Notes that multi-worker scaling requires Redis or shared storage
  • Updated plan documentation with root cause analysis

[0.5.0] - 2025-11-28

Fixed - Critical API Wiring & Integration

After the major architectural rewire (v0.4.x), this release fixes all critical endpoint mismatches and ensures end-to-end system connectivity.

Cortex → Intake Integration

  • Fixed IntakeClient to use correct Intake v0.2 API endpoints
    • Changed GET /context/{session_id} → GET /summaries?session_id={session_id}
    • Updated JSON response parsing to extract summary_text field
    • Fixed environment variable name: INTAKE_API → INTAKE_API_URL
    • Corrected default port: 7083 → 7080
    • Added deprecation warning to summarize_turn() method (endpoint removed in Intake v0.2)

Relay → UI Compatibility

  • Added OpenAI-compatible endpoint POST /v1/chat/completions
    • Accepts standard OpenAI format with messages[] array
    • Returns OpenAI-compatible response structure with choices[]
    • Extracts last message content from messages array
    • Includes usage metadata (stub values for compatibility)
  • Refactored Relay to use shared handleChatRequest() function
    • Both /chat and /v1/chat/completions use same core logic
    • Eliminates code duplication
    • Consistent error handling across endpoints
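
An example request against the OpenAI-compatible endpoint described above, assuming Relay listens on port 7078 (host and port may differ per deployment).

```python
import requests

resp = requests.post(
    "http://localhost:7078/v1/chat/completions",
    json={
        "messages": [
            {"role": "user", "content": "Summarize what changed in v0.5.0."}
        ]
    },
    timeout=120,
)
data = resp.json()
print(data["choices"][0]["message"]["content"])   # OpenAI-compatible response shape
```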

Relay → Intake Connection

  • Fixed Intake URL fallback in Relay server configuration
    • Corrected port: 7082 → 7080
    • Updated endpoint: /summary → /add_exchange
    • Now properly sends exchanges to Intake for summarization

Code Quality & Python Package Structure

  • Added missing __init__.py files to all Cortex subdirectories
    • cortex/llm/__init__.py
    • cortex/reasoning/__init__.py
    • cortex/persona/__init__.py
    • cortex/ingest/__init__.py
    • cortex/utils/__init__.py
    • Improves package imports and IDE support
  • Removed unused import in cortex/router.py: from unittest import result
  • Deleted empty file cortex/llm/resolve_llm_url.py (was 0 bytes, never implemented)

Verified Working

Complete end-to-end message flow now operational:

UI → Relay (/v1/chat/completions)
  ↓
Relay → Cortex (/reason)
  ↓
Cortex → Intake (/summaries) [retrieves context]
  ↓
Cortex 4-stage pipeline:
  1. reflection.py → meta-awareness notes
  2. reasoning.py → draft answer
  3. refine.py → polished answer
  4. persona/speak.py → Lyra personality
  ↓
Cortex → Relay (returns persona response)
  ↓
Relay → Intake (/add_exchange) [async summary]
  ↓
Intake → NeoMem (background memory storage)
  ↓
Relay → UI (final response)

Documentation

  • Added comprehensive v0.5.0 changelog entry
  • Updated README.md to reflect v0.5.0 architecture
    • Documented new endpoints
    • Updated data flow diagrams
    • Clarified Intake v0.2 changes
    • Corrected service descriptions

Issues Resolved

  • Cortex could not retrieve context from Intake (wrong endpoint)
  • UI could not send messages to Relay (endpoint mismatch)
  • Relay could not send summaries to Intake (wrong port/endpoint)
  • Python package imports were implicit (missing __init__.py)

Known Issues (Non-Critical)

  • Session management endpoints not implemented in Relay (GET/POST /sessions/:id)
  • RAG service currently disabled in docker-compose.yml
  • Cortex /ingest endpoint is a stub returning {"status": "ok"}

Migration Notes

If upgrading from v0.4.x:

  1. Pull latest changes from git
  2. Verify environment variables in .env files:
    • Check INTAKE_API_URL=http://intake:7080 (not INTAKE_API)
    • Verify all service URLs use correct ports
  3. Restart Docker containers: docker-compose down && docker-compose up -d
  4. Test with a simple message through the UI

[Infrastructure v1.0.0] - 2025-11-26

Changed - Environment Variable Consolidation

Major reorganization to eliminate duplication and improve maintainability

  • Consolidated 9 scattered .env files into single source of truth architecture
  • Root .env now contains all shared infrastructure (LLM backends, databases, API keys, service URLs)
  • Service-specific .env files minimized to only essential overrides:
    • cortex/.env: Reduced from 42 to 22 lines (operational parameters only)
    • neomem/.env: Reduced from 26 to 14 lines (LLM naming conventions only)
    • intake/.env: Kept at 8 lines (already minimal)
  • Result: ~24% reduction in total configuration lines (197 → ~150)

Docker Compose Consolidation

  • All services now defined in single root docker-compose.yml
  • Relay service updated with complete configuration (env_file, volumes)
  • Removed redundant core/docker-compose.yml (marked as DEPRECATED)
  • Standardized network communication to use Docker container names

Service URL Standardization

  • Internal services use container names: http://neomem-api:7077, http://cortex:7081
  • External services use IP addresses: http://10.0.0.43:8000 (vLLM), http://10.0.0.3:11434 (Ollama)
  • Removed IP/container name inconsistencies across files

Added - Security & Documentation

Security Templates - Created .env.example files for all services

  • Root .env.example with sanitized credentials
  • Service-specific templates: cortex/.env.example, neomem/.env.example, intake/.env.example, rag/.env.example
  • All .env.example files safe to commit to version control

Documentation

  • ENVIRONMENT_VARIABLES.md: Comprehensive reference for all environment variables
    • Variable descriptions, defaults, and usage examples
    • Multi-backend LLM strategy documentation
    • Troubleshooting guide
    • Security best practices
  • DEPRECATED_FILES.md: Deletion guide for deprecated files with verification steps

Enhanced .gitignore

  • Ignores all .env files (including subdirectories)
  • Tracks .env.example templates for documentation
  • Ignores .env-backups/ directory

Removed

  • core/.env - Redundant with root .env, now deleted
  • core/docker-compose.yml - Consolidated into main compose file (marked DEPRECATED)

Fixed

  • Eliminated duplicate OPENAI_API_KEY across 5+ files
  • Eliminated duplicate LLM backend URLs across 4+ files
  • Eliminated duplicate database credentials across 3+ files
  • Resolved Cortex environment: section override in docker-compose (now uses env_file)

Architecture - Multi-Backend LLM Strategy

Root .env provides all backend OPTIONS (PRIMARY, SECONDARY, CLOUD, FALLBACK), services choose which to USE:

  • Cortex → vLLM (PRIMARY) for autonomous reasoning
  • NeoMem → Ollama (SECONDARY) + OpenAI embeddings
  • Intake → vLLM (PRIMARY) for summarization
  • Relay → Fallback chain with user preference

Preserves per-service flexibility while eliminating URL duplication.

Migration

  • All original .env files backed up to .env-backups/ with timestamp 20251126_025334
  • Rollback plan documented in ENVIRONMENT_VARIABLES.md
  • Verification steps provided in DEPRECATED_FILES.md

[0.4.x] - 2025-11-13

Added - Multi-Stage Reasoning Pipeline

Cortex v0.5 - Complete architectural overhaul

  • New reasoning.py module

    • Async reasoning engine
    • Accepts user prompt, identity, RAG block, and reflection notes
    • Produces draft internal answers
    • Uses primary backend (vLLM)
  • New reflection.py module

    • Fully async meta-awareness layer
    • Produces actionable JSON "internal notes"
    • Enforces strict JSON schema and fallback parsing
    • Forces cloud backend (backend_override="cloud")
  • Integrated refine.py into pipeline

    • New stage between reflection and persona
    • Runs exclusively on primary vLLM backend (MI50)
    • Produces final, internally consistent output for downstream persona layer
  • Backend override system

    • Each LLM call can now select its own backend
    • Enables multi-LLM cognition: Reflection → cloud, Reasoning → primary
  • Identity loader

    • Added identity.py with load_identity() for consistent persona retrieval
  • Ingest handler

    • Async stub created for future Intake → NeoMem → RAG pipeline

Cortex v0.4.1 - RAG Integration

  • RAG integration
    • Added rag.py with query_rag() and format_rag_block()
    • Cortex now queries local RAG API (http://10.0.0.41:7090/rag/search)
    • Synthesized answers and top excerpts injected into reasoning prompt

Changed - Unified LLM Architecture

Cortex v0.5

  • Unified LLM backend URL handling across Cortex

    • ENV variables must now contain FULL API endpoints
    • Removed all internal path-appending (e.g. .../v1/completions)
    • llm_router.py rewritten to use env-provided URLs as-is
    • Ensures consistent behavior between draft, reflection, refine, and persona
  • Rebuilt main.py

    • Removed old annotation/analysis logic
    • New structure: load identity → get RAG → reflect → reason → return draft+notes
    • Routes now clean and minimal (/reason, /ingest, /health)
    • Async path throughout Cortex
  • Refactored llm_router.py

    • Removed old fallback logic during overrides
    • OpenAI requests now use /v1/chat/completions
    • Added proper OpenAI Authorization headers
    • Distinct payload format for vLLM vs OpenAI
    • Unified, correct parsing across models
  • Simplified Cortex architecture

    • Removed deprecated "context.py" and old reasoning code
    • Relay completely decoupled from smart behavior
  • Updated environment specification

    • LLM_PRIMARY_URL now set to http://10.0.0.43:8000/v1/completions
    • LLM_SECONDARY_URL remains http://10.0.0.3:11434/api/generate (Ollama)
    • LLM_CLOUD_URL set to https://api.openai.com/v1/chat/completions

Cortex v0.4.1

  • Revised /reason endpoint

    • Now builds unified context blocks: [Intake] → recent summaries, [RAG] → contextual knowledge, [User Message] → current input
    • Calls call_llm() for first pass, then reflection_loop() for meta-evaluation
    • Returns cortex_prompt, draft_output, final_output, and normalized reflection
  • Reflection Pipeline Stability

    • Cleaned parsing to normalize JSON vs. text reflections
    • Added fallback handling for malformed or non-JSON outputs
    • Log system improved to show raw JSON, extracted fields, and normalized summary
  • Async Summarization (Intake v0.2.1)

    • Intake summaries now run in background threads to avoid blocking Cortex
    • Summaries (L1 → L∞) logged asynchronously with [BG] tags
  • Environment & Networking Fixes

    • Verified .env variables propagate correctly inside Cortex container
    • Confirmed Docker network connectivity between Cortex, Intake, NeoMem, and RAG
    • Adjusted localhost calls to service-IP mapping
  • Behavioral Updates

    • Cortex now performs conversation reflection (on user intent) and self-reflection (on its own answers)
    • RAG context successfully grounds reasoning outputs
    • Intake and NeoMem confirmed receiving summaries via /add_exchange
    • Log clarity pass: all reflective and contextual blocks clearly labeled

Fixed

Cortex v0.5

  • Resolved endpoint conflict where router expected base URLs and refine expected full URLs
    • Fixed by standardizing full-URL behavior across entire system
  • Reflection layer no longer fails silently (previously returned [""] due to MythoMax)
  • Resolved 404/401 errors caused by incorrect OpenAI URL endpoints
  • No more double-routing through vLLM during reflection
  • Corrected async/sync mismatch in multiple locations
  • Eliminated double-path bug (/v1/completions/v1/completions) caused by previous router logic

Removed

Cortex v0.5

  • Legacy annotate, reason_check glue logic from old architecture
  • Old backend probing junk code
  • Stale imports and unused modules leftover from previous prototype

Verified

Cortex v0.5

  • Cortex → vLLM (MI50) → refine → final_output now functioning correctly
  • Refine shows used_primary_backend: true and no fallback
  • Manual curl test confirms endpoint accuracy

Known Issues

Cortex v0.5

  • Refine sometimes prefixes output with "Final Answer:"; next version will sanitize this
  • Hallucinations in draft_output persist due to weak grounding (fix in reasoning + RAG planned)

Cortex v0.4.1

  • NeoMem tuning needed - improve retrieval latency and relevance
  • Need dedicated /reflections/recent endpoint for Cortex
  • Migrate to Cortex-first ingestion (Relay → Cortex → NeoMem)
  • Add persistent reflection recall (use prior reflections as meta-context)
  • Improve reflection JSON structure ("insight", "evaluation", "next_action" → guaranteed fields)
  • Tighten temperature and prompt control for factual consistency
  • RAG optimization: add source ranking, filtering, multi-vector hybrid search
  • Cache RAG responses per session to reduce duplicate calls

Notes

Cortex v0.5

This is the largest structural change to Cortex so far. It establishes:

  • Multi-model cognition
  • Clean layering
  • Identity + reflection separation
  • Correct async code
  • Deterministic backend routing
  • Predictable JSON reflection

The system is now ready for:

  • Refinement loops
  • Persona-speaking layer
  • Containerized RAG
  • Long-term memory integration
  • True emergent-behavior experiments

[0.3.x] - 2025-10-28 to 2025-09-26

Added

[Lyra Core v0.3.2 + Web UI v0.2.0] - 2025-10-28

  • New UI

    • Cleaned up UI look and feel
  • Sessions

    • Sessions now persist over time
    • Ability to create new sessions or load sessions from previous instance
    • Switching sessions updates what is sent to Relay (messages from other sessions are not included in the prompt)
    • Relay correctly wired in

[Lyra-Core 0.3.1] - 2025-10-09

  • NVGRAM Integration (Full Pipeline Reconnected)
    • Replaced legacy Mem0 service with NVGRAM microservice (nvgram-api @ port 7077)
    • Updated server.js in Relay to route all memory ops via ${NVGRAM_API}/memories and /search
    • Added .env variable: NVGRAM_API=http://nvgram-api:7077
    • Verified end-to-end Lyra conversation persistence: relay → nvgram-api → postgres/neo4j → relay → ollama → ui
    • Memories stored, retrieved, and re-injected successfully

[Lyra-Core v0.3.0] - 2025-09-26

  • Salience filtering in Relay
    • .env configurable: SALIENCE_ENABLED, SALIENCE_MODE, SALIENCE_MODEL, SALIENCE_API_URL
    • Supports heuristic and llm classification modes
    • LLM-based salience filter integrated with Cortex VM running llama-server
  • Logging improvements
    • Added debug logs for salience mode, raw LLM output, and unexpected outputs
    • Fail-closed behavior for unexpected LLM responses
  • Successfully tested with Phi-3.5-mini and Qwen2-0.5B-Instruct as salience classifiers
  • Verified end-to-end flow: Relay → salience filter → Mem0 add/search → Persona injection → LLM reply

[Cortex v0.3.0] - 2025-10-31

  • Cortex Service (FastAPI)

    • New standalone reasoning engine (cortex/main.py) with endpoints:
      • GET /health reports active backend + NeoMem status
      • POST /reason evaluates {prompt, response} pairs
      • POST /annotate experimental text analysis
    • Background NeoMem health monitor (5-minute interval)
  • Multi-Backend Reasoning Support

    • Environment-driven backend selection via LLM_FORCE_BACKEND
    • Supports: Primary (vLLM MI50), Secondary (Ollama 3090), Cloud (OpenAI), Fallback (llama.cpp CPU)
    • Per-backend model variables: LLM_PRIMARY_MODEL, LLM_SECONDARY_MODEL, LLM_CLOUD_MODEL, LLM_FALLBACK_MODEL
  • Response Normalization Layer

    • Implemented normalize_llm_response() to merge streamed outputs and repair malformed JSON (sketch after this list)
    • Handles Ollama's multi-line streaming and Mythomax's missing punctuation issues
    • Prints concise debug previews of merged content
  • Environment Simplification

    • Each service (intake, cortex, neomem) now maintains its own .env file
    • Removed reliance on shared/global env file to prevent cross-contamination
    • Verified Docker Compose networking across containers
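
A rough sketch of the normalization idea referenced above: merge Ollama-style streamed JSON lines into one string and pass through anything that is not JSON. The real normalize_llm_response() likely handles more repair cases.

```python
import json


def normalize_llm_response(raw: str) -> str:
    """Merge streamed chunks / non-JSON text into a single response string."""
    chunks = []
    for line in raw.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            chunks.append(line)   # not JSON: keep the raw text as-is
            continue
        # Ollama streaming lines carry partial text in the "response" field.
        chunks.append(obj.get("response") or obj.get("content") or "")
    return "".join(chunks).strip()
```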

[NeoMem 0.1.2] - 2025-10-27 (formerly NVGRAM)

  • Renamed NVGRAM to NeoMem
    • All future updates under name NeoMem
    • Features unchanged

[NVGRAM 0.1.1] - 2025-10-08

  • Async Memory Rewrite (Stability + Safety Patch)
    • Introduced AsyncMemory class with fully asynchronous vector and graph store writes
    • Added input sanitation to prevent embedding errors ('list' object has no attribute 'replace')
    • Implemented flatten_messages() helper in API layer to clean malformed payloads
    • Added structured request logging via RequestLoggingMiddleware (FastAPI middleware)
    • Health endpoint (/health) returns structured JSON {status, version, service}
    • Startup logs include sanitized embedder config with masked API keys

[NVGRAM 0.1.0] - 2025-10-07

  • Initial fork of Mem0 → NVGRAM
    • Created fully independent local-first memory engine based on Mem0 OSS
    • Renamed all internal modules, Docker services, environment variables from mem0 → nvgram
    • New service name: nvgram-api, default port 7077
    • Maintains same API endpoints (/memories, /search) for drop-in compatibility
    • Uses FastAPI, Postgres, and Neo4j as persistent backends

[Lyra-Mem0 0.3.2] - 2025-10-05

  • Ollama LLM reasoning alongside OpenAI embeddings
    • Introduced LLM_PROVIDER=ollama, LLM_MODEL, and OLLAMA_HOST in .env.3090
    • Verified local 3090 setup using qwen2.5:7b-instruct-q4_K_M
    • Split processing: Embeddings → OpenAI text-embedding-3-small, LLM → Local Ollama
  • Added .env.3090 template for self-hosted inference nodes
  • Integrated runtime diagnostics and seeder progress tracking
    • File-level + message-level progress bars
    • Retry/back-off logic for timeouts (3 attempts)
    • Event logging (ADD / UPDATE / NONE) for every memory record
  • Expanded Docker health checks for Postgres, Qdrant, and Neo4j containers
  • Added GPU-friendly long-run configuration for continuous seeding (validated on RTX 3090)

[Lyra-Mem0 0.3.1] - 2025-10-03

  • HuggingFace TEI integration (local 3090 embedder)
  • Dual-mode environment switch between OpenAI cloud and local
  • CSV export of memories from Postgres (payload->>'data')

[Lyra-Mem0 0.3.0]

  • Ollama embeddings in Mem0 OSS container
    • Configure EMBEDDER_PROVIDER=ollama, EMBEDDER_MODEL, OLLAMA_HOST via .env
    • Mounted main.py override from host into container to load custom DEFAULT_CONFIG
    • Installed ollama Python client into custom API container image
  • .env.3090 file for external embedding mode (3090 machine)
  • Workflow for multiple embedding modes: LAN-based 3090/Ollama, Local-only CPU, OpenAI fallback

[Lyra-Mem0 v0.2.1]

  • Seeding pipeline
    • Built Python seeder script to bulk-insert raw Cloud Lyra exports into Mem0
    • Implemented incremental seeding option (skip existing memories, only add new ones)
    • Verified insert process with Postgres-backed history DB

[Intake v0.1.0] - 2025-10-27

  • Receives messages from relay and summarizes them in cascading format
  • Continues to summarize smaller amounts of exchanges while generating large-scale conversational summaries (L20)
  • Currently logs summaries to .log file in /project-lyra/intake-logs/

[Lyra-Cortex v0.2.0] - 2025-09-26

  • Integrated llama-server on dedicated Cortex VM (Proxmox)
  • Verified Phi-3.5-mini-instruct_Uncensored-Q4_K_M running with 8 vCPUs
  • Benchmarked Phi-3.5-mini performance: ~18 tokens/sec CPU-only on Ryzen 7 7800X
  • Salience classification functional but sometimes inconsistent
  • Tested Qwen2-0.5B-Instruct GGUF as alternative salience classifier
    • Much faster throughput (~350 tokens/sec prompt, ~100 tokens/sec eval)
    • More responsive but over-classifies messages as "salient"
  • Established .env integration for model ID (SALIENCE_MODEL), enabling hot-swap between models

Changed

[Lyra-Core 0.3.1] - 2025-10-09

  • Renamed MEM0_URL → NVGRAM_API across all relay environment configs
  • Updated Docker Compose service dependency order
    • relay now depends on nvgram-api healthcheck
    • Removed mem0 references and volumes
  • Minor cleanup to Persona fetch block (null-checks and safer default persona string)

[Lyra-Core v0.3.1] - 2025-09-27

  • Removed salience filter logic; Cortex is now default annotator
  • All user messages stored in Mem0; no discard tier applied
  • Cortex annotations (metadata.cortex) now attached to memories
  • Debug logging improvements
    • Pretty-print Cortex annotations
    • Injected prompt preview
    • Memory search hit list with scores
  • .env toggle (CORTEX_ENABLED) to bypass Cortex when needed

[Lyra-Core v0.3.0] - 2025-09-26

  • Refactored server.js to gate mem.add() calls behind salience filter
  • Updated .env to support SALIENCE_MODEL

[Cortex v0.3.0] - 2025-10-31

  • Refactored reason_check() to dynamically switch between prompt and chat mode depending on backend
  • Enhanced startup logs to announce active backend, model, URL, and mode
  • Improved error handling with clearer "Reasoning error" messages

[NVGRAM 0.1.1] - 2025-10-08

  • Replaced synchronous Memory.add() with async-safe version supporting concurrent vector + graph writes
  • Normalized indentation and cleaned duplicate main.py references
  • Removed redundant FastAPI() app reinitialization
  • Updated internal logging to INFO-level timing format
  • Deprecated @app.on_event("startup") → will migrate to lifespan handler in v0.1.2

[NVGRAM 0.1.0] - 2025-10-07

  • Removed dependency on external mem0ai SDK — all logic now local
  • Re-pinned requirements: fastapi==0.115.8, uvicorn==0.34.0, pydantic==2.10.4, python-dotenv==1.0.1, psycopg>=3.2.8, ollama
  • Adjusted docker-compose and .env templates to use new NVGRAM naming

[Lyra-Mem0 0.3.2] - 2025-10-05

  • Updated main.py configuration block to load LLM_PROVIDER, LLM_MODEL, OLLAMA_BASE_URL
    • Fallback to OpenAI if Ollama unavailable
  • Adjusted docker-compose.yml mount paths to correctly map /app/main.py
  • Normalized .env loading so mem0-api and host environment share identical values
  • Improved seeder logging and progress telemetry
  • Added explicit temperature field to DEFAULT_CONFIG['llm']['config']

[Lyra-Mem0 0.3.0]

  • docker-compose.yml updated to mount local main.py and .env.3090
  • Built custom Dockerfile (mem0-api-server:latest) extending base image with pip install ollama
  • Updated requirements.txt to include ollama package
  • Adjusted Mem0 container config so main.py pulls environment variables with dotenv
  • Tested new embeddings path with curl /memories API call

[Lyra-Mem0 v0.2.1]

  • Updated main.py to load configuration from .env using dotenv and support multiple embedder backends
  • Mounted host main.py into container so local edits persist across rebuilds
  • Updated docker-compose.yml to mount .env.3090 and support swap between profiles
  • Built custom Dockerfile (mem0-api-server:latest) including pip install ollama
  • Updated requirements.txt with ollama dependency
  • Adjusted startup flow so container automatically connects to external Ollama host (LAN IP)
  • Added logging to confirm model pulls and embedding requests

Fixed

[Lyra-Core 0.3.1] - 2025-10-09

  • Relay startup no longer crashes when NVGRAM is unavailable — deferred connection handling
  • /memories POST failures no longer crash Relay; now logged gracefully as relay error Error: memAdd failed: 500
  • Improved injected prompt debugging (DEBUG_PROMPT=true now prints clean JSON)

[Lyra-Core v0.3.1] - 2025-09-27

  • Parsing failures from Markdown-wrapped Cortex JSON via fence cleaner
  • Relay no longer "hangs" on malformed Cortex outputs

[Cortex v0.3.0] - 2025-10-31

  • Corrected broken vLLM endpoint routing (/v1/completions)
  • Stabilized cross-container health reporting for NeoMem
  • Resolved JSON parse failures caused by streaming chunk delimiters

[NVGRAM 0.1.1] - 2025-10-08

  • Eliminated repeating 500 error from OpenAI embedder caused by non-string message content
  • Masked API key leaks from boot logs
  • Ensured Neo4j reconnects gracefully on first retry

[Lyra-Mem0 0.3.2] - 2025-10-05

  • Resolved crash during startup: TypeError: OpenAIConfig.__init__() got an unexpected keyword argument 'ollama_base_url'
  • Corrected mount type mismatch (file vs directory) causing OCI runtime create failed errors
  • Prevented duplicate or partial postings when retry logic triggered multiple concurrent requests
  • "Unknown event" warnings now safely ignored (no longer break seeding loop)
  • Confirmed full dual-provider operation in logs (api.openai.com + 10.0.0.3:11434/api/chat)

[Lyra-Mem0 0.3.1] - 2025-10-03

  • .env CRLF vs LF line ending issues
  • Local seeding now possible via HuggingFace server

[Lyra-Mem0 0.3.0]

  • Resolved container boot failure caused by missing ollama dependency (ModuleNotFoundError)
  • Fixed config overwrite issue where rebuilding container restored stock main.py
  • Worked around Neo4j error (vector.similarity.cosine(): mismatched vector dimensions) by confirming OpenAI vs. Ollama embedding vector sizes

[Lyra-Mem0 v0.2.1]

  • Seeder process originally failed on old memories — now skips duplicates and continues batch
  • Resolved container boot error (ModuleNotFoundError: ollama) by extending image
  • Fixed overwrite issue where stock main.py replaced custom config during rebuild
  • Worked around Neo4j vector.similarity.cosine() dimension mismatch

Known Issues

[Lyra-Core v0.3.0] - 2025-09-26

  • Small models (e.g. Qwen2-0.5B) tend to over-classify as "salient"
  • Phi-3.5-mini sometimes returns truncated tokens ("sali", "fi")
  • CPU-only inference is functional but limited; larger models recommended once GPU available

[Lyra-Cortex v0.2.0] - 2025-09-26

  • Small models tend to drift or over-classify
  • CPU-only 7B+ models expected to be slow; GPU passthrough recommended for larger models
  • Need to set up systemd service for llama-server to auto-start on VM reboot

Observations

[Lyra-Mem0 0.3.2] - 2025-10-05

  • Stable GPU utilization: ~8 GB VRAM @ 92% load, ≈ 67°C under sustained seeding
  • Next revision will re-format seed JSON to preserve role context (user vs assistant)

[Lyra-Mem0 v0.2.1]

  • To fully unify embedding modes, a Hugging Face / local model with 1536-dim embeddings will be needed (to match OpenAI's schema)
  • Current Ollama model (mxbai-embed-large) works, but returns 1024-dim vectors
  • Seeder workflow validated but should be wrapped in repeatable weekly run for full Cloud→Local sync

Next Steps

[Lyra-Core 0.3.1] - 2025-10-09

  • Add salience visualization (e.g., memory weights displayed in injected system message)
  • Begin schema alignment with NVGRAM v0.1.2 for confidence scoring
  • Add relay auto-retry for transient 500 responses from NVGRAM

[NVGRAM 0.1.1] - 2025-10-08

  • Integrate salience scoring and embedding confidence weight fields in Postgres schema
  • Begin testing with full Lyra Relay + Persona Sidecar pipeline for live session memory recall
  • Migrate from deprecated on_event → lifespan pattern in 0.1.2

[NVGRAM 0.1.0] - 2025-10-07

  • Integrate NVGRAM as new default backend in Lyra Relay
  • Deprecate remaining Mem0 references and archive old configs
  • Begin versioning as standalone project (nvgram-core, nvgram-api, etc.)

[Intake v0.1.0] - 2025-10-27

  • Feed intake into NeoMem
  • Generate daily/hourly overall summaries (e.g., "Today Brian and Lyra worked on x, y, and z")
  • Generate session-aware summaries with their own intake hopper

[0.2.x] - 2025-09-30 to 2025-09-24

Added

[Lyra-Mem0 v0.2.0] - 2025-09-30

  • Standalone Lyra-Mem0 stack created at ~/lyra-mem0/
    • Includes Postgres (pgvector), Qdrant, Neo4j, and SQLite for history tracking
    • Added working docker-compose.mem0.yml and custom Dockerfile for building Mem0 API server
  • Verified REST API functionality
    • POST /memories works for adding memories
    • POST /search works for semantic search
  • Successful end-to-end test with persisted memory: "Likes coffee in the morning" → retrievable via search

[Lyra-Core v0.2.0] - 2025-09-24

  • Migrated Relay to use mem0ai SDK instead of raw fetch calls
  • Implemented sessionId support (client-supplied, fallback to default)
  • Added debug logs for memory add/search
  • Cleaned up Relay structure for clarity

Changed

[Lyra-Mem0 v0.2.0] - 2025-09-30

  • Split architecture into modular stacks:
    • ~/lyra-core (Relay, Persona-Sidecar, etc.)
    • ~/lyra-mem0 (Mem0 OSS memory stack)
  • Removed old embedded mem0 containers from Lyra-Core compose file
  • Added Lyra-Mem0 section in README.md

Next Steps

[Lyra-Mem0 v0.2.0] - 2025-09-30

  • Wire Relay → Mem0 API (integration not yet complete)
  • Add integration tests to verify persistence and retrieval from within Lyra-Core

[0.1.x] - 2025-09-25 to 2025-09-23

Added

[Lyra_RAG v0.1.0] - 2025-11-07

  • Initial standalone RAG module for Project Lyra
  • Persistent ChromaDB vector store (./chromadb)
  • Importer rag_chat_import.py (chunking/dedup sketch after this list) with:
    • Recursive folder scanning and category tagging
    • Smart chunking (~5k chars)
    • SHA-1 deduplication and chat-ID metadata
    • Timestamp fields (file_modified, imported_at)
    • Background-safe operation (nohup/tmux)
  • 68 Lyra-category chats imported:
    • 6,556 new chunks added
    • 1,493 duplicates skipped
    • 7,997 total vectors stored
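
A sketch of the chunking and SHA-1 deduplication steps; a plain fixed-size splitter stands in for the importer's smart chunking, and only the ~5k-character chunk size is taken from the list above.

```python
import hashlib

CHUNK_SIZE = 5000   # ~5k characters per chunk


def chunk_text(text: str, size: int = CHUNK_SIZE) -> list[str]:
    """Split a document into roughly fixed-size character chunks."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def dedupe_chunks(chunks: list[str], seen_hashes: set[str]) -> list[str]:
    """Drop chunks whose SHA-1 hash has already been imported."""
    fresh = []
    for chunk in chunks:
        digest = hashlib.sha1(chunk.encode("utf-8")).hexdigest()
        if digest in seen_hashes:
            continue   # duplicate: skip re-importing
        seen_hashes.add(digest)
        fresh.append(chunk)
    return fresh
```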

[Lyra_RAG v0.1.0 API] - 2025-11-07

  • /rag/search FastAPI endpoint implemented (port 7090)
  • Supports natural-language queries and returns top related excerpts
  • Added answer synthesis step using gpt-4o-mini

[Lyra-Core v0.1.0] - 2025-09-23

  • First working MVP of Lyra Core Relay
  • Relay service accepts POST /v1/chat/completions (OpenAI-compatible)
  • Memory integration with Mem0:
    • POST /memories on each user message
    • POST /search before LLM call
  • Persona Sidecar integration (GET /current)
  • OpenAI GPT + Ollama (Mythomax) support in Relay
  • Simple browser-based chat UI (talks to Relay at http://<host>:7078)
  • .env standardization for Relay + Mem0 + Postgres + Neo4j
  • Working Neo4j + Postgres backing stores for Mem0
  • Initial MVP relay service with raw fetch calls to Mem0
  • Dockerized with basic healthcheck

[Lyra-Cortex v0.1.0] - 2025-09-25

  • First deployment as dedicated Proxmox VM (5 vCPU / 18 GB RAM / 100 GB SSD)
  • Built llama.cpp with llama-server target via CMake
  • Integrated Phi-3.5 Mini Instruct (Uncensored, Q4_K_M GGUF) model
  • Verified API compatibility at /v1/chat/completions
  • Local test successful via curl → ~523 token response generated
  • Performance benchmark: ~11.5 tokens/sec (CPU-only on Ryzen 7800X)
  • Confirmed usable for salience scoring, summarization, and lightweight reasoning

Fixed

[Lyra-Core v0.1.0] - 2025-09-23

  • Resolved crash loop in Neo4j by restricting env vars (NEO4J_AUTH only)
  • Relay now correctly reads MEM0_URL and MEM0_API_KEY from .env

Verified

[Lyra_RAG v0.1.0] - 2025-11-07

  • Successful recall of Lyra-Core development history (v0.3.0 snapshot)
  • Correct metadata and category tagging for all new imports

Known Issues

[Lyra-Core v0.1.0] - 2025-09-23

  • No feedback loop (thumbs up/down) yet
  • Forget/delete flow is manual (via memory IDs)
  • Memory latency ~14s depending on embedding model

Next Planned

[Lyra_RAG v0.1.0] - 2025-11-07

  • Optional where filter parameter for category/date queries
  • Graceful "no results" handler for empty retrievals
  • rag_docs_import.py for PDFs and other document types