
Project Lyra Changelog

All notable changes to Project Lyra. Format based on Keep a Changelog and Semantic Versioning.


[Unreleased]


[0.9.1] - 2025-12-29

Fixed

  • Chat auto-scrolling now works.
  • Session names no longer change to an auto-generated UID.

[0.9.0] - 2025-12-29

Added - Trilium Notes Integration

Trilium ETAPI Knowledge Base Integration

  • Trilium Tool Executor cortex/autonomy/tools/executors/trilium.py
    • search_notes(query, limit) - Search through Trilium notes via ETAPI
    • create_note(title, content, parent_note_id) - Create new notes in Trilium knowledge base
    • Full ETAPI authentication and error handling
    • Automatic parentNoteId defaulting to "root" for root-level notes
    • Connection error handling with user-friendly messages
  • Tool Registry Integration cortex/autonomy/tools/registry.py
    • Added ENABLE_TRILIUM feature flag
    • Tool definitions with schema validation
    • Provider-agnostic tool calling support
  • Setup Documentation TRILIUM_SETUP.md
    • Step-by-step ETAPI token generation guide
    • Environment configuration instructions
    • Troubleshooting section for common issues
    • Security best practices for token management
  • API Reference Documentation docs/TRILIUM_API.md
    • Complete ETAPI endpoint reference
    • Authentication and request/response examples
    • Search syntax and advanced query patterns

Environment Configuration

  • New Environment Variables .env
    • ENABLE_TRILIUM=true - Enable/disable Trilium integration
    • TRILIUM_URL=http://10.0.0.2:4292 - Trilium instance URL
    • TRILIUM_ETAPI_TOKEN - ETAPI authentication token
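
A minimal sketch of the two executor calls, assuming the stock Trilium ETAPI routes (GET /etapi/notes for search, POST /etapi/create-note) and the environment variables above; the real cortex/autonomy/tools/executors/trilium.py may structure authentication and error handling differently.

```python
# Illustrative sketch only; endpoint shapes assume the standard Trilium ETAPI.
import os
import requests

TRILIUM_URL = os.getenv("TRILIUM_URL", "http://10.0.0.2:4292")
TRILIUM_ETAPI_TOKEN = os.getenv("TRILIUM_ETAPI_TOKEN", "")
HEADERS = {"Authorization": TRILIUM_ETAPI_TOKEN}


def search_notes(query: str, limit: int = 5) -> list[dict]:
    """Search Trilium notes via ETAPI and return note summaries."""
    resp = requests.get(
        f"{TRILIUM_URL}/etapi/notes",
        headers=HEADERS,
        params={"search": query, "limit": limit},
        timeout=15,
    )
    resp.raise_for_status()
    return resp.json().get("results", [])


def create_note(title: str, content: str, parent_note_id: str = "root") -> dict:
    """Create a text note; parentNoteId defaults to 'root' for root-level notes."""
    resp = requests.post(
        f"{TRILIUM_URL}/etapi/create-note",
        headers=HEADERS,
        json={
            "parentNoteId": parent_note_id,
            "title": title,
            "type": "text",
            "content": content,
        },
        timeout=15,
    )
    resp.raise_for_status()
    return resp.json()
```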

Capabilities Unlocked

  • Personal knowledge base search during conversations
  • Automatic note creation from conversation insights
  • Cross-reference information between chat and notes
  • Context-aware responses using stored knowledge
  • Future: Find duplicates, suggest organization, summarize notes

Changed - Spelling Corrections

Module Naming

  • Renamed trillium.py to trilium.py (corrected spelling)
  • Updated all imports and references across codebase
  • Fixed environment variable names (TRILLIUM → TRILIUM)
  • Updated documentation to use correct "Trilium" spelling

[0.8.0] - 2025-12-26

Added - Tool Calling & "Show Your Work" Transparency Feature

Tool Calling System (Standard Mode)

  • Function Calling Infrastructure cortex/autonomy/tools/
    • Implemented agentic tool calling for Standard Mode with autonomous multi-step execution
    • Tool registry system with JSON schema definitions
    • Adapter pattern for provider-agnostic tool calling (OpenAI, Ollama, llama.cpp)
    • Maximum 5 iterations per request to prevent runaway loops
  • Available Tools
    • execute_code - Sandboxed Python/JavaScript/Bash execution via Docker
    • web_search - Tavily API integration for real-time web queries
    • trilium_search - Internal Trilium knowledge base queries
  • Provider Adapters cortex/autonomy/tools/adapters/
    • OpenAIAdapter - Native function calling support
    • OllamaAdapter - XML-based tool calling for local models
    • LlamaCppAdapter - XML-based tool calling for llama.cpp backend
    • Automatic tool call parsing and result formatting
  • Code Execution Sandbox cortex/autonomy/tools/code_executor.py
    • Docker-based isolated execution environment
    • Support for Python, JavaScript (Node.js), and Bash
    • 30-second timeout with automatic cleanup
    • Returns stdout, stderr, exit code, and execution time
    • Prevents filesystem access outside sandbox
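
An illustrative sketch of the sandbox pattern described above (throwaway container, 30-second timeout, stdout/stderr/exit code/time captured). Image names and resource flags are assumptions for the example, not necessarily what code_executor.py uses.

```python
# Assumed images and flags; the real sandbox configuration may differ.
import subprocess
import time

LANG_COMMANDS = {
    "python": ("python:3.11-slim", ["python", "-c"]),
    "javascript": ("node:20-slim", ["node", "-e"]),
    "bash": ("bash:5", ["bash", "-c"]),
}


def execute_code(language: str, code: str, timeout: int = 30) -> dict:
    """Run code in a throwaway Docker container and report the outcome."""
    image, runner = LANG_COMMANDS[language]
    cmd = [
        "docker", "run", "--rm",
        "--network", "none",   # assumption: no network inside the sandbox
        "--memory", "256m",    # assumption: cap memory use
        image, *runner, code,
    ]
    start = time.monotonic()
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
        return {
            "stdout": proc.stdout,
            "stderr": proc.stderr,
            "exit_code": proc.returncode,
            "execution_time": round(time.monotonic() - start, 3),
        }
    except subprocess.TimeoutExpired:
        return {"stdout": "", "stderr": f"timed out after {timeout}s",
                "exit_code": -1, "execution_time": timeout}
```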

"Show Your Work" - Real-Time Thinking Stream

  • Server-Sent Events (SSE) Streaming cortex/router.py:478-527
    • New /stream/thinking/{session_id} endpoint for real-time event streaming
    • Broadcasts internal thinking process during tool calling operations
    • 30-second keepalive with automatic reconnection support
    • Events: connected, thinking, tool_call, tool_result, done, error (endpoint sketch after this list)
  • Stream Manager cortex/autonomy/tools/stream_events.py
    • Pub/sub system for managing SSE subscriptions per session
    • Multiple clients can connect to same session stream
    • Automatic cleanup of dead queues and closed connections
    • Zero overhead when no subscribers active
  • FunctionCaller Integration cortex/autonomy/tools/function_caller.py
    • Enhanced with event emission at each step:
      • "thinking" events before each LLM call
      • "tool_call" events when invoking tools
      • "tool_result" events after tool execution
      • "done" event with final answer
      • "error" events on failures
    • Session-aware streaming (only emits when subscribers exist)
    • Provider-agnostic implementation works with all backends
  • Thinking Stream UI core/ui/thinking-stream.html
    • Dedicated popup window for real-time thinking visualization
    • Color-coded events: green (thinking), orange (tool calls), blue (results), purple (done), red (errors)
    • Auto-scrolling event feed with animations
    • Connection status indicator with green/red dot
    • Clear events button and session info display
    • Mobile-friendly responsive design
  • UI Integration core/ui/index.html
    • "🧠 Show Work" button in session selector
    • Opens thinking stream in popup window
    • Session ID passed via URL parameter for stream association
    • Purple/violet button styling to match cyberpunk theme
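
A self-contained sketch of an SSE endpoint shaped like /stream/thinking/{session_id}, with the initial connected event, 30-second keepalive, and cleanup on disconnect; the module-level queue registry here stands in for the real ToolStreamManager.

```python
# Sketch only; see cortex/router.py:478-527 for the actual endpoint.
import asyncio
import json
from collections import defaultdict

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()
_subscribers: dict[str, set[asyncio.Queue]] = defaultdict(set)  # session_id -> queues


@app.get("/stream/thinking/{session_id}")
async def stream_thinking(session_id: str):
    queue: asyncio.Queue = asyncio.Queue()
    _subscribers[session_id].add(queue)

    async def event_generator():
        # Tell the client the stream is live before any tool events arrive.
        yield f"data: {json.dumps({'type': 'connected', 'session_id': session_id})}\n\n"
        try:
            while True:
                try:
                    event = await asyncio.wait_for(queue.get(), timeout=30)
                except asyncio.TimeoutError:
                    yield ": keepalive\n\n"   # comment frame keeps the connection open
                    continue
                yield f"data: {json.dumps(event)}\n\n"
                if event.get("type") in ("done", "error"):
                    break
        finally:
            _subscribers[session_id].discard(queue)

    return StreamingResponse(event_generator(), media_type="text/event-stream")
```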

Tool Calling Configuration

  • Environment Variables .env
    • STANDARD_MODE_ENABLE_TOOLS=true - Enable/disable tool calling
    • TAVILY_API_KEY - API key for web search tool
    • TRILLIUM_API_URL - URL for Trillium knowledge base
  • Standard Mode Tools Toggle cortex/router.py:389-470
    • /simple endpoint checks STANDARD_MODE_ENABLE_TOOLS environment variable
    • Falls back to non-tool mode if disabled
    • Logs tool usage statistics (iterations, tools used)

Changed - CORS & Architecture

CORS Support for SSE

  • Added CORS Middleware cortex/main.py
    • FastAPI CORSMiddleware with wildcard origins for development
    • Allows cross-origin SSE connections from nginx UI (port 8081) to cortex (port 7081)
    • Credentials support enabled for authenticated requests
    • All methods and headers permitted

Tool Calling Pipeline

  • Standard Mode Enhancement cortex/router.py:389-470
    • /simple endpoint now supports optional tool calling
    • Multi-iteration agentic loop with LLM + tool execution
    • Tool results injected back into conversation for next iteration
    • Graceful degradation to non-tool mode if tools disabled

JSON Response Formatting

  • SSE Event Structure cortex/router.py:497-499
    • Fixed initial "connected" event to use proper JSON serialization
    • Changed from f-string with nested quotes to json.dumps()
    • Ensures valid JSON for all event types
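
A small illustration of the serialization fix: the f-string produces Python-style quoting that is not valid JSON, while json.dumps() is.

```python
import json

session_id = "demo-session"

# Broken: single quotes are not valid JSON, so EventSource clients reject the event.
bad = f"data: {{'type': 'connected', 'session_id': '{session_id}'}}\n\n"

# Fixed: json.dumps() guarantees valid JSON for every event payload.
good = "data: " + json.dumps({"type": "connected", "session_id": session_id}) + "\n\n"

print(bad)
print(good)
```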

Fixed - Critical JavaScript & SSE Issues

JavaScript Variable Scoping Bug

  • Root cause: eventSource variable used before declaration in thinking-stream.html:218
  • Symptom: Uncaught ReferenceError: can't access lexical declaration 'eventSource' before initialization
  • Solution: Moved variable declarations before connectStream() call
  • Impact: Thinking stream page now loads without errors and establishes SSE connection

SSE Connection Not Establishing

  • Root cause: CORS blocked cross-origin SSE requests from nginx (8081) to cortex (7081)
  • Symptom: Browser silently blocked EventSource connection, no errors in console
  • Solution: Added CORSMiddleware to cortex FastAPI app
  • Impact: SSE streams now connect successfully across ports

Invalid JSON in SSE Events

  • Root cause: Initial "connected" event used f-string with nested quotes: f"data: {{'type': 'connected', 'session_id': '{session_id}'}}\n\n"
  • Symptom: Browser couldn't parse malformed JSON, connection appeared stuck on "Connecting..."
  • Solution: Used json.dumps() for proper JSON serialization
  • Impact: Connected event now parsed correctly, status updates to green dot

Technical Improvements

Agentic Architecture

  • Multi-iteration reasoning loop with tool execution
  • Provider-agnostic tool calling via adapter pattern
  • Automatic tool result injection into conversation context
  • Iteration limits to prevent infinite loops
  • Comprehensive logging at each step

Event Streaming Performance

  • Zero overhead when no subscribers (check before emit)
  • Efficient pub/sub with asyncio queues
  • Automatic cleanup of disconnected clients
  • 30-second keepalive prevents timeout issues
  • Session-isolated streams prevent cross-talk
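
A sketch of the pub/sub shape described above, with the check-before-emit guard that keeps overhead at zero when nothing is subscribed; names and details are illustrative rather than a copy of stream_events.py.

```python
import asyncio
from collections import defaultdict


class ToolStreamManager:
    """Per-session fan-out of tool-calling events to SSE subscribers (sketch)."""

    def __init__(self) -> None:
        self._queues: dict[str, set[asyncio.Queue]] = defaultdict(set)

    def subscribe(self, session_id: str) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue()
        self._queues[session_id].add(queue)
        return queue

    def unsubscribe(self, session_id: str, queue: asyncio.Queue) -> None:
        self._queues[session_id].discard(queue)
        if not self._queues[session_id]:
            del self._queues[session_id]   # drop empty sessions entirely

    def has_subscribers(self, session_id: str) -> bool:
        return bool(self._queues.get(session_id))

    async def emit(self, session_id: str, event: dict) -> None:
        # Check before emit: zero overhead when nobody is listening.
        if not self.has_subscribers(session_id):
            return
        for queue in list(self._queues[session_id]):
            queue.put_nowait(event)
```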

Code Quality

  • Clean separation: tool execution, adapters, streaming, UI
  • Comprehensive error handling with fallbacks
  • Detailed logging for debugging tool calls
  • Type hints and docstrings throughout
  • Modular design for easy extension

Security

  • Sandboxed code execution prevents filesystem access
  • Timeout limits prevent resource exhaustion
  • Docker isolation for untrusted code
  • No code execution without explicit user request

Architecture - Tool Calling Flow

Standard Mode with Tools:

User (UI) → Relay → Cortex /simple
  ↓
  Check STANDARD_MODE_ENABLE_TOOLS
  ↓
  LLM generates tool call → FunctionCaller
  ↓
  Execute tool (Docker sandbox / API call)
  ↓
  Inject result → LLM (next iteration)
  ↓
  Repeat until done or max iterations
  ↓
  Return final answer → UI
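
The same flow as a condensed, illustrative coroutine. The adapter and registry objects and their methods are assumed interfaces for the sketch, not the actual FunctionCaller API.

```python
# Sketch of the agentic loop: max 5 iterations, tool results injected back
# into the conversation before the next LLM call.
MAX_ITERATIONS = 5


async def run_with_tools(messages: list[dict], adapter, registry) -> str:
    for _ in range(MAX_ITERATIONS):
        reply = await adapter.chat(messages, tools=registry.definitions())
        tool_calls = adapter.parse_tool_calls(reply)
        if not tool_calls:
            return reply.content   # model answered directly: done
        for call in tool_calls:
            result = await registry.execute(call.name, call.arguments)
            messages.append({"role": "assistant", "content": "", "tool_calls": [call]})
            messages.append({"role": "tool", "name": call.name, "content": str(result)})
    return "Stopped after reaching the iteration limit."
```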

Thinking Stream Flow:

Browser → nginx:8081 → thinking-stream.html
  ↓
EventSource connects to cortex:7081/stream/thinking/{session_id}
  ↓
ToolStreamManager.subscribe(session_id) → asyncio.Queue
  ↓
User sends message → /simple endpoint
  ↓
FunctionCaller emits events:
  - emit("thinking") → Queue → SSE → Browser
  - emit("tool_call") → Queue → SSE → Browser
  - emit("tool_result") → Queue → SSE → Browser
  - emit("done") → Queue → SSE → Browser
  ↓
Browser displays color-coded events in real-time

Documentation

  • Added THINKING_STREAM.md - Complete guide to "Show Your Work" feature
    • Usage examples with curl
    • Event type reference
    • Architecture diagrams
    • Demo page instructions
  • Added UI_THINKING_STREAM.md - UI integration documentation
    • Button placement and styling
    • Popup window behavior
    • Session association logic

Known Limitations

Tool Calling:

  • Limited to 5 iterations per request (prevents runaway loops)
  • Python sandbox has no filesystem persistence (temporary only)
  • Web search requires a Tavily API key (the free tier is not unlimited)
  • Trillium search requires separate knowledge base setup

Thinking Stream:

  • CORS wildcard (*) is development-only (should restrict in production)
  • Stream ends after "done" event (must reconnect for new request)
  • No historical replay (only shows real-time events)
  • Single session per stream window

Migration Notes

For Users Upgrading:

  1. New environment variable: STANDARD_MODE_ENABLE_TOOLS=true (default: enabled)
  2. Thinking stream accessible via "🧠 Show Work" button in UI
  3. Tool calling works automatically in Standard Mode when enabled
  4. No changes required to existing Standard Mode usage

For Developers:

  1. Cortex now includes CORS middleware for SSE
  2. New /stream/thinking/{session_id} endpoint available
  3. FunctionCaller requires session_id parameter for streaming
  4. Tool adapters can be extended by adding to AVAILABLE_TOOLS registry

[0.7.0] - 2025-12-21

Added - Standard Mode & UI Enhancements

Standard Mode Implementation

  • Added "Standard Mode" chat option that bypasses complex cortex reasoning pipeline
    • Provides simple chatbot functionality for coding and practical tasks
    • Maintains full conversation context across messages
    • Backend-agnostic - works with SECONDARY (Ollama), OPENAI, or custom backends
    • Created /simple endpoint in Cortex router cortex/router.py:389
  • Mode selector in UI with toggle between Standard and Cortex modes
    • Standard Mode: Direct LLM chat with context retention
    • Cortex Mode: Full 7-stage reasoning pipeline (unchanged)

Backend Selection System

  • UI settings modal with LLM backend selection for Standard Mode
    • Radio button selector: SECONDARY (Ollama/Qwen), OPENAI (GPT-4o-mini), or custom
    • Backend preference persisted in localStorage
    • Custom backend text input for advanced users
  • Backend parameter routing through entire stack:
    • UI sends backend parameter in request body
    • Relay forwards backend selection to Cortex
    • Cortex /simple endpoint respects user's backend choice
  • Environment-based fallback: Uses STANDARD_MODE_LLM if no backend specified

Session Management Overhaul

  • Complete rewrite of session system to use server-side persistence
    • File-based storage in core/relay/sessions/ directory
    • Session files: {sessionId}.json for history, {sessionId}.meta.json for metadata
    • Server is source of truth - sessions sync across browsers and reboots
  • Session metadata system for friendly names
    • Sessions display custom names instead of random IDs
    • Rename functionality in session dropdown
    • Last modified timestamps and message counts
  • Full CRUD API for sessions in Relay (usage sketch after this list):
    • GET /sessions - List all sessions with metadata
    • GET /sessions/:id - Retrieve session history
    • POST /sessions/:id - Save session history
    • PATCH /sessions/:id/metadata - Update session name/metadata
    • DELETE /sessions/:id - Delete session and metadata
  • Session management UI in settings modal:
    • List of all sessions with message counts and timestamps
    • Delete button for each session with confirmation
    • Automatic session cleanup when deleting current session
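
Example client calls against the session API above, assuming Relay listens on port 7078; the request body shapes (history, name) are illustrative, not a guaranteed schema.

```python
import requests

RELAY = "http://localhost:7078"   # adjust host/port to your deployment
sid = "demo-session"

# Save history for a session (body shape assumed for illustration).
requests.post(f"{RELAY}/sessions/{sid}", json={"history": [
    {"role": "user", "content": "hello"},
    {"role": "assistant", "content": "hi there"},
]})

# Rename the session via its metadata.
requests.patch(f"{RELAY}/sessions/{sid}/metadata", json={"name": "Demo chat"})

# List all sessions, fetch this one, then delete it.
print(requests.get(f"{RELAY}/sessions").json())
print(requests.get(f"{RELAY}/sessions/{sid}").json())
requests.delete(f"{RELAY}/sessions/{sid}")
```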

UI Improvements

  • Settings modal with hamburger menu (⚙ Settings button)
    • Backend selection section for Standard Mode
    • Session management section with delete functionality
    • Clean modal overlay with cyberpunk theme
    • ESC key and click-outside to close
  • Light/Dark mode toggle with dark mode as default
    • Theme preference persisted in localStorage
    • CSS variables for seamless theme switching
    • Toggle button shows current mode (🌙 Dark Mode / ☀️ Light Mode)
  • Removed redundant model selector dropdown from header
  • Fixed modal positioning and z-index layering
    • Modal moved outside #chat container for proper rendering
    • Fixed z-index: overlay (999), modal content (1001)
    • Centered modal with proper backdrop blur

Context Retention for Standard Mode

  • Integration with Intake module for conversation history
    • Added get_recent_messages() function in intake.py
    • Standard Mode retrieves last 20 messages from session buffer
    • Full context sent to LLM on each request
  • Message array format support in LLM router:
    • Updated Ollama provider to accept messages parameter
    • Updated OpenAI provider to accept messages parameter
    • Automatic conversion from messages to prompt string for non-chat APIs

Changed - Architecture & Routing

Relay Server Updates core/relay/server.js

  • ES module migration for session persistence:
    • Imported fs/promises, path, fileURLToPath for file operations
    • Created SESSIONS_DIR constant for session storage location
  • Mode-based routing in both /chat and /v1/chat/completions endpoints:
    • Extracts mode parameter from request body (default: "cortex")
    • Routes to CORTEX_SIMPLE for Standard Mode, CORTEX_REASON for Cortex Mode
    • Backend parameter only used in Standard Mode
  • Session persistence functions:
    • ensureSessionsDir() - Creates sessions directory if needed
    • loadSession(sessionId) - Reads session history from file
    • saveSession(sessionId, history, metadata) - Writes session to file
    • loadSessionMetadata(sessionId) - Reads session metadata
    • saveSessionMetadata(sessionId, metadata) - Updates session metadata
    • listSessions() - Returns all sessions with metadata, sorted by last modified
    • deleteSession(sessionId) - Removes session and metadata files

Cortex Router Updates cortex/router.py

  • Added backend field to ReasonRequest Pydantic model (optional)
  • Created /simple endpoint for Standard Mode:
    • Bypasses reflection, reasoning, refinement stages
    • Direct LLM call with conversation context
    • Uses backend from request or falls back to STANDARD_MODE_LLM env variable
    • Returns simple response structure without reasoning artifacts
  • Backend selection logic in /simple:
    • Normalizes backend names to uppercase
    • Maps UI backend names to system backend names
    • Validates backend availability before calling

Intake Integration cortex/intake/intake.py

  • Added get_recent_messages(session_id, limit) function:
    • Retrieves last N messages from session buffer
    • Returns empty list if session doesn't exist
    • Used by /simple endpoint for context retrieval
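
A self-contained sketch of the helper's behavior; the real get_recent_messages() reads from the shared SESSIONS buffer in cortex/intake/intake.py rather than the stand-in dict used here.

```python
SESSIONS: dict[str, list[dict]] = {}   # stand-in: session_id -> [{"role", "content"}, ...]


def get_recent_messages(session_id: str, limit: int = 20) -> list[dict]:
    """Return up to the last `limit` messages for a session, or [] if unknown."""
    return SESSIONS.get(session_id, [])[-limit:]


# Example: Standard Mode pulls up to 20 recent messages for LLM context.
SESSIONS["demo"] = [
    {"role": "user", "content": "hi"},
    {"role": "assistant", "content": "hello"},
]
print(get_recent_messages("demo", limit=20))
```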

LLM Router Enhancements cortex/llm/llm_router.py

  • Added messages parameter support across all providers
  • Automatic message-to-prompt conversion for legacy APIs
  • Chat completion format for Ollama and OpenAI providers
  • Stop sequences for MI50/DeepSeek R1 to prevent runaway generation:
    • "User:", "\nUser:", "Assistant:", "\n\n\n"

Environment Configuration .env

  • Added STANDARD_MODE_LLM=SECONDARY for default Standard Mode backend
  • Added CORTEX_SIMPLE_URL=http://cortex:7081/simple for routing

UI Architecture core/ui/index.html

  • Server-based session loading system:
    • loadSessionsFromServer() - Fetches sessions from Relay API
    • renderSessions() - Populates session dropdown from server data
    • Session state synchronized with server on every change
  • Backend selection persistence:
    • Loads saved backend from localStorage on page load
    • Includes backend parameter in request body when in Standard Mode
    • Settings modal pre-selects current backend choice
  • Dark mode by default:
    • Checks localStorage for theme preference
    • Sets dark theme if no preference found
    • Toggle button updates localStorage and applies theme

CSS Styling core/ui/style.css

  • Light mode CSS variables:
    • --bg-dark: #f5f5f5 (light background)
    • --text-main: #1a1a1a (dark text)
    • --text-fade: #666 (dimmed text)
  • Dark mode CSS variables (default):
    • --bg-dark: #0a0a0a (dark background)
    • --text-main: #e6e6e6 (light text)
    • --text-fade: #999 (dimmed text)
  • Modal positioning fixes:
    • position: fixed with top: 50%, left: 50%, transform: translate(-50%, -50%)
    • Z-index layering: overlay (999), content (1001)
    • Backdrop blur effect on modal overlay
  • Session list styling:
    • Session item cards with hover effects
    • Delete button with red hover state
    • Message count and timestamp display

Fixed - Critical Issues

DeepSeek R1 Runaway Generation

  • Root cause: R1 reasoning model generates thinking process and hallucinates conversations
  • Solution:
    • Changed STANDARD_MODE_LLM to SECONDARY (Ollama/Qwen) instead of PRIMARY (MI50/R1)
    • Added stop sequences to MI50 provider to prevent continuation
    • Documented R1 limitations for Standard Mode usage

Context Not Maintained in Standard Mode

  • Root cause: /simple endpoint didn't retrieve conversation history from Intake
  • Solution:
    • Created get_recent_messages() function in intake.py
    • Standard Mode now pulls last 20 messages from session buffer
    • Full context sent to LLM with each request
  • User feedback: "it's saying it hasn't received any other messages from me, so it looks like the standard mode llm isn't getting the full chat"

OpenAI Backend 400 Errors

  • Root cause: OpenAI provider only accepted prompt strings, not messages arrays
  • Solution: Updated OpenAI provider to support messages parameter like Ollama
  • Now handles chat completion format correctly

Modal Formatting Issues

  • Root cause: Settings modal inside #chat container with overflow constraints
  • Symptoms: Modal appearing at bottom, jumbled layout, couldn't close
  • Solution:
    • Moved modal outside #chat container to be direct child of body
    • Changed positioning from absolute to fixed
    • Added proper z-index layering (overlay: 999, content: 1001)
    • Removed old model selector from header
  • User feedback: "the formating for the settings is all off. Its at the bottom and all jumbling together, i cant get it to go away"

Session Persistence Broken

  • Root cause: Sessions stored only in localStorage, not synced with server
  • Symptoms: Sessions didn't persist across browsers or reboots, couldn't load messages
  • Solution: Complete rewrite of session system
    • Implemented server-side file persistence in Relay
    • Created CRUD API endpoints for session management
    • Updated UI to load sessions from server instead of localStorage
    • Added metadata system for session names
    • Sessions now survive container restarts and sync across browsers
  • User feedback: "sessions seem to exist locally only, i cant get them to actually load any messages and there is now way to delete them. If i open the ui in a different browser those arent there."

Technical Improvements

Backward Compatibility

  • All changes include defaults to maintain existing behavior
  • Cortex Mode completely unchanged - still uses full 7-stage pipeline
  • Standard Mode is opt-in via UI mode selector
  • If no backend specified, falls back to STANDARD_MODE_LLM env variable
  • Existing requests without mode parameter default to "cortex"

Code Quality

  • Consistent async/await patterns throughout stack
  • Proper error handling with fallbacks
  • Clean separation between Standard and Cortex modes
  • Session persistence abstracted into helper functions
  • Modular UI code with clear event handlers

Performance

  • Standard Mode bypasses 6 of 7 reasoning stages for faster responses
  • Session loading optimized with file-based caching
  • Backend selection happens once per message, not per LLM call
  • Minimal overhead for mode detection and routing

Architecture - Dual-Mode Chat System

Standard Mode Flow:

User (UI) → Relay → Cortex /simple → Intake (get_recent_messages)
→ LLM (direct call with context) → Relay → UI

Cortex Mode Flow (Unchanged):

User (UI) → Relay → Cortex /reason → Reflection → Reasoning
→ Refinement → Persona → Relay → UI

Session Persistence:

UI → POST /sessions/:id → Relay → File system (sessions/*.json)
UI → GET /sessions → Relay → List all sessions → UI dropdown

Known Limitations

Standard Mode:

  • No reflection, reasoning, or refinement stages
  • No RAG integration (same as Cortex Mode - currently disabled)
  • No NeoMem memory storage (same as Cortex Mode - currently disabled)
  • DeepSeek R1 not recommended for Standard Mode (generates reasoning artifacts)

Session Management:

  • Sessions stored in container filesystem - need volume mount for true persistence
  • No session import/export functionality yet
  • No session search or filtering

Migration Notes

For Users Upgrading:

  1. Existing sessions in localStorage will not automatically migrate to server
  2. Create new sessions after upgrade for server-side persistence
  3. Theme preference (light/dark) will be preserved from localStorage
  4. Backend preference will default to SECONDARY if not previously set

For Developers:

  1. Relay now requires fs/promises for session persistence
  2. Cortex /simple endpoint expects backend parameter (optional)
  3. UI sends mode and backend parameters in request body
  4. Session files stored in core/relay/sessions/ directory

[0.6.0] - 2025-12-18

Added - Autonomy System (Phase 1 & 2)

Autonomy Phase 1 - Self-Awareness & Planning Foundation

Autonomy Phase 2 - Decision Making & Proactive Behavior

Autonomy Phase 2.5 - Pipeline Refinement

  • Tightened integration between autonomy modules and reasoning pipeline
  • Enhanced self-state persistence and tracking
  • Improved orchestrator reliability
  • NeoMem integration refinements in vector store handling neomem/neomem/vector_stores/qdrant.py

Added - Documentation

  • Complete AI Agent Breakdown docs/PROJECT_LYRA_COMPLETE_BREAKDOWN.md
    • Comprehensive system architecture documentation
    • Detailed component descriptions
    • Data flow diagrams
    • Integration points and API specifications

Changed - Core Integration

  • Router Updates cortex/router.py
    • Integrated autonomy subsystems into main routing logic
    • Added endpoints for autonomous decision-making
    • Enhanced state management across requests
  • Reasoning Pipeline cortex/reasoning/reasoning.py
    • Integrated autonomy-aware reasoning
    • Self-state consideration in reasoning process
  • Persona Layer cortex/persona/speak.py
    • Autonomy-aware response generation
    • Self-state reflection in personality expression
  • Context Handling cortex/context.py
    • NeoMem disable capability for flexible deployment

Changed - Development Environment

Technical Improvements

  • Modular autonomy architecture with clear separation of concerns
  • Test-driven development for new autonomy features
  • Enhanced state persistence across system restarts
  • Flexible NeoMem integration with enable/disable controls

Architecture - Autonomy System Design

The autonomy system operates in layers:

  1. Executive Layer - High-level planning and goal setting
  2. Decision Layer - Evaluates options and makes choices
  3. Action Layer - Executes autonomous decisions
  4. Learning Layer - Adapts behavior based on patterns
  5. Monitoring Layer - Proactive awareness of system state

All layers coordinate through the orchestrator and maintain state in self_state.json.


[0.5.2] - 2025-12-12

Fixed - LLM Router & Async HTTP

  • Critical: Replaced synchronous requests with async httpx in LLM router cortex/llm/llm_router.py
    • Event loop blocking was causing timeouts and empty responses
    • All three providers (MI50, Ollama, OpenAI) now use await http_client.post()
    • Fixes "Expecting value: line 1 column 1 (char 0)" JSON parsing errors in intake
  • Critical: Fixed missing backend parameter in intake summarization cortex/intake/intake.py:285
    • Was defaulting to PRIMARY (MI50) instead of respecting INTAKE_LLM=SECONDARY
    • Now correctly uses configured backend (Ollama on 3090)
  • Relay: Fixed session ID case mismatch core/relay/server.js:87
    • UI sends sessionId (camelCase) but relay expected session_id (snake_case)
    • Now accepts both variants: req.body.session_id || req.body.sessionId
    • Custom session IDs now properly tracked instead of defaulting to "default"

Added - Error Handling & Diagnostics

  • Added comprehensive error handling in LLM router for all providers
    • HTTPError, JSONDecodeError, KeyError, and generic Exception handling
    • Detailed error messages with exception type and description
    • Provider-specific error logging (mi50, ollama, openai)
  • Added debug logging in intake summarization
    • Logs LLM response length and preview
    • Validates non-empty responses before JSON parsing
    • Helps diagnose empty or malformed responses
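
A sketch of the async call-and-error-handling pattern described in this release, assuming httpx and the 120-second timeout noted later in this entry; payload and response field names vary by provider and are illustrative here.

```python
import json

import httpx

http_client = httpx.AsyncClient(timeout=120)   # consistent timeout across providers


async def call_provider(url: str, payload: dict, provider: str) -> str:
    try:
        resp = await http_client.post(url, json=payload)
        resp.raise_for_status()
        data = resp.json()
        return data["response"]   # field name differs per provider; assumed here
    except httpx.HTTPError as exc:
        return f"[{provider}] HTTP error: {type(exc).__name__}: {exc}"
    except json.JSONDecodeError as exc:
        return f"[{provider}] invalid JSON in response: {exc}"
    except KeyError as exc:
        return f"[{provider}] unexpected response shape, missing {exc}"
    except Exception as exc:   # last-resort catch so the pipeline never crashes
        return f"[{provider}] unexpected error: {type(exc).__name__}: {exc}"
```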

Added - Session Management

  • Added session persistence endpoints in relay core/relay/server.js:160-171
    • GET /sessions/:id - Retrieve session history
    • POST /sessions/:id - Save session history
    • In-memory storage using Map (ephemeral, resets on container restart)
    • Fixes UI "Failed to load session" errors

Changed - Provider Configuration

  • Added mi50 provider support for llama.cpp server cortex/llm/llm_router.py:62-81
    • Uses /completion endpoint with n_predict parameter
    • Extracts content field from response
    • Configured for MI50 GPU with DeepSeek model
  • Increased memory retrieval threshold from 0.78 to 0.90 cortex/.env:20
    • Filters out low-relevance memories (only returns 90%+ similarity)
    • Reduces noise in context retrieval

Technical Improvements

  • Unified async HTTP handling across all LLM providers
  • Better separation of concerns between provider implementations
  • Improved error messages for debugging LLM API failures
  • Consistent timeout handling (120 seconds for all providers)

[0.5.1] - 2025-12-11

Fixed - Intake Integration

  • Critical: Fixed bg_summarize() function not defined error
    • Was only a TYPE_CHECKING stub, now implemented as logging stub
    • Eliminated NameError preventing SESSIONS from persisting correctly
    • Function now logs exchange additions and defers summarization to /reason endpoint
  • Critical: Fixed /ingest endpoint unreachable code in router.py:201-233
    • Removed early return that prevented update_last_assistant_message() from executing
    • Removed duplicate add_exchange_internal() call
    • Implemented lenient error handling (each operation wrapped in try/except)
  • Intake: Added missing __init__.py to make intake a proper Python package cortex/intake/__init__.py
    • Prevents namespace package issues
    • Enables proper module imports
    • Exports SESSIONS, add_exchange_internal, summarize_context

Added - Diagnostics & Debugging

  • Added diagnostic logging to verify SESSIONS singleton behavior
  • Added /debug/sessions HTTP endpoint router.py:276-305
    • Inspect SESSIONS from within running Uvicorn worker
    • Shows total sessions, session count, buffer sizes, recent exchanges
    • Returns SESSIONS object ID for verification
  • Added /debug/summary HTTP endpoint router.py:238-271
    • Test summarize_context() for any session
    • Returns L1/L5/L10/L20/L30 summaries
    • Includes buffer size and exchange preview

Changed - Intake Architecture

  • Intake no longer standalone service - runs inside Cortex container as pure Python module
    • Imported as from intake.intake import add_exchange_internal, SESSIONS
    • No HTTP calls between Cortex and Intake
    • Eliminates network latency and dependency on Intake service being up
  • Deferred summarization: bg_summarize() is now a no-op stub intake.py:318-325
    • Actual summarization happens during /reason call via summarize_context()
    • Simplifies async/sync complexity
    • Prevents NameError when called from add_exchange_internal()
  • Lenient error handling: /ingest endpoint always returns success router.py:201-233
    • Each operation wrapped in try/except
    • Logs errors but never fails to avoid breaking chat pipeline
    • User requirement: never fail chat pipeline
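
A sketch of the lenient /ingest pattern: each step wrapped in try/except, errors logged, and a success response returned regardless. The two helpers are stand-ins for the functions named in this entry; see cortex/router.py:201-233 for the real handler.

```python
import logging

from fastapi import APIRouter, Request

logger = logging.getLogger("cortex.ingest")
router = APIRouter()


def add_exchange_internal(payload: dict) -> None:
    """Stand-in for the intake helper named in this changelog."""


def update_last_assistant_message(payload: dict) -> None:
    """Stand-in for the follow-up step named in this changelog."""


@router.post("/ingest")
async def ingest(request: Request) -> dict:
    payload = await request.json()
    try:
        add_exchange_internal(payload)
    except Exception as exc:
        logger.error("ingest: add_exchange_internal failed: %s", exc)
    try:
        update_last_assistant_message(payload)
    except Exception as exc:
        logger.error("ingest: update_last_assistant_message failed: %s", exc)
    return {"status": "ok"}   # lenient: the chat pipeline never sees a failure here
```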

Documentation

  • Added single-worker constraint note in cortex/Dockerfile:7-8
    • Documents that SESSIONS requires single Uvicorn worker
    • Notes that multi-worker scaling requires Redis or shared storage
  • Updated plan documentation with root cause analysis

[0.5.0] - 2025-11-28

Fixed - Critical API Wiring & Integration

After the major architectural rewire (v0.4.x), this release fixes all critical endpoint mismatches and ensures end-to-end system connectivity.

Cortex → Intake Integration

  • Fixed IntakeClient to use correct Intake v0.2 API endpoints
    • Changed GET /context/{session_id} → GET /summaries?session_id={session_id}
    • Updated JSON response parsing to extract summary_text field
    • Fixed environment variable name: INTAKE_API → INTAKE_API_URL
    • Corrected default port: 7083 → 7080
    • Added deprecation warning to summarize_turn() method (endpoint removed in Intake v0.2)

Relay → UI Compatibility

  • Added OpenAI-compatible endpoint POST /v1/chat/completions
    • Accepts standard OpenAI format with messages[] array
    • Returns OpenAI-compatible response structure with choices[]
    • Extracts last message content from messages array
    • Includes usage metadata (stub values for compatibility)
  • Refactored Relay to use shared handleChatRequest() function
    • Both /chat and /v1/chat/completions use same core logic
    • Eliminates code duplication
    • Consistent error handling across endpoints
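
An example request against the OpenAI-compatible endpoint described above, assuming Relay listens on port 7078 (host and port may differ per deployment).

```python
import requests

resp = requests.post(
    "http://localhost:7078/v1/chat/completions",
    json={
        "messages": [
            {"role": "user", "content": "Summarize what changed in v0.5.0."}
        ]
    },
    timeout=120,
)
data = resp.json()
print(data["choices"][0]["message"]["content"])   # OpenAI-compatible response shape
```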

Relay → Intake Connection

  • Fixed Intake URL fallback in Relay server configuration
    • Corrected port: 7082 → 7080
    • Updated endpoint: /summary → /add_exchange
    • Now properly sends exchanges to Intake for summarization

Code Quality & Python Package Structure

  • Added missing __init__.py files to all Cortex subdirectories
    • cortex/llm/__init__.py
    • cortex/reasoning/__init__.py
    • cortex/persona/__init__.py
    • cortex/ingest/__init__.py
    • cortex/utils/__init__.py
    • Improves package imports and IDE support
  • Removed unused import in cortex/router.py: from unittest import result
  • Deleted empty file cortex/llm/resolve_llm_url.py (was 0 bytes, never implemented)

Verified Working

Complete end-to-end message flow now operational:

UI → Relay (/v1/chat/completions)
  ↓
Relay → Cortex (/reason)
  ↓
Cortex → Intake (/summaries) [retrieves context]
  ↓
Cortex 4-stage pipeline:
  1. reflection.py → meta-awareness notes
  2. reasoning.py → draft answer
  3. refine.py → polished answer
  4. persona/speak.py → Lyra personality
  ↓
Cortex → Relay (returns persona response)
  ↓
Relay → Intake (/add_exchange) [async summary]
  ↓
Intake → NeoMem (background memory storage)
  ↓
Relay → UI (final response)

Documentation

  • Added comprehensive v0.5.0 changelog entry
  • Updated README.md to reflect v0.5.0 architecture
    • Documented new endpoints
    • Updated data flow diagrams
    • Clarified Intake v0.2 changes
    • Corrected service descriptions

Issues Resolved

  • Cortex could not retrieve context from Intake (wrong endpoint)
  • UI could not send messages to Relay (endpoint mismatch)
  • Relay could not send summaries to Intake (wrong port/endpoint)
  • Python package imports were implicit (missing __init__.py)

Known Issues (Non-Critical)

  • Session management endpoints not implemented in Relay (GET/POST /sessions/:id)
  • RAG service currently disabled in docker-compose.yml
  • Cortex /ingest endpoint is a stub returning {"status": "ok"}

Migration Notes

If upgrading from v0.4.x:

  1. Pull latest changes from git
  2. Verify environment variables in .env files:
    • Check INTAKE_API_URL=http://intake:7080 (not INTAKE_API)
    • Verify all service URLs use correct ports
  3. Restart Docker containers: docker-compose down && docker-compose up -d
  4. Test with a simple message through the UI

[Infrastructure v1.0.0] - 2025-11-26

Changed - Environment Variable Consolidation

Major reorganization to eliminate duplication and improve maintainability

  • Consolidated 9 scattered .env files into single source of truth architecture
  • Root .env now contains all shared infrastructure (LLM backends, databases, API keys, service URLs)
  • Service-specific .env files minimized to only essential overrides:
    • cortex/.env: Reduced from 42 to 22 lines (operational parameters only)
    • neomem/.env: Reduced from 26 to 14 lines (LLM naming conventions only)
    • intake/.env: Kept at 8 lines (already minimal)
  • Result: ~24% reduction in total configuration lines (197 → ~150)

Docker Compose Consolidation

  • All services now defined in single root docker-compose.yml
  • Relay service updated with complete configuration (env_file, volumes)
  • Removed redundant core/docker-compose.yml (marked as DEPRECATED)
  • Standardized network communication to use Docker container names

Service URL Standardization

  • Internal services use container names: http://neomem-api:7077, http://cortex:7081
  • External services use IP addresses: http://10.0.0.43:8000 (vLLM), http://10.0.0.3:11434 (Ollama)
  • Removed IP/container name inconsistencies across files

Added - Security & Documentation

Security Templates - Created .env.example files for all services

  • Root .env.example with sanitized credentials
  • Service-specific templates: cortex/.env.example, neomem/.env.example, intake/.env.example, rag/.env.example
  • All .env.example files safe to commit to version control

Documentation

  • ENVIRONMENT_VARIABLES.md: Comprehensive reference for all environment variables
    • Variable descriptions, defaults, and usage examples
    • Multi-backend LLM strategy documentation
    • Troubleshooting guide
    • Security best practices
  • DEPRECATED_FILES.md: Deletion guide for deprecated files with verification steps

Enhanced .gitignore

  • Ignores all .env files (including subdirectories)
  • Tracks .env.example templates for documentation
  • Ignores .env-backups/ directory

Removed

  • core/.env - Redundant with root .env, now deleted
  • core/docker-compose.yml - Consolidated into main compose file (marked DEPRECATED)

Fixed

  • Eliminated duplicate OPENAI_API_KEY across 5+ files
  • Eliminated duplicate LLM backend URLs across 4+ files
  • Eliminated duplicate database credentials across 3+ files
  • Resolved Cortex environment: section override in docker-compose (now uses env_file)

Architecture - Multi-Backend LLM Strategy

Root .env provides all backend OPTIONS (PRIMARY, SECONDARY, CLOUD, FALLBACK), services choose which to USE:

  • Cortex → vLLM (PRIMARY) for autonomous reasoning
  • NeoMem → Ollama (SECONDARY) + OpenAI embeddings
  • Intake → vLLM (PRIMARY) for summarization
  • Relay → Fallback chain with user preference

Preserves per-service flexibility while eliminating URL duplication.

Migration

  • All original .env files backed up to .env-backups/ with timestamp 20251126_025334
  • Rollback plan documented in ENVIRONMENT_VARIABLES.md
  • Verification steps provided in DEPRECATED_FILES.md

[0.4.x] - 2025-11-13

Added - Multi-Stage Reasoning Pipeline

Cortex v0.5 - Complete architectural overhaul

  • New reasoning.py module

    • Async reasoning engine
    • Accepts user prompt, identity, RAG block, and reflection notes
    • Produces draft internal answers
    • Uses primary backend (vLLM)
  • New reflection.py module

    • Fully async meta-awareness layer
    • Produces actionable JSON "internal notes"
    • Enforces strict JSON schema and fallback parsing
    • Forces cloud backend (backend_override="cloud")
  • Integrated refine.py into pipeline

    • New stage between reflection and persona
    • Runs exclusively on primary vLLM backend (MI50)
    • Produces final, internally consistent output for downstream persona layer
  • Backend override system

    • Each LLM call can now select its own backend
    • Enables multi-LLM cognition: Reflection → cloud, Reasoning → primary
  • Identity loader

    • Added identity.py with load_identity() for consistent persona retrieval
  • Ingest handler

    • Async stub created for future Intake → NeoMem → RAG pipeline

Cortex v0.4.1 - RAG Integration

  • RAG integration
    • Added rag.py with query_rag() and format_rag_block()
    • Cortex now queries local RAG API (http://10.0.0.41:7090/rag/search)
    • Synthesized answers and top excerpts injected into reasoning prompt

Changed - Unified LLM Architecture

Cortex v0.5

  • Unified LLM backend URL handling across Cortex

    • ENV variables must now contain FULL API endpoints
    • Removed all internal path-appending (e.g. .../v1/completions)
    • llm_router.py rewritten to use env-provided URLs as-is
    • Ensures consistent behavior between draft, reflection, refine, and persona
  • Rebuilt main.py

    • Removed old annotation/analysis logic
    • New structure: load identity → get RAG → reflect → reason → return draft+notes
    • Routes now clean and minimal (/reason, /ingest, /health)
    • Async path throughout Cortex
  • Refactored llm_router.py

    • Removed old fallback logic during overrides
    • OpenAI requests now use /v1/chat/completions
    • Added proper OpenAI Authorization headers
    • Distinct payload format for vLLM vs OpenAI
    • Unified, correct parsing across models
  • Simplified Cortex architecture

    • Removed deprecated "context.py" and old reasoning code
    • Relay completely decoupled from smart behavior
  • Updated environment specification

    • LLM_PRIMARY_URL now set to http://10.0.0.43:8000/v1/completions
    • LLM_SECONDARY_URL remains http://10.0.0.3:11434/api/generate (Ollama)
    • LLM_CLOUD_URL set to https://api.openai.com/v1/chat/completions

Cortex v0.4.1

  • Revised /reason endpoint

    • Now builds unified context blocks: [Intake] → recent summaries, [RAG] → contextual knowledge, [User Message] → current input
    • Calls call_llm() for first pass, then reflection_loop() for meta-evaluation
    • Returns cortex_prompt, draft_output, final_output, and normalized reflection
  • Reflection Pipeline Stability

    • Cleaned parsing to normalize JSON vs. text reflections
    • Added fallback handling for malformed or non-JSON outputs
    • Log system improved to show raw JSON, extracted fields, and normalized summary
  • Async Summarization (Intake v0.2.1)

    • Intake summaries now run in background threads to avoid blocking Cortex
    • Summaries (L1 → L∞) logged asynchronously with [BG] tags
  • Environment & Networking Fixes

    • Verified .env variables propagate correctly inside Cortex container
    • Confirmed Docker network connectivity between Cortex, Intake, NeoMem, and RAG
    • Adjusted localhost calls to service-IP mapping
  • Behavioral Updates

    • Cortex now performs conversation reflection (on user intent) and self-reflection (on its own answers)
    • RAG context successfully grounds reasoning outputs
    • Intake and NeoMem confirmed receiving summaries via /add_exchange
    • Log clarity pass: all reflective and contextual blocks clearly labeled

Fixed

Cortex v0.5

  • Resolved endpoint conflict where router expected base URLs and refine expected full URLs
    • Fixed by standardizing full-URL behavior across entire system
  • Reflection layer no longer fails silently (previously returned [""] due to MythoMax)
  • Resolved 404/401 errors caused by incorrect OpenAI URL endpoints
  • No more double-routing through vLLM during reflection
  • Corrected async/sync mismatch in multiple locations
  • Eliminated double-path bug (/v1/completions/v1/completions) caused by previous router logic

Removed

Cortex v0.5

  • Legacy annotate, reason_check glue logic from old architecture
  • Old backend probing junk code
  • Stale imports and unused modules leftover from previous prototype

Verified

Cortex v0.5

  • Cortex → vLLM (MI50) → refine → final_output now functioning correctly
  • Refine shows used_primary_backend: true and no fallback
  • Manual curl test confirms endpoint accuracy

Known Issues

Cortex v0.5

  • Refine sometimes prefixes output with "Final Answer:"; next version will sanitize this
  • Hallucinations in draft_output persist due to weak grounding (fix in reasoning + RAG planned)

Cortex v0.4.1

  • NeoMem tuning needed - improve retrieval latency and relevance
  • Need dedicated /reflections/recent endpoint for Cortex
  • Migrate to Cortex-first ingestion (Relay → Cortex → NeoMem)
  • Add persistent reflection recall (use prior reflections as meta-context)
  • Improve reflection JSON structure ("insight", "evaluation", "next_action" → guaranteed fields)
  • Tighten temperature and prompt control for factual consistency
  • RAG optimization: add source ranking, filtering, multi-vector hybrid search
  • Cache RAG responses per session to reduce duplicate calls

Notes

Cortex v0.5

This is the largest structural change to Cortex so far. It establishes:

  • Multi-model cognition
  • Clean layering
  • Identity + reflection separation
  • Correct async code
  • Deterministic backend routing
  • Predictable JSON reflection

The system is now ready for:

  • Refinement loops
  • Persona-speaking layer
  • Containerized RAG
  • Long-term memory integration
  • True emergent-behavior experiments

[0.3.x] - 2025-10-28 to 2025-09-26

Added

[Lyra Core v0.3.2 + Web UI v0.2.0] - 2025-10-28

  • New UI

    • Cleaned up UI look and feel
  • Sessions

    • Sessions now persist over time
    • Ability to create new sessions or load sessions from previous instance
    • Switching sessions updates what is sent to Relay (messages from other sessions are not included in the prompt)
    • Relay correctly wired in

[Lyra-Core 0.3.1] - 2025-10-09

  • NVGRAM Integration (Full Pipeline Reconnected)
    • Replaced legacy Mem0 service with NVGRAM microservice (nvgram-api @ port 7077)
    • Updated server.js in Relay to route all memory ops via ${NVGRAM_API}/memories and /search
    • Added .env variable: NVGRAM_API=http://nvgram-api:7077
    • Verified end-to-end Lyra conversation persistence: relay → nvgram-api → postgres/neo4j → relay → ollama → ui
    • Memories stored, retrieved, and re-injected successfully

[Lyra-Core v0.3.0] - 2025-09-26

  • Salience filtering in Relay
    • .env configurable: SALIENCE_ENABLED, SALIENCE_MODE, SALIENCE_MODEL, SALIENCE_API_URL
    • Supports heuristic and llm classification modes
    • LLM-based salience filter integrated with Cortex VM running llama-server
  • Logging improvements
    • Added debug logs for salience mode, raw LLM output, and unexpected outputs
    • Fail-closed behavior for unexpected LLM responses
  • Successfully tested with Phi-3.5-mini and Qwen2-0.5B-Instruct as salience classifiers
  • Verified end-to-end flow: Relay → salience filter → Mem0 add/search → Persona injection → LLM reply

[Cortex v0.3.0] - 2025-10-31

  • Cortex Service (FastAPI)

    • New standalone reasoning engine (cortex/main.py) with endpoints:
      • GET /health reports active backend + NeoMem status
      • POST /reason evaluates {prompt, response} pairs
      • POST /annotate experimental text analysis
    • Background NeoMem health monitor (5-minute interval)
  • Multi-Backend Reasoning Support

    • Environment-driven backend selection via LLM_FORCE_BACKEND
    • Supports: Primary (vLLM MI50), Secondary (Ollama 3090), Cloud (OpenAI), Fallback (llama.cpp CPU)
    • Per-backend model variables: LLM_PRIMARY_MODEL, LLM_SECONDARY_MODEL, LLM_CLOUD_MODEL, LLM_FALLBACK_MODEL
  • Response Normalization Layer

    • Implemented normalize_llm_response() to merge streamed outputs and repair malformed JSON (sketch after this list)
    • Handles Ollama's multi-line streaming and Mythomax's missing punctuation issues
    • Prints concise debug previews of merged content
  • Environment Simplification

    • Each service (intake, cortex, neomem) now maintains its own .env file
    • Removed reliance on shared/global env file to prevent cross-contamination
    • Verified Docker Compose networking across containers
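
A rough sketch of the normalization idea referenced above: merge Ollama-style streamed JSON lines into one string and pass through anything that is not JSON. The real normalize_llm_response() likely handles more repair cases.

```python
import json


def normalize_llm_response(raw: str) -> str:
    """Merge streamed chunks / non-JSON text into a single response string."""
    chunks = []
    for line in raw.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            chunks.append(line)   # not JSON: keep the raw text as-is
            continue
        # Ollama streaming lines carry partial text in the "response" field.
        chunks.append(obj.get("response") or obj.get("content") or "")
    return "".join(chunks).strip()
```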

[NeoMem 0.1.2] - 2025-10-27 (formerly NVGRAM)

  • Renamed NVGRAM to NeoMem
    • All future updates under name NeoMem
    • Features unchanged

[NVGRAM 0.1.1] - 2025-10-08

  • Async Memory Rewrite (Stability + Safety Patch)
    • Introduced AsyncMemory class with fully asynchronous vector and graph store writes
    • Added input sanitation to prevent embedding errors ('list' object has no attribute 'replace')
    • Implemented flatten_messages() helper in API layer to clean malformed payloads
    • Added structured request logging via RequestLoggingMiddleware (FastAPI middleware)
    • Health endpoint (/health) returns structured JSON {status, version, service}
    • Startup logs include sanitized embedder config with masked API keys

[NVGRAM 0.1.0] - 2025-10-07

  • Initial fork of Mem0 → NVGRAM
    • Created fully independent local-first memory engine based on Mem0 OSS
    • Renamed all internal modules, Docker services, environment variables from mem0 → nvgram
    • New service name: nvgram-api, default port 7077
    • Maintains same API endpoints (/memories, /search) for drop-in compatibility
    • Uses FastAPI, Postgres, and Neo4j as persistent backends

[Lyra-Mem0 0.3.2] - 2025-10-05

  • Ollama LLM reasoning alongside OpenAI embeddings
    • Introduced LLM_PROVIDER=ollama, LLM_MODEL, and OLLAMA_HOST in .env.3090
    • Verified local 3090 setup using qwen2.5:7b-instruct-q4_K_M
    • Split processing: Embeddings → OpenAI text-embedding-3-small, LLM → Local Ollama
  • Added .env.3090 template for self-hosted inference nodes
  • Integrated runtime diagnostics and seeder progress tracking
    • File-level + message-level progress bars
    • Retry/back-off logic for timeouts (3 attempts)
    • Event logging (ADD / UPDATE / NONE) for every memory record
  • Expanded Docker health checks for Postgres, Qdrant, and Neo4j containers
  • Added GPU-friendly long-run configuration for continuous seeding (validated on RTX 3090)

[Lyra-Mem0 0.3.1] - 2025-10-03

  • HuggingFace TEI integration (local 3090 embedder)
  • Dual-mode environment switch between OpenAI cloud and local
  • CSV export of memories from Postgres (payload->>'data')

[Lyra-Mem0 0.3.0]

  • Ollama embeddings in Mem0 OSS container
    • Configure EMBEDDER_PROVIDER=ollama, EMBEDDER_MODEL, OLLAMA_HOST via .env
    • Mounted main.py override from host into container to load custom DEFAULT_CONFIG
    • Installed ollama Python client into custom API container image
  • .env.3090 file for external embedding mode (3090 machine)
  • Workflow for multiple embedding modes: LAN-based 3090/Ollama, Local-only CPU, OpenAI fallback

[Lyra-Mem0 v0.2.1]

  • Seeding pipeline
    • Built Python seeder script to bulk-insert raw Cloud Lyra exports into Mem0
    • Implemented incremental seeding option (skip existing memories, only add new ones)
    • Verified insert process with Postgres-backed history DB

[Intake v0.1.0] - 2025-10-27

  • Receives messages from relay and summarizes them in cascading format
  • Continues to summarize smaller amounts of exchanges while generating large-scale conversational summaries (L20)
  • Currently logs summaries to .log file in /project-lyra/intake-logs/

[Lyra-Cortex v0.2.0] - 2025-09-26

  • Integrated llama-server on dedicated Cortex VM (Proxmox)
  • Verified Phi-3.5-mini-instruct_Uncensored-Q4_K_M running with 8 vCPUs
  • Benchmarked Phi-3.5-mini performance: ~18 tokens/sec CPU-only on Ryzen 7 7800X
  • Salience classification functional but sometimes inconsistent
  • Tested Qwen2-0.5B-Instruct GGUF as alternative salience classifier
    • Much faster throughput (~350 tokens/sec prompt, ~100 tokens/sec eval)
    • More responsive but over-classifies messages as "salient"
  • Established .env integration for model ID (SALIENCE_MODEL), enabling hot-swap between models

Changed

[Lyra-Core 0.3.1] - 2025-10-09

  • Renamed MEM0_URL → NVGRAM_API across all relay environment configs
  • Updated Docker Compose service dependency order
    • relay now depends on nvgram-api healthcheck
    • Removed mem0 references and volumes
  • Minor cleanup to Persona fetch block (null-checks and safer default persona string)

[Lyra-Core v0.3.1] - 2025-09-27

  • Removed salience filter logic; Cortex is now default annotator
  • All user messages stored in Mem0; no discard tier applied
  • Cortex annotations (metadata.cortex) now attached to memories
  • Debug logging improvements
    • Pretty-print Cortex annotations
    • Injected prompt preview
    • Memory search hit list with scores
  • .env toggle (CORTEX_ENABLED) to bypass Cortex when needed

[Lyra-Core v0.3.0] - 2025-09-26

  • Refactored server.js to gate mem.add() calls behind salience filter
  • Updated .env to support SALIENCE_MODEL

[Cortex v0.3.0] - 2025-10-31

  • Refactored reason_check() to dynamically switch between prompt and chat mode depending on backend
  • Enhanced startup logs to announce active backend, model, URL, and mode
  • Improved error handling with clearer "Reasoning error" messages

[NVGRAM 0.1.1] - 2025-10-08

  • Replaced synchronous Memory.add() with async-safe version supporting concurrent vector + graph writes
  • Normalized indentation and cleaned duplicate main.py references
  • Removed redundant FastAPI() app reinitialization
  • Updated internal logging to INFO-level timing format
  • Deprecated @app.on_event("startup") → will migrate to lifespan handler in v0.1.2

[NVGRAM 0.1.0] - 2025-10-07

  • Removed dependency on external mem0ai SDK — all logic now local
  • Re-pinned requirements: fastapi==0.115.8, uvicorn==0.34.0, pydantic==2.10.4, python-dotenv==1.0.1, psycopg>=3.2.8, ollama
  • Adjusted docker-compose and .env templates to use new NVGRAM naming

[Lyra-Mem0 0.3.2] - 2025-10-05

  • Updated main.py configuration block to load LLM_PROVIDER, LLM_MODEL, OLLAMA_BASE_URL
    • Fallback to OpenAI if Ollama unavailable
  • Adjusted docker-compose.yml mount paths to correctly map /app/main.py
  • Normalized .env loading so mem0-api and host environment share identical values
  • Improved seeder logging and progress telemetry
  • Added explicit temperature field to DEFAULT_CONFIG['llm']['config']

[Lyra-Mem0 0.3.0]

  • docker-compose.yml updated to mount local main.py and .env.3090
  • Built custom Dockerfile (mem0-api-server:latest) extending base image with pip install ollama
  • Updated requirements.txt to include ollama package
  • Adjusted Mem0 container config so main.py pulls environment variables with dotenv
  • Tested new embeddings path with curl /memories API call

[Lyra-Mem0 v0.2.1]

  • Updated main.py to load configuration from .env using dotenv and support multiple embedder backends
  • Mounted host main.py into container so local edits persist across rebuilds
  • Updated docker-compose.yml to mount .env.3090 and support swap between profiles
  • Built custom Dockerfile (mem0-api-server:latest) including pip install ollama
  • Updated requirements.txt with ollama dependency
  • Adjusted startup flow so container automatically connects to external Ollama host (LAN IP)
  • Added logging to confirm model pulls and embedding requests

Fixed

[Lyra-Core 0.3.1] - 2025-10-09

  • Relay startup no longer crashes when NVGRAM is unavailable — deferred connection handling
  • /memories POST failures no longer crash Relay; now logged gracefully as relay error Error: memAdd failed: 500
  • Improved injected prompt debugging (DEBUG_PROMPT=true now prints clean JSON)

[Lyra-Core v0.3.1] - 2025-09-27

  • Parsing failures from Markdown-wrapped Cortex JSON via fence cleaner
  • Relay no longer "hangs" on malformed Cortex outputs

[Cortex v0.3.0] - 2025-10-31

  • Corrected broken vLLM endpoint routing (/v1/completions)
  • Stabilized cross-container health reporting for NeoMem
  • Resolved JSON parse failures caused by streaming chunk delimiters

[NVGRAM 0.1.1] - 2025-10-08

  • Eliminated repeating 500 error from OpenAI embedder caused by non-string message content
  • Masked API key leaks from boot logs
  • Ensured Neo4j reconnects gracefully on first retry

[Lyra-Mem0 0.3.2] - 2025-10-05

  • Resolved crash during startup: TypeError: OpenAIConfig.__init__() got an unexpected keyword argument 'ollama_base_url'
  • Corrected mount type mismatch (file vs directory) causing OCI runtime create failed errors
  • Prevented duplicate or partial postings when retry logic triggered multiple concurrent requests
  • "Unknown event" warnings now safely ignored (no longer break seeding loop)
  • Confirmed full dual-provider operation in logs (api.openai.com + 10.0.0.3:11434/api/chat)

[Lyra-Mem0 0.3.1] - 2025-10-03

  • .env CRLF vs LF line ending issues
  • Local seeding now possible via HuggingFace server

[Lyra-Mem0 0.3.0]

  • Resolved container boot failure caused by missing ollama dependency (ModuleNotFoundError)
  • Fixed config overwrite issue where rebuilding container restored stock main.py
  • Worked around Neo4j error (vector.similarity.cosine(): mismatched vector dimensions) by confirming OpenAI vs. Ollama embedding vector sizes

[Lyra-Mem0 v0.2.1]

  • Seeder process originally failed on old memories — now skips duplicates and continues batch
  • Resolved container boot error (ModuleNotFoundError: ollama) by extending image
  • Fixed overwrite issue where stock main.py replaced custom config during rebuild
  • Worked around Neo4j vector.similarity.cosine() dimension mismatch

Known Issues

[Lyra-Core v0.3.0] - 2025-09-26

  • Small models (e.g. Qwen2-0.5B) tend to over-classify as "salient"
  • Phi-3.5-mini sometimes returns truncated tokens ("sali", "fi")
  • CPU-only inference is functional but limited; larger models recommended once GPU available

[Lyra-Cortex v0.2.0] - 2025-09-26

  • Small models tend to drift or over-classify
  • CPU-only 7B+ models expected to be slow; GPU passthrough recommended for larger models
  • Need to set up systemd service for llama-server to auto-start on VM reboot

Observations

[Lyra-Mem0 0.3.2] - 2025-10-05

  • Stable GPU utilization: ~8 GB VRAM @ 92% load, ≈ 67°C under sustained seeding
  • Next revision will re-format seed JSON to preserve role context (user vs assistant)

[Lyra-Mem0 v0.2.1]

  • To fully unify embedding modes, a Hugging Face / local model with 1536-dim embeddings will be needed (to match OpenAI's schema)
  • Current Ollama model (mxbai-embed-large) works, but returns 1024-dim vectors
  • Seeder workflow validated but should be wrapped in repeatable weekly run for full Cloud→Local sync

Next Steps

[Lyra-Core 0.3.1] - 2025-10-09

  • Add salience visualization (e.g., memory weights displayed in injected system message)
  • Begin schema alignment with NVGRAM v0.1.2 for confidence scoring
  • Add relay auto-retry for transient 500 responses from NVGRAM

[NVGRAM 0.1.1] - 2025-10-08

  • Integrate salience scoring and embedding confidence weight fields in Postgres schema
  • Begin testing with full Lyra Relay + Persona Sidecar pipeline for live session memory recall
  • Migrate from deprecated on_event → lifespan pattern in 0.1.2

[NVGRAM 0.1.0] - 2025-10-07

  • Integrate NVGRAM as new default backend in Lyra Relay
  • Deprecate remaining Mem0 references and archive old configs
  • Begin versioning as standalone project (nvgram-core, nvgram-api, etc.)

[Intake v0.1.0] - 2025-10-27

  • Feed intake into NeoMem
  • Generate daily/hourly overall summaries (e.g., "Today Brian and Lyra worked on x, y, and z")
  • Generate session-aware summaries with their own intake hopper

[0.2.x] - 2025-09-30 to 2025-09-24

Added

[Lyra-Mem0 v0.2.0] - 2025-09-30

  • Standalone Lyra-Mem0 stack created at ~/lyra-mem0/
    • Includes Postgres (pgvector), Qdrant, Neo4j, and SQLite for history tracking
    • Added working docker-compose.mem0.yml and custom Dockerfile for building Mem0 API server
  • Verified REST API functionality
    • POST /memories works for adding memories
    • POST /search works for semantic search
  • Successful end-to-end test with persisted memory: "Likes coffee in the morning" → retrievable via search

[Lyra-Core v0.2.0] - 2025-09-24

  • Migrated Relay to use mem0ai SDK instead of raw fetch calls
  • Implemented sessionId support (client-supplied, fallback to default)
  • Added debug logs for memory add/search
  • Cleaned up Relay structure for clarity

Changed

[Lyra-Mem0 v0.2.0] - 2025-09-30

  • Split architecture into modular stacks:
    • ~/lyra-core (Relay, Persona-Sidecar, etc.)
    • ~/lyra-mem0 (Mem0 OSS memory stack)
  • Removed old embedded mem0 containers from Lyra-Core compose file
  • Added Lyra-Mem0 section in README.md

Next Steps

[Lyra-Mem0 v0.2.0] - 2025-09-30

  • Wire Relay → Mem0 API (integration not yet complete)
  • Add integration tests to verify persistence and retrieval from within Lyra-Core

[0.1.x] - 2025-09-25 to 2025-09-23

Added

[Lyra_RAG v0.1.0] - 2025-11-07

  • Initial standalone RAG module for Project Lyra
  • Persistent ChromaDB vector store (./chromadb)
  • Importer rag_chat_import.py (chunking/dedup sketch after this list) with:
    • Recursive folder scanning and category tagging
    • Smart chunking (~5k chars)
    • SHA-1 deduplication and chat-ID metadata
    • Timestamp fields (file_modified, imported_at)
    • Background-safe operation (nohup/tmux)
  • 68 Lyra-category chats imported:
    • 6,556 new chunks added
    • 1,493 duplicates skipped
    • 7,997 total vectors stored
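
A sketch of the chunking and SHA-1 deduplication steps; a plain fixed-size splitter stands in for the importer's smart chunking, and only the ~5k-character chunk size is taken from the list above.

```python
import hashlib

CHUNK_SIZE = 5000   # ~5k characters per chunk


def chunk_text(text: str, size: int = CHUNK_SIZE) -> list[str]:
    """Split a document into roughly fixed-size character chunks."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def dedupe_chunks(chunks: list[str], seen_hashes: set[str]) -> list[str]:
    """Drop chunks whose SHA-1 hash has already been imported."""
    fresh = []
    for chunk in chunks:
        digest = hashlib.sha1(chunk.encode("utf-8")).hexdigest()
        if digest in seen_hashes:
            continue   # duplicate: skip re-importing
        seen_hashes.add(digest)
        fresh.append(chunk)
    return fresh
```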

[Lyra_RAG v0.1.0 API] - 2025-11-07

  • /rag/search FastAPI endpoint implemented (port 7090)
  • Supports natural-language queries and returns top related excerpts
  • Added answer synthesis step using gpt-4o-mini

[Lyra-Core v0.1.0] - 2025-09-23

  • First working MVP of Lyra Core Relay
  • Relay service accepts POST /v1/chat/completions (OpenAI-compatible)
  • Memory integration with Mem0:
    • POST /memories on each user message
    • POST /search before LLM call
  • Persona Sidecar integration (GET /current)
  • OpenAI GPT + Ollama (Mythomax) support in Relay
  • Simple browser-based chat UI (talks to Relay at http://<host>:7078)
  • .env standardization for Relay + Mem0 + Postgres + Neo4j
  • Working Neo4j + Postgres backing stores for Mem0
  • Initial MVP relay service with raw fetch calls to Mem0
  • Dockerized with basic healthcheck

[Lyra-Cortex v0.1.0] - 2025-09-25

  • First deployment as dedicated Proxmox VM (5 vCPU / 18 GB RAM / 100 GB SSD)
  • Built llama.cpp with llama-server target via CMake
  • Integrated Phi-3.5 Mini Instruct (Uncensored, Q4_K_M GGUF) model
  • Verified API compatibility at /v1/chat/completions
  • Local test successful via curl → ~523 token response generated
  • Performance benchmark: ~11.5 tokens/sec (CPU-only on Ryzen 7800X)
  • Confirmed usable for salience scoring, summarization, and lightweight reasoning

Fixed

[Lyra-Core v0.1.0] - 2025-09-23

  • Resolved crash loop in Neo4j by restricting env vars (NEO4J_AUTH only)
  • Relay now correctly reads MEM0_URL and MEM0_API_KEY from .env

Verified

[Lyra_RAG v0.1.0] - 2025-11-07

  • Successful recall of Lyra-Core development history (v0.3.0 snapshot)
  • Correct metadata and category tagging for all new imports

Known Issues

[Lyra-Core v0.1.0] - 2025-09-23

  • No feedback loop (thumbs up/down) yet
  • Forget/delete flow is manual (via memory IDs)
  • Memory latency ~14s depending on embedding model

Next Planned

[Lyra_RAG v0.1.0] - 2025-11-07

  • Optional where filter parameter for category/date queries
  • Graceful "no results" handler for empty retrievals
  • rag_docs_import.py for PDFs and other document types