
Project Lyra Changelog

All notable changes to Project Lyra are documented in this file. The format is based on Keep a Changelog, and the project adheres to Semantic Versioning.


[Unreleased]


[0.5.2] - 2025-12-12

Fixed - LLM Router & Async HTTP

  • Critical: Replaced synchronous requests with async httpx in LLM router cortex/llm/llm_router.py (see the sketch after this list)
    • Event loop blocking was causing timeouts and empty responses
    • All three providers (MI50, Ollama, OpenAI) now use await http_client.post()
    • Fixes "Expecting value: line 1 column 1 (char 0)" JSON parsing errors in intake
  • Critical: Fixed missing backend parameter in intake summarization cortex/intake/intake.py:285
    • Was defaulting to PRIMARY (MI50) instead of respecting INTAKE_LLM=SECONDARY
    • Now correctly uses configured backend (Ollama on 3090)
  • Relay: Fixed session ID case mismatch core/relay/server.js:87
    • UI sends sessionId (camelCase) but relay expected session_id (snake_case)
    • Now accepts both variants: req.body.session_id || req.body.sessionId
    • Custom session IDs now properly tracked instead of defaulting to "default"
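
A minimal sketch of the async call-and-error-handling pattern now used in the router, assuming a module-level httpx.AsyncClient; payload shapes and response fields differ per provider, so the names here are illustrative rather than the actual code:

```python
import json
import httpx

# Shared client; 120-second timeout applied consistently across providers
http_client = httpx.AsyncClient(timeout=120.0)

async def call_provider(url: str, payload: dict, provider: str) -> str:
    try:
        resp = await http_client.post(url, json=payload)  # async, no event-loop blocking
        resp.raise_for_status()
        data = resp.json()
        return data["content"]  # field name varies by provider: "content", "response", "choices", ...
    except httpx.HTTPError as e:
        return f"[{provider}] HTTP error: {type(e).__name__}: {e}"
    except json.JSONDecodeError as e:
        return f"[{provider}] JSON decode error: {e}"
    except KeyError as e:
        return f"[{provider}] missing field in response: {e}"
    except Exception as e:
        return f"[{provider}] unexpected error: {type(e).__name__}: {e}"
```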

Added - Error Handling & Diagnostics

  • Added comprehensive error handling in LLM router for all providers
    • HTTPError, JSONDecodeError, KeyError, and generic Exception handling
    • Detailed error messages with exception type and description
    • Provider-specific error logging (mi50, ollama, openai)
  • Added debug logging in intake summarization
    • Logs LLM response length and preview
    • Validates non-empty responses before JSON parsing
    • Helps diagnose empty or malformed responses
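
A sketch of the validation now done around intake summarization, assuming the summary arrives from the router as a JSON string; the logger and function names are illustrative:

```python
import json
import logging

log = logging.getLogger("intake")

def parse_summary(raw: str) -> dict:
    # Log length and a short preview so empty or malformed responses are easy to spot
    log.debug("LLM summary response: %d chars, preview=%r", len(raw), raw[:120])
    if not raw or not raw.strip():
        log.warning("Empty LLM response; skipping JSON parse")
        return {}
    try:
        return json.loads(raw)
    except json.JSONDecodeError as e:
        log.error("Malformed summary JSON: %s", e)
        return {}
```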

Added - Session Management

  • Added session persistence endpoints in relay core/relay/server.js:160-171
    • GET /sessions/:id - Retrieve session history
    • POST /sessions/:id - Save session history
    • In-memory storage using Map (ephemeral, resets on container restart)
    • Fixes UI "Failed to load session" errors

Changed - Provider Configuration

  • Added mi50 provider support for llama.cpp server cortex/llm/llm_router.py:62-81 (see the sketch after this list)
    • Uses /completion endpoint with n_predict parameter
    • Extracts content field from response
    • Configured for MI50 GPU with DeepSeek model
  • Increased memory retrieval threshold from 0.78 to 0.90 cortex/.env:20
    • Filters out low-relevance memories (only returns 90%+ similarity)
    • Reduces noise in context retrieval
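
A sketch of the mi50 branch described above, reusing the shared async client from the earlier sketch; MI50_URL is a placeholder for the configured llama.cpp server, and the /completion request with n_predict and the content field follow the llama.cpp server API:

```python
import os

MI50_URL = os.getenv("MI50_URL", "")  # placeholder; point at the configured llama.cpp server

async def call_mi50(prompt: str, max_tokens: int = 512) -> str:
    # llama.cpp /completion endpoint; n_predict caps the number of generated tokens
    payload = {"prompt": prompt, "n_predict": max_tokens}
    resp = await http_client.post(f"{MI50_URL}/completion", json=payload)
    resp.raise_for_status()
    return resp.json()["content"]  # llama.cpp returns the generated text in "content"
```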

Technical Improvements

  • Unified async HTTP handling across all LLM providers
  • Better separation of concerns between provider implementations
  • Improved error messages for debugging LLM API failures
  • Consistent timeout handling (120 seconds for all providers)

[0.5.1] - 2025-12-11

Fixed - Intake Integration

  • Critical: Fixed bg_summarize() function not defined error
    • Was only a TYPE_CHECKING stub, now implemented as logging stub
    • Eliminated NameError preventing SESSIONS from persisting correctly
    • Function now logs exchange additions and defers summarization to /reason endpoint
  • Critical: Fixed /ingest endpoint unreachable code in router.py:201-233
    • Removed early return that prevented update_last_assistant_message() from executing
    • Removed duplicate add_exchange_internal() call
    • Implemented lenient error handling (each operation wrapped in try/except)
  • Intake: Added missing __init__.py to make intake a proper Python package cortex/intake/__init__.py
    • Prevents namespace package issues
    • Enables proper module imports
    • Exports SESSIONS, add_exchange_internal, summarize_context
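
A sketch of what the new package initializer re-exports, per the item above; the actual file may differ:

```python
# cortex/intake/__init__.py -- makes intake a regular (non-namespace) package
from .intake import SESSIONS, add_exchange_internal, summarize_context

__all__ = ["SESSIONS", "add_exchange_internal", "summarize_context"]
```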

Added - Diagnostics & Debugging

  • Added diagnostic logging to verify SESSIONS singleton behavior
  • Added /debug/sessions HTTP endpoint router.py:276-305 (see the sketch after this list)
    • Inspect SESSIONS from within running Uvicorn worker
    • Shows total sessions, session count, buffer sizes, recent exchanges
    • Returns SESSIONS object ID for verification
  • Added /debug/summary HTTP endpoint router.py:238-271
    • Test summarize_context() for any session
    • Returns L1/L5/L10/L20/L30 summaries
    • Includes buffer size and exchange preview
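
A sketch of the /debug/sessions endpoint, assuming a FastAPI router and a dict-like SESSIONS singleton holding per-session buffers; the shape of each session entry is an assumption:

```python
from fastapi import APIRouter
from intake.intake import SESSIONS

router = APIRouter()

@router.get("/debug/sessions")
def debug_sessions():
    # Inspect the SESSIONS singleton from inside the running Uvicorn worker
    return {
        "sessions_object_id": id(SESSIONS),  # verify the same object is shared everywhere
        "total_sessions": len(SESSIONS),
        "sessions": {
            sid: {
                "buffer_size": len(entry.get("buffer", [])),
                "recent_exchanges": entry.get("buffer", [])[-3:],  # short preview
            }
            for sid, entry in SESSIONS.items()
        },
    }
```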

Changed - Intake Architecture

  • Intake is no longer a standalone service; it runs inside the Cortex container as a pure Python module
    • Imported as from intake.intake import add_exchange_internal, SESSIONS
    • No HTTP calls between Cortex and Intake
    • Eliminates network latency and dependency on Intake service being up
  • Deferred summarization: bg_summarize() is now a no-op stub intake.py:318-325
    • Actual summarization happens during /reason call via summarize_context()
    • Simplifies async/sync complexity
    • Prevents NameError when called from add_exchange_internal()
  • Lenient error handling: /ingest endpoint always returns success router.py:201-233
    • Each operation wrapped in try/except
    • Logs errors but never fails to avoid breaking chat pipeline
    • User requirement: never fail chat pipeline
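
A sketch of the lenient /ingest pattern described above; the payload fields and helper signatures are assumptions, but the structure (every operation wrapped, always return success) matches the entry:

```python
from fastapi import APIRouter
from pydantic import BaseModel

# Assumed imports and signatures; the real helpers live in the intake module
from intake.intake import add_exchange_internal, update_last_assistant_message

router = APIRouter()

class IngestPayload(BaseModel):
    session_id: str = "default"
    user_message: str = ""
    assistant_message: str = ""

@router.post("/ingest")
def ingest(payload: IngestPayload):
    # Each operation is wrapped individually; errors are logged, never raised,
    # so the chat pipeline keeps flowing even if bookkeeping fails.
    try:
        add_exchange_internal(payload.session_id, payload.user_message, payload.assistant_message)
    except Exception as e:
        print(f"[ingest] add_exchange_internal failed: {e}")
    try:
        update_last_assistant_message(payload.session_id, payload.assistant_message)
    except Exception as e:
        print(f"[ingest] update_last_assistant_message failed: {e}")
    return {"status": "ok"}  # always succeeds by design
```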

Documentation

  • Added single-worker constraint note in cortex/Dockerfile:7-8
    • Documents that SESSIONS requires single Uvicorn worker
    • Notes that multi-worker scaling requires Redis or shared storage
  • Updated plan documentation with root cause analysis

[0.5.0] - 2025-11-28

Fixed - Critical API Wiring & Integration

After the major architectural rewire (v0.4.x), this release fixes all critical endpoint mismatches and ensures end-to-end system connectivity.

Cortex → Intake Integration

  • Fixed IntakeClient to use correct Intake v0.2 API endpoints
    • Changed GET /context/{session_id} → GET /summaries?session_id={session_id}
    • Updated JSON response parsing to extract summary_text field
    • Fixed environment variable name: INTAKE_API → INTAKE_API_URL
    • Corrected default port: 7083 → 7080
    • Added deprecation warning to summarize_turn() method (endpoint removed in Intake v0.2)
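
A sketch of the corrected client call, assuming httpx and that /summaries returns a list of summary objects carrying summary_text; the INTAKE_API_URL default matches the migration notes below:

```python
import os
import httpx

INTAKE_API_URL = os.getenv("INTAKE_API_URL", "http://intake:7080")

async def get_context(session_id: str) -> str:
    async with httpx.AsyncClient(timeout=30.0) as client:
        resp = await client.get(f"{INTAKE_API_URL}/summaries", params={"session_id": session_id})
        resp.raise_for_status()
        summaries = resp.json()  # assumed: a list of summary objects
        return "\n".join(s.get("summary_text", "") for s in summaries)
```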

Relay → UI Compatibility

  • Added OpenAI-compatible endpoint POST /v1/chat/completions
    • Accepts standard OpenAI format with messages[] array
    • Returns OpenAI-compatible response structure with choices[]
    • Extracts last message content from messages array
    • Includes usage metadata (stub values for compatibility)
  • Refactored Relay to use shared handleChatRequest() function
    • Both /chat and /v1/chat/completions use same core logic
    • Eliminates code duplication
    • Consistent error handling across endpoints

Relay → Intake Connection

  • Fixed Intake URL fallback in Relay server configuration
    • Corrected port: 7082 → 7080
    • Updated endpoint: /summary → /add_exchange
    • Now properly sends exchanges to Intake for summarization

Code Quality & Python Package Structure

  • Added missing __init__.py files to all Cortex subdirectories
    • cortex/llm/__init__.py
    • cortex/reasoning/__init__.py
    • cortex/persona/__init__.py
    • cortex/ingest/__init__.py
    • cortex/utils/__init__.py
    • Improves package imports and IDE support
  • Removed unused import in cortex/router.py: from unittest import result
  • Deleted empty file cortex/llm/resolve_llm_url.py (was 0 bytes, never implemented)

Verified Working

Complete end-to-end message flow now operational:

UI → Relay (/v1/chat/completions)
  ↓
Relay → Cortex (/reason)
  ↓
Cortex → Intake (/summaries) [retrieves context]
  ↓
Cortex 4-stage pipeline:
  1. reflection.py → meta-awareness notes
  2. reasoning.py → draft answer
  3. refine.py → polished answer
  4. persona/speak.py → Lyra personality
  ↓
Cortex → Relay (returns persona response)
  ↓
Relay → Intake (/add_exchange) [async summary]
  ↓
Intake → NeoMem (background memory storage)
  ↓
Relay → UI (final response)

Documentation

  • Added comprehensive v0.5.0 changelog entry
  • Updated README.md to reflect v0.5.0 architecture
    • Documented new endpoints
    • Updated data flow diagrams
    • Clarified Intake v0.2 changes
    • Corrected service descriptions

Issues Resolved

  • Cortex could not retrieve context from Intake (wrong endpoint)
  • UI could not send messages to Relay (endpoint mismatch)
  • Relay could not send summaries to Intake (wrong port/endpoint)
  • Python package imports were implicit (missing __init__.py files)

Known Issues (Non-Critical)

  • Session management endpoints not implemented in Relay (GET/POST /sessions/:id)
  • RAG service currently disabled in docker-compose.yml
  • Cortex /ingest endpoint is a stub returning {"status": "ok"}

Migration Notes

If upgrading from v0.4.x:

  1. Pull latest changes from git
  2. Verify environment variables in .env files:
    • Check INTAKE_API_URL=http://intake:7080 (not INTAKE_API)
    • Verify all service URLs use correct ports
  3. Restart Docker containers: docker-compose down && docker-compose up -d
  4. Test with a simple message through the UI

[Infrastructure v1.0.0] - 2025-11-26

Changed - Environment Variable Consolidation

Major reorganization to eliminate duplication and improve maintainability

  • Consolidated 9 scattered .env files into single source of truth architecture
  • Root .env now contains all shared infrastructure (LLM backends, databases, API keys, service URLs)
  • Service-specific .env files minimized to only essential overrides:
    • cortex/.env: Reduced from 42 to 22 lines (operational parameters only)
    • neomem/.env: Reduced from 26 to 14 lines (LLM naming conventions only)
    • intake/.env: Kept at 8 lines (already minimal)
  • Result: ~24% reduction in total configuration lines (197 → ~150)

Docker Compose Consolidation

  • All services now defined in single root docker-compose.yml
  • Relay service updated with complete configuration (env_file, volumes)
  • Removed redundant core/docker-compose.yml (marked as DEPRECATED)
  • Standardized network communication to use Docker container names

Service URL Standardization

  • Internal services use container names: http://neomem-api:7077, http://cortex:7081
  • External services use IP addresses: http://10.0.0.43:8000 (vLLM), http://10.0.0.3:11434 (Ollama)
  • Removed IP/container name inconsistencies across files

Added - Security & Documentation

Security Templates - Created .env.example files for all services

  • Root .env.example with sanitized credentials
  • Service-specific templates: cortex/.env.example, neomem/.env.example, intake/.env.example, rag/.env.example
  • All .env.example files safe to commit to version control

Documentation

  • ENVIRONMENT_VARIABLES.md: Comprehensive reference for all environment variables
    • Variable descriptions, defaults, and usage examples
    • Multi-backend LLM strategy documentation
    • Troubleshooting guide
    • Security best practices
  • DEPRECATED_FILES.md: Deletion guide for deprecated files with verification steps

Enhanced .gitignore

  • Ignores all .env files (including subdirectories)
  • Tracks .env.example templates for documentation
  • Ignores .env-backups/ directory

Removed

  • core/.env - Redundant with root .env, now deleted
  • core/docker-compose.yml - Consolidated into main compose file (marked DEPRECATED)

Fixed

  • Eliminated duplicate OPENAI_API_KEY across 5+ files
  • Eliminated duplicate LLM backend URLs across 4+ files
  • Eliminated duplicate database credentials across 3+ files
  • Resolved Cortex environment: section override in docker-compose (now uses env_file)

Architecture - Multi-Backend LLM Strategy

Root .env provides all backend OPTIONS (PRIMARY, SECONDARY, CLOUD, FALLBACK), services choose which to USE:

  • Cortex → vLLM (PRIMARY) for autonomous reasoning
  • NeoMem → Ollama (SECONDARY) + OpenAI embeddings
  • Intake → vLLM (PRIMARY) for summarization
  • Relay → Fallback chain with user preference

Preserves per-service flexibility while eliminating URL duplication.

Migration

  • All original .env files backed up to .env-backups/ with timestamp 20251126_025334
  • Rollback plan documented in ENVIRONMENT_VARIABLES.md
  • Verification steps provided in DEPRECATED_FILES.md

[0.4.x] - 2025-11-13

Added - Multi-Stage Reasoning Pipeline

Cortex v0.5 - Complete architectural overhaul

  • New reasoning.py module

    • Async reasoning engine
    • Accepts user prompt, identity, RAG block, and reflection notes
    • Produces draft internal answers
    • Uses primary backend (vLLM)
  • New reflection.py module

    • Fully async meta-awareness layer
    • Produces actionable JSON "internal notes"
    • Enforces strict JSON schema and fallback parsing
    • Forces cloud backend (backend_override="cloud")
  • Integrated refine.py into pipeline

    • New stage between reflection and persona
    • Runs exclusively on primary vLLM backend (MI50)
    • Produces final, internally consistent output for downstream persona layer
  • Backend override system

    • Each LLM call can now select its own backend
    • Enables multi-LLM cognition: Reflection → cloud, Reasoning → primary (see the sketch after this list)
  • Identity loader

    • Added identity.py with load_identity() for consistent persona retrieval
  • Ingest handler

    • Async stub created for future Intake → NeoMem → RAG pipeline
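
A sketch of how the backend override might resolve to an env-provided full URL (LLM_FORCE_BACKEND is described in the Cortex v0.3.0 entry below); function and variable names here are illustrative:

```python
import os

def resolve_backend_url(backend_override: str | None = None) -> str:
    """Pick the full endpoint URL for one LLM call; each stage may override the default."""
    backend = (backend_override or os.getenv("LLM_FORCE_BACKEND", "primary")).lower()
    urls = {
        "primary": os.getenv("LLM_PRIMARY_URL"),      # vLLM (MI50), full /v1/completions URL
        "secondary": os.getenv("LLM_SECONDARY_URL"),  # Ollama /api/generate
        "cloud": os.getenv("LLM_CLOUD_URL"),          # OpenAI /v1/chat/completions
    }
    return urls.get(backend) or urls["primary"]

# Reflection forces the cloud backend; reasoning and refine stay on primary:
# reflection_url = resolve_backend_url(backend_override="cloud")
# reasoning_url  = resolve_backend_url()
```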

Cortex v0.4.1 - RAG Integration

  • RAG integration
    • Added rag.py with query_rag() and format_rag_block()
    • Cortex now queries local RAG API (http://10.0.0.41:7090/rag/search)
    • Synthesized answers and top excerpts injected into reasoning prompt
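
A sketch of the RAG call, assuming /rag/search accepts a JSON query and returns top excerpts plus a synthesized answer (per the Lyra_RAG v0.1.0 entry later in this file); the request and response field names are assumptions:

```python
import httpx

RAG_URL = "http://10.0.0.41:7090/rag/search"

async def query_rag(query: str, top_k: int = 5) -> dict:
    async with httpx.AsyncClient(timeout=30.0) as client:
        resp = await client.post(RAG_URL, json={"query": query, "top_k": top_k})
        resp.raise_for_status()
        return resp.json()

def format_rag_block(result: dict) -> str:
    # Condense the synthesized answer and top excerpts into one prompt block
    excerpts = "\n".join(f"- {e}" for e in result.get("excerpts", []))
    return f"[RAG]\nAnswer: {result.get('answer', '')}\nExcerpts:\n{excerpts}"
```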

Changed - Unified LLM Architecture

Cortex v0.5

  • Unified LLM backend URL handling across Cortex

    • ENV variables must now contain FULL API endpoints
    • Removed all internal path-appending (e.g. .../v1/completions)
    • llm_router.py rewritten to use env-provided URLs as-is
    • Ensures consistent behavior between draft, reflection, refine, and persona
  • Rebuilt main.py

    • Removed old annotation/analysis logic
    • New structure: load identity → get RAG → reflect → reason → return draft+notes
    • Routes now clean and minimal (/reason, /ingest, /health)
    • Async path throughout Cortex
  • Refactored llm_router.py

    • Removed old fallback logic during overrides
    • OpenAI requests now use /v1/chat/completions
    • Added proper OpenAI Authorization headers
    • Distinct payload format for vLLM vs OpenAI
    • Unified, correct parsing across models
  • Simplified Cortex architecture

    • Removed deprecated "context.py" and old reasoning code
    • Relay completely decoupled from smart behavior
  • Updated environment specification

    • LLM_PRIMARY_URL now set to http://10.0.0.43:8000/v1/completions
    • LLM_SECONDARY_URL remains http://10.0.0.3:11434/api/generate (Ollama)
    • LLM_CLOUD_URL set to https://api.openai.com/v1/chat/completions

Cortex v0.4.1

  • Revised /reason endpoint

    • Now builds unified context blocks: [Intake] → recent summaries, [RAG] → contextual knowledge, [User Message] → current input
    • Calls call_llm() for first pass, then reflection_loop() for meta-evaluation
    • Returns cortex_prompt, draft_output, final_output, and normalized reflection
  • Reflection Pipeline Stability

    • Cleaned parsing to normalize JSON vs. text reflections
    • Added fallback handling for malformed or non-JSON outputs
    • Log system improved to show raw JSON, extracted fields, and normalized summary
  • Async Summarization (Intake v0.2.1)

    • Intake summaries now run in background threads to avoid blocking Cortex
    • Summaries (L1–L∞) logged asynchronously with [BG] tags
  • Environment & Networking Fixes

    • Verified .env variables propagate correctly inside Cortex container
    • Confirmed Docker network connectivity between Cortex, Intake, NeoMem, and RAG
    • Adjusted localhost calls to service-IP mapping
  • Behavioral Updates

    • Cortex now performs conversation reflection (on user intent) and self-reflection (on its own answers)
    • RAG context successfully grounds reasoning outputs
    • Intake and NeoMem confirmed receiving summaries via /add_exchange
    • Log clarity pass: all reflective and contextual blocks clearly labeled

Fixed

Cortex v0.5

  • Resolved endpoint conflict where router expected base URLs and refine expected full URLs
    • Fixed by standardizing full-URL behavior across entire system
  • Reflection layer no longer fails silently (previously returned [""] due to MythoMax)
  • Resolved 404/401 errors caused by incorrect OpenAI URL endpoints
  • No more double-routing through vLLM during reflection
  • Corrected async/sync mismatch in multiple locations
  • Eliminated double-path bug (/v1/completions/v1/completions) caused by previous router logic

Removed

Cortex v0.5

  • Legacy annotate, reason_check glue logic from old architecture
  • Old backend probing junk code
  • Stale imports and unused modules leftover from previous prototype

Verified

Cortex v0.5

  • Cortex → vLLM (MI50) → refine → final_output now functioning correctly
  • Refine shows used_primary_backend: true and no fallback
  • Manual curl test confirms endpoint accuracy

Known Issues

Cortex v0.5

  • Refine sometimes prefixes output with "Final Answer:"; next version will sanitize this
  • Hallucinations in draft_output persist due to weak grounding (fix in reasoning + RAG planned)

Cortex v0.4.1

  • NeoMem tuning needed - improve retrieval latency and relevance
  • Need dedicated /reflections/recent endpoint for Cortex
  • Migrate to Cortex-first ingestion (Relay → Cortex → NeoMem)
  • Add persistent reflection recall (use prior reflections as meta-context)
  • Improve reflection JSON structure ("insight", "evaluation", "next_action" → guaranteed fields)
  • Tighten temperature and prompt control for factual consistency
  • RAG optimization: add source ranking, filtering, multi-vector hybrid search
  • Cache RAG responses per session to reduce duplicate calls

Notes

Cortex v0.5

This is the largest structural change to Cortex so far. It establishes:

  • Multi-model cognition
  • Clean layering
  • Identity + reflection separation
  • Correct async code
  • Deterministic backend routing
  • Predictable JSON reflection

The system is now ready for:

  • Refinement loops
  • Persona-speaking layer
  • Containerized RAG
  • Long-term memory integration
  • True emergent-behavior experiments

[0.3.x] - 2025-10-28 to 2025-09-26

Added

[Lyra Core v0.3.2 + Web UI v0.2.0] - 2025-10-28

  • New UI

    • Cleaned up UI look and feel
  • Sessions

    • Sessions now persist over time
    • Ability to create new sessions or load sessions from previous instance
    • Switching sessions updates what the prompt sends to Relay (messages from other sessions are no longer included)
    • Relay correctly wired in

[Lyra-Core 0.3.1] - 2025-10-09

  • NVGRAM Integration (Full Pipeline Reconnected)
    • Replaced legacy Mem0 service with NVGRAM microservice (nvgram-api @ port 7077)
    • Updated server.js in Relay to route all memory ops via ${NVGRAM_API}/memories and /search
    • Added .env variable: NVGRAM_API=http://nvgram-api:7077
    • Verified end-to-end Lyra conversation persistence: relay → nvgram-api → postgres/neo4j → relay → ollama → ui
    • Memories stored, retrieved, and re-injected successfully

[Lyra-Core v0.3.0] - 2025-09-26

  • Salience filtering in Relay
    • .env configurable: SALIENCE_ENABLED, SALIENCE_MODE, SALIENCE_MODEL, SALIENCE_API_URL
    • Supports heuristic and llm classification modes
    • LLM-based salience filter integrated with Cortex VM running llama-server
  • Logging improvements
    • Added debug logs for salience mode, raw LLM output, and unexpected outputs
    • Fail-closed behavior for unexpected LLM responses
  • Successfully tested with Phi-3.5-mini and Qwen2-0.5B-Instruct as salience classifiers
  • Verified end-to-end flow: Relay → salience filter → Mem0 add/search → Persona injection → LLM reply

[Cortex v0.3.0] - 2025-10-31

  • Cortex Service (FastAPI)

    • New standalone reasoning engine (cortex/main.py) with endpoints:
      • GET /health reports active backend + NeoMem status
      • POST /reason evaluates {prompt, response} pairs
      • POST /annotate experimental text analysis
    • Background NeoMem health monitor (5-minute interval)
  • Multi-Backend Reasoning Support

    • Environment-driven backend selection via LLM_FORCE_BACKEND
    • Supports: Primary (vLLM MI50), Secondary (Ollama 3090), Cloud (OpenAI), Fallback (llama.cpp CPU)
    • Per-backend model variables: LLM_PRIMARY_MODEL, LLM_SECONDARY_MODEL, LLM_CLOUD_MODEL, LLM_FALLBACK_MODEL
  • Response Normalization Layer

    • Implemented normalize_llm_response() to merge streamed outputs and repair malformed JSON (see the sketch after this list)
    • Handles Ollama's multi-line streaming and Mythomax's missing punctuation issues
    • Prints concise debug previews of merged content
  • Environment Simplification

    • Each service (intake, cortex, neomem) now maintains its own .env file
    • Removed reliance on shared/global env file to prevent cross-contamination
    • Verified Docker Compose networking across containers
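
A sketch of the normalization idea from the Response Normalization Layer item above, assuming Ollama-style newline-delimited JSON chunks that each carry a response fragment; the real function also repairs other malformed output:

```python
import json

def normalize_llm_response(raw: str) -> str:
    """Merge streamed chunks into one string; keep non-JSON fragments as-is."""
    parts = []
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            chunk = json.loads(line)
        except json.JSONDecodeError:
            parts.append(line)  # not a JSON chunk; keep the raw text
            continue
        parts.append(chunk.get("response", ""))  # Ollama puts each streamed piece in "response"
    merged = "".join(parts)
    print(f"[normalize] merged preview: {merged[:80]!r}")  # concise debug preview
    return merged
```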

[NeoMem 0.1.2] - 2025-10-27 (formerly NVGRAM)

  • Renamed NVGRAM to NeoMem
    • All future updates under name NeoMem
    • Features unchanged

[NVGRAM 0.1.1] - 2025-10-08

  • Async Memory Rewrite (Stability + Safety Patch)
    • Introduced AsyncMemory class with fully asynchronous vector and graph store writes
    • Added input sanitation to prevent embedding errors ('list' object has no attribute 'replace')
    • Implemented flatten_messages() helper in API layer to clean malformed payloads (see the sketch after this list)
    • Added structured request logging via RequestLoggingMiddleware (FastAPI middleware)
    • Health endpoint (/health) returns structured JSON {status, version, service}
    • Startup logs include sanitized embedder config with masked API keys
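
A sketch of the flatten_messages() idea, assuming messages arrive as role/content dicts where content is sometimes a list of parts; the exact cleaning rules are assumptions:

```python
def flatten_messages(messages: list[dict]) -> list[dict]:
    """Coerce every message's content to a plain string so the embedder never sees a list
    (the source of the "'list' object has no attribute 'replace'" error)."""
    cleaned = []
    for msg in messages:
        content = msg.get("content", "")
        if isinstance(content, list):
            # Join list parts, tolerating dict-shaped parts such as {"text": "..."}
            content = " ".join(
                part.get("text", "") if isinstance(part, dict) else str(part)
                for part in content
            )
        elif not isinstance(content, str):
            content = str(content)
        cleaned.append({**msg, "content": content})
    return cleaned
```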

[NVGRAM 0.1.0] - 2025-10-07

  • Initial fork of Mem0 → NVGRAM
    • Created fully independent local-first memory engine based on Mem0 OSS
    • Renamed all internal modules, Docker services, and environment variables from mem0 → nvgram
    • New service name: nvgram-api, default port 7077
    • Maintains same API endpoints (/memories, /search) for drop-in compatibility
    • Uses FastAPI, Postgres, and Neo4j as persistent backends

[Lyra-Mem0 0.3.2] - 2025-10-05

  • Ollama LLM reasoning alongside OpenAI embeddings
    • Introduced LLM_PROVIDER=ollama, LLM_MODEL, and OLLAMA_HOST in .env.3090
    • Verified local 3090 setup using qwen2.5:7b-instruct-q4_K_M
    • Split processing: Embeddings → OpenAI text-embedding-3-small, LLM → Local Ollama
  • Added .env.3090 template for self-hosted inference nodes
  • Integrated runtime diagnostics and seeder progress tracking
    • File-level + message-level progress bars
    • Retry/back-off logic for timeouts (3 attempts)
    • Event logging (ADD / UPDATE / NONE) for every memory record
  • Expanded Docker health checks for Postgres, Qdrant, and Neo4j containers
  • Added GPU-friendly long-run configuration for continuous seeding (validated on RTX 3090)

[Lyra-Mem0 0.3.1] - 2025-10-03

  • HuggingFace TEI integration (local 3090 embedder)
  • Dual-mode environment switch between OpenAI cloud and local
  • CSV export of memories from Postgres (payload->>'data')

[Lyra-Mem0 0.3.0]

  • Ollama embeddings in Mem0 OSS container
    • Configure EMBEDDER_PROVIDER=ollama, EMBEDDER_MODEL, OLLAMA_HOST via .env
    • Mounted main.py override from host into container to load custom DEFAULT_CONFIG
    • Installed ollama Python client into custom API container image
  • .env.3090 file for external embedding mode (3090 machine)
  • Workflow for multiple embedding modes: LAN-based 3090/Ollama, Local-only CPU, OpenAI fallback

[Lyra-Mem0 v0.2.1]

  • Seeding pipeline
    • Built Python seeder script to bulk-insert raw Cloud Lyra exports into Mem0
    • Implemented incremental seeding option (skip existing memories, only add new ones)
    • Verified insert process with Postgres-backed history DB

[Intake v0.1.0] - 2025-10-27

  • Receives messages from relay and summarizes them in cascading format
  • Continues to summarize smaller batches of exchanges while generating large-scale conversational summaries (L20)
  • Currently logs summaries to .log file in /project-lyra/intake-logs/

[Lyra-Cortex v0.2.0] - 2025-09-26

  • Integrated llama-server on dedicated Cortex VM (Proxmox)
  • Verified Phi-3.5-mini-instruct_Uncensored-Q4_K_M running with 8 vCPUs
  • Benchmarked Phi-3.5-mini performance: ~18 tokens/sec CPU-only on Ryzen 7 7800X
  • Salience classification functional but sometimes inconsistent
  • Tested Qwen2-0.5B-Instruct GGUF as alternative salience classifier
    • Much faster throughput (~350 tokens/sec prompt, ~100 tokens/sec eval)
    • More responsive but over-classifies messages as "salient"
  • Established .env integration for model ID (SALIENCE_MODEL), enabling hot-swap between models

Changed

[Lyra-Core 0.3.1] - 2025-10-09

  • Renamed MEM0_URL → NVGRAM_API across all relay environment configs
  • Updated Docker Compose service dependency order
    • relay now depends on nvgram-api healthcheck
    • Removed mem0 references and volumes
  • Minor cleanup to Persona fetch block (null-checks and safer default persona string)

[Lyra-Core v0.3.1] - 2025-09-27

  • Removed salience filter logic; Cortex is now default annotator
  • All user messages stored in Mem0; no discard tier applied
  • Cortex annotations (metadata.cortex) now attached to memories
  • Debug logging improvements
    • Pretty-print Cortex annotations
    • Injected prompt preview
    • Memory search hit list with scores
  • .env toggle (CORTEX_ENABLED) to bypass Cortex when needed

[Lyra-Core v0.3.0] - 2025-09-26

  • Refactored server.js to gate mem.add() calls behind salience filter
  • Updated .env to support SALIENCE_MODEL

[Cortex v0.3.0] - 2025-10-31

  • Refactored reason_check() to dynamically switch between prompt and chat mode depending on backend
  • Enhanced startup logs to announce active backend, model, URL, and mode
  • Improved error handling with clearer "Reasoning error" messages

[NVGRAM 0.1.1] - 2025-10-08

  • Replaced synchronous Memory.add() with async-safe version supporting concurrent vector + graph writes
  • Normalized indentation and cleaned duplicate main.py references
  • Removed redundant FastAPI() app reinitialization
  • Updated internal logging to INFO-level timing format
  • Deprecated @app.on_event("startup") → will migrate to lifespan handler in v0.1.2

[NVGRAM 0.1.0] - 2025-10-07

  • Removed dependency on external mem0ai SDK — all logic now local
  • Re-pinned requirements: fastapi==0.115.8, uvicorn==0.34.0, pydantic==2.10.4, python-dotenv==1.0.1, psycopg>=3.2.8, ollama
  • Adjusted docker-compose and .env templates to use new NVGRAM naming

[Lyra-Mem0 0.3.2] - 2025-10-05

  • Updated main.py configuration block to load LLM_PROVIDER, LLM_MODEL, OLLAMA_BASE_URL
    • Fallback to OpenAI if Ollama unavailable
  • Adjusted docker-compose.yml mount paths to correctly map /app/main.py
  • Normalized .env loading so mem0-api and host environment share identical values
  • Improved seeder logging and progress telemetry
  • Added explicit temperature field to DEFAULT_CONFIG['llm']['config']

[Lyra-Mem0 0.3.0]

  • docker-compose.yml updated to mount local main.py and .env.3090
  • Built custom Dockerfile (mem0-api-server:latest) extending base image with pip install ollama
  • Updated requirements.txt to include ollama package
  • Adjusted Mem0 container config so main.py pulls environment variables with dotenv
  • Tested new embeddings path with curl /memories API call

[Lyra-Mem0 v0.2.1]

  • Updated main.py to load configuration from .env using dotenv and support multiple embedder backends
  • Mounted host main.py into container so local edits persist across rebuilds
  • Updated docker-compose.yml to mount .env.3090 and support swap between profiles
  • Built custom Dockerfile (mem0-api-server:latest) including pip install ollama
  • Updated requirements.txt with ollama dependency
  • Adjusted startup flow so container automatically connects to external Ollama host (LAN IP)
  • Added logging to confirm model pulls and embedding requests

Fixed

[Lyra-Core 0.3.1] - 2025-10-09

  • Relay startup no longer crashes when NVGRAM is unavailable — deferred connection handling
  • /memories POST failures no longer crash Relay; now logged gracefully as relay error Error: memAdd failed: 500
  • Improved injected prompt debugging (DEBUG_PROMPT=true now prints clean JSON)

[Lyra-Core v0.3.1] - 2025-09-27

  • Parsing failures from Markdown-wrapped Cortex JSON via fence cleaner
  • Relay no longer "hangs" on malformed Cortex outputs

[Cortex v0.3.0] - 2025-10-31

  • Corrected broken vLLM endpoint routing (/v1/completions)
  • Stabilized cross-container health reporting for NeoMem
  • Resolved JSON parse failures caused by streaming chunk delimiters

[NVGRAM 0.1.1] - 2025-10-08

  • Eliminated repeating 500 error from OpenAI embedder caused by non-string message content
  • Masked API key leaks from boot logs
  • Ensured Neo4j reconnects gracefully on first retry

[Lyra-Mem0 0.3.2] - 2025-10-05

  • Resolved crash during startup: TypeError: OpenAIConfig.__init__() got an unexpected keyword argument 'ollama_base_url'
  • Corrected mount type mismatch (file vs directory) causing OCI runtime create failed errors
  • Prevented duplicate or partial postings when retry logic triggered multiple concurrent requests
  • "Unknown event" warnings now safely ignored (no longer break seeding loop)
  • Confirmed full dual-provider operation in logs (api.openai.com + 10.0.0.3:11434/api/chat)

[Lyra-Mem0 0.3.1] - 2025-10-03

  • .env CRLF vs LF line ending issues
  • Local seeding now possible via HuggingFace server

[Lyra-Mem0 0.3.0]

  • Resolved container boot failure caused by missing ollama dependency (ModuleNotFoundError)
  • Fixed config overwrite issue where rebuilding container restored stock main.py
  • Worked around Neo4j error (vector.similarity.cosine(): mismatched vector dimensions) by confirming OpenAI vs. Ollama embedding vector sizes

[Lyra-Mem0 v0.2.1]

  • Seeder process originally failed on old memories — now skips duplicates and continues batch
  • Resolved container boot error (ModuleNotFoundError: ollama) by extending image
  • Fixed overwrite issue where stock main.py replaced custom config during rebuild
  • Worked around Neo4j vector.similarity.cosine() dimension mismatch

Known Issues

[Lyra-Core v0.3.0] - 2025-09-26

  • Small models (e.g. Qwen2-0.5B) tend to over-classify as "salient"
  • Phi-3.5-mini sometimes returns truncated tokens ("sali", "fi")
  • CPU-only inference is functional but limited; larger models recommended once GPU available

[Lyra-Cortex v0.2.0] - 2025-09-26

  • Small models tend to drift or over-classify
  • CPU-only 7B+ models expected to be slow; GPU passthrough recommended for larger models
  • Need to set up systemd service for llama-server to auto-start on VM reboot

Observations

[Lyra-Mem0 0.3.2] - 2025-10-05

  • Stable GPU utilization: ~8 GB VRAM @ 92% load, ≈ 67°C under sustained seeding
  • Next revision will re-format seed JSON to preserve role context (user vs assistant)

[Lyra-Mem0 v0.2.1]

  • To fully unify embedding modes, a Hugging Face / local model with 1536-dim embeddings will be needed (to match OpenAI's schema)
  • Current Ollama model (mxbai-embed-large) works, but returns 1024-dim vectors
  • Seeder workflow validated but should be wrapped in repeatable weekly run for full Cloud→Local sync

Next Steps

[Lyra-Core 0.3.1] - 2025-10-09

  • Add salience visualization (e.g., memory weights displayed in injected system message)
  • Begin schema alignment with NVGRAM v0.1.2 for confidence scoring
  • Add relay auto-retry for transient 500 responses from NVGRAM

[NVGRAM 0.1.1] - 2025-10-08

  • Integrate salience scoring and embedding confidence weight fields in Postgres schema
  • Begin testing with full Lyra Relay + Persona Sidecar pipeline for live session memory recall
  • Migrate from deprecated on_event → lifespan pattern in 0.1.2

[NVGRAM 0.1.0] - 2025-10-07

  • Integrate NVGRAM as new default backend in Lyra Relay
  • Deprecate remaining Mem0 references and archive old configs
  • Begin versioning as standalone project (nvgram-core, nvgram-api, etc.)

[Intake v0.1.0] - 2025-10-27

  • Feed intake into NeoMem
  • Generate daily/hourly overall summaries (e.g., "Today Brian and Lyra worked on x, y, and z")
  • Generate session-aware summaries, each with its own intake hopper

[0.2.x] - 2025-09-30 to 2025-09-24

Added

[Lyra-Mem0 v0.2.0] - 2025-09-30

  • Standalone Lyra-Mem0 stack created at ~/lyra-mem0/
    • Includes Postgres (pgvector), Qdrant, Neo4j, and SQLite for history tracking
    • Added working docker-compose.mem0.yml and custom Dockerfile for building Mem0 API server
  • Verified REST API functionality
    • POST /memories works for adding memories
    • POST /search works for semantic search
  • Successful end-to-end test with persisted memory: "Likes coffee in the morning" → retrievable via search

[Lyra-Core v0.2.0] - 2025-09-24

  • Migrated Relay to use mem0ai SDK instead of raw fetch calls
  • Implemented sessionId support (client-supplied, fallback to default)
  • Added debug logs for memory add/search
  • Cleaned up Relay structure for clarity

Changed

[Lyra-Mem0 v0.2.0] - 2025-09-30

  • Split architecture into modular stacks:
    • ~/lyra-core (Relay, Persona-Sidecar, etc.)
    • ~/lyra-mem0 (Mem0 OSS memory stack)
  • Removed old embedded mem0 containers from Lyra-Core compose file
  • Added Lyra-Mem0 section in README.md

Next Steps

[Lyra-Mem0 v0.2.0] - 2025-09-30

  • Wire Relay → Mem0 API (integration not yet complete)
  • Add integration tests to verify persistence and retrieval from within Lyra-Core

[0.1.x] - 2025-09-25 to 2025-09-23

Added

[Lyra_RAG v0.1.0] - 2025-11-07

  • Initial standalone RAG module for Project Lyra
  • Persistent ChromaDB vector store (./chromadb)
  • Importer rag_chat_import.py with:
    • Recursive folder scanning and category tagging
    • Smart chunking (~5k chars)
    • SHA-1 deduplication and chat-ID metadata
    • Timestamp fields (file_modified, imported_at)
    • Background-safe operation (nohup/tmux)
  • 68 Lyra-category chats imported:
    • 6,556 new chunks added
    • 1,493 duplicates skipped
    • 7,997 total vectors stored

[Lyra_RAG v0.1.0 API] - 2025-11-07

  • /rag/search FastAPI endpoint implemented (port 7090)
  • Supports natural-language queries and returns top related excerpts
  • Added answer synthesis step using gpt-4o-mini

[Lyra-Core v0.1.0] - 2025-09-23

  • First working MVP of Lyra Core Relay
  • Relay service accepts POST /v1/chat/completions (OpenAI-compatible)
  • Memory integration with Mem0:
    • POST /memories on each user message
    • POST /search before LLM call
  • Persona Sidecar integration (GET /current)
  • OpenAI GPT + Ollama (Mythomax) support in Relay
  • Simple browser-based chat UI (talks to Relay at http://<host>:7078)
  • .env standardization for Relay + Mem0 + Postgres + Neo4j
  • Working Neo4j + Postgres backing stores for Mem0
  • Initial MVP relay service with raw fetch calls to Mem0
  • Dockerized with basic healthcheck

[Lyra-Cortex v0.1.0] - 2025-09-25

  • First deployment as dedicated Proxmox VM (5 vCPU / 18 GB RAM / 100 GB SSD)
  • Built llama.cpp with llama-server target via CMake
  • Integrated Phi-3.5 Mini Instruct (Uncensored, Q4_K_M GGUF) model
  • Verified API compatibility at /v1/chat/completions
  • Local test successful via curl → ~523 token response generated
  • Performance benchmark: ~11.5 tokens/sec (CPU-only on Ryzen 7800X)
  • Confirmed usable for salience scoring, summarization, and lightweight reasoning

Fixed

[Lyra-Core v0.1.0] - 2025-09-23

  • Resolved crash loop in Neo4j by restricting env vars (NEO4J_AUTH only)
  • Relay now correctly reads MEM0_URL and MEM0_API_KEY from .env

Verified

[Lyra_RAG v0.1.0] - 2025-11-07

  • Successful recall of Lyra-Core development history (v0.3.0 snapshot)
  • Correct metadata and category tagging for all new imports

Known Issues

[Lyra-Core v0.1.0] - 2025-09-23

  • No feedback loop (thumbs up/down) yet
  • Forget/delete flow is manual (via memory IDs)
  • Memory latency ~14s depending on embedding model

Next Planned

[Lyra_RAG v0.1.0] - 2025-11-07

  • Optional where filter parameter for category/date queries
  • Graceful "no results" handler for empty retrievals
  • rag_docs_import.py for PDFs and other document types