
Project Lyra Changelog

All notable changes to Project Lyra are documented in this file. The format is based on Keep a Changelog, and the project adheres to Semantic Versioning.


[Unreleased]


[0.5.2] - 2025-12-12

Fixed - LLM Router & Async HTTP

  • Critical: Replaced synchronous requests with async httpx in LLM router cortex/llm/llm_router.py (see the sketch after this list)
    • Event loop blocking was causing timeouts and empty responses
    • All three providers (MI50, Ollama, OpenAI) now use await http_client.post()
    • Fixes "Expecting value: line 1 column 1 (char 0)" JSON parsing errors in intake
  • Critical: Fixed missing backend parameter in intake summarization cortex/intake/intake.py:285
    • Was defaulting to PRIMARY (MI50) instead of respecting INTAKE_LLM=SECONDARY
    • Now correctly uses configured backend (Ollama on 3090)
  • Relay: Fixed session ID case mismatch core/relay/server.js:87
    • UI sends sessionId (camelCase) but relay expected session_id (snake_case)
    • Now accepts both variants: req.body.session_id || req.body.sessionId
    • Custom session IDs now properly tracked instead of defaulting to "default"
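
A minimal sketch of the async call-and-error-handling pattern now used in the router, assuming a module-level httpx.AsyncClient; payload shapes and response fields differ per provider, so the names here are illustrative rather than the actual code:

```python
import json
import httpx

# Shared client; 120-second timeout applied consistently across providers
http_client = httpx.AsyncClient(timeout=120.0)

async def call_provider(url: str, payload: dict, provider: str) -> str:
    try:
        resp = await http_client.post(url, json=payload)  # async, no event-loop blocking
        resp.raise_for_status()
        data = resp.json()
        return data["content"]  # field name varies by provider: "content", "response", "choices", ...
    except httpx.HTTPError as e:
        return f"[{provider}] HTTP error: {type(e).__name__}: {e}"
    except json.JSONDecodeError as e:
        return f"[{provider}] JSON decode error: {e}"
    except KeyError as e:
        return f"[{provider}] missing field in response: {e}"
    except Exception as e:
        return f"[{provider}] unexpected error: {type(e).__name__}: {e}"
```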

Added - Error Handling & Diagnostics

  • Added comprehensive error handling in LLM router for all providers
    • HTTPError, JSONDecodeError, KeyError, and generic Exception handling
    • Detailed error messages with exception type and description
    • Provider-specific error logging (mi50, ollama, openai)
  • Added debug logging in intake summarization
    • Logs LLM response length and preview
    • Validates non-empty responses before JSON parsing
    • Helps diagnose empty or malformed responses
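
A sketch of the validation now done around intake summarization, assuming the summary arrives from the router as a JSON string; the logger and function names are illustrative:

```python
import json
import logging

log = logging.getLogger("intake")

def parse_summary(raw: str) -> dict:
    # Log length and a short preview so empty or malformed responses are easy to spot
    log.debug("LLM summary response: %d chars, preview=%r", len(raw), raw[:120])
    if not raw or not raw.strip():
        log.warning("Empty LLM response; skipping JSON parse")
        return {}
    try:
        return json.loads(raw)
    except json.JSONDecodeError as e:
        log.error("Malformed summary JSON: %s", e)
        return {}
```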

Added - Session Management

  • Added session persistence endpoints in relay core/relay/server.js:160-171
    • GET /sessions/:id - Retrieve session history
    • POST /sessions/:id - Save session history
    • In-memory storage using Map (ephemeral, resets on container restart)
    • Fixes UI "Failed to load session" errors

Changed - Provider Configuration

  • Added mi50 provider support for llama.cpp server cortex/llm/llm_router.py:62-81 (see the sketch after this list)
    • Uses /completion endpoint with n_predict parameter
    • Extracts content field from response
    • Configured for MI50 GPU with DeepSeek model
  • Increased memory retrieval threshold from 0.78 to 0.90 cortex/.env:20
    • Filters out low-relevance memories (only returns 90%+ similarity)
    • Reduces noise in context retrieval
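
A sketch of the mi50 branch described above, reusing the shared async client from the earlier sketch; MI50_URL is a placeholder for the configured llama.cpp server, and the /completion request with n_predict and the content field follow the llama.cpp server API:

```python
import os

MI50_URL = os.getenv("MI50_URL", "")  # placeholder; point at the configured llama.cpp server

async def call_mi50(prompt: str, max_tokens: int = 512) -> str:
    # llama.cpp /completion endpoint; n_predict caps the number of generated tokens
    payload = {"prompt": prompt, "n_predict": max_tokens}
    resp = await http_client.post(f"{MI50_URL}/completion", json=payload)
    resp.raise_for_status()
    return resp.json()["content"]  # llama.cpp returns the generated text in "content"
```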

Technical Improvements

  • Unified async HTTP handling across all LLM providers
  • Better separation of concerns between provider implementations
  • Improved error messages for debugging LLM API failures
  • Consistent timeout handling (120 seconds for all providers)

[0.5.1] - 2025-12-11

Fixed - Intake Integration

  • Critical: Fixed bg_summarize() function not defined error
    • Was only a TYPE_CHECKING stub, now implemented as logging stub
    • Eliminated NameError preventing SESSIONS from persisting correctly
    • Function now logs exchange additions and defers summarization to /reason endpoint
  • Critical: Fixed /ingest endpoint unreachable code in router.py:201-233
    • Removed early return that prevented update_last_assistant_message() from executing
    • Removed duplicate add_exchange_internal() call
    • Implemented lenient error handling (each operation wrapped in try/except)
  • Intake: Added missing __init__.py to make intake a proper Python package cortex/intake/__init__.py
    • Prevents namespace package issues
    • Enables proper module imports
    • Exports SESSIONS, add_exchange_internal, summarize_context
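
A sketch of what the new package initializer re-exports, per the item above; the actual file may differ:

```python
# cortex/intake/__init__.py -- makes intake a regular (non-namespace) package
from .intake import SESSIONS, add_exchange_internal, summarize_context

__all__ = ["SESSIONS", "add_exchange_internal", "summarize_context"]
```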

Added - Diagnostics & Debugging

  • Added diagnostic logging to verify SESSIONS singleton behavior
  • Added /debug/sessions HTTP endpoint router.py:276-305 (see the sketch after this list)
    • Inspect SESSIONS from within running Uvicorn worker
    • Shows total sessions, session count, buffer sizes, recent exchanges
    • Returns SESSIONS object ID for verification
  • Added /debug/summary HTTP endpoint router.py:238-271
    • Test summarize_context() for any session
    • Returns L1/L5/L10/L20/L30 summaries
    • Includes buffer size and exchange preview
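
A sketch of the /debug/sessions endpoint, assuming a FastAPI router and a dict-like SESSIONS singleton holding per-session buffers; the shape of each session entry is an assumption:

```python
from fastapi import APIRouter
from intake.intake import SESSIONS

router = APIRouter()

@router.get("/debug/sessions")
def debug_sessions():
    # Inspect the SESSIONS singleton from inside the running Uvicorn worker
    return {
        "sessions_object_id": id(SESSIONS),  # verify the same object is shared everywhere
        "total_sessions": len(SESSIONS),
        "sessions": {
            sid: {
                "buffer_size": len(entry.get("buffer", [])),
                "recent_exchanges": entry.get("buffer", [])[-3:],  # short preview
            }
            for sid, entry in SESSIONS.items()
        },
    }
```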

Changed - Intake Architecture

  • Intake is no longer a standalone service; it runs inside the Cortex container as a pure Python module
    • Imported as from intake.intake import add_exchange_internal, SESSIONS
    • No HTTP calls between Cortex and Intake
    • Eliminates network latency and dependency on Intake service being up
  • Deferred summarization: bg_summarize() is now a no-op stub intake.py:318-325
    • Actual summarization happens during /reason call via summarize_context()
    • Simplifies async/sync complexity
    • Prevents NameError when called from add_exchange_internal()
  • Lenient error handling: /ingest endpoint always returns success router.py:201-233
    • Each operation wrapped in try/except
    • Logs errors but never fails to avoid breaking chat pipeline
    • User requirement: never fail chat pipeline
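
A sketch of the lenient /ingest pattern described above; the payload fields and helper signatures are assumptions, but the structure (every operation wrapped, always return success) matches the entry:

```python
from fastapi import APIRouter
from pydantic import BaseModel

# Assumed imports and signatures; the real helpers live in the intake module
from intake.intake import add_exchange_internal, update_last_assistant_message

router = APIRouter()

class IngestPayload(BaseModel):
    session_id: str = "default"
    user_message: str = ""
    assistant_message: str = ""

@router.post("/ingest")
def ingest(payload: IngestPayload):
    # Each operation is wrapped individually; errors are logged, never raised,
    # so the chat pipeline keeps flowing even if bookkeeping fails.
    try:
        add_exchange_internal(payload.session_id, payload.user_message, payload.assistant_message)
    except Exception as e:
        print(f"[ingest] add_exchange_internal failed: {e}")
    try:
        update_last_assistant_message(payload.session_id, payload.assistant_message)
    except Exception as e:
        print(f"[ingest] update_last_assistant_message failed: {e}")
    return {"status": "ok"}  # always succeeds by design
```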

Documentation

  • Added single-worker constraint note in cortex/Dockerfile:7-8
    • Documents that SESSIONS requires single Uvicorn worker
    • Notes that multi-worker scaling requires Redis or shared storage
  • Updated plan documentation with root cause analysis

[0.5.0] - 2025-11-28

Fixed - Critical API Wiring & Integration

After the major architectural rewire (v0.4.x), this release fixes all critical endpoint mismatches and ensures end-to-end system connectivity.

Cortex → Intake Integration

  • Fixed IntakeClient to use correct Intake v0.2 API endpoints
    • Changed GET /context/{session_id} → GET /summaries?session_id={session_id}
    • Updated JSON response parsing to extract summary_text field
    • Fixed environment variable name: INTAKE_API → INTAKE_API_URL
    • Corrected default port: 7083 → 7080
    • Added deprecation warning to summarize_turn() method (endpoint removed in Intake v0.2)
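
A sketch of the corrected client call, assuming httpx and that /summaries returns a list of summary objects carrying summary_text; the INTAKE_API_URL default matches the migration notes below:

```python
import os
import httpx

INTAKE_API_URL = os.getenv("INTAKE_API_URL", "http://intake:7080")

async def get_context(session_id: str) -> str:
    async with httpx.AsyncClient(timeout=30.0) as client:
        resp = await client.get(f"{INTAKE_API_URL}/summaries", params={"session_id": session_id})
        resp.raise_for_status()
        summaries = resp.json()  # assumed: a list of summary objects
        return "\n".join(s.get("summary_text", "") for s in summaries)
```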

Relay → UI Compatibility

  • Added OpenAI-compatible endpoint POST /v1/chat/completions
    • Accepts standard OpenAI format with messages[] array
    • Returns OpenAI-compatible response structure with choices[]
    • Extracts last message content from messages array
    • Includes usage metadata (stub values for compatibility)
  • Refactored Relay to use shared handleChatRequest() function
    • Both /chat and /v1/chat/completions use same core logic
    • Eliminates code duplication
    • Consistent error handling across endpoints

Relay → Intake Connection

  • Fixed Intake URL fallback in Relay server configuration
    • Corrected port: 7082 → 7080
    • Updated endpoint: /summary → /add_exchange
    • Now properly sends exchanges to Intake for summarization

Code Quality & Python Package Structure

  • Added missing __init__.py files to all Cortex subdirectories
    • cortex/llm/__init__.py
    • cortex/reasoning/__init__.py
    • cortex/persona/__init__.py
    • cortex/ingest/__init__.py
    • cortex/utils/__init__.py
    • Improves package imports and IDE support
  • Removed unused import in cortex/router.py: from unittest import result
  • Deleted empty file cortex/llm/resolve_llm_url.py (was 0 bytes, never implemented)

Verified Working

Complete end-to-end message flow now operational:

UI → Relay (/v1/chat/completions)
  ↓
Relay → Cortex (/reason)
  ↓
Cortex → Intake (/summaries) [retrieves context]
  ↓
Cortex 4-stage pipeline:
  1. reflection.py → meta-awareness notes
  2. reasoning.py → draft answer
  3. refine.py → polished answer
  4. persona/speak.py → Lyra personality
  ↓
Cortex → Relay (returns persona response)
  ↓
Relay → Intake (/add_exchange) [async summary]
  ↓
Intake → NeoMem (background memory storage)
  ↓
Relay → UI (final response)

Documentation

  • Added comprehensive v0.5.0 changelog entry
  • Updated README.md to reflect v0.5.0 architecture
    • Documented new endpoints
    • Updated data flow diagrams
    • Clarified Intake v0.2 changes
    • Corrected service descriptions

Issues Resolved

  • Cortex could not retrieve context from Intake (wrong endpoint)
  • UI could not send messages to Relay (endpoint mismatch)
  • Relay could not send summaries to Intake (wrong port/endpoint)
  • Python package imports were implicit (missing __init__.py files)

Known Issues (Non-Critical)

  • Session management endpoints not implemented in Relay (GET/POST /sessions/:id)
  • RAG service currently disabled in docker-compose.yml
  • Cortex /ingest endpoint is a stub returning {"status": "ok"}

Migration Notes

If upgrading from v0.4.x:

  1. Pull latest changes from git
  2. Verify environment variables in .env files:
    • Check INTAKE_API_URL=http://intake:7080 (not INTAKE_API)
    • Verify all service URLs use correct ports
  3. Restart Docker containers: docker-compose down && docker-compose up -d
  4. Test with a simple message through the UI

[Infrastructure v1.0.0] - 2025-11-26

Changed - Environment Variable Consolidation

Major reorganization to eliminate duplication and improve maintainability

  • Consolidated 9 scattered .env files into single source of truth architecture
  • Root .env now contains all shared infrastructure (LLM backends, databases, API keys, service URLs)
  • Service-specific .env files minimized to only essential overrides:
    • cortex/.env: Reduced from 42 to 22 lines (operational parameters only)
    • neomem/.env: Reduced from 26 to 14 lines (LLM naming conventions only)
    • intake/.env: Kept at 8 lines (already minimal)
  • Result: ~24% reduction in total configuration lines (197 → ~150)

Docker Compose Consolidation

  • All services now defined in single root docker-compose.yml
  • Relay service updated with complete configuration (env_file, volumes)
  • Removed redundant core/docker-compose.yml (marked as DEPRECATED)
  • Standardized network communication to use Docker container names

Service URL Standardization

  • Internal services use container names: http://neomem-api:7077, http://cortex:7081
  • External services use IP addresses: http://10.0.0.43:8000 (vLLM), http://10.0.0.3:11434 (Ollama)
  • Removed IP/container name inconsistencies across files

Added - Security & Documentation

Security Templates - Created .env.example files for all services

  • Root .env.example with sanitized credentials
  • Service-specific templates: cortex/.env.example, neomem/.env.example, intake/.env.example, rag/.env.example
  • All .env.example files safe to commit to version control

Documentation

  • ENVIRONMENT_VARIABLES.md: Comprehensive reference for all environment variables
    • Variable descriptions, defaults, and usage examples
    • Multi-backend LLM strategy documentation
    • Troubleshooting guide
    • Security best practices
  • DEPRECATED_FILES.md: Deletion guide for deprecated files with verification steps

Enhanced .gitignore

  • Ignores all .env files (including subdirectories)
  • Tracks .env.example templates for documentation
  • Ignores .env-backups/ directory

Removed

  • core/.env - Redundant with root .env, now deleted
  • core/docker-compose.yml - Consolidated into main compose file (marked DEPRECATED)

Fixed

  • Eliminated duplicate OPENAI_API_KEY across 5+ files
  • Eliminated duplicate LLM backend URLs across 4+ files
  • Eliminated duplicate database credentials across 3+ files
  • Resolved Cortex environment: section override in docker-compose (now uses env_file)

Architecture - Multi-Backend LLM Strategy

Root .env provides all backend OPTIONS (PRIMARY, SECONDARY, CLOUD, FALLBACK), services choose which to USE:

  • Cortex → vLLM (PRIMARY) for autonomous reasoning
  • NeoMem → Ollama (SECONDARY) + OpenAI embeddings
  • Intake → vLLM (PRIMARY) for summarization
  • Relay → Fallback chain with user preference

Preserves per-service flexibility while eliminating URL duplication.

Migration

  • All original .env files backed up to .env-backups/ with timestamp 20251126_025334
  • Rollback plan documented in ENVIRONMENT_VARIABLES.md
  • Verification steps provided in DEPRECATED_FILES.md

[0.4.x] - 2025-11-13

Added - Multi-Stage Reasoning Pipeline

Cortex v0.5 - Complete architectural overhaul

  • New reasoning.py module

    • Async reasoning engine
    • Accepts user prompt, identity, RAG block, and reflection notes
    • Produces draft internal answers
    • Uses primary backend (vLLM)
  • New reflection.py module

    • Fully async meta-awareness layer
    • Produces actionable JSON "internal notes"
    • Enforces strict JSON schema and fallback parsing
    • Forces cloud backend (backend_override="cloud")
  • Integrated refine.py into pipeline

    • New stage between reflection and persona
    • Runs exclusively on primary vLLM backend (MI50)
    • Produces final, internally consistent output for downstream persona layer
  • Backend override system

    • Each LLM call can now select its own backend
    • Enables multi-LLM cognition: Reflection → cloud, Reasoning → primary (see the sketch after this list)
  • Identity loader

    • Added identity.py with load_identity() for consistent persona retrieval
  • Ingest handler

    • Async stub created for future Intake → NeoMem → RAG pipeline
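
A sketch of how the backend override might resolve to an env-provided full URL (LLM_FORCE_BACKEND is described in the Cortex v0.3.0 entry below); function and variable names here are illustrative:

```python
import os

def resolve_backend_url(backend_override: str | None = None) -> str:
    """Pick the full endpoint URL for one LLM call; each stage may override the default."""
    backend = (backend_override or os.getenv("LLM_FORCE_BACKEND", "primary")).lower()
    urls = {
        "primary": os.getenv("LLM_PRIMARY_URL"),      # vLLM (MI50), full /v1/completions URL
        "secondary": os.getenv("LLM_SECONDARY_URL"),  # Ollama /api/generate
        "cloud": os.getenv("LLM_CLOUD_URL"),          # OpenAI /v1/chat/completions
    }
    return urls.get(backend) or urls["primary"]

# Reflection forces the cloud backend; reasoning and refine stay on primary:
# reflection_url = resolve_backend_url(backend_override="cloud")
# reasoning_url  = resolve_backend_url()
```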

Cortex v0.4.1 - RAG Integration

  • RAG integration
    • Added rag.py with query_rag() and format_rag_block()
    • Cortex now queries local RAG API (http://10.0.0.41:7090/rag/search)
    • Synthesized answers and top excerpts injected into reasoning prompt
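
A sketch of the RAG call, assuming /rag/search accepts a JSON query and returns top excerpts plus a synthesized answer (per the Lyra_RAG v0.1.0 entry later in this file); the request and response field names are assumptions:

```python
import httpx

RAG_URL = "http://10.0.0.41:7090/rag/search"

async def query_rag(query: str, top_k: int = 5) -> dict:
    async with httpx.AsyncClient(timeout=30.0) as client:
        resp = await client.post(RAG_URL, json={"query": query, "top_k": top_k})
        resp.raise_for_status()
        return resp.json()

def format_rag_block(result: dict) -> str:
    # Condense the synthesized answer and top excerpts into one prompt block
    excerpts = "\n".join(f"- {e}" for e in result.get("excerpts", []))
    return f"[RAG]\nAnswer: {result.get('answer', '')}\nExcerpts:\n{excerpts}"
```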

Changed - Unified LLM Architecture

Cortex v0.5

  • Unified LLM backend URL handling across Cortex

    • ENV variables must now contain FULL API endpoints
    • Removed all internal path-appending (e.g. .../v1/completions)
    • llm_router.py rewritten to use env-provided URLs as-is
    • Ensures consistent behavior between draft, reflection, refine, and persona
  • Rebuilt main.py

    • Removed old annotation/analysis logic
    • New structure: load identity → get RAG → reflect → reason → return draft+notes
    • Routes now clean and minimal (/reason, /ingest, /health)
    • Async path throughout Cortex
  • Refactored llm_router.py

    • Removed old fallback logic during overrides
    • OpenAI requests now use /v1/chat/completions
    • Added proper OpenAI Authorization headers
    • Distinct payload format for vLLM vs OpenAI
    • Unified, correct parsing across models
  • Simplified Cortex architecture

    • Removed deprecated "context.py" and old reasoning code
    • Relay completely decoupled from smart behavior
  • Updated environment specification

    • LLM_PRIMARY_URL now set to http://10.0.0.43:8000/v1/completions
    • LLM_SECONDARY_URL remains http://10.0.0.3:11434/api/generate (Ollama)
    • LLM_CLOUD_URL set to https://api.openai.com/v1/chat/completions

Cortex v0.4.1

  • Revised /reason endpoint

    • Now builds unified context blocks: [Intake] → recent summaries, [RAG] → contextual knowledge, [User Message] → current input
    • Calls call_llm() for first pass, then reflection_loop() for meta-evaluation
    • Returns cortex_prompt, draft_output, final_output, and normalized reflection
  • Reflection Pipeline Stability

    • Cleaned parsing to normalize JSON vs. text reflections
    • Added fallback handling for malformed or non-JSON outputs
    • Log system improved to show raw JSON, extracted fields, and normalized summary
  • Async Summarization (Intake v0.2.1)

    • Intake summaries now run in background threads to avoid blocking Cortex
    • Summaries (L1–L∞) logged asynchronously with [BG] tags
  • Environment & Networking Fixes

    • Verified .env variables propagate correctly inside Cortex container
    • Confirmed Docker network connectivity between Cortex, Intake, NeoMem, and RAG
    • Adjusted localhost calls to service-IP mapping
  • Behavioral Updates

    • Cortex now performs conversation reflection (on user intent) and self-reflection (on its own answers)
    • RAG context successfully grounds reasoning outputs
    • Intake and NeoMem confirmed receiving summaries via /add_exchange
    • Log clarity pass: all reflective and contextual blocks clearly labeled

Fixed

Cortex v0.5

  • Resolved endpoint conflict where router expected base URLs and refine expected full URLs
    • Fixed by standardizing full-URL behavior across entire system
  • Reflection layer no longer fails silently (previously returned [""] due to MythoMax)
  • Resolved 404/401 errors caused by incorrect OpenAI URL endpoints
  • No more double-routing through vLLM during reflection
  • Corrected async/sync mismatch in multiple locations
  • Eliminated double-path bug (/v1/completions/v1/completions) caused by previous router logic

Removed

Cortex v0.5

  • Legacy annotate, reason_check glue logic from old architecture
  • Old backend probing junk code
  • Stale imports and unused modules leftover from previous prototype

Verified

Cortex v0.5

  • Cortex → vLLM (MI50) → refine → final_output now functioning correctly
  • Refine shows used_primary_backend: true and no fallback
  • Manual curl test confirms endpoint accuracy

Known Issues

Cortex v0.5

  • Refine sometimes prefixes output with "Final Answer:"; next version will sanitize this
  • Hallucinations in draft_output persist due to weak grounding (fix in reasoning + RAG planned)

Cortex v0.4.1

  • NeoMem tuning needed - improve retrieval latency and relevance
  • Need dedicated /reflections/recent endpoint for Cortex
  • Migrate to Cortex-first ingestion (Relay → Cortex → NeoMem)
  • Add persistent reflection recall (use prior reflections as meta-context)
  • Improve reflection JSON structure ("insight", "evaluation", "next_action" → guaranteed fields)
  • Tighten temperature and prompt control for factual consistency
  • RAG optimization: add source ranking, filtering, multi-vector hybrid search
  • Cache RAG responses per session to reduce duplicate calls

Notes

Cortex v0.5

This is the largest structural change to Cortex so far. It establishes:

  • Multi-model cognition
  • Clean layering
  • Identity + reflection separation
  • Correct async code
  • Deterministic backend routing
  • Predictable JSON reflection

The system is now ready for:

  • Refinement loops
  • Persona-speaking layer
  • Containerized RAG
  • Long-term memory integration
  • True emergent-behavior experiments

[0.3.x] - 2025-10-28 to 2025-09-26

Added

[Lyra Core v0.3.2 + Web UI v0.2.0] - 2025-10-28

  • New UI

    • Cleaned up UI look and feel
  • Sessions

    • Sessions now persist over time
    • Ability to create new sessions or load sessions from previous instance
    • Switching sessions updates what the prompt sends to Relay (messages from other sessions are no longer included)
    • Relay correctly wired in

[Lyra-Core 0.3.1] - 2025-10-09

  • NVGRAM Integration (Full Pipeline Reconnected)
    • Replaced legacy Mem0 service with NVGRAM microservice (nvgram-api @ port 7077)
    • Updated server.js in Relay to route all memory ops via ${NVGRAM_API}/memories and /search
    • Added .env variable: NVGRAM_API=http://nvgram-api:7077
    • Verified end-to-end Lyra conversation persistence: relay → nvgram-api → postgres/neo4j → relay → ollama → ui
    • Memories stored, retrieved, and re-injected successfully

[Lyra-Core v0.3.0] - 2025-09-26

  • Salience filtering in Relay
    • .env configurable: SALIENCE_ENABLED, SALIENCE_MODE, SALIENCE_MODEL, SALIENCE_API_URL
    • Supports heuristic and llm classification modes
    • LLM-based salience filter integrated with Cortex VM running llama-server
  • Logging improvements
    • Added debug logs for salience mode, raw LLM output, and unexpected outputs
    • Fail-closed behavior for unexpected LLM responses
  • Successfully tested with Phi-3.5-mini and Qwen2-0.5B-Instruct as salience classifiers
  • Verified end-to-end flow: Relay → salience filter → Mem0 add/search → Persona injection → LLM reply

[Cortex v0.3.0] - 2025-10-31

  • Cortex Service (FastAPI)

    • New standalone reasoning engine (cortex/main.py) with endpoints:
      • GET /health reports active backend + NeoMem status
      • POST /reason evaluates {prompt, response} pairs
      • POST /annotate experimental text analysis
    • Background NeoMem health monitor (5-minute interval)
  • Multi-Backend Reasoning Support

    • Environment-driven backend selection via LLM_FORCE_BACKEND
    • Supports: Primary (vLLM MI50), Secondary (Ollama 3090), Cloud (OpenAI), Fallback (llama.cpp CPU)
    • Per-backend model variables: LLM_PRIMARY_MODEL, LLM_SECONDARY_MODEL, LLM_CLOUD_MODEL, LLM_FALLBACK_MODEL
  • Response Normalization Layer

    • Implemented normalize_llm_response() to merge streamed outputs and repair malformed JSON (see the sketch after this list)
    • Handles Ollama's multi-line streaming and Mythomax's missing punctuation issues
    • Prints concise debug previews of merged content
  • Environment Simplification

    • Each service (intake, cortex, neomem) now maintains its own .env file
    • Removed reliance on shared/global env file to prevent cross-contamination
    • Verified Docker Compose networking across containers
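
A sketch of the normalization idea from the Response Normalization Layer item above, assuming Ollama-style newline-delimited JSON chunks that each carry a response fragment; the real function also repairs other malformed output:

```python
import json

def normalize_llm_response(raw: str) -> str:
    """Merge streamed chunks into one string; keep non-JSON fragments as-is."""
    parts = []
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            chunk = json.loads(line)
        except json.JSONDecodeError:
            parts.append(line)  # not a JSON chunk; keep the raw text
            continue
        parts.append(chunk.get("response", ""))  # Ollama puts each streamed piece in "response"
    merged = "".join(parts)
    print(f"[normalize] merged preview: {merged[:80]!r}")  # concise debug preview
    return merged
```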

[NeoMem 0.1.2] - 2025-10-27 (formerly NVGRAM)

  • Renamed NVGRAM to NeoMem
    • All future updates under name NeoMem
    • Features unchanged

[NVGRAM 0.1.1] - 2025-10-08

  • Async Memory Rewrite (Stability + Safety Patch)
    • Introduced AsyncMemory class with fully asynchronous vector and graph store writes
    • Added input sanitation to prevent embedding errors ('list' object has no attribute 'replace')
    • Implemented flatten_messages() helper in API layer to clean malformed payloads (see the sketch after this list)
    • Added structured request logging via RequestLoggingMiddleware (FastAPI middleware)
    • Health endpoint (/health) returns structured JSON {status, version, service}
    • Startup logs include sanitized embedder config with masked API keys
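
A sketch of the flatten_messages() idea, assuming messages arrive as role/content dicts where content is sometimes a list of parts; the exact cleaning rules are assumptions:

```python
def flatten_messages(messages: list[dict]) -> list[dict]:
    """Coerce every message's content to a plain string so the embedder never sees a list
    (the source of the "'list' object has no attribute 'replace'" error)."""
    cleaned = []
    for msg in messages:
        content = msg.get("content", "")
        if isinstance(content, list):
            # Join list parts, tolerating dict-shaped parts such as {"text": "..."}
            content = " ".join(
                part.get("text", "") if isinstance(part, dict) else str(part)
                for part in content
            )
        elif not isinstance(content, str):
            content = str(content)
        cleaned.append({**msg, "content": content})
    return cleaned
```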

[NVGRAM 0.1.0] - 2025-10-07

  • Initial fork of Mem0 → NVGRAM
    • Created fully independent local-first memory engine based on Mem0 OSS
    • Renamed all internal modules, Docker services, and environment variables from mem0 → nvgram
    • New service name: nvgram-api, default port 7077
    • Maintains same API endpoints (/memories, /search) for drop-in compatibility
    • Uses FastAPI, Postgres, and Neo4j as persistent backends

[Lyra-Mem0 0.3.2] - 2025-10-05

  • Ollama LLM reasoning alongside OpenAI embeddings
    • Introduced LLM_PROVIDER=ollama, LLM_MODEL, and OLLAMA_HOST in .env.3090
    • Verified local 3090 setup using qwen2.5:7b-instruct-q4_K_M
    • Split processing: Embeddings → OpenAI text-embedding-3-small, LLM → Local Ollama
  • Added .env.3090 template for self-hosted inference nodes
  • Integrated runtime diagnostics and seeder progress tracking
    • File-level + message-level progress bars
    • Retry/back-off logic for timeouts (3 attempts)
    • Event logging (ADD / UPDATE / NONE) for every memory record
  • Expanded Docker health checks for Postgres, Qdrant, and Neo4j containers
  • Added GPU-friendly long-run configuration for continuous seeding (validated on RTX 3090)

[Lyra-Mem0 0.3.1] - 2025-10-03

  • HuggingFace TEI integration (local 3090 embedder)
  • Dual-mode environment switch between OpenAI cloud and local
  • CSV export of memories from Postgres (payload->>'data')

[Lyra-Mem0 0.3.0]

  • Ollama embeddings in Mem0 OSS container
    • Configure EMBEDDER_PROVIDER=ollama, EMBEDDER_MODEL, OLLAMA_HOST via .env
    • Mounted main.py override from host into container to load custom DEFAULT_CONFIG
    • Installed ollama Python client into custom API container image
  • .env.3090 file for external embedding mode (3090 machine)
  • Workflow for multiple embedding modes: LAN-based 3090/Ollama, Local-only CPU, OpenAI fallback

[Lyra-Mem0 v0.2.1]

  • Seeding pipeline
    • Built Python seeder script to bulk-insert raw Cloud Lyra exports into Mem0
    • Implemented incremental seeding option (skip existing memories, only add new ones)
    • Verified insert process with Postgres-backed history DB

[Intake v0.1.0] - 2025-10-27

  • Receives messages from relay and summarizes them in cascading format
  • Continues to summarize smaller batches of exchanges while generating large-scale conversational summaries (L20)
  • Currently logs summaries to .log file in /project-lyra/intake-logs/

[Lyra-Cortex v0.2.0] - 2025-09-26

  • Integrated llama-server on dedicated Cortex VM (Proxmox)
  • Verified Phi-3.5-mini-instruct_Uncensored-Q4_K_M running with 8 vCPUs
  • Benchmarked Phi-3.5-mini performance: ~18 tokens/sec CPU-only on Ryzen 7 7800X
  • Salience classification functional but sometimes inconsistent
  • Tested Qwen2-0.5B-Instruct GGUF as alternative salience classifier
    • Much faster throughput (~350 tokens/sec prompt, ~100 tokens/sec eval)
    • More responsive but over-classifies messages as "salient"
  • Established .env integration for model ID (SALIENCE_MODEL), enabling hot-swap between models

Changed

[Lyra-Core 0.3.1] - 2025-10-09

  • Renamed MEM0_URL → NVGRAM_API across all relay environment configs
  • Updated Docker Compose service dependency order
    • relay now depends on nvgram-api healthcheck
    • Removed mem0 references and volumes
  • Minor cleanup to Persona fetch block (null-checks and safer default persona string)

[Lyra-Core v0.3.1] - 2025-09-27

  • Removed salience filter logic; Cortex is now default annotator
  • All user messages stored in Mem0; no discard tier applied
  • Cortex annotations (metadata.cortex) now attached to memories
  • Debug logging improvements
    • Pretty-print Cortex annotations
    • Injected prompt preview
    • Memory search hit list with scores
  • .env toggle (CORTEX_ENABLED) to bypass Cortex when needed

[Lyra-Core v0.3.0] - 2025-09-26

  • Refactored server.js to gate mem.add() calls behind salience filter
  • Updated .env to support SALIENCE_MODEL

[Cortex v0.3.0] - 2025-10-31

  • Refactored reason_check() to dynamically switch between prompt and chat mode depending on backend
  • Enhanced startup logs to announce active backend, model, URL, and mode
  • Improved error handling with clearer "Reasoning error" messages

[NVGRAM 0.1.1] - 2025-10-08

  • Replaced synchronous Memory.add() with async-safe version supporting concurrent vector + graph writes
  • Normalized indentation and cleaned duplicate main.py references
  • Removed redundant FastAPI() app reinitialization
  • Updated internal logging to INFO-level timing format
  • Deprecated @app.on_event("startup") → will migrate to lifespan handler in v0.1.2

[NVGRAM 0.1.0] - 2025-10-07

  • Removed dependency on external mem0ai SDK — all logic now local
  • Re-pinned requirements: fastapi==0.115.8, uvicorn==0.34.0, pydantic==2.10.4, python-dotenv==1.0.1, psycopg>=3.2.8, ollama
  • Adjusted docker-compose and .env templates to use new NVGRAM naming

[Lyra-Mem0 0.3.2] - 2025-10-05

  • Updated main.py configuration block to load LLM_PROVIDER, LLM_MODEL, OLLAMA_BASE_URL
    • Fallback to OpenAI if Ollama unavailable
  • Adjusted docker-compose.yml mount paths to correctly map /app/main.py
  • Normalized .env loading so mem0-api and host environment share identical values
  • Improved seeder logging and progress telemetry
  • Added explicit temperature field to DEFAULT_CONFIG['llm']['config']

[Lyra-Mem0 0.3.0]

  • docker-compose.yml updated to mount local main.py and .env.3090
  • Built custom Dockerfile (mem0-api-server:latest) extending base image with pip install ollama
  • Updated requirements.txt to include ollama package
  • Adjusted Mem0 container config so main.py pulls environment variables with dotenv
  • Tested new embeddings path with curl /memories API call

[Lyra-Mem0 v0.2.1]

  • Updated main.py to load configuration from .env using dotenv and support multiple embedder backends
  • Mounted host main.py into container so local edits persist across rebuilds
  • Updated docker-compose.yml to mount .env.3090 and support swap between profiles
  • Built custom Dockerfile (mem0-api-server:latest) including pip install ollama
  • Updated requirements.txt with ollama dependency
  • Adjusted startup flow so container automatically connects to external Ollama host (LAN IP)
  • Added logging to confirm model pulls and embedding requests

Fixed

[Lyra-Core 0.3.1] - 2025-10-09

  • Relay startup no longer crashes when NVGRAM is unavailable — deferred connection handling
  • /memories POST failures no longer crash Relay; now logged gracefully as relay error Error: memAdd failed: 500
  • Improved injected prompt debugging (DEBUG_PROMPT=true now prints clean JSON)

[Lyra-Core v0.3.1] - 2025-09-27

  • Parsing failures from Markdown-wrapped Cortex JSON via fence cleaner
  • Relay no longer "hangs" on malformed Cortex outputs

[Cortex v0.3.0] - 2025-10-31

  • Corrected broken vLLM endpoint routing (/v1/completions)
  • Stabilized cross-container health reporting for NeoMem
  • Resolved JSON parse failures caused by streaming chunk delimiters

[NVGRAM 0.1.1] - 2025-10-08

  • Eliminated repeating 500 error from OpenAI embedder caused by non-string message content
  • Masked API key leaks from boot logs
  • Ensured Neo4j reconnects gracefully on first retry

[Lyra-Mem0 0.3.2] - 2025-10-05

  • Resolved crash during startup: TypeError: OpenAIConfig.__init__() got an unexpected keyword argument 'ollama_base_url'
  • Corrected mount type mismatch (file vs directory) causing OCI runtime create failed errors
  • Prevented duplicate or partial postings when retry logic triggered multiple concurrent requests
  • "Unknown event" warnings now safely ignored (no longer break seeding loop)
  • Confirmed full dual-provider operation in logs (api.openai.com + 10.0.0.3:11434/api/chat)

[Lyra-Mem0 0.3.1] - 2025-10-03

  • .env CRLF vs LF line ending issues
  • Local seeding now possible via HuggingFace server

[Lyra-Mem0 0.3.0]

  • Resolved container boot failure caused by missing ollama dependency (ModuleNotFoundError)
  • Fixed config overwrite issue where rebuilding container restored stock main.py
  • Worked around Neo4j error (vector.similarity.cosine(): mismatched vector dimensions) by confirming OpenAI vs. Ollama embedding vector sizes

[Lyra-Mem0 v0.2.1]

  • Seeder process originally failed on old memories — now skips duplicates and continues batch
  • Resolved container boot error (ModuleNotFoundError: ollama) by extending image
  • Fixed overwrite issue where stock main.py replaced custom config during rebuild
  • Worked around Neo4j vector.similarity.cosine() dimension mismatch

Known Issues

[Lyra-Core v0.3.0] - 2025-09-26

  • Small models (e.g. Qwen2-0.5B) tend to over-classify as "salient"
  • Phi-3.5-mini sometimes returns truncated tokens ("sali", "fi")
  • CPU-only inference is functional but limited; larger models recommended once GPU available

[Lyra-Cortex v0.2.0] - 2025-09-26

  • Small models tend to drift or over-classify
  • CPU-only 7B+ models expected to be slow; GPU passthrough recommended for larger models
  • Need to set up systemd service for llama-server to auto-start on VM reboot

Observations

[Lyra-Mem0 0.3.2] - 2025-10-05

  • Stable GPU utilization: ~8 GB VRAM @ 92% load, ≈ 67°C under sustained seeding
  • Next revision will re-format seed JSON to preserve role context (user vs assistant)

[Lyra-Mem0 v0.2.1]

  • To fully unify embedding modes, a Hugging Face / local model with 1536-dim embeddings will be needed (to match OpenAI's schema)
  • Current Ollama model (mxbai-embed-large) works, but returns 1024-dim vectors
  • Seeder workflow validated but should be wrapped in repeatable weekly run for full Cloud→Local sync

Next Steps

[Lyra-Core 0.3.1] - 2025-10-09

  • Add salience visualization (e.g., memory weights displayed in injected system message)
  • Begin schema alignment with NVGRAM v0.1.2 for confidence scoring
  • Add relay auto-retry for transient 500 responses from NVGRAM

[NVGRAM 0.1.1] - 2025-10-08

  • Integrate salience scoring and embedding confidence weight fields in Postgres schema
  • Begin testing with full Lyra Relay + Persona Sidecar pipeline for live session memory recall
  • Migrate from deprecated on_event → lifespan pattern in 0.1.2

[NVGRAM 0.1.0] - 2025-10-07

  • Integrate NVGRAM as new default backend in Lyra Relay
  • Deprecate remaining Mem0 references and archive old configs
  • Begin versioning as standalone project (nvgram-core, nvgram-api, etc.)

[Intake v0.1.0] - 2025-10-27

  • Feed intake into NeoMem
  • Generate daily/hourly overall summaries (e.g., "Today Brian and Lyra worked on x, y, and z")
  • Generate session-aware summaries, each with its own intake hopper

[0.2.x] - 2025-09-30 to 2025-09-24

Added

[Lyra-Mem0 v0.2.0] - 2025-09-30

  • Standalone Lyra-Mem0 stack created at ~/lyra-mem0/
    • Includes Postgres (pgvector), Qdrant, Neo4j, and SQLite for history tracking
    • Added working docker-compose.mem0.yml and custom Dockerfile for building Mem0 API server
  • Verified REST API functionality
    • POST /memories works for adding memories
    • POST /search works for semantic search
  • Successful end-to-end test with persisted memory: "Likes coffee in the morning" → retrievable via search

[Lyra-Core v0.2.0] - 2025-09-24

  • Migrated Relay to use mem0ai SDK instead of raw fetch calls
  • Implemented sessionId support (client-supplied, fallback to default)
  • Added debug logs for memory add/search
  • Cleaned up Relay structure for clarity

Changed

[Lyra-Mem0 v0.2.0] - 2025-09-30

  • Split architecture into modular stacks:
    • ~/lyra-core (Relay, Persona-Sidecar, etc.)
    • ~/lyra-mem0 (Mem0 OSS memory stack)
  • Removed old embedded mem0 containers from Lyra-Core compose file
  • Added Lyra-Mem0 section in README.md

Next Steps

[Lyra-Mem0 v0.2.0] - 2025-09-30

  • Wire Relay → Mem0 API (integration not yet complete)
  • Add integration tests to verify persistence and retrieval from within Lyra-Core

[0.1.x] - 2025-09-25 to 2025-09-23

Added

[Lyra_RAG v0.1.0] - 2025-11-07

  • Initial standalone RAG module for Project Lyra
  • Persistent ChromaDB vector store (./chromadb)
  • Importer rag_chat_import.py with:
    • Recursive folder scanning and category tagging
    • Smart chunking (~5k chars)
    • SHA-1 deduplication and chat-ID metadata
    • Timestamp fields (file_modified, imported_at)
    • Background-safe operation (nohup/tmux)
  • 68 Lyra-category chats imported:
    • 6,556 new chunks added
    • 1,493 duplicates skipped
    • 7,997 total vectors stored

[Lyra_RAG v0.1.0 API] - 2025-11-07

  • /rag/search FastAPI endpoint implemented (port 7090)
  • Supports natural-language queries and returns top related excerpts
  • Added answer synthesis step using gpt-4o-mini

[Lyra-Core v0.1.0] - 2025-09-23

  • First working MVP of Lyra Core Relay
  • Relay service accepts POST /v1/chat/completions (OpenAI-compatible)
  • Memory integration with Mem0:
    • POST /memories on each user message
    • POST /search before LLM call
  • Persona Sidecar integration (GET /current)
  • OpenAI GPT + Ollama (Mythomax) support in Relay
  • Simple browser-based chat UI (talks to Relay at http://<host>:7078)
  • .env standardization for Relay + Mem0 + Postgres + Neo4j
  • Working Neo4j + Postgres backing stores for Mem0
  • Initial MVP relay service with raw fetch calls to Mem0
  • Dockerized with basic healthcheck

[Lyra-Cortex v0.1.0] - 2025-09-25

  • First deployment as dedicated Proxmox VM (5 vCPU / 18 GB RAM / 100 GB SSD)
  • Built llama.cpp with llama-server target via CMake
  • Integrated Phi-3.5 Mini Instruct (Uncensored, Q4_K_M GGUF) model
  • Verified API compatibility at /v1/chat/completions
  • Local test successful via curl → ~523 token response generated
  • Performance benchmark: ~11.5 tokens/sec (CPU-only on Ryzen 7800X)
  • Confirmed usable for salience scoring, summarization, and lightweight reasoning

Fixed

[Lyra-Core v0.1.0] - 2025-09-23

  • Resolved crash loop in Neo4j by restricting env vars (NEO4J_AUTH only)
  • Relay now correctly reads MEM0_URL and MEM0_API_KEY from .env

Verified

[Lyra_RAG v0.1.0] - 2025-11-07

  • Successful recall of Lyra-Core development history (v0.3.0 snapshot)
  • Correct metadata and category tagging for all new imports

Known Issues

[Lyra-Core v0.1.0] - 2025-09-23

  • No feedback loop (thumbs up/down) yet
  • Forget/delete flow is manual (via memory IDs)
  • Memory latency ~14s depending on embedding model

Next Planned

[Lyra_RAG v0.1.0] - 2025-11-07

  • Optional where filter parameter for category/date queries
  • Graceful "no results" handler for empty retrievals
  • rag_docs_import.py for PDFs and other document types