Project Lyra — Modular Changelog

All notable changes to Project Lyra are organized by component. The format is based on Keep a Changelog and adheres to Semantic Versioning.

Last Updated: 2025-11-28


🧠 Lyra-Core

[Project Lyra v0.5.0] - 2025-11-28

🔧 Fixed - Critical API Wiring & Integration

After the major architectural rewire (v0.4.x), this release fixes all critical endpoint mismatches and ensures end-to-end system connectivity.

Cortex → Intake Integration

  • Fixed IntakeClient to use correct Intake v0.2 API endpoints (see the sketch after this list)
    • Changed GET /context/{session_id} → GET /summaries?session_id={session_id}
    • Updated JSON response parsing to extract the summary_text field
    • Fixed environment variable name: INTAKE_API → INTAKE_API_URL
    • Corrected default port: 7083 → 7080
    • Added deprecation warning to summarize_turn() method (endpoint removed in Intake v0.2)
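
A minimal client-side sketch of the corrected call, assuming the response is either a list of records or a `summaries` list, each carrying a `summary_text` field; only the endpoint, query parameter, env variable, and default port come from this entry, the rest is illustrative:

```python
# Sketch (not the actual IntakeClient): fetch context from Intake v0.2.
import os
import httpx

INTAKE_API_URL = os.getenv("INTAKE_API_URL", "http://intake:7080")

async def fetch_summaries(session_id: str) -> list[str]:
    async with httpx.AsyncClient() as client:
        resp = await client.get(
            f"{INTAKE_API_URL}/summaries",
            params={"session_id": session_id},
        )
        resp.raise_for_status()
        data = resp.json()
        # Extract summary_text from each record (list shape assumed).
        records = data if isinstance(data, list) else data.get("summaries", [])
        return [r["summary_text"] for r in records if "summary_text" in r]
```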

Relay → UI Compatibility

  • Added OpenAI-compatible endpoint POST /v1/chat/completions (example request after this list)
    • Accepts standard OpenAI format with messages[] array
    • Returns OpenAI-compatible response structure with choices[]
    • Extracts last message content from messages array
    • Includes usage metadata (stub values for compatibility)
  • Refactored Relay to use shared handleChatRequest() function
    • Both /chat and /v1/chat/completions use same core logic
    • Eliminates code duplication
    • Consistent error handling across endpoints
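
An illustrative client call against the new endpoint. The Relay host/port and model name are placeholders, and the response nesting is assumed to follow OpenAI's standard chat schema; the changelog only guarantees messages[], choices[], and stub usage values:

```python
# Sketch: calling Relay's OpenAI-compatible endpoint from any OpenAI-style client.
import requests

RELAY_URL = "http://localhost:7078"  # placeholder; use your Relay host/port

payload = {
    "model": "lyra",  # placeholder model name
    "messages": [{"role": "user", "content": "Hello, Lyra."}],
}

resp = requests.post(f"{RELAY_URL}/v1/chat/completions", json=payload, timeout=120)
resp.raise_for_status()
body = resp.json()

# Relay extracts the last message's content and returns an OpenAI-shaped reply.
print(body["choices"][0]["message"]["content"])
print(body.get("usage"))  # stub token counts, included for compatibility
```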

Relay → Intake Connection

  • Fixed Intake URL fallback in Relay server configuration (see the sketch after this list)
    • Corrected port: 7082 → 7080
    • Updated endpoint: /summary → /add_exchange
    • Now properly sends exchanges to Intake for summarization
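
A rough sketch of the hand-off to Intake. Relay itself is Node.js; Python is used here only to illustrate the call shape, and the payload field names are assumptions (only the host, port, and /add_exchange path come from this entry):

```python
# Sketch: Relay-side exchange hand-off; summarization runs in Intake's background.
import requests

INTAKE_URL = "http://intake:7080"

def send_exchange(session_id: str, user_msg: str, assistant_msg: str) -> None:
    requests.post(
        f"{INTAKE_URL}/add_exchange",
        json={
            "session_id": session_id,    # assumed field name
            "user": user_msg,            # assumed field name
            "assistant": assistant_msg,  # assumed field name
        },
        timeout=10,
    )
```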

Code Quality & Python Package Structure

  • Added missing __init__.py files to all Cortex subdirectories
    • cortex/llm/__init__.py
    • cortex/reasoning/__init__.py
    • cortex/persona/__init__.py
    • cortex/ingest/__init__.py
    • cortex/utils/__init__.py
    • Improves package imports and IDE support
  • Removed unused import in cortex/router.py: from unittest import result
  • Deleted empty file cortex/llm/resolve_llm_url.py (was 0 bytes, never implemented)

Verified Working

Complete end-to-end message flow now operational:

UI → Relay (/v1/chat/completions)
  ↓
Relay → Cortex (/reason)
  ↓
Cortex → Intake (/summaries) [retrieves context]
  ↓
Cortex 4-stage pipeline:
  1. reflection.py → meta-awareness notes
  2. reasoning.py → draft answer
  3. refine.py → polished answer
  4. persona/speak.py → Lyra personality
  ↓
Cortex → Relay (returns persona response)
  ↓
Relay → Intake (/add_exchange) [async summary]
  ↓
Intake → NeoMem (background memory storage)
  ↓
Relay → UI (final response)

📝 Documentation

  • Added this CHANGELOG entry with comprehensive v0.5.0 notes
  • Updated README.md to reflect v0.5.0 architecture
    • Documented new endpoints
    • Updated data flow diagrams
    • Clarified Intake v0.2 changes
    • Corrected service descriptions

🐛 Issues Resolved

  • Cortex could not retrieve context from Intake (wrong endpoint)
  • UI could not send messages to Relay (endpoint mismatch)
  • Relay could not send summaries to Intake (wrong port/endpoint)
  • Python package imports were implicit (missing __init__.py files)

⚠️ Known Issues (Non-Critical)

  • Session management endpoints not implemented in Relay (GET/POST /sessions/:id)
  • RAG service currently disabled in docker-compose.yml
  • Cortex /ingest endpoint is a stub returning {"status": "ok"}

🎯 Migration Notes

If upgrading from v0.4.x:

  1. Pull latest changes from git
  2. Verify environment variables in .env files:
    • Check INTAKE_API_URL=http://intake:7080 (not INTAKE_API)
    • Verify all service URLs use correct ports
  3. Restart Docker containers: docker-compose down && docker-compose up -d
  4. Test with a simple message through the UI

[Infrastructure v1.0.0] - 2025-11-26

Changed

  • Environment Variable Consolidation - Major reorganization to eliminate duplication and improve maintainability

    • Consolidated 9 scattered .env files into single source of truth architecture
    • Root .env now contains all shared infrastructure (LLM backends, databases, API keys, service URLs)
    • Service-specific .env files minimized to only essential overrides:
      • cortex/.env: Reduced from 42 to 22 lines (operational parameters only)
      • neomem/.env: Reduced from 26 to 14 lines (LLM naming conventions only)
      • intake/.env: Kept at 8 lines (already minimal)
    • Result: ~24% reduction in total configuration lines (197 → ~150)
  • Docker Compose Consolidation

    • All services now defined in single root docker-compose.yml
    • Relay service updated with complete configuration (env_file, volumes)
    • Removed redundant core/docker-compose.yml (marked as DEPRECATED)
    • Standardized network communication to use Docker container names
  • Service URL Standardization

    • Internal services use container names: http://neomem-api:7077, http://cortex:7081
    • External services use IP addresses: http://10.0.0.43:8000 (vLLM), http://10.0.0.3:11434 (Ollama)
    • Removed IP/container name inconsistencies across files

Added

  • Security Templates - Created .env.example files for all services

    • Root .env.example with sanitized credentials
    • Service-specific templates: cortex/.env.example, neomem/.env.example, intake/.env.example, rag/.env.example
    • All .env.example files safe to commit to version control
  • Documentation

    • ENVIRONMENT_VARIABLES.md: Comprehensive reference for all environment variables
      • Variable descriptions, defaults, and usage examples
      • Multi-backend LLM strategy documentation
      • Troubleshooting guide
      • Security best practices
    • DEPRECATED_FILES.md: Deletion guide for deprecated files with verification steps
  • Enhanced .gitignore

    • Ignores all .env files (including subdirectories)
    • Tracks .env.example templates for documentation
    • Ignores .env-backups/ directory

Removed

  • core/.env - Redundant with root .env, now deleted
  • core/docker-compose.yml - Consolidated into main compose file (marked DEPRECATED)

Fixed

  • Eliminated duplicate OPENAI_API_KEY across 5+ files
  • Eliminated duplicate LLM backend URLs across 4+ files
  • Eliminated duplicate database credentials across 3+ files
  • Resolved Cortex environment: section override in docker-compose (now uses env_file)

Architecture

  • Multi-Backend LLM Strategy: Root .env provides all backend OPTIONS (PRIMARY, SECONDARY, CLOUD, FALLBACK), services choose which to USE
    • Cortex → vLLM (PRIMARY) for autonomous reasoning
    • NeoMem → Ollama (SECONDARY) + OpenAI embeddings
    • Intake → vLLM (PRIMARY) for summarization
    • Relay → Fallback chain with user preference
  • Preserves per-service flexibility while eliminating URL duplication (see the sketch below)
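
A minimal sketch of the "options vs. use" pattern: the root .env defines every backend, and each service picks one by role. The variable names follow the LLM_*_URL / LLM_*_MODEL convention used elsewhere in this changelog; the select_backend() helper is illustrative, not actual project code:

```python
# Sketch: root .env provides backend OPTIONS, services choose which to USE.
import os

BACKENDS = {
    "primary":   ("LLM_PRIMARY_URL",   "LLM_PRIMARY_MODEL"),    # vLLM
    "secondary": ("LLM_SECONDARY_URL", "LLM_SECONDARY_MODEL"),  # Ollama
    "cloud":     ("LLM_CLOUD_URL",     "LLM_CLOUD_MODEL"),      # OpenAI
    "fallback":  ("LLM_FALLBACK_URL",  "LLM_FALLBACK_MODEL"),   # llama.cpp
}

def select_backend(role: str) -> tuple[str, str]:
    url_var, model_var = BACKENDS[role]
    return os.environ[url_var], os.environ[model_var]

# e.g. Cortex reasoning uses the primary backend:
reason_url, reason_model = select_backend("primary")
```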

Migration

  • All original .env files backed up to .env-backups/ with timestamp 20251126_025334
  • Rollback plan documented in ENVIRONMENT_VARIABLES.md
  • Verification steps provided in DEPRECATED_FILES.md

[Lyra_RAG v0.1.0] 2025-11-07

Added

  • Initial standalone RAG module for Project Lyra.
  • Persistent ChromaDB vector store (./chromadb).
  • Importer rag_chat_import.py with:
    • Recursive folder scanning and category tagging.
    • Smart chunking (~5k chars).
    • SHA-1 deduplication and chat-ID metadata (see the sketch after this list).
    • Timestamp fields (file_modified, imported_at).
    • Background-safe operation (nohup/tmux).
  • 68 Lyra-category chats imported:
    • 6 556 new chunks added
    • 1 493 duplicates skipped
    • 7 997 total vectors now stored.
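
An illustrative sketch of the importer's chunk-and-dedupe strategy, not the actual rag_chat_import.py: ~5k-character chunks, SHA-1 hashes as stable IDs so re-imports skip duplicates, and the per-chunk metadata fields named above:

```python
# Sketch: chunking + SHA-1 dedup + metadata, as described in this entry.
import hashlib
from datetime import datetime, timezone

CHUNK_SIZE = 5000  # ~5k characters per chunk

def chunk_text(text: str, size: int = CHUNK_SIZE) -> list[str]:
    return [text[i:i + size] for i in range(0, len(text), size)]

def make_records(text: str, chat_id: str, category: str, file_modified: str):
    for chunk in chunk_text(text):
        digest = hashlib.sha1(chunk.encode("utf-8")).hexdigest()
        yield {
            "id": digest,  # duplicate chunks hash to the same ID and are skipped
            "document": chunk,
            "metadata": {
                "chat_id": chat_id,
                "category": category,
                "file_modified": file_modified,
                "imported_at": datetime.now(timezone.utc).isoformat(),
            },
        }
```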

API

  • /rag/search FastAPI endpoint implemented (port 7090).
  • Supports natural-language queries and returns top related excerpts.
  • Added answer synthesis step using gpt-4o-mini.

Verified

  • Successful recall of Lyra-Core development history (v0.3.0 snapshot).
  • Correct metadata and category tagging for all new imports.

Next Planned

  • Optional where filter parameter for category/date queries.
  • Graceful “no results” handler for empty retrievals.
  • rag_docs_import.py for PDFs and other document types.

[Lyra Core v0.3.2 + Web UI v0.2.0] - 2025-10-28

Added

  • **New UI**

    • Cleaned up UI look and feel.
  • **Added "sessions"**

    • Sessions now persist over time.
    • Ability to create new sessions or load sessions from a previous instance.
    • Changing the session updates what the prompt sends to Relay (messages from other sessions are not included).
    • Relay is correctly wired in.

[Lyra-Core 0.3.1] - 2025-10-09

Added

  • NVGRAM Integration (Full Pipeline Reconnected)
    • Replaced legacy Mem0 service with NVGRAM microservice (nvgram-api @ port 7077).
    • Updated server.js in Relay to route all memory ops via ${NVGRAM_API}/memories and /search.
    • Added .env variable:
      NVGRAM_API=http://nvgram-api:7077
      
    • Verified end-to-end Lyra conversation persistence:
      • relay → nvgram-api → postgres/neo4j → relay → ollama → ui
      • Memories stored, retrieved, and re-injected successfully.

Changed

  • Renamed MEM0_URL → NVGRAM_API across all relay environment configs.
  • Updated Docker Compose service dependency order:
    • relay now depends on nvgram-api healthcheck.
    • Removed mem0 references and volumes.
  • Minor cleanup to Persona fetch block (null-checks and safer default persona string).

Fixed

  • Relay startup no longer crashes when NVGRAM is unavailable — deferred connection handling.
  • /memories POST failures no longer crash Relay; now logged gracefully as relay error Error: memAdd failed: 500.
  • Improved injected prompt debugging (DEBUG_PROMPT=true now prints clean JSON).

Goals / Next Steps

  • Add salience visualization (e.g., memory weights displayed in injected system message).
  • Begin schema alignment with NVGRAM v0.1.2 for confidence scoring.
  • Add relay auto-retry for transient 500 responses from NVGRAM.

[Lyra-Core] v0.3.1 - 2025-09-27

Changed

  • Removed salience filter logic; Cortex is now the default annotator.
  • All user messages stored in Mem0; no discard tier applied.

Added

  • Cortex annotations (metadata.cortex) now attached to memories.
  • Debug logging improvements:
    • Pretty-print Cortex annotations
    • Injected prompt preview
    • Memory search hit list with scores
  • .env toggle (CORTEX_ENABLED) to bypass Cortex when needed.

Fixed

  • Parsing failures from Markdown-wrapped Cortex JSON via fence cleaner.
  • Relay no longer “hangs” on malformed Cortex outputs.

[Lyra-Core] v0.3.0 — 2025-09-26

Added

  • Implemented salience filtering in Relay:
    • .env configurable: SALIENCE_ENABLED, SALIENCE_MODE, SALIENCE_MODEL, SALIENCE_API_URL.
    • Supports heuristic and llm classification modes.
    • LLM-based salience filter integrated with Cortex VM running llama-server.
  • Logging improvements:
    • Added debug logs for salience mode, raw LLM output, and unexpected outputs.
    • Fail-closed behavior for unexpected LLM responses.
  • Successfully tested with Phi-3.5-mini and Qwen2-0.5B-Instruct as salience classifiers.
  • Verified end-to-end flow: Relay → salience filter → Mem0 add/search → Persona injection → LLM reply.

Changed

  • Refactored server.js to gate mem.add() calls behind salience filter.
  • Updated .env to support SALIENCE_MODEL.

Known Issues

  • Small models (e.g. Qwen2-0.5B) tend to over-classify as "salient".
  • Phi-3.5-mini sometimes returns truncated tokens ("sali", "fi").
  • CPU-only inference is functional but limited; larger models recommended once GPU is available.

[Lyra-Core] v0.2.0 — 2025-09-24

Added

  • Migrated Relay to use mem0ai SDK instead of raw fetch calls.
  • Implemented sessionId support (client-supplied, fallback to default).
  • Added debug logs for memory add/search.
  • Cleaned up Relay structure for clarity.

[Lyra-Core] v0.1.0 — 2025-09-23

Added

  • First working MVP of Lyra Core Relay.
  • Relay service accepts POST /v1/chat/completions (OpenAI-compatible).
  • Memory integration with Mem0:
    • POST /memories on each user message.
    • POST /search before LLM call.
  • Persona Sidecar integration (GET /current).
  • OpenAI GPT + Ollama (Mythomax) support in Relay.
  • Simple browser-based chat UI (talks to Relay at http://<host>:7078).
  • .env standardization for Relay + Mem0 + Postgres + Neo4j.
  • Working Neo4j + Postgres backing stores for Mem0.
  • Initial MVP relay service with raw fetch calls to Mem0.
  • Dockerized with basic healthcheck.

Fixed

  • Resolved crash loop in Neo4j by restricting env vars (NEO4J_AUTH only).
  • Relay now correctly reads MEM0_URL and MEM0_API_KEY from .env.

Known Issues

  • No feedback loop (thumbs up/down) yet.
  • Forget/delete flow is manual (via memory IDs).
  • Memory latency ~14s depending on embedding model.

🧩 lyra-neomem (used to be NVGRAM / Lyra-Mem0)

[NeoMem 0.1.2] - 2025-10-27

Changed

  • Renamed NVGRAM to neomem
    • All future updates will be under the name NeoMem.
    • Features have not changed.

[NVGRAM 0.1.1] - 2025-10-08

Added

  • Async Memory Rewrite (Stability + Safety Patch)
    • Introduced AsyncMemory class with fully asynchronous vector and graph store writes.
    • Added input sanitation to prevent embedding errors ('list' object has no attribute 'replace').
    • Implemented flatten_messages() helper in API layer to clean malformed payloads (see the sketch after this list).
    • Added structured request logging via RequestLoggingMiddleware (FastAPI middleware).
    • Health endpoint (/health) now returns structured JSON {status, version, service}.
    • Startup logs now include sanitized embedder config with API keys masked for safety:
      >>> Embedder config (sanitized): {'provider': 'openai', 'config': {'model': 'text-embedding-3-small', 'api_key': '***'}}
      ✅ Connected to Neo4j on attempt 1
      🧠 NVGRAM v0.1.1 — Neural Vectorized Graph Recall and Memory initialized
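
A sketch of the sanitation idea behind flatten_messages(): coerce list or dict message content into plain strings before it reaches the embedder, which is what prevented the `'list' object has no attribute 'replace'` error. The exact signature in NVGRAM may differ:

```python
# Sketch: flatten non-string message content so the embedder only sees strings.
def flatten_messages(messages: list[dict]) -> list[dict]:
    cleaned = []
    for msg in messages:
        content = msg.get("content", "")
        if isinstance(content, list):
            # e.g. multi-part content blocks -> join their text pieces
            content = " ".join(
                part.get("text", "") if isinstance(part, dict) else str(part)
                for part in content
            )
        elif not isinstance(content, str):
            content = str(content)
        cleaned.append({**msg, "content": content.strip()})
    return cleaned
```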
      

Changed

  • Replaced synchronous Memory.add() with async-safe version supporting concurrent vector + graph writes.
  • Normalized indentation and cleaned duplicate main.py references under /nvgram/ vs /nvgram/server/.
  • Removed redundant FastAPI() app reinitialization.
  • Updated internal logging to INFO-level timing format: 2025-10-08 21:48:45 [INFO] POST /memories -> 200 (11189.1 ms)
  • Deprecated @app.on_event("startup") (FastAPI deprecation warning) → will migrate to lifespan handler in v0.1.2.

Fixed

  • Eliminated repeating 500 error from OpenAI embedder caused by non-string message content.
  • Masked API key leaks from boot logs.
  • Ensured Neo4j reconnects gracefully on first retry.

Goals / Next Steps

  • Integrate salience scoring and embedding confidence weight fields in Postgres schema.
  • Begin testing with full Lyra Relay + Persona Sidecar pipeline for live session memory recall.
  • Migrate from deprecated on_event → lifespan pattern in 0.1.2.

[NVGRAM 0.1.0] - 2025-10-07

Added

  • Initial fork of Mem0 → NVGRAM:
    • Created a fully independent local-first memory engine based on Mem0 OSS.
    • Renamed all internal modules, Docker services, and environment variables from mem0 → nvgram.
    • New service name: nvgram-api, default port 7077.
    • Maintains same API endpoints (/memories, /search) for drop-in compatibility with Lyra Core.
    • Uses FastAPI, Postgres, and Neo4j as persistent backends.
    • Verified clean startup:
      ✅ Connected to Neo4j on attempt 1
      INFO: Uvicorn running on http://0.0.0.0:7077
      
    • /docs and /openapi.json confirmed reachable and functional.

Changed

  • Removed dependency on the external mem0ai SDK — all logic now local.
  • Re-pinned requirements:
    • fastapi==0.115.8
    • uvicorn==0.34.0
    • pydantic==2.10.4
    • python-dotenv==1.0.1
    • psycopg>=3.2.8
    • ollama
  • Adjusted docker-compose and .env templates to use new NVGRAM naming and image paths.

Goals / Next Steps

  • Integrate NVGRAM as the new default backend in Lyra Relay.
  • Deprecate remaining Mem0 references and archive old configs.
  • Begin versioning as a standalone project (nvgram-core, nvgram-api, etc.).

[Lyra-Mem0 0.3.2] - 2025-10-05

Added

  • Support for Ollama LLM reasoning alongside OpenAI embeddings:
    • Introduced LLM_PROVIDER=ollama, LLM_MODEL, and OLLAMA_HOST in .env.3090.
    • Verified local 3090 setup using qwen2.5:7b-instruct-q4_K_M.
    • Split processing pipeline:
      • Embeddings → OpenAI text-embedding-3-small
      • LLM → Local Ollama (http://10.0.0.3:11434/api/chat).
  • Added .env.3090 template for self-hosted inference nodes.
  • Integrated runtime diagnostics and seeder progress tracking:
    • File-level + message-level progress bars.
    • Retry/back-off logic for timeouts (3 attempts).
    • Event logging (ADD / UPDATE / NONE) for every memory record.
  • Expanded Docker health checks for Postgres, Qdrant, and Neo4j containers.
  • Added GPU-friendly long-run configuration for continuous seeding (validated on RTX 3090).

Changed

  • Updated main.py configuration block to load:
    • LLM_PROVIDER, LLM_MODEL, and OLLAMA_BASE_URL.
    • Fallback to OpenAI if Ollama unavailable.
  • Adjusted docker-compose.yml mount paths to correctly map /app/main.py.
  • Normalized .env loading so mem0-api and host environment share identical values.
  • Improved seeder logging and progress telemetry for clearer diagnostics.
  • Added explicit temperature field to DEFAULT_CONFIG['llm']['config'] for tuning future local inference runs (illustrative config sketch below).
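
An approximation of the dual-provider DEFAULT_CONFIG shape described above (OpenAI embeddings plus a local Ollama LLM). Key names mirror the sanitized config printed in the NVGRAM boot logs and the env variables listed in this entry; treat it as a sketch of main.py, not a copy of it:

```python
# Sketch: dual-provider Mem0 config — cloud embeddings, local Ollama reasoning.
import os

DEFAULT_CONFIG = {
    "embedder": {
        "provider": "openai",
        "config": {"model": "text-embedding-3-small"},
    },
    "llm": {
        "provider": os.getenv("LLM_PROVIDER", "ollama"),
        "config": {
            "model": os.getenv("LLM_MODEL", "qwen2.5:7b-instruct-q4_K_M"),
            "ollama_base_url": os.getenv("OLLAMA_BASE_URL", "http://10.0.0.3:11434"),
            "temperature": 0.1,  # explicit tuning knob added in this release
        },
    },
}
```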

Fixed

  • Resolved crash during startup: TypeError: OpenAIConfig.__init__() got an unexpected keyword argument 'ollama_base_url'.
  • Corrected mount type mismatch (file vs directory) causing OCI runtime create failed errors.
  • Prevented duplicate or partial postings when retry logic triggered multiple concurrent requests.
  • “Unknown event” warnings now safely ignored (no longer break seeding loop).
  • Confirmed full dual-provider operation in logs (api.openai.com + 10.0.0.3:11434/api/chat).

Observations

  • Stable GPU utilization: ~8 GB VRAM @ 92 % load, ≈ 67 °C under sustained seeding.
  • Next revision will re-format seed JSON to preserve role context (user vs assistant).

[Lyra-Mem0 0.3.1] - 2025-10-03

Added

  • HuggingFace TEI integration (local 3090 embedder).
  • Dual-mode environment switch between OpenAI cloud and local.
  • CSV export of memories from Postgres (payload->>'data').

Fixed

  • .env CRLF vs LF line ending issues.
  • Local seeding now possible via the running Hugging Face TEI server.

[Lyra-Mem0 0.3.0]

Added

  • Support for Ollama embeddings in Mem0 OSS container:
    • Added ability to configure EMBEDDER_PROVIDER=ollama and set EMBEDDER_MODEL + OLLAMA_HOST via .env.
    • Mounted main.py override from host into container to load custom DEFAULT_CONFIG.
    • Installed ollama Python client into custom API container image.
  • .env.3090 file created for external embedding mode (3090 machine):
  • Workflow to support multiple embedding modes:
    1. Fast LAN-based 3090/Ollama embeddings
    2. Local-only CPU embeddings (Lyra Cortex VM)
    3. OpenAI fallback embeddings

Changed

  • docker-compose.yml updated to mount local main.py and .env.3090.
  • Built custom Dockerfile (mem0-api-server:latest) extending base image with pip install ollama.
  • Updated requirements.txt to include ollama package.
  • Adjusted Mem0 container config so main.py pulls environment variables with dotenv (load_dotenv()).
  • Tested new embeddings path with curl /memories API call.

Fixed

  • Resolved container boot failure caused by missing ollama dependency (ModuleNotFoundError).
  • Fixed config overwrite issue where rebuilding container restored stock main.py.
  • Worked around Neo4j error (vector.similarity.cosine(): mismatched vector dimensions) by confirming OpenAI vs. Ollama embedding vector sizes and planning to standardize at 1536-dim.

[Lyra-Mem0 v0.2.1]

Added

  • Seeding pipeline:
    • Built Python seeder script to bulk-insert raw Cloud Lyra exports into Mem0.
    • Implemented incremental seeding option (skip existing memories, only add new ones).
    • Verified insert process with Postgres-backed history DB and curl /memories/search sanity check.
  • Ollama embedding support in Mem0 OSS container:
    • Added configuration for EMBEDDER_PROVIDER=ollama, EMBEDDER_MODEL, and OLLAMA_HOST via .env.
    • Created .env.3090 profile for LAN-connected 3090 machine with Ollama.
    • Set up three embedding modes:
      1. Fast LAN-based 3090/Ollama
      2. Local-only CPU model (Lyra Cortex VM)
      3. OpenAI fallback

Changed

  • Updated main.py to load configuration from .env using dotenv and support multiple embedder backends.
  • Mounted host main.py into container so local edits persist across rebuilds.
  • Updated docker-compose.yml to mount .env.3090 and support swap between profiles.
  • Built custom Dockerfile (mem0-api-server:latest) including pip install ollama.
  • Updated requirements.txt with ollama dependency.
  • Adjusted startup flow so container automatically connects to external Ollama host (LAN IP).
  • Added logging to confirm model pulls and embedding requests.

Fixed

  • Seeder process originally failed on old memories — now skips duplicates and continues batch.
  • Resolved container boot error (ModuleNotFoundError: ollama) by extending image.
  • Fixed overwrite issue where stock main.py replaced custom config during rebuild.
  • Worked around Neo4j vector.similarity.cosine() dimension mismatch by investigating OpenAI (1536-dim) vs Ollama (1024-dim) schemas.

Notes

  • To fully unify embedding modes, a Hugging Face / local model with 1536-dim embeddings will be needed (to match OpenAI's schema and avoid Neo4j errors).
  • Current Ollama model (mxbai-embed-large) works, but returns 1024-dim vectors.
  • Seeder workflow validated but should be wrapped in a repeatable weekly run for full Cloud→Local sync.

[Lyra-Mem0 v0.2.0] - 2025-09-30

Added

  • Standalone Lyra-Mem0 stack created at ~/lyra-mem0/
    • Includes Postgres (pgvector), Qdrant, Neo4j, and SQLite for history tracking.
    • Added working docker-compose.mem0.yml and custom Dockerfile for building the Mem0 API server.
  • Verified REST API functionality:
    • POST /memories works for adding memories.
    • POST /search works for semantic search.
  • Successful end-to-end test with persisted memory:
    "Likes coffee in the morning" → retrievable via search.

Changed

  • Split architecture into modular stacks:
    • ~/lyra-core (Relay, Persona-Sidecar, etc.)
    • ~/lyra-mem0 (Mem0 OSS memory stack)
  • Removed old embedded mem0 containers from the Lyra-Core compose file.
  • Added Lyra-Mem0 section in README.md.

Next Steps

  • Wire Relay → Mem0 API (integration not yet complete).
  • Add integration tests to verify persistence and retrieval from within Lyra-Core.

🧠 Lyra-Cortex

[Cortex v0.5] - 2025-11-13

Added

  • New reasoning.py module

    • Async reasoning engine.
    • Accepts user prompt, identity, RAG block, and reflection notes.
    • Produces draft internal answers.
    • Uses primary backend (vLLM).
  • New reflection.py module

    • Fully async.
    • Produces actionable JSON “internal notes.”
    • Enforces strict JSON schema and fallback parsing.
    • Forces cloud backend (backend_override="cloud").
  • Integrated refine.py into Cortex reasoning pipeline:

    • New stage between reflection and persona.
    • Runs exclusively on primary vLLM backend (MI50).
    • Produces final, internally consistent output for downstream persona layer.
  • Backend override system

    • Each LLM call can now select its own backend.
    • Enables multi-LLM cognition: Reflection → cloud, Reasoning → primary (see the sketch after this list).
  • identity loader

    • Added identity.py with load_identity() for consistent persona retrieval.
  • ingest_handler

    • Async stub created for future Intake → NeoMem → RAG pipeline.
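
A condensed sketch of the backend-override idea in llm_router.py: each call names a backend, the URL is taken from the environment verbatim (full endpoint, no path appending), OpenAI calls get an Authorization header and a chat payload, and vLLM calls get a completions payload. Function and parameter names approximate, not copy, the actual module:

```python
# Sketch: per-call backend selection with env-provided full URLs.
import os
import httpx

BACKEND_URLS = {
    "primary": os.getenv("LLM_PRIMARY_URL"),  # full vLLM /v1/completions endpoint
    "cloud":   os.getenv("LLM_CLOUD_URL"),    # full OpenAI /v1/chat/completions endpoint
}

async def call_llm(prompt: str, backend_override: str = "primary") -> str:
    url = BACKEND_URLS[backend_override]  # used as-is, no path appending
    headers = {}
    if backend_override == "cloud":
        headers["Authorization"] = f"Bearer {os.environ['OPENAI_API_KEY']}"
        payload = {"model": os.getenv("LLM_CLOUD_MODEL", "gpt-4o-mini"),
                   "messages": [{"role": "user", "content": prompt}]}
    else:
        payload = {"model": os.getenv("LLM_PRIMARY_MODEL", ""),
                   "prompt": prompt, "max_tokens": 512}
    async with httpx.AsyncClient(timeout=120) as client:
        resp = await client.post(url, json=payload, headers=headers)
        resp.raise_for_status()
        data = resp.json()
    # OpenAI chat vs. vLLM completions return different shapes.
    choice = data["choices"][0]
    return choice.get("message", {}).get("content") or choice.get("text", "")
```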

Changed

  • Unified LLM backend URL handling across Cortex:

    • ENV variables must now contain FULL API endpoints.
    • Removed all internal path-appending (e.g. .../v1/completions).
    • llm_router.py rewritten to use env-provided URLs as-is.
    • Ensures consistent behavior between draft, reflection, refine, and persona.
  • Rebuilt main.py

    • Removed old annotation/analysis logic.
    • New structure: load identity → get RAG → reflect → reason → return draft+notes.
    • Routes now clean and minimal (/reason, /ingest, /health).
    • Async path throughout Cortex.
  • Refactored llm_router.py

    • Removed old fallback logic during overrides.
    • OpenAI requests now use /v1/chat/completions.
    • Added proper OpenAI Authorization headers.
    • Distinct payload format for vLLM vs OpenAI.
    • Unified, correct parsing across models.
  • Simplified Cortex architecture

    • Removed deprecated “context.py” and old reasoning code.
    • Relay completely decoupled from smart behavior.
  • Updated environment specification:

    • LLM_PRIMARY_URL now set to http://10.0.0.43:8000/v1/completions.
    • LLM_SECONDARY_URL remains http://10.0.0.3:11434/api/generate (Ollama).
    • LLM_CLOUD_URL set to https://api.openai.com/v1/chat/completions.

Fixed

  • Resolved endpoint conflict where:
    • Router expected base URLs.
    • Refine expected full URLs.
    • Refine always fell back due to hitting incorrect endpoint.
    • Fixed by standardizing full-URL behavior across entire system.
  • Reflection layer no longer fails silently (previously returned [""] due to MythoMax).
  • Resolved 404/401 errors caused by incorrect OpenAI URL endpoints.
  • No more double-routing through vLLM during reflection.
  • Corrected async/sync mismatch in multiple locations.
  • Eliminated double-path bug (/v1/completions/v1/completions) caused by previous router logic.

Removed

  • Legacy annotate, reason_check glue logic from old architecture.
  • Old backend probing junk code.
  • Stale imports and unused modules leftover from previous prototype.

Verified

  • Cortex → vLLM (MI50) → refine → final_output now functioning correctly.
  • refine shows used_primary_backend: true and no fallback.
  • Manual curl test confirms endpoint accuracy.

Known Issues

  • refine sometimes prefixes output with "Final Answer:"; next version will sanitize this.
  • hallucinations in draft_output persist due to weak grounding (fix in reasoning + RAG planned).

Pending / Known Issues

  • RAG service does not exist — requires containerized FastAPI service.
  • Reasoning layer lacks self-revision loop (deliberate thought cycle).
  • No speak/persona generation layer yet (speak.py planned).
  • Intake summaries not yet routing into RAG or reflection layer.
  • No refinement engine between reasoning and speak.

Notes

This is the largest structural change to Cortex so far.
It establishes:

  • multi-model cognition
  • clean layering
  • identity + reflection separation
  • correct async code
  • deterministic backend routing
  • predictable JSON reflection

The system is now ready for:

  • refinement loops
  • persona-speaking layer
  • containerized RAG
  • long-term memory integration
  • true emergent-behavior experiments

[Cortex v0.4.1] - 2025-11-05

Added

  • RAG integration (see the sketch after this list)
    • Added rag.py with query_rag() and format_rag_block().
    • Cortex now queries the local RAG API (http://10.0.0.41:7090/rag/search) for contextual augmentation.
    • Synthesized answers and top excerpts are injected into the reasoning prompt.
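
A rough sketch of the two rag.py helpers named above. The /rag/search request and response field names ("query", "top_k", "answer", "results", "text") are assumptions; only the URL and the two function names come from this entry:

```python
# Sketch: query the RAG service and build the [RAG] block for the reasoning prompt.
import httpx

RAG_URL = "http://10.0.0.41:7090/rag/search"

async def query_rag(query: str, top_k: int = 5) -> dict:
    async with httpx.AsyncClient(timeout=60) as client:
        resp = await client.post(RAG_URL, json={"query": query, "top_k": top_k})
        resp.raise_for_status()
        return resp.json()

def format_rag_block(rag: dict) -> str:
    lines = [f"[RAG] {rag.get('answer', '')}"]
    lines += [f"- {hit.get('text', '')}" for hit in rag.get("results", [])]
    return "\n".join(lines)
```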

Changed

  • Revised /reason endpoint.

    • Now builds unified context blocks:
      • [Intake] → recent summaries
      • [RAG] → contextual knowledge
      • [User Message] → current input
    • Calls call_llm() for the first pass, then reflection_loop() for meta-evaluation.
    • Returns cortex_prompt, draft_output, final_output, and normalized reflection.
  • Reflection Pipeline Stability

    • Cleaned parsing to normalize JSON vs. text reflections.
    • Added fallback handling for malformed or non-JSON outputs.
    • Log system improved to show raw JSON, extracted fields, and normalized summary.
  • Async Summarization (Intake v0.2.1)

    • Intake summaries now run in background threads to avoid blocking Cortex.
    • Summaries (L1 → L∞) logged asynchronously with [BG] tags.
  • Environment & Networking Fixes

    • Verified .env variables propagate correctly inside the Cortex container.
    • Confirmed Docker network connectivity between Cortex, Intake, NeoMem, and RAG (shared serversdown_lyra_net network).
    • Adjusted localhost calls to service-IP mapping (10.0.0.41 for Cortex host).
  • Behavioral Updates

    • Cortex now performs conversation reflection (on user intent) and self-reflection (on its own answers).
    • RAG context successfully grounds reasoning outputs.
    • Intake and NeoMem confirmed receiving summaries via /add_exchange.
    • Log clarity pass: all reflective and contextual blocks clearly labeled.
  • Known Gaps / Next Steps

    • NeoMem Tuning
    • Improve retrieval latency and relevance.
    • Implement a dedicated /reflections/recent endpoint for Cortex.
    • Migrate to Cortex-first ingestion (Relay → Cortex → NeoMem).
  • Cortex Enhancements

    • Add persistent reflection recall (use prior reflections as meta-context).
    • Improve reflection JSON structure ("insight", "evaluation", "next_action" → guaranteed fields).
    • Tighten temperature and prompt control for factual consistency.
  • RAG Optimization

    • Add source ranking, filtering, and multi-vector hybrid search.
    • Cache RAG responses per session to reduce duplicate calls.
  • Documentation / Monitoring

    • Add health route for RAG and Intake summaries.
    • Include internal latency metrics in /health endpoint.
    • Consolidate logs into a unified “Lyra Cortex Console” for tracing all module calls.

[Cortex - v0.3.0] 2025-10-31

Added

  • Cortex Service (FastAPI)

    • New standalone reasoning engine (cortex/main.py) with endpoints:
      • GET /health reports active backend + NeoMem status.
      • POST /reason evaluates {prompt, response} pairs.
      • POST /annotate experimental text analysis.
    • Background NeoMem health monitor (5-minute interval).
  • Multi-Backend Reasoning Support

    • Added environment-driven backend selection via LLM_FORCE_BACKEND.
    • Supports:
      • Primary → vLLM (MI50 node @ 10.0.0.43)
      • Secondary → Ollama (3090 node @ 10.0.0.3)
      • Cloud → OpenAI API
      • Fallback → llama.cpp (CPU)
    • Introduced per-backend model variables:
      LLM_PRIMARY_MODEL, LLM_SECONDARY_MODEL, LLM_CLOUD_MODEL, LLM_FALLBACK_MODEL.
  • Response Normalization Layer

    • Implemented normalize_llm_response() to merge streamed outputs and repair malformed JSON (see the sketch after this list).
    • Handles Ollama's multi-line streaming and MythoMax's missing punctuation issues.
    • Prints concise debug previews of merged content.
  • Environment Simplification

    • Each service (intake, cortex, neomem) now maintains its own .env file.
    • Removed reliance on shared/global env file to prevent cross-contamination.
    • Verified Docker Compose networking across containers.
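
A sketch of the normalization idea behind normalize_llm_response(): Ollama's /api/generate streams newline-delimited JSON objects, each carrying a "response" fragment, so the fragments are joined into one string. The real function also repairs malformed JSON; that part is omitted here:

```python
# Sketch: merge Ollama's streamed newline-delimited JSON into one response string.
import json

def normalize_llm_response(raw: str) -> str:
    pieces = []
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            pieces.append(line)  # keep unparseable fragments as plain text
            continue
        pieces.append(obj.get("response", ""))
    return "".join(pieces).strip()
```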

Changed

  • Refactored reason_check() to dynamically switch between prompt and chat mode depending on backend.
  • Enhanced startup logs to announce active backend, model, URL, and mode.
  • Improved error handling with clearer “Reasoning error” messages.

Fixed

  • Corrected broken vLLM endpoint routing (/v1/completions).
  • Stabilized cross-container health reporting for NeoMem.
  • Resolved JSON parse failures caused by streaming chunk delimiters.

Next Planned [v0.4.0]

Planned Additions

  • Reflection Mode

    • Introduce REASONING_MODE=factcheck|reflection.
    • Output schema:
      { "insight": "...", "evaluation": "...", "next_action": "..." }
      
  • Cortex-First Pipeline

    • UI → Cortex → [Reflection + Verifier + Memory] → Speech LLM → User.
    • Allows Lyra to “think before speaking.”
  • Verifier Stub

    • New /verify endpoint for search-based factual grounding.
    • Asynchronous external truth checking.
  • Memory Integration

    • Feed reflective outputs into NeoMem.
    • Enable “dream” cycles for autonomous self-review.

Status: 🟢 Stable core; multi-backend reasoning operational.
Next milestone: v0.4.0 — Reflection Mode + Thought Pipeline orchestration.


[Intake] v0.1.0 - 2025-10-27

  • Receives messages from Relay and summarizes them in a cascading format (see the sketch below).
  • Continues to summarize smaller batches of exchanges while also generating large-scale conversational summaries (L20).
  • Currently logs summaries to a .log file in /project-lyra/intake-logs/.
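
A very rough sketch of the cascading idea, under stated assumptions: the small-batch cadence, class and method names, and the summarize() call are placeholders; only the L20 large-scale summary and the log directory come from this entry:

```python
# Sketch: frequent small-batch summaries plus a larger L20 conversation summary.
from pathlib import Path

LOG_DIR = Path("/project-lyra/intake-logs")

def summarize(exchanges: list[str], level: str) -> str:
    # Placeholder for the actual LLM summarization call.
    return f"[{level}] summary of {len(exchanges)} exchanges"

class CascadingIntake:
    def __init__(self) -> None:
        self.buffer: list[str] = []

    def add_exchange(self, exchange: str) -> None:
        self.buffer.append(exchange)
        if len(self.buffer) % 5 == 0:        # small-batch summary (assumed cadence)
            self._log(summarize(self.buffer[-5:], "L5"))
        if len(self.buffer) % 20 == 0:       # large-scale conversational summary
            self._log(summarize(self.buffer[-20:], "L20"))

    def _log(self, text: str) -> None:
        LOG_DIR.mkdir(parents=True, exist_ok=True)
        with (LOG_DIR / "intake.log").open("a") as fh:
            fh.write(text + "\n")
```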

**Next Steps**

  • Feed Intake into NeoMem.
  • Generate a daily/hourly/etc. overall summary (e.g., "Today Brian and Lyra worked on x, y, and z").
  • Generate session-aware summaries, each with its own intake hopper.

[Lyra-Cortex] v0.2.0 — 2025-09-26

Added

  • Integrated llama-server on dedicated Cortex VM (Proxmox).
  • Verified Phi-3.5-mini-instruct_Uncensored-Q4_K_M running with 8 vCPUs.
  • Benchmarked Phi-3.5-mini performance:
    • ~18 tokens/sec CPU-only on Ryzen 7 7800X.
    • Salience classification functional but sometimes inconsistent ("sali", "fi", "jamming").
  • Tested Qwen2-0.5B-Instruct GGUF as alternative salience classifier:
    • Much faster throughput (~350 tokens/sec prompt, ~100 tokens/sec eval).
    • More responsive but over-classifies messages as “salient.”
  • Established .env integration for model ID (SALIENCE_MODEL), enabling hot-swap between models.

Known Issues

  • Small models tend to drift or over-classify.
  • CPU-only 7B+ models expected to be slow; GPU passthrough recommended for larger models.
  • Need to set up a systemd service for llama-server to auto-start on VM reboot.

[Lyra-Cortex] v0.1.0 — 2025-09-25

Added

  • First deployment as dedicated Proxmox VM (5 vCPU / 18 GB RAM / 100 GB SSD).
  • Built llama.cpp with llama-server target via CMake.
  • Integrated Phi-3.5 Mini Instruct (Uncensored, Q4_K_M GGUF) model.
  • Verified API compatibility at /v1/chat/completions.
  • Local test successful via curl → ~523 token response generated.
  • Performance benchmark: ~11.5 tokens/sec (CPU-only on Ryzen 7800X).
  • Confirmed usable for salience scoring, summarization, and lightweight reasoning.