# Project Lyra — Modular Changelog

All notable changes to Project Lyra are organized by component. The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/) and adheres to [Semantic Versioning](https://semver.org/).

**Last Updated:** 2025-11-26

---

## 🧠 Lyra-Core
##############################################################################

## [Infrastructure v1.0.0] - 2025-11-26

### Changed
- **Environment Variable Consolidation** - Major reorganization to eliminate duplication and improve maintainability
  - Consolidated 9 scattered `.env` files into a single-source-of-truth architecture
  - Root `.env` now contains all shared infrastructure (LLM backends, databases, API keys, service URLs)
  - Service-specific `.env` files minimized to only essential overrides:
    - `cortex/.env`: Reduced from 42 to 22 lines (operational parameters only)
    - `neomem/.env`: Reduced from 26 to 14 lines (LLM naming conventions only)
    - `intake/.env`: Kept at 8 lines (already minimal)
  - **Result**: ~24% reduction in total configuration lines (197 → ~150)
- **Docker Compose Consolidation**
  - All services now defined in the single root `docker-compose.yml`
  - Relay service updated with complete configuration (env_file, volumes)
  - Removed redundant `core/docker-compose.yml` (marked as DEPRECATED)
  - Standardized network communication to use Docker container names
- **Service URL Standardization**
  - Internal services use container names: `http://neomem-api:7077`, `http://cortex:7081`
  - External services use IP addresses: `http://10.0.0.43:8000` (vLLM), `http://10.0.0.3:11434` (Ollama)
  - Removed IP/container-name inconsistencies across files

### Added
- **Security Templates**
  - Created `.env.example` files for all services
  - Root `.env.example` with sanitized credentials
  - Service-specific templates: `cortex/.env.example`, `neomem/.env.example`, `intake/.env.example`, `rag/.env.example`
  - All `.env.example` files safe to commit to version control
- **Documentation**
  - `ENVIRONMENT_VARIABLES.md`: Comprehensive reference for all environment variables
    - Variable descriptions, defaults, and usage examples
    - Multi-backend LLM strategy documentation
    - Troubleshooting guide
    - Security best practices
  - `DEPRECATED_FILES.md`: Deletion guide for deprecated files with verification steps
- **Enhanced .gitignore**
  - Ignores all `.env` files (including subdirectories)
  - Tracks `.env.example` templates for documentation
  - Ignores `.env-backups/` directory

### Removed
- `core/.env` - Redundant with root `.env`, now deleted
- `core/docker-compose.yml` - Consolidated into the main compose file (marked DEPRECATED)

### Fixed
- Eliminated duplicate `OPENAI_API_KEY` across 5+ files
- Eliminated duplicate LLM backend URLs across 4+ files
- Eliminated duplicate database credentials across 3+ files
- Resolved Cortex `environment:` section override in docker-compose (now uses env_file)

### Architecture
- **Multi-Backend LLM Strategy**: Root `.env` provides all backend OPTIONS (PRIMARY, SECONDARY, CLOUD, FALLBACK); services choose which to USE (see the sketch below)
  - Cortex → vLLM (PRIMARY) for autonomous reasoning
  - NeoMem → Ollama (SECONDARY) + OpenAI embeddings
  - Intake → vLLM (PRIMARY) for summarization
  - Relay → Fallback chain with user preference
  - Preserves per-service flexibility while eliminating URL duplication

### Migration
- All original `.env` files backed up to `.env-backups/` with timestamp `20251126_025334`
- Rollback plan documented in `ENVIRONMENT_VARIABLES.md`
- Verification steps provided in `DEPRECATED_FILES.md`
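For illustration, the layered configuration described above might look roughly like this. Key names are taken from elsewhere in this changelog where possible; the exact contents of the real files may differ.

```
# root .env — shared backend OPTIONS (illustrative names and values)
LLM_PRIMARY_URL=http://10.0.0.43:8000        # vLLM (MI50)
LLM_SECONDARY_URL=http://10.0.0.3:11434      # Ollama (3090)
LLM_CLOUD_URL=https://api.openai.com         # OpenAI
OPENAI_API_KEY=<set-once-here>               # defined once, shared by every service

# cortex/.env — minimal per-service override: which backend to USE
LLM_FORCE_BACKEND=primary
```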
---

## [Lyra_RAG v0.1.0] - 2025-11-07

### Added
- Initial standalone RAG module for Project Lyra.
- Persistent ChromaDB vector store (`./chromadb`).
- Importer `rag_chat_import.py` with:
  - Recursive folder scanning and category tagging.
  - Smart chunking (~5k chars).
  - SHA-1 deduplication and chat-ID metadata.
  - Timestamp fields (`file_modified`, `imported_at`).
  - Background-safe operation (`nohup`/`tmux`).
- 68 Lyra-category chats imported:
  - **6,556 new chunks added**
  - **1,493 duplicates skipped**
  - **7,997 total vectors** now stored.

### API
- `/rag/search` FastAPI endpoint implemented (port 7090).
- Supports natural-language queries and returns the top related excerpts.
- Added an answer synthesis step using `gpt-4o-mini`.

### Verified
- Successful recall of Lyra-Core development history (v0.3.0 snapshot).
- Correct metadata and category tagging for all new imports.

### Next Planned
- Optional `where` filter parameter for category/date queries.
- Graceful "no results" handler for empty retrievals.
- `rag_docs_import.py` for PDFs and other document types.

## [Lyra Core v0.3.2 + Web UI v0.2.0] - 2025-10-28

### Added
- **New UI**
  - Cleaned up UI look and feel.
- **Added "sessions"**
  - Sessions now persist over time.
  - Ability to create new sessions or load sessions from a previous instance.
  - Changing the session updates what the prompt sends to Relay (it no longer includes messages from other sessions).
  - Relay is correctly wired in.

## [Lyra-Core 0.3.1] - 2025-10-09

### Added
- **NVGRAM Integration (Full Pipeline Reconnected)**
  - Replaced the legacy Mem0 service with the NVGRAM microservice (`nvgram-api` @ port 7077).
  - Updated `server.js` in Relay to route all memory ops via `${NVGRAM_API}/memories` and `/search`.
  - Added `.env` variable:
    ```
    NVGRAM_API=http://nvgram-api:7077
    ```
  - Verified end-to-end Lyra conversation persistence:
    - `relay → nvgram-api → postgres/neo4j → relay → ollama → ui`
    - ✅ Memories stored, retrieved, and re-injected successfully.

### Changed
- Renamed `MEM0_URL` → `NVGRAM_API` across all relay environment configs.
- Updated Docker Compose service dependency order:
  - `relay` now depends on the `nvgram-api` healthcheck.
  - Removed `mem0` references and volumes.
- Minor cleanup to the Persona fetch block (null-checks and a safer default persona string).

### Fixed
- Relay startup no longer crashes when NVGRAM is unavailable — deferred connection handling.
- `/memories` POST failures no longer crash Relay; now logged gracefully as `relay error Error: memAdd failed: 500`.
- Improved injected-prompt debugging (`DEBUG_PROMPT=true` now prints clean JSON).

### Goals / Next Steps
- Add salience visualization (e.g., memory weights displayed in the injected system message).
- Begin schema alignment with NVGRAM v0.1.2 for confidence scoring.
- Add relay auto-retry for transient 500 responses from NVGRAM.

---

## [Lyra-Core] v0.3.1 - 2025-09-27

### Changed
- Removed salience filter logic; Cortex is now the default annotator.
- All user messages stored in Mem0; no discard tier applied.

### Added
- Cortex annotations (`metadata.cortex`) now attached to memories.
- Debug logging improvements:
  - Pretty-print Cortex annotations
  - Injected prompt preview
  - Memory search hit list with scores
- `.env` toggle (`CORTEX_ENABLED`) to bypass Cortex when needed.

### Fixed
- Parsing failures from Markdown-wrapped Cortex JSON via a fence cleaner (see the sketch below).
- Relay no longer "hangs" on malformed Cortex outputs.
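A minimal sketch of the fence-cleaning idea (the real cleaner lives in Relay's `server.js`; this is a language-agnostic illustration in Python with a hypothetical helper name):

```python
import json
import re

def clean_json_fences(text: str) -> dict:
    """Strip Markdown code fences (``` or ```json) before parsing model output."""
    stripped = re.sub(r"^```[a-zA-Z]*\s*|\s*```$", "", text.strip())
    return json.loads(stripped)

# Example: a Cortex annotation wrapped in fences still parses cleanly.
raw = '```json\n{"salient": true, "tags": ["memory"]}\n```'
print(clean_json_fences(raw))  # {'salient': True, 'tags': ['memory']}
```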
---

### [Lyra-Core] v0.3.0 — 2025-09-26

#### Added
- Implemented **salience filtering** in Relay:
  - `.env` configurable: `SALIENCE_ENABLED`, `SALIENCE_MODE`, `SALIENCE_MODEL`, `SALIENCE_API_URL`.
  - Supports `heuristic` and `llm` classification modes.
- LLM-based salience filter integrated with the Cortex VM running `llama-server`.
- Logging improvements:
  - Added debug logs for salience mode, raw LLM output, and unexpected outputs.
  - Fail-closed behavior for unexpected LLM responses.
- Successfully tested with **Phi-3.5-mini** and **Qwen2-0.5B-Instruct** as salience classifiers.
- Verified end-to-end flow: Relay → salience filter → Mem0 add/search → Persona injection → LLM reply.

#### Changed
- Refactored `server.js` to gate `mem.add()` calls behind the salience filter.
- Updated `.env` to support `SALIENCE_MODEL`.

#### Known Issues
- Small models (e.g. Qwen2-0.5B) tend to over-classify as "salient".
- Phi-3.5-mini sometimes returns truncated tokens ("sali", "fi").
- CPU-only inference is functional but limited; larger models recommended once a GPU is available.

---

### [Lyra-Core] v0.2.0 — 2025-09-24

#### Added
- Migrated Relay to use the `mem0ai` SDK instead of raw fetch calls.
- Implemented `sessionId` support (client-supplied, fallback to `default`).
- Added debug logs for memory add/search.
- Cleaned up Relay structure for clarity.

---

### [Lyra-Core] v0.1.0 — 2025-09-23

#### Added
- First working MVP of **Lyra Core Relay**.
- Relay service accepts `POST /v1/chat/completions` (OpenAI-compatible).
- Memory integration with Mem0:
  - `POST /memories` on each user message.
  - `POST /search` before the LLM call.
- Persona Sidecar integration (`GET /current`).
- OpenAI GPT + Ollama (Mythomax) support in Relay.
- Simple browser-based chat UI (talks to Relay at `http://:7078`).
- `.env` standardization for Relay + Mem0 + Postgres + Neo4j.
- Working Neo4j + Postgres backing stores for Mem0.
- Initial MVP relay service with raw fetch calls to Mem0.
- Dockerized with basic healthcheck.

#### Fixed
- Resolved crash loop in Neo4j by restricting env vars (`NEO4J_AUTH` only).
- Relay now correctly reads `MEM0_URL` and `MEM0_API_KEY` from `.env`.

#### Known Issues
- No feedback loop (thumbs up/down) yet.
- Forget/delete flow is manual (via memory IDs).
- Memory latency ~1–4 s depending on the embedding model.

---

## 🧩 lyra-neomem (used to be NVGRAM / Lyra-Mem0)
##############################################################################

## [NeoMem 0.1.2] - 2025-10-27

### Changed
- **Renamed NVGRAM to NeoMem**
  - All future updates will be under the name NeoMem.
  - Features have not changed.

## [NVGRAM 0.1.1] - 2025-10-08

### Added
- **Async Memory Rewrite (Stability + Safety Patch)**
  - Introduced `AsyncMemory` class with fully asynchronous vector and graph store writes.
  - Added **input sanitation** to prevent embedding errors (`'list' object has no attribute 'replace'`).
  - Implemented `flatten_messages()` helper in the API layer to clean malformed payloads.
  - Added structured request logging via `RequestLoggingMiddleware` (FastAPI middleware).
  - Health endpoint (`/health`) now returns structured JSON `{status, version, service}`.
  - Startup logs now include **sanitized embedder config** with API keys masked for safety:
    ```
    >>> Embedder config (sanitized): {'provider': 'openai', 'config': {'model': 'text-embedding-3-small', 'api_key': '***'}}
    ✅ Connected to Neo4j on attempt 1
    🧠 NVGRAM v0.1.1 — Neural Vectorized Graph Recall and Memory initialized
    ```

### Changed
- Replaced synchronous `Memory.add()` with an async-safe version supporting concurrent vector + graph writes.
- Normalized indentation and cleaned duplicate `main.py` references under `/nvgram/` vs `/nvgram/server/`.
- Removed redundant `FastAPI()` app reinitialization.
- Updated internal logging to an INFO-level timing format: `2025-10-08 21:48:45 [INFO] POST /memories -> 200 (11189.1 ms)`
- Deprecated `@app.on_event("startup")` (FastAPI deprecation warning) → will migrate to a `lifespan` handler in v0.1.2.

### Fixed
- Eliminated repeating 500 error from the OpenAI embedder caused by non-string message content.
- Masked API key leaks from boot logs.
- Ensured Neo4j reconnects gracefully on first retry.

### Goals / Next Steps
- Integrate **salience scoring** and **embedding confidence weight** fields in the Postgres schema.
- Begin testing with the full Lyra Relay + Persona Sidecar pipeline for live session memory recall.
- Migrate from the deprecated `on_event` → `lifespan` pattern in 0.1.2.

---

## [NVGRAM 0.1.0] - 2025-10-07

### Added
- **Initial fork of Mem0 → NVGRAM**:
  - Created a fully independent local-first memory engine based on Mem0 OSS.
  - Renamed all internal modules, Docker services, and environment variables from `mem0` → `nvgram`.
  - New service name: **`nvgram-api`**, default port **7077**.
  - Maintains the same API endpoints (`/memories`, `/search`) for drop-in compatibility with Lyra Core.
  - Uses **FastAPI**, **Postgres**, and **Neo4j** as persistent backends.
- Verified clean startup:
  ```
  ✅ Connected to Neo4j on attempt 1
  INFO: Uvicorn running on http://0.0.0.0:7077
  ```
- `/docs` and `/openapi.json` confirmed reachable and functional.

### Changed
- Removed dependency on the external `mem0ai` SDK — all logic now local.
- Re-pinned requirements:
  - fastapi==0.115.8
  - uvicorn==0.34.0
  - pydantic==2.10.4
  - python-dotenv==1.0.1
  - psycopg>=3.2.8
  - ollama
- Adjusted `docker-compose` and `.env` templates to use the new NVGRAM naming and image paths.

### Goals / Next Steps
- Integrate NVGRAM as the new default backend in Lyra Relay.
- Deprecate remaining Mem0 references and archive old configs.
- Begin versioning as a standalone project (`nvgram-core`, `nvgram-api`, etc.).

---

## [Lyra-Mem0 0.3.2] - 2025-10-05

### Added
- Support for **Ollama LLM reasoning** alongside OpenAI embeddings:
  - Introduced `LLM_PROVIDER=ollama`, `LLM_MODEL`, and `OLLAMA_HOST` in `.env.3090`.
  - Verified local 3090 setup using `qwen2.5:7b-instruct-q4_K_M`.
  - Split processing pipeline:
    - Embeddings → OpenAI `text-embedding-3-small`
    - LLM → local Ollama (`http://10.0.0.3:11434/api/chat`).
- Added `.env.3090` template for self-hosted inference nodes.
- Integrated runtime diagnostics and seeder progress tracking:
  - File-level + message-level progress bars.
  - Retry/back-off logic for timeouts (3 attempts).
  - Event logging (`ADD / UPDATE / NONE`) for every memory record.
- Expanded Docker health checks for Postgres, Qdrant, and Neo4j containers.
- Added GPU-friendly long-run configuration for continuous seeding (validated on an RTX 3090).

### Changed
- Updated the `main.py` configuration block to load:
  - `LLM_PROVIDER`, `LLM_MODEL`, and `OLLAMA_BASE_URL`.
  - Fallback to OpenAI if Ollama is unavailable.
- Adjusted `docker-compose.yml` mount paths to correctly map `/app/main.py`.
- Normalized `.env` loading so `mem0-api` and the host environment share identical values.
- Improved seeder logging and progress telemetry for clearer diagnostics.
- Added an explicit `temperature` field to `DEFAULT_CONFIG['llm']['config']` for tuning future local inference runs.

### Fixed
- Resolved crash during startup: `TypeError: OpenAIConfig.__init__() got an unexpected keyword argument 'ollama_base_url'`.
- Corrected mount type mismatch (file vs directory) causing `OCI runtime create failed` errors.
- Prevented duplicate or partial postings when retry logic triggered multiple concurrent requests.
- "Unknown event" warnings now safely ignored (no longer break the seeding loop).
- Confirmed full dual-provider operation in logs (`api.openai.com` + `10.0.0.3:11434/api/chat`).

### Observations
- Stable GPU utilization: ~8 GB VRAM @ 92% load, ≈67 °C under sustained seeding.
- The next revision will re-format seed JSON to preserve `role` context (user vs assistant).

---

## [Lyra-Mem0 0.3.1] - 2025-10-03

### Added
- HuggingFace TEI integration (local 3090 embedder).
- Dual-mode environment switch between OpenAI cloud and local.
- CSV export of memories from Postgres (`payload->>'data'`).

### Fixed
- `.env` CRLF vs LF line-ending issues.
- Local seeding now possible via the running HuggingFace server.

---

## [Lyra-Mem0 0.3.0]

### Added
- Support for **Ollama embeddings** in the Mem0 OSS container:
  - Added the ability to configure `EMBEDDER_PROVIDER=ollama` and set `EMBEDDER_MODEL` + `OLLAMA_HOST` via `.env`.
  - Mounted a `main.py` override from host into container to load a custom `DEFAULT_CONFIG`.
  - Installed the `ollama` Python client into the custom API container image.
- `.env.3090` file created for external embedding mode (3090 machine):
  - `EMBEDDER_PROVIDER=ollama`
  - `EMBEDDER_MODEL=mxbai-embed-large`
  - `OLLAMA_HOST=http://10.0.0.3:11434`
- Workflow to support **multiple embedding modes**:
  1. Fast LAN-based 3090/Ollama embeddings
  2. Local-only CPU embeddings (Lyra Cortex VM)
  3. OpenAI fallback embeddings

### Changed
- `docker-compose.yml` updated to mount the local `main.py` and `.env.3090`.
- Built a **custom Dockerfile** (`mem0-api-server:latest`) extending the base image with `pip install ollama`.
- Updated `requirements.txt` to include the `ollama` package.
- Adjusted the Mem0 container config so `main.py` pulls environment variables with `dotenv` (`load_dotenv()`).
- Tested the new embeddings path with a curl `/memories` API call.

### Fixed
- Resolved container boot failure caused by the missing `ollama` dependency (`ModuleNotFoundError`).
- Fixed config overwrite issue where rebuilding the container restored the stock `main.py`.
- Worked around a Neo4j error (`vector.similarity.cosine(): mismatched vector dimensions`) by confirming OpenAI vs. Ollama embedding vector sizes and planning to standardize on 1536 dimensions.

---

## [Lyra-Mem0 v0.2.1]

### Added
- **Seeding pipeline** (see the sketch below):
  - Built a Python seeder script to bulk-insert raw Cloud Lyra exports into Mem0.
  - Implemented an incremental seeding option (skip existing memories, only add new ones).
  - Verified the insert process with the Postgres-backed history DB and a curl `/memories/search` sanity check.
- **Ollama embedding support** in the Mem0 OSS container:
  - Added configuration for `EMBEDDER_PROVIDER=ollama`, `EMBEDDER_MODEL`, and `OLLAMA_HOST` via `.env`.
  - Created a `.env.3090` profile for the LAN-connected 3090 machine with Ollama.
  - Set up three embedding modes:
    1. Fast LAN-based 3090/Ollama
    2. Local-only CPU model (Lyra Cortex VM)
    3. OpenAI fallback
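A rough illustration of the incremental seeding flow referenced above. The `/memories` payload shape, user id, and file format here are assumptions for the sketch, not the actual seeder script.

```python
import hashlib
import json
import requests

MEM0_API = "http://localhost:7077"  # assumed local port for the Mem0-style API

def seed_export(path: str, seen_hashes: set) -> None:
    """Bulk-insert an exported chat file, skipping messages that were already seeded."""
    with open(path, encoding="utf-8") as f:
        messages = json.load(f)  # assumed: a list of {"role": ..., "content": ...} dicts
    for msg in messages:
        digest = hashlib.sha1(msg["content"].encode("utf-8")).hexdigest()
        if digest in seen_hashes:
            continue  # incremental mode: only add new memories
        resp = requests.post(
            f"{MEM0_API}/memories",
            json={"messages": [msg], "user_id": "cloud-lyra-export"},  # placeholder payload shape
            timeout=120,
        )
        resp.raise_for_status()
        seen_hashes.add(digest)
```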
### Changed
- Updated `main.py` to load configuration from `.env` using `dotenv` and support multiple embedder backends.
- Mounted the host `main.py` into the container so local edits persist across rebuilds.
- Updated `docker-compose.yml` to mount `.env.3090` and support swapping between profiles.
- Built a **custom Dockerfile** (`mem0-api-server:latest`) including `pip install ollama`.
- Updated `requirements.txt` with the `ollama` dependency.
- Adjusted the startup flow so the container automatically connects to the external Ollama host (LAN IP).
- Added logging to confirm model pulls and embedding requests.

### Fixed
- The seeder process originally failed on old memories — it now skips duplicates and continues the batch.
- Resolved container boot error (`ModuleNotFoundError: ollama`) by extending the image.
- Fixed an overwrite issue where the stock `main.py` replaced the custom config during rebuild.
- Worked around the Neo4j `vector.similarity.cosine()` dimension mismatch by investigating OpenAI (1536-dim) vs Ollama (1024-dim) schemas.

### Notes
- To fully unify embedding modes, a Hugging Face / local model with **1536-dim embeddings** will be needed (to match OpenAI's schema and avoid Neo4j errors).
- The current Ollama model (`mxbai-embed-large`) works, but returns 1024-dim vectors.
- The seeder workflow is validated but should be wrapped in a repeatable weekly run for full Cloud→Local sync.

---

## [Lyra-Mem0 v0.2.0] - 2025-09-30

### Added
- Standalone **Lyra-Mem0** stack created at `~/lyra-mem0/`
  - Includes **Postgres (pgvector)**, **Qdrant**, **Neo4j**, and **SQLite** for history tracking.
- Added a working `docker-compose.mem0.yml` and custom `Dockerfile` for building the Mem0 API server.
- Verified REST API functionality:
  - `POST /memories` works for adding memories.
  - `POST /search` works for semantic search.
- Successful end-to-end test with persisted memory: *"Likes coffee in the morning"* → retrievable via search. ✅

### Changed
- Split architecture into **modular stacks**:
  - `~/lyra-core` (Relay, Persona-Sidecar, etc.)
  - `~/lyra-mem0` (Mem0 OSS memory stack)
- Removed the old embedded mem0 containers from the Lyra-Core compose file.
- Added a Lyra-Mem0 section to README.md.

### Next Steps
- Wire **Relay → Mem0 API** (integration not yet complete).
- Add integration tests to verify persistence and retrieval from within Lyra-Core.

---

## 🧠 Lyra-Cortex
##############################################################################

## [Cortex v0.5] - 2025-11-13

### Added
- **New `reasoning.py` module**
  - Async reasoning engine.
  - Accepts the user prompt, identity, RAG block, and reflection notes.
  - Produces draft internal answers.
  - Uses the primary backend (vLLM).
- **New `reflection.py` module**
  - Fully async.
  - Produces actionable JSON "internal notes."
  - Enforces a strict JSON schema with fallback parsing.
  - Forces the cloud backend (`backend_override="cloud"`).
- Integrated `refine.py` into the Cortex reasoning pipeline:
  - New stage between reflection and persona.
  - Runs exclusively on the primary vLLM backend (MI50).
  - Produces the final, internally consistent output for the downstream persona layer.
- **Backend override system**
  - Each LLM call can now select its own backend.
  - Enables multi-LLM cognition: Reflection → cloud, Reasoning → primary.
- **Identity loader**
  - Added `identity.py` with `load_identity()` for consistent persona retrieval.
- **Ingest handler**
  - Async stub created for the future Intake → NeoMem → RAG pipeline.

### Changed
- Unified LLM backend URL handling across Cortex:
  - ENV variables must now contain FULL API endpoints.
  - Removed all internal path-appending (e.g. `.../v1/completions`).
  - `llm_router.py` rewritten to use env-provided URLs as-is.
  - Ensures consistent behavior between draft, reflection, refine, and persona.
- **Rebuilt `main.py`**
  - Removed old annotation/analysis logic.
  - New structure: load identity → get RAG → reflect → reason → return draft + notes.
  - Routes now clean and minimal (`/reason`, `/ingest`, `/health`).
  - Async path throughout Cortex.
- **Refactored `llm_router.py`**
  - Removed old fallback logic during overrides.
  - OpenAI requests now use `/v1/chat/completions`.
  - Added proper OpenAI Authorization headers.
  - Distinct payload format for vLLM vs OpenAI.
  - Unified, correct parsing across models.
- **Simplified Cortex architecture**
  - Removed deprecated `context.py` and old reasoning code.
  - Relay completely decoupled from smart behavior.
- Updated environment specification:
  - `LLM_PRIMARY_URL` now set to `http://10.0.0.43:8000/v1/completions`.
  - `LLM_SECONDARY_URL` remains `http://10.0.0.3:11434/api/generate` (Ollama).
  - `LLM_CLOUD_URL` set to `https://api.openai.com/v1/chat/completions`.

### Fixed
- Resolved an endpoint conflict where:
  - The router expected base URLs.
  - Refine expected full URLs.
  - Refine always fell back due to hitting an incorrect endpoint.
  - Fixed by standardizing full-URL behavior across the entire system.
- Reflection layer no longer fails silently (previously returned `[""]` due to MythoMax).
- Resolved 404/401 errors caused by incorrect OpenAI URL endpoints.
- No more double-routing through vLLM during reflection.
- Corrected async/sync mismatches in multiple locations.
- Eliminated the double-path bug (`/v1/completions/v1/completions`) caused by the previous router logic.

### Removed
- Legacy `annotate`, `reason_check` glue logic from the old architecture.
- Old backend-probing code.
- Stale imports and unused modules left over from the previous prototype.

### Verified
- Cortex → vLLM (MI50) → refine → final_output now functioning correctly.
- Refine shows `used_primary_backend: true` and no fallback.
- Manual curl test confirms endpoint accuracy.

### Known Issues
- Refine sometimes prefixes output with `"Final Answer:"`; the next version will sanitize this.
- Hallucinations in draft_output persist due to weak grounding (fix in reasoning + RAG planned).

### Pending
- **RAG service does not exist** — requires a containerized FastAPI service.
- Reasoning layer lacks a self-revision loop (deliberate thought cycle).
- No speak/persona generation layer yet (`speak.py` planned).
- Intake summaries not yet routing into the RAG or reflection layer.
- No refinement engine between reasoning and speak.

### Notes
This is the largest structural change to Cortex so far. It establishes:
- multi-model cognition
- clean layering
- identity + reflection separation
- correct async code
- deterministic backend routing
- predictable JSON reflection

The system is now ready for:
- refinement loops
- a persona-speaking layer
- containerized RAG
- long-term memory integration
- true emergent-behavior experiments

## [Cortex v0.4.1] - 2025-11-05

### Added
- **RAG integration**
  - Added `rag.py` with `query_rag()` and `format_rag_block()`.
  - Cortex now queries the local RAG API (`http://10.0.0.41:7090/rag/search`) for contextual augmentation (see the sketch below).
  - Synthesized answers and top excerpts are injected into the reasoning prompt.
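A minimal sketch of what the `query_rag()` call might look like. Only the endpoint URL comes from this changelog; the HTTP client choice and request/response field names are assumptions.

```python
import httpx

RAG_URL = "http://10.0.0.41:7090/rag/search"  # endpoint from this entry

async def query_rag(query: str, top_k: int = 5) -> dict:
    """Query the local RAG API and return its JSON payload (assumed shape)."""
    async with httpx.AsyncClient(timeout=30) as client:
        resp = await client.post(RAG_URL, json={"query": query, "top_k": top_k})
        resp.raise_for_status()
        return resp.json()

def format_rag_block(result: dict) -> str:
    """Flatten assumed 'answer' + 'excerpts' fields into a [RAG] prompt block."""
    excerpts = "\n".join(f"- {e}" for e in result.get("excerpts", []))
    return f"[RAG]\n{result.get('answer', '')}\n{excerpts}"
```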
### Changed
- **Revised `/reason` endpoint**
  - Now builds unified context blocks:
    - [Intake] → recent summaries
    - [RAG] → contextual knowledge
    - [User Message] → current input
  - Calls `call_llm()` for the first pass, then `reflection_loop()` for meta-evaluation.
  - Returns `cortex_prompt`, `draft_output`, `final_output`, and a normalized reflection.
- **Reflection Pipeline Stability**
  - Cleaned parsing to normalize JSON vs. text reflections.
  - Added fallback handling for malformed or non-JSON outputs.
  - Log system improved to show raw JSON, extracted fields, and a normalized summary.
- **Async Summarization (Intake v0.2.1)**
  - Intake summaries now run in background threads to avoid blocking Cortex.
  - Summaries (L1–L∞) logged asynchronously with [BG] tags.
- **Environment & Networking Fixes**
  - Verified `.env` variables propagate correctly inside the Cortex container.
  - Confirmed Docker network connectivity between Cortex, Intake, NeoMem, and RAG (shared `serversdown_lyra_net`).
  - Adjusted localhost calls to service-IP mapping (10.0.0.41 for the Cortex host).
- **Behavioral Updates**
  - Cortex now performs conversation reflection (on user intent) and self-reflection (on its own answers).
  - RAG context successfully grounds reasoning outputs.
  - Intake and NeoMem confirmed receiving summaries via `/add_exchange`.
  - Log clarity pass: all reflective and contextual blocks clearly labeled.

### Known Gaps / Next Steps
- **NeoMem Tuning**
  - Improve retrieval latency and relevance.
  - Implement a dedicated `/reflections/recent` endpoint for Cortex.
  - Migrate to Cortex-first ingestion (Relay → Cortex → NeoMem).
- **Cortex Enhancements**
  - Add persistent reflection recall (use prior reflections as meta-context).
  - Improve reflection JSON structure (`"insight"`, `"evaluation"`, `"next_action"` → guaranteed fields).
  - Tighten temperature and prompt control for factual consistency.
- **RAG Optimization**
  - Add source ranking, filtering, and multi-vector hybrid search.
  - Cache RAG responses per session to reduce duplicate calls.
- **Documentation / Monitoring**
  - Add a health route for RAG and Intake summaries.
  - Include internal latency metrics in the `/health` endpoint.
  - Consolidate logs into a unified "Lyra Cortex Console" for tracing all module calls.

## [Cortex v0.3.0] – 2025-10-31

### Added
- **Cortex Service (FastAPI)**
  - New standalone reasoning engine (`cortex/main.py`) with endpoints:
    - `GET /health` – reports active backend + NeoMem status.
    - `POST /reason` – evaluates `{prompt, response}` pairs.
    - `POST /annotate` – experimental text analysis.
  - Background NeoMem health monitor (5-minute interval).
- **Multi-Backend Reasoning Support**
  - Added environment-driven backend selection via `LLM_FORCE_BACKEND`.
  - Supports:
    - **Primary** → vLLM (MI50 node @ 10.0.0.43)
    - **Secondary** → Ollama (3090 node @ 10.0.0.3)
    - **Cloud** → OpenAI API
    - **Fallback** → llama.cpp (CPU)
  - Introduced per-backend model variables: `LLM_PRIMARY_MODEL`, `LLM_SECONDARY_MODEL`, `LLM_CLOUD_MODEL`, `LLM_FALLBACK_MODEL`.
- **Response Normalization Layer**
  - Implemented `normalize_llm_response()` to merge streamed outputs and repair malformed JSON (see the sketch below).
  - Handles Ollama's multi-line streaming and Mythomax's missing-punctuation issues.
  - Prints concise debug previews of the merged content.
- **Environment Simplification**
  - Each service (`intake`, `cortex`, `neomem`) now maintains its own `.env` file.
  - Removed reliance on a shared/global env file to prevent cross-contamination.
  - Verified Docker Compose networking across containers.
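A rough sketch of the stream-merging idea behind `normalize_llm_response()`. The chunk field name follows Ollama's documented newline-delimited streaming format; everything else is illustrative rather than the actual Cortex implementation.

```python
import json

def normalize_llm_response(raw: str) -> str:
    """Merge newline-delimited streaming chunks (Ollama-style) into one string."""
    pieces = []
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            chunk = json.loads(line)
        except json.JSONDecodeError:
            return raw.strip()  # not streamed NDJSON; pass the text through as-is
        # Ollama /api/generate chunks carry incremental text under "response"
        pieces.append(chunk.get("response", ""))
    merged = "".join(pieces).strip()
    print(f"[normalize] merged preview: {merged[:80]!r}")  # concise debug preview
    return merged
```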
### Changed
- Refactored `reason_check()` to dynamically switch between **prompt** and **chat** mode depending on the backend.
- Enhanced startup logs to announce the active backend, model, URL, and mode.
- Improved error handling with clearer "Reasoning error" messages.

### Fixed
- Corrected broken vLLM endpoint routing (`/v1/completions`).
- Stabilized cross-container health reporting for NeoMem.
- Resolved JSON parse failures caused by streaming chunk delimiters.

---

## Next Planned – [v0.4.0]

### Planned Additions
- **Reflection Mode**
  - Introduce `REASONING_MODE=factcheck|reflection`.
  - Output schema:
    ```json
    {
      "insight": "...",
      "evaluation": "...",
      "next_action": "..."
    }
    ```
- **Cortex-First Pipeline**
  - UI → Cortex → [Reflection + Verifier + Memory] → Speech LLM → User.
  - Allows Lyra to "think before speaking."
- **Verifier Stub**
  - New `/verify` endpoint for search-based factual grounding.
  - Asynchronous external truth checking.
- **Memory Integration**
  - Feed reflective outputs into NeoMem.
  - Enable "dream" cycles for autonomous self-review.

---

**Status:** 🟢 Stable Core – Multi-backend reasoning operational.
**Next milestone:** *v0.4.0 — Reflection Mode + Thought Pipeline orchestration.*

---

### [Intake] v0.1.0 - 2025-10-27
- Receives messages from Relay and summarizes them in a cascading format.
- Continues to summarize small batches of exchanges while also generating large-scale conversational summaries (L20).
- Currently logs summaries to a `.log` file in `/project-lyra/intake-logs/`.

**Next Steps**
- Feed Intake into NeoMem.
- Generate a daily/hourly/etc. overall summary (e.g., "Today Brian and Lyra worked on x, y, and z").
- Generate session-aware summaries, each with its own intake hopper.

### [Lyra-Cortex] v0.2.0 — 2025-09-26

#### Added
- Integrated **llama-server** on the dedicated Cortex VM (Proxmox).
- Verified Phi-3.5-mini-instruct_Uncensored-Q4_K_M running with 8 vCPUs.
- Benchmarked Phi-3.5-mini performance:
  - ~18 tokens/sec CPU-only on a Ryzen 7 7800X.
  - Salience classification functional but sometimes inconsistent ("sali", "fi", "jamming").
- Tested **Qwen2-0.5B-Instruct GGUF** as an alternative salience classifier:
  - Much faster throughput (~350 tokens/sec prompt, ~100 tokens/sec eval).
  - More responsive but over-classifies messages as "salient."
- Established `.env` integration for the model ID (`SALIENCE_MODEL`), enabling hot-swap between models.

#### Known Issues
- Small models tend to drift or over-classify.
- CPU-only 7B+ models are expected to be slow; GPU passthrough recommended for larger models.
- Need to set up a `systemd` service for `llama-server` to auto-start on VM reboot.

---

### [Lyra-Cortex] v0.1.0 — 2025-09-25

#### Added
- First deployment as a dedicated Proxmox VM (5 vCPU / 18 GB RAM / 100 GB SSD).
- Built **llama.cpp** with the `llama-server` target via CMake.
- Integrated the **Phi-3.5 Mini Instruct (Uncensored, Q4_K_M GGUF)** model.
- Verified **API compatibility** at `/v1/chat/completions` (see the sketch below).
- Local test successful via `curl` → ~523-token response generated.
- Performance benchmark: ~11.5 tokens/sec (CPU-only on Ryzen 7800X).
- Confirmed usable for salience scoring, summarization, and lightweight reasoning.
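For reference, a minimal local test of the OpenAI-compatible endpoint might look like this. The host, port, and model id are assumptions; the original verification was done with `curl`.

```python
import requests

# llama-server exposes an OpenAI-compatible chat endpoint; adjust host/port to your VM.
resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "phi-3.5-mini-instruct-uncensored-q4_k_m",  # placeholder model id
        "messages": [{"role": "user", "content": "Is this message salient? Reply yes or no."}],
        "max_tokens": 64,
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```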