# Project Lyra — Modular Changelog

All notable changes to Project Lyra are organized by component. The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/) and adheres to [Semantic Versioning](https://semver.org/).

**Last Updated:** 2025-11-28

---

## 🧠 Lyra-Core
##############################################################################

## [Project Lyra v0.5.0] - 2025-11-28

### 🔧 Fixed - Critical API Wiring & Integration
After the major architectural rewire (v0.4.x), this release fixes all critical endpoint mismatches and ensures end-to-end system connectivity.

#### Cortex → Intake Integration ✅
- **Fixed** `IntakeClient` to use the correct Intake v0.2 API endpoints
  - Changed `GET /context/{session_id}` → `GET /summaries?session_id={session_id}`
  - Updated JSON response parsing to extract the `summary_text` field
  - Fixed environment variable name: `INTAKE_API` → `INTAKE_API_URL`
  - Corrected default port: `7083` → `7080`
  - Added a deprecation warning to the `summarize_turn()` method (endpoint removed in Intake v0.2)

#### Relay → UI Compatibility ✅
- **Added** OpenAI-compatible endpoint `POST /v1/chat/completions` (see the request sketch below)
  - Accepts the standard OpenAI format with a `messages[]` array
  - Returns an OpenAI-compatible response structure with `choices[]`
  - Extracts the last message content from the messages array
  - Includes usage metadata (stub values for compatibility)
- **Refactored** Relay to use a shared `handleChatRequest()` function
  - Both `/chat` and `/v1/chat/completions` use the same core logic
  - Eliminates code duplication
  - Consistent error handling across endpoints
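To make the new endpoint concrete, here is a minimal client-side sketch of the request/response shape described above. It is illustrative only: the host, port, and model name are assumptions, and the `usage` values returned by Relay are stubs.

```python
import requests

# Hypothetical Relay address; substitute the host/port of your deployment.
RELAY_URL = "http://localhost:7078/v1/chat/completions"

payload = {
    "model": "lyra",  # placeholder name; Relay routes to its configured backend
    "messages": [
        {"role": "system", "content": "You are Lyra."},
        {"role": "user", "content": "Summarize what changed in v0.5.0."},
    ],
}

resp = requests.post(RELAY_URL, json=payload, timeout=60)
resp.raise_for_status()
data = resp.json()

# OpenAI-compatible shape: the reply text lives in choices[0].message.content.
print(data["choices"][0]["message"]["content"])
# usage contains stub values included only for client compatibility.
print(data.get("usage"))
```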
#### Relay → Intake Connection ✅
- **Fixed** Intake URL fallback in the Relay server configuration
  - Corrected port: `7082` → `7080`
  - Updated endpoint: `/summary` → `/add_exchange`
  - Now properly sends exchanges to Intake for summarization

#### Code Quality & Python Package Structure ✅
- **Added** missing `__init__.py` files to all Cortex subdirectories
  - `cortex/llm/__init__.py`
  - `cortex/reasoning/__init__.py`
  - `cortex/persona/__init__.py`
  - `cortex/ingest/__init__.py`
  - `cortex/utils/__init__.py`
  - Improves package imports and IDE support
- **Removed** unused import in `cortex/router.py`: `from unittest import result`
- **Deleted** empty file `cortex/llm/resolve_llm_url.py` (was 0 bytes, never implemented)

### ✅ Verified Working
Complete end-to-end message flow now operational:

```
UI → Relay (/v1/chat/completions)
  ↓
Relay → Cortex (/reason)
  ↓
Cortex → Intake (/summaries)        [retrieves context]
  ↓
Cortex 4-stage pipeline:
  1. reflection.py    → meta-awareness notes
  2. reasoning.py     → draft answer
  3. refine.py        → polished answer
  4. persona/speak.py → Lyra personality
  ↓
Cortex → Relay (returns persona response)
  ↓
Relay → Intake (/add_exchange)      [async summary]
  ↓
Intake → NeoMem (background memory storage)
  ↓
Relay → UI (final response)
```

### 📝 Documentation
- **Added** this CHANGELOG entry with comprehensive v0.5.0 notes
- **Updated** README.md to reflect the v0.5.0 architecture
  - Documented new endpoints
  - Updated data flow diagrams
  - Clarified Intake v0.2 changes
  - Corrected service descriptions

### 🐛 Issues Resolved
- ❌ Cortex could not retrieve context from Intake (wrong endpoint)
- ❌ UI could not send messages to Relay (endpoint mismatch)
- ❌ Relay could not send summaries to Intake (wrong port/endpoint)
- ❌ Python package imports were implicit (missing `__init__.py`)

### ⚠️ Known Issues (Non-Critical)
- Session management endpoints not implemented in Relay (`GET/POST /sessions/:id`)
- RAG service currently disabled in docker-compose.yml
- Cortex `/ingest` endpoint is a stub returning `{"status": "ok"}`

### 🎯 Migration Notes
If upgrading from v0.4.x:
1. Pull the latest changes from git
2. Verify environment variables in `.env` files:
   - Check `INTAKE_API_URL=http://intake:7080` (not `INTAKE_API`)
   - Verify all service URLs use the correct ports
3. Restart Docker containers: `docker-compose down && docker-compose up -d`
4. Test with a simple message through the UI (a quick scripted check is sketched below)
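As a post-migration sanity check, the sketch below confirms the corrected Intake URL is set and pings the health endpoints documented in this changelog (Cortex `GET /health`, NeoMem/NVGRAM `/health`). Container names and ports are taken from other entries here; other services (Relay, Intake, RAG) may not expose `/health` yet, so they are omitted. Adjust names and ports to your deployment.

```python
import os
import requests

# Expected value after the v0.5.0 rename (INTAKE_API -> INTAKE_API_URL).
intake_url = os.getenv("INTAKE_API_URL", "http://intake:7080")
print(f"INTAKE_API_URL = {intake_url}")

# Health endpoints documented elsewhere in this changelog.
checks = {
    "cortex": "http://cortex:7081/health",
    "neomem": "http://neomem-api:7077/health",
}

for name, url in checks.items():
    try:
        r = requests.get(url, timeout=5)
        print(f"{name}: {r.status_code} {r.json()}")
    except Exception as exc:  # report and continue; this is only a smoke test
        print(f"{name}: unreachable ({exc})")
```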
---

## [Infrastructure v1.0.0] - 2025-11-26

### Changed
- **Environment Variable Consolidation** - major reorganization to eliminate duplication and improve maintainability
  - Consolidated 9 scattered `.env` files into a single-source-of-truth architecture
  - Root `.env` now contains all shared infrastructure (LLM backends, databases, API keys, service URLs)
  - Service-specific `.env` files minimized to only essential overrides:
    - `cortex/.env`: reduced from 42 to 22 lines (operational parameters only)
    - `neomem/.env`: reduced from 26 to 14 lines (LLM naming conventions only)
    - `intake/.env`: kept at 8 lines (already minimal)
  - **Result**: ~24% reduction in total configuration lines (197 → ~150)
- **Docker Compose Consolidation**
  - All services now defined in a single root `docker-compose.yml`
  - Relay service updated with complete configuration (env_file, volumes)
  - Removed redundant `core/docker-compose.yml` (marked as DEPRECATED)
  - Standardized network communication to use Docker container names
- **Service URL Standardization**
  - Internal services use container names: `http://neomem-api:7077`, `http://cortex:7081`
  - External services use IP addresses: `http://10.0.0.43:8000` (vLLM), `http://10.0.0.3:11434` (Ollama)
  - Removed IP/container-name inconsistencies across files

### Added
- **Security Templates**
  - Created `.env.example` files for all services
  - Root `.env.example` with sanitized credentials
  - Service-specific templates: `cortex/.env.example`, `neomem/.env.example`, `intake/.env.example`, `rag/.env.example`
  - All `.env.example` files are safe to commit to version control
- **Documentation**
  - `ENVIRONMENT_VARIABLES.md`: comprehensive reference for all environment variables
    - Variable descriptions, defaults, and usage examples
    - Multi-backend LLM strategy documentation
    - Troubleshooting guide
    - Security best practices
  - `DEPRECATED_FILES.md`: deletion guide for deprecated files with verification steps
- **Enhanced .gitignore**
  - Ignores all `.env` files (including subdirectories)
  - Tracks `.env.example` templates for documentation
  - Ignores the `.env-backups/` directory

### Removed
- `core/.env` - redundant with root `.env`, now deleted
- `core/docker-compose.yml` - consolidated into the main compose file (marked DEPRECATED)

### Fixed
- Eliminated duplicate `OPENAI_API_KEY` across 5+ files
- Eliminated duplicate LLM backend URLs across 4+ files
- Eliminated duplicate database credentials across 3+ files
- Resolved Cortex `environment:` section override in docker-compose (now uses env_file)

### Architecture
- **Multi-Backend LLM Strategy**: the root `.env` provides all backend OPTIONS (PRIMARY, SECONDARY, CLOUD, FALLBACK); services choose which to USE
  - Cortex → vLLM (PRIMARY) for autonomous reasoning
  - NeoMem → Ollama (SECONDARY) + OpenAI embeddings
  - Intake → vLLM (PRIMARY) for summarization
  - Relay → fallback chain with user preference
  - Preserves per-service flexibility while eliminating URL duplication

### Migration
- All original `.env` files backed up to `.env-backups/` with timestamp `20251126_025334`
- Rollback plan documented in `ENVIRONMENT_VARIABLES.md`
- Verification steps provided in `DEPRECATED_FILES.md`

---

## [Lyra_RAG v0.1.0] - 2025-11-07

### Added
- Initial standalone RAG module for Project Lyra.
- Persistent ChromaDB vector store (`./chromadb`).
- Importer `rag_chat_import.py` with:
  - Recursive folder scanning and category tagging.
  - Smart chunking (~5k chars).
  - SHA-1 deduplication and chat-ID metadata.
  - Timestamp fields (`file_modified`, `imported_at`).
  - Background-safe operation (`nohup`/`tmux`).
- 68 Lyra-category chats imported:
  - **6,556 new chunks added**
  - **1,493 duplicates skipped**
  - **7,997 total vectors** now stored.

### API
- `/rag/search` FastAPI endpoint implemented (port 7090).
- Supports natural-language queries and returns the top related excerpts (see the query sketch below).
- Added an answer-synthesis step using `gpt-4o-mini`.

### Verified
- Successful recall of Lyra-Core development history (v0.3.0 snapshot).
- Correct metadata and category tagging for all new imports.

### Next Planned
- Optional `where` filter parameter for category/date queries.
- Graceful "no results" handler for empty retrievals.
- `rag_docs_import.py` for PDFs and other document types.
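A minimal query sketch against the `/rag/search` endpoint described above. Only the route, port, and host come from this changelog; the HTTP method (assumed to be a JSON POST) and the field names (`query`, `answer`, `results`) are illustrative assumptions, not a documented contract.

```python
import requests

# Host from the Cortex v0.4.1 notes; adjust for your network.
RAG_URL = "http://10.0.0.41:7090/rag/search"

# Hypothetical request shape: a natural-language query string.
resp = requests.post(
    RAG_URL,
    json={"query": "How did Lyra-Core v0.3.0 handle salience?"},
    timeout=30,
)
resp.raise_for_status()
body = resp.json()

# Hypothetical response shape: a synthesized answer plus the top excerpts.
print(body.get("answer"))
for excerpt in body.get("results", []):
    print("-", excerpt)
```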
## [Lyra Core v0.3.2 + Web UI v0.2.0] - 2025-10-28

### Added
- **New UI**
  - Cleaned up the UI look and feel.
- **Sessions**
  - Sessions now persist over time.
  - Ability to create new sessions or load sessions from a previous instance.
  - Changing the session updates what the prompt sends to Relay (it no longer prompts with messages from other sessions).
  - Relay is correctly wired in.

## [Lyra-Core 0.3.1] - 2025-10-09

### Added
- **NVGRAM Integration (Full Pipeline Reconnected)**
  - Replaced the legacy Mem0 service with the NVGRAM microservice (`nvgram-api` @ port 7077).
  - Updated `server.js` in Relay to route all memory ops via `${NVGRAM_API}/memories` and `/search`.
  - Added `.env` variable:
    ```
    NVGRAM_API=http://nvgram-api:7077
    ```
  - Verified end-to-end Lyra conversation persistence:
    - `relay → nvgram-api → postgres/neo4j → relay → ollama → ui`
    - ✅ Memories stored, retrieved, and re-injected successfully.

### Changed
- Renamed `MEM0_URL` → `NVGRAM_API` across all Relay environment configs.
- Updated Docker Compose service dependency order:
  - `relay` now depends on the `nvgram-api` healthcheck.
  - Removed `mem0` references and volumes.
- Minor cleanup to the Persona fetch block (null checks and a safer default persona string).

### Fixed
- Relay startup no longer crashes when NVGRAM is unavailable — deferred connection handling.
- `/memories` POST failures no longer crash Relay; they are now logged gracefully as `relay error Error: memAdd failed: 500`.
- Improved injected prompt debugging (`DEBUG_PROMPT=true` now prints clean JSON).

### Goals / Next Steps
- Add salience visualization (e.g., memory weights displayed in the injected system message).
- Begin schema alignment with NVGRAM v0.1.2 for confidence scoring.
- Add Relay auto-retry for transient 500 responses from NVGRAM.

---

## [Lyra-Core] v0.3.1 - 2025-09-27

### Changed
- Removed the salience filter logic; Cortex is now the default annotator.
- All user messages stored in Mem0; no discard tier applied.

### Added
- Cortex annotations (`metadata.cortex`) now attached to memories.
- Debug logging improvements:
  - Pretty-print Cortex annotations
  - Injected prompt preview
  - Memory search hit list with scores
- `.env` toggle (`CORTEX_ENABLED`) to bypass Cortex when needed.

### Fixed
- Parsing failures from Markdown-wrapped Cortex JSON via a fence cleaner.
- Relay no longer "hangs" on malformed Cortex outputs.

---

### [Lyra-Core] v0.3.0 — 2025-09-26

#### Added
- Implemented **salience filtering** in Relay:
  - `.env` configurable: `SALIENCE_ENABLED`, `SALIENCE_MODE`, `SALIENCE_MODEL`, `SALIENCE_API_URL`.
  - Supports `heuristic` and `llm` classification modes.
  - LLM-based salience filter integrated with the Cortex VM running `llama-server`.
- Logging improvements:
  - Added debug logs for salience mode, raw LLM output, and unexpected outputs.
  - Fail-closed behavior for unexpected LLM responses.
- Successfully tested **Phi-3.5-mini** and **Qwen2-0.5B-Instruct** as salience classifiers.
- Verified end-to-end flow: Relay → salience filter → Mem0 add/search → Persona injection → LLM reply.

#### Changed
- Refactored `server.js` to gate `mem.add()` calls behind the salience filter.
- Updated `.env` to support `SALIENCE_MODEL`.

#### Known Issues
- Small models (e.g. Qwen2-0.5B) tend to over-classify as "salient".
- Phi-3.5-mini sometimes returns truncated tokens ("sali", "fi").
- CPU-only inference is functional but limited; larger models are recommended once a GPU is available.

---

### [Lyra-Core] v0.2.0 — 2025-09-24

#### Added
- Migrated Relay to use the `mem0ai` SDK instead of raw fetch calls.
- Implemented `sessionId` support (client-supplied, falling back to `default`).
- Added debug logs for memory add/search.
- Cleaned up the Relay structure for clarity.

---

### [Lyra-Core] v0.1.0 — 2025-09-23

#### Added
- First working MVP of the **Lyra Core Relay**.
- Relay service accepts `POST /v1/chat/completions` (OpenAI-compatible).
- Memory integration with Mem0:
  - `POST /memories` on each user message.
  - `POST /search` before the LLM call.
- Persona Sidecar integration (`GET /current`).
- OpenAI GPT + Ollama (MythoMax) support in Relay.
- Simple browser-based chat UI (talks to Relay at `http://:7078`).
- `.env` standardization for Relay + Mem0 + Postgres + Neo4j.
- Working Neo4j + Postgres backing stores for Mem0.
- Initial MVP relay service with raw fetch calls to Mem0.
- Dockerized with a basic healthcheck.

#### Fixed
- Resolved a crash loop in Neo4j by restricting env vars (`NEO4J_AUTH` only).
- Relay now correctly reads `MEM0_URL` and `MEM0_API_KEY` from `.env`.

#### Known Issues
- No feedback loop (thumbs up/down) yet.
- Forget/delete flow is manual (via memory IDs).
- Memory latency is ~1–4 s depending on the embedding model.

---

## 🧩 lyra-neomem (formerly NVGRAM / Lyra-Mem0)
##############################################################################

## [NeoMem 0.1.2] - 2025-10-27

### Changed
- **Renamed NVGRAM to NeoMem**
  - All future updates will be under the name NeoMem.
  - Features have not changed.
## [NVGRAM 0.1.1] - 2025-10-08

### Added
- **Async Memory Rewrite (Stability + Safety Patch)**
  - Introduced an `AsyncMemory` class with fully asynchronous vector and graph store writes.
  - Added **input sanitation** to prevent embedding errors (`'list' object has no attribute 'replace'`).
  - Implemented a `flatten_messages()` helper in the API layer to clean malformed payloads.
  - Added structured request logging via `RequestLoggingMiddleware` (FastAPI middleware).
  - Health endpoint (`/health`) now returns structured JSON `{status, version, service}`.
  - Startup logs now include a **sanitized embedder config** with API keys masked for safety:
    ```
    >>> Embedder config (sanitized): {'provider': 'openai', 'config': {'model': 'text-embedding-3-small', 'api_key': '***'}}
    ✅ Connected to Neo4j on attempt 1
    🧠 NVGRAM v0.1.1 — Neural Vectorized Graph Recall and Memory initialized
    ```

### Changed
- Replaced the synchronous `Memory.add()` with an async-safe version supporting concurrent vector + graph writes.
- Normalized indentation and cleaned duplicate `main.py` references under `/nvgram/` vs `/nvgram/server/`.
- Removed redundant `FastAPI()` app reinitialization.
- Updated internal logging to an INFO-level timing format, e.g. `2025-10-08 21:48:45 [INFO] POST /memories -> 200 (11189.1 ms)`.
- Deprecated `@app.on_event("startup")` (FastAPI deprecation warning) → will migrate to a `lifespan` handler in v0.1.2.

### Fixed
- Eliminated a repeating 500 error from the OpenAI embedder caused by non-string message content.
- Masked API key leaks from boot logs.
- Ensured Neo4j reconnects gracefully on the first retry.

### Goals / Next Steps
- Integrate **salience scoring** and **embedding confidence weight** fields into the Postgres schema.
- Begin testing with the full Lyra Relay + Persona Sidecar pipeline for live session memory recall.
- Migrate from the deprecated `on_event` → `lifespan` pattern in 0.1.2.

---

## [NVGRAM 0.1.0] - 2025-10-07

### Added
- **Initial fork of Mem0 → NVGRAM**:
  - Created a fully independent, local-first memory engine based on Mem0 OSS.
  - Renamed all internal modules, Docker services, and environment variables from `mem0` → `nvgram`.
  - New service name: **`nvgram-api`**, default port **7077**.
  - Maintains the same API endpoints (`/memories`, `/search`) for drop-in compatibility with Lyra Core (see the client sketch after this entry).
  - Uses **FastAPI**, **Postgres**, and **Neo4j** as persistent backends.
- Verified clean startup:
  ```
  ✅ Connected to Neo4j on attempt 1
  INFO: Uvicorn running on http://0.0.0.0:7077
  ```
- `/docs` and `/openapi.json` confirmed reachable and functional.

### Changed
- Removed the dependency on the external `mem0ai` SDK — all logic is now local.
- Re-pinned requirements:
  - fastapi==0.115.8
  - uvicorn==0.34.0
  - pydantic==2.10.4
  - python-dotenv==1.0.1
  - psycopg>=3.2.8
  - ollama
- Adjusted `docker-compose` and `.env` templates to use the new NVGRAM naming and image paths.

### Goals / Next Steps
- Integrate NVGRAM as the new default backend in Lyra Relay.
- Deprecate remaining Mem0 references and archive old configs.
- Begin versioning as a standalone project (`nvgram-core`, `nvgram-api`, etc.).
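For orientation, here is a rough client-side sketch of the add/search round trip against `nvgram-api` (now NeoMem). The routes and port come from this entry; the JSON field names (`messages`, `user_id`, `query`) follow Mem0 OSS conventions and are assumptions here, not a documented contract.

```python
import requests

NVGRAM_API = "http://nvgram-api:7077"  # container name + port from this entry

# Store an exchange. Field names follow Mem0 OSS conventions and may differ
# from the actual NVGRAM/NeoMem schema.
add_resp = requests.post(
    f"{NVGRAM_API}/memories",
    json={
        "messages": [{"role": "user", "content": "I like coffee in the morning."}],
        "user_id": "brian",
    },
    timeout=30,
)
print(add_resp.status_code, add_resp.json())

# Semantic search over stored memories.
search_resp = requests.post(
    f"{NVGRAM_API}/search",
    json={"query": "What does the user drink in the morning?", "user_id": "brian"},
    timeout=30,
)
print(search_resp.json())
```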
---

## [Lyra-Mem0 0.3.2] - 2025-10-05

### Added
- Support for **Ollama LLM reasoning** alongside OpenAI embeddings:
  - Introduced `LLM_PROVIDER=ollama`, `LLM_MODEL`, and `OLLAMA_HOST` in `.env.3090`.
  - Verified the local 3090 setup using `qwen2.5:7b-instruct-q4_K_M`.
  - Split processing pipeline:
    - Embeddings → OpenAI `text-embedding-3-small`
    - LLM → local Ollama (`http://10.0.0.3:11434/api/chat`).
- Added `.env.3090` template for self-hosted inference nodes.
- Integrated runtime diagnostics and seeder progress tracking:
  - File-level + message-level progress bars.
  - Retry/back-off logic for timeouts (3 attempts).
  - Event logging (`ADD / UPDATE / NONE`) for every memory record.
- Expanded Docker health checks for the Postgres, Qdrant, and Neo4j containers.
- Added a GPU-friendly long-run configuration for continuous seeding (validated on an RTX 3090).

### Changed
- Updated the `main.py` configuration block to load:
  - `LLM_PROVIDER`, `LLM_MODEL`, and `OLLAMA_BASE_URL`.
  - Fallback to OpenAI if Ollama is unavailable.
- Adjusted `docker-compose.yml` mount paths to correctly map `/app/main.py`.
- Normalized `.env` loading so `mem0-api` and the host environment share identical values.
- Improved seeder logging and progress telemetry for clearer diagnostics.
- Added an explicit `temperature` field to `DEFAULT_CONFIG['llm']['config']` for tuning future local inference runs.

### Fixed
- Resolved a crash during startup: `TypeError: OpenAIConfig.__init__() got an unexpected keyword argument 'ollama_base_url'`.
- Corrected a mount type mismatch (file vs directory) causing `OCI runtime create failed` errors.
- Prevented duplicate or partial postings when retry logic triggered multiple concurrent requests.
- "Unknown event" warnings are now safely ignored (they no longer break the seeding loop).
- Confirmed full dual-provider operation in logs (`api.openai.com` + `10.0.0.3:11434/api/chat`).

### Observations
- Stable GPU utilization: ~8 GB VRAM @ 92% load, ≈67 °C under sustained seeding.
- The next revision will re-format the seed JSON to preserve `role` context (user vs assistant).

---

## [Lyra-Mem0 0.3.1] - 2025-10-03

### Added
- HuggingFace TEI integration (local 3090 embedder).
- Dual-mode environment switch between OpenAI cloud and local.
- CSV export of memories from Postgres (`payload->>'data'`).

### Fixed
- `.env` CRLF vs LF line-ending issues.
- Local seeding is now possible via the running HuggingFace server.

---

## [Lyra-mem0 0.3.0]

### Added
- Support for **Ollama embeddings** in the Mem0 OSS container:
  - Added the ability to configure `EMBEDDER_PROVIDER=ollama` and set `EMBEDDER_MODEL` + `OLLAMA_HOST` via `.env`.
  - Mounted a `main.py` override from the host into the container to load a custom `DEFAULT_CONFIG` (see the config sketch below).
  - Installed the `ollama` Python client into the custom API container image.
- `.env.3090` file created for external embedding mode (3090 machine):
  - EMBEDDER_PROVIDER=ollama
  - EMBEDDER_MODEL=mxbai-embed-large
  - OLLAMA_HOST=http://10.0.0.3:11434
- Workflow to support **multiple embedding modes**:
  1. Fast LAN-based 3090/Ollama embeddings
  2. Local-only CPU embeddings (Lyra Cortex VM)
  3. OpenAI fallback embeddings

### Changed
- `docker-compose.yml` updated to mount the local `main.py` and `.env.3090`.
- Built a **custom Dockerfile** (`mem0-api-server:latest`) extending the base image with `pip install ollama`.
- Updated `requirements.txt` to include the `ollama` package.
- Adjusted the Mem0 container config so `main.py` pulls environment variables with `dotenv` (`load_dotenv()`).
- Tested the new embeddings path with a curl `/memories` API call.

### Fixed
- Resolved a container boot failure caused by the missing `ollama` dependency (`ModuleNotFoundError`).
- Fixed a config overwrite issue where rebuilding the container restored the stock `main.py`.
- Worked around a Neo4j error (`vector.similarity.cosine(): mismatched vector dimensions`) by confirming OpenAI vs. Ollama embedding vector sizes and planning to standardize on 1536 dimensions.
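A minimal sketch of what the env-driven `DEFAULT_CONFIG` override described above might look like. The nested config shape loosely follows Mem0 OSS conventions and is an assumption, not the project's actual `main.py`.

```python
import os

from dotenv import load_dotenv  # python-dotenv, as pinned elsewhere in this changelog

load_dotenv()  # pulls EMBEDDER_* / OLLAMA_HOST from .env or .env.3090

provider = os.getenv("EMBEDDER_PROVIDER", "openai")
embedder_config = {"model": os.getenv("EMBEDDER_MODEL", "text-embedding-3-small")}

if provider == "ollama":
    # Ollama-only option; passing this key to the OpenAI config is what caused the
    # 'unexpected keyword argument ollama_base_url' crash noted in 0.3.2's Fixed section.
    embedder_config["ollama_base_url"] = os.getenv("OLLAMA_HOST", "http://10.0.0.3:11434")

# Approximate Mem0-style config shape; the real DEFAULT_CONFIG may differ.
DEFAULT_CONFIG = {"embedder": {"provider": provider, "config": embedder_config}}
```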
---

## [Lyra-mem0 v0.2.1]

### Added
- **Seeding pipeline**:
  - Built a Python seeder script to bulk-insert raw Cloud Lyra exports into Mem0 (a rough sketch follows this entry).
  - Implemented an incremental seeding option (skip existing memories, only add new ones).
  - Verified the insert process with the Postgres-backed history DB and a curl `/memories/search` sanity check.
- **Ollama embedding support** in the Mem0 OSS container:
  - Added configuration for `EMBEDDER_PROVIDER=ollama`, `EMBEDDER_MODEL`, and `OLLAMA_HOST` via `.env`.
  - Created a `.env.3090` profile for the LAN-connected 3090 machine with Ollama.
  - Set up three embedding modes:
    1. Fast LAN-based 3090/Ollama
    2. Local-only CPU model (Lyra Cortex VM)
    3. OpenAI fallback

### Changed
- Updated `main.py` to load configuration from `.env` using `dotenv` and support multiple embedder backends.
- Mounted the host `main.py` into the container so local edits persist across rebuilds.
- Updated `docker-compose.yml` to mount `.env.3090` and support swapping between profiles.
- Built a **custom Dockerfile** (`mem0-api-server:latest`) including `pip install ollama`.
- Updated `requirements.txt` with the `ollama` dependency.
- Adjusted the startup flow so the container automatically connects to the external Ollama host (LAN IP).
- Added logging to confirm model pulls and embedding requests.

### Fixed
- The seeder process originally failed on old memories — it now skips duplicates and continues the batch.
- Resolved a container boot error (`ModuleNotFoundError: ollama`) by extending the image.
- Fixed an overwrite issue where the stock `main.py` replaced the custom config during rebuild.
- Worked around the Neo4j `vector.similarity.cosine()` dimension mismatch by investigating the OpenAI (1536-dim) vs Ollama (1024-dim) schemas.

### Notes
- To fully unify embedding modes, a Hugging Face / local model with **1536-dim embeddings** will be needed (to match OpenAI's schema and avoid Neo4j errors).
- The current Ollama model (`mxbai-embed-large`) works, but returns 1024-dim vectors.
- The seeder workflow is validated but should be wrapped in a repeatable weekly run for full Cloud→Local sync.
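A rough sketch of the incremental seeding loop described in this entry. The export format, the dedup strategy (search-before-add), the API address, and the field names are all assumptions for illustration; the actual seeder script is not reproduced here.

```python
import json

import requests

MEM0_API = "http://localhost:8000"  # hypothetical Mem0 API address; adjust for your stack
USER_ID = "brian"


def already_stored(text: str) -> bool:
    """Crude dedup check: search first and skip near-identical hits (assumed strategy)."""
    hits = requests.post(
        f"{MEM0_API}/search", json={"query": text, "user_id": USER_ID}, timeout=30
    ).json()
    return any(text.strip() == h.get("memory", "").strip() for h in hits.get("results", []))


with open("cloud_lyra_export.json", encoding="utf-8") as fh:  # hypothetical export file
    exchanges = json.load(fh)

for item in exchanges:
    text = item["content"]
    if already_stored(text):
        continue  # incremental mode: only add new memories
    requests.post(
        f"{MEM0_API}/memories",
        json={
            "messages": [{"role": item.get("role", "user"), "content": text}],
            "user_id": USER_ID,
        },
        timeout=60,
    )
```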
---

## [Lyra-Mem0 v0.2.0] - 2025-09-30

### Added
- Standalone **Lyra-Mem0** stack created at `~/lyra-mem0/`
  - Includes **Postgres (pgvector)**, **Qdrant**, **Neo4j**, and **SQLite** for history tracking.
- Added a working `docker-compose.mem0.yml` and a custom `Dockerfile` for building the Mem0 API server.
- Verified REST API functionality:
  - `POST /memories` works for adding memories.
  - `POST /search` works for semantic search.
- Successful end-to-end test with a persisted memory: *"Likes coffee in the morning"* → retrievable via search. ✅

### Changed
- Split the architecture into **modular stacks**:
  - `~/lyra-core` (Relay, Persona-Sidecar, etc.)
  - `~/lyra-mem0` (Mem0 OSS memory stack)
- Removed the old embedded mem0 containers from the Lyra-Core compose file.
- Added a Lyra-Mem0 section to README.md.

### Next Steps
- Wire **Relay → Mem0 API** (integration not yet complete).
- Add integration tests to verify persistence and retrieval from within Lyra-Core.

---

## 🧠 Lyra-Cortex
##############################################################################

## [Cortex v0.5] - 2025-11-13

### Added
- **New `reasoning.py` module**
  - Async reasoning engine.
  - Accepts the user prompt, identity, RAG block, and reflection notes.
  - Produces draft internal answers.
  - Uses the primary backend (vLLM).
- **New `reflection.py` module**
  - Fully async.
  - Produces actionable JSON "internal notes."
  - Enforces a strict JSON schema with fallback parsing.
  - Forces the cloud backend (`backend_override="cloud"`).
- Integrated `refine.py` into the Cortex reasoning pipeline:
  - New stage between reflection and persona.
  - Runs exclusively on the primary vLLM backend (MI50).
  - Produces the final, internally consistent output for the downstream persona layer.
- **Backend override system**
  - Each LLM call can now select its own backend (see the dispatch sketch below).
  - Enables multi-LLM cognition: Reflection → cloud, Reasoning → primary.
- **Identity loader**
  - Added `identity.py` with `load_identity()` for consistent persona retrieval.
- **Ingest handler**
  - Async stub created for the future Intake → NeoMem → RAG pipeline.

### Changed
- Unified LLM backend URL handling across Cortex:
  - ENV variables must now contain FULL API endpoints.
  - Removed all internal path-appending (e.g. `.../v1/completions`).
  - `llm_router.py` rewritten to use env-provided URLs as-is.
  - Ensures consistent behavior between draft, reflection, refine, and persona.
- **Rebuilt `main.py`**
  - Removed the old annotation/analysis logic.
  - New structure: load identity → get RAG → reflect → reason → return draft + notes.
  - Routes are now clean and minimal (`/reason`, `/ingest`, `/health`).
  - Async path throughout Cortex.
- **Refactored `llm_router.py`**
  - Removed the old fallback logic during overrides.
  - OpenAI requests now use `/v1/chat/completions`.
  - Added proper OpenAI Authorization headers.
  - Distinct payload formats for vLLM vs OpenAI.
  - Unified, correct parsing across models.
- **Simplified Cortex architecture**
  - Removed the deprecated `context.py` and old reasoning code.
  - Relay completely decoupled from smart behavior.
- Updated environment specification:
  - `LLM_PRIMARY_URL` now set to `http://10.0.0.43:8000/v1/completions`.
  - `LLM_SECONDARY_URL` remains `http://10.0.0.3:11434/api/generate` (Ollama).
  - `LLM_CLOUD_URL` set to `https://api.openai.com/v1/chat/completions`.

### Fixed
- Resolved an endpoint conflict where:
  - The router expected base URLs.
  - Refine expected full URLs.
  - Refine always fell back because it hit an incorrect endpoint.
  - Fixed by standardizing full-URL behavior across the entire system.
- The reflection layer no longer fails silently (it previously returned `[""]` due to MythoMax).
- Resolved 404/401 errors caused by incorrect OpenAI URL endpoints.
- No more double-routing through vLLM during reflection.
- Corrected async/sync mismatches in multiple locations.
- Eliminated the double-path bug (`/v1/completions/v1/completions`) caused by the previous router logic.

### Removed
- Legacy `annotate` and `reason_check` glue logic from the old architecture.
- Old backend-probing junk code.
- Stale imports and unused modules left over from the previous prototype.

### Verified
- Cortex → vLLM (MI50) → refine → final_output now functions correctly.
- Refine shows `used_primary_backend: true` and no fallback.
- A manual curl test confirms endpoint accuracy.

### Known Issues
- Refine sometimes prefixes output with `"Final Answer:"`; the next version will sanitize this.
- Hallucinations in draft_output persist due to weak grounding (fix planned in reasoning + RAG).

### Pending / Known Issues
- **RAG service does not exist** — requires a containerized FastAPI service.
- The reasoning layer lacks a self-revision loop (deliberate thought cycle).
- No speak/persona generation layer yet (`speak.py` planned).
- Intake summaries are not yet routed into the RAG or reflection layer.
- No refinement engine between reasoning and speak.
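A condensed sketch of the per-call backend override and payload split introduced in this release. Only the env variable names, URLs, and the vLLM-completions vs OpenAI-chat distinction come from this changelog; the function name, use of `httpx`, and default model handling are illustrative assumptions, not the actual `llm_router.py`.

```python
import os

import httpx  # async-friendly HTTP client; an assumption, not necessarily what llm_router.py uses

# Env vars hold FULL endpoints, as required by this release (no path-appending).
BACKENDS = {
    "primary": os.getenv("LLM_PRIMARY_URL", "http://10.0.0.43:8000/v1/completions"),
    "cloud": os.getenv("LLM_CLOUD_URL", "https://api.openai.com/v1/chat/completions"),
}


async def call_llm(prompt: str, backend_override: str = "primary") -> str:
    """Send one prompt to the selected backend, using the env-provided URL as-is."""
    url = BACKENDS[backend_override]
    headers = {}
    if backend_override == "cloud":
        # OpenAI chat-completions payload + Authorization header.
        headers["Authorization"] = f"Bearer {os.getenv('OPENAI_API_KEY', '')}"
        payload = {
            "model": os.getenv("LLM_CLOUD_MODEL", "gpt-4o-mini"),  # placeholder default
            "messages": [{"role": "user", "content": prompt}],
        }
    else:
        # vLLM plain-completions payload.
        payload = {
            "model": os.getenv("LLM_PRIMARY_MODEL", ""),  # placeholder default
            "prompt": prompt,
            "max_tokens": 512,
        }

    async with httpx.AsyncClient(timeout=120) as client:
        r = await client.post(url, json=payload, headers=headers)
        r.raise_for_status()
        data = r.json()

    # Unified parsing: chat responses carry message.content, completions carry text.
    choice = data["choices"][0]
    return choice.get("message", {}).get("content") or choice.get("text", "")
```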
### Notes
This is the largest structural change to Cortex so far. It establishes:
- multi-model cognition
- clean layering
- identity + reflection separation
- correct async code
- deterministic backend routing
- predictable JSON reflection

The system is now ready for:
- refinement loops
- a persona-speaking layer
- containerized RAG
- long-term memory integration
- true emergent-behavior experiments

## [Cortex v0.4.1] - 2025-11-05

### Added
- **RAG integration**
  - Added `rag.py` with `query_rag()` and `format_rag_block()`.
  - Cortex now queries the local RAG API (`http://10.0.0.41:7090/rag/search`) for contextual augmentation.
  - Synthesized answers and top excerpts are injected into the reasoning prompt.

### Changed
- **Revised `/reason` endpoint**
  - Now builds unified context blocks (see the sketch after this entry):
    - [Intake] → recent summaries
    - [RAG] → contextual knowledge
    - [User Message] → current input
  - Calls `call_llm()` for the first pass, then `reflection_loop()` for meta-evaluation.
  - Returns `cortex_prompt`, `draft_output`, `final_output`, and a normalized reflection.
- **Reflection pipeline stability**
  - Cleaned parsing to normalize JSON vs. text reflections.
  - Added fallback handling for malformed or non-JSON outputs.
  - Improved logging to show raw JSON, extracted fields, and the normalized summary.
- **Async summarization (Intake v0.2.1)**
  - Intake summaries now run in background threads to avoid blocking Cortex.
  - Summaries (L1–L∞) are logged asynchronously with [BG] tags.
- **Environment & networking fixes**
  - Verified that `.env` variables propagate correctly inside the Cortex container.
  - Confirmed Docker network connectivity between Cortex, Intake, NeoMem, and RAG (shared `serversdown_lyra_net`).
  - Adjusted localhost calls to service-IP mapping (10.0.0.41 for the Cortex host).
- **Behavioral updates**
  - Cortex now performs conversation reflection (on user intent) and self-reflection (on its own answers).
  - RAG context successfully grounds reasoning outputs.
  - Intake and NeoMem confirmed receiving summaries via `/add_exchange`.
  - Log clarity pass: all reflective and contextual blocks are clearly labeled.

### Known Gaps / Next Steps
- **NeoMem tuning**
  - Improve retrieval latency and relevance.
  - Implement a dedicated `/reflections/recent` endpoint for Cortex.
  - Migrate to Cortex-first ingestion (Relay → Cortex → NeoMem).
- **Cortex enhancements**
  - Add persistent reflection recall (use prior reflections as meta-context).
  - Improve the reflection JSON structure (`"insight"`, `"evaluation"`, `"next_action"` → guaranteed fields).
  - Tighten temperature and prompt control for factual consistency.
- **RAG optimization**
  - Add source ranking, filtering, and multi-vector hybrid search.
  - Cache RAG responses per session to reduce duplicate calls.
- **Documentation / monitoring**
  - Add a health route for RAG and Intake summaries.
  - Include internal latency metrics in the `/health` endpoint.
  - Consolidate logs into a unified "Lyra Cortex Console" for tracing all module calls.
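A small sketch of the unified context-block assembly the revised `/reason` endpoint performs, as described in the v0.4.1 entry above. The block labels come from that entry; the function name and formatting details are illustrative.

```python
def build_cortex_prompt(intake_summaries: list[str], rag_block: str, user_message: str) -> str:
    """Assemble the unified context block fed to the first reasoning pass."""
    # Labels mirror the structure described in the v0.4.1 notes.
    parts = [
        "[Intake]",
        "\n".join(intake_summaries) or "(no recent summaries)",
        "[RAG]",
        rag_block or "(no retrieved context)",
        "[User Message]",
        user_message,
    ]
    return "\n\n".join(parts)


# Example usage (illustrative values):
prompt = build_cortex_prompt(
    ["Brian and Lyra debugged the Relay endpoints."],
    "Relevant excerpt: Cortex v0.3.0 introduced multi-backend reasoning.",
    "What changed in the reasoning pipeline?",
)
print(prompt)
```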
## [Cortex v0.3.0] - 2025-10-31

### Added
- **Cortex Service (FastAPI)**
  - New standalone reasoning engine (`cortex/main.py`) with endpoints:
    - `GET /health` – reports the active backend + NeoMem status.
    - `POST /reason` – evaluates `{prompt, response}` pairs.
    - `POST /annotate` – experimental text analysis.
  - Background NeoMem health monitor (5-minute interval).
- **Multi-Backend Reasoning Support**
  - Added environment-driven backend selection via `LLM_FORCE_BACKEND`.
  - Supports:
    - **Primary** → vLLM (MI50 node @ 10.0.0.43)
    - **Secondary** → Ollama (3090 node @ 10.0.0.3)
    - **Cloud** → OpenAI API
    - **Fallback** → llama.cpp (CPU)
  - Introduced per-backend model variables: `LLM_PRIMARY_MODEL`, `LLM_SECONDARY_MODEL`, `LLM_CLOUD_MODEL`, `LLM_FALLBACK_MODEL`.
- **Response Normalization Layer**
  - Implemented `normalize_llm_response()` to merge streamed outputs and repair malformed JSON.
  - Handles Ollama's multi-line streaming and MythoMax's missing-punctuation issues.
  - Prints concise debug previews of merged content.
- **Environment Simplification**
  - Each service (`intake`, `cortex`, `neomem`) now maintains its own `.env` file.
  - Removed reliance on a shared/global env file to prevent cross-contamination.
  - Verified Docker Compose networking across containers.

### Changed
- Refactored `reason_check()` to dynamically switch between **prompt** and **chat** mode depending on the backend.
- Enhanced startup logs to announce the active backend, model, URL, and mode.
- Improved error handling with clearer "Reasoning error" messages.

### Fixed
- Corrected broken vLLM endpoint routing (`/v1/completions`).
- Stabilized cross-container health reporting for NeoMem.
- Resolved JSON parse failures caused by streaming chunk delimiters.

---

## Next Planned – [v0.4.0]

### Planned Additions
- **Reflection Mode**
  - Introduce `REASONING_MODE=factcheck|reflection`.
  - Output schema:
    ```json
    {
      "insight": "...",
      "evaluation": "...",
      "next_action": "..."
    }
    ```
- **Cortex-First Pipeline**
  - UI → Cortex → [Reflection + Verifier + Memory] → Speech LLM → User.
  - Allows Lyra to "think before speaking."
- **Verifier Stub**
  - New `/verify` endpoint for search-based factual grounding.
  - Asynchronous external truth checking.
- **Memory Integration**
  - Feed reflective outputs into NeoMem.
  - Enable "dream" cycles for autonomous self-review.

---

**Status:** 🟢 Stable Core – multi-backend reasoning operational.
**Next milestone:** *v0.4.0 — Reflection Mode + Thought Pipeline orchestration.*

---

### [Intake] v0.1.0 - 2025-10-27

- Receives messages from Relay and summarizes them in a cascading format.
- Continues to summarize small batches of exchanges while also generating large-scale conversational summaries (L20).
- Currently logs summaries to a `.log` file in `/project-lyra/intake-logs/`.

**Next Steps**
- Feed Intake output into NeoMem.
- Generate a daily/hourly/etc. overall summary (e.g., "Today Brian and Lyra worked on x, y, and z").
- Generate session-aware summaries, each with its own intake hopper.

### [Lyra-Cortex] v0.2.0 — 2025-09-26

**Added**
- Integrated **llama-server** on the dedicated Cortex VM (Proxmox).
- Verified Phi-3.5-mini-instruct_Uncensored-Q4_K_M running with 8 vCPUs.
- Benchmarked Phi-3.5-mini performance:
  - ~18 tokens/sec CPU-only on a Ryzen 7 7800X.
  - Salience classification functional but sometimes inconsistent ("sali", "fi", "jamming").
- Tested **Qwen2-0.5B-Instruct GGUF** as an alternative salience classifier:
  - Much faster throughput (~350 tokens/sec prompt, ~100 tokens/sec eval).
  - More responsive, but over-classifies messages as "salient."
- Established `.env` integration for the model ID (`SALIENCE_MODEL`), enabling hot-swap between models (a classifier call is sketched below).

**Known Issues**
- Small models tend to drift or over-classify.
- CPU-only 7B+ models are expected to be slow; GPU passthrough recommended for larger models.
- Need to set up a `systemd` service for `llama-server` to auto-start on VM reboot.
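To illustrate how a hot-swappable salience classifier like the ones benchmarked above can be called, here is a hedged sketch against llama-server's OpenAI-compatible `/v1/chat/completions` route (the same route verified in the v0.1.0 entry below). The host, prompt wording, and verdict parsing are illustrative assumptions, not the Relay's actual `server.js` implementation, and the changelog does not specify the exact fail-closed fallback.

```python
import os

import requests

# Hypothetical llama-server address; configure via SALIENCE_API_URL as in Lyra-Core v0.3.0.
SALIENCE_API_URL = os.getenv("SALIENCE_API_URL", "http://cortex-vm:8080/v1/chat/completions")
SALIENCE_MODEL = os.getenv("SALIENCE_MODEL", "phi-3.5-mini-instruct")  # hot-swappable via .env


def is_salient(message: str) -> bool:
    """Ask the classifier for a one-word verdict; fail closed on anything unexpected."""
    payload = {
        "model": SALIENCE_MODEL,
        "messages": [
            {"role": "system", "content": "Reply with exactly one word: salient or ignore."},
            {"role": "user", "content": message},
        ],
        "temperature": 0.0,
    }
    try:
        r = requests.post(SALIENCE_API_URL, json=payload, timeout=20)
        r.raise_for_status()
        verdict = r.json()["choices"][0]["message"]["content"].strip().lower()
    except Exception:
        # Fail closed on errors or unexpected output (treated here as not salient;
        # the changelog does not specify the exact fallback).
        return False
    return verdict.startswith("salient")
```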
---

### [Lyra-Cortex] v0.1.0 — 2025-09-25

#### Added
- First deployment as a dedicated Proxmox VM (5 vCPU / 18 GB RAM / 100 GB SSD).
- Built **llama.cpp** with the `llama-server` target via CMake.
- Integrated the **Phi-3.5 Mini Instruct (Uncensored, Q4_K_M GGUF)** model.
- Verified **API compatibility** at `/v1/chat/completions`.
- Local test successful via `curl` → ~523-token response generated.
- Performance benchmark: ~11.5 tokens/sec (CPU-only on a Ryzen 7800X).
- Confirmed usable for salience scoring, summarization, and lightweight reasoning.