# Project Lyra — Modular Changelog

All notable changes to Project Lyra are organized by component. The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/) and adheres to [Semantic Versioning](https://semver.org/).

**Last Updated:** 2025-11-28

---

## 🧠 Lyra-Core
##############################################################################

## [Project Lyra v0.5.0] - 2025-11-28

### 🔧 Fixed - Critical API Wiring & Integration
After the major architectural rewire (v0.4.x), this release fixes all critical endpoint mismatches and ensures end-to-end system connectivity.

#### Cortex → Intake Integration ✅
- **Fixed** `IntakeClient` to use the correct Intake v0.2 API endpoints
  - Changed `GET /context/{session_id}` → `GET /summaries?session_id={session_id}`
  - Updated JSON response parsing to extract the `summary_text` field
  - Fixed environment variable name: `INTAKE_API` → `INTAKE_API_URL`
  - Corrected default port: `7083` → `7080`
  - Added a deprecation warning to the `summarize_turn()` method (endpoint removed in Intake v0.2)

#### Relay → UI Compatibility ✅
- **Added** OpenAI-compatible endpoint `POST /v1/chat/completions` (see the request sketch below)
  - Accepts the standard OpenAI format with a `messages[]` array
  - Returns an OpenAI-compatible response structure with `choices[]`
  - Extracts the last message content from the messages array
  - Includes usage metadata (stub values for compatibility)
- **Refactored** Relay to use a shared `handleChatRequest()` function
  - Both `/chat` and `/v1/chat/completions` use the same core logic
  - Eliminates code duplication
  - Consistent error handling across endpoints
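To make the new endpoint concrete, here is a minimal client-side sketch of the request/response shape described above. It is illustrative only: the host, port, and model name are assumptions, and the `usage` values returned by Relay are stubs.

```python
import requests

# Hypothetical Relay address; substitute the host/port of your deployment.
RELAY_URL = "http://localhost:7078/v1/chat/completions"

payload = {
    "model": "lyra",  # placeholder name; Relay routes to its configured backend
    "messages": [
        {"role": "system", "content": "You are Lyra."},
        {"role": "user", "content": "Summarize what changed in v0.5.0."},
    ],
}

resp = requests.post(RELAY_URL, json=payload, timeout=60)
resp.raise_for_status()
data = resp.json()

# OpenAI-compatible shape: the reply text lives in choices[0].message.content.
print(data["choices"][0]["message"]["content"])
# usage contains stub values included only for client compatibility.
print(data.get("usage"))
```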
#### Relay → Intake Connection ✅
- **Fixed** Intake URL fallback in the Relay server configuration
  - Corrected port: `7082` → `7080`
  - Updated endpoint: `/summary` → `/add_exchange`
  - Now properly sends exchanges to Intake for summarization

#### Code Quality & Python Package Structure ✅
- **Added** missing `__init__.py` files to all Cortex subdirectories
  - `cortex/llm/__init__.py`
  - `cortex/reasoning/__init__.py`
  - `cortex/persona/__init__.py`
  - `cortex/ingest/__init__.py`
  - `cortex/utils/__init__.py`
  - Improves package imports and IDE support
- **Removed** unused import in `cortex/router.py`: `from unittest import result`
- **Deleted** empty file `cortex/llm/resolve_llm_url.py` (was 0 bytes, never implemented)

### ✅ Verified Working
Complete end-to-end message flow now operational:

```
UI → Relay (/v1/chat/completions)
  ↓
Relay → Cortex (/reason)
  ↓
Cortex → Intake (/summaries)        [retrieves context]
  ↓
Cortex 4-stage pipeline:
  1. reflection.py    → meta-awareness notes
  2. reasoning.py     → draft answer
  3. refine.py        → polished answer
  4. persona/speak.py → Lyra personality
  ↓
Cortex → Relay (returns persona response)
  ↓
Relay → Intake (/add_exchange)      [async summary]
  ↓
Intake → NeoMem (background memory storage)
  ↓
Relay → UI (final response)
```

### 📝 Documentation
- **Added** this CHANGELOG entry with comprehensive v0.5.0 notes
- **Updated** README.md to reflect the v0.5.0 architecture
  - Documented new endpoints
  - Updated data flow diagrams
  - Clarified Intake v0.2 changes
  - Corrected service descriptions

### 🐛 Issues Resolved
- ❌ Cortex could not retrieve context from Intake (wrong endpoint)
- ❌ UI could not send messages to Relay (endpoint mismatch)
- ❌ Relay could not send summaries to Intake (wrong port/endpoint)
- ❌ Python package imports were implicit (missing `__init__.py`)

### ⚠️ Known Issues (Non-Critical)
- Session management endpoints not implemented in Relay (`GET/POST /sessions/:id`)
- RAG service currently disabled in docker-compose.yml
- Cortex `/ingest` endpoint is a stub returning `{"status": "ok"}`

### 🎯 Migration Notes
If upgrading from v0.4.x:
1. Pull the latest changes from git
2. Verify environment variables in `.env` files:
   - Check `INTAKE_API_URL=http://intake:7080` (not `INTAKE_API`)
   - Verify all service URLs use the correct ports
3. Restart Docker containers: `docker-compose down && docker-compose up -d`
4. Test with a simple message through the UI (a quick scripted check is sketched below)
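As a post-migration sanity check, the sketch below confirms the corrected Intake URL is set and pings the health endpoints documented in this changelog (Cortex `GET /health`, NeoMem/NVGRAM `/health`). Container names and ports are taken from other entries here; other services (Relay, Intake, RAG) may not expose `/health` yet, so they are omitted. Adjust names and ports to your deployment.

```python
import os
import requests

# Expected value after the v0.5.0 rename (INTAKE_API -> INTAKE_API_URL).
intake_url = os.getenv("INTAKE_API_URL", "http://intake:7080")
print(f"INTAKE_API_URL = {intake_url}")

# Health endpoints documented elsewhere in this changelog.
checks = {
    "cortex": "http://cortex:7081/health",
    "neomem": "http://neomem-api:7077/health",
}

for name, url in checks.items():
    try:
        r = requests.get(url, timeout=5)
        print(f"{name}: {r.status_code} {r.json()}")
    except Exception as exc:  # report and continue; this is only a smoke test
        print(f"{name}: unreachable ({exc})")
```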
---

## [Infrastructure v1.0.0] - 2025-11-26

### Changed
- **Environment Variable Consolidation** - major reorganization to eliminate duplication and improve maintainability
  - Consolidated 9 scattered `.env` files into a single-source-of-truth architecture
  - Root `.env` now contains all shared infrastructure (LLM backends, databases, API keys, service URLs)
  - Service-specific `.env` files minimized to only essential overrides:
    - `cortex/.env`: reduced from 42 to 22 lines (operational parameters only)
    - `neomem/.env`: reduced from 26 to 14 lines (LLM naming conventions only)
    - `intake/.env`: kept at 8 lines (already minimal)
  - **Result**: ~24% reduction in total configuration lines (197 → ~150)
- **Docker Compose Consolidation**
  - All services now defined in a single root `docker-compose.yml`
  - Relay service updated with complete configuration (env_file, volumes)
  - Removed redundant `core/docker-compose.yml` (marked as DEPRECATED)
  - Standardized network communication to use Docker container names
- **Service URL Standardization**
  - Internal services use container names: `http://neomem-api:7077`, `http://cortex:7081`
  - External services use IP addresses: `http://10.0.0.43:8000` (vLLM), `http://10.0.0.3:11434` (Ollama)
  - Removed IP/container-name inconsistencies across files

### Added
- **Security Templates**
  - Created `.env.example` files for all services
  - Root `.env.example` with sanitized credentials
  - Service-specific templates: `cortex/.env.example`, `neomem/.env.example`, `intake/.env.example`, `rag/.env.example`
  - All `.env.example` files are safe to commit to version control
- **Documentation**
  - `ENVIRONMENT_VARIABLES.md`: comprehensive reference for all environment variables
    - Variable descriptions, defaults, and usage examples
    - Multi-backend LLM strategy documentation
    - Troubleshooting guide
    - Security best practices
  - `DEPRECATED_FILES.md`: deletion guide for deprecated files with verification steps
- **Enhanced .gitignore**
  - Ignores all `.env` files (including subdirectories)
  - Tracks `.env.example` templates for documentation
  - Ignores the `.env-backups/` directory

### Removed
- `core/.env` - redundant with root `.env`, now deleted
- `core/docker-compose.yml` - consolidated into the main compose file (marked DEPRECATED)

### Fixed
- Eliminated duplicate `OPENAI_API_KEY` across 5+ files
- Eliminated duplicate LLM backend URLs across 4+ files
- Eliminated duplicate database credentials across 3+ files
- Resolved Cortex `environment:` section override in docker-compose (now uses env_file)

### Architecture
- **Multi-Backend LLM Strategy**: the root `.env` provides all backend OPTIONS (PRIMARY, SECONDARY, CLOUD, FALLBACK); services choose which to USE
  - Cortex → vLLM (PRIMARY) for autonomous reasoning
  - NeoMem → Ollama (SECONDARY) + OpenAI embeddings
  - Intake → vLLM (PRIMARY) for summarization
  - Relay → fallback chain with user preference
  - Preserves per-service flexibility while eliminating URL duplication

### Migration
- All original `.env` files backed up to `.env-backups/` with timestamp `20251126_025334`
- Rollback plan documented in `ENVIRONMENT_VARIABLES.md`
- Verification steps provided in `DEPRECATED_FILES.md`

---

## [Lyra_RAG v0.1.0] - 2025-11-07

### Added
- Initial standalone RAG module for Project Lyra.
- Persistent ChromaDB vector store (`./chromadb`).
- Importer `rag_chat_import.py` with:
  - Recursive folder scanning and category tagging.
  - Smart chunking (~5k chars).
  - SHA-1 deduplication and chat-ID metadata.
  - Timestamp fields (`file_modified`, `imported_at`).
  - Background-safe operation (`nohup`/`tmux`).
- 68 Lyra-category chats imported:
  - **6,556 new chunks added**
  - **1,493 duplicates skipped**
  - **7,997 total vectors** now stored.

### API
- `/rag/search` FastAPI endpoint implemented (port 7090).
- Supports natural-language queries and returns the top related excerpts (see the query sketch below).
- Added an answer-synthesis step using `gpt-4o-mini`.

### Verified
- Successful recall of Lyra-Core development history (v0.3.0 snapshot).
- Correct metadata and category tagging for all new imports.

### Next Planned
- Optional `where` filter parameter for category/date queries.
- Graceful "no results" handler for empty retrievals.
- `rag_docs_import.py` for PDFs and other document types.
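A minimal query sketch against the `/rag/search` endpoint described above. Only the route, port, and host come from this changelog; the HTTP method (assumed to be a JSON POST) and the field names (`query`, `answer`, `results`) are illustrative assumptions, not a documented contract.

```python
import requests

# Host from the Cortex v0.4.1 notes; adjust for your network.
RAG_URL = "http://10.0.0.41:7090/rag/search"

# Hypothetical request shape: a natural-language query string.
resp = requests.post(
    RAG_URL,
    json={"query": "How did Lyra-Core v0.3.0 handle salience?"},
    timeout=30,
)
resp.raise_for_status()
body = resp.json()

# Hypothetical response shape: a synthesized answer plus the top excerpts.
print(body.get("answer"))
for excerpt in body.get("results", []):
    print("-", excerpt)
```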
## [Lyra Core v0.3.2 + Web UI v0.2.0] - 2025-10-28

### Added
- **New UI**
  - Cleaned up the UI look and feel.
- **Sessions**
  - Sessions now persist over time.
  - Ability to create new sessions or load sessions from a previous instance.
  - Changing the session updates what the prompt sends to Relay (it no longer prompts with messages from other sessions).
  - Relay is correctly wired in.

## [Lyra-Core 0.3.1] - 2025-10-09

### Added
- **NVGRAM Integration (Full Pipeline Reconnected)**
  - Replaced the legacy Mem0 service with the NVGRAM microservice (`nvgram-api` @ port 7077).
  - Updated `server.js` in Relay to route all memory ops via `${NVGRAM_API}/memories` and `/search`.
  - Added `.env` variable:
    ```
    NVGRAM_API=http://nvgram-api:7077
    ```
  - Verified end-to-end Lyra conversation persistence:
    - `relay → nvgram-api → postgres/neo4j → relay → ollama → ui`
    - ✅ Memories stored, retrieved, and re-injected successfully.

### Changed
- Renamed `MEM0_URL` → `NVGRAM_API` across all Relay environment configs.
- Updated Docker Compose service dependency order:
  - `relay` now depends on the `nvgram-api` healthcheck.
  - Removed `mem0` references and volumes.
- Minor cleanup to the Persona fetch block (null checks and a safer default persona string).

### Fixed
- Relay startup no longer crashes when NVGRAM is unavailable — deferred connection handling.
- `/memories` POST failures no longer crash Relay; they are now logged gracefully as `relay error Error: memAdd failed: 500`.
- Improved injected prompt debugging (`DEBUG_PROMPT=true` now prints clean JSON).

### Goals / Next Steps
- Add salience visualization (e.g., memory weights displayed in the injected system message).
- Begin schema alignment with NVGRAM v0.1.2 for confidence scoring.
- Add Relay auto-retry for transient 500 responses from NVGRAM.

---

## [Lyra-Core] v0.3.1 - 2025-09-27

### Changed
- Removed the salience filter logic; Cortex is now the default annotator.
- All user messages stored in Mem0; no discard tier applied.

### Added
- Cortex annotations (`metadata.cortex`) now attached to memories.
- Debug logging improvements:
  - Pretty-print Cortex annotations
  - Injected prompt preview
  - Memory search hit list with scores
- `.env` toggle (`CORTEX_ENABLED`) to bypass Cortex when needed.

### Fixed
- Parsing failures from Markdown-wrapped Cortex JSON via a fence cleaner.
- Relay no longer "hangs" on malformed Cortex outputs.

---

### [Lyra-Core] v0.3.0 — 2025-09-26

#### Added
- Implemented **salience filtering** in Relay:
  - `.env` configurable: `SALIENCE_ENABLED`, `SALIENCE_MODE`, `SALIENCE_MODEL`, `SALIENCE_API_URL`.
  - Supports `heuristic` and `llm` classification modes.
  - LLM-based salience filter integrated with the Cortex VM running `llama-server`.
- Logging improvements:
  - Added debug logs for salience mode, raw LLM output, and unexpected outputs.
  - Fail-closed behavior for unexpected LLM responses.
- Successfully tested **Phi-3.5-mini** and **Qwen2-0.5B-Instruct** as salience classifiers.
- Verified end-to-end flow: Relay → salience filter → Mem0 add/search → Persona injection → LLM reply.

#### Changed
- Refactored `server.js` to gate `mem.add()` calls behind the salience filter.
- Updated `.env` to support `SALIENCE_MODEL`.

#### Known Issues
- Small models (e.g. Qwen2-0.5B) tend to over-classify as "salient".
- Phi-3.5-mini sometimes returns truncated tokens ("sali", "fi").
- CPU-only inference is functional but limited; larger models are recommended once a GPU is available.

---

### [Lyra-Core] v0.2.0 — 2025-09-24

#### Added
- Migrated Relay to use the `mem0ai` SDK instead of raw fetch calls.
- Implemented `sessionId` support (client-supplied, falling back to `default`).
- Added debug logs for memory add/search.
- Cleaned up the Relay structure for clarity.

---

### [Lyra-Core] v0.1.0 — 2025-09-23

#### Added
- First working MVP of the **Lyra Core Relay**.
- Relay service accepts `POST /v1/chat/completions` (OpenAI-compatible).
- Memory integration with Mem0:
  - `POST /memories` on each user message.
  - `POST /search` before the LLM call.
- Persona Sidecar integration (`GET /current`).
- OpenAI GPT + Ollama (MythoMax) support in Relay.
- Simple browser-based chat UI (talks to Relay at `http://:7078`).
- `.env` standardization for Relay + Mem0 + Postgres + Neo4j.
- Working Neo4j + Postgres backing stores for Mem0.
- Initial MVP relay service with raw fetch calls to Mem0.
- Dockerized with a basic healthcheck.

#### Fixed
- Resolved a crash loop in Neo4j by restricting env vars (`NEO4J_AUTH` only).
- Relay now correctly reads `MEM0_URL` and `MEM0_API_KEY` from `.env`.

#### Known Issues
- No feedback loop (thumbs up/down) yet.
- Forget/delete flow is manual (via memory IDs).
- Memory latency is ~1–4 s depending on the embedding model.

---

## 🧩 lyra-neomem (formerly NVGRAM / Lyra-Mem0)
##############################################################################

## [NeoMem 0.1.2] - 2025-10-27

### Changed
- **Renamed NVGRAM to NeoMem**
  - All future updates will be under the name NeoMem.
  - Features have not changed.
## [NVGRAM 0.1.1] - 2025-10-08

### Added
- **Async Memory Rewrite (Stability + Safety Patch)**
  - Introduced an `AsyncMemory` class with fully asynchronous vector and graph store writes.
  - Added **input sanitation** to prevent embedding errors (`'list' object has no attribute 'replace'`).
  - Implemented a `flatten_messages()` helper in the API layer to clean malformed payloads.
  - Added structured request logging via `RequestLoggingMiddleware` (FastAPI middleware).
  - Health endpoint (`/health`) now returns structured JSON `{status, version, service}`.
  - Startup logs now include a **sanitized embedder config** with API keys masked for safety:
    ```
    >>> Embedder config (sanitized): {'provider': 'openai', 'config': {'model': 'text-embedding-3-small', 'api_key': '***'}}
    ✅ Connected to Neo4j on attempt 1
    🧠 NVGRAM v0.1.1 — Neural Vectorized Graph Recall and Memory initialized
    ```

### Changed
- Replaced the synchronous `Memory.add()` with an async-safe version supporting concurrent vector + graph writes.
- Normalized indentation and cleaned duplicate `main.py` references under `/nvgram/` vs `/nvgram/server/`.
- Removed redundant `FastAPI()` app reinitialization.
- Updated internal logging to an INFO-level timing format, e.g. `2025-10-08 21:48:45 [INFO] POST /memories -> 200 (11189.1 ms)`.
- Deprecated `@app.on_event("startup")` (FastAPI deprecation warning) → will migrate to a `lifespan` handler in v0.1.2.

### Fixed
- Eliminated a repeating 500 error from the OpenAI embedder caused by non-string message content.
- Masked API key leaks from boot logs.
- Ensured Neo4j reconnects gracefully on the first retry.

### Goals / Next Steps
- Integrate **salience scoring** and **embedding confidence weight** fields into the Postgres schema.
- Begin testing with the full Lyra Relay + Persona Sidecar pipeline for live session memory recall.
- Migrate from the deprecated `on_event` → `lifespan` pattern in 0.1.2.

---

## [NVGRAM 0.1.0] - 2025-10-07

### Added
- **Initial fork of Mem0 → NVGRAM**:
  - Created a fully independent, local-first memory engine based on Mem0 OSS.
  - Renamed all internal modules, Docker services, and environment variables from `mem0` → `nvgram`.
  - New service name: **`nvgram-api`**, default port **7077**.
  - Maintains the same API endpoints (`/memories`, `/search`) for drop-in compatibility with Lyra Core (see the client sketch after this entry).
  - Uses **FastAPI**, **Postgres**, and **Neo4j** as persistent backends.
- Verified clean startup:
  ```
  ✅ Connected to Neo4j on attempt 1
  INFO: Uvicorn running on http://0.0.0.0:7077
  ```
- `/docs` and `/openapi.json` confirmed reachable and functional.

### Changed
- Removed the dependency on the external `mem0ai` SDK — all logic is now local.
- Re-pinned requirements:
  - fastapi==0.115.8
  - uvicorn==0.34.0
  - pydantic==2.10.4
  - python-dotenv==1.0.1
  - psycopg>=3.2.8
  - ollama
- Adjusted `docker-compose` and `.env` templates to use the new NVGRAM naming and image paths.

### Goals / Next Steps
- Integrate NVGRAM as the new default backend in Lyra Relay.
- Deprecate remaining Mem0 references and archive old configs.
- Begin versioning as a standalone project (`nvgram-core`, `nvgram-api`, etc.).
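For orientation, here is a rough client-side sketch of the add/search round trip against `nvgram-api` (now NeoMem). The routes and port come from this entry; the JSON field names (`messages`, `user_id`, `query`) follow Mem0 OSS conventions and are assumptions here, not a documented contract.

```python
import requests

NVGRAM_API = "http://nvgram-api:7077"  # container name + port from this entry

# Store an exchange. Field names follow Mem0 OSS conventions and may differ
# from the actual NVGRAM/NeoMem schema.
add_resp = requests.post(
    f"{NVGRAM_API}/memories",
    json={
        "messages": [{"role": "user", "content": "I like coffee in the morning."}],
        "user_id": "brian",
    },
    timeout=30,
)
print(add_resp.status_code, add_resp.json())

# Semantic search over stored memories.
search_resp = requests.post(
    f"{NVGRAM_API}/search",
    json={"query": "What does the user drink in the morning?", "user_id": "brian"},
    timeout=30,
)
print(search_resp.json())
```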
---

## [Lyra-Mem0 0.3.2] - 2025-10-05

### Added
- Support for **Ollama LLM reasoning** alongside OpenAI embeddings:
  - Introduced `LLM_PROVIDER=ollama`, `LLM_MODEL`, and `OLLAMA_HOST` in `.env.3090`.
  - Verified the local 3090 setup using `qwen2.5:7b-instruct-q4_K_M`.
  - Split processing pipeline:
    - Embeddings → OpenAI `text-embedding-3-small`
    - LLM → local Ollama (`http://10.0.0.3:11434/api/chat`).
- Added `.env.3090` template for self-hosted inference nodes.
- Integrated runtime diagnostics and seeder progress tracking:
  - File-level + message-level progress bars.
  - Retry/back-off logic for timeouts (3 attempts).
  - Event logging (`ADD / UPDATE / NONE`) for every memory record.
- Expanded Docker health checks for the Postgres, Qdrant, and Neo4j containers.
- Added a GPU-friendly long-run configuration for continuous seeding (validated on an RTX 3090).

### Changed
- Updated the `main.py` configuration block to load:
  - `LLM_PROVIDER`, `LLM_MODEL`, and `OLLAMA_BASE_URL`.
  - Fallback to OpenAI if Ollama is unavailable.
- Adjusted `docker-compose.yml` mount paths to correctly map `/app/main.py`.
- Normalized `.env` loading so `mem0-api` and the host environment share identical values.
- Improved seeder logging and progress telemetry for clearer diagnostics.
- Added an explicit `temperature` field to `DEFAULT_CONFIG['llm']['config']` for tuning future local inference runs.

### Fixed
- Resolved a crash during startup: `TypeError: OpenAIConfig.__init__() got an unexpected keyword argument 'ollama_base_url'`.
- Corrected a mount type mismatch (file vs directory) causing `OCI runtime create failed` errors.
- Prevented duplicate or partial postings when retry logic triggered multiple concurrent requests.
- "Unknown event" warnings are now safely ignored (they no longer break the seeding loop).
- Confirmed full dual-provider operation in logs (`api.openai.com` + `10.0.0.3:11434/api/chat`).

### Observations
- Stable GPU utilization: ~8 GB VRAM @ 92% load, ≈67 °C under sustained seeding.
- The next revision will re-format the seed JSON to preserve `role` context (user vs assistant).

---

## [Lyra-Mem0 0.3.1] - 2025-10-03

### Added
- HuggingFace TEI integration (local 3090 embedder).
- Dual-mode environment switch between OpenAI cloud and local.
- CSV export of memories from Postgres (`payload->>'data'`).

### Fixed
- `.env` CRLF vs LF line-ending issues.
- Local seeding is now possible via the running HuggingFace server.

---

## [Lyra-mem0 0.3.0]

### Added
- Support for **Ollama embeddings** in the Mem0 OSS container:
  - Added the ability to configure `EMBEDDER_PROVIDER=ollama` and set `EMBEDDER_MODEL` + `OLLAMA_HOST` via `.env`.
  - Mounted a `main.py` override from the host into the container to load a custom `DEFAULT_CONFIG` (see the config sketch below).
  - Installed the `ollama` Python client into the custom API container image.
- `.env.3090` file created for external embedding mode (3090 machine):
  - EMBEDDER_PROVIDER=ollama
  - EMBEDDER_MODEL=mxbai-embed-large
  - OLLAMA_HOST=http://10.0.0.3:11434
- Workflow to support **multiple embedding modes**:
  1. Fast LAN-based 3090/Ollama embeddings
  2. Local-only CPU embeddings (Lyra Cortex VM)
  3. OpenAI fallback embeddings

### Changed
- `docker-compose.yml` updated to mount the local `main.py` and `.env.3090`.
- Built a **custom Dockerfile** (`mem0-api-server:latest`) extending the base image with `pip install ollama`.
- Updated `requirements.txt` to include the `ollama` package.
- Adjusted the Mem0 container config so `main.py` pulls environment variables with `dotenv` (`load_dotenv()`).
- Tested the new embeddings path with a curl `/memories` API call.

### Fixed
- Resolved a container boot failure caused by the missing `ollama` dependency (`ModuleNotFoundError`).
- Fixed a config overwrite issue where rebuilding the container restored the stock `main.py`.
- Worked around a Neo4j error (`vector.similarity.cosine(): mismatched vector dimensions`) by confirming OpenAI vs. Ollama embedding vector sizes and planning to standardize on 1536 dimensions.
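A minimal sketch of what the env-driven `DEFAULT_CONFIG` override described above might look like. The nested config shape loosely follows Mem0 OSS conventions and is an assumption, not the project's actual `main.py`.

```python
import os

from dotenv import load_dotenv  # python-dotenv, as pinned elsewhere in this changelog

load_dotenv()  # pulls EMBEDDER_* / OLLAMA_HOST from .env or .env.3090

provider = os.getenv("EMBEDDER_PROVIDER", "openai")
embedder_config = {"model": os.getenv("EMBEDDER_MODEL", "text-embedding-3-small")}

if provider == "ollama":
    # Ollama-only option; passing this key to the OpenAI config is what caused the
    # 'unexpected keyword argument ollama_base_url' crash noted in 0.3.2's Fixed section.
    embedder_config["ollama_base_url"] = os.getenv("OLLAMA_HOST", "http://10.0.0.3:11434")

# Approximate Mem0-style config shape; the real DEFAULT_CONFIG may differ.
DEFAULT_CONFIG = {"embedder": {"provider": provider, "config": embedder_config}}
```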
---

## [Lyra-mem0 v0.2.1]

### Added
- **Seeding pipeline**:
  - Built a Python seeder script to bulk-insert raw Cloud Lyra exports into Mem0 (a rough sketch follows this entry).
  - Implemented an incremental seeding option (skip existing memories, only add new ones).
  - Verified the insert process with the Postgres-backed history DB and a curl `/memories/search` sanity check.
- **Ollama embedding support** in the Mem0 OSS container:
  - Added configuration for `EMBEDDER_PROVIDER=ollama`, `EMBEDDER_MODEL`, and `OLLAMA_HOST` via `.env`.
  - Created a `.env.3090` profile for the LAN-connected 3090 machine with Ollama.
  - Set up three embedding modes:
    1. Fast LAN-based 3090/Ollama
    2. Local-only CPU model (Lyra Cortex VM)
    3. OpenAI fallback

### Changed
- Updated `main.py` to load configuration from `.env` using `dotenv` and support multiple embedder backends.
- Mounted the host `main.py` into the container so local edits persist across rebuilds.
- Updated `docker-compose.yml` to mount `.env.3090` and support swapping between profiles.
- Built a **custom Dockerfile** (`mem0-api-server:latest`) including `pip install ollama`.
- Updated `requirements.txt` with the `ollama` dependency.
- Adjusted the startup flow so the container automatically connects to the external Ollama host (LAN IP).
- Added logging to confirm model pulls and embedding requests.

### Fixed
- The seeder process originally failed on old memories — it now skips duplicates and continues the batch.
- Resolved a container boot error (`ModuleNotFoundError: ollama`) by extending the image.
- Fixed an overwrite issue where the stock `main.py` replaced the custom config during rebuild.
- Worked around the Neo4j `vector.similarity.cosine()` dimension mismatch by investigating the OpenAI (1536-dim) vs Ollama (1024-dim) schemas.

### Notes
- To fully unify embedding modes, a Hugging Face / local model with **1536-dim embeddings** will be needed (to match OpenAI's schema and avoid Neo4j errors).
- The current Ollama model (`mxbai-embed-large`) works, but returns 1024-dim vectors.
- The seeder workflow is validated but should be wrapped in a repeatable weekly run for full Cloud→Local sync.
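A rough sketch of the incremental seeding loop described in this entry. The export format, the dedup strategy (search-before-add), the API address, and the field names are all assumptions for illustration; the actual seeder script is not reproduced here.

```python
import json

import requests

MEM0_API = "http://localhost:8000"  # hypothetical Mem0 API address; adjust for your stack
USER_ID = "brian"


def already_stored(text: str) -> bool:
    """Crude dedup check: search first and skip near-identical hits (assumed strategy)."""
    hits = requests.post(
        f"{MEM0_API}/search", json={"query": text, "user_id": USER_ID}, timeout=30
    ).json()
    return any(text.strip() == h.get("memory", "").strip() for h in hits.get("results", []))


with open("cloud_lyra_export.json", encoding="utf-8") as fh:  # hypothetical export file
    exchanges = json.load(fh)

for item in exchanges:
    text = item["content"]
    if already_stored(text):
        continue  # incremental mode: only add new memories
    requests.post(
        f"{MEM0_API}/memories",
        json={
            "messages": [{"role": item.get("role", "user"), "content": text}],
            "user_id": USER_ID,
        },
        timeout=60,
    )
```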
---

## [Lyra-Mem0 v0.2.0] - 2025-09-30

### Added
- Standalone **Lyra-Mem0** stack created at `~/lyra-mem0/`
  - Includes **Postgres (pgvector)**, **Qdrant**, **Neo4j**, and **SQLite** for history tracking.
- Added a working `docker-compose.mem0.yml` and a custom `Dockerfile` for building the Mem0 API server.
- Verified REST API functionality:
  - `POST /memories` works for adding memories.
  - `POST /search` works for semantic search.
- Successful end-to-end test with a persisted memory: *"Likes coffee in the morning"* → retrievable via search. ✅

### Changed
- Split the architecture into **modular stacks**:
  - `~/lyra-core` (Relay, Persona-Sidecar, etc.)
  - `~/lyra-mem0` (Mem0 OSS memory stack)
- Removed the old embedded mem0 containers from the Lyra-Core compose file.
- Added a Lyra-Mem0 section to README.md.

### Next Steps
- Wire **Relay → Mem0 API** (integration not yet complete).
- Add integration tests to verify persistence and retrieval from within Lyra-Core.

---

## 🧠 Lyra-Cortex
##############################################################################

## [Cortex v0.5] - 2025-11-13

### Added
- **New `reasoning.py` module**
  - Async reasoning engine.
  - Accepts the user prompt, identity, RAG block, and reflection notes.
  - Produces draft internal answers.
  - Uses the primary backend (vLLM).
- **New `reflection.py` module**
  - Fully async.
  - Produces actionable JSON "internal notes."
  - Enforces a strict JSON schema with fallback parsing.
  - Forces the cloud backend (`backend_override="cloud"`).
- Integrated `refine.py` into the Cortex reasoning pipeline:
  - New stage between reflection and persona.
  - Runs exclusively on the primary vLLM backend (MI50).
  - Produces the final, internally consistent output for the downstream persona layer.
- **Backend override system**
  - Each LLM call can now select its own backend (see the dispatch sketch below).
  - Enables multi-LLM cognition: Reflection → cloud, Reasoning → primary.
- **Identity loader**
  - Added `identity.py` with `load_identity()` for consistent persona retrieval.
- **Ingest handler**
  - Async stub created for the future Intake → NeoMem → RAG pipeline.

### Changed
- Unified LLM backend URL handling across Cortex:
  - ENV variables must now contain FULL API endpoints.
  - Removed all internal path-appending (e.g. `.../v1/completions`).
  - `llm_router.py` rewritten to use env-provided URLs as-is.
  - Ensures consistent behavior between draft, reflection, refine, and persona.
- **Rebuilt `main.py`**
  - Removed the old annotation/analysis logic.
  - New structure: load identity → get RAG → reflect → reason → return draft + notes.
  - Routes are now clean and minimal (`/reason`, `/ingest`, `/health`).
  - Async path throughout Cortex.
- **Refactored `llm_router.py`**
  - Removed the old fallback logic during overrides.
  - OpenAI requests now use `/v1/chat/completions`.
  - Added proper OpenAI Authorization headers.
  - Distinct payload formats for vLLM vs OpenAI.
  - Unified, correct parsing across models.
- **Simplified Cortex architecture**
  - Removed the deprecated `context.py` and old reasoning code.
  - Relay completely decoupled from smart behavior.
- Updated environment specification:
  - `LLM_PRIMARY_URL` now set to `http://10.0.0.43:8000/v1/completions`.
  - `LLM_SECONDARY_URL` remains `http://10.0.0.3:11434/api/generate` (Ollama).
  - `LLM_CLOUD_URL` set to `https://api.openai.com/v1/chat/completions`.

### Fixed
- Resolved an endpoint conflict where:
  - The router expected base URLs.
  - Refine expected full URLs.
  - Refine always fell back because it hit an incorrect endpoint.
  - Fixed by standardizing full-URL behavior across the entire system.
- The reflection layer no longer fails silently (it previously returned `[""]` due to MythoMax).
- Resolved 404/401 errors caused by incorrect OpenAI URL endpoints.
- No more double-routing through vLLM during reflection.
- Corrected async/sync mismatches in multiple locations.
- Eliminated the double-path bug (`/v1/completions/v1/completions`) caused by the previous router logic.

### Removed
- Legacy `annotate` and `reason_check` glue logic from the old architecture.
- Old backend-probing junk code.
- Stale imports and unused modules left over from the previous prototype.

### Verified
- Cortex → vLLM (MI50) → refine → final_output now functions correctly.
- Refine shows `used_primary_backend: true` and no fallback.
- A manual curl test confirms endpoint accuracy.

### Known Issues
- Refine sometimes prefixes output with `"Final Answer:"`; the next version will sanitize this.
- Hallucinations in draft_output persist due to weak grounding (fix planned in reasoning + RAG).

### Pending / Known Issues
- **RAG service does not exist** — requires a containerized FastAPI service.
- The reasoning layer lacks a self-revision loop (deliberate thought cycle).
- No speak/persona generation layer yet (`speak.py` planned).
- Intake summaries are not yet routed into the RAG or reflection layer.
- No refinement engine between reasoning and speak.
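A condensed sketch of the per-call backend override and payload split introduced in this release. Only the env variable names, URLs, and the vLLM-completions vs OpenAI-chat distinction come from this changelog; the function name, use of `httpx`, and default model handling are illustrative assumptions, not the actual `llm_router.py`.

```python
import os

import httpx  # async-friendly HTTP client; an assumption, not necessarily what llm_router.py uses

# Env vars hold FULL endpoints, as required by this release (no path-appending).
BACKENDS = {
    "primary": os.getenv("LLM_PRIMARY_URL", "http://10.0.0.43:8000/v1/completions"),
    "cloud": os.getenv("LLM_CLOUD_URL", "https://api.openai.com/v1/chat/completions"),
}


async def call_llm(prompt: str, backend_override: str = "primary") -> str:
    """Send one prompt to the selected backend, using the env-provided URL as-is."""
    url = BACKENDS[backend_override]
    headers = {}
    if backend_override == "cloud":
        # OpenAI chat-completions payload + Authorization header.
        headers["Authorization"] = f"Bearer {os.getenv('OPENAI_API_KEY', '')}"
        payload = {
            "model": os.getenv("LLM_CLOUD_MODEL", "gpt-4o-mini"),  # placeholder default
            "messages": [{"role": "user", "content": prompt}],
        }
    else:
        # vLLM plain-completions payload.
        payload = {
            "model": os.getenv("LLM_PRIMARY_MODEL", ""),  # placeholder default
            "prompt": prompt,
            "max_tokens": 512,
        }

    async with httpx.AsyncClient(timeout=120) as client:
        r = await client.post(url, json=payload, headers=headers)
        r.raise_for_status()
        data = r.json()

    # Unified parsing: chat responses carry message.content, completions carry text.
    choice = data["choices"][0]
    return choice.get("message", {}).get("content") or choice.get("text", "")
```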
### Notes
This is the largest structural change to Cortex so far. It establishes:
- multi-model cognition
- clean layering
- identity + reflection separation
- correct async code
- deterministic backend routing
- predictable JSON reflection

The system is now ready for:
- refinement loops
- a persona-speaking layer
- containerized RAG
- long-term memory integration
- true emergent-behavior experiments

## [Cortex v0.4.1] - 2025-11-05

### Added
- **RAG integration**
  - Added `rag.py` with `query_rag()` and `format_rag_block()`.
  - Cortex now queries the local RAG API (`http://10.0.0.41:7090/rag/search`) for contextual augmentation.
  - Synthesized answers and top excerpts are injected into the reasoning prompt.

### Changed
- **Revised `/reason` endpoint**
  - Now builds unified context blocks (see the sketch after this entry):
    - [Intake] → recent summaries
    - [RAG] → contextual knowledge
    - [User Message] → current input
  - Calls `call_llm()` for the first pass, then `reflection_loop()` for meta-evaluation.
  - Returns `cortex_prompt`, `draft_output`, `final_output`, and a normalized reflection.
- **Reflection pipeline stability**
  - Cleaned parsing to normalize JSON vs. text reflections.
  - Added fallback handling for malformed or non-JSON outputs.
  - Improved logging to show raw JSON, extracted fields, and the normalized summary.
- **Async summarization (Intake v0.2.1)**
  - Intake summaries now run in background threads to avoid blocking Cortex.
  - Summaries (L1–L∞) are logged asynchronously with [BG] tags.
- **Environment & networking fixes**
  - Verified that `.env` variables propagate correctly inside the Cortex container.
  - Confirmed Docker network connectivity between Cortex, Intake, NeoMem, and RAG (shared `serversdown_lyra_net`).
  - Adjusted localhost calls to service-IP mapping (10.0.0.41 for the Cortex host).
- **Behavioral updates**
  - Cortex now performs conversation reflection (on user intent) and self-reflection (on its own answers).
  - RAG context successfully grounds reasoning outputs.
  - Intake and NeoMem confirmed receiving summaries via `/add_exchange`.
  - Log clarity pass: all reflective and contextual blocks are clearly labeled.

### Known Gaps / Next Steps
- **NeoMem tuning**
  - Improve retrieval latency and relevance.
  - Implement a dedicated `/reflections/recent` endpoint for Cortex.
  - Migrate to Cortex-first ingestion (Relay → Cortex → NeoMem).
- **Cortex enhancements**
  - Add persistent reflection recall (use prior reflections as meta-context).
  - Improve the reflection JSON structure (`"insight"`, `"evaluation"`, `"next_action"` → guaranteed fields).
  - Tighten temperature and prompt control for factual consistency.
- **RAG optimization**
  - Add source ranking, filtering, and multi-vector hybrid search.
  - Cache RAG responses per session to reduce duplicate calls.
- **Documentation / monitoring**
  - Add a health route for RAG and Intake summaries.
  - Include internal latency metrics in the `/health` endpoint.
  - Consolidate logs into a unified "Lyra Cortex Console" for tracing all module calls.
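A small sketch of the unified context-block assembly the revised `/reason` endpoint performs, as described in the v0.4.1 entry above. The block labels come from that entry; the function name and formatting details are illustrative.

```python
def build_cortex_prompt(intake_summaries: list[str], rag_block: str, user_message: str) -> str:
    """Assemble the unified context block fed to the first reasoning pass."""
    # Labels mirror the structure described in the v0.4.1 notes.
    parts = [
        "[Intake]",
        "\n".join(intake_summaries) or "(no recent summaries)",
        "[RAG]",
        rag_block or "(no retrieved context)",
        "[User Message]",
        user_message,
    ]
    return "\n\n".join(parts)


# Example usage (illustrative values):
prompt = build_cortex_prompt(
    ["Brian and Lyra debugged the Relay endpoints."],
    "Relevant excerpt: Cortex v0.3.0 introduced multi-backend reasoning.",
    "What changed in the reasoning pipeline?",
)
print(prompt)
```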
## [Cortex v0.3.0] - 2025-10-31

### Added
- **Cortex Service (FastAPI)**
  - New standalone reasoning engine (`cortex/main.py`) with endpoints:
    - `GET /health` – reports the active backend + NeoMem status.
    - `POST /reason` – evaluates `{prompt, response}` pairs.
    - `POST /annotate` – experimental text analysis.
  - Background NeoMem health monitor (5-minute interval).
- **Multi-Backend Reasoning Support**
  - Added environment-driven backend selection via `LLM_FORCE_BACKEND`.
  - Supports:
    - **Primary** → vLLM (MI50 node @ 10.0.0.43)
    - **Secondary** → Ollama (3090 node @ 10.0.0.3)
    - **Cloud** → OpenAI API
    - **Fallback** → llama.cpp (CPU)
  - Introduced per-backend model variables: `LLM_PRIMARY_MODEL`, `LLM_SECONDARY_MODEL`, `LLM_CLOUD_MODEL`, `LLM_FALLBACK_MODEL`.
- **Response Normalization Layer**
  - Implemented `normalize_llm_response()` to merge streamed outputs and repair malformed JSON.
  - Handles Ollama's multi-line streaming and MythoMax's missing-punctuation issues.
  - Prints concise debug previews of merged content.
- **Environment Simplification**
  - Each service (`intake`, `cortex`, `neomem`) now maintains its own `.env` file.
  - Removed reliance on a shared/global env file to prevent cross-contamination.
  - Verified Docker Compose networking across containers.

### Changed
- Refactored `reason_check()` to dynamically switch between **prompt** and **chat** mode depending on the backend.
- Enhanced startup logs to announce the active backend, model, URL, and mode.
- Improved error handling with clearer "Reasoning error" messages.

### Fixed
- Corrected broken vLLM endpoint routing (`/v1/completions`).
- Stabilized cross-container health reporting for NeoMem.
- Resolved JSON parse failures caused by streaming chunk delimiters.

---

## Next Planned – [v0.4.0]

### Planned Additions
- **Reflection Mode**
  - Introduce `REASONING_MODE=factcheck|reflection`.
  - Output schema:
    ```json
    {
      "insight": "...",
      "evaluation": "...",
      "next_action": "..."
    }
    ```
- **Cortex-First Pipeline**
  - UI → Cortex → [Reflection + Verifier + Memory] → Speech LLM → User.
  - Allows Lyra to "think before speaking."
- **Verifier Stub**
  - New `/verify` endpoint for search-based factual grounding.
  - Asynchronous external truth checking.
- **Memory Integration**
  - Feed reflective outputs into NeoMem.
  - Enable "dream" cycles for autonomous self-review.

---

**Status:** 🟢 Stable Core – multi-backend reasoning operational.
**Next milestone:** *v0.4.0 — Reflection Mode + Thought Pipeline orchestration.*

---

### [Intake] v0.1.0 - 2025-10-27

- Receives messages from Relay and summarizes them in a cascading format.
- Continues to summarize small batches of exchanges while also generating large-scale conversational summaries (L20).
- Currently logs summaries to a `.log` file in `/project-lyra/intake-logs/`.

**Next Steps**
- Feed Intake output into NeoMem.
- Generate a daily/hourly/etc. overall summary (e.g., "Today Brian and Lyra worked on x, y, and z").
- Generate session-aware summaries, each with its own intake hopper.

### [Lyra-Cortex] v0.2.0 — 2025-09-26

**Added**
- Integrated **llama-server** on the dedicated Cortex VM (Proxmox).
- Verified Phi-3.5-mini-instruct_Uncensored-Q4_K_M running with 8 vCPUs.
- Benchmarked Phi-3.5-mini performance:
  - ~18 tokens/sec CPU-only on a Ryzen 7 7800X.
  - Salience classification functional but sometimes inconsistent ("sali", "fi", "jamming").
- Tested **Qwen2-0.5B-Instruct GGUF** as an alternative salience classifier:
  - Much faster throughput (~350 tokens/sec prompt, ~100 tokens/sec eval).
  - More responsive, but over-classifies messages as "salient."
- Established `.env` integration for the model ID (`SALIENCE_MODEL`), enabling hot-swap between models (a classifier call is sketched below).

**Known Issues**
- Small models tend to drift or over-classify.
- CPU-only 7B+ models are expected to be slow; GPU passthrough recommended for larger models.
- Need to set up a `systemd` service for `llama-server` to auto-start on VM reboot.
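To illustrate how a hot-swappable salience classifier like the ones benchmarked above can be called, here is a hedged sketch against llama-server's OpenAI-compatible `/v1/chat/completions` route (the same route verified in the v0.1.0 entry below). The host, prompt wording, and verdict parsing are illustrative assumptions, not the Relay's actual `server.js` implementation, and the changelog does not specify the exact fail-closed fallback.

```python
import os

import requests

# Hypothetical llama-server address; configure via SALIENCE_API_URL as in Lyra-Core v0.3.0.
SALIENCE_API_URL = os.getenv("SALIENCE_API_URL", "http://cortex-vm:8080/v1/chat/completions")
SALIENCE_MODEL = os.getenv("SALIENCE_MODEL", "phi-3.5-mini-instruct")  # hot-swappable via .env


def is_salient(message: str) -> bool:
    """Ask the classifier for a one-word verdict; fail closed on anything unexpected."""
    payload = {
        "model": SALIENCE_MODEL,
        "messages": [
            {"role": "system", "content": "Reply with exactly one word: salient or ignore."},
            {"role": "user", "content": message},
        ],
        "temperature": 0.0,
    }
    try:
        r = requests.post(SALIENCE_API_URL, json=payload, timeout=20)
        r.raise_for_status()
        verdict = r.json()["choices"][0]["message"]["content"].strip().lower()
    except Exception:
        # Fail closed on errors or unexpected output (treated here as not salient;
        # the changelog does not specify the exact fallback).
        return False
    return verdict.startswith("salient")
```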
---

### [Lyra-Cortex] v0.1.0 — 2025-09-25

#### Added
- First deployment as a dedicated Proxmox VM (5 vCPU / 18 GB RAM / 100 GB SSD).
- Built **llama.cpp** with the `llama-server` target via CMake.
- Integrated the **Phi-3.5 Mini Instruct (Uncensored, Q4_K_M GGUF)** model.
- Verified **API compatibility** at `/v1/chat/completions`.
- Local test successful via `curl` → ~523-token response generated.
- Performance benchmark: ~11.5 tokens/sec (CPU-only on a Ryzen 7800X).
- Confirmed usable for salience scoring, summarization, and lightweight reasoning.