# Project Lyra Changelog
All notable changes to Project Lyra.
Format based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/) and [Semantic Versioning](https://semver.org/).
---
## [Unreleased]
---
## [0.6.0] - 2025-12-18
### Added - Autonomy System (Phase 1 & 2)
**Autonomy Phase 1** - Self-Awareness & Planning Foundation
- **Executive Planning Module** [cortex/autonomy/executive/planner.py](cortex/autonomy/executive/planner.py)
- Autonomous goal setting and task planning capabilities
- Multi-step reasoning for complex objectives
- Integration with self-state tracking
- **Self-State Management** [cortex/data/self_state.json](cortex/data/self_state.json)
- Persistent state tracking across sessions
- Memory of past actions and outcomes
- Self-awareness metadata storage
- **Self Analyzer** [cortex/autonomy/self/analyzer.py](cortex/autonomy/self/analyzer.py)
- Analyzes own performance and decision patterns
- Identifies areas for improvement
- Tracks cognitive patterns over time
- **Test Suite** [cortex/tests/test_autonomy_phase1.py](cortex/tests/test_autonomy_phase1.py)
- Unit tests for phase 1 autonomy features
**Autonomy Phase 2** - Decision Making & Proactive Behavior
- **Autonomous Actions Module** [cortex/autonomy/actions/autonomous_actions.py](cortex/autonomy/actions/autonomous_actions.py)
- Self-initiated action execution
- Context-aware decision implementation
- Action logging and tracking
- **Pattern Learning System** [cortex/autonomy/learning/pattern_learner.py](cortex/autonomy/learning/pattern_learner.py)
- Learns from interaction patterns
- Identifies recurring user needs
- Adapts behavior based on learned patterns
- **Proactive Monitor** [cortex/autonomy/proactive/monitor.py](cortex/autonomy/proactive/monitor.py)
- Monitors system state for intervention opportunities
- Detects patterns requiring proactive response
- Background monitoring capabilities
- **Decision Engine** [cortex/autonomy/tools/decision_engine.py](cortex/autonomy/tools/decision_engine.py)
- Autonomous decision-making framework
- Weighs options and selects optimal actions
- Integrates with orchestrator for coordinated decisions
- **Orchestrator** [cortex/autonomy/tools/orchestrator.py](cortex/autonomy/tools/orchestrator.py)
- Coordinates multiple autonomy subsystems
- Manages tool selection and execution
- Handles NeoMem integration (with disable capability)
- **Test Suite** [cortex/tests/test_autonomy_phase2.py](cortex/tests/test_autonomy_phase2.py)
- Unit tests for phase 2 autonomy features
**Autonomy Phase 2.5** - Pipeline Refinement
- Tightened integration between autonomy modules and reasoning pipeline
- Enhanced self-state persistence and tracking
- Improved orchestrator reliability
- NeoMem integration refinements in vector store handling [neomem/neomem/vector_stores/qdrant.py](neomem/neomem/vector_stores/qdrant.py)
### Added - Documentation
- **Complete AI Agent Breakdown** [docs/PROJECT_LYRA_COMPLETE_BREAKDOWN.md](docs/PROJECT_LYRA_COMPLETE_BREAKDOWN.md)
- Comprehensive system architecture documentation
- Detailed component descriptions
- Data flow diagrams
- Integration points and API specifications
### Changed - Core Integration
- **Router Updates** [cortex/router.py](cortex/router.py)
- Integrated autonomy subsystems into main routing logic
- Added endpoints for autonomous decision-making
- Enhanced state management across requests
- **Reasoning Pipeline** [cortex/reasoning/reasoning.py](cortex/reasoning/reasoning.py)
- Integrated autonomy-aware reasoning
- Self-state consideration in reasoning process
- **Persona Layer** [cortex/persona/speak.py](cortex/persona/speak.py)
- Autonomy-aware response generation
- Self-state reflection in personality expression
- **Context Handling** [cortex/context.py](cortex/context.py)
- NeoMem disable capability for flexible deployment
### Changed - Development Environment
- Updated [.gitignore](.gitignore) for better workspace management
- Cleaned up VSCode settings
- Removed [.vscode/settings.json](.vscode/settings.json) from repository
### Technical Improvements
- Modular autonomy architecture with clear separation of concerns
- Test-driven development for new autonomy features
- Enhanced state persistence across system restarts
- Flexible NeoMem integration with enable/disable controls
### Architecture - Autonomy System Design
The autonomy system operates in layers:
1. **Executive Layer** - High-level planning and goal setting
2. **Decision Layer** - Evaluates options and makes choices
3. **Action Layer** - Executes autonomous decisions
4. **Learning Layer** - Adapts behavior based on patterns
5. **Monitoring Layer** - Proactive awareness of system state
All layers coordinate through the orchestrator and maintain state in `self_state.json`.
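A minimal sketch of that coordination, assuming a JSON-backed state file at the path above; the orchestrator class and its method names are illustrative, not the actual Cortex modules:
```python
# Illustrative sketch only: a JSON-backed self-state store plus a tiny
# orchestrator cycle. Class and method names are hypothetical.
import json
import time
from pathlib import Path

STATE_PATH = Path("cortex/data/self_state.json")  # path from the changelog

def load_self_state() -> dict:
    """Read persisted self-state, falling back to an empty structure."""
    if STATE_PATH.exists():
        return json.loads(STATE_PATH.read_text())
    return {"goals": [], "history": []}

def save_self_state(state: dict) -> None:
    """Persist self-state so it survives restarts."""
    STATE_PATH.parent.mkdir(parents=True, exist_ok=True)
    STATE_PATH.write_text(json.dumps(state, indent=2))

class Orchestrator:
    """Coordinates planning, decision, and action layers around shared state."""

    def run_cycle(self, observation: str) -> dict:
        state = load_self_state()
        goal = self.plan(observation, state)      # executive layer
        action = self.decide(goal, state)         # decision layer
        result = self.act(action)                 # action layer
        state.setdefault("history", []).append(   # learning/monitoring layers read this later
            {"ts": time.time(), "goal": goal, "action": action, "result": result}
        )
        save_self_state(state)
        return result

    def plan(self, observation, state):
        return f"respond-to:{observation}"

    def decide(self, goal, state):
        return {"type": "reply", "goal": goal}

    def act(self, action):
        return {"status": "ok", "action": action}
```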
---
## [0.5.2] - 2025-12-12
### Fixed - LLM Router & Async HTTP
- **Critical**: Replaced synchronous `requests` with async `httpx` in LLM router [cortex/llm/llm_router.py](cortex/llm/llm_router.py)
- Event loop blocking was causing timeouts and empty responses
- All three providers (MI50, Ollama, OpenAI) now use `await http_client.post()`
- Fixes "Expecting value: line 1 column 1 (char 0)" JSON parsing errors in intake
- **Critical**: Fixed missing `backend` parameter in intake summarization [cortex/intake/intake.py:285](cortex/intake/intake.py#L285)
- Was defaulting to PRIMARY (MI50) instead of respecting `INTAKE_LLM=SECONDARY`
- Now correctly uses configured backend (Ollama on 3090)
- **Relay**: Fixed session ID case mismatch [core/relay/server.js:87](core/relay/server.js#L87)
- UI sends `sessionId` (camelCase) but relay expected `session_id` (snake_case)
- Now accepts both variants: `req.body.session_id || req.body.sessionId`
- Custom session IDs now properly tracked instead of defaulting to "default"
### Added - Error Handling & Diagnostics
- Added comprehensive error handling in LLM router for all providers
- HTTPError, JSONDecodeError, KeyError, and generic Exception handling
- Detailed error messages with exception type and description
- Provider-specific error logging (mi50, ollama, openai)
- Added debug logging in intake summarization
- Logs LLM response length and preview
- Validates non-empty responses before JSON parsing
- Helps diagnose empty or malformed responses
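A rough sketch of the async call-plus-error-handling pattern described in these two sections, assuming `httpx` and an OpenAI-style completion response; the URL and response fields are placeholders, not the literal `llm_router.py`:
```python
# Sketch of an async provider call with the error classes listed above.
# Payload shape and response fields are illustrative.
import json
import httpx

http_client = httpx.AsyncClient(timeout=120.0)  # shared client, 120 s like all providers

async def call_provider(url: str, payload: dict, provider: str) -> str:
    try:
        resp = await http_client.post(url, json=payload)  # non-blocking, unlike requests.post
        resp.raise_for_status()
        data = resp.json()
        return data["choices"][0]["text"]
    except httpx.HTTPError as exc:
        return f"[{provider}] HTTP error: {type(exc).__name__}: {exc}"
    except json.JSONDecodeError as exc:
        return f"[{provider}] invalid JSON in response: {exc}"
    except KeyError as exc:
        return f"[{provider}] unexpected response shape, missing {exc}"
    except Exception as exc:  # last-resort catch so failures are logged, not silent
        return f"[{provider}] unexpected error: {type(exc).__name__}: {exc}"
```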
### Added - Session Management
- Added session persistence endpoints in relay [core/relay/server.js:160-171](core/relay/server.js#L160-L171)
- `GET /sessions/:id` - Retrieve session history
- `POST /sessions/:id` - Save session history
- In-memory storage using Map (ephemeral, resets on container restart)
- Fixes UI "Failed to load session" errors
### Changed - Provider Configuration
- Added `mi50` provider support for llama.cpp server [cortex/llm/llm_router.py:62-81](cortex/llm/llm_router.py#L62-L81)
- Uses `/completion` endpoint with `n_predict` parameter
- Extracts `content` field from response
- Configured for MI50 GPU with DeepSeek model
- Increased memory retrieval threshold from 0.78 to 0.90 [cortex/.env:20](cortex/.env#L20)
- Filters out low-relevance memories (only returns 90%+ similarity)
- Reduces noise in context retrieval
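For the llama.cpp `/completion` endpoint specifically, the request and parse step could look roughly like this (host, port, and sampling settings are assumptions):
```python
# Sketch of a llama.cpp-server call using /completion with n_predict and
# extracting the "content" field, as described above. Host is illustrative.
import httpx

async def call_mi50(prompt: str, max_tokens: int = 512) -> str:
    payload = {
        "prompt": prompt,
        "n_predict": max_tokens,   # llama.cpp uses n_predict rather than max_tokens
        "temperature": 0.7,
    }
    async with httpx.AsyncClient(timeout=120.0) as client:
        resp = await client.post("http://mi50-host:8080/completion", json=payload)
        resp.raise_for_status()
        return resp.json()["content"]  # llama.cpp returns generated text in "content"
```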
### Technical Improvements
- Unified async HTTP handling across all LLM providers
- Better separation of concerns between provider implementations
- Improved error messages for debugging LLM API failures
- Consistent timeout handling (120 seconds for all providers)
---
## [0.5.1] - 2025-12-11
### Fixed - Intake Integration
- **Critical**: Fixed `bg_summarize()` function not defined error
- Was only a `TYPE_CHECKING` stub, now implemented as logging stub
- Eliminated `NameError` preventing SESSIONS from persisting correctly
- Function now logs exchange additions and defers summarization to `/reason` endpoint
- **Critical**: Fixed `/ingest` endpoint unreachable code in [router.py:201-233](cortex/router.py#L201-L233)
- Removed early return that prevented `update_last_assistant_message()` from executing
- Removed duplicate `add_exchange_internal()` call
- Implemented lenient error handling (each operation wrapped in try/except)
- **Intake**: Added missing `__init__.py` to make intake a proper Python package [cortex/intake/__init__.py](cortex/intake/__init__.py)
- Prevents namespace package issues
- Enables proper module imports
- Exports `SESSIONS`, `add_exchange_internal`, `summarize_context`
### Added - Diagnostics & Debugging
- Added diagnostic logging to verify SESSIONS singleton behavior
- Module initialization logs SESSIONS object ID [intake.py:14](cortex/intake/intake.py#L14)
- Each `add_exchange_internal()` call logs object ID and buffer state [intake.py:343-358](cortex/intake/intake.py#L343-L358)
- Added `/debug/sessions` HTTP endpoint [router.py:276-305](cortex/router.py#L276-L305)
- Inspect SESSIONS from within running Uvicorn worker
- Shows total sessions, session count, buffer sizes, recent exchanges
- Returns SESSIONS object ID for verification
- Added `/debug/summary` HTTP endpoint [router.py:238-271](cortex/router.py#L238-L271)
- Test `summarize_context()` for any session
- Returns L1/L5/L10/L20/L30 summaries
- Includes buffer size and exchange preview
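A minimal sketch of such a debug endpoint in FastAPI; the response fields in the real `router.py` may differ:
```python
# Illustrative FastAPI debug endpoint for inspecting an in-process SESSIONS dict.
from fastapi import FastAPI

app = FastAPI()
SESSIONS: dict[str, list[dict]] = {}  # session_id -> list of exchanges

@app.get("/debug/sessions")
async def debug_sessions():
    return {
        "sessions_object_id": id(SESSIONS),   # verify the singleton is shared
        "total_sessions": len(SESSIONS),
        "buffers": {
            sid: {
                "size": len(exchanges),
                "last_exchange": exchanges[-1] if exchanges else None,
            }
            for sid, exchanges in SESSIONS.items()
        },
    }
```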
### Changed - Intake Architecture
- **Intake is no longer a standalone service** - it runs inside the Cortex container as a pure Python module
- Imported as `from intake.intake import add_exchange_internal, SESSIONS`
- No HTTP calls between Cortex and Intake
- Eliminates network latency and dependency on Intake service being up
- **Deferred summarization**: `bg_summarize()` is now a no-op stub [intake.py:318-325](cortex/intake/intake.py#L318-L325)
- Actual summarization happens during `/reason` call via `summarize_context()`
- Simplifies async/sync complexity
- Prevents NameError when called from `add_exchange_internal()`
- **Lenient error handling**: `/ingest` endpoint always returns success [router.py:201-233](cortex/router.py#L201-L233)
- Each operation wrapped in try/except
- Logs errors but never fails to avoid breaking chat pipeline
- User requirement: never fail chat pipeline
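The lenient pattern might look roughly like this; the helper names mirror the changelog entries above, but their bodies here are placeholder stubs:
```python
# Sketch of the "never fail the chat pipeline" /ingest pattern: every step is
# wrapped in its own try/except and the endpoint always reports success.
import logging
from fastapi import FastAPI, Request

log = logging.getLogger("cortex.ingest")
app = FastAPI()

def add_exchange_internal(session_id, user_msg, assistant_msg):
    """Placeholder stub for the intake function of the same name."""

def update_last_assistant_message(session_id, assistant_msg):
    """Placeholder stub for the router helper of the same name."""

@app.post("/ingest")
async def ingest(request: Request):
    body = await request.json()
    try:
        add_exchange_internal(body["session_id"], body["user"], body["assistant"])
    except Exception:
        log.exception("add_exchange_internal failed")   # logged, never raised
    try:
        update_last_assistant_message(body["session_id"], body["assistant"])
    except Exception:
        log.exception("update_last_assistant_message failed")
    return {"status": "ok"}   # the chat pipeline never sees an ingest failure
```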
### Documentation
- Added single-worker constraint note in [cortex/Dockerfile:7-8](cortex/Dockerfile#L7-L8)
- Documents that SESSIONS requires single Uvicorn worker
- Notes that multi-worker scaling requires Redis or shared storage
- Updated plan documentation with root cause analysis
---
## [0.5.0] - 2025-11-28
### Fixed - Critical API Wiring & Integration
After the major architectural rewire (v0.4.x), this release fixes all critical endpoint mismatches and ensures end-to-end system connectivity.
#### Cortex → Intake Integration
- **Fixed** `IntakeClient` to use correct Intake v0.2 API endpoints
- Changed `GET /context/{session_id}` → `GET /summaries?session_id={session_id}`
- Updated JSON response parsing to extract `summary_text` field
- Fixed environment variable name: `INTAKE_API` → `INTAKE_API_URL`
- Corrected default port: `7083` → `7080`
- Added deprecation warning to `summarize_turn()` method (endpoint removed in Intake v0.2)
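A sketch of the corrected Cortex → Intake call, assuming `httpx`; only the endpoint, query parameter, `summary_text` field, and `INTAKE_API_URL` default come from this changelog:
```python
# Sketch of the corrected Cortex -> Intake request: GET /summaries with a
# session_id query parameter, reading summary_text from the response.
import os
import httpx

INTAKE_API_URL = os.getenv("INTAKE_API_URL", "http://intake:7080")

async def fetch_summaries(session_id: str) -> str:
    async with httpx.AsyncClient(timeout=30.0) as client:
        resp = await client.get(
            f"{INTAKE_API_URL}/summaries",
            params={"session_id": session_id},
        )
        resp.raise_for_status()
        return resp.json().get("summary_text", "")
```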
#### Relay → UI Compatibility
- **Added** OpenAI-compatible endpoint `POST /v1/chat/completions`
- Accepts standard OpenAI format with `messages[]` array
- Returns OpenAI-compatible response structure with `choices[]`
- Extracts last message content from messages array
- Includes usage metadata (stub values for compatibility)
- **Refactored** Relay to use shared `handleChatRequest()` function
- Both `/chat` and `/v1/chat/completions` use same core logic
- Eliminates code duplication
- Consistent error handling across endpoints
#### Relay → Intake Connection
- **Fixed** Intake URL fallback in Relay server configuration
- Corrected port: `7082` → `7080`
- Updated endpoint: `/summary` → `/add_exchange`
- Now properly sends exchanges to Intake for summarization
#### Code Quality & Python Package Structure
- **Added** missing `__init__.py` files to all Cortex subdirectories
- `cortex/llm/__init__.py`
- `cortex/reasoning/__init__.py`
- `cortex/persona/__init__.py`
- `cortex/ingest/__init__.py`
- `cortex/utils/__init__.py`
- Improves package imports and IDE support
- **Removed** unused import in `cortex/router.py`: `from unittest import result`
- **Deleted** empty file `cortex/llm/resolve_llm_url.py` (was 0 bytes, never implemented)
### Verified Working
Complete end-to-end message flow now operational:
```
UI → Relay (/v1/chat/completions)
Relay → Cortex (/reason)
Cortex → Intake (/summaries) [retrieves context]
Cortex 4-stage pipeline:
1. reflection.py → meta-awareness notes
2. reasoning.py → draft answer
3. refine.py → polished answer
4. persona/speak.py → Lyra personality
Cortex → Relay (returns persona response)
Relay → Intake (/add_exchange) [async summary]
Intake → NeoMem (background memory storage)
Relay → UI (final response)
```
### Documentation
- **Added** comprehensive v0.5.0 changelog entry
- **Updated** README.md to reflect v0.5.0 architecture
- Documented new endpoints
- Updated data flow diagrams
- Clarified Intake v0.2 changes
- Corrected service descriptions
### Issues Resolved
- ❌ Cortex could not retrieve context from Intake (wrong endpoint)
- ❌ UI could not send messages to Relay (endpoint mismatch)
- ❌ Relay could not send summaries to Intake (wrong port/endpoint)
- ❌ Python package imports were implicit (missing __init__.py)
### Known Issues (Non-Critical)
- Session management endpoints not implemented in Relay (`GET/POST /sessions/:id`)
- RAG service currently disabled in docker-compose.yml
- Cortex `/ingest` endpoint is a stub returning `{"status": "ok"}`
### Migration Notes
If upgrading from v0.4.x:
1. Pull latest changes from git
2. Verify environment variables in `.env` files:
- Check `INTAKE_API_URL=http://intake:7080` (not `INTAKE_API`)
- Verify all service URLs use correct ports
3. Restart Docker containers: `docker-compose down && docker-compose up -d`
4. Test with a simple message through the UI
---
## [Infrastructure v1.0.0] - 2025-11-26
### Changed - Environment Variable Consolidation
**Major reorganization to eliminate duplication and improve maintainability**
- Consolidated 9 scattered `.env` files into single source of truth architecture
- Root `.env` now contains all shared infrastructure (LLM backends, databases, API keys, service URLs)
- Service-specific `.env` files minimized to only essential overrides:
- `cortex/.env`: Reduced from 42 to 22 lines (operational parameters only)
- `neomem/.env`: Reduced from 26 to 14 lines (LLM naming conventions only)
- `intake/.env`: Kept at 8 lines (already minimal)
- **Result**: ~24% reduction in total configuration lines (197 → ~150)
**Docker Compose Consolidation**
- All services now defined in single root `docker-compose.yml`
- Relay service updated with complete configuration (env_file, volumes)
- Removed redundant `core/docker-compose.yml` (marked as DEPRECATED)
- Standardized network communication to use Docker container names
**Service URL Standardization**
- Internal services use container names: `http://neomem-api:7077`, `http://cortex:7081`
- External services use IP addresses: `http://10.0.0.43:8000` (vLLM), `http://10.0.0.3:11434` (Ollama)
- Removed IP/container name inconsistencies across files
### Added - Security & Documentation
**Security Templates** - Created `.env.example` files for all services
- Root `.env.example` with sanitized credentials
- Service-specific templates: `cortex/.env.example`, `neomem/.env.example`, `intake/.env.example`, `rag/.env.example`
- All `.env.example` files safe to commit to version control
**Documentation**
- `ENVIRONMENT_VARIABLES.md`: Comprehensive reference for all environment variables
- Variable descriptions, defaults, and usage examples
- Multi-backend LLM strategy documentation
- Troubleshooting guide
- Security best practices
- `DEPRECATED_FILES.md`: Deletion guide for deprecated files with verification steps
**Enhanced .gitignore**
- Ignores all `.env` files (including subdirectories)
- Tracks `.env.example` templates for documentation
- Ignores `.env-backups/` directory
### Removed
- `core/.env` - Redundant with root `.env`, now deleted
- `core/docker-compose.yml` - Consolidated into main compose file (marked DEPRECATED)
### Fixed
- Eliminated duplicate `OPENAI_API_KEY` across 5+ files
- Eliminated duplicate LLM backend URLs across 4+ files
- Eliminated duplicate database credentials across 3+ files
- Resolved Cortex `environment:` section override in docker-compose (now uses env_file)
### Architecture - Multi-Backend LLM Strategy
Root `.env` provides all backend OPTIONS (PRIMARY, SECONDARY, CLOUD, FALLBACK), services choose which to USE:
- **Cortex** → vLLM (PRIMARY) for autonomous reasoning
- **NeoMem** → Ollama (SECONDARY) + OpenAI embeddings
- **Intake** → vLLM (PRIMARY) for summarization
- **Relay** → Fallback chain with user preference
Preserves per-service flexibility while eliminating URL duplication.
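In code this amounts to each service reading the shared backend URLs and picking one via its own setting; the variable names below appear elsewhere in this changelog, while `LLM_FALLBACK_URL` and the helper itself are assumptions:
```python
# Sketch: shared backend OPTIONS come from the root .env, and a per-service
# setting decides which one to USE. The choose_backend helper is hypothetical.
import os

BACKENDS = {
    "PRIMARY": os.getenv("LLM_PRIMARY_URL"),      # vLLM
    "SECONDARY": os.getenv("LLM_SECONDARY_URL"),  # Ollama
    "CLOUD": os.getenv("LLM_CLOUD_URL"),          # OpenAI
    "FALLBACK": os.getenv("LLM_FALLBACK_URL"),    # llama.cpp CPU (name assumed)
}

def choose_backend(service_setting: str, default: str = "PRIMARY") -> str:
    """Return the URL for the backend a service asked for, e.g. INTAKE_LLM=SECONDARY."""
    name = (service_setting or default).upper()
    return BACKENDS.get(name) or BACKENDS[default]

# e.g. Intake: choose_backend(os.getenv("INTAKE_LLM", "PRIMARY"))
```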
### Migration
- All original `.env` files backed up to `.env-backups/` with timestamp `20251126_025334`
- Rollback plan documented in `ENVIRONMENT_VARIABLES.md`
- Verification steps provided in `DEPRECATED_FILES.md`
---
## [0.4.x] - 2025-11-13
### Added - Multi-Stage Reasoning Pipeline
**Cortex v0.5 - Complete architectural overhaul**
- **New `reasoning.py` module**
- Async reasoning engine
- Accepts user prompt, identity, RAG block, and reflection notes
- Produces draft internal answers
- Uses primary backend (vLLM)
- **New `reflection.py` module**
- Fully async meta-awareness layer
- Produces actionable JSON "internal notes"
- Enforces strict JSON schema and fallback parsing
- Forces cloud backend (`backend_override="cloud"`)
- **Integrated `refine.py` into pipeline**
- New stage between reflection and persona
- Runs exclusively on primary vLLM backend (MI50)
- Produces final, internally consistent output for downstream persona layer
- **Backend override system**
- Each LLM call can now select its own backend
- Enables multi-LLM cognition: Reflection → cloud, Reasoning → primary
- **Identity loader**
- Added `identity.py` with `load_identity()` for consistent persona retrieval
- **Ingest handler**
- Async stub created for future Intake → NeoMem → RAG pipeline
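A sketch of how a per-call `backend_override` can work, with reflection forced onto the cloud backend while reasoning and refine stay on the primary vLLM endpoint; the dispatch table and payload details are illustrative, not the literal `llm_router.py`:
```python
# Sketch of per-call backend overrides. Only "primary" and "cloud" are shown;
# URLs come from env vars that hold FULL endpoints and are used as-is.
import os
import httpx

URLS = {
    "primary": os.getenv("LLM_PRIMARY_URL", "http://10.0.0.43:8000/v1/completions"),
    "cloud": os.getenv("LLM_CLOUD_URL", "https://api.openai.com/v1/chat/completions"),
}

async def call_llm(prompt: str, backend_override: str | None = None) -> str:
    backend = backend_override or "primary"
    url = URLS[backend]
    if backend == "cloud":
        headers = {"Authorization": f"Bearer {os.getenv('OPENAI_API_KEY', '')}"}
        payload = {"model": os.getenv("LLM_CLOUD_MODEL", ""),
                   "messages": [{"role": "user", "content": prompt}]}
    else:
        headers = {}
        payload = {"model": os.getenv("LLM_PRIMARY_MODEL", ""),
                   "prompt": prompt, "max_tokens": 512}
    async with httpx.AsyncClient(timeout=120.0) as client:
        resp = await client.post(url, json=payload, headers=headers)
        resp.raise_for_status()
        data = resp.json()
    if backend == "cloud":
        return data["choices"][0]["message"]["content"]   # chat-completions shape
    return data["choices"][0]["text"]                      # completions shape

# Reflection would call: await call_llm(prompt, backend_override="cloud")
```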
**Cortex v0.4.1 - RAG Integration**
- **RAG integration**
- Added `rag.py` with `query_rag()` and `format_rag_block()`
- Cortex now queries local RAG API (`http://10.0.0.41:7090/rag/search`)
- Synthesized answers and top excerpts injected into reasoning prompt
### Changed - Unified LLM Architecture
**Cortex v0.5**
- **Unified LLM backend URL handling across Cortex**
- ENV variables must now contain FULL API endpoints
- Removed all internal path-appending (e.g. `.../v1/completions`)
- `llm_router.py` rewritten to use env-provided URLs as-is
- Ensures consistent behavior between draft, reflection, refine, and persona
- **Rebuilt `main.py`**
- Removed old annotation/analysis logic
- New structure: load identity → get RAG → reflect → reason → return draft+notes
- Routes now clean and minimal (`/reason`, `/ingest`, `/health`)
- Async path throughout Cortex
- **Refactored `llm_router.py`**
- Removed old fallback logic during overrides
- OpenAI requests now use `/v1/chat/completions`
- Added proper OpenAI Authorization headers
- Distinct payload format for vLLM vs OpenAI
- Unified, correct parsing across models
- **Simplified Cortex architecture**
- Removed deprecated "context.py" and old reasoning code
- Relay completely decoupled from smart behavior
- **Updated environment specification**
- `LLM_PRIMARY_URL` now set to `http://10.0.0.43:8000/v1/completions`
- `LLM_SECONDARY_URL` remains `http://10.0.0.3:11434/api/generate` (Ollama)
- `LLM_CLOUD_URL` set to `https://api.openai.com/v1/chat/completions`
**Cortex v0.4.1**
- **Revised `/reason` endpoint**
- Now builds unified context blocks: [Intake] → recent summaries, [RAG] → contextual knowledge, [User Message] → current input
- Calls `call_llm()` for first pass, then `reflection_loop()` for meta-evaluation
- Returns `cortex_prompt`, `draft_output`, `final_output`, and normalized reflection
- **Reflection Pipeline Stability**
- Cleaned parsing to normalize JSON vs. text reflections
- Added fallback handling for malformed or non-JSON outputs
- Log system improved to show raw JSON, extracted fields, and normalized summary
- **Async Summarization (Intake v0.2.1)**
- Intake summaries now run in background threads to avoid blocking Cortex
- Summaries (L1 → L∞) logged asynchronously with [BG] tags
- **Environment & Networking Fixes**
- Verified `.env` variables propagate correctly inside Cortex container
- Confirmed Docker network connectivity between Cortex, Intake, NeoMem, and RAG
- Adjusted localhost calls to service-IP mapping
- **Behavioral Updates**
- Cortex now performs conversation reflection (on user intent) and self-reflection (on its own answers)
- RAG context successfully grounds reasoning outputs
- Intake and NeoMem confirmed receiving summaries via `/add_exchange`
- Log clarity pass: all reflective and contextual blocks clearly labeled
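The unified context block described for the revised `/reason` endpoint could be assembled roughly like this; the labels follow the changelog, while the helper and example strings are placeholders:
```python
# Sketch of assembling the [Intake] / [RAG] / [User Message] context blocks
# fed into the first reasoning pass. The helper is a placeholder.
def build_cortex_prompt(intake_summaries: str, rag_block: str, user_message: str) -> str:
    sections = []
    if intake_summaries:
        sections.append(f"[Intake]\n{intake_summaries}")      # recent summaries
    if rag_block:
        sections.append(f"[RAG]\n{rag_block}")                # contextual knowledge
    sections.append(f"[User Message]\n{user_message}")        # current input
    return "\n\n".join(sections)

prompt = build_cortex_prompt(
    intake_summaries="L5: Brian and Lyra debugged the relay session IDs.",
    rag_block="Relay exposes an OpenAI-compatible /v1/chat/completions endpoint.",
    user_message="What did we fix in the relay yesterday?",
)
```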
### Fixed
**Cortex v0.5**
- Resolved endpoint conflict where router expected base URLs and refine expected full URLs
- Fixed by standardizing full-URL behavior across entire system
- Reflection layer no longer fails silently (previously returned `[""]` due to MythoMax)
- Resolved 404/401 errors caused by incorrect OpenAI URL endpoints
- No more double-routing through vLLM during reflection
- Corrected async/sync mismatch in multiple locations
- Eliminated double-path bug (`/v1/completions/v1/completions`) caused by previous router logic
### Removed
**Cortex v0.5**
- Legacy `annotate`, `reason_check` glue logic from old architecture
- Old backend probing junk code
- Stale imports and unused modules leftover from previous prototype
### Verified
**Cortex v0.5**
- Cortex → vLLM (MI50) → refine → final_output now functioning correctly
- Refine shows `used_primary_backend: true` and no fallback
- Manual curl test confirms endpoint accuracy
### Known Issues
**Cortex v0.5**
- Refine sometimes prefixes output with `"Final Answer:"`; next version will sanitize this
- Hallucinations in draft_output persist due to weak grounding (fix in reasoning + RAG planned)
**Cortex v0.4.1**
- NeoMem tuning needed - improve retrieval latency and relevance
- Need dedicated `/reflections/recent` endpoint for Cortex
- Migrate to Cortex-first ingestion (Relay → Cortex → NeoMem)
- Add persistent reflection recall (use prior reflections as meta-context)
- Improve reflection JSON structure ("insight", "evaluation", "next_action" → guaranteed fields)
- Tighten temperature and prompt control for factual consistency
- RAG optimization: add source ranking, filtering, multi-vector hybrid search
- Cache RAG responses per session to reduce duplicate calls
### Notes
**Cortex v0.5**
This is the largest structural change to Cortex so far. It establishes:
- Multi-model cognition
- Clean layering
- Identity + reflection separation
- Correct async code
- Deterministic backend routing
- Predictable JSON reflection
The system is now ready for:
- Refinement loops
- Persona-speaking layer
- Containerized RAG
- Long-term memory integration
- True emergent-behavior experiments
---
## [0.3.x] - 2025-10-28 to 2025-09-26
### Added
**[Lyra Core v0.3.2 + Web UI v0.2.0] - 2025-10-28**
- **New UI**
- Cleaned up UI look and feel
- **Sessions**
- Sessions now persist over time
- Ability to create new sessions or load sessions from a previous instance
- Switching sessions updates what is sent to Relay, so prompts never include messages from other sessions
- Relay correctly wired in
**[Lyra-Core 0.3.1] - 2025-10-09**
- **NVGRAM Integration (Full Pipeline Reconnected)**
- Replaced legacy Mem0 service with NVGRAM microservice (`nvgram-api` @ port 7077)
- Updated `server.js` in Relay to route all memory ops via `${NVGRAM_API}/memories` and `/search`
- Added `.env` variable: `NVGRAM_API=http://nvgram-api:7077`
- Verified end-to-end Lyra conversation persistence: `relay → nvgram-api → postgres/neo4j → relay → ollama → ui`
- ✅ Memories stored, retrieved, and re-injected successfully
**[Lyra-Core v0.3.0] - 2025-09-26**
- **Salience filtering** in Relay
- `.env` configurable: `SALIENCE_ENABLED`, `SALIENCE_MODE`, `SALIENCE_MODEL`, `SALIENCE_API_URL`
- Supports `heuristic` and `llm` classification modes
- LLM-based salience filter integrated with Cortex VM running `llama-server`
- Logging improvements
- Added debug logs for salience mode, raw LLM output, and unexpected outputs
- Fail-closed behavior for unexpected LLM responses
- Successfully tested with **Phi-3.5-mini** and **Qwen2-0.5B-Instruct** as salience classifiers
- Verified end-to-end flow: Relay → salience filter → Mem0 add/search → Persona injection → LLM reply
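An outline of an LLM-backed salience gate with fail-closed handling, written in Python for consistency with the other sketches even though the filter lives in Relay; the label vocabulary and the reading of "fail closed" as "skip the memory add" are assumptions, only the `SALIENCE_MODE` toggle comes from the changelog:
```python
# Sketch of a salience gate: heuristic mode or an LLM classifier, with
# unexpected classifier output treated as "do not store" (fail closed).
import os

def is_salient(message: str, classify) -> bool:
    mode = os.getenv("SALIENCE_MODE", "heuristic")
    if mode == "heuristic":
        return len(message.split()) > 4          # trivial stand-in heuristic
    raw = classify(message)                      # e.g. llama-server call returning a label
    label = raw.strip().lower()
    if label.startswith("salient"):
        return True
    if label.startswith("not"):
        return False
    return False                                 # fail closed on unexpected output

print(is_salient("Remember that my birthday is in June", classify=lambda m: "salient"))
```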
**[Cortex v0.3.0] - 2025-10-31**
- **Cortex Service (FastAPI)**
- New standalone reasoning engine (`cortex/main.py`) with endpoints:
- `GET /health` reports active backend + NeoMem status
- `POST /reason` evaluates `{prompt, response}` pairs
- `POST /annotate` experimental text analysis
- Background NeoMem health monitor (5-minute interval)
- **Multi-Backend Reasoning Support**
- Environment-driven backend selection via `LLM_FORCE_BACKEND`
- Supports: Primary (vLLM MI50), Secondary (Ollama 3090), Cloud (OpenAI), Fallback (llama.cpp CPU)
- Per-backend model variables: `LLM_PRIMARY_MODEL`, `LLM_SECONDARY_MODEL`, `LLM_CLOUD_MODEL`, `LLM_FALLBACK_MODEL`
- **Response Normalization Layer**
- Implemented `normalize_llm_response()` to merge streamed outputs and repair malformed JSON
- Handles Ollama's multi-line streaming and Mythomax's missing punctuation issues
- Prints concise debug previews of merged content
- **Environment Simplification**
- Each service (`intake`, `cortex`, `neomem`) now maintains its own `.env` file
- Removed reliance on shared/global env file to prevent cross-contamination
- Verified Docker Compose networking across containers
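The normalization layer mentioned under Cortex v0.3.0 above might be sketched as follows, assuming Ollama-style newline-delimited streaming chunks; the field names and repair heuristics are simplified placeholders:
```python
# Sketch of normalize_llm_response(): merge newline-delimited streaming chunks
# into one string and tolerate malformed JSON lines.
import json

def normalize_llm_response(raw: str) -> str:
    pieces = []
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            chunk = json.loads(line)
            # Ollama streams {"response": "..."}; other backends may use "text"/"content"
            pieces.append(chunk.get("response") or chunk.get("text") or chunk.get("content") or "")
        except json.JSONDecodeError:
            pieces.append(line)          # keep malformed chunks as plain text
    merged = "".join(pieces).strip()
    print(f"[normalize] merged {len(pieces)} chunks, preview: {merged[:80]!r}")
    return merged
```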
**[NeoMem 0.1.2] - 2025-10-27** (formerly NVGRAM)
- **Renamed NVGRAM to NeoMem**
- All future updates under name NeoMem
- Features unchanged
**[NVGRAM 0.1.1] - 2025-10-08**
- **Async Memory Rewrite (Stability + Safety Patch)**
- Introduced `AsyncMemory` class with fully asynchronous vector and graph store writes
- Added input sanitation to prevent embedding errors (`'list' object has no attribute 'replace'`)
- Implemented `flatten_messages()` helper in API layer to clean malformed payloads
- Added structured request logging via `RequestLoggingMiddleware` (FastAPI middleware)
- Health endpoint (`/health`) returns structured JSON `{status, version, service}`
- Startup logs include sanitized embedder config with masked API keys
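A sketch of the kind of payload sanitation described above: list-valued message content is flattened to plain strings before it reaches the embedder, which is the input shape that triggered the `'list' object has no attribute 'replace'` error. The signature is illustrative:
```python
# Sketch of a flatten_messages()-style sanitizer: content arriving as lists or
# other non-string types is coerced to plain strings before embedding.
def flatten_messages(messages: list[dict]) -> list[dict]:
    cleaned = []
    for msg in messages:
        content = msg.get("content", "")
        if isinstance(content, list):
            content = " ".join(str(part) for part in content)   # join multi-part content
        elif not isinstance(content, str):
            content = str(content)
        cleaned.append({"role": msg.get("role", "user"), "content": content})
    return cleaned

print(flatten_messages([{"role": "user", "content": ["hello", "world"]}]))
```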
**[NVGRAM 0.1.0] - 2025-10-07**
- **Initial fork of Mem0 → NVGRAM**
- Created fully independent local-first memory engine based on Mem0 OSS
- Renamed all internal modules, Docker services, and environment variables from `mem0` → `nvgram`
- New service name: `nvgram-api`, default port 7077
- Maintains same API endpoints (`/memories`, `/search`) for drop-in compatibility
- Uses FastAPI, Postgres, and Neo4j as persistent backends
**[Lyra-Mem0 0.3.2] - 2025-10-05**
- **Ollama LLM reasoning** alongside OpenAI embeddings
- Introduced `LLM_PROVIDER=ollama`, `LLM_MODEL`, and `OLLAMA_HOST` in `.env.3090`
- Verified local 3090 setup using `qwen2.5:7b-instruct-q4_K_M`
- Split processing: Embeddings → OpenAI `text-embedding-3-small`, LLM → Local Ollama
- Added `.env.3090` template for self-hosted inference nodes
- Integrated runtime diagnostics and seeder progress tracking
- File-level + message-level progress bars
- Retry/back-off logic for timeouts (3 attempts)
- Event logging (`ADD / UPDATE / NONE`) for every memory record
- Expanded Docker health checks for Postgres, Qdrant, and Neo4j containers
- Added GPU-friendly long-run configuration for continuous seeding (validated on RTX 3090)
**[Lyra-Mem0 0.3.1] - 2025-10-03**
- HuggingFace TEI integration (local 3090 embedder)
- Dual-mode environment switch between OpenAI cloud and local
- CSV export of memories from Postgres (`payload->>'data'`)
**[Lyra-Mem0 0.3.0]**
- **Ollama embeddings** in Mem0 OSS container
- Configure `EMBEDDER_PROVIDER=ollama`, `EMBEDDER_MODEL`, `OLLAMA_HOST` via `.env`
- Mounted `main.py` override from host into container to load custom `DEFAULT_CONFIG`
- Installed `ollama` Python client into custom API container image
- `.env.3090` file for external embedding mode (3090 machine)
- Workflow for multiple embedding modes: LAN-based 3090/Ollama, Local-only CPU, OpenAI fallback
**[Lyra-Mem0 v0.2.1]**
- **Seeding pipeline**
- Built Python seeder script to bulk-insert raw Cloud Lyra exports into Mem0
- Implemented incremental seeding option (skip existing memories, only add new ones)
- Verified insert process with Postgres-backed history DB
**[Intake v0.1.0] - 2025-10-27**
- Receives messages from Relay and summarizes them in a cascading format
- Continues summarizing small batches of exchanges while also generating large-scale conversational summaries (L20)
- Currently logs summaries to .log file in `/project-lyra/intake-logs/`
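The cascading format can be pictured as summarizing the same buffer at several window sizes; the level set below mirrors the L1/L5/L10/L20 levels referenced elsewhere in this changelog, and `summarize()` stands in for the actual LLM call:
```python
# Sketch of cascading summaries: each level condenses the most recent N
# exchanges; summarize() is a placeholder for the real LLM call.
def summarize(exchanges: list[str]) -> str:
    return f"summary of {len(exchanges)} exchange(s)"   # placeholder

def cascading_summaries(buffer: list[str], levels=(1, 5, 10, 20)) -> dict[str, str]:
    out = {}
    for n in levels:
        if len(buffer) >= n:
            out[f"L{n}"] = summarize(buffer[-n:])       # last n exchanges
    return out

print(cascading_summaries([f"exchange {i}" for i in range(12)]))
```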
**[Lyra-Cortex v0.2.0] - 2025-09-26**
- Integrated **llama-server** on dedicated Cortex VM (Proxmox)
- Verified Phi-3.5-mini-instruct_Uncensored-Q4_K_M running with 8 vCPUs
- Benchmarked Phi-3.5-mini performance: ~18 tokens/sec CPU-only on Ryzen 7 7800X
- Salience classification functional but sometimes inconsistent
- Tested **Qwen2-0.5B-Instruct GGUF** as alternative salience classifier
- Much faster throughput (~350 tokens/sec prompt, ~100 tokens/sec eval)
- More responsive but over-classifies messages as "salient"
- Established `.env` integration for model ID (`SALIENCE_MODEL`), enabling hot-swap between models
### Changed
**[Lyra-Core 0.3.1] - 2025-10-09**
- Renamed `MEM0_URL` → `NVGRAM_API` across all relay environment configs
- Updated Docker Compose service dependency order
- `relay` now depends on `nvgram-api` healthcheck
- Removed `mem0` references and volumes
- Minor cleanup to Persona fetch block (null-checks and safer default persona string)
**[Lyra-Core v0.3.1] - 2025-09-27**
- Removed salience filter logic; Cortex is now default annotator
- All user messages stored in Mem0; no discard tier applied
- Cortex annotations (`metadata.cortex`) now attached to memories
- Debug logging improvements
- Pretty-print Cortex annotations
- Injected prompt preview
- Memory search hit list with scores
- `.env` toggle (`CORTEX_ENABLED`) to bypass Cortex when needed
**[Lyra-Core v0.3.0] - 2025-09-26**
- Refactored `server.js` to gate `mem.add()` calls behind salience filter
- Updated `.env` to support `SALIENCE_MODEL`
**[Cortex v0.3.0] - 2025-10-31**
- Refactored `reason_check()` to dynamically switch between **prompt** and **chat** mode depending on backend
- Enhanced startup logs to announce active backend, model, URL, and mode
- Improved error handling with clearer "Reasoning error" messages
**[NVGRAM 0.1.1] - 2025-10-08**
- Replaced synchronous `Memory.add()` with async-safe version supporting concurrent vector + graph writes
- Normalized indentation and cleaned duplicate `main.py` references
- Removed redundant `FastAPI()` app reinitialization
- Updated internal logging to INFO-level timing format
- Deprecated `@app.on_event("startup")` → will migrate to `lifespan` handler in v0.1.2
**[NVGRAM 0.1.0] - 2025-10-07**
- Removed dependency on external `mem0ai` SDK — all logic now local
- Re-pinned requirements: fastapi==0.115.8, uvicorn==0.34.0, pydantic==2.10.4, python-dotenv==1.0.1, psycopg>=3.2.8, ollama
- Adjusted `docker-compose` and `.env` templates to use new NVGRAM naming
**[Lyra-Mem0 0.3.2] - 2025-10-05**
- Updated `main.py` configuration block to load `LLM_PROVIDER`, `LLM_MODEL`, `OLLAMA_BASE_URL`
- Fallback to OpenAI if Ollama unavailable
- Adjusted `docker-compose.yml` mount paths to correctly map `/app/main.py`
- Normalized `.env` loading so `mem0-api` and host environment share identical values
- Improved seeder logging and progress telemetry
- Added explicit `temperature` field to `DEFAULT_CONFIG['llm']['config']`
**[Lyra-Mem0 0.3.0]**
- `docker-compose.yml` updated to mount local `main.py` and `.env.3090`
- Built custom Dockerfile (`mem0-api-server:latest`) extending base image with `pip install ollama`
- Updated `requirements.txt` to include `ollama` package
- Adjusted Mem0 container config so `main.py` pulls environment variables with `dotenv`
- Tested new embeddings path with curl `/memories` API call
**[Lyra-Mem0 v0.2.1]**
- Updated `main.py` to load configuration from `.env` using `dotenv` and support multiple embedder backends
- Mounted host `main.py` into container so local edits persist across rebuilds
- Updated `docker-compose.yml` to mount `.env.3090` and support swap between profiles
- Built custom Dockerfile (`mem0-api-server:latest`) including `pip install ollama`
- Updated `requirements.txt` with `ollama` dependency
- Adjusted startup flow so container automatically connects to external Ollama host (LAN IP)
- Added logging to confirm model pulls and embedding requests
### Fixed
**[Lyra-Core 0.3.1] - 2025-10-09**
- Relay startup no longer crashes when NVGRAM is unavailable — deferred connection handling
- `/memories` POST failures no longer crash Relay; now logged gracefully as `relay error Error: memAdd failed: 500`
- Improved injected prompt debugging (`DEBUG_PROMPT=true` now prints clean JSON)
**[Lyra-Core v0.3.1] - 2025-09-27**
- Parsing failures from Markdown-wrapped Cortex JSON via fence cleaner
- Relay no longer "hangs" on malformed Cortex outputs
**[Cortex v0.3.0] - 2025-10-31**
- Corrected broken vLLM endpoint routing (`/v1/completions`)
- Stabilized cross-container health reporting for NeoMem
- Resolved JSON parse failures caused by streaming chunk delimiters
**[NVGRAM 0.1.1] - 2025-10-08**
- Eliminated repeating 500 error from OpenAI embedder caused by non-string message content
- Masked API key leaks from boot logs
- Ensured Neo4j reconnects gracefully on first retry
**[Lyra-Mem0 0.3.2] - 2025-10-05**
- Resolved crash during startup: `TypeError: OpenAIConfig.__init__() got an unexpected keyword argument 'ollama_base_url'`
- Corrected mount type mismatch (file vs directory) causing `OCI runtime create failed` errors
- Prevented duplicate or partial postings when retry logic triggered multiple concurrent requests
- "Unknown event" warnings now safely ignored (no longer break seeding loop)
- Confirmed full dual-provider operation in logs (`api.openai.com` + `10.0.0.3:11434/api/chat`)
**[Lyra-Mem0 0.3.1] - 2025-10-03**
- `.env` CRLF vs LF line ending issues
- Local seeding now possible via HuggingFace server
**[Lyra-Mem0 0.3.0]**
- Resolved container boot failure caused by missing `ollama` dependency (`ModuleNotFoundError`)
- Fixed config overwrite issue where rebuilding container restored stock `main.py`
- Worked around Neo4j error (`vector.similarity.cosine(): mismatched vector dimensions`) by confirming OpenAI vs. Ollama embedding vector sizes
**[Lyra-Mem0 v0.2.1]**
- Seeder process originally failed on old memories — now skips duplicates and continues batch
- Resolved container boot error (`ModuleNotFoundError: ollama`) by extending image
- Fixed overwrite issue where stock `main.py` replaced custom config during rebuild
- Worked around Neo4j `vector.similarity.cosine()` dimension mismatch
### Known Issues
**[Lyra-Core v0.3.0] - 2025-09-26**
- Small models (e.g. Qwen2-0.5B) tend to over-classify as "salient"
- Phi-3.5-mini sometimes returns truncated tokens ("sali", "fi")
- CPU-only inference is functional but limited; larger models recommended once GPU available
**[Lyra-Cortex v0.2.0] - 2025-09-26**
- Small models tend to drift or over-classify
- CPU-only 7B+ models expected to be slow; GPU passthrough recommended for larger models
- Need to set up `systemd` service for `llama-server` to auto-start on VM reboot
### Observations
**[Lyra-Mem0 0.3.2] - 2025-10-05**
- Stable GPU utilization: ~8 GB VRAM @ 92% load, ≈ 67°C under sustained seeding
- Next revision will re-format seed JSON to preserve `role` context (user vs assistant)
**[Lyra-Mem0 v0.2.1]**
- To fully unify embedding modes, a Hugging Face / local model with **1536-dim embeddings** will be needed (to match OpenAI's schema)
- Current Ollama model (`mxbai-embed-large`) works, but returns 1024-dim vectors
- Seeder workflow validated but should be wrapped in repeatable weekly run for full Cloud→Local sync
### Next Steps
**[Lyra-Core 0.3.1] - 2025-10-09**
- Add salience visualization (e.g., memory weights displayed in injected system message)
- Begin schema alignment with NVGRAM v0.1.2 for confidence scoring
- Add relay auto-retry for transient 500 responses from NVGRAM
**[NVGRAM 0.1.1] - 2025-10-08**
- Integrate salience scoring and embedding confidence weight fields in Postgres schema
- Begin testing with full Lyra Relay + Persona Sidecar pipeline for live session memory recall
- Migrate from the deprecated `on_event` → `lifespan` pattern in 0.1.2
**[NVGRAM 0.1.0] - 2025-10-07**
- Integrate NVGRAM as new default backend in Lyra Relay
- Deprecate remaining Mem0 references and archive old configs
- Begin versioning as standalone project (`nvgram-core`, `nvgram-api`, etc.)
**[Intake v0.1.0] - 2025-10-27**
- Feed intake into NeoMem
- Generate daily/hourly overall summaries (e.g., "Today Brian and Lyra worked on x, y, and z")
- Generate session-aware summaries with own intake hopper
---
## [0.2.x] - 2025-09-30 to 2025-09-24
### Added
**[Lyra-Mem0 v0.2.0] - 2025-09-30**
- Standalone **Lyra-Mem0** stack created at `~/lyra-mem0/`
- Includes Postgres (pgvector), Qdrant, Neo4j, and SQLite for history tracking
- Added working `docker-compose.mem0.yml` and custom `Dockerfile` for building Mem0 API server
- Verified REST API functionality
- `POST /memories` works for adding memories
- `POST /search` works for semantic search
- Successful end-to-end test with persisted memory: *"Likes coffee in the morning"* → retrievable via search ✅
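The verified round trip can be reproduced with two REST calls; the endpoints match the list above, while the base URL and payload field names are assumptions based on typical Mem0 OSS usage:
```python
# Sketch of the verified add/search round trip against the Mem0 API server.
# Base URL and payload field names are assumptions.
import httpx

MEM0_URL = "http://localhost:8000"   # adjust to wherever the Mem0 API container is published

def add_and_search():
    with httpx.Client(timeout=30.0) as client:
        client.post(f"{MEM0_URL}/memories", json={
            "messages": [{"role": "user", "content": "Likes coffee in the morning"}],
            "user_id": "brian",
        }).raise_for_status()
        results = client.post(f"{MEM0_URL}/search", json={
            "query": "what does Brian drink in the morning?",
            "user_id": "brian",
        }).json()
    return results

print(add_and_search())
```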
**[Lyra-Core v0.2.0] - 2025-09-24**
- Migrated Relay to use `mem0ai` SDK instead of raw fetch calls
- Implemented `sessionId` support (client-supplied, fallback to `default`)
- Added debug logs for memory add/search
- Cleaned up Relay structure for clarity
### Changed
**[Lyra-Mem0 v0.2.0] - 2025-09-30**
- Split architecture into modular stacks:
- `~/lyra-core` (Relay, Persona-Sidecar, etc.)
- `~/lyra-mem0` (Mem0 OSS memory stack)
- Removed old embedded mem0 containers from Lyra-Core compose file
- Added Lyra-Mem0 section in README.md
### Next Steps
**[Lyra-Mem0 v0.2.0] - 2025-09-30**
- Wire **Relay → Mem0 API** (integration not yet complete)
- Add integration tests to verify persistence and retrieval from within Lyra-Core
---
## [0.1.x] - 2025-09-25 to 2025-09-23
### Added
**[Lyra_RAG v0.1.0] - 2025-11-07**
- Initial standalone RAG module for Project Lyra
- Persistent ChromaDB vector store (`./chromadb`)
- Importer `rag_chat_import.py` with:
- Recursive folder scanning and category tagging
- Smart chunking (~5k chars)
- SHA-1 deduplication and chat-ID metadata
- Timestamp fields (`file_modified`, `imported_at`)
- Background-safe operation (`nohup`/`tmux`)
- 68 Lyra-category chats imported:
- 6,556 new chunks added
- 1,493 duplicates skipped
- 7,997 total vectors stored
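The importer's dedup step amounts to hashing each chunk and skipping IDs already present in the store; the chunk size and metadata fields follow the list above, while the ChromaDB wiring is illustrative:
```python
# Sketch of the chunk + SHA-1 dedup step used during import.
import hashlib
import time
import chromadb

client = chromadb.PersistentClient(path="./chromadb")
collection = client.get_or_create_collection("lyra_chats")

def chunk_text(text: str, size: int = 5000) -> list[str]:
    return [text[i:i + size] for i in range(0, len(text), size)]

def import_chat(text: str, chat_id: str, category: str) -> int:
    added = 0
    for chunk in chunk_text(text):
        digest = hashlib.sha1(chunk.encode("utf-8")).hexdigest()  # dedup key
        if collection.get(ids=[digest])["ids"]:
            continue                                              # duplicate, skip
        collection.add(
            ids=[digest],
            documents=[chunk],
            metadatas=[{"chat_id": chat_id, "category": category,
                        "imported_at": time.time()}],
        )
        added += 1
    return added
```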
**[Lyra_RAG v0.1.0 API] - 2025-11-07**
- `/rag/search` FastAPI endpoint implemented (port 7090)
- Supports natural-language queries and returns top related excerpts
- Added answer synthesis step using `gpt-4o-mini`
**[Lyra-Core v0.1.0] - 2025-09-23**
- First working MVP of **Lyra Core Relay**
- Relay service accepts `POST /v1/chat/completions` (OpenAI-compatible)
- Memory integration with Mem0:
- `POST /memories` on each user message
- `POST /search` before LLM call
- Persona Sidecar integration (`GET /current`)
- OpenAI GPT + Ollama (Mythomax) support in Relay
- Simple browser-based chat UI (talks to Relay at `http://<host>:7078`)
- `.env` standardization for Relay + Mem0 + Postgres + Neo4j
- Working Neo4j + Postgres backing stores for Mem0
- Initial MVP relay service with raw fetch calls to Mem0
- Dockerized with basic healthcheck
**[Lyra-Cortex v0.1.0] - 2025-09-25**
- First deployment as dedicated Proxmox VM (5 vCPU / 18 GB RAM / 100 GB SSD)
- Built **llama.cpp** with `llama-server` target via CMake
- Integrated **Phi-3.5 Mini Instruct (Uncensored, Q4_K_M GGUF)** model
- Verified API compatibility at `/v1/chat/completions`
- Local test successful via `curl` → ~523 token response generated
- Performance benchmark: ~11.5 tokens/sec (CPU-only on Ryzen 7800X)
- Confirmed usable for salience scoring, summarization, and lightweight reasoning
### Fixed
**[Lyra-Core v0.1.0] - 2025-09-23**
- Resolved crash loop in Neo4j by restricting env vars (`NEO4J_AUTH` only)
- Relay now correctly reads `MEM0_URL` and `MEM0_API_KEY` from `.env`
### Verified
**[Lyra_RAG v0.1.0] - 2025-11-07**
- Successful recall of Lyra-Core development history (v0.3.0 snapshot)
- Correct metadata and category tagging for all new imports
### Known Issues
**[Lyra-Core v0.1.0] - 2025-09-23**
- No feedback loop (thumbs up/down) yet
- Forget/delete flow is manual (via memory IDs)
- Memory latency ~14s depending on embedding model
### Next Planned
**[Lyra_RAG v0.1.0] - 2025-11-07**
- Optional `where` filter parameter for category/date queries
- Graceful "no results" handler for empty retrievals
- `rag_docs_import.py` for PDFs and other document types
---