# Project Lyra Changelog
All notable changes to Project Lyra.
Format based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/) and [Semantic Versioning](https://semver.org/).
---
## [Unreleased]
---
## [0.6.0] - 2025-12-18
### Added - Autonomy System (Phase 1 & 2)
**Autonomy Phase 1** - Self-Awareness & Planning Foundation
- **Executive Planning Module** [cortex/autonomy/executive/planner.py](cortex/autonomy/executive/planner.py)
- Autonomous goal setting and task planning capabilities
- Multi-step reasoning for complex objectives
- Integration with self-state tracking
- **Self-State Management** [cortex/data/self_state.json](cortex/data/self_state.json)
- Persistent state tracking across sessions
- Memory of past actions and outcomes
- Self-awareness metadata storage
- **Self Analyzer** [cortex/autonomy/self/analyzer.py](cortex/autonomy/self/analyzer.py)
- Analyzes own performance and decision patterns
- Identifies areas for improvement
- Tracks cognitive patterns over time
- **Test Suite** [cortex/tests/test_autonomy_phase1.py](cortex/tests/test_autonomy_phase1.py)
- Unit tests for phase 1 autonomy features
**Autonomy Phase 2** - Decision Making & Proactive Behavior
- **Autonomous Actions Module** [cortex/autonomy/actions/autonomous_actions.py](cortex/autonomy/actions/autonomous_actions.py)
- Self-initiated action execution
- Context-aware decision implementation
- Action logging and tracking
- **Pattern Learning System** [cortex/autonomy/learning/pattern_learner.py](cortex/autonomy/learning/pattern_learner.py)
- Learns from interaction patterns
- Identifies recurring user needs
- Adapts behavior based on learned patterns
- **Proactive Monitor** [cortex/autonomy/proactive/monitor.py](cortex/autonomy/proactive/monitor.py)
- Monitors system state for intervention opportunities
- Detects patterns requiring proactive response
- Background monitoring capabilities
- **Decision Engine** [cortex/autonomy/tools/decision_engine.py](cortex/autonomy/tools/decision_engine.py)
- Autonomous decision-making framework
- Weighs options and selects optimal actions
- Integrates with orchestrator for coordinated decisions
- **Orchestrator** [cortex/autonomy/tools/orchestrator.py](cortex/autonomy/tools/orchestrator.py)
- Coordinates multiple autonomy subsystems
- Manages tool selection and execution
- Handles NeoMem integration (with disable capability)
- **Test Suite** [cortex/tests/test_autonomy_phase2.py](cortex/tests/test_autonomy_phase2.py)
- Unit tests for phase 2 autonomy features
**Autonomy Phase 2.5** - Pipeline Refinement
- Tightened integration between autonomy modules and reasoning pipeline
- Enhanced self-state persistence and tracking
- Improved orchestrator reliability
- NeoMem integration refinements in vector store handling [neomem/neomem/vector_stores/qdrant.py](neomem/neomem/vector_stores/qdrant.py)
### Added - Documentation
- **Complete AI Agent Breakdown** [docs/PROJECT_LYRA_COMPLETE_BREAKDOWN.md](docs/PROJECT_LYRA_COMPLETE_BREAKDOWN.md)
- Comprehensive system architecture documentation
- Detailed component descriptions
- Data flow diagrams
- Integration points and API specifications
### Changed - Core Integration
- **Router Updates** [cortex/router.py](cortex/router.py)
- Integrated autonomy subsystems into main routing logic
- Added endpoints for autonomous decision-making
- Enhanced state management across requests
- **Reasoning Pipeline** [cortex/reasoning/reasoning.py](cortex/reasoning/reasoning.py)
- Integrated autonomy-aware reasoning
- Self-state consideration in reasoning process
- **Persona Layer** [cortex/persona/speak.py](cortex/persona/speak.py)
- Autonomy-aware response generation
- Self-state reflection in personality expression
- **Context Handling** [cortex/context.py](cortex/context.py)
- NeoMem disable capability for flexible deployment
### Changed - Development Environment
- Updated [.gitignore](.gitignore) for better workspace management
- Cleaned up VSCode settings
- Removed [.vscode/settings.json](.vscode/settings.json) from repository
### Technical Improvements
- Modular autonomy architecture with clear separation of concerns
- Test-driven development for new autonomy features
- Enhanced state persistence across system restarts
- Flexible NeoMem integration with enable/disable controls
### Architecture - Autonomy System Design
The autonomy system operates in layers:
1. **Executive Layer** - High-level planning and goal setting
2. **Decision Layer** - Evaluates options and makes choices
3. **Action Layer** - Executes autonomous decisions
4. **Learning Layer** - Adapts behavior based on patterns
5. **Monitoring Layer** - Proactive awareness of system state
All layers coordinate through the orchestrator and maintain state in `self_state.json`.
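A minimal sketch of that coordination, assuming a JSON-backed state file at the path above; the orchestrator class and its method names are illustrative, not the actual Cortex modules:
```python
# Illustrative sketch only: a JSON-backed self-state store plus a tiny
# orchestrator cycle. Class and method names are hypothetical.
import json
import time
from pathlib import Path

STATE_PATH = Path("cortex/data/self_state.json")  # path from the changelog

def load_self_state() -> dict:
    """Read persisted self-state, falling back to an empty structure."""
    if STATE_PATH.exists():
        return json.loads(STATE_PATH.read_text())
    return {"goals": [], "history": []}

def save_self_state(state: dict) -> None:
    """Persist self-state so it survives restarts."""
    STATE_PATH.parent.mkdir(parents=True, exist_ok=True)
    STATE_PATH.write_text(json.dumps(state, indent=2))

class Orchestrator:
    """Coordinates planning, decision, and action layers around shared state."""

    def run_cycle(self, observation: str) -> dict:
        state = load_self_state()
        goal = self.plan(observation, state)      # executive layer
        action = self.decide(goal, state)         # decision layer
        result = self.act(action)                 # action layer
        state.setdefault("history", []).append(   # learning/monitoring layers read this later
            {"ts": time.time(), "goal": goal, "action": action, "result": result}
        )
        save_self_state(state)
        return result

    def plan(self, observation, state):
        return f"respond-to:{observation}"

    def decide(self, goal, state):
        return {"type": "reply", "goal": goal}

    def act(self, action):
        return {"status": "ok", "action": action}
```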
---
## [0.5.2] - 2025-12-12
### Fixed - LLM Router & Async HTTP
- **Critical**: Replaced synchronous `requests` with async `httpx` in LLM router [cortex/llm/llm_router.py](cortex/llm/llm_router.py)
- Event loop blocking was causing timeouts and empty responses
- All three providers (MI50, Ollama, OpenAI) now use `await http_client.post()`
- Fixes "Expecting value: line 1 column 1 (char 0)" JSON parsing errors in intake
- **Critical**: Fixed missing `backend` parameter in intake summarization [cortex/intake/intake.py:285](cortex/intake/intake.py#L285)
- Was defaulting to PRIMARY (MI50) instead of respecting `INTAKE_LLM=SECONDARY`
- Now correctly uses configured backend (Ollama on 3090)
- **Relay**: Fixed session ID case mismatch [core/relay/server.js:87](core/relay/server.js#L87)
- UI sends `sessionId` (camelCase) but relay expected `session_id` (snake_case)
- Now accepts both variants: `req.body.session_id || req.body.sessionId`
- Custom session IDs now properly tracked instead of defaulting to "default"
### Added - Error Handling & Diagnostics
- Added comprehensive error handling in LLM router for all providers
- HTTPError, JSONDecodeError, KeyError, and generic Exception handling
- Detailed error messages with exception type and description
- Provider-specific error logging (mi50, ollama, openai)
- Added debug logging in intake summarization
- Logs LLM response length and preview
- Validates non-empty responses before JSON parsing
- Helps diagnose empty or malformed responses
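A rough sketch of the async call-plus-error-handling pattern described in these two sections, assuming `httpx` and an OpenAI-style completion response; the URL and response fields are placeholders, not the literal `llm_router.py`:
```python
# Sketch of an async provider call with the error classes listed above.
# Payload shape and response fields are illustrative.
import json
import httpx

http_client = httpx.AsyncClient(timeout=120.0)  # shared client, 120 s like all providers

async def call_provider(url: str, payload: dict, provider: str) -> str:
    try:
        resp = await http_client.post(url, json=payload)  # non-blocking, unlike requests.post
        resp.raise_for_status()
        data = resp.json()
        return data["choices"][0]["text"]
    except httpx.HTTPError as exc:
        return f"[{provider}] HTTP error: {type(exc).__name__}: {exc}"
    except json.JSONDecodeError as exc:
        return f"[{provider}] invalid JSON in response: {exc}"
    except KeyError as exc:
        return f"[{provider}] unexpected response shape, missing {exc}"
    except Exception as exc:  # last-resort catch so failures are logged, not silent
        return f"[{provider}] unexpected error: {type(exc).__name__}: {exc}"
```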
### Added - Session Management
- Added session persistence endpoints in relay [core/relay/server.js:160-171](core/relay/server.js#L160-L171)
- `GET /sessions/:id` - Retrieve session history
- `POST /sessions/:id` - Save session history
- In-memory storage using Map (ephemeral, resets on container restart)
- Fixes UI "Failed to load session" errors
### Changed - Provider Configuration
- Added `mi50` provider support for llama.cpp server [cortex/llm/llm_router.py:62-81](cortex/llm/llm_router.py#L62-L81)
- Uses `/completion` endpoint with `n_predict` parameter
- Extracts `content` field from response
- Configured for MI50 GPU with DeepSeek model
- Increased memory retrieval threshold from 0.78 to 0.90 [cortex/.env:20](cortex/.env#L20)
- Filters out low-relevance memories (only returns 90%+ similarity)
- Reduces noise in context retrieval
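For the llama.cpp `/completion` endpoint specifically, the request and parse step could look roughly like this (host, port, and sampling settings are assumptions):
```python
# Sketch of a llama.cpp-server call using /completion with n_predict and
# extracting the "content" field, as described above. Host is illustrative.
import httpx

async def call_mi50(prompt: str, max_tokens: int = 512) -> str:
    payload = {
        "prompt": prompt,
        "n_predict": max_tokens,   # llama.cpp uses n_predict rather than max_tokens
        "temperature": 0.7,
    }
    async with httpx.AsyncClient(timeout=120.0) as client:
        resp = await client.post("http://mi50-host:8080/completion", json=payload)
        resp.raise_for_status()
        return resp.json()["content"]  # llama.cpp returns generated text in "content"
```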
### Technical Improvements
- Unified async HTTP handling across all LLM providers
- Better separation of concerns between provider implementations
- Improved error messages for debugging LLM API failures
- Consistent timeout handling (120 seconds for all providers)
---
## [0.5.1] - 2025-12-11
### Fixed - Intake Integration
- **Critical**: Fixed `bg_summarize()` function not defined error
- Was only a `TYPE_CHECKING` stub, now implemented as logging stub
- Eliminated `NameError` preventing SESSIONS from persisting correctly
- Function now logs exchange additions and defers summarization to `/reason` endpoint
- **Critical**: Fixed `/ingest` endpoint unreachable code in [router.py:201-233](cortex/router.py#L201-L233)
- Removed early return that prevented `update_last_assistant_message()` from executing
- Removed duplicate `add_exchange_internal()` call
- Implemented lenient error handling (each operation wrapped in try/except)
- **Intake**: Added missing `__init__.py` to make intake a proper Python package [cortex/intake/__init__.py](cortex/intake/__init__.py)
- Prevents namespace package issues
- Enables proper module imports
- Exports `SESSIONS`, `add_exchange_internal`, `summarize_context`
### Added - Diagnostics & Debugging
- Added diagnostic logging to verify SESSIONS singleton behavior
- Module initialization logs SESSIONS object ID [intake.py:14](cortex/intake/intake.py#L14)
- Each `add_exchange_internal()` call logs object ID and buffer state [intake.py:343-358](cortex/intake/intake.py#L343-L358)
- Added `/debug/sessions` HTTP endpoint [router.py:276-305](cortex/router.py#L276-L305)
- Inspect SESSIONS from within running Uvicorn worker
- Shows total sessions, session count, buffer sizes, recent exchanges
- Returns SESSIONS object ID for verification
- Added `/debug/summary` HTTP endpoint [router.py:238-271](cortex/router.py#L238-L271)
- Test `summarize_context()` for any session
- Returns L1/L5/L10/L20/L30 summaries
- Includes buffer size and exchange preview
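A minimal sketch of such a debug endpoint in FastAPI; the response fields in the real `router.py` may differ:
```python
# Illustrative FastAPI debug endpoint for inspecting an in-process SESSIONS dict.
from fastapi import FastAPI

app = FastAPI()
SESSIONS: dict[str, list[dict]] = {}  # session_id -> list of exchanges

@app.get("/debug/sessions")
async def debug_sessions():
    return {
        "sessions_object_id": id(SESSIONS),   # verify the singleton is shared
        "total_sessions": len(SESSIONS),
        "buffers": {
            sid: {
                "size": len(exchanges),
                "last_exchange": exchanges[-1] if exchanges else None,
            }
            for sid, exchanges in SESSIONS.items()
        },
    }
```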
### Changed - Intake Architecture
- **Intake is no longer a standalone service** - it runs inside the Cortex container as a pure Python module
- Imported as `from intake.intake import add_exchange_internal, SESSIONS`
- No HTTP calls between Cortex and Intake
- Eliminates network latency and dependency on Intake service being up
- **Deferred summarization**: `bg_summarize()` is now a no-op stub [intake.py:318-325](cortex/intake/intake.py#L318-L325)
- Actual summarization happens during `/reason` call via `summarize_context()`
- Simplifies async/sync complexity
- Prevents NameError when called from `add_exchange_internal()`
- **Lenient error handling**: `/ingest` endpoint always returns success [router.py:201-233](cortex/router.py#L201-L233)
- Each operation wrapped in try/except
- Logs errors but never fails to avoid breaking chat pipeline
- User requirement: never fail chat pipeline
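The lenient pattern might look roughly like this; the helper names mirror the changelog entries above, but their bodies here are placeholder stubs:
```python
# Sketch of the "never fail the chat pipeline" /ingest pattern: every step is
# wrapped in its own try/except and the endpoint always reports success.
import logging
from fastapi import FastAPI, Request

log = logging.getLogger("cortex.ingest")
app = FastAPI()

def add_exchange_internal(session_id, user_msg, assistant_msg):
    """Placeholder stub for the intake function of the same name."""

def update_last_assistant_message(session_id, assistant_msg):
    """Placeholder stub for the router helper of the same name."""

@app.post("/ingest")
async def ingest(request: Request):
    body = await request.json()
    try:
        add_exchange_internal(body["session_id"], body["user"], body["assistant"])
    except Exception:
        log.exception("add_exchange_internal failed")   # logged, never raised
    try:
        update_last_assistant_message(body["session_id"], body["assistant"])
    except Exception:
        log.exception("update_last_assistant_message failed")
    return {"status": "ok"}   # the chat pipeline never sees an ingest failure
```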
### Documentation
- Added single-worker constraint note in [cortex/Dockerfile:7-8](cortex/Dockerfile#L7-L8)
- Documents that SESSIONS requires single Uvicorn worker
- Notes that multi-worker scaling requires Redis or shared storage
- Updated plan documentation with root cause analysis
---
## [0.5.0] - 2025-11-28
### Fixed - Critical API Wiring & Integration
After the major architectural rewire (v0.4.x), this release fixes all critical endpoint mismatches and ensures end-to-end system connectivity.
#### Cortex → Intake Integration
- **Fixed** `IntakeClient` to use correct Intake v0.2 API endpoints
- Changed `GET /context/{session_id}` → `GET /summaries?session_id={session_id}`
- Updated JSON response parsing to extract `summary_text` field
- Fixed environment variable name: `INTAKE_API` → `INTAKE_API_URL`
- Corrected default port: `7083` → `7080`
- Added deprecation warning to `summarize_turn()` method (endpoint removed in Intake v0.2)
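A sketch of the corrected Cortex → Intake call, assuming `httpx`; only the endpoint, query parameter, `summary_text` field, and `INTAKE_API_URL` default come from this changelog:
```python
# Sketch of the corrected Cortex -> Intake request: GET /summaries with a
# session_id query parameter, reading summary_text from the response.
import os
import httpx

INTAKE_API_URL = os.getenv("INTAKE_API_URL", "http://intake:7080")

async def fetch_summaries(session_id: str) -> str:
    async with httpx.AsyncClient(timeout=30.0) as client:
        resp = await client.get(
            f"{INTAKE_API_URL}/summaries",
            params={"session_id": session_id},
        )
        resp.raise_for_status()
        return resp.json().get("summary_text", "")
```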
#### Relay → UI Compatibility
- **Added** OpenAI-compatible endpoint `POST /v1/chat/completions`
- Accepts standard OpenAI format with `messages[]` array
- Returns OpenAI-compatible response structure with `choices[]`
- Extracts last message content from messages array
- Includes usage metadata (stub values for compatibility)
- **Refactored** Relay to use shared `handleChatRequest()` function
- Both `/chat` and `/v1/chat/completions` use same core logic
- Eliminates code duplication
- Consistent error handling across endpoints
#### Relay → Intake Connection
- **Fixed** Intake URL fallback in Relay server configuration
- Corrected port: `7082` → `7080`
- Updated endpoint: `/summary` → `/add_exchange`
- Now properly sends exchanges to Intake for summarization
#### Code Quality & Python Package Structure
- **Added** missing `__init__.py` files to all Cortex subdirectories
- `cortex/llm/__init__.py`
- `cortex/reasoning/__init__.py`
- `cortex/persona/__init__.py`
- `cortex/ingest/__init__.py`
- `cortex/utils/__init__.py`
- Improves package imports and IDE support
- **Removed** unused import in `cortex/router.py`: `from unittest import result`
- **Deleted** empty file `cortex/llm/resolve_llm_url.py` (was 0 bytes, never implemented)
### Verified Working
Complete end-to-end message flow now operational:
```
UI → Relay (/v1/chat/completions)
Relay → Cortex (/reason)
Cortex → Intake (/summaries) [retrieves context]
Cortex 4-stage pipeline:
1. reflection.py → meta-awareness notes
2. reasoning.py → draft answer
3. refine.py → polished answer
4. persona/speak.py → Lyra personality
Cortex → Relay (returns persona response)
Relay → Intake (/add_exchange) [async summary]
Intake → NeoMem (background memory storage)
Relay → UI (final response)
```
### Documentation
- **Added** comprehensive v0.5.0 changelog entry
- **Updated** README.md to reflect v0.5.0 architecture
- Documented new endpoints
- Updated data flow diagrams
- Clarified Intake v0.2 changes
- Corrected service descriptions
### Issues Resolved
- ❌ Cortex could not retrieve context from Intake (wrong endpoint)
- ❌ UI could not send messages to Relay (endpoint mismatch)
- ❌ Relay could not send summaries to Intake (wrong port/endpoint)
- ❌ Python package imports were implicit (missing __init__.py)
### Known Issues (Non-Critical)
- Session management endpoints not implemented in Relay (`GET/POST /sessions/:id`)
- RAG service currently disabled in docker-compose.yml
- Cortex `/ingest` endpoint is a stub returning `{"status": "ok"}`
### Migration Notes
If upgrading from v0.4.x:
1. Pull latest changes from git
2. Verify environment variables in `.env` files:
- Check `INTAKE_API_URL=http://intake:7080` (not `INTAKE_API`)
- Verify all service URLs use correct ports
3. Restart Docker containers: `docker-compose down && docker-compose up -d`
4. Test with a simple message through the UI
---
## [Infrastructure v1.0.0] - 2025-11-26
### Changed - Environment Variable Consolidation
**Major reorganization to eliminate duplication and improve maintainability**
- Consolidated 9 scattered `.env` files into single source of truth architecture
- Root `.env` now contains all shared infrastructure (LLM backends, databases, API keys, service URLs)
- Service-specific `.env` files minimized to only essential overrides:
- `cortex/.env`: Reduced from 42 to 22 lines (operational parameters only)
- `neomem/.env`: Reduced from 26 to 14 lines (LLM naming conventions only)
- `intake/.env`: Kept at 8 lines (already minimal)
- **Result**: ~24% reduction in total configuration lines (197 → ~150)
**Docker Compose Consolidation**
- All services now defined in single root `docker-compose.yml`
- Relay service updated with complete configuration (env_file, volumes)
- Removed redundant `core/docker-compose.yml` (marked as DEPRECATED)
- Standardized network communication to use Docker container names
**Service URL Standardization**
- Internal services use container names: `http://neomem-api:7077`, `http://cortex:7081`
- External services use IP addresses: `http://10.0.0.43:8000` (vLLM), `http://10.0.0.3:11434` (Ollama)
- Removed IP/container name inconsistencies across files
### Added - Security & Documentation
**Security Templates** - Created `.env.example` files for all services
- Root `.env.example` with sanitized credentials
- Service-specific templates: `cortex/.env.example`, `neomem/.env.example`, `intake/.env.example`, `rag/.env.example`
- All `.env.example` files safe to commit to version control
**Documentation**
- `ENVIRONMENT_VARIABLES.md`: Comprehensive reference for all environment variables
- Variable descriptions, defaults, and usage examples
- Multi-backend LLM strategy documentation
- Troubleshooting guide
- Security best practices
- `DEPRECATED_FILES.md`: Deletion guide for deprecated files with verification steps
**Enhanced .gitignore**
- Ignores all `.env` files (including subdirectories)
- Tracks `.env.example` templates for documentation
- Ignores `.env-backups/` directory
### Removed
- `core/.env` - Redundant with root `.env`, now deleted
- `core/docker-compose.yml` - Consolidated into main compose file (marked DEPRECATED)
### Fixed
- Eliminated duplicate `OPENAI_API_KEY` across 5+ files
- Eliminated duplicate LLM backend URLs across 4+ files
- Eliminated duplicate database credentials across 3+ files
- Resolved Cortex `environment:` section override in docker-compose (now uses env_file)
### Architecture - Multi-Backend LLM Strategy
Root `.env` provides all backend OPTIONS (PRIMARY, SECONDARY, CLOUD, FALLBACK), services choose which to USE:
- **Cortex** → vLLM (PRIMARY) for autonomous reasoning
- **NeoMem** → Ollama (SECONDARY) + OpenAI embeddings
- **Intake** → vLLM (PRIMARY) for summarization
- **Relay** → Fallback chain with user preference
Preserves per-service flexibility while eliminating URL duplication.
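In code this amounts to each service reading the shared backend URLs and picking one via its own setting; the variable names below appear elsewhere in this changelog, while `LLM_FALLBACK_URL` and the helper itself are assumptions:
```python
# Sketch: shared backend OPTIONS come from the root .env, and a per-service
# setting decides which one to USE. The choose_backend helper is hypothetical.
import os

BACKENDS = {
    "PRIMARY": os.getenv("LLM_PRIMARY_URL"),      # vLLM
    "SECONDARY": os.getenv("LLM_SECONDARY_URL"),  # Ollama
    "CLOUD": os.getenv("LLM_CLOUD_URL"),          # OpenAI
    "FALLBACK": os.getenv("LLM_FALLBACK_URL"),    # llama.cpp CPU (name assumed)
}

def choose_backend(service_setting: str, default: str = "PRIMARY") -> str:
    """Return the URL for the backend a service asked for, e.g. INTAKE_LLM=SECONDARY."""
    name = (service_setting or default).upper()
    return BACKENDS.get(name) or BACKENDS[default]

# e.g. Intake: choose_backend(os.getenv("INTAKE_LLM", "PRIMARY"))
```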
### Migration
- All original `.env` files backed up to `.env-backups/` with timestamp `20251126_025334`
- Rollback plan documented in `ENVIRONMENT_VARIABLES.md`
- Verification steps provided in `DEPRECATED_FILES.md`
---
## [0.4.x] - 2025-11-13
### Added - Multi-Stage Reasoning Pipeline
**Cortex v0.5 - Complete architectural overhaul**
- **New `reasoning.py` module**
- Async reasoning engine
- Accepts user prompt, identity, RAG block, and reflection notes
- Produces draft internal answers
- Uses primary backend (vLLM)
- **New `reflection.py` module**
- Fully async meta-awareness layer
- Produces actionable JSON "internal notes"
- Enforces strict JSON schema and fallback parsing
- Forces cloud backend (`backend_override="cloud"`)
- **Integrated `refine.py` into pipeline**
- New stage between reflection and persona
- Runs exclusively on primary vLLM backend (MI50)
- Produces final, internally consistent output for downstream persona layer
- **Backend override system**
- Each LLM call can now select its own backend
- Enables multi-LLM cognition: Reflection → cloud, Reasoning → primary
- **Identity loader**
- Added `identity.py` with `load_identity()` for consistent persona retrieval
- **Ingest handler**
- Async stub created for future Intake → NeoMem → RAG pipeline
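A sketch of how a per-call `backend_override` can work, with reflection forced onto the cloud backend while reasoning and refine stay on the primary vLLM endpoint; the dispatch table and payload details are illustrative, not the literal `llm_router.py`:
```python
# Sketch of per-call backend overrides. Only "primary" and "cloud" are shown;
# URLs come from env vars that hold FULL endpoints and are used as-is.
import os
import httpx

URLS = {
    "primary": os.getenv("LLM_PRIMARY_URL", "http://10.0.0.43:8000/v1/completions"),
    "cloud": os.getenv("LLM_CLOUD_URL", "https://api.openai.com/v1/chat/completions"),
}

async def call_llm(prompt: str, backend_override: str | None = None) -> str:
    backend = backend_override or "primary"
    url = URLS[backend]
    if backend == "cloud":
        headers = {"Authorization": f"Bearer {os.getenv('OPENAI_API_KEY', '')}"}
        payload = {"model": os.getenv("LLM_CLOUD_MODEL", ""),
                   "messages": [{"role": "user", "content": prompt}]}
    else:
        headers = {}
        payload = {"model": os.getenv("LLM_PRIMARY_MODEL", ""),
                   "prompt": prompt, "max_tokens": 512}
    async with httpx.AsyncClient(timeout=120.0) as client:
        resp = await client.post(url, json=payload, headers=headers)
        resp.raise_for_status()
        data = resp.json()
    if backend == "cloud":
        return data["choices"][0]["message"]["content"]   # chat-completions shape
    return data["choices"][0]["text"]                      # completions shape

# Reflection would call: await call_llm(prompt, backend_override="cloud")
```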
**Cortex v0.4.1 - RAG Integration**
- **RAG integration**
- Added `rag.py` with `query_rag()` and `format_rag_block()`
- Cortex now queries local RAG API (`http://10.0.0.41:7090/rag/search`)
- Synthesized answers and top excerpts injected into reasoning prompt
### Changed - Unified LLM Architecture
**Cortex v0.5**
- **Unified LLM backend URL handling across Cortex**
- ENV variables must now contain FULL API endpoints
- Removed all internal path-appending (e.g. `.../v1/completions`)
- `llm_router.py` rewritten to use env-provided URLs as-is
- Ensures consistent behavior between draft, reflection, refine, and persona
- **Rebuilt `main.py`**
- Removed old annotation/analysis logic
- New structure: load identity → get RAG → reflect → reason → return draft+notes
- Routes now clean and minimal (`/reason`, `/ingest`, `/health`)
- Async path throughout Cortex
- **Refactored `llm_router.py`**
- Removed old fallback logic during overrides
- OpenAI requests now use `/v1/chat/completions`
- Added proper OpenAI Authorization headers
- Distinct payload format for vLLM vs OpenAI
- Unified, correct parsing across models
- **Simplified Cortex architecture**
- Removed deprecated "context.py" and old reasoning code
- Relay completely decoupled from smart behavior
- **Updated environment specification**
- `LLM_PRIMARY_URL` now set to `http://10.0.0.43:8000/v1/completions`
- `LLM_SECONDARY_URL` remains `http://10.0.0.3:11434/api/generate` (Ollama)
- `LLM_CLOUD_URL` set to `https://api.openai.com/v1/chat/completions`
**Cortex v0.4.1**
- **Revised `/reason` endpoint**
- Now builds unified context blocks: [Intake] → recent summaries, [RAG] → contextual knowledge, [User Message] → current input
- Calls `call_llm()` for first pass, then `reflection_loop()` for meta-evaluation
- Returns `cortex_prompt`, `draft_output`, `final_output`, and normalized reflection
- **Reflection Pipeline Stability**
- Cleaned parsing to normalize JSON vs. text reflections
- Added fallback handling for malformed or non-JSON outputs
- Log system improved to show raw JSON, extracted fields, and normalized summary
- **Async Summarization (Intake v0.2.1)**
- Intake summaries now run in background threads to avoid blocking Cortex
- Summaries (L1 → L∞) logged asynchronously with [BG] tags
- **Environment & Networking Fixes**
- Verified `.env` variables propagate correctly inside Cortex container
- Confirmed Docker network connectivity between Cortex, Intake, NeoMem, and RAG
- Adjusted localhost calls to service-IP mapping
- **Behavioral Updates**
- Cortex now performs conversation reflection (on user intent) and self-reflection (on its own answers)
- RAG context successfully grounds reasoning outputs
- Intake and NeoMem confirmed receiving summaries via `/add_exchange`
- Log clarity pass: all reflective and contextual blocks clearly labeled
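The unified context block described for the revised `/reason` endpoint could be assembled roughly like this; the labels follow the changelog, while the helper and example strings are placeholders:
```python
# Sketch of assembling the [Intake] / [RAG] / [User Message] context blocks
# fed into the first reasoning pass. The helper is a placeholder.
def build_cortex_prompt(intake_summaries: str, rag_block: str, user_message: str) -> str:
    sections = []
    if intake_summaries:
        sections.append(f"[Intake]\n{intake_summaries}")      # recent summaries
    if rag_block:
        sections.append(f"[RAG]\n{rag_block}")                # contextual knowledge
    sections.append(f"[User Message]\n{user_message}")        # current input
    return "\n\n".join(sections)

prompt = build_cortex_prompt(
    intake_summaries="L5: Brian and Lyra debugged the relay session IDs.",
    rag_block="Relay exposes an OpenAI-compatible /v1/chat/completions endpoint.",
    user_message="What did we fix in the relay yesterday?",
)
```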
### Fixed
**Cortex v0.5**
- Resolved endpoint conflict where router expected base URLs and refine expected full URLs
- Fixed by standardizing full-URL behavior across entire system
- Reflection layer no longer fails silently (previously returned `[""]` due to MythoMax)
- Resolved 404/401 errors caused by incorrect OpenAI URL endpoints
- No more double-routing through vLLM during reflection
- Corrected async/sync mismatch in multiple locations
- Eliminated double-path bug (`/v1/completions/v1/completions`) caused by previous router logic
### Removed
**Cortex v0.5**
- Legacy `annotate`, `reason_check` glue logic from old architecture
- Old backend probing junk code
- Stale imports and unused modules leftover from previous prototype
### Verified
**Cortex v0.5**
- Cortex → vLLM (MI50) → refine → final_output now functioning correctly
- Refine shows `used_primary_backend: true` and no fallback
- Manual curl test confirms endpoint accuracy
### Known Issues
**Cortex v0.5**
- Refine sometimes prefixes output with `"Final Answer:"`; next version will sanitize this
- Hallucinations in draft_output persist due to weak grounding (fix in reasoning + RAG planned)
**Cortex v0.4.1**
- NeoMem tuning needed - improve retrieval latency and relevance
- Need dedicated `/reflections/recent` endpoint for Cortex
- Migrate to Cortex-first ingestion (Relay → Cortex → NeoMem)
- Add persistent reflection recall (use prior reflections as meta-context)
- Improve reflection JSON structure ("insight", "evaluation", "next_action" → guaranteed fields)
- Tighten temperature and prompt control for factual consistency
- RAG optimization: add source ranking, filtering, multi-vector hybrid search
- Cache RAG responses per session to reduce duplicate calls
### Notes
**Cortex v0.5**
This is the largest structural change to Cortex so far. It establishes:
- Multi-model cognition
- Clean layering
- Identity + reflection separation
- Correct async code
- Deterministic backend routing
- Predictable JSON reflection
The system is now ready for:
- Refinement loops
- Persona-speaking layer
- Containerized RAG
- Long-term memory integration
- True emergent-behavior experiments
---
## [0.3.x] - 2025-10-28 to 2025-09-26
### Added
**[Lyra Core v0.3.2 + Web UI v0.2.0] - 2025-10-28**
- **New UI**
- Cleaned up UI look and feel
- **Sessions**
- Sessions now persist over time
- Ability to create new sessions or load sessions from a previous instance
- Switching sessions updates what is sent to Relay, so prompts never include messages from other sessions
- Relay correctly wired in
**[Lyra-Core 0.3.1] - 2025-10-09**
- **NVGRAM Integration (Full Pipeline Reconnected)**
- Replaced legacy Mem0 service with NVGRAM microservice (`nvgram-api` @ port 7077)
- Updated `server.js` in Relay to route all memory ops via `${NVGRAM_API}/memories` and `/search`
- Added `.env` variable: `NVGRAM_API=http://nvgram-api:7077`
- Verified end-to-end Lyra conversation persistence: `relay → nvgram-api → postgres/neo4j → relay → ollama → ui`
- ✅ Memories stored, retrieved, and re-injected successfully
**[Lyra-Core v0.3.0] - 2025-09-26**
- **Salience filtering** in Relay
- `.env` configurable: `SALIENCE_ENABLED`, `SALIENCE_MODE`, `SALIENCE_MODEL`, `SALIENCE_API_URL`
- Supports `heuristic` and `llm` classification modes
- LLM-based salience filter integrated with Cortex VM running `llama-server`
- Logging improvements
- Added debug logs for salience mode, raw LLM output, and unexpected outputs
- Fail-closed behavior for unexpected LLM responses
- Successfully tested with **Phi-3.5-mini** and **Qwen2-0.5B-Instruct** as salience classifiers
- Verified end-to-end flow: Relay → salience filter → Mem0 add/search → Persona injection → LLM reply
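An outline of an LLM-backed salience gate with fail-closed handling, written in Python for consistency with the other sketches even though the filter lives in Relay; the label vocabulary and the reading of "fail closed" as "skip the memory add" are assumptions, only the `SALIENCE_MODE` toggle comes from the changelog:
```python
# Sketch of a salience gate: heuristic mode or an LLM classifier, with
# unexpected classifier output treated as "do not store" (fail closed).
import os

def is_salient(message: str, classify) -> bool:
    mode = os.getenv("SALIENCE_MODE", "heuristic")
    if mode == "heuristic":
        return len(message.split()) > 4          # trivial stand-in heuristic
    raw = classify(message)                      # e.g. llama-server call returning a label
    label = raw.strip().lower()
    if label.startswith("salient"):
        return True
    if label.startswith("not"):
        return False
    return False                                 # fail closed on unexpected output

print(is_salient("Remember that my birthday is in June", classify=lambda m: "salient"))
```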
**[Cortex v0.3.0] - 2025-10-31**
- **Cortex Service (FastAPI)**
- New standalone reasoning engine (`cortex/main.py`) with endpoints:
- `GET /health` reports active backend + NeoMem status
- `POST /reason` evaluates `{prompt, response}` pairs
- `POST /annotate` experimental text analysis
- Background NeoMem health monitor (5-minute interval)
- **Multi-Backend Reasoning Support**
- Environment-driven backend selection via `LLM_FORCE_BACKEND`
- Supports: Primary (vLLM MI50), Secondary (Ollama 3090), Cloud (OpenAI), Fallback (llama.cpp CPU)
- Per-backend model variables: `LLM_PRIMARY_MODEL`, `LLM_SECONDARY_MODEL`, `LLM_CLOUD_MODEL`, `LLM_FALLBACK_MODEL`
- **Response Normalization Layer**
- Implemented `normalize_llm_response()` to merge streamed outputs and repair malformed JSON
- Handles Ollama's multi-line streaming and Mythomax's missing punctuation issues
- Prints concise debug previews of merged content
- **Environment Simplification**
- Each service (`intake`, `cortex`, `neomem`) now maintains its own `.env` file
- Removed reliance on shared/global env file to prevent cross-contamination
- Verified Docker Compose networking across containers
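The normalization layer mentioned under Cortex v0.3.0 above might be sketched as follows, assuming Ollama-style newline-delimited streaming chunks; the field names and repair heuristics are simplified placeholders:
```python
# Sketch of normalize_llm_response(): merge newline-delimited streaming chunks
# into one string and tolerate malformed JSON lines.
import json

def normalize_llm_response(raw: str) -> str:
    pieces = []
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            chunk = json.loads(line)
            # Ollama streams {"response": "..."}; other backends may use "text"/"content"
            pieces.append(chunk.get("response") or chunk.get("text") or chunk.get("content") or "")
        except json.JSONDecodeError:
            pieces.append(line)          # keep malformed chunks as plain text
    merged = "".join(pieces).strip()
    print(f"[normalize] merged {len(pieces)} chunks, preview: {merged[:80]!r}")
    return merged
```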
**[NeoMem 0.1.2] - 2025-10-27** (formerly NVGRAM)
- **Renamed NVGRAM to NeoMem**
- All future updates under name NeoMem
- Features unchanged
**[NVGRAM 0.1.1] - 2025-10-08**
- **Async Memory Rewrite (Stability + Safety Patch)**
- Introduced `AsyncMemory` class with fully asynchronous vector and graph store writes
- Added input sanitation to prevent embedding errors (`'list' object has no attribute 'replace'`)
- Implemented `flatten_messages()` helper in API layer to clean malformed payloads
- Added structured request logging via `RequestLoggingMiddleware` (FastAPI middleware)
- Health endpoint (`/health`) returns structured JSON `{status, version, service}`
- Startup logs include sanitized embedder config with masked API keys
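A sketch of the kind of payload sanitation described above: list-valued message content is flattened to plain strings before it reaches the embedder, which is the input shape that triggered the `'list' object has no attribute 'replace'` error. The signature is illustrative:
```python
# Sketch of a flatten_messages()-style sanitizer: content arriving as lists or
# other non-string types is coerced to plain strings before embedding.
def flatten_messages(messages: list[dict]) -> list[dict]:
    cleaned = []
    for msg in messages:
        content = msg.get("content", "")
        if isinstance(content, list):
            content = " ".join(str(part) for part in content)   # join multi-part content
        elif not isinstance(content, str):
            content = str(content)
        cleaned.append({"role": msg.get("role", "user"), "content": content})
    return cleaned

print(flatten_messages([{"role": "user", "content": ["hello", "world"]}]))
```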
**[NVGRAM 0.1.0] - 2025-10-07**
- **Initial fork of Mem0 → NVGRAM**
- Created fully independent local-first memory engine based on Mem0 OSS
- Renamed all internal modules, Docker services, and environment variables from `mem0` → `nvgram`
- New service name: `nvgram-api`, default port 7077
- Maintains same API endpoints (`/memories`, `/search`) for drop-in compatibility
- Uses FastAPI, Postgres, and Neo4j as persistent backends
**[Lyra-Mem0 0.3.2] - 2025-10-05**
- **Ollama LLM reasoning** alongside OpenAI embeddings
- Introduced `LLM_PROVIDER=ollama`, `LLM_MODEL`, and `OLLAMA_HOST` in `.env.3090`
- Verified local 3090 setup using `qwen2.5:7b-instruct-q4_K_M`
- Split processing: Embeddings → OpenAI `text-embedding-3-small`, LLM → Local Ollama
- Added `.env.3090` template for self-hosted inference nodes
- Integrated runtime diagnostics and seeder progress tracking
- File-level + message-level progress bars
- Retry/back-off logic for timeouts (3 attempts)
- Event logging (`ADD / UPDATE / NONE`) for every memory record
- Expanded Docker health checks for Postgres, Qdrant, and Neo4j containers
- Added GPU-friendly long-run configuration for continuous seeding (validated on RTX 3090)
**[Lyra-Mem0 0.3.1] - 2025-10-03**
- HuggingFace TEI integration (local 3090 embedder)
- Dual-mode environment switch between OpenAI cloud and local
- CSV export of memories from Postgres (`payload->>'data'`)
**[Lyra-Mem0 0.3.0]**
- **Ollama embeddings** in Mem0 OSS container
- Configure `EMBEDDER_PROVIDER=ollama`, `EMBEDDER_MODEL`, `OLLAMA_HOST` via `.env`
- Mounted `main.py` override from host into container to load custom `DEFAULT_CONFIG`
- Installed `ollama` Python client into custom API container image
- `.env.3090` file for external embedding mode (3090 machine)
- Workflow for multiple embedding modes: LAN-based 3090/Ollama, Local-only CPU, OpenAI fallback
**[Lyra-Mem0 v0.2.1]**
- **Seeding pipeline**
- Built Python seeder script to bulk-insert raw Cloud Lyra exports into Mem0
- Implemented incremental seeding option (skip existing memories, only add new ones)
- Verified insert process with Postgres-backed history DB
**[Intake v0.1.0] - 2025-10-27**
- Receives messages from Relay and summarizes them in a cascading format
- Continues summarizing small batches of exchanges while also generating large-scale conversational summaries (L20)
- Currently logs summaries to .log file in `/project-lyra/intake-logs/`
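The cascading format can be pictured as summarizing the same buffer at several window sizes; the level set below mirrors the L1/L5/L10/L20 levels referenced elsewhere in this changelog, and `summarize()` stands in for the actual LLM call:
```python
# Sketch of cascading summaries: each level condenses the most recent N
# exchanges; summarize() is a placeholder for the real LLM call.
def summarize(exchanges: list[str]) -> str:
    return f"summary of {len(exchanges)} exchange(s)"   # placeholder

def cascading_summaries(buffer: list[str], levels=(1, 5, 10, 20)) -> dict[str, str]:
    out = {}
    for n in levels:
        if len(buffer) >= n:
            out[f"L{n}"] = summarize(buffer[-n:])       # last n exchanges
    return out

print(cascading_summaries([f"exchange {i}" for i in range(12)]))
```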
**[Lyra-Cortex v0.2.0] - 2025-09-26**
- Integrated **llama-server** on dedicated Cortex VM (Proxmox)
- Verified Phi-3.5-mini-instruct_Uncensored-Q4_K_M running with 8 vCPUs
- Benchmarked Phi-3.5-mini performance: ~18 tokens/sec CPU-only on Ryzen 7 7800X
- Salience classification functional but sometimes inconsistent
- Tested **Qwen2-0.5B-Instruct GGUF** as alternative salience classifier
- Much faster throughput (~350 tokens/sec prompt, ~100 tokens/sec eval)
- More responsive but over-classifies messages as "salient"
- Established `.env` integration for model ID (`SALIENCE_MODEL`), enabling hot-swap between models
### Changed
**[Lyra-Core 0.3.1] - 2025-10-09**
- Renamed `MEM0_URL` → `NVGRAM_API` across all relay environment configs
- Updated Docker Compose service dependency order
- `relay` now depends on `nvgram-api` healthcheck
- Removed `mem0` references and volumes
- Minor cleanup to Persona fetch block (null-checks and safer default persona string)
**[Lyra-Core v0.3.1] - 2025-09-27**
- Removed salience filter logic; Cortex is now default annotator
- All user messages stored in Mem0; no discard tier applied
- Cortex annotations (`metadata.cortex`) now attached to memories
- Debug logging improvements
- Pretty-print Cortex annotations
- Injected prompt preview
- Memory search hit list with scores
- `.env` toggle (`CORTEX_ENABLED`) to bypass Cortex when needed
**[Lyra-Core v0.3.0] - 2025-09-26**
- Refactored `server.js` to gate `mem.add()` calls behind salience filter
- Updated `.env` to support `SALIENCE_MODEL`
**[Cortex v0.3.0] - 2025-10-31**
- Refactored `reason_check()` to dynamically switch between **prompt** and **chat** mode depending on backend
- Enhanced startup logs to announce active backend, model, URL, and mode
- Improved error handling with clearer "Reasoning error" messages
**[NVGRAM 0.1.1] - 2025-10-08**
- Replaced synchronous `Memory.add()` with async-safe version supporting concurrent vector + graph writes
- Normalized indentation and cleaned duplicate `main.py` references
- Removed redundant `FastAPI()` app reinitialization
- Updated internal logging to INFO-level timing format
- Deprecated `@app.on_event("startup")` → will migrate to `lifespan` handler in v0.1.2
**[NVGRAM 0.1.0] - 2025-10-07**
- Removed dependency on external `mem0ai` SDK — all logic now local
- Re-pinned requirements: fastapi==0.115.8, uvicorn==0.34.0, pydantic==2.10.4, python-dotenv==1.0.1, psycopg>=3.2.8, ollama
- Adjusted `docker-compose` and `.env` templates to use new NVGRAM naming
**[Lyra-Mem0 0.3.2] - 2025-10-05**
- Updated `main.py` configuration block to load `LLM_PROVIDER`, `LLM_MODEL`, `OLLAMA_BASE_URL`
- Fallback to OpenAI if Ollama unavailable
- Adjusted `docker-compose.yml` mount paths to correctly map `/app/main.py`
- Normalized `.env` loading so `mem0-api` and host environment share identical values
- Improved seeder logging and progress telemetry
- Added explicit `temperature` field to `DEFAULT_CONFIG['llm']['config']`
**[Lyra-Mem0 0.3.0]**
- `docker-compose.yml` updated to mount local `main.py` and `.env.3090`
- Built custom Dockerfile (`mem0-api-server:latest`) extending base image with `pip install ollama`
- Updated `requirements.txt` to include `ollama` package
- Adjusted Mem0 container config so `main.py` pulls environment variables with `dotenv`
- Tested new embeddings path with curl `/memories` API call
**[Lyra-Mem0 v0.2.1]**
- Updated `main.py` to load configuration from `.env` using `dotenv` and support multiple embedder backends
- Mounted host `main.py` into container so local edits persist across rebuilds
- Updated `docker-compose.yml` to mount `.env.3090` and support swap between profiles
- Built custom Dockerfile (`mem0-api-server:latest`) including `pip install ollama`
- Updated `requirements.txt` with `ollama` dependency
- Adjusted startup flow so container automatically connects to external Ollama host (LAN IP)
- Added logging to confirm model pulls and embedding requests
### Fixed
**[Lyra-Core 0.3.1] - 2025-10-09**
- Relay startup no longer crashes when NVGRAM is unavailable — deferred connection handling
- `/memories` POST failures no longer crash Relay; now logged gracefully as `relay error Error: memAdd failed: 500`
- Improved injected prompt debugging (`DEBUG_PROMPT=true` now prints clean JSON)
**[Lyra-Core v0.3.1] - 2025-09-27**
- Parsing failures from Markdown-wrapped Cortex JSON via fence cleaner
- Relay no longer "hangs" on malformed Cortex outputs
**[Cortex v0.3.0] - 2025-10-31**
- Corrected broken vLLM endpoint routing (`/v1/completions`)
- Stabilized cross-container health reporting for NeoMem
- Resolved JSON parse failures caused by streaming chunk delimiters
**[NVGRAM 0.1.1] - 2025-10-08**
- Eliminated repeating 500 error from OpenAI embedder caused by non-string message content
- Masked API key leaks from boot logs
- Ensured Neo4j reconnects gracefully on first retry
**[Lyra-Mem0 0.3.2] - 2025-10-05**
- Resolved crash during startup: `TypeError: OpenAIConfig.__init__() got an unexpected keyword argument 'ollama_base_url'`
- Corrected mount type mismatch (file vs directory) causing `OCI runtime create failed` errors
- Prevented duplicate or partial postings when retry logic triggered multiple concurrent requests
- "Unknown event" warnings now safely ignored (no longer break seeding loop)
- Confirmed full dual-provider operation in logs (`api.openai.com` + `10.0.0.3:11434/api/chat`)
**[Lyra-Mem0 0.3.1] - 2025-10-03**
- `.env` CRLF vs LF line ending issues
- Local seeding now possible via HuggingFace server
**[Lyra-Mem0 0.3.0]**
- Resolved container boot failure caused by missing `ollama` dependency (`ModuleNotFoundError`)
- Fixed config overwrite issue where rebuilding container restored stock `main.py`
- Worked around Neo4j error (`vector.similarity.cosine(): mismatched vector dimensions`) by confirming OpenAI vs. Ollama embedding vector sizes
**[Lyra-Mem0 v0.2.1]**
- Seeder process originally failed on old memories — now skips duplicates and continues batch
- Resolved container boot error (`ModuleNotFoundError: ollama`) by extending image
- Fixed overwrite issue where stock `main.py` replaced custom config during rebuild
- Worked around Neo4j `vector.similarity.cosine()` dimension mismatch
### Known Issues
**[Lyra-Core v0.3.0] - 2025-09-26**
- Small models (e.g. Qwen2-0.5B) tend to over-classify as "salient"
- Phi-3.5-mini sometimes returns truncated tokens ("sali", "fi")
- CPU-only inference is functional but limited; larger models recommended once GPU available
**[Lyra-Cortex v0.2.0] - 2025-09-26**
- Small models tend to drift or over-classify
- CPU-only 7B+ models expected to be slow; GPU passthrough recommended for larger models
- Need to set up `systemd` service for `llama-server` to auto-start on VM reboot
### Observations
**[Lyra-Mem0 0.3.2] - 2025-10-05**
- Stable GPU utilization: ~8 GB VRAM @ 92% load, ≈ 67°C under sustained seeding
- Next revision will re-format seed JSON to preserve `role` context (user vs assistant)
**[Lyra-Mem0 v0.2.1]**
- To fully unify embedding modes, a Hugging Face / local model with **1536-dim embeddings** will be needed (to match OpenAI's schema)
- Current Ollama model (`mxbai-embed-large`) works, but returns 1024-dim vectors
- Seeder workflow validated but should be wrapped in repeatable weekly run for full Cloud→Local sync
### Next Steps
**[Lyra-Core 0.3.1] - 2025-10-09**
- Add salience visualization (e.g., memory weights displayed in injected system message)
- Begin schema alignment with NVGRAM v0.1.2 for confidence scoring
- Add relay auto-retry for transient 500 responses from NVGRAM
**[NVGRAM 0.1.1] - 2025-10-08**
- Integrate salience scoring and embedding confidence weight fields in Postgres schema
- Begin testing with full Lyra Relay + Persona Sidecar pipeline for live session memory recall
- Migrate from the deprecated `on_event` → `lifespan` pattern in 0.1.2
**[NVGRAM 0.1.0] - 2025-10-07**
- Integrate NVGRAM as new default backend in Lyra Relay
- Deprecate remaining Mem0 references and archive old configs
- Begin versioning as standalone project (`nvgram-core`, `nvgram-api`, etc.)
**[Intake v0.1.0] - 2025-10-27**
- Feed intake into NeoMem
- Generate daily/hourly overall summaries (e.g., "Today Brian and Lyra worked on x, y, and z")
- Generate session-aware summaries with own intake hopper
---
## [0.2.x] - 2025-09-30 to 2025-09-24
### Added
**[Lyra-Mem0 v0.2.0] - 2025-09-30**
- Standalone **Lyra-Mem0** stack created at `~/lyra-mem0/`
- Includes Postgres (pgvector), Qdrant, Neo4j, and SQLite for history tracking
- Added working `docker-compose.mem0.yml` and custom `Dockerfile` for building Mem0 API server
- Verified REST API functionality
- `POST /memories` works for adding memories
- `POST /search` works for semantic search
- Successful end-to-end test with persisted memory: *"Likes coffee in the morning"* → retrievable via search ✅
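The verified round trip can be reproduced with two REST calls; the endpoints match the list above, while the base URL and payload field names are assumptions based on typical Mem0 OSS usage:
```python
# Sketch of the verified add/search round trip against the Mem0 API server.
# Base URL and payload field names are assumptions.
import httpx

MEM0_URL = "http://localhost:8000"   # adjust to wherever the Mem0 API container is published

def add_and_search():
    with httpx.Client(timeout=30.0) as client:
        client.post(f"{MEM0_URL}/memories", json={
            "messages": [{"role": "user", "content": "Likes coffee in the morning"}],
            "user_id": "brian",
        }).raise_for_status()
        results = client.post(f"{MEM0_URL}/search", json={
            "query": "what does Brian drink in the morning?",
            "user_id": "brian",
        }).json()
    return results

print(add_and_search())
```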
**[Lyra-Core v0.2.0] - 2025-09-24**
- Migrated Relay to use `mem0ai` SDK instead of raw fetch calls
- Implemented `sessionId` support (client-supplied, fallback to `default`)
- Added debug logs for memory add/search
- Cleaned up Relay structure for clarity
### Changed
**[Lyra-Mem0 v0.2.0] - 2025-09-30**
- Split architecture into modular stacks:
- `~/lyra-core` (Relay, Persona-Sidecar, etc.)
- `~/lyra-mem0` (Mem0 OSS memory stack)
- Removed old embedded mem0 containers from Lyra-Core compose file
- Added Lyra-Mem0 section in README.md
### Next Steps
**[Lyra-Mem0 v0.2.0] - 2025-09-30**
- Wire **Relay → Mem0 API** (integration not yet complete)
- Add integration tests to verify persistence and retrieval from within Lyra-Core
---
## [0.1.x] - 2025-09-25 to 2025-09-23
### Added
**[Lyra_RAG v0.1.0] - 2025-11-07**
- Initial standalone RAG module for Project Lyra
- Persistent ChromaDB vector store (`./chromadb`)
- Importer `rag_chat_import.py` with:
- Recursive folder scanning and category tagging
- Smart chunking (~5k chars)
- SHA-1 deduplication and chat-ID metadata
- Timestamp fields (`file_modified`, `imported_at`)
- Background-safe operation (`nohup`/`tmux`)
- 68 Lyra-category chats imported:
- 6,556 new chunks added
- 1,493 duplicates skipped
- 7,997 total vectors stored
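The importer's dedup step amounts to hashing each chunk and skipping IDs already present in the store; the chunk size and metadata fields follow the list above, while the ChromaDB wiring is illustrative:
```python
# Sketch of the chunk + SHA-1 dedup step used during import.
import hashlib
import time
import chromadb

client = chromadb.PersistentClient(path="./chromadb")
collection = client.get_or_create_collection("lyra_chats")

def chunk_text(text: str, size: int = 5000) -> list[str]:
    return [text[i:i + size] for i in range(0, len(text), size)]

def import_chat(text: str, chat_id: str, category: str) -> int:
    added = 0
    for chunk in chunk_text(text):
        digest = hashlib.sha1(chunk.encode("utf-8")).hexdigest()  # dedup key
        if collection.get(ids=[digest])["ids"]:
            continue                                              # duplicate, skip
        collection.add(
            ids=[digest],
            documents=[chunk],
            metadatas=[{"chat_id": chat_id, "category": category,
                        "imported_at": time.time()}],
        )
        added += 1
    return added
```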
**[Lyra_RAG v0.1.0 API] - 2025-11-07**
- `/rag/search` FastAPI endpoint implemented (port 7090)
- Supports natural-language queries and returns top related excerpts
- Added answer synthesis step using `gpt-4o-mini`
**[Lyra-Core v0.1.0] - 2025-09-23**
- First working MVP of **Lyra Core Relay**
- Relay service accepts `POST /v1/chat/completions` (OpenAI-compatible)
- Memory integration with Mem0:
- `POST /memories` on each user message
- `POST /search` before LLM call
- Persona Sidecar integration (`GET /current`)
- OpenAI GPT + Ollama (Mythomax) support in Relay
- Simple browser-based chat UI (talks to Relay at `http://<host>:7078`)
- `.env` standardization for Relay + Mem0 + Postgres + Neo4j
- Working Neo4j + Postgres backing stores for Mem0
- Initial MVP relay service with raw fetch calls to Mem0
- Dockerized with basic healthcheck
**[Lyra-Cortex v0.1.0] - 2025-09-25**
- First deployment as dedicated Proxmox VM (5 vCPU / 18 GB RAM / 100 GB SSD)
- Built **llama.cpp** with `llama-server` target via CMake
- Integrated **Phi-3.5 Mini Instruct (Uncensored, Q4_K_M GGUF)** model
- Verified API compatibility at `/v1/chat/completions`
- Local test successful via `curl` → ~523 token response generated
- Performance benchmark: ~11.5 tokens/sec (CPU-only on Ryzen 7800X)
- Confirmed usable for salience scoring, summarization, and lightweight reasoning
### Fixed
**[Lyra-Core v0.1.0] - 2025-09-23**
- Resolved crash loop in Neo4j by restricting env vars (`NEO4J_AUTH` only)
- Relay now correctly reads `MEM0_URL` and `MEM0_API_KEY` from `.env`
### Verified
**[Lyra_RAG v0.1.0] - 2025-11-07**
- Successful recall of Lyra-Core development history (v0.3.0 snapshot)
- Correct metadata and category tagging for all new imports
### Known Issues
**[Lyra-Core v0.1.0] - 2025-09-23**
- No feedback loop (thumbs up/down) yet
- Forget/delete flow is manual (via memory IDs)
- Memory latency ~14s depending on embedding model
### Next Planned
**[Lyra_RAG v0.1.0] - 2025-11-07**
- Optional `where` filter parameter for category/date queries
- Graceful "no results" handler for empty retrievals
- `rag_docs_import.py` for PDFs and other document types
---