Project Lyra — Modular Changelog

All notable changes to Project Lyra are organized by component. The format is based on Keep a Changelog and adheres to Semantic Versioning.

Last Updated: 2025-11-28


🧠 Lyra-Core

[Project Lyra v0.5.0] - 2025-11-28

🔧 Fixed - Critical API Wiring & Integration

After the major architectural rewire (v0.4.x), this release fixes all critical endpoint mismatches and ensures end-to-end system connectivity.

Cortex → Intake Integration

  • Fixed IntakeClient to use correct Intake v0.2 API endpoints (see the sketch after this list)
    • Changed GET /context/{session_id} → GET /summaries?session_id={session_id}
    • Updated JSON response parsing to extract the summary_text field
    • Fixed environment variable name: INTAKE_API → INTAKE_API_URL
    • Corrected default port: 7083 → 7080
    • Added deprecation warning to summarize_turn() method (endpoint removed in Intake v0.2)
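
A minimal client-side sketch of the corrected call, assuming the response is either a list of records or a `summaries` list, each carrying a `summary_text` field; only the endpoint, query parameter, env variable, and default port come from this entry, the rest is illustrative:

```python
# Sketch (not the actual IntakeClient): fetch context from Intake v0.2.
import os
import httpx

INTAKE_API_URL = os.getenv("INTAKE_API_URL", "http://intake:7080")

async def fetch_summaries(session_id: str) -> list[str]:
    async with httpx.AsyncClient() as client:
        resp = await client.get(
            f"{INTAKE_API_URL}/summaries",
            params={"session_id": session_id},
        )
        resp.raise_for_status()
        data = resp.json()
        # Extract summary_text from each record (list shape assumed).
        records = data if isinstance(data, list) else data.get("summaries", [])
        return [r["summary_text"] for r in records if "summary_text" in r]
```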

Relay → UI Compatibility

  • Added OpenAI-compatible endpoint POST /v1/chat/completions (example request after this list)
    • Accepts standard OpenAI format with messages[] array
    • Returns OpenAI-compatible response structure with choices[]
    • Extracts last message content from messages array
    • Includes usage metadata (stub values for compatibility)
  • Refactored Relay to use shared handleChatRequest() function
    • Both /chat and /v1/chat/completions use same core logic
    • Eliminates code duplication
    • Consistent error handling across endpoints
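
An illustrative client call against the new endpoint. The Relay host/port and model name are placeholders, and the response nesting is assumed to follow OpenAI's standard chat schema; the changelog only guarantees messages[], choices[], and stub usage values:

```python
# Sketch: calling Relay's OpenAI-compatible endpoint from any OpenAI-style client.
import requests

RELAY_URL = "http://localhost:7078"  # placeholder; use your Relay host/port

payload = {
    "model": "lyra",  # placeholder model name
    "messages": [{"role": "user", "content": "Hello, Lyra."}],
}

resp = requests.post(f"{RELAY_URL}/v1/chat/completions", json=payload, timeout=120)
resp.raise_for_status()
body = resp.json()

# Relay extracts the last message's content and returns an OpenAI-shaped reply.
print(body["choices"][0]["message"]["content"])
print(body.get("usage"))  # stub token counts, included for compatibility
```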

Relay → Intake Connection

  • Fixed Intake URL fallback in Relay server configuration (see the sketch after this list)
    • Corrected port: 7082 → 7080
    • Updated endpoint: /summary → /add_exchange
    • Now properly sends exchanges to Intake for summarization
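
A rough sketch of the hand-off to Intake. Relay itself is Node.js; Python is used here only to illustrate the call shape, and the payload field names are assumptions (only the host, port, and /add_exchange path come from this entry):

```python
# Sketch: Relay-side exchange hand-off; summarization runs in Intake's background.
import requests

INTAKE_URL = "http://intake:7080"

def send_exchange(session_id: str, user_msg: str, assistant_msg: str) -> None:
    requests.post(
        f"{INTAKE_URL}/add_exchange",
        json={
            "session_id": session_id,    # assumed field name
            "user": user_msg,            # assumed field name
            "assistant": assistant_msg,  # assumed field name
        },
        timeout=10,
    )
```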

Code Quality & Python Package Structure

  • Added missing __init__.py files to all Cortex subdirectories
    • cortex/llm/__init__.py
    • cortex/reasoning/__init__.py
    • cortex/persona/__init__.py
    • cortex/ingest/__init__.py
    • cortex/utils/__init__.py
    • Improves package imports and IDE support
  • Removed unused import in cortex/router.py: from unittest import result
  • Deleted empty file cortex/llm/resolve_llm_url.py (was 0 bytes, never implemented)

Verified Working

Complete end-to-end message flow now operational:

UI → Relay (/v1/chat/completions)
  ↓
Relay → Cortex (/reason)
  ↓
Cortex → Intake (/summaries) [retrieves context]
  ↓
Cortex 4-stage pipeline:
  1. reflection.py → meta-awareness notes
  2. reasoning.py → draft answer
  3. refine.py → polished answer
  4. persona/speak.py → Lyra personality
  ↓
Cortex → Relay (returns persona response)
  ↓
Relay → Intake (/add_exchange) [async summary]
  ↓
Intake → NeoMem (background memory storage)
  ↓
Relay → UI (final response)

📝 Documentation

  • Added this CHANGELOG entry with comprehensive v0.5.0 notes
  • Updated README.md to reflect v0.5.0 architecture
    • Documented new endpoints
    • Updated data flow diagrams
    • Clarified Intake v0.2 changes
    • Corrected service descriptions

🐛 Issues Resolved

  • Cortex could not retrieve context from Intake (wrong endpoint)
  • UI could not send messages to Relay (endpoint mismatch)
  • Relay could not send summaries to Intake (wrong port/endpoint)
  • Python package imports were implicit (missing __init__.py files)

⚠️ Known Issues (Non-Critical)

  • Session management endpoints not implemented in Relay (GET/POST /sessions/:id)
  • RAG service currently disabled in docker-compose.yml
  • Cortex /ingest endpoint is a stub returning {"status": "ok"}

🎯 Migration Notes

If upgrading from v0.4.x:

  1. Pull latest changes from git
  2. Verify environment variables in .env files:
    • Check INTAKE_API_URL=http://intake:7080 (not INTAKE_API)
    • Verify all service URLs use correct ports
  3. Restart Docker containers: docker-compose down && docker-compose up -d
  4. Test with a simple message through the UI

[Infrastructure v1.0.0] - 2025-11-26

Changed

  • Environment Variable Consolidation - Major reorganization to eliminate duplication and improve maintainability

    • Consolidated 9 scattered .env files into single source of truth architecture
    • Root .env now contains all shared infrastructure (LLM backends, databases, API keys, service URLs)
    • Service-specific .env files minimized to only essential overrides:
      • cortex/.env: Reduced from 42 to 22 lines (operational parameters only)
      • neomem/.env: Reduced from 26 to 14 lines (LLM naming conventions only)
      • intake/.env: Kept at 8 lines (already minimal)
    • Result: ~24% reduction in total configuration lines (197 → ~150)
  • Docker Compose Consolidation

    • All services now defined in single root docker-compose.yml
    • Relay service updated with complete configuration (env_file, volumes)
    • Removed redundant core/docker-compose.yml (marked as DEPRECATED)
    • Standardized network communication to use Docker container names
  • Service URL Standardization

    • Internal services use container names: http://neomem-api:7077, http://cortex:7081
    • External services use IP addresses: http://10.0.0.43:8000 (vLLM), http://10.0.0.3:11434 (Ollama)
    • Removed IP/container name inconsistencies across files

Added

  • Security Templates - Created .env.example files for all services

    • Root .env.example with sanitized credentials
    • Service-specific templates: cortex/.env.example, neomem/.env.example, intake/.env.example, rag/.env.example
    • All .env.example files safe to commit to version control
  • Documentation

    • ENVIRONMENT_VARIABLES.md: Comprehensive reference for all environment variables
      • Variable descriptions, defaults, and usage examples
      • Multi-backend LLM strategy documentation
      • Troubleshooting guide
      • Security best practices
    • DEPRECATED_FILES.md: Deletion guide for deprecated files with verification steps
  • Enhanced .gitignore

    • Ignores all .env files (including subdirectories)
    • Tracks .env.example templates for documentation
    • Ignores .env-backups/ directory

Removed

  • core/.env - Redundant with root .env, now deleted
  • core/docker-compose.yml - Consolidated into main compose file (marked DEPRECATED)

Fixed

  • Eliminated duplicate OPENAI_API_KEY across 5+ files
  • Eliminated duplicate LLM backend URLs across 4+ files
  • Eliminated duplicate database credentials across 3+ files
  • Resolved Cortex environment: section override in docker-compose (now uses env_file)

Architecture

  • Multi-Backend LLM Strategy: Root .env provides all backend OPTIONS (PRIMARY, SECONDARY, CLOUD, FALLBACK), services choose which to USE
    • Cortex → vLLM (PRIMARY) for autonomous reasoning
    • NeoMem → Ollama (SECONDARY) + OpenAI embeddings
    • Intake → vLLM (PRIMARY) for summarization
    • Relay → Fallback chain with user preference
  • Preserves per-service flexibility while eliminating URL duplication (see the sketch below)
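
A minimal sketch of the "options vs. use" pattern: the root .env defines every backend, and each service picks one by role. The variable names follow the LLM_*_URL / LLM_*_MODEL convention used elsewhere in this changelog; the select_backend() helper is illustrative, not actual project code:

```python
# Sketch: root .env provides backend OPTIONS, services choose which to USE.
import os

BACKENDS = {
    "primary":   ("LLM_PRIMARY_URL",   "LLM_PRIMARY_MODEL"),    # vLLM
    "secondary": ("LLM_SECONDARY_URL", "LLM_SECONDARY_MODEL"),  # Ollama
    "cloud":     ("LLM_CLOUD_URL",     "LLM_CLOUD_MODEL"),      # OpenAI
    "fallback":  ("LLM_FALLBACK_URL",  "LLM_FALLBACK_MODEL"),   # llama.cpp
}

def select_backend(role: str) -> tuple[str, str]:
    url_var, model_var = BACKENDS[role]
    return os.environ[url_var], os.environ[model_var]

# e.g. Cortex reasoning uses the primary backend:
reason_url, reason_model = select_backend("primary")
```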

Migration

  • All original .env files backed up to .env-backups/ with timestamp 20251126_025334
  • Rollback plan documented in ENVIRONMENT_VARIABLES.md
  • Verification steps provided in DEPRECATED_FILES.md

[Lyra_RAG v0.1.0] 2025-11-07

Added

  • Initial standalone RAG module for Project Lyra.
  • Persistent ChromaDB vector store (./chromadb).
  • Importer rag_chat_import.py with:
    • Recursive folder scanning and category tagging.
    • Smart chunking (~5k chars).
    • SHA-1 deduplication and chat-ID metadata (see the sketch after this list).
    • Timestamp fields (file_modified, imported_at).
    • Background-safe operation (nohup/tmux).
  • 68 Lyra-category chats imported:
    • 6 556 new chunks added
    • 1 493 duplicates skipped
    • 7 997 total vectors now stored.
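
An illustrative sketch of the importer's chunk-and-dedupe strategy, not the actual rag_chat_import.py: ~5k-character chunks, SHA-1 hashes as stable IDs so re-imports skip duplicates, and the per-chunk metadata fields named above:

```python
# Sketch: chunking + SHA-1 dedup + metadata, as described in this entry.
import hashlib
from datetime import datetime, timezone

CHUNK_SIZE = 5000  # ~5k characters per chunk

def chunk_text(text: str, size: int = CHUNK_SIZE) -> list[str]:
    return [text[i:i + size] for i in range(0, len(text), size)]

def make_records(text: str, chat_id: str, category: str, file_modified: str):
    for chunk in chunk_text(text):
        digest = hashlib.sha1(chunk.encode("utf-8")).hexdigest()
        yield {
            "id": digest,  # duplicate chunks hash to the same ID and are skipped
            "document": chunk,
            "metadata": {
                "chat_id": chat_id,
                "category": category,
                "file_modified": file_modified,
                "imported_at": datetime.now(timezone.utc).isoformat(),
            },
        }
```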

API

  • /rag/search FastAPI endpoint implemented (port 7090).
  • Supports natural-language queries and returns top related excerpts.
  • Added answer synthesis step using gpt-4o-mini.

Verified

  • Successful recall of Lyra-Core development history (v0.3.0 snapshot).
  • Correct metadata and category tagging for all new imports.

Next Planned

  • Optional where filter parameter for category/date queries.
  • Graceful “no results” handler for empty retrievals.
  • rag_docs_import.py for PDFs and other document types.

[Lyra Core v0.3.2 + Web UI v0.2.0] - 2025-10-28

Added

  • **New UI**

    • Cleaned up UI look and feel.
  • **Added "sessions"**

    • Sessions now persist over time.
    • Ability to create new sessions or load sessions from a previous instance.
    • Changing the session updates what the prompt sends to Relay (messages from other sessions are not included).
    • Relay is correctly wired in.

[Lyra-Core 0.3.1] - 2025-10-09

Added

  • NVGRAM Integration (Full Pipeline Reconnected)
    • Replaced legacy Mem0 service with NVGRAM microservice (nvgram-api @ port 7077).
    • Updated server.js in Relay to route all memory ops via ${NVGRAM_API}/memories and /search.
    • Added .env variable:
      NVGRAM_API=http://nvgram-api:7077
      
    • Verified end-to-end Lyra conversation persistence:
      • relay → nvgram-api → postgres/neo4j → relay → ollama → ui
      • Memories stored, retrieved, and re-injected successfully.

Changed

  • Renamed MEM0_URL → NVGRAM_API across all relay environment configs.
  • Updated Docker Compose service dependency order:
    • relay now depends on nvgram-api healthcheck.
    • Removed mem0 references and volumes.
  • Minor cleanup to Persona fetch block (null-checks and safer default persona string).

Fixed

  • Relay startup no longer crashes when NVGRAM is unavailable — deferred connection handling.
  • /memories POST failures no longer crash Relay; now logged gracefully as relay error Error: memAdd failed: 500.
  • Improved injected prompt debugging (DEBUG_PROMPT=true now prints clean JSON).

Goals / Next Steps

  • Add salience visualization (e.g., memory weights displayed in injected system message).
  • Begin schema alignment with NVGRAM v0.1.2 for confidence scoring.
  • Add relay auto-retry for transient 500 responses from NVGRAM.

[Lyra-Core] v0.3.1 - 2025-09-27

Changed

  • Removed salience filter logic; Cortex is now the default annotator.
  • All user messages stored in Mem0; no discard tier applied.

Added

  • Cortex annotations (metadata.cortex) now attached to memories.
  • Debug logging improvements:
    • Pretty-print Cortex annotations
    • Injected prompt preview
    • Memory search hit list with scores
  • .env toggle (CORTEX_ENABLED) to bypass Cortex when needed.

Fixed

  • Parsing failures from Markdown-wrapped Cortex JSON via fence cleaner.
  • Relay no longer “hangs” on malformed Cortex outputs.

[Lyra-Core] v0.3.0 — 2025-09-26

Added

  • Implemented salience filtering in Relay:
    • .env configurable: SALIENCE_ENABLED, SALIENCE_MODE, SALIENCE_MODEL, SALIENCE_API_URL.
    • Supports heuristic and llm classification modes.
    • LLM-based salience filter integrated with Cortex VM running llama-server.
  • Logging improvements:
    • Added debug logs for salience mode, raw LLM output, and unexpected outputs.
    • Fail-closed behavior for unexpected LLM responses.
  • Successfully tested with Phi-3.5-mini and Qwen2-0.5B-Instruct as salience classifiers.
  • Verified end-to-end flow: Relay → salience filter → Mem0 add/search → Persona injection → LLM reply.

Changed

  • Refactored server.js to gate mem.add() calls behind salience filter.
  • Updated .env to support SALIENCE_MODEL.

Known Issues

  • Small models (e.g. Qwen2-0.5B) tend to over-classify as "salient".
  • Phi-3.5-mini sometimes returns truncated tokens ("sali", "fi").
  • CPU-only inference is functional but limited; larger models recommended once GPU is available.

[Lyra-Core] v0.2.0 — 2025-09-24

Added

  • Migrated Relay to use mem0ai SDK instead of raw fetch calls.
  • Implemented sessionId support (client-supplied, fallback to default).
  • Added debug logs for memory add/search.
  • Cleaned up Relay structure for clarity.

[Lyra-Core] v0.1.0 — 2025-09-23

Added

  • First working MVP of Lyra Core Relay.
  • Relay service accepts POST /v1/chat/completions (OpenAI-compatible).
  • Memory integration with Mem0:
    • POST /memories on each user message.
    • POST /search before LLM call.
  • Persona Sidecar integration (GET /current).
  • OpenAI GPT + Ollama (Mythomax) support in Relay.
  • Simple browser-based chat UI (talks to Relay at http://<host>:7078).
  • .env standardization for Relay + Mem0 + Postgres + Neo4j.
  • Working Neo4j + Postgres backing stores for Mem0.
  • Initial MVP relay service with raw fetch calls to Mem0.
  • Dockerized with basic healthcheck.

Fixed

  • Resolved crash loop in Neo4j by restricting env vars (NEO4J_AUTH only).
  • Relay now correctly reads MEM0_URL and MEM0_API_KEY from .env.

Known Issues

  • No feedback loop (thumbs up/down) yet.
  • Forget/delete flow is manual (via memory IDs).
  • Memory latency ~14s depending on embedding model.

🧩 lyra-neomem (used to be NVGRAM / Lyra-Mem0)

[NeoMem 0.1.2] - 2025-10-27

Changed

  • Renamed NVGRAM to neomem
    • All future updates will be under the name NeoMem.
    • Features have not changed.

[NVGRAM 0.1.1] - 2025-10-08

Added

  • Async Memory Rewrite (Stability + Safety Patch)
    • Introduced AsyncMemory class with fully asynchronous vector and graph store writes.
    • Added input sanitation to prevent embedding errors ('list' object has no attribute 'replace').
    • Implemented flatten_messages() helper in API layer to clean malformed payloads (see the sketch after this list).
    • Added structured request logging via RequestLoggingMiddleware (FastAPI middleware).
    • Health endpoint (/health) now returns structured JSON {status, version, service}.
    • Startup logs now include sanitized embedder config with API keys masked for safety:
      >>> Embedder config (sanitized): {'provider': 'openai', 'config': {'model': 'text-embedding-3-small', 'api_key': '***'}}
      ✅ Connected to Neo4j on attempt 1
      🧠 NVGRAM v0.1.1 — Neural Vectorized Graph Recall and Memory initialized
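
A sketch of the sanitation idea behind flatten_messages(): coerce list or dict message content into plain strings before it reaches the embedder, which is what prevented the `'list' object has no attribute 'replace'` error. The exact signature in NVGRAM may differ:

```python
# Sketch: flatten non-string message content so the embedder only sees strings.
def flatten_messages(messages: list[dict]) -> list[dict]:
    cleaned = []
    for msg in messages:
        content = msg.get("content", "")
        if isinstance(content, list):
            # e.g. multi-part content blocks -> join their text pieces
            content = " ".join(
                part.get("text", "") if isinstance(part, dict) else str(part)
                for part in content
            )
        elif not isinstance(content, str):
            content = str(content)
        cleaned.append({**msg, "content": content.strip()})
    return cleaned
```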
      

Changed

  • Replaced synchronous Memory.add() with async-safe version supporting concurrent vector + graph writes.
  • Normalized indentation and cleaned duplicate main.py references under /nvgram/ vs /nvgram/server/.
  • Removed redundant FastAPI() app reinitialization.
  • Updated internal logging to INFO-level timing format: 2025-10-08 21:48:45 [INFO] POST /memories -> 200 (11189.1 ms)
  • Deprecated @app.on_event("startup") (FastAPI deprecation warning) → will migrate to lifespan handler in v0.1.2.

Fixed

  • Eliminated repeating 500 error from OpenAI embedder caused by non-string message content.
  • Masked API key leaks from boot logs.
  • Ensured Neo4j reconnects gracefully on first retry.

Goals / Next Steps

  • Integrate salience scoring and embedding confidence weight fields in Postgres schema.
  • Begin testing with full Lyra Relay + Persona Sidecar pipeline for live session memory recall.
  • Migrate from deprecated on_event → lifespan pattern in 0.1.2.

[NVGRAM 0.1.0] - 2025-10-07

Added

  • Initial fork of Mem0 → NVGRAM:
    • Created a fully independent local-first memory engine based on Mem0 OSS.
    • Renamed all internal modules, Docker services, and environment variables from mem0 → nvgram.
    • New service name: nvgram-api, default port 7077.
    • Maintains same API endpoints (/memories, /search) for drop-in compatibility with Lyra Core.
    • Uses FastAPI, Postgres, and Neo4j as persistent backends.
    • Verified clean startup:
      ✅ Connected to Neo4j on attempt 1
      INFO: Uvicorn running on http://0.0.0.0:7077
      
    • /docs and /openapi.json confirmed reachable and functional.

Changed

  • Removed dependency on the external mem0ai SDK — all logic now local.
  • Re-pinned requirements:
    • fastapi==0.115.8
    • uvicorn==0.34.0
    • pydantic==2.10.4
    • python-dotenv==1.0.1
    • psycopg>=3.2.8
    • ollama
  • Adjusted docker-compose and .env templates to use new NVGRAM naming and image paths.

Goals / Next Steps

  • Integrate NVGRAM as the new default backend in Lyra Relay.
  • Deprecate remaining Mem0 references and archive old configs.
  • Begin versioning as a standalone project (nvgram-core, nvgram-api, etc.).

[Lyra-Mem0 0.3.2] - 2025-10-05

Added

  • Support for Ollama LLM reasoning alongside OpenAI embeddings:
    • Introduced LLM_PROVIDER=ollama, LLM_MODEL, and OLLAMA_HOST in .env.3090.
    • Verified local 3090 setup using qwen2.5:7b-instruct-q4_K_M.
    • Split processing pipeline:
      • Embeddings → OpenAI text-embedding-3-small
      • LLM → Local Ollama (http://10.0.0.3:11434/api/chat).
  • Added .env.3090 template for self-hosted inference nodes.
  • Integrated runtime diagnostics and seeder progress tracking:
    • File-level + message-level progress bars.
    • Retry/back-off logic for timeouts (3 attempts).
    • Event logging (ADD / UPDATE / NONE) for every memory record.
  • Expanded Docker health checks for Postgres, Qdrant, and Neo4j containers.
  • Added GPU-friendly long-run configuration for continuous seeding (validated on RTX 3090).

Changed

  • Updated main.py configuration block to load:
    • LLM_PROVIDER, LLM_MODEL, and OLLAMA_BASE_URL.
    • Fallback to OpenAI if Ollama unavailable.
  • Adjusted docker-compose.yml mount paths to correctly map /app/main.py.
  • Normalized .env loading so mem0-api and host environment share identical values.
  • Improved seeder logging and progress telemetry for clearer diagnostics.
  • Added explicit temperature field to DEFAULT_CONFIG['llm']['config'] for tuning future local inference runs (illustrative config sketch below).
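
An approximation of the dual-provider DEFAULT_CONFIG shape described above (OpenAI embeddings plus a local Ollama LLM). Key names mirror the sanitized config printed in the NVGRAM boot logs and the env variables listed in this entry; treat it as a sketch of main.py, not a copy of it:

```python
# Sketch: dual-provider Mem0 config — cloud embeddings, local Ollama reasoning.
import os

DEFAULT_CONFIG = {
    "embedder": {
        "provider": "openai",
        "config": {"model": "text-embedding-3-small"},
    },
    "llm": {
        "provider": os.getenv("LLM_PROVIDER", "ollama"),
        "config": {
            "model": os.getenv("LLM_MODEL", "qwen2.5:7b-instruct-q4_K_M"),
            "ollama_base_url": os.getenv("OLLAMA_BASE_URL", "http://10.0.0.3:11434"),
            "temperature": 0.1,  # explicit tuning knob added in this release
        },
    },
}
```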

Fixed

  • Resolved crash during startup: TypeError: OpenAIConfig.__init__() got an unexpected keyword argument 'ollama_base_url'.
  • Corrected mount type mismatch (file vs directory) causing OCI runtime create failed errors.
  • Prevented duplicate or partial postings when retry logic triggered multiple concurrent requests.
  • “Unknown event” warnings now safely ignored (no longer break seeding loop).
  • Confirmed full dual-provider operation in logs (api.openai.com + 10.0.0.3:11434/api/chat).

Observations

  • Stable GPU utilization: ~8 GB VRAM @ 92 % load, ≈ 67 °C under sustained seeding.
  • Next revision will re-format seed JSON to preserve role context (user vs assistant).

[Lyra-Mem0 0.3.1] - 2025-10-03

Added

  • HuggingFace TEI integration (local 3090 embedder).
  • Dual-mode environment switch between OpenAI cloud and local.
  • CSV export of memories from Postgres (payload->>'data').

Fixed

  • .env CRLF vs LF line ending issues.
  • Local seeding now possible via the running Hugging Face TEI server.

[Lyra-Mem0 0.3.0]

Added

  • Support for Ollama embeddings in Mem0 OSS container:
    • Added ability to configure EMBEDDER_PROVIDER=ollama and set EMBEDDER_MODEL + OLLAMA_HOST via .env.
    • Mounted main.py override from host into container to load custom DEFAULT_CONFIG.
    • Installed ollama Python client into custom API container image.
  • .env.3090 file created for external embedding mode (3090 machine):
  • Workflow to support multiple embedding modes:
    1. Fast LAN-based 3090/Ollama embeddings
    2. Local-only CPU embeddings (Lyra Cortex VM)
    3. OpenAI fallback embeddings

Changed

  • docker-compose.yml updated to mount local main.py and .env.3090.
  • Built custom Dockerfile (mem0-api-server:latest) extending base image with pip install ollama.
  • Updated requirements.txt to include ollama package.
  • Adjusted Mem0 container config so main.py pulls environment variables with dotenv (load_dotenv()).
  • Tested new embeddings path with curl /memories API call.

Fixed

  • Resolved container boot failure caused by missing ollama dependency (ModuleNotFoundError).
  • Fixed config overwrite issue where rebuilding container restored stock main.py.
  • Worked around Neo4j error (vector.similarity.cosine(): mismatched vector dimensions) by confirming OpenAI vs. Ollama embedding vector sizes and planning to standardize at 1536-dim.

[Lyra-Mem0 v0.2.1]

Added

  • Seeding pipeline:
    • Built Python seeder script to bulk-insert raw Cloud Lyra exports into Mem0.
    • Implemented incremental seeding option (skip existing memories, only add new ones).
    • Verified insert process with Postgres-backed history DB and curl /memories/search sanity check.
  • Ollama embedding support in Mem0 OSS container:
    • Added configuration for EMBEDDER_PROVIDER=ollama, EMBEDDER_MODEL, and OLLAMA_HOST via .env.
    • Created .env.3090 profile for LAN-connected 3090 machine with Ollama.
    • Set up three embedding modes:
      1. Fast LAN-based 3090/Ollama
      2. Local-only CPU model (Lyra Cortex VM)
      3. OpenAI fallback

Changed

  • Updated main.py to load configuration from .env using dotenv and support multiple embedder backends.
  • Mounted host main.py into container so local edits persist across rebuilds.
  • Updated docker-compose.yml to mount .env.3090 and support swap between profiles.
  • Built custom Dockerfile (mem0-api-server:latest) including pip install ollama.
  • Updated requirements.txt with ollama dependency.
  • Adjusted startup flow so container automatically connects to external Ollama host (LAN IP).
  • Added logging to confirm model pulls and embedding requests.

Fixed

  • Seeder process originally failed on old memories — now skips duplicates and continues batch.
  • Resolved container boot error (ModuleNotFoundError: ollama) by extending image.
  • Fixed overwrite issue where stock main.py replaced custom config during rebuild.
  • Worked around Neo4j vector.similarity.cosine() dimension mismatch by investigating OpenAI (1536-dim) vs Ollama (1024-dim) schemas.

Notes

  • To fully unify embedding modes, a Hugging Face / local model with 1536-dim embeddings will be needed (to match OpenAI's schema and avoid Neo4j errors).
  • Current Ollama model (mxbai-embed-large) works, but returns 1024-dim vectors.
  • Seeder workflow validated but should be wrapped in a repeatable weekly run for full Cloud→Local sync.

[Lyra-Mem0 v0.2.0] - 2025-09-30

Added

  • Standalone Lyra-Mem0 stack created at ~/lyra-mem0/
    • Includes Postgres (pgvector), Qdrant, Neo4j, and SQLite for history tracking.
    • Added working docker-compose.mem0.yml and custom Dockerfile for building the Mem0 API server.
  • Verified REST API functionality:
    • POST /memories works for adding memories.
    • POST /search works for semantic search.
  • Successful end-to-end test with persisted memory:
    "Likes coffee in the morning" → retrievable via search.

Changed

  • Split architecture into modular stacks:
    • ~/lyra-core (Relay, Persona-Sidecar, etc.)
    • ~/lyra-mem0 (Mem0 OSS memory stack)
  • Removed old embedded mem0 containers from the Lyra-Core compose file.
  • Added Lyra-Mem0 section in README.md.

Next Steps

  • Wire Relay → Mem0 API (integration not yet complete).
  • Add integration tests to verify persistence and retrieval from within Lyra-Core.

🧠 Lyra-Cortex

[Cortex v0.5] - 2025-11-13

Added

  • New reasoning.py module

    • Async reasoning engine.
    • Accepts user prompt, identity, RAG block, and reflection notes.
    • Produces draft internal answers.
    • Uses primary backend (vLLM).
  • New reflection.py module

    • Fully async.
    • Produces actionable JSON “internal notes.”
    • Enforces strict JSON schema and fallback parsing.
    • Forces cloud backend (backend_override="cloud").
  • Integrated refine.py into Cortex reasoning pipeline:

    • New stage between reflection and persona.
    • Runs exclusively on primary vLLM backend (MI50).
    • Produces final, internally consistent output for downstream persona layer.
  • Backend override system

    • Each LLM call can now select its own backend.
    • Enables multi-LLM cognition: Reflection → cloud, Reasoning → primary (see the sketch after this list).
  • identity loader

    • Added identity.py with load_identity() for consistent persona retrieval.
  • ingest_handler

    • Async stub created for future Intake → NeoMem → RAG pipeline.
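
A condensed sketch of the backend-override idea in llm_router.py: each call names a backend, the URL is taken from the environment verbatim (full endpoint, no path appending), OpenAI calls get an Authorization header and a chat payload, and vLLM calls get a completions payload. Function and parameter names approximate, not copy, the actual module:

```python
# Sketch: per-call backend selection with env-provided full URLs.
import os
import httpx

BACKEND_URLS = {
    "primary": os.getenv("LLM_PRIMARY_URL"),  # full vLLM /v1/completions endpoint
    "cloud":   os.getenv("LLM_CLOUD_URL"),    # full OpenAI /v1/chat/completions endpoint
}

async def call_llm(prompt: str, backend_override: str = "primary") -> str:
    url = BACKEND_URLS[backend_override]  # used as-is, no path appending
    headers = {}
    if backend_override == "cloud":
        headers["Authorization"] = f"Bearer {os.environ['OPENAI_API_KEY']}"
        payload = {"model": os.getenv("LLM_CLOUD_MODEL", "gpt-4o-mini"),
                   "messages": [{"role": "user", "content": prompt}]}
    else:
        payload = {"model": os.getenv("LLM_PRIMARY_MODEL", ""),
                   "prompt": prompt, "max_tokens": 512}
    async with httpx.AsyncClient(timeout=120) as client:
        resp = await client.post(url, json=payload, headers=headers)
        resp.raise_for_status()
        data = resp.json()
    # OpenAI chat vs. vLLM completions return different shapes.
    choice = data["choices"][0]
    return choice.get("message", {}).get("content") or choice.get("text", "")
```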

Changed

  • Unified LLM backend URL handling across Cortex:

    • ENV variables must now contain FULL API endpoints.
    • Removed all internal path-appending (e.g. .../v1/completions).
    • llm_router.py rewritten to use env-provided URLs as-is.
    • Ensures consistent behavior between draft, reflection, refine, and persona.
  • Rebuilt main.py

    • Removed old annotation/analysis logic.
    • New structure: load identity → get RAG → reflect → reason → return draft+notes.
    • Routes now clean and minimal (/reason, /ingest, /health).
    • Async path throughout Cortex.
  • Refactored llm_router.py

    • Removed old fallback logic during overrides.
    • OpenAI requests now use /v1/chat/completions.
    • Added proper OpenAI Authorization headers.
    • Distinct payload format for vLLM vs OpenAI.
    • Unified, correct parsing across models.
  • Simplified Cortex architecture

    • Removed deprecated “context.py” and old reasoning code.
    • Relay completely decoupled from smart behavior.
  • Updated environment specification:

    • LLM_PRIMARY_URL now set to http://10.0.0.43:8000/v1/completions.
    • LLM_SECONDARY_URL remains http://10.0.0.3:11434/api/generate (Ollama).
    • LLM_CLOUD_URL set to https://api.openai.com/v1/chat/completions.

Fixed

  • Resolved endpoint conflict where:
    • Router expected base URLs.
    • Refine expected full URLs.
    • Refine always fell back due to hitting incorrect endpoint.
    • Fixed by standardizing full-URL behavior across entire system.
  • Reflection layer no longer fails silently (previously returned [""] due to MythoMax).
  • Resolved 404/401 errors caused by incorrect OpenAI URL endpoints.
  • No more double-routing through vLLM during reflection.
  • Corrected async/sync mismatch in multiple locations.
  • Eliminated double-path bug (/v1/completions/v1/completions) caused by previous router logic.

Removed

  • Legacy annotate, reason_check glue logic from old architecture.
  • Old backend probing junk code.
  • Stale imports and unused modules leftover from previous prototype.

Verified

  • Cortex → vLLM (MI50) → refine → final_output now functioning correctly.
  • refine shows used_primary_backend: true and no fallback.
  • Manual curl test confirms endpoint accuracy.

Known Issues

  • refine sometimes prefixes output with "Final Answer:"; next version will sanitize this.
  • hallucinations in draft_output persist due to weak grounding (fix in reasoning + RAG planned).

Pending / Known Issues

  • RAG service does not exist — requires containerized FastAPI service.
  • Reasoning layer lacks self-revision loop (deliberate thought cycle).
  • No speak/persona generation layer yet (speak.py planned).
  • Intake summaries not yet routing into RAG or reflection layer.
  • No refinement engine between reasoning and speak.

Notes

This is the largest structural change to Cortex so far.
It establishes:

  • multi-model cognition
  • clean layering
  • identity + reflection separation
  • correct async code
  • deterministic backend routing
  • predictable JSON reflection

The system is now ready for:

  • refinement loops
  • persona-speaking layer
  • containerized RAG
  • long-term memory integration
  • true emergent-behavior experiments

[Cortex v0.4.1] - 2025-11-05

Added

  • RAG integration (see the sketch after this list)
    • Added rag.py with query_rag() and format_rag_block().
    • Cortex now queries the local RAG API (http://10.0.0.41:7090/rag/search) for contextual augmentation.
    • Synthesized answers and top excerpts are injected into the reasoning prompt.
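
A rough sketch of the two rag.py helpers named above. The /rag/search request and response field names ("query", "top_k", "answer", "results", "text") are assumptions; only the URL and the two function names come from this entry:

```python
# Sketch: query the RAG service and build the [RAG] block for the reasoning prompt.
import httpx

RAG_URL = "http://10.0.0.41:7090/rag/search"

async def query_rag(query: str, top_k: int = 5) -> dict:
    async with httpx.AsyncClient(timeout=60) as client:
        resp = await client.post(RAG_URL, json={"query": query, "top_k": top_k})
        resp.raise_for_status()
        return resp.json()

def format_rag_block(rag: dict) -> str:
    lines = [f"[RAG] {rag.get('answer', '')}"]
    lines += [f"- {hit.get('text', '')}" for hit in rag.get("results", [])]
    return "\n".join(lines)
```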

Changed

  • Revised /reason endpoint.

    • Now builds unified context blocks:
      • [Intake] → recent summaries
      • [RAG] → contextual knowledge
      • [User Message] → current input
    • Calls call_llm() for the first pass, then reflection_loop() for meta-evaluation.
    • Returns cortex_prompt, draft_output, final_output, and normalized reflection.
  • Reflection Pipeline Stability

    • Cleaned parsing to normalize JSON vs. text reflections.
    • Added fallback handling for malformed or non-JSON outputs.
    • Log system improved to show raw JSON, extracted fields, and normalized summary.
  • Async Summarization (Intake v0.2.1)

    • Intake summaries now run in background threads to avoid blocking Cortex.
    • Summaries (L1 → L∞) logged asynchronously with [BG] tags.
  • Environment & Networking Fixes

    • Verified .env variables propagate correctly inside the Cortex container.
    • Confirmed Docker network connectivity between Cortex, Intake, NeoMem, and RAG (shared serversdown_lyra_net network).
    • Adjusted localhost calls to service-IP mapping (10.0.0.41 for Cortex host).
  • Behavioral Updates

    • Cortex now performs conversation reflection (on user intent) and self-reflection (on its own answers).
    • RAG context successfully grounds reasoning outputs.
    • Intake and NeoMem confirmed receiving summaries via /add_exchange.
    • Log clarity pass: all reflective and contextual blocks clearly labeled.
  • Known Gaps / Next Steps

    • NeoMem Tuning
    • Improve retrieval latency and relevance.
    • Implement a dedicated /reflections/recent endpoint for Cortex.
    • Migrate to Cortex-first ingestion (Relay → Cortex → NeoMem).
  • Cortex Enhancements

    • Add persistent reflection recall (use prior reflections as meta-context).
    • Improve reflection JSON structure ("insight", "evaluation", "next_action" → guaranteed fields).
    • Tighten temperature and prompt control for factual consistency.
  • RAG Optimization

    • Add source ranking, filtering, and multi-vector hybrid search.
    • Cache RAG responses per session to reduce duplicate calls.
  • Documentation / Monitoring

    • Add health route for RAG and Intake summaries.
    • Include internal latency metrics in /health endpoint.
    • Consolidate logs into a unified “Lyra Cortex Console” for tracing all module calls.

[Cortex - v0.3.0] 2025-10-31

Added

  • Cortex Service (FastAPI)

    • New standalone reasoning engine (cortex/main.py) with endpoints:
      • GET /health reports active backend + NeoMem status.
      • POST /reason evaluates {prompt, response} pairs.
      • POST /annotate experimental text analysis.
    • Background NeoMem health monitor (5-minute interval).
  • Multi-Backend Reasoning Support

    • Added environment-driven backend selection via LLM_FORCE_BACKEND.
    • Supports:
      • Primary → vLLM (MI50 node @ 10.0.0.43)
      • Secondary → Ollama (3090 node @ 10.0.0.3)
      • Cloud → OpenAI API
      • Fallback → llama.cpp (CPU)
    • Introduced per-backend model variables:
      LLM_PRIMARY_MODEL, LLM_SECONDARY_MODEL, LLM_CLOUD_MODEL, LLM_FALLBACK_MODEL.
  • Response Normalization Layer

    • Implemented normalize_llm_response() to merge streamed outputs and repair malformed JSON (see the sketch after this list).
    • Handles Ollama's multi-line streaming and MythoMax's missing punctuation issues.
    • Prints concise debug previews of merged content.
  • Environment Simplification

    • Each service (intake, cortex, neomem) now maintains its own .env file.
    • Removed reliance on shared/global env file to prevent cross-contamination.
    • Verified Docker Compose networking across containers.
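
A sketch of the normalization idea behind normalize_llm_response(): Ollama's /api/generate streams newline-delimited JSON objects, each carrying a "response" fragment, so the fragments are joined into one string. The real function also repairs malformed JSON; that part is omitted here:

```python
# Sketch: merge Ollama's streamed newline-delimited JSON into one response string.
import json

def normalize_llm_response(raw: str) -> str:
    pieces = []
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            pieces.append(line)  # keep unparseable fragments as plain text
            continue
        pieces.append(obj.get("response", ""))
    return "".join(pieces).strip()
```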

Changed

  • Refactored reason_check() to dynamically switch between prompt and chat mode depending on backend.
  • Enhanced startup logs to announce active backend, model, URL, and mode.
  • Improved error handling with clearer “Reasoning error” messages.

Fixed

  • Corrected broken vLLM endpoint routing (/v1/completions).
  • Stabilized cross-container health reporting for NeoMem.
  • Resolved JSON parse failures caused by streaming chunk delimiters.

Next Planned [v0.4.0]

Planned Additions

  • Reflection Mode

    • Introduce REASONING_MODE=factcheck|reflection.
    • Output schema:
      { "insight": "...", "evaluation": "...", "next_action": "..." }
      
  • Cortex-First Pipeline

    • UI → Cortex → [Reflection + Verifier + Memory] → Speech LLM → User.
    • Allows Lyra to “think before speaking.”
  • Verifier Stub

    • New /verify endpoint for search-based factual grounding.
    • Asynchronous external truth checking.
  • Memory Integration

    • Feed reflective outputs into NeoMem.
    • Enable “dream” cycles for autonomous self-review.

Status: 🟢 Stable core; multi-backend reasoning operational.
Next milestone: v0.4.0 — Reflection Mode + Thought Pipeline orchestration.


[Intake] v0.1.0 - 2025-10-27

  • Receives messages from Relay and summarizes them in a cascading format (see the sketch below).
  • Continues to summarize smaller batches of exchanges while also generating large-scale conversational summaries (L20).
  • Currently logs summaries to a .log file in /project-lyra/intake-logs/.
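
A very rough sketch of the cascading idea, under stated assumptions: the small-batch cadence, class and method names, and the summarize() call are placeholders; only the L20 large-scale summary and the log directory come from this entry:

```python
# Sketch: frequent small-batch summaries plus a larger L20 conversation summary.
from pathlib import Path

LOG_DIR = Path("/project-lyra/intake-logs")

def summarize(exchanges: list[str], level: str) -> str:
    # Placeholder for the actual LLM summarization call.
    return f"[{level}] summary of {len(exchanges)} exchanges"

class CascadingIntake:
    def __init__(self) -> None:
        self.buffer: list[str] = []

    def add_exchange(self, exchange: str) -> None:
        self.buffer.append(exchange)
        if len(self.buffer) % 5 == 0:        # small-batch summary (assumed cadence)
            self._log(summarize(self.buffer[-5:], "L5"))
        if len(self.buffer) % 20 == 0:       # large-scale conversational summary
            self._log(summarize(self.buffer[-20:], "L20"))

    def _log(self, text: str) -> None:
        LOG_DIR.mkdir(parents=True, exist_ok=True)
        with (LOG_DIR / "intake.log").open("a") as fh:
            fh.write(text + "\n")
```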

**Next Steps**

  • Feed Intake into NeoMem.
  • Generate a daily/hourly/etc. overall summary (e.g., "Today Brian and Lyra worked on x, y, and z").
  • Generate session-aware summaries, each with its own intake hopper.

[Lyra-Cortex] v0.2.0 — 2025-09-26

Added

  • Integrated llama-server on dedicated Cortex VM (Proxmox).
  • Verified Phi-3.5-mini-instruct_Uncensored-Q4_K_M running with 8 vCPUs.
  • Benchmarked Phi-3.5-mini performance:
    • ~18 tokens/sec CPU-only on Ryzen 7 7800X.
    • Salience classification functional but sometimes inconsistent ("sali", "fi", "jamming").
  • Tested Qwen2-0.5B-Instruct GGUF as alternative salience classifier:
    • Much faster throughput (~350 tokens/sec prompt, ~100 tokens/sec eval).
    • More responsive but over-classifies messages as “salient.”
  • Established .env integration for model ID (SALIENCE_MODEL), enabling hot-swap between models.

Known Issues

  • Small models tend to drift or over-classify.
  • CPU-only 7B+ models expected to be slow; GPU passthrough recommended for larger models.
  • Need to set up a systemd service for llama-server to auto-start on VM reboot.

[Lyra-Cortex] v0.1.0 — 2025-09-25

Added

  • First deployment as dedicated Proxmox VM (5 vCPU / 18 GB RAM / 100 GB SSD).
  • Built llama.cpp with llama-server target via CMake.
  • Integrated Phi-3.5 Mini Instruct (Uncensored, Q4_K_M GGUF) model.
  • Verified API compatibility at /v1/chat/completions.
  • Local test successful via curl → ~523 token response generated.
  • Performance benchmark: ~11.5 tokens/sec (CPU-only on Ryzen 7800X).
  • Confirmed usable for salience scoring, summarization, and lightweight reasoning.