Project Lyra — Modular Changelog
All notable changes to Project Lyra are organized by component. The format is based on Keep a Changelog and adheres to Semantic Versioning.
Last Updated: 2025-11-26
🧠 Lyra-Core
[Infrastructure v1.0.0] - 2025-11-26
Changed
- Environment Variable Consolidation - Major reorganization to eliminate duplication and improve maintainability
  - Consolidated 9 scattered `.env` files into a single source-of-truth architecture
  - Root `.env` now contains all shared infrastructure (LLM backends, databases, API keys, service URLs)
  - Service-specific `.env` files minimized to only essential overrides:
    - `cortex/.env`: reduced from 42 to 22 lines (operational parameters only)
    - `neomem/.env`: reduced from 26 to 14 lines (LLM naming conventions only)
    - `intake/.env`: kept at 8 lines (already minimal)
  - Result: ~24% reduction in total configuration lines (197 → ~150)
- Docker Compose Consolidation (sketched below)
  - All services now defined in the single root `docker-compose.yml`
  - Relay service updated with complete configuration (env_file, volumes)
  - Removed redundant `core/docker-compose.yml` (marked as DEPRECATED)
  - Standardized network communication to use Docker container names
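A hedged sketch of what the consolidated relay block in the root `docker-compose.yml` might look like; only the env_file/volumes pattern and container-name networking come from this entry, while the image path, port, and volume mapping are illustrative assumptions.

```yaml
services:
  relay:
    build: ./core/relay          # path is an assumption
    env_file:
      - .env                     # shared values from the single root .env
    volumes:
      - ./core/relay:/app        # illustrative volume mapping
    ports:
      - "7078:7078"              # relay port per the v0.1.0 entry
    depends_on:
      - neomem-api               # services address each other by container name
```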
- Service URL Standardization
  - Internal services use container names: `http://neomem-api:7077`, `http://cortex:7081`
  - External services use IP addresses: `http://10.0.0.43:8000` (vLLM), `http://10.0.0.3:11434` (Ollama)
  - Removed IP/container-name inconsistencies across files
Added
- Security Templates - Created `.env.example` files for all services
  - Root `.env.example` with sanitized credentials
  - Service-specific templates: `cortex/.env.example`, `neomem/.env.example`, `intake/.env.example`, `rag/.env.example`
  - All `.env.example` files safe to commit to version control
- Documentation
  - `ENVIRONMENT_VARIABLES.md`: comprehensive reference for all environment variables
    - Variable descriptions, defaults, and usage examples
    - Multi-backend LLM strategy documentation
    - Troubleshooting guide
    - Security best practices
  - `DEPRECATED_FILES.md`: deletion guide for deprecated files with verification steps
- Enhanced `.gitignore` (see the sketch below)
  - Ignores all `.env` files (including subdirectories)
  - Tracks `.env.example` templates for documentation
  - Ignores the `.env-backups/` directory
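A minimal sketch of the ignore rules described above (the exact patterns in the repository may differ):

```gitignore
# Ignore every .env, including service subdirectories
.env
**/.env

# Keep sanitized templates tracked for documentation
!.env.example
!**/.env.example

# Ignore local backups of the original .env files
.env-backups/
```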
Removed
- `core/.env` - redundant with root `.env`, now deleted
- `core/docker-compose.yml` - consolidated into the main compose file (marked DEPRECATED)
Fixed
- Eliminated duplicate `OPENAI_API_KEY` across 5+ files
- Eliminated duplicate LLM backend URLs across 4+ files
- Eliminated duplicate database credentials across 3+ files
- Resolved Cortex `environment:` section override in docker-compose (now uses `env_file`)
Architecture
- Multi-Backend LLM Strategy: the root `.env` provides all backend OPTIONS (PRIMARY, SECONDARY, CLOUD, FALLBACK); each service chooses which to USE (see the sketch below)
  - Cortex → vLLM (PRIMARY) for autonomous reasoning
  - NeoMem → Ollama (SECONDARY) + OpenAI embeddings
  - Intake → vLLM (PRIMARY) for summarization
  - Relay → fallback chain with user preference
- Preserves per-service flexibility while eliminating URL duplication
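A hedged sketch of the root `.env` backend options; the URLs appear elsewhere in this changelog, while the fallback URL and the internal-service variable names are illustrative assumptions.

```dotenv
# Backend OPTIONS shared by all services; each service picks which one to USE
LLM_PRIMARY_URL=http://10.0.0.43:8000/v1/completions       # vLLM (MI50 node)
LLM_SECONDARY_URL=http://10.0.0.3:11434/api/generate       # Ollama (3090 node)
LLM_CLOUD_URL=https://api.openai.com/v1/chat/completions   # OpenAI
LLM_FALLBACK_URL=http://cortex-vm:8080/v1/chat/completions # llama.cpp CPU node (placeholder host)
OPENAI_API_KEY=sk-...                                      # defined once, no duplicates

# Internal service URLs use Docker container names (variable names illustrative)
NEOMEM_API=http://neomem-api:7077
CORTEX_API=http://cortex:7081
```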
Migration
- All original `.env` files backed up to `.env-backups/` with timestamp `20251126_025334`
- Rollback plan documented in `ENVIRONMENT_VARIABLES.md`
- Verification steps provided in `DEPRECATED_FILES.md`
[Lyra_RAG v0.1.0] - 2025-11-07
Added
- Initial standalone RAG module for Project Lyra.
- Persistent ChromaDB vector store (`./chromadb`).
- Importer `rag_chat_import.py` with:
  - Recursive folder scanning and category tagging.
  - Smart chunking (~5k chars).
  - SHA-1 deduplication and chat-ID metadata.
  - Timestamp fields (`file_modified`, `imported_at`).
  - Background-safe operation (`nohup`/`tmux`).
- 68 Lyra-category chats imported:
  - 6,556 new chunks added
  - 1,493 duplicates skipped
  - 7,997 total vectors now stored.
API
- `/rag/search` FastAPI endpoint implemented (port 7090); see the example below.
- Supports natural-language queries and returns top related excerpts.
- Added answer synthesis step using `gpt-4o-mini`.
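A hedged usage example against the endpoint; the route and port come from this entry, the host IP is taken from the Cortex v0.4.1 entry, and the request/response field names are assumptions.

```python
import requests

resp = requests.post(
    "http://10.0.0.41:7090/rag/search",
    json={"query": "How did Lyra-Core v0.3.0 handle salience filtering?"},  # field name assumed
    timeout=30,
)
resp.raise_for_status()
print(resp.json())  # synthesized answer plus top related excerpts
```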
Verified
- Successful recall of Lyra-Core development history (v0.3.0 snapshot).
- Correct metadata and category tagging for all new imports.
Next Planned
- Optional `where` filter parameter for category/date queries.
- Graceful “no results” handler for empty retrievals.
- `rag_docs_import.py` for PDFs and other document types.
[Lyra Core v0.3.2 + Web UI v0.2.0] - 2025-10-28
Added
- **New UI**
  - Cleaned up UI look and feel.
- **Added "sessions"**
  - Sessions now persist over time.
  - Ability to create new sessions or load sessions from a previous instance.
  - Switching sessions updates what the prompt sends to Relay (it no longer includes messages from other sessions).
  - Relay is correctly wired in.
[Lyra-Core 0.3.1] - 2025-10-09
Added
- NVGRAM Integration (Full Pipeline Reconnected)
  - Replaced legacy Mem0 service with the NVGRAM microservice (`nvgram-api` @ port 7077).
  - Updated `server.js` in Relay to route all memory ops via `${NVGRAM_API}/memories` and `/search` (see the sketch after this list).
  - Added `.env` variable: `NVGRAM_API=http://nvgram-api:7077`
  - Verified end-to-end Lyra conversation persistence: `relay → nvgram-api → postgres/neo4j → relay → ollama → ui`
  - ✅ Memories stored, retrieved, and re-injected successfully.
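For reference, a hedged sketch of the memory round trip the Relay performs; the real code lives in `server.js`, so this Python version is purely illustrative, and the JSON field names beyond the routes are assumptions.

```python
import os
import requests

NVGRAM_API = os.getenv("NVGRAM_API", "http://nvgram-api:7077")

def mem_add(text: str, session_id: str = "default") -> dict:
    """Store one user message via POST /memories (payload shape assumed)."""
    resp = requests.post(f"{NVGRAM_API}/memories",
                         json={"messages": [{"role": "user", "content": text}],
                               "user_id": session_id},
                         timeout=30)
    resp.raise_for_status()
    return resp.json()

def mem_search(query: str, session_id: str = "default") -> dict:
    """Retrieve related memories via POST /search before the LLM call."""
    resp = requests.post(f"{NVGRAM_API}/search",
                         json={"query": query, "user_id": session_id},
                         timeout=30)
    resp.raise_for_status()
    return resp.json()
```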
Changed
- Renamed `MEM0_URL` → `NVGRAM_API` across all relay environment configs.
- Updated Docker Compose service dependency order: `relay` now depends on the `nvgram-api` healthcheck.
- Removed `mem0` references and volumes.
- Minor cleanup to Persona fetch block (null-checks and safer default persona string).
Fixed
- Relay startup no longer crashes when NVGRAM is unavailable — deferred connection handling.
- `/memories` POST failures no longer crash Relay; they are now logged gracefully as `relay error Error: memAdd failed: 500`.
- Improved injected prompt debugging (`DEBUG_PROMPT=true` now prints clean JSON).
Goals / Next Steps
- Add salience visualization (e.g., memory weights displayed in injected system message).
- Begin schema alignment with NVGRAM v0.1.2 for confidence scoring.
- Add relay auto-retry for transient 500 responses from NVGRAM.
[Lyra-Core] v0.3.1 - 2025-09-27
Changed
- Removed salience filter logic; Cortex is now the default annotator.
- All user messages stored in Mem0; no discard tier applied.
Added
- Cortex annotations (`metadata.cortex`) now attached to memories.
- Debug logging improvements:
  - Pretty-print Cortex annotations
  - Injected prompt preview
  - Memory search hit list with scores
- `.env` toggle (`CORTEX_ENABLED`) to bypass Cortex when needed.
Fixed
- Parsing failures from Markdown-wrapped Cortex JSON via fence cleaner.
- Relay no longer “hangs” on malformed Cortex outputs.
[Lyra-Core] v0.3.0 — 2025-09-26
Added
- Implemented salience filtering in Relay:
  - `.env` configurable: `SALIENCE_ENABLED`, `SALIENCE_MODE`, `SALIENCE_MODEL`, `SALIENCE_API_URL` (see the sketch after this list).
  - Supports `heuristic` and `llm` classification modes.
  - LLM-based salience filter integrated with the Cortex VM running `llama-server`.
- Logging improvements:
  - Added debug logs for salience mode, raw LLM output, and unexpected outputs.
  - Fail-closed behavior for unexpected LLM responses.
- Successfully tested with Phi-3.5-mini and Qwen2-0.5B-Instruct as salience classifiers.
- Verified end-to-end flow: Relay → salience filter → Mem0 add/search → Persona injection → LLM reply.
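A hedged sketch of the salience variables; only the variable names come from this entry, while the values and the endpoint host/port are placeholders.

```dotenv
SALIENCE_ENABLED=true
SALIENCE_MODE=llm               # or "heuristic"
SALIENCE_MODEL=phi-3.5-mini     # hot-swappable classifier model
SALIENCE_API_URL=http://<cortex-vm>:8080/v1/chat/completions   # llama-server endpoint (placeholder host/port)
```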
Changed
- Refactored `server.js` to gate `mem.add()` calls behind the salience filter.
- Updated `.env` to support `SALIENCE_MODEL`.
Known Issues
- Small models (e.g. Qwen2-0.5B) tend to over-classify as "salient".
- Phi-3.5-mini sometimes returns truncated tokens ("sali", "fi").
- CPU-only inference is functional but limited; larger models recommended once GPU is available.
[Lyra-Core] v0.2.0 — 2025-09-24
Added
- Migrated Relay to use the `mem0ai` SDK instead of raw fetch calls.
- Implemented `sessionId` support (client-supplied, fallback to `default`).
- Added debug logs for memory add/search.
- Cleaned up Relay structure for clarity.
[Lyra-Core] v0.1.0 — 2025-09-23
Added
- First working MVP of Lyra Core Relay.
- Relay service accepts `POST /v1/chat/completions` (OpenAI-compatible); see the example after this list.
- Memory integration with Mem0:
  - `POST /memories` on each user message.
  - `POST /search` before the LLM call.
- Persona Sidecar integration (`GET /current`).
- OpenAI GPT + Ollama (Mythomax) support in Relay.
- Simple browser-based chat UI (talks to Relay at `http://<host>:7078`).
- `.env` standardization for Relay + Mem0 + Postgres + Neo4j.
- Working Neo4j + Postgres backing stores for Mem0.
- Initial MVP relay service with raw fetch calls to Mem0.
- Dockerized with basic healthcheck.
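A hedged example of exercising the OpenAI-compatible Relay endpoint; the port is from this entry, and the payload follows the standard chat-completions shape with an illustrative model name.

```python
import requests

resp = requests.post(
    "http://localhost:7078/v1/chat/completions",
    json={
        "model": "mythomax",  # illustrative model name
        "messages": [{"role": "user", "content": "Hello Lyra, remember that I like coffee."}],
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```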
Fixed
- Resolved crash loop in Neo4j by restricting env vars (`NEO4J_AUTH` only).
- Relay now correctly reads `MEM0_URL` and `MEM0_API_KEY` from `.env`.
Known Issues
- No feedback loop (thumbs up/down) yet.
- Forget/delete flow is manual (via memory IDs).
- Memory latency ~1–4s depending on embedding model.
🧩 Lyra-NeoMem (formerly NVGRAM / Lyra-Mem0)
[NeoMem 0.1.2] - 2025-10-27
Changed
- Renamed NVGRAM to NeoMem.
- All future updates will be under the name NeoMem.
- Features have not changed.
[NVGRAM 0.1.1] - 2025-10-08
Added
- Async Memory Rewrite (Stability + Safety Patch)
  - Introduced `AsyncMemory` class with fully asynchronous vector and graph store writes.
  - Added input sanitation to prevent embedding errors (`'list' object has no attribute 'replace'`).
  - Implemented `flatten_messages()` helper in the API layer to clean malformed payloads (see the sketch after this list).
  - Added structured request logging via `RequestLoggingMiddleware` (FastAPI middleware).
  - Health endpoint (`/health`) now returns structured JSON `{status, version, service}`.
  - Startup logs now include sanitized embedder config with API keys masked for safety:
    >>> Embedder config (sanitized): {'provider': 'openai', 'config': {'model': 'text-embedding-3-small', 'api_key': '***'}}
    ✅ Connected to Neo4j on attempt 1
    🧠 NVGRAM v0.1.1 — Neural Vectorized Graph Recall and Memory initialized
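A hedged sketch of a `flatten_messages()`-style helper; the name comes from this entry, but the exact signature and rules here are assumptions, not NVGRAM's code.

```python
from typing import Any

def flatten_messages(messages: list[dict[str, Any]]) -> list[dict[str, str]]:
    """Coerce message content to plain strings so the embedder never receives lists."""
    cleaned = []
    for msg in messages:
        content = msg.get("content", "")
        if isinstance(content, list):
            # Join list-shaped content (e.g. multi-part payloads) into one string.
            content = " ".join(str(part) for part in content)
        elif not isinstance(content, str):
            content = str(content)
        cleaned.append({"role": str(msg.get("role", "user")), "content": content.strip()})
    return cleaned
```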
Changed
- Replaced synchronous `Memory.add()` with an async-safe version supporting concurrent vector + graph writes.
- Normalized indentation and cleaned duplicate `main.py` references under `/nvgram/` vs `/nvgram/server/`.
- Removed redundant `FastAPI()` app reinitialization.
- Updated internal logging to INFO-level timing format: `2025-10-08 21:48:45 [INFO] POST /memories -> 200 (11189.1 ms)`
- Deprecated `@app.on_event("startup")` (FastAPI deprecation warning) → will migrate to the `lifespan` handler in v0.1.2.
Fixed
- Eliminated repeating 500 error from OpenAI embedder caused by non-string message content.
- Masked API key leaks from boot logs.
- Ensured Neo4j reconnects gracefully on first retry.
Goals / Next Steps
- Integrate salience scoring and embedding confidence weight fields in Postgres schema.
- Begin testing with full Lyra Relay + Persona Sidecar pipeline for live session memory recall.
- Migrate from the deprecated `on_event` → `lifespan` pattern in 0.1.2.
[NVGRAM 0.1.0] - 2025-10-07
Added
- Initial fork of Mem0 → NVGRAM:
  - Created a fully independent local-first memory engine based on Mem0 OSS.
  - Renamed all internal modules, Docker services, and environment variables from `mem0` → `nvgram`.
  - New service name: `nvgram-api`, default port 7077.
  - Maintains the same API endpoints (`/memories`, `/search`) for drop-in compatibility with Lyra Core.
  - Uses FastAPI, Postgres, and Neo4j as persistent backends.
- Verified clean startup:
    ✅ Connected to Neo4j on attempt 1
    INFO: Uvicorn running on http://0.0.0.0:7077
  - `/docs` and `/openapi.json` confirmed reachable and functional.
Changed
- Removed dependency on the external `mem0ai` SDK; all logic is now local.
- Re-pinned requirements:
  - fastapi==0.115.8
  - uvicorn==0.34.0
  - pydantic==2.10.4
  - python-dotenv==1.0.1
  - psycopg>=3.2.8
  - ollama
- Adjusted `docker-compose` and `.env` templates to use the new NVGRAM naming and image paths.
Goals / Next Steps
- Integrate NVGRAM as the new default backend in Lyra Relay.
- Deprecate remaining Mem0 references and archive old configs.
- Begin versioning as a standalone project (`nvgram-core`, `nvgram-api`, etc.).
[Lyra-Mem0 0.3.2] - 2025-10-05
Added
- Support for Ollama LLM reasoning alongside OpenAI embeddings (see the sketch after this list):
  - Introduced `LLM_PROVIDER=ollama`, `LLM_MODEL`, and `OLLAMA_HOST` in `.env.3090`.
  - Verified local 3090 setup using `qwen2.5:7b-instruct-q4_K_M`.
  - Split processing pipeline:
    - Embeddings → OpenAI `text-embedding-3-small`
    - LLM → local Ollama (`http://10.0.0.3:11434/api/chat`)
- Added `.env.3090` template for self-hosted inference nodes.
- Integrated runtime diagnostics and seeder progress tracking:
  - File-level + message-level progress bars.
  - Retry/back-off logic for timeouts (3 attempts).
  - Event logging (`ADD / UPDATE / NONE`) for every memory record.
- Expanded Docker health checks for Postgres, Qdrant, and Neo4j containers.
- Added GPU-friendly long-run configuration for continuous seeding (validated on RTX 3090).
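A hedged sketch of the dual-provider settings in `.env.3090`; the LLM variable names, model, and Ollama host are from this entry, while the embedder lines are assumptions based on the split pipeline described above.

```dotenv
# LLM reasoning runs locally on the 3090 via Ollama
LLM_PROVIDER=ollama
LLM_MODEL=qwen2.5:7b-instruct-q4_K_M
OLLAMA_HOST=http://10.0.0.3:11434

# Embeddings stay on OpenAI (exact variable names assumed)
EMBEDDER_PROVIDER=openai
EMBEDDER_MODEL=text-embedding-3-small
OPENAI_API_KEY=sk-...
```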
Changed
- Updated the `main.py` configuration block to load `LLM_PROVIDER`, `LLM_MODEL`, and `OLLAMA_BASE_URL`.
  - Fallback to OpenAI if Ollama is unavailable.
- Adjusted `docker-compose.yml` mount paths to correctly map `/app/main.py`.
- Normalized `.env` loading so `mem0-api` and the host environment share identical values.
- Improved seeder logging and progress telemetry for clearer diagnostics.
- Added explicit `temperature` field to `DEFAULT_CONFIG['llm']['config']` for tuning future local inference runs.
Fixed
- Resolved crash during startup: `TypeError: OpenAIConfig.__init__() got an unexpected keyword argument 'ollama_base_url'`.
- Corrected mount type mismatch (file vs directory) causing `OCI runtime create failed` errors.
- Prevented duplicate or partial postings when retry logic triggered multiple concurrent requests.
- “Unknown event” warnings now safely ignored (no longer break the seeding loop).
- Confirmed full dual-provider operation in logs (`api.openai.com` + `10.0.0.3:11434/api/chat`).
Observations
- Stable GPU utilization: ~8 GB VRAM @ 92 % load, ≈ 67 °C under sustained seeding.
- Next revision will re-format seed JSON to preserve `role` context (user vs assistant).
[Lyra-Mem0 0.3.1] - 2025-10-03
Added
- HuggingFace TEI integration (local 3090 embedder).
- Dual-mode environment switch between OpenAI cloud and local.
- CSV export of memories from Postgres (`payload->>'data'`).
Fixed
- `.env` CRLF vs LF line-ending issues.
- Local seeding now possible via the locally running HuggingFace TEI server.
[Lyra-Mem0 0.3.0]
Added
- Support for Ollama embeddings in the Mem0 OSS container:
  - Added ability to configure `EMBEDDER_PROVIDER=ollama` and set `EMBEDDER_MODEL` + `OLLAMA_HOST` via `.env`.
  - Mounted a `main.py` override from the host into the container to load a custom `DEFAULT_CONFIG` (see the sketch after this list).
  - Installed the `ollama` Python client into the custom API container image.
- `.env.3090` file created for external embedding mode (3090 machine):
  - EMBEDDER_PROVIDER=ollama
  - EMBEDDER_MODEL=mxbai-embed-large
  - OLLAMA_HOST=http://10.0.0.3:11434
- Workflow to support multiple embedding modes:
  - Fast LAN-based 3090/Ollama embeddings
  - Local-only CPU embeddings (Lyra Cortex VM)
  - OpenAI fallback embeddings
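A hedged sketch of the kind of `DEFAULT_CONFIG` override `main.py` builds from `.env`; the key names follow Mem0 OSS conventions but should be treated as assumptions here.

```python
import os
from dotenv import load_dotenv

load_dotenv()  # pick up EMBEDDER_PROVIDER, EMBEDDER_MODEL, OLLAMA_HOST, ...

provider = os.getenv("EMBEDDER_PROVIDER", "openai")
embedder_config = {"model": os.getenv("EMBEDDER_MODEL", "text-embedding-3-small")}
if provider == "ollama":
    # Only pass ollama_base_url for the ollama provider; handing it to the OpenAI
    # config raises the TypeError noted in the 0.3.2 "Fixed" entry.
    embedder_config["ollama_base_url"] = os.getenv("OLLAMA_HOST", "http://10.0.0.3:11434")

DEFAULT_CONFIG = {
    "embedder": {
        "provider": provider,
        "config": embedder_config,
    },
}
```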
Changed
- `docker-compose.yml` updated to mount local `main.py` and `.env.3090`.
- Built custom Dockerfile (`mem0-api-server:latest`) extending the base image with `pip install ollama`.
- Updated `requirements.txt` to include the `ollama` package.
- Adjusted Mem0 container config so `main.py` pulls environment variables with `dotenv` (`load_dotenv()`).
- Tested the new embeddings path with a curl `/memories` API call.
Fixed
- Resolved container boot failure caused by the missing `ollama` dependency (`ModuleNotFoundError`).
- Fixed config overwrite issue where rebuilding the container restored the stock `main.py`.
- Worked around a Neo4j error (`vector.similarity.cosine(): mismatched vector dimensions`) by confirming OpenAI vs. Ollama embedding vector sizes and planning to standardize on 1536-dim.
[Lyra-mem0 v0.2.1]
Added
- Seeding pipeline:
  - Built a Python seeder script to bulk-insert raw Cloud Lyra exports into Mem0.
  - Implemented incremental seeding option (skip existing memories, only add new ones).
  - Verified the insert process with the Postgres-backed history DB and a curl `/memories` / `/search` sanity check.
- Ollama embedding support in the Mem0 OSS container:
  - Added configuration for `EMBEDDER_PROVIDER=ollama`, `EMBEDDER_MODEL`, and `OLLAMA_HOST` via `.env`.
  - Created `.env.3090` profile for the LAN-connected 3090 machine with Ollama.
  - Set up three embedding modes:
    - Fast LAN-based 3090/Ollama
    - Local-only CPU model (Lyra Cortex VM)
    - OpenAI fallback
Changed
- Updated `main.py` to load configuration from `.env` using `dotenv` and support multiple embedder backends.
- Mounted host `main.py` into the container so local edits persist across rebuilds.
- Updated `docker-compose.yml` to mount `.env.3090` and support swapping between profiles.
- Built custom Dockerfile (`mem0-api-server:latest`) including `pip install ollama`.
- Updated `requirements.txt` with the `ollama` dependency.
- Adjusted startup flow so the container automatically connects to the external Ollama host (LAN IP).
- Added logging to confirm model pulls and embedding requests.
Fixed
- Seeder process originally failed on old memories — now skips duplicates and continues batch.
- Resolved container boot error (`ModuleNotFoundError: ollama`) by extending the image.
- Fixed overwrite issue where the stock `main.py` replaced the custom config during rebuild.
- Worked around the Neo4j `vector.similarity.cosine()` dimension mismatch by investigating OpenAI (1536-dim) vs Ollama (1024-dim) schemas.
Notes
- To fully unify embedding modes, a Hugging Face / local model with 1536-dim embeddings will be needed (to match OpenAI’s schema and avoid Neo4j errors).
- Current Ollama model (`mxbai-embed-large`) works, but returns 1024-dim vectors.
- Seeder workflow validated, but it should be wrapped in a repeatable weekly run for full Cloud→Local sync.
[Lyra-Mem0 v0.2.0] - 2025-09-30
Added
- Standalone Lyra-Mem0 stack created at `~/lyra-mem0/`
  - Includes Postgres (pgvector), Qdrant, Neo4j, and SQLite for history tracking.
- Added working `docker-compose.mem0.yml` and a custom `Dockerfile` for building the Mem0 API server.
- Verified REST API functionality:
  - `POST /memories` works for adding memories.
  - `POST /search` works for semantic search.
- Successful end-to-end test with persisted memory: "Likes coffee in the morning" → retrievable via search. ✅
Changed
- Split architecture into modular stacks:
  - `~/lyra-core` (Relay, Persona-Sidecar, etc.)
  - `~/lyra-mem0` (Mem0 OSS memory stack)
- Removed old embedded mem0 containers from the Lyra-Core compose file.
- Added Lyra-Mem0 section in README.md.
Next Steps
- Wire Relay → Mem0 API (integration not yet complete).
- Add integration tests to verify persistence and retrieval from within Lyra-Core.
🧠 Lyra-Cortex
[Cortex v0.5] - 2025-11-13
Added
- New `reasoning.py` module
  - Async reasoning engine.
  - Accepts user prompt, identity, RAG block, and reflection notes.
  - Produces draft internal answers.
  - Uses the primary backend (vLLM).
- New `reflection.py` module
  - Fully async.
  - Produces actionable JSON “internal notes.”
  - Enforces strict JSON schema and fallback parsing.
  - Forces the cloud backend (`backend_override="cloud"`).
- Integrated `refine.py` into the Cortex reasoning pipeline:
  - New stage between reflection and persona.
  - Runs exclusively on the primary vLLM backend (MI50).
  - Produces final, internally consistent output for the downstream persona layer.
- Backend override system (see the sketch after this list)
  - Each LLM call can now select its own backend.
  - Enables multi-LLM cognition: Reflection → cloud, Reasoning → primary.
- Identity loader
  - Added `identity.py` with `load_identity()` for consistent persona retrieval.
- `ingest_handler`
  - Async stub created for the future Intake → NeoMem → RAG pipeline.
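A hedged sketch of the backend-override idea; `call_llm()` is named in the v0.4.1 entry and the env variable names appear in this changelog, but this particular signature, payloads, and parsing are assumptions.

```python
import os
import httpx

BACKEND_URLS = {
    "primary": os.getenv("LLM_PRIMARY_URL", "http://10.0.0.43:8000/v1/completions"),
    "secondary": os.getenv("LLM_SECONDARY_URL", "http://10.0.0.3:11434/api/generate"),
    "cloud": os.getenv("LLM_CLOUD_URL", "https://api.openai.com/v1/chat/completions"),
}
BACKEND_MODELS = {
    "primary": os.getenv("LLM_PRIMARY_MODEL", ""),
    "secondary": os.getenv("LLM_SECONDARY_MODEL", ""),
    "cloud": os.getenv("LLM_CLOUD_MODEL", "gpt-4o-mini"),
}

async def call_llm(prompt: str, backend_override: str | None = None) -> str:
    """Route one call to the selected backend; each pipeline stage picks its own."""
    backend = backend_override or "primary"
    url, model = BACKEND_URLS[backend], BACKEND_MODELS[backend]
    if backend == "cloud":
        headers = {"Authorization": f"Bearer {os.getenv('OPENAI_API_KEY', '')}"}
        payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    else:
        headers = {}
        payload = {"model": model, "prompt": prompt, "stream": False}
    async with httpx.AsyncClient(timeout=120) as client:
        resp = await client.post(url, json=payload, headers=headers)
        resp.raise_for_status()
        data = resp.json()
    if backend == "cloud":
        return data["choices"][0]["message"]["content"]
    # vLLM /v1/completions returns choices[].text; Ollama /api/generate returns "response".
    return data.get("choices", [{}])[0].get("text") or data.get("response", "")

# Reflection forces cloud, reasoning stays on primary (vLLM):
#   notes = await call_llm(reflection_prompt, backend_override="cloud")
#   draft = await call_llm(reasoning_prompt)
```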
Changed
-
Unified LLM backend URL handling across Cortex:
- ENV variables must now contain FULL API endpoints.
- Removed all internal path-appending (e.g.
.../v1/completions). llm_router.pyrewritten to use env-provided URLs as-is.- Ensures consistent behavior between draft, reflection, refine, and persona.
- Rebuilt `main.py`
  - Removed old annotation/analysis logic.
  - New structure: load identity → get RAG → reflect → reason → return draft + notes.
  - Routes now clean and minimal (`/reason`, `/ingest`, `/health`).
  - Async path throughout Cortex.
- Refactored `llm_router.py`
  - Removed old fallback logic during overrides.
  - OpenAI requests now use `/v1/chat/completions`.
  - Added proper OpenAI Authorization headers.
  - Distinct payload format for vLLM vs OpenAI.
  - Unified, correct parsing across models.
- Simplified Cortex architecture
  - Removed deprecated `context.py` and old reasoning code.
  - Relay completely decoupled from smart behavior.
- Updated environment specification:
  - `LLM_PRIMARY_URL` now set to `http://10.0.0.43:8000/v1/completions`.
  - `LLM_SECONDARY_URL` remains `http://10.0.0.3:11434/api/generate` (Ollama).
  - `LLM_CLOUD_URL` set to `https://api.openai.com/v1/chat/completions`.
Fixed
- Resolved endpoint conflict where:
  - The router expected base URLs.
  - Refine expected full URLs.
  - Refine always fell back due to hitting the incorrect endpoint.
  - Fixed by standardizing full-URL behavior across the entire system.
- Reflection layer no longer fails silently (previously returned `[""]` due to MythoMax).
- Resolved 404/401 errors caused by incorrect OpenAI URL endpoints.
- No more double-routing through vLLM during reflection.
- Corrected async/sync mismatch in multiple locations.
- Eliminated double-path bug (`/v1/completions/v1/completions`) caused by previous router logic.
Removed
- Legacy `annotate`, `reason_check` glue logic from the old architecture.
- Old backend-probing junk code.
- Stale imports and unused modules left over from the previous prototype.
Verified
- Cortex → vLLM (MI50) → refine → final_output now functioning correctly.
- Refine shows `used_primary_backend: true` and no fallback.
- Manual curl test confirms endpoint accuracy.
Known Issues
- Refine sometimes prefixes output with `"Final Answer:"`; the next version will sanitize this.
- Hallucinations in draft_output persist due to weak grounding (fix in reasoning + RAG planned).
Pending / Known Issues
- RAG service does not exist — requires containerized FastAPI service.
- Reasoning layer lacks self-revision loop (deliberate thought cycle).
- No speak/persona generation layer yet (`speak.py` planned).
- Intake summaries not yet routing into the RAG or reflection layer.
- No refinement engine between reasoning and speak.
Notes
This is the largest structural change to Cortex so far.
It establishes:
- multi-model cognition
- clean layering
- identity + reflection separation
- correct async code
- deterministic backend routing
- predictable JSON reflection
The system is now ready for:
- refinement loops
- persona-speaking layer
- containerized RAG
- long-term memory integration
- true emergent-behavior experiments
[Cortex v0.4.1] - 2025-11-05
Added
- RAG integration (see the sketch below)
  - Added `rag.py` with `query_rag()` and `format_rag_block()`.
  - Cortex now queries the local RAG API (`http://10.0.0.41:7090/rag/search`) for contextual augmentation.
  - Synthesized answers and top excerpts are injected into the reasoning prompt.
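A hedged sketch of the `rag.py` helpers named above; the function names and URL come from this entry, while the request/response field names ("query", "answer", "excerpts") are assumptions.

```python
import httpx

RAG_URL = "http://10.0.0.41:7090/rag/search"

async def query_rag(query: str, top_k: int = 5) -> dict:
    """Ask the RAG service for a synthesized answer plus top excerpts."""
    async with httpx.AsyncClient(timeout=30) as client:
        resp = await client.post(RAG_URL, json={"query": query, "top_k": top_k})
        resp.raise_for_status()
        return resp.json()

def format_rag_block(result: dict) -> str:
    """Render RAG output as a [RAG] context block for the reasoning prompt."""
    excerpts = "\n".join(f"- {e}" for e in result.get("excerpts", []))
    return f"[RAG]\nAnswer: {result.get('answer', '')}\nExcerpts:\n{excerpts}"
```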
Changed
- Revised `/reason` endpoint.
  - Now builds unified context blocks:
    - [Intake] → recent summaries
    - [RAG] → contextual knowledge
    - [User Message] → current input
  - Calls `call_llm()` for the first pass, then `reflection_loop()` for meta-evaluation.
  - Returns `cortex_prompt`, `draft_output`, `final_output`, and a normalized reflection.
- Reflection Pipeline Stability
  - Cleaned parsing to normalize JSON vs. text reflections.
  - Added fallback handling for malformed or non-JSON outputs.
  - Log system improved to show raw JSON, extracted fields, and a normalized summary.
- Async Summarization (Intake v0.2.1)
  - Intake summaries now run in background threads to avoid blocking Cortex.
  - Summaries (L1–L∞) logged asynchronously with [BG] tags.
- Environment & Networking Fixes
  - Verified `.env` variables propagate correctly inside the Cortex container.
  - Confirmed Docker network connectivity between Cortex, Intake, NeoMem, and RAG (shared `serversdown_lyra_net`).
  - Adjusted localhost calls to service-IP mapping (10.0.0.41 for the Cortex host).
- Behavioral Updates
  - Cortex now performs conversation reflection (on user intent) and self-reflection (on its own answers).
  - RAG context successfully grounds reasoning outputs.
  - Intake and NeoMem confirmed receiving summaries via `/add_exchange`.
  - Log clarity pass: all reflective and contextual blocks clearly labeled.
Known Gaps / Next Steps
- NeoMem Tuning
  - Improve retrieval latency and relevance.
  - Implement a dedicated `/reflections/recent` endpoint for Cortex.
  - Migrate to Cortex-first ingestion (Relay → Cortex → NeoMem).
- Cortex Enhancements
  - Add persistent reflection recall (use prior reflections as meta-context).
  - Improve reflection JSON structure ("insight", "evaluation", "next_action" → guaranteed fields).
  - Tighten temperature and prompt control for factual consistency.
- RAG Optimization
  - Add source ranking, filtering, and multi-vector hybrid search.
  - Cache RAG responses per session to reduce duplicate calls.
- Documentation / Monitoring
  - Add a health route for RAG and Intake summaries.
  - Include internal latency metrics in the `/health` endpoint.
  - Consolidate logs into a unified “Lyra Cortex Console” for tracing all module calls.
[Cortex - v0.3.0] – 2025-10-31
Added
- Cortex Service (FastAPI)
  - New standalone reasoning engine (`cortex/main.py`) with endpoints:
    - `GET /health` – reports active backend + NeoMem status.
    - `POST /reason` – evaluates `{prompt, response}` pairs.
    - `POST /annotate` – experimental text analysis.
  - Background NeoMem health monitor (5-minute interval).
- Multi-Backend Reasoning Support
  - Added environment-driven backend selection via `LLM_FORCE_BACKEND`.
  - Supports:
    - Primary → vLLM (MI50 node @ 10.0.0.43)
    - Secondary → Ollama (3090 node @ 10.0.0.3)
    - Cloud → OpenAI API
    - Fallback → llama.cpp (CPU)
  - Introduced per-backend model variables: `LLM_PRIMARY_MODEL`, `LLM_SECONDARY_MODEL`, `LLM_CLOUD_MODEL`, `LLM_FALLBACK_MODEL`.
- Response Normalization Layer (see the sketch after this list)
  - Implemented `normalize_llm_response()` to merge streamed outputs and repair malformed JSON.
  - Handles Ollama’s multi-line streaming and Mythomax’s missing-punctuation issues.
  - Prints concise debug previews of merged content.
- Environment Simplification
  - Each service (`intake`, `cortex`, `neomem`) now maintains its own `.env` file.
  - Removed reliance on a shared/global env file to prevent cross-contamination.
  - Verified Docker Compose networking across containers.
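A hedged sketch of what a `normalize_llm_response()`-style helper can do, merging Ollama's newline-delimited streaming chunks into one string; the function name is from this entry, the details are assumptions.

```python
import json

def normalize_llm_response(raw: str) -> str:
    """Merge streamed NDJSON chunks into one text blob, keeping non-JSON lines as-is."""
    pieces = []
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            chunk = json.loads(line)
        except json.JSONDecodeError:
            pieces.append(line)  # not JSON; keep the raw text
            continue
        if not isinstance(chunk, dict):
            pieces.append(str(chunk))
            continue
        # Ollama streaming chunks carry text under "response"; chat-style under message.content.
        pieces.append(chunk.get("response") or chunk.get("message", {}).get("content", ""))
    merged = "".join(pieces).strip()
    print(f"[normalize] merged preview: {merged[:120]!r}")  # concise debug preview
    return merged
```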
Changed
- Refactored `reason_check()` to dynamically switch between prompt and chat mode depending on the backend.
- Enhanced startup logs to announce the active backend, model, URL, and mode.
- Improved error handling with clearer “Reasoning error” messages.
Fixed
- Corrected broken vLLM endpoint routing (`/v1/completions`).
- Stabilized cross-container health reporting for NeoMem.
- Resolved JSON parse failures caused by streaming chunk delimiters.
Next Planned – [v0.4.0]
Planned Additions
- Reflection Mode
  - Introduce `REASONING_MODE=factcheck|reflection`.
  - Output schema: `{ "insight": "...", "evaluation": "...", "next_action": "..." }`
- Cortex-First Pipeline
  - UI → Cortex → [Reflection + Verifier + Memory] → Speech LLM → User.
  - Allows Lyra to “think before speaking.”
- Verifier Stub
  - New `/verify` endpoint for search-based factual grounding.
  - Asynchronous external truth checking.
- Memory Integration
  - Feed reflective outputs into NeoMem.
  - Enable “dream” cycles for autonomous self-review.
Status: 🟢 Stable Core – Multi-backend reasoning operational.
Next milestone: v0.4.0 — Reflection Mode + Thought Pipeline orchestration.
[Intake] v0.1.0 - 2025-10-27
- Receives messages from Relay and summarizes them in a cascading format.
- Continues to summarize smaller batches of exchanges while also generating large-scale conversational summaries (L20).
- Currently logs summaries to a .log file in /project-lyra/intake-logs/
**Next Steps**
- Feed Intake output into NeoMem.
- Generate a daily/hourly/etc. overall summary (e.g., "Today Brian and Lyra worked on x, y, and z").
- Generate session-aware summaries, with their own intake hopper.
[Lyra-Cortex] v0.2.0 — 2025-09-26
Added
- Integrated llama-server on dedicated Cortex VM (Proxmox).
- Verified Phi-3.5-mini-instruct_Uncensored-Q4_K_M running with 8 vCPUs.
- Benchmarked Phi-3.5-mini performance:
- ~18 tokens/sec CPU-only on Ryzen 7 7800X.
- Salience classification functional but sometimes inconsistent ("sali", "fi", "jamming").
- Tested Qwen2-0.5B-Instruct GGUF as alternative salience classifier:
- Much faster throughput (~350 tokens/sec prompt, ~100 tokens/sec eval).
- More responsive but over-classifies messages as “salient.”
- Established `.env` integration for the model ID (`SALIENCE_MODEL`), enabling hot-swapping between models.
Known Issues
- Small models tend to drift or over-classify.
- CPU-only 7B+ models expected to be slow; GPU passthrough recommended for larger models.
- Need to set up a `systemd` service for `llama-server` to auto-start on VM reboot.
[Lyra-Cortex] v0.1.0 — 2025-09-25
Added
- First deployment as dedicated Proxmox VM (5 vCPU / 18 GB RAM / 100 GB SSD).
- Built llama.cpp with the `llama-server` target via CMake.
- Integrated the Phi-3.5 Mini Instruct (Uncensored, Q4_K_M GGUF) model.
- Verified API compatibility at `/v1/chat/completions`.
- Local test successful via `curl` → ~523-token response generated.
- Performance benchmark: ~11.5 tokens/sec (CPU-only on Ryzen 7800X).
- Confirmed usable for salience scoring, summarization, and lightweight reasoning.