Project Lyra Changelog
All notable changes to Project Lyra. Format based on Keep a Changelog and Semantic Versioning.
[Unreleased]
[0.7.0] - 2025-12-21
Added - Standard Mode & UI Enhancements
Standard Mode Implementation
- Added "Standard Mode" chat option that bypasses complex cortex reasoning pipeline
- Provides simple chatbot functionality for coding and practical tasks
- Maintains full conversation context across messages
- Backend-agnostic - works with SECONDARY (Ollama), OPENAI, or custom backends
- Created `/simple` endpoint in Cortex router (cortex/router.py:389)
- Mode selector in UI with toggle between Standard and Cortex modes
- Standard Mode: Direct LLM chat with context retention
- Cortex Mode: Full 7-stage reasoning pipeline (unchanged)
Backend Selection System
- UI settings modal with LLM backend selection for Standard Mode
- Radio button selector: SECONDARY (Ollama/Qwen), OPENAI (GPT-4o-mini), or custom
- Backend preference persisted in localStorage
- Custom backend text input for advanced users
- Backend parameter routing through entire stack:
  - UI sends `backend` parameter in request body (see the request sketch after this list)
  - Relay forwards backend selection to Cortex
  - Cortex `/simple` endpoint respects user's backend choice
- Environment-based fallback: uses `STANDARD_MODE_LLM` if no backend specified
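A minimal request sketch for the Standard Mode path. The `mode`, `backend`, and `sessionId` fields are the ones described above; the Relay host/port (7078, as in earlier releases) and the response shape are assumptions.

```python
# Hedged sketch: host/port and response shape are assumptions; the "mode",
# "backend", and "sessionId" fields are the ones added in this release.
import httpx

payload = {
    "sessionId": "demo-session",
    "mode": "standard",      # "cortex" (the default) keeps the full reasoning pipeline
    "backend": "SECONDARY",  # SECONDARY (Ollama/Qwen), OPENAI, or a custom backend
    "messages": [{"role": "user", "content": "Write a regex that matches ISO dates."}],
}

resp = httpx.post("http://localhost:7078/v1/chat/completions", json=payload, timeout=120)
print(resp.json()["choices"][0]["message"]["content"])
```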
Session Management Overhaul
- Complete rewrite of session system to use server-side persistence
  - File-based storage in `core/relay/sessions/` directory
  - Session files: `{sessionId}.json` for history, `{sessionId}.meta.json` for metadata
  - Server is the source of truth; sessions sync across browsers and reboots
- Session metadata system for friendly names
- Sessions display custom names instead of random IDs
- Rename functionality in session dropdown
- Last modified timestamps and message counts
- Full CRUD API for sessions in Relay (usage sketch below):
  - `GET /sessions` - List all sessions with metadata
  - `GET /sessions/:id` - Retrieve session history
  - `POST /sessions/:id` - Save session history
  - `PATCH /sessions/:id/metadata` - Update session name/metadata
  - `DELETE /sessions/:id` - Delete session and metadata
- Session management UI in settings modal:
- List of all sessions with message counts and timestamps
- Delete button for each session with confirmation
- Automatic session cleanup when deleting current session
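Hypothetical usage of the session API listed above; the routes and verbs come from this entry, while the Relay base URL and the `name` metadata field are assumptions.

```python
import httpx

RELAY = "http://localhost:7078"  # assumed Relay address

sessions = httpx.get(f"{RELAY}/sessions").json()                 # list sessions + metadata
history = httpx.get(f"{RELAY}/sessions/demo-session").json()     # one session's history
httpx.patch(f"{RELAY}/sessions/demo-session/metadata",
            json={"name": "Coding scratchpad"})                  # rename (field name assumed)
httpx.delete(f"{RELAY}/sessions/demo-session")                   # removes history + metadata
```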
UI Improvements
- Settings modal with hamburger menu (⚙ Settings button)
- Backend selection section for Standard Mode
- Session management section with delete functionality
- Clean modal overlay with cyberpunk theme
- ESC key and click-outside to close
- Light/Dark mode toggle with dark mode as default
- Theme preference persisted in localStorage
- CSS variables for seamless theme switching
- Toggle button shows current mode (🌙 Dark Mode / ☀️ Light Mode)
- Removed redundant model selector dropdown from header
- Fixed modal positioning and z-index layering
- Modal moved outside #chat container for proper rendering
- Fixed z-index: overlay (999), modal content (1001)
- Centered modal with proper backdrop blur
Context Retention for Standard Mode
- Integration with Intake module for conversation history
- Added `get_recent_messages()` function in intake.py
- Standard Mode retrieves last 20 messages from session buffer
- Full context sent to LLM on each request
- Message array format support in LLM router:
  - Updated Ollama provider to accept `messages` parameter
  - Updated OpenAI provider to accept `messages` parameter
  - Automatic conversion from messages to prompt string for non-chat APIs
Changed - Architecture & Routing
Relay Server Updates core/relay/server.js
- ES module migration for session persistence:
  - Imported `fs/promises`, `path`, `fileURLToPath` for file operations
  - Created `SESSIONS_DIR` constant for session storage location
- Mode-based routing in both `/chat` and `/v1/chat/completions` endpoints:
  - Extracts `mode` parameter from request body (default: "cortex")
  - Routes to `CORTEX_SIMPLE` for Standard Mode, `CORTEX_REASON` for Cortex Mode
  - Backend parameter only used in Standard Mode
- Session persistence functions:
  - `ensureSessionsDir()` - Creates sessions directory if needed
  - `loadSession(sessionId)` - Reads session history from file
  - `saveSession(sessionId, history, metadata)` - Writes session to file
  - `loadSessionMetadata(sessionId)` - Reads session metadata
  - `saveSessionMetadata(sessionId, metadata)` - Updates session metadata
  - `listSessions()` - Returns all sessions with metadata, sorted by last modified
  - `deleteSession(sessionId)` - Removes session and metadata files
Cortex Router Updates cortex/router.py
- Added `backend` field to `ReasonRequest` Pydantic model (optional)
- Created `/simple` endpoint for Standard Mode (sketch below):
  - Bypasses reflection, reasoning, refinement stages
  - Direct LLM call with conversation context
  - Uses backend from request or falls back to `STANDARD_MODE_LLM` env variable
  - Returns simple response structure without reasoning artifacts
- Backend selection logic in `/simple`:
  - Normalizes backend names to uppercase
  - Maps UI backend names to system backend names
  - Validates backend availability before calling
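A minimal sketch of the `/simple` flow described above, not the actual cortex/router.py: the request model fields and the `call_llm` import path are assumptions; the fallback to `STANDARD_MODE_LLM` and the 20-message context window come from this entry.

```python
import os
from fastapi import APIRouter
from pydantic import BaseModel

from intake.intake import get_recent_messages   # documented in this release
from llm.llm_router import call_llm             # hypothetical import path

router = APIRouter()

class SimpleRequest(BaseModel):
    session_id: str
    message: str
    backend: str | None = None                  # optional; falls back to STANDARD_MODE_LLM

@router.post("/simple")
async def simple_chat(req: SimpleRequest):
    backend = (req.backend or os.getenv("STANDARD_MODE_LLM", "SECONDARY")).upper()
    history = get_recent_messages(req.session_id, limit=20)       # last 20 messages
    messages = history + [{"role": "user", "content": req.message}]
    reply = await call_llm(messages=messages, backend=backend)    # no reflection/refine stages
    return {"response": reply, "backend": backend}
```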
Intake Integration cortex/intake/intake.py
- Added `get_recent_messages(session_id, limit)` function (sketch below):
  - Retrieves last N messages from session buffer
  - Returns empty list if session doesn't exist
  - Used by `/simple` endpoint for context retrieval
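A rough shape of `get_recent_messages()` as described above; `SESSIONS` is the module-level buffer referenced elsewhere in this changelog, and the per-exchange field names are assumptions.

```python
SESSIONS: dict = {}   # module-level singleton (simplified stand-in)

def get_recent_messages(session_id: str, limit: int = 20) -> list[dict]:
    session = SESSIONS.get(session_id)
    if not session:
        return []                                  # unknown session -> empty context
    messages = []
    for exchange in session["buffer"][-limit:]:    # one user/assistant pair per exchange (assumed shape)
        messages.append({"role": "user", "content": exchange["user"]})
        messages.append({"role": "assistant", "content": exchange["assistant"]})
    return messages
```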
LLM Router Enhancements cortex/llm/llm_router.py
- Added `messages` parameter support across all providers (sketch below)
- Automatic message-to-prompt conversion for legacy APIs
- Chat completion format for Ollama and OpenAI providers
- Stop sequences for MI50/DeepSeek R1 to prevent runaway generation:
  `"User:"`, `"\nUser:"`, `"Assistant:"`, `"\n\n\n"`
Environment Configuration .env
- Added `STANDARD_MODE_LLM=SECONDARY` for the default Standard Mode backend
- Added `CORTEX_SIMPLE_URL=http://cortex:7081/simple` for routing
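A sketch of the message-array handling added to the LLM router: chat-capable providers receive the array as-is, while legacy completion APIs get a flattened prompt plus the stop sequences listed above. Function names are illustrative, not the actual llm_router.py symbols.

```python
STOP_SEQUENCES = ["User:", "\nUser:", "Assistant:", "\n\n\n"]

def messages_to_prompt(messages: list[dict]) -> str:
    lines = [f"{m['role'].capitalize()}: {m['content']}" for m in messages]
    lines.append("Assistant:")                       # cue the model to answer
    return "\n".join(lines)

def build_payload(provider: str, messages: list[dict]) -> dict:
    if provider in ("ollama", "openai"):             # chat-completion style APIs
        return {"messages": messages}
    return {"prompt": messages_to_prompt(messages),  # legacy completion APIs (e.g. llama.cpp)
            "stop": STOP_SEQUENCES}
```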
UI Architecture core/ui/index.html
- Server-based session loading system:
  - `loadSessionsFromServer()` - Fetches sessions from Relay API
  - `renderSessions()` - Populates session dropdown from server data
  - Session state synchronized with server on every change
- Backend selection persistence:
- Loads saved backend from localStorage on page load
- Includes backend parameter in request body when in Standard Mode
- Settings modal pre-selects current backend choice
- Dark mode by default:
- Checks localStorage for theme preference
- Sets dark theme if no preference found
- Toggle button updates localStorage and applies theme
CSS Styling core/ui/style.css
- Light mode CSS variables:
  - `--bg-dark: #f5f5f5` (light background)
  - `--text-main: #1a1a1a` (dark text)
  - `--text-fade: #666` (dimmed text)
- Dark mode CSS variables (default):
  - `--bg-dark: #0a0a0a` (dark background)
  - `--text-main: #e6e6e6` (light text)
  - `--text-fade: #999` (dimmed text)
- Modal positioning fixes:
  - `position: fixed` with `top: 50%`, `left: 50%`, `transform: translate(-50%, -50%)`
  - Z-index layering: overlay (999), content (1001)
- Backdrop blur effect on modal overlay
- Session list styling:
- Session item cards with hover effects
- Delete button with red hover state
- Message count and timestamp display
Fixed - Critical Issues
DeepSeek R1 Runaway Generation
- Root cause: R1 reasoning model generates thinking process and hallucinates conversations
- Solution:
  - Changed `STANDARD_MODE_LLM` to SECONDARY (Ollama/Qwen) instead of PRIMARY (MI50/R1)
  - Added stop sequences to MI50 provider to prevent continuation
  - Documented R1 limitations for Standard Mode usage
Context Not Maintained in Standard Mode
- Root cause: `/simple` endpoint didn't retrieve conversation history from Intake
- Solution:
  - Created `get_recent_messages()` function in intake.py
  - Standard Mode now pulls last 20 messages from session buffer
  - Full context sent to LLM with each request
- User feedback: "it's saying it hasn't received any other messages from me, so it looks like the standard mode llm isn't getting the full chat"
OpenAI Backend 400 Errors
- Root cause: OpenAI provider only accepted prompt strings, not messages arrays
- Solution: Updated OpenAI provider to support messages parameter like Ollama
- Now handles chat completion format correctly
Modal Formatting Issues
- Root cause: Settings modal inside #chat container with overflow constraints
- Symptoms: Modal appearing at bottom, jumbled layout, couldn't close
- Solution:
- Moved modal outside #chat container to be direct child of body
- Changed positioning from absolute to fixed
- Added proper z-index layering (overlay: 999, content: 1001)
- Removed old model selector from header
- User feedback: "the formating for the settings is all off. Its at the bottom and all jumbling together, i cant get it to go away"
Session Persistence Broken
- Root cause: Sessions stored only in localStorage, not synced with server
- Symptoms: Sessions didn't persist across browsers or reboots, couldn't load messages
- Solution: Complete rewrite of session system
- Implemented server-side file persistence in Relay
- Created CRUD API endpoints for session management
- Updated UI to load sessions from server instead of localStorage
- Added metadata system for session names
- Sessions now survive container restarts and sync across browsers
- User feedback: "sessions seem to exist locally only, i cant get them to actually load any messages and there is now way to delete them. If i open the ui in a different browser those arent there."
Technical Improvements
Backward Compatibility
- All changes include defaults to maintain existing behavior
- Cortex Mode completely unchanged - still uses full 7-stage pipeline
- Standard Mode is opt-in via UI mode selector
- If no backend specified, falls back to `STANDARD_MODE_LLM` env variable
- Existing requests without mode parameter default to "cortex"
Code Quality
- Consistent async/await patterns throughout stack
- Proper error handling with fallbacks
- Clean separation between Standard and Cortex modes
- Session persistence abstracted into helper functions
- Modular UI code with clear event handlers
Performance
- Standard Mode bypasses 6 of 7 reasoning stages for faster responses
- Session loading optimized with file-based caching
- Backend selection happens once per message, not per LLM call
- Minimal overhead for mode detection and routing
Architecture - Dual-Mode Chat System
Standard Mode Flow:
User (UI) → Relay → Cortex /simple → Intake (get_recent_messages)
→ LLM (direct call with context) → Relay → UI
Cortex Mode Flow (Unchanged):
User (UI) → Relay → Cortex /reason → Reflection → Reasoning
→ Refinement → Persona → Relay → UI
Session Persistence:
UI → POST /sessions/:id → Relay → File system (sessions/*.json)
UI → GET /sessions → Relay → List all sessions → UI dropdown
Known Limitations
Standard Mode:
- No reflection, reasoning, or refinement stages
- No RAG integration (same as Cortex Mode - currently disabled)
- No NeoMem memory storage (same as Cortex Mode - currently disabled)
- DeepSeek R1 not recommended for Standard Mode (generates reasoning artifacts)
Session Management:
- Sessions stored in container filesystem - need volume mount for true persistence
- No session import/export functionality yet
- No session search or filtering
Migration Notes
For Users Upgrading:
- Existing sessions in localStorage will not automatically migrate to server
- Create new sessions after upgrade for server-side persistence
- Theme preference (light/dark) will be preserved from localStorage
- Backend preference will default to SECONDARY if not previously set
For Developers:
- Relay now requires `fs/promises` for session persistence
- Cortex `/simple` endpoint expects a `backend` parameter (optional)
- UI sends `mode` and `backend` parameters in request body
- Session files stored in `core/relay/sessions/` directory
[0.6.0] - 2025-12-18
Added - Autonomy System (Phase 1 & 2)
Autonomy Phase 1 - Self-Awareness & Planning Foundation
- Executive Planning Module cortex/autonomy/executive/planner.py
- Autonomous goal setting and task planning capabilities
- Multi-step reasoning for complex objectives
- Integration with self-state tracking
- Self-State Management cortex/data/self_state.json
- Persistent state tracking across sessions
- Memory of past actions and outcomes
- Self-awareness metadata storage
- Self Analyzer cortex/autonomy/self/analyzer.py
- Analyzes own performance and decision patterns
- Identifies areas for improvement
- Tracks cognitive patterns over time
- Test Suite cortex/tests/test_autonomy_phase1.py
- Unit tests for phase 1 autonomy features
Autonomy Phase 2 - Decision Making & Proactive Behavior
- Autonomous Actions Module cortex/autonomy/actions/autonomous_actions.py
- Self-initiated action execution
- Context-aware decision implementation
- Action logging and tracking
- Pattern Learning System cortex/autonomy/learning/pattern_learner.py
- Learns from interaction patterns
- Identifies recurring user needs
- Adapts behavior based on learned patterns
- Proactive Monitor cortex/autonomy/proactive/monitor.py
- Monitors system state for intervention opportunities
- Detects patterns requiring proactive response
- Background monitoring capabilities
- Decision Engine cortex/autonomy/tools/decision_engine.py
- Autonomous decision-making framework
- Weighs options and selects optimal actions
- Integrates with orchestrator for coordinated decisions
- Orchestrator cortex/autonomy/tools/orchestrator.py
- Coordinates multiple autonomy subsystems
- Manages tool selection and execution
- Handles NeoMem integration (with disable capability)
- Test Suite cortex/tests/test_autonomy_phase2.py
- Unit tests for phase 2 autonomy features
Autonomy Phase 2.5 - Pipeline Refinement
- Tightened integration between autonomy modules and reasoning pipeline
- Enhanced self-state persistence and tracking
- Improved orchestrator reliability
- NeoMem integration refinements in vector store handling neomem/neomem/vector_stores/qdrant.py
Added - Documentation
- Complete AI Agent Breakdown docs/PROJECT_LYRA_COMPLETE_BREAKDOWN.md
- Comprehensive system architecture documentation
- Detailed component descriptions
- Data flow diagrams
- Integration points and API specifications
Changed - Core Integration
- Router Updates cortex/router.py
- Integrated autonomy subsystems into main routing logic
- Added endpoints for autonomous decision-making
- Enhanced state management across requests
- Reasoning Pipeline cortex/reasoning/reasoning.py
- Integrated autonomy-aware reasoning
- Self-state consideration in reasoning process
- Persona Layer cortex/persona/speak.py
- Autonomy-aware response generation
- Self-state reflection in personality expression
- Context Handling cortex/context.py
- NeoMem disable capability for flexible deployment
Changed - Development Environment
- Updated .gitignore for better workspace management
- Cleaned up VSCode settings
- Removed .vscode/settings.json from repository
Technical Improvements
- Modular autonomy architecture with clear separation of concerns
- Test-driven development for new autonomy features
- Enhanced state persistence across system restarts
- Flexible NeoMem integration with enable/disable controls
Architecture - Autonomy System Design
The autonomy system operates in layers:
- Executive Layer - High-level planning and goal setting
- Decision Layer - Evaluates options and makes choices
- Action Layer - Executes autonomous decisions
- Learning Layer - Adapts behavior based on patterns
- Monitoring Layer - Proactive awareness of system state
All layers coordinate through the orchestrator and maintain state in self_state.json.
[0.5.2] - 2025-12-12
Fixed - LLM Router & Async HTTP
- Critical: Replaced synchronous `requests` with async `httpx` in LLM router cortex/llm/llm_router.py (see the sketch after this section)
  - Event loop blocking was causing timeouts and empty responses
  - All three providers (MI50, Ollama, OpenAI) now use `await http_client.post()`
  - Fixes "Expecting value: line 1 column 1 (char 0)" JSON parsing errors in intake
- Critical: Fixed missing `backend` parameter in intake summarization cortex/intake/intake.py:285
  - Was defaulting to PRIMARY (MI50) instead of respecting `INTAKE_LLM=SECONDARY`
  - Now correctly uses configured backend (Ollama on 3090)
- Relay: Fixed session ID case mismatch core/relay/server.js:87
  - UI sends `sessionId` (camelCase) but relay expected `session_id` (snake_case)
  - Now accepts both variants: `req.body.session_id || req.body.sessionId`
  - Custom session IDs now properly tracked instead of defaulting to "default"
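A minimal sketch of the async pattern adopted here: a shared `httpx.AsyncClient` and awaited POSTs instead of blocking `requests` calls. The URL and payload are illustrative.

```python
import httpx

http_client = httpx.AsyncClient(timeout=120)   # shared client; 120 s matches the providers

async def call_ollama(url: str, model: str, prompt: str) -> str:
    resp = await http_client.post(url, json={"model": model, "prompt": prompt, "stream": False})
    resp.raise_for_status()
    return resp.json().get("response", "")     # Ollama /api/generate returns a "response" field
```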
Added - Error Handling & Diagnostics
- Added comprehensive error handling in LLM router for all providers
- HTTPError, JSONDecodeError, KeyError, and generic Exception handling
- Detailed error messages with exception type and description
- Provider-specific error logging (mi50, ollama, openai)
- Added debug logging in intake summarization
- Logs LLM response length and preview
- Validates non-empty responses before JSON parsing
- Helps diagnose empty or malformed responses
Added - Session Management
- Added session persistence endpoints in relay core/relay/server.js:160-171
  - `GET /sessions/:id` - Retrieve session history
  - `POST /sessions/:id` - Save session history
  - In-memory storage using a Map (ephemeral, resets on container restart)
- Fixes UI "Failed to load session" errors
Changed - Provider Configuration
- Added `mi50` provider support for llama.cpp server cortex/llm/llm_router.py:62-81 (sketch after this section)
  - Uses `/completion` endpoint with `n_predict` parameter
  - Extracts `content` field from response
  - Configured for MI50 GPU with DeepSeek model
- Increased memory retrieval threshold from 0.78 to 0.90 cortex/.env:20
- Filters out low-relevance memories (only returns 90%+ similarity)
- Reduces noise in context retrieval
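A sketch of the new `mi50` provider call: llama.cpp's `/completion` endpoint with `n_predict`, reading the `content` field back. The server URL is passed in rather than assumed.

```python
import httpx

async def call_mi50(url: str, prompt: str, max_tokens: int = 512) -> str:
    async with httpx.AsyncClient(timeout=120) as client:
        resp = await client.post(url, json={"prompt": prompt, "n_predict": max_tokens})
    resp.raise_for_status()
    return resp.json()["content"]   # llama.cpp server returns the text in "content"
```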
Technical Improvements
- Unified async HTTP handling across all LLM providers
- Better separation of concerns between provider implementations
- Improved error messages for debugging LLM API failures
- Consistent timeout handling (120 seconds for all providers)
[0.5.1] - 2025-12-11
Fixed - Intake Integration
- Critical: Fixed `bg_summarize()` function-not-defined error
  - Was only a `TYPE_CHECKING` stub, now implemented as a logging stub
  - Eliminated `NameError` preventing SESSIONS from persisting correctly
  - Function now logs exchange additions and defers summarization to the `/reason` endpoint
- Critical: Fixed `/ingest` endpoint unreachable code in router.py:201-233
  - Removed early return that prevented `update_last_assistant_message()` from executing
  - Removed duplicate `add_exchange_internal()` call
  - Implemented lenient error handling (each operation wrapped in try/except)
- Intake: Added missing `__init__.py` to make intake a proper Python package (cortex/intake/__init__.py)
  - Prevents namespace package issues
  - Enables proper module imports
  - Exports `SESSIONS`, `add_exchange_internal`, `summarize_context`
Added - Diagnostics & Debugging
- Added diagnostic logging to verify SESSIONS singleton behavior
  - Module initialization logs SESSIONS object ID intake.py:14
  - Each `add_exchange_internal()` call logs object ID and buffer state intake.py:343-358
- Added `/debug/sessions` HTTP endpoint router.py:276-305 (sketch after this section)
  - Inspect SESSIONS from within the running Uvicorn worker
  - Shows total sessions, session count, buffer sizes, recent exchanges
  - Returns SESSIONS object ID for verification
- Added `/debug/summary` HTTP endpoint router.py:238-271
  - Test `summarize_context()` for any session
  - Returns L1/L5/L10/L20/L30 summaries
  - Includes buffer size and exchange preview
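A rough shape of the `/debug/sessions` endpoint; the actual router.py may expose different field names, and the buffer layout is an assumption.

```python
from fastapi import APIRouter
from intake.intake import SESSIONS      # the singleton buffer discussed above

router = APIRouter()

@router.get("/debug/sessions")
async def debug_sessions():
    return {
        "sessions_object_id": id(SESSIONS),                    # verify the singleton is shared
        "total_sessions": len(SESSIONS),
        "buffer_sizes": {sid: len(s.get("buffer", []))         # per-session buffer length (assumed shape)
                         for sid, s in SESSIONS.items()},
    }
```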
Changed - Intake Architecture
- Intake is no longer a standalone service; it runs inside the Cortex container as a pure Python module
  - Imported as `from intake.intake import add_exchange_internal, SESSIONS`
  - No HTTP calls between Cortex and Intake
  - Eliminates network latency and the dependency on the Intake service being up
- Deferred summarization: `bg_summarize()` is now a no-op stub intake.py:318-325
  - Actual summarization happens during the `/reason` call via `summarize_context()`
  - Simplifies async/sync complexity
  - Prevents NameError when called from `add_exchange_internal()`
- Lenient error handling: `/ingest` endpoint always returns success router.py:201-233
  - Each operation wrapped in try/except
  - Logs errors but never fails, to avoid breaking the chat pipeline
  - User requirement: never fail the chat pipeline
Documentation
- Added single-worker constraint note in cortex/Dockerfile:7-8
- Documents that SESSIONS requires single Uvicorn worker
- Notes that multi-worker scaling requires Redis or shared storage
- Updated plan documentation with root cause analysis
[0.5.0] - 2025-11-28
Fixed - Critical API Wiring & Integration
After the major architectural rewire (v0.4.x), this release fixes all critical endpoint mismatches and ensures end-to-end system connectivity.
Cortex → Intake Integration
- Fixed `IntakeClient` to use correct Intake v0.2 API endpoints
  - Changed `GET /context/{session_id}` → `GET /summaries?session_id={session_id}`
  - Updated JSON response parsing to extract `summary_text` field
  - Fixed environment variable name: `INTAKE_API` → `INTAKE_API_URL`
  - Corrected default port: `7083` → `7080`
  - Added deprecation warning to `summarize_turn()` method (endpoint removed in Intake v0.2)
Relay → UI Compatibility
- Added OpenAI-compatible endpoint `POST /v1/chat/completions` (example call after this list)
  - Accepts standard OpenAI format with `messages[]` array
  - Returns OpenAI-compatible response structure with `choices[]`
  - Extracts last message content from messages array
  - Includes usage metadata (stub values for compatibility)
- Refactored Relay to use shared `handleChatRequest()` function
  - Both `/chat` and `/v1/chat/completions` use the same core logic
  - Eliminates code duplication
  - Consistent error handling across endpoints
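An example call against the new endpoint; the host/port is assumed, and the request/response shapes follow the standard OpenAI chat-completions format referenced above.

```python
import httpx

resp = httpx.post(
    "http://localhost:7078/v1/chat/completions",   # assumed Relay address
    json={"messages": [{"role": "user", "content": "Hello, Lyra"}]},
    timeout=120,
)
data = resp.json()
print(data["choices"][0]["message"]["content"])    # assistant reply
print(data.get("usage"))                           # stub usage metadata, kept for compatibility
```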
Relay → Intake Connection
- Fixed Intake URL fallback in Relay server configuration
  - Corrected port: `7082` → `7080`
  - Updated endpoint: `/summary` → `/add_exchange`
  - Now properly sends exchanges to Intake for summarization
Code Quality & Python Package Structure
- Added missing `__init__.py` files to all Cortex subdirectories
  - cortex/llm/__init__.py
  - cortex/reasoning/__init__.py
  - cortex/persona/__init__.py
  - cortex/ingest/__init__.py
  - cortex/utils/__init__.py
  - Improves package imports and IDE support
- Removed unused import in cortex/router.py: `from unittest import result`
- Deleted empty file cortex/llm/resolve_llm_url.py (was 0 bytes, never implemented)
Verified Working
Complete end-to-end message flow now operational:
UI → Relay (/v1/chat/completions)
↓
Relay → Cortex (/reason)
↓
Cortex → Intake (/summaries) [retrieves context]
↓
Cortex 4-stage pipeline:
1. reflection.py → meta-awareness notes
2. reasoning.py → draft answer
3. refine.py → polished answer
4. persona/speak.py → Lyra personality
↓
Cortex → Relay (returns persona response)
↓
Relay → Intake (/add_exchange) [async summary]
↓
Intake → NeoMem (background memory storage)
↓
Relay → UI (final response)
Documentation
- Added comprehensive v0.5.0 changelog entry
- Updated README.md to reflect v0.5.0 architecture
- Documented new endpoints
- Updated data flow diagrams
- Clarified Intake v0.2 changes
- Corrected service descriptions
Issues Resolved
- ❌ Cortex could not retrieve context from Intake (wrong endpoint)
- ❌ UI could not send messages to Relay (endpoint mismatch)
- ❌ Relay could not send summaries to Intake (wrong port/endpoint)
- ❌ Python package imports were implicit (missing `__init__.py`)
Known Issues (Non-Critical)
- Session management endpoints not implemented in Relay (`GET/POST /sessions/:id`)
- RAG service currently disabled in docker-compose.yml
- Cortex `/ingest` endpoint is a stub returning `{"status": "ok"}`
Migration Notes
If upgrading from v0.4.x:
- Pull latest changes from git
- Verify environment variables in `.env` files:
  - Check `INTAKE_API_URL=http://intake:7080` (not `INTAKE_API`)
  - Verify all service URLs use correct ports
- Restart Docker containers: `docker-compose down && docker-compose up -d`
- Test with a simple message through the UI
[Infrastructure v1.0.0] - 2025-11-26
Changed - Environment Variable Consolidation
Major reorganization to eliminate duplication and improve maintainability
- Consolidated 9 scattered `.env` files into a single-source-of-truth architecture
- Root `.env` now contains all shared infrastructure (LLM backends, databases, API keys, service URLs)
- Service-specific `.env` files minimized to only essential overrides:
  - `cortex/.env`: Reduced from 42 to 22 lines (operational parameters only)
  - `neomem/.env`: Reduced from 26 to 14 lines (LLM naming conventions only)
  - `intake/.env`: Kept at 8 lines (already minimal)
- Result: ~24% reduction in total configuration lines (197 → ~150)
Docker Compose Consolidation
- All services now defined in the single root `docker-compose.yml`
- Relay service updated with complete configuration (env_file, volumes)
- Removed redundant `core/docker-compose.yml` (marked as DEPRECATED)
- Standardized network communication to use Docker container names
Service URL Standardization
- Internal services use container names: `http://neomem-api:7077`, `http://cortex:7081`
- External services use IP addresses: `http://10.0.0.43:8000` (vLLM), `http://10.0.0.3:11434` (Ollama)
- Removed IP/container-name inconsistencies across files
Added - Security & Documentation
Security Templates - Created .env.example files for all services
- Root `.env.example` with sanitized credentials
- Service-specific templates: `cortex/.env.example`, `neomem/.env.example`, `intake/.env.example`, `rag/.env.example`
- All `.env.example` files safe to commit to version control
Documentation
ENVIRONMENT_VARIABLES.md: Comprehensive reference for all environment variables
- Variable descriptions, defaults, and usage examples
- Multi-backend LLM strategy documentation
- Troubleshooting guide
- Security best practices
DEPRECATED_FILES.md: Deletion guide for deprecated files with verification steps
Enhanced .gitignore
- Ignores all `.env` files (including subdirectories)
- Tracks `.env.example` templates for documentation
- Ignores `.env-backups/` directory
Removed
- `core/.env` - Redundant with root `.env`, now deleted
- `core/docker-compose.yml` - Consolidated into the main compose file (marked DEPRECATED)
Fixed
- Eliminated duplicate `OPENAI_API_KEY` across 5+ files
- Eliminated duplicate LLM backend URLs across 4+ files
- Eliminated duplicate database credentials across 3+ files
- Resolved Cortex `environment:` section override in docker-compose (now uses env_file)
Architecture - Multi-Backend LLM Strategy
Root .env provides all backend OPTIONS (PRIMARY, SECONDARY, CLOUD, FALLBACK), services choose which to USE:
- Cortex → vLLM (PRIMARY) for autonomous reasoning
- NeoMem → Ollama (SECONDARY) + OpenAI embeddings
- Intake → vLLM (PRIMARY) for summarization
- Relay → Fallback chain with user preference
Preserves per-service flexibility while eliminating URL duplication.
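A small sketch of how a service can pick one of the shared backends from the root `.env`; the variable names match this entry, the helper itself is illustrative.

```python
import os

def resolve_backend(name: str) -> dict:
    name = name.upper()                                   # PRIMARY, SECONDARY, CLOUD, FALLBACK
    return {
        "url": os.environ[f"LLM_{name}_URL"],
        "model": os.environ.get(f"LLM_{name}_MODEL", ""),
    }

# e.g. Cortex reasoning uses PRIMARY while NeoMem uses SECONDARY
primary = resolve_backend("PRIMARY")
```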
Migration
- All original `.env` files backed up to `.env-backups/` with timestamp `20251126_025334`
- Rollback plan documented in `ENVIRONMENT_VARIABLES.md`
- Verification steps provided in `DEPRECATED_FILES.md`
[0.4.x] - 2025-11-13
Added - Multi-Stage Reasoning Pipeline
Cortex v0.5 - Complete architectural overhaul
- New `reasoning.py` module
  - Async reasoning engine
  - Accepts user prompt, identity, RAG block, and reflection notes
  - Produces draft internal answers
  - Uses primary backend (vLLM)
- New `reflection.py` module
  - Fully async meta-awareness layer
  - Produces actionable JSON "internal notes"
  - Enforces strict JSON schema and fallback parsing
  - Forces cloud backend (`backend_override="cloud"`)
- Integrated `refine.py` into pipeline
  - New stage between reflection and persona
  - Runs exclusively on primary vLLM backend (MI50)
  - Produces final, internally consistent output for the downstream persona layer
- Backend override system (illustrated after this list)
  - Each LLM call can now select its own backend
  - Enables multi-LLM cognition: Reflection → cloud, Reasoning → primary
- Identity loader
  - Added `identity.py` with `load_identity()` for consistent persona retrieval
- Ingest handler
  - Async stub created for future Intake → NeoMem → RAG pipeline
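An illustrative sketch of the backend-override idea: each pipeline stage pins its own backend, so reflection can run on the cloud model while reasoning stays on the primary vLLM node. `call_llm` stands in for the router's entry point; its import path is hypothetical.

```python
from llm.llm_router import call_llm      # hypothetical import path

async def reflect(prompt: str) -> str:
    # reflection is forced onto the cloud backend, as described above
    return await call_llm(prompt, backend_override="cloud")

async def reason(prompt: str, notes: str) -> str:
    # reasoning stays on the primary vLLM backend (MI50)
    return await call_llm(f"{prompt}\n\nInternal notes:\n{notes}", backend_override="primary")
```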
Cortex v0.4.1 - RAG Integration
- RAG integration
  - Added `rag.py` with `query_rag()` and `format_rag_block()`
  - Cortex now queries local RAG API (`http://10.0.0.41:7090/rag/search`)
  - Synthesized answers and top excerpts injected into reasoning prompt
Changed - Unified LLM Architecture
Cortex v0.5
- Unified LLM backend URL handling across Cortex
  - ENV variables must now contain FULL API endpoints
  - Removed all internal path-appending (e.g. `.../v1/completions`)
  - `llm_router.py` rewritten to use env-provided URLs as-is
  - Ensures consistent behavior between draft, reflection, refine, and persona
- Rebuilt `main.py`
  - Removed old annotation/analysis logic
  - New structure: load identity → get RAG → reflect → reason → return draft+notes
  - Routes now clean and minimal (`/reason`, `/ingest`, `/health`)
  - Async path throughout Cortex
- Refactored `llm_router.py`
  - Removed old fallback logic during overrides
  - OpenAI requests now use `/v1/chat/completions`
  - Added proper OpenAI Authorization headers
  - Distinct payload format for vLLM vs OpenAI
  - Unified, correct parsing across models
- Simplified Cortex architecture
  - Removed deprecated `context.py` and old reasoning code
  - Relay completely decoupled from smart behavior
- Updated environment specification
  - `LLM_PRIMARY_URL` now set to `http://10.0.0.43:8000/v1/completions`
  - `LLM_SECONDARY_URL` remains `http://10.0.0.3:11434/api/generate` (Ollama)
  - `LLM_CLOUD_URL` set to `https://api.openai.com/v1/chat/completions`
Cortex v0.4.1
- Revised `/reason` endpoint
  - Now builds unified context blocks: [Intake] → recent summaries, [RAG] → contextual knowledge, [User Message] → current input
  - Calls `call_llm()` for the first pass, then `reflection_loop()` for meta-evaluation
  - Returns `cortex_prompt`, `draft_output`, `final_output`, and normalized reflection
- Reflection Pipeline Stability
  - Cleaned parsing to normalize JSON vs. text reflections
  - Added fallback handling for malformed or non-JSON outputs
  - Log system improved to show raw JSON, extracted fields, and normalized summary
- Async Summarization (Intake v0.2.1)
  - Intake summaries now run in background threads to avoid blocking Cortex
  - Summaries (L1–L∞) logged asynchronously with [BG] tags
- Environment & Networking Fixes
  - Verified `.env` variables propagate correctly inside the Cortex container
  - Confirmed Docker network connectivity between Cortex, Intake, NeoMem, and RAG
  - Adjusted localhost calls to service-IP mapping
- Behavioral Updates
  - Cortex now performs conversation reflection (on user intent) and self-reflection (on its own answers)
  - RAG context successfully grounds reasoning outputs
  - Intake and NeoMem confirmed receiving summaries via `/add_exchange`
  - Log clarity pass: all reflective and contextual blocks clearly labeled
Fixed
Cortex v0.5
- Resolved endpoint conflict where router expected base URLs and refine expected full URLs
- Fixed by standardizing full-URL behavior across entire system
- Reflection layer no longer fails silently (previously returned `[""]` due to MythoMax)
- Resolved 404/401 errors caused by incorrect OpenAI URL endpoints
- No more double-routing through vLLM during reflection
- Corrected async/sync mismatch in multiple locations
- Eliminated double-path bug (`/v1/completions/v1/completions`) caused by previous router logic
Removed
Cortex v0.5
- Legacy `annotate`, `reason_check` glue logic from the old architecture
- Old backend-probing junk code
- Stale imports and unused modules leftover from previous prototype
Verified
Cortex v0.5
- Cortex → vLLM (MI50) → refine → final_output now functioning correctly
- Refine shows `used_primary_backend: true` and no fallback
- Manual curl test confirms endpoint accuracy
Known Issues
Cortex v0.5
- Refine sometimes prefixes output with "Final Answer:"; the next version will sanitize this
- Hallucinations in draft_output persist due to weak grounding (fix in reasoning + RAG planned)
Cortex v0.4.1
- NeoMem tuning needed - improve retrieval latency and relevance
- Need dedicated `/reflections/recent` endpoint for Cortex
- Migrate to Cortex-first ingestion (Relay → Cortex → NeoMem)
- Add persistent reflection recall (use prior reflections as meta-context)
- Improve reflection JSON structure ("insight", "evaluation", "next_action" → guaranteed fields)
- Tighten temperature and prompt control for factual consistency
- RAG optimization: add source ranking, filtering, multi-vector hybrid search
- Cache RAG responses per session to reduce duplicate calls
Notes
Cortex v0.5
This is the largest structural change to Cortex so far. It establishes:
- Multi-model cognition
- Clean layering
- Identity + reflection separation
- Correct async code
- Deterministic backend routing
- Predictable JSON reflection
The system is now ready for:
- Refinement loops
- Persona-speaking layer
- Containerized RAG
- Long-term memory integration
- True emergent-behavior experiments
[0.3.x] - 2025-10-28 to 2025-09-26
Added
[Lyra Core v0.3.2 + Web UI v0.2.0] - 2025-10-28
- New UI
  - Cleaned up UI look and feel
- Sessions
  - Sessions now persist over time
  - Ability to create new sessions or load sessions from a previous instance
  - Changing sessions updates what the prompt sends to Relay (doesn't include messages from other sessions)
  - Relay correctly wired in
[Lyra-Core 0.3.1] - 2025-10-09
- NVGRAM Integration (Full Pipeline Reconnected)
  - Replaced legacy Mem0 service with NVGRAM microservice (`nvgram-api` @ port 7077)
  - Updated `server.js` in Relay to route all memory ops via `${NVGRAM_API}/memories` and `/search`
  - Added `.env` variable: `NVGRAM_API=http://nvgram-api:7077`
  - Verified end-to-end Lyra conversation persistence: relay → nvgram-api → postgres/neo4j → relay → ollama → ui
  - ✅ Memories stored, retrieved, and re-injected successfully
[Lyra-Core v0.3.0] - 2025-09-26
- Salience filtering in Relay
  - `.env` configurable: `SALIENCE_ENABLED`, `SALIENCE_MODE`, `SALIENCE_MODEL`, `SALIENCE_API_URL`
  - Supports `heuristic` and `llm` classification modes
  - LLM-based salience filter integrated with Cortex VM running `llama-server`
- Logging improvements
- Added debug logs for salience mode, raw LLM output, and unexpected outputs
- Fail-closed behavior for unexpected LLM responses
- Successfully tested with Phi-3.5-mini and Qwen2-0.5B-Instruct as salience classifiers
- Verified end-to-end flow: Relay → salience filter → Mem0 add/search → Persona injection → LLM reply
[Cortex v0.3.0] - 2025-10-31
- Cortex Service (FastAPI)
  - New standalone reasoning engine (`cortex/main.py`) with endpoints:
    - `GET /health` – reports active backend + NeoMem status
    - `POST /reason` – evaluates `{prompt, response}` pairs
    - `POST /annotate` – experimental text analysis
  - Background NeoMem health monitor (5-minute interval)
- Multi-Backend Reasoning Support
  - Environment-driven backend selection via `LLM_FORCE_BACKEND`
  - Supports: Primary (vLLM MI50), Secondary (Ollama 3090), Cloud (OpenAI), Fallback (llama.cpp CPU)
  - Per-backend model variables: `LLM_PRIMARY_MODEL`, `LLM_SECONDARY_MODEL`, `LLM_CLOUD_MODEL`, `LLM_FALLBACK_MODEL`
- Response Normalization Layer
  - Implemented `normalize_llm_response()` to merge streamed outputs and repair malformed JSON
  - Handles Ollama's multi-line streaming and Mythomax's missing punctuation issues
  - Prints concise debug previews of merged content
- Environment Simplification
  - Each service (`intake`, `cortex`, `neomem`) now maintains its own `.env` file
  - Removed reliance on a shared/global env file to prevent cross-contamination
  - Verified Docker Compose networking across containers
[NeoMem 0.1.2] - 2025-10-27 (formerly NVGRAM)
- Renamed NVGRAM to NeoMem
- All future updates under name NeoMem
- Features unchanged
[NVGRAM 0.1.1] - 2025-10-08
- Async Memory Rewrite (Stability + Safety Patch)
  - Introduced `AsyncMemory` class with fully asynchronous vector and graph store writes
  - Added input sanitation to prevent embedding errors (`'list' object has no attribute 'replace'`)
  - Implemented `flatten_messages()` helper in the API layer to clean malformed payloads
  - Added structured request logging via `RequestLoggingMiddleware` (FastAPI middleware)
  - Health endpoint (`/health`) returns structured JSON `{status, version, service}`
  - Startup logs include sanitized embedder config with masked API keys
[NVGRAM 0.1.0] - 2025-10-07
- Initial fork of Mem0 → NVGRAM
- Created fully independent local-first memory engine based on Mem0 OSS
- Renamed all internal modules, Docker services, and environment variables from `mem0` → `nvgram`
- New service name: `nvgram-api`, default port 7077
- Maintains same API endpoints (`/memories`, `/search`) for drop-in compatibility
- Uses FastAPI, Postgres, and Neo4j as persistent backends
[Lyra-Mem0 0.3.2] - 2025-10-05
- Ollama LLM reasoning alongside OpenAI embeddings
  - Introduced `LLM_PROVIDER=ollama`, `LLM_MODEL`, and `OLLAMA_HOST` in `.env.3090`
  - Verified local 3090 setup using `qwen2.5:7b-instruct-q4_K_M`
  - Split processing: Embeddings → OpenAI `text-embedding-3-small`, LLM → Local Ollama
- Added `.env.3090` template for self-hosted inference nodes
- Integrated runtime diagnostics and seeder progress tracking
  - File-level + message-level progress bars
  - Retry/back-off logic for timeouts (3 attempts)
  - Event logging (`ADD / UPDATE / NONE`) for every memory record
- Expanded Docker health checks for Postgres, Qdrant, and Neo4j containers
- Added GPU-friendly long-run configuration for continuous seeding (validated on RTX 3090)
[Lyra-Mem0 0.3.1] - 2025-10-03
- HuggingFace TEI integration (local 3090 embedder)
- Dual-mode environment switch between OpenAI cloud and local
- CSV export of memories from Postgres (`payload->>'data'`)
[Lyra-Mem0 0.3.0]
- Ollama embeddings in Mem0 OSS container
  - Configured `EMBEDDER_PROVIDER=ollama`, `EMBEDDER_MODEL`, `OLLAMA_HOST` via `.env`
  - Mounted `main.py` override from host into container to load custom `DEFAULT_CONFIG`
  - Installed `ollama` Python client into the custom API container image
- `.env.3090` file for external embedding mode (3090 machine)
- Workflow for multiple embedding modes: LAN-based 3090/Ollama, local-only CPU, OpenAI fallback
[Lyra-Mem0 v0.2.1]
- Seeding pipeline
- Built Python seeder script to bulk-insert raw Cloud Lyra exports into Mem0
- Implemented incremental seeding option (skip existing memories, only add new ones)
- Verified insert process with Postgres-backed history DB
[Intake v0.1.0] - 2025-10-27
- Receives messages from relay and summarizes them in cascading format
- Continues to summarize smaller amounts of exchanges while generating large-scale conversational summaries (L20)
- Currently logs summaries to a .log file in `/project-lyra/intake-logs/`
[Lyra-Cortex v0.2.0] - 2025-09-26
- Integrated llama-server on dedicated Cortex VM (Proxmox)
- Verified Phi-3.5-mini-instruct_Uncensored-Q4_K_M running with 8 vCPUs
- Benchmarked Phi-3.5-mini performance: ~18 tokens/sec CPU-only on Ryzen 7 7800X
- Salience classification functional but sometimes inconsistent
- Tested Qwen2-0.5B-Instruct GGUF as alternative salience classifier
- Much faster throughput (~350 tokens/sec prompt, ~100 tokens/sec eval)
- More responsive but over-classifies messages as "salient"
- Established `.env` integration for model ID (`SALIENCE_MODEL`), enabling hot-swap between models
Changed
[Lyra-Core 0.3.1] - 2025-10-09
- Renamed `MEM0_URL` → `NVGRAM_API` across all relay environment configs
- Updated Docker Compose service dependency order
  - `relay` now depends on the `nvgram-api` healthcheck
- Removed `mem0` references and volumes
- Minor cleanup to Persona fetch block (null-checks and safer default persona string)
[Lyra-Core v0.3.1] - 2025-09-27
- Removed salience filter logic; Cortex is now default annotator
- All user messages stored in Mem0; no discard tier applied
- Cortex annotations (`metadata.cortex`) now attached to memories
- Debug logging improvements
  - Pretty-print Cortex annotations
  - Injected prompt preview
  - Memory search hit list with scores
- `.env` toggle (`CORTEX_ENABLED`) to bypass Cortex when needed
[Lyra-Core v0.3.0] - 2025-09-26
- Refactored `server.js` to gate `mem.add()` calls behind the salience filter
- Updated `.env` to support `SALIENCE_MODEL`
[Cortex v0.3.0] - 2025-10-31
- Refactored `reason_check()` to dynamically switch between prompt and chat mode depending on backend
- Enhanced startup logs to announce active backend, model, URL, and mode
- Improved error handling with clearer "Reasoning error" messages
[NVGRAM 0.1.1] - 2025-10-08
- Replaced synchronous `Memory.add()` with an async-safe version supporting concurrent vector + graph writes
- Normalized indentation and cleaned duplicate `main.py` references
- Removed redundant `FastAPI()` app reinitialization
- Updated internal logging to INFO-level timing format
- Deprecated `@app.on_event("startup")` → will migrate to `lifespan` handler in v0.1.2
[NVGRAM 0.1.0] - 2025-10-07
- Removed dependency on the external `mem0ai` SDK — all logic now local
- Re-pinned requirements: fastapi==0.115.8, uvicorn==0.34.0, pydantic==2.10.4, python-dotenv==1.0.1, psycopg>=3.2.8, ollama
- Adjusted `docker-compose` and `.env` templates to use the new NVGRAM naming
[Lyra-Mem0 0.3.2] - 2025-10-05
- Updated `main.py` configuration block to load `LLM_PROVIDER`, `LLM_MODEL`, `OLLAMA_BASE_URL`
  - Fallback to OpenAI if Ollama unavailable
- Adjusted `docker-compose.yml` mount paths to correctly map `/app/main.py`
- Normalized `.env` loading so `mem0-api` and the host environment share identical values
- Improved seeder logging and progress telemetry
- Added explicit `temperature` field to `DEFAULT_CONFIG['llm']['config']`
[Lyra-Mem0 0.3.0]
- `docker-compose.yml` updated to mount local `main.py` and `.env.3090`
- Built custom Dockerfile (`mem0-api-server:latest`) extending the base image with `pip install ollama`
- Updated `requirements.txt` to include the `ollama` package
- Adjusted Mem0 container config so `main.py` pulls environment variables with `dotenv`
- Tested new embeddings path with a curl `/memories` API call
[Lyra-Mem0 v0.2.1]
- Updated `main.py` to load configuration from `.env` using `dotenv` and support multiple embedder backends
- Mounted host `main.py` into the container so local edits persist across rebuilds
- Updated `docker-compose.yml` to mount `.env.3090` and support swapping between profiles
- Built custom Dockerfile (`mem0-api-server:latest`) including `pip install ollama`
- Updated `requirements.txt` with the `ollama` dependency
- Adjusted startup flow so the container automatically connects to the external Ollama host (LAN IP)
- Added logging to confirm model pulls and embedding requests
Fixed
[Lyra-Core 0.3.1] - 2025-10-09
- Relay startup no longer crashes when NVGRAM is unavailable — deferred connection handling
- `/memories` POST failures no longer crash Relay; now logged gracefully as `relay error Error: memAdd failed: 500`
- Improved injected prompt debugging (`DEBUG_PROMPT=true` now prints clean JSON)
[Lyra-Core v0.3.1] - 2025-09-27
- Parsing failures from Markdown-wrapped Cortex JSON via fence cleaner
- Relay no longer "hangs" on malformed Cortex outputs
[Cortex v0.3.0] - 2025-10-31
- Corrected broken vLLM endpoint routing (`/v1/completions`)
- Stabilized cross-container health reporting for NeoMem
- Resolved JSON parse failures caused by streaming chunk delimiters
[NVGRAM 0.1.1] - 2025-10-08
- Eliminated repeating 500 error from OpenAI embedder caused by non-string message content
- Masked API key leaks from boot logs
- Ensured Neo4j reconnects gracefully on first retry
[Lyra-Mem0 0.3.2] - 2025-10-05
- Resolved crash during startup: `TypeError: OpenAIConfig.__init__() got an unexpected keyword argument 'ollama_base_url'`
- Corrected mount type mismatch (file vs directory) causing `OCI runtime create failed` errors
- Prevented duplicate or partial postings when retry logic triggered multiple concurrent requests
- "Unknown event" warnings now safely ignored (no longer break the seeding loop)
- Confirmed full dual-provider operation in logs (`api.openai.com` + `10.0.0.3:11434/api/chat`)
[Lyra-Mem0 0.3.1] - 2025-10-03
- `.env` CRLF vs LF line-ending issues
- Local seeding now possible via the HuggingFace server
[Lyra-Mem0 0.3.0]
- Resolved container boot failure caused by missing `ollama` dependency (ModuleNotFoundError)
- Fixed config overwrite issue where rebuilding the container restored stock `main.py`
- Worked around Neo4j error (`vector.similarity.cosine(): mismatched vector dimensions`) by confirming OpenAI vs. Ollama embedding vector sizes
[Lyra-Mem0 v0.2.1]
- Seeder process originally failed on old memories — now skips duplicates and continues batch
- Resolved container boot error (`ModuleNotFoundError: ollama`) by extending the image
- Fixed overwrite issue where stock `main.py` replaced custom config during rebuild
- Worked around Neo4j `vector.similarity.cosine()` dimension mismatch
Known Issues
[Lyra-Core v0.3.0] - 2025-09-26
- Small models (e.g. Qwen2-0.5B) tend to over-classify as "salient"
- Phi-3.5-mini sometimes returns truncated tokens ("sali", "fi")
- CPU-only inference is functional but limited; larger models recommended once GPU available
[Lyra-Cortex v0.2.0] - 2025-09-26
- Small models tend to drift or over-classify
- CPU-only 7B+ models expected to be slow; GPU passthrough recommended for larger models
- Need to set up a `systemd` service for `llama-server` to auto-start on VM reboot
Observations
[Lyra-Mem0 0.3.2] - 2025-10-05
- Stable GPU utilization: ~8 GB VRAM @ 92% load, ≈ 67°C under sustained seeding
- Next revision will re-format seed JSON to preserve `role` context (user vs assistant)
[Lyra-Mem0 v0.2.1]
- To fully unify embedding modes, a Hugging Face / local model with 1536-dim embeddings will be needed (to match OpenAI's schema)
- Current Ollama model (`mxbai-embed-large`) works, but returns 1024-dim vectors
- Seeder workflow validated, but should be wrapped in a repeatable weekly run for full Cloud→Local sync
Next Steps
[Lyra-Core 0.3.1] - 2025-10-09
- Add salience visualization (e.g., memory weights displayed in injected system message)
- Begin schema alignment with NVGRAM v0.1.2 for confidence scoring
- Add relay auto-retry for transient 500 responses from NVGRAM
[NVGRAM 0.1.1] - 2025-10-08
- Integrate salience scoring and embedding confidence weight fields in Postgres schema
- Begin testing with full Lyra Relay + Persona Sidecar pipeline for live session memory recall
- Migrate from the deprecated `on_event` → `lifespan` pattern in 0.1.2
[NVGRAM 0.1.0] - 2025-10-07
- Integrate NVGRAM as new default backend in Lyra Relay
- Deprecate remaining Mem0 references and archive old configs
- Begin versioning as a standalone project (`nvgram-core`, `nvgram-api`, etc.)
[Intake v0.1.0] - 2025-10-27
- Feed intake into NeoMem
- Generate daily/hourly overall summary (e.g., "Today Brian and Lyra worked on x, y, and z")
- Generate session-aware summaries with own intake hopper
[0.2.x] - 2025-09-30 to 2025-09-24
Added
[Lyra-Mem0 v0.2.0] - 2025-09-30
- Standalone Lyra-Mem0 stack created at `~/lyra-mem0/`
  - Includes Postgres (pgvector), Qdrant, Neo4j, and SQLite for history tracking
- Added working `docker-compose.mem0.yml` and custom `Dockerfile` for building the Mem0 API server
- Verified REST API functionality
  - `POST /memories` works for adding memories
  - `POST /search` works for semantic search
- Successful end-to-end test with persisted memory: "Likes coffee in the morning" → retrievable via search ✅
[Lyra-Core v0.2.0] - 2025-09-24
- Migrated Relay to use the `mem0ai` SDK instead of raw fetch calls
- Implemented `sessionId` support (client-supplied, fallback to `default`)
- Added debug logs for memory add/search
- Cleaned up Relay structure for clarity
Changed
[Lyra-Mem0 v0.2.0] - 2025-09-30
- Split architecture into modular stacks:
  - `~/lyra-core` (Relay, Persona-Sidecar, etc.)
  - `~/lyra-mem0` (Mem0 OSS memory stack)
- Removed old embedded mem0 containers from Lyra-Core compose file
- Added Lyra-Mem0 section in README.md
Next Steps
[Lyra-Mem0 v0.2.0] - 2025-09-30
- Wire Relay → Mem0 API (integration not yet complete)
- Add integration tests to verify persistence and retrieval from within Lyra-Core
[0.1.x] - 2025-09-25 to 2025-09-23
Added
[Lyra_RAG v0.1.0] - 2025-11-07
- Initial standalone RAG module for Project Lyra
- Persistent ChromaDB vector store (`./chromadb`)
- Importer `rag_chat_import.py` with:
  - Recursive folder scanning and category tagging
  - Smart chunking (~5k chars)
  - SHA-1 deduplication and chat-ID metadata
  - Timestamp fields (`file_modified`, `imported_at`)
  - Background-safe operation (`nohup`/`tmux`)
- 68 Lyra-category chats imported:
- 6,556 new chunks added
- 1,493 duplicates skipped
- 7,997 total vectors stored
[Lyra_RAG v0.1.0 API] - 2025-11-07
- `/rag/search` FastAPI endpoint implemented (port 7090)
- Supports natural-language queries and returns top related excerpts
- Added answer synthesis step using `gpt-4o-mini`
[Lyra-Core v0.1.0] - 2025-09-23
- First working MVP of Lyra Core Relay
- Relay service accepts `POST /v1/chat/completions` (OpenAI-compatible)
- Memory integration with Mem0 (sketch below):
  - `POST /memories` on each user message
  - `POST /search` before the LLM call
- Persona Sidecar integration (`GET /current`)
- OpenAI GPT + Ollama (Mythomax) support in Relay
- Simple browser-based chat UI (talks to Relay at `http://<host>:7078`)
- `.env` standardization for Relay + Mem0 + Postgres + Neo4j
- Working Neo4j + Postgres backing stores for Mem0
- Initial MVP relay service with raw fetch calls to Mem0
- Dockerized with basic healthcheck
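A hedged sketch of the Relay's memory round-trip described above: store the user message, search for related memories, and inject the hits before calling the LLM. Only the `/memories` and `/search` routes come from this entry; the Mem0 address and payload fields are assumptions.

```python
import httpx

MEM0 = "http://localhost:8000"   # assumed Mem0 API address

def remember_and_recall(user_id: str, text: str) -> list[str]:
    httpx.post(f"{MEM0}/memories",
               json={"user_id": user_id,
                     "messages": [{"role": "user", "content": text}]})   # payload shape assumed
    hits = httpx.post(f"{MEM0}/search",
                      json={"user_id": user_id, "query": text}).json()
    return [h.get("memory", "") for h in hits.get("results", [])]        # result fields assumed
```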
[Lyra-Cortex v0.1.0] - 2025-09-25
- First deployment as dedicated Proxmox VM (5 vCPU / 18 GB RAM / 100 GB SSD)
- Built llama.cpp with the `llama-server` target via CMake
- Integrated Phi-3.5 Mini Instruct (Uncensored, Q4_K_M GGUF) model
- Verified API compatibility at `/v1/chat/completions`
- Local test successful via `curl` → ~523-token response generated
- Performance benchmark: ~11.5 tokens/sec (CPU-only on Ryzen 7800X)
- Confirmed usable for salience scoring, summarization, and lightweight reasoning
Fixed
[Lyra-Core v0.1.0] - 2025-09-23
- Resolved crash loop in Neo4j by restricting env vars (`NEO4J_AUTH` only)
- Relay now correctly reads `MEM0_URL` and `MEM0_API_KEY` from `.env`
Verified
[Lyra_RAG v0.1.0] - 2025-11-07
- Successful recall of Lyra-Core development history (v0.3.0 snapshot)
- Correct metadata and category tagging for all new imports
Known Issues
[Lyra-Core v0.1.0] - 2025-09-23
- No feedback loop (thumbs up/down) yet
- Forget/delete flow is manual (via memory IDs)
- Memory latency ~1–4s depending on embedding model
Next Planned
[Lyra_RAG v0.1.0] - 2025-11-07
- Optional `where` filter parameter for category/date queries
- Graceful "no results" handler for empty retrievals
- `rag_docs_import.py` for PDFs and other document types