# Project Lyra Changelog

All notable changes to Project Lyra.
Format based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/) and [Semantic Versioning](https://semver.org/).

---

## [Unreleased]

---

## [0.9.0] - 2025-12-29

### Added - Trilium Notes Integration

**Trilium ETAPI Knowledge Base Integration**
- **Trilium Tool Executor** [cortex/autonomy/tools/executors/trilium.py](cortex/autonomy/tools/executors/trilium.py)
  - `search_notes(query, limit)` - Search through Trilium notes via ETAPI
  - `create_note(title, content, parent_note_id)` - Create new notes in Trilium knowledge base
  - Full ETAPI authentication and error handling
  - Automatic `parentNoteId` defaulting to "root" for root-level notes
  - Connection error handling with user-friendly messages
- **Tool Registry Integration** [cortex/autonomy/tools/registry.py](cortex/autonomy/tools/registry.py)
  - Added `ENABLE_TRILIUM` feature flag
  - Tool definitions with schema validation
  - Provider-agnostic tool calling support
- **Setup Documentation** [TRILIUM_SETUP.md](TRILIUM_SETUP.md)
  - Step-by-step ETAPI token generation guide
  - Environment configuration instructions
  - Troubleshooting section for common issues
  - Security best practices for token management
- **API Reference Documentation** [docs/TRILIUM_API.md](docs/TRILIUM_API.md)
  - Complete ETAPI endpoint reference
  - Authentication and request/response examples
  - Search syntax and advanced query patterns

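A minimal sketch of the kind of ETAPI calls the executor above performs, using the documented `TRILIUM_URL` and `TRILIUM_ETAPI_TOKEN` variables. The endpoint paths and response fields follow the public ETAPI spec and should be treated as assumptions rather than the actual `trilium.py` source.

```python
# Minimal sketch of the Trilium executor's ETAPI calls (not the actual
# trilium.py source). Endpoint paths follow the standard ETAPI; treat
# them as assumptions if your Trilium version differs.
import os
import httpx

TRILIUM_URL = os.getenv("TRILIUM_URL", "http://10.0.0.2:4292")
HEADERS = {"Authorization": os.getenv("TRILIUM_ETAPI_TOKEN", "")}

async def search_notes(query: str, limit: int = 10) -> list[dict]:
    """Search Trilium notes via ETAPI and return the raw result dicts."""
    async with httpx.AsyncClient(timeout=30) as client:
        resp = await client.get(
            f"{TRILIUM_URL}/etapi/notes",
            params={"search": query, "limit": limit},
            headers=HEADERS,
        )
        resp.raise_for_status()
        return resp.json().get("results", [])

async def create_note(title: str, content: str, parent_note_id: str = "root") -> dict:
    """Create a text note; parent_note_id defaults to 'root' as described above."""
    async with httpx.AsyncClient(timeout=30) as client:
        resp = await client.post(
            f"{TRILIUM_URL}/etapi/create-note",
            json={
                "parentNoteId": parent_note_id,
                "title": title,
                "type": "text",
                "content": content,
            },
            headers=HEADERS,
        )
        resp.raise_for_status()
        return resp.json()
```
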
**Environment Configuration**
- **New Environment Variables** [.env](.env)
  - `ENABLE_TRILIUM=true` - Enable/disable Trilium integration
  - `TRILIUM_URL=http://10.0.0.2:4292` - Trilium instance URL
  - `TRILIUM_ETAPI_TOKEN` - ETAPI authentication token

**Capabilities Unlocked**
- Personal knowledge base search during conversations
- Automatic note creation from conversation insights
- Cross-reference information between chat and notes
- Context-aware responses using stored knowledge
- Future: find duplicates, suggest organization, summarize notes

### Changed - Spelling Corrections

**Module Naming**
- Renamed `trillium.py` to `trilium.py` (corrected spelling)
- Updated all imports and references across the codebase
- Fixed environment variable names (misspelled `TRILLIUM_*` → `TRILIUM_*`)
- Updated documentation to use the correct "Trilium" spelling

---

## [0.8.0] - 2025-12-26

### Added - Tool Calling & "Show Your Work" Transparency Feature

**Tool Calling System (Standard Mode)**
- **Function Calling Infrastructure** [cortex/autonomy/tools/](cortex/autonomy/tools/)
  - Implemented agentic tool calling for Standard Mode with autonomous multi-step execution
  - Tool registry system with JSON schema definitions
  - Adapter pattern for provider-agnostic tool calling (OpenAI, Ollama, llama.cpp)
  - Maximum 5 iterations per request to prevent runaway loops
- **Available Tools**
  - `execute_code` - Sandboxed Python/JavaScript/Bash execution via Docker
  - `web_search` - Tavily API integration for real-time web queries
  - `trilium_search` - Internal Trilium knowledge base queries
- **Provider Adapters** [cortex/autonomy/tools/adapters/](cortex/autonomy/tools/adapters/)
  - `OpenAIAdapter` - Native function calling support
  - `OllamaAdapter` - XML-based tool calling for local models
  - `LlamaCppAdapter` - XML-based tool calling for llama.cpp backend
  - Automatic tool call parsing and result formatting
- **Code Execution Sandbox** [cortex/autonomy/tools/code_executor.py](cortex/autonomy/tools/code_executor.py)
  - Docker-based isolated execution environment
  - Support for Python, JavaScript (Node.js), and Bash
  - 30-second timeout with automatic cleanup
  - Returns stdout, stderr, exit code, and execution time
  - Prevents filesystem access outside sandbox

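A sketch of the Docker-based sandboxing pattern described above. The image names and `docker run` flags are assumptions for illustration, not the project's actual `code_executor.py` configuration; only the documented behavior (30-second timeout, stdout/stderr/exit-code/time result) is taken from the entry.

```python
# Illustrative Docker sandbox in the spirit of code_executor.py; image
# names and docker flags are assumptions, not the project's config.
import subprocess
import time

LANG_IMAGES = {"python": "python:3.11-slim", "javascript": "node:20-slim", "bash": "bash:5"}
LANG_CMDS = {"python": ["python", "-c"], "javascript": ["node", "-e"], "bash": ["bash", "-c"]}

def execute_code(language: str, code: str, timeout: int = 30) -> dict:
    """Run code in a throwaway container and report stdout/stderr/exit code/time."""
    cmd = [
        "docker", "run", "--rm",
        "--network", "none",          # no network access inside the sandbox
        "--read-only",                # block writes outside the tmpfs scratch area
        "--tmpfs", "/tmp",
        LANG_IMAGES[language],
        *LANG_CMDS[language], code,
    ]
    start = time.monotonic()
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
        return {
            "stdout": proc.stdout,
            "stderr": proc.stderr,
            "exit_code": proc.returncode,
            "execution_time": round(time.monotonic() - start, 3),
        }
    except subprocess.TimeoutExpired:
        return {"stdout": "", "stderr": f"timed out after {timeout}s",
                "exit_code": -1, "execution_time": timeout}
```
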
**"Show Your Work" - Real-Time Thinking Stream**
|
||
- **Server-Sent Events (SSE) Streaming** [cortex/router.py:478-527](cortex/router.py#L478-L527)
|
||
- New `/stream/thinking/{session_id}` endpoint for real-time event streaming
|
||
- Broadcasts internal thinking process during tool calling operations
|
||
- 30-second keepalive with automatic reconnection support
|
||
- Events: `connected`, `thinking`, `tool_call`, `tool_result`, `done`, `error`
|
||
- **Stream Manager** [cortex/autonomy/tools/stream_events.py](cortex/autonomy/tools/stream_events.py)
|
||
- Pub/sub system for managing SSE subscriptions per session
|
||
- Multiple clients can connect to same session stream
|
||
- Automatic cleanup of dead queues and closed connections
|
||
- Zero overhead when no subscribers active
|
||
- **FunctionCaller Integration** [cortex/autonomy/tools/function_caller.py](cortex/autonomy/tools/function_caller.py)
|
||
- Enhanced with event emission at each step:
|
||
- "thinking" events before each LLM call
|
||
- "tool_call" events when invoking tools
|
||
- "tool_result" events after tool execution
|
||
- "done" event with final answer
|
||
- "error" events on failures
|
||
- Session-aware streaming (only emits when subscribers exist)
|
||
- Provider-agnostic implementation works with all backends
|
||
- **Thinking Stream UI** [core/ui/thinking-stream.html](core/ui/thinking-stream.html)
|
||
- Dedicated popup window for real-time thinking visualization
|
||
- Color-coded events: green (thinking), orange (tool calls), blue (results), purple (done), red (errors)
|
||
- Auto-scrolling event feed with animations
|
||
- Connection status indicator with green/red dot
|
||
- Clear events button and session info display
|
||
- Mobile-friendly responsive design
|
||
- **UI Integration** [core/ui/index.html](core/ui/index.html)
|
||
- "🧠 Show Work" button in session selector
|
||
- Opens thinking stream in popup window
|
||
- Session ID passed via URL parameter for stream association
|
||
- Purple/violet button styling to match cyberpunk theme
|
||
|
||
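A small sketch of the per-session pub/sub pattern the Stream Manager above describes. The class and method names are illustrative, not the actual `stream_events.py` API; the asyncio-queue fan-out and "skip when no subscribers" behavior are the documented ideas.

```python
# Sketch of the per-session pub/sub pattern described above; names are
# illustrative, not the actual stream_events.py API.
import asyncio
from collections import defaultdict

class ToolStreamManager:
    def __init__(self) -> None:
        self._queues: dict[str, set[asyncio.Queue]] = defaultdict(set)

    def subscribe(self, session_id: str) -> asyncio.Queue:
        """Register an SSE client for a session and return its event queue."""
        q: asyncio.Queue = asyncio.Queue()
        self._queues[session_id].add(q)
        return q

    def unsubscribe(self, session_id: str, q: asyncio.Queue) -> None:
        self._queues[session_id].discard(q)

    def has_subscribers(self, session_id: str) -> bool:
        """Lets emitters skip work entirely when nobody is listening."""
        return bool(self._queues.get(session_id))

    async def emit(self, session_id: str, event: dict) -> None:
        """Fan an event out to every queue subscribed to the session."""
        for q in list(self._queues.get(session_id, ())):
            await q.put(event)
```

An SSE endpoint can then `await queue.get()` in a loop and yield `f"data: {json.dumps(event)}\n\n"` lines to each connected client, which is what the `/stream/thinking/{session_id}` route above does conceptually.
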
**Tool Calling Configuration**
- **Environment Variables** [.env](.env)
  - `STANDARD_MODE_ENABLE_TOOLS=true` - Enable/disable tool calling
  - `TAVILY_API_KEY` - API key for web search tool
  - `TRILLIUM_API_URL` - URL for Trilium knowledge base
- **Standard Mode Tools Toggle** [cortex/router.py:389-470](cortex/router.py#L389-L470)
  - `/simple` endpoint checks `STANDARD_MODE_ENABLE_TOOLS` environment variable
  - Falls back to non-tool mode if disabled
  - Logs tool usage statistics (iterations, tools used)

### Changed - CORS & Architecture

**CORS Support for SSE**
- **Added CORS Middleware** [cortex/main.py](cortex/main.py)
  - FastAPI CORSMiddleware with wildcard origins for development
  - Allows cross-origin SSE connections from nginx UI (port 8081) to cortex (port 7081)
  - Credentials support enabled for authenticated requests
  - All methods and headers permitted

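The CORS setup above, as a minimal FastAPI sketch. This uses the standard `CORSMiddleware` API; the exact options in `cortex/main.py` may differ, and the wildcard origin is the documented development-only choice.

```python
# Minimal sketch of the CORS setup described above; options in the real
# cortex/main.py may differ.
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],        # development only; restrict in production
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)
```
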
**Tool Calling Pipeline**
- **Standard Mode Enhancement** [cortex/router.py:389-470](cortex/router.py#L389-L470)
  - `/simple` endpoint now supports optional tool calling
  - Multi-iteration agentic loop with LLM + tool execution
  - Tool results injected back into conversation for next iteration
  - Graceful degradation to non-tool mode if tools disabled

**JSON Response Formatting**
- **SSE Event Structure** [cortex/router.py:497-499](cortex/router.py#L497-L499)
  - Fixed initial "connected" event to use proper JSON serialization
  - Changed from f-string with nested quotes to `json.dumps()`
  - Ensures valid JSON for all event types

### Fixed - Critical JavaScript & SSE Issues

**JavaScript Variable Scoping Bug**
- **Root cause**: `eventSource` variable used before declaration in [thinking-stream.html:218](core/ui/thinking-stream.html#L218)
- **Symptom**: `Uncaught ReferenceError: can't access lexical declaration 'eventSource' before initialization`
- **Solution**: Moved variable declarations before `connectStream()` call
- **Impact**: Thinking stream page now loads without errors and establishes SSE connection

**SSE Connection Not Establishing**
- **Root cause**: CORS blocked cross-origin SSE requests from nginx (8081) to cortex (7081)
- **Symptom**: Browser silently blocked EventSource connection, no errors in console
- **Solution**: Added CORSMiddleware to cortex FastAPI app
- **Impact**: SSE streams now connect successfully across ports

**Invalid JSON in SSE Events**
- **Root cause**: Initial "connected" event used f-string with nested quotes: `f"data: {{'type': 'connected', 'session_id': '{session_id}'}}\n\n"`
- **Symptom**: Browser couldn't parse malformed JSON, connection appeared stuck on "Connecting..."
- **Solution**: Used `json.dumps()` for proper JSON serialization
- **Impact**: Connected event now parsed correctly, status updates to green dot

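A short before/after sketch of the serialization fix described above: the quoted f-string produces Python-dict-style single quotes, which is not valid JSON, so the client's `JSON.parse` fails; `json.dumps()` yields a well-formed event line.

```python
# Before: dict-style single quotes inside an f-string are not valid JSON,
# so the EventSource client cannot parse the payload.
# bad = f"data: {{'type': 'connected', 'session_id': '{session_id}'}}\n\n"

# After: serialize the payload explicitly, as described in the fix above.
import json

def connected_event(session_id: str) -> str:
    payload = {"type": "connected", "session_id": session_id}
    return f"data: {json.dumps(payload)}\n\n"
```
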
### Technical Improvements

**Agentic Architecture**
- Multi-iteration reasoning loop with tool execution
- Provider-agnostic tool calling via adapter pattern
- Automatic tool result injection into conversation context
- Iteration limits to prevent infinite loops
- Comprehensive logging at each step

**Event Streaming Performance**
- Zero overhead when no subscribers (check before emit)
- Efficient pub/sub with asyncio queues
- Automatic cleanup of disconnected clients
- 30-second keepalive prevents timeout issues
- Session-isolated streams prevent cross-talk

**Code Quality**
- Clean separation: tool execution, adapters, streaming, UI
- Comprehensive error handling with fallbacks
- Detailed logging for debugging tool calls
- Type hints and docstrings throughout
- Modular design for easy extension

**Security**
- Sandboxed code execution prevents filesystem access
- Timeout limits prevent resource exhaustion
- Docker isolation for untrusted code
- No code execution without explicit user request

### Architecture - Tool Calling Flow

**Standard Mode with Tools:**
```
User (UI) → Relay → Cortex /simple
    ↓
Check STANDARD_MODE_ENABLE_TOOLS
    ↓
LLM generates tool call → FunctionCaller
    ↓
Execute tool (Docker sandbox / API call)
    ↓
Inject result → LLM (next iteration)
    ↓
Repeat until done or max iterations
    ↓
Return final answer → UI
```

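A condensed sketch of the agentic loop shown in the diagram above, with the documented 5-iteration cap and tool-result injection. The adapter and executor interfaces are illustrative placeholders, not the actual FunctionCaller API.

```python
# Condensed sketch of the loop in the diagram above; the adapter and
# executor interfaces are illustrative, not the FunctionCaller API.
MAX_ITERATIONS = 5  # documented cap to prevent runaway loops

async def run_with_tools(adapter, execute_tool, messages: list[dict]) -> str:
    for _ in range(MAX_ITERATIONS):
        reply = await adapter.chat(messages)          # LLM call (may request a tool)
        tool_call = adapter.parse_tool_call(reply)    # None when the model answers directly
        if tool_call is None:
            return reply["content"]                   # final answer for the UI
        result = await execute_tool(tool_call["name"], tool_call["arguments"])
        messages.append({"role": "assistant", "content": reply["content"]})
        messages.append({                             # inject tool output for the next pass
            "role": "tool",
            "name": tool_call["name"],
            "content": str(result),
        })
    return "Stopped after reaching the iteration limit."
```
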
**Thinking Stream Flow:**
```
Browser → nginx:8081 → thinking-stream.html
    ↓
EventSource connects to cortex:7081/stream/thinking/{session_id}
    ↓
ToolStreamManager.subscribe(session_id) → asyncio.Queue
    ↓
User sends message → /simple endpoint
    ↓
FunctionCaller emits events:
  - emit("thinking") → Queue → SSE → Browser
  - emit("tool_call") → Queue → SSE → Browser
  - emit("tool_result") → Queue → SSE → Browser
  - emit("done") → Queue → SSE → Browser
    ↓
Browser displays color-coded events in real-time
```

### Documentation

- **Added** [THINKING_STREAM.md](THINKING_STREAM.md) - Complete guide to the "Show Your Work" feature
  - Usage examples with curl
  - Event type reference
  - Architecture diagrams
  - Demo page instructions
- **Added** [UI_THINKING_STREAM.md](UI_THINKING_STREAM.md) - UI integration documentation
  - Button placement and styling
  - Popup window behavior
  - Session association logic

### Known Limitations

**Tool Calling:**
- Limited to 5 iterations per request (prevents runaway loops)
- Python sandbox has no filesystem persistence (temporary only)
- Web search requires a Tavily API key (the free tier is not unlimited)
- Trilium search requires a separate knowledge base setup

**Thinking Stream:**
- CORS wildcard (`*`) is development-only (should be restricted in production)
- Stream ends after the "done" event (must reconnect for a new request)
- No historical replay (only shows real-time events)
- Single session per stream window

### Migration Notes

**For Users Upgrading:**
1. New environment variable: `STANDARD_MODE_ENABLE_TOOLS=true` (default: enabled)
2. Thinking stream accessible via the "🧠 Show Work" button in the UI
3. Tool calling works automatically in Standard Mode when enabled
4. No changes required to existing Standard Mode usage

**For Developers:**
1. Cortex now includes CORS middleware for SSE
2. New `/stream/thinking/{session_id}` endpoint available
3. FunctionCaller requires a `session_id` parameter for streaming
4. Tool adapters can be extended by adding to the `AVAILABLE_TOOLS` registry

---

## [0.7.0] - 2025-12-21

### Added - Standard Mode & UI Enhancements

**Standard Mode Implementation**
- Added "Standard Mode" chat option that bypasses the complex cortex reasoning pipeline
- Provides simple chatbot functionality for coding and practical tasks
- Maintains full conversation context across messages
- Backend-agnostic - works with SECONDARY (Ollama), OPENAI, or custom backends
- Created `/simple` endpoint in Cortex router [cortex/router.py:389](cortex/router.py#L389)
- Mode selector in UI with toggle between Standard and Cortex modes
  - Standard Mode: Direct LLM chat with context retention
  - Cortex Mode: Full 7-stage reasoning pipeline (unchanged)

**Backend Selection System**
- UI settings modal with LLM backend selection for Standard Mode
  - Radio button selector: SECONDARY (Ollama/Qwen), OPENAI (GPT-4o-mini), or custom
  - Backend preference persisted in localStorage
  - Custom backend text input for advanced users
- Backend parameter routing through entire stack:
  - UI sends `backend` parameter in request body
  - Relay forwards backend selection to Cortex
  - Cortex `/simple` endpoint respects user's backend choice
- Environment-based fallback: uses `STANDARD_MODE_LLM` if no backend specified

**Session Management Overhaul**
- Complete rewrite of session system to use server-side persistence
  - File-based storage in `core/relay/sessions/` directory
  - Session files: `{sessionId}.json` for history, `{sessionId}.meta.json` for metadata
  - Server is source of truth - sessions sync across browsers and reboots
- Session metadata system for friendly names
  - Sessions display custom names instead of random IDs
  - Rename functionality in session dropdown
  - Last modified timestamps and message counts
- Full CRUD API for sessions in Relay:
  - `GET /sessions` - List all sessions with metadata
  - `GET /sessions/:id` - Retrieve session history
  - `POST /sessions/:id` - Save session history
  - `PATCH /sessions/:id/metadata` - Update session name/metadata
  - `DELETE /sessions/:id` - Delete session and metadata
- Session management UI in settings modal:
  - List of all sessions with message counts and timestamps
  - Delete button for each session with confirmation
  - Automatic session cleanup when deleting current session

**UI Improvements**
- Settings modal with hamburger menu (⚙ Settings button)
  - Backend selection section for Standard Mode
  - Session management section with delete functionality
  - Clean modal overlay with cyberpunk theme
  - ESC key and click-outside to close
- Light/Dark mode toggle with dark mode as default
  - Theme preference persisted in localStorage
  - CSS variables for seamless theme switching
  - Toggle button shows current mode (🌙 Dark Mode / ☀️ Light Mode)
- Removed redundant model selector dropdown from header
- Fixed modal positioning and z-index layering
  - Modal moved outside #chat container for proper rendering
  - Fixed z-index: overlay (999), modal content (1001)
  - Centered modal with proper backdrop blur

**Context Retention for Standard Mode**
- Integration with Intake module for conversation history
  - Added `get_recent_messages()` function in intake.py
  - Standard Mode retrieves last 20 messages from session buffer
  - Full context sent to LLM on each request
- Message array format support in LLM router:
  - Updated Ollama provider to accept `messages` parameter
  - Updated OpenAI provider to accept `messages` parameter
  - Automatic conversion from messages to prompt string for non-chat APIs

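A sketch of the context assembly behind Standard Mode as described above. `get_recent_messages()` and its import path are the documented intake helper; the handler and its `call_llm` parameter are illustrative.

```python
# Sketch of Standard Mode context assembly; get_recent_messages() is the
# documented intake helper, while the handler itself is illustrative.
from intake.intake import get_recent_messages  # returns [] for unknown sessions

async def simple_reply(session_id: str, user_message: str, call_llm,
                       backend: str | None = None) -> str:
    history = get_recent_messages(session_id, limit=20)               # last 20 buffered messages
    messages = history + [{"role": "user", "content": user_message}]  # full context each request
    # Backend falls back to the STANDARD_MODE_LLM setting when the UI sends none.
    return await call_llm(messages=messages, backend=backend)
```
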
### Changed - Architecture & Routing

**Relay Server Updates** [core/relay/server.js](core/relay/server.js)
- ES module migration for session persistence:
  - Imported `fs/promises`, `path`, `fileURLToPath` for file operations
  - Created `SESSIONS_DIR` constant for session storage location
- Mode-based routing in both `/chat` and `/v1/chat/completions` endpoints:
  - Extracts `mode` parameter from request body (default: "cortex")
  - Routes to `CORTEX_SIMPLE` for Standard Mode, `CORTEX_REASON` for Cortex Mode
  - Backend parameter only used in Standard Mode
- Session persistence functions:
  - `ensureSessionsDir()` - Creates sessions directory if needed
  - `loadSession(sessionId)` - Reads session history from file
  - `saveSession(sessionId, history, metadata)` - Writes session to file
  - `loadSessionMetadata(sessionId)` - Reads session metadata
  - `saveSessionMetadata(sessionId, metadata)` - Updates session metadata
  - `listSessions()` - Returns all sessions with metadata, sorted by last modified
  - `deleteSession(sessionId)` - Removes session and metadata files

**Cortex Router Updates** [cortex/router.py](cortex/router.py)
- Added `backend` field to `ReasonRequest` Pydantic model (optional)
- Created `/simple` endpoint for Standard Mode:
  - Bypasses reflection, reasoning, refinement stages
  - Direct LLM call with conversation context
  - Uses backend from request or falls back to `STANDARD_MODE_LLM` env variable
  - Returns simple response structure without reasoning artifacts
- Backend selection logic in `/simple`:
  - Normalizes backend names to uppercase
  - Maps UI backend names to system backend names
  - Validates backend availability before calling

**Intake Integration** [cortex/intake/intake.py](cortex/intake/intake.py)
- Added `get_recent_messages(session_id, limit)` function:
  - Retrieves last N messages from session buffer
  - Returns empty list if session doesn't exist
  - Used by `/simple` endpoint for context retrieval

**LLM Router Enhancements** [cortex/llm/llm_router.py](cortex/llm/llm_router.py)
- Added `messages` parameter support across all providers
- Automatic message-to-prompt conversion for legacy APIs
- Chat completion format for Ollama and OpenAI providers
- Stop sequences for MI50/DeepSeek R1 to prevent runaway generation:
  - `"User:"`, `"\nUser:"`, `"Assistant:"`, `"\n\n\n"`

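An illustrative helper for the messages-to-prompt conversion mentioned above. The function name and prompt layout are assumptions; the stop sequences are the documented ones for MI50/DeepSeek R1.

```python
# Illustrative messages-to-prompt conversion for non-chat completion APIs;
# the function name and layout are assumptions, the stop sequences are the
# documented ones.
STOP_SEQUENCES = ["User:", "\nUser:", "Assistant:", "\n\n\n"]

def messages_to_prompt(messages: list[dict]) -> str:
    """Flatten chat messages into a single prompt string for legacy APIs."""
    lines = []
    for msg in messages:
        role = msg.get("role", "user").capitalize()
        lines.append(f"{role}: {msg.get('content', '')}")
    lines.append("Assistant:")  # cue the model to answer as the assistant
    return "\n".join(lines)
```
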
**Environment Configuration** [.env](.env)
- Added `STANDARD_MODE_LLM=SECONDARY` for default Standard Mode backend
- Added `CORTEX_SIMPLE_URL=http://cortex:7081/simple` for routing

**UI Architecture** [core/ui/index.html](core/ui/index.html)
- Server-based session loading system:
  - `loadSessionsFromServer()` - Fetches sessions from Relay API
  - `renderSessions()` - Populates session dropdown from server data
  - Session state synchronized with server on every change
- Backend selection persistence:
  - Loads saved backend from localStorage on page load
  - Includes backend parameter in request body when in Standard Mode
  - Settings modal pre-selects current backend choice
- Dark mode by default:
  - Checks localStorage for theme preference
  - Sets dark theme if no preference found
  - Toggle button updates localStorage and applies theme

**CSS Styling** [core/ui/style.css](core/ui/style.css)
- Light mode CSS variables:
  - `--bg-dark: #f5f5f5` (light background)
  - `--text-main: #1a1a1a` (dark text)
  - `--text-fade: #666` (dimmed text)
- Dark mode CSS variables (default):
  - `--bg-dark: #0a0a0a` (dark background)
  - `--text-main: #e6e6e6` (light text)
  - `--text-fade: #999` (dimmed text)
- Modal positioning fixes:
  - `position: fixed` with `top: 50%`, `left: 50%`, `transform: translate(-50%, -50%)`
  - Z-index layering: overlay (999), content (1001)
  - Backdrop blur effect on modal overlay
- Session list styling:
  - Session item cards with hover effects
  - Delete button with red hover state
  - Message count and timestamp display

### Fixed - Critical Issues

**DeepSeek R1 Runaway Generation**
- Root cause: R1 reasoning model generates thinking process and hallucinates conversations
- Solution:
  - Changed `STANDARD_MODE_LLM` to SECONDARY (Ollama/Qwen) instead of PRIMARY (MI50/R1)
  - Added stop sequences to MI50 provider to prevent continuation
  - Documented R1 limitations for Standard Mode usage

**Context Not Maintained in Standard Mode**
- Root cause: `/simple` endpoint didn't retrieve conversation history from Intake
- Solution:
  - Created `get_recent_messages()` function in intake.py
  - Standard Mode now pulls last 20 messages from session buffer
  - Full context sent to LLM with each request
- User feedback: "it's saying it hasn't received any other messages from me, so it looks like the standard mode llm isn't getting the full chat"

**OpenAI Backend 400 Errors**
- Root cause: OpenAI provider only accepted prompt strings, not messages arrays
- Solution: Updated OpenAI provider to support messages parameter like Ollama
- Now handles chat completion format correctly

**Modal Formatting Issues**
- Root cause: Settings modal inside #chat container with overflow constraints
- Symptoms: Modal appearing at bottom, jumbled layout, couldn't close
- Solution:
  - Moved modal outside #chat container to be direct child of body
  - Changed positioning from absolute to fixed
  - Added proper z-index layering (overlay: 999, content: 1001)
  - Removed old model selector from header
- User feedback: "the formating for the settings is all off. Its at the bottom and all jumbling together, i cant get it to go away"

**Session Persistence Broken**
- Root cause: Sessions stored only in localStorage, not synced with server
- Symptoms: Sessions didn't persist across browsers or reboots, couldn't load messages
- Solution: Complete rewrite of session system
  - Implemented server-side file persistence in Relay
  - Created CRUD API endpoints for session management
  - Updated UI to load sessions from server instead of localStorage
  - Added metadata system for session names
  - Sessions now survive container restarts and sync across browsers
- User feedback: "sessions seem to exist locally only, i cant get them to actually load any messages and there is now way to delete them. If i open the ui in a different browser those arent there."

### Technical Improvements

**Backward Compatibility**
- All changes include defaults to maintain existing behavior
- Cortex Mode completely unchanged - still uses full 7-stage pipeline
- Standard Mode is opt-in via UI mode selector
- If no backend specified, falls back to `STANDARD_MODE_LLM` env variable
- Existing requests without mode parameter default to "cortex"

**Code Quality**
- Consistent async/await patterns throughout stack
- Proper error handling with fallbacks
- Clean separation between Standard and Cortex modes
- Session persistence abstracted into helper functions
- Modular UI code with clear event handlers

**Performance**
- Standard Mode bypasses 6 of 7 reasoning stages for faster responses
- Session loading optimized with file-based caching
- Backend selection happens once per message, not per LLM call
- Minimal overhead for mode detection and routing

### Architecture - Dual-Mode Chat System

**Standard Mode Flow:**
```
User (UI) → Relay → Cortex /simple → Intake (get_recent_messages)
  → LLM (direct call with context) → Relay → UI
```

**Cortex Mode Flow (Unchanged):**
```
User (UI) → Relay → Cortex /reason → Reflection → Reasoning
  → Refinement → Persona → Relay → UI
```

**Session Persistence:**
```
UI → POST /sessions/:id → Relay → File system (sessions/*.json)
UI → GET /sessions → Relay → List all sessions → UI dropdown
```

### Known Limitations

**Standard Mode:**
- No reflection, reasoning, or refinement stages
- No RAG integration (same as Cortex Mode - currently disabled)
- No NeoMem memory storage (same as Cortex Mode - currently disabled)
- DeepSeek R1 not recommended for Standard Mode (generates reasoning artifacts)

**Session Management:**
- Sessions stored in container filesystem - need volume mount for true persistence
- No session import/export functionality yet
- No session search or filtering

### Migration Notes

**For Users Upgrading:**
1. Existing sessions in localStorage will not automatically migrate to server
2. Create new sessions after upgrade for server-side persistence
3. Theme preference (light/dark) will be preserved from localStorage
4. Backend preference will default to SECONDARY if not previously set

**For Developers:**
1. Relay now requires `fs/promises` for session persistence
2. Cortex `/simple` endpoint expects `backend` parameter (optional)
3. UI sends `mode` and `backend` parameters in request body
4. Session files stored in `core/relay/sessions/` directory

---

## [0.6.0] - 2025-12-18

### Added - Autonomy System (Phase 1 & 2)

**Autonomy Phase 1** - Self-Awareness & Planning Foundation
- **Executive Planning Module** [cortex/autonomy/executive/planner.py](cortex/autonomy/executive/planner.py)
  - Autonomous goal setting and task planning capabilities
  - Multi-step reasoning for complex objectives
  - Integration with self-state tracking
- **Self-State Management** [cortex/data/self_state.json](cortex/data/self_state.json)
  - Persistent state tracking across sessions
  - Memory of past actions and outcomes
  - Self-awareness metadata storage
- **Self Analyzer** [cortex/autonomy/self/analyzer.py](cortex/autonomy/self/analyzer.py)
  - Analyzes own performance and decision patterns
  - Identifies areas for improvement
  - Tracks cognitive patterns over time
- **Test Suite** [cortex/tests/test_autonomy_phase1.py](cortex/tests/test_autonomy_phase1.py)
  - Unit tests for phase 1 autonomy features

**Autonomy Phase 2** - Decision Making & Proactive Behavior
- **Autonomous Actions Module** [cortex/autonomy/actions/autonomous_actions.py](cortex/autonomy/actions/autonomous_actions.py)
  - Self-initiated action execution
  - Context-aware decision implementation
  - Action logging and tracking
- **Pattern Learning System** [cortex/autonomy/learning/pattern_learner.py](cortex/autonomy/learning/pattern_learner.py)
  - Learns from interaction patterns
  - Identifies recurring user needs
  - Adapts behavior based on learned patterns
- **Proactive Monitor** [cortex/autonomy/proactive/monitor.py](cortex/autonomy/proactive/monitor.py)
  - Monitors system state for intervention opportunities
  - Detects patterns requiring proactive response
  - Background monitoring capabilities
- **Decision Engine** [cortex/autonomy/tools/decision_engine.py](cortex/autonomy/tools/decision_engine.py)
  - Autonomous decision-making framework
  - Weighs options and selects optimal actions
  - Integrates with orchestrator for coordinated decisions
- **Orchestrator** [cortex/autonomy/tools/orchestrator.py](cortex/autonomy/tools/orchestrator.py)
  - Coordinates multiple autonomy subsystems
  - Manages tool selection and execution
  - Handles NeoMem integration (with disable capability)
- **Test Suite** [cortex/tests/test_autonomy_phase2.py](cortex/tests/test_autonomy_phase2.py)
  - Unit tests for phase 2 autonomy features

**Autonomy Phase 2.5** - Pipeline Refinement
- Tightened integration between autonomy modules and reasoning pipeline
- Enhanced self-state persistence and tracking
- Improved orchestrator reliability
- NeoMem integration refinements in vector store handling [neomem/neomem/vector_stores/qdrant.py](neomem/neomem/vector_stores/qdrant.py)

### Added - Documentation

- **Complete AI Agent Breakdown** [docs/PROJECT_LYRA_COMPLETE_BREAKDOWN.md](docs/PROJECT_LYRA_COMPLETE_BREAKDOWN.md)
  - Comprehensive system architecture documentation
  - Detailed component descriptions
  - Data flow diagrams
  - Integration points and API specifications

### Changed - Core Integration

- **Router Updates** [cortex/router.py](cortex/router.py)
  - Integrated autonomy subsystems into main routing logic
  - Added endpoints for autonomous decision-making
  - Enhanced state management across requests
- **Reasoning Pipeline** [cortex/reasoning/reasoning.py](cortex/reasoning/reasoning.py)
  - Integrated autonomy-aware reasoning
  - Self-state consideration in reasoning process
- **Persona Layer** [cortex/persona/speak.py](cortex/persona/speak.py)
  - Autonomy-aware response generation
  - Self-state reflection in personality expression
- **Context Handling** [cortex/context.py](cortex/context.py)
  - NeoMem disable capability for flexible deployment

### Changed - Development Environment

- Updated [.gitignore](.gitignore) for better workspace management
- Cleaned up VSCode settings
- Removed [.vscode/settings.json](.vscode/settings.json) from repository

### Technical Improvements

- Modular autonomy architecture with clear separation of concerns
- Test-driven development for new autonomy features
- Enhanced state persistence across system restarts
- Flexible NeoMem integration with enable/disable controls

### Architecture - Autonomy System Design

The autonomy system operates in layers:
1. **Executive Layer** - High-level planning and goal setting
2. **Decision Layer** - Evaluates options and makes choices
3. **Action Layer** - Executes autonomous decisions
4. **Learning Layer** - Adapts behavior based on patterns
5. **Monitoring Layer** - Proactive awareness of system state

All layers coordinate through the orchestrator and maintain state in `self_state.json`.

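A minimal sketch of persisting state to `cortex/data/self_state.json` as described above. The file's actual schema is not documented in this entry, so no fields are assumed; only the load/save round-trip is shown.

```python
# Minimal sketch of self-state persistence; the real schema of
# self_state.json is not documented here, so none is assumed.
import json
from pathlib import Path

SELF_STATE_PATH = Path("cortex/data/self_state.json")

def load_self_state() -> dict:
    if SELF_STATE_PATH.exists():
        return json.loads(SELF_STATE_PATH.read_text())
    return {}

def save_self_state(state: dict) -> None:
    SELF_STATE_PATH.parent.mkdir(parents=True, exist_ok=True)
    SELF_STATE_PATH.write_text(json.dumps(state, indent=2))
```
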
---

## [0.5.2] - 2025-12-12

### Fixed - LLM Router & Async HTTP
- **Critical**: Replaced synchronous `requests` with async `httpx` in LLM router [cortex/llm/llm_router.py](cortex/llm/llm_router.py)
  - Event loop blocking was causing timeouts and empty responses
  - All three providers (MI50, Ollama, OpenAI) now use `await http_client.post()`
  - Fixes "Expecting value: line 1 column 1 (char 0)" JSON parsing errors in intake
- **Critical**: Fixed missing `backend` parameter in intake summarization [cortex/intake/intake.py:285](cortex/intake/intake.py#L285)
  - Was defaulting to PRIMARY (MI50) instead of respecting `INTAKE_LLM=SECONDARY`
  - Now correctly uses configured backend (Ollama on 3090)
- **Relay**: Fixed session ID case mismatch [core/relay/server.js:87](core/relay/server.js#L87)
  - UI sends `sessionId` (camelCase) but relay expected `session_id` (snake_case)
  - Now accepts both variants: `req.body.session_id || req.body.sessionId`
  - Custom session IDs now properly tracked instead of defaulting to "default"

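A sketch of the async provider-call pattern this release describes. The payload shape is illustrative; the 120-second timeout, the error classes, and the env variable names mirror the entries in this and nearby sections.

```python
# Sketch of the async provider call described above; payload shape is
# illustrative, the timeout and error handling mirror this release.
import json
import os
import httpx

async def call_ollama(prompt: str) -> str:
    url = os.getenv("LLM_SECONDARY_URL", "http://10.0.0.3:11434/api/generate")
    payload = {
        "model": os.getenv("LLM_SECONDARY_MODEL", ""),
        "prompt": prompt,
        "stream": False,
    }
    try:
        async with httpx.AsyncClient(timeout=120) as client:   # async: no event-loop blocking
            resp = await client.post(url, json=payload)
            resp.raise_for_status()
            return resp.json().get("response", "")
    except httpx.HTTPStatusError as exc:
        return f"[ollama] HTTP error: {exc}"
    except json.JSONDecodeError as exc:
        return f"[ollama] invalid JSON from backend: {exc}"
    except Exception as exc:
        return f"[ollama] {type(exc).__name__}: {exc}"
```
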
### Added - Error Handling & Diagnostics
- Added comprehensive error handling in LLM router for all providers
  - HTTPError, JSONDecodeError, KeyError, and generic Exception handling
  - Detailed error messages with exception type and description
  - Provider-specific error logging (mi50, ollama, openai)
- Added debug logging in intake summarization
  - Logs LLM response length and preview
  - Validates non-empty responses before JSON parsing
  - Helps diagnose empty or malformed responses

### Added - Session Management
- Added session persistence endpoints in relay [core/relay/server.js:160-171](core/relay/server.js#L160-L171)
  - `GET /sessions/:id` - Retrieve session history
  - `POST /sessions/:id` - Save session history
  - In-memory storage using Map (ephemeral, resets on container restart)
  - Fixes UI "Failed to load session" errors

### Changed - Provider Configuration
- Added `mi50` provider support for llama.cpp server [cortex/llm/llm_router.py:62-81](cortex/llm/llm_router.py#L62-L81)
  - Uses `/completion` endpoint with `n_predict` parameter
  - Extracts `content` field from response
  - Configured for MI50 GPU with DeepSeek model
- Increased memory retrieval threshold from 0.78 to 0.90 [cortex/.env:20](cortex/.env#L20)
  - Filters out low-relevance memories (only returns 90%+ similarity)
  - Reduces noise in context retrieval

### Technical Improvements
- Unified async HTTP handling across all LLM providers
- Better separation of concerns between provider implementations
- Improved error messages for debugging LLM API failures
- Consistent timeout handling (120 seconds for all providers)

---

## [0.5.1] - 2025-12-11

### Fixed - Intake Integration
- **Critical**: Fixed `bg_summarize()` "function not defined" error
  - Was only a `TYPE_CHECKING` stub, now implemented as a logging stub
  - Eliminated `NameError` preventing SESSIONS from persisting correctly
  - Function now logs exchange additions and defers summarization to `/reason` endpoint
- **Critical**: Fixed `/ingest` endpoint unreachable code in [router.py:201-233](cortex/router.py#L201-L233)
  - Removed early return that prevented `update_last_assistant_message()` from executing
  - Removed duplicate `add_exchange_internal()` call
  - Implemented lenient error handling (each operation wrapped in try/except)
- **Intake**: Added missing `__init__.py` to make intake a proper Python package [cortex/intake/__init__.py](cortex/intake/__init__.py)
  - Prevents namespace package issues
  - Enables proper module imports
  - Exports `SESSIONS`, `add_exchange_internal`, `summarize_context`

### Added - Diagnostics & Debugging
- Added diagnostic logging to verify SESSIONS singleton behavior
  - Module initialization logs SESSIONS object ID [intake.py:14](cortex/intake/intake.py#L14)
  - Each `add_exchange_internal()` call logs object ID and buffer state [intake.py:343-358](cortex/intake/intake.py#L343-L358)
- Added `/debug/sessions` HTTP endpoint [router.py:276-305](cortex/router.py#L276-L305)
  - Inspect SESSIONS from within the running Uvicorn worker
  - Shows total sessions, session count, buffer sizes, recent exchanges
  - Returns SESSIONS object ID for verification
- Added `/debug/summary` HTTP endpoint [router.py:238-271](cortex/router.py#L238-L271)
  - Test `summarize_context()` for any session
  - Returns L1/L5/L10/L20/L30 summaries
  - Includes buffer size and exchange preview

### Changed - Intake Architecture
- **Intake is no longer a standalone service** - it runs inside the Cortex container as a pure Python module
  - Imported as `from intake.intake import add_exchange_internal, SESSIONS`
  - No HTTP calls between Cortex and Intake
  - Eliminates network latency and dependency on the Intake service being up
- **Deferred summarization**: `bg_summarize()` is now a no-op stub [intake.py:318-325](cortex/intake/intake.py#L318-L325)
  - Actual summarization happens during `/reason` call via `summarize_context()`
  - Simplifies async/sync complexity
  - Prevents NameError when called from `add_exchange_internal()`
- **Lenient error handling**: `/ingest` endpoint always returns success [router.py:201-233](cortex/router.py#L201-L233)
  - Each operation wrapped in try/except
  - Logs errors but never fails, to avoid breaking the chat pipeline
  - User requirement: never fail the chat pipeline

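A sketch of the lenient `/ingest` pattern described above: every operation is wrapped so the endpoint always returns success. The handler itself and the import locations of the helpers are assumptions; the helper names are the ones this entry documents.

```python
# Sketch of the lenient /ingest pattern described above; handler and import
# paths are illustrative, the helper names come from this changelog entry.
import logging

from intake.intake import add_exchange_internal            # documented import path
from intake.intake import update_last_assistant_message    # exact module assumed

logger = logging.getLogger("cortex.ingest")

async def ingest(payload: dict) -> dict:
    session_id = payload.get("session_id", "default")
    try:
        add_exchange_internal(session_id, payload)
    except Exception as exc:
        logger.error("ingest: add_exchange_internal failed: %s", exc)
    try:
        update_last_assistant_message(session_id, payload.get("assistant", ""))
    except Exception as exc:
        logger.error("ingest: update_last_assistant_message failed: %s", exc)
    # Never fail the chat pipeline, per the requirement noted above.
    return {"status": "ok"}
```
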
### Documentation
- Added single-worker constraint note in [cortex/Dockerfile:7-8](cortex/Dockerfile#L7-L8)
  - Documents that SESSIONS requires single Uvicorn worker
  - Notes that multi-worker scaling requires Redis or shared storage
- Updated plan documentation with root cause analysis

---

## [0.5.0] - 2025-11-28

### Fixed - Critical API Wiring & Integration

After the major architectural rewire (v0.4.x), this release fixes all critical endpoint mismatches and ensures end-to-end system connectivity.

#### Cortex → Intake Integration
- **Fixed** `IntakeClient` to use correct Intake v0.2 API endpoints
  - Changed `GET /context/{session_id}` → `GET /summaries?session_id={session_id}`
  - Updated JSON response parsing to extract `summary_text` field
  - Fixed environment variable name: `INTAKE_API` → `INTAKE_API_URL`
  - Corrected default port: `7083` → `7080`
  - Added deprecation warning to `summarize_turn()` method (endpoint removed in Intake v0.2)

#### Relay → UI Compatibility
- **Added** OpenAI-compatible endpoint `POST /v1/chat/completions`
  - Accepts standard OpenAI format with `messages[]` array
  - Returns OpenAI-compatible response structure with `choices[]`
  - Extracts last message content from messages array
  - Includes usage metadata (stub values for compatibility)
- **Refactored** Relay to use shared `handleChatRequest()` function
  - Both `/chat` and `/v1/chat/completions` use same core logic
  - Eliminates code duplication
  - Consistent error handling across endpoints

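The response shape implied by the bullets above, shown as a Python dict purely for illustration (the actual relay is Node.js). Only the structure — `choices[]` plus stub `usage` metadata — is taken from the entry; field values are placeholders.

```python
# Illustration of the OpenAI-compatible response shape described above;
# the real relay is Node.js, and the values here are placeholders.
def to_openai_response(reply_text: str, model: str = "lyra-relay") -> dict:
    return {
        "object": "chat.completion",
        "model": model,
        "choices": [
            {
                "index": 0,
                "message": {"role": "assistant", "content": reply_text},
                "finish_reason": "stop",
            }
        ],
        "usage": {  # stub values kept only for client compatibility
            "prompt_tokens": 0,
            "completion_tokens": 0,
            "total_tokens": 0,
        },
    }
```
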
#### Relay → Intake Connection
- **Fixed** Intake URL fallback in Relay server configuration
  - Corrected port: `7082` → `7080`
  - Updated endpoint: `/summary` → `/add_exchange`
  - Now properly sends exchanges to Intake for summarization

#### Code Quality & Python Package Structure
- **Added** missing `__init__.py` files to all Cortex subdirectories
  - `cortex/llm/__init__.py`
  - `cortex/reasoning/__init__.py`
  - `cortex/persona/__init__.py`
  - `cortex/ingest/__init__.py`
  - `cortex/utils/__init__.py`
  - Improves package imports and IDE support
- **Removed** unused import in `cortex/router.py`: `from unittest import result`
- **Deleted** empty file `cortex/llm/resolve_llm_url.py` (was 0 bytes, never implemented)

### Verified Working

Complete end-to-end message flow now operational:
```
UI → Relay (/v1/chat/completions)
  ↓
Relay → Cortex (/reason)
  ↓
Cortex → Intake (/summaries) [retrieves context]
  ↓
Cortex 4-stage pipeline:
  1. reflection.py → meta-awareness notes
  2. reasoning.py → draft answer
  3. refine.py → polished answer
  4. persona/speak.py → Lyra personality
  ↓
Cortex → Relay (returns persona response)
  ↓
Relay → Intake (/add_exchange) [async summary]
  ↓
Intake → NeoMem (background memory storage)
  ↓
Relay → UI (final response)
```

### Documentation
- **Added** comprehensive v0.5.0 changelog entry
- **Updated** README.md to reflect v0.5.0 architecture
  - Documented new endpoints
  - Updated data flow diagrams
  - Clarified Intake v0.2 changes
  - Corrected service descriptions

### Issues Resolved
- ❌ Cortex could not retrieve context from Intake (wrong endpoint)
- ❌ UI could not send messages to Relay (endpoint mismatch)
- ❌ Relay could not send summaries to Intake (wrong port/endpoint)
- ❌ Python package imports were implicit (missing `__init__.py`)

### Known Issues (Non-Critical)
- Session management endpoints not implemented in Relay (`GET/POST /sessions/:id`)
- RAG service currently disabled in docker-compose.yml
- Cortex `/ingest` endpoint is a stub returning `{"status": "ok"}`

### Migration Notes
If upgrading from v0.4.x:
1. Pull latest changes from git
2. Verify environment variables in `.env` files:
   - Check `INTAKE_API_URL=http://intake:7080` (not `INTAKE_API`)
   - Verify all service URLs use correct ports
3. Restart Docker containers: `docker-compose down && docker-compose up -d`
4. Test with a simple message through the UI

---

## [Infrastructure v1.0.0] - 2025-11-26

### Changed - Environment Variable Consolidation

**Major reorganization to eliminate duplication and improve maintainability**

- Consolidated 9 scattered `.env` files into a single-source-of-truth architecture
- Root `.env` now contains all shared infrastructure (LLM backends, databases, API keys, service URLs)
- Service-specific `.env` files minimized to only essential overrides:
  - `cortex/.env`: Reduced from 42 to 22 lines (operational parameters only)
  - `neomem/.env`: Reduced from 26 to 14 lines (LLM naming conventions only)
  - `intake/.env`: Kept at 8 lines (already minimal)
- **Result**: ~24% reduction in total configuration lines (197 → ~150)

**Docker Compose Consolidation**
- All services now defined in single root `docker-compose.yml`
- Relay service updated with complete configuration (env_file, volumes)
- Removed redundant `core/docker-compose.yml` (marked as DEPRECATED)
- Standardized network communication to use Docker container names

**Service URL Standardization**
- Internal services use container names: `http://neomem-api:7077`, `http://cortex:7081`
- External services use IP addresses: `http://10.0.0.43:8000` (vLLM), `http://10.0.0.3:11434` (Ollama)
- Removed IP/container name inconsistencies across files

### Added - Security & Documentation

**Security Templates** - Created `.env.example` files for all services
- Root `.env.example` with sanitized credentials
- Service-specific templates: `cortex/.env.example`, `neomem/.env.example`, `intake/.env.example`, `rag/.env.example`
- All `.env.example` files safe to commit to version control

**Documentation**
- `ENVIRONMENT_VARIABLES.md`: Comprehensive reference for all environment variables
  - Variable descriptions, defaults, and usage examples
  - Multi-backend LLM strategy documentation
  - Troubleshooting guide
  - Security best practices
- `DEPRECATED_FILES.md`: Deletion guide for deprecated files with verification steps

**Enhanced .gitignore**
- Ignores all `.env` files (including subdirectories)
- Tracks `.env.example` templates for documentation
- Ignores `.env-backups/` directory

### Removed
- `core/.env` - Redundant with root `.env`, now deleted
- `core/docker-compose.yml` - Consolidated into main compose file (marked DEPRECATED)

### Fixed
- Eliminated duplicate `OPENAI_API_KEY` across 5+ files
- Eliminated duplicate LLM backend URLs across 4+ files
- Eliminated duplicate database credentials across 3+ files
- Resolved Cortex `environment:` section override in docker-compose (now uses env_file)

### Architecture - Multi-Backend LLM Strategy

Root `.env` provides all backend OPTIONS (PRIMARY, SECONDARY, CLOUD, FALLBACK); services choose which to USE:
- **Cortex** → vLLM (PRIMARY) for autonomous reasoning
- **NeoMem** → Ollama (SECONDARY) + OpenAI embeddings
- **Intake** → vLLM (PRIMARY) for summarization
- **Relay** → Fallback chain with user preference

This preserves per-service flexibility while eliminating URL duplication.

### Migration
- All original `.env` files backed up to `.env-backups/` with timestamp `20251126_025334`
- Rollback plan documented in `ENVIRONMENT_VARIABLES.md`
- Verification steps provided in `DEPRECATED_FILES.md`

---

## [0.4.x] - 2025-11-13

### Added - Multi-Stage Reasoning Pipeline

**Cortex v0.5 - Complete architectural overhaul**

- **New `reasoning.py` module**
  - Async reasoning engine
  - Accepts user prompt, identity, RAG block, and reflection notes
  - Produces draft internal answers
  - Uses primary backend (vLLM)

- **New `reflection.py` module**
  - Fully async meta-awareness layer
  - Produces actionable JSON "internal notes"
  - Enforces strict JSON schema and fallback parsing
  - Forces cloud backend (`backend_override="cloud"`)

- **Integrated `refine.py` into pipeline**
  - New stage between reflection and persona
  - Runs exclusively on primary vLLM backend (MI50)
  - Produces final, internally consistent output for downstream persona layer

- **Backend override system**
  - Each LLM call can now select its own backend
  - Enables multi-LLM cognition: Reflection → cloud, Reasoning → primary

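A sketch of per-call backend override routing as described above. The env variable names and full-URL behavior match this release's environment specification and router notes; the function itself is illustrative rather than the real `llm_router.py`.

```python
# Illustrative backend-override routing; env variable names match this
# release's spec, the function is not the actual llm_router.py.
import os
import httpx

BACKEND_URLS = {
    "primary": os.getenv("LLM_PRIMARY_URL"),   # e.g. http://10.0.0.43:8000/v1/completions (vLLM)
    "cloud": os.getenv("LLM_CLOUD_URL"),       # https://api.openai.com/v1/chat/completions
}

async def call_llm(prompt: str, backend_override: str | None = None) -> str:
    backend = backend_override or "primary"
    url = BACKEND_URLS[backend]                # full endpoint used as-is; no path appending
    if backend == "cloud":
        payload = {"model": os.getenv("LLM_CLOUD_MODEL", "gpt-4o-mini"),
                   "messages": [{"role": "user", "content": prompt}]}
        headers = {"Authorization": f"Bearer {os.getenv('OPENAI_API_KEY', '')}"}
    else:
        payload = {"model": os.getenv("LLM_PRIMARY_MODEL", ""), "prompt": prompt}
        headers = {}
    async with httpx.AsyncClient(timeout=120) as client:
        resp = await client.post(url, json=payload, headers=headers)
        resp.raise_for_status()
        data = resp.json()
    if backend == "cloud":
        return data["choices"][0]["message"]["content"]   # OpenAI chat completion shape
    return data["choices"][0]["text"]                      # vLLM completion shape
```
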
- **Identity loader**
  - Added `identity.py` with `load_identity()` for consistent persona retrieval

- **Ingest handler**
  - Async stub created for future Intake → NeoMem → RAG pipeline

**Cortex v0.4.1 - RAG Integration**

- **RAG integration**
  - Added `rag.py` with `query_rag()` and `format_rag_block()`
  - Cortex now queries local RAG API (`http://10.0.0.41:7090/rag/search`)
  - Synthesized answers and top excerpts injected into reasoning prompt

### Changed - Unified LLM Architecture

**Cortex v0.5**

- **Unified LLM backend URL handling across Cortex**
  - ENV variables must now contain FULL API endpoints
  - Removed all internal path-appending (e.g. `.../v1/completions`)
  - `llm_router.py` rewritten to use env-provided URLs as-is
  - Ensures consistent behavior between draft, reflection, refine, and persona

- **Rebuilt `main.py`**
  - Removed old annotation/analysis logic
  - New structure: load identity → get RAG → reflect → reason → return draft+notes
  - Routes now clean and minimal (`/reason`, `/ingest`, `/health`)
  - Async path throughout Cortex

- **Refactored `llm_router.py`**
  - Removed old fallback logic during overrides
  - OpenAI requests now use `/v1/chat/completions`
  - Added proper OpenAI Authorization headers
  - Distinct payload format for vLLM vs OpenAI
  - Unified, correct parsing across models

- **Simplified Cortex architecture**
  - Removed deprecated `context.py` and old reasoning code
  - Relay completely decoupled from smart behavior

- **Updated environment specification**
  - `LLM_PRIMARY_URL` now set to `http://10.0.0.43:8000/v1/completions`
  - `LLM_SECONDARY_URL` remains `http://10.0.0.3:11434/api/generate` (Ollama)
  - `LLM_CLOUD_URL` set to `https://api.openai.com/v1/chat/completions`

**Cortex v0.4.1**

- **Revised `/reason` endpoint**
  - Now builds unified context blocks: [Intake] → recent summaries, [RAG] → contextual knowledge, [User Message] → current input
  - Calls `call_llm()` for first pass, then `reflection_loop()` for meta-evaluation
  - Returns `cortex_prompt`, `draft_output`, `final_output`, and normalized reflection

- **Reflection Pipeline Stability**
  - Cleaned parsing to normalize JSON vs. text reflections
  - Added fallback handling for malformed or non-JSON outputs
  - Log system improved to show raw JSON, extracted fields, and normalized summary

- **Async Summarization (Intake v0.2.1)**
  - Intake summaries now run in background threads to avoid blocking Cortex
  - Summaries (L1–L∞) logged asynchronously with [BG] tags

- **Environment & Networking Fixes**
  - Verified `.env` variables propagate correctly inside Cortex container
  - Confirmed Docker network connectivity between Cortex, Intake, NeoMem, and RAG
  - Adjusted localhost calls to service-IP mapping

- **Behavioral Updates**
  - Cortex now performs conversation reflection (on user intent) and self-reflection (on its own answers)
  - RAG context successfully grounds reasoning outputs
  - Intake and NeoMem confirmed receiving summaries via `/add_exchange`
  - Log clarity pass: all reflective and contextual blocks clearly labeled

### Fixed

**Cortex v0.5**

- Resolved endpoint conflict where the router expected base URLs and refine expected full URLs
  - Fixed by standardizing full-URL behavior across the entire system
- Reflection layer no longer fails silently (previously returned `[""]` due to MythoMax)
- Resolved 404/401 errors caused by incorrect OpenAI URL endpoints
- No more double-routing through vLLM during reflection
- Corrected async/sync mismatch in multiple locations
- Eliminated double-path bug (`/v1/completions/v1/completions`) caused by previous router logic

### Removed

**Cortex v0.5**

- Legacy `annotate`, `reason_check` glue logic from old architecture
- Old backend probing junk code
- Stale imports and unused modules leftover from previous prototype

### Verified

**Cortex v0.5**

- Cortex → vLLM (MI50) → refine → final_output now functioning correctly
- Refine shows `used_primary_backend: true` and no fallback
- Manual curl test confirms endpoint accuracy

### Known Issues

**Cortex v0.5**

- Refine sometimes prefixes output with `"Final Answer:"`; next version will sanitize this
- Hallucinations in draft_output persist due to weak grounding (fix in reasoning + RAG planned)

**Cortex v0.4.1**

- NeoMem tuning needed - improve retrieval latency and relevance
- Need dedicated `/reflections/recent` endpoint for Cortex
- Migrate to Cortex-first ingestion (Relay → Cortex → NeoMem)
- Add persistent reflection recall (use prior reflections as meta-context)
- Improve reflection JSON structure ("insight", "evaluation", "next_action" → guaranteed fields)
- Tighten temperature and prompt control for factual consistency
- RAG optimization: add source ranking, filtering, multi-vector hybrid search
- Cache RAG responses per session to reduce duplicate calls

### Notes

**Cortex v0.5**

This is the largest structural change to Cortex so far. It establishes:
- Multi-model cognition
- Clean layering
- Identity + reflection separation
- Correct async code
- Deterministic backend routing
- Predictable JSON reflection

The system is now ready for:
- Refinement loops
- Persona-speaking layer
- Containerized RAG
- Long-term memory integration
- True emergent-behavior experiments

---

## [0.3.x] - 2025-10-28 to 2025-09-26

### Added

**[Lyra Core v0.3.2 + Web UI v0.2.0] - 2025-10-28**

- **New UI**
  - Cleaned up UI look and feel

- **Sessions**
  - Sessions now persist over time
  - Ability to create new sessions or load sessions from a previous instance
  - Changing session updates what the prompt sends to relay (doesn't prompt with messages from other sessions)
  - Relay correctly wired in

**[Lyra-Core 0.3.1] - 2025-10-09**

- **NVGRAM Integration (Full Pipeline Reconnected)**
  - Replaced legacy Mem0 service with NVGRAM microservice (`nvgram-api` @ port 7077)
  - Updated `server.js` in Relay to route all memory ops via `${NVGRAM_API}/memories` and `/search`
  - Added `.env` variable: `NVGRAM_API=http://nvgram-api:7077`
  - Verified end-to-end Lyra conversation persistence: `relay → nvgram-api → postgres/neo4j → relay → ollama → ui`
  - ✅ Memories stored, retrieved, and re-injected successfully

**[Lyra-Core v0.3.0] - 2025-09-26**

- **Salience filtering** in Relay
  - `.env` configurable: `SALIENCE_ENABLED`, `SALIENCE_MODE`, `SALIENCE_MODEL`, `SALIENCE_API_URL`
  - Supports `heuristic` and `llm` classification modes
  - LLM-based salience filter integrated with Cortex VM running `llama-server`
- Logging improvements
  - Added debug logs for salience mode, raw LLM output, and unexpected outputs
  - Fail-closed behavior for unexpected LLM responses
- Successfully tested with **Phi-3.5-mini** and **Qwen2-0.5B-Instruct** as salience classifiers
- Verified end-to-end flow: Relay → salience filter → Mem0 add/search → Persona injection → LLM reply

**[Cortex v0.3.0] - 2025-10-31**

- **Cortex Service (FastAPI)**
  - New standalone reasoning engine (`cortex/main.py`) with endpoints:
    - `GET /health` – reports active backend + NeoMem status
    - `POST /reason` – evaluates `{prompt, response}` pairs
    - `POST /annotate` – experimental text analysis
  - Background NeoMem health monitor (5-minute interval)

- **Multi-Backend Reasoning Support**
  - Environment-driven backend selection via `LLM_FORCE_BACKEND`
  - Supports: Primary (vLLM MI50), Secondary (Ollama 3090), Cloud (OpenAI), Fallback (llama.cpp CPU)
  - Per-backend model variables: `LLM_PRIMARY_MODEL`, `LLM_SECONDARY_MODEL`, `LLM_CLOUD_MODEL`, `LLM_FALLBACK_MODEL`

- **Response Normalization Layer**
  - Implemented `normalize_llm_response()` to merge streamed outputs and repair malformed JSON
  - Handles Ollama's multi-line streaming and Mythomax's missing punctuation issues
  - Prints concise debug previews of merged content

- **Environment Simplification**
  - Each service (`intake`, `cortex`, `neomem`) now maintains its own `.env` file
  - Removed reliance on shared/global env file to prevent cross-contamination
  - Verified Docker Compose networking across containers

**[NeoMem 0.1.2] - 2025-10-27** (formerly NVGRAM)

- **Renamed NVGRAM to NeoMem**
  - All future updates under the name NeoMem
  - Features unchanged

**[NVGRAM 0.1.1] - 2025-10-08**

- **Async Memory Rewrite (Stability + Safety Patch)**
  - Introduced `AsyncMemory` class with fully asynchronous vector and graph store writes
  - Added input sanitation to prevent embedding errors (`'list' object has no attribute 'replace'`)
  - Implemented `flatten_messages()` helper in API layer to clean malformed payloads
  - Added structured request logging via `RequestLoggingMiddleware` (FastAPI middleware)
  - Health endpoint (`/health`) returns structured JSON `{status, version, service}`
  - Startup logs include sanitized embedder config with masked API keys

**[NVGRAM 0.1.0] - 2025-10-07**

- **Initial fork of Mem0 → NVGRAM**
  - Created fully independent local-first memory engine based on Mem0 OSS
  - Renamed all internal modules, Docker services, environment variables from `mem0` → `nvgram`
  - New service name: `nvgram-api`, default port 7077
  - Maintains same API endpoints (`/memories`, `/search`) for drop-in compatibility
  - Uses FastAPI, Postgres, and Neo4j as persistent backends

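A sketch of how a relay-style client calls the `nvgram-api` endpoints named above. The base URL matches the documented `NVGRAM_API` setting; the payload fields are assumptions based on the Mem0-compatible API, not a verified schema, and the example is in Python although the actual relay is Node.js.

```python
# Sketch of calling the nvgram-api endpoints named above (/memories, /search);
# payload fields are assumptions based on the Mem0-compatible API.
import os
import httpx

NVGRAM_API = os.getenv("NVGRAM_API", "http://nvgram-api:7077")

async def add_memory(user_id: str, text: str) -> dict:
    async with httpx.AsyncClient(timeout=30) as client:
        resp = await client.post(
            f"{NVGRAM_API}/memories",
            json={"user_id": user_id, "messages": [{"role": "user", "content": text}]},
        )
        resp.raise_for_status()
        return resp.json()

async def search_memories(user_id: str, query: str) -> dict:
    async with httpx.AsyncClient(timeout=30) as client:
        resp = await client.post(
            f"{NVGRAM_API}/search",
            json={"user_id": user_id, "query": query},
        )
        resp.raise_for_status()
        return resp.json()
```
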
**[Lyra-Mem0 0.3.2] - 2025-10-05**

- **Ollama LLM reasoning** alongside OpenAI embeddings
  - Introduced `LLM_PROVIDER=ollama`, `LLM_MODEL`, and `OLLAMA_HOST` in `.env.3090`
  - Verified local 3090 setup using `qwen2.5:7b-instruct-q4_K_M`
  - Split processing: Embeddings → OpenAI `text-embedding-3-small`, LLM → Local Ollama
  - Added `.env.3090` template for self-hosted inference nodes
- Integrated runtime diagnostics and seeder progress tracking
  - File-level + message-level progress bars
  - Retry/back-off logic for timeouts (3 attempts; see the sketch after this list)
  - Event logging (`ADD / UPDATE / NONE`) for every memory record
- Expanded Docker health checks for Postgres, Qdrant, and Neo4j containers
- Added GPU-friendly long-run configuration for continuous seeding (validated on RTX 3090)

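A minimal sketch of retry/back-off logic of the kind described above (three attempts with growing delays); the function, delay values, HTTP client, and exception handling are illustrative assumptions rather than the actual seeder code.

```python
import time

import requests  # assumed HTTP client; the real seeder may use something else

def post_with_retry(url: str, payload: dict, attempts: int = 3, base_delay: float = 2.0) -> dict:
    """POST a payload, retrying on timeouts with simple exponential back-off."""
    for attempt in range(1, attempts + 1):
        try:
            resp = requests.post(url, json=payload, timeout=60)
            resp.raise_for_status()
            return resp.json()
        except (requests.Timeout, requests.ConnectionError):
            if attempt == attempts:
                raise
            # 2 s, 4 s, 8 s ... back-off between attempts
            time.sleep(base_delay * (2 ** (attempt - 1)))
```
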
**[Lyra-Mem0 0.3.1] - 2025-10-03**

- HuggingFace TEI integration (local 3090 embedder)
- Dual-mode environment switch between OpenAI cloud and local
- CSV export of memories from Postgres (`payload->>'data'`)

**[Lyra-Mem0 0.3.0]**

- **Ollama embeddings** in Mem0 OSS container
  - Configure `EMBEDDER_PROVIDER=ollama`, `EMBEDDER_MODEL`, `OLLAMA_HOST` via `.env`
  - Mounted `main.py` override from host into container to load custom `DEFAULT_CONFIG` (see the sketch after this list)
  - Installed `ollama` Python client into custom API container image
- `.env.3090` file for external embedding mode (3090 machine)
- Workflow for multiple embedding modes: LAN-based 3090/Ollama, Local-only CPU, OpenAI fallback

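A minimal sketch of an environment-driven `DEFAULT_CONFIG` block in the spirit of the override described above, switching the embedder between Ollama and OpenAI; the exact key names, defaults, and fallback model are assumptions based on Mem0-style config dicts, not a copy of the mounted `main.py`.

```python
import os

from dotenv import load_dotenv  # python-dotenv, as used elsewhere in the stack

load_dotenv()

EMBEDDER_PROVIDER = os.getenv("EMBEDDER_PROVIDER", "openai")

# Illustrative config shape; the real DEFAULT_CONFIG in the mounted main.py may differ.
DEFAULT_CONFIG = {
    "embedder": {
        "provider": EMBEDDER_PROVIDER,
        "config": (
            {"model": os.getenv("EMBEDDER_MODEL", "mxbai-embed-large"),
             "ollama_base_url": os.getenv("OLLAMA_HOST", "http://10.0.0.3:11434")}
            if EMBEDDER_PROVIDER == "ollama"
            else {"model": "text-embedding-3-small"}
        ),
    },
}
```
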
**[Lyra-Mem0 v0.2.1]**

- **Seeding pipeline**
  - Built Python seeder script to bulk-insert raw Cloud Lyra exports into Mem0
  - Implemented incremental seeding option (skip existing memories, only add new ones)
  - Verified insert process with Postgres-backed history DB

**[Intake v0.1.0] - 2025-10-27**

- Receives messages from Relay and summarizes them in a cascading format
- Continuously summarizes smaller batches of exchanges while also generating large-scale conversational summaries (L20)
- Currently logs summaries to a `.log` file in `/project-lyra/intake-logs/`

**[Lyra-Cortex v0.2.0] - 2025-09-26**

- Integrated **llama-server** on dedicated Cortex VM (Proxmox)
- Verified Phi-3.5-mini-instruct_Uncensored-Q4_K_M running with 8 vCPUs
- Benchmarked Phi-3.5-mini performance: ~18 tokens/sec CPU-only on Ryzen 7 7800X
- Salience classification functional but sometimes inconsistent
- Tested **Qwen2-0.5B-Instruct GGUF** as alternative salience classifier
  - Much faster throughput (~350 tokens/sec prompt, ~100 tokens/sec eval)
  - More responsive, but over-classifies messages as "salient"
- Established `.env` integration for model ID (`SALIENCE_MODEL`), enabling hot-swap between models

### Changed

**[Lyra-Core 0.3.1] - 2025-10-09**

- Renamed `MEM0_URL` → `NVGRAM_API` across all relay environment configs
- Updated Docker Compose service dependency order
  - `relay` now depends on `nvgram-api` healthcheck
  - Removed `mem0` references and volumes
- Minor cleanup to Persona fetch block (null-checks and safer default persona string)

**[Lyra-Core v0.3.1] - 2025-09-27**

- Removed salience filter logic; Cortex is now the default annotator
- All user messages stored in Mem0; no discard tier applied
- Cortex annotations (`metadata.cortex`) now attached to memories
- Debug logging improvements
  - Pretty-print Cortex annotations
  - Injected prompt preview
  - Memory search hit list with scores
- `.env` toggle (`CORTEX_ENABLED`) to bypass Cortex when needed

**[Lyra-Core v0.3.0] - 2025-09-26**

- Refactored `server.js` to gate `mem.add()` calls behind salience filter
- Updated `.env` to support `SALIENCE_MODEL`

**[Cortex v0.3.0] - 2025-10-31**

- Refactored `reason_check()` to dynamically switch between **prompt** and **chat** mode depending on backend
- Enhanced startup logs to announce active backend, model, URL, and mode
- Improved error handling with clearer "Reasoning error" messages

**[NVGRAM 0.1.1] - 2025-10-08**

- Replaced synchronous `Memory.add()` with async-safe version supporting concurrent vector + graph writes
- Normalized indentation and cleaned duplicate `main.py` references
- Removed redundant `FastAPI()` app reinitialization
- Updated internal logging to INFO-level timing format
- Deprecated `@app.on_event("startup")` → will migrate to `lifespan` handler in v0.1.2 (see the sketch after this list)

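For reference, the deprecated `@app.on_event("startup")` hook maps onto FastAPI's `lifespan` context manager roughly as follows; the startup/shutdown bodies here are placeholders, not NVGRAM's actual code.

```python
from contextlib import asynccontextmanager

from fastapi import FastAPI

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Startup work that previously lived in @app.on_event("startup"),
    # e.g. opening DB pools, warming the embedder, starting health monitors.
    yield
    # Shutdown work (closing pools, flushing logs) runs after the yield.

app = FastAPI(lifespan=lifespan)
```
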
**[NVGRAM 0.1.0] - 2025-10-07**

- Removed dependency on external `mem0ai` SDK — all logic now local
- Re-pinned requirements: `fastapi==0.115.8`, `uvicorn==0.34.0`, `pydantic==2.10.4`, `python-dotenv==1.0.1`, `psycopg>=3.2.8`, `ollama`
- Adjusted `docker-compose` and `.env` templates to use new NVGRAM naming

**[Lyra-Mem0 0.3.2] - 2025-10-05**

- Updated `main.py` configuration block to load `LLM_PROVIDER`, `LLM_MODEL`, `OLLAMA_BASE_URL`
- Fallback to OpenAI if Ollama unavailable
- Adjusted `docker-compose.yml` mount paths to correctly map `/app/main.py`
- Normalized `.env` loading so `mem0-api` and host environment share identical values
- Improved seeder logging and progress telemetry
- Added explicit `temperature` field to `DEFAULT_CONFIG['llm']['config']`

**[Lyra-Mem0 0.3.0]**

- `docker-compose.yml` updated to mount local `main.py` and `.env.3090`
- Built custom Dockerfile (`mem0-api-server:latest`) extending base image with `pip install ollama`
- Updated `requirements.txt` to include `ollama` package
- Adjusted Mem0 container config so `main.py` pulls environment variables with `dotenv`
- Tested new embeddings path with a curl `/memories` API call

**[Lyra-Mem0 v0.2.1]**

- Updated `main.py` to load configuration from `.env` using `dotenv` and support multiple embedder backends
- Mounted host `main.py` into container so local edits persist across rebuilds
- Updated `docker-compose.yml` to mount `.env.3090` and support swapping between profiles
- Built custom Dockerfile (`mem0-api-server:latest`) including `pip install ollama`
- Updated `requirements.txt` with `ollama` dependency
- Adjusted startup flow so the container automatically connects to the external Ollama host (LAN IP)
- Added logging to confirm model pulls and embedding requests

### Fixed

**[Lyra-Core 0.3.1] - 2025-10-09**

- Relay startup no longer crashes when NVGRAM is unavailable — deferred connection handling
- `/memories` POST failures no longer crash Relay; now logged gracefully as `relay error Error: memAdd failed: 500`
- Improved injected prompt debugging (`DEBUG_PROMPT=true` now prints clean JSON)

**[Lyra-Core v0.3.1] - 2025-09-27**

- Fixed parsing failures from Markdown-wrapped Cortex JSON via a fence cleaner (see the sketch below)
- Relay no longer "hangs" on malformed Cortex outputs

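A minimal sketch of the fence-cleaning idea, stripping Markdown code fences before parsing the annotation JSON; shown in Python for illustration only (Relay itself lives in `server.js`), with the helper name and regex assumed rather than taken from the codebase.

```python
import json
import re

# Matches an opening fence with optional language tag, or a trailing fence.
_FENCE_RE = re.compile(r"^`{3}[a-zA-Z]*\s*|\s*`{3}$")

def strip_code_fences(raw: str) -> str:
    """Remove Markdown code fences wrapping an LLM's JSON reply."""
    return _FENCE_RE.sub("", raw.strip())

# Example: a Cortex reply wrapped in a fenced ```json block.
fenced = "`" * 3 + "json\n" + '{"salient": true}\n' + "`" * 3
annotation = json.loads(strip_code_fences(fenced))
```
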
**[Cortex v0.3.0] - 2025-10-31**

- Corrected broken vLLM endpoint routing (`/v1/completions`)
- Stabilized cross-container health reporting for NeoMem
- Resolved JSON parse failures caused by streaming chunk delimiters (see the sketch below)

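A minimal sketch of merging newline-delimited streaming chunks (as Ollama emits them) into one reply before JSON parsing; this illustrates the failure mode and the fix, but it is not the actual `normalize_llm_response()` implementation.

```python
import json

def merge_stream_chunks(raw_stream: str) -> str:
    """Join newline-delimited JSON chunks into the full response text.

    Sketch only: assumes Ollama-style lines like {"response": "..."}; the real
    Cortex normalizer handles more formats and repairs malformed JSON.
    """
    parts = []
    for line in raw_stream.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            chunk = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip delimiter noise instead of failing the whole parse
        parts.append(chunk.get("response", ""))
    return "".join(parts)
```
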
**[NVGRAM 0.1.1] - 2025-10-08**

- Eliminated repeated 500 errors from the OpenAI embedder caused by non-string message content
- Masked API keys in boot logs to prevent leaks (see the sketch below)
- Ensured Neo4j reconnects gracefully on first retry

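A minimal sketch of API-key masking for startup logs of the kind described above; the masking format is an assumption, not the exact NVGRAM output.

```python
def mask_secret(value: str, visible: int = 4) -> str:
    """Show only the last few characters of a secret when logging config."""
    if not value:
        return ""
    return "*" * max(len(value) - visible, 0) + value[-visible:]

# Example: log a sanitized embedder config at startup.
print({"provider": "openai", "api_key": mask_secret("sk-abc123XYZ")})
```
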
**[Lyra-Mem0 0.3.2] - 2025-10-05**

- Resolved crash during startup: `TypeError: OpenAIConfig.__init__() got an unexpected keyword argument 'ollama_base_url'`
- Corrected mount type mismatch (file vs directory) causing `OCI runtime create failed` errors
- Prevented duplicate or partial postings when retry logic triggered multiple concurrent requests
- "Unknown event" warnings now safely ignored (no longer break the seeding loop)
- Confirmed full dual-provider operation in logs (`api.openai.com` + `10.0.0.3:11434/api/chat`)

**[Lyra-Mem0 0.3.1] - 2025-10-03**

- Fixed `.env` CRLF vs. LF line-ending issues
- Local seeding now possible via HuggingFace server

**[Lyra-Mem0 0.3.0]**

- Resolved container boot failure caused by missing `ollama` dependency (`ModuleNotFoundError`)
- Fixed config overwrite issue where rebuilding the container restored stock `main.py`
- Worked around Neo4j error (`vector.similarity.cosine(): mismatched vector dimensions`) by confirming OpenAI vs. Ollama embedding vector sizes

**[Lyra-Mem0 v0.2.1]**

- Seeder process originally failed on old memories — now skips duplicates and continues the batch
- Resolved container boot error (`ModuleNotFoundError: ollama`) by extending the image
- Fixed overwrite issue where stock `main.py` replaced custom config during rebuild
- Worked around Neo4j `vector.similarity.cosine()` dimension mismatch

### Known Issues

**[Lyra-Core v0.3.0] - 2025-09-26**

- Small models (e.g., Qwen2-0.5B) tend to over-classify messages as "salient"
- Phi-3.5-mini sometimes returns truncated tokens ("sali", "fi")
- CPU-only inference is functional but limited; larger models recommended once a GPU is available

**[Lyra-Cortex v0.2.0] - 2025-09-26**

- Small models tend to drift or over-classify
- CPU-only 7B+ models expected to be slow; GPU passthrough recommended for larger models
- Need to set up a `systemd` service so `llama-server` auto-starts on VM reboot

### Observations

**[Lyra-Mem0 0.3.2] - 2025-10-05**

- Stable GPU utilization: ~8 GB VRAM @ 92% load, ≈ 67°C under sustained seeding
- Next revision will re-format seed JSON to preserve `role` context (user vs assistant)

**[Lyra-Mem0 v0.2.1]**

- To fully unify embedding modes, a Hugging Face / local model with **1536-dim embeddings** will be needed (to match OpenAI's schema)
- Current Ollama model (`mxbai-embed-large`) works, but returns 1024-dim vectors (see the dimension check below)
- Seeder workflow validated, but should be wrapped in a repeatable weekly run for full Cloud→Local sync

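A minimal sketch of a dimension sanity check that would catch the 1024- vs. 1536-dim mismatch before vectors reach the graph/vector stores; the guard function and where it would be called are assumptions for illustration.

```python
def check_embedding_dim(vector: list[float], expected_dim: int = 1536) -> list[float]:
    """Refuse to store a vector whose dimensionality does not match the index.

    Illustrative guard only; 1536 matches OpenAI's text-embedding-3-small,
    while mxbai-embed-large returns 1024-dim vectors and would be rejected here.
    """
    if len(vector) != expected_dim:
        raise ValueError(
            f"Embedding has {len(vector)} dimensions, expected {expected_dim}; "
            "mixing models with different dimensions breaks cosine similarity."
        )
    return vector
```
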
### Next Steps

**[Lyra-Core 0.3.1] - 2025-10-09**

- Add salience visualization (e.g., memory weights displayed in the injected system message)
- Begin schema alignment with NVGRAM v0.1.2 for confidence scoring
- Add relay auto-retry for transient 500 responses from NVGRAM

**[NVGRAM 0.1.1] - 2025-10-08**

- Integrate salience scoring and embedding confidence weight fields into the Postgres schema
- Begin testing with the full Lyra Relay + Persona Sidecar pipeline for live session memory recall
- Migrate from the deprecated `on_event` hook to the `lifespan` pattern in 0.1.2

**[NVGRAM 0.1.0] - 2025-10-07**

- Integrate NVGRAM as the new default backend in Lyra Relay
- Deprecate remaining Mem0 references and archive old configs
- Begin versioning as a standalone project (`nvgram-core`, `nvgram-api`, etc.)

**[Intake v0.1.0] - 2025-10-27**

- Feed Intake output into NeoMem
- Generate daily/hourly overall summaries (e.g., "Today Brian and Lyra worked on x, y, and z")
- Generate session-aware summaries with their own intake hopper

---

## [0.2.x] - 2025-09-30 to 2025-09-24

### Added

**[Lyra-Mem0 v0.2.0] - 2025-09-30**

- Standalone **Lyra-Mem0** stack created at `~/lyra-mem0/`
  - Includes Postgres (pgvector), Qdrant, Neo4j, and SQLite for history tracking
- Added working `docker-compose.mem0.yml` and custom `Dockerfile` for building the Mem0 API server
- Verified REST API functionality
  - `POST /memories` works for adding memories
  - `POST /search` works for semantic search
- Successful end-to-end test with persisted memory: *"Likes coffee in the morning"* → retrievable via search ✅

**[Lyra-Core v0.2.0] - 2025-09-24**

- Migrated Relay to use the `mem0ai` SDK instead of raw fetch calls
- Implemented `sessionId` support (client-supplied, fallback to `default`)
- Added debug logs for memory add/search
- Cleaned up Relay structure for clarity

### Changed

**[Lyra-Mem0 v0.2.0] - 2025-09-30**

- Split architecture into modular stacks:
  - `~/lyra-core` (Relay, Persona-Sidecar, etc.)
  - `~/lyra-mem0` (Mem0 OSS memory stack)
- Removed old embedded mem0 containers from the Lyra-Core compose file
- Added a Lyra-Mem0 section to README.md

### Next Steps

**[Lyra-Mem0 v0.2.0] - 2025-09-30**

- Wire **Relay → Mem0 API** (integration not yet complete)
- Add integration tests to verify persistence and retrieval from within Lyra-Core

---

## [0.1.x] - 2025-09-25 to 2025-09-23

### Added

**[Lyra_RAG v0.1.0] - 2025-11-07**

- Initial standalone RAG module for Project Lyra
- Persistent ChromaDB vector store (`./chromadb`)
- Importer `rag_chat_import.py` with:
  - Recursive folder scanning and category tagging
  - Smart chunking (~5k chars)
  - SHA-1 deduplication and chat-ID metadata (see the sketch after this list)
  - Timestamp fields (`file_modified`, `imported_at`)
  - Background-safe operation (`nohup`/`tmux`)
- 68 Lyra-category chats imported:
  - 6,556 new chunks added
  - 1,493 duplicates skipped
  - 7,997 total vectors stored

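A minimal sketch of chunking plus SHA-1 deduplication in the spirit of `rag_chat_import.py`; the chunk size is taken from the entry above, while the function names and seen-hash bookkeeping are illustrative assumptions rather than the importer's actual logic.

```python
import hashlib

def chunk_text(text: str, max_chars: int = 5000) -> list[str]:
    """Split a chat export into ~5k-character chunks."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

def dedup_chunks(chunks: list[str], seen_hashes: set[str]) -> list[tuple[str, str]]:
    """Return (sha1, chunk) pairs for chunks not already imported."""
    fresh = []
    for chunk in chunks:
        digest = hashlib.sha1(chunk.encode("utf-8")).hexdigest()
        if digest in seen_hashes:
            continue  # duplicate: skip instead of re-embedding
        seen_hashes.add(digest)
        fresh.append((digest, chunk))
    return fresh
```
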
**[Lyra_RAG v0.1.0 API] - 2025-11-07**

- `/rag/search` FastAPI endpoint implemented (port 7090)
- Supports natural-language queries and returns top related excerpts (see the example call below)
- Added answer synthesis step using `gpt-4o-mini`

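A hedged example of calling the `/rag/search` endpoint; the request method, JSON field names (`query`, `top_k`), host, and response shape are assumptions for illustration only, since the entry above only documents the path and port.

```python
import requests  # assumed client; any HTTP library works

resp = requests.post(
    "http://localhost:7090/rag/search",  # port from the entry above; host assumed
    json={"query": "How did Lyra-Core v0.3.0 handle salience?", "top_k": 5},
    timeout=60,
)
resp.raise_for_status()
print(resp.json())  # expected: top excerpts plus a synthesized answer
```
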
**[Lyra-Core v0.1.0] - 2025-09-23**

- First working MVP of **Lyra Core Relay**
- Relay service accepts `POST /v1/chat/completions` (OpenAI-compatible)
- Memory integration with Mem0 (see the sketch after this list):
  - `POST /memories` on each user message
  - `POST /search` before each LLM call
- Persona Sidecar integration (`GET /current`)
- OpenAI GPT + Ollama (Mythomax) support in Relay
- Simple browser-based chat UI (talks to Relay at `http://<host>:7078`)
- `.env` standardization for Relay + Mem0 + Postgres + Neo4j
- Working Neo4j + Postgres backing stores for Mem0
- Initial MVP relay service with raw fetch calls to Mem0
- Dockerized with basic healthcheck

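A minimal sketch of the Mem0 add-then-search pattern the Relay uses, shown in Python for illustration (Relay itself performs these calls from Node.js); the base URL, payload fields, and `user_id` value are assumptions, not the exact request bodies.

```python
import requests  # illustrative; Relay issues these requests from server.js

MEM0_URL = "http://localhost:8000"  # assumed base URL for the Mem0 API

# Store the latest user message as a memory.
requests.post(
    f"{MEM0_URL}/memories",
    json={"messages": [{"role": "user", "content": "I like coffee in the morning"}],
          "user_id": "brian"},
    timeout=30,
).raise_for_status()

# Retrieve related memories before building the LLM prompt.
hits = requests.post(
    f"{MEM0_URL}/search",
    json={"query": "What does the user drink in the morning?", "user_id": "brian"},
    timeout=30,
)
hits.raise_for_status()
print(hits.json())
```
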
**[Lyra-Cortex v0.1.0] - 2025-09-25**

- First deployment as a dedicated Proxmox VM (5 vCPU / 18 GB RAM / 100 GB SSD)
- Built **llama.cpp** with the `llama-server` target via CMake
- Integrated the **Phi-3.5 Mini Instruct (Uncensored, Q4_K_M GGUF)** model
- Verified API compatibility at `/v1/chat/completions`
- Local test successful via `curl` → ~523-token response generated
- Performance benchmark: ~11.5 tokens/sec (CPU-only on Ryzen 7800X)
- Confirmed usable for salience scoring, summarization, and lightweight reasoning

### Fixed

**[Lyra-Core v0.1.0] - 2025-09-23**

- Resolved crash loop in Neo4j by restricting env vars (`NEO4J_AUTH` only)
- Relay now correctly reads `MEM0_URL` and `MEM0_API_KEY` from `.env`

### Verified

**[Lyra_RAG v0.1.0] - 2025-11-07**

- Successful recall of Lyra-Core development history (v0.3.0 snapshot)
- Correct metadata and category tagging for all new imports

### Known Issues

**[Lyra-Core v0.1.0] - 2025-09-23**

- No feedback loop (thumbs up/down) yet
- Forget/delete flow is manual (via memory IDs)
- Memory latency ~1–4s depending on the embedding model

### Next Planned

**[Lyra_RAG v0.1.0] - 2025-11-07**

- Optional `where` filter parameter for category/date queries
- Graceful "no results" handler for empty retrievals
- `rag_docs_import.py` for PDFs and other document types

---