feat: Implement Trillium notes executor for searching and creating notes via ETAPI

- Added `trillium.py` for searching and creating notes with Trillium's ETAPI.
- Implemented `search_notes` and `create_note` functions with appropriate error handling and validation.

feat: Add web search functionality using DuckDuckGo

- Introduced `web_search.py` for performing web searches without API keys.
- Implemented `search_web` function with result handling and validation.

feat: Create provider-agnostic function caller for iterative tool calling

- Developed `function_caller.py` to manage LLM interactions with tools.
- Implemented iterative calling logic with error handling and tool execution.

feat: Establish a tool registry for managing available tools

- Created `registry.py` to define and manage tool availability and execution.
- Integrated feature flags for enabling/disabling tools based on environment variables.

feat: Implement event streaming for tool calling processes

- Added `stream_events.py` to manage Server-Sent Events (SSE) for tool calling.
- Enabled real-time updates during tool execution for enhanced user experience.

test: Add tests for tool calling system components

- Created `test_tools.py` to validate functionality of code execution, web search, and tool registry.
- Implemented asynchronous tests to ensure proper execution and result handling.

chore: Add Dockerfile for sandbox environment setup

- Created `Dockerfile` to set up a Python environment with necessary dependencies for code execution.

chore: Add debug regex script for testing XML parsing

- Introduced `debug_regex.py` to validate regex patterns against XML tool calls.

chore: Add HTML template for displaying thinking stream events

- Created `test_thinking_stream.html` for visualizing tool calling events in a user-friendly format.

test: Add tests for OllamaAdapter XML parsing

- Developed `test_ollama_parser.py` to validate XML parsing with various test cases, including malformed XML.
Author: serversdwn
Date: 2025-12-26 03:49:20 -05:00
Parent: f1471cde84
Commit: 64429b19e6
37 changed files with 3238 additions and 23 deletions


@@ -9,6 +9,226 @@ Format based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/) and [Se
---
## [0.8.0] - 2025-12-26
### Added - Tool Calling & "Show Your Work" Transparency Feature
**Tool Calling System (Standard Mode)**
- **Function Calling Infrastructure** [cortex/autonomy/tools/](cortex/autonomy/tools/)
- Implemented agentic tool calling for Standard Mode with autonomous multi-step execution
- Tool registry system with JSON schema definitions
- Adapter pattern for provider-agnostic tool calling (OpenAI, Ollama, llama.cpp)
- Maximum 5 iterations per request to prevent runaway loops
- **Available Tools**
- `execute_code` - Sandboxed Python/JavaScript/Bash execution via Docker
- `web_search` - Tavily API integration for real-time web queries
- `trillium_search` - Internal Trillium knowledge base queries
- **Provider Adapters** [cortex/autonomy/tools/adapters/](cortex/autonomy/tools/adapters/)
- `OpenAIAdapter` - Native function calling support
- `OllamaAdapter` - XML-based tool calling for local models
- `LlamaCppAdapter` - XML-based tool calling for llama.cpp backend
- Automatic tool call parsing and result formatting
- **Code Execution Sandbox** [cortex/autonomy/tools/code_executor.py](cortex/autonomy/tools/code_executor.py)
- Docker-based isolated execution environment
- Support for Python, JavaScript (Node.js), and Bash
- 30-second timeout with automatic cleanup
- Returns stdout, stderr, exit code, and execution time
- Prevents filesystem access outside sandbox
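A minimal sketch of the sandbox pattern described above, built around `docker run` with a 30-second timeout; the function name, image name, and container flags here are illustrative assumptions, not the actual `code_executor.py` API:
```python
import subprocess
import time

SANDBOX_IMAGE = "cortex-sandbox:latest"  # assumed name for the image built from the Dockerfile
TIMEOUT_SECONDS = 30

def run_sandboxed(code: str, language: str = "python") -> dict:
    """Run code in an isolated, network-less container and report the outcome."""
    interpreter = {
        "python": ["python3", "-c"],
        "javascript": ["node", "-e"],
        "bash": ["bash", "-c"],
    }[language]
    cmd = [
        "docker", "run", "--rm",
        "--network", "none",   # no outbound network from untrusted code
        "--read-only",         # no writes outside the tmpfs scratch space
        "--tmpfs", "/tmp",
        SANDBOX_IMAGE, *interpreter, code,
    ]
    start = time.monotonic()
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=TIMEOUT_SECONDS)
        return {
            "stdout": proc.stdout,
            "stderr": proc.stderr,
            "exit_code": proc.returncode,
            "execution_time": time.monotonic() - start,
        }
    except subprocess.TimeoutExpired:
        return {"stdout": "", "stderr": "execution timed out",
                "exit_code": -1, "execution_time": TIMEOUT_SECONDS}
```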
**"Show Your Work" - Real-Time Thinking Stream**
- **Server-Sent Events (SSE) Streaming** [cortex/router.py:478-527](cortex/router.py#L478-L527) (see the sketch after this list)
- New `/stream/thinking/{session_id}` endpoint for real-time event streaming
- Broadcasts internal thinking process during tool calling operations
- 30-second keepalive with automatic reconnection support
- Events: `connected`, `thinking`, `tool_call`, `tool_result`, `done`, `error`
- **Stream Manager** [cortex/autonomy/tools/stream_events.py](cortex/autonomy/tools/stream_events.py)
- Pub/sub system for managing SSE subscriptions per session
- Multiple clients can connect to same session stream
- Automatic cleanup of dead queues and closed connections
- Zero overhead when no subscribers active
- **FunctionCaller Integration** [cortex/autonomy/tools/function_caller.py](cortex/autonomy/tools/function_caller.py)
- Enhanced with event emission at each step:
- "thinking" events before each LLM call
- "tool_call" events when invoking tools
- "tool_result" events after tool execution
- "done" event with final answer
- "error" events on failures
- Session-aware streaming (only emits when subscribers exist)
- Provider-agnostic implementation works with all backends
- **Thinking Stream UI** [core/ui/thinking-stream.html](core/ui/thinking-stream.html)
- Dedicated popup window for real-time thinking visualization
- Color-coded events: green (thinking), orange (tool calls), blue (results), purple (done), red (errors)
- Auto-scrolling event feed with animations
- Connection status indicator with green/red dot
- Clear events button and session info display
- Mobile-friendly responsive design
- **UI Integration** [core/ui/index.html](core/ui/index.html)
- "🧠 Show Work" button in session selector
- Opens thinking stream in popup window
- Session ID passed via URL parameter for stream association
- Purple/violet button styling to match cyberpunk theme
**Tool Calling Configuration**
- **Environment Variables** [.env](.env)
- `STANDARD_MODE_ENABLE_TOOLS=true` - Enable/disable tool calling
- `TAVILY_API_KEY` - API key for web search tool
- `TRILLIUM_API_URL` - URL for Trillium knowledge base
- **Standard Mode Tools Toggle** [cortex/router.py:389-470](cortex/router.py#L389-L470)
- `/simple` endpoint checks `STANDARD_MODE_ENABLE_TOOLS` environment variable
- Falls back to non-tool mode if disabled
- Logs tool usage statistics (iterations, tools used)
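A sketch of the feature-flag check, assuming the flag is read with `os.getenv`; the helper names in the comment (`tools_enabled`, `plain_completion`) are hypothetical, not the real handler code:
```python
import os

def tools_enabled() -> bool:
    """Read STANDARD_MODE_ENABLE_TOOLS, defaulting to enabled."""
    return os.getenv("STANDARD_MODE_ENABLE_TOOLS", "true").lower() in ("1", "true", "yes")

# Inside the /simple handler (schematic):
#   if tools_enabled():
#       result = await function_caller.run(messages, session_id=session_id)
#   else:
#       result = await plain_completion(messages)  # graceful fallback to non-tool mode
```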
### Changed - CORS & Architecture
**CORS Support for SSE**
- **Added CORS Middleware** [cortex/main.py](cortex/main.py)
- FastAPI CORSMiddleware with wildcard origins for development
- Allows cross-origin SSE connections from nginx UI (port 8081) to cortex (port 7081)
- Credentials support enabled for authenticated requests
- All methods and headers permitted
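Roughly how the middleware registration described above looks with FastAPI's `CORSMiddleware`; the wildcard origin is a development-only setting (see Known Limitations):
```python
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],      # development only; restrict to the UI origin in production
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)
```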
**Tool Calling Pipeline**
- **Standard Mode Enhancement** [cortex/router.py:389-470](cortex/router.py#L389-L470)
- `/simple` endpoint now supports optional tool calling
- Multi-iteration agentic loop with LLM + tool execution
- Tool results injected back into conversation for next iteration
- Graceful degradation to non-tool mode if tools disabled
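A schematic of the multi-iteration loop; `llm_chat`, `parse_tool_calls`, and `execute_tool` are stand-ins for the adapter and registry calls in `function_caller.py`, not its real function names:
```python
MAX_ITERATIONS = 5  # hard cap to prevent runaway loops

async def agentic_loop(messages: list[dict]) -> str:
    for _ in range(MAX_ITERATIONS):
        reply = await llm_chat(messages)        # provider adapter call (stand-in)
        tool_calls = parse_tool_calls(reply)    # native JSON or XML, depending on adapter
        if not tool_calls:
            return reply["content"]             # model answered directly
        messages.append({"role": "assistant", "content": reply["content"]})
        for call in tool_calls:
            result = await execute_tool(call["name"], call["arguments"])
            # Inject the tool result so the next iteration can build on it.
            messages.append({"role": "tool", "name": call["name"], "content": result})
    return "Stopped after reaching the iteration limit."
```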
**JSON Response Formatting**
- **SSE Event Structure** [cortex/router.py:497-499](cortex/router.py#L497-L499)
- Fixed initial "connected" event to use proper JSON serialization
- Changed from f-string with nested quotes to `json.dumps()`
- Ensures valid JSON for all event types
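The fix in miniature (the "before" line is the one quoted in the Fixed section below):
```python
import json

session_id = "abc123"

# Before: single-quoted pseudo-JSON that JSON.parse in the browser cannot read
broken = f"data: {{'type': 'connected', 'session_id': '{session_id}'}}\n\n"

# After: json.dumps() produces valid JSON for every event type
fixed = f"data: {json.dumps({'type': 'connected', 'session_id': session_id})}\n\n"
```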
### Fixed - Critical JavaScript & SSE Issues
**JavaScript Variable Scoping Bug**
- **Root cause**: `eventSource` variable used before declaration in [thinking-stream.html:218](core/ui/thinking-stream.html#L218)
- **Symptom**: `Uncaught ReferenceError: can't access lexical declaration 'eventSource' before initialization`
- **Solution**: Moved variable declarations before `connectStream()` call
- **Impact**: Thinking stream page now loads without errors and establishes SSE connection
**SSE Connection Not Establishing**
- **Root cause**: CORS blocked cross-origin SSE requests from nginx (8081) to cortex (7081)
- **Symptom**: Browser silently blocked EventSource connection, no errors in console
- **Solution**: Added CORSMiddleware to cortex FastAPI app
- **Impact**: SSE streams now connect successfully across ports
**Invalid JSON in SSE Events**
- **Root cause**: Initial "connected" event used f-string with nested quotes: `f"data: {{'type': 'connected', 'session_id': '{session_id}'}}\n\n"`
- **Symptom**: Browser couldn't parse malformed JSON, connection appeared stuck on "Connecting..."
- **Solution**: Used `json.dumps()` for proper JSON serialization
- **Impact**: Connected event now parsed correctly, status updates to green dot
### Technical Improvements
**Agentic Architecture**
- Multi-iteration reasoning loop with tool execution
- Provider-agnostic tool calling via adapter pattern
- Automatic tool result injection into conversation context
- Iteration limits to prevent infinite loops
- Comprehensive logging at each step
**Event Streaming Performance**
- Zero overhead when no subscribers (check before emit)
- Efficient pub/sub with asyncio queues
- Automatic cleanup of disconnected clients
- 30-second keepalive prevents timeout issues
- Session-isolated streams prevent cross-talk
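A sketch of the check-before-emit idea, assuming each subscriber holds a bounded `asyncio.Queue`; attribute and method names are illustrative rather than the actual `stream_events.py` API:
```python
import asyncio

class ToolStreamManager:
    def __init__(self) -> None:
        self._subscribers: dict[str, list[asyncio.Queue]] = {}

    def has_subscribers(self, session_id: str) -> bool:
        return bool(self._subscribers.get(session_id))

    async def emit(self, session_id: str, event: dict) -> None:
        queues = self._subscribers.get(session_id)
        if not queues:
            return  # zero overhead: nothing is built or enqueued without listeners
        dead: list[asyncio.Queue] = []
        for queue in queues:
            try:
                queue.put_nowait(event)
            except asyncio.QueueFull:
                dead.append(queue)  # treat a stuck client as disconnected
        for queue in dead:
            queues.remove(queue)
```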
**Code Quality**
- Clean separation: tool execution, adapters, streaming, UI
- Comprehensive error handling with fallbacks
- Detailed logging for debugging tool calls
- Type hints and docstrings throughout
- Modular design for easy extension
**Security**
- Sandboxed code execution prevents filesystem access
- Timeout limits prevent resource exhaustion
- Docker isolation for untrusted code
- No code execution without explicit user request
### Architecture - Tool Calling Flow
**Standard Mode with Tools:**
```
User (UI) → Relay → Cortex /simple
        ↓
Check STANDARD_MODE_ENABLE_TOOLS
        ↓
LLM generates tool call → FunctionCaller
        ↓
Execute tool (Docker sandbox / API call)
        ↓
Inject result → LLM (next iteration)
        ↓
Repeat until done or max iterations
        ↓
Return final answer → UI
```
**Thinking Stream Flow:**
```
Browser → nginx:8081 → thinking-stream.html
        ↓
EventSource connects to cortex:7081/stream/thinking/{session_id}
        ↓
ToolStreamManager.subscribe(session_id) → asyncio.Queue
        ↓
User sends message → /simple endpoint
        ↓
FunctionCaller emits events:
    - emit("thinking")    → Queue → SSE → Browser
    - emit("tool_call")   → Queue → SSE → Browser
    - emit("tool_result") → Queue → SSE → Browser
    - emit("done")        → Queue → SSE → Browser
        ↓
Browser displays color-coded events in real-time
```
### Documentation
- **Added** [THINKING_STREAM.md](THINKING_STREAM.md) - Complete guide to "Show Your Work" feature
- Usage examples with curl
- Event type reference
- Architecture diagrams
- Demo page instructions
- **Added** [UI_THINKING_STREAM.md](UI_THINKING_STREAM.md) - UI integration documentation
- Button placement and styling
- Popup window behavior
- Session association logic
### Known Limitations
**Tool Calling:**
- Limited to 5 iterations per request (prevents runaway loops)
- Python sandbox has no filesystem persistence (temporary only)
- Web search requires a Tavily API key (the free tier is not unlimited)
- Trillium search requires separate knowledge base setup
**Thinking Stream:**
- CORS wildcard (`*`) is development-only (should be restricted in production)
- Stream ends after "done" event (must reconnect for new request)
- No historical replay (only shows real-time events)
- Single session per stream window
### Migration Notes
**For Users Upgrading:**
1. New environment variable: `STANDARD_MODE_ENABLE_TOOLS=true` (default: enabled)
2. Thinking stream accessible via "🧠 Show Work" button in UI
3. Tool calling works automatically in Standard Mode when enabled
4. No changes required to existing Standard Mode usage
**For Developers:**
1. Cortex now includes CORS middleware for SSE
2. New `/stream/thinking/{session_id}` endpoint available
3. FunctionCaller requires a `session_id` parameter for streaming
4. New tools can be added by registering them in the `AVAILABLE_TOOLS` registry
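An illustrative registry entry for a hypothetical `get_weather` tool; the actual structure of `AVAILABLE_TOOLS` in `cortex/autonomy/tools/registry.py` may differ:
```python
async def get_weather(city: str) -> str:
    """Hypothetical tool implementation used only for illustration."""
    return f"(weather lookup for {city} would go here)"

AVAILABLE_TOOLS = {
    "get_weather": {
        "description": "Look up current weather for a city",
        "parameters": {  # JSON schema for the tool's arguments
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
        "handler": get_weather,
        "enabled": True,  # could be gated by an env-var feature flag
    },
}
```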
---
## [0.7.0] - 2025-12-21
### Added - Standard Mode & UI Enhancements