feat: Implement Trillium notes executor for searching and creating notes via ETAPI

- Added `trillium.py` for searching and creating notes with Trillium's ETAPI.
- Implemented `search_notes` and `create_note` functions with appropriate error handling and validation.

feat: Add web search functionality using DuckDuckGo

- Introduced `web_search.py` for performing web searches without API keys.
- Implemented `search_web` function with result handling and validation.

feat: Create provider-agnostic function caller for iterative tool calling

- Developed `function_caller.py` to manage LLM interactions with tools.
- Implemented iterative calling logic with error handling and tool execution.

feat: Establish a tool registry for managing available tools

- Created `registry.py` to define and manage tool availability and execution.
- Integrated feature flags for enabling/disabling tools based on environment variables.

feat: Implement event streaming for tool calling processes

- Added `stream_events.py` to manage Server-Sent Events (SSE) for tool calling.
- Enabled real-time updates during tool execution for enhanced user experience.

test: Add tests for tool calling system components

- Created `test_tools.py` to validate functionality of code execution, web search, and tool registry.
- Implemented asynchronous tests to ensure proper execution and result handling.

chore: Add Dockerfile for sandbox environment setup

- Created `Dockerfile` to set up a Python environment with necessary dependencies for code execution.

chore: Add debug regex script for testing XML parsing

- Introduced `debug_regex.py` to validate regex patterns against XML tool calls.

chore: Add HTML template for displaying thinking stream events

- Created `test_thinking_stream.html` for visualizing tool calling events in a user-friendly format.

test: Add tests for OllamaAdapter XML parsing

- Developed `test_ollama_parser.py` to validate XML parsing with various test cases, including malformed XML.
Author: serversdwn
Date: 2025-12-26 03:49:20 -05:00
Parent: f1471cde84
Commit: 64429b19e6
37 changed files with 3238 additions and 23 deletions


@@ -9,6 +9,226 @@ Format based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/) and [Se
---
## [0.8.0] - 2025-12-26
### Added - Tool Calling & "Show Your Work" Transparency Feature
**Tool Calling System (Standard Mode)**
- **Function Calling Infrastructure** [cortex/autonomy/tools/](cortex/autonomy/tools/)
- Implemented agentic tool calling for Standard Mode with autonomous multi-step execution
- Tool registry system with JSON schema definitions
- Adapter pattern for provider-agnostic tool calling (OpenAI, Ollama, llama.cpp)
- Maximum 5 iterations per request to prevent runaway loops
- **Available Tools**
- `execute_code` - Sandboxed Python/JavaScript/Bash execution via Docker
- `web_search` - Tavily API integration for real-time web queries
- `trillium_search` - Internal Trillium knowledge base queries
- **Provider Adapters** [cortex/autonomy/tools/adapters/](cortex/autonomy/tools/adapters/)
- `OpenAIAdapter` - Native function calling support
- `OllamaAdapter` - XML-based tool calling for local models
- `LlamaCppAdapter` - XML-based tool calling for llama.cpp backend
- Automatic tool call parsing and result formatting
- **Code Execution Sandbox** [cortex/autonomy/tools/code_executor.py](cortex/autonomy/tools/code_executor.py)
- Docker-based isolated execution environment
- Support for Python, JavaScript (Node.js), and Bash
- 30-second timeout with automatic cleanup
- Returns stdout, stderr, exit code, and execution time
- Prevents filesystem access outside sandbox
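A minimal sketch of the sandbox pattern described above, built around `docker run` with a 30-second timeout; the function name, image name, and container flags here are illustrative assumptions, not the actual `code_executor.py` API:
```python
import subprocess
import time

SANDBOX_IMAGE = "cortex-sandbox:latest"  # assumed name for the image built from the Dockerfile
TIMEOUT_SECONDS = 30

def run_sandboxed(code: str, language: str = "python") -> dict:
    """Run code in an isolated, network-less container and report the outcome."""
    interpreter = {
        "python": ["python3", "-c"],
        "javascript": ["node", "-e"],
        "bash": ["bash", "-c"],
    }[language]
    cmd = [
        "docker", "run", "--rm",
        "--network", "none",   # no outbound network from untrusted code
        "--read-only",         # no writes outside the tmpfs scratch space
        "--tmpfs", "/tmp",
        SANDBOX_IMAGE, *interpreter, code,
    ]
    start = time.monotonic()
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=TIMEOUT_SECONDS)
        return {
            "stdout": proc.stdout,
            "stderr": proc.stderr,
            "exit_code": proc.returncode,
            "execution_time": time.monotonic() - start,
        }
    except subprocess.TimeoutExpired:
        return {"stdout": "", "stderr": "execution timed out",
                "exit_code": -1, "execution_time": TIMEOUT_SECONDS}
```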
**"Show Your Work" - Real-Time Thinking Stream**
- **Server-Sent Events (SSE) Streaming** [cortex/router.py:478-527](cortex/router.py#L478-L527) (see the sketch after this list)
- New `/stream/thinking/{session_id}` endpoint for real-time event streaming
- Broadcasts internal thinking process during tool calling operations
- 30-second keepalive with automatic reconnection support
- Events: `connected`, `thinking`, `tool_call`, `tool_result`, `done`, `error`
- **Stream Manager** [cortex/autonomy/tools/stream_events.py](cortex/autonomy/tools/stream_events.py)
- Pub/sub system for managing SSE subscriptions per session
- Multiple clients can connect to same session stream
- Automatic cleanup of dead queues and closed connections
- Zero overhead when no subscribers active
- **FunctionCaller Integration** [cortex/autonomy/tools/function_caller.py](cortex/autonomy/tools/function_caller.py)
- Enhanced with event emission at each step:
- "thinking" events before each LLM call
- "tool_call" events when invoking tools
- "tool_result" events after tool execution
- "done" event with final answer
- "error" events on failures
- Session-aware streaming (only emits when subscribers exist)
- Provider-agnostic implementation works with all backends
- **Thinking Stream UI** [core/ui/thinking-stream.html](core/ui/thinking-stream.html)
- Dedicated popup window for real-time thinking visualization
- Color-coded events: green (thinking), orange (tool calls), blue (results), purple (done), red (errors)
- Auto-scrolling event feed with animations
- Connection status indicator with green/red dot
- Clear events button and session info display
- Mobile-friendly responsive design
- **UI Integration** [core/ui/index.html](core/ui/index.html)
- "🧠 Show Work" button in session selector
- Opens thinking stream in popup window
- Session ID passed via URL parameter for stream association
- Purple/violet button styling to match cyberpunk theme
**Tool Calling Configuration**
- **Environment Variables** [.env](.env)
- `STANDARD_MODE_ENABLE_TOOLS=true` - Enable/disable tool calling
- `TAVILY_API_KEY` - API key for web search tool
- `TRILLIUM_API_URL` - URL for Trillium knowledge base
- **Standard Mode Tools Toggle** [cortex/router.py:389-470](cortex/router.py#L389-L470)
- `/simple` endpoint checks `STANDARD_MODE_ENABLE_TOOLS` environment variable
- Falls back to non-tool mode if disabled
- Logs tool usage statistics (iterations, tools used)
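A sketch of the feature-flag check, assuming the flag is read with `os.getenv`; the helper names in the comment (`tools_enabled`, `plain_completion`) are hypothetical, not the real handler code:
```python
import os

def tools_enabled() -> bool:
    """Read STANDARD_MODE_ENABLE_TOOLS, defaulting to enabled."""
    return os.getenv("STANDARD_MODE_ENABLE_TOOLS", "true").lower() in ("1", "true", "yes")

# Inside the /simple handler (schematic):
#   if tools_enabled():
#       result = await function_caller.run(messages, session_id=session_id)
#   else:
#       result = await plain_completion(messages)  # graceful fallback to non-tool mode
```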
### Changed - CORS & Architecture
**CORS Support for SSE**
- **Added CORS Middleware** [cortex/main.py](cortex/main.py)
- FastAPI CORSMiddleware with wildcard origins for development
- Allows cross-origin SSE connections from nginx UI (port 8081) to cortex (port 7081)
- Credentials support enabled for authenticated requests
- All methods and headers permitted
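Roughly how the middleware registration described above looks with FastAPI's `CORSMiddleware`; the wildcard origin is a development-only setting (see Known Limitations):
```python
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],      # development only; restrict to the UI origin in production
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)
```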
**Tool Calling Pipeline**
- **Standard Mode Enhancement** [cortex/router.py:389-470](cortex/router.py#L389-L470)
- `/simple` endpoint now supports optional tool calling
- Multi-iteration agentic loop with LLM + tool execution
- Tool results injected back into conversation for next iteration
- Graceful degradation to non-tool mode if tools disabled
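A schematic of the multi-iteration loop; `llm_chat`, `parse_tool_calls`, and `execute_tool` are stand-ins for the adapter and registry calls in `function_caller.py`, not its real function names:
```python
MAX_ITERATIONS = 5  # hard cap to prevent runaway loops

async def agentic_loop(messages: list[dict]) -> str:
    for _ in range(MAX_ITERATIONS):
        reply = await llm_chat(messages)        # provider adapter call (stand-in)
        tool_calls = parse_tool_calls(reply)    # native JSON or XML, depending on adapter
        if not tool_calls:
            return reply["content"]             # model answered directly
        messages.append({"role": "assistant", "content": reply["content"]})
        for call in tool_calls:
            result = await execute_tool(call["name"], call["arguments"])
            # Inject the tool result so the next iteration can build on it.
            messages.append({"role": "tool", "name": call["name"], "content": result})
    return "Stopped after reaching the iteration limit."
```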
**JSON Response Formatting**
- **SSE Event Structure** [cortex/router.py:497-499](cortex/router.py#L497-L499)
- Fixed initial "connected" event to use proper JSON serialization
- Changed from f-string with nested quotes to `json.dumps()`
- Ensures valid JSON for all event types
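The fix in miniature (the "before" line is the one quoted in the Fixed section below):
```python
import json

session_id = "abc123"

# Before: single-quoted pseudo-JSON that JSON.parse in the browser cannot read
broken = f"data: {{'type': 'connected', 'session_id': '{session_id}'}}\n\n"

# After: json.dumps() produces valid JSON for every event type
fixed = f"data: {json.dumps({'type': 'connected', 'session_id': session_id})}\n\n"
```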
### Fixed - Critical JavaScript & SSE Issues
**JavaScript Variable Scoping Bug**
- **Root cause**: `eventSource` variable used before declaration in [thinking-stream.html:218](core/ui/thinking-stream.html#L218)
- **Symptom**: `Uncaught ReferenceError: can't access lexical declaration 'eventSource' before initialization`
- **Solution**: Moved variable declarations before `connectStream()` call
- **Impact**: Thinking stream page now loads without errors and establishes SSE connection
**SSE Connection Not Establishing**
- **Root cause**: CORS blocked cross-origin SSE requests from nginx (8081) to cortex (7081)
- **Symptom**: Browser silently blocked EventSource connection, no errors in console
- **Solution**: Added CORSMiddleware to cortex FastAPI app
- **Impact**: SSE streams now connect successfully across ports
**Invalid JSON in SSE Events**
- **Root cause**: Initial "connected" event used f-string with nested quotes: `f"data: {{'type': 'connected', 'session_id': '{session_id}'}}\n\n"`
- **Symptom**: Browser couldn't parse malformed JSON, connection appeared stuck on "Connecting..."
- **Solution**: Used `json.dumps()` for proper JSON serialization
- **Impact**: Connected event now parsed correctly, status updates to green dot
### Technical Improvements
**Agentic Architecture**
- Multi-iteration reasoning loop with tool execution
- Provider-agnostic tool calling via adapter pattern
- Automatic tool result injection into conversation context
- Iteration limits to prevent infinite loops
- Comprehensive logging at each step
**Event Streaming Performance**
- Zero overhead when no subscribers (check before emit)
- Efficient pub/sub with asyncio queues
- Automatic cleanup of disconnected clients
- 30-second keepalive prevents timeout issues
- Session-isolated streams prevent cross-talk
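A sketch of the check-before-emit idea, assuming each subscriber holds a bounded `asyncio.Queue`; attribute and method names are illustrative rather than the actual `stream_events.py` API:
```python
import asyncio

class ToolStreamManager:
    def __init__(self) -> None:
        self._subscribers: dict[str, list[asyncio.Queue]] = {}

    def has_subscribers(self, session_id: str) -> bool:
        return bool(self._subscribers.get(session_id))

    async def emit(self, session_id: str, event: dict) -> None:
        queues = self._subscribers.get(session_id)
        if not queues:
            return  # zero overhead: nothing is built or enqueued without listeners
        dead: list[asyncio.Queue] = []
        for queue in queues:
            try:
                queue.put_nowait(event)
            except asyncio.QueueFull:
                dead.append(queue)  # treat a stuck client as disconnected
        for queue in dead:
            queues.remove(queue)
```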
**Code Quality**
- Clean separation: tool execution, adapters, streaming, UI
- Comprehensive error handling with fallbacks
- Detailed logging for debugging tool calls
- Type hints and docstrings throughout
- Modular design for easy extension
**Security**
- Sandboxed code execution prevents filesystem access
- Timeout limits prevent resource exhaustion
- Docker isolation for untrusted code
- No code execution without explicit user request
### Architecture - Tool Calling Flow
**Standard Mode with Tools:**
```
User (UI) → Relay → Cortex /simple
        ↓
Check STANDARD_MODE_ENABLE_TOOLS
        ↓
LLM generates tool call → FunctionCaller
        ↓
Execute tool (Docker sandbox / API call)
        ↓
Inject result → LLM (next iteration)
        ↓
Repeat until done or max iterations
        ↓
Return final answer → UI
```
**Thinking Stream Flow:**
```
Browser → nginx:8081 → thinking-stream.html
        ↓
EventSource connects to cortex:7081/stream/thinking/{session_id}
        ↓
ToolStreamManager.subscribe(session_id) → asyncio.Queue
        ↓
User sends message → /simple endpoint
        ↓
FunctionCaller emits events:
    - emit("thinking")    → Queue → SSE → Browser
    - emit("tool_call")   → Queue → SSE → Browser
    - emit("tool_result") → Queue → SSE → Browser
    - emit("done")        → Queue → SSE → Browser
        ↓
Browser displays color-coded events in real-time
```
### Documentation
- **Added** [THINKING_STREAM.md](THINKING_STREAM.md) - Complete guide to "Show Your Work" feature
- Usage examples with curl
- Event type reference
- Architecture diagrams
- Demo page instructions
- **Added** [UI_THINKING_STREAM.md](UI_THINKING_STREAM.md) - UI integration documentation
- Button placement and styling
- Popup window behavior
- Session association logic
### Known Limitations
**Tool Calling:**
- Limited to 5 iterations per request (prevents runaway loops)
- Python sandbox has no filesystem persistence (temporary only)
- Web search requires a Tavily API key (the free tier is not unlimited)
- Trillium search requires separate knowledge base setup
**Thinking Stream:**
- CORS wildcard (`*`) is development-only (should be restricted in production)
- Stream ends after "done" event (must reconnect for new request)
- No historical replay (only shows real-time events)
- Single session per stream window
### Migration Notes
**For Users Upgrading:**
1. New environment variable: `STANDARD_MODE_ENABLE_TOOLS=true` (default: enabled)
2. Thinking stream accessible via "🧠 Show Work" button in UI
3. Tool calling works automatically in Standard Mode when enabled
4. No changes required to existing Standard Mode usage
**For Developers:**
1. Cortex now includes CORS middleware for SSE
2. New `/stream/thinking/{session_id}` endpoint available
3. FunctionCaller requires a `session_id` parameter for streaming
4. New tools can be added by registering them in the `AVAILABLE_TOOLS` registry
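An illustrative registry entry for a hypothetical `get_weather` tool; the actual structure of `AVAILABLE_TOOLS` in `cortex/autonomy/tools/registry.py` may differ:
```python
async def get_weather(city: str) -> str:
    """Hypothetical tool implementation used only for illustration."""
    return f"(weather lookup for {city} would go here)"

AVAILABLE_TOOLS = {
    "get_weather": {
        "description": "Look up current weather for a city",
        "parameters": {  # JSON schema for the tool's arguments
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
        "handler": get_weather,
        "enabled": True,  # could be gated by an env-var feature flag
    },
}
```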
---
## [0.7.0] - 2025-12-21
### Added - Standard Mode & UI Enhancements