project-lyra/THINKING_STREAM.md

# "Show Your Work" - Thinking Stream Feature

A real-time Server-Sent Events (SSE) stream that broadcasts Lyra's internal thinking process during tool-calling operations.

## What It Does

When Lyra uses tools to answer a question, you can now watch her "think" in real time through a parallel stream:

- 🤔 **Thinking** - When she's planning what to do
- 🔧 **Tool Calls** - When she decides to use a tool
- 📊 **Tool Results** - The results from tool execution
- ✅ **Done** - When she has the final answer
- ❌ **Errors** - If something goes wrong

## How To Use

### 1. Open the SSE Stream

Connect to the thinking stream for a session:

```bash
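# Replace your-session-id with your own ID. Literal {braces} in a URL would trigger curl's URL globbing.
curl -N http://localhost:7081/stream/thinking/your-session-id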
```

The stream will send Server-Sent Events in this format:

```
data: {"type": "thinking", "data": {"message": "🤔 Thinking... (iteration 1/5)"}}

data: {"type": "tool_call", "data": {"tool": "execute_code", "args": {...}, "message": "🔧 Using tool: execute_code"}}

data: {"type": "tool_result", "data": {"tool": "execute_code", "result": {...}, "message": "📊 Result: ..."}}

data: {"type": "done", "data": {"message": "✅ Complete!", "final_answer": "The result is..."}}
```
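To consume this format from a script, the `data:` lines have to be grouped into events and decoded as JSON. The following is a minimal sketch of that step, not the project's own client code: `parse_sse_events` is a hypothetical helper name, and the handling of `:` comment lines assumes the server may send keepalives as SSE comments, which the source does not specify.

```python
import json
from typing import Iterable, Iterator


def parse_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one decoded JSON payload per SSE event.

    An SSE event ends at a blank line; its payload is the concatenation of
    its `data:` lines. Lines starting with ':' are comments and are skipped.
    """
    data_parts: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # Blank line: dispatch the buffered event, if any.
            if data_parts:
                yield json.loads("\n".join(data_parts))
                data_parts = []
        elif line.startswith(":"):
            continue  # comment line (assumed keepalive format)
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format strips a single leading space after the colon.
            data_parts.append(value[1:] if value.startswith(" ") else value)
    if data_parts:
        # Tolerate a stream that ends without a trailing blank line.
        yield json.loads("\n".join(data_parts))


sample = [
    'data: {"type": "thinking", "data": {"message": "Thinking... (iteration 1/5)"}}\n',
    "\n",
    ": keepalive\n",
    'data: {"type": "done", "data": {"message": "Complete!", "final_answer": "25.0"}}\n',
    "\n",
]
events = list(parse_sse_events(sample))
print([event["type"] for event in events])  # ['thinking', 'done']
```

The parser takes any iterable of text lines, so it can be fed from a file, a socket wrapper, or a test fixture like `sample`.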

### 2. Send a Request

In parallel, send a request to `/simple` with the same `session_id`:

```bash
curl -X POST http://localhost:7081/simple \
  -H "Content-Type: application/json" \
  -d '{
    "session_id": "your-session-id",
    "user_prompt": "Calculate 50/2 using Python",
    "backend": "SECONDARY"
  }'
```

### 3. Watch the Stream

As the request is processed, the stream shows events for:
- Each thinking iteration
- Every tool call being made
- The results from each tool
- The final answer
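The three steps can also be scripted. The sketch below uses only the Python standard library and reuses the host, port, paths, and payload fields from the curl examples; the helper names (`stream_url`, `build_request`, `watch`, `run`) are hypothetical, and it assumes each event arrives as a single `data:` line of JSON. Nothing runs on import, so the server is only contacted when `run` is called.

```python
import json
import threading
import urllib.request

BASE_URL = "http://localhost:7081"  # host and port from the curl examples


def stream_url(session_id: str) -> str:
    return f"{BASE_URL}/stream/thinking/{session_id}"


def build_request(session_id: str, prompt: str,
                  backend: str = "SECONDARY") -> urllib.request.Request:
    """Build the POST /simple request shown in step 2."""
    body = json.dumps(
        {"session_id": session_id, "user_prompt": prompt, "backend": backend}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/simple",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def watch(session_id: str, ready: threading.Event) -> None:
    """Print stream events until a `done` or `error` event arrives."""
    with urllib.request.urlopen(stream_url(session_id)) as resp:
        ready.set()  # response headers received: the stream is open
        for raw in resp:
            line = raw.decode("utf-8").strip()
            if not line.startswith("data:"):
                continue  # skip blank separators and comment lines
            event = json.loads(line[len("data:"):])
            print(event.get("type"), event.get("data", {}).get("message", ""))
            if event.get("type") in ("done", "error"):
                break


def run(session_id: str, prompt: str) -> None:
    """Open the stream first, then send the request with the same session ID."""
    ready = threading.Event()
    watcher = threading.Thread(target=watch, args=(session_id, ready), daemon=True)
    watcher.start()
    ready.wait(timeout=5)  # connect to the stream BEFORE sending the request
    with urllib.request.urlopen(build_request(session_id, prompt)) as resp:
        print("response status:", resp.status)
    watcher.join(timeout=30)
```

With the server running, `run("your-session-id", "Calculate 50/2 using Python")` should print the events as they arrive, followed by the HTTP status of the `/simple` response.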

## Event Types

| Event Type | Description | Data Fields |
|-----------|-------------|-------------|
| `connected` | Initial connection | `session_id` |
| `thinking` | LLM is processing | `message` |
| `tool_call` | Tool is being invoked | `tool`, `args`, `message` |
| `tool_result` | Tool execution completed | `tool`, `result`, `message` |
| `done` | Process complete | `message`, `final_answer` |
| `error` | Something went wrong | `message` |
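A client can use this table to sanity-check incoming events. The sketch below encodes the table as a dictionary and reports which documented fields are missing; `REQUIRED_FIELDS` and `missing_fields` are hypothetical names, and it assumes every event type (including `connected`, whose wire format is not shown above) nests its fields under `data`.

```python
REQUIRED_FIELDS = {
    "connected": {"session_id"},
    "thinking": {"message"},
    "tool_call": {"tool", "args", "message"},
    "tool_result": {"tool", "result", "message"},
    "done": {"message", "final_answer"},
    "error": {"message"},
}


def missing_fields(event: dict) -> set[str]:
    """Return the documented fields absent from an event's `data` object.

    Raises ValueError for an event type that is not in the table.
    """
    event_type = event.get("type")
    if event_type not in REQUIRED_FIELDS:
        raise ValueError(f"unknown event type: {event_type!r}")
    return REQUIRED_FIELDS[event_type] - set(event.get("data", {}))


ok = {"type": "tool_call",
      "data": {"tool": "execute_code", "args": {}, "message": "Using tool: execute_code"}}
incomplete = {"type": "done", "data": {"message": "Complete!"}}

print(missing_fields(ok))          # set()
print(missing_fields(incomplete))  # {'final_answer'}
```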

## Demo Page

A demo HTML page is included at [test_thinking_stream.html](../test_thinking_stream.html):

```bash
# Serve the demo page (run from the directory that contains test_thinking_stream.html)
python3 -m http.server 8000
```

Then open http://localhost:8000/test_thinking_stream.html in your browser.

The demo shows:
- **Left panel**: Chat interface
- **Right panel**: Real-time thinking stream
- **Mobile**: Swipe between panels

## Architecture

### Components

1. **ToolStreamManager** (`autonomy/tools/stream_events.py`)
   - Manages SSE subscriptions per session
   - Broadcasts events to all connected clients
   - Handles automatic cleanup

2. **FunctionCaller** (`autonomy/tools/function_caller.py`)
   - Enhanced with event emission at each step
   - Checks for active subscribers before emitting
   - Passes `session_id` through the call chain

3. **SSE Endpoint** (`/stream/thinking/{session_id}`)
   - FastAPI streaming response
   - 30-second keepalive for connection maintenance
   - Automatic reconnection on client side
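To make the division of responsibilities concrete, here is a minimal sketch of how a per-session subscription manager and a keepalive-aware SSE generator could fit together using plain `asyncio`. It is not the code in `stream_events.py`: `StreamManagerSketch`, `sse_lines`, the one-queue-per-subscriber design, and the `: keepalive` comment line are all assumptions made for illustration.

```python
import asyncio
import json
from collections import defaultdict
from typing import AsyncIterator


class StreamManagerSketch:
    """Hypothetical stand-in for ToolStreamManager: one queue per subscriber."""

    def __init__(self) -> None:
        self._subscribers: dict[str, set[asyncio.Queue]] = defaultdict(set)

    def subscribe(self, session_id: str) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue()
        self._subscribers[session_id].add(queue)
        return queue

    def unsubscribe(self, session_id: str, queue: asyncio.Queue) -> None:
        self._subscribers[session_id].discard(queue)
        if not self._subscribers[session_id]:
            del self._subscribers[session_id]  # cleanup: drop empty sessions

    def has_subscribers(self, session_id: str) -> bool:
        return bool(self._subscribers.get(session_id))

    def emit(self, session_id: str, event_type: str, data: dict) -> int:
        """Broadcast to every subscriber of the session; return how many."""
        queues = self._subscribers.get(session_id, ())
        for queue in queues:
            queue.put_nowait({"type": event_type, "data": data})
        return len(queues)


def format_sse(event: dict) -> str:
    return f"data: {json.dumps(event)}\n\n"


async def sse_lines(manager: StreamManagerSketch, session_id: str,
                    keepalive: float = 30.0) -> AsyncIterator[str]:
    """Yield SSE-formatted strings; send a comment line when the stream is idle."""
    queue = manager.subscribe(session_id)
    try:
        yield format_sse({"type": "connected", "data": {"session_id": session_id}})
        while True:
            try:
                event = await asyncio.wait_for(queue.get(), timeout=keepalive)
            except asyncio.TimeoutError:
                yield ": keepalive\n\n"  # assumed keepalive format
                continue
            yield format_sse(event)
            if event["type"] in ("done", "error"):
                break
    finally:
        manager.unsubscribe(session_id, queue)  # runs when the stream ends


async def demo() -> list[str]:
    manager = StreamManagerSketch()
    stream = sse_lines(manager, "s1", keepalive=0.05)
    out = [await stream.__anext__()]      # "connected"; subscriber is registered
    out.append(await stream.__anext__())  # nothing queued, so a keepalive is sent
    manager.emit("s1", "done", {"message": "Complete!", "final_answer": "25.0"})
    async for chunk in stream:
        out.append(chunk)
    assert not manager.has_subscribers("s1")  # cleaned up after "done"
    return out


print(asyncio.run(demo()))
```

In a FastAPI app, an async generator like `sse_lines` could be passed to `StreamingResponse` with `media_type="text/event-stream"`. The `has_subscribers` method mirrors the check the FunctionCaller is described as making before it emits.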

### Event Flow

```
Client                 SSE Endpoint           FunctionCaller          Tools
  |                         |                         |                  |
  |--- Connect SSE -------->|                         |                  |
  |<-- connected -----------|                         |                  |
  |                         |                         |                  |
  |--- POST /simple --------|                         |                  |
  |                         |                         |                  |
  |                         |<-- emit("thinking") ----|                  |
  |<-- thinking ------------|                         |                  |
  |                         |                         |                  |
  |                         |<-- emit("tool_call") ---|                  |
  |<-- tool_call -----------|                         |                  |
  |                         |                         |-- execute ------>|
  |                         |                         |<-- result -------|
  |                         |<-- emit("tool_result")--|                  |
  |<-- tool_result ---------|                         |                  |
  |                         |                         |                  |
  |                         |<-- emit("done") --------|                  |
  |<-- done ----------------|                         |                  |
  |                         |                         |                  |
```
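The emission order in the diagram can be reproduced with a small simulation. This is a sketch under stated assumptions, not the actual `FunctionCaller`: `run_tool_loop` is a hypothetical name, the scripted `steps` list stands in for the LLM's decisions, and emitting an `error` event when the iteration limit is reached is an assumed behavior.

```python
from typing import Callable

Emit = Callable[[str, dict], None]


def run_tool_loop(steps: list[tuple[str, dict]],
                  tools: dict[str, Callable[..., object]],
                  emit: Emit, max_iterations: int = 5) -> object:
    """Replay scripted tool calls, emitting one event per stage.

    Each iteration starts with a "thinking" event. When no scripted step is
    left, the loop treats that as the model choosing to answer and emits "done".
    """
    result: object = None
    for i in range(1, max_iterations + 1):
        emit("thinking", {"message": f"Thinking... (iteration {i}/{max_iterations})"})
        if i > len(steps):
            emit("done", {"message": "Complete!", "final_answer": str(result)})
            return result
        tool, args = steps[i - 1]
        emit("tool_call", {"tool": tool, "args": args, "message": f"Using tool: {tool}"})
        try:
            result = tools[tool](**args)
        except Exception as exc:
            emit("error", {"message": str(exc)})
            raise
        emit("tool_result", {"tool": tool, "result": result, "message": f"Result: {result}"})
    # Assumed behavior: report an error if the limit is hit without an answer.
    emit("error", {"message": "iteration limit reached"})
    return result


log: list[str] = []
answer = run_tool_loop(
    steps=[("divide", {"a": 50, "b": 2})],
    tools={"divide": lambda a, b: a / b},
    emit=lambda kind, data: log.append(kind),
)
print(log)     # ['thinking', 'tool_call', 'tool_result', 'thinking', 'done']
print(answer)  # 25.0
```

Passing `emit` as a callback keeps the loop independent of the transport: the same function could push events to an SSE manager, a log file, or a test list as shown here.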

## Configuration

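No additional configuration is needed. Events are streamed automatically when both of these hold:
1. `STANDARD_MODE_ENABLE_TOOLS=true` (already set)
2. A client connects to the SSE stream BEFORE sending the request. Because the FunctionCaller checks for active subscribers before emitting, events produced while no client is connected are not delivered.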

## Example Output

```
🟢 Connected to thinking stream
✓ Connected (Session: thinking-demo-1735177234567)
🤔 Thinking... (iteration 1/5)
🔧 Using tool: execute_code
📊 Result: {'stdout': '12.0\n', 'stderr': '', 'exit_code': 0, 'execution_time': 0.04}
🤔 Thinking... (iteration 2/5)
✅ Complete!
```

## Use Cases

- **Debugging**: See exactly which tools are being called and why
- **Transparency**: Show users what the AI is doing behind the scenes
- **Education**: Learn how the system breaks down complex tasks
- **UI Enhancement**: Create engaging "thinking" animations
- **Mobile App**: Separate tab for "Show Your Work" view

## Future Enhancements

Potential additions:
- Token usage per iteration
- Estimated time remaining
- Tool execution duration
- Intermediate reasoning steps
- Visual progress indicators