# Cortex Logging Refactor Summary

## 🎯 Problem Statement

The cortex chat loop had severe logging issues that made debugging impossible:

1. **Massive verbosity**: 100+ log lines per chat message
2. **Raw LLM dumps**: full JSON responses pretty-printed on every call (thousands of lines)
3. **Repeated data**: NeoMem results logged 71 times, one line each
4. **No structure**: scattered emoji logs with no hierarchy
5. **Impossible to debug**: couldn't tell whether loops were actually happening or the logging was just verbose
6. **No loop protection**: unbounded message history growth, no session cleanup, no duplicate detection
## ✅ What Was Fixed

### 1. **Structured Hierarchical Logging**

**Before:**

```
🔍 RAW LLM RESPONSE: {
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "gpt-4",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Here is a very long response that goes on for hundreds of lines..."
      }
    }
  ],
  "usage": {
    "prompt_tokens": 123,
    "completion_tokens": 456,
    "total_tokens": 579
  }
}
🧠 Trying backend: PRIMARY (http://localhost:8000)
✅ Success via PRIMARY
[STAGE 0] Collecting unified context...
[STAGE 0] Context collected - 5 RAG results
[COLLECT_CONTEXT] Intake data retrieved:
{
  "L1": [...],
  "L5": [...],
  "L10": {...},
  "L20": {...},
  "L30": {...}
}
[COLLECT_CONTEXT] NeoMem search returned 71 results
  [1] Score: 0.923 - Memory content here...
  [2] Score: 0.891 - More memory content...
  [3] Score: 0.867 - Even more content...
  ... (68 more lines)
```
**After (summary mode - DEFAULT):**

```
✅ [LLM] PRIMARY | 14:23:45.123 | Reply: Based on your question about...
📊 Context | Session: abc123 | Messages: 42 | Last: 5.2min | RAG: 5 results
🧠 Monologue | question | Tone: curious
✨ PIPELINE COMPLETE | Session: abc123 | Total: 1250ms
📤 Output: 342 characters
```
**After (detailed mode - for debugging):**

```
════════════════════════════════════════════════════════════════════
🚀 PIPELINE START | Session: abc123 | 14:23:45.123
════════════════════════════════════════════════════════════════════
📝 User: What is the meaning of life?
────────────────────────────────────────────────────────────────────

────────────────────────────────────────────────────────────────────
🧠 LLM CALL | Backend: PRIMARY | 14:23:45.234
────────────────────────────────────────────────────────────────────
📝 Prompt: You are Lyra, a thoughtful AI assistant...
💬 Reply: Based on philosophical perspectives, the meaning...

📊 Context | Session: abc123 | Messages: 42 | Last: 5.2min | RAG: 5 results
────────────────────────────────────────────────────────────────────
[CONTEXT] Session abc123 | User: What is the meaning of life?
────────────────────────────────────────────────────────────────────
Mode: default | Mood: neutral | Project: None
Tools: RAG, WEB, WEATHER, CODEBRAIN, POKERBRAIN

╭─ INTAKE SUMMARIES ─────────────────────────────────────────────────
│  L1  : Last message discussed philosophy...
│  L5  : Recent 5 messages covered existential topics...
│  L10 : Past 10 messages showed curiosity pattern...
│  L20 : Session focused on deep questions...
│  L30 : Long-term trend shows philosophical interest...
╰────────────────────────────────────────────────────────────────────

╭─ RAG RESULTS (5) ──────────────────────────────────────────────────
│  [1] 0.923 | Previous discussion about purpose and meaning...
│  [2] 0.891 | Note about existential philosophy...
│  [3] 0.867 | Memory of Viktor Frankl discussion...
│  [4] 0.834 | Reference to stoic philosophy...
│  [5] 0.801 | Buddhism and the middle path...
╰────────────────────────────────────────────────────────────────────
────────────────────────────────────────────────────────────────────

════════════════════════════════════════════════════════════════════
✨ PIPELINE COMPLETE | Session: abc123 | Total: 1250ms
════════════════════════════════════════════════════════════════════
⏱️  Stage Timings:
   context     :    150ms ( 12.0%)
   identity    :     10ms (  0.8%)
   monologue   :    200ms ( 16.0%)
   tools       :      0ms (  0.0%)
   reflection  :     50ms (  4.0%)
   reasoning   :    450ms ( 36.0%)
   refinement  :    300ms ( 24.0%)
   persona     :    140ms ( 11.2%)
📤 Output: 342 characters
════════════════════════════════════════════════════════════════════
```
### 2. **Configurable Verbosity Levels**

Set via the `LOG_DETAIL_LEVEL` environment variable:

- **`minimal`**: only errors and critical events
- **`summary`**: stage completion + errors (DEFAULT - recommended for production)
- **`detailed`**: includes raw LLM outputs, RAG results, and timing breakdowns (for debugging)
- **`verbose`**: everything, including full JSON dumps (for deep debugging)
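A minimal sketch of how such a gate might work (the `LOG_DETAIL_LEVEL` variable is from this refactor, but the `should_log` helper and the level ordering below are illustrative assumptions, not the actual implementation):

```python
import os

# Ordered least → most verbose; comparing indices gives a cheap gate.
LEVELS = ["minimal", "summary", "detailed", "verbose"]

def should_log(required: str) -> bool:
    """True if the configured LOG_DETAIL_LEVEL is at least `required`."""
    current = os.getenv("LOG_DETAIL_LEVEL", "summary")
    if current not in LEVELS:
        current = "summary"  # fall back to the documented default
    return LEVELS.index(current) >= LEVELS.index(required)
```

With the default level, `should_log("summary")` is true while `should_log("detailed")` is false, so raw dumps never reach production logs.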
### 3. **Raw LLM Output Visibility** ✅

**You can now see raw LLM outputs clearly!**

In `detailed` or `verbose` mode, LLM calls show:

- Backend used
- Prompt preview
- Parsed reply
- **Raw JSON response in collapsible format** (verbose only)

```
╭─ RAW RESPONSE ─────────────────────────────────────────────────────
│ {
│   "id": "chatcmpl-123",
│   "object": "chat.completion",
│   "model": "gpt-4",
│   "choices": [
│     {
│       "message": {
│         "content": "Full response here..."
│       }
│     }
│   ]
│ }
╰────────────────────────────────────────────────────────────────────
```
### 4. **Loop Detection & Protection** ✅

**New safety features:**

- **Duplicate message detection**: prevents processing the same message twice
- **Message history trimming**: auto-trims to the last 100 messages (configurable via `MAX_MESSAGE_HISTORY`)
- **Session TTL**: auto-expires inactive sessions after 24 hours (configurable via `SESSION_TTL_HOURS`)
- **Hash-based detection**: uses an MD5 hash to detect exact duplicate messages

**Example warning when a loop is detected:**

```
⚠️ DUPLICATE MESSAGE DETECTED | Session: abc123 | Message: What is the meaning of life?
🔁 LOOP DETECTED - Returning cached context to prevent processing duplicate
```
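The protections above can be sketched roughly as follows (a minimal illustration assuming in-memory session state; the helper names `is_duplicate` and `trim_history` are invented here, not the actual `context.py` API):

```python
import hashlib

MAX_MESSAGE_HISTORY = 100          # mirrors the MAX_MESSAGE_HISTORY env var
_last_hash: dict[str, str] = {}    # per-session hash of the last message seen

def is_duplicate(session_id: str, message: str) -> bool:
    """True if this exact message was the last one processed for the session."""
    digest = hashlib.md5(message.encode("utf-8")).hexdigest()
    if _last_hash.get(session_id) == digest:
        return True
    _last_hash[session_id] = digest
    return False

def trim_history(history: list) -> list:
    """Keep only the most recent MAX_MESSAGE_HISTORY messages."""
    return history[-MAX_MESSAGE_HISTORY:]
```

MD5 is fine here because the hash is only used for exact-duplicate comparison, not for security.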
### 5. **Performance Timing** ✅

In `detailed` mode, see exactly where time is spent:

```
⏱️  Stage Timings:
   context     :    150ms ( 12.0%)  ← Context collection
   identity    :     10ms (  0.8%)  ← Identity loading
   monologue   :    200ms ( 16.0%)  ← Inner monologue
   tools       :      0ms (  0.0%)  ← Autonomous tools
   reflection  :     50ms (  4.0%)  ← Reflection notes
   reasoning   :    450ms ( 36.0%)  ← Main reasoning (BOTTLENECK)
   refinement  :    300ms ( 24.0%)  ← Answer refinement
   persona     :    140ms ( 11.2%)  ← Persona layer
```

**This helps you identify weak links in the chain!**
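Collecting per-stage timings like these can be sketched with a small context manager (illustrative only — the real collection lives in `router.py`, and these helper names are assumptions):

```python
import time
from contextlib import contextmanager

timings: dict[str, float] = {}

@contextmanager
def stage(name: str):
    """Record the wall-clock duration of a pipeline stage in milliseconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = (time.perf_counter() - start) * 1000

def report() -> str:
    """Render a breakdown in the style shown above."""
    total = sum(timings.values()) or 1.0
    return "\n".join(
        f"  {name:<11}: {ms:6.0f}ms ({ms / total * 100:5.1f}%)"
        for name, ms in timings.items()
    )

with stage("context"):
    time.sleep(0.01)  # stand-in for real work
```

The `finally` clause ensures a stage that raises still records its duration, so a failing stage shows up in the breakdown instead of vanishing.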
## 📁 Files Modified

### Core Changes

1. **[llm.js](core/relay/lib/llm.js)**
   - Removed the massive JSON dump on line 53
   - Added structured logging with 4 verbosity levels
   - Shows raw responses only in verbose mode (collapsible format)
   - Tracks failed backends and shows a summary on total failure

2. **[context.py](cortex/context.py)**
   - Condensed the 71-line NeoMem loop to a 5-line summary
   - Removed repeated intake data dumps
   - Added structured hierarchical logging with boxes
   - Added duplicate message detection
   - Added message history trimming
   - Added session TTL and cleanup

3. **[router.py](cortex/router.py)**
   - Replaced 15+ stage logs with a unified pipeline summary
   - Added stage timing collection
   - Shows a performance breakdown in detailed mode
   - Clean start/end markers with total duration

### New Files

4. **[utils/logging_utils.py](cortex/utils/logging_utils.py)** (NEW)
   - Reusable structured logging utilities
   - `PipelineLogger` class for hierarchical logging
   - Collapsible data sections
   - Stage timing tracking
   - Future-ready for expansion

5. **[.env.logging.example](.env.logging.example)** (NEW)
   - Complete logging configuration guide
   - Shows example output at each verbosity level
   - Documents all environment variables
   - Production-ready defaults

6. **[LOGGING_REFACTOR_SUMMARY.md](LOGGING_REFACTOR_SUMMARY.md)** (THIS FILE)
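As a rough illustration of the `PipelineLogger` idea (the class name comes from `logging_utils.py`, but the methods and box drawing below are guesses at the shape, not its real API):

```python
class PipelineLogger:
    """Hierarchical console logger with boxed data sections (sketch)."""

    WIDTH = 68

    def header(self, title: str) -> None:
        """Heavy rule above and below a pipeline start/end marker."""
        print("═" * self.WIDTH)
        print(title)
        print("═" * self.WIDTH)

    def box(self, title: str, lines: list) -> None:
        """Collapsible-style box like the RAG RESULTS sections above."""
        print(f"╭─ {title} " + "─" * max(0, self.WIDTH - len(title) - 4))
        for line in lines:
            print(f"│ {line}")
        print("╰" + "─" * (self.WIDTH - 1))

log = PipelineLogger()
log.box("RAG RESULTS (2)", [
    "[1] 0.923 | Previous discussion about purpose...",
    "[2] 0.891 | Note about existential philosophy...",
])
```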
## 🚀 How to Use

### For Finding Weak Links (Your Use Case)

```bash
# Set in your .env or export:
export LOG_DETAIL_LEVEL=detailed
export VERBOSE_DEBUG=false   # or true for even more detail

# Now run your chat - you'll see:
# 1. Which LLM backend is used
# 2. Raw LLM outputs (in verbose mode)
# 3. Exact timing per stage
# 4. Which stage is taking longest
```

### For Production

```bash
export LOG_DETAIL_LEVEL=summary

# Minimal, clean logs:
# ✅ [LLM] PRIMARY | 14:23:45.123 | Reply: Based on your question...
# ✨ PIPELINE COMPLETE | Session: abc123 | Total: 1250ms
```

### For Deep Debugging

```bash
export LOG_DETAIL_LEVEL=verbose
export LOG_RAW_CONTEXT_DATA=true

# Shows EVERYTHING including full JSON dumps
```
## 🔍 Finding Weak Links - Quick Guide

**Problem: "Which LLM stage is failing or producing bad output?"**

1. Set `LOG_DETAIL_LEVEL=detailed`
2. Run a test conversation
3. Look for timing anomalies:
   ```
   reasoning   :   3450ms ( 76.0%)  ← BOTTLENECK!
   ```
4. Look for errors:
   ```
   ⚠️ Reflection failed: Connection timeout
   ```
5. Check raw LLM outputs (set `VERBOSE_DEBUG=true`):
   ```
   ╭─ RAW RESPONSE ────────────────────────────────────
   │ {
   │   "choices": [
   │     { "message": { "content": "..." } }
   │   ]
   │ }
   ╰───────────────────────────────────────────────────
   ```

**Problem: "Is the loop repeating operations?"**

1. Duplicate detection is on by default
2. Look for loop warnings:
   ```
   ⚠️ DUPLICATE MESSAGE DETECTED | Session: abc123
   🔁 LOOP DETECTED - Returning cached context
   ```
3. Check stage timings - repeated stages will show up as duplicates

**Problem: "Which RAG memories are being used?"**

1. Set `LOG_DETAIL_LEVEL=detailed`
2. Look for the RAG results box:
   ```
   ╭─ RAG RESULTS (5) ──────────────────────────────
   │ [1] 0.923 | Previous discussion about X...
   │ [2] 0.891 | Note about Y...
   ╰────────────────────────────────────────────────
   ```
## 📊 Environment Variables Reference

| Variable | Default | Description |
|----------|---------|-------------|
| `LOG_DETAIL_LEVEL` | `summary` | Verbosity: minimal/summary/detailed/verbose |
| `VERBOSE_DEBUG` | `false` | Legacy flag for maximum verbosity |
| `LOG_RAW_CONTEXT_DATA` | `false` | Show full intake data dumps |
| `ENABLE_DUPLICATE_DETECTION` | `true` | Detect and prevent duplicate messages |
| `MAX_MESSAGE_HISTORY` | `100` | Max messages to keep per session |
| `SESSION_TTL_HOURS` | `24` | Auto-expire sessions after N hours |
## 🎉 Results

**Before:** 1000+ lines of logs per chat message - unreadable, with no way to identify issues

**After (summary mode):** 5 lines of structured logs, clear and actionable

**After (detailed mode):** ~50 lines with full visibility into each stage, its timing, and raw outputs

**Loop protection:** automatic detection and prevention of duplicate processing

**You can now:**

- ✅ See raw LLM outputs clearly (in detailed/verbose mode)
- ✅ Identify performance bottlenecks (stage timings)
- ✅ Detect loops and duplicates (automatic)
- ✅ Find failing stages (error markers)
- ✅ Scan logs quickly (hierarchical structure)
- ✅ Debug production issues (adjustable verbosity)
## 🔧 Next Steps (Optional Improvements)

1. **Structured JSON logging**: output as JSON for log aggregation tools
2. **Log rotation**: implement file rotation for verbose logs
3. **Metrics export**: export stage timings to Prometheus/Grafana
4. **Error categorization**: tag errors by type (network, timeout, parsing, etc.)
5. **Performance alerts**: auto-alert when stages exceed thresholds

---

**Happy debugging! You can now see what's actually happening in the cortex loop.** 🎯