Cortex debugging logs cleaned up

This commit is contained in:
serversdwn
2025-12-20 02:49:20 -05:00
parent 970907cf1b
commit 6bb800f5f8
10 changed files with 1452 additions and 242 deletions

352 LOGGING_REFACTOR_SUMMARY.md Normal file

@@ -0,0 +1,352 @@
# Cortex Logging Refactor Summary
## 🎯 Problem Statement
The cortex chat loop had severe logging issues that made debugging impossible:
1. **Massive verbosity**: 100+ log lines per chat message
2. **Raw LLM dumps**: Full JSON responses pretty-printed on every call (1000s of lines)
3. **Repeated data**: NeoMem results logged 71 times individually
4. **No structure**: Scattered emoji logs with no hierarchy
5. **Impossible to debug**: Couldn't tell whether the pipeline was actually looping or the logs were just verbose
6. **No loop protection**: Unbounded message history growth, no session cleanup, no duplicate detection
## ✅ What Was Fixed
### 1. **Structured Hierarchical Logging**
**Before:**
```
🔍 RAW LLM RESPONSE: {
"id": "chatcmpl-123",
"object": "chat.completion",
"created": 1234567890,
"model": "gpt-4",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Here is a very long response that goes on for hundreds of lines..."
}
}
],
"usage": {
"prompt_tokens": 123,
"completion_tokens": 456,
"total_tokens": 579
}
}
🧠 Trying backend: PRIMARY (http://localhost:8000)
✅ Success via PRIMARY
[STAGE 0] Collecting unified context...
[STAGE 0] Context collected - 5 RAG results
[COLLECT_CONTEXT] Intake data retrieved:
{
"L1": [...],
"L5": [...],
"L10": {...},
"L20": {...},
"L30": {...}
}
[COLLECT_CONTEXT] NeoMem search returned 71 results
[1] Score: 0.923 - Memory content here...
[2] Score: 0.891 - More memory content...
[3] Score: 0.867 - Even more content...
... (68 more lines)
```
**After (summary mode - DEFAULT):**
```
✅ [LLM] PRIMARY | 14:23:45.123 | Reply: Based on your question about...
📊 Context | Session: abc123 | Messages: 42 | Last: 5.2min | RAG: 5 results
🧠 Monologue | question | Tone: curious
✨ PIPELINE COMPLETE | Session: abc123 | Total: 1250ms
📤 Output: 342 characters
```
**After (detailed mode - for debugging):**
```
════════════════════════════════════════════════════════════════════════════════════════════════════
🚀 PIPELINE START | Session: abc123 | 14:23:45.123
════════════════════════════════════════════════════════════════════════════════════════════════════
📝 User: What is the meaning of life?
────────────────────────────────────────────────────────────────────────────────────────────────────
────────────────────────────────────────────────────────────────────────────────────────────────────
🧠 LLM CALL | Backend: PRIMARY | 14:23:45.234
────────────────────────────────────────────────────────────────────────────────────────────────────
📝 Prompt: You are Lyra, a thoughtful AI assistant...
💬 Reply: Based on philosophical perspectives, the meaning...
📊 Context | Session: abc123 | Messages: 42 | Last: 5.2min | RAG: 5 results
────────────────────────────────────────────────────────────────────────────────────────────────────
[CONTEXT] Session abc123 | User: What is the meaning of life?
────────────────────────────────────────────────────────────────────────────────────────────────────
Mode: default | Mood: neutral | Project: None
Tools: RAG, WEB, WEATHER, CODEBRAIN, POKERBRAIN
╭─ INTAKE SUMMARIES ────────────────────────────────────────────────
│ L1 : Last message discussed philosophy...
│ L5 : Recent 5 messages covered existential topics...
│ L10 : Past 10 messages showed curiosity pattern...
│ L20 : Session focused on deep questions...
│ L30 : Long-term trend shows philosophical interest...
╰───────────────────────────────────────────────────────────────────
╭─ RAG RESULTS (5) ──────────────────────────────────────────────
│ [1] 0.923 | Previous discussion about purpose and meaning...
│ [2] 0.891 | Note about existential philosophy...
│ [3] 0.867 | Memory of Viktor Frankl discussion...
│ [4] 0.834 | Reference to stoic philosophy...
│ [5] 0.801 | Buddhism and the middle path...
╰───────────────────────────────────────────────────────────────────
────────────────────────────────────────────────────────────────────────────────────────────────────
════════════════════════════════════════════════════════════════════════════════════════════════════
✨ PIPELINE COMPLETE | Session: abc123 | Total: 1250ms
════════════════════════════════════════════════════════════════════════════════════════════════════
⏱️ Stage Timings:
context : 150ms ( 12.0%)
identity : 10ms ( 0.8%)
monologue : 200ms ( 16.0%)
tools : 0ms ( 0.0%)
reflection : 50ms ( 4.0%)
reasoning : 450ms ( 36.0%)
refinement : 300ms ( 24.0%)
persona : 140ms ( 11.2%)
📤 Output: 342 characters
════════════════════════════════════════════════════════════════════════════════════════════════════
```
### 2. **Configurable Verbosity Levels**
Set via `LOG_DETAIL_LEVEL` environment variable:
- **`minimal`**: Only errors and critical events
- **`summary`**: Stage completion + errors (DEFAULT - recommended for production)
- **`detailed`**: Include raw LLM outputs, RAG results, timing breakdowns (for debugging)
- **`verbose`**: Everything including full JSON dumps (for deep debugging)
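The gate itself lives in the refactored modules; as a rough Python sketch of the idea (the `LEVELS` ordering and the `log_at` helper are illustrative names, not the actual API):
```python
import os

# Ordered quietest → loudest; a message is emitted if its level is
# at or below the currently configured level.
LEVELS = ["minimal", "summary", "detailed", "verbose"]

_current = os.getenv("LOG_DETAIL_LEVEL", "summary").lower()

def enabled(level: str) -> bool:
    """True if messages tagged `level` should appear at the current setting."""
    try:
        return LEVELS.index(level) <= LEVELS.index(_current)
    except ValueError:
        return True  # unknown tags fail open so nothing important is swallowed

def log_at(level: str, message: str) -> None:
    if enabled(level):
        print(message)

# Raw JSON dumps only appear at the loudest setting:
log_at("summary", "✅ [LLM] PRIMARY | Reply: ...")   # shown by default
log_at("verbose", "RAW RESPONSE: {...}")             # hidden by default
```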
### 3. **Raw LLM Output Visibility** ✅
**You can now see raw LLM outputs clearly!**
In `detailed` or `verbose` mode, LLM calls show:
- Backend used
- Prompt preview
- Parsed reply
- **Raw JSON response in collapsible format** (verbose only)
```
╭─ RAW RESPONSE ────────────────────────────────────────────────────────────────────────────
│ {
│ "id": "chatcmpl-123",
│ "object": "chat.completion",
│ "model": "gpt-4",
│ "choices": [
│ {
│ "message": {
│ "content": "Full response here..."
│ }
│ }
│ ]
│ }
╰───────────────────────────────────────────────────────────────────────────────────────────
```
### 4. **Loop Detection & Protection** ✅
**New safety features:**
- **Duplicate message detection**: Prevents processing the same message twice
- **Message history trimming**: Auto-trims to last 100 messages (configurable via `MAX_MESSAGE_HISTORY`)
- **Session TTL**: Auto-expires inactive sessions after 24 hours (configurable via `SESSION_TTL_HOURS`)
- **Hash-based detection**: Uses an MD5 hash of each message to detect exact duplicates (sketched after the example below)
**Example warning when loop detected:**
```
⚠️ DUPLICATE MESSAGE DETECTED | Session: abc123 | Message: What is the meaning of life?
🔁 LOOP DETECTED - Returning cached context to prevent processing duplicate
```
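To make the protection concrete, here is a minimal Python sketch of the approach under the documented settings. `SessionState` and its methods are illustrative names, not the actual `context.py` API:
```python
import hashlib
import os
import time

MAX_MESSAGE_HISTORY = int(os.getenv("MAX_MESSAGE_HISTORY", "100"))
SESSION_TTL_HOURS = float(os.getenv("SESSION_TTL_HOURS", "24"))

class SessionState:
    """Per-session state: message history plus loop-protection bookkeeping."""

    def __init__(self):
        self.messages = []
        self.last_hash = None   # MD5 digest of the previous message
        self.last_seen = time.time()

    def is_duplicate(self, message: str) -> bool:
        """Hash the incoming message and compare against the previous one."""
        digest = hashlib.md5(message.encode("utf-8")).hexdigest()
        if digest == self.last_hash:
            return True  # caller logs the ⚠️ warning and reuses cached context
        self.last_hash = digest
        return False

    def append(self, message: str) -> None:
        """Record a message, trimming history to the configured maximum."""
        self.messages.append(message)
        self.last_seen = time.time()
        if len(self.messages) > MAX_MESSAGE_HISTORY:
            self.messages = self.messages[-MAX_MESSAGE_HISTORY:]

    def expired(self) -> bool:
        """True once the session has been idle longer than the TTL."""
        return time.time() - self.last_seen > SESSION_TTL_HOURS * 3600
```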
### 5. **Performance Timing** ✅
In `detailed` mode, see exactly where time is spent:
```
⏱️ Stage Timings:
context : 150ms ( 12.0%) ← Context collection
identity : 10ms ( 0.8%) ← Identity loading
monologue : 200ms ( 16.0%) ← Inner monologue
tools : 0ms ( 0.0%) ← Autonomous tools
reflection : 50ms ( 4.0%) ← Reflection notes
reasoning : 450ms ( 36.0%) ← Main reasoning (BOTTLENECK)
refinement : 300ms ( 24.0%) ← Answer refinement
persona : 140ms ( 11.2%) ← Persona layer
```
**This helps you identify weak links in the chain!**
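As a rough sketch of how per-stage timings can be collected (`StageTimer` is an illustrative name, not the actual `router.py` code):
```python
import time
from contextlib import contextmanager

class StageTimer:
    """Record how long each pipeline stage takes, then print the breakdown."""

    def __init__(self):
        self.timings = {}  # stage name -> duration in ms

    @contextmanager
    def stage(self, name: str):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.timings[name] = (time.perf_counter() - start) * 1000

    def report(self) -> None:
        total = sum(self.timings.values()) or 1.0
        print("⏱️  Stage Timings:")
        for name, ms in self.timings.items():
            print(f"  {name:<12}: {ms:7.0f}ms ({ms / total:6.1%})")

timer = StageTimer()
with timer.stage("context"):
    time.sleep(0.05)   # stand-in for real work
with timer.stage("reasoning"):
    time.sleep(0.15)
timer.report()
```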
## 📁 Files Modified
### Core Changes
1. **[llm.js](core/relay/lib/llm.js)**
- Removed the massive JSON dump on line 53
- Added structured logging with 4 verbosity levels
- Shows raw responses only in verbose mode (collapsible format)
- Tracks failed backends and shows a summary when every backend fails
2. **[context.py](cortex/context.py)**
- Condensed 71-line NeoMem loop to 5-line summary
- Removed repeated intake data dumps
- Added structured hierarchical logging with boxes
- Added duplicate message detection
- Added message history trimming
- Added session TTL and cleanup
3. **[router.py](cortex/router.py)**
- Replaced 15+ stage logs with unified pipeline summary
- Added stage timing collection
- Shows performance breakdown in detailed mode
- Clean start/end markers with total duration
### New Files
4. **[utils/logging_utils.py](cortex/utils/logging_utils.py)** (NEW)
- Reusable structured logging utilities
- `PipelineLogger` class for hierarchical logging
- Collapsible data sections (the boxed blocks shown throughout this document; sketched after this list)
- Stage timing tracking
- Designed for reuse as new pipeline stages are added
5. **[.env.logging.example](.env.logging.example)** (NEW)
- Complete logging configuration guide
- Shows example output at each verbosity level
- Documents all environment variables
- Production-ready defaults
6. **[LOGGING_REFACTOR_SUMMARY.md](LOGGING_REFACTOR_SUMMARY.md)** (THIS FILE)
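The boxed blocks shown throughout this document (INTAKE SUMMARIES, RAG RESULTS, RAW RESPONSE) come from these utilities. A minimal sketch of the rendering idea, with `boxed_section` as a hypothetical name:
```python
def boxed_section(title: str, lines: list, width: int = 68) -> str:
    """Render a titled box like the RAG RESULTS blocks above."""
    top = f"╭─ {title} " + "─" * max(0, width - len(title) - 4)
    body = "\n".join(f"│ {line}" for line in lines)
    bottom = "╰" + "─" * (width - 1)
    return "\n".join([top, body, bottom])

print(boxed_section("RAG RESULTS (2)", [
    "[1] 0.923 | Previous discussion about purpose and meaning...",
    "[2] 0.891 | Note about existential philosophy...",
]))
```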
## 🚀 How to Use
### For Finding Weak Links (Your Use Case)
```bash
# Set in your .env or export:
export LOG_DETAIL_LEVEL=detailed
export VERBOSE_DEBUG=false # or true for even more detail
# Now run your chat - you'll see:
# 1. Which LLM backend is used
# 2. Raw LLM outputs (in verbose mode)
# 3. Exact timing per stage
# 4. Which stage is taking longest
```
### For Production
```bash
export LOG_DETAIL_LEVEL=summary
# Minimal, clean logs:
# ✅ [LLM] PRIMARY | 14:23:45.123 | Reply: Based on your question...
# ✨ PIPELINE COMPLETE | Session: abc123 | Total: 1250ms
```
### For Deep Debugging
```bash
export LOG_DETAIL_LEVEL=verbose
export LOG_RAW_CONTEXT_DATA=true
# Shows EVERYTHING including full JSON dumps
```
## 🔍 Finding Weak Links - Quick Guide
**Problem: "Which LLM stage is failing or producing bad output?"**
1. Set `LOG_DETAIL_LEVEL=detailed`
2. Run a test conversation
3. Look for timing anomalies:
```
reasoning : 3450ms ( 76.0%) ← BOTTLENECK!
```
4. Look for errors:
```
⚠️ Reflection failed: Connection timeout
```
5. Check raw LLM outputs (set `VERBOSE_DEBUG=true`):
```
╭─ RAW RESPONSE ────────────────────────────────────
│ {
│ "choices": [
│ { "message": { "content": "..." } }
│ ]
│ }
╰───────────────────────────────────────────────────
```
**Problem: "Is the loop repeating operations?"**
1. Enable duplicate detection (on by default)
2. Look for loop warnings:
```
⚠️ DUPLICATE MESSAGE DETECTED | Session: abc123
🔁 LOOP DETECTED - Returning cached context
```
3. Check stage timings - repeated stages will show up as duplicates
**Problem: "Which RAG memories are being used?"**
1. Set `LOG_DETAIL_LEVEL=detailed`
2. Look for RAG results box:
```
╭─ RAG RESULTS (5) ──────────────────────────────
│ [1] 0.923 | Previous discussion about X...
│ [2] 0.891 | Note about Y...
╰────────────────────────────────────────────────
```
## 📊 Environment Variables Reference
| Variable | Default | Description |
|----------|---------|-------------|
| `LOG_DETAIL_LEVEL` | `summary` | Verbosity: minimal/summary/detailed/verbose |
| `VERBOSE_DEBUG` | `false` | Legacy flag for maximum verbosity |
| `LOG_RAW_CONTEXT_DATA` | `false` | Show full intake data dumps |
| `ENABLE_DUPLICATE_DETECTION` | `true` | Detect and prevent duplicate messages |
| `MAX_MESSAGE_HISTORY` | `100` | Max messages to keep per session |
| `SESSION_TTL_HOURS` | `24` | Auto-expire sessions after N hours |
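As an illustration, these variables and their documented defaults could be loaded in one place (`LogConfig` is a hypothetical name, not the actual config object):
```python
import os
from dataclasses import dataclass

def _bool(name: str, default: str) -> bool:
    return os.getenv(name, default).lower() in ("1", "true", "yes")

@dataclass(frozen=True)
class LogConfig:
    """All documented logging knobs, read once from the environment."""
    detail_level: str = os.getenv("LOG_DETAIL_LEVEL", "summary")
    verbose_debug: bool = _bool("VERBOSE_DEBUG", "false")
    raw_context_data: bool = _bool("LOG_RAW_CONTEXT_DATA", "false")
    duplicate_detection: bool = _bool("ENABLE_DUPLICATE_DETECTION", "true")
    max_message_history: int = int(os.getenv("MAX_MESSAGE_HISTORY", "100"))
    session_ttl_hours: float = float(os.getenv("SESSION_TTL_HOURS", "24"))

config = LogConfig()
```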
## 🎉 Results
**Before:** 1000+ lines of logs per chat message, unreadable, couldn't identify issues
**After (summary mode):** 5 lines of structured logs, clear and actionable
**After (detailed mode):** ~50 lines with full visibility into each stage, timing, and raw outputs
**Loop protection:** Automatic detection and prevention of duplicate processing
**You can now:**
✅ See raw LLM outputs clearly (in detailed/verbose mode)
✅ Identify performance bottlenecks (stage timings)
✅ Detect loops and duplicates (automatic)
✅ Find failing stages (error markers)
✅ Scan logs quickly (hierarchical structure)
✅ Debug production issues (adjustable verbosity)
## 🔧 Next Steps (Optional Improvements)
1. **Structured JSON logging**: Output as JSON for log aggregation tools
2. **Log rotation**: Implement file rotation for verbose logs
3. **Metrics export**: Export stage timings to Prometheus/Grafana
4. **Error categorization**: Tag errors by type (network, timeout, parsing, etc.)
5. **Performance alerts**: Auto-alert when stages exceed thresholds
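For item 1, one possible shape is newline-delimited JSON; this sketch is illustrative only and not part of this commit (`log_json` is a hypothetical helper):
```python
import json
import time

def log_json(event: str, **fields) -> None:
    """Emit one JSON object per line, ready for log aggregation tools."""
    print(json.dumps({"ts": time.time(), "event": event, **fields}))

log_json("pipeline_complete", session="abc123", total_ms=1250)
```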
---
**Happy debugging! You can now see what's actually happening in the cortex loop.** 🎯