Cortex Logging Refactor Summary

🎯 Problem Statement

The cortex chat loop had severe logging issues that made debugging impossible:

  1. Massive verbosity: 100+ log lines per chat message
  2. Raw LLM dumps: Full JSON responses pretty-printed on every call (1000s of lines)
  3. Repeated data: NeoMem results logged 71 times individually
  4. No structure: Scattered emoji logs with no hierarchy
  5. Impossible to debug: Couldn't tell whether loops were actually occurring or the logging was just that verbose
  6. No loop protection: Unbounded message history growth, no session cleanup, no duplicate detection

✅ What Was Fixed

1. Structured Hierarchical Logging

Before:

🔍 RAW LLM RESPONSE: {
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "gpt-4",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Here is a very long response that goes on for hundreds of lines..."
      }
    }
  ],
  "usage": {
    "prompt_tokens": 123,
    "completion_tokens": 456,
    "total_tokens": 579
  }
}
🧠 Trying backend: PRIMARY (http://localhost:8000)
✅ Success via PRIMARY
[STAGE 0] Collecting unified context...
[STAGE 0] Context collected - 5 RAG results
[COLLECT_CONTEXT] Intake data retrieved:
{
  "L1": [...],
  "L5": [...],
  "L10": {...},
  "L20": {...},
  "L30": {...}
}
[COLLECT_CONTEXT] NeoMem search returned 71 results
  [1] Score: 0.923 - Memory content here...
  [2] Score: 0.891 - More memory content...
  [3] Score: 0.867 - Even more content...
  ... (68 more lines)

After (summary mode - DEFAULT):

✅ [LLM] PRIMARY | 14:23:45.123 | Reply: Based on your question about...
📊 Context | Session: abc123 | Messages: 42 | Last: 5.2min | RAG: 5 results
🧠 Monologue | question | Tone: curious
✨ PIPELINE COMPLETE | Session: abc123 | Total: 1250ms
📤 Output: 342 characters

After (detailed mode - for debugging):

════════════════════════════════════════════════════════════════════════════════════════════════════
🚀 PIPELINE START | Session: abc123 | 14:23:45.123
════════════════════════════════════════════════════════════════════════════════════════════════════
📝 User: What is the meaning of life?
────────────────────────────────────────────────────────────────────────────────────────────────────

────────────────────────────────────────────────────────────────────────────────────────────────────
🧠 LLM CALL | Backend: PRIMARY | 14:23:45.234
────────────────────────────────────────────────────────────────────────────────────────────────────
📝 Prompt: You are Lyra, a thoughtful AI assistant...
💬 Reply: Based on philosophical perspectives, the meaning...

📊 Context | Session: abc123 | Messages: 42 | Last: 5.2min | RAG: 5 results
────────────────────────────────────────────────────────────────────────────────────────────────────
[CONTEXT] Session abc123 | User: What is the meaning of life?
────────────────────────────────────────────────────────────────────────────────────────────────────
  Mode: default | Mood: neutral | Project: None
  Tools: RAG, WEB, WEATHER, CODEBRAIN, POKERBRAIN

  ╭─ INTAKE SUMMARIES ────────────────────────────────────────────────
  │ L1  : Last message discussed philosophy...
  │ L5  : Recent 5 messages covered existential topics...
  │ L10 : Past 10 messages showed curiosity pattern...
  │ L20 : Session focused on deep questions...
  │ L30 : Long-term trend shows philosophical interest...
  ╰───────────────────────────────────────────────────────────────────

  ╭─ RAG RESULTS (5) ──────────────────────────────────────────────
  │ [1] 0.923 | Previous discussion about purpose and meaning...
  │ [2] 0.891 | Note about existential philosophy...
  │ [3] 0.867 | Memory of Viktor Frankl discussion...
  │ [4] 0.834 | Reference to stoic philosophy...
  │ [5] 0.801 | Buddhism and the middle path...
  ╰───────────────────────────────────────────────────────────────────
────────────────────────────────────────────────────────────────────────────────────────────────────

════════════════════════════════════════════════════════════════════════════════════════════════════
✨ PIPELINE COMPLETE | Session: abc123 | Total: 1250ms
════════════════════════════════════════════════════════════════════════════════════════════════════
⏱️  Stage Timings:
   context        :   150ms ( 12.0%)
   identity       :    10ms (  0.8%)
   monologue      :   200ms ( 16.0%)
   tools          :     0ms (  0.0%)
   reflection     :    50ms (  4.0%)
   reasoning      :   450ms ( 36.0%)
   refinement     :   300ms ( 24.0%)
   persona        :    90ms (  7.2%)
📤 Output: 342 characters
════════════════════════════════════════════════════════════════════════════════════════════════════

2. Configurable Verbosity Levels

Set via LOG_DETAIL_LEVEL environment variable:

  • minimal: Only errors and critical events
  • summary: Stage completion + errors (DEFAULT - recommended for production)
  • detailed: Include raw LLM outputs, RAG results, timing breakdowns (for debugging)
  • verbose: Everything including full JSON dumps (for deep debugging)
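As a rough illustration (hypothetical names; the real implementation lives in utils/logging_utils.py), a level gate can be as simple as:

import os

# Ordered quietest to loudest; a higher rank means more output.
LEVELS = {"minimal": 0, "summary": 1, "detailed": 2, "verbose": 3}
CURRENT = LEVELS.get(os.getenv("LOG_DETAIL_LEVEL", "summary").lower(), 1)

def should_log(required: str) -> bool:
    """True if the configured level is at least `required`."""
    return CURRENT >= LEVELS[required]

# Example: raw JSON dumps only ever print in verbose mode.
if should_log("verbose"):
    print("raw JSON dump here...")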

3. Raw LLM Output Visibility

You can now see raw LLM outputs clearly!

In detailed or verbose mode, LLM calls show:

  • Backend used
  • Prompt preview
  • Parsed reply
  • Raw JSON response in collapsible format (verbose only)
╭─ RAW RESPONSE ────────────────────────────────────────────────────────────────────────────
│ {
│   "id": "chatcmpl-123",
│   "object": "chat.completion",
│   "model": "gpt-4",
│   "choices": [
│     {
│       "message": {
│         "content": "Full response here..."
│       }
│     }
│   ]
│ }
╰───────────────────────────────────────────────────────────────────────────────────────────
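The boxes are plain Unicode text. A sketch of a helper that could render one (hypothetical; not the exact logging_utils.py API):

import json

def boxed_section(title: str, body: str, width: int = 92) -> str:
    """Render a titled box like the RAW RESPONSE block above."""
    top = f"╭─ {title} " + "─" * max(0, width - len(title) - 4)
    lines = [f"│ {line}" for line in body.splitlines()]
    bottom = "╰" + "─" * (width - 1)
    return "\n".join([top, *lines, bottom])

response = {"id": "chatcmpl-123", "choices": [{"message": {"content": "..."}}]}
print(boxed_section("RAW RESPONSE", json.dumps(response, indent=2)))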

4. Loop Detection & Protection

New safety features:

  • Duplicate message detection: Prevents processing the same message twice
  • Message history trimming: Auto-trims to last 100 messages (configurable via MAX_MESSAGE_HISTORY)
  • Session TTL: Auto-expires inactive sessions after 24 hours (configurable via SESSION_TTL_HOURS)
  • Hash-based detection: Uses an MD5 hash of each message to detect exact duplicates (see the sketch below)

Example warning when loop detected:

⚠️  DUPLICATE MESSAGE DETECTED | Session: abc123 | Message: What is the meaning of life?
🔁 LOOP DETECTED - Returning cached context to prevent processing duplicate
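A minimal sketch of how the duplicate check, history trimming, and session TTL could fit together (hypothetical names; see context.py for the actual implementation):

import hashlib
import os
import time

MAX_MESSAGE_HISTORY = int(os.getenv("MAX_MESSAGE_HISTORY", "100"))
SESSION_TTL_HOURS = int(os.getenv("SESSION_TTL_HOURS", "24"))

sessions = {}  # session_id -> {"messages", "last_seen", "last_hash"}

def accept_message(session_id: str, message: str) -> bool:
    """Record a message; return False if it exactly duplicates the last one."""
    now = time.time()
    s = sessions.setdefault(
        session_id, {"messages": [], "last_seen": now, "last_hash": None}
    )

    # Hash-based duplicate detection: exact repeats short-circuit the pipeline.
    digest = hashlib.md5(message.encode("utf-8")).hexdigest()
    if digest == s["last_hash"]:
        print(f"⚠️  DUPLICATE MESSAGE DETECTED | Session: {session_id}")
        return False
    s["last_hash"] = digest

    # Bound history growth and refresh the activity timestamp.
    s["messages"] = (s["messages"] + [message])[-MAX_MESSAGE_HISTORY:]
    s["last_seen"] = now

    # Expire sessions idle longer than the TTL.
    cutoff = now - SESSION_TTL_HOURS * 3600
    for sid in [k for k, v in sessions.items() if v["last_seen"] < cutoff]:
        del sessions[sid]
    return True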

5. Performance Timing

In detailed mode, see exactly where time is spent:

⏱️  Stage Timings:
   context        :   150ms ( 12.0%)  ← Context collection
   identity       :    10ms (  0.8%)  ← Identity loading
   monologue      :   200ms ( 16.0%)  ← Inner monologue
   tools          :     0ms (  0.0%)  ← Autonomous tools
   reflection     :    50ms (  4.0%)  ← Reflection notes
   reasoning      :   450ms ( 36.0%)  ← Main reasoning (BOTTLENECK)
   refinement     :   300ms ( 24.0%)  ← Answer refinement
   persona        :    90ms (  7.2%)  ← Persona layer

This helps you identify weak links in the chain!
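A sketch of how timings like these can be collected (hypothetical; the actual PipelineLogger lives in utils/logging_utils.py):

import time
from contextlib import contextmanager

timings = {}

@contextmanager
def stage(name: str):
    """Time one pipeline stage and record its duration in milliseconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = (time.perf_counter() - start) * 1000

with stage("context"):
    time.sleep(0.15)  # stand-in for context collection
with stage("reasoning"):
    time.sleep(0.45)  # stand-in for the main reasoning call

total = sum(timings.values())
print("⏱️  Stage Timings:")
for name, ms in timings.items():
    print(f"   {name:<15}: {ms:6.0f}ms ({ms / total * 100:5.1f}%)")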

📁 Files Modified

Core Changes

  1. llm.js

    • Removed massive JSON dump on line 53
    • Added structured logging with 4 verbosity levels
    • Shows raw responses only in verbose mode (collapsible format)
    • Tracks failed backends and shows summary on total failure
  2. context.py

    • Condensed 71-line NeoMem loop to 5-line summary
    • Removed repeated intake data dumps
    • Added structured hierarchical logging with boxes
    • Added duplicate message detection
    • Added message history trimming
    • Added session TTL and cleanup
  3. router.py

    • Replaced 15+ stage logs with unified pipeline summary
    • Added stage timing collection
    • Shows performance breakdown in detailed mode
    • Clean start/end markers with total duration

New Files

  1. utils/logging_utils.py (NEW)

    • Reusable structured logging utilities
    • PipelineLogger class for hierarchical logging
    • Collapsible data sections
    • Stage timing tracking
    • Designed for reuse as more pipeline stages are added
  2. .env.logging.example (NEW)

    • Complete logging configuration guide
    • Shows example output at each verbosity level
    • Documents all environment variables
    • Production-ready defaults
  3. LOGGING_REFACTOR_SUMMARY.md (THIS FILE)

🚀 How to Use

For Debugging

# Set in your .env or export:
export LOG_DETAIL_LEVEL=detailed
export VERBOSE_DEBUG=false  # or true for even more detail

# Now run your chat - you'll see:
# 1. Which LLM backend is used
# 2. Raw LLM outputs (in verbose mode)
# 3. Exact timing per stage
# 4. Which stage is taking longest

For Production

export LOG_DETAIL_LEVEL=summary

# Minimal, clean logs:
# ✅ [LLM] PRIMARY | 14:23:45.123 | Reply: Based on your question...
# ✨ PIPELINE COMPLETE | Session: abc123 | Total: 1250ms

For Deep Debugging

export LOG_DETAIL_LEVEL=verbose
export LOG_RAW_CONTEXT_DATA=true

# Shows EVERYTHING including full JSON dumps

Problem: "Which LLM stage is failing or producing bad output?"

  1. Set LOG_DETAIL_LEVEL=detailed
  2. Run a test conversation
  3. Look for timing anomalies:
    reasoning      :  3450ms ( 76.0%)  ← BOTTLENECK!
    
  4. Look for errors:
    ⚠️  Reflection failed: Connection timeout
    
  5. Check raw LLM outputs (set VERBOSE_DEBUG=true):
    ╭─ RAW RESPONSE ────────────────────────────────────
    │ {
    │   "choices": [
    │     { "message": { "content": "..." } }
    │   ]
    │ }
    ╰───────────────────────────────────────────────────
    

Problem: "Is the loop repeating operations?"

  1. Enable duplicate detection (on by default)
  2. Look for loop warnings:
    ⚠️  DUPLICATE MESSAGE DETECTED | Session: abc123
    🔁 LOOP DETECTED - Returning cached context
    
  3. Check stage timings - repeated stages will show up as duplicates

Problem: "Which RAG memories are being used?"

  1. Set LOG_DETAIL_LEVEL=detailed
  2. Look for RAG results box:
    ╭─ RAG RESULTS (5) ──────────────────────────────
    │ [1] 0.923 | Previous discussion about X...
    │ [2] 0.891 | Note about Y...
    ╰────────────────────────────────────────────────
    

📊 Environment Variables Reference

Variable                    | Default | Description
----------------------------|---------|--------------------------------------------
LOG_DETAIL_LEVEL            | summary | Verbosity: minimal/summary/detailed/verbose
VERBOSE_DEBUG               | false   | Legacy flag for maximum verbosity
LOG_RAW_CONTEXT_DATA        | false   | Show full intake data dumps
ENABLE_DUPLICATE_DETECTION  | true    | Detect and prevent duplicate messages
MAX_MESSAGE_HISTORY         | 100     | Max messages to keep per session
SESSION_TTL_HOURS           | 24      | Auto-expire sessions after N hours

🎉 Results

Before: 1000+ lines of logs per chat message, unreadable, couldn't identify issues

After (summary mode): 5 lines of structured logs, clear and actionable

After (detailed mode): ~50 lines with full visibility into each stage, timing, and raw outputs

Loop protection: Automatic detection and prevention of duplicate processing

You can now:

  • See raw LLM outputs clearly (in detailed/verbose mode)
  • Identify performance bottlenecks (stage timings)
  • Detect loops and duplicates (automatic)
  • Find failing stages (error markers)
  • Scan logs quickly (hierarchical structure)
  • Debug production issues (adjustable verbosity)

🔧 Next Steps (Optional Improvements)

  1. Structured JSON logging: Output as JSON for log aggregation tools
  2. Log rotation: Implement file rotation for verbose logs
  3. Metrics export: Export stage timings to Prometheus/Grafana
  4. Error categorization: Tag errors by type (network, timeout, parsing, etc.)
  5. Performance alerts: Auto-alert when stages exceed thresholds

Happy debugging! You can now see what's actually happening in the cortex loop. 🎯