Cortex Logging Refactor Summary

🎯 Problem Statement

The cortex chat loop had severe logging issues that made debugging impossible:

  1. Massive verbosity: 100+ log lines per chat message
  2. Raw LLM dumps: Full JSON responses pretty-printed on every call (1000s of lines)
  3. Repeated data: NeoMem results logged 71 times individually
  4. No structure: Scattered emoji logs with no hierarchy
  5. Impossible to debug: Couldn't tell whether loops were actually occurring or the logging was just that verbose
  6. No loop protection: Unbounded message history growth, no session cleanup, no duplicate detection

✅ What Was Fixed

1. Structured Hierarchical Logging

Before:

🔍 RAW LLM RESPONSE: {
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "gpt-4",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Here is a very long response that goes on for hundreds of lines..."
      }
    }
  ],
  "usage": {
    "prompt_tokens": 123,
    "completion_tokens": 456,
    "total_tokens": 579
  }
}
🧠 Trying backend: PRIMARY (http://localhost:8000)
✅ Success via PRIMARY
[STAGE 0] Collecting unified context...
[STAGE 0] Context collected - 5 RAG results
[COLLECT_CONTEXT] Intake data retrieved:
{
  "L1": [...],
  "L5": [...],
  "L10": {...},
  "L20": {...},
  "L30": {...}
}
[COLLECT_CONTEXT] NeoMem search returned 71 results
  [1] Score: 0.923 - Memory content here...
  [2] Score: 0.891 - More memory content...
  [3] Score: 0.867 - Even more content...
  ... (68 more lines)

After (summary mode - DEFAULT):

✅ [LLM] PRIMARY | 14:23:45.123 | Reply: Based on your question about...
📊 Context | Session: abc123 | Messages: 42 | Last: 5.2min | RAG: 5 results
🧠 Monologue | question | Tone: curious
✨ PIPELINE COMPLETE | Session: abc123 | Total: 1250ms
📤 Output: 342 characters

After (detailed mode - for debugging):

════════════════════════════════════════════════════════════════════════════════════════════════════
🚀 PIPELINE START | Session: abc123 | 14:23:45.123
════════════════════════════════════════════════════════════════════════════════════════════════════
📝 User: What is the meaning of life?
────────────────────────────────────────────────────────────────────────────────────────────────────

────────────────────────────────────────────────────────────────────────────────────────────────────
🧠 LLM CALL | Backend: PRIMARY | 14:23:45.234
────────────────────────────────────────────────────────────────────────────────────────────────────
📝 Prompt: You are Lyra, a thoughtful AI assistant...
💬 Reply: Based on philosophical perspectives, the meaning...

📊 Context | Session: abc123 | Messages: 42 | Last: 5.2min | RAG: 5 results
────────────────────────────────────────────────────────────────────────────────────────────────────
[CONTEXT] Session abc123 | User: What is the meaning of life?
────────────────────────────────────────────────────────────────────────────────────────────────────
  Mode: default | Mood: neutral | Project: None
  Tools: RAG, WEB, WEATHER, CODEBRAIN, POKERBRAIN

  ╭─ INTAKE SUMMARIES ────────────────────────────────────────────────
  │ L1  : Last message discussed philosophy...
  │ L5  : Recent 5 messages covered existential topics...
  │ L10 : Past 10 messages showed curiosity pattern...
  │ L20 : Session focused on deep questions...
  │ L30 : Long-term trend shows philosophical interest...
  ╰───────────────────────────────────────────────────────────────────

  ╭─ RAG RESULTS (5) ──────────────────────────────────────────────
  │ [1] 0.923 | Previous discussion about purpose and meaning...
  │ [2] 0.891 | Note about existential philosophy...
  │ [3] 0.867 | Memory of Viktor Frankl discussion...
  │ [4] 0.834 | Reference to stoic philosophy...
  │ [5] 0.801 | Buddhism and the middle path...
  ╰───────────────────────────────────────────────────────────────────
────────────────────────────────────────────────────────────────────────────────────────────────────

════════════════════════════════════════════════════════════════════════════════════════════════════
✨ PIPELINE COMPLETE | Session: abc123 | Total: 1250ms
════════════════════════════════════════════════════════════════════════════════════════════════════
⏱️  Stage Timings:
   context        :   150ms ( 12.0%)
   identity       :    10ms (  0.8%)
   monologue      :   200ms ( 16.0%)
   tools          :     0ms (  0.0%)
   reflection     :    50ms (  4.0%)
   reasoning      :   450ms ( 36.0%)
   refinement     :   300ms ( 24.0%)
   persona        :    90ms (  7.2%)
📤 Output: 342 characters
════════════════════════════════════════════════════════════════════════════════════════════════════

2. Configurable Verbosity Levels

Set via LOG_DETAIL_LEVEL environment variable:

  • minimal: Only errors and critical events
  • summary: Stage completion + errors (DEFAULT - recommended for production)
  • detailed: Include raw LLM outputs, RAG results, timing breakdowns (for debugging)
  • verbose: Everything including full JSON dumps (for deep debugging)
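As a rough illustration (hypothetical names; the real implementation lives in utils/logging_utils.py), a level gate can be as simple as:

import os

# Ordered quietest to loudest; a higher rank means more output.
LEVELS = {"minimal": 0, "summary": 1, "detailed": 2, "verbose": 3}
CURRENT = LEVELS.get(os.getenv("LOG_DETAIL_LEVEL", "summary").lower(), 1)

def should_log(required: str) -> bool:
    """True if the configured level is at least `required`."""
    return CURRENT >= LEVELS[required]

# Example: raw JSON dumps only ever print in verbose mode.
if should_log("verbose"):
    print("raw JSON dump here...")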

3. Raw LLM Output Visibility

You can now see raw LLM outputs clearly!

In detailed or verbose mode, LLM calls show:

  • Backend used
  • Prompt preview
  • Parsed reply
  • Raw JSON response in collapsible format (verbose only)
╭─ RAW RESPONSE ────────────────────────────────────────────────────────────────────────────
│ {
│   "id": "chatcmpl-123",
│   "object": "chat.completion",
│   "model": "gpt-4",
│   "choices": [
│     {
│       "message": {
│         "content": "Full response here..."
│       }
│     }
│   ]
│ }
╰───────────────────────────────────────────────────────────────────────────────────────────
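The boxes are plain Unicode text. A sketch of a helper that could render one (hypothetical; not the exact logging_utils.py API):

import json

def boxed_section(title: str, body: str, width: int = 92) -> str:
    """Render a titled box like the RAW RESPONSE block above."""
    top = f"╭─ {title} " + "─" * max(0, width - len(title) - 4)
    lines = [f"│ {line}" for line in body.splitlines()]
    bottom = "╰" + "─" * (width - 1)
    return "\n".join([top, *lines, bottom])

response = {"id": "chatcmpl-123", "choices": [{"message": {"content": "..."}}]}
print(boxed_section("RAW RESPONSE", json.dumps(response, indent=2)))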

4. Loop Detection & Protection

New safety features:

  • Duplicate message detection: Prevents processing the same message twice
  • Message history trimming: Auto-trims to last 100 messages (configurable via MAX_MESSAGE_HISTORY)
  • Session TTL: Auto-expires inactive sessions after 24 hours (configurable via SESSION_TTL_HOURS)
  • Hash-based detection: Uses an MD5 hash of each message to detect exact duplicates (see the sketch below)

Example warning when loop detected:

⚠️  DUPLICATE MESSAGE DETECTED | Session: abc123 | Message: What is the meaning of life?
🔁 LOOP DETECTED - Returning cached context to prevent processing duplicate
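A minimal sketch of how the duplicate check, history trimming, and session TTL could fit together (hypothetical names; see context.py for the actual implementation):

import hashlib
import os
import time

MAX_MESSAGE_HISTORY = int(os.getenv("MAX_MESSAGE_HISTORY", "100"))
SESSION_TTL_HOURS = int(os.getenv("SESSION_TTL_HOURS", "24"))

sessions = {}  # session_id -> {"messages", "last_seen", "last_hash"}

def accept_message(session_id: str, message: str) -> bool:
    """Record a message; return False if it exactly duplicates the last one."""
    now = time.time()
    s = sessions.setdefault(
        session_id, {"messages": [], "last_seen": now, "last_hash": None}
    )

    # Hash-based duplicate detection: exact repeats short-circuit the pipeline.
    digest = hashlib.md5(message.encode("utf-8")).hexdigest()
    if digest == s["last_hash"]:
        print(f"⚠️  DUPLICATE MESSAGE DETECTED | Session: {session_id}")
        return False
    s["last_hash"] = digest

    # Bound history growth and refresh the activity timestamp.
    s["messages"] = (s["messages"] + [message])[-MAX_MESSAGE_HISTORY:]
    s["last_seen"] = now

    # Expire sessions idle longer than the TTL.
    cutoff = now - SESSION_TTL_HOURS * 3600
    for sid in [k for k, v in sessions.items() if v["last_seen"] < cutoff]:
        del sessions[sid]
    return True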

5. Performance Timing

In detailed mode, see exactly where time is spent:

⏱️  Stage Timings:
   context        :   150ms ( 12.0%)  ← Context collection
   identity       :    10ms (  0.8%)  ← Identity loading
   monologue      :   200ms ( 16.0%)  ← Inner monologue
   tools          :     0ms (  0.0%)  ← Autonomous tools
   reflection     :    50ms (  4.0%)  ← Reflection notes
   reasoning      :   450ms ( 36.0%)  ← Main reasoning (BOTTLENECK)
   refinement     :   300ms ( 24.0%)  ← Answer refinement
   persona        :    90ms (  7.2%)  ← Persona layer

This helps you identify weak links in the chain!
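A sketch of how timings like these can be collected (hypothetical; the actual PipelineLogger lives in utils/logging_utils.py):

import time
from contextlib import contextmanager

timings = {}

@contextmanager
def stage(name: str):
    """Time one pipeline stage and record its duration in milliseconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = (time.perf_counter() - start) * 1000

with stage("context"):
    time.sleep(0.15)  # stand-in for context collection
with stage("reasoning"):
    time.sleep(0.45)  # stand-in for the main reasoning call

total = sum(timings.values())
print("⏱️  Stage Timings:")
for name, ms in timings.items():
    print(f"   {name:<15}: {ms:6.0f}ms ({ms / total * 100:5.1f}%)")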

📁 Files Modified

Core Changes

  1. llm.js

    • Removed massive JSON dump on line 53
    • Added structured logging with 4 verbosity levels
    • Shows raw responses only in verbose mode (collapsible format)
    • Tracks failed backends and shows summary on total failure
  2. context.py

    • Condensed 71-line NeoMem loop to 5-line summary
    • Removed repeated intake data dumps
    • Added structured hierarchical logging with boxes
    • Added duplicate message detection
    • Added message history trimming
    • Added session TTL and cleanup
  3. router.py

    • Replaced 15+ stage logs with unified pipeline summary
    • Added stage timing collection
    • Shows performance breakdown in detailed mode
    • Clean start/end markers with total duration

New Files

  1. utils/logging_utils.py (NEW)

    • Reusable structured logging utilities
    • PipelineLogger class for hierarchical logging
    • Collapsible data sections
    • Stage timing tracking
    • Designed for reuse as more pipeline stages are added
  2. .env.logging.example (NEW)

    • Complete logging configuration guide
    • Shows example output at each verbosity level
    • Documents all environment variables
    • Production-ready defaults
  3. LOGGING_REFACTOR_SUMMARY.md (THIS FILE)

🚀 How to Use

For Debugging

# Set in your .env or export:
export LOG_DETAIL_LEVEL=detailed
export VERBOSE_DEBUG=false  # or true for even more detail

# Now run your chat - you'll see:
# 1. Which LLM backend is used
# 2. Raw LLM outputs (in verbose mode)
# 3. Exact timing per stage
# 4. Which stage is taking longest

For Production

export LOG_DETAIL_LEVEL=summary

# Minimal, clean logs:
# ✅ [LLM] PRIMARY | 14:23:45.123 | Reply: Based on your question...
# ✨ PIPELINE COMPLETE | Session: abc123 | Total: 1250ms

For Deep Debugging

export LOG_DETAIL_LEVEL=verbose
export LOG_RAW_CONTEXT_DATA=true

# Shows EVERYTHING including full JSON dumps

Problem: "Which LLM stage is failing or producing bad output?"

  1. Set LOG_DETAIL_LEVEL=detailed
  2. Run a test conversation
  3. Look for timing anomalies:
    reasoning      :  3450ms ( 76.0%)  ← BOTTLENECK!
    
  4. Look for errors:
    ⚠️  Reflection failed: Connection timeout
    
  5. Check raw LLM outputs (set VERBOSE_DEBUG=true):
    ╭─ RAW RESPONSE ────────────────────────────────────
    │ {
    │   "choices": [
    │     { "message": { "content": "..." } }
    │   ]
    │ }
    ╰───────────────────────────────────────────────────
    

Problem: "Is the loop repeating operations?"

  1. Enable duplicate detection (on by default)
  2. Look for loop warnings:
    ⚠️  DUPLICATE MESSAGE DETECTED | Session: abc123
    🔁 LOOP DETECTED - Returning cached context
    
  3. Check stage timings - repeated stages will show up as duplicates

Problem: "Which RAG memories are being used?"

  1. Set LOG_DETAIL_LEVEL=detailed
  2. Look for RAG results box:
    ╭─ RAG RESULTS (5) ──────────────────────────────
    │ [1] 0.923 | Previous discussion about X...
    │ [2] 0.891 | Note about Y...
    ╰────────────────────────────────────────────────
    

📊 Environment Variables Reference

Variable                    | Default | Description
----------------------------|---------|--------------------------------------------
LOG_DETAIL_LEVEL            | summary | Verbosity: minimal/summary/detailed/verbose
VERBOSE_DEBUG               | false   | Legacy flag for maximum verbosity
LOG_RAW_CONTEXT_DATA        | false   | Show full intake data dumps
ENABLE_DUPLICATE_DETECTION  | true    | Detect and prevent duplicate messages
MAX_MESSAGE_HISTORY         | 100     | Max messages to keep per session
SESSION_TTL_HOURS           | 24      | Auto-expire sessions after N hours

🎉 Results

Before: 1000+ lines of logs per chat message, unreadable, couldn't identify issues

After (summary mode): 5 lines of structured logs, clear and actionable

After (detailed mode): ~50 lines with full visibility into each stage, timing, and raw outputs

Loop protection: Automatic detection and prevention of duplicate processing

You can now:

  • See raw LLM outputs clearly (in detailed/verbose mode)
  • Identify performance bottlenecks (stage timings)
  • Detect loops and duplicates (automatic)
  • Find failing stages (error markers)
  • Scan logs quickly (hierarchical structure)
  • Debug production issues (adjustable verbosity)

🔧 Next Steps (Optional Improvements)

  1. Structured JSON logging: Output as JSON for log aggregation tools
  2. Log rotation: Implement file rotation for verbose logs
  3. Metrics export: Export stage timings to Prometheus/Grafana
  4. Error categorization: Tag errors by type (network, timeout, parsing, etc.)
  5. Performance alerts: Auto-alert when stages exceed thresholds

Happy debugging! You can now see what's actually happening in the cortex loop. 🎯