# Cortex Logging Refactor Summary
## 🎯 Problem Statement
The cortex chat loop had severe logging issues that made debugging impossible:
- Massive verbosity: 100+ log lines per chat message
- Raw LLM dumps: Full JSON responses pretty-printed on every call (1000s of lines)
- Repeated data: NeoMem results logged 71 times individually
- No structure: Scattered emoji logs with no hierarchy
- Impossible to debug: Couldn't tell if loops were happening or just verbose logging
- No loop protection: Unbounded message history growth, no session cleanup, no duplicate detection
## ✅ What Was Fixed
### 1. Structured Hierarchical Logging
Before:

```
🔍 RAW LLM RESPONSE: {
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "gpt-4",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Here is a very long response that goes on for hundreds of lines..."
      }
    }
  ],
  "usage": {
    "prompt_tokens": 123,
    "completion_tokens": 456,
    "total_tokens": 579
  }
}
🧠 Trying backend: PRIMARY (http://localhost:8000)
✅ Success via PRIMARY
[STAGE 0] Collecting unified context...
[STAGE 0] Context collected - 5 RAG results
[COLLECT_CONTEXT] Intake data retrieved:
{
  "L1": [...],
  "L5": [...],
  "L10": {...},
  "L20": {...},
  "L30": {...}
}
[COLLECT_CONTEXT] NeoMem search returned 71 results
[1] Score: 0.923 - Memory content here...
[2] Score: 0.891 - More memory content...
[3] Score: 0.867 - Even more content...
... (68 more lines)
```
After (summary mode - DEFAULT):

```
✅ [LLM] PRIMARY | 14:23:45.123 | Reply: Based on your question about...
📊 Context | Session: abc123 | Messages: 42 | Last: 5.2min | RAG: 5 results
🧠 Monologue | question | Tone: curious
✨ PIPELINE COMPLETE | Session: abc123 | Total: 1250ms
📤 Output: 342 characters
```
After (detailed mode - for debugging):

```
════════════════════════════════════════════════════════════════════════════════════════════════════
🚀 PIPELINE START | Session: abc123 | 14:23:45.123
════════════════════════════════════════════════════════════════════════════════════════════════════
📝 User: What is the meaning of life?
────────────────────────────────────────────────────────────────────────────────────────────────────
────────────────────────────────────────────────────────────────────────────────────────────────────
🧠 LLM CALL | Backend: PRIMARY | 14:23:45.234
────────────────────────────────────────────────────────────────────────────────────────────────────
📝 Prompt: You are Lyra, a thoughtful AI assistant...
💬 Reply: Based on philosophical perspectives, the meaning...
📊 Context | Session: abc123 | Messages: 42 | Last: 5.2min | RAG: 5 results
────────────────────────────────────────────────────────────────────────────────────────────────────
[CONTEXT] Session abc123 | User: What is the meaning of life?
────────────────────────────────────────────────────────────────────────────────────────────────────
Mode: default | Mood: neutral | Project: None
Tools: RAG, WEB, WEATHER, CODEBRAIN, POKERBRAIN
╭─ INTAKE SUMMARIES ────────────────────────────────────────────────
│ L1 : Last message discussed philosophy...
│ L5 : Recent 5 messages covered existential topics...
│ L10 : Past 10 messages showed curiosity pattern...
│ L20 : Session focused on deep questions...
│ L30 : Long-term trend shows philosophical interest...
╰───────────────────────────────────────────────────────────────────
╭─ RAG RESULTS (5) ──────────────────────────────────────────────
│ [1] 0.923 | Previous discussion about purpose and meaning...
│ [2] 0.891 | Note about existential philosophy...
│ [3] 0.867 | Memory of Viktor Frankl discussion...
│ [4] 0.834 | Reference to stoic philosophy...
│ [5] 0.801 | Buddhism and the middle path...
╰───────────────────────────────────────────────────────────────────
────────────────────────────────────────────────────────────────────────────────────────────────────
════════════════════════════════════════════════════════════════════════════════════════════════════
✨ PIPELINE COMPLETE | Session: abc123 | Total: 1250ms
════════════════════════════════════════════════════════════════════════════════════════════════════
⏱️ Stage Timings:
context : 150ms ( 12.0%)
identity : 10ms ( 0.8%)
monologue : 200ms ( 16.0%)
tools : 0ms ( 0.0%)
reflection : 50ms ( 4.0%)
reasoning : 450ms ( 36.0%)
refinement : 300ms ( 24.0%)
persona : 140ms ( 11.2%)
📤 Output: 342 characters
════════════════════════════════════════════════════════════════════════════════════════════════════
```
### 2. Configurable Verbosity Levels
Set via the `LOG_DETAIL_LEVEL` environment variable:

- `minimal`: Only errors and critical events
- `summary`: Stage completion + errors (DEFAULT - recommended for production)
- `detailed`: Include raw LLM outputs, RAG results, timing breakdowns (for debugging)
- `verbose`: Everything including full JSON dumps (for deep debugging)
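For orientation, resolving the level in code can be as small as the sketch below. This is a minimal illustration, not the shipped implementation; `get_detail_level` and `at_least` are hypothetical names.

```python
import os

# Levels ordered from quietest to loudest.
LEVELS = ("minimal", "summary", "detailed", "verbose")

def get_detail_level() -> str:
    """Read LOG_DETAIL_LEVEL, falling back to the 'summary' default."""
    level = os.getenv("LOG_DETAIL_LEVEL", "summary").lower()
    return level if level in LEVELS else "summary"

def at_least(level: str) -> bool:
    """True when the configured level is `level` or louder."""
    return LEVELS.index(get_detail_level()) >= LEVELS.index(level)
```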
### 3. Raw LLM Output Visibility ✅
You can now see raw LLM outputs clearly!
In detailed or verbose mode, LLM calls show:
- Backend used
- Prompt preview
- Parsed reply
- Raw JSON response in collapsible format (verbose only)
```
╭─ RAW RESPONSE ────────────────────────────────────────────────────────────────────────────
│ {
│   "id": "chatcmpl-123",
│   "object": "chat.completion",
│   "model": "gpt-4",
│   "choices": [
│     {
│       "message": {
│         "content": "Full response here..."
│       }
│     }
│   ]
│ }
╰───────────────────────────────────────────────────────────────────────────────────────────
```
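A gate like the hypothetical helper below (name and formatting assumed, not the real code) is enough to keep raw dumps out of every level except verbose:

```python
import json
import os

def log_raw_response(response: dict) -> None:
    """Box-print the raw LLM JSON, but only when LOG_DETAIL_LEVEL=verbose."""
    if os.getenv("LOG_DETAIL_LEVEL", "summary").lower() != "verbose":
        return  # silently skip at minimal/summary/detailed
    print("╭─ RAW RESPONSE " + "─" * 60)
    for line in json.dumps(response, indent=2).splitlines():
        print(f"│ {line}")
    print("╰" + "─" * 75)
```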
### 4. Loop Detection & Protection ✅
New safety features:
- Duplicate message detection: Prevents processing the same message twice
- Message history trimming: Auto-trims to last 100 messages (configurable via `MAX_MESSAGE_HISTORY`)
- Session TTL: Auto-expires inactive sessions after 24 hours (configurable via `SESSION_TTL_HOURS`)
- Hash-based detection: Uses MD5 hash to detect exact duplicate messages
Example warning when a loop is detected:

```
⚠️ DUPLICATE MESSAGE DETECTED | Session: abc123 | Message: What is the meaning of life?
🔁 LOOP DETECTED - Returning cached context to prevent processing duplicate
```
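A minimal sketch of the hash-and-trim idea; the class and function names are illustrative, and the real logic lives in the cortex session code:

```python
import hashlib
import os

MAX_MESSAGE_HISTORY = int(os.getenv("MAX_MESSAGE_HISTORY", "100"))

class LoopGuard:
    """Flags an exact repeat of the previous message in a session."""

    def __init__(self) -> None:
        self._last_hash: dict[str, str] = {}

    def is_duplicate(self, session_id: str, message: str) -> bool:
        digest = hashlib.md5(message.encode("utf-8")).hexdigest()
        if self._last_hash.get(session_id) == digest:
            return True  # same MD5 as last time -> treat as a loop
        self._last_hash[session_id] = digest
        return False

def trim_history(history: list) -> list:
    """Keep only the newest MAX_MESSAGE_HISTORY messages."""
    return history[-MAX_MESSAGE_HISTORY:]
```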
### 5. Performance Timing ✅
In detailed mode, see exactly where time is spent:
```
⏱️ Stage Timings:
  context    :  150ms ( 12.0%)  ← Context collection
  identity   :   10ms (  0.8%)  ← Identity loading
  monologue  :  200ms ( 16.0%)  ← Inner monologue
  tools      :    0ms (  0.0%)  ← Autonomous tools
  reflection :   50ms (  4.0%)  ← Reflection notes
  reasoning  :  450ms ( 36.0%)  ← Main reasoning (BOTTLENECK)
  refinement :  300ms ( 24.0%)  ← Answer refinement
  persona    :  140ms ( 11.2%)  ← Persona layer
```
This helps you identify weak links in the chain!
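A table like that can be produced with something as small as a context-manager timer. This sketch is an assumption about the approach, not the code actually shipped:

```python
import time
from contextlib import contextmanager

timings: dict[str, float] = {}

@contextmanager
def stage(name: str):
    """Record the wall-clock duration of one pipeline stage, in ms."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = (time.perf_counter() - start) * 1000

with stage("reasoning"):
    time.sleep(0.45)  # stand-in for the real reasoning call

total = sum(timings.values()) or 1.0
for name, ms in sorted(timings.items()):
    print(f"{name:<11}: {ms:6.0f}ms ({ms / total * 100:5.1f}%)")
```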
## 📁 Files Modified
### Core Changes
-
  - Removed massive JSON dump on line 53
  - Added structured logging with 4 verbosity levels
  - Shows raw responses only in verbose mode (collapsible format)
  - Tracks failed backends and shows a summary on total failure
-
  - Condensed the 71-line NeoMem loop to a 5-line summary
  - Removed repeated intake data dumps
  - Added structured hierarchical logging with boxes
  - Added duplicate message detection
  - Added message history trimming
  - Added session TTL and cleanup
-
  - Replaced 15+ stage logs with a unified pipeline summary
  - Added stage timing collection
  - Shows performance breakdown in detailed mode
  - Clean start/end markers with total duration
### New Files
- `utils/logging_utils.py` (NEW)
  - Reusable structured logging utilities
  - `PipelineLogger` class for hierarchical logging (usage sketch after this list)
  - Collapsible data sections
  - Stage timing tracking
  - Future-ready for expansion
- `.env.logging.example` (NEW)
  - Complete logging configuration guide
  - Shows example output at each verbosity level
  - Documents all environment variables
  - Production-ready defaults
- `LOGGING_REFACTOR_SUMMARY.md` (THIS FILE)
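For a feel of the new utilities, here is a heavily simplified sketch of what a `PipelineLogger`-style class does. Method names and box widths are assumptions; the real API is in `utils/logging_utils.py`.

```python
import time

class PipelineLogger:
    """Toy version of a hierarchical pipeline logger (illustrative only)."""

    def __init__(self, session_id: str) -> None:
        self.session_id = session_id
        self._start = time.perf_counter()

    def section(self, title: str, lines: list) -> None:
        """Print a boxed, collapsible-style section, e.g. 'RAG RESULTS (5)'."""
        print(f"╭─ {title} " + "─" * 50)
        for line in lines:
            print(f"│ {line}")
        print("╰" + "─" * (len(title) + 54))

    def complete(self, output_chars: int) -> None:
        """Print the closing summary with total pipeline duration."""
        total_ms = (time.perf_counter() - self._start) * 1000
        print(f"✨ PIPELINE COMPLETE | Session: {self.session_id} | Total: {total_ms:.0f}ms")
        print(f"📤 Output: {output_chars} characters")
```

Calling `section("RAG RESULTS (5)", [...])` followed by `complete(342)` yields output shaped like the detailed-mode examples above.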
## 🚀 How to Use
### For Finding Weak Links (Your Use Case)
```bash
# Set in your .env or export:
export LOG_DETAIL_LEVEL=detailed
export VERBOSE_DEBUG=false  # or true for even more detail

# Now run your chat - you'll see:
# 1. Which LLM backend is used
# 2. Raw LLM outputs (in verbose mode)
# 3. Exact timing per stage
# 4. Which stage is taking longest
```
### For Production

```bash
export LOG_DETAIL_LEVEL=summary

# Minimal, clean logs:
# ✅ [LLM] PRIMARY | 14:23:45.123 | Reply: Based on your question...
# ✨ PIPELINE COMPLETE | Session: abc123 | Total: 1250ms
```
### For Deep Debugging

```bash
export LOG_DETAIL_LEVEL=verbose
export LOG_RAW_CONTEXT_DATA=true

# Shows EVERYTHING including full JSON dumps
```
## 🔍 Finding Weak Links - Quick Guide
Problem: "Which LLM stage is failing or producing bad output?"
- Set
LOG_DETAIL_LEVEL=detailed - Run a test conversation
- Look for timing anomalies:
reasoning : 3450ms ( 76.0%) ← BOTTLENECK! - Look for errors:
⚠️ Reflection failed: Connection timeout - Check raw LLM outputs (set
VERBOSE_DEBUG=true):╭─ RAW RESPONSE ──────────────────────────────────── │ { │ "choices": [ │ { "message": { "content": "..." } } │ ] │ } ╰───────────────────────────────────────────────────
Problem: "Is the loop repeating operations?"
- Enable duplicate detection (on by default)
- Look for loop warnings:
⚠️ DUPLICATE MESSAGE DETECTED | Session: abc123 🔁 LOOP DETECTED - Returning cached context - Check stage timings - repeated stages will show up as duplicates
Problem: "Which RAG memories are being used?"
- Set
LOG_DETAIL_LEVEL=detailed - Look for RAG results box:
╭─ RAG RESULTS (5) ────────────────────────────── │ [1] 0.923 | Previous discussion about X... │ [2] 0.891 | Note about Y... ╰────────────────────────────────────────────────
## 📊 Environment Variables Reference

| Variable | Default | Description |
|---|---|---|
| `LOG_DETAIL_LEVEL` | `summary` | Verbosity: minimal/summary/detailed/verbose |
| `VERBOSE_DEBUG` | `false` | Legacy flag for maximum verbosity |
| `LOG_RAW_CONTEXT_DATA` | `false` | Show full intake data dumps |
| `ENABLE_DUPLICATE_DETECTION` | `true` | Detect and prevent duplicate messages |
| `MAX_MESSAGE_HISTORY` | `100` | Max messages to keep per session |
| `SESSION_TTL_HOURS` | `24` | Auto-expire sessions after N hours |
## 🎉 Results
Before: 1000+ lines of logs per chat message, unreadable, couldn't identify issues
After (summary mode): 5 lines of structured logs, clear and actionable
After (detailed mode): ~50 lines with full visibility into each stage, timing, and raw outputs
Loop protection: Automatic detection and prevention of duplicate processing
You can now:

- ✅ See raw LLM outputs clearly (in detailed/verbose mode)
- ✅ Identify performance bottlenecks (stage timings)
- ✅ Detect loops and duplicates (automatic)
- ✅ Find failing stages (error markers)
- ✅ Scan logs quickly (hierarchical structure)
- ✅ Debug production issues (adjustable verbosity)
## 🔧 Next Steps (Optional Improvements)
- Structured JSON logging: Output as JSON for log aggregation tools (see the sketch after this list)
- Log rotation: Implement file rotation for verbose logs
- Metrics export: Export stage timings to Prometheus/Grafana
- Error categorization: Tag errors by type (network, timeout, parsing, etc.)
- Performance alerts: Auto-alert when stages exceed thresholds
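For the structured-JSON item above, one possible shape using only the standard `logging` module; the `cortex` logger name is an assumption:

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON line for aggregation tools."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "ts": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
        })

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logging.getLogger("cortex").addHandler(handler)
logging.getLogger("cortex").warning("Reflection failed: Connection timeout")
```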
Happy debugging! You can now see what's actually happening in the cortex loop. 🎯