# Cortex Logging Refactor Summary

## 🎯 Problem Statement

The cortex chat loop had severe logging issues that made debugging impossible:

1. **Massive verbosity**: 100+ log lines per chat message
2. **Raw LLM dumps**: full JSON responses pretty-printed on every call (thousands of lines)
3. **Repeated data**: NeoMem results logged 71 times individually
4. **No structure**: scattered emoji logs with no hierarchy
5. **Impossible to debug**: couldn't tell whether loops were actually happening or the logging was just verbose
6. **No loop protection**: unbounded message history growth, no session cleanup, no duplicate detection

## ✅ What Was Fixed

### 1. **Structured Hierarchical Logging**

**Before:**

```
🔍 RAW LLM RESPONSE: {
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "gpt-4",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Here is a very long response that goes on for hundreds of lines..."
      }
    }
  ],
  "usage": {
    "prompt_tokens": 123,
    "completion_tokens": 456,
    "total_tokens": 579
  }
}
🧠 Trying backend: PRIMARY (http://localhost:8000)
✅ Success via PRIMARY
[STAGE 0] Collecting unified context...
[STAGE 0] Context collected - 5 RAG results
[COLLECT_CONTEXT] Intake data retrieved: {
  "L1": [...],
  "L5": [...],
  "L10": {...},
  "L20": {...},
  "L30": {...}
}
[COLLECT_CONTEXT] NeoMem search returned 71 results
  [1] Score: 0.923 - Memory content here...
  [2] Score: 0.891 - More memory content...
  [3] Score: 0.867 - Even more content...
  ... (68 more lines)
```

**After (summary mode - DEFAULT):**

```
✅ [LLM] PRIMARY | 14:23:45.123 | Reply: Based on your question about...
📊 Context | Session: abc123 | Messages: 42 | Last: 5.2min | RAG: 5 results
🧠 Monologue | question | Tone: curious
✨ PIPELINE COMPLETE | Session: abc123 | Total: 1250ms
📤 Output: 342 characters
```

**After (detailed mode - for debugging):**

```
════════════════════════════════════════════════════════════════════════════════════════════════════
🚀 PIPELINE START | Session: abc123 | 14:23:45.123
════════════════════════════════════════════════════════════════════════════════════════════════════
📝 User: What is the meaning of life?
────────────────────────────────────────────────────────────────────────────────────────────────────
────────────────────────────────────────────────────────────────────────────────────────────────────
🧠 LLM CALL | Backend: PRIMARY | 14:23:45.234
────────────────────────────────────────────────────────────────────────────────────────────────────
📝 Prompt: You are Lyra, a thoughtful AI assistant...
💬 Reply: Based on philosophical perspectives, the meaning...
📊 Context | Session: abc123 | Messages: 42 | Last: 5.2min | RAG: 5 results
────────────────────────────────────────────────────────────────────────────────────────────────────
[CONTEXT] Session abc123 | User: What is the meaning of life?
────────────────────────────────────────────────────────────────────────────────────────────────────
Mode: default | Mood: neutral | Project: None
Tools: RAG, WEB, WEATHER, CODEBRAIN, POKERBRAIN
╭─ INTAKE SUMMARIES ────────────────────────────────────────────────
│ L1  : Last message discussed philosophy...
│ L5  : Recent 5 messages covered existential topics...
│ L10 : Past 10 messages showed curiosity pattern...
│ L20 : Session focused on deep questions...
│ L30 : Long-term trend shows philosophical interest...
╰───────────────────────────────────────────────────────────────────
╭─ RAG RESULTS (5) ──────────────────────────────────────────────
│ [1] 0.923 | Previous discussion about purpose and meaning...
│ [2] 0.891 | Note about existential philosophy...
│ [3] 0.867 | Memory of Viktor Frankl discussion...
│ [4] 0.834 | Reference to stoic philosophy...
│ [5] 0.801 | Buddhism and the middle path...
╰───────────────────────────────────────────────────────────────────
────────────────────────────────────────────────────────────────────────────────────────────────────
════════════════════════════════════════════════════════════════════════════════════════════════════
✨ PIPELINE COMPLETE | Session: abc123 | Total: 1250ms
════════════════════════════════════════════════════════════════════════════════════════════════════
⏱️ Stage Timings:
   context    :  150ms ( 12.0%)
   identity   :   10ms (  0.8%)
   monologue  :  200ms ( 16.0%)
   tools      :    0ms (  0.0%)
   reflection :   50ms (  4.0%)
   reasoning  :  450ms ( 36.0%)
   refinement :  250ms ( 20.0%)
   persona    :  140ms ( 11.2%)
📤 Output: 342 characters
════════════════════════════════════════════════════════════════════════════════════════════════════
```

### 2. **Configurable Verbosity Levels**

Set via the `LOG_DETAIL_LEVEL` environment variable:

- **`minimal`**: only errors and critical events
- **`summary`**: stage completion + errors (DEFAULT - recommended for production)
- **`detailed`**: includes raw LLM outputs, RAG results, and timing breakdowns (for debugging)
- **`verbose`**: everything, including full JSON dumps (for deep debugging)
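Conceptually, the gate is just an ordered comparison. Here is a minimal sketch, assuming an integer-ordered enum; `LogLevel` and `should_log` are illustrative names, not the actual API of `cortex/utils/logging_utils.py`:

```python
import os
from enum import IntEnum

# Illustrative names only; the real helpers live in cortex/utils/logging_utils.py.
class LogLevel(IntEnum):
    MINIMAL = 0   # only errors and critical events
    SUMMARY = 1   # stage completion + errors (the default)
    DETAILED = 2  # raw LLM outputs, RAG results, timing breakdowns
    VERBOSE = 3   # everything, including full JSON dumps

def _configured_level() -> LogLevel:
    """Read LOG_DETAIL_LEVEL; unknown values fall back to SUMMARY."""
    name = os.getenv("LOG_DETAIL_LEVEL", "summary").upper()
    return LogLevel.__members__.get(name, LogLevel.SUMMARY)

LEVEL = _configured_level()

def should_log(required: LogLevel) -> bool:
    """True when the configured verbosity is at least `required`."""
    return LEVEL >= required

# Expensive dumps cost nothing in summary mode when gated like this:
if should_log(LogLevel.VERBOSE):
    print("full raw JSON response would be printed here")
```

Because the levels are ordered, each log site needs only a single comparison rather than a chain of boolean flag checks.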
### 3. **Raw LLM Output Visibility**

✅ **You can now see raw LLM outputs clearly!**

In `detailed` or `verbose` mode, LLM calls show:

- Backend used
- Prompt preview
- Parsed reply
- **Raw JSON response in collapsible format** (verbose only)

```
╭─ RAW RESPONSE ────────────────────────────────────────────────────────────────────────────
│ {
│   "id": "chatcmpl-123",
│   "object": "chat.completion",
│   "model": "gpt-4",
│   "choices": [
│     {
│       "message": {
│         "content": "Full response here..."
│       }
│     }
│   ]
│ }
╰───────────────────────────────────────────────────────────────────────────────────────────
```

### 4. **Loop Detection & Protection**

✅ **New safety features:**

- **Duplicate message detection**: prevents processing the same message twice
- **Message history trimming**: auto-trims to the last 100 messages (configurable via `MAX_MESSAGE_HISTORY`)
- **Session TTL**: auto-expires inactive sessions after 24 hours (configurable via `SESSION_TTL_HOURS`)
- **Hash-based detection**: uses an MD5 hash to detect exact duplicate messages

**Example warning when a loop is detected:**

```
⚠️ DUPLICATE MESSAGE DETECTED | Session: abc123 | Message: What is the meaning of life?
🔁 LOOP DETECTED - Returning cached context to prevent processing duplicate
```
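The check itself is cheap. Below is a minimal sketch of the hash-and-trim logic, assuming an in-memory session store; the `sessions` dict and function names are illustrative, not the actual implementation in `cortex/context.py`:

```python
import hashlib
import os
import time

# Defaults mirror the documented environment variables.
MAX_MESSAGE_HISTORY = int(os.getenv("MAX_MESSAGE_HISTORY", "100"))
SESSION_TTL_HOURS = float(os.getenv("SESSION_TTL_HOURS", "24"))

# Illustrative in-memory store; the real session state lives in cortex/context.py.
sessions: dict[str, dict] = {}

def is_duplicate(session_id: str, message: str) -> bool:
    """Return True if this exact message was the last one seen in the session."""
    digest = hashlib.md5(message.encode("utf-8")).hexdigest()
    session = sessions.setdefault(
        session_id, {"last_hash": None, "messages": [], "last_seen": time.time()}
    )
    if session["last_hash"] == digest:
        return True  # caller logs the DUPLICATE MESSAGE warning and returns cached context
    session["last_hash"] = digest
    session["last_seen"] = time.time()
    session["messages"].append(message)
    del session["messages"][:-MAX_MESSAGE_HISTORY]  # trim to the last N messages
    return False

def expire_stale_sessions() -> None:
    """Drop sessions that have been idle longer than the TTL."""
    cutoff = time.time() - SESSION_TTL_HOURS * 3600
    for sid in [s for s, v in sessions.items() if v["last_seen"] < cutoff]:
        del sessions[sid]
```

MD5 is used here purely as a fast fingerprint for exact-duplicate detection, not for security, so its weaknesses as a cryptographic hash are irrelevant.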
### 5. **Performance Timing**

✅ In `detailed` mode, see exactly where time is spent:

```
⏱️ Stage Timings:
   context    :  150ms ( 12.0%)  ← Context collection
   identity   :   10ms (  0.8%)  ← Identity loading
   monologue  :  200ms ( 16.0%)  ← Inner monologue
   tools      :    0ms (  0.0%)  ← Autonomous tools
   reflection :   50ms (  4.0%)  ← Reflection notes
   reasoning  :  450ms ( 36.0%)  ← Main reasoning (BOTTLENECK)
   refinement :  250ms ( 20.0%)  ← Answer refinement
   persona    :  140ms ( 11.2%)  ← Persona layer
```

**This helps you identify weak links in the chain!**
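For reference, here is a minimal sketch of how a breakdown like this can be collected with a per-stage context manager; `timed_stage` and `print_breakdown` are assumed names, not the actual helpers in `cortex/router.py`:

```python
import time
from contextlib import contextmanager

# Accumulates one duration per stage; the real collection happens in cortex/router.py.
timings: dict[str, float] = {}

@contextmanager
def timed_stage(name: str):
    """Record the wall-clock duration of one pipeline stage, in milliseconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = (time.perf_counter() - start) * 1000

def print_breakdown() -> None:
    """Emit the per-stage breakdown shown in detailed mode."""
    total = sum(timings.values()) or 1.0  # guard against division by zero
    print("⏱️ Stage Timings:")
    for name, ms in timings.items():
        print(f"   {name:<11}: {ms:6.0f}ms ({ms / total * 100:5.1f}%)")

# Usage: wrap each pipeline stage, then print the summary once at the end.
with timed_stage("context"):
    time.sleep(0.05)   # stand-in for real context collection
with timed_stage("reasoning"):
    time.sleep(0.10)   # stand-in for the main reasoning call
print_breakdown()
```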
## 📁 Files Modified

### Core Changes

1. **[llm.js](core/relay/lib/llm.js)**
   - Removed the massive JSON dump on line 53
   - Added structured logging with 4 verbosity levels
   - Shows raw responses only in verbose mode (collapsible format)
   - Tracks failed backends and shows a summary on total failure

2. **[context.py](cortex/context.py)**
   - Condensed the 71-line NeoMem loop to a 5-line summary
   - Removed repeated intake data dumps
   - Added structured hierarchical logging with boxes
   - Added duplicate message detection
   - Added message history trimming
   - Added session TTL and cleanup

3. **[router.py](cortex/router.py)**
   - Replaced 15+ stage logs with a unified pipeline summary
   - Added stage timing collection
   - Shows the performance breakdown in detailed mode
   - Clean start/end markers with total duration

### New Files

4. **[utils/logging_utils.py](cortex/utils/logging_utils.py)** (NEW)
   - Reusable structured logging utilities
   - `PipelineLogger` class for hierarchical logging
   - Collapsible data sections
   - Stage timing tracking
   - Designed for future expansion

5. **[.env.logging.example](.env.logging.example)** (NEW)
   - Complete logging configuration guide
   - Shows example output at each verbosity level
   - Documents all environment variables
   - Production-ready defaults

6. **[LOGGING_REFACTOR_SUMMARY.md](LOGGING_REFACTOR_SUMMARY.md)** (THIS FILE)

## 🚀 How to Use

### For Finding Weak Links (Your Use Case)

```bash
# Set in your .env or export:
export LOG_DETAIL_LEVEL=detailed
export VERBOSE_DEBUG=false  # or true for even more detail

# Now run your chat - you'll see:
# 1. Which LLM backend is used
# 2. Raw LLM outputs (in verbose mode)
# 3. Exact timing per stage
# 4. Which stage is taking longest
```

### For Production

```bash
export LOG_DETAIL_LEVEL=summary

# Minimal, clean logs:
# ✅ [LLM] PRIMARY | 14:23:45.123 | Reply: Based on your question...
# ✨ PIPELINE COMPLETE | Session: abc123 | Total: 1250ms
```

### For Deep Debugging

```bash
export LOG_DETAIL_LEVEL=verbose
export LOG_RAW_CONTEXT_DATA=true

# Shows EVERYTHING, including full JSON dumps
```

## 🔍 Finding Weak Links - Quick Guide

**Problem: "Which LLM stage is failing or producing bad output?"**

1. Set `LOG_DETAIL_LEVEL=detailed`
2. Run a test conversation
3. Look for timing anomalies:
   ```
   reasoning  : 3450ms ( 76.0%)  ← BOTTLENECK!
   ```
4. Look for errors:
   ```
   ⚠️ Reflection failed: Connection timeout
   ```
5. Check raw LLM outputs (set `VERBOSE_DEBUG=true`):
   ```
   ╭─ RAW RESPONSE ────────────────────────────────────
   │ {
   │   "choices": [
   │     { "message": { "content": "..." } }
   │   ]
   │ }
   ╰───────────────────────────────────────────────────
   ```

**Problem: "Is the loop repeating operations?"**

1. Duplicate detection is on by default (`ENABLE_DUPLICATE_DETECTION=true`)
2. Look for loop warnings:
   ```
   ⚠️ DUPLICATE MESSAGE DETECTED | Session: abc123
   🔁 LOOP DETECTED - Returning cached context
   ```
3. Check stage timings - repeated stages will show up as duplicates

**Problem: "Which RAG memories are being used?"**

1. Set `LOG_DETAIL_LEVEL=detailed`
2. Look for the RAG results box:
   ```
   ╭─ RAG RESULTS (5) ──────────────────────────────
   │ [1] 0.923 | Previous discussion about X...
   │ [2] 0.891 | Note about Y...
   ╰────────────────────────────────────────────────
   ```

## 📊 Environment Variables Reference

| Variable | Default | Description |
|----------|---------|-------------|
| `LOG_DETAIL_LEVEL` | `summary` | Verbosity: minimal/summary/detailed/verbose |
| `VERBOSE_DEBUG` | `false` | Legacy flag for maximum verbosity |
| `LOG_RAW_CONTEXT_DATA` | `false` | Show full intake data dumps |
| `ENABLE_DUPLICATE_DETECTION` | `true` | Detect and prevent duplicate messages |
| `MAX_MESSAGE_HISTORY` | `100` | Max messages to keep per session |
| `SESSION_TTL_HOURS` | `24` | Auto-expire sessions after N hours |

## 🎉 Results

**Before:** 1000+ lines of logs per chat message; unreadable, with no way to identify issues

**After (summary mode):** 5 lines of structured logs, clear and actionable

**After (detailed mode):** ~50 lines with full visibility into each stage, timing, and raw outputs

**Loop protection:** automatic detection and prevention of duplicate processing

**You can now:**

- ✅ See raw LLM outputs clearly (in detailed/verbose mode)
- ✅ Identify performance bottlenecks (stage timings)
- ✅ Detect loops and duplicates (automatic)
- ✅ Find failing stages (error markers)
- ✅ Scan logs quickly (hierarchical structure)
- ✅ Debug production issues (adjustable verbosity)

## 🔧 Next Steps (Optional Improvements)

1. **Structured JSON logging**: output as JSON for log aggregation tools
2. **Log rotation**: implement file rotation for verbose logs
3. **Metrics export**: export stage timings to Prometheus/Grafana
4. **Error categorization**: tag errors by type (network, timeout, parsing, etc.)
5. **Performance alerts**: auto-alert when stages exceed thresholds

---

**Happy debugging! You can now see what's actually happening in the cortex loop.** 🎯