Cortex debugging logs cleaned up

This commit is contained in:
serversdwn
2025-12-20 02:49:20 -05:00
parent 970907cf1b
commit 6bb800f5f8
10 changed files with 1452 additions and 242 deletions

352 LOGGING_REFACTOR_SUMMARY.md Normal file

@@ -0,0 +1,352 @@
# Cortex Logging Refactor Summary
## 🎯 Problem Statement
The cortex chat loop had severe logging issues that made debugging impossible:
1. **Massive verbosity**: 100+ log lines per chat message
2. **Raw LLM dumps**: Full JSON responses pretty-printed on every call (1000s of lines)
3. **Repeated data**: NeoMem results logged 71 times individually
4. **No structure**: Scattered emoji logs with no hierarchy
5. **Impossible to debug**: Couldn't tell whether the pipeline was actually looping or the logs were just verbose
6. **No loop protection**: Unbounded message history growth, no session cleanup, no duplicate detection
## ✅ What Was Fixed
### 1. **Structured Hierarchical Logging**
**Before:**
```
🔍 RAW LLM RESPONSE: {
"id": "chatcmpl-123",
"object": "chat.completion",
"created": 1234567890,
"model": "gpt-4",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Here is a very long response that goes on for hundreds of lines..."
}
}
],
"usage": {
"prompt_tokens": 123,
"completion_tokens": 456,
"total_tokens": 579
}
}
🧠 Trying backend: PRIMARY (http://localhost:8000)
✅ Success via PRIMARY
[STAGE 0] Collecting unified context...
[STAGE 0] Context collected - 5 RAG results
[COLLECT_CONTEXT] Intake data retrieved:
{
"L1": [...],
"L5": [...],
"L10": {...},
"L20": {...},
"L30": {...}
}
[COLLECT_CONTEXT] NeoMem search returned 71 results
[1] Score: 0.923 - Memory content here...
[2] Score: 0.891 - More memory content...
[3] Score: 0.867 - Even more content...
... (68 more lines)
```
**After (summary mode - DEFAULT):**
```
✅ [LLM] PRIMARY | 14:23:45.123 | Reply: Based on your question about...
📊 Context | Session: abc123 | Messages: 42 | Last: 5.2min | RAG: 5 results
🧠 Monologue | question | Tone: curious
✨ PIPELINE COMPLETE | Session: abc123 | Total: 1250ms
📤 Output: 342 characters
```
**After (detailed mode - for debugging):**
```
════════════════════════════════════════════════════════════════════════════════════════════════════
🚀 PIPELINE START | Session: abc123 | 14:23:45.123
════════════════════════════════════════════════════════════════════════════════════════════════════
📝 User: What is the meaning of life?
────────────────────────────────────────────────────────────────────────────────────────────────────
────────────────────────────────────────────────────────────────────────────────────────────────────
🧠 LLM CALL | Backend: PRIMARY | 14:23:45.234
────────────────────────────────────────────────────────────────────────────────────────────────────
📝 Prompt: You are Lyra, a thoughtful AI assistant...
💬 Reply: Based on philosophical perspectives, the meaning...
📊 Context | Session: abc123 | Messages: 42 | Last: 5.2min | RAG: 5 results
────────────────────────────────────────────────────────────────────────────────────────────────────
[CONTEXT] Session abc123 | User: What is the meaning of life?
────────────────────────────────────────────────────────────────────────────────────────────────────
Mode: default | Mood: neutral | Project: None
Tools: RAG, WEB, WEATHER, CODEBRAIN, POKERBRAIN
╭─ INTAKE SUMMARIES ────────────────────────────────────────────────
│ L1 : Last message discussed philosophy...
│ L5 : Recent 5 messages covered existential topics...
│ L10 : Past 10 messages showed curiosity pattern...
│ L20 : Session focused on deep questions...
│ L30 : Long-term trend shows philosophical interest...
╰───────────────────────────────────────────────────────────────────
╭─ RAG RESULTS (5) ──────────────────────────────────────────────
│ [1] 0.923 | Previous discussion about purpose and meaning...
│ [2] 0.891 | Note about existential philosophy...
│ [3] 0.867 | Memory of Viktor Frankl discussion...
│ [4] 0.834 | Reference to stoic philosophy...
│ [5] 0.801 | Buddhism and the middle path...
╰───────────────────────────────────────────────────────────────────
────────────────────────────────────────────────────────────────────────────────────────────────────
════════════════════════════════════════════════════════════════════════════════════════════════════
✨ PIPELINE COMPLETE | Session: abc123 | Total: 1250ms
════════════════════════════════════════════════════════════════════════════════════════════════════
⏱️ Stage Timings:
context : 150ms ( 12.0%)
identity : 10ms ( 0.8%)
monologue : 200ms ( 16.0%)
tools : 0ms ( 0.0%)
reflection : 50ms ( 4.0%)
reasoning : 450ms ( 36.0%)
refinement : 300ms ( 24.0%)
persona : 140ms ( 11.2%)
📤 Output: 342 characters
════════════════════════════════════════════════════════════════════════════════════════════════════
```
### 2. **Configurable Verbosity Levels**
Set via `LOG_DETAIL_LEVEL` environment variable:
- **`minimal`**: Only errors and critical events
- **`summary`**: Stage completion + errors (DEFAULT - recommended for production)
- **`detailed`**: Include raw LLM outputs, RAG results, timing breakdowns (for debugging)
- **`verbose`**: Everything including full JSON dumps (for deep debugging)
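The gate itself lives in the refactored modules; as a rough Python sketch of the idea (the `LEVELS` ordering and the `log_at` helper are illustrative names, not the actual API):
```python
import os

# Ordered quietest → loudest; a message is emitted if its level is
# at or below the currently configured level.
LEVELS = ["minimal", "summary", "detailed", "verbose"]

_current = os.getenv("LOG_DETAIL_LEVEL", "summary").lower()

def enabled(level: str) -> bool:
    """True if messages tagged `level` should appear at the current setting."""
    try:
        return LEVELS.index(level) <= LEVELS.index(_current)
    except ValueError:
        return True  # unknown tags fail open so nothing important is swallowed

def log_at(level: str, message: str) -> None:
    if enabled(level):
        print(message)

# Raw JSON dumps only appear at the loudest setting:
log_at("summary", "✅ [LLM] PRIMARY | Reply: ...")   # shown by default
log_at("verbose", "RAW RESPONSE: {...}")             # hidden by default
```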
### 3. **Raw LLM Output Visibility** ✅
**You can now see raw LLM outputs clearly!**
In `detailed` or `verbose` mode, LLM calls show:
- Backend used
- Prompt preview
- Parsed reply
- **Raw JSON response in collapsible format** (verbose only)
```
╭─ RAW RESPONSE ────────────────────────────────────────────────────────────────────────────
│ {
│ "id": "chatcmpl-123",
│ "object": "chat.completion",
│ "model": "gpt-4",
│ "choices": [
│ {
│ "message": {
│ "content": "Full response here..."
│ }
│ }
│ ]
│ }
╰───────────────────────────────────────────────────────────────────────────────────────────
```
### 4. **Loop Detection & Protection** ✅
**New safety features:**
- **Duplicate message detection**: Prevents processing the same message twice
- **Message history trimming**: Auto-trims to last 100 messages (configurable via `MAX_MESSAGE_HISTORY`)
- **Session TTL**: Auto-expires inactive sessions after 24 hours (configurable via `SESSION_TTL_HOURS`)
- **Hash-based detection**: Uses an MD5 hash of each message to detect exact duplicates (sketched after the example below)
**Example warning when loop detected:**
```
⚠️ DUPLICATE MESSAGE DETECTED | Session: abc123 | Message: What is the meaning of life?
🔁 LOOP DETECTED - Returning cached context to prevent processing duplicate
```
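To make the protection concrete, here is a minimal Python sketch of the approach under the documented settings. `SessionState` and its methods are illustrative names, not the actual `context.py` API:
```python
import hashlib
import os
import time

MAX_MESSAGE_HISTORY = int(os.getenv("MAX_MESSAGE_HISTORY", "100"))
SESSION_TTL_HOURS = float(os.getenv("SESSION_TTL_HOURS", "24"))

class SessionState:
    """Per-session state: message history plus loop-protection bookkeeping."""

    def __init__(self):
        self.messages = []
        self.last_hash = None   # MD5 digest of the previous message
        self.last_seen = time.time()

    def is_duplicate(self, message: str) -> bool:
        """Hash the incoming message and compare against the previous one."""
        digest = hashlib.md5(message.encode("utf-8")).hexdigest()
        if digest == self.last_hash:
            return True  # caller logs the ⚠️ warning and reuses cached context
        self.last_hash = digest
        return False

    def append(self, message: str) -> None:
        """Record a message, trimming history to the configured maximum."""
        self.messages.append(message)
        self.last_seen = time.time()
        if len(self.messages) > MAX_MESSAGE_HISTORY:
            self.messages = self.messages[-MAX_MESSAGE_HISTORY:]

    def expired(self) -> bool:
        """True once the session has been idle longer than the TTL."""
        return time.time() - self.last_seen > SESSION_TTL_HOURS * 3600
```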
### 5. **Performance Timing** ✅
In `detailed` mode, see exactly where time is spent:
```
⏱️ Stage Timings:
context : 150ms ( 12.0%) ← Context collection
identity : 10ms ( 0.8%) ← Identity loading
monologue : 200ms ( 16.0%) ← Inner monologue
tools : 0ms ( 0.0%) ← Autonomous tools
reflection : 50ms ( 4.0%) ← Reflection notes
reasoning : 450ms ( 36.0%) ← Main reasoning (BOTTLENECK)
refinement : 300ms ( 24.0%) ← Answer refinement
persona : 140ms ( 11.2%) ← Persona layer
```
**This helps you identify weak links in the chain!**
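As a rough sketch of how per-stage timings can be collected (`StageTimer` is an illustrative name, not the actual `router.py` code):
```python
import time
from contextlib import contextmanager

class StageTimer:
    """Record how long each pipeline stage takes, then print the breakdown."""

    def __init__(self):
        self.timings = {}  # stage name -> duration in ms

    @contextmanager
    def stage(self, name: str):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.timings[name] = (time.perf_counter() - start) * 1000

    def report(self) -> None:
        total = sum(self.timings.values()) or 1.0
        print("⏱️  Stage Timings:")
        for name, ms in self.timings.items():
            print(f"  {name:<12}: {ms:7.0f}ms ({ms / total:6.1%})")

timer = StageTimer()
with timer.stage("context"):
    time.sleep(0.05)   # stand-in for real work
with timer.stage("reasoning"):
    time.sleep(0.15)
timer.report()
```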
## 📁 Files Modified
### Core Changes
1. **[llm.js](core/relay/lib/llm.js)**
- Removed the massive JSON dump on line 53
- Added structured logging with 4 verbosity levels
- Shows raw responses only in verbose mode (collapsible format)
- Tracks failed backends and shows a summary when every backend fails
2. **[context.py](cortex/context.py)**
- Condensed 71-line NeoMem loop to 5-line summary
- Removed repeated intake data dumps
- Added structured hierarchical logging with boxes
- Added duplicate message detection
- Added message history trimming
- Added session TTL and cleanup
3. **[router.py](cortex/router.py)**
- Replaced 15+ stage logs with unified pipeline summary
- Added stage timing collection
- Shows performance breakdown in detailed mode
- Clean start/end markers with total duration
### New Files
4. **[utils/logging_utils.py](cortex/utils/logging_utils.py)** (NEW)
- Reusable structured logging utilities
- `PipelineLogger` class for hierarchical logging
- Collapsible data sections (the boxed blocks shown throughout this document; sketched after this list)
- Stage timing tracking
- Designed for reuse as new pipeline stages are added
5. **[.env.logging.example](.env.logging.example)** (NEW)
- Complete logging configuration guide
- Shows example output at each verbosity level
- Documents all environment variables
- Production-ready defaults
6. **[LOGGING_REFACTOR_SUMMARY.md](LOGGING_REFACTOR_SUMMARY.md)** (THIS FILE)
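The boxed blocks shown throughout this document (INTAKE SUMMARIES, RAG RESULTS, RAW RESPONSE) come from these utilities. A minimal sketch of the rendering idea, with `boxed_section` as a hypothetical name:
```python
def boxed_section(title: str, lines: list, width: int = 68) -> str:
    """Render a titled box like the RAG RESULTS blocks above."""
    top = f"╭─ {title} " + "─" * max(0, width - len(title) - 4)
    body = "\n".join(f"│ {line}" for line in lines)
    bottom = "╰" + "─" * (width - 1)
    return "\n".join([top, body, bottom])

print(boxed_section("RAG RESULTS (2)", [
    "[1] 0.923 | Previous discussion about purpose and meaning...",
    "[2] 0.891 | Note about existential philosophy...",
]))
```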
## 🚀 How to Use
### For Finding Weak Links (Your Use Case)
```bash
# Set in your .env or export:
export LOG_DETAIL_LEVEL=detailed
export VERBOSE_DEBUG=false # or true for even more detail
# Now run your chat - you'll see:
# 1. Which LLM backend is used
# 2. Raw LLM outputs (in verbose mode)
# 3. Exact timing per stage
# 4. Which stage is taking longest
```
### For Production
```bash
export LOG_DETAIL_LEVEL=summary
# Minimal, clean logs:
# ✅ [LLM] PRIMARY | 14:23:45.123 | Reply: Based on your question...
# ✨ PIPELINE COMPLETE | Session: abc123 | Total: 1250ms
```
### For Deep Debugging
```bash
export LOG_DETAIL_LEVEL=verbose
export LOG_RAW_CONTEXT_DATA=true
# Shows EVERYTHING including full JSON dumps
```
## 🔍 Finding Weak Links - Quick Guide
**Problem: "Which LLM stage is failing or producing bad output?"**
1. Set `LOG_DETAIL_LEVEL=detailed`
2. Run a test conversation
3. Look for timing anomalies:
```
reasoning : 3450ms ( 76.0%) ← BOTTLENECK!
```
4. Look for errors:
```
⚠️ Reflection failed: Connection timeout
```
5. Check raw LLM outputs (set `VERBOSE_DEBUG=true`):
```
╭─ RAW RESPONSE ────────────────────────────────────
│ {
│ "choices": [
│ { "message": { "content": "..." } }
│ ]
│ }
╰───────────────────────────────────────────────────
```
**Problem: "Is the loop repeating operations?"**
1. Enable duplicate detection (on by default)
2. Look for loop warnings:
```
⚠️ DUPLICATE MESSAGE DETECTED | Session: abc123
🔁 LOOP DETECTED - Returning cached context
```
3. Check stage timings - repeated stages will show up as duplicates
**Problem: "Which RAG memories are being used?"**
1. Set `LOG_DETAIL_LEVEL=detailed`
2. Look for RAG results box:
```
╭─ RAG RESULTS (5) ──────────────────────────────
│ [1] 0.923 | Previous discussion about X...
│ [2] 0.891 | Note about Y...
╰────────────────────────────────────────────────
```
## 📊 Environment Variables Reference
| Variable | Default | Description |
|----------|---------|-------------|
| `LOG_DETAIL_LEVEL` | `summary` | Verbosity: minimal/summary/detailed/verbose |
| `VERBOSE_DEBUG` | `false` | Legacy flag for maximum verbosity |
| `LOG_RAW_CONTEXT_DATA` | `false` | Show full intake data dumps |
| `ENABLE_DUPLICATE_DETECTION` | `true` | Detect and prevent duplicate messages |
| `MAX_MESSAGE_HISTORY` | `100` | Max messages to keep per session |
| `SESSION_TTL_HOURS` | `24` | Auto-expire sessions after N hours |
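As an illustration, these variables and their documented defaults could be loaded in one place (`LogConfig` is a hypothetical name, not the actual config object):
```python
import os
from dataclasses import dataclass

def _bool(name: str, default: str) -> bool:
    return os.getenv(name, default).lower() in ("1", "true", "yes")

@dataclass(frozen=True)
class LogConfig:
    """All documented logging knobs, read once from the environment."""
    detail_level: str = os.getenv("LOG_DETAIL_LEVEL", "summary")
    verbose_debug: bool = _bool("VERBOSE_DEBUG", "false")
    raw_context_data: bool = _bool("LOG_RAW_CONTEXT_DATA", "false")
    duplicate_detection: bool = _bool("ENABLE_DUPLICATE_DETECTION", "true")
    max_message_history: int = int(os.getenv("MAX_MESSAGE_HISTORY", "100"))
    session_ttl_hours: float = float(os.getenv("SESSION_TTL_HOURS", "24"))

config = LogConfig()
```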
## 🎉 Results
**Before:** 1000+ lines of logs per chat message, unreadable, couldn't identify issues
**After (summary mode):** 5 lines of structured logs, clear and actionable
**After (detailed mode):** ~50 lines with full visibility into each stage, timing, and raw outputs
**Loop protection:** Automatic detection and prevention of duplicate processing
**You can now:**
✅ See raw LLM outputs clearly (in detailed/verbose mode)
✅ Identify performance bottlenecks (stage timings)
✅ Detect loops and duplicates (automatic)
✅ Find failing stages (error markers)
✅ Scan logs quickly (hierarchical structure)
✅ Debug production issues (adjustable verbosity)
## 🔧 Next Steps (Optional Improvements)
1. **Structured JSON logging**: Output as JSON for log aggregation tools
2. **Log rotation**: Implement file rotation for verbose logs
3. **Metrics export**: Export stage timings to Prometheus/Grafana
4. **Error categorization**: Tag errors by type (network, timeout, parsing, etc.)
5. **Performance alerts**: Auto-alert when stages exceed thresholds
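For item 1, one possible shape is newline-delimited JSON; this sketch is illustrative only and not part of this commit (`log_json` is a hypothetical helper):
```python
import json
import time

def log_json(event: str, **fields) -> None:
    """Emit one JSON object per line, ready for log aggregation tools."""
    print(json.dumps({"ts": time.time(), "event": event, **fields}))

log_json("pipeline_complete", session="abc123", total_ms=1250)
```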
---
**Happy debugging! You can now see what's actually happening in the cortex loop.** 🎯