# Cortex Logging Refactor Summary

## 🎯 Problem Statement

The cortex chat loop had severe logging issues that made debugging impossible:

1. **Massive verbosity**: 100+ log lines per chat message
2. **Raw LLM dumps**: Full JSON responses pretty-printed on every call (thousands of lines)
3. **Repeated data**: NeoMem results logged 71 times individually
4. **No structure**: Scattered emoji logs with no hierarchy
5. **Impossible to debug**: Couldn't tell whether loops were actually happening or the logging was just verbose
6. **No loop protection**: Unbounded message history growth, no session cleanup, no duplicate detection

## ✅ What Was Fixed

### 1. **Structured Hierarchical Logging**

**Before:**
```
🔍 RAW LLM RESPONSE: {
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "gpt-4",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Here is a very long response that goes on for hundreds of lines..."
      }
    }
  ],
  "usage": {
    "prompt_tokens": 123,
    "completion_tokens": 456,
    "total_tokens": 579
  }
}
🧠 Trying backend: PRIMARY (http://localhost:8000)
✅ Success via PRIMARY
[STAGE 0] Collecting unified context...
[STAGE 0] Context collected - 5 RAG results
[COLLECT_CONTEXT] Intake data retrieved:
{
  "L1": [...],
  "L5": [...],
  "L10": {...},
  "L20": {...},
  "L30": {...}
}
[COLLECT_CONTEXT] NeoMem search returned 71 results
  [1] Score: 0.923 - Memory content here...
  [2] Score: 0.891 - More memory content...
  [3] Score: 0.867 - Even more content...
  ... (68 more lines)
```

**After (summary mode - DEFAULT):**
```
✅ [LLM] PRIMARY | 14:23:45.123 | Reply: Based on your question about...
📊 Context | Session: abc123 | Messages: 42 | Last: 5.2min | RAG: 5 results
🧠 Monologue | question | Tone: curious
✨ PIPELINE COMPLETE | Session: abc123 | Total: 1250ms
📤 Output: 342 characters
```

**After (detailed mode - for debugging):**
```
════════════════════════════════════════════════════════════════════════════════════════════════════
🚀 PIPELINE START | Session: abc123 | 14:23:45.123
════════════════════════════════════════════════════════════════════════════════════════════════════
📝 User: What is the meaning of life?
────────────────────────────────────────────────────────────────────────────────────────────────────

────────────────────────────────────────────────────────────────────────────────────────────────────
🧠 LLM CALL | Backend: PRIMARY | 14:23:45.234
────────────────────────────────────────────────────────────────────────────────────────────────────
📝 Prompt: You are Lyra, a thoughtful AI assistant...
💬 Reply: Based on philosophical perspectives, the meaning...

📊 Context | Session: abc123 | Messages: 42 | Last: 5.2min | RAG: 5 results
────────────────────────────────────────────────────────────────────────────────────────────────────
[CONTEXT] Session abc123 | User: What is the meaning of life?
────────────────────────────────────────────────────────────────────────────────────────────────────
Mode: default | Mood: neutral | Project: None
Tools: RAG, WEB, WEATHER, CODEBRAIN, POKERBRAIN

╭─ INTAKE SUMMARIES ────────────────────────────────────────────────
│ L1  : Last message discussed philosophy...
│ L5  : Recent 5 messages covered existential topics...
│ L10 : Past 10 messages showed curiosity pattern...
│ L20 : Session focused on deep questions...
│ L30 : Long-term trend shows philosophical interest...
╰───────────────────────────────────────────────────────────────────

╭─ RAG RESULTS (5) ──────────────────────────────────────────────
│ [1] 0.923 | Previous discussion about purpose and meaning...
│ [2] 0.891 | Note about existential philosophy...
│ [3] 0.867 | Memory of Viktor Frankl discussion...
│ [4] 0.834 | Reference to stoic philosophy...
│ [5] 0.801 | Buddhism and the middle path...
╰───────────────────────────────────────────────────────────────────
────────────────────────────────────────────────────────────────────────────────────────────────────

════════════════════════════════════════════════════════════════════════════════════════════════════
✨ PIPELINE COMPLETE | Session: abc123 | Total: 1250ms
════════════════════════════════════════════════════════════════════════════════════════════════════
⏱️  Stage Timings:
  context    :  150ms ( 12.0%)
  identity   :   10ms (  0.8%)
  monologue  :  200ms ( 16.0%)
  tools      :    0ms (  0.0%)
  reflection :   50ms (  4.0%)
  reasoning  :  450ms ( 36.0%)
  refinement :  250ms ( 20.0%)
  persona    :  140ms ( 11.2%)
📤 Output: 342 characters
════════════════════════════════════════════════════════════════════════════════════════════════════
```

### 2. **Configurable Verbosity Levels**

Set via the `LOG_DETAIL_LEVEL` environment variable:

- **`minimal`**: Only errors and critical events
- **`summary`**: Stage completion + errors (DEFAULT - recommended for production)
- **`detailed`**: Include raw LLM outputs, RAG results, timing breakdowns (for debugging)
- **`verbose`**: Everything, including full JSON dumps (for deep debugging)

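For illustration, level gating can be as simple as comparing positions in an ordered list. A minimal sketch, assuming a simple `log_at` helper (hypothetical, not the exact API in `utils/logging_utils.py`):

```python
import os

# Ordered from quietest to loudest; higher index = more output.
LEVELS = ["minimal", "summary", "detailed", "verbose"]

# Read once at startup; fall back to the documented default.
_current = os.getenv("LOG_DETAIL_LEVEL", "summary").lower()
if _current not in LEVELS:
    _current = "summary"

def log_at(level: str, message: str) -> None:
    """Emit the message only if the configured level includes it."""
    if LEVELS.index(level) <= LEVELS.index(_current):
        print(message)

# Visible at 'summary' and above (i.e. by default):
log_at("summary", "✨ PIPELINE COMPLETE | Session: abc123 | Total: 1250ms")
# Visible only when LOG_DETAIL_LEVEL=verbose:
log_at("verbose", "RAW RESPONSE: {...}")
```
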
### 3. **Raw LLM Output Visibility** ✅

**You can now see raw LLM outputs clearly!**

In `detailed` or `verbose` mode, LLM calls show:

- Backend used
- Prompt preview
- Parsed reply
- **Raw JSON response in collapsible format** (verbose only)

```
╭─ RAW RESPONSE ────────────────────────────────────────────────────────────────────────────
│ {
│   "id": "chatcmpl-123",
│   "object": "chat.completion",
│   "model": "gpt-4",
│   "choices": [
│     {
│       "message": {
│         "content": "Full response here..."
│       }
│     }
│   ]
│ }
╰───────────────────────────────────────────────────────────────────────────────────────────
```
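
The prompt and reply lines shown in detailed mode are previews, not full dumps. A one-line preview helper might look like this (a sketch, not the project's actual truncation logic):

```python
def preview(text: str, limit: int = 80) -> str:
    """Collapse whitespace and truncate for one-line log previews."""
    flat = " ".join(text.split())
    return flat if len(flat) <= limit else flat[: limit - 3] + "..."

# e.g. 📝 Prompt: You are Lyra, a thoughtful AI assistant...
print("📝 Prompt:", preview("You are Lyra, a thoughtful AI assistant with a very long system prompt..."))
```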

### 4. **Loop Detection & Protection** ✅

**New safety features:**

- **Duplicate message detection**: Prevents processing the same message twice
- **Message history trimming**: Auto-trims to the last 100 messages (configurable via `MAX_MESSAGE_HISTORY`)
- **Session TTL**: Auto-expires inactive sessions after 24 hours (configurable via `SESSION_TTL_HOURS`)
- **Hash-based detection**: Uses an MD5 hash to detect exact duplicate messages (see the sketch after the example below)

**Example warning when loop detected:**
```
⚠️ DUPLICATE MESSAGE DETECTED | Session: abc123 | Message: What is the meaning of life?
🔁 LOOP DETECTED - Returning cached context to prevent processing duplicate
```
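
A condensed sketch of all three protections, with hypothetical names (the real logic in `cortex/context.py` may differ):

```python
import hashlib
import time

# Illustrative in-memory session state (hypothetical structure).
sessions: dict[str, dict] = {}

MAX_MESSAGE_HISTORY = 100  # mirrors the env var of the same name
SESSION_TTL_HOURS = 24

def is_duplicate(session_id: str, message: str) -> bool:
    """Hash-based check: True if this exact message was just processed."""
    digest = hashlib.md5(message.encode("utf-8")).hexdigest()
    state = sessions.setdefault(
        session_id, {"last_hash": None, "history": [], "touched": 0.0}
    )
    state["touched"] = time.time()
    if state["last_hash"] == digest:
        return True  # caller logs the ⚠️/🔁 warnings and returns cached context
    state["last_hash"] = digest
    return False

def trim_history(session_id: str) -> None:
    """Keep only the most recent MAX_MESSAGE_HISTORY messages."""
    state = sessions[session_id]
    state["history"] = state["history"][-MAX_MESSAGE_HISTORY:]

def expire_stale_sessions() -> None:
    """Drop sessions idle for longer than SESSION_TTL_HOURS."""
    cutoff = time.time() - SESSION_TTL_HOURS * 3600
    for sid in [s for s, st in sessions.items() if st["touched"] < cutoff]:
        del sessions[sid]
```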

### 5. **Performance Timing** ✅

In `detailed` mode, see exactly where time is spent:

```
⏱️  Stage Timings:
  context    :  150ms ( 12.0%)  ← Context collection
  identity   :   10ms (  0.8%)  ← Identity loading
  monologue  :  200ms ( 16.0%)  ← Inner monologue
  tools      :    0ms (  0.0%)  ← Autonomous tools
  reflection :   50ms (  4.0%)  ← Reflection notes
  reasoning  :  450ms ( 36.0%)  ← Main reasoning (BOTTLENECK)
  refinement :  250ms ( 20.0%)  ← Answer refinement
  persona    :  140ms ( 11.2%)  ← Persona layer
```

**This helps you identify weak links in the chain!**

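A per-stage timer like the one behind this table can be sketched with a context manager (hypothetical helper; `router.py`'s actual collection may differ):

```python
import time
from contextlib import contextmanager

timings: dict[str, float] = {}

@contextmanager
def timed_stage(name: str):
    """Record the wall-clock duration of one pipeline stage in ms."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = (time.perf_counter() - start) * 1000

# Usage inside the pipeline:
with timed_stage("reasoning"):
    time.sleep(0.01)  # stand-in for the real reasoning call

total = sum(timings.values()) or 1.0
for name, ms in timings.items():
    print(f"  {name:<11}: {ms:5.0f}ms ({ms / total * 100:5.1f}%)")
```
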
## 📁 Files Modified

### Core Changes

1. **[llm.js](core/relay/lib/llm.js)**
   - Removed massive JSON dump on line 53
   - Added structured logging with 4 verbosity levels
   - Shows raw responses only in verbose mode (collapsible format)
   - Tracks failed backends and shows a summary on total failure

2. **[context.py](cortex/context.py)**
   - Condensed the 71-line NeoMem loop to a 5-line summary
   - Removed repeated intake data dumps
   - Added structured hierarchical logging with boxes
   - Added duplicate message detection
   - Added message history trimming
   - Added session TTL and cleanup

3. **[router.py](cortex/router.py)**
   - Replaced 15+ stage logs with a unified pipeline summary
   - Added stage timing collection
   - Shows performance breakdown in detailed mode
   - Clean start/end markers with total duration

### New Files

4. **[utils/logging_utils.py](cortex/utils/logging_utils.py)** (NEW)
   - Reusable structured logging utilities
   - `PipelineLogger` class for hierarchical logging
   - Collapsible data sections
   - Stage timing tracking
   - Future-ready for expansion

5. **[.env.logging.example](.env.logging.example)** (NEW)
   - Complete logging configuration guide
   - Shows example output at each verbosity level
   - Documents all environment variables
   - Production-ready defaults

6. **[LOGGING_REFACTOR_SUMMARY.md](LOGGING_REFACTOR_SUMMARY.md)** (THIS FILE)

## 🚀 How to Use

### For Finding Weak Links (Your Use Case)

```bash
# Set in your .env or export:
export LOG_DETAIL_LEVEL=detailed
export VERBOSE_DEBUG=false  # or true for even more detail

# Now run your chat - you'll see:
# 1. Which LLM backend is used
# 2. Raw LLM outputs (in verbose mode)
# 3. Exact timing per stage
# 4. Which stage is taking longest
```

### For Production

```bash
export LOG_DETAIL_LEVEL=summary

# Minimal, clean logs:
# ✅ [LLM] PRIMARY | 14:23:45.123 | Reply: Based on your question...
# ✨ PIPELINE COMPLETE | Session: abc123 | Total: 1250ms
```

### For Deep Debugging

```bash
export LOG_DETAIL_LEVEL=verbose
export LOG_RAW_CONTEXT_DATA=true

# Shows EVERYTHING including full JSON dumps
```

## 🔍 Finding Weak Links - Quick Guide

**Problem: "Which LLM stage is failing or producing bad output?"**

1. Set `LOG_DETAIL_LEVEL=detailed`
2. Run a test conversation
3. Look for timing anomalies:
   ```
   reasoning  : 3450ms ( 76.0%)  ← BOTTLENECK!
   ```
4. Look for errors:
   ```
   ⚠️ Reflection failed: Connection timeout
   ```
5. Check raw LLM outputs (set `VERBOSE_DEBUG=true`):
   ```
   ╭─ RAW RESPONSE ────────────────────────────────────
   │ {
   │   "choices": [
   │     { "message": { "content": "..." } }
   │   ]
   │ }
   ╰───────────────────────────────────────────────────
   ```

**Problem: "Is the loop repeating operations?"**

1. Enable duplicate detection (on by default)
2. Look for loop warnings:
   ```
   ⚠️ DUPLICATE MESSAGE DETECTED | Session: abc123
   🔁 LOOP DETECTED - Returning cached context
   ```
3. Check stage timings - repeated stages will show up as duplicates

**Problem: "Which RAG memories are being used?"**

1. Set `LOG_DETAIL_LEVEL=detailed`
2. Look for the RAG results box:
   ```
   ╭─ RAG RESULTS (5) ──────────────────────────────
   │ [1] 0.923 | Previous discussion about X...
   │ [2] 0.891 | Note about Y...
   ╰────────────────────────────────────────────────
   ```

## 📊 Environment Variables Reference

| Variable | Default | Description |
|----------|---------|-------------|
| `LOG_DETAIL_LEVEL` | `summary` | Verbosity: minimal/summary/detailed/verbose |
| `VERBOSE_DEBUG` | `false` | Legacy flag for maximum verbosity |
| `LOG_RAW_CONTEXT_DATA` | `false` | Show full intake data dumps |
| `ENABLE_DUPLICATE_DETECTION` | `true` | Detect and prevent duplicate messages |
| `MAX_MESSAGE_HISTORY` | `100` | Max messages to keep per session |
| `SESSION_TTL_HOURS` | `24` | Auto-expire sessions after N hours |

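For illustration, these variables might be read once at startup along these lines (a sketch under the documented defaults, not the project's actual config loader):

```python
import os

def _env_bool(name: str, default: bool) -> bool:
    """Parse a boolean-ish environment variable ('true', '1', 'yes')."""
    return os.getenv(name, str(default)).strip().lower() in ("1", "true", "yes")

LOG_DETAIL_LEVEL = os.getenv("LOG_DETAIL_LEVEL", "summary")
VERBOSE_DEBUG = _env_bool("VERBOSE_DEBUG", False)
LOG_RAW_CONTEXT_DATA = _env_bool("LOG_RAW_CONTEXT_DATA", False)
ENABLE_DUPLICATE_DETECTION = _env_bool("ENABLE_DUPLICATE_DETECTION", True)
MAX_MESSAGE_HISTORY = int(os.getenv("MAX_MESSAGE_HISTORY", "100"))
SESSION_TTL_HOURS = int(os.getenv("SESSION_TTL_HOURS", "24"))
```
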
## 🎉 Results

**Before:** 1000+ lines of logs per chat message, unreadable, couldn't identify issues

**After (summary mode):** 5 lines of structured logs, clear and actionable

**After (detailed mode):** ~50 lines with full visibility into each stage, timing, and raw outputs

**Loop protection:** Automatic detection and prevention of duplicate processing

**You can now:**

- ✅ See raw LLM outputs clearly (in detailed/verbose mode)
- ✅ Identify performance bottlenecks (stage timings)
- ✅ Detect loops and duplicates (automatic)
- ✅ Find failing stages (error markers)
- ✅ Scan logs quickly (hierarchical structure)
- ✅ Debug production issues (adjustable verbosity)

## 🔧 Next Steps (Optional Improvements)

1. **Structured JSON logging**: Output as JSON for log aggregation tools (see the sketch after this list)
2. **Log rotation**: Implement file rotation for verbose logs
3. **Metrics export**: Export stage timings to Prometheus/Grafana
4. **Error categorization**: Tag errors by type (network, timeout, parsing, etc.)
5. **Performance alerts**: Auto-alert when stages exceed thresholds

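Item 1 could build on the standard library, for example (a sketch, not a committed design):

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Emit each record as one JSON line for log aggregation tools."""
    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logging.basicConfig(level=logging.INFO, handlers=[handler])
logging.getLogger("cortex").info("PIPELINE COMPLETE | Session: abc123")
```
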
---

**Happy debugging! You can now see what's actually happening in the cortex loop.** 🎯