# Cortex Logging Refactor Summary

## 🎯 Problem Statement

The cortex chat loop had severe logging issues that made debugging impossible:

1. **Massive verbosity**: 100+ log lines per chat message
2. **Raw LLM dumps**: full JSON responses pretty-printed on every call (thousands of lines)
3. **Repeated data**: NeoMem results logged 71 times, one line each
4. **No structure**: scattered emoji logs with no hierarchy
5. **Impossible to debug**: couldn't tell whether loops were actually happening or the logging was just verbose
6. **No loop protection**: unbounded message history growth, no session cleanup, no duplicate detection
## ✅ What Was Fixed

### 1. **Structured Hierarchical Logging**

**Before:**

```
🔍 RAW LLM RESPONSE: {
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "gpt-4",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Here is a very long response that goes on for hundreds of lines..."
      }
    }
  ],
  "usage": {
    "prompt_tokens": 123,
    "completion_tokens": 456,
    "total_tokens": 579
  }
}
🧠 Trying backend: PRIMARY (http://localhost:8000)
✅ Success via PRIMARY
[STAGE 0] Collecting unified context...
[STAGE 0] Context collected - 5 RAG results
[COLLECT_CONTEXT] Intake data retrieved:
{
  "L1": [...],
  "L5": [...],
  "L10": {...},
  "L20": {...},
  "L30": {...}
}
[COLLECT_CONTEXT] NeoMem search returned 71 results
  [1] Score: 0.923 - Memory content here...
  [2] Score: 0.891 - More memory content...
  [3] Score: 0.867 - Even more content...
  ... (68 more lines)
```
**After (summary mode - DEFAULT):**

```
✅ [LLM] PRIMARY | 14:23:45.123 | Reply: Based on your question about...
📊 Context | Session: abc123 | Messages: 42 | Last: 5.2min | RAG: 5 results
🧠 Monologue | question | Tone: curious
✨ PIPELINE COMPLETE | Session: abc123 | Total: 1250ms
📤 Output: 342 characters
```
**After (detailed mode - for debugging):**

```
════════════════════════════════════════════════════════════════════
🚀 PIPELINE START | Session: abc123 | 14:23:45.123
════════════════════════════════════════════════════════════════════
📝 User: What is the meaning of life?
────────────────────────────────────────────────────────────────────

────────────────────────────────────────────────────────────────────
🧠 LLM CALL | Backend: PRIMARY | 14:23:45.234
────────────────────────────────────────────────────────────────────
📝 Prompt: You are Lyra, a thoughtful AI assistant...
💬 Reply: Based on philosophical perspectives, the meaning...

📊 Context | Session: abc123 | Messages: 42 | Last: 5.2min | RAG: 5 results
────────────────────────────────────────────────────────────────────
[CONTEXT] Session abc123 | User: What is the meaning of life?
────────────────────────────────────────────────────────────────────
Mode: default | Mood: neutral | Project: None
Tools: RAG, WEB, WEATHER, CODEBRAIN, POKERBRAIN

╭─ INTAKE SUMMARIES ─────────────────────────────────────────────────
│  L1  : Last message discussed philosophy...
│  L5  : Recent 5 messages covered existential topics...
│  L10 : Past 10 messages showed curiosity pattern...
│  L20 : Session focused on deep questions...
│  L30 : Long-term trend shows philosophical interest...
╰────────────────────────────────────────────────────────────────────

╭─ RAG RESULTS (5) ──────────────────────────────────────────────────
│  [1] 0.923 | Previous discussion about purpose and meaning...
│  [2] 0.891 | Note about existential philosophy...
│  [3] 0.867 | Memory of Viktor Frankl discussion...
│  [4] 0.834 | Reference to stoic philosophy...
│  [5] 0.801 | Buddhism and the middle path...
╰────────────────────────────────────────────────────────────────────
────────────────────────────────────────────────────────────────────

════════════════════════════════════════════════════════════════════
✨ PIPELINE COMPLETE | Session: abc123 | Total: 1250ms
════════════════════════════════════════════════════════════════════
⏱️  Stage Timings:
   context     :    150ms ( 12.0%)
   identity    :     10ms (  0.8%)
   monologue   :    200ms ( 16.0%)
   tools       :      0ms (  0.0%)
   reflection  :     50ms (  4.0%)
   reasoning   :    450ms ( 36.0%)
   refinement  :    300ms ( 24.0%)
   persona     :    140ms ( 11.2%)
📤 Output: 342 characters
════════════════════════════════════════════════════════════════════
```
### 2. **Configurable Verbosity Levels**

Set via the `LOG_DETAIL_LEVEL` environment variable:

- **`minimal`**: only errors and critical events
- **`summary`**: stage completion + errors (DEFAULT - recommended for production)
- **`detailed`**: includes raw LLM outputs, RAG results, and timing breakdowns (for debugging)
- **`verbose`**: everything, including full JSON dumps (for deep debugging)
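A minimal sketch of how such a gate might work (the `LOG_DETAIL_LEVEL` variable is from this refactor, but the `should_log` helper and the level ordering below are illustrative assumptions, not the actual implementation):

```python
import os

# Ordered least → most verbose; comparing indices gives a cheap gate.
LEVELS = ["minimal", "summary", "detailed", "verbose"]

def should_log(required: str) -> bool:
    """True if the configured LOG_DETAIL_LEVEL is at least `required`."""
    current = os.getenv("LOG_DETAIL_LEVEL", "summary")
    if current not in LEVELS:
        current = "summary"  # fall back to the documented default
    return LEVELS.index(current) >= LEVELS.index(required)
```

With the default level, `should_log("summary")` is true while `should_log("detailed")` is false, so raw dumps never reach production logs.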
### 3. **Raw LLM Output Visibility** ✅

**You can now see raw LLM outputs clearly!**

In `detailed` or `verbose` mode, LLM calls show:

- Backend used
- Prompt preview
- Parsed reply
- **Raw JSON response in collapsible format** (verbose only)

```
╭─ RAW RESPONSE ─────────────────────────────────────────────────────
│ {
│   "id": "chatcmpl-123",
│   "object": "chat.completion",
│   "model": "gpt-4",
│   "choices": [
│     {
│       "message": {
│         "content": "Full response here..."
│       }
│     }
│   ]
│ }
╰────────────────────────────────────────────────────────────────────
```
### 4. **Loop Detection & Protection** ✅

**New safety features:**

- **Duplicate message detection**: prevents processing the same message twice
- **Message history trimming**: auto-trims to the last 100 messages (configurable via `MAX_MESSAGE_HISTORY`)
- **Session TTL**: auto-expires inactive sessions after 24 hours (configurable via `SESSION_TTL_HOURS`)
- **Hash-based detection**: uses an MD5 hash to detect exact duplicate messages

**Example warning when a loop is detected:**

```
⚠️ DUPLICATE MESSAGE DETECTED | Session: abc123 | Message: What is the meaning of life?
🔁 LOOP DETECTED - Returning cached context to prevent processing duplicate
```
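The protections above can be sketched roughly as follows (a minimal illustration assuming in-memory session state; the helper names `is_duplicate` and `trim_history` are invented here, not the actual `context.py` API):

```python
import hashlib

MAX_MESSAGE_HISTORY = 100          # mirrors the MAX_MESSAGE_HISTORY env var
_last_hash: dict[str, str] = {}    # per-session hash of the last message seen

def is_duplicate(session_id: str, message: str) -> bool:
    """True if this exact message was the last one processed for the session."""
    digest = hashlib.md5(message.encode("utf-8")).hexdigest()
    if _last_hash.get(session_id) == digest:
        return True
    _last_hash[session_id] = digest
    return False

def trim_history(history: list) -> list:
    """Keep only the most recent MAX_MESSAGE_HISTORY messages."""
    return history[-MAX_MESSAGE_HISTORY:]
```

MD5 is fine here because the hash is only used for exact-duplicate comparison, not for security.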
### 5. **Performance Timing** ✅

In `detailed` mode, see exactly where time is spent:

```
⏱️  Stage Timings:
   context     :    150ms ( 12.0%)  ← Context collection
   identity    :     10ms (  0.8%)  ← Identity loading
   monologue   :    200ms ( 16.0%)  ← Inner monologue
   tools       :      0ms (  0.0%)  ← Autonomous tools
   reflection  :     50ms (  4.0%)  ← Reflection notes
   reasoning   :    450ms ( 36.0%)  ← Main reasoning (BOTTLENECK)
   refinement  :    300ms ( 24.0%)  ← Answer refinement
   persona     :    140ms ( 11.2%)  ← Persona layer
```

**This helps you identify weak links in the chain!**
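Collecting per-stage timings like these can be sketched with a small context manager (illustrative only — the real collection lives in `router.py`, and these helper names are assumptions):

```python
import time
from contextlib import contextmanager

timings: dict[str, float] = {}

@contextmanager
def stage(name: str):
    """Record the wall-clock duration of a pipeline stage in milliseconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = (time.perf_counter() - start) * 1000

def report() -> str:
    """Render a breakdown in the style shown above."""
    total = sum(timings.values()) or 1.0
    return "\n".join(
        f"  {name:<11}: {ms:6.0f}ms ({ms / total * 100:5.1f}%)"
        for name, ms in timings.items()
    )

with stage("context"):
    time.sleep(0.01)  # stand-in for real work
```

The `finally` clause ensures a stage that raises still records its duration, so a failing stage shows up in the breakdown instead of vanishing.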
## 📁 Files Modified

### Core Changes

1. **[llm.js](core/relay/lib/llm.js)**
   - Removed the massive JSON dump on line 53
   - Added structured logging with 4 verbosity levels
   - Shows raw responses only in verbose mode (collapsible format)
   - Tracks failed backends and shows a summary on total failure

2. **[context.py](cortex/context.py)**
   - Condensed the 71-line NeoMem loop to a 5-line summary
   - Removed repeated intake data dumps
   - Added structured hierarchical logging with boxes
   - Added duplicate message detection
   - Added message history trimming
   - Added session TTL and cleanup

3. **[router.py](cortex/router.py)**
   - Replaced 15+ stage logs with a unified pipeline summary
   - Added stage timing collection
   - Shows a performance breakdown in detailed mode
   - Clean start/end markers with total duration

### New Files

4. **[utils/logging_utils.py](cortex/utils/logging_utils.py)** (NEW)
   - Reusable structured logging utilities
   - `PipelineLogger` class for hierarchical logging
   - Collapsible data sections
   - Stage timing tracking
   - Future-ready for expansion

5. **[.env.logging.example](.env.logging.example)** (NEW)
   - Complete logging configuration guide
   - Shows example output at each verbosity level
   - Documents all environment variables
   - Production-ready defaults

6. **[LOGGING_REFACTOR_SUMMARY.md](LOGGING_REFACTOR_SUMMARY.md)** (THIS FILE)
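As a rough illustration of the `PipelineLogger` idea (the class name comes from `logging_utils.py`, but the methods and box drawing below are guesses at the shape, not its real API):

```python
class PipelineLogger:
    """Hierarchical console logger with boxed data sections (sketch)."""

    WIDTH = 68

    def header(self, title: str) -> None:
        """Heavy rule above and below a pipeline start/end marker."""
        print("═" * self.WIDTH)
        print(title)
        print("═" * self.WIDTH)

    def box(self, title: str, lines: list) -> None:
        """Collapsible-style box like the RAG RESULTS sections above."""
        print(f"╭─ {title} " + "─" * max(0, self.WIDTH - len(title) - 4))
        for line in lines:
            print(f"│ {line}")
        print("╰" + "─" * (self.WIDTH - 1))

log = PipelineLogger()
log.box("RAG RESULTS (2)", [
    "[1] 0.923 | Previous discussion about purpose...",
    "[2] 0.891 | Note about existential philosophy...",
])
```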
## 🚀 How to Use

### For Finding Weak Links (Your Use Case)

```bash
# Set in your .env or export:
export LOG_DETAIL_LEVEL=detailed
export VERBOSE_DEBUG=false   # or true for even more detail

# Now run your chat - you'll see:
# 1. Which LLM backend is used
# 2. Raw LLM outputs (in verbose mode)
# 3. Exact timing per stage
# 4. Which stage is taking longest
```

### For Production

```bash
export LOG_DETAIL_LEVEL=summary

# Minimal, clean logs:
# ✅ [LLM] PRIMARY | 14:23:45.123 | Reply: Based on your question...
# ✨ PIPELINE COMPLETE | Session: abc123 | Total: 1250ms
```

### For Deep Debugging

```bash
export LOG_DETAIL_LEVEL=verbose
export LOG_RAW_CONTEXT_DATA=true

# Shows EVERYTHING including full JSON dumps
```
## 🔍 Finding Weak Links - Quick Guide

**Problem: "Which LLM stage is failing or producing bad output?"**

1. Set `LOG_DETAIL_LEVEL=detailed`
2. Run a test conversation
3. Look for timing anomalies:
   ```
   reasoning   :   3450ms ( 76.0%)  ← BOTTLENECK!
   ```
4. Look for errors:
   ```
   ⚠️ Reflection failed: Connection timeout
   ```
5. Check raw LLM outputs (set `VERBOSE_DEBUG=true`):
   ```
   ╭─ RAW RESPONSE ────────────────────────────────────
   │ {
   │   "choices": [
   │     { "message": { "content": "..." } }
   │   ]
   │ }
   ╰───────────────────────────────────────────────────
   ```

**Problem: "Is the loop repeating operations?"**

1. Duplicate detection is on by default
2. Look for loop warnings:
   ```
   ⚠️ DUPLICATE MESSAGE DETECTED | Session: abc123
   🔁 LOOP DETECTED - Returning cached context
   ```
3. Check stage timings - repeated stages will show up as duplicates

**Problem: "Which RAG memories are being used?"**

1. Set `LOG_DETAIL_LEVEL=detailed`
2. Look for the RAG results box:
   ```
   ╭─ RAG RESULTS (5) ──────────────────────────────
   │ [1] 0.923 | Previous discussion about X...
   │ [2] 0.891 | Note about Y...
   ╰────────────────────────────────────────────────
   ```
## 📊 Environment Variables Reference

| Variable | Default | Description |
|----------|---------|-------------|
| `LOG_DETAIL_LEVEL` | `summary` | Verbosity: minimal/summary/detailed/verbose |
| `VERBOSE_DEBUG` | `false` | Legacy flag for maximum verbosity |
| `LOG_RAW_CONTEXT_DATA` | `false` | Show full intake data dumps |
| `ENABLE_DUPLICATE_DETECTION` | `true` | Detect and prevent duplicate messages |
| `MAX_MESSAGE_HISTORY` | `100` | Max messages to keep per session |
| `SESSION_TTL_HOURS` | `24` | Auto-expire sessions after N hours |
## 🎉 Results

**Before:** 1000+ lines of logs per chat message - unreadable, with no way to identify issues

**After (summary mode):** 5 lines of structured logs, clear and actionable

**After (detailed mode):** ~50 lines with full visibility into each stage, its timing, and raw outputs

**Loop protection:** automatic detection and prevention of duplicate processing

**You can now:**

- ✅ See raw LLM outputs clearly (in detailed/verbose mode)
- ✅ Identify performance bottlenecks (stage timings)
- ✅ Detect loops and duplicates (automatic)
- ✅ Find failing stages (error markers)
- ✅ Scan logs quickly (hierarchical structure)
- ✅ Debug production issues (adjustable verbosity)
## 🔧 Next Steps (Optional Improvements)

1. **Structured JSON logging**: output as JSON for log aggregation tools
2. **Log rotation**: implement file rotation for verbose logs
3. **Metrics export**: export stage timings to Prometheus/Grafana
4. **Error categorization**: tag errors by type (network, timeout, parsing, etc.)
5. **Performance alerts**: auto-alert when stages exceed thresholds

---

**Happy debugging! You can now see what's actually happening in the cortex loop.** 🎯