fix: consolidation no longer stalls or breaks the live chat turn

Two bugs surfaced in the log during live play:
- SUMMARY_BACKEND=mi50 (llama.cpp, 32B) was fed 24k-char chunks → "Context size
  has been exceeded". Chunk budget is now backend-aware: cloud 24k, local/mi50 8k,
  and the merge step recurses so merged partials never overflow either.
- maybe_summarize ran inline in the chat turn and retried 4× with backoff (~30s),
  stalling the reply and surfacing the error. It now runs in a background daemon
  thread, swallows errors (consolidation is best-effort maintenance), and dedupes
  so at most one summary per session runs at a time.
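
A minimal sketch of the backend-aware chunk budget and recursive merge described in the first bullet. The budgets (cloud 24k, local/mi50 8k) come from this message; the names `CHUNK_BUDGET`, `chunk_budget`, `split_chunks`, and `summarize_all`, and the `summarize` callable, are hypothetical and are not taken from the changed files.

```python
from typing import Callable, List

# Character budgets per backend, as stated in the commit message.
# The dict and all function names here are illustrative assumptions.
CHUNK_BUDGET = {"cloud": 24_000, "local": 8_000, "mi50": 8_000}


def chunk_budget(backend: str) -> int:
    """Return the per-request character budget; unknown backends get the smallest."""
    return CHUNK_BUDGET.get(backend, min(CHUNK_BUDGET.values()))


def split_chunks(text: str, budget: int) -> List[str]:
    """Split text into consecutive pieces of at most `budget` characters."""
    return [text[i:i + budget] for i in range(0, len(text), budget)]


def summarize_all(text: str, backend: str, summarize: Callable[[str], str]) -> str:
    """Summarize text of any length without sending more than the budget at once."""
    budget = chunk_budget(backend)
    if len(text) <= budget:
        return summarize(text)
    partials = [summarize(chunk) for chunk in split_chunks(text, budget)]
    merged = "\n".join(partials)
    if len(merged) >= len(text):
        # Summaries did not shrink the input; truncate rather than recurse forever.
        return summarize(merged[:budget])
    # Recurse: the merged partials may themselves exceed the budget.
    return summarize_all(merged, backend, summarize)
```

With this shape, no single call to `summarize` receives more than the budget for the selected backend, including during the merge step.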

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-20 04:37:17 +00:00
parent 5e9f3efeec
commit 5c41bd48d1
2 changed files with 44 additions and 7 deletions
+2 -2
@@ -200,7 +200,7 @@ def respond(session_id: str, user_msg: str, backend: Backend = "cloud",
memory.remember(session_id, "assistant", reply)
# Compact this session once enough new turns have piled up.
-summary.maybe_summarize(session_id)
+summary.maybe_summarize_async(session_id)
return reply
@@ -259,5 +259,5 @@ def respond_stream(session_id: str, user_msg: str, backend: Backend = "cloud",
memory.remember(session_id, "user", user_msg)
memory.remember(session_id, "assistant", reply)
-summary.maybe_summarize(session_id)
+summary.maybe_summarize_async(session_id)
yield ("done", reply)