fix: consolidation no longer stalls or breaks the live chat turn
Two bugs surfacing in the log during live play:

- SUMMARY_BACKEND=mi50 (llama.cpp, 32B) was fed 24k-char chunks → "Context size
  has been exceeded". Chunk budget is now backend-aware: cloud 24k, local/mi50
  8k, and the merge step recurses so merged partials never overflow either.
- maybe_summarize ran inline in the chat turn and retried 4× with backoff
  (~30s), stalling the reply and surfacing the error. It now runs in a
  background daemon thread, swallows errors (consolidation is best-effort
  maintenance), and dedupes so at most one summary per session runs at a time.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
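The summary module's changes are not part of the diff shown on this page, so the following is only a minimal sketch of how a backend-aware chunk budget with a recursive merge step might look. The budgets (24k and 8k characters) come from the commit message; the names `CHUNK_BUDGET`, `chunk_budget`, `split_chunks`, and `summarize_recursive`, the injected `summarize` callable, and the truncation guard are all assumptions made for illustration, not the committed code.

```python
from typing import Callable, List

# Character budgets per backend, as stated in the commit message.
# The dict itself and its name are hypothetical.
CHUNK_BUDGET = {"cloud": 24_000, "local": 8_000, "mi50": 8_000}


def chunk_budget(backend: str) -> int:
    """Return the per-call character budget; unknown backends get the small one."""
    return CHUNK_BUDGET.get(backend, 8_000)


def split_chunks(text: str, budget: int) -> List[str]:
    """Split text into consecutive pieces of at most `budget` characters."""
    return [text[i:i + budget] for i in range(0, len(text), budget)]


def summarize_recursive(text: str, backend: str,
                        summarize: Callable[[str], str]) -> str:
    """Summarize text so that no single `summarize` call exceeds the budget.

    `summarize` stands in for the real model call (an assumption here).
    """
    budget = chunk_budget(backend)
    if len(text) <= budget:
        return summarize(text)

    # Summarize each chunk, then merge the partial summaries.
    partials = [summarize(chunk) for chunk in split_chunks(text, budget)]
    merged = "\n".join(partials)

    # Guard (assumption): if the partials did not shrink the input,
    # truncate so the recursion is guaranteed to terminate.
    if len(merged) >= len(text):
        merged = merged[:budget]

    # The merged partials may still be over budget, so recurse.
    return summarize_recursive(merged, backend, summarize)
```

With this shape, a 50k-character history on `mi50` would be split into seven chunks of at most 8k characters, and the merge call only happens once the joined partials fit the same budget.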
+2
-2
@@ -200,7 +200,7 @@ def respond(session_id: str, user_msg: str, backend: Backend = "cloud",
     memory.remember(session_id, "assistant", reply)

     # Compact this session once enough new turns have piled up.
-    summary.maybe_summarize(session_id)
+    summary.maybe_summarize_async(session_id)
     return reply
@@ -259,5 +259,5 @@ def respond_stream(session_id: str, user_msg: str, backend: Backend = "cloud",

     memory.remember(session_id, "user", user_msg)
     memory.remember(session_id, "assistant", reply)
-    summary.maybe_summarize(session_id)
+    summary.maybe_summarize_async(session_id)
     yield ("done", reply)