feat(web): stream chat replies token-by-token (M3)

- llm.chat_call_stream: streaming generator for all 3 backends (Ollama NDJSON,
  OpenAI/MI50 SSE), accumulating tool-call fragments by index.
- chat.respond_stream: mirrors respond()'s tool loop and persistence/compaction,
  yielding ("delta", text) / ("tool", name) / ("done", reply).
- POST /v1/chat/stream: SSE endpoint; blocking generator bridged to async via a
  worker thread + asyncio.Queue. Old completions endpoint kept as fallback.
- Client streams into a live bubble with a blinking caret; rAF-throttled render
  (no full re-parse per token) and instant scroll during stream — fixes iOS
  Safari ghosting from per-token smooth-scroll. Falls back to the blocking
  endpoint only if nothing streamed (no double-persist).
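
The thread + asyncio.Queue bridge from the third bullet can be sketched roughly
as below. This is a minimal illustration, not the code in this commit:
`bridge_stream`, `fake_respond_stream`, `sse_frame`, and `collect` are
hypothetical names, and only the ("delta", text) / ("done", reply) event shapes
come from the message above. The SSE framing (an `event:` line plus a JSON
`data:` line) is likewise an assumption about the wire format.

```python
import asyncio
import json
import threading
from typing import AsyncIterator, Callable, Iterator, List, Tuple

Event = Tuple[str, str]
_DONE = object()  # sentinel: the worker thread has finished


def fake_respond_stream() -> Iterator[Event]:
    """Hypothetical stand-in for a blocking generator such as chat.respond_stream."""
    yield ("delta", "Hel")
    yield ("delta", "lo")
    yield ("done", "Hello")


async def bridge_stream(make_gen: Callable[[], Iterator[Event]]) -> AsyncIterator[Event]:
    """Run a blocking generator in a worker thread and re-yield its items asynchronously."""
    loop = asyncio.get_running_loop()
    queue: asyncio.Queue = asyncio.Queue()

    def worker() -> None:
        try:
            for event in make_gen():
                # asyncio.Queue is not thread-safe, so hand each item to the loop thread.
                loop.call_soon_threadsafe(queue.put_nowait, event)
        except Exception as exc:  # forward failures to the async consumer
            loop.call_soon_threadsafe(queue.put_nowait, exc)
        finally:
            loop.call_soon_threadsafe(queue.put_nowait, _DONE)

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = await queue.get()
        if item is _DONE:
            return
        if isinstance(item, Exception):
            raise item
        yield item


def sse_frame(kind: str, payload: str) -> str:
    """Format one server-sent event (assumed framing: event name + JSON data)."""
    return f"event: {kind}\ndata: {json.dumps(payload)}\n\n"


async def collect(make_gen: Callable[[], Iterator[Event]]) -> List[Event]:
    """Drain the bridge into a list; handy for tests."""
    return [event async for event in bridge_stream(make_gen)]


if __name__ == "__main__":
    for kind, payload in asyncio.run(collect(fake_respond_stream)):
        print(sse_frame(kind, payload), end="")
```

Pushing items with `loop.call_soon_threadsafe` keeps every queue operation on
the event-loop thread, so the blocking model call never stalls other requests.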

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-19 00:06:51 +00:00
parent fa168271e1
commit 5dc3fa17d7
5 changed files with 281 additions and 9 deletions
+13 -1
@@ -139,7 +139,9 @@ button:hover, select:hover {
 display: flex;
 flex-direction: column;
 gap: 8px;
-scroll-behavior: smooth;
+/* No CSS smooth-scroll: during streaming, per-token smooth scrolls pile up and
+iOS Safari leaves ghost paint frames. Smooth is applied explicitly in JS where
+it's a one-shot (load/finalize). */
 }
 /* Messages */
@@ -1090,6 +1092,16 @@ select:hover {
 }
 .msg.assistant pre code { background: none; padding: 0; font-size: 0.85em; }
+/* Streaming: a blinking caret while tokens arrive (and a min-size while empty). */
+.msg.assistant.streaming { min-width: 1.4em; min-height: 1.1em; }
+.msg.assistant.streaming::after {
+content: "▋";
+margin-left: 1px;
+color: var(--accent);
+animation: caretBlink 1s steps(1) infinite;
+}
+@keyframes caretBlink { 0%, 50% { opacity: 0.85; } 50.01%, 100% { opacity: 0; } }
 /* Behind-the-scenes 👍/👎 feedback (fine-tune signal) — subtle until hovered. */
 .rate-bar { display: flex; gap: 6px; margin-top: 7px; opacity: 0.3; transition: opacity .15s; }
 .msg.assistant:hover .rate-bar { opacity: 0.85; }