project-lyra

serversdown 5dc3fa17d7 feat(web): stream chat replies token-by-token (M3)

- llm.chat_call_stream: streaming generator for all 3 backends (Ollama NDJSON,
  OpenAI/MI50 SSE), accumulating tool-call fragments by index.
- chat.respond_stream: mirrors respond()'s tool loop and persistence/compaction,
  yielding ("delta", text) / ("tool", name) / ("done", reply).
- POST /v1/chat/stream: SSE endpoint; blocking generator bridged to async via a
  worker thread + asyncio.Queue. Old completions endpoint kept as fallback.
- Client streams into a live bubble with a blinking caret; rAF-throttled render
  (no full re-parse per token) and instant scroll during stream — fixes iOS
  Safari ghosting from per-token smooth-scroll. Falls back to the blocking
  endpoint only if nothing streamed (no double-persist).
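
The thread-plus-queue bridge described for POST /v1/chat/stream can be sketched as follows. This is a minimal illustration, not the code in server.py: `fake_respond_stream`, `bridge`, and `to_sse` are hypothetical names, and the SSE frame layout (an `event:` line plus a JSON `data:` line) is an assumption about the wire format.

```python
import asyncio
import json
import threading
from typing import AsyncIterator, Callable, Iterator, List, Tuple

Event = Tuple[str, str]
_SENTINEL = object()  # marks the end of the blocking generator


def fake_respond_stream(prompt: str) -> Iterator[Event]:
    """Hypothetical stand-in for chat.respond_stream: a blocking generator."""
    reply = ""
    for token in ["Hel", "lo", "!"]:
        reply += token
        yield ("delta", token)
    yield ("done", reply)


async def bridge(make_gen: Callable[[], Iterator[Event]]) -> AsyncIterator[Event]:
    """Run a blocking generator in a worker thread and re-yield its items asynchronously."""
    loop = asyncio.get_running_loop()
    queue: asyncio.Queue = asyncio.Queue()

    def worker() -> None:
        try:
            for item in make_gen():
                # asyncio.Queue is not thread-safe, so hand each item to the loop thread.
                loop.call_soon_threadsafe(queue.put_nowait, item)
        except Exception as exc:  # surface failures to the client as an event
            loop.call_soon_threadsafe(queue.put_nowait, ("error", str(exc)))
        finally:
            loop.call_soon_threadsafe(queue.put_nowait, _SENTINEL)

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = await queue.get()
        if item is _SENTINEL:
            break
        yield item


def to_sse(event: Event) -> str:
    """Format one event as an SSE frame (assumed layout)."""
    kind, payload = event
    return f"event: {kind}\ndata: {json.dumps(payload)}\n\n"


async def collect(prompt: str) -> List[str]:
    return [to_sse(e) async for e in bridge(lambda: fake_respond_stream(prompt))]


frames = asyncio.run(collect("hi"))
```

In a real endpoint the frames would be written to the response as they arrive instead of being collected into a list; the list is only there to make the sketch easy to check.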

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-06-19 00:06:51 +00:00
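
The commit message above mentions accumulating tool-call fragments by index. A rough sketch of that idea, assuming OpenAI-style streamed deltas in which each fragment carries an `index` and its `arguments` string arrives in pieces; `accumulate_tool_calls` and the sample data are hypothetical and not taken from llm.chat_call_stream:

```python
import json
from typing import Any, Dict, Iterable, List


def accumulate_tool_calls(deltas: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Merge streamed tool-call fragments into complete calls, keyed by index."""
    calls: Dict[int, Dict[str, Any]] = {}
    for delta in deltas:
        for frag in delta.get("tool_calls") or []:
            slot = calls.setdefault(
                frag["index"], {"id": "", "name": "", "arguments": ""}
            )
            if frag.get("id"):
                slot["id"] = frag["id"]
            fn = frag.get("function") or {}
            if fn.get("name"):
                slot["name"] = fn["name"]
            # Argument JSON arrives as string pieces; concatenate in arrival order.
            slot["arguments"] += fn.get("arguments") or ""
    return [calls[i] for i in sorted(calls)]


# Made-up deltas: two interleaved tool calls plus a plain text delta.
sample = [
    {"tool_calls": [{"index": 0, "id": "call_a",
                     "function": {"name": "get_time", "arguments": ""}}]},
    {"tool_calls": [{"index": 0, "function": {"arguments": '{"tz": '}}]},
    {"tool_calls": [{"index": 1, "id": "call_b",
                     "function": {"name": "search", "arguments": '{"q": "lyra"}'}}]},
    {"tool_calls": [{"index": 0, "function": {"arguments": '"UTC"}'}}]},
    {"content": "plain text delta, no tool calls"},
]

calls = accumulate_tool_calls(sample)
parsed_args = [json.loads(c["arguments"]) for c in calls]
```

Keying on the index rather than on arrival order keeps interleaved fragments of different calls apart, and the arguments are parsed only once the stream has finished, since a partial fragment is not valid JSON on its own.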

static        feat(web): stream chat replies token-by-token (M3)              2026-06-19 00:06:51 +00:00
__init__.py   feat: persona chat loop, web UI, and local (Ollama) embeddings  2026-06-15 18:36:31 +00:00
gen_icons.py  feat(web): iPhone PWA fixes (M1) + warm RTO redesign (M2)       2026-06-18 23:20:11 +00:00
server.py     feat(web): stream chat replies token-by-token (M3)              2026-06-19 00:06:51 +00:00