d7e2fce694
Refactor summarize_all to summarize across a thread pool

Run LLM summarization across a thread pool (default 8 workers) while keeping all SQLite reads and writes on the main thread, so the single connection is never shared across threads. Extract _summarize_transcript (transcript -> gist, no DB access) as the worker function.

The MI50 proved far too slow for the large-transcript backfill (~29 summaries in 9h, due to slow gfx906 prefill). On cloud gpt-4o-mini with concurrency, the backfill runs at ~30 summaries/minute (~17 min and ~$2 for the full run). The MI50 stays the chat backend, where small prompts keep it snappy.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
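
For reviewers, the threading split described above can be sketched roughly as follows. This is a minimal illustration, not the code in this commit: the `sessions` table, its `id`/`transcript`/`gist` columns, and the truncating body of `_summarize_transcript` are hypothetical stand-ins (the real worker would call the LLM backend). Only the names `summarize_all` and `_summarize_transcript` and the default of 8 workers come from the message itself.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor, as_completed


def _summarize_transcript(transcript: str) -> str:
    """Transcript -> gist, with no DB access, so it is safe to run in a worker.

    Hypothetical stand-in: the real function would call the LLM backend.
    """
    return transcript[:40]


def summarize_all(conn: sqlite3.Connection, max_workers: int = 8) -> int:
    """Summarize every row lacking a gist; return the number of rows written."""
    # Main thread: read all pending work up front (hypothetical schema).
    rows = conn.execute(
        "SELECT id, transcript FROM sessions WHERE gist IS NULL"
    ).fetchall()

    written = 0
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Workers only ever see the transcript text, never the connection.
        futures = {
            pool.submit(_summarize_transcript, text): row_id
            for row_id, text in rows
        }
        # as_completed yields here, on the main thread, so every write
        # below uses the connection from the thread that created it.
        for fut in as_completed(futures):
            row_id = futures[fut]
            try:
                gist = fut.result()
            except Exception:
                # Leave gist NULL so a later run can retry this row.
                continue
            conn.execute(
                "UPDATE sessions SET gist = ? WHERE id = ?", (gist, row_id)
            )
            conn.commit()
            written += 1
    return written
```

Committing after each row is one possible choice for a long backfill: an interrupted run keeps the summaries it has already paid for, and a rerun only picks up rows whose gist is still NULL.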