d7e2fce694bda6d8292d08fc31942fc33c9625ec
Refactor summarize_all to run LLM summarization across a thread pool (default 8 workers) while keeping all SQLite reads/writes on the main thread (the single connection is never shared across threads). Extract _summarize_transcript (transcript -> gist, no DB) for the worker.

The MI50 proved far too slow for the large-transcript backfill (~29 summaries in 9h due to gfx906 prefill); on cloud gpt-4o-mini with concurrency this runs at ~30 summaries/minute (~17 min for the full backfill, ~$2). MI50 stays the chat backend where small prompts make it snappy.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
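The pattern that commit describes can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the `sessions` table, its `id`/`transcript`/`gist` columns, and the placeholder body of `_summarize_transcript` are assumptions made up for the example; only the function names and the default of 8 workers come from the commit message.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def _summarize_transcript(transcript: str) -> str:
    """Transcript -> gist, with no database access.

    Hypothetical stand-in: the real worker would call the LLM backend.
    Here the "gist" is just the first line, truncated to 80 characters.
    """
    text = transcript.strip()
    return text.splitlines()[0][:80] if text else ""


def summarize_all(conn: sqlite3.Connection, max_workers: int = 8) -> int:
    """Summarize every row that lacks a gist; return how many were written."""
    # Read on the calling (main) thread. Table and column names are assumed.
    rows = conn.execute(
        "SELECT id, transcript FROM sessions WHERE gist IS NULL"
    ).fetchall()
    ids = [row[0] for row in rows]
    transcripts = [row[1] for row in rows]

    # Workers receive plain strings only, so the connection never leaves
    # this thread. Executor.map returns results in input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        gists = list(pool.map(_summarize_transcript, transcripts))

    # Write on the calling thread as well.
    conn.executemany(
        "UPDATE sessions SET gist = ? WHERE id = ?", zip(gists, ids)
    )
    conn.commit()
    return len(ids)
```

Passing only strings to the workers sidesteps sqlite3's default same-thread check on connections, at the cost of holding all pending transcripts and gists in memory until the pool finishes.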
Lyra
A persistent, autonomous AI assistant. From-scratch rewrite of an earlier attempt.
The design thinking that survives the rewrite lives in docs/ — start with docs/ARCH_v0-6-1.md. The previous implementation is preserved on the archive branch.
Status
Pre-MVP. Building toward the smallest useful version: chat with persistent memory across sessions.
Setup
uv sync
cp .env.example .env
# fill in ANTHROPIC_API_KEY and point LOCAL_BASE_URL at your Ollama server
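To confirm the two variables are in place before starting, a small check along these lines may help. It is a sketch under assumptions: `load_settings` and the returned dictionary keys are hypothetical names, not part of this project, and it reads the process environment rather than parsing .env itself.

```python
import os
from typing import Mapping


def load_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Return the two settings named in Setup, or raise if either is unset.

    Hypothetical helper for illustration; the project may load its
    configuration differently.
    """
    required = ("ANTHROPIC_API_KEY", "LOCAL_BASE_URL")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("missing settings: " + ", ".join(missing))
    return {
        "anthropic_api_key": env["ANTHROPIC_API_KEY"],
        # Drop a trailing slash so paths can be appended consistently.
        "local_base_url": env["LOCAL_BASE_URL"].rstrip("/"),
    }
```

For a default local Ollama install, LOCAL_BASE_URL would typically be http://localhost:11434, though the exact value depends on where the server runs.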
Architecture
The long-term target is the cognitive split in docs/ARCH_v0-6-1.md — Inner Self as the seat of consciousness, Executive for hard reasoning, Cortex Chat for drafting, Persona for voice. The MVP implements only the chat + memory baseline. Cognitive layers come back one at a time.