project-lyra

Author	SHA1	Message	Date
serversdown	d7e2fce694	perf: concurrent summarize-all (parallel LLM, serial DB) Refactor summarize_all to run LLM summarization across a thread pool (default 8 workers) while keeping all SQLite reads/writes on the main thread (the single connection is never shared across threads). Extract _summarize_transcript (transcript -> gist, no DB) for the worker. The MI50 proved far too slow for the large-transcript backfill (~29 summaries in 9h due to gfx906 prefill); on cloud gpt-4o-mini with concurrency this runs at ~30 summaries/minute (~17 min for the full backfill, ~$2). MI50 stays the chat backend where small prompts make it snappy. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-16 16:30:07 +00:00
serversdown	34392e4097	fix: make summarize-all resilient to backend hiccups The MI50 llama.cpp server OOM-killed (LXC RAM limit + 8GB prompt cache) mid-run, and summarize_all had no error handling, so one APIConnectionError killed the whole batch. Add retry-with-backoff around the summarization LLM call, and try/except per session in summarize_all (log + skip; unsummarized sessions get retried on the next run). (Server-side: CT202 RAM raised + prompt cache disabled.) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-16 06:31:28 +00:00
serversdown	071522ea33	feat: summarize-all batch (consolidation step 1) Harden summarize_session to chunk + merge long sessions (imported convos can exceed the local model's context), and add summarize_all: idempotent, resumable batch that summarizes every session needing it (skips up-to-date ones), with progress logged to the live log. `lyra-summarize [limit]` CLI. This is the first consolidation stage feeding the profile (semantic memory) and era-rollup tiers. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-16 04:08:41 +00:00
serversdown	d7c258eba0	feat: tiered, compacting memory (phase 1.5) Older sessions fade to a general idea; details stay retrievable. - memory: summaries table (one compacted gist per session, embedded), plus store_summary/get_summary/recall_summaries and unsummarized_count (tracks exchanges newer than the current summary) - lyra/summary.py: summarize_session compacts a session's raw turns into a third-person gist (default SUMMARY_BACKEND=local, so compaction is free); maybe_summarize re-summarizes once SUMMARIZE_AFTER new turns accumulate - chat.build_messages now layers context in tiers: persona -> gists of other sessions -> a few sharp raw cross-session details -> current session raw turns -> new message; respond() compacts the session after each turn - web: POST /sessions/{id}/summarize to compact on demand - summarization activity surfaces in the live log Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-15 18:52:58 +00:00
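
The "parallel LLM, serial DB" split described in d7e2fce694 can be sketched roughly as follows. This is an illustrative sketch only, not the repository's code: the `sessions`/`summaries` schema, the stand-in `summarize_transcript` (no real LLM call), and the `make_demo_db` helper are all assumptions made up for the example. The "needs a summary" check is also simplified to "has no summary row". The point it tries to show is that worker threads only receive transcript text, while every SQLite read and write stays on the main thread.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor, as_completed


def summarize_transcript(transcript: str) -> str:
    """Hypothetical stand-in for the LLM call: transcript -> gist, no DB access."""
    return "gist: " + transcript[:40]


def summarize_all(conn: sqlite3.Connection, workers: int = 8) -> int:
    """Summarize every session lacking a summary; return how many were stored."""
    # Main thread: read the pending work up front.
    pending = conn.execute(
        "SELECT s.id, s.transcript FROM sessions s "
        "LEFT JOIN summaries m ON m.session_id = s.id "
        "WHERE m.session_id IS NULL"
    ).fetchall()

    stored = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Worker threads only ever see transcript text, never the connection.
        futures = {pool.submit(summarize_transcript, text): sid for sid, text in pending}
        # as_completed is consumed in this (main) thread, so writes stay serial.
        for future in as_completed(futures):
            sid = futures[future]
            try:
                gist = future.result()
            except Exception as exc:  # log + skip; picked up again on the next run
                print(f"session {sid} failed, skipping: {exc}")
                continue
            conn.execute(
                "INSERT INTO summaries (session_id, gist) VALUES (?, ?)", (sid, gist)
            )
            conn.commit()
            stored += 1
    return stored


def make_demo_db() -> sqlite3.Connection:
    """Build a throwaway in-memory database with three unsummarized sessions."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sessions (id INTEGER PRIMARY KEY, transcript TEXT)")
    conn.execute("CREATE TABLE summaries (session_id INTEGER PRIMARY KEY, gist TEXT)")
    conn.executemany(
        "INSERT INTO sessions (transcript) VALUES (?)",
        [("first chat",), ("second chat",), ("third chat",)],
    )
    conn.commit()
    return conn
```

Calling `summarize_all(make_demo_db())` stores one gist per session. A second call on the same connection finds nothing pending, which mirrors the idempotent, resumable behaviour the commit messages describe. Committing after each insert means an interrupted run keeps the work already done.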

4 Commits
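
The retry-with-backoff wrapper mentioned in 34392e4097 might look something like the sketch below. The name `call_with_backoff`, its parameters, and the use of the built-in `ConnectionError` are assumptions for illustration. The commit itself refers to an `APIConnectionError` from the LLM client, and the actual retry policy is not shown in the log.

```python
import time


def call_with_backoff(fn, attempts=4, base_delay=1.0,
                      retry_on=(ConnectionError,), sleep=time.sleep):
    """Call fn(); on a retryable error wait base_delay * 2**attempt, then retry.

    The final failure is re-raised so the caller (for example a per-session
    try/except in a batch loop) can log it and move on to the next session.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of retries: surface the error to the caller
            sleep(base_delay * (2 ** attempt))
```

Passing `sleep` as a parameter is a design convenience: a test can substitute a recorder for `time.sleep` and check the delays without actually waiting.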