feat: tiered, compacting memory (phase 1.5)

Older sessions fade to a general idea; details stay retrievable.

- memory: summaries table (one compacted gist per session, embedded), plus
  store_summary/get_summary/recall_summaries and unsummarized_count (tracks
  exchanges newer than the current summary)
- lyra/summary.py: summarize_session compacts a session's raw turns into a
  third-person gist (default SUMMARY_BACKEND=local, so compaction is free);
  maybe_summarize re-summarizes once SUMMARIZE_AFTER new turns accumulate
- chat.build_messages now layers context in tiers: persona -> gists of other
  sessions -> a few sharp raw cross-session details -> current session raw
  turns -> new message; respond() compacts the session after each turn
- web: POST /sessions/{id}/summarize to compact on demand
- summarization activity surfaces in the live log

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 18:52:58 +00:00
parent 84c4f75e03
commit d7c258eba0
6 changed files with 211 additions and 23 deletions
@@ -19,6 +19,7 @@ class Config:
    embed_backend: str  # "cloud" (OpenAI) or "local" (Ollama)
    embed_model: str  # OpenAI embedding model
    local_embed_model: str  # Ollama embedding model
+    summary_backend: str  # "local" or "cloud" — backend used to compact memory
    db_path: Path


@@ -31,5 +32,6 @@ def load() -> Config:
        embed_backend=os.getenv("EMBED_BACKEND", "cloud").lower(),
        embed_model=os.getenv("EMBED_MODEL", "text-embedding-3-small"),
        local_embed_model=os.getenv("LOCAL_EMBED_MODEL", "nomic-embed-text"),
+        summary_backend=os.getenv("SUMMARY_BACKEND", "local").lower(),
        db_path=Path(os.getenv("LYRA_DB_PATH", "data/lyra.db")),
    )