feat: tiered, compacting memory (phase 1.5)

Older sessions fade to a general idea; details stay retrievable. - memory: summaries table (one compacted gist per session, embedded), plus store_summary/get_summary/recall_summaries and unsummarized_count (tracks exchanges newer than the current summary) - lyra/summary.py: summarize_session compacts a session's raw turns into a third-person gist (default SUMMARY_BACKEND=local, so compaction is free); maybe_summarize re-summarizes once SUMMARIZE_AFTER new turns accumulate - chat.build_messages now layers context in tiers: persona -> gists of other sessions -> a few sharp raw cross-session details -> current session raw turns -> new message; respond() compacts the session after each turn - web: POST /sessions/{id}/summarize to compact on demand - summarization activity surfaces in the live log Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 18:52:58 +00:00
parent 84c4f75e03
commit d7c258eba0
6 changed files with 211 additions and 23 deletions
@@ -18,7 +18,7 @@ from fastapi import FastAPI, Request
 from fastapi.responses import StreamingResponse
 from fastapi.staticfiles import StaticFiles

-from lyra import chat, logbus, memory
+from lyra import chat, logbus, memory, summary
 from lyra.llm import Backend


@@ -77,6 +77,11 @@ def create_app() -> FastAPI:
        memory.delete_session(session_id)
        return {"ok": True}

+    @app.post("/sessions/{session_id}/summarize")
+    async def summarize(session_id: str) -> dict:
+        gist = await asyncio.to_thread(summary.summarize_session, session_id)
+        return {"ok": gist is not None, "summary": gist}
+
    @app.post("/v1/chat/completions")
    async def chat_completions(request: Request) -> dict:
        body = await request.json()