Summaries displayed s.created_at (set to now() at summarize time), so every
imported gist read 2026-06-16. Derive the actual session date from the earliest
exchange timestamp (MIN(created_at) per session — the preserved original date,
same source the era rollups use) via a correlated subquery in the summary
readers. New Summary.session_started_at field; chat shows it (falling back to
created_at). No schema change / backfill needed — always correct from source.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
She had no clock: current date/time and the gap since Brian last spoke were
invisible between turns, and reflection was timeless. Now:
- lyra/clock.py: wall-clock stamp + coarse human gaps ("3 days")
- chat: inject a 'now' note (date/time + gap since last turn) after her
self-state — when she is, before the world
- reflect(): feed current time + silence gap into reflection, neutrally —
prompt invites her to weigh elapsed time "to whatever degree it genuinely
affects you" (no prescribed feeling; whether silence means anything is left
to emerge)
- memory.last_exchange_at(): timestamp of the most recent exchange
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Mid-size models (gpt-4o-mini, qwen2.5-14b) resist persona instructions —
help-desk closers and feelings-disclaimers leak through regardless. Route live
chat to a stronger model while keeping bulk consolidation cheap:
- config: CHAT_MODEL (default gpt-4o), distinct from CLOUD_MODEL (gpt-4o-mini)
- llm.complete gains a `model` override; chat.respond uses chat_model on cloud,
consolidation paths keep cloud_model
- persona: reword the "no sign-off" rule so genuine questions are welcome and
only reflexive customer-service closers are discouraged
Verified: on gpt-4o she owns her mood without disclaimers and drops most
help-desk tails — clearly more in-character than mini/qwen.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Give Lyra a model of *herself* (vs the profile/narrative which model Brian):
- persona: a real origin/identity — she's an AI and knows it (Bender/C-3PO
style), with the Cortex/NeoMem lineage as her actual past, so "how were you
made" stops falling through to generic-assistant deflection.
- memory: self_state table (JSON blob) + get/set_self_state.
- lyra/self_state.py: evolving first-person inner state (mood, valence, energy,
confidence, curiosity, self_narrative, relationship, reflections). render_for_
context injects it; reflect() updates it from recent activity. `lyra-reflect`.
- chat.build_messages injects her interiority right after the persona — she
speaks from a continuous self, not a reset.
The state -> behavior -> reflection -> updated state loop is the substrate for
the emergence experiment. Verified: reflection shifted mood curious->reflective
and produced genuine first-person self-observations.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Complete the consolidation pipeline: summaries -> profile + eras -> narrative.
- memory: eras table (per-month digests) + Era, summaries_by_month, store_era,
list_eras, recall_eras; narrative table + set/get_narrative
- lyra/era.py (lyra-era): groups session gists by the month the session occurred
(real timestamps) and map-reduces each month into a "what was happening" digest
- lyra/narrative.py (lyra-narrative): distills profile + recent eras into the
current arc/trends/callbacks ("remember when…", "you're trending toward…")
- chat.build_messages injects the narrative alongside the profile
Verified on the real corpus: 17 monthly eras (Dec 2024-Jun 2026) + a narrative
that surfaces specific callbacks (the $573 Hollywood session, 4 years sober).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The MI50 box (CT202) runs an OpenAI-compatible llama.cpp server on
10.0.0.44:8080. Wire it in as a third backend:
- llm.complete gains backend="mi50" (OpenAI client pointed at MI50_BASE_URL)
- config: MI50_BASE_URL (default http://10.0.0.44:8080/v1) + MI50_MODEL
- chat.respond labels the model per backend; web _backend_for maps "mi50"
- UI backend selector adds "MI50 — local GPU"
Verified end-to-end: llm.complete(backend="mi50") returns from the live server.
See homelab-inference memory for the box topology.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Derive a standing profile of the user from session gists and inject it into
every prompt, so identity/abstract questions ("what kind of player am I",
"what are my leaks") are answered from distilled knowledge instead of noisy
single-vector recall (which finds passages, not patterns).
- memory: profile table + get/set_profile, list_summaries
- lyra/profile.py: rebuild_profile map-reduces all gists (batch -> extract
durable facts -> fold-merge) into one profile doc; `lyra-profile` CLI
- chat.build_messages injects "What you know about Brian" after the persona
Run after lyra-summarize (needs gists). Verified (stubbed): map-reduce, storage,
and prompt injection.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The "context built" event now carries the fully-rendered prompt (persona, gists,
recalled details, recent turns, the new message) plus a total char count. The
log panel renders it as a collapsed "view full prompt" block — clean by default,
one click to see exactly what hit the model.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Older sessions fade to a general idea; details stay retrievable.
- memory: summaries table (one compacted gist per session, embedded), plus
store_summary/get_summary/recall_summaries and unsummarized_count (tracks
exchanges newer than the current summary)
- lyra/summary.py: summarize_session compacts a session's raw turns into a
third-person gist (default SUMMARY_BACKEND=local, so compaction is free);
maybe_summarize re-summarizes once SUMMARIZE_AFTER new turns accumulate
- chat.build_messages now layers context in tiers: persona -> gists of other
sessions -> a few sharp raw cross-session details -> current session raw
turns -> new message; respond() compacts the session after each turn
- web: POST /sessions/{id}/summarize to compact on demand
- summarization activity surfaces in the live log
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Turn the inert "Show Work" thinking panel into a real live activity log:
- lyra/logbus.py: thread-safe in-memory ring buffer other modules publish to
- chat.respond logs backend/model/embed per turn, recall counts, reply size;
web layer logs chat errors
- server: replace the keep-alive /stream/thinking stub with /stream/logs, an
SSE endpoint that replays the recent buffer then streams new events
- UI: repurpose the panel as a global "Live Log" — connects on load, renders
level/time/msg/fields, drops the old per-session localStorage + dead popup
Every turn now shows its backend + model in-app, so local-vs-cloud (free vs
paid) is visible at a glance.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Phase 1 — persona + persistent memory chat loop:
- lyra/persona.py + personas/lyra.md: editable identity/voice (friend-first,
honest, never invents poker math)
- lyra/chat.py: turn loop assembling persona + cross-session recall + recent
context, persisting both sides to SQLite
- lyra/session.py, lyra/__main__.py: session lifecycle + `lyra` REPL
Phase 1.25 — reuse the old web UI:
- vendored the prior single-page UI into lyra/web/static, repointed to
same-origin
- lyra/web/server.py (FastAPI): serves the UI and backs its endpoint contract
(/v1/chat/completions, session CRUD, health, inert thinking-stream) with the
new chat loop + memory; SQLite stays the single source of truth
- `lyra-web` console script
Local backends — test for free, no OpenAI key:
- llm.embed routes via EMBED_BACKEND (cloud=OpenAI, local=Ollama /api/embed)
- simplified UI backend selector to Local (Ollama) / Cloud (OpenAI), default local
- memory connection opened check_same_thread=False for the threaded server
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>