- llm.chat_call_stream: streaming generator for all 3 backends (Ollama NDJSON,
OpenAI/MI50 SSE), accumulating tool-call fragments by index.
- chat.respond_stream: mirrors respond()'s tool loop and persistence/compaction,
yielding ("delta", text) / ("tool", name) / ("done", reply).
- POST /v1/chat/stream: SSE endpoint; blocking generator bridged to async via a
worker thread + asyncio.Queue. Old completions endpoint kept as fallback.
- Client streams into a live bubble with a blinking caret; rAF-throttled render
(no full re-parse per token) and instant scroll during stream — fixes iOS
Safari ghosting from per-token smooth-scroll. Falls back to the blocking
endpoint only if nothing streamed (no double-persist).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Brian can rate Lyra's outputs as he uses her; each rating is stored as a
(context, content, rating) triple — the shape a future fine-tune / preference
dataset wants, collected passively during real use.
- memory: ratings table + add_rating (upsert: one row per item, re-rating
replaces), list_ratings, rating_counts
- server: POST /rate, GET /ratings/counts, GET /ratings/export (JSONL download)
- chat UI: subtle 👍/👎 on each assistant reply, captures the prompting message
as context
- journal/reflection UI: 👍/👎 on each thought
- tests: counts + upsert-replace behavior
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Pick which OpenAI model answers on the Cloud backend (gpt-4o / -mini / 4.1 /
4.1-mini / o4-mini, or Default). Persisted in localStorage, sent as `model` in
the chat request; respond() applies it only on the cloud backend (local/mi50
keep their fixed models). Reachable from desktop + mobile via Settings.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Completes the poker copilot loop: talk through a session -> structured capture
-> generated writeup in Brian's format, remembered + exportable.
- poker.generate_recap(): LLM produces Brian's .md log (Session Header, Money
Flow, Overview, Timeline, Key Hands w/ assessments, Villain Notes, Confidence
Bank, Scar Notes, Mental Game, Final Assessment) from the session's structured
data + the linked chat conversation; stored on poker_sessions.recap_md
- sessions now capture chat_session_id (via tool ctx) to pull the right convo;
list_recent_hands() for browsing
- generate_recap tool ("write up the recap")
- web: /recap/{id} (renders the md) + /recap/{id}/download (.md attachment) +
/hands browser (recent hands -> /hand/{id}); nav links added (desktop + mobile)
- tests: recap generation (stubbed), recent-hands listing
Verified live: recap for the Meadows session rendered + downloaded; all pages 200.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Brian's idea: vomit rough shorthand, Lyra rebuilds it into a structured,
replayable hand history.
- poker.parse_hand(): focused LLM pass turning shorthand into a canonical hand
JSON (positions, stacks, hero cards, chronological actions w/ board reveals,
result); store_hand_history() persists JSON + extracted flat fields;
record_hand() = parse+store; standalone hands attach to a 'Hand Reviews' session
- poker_hands gains a `structured` JSON column (ALTER-migrated for existing DBs)
- record_hand tool wired into chat: "log this hand: ..." -> reconstructed + a
/hand/{id} link
- web: GET /hand/{id} viewer + /hand/{id}/data — a felt table with seats placed
around the oval (hero at bottom), hole cards, progressive board reveal, and
prev/next/end step-through of the action with running pot
- tests: store/get roundtrip, record_hand tool (stubbed parse)
Verified live: parsed a real AKs hand (BTN, 14 actions, full board) end to end.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Her reflections/metacognition were capped rolling windows (6/5), so older
thoughts were lost for good. Now everything she produces is also appended to a
permanent, append-only journal; the capped lists stay as her working-memory
window for context.
- memory: journal table + add_journal_entry/list_journal
- reflect(): persists every committed reflection + critique to the journal, and
the examine step gains a "journal" field — a deliberate, first-person note she
writes for herself (her knowing journaling), tagged by source (dream/manual)
- web: /journal diary view (kind filters, grouped by day) + /journal/data;
linked from /self
- tests assert reflections + metacognition land in the journal
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
You couldn't see her actually correct herself — /self showed only the result.
Now:
- reflect() logs the draft, the revised/committed version, and the self-critique
to the live log as an expandable "view details" block
- POST /self/reflect runs a reflection in the web process so it lands in /logs
live (reflections normally run in the dream process, whose logs only go to
journald); "↻ Reflect now" button on /self triggers it, with a logs ↗ link
- log viewers relabel the expander "view full prompt" -> "view details" (it now
carries prompts and reflection diffs)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A pull-up-anytime view of Lyra's interiority, so her thoughts aren't buried in
a DB blob. Mobile-first, auto-refreshing every 12s (and on tab focus).
- GET /self serves the page; GET /self/state returns her self-state + the
timestamp it last changed
- shows: current mood + feeling meters (valence/energy/confidence/curiosity),
her drives as bars, her self-narrative, the relationship line, and the
reflections list (newest first), plus cycle/reflection counters and "last
cycle Xm ago"
- memory.self_state_updated_at(): when her mind last changed
- index.html: "🧠 Mind" button opens /self
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The inline log panel is cramped, especially on mobile. Add a standalone
mobile-first log page and serve the chat server under systemd like the dream
loop (the nohup process didn't survive cleanly).
- static/logs.html: full-page live log — level filter chips, text search,
pause/resume with buffering, autoscroll toggle, color-coded levels, and the
expandable "view full prompt" block (where the now-note is visible in context)
- server: GET /logs serves the page (FileResponse)
- index.html: "⛶ Full Log" button opens /logs in a new tab
- deploy/lyra-web.service: user service so the chat server is reboot-resilient
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The MI50 box (CT202) runs an OpenAI-compatible llama.cpp server on
10.0.0.44:8080. Wire it in as a third backend:
- llm.complete gains backend="mi50" (OpenAI client pointed at MI50_BASE_URL)
- config: MI50_BASE_URL (default http://10.0.0.44:8080/v1) + MI50_MODEL
- chat.respond labels the model per backend; web _backend_for maps "mi50"
- UI backend selector adds "MI50 — local GPU"
Verified end-to-end: llm.complete(backend="mi50") returns from the live server.
See homelab-inference memory for the box topology.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Older sessions fade to a general idea; details stay retrievable.
- memory: summaries table (one compacted gist per session, embedded), plus
store_summary/get_summary/recall_summaries and unsummarized_count (tracks
exchanges newer than the current summary)
- lyra/summary.py: summarize_session compacts a session's raw turns into a
third-person gist (default SUMMARY_BACKEND=local, so compaction is free);
maybe_summarize re-summarizes once SUMMARIZE_AFTER new turns accumulate
- chat.build_messages now layers context in tiers: persona -> gists of other
sessions -> a few sharp raw cross-session details -> current session raw
turns -> new message; respond() compacts the session after each turn
- web: POST /sessions/{id}/summarize to compact on demand
- summarization activity surfaces in the live log
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Turn the inert "Show Work" thinking panel into a real live activity log:
- lyra/logbus.py: thread-safe in-memory ring buffer other modules publish to
- chat.respond logs backend/model/embed per turn, recall counts, reply size;
web layer logs chat errors
- server: replace the keep-alive /stream/thinking stub with /stream/logs, an
SSE endpoint that replays the recent buffer then streams new events
- UI: repurpose the panel as a global "Live Log" — connects on load, renders
level/time/msg/fields, drops the old per-session localStorage + dead popup
Every turn now shows its backend + model in-app, so local-vs-cloud (free vs
paid) is visible at a glance.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Phase 1 — persona + persistent memory chat loop:
- lyra/persona.py + personas/lyra.md: editable identity/voice (friend-first,
honest, never invents poker math)
- lyra/chat.py: turn loop assembling persona + cross-session recall + recent
context, persisting both sides to SQLite
- lyra/session.py, lyra/__main__.py: session lifecycle + `lyra` REPL
Phase 1.25 — reuse the old web UI:
- vendored the prior single-page UI into lyra/web/static, repointed to
same-origin
- lyra/web/server.py (FastAPI): serves the UI and backs its endpoint contract
(/v1/chat/completions, session CRUD, health, inert thinking-stream) with the
new chat loop + memory; SQLite stays the single source of truth
- `lyra-web` console script
Local backends — test for free, no OpenAI key:
- llm.embed routes via EMBED_BACKEND (cloud=OpenAI, local=Ollama /api/embed)
- simplified UI backend selector to Local (Ollama) / Cloud (OpenAI), default local
- memory connection opened check_same_thread=False for the threaded server
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>