project-lyra

Author	SHA1	Message	Date
serversdown	5176c706b6	feat: thought loop — Lyra's threaded, surfaceable train of thought Built from her own 6-19 idea: a continuing train of thought she keeps across days, organized into threads she returns to, that she can bring TO Brian and that his feedback advances or closes. Where the dream cycle's reflect() gives isolated, overwriting reflections, the thought loop adds continuity (threads), surfacing (#6 — she leads with a thought when Brian returns after a gap), and a feedback loop (his reply folds in next pass). - lyra/thoughts.py: thought_threads + thoughts tables; think() with new/continue/respond modes; salience-gated maybe_surface(); record_response() feedback; lazy-schema _c() mirroring poker. - dream.py: curiosity stage advances the loop after reflecting (error-isolated). - chat.py: build_messages surfaces the top thread after a >=90min gap, once. - web: /thoughts feed (page + data + respond + status routes), thoughts.html, nav 💭 entry. lyra-think entry point. Every thought also lands in her journal. - clock.gap_seconds(); tests/test_thoughts.py (8 tests). Full suite 58 passing. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-21 07:05:15 +00:00
serversdown	5c41bd48d1	fix: consolidation no longer stalls or breaks the live chat turn Two bugs surfacing in the log during live play: - SUMMARY_BACKEND=mi50 (llama.cpp, 32B) was fed 24k-char chunks → "Context size has been exceeded". Chunk budget is now backend-aware: cloud 24k, local/mi50 8k, and the merge step recurses so merged partials never overflow either. - maybe_summarize ran inline in the chat turn and retried 4× with backoff (~30s), stalling the reply and surfacing the error. It now runs in a background daemon thread, swallows errors (consolidation is best-effort maintenance), and dedupes so at most one summary per session runs at a time. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-20 04:37:17 +00:00
serversdown	974ee33f71	feat: live mental-game rituals in Cash mode Brian's own rituals (mined from his logs) become first-class, live tools instead of post-hoc recap sections: - Scar Note — instructive mistakes with the punt/cooler/standard distinction. - Confidence Bank — good process, banked regardless of result. - Alligator Blood — invokable adversity state; she suggests it when he's card-dead/short/stuck, and her coaching register shifts while it's on (live state injected into context per-turn via chat._mode_state_note). - Reset — tilt circuit-breaker; mental marker only, stats stay continuous. poker_rituals table + log_ritual/list_rituals/set_alligator/alligator_active; 4 tools added to the Cash toolset and taught in the mode card; HUD gains a 🐊 banner + Confidence Bank + Scar Notes panels; recap grounded via _rituals_block. tests/test_modes.py +5 ritual tests; 41 green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-19 06:24:28 +00:00
serversdown	dfb6425395	feat: session modes (Talk/Cash) + live session HUD Lyra now switches register based on what she's doing at the table instead of being a wishy-washy companion mid-session. Modes (lyra/modes.py): - Talk (default companion) + Cash (live cash copilot); a mode = prompt card + tool allow-list. Tool gating via tools.specs(allow=). - Two-register Cash voice: act-first one-line logging when fed facts; full warm companion voice for strategy / tilt / mental game. - mode persisted per chat session (new sessions.mode column); auto-switch into Cash when start_session fires; UI forces cloud backend in Cash (tools only fire there). Stack tracking + HUD: - log_stack tool + poker_stack_log table; live net while sitting (stack - buy-in). - poker.hud() bundle; /session HUD page (stack sparkline, hands, villains, notes, stats) polling /session/data every 5s; Talk/Cash switcher + Session nav. Endpoints: /session, /session/data, GET/POST /sessions/{id}/mode, /modes. tests/test_modes.py (gating, mode roundtrip, stack/HUD); 36 tests green. v0.3.0. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-19 05:28:15 +00:00
serversdown	5dc3fa17d7	feat(web): stream chat replies token-by-token (M3) - llm.chat_call_stream: streaming generator for all 3 backends (Ollama NDJSON, OpenAI/MI50 SSE), accumulating tool-call fragments by index. - chat.respond_stream: mirrors respond()'s tool loop and persistence/compaction, yielding ("delta", text) / ("tool", name) / ("done", reply). - POST /v1/chat/stream: SSE endpoint; blocking generator bridged to async via a worker thread + asyncio.Queue. Old completions endpoint kept as fallback. - Client streams into a live bubble with a blinking caret; rAF-throttled render (no full re-parse per token) and instant scroll during stream — fixes iOS Safari ghosting from per-token smooth-scroll. Falls back to the blocking endpoint only if nothing streamed (no double-persist). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-19 00:06:51 +00:00
serversdown	03620e1a64	feat(web): cloud chat-model selector in Settings Pick which OpenAI model answers on the Cloud backend (gpt-4o / -mini / 4.1 / 4.1-mini / o4-mini, or Default). Persisted in localStorage, sent as `model` in the chat request; respond() applies it only on the cloud backend (local/mi50 keep their fixed models). Reachable from desktop + mobile via Settings. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-18 18:55:45 +00:00
serversdown	ac04ad1df6	fix: only send tools to backends that support them (cloud) The MI50 llama.cpp server 500s on the `tools` param unless launched with --jinja, so sending tools to mi50 broke chat on that backend. Gate tools to TOOL_BACKENDS={"cloud"} for now; mi50 chat works again (just without tools). Add "mi50" once its server runs with --jinja. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-17 20:52:47 +00:00
serversdown	a5477ae15c	feat: tool use — Lyra's first real actions (journal_write, note) She can now do things mid-conversation, not just reply. Adds a tool-calling loop to the chat path and her first two tools; the same mechanism will carry the poker tools (start_session, log_result, get_stats, solver) next. - tools.py: registry of OpenAI-style tool specs + handlers + safe dispatch; journal_write (knowing journaling) and note (tagged notepad, e.g. poker reads) - llm.chat_call(): OpenAI-style call that returns tool_calls (cloud/mi50); local has no tool support and returns plain content - chat.respond(): tool loop — offer tools, run any calls, feed results back, repeat until a text reply (capped at MAX_TOOL_ROUNDS); persists final reply - tests: dispatch + full chat loop (tool call -> result -> reply) Verified live: she invoked `note`, tagged it 'poker', stored a villain read. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-17 19:04:34 +00:00
serversdown	2d44457b96	fix: gists show the conversation's real date, not the summarize-run date Summaries displayed s.created_at (set to now() at summarize time), so every imported gist read 2026-06-16. Derive the actual session date from the earliest exchange timestamp (MIN(created_at) per session — the preserved original date, same source the era rollups use) via a correlated subquery in the summary readers. New Summary.session_started_at field; chat shows it (falling back to created_at). No schema change / backfill needed — always correct from source. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-17 04:23:14 +00:00
serversdown	1e17d46c78	feat: time awareness — Lyra perceives 'now' and how long it's been She had no clock: current date/time and the gap since Brian last spoke were invisible between turns, and reflection was timeless. Now: - lyra/clock.py: wall-clock stamp + coarse human gaps ("3 days") - chat: inject a 'now' note (date/time + gap since last turn) after her self-state — when she is, before the world - reflect(): feed current time + silence gap into reflection, neutrally — prompt invites her to weigh elapsed time "to whatever degree it genuinely affects you" (no prescribed feeling; whether silence means anything is left to emerge) - memory.last_exchange_at(): timestamp of the most recent exchange Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-17 02:31:40 +00:00
serversdown	f3530cf4ae	feat: separate CHAT_MODEL (gpt-4o) for persona fidelity Mid-size models (gpt-4o-mini, qwen2.5-14b) resist persona instructions — help-desk closers and feelings-disclaimers leak through regardless. Route live chat to a stronger model while keeping bulk consolidation cheap: - config: CHAT_MODEL (default gpt-4o), distinct from CLOUD_MODEL (gpt-4o-mini) - llm.complete gains a `model` override; chat.respond uses chat_model on cloud, consolidation paths keep cloud_model - persona: reword the "no sign-off" rule so genuine questions are welcome and only reflexive customer-service closers are discouraged Verified: on gpt-4o she owns her mood without disclaimers and drops most help-desk tails — clearly more in-character than mini/qwen. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-16 21:05:47 +00:00
serversdown	ac505243a0	feat: Autonomy Core v1 — Lyra's evolving self-state Give Lyra a model of herself (vs the profile/narrative which model Brian): - persona: a real origin/identity — she's an AI and knows it (Bender/C-3PO style), with the Cortex/NeoMem lineage as her actual past, so "how were you made" stops falling through to generic-assistant deflection. - memory: self_state table (JSON blob) + get/set_self_state. - lyra/self_state.py: evolving first-person inner state (mood, valence, energy, confidence, curiosity, self_narrative, relationship, reflections). render_for_context injects it; reflect() updates it from recent activity. `lyra-reflect`. - chat.build_messages injects her interiority right after the persona — she speaks from a continuous self, not a reset. The state -> behavior -> reflection -> updated state loop is the substrate for the emergence experiment. Verified: reflection shifted mood curious->reflective and produced genuine first-person self-observations. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-16 20:36:33 +00:00
serversdown	bfb81428ab	feat: era-rollup + narrative engine (consolidation steps 3-4) Complete the consolidation pipeline: summaries -> profile + eras -> narrative. - memory: eras table (per-month digests) + Era, summaries_by_month, store_era, list_eras, recall_eras; narrative table + set/get_narrative - lyra/era.py (lyra-era): groups session gists by the month the session occurred (real timestamps) and map-reduces each month into a "what was happening" digest - lyra/narrative.py (lyra-narrative): distills profile + recent eras into the current arc/trends/callbacks ("remember when…", "you're trending toward…") - chat.build_messages injects the narrative alongside the profile Verified on the real corpus: 17 monthly eras (Dec 2024-Jun 2026) + a narrative that surfaces specific callbacks (the $573 Hollywood session, 4 years sober). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-16 19:28:01 +00:00
serversdown	30185f3fd8	feat: MI50 as a Lyra backend (OpenAI-compatible local GPU) The MI50 box (CT202) runs an OpenAI-compatible llama.cpp server on 10.0.0.44:8080. Wire it in as a third backend: - llm.complete gains backend="mi50" (OpenAI client pointed at MI50_BASE_URL) - config: MI50_BASE_URL (default http://10.0.0.44:8080/v1) + MI50_MODEL - chat.respond labels the model per backend; web _backend_for maps "mi50" - UI backend selector adds "MI50 — local GPU" Verified end-to-end: llm.complete(backend="mi50") returns from the live server. See homelab-inference memory for the box topology. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-16 05:37:22 +00:00
serversdown	ecf0b852f9	feat: profile layer — semantic memory (consolidation step 2) Derive a standing profile of the user from session gists and inject it into every prompt, so identity/abstract questions ("what kind of player am I", "what are my leaks") are answered from distilled knowledge instead of noisy single-vector recall (which finds passages, not patterns). - memory: profile table + get/set_profile, list_summaries - lyra/profile.py: rebuild_profile map-reduces all gists (batch -> extract durable facts -> fold-merge) into one profile doc; `lyra-profile` CLI - chat.build_messages injects "What you know about Brian" after the persona Run after lyra-summarize (needs gists). Verified (stubbed): map-reduce, storage, and prompt injection. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-16 04:11:19 +00:00
serversdown	236a16b331	feat: inspect the full prompt in the live log The "context built" event now carries the fully-rendered prompt (persona, gists, recalled details, recent turns, the new message) plus a total char count. The log panel renders it as a collapsed "view full prompt" block — clean by default, one click to see exactly what hit the model. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-15 23:52:35 +00:00
serversdown	d7c258eba0	feat: tiered, compacting memory (phase 1.5) Older sessions fade to a general idea; details stay retrievable. - memory: summaries table (one compacted gist per session, embedded), plus store_summary/get_summary/recall_summaries and unsummarized_count (tracks exchanges newer than the current summary) - lyra/summary.py: summarize_session compacts a session's raw turns into a third-person gist (default SUMMARY_BACKEND=local, so compaction is free); maybe_summarize re-summarizes once SUMMARIZE_AFTER new turns accumulate - chat.build_messages now layers context in tiers: persona -> gists of other sessions -> a few sharp raw cross-session details -> current session raw turns -> new message; respond() compacts the session after each turn - web: POST /sessions/{id}/summarize to compact on demand - summarization activity surfaces in the live log Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-15 18:52:58 +00:00
serversdown	84c4f75e03	feat: in-app live log (SSE activity feed) Turn the inert "Show Work" thinking panel into a real live activity log: - lyra/logbus.py: thread-safe in-memory ring buffer other modules publish to - chat.respond logs backend/model/embed per turn, recall counts, reply size; web layer logs chat errors - server: replace the keep-alive /stream/thinking stub with /stream/logs, an SSE endpoint that replays the recent buffer then streams new events - UI: repurpose the panel as a global "Live Log" — connects on load, renders level/time/msg/fields, drops the old per-session localStorage + dead popup Every turn now shows its backend + model in-app, so local-vs-cloud (free vs paid) is visible at a glance. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-15 18:45:05 +00:00
serversdown	3b9e0bb1e0	feat: persona chat loop, web UI, and local (Ollama) embeddings Phase 1 — persona + persistent memory chat loop: - lyra/persona.py + personas/lyra.md: editable identity/voice (friend-first, honest, never invents poker math) - lyra/chat.py: turn loop assembling persona + cross-session recall + recent context, persisting both sides to SQLite - lyra/session.py, lyra/__main__.py: session lifecycle + `lyra` REPL Phase 1.25 — reuse the old web UI: - vendored the prior single-page UI into lyra/web/static, repointed to same-origin - lyra/web/server.py (FastAPI): serves the UI and backs its endpoint contract (/v1/chat/completions, session CRUD, health, inert thinking-stream) with the new chat loop + memory; SQLite stays the single source of truth - `lyra-web` console script Local backends — test for free, no OpenAI key: - llm.embed routes via EMBED_BACKEND (cloud=OpenAI, local=Ollama /api/embed) - simplified UI backend selector to Local (Ollama) / Cloud (OpenAI), default local - memory connection opened check_same_thread=False for the threaded server Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-15 18:36:31 +00:00

19 Commits
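
Commit 5c41bd48d1 describes a backend-aware chunk budget (cloud 24k, local/mi50 8k characters) and a merge step that recurses so merged partials never overflow. The sketch below is one way that idea could look, not the repository's actual code: `CHUNK_BUDGET`, `split_chunks`, `summarize_text`, and the `summarize` callable are hypothetical names introduced for illustration, and only the budget figures come from the commit message.

```python
from typing import Callable, List

# Character budgets per backend, taken from the commit message.
# The dict itself and its name are assumptions for this sketch.
CHUNK_BUDGET = {"cloud": 24_000, "local": 8_000, "mi50": 8_000}


def split_chunks(text: str, budget: int) -> List[str]:
    """Split text into consecutive pieces of at most `budget` characters."""
    return [text[i:i + budget] for i in range(0, len(text), budget)] or [""]


def summarize_text(text: str, backend: str,
                   summarize: Callable[[str], str]) -> str:
    """Summarize `text` without ever passing more than the backend's budget.

    `summarize` stands in for the real model call (hypothetical).
    """
    budget = CHUNK_BUDGET[backend]
    if len(text) <= budget:
        return summarize(text)

    # Map step: summarize each chunk on its own.
    partials = [summarize(chunk) for chunk in split_chunks(text, budget)]
    merged = "\n".join(partials)

    # Guard: if the summarizer failed to shrink the text, recursing would
    # never terminate, so fall back to a truncated single call.
    if len(merged) >= len(text):
        return summarize(merged[:budget])

    # Merge step: recurse, so oversized merged partials are chunked again.
    return summarize_text(merged, backend, summarize)
```

Because every call site goes through the same budget check, the merge pass is bounded in the same way as the first pass, which is the property the commit message attributes to the fix.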