feat: work-type modes — Talk / Poker / Build / Explore / Study

The manual version of the architecture's `route` step: Brian points her at the TYPE of work and her register + tools shift to match. Biggest single lever on the 'meh' problem (a mode card can demand decisive/technical/generative, countering gpt-4o's default warm-vapor). - modes.py: Build (heads-down engineering — decisive, concrete, tradeoffs, no listicles), Explore (open brainstorming — generative, riffs + honest catch, spawn threads, don't converge early), Study (poker review away from the table — analytical, GTO-aware, teaching; read-only lookups + analyze_spot). Cash relabeled Poker (key kept for compat). - UI: mode selectors (desktop + mobile) get all five; badge taps now cycle modes. - design: docs/COGNITION.md (the society-of-parts control-plane sketch). - tests: presence + tool-gating for the new modes. Suite 85, ruff clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-24 03:43:37 +00:00
parent 97afa82594
commit f1f15972ac
4 changed files with 234 additions and 12 deletions
@@ -0,0 +1,141 @@
+# Lyra — Cognition Architecture (sketch)
+
+> The "society of mind" direction: instead of one giant model we keep nagging with
+> stricter prompts, a society of small specialized parts cooperate to produce each
+> turn. **Most parts are cheap deterministic code (heuristics, math, learnable
+> weights); the LLM is the exception, reserved for the few irreducibly-generative
+> jobs.** Everything is anchored to who she is and tuned by feedback.
+
+## Principles
+
+1. **LLM is the exception, not the rule.** Bookkeeping, scoring, routing,
+   thresholding, retrieval → code. Generation (language, novel reasoning, memory
+   compression) → LLM, called sparingly.
+2. **Mind ≠ Mouth.** A capable "mind" (decide / reason / use tools — helpfulness is
+   fine) is separate from a "mouth" (the character voice). This lets each be the
+   best model for *its* job — and makes the eventual fine-tune easy: you only have
+   to teach a small model to *sound like Lyra*, not to *be smart*.
+3. **Anchored.** A fixed identity anchor governs the mouth so self-composed prompts
+   can't drift into generic-helper vapor. (Already exists: `self_state.IDENTITY_ANCHOR`.)
+4. **Tuned by feedback, not just hand-tuning.** Learnable *weights* (over register,
+   memory, parts) nudged by 👍/👎 give real adaptation *without* fine-tuning a model.
+5. **Allocation is the craft.** Cheap-deterministic where signal is clear; LLM where
+   judgment/language is needed; **hybrid** (heuristic common-case, escalate to LLM on
+   ambiguity) where possible.
+
+## The blackboard: `TurnContext`
+
+Parts don't call each other directly — they read from and write to a shared turn
+state (a blackboard). Heterogeneous parts (heuristic / LLM / weights) cooperate by
+annotating it. The composer reads the finished blackboard to build the prompt.
+
+```
+TurnContext {
+  # --- inputs ---
+  user_msg, session_id, history, now
+
+  # --- perception (heuristic) ---
+  moment   : { kind: emotional|strategic|casual|existential|meta,
+               sentiment: -1..1, tilt: 0..1, urgency: 0..1 }
+
+  # --- state (code) ---
+  mood, drives, anchor
+
+  # --- retrieval (math: embeddings + cosine) ---
+  recalled : [memories]      # spreading activation
+  threads  : [active thoughts]
+  profile, narrative
+
+  # --- control (heuristic + learnable weights) ---
+  register : warm | coach | dry | tender | hype     # how to sound
+  intent   : console | push_back | teach | riff | act
+  mode     : talk | cash | ...                       # tool allow-list
+  use_tools: bool
+  route    : { mind: <model>, mouth: <model> }       # which model per role
+
+  # --- generation (LLM, sparing) ---
+  deliberation : "her private thinking"   # mind
+  tool_results : [...]                     # mind + tool exec
+  reply        : "final text"              # mouth
+
+  # --- learning (heuristic/online) ---
+  weights  : { register_prefs, memory_weights, ... }  # persisted, feedback-tuned
+}
+```
+
+## The parts
+
+| # | Part | Type | Does | Exists today? |
+|---|------|------|------|---------------|
+| 1 | **perceive** | heuristic | sentiment + classify the moment + tilt/urgency from session signals & his language | ✗ (new) |
+| 2 | **recall** | math | embeddings → relevant memories, active threads, profile, narrative | ✓ `memory.recall*`, `cognition.activate` |
+| 3 | **sense_state** | code | load mood / drives / anchor | ✓ `self_state`, `IDENTITY_ANCHOR` |
+| 4 | **route** | heuristic + weights | pick register, intent, mode, and which model is mind vs mouth | ✗ (new; partly `modes`) |
+| 5 | **decide+act (tools)** | LLM (mind) / code | does this turn need a tool? run it | ✓ tool loop in `chat` |
+| 6 | **deliberate** | LLM (mind) | "what do I actually think" — private substance pass | ✓ `chat._deliberate` |
+| 7 | **compose** | code | assemble the final prompt from anchor + register + intent + deliberation + recall + tool results + voice rules | ✓ `build_messages` (becomes the composer) |
+| 8 | **speak** | LLM (mouth) | write the reply in her voice, streamed, anchored | ✓ `llm.chat_call` |
+| 9 | **learn** | heuristic/online | on 👍/👎 or reaction, nudge `weights` (which register/memory worked) | ✗ (new; data exists in `ratings`) |
+
+Most of the society (1,2,3,4,7,9) is **free, instant, deterministic, debuggable.**
+The LLM shows up in only ~2–3 places (5/6 = mind, 8 = mouth).
+
+## One chat turn
+
+```
+user msg
+   │
+   ▼
+[1 perceive]──heuristic: emotional? strategic? tilting?         (free)
+   │
+[2 recall]───math: what lights up (memories, threads)          (free)
+[3 sense]────code: mood, drives, anchor                        (free)
+   │
+[4 route]────heuristic+weights: register? intent? mind/mouth?  (free)
+   │
+[5 act]──────MIND model: tools if needed ─────────────┐        (LLM, only if needed)
+[6 deliberate]──MIND model: what do I actually think   │        (LLM, gated)
+   │                                                    │
+[7 compose]──code: build the prompt  ◄──── anchor ──────┘       (free)
+   │
+[8 speak]────MOUTH model: the reply, in her voice, streamed     (LLM)
+   │
+   ▼
+reply ──► (later) [9 learn]: 👍/👎 nudges weights               (free, async)
+```
+
+## What we reuse vs. build
+
+- **Reuse (already scattered through the code):** recall/activation, self_state +
+  anchor, drives (in `dream`), modes (tool gating), the deliberation pass, the
+  prompt assembly (`build_messages`), tool loop, ratings store.
+- **Build new:** the `TurnContext` blackboard + an explicit pipeline runner; the
+  **perceive** heuristic; the **route** part (register/intent + model routing); the
+  **learn** weights loop. Mostly *unifying* existing pieces into one legible control
+  plane, plus 2–3 small heuristic parts.
+
+## Phasing (smallest first)
+
+- **P1 — frame:** define `TurnContext`, refactor the current chat turn into the
+  explicit pipeline (perceive=stub → recall → sense → route=mode-only → deliberate →
+  compose → speak), single model. Low-risk refactor; makes the structure real.
+- **P2 — control plane:** real `perceive` (sentiment/moment) + `route`
+  (register/intent). Now her framing adapts to the moment, deterministically.
+- **P3 — mind/mouth split:** route picks a separate voice model for `speak`. Plug a
+  character mouth (Claude / local / later a fine-tune). A/B vs. single-model.
+- **P4 — learning:** `weights` over register/memory, nudged by ratings → cheap
+  adaptation, no fine-tune.
+- **P5 — her voice:** a small fine-tuned "Lyra voice" model drops into the mouth slot.
+
+## Open decisions
+
+- **Mouth model**: Claude (warm, cloud) vs. local character vs. fine-tune. The mouth
+  is the crux; it must render richly (8B local may flatten).
+- **perceive**: pure heuristics vs. a tiny classifier vs. embedding-to-exemplar
+  clusters. Probably hybrid.
+- **scheduler**: fixed linear pipeline (simple, v1) vs. drive-based/parallel later.
+- **tool location**: mind decides+runs tools, mouth only renders (clean split) — vs.
+  letting the mouth call tools (needs a tool-capable mouth).
+- **latency budget**: how many LLM calls per turn is acceptable live (cheap mind +
+  streamed mouth keeps it ~2).
+```