Files

T

serversdown f1f15972ac feat: work-type modes — Talk / Poker / Build / Explore / Study

The manual version of the architecture's `route` step: Brian points her at the
TYPE of work and her register + tools shift to match. Biggest single lever on the
'meh' problem (a mode card can demand decisive/technical/generative, countering
gpt-4o's default warm-vapor).

- modes.py: Build (heads-down engineering — decisive, concrete, tradeoffs, no
  listicles), Explore (open brainstorming — generative, riffs + honest catch,
  spawn threads, don't converge early), Study (poker review away from the table —
  analytical, GTO-aware, teaching; read-only lookups + analyze_spot). Cash relabeled
  Poker (key kept for compat).
- UI: mode selectors (desktop + mobile) get all five; badge taps now cycle modes.
- design: docs/COGNITION.md (the society-of-parts control-plane sketch).
- tests: presence + tool-gating for the new modes. Suite 85, ruff clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-06-24 03:43:37 +00:00

7.1 KiB

Raw Blame History

Lyra — Cognition Architecture (sketch)

The "society of mind" direction: instead of one giant model we keep nagging with stricter prompts, a society of small specialized parts cooperate to produce each turn. Most parts are cheap deterministic code (heuristics, math, learnable weights); the LLM is the exception, reserved for the few irreducibly-generative jobs. Everything is anchored to who she is and tuned by feedback.

Principles

LLM is the exception, not the rule. Bookkeeping, scoring, routing, thresholding, retrieval → code. Generation (language, novel reasoning, memory compression) → LLM, called sparingly.
Mind ≠ Mouth. A capable "mind" (decide / reason / use tools — helpfulness is fine) is separate from a "mouth" (the character voice). This lets each be the best model for its job — and makes the eventual fine-tune easy: you only have to teach a small model to sound like Lyra, not to be smart.
Anchored. A fixed identity anchor governs the mouth so self-composed prompts can't drift into generic-helper vapor. (Already exists: self_state.IDENTITY_ANCHOR.)
Tuned by feedback, not just hand-tuning. Learnable weights (over register, memory, parts) nudged by 👍/👎 give real adaptation without fine-tuning a model.
Allocation is the craft. Cheap-deterministic where signal is clear; LLM where judgment/language is needed; hybrid (heuristic common-case, escalate to LLM on ambiguity) where possible.

The blackboard: `TurnContext`

Parts don't call each other directly — they read from and write to a shared turn state (a blackboard). Heterogeneous parts (heuristic / LLM / weights) cooperate by annotating it. The composer reads the finished blackboard to build the prompt.

TurnContext {
  # --- inputs ---
  user_msg, session_id, history, now

  # --- perception (heuristic) ---
  moment   : { kind: emotional|strategic|casual|existential|meta,
               sentiment: -1..1, tilt: 0..1, urgency: 0..1 }

  # --- state (code) ---
  mood, drives, anchor

  # --- retrieval (math: embeddings + cosine) ---
  recalled : [memories]      # spreading activation
  threads  : [active thoughts]
  profile, narrative

  # --- control (heuristic + learnable weights) ---
  register : warm | coach | dry | tender | hype     # how to sound
  intent   : console | push_back | teach | riff | act
  mode     : talk | cash | ...                       # tool allow-list
  use_tools: bool
  route    : { mind: <model>, mouth: <model> }       # which model per role

  # --- generation (LLM, sparing) ---
  deliberation : "her private thinking"   # mind
  tool_results : [...]                     # mind + tool exec
  reply        : "final text"              # mouth

  # --- learning (heuristic/online) ---
  weights  : { register_prefs, memory_weights, ... }  # persisted, feedback-tuned
}

The parts

#	Part	Type	Does	Exists today?
1	perceive	heuristic	sentiment + classify the moment + tilt/urgency from session signals & his language	✗ (new)
2	recall	math	embeddings → relevant memories, active threads, profile, narrative	✓ `memory.recall*`, `cognition.activate`
3	sense_state	code	load mood / drives / anchor	✓ `self_state`, `IDENTITY_ANCHOR`
4	route	heuristic + weights	pick register, intent, mode, and which model is mind vs mouth	✗ (new; partly `modes`)
5	decide+act (tools)	LLM (mind) / code	does this turn need a tool? run it	✓ tool loop in `chat`
6	deliberate	LLM (mind)	"what do I actually think" — private substance pass	✓ `chat._deliberate`
7	compose	code	assemble the final prompt from anchor + register + intent + deliberation + recall + tool results + voice rules	✓ `build_messages` (becomes the composer)
8	speak	LLM (mouth)	write the reply in her voice, streamed, anchored	✓ `llm.chat_call`
9	learn	heuristic/online	on 👍/👎 or reaction, nudge `weights` (which register/memory worked)	✗ (new; data exists in `ratings`)

Most of the society (1,2,3,4,7,9) is free, instant, deterministic, debuggable. The LLM shows up in only ~2–3 places (5/6 = mind, 8 = mouth).

One chat turn

user msg
   │
   ▼
[1 perceive]──heuristic: emotional? strategic? tilting?         (free)
   │
[2 recall]───math: what lights up (memories, threads)          (free)
[3 sense]────code: mood, drives, anchor                        (free)
   │
[4 route]────heuristic+weights: register? intent? mind/mouth?  (free)
   │
[5 act]──────MIND model: tools if needed ─────────────┐        (LLM, only if needed)
[6 deliberate]──MIND model: what do I actually think   │        (LLM, gated)
   │                                                    │
[7 compose]──code: build the prompt  ◄──── anchor ──────┘       (free)
   │
[8 speak]────MOUTH model: the reply, in her voice, streamed     (LLM)
   │
   ▼
reply ──► (later) [9 learn]: 👍/👎 nudges weights               (free, async)

What we reuse vs. build

Reuse (already scattered through the code): recall/activation, self_state + anchor, drives (in dream), modes (tool gating), the deliberation pass, the prompt assembly (build_messages), tool loop, ratings store.
Build new: the TurnContext blackboard + an explicit pipeline runner; the perceive heuristic; the route part (register/intent + model routing); the learn weights loop. Mostly unifying existing pieces into one legible control plane, plus 2–3 small heuristic parts.

Phasing (smallest first)

P1 — frame: define TurnContext, refactor the current chat turn into the explicit pipeline (perceive=stub → recall → sense → route=mode-only → deliberate → compose → speak), single model. Low-risk refactor; makes the structure real.
P2 — control plane: real perceive (sentiment/moment) + route (register/intent). Now her framing adapts to the moment, deterministically.
P3 — mind/mouth split: route picks a separate voice model for speak. Plug a character mouth (Claude / local / later a fine-tune). A/B vs. single-model.
P4 — learning: weights over register/memory, nudged by ratings → cheap adaptation, no fine-tune.
P5 — her voice: a small fine-tuned "Lyra voice" model drops into the mouth slot.

Open decisions

Mouth model: Claude (warm, cloud) vs. local character vs. fine-tune. The mouth is the crux; it must render richly (8B local may flatten).
perceive: pure heuristics vs. a tiny classifier vs. embedding-to-exemplar clusters. Probably hybrid.
scheduler: fixed linear pipeline (simple, v1) vs. drive-based/parallel later.
tool location: mind decides+runs tools, mouth only renders (clean split) — vs. letting the mouth call tools (needs a tool-capable mouth).
latency budget: how many LLM calls per turn is acceptable live (cheap mind + streamed mouth keeps it ~2).

7.1 KiB Raw Blame History Unescape Escape