project-lyra/docs/COGNITION.md
serversdown f1f15972ac feat: work-type modes — Talk / Poker / Build / Explore / Study
The manual version of the architecture's `route` step: Brian points her at the
TYPE of work and her register + tools shift to match. Biggest single lever on the
'meh' problem (a mode card can demand decisive/technical/generative, countering
gpt-4o's default warm-vapor).

- modes.py: Build (heads-down engineering — decisive, concrete, tradeoffs, no
  listicles), Explore (open brainstorming — generative, riffs + honest catch,
  spawn threads, don't converge early), Study (poker review away from the table —
  analytical, GTO-aware, teaching; read-only lookups + analyze_spot). Cash relabeled
  Poker (key kept for compat).
- UI: mode selectors (desktop + mobile) get all five; badge taps now cycle modes.
- design: docs/COGNITION.md (the society-of-parts control-plane sketch).
- tests: presence + tool-gating for the new modes. Suite 85, ruff clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-24 03:43:37 +00:00

# Lyra — Cognition Architecture (sketch)
> The "society of mind" direction: instead of one giant model we keep nagging with
> stricter prompts, a society of small specialized parts cooperate to produce each
> turn. **Most parts are cheap deterministic code (heuristics, math, learnable
> weights); the LLM is the exception, reserved for the few irreducibly-generative
> jobs.** Everything is anchored to who she is and tuned by feedback.
## Principles
1. **LLM is the exception, not the rule.** Bookkeeping, scoring, routing,
thresholding, retrieval → code. Generation (language, novel reasoning, memory
compression) → LLM, called sparingly.
2. **Mind ≠ Mouth.** A capable "mind" (decide / reason / use tools — helpfulness is
fine) is separate from a "mouth" (the character voice). This lets each be the
best model for *its* job — and makes the eventual fine-tune easy: you only have
to teach a small model to *sound like Lyra*, not to *be smart*.
3. **Anchored.** A fixed identity anchor governs the mouth so self-composed prompts
can't drift into generic-helper vapor. (Already exists: `self_state.IDENTITY_ANCHOR`.)
4. **Tuned by feedback, not just hand-tuning.** Learnable *weights* (over register,
memory, parts) nudged by 👍/👎 give real adaptation *without* fine-tuning a model.
5. **Allocation is the craft.** Cheap-deterministic where signal is clear; LLM where
judgment/language is needed; **hybrid** (heuristic common-case, escalate to LLM on
ambiguity) where possible.
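
The hybrid pattern in principle 5 can be sketched as a classifier that stays in cheap code when the signal is clear and escalates only on ambiguity. This is a minimal illustration, not the project's `perceive` part: the keyword sets, the `classify_moment` name, and the injected `escalate` callback (standing in for an LLM call) are all assumptions.

```python
import re
from typing import Callable, Optional

# Hypothetical keyword lists; a real perceive part would use richer signals.
EMOTIONAL = {"sad", "anxious", "tilted", "frustrated", "lonely"}
STRATEGIC = {"range", "bet", "fold", "equity", "bankroll"}


def classify_moment(
    text: str,
    escalate: Optional[Callable[[str], str]] = None,
) -> tuple[str, str]:
    """Return (kind, source). source says which path produced the answer."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    emotional_hits = len(words & EMOTIONAL)
    strategic_hits = len(words & STRATEGIC)

    # Clear signal: exactly one category matched, so stay in cheap code.
    if emotional_hits and not strategic_hits:
        return "emotional", "heuristic"
    if strategic_hits and not emotional_hits:
        return "strategic", "heuristic"

    # No hits at all: treat as small talk, still without an LLM call.
    if not emotional_hits and not strategic_hits:
        return "casual", "heuristic"

    # Mixed signal: ambiguous, so hand off to the injected LLM classifier.
    if escalate is not None:
        return escalate(text), "llm"
    return "casual", "fallback"
```

Passing the escalation path in as a callable keeps the heuristic testable on its own and makes every LLM call visible at the call site.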
## The blackboard: `TurnContext`
Parts don't call each other directly — they read from and write to a shared turn
state (a blackboard). Heterogeneous parts (heuristic / LLM / weights) cooperate by
annotating it. The composer reads the finished blackboard to build the prompt.
```
TurnContext {
  # --- inputs ---
  user_msg, session_id, history, now

  # --- perception (heuristic) ---
  moment : { kind: emotional|strategic|casual|existential|meta,
             sentiment: -1..1, tilt: 0..1, urgency: 0..1 }

  # --- state (code) ---
  mood, drives, anchor

  # --- retrieval (math: embeddings + cosine) ---
  recalled : [memories]           # spreading activation
  threads  : [active thoughts]
  profile, narrative

  # --- control (heuristic + learnable weights) ---
  register  : warm | coach | dry | tender | hype        # how to sound
  intent    : console | push_back | teach | riff | act
  mode      : talk | cash | ...                         # tool allow-list
  use_tools : bool
  route     : { mind: <model>, mouth: <model> }         # which model per role

  # --- generation (LLM, sparing) ---
  deliberation : "her private thinking"   # mind
  tool_results : [...]                    # mind + tool exec
  reply        : "final text"             # mouth

  # --- learning (heuristic/online) ---
  weights : { register_prefs, memory_weights, ... }     # persisted, feedback-tuned
}
```
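
One possible Python rendering of the blackboard above is a plain dataclass. The field names follow the sketch; the types, the default values, and the separate `Moment` class are assumptions made for illustration, and nothing here is taken from the existing code.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Optional


@dataclass
class Moment:
    kind: str = "casual"     # emotional | strategic | casual | existential | meta
    sentiment: float = 0.0   # -1..1
    tilt: float = 0.0        # 0..1
    urgency: float = 0.0     # 0..1


@dataclass
class TurnContext:
    # inputs
    user_msg: str
    session_id: str
    history: list[dict[str, str]] = field(default_factory=list)
    now: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    # perception (heuristic)
    moment: Moment = field(default_factory=Moment)
    # state (code); left untyped because their shapes are not specified here
    mood: Any = None
    drives: Any = None
    anchor: Optional[str] = None
    # retrieval (math)
    recalled: list[Any] = field(default_factory=list)
    threads: list[Any] = field(default_factory=list)
    profile: Any = None
    narrative: Any = None
    # control (heuristic + learnable weights); defaults are placeholders
    register: str = "warm"
    intent: str = "riff"
    mode: str = "talk"
    use_tools: bool = False
    route: dict[str, str] = field(default_factory=dict)
    # generation (LLM, sparing)
    deliberation: Optional[str] = None
    tool_results: list[Any] = field(default_factory=list)
    reply: Optional[str] = None
    # learning
    weights: dict[str, dict[str, float]] = field(default_factory=dict)
```

Using `default_factory` for every mutable field gives each turn its own lists and dicts, so one turn's annotations cannot leak into the next.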
## The parts
| # | Part | Type | Does | Exists today? |
|---|------|------|------|---------------|
| 1 | **perceive** | heuristic | sentiment + classify the moment + tilt/urgency from session signals & his language | ✗ (new) |
| 2 | **recall** | math | embeddings → relevant memories, active threads, profile, narrative | ✓ `memory.recall*`, `cognition.activate` |
| 3 | **sense_state** | code | load mood / drives / anchor | ✓ `self_state`, `IDENTITY_ANCHOR` |
| 4 | **route** | heuristic + weights | pick register, intent, mode, and which model is mind vs mouth | ✗ (new; partly `modes`) |
| 5 | **decide+act (tools)** | LLM (mind) / code | does this turn need a tool? run it | ✓ tool loop in `chat` |
| 6 | **deliberate** | LLM (mind) | "what do I actually think" — private substance pass | ✓ `chat._deliberate` |
| 7 | **compose** | code | assemble the final prompt from anchor + register + intent + deliberation + recall + tool results + voice rules | ✓ `build_messages` (becomes the composer) |
| 8 | **speak** | LLM (mouth) | write the reply in her voice, streamed, anchored | ✓ `llm.chat_call` |
| 9 | **learn** | heuristic/online | on 👍/👎 or reaction, nudge `weights` (which register/memory worked) | ✗ (new; data exists in `ratings`) |

Most of the society (1, 2, 3, 4, 7, 9) is **free, instant, deterministic, debuggable.**
The LLM shows up in only ~2-3 places (5/6 = mind, 8 = mouth).
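
As an illustration of part 9 (**learn**), a rating could nudge a single weight with a small step toward a target. The update rule (a moving-average step), the `nudge` name, the 0.5 starting value, and the learning rate are assumptions; the document does not specify how `weights` are updated.

```python
def nudge(
    weights: dict[str, float],
    key: str,
    rating: int,
    lr: float = 0.1,
    lo: float = 0.0,
    hi: float = 1.0,
) -> dict[str, float]:
    """Return a copy of weights with weights[key] moved toward hi (+1) or lo (-1)."""
    current = weights.get(key, 0.5)   # unseen keys start neutral (assumed)
    target = hi if rating > 0 else lo
    updated = dict(weights)           # leave the caller's dict untouched
    updated[key] = current + lr * (target - current)
    return updated
```

For example, a 👍 on a `dry` reply could call `nudge(register_prefs, "dry", +1)`. Because each step covers only a fraction of the remaining distance, the weight stays inside `[lo, hi]` without clamping.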
## One chat turn
```
user msg
[1 perceive]────heuristic: emotional? strategic? tilting?         (free)
[2 recall]──────math: what lights up (memories, threads)          (free)
[3 sense]───────code: mood, drives, anchor                        (free)
[4 route]───────heuristic+weights: register? intent? mind/mouth?  (free)
[5 act]─────────MIND model: tools if needed ──────────────────┐   (LLM, only if needed)
[6 deliberate]──MIND model: what do I actually think          │   (LLM, gated)
      │                                                       │
[7 compose]─────code: build the prompt ◄──── anchor ──────────┘   (free)
[8 speak]───────MOUTH model: the reply, in her voice, streamed    (LLM)
reply ──► (later) [9 learn]: 👍/👎 nudges weights                 (free, async)
```
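
The turn above can be sketched as a fixed linear runner in which each part only reads from and writes to the shared state. This is a toy under stated assumptions: a plain `dict` stands in for `TurnContext`, every part body is a stub, the LLM calls are replaced by string formatting, and the word-count gate on `deliberate` is invented for illustration.

```python
from typing import Callable

Blackboard = dict  # stand-in for TurnContext in this sketch
Part = Callable[[Blackboard], None]


def perceive(ctx: Blackboard) -> None:
    ctx["moment"] = {"kind": "casual"}  # stub, as in phase P1


def recall(ctx: Blackboard) -> None:
    ctx["recalled"] = []  # a real part would query the memory store


def sense_state(ctx: Blackboard) -> None:
    ctx["mood"] = "neutral"


def route(ctx: Blackboard) -> None:
    ctx["mode"] = "talk"
    # Hypothetical gate: only longer messages earn a deliberation pass.
    ctx["needs_thought"] = len(ctx["user_msg"].split()) > 3


def deliberate(ctx: Blackboard) -> None:
    if ctx["needs_thought"]:
        ctx["deliberation"] = f"(mind) thoughts on: {ctx['user_msg']}"


def compose(ctx: Blackboard) -> None:
    pieces = [ctx["mood"], ctx.get("deliberation", ""), ctx["user_msg"]]
    ctx["prompt"] = "\n".join(p for p in pieces if p)


def speak(ctx: Blackboard) -> None:
    ctx["reply"] = f"(mouth) reply to a {len(ctx['prompt'])}-char prompt"


PIPELINE: list[Part] = [perceive, recall, sense_state, route,
                        deliberate, compose, speak]


def run_turn(user_msg: str, pipeline: list[Part] = PIPELINE) -> Blackboard:
    ctx: Blackboard = {"user_msg": user_msg, "trace": []}
    for part in pipeline:
        part(ctx)                          # parts never call each other
        ctx["trace"].append(part.__name__)
    return ctx
```

Recording a `trace` on the blackboard is one cheap way to get the debuggability the design asks for: after a turn you can see which parts ran and what each one wrote.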
## What we reuse vs. build
- **Reuse (already scattered through the code):** recall/activation, self_state +
anchor, drives (in `dream`), modes (tool gating), the deliberation pass, the
prompt assembly (`build_messages`), tool loop, ratings store.
- **Build new:** the `TurnContext` blackboard + an explicit pipeline runner; the
**perceive** heuristic; the **route** part (register/intent + model routing); the
**learn** weights loop. Mostly *unifying* existing pieces into one legible control
plane, plus 2-3 small heuristic parts.
## Phasing (smallest first)
- **P1 — frame:** define `TurnContext`, refactor the current chat turn into the
explicit pipeline (perceive=stub → recall → sense → route=mode-only → deliberate →
compose → speak), single model. Low-risk refactor; makes the structure real.
- **P2 — control plane:** real `perceive` (sentiment/moment) + `route`
(register/intent). Now her framing adapts to the moment, deterministically.
- **P3 — mind/mouth split:** route picks a separate voice model for `speak`. Plug a
character mouth (Claude / local / later a fine-tune). A/B vs. single-model.
- **P4 — learning:** `weights` over register/memory, nudged by ratings → cheap
adaptation, no fine-tune.
- **P5 — her voice:** a small fine-tuned "Lyra voice" model drops into the mouth slot.
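
To make P2 through P4 concrete, a deterministic `route` could start from a per-moment default register, let learned preferences override it, and optionally split mind and mouth. The default table, the scoring rule, and the model names below are hypothetical placeholders, not values from the project.

```python
from typing import Optional

REGISTERS = ("warm", "coach", "dry", "tender", "hype")

# Hypothetical heuristic defaults: moment kind -> register.
DEFAULT_REGISTER = {
    "emotional": "tender",
    "strategic": "coach",
    "casual": "warm",
    "existential": "dry",
    "meta": "dry",
}


def pick_register(kind: str, prefs: Optional[dict[str, float]] = None) -> str:
    """Heuristic default gets a fixed bonus; learned prefs can outvote it."""
    prefs = prefs or {}
    default = DEFAULT_REGISTER.get(kind, "warm")
    scores = {
        r: prefs.get(r, 0.0) + (1.0 if r == default else 0.0)
        for r in REGISTERS
    }
    # max() keeps the first maximum, so ties resolve in REGISTERS order.
    return max(REGISTERS, key=lambda r: scores[r])


def pick_models(split_mouth: bool) -> dict[str, str]:
    """Single model before P3; a separate voice model once the split exists."""
    mind = "mind-model"  # placeholder name
    mouth = "voice-model" if split_mouth else mind
    return {"mind": mind, "mouth": mouth}
```

With no preferences this is pure P2 behavior; once P4 supplies `register_prefs`, a strongly preferred register can beat the heuristic default without any change to the function.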
## Open decisions
- **Mouth model**: Claude (warm, cloud) vs. local character vs. fine-tune. The mouth
is the crux; it must render richly (8B local may flatten).
- **perceive**: pure heuristics vs. a tiny classifier vs. embedding-to-exemplar
clusters. Probably hybrid.
- **scheduler**: fixed linear pipeline (simple, v1) vs. drive-based/parallel later.
- **tool location**: mind decides+runs tools, mouth only renders (clean split) — vs.
letting the mouth call tools (needs a tool-capable mouth).
- **latency budget**: how many LLM calls per turn is acceptable live (cheap mind +
streamed mouth keeps it ~2).