# Lyra — Cognition Architecture (sketch) > The "society of mind" direction: instead of one giant model we keep nagging with > stricter prompts, a society of small specialized parts cooperate to produce each > turn. **Most parts are cheap deterministic code (heuristics, math, learnable > weights); the LLM is the exception, reserved for the few irreducibly-generative > jobs.** Everything is anchored to who she is and tuned by feedback. ## Principles 1. **LLM is the exception, not the rule.** Bookkeeping, scoring, routing, thresholding, retrieval → code. Generation (language, novel reasoning, memory compression) → LLM, called sparingly. 2. **Mind ≠ Mouth.** A capable "mind" (decide / reason / use tools — helpfulness is fine) is separate from a "mouth" (the character voice). This lets each be the best model for *its* job — and makes the eventual fine-tune easy: you only have to teach a small model to *sound like Lyra*, not to *be smart*. 3. **Anchored.** A fixed identity anchor governs the mouth so self-composed prompts can't drift into generic-helper vapor. (Already exists: `self_state.IDENTITY_ANCHOR`.) 4. **Tuned by feedback, not just hand-tuning.** Learnable *weights* (over register, memory, parts) nudged by 👍/👎 give real adaptation *without* fine-tuning a model. 5. **Allocation is the craft.** Cheap-deterministic where signal is clear; LLM where judgment/language is needed; **hybrid** (heuristic common-case, escalate to LLM on ambiguity) where possible. ## The blackboard: `TurnContext` Parts don't call each other directly — they read from and write to a shared turn state (a blackboard). Heterogeneous parts (heuristic / LLM / weights) cooperate by annotating it. The composer reads the finished blackboard to build the prompt. ``` TurnContext { # --- inputs --- user_msg, session_id, history, now # --- perception (heuristic) --- moment : { kind: emotional|strategic|casual|existential|meta, sentiment: -1..1, tilt: 0..1, urgency: 0..1 } # --- state (code) --- mood, drives, anchor # --- retrieval (math: embeddings + cosine) --- recalled : [memories] # spreading activation threads : [active thoughts] profile, narrative # --- control (heuristic + learnable weights) --- register : warm | coach | dry | tender | hype # how to sound intent : console | push_back | teach | riff | act mode : talk | cash | ... # tool allow-list use_tools: bool route : { mind: , mouth: } # which model per role # --- generation (LLM, sparing) --- deliberation : "her private thinking" # mind tool_results : [...] # mind + tool exec reply : "final text" # mouth # --- learning (heuristic/online) --- weights : { register_prefs, memory_weights, ... } # persisted, feedback-tuned } ``` ## The parts | # | Part | Type | Does | Exists today? | |---|------|------|------|---------------| | 1 | **perceive** | heuristic | sentiment + classify the moment + tilt/urgency from session signals & his language | ✗ (new) | | 2 | **recall** | math | embeddings → relevant memories, active threads, profile, narrative | ✓ `memory.recall*`, `cognition.activate` | | 3 | **sense_state** | code | load mood / drives / anchor | ✓ `self_state`, `IDENTITY_ANCHOR` | | 4 | **route** | heuristic + weights | pick register, intent, mode, and which model is mind vs mouth | ✗ (new; partly `modes`) | | 5 | **decide+act (tools)** | LLM (mind) / code | does this turn need a tool? run it | ✓ tool loop in `chat` | | 6 | **deliberate** | LLM (mind) | "what do I actually think" — private substance pass | ✓ `chat._deliberate` | | 7 | **compose** | code | assemble the final prompt from anchor + register + intent + deliberation + recall + tool results + voice rules | ✓ `build_messages` (becomes the composer) | | 8 | **speak** | LLM (mouth) | write the reply in her voice, streamed, anchored | ✓ `llm.chat_call` | | 9 | **learn** | heuristic/online | on 👍/👎 or reaction, nudge `weights` (which register/memory worked) | ✗ (new; data exists in `ratings`) | Most of the society (1,2,3,4,7,9) is **free, instant, deterministic, debuggable.** The LLM shows up in only ~2–3 places (5/6 = mind, 8 = mouth). ## One chat turn ``` user msg │ ▼ [1 perceive]──heuristic: emotional? strategic? tilting? (free) │ [2 recall]───math: what lights up (memories, threads) (free) [3 sense]────code: mood, drives, anchor (free) │ [4 route]────heuristic+weights: register? intent? mind/mouth? (free) │ [5 act]──────MIND model: tools if needed ─────────────┐ (LLM, only if needed) [6 deliberate]──MIND model: what do I actually think │ (LLM, gated) │ │ [7 compose]──code: build the prompt ◄──── anchor ──────┘ (free) │ [8 speak]────MOUTH model: the reply, in her voice, streamed (LLM) │ ▼ reply ──► (later) [9 learn]: 👍/👎 nudges weights (free, async) ``` ## What we reuse vs. build - **Reuse (already scattered through the code):** recall/activation, self_state + anchor, drives (in `dream`), modes (tool gating), the deliberation pass, the prompt assembly (`build_messages`), tool loop, ratings store. - **Build new:** the `TurnContext` blackboard + an explicit pipeline runner; the **perceive** heuristic; the **route** part (register/intent + model routing); the **learn** weights loop. Mostly *unifying* existing pieces into one legible control plane, plus 2–3 small heuristic parts. ## Phasing (smallest first) - **P1 — frame:** define `TurnContext`, refactor the current chat turn into the explicit pipeline (perceive=stub → recall → sense → route=mode-only → deliberate → compose → speak), single model. Low-risk refactor; makes the structure real. - **P2 — control plane:** real `perceive` (sentiment/moment) + `route` (register/intent). Now her framing adapts to the moment, deterministically. - **P3 — mind/mouth split:** route picks a separate voice model for `speak`. Plug a character mouth (Claude / local / later a fine-tune). A/B vs. single-model. - **P4 — learning:** `weights` over register/memory, nudged by ratings → cheap adaptation, no fine-tune. - **P5 — her voice:** a small fine-tuned "Lyra voice" model drops into the mouth slot. ## Open decisions - **Mouth model**: Claude (warm, cloud) vs. local character vs. fine-tune. The mouth is the crux; it must render richly (8B local may flatten). - **perceive**: pure heuristics vs. a tiny classifier vs. embedding-to-exemplar clusters. Probably hybrid. - **scheduler**: fixed linear pipeline (simple, v1) vs. drive-based/parallel later. - **tool location**: mind decides+runs tools, mouth only renders (clean split) — vs. letting the mouth call tools (needs a tool-capable mouth). - **latency budget**: how many LLM calls per turn is acceptable live (cheap mind + streamed mouth keeps it ~2). ```