First step of the cognition control plane (docs/COGNITION.md). The chat turn is now
an explicit society of parts over a shared TurnContext blackboard:
perceive (stub) -> route (session mode) -> compose (tiered prompt) -> deliberate.
- lyra/mind.py (new): TurnContext + the pipeline + assemble(); moved build_messages
and the deliberation helpers here (the assembly belongs in the control plane).
- lyra/chat.py: slimmed to "speak + persist" — calls mind.assemble(), runs the
tool/generation loop, persists. No behavior change (same prompt, same output).
- tests: point test_time/test_chat at mind; add an assemble() structure test;
make test_chat/test_tools hermetic (CHAT_DELIBERATE off so respond() doesn't make
a real LLM call). Suite 86 green in ~5s, ruff clean, no import cycle.
This is the frame; perceive/route/learn get filled in next phases — each opt-in.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The chat had no thinking in it: respond() was a single gpt-4o call in default-
assistant voice (numbered lists, 'would you like to...', vague). All the cognition
work was background-only. This brings a thought step into the conversation.
- chat: before answering a substantive turn (trivial 'ok/lol' skipped), a private
_deliberate() pass — "what do you ACTUALLY think, your real take, the substance,
no pleasantries" — drawing on her in-context threads/journal. The thinking is then
injected as the LAST system note with voice enforcement (answer from this; no
numbered list / how-to outline unless asked; no 'would you like to' closer), so it
beats gpt-4o's boilerplate at the most influential position. Logged to /logs.
- Wired into respond() + respond_stream(). Config CHAT_DELIBERATE (default on) to
disable if the extra call's latency annoys.
- persona: "talk, don't outline" — prose over listicles, the first concrete move
over a survey of options.
- test_chat.py (gating + note composition + disabled). Suite 84, ruff clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>