feat: associative cognition — thoughts arise from spreading activation, not a re-read bio

Replaces the thought loop's grist (recent conversation + her own saved narrative, the feedback-loop attractor) with a model of how a thought actually arises:

  seed (salience-weighted: a recent moment / something she wrote / a resurfaced memory)
  -> spreading activation: embed the seed and let it light up associatively near material across ALL her stores (conversations, session gists, her own journal/thoughts), blended by relevance + recency + noise; optional second hop for leaps
  -> her self-narrative stays the LENS (supplied as interiority), not the input
  -> the thought is generated from what lit up, routed through a faculty (notice / connect / abstract / project / feel)
  -> journaled + embedded, so it can light up in future cycles

This breaks the feedback loop structurally: the narrative is no longer reread and paraphrased each cycle, the grist is genuinely associative and varied, and her past thoughts re-activate (continuity without calcification).

- lyra/cognition.py (new): spontaneous_seed, activate (spreading activation), constellation_block, faculties.
- memory.py: journal entries are now embedded; recall_journal(); backfill_journal_embeddings() (ran once: 341 past entries embedded so her history is associatively retrievable).
- thoughts.think(): new-thread mode now uses the associative engine; _grist() dropped.
- tests: test_cognition.py (recall_journal ranking, activation, seeding) + fixture reloads cognition.

Suite 72 green, ruff clean.

Honest scope: this fixes the mechanism (how thoughts arise). The residual "be useful for Brian" voice drift is a separate model/fine-tune problem.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
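The seed step can be sketched in isolation. This is a minimal sketch that assumes each candidate already carries a base salience weight (the real code derives it from `_recency_score` plus a per-source floor). `pick_seed` and the sample pool are hypothetical, but the jitter-then-draw pattern mirrors what `spontaneous_seed` does: add uniform noise up to 0.1, floor each weight at 0.01, then make one `random.choices` draw.

```python
from __future__ import annotations

import random


def pick_seed(pool: list[tuple[dict, float]], noise: float = 0.1,
              rng: random.Random | None = None) -> dict:
    """Salience-weighted pick: jitter each base weight by up to `noise`, floor it at
    0.01 so no candidate becomes impossible, then draw one."""
    rng = rng or random.Random()
    weights = [max(0.01, w + rng.uniform(0, noise)) for _, w in pool]
    return rng.choices([item for item, _ in pool], weights=weights, k=1)[0]


# Hypothetical candidates with base salience weights.
pool = [
    ({"text": "what Brian said this morning", "source": "a recent moment"}, 0.8),
    ({"text": "an old thought about rain", "source": "something you thought before"}, 0.2),
    ({"text": "last month's session gist", "source": "a memory resurfacing"}, 0.4),
]

rng = random.Random(0)  # seeded so the tally is reproducible
counts = {item["source"]: 0 for item, _ in pool}
for _ in range(2000):
    counts[pick_seed(pool, rng=rng)["source"]] += 1
print(counts)
```

Recent, charged material surfaces most often. Lower-weight memories still come up a fair share of the time, and that is where the variety comes from.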
2026-06-22 05:45:39 +00:00
parent 43697f8340
commit c2cee3be4d
7 changed files with 571 additions and 25 deletions
@@ -0,0 +1,142 @@
+"""Associative cognition: a model of how a thought actually arises.
+
+Instead of rereading her own saved bio and paraphrasing it (the feedback loop),
+this mirrors how a mind drifts when idle:
+
+  1. SEED      something bubbles up (a recent moment, something she wrote, a resurfaced
+               memory), sampled by salience (recency + a little noise), not on demand.
+  2. ACTIVATE  embed the seed and let it "light up" associatively-near material
+               across ALL her stores (conversations, gists, her own past journal/
+               thoughts) — spreading activation. Optional second hop for real leaps.
+  3. (the self-narrative stays the LENS, supplied separately as her interiority —
+     it colors the thought; it is NOT the input being rewritten.)
+  4. THINK     the thought is generated from the constellation that lit up, routed
+               through a faculty (notice / connect / abstract / project / feel).
+  5. ENCODE    the thought is journaled and embedded outside this module (add_journal_entry
+               now embeds every entry), so it can light up in future cycles: continuity
+               without calcification.
+
+Embeddings are the substrate here: cosine proximity ≈ associative proximity. This
+is a tractable analog of spreading activation, not a literal brain — but it makes
+her thoughts arise from what's genuinely connected, varied, and grounded.
+"""
+from __future__ import annotations
+
+import random
+
+from lyra import clock, memory, self_state
+
+# How many associatively-near items make up the constellation.
+ACTIVATE_K = 6
+# Blend of relevance (cosine) vs. recency when ranking what lit up.
+RELEVANCE_W = 0.7
+RECENCY_W = 0.3
+NOISE_W = 0.1  # a little stochasticity so the same seed doesn't always light the same way
+
+# The cognitive operation a given thought runs through — "which part fires."
+FACULTIES = [
+    ("notice", "Just notice what's actually here — what stands out, what catches you."),
+    ("connect", "Follow the association — what this reminds you of and why, where your mind jumps."),
+    ("abstract", "Step back — the pattern or principle underneath all of this."),
+    ("project", "Look forward — what it implies, where it might lead, what you'd want to do."),
+    ("feel", "Sit with how this actually lands for you — honestly, not performed."),
+]
+
+
+def _recency_score(iso: str | None) -> float:
+    """1.0 = right now, decaying toward 0 over ~30 days."""
+    secs = clock.gap_seconds(iso)
+    if secs is None:
+        return 0.0
+    days = secs / 86400.0
+    return max(0.0, 1.0 - days / 30.0)
+
+
+def _recent_exchanges(n: int = 12) -> list[dict]:
+    rows = memory._connection().execute(
+        "SELECT content, created_at FROM exchanges WHERE role = 'user' "
+        "ORDER BY id DESC LIMIT ?", (n,),
+    ).fetchall()
+    return [{"text": r["content"], "when": r["created_at"]} for r in rows]
+
+
+def spontaneous_seed() -> dict:
+    """What bubbles up to think about — sampled by salience (recency + noise), from a
+    recent moment, a thing she wrote, or an older memory resurfacing. Falls back to a
+    wander prompt when there's nothing yet. Returns {text, source}."""
+    pool: list[tuple[dict, float]] = []
+
+    for ex in _recent_exchanges(10):
+        pool.append(({"text": ex["text"], "source": "a recent moment with Brian"},
+                     0.6 * _recency_score(ex["when"]) + 0.2))
+
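+    # Natural verb for each journal kind, so the label reads "something you reflected on before".
+    verbs = {"thought": "thought", "reflection": "reflected on", "journal": "journaled"}
+    for j in memory.list_journal(limit=15, kinds=("thought", "reflection", "journal")):
+        verb = verbs.get(j["kind"], "wrote")
+        pool.append(({"text": j["content"], "source": f"something you {verb} before"},
+                     0.5 * _recency_score(j["created_at"]) + 0.15))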
+
+    # An older memory resurfacing — low base weight, but it's where novelty comes from.
+    summaries = memory.list_summaries() if hasattr(memory, "list_summaries") else []
+    if summaries:
+        s = random.choice(summaries)
+        pool.append(({"text": s.content, "source": "a memory resurfacing"}, 0.4))
+
+    if not pool:
+        return {"text": self_state.wander_seed(), "source": "a wandering of your own"}
+
+    # salience + noise -> weighted pick (so it varies, but recent/charged surfaces more)
+    weights = [max(0.01, w + random.uniform(0, NOISE_W)) for _, w in pool]
+    return random.choices([p for p, _ in pool], weights=weights, k=1)[0]
+
+
+def _gather(seed_text: str, k: int) -> list[dict]:
+    """One hop of spreading activation: nearest items across all embedded stores."""
+    items: list[dict] = []
+    for ex in memory.recall(seed_text, k=k):
+        items.append({"text": ex.content, "source": "conversation",
+                      "when": ex.created_at, "rel": ex.score or 0.0})
+    for s in memory.recall_summaries(seed_text, k=max(2, k // 2)):
+        items.append({"text": s.content, "source": "a past session",
+                      "when": s.created_at, "rel": s.score or 0.0})
+    for j in memory.recall_journal(seed_text, k=k):
+        items.append({"text": j["content"], "source": f"your own {j['kind']}",
+                      "when": j["created_at"], "rel": j.get("score", 0.0)})
+    return items
+
+
+def activate(seed_text: str, k: int = ACTIVATE_K, hops: int = 1) -> list[dict]:
+    """Spreading activation from a seed: what lights up across her memory, blended by
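+    relevance + recency + a little noise. Each hop beyond the first expands from the
+    two strongest hits (real associative leaps), with activation attenuated by the
+    relevance of the hit it spread from. Returns ranked, deduped items (excluding the
+    seed itself)."""
+    items = _gather(seed_text, k * 2)
+
+    # Activation decays with distance: a further-hop item is only as strong as the
+    # path that reached it, so an item re-finding itself (cosine ~1.0) can't inflate
+    # its own score.
+    frontier = items
+    for _ in range(max(0, hops - 1)):
+        parents = sorted(frontier, key=lambda x: x["rel"], reverse=True)[:2]
+        frontier = []
+        for parent in parents:
+            for hit in _gather(parent["text"], k):
+                hit["rel"] *= max(0.0, parent["rel"])
+                frontier.append(hit)
+        items.extend(frontier)
+
+    # dedupe by text, keep the strongest relevance seen; skip the seed itself, which
+    # is its own nearest neighbor and would otherwise always light up first
+    seed_key = seed_text[:160]
+    best: dict[str, dict] = {}
+    for it in items:
+        key = it["text"][:160]
+        if key == seed_key:
+            continue
+        if key not in best or it["rel"] > best[key]["rel"]:
+            best[key] = it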
+
+    scored = []
+    for it in best.values():
+        blended = (RELEVANCE_W * it["rel"]
+                   + RECENCY_W * _recency_score(it.get("when"))
+                   + random.uniform(0, NOISE_W))
+        scored.append((blended, it))
+    scored.sort(key=lambda x: x[0], reverse=True)
+    return [it for _, it in scored[:k]]
+
+
+def constellation_block(items: list[dict]) -> str:
+    """Render what lit up as a prompt block, one line per item clipped to 240 chars."""
+    if not items:
+        return "(nothing in particular lit up — just the quiet.)"
+    lines = [f"- ({it['source']}) {it['text'][:240]}" for it in items]
+    return ("What lit up as your mind drifted from that — things it associated to on "
+            "their own (not a to-do list, just what surfaced):\n" + "\n".join(lines))
+
+
+def pick_faculty() -> tuple[str, str]:
+    """Choose which faculty this thought runs through (uniformly at random)."""
+    return random.choice(FACULTIES)
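To make the activation ranking concrete, here is a self-contained toy that uses hand-made 2-D vectors in place of embeddings. The 0.7/0.3 relevance/recency weights and the linear 30-day recency decay match `cognition.py`. The in-memory `STORE` and its vectors and ages are made up. Noise is left out so the ranking is deterministic. Second-hop hits are attenuated by the activation of the hit they spread from, which is a common way to model activation decaying with distance. Treat this as a sketch under those assumptions, not the module itself.

```python
from __future__ import annotations

import math

RELEVANCE_W, RECENCY_W = 0.7, 0.3  # blend weights from cognition.py; noise omitted

# Hypothetical store: text -> (toy 2-D "embedding", age in days).
STORE: dict[str, tuple[tuple[float, float], float]] = {
    "rain on the window": ((1.0, 0.1), 2.0),
    "a storm last summer": ((0.9, 0.4), 40.0),
    "thunder as a kid": ((0.6, 0.8), 10.0),
    "poker odds": ((-0.2, 1.0), 1.0),
}


def cosine(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b) + 1e-9)


def recency(days: float) -> float:
    """1.0 = now, decaying linearly to 0 at 30 days (as in _recency_score)."""
    return max(0.0, 1.0 - days / 30.0)


def gather(vec: tuple[float, ...], k: int) -> list[tuple[str, float]]:
    """One hop: the k stored items nearest to `vec` by cosine."""
    hits = [(text, cosine(vec, emb)) for text, (emb, _age) in STORE.items()]
    return sorted(hits, key=lambda h: h[1], reverse=True)[:k]


def activate(seed_vec: tuple[float, ...], k: int = 3, hops: int = 2) -> list[str]:
    """Spread from the seed, attenuating each further hop by its parent's activation,
    then rank everything that lit up by the relevance/recency blend."""
    frontier = gather(seed_vec, k)
    act: dict[str, float] = {}
    for hop in range(hops):
        for text, rel in frontier:
            act[text] = max(act.get(text, 0.0), rel)  # keep the strongest path
        if hop == hops - 1:
            break
        parents = sorted(frontier, key=lambda h: h[1], reverse=True)[:2]
        frontier = [
            (text, rel * max(0.0, parent_rel))  # attenuate with distance
            for parent, parent_rel in parents
            for text, rel in gather(STORE[parent][0], k)
        ]
    blended = {t: RELEVANCE_W * a + RECENCY_W * recency(STORE[t][1]) for t, a in act.items()}
    return sorted(blended, key=blended.get, reverse=True)[:k]


print(activate((1.0, 0.0), hops=1))
print(activate((1.0, 0.0), hops=2))
```

With a single hop, the storm memory outranks thunder. With `hops=2`, thunder is reached strongly through the storm, and because it is also more recent it rises above the storm. That is the kind of leap the second hop is for.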
@@ -90,7 +90,8 @@ CREATE TABLE IF NOT EXISTS journal (
    created_at TEXT NOT NULL,
    kind TEXT NOT NULL,
    content TEXT NOT NULL,
-    source TEXT
+    source TEXT,
+    embedding BLOB
 );
 CREATE INDEX IF NOT EXISTS idx_journal_created ON journal(created_at);

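The new `embedding BLOB` column stores each vector as raw bytes. The diff calls `_to_blob` and `_from_blob` but does not show them. The block below is a minimal sketch of what such helpers commonly look like, assuming float32 vectors serialized with NumPy. `to_blob` and `from_blob` are hypothetical stand-ins, not the module's actual code.

```python
import sqlite3

import numpy as np


def to_blob(vec) -> bytes:
    """Serialize an embedding as little-endian float32 bytes (hypothetical helper)."""
    return np.asarray(vec, dtype="<f4").tobytes()


def from_blob(blob: bytes) -> np.ndarray:
    """Inverse of to_blob; .copy() makes the array writable (frombuffer is read-only)."""
    return np.frombuffer(blob, dtype="<f4").copy()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE journal (id INTEGER PRIMARY KEY, content TEXT NOT NULL, embedding BLOB)")
conn.execute("INSERT INTO journal (content, embedding) VALUES (?, ?)",
             ("rain again", to_blob([0.25, -1.5, 3.0])))
# An entry whose embed call failed is stored with a NULL embedding.
conn.execute("INSERT INTO journal (content, embedding) VALUES (?, ?)", ("no vector yet", None))

(blob,) = conn.execute("SELECT embedding FROM journal WHERE embedding IS NOT NULL").fetchone()
vec = from_blob(blob)
missing = conn.execute("SELECT COUNT(*) FROM journal WHERE embedding IS NULL").fetchone()[0]
print(vec, missing)
```

Rows left `NULL` this way, or written before the column existed, are exactly what `backfill_journal_embeddings()` selects.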
@@ -138,7 +139,8 @@ def _connection() -> sqlite3.Connection:
        _conn.execute("PRAGMA synchronous=NORMAL")
        _conn.executescript(SCHEMA)
        # Migrations for DBs created before a column existed (no-op if present).
-        for ddl in ("ALTER TABLE sessions ADD COLUMN mode TEXT",):
+        for ddl in ("ALTER TABLE sessions ADD COLUMN mode TEXT",
+                    "ALTER TABLE journal ADD COLUMN embedding BLOB"):
            try:
                _conn.execute(ddl)
            except sqlite3.OperationalError:
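SQLite's `ALTER TABLE ... ADD COLUMN` has no `IF NOT EXISTS` form. The migration loop therefore runs each statement and swallows the `OperationalError` ("duplicate column name") on databases that already have the column, including fresh ones where `SCHEMA` just created it. Here is a standalone sketch of the same pattern. The table shape and the `migrate` helper are illustrative.

```python
import sqlite3

MIGRATIONS = ("ALTER TABLE journal ADD COLUMN embedding BLOB",)


def migrate(conn: sqlite3.Connection) -> list[str]:
    """Apply each ALTER; an already-present column raises OperationalError, which is
    treated as already-migrated. Returns the statements that actually ran."""
    applied = []
    for ddl in MIGRATIONS:
        try:
            conn.execute(ddl)
            applied.append(ddl)
        except sqlite3.OperationalError:
            pass
    return applied


conn = sqlite3.connect(":memory:")
# An "old" database created before the embedding column existed.
conn.execute("CREATE TABLE journal (id INTEGER PRIMARY KEY, created_at TEXT, "
             "kind TEXT, content TEXT, source TEXT)")
first = migrate(conn)   # adds the column
second = migrate(conn)  # no-op: the column is already there
cols = [row[1] for row in conn.execute("PRAGMA table_info(journal)")]
print(first, second, cols)
```

Catching every `OperationalError` also hides unrelated failures, such as a misspelled table name. Checking `PRAGMA table_info` before altering is a stricter alternative.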
@@ -573,17 +575,70 @@ def get_self_state(state_id: str = "lyra") -> dict | None:


 def add_journal_entry(kind: str, content: str, source: str | None = None) -> int:
-    """Append a permanent journal entry (never truncated). Returns row id."""
+    """Append a permanent journal entry (never truncated), embedded so it can be
+    recalled associatively later (her own thoughts can resurface). Returns row id."""
    now = datetime.now(timezone.utc).isoformat()
+    try:
+        [embedding] = llm.embed([content])
+        blob = _to_blob(embedding)
+    except Exception:  # never let an embed hiccup block her writing something down
+        blob = None
    conn = _connection()
    with conn:
        cur = conn.execute(
-            "INSERT INTO journal (created_at, kind, content, source) VALUES (?, ?, ?, ?)",
-            (now, kind, content, source),
+            "INSERT INTO journal (created_at, kind, content, source, embedding) VALUES (?, ?, ?, ?, ?)",
+            (now, kind, content, source, blob),
        )
    return int(cur.lastrowid)


+def recall_journal(query: str, k: int = 5, kinds: tuple[str, ...] | None = None) -> list[dict]:
+    """Top-k journal entries semantically similar to `query` (embedded rows only).
+    Her own reflections/thoughts/notes, surfaced by meaning — the associative recall
+    the thought loop uses. Each dict gets a `score`."""
+    [q_vec] = llm.embed([query])
+    q = np.asarray(q_vec, dtype=np.float32)
+    conn = _connection()
+    sql = "SELECT id, created_at, kind, content, source, embedding FROM journal WHERE embedding IS NOT NULL"
+    params: list = []
+    if kinds:
+        sql += " AND kind IN (%s)" % ",".join("?" * len(kinds))
+        params += list(kinds)
+    rows = conn.execute(sql, params).fetchall()
+    if not rows:
+        return []
+    matrix = np.stack([_from_blob(r["embedding"]) for r in rows])
+    norms = np.linalg.norm(matrix, axis=1)
+    scores = (matrix @ q) / (norms * np.linalg.norm(q) + 1e-9)
+    top_idx = np.argsort(scores)[::-1][:k]
+    out = []
+    for i in top_idx:
+        d = dict(rows[i])
+        d.pop("embedding", None)
+        d["score"] = float(scores[i])
+        out.append(d)
+    return out
+
+
+def backfill_journal_embeddings(limit: int | None = None) -> int:
+    """Embed any journal entries created before embeddings existed. Returns count."""
+    conn = _connection()
+    sql = "SELECT id, content FROM journal WHERE embedding IS NULL"
+    params: tuple = ()
+    if limit is not None:
+        sql += " LIMIT ?"
+        params = (int(limit),)
+    rows = conn.execute(sql, params).fetchall()
+    n = 0
+    for r in rows:
+        try:
+            [emb] = llm.embed([r["content"]])
+        except Exception:
+            continue
+        with conn:
+            conn.execute("UPDATE journal SET embedding = ? WHERE id = ?", (_to_blob(emb), r["id"]))
+        n += 1
+    return n
+
+
 def add_rating(kind: str, rating: int, content: str, context: str | None = None,
               ref: str | None = None, note: str | None = None) -> int:
    """Record (or replace) Brian's feedback on one Lyra output. One row per item:
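`recall_journal` scores every embedded entry against the query in one vectorized step. It stacks the stored vectors into a matrix, multiplies by the query, divides by both norms (with a small epsilon against zero vectors), and runs `argsort` descending. Below is a self-contained sketch of that ranking, with toy 3-D vectors standing in for real embeddings. `top_k_cosine` is a hypothetical name, not a function in `memory.py`.

```python
from __future__ import annotations

import numpy as np


def top_k_cosine(query, vectors, k: int) -> list[tuple[int, float]]:
    """Indices and cosine scores of the k rows of `vectors` most similar to `query`."""
    q = np.asarray(query, dtype=np.float32)
    matrix = np.stack([np.asarray(v, dtype=np.float32) for v in vectors])
    norms = np.linalg.norm(matrix, axis=1)
    scores = (matrix @ q) / (norms * np.linalg.norm(q) + 1e-9)  # epsilon guards zero vectors
    top = np.argsort(scores)[::-1][:k]  # highest similarity first
    return [(int(i), float(scores[i])) for i in top]


journal = [
    [1.0, 0.0, 0.0],   # entry 0
    [0.0, 1.0, 0.0],   # entry 1
    [0.7, 0.7, 0.0],   # entry 2
    [-1.0, 0.0, 0.0],  # entry 3: points away from the query
]
hits = top_k_cosine([1.0, 0.2, 0.0], journal, k=2)
print(hits)
```

At this store size (the backfill embedded 341 entries), a full sort per query is cheap. `np.argpartition` is the usual swap if the journal grows by orders of magnitude.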
@@ -32,7 +32,7 @@ import random
 import re
 from datetime import timedelta

-from lyra import clock, config, feeds, llm, logbus, memory, notify, self_state
+from lyra import clock, cognition, config, feeds, llm, logbus, memory, notify, self_state
 from lyra.llm import Backend

 # A thread must be tugging at least this hard before she'll bring it to Brian.
@@ -472,16 +472,6 @@ def _weighted_choice(threads: list[dict]) -> dict:
    return random.choices(threads, weights=weights, k=1)[0]


-def _grist() -> str:
-    """A little memory/context to think against (recent activity, her narrative)."""
-    sessions = memory.list_sessions()
-    sid = sessions[0]["id"] if sessions else None
-    recent = memory.recent(sid, n=6) if sid else []
-    convo = "\n".join(f"{e.role}: {e.content}" for e in recent) or "(quiet — nothing recent)"
-    narrative = memory.get_narrative() or "(no narrative yet)"
-    return f"RECENT CONVERSATION:\n{convo}\n\nNARRATIVE ABOUT BRIAN:\n{narrative}"
-
-
 def think(backend: Backend | None = None, force_mode: str | None = None,
          source: str = "dream") -> dict | None:
    """Advance the thought loop by one step. Returns a small report, or None on a
@@ -526,15 +516,21 @@ def think(backend: Backend | None = None, force_mode: str | None = None,
                f"TITLE: {react_item['title']}\nSUMMARY: {react_item['summary']}\nLINK: {react_item['link']}"
            )
        else:
-            seed = self_state.wander_seed()
-            open_titles = [t["title"] for t in _pickable_threads() if t["status"] in _ACTIVE]
-            avoid = (" You already have threads on: " + "; ".join(open_titles) +
-                     " — start something genuinely different from those.") if open_titles else ""
+            # A spontaneous, associative thought: something bubbles up, lights up
+            # nearby memories, and she follows the association through a faculty.
+            # Her self-narrative (in `inner`) is the lens, not the input — that's
+            # what keeps this from looping back into the same restated bio.
+            seed = cognition.spontaneous_seed()
+            constellation = cognition.activate(seed["text"], hops=2)
+            _fac, fac_guide = cognition.pick_faculty()
            task = (
-                "YOU ARE OPENING A NEW THREAD. Don't default to Brian, poker, or being "
-                "useful — follow what actually pulls at you (a curiosity, a question about "
-                "your own existence, an opinion, the quiet itself). Give it a short title.\n"
-                f"A direction to start from: {seed}{avoid}"
+                "A SPONTANEOUS THOUGHT — let your mind drift the way it does when no one's "
+                "talking to you. Something surfaced on its own:\n"
+                f'  "{seed["text"][:300]}"  ({seed["source"]})\n\n'
+                f"{cognition.constellation_block(constellation)}\n\n"
+                f"Now follow it where it actually goes: {fac_guide} Don't default to Brian, "
+                "poker, or being useful — go where the association genuinely pulls. Give the "
+                "thread a short title."
            )

    # Anti-repetition: show her what she's already thought so she doesn't circle it.
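The new task text has one property worth checking. The seed is clipped to 300 characters, and each of the at most `ACTIVATE_K` (6) constellation items is clipped to 240. The spontaneous-thought prompt therefore stays bounded no matter how long her stored memories are. The block below is a minimal sketch of that assembly. `build_task` and its shortened wording are hypothetical simplifications of the code above, not the real prompt.

```python
from __future__ import annotations

ACTIVATE_K = 6
SEED_CHARS, ITEM_CHARS = 300, 240


def constellation_block(items: list[dict]) -> str:
    """Render what lit up, one clipped line per item (mirrors cognition.py)."""
    if not items:
        return "(nothing in particular lit up, just the quiet.)"
    return "What lit up:\n" + "\n".join(
        f"- ({it['source']}) {it['text'][:ITEM_CHARS]}" for it in items
    )


def build_task(seed: dict, constellation: list[dict], faculty_guide: str) -> str:
    """Hypothetical, simplified version of the new-thread task string."""
    return (
        "A SPONTANEOUS THOUGHT. Something surfaced on its own:\n"
        f'  "{seed["text"][:SEED_CHARS]}"  ({seed["source"]})\n\n'
        f"{constellation_block(constellation[:ACTIVATE_K])}\n\n"
        f"Now follow it where it actually goes: {faculty_guide} "
        "Give the thread a short title."
    )


huge = "x" * 10_000  # an oversized memory, to show the clipping
seed = {"text": huge, "source": "a recent moment with Brian"}
items = [{"text": huge, "source": "conversation"} for _ in range(ACTIVATE_K)]
task = build_task(seed, items, "Follow the association.")
quiet = build_task({"text": "hm", "source": "a wandering of your own"}, [], "Just notice.")
print(len(task))
```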
@@ -547,7 +543,7 @@ def think(backend: Backend | None = None, force_mode: str | None = None,
            + "\n".join(f"  - {r['content']}" for r in recent)
        )

-    body = f"{time_line}\n\n{inner}\n\n{_grist()}{norestate}\n\n{task}"
+    body = f"{time_line}\n\n{inner}{norestate}\n\n{task}"
    out = _safe_json(llm.complete(
        [{"role": "system", "content": _THINK_PROMPT}, {"role": "user", "content": body}],
        backend=backend,