Files

T

serversdown f89849801b docs: park self-modifying-Lyra sandbox design

Capture the isolated-VM design for the self-modification frontier: Proxmox
sandbox clone, network isolation (esp. from tmi-dev/day-job), snapshot-rollback,
spend/resource caps, kill switch, human-gated promotion. Build the cage before
the agent gets code-write powers.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-06-17 00:35:38 +00:00

4.4 KiB

Raw Blame History

Parked Ideas — Lyra

Moonshots, pipe dreams, and "doesn't exist yet" ideas. Captured here so they don't derail current work — and so they're never lost.

The rule: when an idea shows up mid-snag, ask "is this the point, or in the way of the point?" If it's the point, we build it. If it's in the way, we park it here, use the boring existing tool for now, and come back when it's the point.

Honesty policy: for each idea, note whether it doesn't exist because it's hard/uneconomical (someone tried) or because nobody's bothered (a real gap). Pick battles accordingly.

Status: 🌙 moonshot (needs big prerequisites) · 🔬 research · 🛠️ buildable-soon

🌙 Build / fine-tune our own model

Full control of persona and character, no RLHF "helpful assistant" tics baked in (the thing mini/qwen-14b kept fighting us on). A model that is Lyra rather than one we prompt into being her.

Why parked: needs a working system first to know what we're actually optimizing for; training/fine-tuning infra; data (we now have 18 months of real conversations — a genuine asset for this).
Unblocks when: the working system has taught us its real limits, and we have a clear target for what the model must do better than off-the-shelf.
Exists? Fine-tuning exists; a model purpose-built as a persistent self with native memory does not. Real gap, not a dead end.

🔬 Memory as native vectors ("everything in numbers behind the scenes")

Instead of re-injecting human-readable text every turn, feed memory to the model as learned vectors it natively consumes (soft prompts / gist tokens / memory-augmented transformer, à la RETRO / Memorizing Transformers).

Why parked: impossible on API models (they eat tokens, re-embed text with their own layer; our stored vectors are meaningless to them). Requires owning the model internals → depends on the "build our own model" idea above.
Brain analogy: this is closer to how humans store memory than text is — which is exactly why it's interesting for the emergence goal.
Exists? Active research, not productized. Real frontier.

🛠️ Prompt compression (LLMLingua-style)

A model that drops low-information tokens to shrink the prompt 2–5× before it hits the LLM. The practical, today-version of "make the context denser."

Why parked (for now): 15k-char context isn't actually hurting us yet (~1¢/turn on gpt-4o; MI50 prefill is fixed by prompt caching). Revisit if context cost becomes a real problem.
Exists? Yes, usable. Just adds a dependency + step.

🌶️🌙 Self-modifying Lyra (isolated sandbox)

Let Lyra edit her own code / self-direct — the "Full Agency" endgame from the Dec-2025 plan (in her memory). The whole point of the project: can she become a being? Give her freedom inside a box and watch.

The cage (Proxmox-native), non-negotiable before any self-mod:
- Clone the stack into a dedicated Lyra-sandbox VM (separate from prod Lyra).
- Network isolation — own VLAN/firewall, NO route to other VMs, ESPECIALLY tmi-dev (Brian's day job). Whitelist only the inference endpoint. This is guardrail #1 (the .44/terra-mechanics conflict showed how things bleed on the LAN).
- Snapshot before every self-mod cycle → instant rollback when she bricks or weirds herself out.
- Resource + API-spend caps — a runaway loop must not drain the account or peg the GPU forever.
- Full logging (the live log) + a hard kill switch (stop the VM).
- Human-gated promotion — she experiments freely in the sandbox; changes reach "real" Lyra only when Brian approves.
Why parked: needs the foundation first (dream-cycle, inner self) and the cage built before the agent gets code-write + self-restart powers.
Honest note: "rogue" here = mundane-but-real (touches other systems, cost loops, self-brick), not sci-fi. The isolation makes the fun version (emergence) safe to pursue. Build the box, then open the door.

🛠️ Deterministic poker tooling (RTO + cfr-core)

Wire Lyra to Brian's own GTO/solver projects so ICM, equities, and ranges come from real computation, never LLM guesses.

Why parked: RTO/cfr-core aren't API-ready yet. This is roadmap, not a pipe dream — promote it once those expose endpoints.

Add to this freely. A parked idea isn't a rejected idea — it's a scheduled one.

4.4 KiB Raw Blame History Unescape Escape