codec-re: handoff polish — readmes, skeleton, remove decode-re/ duplicate

Three things to make pickup smoother:

1. analysis/README.md (NEW): catalogues the ~25 scratch scripts.
   Categorizes them as "still useful" / "superseded — keep for
   archaeology" / "pure exploration". Tells a fresh engineer which files
   to read first and which to ignore.

2. scratch/next_experiment_skeleton.py (NEW): stub + spec for the
   segment-channel scoring analyzer. Includes the fixture loader, block
   walker, and decode-segment-as-channel helper: just enough scaffolding
   that the next pass starts from "fill in
   score_segment_against_all_channels()" rather than from scratch.
   Already runs and confirms 13 segments per 3-sec event with sample
   starts going to 6590 (way past the 3328 actual samples), which is
   strong evidence that not all segments carry Tran.

3. Removed decode-re/ duplicate. It was a mirror of tests/fixtures/.
   Analysis scripts that hardcoded decode-re/ paths updated to point at
   tests/fixtures/. CLAUDE.md note updated: future event uploads go
   directly into a dated subdirectory under tests/fixtures/.

All 40 tests still pass. Skeleton runs.
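
The out-of-range observation in item 2 can be checked mechanically. A minimal sketch, assuming each segment exposes a starting sample index. The function `segments_past_end` and the evenly spaced start values are made up for illustration and are not the skeleton's actual API or output; only the segment count (13), the largest start (6590), and the sample count (3328) come from the run described above.

```python
from typing import List, Sequence


def segments_past_end(starts: Sequence[int], n_samples: int) -> List[int]:
    """Return indices of segments whose start lies at or beyond n_samples."""
    return [i for i, s in enumerate(starts) if s >= n_samples]


# Illustrative values only: 13 evenly spaced starts, the last one at 6590.
# The real per-segment starts are not listed in this commit.
N_SAMPLES = 3328
starts = [i * 6590 // 12 for i in range(13)]

bad = segments_past_end(starts, N_SAMPLES)
print(f"{len(bad)} of {len(starts)} segments start past sample {N_SAMPLES}")
```

Any segment reported here cannot be a plain continuation of a 3328-sample Tran trace, which is the reasoning behind scoring each segment against all channels.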
2026-05-12 02:53:10 +00:00
parent f68ee9f0f9
commit ae0e17b5dc
31 changed files with 404 additions and 24860 deletions
@@ -0,0 +1,66 @@
+# analysis/ — exploratory scripts for waveform-body RE
+
+**These are scratch.** Run them, read them, copy them, but don't trust
+them as documentation.  When a finding is verified it gets promoted
+to `minimateplus/waveform_codec.py` and `tests/test_waveform_codec.py`;
+when it's wrong it stays here as a fossil.
+
+Authoritative status lives in:
+
+- `docs/waveform_codec_re_status.md` (current truth, working note)
+- `minimateplus/waveform_codec.py` (verified implementation + docstring)
+- `tests/test_waveform_codec.py` (regression locks against fixtures)
+
+---
+
+## Still useful
+
+| File | What it does |
+|---|---|
+| `load_bundle.py` | Fixture loader.  Parses BW binary + ASCII TXT into a `Bundle` dataclass with samples, metadata, body bytes.  Used by most other scripts here. |
+| `verify_tran.py` | Verifies `decode_tran_initial` against fixture ground truth across all events.  Useful when you change the decoder and want a quick sanity check. |
+| `inspect_5_11.py` | Inspects the 5-11-26 high-amplitude bundle's body structure, prints metadata, peaks, and block counts. |
+| `walk_5_11.py` | Walks blocks for the 5-11-26 bundle and prints offset/tag/length/data. |
+| `seg1_blocks.py` | Dumps all blocks in segment 1 of each event.  The starting point for cracking multi-segment Tran continuation. |
+| `full_tran.py` | Multi-segment Tran decoder attempt (broken — diverges at sample ~512).  Useful as a starting scaffold for the next experiment. |
+| `multi_segment.py` | Earlier multi-segment attempt with different segment-header consumption strategies.  Records what didn't work. |
+| `test_rle.py` | Tests `00 NN` interpretation as zero-RLE with different divisor values.  Documents how the RLE rule was confirmed. |
+
+## Superseded — keep for archaeology
+
+| File | Superseded by / notes |
+|---|---|
+| `walk_v2.py` … `walk_v5.py` | `walk_v6.py` and ultimately `minimateplus/waveform_codec.walk_body`.  Each version represents one round of refinement.  Don't read in isolation — read the diff between them to see what was learned. |
+| `walk_chunks.py` | `walk_v6.py` / production walker |
+| `decode_v1.py` | First naive decoder attempt.  Wrong but readable. |
+
+## Pure exploration — read if curious
+
+| File | What it explored |
+|---|---|
+| `inspect_body.py` | Byte-frequency stats per event.  Established that bytes 0x00 / 0x10 dominate. |
+| `find_blocks.py` | Searched for repeating 2-byte tag patterns. |
+| `find_signal_runs.py` | Searched for stretches of bytes that "look like a smooth signal" (small inter-byte deltas).  Found the `20 NN` literal blocks. |
+| `dump_head.py`, `dump_trailer.py`, `dump_around.py` | Hex dumpers at various body positions. |
+| `compare_cd.py` | Byte-diff between event-c and event-d (same length, similar signal).  Used to identify structural vs data bytes. |
+| `brute_force.py` | Tested 96 combinations of channel-permutation × nibble-order × sign-convention × init-from-header on the quiet bundle.  All failed because the quiet bundle had T[0]=T[1]=0, making the preamble undetectable. |
+| `try_nibbles.py`, `try_layouts.py` | Earlier channel-interleaving hypotheses.  All wrong. |
+| `test_tran_continue.py` | Test of "Tran continues uninterrupted across `30 04` blocks" hypothesis.  Disproven. |
+
+---
+
+## Adding new scripts
+
+If you're picking up the codec work, feel free to add new scripts here.
+Suggested conventions:
+
+- Start the filename with what you're testing: `test_<hypothesis>.py`,
+  `verify_<piece>.py`, `inspect_<region>.py`.
+- Print enough output that the reader can see exactly which events
+  match / diverge and where.
+- When a finding is solid, move the verified logic to
+  `minimateplus/waveform_codec.py` and add a regression test in
+  `tests/test_waveform_codec.py` — don't leave the truth only in
+  this directory.
+- If a script is fully superseded, leave it in place (don't delete) —
+  the fossil record is useful when re-evaluating hypotheses later.
@@ -54,7 +54,7 @@ def decode_full_tran(body):

 def main():
    for stem in ("M529LL1L.V70", "M529LL1L.JQ0", "M529LL1A.SP0", "M529LL1A.SS0", "M529LL1A.SV0"):
-        path = f"decode-re/5-11-26/{stem}"
+        path = f"tests/fixtures/5-11-26/{stem}"
        with open(path, "rb") as f:
            body = f.read()[43:-26]
        _, samples = _parse_txt(path + ".TXT")
@@ -4,7 +4,7 @@ sys.path.insert(0, ".")
 from analysis.load_bundle import _parse_txt
 from minimateplus.waveform_codec import walk_body, find_data_start

-ROOT = "decode-re/5-11-26"
+ROOT = "tests/fixtures/5-11-26"


 def main():
@@ -10,7 +10,9 @@ import re
 from dataclasses import dataclass


-BUNDLE_ROOT = os.path.join(os.path.dirname(__file__), "..", "decode-re", "5-8-26")
+BUNDLE_ROOT = os.path.join(
+    os.path.dirname(__file__), "..", "tests", "fixtures", "decode-re-5-8-26"
+)


@dataclass
@@ -55,7 +55,7 @@ def decode_full_tran(body):

 def main():
    for stem in ("M529LL1L.V70", "M529LL1L.JQ0", "M529LL1A.SP0", "M529LL1A.SS0", "M529LL1A.SV0"):
-        path = f"decode-re/5-11-26/{stem}"
+        path = f"tests/fixtures/5-11-26/{stem}"
        with open(path, "rb") as f:
            body = f.read()[43:-26]
        _, samples = _parse_txt(path + ".TXT")
@@ -6,7 +6,7 @@ from minimateplus.waveform_codec import walk_body, find_data_start

 def main():
    for stem in ("M529LL1A.SP0", "M529LL1L.JQ0", "M529LL1L.V70"):
-        path = f"decode-re/5-11-26/{stem}"
+        path = f"tests/fixtures/5-11-26/{stem}"
        with open(path, "rb") as f:
            body = f.read()[43:-26]
        blocks = walk_body(body, find_data_start(body))
@@ -62,7 +62,7 @@ def decode_with_rle(body):

 def main():
    for stem in ("M529LL1L.V70", "M529LL1L.JQ0", "M529LL1A.SP0", "M529LL1A.SS0", "M529LL1A.SV0"):
-        path = f"decode-re/5-11-26/{stem}"
+        path = f"tests/fixtures/5-11-26/{stem}"
        with open(path, "rb") as f:
            body = f.read()[43:-26]
        _, samples = _parse_txt(path + ".TXT")
@@ -15,7 +15,7 @@ def i8(b):

 def main():
    stem = "M529LL1A.SS0"
-    path = f"decode-re/5-11-26/{stem}"
+    path = f"tests/fixtures/5-11-26/{stem}"
    with open(path, "rb") as f:
        body = f.read()[43:-26]
    _, samples = _parse_txt(path + ".TXT")
@@ -17,7 +17,7 @@ def i8(b):

 def main():
    for stem in ("M529LL1A.SP0", "M529LL1A.SS0", "M529LL1A.SV0"):
-        path = f"decode-re/5-11-26/{stem}"
+        path = f"tests/fixtures/5-11-26/{stem}"
        with open(path, "rb") as f:
            raw = f.read()
        body = raw[43:-26]
@@ -6,7 +6,7 @@ from minimateplus.waveform_codec import walk_body, find_data_start

 def main():
    for stem in ("M529LL1A.SP0", "M529LL1A.SS0", "M529LL1A.SV0"):
-        with open(f"decode-re/5-11-26/{stem}", "rb") as f:
+        with open(f"tests/fixtures/5-11-26/{stem}", "rb") as f:
            raw = f.read()
        body = raw[43:-26]
        start = find_data_start(body)
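
Every script touched in the hunks above repeats the same two steps: build a path under `tests/fixtures/5-11-26` and slice the raw file with `[43:-26]`. A minimal sketch of a shared helper, assuming the 43-byte prefix and 26-byte suffix are fixed for these fixtures, as the repeated slices suggest. The names `FIXTURE_ROOT`, `HEADER_LEN`, `TRAILER_LEN`, `strip_frame`, and `read_body` are hypothetical and not part of the repository.

```python
import os

FIXTURE_ROOT = os.path.join("tests", "fixtures")  # hypothetical constant
HEADER_LEN = 43   # bytes dropped from the front, per the [43:-26] slices
TRAILER_LEN = 26  # bytes dropped from the end, per the same slices


def strip_frame(raw: bytes) -> bytes:
    """Return the body of a raw event file, without prefix and suffix."""
    if len(raw) < HEADER_LEN + TRAILER_LEN:
        raise ValueError(f"file too short: {len(raw)} bytes")
    return raw[HEADER_LEN:len(raw) - TRAILER_LEN]


def read_body(bundle: str, stem: str) -> bytes:
    """Read one fixture body, e.g. read_body("5-11-26", "M529LL1A.SS0")."""
    path = os.path.join(FIXTURE_ROOT, bundle, stem)
    with open(path, "rb") as f:
        return strip_frame(f.read())
```

With something like this in one place, a future move of the fixture directory would touch a single constant instead of one line in each analysis script.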