seismo-relay/analysis/README.md

# analysis/ — exploratory scripts for waveform-body RE

**These are scratch.** Run them, read them, copy them, but don't trust
them as documentation.  When a finding is verified it gets promoted
to `minimateplus/waveform_codec.py` and `tests/test_waveform_codec.py`;
when it's wrong it stays here as a fossil.

Authoritative status lives in:

- `docs/waveform_codec_re_status.md` (current truth, working note)
- `minimateplus/waveform_codec.py` (verified implementation + docstring)
- `tests/test_waveform_codec.py` (regression locks against fixtures)

---

## Still useful

| File | What it does |
|---|---|
| `load_bundle.py` | Fixture loader.  Parses BW binary + ASCII TXT into a `Bundle` dataclass with samples, metadata, body bytes.  Used by most other scripts here. |
| `verify_tran.py` | Verifies `decode_tran_initial` against fixture ground truth across all events.  Useful when you change the decoder and want a quick sanity check. |
| `inspect_5_11.py` | Inspects the body structure of the 5-11-26 high-amplitude bundle and prints its metadata, peaks, and block counts. |
| `walk_5_11.py` | Walks blocks for the 5-11-26 bundle and prints offset/tag/length/data. |
| `seg1_blocks.py` | Dumps all blocks in segment 1 of each event.  The starting point for cracking multi-segment Tran continuation. |
| `full_tran.py` | Multi-segment Tran decoder attempt (broken — diverges at sample ~512).  Useful as a starting scaffold for the next experiment. |
| `multi_segment.py` | Earlier multi-segment attempt with different segment-header consumption strategies.  Records what didn't work. |
| `test_rle.py` | Tests `00 NN` interpretation as zero-RLE with different divisor values.  Documents how the RLE rule was confirmed. |
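
As a rough illustration of what "different divisor values" can mean for a `00 NN` block, the sketch below scores candidate divisors against observed zero-run lengths. It is not the confirmed rule (that lives in `minimateplus/waveform_codec.py`). The assumption that the run length is `NN / divisor`, the function names, and the sample observations are all hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple


def zero_run_length(nn: int, divisor: int) -> Optional[int]:
    """Zero samples implied by a `00 NN` block under ONE hypothesis:
    run length = NN / divisor (an assumption, not the verified rule).

    Returns None when NN is not an exact multiple of the divisor, which
    counts as evidence against that divisor.
    """
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    quotient, remainder = divmod(nn, divisor)
    return quotient if remainder == 0 else None


def score_divisors(
    observations: Iterable[Tuple[int, int]],
    divisors: Tuple[int, ...] = (1, 2, 4, 8),
) -> Dict[int, int]:
    """Count, per divisor, how many (NN, true_zero_count) pairs it explains."""
    obs = list(observations)
    return {
        d: sum(zero_run_length(nn, d) == zeros for nn, zeros in obs)
        for d in divisors
    }


# Synthetic (NN, zero-run length) pairs, NOT taken from the fixtures.
made_up = [(8, 2), (16, 4), (4, 1)]
scores = score_divisors(made_up)
```

With real fixtures, the pairs would come from the ground-truth samples in the bundles; a divisor that explains every observation is a candidate for promotion.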

## Superseded — keep for archaeology

| File | Superseded by / notes |
|---|---|
| `walk_v2.py` … `walk_v5.py` | `walk_v6.py` and ultimately `walk_body` in `minimateplus/waveform_codec.py`.  Each version represents one round of refinement.  Don't read any one in isolation; diff consecutive versions to see what was learned. |
| `walk_chunks.py` | `walk_v6.py` / production walker |
| `decode_v1.py` | First naive decoder attempt.  Wrong but readable. |

## Pure exploration — read if curious

| File | What it explored |
|---|---|
| `inspect_body.py` | Byte-frequency stats per event.  Established that bytes 0x00 / 0x10 dominate. |
| `find_blocks.py` | Searched for repeating 2-byte tag patterns. |
| `find_signal_runs.py` | Searched for stretches of bytes that "look like a smooth signal" (small inter-byte deltas).  Found the `20 NN` literal blocks. |
| `dump_head.py`, `dump_trailer.py`, `dump_around.py` | Hex dumpers at various body positions. |
| `compare_cd.py` | Byte-diff between event-c and event-d (same length, similar signal).  Used to identify structural vs data bytes. |
| `brute_force.py` | Tested 96 combinations of channel-permutation × nibble-order × sign-convention × init-from-header on the quiet bundle.  All failed because the quiet bundle had T[0]=T[1]=0, making the preamble undetectable. |
| `try_nibbles.py`, `try_layouts.py` | Earlier channel-interleaving hypotheses.  All wrong. |
| `test_tran_continue.py` | Tested the hypothesis that Tran continues uninterrupted across `30 04` blocks.  Disproven. |
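
For readers unfamiliar with the "smooth signal" heuristic mentioned for `find_signal_runs.py`, here is a minimal sketch of the idea: scan a byte string for stretches where neighbouring bytes differ by only a small amount. The function name, the thresholds, and the signed-byte interpretation are assumptions for illustration and may not match what the script actually does.

```python
from typing import List, Tuple


def find_smooth_runs(
    data: bytes,
    max_delta: int = 4,   # assumed threshold for "small" inter-byte delta
    min_len: int = 16,    # assumed minimum run length worth reporting
    signed: bool = True,  # assumption: treat bytes as signed 8-bit values
) -> List[Tuple[int, int]]:
    """Return (start, end) offsets (end exclusive) of runs in which every
    pair of consecutive bytes differs by at most max_delta."""
    if signed:
        values = [b - 256 if b >= 128 else b for b in data]
    else:
        values = list(data)

    runs: List[Tuple[int, int]] = []
    start = 0
    for i in range(1, len(values) + 1):
        # A run ends at the end of the data or at a large jump.
        at_end = i == len(values)
        if at_end or abs(values[i] - values[i - 1]) > max_delta:
            if i - start >= min_len:
                runs.append((start, i))
            start = i
    return runs


# Synthetic bytes: noise, then a slowly rising stretch, then a jump.
demo = bytes([0, 200, 10, 11, 12, 13, 90])
demo_runs = find_smooth_runs(demo, max_delta=2, min_len=3)
```

Running it over a body and printing the offsets of long runs gives a quick map of where literal sample data is likely to sit.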

---

## Adding new scripts

If you're picking up the codec work, feel free to add new scripts here.
Suggested conventions:

- Start the filename with what you're testing: `test_<hypothesis>.py`,
  `verify_<piece>.py`, `inspect_<region>.py`.
- Print enough output that the reader can see exactly which events
  match / diverge and where.
- When a finding is solid, move the verified logic to
  `minimateplus/waveform_codec.py` and add a regression test in
  `tests/test_waveform_codec.py` — don't leave the truth only in
  this directory.
- If a script is fully superseded, leave it in place (don't delete) —
  the fossil record is useful when re-evaluating hypotheses later.
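
The "print enough output" convention above could look something like the following sketch, which reports per event whether decoded samples match the expected ones and, if not, where they first diverge. The helper names and the event labels are hypothetical; in a real script the sample lists would come from the fixture loader.

```python
from typing import Optional, Sequence


def first_divergence(
    decoded: Sequence[int], expected: Sequence[int]
) -> Optional[int]:
    """Index of the first mismatching sample, or None if the sequences match.
    A length mismatch counts as a divergence at the shorter length."""
    for i, (got, want) in enumerate(zip(decoded, expected)):
        if got != want:
            return i
    if len(decoded) != len(expected):
        return min(len(decoded), len(expected))
    return None


def report(
    name: str,
    decoded: Sequence[int],
    expected: Sequence[int],
    context: int = 3,
) -> str:
    """One-event summary: MATCH, or the divergence index plus nearby samples."""
    idx = first_divergence(decoded, expected)
    if idx is None:
        return f"{name}: MATCH ({len(expected)} samples)"
    lo = max(0, idx - context)
    hi = idx + context + 1
    return (
        f"{name}: DIVERGES at sample {idx}\n"
        f"  got:  {list(decoded[lo:hi])}\n"
        f"  want: {list(expected[lo:hi])}"
    )


# Synthetic data with hypothetical event labels.
ok_line = report("event-x", [0, 1, 2, 3], [0, 1, 2, 3])
bad_line = report("event-y", [0, 1, 9, 3], [0, 1, 2, 3])
```

Printing one such line per event makes it obvious at a glance which events a change to the decoder fixed or broke.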