docs: clean up waveform-codec doc layers per review

Three "truth layers" had drifted apart between commits. Fixed:

1. waveform_codec.py: docstring rewritten from the 2026-05-08
   "structural framing only" state to the 2026-05-11 "Tran segment 0
   solved + segment header partially decoded" state. Killed the stale
   "~80 sample-sets per segment" language (real segments are sized by a
   flash-page byte budget, not a sample count; observed first-segment
   sizes are 42-510 samples depending on signal). Killed the stale
   "preamble is 7 or 9 bytes" language (always 7).

2. docs/instantel_protocol_reference.md §7.6.1: added a clear "CURRENT
   STATUS" box with a status table at the top. Replaced the stale "~80
   sample-sets" line with the verified per-event segment sizes. Merged
   two redundant segment-header field-table sections.

3. docs/waveform_codec_re_status.md (NEW): clean working-status doc
   covering solved / not solved / hypothesis / next experiment /
   fixtures / tests. The protocol reference remains the historical
   Rosetta Stone; this new file is the current-truth working note and
   should not accumulate fossil layers.

4. CLAUDE.md §"Waveform body codec": prominent warning box at the top,
   "DO NOT TRUST decoded sample arrays yet." The BW binary passthrough
   is the only sample-bearing output to trust until the decoder lands.
   Added a "Next experiment" subsection pointing the next pass at the
   segment-channel scoring analyzer.

40 tests still pass.
2026-05-12 02:43:25 +00:00
parent 5bf5329369
commit f68ee9f0f9
4 changed files with 385 additions and 139 deletions
@@ -61,10 +61,24 @@ Full read pipeline + write pipeline + erase pipeline + monitor log + call home c

 ## Waveform body codec — PARTIAL (2026-05-11)

+> ### ⛔️ DO NOT TRUST decoded sample arrays yet
+>
+> `client.py:_decode_a5_waveform` still uses the broken legacy int16 LE
+> decoder.  The `.h5` sidecars SFM writes contain WRONG sample values
+> for every event.  Treat decoded sample arrays as "unverified" in all
+> downstream consumers.
+>
+> The **BW binary write path** (`blastware_file.py`) is unaffected —
+> it's pure passthrough of device flash bytes and remains byte-perfect.
+> Use the `.bw` binary as the authoritative waveform output until the
+> codec is fully decoded.
+>
+> Clean working-status doc: `docs/waveform_codec_re_status.md`.
+> Full archaeological record: `docs/instantel_protocol_reference.md §7.6.1`.
+
 The **per-byte decoding** of the Blastware waveform-file body (between the
 21-byte STRT record and the 26-byte footer) was historically claimed to be
-"raw int16 LE, 8 bytes per sample-set."  That was wrong — see the
-retraction in `docs/instantel_protocol_reference.md §7.6.1`.  The body
+"raw int16 LE, 8 bytes per sample-set."  That was wrong.  The body
 is actually a tagged-block stream with a custom delta+RLE codec.

 ### What's solved (2026-05-11)
@@ -96,13 +110,26 @@ is actually a tagged-block stream with a custom delta+RLE codec.
  (SS0, SV0) and breaks the simple Tran walk there.  Probably a channel-
  switch or alternative-encoding marker for high-amplitude regions.

+### Next experiment
+
+**Don't hero-code the full decoder.**  Build a small analysis tool — a
+segment-channel scoring analyzer.  For each segment of each fixture
+event, run the segment-0 Tran block-walk + RLE decode and score the
+cumulative trajectory against the BW ASCII truth for each of {Tran,
+Vert, Long, MicL} over that segment's sample range, trying different
+anchor-bytes candidates from the segment header.  The winning
+(channel, anchor-location) combination for each segment reveals
+whether segments rotate channels and which header bytes encode the
+per-segment channel anchors.
+
+See `docs/waveform_codec_re_status.md` for the full specification of
+the next experiment.
+
 ### Production-code status

 `client.py:_decode_a5_waveform` still uses the old (broken) int16 LE
-decoder.  Until the multi-channel decoder lands, the `.h5` sidecars
-produced by SFM contain WRONG samples — keep treating them as
-"unverified" downstream.  `decode_waveform_v2()` returns `None` as a
-placeholder.
+decoder (see warning at the top of this section).  `decode_waveform_v2()`
+in `minimateplus/waveform_codec.py` returns `None` as a placeholder.

 ### Test fixtures

@@ -860,20 +860,39 @@ MicL:  39 64 1D AA  =  0.0000875 psi

 ---

-#### 7.6.1 Blast / Waveform mode — 🟡 STRUCTURAL FRAMING + TRAN CODEC DECODED (2026-05-11)
+#### 7.6.1 Blast / Waveform mode — 🟡 PARTIAL DECODE (2026-05-11)

-> **Status (2026-05-11):** Block-level framing is solved.  The Tran-channel
-> encoding (preamble + first data block) is **fully verified** against the
-> 3-event May 11 2026 high-amplitude bundle (PPV 6-7 in/s) and the 4-event
-> May 8 bundle.  Verts / Long / MicL channel encodings and multi-block
-> Tran continuation are **still open**.  The previous int16 LE claim
-> remains REFUTED (see history below).
+> ### 📌 CURRENT STATUS — read this first
 >
-> The earlier "4-channel interleaved s16 LE, 8 bytes per sample-set"
-> claim was never validated and was wrong.  No event in the project's
-> archive ever came close to ADC saturation, yet the int16 LE decoder
-> consistently produced full-scale ±32K noise — that was the signature
-> of mis-aligned encoded data, not signal saturation.
+> The body codec is **partially decoded** as of 2026-05-11.  This
+> section contains both current-truth spec AND historical retractions;
+> when in doubt, the working summary lives at
+> `docs/waveform_codec_re_status.md`.
+>
+> | Item | Status |
+> |---|---|
+> | Body has tagged variable-length blocks, NOT raw int16 LE | ✅ confirmed |
+> | 5 block tag types (10/20/00/30/40 NN) with lengths | ✅ confirmed |
+> | 7-byte preamble: `00 02 00` + Tran[0] + Tran[1] int16 BE | ✅ confirmed |
+> | `00 NN` = RLE for zero deltas in the current channel | ✅ confirmed |
+| Tran channel, segment 0 (42-510 samples / event) | ✅ byte-exact, 5/5 events |
+> | Multi-segment Tran continuation | ❌ open (breaks at sample ~512) |
+> | Vert / Long / MicL channel decoders | ❌ open |
+> | `30 NN` block content (loud-from-start events) | ❌ open |
+> | Earlier "raw int16 LE, 8 bytes per sample-set" claim | ❌ REFUTED |
+>
+> **Production code in `client.py:_decode_a5_waveform` still uses the
+> broken int16 LE decoder.**  The `.h5` sidecars SFM produces contain
+> wrong sample values and must be treated as "unverified" downstream.
+> The BW binary write path is unaffected (it's pure passthrough of the
+> device's flash bytes, no decoding) and remains byte-perfect.
+
+The "4-channel interleaved s16 LE, 8 bytes per sample-set" claim that
+appeared in earlier revisions of this section was never validated and
+was wrong.  No event in the project's archive ever came close to ADC
+saturation, yet the int16 LE decoder consistently produced full-scale
+±32K noise — that was the signature of mis-aligned encoded data, not
+signal saturation.

 ##### Body file layout

@@ -932,23 +951,38 @@ followed by a ``00 NN`` marker before the next data block.

 ##### Segments

-The body is divided into ~16 SEGMENTS for a 1280-sample event (= 1
-segment per ~80 sample-sets), separated by ``40 02`` segment headers.
-A 3328-sample event has ~42 segments.
+The body is divided into segments separated by ``40 02`` segment headers.
+**Segment size is variable** — bounded by a fixed device-flash byte
+budget, not a fixed sample count.  Quiet events fit more samples per
+segment (RLE compacts zero deltas via ``00 NN`` markers); loud events
+fit fewer.  Observed first-segment sizes in the bundled fixtures:

-The 18-byte ``40 02`` payload structure (CONFIRMED across all 4
-fixtures by inspecting the increment of bytes [8:12]):
+| Event | Segment 0 size (Tran samples) |
+|---|---|
+| SP0 (loud, 0.25s pretrig) | 510 |
+| SV0 (loud-from-start) | 58 (stops at first ``30 NN``) |
+| SS0 (loud-from-start) | 42 (stops at first ``30 04``) |
+| JQ0 (Vert-heavy, quiet Tran) | 510 |
+| V70 (Mic-heavy, quiet geos) | 510 |

-| Offset | Length | Field                                            |
-|--------|--------|--------------------------------------------------|
-| 0      | 4      | Anchor / channel state (open — see below)        |
-| 4      | 4      | Variable field (open)                            |
-| 8      | 4      | uint32 LE counter — increments by 1 per segment  |
-| 12     | 4      | Fixed pattern ``02 00 00 01``                    |
-| 16     | 2      | Variable tail                                    |
+⚠️ Earlier drafts of this section claimed "~80 sample-sets per segment"
+based on incomplete walks; that figure is wrong.  Segments are
+flash-page-sized in bytes, not sample-count-sized.

-The counter at bytes [8:12] starts in the 0x40s for a freshly-erased
-device and increments cleanly — useful as a structural sanity check.
+The 18-byte ``40 02`` payload structure:
+
+| Offset    | Field                                       | Status      |
+|-----------|---------------------------------------------|-------------|
+| [0:2]     | T_delta at first sample of new segment      | ✅ confirmed|
+|           | (int16 BE, in 16-count units)               |             |
+| [2:4]     | Likely T_delta at sample seg_start+1        | 🟡 likely   |
+| [4:6]     | Unknown (varies; possibly a checksum)       | ❓ open     |
+| [6:8]     | Byte length to next segment header − 2      | ✅ confirmed|
+|           | (uint16 BE; useful for walker pre-scan)     |             |
+| [8:12]    | Monotonic uint32 LE counter                 | ✅ confirmed|
+|           | (starts ~0x47, increments by 1 per segment) |             |
+| [12:14]   | Constant ``02 00``                          | ✅ confirmed|
+| [14:18]   | Unknown 4-byte field                        | ❓ open     |

 Examples from event-c (1 sec single-shot):

@@ -1008,26 +1042,25 @@ where the codec is most complex stop at the first ``30 04``.

 Implementation: :func:`minimateplus.waveform_codec.decode_tran_initial`.

-##### Segment header T-delta (PARTIAL 2026-05-11)
+##### Multi-segment Tran continuation — OPEN

-The 20-byte ``40 02`` segment header has its first 2 bytes ([0:2] of
-payload) as an int16 BE Tran delta for the first sample of the new
-segment.  Verified across V70 (3 segments with 0 deltas) and SP0/JQ0
-(1 segment with +1 delta).  Other bytes of the segment header payload
-are partially understood:
+After segment 0 ends and the segment header's T_delta (bytes [0:2])
+is applied, the next segment's blocks produce values that diverge from
+truth by sample ~512.  The block structure inside segment 1 is
+identical to segment 0 (alternating ``10 NN`` / ``20 NN`` data +
+``00 NN`` RLE), and the per-segment delta budget exactly matches the
+segment size — V70 segment 1 has 264 nibble-deltas + 244 RLE-zeros =
+508 = the segment's sample count.  Cumulative deltas are correct in
+aggregate (V70 net-zero ≈ truth net-zero) but the per-sample trajectory
+is wrong when applied as Tran continuation.

-| Payload offset | Field | Status |
-|---|---|---|
-| [0:2] | T_delta at first sample of new segment (int16 BE) | ✅ confirmed |
-| [2:4] | unknown (often 0; not a simple V or T delta) | ❓ open |
-| [4:6] | unknown (varies per event; possibly a checksum) | ❓ open |
-| [6:8] | byte length to next segment header − 2 (uint16 BE) | ✅ confirmed |
-| [8:12] | monotonic uint32 LE counter | ✅ confirmed |
-| [12:14] | constant ``02 00`` | ✅ confirmed |
-| [14:18] | unknown 4-byte field | ❓ open |
-
-Multi-segment Tran decoding diverges after sample ~512 — the per-segment
-channel ordering after the header is still unknown.
+The strongest unverified hypothesis is that **segments rotate
+channels**: segment 0 = Tran, segment 1 = Vert, segment 2 = Long,
+segment 3 = Mic, segment 4 = Tran continuation, …  This would explain
+the per-segment delta-budget match while also explaining why segment
+1 isn't Tran continuation.  Verifying it requires locating the
+per-channel anchor, most likely in segment-header bytes [4:6] or
+[14:18], which are still open.

 ##### What's still open

@@ -0,0 +1,172 @@
+# Waveform body codec — current working status (2026-05-11)
+
+This is the **clean working note** for the body-codec reverse-engineering
+effort.  It supersedes scattered claims elsewhere when they conflict.
+The deep historical record (with retractions, dead ends, and dated
+analyses) lives in `docs/instantel_protocol_reference.md §7.6.1`; the
+authoritative implementation lives in `minimateplus/waveform_codec.py`.
+
+## TL;DR
+
+The Blastware waveform-file body is a **tagged variable-length block
+stream**, NOT raw int16 LE samples.  Block framing is solved.  Tran
+channel segment-0 decoding is solved (byte-exact vs BW's ASCII export
+across all 5 May 11 fixture events).  Multi-segment continuation
+and the Vert / Long / MicL channel decoders are still open.
+
+**Production code in `minimateplus/client.py:_decode_a5_waveform` still
+uses the broken legacy int16 LE decoder.**  Sample arrays it writes to
+the `.h5` sidecars are wrong and must be treated as "unverified" by all
+downstream consumers.  The BW binary write path (`blastware_file.py`)
+is unaffected — it's pure passthrough and remains byte-perfect.
+
+## What's solved
+
+### Block framing
+
+| Tag      | Length                | Meaning                                  |
+|----------|-----------------------|------------------------------------------|
+| `10 NN`  | NN/2 + 2 bytes        | 4-bit signed nibble deltas, 2 per byte,  |
+|          |                       | high nibble first (0..7 = +0..+7,        |
+|          |                       | 8..F = -8..-1)                           |
+| `20 NN`  | NN + 2 bytes          | int8 signed deltas (1 per byte)          |
+| `00 NN`  | 2 bytes               | RLE: append NN copies of current value   |
+| `30 NN`  | NN*2 in data section, | Unknown content.  Only in loud-from-     |
+|          | NN*4 in trailer       | start events.                            |
+| `40 02`  | 20 bytes (fixed)      | Segment header                           |
+
+NN is always a multiple of 4.
+
+Implementation: `walk_body()` in `minimateplus/waveform_codec.py`.
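The framing table can be exercised with a small walker. This is a minimal sketch, not the project's `walk_body()`: `Block`, `block_length` and `walk_blocks` are hypothetical names, the walk is assumed to start right after the 7-byte preamble, and it deliberately stops at the first `30 NN` block because that tag's length depends on whether it sits in the data section or the trailer.

```python
from dataclasses import dataclass


@dataclass
class Block:
    offset: int      # byte offset of the 2-byte tag within the body
    tag: int         # first tag byte: 0x00, 0x10, 0x20 or 0x40
    nn: int          # second tag byte (NN)
    payload: bytes   # bytes after the tag


def block_length(tag: int, nn: int) -> int:
    """Total block length in bytes (tag included), per the framing table."""
    if tag == 0x10:
        return nn // 2 + 2   # NN nibble deltas packed 2 per byte
    if tag == 0x20:
        return nn + 2        # NN int8 deltas, 1 per byte
    if tag == 0x00:
        return 2             # RLE marker, no payload
    if tag == 0x40 and nn == 0x02:
        return 20            # segment header: 2-byte tag + 18-byte payload
    raise ValueError(f"unhandled tag {tag:02X} {nn:02X}")


def walk_blocks(body: bytes, start: int = 7):
    """Yield Blocks from `start` until an unhandled tag or the end of data."""
    pos = start
    while pos + 2 <= len(body):
        tag, nn = body[pos], body[pos + 1]
        try:
            length = block_length(tag, nn)
        except ValueError:
            return  # e.g. a 30 NN block: this sketch stops rather than guess
        if pos + length > len(body):
            return  # truncated block
        yield Block(pos, tag, nn, body[pos + 2 : pos + length])
        pos += length
```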
+
+### 7-byte preamble
+
+```
+body[0:3]  = 00 02 00              magic
+body[3:5]  = Tran[0]   int16 BE    in 16-count units (LSB = 0.005 in/s)
+body[5:7]  = Tran[1]   int16 BE    in 16-count units
+```
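Reading the preamble is a 3-byte magic check followed by two big-endian int16 values. A minimal sketch using only the standard `struct` module; `parse_preamble` and `IN_PER_S_PER_UNIT` are hypothetical names, and the in/s conversion simply applies the 0.005 in/s LSB stated above.

```python
import struct

PREAMBLE_MAGIC = b"\x00\x02\x00"
IN_PER_S_PER_UNIT = 0.005  # LSB of one 16-count unit, per the layout above


def parse_preamble(body: bytes) -> tuple[int, int]:
    """Return (Tran[0], Tran[1]) in 16-count units from the 7-byte preamble."""
    if body[0:3] != PREAMBLE_MAGIC:
        raise ValueError(f"bad preamble magic: {body[0:3].hex(' ')}")
    tran0, tran1 = struct.unpack(">hh", body[3:7])  # two signed int16 BE
    return tran0, tran1
```

For example, `tran0 * IN_PER_S_PER_UNIT` converts the first anchor to in/s.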
+
+### Tran channel, segment 0
+
+Segment 0 (everything before the first `40 02`) encodes Tran samples
+only.  Starting from preamble anchors Tran[0] and Tran[1], each block
+contributes to a running cumulative Tran value:
+
+- `10 NN` →  append NN nibble-deltas
+- `20 NN` →  append NN int8-deltas
+- `00 NN` →  append NN copies of current value (RLE)
+- `40 02` →  end segment 0
+
+Verified byte-exact:
+
+| Event | Description | Segment 0 size | Match |
+|---|---|---|---|
+| `M529LL1A.SP0` | Loud, 0.25 s pretrig | 510 | 510/510 ✓ |
+| `M529LL1A.SV0` | Loud from sample 0 | 58 | 58/58 ✓ (stops at first `30 NN`) |
+| `M529LL1A.SS0` | Loud from sample 0 | 42 | 42/42 ✓ (stops at first `30 04`) |
+| `M529LL1L.JQ0` | Vert-heavy | 510 | 510/510 ✓ |
+| `M529LL1L.V70` | Mic-heavy (140 dB) | 510 | 510/510 ✓ |
+
+Implementation: `decode_tran_initial()`.
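The segment-0 rules can be sketched as follows. This is an illustration, not `decode_tran_initial()` itself: the helper names are hypothetical, blocks are passed as `(tag, nn, payload)` tuples, and it assumes the deltas are in the same 16-count units as the preamble anchors and that the first delta applies to Tran[1]. It stops at the first `40 02` or at any other tag (such as `30 NN`), mirroring how SV0 and SS0 stop early in the table.

```python
def nibble_deltas(payload: bytes, count: int) -> list[int]:
    """Unpack `count` signed 4-bit deltas, high nibble first (8..F -> -8..-1)."""
    out = []
    for byte in payload:
        for nib in (byte >> 4, byte & 0x0F):
            out.append(nib - 16 if nib >= 8 else nib)
    return out[:count]


def int8_deltas(payload: bytes) -> list[int]:
    """Interpret each byte as a signed int8 delta."""
    return [b - 256 if b >= 128 else b for b in payload]


def decode_tran_segment0(tran0: int, tran1: int, blocks) -> list[int]:
    """Rebuild Tran samples from the preamble anchors up to the first 40 02."""
    samples = [tran0, tran1]
    current = tran1
    for tag, nn, payload in blocks:
        if tag == 0x10:
            deltas = nibble_deltas(payload, nn)
        elif tag == 0x20:
            deltas = int8_deltas(payload[:nn])
        elif tag == 0x00:
            deltas = [0] * nn  # RLE: repeat the current value NN times
        else:
            break  # 40 02 ends segment 0; 30 NN is not decoded here
        for d in deltas:
            current += d
            samples.append(current)
    return samples
```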
+
+### Segment header (`40 02`, 20 bytes total)
+
+| Payload offset | Field | Status |
+|---|---|---|
+| [0:2] | T_delta at first sample of new segment (int16 BE) | ✅ confirmed |
+| [2:4] | Likely T_delta at sample seg_start+1 | 🟡 likely |
+| [4:6] | Unknown (possibly checksum) | ❓ open |
+| [6:8] | Byte length to next segment header − 2 (uint16 BE) | ✅ confirmed |
+| [8:12] | Monotonic uint32 LE counter (starts ~0x47) | ✅ confirmed |
+| [12:14] | Constant `02 00` | ✅ confirmed |
+| [14:18] | Unknown 4-byte field | ❓ open |
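The mixed endianness (big-endian fields in [0:8], a little-endian counter in [8:12]) is easy to get wrong, so a parser sketch may help. `SegmentHeader` and `parse_segment_header` are hypothetical names; the [2:4] field is decoded as int16 BE only because it is *likely* a T_delta, and the open fields are kept as raw bytes.

```python
import struct
from typing import NamedTuple


class SegmentHeader(NamedTuple):
    t_delta: int          # [0:2]  int16 BE, confirmed
    t_delta_next: int     # [2:4]  int16 BE, only "likely"
    unknown_4_6: bytes    # [4:6]  open
    next_len_m2: int      # [6:8]  uint16 BE: bytes to next header, minus 2
    counter: int          # [8:12] uint32 LE, monotonic
    unknown_14_18: bytes  # [14:18] open


def parse_segment_header(payload: bytes) -> SegmentHeader:
    """Parse the 18-byte payload that follows a 40 02 tag."""
    if len(payload) != 18:
        raise ValueError("40 02 payload must be 18 bytes")
    t_delta, t_next = struct.unpack_from(">hh", payload, 0)
    (next_len_m2,) = struct.unpack_from(">H", payload, 6)
    (counter,) = struct.unpack_from("<I", payload, 8)
    if payload[12:14] != b"\x02\x00":
        raise ValueError("expected constant 02 00 at [12:14]")
    return SegmentHeader(t_delta, t_next, payload[4:6], next_len_m2,
                         counter, payload[14:18])
```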
+
+## What's still open
+
+1. **Multi-segment Tran continuation.**  After segment 0, applying
+   segment 1's blocks as Tran continuation diverges from truth by
+   sample ~512.  Block structure is identical to segment 0 and the
+   per-segment delta budget matches the segment size, but the
+   per-sample trajectory is wrong.
+
+2. **Vert / Long / MicL channel decoders.**  No verified decoder for
+   any non-Tran channel.
+
+3. **`30 NN` block content.**  Only appears in loud-from-start events.
+   Probably a channel-switch or alternative-encoding marker for
+   high-amplitude regions.  The walker steps over it without decoding.
+
+## Strongest unverified hypothesis
+
+Segments rotate channels:
+
+```
+segment 0  →  Tran samples 0..509
+segment 1  →  Vert samples 0..507
+segment 2  →  Long samples 0..507
+segment 3  →  Mic  samples 0..507
+segment 4  →  Tran samples 510..N (continuation)
+...
+```
+
+This would explain:
+- Why segment-0 = Tran works perfectly.
+- Why segment 1 has the same block structure but applying it as Tran
+  continuation gives wrong values.
+- Why the per-segment delta budget matches the segment size for a
+  *single* channel (508 deltas per segment, not 4 × 508).
+
+Not yet verified because the per-channel anchor at segment-start isn't
+identified in the segment header.  Bytes [4:6] and [14:18] of the
+header are the prime candidates.
+
+## Next experiment — segment-channel scoring analyzer
+
+Don't try to hero-code the full decoder.  Instead, build a small
+analysis tool that:
+
+1. For each segment in every fixture event, runs the segment-0 Tran
+   decoder (block-walk + RLE) and produces a cumulative trajectory
+   with one value per sample in the segment (508 for V70 segment 1).
+2. Scores that trajectory against the BW ASCII truth for *each* of
+   {Tran, Vert, Long, MicL} over the segment's sample range, starting
+   from different anchor-byte candidates from the segment header.
+3. Reports which (channel, anchor-bytes-location) combination produces
+   the lowest error for each segment.
+
+If the rotation hypothesis is right, segment 0 should clearly score
+best against Tran, segment 1 against Vert, etc.  The winning
+anchor-bytes-location will reveal which segment-header bytes encode
+the per-segment channel anchors.
+
+If the rotation hypothesis is *not* right, the scorer will at least
+narrow down what segment 1 actually carries.
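A minimal sketch of the scoring step, assuming each segment's deltas have already been decoded with the segment-0 rules and that truth samples are in the same 16-count units. `score_segment` and its arguments are hypothetical; mean absolute error is an arbitrary choice of score, and each anchor candidate is treated as the value just before the segment's first delta. Per-channel start indices are passed separately because, under the rotation hypothesis, segment 1 would start at Vert sample 0 while Tran continuation starts later.

```python
from itertools import accumulate

CHANNELS = ("Tran", "Vert", "Long", "MicL")


def score_segment(deltas, truth, starts, anchors):
    """Rank (channel, anchor-candidate) pairs for one segment by mean abs error.

    deltas:  per-sample deltas decoded from the segment (block walk + RLE)
    truth:   {channel: [samples]} from the BW ASCII export
    starts:  {channel: index of the segment's first sample in that channel}
    anchors: {candidate_label: value read from a header byte range}
    Returns a list of (mae, channel, candidate_label), best first.
    """
    if not deltas:
        return []
    results = []
    for channel in CHANNELS:
        begin = starts[channel]
        expected = truth[channel][begin : begin + len(deltas)]
        if len(expected) < len(deltas):
            continue  # truth does not cover this range for this channel
        for label, anchor in anchors.items():
            # accumulate(..., initial=anchor) yields anchor, anchor+d0, ...;
            # drop the anchor itself so there is one value per delta.
            trajectory = list(accumulate(deltas, initial=anchor))[1:]
            mae = sum(abs(t - e) for t, e in zip(trajectory, expected)) / len(deltas)
            results.append((mae, channel, label))
    return sorted(results)
```

If the rotation hypothesis holds, the top-ranked tuple for segment 1 should name Vert, and the winning label points at the header bytes that carry its anchor.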
+
+## Test fixtures
+
+Committed under `tests/fixtures/`:
+
+- `decode-re-5-8-26/event-a..event-d/`: original quiet bundle (4 events,
+  PPV < 1 in/s).  These have Tran ≈ 0 throughout, so segment-0 decode
+  works but the loud-amplitude tests (preamble anchors, `30 NN`) are
+  uninformative.
+- `5-11-26/M529LL1A.{SP0,SS0,SV0}`: loud bundle (PPV 6-7 in/s on all
+  channels).  These cracked the Tran codec.
+- `5-11-26/M529LL1L.{JQ0,V70}`: targeted captures.  JQ0 is Vert-heavy,
+  V70 is Mic-heavy (140 dB).  These cracked the `00 NN` RLE rule.
+
+Each fixture has a `.TXT` Blastware ASCII export as ground truth.
+
+## Tests
+
+`tests/test_waveform_codec.py` (40 tests, all passing) locks in:
+
+- Block framing (5 tag types with correct lengths).
+- Walker contiguity (no gaps or overlaps).
+- Segment header parsing (counter monotonicity, fixed-pattern check).
+- `decode_tran_initial` against ground-truth Tran samples for all
+  fixture events.
+
+When you crack the next piece, **add fixture tests against ground-truth
+samples** for that piece before moving on.  Don't let unverified code
+ship without a regression lock-in.
@@ -1,119 +1,133 @@
 """
-waveform_codec.py — block-walker for the MiniMate Plus waveform body codec.
+waveform_codec.py — block-walker and partial decoder for the MiniMate Plus
+waveform-file body.

-PARTIAL REVERSE-ENGINEERING — 2026-05-08.
+PARTIAL REVERSE-ENGINEERING — last updated 2026-05-11.

-Status: STRUCTURAL FRAMING confirmed; per-block sample interpretation OPEN.
+The Blastware waveform-file body — the bytes between the 21-byte STRT
+record and the 26-byte file footer — is NOT raw int16 LE samples (the
+historical assumption that produced full-scale ±32K noise on every
+event).  It is a tagged variable-length block stream with a custom
+delta + RLE codec.

-This module replaces the int16-LE assumption that produced full-scale ±32K
-noise on every event. The body is NOT raw int16 LE: it is a sequence of
-tagged variable-length blocks. The block framing is solved here. The
-mapping from block bytes to ADC samples is **NOT yet pinned down** — the
-work-in-progress decoder ``decode_waveform_v2`` returns ``None`` until
-a verified algorithm is wired in.
+Current status:

-Until ``decode_waveform_v2`` returns a verified result, callers that need
-sample data should keep relying on the legacy decoder in ``client.py``
-(known-broken, but at least stable in shape) and not consume this
-module's sample output.
+- Block framing: ✅ solved (block types and lengths all confirmed)
+- Tran channel, segment 0: ✅ solved (decode_tran_initial returns
+  byte-exact values vs BW's ASCII export across all 5 May 11 fixture
+  events; up to ~510 samples per event, fewer in loud-from-start events)
+- Multi-segment Tran continuation: ❌ open (every hypothesis breaks
+  at the segment-1 boundary around sample 512)
+- Vert / Long / Mic channel decoders: ❌ open
+- 30 NN block content: ❌ open (only appears in loud-from-start events)
+
+Production code in client.py still uses the broken int16 LE decoder.
+``decode_waveform_v2`` here returns ``None`` as a placeholder.  Callers
+that need sample arrays should treat the legacy decoder's output as
+"unverified" — the BW binary write path is the only sample-bearing
+output that is currently trustworthy.

 ────────────────────────────────────────────────────────────────────────────
-Body structure (CONFIRMED 2026-05-08 against decode-re/5-8-26 4-event bundle)
+Body layout (CONFIRMED 2026-05-11 against 8 fixture events)
 ────────────────────────────────────────────────────────────────────────────

-The Blastware waveform-file body lives between bytes [22+21=43] and the
-26-byte file footer (``[: -26]``).  Layout:
+    [7-byte preamble] [stream of tagged blocks] [trailer]

-    [preamble: 7 or 9 bytes]
-    [data section: a stream of tagged blocks]
-    [trailer: per-channel summary blocks]
+The preamble is always exactly 7 bytes:

-The preamble starts with the magic ``00 02 00 00``.  After that there is
-either 3 or 5 bytes of header before the first ``10 NN`` block tag — in
-the 4-event bundle, single-shot events have a 7-byte preamble and
-continuous events have 9.  The exact meaning of bytes [4:9] is open
-(empirically: byte [4] for event-a == truth Tran[0]; byte [4] for
-event-b == truth Tran[0]; events c/d = 0; treating it as a per-channel
-"initial value" partially matches but is inconsistent across events).
+    body[0:3]  = 00 02 00              magic
+    body[3:5]  = Tran[0]   int16 BE    in 16-count units (LSB = 0.005 in/s)
+    body[5:7]  = Tran[1]   int16 BE    in 16-count units

-Blocks have 2-byte tags and these confirmed lengths:
+(Earlier drafts of this module described a "7-or-9-byte preamble";
+that was wrong — single-shot and continuous events both use 7 bytes.
+The "extra 2 bytes" on continuous events were the first ``00 NN`` RLE
+marker, not part of the preamble.)

-| Tag (hex) | Block type                           | Total length    |
-|-----------|--------------------------------------|-----------------|
-| ``10 NN`` | Small-delta data block               | NN/2 + 2 bytes  |
-| ``20 NN`` | Literal data block (looks int8-ish)  | NN + 2 bytes    |
-| ``00 NN`` | 2-byte marker between data blocks    | 2 bytes         |
-| ``30 NN`` | Trailer summary block                | NN × 4 bytes    |
-| ``40 02`` | Segment header                       | 20 bytes        |
+Block types and lengths (all confirmed):

-In the 4-event bundle, every event's body parses as a clean sequence of
-these blocks all the way through the trailer (when the walker is given
-the right preamble length).  No "??" stops occur once the start offset
-is correct.
+| Tag      | Length                | Meaning                                |
+|----------|-----------------------|----------------------------------------|
+| ``10 NN``| NN/2 + 2 bytes        | 4-bit signed nibble deltas, two per    |
+|          |                       | byte, high nibble first                |
+|          |                       | (0..7 = +0..+7, 8..F = -8..-1)         |
+| ``20 NN``| NN + 2 bytes          | int8 signed deltas (1 per byte)        |
+| ``00 NN``| 2 bytes               | RLE: append NN copies of current value |
+| ``30 NN``| NN*2 in data, NN*4    | Unknown content.  Only in loud events. |
+|          | in trailer            |                                        |
+| ``40 02``| 20 bytes (fixed)      | Segment header                         |

-Segments and the ``40 02`` header
-────────────────────────────────────
+NN is always a multiple of 4.

-The body is divided into ~16 SEGMENTS, each separated by a ``40 02``
-header.  Each segment carries ~80 sample-sets (1280-sample event = 16
-segments × 80 sample-sets, 3328-sample event = ~42 segments).  The 18-byte
-``40 02`` payload contains:
+────────────────────────────────────────────────────────────────────────────
+Tran channel, segment 0 (CONFIRMED 2026-05-11)
+────────────────────────────────────────────────────────────────────────────

-    bytes  0..3   4-byte channel anchor / state (varies per segment)
-    bytes  4..7   4-byte field, varies (RMS/peak per channel?)
-    bytes  8..11  4-byte uint32 LE counter (increments by 1 per segment;
-                  starts at e.g. 0x47 for the first in-data segment)
-    bytes 12..15  4-byte fixed pattern: 02 00 00 01
-    bytes 16..17  2-byte segment-relative payload counter
+Segment 0 — everything before the first ``40 02`` segment header — encodes
+Tran samples only.  Starting from preamble anchors Tran[0] and Tran[1],
+each subsequent block contributes to the running Tran value:

-The counter at bytes [8..11] increments cleanly across segments — useful
-as a sanity check.  The role of bytes [0..3] (anchor candidates) and
-[4..7] is not pinned down: simple "channel state at segment boundary"
-hypotheses do NOT match truth across all four sample bundles tested.
+    10 NN  →  append NN deltas (4-bit signed nibbles)
+    20 NN  →  append NN deltas (int8 signed bytes)
+    00 NN  →  append NN copies of the current value (RLE zeros)
+    40 02  →  segment 0 ends; multi-segment continuation is open

-What's open
-────────────
+This decodes the first 42–510 samples of Tran for each event with zero
+errors against BW's ASCII export.  The exact segment-0 sample count
+varies per event (it's bounded by a fixed device-flash byte budget, not
+a fixed sample count — quiet events fit more samples because zero
+deltas pack into ``00 NN`` markers compactly).

-The mapping ``block bytes → ADC samples`` is the open question.  Tested
-hypotheses that did **not** match BW's ASCII export to within the
-required ±1 ADC count:
+Implementation: :func:`decode_tran_initial`.

-1. ``10 NN`` data = 4-bit signed nibble deltas, channel-interleaved
-   (TVLM/VTLM/LMTV/all 24 permutations × 2 nibble orders × 2 sign
-   conventions = 96 combinations tested).  All produce values that
-   diverge from truth after the first ~7 sample-sets.
+────────────────────────────────────────────────────────────────────────────
+Segment header (40 02, 20 bytes total)
+────────────────────────────────────────────────────────────────────────────

-2. ``20 NN`` data = int8 absolute samples for one channel.  Magnitudes
-   in observed blocks (peak ~±34 in the smoothest event-c block at
-   offset 351) do not match any channel's PPV at any plausible
-   ADC-count quantization (1-count, 4-count, 8-count, 16-count).
+The 18-byte payload of the ``40 02`` block:

-3. ``00 NN`` marker = "skip N sample-sets".  Sums of NN/4 across markers
-   do not match 80 sample-sets per segment.
+| Offset    | Field                                       | Status      |
+|-----------|---------------------------------------------|-------------|
+| [0:2]     | T_delta at first sample of new segment      | ✅ confirmed|
+|           | (int16 BE, in 16-count units)               |             |
+| [2:4]     | Likely T_delta at sample seg_start+1        | 🟡 likely   |
+| [4:6]     | Unknown (varies; possibly checksum)         | ❓ open     |
+| [6:8]     | Byte length to next segment header − 2      | ✅ confirmed|
+|           | (uint16 BE; useful for walker pre-scan)     |             |
+| [8:12]    | Monotonic uint32 LE counter                 | ✅ confirmed|
+|           | (starts ~0x47, increments by 1 per segment) |             |
+| [12:14]   | Constant ``02 00``                          | ✅ confirmed|
+| [14:18]   | Unknown 4-byte field                        | ❓ open     |

-4. Concatenating ALL ``10 NN`` payload bytes and reading as a continuous
-   nibble stream (TVLM round-robin) produces the same 96-combination
-   problem as (1).
+────────────────────────────────────────────────────────────────────────────
+What breaks the multi-segment decoder (the main open question)
+────────────────────────────────────────────────────────────────────────────

-The most promising lead — that ``20 NN`` blocks carry literal int8
-sample-sequences for the largest-amplitude channel within a segment —
-is consistent with the smooth waveform shape of those payloads, but
-the magnitude scaling has not been pinned down.  It's possible that
-``10 NN`` and ``20 NN`` blocks carry different bit-widths of the same
-channel-interleaved delta stream (variable-width like Rice coding)
-with 4-bit deltas as default and 8-bit deltas as escape.
+After segment 0 ends and the segment header T_delta is consumed,
+applying segment 1's blocks as Tran continuation produces values that
+diverge from truth by sample ~512.  The block structure inside segment
+1 is IDENTICAL to segment 0 (same alternating 10 NN / 00 NN pattern),
+and the delta budget matches the segment size exactly (V70 segment 1
+has 264 nibble-deltas + 244 RLE zeros = 508 = the segment's sample
+count).  But the cumulative is wrong.

-Potential next steps for whoever picks this up:
+The strongest unverified hypothesis is that segments rotate channels:

- Capture an event with a KNOWN external waveform (e.g. a calibration
-  signal of known frequency/amplitude) so the truth is unambiguous and
-  the magnitude scaling is unambiguous.
- Capture multiple events with the SAME signal but DIFFERENT geo_range
-  (Normal 10 in/s vs Sensitive 1.25 in/s) to disambiguate scaling.
- Examine sequential 0x10 segment headers for a single event — the
-  4-byte "anchor" should reflect cumulative sample state at the
-  boundary; matching it to truth at that sample index would unlock
-  the per-segment delta decode.
+    segment 0  →  Tran samples 0..509
+    segment 1  →  Vert samples 0..507
+    segment 2  →  Long samples 0..507
+    segment 3  →  Mic  samples 0..507
+    segment 4  →  Tran samples 510..N (continuation)
+    ...
+
+This is consistent with segment 1's block deltas summing to near zero in
+V70 (whose geophone channels, Vert included, stay near zero) and with
+the per-segment delta budget matching the segment size for a single
+channel.  It is NOT yet verified because the per-segment channel anchor
+isn't pinned down in the segment header: bytes [4:6] and [14:18] are
+still open and are the prime candidates for the V/L/M anchors.
+
+See ``docs/waveform_codec_re_status.md`` for the current working notes
+and the suggested next experiment ("segment-channel scoring analyzer").
 """

 from __future__ import annotations