docs: clean up waveform-codec doc layers per review

Three "truth layers" had drifted apart between commits.  Fixed:

1. waveform_codec.py docstring rewritten from the 2026-05-08
   "structural framing only" state to the 2026-05-11 "Tran segment 0
   solved + segment-header partially decoded" state.  Killed stale
   "~80 sample-sets per segment" language (real segments are
   flash-page-byte-sized, not sample-count-sized; observed
   first-segment sizes are 42-510 samples depending on signal).
   Killed stale "preamble is 7 or 9 bytes" language (always 7).

2. docs/instantel_protocol_reference.md §7.6.1: added a clear
   "CURRENT STATUS" box at the top with a status table.  Replaced the
   stale "~80 sample-sets" line with the verified per-event segment
   sizes.  Merged two redundant segment-header field-table sections.

3. docs/waveform_codec_re_status.md (NEW): clean working-status doc.
   Solved / not solved / hypothesis / next experiment / fixtures /
   tests.  The protocol reference remains the historical Rosetta
   Stone; this new file is the current-truth working note that
   shouldn't accumulate fossil layers.

4. CLAUDE.md §"Waveform body codec": prominent warning box at top:
   "DO NOT TRUST decoded sample arrays yet."  BW binary passthrough is
   the only sample-bearing output to trust until the decoder lands.
   Added a "Next experiment" subsection pointing the next pass at the
   segment-channel scoring analyzer.

40 tests still pass.
2026-05-12 02:43:25 +00:00
parent 5bf5329369
commit f68ee9f0f9
4 changed files with 385 additions and 139 deletions
@@ -61,10 +61,24 @@ Full read pipeline + write pipeline + erase pipeline + monitor log + call home c
 ## Waveform body codec — PARTIAL (2026-05-11)
 > ### ⛔️ DO NOT TRUST decoded sample arrays yet
 >
 > `client.py:_decode_a5_waveform` still uses the broken legacy int16 LE
 > decoder.  The `.h5` sidecars SFM writes contain WRONG sample values
 > for every event.  Treat decoded sample arrays as "unverified" in all
 > downstream consumers.
 >
 > The **BW binary write path** (`blastware_file.py`) is unaffected —
 > it's pure passthrough of device flash bytes and remains byte-perfect.
 > Use the `.bw` binary as the authoritative waveform output until the
 > codec is fully decoded.
 >
 > Clean working-status doc: `docs/waveform_codec_re_status.md`.
 > Full archaeological record: `docs/instantel_protocol_reference.md §7.6.1`.
 The **per-byte decoding** of the Blastware waveform-file body (between the
 21-byte STRT record and the 26-byte footer) was historically claimed to be
-"raw int16 LE, 8 bytes per sample-set."  That was wrong — see the
+"raw int16 LE, 8 bytes per sample-set."  That was wrong.  The body
 retraction in `docs/instantel_protocol_reference.md §7.6.1`.  The body
 is actually a tagged-block stream with a custom delta+RLE codec.
 ### What's solved (2026-05-11)
@@ -96,13 +110,26 @@ is actually a tagged-block stream with a custom delta+RLE codec.
  (SS0, SV0) and breaks the simple Tran walk there.  Probably a channel-
  switch or alternative-encoding marker for high-amplitude regions.
 ### Next experiment
 **Don't hero-code the full decoder.**  Build a small analysis tool — a
 segment-channel scoring analyzer.  For each segment of each fixture
 event, run the segment-0 Tran block-walk + RLE decode and score the
 cumulative trajectory against the BW ASCII truth for each of {Tran,
 Vert, Long, MicL} over that segment's sample range, trying different
 anchor-bytes candidates from the segment header.  The winning
 (channel, anchor-location) combination for each segment reveals
 whether segments rotate channels and which header bytes encode the
 per-segment channel anchors.
 See `docs/waveform_codec_re_status.md` for the full specification of
 the next experiment.
 ### Production-code status
 `client.py:_decode_a5_waveform` still uses the old (broken) int16 LE
-decoder.  Until the multi-channel decoder lands, the `.h5` sidecars
+decoder (see warning at the top of this section).  `decode_waveform_v2()`
-produced by SFM contain WRONG samples — keep treating them as
+in `minimateplus/waveform_codec.py` returns `None` as a placeholder.
 "unverified" downstream.  `decode_waveform_v2()` returns `None` as a
 placeholder.
 ### Test fixtures
@@ -860,20 +860,39 @@ MicL:  39 64 1D AA  =  0.0000875 psi
 ---
-#### 7.6.1 Blast / Waveform mode — 🟡 STRUCTURAL FRAMING + TRAN CODEC DECODED (2026-05-11)
+#### 7.6.1 Blast / Waveform mode — 🟡 PARTIAL DECODE (2026-05-11)
-> **Status (2026-05-11):** Block-level framing is solved.  The Tran-channel
+> ### 📌 CURRENT STATUS — read this first
 > encoding (preamble + first data block) is **fully verified** against the
 > 3-event May 11 2026 high-amplitude bundle (PPV 6-7 in/s) and the 4-event
 > May 8 bundle.  Vert / Long / MicL channel encodings and multi-block
 > Tran continuation are **still open**.  The previous int16 LE claim
 > remains REFUTED (see history below).
 >
-> The earlier "4-channel interleaved s16 LE, 8 bytes per sample-set"
+> The body codec is **partially decoded** as of 2026-05-11.  This
-> claim was never validated and was wrong.  No event in the project's
+> section contains both current-truth spec AND historical retractions;
-> archive ever came close to ADC saturation, yet the int16 LE decoder
+> when in doubt, the working summary lives at
-> consistently produced full-scale ±32K noise — that was the signature
+> `docs/waveform_codec_re_status.md`.
-> of mis-aligned encoded data, not signal saturation.
+>
 > | Item | Status |
 > |---|---|
 > | Body has tagged variable-length blocks, NOT raw int16 LE | ✅ confirmed |
 > | 5 block tag types (10/20/00/30/40 NN) with lengths | ✅ confirmed |
 > | 7-byte preamble: `00 02 00` + Tran[0] + Tran[1] int16 BE | ✅ confirmed |
 > | `00 NN` = RLE for zero deltas in the current channel | ✅ confirmed |
 > | Tran channel, segment 0 (~482-510 samples / event) | ✅ byte-exact, 5/5 events |
 > | Multi-segment Tran continuation | ❌ open (breaks at sample ~512) |
 > | Vert / Long / MicL channel decoders | ❌ open |
 > | `30 NN` block content (loud-from-start events) | ❌ open |
 > | Earlier "raw int16 LE, 8 bytes per sample-set" claim | ❌ REFUTED |
 >
 > **Production code in `client.py:_decode_a5_waveform` still uses the
 > broken int16 LE decoder.**  The `.h5` sidecars SFM produces contain
 > wrong sample values and must be treated as "unverified" downstream.
 > The BW binary write path is unaffected (it's pure passthrough of the
 > device's flash bytes, no decoding) and remains byte-perfect.
 The "4-channel interleaved s16 LE, 8 bytes per sample-set" claim that
 appeared in earlier revisions of this section was never validated and
 was wrong.  No event in the project's archive ever came close to ADC
 saturation, yet the int16 LE decoder consistently produced full-scale
 ±32K noise — that was the signature of mis-aligned encoded data, not
 signal saturation.
 ##### Body file layout
@@ -932,23 +951,38 @@ followed by a ``00 NN`` marker before the next data block.
 ##### Segments
-The body is divided into ~16 SEGMENTS for a 1280-sample event (= 1
+The body is divided into segments separated by ``40 02`` segment headers.
-segment per ~80 sample-sets), separated by ``40 02`` segment headers.
+**Segment size is variable** — bounded by a fixed device-flash byte
-A 3328-sample event has ~42 segments.
+budget, not a fixed sample count.  Quiet events fit more samples per
 segment (RLE compacts zero deltas via ``00 NN`` markers); loud events
 fit fewer.  Observed first-segment sizes in the bundled fixtures:
-The 18-byte ``40 02`` payload structure (CONFIRMED across all 4
+| Event | Segment 0 size (Tran samples) |
-fixtures by inspecting the increment of bytes [8:12]):
+|---|---|
 | SP0 (loud, 0.25s pretrig) | 510 |
 | SV0 (loud-from-start) | 58 (stops at first ``30 NN``) |
 | SS0 (loud-from-start) | 42 (stops at first ``30 04``) |
 | JQ0 (Vert-heavy, quiet Tran) | 510 |
 | V70 (Mic-heavy, quiet geos) | 510 |
-| Offset | Length | Field                                            |
+⚠️ Earlier drafts of this section claimed "~80 sample-sets per segment"
-|--------|--------|--------------------------------------------------|
+based on incomplete walks; that figure is wrong.  Segments are
-| 0      | 4      | Anchor / channel state (open — see below)        |
+flash-page-sized in bytes, not sample-count-sized.
 | 4      | 4      | Variable field (open)                            |
 | 8      | 4      | uint32 LE counter — increments by 1 per segment  |
 | 12     | 4      | Fixed pattern ``02 00 00 01``                    |
 | 16     | 2      | Variable tail                                    |
-The counter at bytes [8:12] starts in the 0x40s for a freshly-erased
+The 18-byte ``40 02`` payload structure:
-device and increments cleanly — useful as a structural sanity check.
+
 | Offset    | Field                                       | Status      |
 |-----------|---------------------------------------------|-------------|
 | [0:2]     | T_delta at first sample of new segment      | ✅ confirmed|
 |           | (int16 BE, in 16-count units)               |             |
 | [2:4]     | Likely T_delta at sample seg_start+1        | 🟡 likely   |
 | [4:6]     | Unknown (varies; possibly a checksum)       | ❓ open     |
 | [6:8]     | Byte length to next segment header − 2      | ✅ confirmed|
 |           | (uint16 BE; useful for walker pre-scan)     |             |
 | [8:12]    | Monotonic uint32 LE counter                 | ✅ confirmed|
 |           | (starts ~0x47, increments by 1 per segment) |             |
 | [12:14]   | Constant ``02 00``                          | ✅ confirmed|
 | [14:18]   | Unknown 4-byte field                        | ❓ open     |
 Examples from event-c (1 sec single-shot):
@@ -1008,26 +1042,25 @@ where the codec is most complex stop at the first ``30 04``.
 Implementation: :func:`minimateplus.waveform_codec.decode_tran_initial`.
-##### Segment header T-delta (PARTIAL 2026-05-11)
+##### Multi-segment Tran continuation — OPEN
-The 20-byte ``40 02`` segment header has its first 2 bytes ([0:2] of
+After segment 0 ends and the segment header's T_delta (bytes [0:2])
-payload) as an int16 BE Tran delta for the first sample of the new
+is applied, the next segment's blocks produce values that diverge from
-segment.  Verified across V70 (3 segments with 0 deltas) and SP0/JQ0
+truth by sample ~512.  The block structure inside segment 1 is
-(1 segment with +1 delta).  Other bytes of the segment header payload
+identical to segment 0 (alternating ``10 NN`` / ``20 NN`` data +
-are partially understood:
+``00 NN`` RLE), and the per-segment delta budget exactly matches the
 segment size — V70 segment 1 has 264 nibble-deltas + 244 RLE-zeros =
 508 = the segment's sample count.  Cumulative deltas are correct in
 aggregate (V70 net-zero ≈ truth net-zero) but the per-sample trajectory
 is wrong when applied as Tran continuation.
-| Payload offset | Field | Status |
+The strongest unverified hypothesis is that **segments rotate
-|---|---|---|
+channels**: segment 0 = Tran, segment 1 = Vert, segment 2 = Long,
-| [0:2] | T_delta at first sample of new segment (int16 BE) | ✅ confirmed |
+segment 3 = Mic, segment 4 = Tran continuation, …  This would explain
-| [2:4] | unknown (often 0; not a simple V or T delta) | ❓ open |
+the per-segment delta-budget match while also explaining why segment
-| [4:6] | unknown (varies per event; possibly a checksum) | ❓ open |
+1 isn't Tran continuation.  Verification needs the per-channel anchor
-| [6:8] | byte length to next segment header − 2 (uint16 BE) | ✅ confirmed |
+to come from segment-header bytes [4:6] or [14:18], which are still
-| [8:12] | monotonic uint32 LE counter | ✅ confirmed |
+open.
 | [12:14] | constant ``02 00`` | ✅ confirmed |
 | [14:18] | unknown 4-byte field | ❓ open |
 Multi-segment Tran decoding diverges after sample ~512 — the per-segment
 channel ordering after the header is still unknown.
 ##### What's still open
@@ -0,0 +1,172 @@
 # Waveform body codec — current working status (2026-05-11)
 This is the **clean working note** for the body-codec reverse-engineering
 effort.  It supersedes scattered claims elsewhere when they conflict.
 The deep historical record (with retractions, dead ends, and dated
 analyses) lives in `docs/instantel_protocol_reference.md §7.6.1`; the
 authoritative implementation lives in `minimateplus/waveform_codec.py`.
 ## TL;DR
 The Blastware waveform-file body is a **tagged variable-length block
 stream**, NOT raw int16 LE samples.  Block framing is solved.  Tran
 channel segment-0 decoding is solved (byte-exact vs BW's ASCII export
 across all 5 high-amplitude fixture events).  Multi-segment continuation
 and the Vert / Long / MicL channel decoders are still open.
 **Production code in `minimateplus/client.py:_decode_a5_waveform` still
 uses the broken legacy int16 LE decoder.**  Sample arrays it writes to
 the `.h5` sidecars are wrong and must be treated as "unverified" by all
 downstream consumers.  The BW binary write path (`blastware_file.py`)
 is unaffected — it's pure passthrough and remains byte-perfect.
 ## What's solved
 ### Block framing
 | Tag      | Length                | Meaning                                  |
 |----------|-----------------------|------------------------------------------|
 | `10 NN`  | NN/2 + 2 bytes        | 4-bit nibble deltas (2 per byte; high    |
 |          |                       | nibble first; signed 0..7 / 8..F = -8..-1)|
 | `20 NN`  | NN + 2 bytes          | int8 signed deltas (1 per byte)          |
 | `00 NN`  | 2 bytes               | RLE: append NN copies of current value   |
 | `30 NN`  | NN*2 in data section, | Unknown content.  Only in loud-from-     |
 |          | NN*4 in trailer       | start events.                            |
 | `40 02`  | 20 bytes (fixed)      | Segment header                           |
 NN is always a multiple of 4.
 Implementation: `walk_body()` in `minimateplus/waveform_codec.py`.
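
As an illustration of the two delta payload formats in the table above, here is a minimal sketch of unpacking `10 NN` and `20 NN` payload bytes into signed deltas.  The helper names `nibble_deltas` and `int8_deltas` are hypothetical and are not taken from `waveform_codec.py`; the sketch only covers the payload bytes after the 2-byte tag.

```python
def nibble_deltas(payload: bytes) -> list[int]:
    """Unpack 4-bit signed deltas, two per byte, high nibble first.

    Nibbles 0..7 map to 0..7 and 8..F map to -8..-1, as in the table.
    (Hypothetical helper, for illustration only.)
    """
    out: list[int] = []
    for byte in payload:
        for nib in (byte >> 4, byte & 0x0F):
            out.append(nib - 16 if nib >= 8 else nib)
    return out


def int8_deltas(payload: bytes) -> list[int]:
    """Unpack one signed 8-bit delta per payload byte."""
    return [b - 256 if b >= 128 else b for b in payload]
```

For example, the payload bytes `1F 80` unpack to the nibble deltas 1, -1, -8, 0.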
 ### 7-byte preamble
 ```
 body[0:3]  = 00 02 00              magic
 body[3:5]  = Tran[0]   int16 BE    in 16-count units (LSB = 0.005 in/s)
 body[5:7]  = Tran[1]   int16 BE    in 16-count units
 ```
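
A minimal sketch of reading the preamble layout above.  The function name `parse_preamble` is hypothetical; the returned values are left in the raw 16-count units, with no conversion to in/s attempted.

```python
import struct

PREAMBLE_MAGIC = b"\x00\x02\x00"


def parse_preamble(body: bytes) -> tuple[int, int]:
    """Return (Tran[0], Tran[1]) from the 7-byte preamble, in raw units.

    Hypothetical helper: checks the 3-byte magic, then reads two
    big-endian signed 16-bit values from body[3:7].
    """
    if len(body) < 7 or body[0:3] != PREAMBLE_MAGIC:
        raise ValueError("body does not start with the 00 02 00 preamble")
    tran0, tran1 = struct.unpack(">hh", body[3:7])
    return tran0, tran1
```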
 ### Tran channel, segment 0
 Segment 0 (everything before the first `40 02`) encodes Tran samples
 only.  Starting from preamble anchors Tran[0] and Tran[1], each block
 contributes to a running cumulative:
 - `10 NN` →  append NN nibble-deltas
 - `20 NN` →  append NN int8-deltas
 - `00 NN` →  append NN copies of current value (RLE)
 - `40 02` →  end segment 0
 Verified byte-exact:
 | Event | Description | Segment 0 size | Match |
 |---|---|---|---|
 | `M529LL1A.SP0` | Loud, 0.25 s pretrig | 510 | 510/510 ✓ |
 | `M529LL1A.SV0` | Loud from sample 0 | 58 | 58/58 ✓ (stops at first `30 NN`) |
 | `M529LL1A.SS0` | Loud from sample 0 | 42 | 42/42 ✓ (stops at first `30 04`) |
 | `M529LL1L.JQ0` | Vert-heavy | 510 | 510/510 ✓ |
 | `M529LL1L.V70` | Mic-heavy (140 dB) | 510 | 510/510 ✓ |
 Implementation: `decode_tran_initial()`.
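
The block rules above can be sketched as a single loop.  This is not the code of `decode_tran_initial()`; it is a simplified illustration under stated assumptions: the two preamble anchors are the first two output samples, every delta is added to the previous sample, and the walk stops at the first tag that is not `10`, `20` or `00` (which covers both `40 02` and the undecoded `30 NN`).  The name `decode_segment0_sketch` is hypothetical.

```python
def decode_segment0_sketch(body: bytes) -> list[int]:
    """Walk segment 0 and return Tran samples in raw 16-count units.

    Illustrative only; assumes body starts at the 7-byte preamble.
    """
    # Preamble anchors: Tran[0] and Tran[1], int16 big-endian.
    samples = [
        int.from_bytes(body[3:5], "big", signed=True),
        int.from_bytes(body[5:7], "big", signed=True),
    ]
    pos = 7
    while pos + 2 <= len(body):
        tag, nn = body[pos], body[pos + 1]
        if tag == 0x10:
            # NN nibble deltas packed into NN/2 bytes, high nibble first.
            size = nn // 2
            deltas = []
            for byte in body[pos + 2 : pos + 2 + size]:
                for nib in (byte >> 4, byte & 0x0F):
                    deltas.append(nib - 16 if nib >= 8 else nib)
        elif tag == 0x20:
            # NN signed 8-bit deltas, one per byte.
            size = nn
            deltas = [b - 256 if b >= 128 else b
                      for b in body[pos + 2 : pos + 2 + size]]
        elif tag == 0x00:
            # RLE: NN copies of the current value, i.e. NN zero deltas.
            size = 0
            deltas = [0] * nn
        else:
            # 40 02 (segment header), 30 NN, or anything unexpected.
            break
        for d in deltas:
            samples.append(samples[-1] + d)
        pos += 2 + size
    return samples
```

Under these assumptions, a 510-sample segment 0 is two anchors plus 508 deltas, which lines up with the 508-delta budget reported for later segments.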
 ### Segment header (`40 02`, 20 bytes total)
 | Payload offset | Field | Status |
 |---|---|---|
 | [0:2] | T_delta at first sample of new segment (int16 BE) | ✅ confirmed |
 | [2:4] | Likely T_delta at sample seg_start+1 | 🟡 likely |
 | [4:6] | Unknown (possibly checksum) | ❓ open |
 | [6:8] | Byte length to next segment header − 2 (uint16 BE) | ✅ confirmed |
 | [8:12] | Monotonic uint32 LE counter (starts ~0x47) | ✅ confirmed |
 | [12:14] | Constant `02 00` | ✅ confirmed |
 | [14:18] | Unknown 4-byte field | ❓ open |
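
A minimal sketch of splitting a 20-byte `40 02` block into the fields in the table above.  The `SegmentHeader` class and `parse_segment_header` function are hypothetical names; the open fields are kept as raw bytes rather than interpreted, and the [6:8] value is returned as stored.

```python
from dataclasses import dataclass


@dataclass
class SegmentHeader:
    t_delta: int          # payload [0:2], int16 BE (confirmed)
    field_2_4: bytes      # payload [2:4], likely a second T_delta
    field_4_6: bytes      # payload [4:6], open
    length_field: int     # payload [6:8], uint16 BE, as stored
    counter: int          # payload [8:12], uint32 LE
    const_ok: bool        # payload [12:14] == 02 00
    field_14_18: bytes    # payload [14:18], open


def parse_segment_header(block: bytes) -> SegmentHeader:
    """Parse one 20-byte block: 2-byte tag 40 02 plus 18-byte payload."""
    if len(block) != 20 or block[0:2] != b"\x40\x02":
        raise ValueError("not a 20-byte 40 02 segment header")
    p = block[2:]
    return SegmentHeader(
        t_delta=int.from_bytes(p[0:2], "big", signed=True),
        field_2_4=p[2:4],
        field_4_6=p[4:6],
        length_field=int.from_bytes(p[6:8], "big"),
        counter=int.from_bytes(p[8:12], "little"),
        const_ok=p[12:14] == b"\x02\x00",
        field_14_18=p[14:18],
    )
```

Checking that `counter` increases by 1 from one header to the next, and that `const_ok` holds, gives a cheap structural sanity check while walking a body.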
 ## What's still open
 1. **Multi-segment Tran continuation.**  After segment 0, applying
   segment 1's blocks as Tran continuation diverges from truth by
   sample ~512.  Block structure is identical to segment 0 and the
   per-segment delta budget matches the segment size — but the per-
   sample trajectory is wrong.
 2. **Vert / Long / MicL channel decoders.**  No verified decoder for
   any non-Tran channel.
 3. **`30 NN` block content.**  Only appears in loud-from-start events.
   Probably a channel-switch or alternative-encoding marker for high-
   amplitude regions.  Walker steps over it without decoding.
 ## Strongest unverified hypothesis
 Segments rotate channels:
 ```
 segment 0  →  Tran samples 0..509
 segment 1  →  Vert samples 0..507
 segment 2  →  Long samples 0..507
 segment 3  →  Mic  samples 0..507
 segment 4  →  Tran samples 510..N (continuation)
 ...
 ```
 This would explain:
 - Why segment-0 = Tran works perfectly.
 - Why segment 1 has the same block structure but applying it as Tran
  continuation gives wrong values.
 - Why the per-segment delta budget matches the segment size for a
  *single* channel (508 deltas per segment, not 4 × 508).
 Not yet verified because the per-channel anchor at segment-start isn't
 identified in the segment header.  Bytes [4:6] and [14:18] of the
 header are the prime candidates.
 ## Next experiment — segment-channel scoring analyzer
 Don't try to hero-code the full decoder.  Instead, build a small
 analysis tool that:
 1. For each segment in every fixture event, runs the segment-0 Tran
   decoder (block-walk + RLE) and produces a cumulative trajectory
   of 508 deltas.
 2. Scores that trajectory against the BW ASCII truth for *each* of
   {Tran, Vert, Long, MicL} over the segment's sample range, starting
   from different anchor-byte candidates from the segment header.
 3. Reports which (channel, anchor-bytes-location) combination produces
   the lowest error for each segment.
 If the rotation hypothesis is right, segment 0 should clearly score
 best against Tran, segment 1 against Vert, etc.  The winning
 anchor-bytes-location will reveal which segment-header bytes encode
 the per-segment channel anchors.
 If the rotation hypothesis is *not* right, the scorer will at least
 narrow down what segment 1 actually carries.
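
The scoring step could be sketched roughly as follows.  Everything here is an assumption made for illustration: the function name `score_segment`, the use of summed absolute error as the score, the convention that the anchor is the value before the segment's first delta, and the requirement that the ground-truth arrays have already been converted to the same integer units as the decoded deltas.

```python
def score_segment(
    deltas: list[int],
    truth_by_channel: dict[str, list[int]],
    start: int,
    anchors: dict[str, int],
) -> list[tuple[int, str, str]]:
    """Rank (channel, anchor) pairs by summed absolute error.

    deltas           -- one segment's decoded deltas (block-walk + RLE)
    truth_by_channel -- ground-truth samples per channel, same units
    start            -- index of the segment's first sample in truth
    anchors          -- candidate starting values keyed by a label,
                        e.g. the header bytes they were read from
    Returns (error, channel, anchor_label) tuples, lowest error first.
    """
    results = []
    for channel, truth in truth_by_channel.items():
        window = truth[start : start + len(deltas)]
        for label, anchor in anchors.items():
            value = anchor
            error = 0
            for delta, expected in zip(deltas, window):
                value += delta
                error += abs(value - expected)
            results.append((error, channel, label))
    return sorted(results)
```

If one (channel, anchor) pair scores far below the rest for a given segment, that pair is the candidate worth locking in with a fixture test.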
 ## Test fixtures
 Committed under `tests/fixtures/`:
 - `decode-re-5-8-26/event-a..event-d/`: original quiet bundle (4 events,
  PPV < 1 in/s).  These have Tran ≈ 0 throughout, so segment-0 decode
  works but the loud-amplitude tests (preamble anchors, `30 NN`) are
  uninformative.
 - `5-11-26/M529LL1A.{SP0,SS0,SV0}`: loud bundle (PPV 6-7 in/s on all
  channels).  These cracked the Tran codec.
 - `5-11-26/M529LL1L.{JQ0,V70}`: targeted captures.  JQ0 is Vert-heavy,
  V70 is Mic-heavy (140 dB).  These cracked the `00 NN` RLE rule.
 Each fixture has a `.TXT` Blastware ASCII export as ground truth.
 ## Tests
 `tests/test_waveform_codec.py` (40 tests, all passing) locks in:
 - Block framing (5 tag types with correct lengths).
 - Walker contiguity (no gaps or overlaps).
 - Segment header parsing (counter monotonicity, fixed-pattern check).
 - `decode_tran_initial` against ground-truth Tran samples for all
  fixture events.
 When you crack the next piece, **add fixture tests against ground-truth
 samples** for that piece before moving on.  Don't let unverified code
 ship without a regression lock-in.
@@ -1,119 +1,133 @@
 """
-waveform_codec.py — block-walker for the MiniMate Plus waveform body codec.
+waveform_codec.py — block-walker and partial decoder for the MiniMate Plus
 waveform-file body.
-PARTIAL REVERSE-ENGINEERING — 2026-05-08.
+PARTIAL REVERSE-ENGINEERING — last updated 2026-05-11.
-Status: STRUCTURAL FRAMING confirmed; per-block sample interpretation OPEN.
+The Blastware waveform-file body — the bytes between the 21-byte STRT
 record and the 26-byte file footer — is NOT raw int16 LE samples (the
 historical assumption that produced full-scale ±32K noise on every
 event).  It is a tagged variable-length block stream with a custom
 delta + RLE codec.
-This module replaces the int16-LE assumption that produced full-scale ±32K
+Current status:
 noise on every event. The body is NOT raw int16 LE: it is a sequence of
 tagged variable-length blocks. The block framing is solved here. The
 mapping from block bytes to ADC samples is **NOT yet pinned down** — the
 work-in-progress decoder ``decode_waveform_v2`` returns ``None`` until
 a verified algorithm is wired in.
-Until ``decode_waveform_v2`` returns a verified result, callers that need
+- Block framing: ✅ solved (block types and lengths all confirmed)
-sample data should keep relying on the legacy decoder in ``client.py``
+- Tran channel, segment 0: ✅ solved (decode_tran_initial returns
-(known-broken, but at least stable in shape) and not consume this
+  byte-exact values vs BW's ASCII export, across 5 of 5 loud-bundle
-module's sample output.
+  events; first ~510 samples per event)
 - Multi-segment Tran continuation: ❌ open (every hypothesis breaks
  at the segment-1 boundary around sample 512)
 - Vert / Long / Mic channel decoders: ❌ open
 - 30 NN block content: ❌ open (only appears in loud-from-start events)
 Production code in client.py still uses the broken int16 LE decoder.
 ``decode_waveform_v2`` here returns ``None`` as a placeholder.  Callers
 that need sample arrays should treat the legacy decoder's output as
 "unverified" — the BW binary write path is the only sample-bearing
 output that is currently trustworthy.
 ────────────────────────────────────────────────────────────────────────────
-Body structure (CONFIRMED 2026-05-08 against decode-re/5-8-26 4-event bundle)
+Body layout (CONFIRMED 2026-05-11 against 8 fixture events)
 ────────────────────────────────────────────────────────────────────────────
-The Blastware waveform-file body lives between bytes [22+21=43] and the
+    [7-byte preamble] [stream of tagged blocks] [trailer]
 26-byte file footer (``[: -26]``).  Layout:
-    [preamble: 7 or 9 bytes]
+The preamble is always exactly 7 bytes:
    [data section: a stream of tagged blocks]
    [trailer: per-channel summary blocks]
-The preamble starts with the magic ``00 02 00 00``.  After that there is
+    body[0:3]  = 00 02 00              magic
-either 3 or 5 bytes of header before the first ``10 NN`` block tag — in
+    body[3:5]  = Tran[0]   int16 BE    in 16-count units (LSB = 0.005 in/s)
-the 4-event bundle, single-shot events have a 7-byte preamble and
+    body[5:7]  = Tran[1]   int16 BE    in 16-count units
 continuous events have 9.  The exact meaning of bytes [4:9] is open
 (empirically: byte [4] for event-a == truth Tran[0]; byte [4] for
 event-b == truth Tran[0]; events c/d = 0; treating it as a per-channel
 "initial value" partially matches but is inconsistent across events).
-Blocks have 2-byte tags and these confirmed lengths:
+(Earlier drafts of this module described a "7-or-9-byte preamble";
 that was wrong — single-shot and continuous events both use 7 bytes.
 The "extra 2 bytes" on continuous events were the first ``00 NN`` RLE
 marker, not part of the preamble.)
-| Tag (hex) | Block type                           | Total length    |
+Block types and lengths (all confirmed):
 |-----------|--------------------------------------|-----------------|
 | ``10 NN`` | Small-delta data block               | NN/2 + 2 bytes  |
 | ``20 NN`` | Literal data block (looks int8-ish)  | NN + 2 bytes    |
 | ``00 NN`` | 2-byte marker between data blocks    | 2 bytes         |
 | ``30 NN`` | Trailer summary block                | NN × 4 bytes    |
 | ``40 02`` | Segment header                       | 20 bytes        |
-In the 4-event bundle, every event's body parses as a clean sequence of
+| Tag      | Length                | Meaning                                |
-these blocks all the way through the trailer (when the walker is given
+|----------|-----------------------|----------------------------------------|
-the right preamble length).  No "??" stops occur once the start offset
+| ``10 NN``| NN/2 + 2 bytes        | 4-bit nibble deltas (2 per byte; high  |
-is correct.
+|          |                       | nibble first; signed 0..7 / 8..F = -8..-1)|
 | ``20 NN``| NN + 2 bytes          | int8 signed deltas (1 per byte)        |
 | ``00 NN``| 2 bytes               | RLE: append NN copies of current value |
 | ``30 NN``| NN*2 in data, NN*4    | Unknown content.  Only in loud events. |
 |          | in trailer            |                                        |
 | ``40 02``| 20 bytes (fixed)      | Segment header                         |
-Segments and the ``40 02`` header
+NN is always a multiple of 4.
 ────────────────────────────────────
-The body is divided into ~16 SEGMENTS, each separated by a ``40 02``
+────────────────────────────────────────────────────────────────────────────
-header.  Each segment carries ~80 sample-sets (1280-sample event = 16
+Tran channel, segment 0 (CONFIRMED 2026-05-11)
-segments × 80 sample-sets, 3328-sample event = ~42 segments).  The 18-byte
+────────────────────────────────────────────────────────────────────────────
 ``40 02`` payload contains:
-    bytes  0..3   4-byte channel anchor / state (varies per segment)
+Segment 0 — everything before the first ``40 02`` segment header — encodes
-    bytes  4..7   4-byte field, varies (RMS/peak per channel?)
+Tran samples only.  Starting from preamble anchors Tran[0] and Tran[1],
-    bytes  8..11  4-byte uint32 LE counter (increments by 1 per segment;
+each subsequent block contributes to the running Tran value:
                  starts at e.g. 0x47 for the first in-data segment)
    bytes 12..15  4-byte fixed pattern: 02 00 00 01
    bytes 16..17  2-byte segment-relative payload counter
-The counter at bytes [8..11] increments cleanly across segments — useful
+    10 NN  →  append NN deltas (4-bit signed nibbles)
-as a sanity check.  The role of bytes [0..3] (anchor candidates) and
+    20 NN  →  append NN deltas (int8 signed bytes)
-[4..7] is not pinned down: simple "channel state at segment boundary"
+    00 NN  →  append NN copies of the current value (RLE zeros)
-hypotheses do NOT match truth across all four sample bundles tested.
+    40 02  →  segment 0 ends; multi-segment continuation is open
-What's open
+This decodes the first 482–510 samples of Tran for each event with zero
-────────────
+errors against BW's ASCII export.  The exact segment-0 sample count
 varies per event (it's bounded by a fixed device-flash byte budget, not
 a fixed sample count — quiet events fit more samples because zero
 deltas pack into ``00 NN`` markers compactly).
-The mapping ``block bytes → ADC samples`` is the open question.  Tested
+Implementation: :func:`decode_tran_initial`.
 hypotheses that did **not** match BW's ASCII export to within the
 required ±1 ADC count:
-1. ``10 NN`` data = 4-bit signed nibble deltas, channel-interleaved
+────────────────────────────────────────────────────────────────────────────
-   (TVLM/VTLM/LMTV/all 24 permutations × 2 nibble orders × 2 sign
+Segment header (40 02, 20 bytes total)
-   conventions = 96 combinations tested).  All produce values that
+────────────────────────────────────────────────────────────────────────────
   diverge from truth after the first ~7 sample-sets.
-2. ``20 NN`` data = int8 absolute samples for one channel.  Magnitudes
+The 18-byte payload of the ``40 02`` block:
   in observed blocks (peak ~±34 in the smoothest event-c block at
   offset 351) do not match any channel's PPV at any plausible
   ADC-count quantization (1-count, 4-count, 8-count, 16-count).
-3. ``00 NN`` marker = "skip N sample-sets".  Sums of NN/4 across markers
+| Offset    | Field                                       | Status      |
-   do not match 80 sample-sets per segment.
+|-----------|---------------------------------------------|-------------|
 | [0:2]     | T_delta at first sample of new segment      | ✅ confirmed|
 |           | (int16 BE, in 16-count units)               |             |
 | [2:4]     | Likely T_delta at sample seg_start+1        | 🟡 likely   |
 | [4:6]     | Unknown (varies; possibly checksum)         | ❓ open     |
 | [6:8]     | Byte length to next segment header − 2      | ✅ confirmed|
 |           | (uint16 BE; useful for walker pre-scan)     |             |
 | [8:12]    | Monotonic uint32 LE counter                 | ✅ confirmed|
 |           | (starts ~0x47, increments by 1 per segment) |             |
 | [12:14]   | Constant ``02 00``                          | ✅ confirmed|
 | [14:18]   | Unknown 4-byte field                        | ❓ open     |
-4. Concatenating ALL ``10 NN`` payload bytes and reading as a continuous
+────────────────────────────────────────────────────────────────────────────
-   nibble stream (TVLM round-robin) produces the same 96-combination
+What breaks the multi-segment decoder (the main open question)
-   problem as (1).
+────────────────────────────────────────────────────────────────────────────
-The most promising lead — that ``20 NN`` blocks carry literal int8
+After segment 0 ends and the segment header T_delta is consumed,
-sample-sequences for the largest-amplitude channel within a segment —
+applying segment 1's blocks as Tran continuation produces values that
-is consistent with the smooth waveform shape of those payloads, but
+diverge from truth by sample ~512.  The block structure inside segment
-the magnitude scaling has not been pinned down.  It's possible that
+1 is IDENTICAL to segment 0 (same alternating 10 NN / 00 NN pattern),
-``10 NN`` and ``20 NN`` blocks carry different bit-widths of the same
+and the delta budget matches the segment size exactly (V70 segment 1
-channel-interleaved delta stream (variable-width like Rice coding)
+has 264 nibble-deltas + 244 RLE zeros = 508 = the segment's sample
-with 4-bit deltas as default and 8-bit deltas as escape.
+count).  But the cumulative is wrong.
-Potential next steps for whoever picks this up:
+The strongest unverified hypothesis is that segments rotate channels:
- Capture an event with a KNOWN external waveform (e.g. a calibration
+    segment 0  →  Tran samples 0..509
-  signal of known frequency/amplitude) so the truth is unambiguous and
+    segment 1  →  Vert samples 0..507
-  the magnitude scaling is unambiguous.
+    segment 2  →  Long samples 0..507
- Capture multiple events with the SAME signal but DIFFERENT geo_range
+    segment 3  →  Mic  samples 0..507
-  (Normal 10 in/s vs Sensitive 1.25 in/s) to disambiguate scaling.
+    segment 4  →  Tran samples 510..N (continuation)
- Examine sequential 0x10 segment headers for a single event — the
+    ...
-  4-byte "anchor" should reflect cumulative sample state at the
+
-  boundary; matching it to truth at that sample index would unlock
+This is consistent with the segment-1 block sums net-to-near-zero in
-  the per-segment delta decode.
+V70 (where all 4 channels are near zero) and with the per-segment delta
 budget matching the segment size for a single channel.  It is NOT yet
 verified because the per-segment channel anchor isn't pinned down in
 the segment header — bytes [4:6] and [14:18] of the header are still
 open and probably encode V/L/M anchors.
 See ``docs/waveform_codec_re_status.md`` for the current working notes
 and the suggested next experiment ("segment-channel scoring analyzer").
 """
 from __future__ import annotations