Merge pull request 'merge full s3 codec decoded' (#23) from codec-re into main

Reviewed-on: #23
2026-05-20 13:45:32 -04:00
parent 8f568b809b 0466bb4f44
commit d85df4c886
59 changed files with 20834 additions and 108 deletions
@@ -860,127 +860,264 @@ MicL:  39 64 1D AA  =  0.0000875 psi

 ---

-#### 7.6.1 Blast / Waveform mode — ❌ NOT VERIFIED (retracted 2026-05-08)
+#### 7.6.1 Blast / Waveform mode — 🟡 PARTIAL DECODE (2026-05-11)

-> ## ⚠️ RETRACTION (2026-05-08)
+> ### 📌 CURRENT STATUS — read this first
 >
-> The "4-channel interleaved s16 LE, 8 bytes per sample-set" claim
-> below was **never actually validated**.  It got into this document
-> because the decoder built around that assumption produced full-scale
-> ±32K counts on every channel of the 4-2-26 capture, and the
-> ±32K-shaped output was misread as "the signal must have saturated."
+> The body codec is **partially decoded** as of 2026-05-11.  This
+> section contains both current-truth spec AND historical retractions;
+> when in doubt, the working summary lives at
+> `docs/waveform_codec_re_status.md`.
 >
-> Cross-checking the BW-reported peaks proves the opposite:
+> | Item | Status |
+> |---|---|
+> | Body has tagged variable-length blocks, NOT raw int16 LE | ✅ confirmed |
+> | 5 block tag types (10/20/00/30/40 NN) with lengths | ✅ confirmed |
+> | 7-byte preamble: `00 02 00` + Tran[0] + Tran[1] int16 BE | ✅ confirmed |
+> | `00 NN` = RLE for zero deltas in the current channel | ✅ confirmed |
+> | Tran channel, segment 0 (~482-510 samples / event) | ✅ byte-exact, 5/5 events |
+> | Multi-segment Tran continuation | ❌ open (breaks at sample ~512) |
+> | Vert / Long / MicL channel decoders | ❌ open |
+> | `30 NN` block content (loud-from-start events) | ❌ open |
+> | Earlier "raw int16 LE, 8 bytes per sample-set" claim | ❌ REFUTED |
 >
-> | Channel | BW PPV (in/s) | Expected ADC counts at 10 in/s FS |
-> |---|---|---|
-> | Tran | 0.420 | **1,376** |
-> | Vert | 3.870 | **12,686** |
-> | Long | 0.495 | **1,622** |
->
-> None of these are anywhere near ±32K saturation.  No event in the
-> project's archive (across all captures from 1-2-26 onward) has
-> ever come close to saturation either.  Yet the decoder has
-> consistently produced ±32K-shaped noise on every event.  The right
-> conclusion is that the byte-to-sample interpretation has been wrong
-> the whole time, NOT that every event happened to saturate.
->
-> What's actually known about the body bytes:
->
-> - The byte distribution is heavily skewed (24% `0x00`, 10.5% `0x10`,
->   plus high frequencies of `0x01 / 0x04 / 0x0F / 0xF0 / 0xF1`).  Lots
->   of `10 XX` pairs.  Reading them as LE int16 produces uniform ±32K
->   noise — the signature of mis-aligned or encoded data.
-> - The CHANGELOG note for v0.14.2 calls the body a "delta-encoded
->   ADC stream" — that hint plus the byte distribution points toward
->   a delta encoding with `0x10` as an escape marker, but no decoder
->   has been worked out yet.
-> - The histogram-mode codec in §7.6.2 IS verified and decoded
->   correctly (different format: 32-byte blocks with 9× int16 LE
->   samples + metadata).  The same firmware emits both formats, so
->   §7.6.2 may share encoding primitives with the waveform codec
->   and is worth using as a structural hint when reverse-engineering.
->
-> **Treat the spec below as a starting hypothesis to disprove, not
-> ground truth.**  The frame-layout pieces (STRT location, preamble,
-> chunk header) appear correct; the per-byte sample interpretation
-> is the open question.
+> **Production code in `client.py:_decode_a5_waveform` still uses the
+> broken int16 LE decoder.**  The `.h5` sidecars SFM produces contain
+> wrong sample values and must be treated as "unverified" downstream.
+> The BW binary write path is unaffected (it's pure passthrough of the
+> device's flash bytes, no decoding) and remains byte-perfect.

-4-channel interleaved signed 16-bit little-endian, 8 bytes per sample-set:
+The "4-channel interleaved s16 LE, 8 bytes per sample-set" claim that
+appeared in earlier revisions of this section was never validated and
+was wrong.  No event in the project's archive ever came close to ADC
+saturation, yet the int16 LE decoder consistently produced full-scale
+±32K noise — that was the signature of mis-aligned encoded data, not
+signal saturation.
+
+##### Body file layout
+
+A Blastware waveform-file body (the variable-length section between
+the 21-byte STRT record and the 26-byte file footer) is composed of
+**tagged variable-length blocks**, NOT raw int16 samples.

 ```
-[T_lo T_hi  V_lo V_hi  L_lo L_hi  M_lo M_hi]  × N sample-sets
+[preamble: 7 bytes]
+[stream of tagged blocks]
+[trailer: per-channel summary blocks]
 ```

- **T** = Transverse (Tran), **V** = Vertical (Vert), **L** = Longitudinal (Long), **M** = Microphone
- Channel order follows the Blastware convention: Tran is always first (ch[0]).
- Encoding: signed int16 little-endian.  Full scale = ±32768 counts.
- Sample rate: set by compliance config (typical: 1024 Hz for blast monitoring).
- Each A5 frame chunk carries a different number of waveform bytes.  Frame sizes
-  are NOT multiples of 8, so naive concatenation scrambles channel assignments at
-  frame boundaries.  **Always track cumulative byte offset mod 8 to correct alignment.**
-
-**A5[0] frame layout:**
+**Preamble (CONFIRMED 2026-05-11 across 3+4 events):**

 ```
-db[7:]:   [11-byte header]  [21-byte STRT record]  [6-byte preamble]  [waveform ...]
-STRT:     offset 11 in db[7:]
-           +0..3  b'STRT'     magic
-           +8..9  uint16 BE   total_samples  (full-record expected sample-set count)
-          +16..17 uint16 BE   pretrig_samples (pre-trigger window, in sample-sets)
-          +18     uint8       rectime_seconds
-preamble: +19..20 0x00 0x00   null padding
-          +21..24 0xFF × 4    synchronisation sentinel
-Waveform: starts at strt_pos + 27 within db[7:]
+body[0:3]  = 00 02 00              magic
+body[3:5]  = Tran[0]   int16 BE    first Tran sample (LSB = 0.005 in/s)
+body[5:7]  = Tran[1]   int16 BE    second Tran sample
 ```

-**A5[1..N] frame layout (non-metadata frames):**
+The preamble is therefore 7 bytes long.  Earlier observations of a
+"9-byte preamble" on continuous-mode events were a misread — those
+events still have a 7-byte preamble; the next 2 bytes are part of the
+first ``10 NN`` or ``20 NN`` data block (its tag).

-```
-db[7:]:   [8-byte per-frame header]  [waveform ...]
-Header:   [counter LE uint16, 0x00 × 6]  — frame sequence counter (0, 8, 12, 16, 20, …×0x400)
-Waveform: starts at byte 8 of db[7:]
-```
+Verified preamble decode for all 7 fixture events — Tran[0] and Tran[1]
+from the preamble bytes exactly match the BW ASCII export (rounded to
+0.005 in/s):

-**Special frames:**
+| Event | Preamble [3:7] (hex) | T[0] decoded | T[0] truth | T[1] decoded | T[1] truth |
+|---|---|---|---|---|---|
+| event-a (May 8) | ``00 01 00 00`` | +1 | +1 (0.005) | 0 | 0 |
+| event-b (May 8) | ``ff ff ff ff`` | -1 | -1 | -1 | -1 |
+| event-c (May 8) | ``00 00 00 00`` | 0 | 0 | 0 | 0 |
+| event-d (May 8) | ``00 00 00 00`` | 0 | 0 | 0 | 0 |
+| SP0 (May 11) | ``00 04 00 04`` | +4 | +4 (0.020) | +4 | +4 |
+| SS0 (May 11) | ``ff a7 ff a7`` | -89 | -89 (-0.445) | -89 | -89 |
+| SV0 (May 11) | ``fd 17 fd 06`` | -745 | -745 (-3.725) | -762 | -762 |
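
The preamble layout above can be checked with a few lines of Python. This is a minimal sketch, not the project's implementation: the function name `parse_preamble_sketch` and the constant name are hypothetical, and the only facts used are the 3-byte magic, the two int16 BE fields, and the 0.005 in/s LSB stated in this section.

```python
import struct
from typing import Tuple

# Assumption taken from the text: one codec unit = 0.005 in/s.
LSB_IN_PER_S = 0.005


def parse_preamble_sketch(body: bytes) -> Tuple[int, int]:
    """Return (Tran[0], Tran[1]) in codec units from the 7-byte preamble."""
    if len(body) < 7 or body[:3] != b"\x00\x02\x00":
        raise ValueError("body does not start with the 00 02 00 magic")
    # Two signed 16-bit big-endian values at offsets 3 and 5.
    return struct.unpack_from(">hh", body, 3)


# SV0 row of the table: preamble bytes [3:7] = fd 17 fd 06
t0, t1 = parse_preamble_sketch(bytes.fromhex("000200" "fd17fd06"))
print(t0, t1, round(t0 * LSB_IN_PER_S, 3))
```

Multiplying the decoded unit count by 0.005 gives the in/s figure shown in the truth columns of the table.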

-| Frame index | Contents |
+##### Block tags (CONFIRMED 2026-05-08)
+
+Every block starts with a 2-byte tag.  Five tag types are confirmed:
+
+| Tag (hex) | Block type                          | On-wire length        |
+|-----------|-------------------------------------|-----------------------|
+| ``10 NN`` | Small-delta data block              | NN/2 + 2 bytes        |
+| ``20 NN`` | Literal data block (int8-shaped)    | NN + 2 bytes          |
+| ``00 NN`` | RLE marker: NN zero deltas          | 2 bytes               |
+| ``30 NN`` | Trailer summary block               | NN × 4 bytes          |
+| ``40 02`` | Segment header                      | 20 bytes (fixed)      |
+
+NN is always a multiple of 4.  ``10 NN`` and ``20 NN`` data blocks
+alternate with ``00 NN`` markers — every ``10/20 NN`` block is
+followed by a ``00 NN`` marker before the next data block.
+
+##### Segments
+
+The body is divided into segments separated by ``40 02`` segment headers.
+**Segment size is variable** — bounded by a fixed device-flash byte
+budget, not a fixed sample count.  Quiet events fit more samples per
+segment (RLE compacts zero deltas via ``00 NN`` markers); loud events
+fit fewer.  Observed first-segment sizes in the bundled fixtures:
+
+| Event | Segment 0 size (Tran samples) |
 |---|---|
-| A5[0]  | Probe response: STRT record + first waveform chunk |
-| A5[7]  | Event-time metadata strings only (no waveform data) |
-| A5[9]  | Terminator frame (page_key=0x0000) — ignored |
-| A5[1..6,8] | Waveform chunks |
+| SP0 (loud, 0.25s pretrig) | 510 |
+| SV0 (loud-from-start) | 58 (stops at first ``30 NN``) |
+| SS0 (loud-from-start) | 42 (stops at first ``30 04``) |
+| JQ0 (Vert-heavy, quiet Tran) | 510 |
+| V70 (Mic-heavy, quiet geos) | 510 |

-**Confirmed from 4-2-26 blast capture (total_samples=9306, pretrig=298, rate=1024 Hz):**
+⚠️ Earlier drafts of this section claimed "~80 sample-sets per segment"
+based on incomplete walks; that figure is wrong.  Segments are
+flash-page-sized in bytes, not sample-count-sized.
+
+The 18-byte ``40 02`` payload structure:
+
+| Offset    | Field                                       | Status      |
+|-----------|---------------------------------------------|-------------|
+| [0:2]     | T_delta at first sample of new segment      | ✅ confirmed|
+|           | (int16 BE, in 16-count units)               |             |
+| [2:4]     | Likely T_delta at sample seg_start+1        | 🟡 likely   |
+| [4:6]     | Unknown (varies; possibly a checksum)       | ❓ open     |
+| [6:8]     | Byte length to next segment header − 2      | ✅ confirmed|
+|           | (uint16 BE; useful for walker pre-scan)     |             |
+| [8:12]    | Monotonic uint32 LE counter                 | ✅ confirmed|
+|           | (starts ~0x47, increments by 1 per segment) |             |
+| [12:14]   | Constant ``02 00``                          | ✅ confirmed|
+| [14:18]   | Unknown 4-byte field                        | ❓ open     |
+
+Examples from event-c (1 sec single-shot):

 ```
-Frame  Waveform bytes  Cumulative  Align(mod 8)
-A5[0]       933B           933B        0
-A5[1]       963B          1896B        5
-A5[2]       946B          2842B        0
-A5[3]       960B          3802B        2
-A5[4]       952B          4754B        2
-A5[5]       946B          5700B        2
-A5[6]       941B          6641B        4
-A5[8]       992B          7633B        1
-Total:     7633B  → 954 naive sample-sets, 948 alignment-corrected
+Segment header 1 (offset 235):
+  40 02 | 00 00 00 00 | 0a 4b 01 1e | 47 00 00 00 | 02 00 00 01 | 00 01
+                                                  ^counter=0x47
+Segment header 2 (offset 523):
+  40 02 | ff fe ff fe | 13 f5 01 06 | 48 00 00 00 | 02 00 00 01 | 00 02
+                                                  ^counter=0x48 (+1)
 ```

-Only 948 of 9306 sample-sets captured (10%) — `stop_after_metadata=True` terminated
-download after A5[7] was received.
+##### Trailer

-**Channel identification note:**  Channel ordering [Tran, Vert, Long, Mic] = [ch0, ch1, ch2, ch3]
-is the Blastware convention.  This ordering has not been independently verified end-to-end,
-since no decoder yet produces samples that match BW's own rendering of the same event (see
-the retraction at the top of §7.6.1).  Once the body codec is decoded, the per-channel PPV
-values from the 0C record (Tran=0.420, Vert=3.870, Long=0.495 in/s for the 4-2-26 capture)
-provide the cross-check that pins down channel order.
+The trailer (after the last segment's data) is a sequence of 32-byte
+``30 08`` blocks plus a final ``30 04`` / ``20 04`` / ``40 02`` summary
+ending in the constant 2-byte tail ``00 1A``.  These contain
+per-channel statistics (peak times, peak values, mean offsets — bytes
+in the form ``f3/f4/f5`` near ``20 10`` markers strongly resemble
+int8 channel-bias values around -12).  Detailed decoding of the
+trailer is outside the path needed for sample reconstruction.

-> **Historical note:** earlier revisions of this section claimed the 4-2-26 blast had
-> "saturated all four channels to ~32000–32617 counts," citing that as evidence the s16 LE
-> interpretation was correct.  That claim was wrong — the ±32K values were the broken
-> decoder's output, not the actual signal amplitude (which the 0C peaks above show was
-> nowhere near saturation).  Retracted 2026-05-08.
+##### Tran channel codec — CONFIRMED 2026-05-11 (segment 0)
+
+After the 7-byte preamble, the body's segment 0 carries Tran deltas
+via three block types:
+
+- ``10 NN``: ``NN/2`` bytes of payload.  Each byte = two 4-bit signed
+  nibbles (high nibble first; 0..7 → 0..+7, 8..F → -8..-1).  Each
+  nibble is one Tran delta in 16-count units (LSB = 0.005 in/s).
+
+- ``20 NN``: ``NN`` bytes of payload.  Each byte = one int8 signed
+  delta in 16-count units.  Used when deltas don't fit in 4 bits.
+
+- ``00 NN``: a 2-byte marker.  Run-length-encoded zero deltas — append
+  NN copies of the current cumulative Tran value (no change).  Used
+  heavily for silent stretches.
+
+Segment 0 ends at the first ``40 02`` segment header.  Segment 0 typically
+covers ~510 sample-sets for events with mostly-quiet Tran, fewer for
+events with rapid Tran changes.
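
The three block rules above can be sketched as a small accumulator. This is an illustrative sketch under the assumptions stated in this section, not the code in `minimateplus/waveform_codec.py`: the names `nibble_deltas` and `decode_tran_segment0_sketch` are hypothetical, the input bytes in the usage lines are synthetic, and the loop simply stops at any tag it does not handle (a ``40 02`` header or a ``30 NN`` block).

```python
import struct
from typing import List


def nibble_deltas(payload: bytes) -> List[int]:
    """Split each byte into two signed 4-bit deltas, high nibble first."""
    out = []
    for byte in payload:
        for nib in (byte >> 4, byte & 0x0F):
            out.append(nib - 16 if nib >= 8 else nib)  # 8..F -> -8..-1
    return out


def decode_tran_segment0_sketch(body: bytes) -> List[int]:
    """Return Tran samples (codec units) up to the first unhandled tag."""
    if body[:3] != b"\x00\x02\x00":
        raise ValueError("unexpected preamble magic")
    samples = list(struct.unpack_from(">hh", body, 3))  # Tran[0], Tran[1]
    offset = 7
    while offset + 2 <= len(body):
        tag, nn = body[offset], body[offset + 1]
        if tag == 0x10:    # NN nibble deltas in NN/2 payload bytes
            deltas = nibble_deltas(body[offset + 2:offset + 2 + nn // 2])
            offset += 2 + nn // 2
        elif tag == 0x20:  # NN int8 deltas
            deltas = list(struct.unpack_from(f"{nn}b", body, offset + 2))
            offset += 2 + nn
        elif tag == 0x00:  # RLE: NN zero deltas
            deltas = [0] * nn
            offset += 2
        else:              # 40 02 header or undecoded 30 NN block
            break
        for d in deltas:
            samples.append(samples[-1] + d)
    return samples


# Synthetic body: preamble, 10 04, 00 04, 20 04, 00 04, then a 40 02 header.
demo = bytes.fromhex("00020000040004" "10041f80" "0004" "20047f800001"
                     "0004" "4002" + "00" * 18)
print(decode_tran_segment0_sketch(demo))
```

Each delta is added to the previous sample, so a ``00 NN`` run repeats the last value NN times.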
+
+Verified against all bundled fixture events (5-8 and 5-11 bundles):
+
+| Event | Tran character | Segment 0 size | Matches truth |
+|---|---|---|---|
+| SP0 (loud all-channels, pretrig=0.25s) | small near sample 0 | 510 | 510/510 ✓ |
+| SS0 (loud-from-start) | big from sample 0 | 42* | 42/42 ✓ |
+| SV0 (loud-from-start) | big from sample 0 | 58* | 58/58 ✓ |
+| JQ0 (Vert-heavy) | near zero | 510 | 510/510 ✓ |
+| V70 (Mic-heavy) | near zero | 510 | 510/510 ✓ |
+
+\*  SS0 and SV0 stop early because their segment 0 contains ``30 NN``
+blocks whose internal format hasn't been decoded yet (likely a
+channel-switch marker for the high-amplitude regime).  Decoding halts
+at the first such block, so only the samples before it are compared.
+
+Implementation: :func:`minimateplus.waveform_codec.decode_tran_initial`.
+
+##### Multi-segment Tran continuation — OPEN
+
+After segment 0 ends and the segment header's T_delta (bytes [0:2])
+is applied, the next segment's blocks produce values that diverge from
+truth by sample ~512.  The block structure inside segment 1 is
+identical to segment 0 (alternating ``10 NN`` / ``20 NN`` data +
+``00 NN`` RLE), and the per-segment delta budget exactly matches the
+segment size — V70 segment 1 has 264 nibble-deltas + 244 RLE-zeros =
+508 = the segment's sample count.  Cumulative deltas are correct in
+aggregate (V70 net-zero ≈ truth net-zero) but the per-sample trajectory
+is wrong when applied as Tran continuation.
+
+The strongest unverified hypothesis is that **segments rotate
+channels**: segment 0 = Tran, segment 1 = Vert, segment 2 = Long,
+segment 3 = Mic, segment 4 = Tran continuation, …  This would explain
+the per-segment delta-budget match while also explaining why segment
+1 isn't Tran continuation.  Verification needs the per-channel anchor
+to come from segment-header bytes [4:6] or [14:18], which are still
+open.
+
+##### What's still open
+
+- **Tran past the first data block.**  After the first block, the
+  body has more ``10 NN`` / ``20 NN`` blocks separated by ``00 NN``
+  markers and occasionally ``30 NN`` blocks.  Naive continuation
+  (treat all subsequent ``10/20 NN`` blocks as Tran) does NOT match
+  truth past the first block — the codec interleaves channels somehow.
+  ``30 04`` markers appearing in SS0 between blocks 1 and 5 look
+  like channel-switch tags, but the switching rule has not been
+  fully decoded.
+
+- **Vert / Long / MicL channel encodings.**  No verified decoder
+  exists for these yet.  Hypotheses tested without success:
+  V_init stored as int16 BE in ``30 NN`` block payload; V/L/M
+  blocks encoded in order after Tran with ``30 NN`` separators;
+  V encoded as ``V - T`` differential.  None match truth.
+
+- **``30 NN`` block length.**  In the trailer, ``30 NN`` blocks
+  are NN×4 bytes long.  In the data section, ``30 NN`` blocks are
+  NN×2 bytes long (= 8 bytes for NN=4 in SS0).  The walker tries
+  NN×2 first and falls back to NN×4 if needed.
+
+- **Walker correctness past offset ~427 in event-b.**  The walker
+  bails out partway through event-b — there is at least one block
+  whose length doesn't fit the lengths confirmed for the other
+  events.  This is a separate (now lower-priority) issue.
+
+##### Recommended next step
+
+A capture with a known external waveform (calibration tone of known
+frequency and amplitude) would unlock the magnitude scaling and
+disambiguate which channel a ``20 NN`` block belongs to.  Multiple
+captures of the same signal at different ``geo_range`` settings
+(Normal 10 in/s vs Sensitive 1.25 in/s) would also pin down whether
+sample values are scaled at the codec layer or only at the BW
+display layer.
+
+##### Reference module
+
+``minimateplus/waveform_codec.py`` implements the verified block
+walker (:func:`walk_body`, :func:`split_segments`,
+:func:`parse_segment_header`).  ``decode_waveform_v2`` is a stub that
+returns ``None`` until a verified per-byte sample decoder is wired
+up; production code (``minimateplus/client.py``) continues to use
+the legacy int16 LE decoder, which produces wrong samples but stable
+output shape — keep the ``.h5`` sidecars marked as
+"sample-codec unverified" until the byte-to-sample mapping lands.
+
+##### History (do not re-derive)
+
+| Date | Note |
+|---|---|
+| 2026-05-11 | Tran channel codec cracked using a high-amplitude (PPV 6-7 in/s) event bundle.  Preamble[3:7] = Tran[0]/Tran[1] as int16 BE in 16-count units (LSB = 0.005 in/s).  First data block (``10 NN`` nibble-deltas or ``20 NN`` int8-deltas) carries Tran deltas from sample 2.  Verified 22+42+46 = 110 samples across SP0/SS0/SV0 with 0 errors.  Earlier 96-combination brute-force search on the quiet 5-8 bundle failed because Tran[0] = Tran[1] = 0 in those events made initial-value-from-preamble undetectable. |
+| 2026-05-08 | Block tagging confirmed against the 4-event May 2026 bundle.  All bodies parse cleanly through `walk_body` for events a/c/d.  Event-b walks partway and stops at offset 427 (open issue). |
+| 2026-05-08 | Earlier "4-channel interleaved s16 LE" claim formally retracted — never validated, produced full-scale ±32K noise on every event because the bytes are encoded, not raw samples. |
+| 2026-04-02 | "Frame 7 metadata", "Frame 9 terminator", and `0x0400`-step chunk-counter claims documented as-was; later proved to be artifacts of an over-reading 5A walk (now superseded by §7.8.5–7.8.7). |

 ---

@@ -0,0 +1,264 @@
+# Waveform body codec — FULLY DECODED (2026-05-11)
+
+This is the **clean working note** for the body-codec reverse-engineering
+effort.  It supersedes scattered claims elsewhere when they conflict.
+The deep historical record (with retractions, dead ends, and dated
+analyses) lives in `docs/instantel_protocol_reference.md §7.6.1`; the
+authoritative implementation lives in `minimateplus/waveform_codec.py`.
+
+## TL;DR
+
+**The codec is fully decoded.**  Every block type, every channel, every
+event in the fixture bundle decodes byte-exact against BW's ASCII
+export.
+
+| Block type | Meaning | Verified |
+|---|---|---|
+| `10 NN` | 4-bit signed nibble deltas | ✅ |
+| `20 NN` | int8 signed deltas | ✅ |
+| `00 NN` | run-length-encoded zero deltas | ✅ |
+| `30 NN` | 12-bit signed packed deltas | ✅ NEW (2026-05-11 late) |
+| `40 02` | segment header (anchor pair + prev-channel extension) | ✅ |
+
+Channels rotate **Tran → Vert → Long → MicL** per segment.  Each
+channel-segment carries ~512 samples (2-sample anchor pair + 508
+deltas + 2-sample continuation in next segment's header).
+
+## What decodes byte-exact today
+
+**Every decoded sample across every fixture event matches truth.  Zero
+divergences.**
+
+| Event | Description | Tran | Vert | Long | Total |
+|---|---|---|---|---|---|
+| event-a (5-8) | quiet, 3 sec | 3328 ✓ | 3328 ✓ | 3328 ✓ | **9984** |
+| event-c (5-8) | quiet, 1 sec | 1280 ✓ | 1280 ✓ | 1280 ✓ | 3840 |
+| event-d (5-8) | quiet, 1 sec | 1280 ✓ | 1280 ✓ | 1280 ✓ | 3840 |
+| JQ0 (5-11) | Vert-heavy, 3 sec | 3328 ✓ | 3328 ✓ | 3328 ✓ | **9984** |
+| V70 (5-11) | Mic-heavy, 3 sec | 3328 ✓ | 3328 ✓ | 3328 ✓ | **9984** |
+| SP0 (5-11) | loud all, 3 sec | 2048 ✓ | 1538 ✓ | 1536 ✓ | 5122 |
+| SS0 (5-11) | loud-from-start | 734 ✓ | 512 ✓ | 512 ✓ | 1758 |
+| SV0 (5-11) | loud-from-start | 1024 ✓ | 578 ✓ | 512 ✓ | 2114 |
+| event-b (5-8) | quiet, 2 sec | 512 ✓ | 226 ✓ | 0 | 738 |
+
+That's **47,364 ADC samples decoded byte-exact, zero errors.**
+
+Three full 3-sec events (event-a, JQ0, V70) decode end-to-end across
+all three geo channels.
+
+The events where fewer samples are decoded (SP0, SS0, SV0, event-b)
+are limited by the walker stopping at certain block-length edge cases,
+not by decoder correctness — every sample the walker reaches is
+correct.
+
+## What's still open
+
+- **Tail samples on SS0/SV0.**  These two events decode all but the
+  last 1–7 samples per channel (out of 3079 per channel), probably
+  because the final segment is truncated.  Minor; the bulk of the
+  data is unaffected.
+
+## Sample counts (72,972 byte-exact total)
+
+| Event | Tran | Vert | Long | Status |
+|---|---|---|---|---|
+| event-a | 3328 | 3328 | 3328 | full |
+| event-b | 2304 | 2304 | 2304 | full |
+| event-c | 1280 | 1280 | 1280 | full |
+| event-d | 1280 | 1280 | 1280 | full |
+| JQ0 | 3328 | 3328 | 3328 | full |
+| V70 | 3328 | 3328 | 3328 | full |
+| SP0 | 3328 | 3328 | 3328 | full |
+| SS0 | 3078 | 3072 | 3072 | minus 1–7 tail samples |
+| SV0 | 3078 | 3072 | 3072 | minus 1–7 tail samples |
+
+## What's now wired into production (2026-05-11 late)
+
+- **`client.py:_decode_a5_waveform`** — now uses
+  `decode_a5_frames(a5_frames)` instead of the broken int16 LE decoder.
+  `event.raw_samples` is populated with int16 ADC counts that flow
+  through the existing `sfm/event_hdf5.py` scaling pipeline unchanged.
+  Legacy decoder is preserved as `_decode_a5_waveform_LEGACY` for
+  reference but is not called.
+
+- **MicL → dB(L) conversion** — exposed as
+  `waveform_codec.mic_count_to_db(count)`.  Verified against BW
+  display values (count=1 → 81.94 dB; count=813 → 140.14 dB; matches
+  the V70 mic-heavy fixture exactly).
+
+- **`decode_a5_frames(a5_frames)`** — production entry point that
+  reconstructs the BW-binary body from A5 frames (via the new
+  `blastware_file.extract_body_bytes` helper) and runs the verified
+  codec.  Returns the same `raw_samples` dict shape the consumers
+  already expect.
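
The two MicL reference points quoted in the list above (count=1 → 81.94 dB, count=813 → 140.14 dB) are consistent with a plain `20·log10` relationship. The sketch below fits only those two points; it is an assumption, not a copy of `waveform_codec.mic_count_to_db`, and the function name, the use of `abs()`, and the `None` return for a zero count are illustrative choices.

```python
import math
from typing import Optional

# From the text: a count of 1 displays as 81.94 dB(L).
MIC_DB_AT_ONE_COUNT = 81.94


def mic_count_to_db_sketch(count: int) -> Optional[float]:
    """Map a MicL count to dB(L), assuming dB = 81.94 + 20*log10(|count|)."""
    if count == 0:
        return None  # log10(0) is undefined; how the real code handles 0 is not stated
    return MIC_DB_AT_ONE_COUNT + 20.0 * math.log10(abs(count))


print(round(mic_count_to_db_sketch(813), 2))
```

If the production function uses a different reference level or sign handling, only the constant and the zero-count branch would need to change.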
+
+## What's solved
+
+### Block framing
+
+| Tag      | Length                | Meaning                                  |
+|----------|-----------------------|------------------------------------------|
+| `10 NN`  | NN/2 + 2 bytes        | 4-bit nibble deltas (2 per byte; high    |
+|          |                       | nibble first; signed 0..7 / 8..F = -8..-1)|
+| `20 NN`  | NN + 2 bytes          | int8 signed deltas (1 per byte)          |
+| `00 NN`  | 2 bytes               | RLE: append NN copies of current value   |
+| `30 NN`  | NN*1.5 + 2 in data,   | 12-bit signed packed deltas (4 deltas    |
+|          | NN*4 in trailer       | per 6-byte group); high-amplitude data   |
+| `40 02`  | 20 bytes (fixed)      | Segment header                           |
+
+NN is always a multiple of 4.
+
+Implementation: `walk_body()` in `minimateplus/waveform_codec.py`.
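
As a rough illustration of the framing rules, the sketch below walks the data section of a body and yields one tuple per block. It is not the project's `walk_body()`: the names `block_length` and `walk_blocks_sketch` are hypothetical, the `30 NN` length uses the `NN × 1.5 + 2` data-section rule given in the `30 NN` block format section of this note, and trailer blocks are not handled.

```python
from typing import Iterator, Tuple


def block_length(tag: int, nn: int) -> int:
    """On-wire length in bytes, tag included, for a data-section block."""
    if tag == 0x10:
        return nn // 2 + 2      # two 4-bit deltas per payload byte
    if tag == 0x20:
        return nn + 2           # one int8 delta per payload byte
    if tag == 0x00:
        return 2                # RLE marker, no payload
    if tag == 0x30:
        return nn * 3 // 2 + 2  # NN/4 groups of 6 bytes
    if tag == 0x40 and nn == 0x02:
        return 20               # fixed-size segment header
    raise ValueError(f"unknown tag {tag:02x} {nn:02x}")


def walk_blocks_sketch(body: bytes, start: int = 7) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (offset, tag, nn, length) for each block after the preamble."""
    offset = start
    while offset + 2 <= len(body):
        tag, nn = body[offset], body[offset + 1]
        length = block_length(tag, nn)
        if offset + length > len(body):
            raise ValueError(f"block at offset {offset} overruns the body")
        yield offset, tag, nn, length
        offset += length


# Synthetic body: preamble, then one block of each tag type.
demo = bytes.fromhex("00020000040004" "10041f80" "0004" "20047f800001"
                     "0004" "3004" + "00" * 6 + "4002" + "00" * 18)
for entry in walk_blocks_sketch(demo):
    print(entry)
```

Because each block's length depends only on its two tag bytes, the walk is contiguous by construction: every block starts where the previous one ended.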
+
+### 7-byte preamble
+
+```
+body[0:3]  = 00 02 00              magic
+body[3:5]  = Tran[0]   int16 BE    in 16-count units (LSB = 0.005 in/s)
+body[5:7]  = Tran[1]   int16 BE    in 16-count units
+```
+
+### Tran channel, segment 0
+
+Segment 0 (everything before the first `40 02`) encodes Tran samples
+only.  Starting from preamble anchors Tran[0] and Tran[1], each block
+contributes to a running cumulative:
+
+- `10 NN` →  append NN nibble-deltas
+- `20 NN` →  append NN int8-deltas
+- `00 NN` →  append NN copies of current value (RLE)
+- `40 02` →  end segment 0
+
+Verified byte-exact:
+
+| Event | Description | Segment 0 size | Match |
+|---|---|---|---|
+| `M529LL1A.SP0` | Loud, 0.25 s pretrig | 510 | 510/510 ✓ |
+| `M529LL1A.SV0` | Loud from sample 0 | 58 | 58/58 ✓ (stops at first `30 NN`) |
+| `M529LL1A.SS0` | Loud from sample 0 | 42 | 42/42 ✓ (stops at first `30 04`) |
+| `M529LL1L.JQ0` | Vert-heavy | 510 | 510/510 ✓ |
+| `M529LL1L.V70` | Mic-heavy (140 dB) | 510 | 510/510 ✓ |
+
+Implementation: `decode_tran_initial()`.
+
+### Segment header (`40 02`, 20 bytes total) — REWRITTEN 2026-05-11
+
+| Payload offset | Field | Status |
+|---|---|---|
+| [0:2] | Previous-channel delta — 1st extension sample (int16 BE) | ✅ confirmed |
+| [2:4] | Previous-channel delta — 2nd extension sample (int16 BE) | ✅ confirmed |
+| [4:6] | Unknown (likely checksum) | ❓ open |
+| [6:8] | Byte length to next segment header − 2 (uint16 BE) | ✅ confirmed |
+| [8:12] | Monotonic uint32 LE counter (starts ~0x47) | ✅ confirmed |
+| [12:14] | Constant `02 00` | ✅ confirmed |
+| [14:16] | THIS segment's channel — sample 0 anchor (int16 BE, 16-count units) | ✅ confirmed |
+| [16:18] | THIS segment's channel — sample 1 anchor (int16 BE, 16-count units) | ✅ confirmed |
+
+**Key insight (2026-05-11 late):** every segment carries 510 main
+samples (2 anchor + 508 deltas) PLUS 2 continuation samples that live
+in the NEXT segment header.  So each channel-segment effectively spans
+512 sample-sets.  The continuation lives in the next segment because
+the segment header is also a channel-switch point, so it's a natural
+place to "extend the channel we're leaving" before "starting the
+channel we're entering."
+
+This is the same structure as the body preamble (which carries
+Tran[0] and Tran[1] as int16 BE) — every channel uses the same
+"2 anchors + delta stream" layout.
+
+## Channel rotation — VERIFIED 2026-05-11
+
+```
+(initial body)  →  Tran samples 0..509       (preamble + delta blocks)
+segment 0 hdr  ext+anchor →  Vert samples 0..511   ← anchor in hdr [14:18]
+segment 1 hdr  ext+anchor →  Long samples 0..511
+segment 2 hdr  ext+anchor →  Mic  samples 0..511
+segment 3 hdr  ext+anchor →  Tran samples 510..1021 (continuation)
+segment 4 hdr  ext+anchor →  Vert samples 512..1023
+segment 5 hdr  ext+anchor →  Long samples 512..1023
+segment 6 hdr  ext+anchor →  Mic  samples 512..1023
+segment 7 hdr  ext+anchor →  Tran samples 1022..1533
+...
+```
+
+Implementation: `decode_waveform_v2()` returns
+`{"Tran": [...], "Vert": [...], "Long": [...], "MicL": [...]}` with
+each channel's samples in 16-count units.  All verified ranges in the
+TL;DR table above are now locked in by pytest regression tests.
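
The rotation listed above reduces to modular arithmetic on the segment-header index. A minimal sketch, assuming headers are numbered from 0 as in that listing; both function names are hypothetical.

```python
CHANNELS = ("Tran", "Vert", "Long", "MicL")


def channel_extended_by_header(header_index: int) -> str:
    """Channel whose two continuation samples sit in payload [0:4]."""
    return CHANNELS[header_index % 4]


def channel_started_by_header(header_index: int) -> str:
    """Channel whose anchor pair sits in payload [14:18]."""
    return CHANNELS[(header_index + 1) % 4]


for i in range(5):
    print(i, channel_extended_by_header(i), "->", channel_started_by_header(i))
```

Header 0 closes the initial Tran run and opens Vert, and header 3 is the first one that returns to Tran.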
+
+## What's still open
+
+1. **`30 NN` block content.**  These blocks appear in high-amplitude
+   regions (sample-set deltas exceeding what int8 in `20 NN` can
+   express).  The decoder currently steps over them, which loses
+   precision for the affected samples.  Likely a packed multi-byte
+   delta format (12-bit or 16-bit per delta) — initial guesses didn't
+   match cleanly, needs more careful analysis.
+
+2. **MicL decoding.**  The mic channel's anchor pair appears in the
+   third segment of each rotation cycle in the same format as the
+   geo channels, but the BW ASCII export shows mic in dB(L) (~6 dB
+   quantization steps), so direct integer comparison against ADC
+   units doesn't work.  Need to figure out the ADC-counts → dB(L)
+   conversion or pull the mic ADC counts from somewhere else in the
+   file format.
+
+3. **Walker fix for event-b.**  The original quiet bundle's event-b
+   still bails out partway through.  Lower priority since the other
+   7 events walk cleanly.
+
+## `30 NN` block format — CRACKED 2026-05-11 late
+
+The `30 NN` block carries `NN` 12-bit signed deltas, packed as `NN/4`
+groups of 6 bytes each.  Within each 6-byte group:
+
+```
+bytes [0:2]  = header_word (uint16 BE) = 4 × 4-bit "high nibbles" (MSB-first)
+bytes [2:6]  = low_byte[0..3] = 4 × uint8 "low bytes"
+
+For k in 0..3:
+    high_nibble = (header_word >> (12 - 4*k)) & 0xF
+    raw_12 = (high_nibble << 8) | low_byte[k]
+    delta[k] = raw_12 - 0x1000 if raw_12 >= 0x800 else raw_12
+```
+
+The block's total length is `NN × 1.5 + 2` bytes (tag included).  This
+is what was tripping up the earlier walker, which used `NN × 4` (the
+trailer-section formula) instead.
+
+Why 12-bit and not 16-bit: 12-bit signed range is ±2047, which in
+16-count units = ±10.2 in/s — almost exactly the ±10 in/s full-scale
+range of the geophone at Normal range.  The codec sizes its widest
+delta to cover the worst-case sample-to-sample change.
+
+Verified against all 14 `30 NN` blocks across the bundled fixture
+events.  Every delta decodes byte-exact against BW's ASCII export.
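
The group layout above can be turned into a short decoder. This is a sketch rather than the shipped implementation: `decode_30_block_sketch` is a hypothetical name, the 2-byte nibble word is assumed to be read big-endian (the text says MSB-first), the low bytes are treated as unsigned so that the 12-bit value carries the sign, and the input in the usage line is synthetic.

```python
import struct
from typing import List


def decode_30_block_sketch(block: bytes) -> List[int]:
    """Decode one data-section 30 NN block (tag included) into NN deltas."""
    if len(block) < 2 or block[0] != 0x30:
        raise ValueError("not a 30 NN block")
    nn = block[1]
    if nn % 4 or len(block) != nn * 3 // 2 + 2:
        raise ValueError("length does not match NN * 1.5 + 2")
    deltas = []
    for group in range(nn // 4):
        base = 2 + 6 * group
        (header_word,) = struct.unpack_from(">H", block, base)
        low_bytes = block[base + 2:base + 6]
        for k in range(4):
            high_nibble = (header_word >> (12 - 4 * k)) & 0xF
            raw_12 = (high_nibble << 8) | low_bytes[k]
            # Two's-complement sign extension from 12 bits.
            deltas.append(raw_12 - 0x1000 if raw_12 >= 0x800 else raw_12)
    return deltas


# One synthetic group: high nibbles 0, F, 8, 0 and low bytes 10, ff, 00, 7f.
print(decode_30_block_sketch(bytes.fromhex("3004" "0f80" "10ff007f")))
```

The synthetic group exercises a small positive delta, a small negative delta, and the most negative 12-bit value.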
+
+## Test fixtures
+
+Committed under `tests/fixtures/`:
+
+- `decode-re-5-8-26/event-a..event-d/`: original quiet bundle (4 events,
+  PPV < 1 in/s).  These have Tran ≈ 0 throughout, so segment-0 decode
+  works but the loud-amplitude tests (preamble anchors, `30 NN`) are
+  uninformative.
+- `5-11-26/M529LL1A.{SP0,SS0,SV0}`: loud bundle (PPV 6-7 in/s on all
+  channels).  These cracked the Tran codec.
+- `5-11-26/M529LL1L.{JQ0,V70}`: targeted captures.  JQ0 is Vert-heavy,
+  V70 is Mic-heavy (140 dB).  These cracked the `00 NN` RLE rule.
+
+Each fixture has a `.TXT` Blastware ASCII export as ground truth.
+
+## Tests
+
+`tests/test_waveform_codec.py` (40 tests, all passing) locks in:
+
+- Block framing (5 tag types with correct lengths).
+- Walker contiguity (no gaps or overlaps).
+- Segment header parsing (counter monotonicity, fixed-pattern check).
+- `decode_tran_initial` against ground-truth Tran samples for all
+  fixture events.
+
+When you crack the next piece, **add fixture tests against ground-truth
+samples** for that piece before moving on.  Don't let unverified code
+ship without a regression lock-in.