docs: clean up waveform-codec doc layers per review

Three "truth layers" had drifted apart between commits. Fixed:

1. waveform_codec.py docstring rewritten from the 2026-05-08
   "structural framing only" state to the 2026-05-11 "Tran segment 0
   solved + segment-header partially decoded" state. Killed stale
   "~80 sample-sets per segment" language (real segments are
   flash-page-byte-sized, not sample-count-sized; observed
   first-segment sizes are 42-510 samples depending on signal).
   Killed stale "preamble is 7 or 9 bytes" language (always 7).

2. docs/instantel_protocol_reference.md §7.6.1: added a clear
   "CURRENT STATUS" box at the top with a status table. Replaced the
   stale "~80 sample-sets" line with the verified per-event segment
   sizes. Merged two redundant segment-header field-table sections.

3. docs/waveform_codec_re_status.md (NEW): clean working-status doc.
   Solved / not solved / hypothesis / next experiment / fixtures /
   tests. The protocol reference remains the historical Rosetta
   Stone; this new file is the current-truth working note that
   shouldn't accumulate fossil layers.

4. CLAUDE.md §"Waveform body codec": prominent warning box at top:
   "DO NOT TRUST decoded sample arrays yet." BW binary passthrough is
   the only sample-bearing output to trust until the decoder lands.
   Added a "Next experiment" subsection pointing the next pass at the
   segment-channel scoring analyzer.

40 tests still pass.
2026-05-12 02:43:25 +00:00
parent 5bf5329369
commit f68ee9f0f9
4 changed files with 385 additions and 139 deletions
@@ -860,20 +860,39 @@ MicL:  39 64 1D AA  =  0.0000875 psi

 ---

-#### 7.6.1 Blast / Waveform mode — 🟡 STRUCTURAL FRAMING + TRAN CODEC DECODED (2026-05-11)
+#### 7.6.1 Blast / Waveform mode — 🟡 PARTIAL DECODE (2026-05-11)

-> **Status (2026-05-11):** Block-level framing is solved.  The Tran-channel
-> encoding (preamble + first data block) is **fully verified** against the
-> 3-event May 11 2026 high-amplitude bundle (PPV 6-7 in/s) and the 4-event
-> May 8 bundle.  Verts / Long / MicL channel encodings and multi-block
-> Tran continuation are **still open**.  The previous int16 LE claim
-> remains REFUTED (see history below).
+> ### 📌 CURRENT STATUS — read this first
 >
-> The earlier "4-channel interleaved s16 LE, 8 bytes per sample-set"
-> claim was never validated and was wrong.  No event in the project's
-> archive ever came close to ADC saturation, yet the int16 LE decoder
-> consistently produced full-scale ±32K noise — that was the signature
-> of mis-aligned encoded data, not signal saturation.
+> The body codec is **partially decoded** as of 2026-05-11.  This
+> section contains both current-truth spec AND historical retractions;
+> when in doubt, the working summary lives at
+> `docs/waveform_codec_re_status.md`.
+>
+> | Item | Status |
+> |---|---|
+> | Body has tagged variable-length blocks, NOT raw int16 LE | ✅ confirmed |
+> | 5 block tag types (10/20/00/30/40 NN) with lengths | ✅ confirmed |
+> | 7-byte preamble: `00 02 00` + Tran[0] + Tran[1] int16 BE | ✅ confirmed |
+> | `00 NN` = RLE for zero deltas in the current channel | ✅ confirmed |
+> | Tran channel, segment 0 (~482-510 samples / event) | ✅ byte-exact, 5/5 events |
+> | Multi-segment Tran continuation | ❌ open (breaks at sample ~512) |
+> | Vert / Long / MicL channel decoders | ❌ open |
+> | `30 NN` block content (loud-from-start events) | ❌ open |
+> | Earlier "raw int16 LE, 8 bytes per sample-set" claim | ❌ REFUTED |
+>
+> **Production code in `client.py:_decode_a5_waveform` still uses the
+> broken int16 LE decoder.**  The `.h5` sidecars SFM produces contain
+> wrong sample values and must be treated as "unverified" downstream.
+> The BW binary write path is unaffected (it's pure passthrough of the
+> device's flash bytes, no decoding) and remains byte-perfect.
+
+The "4-channel interleaved s16 LE, 8 bytes per sample-set" claim that
+appeared in earlier revisions of this section was never validated and
+was wrong.  No event in the project's archive ever came close to ADC
+saturation, yet the int16 LE decoder consistently produced full-scale
+±32K noise — that was the signature of mis-aligned encoded data, not
+signal saturation.

 ##### Body file layout

@@ -932,23 +951,38 @@ followed by a ``00 NN`` marker before the next data block.

 ##### Segments

-The body is divided into ~16 SEGMENTS for a 1280-sample event (= 1
-segment per ~80 sample-sets), separated by ``40 02`` segment headers.
-A 3328-sample event has ~42 segments.
+The body is divided into segments separated by ``40 02`` segment headers.
+**Segment size is variable** — bounded by a fixed device-flash byte
+budget, not a fixed sample count.  Quiet events fit more samples per
+segment (RLE compacts zero deltas via ``00 NN`` markers); loud events
+fit fewer.  Observed first-segment sizes in the bundled fixtures:

-The 18-byte ``40 02`` payload structure (CONFIRMED across all 4
-fixtures by inspecting the increment of bytes [8:12]):
+| Event | Segment 0 size (Tran samples) |
+|---|---|
+| SP0 (loud, 0.25s pretrig) | 510 |
+| SV0 (loud-from-start) | 58 (stops at first ``30 NN``) |
+| SS0 (loud-from-start) | 42 (stops at first ``30 04``) |
+| JQ0 (Vert-heavy, quiet Tran) | 510 |
+| V70 (Mic-heavy, quiet geos) | 510 |

-| Offset | Length | Field                                            |
-|--------|--------|--------------------------------------------------|
-| 0      | 4      | Anchor / channel state (open — see below)        |
-| 4      | 4      | Variable field (open)                            |
-| 8      | 4      | uint32 LE counter — increments by 1 per segment  |
-| 12     | 4      | Fixed pattern ``02 00 00 01``                    |
-| 16     | 2      | Variable tail                                    |
+⚠️ Earlier drafts of this section claimed "~80 sample-sets per segment"
+based on incomplete walks; that figure is wrong.  Segments are
+flash-page-sized in bytes, not sample-count-sized.

-The counter at bytes [8:12] starts in the 0x40s for a freshly-erased
-device and increments cleanly — useful as a structural sanity check.
+The 18-byte ``40 02`` payload structure:
+
+| Offset    | Field                                       | Status      |
+|-----------|---------------------------------------------|-------------|
+| [0:2]     | T_delta at first sample of new segment      | ✅ confirmed|
+|           | (int16 BE, in 16-count units)               |             |
+| [2:4]     | Likely T_delta at sample seg_start+1        | 🟡 likely   |
+| [4:6]     | Unknown (varies; possibly a checksum)       | ❓ open     |
+| [6:8]     | Byte length to next segment header − 2      | ✅ confirmed|
+|           | (uint16 BE; useful for walker pre-scan)     |             |
+| [8:12]    | Monotonic uint32 LE counter                 | ✅ confirmed|
+|           | (starts ~0x47, increments by 1 per segment) |             |
+| [12:14]   | Constant ``02 00``                          | ✅ confirmed|
+| [14:18]   | Unknown 4-byte field                        | ❓ open     |
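
The field table above can be read as a fixed-offset unpack. The following is a minimal sketch, not the project's walker: `SegmentHeader` and `parse_segment_header` are hypothetical names, the [2:4] field is decoded as int16 BE only on the assumption that the "likely" reading holds, and the open fields are kept as raw bytes. The [6:8] value is returned as stored, without adding back the 2-byte offset.

```python
import struct
from typing import NamedTuple


class SegmentHeader(NamedTuple):
    """Hypothetical container for the 18-byte ``40 02`` payload."""
    t_delta_first: int    # [0:2]   int16 BE, 16-count units (confirmed)
    t_delta_second: int   # [2:4]   int16 BE (assumed; marked "likely" only)
    unknown_4_6: bytes    # [4:6]   open field, kept raw
    length_field: int     # [6:8]   uint16 BE, raw stored value (confirmed)
    counter: int          # [8:12]  uint32 LE monotonic counter (confirmed)
    constant: bytes       # [12:14] expected to be 02 00 (confirmed)
    unknown_14_18: bytes  # [14:18] open field, kept raw


def parse_segment_header(payload: bytes) -> SegmentHeader:
    """Unpack the payload that follows a ``40 02`` tag."""
    if len(payload) != 18:
        raise ValueError(f"expected 18 payload bytes, got {len(payload)}")
    t0, t1 = struct.unpack_from(">hh", payload, 0)
    (length_field,) = struct.unpack_from(">H", payload, 6)
    (counter,) = struct.unpack_from("<I", payload, 8)
    return SegmentHeader(
        t0, t1, payload[4:6], length_field, counter,
        payload[12:14], payload[14:18],
    )


# Synthetic payload built for illustration only; not taken from a fixture.
sample = bytes.fromhex("0001" "0000" "abcd" "0200" "47000000" "0200" "00000000")
header = parse_segment_header(sample)
```

A walker could use `header.constant == b"\x02\x00"` and a counter that increases by 1 between consecutive headers as structural sanity checks before trusting the other fields.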

 Examples from event-c (1 sec single-shot):

@@ -1008,26 +1042,25 @@ where the codec is most complex stop at the first ``30 04``.

 Implementation: :func:`minimateplus.waveform_codec.decode_tran_initial`.

-##### Segment header T-delta (PARTIAL 2026-05-11)
+##### Multi-segment Tran continuation — OPEN

-The 20-byte ``40 02`` segment header has its first 2 bytes ([0:2] of
-payload) as an int16 BE Tran delta for the first sample of the new
-segment.  Verified across V70 (3 segments with 0 deltas) and SP0/JQ0
-(1 segment with +1 delta).  Other bytes of the segment header payload
-are partially understood:
+After segment 0 ends and the segment header's T_delta (bytes [0:2])
+is applied, the next segment's blocks produce values that diverge from
+truth by sample ~512.  The block structure inside segment 1 is
+identical to segment 0 (alternating ``10 NN`` / ``20 NN`` data +
+``00 NN`` RLE), and the per-segment delta budget exactly matches the
+segment size — V70 segment 1 has 264 nibble-deltas + 244 RLE-zeros =
+508 = the segment's sample count.  Cumulative deltas are correct in
+aggregate (V70 net-zero ≈ truth net-zero) but the per-sample trajectory
+is wrong when applied as Tran continuation.

-| Payload offset | Field | Status |
-|---|---|---|
-| [0:2] | T_delta at first sample of new segment (int16 BE) | ✅ confirmed |
-| [2:4] | unknown (often 0; not a simple V or T delta) | ❓ open |
-| [4:6] | unknown (varies per event; possibly a checksum) | ❓ open |
-| [6:8] | byte length to next segment header − 2 (uint16 BE) | ✅ confirmed |
-| [8:12] | monotonic uint32 LE counter | ✅ confirmed |
-| [12:14] | constant ``02 00`` | ✅ confirmed |
-| [14:18] | unknown 4-byte field | ❓ open |
-
-Multi-segment Tran decoding diverges after sample ~512 — the per-segment
-channel ordering after the header is still unknown.
+The strongest unverified hypothesis is that **segments rotate
+channels**: segment 0 = Tran, segment 1 = Vert, segment 2 = Long,
+segment 3 = Mic, segment 4 = Tran continuation, …  This would explain
+the per-segment delta-budget match while also explaining why segment
+1 isn't Tran continuation.  Verification needs the per-channel anchor
+to come from segment-header bytes [4:6] or [14:18], which are still
+open.
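
The rotation hypothesis above amounts to a modulo-4 mapping from segment index to channel. The sketch below only restates that unverified hypothesis in code so a scoring analyzer could test it; `hypothesized_channel` is a hypothetical helper, and the channel labels follow the status table (with "Mic" written as `MicL`).

```python
# Unverified hypothesis: segments rotate through the four channels in order.
CHANNELS = ("Tran", "Vert", "Long", "MicL")


def hypothesized_channel(segment_index: int) -> str:
    """Return the channel a segment would carry IF segments rotate channels."""
    if segment_index < 0:
        raise ValueError("segment_index must be non-negative")
    return CHANNELS[segment_index % len(CHANNELS)]


# Under the hypothesis, segment 1 is Vert and segment 4 resumes Tran.
rotation = [hypothesized_channel(i) for i in range(5)]
```

If the hypothesis holds, decoding segment 1 against Vert ground truth should score better than decoding it as Tran continuation; if it does not, the mapping is wrong and should be discarded.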

 ##### What's still open