docs: clean up waveform-codec doc layers per review

Three "truth layers" had drifted apart between commits.  Fixed:

1. waveform_codec.py docstring rewritten from the 2026-05-08
   "structural framing only" state to the 2026-05-11 "Tran segment 0
   solved + segment-header partially decoded" state.  Killed stale
   "~80 sample-sets per segment" language (real segments are
   flash-page-byte-sized, not sample-count-sized; observed first-segment
   sizes are 42-510 samples depending on signal).  Killed stale
   "preamble is 7 or 9 bytes" language (always 7).

2. docs/instantel_protocol_reference.md §7.6.1: added a clear
   "CURRENT STATUS" box at the top with a status table.  Replaced the
   stale "~80 sample-sets" line with the verified per-event segment
   sizes.  Merged two redundant segment-header field-table sections.

3. docs/waveform_codec_re_status.md (NEW): clean working-status doc.
   Solved / not solved / hypothesis / next experiment / fixtures /
   tests.  The protocol reference remains the historical Rosetta
   Stone; this new file is the current-truth working note that
   shouldn't accumulate fossil layers.

4. CLAUDE.md §"Waveform body codec": prominent warning box at top —
   "DO NOT TRUST decoded sample arrays yet."  BW binary passthrough
   is the only sample-bearing output to trust until the decoder
   lands.  Added a "Next experiment" subsection pointing the next
   pass at the segment-channel scoring analyzer.

40 tests still pass.
Claude
2026-05-12 02:43:25 +00:00
committed by serversdown
parent 5bf5329369
commit f68ee9f0f9
4 changed files with 385 additions and 139 deletions
+77 -44
@@ -860,20 +860,39 @@ MicL: 39 64 1D AA = 0.0000875 psi
---
#### 7.6.1 Blast / Waveform mode — 🟡 STRUCTURAL FRAMING + TRAN CODEC DECODED (2026-05-11)
#### 7.6.1 Blast / Waveform mode — 🟡 PARTIAL DECODE (2026-05-11)
> **Status (2026-05-11):** Block-level framing is solved. The Tran-channel
> encoding (preamble + first data block) is **fully verified** against the
> 3-event May 11 2026 high-amplitude bundle (PPV 6-7 in/s) and the 4-event
> May 8 bundle. Verts / Long / MicL channel encodings and multi-block
> Tran continuation are **still open**. The previous int16 LE claim
> remains REFUTED (see history below).
> ### 📌 CURRENT STATUS — read this first
>
> The earlier "4-channel interleaved s16 LE, 8 bytes per sample-set"
> claim was never validated and was wrong. No event in the project's
> archive ever came close to ADC saturation, yet the int16 LE decoder
> consistently produced full-scale ±32K noise — that was the signature
> of mis-aligned encoded data, not signal saturation.
> The body codec is **partially decoded** as of 2026-05-11. This
> section contains both current-truth spec AND historical retractions;
> when in doubt, the working summary lives at
> `docs/waveform_codec_re_status.md`.
>
> | Item | Status |
> |---|---|
> | Body has tagged variable-length blocks, NOT raw int16 LE | ✅ confirmed |
> | 5 block tag types (10/20/00/30/40 NN) with lengths | ✅ confirmed |
> | 7-byte preamble: `00 02 00` + Tran[0] + Tran[1] int16 BE | ✅ confirmed |
> | `00 NN` = RLE for zero deltas in the current channel | ✅ confirmed |
> | Tran channel, segment 0 (~482-510 samples / event) | ✅ byte-exact, 5/5 events |
> | Multi-segment Tran continuation | ❌ open (breaks at sample ~512) |
> | Vert / Long / MicL channel decoders | ❌ open |
> | `30 NN` block content (loud-from-start events) | ❌ open |
> | Earlier "raw int16 LE, 8 bytes per sample-set" claim | ❌ REFUTED |
>
> **Production code in `client.py:_decode_a5_waveform` still uses the
> broken int16 LE decoder.** The `.h5` sidecars SFM produces contain
> wrong sample values and must be treated as "unverified" downstream.
> The BW binary write path is unaffected (it's pure passthrough of the
> device's flash bytes, no decoding) and remains byte-perfect.
The "4-channel interleaved s16 LE, 8 bytes per sample-set" claim that
appeared in earlier revisions of this section was never validated and
was wrong. No event in the project's archive ever came close to ADC
saturation, yet the int16 LE decoder consistently produced full-scale
±32K noise — that was the signature of mis-aligned encoded data, not
signal saturation.
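As a rough illustration of the 7-byte preamble row in the status table (`00 02 00` followed by Tran[0] and Tran[1] as int16 BE), a minimal sketch might look like the following. The function name `parse_preamble` is hypothetical and is not taken from `waveform_codec.py`; the values are assumed to be signed, as "int16" suggests.

```python
import struct

# Leading bytes of the preamble, per the status table: `00 02 00`.
PREAMBLE_MAGIC = bytes.fromhex("000200")


def parse_preamble(body: bytes) -> tuple[int, int]:
    """Return (Tran[0], Tran[1]) from the 7-byte preamble (hypothetical helper)."""
    if len(body) < 7:
        raise ValueError("body is shorter than the 7-byte preamble")
    if body[:3] != PREAMBLE_MAGIC:
        raise ValueError(f"unexpected preamble start: {body[:3].hex()}")
    # Two signed 16-bit big-endian values follow the three magic bytes.
    tran0, tran1 = struct.unpack(">hh", body[3:7])
    return tran0, tran1
```

A raw int16 LE reader would interpret these same bytes with the wrong byte order and no framing, which is consistent with the full-scale noise described above.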
##### Body file layout
@@ -932,23 +951,38 @@ followed by a ``00 NN`` marker before the next data block.
##### Segments
The body is divided into ~16 SEGMENTS for a 1280-sample event (= 1
segment per ~80 sample-sets), separated by ``40 02`` segment headers.
A 3328-sample event has ~42 segments.
The body is divided into segments separated by ``40 02`` segment headers.
**Segment size is variable** — bounded by a fixed device-flash byte
budget, not a fixed sample count. Quiet events fit more samples per
segment (RLE compacts zero deltas via ``00 NN`` markers); loud events
fit fewer. Observed first-segment sizes in the bundled fixtures:
The 18-byte ``40 02`` payload structure (CONFIRMED across all 4
fixtures by inspecting the increment of bytes [8:12]):
| Event | Segment 0 size (Tran samples) |
|---|---|
| SP0 (loud, 0.25s pretrig) | 510 |
| SV0 (loud-from-start) | 58 (stops at first ``30 NN``) |
| SS0 (loud-from-start) | 42 (stops at first ``30 04``) |
| JQ0 (Vert-heavy, quiet Tran) | 510 |
| V70 (Mic-heavy, quiet geos) | 510 |
| Offset | Length | Field |
|--------|--------|--------------------------------------------------|
| 0 | 4 | Anchor / channel state (open — see below) |
| 4 | 4 | Variable field (open) |
| 8 | 4 | uint32 LE counter — increments by 1 per segment |
| 12 | 4 | Fixed pattern ``02 00 00 01`` |
| 16 | 2 | Variable tail |
⚠️ Earlier drafts of this section claimed "~80 sample-sets per segment"
based on incomplete walks; that figure is wrong. Segments are
flash-page-sized in bytes, not sample-count-sized.
The counter at bytes [8:12] starts in the 0x40s for a freshly-erased
device and increments cleanly — useful as a structural sanity check.
The 18-byte ``40 02`` payload structure:
| Offset | Field | Status |
|-----------|---------------------------------------------|-------------|
| [0:2] | T_delta at first sample of new segment | ✅ confirmed|
| | (int16 BE, in 16-count units) | |
| [2:4] | Likely T_delta at sample seg_start+1 | 🟡 likely |
| [4:6] | Unknown (varies; possibly a checksum) | ❓ open |
| [6:8] | Byte length to next segment header 2 | ✅ confirmed|
| | (uint16 BE; useful for walker pre-scan) | |
| [8:12] | Monotonic uint32 LE counter | ✅ confirmed|
| | (starts ~0x47, increments by 1 per segment) | |
| [12:14] | Constant ``02 00`` | ✅ confirmed|
| [14:18] | Unknown 4-byte field | ❓ open |
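The field table above can be read as a parsing recipe. The sketch below is only an illustration under that reading: `SegmentHeader`, `parse_segment_header`, and `counters_are_consecutive` are hypothetical names, the open fields are kept as raw bytes, and the [6:8] value is returned uninterpreted because its exact relation to the next header is not restated here.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class SegmentHeader:
    t_delta: int          # [0:2]   int16 BE, raw value (table: 16-count units)
    t_delta_next: int     # [2:4]   int16 BE, only "likely" per the table
    unknown_4_6: bytes    # [4:6]   open
    length_field: int     # [6:8]   uint16 BE, left uninterpreted
    counter: int          # [8:12]  uint32 LE, monotonic
    unknown_14_18: bytes  # [14:18] open


def parse_segment_header(payload: bytes) -> SegmentHeader:
    """Split an 18-byte `40 02` payload into the fields from the table."""
    if len(payload) != 18:
        raise ValueError(f"expected 18 bytes, got {len(payload)}")
    if payload[12:14] != b"\x02\x00":
        raise ValueError("constant `02 00` missing at [12:14]")
    t_delta, t_delta_next = struct.unpack_from(">hh", payload, 0)
    (length_field,) = struct.unpack_from(">H", payload, 6)
    (counter,) = struct.unpack_from("<I", payload, 8)
    return SegmentHeader(t_delta, t_delta_next, payload[4:6],
                         length_field, counter, payload[14:18])


def counters_are_consecutive(headers: list[SegmentHeader]) -> bool:
    """Structural sanity check: the counter should step by 1 per segment."""
    return all(b.counter - a.counter == 1
               for a, b in zip(headers, headers[1:]))
```

Checking the counter across a whole body is a cheap way to notice a walker that has lost block alignment before any sample values are trusted.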
Examples from event-c (1 sec single-shot):
@@ -1008,26 +1042,25 @@ where the codec is most complex stop at the first ``30 04``.
Implementation: :func:`minimateplus.waveform_codec.decode_tran_initial`.
##### Segment header T-delta (PARTIAL 2026-05-11)
##### Multi-segment Tran continuation — OPEN
The 20-byte ``40 02`` segment header has its first 2 bytes ([0:2] of
payload) as an int16 BE Tran delta for the first sample of the new
segment. Verified across V70 (3 segments with 0 deltas) and SP0/JQ0
(1 segment with +1 delta). Other bytes of the segment header payload
are partially understood:
After segment 0 ends and the segment header's T_delta (bytes [0:2])
is applied, the next segment's blocks produce values that diverge from
truth by sample ~512. The block structure inside segment 1 is
identical to segment 0 (alternating ``10 NN`` / ``20 NN`` data +
``00 NN`` RLE), and the per-segment delta budget exactly matches the
segment size — V70 segment 1 has 264 nibble-deltas + 244 RLE-zeros =
508 = the segment's sample count. Cumulative deltas are correct in
aggregate (V70 net-zero ≈ truth net-zero) but the per-sample trajectory
is wrong when applied as Tran continuation.
| Payload offset | Field | Status |
|---|---|---|
| [0:2] | T_delta at first sample of new segment (int16 BE) | ✅ confirmed |
| [2:4] | unknown (often 0; not a simple V or T delta) | ❓ open |
| [4:6] | unknown (varies per event; possibly a checksum) | ❓ open |
| [6:8] | byte length to next segment header 2 (uint16 BE) | ✅ confirmed |
| [8:12] | monotonic uint32 LE counter | ✅ confirmed |
| [12:14] | constant ``02 00`` | ✅ confirmed |
| [14:18] | unknown 4-byte field | ❓ open |
Multi-segment Tran decoding diverges after sample ~512 — the per-segment
channel ordering after the header is still unknown.
The strongest unverified hypothesis is that **segments rotate
channels**: segment 0 = Tran, segment 1 = Vert, segment 2 = Long,
segment 3 = Mic, segment 4 = Tran continuation, … This would explain
the per-segment delta-budget match while also explaining why segment
1 isn't Tran continuation. Verification needs the per-channel anchor
to come from segment-header bytes [4:6] or [14:18], which are still
open.
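To make the rotation hypothesis concrete, the mapping it implies can be written down in a few lines. This is a sketch of the unverified hypothesis only, not of the decoder: `ROTATION`, `hypothesized_channel`, and `group_segments_by_channel` are hypothetical names, and the channel order is taken directly from the paragraph above.

```python
# Channel order assumed by the (unverified) rotation hypothesis.
ROTATION = ("Tran", "Vert", "Long", "Mic")


def hypothesized_channel(segment_index: int) -> str:
    """Channel a segment would carry if segments rotate through channels."""
    if segment_index < 0:
        raise ValueError("segment_index must be non-negative")
    return ROTATION[segment_index % len(ROTATION)]


def group_segments_by_channel(n_segments: int) -> dict[str, list[int]]:
    """Group segment indices 0..n_segments-1 by hypothesized channel."""
    groups: dict[str, list[int]] = {name: [] for name in ROTATION}
    for index in range(n_segments):
        groups[hypothesized_channel(index)].append(index)
    return groups
```

Under this mapping, a test of the hypothesis would decode segments 0, 4, 8, ... as one continuous Tran stream and compare against truth, instead of treating segment 1 as Tran continuation.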
##### What's still open