docs: clean up waveform-codec doc layers per review

Three "truth layers" had drifted apart between commits.  Fixed:

1. waveform_codec.py docstring rewritten from the 2026-05-08
   "structural framing only" state to the 2026-05-11 "Tran segment 0
   solved + segment-header partially decoded" state.  Killed stale
   "~80 sample-sets per segment" language (real segments are
   flash-page-byte-sized, not sample-count-sized; observed first-segment
   sizes are 42-510 samples depending on signal).  Killed stale
   "preamble is 7 or 9 bytes" language (always 7).

2. docs/instantel_protocol_reference.md §7.6.1: added a clear
   "CURRENT STATUS" box at the top with a status table.  Replaced the
   stale "~80 sample-sets" line with the verified per-event segment
   sizes.  Merged two redundant segment-header field-table sections.

3. docs/waveform_codec_re_status.md (NEW): clean working-status doc.
   Solved / not solved / hypothesis / next experiment / fixtures /
   tests.  The protocol reference remains the historical Rosetta
   Stone; this new file is the current-truth working note that
   shouldn't accumulate fossil layers.

4. CLAUDE.md §"Waveform body codec": prominent warning box at top —
   "DO NOT TRUST decoded sample arrays yet."  BW binary passthrough
   is the only sample-bearing output to trust until the decoder
   lands.  Added a "Next experiment" subsection pointing the next
   pass at the segment-channel scoring analyzer.

40 tests still pass.
Claude
2026-05-12 02:43:25 +00:00
committed by serversdown
parent 5bf5329369
commit f68ee9f0f9
4 changed files with 385 additions and 139 deletions
+77 -44
@@ -860,20 +860,39 @@ MicL: 39 64 1D AA = 0.0000875 psi
---
#### 7.6.1 Blast / Waveform mode — 🟡 STRUCTURAL FRAMING + TRAN CODEC DECODED (2026-05-11)
#### 7.6.1 Blast / Waveform mode — 🟡 PARTIAL DECODE (2026-05-11)
> **Status (2026-05-11):** Block-level framing is solved. The Tran-channel
> encoding (preamble + first data block) is **fully verified** against the
> 3-event May 11 2026 high-amplitude bundle (PPV 6-7 in/s) and the 4-event
> May 8 bundle. Verts / Long / MicL channel encodings and multi-block
> Tran continuation are **still open**. The previous int16 LE claim
> remains REFUTED (see history below).
> ### 📌 CURRENT STATUS — read this first
>
> The earlier "4-channel interleaved s16 LE, 8 bytes per sample-set"
> claim was never validated and was wrong. No event in the project's
> archive ever came close to ADC saturation, yet the int16 LE decoder
> consistently produced full-scale ±32K noise — that was the signature
> of mis-aligned encoded data, not signal saturation.
> The body codec is **partially decoded** as of 2026-05-11. This
> section contains both current-truth spec AND historical retractions;
> where they conflict, the working summary at
> `docs/waveform_codec_re_status.md` takes precedence.
>
> | Item | Status |
> |---|---|
> | Body has tagged variable-length blocks, NOT raw int16 LE | ✅ confirmed |
> | 5 block tag types (10/20/00/30/40 NN) with lengths | ✅ confirmed |
> | 7-byte preamble: `00 02 00` + Tran[0] + Tran[1] int16 BE | ✅ confirmed |
> | `00 NN` = RLE for zero deltas in the current channel | ✅ confirmed |
> | Tran channel, segment 0 (42-510 samples / event) | ✅ byte-exact, 5/5 events |
> | Multi-segment Tran continuation | ❌ open (breaks at sample ~512) |
> | Vert / Long / MicL channel decoders | ❌ open |
> | `30 NN` block content (loud-from-start events) | ❌ open |
> | Earlier "raw int16 LE, 8 bytes per sample-set" claim | ❌ REFUTED |
>
> **Production code in `client.py:_decode_a5_waveform` still uses the
> broken int16 LE decoder.** The `.h5` sidecars SFM produces contain
> wrong sample values and must be treated as "unverified" downstream.
> The BW binary write path is unaffected (it's pure passthrough of the
> device's flash bytes, no decoding) and remains byte-perfect.
The "4-channel interleaved s16 LE, 8 bytes per sample-set" claim that
appeared in earlier revisions of this section was never validated and
was wrong. No event in the project's archive ever came close to ADC
saturation, yet the int16 LE decoder consistently produced full-scale
±32K noise — that was the signature of mis-aligned encoded data, not
signal saturation.
##### Body file layout
@@ -932,23 +951,38 @@ followed by a ``00 NN`` marker before the next data block.
##### Segments
The body is divided into ~16 SEGMENTS for a 1280-sample event (= 1
segment per ~80 sample-sets), separated by ``40 02`` segment headers.
A 3328-sample event has ~42 segments.
The body is divided into segments separated by ``40 02`` segment headers.
**Segment size is variable** — bounded by a fixed device-flash byte
budget, not a fixed sample count. Quiet events fit more samples per
segment (RLE compacts zero deltas via ``00 NN`` markers); loud events
fit fewer. Observed first-segment sizes in the bundled fixtures:
The 18-byte ``40 02`` payload structure (CONFIRMED across all 4
fixtures by inspecting the increment of bytes [8:12]):
| Event | Segment 0 size (Tran samples) |
|---|---|
| SP0 (loud, 0.25s pretrig) | 510 |
| SV0 (loud-from-start) | 58 (stops at first ``30 NN``) |
| SS0 (loud-from-start) | 42 (stops at first ``30 04``) |
| JQ0 (Vert-heavy, quiet Tran) | 510 |
| V70 (Mic-heavy, quiet geos) | 510 |
| Offset | Length | Field |
|--------|--------|--------------------------------------------------|
| 0 | 4 | Anchor / channel state (open — see below) |
| 4 | 4 | Variable field (open) |
| 8 | 4 | uint32 LE counter — increments by 1 per segment |
| 12 | 4 | Fixed pattern ``02 00 00 01`` |
| 16 | 2 | Variable tail |
⚠️ Earlier drafts of this section claimed "~80 sample-sets per segment"
based on incomplete walks; that figure is wrong. Segments are
flash-page-sized in bytes, not sample-count-sized.
The counter at bytes [8:12] starts in the 0x40s for a freshly-erased
device and increments cleanly — useful as a structural sanity check.
The 18-byte ``40 02`` payload structure:
| Offset | Field | Status |
|---|---|---|
| [0:2] | T_delta at first sample of new segment (int16 BE, in 16-count units) | ✅ confirmed |
| [2:4] | Likely T_delta at sample seg_start+1 | 🟡 likely |
| [4:6] | Unknown (varies; possibly a checksum) | ❓ open |
| [6:8] | Byte length to next segment header 2 (uint16 BE; useful for walker pre-scan) | ✅ confirmed |
| [8:12] | Monotonic uint32 LE counter (starts ~0x47, increments by 1 per segment) | ✅ confirmed |
| [12:14] | Constant ``02 00`` | ✅ confirmed |
| [14:18] | Unknown 4-byte field | ❓ open |
Examples from event-c (1 sec single-shot):
@@ -1008,26 +1042,25 @@ where the codec is most complex stop at the first ``30 04``.
Implementation: :func:`minimateplus.waveform_codec.decode_tran_initial`.
##### Segment header T-delta (PARTIAL 2026-05-11)
##### Multi-segment Tran continuation — OPEN
The 20-byte ``40 02`` segment header has its first 2 bytes ([0:2] of
payload) as an int16 BE Tran delta for the first sample of the new
segment. Verified across V70 (3 segments with 0 deltas) and SP0/JQ0
(1 segment with +1 delta). Other bytes of the segment header payload
are partially understood:
After segment 0 ends and the segment header's T_delta (bytes [0:2])
is applied, the next segment's blocks produce values that diverge from
truth by sample ~512. The block structure inside segment 1 is
identical to segment 0 (alternating ``10 NN`` / ``20 NN`` data +
``00 NN`` RLE), and the per-segment delta budget exactly matches the
segment size — V70 segment 1 has 264 nibble-deltas + 244 RLE-zeros =
508 = the segment's sample count. Cumulative deltas are correct in
aggregate (V70 net-zero ≈ truth net-zero) but the per-sample trajectory
is wrong when applied as Tran continuation.
| Payload offset | Field | Status |
|---|---|---|
| [0:2] | T_delta at first sample of new segment (int16 BE) | ✅ confirmed |
| [2:4] | unknown (often 0; not a simple V or T delta) | ❓ open |
| [4:6] | unknown (varies per event; possibly a checksum) | ❓ open |
| [6:8] | byte length to next segment header 2 (uint16 BE) | ✅ confirmed |
| [8:12] | monotonic uint32 LE counter | ✅ confirmed |
| [12:14] | constant ``02 00`` | ✅ confirmed |
| [14:18] | unknown 4-byte field | ❓ open |
Multi-segment Tran decoding diverges after sample ~512 — the per-segment
channel ordering after the header is still unknown.
The strongest unverified hypothesis is that **segments rotate
channels**: segment 0 = Tran, segment 1 = Vert, segment 2 = Long,
segment 3 = Mic, segment 4 = Tran continuation, … This would explain
the per-segment delta-budget match while also explaining why segment
1 isn't Tran continuation. Verification needs the per-channel anchor
to come from segment-header bytes [4:6] or [14:18], which are still
open.
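The per-segment delta-budget comparison can be sketched by summing the NN fields of the sample-producing blocks until the next header. `delta_budget` is a hypothetical helper, not part of `minimateplus.waveform_codec`, and the sketch assumes it is handed the byte range that starts just after one ``40 02`` segment header.

```python
from collections import Counter


def delta_budget(segment: bytes) -> Counter:
    """Sketch only (hypothetical helper): samples contributed per block type in one segment."""
    budget: Counter = Counter()
    pos = 0
    while pos + 2 <= len(segment):
        tag, nn = segment[pos], segment[pos + 1]
        if tag == 0x10:    # NN nibble deltas in NN/2 payload bytes
            budget["nibble"] += nn
            pos += nn // 2 + 2
        elif tag == 0x20:  # NN int8 deltas
            budget["int8"] += nn
            pos += nn + 2
        elif tag == 0x00:  # RLE: NN repeats of the current value
            budget["rle"] += nn
            pos += 2
        else:              # next 40 02 header, or an undecoded 30 NN block
            break
    return budget
```

The V70 segment 1 figures quoted above correspond to a budget of 264 nibble deltas plus 244 RLE repeats, 508 in total.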
##### What's still open
+172
@@ -0,0 +1,172 @@
# Waveform body codec — current working status (2026-05-11)
This is the **clean working note** for the body-codec reverse-engineering
effort. It supersedes scattered claims elsewhere when they conflict.
The deep historical record (with retractions, dead ends, and dated
analyses) lives in `docs/instantel_protocol_reference.md §7.6.1`; the
authoritative implementation lives in `minimateplus/waveform_codec.py`.
## TL;DR
The Blastware waveform-file body is a **tagged variable-length block
stream**, NOT raw int16 LE samples. Block framing is solved. Tran
channel segment-0 decoding is solved (byte-exact vs BW's ASCII export
across all 5 high-amplitude fixture events). Multi-segment continuation
and the Vert / Long / MicL channel decoders are still open.
**Production code in `minimateplus/client.py:_decode_a5_waveform` still
uses the broken legacy int16 LE decoder.** Sample arrays it writes to
the `.h5` sidecars are wrong and must be treated as "unverified" by all
downstream consumers. The BW binary write path (`blastware_file.py`)
is unaffected — it's pure passthrough and remains byte-perfect.
## What's solved
### Block framing
| Tag | Length | Meaning |
|---|---|---|
| `10 NN` | NN/2 + 2 bytes | 4-bit nibble deltas, 2 per byte, high nibble first; 4-bit two's complement (0..7 = +0..+7, 8..F = -8..-1) |
| `20 NN` | NN + 2 bytes | int8 signed deltas (1 per byte) |
| `00 NN` | 2 bytes | RLE: append NN copies of current value |
| `30 NN` | NN*2 in data section, NN*4 in trailer | Unknown content. Only in loud-from-start events. |
| `40 02` | 20 bytes (fixed) | Segment header |
NN is always a multiple of 4, except in the fixed `40 02` segment header.
Implementation: `walk_body()` in `minimateplus/waveform_codec.py`.
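A minimal sketch of the framing rules in the table above, assuming blocks begin immediately after the 7-byte preamble. `iter_blocks`, `block_length` and `Block` are hypothetical names for illustration, not the real `walk_body()`. Because the `30 NN` length rule differs between the data section and the trailer, the sketch stops at the first `30 NN` instead of guessing.

```python
from typing import Iterator, NamedTuple

PREAMBLE_LEN = 7  # 00 02 00 + Tran[0] + Tran[1]


class Block(NamedTuple):
    offset: int  # byte offset of the 2-byte tag within the body
    tag: int     # first tag byte (0x10, 0x20, 0x00 or 0x40)
    nn: int      # second tag byte (NN)
    length: int  # total block length in bytes, tag included


def block_length(tag: int, nn: int) -> int:
    """Total block length (tag included) per the framing table."""
    if tag == 0x10:
        return nn // 2 + 2  # NN nibble deltas packed two per byte
    if tag == 0x20:
        return nn + 2       # NN int8 deltas, one per byte
    if tag == 0x00:
        return 2            # RLE marker carries no payload
    if tag == 0x40 and nn == 0x02:
        return 20           # fixed-size segment header
    raise ValueError(f"unhandled tag {tag:02X} {nn:02X}")


def iter_blocks(body: bytes, start: int = PREAMBLE_LEN) -> Iterator[Block]:
    """Sketch only: yield contiguous blocks; stop at the first 30 NN or end of body."""
    pos = start
    while pos + 2 <= len(body):
        tag, nn = body[pos], body[pos + 1]
        if tag == 0x30:
            return  # 30 NN length depends on data vs trailer section; not modelled
        length = block_length(tag, nn)
        if pos + length > len(body):
            raise ValueError(f"block at offset {pos} overruns the body")
        yield Block(pos, tag, nn, length)
        pos += length
```

Contiguity (each block's `offset + length` equals the next block's `offset`) holds by construction, which is the property the walker-contiguity tests check.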
### 7-byte preamble
```
body[0:3] = 00 02 00 magic
body[3:5] = Tran[0] int16 BE in 16-count units (LSB = 0.005 in/s)
body[5:7] = Tran[1] int16 BE in 16-count units
```
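Reading the preamble comes down to a magic check and two big-endian `int16` unpacks. This sketch assumes the 0.005 in/s-per-unit scale stated above applies directly to the raw anchor values; `parse_preamble` and `units_to_in_per_s` are hypothetical helper names.

```python
import struct

PREAMBLE_MAGIC = b"\x00\x02\x00"
IN_PER_S_PER_UNIT = 0.005  # LSB of one 16-count unit, as stated for the preamble


def parse_preamble(body: bytes) -> tuple[int, int]:
    """Sketch only: return the raw Tran[0], Tran[1] anchors (int16 BE, 16-count units)."""
    if len(body) < 7:
        raise ValueError("body is shorter than the 7-byte preamble")
    if body[0:3] != PREAMBLE_MAGIC:
        raise ValueError(f"unexpected magic {body[0:3].hex(' ')}")
    tran0, tran1 = struct.unpack(">hh", body[3:7])  # two big-endian signed int16
    return tran0, tran1


def units_to_in_per_s(units: int) -> float:
    """Convert a raw 16-count-unit value to in/s (assumed linear scale)."""
    return units * IN_PER_S_PER_UNIT
```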
### Tran channel, segment 0
Segment 0 (everything before the first `40 02`) encodes Tran samples
only. Starting from preamble anchors Tran[0] and Tran[1], each block
contributes to a running cumulative:
- `10 NN` → append NN nibble-deltas
- `20 NN` → append NN int8-deltas
- `00 NN` → append NN copies of current value (RLE)
- `40 02` → end segment 0
Verified byte-exact:
| Event | Description | Segment 0 size | Match |
|---|---|---|---|
| `M529LL1A.SP0` | Loud, 0.25 s pretrig | 510 | 510/510 ✓ |
| `M529LL1A.SV0` | Loud from sample 0 | 58 | 58/58 ✓ (stops at first `30 NN`) |
| `M529LL1A.SS0` | Loud from sample 0 | 42 | 42/42 ✓ (stops at first `30 04`) |
| `M529LL1L.JQ0` | Vert-heavy | 510 | 510/510 ✓ |
| `M529LL1L.V70` | Mic-heavy (140 dB) | 510 | 510/510 ✓ |
Implementation: `decode_tran_initial()`.
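The accumulation rule can be sketched as below, assuming nibble and int8 deltas share the 16-count units of the preamble anchors. `decode_segment0_tran` and `_nibbles` are hypothetical names; this illustrates the rule as described rather than reproducing `decode_tran_initial()`. Matching the SV0/SS0 rows above, it stops at the first `30 NN`.

```python
import struct
from typing import Iterator


def _nibbles(data: bytes) -> Iterator[int]:
    """Signed 4-bit deltas, high nibble first (0..7 -> +0..+7, 8..F -> -8..-1)."""
    for byte in data:
        for nib in (byte >> 4, byte & 0x0F):
            yield nib - 16 if nib >= 8 else nib


def decode_segment0_tran(body: bytes) -> list[int]:
    """Sketch only: accumulate Tran samples from the anchors until 40 02 (or 30 NN)."""
    samples = list(struct.unpack(">hh", body[3:7]))  # Tran[0], Tran[1]
    pos = 7
    while pos + 2 <= len(body):
        tag, nn = body[pos], body[pos + 1]
        if tag == 0x10:    # NN nibble deltas packed into NN/2 bytes
            for d in _nibbles(body[pos + 2 : pos + 2 + nn // 2]):
                samples.append(samples[-1] + d)
            pos += nn // 2 + 2
        elif tag == 0x20:  # NN signed int8 deltas
            for d in struct.unpack(f"{nn}b", body[pos + 2 : pos + 2 + nn]):
                samples.append(samples[-1] + d)
            pos += nn + 2
        elif tag == 0x00:  # RLE: NN copies of the current value
            samples.extend([samples[-1]] * nn)
            pos += 2
        else:              # 40 02 ends segment 0; 30 NN content is still undecoded
            break
    return samples
```

On a full segment 0 this yields 2 anchors plus 508 deltas, which is the 510-sample size reported for SP0, JQ0 and V70.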
### Segment header (`40 02`, 20 bytes total)
| Payload offset | Field | Status |
|---|---|---|
| [0:2] | T_delta at first sample of new segment (int16 BE) | ✅ confirmed |
| [2:4] | Likely T_delta at sample seg_start+1 | 🟡 likely |
| [4:6] | Unknown (possibly checksum) | ❓ open |
| [6:8] | Byte length to next segment header 2 (uint16 BE) | ✅ confirmed |
| [8:12] | Monotonic uint32 LE counter (starts ~0x47) | ✅ confirmed |
| [12:14] | Constant `02 00` | ✅ confirmed |
| [14:18] | Unknown 4-byte field | ❓ open |
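Parsing the confirmed fields takes a few `struct` unpacks over the 18-byte payload; payload offsets in the table are relative to the byte after the `40 02` tag. `SegmentHeader` and `parse_segment_header` are hypothetical names, and the open or unverified fields are kept as raw bytes rather than interpreted.

```python
import struct
from typing import NamedTuple


class SegmentHeader(NamedTuple):
    t_delta: int      # payload [0:2], int16 BE (confirmed)
    next_len: int     # payload [6:8], uint16 BE byte length to next header (confirmed)
    counter: int      # payload [8:12], uint32 LE monotonic counter (confirmed)
    raw_2_4: bytes    # payload [2:4], likely T_delta at seg_start+1 (unverified)
    raw_4_6: bytes    # payload [4:6], open
    raw_14_18: bytes  # payload [14:18], open


def parse_segment_header(block: bytes) -> SegmentHeader:
    """Sketch only: decode a full 20-byte 40 02 block."""
    if len(block) != 20 or block[0:2] != b"\x40\x02":
        raise ValueError("expected a 20-byte block starting with 40 02")
    p = block[2:]  # 18-byte payload
    if p[12:14] != b"\x02\x00":
        raise ValueError(f"constant field mismatch: {p[12:14].hex(' ')}")
    (t_delta,) = struct.unpack(">h", p[0:2])
    (next_len,) = struct.unpack(">H", p[6:8])
    (counter,) = struct.unpack("<I", p[8:12])
    return SegmentHeader(t_delta, next_len, counter, p[2:4], p[4:6], p[14:18])
```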
## What's still open
1. **Multi-segment Tran continuation.** After segment 0, applying
segment 1's blocks as Tran continuation diverges from truth by
sample ~512. Block structure is identical to segment 0 and the
per-segment delta budget matches the segment size, but the
per-sample trajectory is wrong.
2. **Vert / Long / MicL channel decoders.** No verified decoder for
any non-Tran channel.
3. **`30 NN` block content.** Only appears in loud-from-start events.
Probably a channel-switch or alternative-encoding marker for
high-amplitude regions. Walker steps over it without decoding.
## Strongest unverified hypothesis
Segments rotate channels:
```
segment 0 → Tran samples 0..509
segment 1 → Vert samples 0..507
segment 2 → Long samples 0..507
segment 3 → Mic samples 0..507
segment 4 → Tran samples 510..N (continuation)
...
```
This would explain:
- Why segment-0 = Tran works perfectly.
- Why segment 1 has the same block structure but applying it as Tran
continuation gives wrong values.
- Why the per-segment delta budget matches the segment size for a
*single* channel (508 deltas per segment, not 4 × 508).
Not yet verified because the per-channel anchor at segment-start isn't
identified in the segment header. Bytes [4:6] and [14:18] of the
header are the prime candidates.
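Under the rotation hypothesis, the expected (channel, sample range) of each segment follows mechanically from the segment sizes. The hypothetical `rotation_layout` helper below only encodes the hypothesis so that a scorer has something to compare against; it verifies nothing.

```python
CHANNELS = ("Tran", "Vert", "Long", "Mic")  # rotation order assumed by the hypothesis


def rotation_layout(segment_sizes: list[int]) -> list[tuple[str, int, int]]:
    """Sketch only: map each segment to (channel, first_sample, last_sample)."""
    next_start = {ch: 0 for ch in CHANNELS}
    layout = []
    for i, size in enumerate(segment_sizes):
        ch = CHANNELS[i % len(CHANNELS)]
        start = next_start[ch]
        layout.append((ch, start, start + size - 1))
        next_start[ch] = start + size  # this channel continues in its next segment
    return layout
```

For segment sizes `[510, 508, 508, 508, 508]` it reproduces the listing above, with segment 4 continuing Tran at sample 510.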
## Next experiment — segment-channel scoring analyzer
Don't try to hero-code the full decoder. Instead, build a small
analysis tool that:
1. For each segment in every fixture event, runs the segment-0 Tran
decoder (block-walk + RLE) and produces a cumulative trajectory
of 508 deltas.
2. Scores that trajectory against the BW ASCII truth for *each* of
{Tran, Vert, Long, MicL} over the segment's sample range, starting
from different anchor-byte candidates from the segment header.
3. Reports which (channel, anchor-bytes-location) combination produces
the lowest error for each segment.
If the rotation hypothesis is right, segment 0 should clearly score
best against Tran, segment 1 against Vert, etc. The winning
anchor-bytes-location will reveal which segment-header bytes encode
the per-segment channel anchors.
If the rotation hypothesis is *not* right, the scorer will at least
narrow down what segment 1 actually carries.
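Steps 1 to 3 could be sketched as follows. Everything here is hypothetical: the helper names, RMS error as the score, the convention that the anchor is the segment's first sample, and the `truth` dict, which is assumed to hold BW ASCII samples already sliced to the segment's range and converted to the same 16-count units.

```python
import math
from itertools import accumulate


def trajectory(anchor: int, deltas: list[int]) -> list[int]:
    """Cumulative values: the anchor, then anchor plus the running sum of deltas."""
    return list(accumulate(deltas, initial=anchor))


def rms_error(pred: list[int], truth: list[int]) -> float:
    """RMS difference over the overlapping prefix (inf if there is no overlap)."""
    n = min(len(pred), len(truth))
    if n == 0:
        return math.inf
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, truth)) / n)


def score_segment(deltas: list[int],
                  anchor_candidates: dict[str, int],
                  truth: dict[str, list[int]]) -> list[tuple[float, str, str]]:
    """Sketch only: rank (error, channel, anchor_source) triples, best first."""
    results = []
    for source, anchor in anchor_candidates.items():
        pred = trajectory(anchor, deltas)
        for channel, samples in truth.items():
            results.append((rms_error(pred, samples), channel, source))
    results.sort()
    return results
```

If the rotation hypothesis holds, the best-ranked entry for segment 1 should name Vert, and its anchor source points at the header bytes that carry the per-channel anchor.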
## Test fixtures
Committed under `tests/fixtures/`:
- `decode-re-5-8-26/event-a..event-d/`: original quiet bundle (4 events,
PPV < 1 in/s). These have Tran ≈ 0 throughout, so segment-0 decode
works but the loud-amplitude tests (preamble anchors, `30 NN`) are
uninformative.
- `5-11-26/M529LL1A.{SP0,SS0,SV0}`: loud bundle (PPV 6-7 in/s on all
channels). These cracked the Tran codec.
- `5-11-26/M529LL1L.{JQ0,V70}`: targeted captures. JQ0 is Vert-heavy,
V70 is Mic-heavy (140 dB). These cracked the `00 NN` RLE rule.
Each fixture has a `.TXT` Blastware ASCII export as ground truth.
## Tests
`tests/test_waveform_codec.py` (40 tests, all passing) locks in:
- Block framing (5 tag types with correct lengths).
- Walker contiguity (no gaps or overlaps).
- Segment header parsing (counter monotonicity, fixed-pattern check).
- `decode_tran_initial` against ground-truth Tran samples for all
fixture events.
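A counter-monotonicity check like the one listed above needs only the uint32 LE at payload [8:12], which sits at byte offset 10 of the full 20-byte `40 02` block. `check_counters` is a hypothetical helper, not taken from `tests/test_waveform_codec.py`.

```python
import struct


def check_counters(headers: list[bytes]) -> list[int]:
    """Sketch only: return segment counters, raising if any step is not +1."""
    counters = [struct.unpack_from("<I", h, 10)[0] for h in headers]  # payload [8:12]
    for i, (prev, cur) in enumerate(zip(counters, counters[1:]), start=1):
        if cur != prev + 1:
            raise ValueError(f"header {i}: counter {cur:#x} does not follow {prev:#x}")
    return counters
```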
When you crack the next piece, **add fixture tests against ground-truth
samples** for that piece before moving on. Don't let unverified code
ship without a regression lock-in.