docs: clean up waveform-codec doc layers per review
Three "truth layers" had drifted apart between commits. Fixed:

1. waveform_codec.py docstring rewritten from the 2026-05-08 "structural framing only" state to the 2026-05-11 "Tran segment 0 solved + segment-header partially decoded" state. Killed stale "~80 sample-sets per segment" language (real segments are flash-page-byte-sized, not sample-count-sized; observed first-segment sizes are 42-510 samples depending on signal). Killed stale "preamble is 7 or 9 bytes" language (always 7).

2. docs/instantel_protocol_reference.md §7.6.1: added a clear "CURRENT STATUS" box at the top with a status table. Replaced the stale "~80 sample-sets" line with the verified per-event segment sizes. Merged two redundant segment-header field-table sections.

3. docs/waveform_codec_re_status.md (NEW): clean working-status doc. Solved / not solved / hypothesis / next experiment / fixtures / tests. The protocol reference remains the historical Rosetta Stone; this new file is the current-truth working note that shouldn't accumulate fossil layers.

4. CLAUDE.md §"Waveform body codec": prominent warning box at top: "DO NOT TRUST decoded sample arrays yet." BW binary passthrough is the only sample-bearing output to trust until the decoder lands. Added a "Next experiment" subsection pointing the next pass at the segment-channel scoring analyzer.

40 tests still pass.
@@ -0,0 +1,172 @@
# Waveform body codec: current working status (2026-05-11)

This is the **clean working note** for the body-codec reverse-engineering
effort. It supersedes scattered claims elsewhere when they conflict.
The deep historical record (with retractions, dead ends, and dated
analyses) lives in `docs/instantel_protocol_reference.md §7.6.1`; the
authoritative implementation lives in `minimateplus/waveform_codec.py`.

## TL;DR

The Blastware waveform-file body is a **tagged variable-length block
stream**, NOT raw int16 LE samples. Block framing is solved. Tran
channel segment-0 decoding is solved (byte-exact vs BW's ASCII export
across all 5 high-amplitude fixture events). Multi-segment continuation
and the Vert / Long / MicL channel decoders are still open.

**Production code in `minimateplus/client.py:_decode_a5_waveform` still
uses the broken legacy int16 LE decoder.** Sample arrays it writes to
the `.h5` sidecars are wrong and must be treated as "unverified" by all
downstream consumers. The BW binary write path (`blastware_file.py`)
is unaffected: it is pure passthrough and remains byte-perfect.

## What's solved

### Block framing

| Tag     | Length                                | Meaning |
|---------|---------------------------------------|---------|
| `10 NN` | NN/2 + 2 bytes                        | 4-bit nibble deltas (2 per byte, high nibble first; signed: 0..7 = 0..+7, 8..F = -8..-1) |
| `20 NN` | NN + 2 bytes                          | int8 signed deltas (1 per byte) |
| `00 NN` | 2 bytes                               | RLE: append NN copies of current value |
| `30 NN` | NN*2 in data section, NN*4 in trailer | Unknown content. Only in loud-from-start events. |
| `40 02` | 20 bytes (fixed)                      | Segment header |

NN is always a multiple of 4. Lengths for `10 NN`, `20 NN`, `00 NN`, and
`40 02` include the 2-byte tag.

Implementation: `walk_body()` in `minimateplus/waveform_codec.py`.

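
The framing table can be sketched as a small block walker. This is an illustration only, not the code in `waveform_codec.py`: `Block`, `block_length`, and `walk_blocks` are hypothetical names, it assumes blocks start right after the 7-byte preamble, and it deliberately refuses `30 NN` because that tag's length depends on where it appears.

```python
from typing import Iterator, NamedTuple

PREAMBLE_LEN = 7  # assumption: the block stream starts right after the preamble


class Block(NamedTuple):
    offset: int  # offset of the tag byte within the body
    tag: int     # first tag byte (0x00, 0x10, 0x20, 0x40)
    nn: int      # second tag byte
    length: int  # total block length in bytes, 2-byte tag included


def block_length(tag: int, nn: int) -> int:
    """Total block length (tag included) for the tags with a fixed rule."""
    if tag == 0x10:
        return nn // 2 + 2  # NN nibble deltas, two per byte
    if tag == 0x20:
        return nn + 2       # NN int8 deltas, one per byte
    if tag == 0x00:
        return 2            # RLE carries no payload
    if tag == 0x40 and nn == 0x02:
        return 20           # segment header
    # 0x30 ends up here on purpose: its length differs between the data
    # section and the trailer, which this sketch does not model.
    raise ValueError(f"unsupported tag {tag:02X} {nn:02X}")


def walk_blocks(body: bytes, start: int = PREAMBLE_LEN) -> Iterator[Block]:
    """Yield contiguous blocks until fewer than two bytes remain."""
    pos = start
    while pos + 2 <= len(body):
        tag, nn = body[pos], body[pos + 1]
        length = block_length(tag, nn)
        if pos + length > len(body):
            raise ValueError(f"block at offset {pos} overruns the body")
        yield Block(pos, tag, nn, length)
        pos += length
```

Because each block starts where the previous one ends, the yielded offsets give the same "no gaps or overlaps" property that the walker tests check.
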
### 7-byte preamble

```
body[0:3] = 00 02 00            magic
body[3:5] = Tran[0]             int16 BE in 16-count units (LSB = 0.005 in/s)
body[5:7] = Tran[1]             int16 BE in 16-count units
```

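
A minimal sketch of reading the preamble with `struct`. `parse_preamble` and the constant names are hypothetical; the function returns the raw anchor integers and leaves unit conversion to the caller. The 0.005 in/s scale is copied from the layout above and is not applied here.

```python
import struct

PREAMBLE_MAGIC = bytes.fromhex("000200")
IN_PER_S_PER_LSB = 0.005  # scale quoted in the layout above; not applied below


def parse_preamble(body: bytes) -> tuple[int, int]:
    """Return (Tran[0], Tran[1]) as raw big-endian int16 values."""
    if len(body) < 7:
        raise ValueError("body is shorter than the 7-byte preamble")
    if body[0:3] != PREAMBLE_MAGIC:
        raise ValueError(f"bad preamble magic: {body[0:3].hex()}")
    tran0, tran1 = struct.unpack(">hh", body[3:7])
    return tran0, tran1
```

For example, `parse_preamble(bytes.fromhex("0002000010fff0"))` returns `(16, -16)`.
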
### Tran channel, segment 0

Segment 0 (everything before the first `40 02`) encodes Tran samples
only. Starting from the preamble anchors Tran[0] and Tran[1], each block
contributes to a running cumulative sum:

- `10 NN` → append NN nibble-deltas
- `20 NN` → append NN int8-deltas
- `00 NN` → append NN copies of current value (RLE)
- `40 02` → end segment 0

Verified byte-exact:

| Event | Description | Segment 0 size (samples) | Match |
|---|---|---|---|
| `M529LL1A.SP0` | Loud, 0.25 s pretrig | 510 | 510/510 ✓ |
| `M529LL1A.SV0` | Loud from sample 0 | 58 | 58/58 ✓ (stops at first `30 NN`) |
| `M529LL1A.SS0` | Loud from sample 0 | 42 | 42/42 ✓ (stops at first `30 04`) |
| `M529LL1L.JQ0` | Vert-heavy | 510 | 510/510 ✓ |
| `M529LL1L.V70` | Mic-heavy (140 dB) | 510 | 510/510 ✓ |

Implementation: `decode_tran_initial()`.

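
The segment-0 rules can be sketched end to end as below. This is not `decode_tran_initial()` itself: `nibble_delta` and `decode_segment0_sketch` are hypothetical names, and the sketch assumes deltas are in the same units as the preamble anchors and that decoding continues from Tran[1]. It stops at the first tag that is not `10`, `20`, or `00`, which covers both the `40 02` header and the undecoded `30 NN` case from the table.

```python
import struct


def nibble_delta(n: int) -> int:
    """Sign-extend a 4-bit value: 0..7 stay as-is, 8..15 map to -8..-1."""
    return n - 16 if n >= 8 else n


def decode_segment0_sketch(body: bytes) -> list[int]:
    """Decode Tran samples from the preamble up to the first non-delta block."""
    if body[0:3] != b"\x00\x02\x00":
        raise ValueError("bad preamble magic")
    tran0, tran1 = struct.unpack(">hh", body[3:7])
    samples = [tran0, tran1]
    cur = tran1  # assumption: the running value continues from Tran[1]
    pos = 7
    while pos + 2 <= len(body):
        tag, nn = body[pos], body[pos + 1]
        if tag == 0x10:  # NN nibble deltas, high nibble first
            for byte in body[pos + 2 : pos + 2 + nn // 2]:
                for nib in (byte >> 4, byte & 0x0F):
                    cur += nibble_delta(nib)
                    samples.append(cur)
            pos += 2 + nn // 2
        elif tag == 0x20:  # NN signed int8 deltas
            for byte in body[pos + 2 : pos + 2 + nn]:
                cur += byte - 256 if byte >= 128 else byte
                samples.append(cur)
            pos += 2 + nn
        elif tag == 0x00:  # RLE: repeat the current value NN times
            samples.extend([cur] * nn)
            pos += 2
        else:
            break  # 40 02 ends segment 0; 30 NN is not decoded here
    return samples
```

With two anchors plus 508 deltas, a full segment 0 yields the 510 samples seen in the table.
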
### Segment header (`40 02`, 20 bytes total)

Offsets are relative to the first byte after the `40 02` tag (18 payload
bytes).

| Payload offset | Field | Status |
|---|---|---|
| [0:2] | T_delta at first sample of new segment (int16 BE) | ✅ confirmed |
| [2:4] | Likely T_delta at sample seg_start+1 | 🟡 likely |
| [4:6] | Unknown (possibly checksum) | ❓ open |
| [6:8] | Byte length to next segment header − 2 (uint16 BE) | ✅ confirmed |
| [8:12] | Monotonic uint32 LE counter (starts ~0x47) | ✅ confirmed |
| [12:14] | Constant `02 00` | ✅ confirmed |
| [14:18] | Unknown 4-byte field | ❓ open |

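
A sketch of unpacking the header fields listed above. `SegmentHeader` and `parse_segment_header` are hypothetical names, and reading [2:4] as int16 BE is an assumption taken from the "likely" row. The two open fields are kept as raw bytes so later analysis can try different readings.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class SegmentHeader:
    t_delta_first: int    # [0:2]   int16 BE (confirmed)
    t_delta_second: int   # [2:4]   int16 BE (likely; assumption)
    unknown_4_6: bytes    # [4:6]   open
    bytes_to_next: int    # [6:8]   uint16 BE, stored as length to next header minus 2
    counter: int          # [8:12]  uint32 LE, monotonic
    unknown_14_18: bytes  # [14:18] open


def parse_segment_header(block: bytes) -> SegmentHeader:
    """Parse one 20-byte `40 02` block (2-byte tag plus 18-byte payload)."""
    if len(block) != 20 or block[0:2] != b"\x40\x02":
        raise ValueError("not a 20-byte 40 02 segment header")
    payload = block[2:]
    if payload[12:14] != b"\x02\x00":
        raise ValueError("constant 02 00 missing at payload [12:14]")
    t_first, t_second = struct.unpack(">hh", payload[0:4])
    (bytes_to_next,) = struct.unpack(">H", payload[6:8])
    (counter,) = struct.unpack("<I", payload[8:12])
    return SegmentHeader(
        t_first, t_second, bytes(payload[4:6]),
        bytes_to_next, counter, bytes(payload[14:18]),
    )
```
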
## What's still open

1. **Multi-segment Tran continuation.** After segment 0, applying
   segment 1's blocks as Tran continuation diverges from truth by
   sample ~512. Block structure is identical to segment 0 and the
   per-segment delta budget matches the segment size, but the
   per-sample trajectory is wrong.

2. **Vert / Long / MicL channel decoders.** No verified decoder for
   any non-Tran channel.

3. **`30 NN` block content.** Only appears in loud-from-start events.
   Probably a channel-switch or alternative-encoding marker for
   high-amplitude regions. Walker steps over it without decoding.

## Strongest unverified hypothesis

Segments rotate channels:

```
segment 0  →  Tran samples 0..509
segment 1  →  Vert samples 0..507
segment 2  →  Long samples 0..507
segment 3  →  MicL samples 0..507
segment 4  →  Tran samples 510..N  (continuation)
...
```

This would explain:

- Why segment-0 = Tran works perfectly.
- Why segment 1 has the same block structure but applying it as Tran
  continuation gives wrong values.
- Why the per-segment delta budget matches the segment size for a
  *single* channel (508 deltas per segment, not 4 × 508).

Not yet verified because the per-channel anchor at segment-start isn't
identified in the segment header. Bytes [4:6] and [14:18] of the
header are the prime candidates.

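
Stated as code, the rotation hypothesis is just a modulo over the segment index. The names below are hypothetical and the mapping is unverified; it is written out only so the scoring experiment has an explicit prediction to check against.

```python
# Unverified hypothesis: segments cycle through the four channels in this order.
CHANNEL_ROTATION = ("Tran", "Vert", "Long", "MicL")


def hypothesized_channel(segment_index: int) -> str:
    """Channel that segment `segment_index` would carry if the rotation holds."""
    return CHANNEL_ROTATION[segment_index % len(CHANNEL_ROTATION)]
```
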
## Next experiment: segment-channel scoring analyzer

Don't try to hero-code the full decoder. Instead, build a small
analysis tool that:

1. For each segment in every fixture event, runs the segment-0 Tran
   decoder (block-walk + RLE) and produces a cumulative trajectory
   of 508 deltas.
2. Scores that trajectory against the BW ASCII truth for *each* of
   {Tran, Vert, Long, MicL} over the segment's sample range, starting
   from different anchor-byte candidates from the segment header.
3. Reports which (channel, anchor-bytes-location) combination produces
   the lowest error for each segment.

If the rotation hypothesis is right, segment 0 should clearly score
best against Tran, segment 1 against Vert, etc. The winning
anchor-bytes-location will reveal which segment-header bytes encode
the per-segment channel anchors.

If the rotation hypothesis is *not* right, the scorer will at least
narrow down what segment 1 actually carries.

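
The scoring step might look roughly like this. All names are hypothetical, and the sketch makes three assumptions: the deltas for the segment are already extracted, the truth samples are already converted to the decoder's integer units and sliced to the segment's sample range, and each candidate anchor is the value immediately before the first delta. Mean absolute error is one possible metric, not a settled choice.

```python
from itertools import accumulate
from typing import Mapping, Sequence


def trajectory(anchor: int, deltas: Sequence[int]) -> list[int]:
    """Cumulative sum of deltas starting from anchor (anchor itself excluded)."""
    return list(accumulate(deltas, initial=anchor))[1:]


def score(candidate: Sequence[int], truth: Sequence[int]) -> float:
    """Mean absolute error over the overlapping samples."""
    n = min(len(candidate), len(truth))
    if n == 0:
        return float("inf")
    return sum(abs(c - t) for c, t in zip(candidate, truth)) / n


def best_match(
    deltas: Sequence[int],
    anchors: Mapping[str, int],
    truth: Mapping[str, Sequence[int]],
) -> tuple[float, str, str]:
    """Return (error, channel, anchor_name) for the lowest-error combination."""
    results = []
    for channel, samples in truth.items():
        for anchor_name, anchor in anchors.items():
            err = score(trajectory(anchor, deltas), samples)
            results.append((err, channel, anchor_name))
    return min(results)
```

Running `best_match` once per segment and tabulating the winners gives the (channel, anchor-bytes-location) report described in step 3.
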
## Test fixtures

Committed under `tests/fixtures/`:

- `decode-re-5-8-26/event-a..event-d/`: original quiet bundle (4 events,
  PPV < 1 in/s). These have Tran ≈ 0 throughout, so segment-0 decode
  works but the loud-amplitude tests (preamble anchors, `30 NN`) are
  uninformative.
- `5-11-26/M529LL1A.{SP0,SS0,SV0}`: loud bundle (PPV 6-7 in/s on all
  channels). These cracked the Tran codec.
- `5-11-26/M529LL1L.{JQ0,V70}`: targeted captures. JQ0 is Vert-heavy,
  V70 is Mic-heavy (140 dB). These cracked the `00 NN` RLE rule.

Each fixture has a `.TXT` Blastware ASCII export as ground truth.

## Tests

`tests/test_waveform_codec.py` (40 tests, all passing) locks in:

- Block framing (5 tag types with correct lengths).
- Walker contiguity (no gaps or overlaps).
- Segment header parsing (counter monotonicity, fixed-pattern check).
- `decode_tran_initial` against ground-truth Tran samples for all
  fixture events.

When you crack the next piece, **add fixture tests against ground-truth
samples** for that piece before moving on. Don't let unverified code
ship without a regression lock-in.

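
One way to make such a lock-in informative when it fails is to report where the decode first diverges from truth, as with the sample ~512 divergence noted earlier. The helpers below are a hypothetical sketch and are not part of the existing test file.

```python
from typing import Optional, Sequence


def first_mismatch(decoded: Sequence[int], truth: Sequence[int]) -> Optional[int]:
    """Index of the first differing sample, or None if the sequences are identical."""
    for i, (d, t) in enumerate(zip(decoded, truth)):
        if d != t:
            return i
    if len(decoded) != len(truth):
        return min(len(decoded), len(truth))  # one sequence ended early
    return None


def assert_sample_exact(decoded: Sequence[int], truth: Sequence[int]) -> None:
    """Fail with the index of the first divergence rather than a bare inequality."""
    idx = first_mismatch(decoded, truth)
    assert idx is None, f"first divergence at sample {idx}"
```
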