The segment-channel scoring analyzer (from `scratch/next_experiment_skeleton.py`) ran and immediately confirmed the rotation hypothesis:

- SP0 seg 0: best fit Vert 508/508 ✓
- SP0 seg 1: best fit Long 508/508 ✓
- SP0 seg 3: best fit Tran 508/508 ✓ (Tran continuation)
- SP0 seg 5: best fit Long 508/508 ✓
- SP0 seg 9: best fit Long 508/508 ✓
- V70 seg 0: best fit Vert 508/508 ✓
- V70 seg 1: best fit Long 508/508 ✓

Channels rotate Tran → Vert → Long → MicL per `40 02` segment header.

Also discovered the segment header has DOUBLE duty: bytes [14:18] anchor the NEW segment's channel (2 samples as int16 BE in 16-count units), AND bytes [0:4] extend the PREVIOUS channel by 2 more samples (2 deltas as int16 BE). This is the same "2 anchors + delta stream" structure as the body preamble for Tran.

`decode_waveform_v2` now returns full per-channel sample dicts. Byte-exact verified ranges:

- V70: Tran 512, Vert 512, Long 512 (all first segments)
- JQ0: Tran 512, Vert 258
- SP0: Long 1536 (all 3 L segments)

Still open: the `30 NN` block format (high-amplitude packed deltas), which appears mid-segment when single-byte deltas can't carry the magnitude.

6 new tests bring the count to 46. All passing.

# Waveform body codec: current working status (2026-05-11, late)

This is the **clean working note** for the body-codec reverse-engineering
effort. It supersedes scattered claims elsewhere when they conflict.
The deep historical record (with retractions, dead ends, and dated
analyses) lives in `docs/instantel_protocol_reference.md §7.6.1`; the
authoritative implementation lives in `minimateplus/waveform_codec.py`.

## TL;DR

The Blastware waveform-file body is a **tagged variable-length block
stream**, NOT raw int16 LE samples. Block framing is solved. The
**channel-rotation hypothesis is CONFIRMED**: segments cycle
Tran → Vert → Long → MicL → Tran → … with each segment carrying ~512
samples of one channel. Each segment header carries the 2-sample anchor
pair of the channel it starts (payload bytes [14:18]) plus 2 continuation
deltas for the channel it ends (payload bytes [0:4]).

**What decodes byte-exact today (verified against BW ASCII export):**

| Event | Channel | Samples verified |
|---|---|---|
| V70 (Mic-heavy) | Tran | 512 (1 segment) |
| V70 | Vert | 512 |
| V70 | Long | 512 |
| JQ0 (Vert-heavy) | Tran | 512 |
| JQ0 | Vert | 258 |
| SP0 (loud on all channels) | Long | **1536 (all 3 L segments)** |
| SP0 | Tran | 1350 of 2044 produced |
| SP0 | Vert | 650 of 1526 produced |

**What's still open:** the `30 NN` block format. These blocks appear in
high-amplitude regions (deltas exceeding what int8 can express). The
decoder currently steps over them, which is fine for quiet stretches but
breaks the running cumulative whenever a `30 NN` carries information for
samples we need. Cracking this is the last major piece.

**Production code in `minimateplus/client.py:_decode_a5_waveform` still
uses the broken legacy int16 LE decoder.** The sample arrays it writes to
the `.h5` sidecars are wrong and must be treated as "unverified" by all
downstream consumers. The BW binary write path (`blastware_file.py`)
is unaffected: it is pure passthrough and remains byte-perfect.

## What's solved

### Block framing

| Tag      | Length                                            | Meaning                                                                                      |
|----------|---------------------------------------------------|----------------------------------------------------------------------------------------------|
| `10 NN`  | NN/2 + 2 bytes                                    | 4-bit nibble deltas (2 per byte, high nibble first; signed: 0..7 = +0..+7, 8..F = -8..-1)    |
| `20 NN`  | NN + 2 bytes                                      | int8 signed deltas (1 per byte)                                                              |
| `00 NN`  | 2 bytes                                           | RLE: append NN copies of current value                                                       |
| `30 NN`  | NN*2 bytes in data section, NN*4 bytes in trailer | Unknown content. Seen in high-amplitude regions.                                             |
| `40 02`  | 20 bytes (fixed)                                  | Segment header                                                                               |

All lengths include the 2-byte tag. NN is always a multiple of 4 (apart
from the fixed `40 02` segment-header tag).

Implementation: `walk_body()` in `minimateplus/waveform_codec.py`.

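The framing table can be turned into a walker in a few lines. The following is a minimal sketch, not the project's `walk_body()`: the names `block_length` and `walk_blocks` are hypothetical, the input is assumed to start at the first block (after the 7-byte preamble), and `30 NN` is sized with the data-section rule only (the trailer rule is ignored).

```python
from typing import Iterator, Tuple

SEGMENT_HEADER_LEN = 20  # fixed size of a `40 02` block, tag included


def block_length(tag: int, nn: int) -> int:
    """Total block length in bytes (tag included), per the framing table."""
    if tag == 0x10:
        return nn // 2 + 2      # NN nibble deltas, two per byte
    if tag == 0x20:
        return nn + 2           # NN int8 deltas
    if tag == 0x00:
        return 2                # RLE: the tag pair is the whole block
    if tag == 0x30:
        return nn * 2           # assumption: data-section rule only
    if tag == 0x40 and nn == 0x02:
        return SEGMENT_HEADER_LEN
    raise ValueError(f"unknown block tag {tag:02x} {nn:02x}")


def walk_blocks(stream: bytes) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (offset, tag, nn, length) for each block, checking contiguity."""
    pos = 0
    while pos + 2 <= len(stream):
        tag, nn = stream[pos], stream[pos + 1]
        length = block_length(tag, nn)
        if pos + length > len(stream):
            raise ValueError(f"block at offset {pos} runs past end of stream")
        yield pos, tag, nn, length
        pos += length
    if pos != len(stream):
        raise ValueError(f"{len(stream) - pos} trailing byte(s) after last block")


# Synthetic stream: 10 04, 20 04, 00 08, then one segment header.
demo_stream = bytes.fromhex("10041f80" "200401ff7f80" "0008" "4002") + bytes(18)
demo_blocks = list(walk_blocks(demo_stream))
```

Raising on an unknown tag or a trailing byte keeps framing errors loud, which is what a contiguity test wants.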
### 7-byte preamble

```
body[0:3] = 00 02 00    magic
body[3:5] = Tran[0]     int16 BE in 16-count units (LSB = 0.005 in/s)
body[5:7] = Tran[1]     int16 BE in 16-count units
```

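Reading the preamble is a single `struct.unpack` call. A minimal sketch, assuming the layout above; `parse_preamble` is a hypothetical name, and the values are returned in the raw 16-count units without any conversion to in/s.

```python
import struct
from typing import Tuple

PREAMBLE_MAGIC = b"\x00\x02\x00"
PREAMBLE_LEN = 7


def parse_preamble(body: bytes) -> Tuple[int, int]:
    """Return (Tran[0], Tran[1]) in 16-count units from the 7-byte preamble."""
    if len(body) < PREAMBLE_LEN:
        raise ValueError("body shorter than the 7-byte preamble")
    if body[0:3] != PREAMBLE_MAGIC:
        raise ValueError(f"bad preamble magic: {body[0:3].hex()}")
    # ">hh" = two big-endian signed 16-bit integers
    tran0, tran1 = struct.unpack(">hh", body[3:7])
    return tran0, tran1


# Synthetic example: anchors +16 and -16.
demo_anchors = parse_preamble(bytes.fromhex("000200" "0010" "fff0"))
```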
### Tran channel, segment 0

"Segment 0" in this section means the initial body: everything before the
first `40 02`. It encodes Tran samples only. (The channel-rotation section
numbers segments from the first `40 02` header instead, so its "segment 0"
is the first Vert segment.) Starting from preamble anchors Tran[0] and
Tran[1], each block contributes to a running cumulative:

- `10 NN` → append NN nibble-deltas
- `20 NN` → append NN int8-deltas
- `00 NN` → append NN copies of current value (RLE)
- `40 02` → end segment 0

Verified byte-exact:

| Event | Description | Segment 0 size | Match |
|---|---|---|---|
| `M529LL1A.SP0` | Loud, 0.25 s pretrig | 510 | 510/510 ✓ |
| `M529LL1A.SV0` | Loud from sample 0 | 58 | 58/58 ✓ (stops at first `30 NN`) |
| `M529LL1A.SS0` | Loud from sample 0 | 42 | 42/42 ✓ (stops at first `30 04`) |
| `M529LL1L.JQ0` | Vert-heavy | 510 | 510/510 ✓ |
| `M529LL1L.V70` | Mic-heavy (140 dB) | 510 | 510/510 ✓ |

Implementation: `decode_tran_initial()`.

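The block rules above amount to a small running-sum loop. This is a sketch under stated assumptions, not the project's `decode_tran_initial()`: `decode_initial_tran` is a hypothetical name, deltas are assumed to add directly to the running value in the same units as the anchors, and decoding simply stops at the first `40 02` or `30 NN` block.

```python
import struct
from typing import List


def _signed_nibble(n: int) -> int:
    """Map 0..7 to +0..+7 and 8..F to -8..-1."""
    return n - 16 if n >= 8 else n


def _signed_byte(b: int) -> int:
    """Interpret an unsigned byte as int8."""
    return b - 256 if b >= 128 else b


def decode_initial_tran(body: bytes) -> List[int]:
    """Decode the initial Tran run (preamble anchors plus delta blocks)."""
    if body[0:3] != b"\x00\x02\x00":
        raise ValueError("bad preamble magic")
    samples = list(struct.unpack(">hh", body[3:7]))  # Tran[0], Tran[1]
    pos = 7
    while pos + 2 <= len(body):
        tag, nn = body[pos], body[pos + 1]
        if tag == 0x10:                      # NN nibble deltas, high nibble first
            for byte in body[pos + 2 : pos + 2 + nn // 2]:
                for nib in (byte >> 4, byte & 0x0F):
                    samples.append(samples[-1] + _signed_nibble(nib))
            pos += 2 + nn // 2
        elif tag == 0x20:                    # NN int8 deltas
            for byte in body[pos + 2 : pos + 2 + nn]:
                samples.append(samples[-1] + _signed_byte(byte))
            pos += 2 + nn
        elif tag == 0x00:                    # RLE: repeat the current value NN times
            samples.extend([samples[-1]] * nn)
            pos += 2
        else:                                # `40 02` or `30 NN`: stop here
            break
    return samples


# Synthetic body: anchors 5 and 7, one block of each decodable kind, then a header.
demo_body = (
    bytes.fromhex("000200" "0005" "0007" "10041f80" "200401ff7f80" "0004" "4002")
    + bytes(18)
)
demo_tran = decode_initial_tran(demo_body)
```

On the synthetic body this yields 14 samples: the 2 anchors, 4 nibble deltas, 4 int8 deltas, and 4 RLE copies.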
### Segment header (`40 02`, 20 bytes total), REWRITTEN 2026-05-11

Payload offsets are counted from the byte after the 2-byte `40 02` tag
(2 tag bytes + 18 payload bytes = 20 bytes).

| Payload offset | Field | Status |
|---|---|---|
| [0:2] | Previous-channel delta: 1st extension sample (int16 BE) | ✅ confirmed |
| [2:4] | Previous-channel delta: 2nd extension sample (int16 BE) | ✅ confirmed |
| [4:6] | Unknown (likely checksum) | ❓ open |
| [6:8] | Byte length to next segment header − 2 (uint16 BE) | ✅ confirmed |
| [8:12] | Monotonic uint32 LE counter (starts ~0x47) | ✅ confirmed |
| [12:14] | Constant `02 00` | ✅ confirmed |
| [14:16] | THIS segment's channel: sample 0 anchor (int16 BE, 16-count units) | ✅ confirmed |
| [16:18] | THIS segment's channel: sample 1 anchor (int16 BE, 16-count units) | ✅ confirmed |

**Key insight (2026-05-11 late):** every segment carries 510 main
samples (2 anchors + 508 deltas) PLUS 2 continuation samples that live
in the NEXT segment header, so each channel-segment effectively spans
512 samples. The continuation lives in the next segment header because
that header is also the channel-switch point, which makes it a natural
place to "extend the channel we're leaving" before "starting the
channel we're entering."

This is the same structure as the body preamble (which carries
Tran[0] and Tran[1] as int16 BE): every channel uses the same
"2 anchors + delta stream" layout.

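The header layout and its double duty can be sketched as follows. This is illustrative only: `SegmentHeader`, `parse_segment_header`, and `switch_channel` are hypothetical names, the [4:6] field is kept as raw bytes because its meaning is open, and the two extension deltas are assumed to add to the outgoing channel's running value in the same units as its samples.

```python
import struct
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SegmentHeader:
    prev_ext: Tuple[int, int]   # payload [0:4]: 2 deltas extending the previous channel
    unknown: bytes              # payload [4:6]: meaning still open
    length_field: int           # payload [6:8]: byte length to next header minus 2
    counter: int                # payload [8:12]: monotonic counter (little-endian)
    anchors: Tuple[int, int]    # payload [14:18]: this segment's first 2 samples


def parse_segment_header(block: bytes) -> SegmentHeader:
    """Parse one 20-byte `40 02` block (2-byte tag + 18-byte payload)."""
    if len(block) != 20 or block[:2] != b"\x40\x02":
        raise ValueError("not a 20-byte 40 02 segment header")
    p = block[2:]
    if p[12:14] != b"\x02\x00":
        raise ValueError("constant 02 00 missing at payload [12:14]")
    ext = struct.unpack(">hh", p[0:4])
    (length_field,) = struct.unpack(">H", p[6:8])
    (counter,) = struct.unpack("<I", p[8:12])
    anchors = struct.unpack(">hh", p[14:18])
    return SegmentHeader(ext, p[4:6], length_field, counter, anchors)


def switch_channel(prev_samples: List[int], hdr: SegmentHeader) -> List[int]:
    """Extend the outgoing channel in place; return the incoming channel's anchors."""
    for delta in hdr.prev_ext:
        prev_samples.append(prev_samples[-1] + delta)
    return list(hdr.anchors)


# Synthetic header: extension deltas +3 and -2, counter 0x47, anchors 100 and 101.
demo_header_bytes = (
    b"\x40\x02"
    + struct.pack(">hh", 3, -2)
    + b"\xab\xcd"
    + struct.pack(">H", 300)
    + struct.pack("<I", 0x47)
    + b"\x02\x00"
    + struct.pack(">hh", 100, 101)
)
demo_header = parse_segment_header(demo_header_bytes)
demo_prev = [10, 12]
demo_next = switch_channel(demo_prev, demo_header)
```

Keeping the extension and the anchor handling in one function mirrors the "leave one channel, enter the next" role of the header.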
## Channel rotation (VERIFIED 2026-05-11)

```
(initial body)            → Tran samples 0..509      (preamble + delta blocks)
segment 0 hdr ext+anchor  → Vert samples 0..511      ← anchor in hdr [14:18]
segment 1 hdr ext+anchor  → Long samples 0..511
segment 2 hdr ext+anchor  → MicL samples 0..511
segment 3 hdr ext+anchor  → Tran samples 510..1021   (continuation)
segment 4 hdr ext+anchor  → Vert samples 512..1023
segment 5 hdr ext+anchor  → Long samples 512..1023
segment 6 hdr ext+anchor  → MicL samples 512..1023
segment 7 hdr ext+anchor  → Tran samples 1022..1533
...
```

Implementation: `decode_waveform_v2()` returns
`{"Tran": [...], "Vert": [...], "Long": [...], "MicL": [...]}` with
each channel's samples in 16-count units. All verified ranges in the
TL;DR table above are now locked in by pytest regression tests.

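The rotation reduces to modular arithmetic on the segment index. A small sketch, assuming the numbering used in the diagram above (segment 0 is the block run that follows the first `40 02` header, and the initial body before it is Tran); `channel_for_segment` is a hypothetical helper.

```python
from typing import Optional

CHANNELS = ("Tran", "Vert", "Long", "MicL")


def channel_for_segment(segment: Optional[int]) -> str:
    """Channel carried by a header-numbered segment; None means the initial body."""
    if segment is None:
        return CHANNELS[0]
    if segment < 0:
        raise ValueError("segment index must be non-negative")
    # The initial body already used the Tran slot, so segment 0 starts at Vert.
    return CHANNELS[(segment + 1) % len(CHANNELS)]


demo_rotation = [channel_for_segment(k) for k in range(8)]
```

This matches the analyzer's best-fit results quoted in the commit message: segments 0, 1, 3, 5, and 9 map to Vert, Long, Tran, Long, and Long.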
## What's still open

1. **`30 NN` block content.** These blocks appear in high-amplitude
   regions (deltas exceeding what int8 in `20 NN` can express). The
   decoder currently steps over them, which drops the deltas they
   carry and corrupts the running cumulative for every later sample
   that depends on it. Likely a packed multi-byte delta format (12-bit
   or 16-bit per delta); initial guesses didn't match cleanly, so this
   needs more careful analysis.

2. **MicL decoding.** The mic channel's anchor pair appears in the
   MicL slot of each rotation cycle (header-numbered segments 2, 6, …)
   in the same format as the geo channels, but the BW ASCII export
   shows mic in dB(L) (~6 dB quantization steps), so direct integer
   comparison against ADC units doesn't work. Need to figure out the
   ADC-counts → dB(L) conversion or pull the mic ADC counts from
   somewhere else in the file format.

3. **Walker fix for event-b.** The original quiet bundle's event-b
   still bails out partway through. Lower priority since the other
   7 events walk cleanly.

## Next experiment: crack the `30 NN` block

The scoring analyzer in `scratch/next_experiment_skeleton.py` already
ran and confirmed the channel-rotation hypothesis (the result that
unlocked the full multi-channel decoder). The next open piece is the
`30 NN` block format.

Approach:

1. Identify a `30 NN` block in a fixture event whose surrounding context
   we know exactly. SP0 segment 4 block 104 is `30 04` with data
   `01 10 2f 29 80 3d`, and we know the true V deltas around it should be
   `+47, +297, +384, +61` (between V[649] and V[653]).
2. Try various packings of the 6 data bytes that could encode 4 wide
   deltas:
   - 4 × 12-bit signed values (= 48 bits = 6 bytes), packed BE/LE
   - 3 × 16-bit signed values (6 bytes hold only 3, but NN = 4 implies 4)
   - 2-byte step-size header + 4 × int8 with scaling
   - Wavelet-style: 4 deltas with shared exponent or step
3. Initial brute-force found `+47` and `+61` in positions 1 and 3
   (0-based) of a 12-bit BE packing, whose four fields read
   `17, 47, 664, 61`, but `+297` and `+384` didn't fit cleanly.
   Worth re-trying with more permutations.

Once cracked, the `30 NN` decoder slots into `decode_waveform_v2` and
the multi-channel decode extends past the high-amplitude regions.

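The 12-bit BE trial from step 3 is easy to reproduce, which gives a starting point for further permutations. A minimal sketch: `unpack_fields_be` is a hypothetical helper that splits the data bytes into equal-width big-endian two's-complement fields; it says nothing about the real `30 NN` format.

```python
from typing import List


def unpack_fields_be(data: bytes, width: int) -> List[int]:
    """Split data into consecutive big-endian signed fields of `width` bits."""
    total_bits = len(data) * 8
    if width <= 0 or total_bits % width:
        raise ValueError(f"{total_bits} bits do not divide into {width}-bit fields")
    word = int.from_bytes(data, "big")
    mask = (1 << width) - 1
    sign_bit = 1 << (width - 1)
    fields = []
    for i in range(total_bits // width):
        shift = total_bits - width * (i + 1)   # first field sits in the top bits
        raw = (word >> shift) & mask
        fields.append(raw - (1 << width) if raw & sign_bit else raw)
    return fields


# Data bytes of the `30 04` block in SP0 segment 4, block 104.
block_data = bytes.fromhex("01102f29803d")
truth_deltas = [47, 297, 384, 61]

fields_12 = unpack_fields_be(block_data, 12)   # the 4 x 12-bit hypothesis
fields_16 = unpack_fields_be(block_data, 16)   # the 3 x 16-bit hypothesis
matches_12 = [value for value in fields_12 if value in truth_deltas]
```

Here `fields_12` is `[17, 47, 664, 61]`, so 47 and 61 appear at 0-based positions 1 and 3 while 297 and 384 do not, as the note says. Looping the same helper over other widths, byte orders, or bit offsets is one way to run the "more permutations" pass.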
## Test fixtures

Committed under `tests/fixtures/`:

- `decode-re-5-8-26/event-a..event-d/`: original quiet bundle (4 events,
  PPV < 1 in/s). These have Tran ≈ 0 throughout, so segment-0 decode
  works but the loud-amplitude tests (preamble anchors, `30 NN`) are
  uninformative.
- `5-11-26/M529LL1A.{SP0,SS0,SV0}`: loud bundle (PPV 6-7 in/s on all
  channels). These cracked the Tran codec.
- `5-11-26/M529LL1L.{JQ0,V70}`: targeted captures. JQ0 is Vert-heavy,
  V70 is Mic-heavy (140 dB). These cracked the `00 NN` RLE rule.

Each fixture has a `.TXT` Blastware ASCII export as ground truth.

## Tests

`tests/test_waveform_codec.py` (46 tests, all passing) locks in:

- Block framing (5 tag types with correct lengths).
- Walker contiguity (no gaps or overlaps).
- Segment header parsing (counter monotonicity, fixed-pattern check).
- `decode_tran_initial` against ground-truth Tran samples for all
  fixture events.
- `decode_waveform_v2` against the verified per-channel ranges listed
  in the TL;DR table.

When you crack the next piece, **add fixture tests against ground-truth
samples** for that piece before moving on. Don't let unverified code
ship without a regression lock-in.