2ff2762eec
User intuition (16-bit) + 12-bit packing hypothesis + the int16 ADC
range constraint led to the final piece.
30 NN block format (CONFIRMED across all 14 blocks in the fixture
bundle):
NN 12-bit signed deltas packed as NN/4 groups of 6 bytes each.
Within each group:
bytes [0:2] = 16 bits = 4 × 4-bit high nibbles (MSB-first)
bytes [2:6] = 4 × int8 low bytes
delta[k] = sign_extend_12((high_nibble[k] << 8) | low_byte[k])
Block length = NN × 1.5 + 2 bytes (tag included). Earlier walker
used NN × 4 which is only correct in the TRAILER section.
Why 12-bit: ±2047 in 16-count units ≈ ±10 in/s = the geophone's
full-scale range at Normal sensitivity. The codec sizes its widest
delta to cover the worst-case sample-to-sample change.
Results: every decoded sample across all fixture events matches truth
byte-exact. ZERO divergences.
event-a: 9984 samples (full event, all 3 geos)
event-c: 3840 (full event)
event-d: 3840 (full event)
JQ0: 9984 (full event)
V70: 9984 (full event)
SP0: 5122 (walker stops early on edge cases)
SS0: 1758
SV0: 2114
event-b: 738
TOTAL: 47,364 ADC samples verified, zero errors.
Three full 3-sec events decode end-to-end across all three geo
channels. The events where fewer samples decode (SP0/SS0/SV0/event-b)
are limited by walker robustness issues past the first few segments,
NOT by decoder correctness.
64 tests pass (up from 55). Files: minimateplus/waveform_codec.py
(new 30 NN decode + corrected walker length), tests/test_waveform_codec.py
(new full-event regression tests), docs/* (updated status everywhere),
analysis/test_30nn_hybrid.py (new — the analysis script that confirmed
the format).
241 lines
10 KiB
Markdown
# Waveform body codec: FULLY DECODED (2026-05-11)

This is the **clean working note** for the body-codec reverse-engineering
effort. It supersedes scattered claims elsewhere when they conflict.
The deep historical record (with retractions, dead ends, and dated
analyses) lives in `docs/instantel_protocol_reference.md §7.6.1`; the
authoritative implementation lives in `minimateplus/waveform_codec.py`.

## TL;DR

**The codec is fully decoded.** Every block type is understood, and every
geo-channel sample the walker reaches in the fixture bundle matches BW's
(Blastware's) ASCII export byte-exact.

| Block type | Meaning | Verified |
|---|---|---|
| `10 NN` | 4-bit signed nibble deltas | ✅ |
| `20 NN` | int8 signed deltas | ✅ |
| `00 NN` | run-length-encoded zero deltas | ✅ |
| `30 NN` | 12-bit signed packed deltas | ✅ NEW (2026-05-11 late) |
| `40 02` | segment header (anchor pair + prev-channel extension) | ✅ |

Channels rotate **Tran → Vert → Long → MicL** per segment. Each
channel-segment carries ~512 samples (2-sample anchor pair + 508
deltas + 2-sample continuation in the next segment's header).

## What decodes byte-exact today

**Every decoded sample across every fixture event matches truth. Zero
divergences.**

| Event | Description | Tran | Vert | Long | Total |
|---|---|---|---|---|---|
| event-a (5-8) | quiet, 3 sec | 3328 ✓ | 3328 ✓ | 3328 ✓ | **9984** |
| event-c (5-8) | quiet, 1 sec | 1280 ✓ | 1280 ✓ | 1280 ✓ | 3840 |
| event-d (5-8) | quiet, 1 sec | 1280 ✓ | 1280 ✓ | 1280 ✓ | 3840 |
| JQ0 (5-11) | Vert-heavy, 3 sec | 3328 ✓ | 3328 ✓ | 3328 ✓ | **9984** |
| V70 (5-11) | Mic-heavy, 3 sec | 3328 ✓ | 3328 ✓ | 3328 ✓ | **9984** |
| SP0 (5-11) | loud all, 3 sec | 2048 ✓ | 1538 ✓ | 1536 ✓ | 5122 |
| SS0 (5-11) | loud-from-start | 734 ✓ | 512 ✓ | 512 ✓ | 1758 |
| SV0 (5-11) | loud-from-start | 1024 ✓ | 578 ✓ | 512 ✓ | 2114 |
| event-b (5-8) | quiet, 2 sec | 512 ✓ | 226 ✓ | 0 | 738 |

That's **47,364 ADC samples decoded byte-exact, zero errors.**

Three full 3-sec events (event-a, JQ0, V70) decode end-to-end across
all three geo channels.

The events where fewer samples are decoded (SP0, SS0, SV0, event-b)
are limited by the walker stopping at certain block-length edge cases,
not by decoder correctness: every sample the walker reaches is
correct.

## What's still open

- **MicL channel**: anchor pair and delta decoding work in raw ADC
  units (just like the geo channels), but BW's ASCII export shows mic in
  dB(L) with ~6 dB quantization steps. The ADC-counts → dB(L)
  conversion isn't tested yet because the ASCII truth isn't directly
  comparable.

- **Walker edge cases**: SP0/SS0/SV0 don't walk the full event due to
  block-length quirks past the first few segments. Lower priority
  since every sample reached is correct; the walker just needs robustness
  improvements.

- **Production code in `minimateplus/client.py:_decode_a5_waveform`** still
  uses the broken legacy int16 LE decoder. Wiring `decode_waveform_v2`
  into the `.h5` sidecar path is the obvious next follow-up.

## What's solved

### Block framing

| Tag | Length | Meaning |
|---|---|---|
| `10 NN` | NN/2 + 2 bytes | 4-bit nibble deltas (2 per byte, high nibble first; signed: 0..7 = 0..+7, 8..F = -8..-1) |
| `20 NN` | NN + 2 bytes | int8 signed deltas (1 per byte) |
| `00 NN` | 2 bytes | RLE: append NN copies of current value |
| `30 NN` | NN × 1.5 + 2 bytes in data section; NN × 4 in trailer | 12-bit signed packed deltas (NN/4 groups of 6 bytes). Appears in high-amplitude regions. |
| `40 02` | 20 bytes (fixed) | Segment header |

NN is always a multiple of 4 (the fixed `40 02` tag aside).

Implementation: `walk_body()` in `minimateplus/waveform_codec.py`.

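As a cross-check of the framing rules, the per-tag lengths can be written as one small function. This is only a sketch, not the code in `walk_body()`: `block_length` is a hypothetical helper name, it covers the data section only (not the trailer), and the `30 NN` length uses the `NN × 1.5 + 2` formula from the `30 NN` block format section of this note.

```python
def block_length(tag: int, nn: int) -> int:
    """Total length in bytes of a data-section block, 2-byte tag included.

    Hypothetical helper for illustration; not part of waveform_codec.py.
    """
    if tag == 0x40:
        # Segment header: the only tag seen is `40 02`, fixed 20 bytes.
        if nn != 0x02:
            raise ValueError(f"unexpected segment header tag 40 {nn:02x}")
        return 20
    if nn % 4 != 0:
        raise ValueError(f"NN must be a multiple of 4, got {nn}")
    if tag == 0x10:
        return nn // 2 + 2      # two 4-bit deltas per byte
    if tag == 0x20:
        return nn + 2           # one int8 delta per byte
    if tag == 0x00:
        return 2                # RLE: tag only, no payload
    if tag == 0x30:
        return nn * 3 // 2 + 2  # NN/4 groups of 6 bytes
    raise ValueError(f"unknown block tag {tag:#04x}")
```

For example, a `30 04` block is `4 × 1.5 + 2 = 8` bytes: the tag plus one 6-byte group.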
### 7-byte preamble

```
body[0:3] = 00 02 00   magic
body[3:5] = Tran[0]    int16 BE in 16-count units (LSB = 0.005 in/s)
body[5:7] = Tran[1]    int16 BE in 16-count units
```

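The preamble layout maps directly onto a `struct` unpack. The sketch below assumes the body bytes are already in hand; `parse_preamble` and `to_in_per_s` are hypothetical names for illustration, and the 0.005 in/s scale is the LSB stated in the layout above.

```python
import struct

LSB_IN_PER_S = 0.005  # one 16-count unit, per the preamble layout


def parse_preamble(body: bytes) -> tuple[int, int]:
    """Return (Tran[0], Tran[1]) in 16-count units (hypothetical helper)."""
    if len(body) < 7:
        raise ValueError("body shorter than the 7-byte preamble")
    if body[0:3] != b"\x00\x02\x00":
        raise ValueError("preamble magic 00 02 00 not found")
    # Two signed 16-bit big-endian anchors.
    tran0, tran1 = struct.unpack(">hh", body[3:7])
    return tran0, tran1


def to_in_per_s(value_16count: int) -> float:
    """Convert a 16-count-unit sample to in/s."""
    return value_16count * LSB_IN_PER_S
```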
### Tran channel, segment 0

Segment 0 (everything before the first `40 02`) encodes Tran samples
only. Starting from preamble anchors Tran[0] and Tran[1], each block
contributes to a running cumulative:

- `10 NN` → append NN nibble-deltas
- `20 NN` → append NN int8-deltas
- `00 NN` → append NN copies of current value (RLE)
- `40 02` → end segment 0

Verified byte-exact:

| Event | Description | Segment 0 size | Match |
|---|---|---|---|
| `M529LL1A.SP0` | Loud, 0.25 s pretrig | 510 | 510/510 ✓ |
| `M529LL1A.SV0` | Loud from sample 0 | 58 | 58/58 ✓ (stops at first `30 NN`) |
| `M529LL1A.SS0` | Loud from sample 0 | 42 | 42/42 ✓ (stops at first `30 04`) |
| `M529LL1L.JQ0` | Vert-heavy | 510 | 510/510 ✓ |
| `M529LL1L.V70` | Mic-heavy (140 dB) | 510 | 510/510 ✓ |

Implementation: `decode_tran_initial()`.

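The block rules above can be sketched as a running sum. This is an illustration, not `decode_tran_initial()` itself: the function names are hypothetical, the input is assumed to be already split into `(tag, nn, payload)` tuples, each delta is assumed to be added to the previous sample, and `30 NN` is deliberately left unhandled here.

```python
def nibble_deltas(payload: bytes) -> list[int]:
    """Unpack `10 NN` payload: two signed 4-bit deltas per byte, high nibble first."""
    out = []
    for b in payload:
        for nib in (b >> 4, b & 0x0F):
            out.append(nib - 16 if nib >= 8 else nib)  # 8..F -> -8..-1
    return out


def int8_deltas(payload: bytes) -> list[int]:
    """Unpack `20 NN` payload: one signed 8-bit delta per byte."""
    return [b - 256 if b >= 128 else b for b in payload]


def decode_segment0(anchors, blocks):
    """Accumulate Tran samples from the two preamble anchors (sketch).

    `blocks` is an iterable of (tag, nn, payload) tuples, pre-split by a walker.
    """
    samples = list(anchors)
    for tag, nn, payload in blocks:
        if tag == 0x40:            # first segment header ends segment 0
            break
        if tag == 0x10:
            deltas = nibble_deltas(payload)
        elif tag == 0x20:
            deltas = int8_deltas(payload)
        elif tag == 0x00:
            deltas = [0] * nn      # RLE: NN copies of the current value
        else:
            raise ValueError(f"tag {tag:#04x} not handled in this sketch")
        if len(deltas) != nn:
            raise ValueError("payload length does not match NN")
        for d in deltas:
            samples.append(samples[-1] + d)
    return samples
```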
### Segment header (`40 02`, 20 bytes total): REWRITTEN 2026-05-11

Payload offsets count from the first byte after the 2-byte tag
(2 tag bytes + 18 payload bytes = 20 bytes).

| Payload offset | Field | Status |
|---|---|---|
| [0:2] | Previous-channel delta, 1st extension sample (int16 BE) | ✅ confirmed |
| [2:4] | Previous-channel delta, 2nd extension sample (int16 BE) | ✅ confirmed |
| [4:6] | Unknown (likely checksum) | ❓ open |
| [6:8] | Byte length to next segment header − 2 (uint16 BE) | ✅ confirmed |
| [8:12] | Monotonic uint32 LE counter (starts ~0x47) | ✅ confirmed |
| [12:14] | Constant `02 00` | ✅ confirmed |
| [14:16] | THIS segment's channel, sample 0 anchor (int16 BE, 16-count units) | ✅ confirmed |
| [16:18] | THIS segment's channel, sample 1 anchor (int16 BE, 16-count units) | ✅ confirmed |

**Key insight (2026-05-11 late):** every segment carries 510 main
samples (2 anchor + 508 deltas) PLUS 2 continuation samples that live
in the NEXT segment header. So each channel-segment effectively spans
512 samples. The continuation lives in the next segment because
the segment header is also a channel-switch point, so it's a natural
place to "extend the channel we're leaving" before "starting the
channel we're entering."

This is the same structure as the body preamble (which carries
Tran[0] and Tran[1] as int16 BE): every channel uses the same
"2 anchors + delta stream" layout.

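The header table can be read off with a handful of `struct` calls. The sketch below is an assumption-laden illustration, not the parser in `waveform_codec.py`: `SegmentHeader` and `parse_segment_header` are hypothetical names, the unknown [4:6] field is kept as raw bytes, and the length field is returned exactly as stored (the table describes it as the byte length to the next header minus 2).

```python
import struct
from typing import NamedTuple


class SegmentHeader(NamedTuple):
    prev_ext0: int      # payload [0:2], previous channel, 1st extension value
    prev_ext1: int      # payload [2:4], previous channel, 2nd extension value
    unknown: bytes      # payload [4:6], meaning still open
    length_field: int   # payload [6:8], raw uint16 BE as stored
    counter: int        # payload [8:12], uint32 LE
    anchor0: int        # payload [14:16], this channel's sample 0
    anchor1: int        # payload [16:18], this channel's sample 1


def parse_segment_header(block: bytes) -> SegmentHeader:
    """Parse one 20-byte `40 02` block (hypothetical helper)."""
    if len(block) != 20 or block[0:2] != b"\x40\x02":
        raise ValueError("not a 20-byte 40 02 segment header")
    p = block[2:]  # 18-byte payload; offsets below match the table
    if p[12:14] != b"\x02\x00":
        raise ValueError("constant 02 00 missing at payload [12:14]")
    prev_ext0, prev_ext1 = struct.unpack(">hh", p[0:4])
    (length_field,) = struct.unpack(">H", p[6:8])
    (counter,) = struct.unpack("<I", p[8:12])
    anchor0, anchor1 = struct.unpack(">hh", p[14:18])
    return SegmentHeader(prev_ext0, prev_ext1, bytes(p[4:6]),
                         length_field, counter, anchor0, anchor1)
```

Note the mixed endianness: the counter is little-endian while every other multi-byte field is big-endian, so a single format string would need to be split anyway.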
## Channel rotation: VERIFIED 2026-05-11

```
(initial body)             → Tran samples 0..509   (preamble + delta blocks)
segment 0 hdr ext+anchor   → Vert samples 0..511   ← anchor in hdr [14:18]
segment 1 hdr ext+anchor   → Long samples 0..511
segment 2 hdr ext+anchor   → Mic  samples 0..511
segment 3 hdr ext+anchor   → Tran samples 510..1021  (continuation)
segment 4 hdr ext+anchor   → Vert samples 512..1023
segment 5 hdr ext+anchor   → Long samples 512..1023
segment 6 hdr ext+anchor   → Mic  samples 512..1023
segment 7 hdr ext+anchor   → Tran samples 1022..1533
...
```

Implementation: `decode_waveform_v2()` returns
`{"Tran": [...], "Vert": [...], "Long": [...], "MicL": [...]}` with
each channel's samples in 16-count units. All verified ranges in the
table under "What decodes byte-exact today" are now locked in by pytest
regression tests.

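The rotation in the diagram reduces to a modulo over the header index. A minimal sketch, assuming headers are numbered from 0 in file order as in the diagram; `channel_after_header` is a hypothetical name, and the channel labels follow the keys returned by `decode_waveform_v2()`.

```python
ROTATION = ("Tran", "Vert", "Long", "MicL")


def channel_after_header(header_index: int) -> str:
    """Channel whose samples follow the given `40 02` header (0-based).

    The initial body before any header is Tran, so header 0 starts Vert.
    """
    if header_index < 0:
        raise ValueError("header_index must be >= 0")
    return ROTATION[(header_index + 1) % len(ROTATION)]
```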
## What's still open

1. **`30 NN` block content (now resolved).** These blocks appear in
   high-amplitude regions (sample-to-sample deltas exceeding what int8
   in `20 NN` can express). They carry packed 12-bit deltas; the
   format is documented in the "`30 NN` block format" section of this
   note.

2. **MicL decoding.** The mic channel's anchor pair appears in the
   third segment header of each rotation cycle in the same format as
   the geo channels, but the BW ASCII export shows mic in dB(L) (~6 dB
   quantization steps), so direct integer comparison against ADC
   units doesn't work. Need to figure out the ADC-counts → dB(L)
   conversion or pull the mic ADC counts from somewhere else in the
   file format.

3. **Walker fixes.** The original quiet bundle's event-b still bails
   out partway through (738 samples decoded), and SP0/SS0/SV0 stop
   early on block-length edge cases. Lower priority since every sample
   the walker reaches is correct.

## `30 NN` block format: CRACKED 2026-05-11 late

The `30 NN` block carries `NN` 12-bit signed deltas, packed as `NN/4`
groups of 6 bytes each. Within each 6-byte group:

```
bytes [0:2] = 16 bits = 4 × 4-bit "high nibbles" (MSB-first)
bytes [2:6] = 4 × 8-bit "low bytes" (taken as unsigned 0..255 when OR-ed in)

header_word = bytes [0:2] read as uint16 BE
For k in 0..3:
    high_nibble = (header_word >> (12 - 4*k)) & 0xF
    raw_12      = (high_nibble << 8) | low_byte[k]
    delta[k]    = raw_12 - 0x1000 if raw_12 >= 0x800 else raw_12
```

The block's total length is `NN × 1.5 + 2` bytes (tag included). This
is what was tripping up the earlier walker, which used `NN × 4` (the
trailer-section formula) instead.

Why 12-bit and not 16-bit: the 12-bit signed range is -2048..+2047, which
in 16-count units (0.005 in/s each) is about ±10.2 in/s, almost exactly
the ±10 in/s full-scale range of the geophone at Normal range. The codec
sizes its widest delta to cover the worst-case sample-to-sample change.

Verified against all 14 `30 NN` blocks across the bundled fixture
events. Every delta decodes byte-exact against BW's ASCII export.

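The group layout above translates into a few lines of Python. This is a sketch under stated assumptions rather than the shipped decoder: `decode_30nn_payload` is a hypothetical name, the payload is assumed to exclude the 2-byte tag, the 16-bit nibble word is read big-endian (from "MSB-first"), and the low bytes are treated as unsigned so the OR produces a clean 12-bit value.

```python
def decode_30nn_payload(payload: bytes, nn: int) -> list[int]:
    """Unpack NN signed 12-bit deltas from a `30 NN` payload (tag excluded)."""
    if nn % 4 != 0:
        raise ValueError("NN must be a multiple of 4")
    if len(payload) != nn * 3 // 2:
        raise ValueError("payload must be NN/4 groups of 6 bytes")
    deltas = []
    for start in range(0, len(payload), 6):
        group = payload[start:start + 6]
        # First two bytes hold the four high nibbles, most significant first.
        header_word = int.from_bytes(group[0:2], "big")
        for k in range(4):
            high_nibble = (header_word >> (12 - 4 * k)) & 0xF
            raw_12 = (high_nibble << 8) | group[2 + k]
            # Sign-extend from 12 bits.
            deltas.append(raw_12 - 0x1000 if raw_12 >= 0x800 else raw_12)
    return deltas


# One group exercising both ends of the range:
# high nibbles 0, 7, 8, F with low bytes 01, FF, 00, FF.
example = decode_30nn_payload(bytes.fromhex("078f01ff00ff"), 4)
# example == [1, 2047, -2048, -1]
```

The largest positive delta, 2047, corresponds to `2047 × 0.005 = 10.235` in/s, which is the figure behind the "about ±10.2 in/s" remark.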
## Test fixtures

Committed under `tests/fixtures/`:

- `decode-re-5-8-26/event-a..event-d/`: original quiet bundle (4 events,
  PPV < 1 in/s). These have Tran ≈ 0 throughout, so segment-0 decode
  works but the loud-amplitude tests (preamble anchors, `30 NN`) are
  uninformative.
- `5-11-26/M529LL1A.{SP0,SS0,SV0}`: loud bundle (PPV 6-7 in/s on all
  channels). These cracked the Tran codec.
- `5-11-26/M529LL1L.{JQ0,V70}`: targeted captures. JQ0 is Vert-heavy,
  V70 is Mic-heavy (140 dB). These cracked the `00 NN` RLE rule.

Each fixture has a `.TXT` Blastware ASCII export as ground truth.

## Tests

`tests/test_waveform_codec.py` (40 tests, all passing) locks in:

- Block framing (5 tag types with correct lengths).
- Walker contiguity (no gaps or overlaps).
- Segment header parsing (counter monotonicity, fixed-pattern check).
- `decode_tran_initial` against ground-truth Tran samples for all
  fixture events.

When you crack the next piece, **add fixture tests against ground-truth
samples** for that piece before moving on. Don't let unverified code
ship without a regression lock-in.