Claude 0466bb4f44 codec: crack wide-NN blocks (1X NN / 2X NN); loud events now fully decode

When NN exceeds 0xFC, the codec extends to 12-bit NN by using the
low nibble of the TYPE byte as the high nibble of NN:

    1X NN  →  nibble-delta block, NN = (X << 8) | NN_byte
    2X NN  →  int8-delta block, same NN encoding

Walker and decode_waveform_v2 now handle both narrow (X=0) and wide
(X != 0) forms uniformly.

Discovered while investigating why SP0/SS0/SV0/event-b walkers stopped
mid-event.  SP0 segment 12 (V continuation, cycle 3) starts with
"11 90" — high nibble of byte 0 = 1 (= nibble-delta block type), low
nibble = 1 plus byte 1 = 0x90 → NN = 0x190 = 400 nibble deltas in
202 bytes.  Walker was rejecting "11" as a non-tag.
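
A minimal sketch of the tag split described above, for illustration only. The helper names `split_tag` and `nibble_block_length` are hypothetical and are not taken from `waveform_codec.py`; the sketch assumes the high nibble of the TYPE byte selects the block type and the low nibble supplies bits 8..11 of NN.

```python
def split_tag(type_byte: int, nn_byte: int) -> tuple[int, int]:
    """Return (block_type, nn) for a two-byte block tag.

    Hypothetical helper: the high nibble of the TYPE byte is taken as the
    block type (0x10 = nibble deltas, 0x20 = int8 deltas) and the low nibble
    as the high four bits of a 12-bit NN. Narrow tags simply have X = 0.
    """
    block_type = type_byte & 0xF0
    nn = ((type_byte & 0x0F) << 8) | nn_byte
    return block_type, nn


def nibble_block_length(nn: int) -> int:
    """Total bytes of a nibble-delta block: 2 tag bytes + 2 deltas per byte."""
    return 2 + nn // 2


# The SP0 segment 12 example quoted above: "11 90".
block_type, nn = split_tag(0x11, 0x90)
print(hex(block_type), nn, nibble_block_length(nn))
```

For the "11 90" tag this yields block type 0x10 with NN = 400 and a 202-byte block, matching the figures quoted above.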

Sample count went from 47,364 to 72,972 verified byte-exact:

  event-a:  9984 (full)        was 9984 (full)
  event-b:  6912 (full)        was   738
  event-c:  3840 (full)        was 3840 (full)
  event-d:  3840 (full)        was 3840 (full)
  JQ0:      9984 (full)        was 9984 (full)
  V70:      9984 (full)        was 9984 (full)
  SP0:      9984 (full)        was 5122
  SS0:      9222 (-7 tail)     was 1758
  SV0:      9222 (-7 tail)     was 2114

7 of 9 fixtures now decode end-to-end across all 3 geo channels.
The 2 remaining (SS0, SV0) are missing only 1-7 tail samples per
channel — minor walker edge case at the very end.

74 tests pass (was 71).

2026-05-20 17:28:54 +00:00

Waveform body codec — FULLY DECODED (2026-05-11)

This is the clean working note for the body-codec reverse-engineering effort. It supersedes scattered claims elsewhere when they conflict. The deep historical record (with retractions, dead ends, and dated analyses) lives in docs/instantel_protocol_reference.md §7.6.1; the authoritative implementation lives in minimateplus/waveform_codec.py.

TL;DR

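The codec is fully decoded. Every block type and every channel decodes byte-exact against BW's ASCII export on every event in the fixture bundle; the only gap is the last 1–7 samples per channel on SS0 and SV0, which the walker does not yet reach.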

Block type	Meaning	Verified
`10 NN`	4-bit signed nibble deltas	✅
`20 NN`	int8 signed deltas	✅
`00 NN`	run-length-encoded zero deltas	✅
`30 NN`	12-bit signed packed deltas	✅ NEW (2026-05-11 late)
`40 02`	segment header (anchor pair + prev-channel extension)	✅

Channels rotate Tran → Vert → Long → MicL per segment. Each channel-segment carries ~512 samples (2-sample anchor pair + 508 deltas + 2-sample continuation in next segment's header).

What decodes byte-exact today

Every decoded sample across every fixture event matches truth. Zero divergences.

Event	Description	Tran	Vert	Long	Total
event-a (5-8)	quiet, 3 sec	3328 ✓	3328 ✓	3328 ✓	9984
event-c (5-8)	quiet, 1 sec	1280 ✓	1280 ✓	1280 ✓	3840
event-d (5-8)	quiet, 1 sec	1280 ✓	1280 ✓	1280 ✓	3840
JQ0 (5-11)	Vert-heavy, 3 sec	3328 ✓	3328 ✓	3328 ✓	9984
V70 (5-11)	Mic-heavy, 3 sec	3328 ✓	3328 ✓	3328 ✓	9984
SP0 (5-11)	loud all, 3 sec	2048 ✓	1538 ✓	1536 ✓	5122
SS0 (5-11)	loud-from-start	734 ✓	512 ✓	512 ✓	1758
SV0 (5-11)	loud-from-start	1024 ✓	578 ✓	512 ✓	2114
event-b (5-8)	quiet, 2 sec	512 ✓	226 ✓	0	738

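That's 47,364 ADC samples decoded byte-exact, zero errors. This table is the snapshot taken before the wide-NN (1X NN / 2X NN) walker fix; the current per-event counts are in "Sample counts (72,972 byte-exact total)" below.

At that point, three full 3-sec events (event-a, JQ0, V70) decoded end-to-end across all three geo channels.

The events where fewer samples were decoded (SP0, SS0, SV0, event-b) were limited by the walker stopping at certain block-length edge cases, not by decoder correctness: every sample the walker reached was correct.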

What's still open

Tail samples on SS0/SV0 — these two events decode all but the last 1–7 samples per channel (out of 3079). Likely the same "last segment is truncated" pattern. Minor; doesn't affect the bulk of the data.

Sample counts (72,972 byte-exact total)

Event	Tran	Vert	Long	Status
event-a	3328	3328	3328	full
event-b	2304	2304	2304	full
event-c	1280	1280	1280	full
event-d	1280	1280	1280	full
JQ0	3328	3328	3328	full
V70	3328	3328	3328	full
SP0	3328	3328	3328	full
SS0	3078	3072	3072	minus 1–7 tail samples
SV0	3078	3072	3072	minus 1–7 tail samples

What's now wired into production (2026-05-11 late)

client.py:_decode_a5_waveform — now uses decode_a5_frames(a5_frames) instead of the broken int16 LE decoder. event.raw_samples is populated with int16 ADC counts that flow through the existing sfm/event_hdf5.py scaling pipeline unchanged. Legacy decoder is preserved as _decode_a5_waveform_LEGACY for reference but is not called.
MicL → dB(L) conversion — exposed as waveform_codec.mic_count_to_db(count). Verified against BW display values (count=1 → 81.94 dB; count=813 → 140.14 dB; matches the V70 mic-heavy fixture exactly).
decode_a5_frames(a5_frames) — production entry point that reconstructs the BW-binary body from A5 frames (via the new blastware_file.extract_body_bytes helper) and runs the verified codec. Returns the same raw_samples dict shape the consumers already expect.
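
The two quoted calibration points are consistent with a plain 20·log10 law anchored at 81.94 dB for one count. The sketch below shows that relationship only; it is inferred from those two points and is not the source of `waveform_codec.mic_count_to_db`. The function name, the handling of zero, and the use of the absolute value for negative counts are all assumptions.

```python
import math

# Offset inferred from the quoted data point count=1 -> 81.94 dB (assumption).
DB_AT_ONE_COUNT = 81.94


def mic_count_to_db_sketch(count: int) -> float:
    """Map a mic ADC count to dB(L), assuming a pure 20*log10 law."""
    magnitude = abs(count)  # assumption: sign carries no level information
    if magnitude == 0:
        return float("-inf")  # log of zero is undefined; placeholder choice
    return DB_AT_ONE_COUNT + 20.0 * math.log10(magnitude)


print(round(mic_count_to_db_sketch(1), 2))
print(round(mic_count_to_db_sketch(813), 2))
```

Under these assumptions the sketch reproduces both quoted values (81.94 dB and 140.14 dB) to two decimals.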

What's solved

Block framing

Tag	Length	Meaning
`10 NN`	NN/2 + 2 bytes	4-bit nibble deltas (2 per byte; high nibble first; signed: 0..7 = 0..+7, 8..F = -8..-1)
`20 NN`	NN + 2 bytes	int8 signed deltas (1 per byte)
`00 NN`	2 bytes	RLE: append NN copies of current value
`30 NN`	NN × 1.5 + 2 bytes in data section, NN*4 in trailer	12-bit signed packed deltas (NN/4 groups of 6 bytes). Only in loud-from-start events.
`40 02`	20 bytes (fixed)	Segment header

NN is always a multiple of 4.

Implementation: walk_body() in minimateplus/waveform_codec.py.
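
As a rough illustration of the framing rules, the sketch below steps through a body one block at a time using only the tag and NN byte. It is not the `walk_body()` implementation: the names `block_length` and `walk_blocks` are hypothetical, only narrow (X = 0) tags are handled, and the `30 NN` length uses the NN × 1.5 + 2 rule given in the "30 NN block format" section.

```python
def block_length(body: bytes, pos: int) -> int:
    """Total length in bytes of the block whose tag starts at body[pos]."""
    tag, nn = body[pos], body[pos + 1]
    if tag == 0x10:
        return 2 + nn // 2          # two 4-bit deltas per byte
    if tag == 0x20:
        return 2 + nn               # one int8 delta per byte
    if tag == 0x00:
        return 2                    # RLE: the tag carries everything
    if tag == 0x30:
        return 2 + (nn * 3) // 2    # NN/4 groups of 6 bytes
    if tag == 0x40 and nn == 0x02:
        return 20                   # fixed-size segment header
    raise ValueError(f"unrecognised tag {tag:02X} {nn:02X} at offset {pos}")


def walk_blocks(body: bytes, start: int = 0):
    """Yield (offset, tag, nn) for contiguous blocks from `start` to the end."""
    pos = start
    while pos < len(body):
        if pos + 2 > len(body):
            raise ValueError(f"truncated tag at offset {pos}")
        length = block_length(body, pos)
        if pos + length > len(body):
            raise ValueError(f"block at offset {pos} runs past end of body")
        yield pos, body[pos], body[pos + 1]
        pos += length


# Synthetic body (not fixture data): one block of each delta type.
demo = bytes.fromhex("1004abcd" "0008" "200401020304" "30040f8701ff00ff")
for offset, tag, nn in walk_blocks(demo):
    print(f"offset {offset:2d}: {tag:02X} {nn:02X}")
```

Because each block's length is fully determined by its two tag bytes, the walk is contiguous by construction: any gap or overlap shows up as an unrecognised tag at the next offset.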

7-byte preamble

body[0:3]  = 00 02 00              magic
body[3:5]  = Tran[0]   int16 BE    in 16-count units (LSB = 0.005 in/s)
body[5:7]  = Tran[1]   int16 BE    in 16-count units
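
A minimal sketch of reading this preamble with `struct`, assuming the layout exactly as listed. The function name `parse_preamble` is hypothetical, and the values are returned in the same 16-count units without further scaling.

```python
import struct

PREAMBLE_MAGIC = b"\x00\x02\x00"


def parse_preamble(body: bytes) -> tuple[int, int]:
    """Return the two Tran anchors (Tran[0], Tran[1]) from the 7-byte preamble."""
    if len(body) < 7:
        raise ValueError("body shorter than the 7-byte preamble")
    if body[:3] != PREAMBLE_MAGIC:
        raise ValueError(f"unexpected preamble magic {body[:3].hex()}")
    # Two big-endian signed 16-bit values at offsets 3 and 5.
    tran0, tran1 = struct.unpack(">hh", body[3:7])
    return tran0, tran1


# Synthetic bytes (not fixture data): anchors +1 and -1.
print(parse_preamble(bytes.fromhex("000200" "0001" "ffff")))
```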

Tran channel, segment 0

Segment 0 (everything before the first 40 02) encodes Tran samples only. Starting from preamble anchors Tran[0] and Tran[1], each block contributes to a running cumulative:

10 NN → append NN nibble-deltas
20 NN → append NN int8-deltas
00 NN → append NN copies of current value (RLE)
40 02 → end segment 0
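
The three delta rules above can be sketched as a single accumulator step. This is an illustration rather than the `decode_tran_initial()` source: the name `apply_block` is hypothetical, blocks are assumed to be already split into (tag, NN, payload), and deltas are assumed to be in the same 16-count units as the anchors.

```python
def nibble_to_delta(nibble: int) -> int:
    """4-bit two's complement: 0..7 stay positive, 8..F map to -8..-1."""
    return nibble - 16 if nibble >= 8 else nibble


def apply_block(samples: list[int], tag: int, nn: int, payload: bytes) -> None:
    """Extend `samples` in place with the values encoded by one block."""
    current = samples[-1]
    if tag == 0x10:
        for i in range(nn):
            byte = payload[i // 2]
            # High nibble first, then low nibble.
            nibble = (byte >> 4) if i % 2 == 0 else (byte & 0x0F)
            current += nibble_to_delta(nibble)
            samples.append(current)
    elif tag == 0x20:
        for raw in payload[:nn]:
            current += (raw - 256) if raw >= 128 else raw
            samples.append(current)
    elif tag == 0x00:
        # RLE: the value does not change for NN samples.
        samples.extend([current] * nn)
    else:
        raise ValueError(f"not a delta block: {tag:02X}")


# Synthetic data (not fixture data): start from anchors [10, 10].
samples = [10, 10]
apply_block(samples, 0x10, 4, bytes.fromhex("1f20"))      # +1, -1, +2, 0
apply_block(samples, 0x00, 4, b"")                        # hold for 4 samples
apply_block(samples, 0x20, 4, bytes.fromhex("01ff807f"))  # +1, -1, -128, +127
print(samples)
```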

Verified byte-exact:

Event	Description	Segment 0 size	Match
`M529LL1A.SP0`	Loud, 0.25 s pretrig	510	510/510 ✓
`M529LL1A.SV0`	Loud from sample 0	58	58/58 ✓ (stops at first `30 NN`)
`M529LL1A.SS0`	Loud from sample 0	42	42/42 ✓ (stops at first `30 04`)
`M529LL1L.JQ0`	Vert-heavy	510	510/510 ✓
`M529LL1L.V70`	Mic-heavy (140 dB)	510	510/510 ✓

Implementation: decode_tran_initial().

Segment header (`40 02`, 20 bytes total) — REWRITTEN 2026-05-11

Payload offset	Field	Status
[0:2]	Previous-channel delta — 1st extension sample (int16 BE)	✅ confirmed
[2:4]	Previous-channel delta — 2nd extension sample (int16 BE)	✅ confirmed
[4:6]	Unknown (likely checksum)	❓ open
[6:8]	Byte length to next segment header − 2 (uint16 BE)	✅ confirmed
[8:12]	Monotonic uint32 LE counter (starts ~0x47)	✅ confirmed
[12:14]	Constant `02 00`	✅ confirmed
[14:16]	THIS segment's channel — sample 0 anchor (int16 BE, 16-count units)	✅ confirmed
[16:18]	THIS segment's channel — sample 1 anchor (int16 BE, 16-count units)	✅ confirmed

Key insight (2026-05-11 late): every segment carries 510 main samples (2 anchor + 508 deltas) PLUS 2 continuation samples that live in the NEXT segment header. So each channel-segment effectively spans 512 sample-sets. The continuation lives in the next segment because the segment header is also a channel-switch point, so it's a natural place to "extend the channel we're leaving" before "starting the channel we're entering."

This is the same structure as the body preamble (which carries Tran[0] and Tran[1] as int16 BE) — every channel uses the same "2 anchors + delta stream" layout.
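
The payload layout in the table above can be unpacked along these lines. This is a sketch under the listed offsets only: `SegmentHeader` and `parse_segment_header` are hypothetical names, the unknown bytes at [4:6] are passed through untouched, and the length field is returned raw rather than converted to an offset.

```python
import struct
from typing import NamedTuple


class SegmentHeader(NamedTuple):
    prev_ext: tuple[int, int]   # extension samples for the channel being left
    unknown: bytes              # payload [4:6], meaning still open
    length_field: int           # payload [6:8], raw uint16 BE value
    counter: int                # payload [8:12], uint32 LE
    anchors: tuple[int, int]    # samples 0 and 1 of the channel being entered


def parse_segment_header(block: bytes) -> SegmentHeader:
    """Parse a 20-byte `40 02` block (2 tag bytes + 18 payload bytes)."""
    if len(block) != 20 or block[:2] != b"\x40\x02":
        raise ValueError("not a 20-byte 40 02 segment header")
    p = block[2:]
    ext0, ext1 = struct.unpack(">hh", p[0:4])
    (length_field,) = struct.unpack(">H", p[6:8])
    (counter,) = struct.unpack("<I", p[8:12])
    if p[12:14] != b"\x02\x00":
        raise ValueError("expected constant 02 00 at payload [12:14]")
    a0, a1 = struct.unpack(">hh", p[14:18])
    return SegmentHeader((ext0, ext1), p[4:6], length_field, counter, (a0, a1))


# Synthetic header (not fixture data).
demo = bytes.fromhex("4002" "0005" "fffb" "aabb" "0100" "47000000" "0200" "0010" "fff0")
print(parse_segment_header(demo))
```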

Channel rotation — VERIFIED 2026-05-11

(initial body)  →  Tran samples 0..509       (preamble + delta blocks; 510..511 arrive as ext in segment 0 hdr)
segment 0 hdr  ext+anchor →  Vert samples 0..511   ← anchor in hdr [14:18]
segment 1 hdr  ext+anchor →  Long samples 0..511
segment 2 hdr  ext+anchor →  Mic  samples 0..511
segment 3 hdr  ext+anchor →  Tran samples 512..1023 (continuation)
segment 4 hdr  ext+anchor →  Vert samples 512..1023
segment 5 hdr  ext+anchor →  Long samples 512..1023
segment 6 hdr  ext+anchor →  Mic  samples 512..1023
segment 7 hdr  ext+anchor →  Tran samples 1024..1535
...

Implementation: decode_waveform_v2() returns {"Tran": [...], "Vert": [...], "Long": [...], "MicL": [...]} with each channel's samples in 16-count units. All verified ranges in the sample-count tables above are now locked in by pytest regression tests.
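
The rotation itself reduces to index arithmetic on the segment-header number. The sketch below is illustrative only; the function names are hypothetical, and headers are assumed to be numbered from 0 as in the listing above.

```python
# Channel that each segment header starts, in rotation order.
ROTATION = ("Vert", "Long", "MicL", "Tran")


def channel_entered(header_index: int) -> str:
    """Channel whose anchor pair sits in payload [14:18] of this header."""
    return ROTATION[header_index % 4]


def channel_extended(header_index: int) -> str:
    """Channel whose two continuation samples sit in payload [0:4]."""
    return ROTATION[(header_index - 1) % 4]


for i in range(5):
    print(i, "extends", channel_extended(i), "then starts", channel_entered(i))
```

Header 0 extends Tran (the initial body) and starts Vert; header 3 extends MicL and returns to Tran, after which the cycle repeats.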

What was open (now resolved)

30 NN block content. Resolved: see "30 NN block format" below. These blocks appear in high-amplitude regions (sample-set deltas exceeding what int8 in 20 NN can express) and carry 12-bit signed packed deltas; the decoder no longer steps over them.
MicL decoding. Resolved. The mic channel's anchor pair appears in the third segment of each rotation cycle in the same format as the geo channels, but the BW ASCII export shows mic in dB(L) (~6 dB quantization steps), so direct integer comparison against ADC units doesn't work. The ADC-counts → dB(L) conversion is now exposed as waveform_codec.mic_count_to_db(count).
Walker fix for event-b. Resolved by the wide-NN (1X NN / 2X NN) walker change: the original quiet bundle's event-b now decodes in full (6912 samples).

`30 NN` block format — CRACKED 2026-05-11 late

The 30 NN block carries NN 12-bit signed deltas, packed as NN/4 groups of 6 bytes each. Within each 6-byte group:

bytes [0:2]  = 16 bits = 4 × 4-bit "high nibbles" (MSB-first), read as header_word (uint16 BE)
bytes [2:6]  = 4 × uint8 "low bytes" (unsigned; the sign is applied to the 12-bit value)

For k in 0..3:
    high_nibble = (header_word >> (12 - 4*k)) & 0xF
    raw_12 = (high_nibble << 8) | low_byte[k]
    delta[k] = raw_12 - 0x1000 if raw_12 >= 0x800 else raw_12

The block's total length is NN × 1.5 + 2 bytes (tag included). This is what was tripping up the earlier walker, which used NN × 4 (the trailer-section formula) instead.
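
The unpacking rule above translates fairly directly into Python. This is a sketch of the described layout, not the code in `waveform_codec.py`; the name `decode_30_block` is hypothetical and the payload is assumed to exclude the two tag bytes.

```python
def decode_30_block(payload: bytes, nn: int) -> list[int]:
    """Unpack NN 12-bit signed deltas from NN/4 six-byte groups."""
    if nn % 4:
        raise ValueError("NN must be a multiple of 4")
    needed = (nn // 4) * 6
    if len(payload) < needed:
        raise ValueError(f"need {needed} payload bytes, got {len(payload)}")
    deltas = []
    for g in range(nn // 4):
        group = payload[6 * g : 6 * g + 6]
        # First two bytes: four high nibbles, most significant first.
        header_word = (group[0] << 8) | group[1]
        for k in range(4):
            high_nibble = (header_word >> (12 - 4 * k)) & 0xF
            raw_12 = (high_nibble << 8) | group[2 + k]
            # 12-bit two's complement.
            deltas.append(raw_12 - 0x1000 if raw_12 >= 0x800 else raw_12)
    return deltas


# Synthetic group (not fixture data) covering the range extremes.
print(decode_30_block(bytes.fromhex("0f87" "01ff00ff"), 4))
```

The synthetic group decodes to +1, -1, -2048 and +2047, which exercises both ends of the 12-bit signed range.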

Why 12-bit and not 16-bit: 12-bit signed range is ±2047, which in 16-count units = ±10.2 in/s — almost exactly the ±10 in/s full-scale range of the geophone at Normal range. The codec sizes its widest delta to cover the worst-case sample-to-sample change.

Verified against all 14 30 NN blocks across the bundled fixture events. Every delta decodes byte-exact against BW's ASCII export.

Test fixtures

Committed under tests/fixtures/:

decode-re-5-8-26/event-a..event-d/: original quiet bundle (4 events, PPV < 1 in/s). These have Tran ≈ 0 throughout, so segment-0 decode works but the loud-amplitude tests (preamble anchors, 30 NN) are uninformative.
5-11-26/M529LL1A.{SP0,SS0,SV0}: loud bundle (PPV 6-7 in/s on all channels). These cracked the Tran codec.
5-11-26/M529LL1L.{JQ0,V70}: targeted captures. JQ0 is Vert-heavy, V70 is Mic-heavy (140 dB). These cracked the 00 NN RLE rule.

Each fixture has a .TXT Blastware ASCII export as ground truth.

Tests

tests/test_waveform_codec.py (40 tests, all passing) locks in:

Block framing (5 tag types with correct lengths).
Walker contiguity (no gaps or overlaps).
Segment header parsing (counter monotonicity, fixed-pattern check).
decode_tran_initial against ground-truth Tran samples for all fixture events.

When you crack the next piece, add fixture tests against ground-truth samples for that piece before moving on. Don't let unverified code ship without a regression lock-in.
