seismo-relay/docs/waveform_codec_re_status.md
Claude 07675626dc codec-re: channel rotation CONFIRMED — full multi-channel decoder works
The segment-channel scoring analyzer (from scratch/next_experiment_skeleton.py)
ran and immediately confirmed the rotation hypothesis:

  SP0 seg 0: best fit Vert  508/508  ✓
  SP0 seg 1: best fit Long  508/508  ✓
  SP0 seg 3: best fit Tran  508/508  ✓  (Tran continuation)
  SP0 seg 5: best fit Long  508/508  ✓
  SP0 seg 9: best fit Long  508/508  ✓
  V70 seg 0: best fit Vert  508/508  ✓
  V70 seg 1: best fit Long  508/508  ✓

Channels rotate Tran → Vert → Long → MicL per 40 02 segment header.

Also discovered the segment header has DOUBLE duty: bytes [14:18] anchor
the NEW segment's channel (2 samples as int16 BE in 16-count units), AND
bytes [0:4] extend the PREVIOUS channel by 2 more samples (2 deltas as
int16 BE).  This is the same "2 anchors + delta stream" structure as the
body preamble for Tran.

decode_waveform_v2 now returns full per-channel sample dicts.
Byte-exact verified ranges:
  V70: Tran 512, Vert 512, Long 512   (all first segments)
  JQ0: Tran 512, Vert 258
  SP0: Long 1536 (all 3 L segments)

Still open: the 30 NN block format (high-amplitude packed deltas) —
appears mid-segment when single-byte deltas can't carry the magnitude.

6 new tests bring the count to 46.  All passing.
2026-05-20 17:28:54 +00:00


Waveform body codec — current working status (2026-05-11, late)

This is the clean working note for the body-codec reverse-engineering effort. It supersedes scattered claims elsewhere when they conflict. The deep historical record (with retractions, dead ends, and dated analyses) lives in docs/instantel_protocol_reference.md §7.6.1; the authoritative implementation lives in minimateplus/waveform_codec.py.

TL;DR

The Blastware waveform-file body is a tagged variable-length block stream, NOT raw int16 LE samples. Block framing is solved. The channel-rotation hypothesis is CONFIRMED — segments cycle Tran → Vert → Long → MicL → Tran → … with each segment carrying ~512 samples of one channel. Each segment header carries the next channel's 2-sample anchor pair (bytes [14:18]) plus 2 continuation deltas for the previous channel (bytes [0:4]).

What decodes byte-exact today (verified against BW ASCII export):

| Event | Channel | Samples verified |
|---|---|---|
| V70 (Mic-heavy) | Tran | 512 (1 segment) |
| V70 | Vert | 512 |
| V70 | Long | 512 |
| JQ0 (Vert-heavy) | Tran | 512 |
| JQ0 | Vert | 258 |
| SP0 (loud all) | Long | 1536 (all 3 L segments) |
| SP0 | Tran | 1350 / 2044 produced |
| SP0 | Vert | 650 / 1526 produced |

What's still open: the 30 NN block format. These blocks appear in high-amplitude regions (deltas exceeding what int8 can express). The decoder currently steps over them, which is fine for quiet stretches but breaks the running cumulative when a 30 NN block carries deltas for samples we need. Cracking this is the last major piece.

Production code in minimateplus/client.py:_decode_a5_waveform still uses the broken legacy int16 LE decoder. Sample arrays it writes to the .h5 sidecars are wrong and must be treated as "unverified" by all downstream consumers. The BW binary write path (blastware_file.py) is unaffected — it's pure passthrough and remains byte-perfect.

What's solved

Block framing

| Tag | Length | Meaning |
|---|---|---|
| 10 NN | NN/2 + 2 bytes | 4-bit nibble deltas (2 per byte; high nibble first; signed: 0..7 = 0..+7, 8..F = -8..-1) |
| 20 NN | NN + 2 bytes | int8 signed deltas (1 per byte) |
| 00 NN | 2 bytes | RLE: append NN copies of current value |
| 30 NN | NN*2 in data section, NN*4 in trailer | Unknown content. Only in loud-from-start events. |
| 40 02 | 20 bytes (fixed) | Segment header |

NN is always a multiple of 4.

Implementation: walk_body() in minimateplus/waveform_codec.py.
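
As an illustration of the length rules in the table above, here is a minimal walker sketch. It is not the real walk_body(): the names Block, block_length and iter_blocks are hypothetical, lengths are taken to include the 2-byte tag, and only the data-section rule for 30 NN (NN*2) is applied, so it would mis-step inside the trailer.

```python
from typing import Iterator, NamedTuple


class Block(NamedTuple):
    offset: int     # offset of the tag byte within the stream
    tag: int        # first tag byte: 0x00, 0x10, 0x20, 0x30 or 0x40
    nn: int         # second tag byte
    payload: bytes  # everything after the 2-byte tag


def block_length(tag: int, nn: int) -> int:
    """Total block length in bytes, including the 2-byte tag."""
    if tag == 0x10:
        return nn // 2 + 2      # two 4-bit deltas per byte
    if tag == 0x20:
        return nn + 2           # one int8 delta per byte
    if tag == 0x00:
        return 2                # RLE: the tag is the whole block
    if tag == 0x30:
        return nn * 2           # data-section rule only (trailer uses NN*4)
    if tag == 0x40 and nn == 0x02:
        return 20               # fixed-size segment header
    raise ValueError(f"unknown tag {tag:02x} {nn:02x}")


def iter_blocks(stream: bytes, start: int = 0) -> Iterator[Block]:
    """Yield contiguous blocks; pass start=7 to skip the body preamble."""
    pos = start
    while pos + 2 <= len(stream):
        tag, nn = stream[pos], stream[pos + 1]
        length = block_length(tag, nn)
        if pos + length > len(stream):
            raise ValueError(f"block at offset {pos} runs past end of stream")
        yield Block(pos, tag, nn, stream[pos + 2 : pos + length])
        pos += length
```

Because each block starts exactly where the previous one ended, contiguity (no gaps or overlaps) holds by construction; an unknown tag raises instead of guessing a length.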

7-byte preamble

body[0:3]  = 00 02 00              magic
body[3:5]  = Tran[0]   int16 BE    in 16-count units (LSB = 0.005 in/s)
body[5:7]  = Tran[1]   int16 BE    in 16-count units
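
A minimal sketch of reading this layout with the standard struct module. The function name parse_preamble is hypothetical, and the returned values are left in the 16-count units described above with no conversion to in/s.

```python
import struct

PREAMBLE_MAGIC = bytes.fromhex("000200")


def parse_preamble(body: bytes) -> tuple[int, int]:
    """Return (Tran[0], Tran[1]) from the 7-byte body preamble."""
    if body[:3] != PREAMBLE_MAGIC:
        raise ValueError(f"unexpected preamble magic: {body[:3].hex()}")
    # Two big-endian signed 16-bit anchors at offsets 3 and 5.
    tran0, tran1 = struct.unpack_from(">hh", body, 3)
    return tran0, tran1
```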

Tran channel, segment 0

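Segment 0 here means the initial body: everything before the first 40 02 header. (The channel-rotation table further down numbers segments by header index instead, so its "segment 0" is the first Vert segment.) The initial body encodes Tran samples only. Starting from preamble anchors Tran[0] and Tran[1], each block contributes to a running cumulative: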

  • 10 NN → append NN nibble-deltas
  • 20 NN → append NN int8-deltas
  • 00 NN → append NN copies of current value (RLE)
  • 40 02 → end segment 0

Verified byte-exact:

| Event | Description | Segment 0 size | Match |
|---|---|---|---|
| M529LL1A.SP0 | Loud, 0.25 s pretrig | 510 | 510/510 ✓ |
| M529LL1A.SV0 | Loud from sample 0 | 58 | 58/58 ✓ (stops at first 30 NN) |
| M529LL1A.SS0 | Loud from sample 0 | 42 | 42/42 ✓ (stops at first 30 04) |
| M529LL1L.JQ0 | Vert-heavy | 510 | 510/510 ✓ |
| M529LL1L.V70 | Mic-heavy (140 dB) | 510 | 510/510 ✓ |

Implementation: decode_tran_initial().
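
The accumulation rules above can be sketched as follows. This is an illustrative stand-in, not the code in decode_tran_initial(): the function names are hypothetical, deltas are assumed to be in the same 16-count units as the anchors, and the loop simply stops at the first 40 02 or 30 NN tag.

```python
import struct


def nibble_to_delta(nibble: int) -> int:
    """Map a 4-bit value to a signed delta: 0..7 -> 0..+7, 8..F -> -8..-1."""
    return nibble - 16 if nibble >= 8 else nibble


def decode_initial_tran(body: bytes) -> list[int]:
    """Decode Tran samples from the preamble up to the first 40 02 / 30 NN."""
    if body[:3] != b"\x00\x02\x00":
        raise ValueError("bad preamble magic")
    samples = list(struct.unpack_from(">hh", body, 3))  # Tran[0], Tran[1]
    pos = 7
    while pos + 2 <= len(body):
        tag, nn = body[pos], body[pos + 1]
        if tag == 0x10:                      # NN nibble deltas in NN/2 bytes
            data = body[pos + 2 : pos + 2 + nn // 2]
            for byte in data:
                for nibble in (byte >> 4, byte & 0x0F):  # high nibble first
                    samples.append(samples[-1] + nibble_to_delta(nibble))
            pos += 2 + nn // 2
        elif tag == 0x20:                    # NN int8 deltas
            data = body[pos + 2 : pos + 2 + nn]
            for delta in struct.unpack(f">{len(data)}b", data):
                samples.append(samples[-1] + delta)
            pos += 2 + nn
        elif tag == 0x00:                    # RLE: repeat current value NN times
            samples.extend([samples[-1]] * nn)
            pos += 2
        else:                                # 40 02 ends the segment; 30 NN is undecoded
            break
    return samples
```

A full initial segment is expected to yield 510 values (2 anchors plus 508 deltas); events that hit a 30 NN block early return fewer, matching the SV0 and SS0 rows in the table.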

Segment header (40 02, 20 bytes total) — REWRITTEN 2026-05-11

| Payload offset | Field | Status |
|---|---|---|
| [0:2] | Previous-channel delta: 1st extension sample (int16 BE) | confirmed |
| [2:4] | Previous-channel delta: 2nd extension sample (int16 BE) | confirmed |
| [4:6] | Unknown (likely checksum) | open |
| [6:8] | Byte length to next segment header 2 (uint16 BE) | confirmed |
| [8:12] | Monotonic uint32 LE counter (starts ~0x47) | confirmed |
| [12:14] | Constant 02 00 | confirmed |
| [14:16] | THIS segment's channel: sample 0 anchor (int16 BE, 16-count units) | confirmed |
| [16:18] | THIS segment's channel: sample 1 anchor (int16 BE, 16-count units) | confirmed |
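
A hedged sketch of unpacking these fields, assuming the payload offsets are relative to the byte after the 2-byte 40 02 tag. The SegmentHeader type and parse_segment_header name are hypothetical; the [4:6] field is returned as raw bytes because its meaning is still open.

```python
import struct
from typing import NamedTuple


class SegmentHeader(NamedTuple):
    prev_deltas: tuple[int, int]  # payload [0:4], extends the previous channel
    unknown: bytes                # payload [4:6], meaning still open
    length_field: int             # payload [6:8], uint16 BE
    counter: int                  # payload [8:12], uint32 LE
    anchors: tuple[int, int]      # payload [14:18], new channel's samples 0 and 1


def parse_segment_header(block: bytes) -> SegmentHeader:
    """Parse one 20-byte 40 02 block (tag included)."""
    if len(block) != 20 or block[:2] != b"\x40\x02":
        raise ValueError("not a 20-byte 40 02 segment header")
    payload = block[2:]
    if payload[12:14] != b"\x02\x00":
        raise ValueError("constant 02 00 missing at payload [12:14]")
    d0, d1 = struct.unpack_from(">hh", payload, 0)
    (length_field,) = struct.unpack_from(">H", payload, 6)
    (counter,) = struct.unpack_from("<I", payload, 8)   # note: little-endian
    a0, a1 = struct.unpack_from(">hh", payload, 14)
    return SegmentHeader((d0, d1), payload[4:6], length_field, counter, (a0, a1))
```

The mixed byte order is deliberate here: the table lists the counter as LE while every other multi-byte field is BE.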

Key insight (2026-05-11 late): every segment carries 510 main samples (2 anchors + 508 deltas) plus 2 continuation samples that live in the NEXT segment header, so each channel-segment effectively spans 512 samples. The continuation lives in the next header because that header is also the channel-switch point: it is a natural place to extend the channel being left before starting the channel being entered.

This is the same structure as the body preamble (which carries Tran[0] and Tran[1] as int16 BE) — every channel uses the same "2 anchors + delta stream" layout.

Channel rotation — VERIFIED 2026-05-11

(initial body)  →  Tran samples 0..509       (preamble + delta blocks)
segment 0 hdr  ext+anchor →  Vert samples 0..511   ← anchor in hdr [14:18]
segment 1 hdr  ext+anchor →  Long samples 0..511
segment 2 hdr  ext+anchor →  Mic  samples 0..511
segment 3 hdr  ext+anchor →  Tran samples 510..1021 (continuation)
segment 4 hdr  ext+anchor →  Vert samples 512..1023
segment 5 hdr  ext+anchor →  Long samples 512..1023
segment 6 hdr  ext+anchor →  Mic  samples 512..1023
segment 7 hdr  ext+anchor →  Tran samples 1022..1533
...

Implementation: decode_waveform_v2() returns {"Tran": [...], "Vert": [...], "Long": [...], "MicL": [...]} with each channel's samples in 16-count units. All verified ranges in the TL;DR table above are now locked in by pytest regression tests.
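
The rotation in the table above reduces to a modulo on the header index. A small sketch, with the hypothetical helper name channel_for_header and headers numbered from 0 as in the table:

```python
CHANNELS = ("Tran", "Vert", "Long", "MicL")


def channel_for_header(header_index: int) -> str:
    """Channel that the segment after the given 40 02 header belongs to.

    The initial body (before any header) is Tran, so header 0 starts Vert.
    """
    return CHANNELS[(header_index + 1) % len(CHANNELS)]


def previous_channel_for_header(header_index: int) -> str:
    """Channel extended by payload bytes [0:4] of the given header."""
    return CHANNELS[header_index % len(CHANNELS)]
```

This agrees with the analyzer's best-fit results for SP0 (headers 0, 1, 3, 5 and 9 scoring Vert, Long, Tran, Long and Long).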

What's still open

  1. 30 NN block content. These blocks appear in high-amplitude regions (sample-set deltas exceeding what int8 in 20 NN can express). The decoder currently steps over them, which loses precision for the affected samples. Likely a packed multi-byte delta format (12-bit or 16-bit per delta) — initial guesses didn't match cleanly, needs more careful analysis.

  2. MicL decoding. The mic channel's anchor pair appears in the third segment header of each rotation cycle (header indices 2, 6, …), in the same format as the geo channels. However, the BW ASCII export reports mic in dB(L) (~6 dB quantization steps), so a direct integer comparison against ADC units doesn't work. Either the ADC-counts → dB(L) conversion needs to be worked out, or the mic ADC counts need to be pulled from somewhere else in the file format.

  3. Walker fix for event-b. The original quiet bundle's event-b still bails out partway through. Lower priority since the other 7 events walk cleanly.

Next experiment — crack the 30 NN block

The scoring analyzer in scratch/next_experiment_skeleton.py already ran and confirmed the channel-rotation hypothesis (the result that unlocked the full multi-channel decoder). The next open piece is the 30 NN block format.

Approach:

  1. Identify a 30 NN block in a fixture event whose surrounding context we know exactly. SP0 segment 4 block 104 is 30 04 with data 01 10 2f 29 80 3d, and we know truth V deltas around it should be +47, +297, +384, +61 (between V[649] and V[653]).
  2. Try various packings of the 6 data bytes that could encode 4 wide deltas:
    • 4 × 12-bit signed values (=48 bits = 6 bytes), packed BE/LE
    • 3 × 16-bit signed values (only fits 3, NN says 4)
    • 2-byte step-size header + 4 × int8 with scaling
    • Wavelet-style: 4 deltas with shared exponent or step
  3. Initial brute-force found +47 and +61 in positions 1 and 3 of a 12-bit BE packing, but +297 and +384 didn't fit cleanly. Worth re-trying with more permutations.

Once cracked, the 30 NN decoder slots into decode_waveform_v2 and the multi-channel decode extends past the high-amplitude regions.

Test fixtures

Committed under tests/fixtures/:

  • decode-re-5-8-26/event-a..event-d/: original quiet bundle (4 events, PPV < 1 in/s). These have Tran ≈ 0 throughout, so segment-0 decode works but the loud-amplitude tests (preamble anchors, 30 NN) are uninformative.
  • 5-11-26/M529LL1A.{SP0,SS0,SV0}: loud bundle (PPV 6-7 in/s on all channels). These cracked the Tran codec.
  • 5-11-26/M529LL1L.{JQ0,V70}: targeted captures. JQ0 is Vert-heavy, V70 is Mic-heavy (140 dB). These cracked the 00 NN RLE rule.

Each fixture has a .TXT Blastware ASCII export as ground truth.

Tests

tests/test_waveform_codec.py (46 tests, all passing) locks in:

  • Block framing (5 tag types with correct lengths).
  • Walker contiguity (no gaps or overlaps).
  • Segment header parsing (counter monotonicity, fixed-pattern check).
  • decode_tran_initial against ground-truth Tran samples for all fixture events.
  • decode_waveform_v2 against the verified per-channel ranges listed in the TL;DR table.

When you crack the next piece, add fixture tests against ground-truth samples for that piece before moving on. Don't let unverified code ship without a regression lock-in.