seismo-relay/docs/waveform_codec_re_status.md
Claude f68ee9f0f9 docs: clean up waveform-codec doc layers per review
Three "truth layers" had drifted apart between commits.  Fixed:

1. waveform_codec.py docstring rewritten from the 2026-05-08
   "structural framing only" state to the 2026-05-11 "Tran segment 0
   solved + segment-header partially decoded" state.  Killed stale
   "~80 sample-sets per segment" language (real segments are
   flash-page-byte-sized, not sample-count-sized; observed first-segment
   sizes are 42-510 samples depending on signal).  Killed stale
   "preamble is 7 or 9 bytes" language (always 7).

2. docs/instantel_protocol_reference.md §7.6.1: added a clear
   "CURRENT STATUS" box at the top with a status table.  Replaced the
   stale "~80 sample-sets" line with the verified per-event segment
   sizes.  Merged two redundant segment-header field-table sections.

3. docs/waveform_codec_re_status.md (NEW): clean working-status doc.
   Solved / not solved / hypothesis / next experiment / fixtures /
   tests.  The protocol reference remains the historical Rosetta
   Stone; this new file is the current-truth working note that
   shouldn't accumulate fossil layers.

4. CLAUDE.md §"Waveform body codec": prominent warning box at top —
   "DO NOT TRUST decoded sample arrays yet."  BW binary passthrough
   is the only sample-bearing output to trust until the decoder
   lands.  Added a "Next experiment" subsection pointing the next
   pass at the segment-channel scoring analyzer.

40 tests still pass.
2026-05-20 17:28:54 +00:00


Waveform body codec — current working status (2026-05-11)

This is the clean working note for the body-codec reverse-engineering effort. It supersedes scattered claims elsewhere when they conflict. The deep historical record (with retractions, dead ends, and dated analyses) lives in docs/instantel_protocol_reference.md §7.6.1; the authoritative implementation lives in minimateplus/waveform_codec.py.

TL;DR

The Blastware waveform-file body is a tagged variable-length block stream, NOT raw int16 LE samples. Block framing is solved. Tran channel segment-0 decoding is solved (byte-exact vs BW's ASCII export across all 5 high-amplitude fixture events). Multi-segment continuation and the Vert / Long / MicL channel decoders are still open.

Production code in minimateplus/client.py:_decode_a5_waveform still uses the broken legacy int16 LE decoder. Sample arrays it writes to the .h5 sidecars are wrong and must be treated as "unverified" by all downstream consumers. The BW binary write path (blastware_file.py) is unaffected — it's pure passthrough and remains byte-perfect.

What's solved

Block framing

| Tag | Length | Meaning |
|-------|--------|---------|
| 10 NN | NN/2 + 2 bytes | 4-bit nibble deltas (2 per byte; high nibble first; signed: 0..7 = 0..+7, 8..F = -8..-1) |
| 20 NN | NN + 2 bytes | int8 signed deltas (1 per byte) |
| 00 NN | 2 bytes | RLE: append NN copies of current value |
| 30 NN | NN*2 in data section, NN*4 in trailer | Unknown content. Only in loud-from-start events. |
| 40 02 | 20 bytes (fixed) | Segment header |

NN is always a multiple of 4.

Implementation: walk_body() in minimateplus/waveform_codec.py.
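The length rules in the table can be sketched as a small block iterator. This is an illustration only, not the real walk_body(): the names Block and iter_blocks are hypothetical, it assumes the first block starts right after the 7-byte preamble, and it stops at a 30 NN block rather than guessing which of the two length rules applies.

```python
from typing import Iterator, NamedTuple


class Block(NamedTuple):
    offset: int     # byte offset of the 2-byte tag within the body
    tag: int        # first tag byte (0x10, 0x20, 0x00 or 0x40)
    nn: int         # second tag byte
    payload: bytes  # bytes after the 2-byte tag (empty for 00 NN)


def iter_blocks(body: bytes, start: int = 7) -> Iterator[Block]:
    """Yield blocks until the body ends or an undecodable tag is reached.

    Assumption: `start` = 7, i.e. blocks begin right after the preamble.
    """
    pos = start
    while pos + 2 <= len(body):
        tag, nn = body[pos], body[pos + 1]
        if tag == 0x10:
            size = nn // 2 + 2      # NN nibbles packed two per byte
        elif tag == 0x20:
            size = nn + 2           # NN int8 deltas
        elif tag == 0x00:
            size = 2                # RLE block has no payload
        elif tag == 0x40 and nn == 0x02:
            size = 20               # fixed-size segment header
        else:
            # 30 NN (length depends on data section vs trailer) or an
            # unknown tag: stop instead of guessing.
            return
        if pos + size > len(body):
            raise ValueError(f"block at offset {pos} runs past end of body")
        yield Block(pos, tag, nn, body[pos + 2 : pos + size])
        pos += size
```

Because each block starts exactly where the previous one ended, the yielded offsets are contiguous by construction, which is the property the walker tests check.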

7-byte preamble

body[0:3]  = 00 02 00              magic
body[3:5]  = Tran[0]   int16 BE    in 16-count units (LSB = 0.005 in/s)
body[5:7]  = Tran[1]   int16 BE    in 16-count units
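A minimal sketch of reading that layout with the standard struct module. The function name parse_preamble is hypothetical, and the values are returned as raw int16 without any unit conversion.

```python
import struct
from typing import Tuple

PREAMBLE_MAGIC = b"\x00\x02\x00"


def parse_preamble(body: bytes) -> Tuple[int, int]:
    """Return (Tran[0], Tran[1]) as raw big-endian int16 values."""
    if len(body) < 7:
        raise ValueError("body shorter than the 7-byte preamble")
    if body[0:3] != PREAMBLE_MAGIC:
        raise ValueError(f"unexpected preamble magic: {body[0:3].hex()}")
    # ">hh" = two signed 16-bit integers, big-endian
    tran0, tran1 = struct.unpack(">hh", body[3:7])
    return tran0, tran1
```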

Tran channel, segment 0

Segment 0 (everything before the first 40 02) encodes Tran samples only. Starting from the preamble anchors Tran[0] and Tran[1], each block extends a running cumulative sum:

  • 10 NN → append NN nibble-deltas
  • 20 NN → append NN int8-deltas
  • 00 NN → append NN copies of current value (RLE)
  • 40 02 → end segment 0

Verified byte-exact:

| Event | Description | Segment 0 size | Match |
|-------|-------------|----------------|-------|
| M529LL1A.SP0 | Loud, 0.25 s pretrig | 510 | 510/510 ✓ |
| M529LL1A.SV0 | Loud from sample 0 | 58 | 58/58 ✓ (stops at first 30 NN) |
| M529LL1A.SS0 | Loud from sample 0 | 42 | 42/42 ✓ (stops at first 30 04) |
| M529LL1L.JQ0 | Vert-heavy | 510 | 510/510 ✓ |
| M529LL1L.V70 | Mic-heavy (140 dB) | 510 | 510/510 ✓ |

Implementation: decode_tran_initial().
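The segment-0 rules can be sketched as follows. This is not the real decode_tran_initial(): the helper names are hypothetical, blocks are passed as plain (tag, nn, payload) tuples, and it assumes that deltas are in the same units as the preamble anchors and that the first delta applies on top of Tran[1].

```python
from typing import Iterable, List, Tuple


def nibble_deltas(payload: bytes) -> List[int]:
    """Two signed 4-bit deltas per byte, high nibble first."""
    out = []
    for byte in payload:
        for nib in (byte >> 4, byte & 0x0F):
            out.append(nib - 16 if nib >= 8 else nib)  # 8..F -> -8..-1
    return out


def int8_deltas(payload: bytes) -> List[int]:
    """One signed 8-bit delta per byte."""
    return [b - 256 if b >= 128 else b for b in payload]


def decode_segment0(
    tran0: int, tran1: int, blocks: Iterable[Tuple[int, int, bytes]]
) -> List[int]:
    samples = [tran0, tran1]
    for tag, nn, payload in blocks:
        if tag == 0x10:
            deltas = nibble_deltas(payload)
        elif tag == 0x20:
            deltas = int8_deltas(payload)
        elif tag == 0x00:
            samples.extend([samples[-1]] * nn)  # RLE: repeat current value
            continue
        else:
            break  # 40 02 ends segment 0; 30 NN content is unknown
        for delta in deltas:
            samples.append(samples[-1] + delta)
    return samples
```

Stopping at the first 30 NN block mirrors the SV0 and SS0 rows in the table, where the verified run ends at that point.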

Segment header (40 02, 20 bytes total)

| Payload offset | Field | Status |
|----------------|-------|--------|
| [0:2] | T_delta at first sample of new segment (int16 BE) | confirmed |
| [2:4] | Likely T_delta at sample seg_start+1 | 🟡 likely |
| [4:6] | Unknown (possibly checksum) | open |
| [6:8] | Byte length to next segment header 2 (uint16 BE) | confirmed |
| [8:12] | Monotonic uint32 LE counter (starts ~0x47) | confirmed |
| [12:14] | Constant 02 00 | confirmed |
| [14:18] | Unknown 4-byte field | open |
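A sketch of unpacking the 18-byte header payload (the 20 bytes minus the 40 02 tag) according to the table. The SegmentHeader type and parse_segment_header are hypothetical names; the two open fields are kept as raw bytes so they stay available to later analysis, and [2:4] is read as int16 BE on the assumption that it matches [0:2].

```python
import struct
from typing import NamedTuple


class SegmentHeader(NamedTuple):
    t_delta_first: int    # [0:2]   int16 BE, confirmed
    t_delta_second: int   # [2:4]   assumed int16 BE, only "likely"
    unknown_4_6: bytes    # [4:6]   open (possibly checksum)
    length_field: int     # [6:8]   uint16 BE byte-length field
    counter: int          # [8:12]  uint32 LE monotonic counter
    const_ok: bool        # [12:14] equals 02 00
    unknown_14_18: bytes  # [14:18] open


def parse_segment_header(payload: bytes) -> SegmentHeader:
    if len(payload) != 18:
        raise ValueError(f"expected 18 payload bytes, got {len(payload)}")
    t_first, t_second = struct.unpack(">hh", payload[0:4])
    (length_field,) = struct.unpack(">H", payload[6:8])
    (counter,) = struct.unpack("<I", payload[8:12])  # note: little-endian
    return SegmentHeader(
        t_delta_first=t_first,
        t_delta_second=t_second,
        unknown_4_6=payload[4:6],
        length_field=length_field,
        counter=counter,
        const_ok=payload[12:14] == b"\x02\x00",
        unknown_14_18=payload[14:18],
    )
```

The mixed byte order (BE for the deltas and length, LE for the counter) is why the fields are unpacked in separate calls rather than with one format string.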

What's still open

  1. Multi-segment Tran continuation. After segment 0, applying segment 1's blocks as Tran continuation diverges from truth by sample ~512. Block structure is identical to segment 0 and the per-segment delta budget matches the segment size, but the per-sample trajectory is wrong.

  2. Vert / Long / MicL channel decoders. No verified decoder for any non-Tran channel.

  3. 30 NN block content. Only appears in loud-from-start events. Probably a channel-switch or alternative-encoding marker for high-amplitude regions. Walker steps over it without decoding.

Strongest unverified hypothesis

Segments rotate channels:

segment 0  →  Tran samples 0..509
segment 1  →  Vert samples 0..507
segment 2  →  Long samples 0..507
segment 3  →  MicL samples 0..507
segment 4  →  Tran samples 510..N (continuation)
...

This would explain:

  • Why segment-0 = Tran works perfectly.
  • Why segment 1 has the same block structure but applying it as Tran continuation gives wrong values.
  • Why the per-segment delta budget matches the segment size for a single channel (508 deltas per segment, not 4 × 508).

Not yet verified because the per-channel anchor at segment-start isn't identified in the segment header. Bytes [4:6] and [14:18] of the header are the prime candidates.
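Stated as code, the hypothesis is just a modulo-4 mapping from segment index to channel. The sketch below is hypothetical (names included) and encodes the unverified rotation only so that analysis results can be checked against it.

```python
from typing import Sequence

# Unverified hypothesis: segments cycle through the channels in this order.
ROTATION = ("Tran", "Vert", "Long", "MicL")


def hypothesized_channel(segment_index: int) -> str:
    """Channel a segment would carry if the rotation hypothesis holds."""
    return ROTATION[segment_index % len(ROTATION)]


def rotation_consistent(best_channels: Sequence[str]) -> bool:
    """True if the best-scoring channel of every segment follows the rotation.

    `best_channels[i]` is whichever channel an analysis found to fit
    segment i best.
    """
    return all(
        channel == hypothesized_channel(i)
        for i, channel in enumerate(best_channels)
    )
```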

Next experiment — segment-channel scoring analyzer

Don't try to hero-code the full decoder. Instead, build a small analysis tool that:

  1. For each segment in every fixture event, runs the segment-0 Tran decoder (block-walk + RLE) and produces a cumulative trajectory of 508 deltas.
  2. Scores that trajectory against the BW ASCII truth for each of {Tran, Vert, Long, MicL} over the segment's sample range, starting from different anchor-byte candidates from the segment header.
  3. Reports which (channel, anchor-bytes-location) combination produces the lowest error for each segment.

If the rotation hypothesis is right, segment 0 should clearly score best against Tran, segment 1 against Vert, etc. The winning anchor-bytes-location will reveal which segment-header bytes encode the per-segment channel anchors.

If the rotation hypothesis is not right, the scorer will at least narrow down what segment 1 actually carries.
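The core of such a scorer could look like the sketch below. Everything here is illustrative: the function names are hypothetical, the error metric (mean absolute difference) is one reasonable choice rather than a requirement, and it assumes the anchor is the value immediately before the first delta and that the caller has already sliced each truth channel to the segment's sample range in matching units.

```python
from typing import Dict, List, Sequence, Tuple


def trajectory(anchor: int, deltas: Sequence[int]) -> List[int]:
    """Cumulative sum of deltas on top of an anchor value."""
    out, current = [], anchor
    for delta in deltas:
        current += delta
        out.append(current)
    return out


def mean_abs_error(candidate: Sequence[int], truth: Sequence[int]) -> float:
    n = min(len(candidate), len(truth))
    if n == 0:
        return float("inf")
    return sum(abs(c - t) for c, t in zip(candidate, truth)) / n


def best_match(
    deltas: Sequence[int],
    truth_by_channel: Dict[str, Sequence[int]],
    anchors_by_location: Dict[str, int],
) -> Tuple[str, str, float]:
    """Return (channel, anchor location, error) with the lowest error."""
    results = []
    for channel, truth in truth_by_channel.items():
        for location, anchor in anchors_by_location.items():
            error = mean_abs_error(trajectory(anchor, deltas), truth)
            results.append((error, channel, location))
    error, channel, location = min(results)
    return channel, location, error
```

Running best_match once per segment and tabulating the winners gives the (channel, anchor-bytes-location) report described in step 3.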

Test fixtures

Committed under tests/fixtures/:

  • decode-re-5-8-26/event-a..event-d/: original quiet bundle (4 events, PPV < 1 in/s). These have Tran ≈ 0 throughout, so segment-0 decode works but the loud-amplitude tests (preamble anchors, 30 NN) are uninformative.
  • 5-11-26/M529LL1A.{SP0,SS0,SV0}: loud bundle (PPV 6-7 in/s on all channels). These cracked the Tran codec.
  • 5-11-26/M529LL1L.{JQ0,V70}: targeted captures. JQ0 is Vert-heavy, V70 is Mic-heavy (140 dB). These cracked the 00 NN RLE rule.

Each fixture has a .TXT Blastware ASCII export as ground truth.

Tests

tests/test_waveform_codec.py (40 tests, all passing) locks in:

  • Block framing (5 tag types with correct lengths).
  • Walker contiguity (no gaps or overlaps).
  • Segment header parsing (counter monotonicity, fixed-pattern check).
  • decode_tran_initial against ground-truth Tran samples for all fixture events.

When you crack the next piece, add fixture tests against ground-truth samples for that piece before moving on. Don't let unverified code ship without a regression lock-in.
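One way to make such a lock-in report useful failures is a helper that names the first diverging sample instead of only asserting list equality. This is a hypothetical sketch, not code from tests/test_waveform_codec.py; it compares only the decoded prefix, since a partial decode (for example segment 0 alone) is shorter than the full ground-truth channel.

```python
from typing import Optional, Sequence


def first_mismatch(decoded: Sequence[int], truth: Sequence[int]) -> Optional[int]:
    """Index of the first decoded sample that disagrees with truth.

    Returns None when every decoded sample matches. A decoded sample with
    no truth counterpart counts as a mismatch at that index.
    """
    for i, value in enumerate(decoded):
        if i >= len(truth) or value != truth[i]:
            return i
    return None


def assert_prefix_matches(
    decoded: Sequence[int], truth: Sequence[int], label: str
) -> None:
    idx = first_mismatch(decoded, truth)
    assert idx is None, f"{label}: first divergence at sample {idx}"
```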