# Waveform body codec — current working status (2026-05-11)

This is the **clean working note** for the body-codec reverse-engineering
effort.  It supersedes scattered claims elsewhere when they conflict.
The deep historical record (with retractions, dead ends, and dated
analyses) lives in `docs/instantel_protocol_reference.md` §7.6.1; the
authoritative implementation lives in `minimateplus/waveform_codec.py`.

## TL;DR

The Blastware (BW) waveform-file body is a **tagged variable-length block
stream**, NOT raw int16 LE samples.  Block framing is solved.  Tran
channel segment-0 decoding is solved (byte-exact against BW's ASCII export
across all 5 high-amplitude fixture events).  Multi-segment continuation
and the Vert / Long / MicL channel decoders are still open.

**Production code in `minimateplus/client.py:_decode_a5_waveform` still
uses the broken legacy int16 LE decoder.**  Sample arrays it writes to
the `.h5` sidecars are wrong and must be treated as "unverified" by all
downstream consumers.  The BW binary write path (`blastware_file.py`)
is unaffected — it's pure passthrough and remains byte-perfect.

## What's solved

### Block framing

| Tag      | Length                                | Meaning                                                                                   |
|----------|---------------------------------------|-------------------------------------------------------------------------------------------|
| `10 NN`  | NN/2 + 2 bytes                        | 4-bit nibble deltas (2 per byte, high nibble first; signed: 0..7 = 0..+7, 8..F = -8..-1)  |
| `20 NN`  | NN + 2 bytes                          | int8 signed deltas (1 per byte)                                                           |
| `00 NN`  | 2 bytes                               | RLE: append NN copies of current value                                                    |
| `30 NN`  | NN × 2 in data section, NN × 4 in trailer | Unknown content.  Only in loud-from-start events.                                     |
| `40 02`  | 20 bytes (fixed)                      | Segment header                                                                            |

Lengths for `10 NN`, `20 NN`, `00 NN`, and `40 02` include the 2-byte tag.
NN is always a multiple of 4 (the fixed `40 02` tag aside).

Implementation: `walk_body()` in `minimateplus/waveform_codec.py`.
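
The framing rules in the table can be sketched as a small walker.  This is
an illustrative sketch, not the code in `walk_body()`: the name
`walk_blocks`, the tuple layout, and the assumption that blocks start right
after the 7-byte preamble are all hypothetical.  It also stops at the first
`30 NN` instead of stepping over it, because that block's length depends on
whether it sits in the data section or the trailer.

```python
from typing import Iterator, Tuple

SEGMENT_HEADER_LEN = 20  # `40 02` block, tag included


def walk_blocks(body: bytes, start: int = 7) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (offset, tag, nn, total_length) for each framed block.

    Assumption: blocks begin at offset 7, right after the preamble.
    Stops at the first `30 NN` or unrecognised tag.
    """
    pos = start
    while pos + 2 <= len(body):
        tag, nn = body[pos], body[pos + 1]
        if tag == 0x10:
            length = nn // 2 + 2      # two 4-bit deltas per payload byte
        elif tag == 0x20:
            length = nn + 2           # one int8 delta per payload byte
        elif tag == 0x00:
            length = 2                # RLE carries no payload
        elif tag == 0x40 and nn == 0x02:
            length = SEGMENT_HEADER_LEN
        else:
            return                    # `30 NN` or unknown: not handled here
        if pos + length > len(body):
            raise ValueError(f"block at offset {pos} runs past end of body")
        yield pos, tag, nn, length
        pos += length
```

Each yielded block starts exactly where the previous one ended, which is
the contiguity property the walker tests check.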

### 7-byte preamble

```
body[0:3]  = 00 02 00              magic
body[3:5]  = Tran[0]   int16 BE    in 16-count units (LSB = 0.005 in/s)
body[5:7]  = Tran[1]   int16 BE    in 16-count units
```
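
A minimal sketch of reading that layout with the standard `struct` module.
The function name `parse_preamble` is hypothetical, and the values are
returned as raw int16 without applying any unit scaling.

```python
import struct
from typing import Tuple

PREAMBLE_MAGIC = b"\x00\x02\x00"
PREAMBLE_LEN = 7


def parse_preamble(body: bytes) -> Tuple[int, int]:
    """Return the raw (Tran[0], Tran[1]) anchors from the 7-byte preamble."""
    if len(body) < PREAMBLE_LEN:
        raise ValueError("body shorter than the 7-byte preamble")
    if body[0:3] != PREAMBLE_MAGIC:
        raise ValueError(f"unexpected preamble magic: {body[0:3].hex()}")
    # Two consecutive big-endian signed 16-bit integers at offsets 3 and 5.
    tran0, tran1 = struct.unpack_from(">hh", body, 3)
    return tran0, tran1
```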

### Tran channel, segment 0

Segment 0 (everything before the first `40 02`) encodes Tran samples
only.  Starting from the preamble anchors Tran[0] and Tran[1], each block
extends a running cumulative sum:

- `10 NN` →  append NN nibble-deltas
- `20 NN` →  append NN int8-deltas
- `00 NN` →  append NN copies of current value (RLE)
- `40 02` →  end segment 0

Verified byte-exact:

| Event | Description | Segment 0 size (samples) | Match |
|---|---|---|---|
| `M529LL1A.SP0` | Loud, 0.25 s pretrig | 510 | 510/510 ✓ |
| `M529LL1A.SV0` | Loud from sample 0 | 58 | 58/58 ✓ (stops at first `30 NN`) |
| `M529LL1A.SS0` | Loud from sample 0 | 42 | 42/42 ✓ (stops at first `30 04`) |
| `M529LL1L.JQ0` | Vert-heavy | 510 | 510/510 ✓ |
| `M529LL1L.V70` | Mic-heavy (140 dB) | 510 | 510/510 ✓ |

Implementation: `decode_tran_initial()`.
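
The segment-0 rules can be sketched end to end as below.  This is a
simplified illustration rather than the body of `decode_tran_initial()`:
the name `decode_segment0_sketch` is hypothetical, and it assumes that
deltas are first-order differences in the same units as the preamble
anchors and that decoding simply stops at the first `40 02` or `30 NN`.

```python
import struct
from typing import List


def signed_nibble(n: int) -> int:
    """Map a 4-bit value to signed: 0..7 stay as-is, 8..F become -8..-1."""
    return n - 16 if n >= 8 else n


def decode_segment0_sketch(body: bytes) -> List[int]:
    """Decode Tran samples from the preamble up to the first non-delta block."""
    if body[0:3] != b"\x00\x02\x00":
        raise ValueError("unexpected preamble magic")
    tran0, tran1 = struct.unpack_from(">hh", body, 3)
    samples = [tran0, tran1]
    current = tran1
    pos = 7
    while pos + 2 <= len(body):
        tag, nn = body[pos], body[pos + 1]
        pos += 2
        if tag == 0x10:
            # NN nibble deltas packed two per byte, high nibble first.
            for byte in body[pos:pos + nn // 2]:
                for nib in (byte >> 4, byte & 0x0F):
                    current += signed_nibble(nib)
                    samples.append(current)
            pos += nn // 2
        elif tag == 0x20:
            # NN signed 8-bit deltas, one per byte.
            for delta in struct.unpack_from(f"{nn}b", body, pos):
                current += delta
                samples.append(current)
            pos += nn
        elif tag == 0x00:
            # RLE: repeat the current value NN times, no payload.
            samples.extend([current] * nn)
        else:
            break  # `40 02` ends segment 0; `30 NN` is not decoded here
    return samples
```

On a real segment 0 with no `30 NN` block, the result should hold the two
anchors plus 508 delta-derived samples, 510 in total.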

### Segment header (`40 02`, 20 bytes total)

| Payload offset | Field | Status |
|---|---|---|
| [0:2] | T_delta at first sample of new segment (int16 BE) | ✅ confirmed |
| [2:4] | Likely T_delta at sample seg_start+1 | 🟡 likely |
| [4:6] | Unknown (possibly checksum) | ❓ open |
| [6:8] | Byte length to next segment header − 2 (uint16 BE) | ✅ confirmed |
| [8:12] | Monotonic uint32 LE counter (starts ~0x47) | ✅ confirmed |
| [12:14] | Constant `02 00` | ✅ confirmed |
| [14:18] | Unknown 4-byte field | ❓ open |
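
As a sketch, the table maps onto a `struct`-based parser like the one
below.  The `SegmentHeader` class and its field names are hypothetical.
Unknown fields are kept as raw bytes so nothing is interpreted beyond what
the table states, and the length field is returned as stored.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class SegmentHeader:
    t_delta_first: int     # payload [0:2], int16 BE (confirmed)
    t_delta_second: int    # payload [2:4], int16 BE (likely, unconfirmed)
    unknown_4_6: bytes     # payload [4:6], open
    length_field: int      # payload [6:8], uint16 BE, as stored
    counter: int           # payload [8:12], uint32 LE
    constant: bytes        # payload [12:14], expected 02 00
    unknown_14_18: bytes   # payload [14:18], open


def parse_segment_header(block: bytes) -> SegmentHeader:
    """Parse one 20-byte `40 02` block (2-byte tag + 18-byte payload)."""
    if len(block) != 20 or block[0:2] != b"\x40\x02":
        raise ValueError("not a 20-byte `40 02` segment header")
    payload = block[2:]
    t_first, t_second = struct.unpack_from(">hh", payload, 0)
    (length_field,) = struct.unpack_from(">H", payload, 6)
    (counter,) = struct.unpack_from("<I", payload, 8)
    return SegmentHeader(
        t_delta_first=t_first,
        t_delta_second=t_second,
        unknown_4_6=payload[4:6],
        length_field=length_field,
        counter=counter,
        constant=payload[12:14],
        unknown_14_18=payload[14:18],
    )
```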

## What's still open

1. **Multi-segment Tran continuation.**  After segment 0, applying
   segment 1's blocks as Tran continuation diverges from truth by
   sample ~512.  Block structure is identical to segment 0 and the
   per-segment delta budget matches the segment size, but the
   per-sample trajectory is wrong.

2. **Vert / Long / MicL channel decoders.**  No verified decoder for
   any non-Tran channel.

3. **`30 NN` block content.**  Only appears in loud-from-start events.
   Probably a channel-switch or alternative-encoding marker for
   high-amplitude regions.  The walker steps over it without decoding.

## Strongest unverified hypothesis

Segments rotate channels:

```
segment 0  →  Tran samples 0..509
segment 1  →  Vert samples 0..507
segment 2  →  Long samples 0..507
segment 3  →  MicL samples 0..507
segment 4  →  Tran samples 510..N (continuation)
...
```

This would explain:
- Why segment-0 = Tran works perfectly.
- Why segment 1 has the same block structure but applying it as Tran
  continuation gives wrong values.
- Why the per-segment delta budget matches the segment size for a
  *single* channel (508 deltas per segment, not 4 × 508).

Not yet verified because the per-channel anchor at segment-start isn't
identified in the segment header.  Bytes [4:6] and [14:18] of the
header are the prime candidates.

## Next experiment — segment-channel scoring analyzer

Don't try to hero-code the full decoder.  Instead, build a small
analysis tool that:

1. For each segment in every fixture event, runs the segment-0 Tran
   decoder (block-walk + RLE) and produces a cumulative trajectory
   of 508 deltas.
2. Scores that trajectory against the BW ASCII truth for *each* of
   {Tran, Vert, Long, MicL} over the segment's sample range, starting
   from different anchor-byte candidates from the segment header.
3. Reports which (channel, anchor-bytes-location) combination produces
   the lowest error for each segment.

If the rotation hypothesis is right, segment 0 should clearly score
best against Tran, segment 1 against Vert, etc.  The winning
anchor-bytes-location will reveal which segment-header bytes encode
the per-segment channel anchors.

If the rotation hypothesis is *not* right, the scorer will at least
narrow down what segment 1 actually carries.
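
A minimal sketch of the scoring core, under stated assumptions: the
function names, the anchor labels, and the choice of mean absolute error
as the metric are all hypothetical, and the anchor is treated as the first
sample of the segment.  It takes deltas that have already been extracted
by the block walk, so it stays independent of the framing code.

```python
from itertools import accumulate
from typing import Dict, List, Sequence, Tuple


def trajectory(anchor: int, deltas: Sequence[int]) -> List[int]:
    """Anchor value followed by the running sum of the deltas."""
    return list(accumulate(deltas, initial=anchor))


def mean_abs_error(candidate: Sequence[int], truth: Sequence[int]) -> float:
    """Mean absolute difference over the overlapping samples."""
    n = min(len(candidate), len(truth))
    if n == 0:
        return float("inf")
    return sum(abs(c - t) for c, t in zip(candidate, truth)) / n


def best_match(
    deltas: Sequence[int],
    anchors: Dict[str, int],
    truth: Dict[str, Sequence[int]],
    start: int,
) -> Tuple[float, str, str]:
    """Return (error, channel, anchor_label) for the lowest-error combination.

    `anchors` maps a label for a candidate header location to its value;
    `truth` maps channel name to the full ground-truth sample list;
    `start` is the assumed first sample index of this segment.
    """
    results = []
    for channel, samples in truth.items():
        window = samples[start:start + len(deltas) + 1]
        for label, anchor in anchors.items():
            error = mean_abs_error(trajectory(anchor, deltas), window)
            results.append((error, channel, label))
    return min(results)
```

Running `best_match` once per segment and tabulating the winning channel
against the segment index would show directly whether the winners follow
the Tran, Vert, Long, MicL order that the rotation hypothesis predicts.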

## Test fixtures

Committed under `tests/fixtures/`:

- `decode-re-5-8-26/event-a..event-d/`: original quiet bundle (4 events,
  PPV < 1 in/s).  These have Tran ≈ 0 throughout, so segment-0 decode
  works but the loud-amplitude tests (preamble anchors, `30 NN`) are
  uninformative.
- `5-11-26/M529LL1A.{SP0,SS0,SV0}`: loud bundle (PPV 6-7 in/s on all
  channels).  These cracked the Tran codec.
- `5-11-26/M529LL1L.{JQ0,V70}`: targeted captures.  JQ0 is Vert-heavy,
  V70 is Mic-heavy (140 dB).  These cracked the `00 NN` RLE rule.

Each fixture has a `.TXT` Blastware ASCII export as ground truth.

## Tests

`tests/test_waveform_codec.py` (40 tests, all passing) locks in:

- Block framing (5 tag types with correct lengths).
- Walker contiguity (no gaps or overlaps).
- Segment header parsing (counter monotonicity, fixed-pattern check).
- `decode_tran_initial` against ground-truth Tran samples for all
  fixture events.

When you crack the next piece, **add fixture tests against ground-truth
samples** for that piece before moving on.  Don't let unverified code
ship without a regression lock-in.