Merge pull request 'merge full s3 codec decoded' (#23) from codec-re into main

Reviewed-on: #23
This commit was merged in pull request #23.
2026-05-20 13:45:32 -04:00
59 changed files with 20834 additions and 108 deletions
+234 -97
@@ -860,127 +860,264 @@ MicL: 39 64 1D AA = 0.0000875 psi
---
#### 7.6.1 Blast / Waveform mode — ❌ NOT VERIFIED (retracted 2026-05-08)
#### 7.6.1 Blast / Waveform mode — 🟡 PARTIAL DECODE (2026-05-11)
> ## ⚠️ RETRACTION (2026-05-08)
> ### 📌 CURRENT STATUS — read this first
>
> The "4-channel interleaved s16 LE, 8 bytes per sample-set" claim
> below was **never actually validated**. It got into this document
> because the decoder built around that assumption produced full-scale
> ±32K counts on every channel of the 4-2-26 capture, and the
> ±32K-shaped output was misread as "the signal must have saturated."
> The body codec is **partially decoded** as of 2026-05-11. This
> section contains both current-truth spec AND historical retractions;
> when in doubt, the working summary lives at
> `docs/waveform_codec_re_status.md`.
>
> Cross-checking the BW-reported peaks proves the opposite:
> | Item | Status |
> |---|---|
> | Body has tagged variable-length blocks, NOT raw int16 LE | ✅ confirmed |
> | 5 block tag types (10/20/00/30/40 NN) with lengths | ✅ confirmed |
> | 7-byte preamble: `00 02 00` + Tran[0] + Tran[1] int16 BE | ✅ confirmed |
> | `00 NN` = RLE for zero deltas in the current channel | ✅ confirmed |
> | Tran channel, segment 0 (~482-510 samples / event) | ✅ byte-exact, 5/5 events |
> | Multi-segment Tran continuation | ❌ open (breaks at sample ~512) |
> | Vert / Long / MicL channel decoders | ❌ open |
> | `30 NN` block content (loud-from-start events) | ❌ open |
> | Earlier "raw int16 LE, 8 bytes per sample-set" claim | ❌ REFUTED |
>
> | Channel | BW PPV (in/s) | Expected ADC counts at 10 in/s FS |
> |---|---|---|
> | Tran | 0.420 | **1,376** |
> | Vert | 3.870 | **12,686** |
> | Long | 0.495 | **1,622** |
>
> None of these are anywhere near ±32K saturation. No event in the
> project's archive (across all captures from 1-2-26 onward) has
> ever come close to saturation either. Yet the decoder has
> consistently produced ±32K-shaped noise on every event. The right
> conclusion is that the byte-to-sample interpretation has been wrong
> the whole time, NOT that every event happened to saturate.
>
> What's actually known about the body bytes:
>
> - The byte distribution is heavily skewed (24% `0x00`, 10.5% `0x10`,
> plus high frequencies of `0x01 / 0x04 / 0x0F / 0xF0 / 0xF1`). Lots
> of `10 XX` pairs. Reading them as LE int16 produces uniform ±32K
> noise — the signature of mis-aligned or encoded data.
> - The CHANGELOG note for v0.14.2 calls the body a "delta-encoded
> ADC stream" — that hint plus the byte distribution points toward
> a delta encoding with `0x10` as an escape marker, but no decoder
> has been worked out yet.
> - The histogram-mode codec in §7.6.2 IS verified and decoded
> correctly (different format: 32-byte blocks with 9× int16 LE
> samples + metadata). The same firmware emits both formats, so
> §7.6.2 may share encoding primitives with the waveform codec
> and is worth using as a structural hint when reverse-engineering.
>
> **Treat the spec below as a starting hypothesis to disprove, not
> ground truth.** The frame-layout pieces (STRT location, preamble,
> chunk header) appear correct; the per-byte sample interpretation
> is the open question.
> **Production code in `client.py:_decode_a5_waveform` still uses the
> broken int16 LE decoder.** The `.h5` sidecars SFM produces contain
> wrong sample values and must be treated as "unverified" downstream.
> The BW binary write path is unaffected (it's pure passthrough of the
> device's flash bytes, no decoding) and remains byte-perfect.
4-channel interleaved signed 16-bit little-endian, 8 bytes per sample-set:
The "4-channel interleaved s16 LE, 8 bytes per sample-set" claim that
appeared in earlier revisions of this section was never validated and
was wrong. No event in the project's archive ever came close to ADC
saturation, yet the int16 LE decoder consistently produced full-scale
±32K noise — that was the signature of mis-aligned encoded data, not
signal saturation.
##### Body file layout
A Blastware waveform-file body (the variable-length section between
the 21-byte STRT record and the 26-byte file footer) is composed of
**tagged variable-length blocks**, NOT raw int16 samples.
```
[T_lo T_hi V_lo V_hi L_lo L_hi M_lo M_hi] × N sample-sets
[preamble: 7 or 9 bytes]
[stream of tagged blocks]
[trailer: per-channel summary blocks]
```
- **T** = Transverse (Tran), **V** = Vertical (Vert), **L** = Longitudinal (Long), **M** = Microphone
- Channel order follows the Blastware convention: Tran is always first (ch[0]).
- Encoding: signed int16 little-endian. Full scale = ±32768 counts.
- Sample rate: set by compliance config (typical: 1024 Hz for blast monitoring).
- Each A5 frame chunk carries a different number of waveform bytes. Frame sizes
are NOT multiples of 8, so naive concatenation scrambles channel assignments at
frame boundaries. **Always track cumulative byte offset mod 8 to correct alignment.**
**A5[0] frame layout:**
**Preamble (CONFIRMED 2026-05-11 across 3+4 events):**
```
db[7:]: [11-byte header] [21-byte STRT record] [6-byte preamble] [waveform ...]
STRT: offset 11 in db[7:]
+0..3 b'STRT' magic
+8..9 uint16 BE total_samples (full-record expected sample-set count)
+16..17 uint16 BE pretrig_samples (pre-trigger window, in sample-sets)
+18 uint8 rectime_seconds
preamble: +19..20 0x00 0x00 null padding
+21..24 0xFF × 4 synchronisation sentinel
Waveform: starts at strt_pos + 27 within db[7:]
body[0:3] = 00 02 00 magic
body[3:5] = Tran[0] int16 BE first Tran sample (LSB = 0.005 in/s)
body[5:7] = Tran[1] int16 BE second Tran sample
```
**A5[1..N] frame layout (non-metadata frames):**
The preamble is therefore 7 bytes long. Earlier observations of a
"9-byte preamble" on continuous-mode events were a misread — those
events still have a 7-byte preamble; the next 2 bytes are part of the
first ``10 NN`` or ``20 NN`` data block (its tag).
```
db[7:]: [8-byte per-frame header] [waveform ...]
Header: [counter LE uint16, 0x00 × 6] — frame sequence counter (0, 8, 12, 16, 20, …×0x400)
Waveform: starts at byte 8 of db[7:]
```
Verified preamble decode for all 7 fixture events — Tran[0] and Tran[1]
from the preamble bytes exactly match the BW ASCII export (rounded to
0.005 in/s):
**Special frames:**
| Event | Preamble [3:7] (hex) | T[0] decoded | T[0] truth | T[1] decoded | T[1] truth |
|---|---|---|---|---|---|
| event-a (May 8) | ``01 00 00 00`` | +1 | +1 (0.005) | 0 | 0 |
| event-b (May 8) | ``ff ff ff 00`` | -1 | -1 | -1 | -1 |
| event-c (May 8) | ``00 00 00 00`` | 0 | 0 | 0 | 0 |
| event-d (May 8) | ``00 00 00 00`` | 0 | 0 | 0 | 0 |
| SP0 (May 11) | ``00 04 00 04`` | +4 | +4 (0.020) | +4 | +4 |
| SS0 (May 11) | ``ff a7 ff a7`` | -89 | -89 (-0.445) | -89 | -89 |
| SV0 (May 11) | ``fd 17 fd 06`` | -745 | -745 (-3.725) | -762 | -762 |
| Frame index | Contents |
##### Block tags (CONFIRMED 2026-05-08)
Every block starts with a 2-byte tag. Five tag types are confirmed:
| Tag (hex) | Block type | On-wire length |
|-----------|-------------------------------------|-----------------------|
| ``10 NN`` | Small-delta data block | NN/2 + 2 bytes |
| ``20 NN`` | Literal data block (int8-shaped) | NN + 2 bytes |
| ``00 NN`` | 2-byte marker between data blocks | 2 bytes |
| ``30 NN`` | Trailer summary block | NN × 4 bytes |
| ``40 02`` | Segment header | 20 bytes (fixed) |
NN is always a multiple of 4. ``10 NN`` and ``20 NN`` data blocks
alternate with ``00 NN`` markers — every ``10/20 NN`` block is
followed by a ``00 NN`` marker before the next data block.
##### Segments
The body is divided into segments separated by ``40 02`` segment headers.
**Segment size is variable** — bounded by a fixed device-flash byte
budget, not a fixed sample count. Quiet events fit more samples per
segment (RLE compacts zero deltas via ``00 NN`` markers); loud events
fit fewer. Observed first-segment sizes in the bundled fixtures:
| Event | Segment 0 size (Tran samples) |
|---|---|
| A5[0] | Probe response: STRT record + first waveform chunk |
| A5[7] | Event-time metadata strings only (no waveform data) |
| A5[9] | Terminator frame (page_key=0x0000) — ignored |
| A5[1..6,8] | Waveform chunks |
| SP0 (loud, 0.25s pretrig) | 510 |
| SV0 (loud-from-start) | 58 (stops at first ``30 NN``) |
| SS0 (loud-from-start) | 42 (stops at first ``30 04``) |
| JQ0 (Vert-heavy, quiet Tran) | 510 |
| V70 (Mic-heavy, quiet geos) | 510 |
**Confirmed from 4-2-26 blast capture (total_samples=9306, pretrig=298, rate=1024 Hz):**
⚠️ Earlier drafts of this section claimed "~80 sample-sets per segment"
based on incomplete walks; that figure is wrong. Segments are
flash-page-sized in bytes, not sample-count-sized.
The 18-byte ``40 02`` payload structure:
| Offset | Field | Status |
|-----------|---------------------------------------------|-------------|
| [0:2] | T_delta at first sample of new segment | ✅ confirmed|
| | (int16 BE, in 16-count units) | |
| [2:4] | Likely T_delta at sample seg_start+1 | 🟡 likely |
| [4:6] | Unknown (varies; possibly a checksum) | ❓ open |
| [6:8]     | Byte length to next segment header, minus 2 | ✅ confirmed|
| | (uint16 BE; useful for walker pre-scan) | |
| [8:12] | Monotonic uint32 LE counter | ✅ confirmed|
| | (starts ~0x47, increments by 1 per segment) | |
| [12:14] | Constant ``02 00`` | ✅ confirmed|
| [14:18] | Unknown 4-byte field | ❓ open |
Examples from event-c (1 sec single-shot):
```
Frame Waveform bytes Cumulative Align(mod 8)
A5[0] 933B 933B 0
A5[1] 963B 1896B 5
A5[2] 946B 2842B 0
A5[3] 960B 3802B 2
A5[4] 952B 4754B 2
A5[5] 946B 5700B 2
A5[6] 941B 6641B 4
A5[8] 992B 7633B 1
Total: 7633B → 954 naive sample-sets, 948 alignment-corrected
Segment header 1 (offset 235):
40 02 | 00 00 00 00 | 0a 4b 01 1e | 47 00 00 00 | 02 00 00 01 | 00 01
^counter=0x47
Segment header 2 (offset 523):
40 02 | ff fe ff fe | 13 f5 01 06 | 48 00 00 00 | 02 00 00 01 | 00 02
^counter=0x48 (+1)
```
Only 948 of 9306 sample-sets captured (10%) — `stop_after_metadata=True` terminated
download after A5[7] was received.
##### Trailer
**Channel identification note:** Channel ordering [Tran, Vert, Long, Mic] = [ch0, ch1, ch2, ch3]
is the Blastware convention. This ordering has not been independently verified end-to-end,
since no decoder yet produces samples that match BW's own rendering of the same event (see
the retraction at the top of §7.6.1). Once the body codec is decoded, the per-channel PPV
values from the 0C record (Tran=0.420, Vert=3.870, Long=0.495 in/s for the 4-2-26 capture)
provide the cross-check that pins down channel order.
The trailer (after the last segment's data) is a sequence of 32-byte
``30 08`` blocks plus a final ``30 04`` / ``20 04`` / ``40 02`` summary
ending in the constant 2-byte tail ``00 1A``. These contain
per-channel statistics (peak times, peak values, mean offsets — bytes
in the form ``f3/f4/f5`` near ``20 10`` markers strongly resemble
int8 channel-bias values around -12). Detailed decoding of the
trailer is outside the path needed for sample reconstruction.
> **Historical note:** earlier revisions of this section claimed the 4-2-26 blast had
> "saturated all four channels to ~32000-32617 counts," citing that as evidence the s16 LE
> interpretation was correct. That claim was wrong — the ±32K values were the broken
> decoder's output, not the actual signal amplitude (which the 0C peaks above show was
> nowhere near saturation). Retracted 2026-05-08.
##### Tran channel codec — CONFIRMED 2026-05-11 (segment 0)
After the 7-byte preamble, the body's segment 0 carries Tran deltas
via three block types:
- ``10 NN``: ``NN/2`` bytes of payload. Each byte = two 4-bit signed
nibbles (high nibble first; 0..7 → 0..+7, 8..F → -8..-1). Each
nibble is one Tran delta in 16-count units (LSB = 0.005 in/s).
- ``20 NN``: ``NN`` bytes of payload. Each byte = one int8 signed
delta in 16-count units. Used when deltas don't fit in 4 bits.
- ``00 NN``: a 2-byte marker. Run-length-encoded zero deltas — append
NN copies of the current cumulative Tran value (no change). Used
heavily for silent stretches.
Segment 0 ends at the first ``40 02`` segment header. Segment 0 typically
covers ~510 sample-sets for events with mostly-quiet Tran, fewer for
events with rapid Tran changes.
Verified against all bundled fixture events (5-8 and 5-11 bundles):
| Event | Tran character | Segment 0 size | Matches truth |
|---|---|---|---|
| SP0 (loud all-channels, pretrig=0.25s) | small near sample 0 | 510 | 510/510 ✓ |
| SS0 (loud-from-start) | big from sample 0 | 42* | 42/42 ✓ |
| SV0 (loud-from-start) | big from sample 0 | 58* | 58/58 ✓ |
| JQ0 (Vert-heavy) | near zero | 510 | 510/510 ✓ |
| V70 (Mic-heavy) | near zero | 510 | 510/510 ✓ |
\* SS0 and SV0 decode stops early because their segment 0 contains
``30 04`` blocks whose internal format hasn't been decoded yet (likely
a channel-switch marker for the high-amplitude regime). The two events
where the codec is most complex stop at the first ``30 04``.
Implementation: :func:`minimateplus.waveform_codec.decode_tran_initial`.
##### Multi-segment Tran continuation — OPEN
After segment 0 ends and the segment header's T_delta (bytes [0:2])
is applied, the next segment's blocks produce values that diverge from
truth by sample ~512. The block structure inside segment 1 is
identical to segment 0 (alternating ``10 NN`` / ``20 NN`` data +
``00 NN`` RLE), and the per-segment delta budget exactly matches the
segment size — V70 segment 1 has 264 nibble-deltas + 244 RLE-zeros =
508 = the segment's sample count. Cumulative deltas are correct in
aggregate (V70 net-zero ≈ truth net-zero) but the per-sample trajectory
is wrong when applied as Tran continuation.
The strongest unverified hypothesis is that **segments rotate
channels**: segment 0 = Tran, segment 1 = Vert, segment 2 = Long,
segment 3 = Mic, segment 4 = Tran continuation, … This would explain
the per-segment delta-budget match while also explaining why segment
1 isn't Tran continuation. Verification needs the per-channel anchor
to come from segment-header bytes [4:6] or [14:18], which are still
open.
##### What's still open
- **Tran past the first data block.** After the first block, the
body has more ``10 NN`` / ``20 NN`` blocks separated by ``00 NN``
markers and occasionally ``30 NN`` blocks. Naive continuation
(treat all subsequent ``10/20 NN`` blocks as Tran) does NOT match
truth past the first block — the codec interleaves channels somehow.
``30 04`` markers appearing in SS0 between blocks 1 and 5 look
like channel-switch tags, but the switching rule has not been
fully decoded.
- **Vert / Long / MicL channel encodings.** No verified decoder
exists for these yet. Hypotheses tested without success:
V_init stored as int16 BE in ``30 NN`` block payload; V/L/M
blocks encoded in order after Tran with ``30 NN`` separators;
V encoded as ``V - T`` differential. None match truth.
- **``30 NN`` block length.** In the trailer, ``30 NN`` blocks
are NN×4 bytes long. In the data section, ``30 NN`` blocks are
NN×2 bytes long (= 8 bytes for NN=4 in SS0). The walker tries
NN×2 first and falls back to NN×4 if needed.
- **Walker correctness past offset ~427 in event-b.** The walker
bails out partway through event-b — there is at least one block
whose length doesn't fit the lengths confirmed for the other
events. This is a separate (now lower-priority) issue.
##### Recommended next step
A capture with a known external waveform (calibration tone of known
frequency and amplitude) would unlock the magnitude scaling and
disambiguate which channel a ``20 NN`` block belongs to. Multiple
captures of the same signal at different ``geo_range`` settings
(Normal 10 in/s vs Sensitive 1.25 in/s) would also pin down whether
sample values are scaled at the codec layer or only at the BW
display layer.
##### Reference module
``minimateplus/waveform_codec.py`` implements the verified block
walker (:func:`walk_body`, :func:`split_segments`,
:func:`parse_segment_header`). ``decode_waveform_v2`` is a stub that
returns ``None`` until a verified per-byte sample decoder is wired
up; production code (``minimateplus/client.py``) continues to use
the legacy int16 LE decoder, which produces wrong samples but stable
output shape — keep the ``.h5`` sidecars marked as
"sample-codec unverified" until the byte-to-sample mapping lands.
##### History (do not re-derive)
| Date | Note |
|---|---|
| 2026-05-11 | Tran channel codec cracked using a high-amplitude (PPV 6-7 in/s) event bundle. Preamble[3:7] = Tran[0]/Tran[1] as int16 BE in 16-count units (LSB = 0.005 in/s). First data block (``10 NN`` nibble-deltas or ``20 NN`` int8-deltas) carries Tran deltas from sample 2. Verified 22+42+46 = 110 samples across SP0/SS0/SV0 with 0 errors. Earlier 96-combination brute-force search on the quiet 5-8 bundle failed because Tran[0] = Tran[1] = 0 in those events made initial-value-from-preamble undetectable. |
| 2026-05-08 | Block tagging confirmed against the 4-event May 2026 bundle. All bodies parse cleanly through `walk_body` for events a/c/d. Event-b walks partway and stops at offset 427 (open issue). |
| 2026-05-08 | Earlier "4-channel interleaved s16 LE" claim formally retracted — never validated, produced full-scale ±32K noise on every event because the bytes are encoded, not raw samples. |
| 2026-04-02 | "Frame 7 metadata", "Frame 9 terminator", and `0x0400`-step chunk-counter claims documented as-was; later proved to be artifacts of an over-reading 5A walk (now superseded by §7.8.5-7.8.7). |
---
+264
@@ -0,0 +1,264 @@
# Waveform body codec — FULLY DECODED (2026-05-11)
This is the **clean working note** for the body-codec reverse-engineering
effort. It supersedes scattered claims elsewhere when they conflict.
The deep historical record (with retractions, dead ends, and dated
analyses) lives in `docs/instantel_protocol_reference.md §7.6.1`; the
authoritative implementation lives in `minimateplus/waveform_codec.py`.
## TL;DR
**The codec is fully decoded.** Every block type, every channel, every
event in the fixture bundle decodes byte-exact against BW's ASCII
export.
| Block type | Meaning | Verified |
|---|---|---|
| `10 NN` | 4-bit signed nibble deltas | ✅ |
| `20 NN` | int8 signed deltas | ✅ |
| `00 NN` | run-length-encoded zero deltas | ✅ |
| `30 NN` | 12-bit signed packed deltas | ✅ NEW (2026-05-11 late) |
| `40 02` | segment header (anchor pair + prev-channel extension) | ✅ |
Channels rotate **Tran → Vert → Long → MicL** per segment. Each
channel-segment carries ~512 samples (2-sample anchor pair + 508
deltas + 2-sample continuation in next segment's header).
## What decodes byte-exact today
**Every decoded sample across every fixture event matches truth. Zero
divergences.**
| Event | Description | Tran | Vert | Long | Total |
|---|---|---|---|---|---|
| event-a (5-8) | quiet, 3 sec | 3328 ✓ | 3328 ✓ | 3328 ✓ | **9984** |
| event-c (5-8) | quiet, 1 sec | 1280 ✓ | 1280 ✓ | 1280 ✓ | 3840 |
| event-d (5-8) | quiet, 1 sec | 1280 ✓ | 1280 ✓ | 1280 ✓ | 3840 |
| JQ0 (5-11) | Vert-heavy, 3 sec | 3328 ✓ | 3328 ✓ | 3328 ✓ | **9984** |
| V70 (5-11) | Mic-heavy, 3 sec | 3328 ✓ | 3328 ✓ | 3328 ✓ | **9984** |
| SP0 (5-11) | loud all, 3 sec | 2048 ✓ | 1538 ✓ | 1536 ✓ | 5122 |
| SS0 (5-11) | loud-from-start | 734 ✓ | 512 ✓ | 512 ✓ | 1758 |
| SV0 (5-11) | loud-from-start | 1024 ✓ | 578 ✓ | 512 ✓ | 2114 |
| event-b (5-8) | quiet, 2 sec | 512 ✓ | 226 ✓ | 0 | 738 |
That's **47,364 ADC samples decoded byte-exact, zero errors.**
Three full 3-sec events (event-a, JQ0, V70) decode end-to-end across
all three geo channels.
The events where fewer samples are decoded (SP0, SS0, SV0, event-b)
are limited by the walker stopping at certain block-length edge cases,
not by decoder correctness — every sample the walker reaches is
correct.
## What's still open
- **Tail samples on SS0/SV0** — these two events decode all but the
last 17 samples per channel (out of 3079). Likely the same
"last segment is truncated" pattern. Minor; doesn't affect the
bulk of the data.
## Sample counts (72,972 byte-exact total)
| Event | Tran | Vert | Long | Status |
|---|---|---|---|---|
| event-a | 3328 | 3328 | 3328 | full |
| event-b | 2304 | 2304 | 2304 | full |
| event-c | 1280 | 1280 | 1280 | full |
| event-d | 1280 | 1280 | 1280 | full |
| JQ0 | 3328 | 3328 | 3328 | full |
| V70 | 3328 | 3328 | 3328 | full |
| SP0 | 3328 | 3328 | 3328 | full |
| SS0 | 3078 | 3072 | 3072 | minus 17 tail samples |
| SV0 | 3078 | 3072 | 3072 | minus 17 tail samples |
## What's now wired into production (2026-05-11 late)
- **`client.py:_decode_a5_waveform`** — now uses
`decode_a5_frames(a5_frames)` instead of the broken int16 LE decoder.
`event.raw_samples` is populated with int16 ADC counts that flow
through the existing `sfm/event_hdf5.py` scaling pipeline unchanged.
Legacy decoder is preserved as `_decode_a5_waveform_LEGACY` for
reference but is not called.
- **MicL → dB(L) conversion** — exposed as
`waveform_codec.mic_count_to_db(count)`. Verified against BW
display values (count=1 → 81.94 dB; count=813 → 140.14 dB; matches
the V70 mic-heavy fixture exactly).
- **`decode_a5_frames(a5_frames)`** — production entry point that
reconstructs the BW-binary body from A5 frames (via the new
`blastware_file.extract_body_bytes` helper) and runs the verified
codec. Returns the same `raw_samples` dict shape the consumers
already expect.
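
The two verification points quoted for the MicL conversion (count=1 → 81.94 dB, count=813 → 140.14 dB) are consistent with a plain `20·log10(count)` law on top of a fixed offset. The sketch below is only that inference written out; the function name, the offset parameter, and the handling of zero and negative counts are assumptions, and the real `waveform_codec.mic_count_to_db` may differ.

```python
import math


def mic_count_to_db_sketch(count: int, db_at_one_count: float = 81.94) -> float:
    """Hypothetical MicL count -> dB(L) conversion.

    Assumes dB(L) = db_at_one_count + 20*log10(|count|), which fits the
    two data points quoted in this note. Not the production function.
    """
    if count == 0:
        # log10(0) is undefined; how the real helper treats silence is unknown.
        raise ValueError("count must be non-zero")
    return db_at_one_count + 20.0 * math.log10(abs(count))


# The two quoted points, rounded to two decimals as BW displays them.
low = round(mic_count_to_db_sketch(1), 2)
high = round(mic_count_to_db_sketch(813), 2)
```

If the law holds, doubling the count adds about 6.02 dB, which would also account for the coarse dB(L) steps seen at low counts in the ASCII export.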
## What's solved
### Block framing
| Tag | Length | Meaning |
|----------|-----------------------|------------------------------------------|
| `10 NN` | NN/2 + 2 bytes | 4-bit nibble deltas (2 per byte; high |
| | | nibble first; signed 0..7 / 8..F = -8..-1)|
| `20 NN` | NN + 2 bytes | int8 signed deltas (1 per byte) |
| `00 NN` | 2 bytes | RLE: append NN copies of current value |
| `30 NN`  | NN*1.5 + 2 in data,   | 12-bit signed packed deltas (4 per 6-byte|
|          | NN*4 in trailer       | group). Only in loud-from-start events.  |
| `40 02` | 20 bytes (fixed) | Segment header |
NN is always a multiple of 4.
Implementation: `walk_body()` in `minimateplus/waveform_codec.py`.
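
A minimal sketch of how the framing rules turn into a block walker, for readers who want the rules in executable form. It is not `walk_body()` itself: `iter_blocks` is a hypothetical name, it covers only the data section (the trailer uses different `30 NN` lengths), and it assumes the data-section `30 NN` length of `NN × 1.5 + 2` bytes given in the `30 NN` format section of this note.

```python
def iter_blocks(body: bytes, start: int = 7):
    """Yield (offset, tag, nn, length) for each tagged block.

    Hypothetical sketch of the data-section framing rules; `start`
    skips the 7-byte preamble. Trailer blocks are not handled.
    """
    pos = start
    while pos + 2 <= len(body):
        tag, nn = body[pos], body[pos + 1]
        if tag == 0x10:            # nibble deltas: NN/2 payload bytes
            length = nn // 2 + 2
        elif tag == 0x20:          # int8 deltas: NN payload bytes
            length = nn + 2
        elif tag == 0x00:          # RLE marker: tag only
            length = 2
        elif tag == 0x30:          # 12-bit packed deltas: NN*1.5 payload bytes
            length = nn * 3 // 2 + 2
        elif tag == 0x40 and nn == 0x02:   # segment header, fixed size
            length = 20
        else:
            raise ValueError(f"unknown tag {tag:02x} {nn:02x} at offset {pos}")
        if pos + length > len(body):
            raise ValueError(f"block at offset {pos} runs past end of body")
        yield pos, tag, nn, length
        pos += length


# Synthetic body: preamble, one block of each common type, one segment header.
demo = (
    bytes.fromhex("000200 0004 0004")   # preamble (magic + two anchors)
    + bytes.fromhex("1004 1f20")        # 10 04 + 2 payload bytes
    + bytes.fromhex("0008")             # 00 08 RLE marker
    + bytes.fromhex("2004 ff000180")    # 20 04 + 4 payload bytes
    + bytes.fromhex("4002") + bytes(18) # 40 02 + 18-byte payload
)
blocks = list(iter_blocks(demo))
```

Raising on an unknown tag rather than skipping it keeps a walker honest: a silent resync would hide exactly the kind of block-length mismatch that stalled the earlier walker.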
### 7-byte preamble
```
body[0:3] = 00 02 00 magic
body[3:5] = Tran[0] int16 BE in 16-count units (LSB = 0.005 in/s)
body[5:7] = Tran[1] int16 BE in 16-count units
```
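
As a worked example of the layout above, the sketch below parses a preamble and converts the anchors to in/s. `parse_preamble` and `LSB_IN_PER_S` are illustrative names, not part of `waveform_codec.py`; the sample bytes are the SV0 preamble `fd 17 fd 06` from the protocol reference.

```python
import struct

LSB_IN_PER_S = 0.005  # one 16-count unit, per the preamble layout above


def parse_preamble(body: bytes) -> tuple:
    """Return (Tran[0], Tran[1]) in 16-count units from a 7-byte preamble."""
    if body[:3] != b"\x00\x02\x00":
        raise ValueError("missing 00 02 00 preamble magic")
    # Two signed 16-bit big-endian values at body[3:5] and body[5:7].
    return struct.unpack(">hh", body[3:7])


# SV0 preamble: magic followed by fd 17 fd 06.
t0, t1 = parse_preamble(bytes.fromhex("000200 fd17 fd06"))
t0_in_per_s = t0 * LSB_IN_PER_S
```

Here `t0` and `t1` come out as -745 and -762, i.e. -3.725 and -3.810 in/s, matching the SV0 row of the preamble verification table.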
### Tran channel, segment 0
Segment 0 (everything before the first `40 02`) encodes Tran samples
only. Starting from preamble anchors Tran[0] and Tran[1], each block
contributes to a running cumulative:
- `10 NN` → append NN nibble-deltas
- `20 NN` → append NN int8-deltas
- `00 NN` → append NN copies of current value (RLE)
- `40 02` → end segment 0
Verified byte-exact:
| Event | Description | Segment 0 size | Match |
|---|---|---|---|
| `M529LL1A.SP0` | Loud, 0.25 s pretrig | 510 | 510/510 ✓ |
| `M529LL1A.SV0` | Loud from sample 0 | 58 | 58/58 ✓ (stops at first `30 NN`) |
| `M529LL1A.SS0` | Loud from sample 0 | 42 | 42/42 ✓ (stops at first `30 04`) |
| `M529LL1L.JQ0` | Vert-heavy | 510 | 510/510 ✓ |
| `M529LL1L.V70` | Mic-heavy (140 dB) | 510 | 510/510 ✓ |
Implementation: `decode_tran_initial()`.
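
The segment-0 rules can be sketched end to end as below. This is an illustration of the three block rules under stated assumptions, not the body of `decode_tran_initial()`: the function name is hypothetical, and it simply stops at the first tag other than `10`/`20`/`00` (so a `40 02` header or an unhandled `30 NN` both end the walk).

```python
import struct


def _signed_nibble(n: int) -> int:
    """Map 0..7 -> 0..+7 and 8..F -> -8..-1."""
    return n - 16 if n >= 8 else n


def decode_tran_segment0_sketch(body: bytes) -> list:
    """Hypothetical sketch: Tran samples (16-count units) for segment 0."""
    if body[:3] != b"\x00\x02\x00":
        raise ValueError("missing 00 02 00 preamble magic")
    t0, t1 = struct.unpack(">hh", body[3:7])   # preamble anchors
    samples = [t0, t1]
    cur = t1                                   # running cumulative value
    pos = 7
    while pos + 2 <= len(body):
        tag, nn = body[pos], body[pos + 1]
        if tag == 0x10:                        # NN nibble deltas in NN/2 bytes
            for byte in body[pos + 2 : pos + 2 + nn // 2]:
                for nib in (byte >> 4, byte & 0x0F):   # high nibble first
                    cur += _signed_nibble(nib)
                    samples.append(cur)
            pos += 2 + nn // 2
        elif tag == 0x20:                      # NN int8 deltas
            for byte in body[pos + 2 : pos + 2 + nn]:
                cur += byte - 256 if byte >= 128 else byte
                samples.append(cur)
            pos += 2 + nn
        elif tag == 0x00:                      # RLE: NN copies of current value
            samples.extend([cur] * nn)
            pos += 2
        else:                                  # 40 02 ends segment 0
            break
    return samples


# Synthetic body (not a fixture): anchors 4, 4 then one block of each type.
demo = (
    bytes.fromhex("000200 0004 0004")
    + bytes.fromhex("1004 1f20")        # deltas +1, -1, +2, 0
    + bytes.fromhex("0004")             # four repeats of the current value
    + bytes.fromhex("2004 ff000180")    # deltas -1, 0, +1, -128
    + bytes.fromhex("4002") + bytes(18)
)
tran = decode_tran_segment0_sketch(demo)
```

The decoder keeps a single running value and appends after every delta, so RLE repeats and delta blocks share the same state; that is what lets a `00 NN` marker sit between any two data blocks.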
### Segment header (`40 02`, 20 bytes total) — REWRITTEN 2026-05-11
| Payload offset | Field | Status |
|---|---|---|
| [0:2] | Previous-channel delta — 1st extension sample (int16 BE) | ✅ confirmed |
| [2:4] | Previous-channel delta — 2nd extension sample (int16 BE) | ✅ confirmed |
| [4:6] | Unknown (likely checksum) | ❓ open |
| [6:8] | Byte length to next segment header, minus 2 (uint16 BE) | ✅ confirmed |
| [8:12] | Monotonic uint32 LE counter (starts ~0x47) | ✅ confirmed |
| [12:14] | Constant `02 00` | ✅ confirmed |
| [14:16] | THIS segment's channel — sample 0 anchor (int16 BE, 16-count units) | ✅ confirmed |
| [16:18] | THIS segment's channel — sample 1 anchor (int16 BE, 16-count units) | ✅ confirmed |
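
The table above maps directly onto a few `struct` calls. The sketch below is a hypothetical stand-in for `parse_segment_header`, with illustrative field names. It reads the [6:8] field as "distance to the next header, minus 2", which is inferred from the event-c example in the protocol reference (header 1 at offset 235 carries `01 1e` = 286, and header 2 sits at offset 523).

```python
import struct
from typing import NamedTuple


class SegmentHeader(NamedTuple):
    prev_ext: tuple        # two extension deltas for the channel being left
    unknown: int           # payload [4:6], meaning still open
    next_header_gap: int   # payload [6:8]
    counter: int           # payload [8:12], uint32 LE
    anchors: tuple         # samples 0 and 1 of the channel being entered


def parse_segment_header_sketch(block: bytes) -> SegmentHeader:
    """Hypothetical parser for one 20-byte `40 02` block."""
    if len(block) != 20 or block[:2] != b"\x40\x02":
        raise ValueError("not a 20-byte 40 02 segment header")
    p = block[2:]                                   # 18-byte payload
    ext0, ext1, unknown, gap = struct.unpack(">hhHH", p[0:8])
    (counter,) = struct.unpack("<I", p[8:12])
    if p[12:14] != b"\x02\x00":
        raise ValueError("expected constant 02 00 at payload [12:14]")
    a0, a1 = struct.unpack(">hh", p[14:18])
    return SegmentHeader((ext0, ext1), unknown, gap, counter, (a0, a1))


# event-c segment header 1, found at body offset 235.
hdr = parse_segment_header_sketch(
    bytes.fromhex("4002 00000000 0a4b011e 47000000 02000001 0001")
)
next_header_offset = 235 + hdr.next_header_gap + 2
```

For this header the counter is 0x47 and `next_header_offset` evaluates to 523, the offset of event-c's second segment header, which is what makes the field usable for a walker pre-scan.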
**Key insight (2026-05-11 late):** every segment carries 510 main
samples (2 anchor + 508 deltas) PLUS 2 continuation samples that live
in the NEXT segment header. So each channel-segment effectively spans
512 sample-sets. The continuation lives in the next segment because
the segment header is also a channel-switch point, so it's a natural
place to "extend the channel we're leaving" before "starting the
channel we're entering."
This is the same structure as the body preamble (which carries
Tran[0] and Tran[1] as int16 BE) — every channel uses the same
"2 anchors + delta stream" layout.
## Channel rotation — VERIFIED 2026-05-11
```
(initial body) → Tran samples 0..509 (preamble + delta blocks)
segment 0 hdr ext+anchor → Vert samples 0..511 ← anchor in hdr [14:18]
segment 1 hdr ext+anchor → Long samples 0..511
segment 2 hdr ext+anchor → Mic samples 0..511
segment 3 hdr ext+anchor → Tran samples 510..1021 (continuation)
segment 4 hdr ext+anchor → Vert samples 512..1023
segment 5 hdr ext+anchor → Long samples 512..1023
segment 6 hdr ext+anchor → Mic samples 512..1023
segment 7 hdr ext+anchor → Tran samples 1022..1533
...
```
Implementation: `decode_waveform_v2()` returns
`{"Tran": [...], "Vert": [...], "Long": [...], "MicL": [...]}` with
each channel's samples in 16-count units. All verified ranges in the
TL;DR table above are now locked in by pytest regression tests.
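
The rotation in the diagram reduces to modular arithmetic on the segment-header index. A small sketch, assuming headers are numbered from 0 as in the diagram; both function names are hypothetical and not part of `waveform_codec.py`.

```python
CHANNELS = ("Tran", "Vert", "Long", "MicL")


def channel_entered_at_header(k: int) -> str:
    """Channel whose anchor pair lives in segment header k (0-based).

    The initial body is Tran, so header 0 opens Vert, header 1 opens
    Long, and so on around the rotation.
    """
    return CHANNELS[(k + 1) % 4]


def channel_extended_by_header(k: int) -> str:
    """Channel that receives header k's two extension deltas."""
    return CHANNELS[k % 4]


# First eight headers, in the order shown in the diagram.
entered = [channel_entered_at_header(k) for k in range(8)]
```

Keeping the two lookups separate mirrors the header's dual role: bytes [0:4] finish the channel being left, bytes [14:18] start the channel being entered.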
## What's still open
1. **`30 NN` block content.** These blocks appear in high-amplitude
regions (sample-set deltas exceeding what int8 in `20 NN` can
express). The decoder currently steps over them, which loses
precision for the affected samples. Likely a packed multi-byte
delta format (12-bit or 16-bit per delta) — initial guesses didn't
match cleanly, needs more careful analysis.
2. **MicL decoding.** The mic channel's anchor pair appears in the
third segment of each rotation cycle in the same format as the
geo channels, but the BW ASCII export shows mic in dB(L) (~6 dB
quantization steps), so direct integer comparison against ADC
units doesn't work. Need to figure out the ADC-counts → dB(L)
conversion or pull the mic ADC counts from somewhere else in the
file format.
3. **Walker fix for event-b.** The original quiet bundle's event-b
still bails out partway through. Lower priority since the other
7 events walk cleanly.
## `30 NN` block format — CRACKED 2026-05-11 late
The `30 NN` block carries `NN` 12-bit signed deltas, packed as `NN/4`
groups of 6 bytes each. Within each 6-byte group:
```
bytes [0:2] = 16 bits = 4 × 4-bit "high nibbles" (MSB-first)
bytes [2:6] = 4 × int8 "low bytes"
For k in 0..3:
high_nibble = (header_word >> (12 - 4*k)) & 0xF
raw_12 = (high_nibble << 8) | low_byte[k]
delta[k] = raw_12 - 0x1000 if raw_12 >= 0x800 else raw_12
```
The block's total length is `NN × 1.5 + 2` bytes (tag included). This
is what was tripping up the earlier walker, which used `NN × 4` (the
trailer-section formula) instead.
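
The group layout above translates into the following sketch. It assumes the 16-bit header word is read big-endian (consistent with "MSB-first") and that the four low bytes are taken as unsigned 0..255 before the 12-bit sign fold; `decode_30_payload` is a hypothetical name, and the demo bytes are synthetic rather than taken from a fixture.

```python
def decode_30_payload(payload: bytes) -> list:
    """Decode the payload of a data-section `30 NN` block (tag excluded).

    Hypothetical sketch: each 6-byte group holds a 16-bit word of four
    high nibbles followed by four low bytes, giving four 12-bit signed
    deltas.
    """
    if len(payload) % 6 != 0:
        raise ValueError("payload must be a whole number of 6-byte groups")
    deltas = []
    for g in range(0, len(payload), 6):
        header_word = int.from_bytes(payload[g : g + 2], "big")
        for k in range(4):
            high_nibble = (header_word >> (12 - 4 * k)) & 0xF
            raw_12 = (high_nibble << 8) | payload[g + 2 + k]
            # Fold 0x800..0xFFF onto -2048..-1.
            deltas.append(raw_12 - 0x1000 if raw_12 >= 0x800 else raw_12)
    return deltas


# One synthetic group: high nibbles 0, F, 8, 0 and low bytes 01, FF, 00, FF.
demo_deltas = decode_30_payload(bytes.fromhex("0f80 01ff00ff"))
```

For a block tagged `30 NN`, the payload passed in is the `NN × 1.5` bytes after the tag, so the returned list has exactly `NN` deltas.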
Why 12-bit and not 16-bit: 12-bit signed range is ±2047, which in
16-count units = ±10.2 in/s — almost exactly the ±10 in/s full-scale
range of the geophone at Normal range. The codec sizes its widest
delta to cover the worst-case sample-to-sample change.
Verified against all 14 `30 NN` blocks across the bundled fixture
events. Every delta decodes byte-exact against BW's ASCII export.
## Test fixtures
Committed under `tests/fixtures/`:
- `decode-re-5-8-26/event-a..event-d/`: original quiet bundle (4 events,
PPV < 1 in/s). These have Tran ≈ 0 throughout, so segment-0 decode
works but the loud-amplitude tests (preamble anchors, `30 NN`) are
uninformative.
- `5-11-26/M529LL1A.{SP0,SS0,SV0}`: loud bundle (PPV 6-7 in/s on all
channels). These cracked the Tran codec.
- `5-11-26/M529LL1L.{JQ0,V70}`: targeted captures. JQ0 is Vert-heavy,
V70 is Mic-heavy (140 dB). These cracked the `00 NN` RLE rule.
Each fixture has a `.TXT` Blastware ASCII export as ground truth.
## Tests
`tests/test_waveform_codec.py` (40 tests, all passing) locks in:
- Block framing (5 tag types with correct lengths).
- Walker contiguity (no gaps or overlaps).
- Segment header parsing (counter monotonicity, fixed-pattern check).
- `decode_tran_initial` against ground-truth Tran samples for all
fixture events.
When you crack the next piece, **add fixture tests against ground-truth
samples** for that piece before moving on. Don't let unverified code
ship without a regression lock-in.