2ff2762eec
User intuition (16-bit) + 12-bit packing hypothesis + the int16 ADC
range constraint led to the final piece.
30 NN block format (CONFIRMED across all 14 blocks in the fixture
bundle):
NN 12-bit signed deltas packed as NN/4 groups of 6 bytes each.
Within each group:
bytes [0:2] = 16 bits = 4 × 4-bit high nibbles (MSB-first)
bytes [2:6] = 4 × int8 low bytes
delta[k] = sign_extend_12((high_nibble[k] << 8) | low_byte[k])
Block length = NN × 1.5 + 2 bytes (tag included). The earlier walker
used NN × 4, which is only correct in the TRAILER section.
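A minimal Python sketch of that layout, useful for checking a block by hand. It assumes the payload excludes the 2-byte tag (as the walker's data field does) and that the four high nibbles are read MSB-first from bytes [0:2] taken as a big-endian word. decode_30nn_group, decode_30nn_payload and block_length_30nn are illustrative names, not functions in minimateplus/waveform_codec.py.

```python
from typing import List


def sign_extend_12(v: int) -> int:
    """Interpret the low 12 bits of v as a two's-complement integer."""
    v &= 0xFFF
    return v - 0x1000 if v >= 0x800 else v


def decode_30nn_group(group: bytes) -> List[int]:
    """Decode one 6-byte 30 NN group into four 12-bit signed deltas (hypothetical helper)."""
    if len(group) != 6:
        raise ValueError("a 30 NN group is exactly 6 bytes")
    high_word = (group[0] << 8) | group[1]        # bytes [0:2]: 4 high nibbles
    deltas = []
    for k in range(4):
        nib = (high_word >> (12 - 4 * k)) & 0xF   # MSB-first: k=0 is the top nibble
        deltas.append(sign_extend_12((nib << 8) | group[2 + k]))  # bytes [2:6]: low bytes
    return deltas


def decode_30nn_payload(payload: bytes, nn: int) -> List[int]:
    """Decode all NN deltas of a data-section 30 NN block (payload excludes the tag)."""
    return [d for g in range(nn // 4) for d in decode_30nn_group(payload[6 * g : 6 * g + 6])]


def block_length_30nn(nn: int) -> int:
    """Data-section wire length of a 30 NN block, tag included."""
    if nn % 4:
        raise ValueError("NN must be a multiple of 4")
    return nn // 4 * 6 + 2                        # == NN * 1.5 + 2


group = bytes([0x7F, 0x01, 0xFF, 0x00, 0x05, 0x10])  # high nibbles 7, F, 0, 1
print(decode_30nn_group(group))                      # [2047, -256, 5, 272]
print([block_length_30nn(n) for n in (4, 8, 12)])    # [8, 14, 20]
```

Integer arithmetic (nn // 4 * 6 + 2) gives the same value as NN × 1.5 + 2 for every multiple of 4 without going through floats; the worked group exercises both the positive maximum (2047) and a negative delta.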
Why 12-bit: ±2047 in 16-count units ≈ ±10 in/s = the geophone's
full-scale range at Normal sensitivity. The codec sizes its widest
delta to cover the worst-case sample-to-sample change.
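The range argument is plain arithmetic. This sketch assumes the 16-count unit has an LSB of 0.005 in/s at Normal sensitivity (the figure given in the module docstring of waveform_codec.py) and that raw ADC samples are int16.

```python
# 12-bit two's-complement range, expressed in 16-count units.
DELTA_MAX = (1 << 11) - 1   # 2047
DELTA_MIN = -(1 << 11)      # -2048

LSB_IN_PER_S = 0.005        # assumed: one 16-count unit at Normal sensitivity
COUNTS_PER_UNIT = 16        # raw ADC counts per 16-count unit
INT16_MAX = (1 << 15) - 1   # 32767

max_velocity = DELTA_MAX * LSB_IN_PER_S        # largest single-step change, in/s
max_raw_counts = DELTA_MAX * COUNTS_PER_UNIT   # the same step in raw ADC counts

print(f"largest delta: ±{max_velocity:.3f} in/s")         # largest delta: ±10.235 in/s
print(f"in raw counts: {max_raw_counts} of {INT16_MAX}")  # in raw counts: 32752 of 32767
assert max_raw_counts <= INT16_MAX
```

The widest 12-bit delta, scaled back to raw counts (2047 × 16 = 32752), just fits inside int16, which lines up with the int16 ADC range constraint mentioned at the top of this message.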
Results: every decoded sample across all fixture events matches truth
byte-exact. ZERO divergences.
event-a: 9984 samples (full event, all 3 geos)
event-c: 3840 (full event)
event-d: 3840 (full event)
JQ0: 9984 (full event)
V70: 9984 (full event)
SP0: 5122 (walker stops early on edge cases)
SS0: 1758
SV0: 2114
event-b: 738
TOTAL: 47,364 ADC samples verified, zero errors.
Three full 3-sec events decode end-to-end across all three geo
channels. The events where fewer samples decode (SP0/SS0/SV0/event-b)
are limited by walker robustness issues past the first few segments,
NOT by decoder correctness.
64 tests pass (up from 55). Files: minimateplus/waveform_codec.py
(new 30 NN decode + corrected walker length), tests/test_waveform_codec.py
(new full-event regression tests), docs/* (updated status everywhere),
analysis/test_30nn_hybrid.py (new — the analysis script that confirmed
the format).
467 lines
21 KiB
Python
"""
waveform_codec.py — block-walker and partial decoder for the MiniMate Plus
waveform-file body.

PARTIAL REVERSE-ENGINEERING — last updated 2026-05-11.

The Blastware waveform-file body — the bytes between the 21-byte STRT
record and the 26-byte file footer — is NOT raw int16 LE samples (the
historical assumption that produced full-scale ±32K noise on every
event). It is a tagged variable-length block stream with a custom
delta + RLE codec.

Current status:

- Block framing: ✅ solved (block types and lengths all confirmed)
- Tran channel, segment 0: ✅ solved (decode_tran_initial returns
  byte-exact values vs BW's ASCII export, across 5 of 5 loud-bundle
  events; first ~510 samples per event)
- Multi-segment continuation: ✅ solved by channel rotation
  (Tran → Vert → Long → MicL, see decode_waveform_v2); full events
  decode byte-exact on the fixtures, though the walker still stops
  early on some events
- Vert / Long / Mic channel decoders: 🟡 decoded via the same rotation;
  byte-exact verification so far covers the three geophone channels
- 30 NN block content: ✅ solved (12-bit signed deltas; only appears in
  loud-from-start events)

Production code in client.py still uses the broken int16 LE decoder;
``decode_waveform_v2`` here is not yet wired into it. Callers that go
through client.py should treat the legacy decoder's output as
"unverified"; for those callers, the BW binary write path is the only
sample-bearing output that is currently trustworthy.

────────────────────────────────────────────────────────────────────────────
Body layout (CONFIRMED 2026-05-11 against 8 fixture events)
────────────────────────────────────────────────────────────────────────────

    [7-byte preamble] [stream of tagged blocks] [trailer]

The preamble is always exactly 7 bytes:

    body[0:3] = 00 02 00   magic
    body[3:5] = Tran[0]    int16 BE in 16-count units (LSB = 0.005 in/s)
    body[5:7] = Tran[1]    int16 BE in 16-count units
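
For illustration, the preamble can be read with the standard-library
struct module. This is a sketch: parse_preamble and the sample bytes are
hypothetical, and the scale factor is the Normal-range LSB given above.

```python
import struct
from typing import Tuple

MAGIC = bytes([0x00, 0x02, 0x00])
LSB_IN_PER_S = 0.005  # one 16-count unit at Normal range


def parse_preamble(body: bytes) -> Tuple[int, int]:
    # Return (Tran[0], Tran[1]) in 16-count units; raise on a bad preamble.
    if len(body) < 7 or body[0:3] != MAGIC:
        raise ValueError("missing 00 02 00 preamble magic")
    # ">hh" = two big-endian signed 16-bit integers, starting at offset 3.
    t0, t1 = struct.unpack_from(">hh", body, 3)
    return t0, t1


body = bytes([0x00, 0x02, 0x00, 0xFF, 0xFE, 0x00, 0x0A])
t0, t1 = parse_preamble(body)
print(t0, t1)                # -2 10
print(t1 * LSB_IN_PER_S)     # Tran[1] converted to in/s
```

struct.unpack_from reads in place without slicing; int.from_bytes(...,
"big", signed=True) on each 2-byte slice gives the same values.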

(Earlier drafts of this module described a "7-or-9-byte preamble";
that was wrong — single-shot and continuous events both use 7 bytes.
The "extra 2 bytes" on continuous events were the first ``00 NN`` RLE
marker, not part of the preamble.)

Block types and lengths (all confirmed):

| Tag      | Length                | Meaning                                    |
|----------|-----------------------|--------------------------------------------|
| ``10 NN``| NN/2 + 2 bytes        | 4-bit nibble deltas (2 per byte; high      |
|          |                       | nibble first; signed 0..7 / 8..F = -8..-1) |
| ``20 NN``| NN + 2 bytes          | int8 signed deltas (1 per byte)            |
| ``00 NN``| 2 bytes               | RLE: append NN copies of current value     |
| ``30 NN``| NN*3/2 + 2 in data,   | 12-bit signed deltas in NN/4 groups of     |
|          | NN*4 in trailer       | 6 bytes (data section). Loud events only.  |
| ``40 02``| 20 bytes (fixed)      | Segment header                             |

NN is always a multiple of 4.

────────────────────────────────────────────────────────────────────────────
Tran channel, segment 0 (CONFIRMED 2026-05-11)
────────────────────────────────────────────────────────────────────────────

Segment 0 — everything before the first ``40 02`` segment header — encodes
Tran samples only. Starting from preamble anchors Tran[0] and Tran[1],
each subsequent block contributes to the running Tran value:

    10 NN → append NN deltas (4-bit signed nibbles)
    20 NN → append NN deltas (int8 signed bytes)
    00 NN → append NN copies of the current value (RLE zeros)
    40 02 → segment 0 ends; multi-segment continuation is open
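
The rules above can be exercised on a tiny hand-built body. The sketch is
self-contained rather than a call into decode_tran_initial; it handles only
the four tags in the list (30 NN is deliberately left out), and every byte
value is synthetic, not taken from a fixture.

```python
from typing import List


def s4(n: int) -> int:
    # 4-bit two's complement: 0..7 -> 0..7, 8..F -> -8..-1
    return n - 16 if n >= 8 else n


def s8(b: int) -> int:
    # Reinterpret an unsigned byte as int8
    return b - 256 if b >= 128 else b


def decode_segment0(body: bytes) -> List[int]:
    # Anchors Tran[0], Tran[1] come from the 7-byte preamble (int16 BE).
    t0 = int.from_bytes(body[3:5], "big", signed=True)
    t1 = int.from_bytes(body[5:7], "big", signed=True)
    out, cur, i = [t0, t1], t1, 7
    while i + 1 < len(body):
        tag, nn = body[i], body[i + 1]
        if tag == 0x40:                    # segment header: segment 0 ends
            break
        if tag == 0x10:                    # NN nibble deltas in NN/2 bytes
            for byte in body[i + 2 : i + 2 + nn // 2]:
                for nib in (byte >> 4, byte & 0xF):   # high nibble first
                    cur += s4(nib)
                    out.append(cur)
            i += nn // 2 + 2
        elif tag == 0x20:                  # NN int8 deltas
            for byte in body[i + 2 : i + 2 + nn]:
                cur += s8(byte)
                out.append(cur)
            i += nn + 2
        elif tag == 0x00:                  # RLE: NN zero deltas
            out.extend([cur] * nn)
            i += 2
        else:
            raise ValueError(f"tag {tag:02x} not handled by this sketch")
    return out


body = bytes(
    [0x00, 0x02, 0x00, 0x00, 0x0A, 0x00, 0x0B]    # magic, Tran[0]=10, Tran[1]=11
    + [0x10, 0x04, 0x1F, 0x80]                    # nibble deltas +1, -1, -8, 0
    + [0x00, 0x04]                                # 4 repeats of the current value
    + [0x20, 0x04, 0x05, 0xFB, 0x00, 0x7F]        # int8 deltas +5, -5, 0, +127
    + [0x40, 0x02] + [0x00] * 18                  # segment header stops the walk
)
print(decode_segment0(body))
# [10, 11, 12, 11, 3, 3, 3, 3, 3, 3, 8, 3, 3, 130]
```

Each block's contribution is visible in the output: the 10 04 block adds
12, 11, 3, 3; the 00 04 marker repeats 3 four times; and the 20 04 block
adds 8, 3, 3, 130 before the 40 02 header ends segment 0.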

This decodes the first 482–510 samples of Tran for each event with zero
errors against BW's ASCII export. The exact segment-0 sample count
varies per event (it's bounded by a fixed device-flash byte budget, not
a fixed sample count — quiet events fit more samples because zero
deltas pack into ``00 NN`` markers compactly).

Implementation: :func:`decode_tran_initial`.

────────────────────────────────────────────────────────────────────────────
Segment header (40 02, 20 bytes total)
────────────────────────────────────────────────────────────────────────────

The 18-byte payload of the ``40 02`` block:

| Offset    | Field                                       | Status      |
|-----------|---------------------------------------------|-------------|
| [0:2]     | T_delta at first sample of new segment      | ✅ confirmed|
|           | (int16 BE, in 16-count units)               |             |
| [2:4]     | Likely T_delta at sample seg_start+1        | 🟡 likely   |
| [4:6]     | Unknown (varies; possibly checksum)         | ❓ open     |
| [6:8]     | Byte length to next segment header − 2      | ✅ confirmed|
|           | (uint16 BE; useful for walker pre-scan)     |             |
| [8:12]    | Monotonic uint32 LE counter                 | ✅ confirmed|
|           | (starts ~0x47, increments by 1 per segment) |             |
| [12:14]   | Constant ``02 00``                          | ✅ confirmed|
| [14:18]   | Unknown 4-byte field                        | ❓ open     |
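
The header mixes byte orders (big-endian int16/uint16 fields next to a
little-endian uint32 counter), which struct format strings make explicit.
A sketch that reads only the fields marked confirmed in the table;
parse_header_confirmed and the payload bytes are hypothetical, and the
[6:8] value is returned raw rather than interpreted.

```python
import struct


def parse_header_confirmed(payload: bytes) -> dict:
    # payload = the 18 bytes that follow the 40 02 tag
    if len(payload) != 18:
        raise ValueError("a 40 02 payload is 18 bytes")
    return {
        "t_delta": struct.unpack_from(">h", payload, 0)[0],              # [0:2] int16 BE
        "len_to_next_minus_2": struct.unpack_from(">H", payload, 6)[0],  # [6:8] uint16 BE
        "counter": struct.unpack_from("<I", payload, 8)[0],              # [8:12] uint32 LE
        "constant_ok": payload[12:14] == bytes([0x02, 0x00]),           # [12:14]
    }


payload = bytes(
    [0xFF, 0xF6]                   # [0:2]   T_delta = -10
    + [0x00] * 4                   # [2:6]   not interpreted here
    + [0x01, 0xFA]                 # [6:8]   506
    + [0x47, 0x00, 0x00, 0x00]     # [8:12]  counter = 0x47
    + [0x02, 0x00]                 # [12:14] constant
    + [0x00] * 4                   # [14:18] not interpreted here
)
print(parse_header_confirmed(payload))
# {'t_delta': -10, 'len_to_next_minus_2': 506, 'counter': 71, 'constant_ok': True}
```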

────────────────────────────────────────────────────────────────────────────
What breaks the multi-segment decoder (the main open question)
────────────────────────────────────────────────────────────────────────────

After segment 0 ends and the segment header T_delta is consumed,
applying segment 1's blocks as Tran continuation produces values that
diverge from truth by sample ~512. The block structure inside segment
1 is IDENTICAL to segment 0 (same alternating 10 NN / 00 NN pattern),
and the delta budget matches the segment size exactly (V70 segment 1
has 264 nibble-deltas + 244 RLE zeros = 508 = the segment's sample
count). But the cumulative Tran value is wrong.

The strongest unverified hypothesis is that segments rotate channels:

    segment 0 → Tran samples 0..509
    segment 1 → Vert samples 0..507
    segment 2 → Long samples 0..507
    segment 3 → Mic samples 0..507
    segment 4 → Tran samples 510..N (continuation)
    ...
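
Under that rotation hypothesis, each segment's channel and per-channel
sample range follow mechanically from the segment sample counts. A
hypothetical sketch: segment_schedule is an illustrative name, index 0 is
the pre-header Tran segment, and the counts 510/508 are the ones in the
listing above.

```python
from typing import Dict, List, Tuple

ROTATION = ["Tran", "Vert", "Long", "Mic"]


def segment_schedule(counts: List[int]) -> List[Tuple[str, int, int]]:
    # Map each segment to (channel, first_sample, last_sample) for that channel.
    next_sample: Dict[str, int] = {ch: 0 for ch in ROTATION}
    schedule = []
    for k, n in enumerate(counts):
        ch = ROTATION[k % len(ROTATION)]
        start = next_sample[ch]
        schedule.append((ch, start, start + n - 1))
        next_sample[ch] = start + n
    return schedule


for row in segment_schedule([510, 508, 508, 508, 508]):
    print(row)
# ('Tran', 0, 509)
# ('Vert', 0, 507)
# ('Long', 0, 507)
# ('Mic', 0, 507)
# ('Tran', 510, 1017)
```

Keeping a separate running sample index per channel is what makes
segment 4 resume Tran at 510 instead of restarting at 0.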

This is consistent with the segment-1 block sums netting to near zero in
V70 (where all 4 channels are near zero) and with the per-segment delta
budget matching the segment size for a single channel. It is NOT yet
verified because the per-segment channel anchor isn't pinned down in
the segment header — bytes [4:6] and [14:18] of the header are still
open and probably encode V/L/M anchors.

See ``docs/waveform_codec_re_status.md`` for the current working notes
and the suggested next experiment ("segment-channel scoring analyzer").
"""

from __future__ import annotations

from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class WaveformBlock:
    """One tagged block parsed out of a Blastware waveform-file body."""
    offset: int      # byte offset into body
    tag_hi: int      # first tag byte (0x10 / 0x20 / 0x00 / 0x30 / 0x40)
    tag_lo: int      # second tag byte (NN)
    data: bytes      # block payload (excludes the 2-byte tag)
    length: int      # total block length on the wire (includes the tag)

    @property
    def kind(self) -> str:
        return f"{self.tag_hi:02x} {self.tag_lo:02x}"


def find_data_start(body: bytes) -> int:
    """Auto-detect the offset of the first data block.

    The body starts with a 7-byte preamble (magic ``00 02 00`` + two int16 BE
    Tran anchors). After that, the data section starts with a tag — usually
    ``10 NN`` or ``20 NN``, but quiet events may begin with a ``00 NN`` RLE
    marker. We return the offset of the first recognized tag.
    """
    # Try fixed offset 7 first (canonical preamble length).
    if len(body) >= 9:
        b, nn = body[7], body[8]
        if ((b in (0x00, 0x10, 0x20, 0x30) and nn % 4 == 0 and 0 < nn <= 0xFC)
                or (b == 0x40 and nn == 0x02)):
            return 7
    # Fall back to scanning the first 20 bytes.
    for i in range(min(20, len(body) - 1)):
        b = body[i]
        nn = body[i + 1]
        if b in (0x10, 0x20) and nn % 4 == 0 and 0 < nn <= 0xFC:
            return i
    return -1


def walk_body(body: bytes, start: Optional[int] = None) -> List[WaveformBlock]:
    """Walk the tagged-block sequence starting at *start* (auto-detected by default).

    Stops when an unrecognized tag is encountered or end of body is reached.
    Returned blocks are in stream order.
    """
    if start is None:
        start = find_data_start(body)
    if start < 0:
        return []

    blocks: List[WaveformBlock] = []
    i = start
    while i + 1 < len(body):
        t0 = body[i]
        t1 = body[i + 1]
        if t0 == 0x10 and t1 % 4 == 0 and 0 < t1 <= 0xFC:
            length = t1 // 2 + 2
        elif t0 == 0x20 and t1 % 4 == 0 and 0 < t1 <= 0xFC:
            length = t1 + 2
        elif t0 == 0x00 and t1 % 4 == 0:
            length = 2
        elif t0 == 0x30 and t1 % 4 == 0 and 0 < t1 <= 0x10:
            # Data-section ``30 NN`` blocks carry NN 12-bit signed deltas packed
            # as NN/4 groups of (2-byte high-nibble field + 4 × int8 low byte).
            # Length = NN/4 × 6 + 2 = NN × 1.5 + 2 (= 8 for NN=4, 14 for NN=8,
            # 20 for NN=12, etc.). Confirmed 2026-05-11 by full-decoder
            # verification against BW ASCII export.
            #
            # Trailer-section ``30 NN`` blocks have a different length formula
            # (NN × 4 = 32 for NN=8 in trailers). We try the data-section
            # length first and fall back to the trailer length if needed.
            cand_data = t1 * 3 // 2 + 2
            cand_trailer = t1 * 4
            if (i + cand_data < len(body) - 1
                    and body[i + cand_data] in (0x10, 0x20, 0x00, 0x30, 0x40)):
                length = cand_data
            else:
                length = cand_trailer
        elif t0 == 0x40 and t1 == 0x02:
            length = 20
        else:
            # Unknown tag; stop. The caller can locate the stop point from the
            # last returned block's offset + length.
            break

        if i + length > len(body):
            break

        data = bytes(body[i + 2 : i + length])
        blocks.append(WaveformBlock(offset=i, tag_hi=t0, tag_lo=t1, data=data, length=length))
        i += length

    return blocks


def split_segments(blocks: List[WaveformBlock]) -> List[List[WaveformBlock]]:
    """Group consecutive blocks into segments separated by ``40 02`` headers.

    The first segment is whatever runs before the first ``40 02`` header
    (typically the "segment 0" preamble data after the body preamble).
    Subsequent segments start with a ``40 02`` block, then have their
    own data blocks until the next ``40 02``.
    """
    segments: List[List[WaveformBlock]] = []
    current: List[WaveformBlock] = []
    for b in blocks:
        if b.tag_hi == 0x40 and b.tag_lo == 0x02:
            if current:
                segments.append(current)
            current = [b]
        else:
            current.append(b)
    if current:
        segments.append(current)
    return segments


def parse_segment_header(block: WaveformBlock) -> Optional[dict]:
    """Decode the 18-byte payload of a ``40 02`` segment header.

    Returns a dict with the labelled fields, or None if *block* is not
    a ``40 02`` header.
    """
    if not (block.tag_hi == 0x40 and block.tag_lo == 0x02):
        return None
    if len(block.data) < 18:
        return None
    p = block.data
    counter = int.from_bytes(p[8:12], "little", signed=False)
    return {
        "anchor_bytes": p[0:4],      # 4-byte field, role unconfirmed
        "field2": p[4:8],            # 4-byte field, role unconfirmed
        "counter": counter,          # uint32 LE — increments by 1 per segment
        "fixed_pattern": p[12:16],   # always b"\x02\x00\x00\x01"
        "tail": p[16:18],            # last 2 bytes
    }


def _s4(n: int) -> int:
    """Sign-extend a 4-bit value to signed int (0..7 → 0..7; 8..F → -8..-1)."""
    return n if n < 8 else n - 16


def _i8(b: int) -> int:
    """Reinterpret an unsigned byte as signed int8."""
    return b if b < 128 else b - 256


def decode_tran_initial(body: bytes) -> Optional[List[int]]:
    """
    Decode the initial Tran-channel samples — VERIFIED 2026-05-11.

    Returns Tran samples in **16-count units** (LSB = 0.005 in/s at Normal
    range — the same quantization BW uses for its ASCII export). Returns
    ``None`` if the body cannot be parsed.

    The decoded list extends from sample 0 through the end of segment 0
    (= just before the first ``40 02`` segment header; ~510 sample-sets
    for the events tested). Samples past the segment header are decoded
    by :func:`decode_waveform_v2`, which applies the per-segment channel
    rotation.

    Codec for segment 0 (CONFIRMED 2026-05-11 against 7 fixture events):

    - Body bytes [0:3] are the magic ``00 02 00``.
    - Body bytes [3:5] = ``Tran[0]`` as int16 BE in 16-count units.
    - Body bytes [5:7] = ``Tran[1]`` as int16 BE in 16-count units.
    - Data blocks (``10 NN`` or ``20 NN``) carry Tran deltas starting
      at sample 2:

      * ``10 NN``: NN nibbles = NN/2 bytes; each nibble is a 4-bit
        signed delta (0..7 → 0..+7; 8..F → -8..-1). High nibble of
        each byte comes first.
      * ``20 NN``: NN int8 signed deltas (one delta per byte).

    - ``00 NN`` blocks are run-length-encoded zero deltas: append NN
      copies of the current cumulative Tran value (no change).

    - ``30 NN`` blocks carry NN 12-bit signed deltas, packed as NN/4
      groups of 6 bytes (a 2-byte word holding 4 high nibbles,
      MSB-first, then 4 low bytes). They appear in segment 0 of
      loud-from-start events (SS0, SV0) and are applied like the other
      delta blocks.

    The walk stops at the first ``40 02`` segment header.
    """
    if len(body) < 7 or body[0:3] != b"\x00\x02\x00":
        return None
    t0 = int.from_bytes(body[3:5], "big", signed=True)
    t1 = int.from_bytes(body[5:7], "big", signed=True)

    start = find_data_start(body)
    if start < 0:
        return [t0, t1]

    out = [t0, t1]
    cur = t1
    for blk in walk_body(body, start):
        if blk.tag_hi == 0x40:
            # Segment boundary: stop. Later segments are decoded by decode_waveform_v2.
            break
        if blk.tag_hi == 0x10:
            for byte in blk.data:
                for nib in ((byte >> 4) & 0xF, byte & 0xF):
                    cur += _s4(nib)
                    out.append(cur)
        elif blk.tag_hi == 0x20:
            for byte in blk.data:
                cur += _i8(byte)
                out.append(cur)
        elif blk.tag_hi == 0x00:
            # RLE zero deltas: append NN copies of current Tran value.
            for _ in range(blk.tag_lo):
                out.append(cur)
        elif blk.tag_hi == 0x30:
            # 12-bit signed deltas in NN/4 groups of 6 bytes: a big-endian
            # word of 4 high nibbles (MSB first), then 4 low bytes. Same
            # rule as decode_waveform_v2.
            for g in range(blk.tag_lo // 4):
                grp = blk.data[g * 6 : (g + 1) * 6]
                if len(grp) < 6:
                    break
                high_word = (grp[0] << 8) | grp[1]
                for k in range(4):
                    nib = (high_word >> (12 - 4 * k)) & 0xF
                    v = (nib << 8) | grp[2 + k]
                    if v >= 0x800:
                        v -= 0x1000
                    cur += v
                    out.append(cur)
    return out


def decode_waveform_v2(body: bytes) -> Optional[dict]:
    """
    Decode the body into per-channel sample arrays.

    Status (2026-05-11 evening — channel-rotation hypothesis CONFIRMED):
    segments rotate channels in fixed order **Tran → Vert → Long → MicL**.
    Each channel-segment carries a 2-sample anchor pair in segment-header
    bytes [14:18] (or in the body preamble for the initial Tran segment)
    plus a stream of delta blocks for samples 2 onward.

    Returns ``{"Tran": [...], "Vert": [...], "Long": [...], "MicL": [...]}``
    with each channel's decoded samples in 16-count units (LSB = 0.005
    in/s at Normal range). Returns ``None`` if the body cannot be
    parsed.
    """
    if len(body) < 7 or body[0:3] != b"\x00\x02\x00":
        return None

    channels = ["Tran", "Vert", "Long", "MicL"]
    out: dict = {ch: [] for ch in channels}

    # Initial Tran segment: preamble anchor pair + delta blocks before first 40 02.
    t0 = int.from_bytes(body[3:5], "big", signed=True)
    t1 = int.from_bytes(body[5:7], "big", signed=True)
    out["Tran"].extend([t0, t1])

    start = find_data_start(body)
    if start < 0:
        return out

    blocks = walk_body(body, start)
    seg_idx = [i for i, b in enumerate(blocks) if b.tag_hi == 0x40]

    def apply_blocks(channel: str, anchor: int,
                     block_start: int, block_end: int) -> int:
        """Apply delta blocks [block_start, block_end) to *channel*'s sample
        list, starting from *anchor*. Returns the final cumulative value."""
        cur = anchor
        for bi in range(block_start, block_end):
            blk = blocks[bi]
            if blk.tag_hi == 0x10:
                for byte in blk.data:
                    for nib in ((byte >> 4) & 0xF, byte & 0xF):
                        cur += _s4(nib)
                        out[channel].append(cur)
            elif blk.tag_hi == 0x20:
                for byte in blk.data:
                    cur += _i8(byte)
                    out[channel].append(cur)
            elif blk.tag_hi == 0x00:
                for _ in range(blk.tag_lo):
                    out[channel].append(cur)
            elif blk.tag_hi == 0x30:
                # 12-bit signed deltas, packed as NN/4 groups of 6 bytes each:
                #   bytes [0:2] = 16 bits = 4 × 4-bit high nibbles (MSB first)
                #   bytes [2:6] = 4 × int8 low bytes
                # Each delta = sign_extend_12((high_nibble << 8) | low_byte).
                # Confirmed 2026-05-11 against all 14 ``30 NN`` blocks in the
                # bundled fixtures.
                n_groups = blk.tag_lo // 4
                for g in range(n_groups):
                    grp = blk.data[g * 6 : (g + 1) * 6]
                    if len(grp) < 6:
                        break
                    high_word = (grp[0] << 8) | grp[1]
                    for k in range(4):
                        nib = (high_word >> (12 - 4 * k)) & 0xF
                        v = (nib << 8) | grp[2 + k]
                        if v >= 0x800:
                            v -= 0x1000
                        cur += v
                        out[channel].append(cur)
            # 40 02: should not occur in segment data.
        return cur

    # Initial Tran segment: deltas from start of body up to first 40 02 (or end).
    first_seg = seg_idx[0] if seg_idx else len(blocks)
    last_tran_value = apply_blocks("Tran", t1, 0, first_seg)

    # Subsequent segments rotate channels. Each segment header carries:
    #   bytes [0:2] and [2:4]     = 2 deltas extending the PREVIOUS channel
    #   bytes [14:16] and [16:18] = anchor pair for THIS segment's channel
    #
    # Rotation: V, L, M, T, V, L, M, T, ... (initial Tran segment is the
    # implicit T in the cycle.)
    rotation = ["Vert", "Long", "MicL", "Tran"]
    # Track each channel's "running cumulative value" so we can apply the
    # previous-channel extension deltas at every segment boundary.
    last_value = {"Tran": last_tran_value, "Vert": None, "Long": None, "MicL": None}

    for k, hi in enumerate(seg_idx):
        channel = rotation[k % 4]
        prev_channel = "Tran" if k == 0 else rotation[(k - 1) % 4]
        header = blocks[hi]
        if len(header.data) < 18:
            continue
        # Extend the PREVIOUS channel by 2 more samples (deltas in bytes [0:4]).
        prev_d0 = int.from_bytes(header.data[0:2], "big", signed=True)
        prev_d1 = int.from_bytes(header.data[2:4], "big", signed=True)
        if last_value[prev_channel] is not None:
            v = last_value[prev_channel] + prev_d0
            out[prev_channel].append(v)
            v += prev_d1
            out[prev_channel].append(v)
            last_value[prev_channel] = v
        # Anchor pair for THIS segment's channel.
        c0 = int.from_bytes(header.data[14:16], "big", signed=True)
        c1 = int.from_bytes(header.data[16:18], "big", signed=True)
        out[channel].extend([c0, c1])
        # Apply delta blocks for this segment.
        next_hi = seg_idx[k + 1] if k + 1 < len(seg_idx) else len(blocks)
        last_value[channel] = apply_blocks(channel, c1, hi + 1, next_hi)

    return out