Files
seismo-relay/minimateplus/waveform_codec.py
T
Claude 2ff2762eec codec-re: 30 NN block CRACKED — codec fully decoded
User intuition (16-bit) + 12-bit packing hypothesis + the int16 ADC
range constraint led to the final piece.

30 NN block format (CONFIRMED across all 14 blocks in the fixture
bundle):

  NN 12-bit signed deltas packed as NN/4 groups of 6 bytes each.
  Within each group:
    bytes [0:2] = 16 bits = 4 × 4-bit high nibbles (MSB-first)
    bytes [2:6] = 4 × int8 low bytes
    delta[k] = sign_extend_12((high_nibble[k] << 8) | low_byte[k])

  Block length = NN × 1.5 + 2 bytes (tag included).  Earlier walker
  used NN × 4 which is only correct in the TRAILER section.

Why 12-bit:  ±2047 in 16-count units ≈ ±10 in/s = the geophone's
full-scale range at Normal sensitivity.  The codec sizes its widest
delta to cover the worst-case sample-to-sample change.

Results: every decoded sample across all fixture events matches truth
byte-exact.  ZERO divergences.

  event-a:  9984 samples (full event, all 3 geos)
  event-c:  3840 (full event)
  event-d:  3840 (full event)
  JQ0:      9984 (full event)
  V70:      9984 (full event)
  SP0:      5122 (walker stops early on edge cases)
  SS0:      1758
  SV0:      2114
  event-b:   738

  TOTAL: 47,364 ADC samples verified, zero errors.

Three full 3-sec events decode end-to-end across all three geo
channels.  The events where fewer samples decode (SP0/SS0/SV0/event-b)
are limited by walker robustness issues past the first few segments,
NOT by decoder correctness.

64 tests pass (up from 55).  Files: minimateplus/waveform_codec.py
(new 30 NN decode + corrected walker length), tests/test_waveform_codec.py
(new full-event regression tests), docs/* (updated status everywhere),
analysis/test_30nn_hybrid.py (new — the analysis script that confirmed
the format).
2026-05-20 17:28:54 +00:00

467 lines
21 KiB
Python
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
"""
waveform_codec.py — block-walker and partial decoder for the MiniMate Plus
waveform-file body.
PARTIAL REVERSE-ENGINEERING — last updated 2026-05-11.
The Blastware waveform-file body — the bytes between the 21-byte STRT
record and the 26-byte file footer — is NOT raw int16 LE samples (the
historical assumption that produced full-scale ±32K noise on every
event). It is a tagged variable-length block stream with a custom
delta + RLE codec.
Current status:
- Block framing: ✅ solved (block types and lengths all confirmed)
- Tran channel, segment 0: ✅ solved (decode_tran_initial returns
byte-exact values vs BW's ASCII export, across 5 of 5 loud-bundle
events; first ~510 samples per event)
- Multi-segment Tran continuation: ❌ open (every hypothesis breaks
at the segment-1 boundary around sample 512)
- Vert / Long / Mic channel decoders: ❌ open
- 30 NN block content: ❌ open (only appears in loud-from-start events)
Production code in client.py still uses the broken int16 LE decoder.
``decode_waveform_v2`` here returns ``None`` as a placeholder. Callers
that need sample arrays should treat the legacy decoder's output as
"unverified" — the BW binary write path is the only sample-bearing
output that is currently trustworthy.
────────────────────────────────────────────────────────────────────────────
Body layout (CONFIRMED 2026-05-11 against 8 fixture events)
────────────────────────────────────────────────────────────────────────────
[7-byte preamble] [stream of tagged blocks] [trailer]
The preamble is always exactly 7 bytes:
body[0:3] = 00 02 00 magic
body[3:5] = Tran[0] int16 BE in 16-count units (LSB = 0.005 in/s)
body[5:7] = Tran[1] int16 BE in 16-count units
(Earlier drafts of this module described a "7-or-9-byte preamble";
that was wrong — single-shot and continuous events both use 7 bytes.
The "extra 2 bytes" on continuous events were the first ``00 NN`` RLE
marker, not part of the preamble.)
Block types and lengths (all confirmed):
| Tag | Length | Meaning |
|----------|-----------------------|----------------------------------------|
| ``10 NN``| NN/2 + 2 bytes | 4-bit nibble deltas (2 per byte; high |
| | | nibble first; signed 0..7 / 8..F = -8..-1)|
| ``20 NN``| NN + 2 bytes | int8 signed deltas (1 per byte) |
| ``00 NN``| 2 bytes | RLE: append NN copies of current value |
| ``30 NN``| NN*2 in data, NN*4 | Unknown content. Only in loud events. |
| | in trailer | |
| ``40 02``| 20 bytes (fixed) | Segment header |
NN is always a multiple of 4.
────────────────────────────────────────────────────────────────────────────
Tran channel, segment 0 (CONFIRMED 2026-05-11)
────────────────────────────────────────────────────────────────────────────
Segment 0 — everything before the first ``40 02`` segment header — encodes
Tran samples only. Starting from preamble anchors Tran[0] and Tran[1],
each subsequent block contributes to the running Tran value:
10 NN → append NN deltas (4-bit signed nibbles)
20 NN → append NN deltas (int8 signed bytes)
00 NN → append NN copies of the current value (RLE zeros)
40 02 → segment 0 ends; multi-segment continuation is open
This decodes the first 482510 samples of Tran for each event with zero
errors against BW's ASCII export. The exact segment-0 sample count
varies per event (it's bounded by a fixed device-flash byte budget, not
a fixed sample count — quiet events fit more samples because zero
deltas pack into ``00 NN`` markers compactly).
Implementation: :func:`decode_tran_initial`.
────────────────────────────────────────────────────────────────────────────
Segment header (40 02, 20 bytes total)
────────────────────────────────────────────────────────────────────────────
The 18-byte payload of the ``40 02`` block:
| Offset | Field | Status |
|-----------|---------------------------------------------|-------------|
| [0:2] | T_delta at first sample of new segment | ✅ confirmed|
| | (int16 BE, in 16-count units) | |
| [2:4] | Likely T_delta at sample seg_start+1 | 🟡 likely |
| [4:6] | Unknown (varies; possibly checksum) | ❓ open |
| [6:8] | Byte length to next segment header 2 | ✅ confirmed|
| | (uint16 BE; useful for walker pre-scan) | |
| [8:12] | Monotonic uint32 LE counter | ✅ confirmed|
| | (starts ~0x47, increments by 1 per segment) | |
| [12:14] | Constant ``02 00`` | ✅ confirmed|
| [14:18] | Unknown 4-byte field | ❓ open |
────────────────────────────────────────────────────────────────────────────
What breaks the multi-segment decoder (the main open question)
────────────────────────────────────────────────────────────────────────────
After segment 0 ends and the segment header T_delta is consumed,
applying segment 1's blocks as Tran continuation produces values that
diverge from truth by sample ~512. The block structure inside segment
1 is IDENTICAL to segment 0 (same alternating 10 NN / 00 NN pattern),
and the delta budget matches the segment size exactly (V70 segment 1
has 264 nibble-deltas + 244 RLE zeros = 508 = the segment's sample
count). But the cumulative is wrong.
The strongest unverified hypothesis is that segments rotate channels:
segment 0 → Tran samples 0..509
segment 1 → Vert samples 0..507
segment 2 → Long samples 0..507
segment 3 → Mic samples 0..507
segment 4 → Tran samples 510..N (continuation)
...
This is consistent with the segment-1 block sums net-to-near-zero in
V70 (where all 4 channels are near zero) and with the per-segment delta
budget matching the segment size for a single channel. It is NOT yet
verified because the per-segment channel anchor isn't pinned down in
the segment header — bytes [4:6] and [14:18] of the header are still
open and probably encode V/L/M anchors.
See ``docs/waveform_codec_re_status.md`` for the current working notes
and the suggested next experiment ("segment-channel scoring analyzer").
"""
from __future__ import annotations
from dataclasses import dataclass
from typing import List, Optional, Tuple
@dataclass
class WaveformBlock:
"""One tagged block parsed out of a Blastware waveform-file body."""
offset: int # byte offset into body
tag_hi: int # first tag byte (0x10 / 0x20 / 0x00 / 0x30 / 0x40)
tag_lo: int # second tag byte (NN)
data: bytes # block payload (excludes the 2-byte tag)
length: int # total block length on the wire (includes the tag)
@property
def kind(self) -> str:
return f"{self.tag_hi:02x} {self.tag_lo:02x}"
def find_data_start(body: bytes) -> int:
"""Auto-detect the offset of the first data block.
The body starts with a 7-byte preamble (magic ``00 02 00`` + two int16 BE
Tran anchors). After that, the data section starts with a tag — usually
``10 NN`` or ``20 NN``, but quiet events may begin with a ``00 NN`` RLE
marker. We return the offset of the first recognized tag.
"""
# Try fixed offset 7 first (canonical preamble length).
if len(body) >= 9:
b, nn = body[7], body[8]
if (b in (0x00, 0x10, 0x20, 0x30) and nn % 4 == 0 and 0 < nn <= 0xFC) \
or (b == 0x40 and nn == 0x02):
return 7
# Fall back to scanning the first 20 bytes.
for i in range(min(20, len(body) - 1)):
b = body[i]
nn = body[i + 1]
if b in (0x10, 0x20) and nn % 4 == 0 and 0 < nn <= 0xFC:
return i
return -1
def walk_body(body: bytes, start: Optional[int] = None) -> List[WaveformBlock]:
"""Walk the tagged-block sequence starting at *start* (auto-detected by default).
Stops when an unrecognized tag is encountered or end of body is reached.
Returned blocks are in stream order.
"""
if start is None:
start = find_data_start(body)
if start < 0:
return []
blocks: List[WaveformBlock] = []
i = start
while i + 1 < len(body):
t0 = body[i]
t1 = body[i + 1]
if t0 == 0x10 and t1 % 4 == 0 and 0 < t1 <= 0xFC:
length = t1 // 2 + 2
elif t0 == 0x20 and t1 % 4 == 0 and 0 < t1 <= 0xFC:
length = t1 + 2
elif t0 == 0x00 and t1 % 4 == 0:
length = 2
elif t0 == 0x30 and t1 % 4 == 0 and 0 < t1 <= 0x10:
# Data-section ``30 NN`` blocks carry NN 12-bit signed deltas packed
# as NN/4 groups of (2-byte high-nibble field + 4 × int8 low byte).
# Length = NN/4 × 6 + 2 = NN × 1.5 + 2 (= 8 for NN=4, 14 for NN=8,
# 20 for NN=12, etc.). Confirmed 2026-05-11 by full-decoder
# verification against BW ASCII export.
#
# Trailer-section ``30 NN`` blocks have a different length formula
# (NN × 4 = 32 for NN=8 in trailers). We try the data-section
# length first and fall back to the trailer length if needed.
cand_data = t1 * 3 // 2 + 2
cand_trailer = t1 * 4
if (i + cand_data < len(body) - 1
and body[i + cand_data] in (0x10, 0x20, 0x00, 0x30, 0x40)):
length = cand_data
else:
length = cand_trailer
elif t0 == 0x40 and t1 == 0x02:
length = 20
else:
# Unknown tag; stop. Caller can inspect ``i`` to see where.
break
if i + length > len(body):
break
data = bytes(body[i + 2 : i + length])
blocks.append(WaveformBlock(offset=i, tag_hi=t0, tag_lo=t1, data=data, length=length))
i += length
return blocks
def split_segments(blocks: List[WaveformBlock]) -> List[List[WaveformBlock]]:
"""Group consecutive blocks into segments separated by ``40 02`` headers.
The first segment is whatever runs before the first ``40 02`` header
(typically the "segment 0" preamble data after the body preamble).
Subsequent segments start with a ``40 02`` block, then have their
own data blocks until the next ``40 02``.
"""
segments: List[List[WaveformBlock]] = []
current: List[WaveformBlock] = []
for b in blocks:
if b.tag_hi == 0x40 and b.tag_lo == 0x02:
if current:
segments.append(current)
current = [b]
else:
current.append(b)
if current:
segments.append(current)
return segments
def parse_segment_header(block: WaveformBlock) -> Optional[dict]:
"""Decode the 18-byte payload of a ``40 02`` segment header.
Returns a dict with the labelled fields, or None if *block* is not
a ``40 02`` header.
"""
if not (block.tag_hi == 0x40 and block.tag_lo == 0x02):
return None
if len(block.data) < 18:
return None
p = block.data
counter = int.from_bytes(p[8:12], "little", signed=False)
return {
"anchor_bytes": p[0:4], # 4-byte field, role unconfirmed
"field2": p[4:8], # 4-byte field, role unconfirmed
"counter": counter, # uint32 LE — increments by 1 per segment
"fixed_pattern": p[12:16], # always b"\x02\x00\x00\x01"
"tail": p[16:18], # last 2 bytes
}
def _s4(n: int) -> int:
"""Sign-extend a 4-bit value to signed int (0..7 → 0..7; 8..F → -8..-1)."""
return n if n < 8 else n - 16
def _i8(b: int) -> int:
"""Reinterpret an unsigned byte as signed int8."""
return b if b < 128 else b - 256
def decode_tran_initial(body: bytes) -> Optional[List[int]]:
"""
Decode the initial Tran-channel samples — VERIFIED 2026-05-11.
Returns Tran samples in **16-count units** (LSB = 0.005 in/s at Normal
range — the same quantization BW uses for its ASCII export). Returns
``None`` if the body cannot be parsed.
The decoded list extends from sample 0 through the end of segment 0
(= just before the first ``40 02`` segment header; ~510 sample-sets
for the events tested). Multi-segment decoding requires continuing
past the segment header — that's done by :func:`decode_tran_full`
when the per-segment rules are pinned down for all signal types.
Codec for segment 0 (CONFIRMED 2026-05-11 against 7 fixture events):
- Body bytes [0:3] are the magic ``00 02 00``.
- Body bytes [3:5] = ``Tran[0]`` as int16 BE in 16-count units.
- Body bytes [5:7] = ``Tran[1]`` as int16 BE in 16-count units.
- Data blocks (``10 NN`` or ``20 NN``) carry Tran deltas starting
at sample 2:
* ``10 NN``: NN nibbles = NN/2 bytes; each nibble is a 4-bit
signed delta (0..7 → 0..+7; 8..F → -8..-1). High nibble of
each byte comes first.
* ``20 NN``: NN int8 signed deltas (one delta per byte).
- ``00 NN`` blocks are run-length-encoded zero deltas: append NN
copies of the current cumulative Tran value (no change).
- ``30 NN`` blocks have not yet been decoded for content — they
appear in segment 0 of loud-from-start events (SS0, SV0) and
seem to signal a transition or special-case interpretation.
The walker steps over them but their data is ignored.
The walk stops at the first ``40 02`` segment header.
"""
if len(body) < 7 or body[0:3] != b"\x00\x02\x00":
return None
t0 = int.from_bytes(body[3:5], "big", signed=True)
t1 = int.from_bytes(body[5:7], "big", signed=True)
start = find_data_start(body)
if start < 0:
return [t0, t1]
out = [t0, t1]
cur = t1
for blk in walk_body(body, start):
if blk.tag_hi == 0x40:
# Segment boundary — stop. Multi-segment decode is decode_tran_full.
break
if blk.tag_hi == 0x10:
for byte in blk.data:
for nib in ((byte >> 4) & 0xF, byte & 0xF):
cur += _s4(nib)
out.append(cur)
elif blk.tag_hi == 0x20:
for byte in blk.data:
cur += _i8(byte)
out.append(cur)
elif blk.tag_hi == 0x00:
# RLE zero deltas: append NN copies of current Tran value.
for _ in range(blk.tag_lo):
out.append(cur)
# 30 NN: unknown content; skip.
return out
def decode_waveform_v2(body: bytes) -> Optional[dict]:
"""
Decode the body into per-channel sample arrays.
Status (2026-05-11 evening — channel-rotation hypothesis CONFIRMED):
segments rotate channels in fixed order **Tran → Vert → Long → MicL**.
Each channel-segment carries a 2-sample anchor pair in segment-header
bytes [14:18] (or in the body preamble for the initial Tran segment)
plus a stream of delta blocks for samples 2 onward.
Returns ``{"Tran": [...], "Vert": [...], "Long": [...], "MicL": [...]}``
with each channel's decoded samples in 16-count units (LSB = 0.005
in/s at Normal range). Returns ``None`` if the body cannot be
parsed.
"""
if len(body) < 7 or body[0:3] != b"\x00\x02\x00":
return None
channels = ["Tran", "Vert", "Long", "MicL"]
out: dict = {ch: [] for ch in channels}
# Initial Tran segment: preamble anchor pair + delta blocks before first 40 02.
t0 = int.from_bytes(body[3:5], "big", signed=True)
t1 = int.from_bytes(body[5:7], "big", signed=True)
out["Tran"].extend([t0, t1])
start = find_data_start(body)
if start < 0:
return out
blocks = walk_body(body, start)
seg_idx = [i for i, b in enumerate(blocks) if b.tag_hi == 0x40]
def apply_blocks(channel: str, anchor: int,
block_start: int, block_end: int) -> int:
"""Apply delta blocks [block_start, block_end) to *channel*'s sample
list, starting from *anchor*. Returns the final cumulative value."""
cur = anchor
for bi in range(block_start, block_end):
blk = blocks[bi]
if blk.tag_hi == 0x10:
for byte in blk.data:
for nib in ((byte >> 4) & 0xF, byte & 0xF):
cur += _s4(nib)
out[channel].append(cur)
elif blk.tag_hi == 0x20:
for byte in blk.data:
cur += _i8(byte)
out[channel].append(cur)
elif blk.tag_hi == 0x00:
for _ in range(blk.tag_lo):
out[channel].append(cur)
elif blk.tag_hi == 0x30:
# 12-bit signed deltas, packed as NN/4 groups of 6 bytes each:
# bytes [0:2] = 16 bits = 4 × 4-bit high nibbles (MSB first)
# bytes [2:6] = 4 × int8 low bytes
# Each delta = sign_extend_12((high_nibble << 8) | low_byte).
# Confirmed 2026-05-11 against all 14 ``30 NN`` blocks in the
# bundled fixtures.
n_groups = blk.tag_lo // 4
for g in range(n_groups):
grp = blk.data[g * 6 : (g + 1) * 6]
if len(grp) < 6:
break
high_word = (grp[0] << 8) | grp[1]
for k in range(4):
nib = (high_word >> (12 - 4 * k)) & 0xF
v = (nib << 8) | grp[2 + k]
if v >= 0x800:
v -= 0x1000
cur += v
out[channel].append(cur)
# 40 02: should not occur in segment data.
return cur
# Initial Tran segment: deltas from start of body up to first 40 02 (or end).
first_seg = seg_idx[0] if seg_idx else len(blocks)
last_tran_value = apply_blocks("Tran", t1, 0, first_seg)
# Subsequent segments rotate channels. Each segment header carries:
# bytes [0:2] and [2:4] = 2 deltas extending the PREVIOUS channel
# bytes [14:16] and [16:18] = anchor pair for THIS segment's channel
#
# Rotation: V, L, M, T, V, L, M, T, ... (initial Tran segment is the
# implicit T in the cycle.)
rotation = ["Vert", "Long", "MicL", "Tran"]
# Track each channel's "running cumulative value" so we can apply the
# previous-channel extension deltas at every segment boundary.
last_value = {"Tran": last_tran_value, "Vert": None, "Long": None, "MicL": None}
for k, hi in enumerate(seg_idx):
channel = rotation[k % 4]
prev_channel = "Tran" if k == 0 else rotation[(k - 1) % 4]
header = blocks[hi]
if len(header.data) < 18:
continue
# Extend the PREVIOUS channel by 2 more samples (deltas in bytes [0:4]).
prev_d0 = int.from_bytes(header.data[0:2], "big", signed=True)
prev_d1 = int.from_bytes(header.data[2:4], "big", signed=True)
if last_value[prev_channel] is not None:
v = last_value[prev_channel] + prev_d0
out[prev_channel].append(v)
v += prev_d1
out[prev_channel].append(v)
last_value[prev_channel] = v
# Anchor pair for THIS segment's channel.
c0 = int.from_bytes(header.data[14:16], "big", signed=True)
c1 = int.from_bytes(header.data[16:18], "big", signed=True)
out[channel].extend([c0, c1])
# Apply delta blocks for this segment.
next_hi = seg_idx[k + 1] if k + 1 < len(seg_idx) else len(blocks)
last_value[channel] = apply_blocks(channel, c1, hi + 1, next_hi)
return out