seismo-relay/minimateplus/waveform_codec.py
Claude a0c9a482c7 codec-re: 00 NN is RLE; full Tran segment-0 decode (4 of 5 events)
User uploaded a Vert-heavy event (JQ0) and a Mic-heavy event (V70).
Those two were exactly what was needed to crack the next piece:

- 00 NN block = run-length-encoded zero deltas in the current channel.
  Append NN copies of the current cumulative value (no change).
- find_data_start now recognizes 00 NN as a valid first tag (some events
  begin with a leading 00 NN RLE block).
- decode_tran_initial now decodes the FULL segment 0 (not just the first
  data block).
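
A minimal sketch of the 00 NN rule described above. The helper name
expand_rle_zero_run and the toy values are hypothetical (they are not part of
waveform_codec.py and not taken from a fixture); the sketch only shows that a
00 NN block adds NN samples without changing the running value.

```python
from typing import List


def expand_rle_zero_run(samples: List[int], current: int, nn: int) -> None:
    """Append nn copies of the current cumulative value (nn zero deltas)."""
    samples.extend([current] * nn)


# Toy stream (hypothetical): start at 5, apply one +2 delta, then a 00 04 run.
samples = [5]
current = 5
current += 2
samples.append(current)
expand_rle_zero_run(samples, current, 0x04)
print(samples)  # [5, 7, 7, 7, 7, 7]
```

The running value is untouched by the run, so the next delta block continues
from the same cumulative value.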

Results across 5 fixture events:
  - M529LL1A.SP0 (loud-all-channels)  : 510 / 510  ✓
  - M529LL1L.JQ0 (Vert-heavy)         : 510 / 510  ✓
  - M529LL1L.V70 (Mic-heavy)          : 510 / 510  ✓
  - M529LL1A.SV0 (loud-from-start)    :  58 /  58  ✓
  - M529LL1A.SS0 (loud-from-start)    :  42 / 502  (stops at first 30 04)

The 30 04 block (only seen in loud-from-start events) hasn't been
decoded yet — likely a channel-switch marker for the high-amplitude
regime.

Also discovered: segment header (40 02) payload bytes [0:2] = T_delta
at first sample of new segment, [6:8] = byte length to next segment.
Multi-segment Tran decoding still diverges after sample 512 because
the per-segment channel ordering after the header is unknown.
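
A rough sketch of reading the two header fields named above out of the 18-byte
40 02 payload. The function name read_segment_header_hints is hypothetical, and
neither the byte order nor the signedness of these fields is stated here, so
the byte order is an explicit argument and "T_delta is signed, length is
unsigned" is an assumption.

```python
from typing import Dict


def read_segment_header_hints(payload: bytes, byteorder: str) -> Dict[str, int]:
    """Extract payload bytes [0:2] and [6:8] from a 40 02 segment header.

    byteorder ("big" or "little") must be chosen by the caller; it is not
    established by the notes above. Signedness is likewise an assumption.
    """
    if len(payload) < 8:
        raise ValueError("payload too short to contain bytes [6:8]")
    return {
        # Assumed signed: a delta can be negative.
        "t_delta": int.from_bytes(payload[0:2], byteorder, signed=True),
        # Assumed unsigned: a byte length cannot be negative.
        "bytes_to_next_segment": int.from_bytes(
            payload[6:8], byteorder, signed=False
        ),
    }


# Made-up 18-byte payload for illustration only (not from a fixture).
payload = bytes.fromhex("fffe 0000 0000 0120") + bytes(10)
hints = read_segment_header_hints(payload, "big")
print(hints)  # {'t_delta': -2, 'bytes_to_next_segment': 288}
```

Trying both byte orders against a fixture and checking which one makes
bytes_to_next_segment land on the next 40 02 tag would be one way to settle
the byte-order question.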

Tests: 40 pass (up from 36).

Files:
- minimateplus/waveform_codec.py: find_data_start fix, RLE handling,
  full segment-0 decode in decode_tran_initial
- tests/test_waveform_codec.py: synthetic RLE test, full segment 0
  tests for JQ0 and V70
- tests/fixtures/5-11-26/: M529LL1L.JQ0, M529LL1L.V70 + TXT exports
- docs/instantel_protocol_reference.md §7.6.1: RLE + segment-header docs
2026-05-20 17:28:54 +00:00

353 lines
14 KiB
Python
"""
waveform_codec.py — block-walker for the MiniMate Plus waveform body codec.
PARTIAL REVERSE-ENGINEERING (2026-05-08).

Status: STRUCTURAL FRAMING confirmed; per-block sample interpretation is
still OPEN except for segment 0 of the Tran channel, which
``decode_tran_initial`` decodes (verified 2026-05-11).

This module replaces the int16-LE assumption that produced full-scale ±32K
noise on every event. The body is NOT raw int16 LE: it is a sequence of
tagged variable-length blocks. The block framing is solved here. The
mapping from block bytes to ADC samples is **NOT yet pinned down** for the
full multi-channel stream, so the work-in-progress decoder
``decode_waveform_v2`` returns ``None`` until a verified algorithm is
wired in.

Until ``decode_waveform_v2`` returns a verified result, callers that need
sample data should keep relying on the legacy decoder in ``client.py``
(known-broken, but at least stable in shape) and not consume this
module's sample output.

────────────────────────────────────────────────────────────────────────────
Body structure (CONFIRMED 2026-05-08 against decode-re/5-8-26 4-event bundle)
────────────────────────────────────────────────────────────────────────────
The Blastware waveform-file body lives between bytes [22+21=43] and the
26-byte file footer (``[: -26]``). Layout:

    [preamble: 7 or 9 bytes]
    [data section: a stream of tagged blocks]
    [trailer: per-channel summary blocks]

The preamble starts with the magic ``00 02 00 00``. After that there is
either 3 or 5 bytes of header before the first ``10 NN`` block tag — in
the 4-event bundle, single-shot events have a 7-byte preamble and
continuous events have 9. The exact meaning of bytes [4:9] is open
(empirically: byte [4] for event-a == truth Tran[0]; byte [4] for
event-b == truth Tran[0]; events c/d = 0; treating it as a per-channel
"initial value" partially matches but is inconsistent across events).
Blocks have 2-byte tags and these confirmed lengths:

| Tag (hex) | Block type                                  | Total length   |
|-----------|---------------------------------------------|----------------|
| ``10 NN`` | Small-delta data block (4-bit deltas)       | NN/2 + 2 bytes |
| ``20 NN`` | Literal data block (int8 deltas)            | NN + 2 bytes   |
| ``00 NN`` | Run-length-encoded zero deltas (NN samples) | 2 bytes        |
| ``30 NN`` | Data-section block (content not decoded)    | NN × 2 bytes   |
| ``30 NN`` | Trailer summary block                       | NN × 4 bytes   |
| ``40 02`` | Segment header                              | 20 bytes       |

In the 4-event bundle, every event's body parses as a clean sequence of
these blocks all the way through the trailer (when the walker is given
the right preamble length). No "??" stops occur once the start offset
is correct.
Segments and the ``40 02`` header
────────────────────────────────────
The body is divided into ~16 SEGMENTS, each separated by a ``40 02``
header. Each segment carries ~80 sample-sets (1280-sample event = 16
segments × 80 sample-sets, 3328-sample event = ~42 segments). The 18-byte
``40 02`` payload contains:

    bytes 0..3    4-byte channel anchor / state (varies per segment)
    bytes 4..7    4-byte field, varies (RMS/peak per channel?)
    bytes 8..11   4-byte uint32 LE counter (increments by 1 per segment;
                  starts at e.g. 0x47 for the first in-data segment)
    bytes 12..15  4-byte fixed pattern: 02 00 00 01
    bytes 16..17  2-byte segment-relative payload counter

The counter at bytes [8..11] increments cleanly across segments — useful
as a sanity check. The role of bytes [0..3] (anchor candidates) and
[4..7] is not pinned down: simple "channel state at segment boundary"
hypotheses do NOT match truth across all four sample bundles tested.
What's open
────────────
The mapping ``block bytes → ADC samples`` is the open question. Tested
hypotheses that did **not** match BW's ASCII export to within the
required ±1 ADC count:
1. ``10 NN`` data = 4-bit signed nibble deltas, channel-interleaved
(TVLM/VTLM/LMTV/all 24 permutations × 2 nibble orders × 2 sign
conventions = 96 combinations tested). All produce values that
diverge from truth after the first ~7 sample-sets.
2. ``20 NN`` data = int8 absolute samples for one channel. Magnitudes
in observed blocks (peak ~±34 in the smoothest event-c block at
offset 351) do not match any channel's PPV at any plausible
ADC-count quantization (1-count, 4-count, 8-count, 16-count).
3. ``00 NN`` marker = "skip N sample-sets". Sums of NN/4 across markers
do not match 80 sample-sets per segment.
4. Concatenating ALL ``10 NN`` payload bytes and reading as a continuous
nibble stream (TVLM round-robin) produces the same 96-combination
problem as (1).
The most promising lead — that ``20 NN`` blocks carry literal int8
sample-sequences for the largest-amplitude channel within a segment —
is consistent with the smooth waveform shape of those payloads, but
the magnitude scaling has not been pinned down. It's possible that
``10 NN`` and ``20 NN`` blocks carry different bit-widths of the same
channel-interleaved delta stream (variable-width like Rice coding)
with 4-bit deltas as default and 8-bit deltas as escape.
Potential next steps for whoever picks this up:
- Capture an event with a KNOWN external waveform (e.g. a calibration
signal of known frequency/amplitude) so the truth is unambiguous and
the magnitude scaling is unambiguous.
- Capture multiple events with the SAME signal but DIFFERENT geo_range
(Normal 10 in/s vs Sensitive 1.25 in/s) to disambiguate scaling.
- Examine the sequential ``40 02`` segment headers of a single event: the
4-byte "anchor" should reflect cumulative sample state at the
boundary; matching it to truth at that sample index would unlock
the per-segment delta decode.
"""
from __future__ import annotations

from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class WaveformBlock:
    """One tagged block parsed out of a Blastware waveform-file body."""

    offset: int   # byte offset into body
    tag_hi: int   # first tag byte (0x10 / 0x20 / 0x00 / 0x30 / 0x40)
    tag_lo: int   # second tag byte (NN)
    data: bytes   # block payload (excludes the 2-byte tag)
    length: int   # total block length on the wire (includes the tag)

    @property
    def kind(self) -> str:
        return f"{self.tag_hi:02x} {self.tag_lo:02x}"


def find_data_start(body: bytes) -> int:
    """Auto-detect the offset of the first data block.

    The body starts with a 7-byte preamble (magic ``00 02 00`` + two int16 BE
    Tran anchors). After that, the data section starts with a tag, usually
    ``10 NN`` or ``20 NN``, but quiet events may begin with a ``00 NN`` RLE
    marker. Returns the offset of the first recognized tag, or -1 if no tag
    is found.
    """
    # Try fixed offset 7 first (canonical preamble length).
    if len(body) >= 9:
        b, nn = body[7], body[8]
        if (
            (b in (0x00, 0x10, 0x20, 0x30) and nn % 4 == 0 and 0 < nn <= 0xFC)
            or (b == 0x40 and nn == 0x02)
        ):
            return 7
    # Fall back to scanning the first 20 bytes. Only 10 NN / 20 NN tags are
    # accepted here.
    for i in range(min(20, len(body) - 1)):
        b = body[i]
        nn = body[i + 1]
        if b in (0x10, 0x20) and nn % 4 == 0 and 0 < nn <= 0xFC:
            return i
    return -1


def walk_body(body: bytes, start: Optional[int] = None) -> List[WaveformBlock]:
    """Walk the tagged-block sequence starting at *start* (auto-detected by default).

    Stops when an unrecognized tag is encountered or end of body is reached.
    Returned blocks are in stream order.
    """
    if start is None:
        start = find_data_start(body)
    if start < 0:
        return []
    blocks: List[WaveformBlock] = []
    i = start
    while i + 1 < len(body):
        t0 = body[i]
        t1 = body[i + 1]
        if t0 == 0x10 and t1 % 4 == 0 and 0 < t1 <= 0xFC:
            length = t1 // 2 + 2
        elif t0 == 0x20 and t1 % 4 == 0 and 0 < t1 <= 0xFC:
            length = t1 + 2
        elif t0 == 0x00 and t1 % 4 == 0:
            length = 2
        elif t0 == 0x30 and t1 % 4 == 0 and 0 < t1 <= 0x10:
            # Data-section ``30 NN`` blocks have length NN*2 (= 8 for NN=4,
            # confirmed in M529LL1A.SS0 at body offset 29). Trailer-section
            # ``30 NN`` blocks have length NN*4 (= 32 for NN=8, confirmed in
            # event-d trailer at body offset 3941). We pick NN*2 if it lands
            # on a recognized tag, otherwise fall through to NN*4.
            cand2 = t1 * 2
            cand4 = t1 * 4
            if (i + cand2 < len(body) - 1
                    and body[i + cand2] in (0x10, 0x20, 0x00, 0x30, 0x40)):
                length = cand2
            else:
                length = cand4
        elif t0 == 0x40 and t1 == 0x02:
            length = 20
        else:
            # Unknown tag; stop. Callers can locate the stop position from
            # the last returned block's offset + length.
            break
        if i + length > len(body):
            break
        data = bytes(body[i + 2 : i + length])
        blocks.append(
            WaveformBlock(offset=i, tag_hi=t0, tag_lo=t1, data=data, length=length)
        )
        i += length
    return blocks


def split_segments(blocks: List[WaveformBlock]) -> List[List[WaveformBlock]]:
    """Group consecutive blocks into segments separated by ``40 02`` headers.

    The first segment is whatever runs before the first ``40 02`` header
    (typically the "segment 0" preamble data after the body preamble).
    Subsequent segments start with a ``40 02`` block, then have their
    own data blocks until the next ``40 02``.
    """
    segments: List[List[WaveformBlock]] = []
    current: List[WaveformBlock] = []
    for b in blocks:
        if b.tag_hi == 0x40 and b.tag_lo == 0x02:
            # A header closes the running segment and opens a new one.
            if current:
                segments.append(current)
            current = [b]
        else:
            current.append(b)
    if current:
        segments.append(current)
    return segments


def parse_segment_header(block: WaveformBlock) -> Optional[dict]:
    """Decode the 18-byte payload of a ``40 02`` segment header.

    Returns a dict with the labelled fields, or None if *block* is not
    a ``40 02`` header.
    """
    if not (block.tag_hi == 0x40 and block.tag_lo == 0x02):
        return None
    if len(block.data) < 18:
        return None
    p = block.data
    counter = int.from_bytes(p[8:12], "little", signed=False)
    return {
        "anchor_bytes": p[0:4],     # 4-byte field, role unconfirmed
        "field2": p[4:8],           # 4-byte field, role unconfirmed
        "counter": counter,         # uint32 LE; increments by 1 per segment
        "fixed_pattern": p[12:16],  # always b"\x02\x00\x00\x01"
        "tail": p[16:18],           # last 2 bytes
    }


def _s4(n: int) -> int:
    """Sign-extend a 4-bit value to signed int (0..7 → 0..7; 8..F → -8..-1)."""
    return n if n < 8 else n - 16


def _i8(b: int) -> int:
    """Reinterpret an unsigned byte as signed int8."""
    return b if b < 128 else b - 256


def decode_tran_initial(body: bytes) -> Optional[List[int]]:
    """
    Decode the initial Tran-channel samples (VERIFIED 2026-05-11).

    Returns Tran samples in **16-count units** (LSB = 0.005 in/s at Normal
    range, the same quantization BW uses for its ASCII export). Returns
    ``None`` if the body cannot be parsed.

    The decoded list extends from sample 0 through the end of segment 0
    (= just before the first ``40 02`` segment header; ~510 sample-sets
    for the events tested). Multi-segment decoding requires continuing
    past the segment header; that's done by :func:`decode_tran_full`
    when the per-segment rules are pinned down for all signal types.

    Codec for segment 0 (CONFIRMED 2026-05-11 against 7 fixture events):

    - Body bytes [0:3] are the magic ``00 02 00``.
    - Body bytes [3:5] = ``Tran[0]`` as int16 BE in 16-count units.
    - Body bytes [5:7] = ``Tran[1]`` as int16 BE in 16-count units.
    - Data blocks (``10 NN`` or ``20 NN``) carry Tran deltas starting
      at sample 2:

      * ``10 NN``: NN nibbles = NN/2 bytes; each nibble is a 4-bit
        signed delta (0..7 → 0..+7; 8..F → -8..-1). High nibble of
        each byte comes first.
      * ``20 NN``: NN int8 signed deltas (one delta per byte).

    - ``00 NN`` blocks are run-length-encoded zero deltas: append NN
      copies of the current cumulative Tran value (no change).
    - ``30 NN`` blocks have not yet been decoded for content. They
      appear in segment 0 of loud-from-start events (SS0, SV0) and
      seem to signal a transition or special-case interpretation.
      The walker steps over them but their data is ignored.

    The walk stops at the first ``40 02`` segment header.
    """
    if len(body) < 7 or body[0:3] != b"\x00\x02\x00":
        return None
    t0 = int.from_bytes(body[3:5], "big", signed=True)
    t1 = int.from_bytes(body[5:7], "big", signed=True)
    start = find_data_start(body)
    if start < 0:
        return [t0, t1]
    out = [t0, t1]
    cur = t1
    for blk in walk_body(body, start):
        if blk.tag_hi == 0x40:
            # Segment boundary: stop. Multi-segment decode is decode_tran_full.
            break
        if blk.tag_hi == 0x10:
            # 4-bit signed deltas, high nibble first.
            for byte in blk.data:
                for nib in ((byte >> 4) & 0xF, byte & 0xF):
                    cur += _s4(nib)
                    out.append(cur)
        elif blk.tag_hi == 0x20:
            # int8 signed deltas, one per byte.
            for byte in blk.data:
                cur += _i8(byte)
                out.append(cur)
        elif blk.tag_hi == 0x00:
            # RLE zero deltas: append NN copies of current Tran value.
            for _ in range(blk.tag_lo):
                out.append(cur)
        # 30 NN: unknown content; skip.
    return out


def decode_waveform_v2(body: bytes) -> Optional[dict]:
    """
    Decode the body into per-channel sample arrays.

    Returns ``None`` because the full multi-channel decoder is not yet
    wired up. Tran is partially solved; see :func:`decode_tran_initial`
    for segment 0 (verified against ground-truth BW exports).

    Status:

    - Segment 0 of Tran is decoded by ``decode_tran_initial`` and matches
      truth for 4 of the 5 fixture events (SP0, JQ0, V70: 510 / 510;
      SV0: 58 / 58). SS0 matches only 42 / 502 samples, up to its first
      ``30 04`` block.
    - Tran samples past the first ``40 02`` header + all Vert / Long /
      MicL samples: open. ``30 NN`` blocks may act as channel-switch
      markers in loud-from-start events, but their content has not been
      decoded and the switching rule is still under investigation.
    """
    return None
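
The segment-0 Tran rules documented in ``decode_tran_initial`` can be restated as a standalone toy walker. This is only a sketch under stated assumptions: it hard-codes a 7-byte preamble, does not validate NN, and treats any tag other than ``10``, ``20``, or ``00`` as a stop. The name ``decode_segment0_sketch`` and the synthetic body are hypothetical and are not taken from any fixture.

```python
from typing import List


def s4(n: int) -> int:
    """Sign-extend a 4-bit value (8..F -> -8..-1)."""
    return n if n < 8 else n - 16


def s8(b: int) -> int:
    """Reinterpret an unsigned byte as signed int8."""
    return b if b < 128 else b - 256


def decode_segment0_sketch(body: bytes) -> List[int]:
    """Toy restatement of the segment-0 rules; assumes a 7-byte preamble."""
    t0 = int.from_bytes(body[3:5], "big", signed=True)
    t1 = int.from_bytes(body[5:7], "big", signed=True)
    out = [t0, t1]
    cur = t1
    i = 7
    while i + 1 < len(body):
        tag, nn = body[i], body[i + 1]
        if tag == 0x10:    # NN nibble deltas packed into NN/2 bytes
            for byte in body[i + 2 : i + 2 + nn // 2]:
                for nib in (byte >> 4, byte & 0xF):  # high nibble first
                    cur += s4(nib)
                    out.append(cur)
            i += 2 + nn // 2
        elif tag == 0x20:  # NN int8 deltas, one per byte
            for byte in body[i + 2 : i + 2 + nn]:
                cur += s8(byte)
                out.append(cur)
            i += 2 + nn
        elif tag == 0x00:  # NN zero deltas: repeat the current value
            out.extend([cur] * nn)
            i += 2
        else:              # 40 02 header or anything unhandled: stop
            break
    return out


# Synthetic body built by hand for illustration (hypothetical values).
body = (
    bytes.fromhex("000200")            # magic
    + bytes.fromhex("0003")            # Tran[0] = 3
    + bytes.fromhex("0004")            # Tran[1] = 4
    + bytes.fromhex("1004 1f20")       # 10 04: nibble deltas +1, -1, +2, 0
    + bytes.fromhex("0004")            # 00 04: four zero deltas
    + bytes.fromhex("2004 f6000a01")   # 20 04: int8 deltas -10, 0, +10, +1
    + bytes.fromhex("4002")            # segment header tag: walk stops here
)
decoded = decode_segment0_sketch(body)
print(decoded)  # [3, 4, 5, 4, 6, 6, 6, 6, 6, 6, -4, -4, 6, 7]
```

The two anchors plus 4 + 4 + 4 deltas give 14 samples, which is a quick way to sanity-check a block walk: the sample count should equal 2 plus the sum of NN over the data and RLE blocks in the segment.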