a0c9a482c7
User uploaded a Vert-heavy event (JQ0) and a Mic-heavy event (V70). Those two were exactly what was needed to crack the next piece:

- 00 NN block = run-length-encoded zero deltas in the current channel. Append NN copies of the current cumulative value (no change).
- find_data_start now recognizes 00 NN as a valid first tag (some events begin with a leading 00 NN RLE block).
- decode_tran_initial now decodes the FULL segment 0 (not just the first data block).

Results across 5 fixture events:

- M529LL1A.SP0 (loud-all-channels) : 510 / 510 ✓
- M529LL1L.JQ0 (Vert-heavy)        : 510 / 510 ✓
- M529LL1L.V70 (Mic-heavy)         : 510 / 510 ✓
- M529LL1A.SV0 (loud-from-start)   :  58 /  58 ✓
- M529LL1A.SS0 (loud-from-start)   :  42 / 502 (stops at first 30 04)

The 30 04 block (only seen in loud-from-start events) hasn't been decoded yet; it is likely a channel-switch marker for the high-amplitude regime.

Also discovered: segment header (40 02) payload bytes [0:2] = T_delta at first sample of new segment, [6:8] = byte length to next segment. Multi-segment Tran decoding still diverges after sample 512 because the per-segment channel ordering after the header is unknown.

Tests: 40 pass (up from 36).

Files:

- minimateplus/waveform_codec.py: find_data_start fix, RLE handling, full segment-0 decode in decode_tran_initial
- tests/test_waveform_codec.py: synthetic RLE test, full segment 0 tests for JQ0 and V70
- tests/fixtures/5-11-26/: M529LL1L.JQ0, M529LL1L.V70 + TXT exports
- docs/instantel_protocol_reference.md §7.6.1: RLE + segment-header docs
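
The segment-0 rules listed in this commit message can be tried on synthetic bytes. The sketch below is an illustrative restatement, not the module's decode_tran_initial: the helper name decode_segment0_sketch is hypothetical, the input is hand-built rather than taken from a fixture, and it assumes a 7-byte preamble (magic 00 02 00 plus two int16 BE Tran values) and stops at any tag other than 10 NN, 20 NN or 00 NN.

```python
def decode_segment0_sketch(body: bytes) -> list:
    # Hypothetical helper: assumes a 7-byte preamble and only 10/20/00 tags.
    t0 = int.from_bytes(body[3:5], "big", signed=True)
    t1 = int.from_bytes(body[5:7], "big", signed=True)
    out, cur, i = [t0, t1], t1, 7
    while i + 1 < len(body):
        tag, nn = body[i], body[i + 1]
        if tag == 0x10:
            # NN 4-bit signed deltas packed into NN/2 bytes, high nibble first.
            payload = body[i + 2 : i + 2 + nn // 2]
            for byte in payload:
                for nib in (byte >> 4, byte & 0xF):
                    cur += nib if nib < 8 else nib - 16
                    out.append(cur)
        elif tag == 0x20:
            # NN int8 signed deltas, one per byte.
            payload = body[i + 2 : i + 2 + nn]
            for byte in payload:
                cur += byte if byte < 128 else byte - 256
                out.append(cur)
        elif tag == 0x00:
            # RLE: NN zero deltas, i.e. NN copies of the current value.
            payload = b""
            out.extend([cur] * nn)
        else:
            # 40 02, 30 NN or unknown tag: stop in this sketch.
            break
        i += 2 + len(payload)
    return out


# Synthetic body: magic, Tran[0]=1, Tran[1]=2, then 10 04, 00 04, 20 04 blocks.
body = bytes.fromhex("000200" "0001" "0002" "1004" "1f20" "0004" "2004" "05fb0001")
print(decode_segment0_sketch(body))
# [1, 2, 3, 2, 4, 4, 4, 4, 4, 4, 9, 4, 4, 5]
```

The four repeated 4s in the middle of the output come from the 00 04 block: the cumulative value is held while four samples are emitted.
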
353 lines
14 KiB
Python
"""
waveform_codec.py: block-walker for the MiniMate Plus waveform body codec.

PARTIAL REVERSE-ENGINEERING, 2026-05-08.

Status: STRUCTURAL FRAMING confirmed; per-block sample interpretation OPEN.

This module replaces the int16-LE assumption that produced full-scale ±32K
noise on every event. The body is NOT raw int16 LE: it is a sequence of
tagged variable-length blocks. The block framing is solved here. The
mapping from block bytes to ADC samples is **NOT yet pinned down**: the
work-in-progress decoder ``decode_waveform_v2`` returns ``None`` until
a verified algorithm is wired in.

Until ``decode_waveform_v2`` returns a verified result, callers that need
sample data should keep relying on the legacy decoder in ``client.py``
(known-broken, but at least stable in shape) and not consume this
module's sample output.

────────────────────────────────────────────────────────────────────────────
Body structure (CONFIRMED 2026-05-08 against decode-re/5-8-26 4-event bundle)
────────────────────────────────────────────────────────────────────────────

The Blastware waveform-file body starts at byte offset 43 (= 22 + 21) and
runs up to the 26-byte file footer (``[43:-26]``). Layout:

    [preamble: 7 or 9 bytes]
    [data section: a stream of tagged blocks]
    [trailer: per-channel summary blocks]

The preamble starts with the magic ``00 02 00 00``. After that there are
either 3 or 5 bytes of header before the first ``10 NN`` block tag: in
the 4-event bundle, single-shot events have a 7-byte preamble and
continuous events have 9. The exact meaning of bytes [4:9] is open
(empirically: byte [4] for event-a == truth Tran[0]; byte [4] for
event-b == truth Tran[0]; events c/d = 0; treating it as a per-channel
"initial value" partially matches but is inconsistent across events).

Blocks have 2-byte tags and these confirmed lengths:

| Tag (hex) | Block type                           | Total length    |
|-----------|--------------------------------------|-----------------|
| ``10 NN`` | Small-delta data block               | NN/2 + 2 bytes  |
| ``20 NN`` | Literal data block (looks int8-ish)  | NN + 2 bytes    |
| ``00 NN`` | Zero-delta run marker (RLE)          | 2 bytes         |
| ``30 NN`` | Trailer summary block                | NN × 4 bytes    |
| ``40 02`` | Segment header                       | 20 bytes        |

``30 NN`` blocks that appear inside the data section are NN × 2 bytes long
instead (see ``walk_body``).

In the 4-event bundle, every event's body parses as a clean sequence of
these blocks all the way through the trailer (when the walker is given
the right preamble length). No "??" stops occur once the start offset
is correct.

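As a worked example of the length table above, here is a minimal sketch.
``block_total_length`` is a hypothetical helper, not part of this module,
and ``30 NN`` is left out because its length depends on where the block
appears.

```python
from typing import Optional


def block_total_length(tag_hi: int, tag_lo: int) -> Optional[int]:
    # Total on-wire length (2-byte tag included) for the tags in the table.
    # Hypothetical helper for illustration; ``30 NN`` is deliberately omitted.
    if tag_hi == 0x10:
        return tag_lo // 2 + 2   # NN nibbles -> NN/2 payload bytes
    if tag_hi == 0x20:
        return tag_lo + 2        # NN payload bytes
    if tag_hi == 0x00:
        return 2                 # tag only, no payload
    if (tag_hi, tag_lo) == (0x40, 0x02):
        return 20                # 2-byte tag + 18-byte payload
    return None


print(block_total_length(0x10, 0x08))  # 6
print(block_total_length(0x20, 0x08))  # 10
```
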
Segments and the ``40 02`` header
────────────────────────────────────

The body is divided into ~16 SEGMENTS, each separated by a ``40 02``
header. Each segment carries ~80 sample-sets (1280-sample event = 16
segments × 80 sample-sets, 3328-sample event = ~42 segments). The 18-byte
``40 02`` payload contains:

    bytes  0..3    4-byte channel anchor / state (varies per segment)
    bytes  4..7    4-byte field, varies (RMS/peak per channel?)
    bytes  8..11   4-byte uint32 LE counter (increments by 1 per segment;
                   starts at e.g. 0x47 for the first in-data segment)
    bytes 12..15   4-byte fixed pattern: 02 00 00 01
    bytes 16..17   2-byte segment-relative payload counter

The counter at bytes [8..11] increments cleanly across segments, which is
useful as a sanity check. The role of bytes [0..3] (anchor candidates) and
[4..7] is not pinned down: simple "channel state at segment boundary"
hypotheses do NOT match truth across all four sample bundles tested.

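The counter sanity check described above could look roughly like this.
``make_payload`` and ``counters_are_consecutive`` are hypothetical names,
and the payloads are synthetic (anchor, second field and tail zeroed), not
bytes from a fixture.

```python
from typing import List


def make_payload(counter: int) -> bytes:
    # Synthetic 18-byte ``40 02`` payload: 8 zero bytes, uint32 LE counter,
    # the fixed pattern 02 00 00 01, and a zeroed 2-byte tail.
    return bytes(8) + counter.to_bytes(4, "little") + bytes.fromhex("02000001") + bytes(2)


def counters_are_consecutive(payloads: List[bytes]) -> bool:
    # True when the uint32 LE counter at bytes [8:12] rises by 1 per segment.
    counters = [int.from_bytes(p[8:12], "little") for p in payloads]
    return all(b - a == 1 for a, b in zip(counters, counters[1:]))


print(counters_are_consecutive([make_payload(n) for n in (0x47, 0x48, 0x49)]))  # True
print(counters_are_consecutive([make_payload(n) for n in (0x47, 0x49)]))        # False
```
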
What's open
────────────

The mapping ``block bytes → ADC samples`` is the open question. Tested
hypotheses that did **not** match BW's ASCII export to within the
required ±1 ADC count:

  1. ``10 NN`` data = 4-bit signed nibble deltas, channel-interleaved
     (TVLM/VTLM/LMTV/all 24 permutations × 2 nibble orders × 2 sign
     conventions = 96 combinations tested). All produce values that
     diverge from truth after the first ~7 sample-sets.

  2. ``20 NN`` data = int8 absolute samples for one channel. Magnitudes
     in observed blocks (peak ~±34 in the smoothest event-c block at
     offset 351) do not match any channel's PPV at any plausible
     ADC-count quantization (1-count, 4-count, 8-count, 16-count).

  3. ``00 NN`` marker = "skip N sample-sets". Sums of NN/4 across markers
     do not match 80 sample-sets per segment.

  4. Concatenating ALL ``10 NN`` payload bytes and reading as a continuous
     nibble stream (TVLM round-robin) produces the same 96-combination
     problem as (1).

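For anyone re-running the search in (1), the 96-combination space can be
enumerated as below. This is only a sketch of the bookkeeping: the names
are hypothetical, and ``sign_a`` / ``sign_b`` are placeholders because the
two sign conventions are not named in these notes.

```python
from itertools import permutations, product

CHANNELS = ("T", "V", "L", "M")            # Tran, Vert, Long, MicL
NIBBLE_ORDERS = ("high-first", "low-first")
SIGN_CONVENTIONS = ("sign_a", "sign_b")    # placeholders, not the real names

# 24 channel orders x 2 nibble orders x 2 sign conventions.
hypotheses = list(product(permutations(CHANNELS), NIBBLE_ORDERS, SIGN_CONVENTIONS))
print(len(hypotheses))  # 96
```
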
The most promising lead, that ``20 NN`` blocks carry literal int8
sample-sequences for the largest-amplitude channel within a segment,
is consistent with the smooth waveform shape of those payloads, but
the magnitude scaling has not been pinned down. It's possible that
``10 NN`` and ``20 NN`` blocks carry different bit-widths of the same
channel-interleaved delta stream (variable-width like Rice coding)
with 4-bit deltas as default and 8-bit deltas as escape.

Potential next steps for whoever picks this up:

  - Capture an event with a KNOWN external waveform (e.g. a calibration
    signal of known frequency/amplitude) so that both the truth and the
    magnitude scaling are unambiguous.
  - Capture multiple events with the SAME signal but DIFFERENT geo_range
    (Normal 10 in/s vs Sensitive 1.25 in/s) to disambiguate scaling.
  - Examine sequential 0x10 segment headers for a single event: the
    4-byte "anchor" should reflect cumulative sample state at the
    boundary; matching it to truth at that sample index would unlock
    the per-segment delta decode.
"""

from __future__ import annotations

from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class WaveformBlock:
    """One tagged block parsed out of a Blastware waveform-file body."""
    offset: int          # byte offset into body
    tag_hi: int          # first tag byte (0x10 / 0x20 / 0x00 / 0x30 / 0x40)
    tag_lo: int          # second tag byte (NN)
    data: bytes          # block payload (excludes the 2-byte tag)
    length: int          # total block length on the wire (includes the tag)

    @property
    def kind(self) -> str:
        return f"{self.tag_hi:02x} {self.tag_lo:02x}"


def find_data_start(body: bytes) -> int:
    """Auto-detect the offset of the first data block.

    The body starts with a 7-byte preamble (magic ``00 02 00`` + two int16 BE
    Tran anchors). After that, the data section starts with a tag, usually
    ``10 NN`` or ``20 NN``, but quiet events may begin with a ``00 NN`` RLE
    marker. We return the offset of the first recognized tag, or -1 if none
    is found.
    """
    # Try fixed offset 7 first (canonical preamble length).
    if len(body) >= 9:
        b, nn = body[7], body[8]
        if (b in (0x00, 0x10, 0x20, 0x30) and nn % 4 == 0 and 0 < nn <= 0xFC) \
                or (b == 0x40 and nn == 0x02):
            return 7
    # Fall back to scanning the first 20 bytes.
    for i in range(min(20, len(body) - 1)):
        b = body[i]
        nn = body[i + 1]
        if b in (0x10, 0x20) and nn % 4 == 0 and 0 < nn <= 0xFC:
            return i
    return -1


def walk_body(body: bytes, start: Optional[int] = None) -> List[WaveformBlock]:
    """Walk the tagged-block sequence starting at *start* (auto-detected by default).

    Stops when an unrecognized tag is encountered or end of body is reached.
    Returned blocks are in stream order.
    """
    if start is None:
        start = find_data_start(body)
    if start < 0:
        return []

    blocks: List[WaveformBlock] = []
    i = start
    while i + 1 < len(body):
        t0 = body[i]
        t1 = body[i + 1]
        if t0 == 0x10 and t1 % 4 == 0 and 0 < t1 <= 0xFC:
            length = t1 // 2 + 2
        elif t0 == 0x20 and t1 % 4 == 0 and 0 < t1 <= 0xFC:
            length = t1 + 2
        elif t0 == 0x00 and t1 % 4 == 0:
            length = 2
        elif t0 == 0x30 and t1 % 4 == 0 and 0 < t1 <= 0x10:
            # Data-section ``30 NN`` blocks have length NN*2 (= 8 for NN=4,
            # confirmed in M529LL1A.SS0 at body offset 29). Trailer-section
            # ``30 NN`` blocks have length NN*4 (= 32 for NN=8, confirmed in
            # event-d trailer at body offset 3941). We pick NN*2 if it lands
            # on a recognized tag, otherwise fall through to NN*4.
            cand2 = t1 * 2
            cand4 = t1 * 4
            if (i + cand2 < len(body) - 1
                    and body[i + cand2] in (0x10, 0x20, 0x00, 0x30, 0x40)):
                length = cand2
            else:
                length = cand4
        elif t0 == 0x40 and t1 == 0x02:
            length = 20
        else:
            # Unknown tag; stop. Caller can inspect ``i`` to see where.
            break

        if i + length > len(body):
            break

        data = bytes(body[i + 2 : i + length])
        blocks.append(WaveformBlock(offset=i, tag_hi=t0, tag_lo=t1, data=data, length=length))
        i += length

    return blocks


def split_segments(blocks: List[WaveformBlock]) -> List[List[WaveformBlock]]:
    """Group consecutive blocks into segments separated by ``40 02`` headers.

    The first segment is whatever runs before the first ``40 02`` header
    (typically the "segment 0" preamble data after the body preamble).
    Subsequent segments start with a ``40 02`` block, then have their
    own data blocks until the next ``40 02``.
    """
    segments: List[List[WaveformBlock]] = []
    current: List[WaveformBlock] = []
    for b in blocks:
        if b.tag_hi == 0x40 and b.tag_lo == 0x02:
            if current:
                segments.append(current)
            current = [b]
        else:
            current.append(b)
    if current:
        segments.append(current)
    return segments


def parse_segment_header(block: WaveformBlock) -> Optional[dict]:
    """Decode the 18-byte payload of a ``40 02`` segment header.

    Returns a dict with the labelled fields, or None if *block* is not
    a ``40 02`` header.
    """
    if not (block.tag_hi == 0x40 and block.tag_lo == 0x02):
        return None
    if len(block.data) < 18:
        return None
    p = block.data
    counter = int.from_bytes(p[8:12], "little", signed=False)
    return {
        "anchor_bytes": p[0:4],      # 4-byte field, role unconfirmed
        "field2": p[4:8],            # 4-byte field, role unconfirmed
        "counter": counter,          # uint32 LE; increments by 1 per segment
        "fixed_pattern": p[12:16],   # always b"\x02\x00\x00\x01"
        "tail": p[16:18],            # last 2 bytes
    }


def _s4(n: int) -> int:
    """Sign-extend a 4-bit value to signed int (0..7 → 0..7; 8..F → -8..-1)."""
    return n if n < 8 else n - 16


def _i8(b: int) -> int:
    """Reinterpret an unsigned byte as signed int8."""
    return b if b < 128 else b - 256


def decode_tran_initial(body: bytes) -> Optional[List[int]]:
    """
    Decode the initial Tran-channel samples (VERIFIED 2026-05-11).

    Returns Tran samples in **16-count units** (LSB = 0.005 in/s at Normal
    range, the same quantization BW uses for its ASCII export). Returns
    ``None`` if the body cannot be parsed.

    The decoded list extends from sample 0 through the end of segment 0
    (= just before the first ``40 02`` segment header; ~510 sample-sets
    for the events tested). Multi-segment decoding requires continuing
    past the segment header; that is left to a future ``decode_tran_full``
    (not defined in this module yet) once the per-segment rules are pinned
    down for all signal types.

    Codec for segment 0 (CONFIRMED 2026-05-11 against 7 fixture events):

      - Body bytes [0:3] are the magic ``00 02 00``.
      - Body bytes [3:5] = ``Tran[0]`` as int16 BE in 16-count units.
      - Body bytes [5:7] = ``Tran[1]`` as int16 BE in 16-count units.
      - Data blocks (``10 NN`` or ``20 NN``) carry Tran deltas starting
        at sample 2:

          * ``10 NN``: NN nibbles = NN/2 bytes; each nibble is a 4-bit
            signed delta (0..7 → 0..+7; 8..F → -8..-1). High nibble of
            each byte comes first.
          * ``20 NN``: NN int8 signed deltas (one delta per byte).

      - ``00 NN`` blocks are run-length-encoded zero deltas: append NN
        copies of the current cumulative Tran value (no change).

      - ``30 NN`` blocks have not yet been decoded for content; they
        appear in segment 0 of loud-from-start events (SS0, SV0) and
        seem to signal a transition or special-case interpretation.
        The walker steps over them but their data is ignored, so samples
        after the first ``30 NN`` block are not verified (M529LL1A.SS0
        matches truth for only the first 42 of 502 sample-sets).

    The walk stops at the first ``40 02`` segment header.
    """
    if len(body) < 7 or body[0:3] != b"\x00\x02\x00":
        return None
    t0 = int.from_bytes(body[3:5], "big", signed=True)
    t1 = int.from_bytes(body[5:7], "big", signed=True)

    start = find_data_start(body)
    if start < 0:
        return [t0, t1]

    out = [t0, t1]
    cur = t1
    for blk in walk_body(body, start):
        if blk.tag_hi == 0x40:
            # Segment boundary: stop here (multi-segment decode is future work).
            break
        if blk.tag_hi == 0x10:
            # 4-bit signed deltas, high nibble first.
            for byte in blk.data:
                for nib in ((byte >> 4) & 0xF, byte & 0xF):
                    cur += _s4(nib)
                    out.append(cur)
        elif blk.tag_hi == 0x20:
            # int8 signed deltas, one per byte.
            for byte in blk.data:
                cur += _i8(byte)
                out.append(cur)
        elif blk.tag_hi == 0x00:
            # RLE zero deltas: append NN copies of current Tran value.
            for _ in range(blk.tag_lo):
                out.append(cur)
        # 30 NN: unknown content; skip.
    return out


def decode_waveform_v2(body: bytes) -> Optional[dict]:
    """
    Decode the body into per-channel sample arrays.

    Returns ``None`` because the full multi-channel decoder is not yet
    wired up. Tran is partially solved; see :func:`decode_tran_initial`
    for the initial portion (verified against ground-truth BW exports).

    Status (2026-05-11):
      - Tran is decoded by ``decode_tran_initial`` through the end of
        segment 0 (~510 sample-sets for the events tested). For
        M529LL1A.SS0 only the first 42 of 502 sample-sets match, up to
        the first ``30 04`` block.
      - Subsequent Tran samples + all Vert / Long / MicL samples: open.
        The block stream likely interleaves channels with ``30 NN``
        channel-switch markers, but the exact switching rule is still
        under investigation.
    """
    return None