codec-re: solve waveform body block framing; per-byte sample mapping still open
Decoded the structural framing of the Blastware waveform body: the bytes between the 21-byte STRT record and the 26-byte file footer. The body is a sequence of tagged variable-length blocks, NOT raw int16 LE. Five tag types (10/20/00/30/40 NN) and their lengths are now confirmed against the 4-event May 2026 fixture bundle. The body splits cleanly into ~16 segments (for a 1280-sample event) separated by 40 02 segment headers carrying a monotonically incrementing uint32 LE counter at payload bytes [8:12].

What's done:
- minimateplus/waveform_codec.py: block walker, segment splitter, segment header parser. decode_waveform_v2 is a stub returning None until the byte-to-sample mapping is solved; client.py is unchanged.
- tests/test_waveform_codec.py: 31 tests covering block detection, lengths, contiguous-walk, segment splitting, segment-header parsing, and counter monotonicity. All pass.
- tests/fixtures/decode-re-5-8-26/: bundled fixtures (4 events, BW binary + Blastware ASCII export each).
- docs/instantel_protocol_reference.md §7.6.1: replaced retraction box with the verified structural decoding plus an explicit list of what's still open.

What's still open: the per-byte mapping inside 10 NN / 20 NN blocks. 96 channel-permutation × nibble-order × sign-convention combinations were brute-force tested; none match BW's ASCII export to within ±1 ADC count. The codec is more elaborate than uniform 4-bit deltas, likely a hybrid variable-bit-width scheme with segment-anchor resync points.

Next recommended step: capture an event with a known calibration tone to pin down magnitude scaling.

Walker also bails out partway through event-b (open issue documented in both the module and the protocol reference).
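For readers picking this up, the 96-combination search mentioned above can be sketched roughly as follows. This is an illustrative reconstruction, not the harness used for this commit: the helper names (`nibbles`, `to_delta`, `decode_nibble_deltas`, `all_hypotheses`, `matches`) are hypothetical, the starting value of 0 per channel is an assumption, and so is the choice of the two sign conventions (two's complement vs. sign-magnitude), since the commit does not say which two were tried.

```python
from itertools import permutations, product
from typing import Dict, Iterator, List, Sequence, Tuple

CHANNELS = ("T", "V", "L", "M")  # Tran, Vert, Long, MicL

# (channel order, high nibble first?, two's complement?)
Hypothesis = Tuple[Tuple[str, ...], bool, bool]


def nibbles(payload: bytes, high_first: bool) -> Iterator[int]:
    """Yield the two 4-bit halves of each byte in the chosen order."""
    for byte in payload:
        hi, lo = byte >> 4, byte & 0x0F
        yield from ((hi, lo) if high_first else (lo, hi))


def to_delta(nibble: int, twos_complement: bool) -> int:
    """Interpret a nibble as a signed delta (assumed conventions)."""
    if twos_complement:
        return nibble - 16 if nibble >= 8 else nibble
    # Sign-magnitude: bit 3 is the sign, bits 0..2 the magnitude.
    magnitude = nibble & 0x07
    return -magnitude if nibble & 0x08 else magnitude


def decode_nibble_deltas(
    payload: bytes,
    order: Sequence[str],
    high_first: bool,
    twos_complement: bool,
) -> Dict[str, List[int]]:
    """Accumulate round-robin nibble deltas per channel, starting from 0."""
    state = {ch: 0 for ch in CHANNELS}
    out: Dict[str, List[int]] = {ch: [] for ch in CHANNELS}
    for idx, nib in enumerate(nibbles(payload, high_first)):
        ch = order[idx % len(order)]
        state[ch] += to_delta(nib, twos_complement)
        out[ch].append(state[ch])
    return out


def all_hypotheses() -> List[Hypothesis]:
    """24 channel permutations x 2 nibble orders x 2 sign conventions."""
    return list(product(permutations(CHANNELS), (True, False), (True, False)))


def matches(decoded: Sequence[int], truth: Sequence[int], tol: int = 1) -> bool:
    """True if the overlapping samples agree to within +/- tol ADC counts."""
    n = min(len(decoded), len(truth))
    return n > 0 and all(abs(d - t) <= tol for d, t in zip(decoded, truth))
```

A search loop would call `decode_nibble_deltas` once per entry of `all_hypotheses()` and compare each channel against the ASCII export with `matches`; per the commit message, no combination passed that check on the fixture bundle.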
@@ -0,0 +1,242 @@
"""
waveform_codec.py: block walker for the MiniMate Plus waveform body codec.

PARTIAL REVERSE-ENGINEERING, 2026-05-08.

Status: STRUCTURAL FRAMING confirmed; per-block sample interpretation OPEN.

This module is meant to replace the int16-LE assumption that produced
full-scale ±32K noise on every event. The body is NOT raw int16 LE: it is a
sequence of tagged variable-length blocks. The block framing is solved here.
The mapping from block bytes to ADC samples is **NOT yet pinned down**, so
the work-in-progress decoder ``decode_waveform_v2`` returns ``None`` until
a verified algorithm is wired in.

Until ``decode_waveform_v2`` returns a verified result, callers that need
sample data should keep relying on the legacy decoder in ``client.py``
(known-broken, but at least stable in shape) and not consume this
module's sample output.

────────────────────────────────────────────────────────────────────────────
Body structure (CONFIRMED 2026-05-08 against decode-re/5-8-26 4-event bundle)
────────────────────────────────────────────────────────────────────────────

The Blastware waveform-file body starts at file offset 43 (22 + 21) and ends
just before the 26-byte file footer (slice ``[43:-26]``). Layout:

    [preamble: 7 or 9 bytes]
    [data section: a stream of tagged blocks]
    [trailer: per-channel summary blocks]

The preamble starts with the magic ``00 02 00 00``. After that there are
either 3 or 5 bytes of header before the first ``10 NN`` block tag. In
the 4-event bundle, single-shot events have a 7-byte preamble and
continuous events have 9. The exact meaning of the bytes after the magic
([4:7] or [4:9]) is open. Empirically, byte [4] == truth Tran[0] for
event-a and event-b, and is 0 for events c/d; treating it as a per-channel
"initial value" partially matches but is inconsistent across events.

Blocks have 2-byte tags and these confirmed lengths (total length includes
the 2-byte tag):

| Tag (hex) | Block type                           | Total length    |
|-----------|--------------------------------------|-----------------|
| ``10 NN`` | Small-delta data block               | NN/2 + 2 bytes  |
| ``20 NN`` | Literal data block (looks int8-ish)  | NN + 2 bytes    |
| ``00 NN`` | 2-byte marker between data blocks    | 2 bytes         |
| ``30 NN`` | Trailer summary block                | NN × 4 bytes    |
| ``40 02`` | Segment header                       | 20 bytes        |

In the 4-event bundle, event bodies parse as a clean sequence of these
blocks all the way through the trailer when the walker is given the right
preamble length; no "??" stops occur once the start offset is correct.
The one known exception is event-b, where the walker currently stops
partway through (open issue).

Segments and the ``40 02`` header
────────────────────────────────────

The body is divided into SEGMENTS separated by ``40 02`` headers. Each
segment carries ~80 sample-sets (1280-sample event = 16 segments × 80
sample-sets; 3328-sample event = ~42 segments). The 18-byte ``40 02``
payload contains:

    bytes 0..3     4-byte channel anchor / state (varies per segment)
    bytes 4..7     4-byte field, varies (RMS/peak per channel?)
    bytes 8..11    4-byte uint32 LE counter (increments by 1 per segment;
                   starts at e.g. 0x47 for the first in-data segment)
    bytes 12..15   4-byte fixed pattern: 02 00 00 01
    bytes 16..17   2-byte segment-relative payload counter

The counter at bytes [8..11] increments cleanly across segments, which makes
it useful as a sanity check. The roles of bytes [0..3] (anchor candidates)
and [4..7] are not pinned down: simple "channel state at segment boundary"
hypotheses do NOT match truth across the four events tested.

What's open
────────────

The mapping ``block bytes → ADC samples`` is the open question. Tested
hypotheses that did **not** match BW's ASCII export to within the
required ±1 ADC count:

1. ``10 NN`` data = 4-bit signed nibble deltas, channel-interleaved
   (TVLM/VTLM/LMTV/all 24 permutations × 2 nibble orders × 2 sign
   conventions = 96 combinations tested). All produce values that
   diverge from truth after the first ~7 sample-sets.

2. ``20 NN`` data = int8 absolute samples for one channel. Magnitudes
   in observed blocks (peak ~±34 in the smoothest event-c block at
   offset 351) do not match any channel's PPV at any plausible
   ADC-count quantization (1-count, 4-count, 8-count, 16-count).

3. ``00 NN`` marker = "skip N sample-sets". Sums of NN/4 across markers
   do not match 80 sample-sets per segment.

4. Concatenating ALL ``10 NN`` payload bytes and reading them as a
   continuous nibble stream (TVLM round-robin) produces the same
   96-combination problem as (1).

The most promising lead is that ``20 NN`` blocks carry literal int8
sample-sequences for the largest-amplitude channel within a segment.
This is consistent with the smooth waveform shape of those payloads, but
the magnitude scaling has not been pinned down. It's possible that
``10 NN`` and ``20 NN`` blocks carry different bit-widths of the same
channel-interleaved delta stream (variable-width like Rice coding)
with 4-bit deltas as default and 8-bit deltas as escape.

Potential next steps for whoever picks this up:

  - Capture an event with a KNOWN external waveform (e.g. a calibration
    signal of known frequency/amplitude) so the truth is unambiguous and
    the magnitude scaling can be pinned down.
  - Capture multiple events with the SAME signal but DIFFERENT geo_range
    (Normal 10 in/s vs Sensitive 1.25 in/s) to disambiguate scaling.
  - Examine sequential ``40 02`` segment headers for a single event. The
    4-byte "anchor" should reflect cumulative sample state at the
    boundary; matching it to truth at that sample index would unlock
    the per-segment delta decode.
"""

from __future__ import annotations

from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WaveformBlock:
    """One tagged block parsed out of a Blastware waveform-file body."""
    offset: int           # byte offset into body
    tag_hi: int           # first tag byte (0x10 / 0x20 / 0x00 / 0x30 / 0x40)
    tag_lo: int           # second tag byte (NN)
    data: bytes           # block payload (excludes the 2-byte tag)
    length: int           # total block length on the wire (includes the tag)

    @property
    def kind(self) -> str:
        return f"{self.tag_hi:02x} {self.tag_lo:02x}"


def find_data_start(body: bytes) -> int:
    """Auto-detect the offset of the first ``10 NN`` block, or -1 if none is found."""
    for i in range(min(20, len(body) - 1)):
        if body[i] == 0x10 and body[i + 1] % 4 == 0 and 0 < body[i + 1] <= 0xFC:
            return i
    return -1


def walk_body(body: bytes, start: Optional[int] = None) -> List[WaveformBlock]:
    """Walk the tagged-block sequence starting at *start* (auto-detected by default).

    Stops when an unrecognized tag is encountered or end of body is reached.
    Returned blocks are in stream order.
    """
    if start is None:
        start = find_data_start(body)
        if start < 0:
            return []

    blocks: List[WaveformBlock] = []
    i = start
    while i + 1 < len(body):
        t0 = body[i]
        t1 = body[i + 1]
        if t0 == 0x10 and t1 % 4 == 0 and 0 < t1 <= 0xFC:
            length = t1 // 2 + 2
        elif t0 == 0x20 and t1 % 4 == 0 and 0 < t1 <= 0xFC:
            length = t1 + 2
        elif t0 == 0x00 and t1 % 4 == 0:
            length = 2
        elif t0 == 0x30 and t1 % 4 == 0 and 0 < t1 <= 0x10:
            length = t1 * 4
        elif t0 == 0x40 and t1 == 0x02:
            length = 20
        else:
            # Unknown tag; stop. The stop offset is not returned, but the
            # caller can recover it as ``blocks[-1].offset + blocks[-1].length``.
            break

        if i + length > len(body):
            # Truncated block; stop rather than return a short payload.
            break

        data = bytes(body[i + 2 : i + length])
        blocks.append(WaveformBlock(offset=i, tag_hi=t0, tag_lo=t1, data=data, length=length))
        i += length

    return blocks


def split_segments(blocks: List[WaveformBlock]) -> List[List[WaveformBlock]]:
    """Group consecutive blocks into segments separated by ``40 02`` headers.

    The first segment is whatever runs before the first ``40 02`` header
    (typically the "segment 0" preamble data after the body preamble).
    Subsequent segments start with a ``40 02`` block, then have their
    own data blocks until the next ``40 02``.
    """
    segments: List[List[WaveformBlock]] = []
    current: List[WaveformBlock] = []
    for b in blocks:
        if b.tag_hi == 0x40 and b.tag_lo == 0x02:
            if current:
                segments.append(current)
            current = [b]
        else:
            current.append(b)
    if current:
        segments.append(current)
    return segments


def parse_segment_header(block: WaveformBlock) -> Optional[dict]:
    """Decode the 18-byte payload of a ``40 02`` segment header.

    Returns a dict with the labelled fields, or None if *block* is not
    a ``40 02`` header or its payload is shorter than 18 bytes.
    """
    if not (block.tag_hi == 0x40 and block.tag_lo == 0x02):
        return None
    if len(block.data) < 18:
        return None
    p = block.data
    counter = int.from_bytes(p[8:12], "little", signed=False)
    return {
        "anchor_bytes": p[0:4],       # 4-byte field, role unconfirmed
        "field2": p[4:8],             # 4-byte field, role unconfirmed
        "counter": counter,           # uint32 LE; increments by 1 per segment
        "fixed_pattern": p[12:16],    # always b"\x02\x00\x00\x01"
        "tail": p[16:18],             # last 2 bytes
    }


def decode_waveform_v2(body: bytes) -> Optional[dict]:
    """
    Decode the body into per-channel sample arrays.

    Returns a dict ``{"Tran": [...], "Vert": [...], "Long": [...], "MicL": [...]}``
    when a verified decoder is wired up; returns ``None`` otherwise.

    Currently returns ``None`` because the byte-to-sample mapping is OPEN.
    The block framing in :func:`walk_body` is verified, so callers can use
    that to inspect block-level structure without claiming the per-byte
    interpretation.
    """
    return None
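Since `decode_waveform_v2` is still a stub, the block framing is the part that can be exercised today. The self-contained sketch below mirrors the tag table on a synthetic body and checks the segment counter. The bytes are made up for illustration and are not taken from the fixtures, and `block_length`, `walk`, and `segment_header` are hypothetical stand-ins rather than the module's API.

```python
import struct
from typing import List, Optional, Tuple

Block = Tuple[int, int, int, bytes]  # (offset, tag_hi, tag_lo, payload)


def block_length(t0: int, t1: int) -> Optional[int]:
    """Total on-wire length (tag included) per the tag table, or None."""
    if t0 == 0x10 and t1 % 4 == 0 and 0 < t1 <= 0xFC:
        return t1 // 2 + 2
    if t0 == 0x20 and t1 % 4 == 0 and 0 < t1 <= 0xFC:
        return t1 + 2
    if t0 == 0x00 and t1 % 4 == 0:
        return 2
    if t0 == 0x30 and t1 % 4 == 0 and 0 < t1 <= 0x10:
        return t1 * 4
    if t0 == 0x40 and t1 == 0x02:
        return 20
    return None


def walk(body: bytes, start: int) -> Tuple[List[Block], int]:
    """Return the parsed blocks and the offset where walking stopped."""
    blocks: List[Block] = []
    i = start
    while i + 1 < len(body):
        length = block_length(body[i], body[i + 1])
        if length is None or i + length > len(body):
            break  # unknown tag or truncated block
        blocks.append((i, body[i], body[i + 1], body[i + 2 : i + length]))
        i += length
    return blocks, i


def segment_header(counter: int) -> bytes:
    """Synthetic 40 02 header: only the counter and fixed pattern are set."""
    payload = bytes(8) + struct.pack("<I", counter) + b"\x02\x00\x00\x01" + bytes(2)
    return b"\x40\x02" + payload


# Synthetic body: 7-byte preamble, then data blocks and two segment headers.
body = (
    b"\x00\x02\x00\x00" + bytes(3)      # preamble (magic + 3 header bytes)
    + b"\x10\x04\x12\x34"               # 10 04: 4/2 + 2 = 4 bytes
    + b"\x00\x04"                       # 00 04: 2-byte marker
    + b"\x20\x04\x01\x02\x03\x04"       # 20 04: 4 + 2 = 6 bytes
    + segment_header(0x47)
    + b"\x10\x08\x11\x22\x33\x44"       # 10 08: 8/2 + 2 = 6 bytes
    + segment_header(0x48)
    + b"\x10\x04\x55\x66"
)

blocks, stop = walk(body, start=7)
counters = [
    struct.unpack_from("<I", payload, 8)[0]
    for _, t0, t1, payload in blocks
    if (t0, t1) == (0x40, 0x02)
]
# A clean walk ends exactly at the end of the body, and counters step by 1.
clean = stop == len(body)
monotonic = all(b - a == 1 for a, b in zip(counters, counters[1:]))
```

Returning the stop offset alongside the blocks is a deliberate difference from `walk_body`: it makes a premature stop, such as the one reported for event-b, visible without recomputing it from the last block.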