"""
waveform_codec.py: block-walker for the MiniMate Plus waveform body codec.

PARTIAL REVERSE-ENGINEERING, 2026-05-08.
Status: STRUCTURAL FRAMING confirmed; per-block sample interpretation OPEN.

This module replaces the int16-LE assumption that produced full-scale ±32K
noise on every event. The body is NOT raw int16 LE. It is a sequence of
tagged, variable-length blocks. The block framing is solved here. The
mapping from block bytes to ADC samples is **NOT yet pinned down**, so the
work-in-progress decoder ``decode_waveform_v2`` returns ``None`` until a
verified algorithm is wired in.

Until ``decode_waveform_v2`` returns a verified result, callers that need
sample data should keep relying on the legacy decoder in ``client.py``
(known-broken, but at least stable in shape) and should not consume this
module's sample output.

────────────────────────────────────────────────────────────────────────────
Body structure (CONFIRMED 2026-05-08 against decode-re/5-8-26 4-event bundle)
────────────────────────────────────────────────────────────────────────────

The Blastware waveform-file body runs from byte [22+21=43] up to the
26-byte file footer (``[: -26]``). Layout:

    [preamble: 7 or 9 bytes]
    [data section: a stream of tagged blocks]
    [trailer: per-channel summary blocks]

The preamble starts with the magic ``00 02 00 00``, followed by either 3
or 5 more header bytes before the first ``10 NN`` block tag. In the
4-event bundle, single-shot events have a 7-byte preamble and continuous
events have a 9-byte one. The meaning of bytes [4:9] is open. Empirically,
byte [4] equals truth Tran[0] for event-a and for event-b, and is 0 for
events c and d; treating it as a per-channel "initial value" partially
matches but is inconsistent across events.

Blocks have 2-byte tags and these confirmed lengths:

    | Tag (hex) | Block type                          | Total length   |
    |-----------|-------------------------------------|----------------|
    | ``10 NN`` | Small-delta data block              | NN/2 + 2 bytes |
    | ``20 NN`` | Literal data block (looks int8-ish) | NN + 2 bytes   |
    | ``00 NN`` | 2-byte marker between data blocks   | 2 bytes        |
    | ``30 NN`` | Trailer summary block               | NN × 4 bytes   |
    | ``40 02`` | Segment header                      | 20 bytes       |

For the ``10``, ``20``, ``00`` and ``30`` tags, :func:`walk_body` also
requires NN to be a multiple of 4. Data blocks (``10``/``20``) must have
0 < NN <= 0xFC, and trailer blocks (``30``) must have 0 < NN <= 0x10.

In the 4-event bundle, every event's body parses as a clean sequence of
these blocks all the way through the trailer, provided the walker is
given the right preamble length. Once the start offset is correct, no
unrecognized-tag ("??") stops occur.

Segments and the ``40 02`` header
────────────────────────────────────

The body is divided into SEGMENTS, each introduced by a ``40 02`` header.
Each segment carries ~80 sample-sets, so a 1280-sample event has 16
segments (16 × 80 = 1280) and a 3328-sample event has ~42
(3328 / 80 = 41.6).

The 18-byte ``40 02`` payload (the 20-byte block minus its 2-byte tag)
contains:

    bytes  0..3   4-byte channel anchor / state (varies per segment)
    bytes  4..7   4-byte field, varies (RMS/peak per channel?)
    bytes  8..11  uint32 LE counter (increments by 1 per segment; starts
                  at e.g. 0x47 for the first in-data segment)
    bytes 12..15  fixed pattern: 02 00 00 01
    bytes 16..17  2-byte segment-relative payload counter

The counter at bytes [8..11] increments cleanly across segments, which
makes it useful as a sanity check. The roles of bytes [0..3] (anchor
candidates) and [4..7] are not pinned down: simple "channel state at
segment boundary" hypotheses do NOT match truth across all four sample
bundles tested.

What's open
────────────

The mapping ``block bytes → ADC samples`` is the open question. Tested
hypotheses that did **not** match Blastware's ASCII export to within the
required ±1 ADC count:

1. ``10 NN`` data = 4-bit signed nibble deltas, channel-interleaved
   (TVLM, VTLM, LMTV, ...: all 24 channel orders × 2 nibble orders ×
   2 sign conventions = 96 combinations tested).
   All of them diverge from truth after the first ~7 sample-sets.
2. ``20 NN`` data = int8 absolute samples for one channel. Magnitudes in
   observed blocks (peak ~±34 in the smoothest event-c block, at offset
   351) do not match any channel's PPV at any plausible ADC-count
   quantization (1-count, 4-count, 8-count, 16-count).
3. ``00 NN`` marker = "skip N sample-sets". Summing NN/4 across the
   markers does not give 80 sample-sets per segment.
4. Concatenating ALL ``10 NN`` payload bytes and reading them as one
   continuous nibble stream (TVLM round-robin) runs into the same
   96-combination problem as hypothesis 1.

The most promising lead is that ``20 NN`` blocks carry literal int8
sample sequences for the largest-amplitude channel within a segment.
That is consistent with the smooth waveform shape of those payloads, but
the magnitude scaling has not been pinned down.

It's also possible that ``10 NN`` and ``20 NN`` blocks carry different
bit-widths of the same channel-interleaved delta stream (variable-width,
in the spirit of Rice coding), with 4-bit deltas as the default and 8-bit
deltas as the escape. The block lengths are at least compatible with this
reading: in both cases NN counts values, since a ``10 NN`` payload of
NN/2 bytes holds NN 4-bit values and a ``20 NN`` payload of NN bytes
holds NN 8-bit values.

Potential next steps for whoever picks this up:

- Capture an event with a KNOWN external waveform (e.g. a calibration
  signal of known frequency/amplitude) so that both the truth and the
  magnitude scaling are unambiguous.
- Capture multiple events with the SAME signal but DIFFERENT geo_range
  (Normal 10 in/s vs Sensitive 1.25 in/s) to disambiguate scaling.
- Examine sequential ``40 02`` segment headers for a single event. If the
  4-byte "anchor" reflects cumulative sample state at the segment
  boundary, matching it to truth at that sample index would unlock the
  per-segment delta decode.
"""

from __future__ import annotations

from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WaveformBlock:
    """One tagged block parsed out of a Blastware waveform-file body."""

    offset: int   # byte offset into body
    tag_hi: int   # first tag byte (0x10 / 0x20 / 0x00 / 0x30 / 0x40)
    tag_lo: int   # second tag byte (NN)
    data: bytes   # block payload (excludes the 2-byte tag)
    length: int   # total block length on the wire (includes the tag)

    @property
    def kind(self) -> str:
        return f"{self.tag_hi:02x} {self.tag_lo:02x}"


def find_data_start(body: bytes) -> int:
    """Auto-detect the offset of the first ``10 NN`` block.

    Only tag positions 0..19 are scanned (the preamble is 7 or 9 bytes).
    Returns -1 if no candidate is found. If a preamble byte happens to look
    like a valid ``10 NN`` tag, the scan stops there; pass an explicit
    *start* to :func:`walk_body` in that case.
    """
    for i in range(min(20, len(body) - 1)):
        if body[i] == 0x10 and body[i + 1] % 4 == 0 and 0 < body[i + 1] <= 0xFC:
            return i
    return -1


def walk_body(body: bytes, start: Optional[int] = None) -> List[WaveformBlock]:
    """Walk the tagged-block sequence starting at *start* (auto-detected by default).

    Stops when an unrecognized tag is encountered, when a block would run
    past the end of *body*, or when the end of *body* is reached. Returned
    blocks are in stream order. The stop offset is ``last.offset +
    last.length`` for the last returned block (or *start* if none were
    returned).
    """
    if start is None:
        start = find_data_start(body)
    if start < 0:
        return []

    blocks: List[WaveformBlock] = []
    i = start
    while i + 1 < len(body):
        t0 = body[i]
        t1 = body[i + 1]
        if t0 == 0x10 and t1 % 4 == 0 and 0 < t1 <= 0xFC:
            length = t1 // 2 + 2   # NN/2 payload bytes (room for NN 4-bit values)
        elif t0 == 0x20 and t1 % 4 == 0 and 0 < t1 <= 0xFC:
            length = t1 + 2        # NN payload bytes
        elif t0 == 0x00 and t1 % 4 == 0:
            length = 2             # bare marker, no payload
        elif t0 == 0x30 and t1 % 4 == 0 and 0 < t1 <= 0x10:
            length = t1 * 4        # trailer summary block
        elif t0 == 0x40 and t1 == 0x02:
            length = 20            # segment header: 2-byte tag + 18-byte payload
        else:
            # Unknown tag; stop. The caller can recover this offset from the
            # last returned block (offset + length).
            break
        if i + length > len(body):
            # Truncated block; stop rather than return a short payload.
            break
        data = bytes(body[i + 2 : i + length])
        blocks.append(WaveformBlock(offset=i, tag_hi=t0, tag_lo=t1, data=data, length=length))
        i += length
    return blocks


def split_segments(blocks: List[WaveformBlock]) -> List[List[WaveformBlock]]:
    """Group consecutive blocks into segments separated by ``40 02`` headers.

    The first segment is whatever runs before the first ``40 02`` header
    (typically the "segment 0" data that follows the body preamble). Each
    subsequent segment starts with a ``40 02`` block, followed by its own
    data blocks up to the next ``40 02``. Trailer (``30 NN``) blocks end up
    in the last segment.
    """
    segments: List[List[WaveformBlock]] = []
    current: List[WaveformBlock] = []
    for b in blocks:
        if b.tag_hi == 0x40 and b.tag_lo == 0x02:
            # A segment header closes the running segment and opens a new one.
            if current:
                segments.append(current)
            current = [b]
        else:
            current.append(b)
    if current:
        segments.append(current)
    return segments


def parse_segment_header(block: WaveformBlock) -> Optional[dict]:
    """Decode the 18-byte payload of a ``40 02`` segment header.

    Returns a dict with the labelled fields, or None if *block* is not a
    ``40 02`` header or its payload is shorter than 18 bytes.
    """
    if not (block.tag_hi == 0x40 and block.tag_lo == 0x02):
        return None
    if len(block.data) < 18:
        return None
    p = block.data
    counter = int.from_bytes(p[8:12], "little", signed=False)
    return {
        "anchor_bytes": p[0:4],     # 4-byte field, role unconfirmed
        "field2": p[4:8],           # 4-byte field, role unconfirmed
        "counter": counter,         # uint32 LE; increments by 1 per segment
        "fixed_pattern": p[12:16],  # always b"\x02\x00\x00\x01"
        "tail": p[16:18],           # last 2 bytes (segment-relative payload counter?)
    }


def decode_waveform_v2(body: bytes) -> Optional[dict]:
    """
    Decode the body into per-channel sample arrays.

    Returns a dict ``{"Tran": [...], "Vert": [...], "Long": [...], "MicL": [...]}``
    when a verified decoder is wired up; returns ``None`` otherwise.

    Currently returns ``None`` because the byte-to-sample mapping is OPEN.
    The block framing in :func:`walk_body` is verified, so callers can use
    it to inspect block-level structure without relying on any per-byte
    interpretation.
    """
    return None
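

# Sketch: segment-counter continuity check (hypothetical helper).
# The module docstring notes that the uint32 LE counter at bytes [8..11] of
# each ``40 02`` payload increments by 1 per segment and is useful as a
# sanity check. This sketch ASSUMES it is handed the raw 18-byte payloads
# (``WaveformBlock.data`` of each header, in stream order) and reports the
# positions where the +1 step breaks; an empty list means the counters are
# contiguous. It reads the bytes directly instead of calling
# ``parse_segment_header`` so it can be tried in isolation.
def find_counter_breaks(payloads: list[bytes]) -> list[int]:
    """Return each index i where counter[i] != counter[i - 1] + 1.

    Hypothetical helper; assumes every payload is at least 12 bytes long.
    """
    counters = [int.from_bytes(p[8:12], "little", signed=False) for p in payloads]
    return [i for i in range(1, len(counters)) if counters[i] != counters[i - 1] + 1]


# Usage (synthetic payloads, not captured data):
#   hdr = lambda n: bytes(8) + n.to_bytes(4, "little") + b"\x02\x00\x00\x01" + bytes(2)
#   find_counter_breaks([hdr(0x47), hdr(0x48), hdr(0x4A)])  # -> [2]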
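

# Sketch: the hypothesis-1 search space (hypothetical helpers).
# Hypothesis 1 in the module docstring treats ``10 NN`` payloads as
# channel-interleaved 4-bit signed deltas and tests 24 channel orders x 2
# nibble orders x 2 sign conventions = 96 combinations. The exact
# conventions used in that test are not recorded here, so this sketch
# ASSUMES "sign convention" means two's complement (-8..7) vs sign-magnitude
# (bit 3 = sign), and that every channel's running sum starts at 0. It is a
# harness for re-running the search, not a decoder: every one of these
# combinations was reported NOT to match truth.
def nibble_delta_configs() -> list[tuple[tuple[str, ...], bool, bool]]:
    """All (channel_order, low_nibble_first, twos_complement) combinations."""
    from itertools import permutations, product

    orders = list(permutations(("Tran", "Vert", "Long", "MicL")))
    return [(o, low, twos) for o, low, twos in product(orders, (True, False), (True, False))]


def decode_nibble_deltas(
    payload: bytes,
    order: tuple[str, ...],
    low_nibble_first: bool,
    twos_complement: bool,
) -> dict[str, list[int]]:
    """Integrate round-robin 4-bit deltas into per-channel running sums."""
    nibbles: list[int] = []
    for byte in payload:
        hi, lo = byte >> 4, byte & 0x0F
        nibbles.extend((lo, hi) if low_nibble_first else (hi, lo))

    out: dict[str, list[int]] = {ch: [] for ch in order}
    state = {ch: 0 for ch in order}
    for k, n in enumerate(nibbles):
        if twos_complement:
            delta = n - 16 if n & 0x8 else n        # 0x8..0xF -> -8..-1
        else:
            delta = -(n & 0x7) if n & 0x8 else n    # bit 3 is the sign
        ch = order[k % len(order)]                  # round-robin channel assignment
        state[ch] += delta
        out[ch].append(state[ch])
    return out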
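

# Sketch: hypothesis-2 view of a ``20 NN`` payload (hypothetical helper).
# It ASSUMES the payload is a run of signed int8 samples, which is exactly
# the reading the module docstring reports as unconfirmed (peaks around ±34
# that do not match any channel's PPV at 1/4/8/16-count quantization). It is
# meant for eyeballing shape and peak magnitude, not for producing
# calibrated samples.
def int8_view(payload: bytes) -> tuple[list[int], int]:
    """Return (samples, peak_abs), reading *payload* as signed int8 values."""
    samples = [b - 256 if b >= 0x80 else b for b in payload]
    peak = max((abs(s) for s in samples), default=0)
    return samples, peak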