codec-re: solve waveform body block framing; per-byte sample mapping still open

Decoded the structural framing of the Blastware waveform body: the bytes between the 21-byte STRT record and the 26-byte file footer. The body is a sequence of tagged variable-length blocks, NOT raw int16 LE. Five tag types (10/20/00/30/40 NN) and their lengths are now confirmed against the 4-event May 2026 fixture bundle. Body splits cleanly into ~16 segments (for a 1280-sample event) separated by 40 02 segment headers carrying a monotonically incrementing uint32 LE counter at bytes [8:12].

What's done:

- minimateplus/waveform_codec.py: block walker, segment splitter, segment header parser. decode_waveform_v2 is a stub returning None until the byte-to-sample mapping is solved; client.py is unchanged.
- tests/test_waveform_codec.py: 31 tests covering block detection, lengths, contiguous-walk, segment splitting, segment-header parsing, and counter monotonicity. All pass.
- tests/fixtures/decode-re-5-8-26/: bundled fixtures (4 events, BW binary + Blastware ASCII export each).
- docs/instantel_protocol_reference.md §7.6.1: replaced retraction box with the verified structural decoding plus an explicit list of what's still open.

What's still open: the per-byte mapping inside 10 NN / 20 NN blocks. 96 channel-permutation × nibble-order × sign-convention combinations were brute-force tested; none match BW's ASCII export to within ±1 ADC count. The codec is more elaborate than uniform 4-bit deltas, likely a hybrid variable-bit-width scheme with segment-anchor resync points.

Next recommended step: capture an event with a known calibration tone to pin down magnitude scaling.

Walker also bails out partway through event-b (open issue documented in both the module and the protocol reference).
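
For readers who want to see the segment-header counter check in concrete form, here is a minimal sketch. It is not the code in minimateplus/waveform_codec.py: the names segment_counter, counters_strictly_increase and fake_header are hypothetical, the [8:12] offset is assumed to be relative to the start of the 40 02 header, and the demo headers are synthetic rather than taken from the fixtures.

```python
import struct

# Assumption: the counter offset is measured from the first byte of the
# 40 02 segment header, as the commit message's "bytes [8:12]" suggests.
COUNTER_OFFSET = 8


def segment_counter(header: bytes) -> int:
    """Read the uint32 little-endian counter stored at header[8:12]."""
    if len(header) < COUNTER_OFFSET + 4:
        raise ValueError("segment header too short to hold a counter")
    return struct.unpack_from("<I", header, COUNTER_OFFSET)[0]


def counters_strictly_increase(headers) -> bool:
    """Return True when every counter is greater than the one before it."""
    counters = [segment_counter(h) for h in headers]
    return all(a < b for a, b in zip(counters, counters[1:]))


def fake_header(counter: int) -> bytes:
    """Build a synthetic 12-byte header: tag 40 02, six filler bytes, counter."""
    return bytes([0x40, 0x02]) + bytes(6) + struct.pack("<I", counter)


demo = [fake_header(n) for n in (1, 2, 3)]
print(counters_strictly_increase(demo))
```

The sketch only checks strict increase and does not assume a particular step between consecutive counters, since the step size is not stated here.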
2026-05-08 20:44:37 +00:00
parent 7bd0f8badf
commit d3f77d1d96
29 changed files with 10102 additions and 105 deletions
@@ -0,0 +1,73 @@
+"""Try decoding body as 4-bit signed nibble deltas, 4-channel round-robin."""
+import sys
+sys.path.insert(0, ".")
+from analysis.load_bundle import load_bundle
+
+
+CHANNELS = ("Tran", "Vert", "Long", "MicL")
+
+
+def s4(n):
+    """Sign-extend a 4-bit unsigned to int (0..7 → 0..7, 8..F → -8..-1)."""
+    return n if n < 8 else n - 16
+
+
+def decode_nibbles(body: bytes, skip_bytes: int = 7, n_channels: int = 4):
+    """Read body as 2 nibbles per byte; accumulate as deltas for n_channels round-robin."""
+    out = [[] for _ in range(n_channels)]
+    cur = [0] * n_channels
+    ch = 0
+    nibbles = []
+    for byte in body[skip_bytes:]:
+        nibbles.append((byte >> 4) & 0xF)
+        nibbles.append(byte & 0xF)
+    for n in nibbles:
+        cur[ch] += s4(n)
+        out[ch].append(cur[ch])
+        ch = (ch + 1) % n_channels
+    return out
+
+
+def cmp_to_truth(pred, truth, scale=16):
+    """Compare predicted ints to truth, both in 16-count units (truth = txt * 200); `scale` is currently unused.
+    Return (max_abs_err, mean_abs_err, n_compared), or None if there is nothing to compare.
+    """
+    n = min(len(pred), len(truth))
+    errs = []
+    for i in range(n):
+        p = pred[i]
+        t = truth[i]
+        errs.append(abs(p - t))
+    if not errs:
+        return None
+    return (max(errs), sum(errs) / len(errs), n)
+
+
+def main():
+    for name in ("event-a", "event-c"):
+        b = load_bundle(name)
+        # Convert TXT samples (in/s) to 16-count units: one TXT step is 0.005 in/s, so unit = txt * 200.
+        # Note: 0.005 in/s is only roughly 16 ADC counts (1 count ≈ 0.000305 in/s, so 16 counts ≈ 0.00488 in/s).
+        # In 1-count units the conversion would be count = txt / 0.0003052 ≈ txt * 3277.
+        # The TXT export only has 0.005 in/s resolution, so compare in 16-count units (txt * 200).
+        truth_in_16 = {ch: [round(v * 200) for v in b.samples[ch]] for ch in CHANNELS[:3]}
+        # MicL is in dB, skip for now
+
+        # Try decoder with skip_bytes = 7
+        decoded = decode_nibbles(b.body, skip_bytes=7, n_channels=4)
+        print(f"\n=== {name} ===")
+        print(f"  body={len(b.body)}, nibbles={2*(len(b.body)-7)}, samples_per_ch={len(decoded[0])}")
+        print(f"  truth samples per ch: {len(truth_in_16['Tran'])}")
+        # Print first 24 of each
+        for i, chan in enumerate(CHANNELS):
+            pred_first = decoded[i][:24]
+            if chan in truth_in_16:
+                truth_first = truth_in_16[chan][:24]
+                print(f"  {chan} pred: {pred_first}")
+                print(f"  {chan} truth: {truth_first}")
+            else:
+                print(f"  {chan} pred: {pred_first}  (truth in dB, skipped)")
+
+
+if __name__ == "__main__":
+    main()