codec-re: channel rotation CONFIRMED — full multi-channel decoder works

The segment-channel scoring analyzer (from scratch/next_experiment_skeleton.py)
ran and immediately confirmed the rotation hypothesis:

  SP0 seg 0: best fit Vert 508/508 ✓
  SP0 seg 1: best fit Long 508/508 ✓
  SP0 seg 3: best fit Tran 508/508 ✓ (Tran continuation)
  SP0 seg 5: best fit Long 508/508 ✓
  SP0 seg 9: best fit Long 508/508 ✓
  V70 seg 0: best fit Vert 508/508 ✓
  V70 seg 1: best fit Long 508/508 ✓

Channels rotate Tran → Vert → Long → MicL per 40 02 segment header.

Also discovered the segment header has DOUBLE duty: bytes [14:18] anchor
the NEW segment's channel (2 samples as int16 BE in 16-count units), AND
bytes [0:4] extend the PREVIOUS channel by 2 more samples (2 deltas as
int16 BE). This is the same "2 anchors + delta stream" structure as the
body preamble for Tran.

decode_waveform_v2 now returns full per-channel sample dicts.

Byte-exact verified ranges:

  V70: Tran 512, Vert 512, Long 512 (all first segments)
  JQ0: Tran 512, Vert 258
  SP0: Long 1536 (all 3 L segments)

Still open: the 30 NN block format (high-amplitude packed deltas), which
appears mid-segment when single-byte deltas can't carry the magnitude.

6 new tests bring the count to 46. All passing.
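
The double-duty header described above can be sketched as a small parser. This is only an illustration under stated assumptions, not the code in minimateplus/waveform_codec.py: the names SegmentHeader, parse_segment_header and channel_after_header are hypothetical, the 18-byte payload length assumes the 20-byte header is the 2-byte 40 02 tag plus payload, and the demo bytes are synthetic rather than taken from a fixture. Field offsets follow the segment-header table in the working note.

```python
import struct
from typing import NamedTuple

ROTATION = ("Tran", "Vert", "Long", "MicL")


class SegmentHeader(NamedTuple):
    prev_delta_0: int   # [0:2]   previous channel, 1st extension delta (int16 BE)
    prev_delta_1: int   # [2:4]   previous channel, 2nd extension delta (int16 BE)
    unknown: bytes      # [4:6]   still open (possibly a checksum)
    length_field: int   # [6:8]   byte length to next segment header - 2 (uint16 BE)
    counter: int        # [8:12]  monotonic counter (uint32 LE)
    anchor_0: int       # [14:16] this channel, sample 0 (int16 BE, 16-count units)
    anchor_1: int       # [16:18] this channel, sample 1 (int16 BE, 16-count units)


def parse_segment_header(payload: bytes) -> SegmentHeader:
    """Parse the payload that follows a `40 02` tag (assumed to be 18 bytes)."""
    if len(payload) != 18:
        raise ValueError(f"expected 18 payload bytes, got {len(payload)}")
    if payload[12:14] != b"\x02\x00":
        raise ValueError("bytes [12:14] are not the constant 02 00")
    d0, d1 = struct.unpack_from(">hh", payload, 0)
    (length_field,) = struct.unpack_from(">H", payload, 6)
    (counter,) = struct.unpack_from("<I", payload, 8)
    a0, a1 = struct.unpack_from(">hh", payload, 14)
    return SegmentHeader(d0, d1, payload[4:6], length_field, counter, a0, a1)


def channel_after_header(segment_index: int) -> str:
    """Channel that starts at segment header k; the initial body is Tran."""
    return ROTATION[(segment_index + 1) % len(ROTATION)]


# Synthetic payload, for illustration only.
demo = (struct.pack(">hh", -3, 5) + b"\xaa\xbb" + struct.pack(">H", 510)
        + struct.pack("<I", 0x47) + b"\x02\x00" + struct.pack(">hh", 100, -100))
print(parse_segment_header(demo))
print([channel_after_header(k) for k in (0, 1, 3, 5, 9)])
```

The rotation helper agrees with the analyzer results listed above: headers 0, 1, 3, 5 and 9 map to Vert, Long, Tran, Long and Long.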
2026-05-12 03:57:38 +00:00
parent ae0e17b5dc
commit 07675626dc
6 changed files with 365 additions and 136 deletions
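
The "Next experiment" section of the working note in the diff below lists candidate packings for the six data bytes of SP0 segment 4 block 104. A minimal brute-force sketch over those candidates follows, using only the bytes and truth deltas quoted in that note. All function names here are hypothetical. The last candidate, a 2-byte header of four high nibbles followed by four low bytes, is an extra guess that the note does not list. It reproduces all four truth deltas for this one block, but that is a single data point: signedness, other NN values and other events are untested.

```python
from typing import Callable, Dict, List


def sign12(v: int) -> int:
    """Interpret a 12-bit field as two's-complement (an assumption)."""
    return v - 0x1000 if v & 0x800 else v


def packed12_be(data: bytes) -> List[int]:
    """Consecutive 12-bit fields, most significant bits first."""
    bits = int.from_bytes(data, "big")
    n = len(data) * 8 // 12
    return [sign12((bits >> (12 * (n - 1 - i))) & 0xFFF) for i in range(n)]


def packed12_le(data: bytes) -> List[int]:
    """Consecutive 12-bit fields taken from a little-endian integer."""
    bits = int.from_bytes(data, "little")
    n = len(data) * 8 // 12
    return [sign12((bits >> (12 * i)) & 0xFFF) for i in range(n)]


def int16_be(data: bytes) -> List[int]:
    """Plain int16 BE values; 6 bytes only hold 3 of them."""
    return [int.from_bytes(data[i:i + 2], "big", signed=True)
            for i in range(0, len(data) - 1, 2)]


def nibbles_then_bytes(data: bytes) -> List[int]:
    """Guess: high nibbles packed up front, then one low byte per delta."""
    n = len(data) * 2 // 3            # each delta costs 1.5 bytes
    head_len = (n + 1) // 2           # two high nibbles per header byte
    nibbles: List[int] = []
    for b in data[:head_len]:
        nibbles += [b >> 4, b & 0x0F]
    lows = data[head_len:head_len + n]
    return [sign12((hi << 8) | lo) for hi, lo in zip(nibbles, lows)]


CANDIDATES: Dict[str, Callable[[bytes], List[int]]] = {
    "4 x 12-bit BE": packed12_be,
    "4 x 12-bit LE": packed12_le,
    "3 x int16 BE": int16_be,
    "nibble header + low bytes": nibbles_then_bytes,
}

block = bytes.fromhex("01102f29803d")   # data bytes of the `30 04` block
truth = [47, 297, 384, 61]              # truth V deltas quoted in the note

for name, decode in CANDIDATES.items():
    values = decode(block)
    print(f"{name:28s} {values}  match={values == truth}")
```

A candidate that matches here would still need checking against every other `30 NN` block in the fixtures before it is wired into `decode_waveform_v2`.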
@@ -1,4 +1,4 @@
-# Waveform body codec — current working status (2026-05-11)
+# Waveform body codec — current working status (2026-05-11, late)

 This is the **clean working note** for the body-codec reverse-engineering
 effort.  It supersedes scattered claims elsewhere when they conflict.
@@ -9,10 +9,31 @@ authoritative implementation lives in `minimateplus/waveform_codec.py`.
 ## TL;DR

 The Blastware waveform-file body is a **tagged variable-length block
-stream**, NOT raw int16 LE samples.  Block framing is solved.  Tran
-channel segment-0 decoding is solved (byte-exact vs BW's ASCII export
-across all 5 high-amplitude fixture events).  Multi-segment continuation
-and the Vert / Long / MicL channel decoders are still open.
+stream**, NOT raw int16 LE samples.  Block framing is solved.  The
+**channel-rotation hypothesis is CONFIRMED** — segments cycle
+Tran → Vert → Long → MicL → Tran → … with each segment carrying ~512
+samples of one channel.  Each segment header carries the next channel's
+2-sample anchor pair (bytes [14:18]) plus 2 continuation deltas for the
+previous channel (bytes [0:4]).
+
+**What decodes byte-exact today (verified against BW ASCII export):**
+
+| Event | Channel | Samples verified |
+|---|---|---|
+| V70 (Mic-heavy) | Tran | 512 (1 segment) |
+| V70 | Vert | 512 |
+| V70 | Long | 512 |
+| JQ0 (Vert-heavy) | Tran | 512 |
+| JQ0 | Vert | 258 |
+| SP0 (loud all) | Long | **1536 (all 3 L segments)** |
+| SP0 | Tran | 1350 / 2044 produced |
+| SP0 | Vert | 650 / 1526 produced |
+
+**What's still open:** the `30 NN` block format.  These blocks appear in
+high-amplitude regions (deltas exceeding what int8 can express).  The
+decoder currently steps over them, which is fine for quiet stretches but
+breaks the running cumulative sum whenever a skipped `30 NN` carries
+deltas for samples we need.  Cracking this is the last major piece.

 **Production code in `minimateplus/client.py:_decode_a5_waveform` still
 uses the broken legacy int16 LE decoder.**  Sample arrays it writes to
@@ -69,78 +90,97 @@ Verified byte-exact:

 Implementation: `decode_tran_initial()`.

-### Segment header (`40 02`, 20 bytes total)
+### Segment header (`40 02`, 20 bytes total) — REWRITTEN 2026-05-11

 | Payload offset | Field | Status |
 |---|---|---|
-| [0:2] | T_delta at first sample of new segment (int16 BE) | ✅ confirmed |
-| [2:4] | Likely T_delta at sample seg_start+1 | 🟡 likely |
-| [4:6] | Unknown (possibly checksum) | ❓ open |
+| [0:2] | Previous-channel delta — 1st extension sample (int16 BE) | ✅ confirmed |
+| [2:4] | Previous-channel delta — 2nd extension sample (int16 BE) | ✅ confirmed |
+| [4:6] | Unknown (likely checksum) | ❓ open |
 | [6:8] | Byte length to next segment header − 2 (uint16 BE) | ✅ confirmed |
 | [8:12] | Monotonic uint32 LE counter (starts ~0x47) | ✅ confirmed |
 | [12:14] | Constant `02 00` | ✅ confirmed |
-| [14:18] | Unknown 4-byte field | ❓ open |
+| [14:16] | THIS segment's channel — sample 0 anchor (int16 BE, 16-count units) | ✅ confirmed |
+| [16:18] | THIS segment's channel — sample 1 anchor (int16 BE, 16-count units) | ✅ confirmed |

-## What's still open
+**Key insight (2026-05-11 late):** every segment carries 510 main
+samples (2 anchors + 508 deltas) PLUS 2 continuation samples that live
+in the NEXT segment header, so each channel-segment effectively spans
+512 samples of its channel.  The continuation lives in the next
+segment header because that header is also the channel-switch point:
+it is a natural place to "extend the channel we're leaving" before
+"starting the channel we're entering."

-1. **Multi-segment Tran continuation.**  After segment 0, applying
-   segment 1's blocks as Tran continuation diverges from truth by
-   sample ~512.  Block structure is identical to segment 0 and the
-   per-segment delta budget matches the segment size — but the per-
-   sample trajectory is wrong.
+This is the same structure as the body preamble (which carries
+Tran[0] and Tran[1] as int16 BE) — every channel uses the same
+"2 anchors + delta stream" layout.

-2. **Vert / Long / MicL channel decoders.**  No verified decoder for
-   any non-Tran channel.
-
-3. **`30 NN` block content.**  Only appears in loud-from-start events.
-   Probably a channel-switch or alternative-encoding marker for high-
-   amplitude regions.  Walker steps over it without decoding.
-
-## Strongest unverified hypothesis
-
-Segments rotate channels:
+## Channel rotation — VERIFIED 2026-05-11

 ```
-segment 0  →  Tran samples 0..509
-segment 1  →  Vert samples 0..507
-segment 2  →  Long samples 0..507
-segment 3  →  Mic  samples 0..507
-segment 4  →  Tran samples 510..N (continuation)
+(initial body)  →  Tran samples 0..509       (preamble + delta blocks)
+segment 0 hdr  ext+anchor →  Vert samples 0..511   ← anchor in hdr [14:18]
+segment 1 hdr  ext+anchor →  Long samples 0..511
+segment 2 hdr  ext+anchor →  Mic  samples 0..511
+segment 3 hdr  ext+anchor →  Tran samples 510..1021 (continuation)
+segment 4 hdr  ext+anchor →  Vert samples 512..1023
+segment 5 hdr  ext+anchor →  Long samples 512..1023
+segment 6 hdr  ext+anchor →  Mic  samples 512..1023
+segment 7 hdr  ext+anchor →  Tran samples 1022..1533
 ...
 ```

-This would explain:
- Why segment-0 = Tran works perfectly.
- Why segment 1 has the same block structure but applying it as Tran
-  continuation gives wrong values.
- Why the per-segment delta budget matches the segment size for a
-  *single* channel (508 deltas per segment, not 4 × 508).
+Implementation: `decode_waveform_v2()` returns
+`{"Tran": [...], "Vert": [...], "Long": [...], "MicL": [...]}` with
+each channel's samples in 16-count units.  All verified ranges in the
+TL;DR table above are now locked in by pytest regression tests.

-Not yet verified because the per-channel anchor at segment-start isn't
-identified in the segment header.  Bytes [4:6] and [14:18] of the
-header are the prime candidates.
+## What's still open

-## Next experiment — segment-channel scoring analyzer
+1. **`30 NN` block content.**  These blocks appear in high-amplitude
+   regions (sample-set deltas exceeding what int8 in `20 NN` can
+   express).  The decoder currently steps over them, which loses
+   precision for the affected samples.  Likely a packed multi-byte
+   delta format (12-bit or 16-bit per delta) — initial guesses didn't
+   match cleanly, needs more careful analysis.

-Don't try to hero-code the full decoder.  Instead, build a small
-analysis tool that:
+2. **MicL decoding.**  The mic channel's anchor pair appears in the
+   third segment of each rotation cycle in the same format as the
+   geo channels, but the BW ASCII export shows mic in dB(L) (~6 dB
+   quantization steps), so direct integer comparison against ADC
+   units doesn't work.  Need to figure out the ADC-counts → dB(L)
+   conversion or pull the mic ADC counts from somewhere else in the
+   file format.

-1. For each segment in every fixture event, runs the segment-0 Tran
-   decoder (block-walk + RLE) and produces a cumulative trajectory
-   of 508 deltas.
-2. Scores that trajectory against the BW ASCII truth for *each* of
-   {Tran, Vert, Long, MicL} over the segment's sample range, starting
-   from different anchor-byte candidates from the segment header.
-3. Reports which (channel, anchor-bytes-location) combination produces
-   the lowest error for each segment.
+3. **Walker fix for event-b.**  The walker still bails out partway
+   through event-b of the original quiet bundle.  Lower priority since
+   the other 7 events walk cleanly.

-If the rotation hypothesis is right, segment 0 should clearly score
-best against Tran, segment 1 against Vert, etc.  The winning
-anchor-bytes-location will reveal which segment-header bytes encode
-the per-segment channel anchors.
+## Next experiment — crack the `30 NN` block

-If the rotation hypothesis is *not* right, the scorer will at least
-narrow down what segment 1 actually carries.
+The scoring analyzer in `scratch/next_experiment_skeleton.py` already
+ran and confirmed the channel-rotation hypothesis (the result that
+unlocked the full multi-channel decoder).  The next open piece is the
+`30 NN` block format.
+
+Approach:
+
+1. Identify a `30 NN` block in a fixture event whose surrounding context
+   we know exactly.  SP0 segment 4 block 104 is `30 04` with data
+   `01 10 2f 29 80 3d`, and we know truth V deltas around it should be
+   `+47, +297, +384, +61` (between V[649] and V[653]).
+2. Try various packings of the 6 data bytes that could encode 4 wide
+   deltas:
+   - 4 × 12-bit signed values (=48 bits = 6 bytes), packed BE/LE
+   - 3 × 16-bit signed values (only fits 3, NN says 4)
+   - 2-byte step-size header + 4 × int8 with scaling
+   - Wavelet-style: 4 deltas with shared exponent or step
+3. Initial brute-force found `+47` and `+61` in positions 1 and 3
+   (0-indexed) of a 12-bit BE packing, which reads `17, 47, 664, 61`,
+   but `+297` and `+384` didn't fit cleanly.  Worth re-trying with
+   more permutations.
+
+Once cracked, the `30 NN` decoder slots into `decode_waveform_v2` and
+the multi-channel decode extends past the high-amplitude regions.

 ## Test fixtures