codec-re: channel rotation CONFIRMED — full multi-channel decoder works

The segment-channel scoring analyzer (from scratch/next_experiment_skeleton.py)
ran and immediately confirmed the rotation hypothesis:

  SP0 seg 0: best fit Vert  508/508  ✓
  SP0 seg 1: best fit Long  508/508  ✓
  SP0 seg 3: best fit Tran  508/508  ✓  (Tran continuation)
  SP0 seg 5: best fit Long  508/508  ✓
  SP0 seg 9: best fit Long  508/508  ✓
  V70 seg 0: best fit Vert  508/508  ✓
  V70 seg 1: best fit Long  508/508  ✓

Channels rotate Tran → Vert → Long → MicL per 40 02 segment header.

Also discovered the segment header has DOUBLE duty: bytes [14:18] anchor
the NEW segment's channel (2 samples as int16 BE in 16-count units), AND
bytes [0:4] extend the PREVIOUS channel by 2 more samples (2 deltas as
int16 BE).  This is the same "2 anchors + delta stream" structure as the
body preamble for Tran.

decode_waveform_v2 now returns full per-channel sample dicts.
Byte-exact verified ranges:
  V70: Tran 512, Vert 512, Long 512   (all first segments)
  JQ0: Tran 512, Vert 258
  SP0: Long 1536 (all 3 L segments)

Still open: the 30 NN block format (high-amplitude packed deltas) —
appears mid-segment when single-byte deltas can't carry the magnitude.

6 new tests bring the count to 46.  All passing.
Claude
2026-05-12 03:57:38 +00:00
committed by serversdown
parent ae0e17b5dc
commit 07675626dc
6 changed files with 365 additions and 136 deletions
+97 -57
@@ -1,4 +1,4 @@
# Waveform body codec — current working status (2026-05-11)
# Waveform body codec — current working status (2026-05-11, late)
This is the **clean working note** for the body-codec reverse-engineering
effort. It supersedes scattered claims elsewhere when they conflict.
@@ -9,10 +9,31 @@ authoritative implementation lives in `minimateplus/waveform_codec.py`.
## TL;DR
The Blastware waveform-file body is a **tagged variable-length block
stream**, NOT raw int16 LE samples. Block framing is solved. Tran
channel segment-0 decoding is solved (byte-exact vs BW's ASCII export
across all 5 high-amplitude fixture events). Multi-segment continuation
and the Vert / Long / MicL channel decoders are still open.
stream**, NOT raw int16 LE samples. Block framing is solved. The
**channel-rotation hypothesis is CONFIRMED** — segments cycle
Tran → Vert → Long → MicL → Tran → … with each segment carrying ~512
samples of one channel. Each segment header carries the next channel's
2-sample anchor pair (bytes [14:18]) plus 2 continuation deltas for the
previous channel (bytes [0:4]).
**What decodes byte-exact today (verified against BW ASCII export):**
| Event | Channel | Samples verified |
|---|---|---|
| V70 (Mic-heavy) | Tran | 512 (1 segment) |
| V70 | Vert | 512 |
| V70 | Long | 512 |
| JQ0 (Vert-heavy) | Tran | 512 |
| JQ0 | Vert | 258 |
| SP0 (loud all) | Long | **1536 (all 3 L segments)** |
| SP0 | Tran | 1350 / 2044 produced |
| SP0 | Vert | 650 / 1526 produced |
**What's still open:** the `30 NN` block format. These blocks appear in
high-amplitude regions (deltas exceeding what int8 can express). The
decoder currently steps over them, which is fine for quiet stretches but
breaks the cumulative sum whenever a `30 NN` block carries deltas for
samples we need. Cracking this is the last major piece.
**Production code in `minimateplus/client.py:_decode_a5_waveform` still
uses the broken legacy int16 LE decoder.** Sample arrays it writes to
@@ -69,78 +90,97 @@ Verified byte-exact:
Implementation: `decode_tran_initial()`.
### Segment header (`40 02`, 20 bytes total)
### Segment header (`40 02`, 20 bytes total) — REWRITTEN 2026-05-11
| Payload offset | Field | Status |
|---|---|---|
| [0:2] | T_delta at first sample of new segment (int16 BE) | ✅ confirmed |
| [2:4] | Likely T_delta at sample seg_start+1 | 🟡 likely |
| [4:6] | Unknown (possibly checksum) | ❓ open |
| [0:2] | Previous-channel delta — 1st extension sample (int16 BE) | ✅ confirmed |
| [2:4] | Previous-channel delta — 2nd extension sample (int16 BE) | ✅ confirmed |
| [4:6] | Unknown (likely checksum) | ❓ open |
| [6:8] | Byte length to next segment header 2 (uint16 BE) | ✅ confirmed |
| [8:12] | Monotonic uint32 LE counter (starts ~0x47) | ✅ confirmed |
| [12:14] | Constant `02 00` | ✅ confirmed |
| [14:18] | Unknown 4-byte field | ❓ open |
| [14:16] | THIS segment's channel — sample 0 anchor (int16 BE, 16-count units) | ✅ confirmed |
| [16:18] | THIS segment's channel — sample 1 anchor (int16 BE, 16-count units) | ✅ confirmed |
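The confirmed fields in the table can be pulled out with a few `struct` calls. This is a minimal sketch, not the code in `minimateplus/waveform_codec.py`: the names `SegmentHeader` and `parse_segment_header` are hypothetical, and it assumes the 18-byte payload is what follows the 2-byte `40 02` tag (20 bytes total). Bytes [4:6] are returned raw because their meaning is still open.

```python
import struct
from typing import NamedTuple, Tuple


class SegmentHeader(NamedTuple):
    """Hypothetical container for the fields marked confirmed in the table."""
    prev_ext_deltas: Tuple[int, int]  # [0:4]  two int16 BE deltas for the previous channel
    unknown_4_6: bytes                # [4:6]  still open, kept raw
    length_field: int                 # [6:8]  uint16 BE length field
    counter: int                      # [8:12] uint32 LE monotonic counter
    anchors: Tuple[int, int]          # [14:18] two int16 BE anchors, 16-count units


def parse_segment_header(payload: bytes) -> SegmentHeader:
    """Parse the payload of a `40 02` block (assumed: 18 bytes after the tag)."""
    if len(payload) < 18:
        raise ValueError("segment header payload must be at least 18 bytes")
    ext0, ext1 = struct.unpack_from(">hh", payload, 0)
    (length_field,) = struct.unpack_from(">H", payload, 6)
    (counter,) = struct.unpack_from("<I", payload, 8)  # note: little-endian
    if payload[12:14] != b"\x02\x00":
        raise ValueError("expected constant 02 00 at payload[12:14]")
    a0, a1 = struct.unpack_from(">hh", payload, 14)
    return SegmentHeader((ext0, ext1), bytes(payload[4:6]), length_field, counter, (a0, a1))
```

The mixed endianness (BE samples, LE counter) is why the fields are unpacked separately rather than with a single format string.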
## What's still open
**Key insight (2026-05-11 late):** every segment carries 510 main
samples (2 anchor + 508 deltas) PLUS 2 continuation samples that live
in the NEXT segment header. So each channel-segment effectively spans
512 sample-sets. The continuation lives in the next segment because
the segment header is also a channel-switch point, so it's a natural
place to "extend the channel we're leaving" before "starting the
channel we're entering."
1. **Multi-segment Tran continuation.** After segment 0, applying
segment 1's blocks as Tran continuation diverges from truth by
sample ~512. Block structure is identical to segment 0 and the
per-segment delta budget matches the segment size — but the per-
sample trajectory is wrong.
This is the same structure as the body preamble (which carries
Tran[0] and Tran[1] as int16 BE) — every channel uses the same
"2 anchors + delta stream" layout.
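A sketch of how the "2 anchors + delta stream" layout turns into samples. The function name `rebuild_channel_segment` is hypothetical, and it assumes (not confirmed above) that both anchors are absolute values, that every delta is added to the previous sample, and that deltas and anchors share the same 16-count units.

```python
from itertools import accumulate


def rebuild_channel_segment(anchors, deltas, ext_deltas=()):
    """Rebuild one channel-segment from its anchors and deltas.

    anchors    : (sample0, sample1) from the header that opens the segment
    deltas     : the segment's own delta stream (508 values in the note)
    ext_deltas : the 2 continuation deltas from the NEXT segment header
    """
    a0, a1 = anchors
    # accumulate(..., initial=a1) yields a1 first, then each running sum.
    tail = list(accumulate(list(deltas) + list(ext_deltas), initial=a1))
    return [a0] + tail


samples = rebuild_channel_segment((10, 12), [1, -2, 3], (4, 5))
print(samples)  # [10, 12, 13, 11, 14, 18, 23]
```

With 508 deltas and 2 continuation deltas the result has 2 + 508 + 2 = 512 samples, matching the per-segment span described above.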
2. **Vert / Long / MicL channel decoders.** No verified decoder for
any non-Tran channel.
3. **`30 NN` block content.** Only appears in loud-from-start events.
Probably a channel-switch or alternative-encoding marker for high-
amplitude regions. Walker steps over it without decoding.
## Strongest unverified hypothesis
Segments rotate channels:
## Channel rotation — VERIFIED 2026-05-11
```
segment 0 → Tran samples 0..509
segment 1 → Vert samples 0..507
segment 2 → Long samples 0..507
segment 3 → Mic samples 0..507
segment 4 → Tran samples 510..N (continuation)
(initial body) → Tran samples 0..509 (preamble + delta blocks)
segment 0 hdr ext+anchor → Vert samples 0..511 ← anchor in hdr [14:18]
segment 1 hdr ext+anchor → Long samples 0..511
segment 2 hdr ext+anchor → Mic samples 0..511
segment 3 hdr ext+anchor → Tran samples 510..1021 (continuation)
segment 4 hdr ext+anchor → Vert samples 512..1023
segment 5 hdr ext+anchor → Long samples 512..1023
segment 6 hdr ext+anchor → Mic samples 512..1023
segment 7 hdr ext+anchor → Tran samples 1022..1533
...
```
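The rotation reduces to a modulo lookup. A minimal sketch, with the hypothetical helper name `channel_for_segment`; it assumes the initial body always occupies the Tran slot, so segment header `i` introduces rotation slot `i + 1`.

```python
ROTATION = ("Tran", "Vert", "Long", "MicL")


def channel_for_segment(seg_index: int) -> str:
    """Channel introduced by the `40 02` header of segment `seg_index`.

    The initial body (before any segment header) is Tran, so segment 0
    starts at the second slot of the rotation.
    """
    return ROTATION[(seg_index + 1) % len(ROTATION)]


for i in (0, 1, 3, 5, 9):
    print(i, channel_for_segment(i))
```

This agrees with the analyzer results in the commit message: segments 0, 1, 3, 5 and 9 map to Vert, Long, Tran, Long and Long.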
This would explain:
- Why segment-0 = Tran works perfectly.
- Why segment 1 has the same block structure but applying it as Tran
continuation gives wrong values.
- Why the per-segment delta budget matches the segment size for a
*single* channel (508 deltas per segment, not 4 × 508).
Implementation: `decode_waveform_v2()` returns
`{"Tran": [...], "Vert": [...], "Long": [...], "MicL": [...]}` with
each channel's samples in 16-count units. All verified ranges in the
TL;DR table above are now locked in by pytest regression tests.
Not yet verified because the per-channel anchor at segment-start isn't
identified in the segment header. Bytes [4:6] and [14:18] of the
header are the prime candidates.
## What's still open
## Next experiment — segment-channel scoring analyzer
1. **`30 NN` block content.** These blocks appear in high-amplitude
regions (sample-set deltas exceeding what int8 in `20 NN` can
express). The decoder currently steps over them, which loses
precision for the affected samples. Likely a packed multi-byte
delta format (12-bit or 16-bit per delta); initial guesses didn't
match cleanly, so this needs more careful analysis.
Don't try to hero-code the full decoder. Instead, build a small
analysis tool that:
2. **MicL decoding.** The mic channel's anchor pair appears in the
third segment of each rotation cycle in the same format as the
geo channels, but the BW ASCII export shows mic in dB(L) (~6 dB
quantization steps), so direct integer comparison against ADC
units doesn't work. Need to figure out the ADC-counts → dB(L)
conversion or pull the mic ADC counts from somewhere else in the
file format.
1. For each segment in every fixture event, runs the segment-0 Tran
decoder (block-walk + RLE) and produces a cumulative trajectory
of 508 deltas.
2. Scores that trajectory against the BW ASCII truth for *each* of
{Tran, Vert, Long, MicL} over the segment's sample range, starting
from different anchor-byte candidates from the segment header.
3. Reports which (channel, anchor-bytes-location) combination produces
the lowest error for each segment.
3. **Walker fix for event-b.** The original quiet bundle's event-b
still bails out partway through. Lower priority since the other
7 events walk cleanly.
If the rotation hypothesis is right, segment 0 should clearly score
best against Tran, segment 1 against Vert, etc. The winning
anchor-bytes-location will reveal which segment-header bytes encode
the per-segment channel anchors.
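The scoring step can be sketched as an exact-match count per channel. This is not the code in `scratch/next_experiment_skeleton.py`: `score_segment` is a hypothetical name, and the sketch leaves out the sweep over anchor-byte candidates, taking an already-anchored trajectory instead.

```python
def score_segment(trajectory, truth_by_channel, start):
    """Count exact sample matches of `trajectory` against each channel's truth.

    trajectory       : decoded samples for one segment
    truth_by_channel : {"Tran": [...], "Vert": [...], ...} from the ASCII export
    start            : index in the truth arrays where the segment begins
    Returns (best_channel, {channel: match_count}).
    """
    scores = {}
    for name, truth in truth_by_channel.items():
        window = truth[start:start + len(trajectory)]
        scores[name] = sum(1 for got, want in zip(trajectory, window) if got == want)
    best = max(scores, key=scores.get)
    return best, scores
```

A score equal to the trajectory length corresponds to a byte-exact fit such as the `508/508` lines in the commit message.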
## Next experiment — crack the `30 NN` block
If the rotation hypothesis is *not* right, the scorer will at least
narrow down what segment 1 actually carries.
The scoring analyzer in `scratch/next_experiment_skeleton.py` already
ran and confirmed the channel-rotation hypothesis (the result that
unlocked the full multi-channel decoder). The next open piece is the
`30 NN` block format.
Approach:
1. Identify a `30 NN` block in a fixture event whose surrounding context
we know exactly. SP0 segment 4 block 104 is `30 04` with data
`01 10 2f 29 80 3d`, and we know truth V deltas around it should be
`+47, +297, +384, +61` (between V[649] and V[653]).
2. Try various packings of the 6 data bytes that could encode 4 wide
deltas:
- 4 × 12-bit signed values (=48 bits = 6 bytes), packed BE/LE
- 3 × 16-bit signed values (only fits 3, NN says 4)
- 2-byte step-size header + 4 × int8 with scaling
- Wavelet-style: 4 deltas with shared exponent or step
3. Initial brute-force found `+47` and `+61` in positions 1 and 3 of
a 12-bit BE packing, but `+297` and `+384` didn't fit cleanly.
Worth re-trying with more permutations.
Once cracked, the `30 NN` decoder slots into `decode_waveform_v2` and
the multi-channel decode extends past the high-amplitude regions.
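For reference, the 12-bit big-endian candidate from step 2 can be reproduced as below. This is only one of the candidate packings, not a solved format; `unpack_12bit_be` is a hypothetical helper, and treating each field as two's-complement signed is an assumption.

```python
def unpack_12bit_be(data: bytes) -> list:
    """Unpack consecutive signed 12-bit big-endian fields (2 per 3 bytes)."""
    if len(data) % 3:
        raise ValueError("12-bit packing needs a multiple of 3 bytes")
    out = []
    for i in range(0, len(data), 3):
        b0, b1, b2 = data[i:i + 3]
        first = (b0 << 4) | (b1 >> 4)        # high 12 bits of the 3-byte group
        second = ((b1 & 0x0F) << 8) | b2     # low 12 bits of the 3-byte group
        for raw in (first, second):
            # Sign-extend from 12 bits (assumed two's complement).
            out.append(raw - 0x1000 if raw & 0x800 else raw)
    return out


print(unpack_12bit_be(bytes.fromhex("01102f29803d")))  # [17, 47, 664, 61]
```

On the SP0 segment 4 block 104 data this yields `+47` and `+61` at (0-based) positions 1 and 3, as noted in step 3; the other two fields, 17 and 664, do not equal the expected `+297` and `+384`.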
## Test fixtures