codec-re: channel rotation CONFIRMED — full multi-channel decoder works

The segment-channel scoring analyzer (from scratch/next_experiment_skeleton.py)
ran and immediately confirmed the rotation hypothesis:

  SP0 seg 0: best fit Vert 508/508 ✓
  SP0 seg 1: best fit Long 508/508 ✓
  SP0 seg 3: best fit Tran 508/508 ✓ (Tran continuation)
  SP0 seg 5: best fit Long 508/508 ✓
  SP0 seg 9: best fit Long 508/508 ✓
  V70 seg 0: best fit Vert 508/508 ✓
  V70 seg 1: best fit Long 508/508 ✓

Channels rotate Tran → Vert → Long → MicL per 40 02 segment header.

Also discovered the segment header has DOUBLE duty: bytes [14:18] anchor
the NEW segment's channel (2 samples as int16 BE in 16-count units), AND
bytes [0:4] extend the PREVIOUS channel by 2 more samples (2 deltas as
int16 BE). This is the same "2 anchors + delta stream" structure as the
body preamble for Tran.

decode_waveform_v2 now returns full per-channel sample dicts.

Byte-exact verified ranges:

  V70: Tran 512, Vert 512, Long 512 (all first segments)
  JQ0: Tran 512, Vert 258
  SP0: Long 1536 (all 3 L segments)

Still open: the 30 NN block format (high-amplitude packed deltas) — appears
mid-segment when single-byte deltas can't carry the magnitude.

6 new tests bring the count to 46. All passing.

@@ -1,4 +1,4 @@
# Waveform body codec — current working status (2026-05-11)
# Waveform body codec — current working status (2026-05-11, late)

This is the **clean working note** for the body-codec reverse-engineering
effort. It supersedes scattered claims elsewhere when they conflict.
@@ -9,10 +9,31 @@ authoritative implementation lives in `minimateplus/waveform_codec.py`.
## TL;DR

The Blastware waveform-file body is a **tagged variable-length block
stream**, NOT raw int16 LE samples. Block framing is solved. Tran
channel segment-0 decoding is solved (byte-exact vs BW's ASCII export
across all 5 high-amplitude fixture events). Multi-segment continuation
and the Vert / Long / MicL channel decoders are still open.
stream**, NOT raw int16 LE samples. Block framing is solved. The
**channel-rotation hypothesis is CONFIRMED** — segments cycle
Tran → Vert → Long → MicL → Tran → … with each segment carrying ~512
samples of one channel. Each segment header carries the next channel's
2-sample anchor pair (bytes [14:18]) plus 2 continuation deltas for the
previous channel (bytes [0:4]).

**What decodes byte-exact today (verified against BW ASCII export):**

| Event | Channel | Samples verified |
|---|---|---|
| V70 (Mic-heavy) | Tran | 512 (1 segment) |
| V70 | Vert | 512 |
| V70 | Long | 512 |
| JQ0 (Vert-heavy) | Tran | 512 |
| JQ0 | Vert | 258 |
| SP0 (loud all) | Long | **1536 (all 3 L segments)** |
| SP0 | Tran | 1350 / 2044 produced |
| SP0 | Vert | 650 / 1526 produced |

**What's still open:** the `30 NN` block format. These blocks appear in
high-amplitude regions (deltas exceeding what int8 can express). My
decoder currently steps over them, which is fine for quiet stretches but
breaks the cumulative when a `30 NN` carries information for samples we
need. Cracking this is the last major piece.

**Production code in `minimateplus/client.py:_decode_a5_waveform` still
uses the broken legacy int16 LE decoder.** Sample arrays it writes to
@@ -69,78 +90,97 @@ Verified byte-exact:

Implementation: `decode_tran_initial()`.

### Segment header (`40 02`, 20 bytes total)
### Segment header (`40 02`, 20 bytes total) — REWRITTEN 2026-05-11

| Payload offset | Field | Status |
|---|---|---|
| [0:2] | T_delta at first sample of new segment (int16 BE) | ✅ confirmed |
| [2:4] | Likely T_delta at sample seg_start+1 | 🟡 likely |
| [4:6] | Unknown (possibly checksum) | ❓ open |
| [0:2] | Previous-channel delta — 1st extension sample (int16 BE) | ✅ confirmed |
| [2:4] | Previous-channel delta — 2nd extension sample (int16 BE) | ✅ confirmed |
| [4:6] | Unknown (likely checksum) | ❓ open |
| [6:8] | Byte length to next segment header − 2 (uint16 BE) | ✅ confirmed |
| [8:12] | Monotonic uint32 LE counter (starts ~0x47) | ✅ confirmed |
| [12:14] | Constant `02 00` | ✅ confirmed |
| [14:18] | Unknown 4-byte field | ❓ open |
| [14:16] | THIS segment's channel — sample 0 anchor (int16 BE, 16-count units) | ✅ confirmed |
| [16:18] | THIS segment's channel — sample 1 anchor (int16 BE, 16-count units) | ✅ confirmed |
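
As a rough illustration of the rewritten rows in the table above, the following Python sketch splits one `40 02` header into its fields. It assumes the 20 bytes are the 2-byte tag followed by an 18-byte payload, with offsets counted from the start of the payload. `SegmentHeader` and `parse_segment_header` are hypothetical names for this note, not functions known to exist in `minimateplus/waveform_codec.py`.

```python
import struct
from dataclasses import dataclass
from typing import Tuple

SEGMENT_TAG = b"\x40\x02"  # block tag that introduces a segment header


@dataclass
class SegmentHeader:
    prev_ext_deltas: Tuple[int, int]  # payload [0:4], 2 x int16 BE
    unknown_4_6: bytes                # payload [4:6], still open
    length_field: int                 # payload [6:8], uint16 BE
    counter: int                      # payload [8:12], uint32 LE
    anchors: Tuple[int, int]          # payload [14:18], 2 x int16 BE


def parse_segment_header(block: bytes) -> SegmentHeader:
    """Parse a 20-byte segment header (assumed: 2-byte tag + 18-byte payload)."""
    if len(block) != 20 or block[:2] != SEGMENT_TAG:
        raise ValueError("not a 20-byte 40 02 segment header")
    payload = block[2:]
    if payload[12:14] != b"\x02\x00":
        raise ValueError("constant 02 00 missing at payload [12:14]")
    d0, d1 = struct.unpack(">hh", payload[0:4])
    (length_field,) = struct.unpack(">H", payload[6:8])
    (counter,) = struct.unpack("<I", payload[8:12])
    a0, a1 = struct.unpack(">hh", payload[14:18])
    return SegmentHeader((d0, d1), payload[4:6], length_field, counter, (a0, a1))
```

The mixed byte order (big-endian deltas and anchors, little-endian counter) is why the sketch unpacks each field separately instead of using one format string.
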

## What's still open
**Key insight (2026-05-11 late):** every segment carries 510 main
samples (2 anchor + 508 deltas) PLUS 2 continuation samples that live
in the NEXT segment header. So each channel-segment effectively spans
512 sample-sets. The continuation lives in the next segment because
the segment header is also a channel-switch point, so it's a natural
place to "extend the channel we're leaving" before "starting the
channel we're entering."

1. **Multi-segment Tran continuation.** After segment 0, applying
segment 1's blocks as Tran continuation diverges from truth by
sample ~512. Block structure is identical to segment 0 and the
per-segment delta budget matches the segment size — but the per-
sample trajectory is wrong.
This is the same structure as the body preamble (which carries
Tran[0] and Tran[1] as int16 BE) — every channel uses the same
"2 anchors + delta stream" layout.
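
The "2 anchors + delta stream" layout, with the 2 continuation deltas taken from the next segment header, can be sketched as a running sum. This is a minimal illustration, assuming deltas accumulate from the second anchor and are expressed in the same units as the anchors; neither assumption is stated in the note. `rebuild_channel_segment` is a hypothetical helper name.

```python
from typing import Iterable, List, Tuple


def rebuild_channel_segment(
    anchors: Tuple[int, int],
    deltas: Iterable[int],
    ext_deltas: Tuple[int, int],
) -> List[int]:
    """Rebuild absolute samples for one channel-segment.

    anchors    -- the 2 absolute samples from header bytes [14:18]
    deltas     -- the segment's delta stream (508 values for a full segment)
    ext_deltas -- the 2 continuation deltas from the NEXT header, bytes [0:4]
    """
    samples = list(anchors)
    last = samples[-1]  # assumption: the running sum starts at the 2nd anchor
    for d in list(deltas) + list(ext_deltas):
        last += d
        samples.append(last)
    return samples
```

With a full segment this yields 2 + 508 + 2 = 512 samples, which matches the per-segment span described above.
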

2. **Vert / Long / MicL channel decoders.** No verified decoder for
any non-Tran channel.

3. **`30 NN` block content.** Only appears in loud-from-start events.
Probably a channel-switch or alternative-encoding marker for high-
amplitude regions. Walker steps over it without decoding.

## Strongest unverified hypothesis

Segments rotate channels:
## Channel rotation — VERIFIED 2026-05-11

```
segment 0 → Tran samples 0..509
segment 1 → Vert samples 0..507
segment 2 → Long samples 0..507
segment 3 → Mic samples 0..507
segment 4 → Tran samples 510..N (continuation)
(initial body) → Tran samples 0..509 (preamble + delta blocks)
segment 0 hdr ext+anchor → Vert samples 0..511 ← anchor in hdr [14:18]
segment 1 hdr ext+anchor → Long samples 0..511
segment 2 hdr ext+anchor → Mic samples 0..511
segment 3 hdr ext+anchor → Tran samples 510..1021 (continuation)
segment 4 hdr ext+anchor → Vert samples 512..1023
segment 5 hdr ext+anchor → Long samples 512..1023
segment 6 hdr ext+anchor → Mic samples 512..1023
segment 7 hdr ext+anchor → Tran samples 1022..1533
...
```

This would explain:
- Why segment-0 = Tran works perfectly.
- Why segment 1 has the same block structure but applying it as Tran
continuation gives wrong values.
- Why the per-segment delta budget matches the segment size for a
*single* channel (508 deltas per segment, not 4 × 508).
Implementation: `decode_waveform_v2()` returns
`{"Tran": [...], "Vert": [...], "Long": [...], "MicL": [...]}` with
each channel's samples in 16-count units. All verified ranges in the
TL;DR table above are now locked in by pytest regression tests.
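
The verified rotation can be written as a small lookup. This sketch assumes segment headers are numbered from 0 and that the initial body (before header 0) is always Tran, as in the verified listing; the function names are hypothetical and the keys mirror the dict returned by `decode_waveform_v2()`.

```python
ROTATION = ("Tran", "Vert", "Long", "MicL")


def channel_started_by_header(seg_index: int) -> str:
    """Channel whose anchor pair sits in bytes [14:18] of header seg_index."""
    # The initial body is Tran, so header 0 already starts the 2nd channel.
    return ROTATION[(seg_index + 1) % len(ROTATION)]


def channel_extended_by_header(seg_index: int) -> str:
    """Channel that receives the 2 continuation deltas in bytes [0:4]."""
    return ROTATION[seg_index % len(ROTATION)]
```

For example, `channel_started_by_header(3)` gives `"Tran"`, which agrees with the analyzer result for SP0 seg 3 (Tran continuation).
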

Not yet verified because the per-channel anchor at segment-start isn't
identified in the segment header. Bytes [4:6] and [14:18] of the
header are the prime candidates.
## What's still open

## Next experiment — segment-channel scoring analyzer
1. **`30 NN` block content.** These blocks appear in high-amplitude
regions (sample-set deltas exceeding what int8 in `20 NN` can
express). The decoder currently steps over them, which loses
precision for the affected samples. Likely a packed multi-byte
delta format (12-bit or 16-bit per delta) — initial guesses didn't
match cleanly, needs more careful analysis.

Don't try to hero-code the full decoder. Instead, build a small
analysis tool that:
2. **MicL decoding.** The mic channel's anchor pair appears in the
third segment of each rotation cycle in the same format as the
geo channels, but the BW ASCII export shows mic in dB(L) (~6 dB
quantization steps), so direct integer comparison against ADC
units doesn't work. Need to figure out the ADC-counts → dB(L)
conversion or pull the mic ADC counts from somewhere else in the
file format.

1. For each segment in every fixture event, runs the segment-0 Tran
decoder (block-walk + RLE) and produces a cumulative trajectory
of 508 deltas.
2. Scores that trajectory against the BW ASCII truth for *each* of
{Tran, Vert, Long, MicL} over the segment's sample range, starting
from different anchor-byte candidates from the segment header.
3. Reports which (channel, anchor-bytes-location) combination produces
the lowest error for each segment.
3. **Walker fix for event-b.** The original quiet bundle's event-b
still bails out partway through. Lower priority since the other
7 events walk cleanly.

If the rotation hypothesis is right, segment 0 should clearly score
best against Tran, segment 1 against Vert, etc. The winning
anchor-bytes-location will reveal which segment-header bytes encode
the per-segment channel anchors.
## Next experiment — crack the `30 NN` block

If the rotation hypothesis is *not* right, the scorer will at least
narrow down what segment 1 actually carries.
The scoring analyzer in `scratch/next_experiment_skeleton.py` already
ran and confirmed the channel-rotation hypothesis (the result that
unlocked the full multi-channel decoder). The next open piece is the
`30 NN` block format.

Approach:

1. Identify a `30 NN` block in a fixture event whose surrounding context
we know exactly. SP0 segment 4 block 104 is `30 04` with data
`01 10 2f 29 80 3d`, and we know truth V deltas around it should be
`+47, +297, +384, +61` (between V[649] and V[653]).
2. Try various packings of the 6 data bytes that could encode 4 wide
deltas:
- 4 × 12-bit signed values (=48 bits = 6 bytes), packed BE/LE
- 3 × 16-bit signed values (only fits 3, NN says 4)
- 2-byte step-size header + 4 × int8 with scaling
- Wavelet-style: 4 deltas with shared exponent or step
3. Initial brute-force found `+47` and `+61` in positions 1 and 3 of
a 12-bit BE packing, but `+297` and `+384` didn't fit cleanly.
Worth re-trying with more permutations.
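
The 12-bit BE candidate from the approach above can be reproduced with a few lines of Python. This is only a sketch of one candidate packing (most significant 12 bits first, two's-complement sign), not the actual `30 NN` format, which is still unknown; `unpack_12bit_be` is a hypothetical helper name.

```python
from typing import List


def unpack_12bit_be(data: bytes, signed: bool = True) -> List[int]:
    """Split data into consecutive 12-bit fields, most significant first."""
    if (len(data) * 8) % 12:
        raise ValueError("length is not a whole number of 12-bit fields")
    total = int.from_bytes(data, "big")
    count = len(data) * 8 // 12
    values = []
    for i in range(count):
        shift = 12 * (count - 1 - i)
        v = (total >> shift) & 0xFFF
        if signed and v >= 0x800:  # two's-complement sign for 12 bits
            v -= 0x1000
        values.append(v)
    return values


block_data = bytes.fromhex("01102f29803d")  # SP0 segment 4, block 104 data
truth = [47, 297, 384, 61]                  # known V deltas around the block

candidate = unpack_12bit_be(block_data)     # [17, 47, 664, 61]
hits = [i for i, v in enumerate(candidate) if v in truth]  # [1, 3]
```

The two hits land at positions 1 and 3, matching the partial result in step 3; the remaining fields (17 and 664) correspond to neither `+297` nor `+384`.
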

Once cracked, the `30 NN` decoder slots into `decode_waveform_v2` and
the multi-channel decode extends past the high-amplitude regions.

## Test fixtures