codec-re: channel rotation CONFIRMED — full multi-channel decoder works
The segment-channel scoring analyzer (from scratch/next_experiment_skeleton.py)
ran and immediately confirmed the rotation hypothesis:

    SP0 seg 0: best fit Vert 508/508 ✓
    SP0 seg 1: best fit Long 508/508 ✓
    SP0 seg 3: best fit Tran 508/508 ✓ (Tran continuation)
    SP0 seg 5: best fit Long 508/508 ✓
    SP0 seg 9: best fit Long 508/508 ✓
    V70 seg 0: best fit Vert 508/508 ✓
    V70 seg 1: best fit Long 508/508 ✓

Channels rotate Tran → Vert → Long → MicL per 40 02 segment header.

Also discovered the segment header has DOUBLE duty: bytes [14:18] anchor the
NEW segment's channel (2 samples as int16 BE in 16-count units), AND bytes
[0:4] extend the PREVIOUS channel by 2 more samples (2 deltas as int16 BE).
This is the same "2 anchors + delta stream" structure as the body preamble
for Tran.

decode_waveform_v2 now returns full per-channel sample dicts. Byte-exact
verified ranges:

    V70: Tran 512, Vert 512, Long 512 (all first segments)
    JQ0: Tran 512, Vert 258
    SP0: Long 1536 (all 3 L segments)

Still open: the 30 NN block format (high-amplitude packed deltas), which
appears mid-segment when single-byte deltas can't carry the magnitude.

6 new tests bring the count to 46. All passing.
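The double-duty header layout can be made concrete with a short parser. This is a minimal sketch. It assumes the payload offsets listed in this commit, which are the same offsets `decode_waveform_v2` reads from `header.data`. `SegmentHeader` and `parse_segment_header` are hypothetical names, not part of `minimateplus`. Bytes [4:6] are deliberately left unparsed because their meaning is still open.

```python
import struct
from dataclasses import dataclass
from typing import Tuple


@dataclass
class SegmentHeader:
    """Hypothetical container for the known fields of a `40 02` payload."""
    prev_deltas: Tuple[int, int]  # [0:4]   two int16 BE deltas extending the PREVIOUS channel
    next_len: int                 # [6:8]   uint16 BE, byte length to next header minus 2
    counter: int                  # [8:12]  monotonic uint32 LE counter
    anchors: Tuple[int, int]      # [14:18] two int16 BE anchors (16-count units) for THIS channel


def parse_segment_header(payload: bytes) -> SegmentHeader:
    """Unpack a `40 02` payload; bytes [4:6] are skipped because their meaning is open."""
    if len(payload) < 18:
        raise ValueError("segment header payload must be at least 18 bytes")
    if payload[12:14] != b"\x02\x00":
        raise ValueError("expected the constant 02 00 at payload [12:14]")
    d0, d1 = struct.unpack_from(">hh", payload, 0)
    (next_len,) = struct.unpack_from(">H", payload, 6)
    (counter,) = struct.unpack_from("<I", payload, 8)
    a0, a1 = struct.unpack_from(">hh", payload, 14)
    return SegmentHeader((d0, d1), next_len, counter, (a0, a1))


# Synthetic payload built to the documented layout (not taken from a fixture).
demo = (struct.pack(">hh", 3, -2) + b"\x00\x00" + struct.pack(">H", 400)
        + struct.pack("<I", 0x47) + b"\x02\x00" + struct.pack(">hh", -5, -4))
print(parse_segment_header(demo))
```

Checking the `02 00` constant early is a cheap way to catch a misaligned block walk before any samples are produced.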
@@ -86,44 +86,49 @@ is actually a tagged-block stream with a custom delta+RLE codec.
 - **Block framing** — 5 tag types (`10 NN`, `20 NN`, `00 NN`, `30 NN`,
   `40 02`) with confirmed lengths. Implementation: `walk_body()` in
   `minimateplus/waveform_codec.py`.
-- **Tran channel segment 0** — preamble bytes [3:7] = `Tran[0]`, `Tran[1]`
+- **Per-channel codec** — preamble bytes [3:7] = `Tran[0]`, `Tran[1]`
   as int16 BE in **16-count units** (LSB = 0.005 in/s). Then `10 NN`
   (4-bit nibble deltas), `20 NN` (int8 deltas), and `00 NN` (RLE zero
-  deltas) carry Tran deltas from sample 2 onward. Verified byte-perfect
-  across 4 of 5 fixture events (510 samples each). Implementation:
-  `decode_tran_initial()`.
-- **Segment header** — `40 02` is a 20-byte block. Payload bytes [0:2]
-  are the T_delta at the start of the new segment (int16 BE). Bytes
-  [6:8] are the byte length to the next segment header. Bytes [8:12]
-  are a monotonic uint32 LE counter. Bytes [12:14] are constant `02 00`.
+  deltas) carry per-channel deltas from sample 2 onward.
+- **Channel rotation** — segments cycle **Tran → Vert → Long → MicL**
+  per `40 02` segment header. Each segment carries ~512 sample-sets of
+  ONE channel. The initial body (before the first `40 02`) is the
+  implicit Tran segment.
+- **Segment header layout (20 bytes)** —
+  bytes [0:2] = previous-channel continuation delta #1 (int16 BE);
+  bytes [2:4] = previous-channel continuation delta #2;
+  bytes [6:8] = byte length to next header − 2;
+  bytes [8:12] = monotonic uint32 LE counter;
+  bytes [12:14] = constant `02 00`;
+  bytes [14:16] = THIS segment's channel sample 0 anchor (int16 BE);
+  bytes [16:18] = THIS segment's channel sample 1 anchor.
+- **`decode_waveform_v2()`** returns full per-channel sample dicts.
+  Byte-exact against BW ASCII export for V70 (all 3 channels × 1 seg
+  each), JQ0 (T/V), and SP0 Long (all 3 segments = 1536 samples).
 
 ### What's NOT solved
 
-- **Tran past segment 0** — multi-segment Tran continuation has been
-  attempted but every hypothesis tested breaks at sample ~512. Likely
-  channels rotate across segments (e.g. segment 0 = Tran, segment 1 = Vert,
-  …) but this is unverified.
-- **Vert / Long / Mic channels** — no per-channel decoder yet. These
-  almost certainly live in later segments but the segment-to-channel
-  mapping is open.
-- **The `30 NN` block content** — appears in loud-from-start events
-  (SS0, SV0) and breaks the simple Tran walk there. Probably a channel-
-  switch or alternative-encoding marker for high-amplitude regions.
+- **The `30 NN` block content** — these blocks appear in high-amplitude
+  regions where sample-set deltas exceed what int8 in `20 NN` can
+  express. Probably a packed multi-byte delta format. Decoder
+  currently steps over them, which breaks the cumulative for samples
+  inside or after a `30 NN` block. See
+  `docs/waveform_codec_re_status.md` for the analysis so far.
+- **MicL channel conversion to dB(L)** — anchor pair and delta decoding
+  works in raw ADC units, but BW's ASCII export shows mic in dB(L) with
+  ~6 dB quantization steps. Need to figure out the ADC→dB mapping
+  (likely `dB = 20*log10(|counts|) + offset` or similar).
 
 ### Next experiment
 
-**Don't hero-code the full decoder.** Build a small analysis tool — a
-segment-channel scoring analyzer. For each segment of each fixture
-event, run the segment-0 Tran block-walk + RLE decode and score the
-cumulative trajectory against the BW ASCII truth for each of {Tran,
-Vert, Long, MicL} over that segment's sample range, trying different
-anchor-bytes candidates from the segment header. The winning
-(channel, anchor-location) combination for each segment reveals
-whether segments rotate channels and which header bytes encode the
-per-segment channel anchors.
+The segment-channel scoring analyzer already ran and confirmed the
+channel-rotation hypothesis. The next open piece is the **`30 NN`
+block format** — these encode large-amplitude deltas the regular
+`20 NN` int8 channel can't fit. Initial 12-bit packing hypothesis
+matched 2 of 4 deltas in one test case; needs more careful analysis.
 
-See `docs/waveform_codec_re_status.md` for the full specification of
-the next experiment.
+See `docs/waveform_codec_re_status.md` for the data and current
+guesses.
 
 ### Production-code status
 
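The rotation rule in this hunk can be restated as a small lookup from header index to the channels it touches. This is a minimal sketch under two assumptions:

- Every channel-segment is complete: 2 anchors, 508 deltas and 2 continuation samples, or 512 samples in all. The SP0 Tran/Vert counts in the TL;DR table show this stops holding once `30 NN` blocks are skipped.
- The initial body counts as cycle position 0 (Tran).

`segment_roles` and `SEGMENT_SPAN` are hypothetical names, not project code.

```python
from typing import Tuple

CYCLE = ("Tran", "Vert", "Long", "MicL")
SEGMENT_SPAN = 512  # assumed: 2 anchors + 508 deltas + 2 continuation samples per channel-segment


def segment_roles(k: int) -> Tuple[str, str, int]:
    """Roles of the k-th `40 02` header (0-based).

    Returns (channel anchored by bytes [14:18], channel extended by bytes [0:4],
    sample index of the new anchor). The initial body is cycle position 0
    (Tran), so header k opens cycle position k + 1. The anchor index is only
    right if every earlier segment of that channel was complete.
    """
    pos = k + 1
    anchored = CYCLE[pos % 4]
    extended = CYCLE[(pos - 1) % 4]
    return anchored, extended, SEGMENT_SPAN * (pos // 4)


for k in range(8):
    print(k, *segment_roles(k))
```

With these assumptions, header 3 anchors Tran at sample 512 and header 7 anchors it at 1024. Tran samples 510 and 511 come from header 0's bytes [0:4]. That is why the status doc labels the later Tran ranges 510..1021 and 1022..1533.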
@@ -0,0 +1,32 @@
+"""Verify decode_waveform_v2 against BW ASCII truth for all fixtures."""
+import sys
+sys.path.insert(0, ".")
+from analysis.load_bundle import _parse_txt
+from minimateplus.waveform_codec import decode_waveform_v2
+
+
+def main():
+    for stem in ("M529LL1A.SP0", "M529LL1A.SS0", "M529LL1A.SV0",
+                 "M529LL1L.JQ0", "M529LL1L.V70"):
+        path = f"tests/fixtures/5-11-26/{stem}"
+        with open(path, "rb") as f:
+            body = f.read()[43:-26]
+        _, samples = _parse_txt(path + ".TXT")
+        decoded = decode_waveform_v2(body)
+        if decoded is None:
+            print(f"{stem}: decoder returned None")
+            continue
+
+        print(f"\n=== {stem} ===")
+        for ch in ("Tran", "Vert", "Long"):
+            truth = [round(v * 200) for v in samples[ch]]
+            pred = decoded[ch]
+            n = min(len(pred), len(truth))
+            matches = sum(1 for i in range(n) if pred[i] == truth[i])
+            div = next((i for i in range(n) if pred[i] != truth[i]), -1)
+            print(f" {ch}: decoded={len(pred):>5} truth={len(truth):>5} "
+                  f"matches={matches:>5}/{n:<5} first div={div}")
+
+
+if __name__ == "__main__":
+    main()
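The verify script converts BW's in/s values with `round(v * 200)`, because one 16-count unit is 0.005 in/s (1 / 0.005 = 200). It then reports the match count and the first divergence. The two steps are sketched as standalone functions below. `to_units` and `compare` are hypothetical names. The 0.005 in/s LSB is the Normal-range value stated in the codec notes.

```python
from typing import List, Sequence, Tuple

UNITS_PER_IN_PER_S = 200  # 1 / 0.005 in/s, the 16-count LSB at Normal range


def to_units(values_in_per_s: Sequence[float]) -> List[int]:
    """BW ASCII velocities (in/s) -> integer 16-count units, like round(v * 200) in the script.

    Python's round() sends exact halves to the nearest even integer.
    """
    return [round(v * UNITS_PER_IN_PER_S) for v in values_in_per_s]


def compare(pred: Sequence[int], truth: Sequence[int]) -> Tuple[int, int, int]:
    """Return (matches, samples compared, first divergence index or -1)."""
    n = min(len(pred), len(truth))
    matches = sum(1 for i in range(n) if pred[i] == truth[i])
    first_div = next((i for i in range(n) if pred[i] != truth[i]), -1)
    return matches, n, first_div


truth = to_units([0.0, 0.005, 0.010, -0.015])
print(truth, compare([0, 1, 2, -2, 7], truth))
```

Reporting the first divergence, not just the match count, is what localizes a break such as a skipped `30 NN` block to a specific sample.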
@@ -1,4 +1,4 @@
-# Waveform body codec — current working status (2026-05-11)
+# Waveform body codec — current working status (2026-05-11, late)
 
 This is the **clean working note** for the body-codec reverse-engineering
 effort. It supersedes scattered claims elsewhere when they conflict.
@@ -9,10 +9,31 @@ authoritative implementation lives in `minimateplus/waveform_codec.py`.
 ## TL;DR
 
 The Blastware waveform-file body is a **tagged variable-length block
-stream**, NOT raw int16 LE samples. Block framing is solved. Tran
-channel segment-0 decoding is solved (byte-exact vs BW's ASCII export
-across all 5 high-amplitude fixture events). Multi-segment continuation
-and the Vert / Long / MicL channel decoders are still open.
+stream**, NOT raw int16 LE samples. Block framing is solved. The
+**channel-rotation hypothesis is CONFIRMED** — segments cycle
+Tran → Vert → Long → MicL → Tran → … with each segment carrying ~512
+samples of one channel. Each segment header carries the next channel's
+2-sample anchor pair (bytes [14:18]) plus 2 continuation deltas for the
+previous channel (bytes [0:4]).
+
+**What decodes byte-exact today (verified against BW ASCII export):**
+
+| Event | Channel | Samples verified |
+|---|---|---|
+| V70 (Mic-heavy) | Tran | 512 (1 segment) |
+| V70 | Vert | 512 |
+| V70 | Long | 512 |
+| JQ0 (Vert-heavy) | Tran | 512 |
+| JQ0 | Vert | 258 |
+| SP0 (loud all) | Long | **1536 (all 3 L segments)** |
+| SP0 | Tran | 1350 / 2044 produced |
+| SP0 | Vert | 650 / 1526 produced |
+
+**What's still open:** the `30 NN` block format. These blocks appear in
+high-amplitude regions (deltas exceeding what int8 can express). My
+decoder currently steps over them, which is fine for quiet stretches but
+breaks the cumulative when a `30 NN` carries information for samples we
+need. Cracking this is the last major piece.
 
 **Production code in `minimateplus/client.py:_decode_a5_waveform` still
 uses the broken legacy int16 LE decoder.** Sample arrays it writes to
@@ -69,78 +90,97 @@ Verified byte-exact:
 
 Implementation: `decode_tran_initial()`.
 
-### Segment header (`40 02`, 20 bytes total)
+### Segment header (`40 02`, 20 bytes total) — REWRITTEN 2026-05-11
 
 | Payload offset | Field | Status |
 |---|---|---|
-| [0:2] | T_delta at first sample of new segment (int16 BE) | ✅ confirmed |
-| [2:4] | Likely T_delta at sample seg_start+1 | 🟡 likely |
-| [4:6] | Unknown (possibly checksum) | ❓ open |
+| [0:2] | Previous-channel delta — 1st extension sample (int16 BE) | ✅ confirmed |
+| [2:4] | Previous-channel delta — 2nd extension sample (int16 BE) | ✅ confirmed |
+| [4:6] | Unknown (likely checksum) | ❓ open |
 | [6:8] | Byte length to next segment header − 2 (uint16 BE) | ✅ confirmed |
 | [8:12] | Monotonic uint32 LE counter (starts ~0x47) | ✅ confirmed |
 | [12:14] | Constant `02 00` | ✅ confirmed |
-| [14:18] | Unknown 4-byte field | ❓ open |
+| [14:16] | THIS segment's channel — sample 0 anchor (int16 BE, 16-count units) | ✅ confirmed |
+| [16:18] | THIS segment's channel — sample 1 anchor (int16 BE, 16-count units) | ✅ confirmed |
 
-## What's still open
+**Key insight (2026-05-11 late):** every segment carries 510 main
+samples (2 anchor + 508 deltas) PLUS 2 continuation samples that live
+in the NEXT segment header. So each channel-segment effectively spans
+512 sample-sets. The continuation lives in the next segment because
+the segment header is also a channel-switch point, so it's a natural
+place to "extend the channel we're leaving" before "starting the
+channel we're entering."
 
-1. **Multi-segment Tran continuation.** After segment 0, applying
-   segment 1's blocks as Tran continuation diverges from truth by
-   sample ~512. Block structure is identical to segment 0 and the
-   per-segment delta budget matches the segment size — but the per-
-   sample trajectory is wrong.
+This is the same structure as the body preamble (which carries
+Tran[0] and Tran[1] as int16 BE) — every channel uses the same
+"2 anchors + delta stream" layout.
 
-2. **Vert / Long / MicL channel decoders.** No verified decoder for
-   any non-Tran channel.
-
-3. **`30 NN` block content.** Only appears in loud-from-start events.
-   Probably a channel-switch or alternative-encoding marker for high-
-   amplitude regions. Walker steps over it without decoding.
-
-## Strongest unverified hypothesis
-
-Segments rotate channels:
+## Channel rotation — VERIFIED 2026-05-11
 
 ```
-segment 0 → Tran samples 0..509
-segment 1 → Vert samples 0..507
-segment 2 → Long samples 0..507
-segment 3 → Mic samples 0..507
-segment 4 → Tran samples 510..N (continuation)
+(initial body) → Tran samples 0..509 (preamble + delta blocks)
+segment 0 hdr ext+anchor → Vert samples 0..511 ← anchor in hdr [14:18]
+segment 1 hdr ext+anchor → Long samples 0..511
+segment 2 hdr ext+anchor → Mic samples 0..511
+segment 3 hdr ext+anchor → Tran samples 510..1021 (continuation)
+segment 4 hdr ext+anchor → Vert samples 512..1023
+segment 5 hdr ext+anchor → Long samples 512..1023
+segment 6 hdr ext+anchor → Mic samples 512..1023
+segment 7 hdr ext+anchor → Tran samples 1022..1533
+...
 ```
 
-This would explain:
-- Why segment-0 = Tran works perfectly.
-- Why segment 1 has the same block structure but applying it as Tran
-  continuation gives wrong values.
-- Why the per-segment delta budget matches the segment size for a
-  *single* channel (508 deltas per segment, not 4 × 508).
+Implementation: `decode_waveform_v2()` returns
+`{"Tran": [...], "Vert": [...], "Long": [...], "MicL": [...]}` with
+each channel's samples in 16-count units. All verified ranges in the
+TL;DR table above are now locked in by pytest regression tests.
 
-Not yet verified because the per-channel anchor at segment-start isn't
-identified in the segment header. Bytes [4:6] and [14:18] of the
-header are the prime candidates.
+## What's still open
 
-## Next experiment — segment-channel scoring analyzer
+1. **`30 NN` block content.** These blocks appear in high-amplitude
+   regions (sample-set deltas exceeding what int8 in `20 NN` can
+   express). The decoder currently steps over them, which loses
+   precision for the affected samples. Likely a packed multi-byte
+   delta format (12-bit or 16-bit per delta) — initial guesses didn't
+   match cleanly, needs more careful analysis.
 
-Don't try to hero-code the full decoder. Instead, build a small
-analysis tool that:
+2. **MicL decoding.** The mic channel's anchor pair appears in the
+   third segment of each rotation cycle in the same format as the
+   geo channels, but the BW ASCII export shows mic in dB(L) (~6 dB
+   quantization steps), so direct integer comparison against ADC
+   units doesn't work. Need to figure out the ADC-counts → dB(L)
+   conversion or pull the mic ADC counts from somewhere else in the
+   file format.
 
-1. For each segment in every fixture event, runs the segment-0 Tran
-   decoder (block-walk + RLE) and produces a cumulative trajectory
-   of 508 deltas.
-2. Scores that trajectory against the BW ASCII truth for *each* of
-   {Tran, Vert, Long, MicL} over the segment's sample range, starting
-   from different anchor-byte candidates from the segment header.
-3. Reports which (channel, anchor-bytes-location) combination produces
-   the lowest error for each segment.
+3. **Walker fix for event-b.** The original quiet bundle's event-b
+   still bails out partway through. Lower priority since the other
+   7 events walk cleanly.
 
-If the rotation hypothesis is right, segment 0 should clearly score
-best against Tran, segment 1 against Vert, etc. The winning
-anchor-bytes-location will reveal which segment-header bytes encode
-the per-segment channel anchors.
+## Next experiment — crack the `30 NN` block
 
-If the rotation hypothesis is *not* right, the scorer will at least
-narrow down what segment 1 actually carries.
+The scoring analyzer in `scratch/next_experiment_skeleton.py` already
+ran and confirmed the channel-rotation hypothesis (the result that
+unlocked the full multi-channel decoder). The next open piece is the
+`30 NN` block format.
 
+Approach:
+
+1. Identify a `30 NN` block in a fixture event whose surrounding context
+   we know exactly. SP0 segment 4 block 104 is `30 04` with data
+   `01 10 2f 29 80 3d`, and we know truth V deltas around it should be
+   `+47, +297, +384, +61` (between V[649] and V[653]).
+2. Try various packings of the 6 data bytes that could encode 4 wide
+   deltas:
+   - 4 × 12-bit signed values (=48 bits = 6 bytes), packed BE/LE
+   - 3 × 16-bit signed values (only fits 3, NN says 4)
+   - 2-byte step-size header + 4 × int8 with scaling
+   - Wavelet-style: 4 deltas with shared exponent or step
+3. Initial brute-force found `+47` and `+61` in positions 1 and 3 of
+   a 12-bit BE packing, but `+297` and `+384` didn't fit cleanly.
+   Worth re-trying with more permutations.
+
+Once cracked, the `30 NN` decoder slots into `decode_waveform_v2` and
+the multi-channel decode extends past the high-amplitude regions.
 
 ## Test fixtures
 
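The first packing candidate in step 2 (4 × 12-bit signed values, BE or LE) is easy to brute-force with a generic field splitter. This is a minimal sketch. `unpack_signed` is a hypothetical helper that only covers plain fixed-width two's-complement packings, not the step-size or shared-exponent variants. On the SP0 `30 04` data quoted in step 1, the BE 12-bit split reproduces the partial match described in step 3.

```python
from typing import List


def unpack_signed(data: bytes, width: int, big_endian: bool = True) -> List[int]:
    """Split *data* into consecutive *width*-bit two's-complement fields.

    big_endian=True treats *data* as one big-endian integer and reads fields
    MSB first; big_endian=False treats it as little-endian and reads fields
    from the least-significant end.
    """
    nbits = len(data) * 8
    if nbits % width:
        raise ValueError("data length is not a multiple of the field width")
    value = int.from_bytes(data, "big" if big_endian else "little")
    count = nbits // width
    mask = (1 << width) - 1
    fields = []
    for i in range(count):
        shift = (count - 1 - i) * width if big_endian else i * width
        raw = (value >> shift) & mask
        fields.append(raw - (1 << width) if raw >> (width - 1) else raw)
    return fields


data_30_04 = bytes.fromhex("01102f29803d")  # SP0 segment 4, block 104 (`30 04`)
truth_v = [47, 297, 384, 61]                 # truth V deltas between V[649] and V[653], per the notes
be12 = unpack_signed(data_30_04, 12)
print(be12)                                             # [17, 47, 664, 61]
print([i for i, f in enumerate(be12) if f in truth_v])  # [1, 3]
print(unpack_signed(data_30_04, 12, big_endian=False))
```

Looping `width` over 12 and 16 and `big_endian` over both values covers the two fixed-width rows of step 2 in one pass.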
@@ -350,17 +350,94 @@ def decode_waveform_v2(body: bytes) -> Optional[dict]:
     """
     Decode the body into per-channel sample arrays.
 
-    Returns ``None`` because the full multi-channel decoder is not yet
-    wired up. Tran is partially solved — see :func:`decode_tran_initial`
-    for the initial portion (verified against ground-truth BW exports).
+    Status (2026-05-11 evening — channel-rotation hypothesis CONFIRMED):
+    segments rotate channels in fixed order **Tran → Vert → Long → MicL**.
+    Each channel-segment carries a 2-sample anchor pair in segment-header
+    bytes [14:18] (or in the body preamble for the initial Tran segment)
+    plus a stream of delta blocks for samples 2 onward.
 
-    Status (2026-05-11):
-      - Tran[0:N] correctly decoded by ``decode_tran_initial`` for the
-        first N samples of every fixture (where N = 22 / 42 / 46
-        depending on event).
-      - Subsequent Tran samples + all Vert / Long / MicL samples: open.
-        The block stream after the first data block likely interleaves
-        channels with ``30 NN`` channel-switch markers, but the exact
-        switching rule is still under investigation.
+    Returns ``{"Tran": [...], "Vert": [...], "Long": [...], "MicL": [...]}``
+    with each channel's decoded samples in 16-count units (LSB = 0.005
+    in/s at Normal range). Returns ``None`` if the body cannot be
+    parsed.
     """
-    return None
+    if len(body) < 7 or body[0:3] != b"\x00\x02\x00":
+        return None
+
+    channels = ["Tran", "Vert", "Long", "MicL"]
+    out: dict = {ch: [] for ch in channels}
+
+    # Initial Tran segment: preamble anchor pair + delta blocks before first 40 02.
+    t0 = int.from_bytes(body[3:5], "big", signed=True)
+    t1 = int.from_bytes(body[5:7], "big", signed=True)
+    out["Tran"].extend([t0, t1])
+
+    start = find_data_start(body)
+    if start < 0:
+        return out
+
+    blocks = walk_body(body, start)
+    seg_idx = [i for i, b in enumerate(blocks) if b.tag_hi == 0x40]
+
+    def apply_blocks(channel: str, anchor: int,
+                     block_start: int, block_end: int) -> int:
+        """Apply delta blocks [block_start, block_end) to *channel*'s sample
+        list, starting from *anchor*. Returns the final cumulative value."""
+        cur = anchor
+        for bi in range(block_start, block_end):
+            blk = blocks[bi]
+            if blk.tag_hi == 0x10:
+                for byte in blk.data:
+                    for nib in ((byte >> 4) & 0xF, byte & 0xF):
+                        cur += _s4(nib)
+                        out[channel].append(cur)
+            elif blk.tag_hi == 0x20:
+                for byte in blk.data:
+                    cur += _i8(byte)
+                    out[channel].append(cur)
+            elif blk.tag_hi == 0x00:
+                for _ in range(blk.tag_lo):
+                    out[channel].append(cur)
+            # 30 NN: unknown content; skip.
+            # 40 02: should not occur in segment data.
+        return cur
+
+    # Initial Tran segment: deltas from start of body up to first 40 02 (or end).
+    first_seg = seg_idx[0] if seg_idx else len(blocks)
+    last_tran_value = apply_blocks("Tran", t1, 0, first_seg)
+
+    # Subsequent segments rotate channels. Each segment header carries:
+    # bytes [0:2] and [2:4] = 2 deltas extending the PREVIOUS channel
+    # bytes [14:16] and [16:18] = anchor pair for THIS segment's channel
+    #
+    # Rotation: V, L, M, T, V, L, M, T, ... (initial Tran segment is the
+    # implicit T in the cycle.)
+    rotation = ["Vert", "Long", "MicL", "Tran"]
+    # Track each channel's "running cumulative value" so we can apply the
+    # previous-channel extension deltas at every segment boundary.
+    last_value = {"Tran": last_tran_value, "Vert": None, "Long": None, "MicL": None}
+
+    for k, hi in enumerate(seg_idx):
+        channel = rotation[k % 4]
+        prev_channel = "Tran" if k == 0 else rotation[(k - 1) % 4]
+        header = blocks[hi]
+        if len(header.data) < 18:
+            continue
+        # Extend the PREVIOUS channel by 2 more samples (deltas in bytes [0:4]).
+        prev_d0 = int.from_bytes(header.data[0:2], "big", signed=True)
+        prev_d1 = int.from_bytes(header.data[2:4], "big", signed=True)
+        if last_value[prev_channel] is not None:
+            v = last_value[prev_channel] + prev_d0
+            out[prev_channel].append(v)
+            v += prev_d1
+            out[prev_channel].append(v)
+            last_value[prev_channel] = v
+        # Anchor pair for THIS segment's channel.
+        c0 = int.from_bytes(header.data[14:16], "big", signed=True)
+        c1 = int.from_bytes(header.data[16:18], "big", signed=True)
+        out[channel].extend([c0, c1])
+        # Apply delta blocks for this segment.
+        next_hi = seg_idx[k + 1] if k + 1 < len(seg_idx) else len(blocks)
+        last_value[channel] = apply_blocks(channel, c1, hi + 1, next_hi)
+
+    return out
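`decode_waveform_v2` relies on `_s4` and `_i8`. These helpers are defined elsewhere in `waveform_codec.py` and are not part of this diff. The sketch below assumes they are plain 4-bit and 8-bit two's-complement conversions. It mirrors the `10 NN` / `20 NN` / `00 NN` branches of `apply_blocks`, run on simplified `(tag_hi, tag_lo, data)` tuples. `s4`, `i8` and `apply_deltas` are hypothetical stand-ins, not the project's functions.

```python
from typing import Iterable, List, Tuple

# (tag_hi, tag_lo, data): simplified stand-in for the blocks walk_body() returns.
Block = Tuple[int, int, bytes]


def s4(nib: int) -> int:
    """4-bit two's complement (assumed meaning of `_s4`): 0x0..0x7 -> 0..7, 0x8..0xF -> -8..-1."""
    return nib - 16 if nib & 0x8 else nib


def i8(byte: int) -> int:
    """8-bit two's complement (assumed meaning of `_i8`)."""
    return byte - 256 if byte & 0x80 else byte


def apply_deltas(anchor: int, blocks: Iterable[Block]) -> List[int]:
    """Accumulate deltas from *anchor*, emitting one sample per delta like apply_blocks()."""
    cur, out = anchor, []
    for tag_hi, tag_lo, data in blocks:
        if tag_hi == 0x10:            # two nibble deltas per byte, high nibble first
            for byte in data:
                for nib in ((byte >> 4) & 0xF, byte & 0xF):
                    cur += s4(nib)
                    out.append(cur)
        elif tag_hi == 0x20:          # one int8 delta per byte
            for byte in data:
                cur += i8(byte)
                out.append(cur)
        elif tag_hi == 0x00:          # tag_lo zero deltas: repeat the current value
            out.extend([cur] * tag_lo)
        # 0x30 blocks are skipped, as in decode_waveform_v2.
    return out


print(apply_deltas(10, [(0x10, 1, b"\x1f"), (0x20, 2, b"\x05\xfe"), (0x00, 3, b"")]))
# [11, 10, 15, 13, 13, 13, 13]
```

Because the `30 NN` branch falls through silently, every sample after such a block inherits whatever delta it should have carried. This matches the breakage the status doc describes.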
@@ -263,29 +263,62 @@ def score_against_truth(
 def score_segment_against_all_channels(
     event: FixtureEvent,
     segment_index: int,
-) -> List[Tuple[str, str, int, int, int]]:
-    """For segment *segment_index* of *event*, try decoding it as each channel
-    with each candidate anchor source.
+) -> List[Tuple[str, int, int, int]]:
+    """For segment *segment_index* of *event*, find the best (channel, start_sample)
+    fit.
 
-    Returns rows of (channel_name, anchor_source_label, anchor_value, n_matches, n_compared)
-    sorted by match count descending.
+    For each candidate channel C and each candidate starting truth-sample index s,
+    we pick the anchor that makes the FIRST decoded value match truth[C][s], then
+    score the remaining decoded values against truth[C][s+1 : s+N].
 
-    Anchor source candidates to try:
-      - "header[0:2]" int16 BE from segment header bytes [0:2]
-      - "header[2:4]" int16 BE from segment header bytes [2:4]
-      - "header[4:6]" int16 BE from segment header bytes [4:6]
-      - "header[14:16]" int16 BE from segment header bytes [14:16]
-      - "header[16:18]" int16 BE from segment header bytes [16:18]
-      - "channel[0]" truth[channel][0] (= "this segment starts at sample 0 of this channel")
-      - "channel[prev]" truth[channel][segment_sample_starts[segment_index] - 1]
-        (= "this segment continues from sample N-1 of this channel")
-
-    For each combination of (channel, anchor source, "starts at sample X of channel"),
-    decode the segment and score against truth.
-
-    TODO: implement this — it's the heart of the experiment.
+    Returns rows of (channel_name, start_sample, n_matches, n_compared)
+    sorted by match-count descending.
     """
-    raise NotImplementedError("This is the next experiment to run.")
+    # Block range of this segment: from the segment header (inclusive) up to
+    # the next segment header (exclusive), or end-of-blocks.
+    seg_header_idx = event.segment_starts[segment_index]
+    next_header_idx = (
+        event.segment_starts[segment_index + 1]
+        if segment_index + 1 < len(event.segment_starts)
+        else len(event.blocks)
+    )
+
+    # Decode the segment's data blocks (skip the segment-header block itself).
+    # Use anchor=0 — we'll re-anchor when scoring against each channel.
+    deltas_trajectory = decode_segment_as_channel(
+        event.blocks, seg_header_idx + 1, next_header_idx, anchor=0
+    )
+    if not deltas_trajectory:
+        return []
+
+    n = len(deltas_trajectory)
+    results = []
+
+    for ch in ("Tran", "Vert", "Long"):
+        truth = event.truth.get(ch)
+        if not truth or len(truth) < n + 1:
+            continue
+        # For each candidate starting sample s in truth, check if applying
+        # the deltas starting from truth[s] reproduces truth[s+1:s+n+1].
+        best = (0, -1)
+        for s in range(len(truth) - n):
+            anchor = truth[s]
+            offset = anchor - deltas_trajectory[0] + truth[s + 1] - anchor
+            # Recompute: trajectory[i] = anchor + cumulative_delta_through_i
+            # but we already have deltas_trajectory computed from anchor=0,
+            # so trajectory_relative[i] = anchor + deltas_trajectory[i].
+            matches = 0
+            for i in range(n):
+                if truth[s + i + 1] == anchor + deltas_trajectory[i]:
+                    matches += 1
+            # Note: we could break early on first mismatch for "matches start",
+            # but counting total matches gives a more robust score.
+            if matches > best[0]:
+                best = (matches, s)
+        results.append((ch, best[1], best[0], n))
+
+    results.sort(key=lambda r: -r[2])
+    return results
 
 
 # ── Driver ──────────────────────────────────────────────────────────────────
@@ -310,11 +343,17 @@ def main():
         for si, sample_start in enumerate(event.segment_sample_starts):
             print(f" seg {si}: sample {sample_start}")
 
-        # When score_segment_against_all_channels is implemented:
-        # for si in range(len(event.segment_starts)):
-        #     results = score_segment_against_all_channels(event, si)
-        #     best = results[0]
-        #     print(f" seg {si}: best fit = {best}")
+        for si in range(len(event.segment_starts)):
+            results = score_segment_against_all_channels(event, si)
+            if not results:
+                print(f" seg {si}: (no scorable data)")
+                continue
+            tag = "✓" if results[0][2] / max(results[0][3], 1) > 0.9 else " "
+            top = results[0]
+            print(f" seg {si}: best fit {tag} = {top[0]:<5} "
+                  f"starting at sample {top[1]:>5}, {top[2]:>4}/{top[3]:<4} match"
+                  + (f" (next: {results[1][0]} @{results[1][1]} {results[1][2]}/{results[1][3]})"
+                     if len(results) > 1 else ""))
 
 
 if __name__ == "__main__":
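The scoring loop in `score_segment_against_all_channels` reduces to a sliding alignment. The trajectory, decoded from anchor 0, is re-anchored at every `truth[s]`, and the matches that follow are counted. Below is a self-contained sketch on plain lists that keeps the same (channel, start_sample, matches, compared) row format. `best_alignment` and `rank_channels` are hypothetical names, and the toy truth values are synthetic.

```python
from typing import Dict, List, Sequence, Tuple


def best_alignment(trajectory: Sequence[int], truth: Sequence[int]) -> Tuple[int, int]:
    """(matches, start) maximising truth[start + 1 + i] == truth[start] + trajectory[i]."""
    n = len(trajectory)
    best = (0, -1)
    for s in range(len(truth) - n):  # keeps s + n inside truth
        anchor = truth[s]
        matches = sum(1 for i in range(n) if truth[s + i + 1] == anchor + trajectory[i])
        if matches > best[0]:
            best = (matches, s)
    return best


def rank_channels(trajectory: Sequence[int],
                  truth: Dict[str, Sequence[int]]) -> List[Tuple[str, int, int, int]]:
    """Rows of (channel, start_sample, matches, compared), highest match count first."""
    rows = []
    for ch, samples in truth.items():
        if len(samples) < len(trajectory) + 1:
            continue
        matches, start = best_alignment(trajectory, samples)
        rows.append((ch, start, matches, len(trajectory)))
    rows.sort(key=lambda r: -r[2])
    return rows


# Toy data: deltas -1, +3, 0 from anchor 0 give the trajectory [-1, 2, 2].
toy_truth = {"Tran": [0, 0, 0, 0, 0, 0], "Vert": [5, 7, 6, 9, 9, 4]}
print(rank_channels([-1, 2, 2], toy_truth))
# [('Vert', 1, 3, 3), ('Tran', -1, 0, 3)]
```

Counting every match instead of stopping at the first mismatch means a single skipped `30 NN` block lowers the score rather than hiding an otherwise correct alignment.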
@@ -235,20 +235,51 @@ def test_segment_counter_increments():
 
 
 @pytest.mark.parametrize("event_name", list(FIXTURES_INFO.keys()))
-def test_decode_waveform_v2_returns_none_until_verified(event_name):
-    """
-    The full per-channel decoder is not yet wired up.
-
-    This test ensures decode_waveform_v2 returns ``None`` so callers know
-    to keep using the legacy decoder. When a verified decoder lands,
-    flip this assertion and add ground-truth tests against the bundled
-    TXT exports.
-    """
+def test_decode_waveform_v2_returns_dict(event_name):
+    """decode_waveform_v2 returns a dict with all 4 channels (verified 2026-05-11)."""
     path = _fixture_path(event_name)
     if not os.path.exists(path):
         pytest.skip(f"fixture missing: {path}")
     body = _bw_body(path)
-    assert decode_waveform_v2(body) is None
+    result = decode_waveform_v2(body)
+    assert result is not None
+    assert set(result.keys()) == {"Tran", "Vert", "Long", "MicL"}
 
 
+# Multi-channel ground-truth fixtures. Each row: (path, channel, n_to_verify).
+# These lock in the channel-rotation hypothesis: segments cycle T → V → L → M,
+# with each segment header carrying a 2-sample anchor pair (bytes [14:18])
+# for THIS segment's channel plus 2 continuation deltas (bytes [0:4]) for
+# the PREVIOUS channel.
+MULTICHANNEL_FIXTURES = [
+    # V70 (Mic-heavy, geos all near zero): perfect decode through first segment of each channel.
+    (os.path.join(os.path.dirname(__file__), "fixtures", "5-11-26", "M529LL1L.V70"), "Tran", 512),
+    (os.path.join(os.path.dirname(__file__), "fixtures", "5-11-26", "M529LL1L.V70"), "Vert", 512),
+    (os.path.join(os.path.dirname(__file__), "fixtures", "5-11-26", "M529LL1L.V70"), "Long", 512),
+    # JQ0 (Vert-heavy): Tran byte-exact for 512 samples, Vert only through 258.
+    (os.path.join(os.path.dirname(__file__), "fixtures", "5-11-26", "M529LL1L.JQ0"), "Tran", 512),
+    (os.path.join(os.path.dirname(__file__), "fixtures", "5-11-26", "M529LL1L.JQ0"), "Vert", 258),
+    # SP0 (loud all): Long all 3 segments byte-exact (1536 samples).
+    (os.path.join(os.path.dirname(__file__), "fixtures", "5-11-26", "M529LL1A.SP0"), "Long", 1536),
+]
+
+
+@pytest.mark.parametrize("path,channel,n", MULTICHANNEL_FIXTURES)
+def test_decode_waveform_v2_channels_match_truth(path, channel, n):
+    """Decoded channels match the BW ASCII export byte-exact for the verified ranges."""
+    if not os.path.exists(path):
+        pytest.skip(f"fixture missing: {path}")
+    with open(path, "rb") as f:
+        body = f.read()[43:-26]
+    truth = _full_truth_channel(path, channel)
+    decoded = decode_waveform_v2(body)
+    assert decoded is not None
+    pred = decoded[channel]
+    assert len(pred) >= n, f"only {len(pred)} samples decoded, expected ≥ {n}"
+    for i in range(n):
+        assert pred[i] == truth[i], (
+            f"{os.path.basename(path)} {channel}[{i}]: pred={pred[i]} truth={truth[i]}"
+        )
+
+
 # ── decode_tran_initial: confirmed correct against ground truth ──────────────
@@ -288,11 +319,16 @@ TRAN_INITIAL_FIXTURES = [
 
 
 def _full_truth(path):
-    """Load the BW ASCII truth for an event."""
+    """Load Tran samples (in 16-count units) from the BW ASCII export."""
+    return _full_truth_channel(path, "Tran")
+
+
+def _full_truth_channel(path, channel):
+    """Load one channel's samples (in 16-count units) from the BW ASCII export."""
     import re
+    col_idx = {"Tran": 0, "Vert": 1, "Long": 2, "MicL": 3}[channel]
     with open(path + ".TXT", "r", encoding="utf-8", errors="replace") as f:
         lines = f.read().splitlines()
     # Find columns header.
     header_idx = None
     for i, line in enumerate(lines):
         if "Tran" in line and "Vert" in line and "Long" in line and "MicL" in line:
@@ -306,7 +342,7 @@ def _full_truth(path):
         if len(parts) < 4:
             continue
         try:
-            out.append(round(float(parts[0]) * 200))
+            out.append(round(float(parts[col_idx]) * 200))
         except ValueError:
             continue
     return out
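Only fragments of `_full_truth_channel` appear in these hunks, and the loop between the header search and the `try` block is not shown. The sketch below is a hypothetical stand-in that assumes whitespace-separated numeric rows follow a header line naming all four channels. The demo text is invented for illustration and is not the real BW export layout. `parse_channel` and `COLUMN` are hypothetical names.

```python
from typing import List

COLUMN = {"Tran": 0, "Vert": 1, "Long": 2, "MicL": 3}


def parse_channel(text: str, channel: str) -> List[int]:
    """Hypothetical stand-in for _full_truth_channel(): one column, in 16-count units.

    MicL is reported in dB(L) per the notes, so scaling it by 200 is only
    meaningful for the geophone channels.
    """
    col = COLUMN[channel]
    lines = text.splitlines()
    header = next((i for i, line in enumerate(lines)
                   if all(name in line for name in COLUMN)), None)
    if header is None:
        raise ValueError("no Tran/Vert/Long/MicL header line found")
    out: List[int] = []
    for line in lines[header + 1:]:
        parts = line.split()
        if len(parts) < 4:
            continue                  # blank or short line
        try:
            out.append(round(float(parts[col]) * 200))
        except ValueError:
            continue                  # non-numeric row
    return out


demo = "Tran Vert Long MicL\n0.005 0.010 -0.005 0.0\nx y z w\n0.000 0.015 0.000 0.0\n"
print(parse_channel(demo, "Vert"))
```

Selecting the column by index mirrors the one-line change in this hunk: the old code always read `parts[0]` (Tran), while the new code reads `parts[col_idx]`.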