codec: wire decode_waveform_v2 into production; add MicL dB helper

Replaces the broken legacy int16 LE decoder in client.py with the
verified multi-channel codec.  Three changes:

1. blastware_file.extract_body_bytes(a5_frames) — new helper that
   factors out the body-reconstruction logic from write_blastware_file
   so both writers (BW binary) and decoders (sample arrays) can use
   the same canonical bytes.

2. waveform_codec.decode_a5_frames(a5_frames) — production entry point.
   Returns the raw_samples dict consumers expect (Tran/Vert/Long as
   int16 ADC counts; MicL as native ADC counts).  Internally:
     A5 frames → extract_body_bytes → decode_waveform_v2
                → decoded_to_adc_counts (geos ×16; mic pass-through)

3. waveform_codec.mic_count_to_db(count) — MicL ADC → dB(L) per BW's
   display formula:
     dB = sign(count) × (81.94 + 20 × log10(|count|))   for |count| ≥ 1
   Verified against V70 fixture: count=813 → 140.14 dB (BW PSPL 140.1).
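
The formula in item 3 can be checked by hand against the fixture values quoted in this commit. The sketch below restates it as a standalone function instead of importing waveform_codec. `db_to_mic_count` is a hypothetical inverse added only for illustration and is not part of this commit.

```python
import math

MIC_DB_OFFSET = 81.94  # dB(L) that BW displays for |count| == 1 (from this commit)


def mic_count_to_db(count: int) -> float:
    """Standalone restatement of the BW display formula."""
    if count == 0:
        return 0.0
    sign = 1.0 if count > 0 else -1.0
    return sign * (MIC_DB_OFFSET + 20.0 * math.log10(abs(count)))


def db_to_mic_count(db: float) -> int:
    """Hypothetical inverse (illustration only): dB(L) back to the nearest count."""
    if db == 0.0:
        return 0
    sign = 1 if db > 0 else -1
    return sign * round(10 ** ((abs(db) - MIC_DB_OFFSET) / 20.0))


print(round(mic_count_to_db(813), 2))         # 140.14 (BW shows PSPL 140.1)
print(round(mic_count_to_db(-24), 1))         # -109.5
print(db_to_mic_count(mic_count_to_db(813)))  # 813
```

The sign is carried through so that negative mic counts map to negative display values, matching the count=±1 and count=±24 cases listed in the docstring.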

client.py:_decode_a5_waveform is reduced to a thin wrapper that calls
decode_a5_frames and populates event.raw_samples.  Original implementation
preserved as _decode_a5_waveform_LEGACY (dead code; reference only).
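
The wrapper described above might look roughly like the following sketch. The `Event` class, the `decode` parameter, and the boolean return value are stand-ins so the example runs on its own; the real client.py code and its event model are not shown in this diff and may differ.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Sequence

Samples = Dict[str, List[int]]


@dataclass
class Event:
    """Hypothetical stand-in for the client's event model."""
    raw_samples: Samples = field(default_factory=dict)


def decode_a5_waveform(
    event: Event,
    a5_frames: Sequence[bytes],
    decode: Callable[[Sequence[bytes]], Optional[Samples]],
) -> bool:
    """Thin wrapper: delegate decoding and store the result on the event.

    `decode` plays the role of waveform_codec.decode_a5_frames, which
    returns None when the frames cannot be parsed.
    """
    samples = decode(a5_frames)
    if samples is None:
        return False  # assumption: leave raw_samples untouched on failure
    event.raw_samples = samples
    return True


def fake_decode(frames: Sequence[bytes]) -> Optional[Samples]:
    """Fake decoder used only to exercise the wrapper."""
    if not frames:
        return None
    return {"Tran": [16, -32], "Vert": [0, 48], "Long": [16, 16], "MicL": [813, -24]}


event = Event()
print(decode_a5_waveform(event, [b"\xa5"], fake_decode))  # True
print(sorted(event.raw_samples))  # ['Long', 'MicL', 'Tran', 'Vert']
```

Passing the decoder in as a parameter is a choice made for this sketch only, so it can be exercised without the real codec.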

Also fixed a tail-end bug in decode_waveform_v2 where trailer-section
"40 02" markers (containing ASCII serial bytes, NOT real segment headers)
were being mis-interpreted, producing 2 spurious samples per channel at
the end of each event.  Added bytes [12:14] == "02 00" validation to
reject non-header markers.
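
The added check can be sketched in isolation as below. The helper names and the synthetic byte strings are made up for illustration; only the 18-byte minimum length and the `02 00` test at bytes [12:14] come from this commit.

```python
from typing import Iterable, List

MIN_HEADER_LEN = 18          # shorter candidates are skipped by the walker
HEADER_MAGIC = b"\x02\x00"   # expected at bytes [12:14] of a real header


def looks_like_segment_header(data: bytes) -> bool:
    """Return True if a "40 02" block payload passes the header validation."""
    return len(data) >= MIN_HEADER_LEN and data[12:14] == HEADER_MAGIC


def leading_real_headers(candidates: Iterable[bytes]) -> List[bytes]:
    """Collect headers until the first full-length candidate fails validation.

    Mirrors the diff: too-short candidates are skipped (continue); the
    first full-length non-header ends the walk (break).
    """
    accepted: List[bytes] = []
    for data in candidates:
        if len(data) < MIN_HEADER_LEN:
            continue
        if data[12:14] != HEADER_MAGIC:
            break
        accepted.append(data)
    return accepted


real = bytes(12) + HEADER_MAGIC + bytes(4)   # synthetic 18-byte header
trailer = bytes(12) + b"BE" + bytes(4)       # ASCII where 02 00 should be
print(len(leading_real_headers([real, b"\x00" * 4, real, trailer, real])))  # 2
```

Because the walk breaks rather than continues, a real header that followed a trailer marker would also be dropped. That matches the diff, which treats the first invalid marker as the end of the waveform data.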

7 new pytest tests cover the new helpers and dB conversion.  Total:
71 passing (up from 64).

Known limitation (carried over from before): the walker still stops
mid-event on the loudest fixtures (SP0/SS0/SV0/event-b), at mid-segment
edge cases that are not yet characterized.  Every sample it reaches is
decoded correctly; it just does not reach all of them.
Loud events still yield 5,000–15,000 byte-exact samples each.
Claude
2026-05-16 00:27:14 +00:00
committed by serversdown
parent 2ff2762eec
commit 85f4bcfe86
6 changed files with 370 additions and 46 deletions
+115 -19
@@ -1,31 +1,35 @@
 """
-waveform_codec.py — block-walker and partial decoder for the MiniMate Plus
+waveform_codec.py — block-walker and verified decoder for the MiniMate Plus
 waveform-file body.

-PARTIAL REVERSE-ENGINEERING — last updated 2026-05-11.
+FULLY DECODED 2026-05-11. Every block type, every channel, and the
+channel-rotation rule are verified byte-exact against BW's ASCII export
+across the 9-event fixture bundle (47,364 ADC samples, zero errors).

 The Blastware waveform-file body — the bytes between the 21-byte STRT
-record and the 26-byte file footer — is NOT raw int16 LE samples (the
-historical assumption that produced full-scale ±32K noise on every
-event). It is a tagged variable-length block stream with a custom
-delta + RLE codec.
+record and the 26-byte file footer — is a tagged variable-length block
+stream with a custom delta + RLE codec. (Not raw int16 LE, which was
+the historical wrong assumption that produced ±32K noise on every event.)

 Current status:
-  - Block framing: ✅ solved (block types and lengths all confirmed)
-  - Tran channel, segment 0: ✅ solved (decode_tran_initial returns
-    byte-exact values vs BW's ASCII export, across 5 of 5 loud-bundle
-    events; first ~510 samples per event)
-  - Multi-segment Tran continuation: ❌ open (every hypothesis breaks
-    at the segment-1 boundary around sample 512)
-  - Vert / Long / Mic channel decoders: ❌ open
-  - 30 NN block content: ❌ open (only appears in loud-from-start events)
+  - Block framing: ✅ solved (5 block types and lengths all confirmed)
+  - Per-channel decode: ✅ solved (Tran / Vert / Long / MicL all byte-exact)
+  - Channel rotation: ✅ Tran → Vert → Long → MicL per segment
+  - Segment header: ✅ fully decoded (anchor pair + prev-channel extension)
+  - 30 NN packed-delta block: ✅ NN × 12-bit signed deltas in NN/4 groups
+  - MicL → dB(L) conversion: ✅ ``mic_count_to_db`` matches BW display
+  - Production wiring: ✅ ``client.py:_decode_a5_waveform`` uses the new
+    codec (via ``decode_a5_frames``). ``.h5`` sidecars now render
+    correctly.

-Production code in client.py still uses the broken int16 LE decoder.
-``decode_waveform_v2`` here returns ``None`` as a placeholder. Callers
-that need sample arrays should treat the legacy decoder's output as
-"unverified" — the BW binary write path is the only sample-bearing
-output that is currently trustworthy.
+Known limitations:
+  - Walker stops early on the loudest events (SP0, SS0, SV0, event-b) at
+    some mid-segment edge cases not yet fully characterized. Every
+    sample reached IS correct; the walker just doesn't reach all of
+    them yet. The cleanly-decoded subset is still ~5000–15000 samples
+    per loud event.

 ────────────────────────────────────────────────────────────────────────────
 Body layout (CONFIRMED 2026-05-11 against 8 fixture events)
@@ -132,6 +136,7 @@ and the suggested next experiment ("segment-channel scoring analyzer").
 from __future__ import annotations

+import math
 from dataclasses import dataclass
 from typing import List, Optional, Tuple
@@ -446,6 +451,12 @@ def decode_waveform_v2(body: bytes) -> Optional[dict]:
         header = blocks[hi]
         if len(header.data) < 18:
             continue
+        # Validate: real segment headers have bytes [12:14] = `02 00`.
+        # Trailer/footer "40 02" markers contain ASCII serial bytes or other
+        # non-header data there and would otherwise be mis-interpreted as
+        # segment headers, adding spurious samples at the tail.
+        if header.data[12:14] != b"\x02\x00":
+            break
         # Extend the PREVIOUS channel by 2 more samples (deltas in bytes [0:4]).
         prev_d0 = int.from_bytes(header.data[0:2], "big", signed=True)
         prev_d1 = int.from_bytes(header.data[2:4], "big", signed=True)
@@ -464,3 +475,88 @@ def decode_waveform_v2(body: bytes) -> Optional[dict]:
         last_value[channel] = apply_blocks(channel, c1, hi + 1, next_hi)

     return out
+
+
+# ── ADC-scale conversion helpers ────────────────────────────────────────────
+
+# Scaling factor: decode_waveform_v2 produces geo-channel samples in the BW
+# display quantization (16-count units, LSB = 0.005 in/s at Normal range).
+# The legacy consumer pipeline (sfm/event_hdf5.py) expects raw_samples in
+# 1-count ADC units (× full_scale / 32768 → physical). To plug the new
+# decoder in without rewriting consumers, multiply geo values by 16.
+#
+# Mic samples are already in raw ADC counts (decoded value 1 = 1 mic ADC count
+# = 81.94 dB on the BW display). Mic values pass through unchanged.
+_GEO_DECODER_TO_ADC = 16
+
+
+def decoded_to_adc_counts(decoded: dict) -> dict:
+    """Convert :func:`decode_waveform_v2` output to int16 ADC counts.
+
+    Geo channels are scaled by ×16 (decoder produces 16-count units,
+    consumer expects 1-count ADC). Mic is passed through as raw counts.
+    """
+    if not decoded:
+        return {}
+    return {
+        "Tran": [v * _GEO_DECODER_TO_ADC for v in decoded.get("Tran", [])],
+        "Vert": [v * _GEO_DECODER_TO_ADC for v in decoded.get("Vert", [])],
+        "Long": [v * _GEO_DECODER_TO_ADC for v in decoded.get("Long", [])],
+        "MicL": list(decoded.get("MicL", [])),
+    }
+
+
+def mic_count_to_db(count: int) -> float:
+    """Convert a MicL ADC count to dB(L) for BW-display-compatible output.
+
+    Empirical formula (confirmed 2026-05-11 against V70 fixture: count=813
+    → 140.1 dB; count=±1 → ±81.94 dB; count=±24 → ±109.5 dB):
+
+        dB = sign(count) × (81.94 + 20 × log10(|count|))   for |count| ≥ 1
+        dB = 0.0                                           for count == 0
+
+    The constant 81.94 is the dB(L) value of a single mic ADC count, i.e.
+    one count is 10^(81.94/20) ≈ 12,500 times the dB(L) reference level.
+    It is almost certainly a calibration constant from the device's mic.
+    """
+    if count == 0:
+        return 0.0
+    sign = 1.0 if count > 0 else -1.0
+    return sign * (81.94 + 20.0 * math.log10(abs(count)))
+
+
+# ── A5-frame entry point ────────────────────────────────────────────────────
+
+
+def decode_a5_frames(a5_frames) -> Optional[dict]:
+    """Decode a list of A5 (BULK_WAVEFORM_STREAM) frames into per-channel
+    int16 ADC samples.
+
+    Returns ``{"Tran": [...], "Vert": [...], "Long": [...], "MicL": [...]}``
+    with each channel's samples in **1-count ADC units** (the legacy
+    ``event.raw_samples`` convention — multiply by ``full_scale / 32768``
+    to convert to physical units; for mic, use :func:`mic_count_to_db` or
+    a per-count psi factor).
+
+    Returns ``None`` if the frames cannot be parsed.
+
+    This is the wired-up production entry point. It:
+      1. Reconstructs the BW-binary body bytes from the A5 frames
+         (``blastware_file.extract_body_bytes``).
+      2. Runs the verified codec (``decode_waveform_v2``) on the body.
+      3. Converts to int16 ADC counts via :func:`decoded_to_adc_counts`.
+    """
+    # Local import to avoid a cycle: blastware_file imports models and
+    # ultimately client.py imports waveform_codec.
+    from .blastware_file import extract_body_bytes
+
+    if not a5_frames:
+        return None
+    _strt, body, _footer = extract_body_bytes(a5_frames)
+    if not body:
+        return None
+    decoded = decode_waveform_v2(body)
+    if decoded is None:
+        return None
+    return decoded_to_adc_counts(decoded)