minimateplus: wire read_blastware_file to verified body codec

`read_blastware_file()` was still calling `_decode_samples_4ch_int16_le`
(the retracted int16-LE-interleaved hypothesis) on the body bytes,
producing ±32K noise on every channel of every BW file read from disk.
This was the path watcher-forwarded events take into the system
(via the import endpoint → save_imported_bw → read_blastware_file,
since the watcher doesn't ship A5 frames), so every .h5 sidecar
generated for a forwarded event has been wrong since the feature
shipped.

The fix is mechanical: pass the body bytes straight to
`waveform_codec.decode_waveform_v2()` and run the result through
`decoded_to_adc_counts()` for the 16x geo scaling.  The body already
starts with the codec's exact 7-byte preamble `00 02 00 [Tran[0] BE]
[Tran[1] BE]` — confirmed by `body[:3].hex()` across all 9 fixture
events.  No body-slice adjustment needed.

If the codec returns None (truncated/malformed file, synthetic test
input with no real waveform), fall back to empty channels with a log
warning.  The rest of the event (timestamp, waveform_key, project
strings, sensor_location, peaks-from-samples=0) is still recoverable.

Verified against the bundled fixture corpus:

  V70  Tran/Vert/Long 3328/3328 sample-sets match .TXT ground truth
       within the 0.005 in/s display quantum, every row
  6S0/RG0/AB0/470 (5-8-26)  3328/2304/1280/1280 samples; Vert PPVs
       match BW's own report within 0.02 in/s
  JQ0  3328 samples, Vert PPV 3.384 vs BW 3.465
  SP0/SS0/SV0 (loud events)  3072–3328 samples; known walker
       tail-truncation 1–7 samples per channel, samples reached are
       byte-exact

Existing `test_read_blastware_file_round_trip` (synthetic empty event)
continues to pass thanks to the None-fallback.  Codec verify scripts
(`analysis/verify_quiet_bundle.py`, `analysis/verify_full_decode.py`)
re-run unchanged.

Added two regression-lock tests in tests/test_event_file_io.py:
  - test_read_blastware_file_decodes_via_codec[6 fixtures]
    — verifies sample count + Vert PPV per fixture
  - test_read_blastware_file_v70_samples_match_txt_truth
    — verifies every one of V70's 3328 sample-sets across Tran/Vert/Long
      matches the .TXT ground truth row-by-row within 0.003 in/s

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-05-20 18:13:24 +00:00
parent beca5de06e
commit 31d691b40b
2 changed files with 114 additions and 5 deletions
+23 -5
View File
@@ -27,6 +27,7 @@ from typing import Optional, Union
from .models import Event, PeakValues, ProjectInfo, Timestamp
from . import blastware_file as _bw # avoid circular reference at module load
from .bw_ascii_report import BwAsciiReport
from .waveform_codec import decode_waveform_v2, decoded_to_adc_counts
# Reference pressure for dB(L) → psi conversion (20 µPa expressed in psi).
# Same constant as sfm/sfm_webapp.html so server-side and browser-side
@@ -755,11 +756,28 @@ def read_blastware_file(path: Union[str, Path]) -> Event:
ts1 = _bw._decode_ts_be(footer[2:10])
ts2 = _bw._decode_ts_be(footer[10:18])
# Body: first 6 bytes are the preamble (00 00 ff ff ff ff). Strip
# them before decoding samples. Any trailing tail past the last
# full sample-set is silently truncated by _decode_samples_4ch.
sample_bytes = body[6:] if body[:6].hex() in ("0000ffffffff", "0000FFFFFFFF") else body
samples = _decode_samples_4ch_int16_le(sample_bytes)
# Body: decode via the verified BW waveform-body codec. The body
# starts with the codec's 7-byte preamble ``00 02 00 [Tran[0] BE]
# [Tran[1] BE]`` and continues with the tagged-block stream the codec
# walks. See ``minimateplus/waveform_codec.py`` + ``docs/waveform_codec_re_status.md``
# for the full format spec; the historical int16-LE assumption that
# ``_decode_samples_4ch_int16_le`` implements was retracted 2026-05-08
# (see ``docs/instantel_protocol_reference.md`` §7.6.1).
#
# If decode fails (malformed file, truncated body, synthetic test
# input), fall back to empty channels — the rest of the event
# (timestamp, waveform_key, project strings) is still recoverable
# and useful. The peaks-from-samples helper handles empty input
# gracefully.
decoded = decode_waveform_v2(body)
if decoded is None:
log.warning(
"%s: waveform body codec failed to decode (body starts %s) — "
"raw_samples will be empty", path, body[:8].hex(" "),
)
samples = {"Tran": [], "Vert": [], "Long": [], "MicL": []}
else:
samples = decoded_to_adc_counts(decoded)
# Metadata strings (label-anchored search across the body).
project = _find_first_string(body, b"Project:")