minimateplus: histogram body codec — FULLY DECODED
The histogram-mode event body is now byte-exact decodable.
It is the companion to the waveform body codec; together they cover
every event file the watcher forwards. Cracked in one session via
cross-event correlation against the Blastware (BW) ASCII export.
The §7.6.2 spec in instantel_protocol_reference.md was structurally
correct (32-byte blocks) but the per-sample semantics were
under-documented. Cross-checking block 130 of N844L6Z8.ZR0H
against its TXT row revealed the layout perfectly:
  slot[0] = 10               (constant marker)
  slot[1] = T_peak_count     (× 0.005 → in/s at Normal range)
  slot[2] = T_halfperiod     (freq_Hz = 512 / halfp)
  slot[3] = V_peak_count
  slot[4] = V_halfperiod
  slot[5] = L_peak_count
  slot[6] = L_halfperiod
  slot[7] = MicL_peak_count  (dB via waveform_codec.mic_count_to_db)
  slot[8] = MicL_halfperiod
The `>100 Hz` sentinel is halfperiod ≤ 5: 512/5 = 102.4 Hz is above 100 Hz, while 512/6 ≈ 85.3 Hz is not.
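The geophone conversions in the slot table can be sketched as below. This is an illustrative re-implementation, not the code in histogram_codec.py: the names `peak_count_to_ins` and `halfperiod_to_hz` are hypothetical, returning `None` for the `>100 Hz` sentinel is an assumption about how a caller might want it represented, and the 0.005 scale is only stated for Normal range.

```python
from typing import Optional

GEO_SCALE_NORMAL = 0.005      # in/s per count, Normal range only (from the slot table)
FREQ_NUMERATOR = 512          # freq_Hz = 512 / halfperiod
SENTINEL_MAX_HALFPERIOD = 5   # halfperiod <= 5 is reported as ">100 Hz"


def peak_count_to_ins(count: int) -> float:
    """Convert a geophone peak count to in/s at Normal range."""
    return count * GEO_SCALE_NORMAL


def halfperiod_to_hz(halfperiod: int) -> Optional[float]:
    """Return the frequency in Hz, or None for the '>100 Hz' sentinel.

    Returning None is an assumption of this sketch; it also keeps a
    halfperiod of 0 from reaching the division.
    """
    if halfperiod <= SENTINEL_MAX_HALFPERIOD:
        return None
    return FREQ_NUMERATOR / halfperiod
```

A halfperiod of 16 gives 32.0 Hz; halfperiods of 5 and below collapse to the sentinel.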
Mic dB uses the SAME formula as the waveform codec (sign × (81.94
+ 20·log10(|count|))) — they share the mic ADC calibration constant.
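A minimal sketch of that mic formula, for readers without the waveform codec at hand. The function name `mic_count_to_db_sketch` is hypothetical and is not the signature of waveform_codec.mic_count_to_db; mapping a zero count to 0.0 is an assumption, since the formula itself is undefined there.

```python
import math

MIC_DB_OFFSET = 81.94  # mic ADC calibration constant from the formula above


def mic_count_to_db_sketch(count: int) -> float:
    """sign * (81.94 + 20 * log10(|count|)), per the commit message."""
    if count == 0:
        return 0.0  # assumption: log10(0) is undefined, so zero maps to 0.0
    sign = 1.0 if count > 0 else -1.0
    return sign * (MIC_DB_OFFSET + 20.0 * math.log10(abs(count)))
```

Each factor of 10 in the count adds 20 dB, so a count of 1 sits at the 81.94 offset and a count of 10 at about 101.94.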
Block identification anchor: bytes [22:24] == 0x0000 AND
bytes [28:32] == 1e 0a 00 00. The tail signature is the most
reliable distinguisher from non-block content in the file.
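The anchor test above could look like the sketch below. The byte ranges are read as Python-style half-open slices, which matches their stated widths (2 and 4 bytes). The helper names are hypothetical, and scanning every byte offset rather than assuming 32-byte alignment is a choice made for this sketch, not something the commit specifies.

```python
from typing import List

BLOCK_SIZE = 32
TAIL_SIGNATURE = bytes.fromhex("1e0a0000")  # bytes [28:32] of a block


def looks_like_histogram_block(block: bytes) -> bool:
    """Apply the two-part anchor: zero at [22:24] and the tail signature."""
    return (
        len(block) == BLOCK_SIZE
        and block[22:24] == b"\x00\x00"
        and block[28:32] == TAIL_SIGNATURE
    )


def find_block_offsets(body: bytes) -> List[int]:
    """Return offsets of candidate blocks, skipping non-block bytes."""
    offsets = []
    pos = 0
    while pos + BLOCK_SIZE <= len(body):
        if looks_like_histogram_block(body[pos:pos + BLOCK_SIZE]):
            offsets.append(pos)
            pos += BLOCK_SIZE  # jump past the block just matched
        else:
            pos += 1  # assumption: blocks may start at any offset
    return offsets
```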
Files:
  minimateplus/histogram_codec.py (new)
    Decoder + public API matching the waveform codec's shape:
      walk_body(body) -> records
      decode_histogram_body(body) -> {Tran, Vert, Long, MicL}
      decode_histogram_body_full(body) -> [per-interval dicts]
      half_period_to_hz, geo_count_to_ins helpers
  minimateplus/event_file_io.py (modified)
    read_blastware_file now tries the waveform codec first and falls
    back to the histogram codec on failure. Same output shape, same
    downstream pipeline.
  tests/test_histogram_codec.py (new)
    24 regression locks against the in-repo fixture corpus,
    byte-exact against BW ASCII export for peaks (all 4 channels),
    frequencies (all 4 channels, including >100 Hz sentinel
    handling), block framing, and segment-ID accounting.
  scripts/backfill_sidecars.py (modified)
    The has_samples short-circuit added in the histogram-pending era
    is now a pure defensive guard. Histograms in prod will regen .h5
    files correctly on the next backfill run.
  docs/histogram_codec_re_status.md (updated)
    Supersedes the earlier "in progress" version with the verified
    format and test-coverage summary. Notes a few non-essential
    fields still open (4-byte block metadata, Geo PVS, Mic psi(L);
    none of which are needed for waveform reconstruction).
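The waveform-first, histogram-second order in read_blastware_file can be sketched as below. The decoders are passed in as plain callables so the example stays self-contained. `decode_with_fallback` is a hypothetical name, and treating "failure" as either a raised exception or an empty result is an assumption; the commit does not say which the real code uses.

```python
from typing import Callable, Dict, List

Samples = Dict[str, List[float]]
Decoder = Callable[[bytes], Samples]
CHANNELS = ("Tran", "Vert", "Long", "MicL")


def decode_with_fallback(body: bytes, primary: Decoder, fallback: Decoder) -> Samples:
    """Try the primary decoder; use the fallback if it yields nothing."""
    try:
        samples = primary(body)
    except Exception:
        samples = {}  # assumption: any decoder error counts as a failure
    if any(samples.get(ch) for ch in CHANNELS):
        return samples
    return fallback(body)
```

Because both decoders return the same channel-keyed shape, the caller does not need to know which one produced the samples.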
Total verified coverage: ~3,500 blocks across 5 fixtures, every
decoded field of every block byte-exact against BW.
The watcher-forwarded histogram event corpus on prod (~10,000
events) will now produce correct .h5 sidecars on the next backfill
run. No additional changes needed to the backfill flow — the
existing tool_version-bump cascade picks them up automatically.
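A rough sketch of how such a cascade can decide what to do with an .h5 file. The sidecar field names (`sha`, `tool_version`), the function names, and the action labels other than "skipped-undecodable" and "would (re)write" are hypothetical; only the has_samples test and those two labels come from the diff to scripts/backfill_sidecars.py.

```python
from typing import Optional, Tuple

CHANNELS = ("Tran", "Vert", "Long", "MicL")


def version_tuple(version: str) -> Tuple[int, ...]:
    """Parse '0.20.0' into (0, 20, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def sidecar_is_stale(sidecar: Optional[dict], file_sha: str, tool_version: str) -> bool:
    """Stale when missing, when the sha differs, or when the tool is newer."""
    if not sidecar:
        return True
    if sidecar.get("sha") != file_sha:
        return True
    return version_tuple(sidecar.get("tool_version", "0")) < version_tuple(tool_version)


def has_samples(raw_samples) -> bool:
    """True when at least one channel decoded to a non-empty array."""
    return bool(raw_samples and any(raw_samples.get(ch) for ch in CHANNELS))


def plan_hdf5_action(stale: bool, raw_samples, skip_hdf5: bool = False,
                     dry_run: bool = False) -> str:
    if skip_hdf5:
        return "skipped-by-flag"       # hypothetical label
    if not has_samples(raw_samples):
        return "skipped-undecodable"   # keep whatever .h5 already exists
    if not stale:
        return "up-to-date"            # hypothetical label
    return "would (re)write" if dry_run else "rewritten"  # "rewritten" is hypothetical
```

Comparing parsed tuples rather than raw strings matters once a version component reaches two digits: "0.9.0" sorts after "0.10.0" as text but before it numerically.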
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -307,21 +307,15 @@ def main(argv=None) -> int:
         # (sha mismatch / tool_version too old). The .h5 and
         # the sidecar are both derived from the same decoder
         # output, so if the sidecar is stale, so is the .h5.
-        # This is the path that recovers from the broken-
-        # int16-LE codec era — bumping TOOL_VERSION to 0.20.0+
-        # marks every pre-codec sidecar stale, which now
-        # correctly cascades to .h5 regeneration too.
         #
-        # Skip the .h5 write when the decoder couldn't produce
-        # samples — this is the histogram-mode case today
-        # (waveform_codec.decode_waveform_v2 only handles the
-        # waveform-mode body format per §7.6.1; the histogram
-        # codec at §7.6.2 is documented but not yet implemented).
-        # Without this check we'd replace the existing (broken
-        # int16-LE) histogram .h5 with an empty one, which is
-        # arguably worse for any consumer expecting non-empty
-        # sample arrays. When the histogram codec lands, this
-        # check can come out.
+        # Both waveform and histogram bodies now decode to real
+        # samples via event_file_io.read_blastware_file → either
+        # waveform_codec.decode_waveform_v2 or histogram_codec.
+        # decode_histogram_body. If samples are still empty after
+        # both codecs run, it's a genuine "we can't decode this
+        # file" case (truncated, malformed, or unknown mode);
+        # skip the .h5 write so we don't replace whatever's
+        # there with an empty placeholder.
         has_samples = bool(
             ev.raw_samples and any(
                 ev.raw_samples.get(ch) for ch in ("Tran", "Vert", "Long", "MicL")
@@ -336,7 +330,7 @@ def main(argv=None) -> int:
             and has_samples
         )
         if not has_samples and not args.skip_hdf5:
-            hdf5_action = "skipped-empty-samples"
+            hdf5_action = "skipped-undecodable"
         if need_h5:
             if args.dry_run:
                 hdf5_action = "would (re)write"