Histogram body codec — full RE + peak-count fix that resolves the prod inflation incident #26
Summary
Wires the verified MiniMate Plus histogram body codec into the import path, fully decoding .AB0H event files for the first time. Includes the post-mortem fix for the production incident where a faulty initial codec produced 35× inflated PVS sums on certain units (BE9558 / BE18003 Histogram+Continuous events) and required a DB rollback.
Background
The histogram body codec was previously a stub — only the waveform-mode codec (decode_waveform_v2) worked. read_blastware_file returned empty samples for any histogram event, falling back to the BW ASCII report's event-level peaks for the DB row.
A first cut at the histogram codec (7183b95) interpreted per-channel peak counts as uint16 LE at byte offsets [6:8] / [10:12] / [14:16] / [18:20]. This happened to be byte-exact against the N844 (BE12844) fixture corpus and passed 24 regression tests.
When deployed to prod and run as part of a backfill against the live waveform store, max PVS exploded from 988 → 34501 (35×), driven by histogram events from BE9558 and BE18003 units producing per-channel peaks up to 268 in/s (physically impossible: 10 in/s is the geophone full-scale at Normal range). The production DB was rolled back to the pre-backfill snapshot.
Root cause
The peak count field is uint8 at byte[6] / [10] / [14] / [18], not uint16 LE spanning [6:8] etc. The N844 fixture corpus has zero values in the adjacent bytes [7] / [11] / [15] / [19] for every block, making uint8 and uint16 LE numerically equivalent there. On non-N844 events those adjacent bytes hold an annotation field whose meaning isn't fully understood (empirically non-zero on intervals with sub-Hz or unmeasurable freq); the prior interpretation read them as the high byte of the peak count, producing physically impossible amplitudes.
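The difference between the two readings can be reproduced on a synthetic block. This is a minimal sketch, not the code in minimateplus/histogram_codec.py: the 20-byte block length, the helper names, and the sample byte values are assumptions chosen for illustration; only the four offsets and the uint8 / uint16 LE interpretations come from the description above.

```python
import struct

# Offsets of the per-channel peak counts named in the root-cause text.
PEAK_OFFSETS = (6, 10, 14, 18)


def peaks_uint16_le(block: bytes) -> tuple:
    """Faulty reading: bytes [off:off+2] as a little-endian uint16."""
    return tuple(struct.unpack_from("<H", block, off)[0] for off in PEAK_OFFSETS)


def peaks_uint8(block: bytes) -> tuple:
    """Corrected reading: the count is the single byte at each offset."""
    return tuple(block[off] for off in PEAK_OFFSETS)


def make_block(counts, notes=(0, 0, 0, 0)) -> bytes:
    """Build a hypothetical 20-byte block with counts and adjacent bytes set."""
    block = bytearray(20)  # assumed length, just long enough for offset 19
    for off, count, note in zip(PEAK_OFFSETS, counts, notes):
        block[off] = count      # peak count (uint8)
        block[off + 1] = note   # adjacent annotation byte
    return bytes(block)


# N844-like block: adjacent bytes are zero, so both readings agree.
n844_like = make_block((3, 5, 2, 6))

# Block with a non-zero annotation byte next to the second channel.
other = make_block((3, 5, 2, 6), notes=(0, 0x1A, 0, 0))

print(peaks_uint8(n844_like), peaks_uint16_le(n844_like))
print(peaks_uint8(other), peaks_uint16_le(other))
```

On the second block the uint16 LE reading folds the annotation byte into the count as its high byte (5 + 0x1A × 256), which is the kind of inflation that a zero-filled fixture corpus cannot expose.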
Cross-correlated against BW's per-interval ASCII export on:
K558 (BE9558, Tran-drift fault): 100% T/V/L/M match across 3 events × 1435 blocks each
T003 (BE18003, Histogram+Continuous): 100% T/V/L, 99.3% M (the 0.7% delta is 0.05 dB rounding in BW's display)
N599 (BE13599): 100% all channels
N844 (BE12844, original fixture corpus): 100% all channels — unchanged
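The per-channel match percentages above can be thought of as a simple element-wise comparison between decoded values and the values in BW's per-interval ASCII export. A rough sketch follows; the function name, the flat-list inputs, and the optional tolerance are assumptions, and the actual comparison script is not part of this description.

```python
def match_percent(decoded, reference, tol=0.0):
    """Percentage of intervals where decoded and reference values agree.

    decoded / reference: equal-length sequences of per-interval values for
    one channel (hypothetical inputs). tol is an absolute tolerance.
    """
    if len(decoded) != len(reference):
        raise ValueError("sequences must cover the same intervals")
    if not decoded:
        raise ValueError("no intervals to compare")
    hits = sum(1 for d, r in zip(decoded, reference) if abs(d - r) <= tol)
    return 100.0 * hits / len(decoded)


# Synthetic example: three of four intervals agree exactly.
print(match_percent([1, 2, 3, 4], [1, 2, 3, 5]))
```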
Secondary fix: bw_report preservation in backfill
scripts/backfill_sidecars.py regenerates .sfm.json sidecars from a rebuilt Event object. The Event is built from the binary + .a5.pkl + DB row, but the bw_report block (parsed from the original _ASCII.TXT at ingest time, then discarded — the .TXT itself isn't stored) wasn't preserved across regen. Pre-fix, every backfill silently wiped bw_report from every sidecar.
Now preserved verbatim, alongside the existing review and extensions preservation.
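The preservation step can be pictured as a merge of the old sidecar's protected blocks into the freshly rebuilt one. This is a sketch under assumptions: sidecars are treated as plain dicts loaded from .sfm.json, and merge_preserved and PRESERVED_KEYS are hypothetical names, not identifiers from scripts/backfill_sidecars.py.

```python
import copy

# Blocks that cannot be rebuilt from the binary + .a5.pkl + DB row.
PRESERVED_KEYS = ("bw_report", "review", "extensions")


def merge_preserved(old_sidecar: dict, rebuilt: dict) -> dict:
    """Return the rebuilt sidecar with protected blocks copied from the old one."""
    merged = dict(rebuilt)
    for key in PRESERVED_KEYS:
        if key in old_sidecar:
            # Copy verbatim; deep copy so later edits don't alias the old dict.
            merged[key] = copy.deepcopy(old_sidecar[key])
    return merged


old = {"peaks": {"T": 9.9}, "bw_report": {"pvs": 0.293}, "review": {"ok": True}}
rebuilt = {"peaks": {"T": 0.293}}
print(merge_preserved(old, rebuilt))
```

Regenerated fields such as the peaks come from the rebuilt object, while bw_report and review survive unchanged; a key absent from the old sidecar is simply not added.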
Validation
Tier-by-tier verification on dev (10.0.0.44) against an rsync'd snapshot of prod's DB + waveform store:
| Layer | Result |
| --- | --- |
| 26 unit tests (24 N844 byte-exact + 2 synthetic K558 regression) | ✅ |
| Backfill PVS sum (snapshot ~14k events) | 2059 → 1839 (10% reduction = K558 inflation removed) |
| Idempotency (re-run --force) | identical output, max_pvs unchanged |
| bw_report preservation (10317 sidecars) | 0 WIPED, 0 CHANGED |
| Runtime API: K558 waveform.json | max_abs = 0.293 in/s, matches DB peak byte-exact |
| Runtime API: N844 baseline | max 0.005-0.01 in/s = clean noise floor |
| Top-10 PVS events post-backfill | 2 legit MiniMate Plus high-amplitude events + 8 Micromate events (sensible distribution) |
Known limitations
byte[5]!=0 histogram sub-format (filed as separate work): a handful of events (T190LD5Q, O121L4L1 in prod) use a histogram body format my walker doesn't recognize (byte[5] is non-zero instead of zero). Old codec and new codec both produce 0 valid blocks on these — DB peaks come from the bw_report ASCII overlay (which is what BW computed from the same binary, so still correct). Pre-existing, not a regression. Will need binary + ASCII pairs from a few byte[5]!=0 events to RE the format.
Deploy procedure
Rebuild sfm service against this branch
Optionally run scripts/backfill_sidecars.py --force to apply the codec correction to existing events (will reduce inflated K558-style PVS values to their correct magnitudes)
The scripts/check_bw_report_preservation.py tool lets you gate the backfill on preservation if you want belt-and-suspenders — capture a baseline, run backfill, diff to confirm 0 WIPED before/after
Files
minimateplus/histogram_codec.py — new, full codec implementation (uint8 peaks, uint16 LE half-periods, annotation byte preserved as record["annotations"] tuple for future RE)
minimateplus/event_file_io.py — wires codec into read_blastware_file, leaves peak_values=None when samples can't be decoded (so DB peaks fall back to bw_report rather than getting overwritten with zeros)
scripts/backfill_sidecars.py — preserves bw_report block on regen, filters Thor IDF files from the BW-only walk, cascades .h5 regen when sidecar is stale
scripts/check_bw_report_preservation.py — new, snapshot/diff tool for verifying backfill doesn't wipe bw_report (used in dev validation)
tests/test_histogram_codec.py — 26 tests including a synthetic K558 interval-12 block as regression lock for the uint8 fix
docs/histogram_codec_re_status.md — status doc with the RE history including the uint8 retraction
Dockerfile — includes scripts/ and micromate/ directories (were missing, broke Thor IDF endpoint)
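The peak_values=None behaviour in event_file_io.py matters because None and a dict of zeros mean different things downstream. A minimal sketch of that distinction, assuming peaks are passed around as per-channel dicts and that decoded peaks take precedence when present; resolve_db_peaks is a hypothetical helper, not a function in this branch.

```python
from typing import Optional


def resolve_db_peaks(peak_values: Optional[dict],
                     bw_report_peaks: Optional[dict]) -> Optional[dict]:
    """Pick the peaks to store in the DB row.

    None means "samples could not be decoded", so fall back to the peaks
    from bw_report. A dict of zeros would look like a real measurement and
    would overwrite the fallback values.
    """
    if peak_values is not None:
        return peak_values
    return bw_report_peaks


bw = {"T": 0.12, "V": 0.08, "L": 0.10}
print(resolve_db_peaks(None, bw))                             # falls back to bw
print(resolve_db_peaks({"T": 0.11, "V": 0.08, "L": 0.10}, bw))  # decoded wins
```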
Two-step tool to verify that backfill_sidecars doesn't wipe the bw_report block from existing sidecars.
Workflow:
1. snapshot --out before.json (canonical-JSON hash per sidecar)
2. run backfill
3. diff --baseline before.json (classifies every sidecar: PRESERVED / CHANGED / WIPED / STILL_MISSING / NEW / ADDED / REMOVED)
Exit code 1 if any WIPED or CHANGED entries found, 0 otherwise, so it can gate a CI step or a deploy script.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
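The snapshot/diff idea can be sketched roughly as below. This is not the code in scripts/check_bw_report_preservation.py: the SHA-256 choice, the function names, and the exact meaning assigned to each status (in particular NEW vs ADDED) are assumptions inferred from the status names. Only the status labels, the canonical-JSON hashing, and the exit-code rule come from the description.

```python
import hashlib
import json


def bw_report_hash(sidecar: dict):
    """Hash the bw_report block in canonical JSON form, or None if absent."""
    report = sidecar.get("bw_report")
    if report is None:
        return None
    # Sorted keys and fixed separators make the serialization order-independent.
    canonical = json.dumps(report, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def classify(before: dict, after: dict) -> dict:
    """Map each sidecar path to a status; inputs map path -> hash or None."""
    result = {}
    for path in before.keys() | after.keys():
        if path not in after:
            result[path] = "REMOVED"        # sidecar file is gone (assumed meaning)
        elif path not in before:
            result[path] = "NEW"            # sidecar did not exist at snapshot time
        else:
            old, new = before[path], after[path]
            if old is None and new is None:
                result[path] = "STILL_MISSING"
            elif old is None:
                result[path] = "ADDED"      # bw_report appeared (assumed meaning)
            elif new is None:
                result[path] = "WIPED"
            elif old == new:
                result[path] = "PRESERVED"
            else:
                result[path] = "CHANGED"
    return result


def exit_code(result: dict) -> int:
    """1 if any sidecar was WIPED or CHANGED, else 0."""
    return 1 if any(s in ("WIPED", "CHANGED") for s in result.values()) else 0
```

Hashing a canonical serialization rather than the raw file bytes means a sidecar whose keys were merely reordered on regen still counts as PRESERVED.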