histogram_codec: drop _MAX_PEAK_COUNT 4096 → 2200. The old ceiling
let extension-byte blocks slip through at up to 20.48 in/s per
channel, producing 35× inflated PVS sums when first deployed to
prod. 2200 covers Normal-range full-scale (10 in/s = 2000 counts)
plus 10% headroom for quantization edge cases.
backfill_sidecars: also preserve the bw_report block alongside
review + extensions when regenerating sidecars. event_to_sidecar_dict
takes a BwAsciiReport dataclass not a dict, so for bw_report we
overlay the existing block after regen rather than passing as a kwarg.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Discovered while running the backfill on prod: certain histogram
blocks contain an undocumented extension byte format whose naive
uint16 LE interpretation yields physically impossible peak values
(150+ in/s when the device max is 10). Concrete example from
K558LKSG.3I0H block at body+7424:
bytes [6:10] = 05 79 69 00
current code: T_peak = uint16 LE = 0x7905 = 30981 → 154.9 in/s
reality: T_peak = byte[6] = 5 → 0.025 in/s (matches BW display)
The high byte (0x79 here) appears to be an extension field — possibly
"time of peak within interval" or a Histogram+Continuous sub-mode
marker. Observed across BE9558 and BE18003 units in prod data; never
appeared in the BE12844 fixture corpus the codec was originally
verified against.
Effect on prod: 26 out of 1433 blocks in this one event had inflated
peaks, plus dozens of similar events across the fleet → sum(PVS)
inflated from baseline 988 to 34501 (35x). Rolled back via the
pre-backfill snapshot before any UI exposure.
Defensive fix: bounds-check peak counts in `_decode_block`. Any
field exceeding `_MAX_PEAK_COUNT` (4096 = ~20 in/s, well past the
device's 10 in/s Normal-range FS) causes the block to be skipped
entirely. Other valid blocks in the same event still decode
correctly.
Trade-off: those skipped blocks lose their per-interval data
(peaks + frequencies). Acceptable until the extension format is
reverse-engineered — better than propagating bogus values into PVS
computations downstream.
The 24 existing tests all still pass — the fixtures used during the
original codec development don't exercise the extension-byte case.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The histogram-mode event body is now byte-exact decodable.
Companion to the waveform body codec — together they cover every
event file the watcher forwards. Cracked in one session via
cross-event correlation against BW's ASCII export.
The §7.6.2 spec in instantel_protocol_reference.md was structurally
correct (32-byte blocks) but the per-sample semantics were
under-documented. Cross-checking block 130 of N844L6Z8.ZR0H
against its TXT row revealed the layout perfectly:
slot[0] = 10 (constant marker)
slot[1] = T_peak_count (× 0.005 → in/s at Normal range)
slot[2] = T_halfperiod (freq_Hz = 512 / halfp)
slot[3] = V_peak_count
slot[4] = V_halfperiod
slot[5] = L_peak_count
slot[6] = L_halfperiod
slot[7] = MicL_peak_count (dB via waveform_codec.mic_count_to_db)
slot[8] = MicL_halfperiod
The `>100 Hz` sentinel is halfperiod ≤ 5 (since 512/5 = 100 Hz).
Mic dB uses the SAME formula as the waveform codec (sign × (81.94
+ 20·log10(|count|))) — they share the mic ADC calibration constant.
Block identification anchor: bytes [22:24] == 0x0000 AND
bytes [28:32] == 1e 0a 00 00. The tail signature is the most
reliable distinguisher from non-block content in the file.
Files:
minimateplus/histogram_codec.py (new) — decoder + public API
matching the waveform codec's shape:
walk_body(body) -> records
decode_histogram_body(body) -> {Tran, Vert, Long, MicL}
decode_histogram_body_full(body) -> [per-interval dicts]
half_period_to_hz, geo_count_to_ins helpers
minimateplus/event_file_io.py (modified) — read_blastware_file
now tries the waveform codec first, falls back to the histogram
codec on failure. Same output shape, same downstream pipeline.
tests/test_histogram_codec.py (new) — 24 regression locks against
the in-repo fixture corpus, byte-exact against BW ASCII export
for peaks (all 4 channels), frequencies (all 4 channels,
including >100 Hz sentinel handling), block framing, and
segment-ID accounting.
scripts/backfill_sidecars.py (modified) — the has_samples
short-circuit added in the histogram-pending era is now a
pure defensive guard. Histograms in prod will regen .h5 files
correctly on the next backfill run.
docs/histogram_codec_re_status.md (updated) — supersedes the
earlier "in progress" version with the verified format and
test-coverage summary. Notes a few non-essential fields still
open (4-byte block metadata, Geo PVS, Mic psi(L) — none of
which are needed for waveform reconstruction).
Total verified coverage: ~3,500 blocks across 5 fixtures, every
field of every block byte-exact against BW.
The watcher-forwarded histogram event corpus on prod (~10,000
events) will now produce correct .h5 sidecars on the next backfill
run. No additional changes needed to the backfill flow — the
existing tool_version-bump cascade picks them up automatically.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>