The histogram-mode event body is now byte-exact decodable.
Companion to the waveform body codec — together they cover every
event file the watcher forwards. Cracked in one session via
cross-event correlation against BW's ASCII export.
The §7.6.2 spec in instantel_protocol_reference.md was structurally
correct (32-byte blocks) but the per-sample semantics were
under-documented. Cross-checking block 130 of N844L6Z8.ZR0H
against its TXT row revealed the layout perfectly:
slot[0] = 10 (constant marker)
slot[1] = T_peak_count (× 0.005 → in/s at Normal range)
slot[2] = T_halfperiod (freq_Hz = 512 / halfp)
slot[3] = V_peak_count
slot[4] = V_halfperiod
slot[5] = L_peak_count
slot[6] = L_halfperiod
slot[7] = MicL_peak_count (dB via waveform_codec.mic_count_to_db)
slot[8] = MicL_halfperiod
The `>100 Hz` sentinel is halfperiod ≤ 5 (since 512/5 = 100 Hz).
Mic dB uses the SAME formula as the waveform codec (sign × (81.94
+ 20·log10(|count|))) — they share the mic ADC calibration constant.
Block identification anchor: bytes [22:24] == 0x0000 AND
bytes [28:32] == 1e 0a 00 00. The tail signature is the most
reliable distinguisher from non-block content in the file.
Files:
minimateplus/histogram_codec.py (new) — decoder + public API
matching the waveform codec's shape:
walk_body(body) -> records
decode_histogram_body(body) -> {Tran, Vert, Long, MicL}
decode_histogram_body_full(body) -> [per-interval dicts]
half_period_to_hz, geo_count_to_ins helpers
minimateplus/event_file_io.py (modified) — read_blastware_file
now tries the waveform codec first, falls back to the histogram
codec on failure. Same output shape, same downstream pipeline.
tests/test_histogram_codec.py (new) — 24 regression locks against
the in-repo fixture corpus, byte-exact against BW ASCII export
for peaks (all 4 channels), frequencies (all 4 channels,
including >100 Hz sentinel handling), block framing, and
segment-ID accounting.
scripts/backfill_sidecars.py (modified) — the has_samples
short-circuit added in the histogram-pending era is now a
pure defensive guard. Histograms in prod will regen .h5 files
correctly on the next backfill run.
docs/histogram_codec_re_status.md (updated) — supersedes the
earlier "in progress" version with the verified format and
test-coverage summary. Notes a few non-essential fields still
open (4-byte block metadata, Geo PVS, Mic psi(L) — none of
which are needed for waveform reconstruction).
Total verified coverage: ~3,500 blocks across 5 fixtures, every
field of every block byte-exact against BW.
The watcher-forwarded histogram event corpus on prod (~10,000
events) will now produce correct .h5 sidecars on the next backfill
run. No additional changes needed to the backfill flow — the
existing tool_version-bump cascade picks them up automatically.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Captures everything learned in the 2026-05-20 session before scope
forced a pause:
- Block framing is solved: 32-byte blocks, one per histogram
interval, signature byte pattern `[22:24]=0x0000` +
`[28:32]=0x1e 0x0a 0x00 0x00` reliably identifies data blocks.
- Block count = interval count (791 blocks in N844L20G.630H for
a TXT-reported 792 intervals).
- Sample[0] = Tran peak in 0.0005 in/s/count units (verified on
one event — needs cross-event confirmation).
- Samples 1-8 → channel/metric mapping is still open. None of
the obvious layouts (peak-then-freq alternating, all-peaks-
then-all-freqs, per-channel 3-tuples) match the TXT values
across multiple blocks. Likely needs a higher-activity
fixture (current N844 corpus is all noise-floor data) to
disambiguate.
- `>100 Hz` sentinel encoding in the binary is unknown.
- 4-byte variable metadata field at block[24:28] needs
correlation work against TXT columns.
Doc mirrors the structure of docs/waveform_codec_re_status.md so
a future RE session has a familiar entry point. Includes the
suggested attack plan + the code seam where the eventual decoder
will land (minimateplus/histogram_codec.py).
The §7.6.2 spec in instantel_protocol_reference.md is structurally
correct but doesn't pin down per-sample semantics — this doc
supersedes it where they conflict on confidence level.
No code shipped on this branch. When the codec is cracked, the
plan is to land minimateplus/histogram_codec.py + wire into
event_file_io.read_blastware_file() + remove the has_samples
short-circuit from scripts/backfill_sidecars.py.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fixes a data-loss bug discovered while dry-running the backfill against
the prod store.
Symptom: every histogram event in the store has its body decoded by
read_blastware_file → codec returns None → samples = empty dict →
``ev.peak_values = _peaks_from_samples(empty)`` returns
``PeakValues(0, 0, 0, 0, 0)`` (NOT None). The backfill script's
existing "seed from DB row when peak_values is None" branch then
correctly *skips* the seeding, and the all-zeros PeakValues flows into
``db.insert_events()``'s UPSERT path, OVERWRITING the existing good DB
peak values for that event (which were populated from the paired BW
ASCII report at ingest).
Net effect: running the backfill on prod would have wiped the PPV /
mic / vector-sum columns for ~10,000 histogram events.
Fix: only compute peaks-from-samples when there are actually samples.
For events the codec couldn't decode (histogram-mode bodies, until
the §7.6.2 histogram codec is wired in), leave peak_values=None as
the "we don't know" signal. Downstream consumers:
- backfill_sidecars.py — its existing ``if ev.peak_values is None:``
branch (line 243) seeds from the DB row, preserving the real
BW-report peaks across the regen.
- WaveformStore.save_imported_bw — apply_report_to_event overlays
peaks from the paired BW ASCII report when one was uploaded.
Histogram imports without a paired report end up with NULL peaks
in the DB, which is correct (better than zeros — clearly says
"no peak data available" rather than "peaks are exactly zero").
Updated the existing synthetic-event round-trip test to expect
peak_values=None for the no-real-body case, which is the truth now.
The 7 fixture-corpus regression tests for real BW waveforms continue
to pass — those have decodable samples, so peak_values is still
populated from the codec output as before.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Discovered while dry-running the backfill on the prod store: ~10,000
of ~10,059 events are histogram-mode (filename extension `*H`), and
the waveform-body codec wired in via the previous commit doesn't
handle histogram-mode bodies — only the waveform-mode codec at
§7.6.1 is implemented; the histogram-mode codec at §7.6.2 of the
protocol reference is documented but no Python implementation
exists yet.
Without this guard, every histogram event's .h5 file would be
*replaced* with an empty one — strictly worse than today's
broken-int16-LE .h5 because any downstream viewer expecting
non-empty sample arrays would now error out instead of just
rendering wrong values.
Fix: after the decoder runs, check whether any channel has samples.
If not, skip the .h5 write entirely. The sidecar still regenerates
(refreshing the tool_version stamp and any peaks/project info from
the DB row), but the existing .h5 is left untouched.
This is a *temporary* gate. When the histogram codec lands (next
branch: `feat/wire-histogram-codec`), the has_samples check can be
removed and the backfill will then correctly regenerate all .h5
files, histogram and waveform alike.
Observed effect (dry-run on prod store, 10,059 events):
- waveform events (~5%): "[DRY ] would write … + .h5 (would (re)write)"
- histogram events (~95%): "[DRY ] would write … + .h5 (skipped-empty-samples)"
- sidecar tool_version bump succeeds for both
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>