the BE9558 / BE18003 extension-byte case
The bytes at [7]/[11]/[15]/[19] are an annotation field (purpose still
unclear — empirically non-zero on intervals with sub-Hz or unmeasurable
freq), NOT the high byte of the peak count. The N844 fixture corpus
the original RE was done against had zero values in those bytes for
every block, so uint8 and uint16 LE were equivalent there — but on
real BE9558 Tran-drift events and BE18003 Histogram+Continuous events
the uint16 LE interpretation produced peaks up to 268 in/s and 35×
inflated PVS sums.
Cross-correlated against BW's per-interval ASCII export on:
- K558LKZU/LL1P/LL3K → 100% T/V/L/M peak match (1435 blocks each)
- T003LKZR/LL0O/LL1M → 100% T/V/L, 99.3% M (0.05 dB rounding only)
- N599LKZS/LL0L → 100% all channels
- N844 fixture corpus → 100% all channels (unchanged)
Annotations preserved on every record for future RE; the defensive
_MAX_PEAK_COUNT bound is no longer needed (uint8 maxes at 1.275 in/s,
well below any physical limit).
Synthetic regression test added using the verbatim K558LKZU.RE0H
interval-12 block.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The histogram-mode event body is now byte-exact decodable.
Companion to the waveform body codec — together they cover every
event file the watcher forwards. Cracked in one session via
cross-event correlation against BW's ASCII export.
The §7.6.2 spec in instantel_protocol_reference.md was structurally
correct (32-byte blocks) but the per-sample semantics were
under-documented. Cross-checking block 130 of N844L6Z8.ZR0H
against its TXT row revealed the layout perfectly:
slot[0] = 10 (constant marker)
slot[1] = T_peak_count (× 0.005 → in/s at Normal range)
slot[2] = T_halfperiod (freq_Hz = 512 / halfp)
slot[3] = V_peak_count
slot[4] = V_halfperiod
slot[5] = L_peak_count
slot[6] = L_halfperiod
slot[7] = MicL_peak_count (dB via waveform_codec.mic_count_to_db)
slot[8] = MicL_halfperiod
The `>100 Hz` sentinel is halfperiod ≤ 5 (since 512/5 = 100 Hz).
Mic dB uses the SAME formula as the waveform codec (sign × (81.94
+ 20·log10(|count|))) — they share the mic ADC calibration constant.
Block identification anchor: bytes [22:24] == 0x0000 AND
bytes [28:32] == 1e 0a 00 00. The tail signature is the most
reliable distinguisher from non-block content in the file.
Files:
minimateplus/histogram_codec.py (new) — decoder + public API
matching the waveform codec's shape:
walk_body(body) -> records
decode_histogram_body(body) -> {Tran, Vert, Long, MicL}
decode_histogram_body_full(body) -> [per-interval dicts]
half_period_to_hz, geo_count_to_ins helpers
minimateplus/event_file_io.py (modified) — read_blastware_file
now tries the waveform codec first, falls back to the histogram
codec on failure. Same output shape, same downstream pipeline.
tests/test_histogram_codec.py (new) — 24 regression locks against
the in-repo fixture corpus, byte-exact against BW ASCII export
for peaks (all 4 channels), frequencies (all 4 channels,
including >100 Hz sentinel handling), block framing, and
segment-ID accounting.
scripts/backfill_sidecars.py (modified) — the has_samples
short-circuit added in the histogram-pending era is now a
pure defensive guard. Histograms in prod will regen .h5 files
correctly on the next backfill run.
docs/histogram_codec_re_status.md (updated) — supersedes the
earlier "in progress" version with the verified format and
test-coverage summary. Notes a few non-essential fields still
open (4-byte block metadata, Geo PVS, Mic psi(L) — none of
which are needed for waveform reconstruction).
Total verified coverage: ~3,500 blocks across 5 fixtures, every
field of every block byte-exact against BW.
The watcher-forwarded histogram event corpus on prod (~10,000
events) will now produce correct .h5 sidecars on the next backfill
run. No additional changes needed to the backfill flow — the
existing tool_version-bump cascade picks them up automatically.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Captures everything learned in the 2026-05-20 session before scope
forced a pause:
- Block framing is solved: 32-byte blocks, one per histogram
interval, signature byte pattern `[22:24]=0x0000` +
`[28:32]=0x1e 0x0a 0x00 0x00` reliably identifies data blocks.
- Block count = interval count (791 blocks in N844L20G.630H for
a TXT-reported 792 intervals).
- Sample[0] = Tran peak in 0.0005 in/s/count units (verified on
one event — needs cross-event confirmation).
- Samples 1-8 → channel/metric mapping is still open. None of
the obvious layouts (peak-then-freq alternating, all-peaks-
then-all-freqs, per-channel 3-tuples) match the TXT values
across multiple blocks. Likely needs a higher-activity
fixture (current N844 corpus is all noise-floor data) to
disambiguate.
- `>100 Hz` sentinel encoding in the binary is unknown.
- 4-byte variable metadata field at block[24:28] needs
correlation work against TXT columns.
Doc mirrors the structure of docs/waveform_codec_re_status.md so
a future RE session has a familiar entry point. Includes the
suggested attack plan + the code seam where the eventual decoder
will land (minimateplus/histogram_codec.py).
The §7.6.2 spec in instantel_protocol_reference.md is structurally
correct but doesn't pin down per-sample semantics — this doc
supersedes it where they conflict on confidence level.
No code shipped on this branch. When the codec is cracked, the
plan is to land minimateplus/histogram_codec.py + wire into
event_file_io.read_blastware_file() + remove the has_samples
short-circuit from scripts/backfill_sidecars.py.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>