Tool with two subcommands (snapshot, diff) to verify that
backfill_sidecars doesn't wipe the bw_report block from existing
sidecars. Workflow:
  1. snapshot --out before.json (canonical-JSON hash per sidecar)
  2. run backfill
  3. diff --baseline before.json (classifies every sidecar:
     PRESERVED / CHANGED / WIPED / STILL_MISSING / NEW / ADDED / REMOVED)
Exits with code 1 if any WIPED or CHANGED entries are found, 0
otherwise, so it can gate a CI step or a deploy script.
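
The snapshot/diff logic can be sketched roughly as below. This is an
illustration, not the tool's actual code: the function names are
hypothetical, hashing only the bw_report block is an assumption, and
the meaning given to each status label is inferred from its name.

```python
import hashlib
import json


def canonical_hash(obj):
    """SHA-256 of a canonical JSON encoding (sorted keys, compact separators)."""
    blob = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def snapshot(sidecars):
    """Map sidecar path -> hash of its bw_report block (None if the block is absent)."""
    out = {}
    for path, sidecar in sidecars.items():
        block = sidecar.get("bw_report")
        out[path] = canonical_hash(block) if block is not None else None
    return out


def classify(before, after):
    """Assign one status to every sidecar path seen in either snapshot."""
    result = {}
    for path in sorted(set(before) | set(after)):
        if path not in before:
            result[path] = "ADDED"          # sidecar did not exist at baseline
        elif path not in after:
            result[path] = "REMOVED"        # sidecar disappeared after backfill
        else:
            b, a = before[path], after[path]
            if b is None and a is None:
                result[path] = "STILL_MISSING"
            elif b is None:
                result[path] = "NEW"        # bw_report appeared during backfill
            elif a is None:
                result[path] = "WIPED"      # bw_report was lost
            elif a == b:
                result[path] = "PRESERVED"
            else:
                result[path] = "CHANGED"
    return result


def exit_code(result):
    """1 if anything was WIPED or CHANGED, else 0 (suitable for gating CI)."""
    return 1 if any(s in ("WIPED", "CHANGED") for s in result.values()) else 0
```

Sorting keys before hashing means two sidecars that differ only in key
order produce the same hash, so a pure re-serialization does not show
up as CHANGED.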
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
the BE9558 / BE18003 extension-byte case
The bytes at [7]/[11]/[15]/[19] are an annotation field (purpose still
unclear — empirically non-zero on intervals with sub-Hz or unmeasurable
freq), NOT the high byte of the peak count. The N844 fixture corpus
the original RE was done against had zero values in those bytes for
every block, so uint8 and uint16 LE were equivalent there — but on
real BE9558 Tran-drift events and BE18003 Histogram+Continuous events
the uint16 LE interpretation produced peaks up to 268 in/s and 35×
inflated PVS sums.
Cross-correlated against BW's per-interval ASCII export on:
- K558LKZU/LL1P/LL3K → 100% T/V/L/M peak match (1435 blocks each)
- T003LKZR/LL0O/LL1M → 100% T/V/L, 99.3% M (0.05 dB rounding only)
- N599LKZS/LL0L → 100% all channels
- N844 fixture corpus → 100% all channels (unchanged)
Annotations are preserved on every record for future RE. The defensive
_MAX_PEAK_COUNT bound is no longer needed: a uint8 count maxes out at
255 = 1.275 in/s, well below the 10 in/s Normal-range full scale.
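
A minimal sketch of the corrected read, for illustration only. The
annotation offsets 7/11/15/19 come from the description above; placing
the peak bytes at 6/10/14/18 and ordering the channels T/V/L/M are
assumptions, and `decode_peaks` is a hypothetical name rather than the
codec's real function.

```python
# Assumed layout: one peak byte per channel, each followed by its annotation byte.
PEAK_OFFSETS = {"T": 6, "V": 10, "L": 14, "M": 18}


def decode_peaks(block: bytes) -> dict:
    """Read each peak as a uint8 count and keep the adjacent annotation byte."""
    out = {}
    for channel, offset in PEAK_OFFSETS.items():
        out[channel] = {
            "peak_count": block[offset],       # uint8, NOT the low half of a uint16
            "annotation": block[offset + 1],   # purpose unknown, kept for future RE
        }
    return out
```

Returning raw counts keeps the sketch independent of per-channel unit
scaling.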
Synthetic regression test added using the verbatim K558LKZU.RE0H
interval-12 block.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
histogram_codec: lower _MAX_PEAK_COUNT 4096 → 2200. The old ceiling
let extension-byte blocks slip through at up to 20.48 in/s per
channel, producing 35× inflated PVS sums when first deployed to
prod. 2200 covers Normal-range full-scale (10 in/s = 2000 counts)
plus 10% headroom for quantization edge cases.
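
The ceiling arithmetic, as a small worked check. The only input is the
scale stated above (2000 counts = 10 in/s, so 200 counts per in/s); the
constant and function names are illustrative.

```python
COUNTS_PER_IN_S = 200      # 2000 counts = 10 in/s Normal-range full scale
FULL_SCALE_COUNT = 2000

OLD_CEILING = 4096
NEW_CEILING = FULL_SCALE_COUNT + FULL_SCALE_COUNT // 10   # full scale + 10% = 2200


def count_to_in_s(count: int) -> float:
    """Convert a raw peak count to inches per second."""
    return count / COUNTS_PER_IN_S


old_limit_in_s = count_to_in_s(OLD_CEILING)   # 20.48 in/s, twice the device maximum
new_limit_in_s = count_to_in_s(NEW_CEILING)   # 11.0 in/s
```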
backfill_sidecars: also preserve the bw_report block alongside
review + extensions when regenerating sidecars. event_to_sidecar_dict
takes a BwAsciiReport dataclass, not a dict, so for bw_report we
overlay the existing block after regen rather than passing it as a
kwarg.
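
The overlay step might look like the sketch below. `overlay_bw_report`
is a hypothetical helper, and the sidecars are treated as plain dicts;
the real script's structure may differ.

```python
import copy


def overlay_bw_report(existing: dict, regenerated: dict) -> dict:
    """Return the regenerated sidecar with the old bw_report block copied back in."""
    merged = dict(regenerated)
    if "bw_report" in existing:
        # Deep-copy so later edits to the merged sidecar cannot mutate the original.
        merged["bw_report"] = copy.deepcopy(existing["bw_report"])
    return merged
```

If the existing sidecar has no bw_report, the regenerated dict passes
through unchanged, so the helper never invents a block.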
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Discovered while running the backfill on prod: certain histogram
blocks contain an undocumented extension byte format whose naive
uint16 LE interpretation yields physically impossible peak values
(150+ in/s when the device max is 10). Concrete example from
K558LKSG.3I0H block at body+7424:
    bytes [6:10]  = 05 79 69 00
    current code: T_peak = uint16 LE = 0x7905 = 30981 → 154.9 in/s
    reality:      T_peak = byte[6] = 5 → 0.025 in/s (matches BW display)
The high byte (0x79 here) appears to be an extension field — possibly
"time of peak within interval" or a Histogram+Continuous sub-mode
marker. Observed across BE9558 and BE18003 units in prod data; never
appeared in the BE12844 fixture corpus the codec was originally
verified against.
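
The two readings of those four bytes can be reproduced with the
standard `struct` module. The 200 counts per in/s scale is derived from
the 10 in/s = 2000 counts full-scale figure; the variable names are
illustrative.

```python
import struct

COUNTS_PER_IN_S = 200                  # 2000 counts = 10 in/s full scale

raw = bytes.fromhex("05796900")        # bytes [6:10] of the block

# Reading bytes 6-7 as one little-endian uint16 folds the extension byte
# (0x79) into the high half of the count.
(as_uint16,) = struct.unpack_from("<H", raw, 0)
wrong_in_s = as_uint16 / COUNTS_PER_IN_S

# Reading only byte 6 gives the value Blastware displays.
as_uint8 = raw[0]
right_in_s = as_uint8 / COUNTS_PER_IN_S
```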
Effect on prod: 26 out of 1433 blocks in this one event had inflated
peaks, plus dozens of similar events across the fleet → sum(PVS)
inflated from baseline 988 to 34501 (35x). Rolled back via the
pre-backfill snapshot before any UI exposure.
Defensive fix: bounds-check peak counts in `_decode_block`. Any
field exceeding `_MAX_PEAK_COUNT` (4096 = ~20 in/s, well past the
device's 10 in/s Normal-range FS) causes the block to be skipped
entirely. Other valid blocks in the same event still decode
correctly.
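
A sketch of the skip-on-overflow behaviour, not the actual
`_decode_block`. The peak offsets 6/10/14/18, the T/V/L/M channel order,
and the helper names are assumptions made for illustration.

```python
import struct

_MAX_PEAK_COUNT = 4096                                # ~20 in/s, per this commit
PEAK_OFFSETS = {"T": 6, "V": 10, "L": 14, "M": 18}    # assumed layout


def decode_block_or_skip(block):
    """Return {channel: count}, or None if any count exceeds the ceiling."""
    peaks = {}
    for channel, offset in PEAK_OFFSETS.items():
        (count,) = struct.unpack_from("<H", block, offset)
        if count > _MAX_PEAK_COUNT:
            return None                  # drop the whole block, not just the field
        peaks[channel] = count
    return peaks


def decode_event(blocks):
    """Decode every block, keeping only the ones that pass the bounds check."""
    decoded = (decode_block_or_skip(b) for b in blocks)
    return [p for p in decoded if p is not None]
```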
Trade-off: those skipped blocks lose their per-interval data
(peaks + frequencies). Acceptable until the extension format is
reverse-engineered — better than propagating bogus values into PVS
computations downstream.
The 24 existing tests all still pass — the fixtures used during the
original codec development don't exercise the extension-byte case.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Discovered while dry-running the backfill on prod: the waveform store
contains both BW (.AB0*/.N00) and Thor IDF (.IDFW/.IDFH) event files
side-by-side because both go through the same per-serial directory
layout. The script's `_looks_like_event_file` heuristic accepted any
3-4 char extension ending in W or H, which matched both BW and IDF.
The script then routes everything through
`event_file_io.read_blastware_file`, which rejects IDF files with
"not a Blastware file (bad header prefix)" — 3807 errors on prod
out of 7201 total events.
Thor IDF events have their own ingest path
(`WaveformStore.save_imported_idf`) and their sidecars are populated
at ingest from the paired `.IDFW.txt` ASCII report. The backfill
script has no value to add for them — there's no decoder to refresh,
and the sidecar metadata is already correct. Filter them out.
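
One way the filter could look, as a hedged sketch. The function name is
hypothetical, and the base heuristic is reconstructed only from the
description above (3-4 character extension ending in W or H), so the
real `_looks_like_event_file` may check more than this.

```python
from pathlib import Path

IDF_EXTENSIONS = {".IDFW", ".IDFH"}    # Thor IDF events: separate ingest path


def looks_like_bw_event_file(path) -> bool:
    """True for files matching the BW extension heuristic, excluding Thor IDF."""
    ext = Path(path).suffix.upper()
    if ext in IDF_EXTENSIONS:
        return False                   # skip silently; sidecars are set at ingest
    stem = ext[1:]                     # extension without the leading dot
    return len(stem) in (3, 4) and stem[-1] in ("W", "H")
```

Checking the IDF set first keeps the exclusion explicit, so widening
the base heuristic later cannot let IDF files back in.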
After this fix, the prod backfill should run clean: ~3392 BW events
get sidecar+h5 regen as expected; the ~3807 Thor IDF events are
silently skipped.
The proper "IDF backfill" (refresh tool_version stamp on IDF
sidecars by re-running event_to_sidecar_dict against the stored
DB row + sidecar extensions block) is a separate, narrower
follow-up — not blocking the BW backfill rollout.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>