scripts/backfill_sidecars: skip .h5 write when decoder returned no samples
Discovered while dry-running the backfill on the prod store: ~10,000 of ~10,059 events are histogram-mode (filename extension `*H`), and the waveform-body codec wired in via the previous commit doesn't handle histogram-mode bodies — only the waveform-mode codec at §7.6.1 is implemented; the histogram-mode codec at §7.6.2 of the protocol reference is documented but no Python implementation exists yet. Without this guard, every histogram event's .h5 file would be *replaced* with an empty one — strictly worse than today's broken-int16-LE .h5 because any downstream viewer expecting non-empty sample arrays would now error out instead of just rendering wrong values. Fix: after the decoder runs, check whether any channel has samples. If not, skip the .h5 write entirely. The sidecar still regenerates (refreshing the tool_version stamp and any peaks/project info from the DB row), but the existing .h5 is left untouched. This is a *temporary* gate. When the histogram codec lands (next branch: `feat/wire-histogram-codec`), the has_samples check can be removed and the backfill will then correctly regenerate all .h5 files, histogram and waveform alike. Observed effect (dry-run on prod store, 10,059 events): - waveform events (~5%): "[DRY ] would write … + .h5 (would (re)write)" - histogram events (~95%): "[DRY ] would write … + .h5 (skipped-empty-samples)" - sidecar tool_version bump succeeds for both Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -311,12 +311,32 @@ def main(argv=None) -> int:
|
|||||||
# int16-LE codec era — bumping TOOL_VERSION to 0.20.0+
|
# int16-LE codec era — bumping TOOL_VERSION to 0.20.0+
|
||||||
# marks every pre-codec sidecar stale, which now
|
# marks every pre-codec sidecar stale, which now
|
||||||
# correctly cascades to .h5 regeneration too.
|
# correctly cascades to .h5 regeneration too.
|
||||||
|
#
|
||||||
|
# Skip the .h5 write when the decoder couldn't produce
|
||||||
|
# samples — this is the histogram-mode case today
|
||||||
|
# (waveform_codec.decode_waveform_v2 only handles the
|
||||||
|
# waveform-mode body format per §7.6.1; the histogram
|
||||||
|
# codec at §7.6.2 is documented but not yet implemented).
|
||||||
|
# Without this check we'd replace the existing (broken
|
||||||
|
# int16-LE) histogram .h5 with an empty one, which is
|
||||||
|
# arguably worse for any consumer expecting non-empty
|
||||||
|
# sample arrays. When the histogram codec lands, this
|
||||||
|
# check can come out.
|
||||||
|
has_samples = bool(
|
||||||
|
ev.raw_samples and any(
|
||||||
|
ev.raw_samples.get(ch) for ch in ("Tran", "Vert", "Long", "MicL")
|
||||||
|
)
|
||||||
|
)
|
||||||
hdf5_path = store.hdf5_path_for(serial, path.name)
|
hdf5_path = store.hdf5_path_for(serial, path.name)
|
||||||
hdf5_filename = hdf5_path.name if hdf5_path.exists() else None
|
hdf5_filename = hdf5_path.name if hdf5_path.exists() else None
|
||||||
hdf5_action = "kept"
|
hdf5_action = "kept"
|
||||||
need_h5 = not args.skip_hdf5 and (
|
need_h5 = (
|
||||||
args.force or not hdf5_path.exists() or sidecar_stale
|
not args.skip_hdf5
|
||||||
|
and (args.force or not hdf5_path.exists() or sidecar_stale)
|
||||||
|
and has_samples
|
||||||
)
|
)
|
||||||
|
if not has_samples and not args.skip_hdf5:
|
||||||
|
hdf5_action = "skipped-empty-samples"
|
||||||
if need_h5:
|
if need_h5:
|
||||||
if args.dry_run:
|
if args.dry_run:
|
||||||
hdf5_action = "would (re)write"
|
hdf5_action = "would (re)write"
|
||||||
|
|||||||
Reference in New Issue
Block a user