scripts/backfill_sidecars: cascade h5 regen when sidecar is stale + bump TOOL_VERSION

Two coupled changes that close the rollout gap left by the
read_blastware_file codec wiring:

1. minimateplus/event_file_io.py: bump TOOL_VERSION from 0.16.1 to
   0.20.0.  This is the version stamp the backfill script reads from
   each sidecar's source.tool_version field to detect "this sidecar
   was written before the current decoder shipped, regenerate it."
   Bumping past every value baked into existing prod sidecars flags
   them all as stale on the next backfill run — which is exactly what
   we want, since every pre-codec-wiring sidecar was written by the
   retracted int16-LE decoder.

2. scripts/backfill_sidecars.py: when the sidecar is being
   regenerated this iteration (sha mismatch, tool_version too old,
   or --force), also regenerate the .h5.  Previously the .h5 logic
   only rewrote when --force was passed or the file was missing —
   so a tool_version-driven sidecar regen left the broken .h5 in
   place forever.  Added a `sidecar_stale` boolean to track the
   "we're rewriting the sidecar this iteration" state and wired it
   into the h5 need-rewrite check.

   Path coverage (verified by trace):
     - sidecar missing  → both regen
     - --force          → both regen
     - sha mismatch     → both regen
     - tool_ver too old → both regen (THE post-codec-wiring case)
     - everything OK    → skip iteration entirely (h5 untouched)
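
   The path coverage above can be reproduced with a small stand-alone
   sketch. It is an illustration only: `parse_version`,
   `sidecar_is_stale`, `need_h5`, and `plan` are hypothetical names that
   mirror the checks described here, not the script's actual helpers
   (the script uses `_vt` and inline logic), and the dotted-integer
   version format is an assumption.

```python
from typing import Optional, Tuple

TOOL_VERSION = "0.20.0"  # the value this commit bumps to


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '0.16.1' into (0, 16, 1) so comparison is numeric, not lexical."""
    return tuple(int(part) for part in text.split("."))


def sidecar_is_stale(sidecar_exists: bool, sha_ok: bool,
                     sidecar_version: Optional[str], force: bool) -> bool:
    """True when the sidecar must be rewritten this iteration."""
    if force or not sidecar_exists:
        return True
    ver_ok = parse_version(sidecar_version) >= parse_version(TOOL_VERSION)
    return not (sha_ok and ver_ok)


def need_h5(h5_exists: bool, sidecar_stale: bool, force: bool,
            skip_hdf5: bool = False) -> bool:
    """Same shape as the need-rewrite check described in item 2."""
    return not skip_hdf5 and (force or not h5_exists or sidecar_stale)


def plan(sidecar_exists: bool, sha_ok: bool, sidecar_version: Optional[str],
         h5_exists: bool, force: bool = False,
         skip_hdf5: bool = False) -> Tuple[str, str]:
    """Return (sidecar_action, h5_action) for one event file."""
    stale = sidecar_is_stale(sidecar_exists, sha_ok, sidecar_version, force)
    if not stale:
        # Iteration is skipped before the .h5 logic is reached.
        return ("skip", "untouched")
    h5 = "regen" if need_h5(h5_exists, stale, force, skip_hdf5) else "kept"
    return ("regen", h5)
```

   Comparing parsed tuples rather than raw strings matters here: as
   strings, "0.9.0" sorts after "0.16.1", which would let an old sidecar
   pass the version check.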

Operator review state (review.false_trigger, reviewer, notes) and
the sidecar's extensions block are preserved across regen by the
existing read-existing-sidecar / pass-into-event_to_sidecar_dict
path — unchanged from prior behavior.
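
As a rough picture of what "preserved across regen" means, the sketch
below carries the operator-owned blocks from an old sidecar into a
freshly generated one. `merge_preserved` and `PRESERVED_KEYS` are
hypothetical names used for illustration; the real path passes the
existing values into event_to_sidecar_dict, whose signature is not
shown in this commit.

```python
import copy
from typing import Optional

# Operator-owned blocks named in the commit message (assumed top-level keys).
PRESERVED_KEYS = ("review", "extensions")


def merge_preserved(existing: Optional[dict], fresh: dict) -> dict:
    """Return the fresh sidecar dict with operator-owned blocks carried over.

    Decoder-derived fields always come from `fresh`; only the keys in
    PRESERVED_KEYS are copied from `existing`, when present.
    """
    merged = copy.deepcopy(fresh)
    if existing:
        for key in PRESERVED_KEYS:
            if key in existing:
                merged[key] = copy.deepcopy(existing[key])
    return merged
```

With no existing sidecar (`existing` is None) the fresh dict is
returned unchanged, which matches the "sidecar missing" path.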

Deploy procedure (on prod):
  1. Pull this change + the read_blastware_file codec wiring.
  2. `python scripts/backfill_sidecars.py --dry-run` to preview.
     Every sidecar with source.tool_version < 0.20.0 will show as
     "would (re)write".
  3. Run for real (drop --dry-run).  Expect every pre-fix event
     to regen.  Big stores may take a while.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 18:24:06 +00:00
parent 31d691b40b
commit e8682d49ad
2 changed files with 36 additions and 6 deletions
minimateplus/event_file_io.py  +1 -1
@@ -48,7 +48,7 @@ SIDECAR_KIND = "sfm.event"
 # bumped without a `pip install` re-run — leading to confusing stale
 # version stamps in sidecars. Bump this constant and CHANGELOG.md
 # together at release time.
-TOOL_VERSION = "0.16.1"
+TOOL_VERSION = "0.20.0"
 try:
     # Best-effort: prefer the installed metadata when it's NEWER than the
scripts/backfill_sidecars.py  +35 -5
@@ -12,8 +12,20 @@ Walks `<store_root>/<serial>/<filename>` and for each BW event file:
 parsing the BW binary directly (peaks computed from samples).
 Clean waveform (.h5):
-  - Skip when <filename>.h5 already exists (idempotent).
-  - Else write from .a5.pkl (preferred) or BW binary parse (fallback).
+  - Regenerated whenever the sidecar is regenerated (sha mismatch
+    OR sidecar.source.tool_version < current TOOL_VERSION OR --force).
+    The .h5 and the sidecar both come from the same decoder output,
+    so if the sidecar is stale the .h5 is too.
+  - Written when missing.
+  - --skip-hdf5 turns off all .h5 writes.
+Typical use after a decoder upgrade:
+  1. Pull the new seismo-relay code (which bumped TOOL_VERSION).
+  2. Run this script — every sidecar with an older tool_version
+     stamp regenerates, and the associated .h5 cascade-regenerates.
+  3. Operator review state (review.false_trigger, notes, reviewer)
+     and the sidecar's extensions block are preserved across the
+     regen.
 Usage:
   python scripts/backfill_sidecars.py [--store-root PATH]
@@ -123,6 +135,12 @@ def main(argv=None) -> int:
 # the sidecar was written by a build that includes any
 # decoder fixes shipped since).
 # Either part failing → regenerate. --force bypasses both.
+#
+# Tracks whether we're regenerating the sidecar this iteration
+# so the .h5 logic below knows to refresh that too — staleness
+# of the sidecar implies staleness of the derived .h5 (both
+# come out of the same decoder).
+sidecar_stale = True
 if sidecar_path.exists() and not args.force:
     try:
         existing = event_file_io.read_sidecar(sidecar_path)
@@ -136,6 +154,7 @@ def main(argv=None) -> int:
 ver_ok = _vt(src_ver) >= _vt(event_file_io.TOOL_VERSION)
 if sha_ok and ver_ok:
     skipped += 1
+    sidecar_stale = False
     continue
 if sha_ok and not ver_ok:
     log.info(
@@ -281,12 +300,23 @@ def main(argv=None) -> int:
     extensions=preserved_ext,
 )
-# Also emit the .h5 clean-waveform file when missing OR when
-# --force was passed (so a re-backfill picks up decoder fixes).
+# Also emit the .h5 clean-waveform file when:
+#   - it's missing, OR
+#   - --force was passed, OR
+#   - the sidecar is being regenerated this iteration
+#     (sha mismatch / tool_version too old). The .h5 and
+#     the sidecar are both derived from the same decoder
+#     output, so if the sidecar is stale, so is the .h5.
+#     This is the path that recovers from the broken-
+#     int16-LE codec era — bumping TOOL_VERSION to 0.20.0+
+#     marks every pre-codec sidecar stale, which now
+#     correctly cascades to .h5 regeneration too.
 hdf5_path = store.hdf5_path_for(serial, path.name)
 hdf5_filename = hdf5_path.name if hdf5_path.exists() else None
 hdf5_action = "kept"
-need_h5 = not args.skip_hdf5 and (args.force or not hdf5_path.exists())
+need_h5 = not args.skip_hdf5 and (
+    args.force or not hdf5_path.exists() or sidecar_stale
+)
 if need_h5:
     if args.dry_run:
         hdf5_action = "would (re)write"