scripts/backfill_sidecars: cascade h5 regen when sidecar is stale + bump TOOL_VERSION

Two coupled changes that close the rollout gap left by the
read_blastware_file codec wiring:
1. minimateplus/event_file_io.py: bump TOOL_VERSION from 0.16.1 to
0.20.0. This is the version stamp the backfill script reads from
each sidecar's source.tool_version field to detect "this sidecar
was written before the current decoder shipped, regenerate it."
Bumping past every value baked into existing prod sidecars flags
them all as stale on the next backfill run — which is exactly what
we want, since every pre-codec-wiring sidecar was written by the
retracted int16-LE decoder.
2. scripts/backfill_sidecars.py: when the sidecar is being
regenerated this iteration (sha mismatch, tool_version too old,
or --force), also regenerate the .h5. Previously the .h5 logic
only rewrote when --force was passed or the file was missing —
so a tool_version-driven sidecar regen left the broken .h5 in
place forever. Added a `sidecar_stale` boolean to track the
"we're rewriting the sidecar this iteration" state and wired it
into the h5 need-rewrite check.
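
The tool_version check in item 1 only works if versions are compared numerically rather than as strings. A minimal sketch, assuming dotted-integer stamps such as "0.16.1"; `version_tuple` is a hypothetical stand-in for the script's `_vt` helper, whose real implementation is not shown in this commit:

```python
def version_tuple(version: str) -> tuple:
    """Parse a dotted version such as "0.16.1" into (0, 16, 1).

    Hypothetical stand-in for the script's `_vt` helper. Non-numeric
    parts are treated as 0 so that a malformed stamp sorts as old.
    """
    parts = []
    for piece in version.strip().split("."):
        parts.append(int(piece) if piece.isdigit() else 0)
    return tuple(parts)


def sidecar_version_ok(sidecar_version: str, tool_version: str) -> bool:
    """True when the sidecar was written by the current tool or a newer one."""
    return version_tuple(sidecar_version) >= version_tuple(tool_version)
```

Tuple comparison matters here because a plain string comparison would rank "0.9.0" above "0.20.0", so a sidecar stamped by a hypothetical 0.9.x build would never be flagged as stale.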
Path coverage (verified by trace):
- sidecar missing → both regen
- --force → both regen
- sha mismatch → both regen
- tool_ver too old → both regen (THE post-codec-wiring case)
- everything OK → skip iteration entirely (h5 untouched)
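
The path coverage above can be sketched as a pure function. This is an illustration under assumptions, not the script's code: `plan_regen` and `Plan` are hypothetical names, and the booleans stand in for the checks the script performs inline.

```python
from typing import NamedTuple


class Plan(NamedTuple):
    regen_sidecar: bool
    regen_h5: bool


def plan_regen(sidecar_exists: bool, sha_ok: bool, ver_ok: bool,
               force: bool, h5_exists: bool, skip_hdf5: bool = False) -> Plan:
    """Decide what one backfill iteration rewrites (hypothetical helper)."""
    # Assume the sidecar is stale until the checks prove otherwise.
    sidecar_stale = True
    if sidecar_exists and not force and sha_ok and ver_ok:
        # Mirrors the `continue` in the script: the iteration is skipped,
        # so the .h5 is not examined at all.
        return Plan(regen_sidecar=False, regen_h5=False)
    # The .h5 follows the sidecar: both come from the same decoder output.
    need_h5 = not skip_hdf5 and (force or not h5_exists or sidecar_stale)
    return Plan(regen_sidecar=True, regen_h5=need_h5)
```

Note that once the skip branch has returned, `sidecar_stale` is always true, so under this reading the .h5 is rewritten on every non-skipped iteration unless --skip-hdf5 is set.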
Operator review state (review.false_trigger, reviewer, notes) and
the sidecar's extensions block are preserved across regen by the
existing read-existing-sidecar / pass-into-event_to_sidecar_dict
path — unchanged from prior behavior.
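
A rough sketch of what "preserved across regen" means in practice. The real path passes the preserved values into event_to_sidecar_dict, whose full signature is not shown in this commit; `carry_over_operator_state` is a hypothetical helper, and the top-level `review` and `extensions` keys are assumed from the field names mentioned above.

```python
import copy


def carry_over_operator_state(existing, regenerated):
    """Return the regenerated sidecar with operator-owned blocks kept.

    Hypothetical helper: decoder-derived fields come from `regenerated`,
    while `review` and `extensions` are copied from `existing` if present.
    """
    merged = copy.deepcopy(regenerated)
    if existing:
        for key in ("review", "extensions"):
            if key in existing:
                merged[key] = copy.deepcopy(existing[key])
    return merged
```

For example, a sidecar stamped 0.16.1 with `review.false_trigger` set keeps that flag after regeneration, while its `source.tool_version` moves to the new value.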
Deploy procedure (on prod):
1. Pull this change + the read_blastware_file codec wiring.
2. `python scripts/backfill_sidecars.py --dry-run` to preview.
Every sidecar with source.tool_version<0.20.0 will show as
"would (re)write".
3. Run for real (drop --dry-run). Expect every pre-fix event
to regenerate; large stores may take a while.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
--- a/minimateplus/event_file_io.py
+++ b/minimateplus/event_file_io.py
@@ -48,7 +48,7 @@ SIDECAR_KIND = "sfm.event"
 # bumped without a `pip install` re-run — leading to confusing stale
 # version stamps in sidecars. Bump this constant and CHANGELOG.md
 # together at release time.
-TOOL_VERSION = "0.16.1"
+TOOL_VERSION = "0.20.0"
 
 try:
     # Best-effort: prefer the installed metadata when it's NEWER than the
--- a/scripts/backfill_sidecars.py
+++ b/scripts/backfill_sidecars.py
@@ -12,8 +12,20 @@ Walks `<store_root>/<serial>/<filename>` and for each BW event file:
     parsing the BW binary directly (peaks computed from samples).
 
 Clean waveform (.h5):
-  - Skip when <filename>.h5 already exists (idempotent).
-  - Else write from .a5.pkl (preferred) or BW binary parse (fallback).
+  - Regenerated whenever the sidecar is regenerated (sha mismatch
+    OR sidecar.source.tool_version < current TOOL_VERSION OR --force).
+    The .h5 and the sidecar both come from the same decoder output,
+    so if the sidecar is stale the .h5 is too.
+  - Written when missing.
+  - --skip-hdf5 turns off all .h5 writes.
 
+Typical use after a decoder upgrade:
+  1. Pull the new seismo-relay code (which bumped TOOL_VERSION).
+  2. Run this script — every sidecar with an older tool_version
+     stamp regenerates, and the associated .h5 cascade-regenerates.
+  3. Operator review state (review.false_trigger, notes, reviewer)
+     and the sidecar's extensions block are preserved across the
+     regen.
+
 Usage:
     python scripts/backfill_sidecars.py [--store-root PATH]
@@ -123,6 +135,12 @@ def main(argv=None) -> int:
             # the sidecar was written by a build that includes any
             # decoder fixes shipped since).
             # Either part failing → regenerate. --force bypasses both.
+            #
+            # Tracks whether we're regenerating the sidecar this iteration
+            # so the .h5 logic below knows to refresh that too — staleness
+            # of the sidecar implies staleness of the derived .h5 (both
+            # come out of the same decoder).
+            sidecar_stale = True
             if sidecar_path.exists() and not args.force:
                 try:
                     existing = event_file_io.read_sidecar(sidecar_path)
@@ -136,6 +154,7 @@ def main(argv=None) -> int:
                     ver_ok = _vt(src_ver) >= _vt(event_file_io.TOOL_VERSION)
                     if sha_ok and ver_ok:
                         skipped += 1
+                        sidecar_stale = False
                         continue
                     if sha_ok and not ver_ok:
                         log.info(
@@ -281,12 +300,23 @@ def main(argv=None) -> int:
                 extensions=preserved_ext,
             )
 
-            # Also emit the .h5 clean-waveform file when missing OR when
-            # --force was passed (so a re-backfill picks up decoder fixes).
+            # Also emit the .h5 clean-waveform file when:
+            # - it's missing, OR
+            # - --force was passed, OR
+            # - the sidecar is being regenerated this iteration
+            # (sha mismatch / tool_version too old). The .h5 and
+            # the sidecar are both derived from the same decoder
+            # output, so if the sidecar is stale, so is the .h5.
+            # This is the path that recovers from the broken-
+            # int16-LE codec era — bumping TOOL_VERSION to 0.20.0+
+            # marks every pre-codec sidecar stale, which now
+            # correctly cascades to .h5 regeneration too.
             hdf5_path = store.hdf5_path_for(serial, path.name)
             hdf5_filename = hdf5_path.name if hdf5_path.exists() else None
             hdf5_action = "kept"
-            need_h5 = not args.skip_hdf5 and (args.force or not hdf5_path.exists())
+            need_h5 = not args.skip_hdf5 and (
+                args.force or not hdf5_path.exists() or sidecar_stale
+            )
             if need_h5:
                 if args.dry_run:
                     hdf5_action = "would (re)write"