minimateplus: wire read_blastware_file to verified body codec #24

Merged

serversdown merged 2 commits from feat/wire-codec-to-import-path into dev

2026-05-20 15:26:16 -04:00

Author	SHA1	Message	Date
serversdown	e8682d49ad	scripts/backfill_sidecars: cascade h5 regen when sidecar is stale + bump TOOL_VERSION Two coupled changes that close the rollout gap left by the read_blastware_file codec wiring: 1. minimateplus/event_file_io.py: bump TOOL_VERSION from 0.16.1 to 0.20.0. This is the version stamp the backfill script reads from each sidecar's source.tool_version field to detect "this sidecar was written before the current decoder shipped, regenerate it." Bumping past every value baked into existing prod sidecars flags them all as stale on the next backfill run — which is exactly what we want, since every pre-codec-wiring sidecar was written by the retracted int16-LE decoder. 2. scripts/backfill_sidecars.py: when the sidecar is being regenerated this iteration (sha mismatch, tool_version too old, or --force), also regenerate the .h5. Previously the .h5 logic only rewrote when --force was passed or the file was missing — so a tool_version-driven sidecar regen left the broken .h5 in place forever. Added a `sidecar_stale` boolean to track the "we're rewriting the sidecar this iteration" state and wired it into the h5 need-rewrite check. Path coverage (verified by trace): - sidecar missing → both regen - --force → both regen - sha mismatch → both regen - tool_ver too old → both regen (THE post-codec-wiring case) - everything OK → skip iteration entirely (h5 untouched) Operator review state (review.false_trigger, reviewer, notes) and the sidecar's extensions block are preserved across regen by the existing read-existing-sidecar / pass-into-event_to_sidecar_dict path — unchanged from prior behavior. Deploy procedure (on prod): 1. Pull this change + the read_blastware_file codec wiring. 2. `python scripts/backfill_sidecars.py --dry-run` to preview. Every sidecar with source.tool_version<0.20.0 will show as "would (re)write". 3. Run for real (drop --dry-run). Expect every pre-fix event to regen. Big stores may take a while. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 18:24:06 +00:00
serversdown	31d691b40b	minimateplus: wire read_blastware_file to verified body codec `read_blastware_file()` was still calling `_decode_samples_4ch_int16_le` (the retracted int16-LE-interleaved hypothesis) on the body bytes, producing ±32K noise on every channel of every BW file read from disk. This was the path watcher-forwarded events take into the system (via the import endpoint → save_imported_bw → read_blastware_file, since the watcher doesn't ship A5 frames), so every .h5 sidecar generated for a forwarded event has been wrong since the feature shipped. The fix is mechanical: pass the body bytes straight to `waveform_codec.decode_waveform_v2()` and run the result through `decoded_to_adc_counts()` for the 16x geo scaling. The body already starts with the codec's exact 7-byte preamble `00 02 00 [Tran[0] BE] [Tran[1] BE]` — confirmed by `body[:3].hex()` across all 9 fixture events. No body-slice adjustment needed. If the codec returns None (truncated/malformed file, synthetic test input with no real waveform), fall back to empty channels with a log warning. The rest of the event (timestamp, waveform_key, project strings, sensor_location, peaks-from-samples=0) is still recoverable. Verified against the bundled fixture corpus: V70 Tran/Vert/Long 3328/3328 sample-sets match .TXT ground truth within the 0.005 in/s display quantum, every row 6S0/RG0/AB0/470 (5-8-26) 3328/2304/1280/1280 samples; Vert PPVs match BW's own report within 0.02 in/s JQ0 3328 samples, Vert PPV 3.384 vs BW 3.465 SP0/SS0/SV0 (loud events) 3072–3328 samples; known walker tail-truncation 1–7 samples per channel, samples reached are byte-exact Existing `test_read_blastware_file_round_trip` (synthetic empty event) continues to pass thanks to the None-fallback. Codec verify scripts (`analysis/verify_quiet_bundle.py`, `analysis/verify_full_decode.py`) re-run unchanged. Added two regression-lock tests in tests/test_event_file_io.py: - test_read_blastware_file_decodes_via_codec[6 fixtures] — verifies sample count + Vert PPV per fixture - test_read_blastware_file_v70_samples_match_txt_truth — verifies every one of V70's 3328 sample-sets across Tran/Vert/Long matches the .TXT ground truth row-by-row within 0.003 in/s Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 18:13:24 +00:00

Author

SHA1

Message

Date

serversdown

e8682d49ad

scripts/backfill_sidecars: cascade h5 regen when sidecar is stale + bump TOOL_VERSION

Two coupled changes that close the rollout gap left by the
read_blastware_file codec wiring:

1. minimateplus/event_file_io.py: bump TOOL_VERSION from 0.16.1 to
   0.20.0.  This is the version stamp the backfill script reads from
   each sidecar's source.tool_version field to detect "this sidecar
   was written before the current decoder shipped, regenerate it."
   Bumping past every value baked into existing prod sidecars flags
   them all as stale on the next backfill run — which is exactly what
   we want, since every pre-codec-wiring sidecar was written by the
   retracted int16-LE decoder.

2. scripts/backfill_sidecars.py: when the sidecar is being
   regenerated this iteration (sha mismatch, tool_version too old,
   or --force), also regenerate the .h5.  Previously the .h5 logic
   only rewrote when --force was passed or the file was missing —
   so a tool_version-driven sidecar regen left the broken .h5 in
   place forever.  Added a `sidecar_stale` boolean to track the
   "we're rewriting the sidecar this iteration" state and wired it
   into the h5 need-rewrite check.

   Path coverage (verified by trace):
     - sidecar missing  → both regen
     - --force          → both regen
     - sha mismatch     → both regen
     - tool_ver too old → both regen (THE post-codec-wiring case)
     - everything OK    → skip iteration entirely (h5 untouched)

Operator review state (review.false_trigger, reviewer, notes) and
the sidecar's extensions block are preserved across regen by the
existing read-existing-sidecar / pass-into-event_to_sidecar_dict
path — unchanged from prior behavior.

Deploy procedure (on prod):
  1. Pull this change + the read_blastware_file codec wiring.
  2. `python scripts/backfill_sidecars.py --dry-run` to preview.
     Every sidecar with source.tool_version<0.20.0 will show as
     "would (re)write".
  3. Run for real (drop --dry-run).  Expect every pre-fix event
     to regen.  Big stores may take a while.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-20 18:24:06 +00:00

serversdown

31d691b40b

minimateplus: wire read_blastware_file to verified body codec

`read_blastware_file()` was still calling `_decode_samples_4ch_int16_le`
(the retracted int16-LE-interleaved hypothesis) on the body bytes,
producing ±32K noise on every channel of every BW file read from disk.
This was the path watcher-forwarded events take into the system
(via the import endpoint → save_imported_bw → read_blastware_file,
since the watcher doesn't ship A5 frames), so every .h5 sidecar
generated for a forwarded event has been wrong since the feature
shipped.

The fix is mechanical: pass the body bytes straight to
`waveform_codec.decode_waveform_v2()` and run the result through
`decoded_to_adc_counts()` for the 16x geo scaling.  The body already
starts with the codec's exact 7-byte preamble `00 02 00 [Tran[0] BE]
[Tran[1] BE]` — confirmed by `body[:3].hex()` across all 9 fixture
events.  No body-slice adjustment needed.

If the codec returns None (truncated/malformed file, synthetic test
input with no real waveform), fall back to empty channels with a log
warning.  The rest of the event (timestamp, waveform_key, project
strings, sensor_location, peaks-from-samples=0) is still recoverable.

Verified against the bundled fixture corpus:

  V70  Tran/Vert/Long 3328/3328 sample-sets match .TXT ground truth
       within the 0.005 in/s display quantum, every row
  6S0/RG0/AB0/470 (5-8-26)  3328/2304/1280/1280 samples; Vert PPVs
       match BW's own report within 0.02 in/s
  JQ0  3328 samples, Vert PPV 3.384 vs BW 3.465
  SP0/SS0/SV0 (loud events)  3072–3328 samples; known walker
       tail-truncation 1–7 samples per channel, samples reached are
       byte-exact

Existing `test_read_blastware_file_round_trip` (synthetic empty event)
continues to pass thanks to the None-fallback.  Codec verify scripts
(`analysis/verify_quiet_bundle.py`, `analysis/verify_full_decode.py`)
re-run unchanged.

Added two regression-lock tests in tests/test_event_file_io.py:
  - test_read_blastware_file_decodes_via_codec[6 fixtures]
    — verifies sample count + Vert PPV per fixture
  - test_read_blastware_file_v70_samples_match_txt_truth
    — verifies every one of V70's 3328 sample-sets across Tran/Vert/Long
      matches the .TXT ground truth row-by-row within 0.003 in/s

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-20 18:13:24 +00:00

minimateplus: wire read_blastware_file to verified body codec #24

2 Commits