backfill: overlay bw_report onto Event before DB upsert

Mirror what the ingest path does: BW's reported peaks (and sample_rate
/ record_time) take precedence over codec output where present.

Without this, --force backfill silently overwrites bw_report-overlaid
DB columns with codec-derived peaks.  That is wrong for events the codec
doesn't fully decode (waveform walker edge cases on SP0/SS0/SV0-style
events, the histogram byte[5]!=0 sub-format that isn't yet
reverse-engineered), where it produces PVS=0 on real high-amplitude
events.  This bit prod on 2026-05-22: three top-10 waveform events
ended up at PVS=0 (rolled back the same day; this fix is the proper
resolution).

New helper minimateplus.event_file_io.apply_bw_report_dict_to_event
operates on the projected sidecar dict shape (the structure
_bw_report_to_dict produces, which is what gets preserved in the
sidecar).  Mirrors apply_report_to_event's semantics: only writes
fields where bw_report has a non-None value, no-ops cleanly on
empty / None input.
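
A minimal sketch of the overlay semantics described above, not the
actual helper.  The Event stand-in, its field names, and the
overlay_non_none function are hypothetical; the real sidecar dict keys
and Event attributes may differ.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass
class Event:
    # Hypothetical stand-in for the real Event; field names are assumptions.
    pvs: Optional[float] = None
    sample_rate: Optional[int] = None
    record_time: Optional[float] = None


def overlay_non_none(ev: Event, report: Optional[Mapping[str, Any]]) -> Event:
    """Copy non-None report values onto matching Event fields, in place."""
    if not report:  # None or empty dict: leave the event untouched
        return ev
    for key, value in report.items():
        # Skip missing values and keys the Event does not define.
        if value is not None and hasattr(ev, key):
            setattr(ev, key, value)
    return ev


ev = Event(pvs=0.0, sample_rate=1024, record_time=3.0)
overlay_non_none(ev, {"pvs": 12.7, "sample_rate": None})
print(ev)  # Event(pvs=12.7, sample_rate=1024, record_time=3.0)
```

The codec-derived sample_rate survives because the report carries None
for it, while the reported peak replaces the codec's PVS=0.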

Dev validation against prod snapshot:
  pre  : 1839.7315 pvs_sum   356 events with DB PVS ≠ sidecar bw_report
  post : 2016.4902 pvs_sum     2 events still mismatched (both have NULL
                                timestamp + duplicate rows, edge case)

Both edge-case events DO get the correct value written by the new
backfill.  Their stale rows from prior backfills remain because
UNIQUE(serial, timestamp) doesn't fire on NULL: NULL timestamps count
as distinct, so each backfill inserts a new row instead of updating
the old one.  Separate dedup cleanup needed for those 2 events (0.014%
of corpus); not blocking.
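
The NULL behaviour can be reproduced in isolation.  This sketch assumes
an SQLite backend (3.24+ for ON CONFLICT upserts) and uses a
hypothetical events table and serial value; only the
UNIQUE(serial, timestamp) constraint is taken from this commit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (serial TEXT, timestamp TEXT, pvs REAL, "
    "UNIQUE(serial, timestamp))"
)

upsert = (
    "INSERT INTO events (serial, timestamp, pvs) VALUES (?, ?, ?) "
    "ON CONFLICT(serial, timestamp) DO UPDATE SET pvs = excluded.pvs"
)

# Non-NULL timestamp: the second upsert updates the existing row.
conn.execute(upsert, ("BE1234", "2026-05-22T10:00:00", 0.0))
conn.execute(upsert, ("BE1234", "2026-05-22T10:00:00", 12.7))

# NULL timestamp: NULLs are distinct under UNIQUE, so no conflict is
# detected and each upsert inserts a fresh row.
conn.execute(upsert, ("BE1234", None, 0.0))
conn.execute(upsert, ("BE1234", None, 12.7))

dated_rows = conn.execute(
    "SELECT pvs FROM events WHERE timestamp IS NOT NULL"
).fetchall()
null_rows = conn.execute(
    "SELECT pvs FROM events WHERE timestamp IS NULL ORDER BY rowid"
).fetchall()
print(dated_rows)  # [(12.7,)]
print(null_rows)   # [(0.0,), (12.7,)]
```

The stale 0.0 row stays next to the corrected one, which matches the
duplicate rows seen on the two remaining events.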

Backfill remains idempotent + bw_report preservation still passes
(0 WIPED, 0 CHANGED on the 3rd consecutive run).
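
One way such a preservation check could be expressed, as a sketch only.
The snapshot shape (event key mapped to its bw_report dict or None) and
the WIPED / CHANGED definitions used here are assumptions, not the
project's actual check.

```python
from typing import Dict, Optional, Tuple

# Hypothetical snapshot: event key -> preserved bw_report dict, or None.
Snapshot = Dict[str, Optional[dict]]


def diff_bw_reports(before: Snapshot, after: Snapshot) -> Tuple[int, int]:
    """Return (wiped, changed) counts between two sidecar snapshots."""
    wiped = changed = 0
    for key, old in before.items():
        if old is None:
            continue  # nothing to preserve for this event
        new = after.get(key)
        if new is None:
            wiped += 1  # report existed before the run and is gone now
        elif new != old:
            changed += 1  # report survived but its contents differ
    return wiped, changed


before = {"a": {"pvs": 12.7}, "b": {"pvs": 3.1}, "c": None}
after_ok = {"a": {"pvs": 12.7}, "b": {"pvs": 3.1}, "c": None}
after_bad = {"a": None, "b": {"pvs": 0.0}, "c": None}

print(diff_bw_reports(before, after_ok))   # (0, 0)
print(diff_bw_reports(before, after_bad))  # (1, 1)
```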

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 18:56:22 +00:00
parent 49a524d0d4
commit 35842ac50a
3 changed files with 142 additions and 0 deletions
@@ -309,6 +309,23 @@ def main(argv=None) -> int:
except Exception:
    pass
# Overlay BW ASCII report fields onto the rebuilt Event
# BEFORE the sidecar + DB write. Mirrors what the ingest
# path does — BW's reported peaks (and sample_rate /
# record_time) win over codec output where present.
#
# Without this step, --force backfill silently overwrites
# the bw_report-overlaid DB columns with codec-derived
# values, which is wrong for events the codec doesn't
# fully decode (e.g. waveform walker edge cases on
# SP0/SS0/SV0-style events, or histogram sub-formats with
# byte[5]!=0 that aren't yet RE'd). Net effect was PVS=0
# on three top-10 events on 2026-05-22.
if preserved_bw_report:
    event_file_io.apply_bw_report_dict_to_event(
        ev, preserved_bw_report,
    )
sidecar = event_file_io.event_to_sidecar_dict(
    ev,
    serial=serial,