read_blastware_file: leave peak_values=None when samples can't be decoded

Fixes a data-loss bug discovered while dry-running the backfill against
the prod store.

Symptom: every histogram event in the store has its body decoded by
read_blastware_file → codec returns None → samples = empty dict →
``ev.peak_values = _peaks_from_samples(empty)`` returns
``PeakValues(0, 0, 0, 0, 0)`` (NOT None). The backfill script's existing
"seed from DB row when peak_values is None" branch then correctly
*skips* the seeding, and the all-zeros PeakValues flows into
``db.insert_events()``'s UPSERT path, OVERWRITING the existing good DB
peak values for that event (which were populated from the paired BW
ASCII report at ingest).

Net effect: running the backfill on prod would have wiped the PPV / mic /
vector-sum columns for ~10,000 histogram events.

Fix: only compute peaks-from-samples when there are actually samples.
For events the codec couldn't decode (histogram-mode bodies, until the
§7.6.2 histogram codec is wired in), leave peak_values=None as the
"we don't know" signal.

Downstream consumers:

- backfill_sidecars.py: its existing ``if ev.peak_values is None:``
  branch (line 243) seeds from the DB row, preserving the real
  BW-report peaks across the regen.
- WaveformStore.save_imported_bw: apply_report_to_event overlays peaks
  from the paired BW ASCII report when one was uploaded. Histogram
  imports without a paired report end up with NULL peaks in the DB,
  which is correct (better than zeros: NULL clearly says "no peak data
  available" rather than "peaks are exactly zero").

Updated the existing synthetic-event round-trip test to expect
peak_values=None for the no-real-body case, which is the truth now. The
7 fixture-corpus regression tests for real BW waveforms continue to
pass: those have decodable samples, so peak_values is still populated
from the codec output as before.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 20:30:53 +00:00
parent c4648c1959
commit fa9d3cdef2
2 changed files with 21 additions and 4 deletions
@@ -811,7 +811,18 @@ def read_blastware_file(path: Union[str, Path]) -> Event:
        project=project, client=client, operator=user, sensor_location=seisloc,
    )
    ev.raw_samples = samples
-    ev.peak_values = _peaks_from_samples(samples)
+    # Only compute peaks from samples when we actually have samples.
+    # For events the codec couldn't decode (histogram-mode bodies, until
+    # the §7.6.2 histogram codec is wired in), samples is an empty dict
+    # and ``_peaks_from_samples`` would return PeakValues(0, 0, 0, 0, 0).
+    # That would then OVERWRITE existing good DB peak values (e.g. from
+    # paired BW ASCII reports) during the backfill UPSERT path.
+    # Leaving peak_values=None signals "we don't know" to downstream
+    # consumers; the backfill script seeds from the DB row when it sees
+    # None, and ``apply_report_to_event`` overlays from a paired ASCII
+    # report when one is supplied.
+    has_samples = any(samples.get(ch) for ch in ("Tran", "Vert", "Long", "MicL"))
+    ev.peak_values = _peaks_from_samples(samples) if has_samples else None
    ev._a5_frames = None  # not recoverable from BW file

    return ev