read_blastware_file: leave peak_values=None when samples can't be decoded

Fixes a data-loss bug discovered while dry-running the backfill against
the prod store.

Symptom: for every histogram event in the store, read_blastware_file
tries to decode the body → codec returns None → samples = empty dict →
``ev.peak_values = _peaks_from_samples(empty)`` returns
``PeakValues(0, 0, 0, 0, 0)`` (NOT None). The backfill script's
existing "seed from DB row when peak_values is None" branch therefore
never fires (peak_values is not None, so by its own logic it skips the
seeding), and the all-zeros PeakValues flows into
``db.insert_events()``'s UPSERT path, OVERWRITING the existing good DB
peak values for that event (which were populated from the paired BW
ASCII report at ingest).

Net effect: running the backfill on prod would have wiped the PPV /
mic / vector-sum columns for ~10,000 histogram events.
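
The failure mode can be reproduced in miniature. The sketch below is illustrative only: the ``PeakValues`` field names (other than ``peak_vector_sum``), the ``peaks_from_samples_unguarded`` helper, the ``events`` table schema, and the stored numbers are all assumptions made for the example, not the project's real definitions.

```python
import math
import sqlite3
from dataclasses import dataclass

CHANNELS = ("Tran", "Vert", "Long", "MicL")


@dataclass
class PeakValues:
    # Hypothetical field names; only peak_vector_sum is named in this commit.
    tran: float
    vert: float
    long: float
    mic: float
    peak_vector_sum: float


def peaks_from_samples_unguarded(samples: dict) -> PeakValues:
    """Pre-fix shape: always returns a PeakValues, even with no samples."""
    def peak(ch: str) -> float:
        return max((abs(v) for v in samples.get(ch, [])), default=0.0)

    vector_sum = max(
        (math.sqrt(t * t + v * v + lo * lo)
         for t, v, lo in zip(samples.get("Tran", []),
                             samples.get("Vert", []),
                             samples.get("Long", []))),
        default=0.0,
    )
    return PeakValues(peak("Tran"), peak("Vert"), peak("Long"),
                      peak("MicL"), vector_sum)


def upsert_peaks(conn: sqlite3.Connection, event_id: str, peaks: PeakValues) -> None:
    """Insert-or-overwrite, standing in for the real UPSERT path."""
    conn.execute(
        "INSERT INTO events (id, ppv, mic, vector_sum) VALUES (?, ?, ?, ?) "
        "ON CONFLICT(id) DO UPDATE SET ppv = excluded.ppv, "
        "mic = excluded.mic, vector_sum = excluded.vector_sum",
        (event_id, max(peaks.tran, peaks.vert, peaks.long),
         peaks.mic, peaks.peak_vector_sum),
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id TEXT PRIMARY KEY, ppv REAL, mic REAL, vector_sum REAL)"
)
# Made-up "good" peaks, as if populated from a paired report at ingest.
conn.execute("INSERT INTO events VALUES ('ev1', 12.7, 0.5, 13.1)")

# Undecodable body: every channel is empty.
peaks = peaks_from_samples_unguarded({ch: [] for ch in CHANNELS})
seeded_from_db = False
if peaks is None:  # never true here, so the DB seeding is skipped
    seeded_from_db = True
upsert_peaks(conn, "ev1", peaks)

row = conn.execute(
    "SELECT ppv, mic, vector_sum FROM events WHERE id = 'ev1'"
).fetchone()
print(row)  # the stored peaks have been replaced by zeros
```

The ``is None`` check is sound on its own; the problem is that the producer never emits None, so zeros that mean "could not decode" are indistinguishable from zeros that mean "measured zero".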

Fix: only compute peaks-from-samples when there are actually samples.
For events the codec couldn't decode (histogram-mode bodies, until
the §7.6.2 histogram codec is wired in), leave peak_values=None as
the "we don't know" signal. Downstream consumers:

- backfill_sidecars.py: its existing ``if ev.peak_values is None:``
  branch (line 243) seeds from the DB row, preserving the real
  BW-report peaks across the regen.
- WaveformStore.save_imported_bw: apply_report_to_event overlays
  peaks from the paired BW ASCII report when one was uploaded.
  Histogram imports without a paired report end up with NULL peaks
  in the DB, which is correct (better than zeros: NULL clearly says
  "no peak data available" rather than "peaks are exactly zero").
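
A minimal sketch of the guard and of the backfill-side seeding, under stated assumptions: ``peaks_or_none`` and ``seed_if_unknown`` are hypothetical names, the ``PeakValues`` fields other than ``peak_vector_sum`` are guesses, and the "no samples" case is assumed to show up either as an empty dict or as a dict of empty channel lists (the round-trip test's ``raw_samples[ch] == []`` assertions suggest the latter can occur). It is not the actual read_blastware_file code.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional

CHANNELS = ("Tran", "Vert", "Long", "MicL")


@dataclass
class PeakValues:
    # Hypothetical field names; only peak_vector_sum is named in this commit.
    tran: float
    vert: float
    long: float
    mic: float
    peak_vector_sum: float


def peaks_or_none(samples: Dict[str, List[float]]) -> Optional[PeakValues]:
    """Return peaks only when at least one channel actually has samples."""
    if not any(samples.get(ch) for ch in CHANNELS):
        return None  # "we don't know", not "peaks are exactly zero"

    def peak(ch: str) -> float:
        return max((abs(v) for v in samples.get(ch, [])), default=0.0)

    vector_sum = max(
        (math.sqrt(t * t + v * v + lo * lo)
         for t, v, lo in zip(samples.get("Tran", []),
                             samples.get("Vert", []),
                             samples.get("Long", []))),
        default=0.0,
    )
    return PeakValues(peak("Tran"), peak("Vert"), peak("Long"),
                      peak("MicL"), vector_sum)


def seed_if_unknown(decoded: Optional[PeakValues],
                    db_row_peaks: Optional[PeakValues]) -> Optional[PeakValues]:
    """Backfill-side rule: keep the DB's peaks when decoding gave nothing."""
    return db_row_peaks if decoded is None else decoded


# Usage: an undecodable body keeps the stored peaks; a decodable one wins.
stored = PeakValues(12.7, 3.1, 4.2, 0.5, 13.1)  # made-up values
unknown = peaks_or_none({ch: [] for ch in CHANNELS})
kept = seed_if_unknown(unknown, stored)

decoded = peaks_or_none({
    "Tran": [3.0, -1.0],
    "Vert": [4.0, 0.0],
    "Long": [0.0, 0.0],
    "MicL": [0.5, -0.25],
})
```

Checking for any non-empty channel, rather than just the truthiness of the dict, keeps the guard effective when the decoder hands back all four keys with empty lists.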

Updated the existing synthetic-event round-trip test to expect
peak_values=None for the no-real-body case, which is now the correct
behaviour. The 7 fixture-corpus regression tests for real BW waveforms
continue to pass: those have decodable samples, so peak_values is still
populated from the codec output as before.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -289,9 +289,15 @@ def test_read_blastware_file_round_trip(tmp_path: Path):
     assert parsed.timestamp.second == ev.timestamp.second
     # No A5 source recoverable.
     assert parsed._a5_frames is None
-    # Peaks computed from samples (synthetic = zero samples → zero peaks).
-    assert parsed.peak_values is not None
-    assert parsed.peak_values.peak_vector_sum == 0.0
+    # The synthetic event has no real waveform body, so the codec can't
+    # decode samples → read_blastware_file leaves peak_values=None
+    # (the "we don't know" signal) rather than fabricating all-zero
+    # peaks that would otherwise overwrite real DB values via UPSERT.
+    assert parsed.peak_values is None
+    assert parsed.raw_samples is not None
+    # Empty channels — codec returned None for the malformed synthetic body.
+    for ch in ("Tran", "Vert", "Long", "MicL"):
+        assert parsed.raw_samples[ch] == []
 
 
 _BW_CODEC_FIXTURES = [