fix(import): resolve real serial from BW filename instead of bucketing to UNKNOWN

The /db/import/blastware_file endpoint was bucketing every
forwarded event into serial='UNKNOWN' in the DB.  WaveformStore
correctly decoded the serial from the BW filename and saved
files to <store>/<serial>/<filename> (e.g.
.../BE17353/S353L5KC.DR0H.h5), but the endpoint code called
db.insert_events(serial=_serial_from_event(ev)) — and
_serial_from_event was a stub that always returned None,
falling back to "UNKNOWN".

Effect on the user's prod server: 3,039 events forwarded across
24 distinct units, ALL inserted under serial='UNKNOWN'.  The
on-disk waveform store + sidecars + HDF5s were fine, but the
SFM webapp's /db/units only showed the two original manually-
uploaded serials because every forwarded row had its serial
column zeroed to UNKNOWN.

Fix:
  - WaveformStore.save_imported_bw() now surfaces the decoded
    serial on the returned `rec` dict (rec["serial"]).
  - The import endpoint uses rec["serial"] as the authoritative
    fallback when the operator hasn't supplied a serial_hint query
    parameter.  Order of precedence:
      query string `serial` → rec["serial"] → _serial_from_event(ev) → "UNKNOWN"
  - Response payload now includes `serial` per file so the watcher
    log lines (or any future caller) can see which unit each event
    was attributed to.

Recovery for existing DB rows:
  scripts/repair_unknown_serials.py walks the events table looking
  for rows with serial='UNKNOWN' and re-attributes each one to the
  serial decoded from blastware_filename.  Updates the row in place
  unless the target (serial, timestamp) already has a row, in which
  case the UNKNOWN duplicate is deleted.  Idempotent.  Default
  dry-run; pass --apply to commit.

  Verified on the user's actual DB (dry-run):
    UNKNOWN rows scanned:       3039
    Updated to real serial:     2602
    Deleted (duplicate of an
     already-correct row):      437
    Unresolved (bad filename):  0

After running the repair, /db/units will show all 24 units
correctly populated.
This commit is contained in:
2026-05-11 02:25:08 +00:00
parent a032fa5451
commit 082e5946bc
4 changed files with 169 additions and 1 deletions
+4
View File
@@ -418,6 +418,10 @@ def test_save_imported_bw_round_trip(tmp_path: Path):
assert rec["filename"] == fname
assert rec["a5_pickle_filename"] is None # no A5 source for BW imports
# The serial decoded from the BW filename surfaces on the record so
# the import endpoint can use it when calling SeismoDb.insert_events()
# (otherwise forwarded events would all bucket into serial="UNKNOWN").
assert rec["serial"] == "BE11529"
sc = store.load_sidecar("BE11529", fname)
assert sc is not None
assert sc["source"]["kind"] == "bw-import"