feat(bw-report): normalise operator-field label variants

Blastware writes the operator-supplied fields with different label
spellings across firmware versions and recording modes — most
notably "Seis. Location" on histogram exports vs "Seis Loc:" on
waveform exports.  Previous parser only matched the latter, so
every histogram event silently lost its sensor_location field.

Replace the four hardcoded `key.rstrip(":") == "X"` branches with
a single `_OPERATOR_LABEL_MAP` dispatch table keyed by normalised
label (lowercase, trailing colon/period stripped, internal
whitespace collapsed).  Adds these variants on day 1:

  project:         "Project:" / "Project"
  client:          "Client:"  / "Client"
  operator:        "User Name:" / "User Name"
  sensor_location: "Seis Loc:" / "Seis. Location" / "Seis Location"
                 / "Sensor Location" / "Seis Loc"

To absorb future BW label drift, add a one-line dict entry — no
new elif branch.

14 new tests cover:
  - Each label variant routes to the correct field (parametrised)
  - Case-insensitive matching ("seis loc" / "SEIS LOC" / "SeIs LoC")
  - Whitespace-collapse ("Seis  Loc" with double-space)
  - End-to-end parse of a real histogram fixture from
    example-events/histogram/ — sensor_location ('Loc #1 - 2652 Hepner...')
    populates correctly even though the file uses "Seis. Location"

Total bw_ascii_report tests: 19 → 33.  Full SFM suite still green
(69 passed, 44 skipped — pre-existing skips for h5py-dep tests).

Pairs with series3-watcher v1.5.4 (which fixes the filename pairing
so histograms actually reach this parser in the first place).
This commit is contained in:
2026-05-10 20:13:44 +00:00
parent cdfe4ad3c8
commit 6a7e8c6e86
2 changed files with 130 additions and 6 deletions
+63 -6
View File
@@ -265,6 +265,61 @@ def _parse_monitor_ts(s: str) -> Optional[datetime.datetime]:
return None
# ── Operator-field label normalisation ──────────────────────────────────────
#
# BW has used different label spellings across versions and recording
# modes for the same operator-supplied fields:
#
# project: "Project:" / "Project"
# client: "Client:" / "Client"
# operator: "User Name:" / "User Name"
# sensor_location: "Seis Loc:" / "Seis. Location" / "Seis Location"
# / "Sensor Location"
#
# Per user feedback ("the tags themselves dont matter a ton, what
# matters is the field"), we normalise labels at lookup time so the
# value-extraction works regardless of which spelling BW happens to
# emit on a given machine.
#
# To add a new variant: edit `_OPERATOR_LABEL_MAP` — single source of
# truth. Keys are normalised forms (lowercase, trailing colon and
# period stripped, internal whitespace collapsed); values are
# attribute names on `BwAsciiReport`.
_OPERATOR_LABEL_MAP = {
# project
"project": "project",
# client
"client": "client",
# operator
"user name": "operator",
# sensor location — most variants of "Seis*" + "Sensor Location"
"seis loc": "sensor_location",
"seis. loc": "sensor_location",
"seis. location": "sensor_location",
"seis location": "sensor_location",
"sensor location": "sensor_location",
}
def _normalise_label_for_lookup(key: str) -> str:
"""Normalise a label for the operator-field lookup.
Strips a trailing colon and/or period, collapses internal
whitespace runs, and lowercases. So all of:
"Seis Loc:"
"Seis. Location"
"seis location"
"Sensor Location"
map to canonical forms in `_OPERATOR_LABEL_MAP`.
"""
s = key.strip().rstrip(":").rstrip(".").strip()
s = _KEY_NORMALISE_RE.sub(" ", s)
return s.lower()
# ─────────────────────────────────────────────────────────────────────────────
# Top-level parser
# ─────────────────────────────────────────────────────────────────────────────
@@ -346,12 +401,14 @@ def parse_report(text: Union[str, bytes], *, parse_samples: bool = False) -> BwA
report.calibration_date, report.calibration_by = _parse_calibration(value)
elif key == "Units": report.units = value
# Project labels in BW carry their own trailing colon — after
# _normalise_key we just strip it for matching.
elif key.rstrip(":") == "Project": report.project = value
elif key.rstrip(":") == "Client": report.client = value
elif key.rstrip(":") == "User Name":report.operator = value
elif key.rstrip(":") == "Seis Loc": report.sensor_location = value
# Operator-supplied labels (Project / Client / User Name /
# Seis Loc) — BW writes these with assorted spellings across
# firmware versions and recording modes (e.g. "Seis Loc:" on
# waveform exports vs "Seis. Location" on histogram exports).
# The label normaliser absorbs all known variants; see
# `_OPERATOR_LABEL_MAP` above for the dispatch table.
elif (slot := _OPERATOR_LABEL_MAP.get(_normalise_label_for_lookup(key))):
setattr(report, slot, value)
elif key == "Geo Range": report.geo_range_ips = _parse_number(value)