feat(bw-report): normalise operator-field label variants
Blastware writes the operator-supplied fields with different label
spellings across firmware versions and recording modes — most
notably "Seis. Location" on histogram exports vs "Seis Loc:" on
waveform exports. Previous parser only matched the latter, so
every histogram event silently lost its sensor_location field.
Replace the four hardcoded `key.rstrip(":") == "X"` branches with
a single `_OPERATOR_LABEL_MAP` dispatch table keyed by normalised
label (lowercase, trailing colon/period stripped, internal
whitespace collapsed). Adds these variants on day 1:
project: "Project:" / "Project"
client: "Client:" / "Client"
operator: "User Name:" / "User Name"
sensor_location: "Seis Loc:" / "Seis. Location" / "Seis Location"
/ "Sensor Location" / "Seis Loc"
To absorb future BW label drift, add a one-line dict entry — no
new elif branch.
14 new tests cover:
- Each label variant routes to the correct field (parametrised)
- Case-insensitive matching ("seis loc" / "SEIS LOC" / "SeIs LoC")
- Whitespace-collapse ("Seis Loc" with double-space)
- End-to-end parse of a real histogram fixture from
example-events/histogram/ — sensor_location ('Loc #1 - 2652 Hepner...')
populates correctly even though the file uses "Seis. Location"
Total bw_ascii_report tests: 19 → 33. Full SFM suite still green
(69 passed, 44 skipped — pre-existing skips for h5py-dep tests).
Pairs with series3-watcher v1.5.4 (which fixes the filename pairing
so histograms actually reach this parser in the first place).
This commit is contained in:
@@ -265,6 +265,61 @@ def _parse_monitor_ts(s: str) -> Optional[datetime.datetime]:
|
||||
return None
|
||||
|
||||
|
||||
# ── Operator-field label normalisation ──────────────────────────────────────
|
||||
#
|
||||
# BW has used different label spellings across versions and recording
|
||||
# modes for the same operator-supplied fields:
|
||||
#
|
||||
# project: "Project:" / "Project"
|
||||
# client: "Client:" / "Client"
|
||||
# operator: "User Name:" / "User Name"
|
||||
# sensor_location: "Seis Loc:" / "Seis. Location" / "Seis Location"
|
||||
# / "Sensor Location"
|
||||
#
|
||||
# Per user feedback ("the tags themselves dont matter a ton, what
|
||||
# matters is the field"), we normalise labels at lookup time so the
|
||||
# value-extraction works regardless of which spelling BW happens to
|
||||
# emit on a given machine.
|
||||
#
|
||||
# To add a new variant: edit `_OPERATOR_LABEL_MAP` — single source of
|
||||
# truth. Keys are normalised forms (lowercase, trailing colon and
|
||||
# period stripped, internal whitespace collapsed); values are
|
||||
# attribute names on `BwAsciiReport`.
|
||||
|
||||
_OPERATOR_LABEL_MAP = {
|
||||
# project
|
||||
"project": "project",
|
||||
# client
|
||||
"client": "client",
|
||||
# operator
|
||||
"user name": "operator",
|
||||
# sensor location — most variants of "Seis*" + "Sensor Location"
|
||||
"seis loc": "sensor_location",
|
||||
"seis. loc": "sensor_location",
|
||||
"seis. location": "sensor_location",
|
||||
"seis location": "sensor_location",
|
||||
"sensor location": "sensor_location",
|
||||
}
|
||||
|
||||
|
||||
def _normalise_label_for_lookup(key: str) -> str:
|
||||
"""Normalise a label for the operator-field lookup.
|
||||
|
||||
Strips a trailing colon and/or period, collapses internal
|
||||
whitespace runs, and lowercases. So all of:
|
||||
|
||||
"Seis Loc:"
|
||||
"Seis. Location"
|
||||
"seis location"
|
||||
"Sensor Location"
|
||||
|
||||
map to canonical forms in `_OPERATOR_LABEL_MAP`.
|
||||
"""
|
||||
s = key.strip().rstrip(":").rstrip(".").strip()
|
||||
s = _KEY_NORMALISE_RE.sub(" ", s)
|
||||
return s.lower()
|
||||
|
||||
|
||||
# ─────────────────────────────────────────────────────────────────────────────
|
||||
# Top-level parser
|
||||
# ─────────────────────────────────────────────────────────────────────────────
|
||||
@@ -346,12 +401,14 @@ def parse_report(text: Union[str, bytes], *, parse_samples: bool = False) -> BwA
|
||||
report.calibration_date, report.calibration_by = _parse_calibration(value)
|
||||
elif key == "Units": report.units = value
|
||||
|
||||
# Project labels in BW carry their own trailing colon — after
|
||||
# _normalise_key we just strip it for matching.
|
||||
elif key.rstrip(":") == "Project": report.project = value
|
||||
elif key.rstrip(":") == "Client": report.client = value
|
||||
elif key.rstrip(":") == "User Name":report.operator = value
|
||||
elif key.rstrip(":") == "Seis Loc": report.sensor_location = value
|
||||
# Operator-supplied labels (Project / Client / User Name /
|
||||
# Seis Loc) — BW writes these with assorted spellings across
|
||||
# firmware versions and recording modes (e.g. "Seis Loc:" on
|
||||
# waveform exports vs "Seis. Location" on histogram exports).
|
||||
# The label normaliser absorbs all known variants; see
|
||||
# `_OPERATOR_LABEL_MAP` above for the dispatch table.
|
||||
elif (slot := _OPERATOR_LABEL_MAP.get(_normalise_label_for_lookup(key))):
|
||||
setattr(report, slot, value)
|
||||
|
||||
elif key == "Geo Range": report.geo_range_ips = _parse_number(value)
|
||||
|
||||
|
||||
Reference in New Issue
Block a user