Compare commits
9 Commits: bee118506b..main

| SHA1 |
|---|
| d0b66368d5 |
| 25386cab8b |
| 6cb619ecc4 |
| 1ed86244d0 |
| b2c565f217 |
| 43f440812a |
| 23e83908c2 |
| 2eb1d25028 |
| cc821f9ee3 |
@@ -8,6 +8,63 @@ All notable changes to seismo-relay are documented here.
 
 ---
 
+## v0.21.1 — 2026-06-01
+
+Bug fixes against v0.21.0 surfaced after the first prod redeploy. Three
+production-visible symptoms — blank waveform charts on most Thor events,
+blank histogram charts on all Thor events, and a mic chart that
+auto-scaled against a dB(L) value treated as psi — all root-caused and
+fixed.
+
+### Fixed
+
+- **Dynamic IDFW body offset.** The v0.21.0 codec hardcoded the body
+  at file offset `0x0f1f` based on the example corpus, but only ~52%
+  of production IDFW events use that offset; the rest sit at offsets
+  from `0x1033` up to `0x3082` depending on header padding. At
+  `0x0f1f` the codec would find a coincidentally-matching `00 02 00`
+  magic, read the 2-byte Tran preamble, and return empty V/L/M
+  arrays — producing near-empty .h5 files and blank charts.
+  `micromate.idf_file._find_waveform_body_offset()` now scans every
+  `00 02 00` magic position past `0x0E00`, trial-decodes each one,
+  and picks the offset with the most samples. Validated across 483
+  prod IDFW files: 0 preamble-only events (was ~50%), 355/483 fully
+  decode, 126/483 partial (BW codec walker-stops-early on loud
+  events — pre-existing limitation, samples reached are correct).
+
+- **IDFH histograms now render bar charts.** Histograms previously
+  skipped the .h5 write because there are no per-sample arrays, but
+  the renderer drives the per-interval bar chart from .h5 channel
+  data + `bw_report.histogram.n_intervals`. `save_imported_idf` now
+  synthesizes a 1-sample-per-interval array from the decoded
+  `IdfhInterval` peak counts and writes an .h5 so the existing
+  renderer works unchanged — each "sample" is the per-interval peak
+  ADC count, so the writer's `count × geo_fs/32768` conversion
+  yields the right bar height.
+
+- **Mic chart scaling on Thor events.** `PeakValues.micl` (consumed
+  by the h5 writer's per-count mic scale factor) expects psi, but
+  the Thor bridge was stuffing the dB(L) value (~99.4) into it,
+  producing a per-count factor 5+ orders of magnitude too large and
+  a flat-looking mic chart. Fixed by adding `IdfPeaks.mic_pspl_psi`
+  alongside `mic_pspl_dbl`; `read_idf_file()` computes it from
+  binary mic counts (`max(|MicL|) × 2.14e-6 psi/count`) for both
+  IDFW and IDFH paths; `save_imported_idf` merges it onto the typed
+  event after `IdfEvent.from_report`; the bridge feeds psi to
+  `PeakValues.micl` with a dB(L)→psi formula fallback when only the
+  dB(L) value is available. dB(L) for the report header still
+  flows through `bw_report.mic.pspl_dbl` unchanged.
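
The size of the mic-scaling error can be checked by hand with the dB(L)→psi fallback formula quoted in this change set (`psi = 2.9e-9 × 10^(dBL/20)`). The sketch below evaluates it for the ~99.4 dB(L) example; `REF_PRESSURE_PSI` and `dbl_to_psi` are illustrative names, not identifiers from the codebase.

```python
import math

# Reference pressure from the fallback formula in this change set
# (2.9e-9 psi, roughly 20 micropascals). The constant name is illustrative.
REF_PRESSURE_PSI = 2.9e-9


def dbl_to_psi(dbl: float) -> float:
    """Convert a linear-weighted sound level in dB(L) to a pressure in psi."""
    return REF_PRESSURE_PSI * 10.0 ** (dbl / 20.0)


dbl = 99.4
psi = dbl_to_psi(dbl)
# Orders of magnitude by which the raw dB(L) number overshoots the true
# psi value when it is written into a psi field unchanged.
orders_off = math.log10(dbl / psi)
print(f"{dbl} dB(L) = {psi:.3e} psi; dB(L)-as-psi overshoots by ~10^{orders_off:.1f}")
```

For 99.4 dB(L) this gives roughly 2.7e-4 psi, so treating 99.4 itself as psi overshoots by between five and six orders of magnitude, which is consistent with the "5+ orders" figure in the entry above.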
+
+### Operator
+
+After deploy, run `python scripts/backfill_thor_events.py` to refresh
+every existing Thor event's sidecar + .h5 with the corrected codec
+output. The script auto-skips events already at the current
+`TOOL_VERSION`, so the bump from `0.21.0` → `0.21.1` is what triggers
+the refresh.
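
The skip rule can be sketched as a tuple comparison of version strings. This mirrors the `_vtuple` helper in the backfill script in this change set; `needs_refresh` is a hypothetical wrapper added here only for illustration, and it narrows the script's broad `except Exception` to `ValueError`.

```python
def vtuple(version: str) -> tuple:
    """Parse 'major.minor.patch' into ints; fall back to (0, 0, 0) if unparseable."""
    try:
        return tuple(int(part) for part in str(version).split(".")[:3])
    except ValueError:
        return (0, 0, 0)


def needs_refresh(sidecar_version: str, tool_version: str, has_bw_report: bool) -> bool:
    """Refresh when bw_report is missing or the sidecar predates tool_version."""
    return (not has_bw_report) or vtuple(sidecar_version) < vtuple(tool_version)


print(needs_refresh("0.21.0", "0.21.1", has_bw_report=True))   # stale version stamp
print(needs_refresh("0.21.1", "0.21.1", has_bw_report=True))   # already current
print(needs_refresh("", "0.21.1", has_bw_report=True))         # missing version stamp
```

Comparing integer tuples rather than raw strings matters once a component reaches two digits: as strings, `"0.9.0"` sorts after `"0.21.0"`, while the tuples `(0, 9, 0)` and `(0, 21, 0)` order correctly.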
+
+---
+
 ## v0.21.0 — 2026-05-29
 
 The "Thor / Series IV codec" release. Two big pieces landed: (1) the IDF binary codec actually decodes now, both IDFW and IDFH, and (2) a Thor→BW adapter lets Thor events flow through the existing Series III Event Report PDF pipeline. Combined effect: a Thor event ingested via `/db/import/idf_file` now lands in the DB with the same fidelity as a Blastware event, gets a per-event PDF on demand, and renders in Terra-View's modal chart with the same plotting code as a BW event.

+82
-11
@@ -62,12 +62,23 @@ _THOR_PREFIX = b"\x00\x12\x01\x00\x00\x00"
 _BW_STRAY_PREFIX = b"\x10\x00\x01\x80\x00\x00"
 _INSTANTEL_TAG = b"Instantel"
 
-# Constant body offset for sig-A IDFW files (verified across 151/154 corpus
-# files in tests/fixtures/THORDATA_example). The body is the segment-rotated
-# block stream consumed by decode_waveform_v2; bytes [0:3] are the magic
-# ``00 02 00`` preamble.
+# Most common body offset for sig-A IDFW files (~50% of prod events;
+# 151/154 in the original tests/fixtures/THORDATA_example corpus). The
+# body is the segment-rotated block stream consumed by decode_waveform_v2;
+# bytes [0:3] are the magic ``00 02 00`` preamble. Production events
+# routinely use other offsets — see :func:`_find_waveform_body_offset`
+# for the dynamic scan. This constant survives only as the priority hint.
 _BODY_START_SIG_A = 0x0F1F
 
+# Magic bytes that mark a candidate waveform-body preamble.
+_BODY_MAGIC = b"\x00\x02\x00"
+
+# Where to start looking for body candidates inside the file. Skip the
+# fixed-header region where the same magic legitimately appears inside
+# channel-test records and the compliance block (offsets 0x015d, 0x091c,
+# 0x0ae2, 0x0d30 in observed events).
+_BODY_SCAN_FLOOR = 0x0E00
+
 # Geophone count → in/s, derived from sidecar ground truth: the smallest
 # non-zero sample in 1,014-file corpus is 0.0003 in/s.
 _GEO_LSB_IPS = 0.0003
@@ -179,17 +190,65 @@ def extract_binary_metadata(buf: bytes) -> IdfBinaryMetadata:
 # ─── Sample decoder + unit conversion ───────────────────────────────────────
 
 
+def _find_waveform_body_offset(buf: bytes) -> Optional[int]:
+    """Pick the file offset of the waveform body by trial-decoding every
+    ``00 02 00`` magic position past the fixed-header region.
+
+    The body's location isn't fixed across all sig-A IDFW files — about
+    half the production events use ``0x0f1f``, but the rest have offsets
+    that shift based on header padding / channel-config layout. We
+    auto-detect by:
+
+    1. Find every ``00 02 00`` occurrence past ``_BODY_SCAN_FLOOR``.
+    2. Try ``decode_waveform_v2()`` on each candidate.
+    3. Pick the offset whose decoded sample count is largest.
+
+    Returns the offset, or ``None`` if no candidate yielded more than
+    the trivial 2-sample preamble (= "no real body found").
+
+    Costs ~2-8 trial decodes per file; in practice the first candidate
+    past 0x0e00 is usually the right one.
+    """
+    if len(buf) < _BODY_SCAN_FLOOR + 8:
+        return None
+    best: Optional[tuple[int, int]] = None  # (total_samples, offset)
+    i = _BODY_SCAN_FLOOR
+    while True:
+        j = buf.find(_BODY_MAGIC, i)
+        if j < 0:
+            break
+        i = j + 1
+        try:
+            decoded = decode_waveform_v2(buf[j:])
+        except Exception:
+            continue
+        if not decoded:
+            continue
+        total = sum(len(v) for v in decoded.values())
+        # A "real" body has more than just the 2-sample preamble.
+        if total <= 2:
+            continue
+        if best is None or total > best[0]:
+            best = (total, j)
+    return best[1] if best else None
+
+
 def _decode_waveform_samples(buf: bytes) -> Optional[dict]:
-    """Decode samples from the sig-A body starting at file offset 0x0f1f.
+    """Decode samples from the sig-A waveform body.
 
     Returns the raw decoder counts dict — geo LSB = 0.0003 in/s, mic in
     its own count unit (see :func:`mic_count_to_psi`). Returns None if
-    decoding fails.
+    no usable body is found.
+
+    Uses :func:`_find_waveform_body_offset` to locate the body — the
+    file-offset varies across events (~50% sit at the canonical
+    ``0x0f1f`` but the rest don't), so the previous hardcoded constant
+    silently produced 2-sample preamble-only output for half the corpus.
     """
-    if len(buf) < _BODY_START_SIG_A + 8:
+    off = _find_waveform_body_offset(buf)
+    if off is None:
         return None
-    body = buf[_BODY_START_SIG_A:]
-    return decode_waveform_v2(body)
+    return decode_waveform_v2(buf[off:])
 
 
 def geo_count_to_ips(count: int) -> float:
@@ -379,6 +438,10 @@ def read_idf_file(
     peak_tran = max((iv.peak_ips("Tran") for iv in intervals), default=0.0)
     peak_vert = max((iv.peak_ips("Vert") for iv in intervals), default=0.0)
     peak_long = max((iv.peak_ips("Long") for iv in intervals), default=0.0)
+    # Mic peak in psi — Thor stores per-interval mic ADC counts in the
+    # binary; convert the max count to psi via the per-count factor.
+    mic_peak_count = max((iv.peak_count("MicL") for iv in intervals), default=0)
+    mic_peak_psi = mic_count_to_psi(mic_peak_count) if mic_peak_count else None
     rep = IdfReport(
         serial_number=md.serial,
         event_type="Full Histogram",
@@ -392,7 +455,8 @@ def read_idf_file(
         vertical_ips=peak_vert,
         longitudinal_ips=peak_long,
         peak_vector_sum_ips=None,
-        mic_pspl_dbl=None,
+        mic_pspl_dbl=None,  # IDFH binary doesn't carry the dB(L) value
+        mic_pspl_psi=mic_peak_psi,
     )
     event = IdfEvent(
         serial=md.serial or "UNKNOWN",
@@ -430,6 +494,11 @@ def read_idf_file(
         arr = decoded.get(ch, [])
         return geo_count_to_ips(max((abs(v) for v in arr), default=0))
 
+    # Mic peak psi from binary: max absolute MicL ADC count × 2.14e-6 psi/count.
+    mic_arr = decoded.get("MicL", [])
+    mic_peak_count = max((abs(v) for v in mic_arr), default=0)
+    mic_peak_psi = mic_count_to_psi(mic_peak_count) if mic_peak_count else None
+
     peaks = IdfPeaks(
         transverse_ips=_peak_ips("Tran"),
         vertical_ips=_peak_ips("Vert"),
@@ -437,7 +506,9 @@ def read_idf_file(
         # PVS requires aligned per-sample √(T²+V²+L²); leave None — the
         # sidecar carries it and the bridge picks it up if present.
         peak_vector_sum_ips=None,
-        mic_pspl_dbl=None,
+        mic_pspl_dbl=None,  # binary IDFW doesn't carry the dB(L) value;
+                            # sidecar .txt fills it via IdfReport.from_dict
+        mic_pspl_psi=mic_peak_psi,
     )
 
     event = IdfEvent(

+27
-6
@@ -159,12 +159,23 @@ class IdfReport:
 
 @dataclass
 class IdfPeaks:
-    """Geophone + mic peak values for one Thor event. Native Thor units."""
+    """Geophone + mic peak values for one Thor event. Native Thor units.
+
+    Thor stores the mic peak in two parallel forms — ``mic_pspl_dbl`` is
+    what the sidecar's top-level ``MicPSPL`` header field carries (dB(L)),
+    used in the report header. ``mic_pspl_psi`` is the psi value derived
+    either from the IDFW sample table / IDFH interval column 9, or from
+    the binary mic counts (~2.14e-6 psi/count). Needed because the
+    BW-shaped ``PeakValues.micl`` consumed by ``event_hdf5.write_event_hdf5``
+    expects psi — feeding it dB(L) makes the h5 mic-chart scale factor
+    blow up.
+    """
     transverse_ips: Optional[float] = None  # in/s
     vertical_ips: Optional[float] = None  # in/s
     longitudinal_ips: Optional[float] = None  # in/s
     peak_vector_sum_ips: Optional[float] = None  # in/s
     mic_pspl_dbl: Optional[float] = None  # dB(L)
+    mic_pspl_psi: Optional[float] = None  # psi
 
 
 @dataclass
@@ -324,10 +335,14 @@ class IdfEvent:
         machinery without those code paths needing to know about Thor.
 
         Caveats of the bridge:
-        - ``mic_ppv`` on the produced Event carries Thor's dB(L) value
-          verbatim — the UI distinguishes via the ``device_family``
-          column (Phase 1). Don't run the BW psi→dBL converter on
-          Series IV rows.
+        - ``PeakValues.micl`` carries the mic peak in **psi** (matching
+          BW's convention) — set from :attr:`IdfPeaks.mic_pspl_psi`,
+          with a dB(L)→psi fallback when only the dB(L) value is
+          available. This is what the h5 writer's mic-scale-factor
+          logic needs. The dB(L) value still flows through
+          ``bw_report.mic.pspl_dbl`` (set by the
+          ``idf_to_bw_report`` adapter) and the renderer reads it
+          from there for the report header.
         - Many Thor-specific fields (Peak Acceleration / Displacement,
           sensor self-check, calibration) don't have a slot in
           ``Event``. The full IdfReport is preserved on the
@@ -349,11 +364,17 @@ class IdfEvent:
             minute=self.timestamp.minute,
             second=self.timestamp.second,
         )
+        # Resolve mic peak as psi. Priority: binary-derived mic_pspl_psi
+        # (set by read_idf_file) > dB(L)→psi fallback via standard formula
+        # (psi = 2.9e-9 × 10^(dBL/20)) > None.
+        mic_psi = self.peaks.mic_pspl_psi
+        if mic_psi is None and self.peaks.mic_pspl_dbl is not None:
+            mic_psi = 2.9e-9 * (10.0 ** (self.peaks.mic_pspl_dbl / 20.0))
         pv = PeakValues(
             tran=self.peaks.transverse_ips,
             vert=self.peaks.vertical_ips,
             long=self.peaks.longitudinal_ips,
-            micl=self.peaks.mic_pspl_dbl,  # dB(L) — see caveat above
+            micl=mic_psi,  # psi, matching BW's convention (h5 scaling depends on this)
             peak_vector_sum=self.peaks.peak_vector_sum_ips,
         )
         pi = ProjectInfo(

@@ -49,7 +49,7 @@ SIDECAR_KIND = "sfm.event"
 # bumped without a `pip install` re-run — leading to confusing stale
 # version stamps in sidecars. Bump this constant and CHANGELOG.md
 # together at release time.
-TOOL_VERSION = "0.21.0"
+TOOL_VERSION = "0.21.1"
 
 try:
     # Best-effort: prefer the installed metadata when it's NEWER than the

+1
-1
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "seismo-relay"
-version = "0.21.0"
+version = "0.21.1"
 description = "Python client and REST server for MiniMate Plus seismographs"
 requires-python = ">=3.10"
 dependencies = [

@@ -0,0 +1,331 @@
+"""
+scripts/backfill_thor_events.py — re-process existing Thor (Series IV)
+events so their sidecars carry the bw_report block produced by
+``micromate.idf_to_bw_report.build_bw_report_from_idf`` + their .h5
+clean-waveform files for IDFW events.
+
+Why this exists
+───────────────
+
+Thor events ingested before v0.21.0 (or during the v0.21.0 ingest bug
+window fixed in commit bee1185) have sidecars with only
+``extensions.idf_report`` — no ``bw_report`` block. Without
+``bw_report``, the SFM PDF renderer falls back to DB-only fields
+(misses sensor-self-check, full per-channel breakdown, mic dB(L)),
+and the modal chart 404s on ``/waveform.json`` for IDFW events
+because no .h5 was written when the codec failed at ingest.
+
+Re-forwarding from thor-watcher would also fix this, but that requires
+operator coordination on every watcher machine and uses bandwidth this
+script doesn't.
+
+What this does
+──────────────
+
+Walks ``<store>/<serial>/<filename>`` for ``.IDFW`` / ``.IDFH`` files
+and, for each one:
+
+  1. Reads the existing sidecar (preserving review state + captured_at).
+  2. Re-runs ``micromate.idf_file.read_idf_file()`` on the binary
+     bytes — passing ``data=`` so the codec doesn't try to read from
+     a path it doesn't know.
+  3. Pulls ``extensions.idf_report`` (the raw parsed Thor dict the
+     v0.18.0+ ingest path already stashed) and runs the v0.21.0
+     ``build_bw_report_from_idf`` adapter against it.
+  4. Writes the refreshed sidecar with the new ``bw_report``,
+     bumped ``source.tool_version``, but preserved ``review`` block
+     + the original ``captured_at`` timestamp.
+  5. Regenerates the .h5 waveform file via the existing
+     ``event_hdf5`` writer. For IDFW that's the decoded per-sample
+     stream; for IDFH it's a 1-sample-per-interval synthesised array
+     (peak ADC count per channel) so the renderer's bar-chart code
+     has data to group on. Mic peak psi from the binary is merged
+     onto the IdfEvent before the bridge so the h5 writer's per-count
+     mic scale factor lands on a sensible value (without this the
+     mic chart on Thor events plots dB(L)-as-pseudo-psi and shows
+     bomb-level numbers).
+
+Idempotent. Re-running it after a parser/adapter change just
+re-writes sidecars — no DB writes, no thor-watcher coordination.
+
+Usage
+─────
+
+    python scripts/backfill_thor_events.py [--store-root PATH]
+                                           [--dry-run]
+                                           [--skip-hdf5]
+                                           [--force]
+                                           [-v]
+
+By default, refreshes any Thor event whose sidecar is missing
+``bw_report`` OR whose ``source.tool_version`` is older than the
+current ``TOOL_VERSION``. ``--force`` refreshes every Thor event
+regardless.
+"""
+
+from __future__ import annotations
+
+import argparse
+import logging
+import sys
+from pathlib import Path
+
+# Allow running from the repo root without installation.
+sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
+
+from minimateplus import event_file_io
+from sfm.waveform_store import WaveformStore
+
+log = logging.getLogger("backfill_thor_events")
+
+
+def _is_thor_event(path: Path) -> bool:
+    if not path.is_file():
+        return False
+    if path.name.endswith((".sfm.json", ".h5", "_ASCII.TXT")):
+        return False
+    return path.suffix.upper() in (".IDFW", ".IDFH")
+
+
+def _vtuple(s: str) -> tuple:
+    try:
+        return tuple(int(p) for p in str(s).split(".")[:3])
+    except Exception:
+        return (0, 0, 0)
+
+
+def main(argv=None) -> int:
+    p = argparse.ArgumentParser(description=__doc__)
+    p.add_argument(
+        "--db-path",
+        default=str(Path(__file__).resolve().parent.parent / "bridges" / "captures" / "seismo_relay.db"),
+        help="Used only to derive the default --store-root.",
+    )
+    p.add_argument("--store-root", default=None)
+    p.add_argument("--dry-run", action="store_true")
+    p.add_argument("--skip-hdf5", action="store_true",
+                   help="Don't regenerate .h5 files for IDFW events.")
+    p.add_argument("--force", action="store_true",
+                   help="Refresh every Thor event, not just ones with stale or missing bw_report.")
+    p.add_argument("-v", "--verbose", action="store_true")
+    args = p.parse_args(argv)
+
+    logging.basicConfig(
+        level=logging.DEBUG if args.verbose else logging.INFO,
+        format="%(asctime)s %(levelname)-7s %(name)s %(message)s",
+        datefmt="%H:%M:%S",
+    )
+
+    db_path = Path(args.db_path).expanduser().resolve()
+    store_root = (
+        Path(args.store_root).expanduser().resolve()
+        if args.store_root else db_path.parent / "waveforms"
+    )
+    if not store_root.exists():
+        log.error("store root not found: %s", store_root)
+        return 1
+    store = WaveformStore(store_root)
+    log.info("store root: %s", store_root)
+    log.info("current TOOL_VERSION: %s", event_file_io.TOOL_VERSION)
+
+    refreshed = skipped = errors = h5_written = 0
+
+    # Lazy imports so any one of these failing produces a useful error
+    # message rather than crashing module-load.
+    from micromate.idf_file import read_idf_file
+    from micromate.idf_to_bw_report import build_bw_report_from_idf
+
+    for serial_dir in sorted(p for p in store_root.iterdir() if p.is_dir()):
+        serial = serial_dir.name
+        for path in sorted(serial_dir.iterdir()):
+            if not _is_thor_event(path):
+                continue
+
+            sidecar_path = store.sidecar_path_for(serial, path.name)
+            if not sidecar_path.exists():
+                log.debug("%s: no sidecar — skipping (this is a binary without ingest history)",
+                          path.name)
+                skipped += 1
+                continue
+
+            try:
+                existing = event_file_io.read_sidecar(sidecar_path)
+            except Exception as exc:
+                log.warning("%s: failed to read sidecar — %s", path.name, exc)
+                errors += 1
+                continue
+
+            has_bw_report = bool(existing.get("bw_report"))
+            existing_version = (existing.get("source") or {}).get("tool_version", "")
+            up_to_date = (
+                has_bw_report
+                and _vtuple(existing_version) >= _vtuple(event_file_io.TOOL_VERSION)
+            )
+            if up_to_date and not args.force:
+                skipped += 1
+                continue
+
+            # Re-decode the binary. Catch + log; continue with .txt-only
+            # data if it fails (matches the live ingest path's behavior).
+            idf_samples = None
+            idf_intervals = None
+            binary_md = None
+            is_histogram = path.suffix.upper() == ".IDFH"
+            try:
+                binary_bytes = path.read_bytes()
+                res = read_idf_file(path, data=binary_bytes)
+                idf_samples = res.samples or None
+                idf_intervals = res.intervals
+                binary_md = res.binary_metadata
+                is_histogram = res.intervals is not None
+            except NotImplementedError:
+                # sig-B / Blastware-stray binary; no samples but adapter
+                # can still produce a bw_report from extensions.idf_report.
+                log.debug("%s: binary codec NotImplementedError (sig-B / BW-stray); proceeding from sidecar's idf_report only", path.name)
+            except Exception as exc:
+                log.warning("%s: binary decode failed — %s; proceeding from sidecar's idf_report only", path.name, exc)
+
+            # Run the adapter. Pull report_dict from
+            # extensions.idf_report (the v0.18.0+ ingest preserved it).
+            report_dict = (existing.get("extensions") or {}).get("idf_report") or {}
+            if not report_dict and binary_md is None:
+                log.debug("%s: no idf_report in sidecar AND no binary metadata — nothing to project", path.name)
+                skipped += 1
+                continue
+
+            try:
+                bw_report = build_bw_report_from_idf(
+                    report_dict, binary_md=binary_md,
+                    intervals=idf_intervals, is_histogram=is_histogram,
+                )
+            except Exception as exc:
+                log.warning("%s: adapter failed — %s", path.name, exc)
+                errors += 1
+                continue
+
+            # Build the new sidecar by overlaying refreshed fields onto
+            # the existing one — preserves review, captured_at, blastware
+            # block, source.kind, etc.
+            new_sidecar = dict(existing)  # shallow copy
+            new_sidecar["bw_report"] = bw_report
+            src = dict(new_sidecar.get("source") or {})
+            src["tool_version"] = event_file_io.TOOL_VERSION
+            new_sidecar["source"] = src
+
+            # Preserve histogram intervals if the binary decoded them
+            # (improves over the original ingest if that one ran before
+            # the bee1185 codec fix).
+            if idf_intervals is not None:
+                ext = dict(new_sidecar.get("extensions") or {})
+                ext["idf_intervals"] = [
+                    {
+                        "offset": iv.offset,
+                        "tran_peak": iv.peak_count("Tran"),
+                        "tran_halfp": iv.tran_halfp,
+                        "tran_freq": iv.freq_hz("Tran"),
+                        "vert_peak": iv.peak_count("Vert"),
+                        "vert_halfp": iv.vert_halfp,
+                        "vert_freq": iv.freq_hz("Vert"),
+                        "long_peak": iv.peak_count("Long"),
+                        "long_halfp": iv.long_halfp,
+                        "long_freq": iv.freq_hz("Long"),
+                        "mic_peak": iv.peak_count("MicL"),
+                        "mic_halfp": iv.micl_halfp,
+                        "mic_freq": iv.freq_hz("MicL"),
+                    }
+                    for iv in idf_intervals
+                ]
+                new_sidecar["extensions"] = ext
+
+            if args.dry_run:
+                will_write_h5 = (idf_samples or idf_intervals) and not args.skip_hdf5
+                log.info("[DRY] %s/%s — would refresh sidecar (bw_report=%s, h5=%s)",
+                         serial, path.name,
+                         "wrote" if not has_bw_report else "refreshed",
+                         "would write" if will_write_h5 else "skipped")
+            else:
+                event_file_io.write_sidecar(sidecar_path, new_sidecar)
+                log.info("%s/%s — sidecar refreshed (bw_report=%s, intervals=%d)",
+                         serial, path.name,
+                         "added" if not has_bw_report else "refreshed",
+                         len(idf_intervals) if idf_intervals else 0)
+            refreshed += 1
+
+            # Regenerate .h5 by replaying the same IdfEvent → Event bridge
+            # save_imported_idf uses. For IDFW we write the decoded per-
+            # sample arrays. For IDFH we synthesise a 1-sample-per-interval
+            # array (peak ADC count per channel per interval) so the
+            # renderer's bar-chart code has something to group on.
+            # Pre-condition: either real samples (IDFW) or decoded intervals
+            # (IDFH). Skip otherwise.
+            have_data = bool(idf_samples) or bool(idf_intervals)
+            if have_data and not args.skip_hdf5:
+                from sfm import event_hdf5
+                hdf5_path = store.hdf5_path_for(serial, path.name)
+                if args.dry_run:
+                    log.debug("[DRY] would write %s", hdf5_path.name)
+                else:
+                    try:
+                        from micromate import IdfEvent
+                        from minimateplus.event_file_io import file_sha256
+                        idf_event = IdfEvent.from_report(report_dict, path.name)
+
+                        # Merge the binary-derived mic peak psi (only the
+                        # binary path knows the proper psi value; the .txt
+                        # carries dB(L)). Without this, the h5 writer's
+                        # per-count mic factor is computed against the
+                        # dB(L) value-as-pseudo-psi and the mic chart
+                        # scales wildly.
+                        if (binary_md is not None and res is not None
+                                and res.event.peaks.mic_pspl_psi is not None):
+                            idf_event.peaks.mic_pspl_psi = res.event.peaks.mic_pspl_psi
+
+                        sha256 = file_sha256(path)
+                        waveform_key = bytes.fromhex(sha256)[:16]
+                        ev = idf_event.to_minimateplus_event(waveform_key)
+
+                        if is_histogram and idf_intervals:
+                            # 1 sample per interval per channel — same
+                            # synthesis save_imported_idf uses. The h5
+                            # writer's count×geo_fs/32768 conversion turns
+                            # each peak-ADC-count into the bar's physical
+                            # value.
+                            ev.raw_samples = {
+                                "Tran": [iv.peak_count("Tran") for iv in idf_intervals],
+                                "Vert": [iv.peak_count("Vert") for iv in idf_intervals],
+                                "Long": [iv.peak_count("Long") for iv in idf_intervals],
+                                "MicL": [iv.peak_count("MicL") for iv in idf_intervals],
+                            }
+                            ev.total_samples = ev.total_samples or len(idf_intervals)
+                        elif idf_samples:
+                            ev.raw_samples = idf_samples
+                            n_samp = max(
+                                (len(idf_samples.get(ch, []))
+                                 for ch in ("Tran", "Vert", "Long", "MicL")),
+                                default=0,
+                            )
+                            ev.total_samples = ev.total_samples or n_samp
+
+                        event_hdf5.write_event_hdf5(
+                            hdf5_path, ev,
+                            serial=serial,
+                            geo_range="normal",
+                            source_kind="idf-import",
+                            tool_version=event_file_io.TOOL_VERSION,
+                        )
+                        h5_written += 1
+                        log.debug("%s/%s — .h5 written (%s)",
+                                  serial, path.name,
+                                  f"{len(idf_intervals)} intervals" if is_histogram
+                                  else f"{sum(len(v) for v in (idf_samples or {}).values())} samples")
+                    except Exception as exc:
+                        log.warning("%s/%s — .h5 write failed: %s",
+                                    serial, path.name, exc)
+
+    log.info("Done. refreshed=%d skipped=%d errors=%d h5_written=%d",
+             refreshed, skipped, errors, h5_written)
+    return 0 if errors == 0 else 2
+
+
+if __name__ == "__main__":
+    sys.exit(main())
@@ -0,0 +1,91 @@
+"""Re-ingest a prod IDFW + IDFH via the patched save_imported_idf and
+render both PDFs to confirm charts have data."""
+from __future__ import annotations
+import sys
+import json
+import datetime
+import tempfile
+from pathlib import Path
+
+sys.path.insert(0, str(Path(__file__).resolve().parents[1]))
+
+from sfm.waveform_store import WaveformStore
+from sfm import report_pdf
+import h5py
+
+
+class FakeDb:
+    def __init__(self, event):
+        self.event = event
+    def get_event(self, _id):
+        return self.event
+
+
+def to_ts_iso(ts):
+    if ts is None:
+        return None
+    try:
+        return datetime.datetime(ts.year, ts.month, ts.day, ts.hour, ts.minute, ts.second).isoformat()
+    except Exception:
+        return None
+
+
+def render_case(idf_path: Path, serial: str, out_pdf: Path, h5_summary: bool = True):
+    with tempfile.TemporaryDirectory() as td:
+        store = WaveformStore(Path(td))
+        ev, rec = store.save_imported_idf(
+            idf_path.read_bytes(),
+            idf_path,
+            idf_report_text=None,  # production worst case: no .txt
+        )
+        print(f"=== {idf_path.name} ===")
+        print(f" h5: {rec['hdf5_filename']}, sidecar: {rec['sidecar_filename']}")
+
+        h5p = Path(td) / serial / f"{idf_path.name}.h5"
+        if h5p.exists() and h5_summary:
+            with h5py.File(h5p) as h:
+                for ch in ("Tran", "Vert", "Long", "MicL"):
+                    ds = h.get(f"samples/{ch}")
+                    if ds is not None:
+                        n = ds.shape[0]
+                        mx = float(abs(ds[...]).max()) if n else 0
+                        print(f" samples/{ch}: n={n} max_abs={mx:.5f}")
+
+        record_type = "Histogram" if idf_path.suffix.upper() == ".IDFH" else "Waveform"
+        fake_row = {
+            "serial": serial,
+            "blastware_filename": rec["filename"],
+            "record_type": record_type,
+            "timestamp": to_ts_iso(ev.timestamp),
+            "sample_rate": ev.sample_rate,
+            "project": ev.project_info.project if ev.project_info else None,
+            "client": ev.project_info.client if ev.project_info else None,
+            "operator": ev.project_info.operator if ev.project_info else None,
+            "sensor_location": ev.project_info.sensor_location if ev.project_info else None,
+            "created_at": None,
+        }
+        rd = report_pdf.gather_report_data(FakeDb(fake_row), store, event_id="test-1")
+        print(f" ReportData: channels={ {k: len(v) for k,v in rd.channels.items()} }")
+        if rd.is_histogram:
+            print(f" histogram n_intervals={rd.histogram_n_intervals} interval_size={rd.histogram_interval_size}")
+        pdf = report_pdf.render_event_report_pdf(rd)
+        out_pdf.write_bytes(pdf)
+        print(f" PDF: {out_pdf} ({len(pdf)} bytes)")
+
+
+def main():
+    out_dir = Path("/tmp/thor_render_test"); out_dir.mkdir(exist_ok=True)
+    cases = [
+        # IDFW that decoded to preamble-only under the old codec
+        ("/home/serversdown/seismo-relay-prod-snap/waveforms/UM6047/UM6047_20250804154137.IDFW", "UM6047"),
+        # IDFW that worked under the old codec (validates no regression)
+        ("/home/serversdown/seismo-relay-prod-snap/waveforms/UM6047/UM6047_20250804104450.IDFW", "UM6047"),
+        # IDFH histogram
+        ("/home/serversdown/seismo-relay-prod-snap/waveforms/UM6047/UM6047_20250804190047.IDFH", "UM6047"),
+    ]
+    for path, serial in cases:
+        render_case(Path(path), serial, out_dir / f"{Path(path).name}.pdf")
+
+
+if __name__ == "__main__":
+    main()
+57
-18
@@ -638,14 +638,7 @@ def _draw_channel_stats_waveform(ax, rd: ReportData) -> None:
         ("Sensor Check", "sensor_check", ""),
     ]
     _draw_stats_table(ax, rd, rows_spec)
-    if rd.peak_vector_sum_ips is not None:
-        line = f"Peak Vector Sum {rd.peak_vector_sum_ips:.3f} in/s"
-        if rd.peak_vector_sum_time_s is not None:
-            line += f" At {rd.peak_vector_sum_time_s:.3f} sec."
-        ax.text(0.0, -0.08, line, fontsize=9, weight="bold",
-                ha="left", va="top", transform=ax.transAxes)
-    ax.text(0.0, -0.18, "NA: Not Applicable", fontsize=7, color="#888",
-            ha="left", va="top", transform=ax.transAxes)
+    _draw_pvs_summary(ax, rd, n_data_rows=len(rows_spec))
 
 
 def _draw_channel_stats_histogram(ax, rd: ReportData) -> None:
@@ -663,20 +656,54 @@ def _draw_channel_stats_histogram(ax, rd: ReportData) -> None:
         ("Sensor Check", "sensor_check", ""),
     ]
     _draw_stats_table(ax, rd, rows_spec)
-    if rd.peak_vector_sum_ips is not None:
+    _draw_pvs_summary(ax, rd, n_data_rows=len(rows_spec), histogram_when=True)
+
+
+def _draw_pvs_summary(
+    ax,
+    rd: ReportData,
+    *,
+    n_data_rows: int,
+    histogram_when: bool = False,
+) -> None:
+    """Render the Peak Vector Sum + 'NA: Not Applicable' caption below the
+    stats table.
+
+    Reads ``ax._stats_table_bottom`` (set by ``_draw_stats_table`` when
+    it pins the table via an explicit ``bbox``) so the PVS line lands
+    just below the table's known bottom edge instead of guessing at the
+    geometry.
+
+    Centered horizontally for visual balance (the previous left-aligned
+    x=0 landed under the label column, not the data, which looked off).
+    """
+    if rd.peak_vector_sum_ips is None:
+        return
+
     line = f"Peak Vector Sum {rd.peak_vector_sum_ips:.3f} in/s"
-        # Histograms: "0.091 in/s on May 27, 2026 At 06:06:14"
-        # The when_str is "HH:MM:SS Month DD, YYYY" — reformat for BW match.
-        if rd.peak_vector_sum_when_str:
+    if histogram_when and rd.peak_vector_sum_when_str:
+        # Histogram absolute date+time. when_str is "HH:MM:SS Month DD, YYYY";
+        # reformat to "<value> on <date> At <time>" to match BW.
         parts = rd.peak_vector_sum_when_str.split(" ", 1)
         if len(parts) == 2:
             line += f" on {parts[1]} At {parts[0]}"
         else:
             line += f" on {rd.peak_vector_sum_when_str}"
-        ax.text(0.0, -0.08, line, fontsize=9, weight="bold",
-                ha="left", va="top", transform=ax.transAxes)
-    ax.text(0.0, -0.18, "NA: Not Applicable", fontsize=7, color="#888",
-            ha="left", va="top", transform=ax.transAxes)
+    elif not histogram_when and rd.peak_vector_sum_time_s is not None:
+        line += f" At {rd.peak_vector_sum_time_s:.3f} sec."
+
+    # _draw_stats_table stashes the bbox bottom on the axes so we don't
+    # have to guess geometry. Falls back to a conservative default if
+    # the bbox approach hasn't run.
+    table_bottom_y = getattr(ax, "_stats_table_bottom", -0.10)
+    pvs_y = table_bottom_y - 0.04  # small gap below the table border
+
+    # Centered for visual balance — looks intentional rather than offset.
+    # The original BW-replica had a "NA: Not Applicable" caption below
+    # this line; dropped because we use "—" for missing values and the
+    # legend was always squished against the PVS line.
+    ax.text(0.5, pvs_y, line, fontsize=9, weight="bold",
+            ha="center", va="top", transform=ax.transAxes)
 
 
 def _draw_stats_table(ax, rd: ReportData, rows_spec: list[tuple[str, str, str]]) -> None:
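Reviewer note: the string handling in `_draw_pvs_summary` can be exercised without matplotlib. The sketch below pulls only the line-building logic from the hunk above into a standalone function. The name `format_pvs_line` and its keyword parameters are hypothetical (they are not part of this diff) and stand in for the `rd.*` fields.

```python
from typing import Optional


def format_pvs_line(
    pvs_ips: Optional[float],
    *,
    when_str: Optional[str] = None,   # stands in for rd.peak_vector_sum_when_str
    time_s: Optional[float] = None,   # stands in for rd.peak_vector_sum_time_s
    histogram_when: bool = False,
) -> Optional[str]:
    """Build the Peak Vector Sum caption text; None when there is no PVS."""
    if pvs_ips is None:
        return None

    line = f"Peak Vector Sum {pvs_ips:.3f} in/s"
    if histogram_when and when_str:
        # when_str is "HH:MM:SS Month DD, YYYY"; split once on the first
        # space so the date part (which contains spaces) stays intact.
        parts = when_str.split(" ", 1)
        if len(parts) == 2:
            line += f" on {parts[1]} At {parts[0]}"
        else:
            line += f" on {when_str}"
    elif not histogram_when and time_s is not None:
        line += f" At {time_s:.3f} sec."
    return line


if __name__ == "__main__":
    print(format_pvs_line(0.091, when_str="06:06:14 May 27, 2026", histogram_when=True))
    print(format_pvs_line(1.25, time_s=0.5))
```

Splitting with `maxsplit=1` is what keeps "May 27, 2026" together; a plain `split(" ")` would break the date into three pieces.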
@@ -711,16 +738,28 @@ def _draw_stats_table(ax, rd: ReportData, rows_spec: list[tuple[str, str, str]])
             _cell(field_name, "Long"),
             unit,
         ])
+    # Pin the table's position+size via bbox so we know exactly where
+    # the bottom edge lands. Lets _draw_pvs_summary place the PVS line
+    # just below the table without guessing at row heights.
+    #
+    # bbox = [x, y, width, height] in axes coords. Header + data rows
+    # at row_h each; horizontal extent matches sum(colWidths).
+    n_rows = len(table_data)  # header + data rows
+    row_h = 0.12  # axes-fraction per row (fits fontsize=8)
+    table_height = n_rows * row_h
+    table_bottom = 1.0 - table_height
     tbl = ax.table(
-        cellText=table_data, loc="upper left",
+        cellText=table_data,
         colWidths=[0.28, 0.14, 0.14, 0.14, 0.10],
         cellLoc="left", edges="open",
+        bbox=[0.0, table_bottom, 0.80, table_height],
     )
     tbl.auto_set_font_size(False)
     tbl.set_fontsize(8)
-    tbl.scale(1, 1.4)
     for j in range(5):
         tbl[(0, j)].set_text_props(weight="bold", color="#555")
+    # Stash the bottom Y so _draw_pvs_summary can position itself below.
+    ax._stats_table_bottom = table_bottom
 
 
 def _channel_axis_color(ch: str) -> str:
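Reviewer note: the layout arithmetic shared by `_draw_stats_table` and `_draw_pvs_summary` is easy to check in isolation. The sketch below reproduces it as a pure function using the constants visible in the diff (`row_h = 0.12`, width `0.80`, gap `0.04`). The function name `stats_table_geometry` is hypothetical and does not appear in this change.

```python
# Column widths as passed to ax.table() in the diff; they sum to the bbox width.
COL_WIDTHS = [0.28, 0.14, 0.14, 0.14, 0.10]


def stats_table_geometry(n_rows: int, row_h: float = 0.12,
                         width: float = 0.80, gap: float = 0.04):
    """Return (bbox, pvs_y) in axes coordinates.

    bbox is [x, y, width, height] with the table anchored to the top of the
    axes (top edge at y = 1.0). pvs_y is where the Peak Vector Sum line is
    drawn: a small gap below the table's bottom edge.
    """
    table_height = n_rows * row_h
    table_bottom = 1.0 - table_height
    bbox = [0.0, table_bottom, width, table_height]
    pvs_y = table_bottom - gap
    return bbox, pvs_y


if __name__ == "__main__":
    for n in (5, 8, 10):
        bbox, pvs_y = stats_table_geometry(n)
        print(n, [round(v, 2) for v in bbox], round(pvs_y, 2))
```

With these constants the table's bottom edge drops below the axes (negative y) once there are more than eight rows including the header, so the PVS line then relies on the axes not clipping text drawn outside its bounds.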
+31
-3
@@ -568,6 +568,16 @@ class WaveformStore:
         # precedence over the filename timestamp inside from_report().
         idf_event = IdfEvent.from_report(report_dict, source_path.name)
 
+        # The binary mic peak (psi) isn't carried through from_report() —
+        # IdfReport.from_dict only sees the .txt's dB(L) value. Pull the
+        # binary-derived ``mic_pspl_psi`` onto the typed IdfEvent so the
+        # downstream bridge can populate ``PeakValues.micl`` (psi-shaped)
+        # and the h5 writer's per-count mic factor lands at a sensible
+        # value. Without this, the h5 mic chart auto-scales against the
+        # dB(L) value-as-pseudo-psi and renders ~flat.
+        if binary_peaks is not None and binary_peaks.mic_pspl_psi is not None:
+            idf_event.peaks.mic_pspl_psi = binary_peaks.mic_pspl_psi
+
         # Operator-supplied serial_hint wins over the binary's filename
         # prefix when both are present (e.g. callers passing a known-good
         # serial that overrides a misnamed export).
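Reviewer note: to see why a dB(L) number used as psi flattens the mic chart, it helps to compare magnitudes. The sketch below assumes dB(L) is an unweighted sound pressure level referenced to 20 µPa and uses the standard pascal-to-psi factor. Neither the helper names nor these constants come from this diff, and the project's own conversion may differ.

```python
import math

P_REF_PA = 20e-6              # assumed SPL reference pressure: 20 micropascals
PA_PER_PSI = 6894.757293168   # pascals per pound-force per square inch


def db_to_psi(db: float) -> float:
    """Convert a sound pressure level in dB (re 20 uPa) to pressure in psi."""
    return P_REF_PA * 10 ** (db / 20.0) / PA_PER_PSI


def psi_to_db(psi: float) -> float:
    """Convert a pressure in psi to a sound pressure level in dB (re 20 uPa)."""
    return 20.0 * math.log10(psi * PA_PER_PSI / P_REF_PA)


if __name__ == "__main__":
    db = 100.0
    real_psi = db_to_psi(db)
    # Ratio between the dB number misread as psi and the true pressure.
    print(f"{db} dB is about {real_psi:.6f} psi; misread factor ~{db / real_psi:.0f}x")
```

Under these assumptions a 100 dB reading corresponds to roughly 0.0003 psi, so a y-axis scaled for "100 psi" leaves the real trace several orders of magnitude below one pixel, which matches the "renders ~flat" symptom described in the comment.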
@@ -600,10 +610,28 @@ class WaveformStore:
         n_samples = max((len(idf_samples.get(ch, [])) for ch in ("Tran", "Vert", "Long", "MicL")), default=0)
         ev.total_samples = ev.total_samples or n_samples
 
-        # 7. Write the .h5 clean-waveform file when we actually have samples.
-        # Histograms (IDFH) don't have waveform samples — skip h5 for those.
+        # For IDFH histograms there are no per-sample waveform arrays — the
+        # device stores one peak ADC count per interval per channel. Synthesise
+        # a 1-sample-per-interval array so the existing h5+renderer pipeline
+        # (which groups samples down to ``n_intervals`` bars via max-per-group)
+        # produces a non-blank histogram chart. Each "sample" is the peak ADC
+        # count for that interval, so the h5 writer's ``count × geo_fs/32768``
+        # conversion yields the right physical value for the bar height.
+        if is_histogram and idf_intervals:
+            hist_samples = {
+                "Tran": [iv.peak_count("Tran") for iv in idf_intervals],
+                "Vert": [iv.peak_count("Vert") for iv in idf_intervals],
+                "Long": [iv.peak_count("Long") for iv in idf_intervals],
+                "MicL": [iv.peak_count("MicL") for iv in idf_intervals],
+            }
+            ev.raw_samples = hist_samples
+            ev.total_samples = ev.total_samples or len(idf_intervals)
+
+        # 7. Write the .h5 clean-waveform file when we have samples to write
+        # (either the IDFW per-sample stream, or the IDFH synthesised per-
+        # interval peak array). The renderer treats both shapes the same way.
         hdf5_filename: Optional[str] = None
-        if idf_samples is not None and not is_histogram:
+        if ev.raw_samples:
             hdf5_path = self.hdf5_path_for(serial, filename)
             try:
                 event_hdf5.write_event_hdf5(
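Reviewer note: the histogram path above builds one "sample" per interval and relies on the writer's `count × geo_fs/32768` scaling. The sketch below mimics both steps with stand-in types. `Interval`, `synthesize_hist_samples` and `counts_to_physical` are hypothetical names for illustration; only `peak_count(channel)` and the `32768` divisor are taken from the diff, and the 10.0 full-scale value is an arbitrary example.

```python
from dataclasses import dataclass
from typing import Dict, List

CHANNELS = ("Tran", "Vert", "Long", "MicL")


@dataclass
class Interval:
    """Stand-in for one decoded IDFH interval (peak ADC count per channel)."""
    peaks: Dict[str, int]

    def peak_count(self, channel: str) -> int:
        return self.peaks[channel]


def synthesize_hist_samples(intervals: List[Interval]) -> Dict[str, List[int]]:
    """One sample per interval per channel, mirroring the hist_samples dict."""
    return {ch: [iv.peak_count(ch) for iv in intervals] for ch in CHANNELS}


def counts_to_physical(counts: List[int], full_scale: float) -> List[float]:
    """Apply count * full_scale / 32768, as described in the diff comment."""
    return [c * full_scale / 32768 for c in counts]


if __name__ == "__main__":
    intervals = [
        Interval({"Tran": 16384, "Vert": 8192, "Long": 0, "MicL": 4096}),
        Interval({"Tran": 32768, "Vert": 0, "Long": 8192, "MicL": 2048}),
    ]
    samples = synthesize_hist_samples(intervals)
    print(samples["Tran"], counts_to_physical(samples["Tran"], full_scale=10.0))
```

Because each array already has exactly one value per interval, a renderer that takes the max of each group of samples reduces every group of one to the stored peak, which is why the existing bar-chart path needs no histogram-specific branch.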