6 Commits

Author SHA1 Message Date
serversdown 25386cab8b fix(backfill): regenerate IDFH .h5 + merge binary mic_pspl_psi onto bridge
Two gaps in backfill_thor_events.py that left old Thor events showing
stale charts after a v0.21.1 backfill pass:
1. IDFH events were skipped from .h5 regeneration (the "have decoded
   samples" gate was IDFW-only).  Histograms kept their pre-v0.21.1
   .h5 — written from raw_samples = None, which the renderer turned
   into a near-empty bar chart, or for older events the dB(L)-as-pseudo-
   psi mic scale that produced "107.7 psi" peaks (atomic-bomb level
   instead of footstep level).  Fix: synthesise the same 1-sample-per-
   interval array save_imported_idf v0.21.1 uses (peak ADC count per
   channel per interval) so the renderer's bar-chart grouping has
   data to work with.
2. The IDFW h5 path didn't merge binary_peaks.mic_pspl_psi onto the
   IdfEvent before to_minimateplus_event().  The live save_imported_idf
   does this merge — without it, IdfEvent.from_report() only sees the
   .txt's dB(L) value, the bridge falls back to the dBL→psi formula
   (instead of the binary-accurate 2.14e-6 psi/count value), and the
   h5 writer's per-count mic factor lands on a less-correct value.
   Fix: same merge the live ingest does (lift res.event.peaks.mic_pspl_psi
   onto idf_event.peaks before the bridge call).
Verified against UM6047_20250804190047.IDFH (250-interval prod
histogram): 250 intervals decode, mic_pspl_psi = 2.78e-5 (was being
treated as dB(L)=107.7 in the old h5).
Operator: re-run after deploy.  `docker compose exec sfm python
scripts/backfill_thor_events.py` is idempotent — the existing version
check still skips events already at the new TOOL_VERSION, and review
state + captured_at are preserved on the second pass.
2026-06-01 20:02:54 +00:00
serversdown 6cb619ecc4 version bump - 0.21.1 2026-06-01 19:33:44 +00:00
serversdown 1ed86244d0 fix(thor-events): add parallel field for mic psi. Now shows mic in dbl and psi. (psi for charts) 2026-06-01 18:27:24 +00:00
serversdown b2c565f217 fix(idf_waveforms): _find_waveform_body_offset() — scans every 00 02 00 magic past offset 0x0E00, runs decode_waveform_v2 on each candidate, picks the one that returns the most samples. Validated on 483 prod IDFW files: 0 preamble-only events (was ~50%), 355/483 fully decode, 126/483 partial (BW codec walker-stops-early on loud events — known issue).
IDFH now synthesises a 1-sample-per-interval array from the binary intervals and writes an .h5 so the existing renderer works unchanged. Each "sample" is the per-interval peak ADC count → h5_value = count × geo_fs/32768 yields the right bar height.
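The histogram synthesis described above can be sketched as follows. This is a minimal illustration, not the project's code: it assumes each interval is a plain dict of per-channel peak ADC counts (the real pipeline uses `IdfhInterval` objects) and a geophone full scale `geo_fs` of 10.0 in/s, which is an assumed value.

```python
# Hypothetical stand-in for decoded IDFH intervals: one dict of peak ADC
# counts per interval. The real pipeline uses IdfhInterval objects.
CHANNELS = ("Tran", "Vert", "Long", "MicL")


def synthesize_raw_samples(intervals):
    """Build a 1-sample-per-interval array for each channel."""
    return {ch: [iv[ch] for iv in intervals] for ch in CHANNELS}


def count_to_h5_value(count, geo_fs=10.0):
    """Scale a geophone ADC count as count * geo_fs / 32768.

    geo_fs=10.0 in/s is an assumed full-scale value for illustration.
    """
    return count * geo_fs / 32768


intervals = [
    {"Tran": 12, "Vert": 7, "Long": 9, "MicL": 13},
    {"Tran": 30, "Vert": 4, "Long": 11, "MicL": 5},
]
raw = synthesize_raw_samples(intervals)
# One bar height per interval for the Tran channel.
bars = [count_to_h5_value(c) for c in raw["Tran"]]
```

Only the geophone channels use this scale; the mic channel is converted with its own per-count factor.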
2026-05-31 20:51:09 +00:00
serversdown 43f440812a scripts: add backfill_thor_events.py
Refreshes the bw_report sidecar block + .h5 waveform files for Thor
events ingested before the v0.21.0 adapter wiring + the bee1185 codec
fix.  Those events landed with extensions.idf_report only (no
bw_report, no .h5 for IDFW) — symptom on the UI side: the modal chart
404'd on /waveform.json and the PDF rendered from DB-only fields
without sensor self-check, full per-channel breakdown, or mic dB(L).

Walks <store>/<serial>/<filename>:
  - Reads the existing sidecar (preserves review state + captured_at)
  - Re-runs read_idf_file() on the binary bytes (passes data=
    kwarg so codec doesn't try the broken bare-path Path.read_bytes)
  - Reads extensions.idf_report from the existing sidecar
  - Runs build_bw_report_from_idf adapter
  - Writes refreshed sidecar with bw_report + bumped tool_version,
    preserving review block and original captured_at
  - For IDFW: regenerates .h5 by bridging IdfEvent.from_report ->
    to_minimateplus_event -> write_event_hdf5 (mirrors save_imported_idf
    steps 4-7)
  - IDFH events skip .h5 (histograms have no per-sample data)

Skips events already at current TOOL_VERSION with bw_report present.
--force overrides.  --skip-hdf5 limits to sidecar-only refresh.
--dry-run for preview.

Validated against the prod-snap waveform store: 3,815 Thor sidecars
refreshed cleanly with 0 errors, 462 IDFW .h5 files written, 2 skipped
(binaries with no sidecar — backfill doesn't conjure events from
nothing).  Verified one originally-broken IDFW event now serves
waveform.json (200, 168KB) and a fully populated PDF (119KB vs the
previous 56KB sparse output).

Operator workflow on prod:
  docker exec <sfm-container> python3 /app/scripts/backfill_thor_events.py --dry-run
  # Inspect counts, then for real:
  docker exec <sfm-container> python3 /app/scripts/backfill_thor_events.py

Idempotent — re-running it is a no-op once everything's at the current
TOOL_VERSION.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-30 04:37:43 +00:00
serversdown 23e83908c2 report_pdf: fix PVS overlapping stats table, drop NA caption
Three related fixes to the per-channel stats block:

1. Pin the stats table's position via an explicit bbox= on
   ax.table() so the bottom edge is at a known axes-fraction Y.
   The previous loc="upper left" + tbl.scale(1, 1.4) combo let
   matplotlib choose row heights based on text size, which made the
   table extend further below the axes than the hard-coded PVS line
   at y=-0.08 expected.  Result was the "Peak Vector Sum X in/s"
   string landing horizontally inside the Peak Displacement row.

   With bbox=[0, 1-N*0.12, 0.80, N*0.12] the table is pinned to a
   precise rectangle (12% axes-fraction per row × N rows tall).
   _draw_stats_table now stashes the bottom Y on the axes for the
   PVS helper to reference, so the geometry stays in sync.

2. Center PVS horizontally (ha="center" at x=0.5 instead of ha="left"
   at x=0).  The previous left-edge alignment put PVS at the same
   X as the label column, which read as "off-center" once the rest
   of the stats data was column-aligned further right.

3. Drop the "NA: Not Applicable" caption.  It existed to explain
   "—" placeholder cells, but "—" is universally understood and the
   caption was always visually squished against the PVS line below.
   Less cruft on the page; one fewer position to manage.

Verified against a real BE12599 histogram event (5 data rows) and
a real UM12947 IDFW waveform event (6 data rows) — both layouts
clear the table cleanly with no overlap.
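The bbox geometry from item 1 can be sketched as below. The row height and width come from the message; the helper names and the `gap` value are hypothetical and only illustrate how the PVS line can follow the table's bottom edge.

```python
ROW_HEIGHT = 0.12   # axes-fraction height per table row (from the message)
TABLE_WIDTH = 0.80  # axes-fraction width (from the message)


def stats_table_bbox(n_rows):
    """Return [x0, y0, width, height] in axes-fraction coordinates."""
    height = n_rows * ROW_HEIGHT
    return [0.0, 1.0 - height, TABLE_WIDTH, height]


def pvs_anchor(n_rows, gap=0.04):
    """Hypothetical helper: centre the PVS line just below the table.

    The gap value is an assumption, not taken from the commit.
    """
    _, bottom, _, _ = stats_table_bbox(n_rows)
    return 0.5, bottom - gap


bbox5 = stats_table_bbox(5)  # histogram event with 5 data rows
bbox6 = stats_table_bbox(6)  # waveform event with 6 data rows
```

The returned list has the shape matplotlib's `Axes.table(..., bbox=...)` accepts, so the table's bottom edge is known before any text is drawn.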

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-29 22:17:43 +00:00
9 changed files with 684 additions and 46 deletions
+57
@@ -8,6 +8,63 @@ All notable changes to seismo-relay are documented here.
---
## v0.21.1 — 2026-06-01
Bug fixes against v0.21.0 surfaced after the first prod redeploy. Three
production-visible symptoms — blank waveform charts on most Thor events,
blank histogram charts on all Thor events, and a mic chart that
auto-scaled against a dB(L) value treated as psi — all root-caused and
fixed.
### Fixed
- **Dynamic IDFW body offset.** The v0.21.0 codec hardcoded the body
at file offset `0x0f1f` based on the example corpus, but only ~52%
of production IDFW events use that offset; the rest sit at offsets
from `0x1033` up to `0x3082` depending on header padding. At
`0x0f1f` the codec would find a coincidentally-matching `00 02 00`
magic, read the 2-byte Tran preamble, and return empty V/L/M
arrays — producing near-empty .h5 files and blank charts.
`micromate.idf_file._find_waveform_body_offset()` now scans every
`00 02 00` magic position past `0x0E00`, trial-decodes each one,
and picks the offset with the most samples. Validated across 483
prod IDFW files: 0 preamble-only events (was ~50%), 355/483 fully
decode, 126/483 partial (BW codec walker-stops-early on loud
events — pre-existing limitation, samples reached are correct).
- **IDFH histograms now render bar charts.** Histograms previously
skipped the .h5 write because there are no per-sample arrays, but
the renderer drives the per-interval bar chart from .h5 channel
data + `bw_report.histogram.n_intervals`. `save_imported_idf` now
synthesizes a 1-sample-per-interval array from the decoded
`IdfhInterval` peak counts and writes an .h5 so the existing
renderer works unchanged — each "sample" is the per-interval peak
ADC count, so the writer's `count × geo_fs/32768` conversion
yields the right bar height.
- **Mic chart scaling on Thor events.** `PeakValues.micl` (consumed
by the h5 writer's per-count mic scale factor) expects psi, but
the Thor bridge was stuffing the dB(L) value (~99.4) into it,
producing a per-count factor 5+ orders of magnitude too large and
a flat-looking mic chart. Fixed by adding `IdfPeaks.mic_pspl_psi`
alongside `mic_pspl_dbl`; `read_idf_file()` computes it from
binary mic counts (`max(|MicL|) × 2.14e-6 psi/count`) for both
IDFW and IDFH paths; `save_imported_idf` merges it onto the typed
event after `IdfEvent.from_report`; the bridge feeds psi to
`PeakValues.micl` with a dB(L)→psi formula fallback when only the
dB(L) value is available. dB(L) for the report header still
flows through `bw_report.mic.pspl_dbl` unchanged.
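The mic conversions in the last item can be sketched as follows. The two constants are the ones quoted in this diff (2.14e-6 psi/count and the 2.9e-9 psi reference); the function names are hypothetical and do not mirror the project's API.

```python
import math

MIC_PSI_PER_COUNT = 2.14e-6  # binary mic scale quoted in the changelog
PSI_REF = 2.9e-9             # 20 µPa reference pressure expressed in psi


def mic_count_to_psi_sketch(count):
    """Convert a mic ADC count to psi using the per-count factor."""
    return abs(count) * MIC_PSI_PER_COUNT


def dbl_to_psi(dbl):
    """Fallback conversion: psi = 2.9e-9 * 10^(dBL / 20)."""
    return PSI_REF * 10.0 ** (dbl / 20.0)


def psi_to_dbl(psi):
    """Inverse of dbl_to_psi, handy for sanity checks."""
    return 20.0 * math.log10(psi / PSI_REF)


def resolve_mic_psi(mic_pspl_psi, mic_pspl_dbl):
    """Prefer the binary-derived psi, then the dB(L) fallback, else None."""
    if mic_pspl_psi is not None:
        return mic_pspl_psi
    if mic_pspl_dbl is not None:
        return dbl_to_psi(mic_pspl_dbl)
    return None
```

Under these constants a 99.4 dB(L) reading corresponds to a few 1e-4 psi, which shows why passing 99.4 itself as psi inflates the per-count factor by several orders of magnitude.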
### Operator
After deploy, run `python scripts/backfill_thor_events.py` to refresh
every existing Thor event's sidecar + .h5 with the corrected codec
output. The script auto-skips events already at the current
`TOOL_VERSION`, so the bump from `0.21.0` → `0.21.1` is what triggers
the refresh.
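A minimal sketch of the skip gate this paragraph relies on, modelled on the `_vtuple` comparison in the backfill script. `needs_refresh` is a hypothetical name, and the sidecar is assumed to be a plain dict.

```python
def vtuple(version):
    """Parse 'X.Y.Z' into a comparable tuple; fall back to (0, 0, 0)."""
    try:
        return tuple(int(p) for p in str(version).split(".")[:3])
    except ValueError:
        return (0, 0, 0)


def needs_refresh(sidecar, current_version, force=False):
    """True when the sidecar lacks bw_report or carries an older version."""
    has_bw_report = bool(sidecar.get("bw_report"))
    stamped = (sidecar.get("source") or {}).get("tool_version", "")
    up_to_date = has_bw_report and vtuple(stamped) >= vtuple(current_version)
    return force or not up_to_date
```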
---
## v0.21.0 — 2026-05-29
The "Thor / Series IV codec" release. Two big pieces landed: (1) the IDF binary codec actually decodes now, both IDFW and IDFH, and (2) a Thor→BW adapter lets Thor events flow through the existing Series III Event Report PDF pipeline. Combined effect: a Thor event ingested via `/db/import/idf_file` now lands in the DB with the same fidelity as a Blastware event, gets a per-event PDF on demand, and renders in Terra-View's modal chart with the same plotting code as a BW event.
+82 -11
@@ -62,12 +62,23 @@ _THOR_PREFIX = b"\x00\x12\x01\x00\x00\x00"
 _BW_STRAY_PREFIX = b"\x10\x00\x01\x80\x00\x00"
 _INSTANTEL_TAG = b"Instantel"

-# Constant body offset for sig-A IDFW files (verified across 151/154 corpus
-# files in tests/fixtures/THORDATA_example). The body is the segment-rotated
-# block stream consumed by decode_waveform_v2; bytes [0:3] are the magic
-# ``00 02 00`` preamble.
+# Most common body offset for sig-A IDFW files (~50% of prod events;
+# 151/154 in the original tests/fixtures/THORDATA_example corpus). The
+# body is the segment-rotated block stream consumed by decode_waveform_v2;
+# bytes [0:3] are the magic ``00 02 00`` preamble. Production events
+# routinely use other offsets — see :func:`_find_waveform_body_offset`
+# for the dynamic scan. This constant survives only as the priority hint.
 _BODY_START_SIG_A = 0x0F1F

+# Magic bytes that mark a candidate waveform-body preamble.
+_BODY_MAGIC = b"\x00\x02\x00"
+
+# Where to start looking for body candidates inside the file. Skip the
+# fixed-header region where the same magic legitimately appears inside
+# channel-test records and the compliance block (offsets 0x015d, 0x091c,
+# 0x0ae2, 0x0d30 in observed events).
+_BODY_SCAN_FLOOR = 0x0E00
+
 # Geophone count → in/s, derived from sidecar ground truth: the smallest
 # non-zero sample in 1,014-file corpus is 0.0003 in/s.
 _GEO_LSB_IPS = 0.0003
@@ -179,17 +190,65 @@ def extract_binary_metadata(buf: bytes) -> IdfBinaryMetadata:
 # ─── Sample decoder + unit conversion ───────────────────────────────────────

+def _find_waveform_body_offset(buf: bytes) -> Optional[int]:
+    """Pick the file offset of the waveform body by trial-decoding every
+    ``00 02 00`` magic position past the fixed-header region.
+
+    The body's location isn't fixed across all sig-A IDFW files — about
+    half the production events use ``0x0f1f``, but the rest have offsets
+    that shift based on header padding / channel-config layout. We
+    auto-detect by:
+
+    1. Find every ``00 02 00`` occurrence past ``_BODY_SCAN_FLOOR``.
+    2. Try ``decode_waveform_v2()`` on each candidate.
+    3. Pick the offset whose decoded sample count is largest.
+
+    Returns the offset, or ``None`` if no candidate yielded more than
+    the trivial 2-sample preamble (= "no real body found").
+
+    Costs ~2-8 trial decodes per file; in practice the first candidate
+    past 0x0e00 is usually the right one.
+    """
+    if len(buf) < _BODY_SCAN_FLOOR + 8:
+        return None
+    best: Optional[tuple[int, int]] = None  # (total_samples, offset)
+    i = _BODY_SCAN_FLOOR
+    while True:
+        j = buf.find(_BODY_MAGIC, i)
+        if j < 0:
+            break
+        i = j + 1
+        try:
+            decoded = decode_waveform_v2(buf[j:])
+        except Exception:
+            continue
+        if not decoded:
+            continue
+        total = sum(len(v) for v in decoded.values())
+        # A "real" body has more than just the 2-sample preamble.
+        if total <= 2:
+            continue
+        if best is None or total > best[0]:
+            best = (total, j)
+    return best[1] if best else None
+
+
 def _decode_waveform_samples(buf: bytes) -> Optional[dict]:
-    """Decode samples from the sig-A body starting at file offset 0x0f1f.
+    """Decode samples from the sig-A waveform body.

     Returns the raw decoder counts dict — geo LSB = 0.0003 in/s, mic in
     its own count unit (see :func:`mic_count_to_psi`). Returns None if
-    decoding fails.
+    no usable body is found.
+
+    Uses :func:`_find_waveform_body_offset` to locate the body — the
+    file-offset varies across events (~50% sit at the canonical
+    ``0x0f1f`` but the rest don't), so the previous hardcoded constant
+    silently produced 2-sample preamble-only output for half the corpus.
     """
-    if len(buf) < _BODY_START_SIG_A + 8:
+    off = _find_waveform_body_offset(buf)
+    if off is None:
         return None
-    body = buf[_BODY_START_SIG_A:]
-    return decode_waveform_v2(body)
+    return decode_waveform_v2(buf[off:])


 def geo_count_to_ips(count: int) -> float:
@@ -379,6 +438,10 @@ def read_idf_file(
         peak_tran = max((iv.peak_ips("Tran") for iv in intervals), default=0.0)
         peak_vert = max((iv.peak_ips("Vert") for iv in intervals), default=0.0)
         peak_long = max((iv.peak_ips("Long") for iv in intervals), default=0.0)
+        # Mic peak in psi — Thor stores per-interval mic ADC counts in the
+        # binary; convert the max count to psi via the per-count factor.
+        mic_peak_count = max((iv.peak_count("MicL") for iv in intervals), default=0)
+        mic_peak_psi = mic_count_to_psi(mic_peak_count) if mic_peak_count else None
         rep = IdfReport(
             serial_number=md.serial,
             event_type="Full Histogram",
@@ -392,7 +455,8 @@ def read_idf_file(
             vertical_ips=peak_vert,
             longitudinal_ips=peak_long,
             peak_vector_sum_ips=None,
-            mic_pspl_dbl=None,
+            mic_pspl_dbl=None,  # IDFH binary doesn't carry the dB(L) value
+            mic_pspl_psi=mic_peak_psi,
         )
         event = IdfEvent(
             serial=md.serial or "UNKNOWN",
@@ -430,6 +494,11 @@ def read_idf_file(
             arr = decoded.get(ch, [])
             return geo_count_to_ips(max((abs(v) for v in arr), default=0))

+        # Mic peak psi from binary: max absolute MicL ADC count × 2.14e-6 psi/count.
+        mic_arr = decoded.get("MicL", [])
+        mic_peak_count = max((abs(v) for v in mic_arr), default=0)
+        mic_peak_psi = mic_count_to_psi(mic_peak_count) if mic_peak_count else None
+
         peaks = IdfPeaks(
             transverse_ips=_peak_ips("Tran"),
             vertical_ips=_peak_ips("Vert"),
@@ -437,7 +506,9 @@ def read_idf_file(
             # PVS requires aligned per-sample √(T²+V²+L²); leave None — the
             # sidecar carries it and the bridge picks it up if present.
             peak_vector_sum_ips=None,
-            mic_pspl_dbl=None,
+            mic_pspl_dbl=None,  # binary IDFW doesn't carry the dB(L) value;
+                                # sidecar .txt fills it via IdfReport.from_dict
+            mic_pspl_psi=mic_peak_psi,
         )
         event = IdfEvent(
+27 -6
@@ -159,12 +159,23 @@ class IdfReport:
 @dataclass
 class IdfPeaks:
-    """Geophone + mic peak values for one Thor event. Native Thor units."""
+    """Geophone + mic peak values for one Thor event. Native Thor units.
+
+    Thor stores the mic peak in two parallel forms — ``mic_pspl_dbl`` is
+    what the sidecar's top-level ``MicPSPL`` header field carries (dB(L)),
+    used in the report header. ``mic_pspl_psi`` is the psi value derived
+    either from the IDFW sample table / IDFH interval column 9, or from
+    the binary mic counts (~2.14e-6 psi/count). Needed because the
+    BW-shaped ``PeakValues.micl`` consumed by ``event_hdf5.write_event_hdf5``
+    expects psi — feeding it dB(L) makes the h5 mic-chart scale factor
+    blow up.
+    """
     transverse_ips: Optional[float] = None  # in/s
     vertical_ips: Optional[float] = None  # in/s
     longitudinal_ips: Optional[float] = None  # in/s
     peak_vector_sum_ips: Optional[float] = None  # in/s
     mic_pspl_dbl: Optional[float] = None  # dB(L)
+    mic_pspl_psi: Optional[float] = None  # psi


 @dataclass
@@ -324,10 +335,14 @@ class IdfEvent:
         machinery without those code paths needing to know about Thor.

         Caveats of the bridge:
-          - ``mic_ppv`` on the produced Event carries Thor's dB(L) value
-            verbatim — the UI distinguishes via the ``device_family``
-            column (Phase 1). Don't run the BW psi→dBL converter on
-            Series IV rows.
+          - ``PeakValues.micl`` carries the mic peak in **psi** (matching
+            BW's convention) — set from :attr:`IdfPeaks.mic_pspl_psi`,
+            with a dB(L)→psi fallback when only the dB(L) value is
+            available. This is what the h5 writer's mic-scale-factor
+            logic needs. The dB(L) value still flows through
+            ``bw_report.mic.pspl_dbl`` (set by the
+            ``idf_to_bw_report`` adapter) and the renderer reads it
+            from there for the report header.
           - Many Thor-specific fields (Peak Acceleration / Displacement,
             sensor self-check, calibration) don't have a slot in
             ``Event``. The full IdfReport is preserved on the
@@ -349,11 +364,17 @@ class IdfEvent:
             minute=self.timestamp.minute,
             second=self.timestamp.second,
         )
+        # Resolve mic peak as psi. Priority: binary-derived mic_pspl_psi
+        # (set by read_idf_file) > dB(L)→psi fallback via standard formula
+        # (psi = 2.9e-9 × 10^(dBL/20)) > None.
+        mic_psi = self.peaks.mic_pspl_psi
+        if mic_psi is None and self.peaks.mic_pspl_dbl is not None:
+            mic_psi = 2.9e-9 * (10.0 ** (self.peaks.mic_pspl_dbl / 20.0))
         pv = PeakValues(
             tran=self.peaks.transverse_ips,
             vert=self.peaks.vertical_ips,
             long=self.peaks.longitudinal_ips,
-            micl=self.peaks.mic_pspl_dbl,  # dB(L) — see caveat above
+            micl=mic_psi,  # psi, matching BW's convention (h5 scaling depends on this)
             peak_vector_sum=self.peaks.peak_vector_sum_ips,
         )
         pi = ProjectInfo(
+1 -1
@@ -49,7 +49,7 @@ SIDECAR_KIND = "sfm.event"
 # bumped without a `pip install` re-run — leading to confusing stale
 # version stamps in sidecars. Bump this constant and CHANGELOG.md
 # together at release time.
-TOOL_VERSION = "0.21.0"
+TOOL_VERSION = "0.21.1"

 try:
     # Best-effort: prefer the installed metadata when it's NEWER than the
+1 -1
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 [project]
 name = "seismo-relay"
-version = "0.21.0"
+version = "0.21.1"
 description = "Python client and REST server for MiniMate Plus seismographs"
 requires-python = ">=3.10"
 dependencies = [
+331
@@ -0,0 +1,331 @@
"""
scripts/backfill_thor_events.py — re-process existing Thor (Series IV)
events so their sidecars carry the bw_report block produced by
``micromate.idf_to_bw_report.build_bw_report_from_idf`` + their .h5
clean-waveform files for IDFW events.
Why this exists
───────────────
Thor events ingested before v0.21.0 (or during the v0.21.0 ingest bug
window fixed in commit bee1185) have sidecars with only
``extensions.idf_report`` — no ``bw_report`` block. Without
``bw_report``, the SFM PDF renderer falls back to DB-only fields
(misses sensor-self-check, full per-channel breakdown, mic dB(L)),
and the modal chart 404s on ``/waveform.json`` for IDFW events
because no .h5 was written when the codec failed at ingest.
Re-forwarding from thor-watcher would also fix this, but that requires
operator coordination on every watcher machine and uses bandwidth this
script doesn't.
What this does
──────────────
Walks ``<store>/<serial>/<filename>`` for ``.IDFW`` / ``.IDFH`` files
and, for each one:
1. Reads the existing sidecar (preserving review state + captured_at).
2. Re-runs ``micromate.idf_file.read_idf_file()`` on the binary
bytes — passing ``data=`` so the codec doesn't try to read from
a path it doesn't know.
3. Pulls ``extensions.idf_report`` (the raw parsed Thor dict the
v0.18.0+ ingest path already stashed) and runs the v0.21.0
``build_bw_report_from_idf`` adapter against it.
4. Writes the refreshed sidecar with the new ``bw_report``,
bumped ``source.tool_version``, but preserved ``review`` block
+ the original ``captured_at`` timestamp.
5. Regenerates the .h5 waveform file via the existing
``event_hdf5`` writer. For IDFW that's the decoded per-sample
stream; for IDFH it's a 1-sample-per-interval synthesised array
(peak ADC count per channel) so the renderer's bar-chart code
has data to group on. Mic peak psi from the binary is merged
onto the IdfEvent before the bridge so the h5 writer's per-count
mic scale factor lands on a sensible value (without this the
mic chart on Thor events plots dB(L)-as-pseudo-psi and shows
bomb-level numbers).
Idempotent. Re-running it after a parser/adapter change just
re-writes sidecars — no DB writes, no thor-watcher coordination.
Usage
─────
python scripts/backfill_thor_events.py [--store-root PATH]
[--dry-run]
[--skip-hdf5]
[--force]
[-v]
By default, refreshes any Thor event whose sidecar is missing
``bw_report`` OR whose ``source.tool_version`` is older than the
current ``TOOL_VERSION``. ``--force`` refreshes every Thor event
regardless.
"""
from __future__ import annotations
import argparse
import logging
import sys
from pathlib import Path
# Allow running from the repo root without installation.
sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
from minimateplus import event_file_io
from sfm.waveform_store import WaveformStore
log = logging.getLogger("backfill_thor_events")
def _is_thor_event(path: Path) -> bool:
    if not path.is_file():
        return False
    if path.name.endswith((".sfm.json", ".h5", "_ASCII.TXT")):
        return False
    return path.suffix.upper() in (".IDFW", ".IDFH")


def _vtuple(s: str) -> tuple:
    try:
        return tuple(int(p) for p in str(s).split(".")[:3])
    except Exception:
        return (0, 0, 0)


def main(argv=None) -> int:
    p = argparse.ArgumentParser(description=__doc__)
    p.add_argument(
        "--db-path",
        default=str(Path(__file__).resolve().parent.parent / "bridges" / "captures" / "seismo_relay.db"),
        help="Used only to derive the default --store-root.",
    )
    p.add_argument("--store-root", default=None)
    p.add_argument("--dry-run", action="store_true")
    p.add_argument("--skip-hdf5", action="store_true",
                   help="Don't regenerate .h5 files for IDFW events.")
    p.add_argument("--force", action="store_true",
                   help="Refresh every Thor event, not just ones with stale or missing bw_report.")
    p.add_argument("-v", "--verbose", action="store_true")
    args = p.parse_args(argv)

    logging.basicConfig(
        level=logging.DEBUG if args.verbose else logging.INFO,
        format="%(asctime)s %(levelname)-7s %(name)s %(message)s",
        datefmt="%H:%M:%S",
    )

    db_path = Path(args.db_path).expanduser().resolve()
    store_root = (
        Path(args.store_root).expanduser().resolve()
        if args.store_root else db_path.parent / "waveforms"
    )
    if not store_root.exists():
        log.error("store root not found: %s", store_root)
        return 1

    store = WaveformStore(store_root)
    log.info("store root: %s", store_root)
    log.info("current TOOL_VERSION: %s", event_file_io.TOOL_VERSION)

    refreshed = skipped = errors = h5_written = 0

    # Lazy imports so any one of these failing produces a useful error
    # message rather than crashing module-load.
    from micromate.idf_file import read_idf_file
    from micromate.idf_to_bw_report import build_bw_report_from_idf

    for serial_dir in sorted(p for p in store_root.iterdir() if p.is_dir()):
        serial = serial_dir.name
        for path in sorted(serial_dir.iterdir()):
            if not _is_thor_event(path):
                continue
            sidecar_path = store.sidecar_path_for(serial, path.name)
            if not sidecar_path.exists():
                log.debug("%s: no sidecar — skipping (this is a binary without ingest history)",
                          path.name)
                skipped += 1
                continue
            try:
                existing = event_file_io.read_sidecar(sidecar_path)
            except Exception as exc:
                log.warning("%s: failed to read sidecar — %s", path.name, exc)
                errors += 1
                continue

            has_bw_report = bool(existing.get("bw_report"))
            existing_version = (existing.get("source") or {}).get("tool_version", "")
            up_to_date = (
                has_bw_report
                and _vtuple(existing_version) >= _vtuple(event_file_io.TOOL_VERSION)
            )
            if up_to_date and not args.force:
                skipped += 1
                continue

            # Re-decode the binary. Catch + log; continue with .txt-only
            # data if it fails (matches the live ingest path's behavior).
            idf_samples = None
            idf_intervals = None
            binary_md = None
            is_histogram = path.suffix.upper() == ".IDFH"
            try:
                binary_bytes = path.read_bytes()
                res = read_idf_file(path, data=binary_bytes)
                idf_samples = res.samples or None
                idf_intervals = res.intervals
                binary_md = res.binary_metadata
                is_histogram = res.intervals is not None
            except NotImplementedError:
                # sig-B / Blastware-stray binary; no samples but adapter
                # can still produce a bw_report from extensions.idf_report.
                log.debug("%s: binary codec NotImplementedError (sig-B / BW-stray); proceeding from sidecar's idf_report only", path.name)
            except Exception as exc:
                log.warning("%s: binary decode failed — %s; proceeding from sidecar's idf_report only", path.name, exc)

            # Run the adapter. Pull report_dict from
            # extensions.idf_report (the v0.18.0+ ingest preserved it).
            report_dict = (existing.get("extensions") or {}).get("idf_report") or {}
            if not report_dict and binary_md is None:
                log.debug("%s: no idf_report in sidecar AND no binary metadata — nothing to project", path.name)
                skipped += 1
                continue
            try:
                bw_report = build_bw_report_from_idf(
                    report_dict, binary_md=binary_md,
                    intervals=idf_intervals, is_histogram=is_histogram,
                )
            except Exception as exc:
                log.warning("%s: adapter failed — %s", path.name, exc)
                errors += 1
                continue

            # Build the new sidecar by overlaying refreshed fields onto
            # the existing one — preserves review, captured_at, blastware
            # block, source.kind, etc.
            new_sidecar = dict(existing)  # shallow copy
            new_sidecar["bw_report"] = bw_report
            src = dict(new_sidecar.get("source") or {})
            src["tool_version"] = event_file_io.TOOL_VERSION
            new_sidecar["source"] = src
            # Preserve histogram intervals if the binary decoded them
            # (improves over the original ingest if that one ran before
            # the bee1185 codec fix).
            if idf_intervals is not None:
                ext = dict(new_sidecar.get("extensions") or {})
                ext["idf_intervals"] = [
                    {
                        "offset": iv.offset,
                        "tran_peak": iv.peak_count("Tran"),
                        "tran_halfp": iv.tran_halfp,
                        "tran_freq": iv.freq_hz("Tran"),
                        "vert_peak": iv.peak_count("Vert"),
                        "vert_halfp": iv.vert_halfp,
                        "vert_freq": iv.freq_hz("Vert"),
                        "long_peak": iv.peak_count("Long"),
                        "long_halfp": iv.long_halfp,
                        "long_freq": iv.freq_hz("Long"),
                        "mic_peak": iv.peak_count("MicL"),
                        "mic_halfp": iv.micl_halfp,
                        "mic_freq": iv.freq_hz("MicL"),
                    }
                    for iv in idf_intervals
                ]
                new_sidecar["extensions"] = ext

            if args.dry_run:
                will_write_h5 = (idf_samples or idf_intervals) and not args.skip_hdf5
                log.info("[DRY] %s/%s — would refresh sidecar (bw_report=%s, h5=%s)",
                         serial, path.name,
                         "wrote" if not has_bw_report else "refreshed",
                         "would write" if will_write_h5 else "skipped")
            else:
                event_file_io.write_sidecar(sidecar_path, new_sidecar)
                log.info("%s/%s — sidecar refreshed (bw_report=%s, intervals=%d)",
                         serial, path.name,
                         "added" if not has_bw_report else "refreshed",
                         len(idf_intervals) if idf_intervals else 0)
            refreshed += 1

        # Regenerate .h5 by replaying the same IdfEvent → Event bridge
        # save_imported_idf uses. For IDFW we write the decoded per-
        # sample arrays. For IDFH we synthesise a 1-sample-per-interval
        # array (peak ADC count per channel per interval) so the
        # renderer's bar-chart code has something to group on.
        # Pre-condition: either real samples (IDFW) or decoded intervals
        # (IDFH). Skip otherwise.
        have_data = bool(idf_samples) or bool(idf_intervals)
        if have_data and not args.skip_hdf5:
            from sfm import event_hdf5
            hdf5_path = store.hdf5_path_for(serial, path.name)
            if args.dry_run:
                log.debug("[DRY] would write %s", hdf5_path.name)
            else:
                try:
                    from micromate import IdfEvent
                    from minimateplus.event_file_io import file_sha256
                    idf_event = IdfEvent.from_report(report_dict, path.name)
                    # Merge the binary-derived mic peak psi (only the
                    # binary path knows the proper psi value; the .txt
                    # carries dB(L)). Without this, the h5 writer's
                    # per-count mic factor is computed against the
                    # dB(L) value-as-pseudo-psi and the mic chart
                    # scales wildly.
                    if (binary_md is not None and res is not None
                            and res.event.peaks.mic_pspl_psi is not None):
                        idf_event.peaks.mic_pspl_psi = res.event.peaks.mic_pspl_psi
                    sha256 = file_sha256(path)
                    waveform_key = bytes.fromhex(sha256)[:16]
                    ev = idf_event.to_minimateplus_event(waveform_key)
                    if is_histogram and idf_intervals:
                        # 1 sample per interval per channel — same
                        # synthesis save_imported_idf uses. The h5
                        # writer's count×geo_fs/32768 conversion turns
                        # each peak-ADC-count into the bar's physical
                        # value.
                        ev.raw_samples = {
                            "Tran": [iv.peak_count("Tran") for iv in idf_intervals],
                            "Vert": [iv.peak_count("Vert") for iv in idf_intervals],
                            "Long": [iv.peak_count("Long") for iv in idf_intervals],
                            "MicL": [iv.peak_count("MicL") for iv in idf_intervals],
                        }
                        ev.total_samples = ev.total_samples or len(idf_intervals)
                    elif idf_samples:
                        ev.raw_samples = idf_samples
                        n_samp = max(
                            (len(idf_samples.get(ch, []))
                             for ch in ("Tran", "Vert", "Long", "MicL")),
                            default=0,
                        )
                        ev.total_samples = ev.total_samples or n_samp
                    event_hdf5.write_event_hdf5(
                        hdf5_path, ev,
                        serial=serial,
                        geo_range="normal",
                        source_kind="idf-import",
                        tool_version=event_file_io.TOOL_VERSION,
                    )
                    h5_written += 1
                    log.debug("%s/%s — .h5 written (%s)",
                              serial, path.name,
                              f"{len(idf_intervals)} intervals" if is_histogram
                              else f"{sum(len(v) for v in (idf_samples or {}).values())} samples")
                except Exception as exc:
                    log.warning("%s/%s — .h5 write failed: %s",
                                serial, path.name, exc)

    log.info("Done. refreshed=%d skipped=%d errors=%d h5_written=%d",
             refreshed, skipped, errors, h5_written)
    return 0 if errors == 0 else 2


if __name__ == "__main__":
    sys.exit(main())
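
The histogram branch above can be sketched in isolation. This is a minimal illustration under stated assumptions, not the project's code: `Interval` is a hypothetical stand-in for the decoded IDFH interval type (only the `peak_count(channel)` call is taken from the diff), and `GEO_FS_IPS = 10.0` is an assumed geophone full-scale value picked for the example. The `count × geo_fs / 32768` scaling is the conversion the comments attribute to the h5 writer, and the key derivation assumes `file_sha256` returns a hex digest string, as the `bytes.fromhex` call implies.

```python
import hashlib
from dataclasses import dataclass

CHANNELS = ("Tran", "Vert", "Long", "MicL")
GEO_FS_IPS = 10.0  # ASSUMPTION: geophone full scale in in/s, for illustration only


@dataclass
class Interval:
    """Hypothetical stand-in for one decoded IDFH histogram interval."""
    peaks: dict  # channel name -> peak ADC count recorded for this interval

    def peak_count(self, channel: str) -> int:
        return self.peaks[channel]


def synthesise_raw_samples(intervals):
    """Build one pseudo-sample per interval per channel (the interval's peak count)."""
    return {ch: [iv.peak_count(ch) for iv in intervals] for ch in CHANNELS}


def count_to_ips(count: int, geo_fs: float = GEO_FS_IPS) -> float:
    """Scale a signed 16-bit ADC count to a velocity: count * geo_fs / 32768."""
    return count * geo_fs / 32768


def waveform_key_from_bytes(data: bytes) -> bytes:
    """First 16 bytes of the SHA-256 digest, like bytes.fromhex(sha256)[:16]."""
    return bytes.fromhex(hashlib.sha256(data).hexdigest())[:16]


intervals = [
    Interval({"Tran": 300, "Vert": 150, "Long": 75, "MicL": 13}),
    Interval({"Tran": 16384, "Vert": 0, "Long": 8192, "MicL": 20}),
]
raw_samples = synthesise_raw_samples(intervals)
bar_heights = [count_to_ips(c) for c in raw_samples["Tran"]]
key = waveform_key_from_bytes(b"example IDFH payload")
```

Each list in `raw_samples` has one entry per interval, so a renderer that groups samples into `n_intervals` bars ends up with exactly one value per bar. Only the geophone channel is converted here; the microphone channel uses a different per-count factor that this sketch does not model.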
+91
@@ -0,0 +1,91 @@
"""Re-ingest a prod IDFW + IDFH via the patched save_imported_idf and
render both PDFs to confirm charts have data."""
from __future__ import annotations

import sys
import json
import datetime
import tempfile
from pathlib import Path

sys.path.insert(0, str(Path(__file__).resolve().parents[1]))

from sfm.waveform_store import WaveformStore
from sfm import report_pdf
import h5py


class FakeDb:
    def __init__(self, event):
        self.event = event

    def get_event(self, _id):
        return self.event


def to_ts_iso(ts):
    if ts is None:
        return None
    try:
        return datetime.datetime(ts.year, ts.month, ts.day, ts.hour, ts.minute, ts.second).isoformat()
    except Exception:
        return None


def render_case(idf_path: Path, serial: str, out_pdf: Path, h5_summary: bool = True):
    with tempfile.TemporaryDirectory() as td:
        store = WaveformStore(Path(td))
        ev, rec = store.save_imported_idf(
            idf_path.read_bytes(),
            idf_path,
            idf_report_text=None,  # production worst case: no .txt
        )
        print(f"=== {idf_path.name} ===")
        print(f" h5: {rec['hdf5_filename']}, sidecar: {rec['sidecar_filename']}")
        h5p = Path(td) / serial / f"{idf_path.name}.h5"
        if h5p.exists() and h5_summary:
            with h5py.File(h5p) as h:
                for ch in ("Tran", "Vert", "Long", "MicL"):
                    ds = h.get(f"samples/{ch}")
                    if ds is not None:
                        n = ds.shape[0]
                        mx = float(abs(ds[...]).max()) if n else 0
                        print(f" samples/{ch}: n={n} max_abs={mx:.5f}")
        record_type = "Histogram" if idf_path.suffix.upper() == ".IDFH" else "Waveform"
        fake_row = {
            "serial": serial,
            "blastware_filename": rec["filename"],
            "record_type": record_type,
            "timestamp": to_ts_iso(ev.timestamp),
            "sample_rate": ev.sample_rate,
            "project": ev.project_info.project if ev.project_info else None,
            "client": ev.project_info.client if ev.project_info else None,
            "operator": ev.project_info.operator if ev.project_info else None,
            "sensor_location": ev.project_info.sensor_location if ev.project_info else None,
            "created_at": None,
        }
        rd = report_pdf.gather_report_data(FakeDb(fake_row), store, event_id="test-1")
        print(f" ReportData: channels={ {k: len(v) for k,v in rd.channels.items()} }")
        if rd.is_histogram:
            print(f" histogram n_intervals={rd.histogram_n_intervals} interval_size={rd.histogram_interval_size}")
        pdf = report_pdf.render_event_report_pdf(rd)
        out_pdf.write_bytes(pdf)
        print(f" PDF: {out_pdf} ({len(pdf)} bytes)")


def main():
    out_dir = Path("/tmp/thor_render_test"); out_dir.mkdir(exist_ok=True)
    cases = [
        # IDFW that decoded to preamble-only under the old codec
        ("/home/serversdown/seismo-relay-prod-snap/waveforms/UM6047/UM6047_20250804154137.IDFW", "UM6047"),
        # IDFW that worked under the old codec (validates no regression)
        ("/home/serversdown/seismo-relay-prod-snap/waveforms/UM6047/UM6047_20250804104450.IDFW", "UM6047"),
        # IDFH histogram
        ("/home/serversdown/seismo-relay-prod-snap/waveforms/UM6047/UM6047_20250804190047.IDFH", "UM6047"),
    ]
    for path, serial in cases:
        render_case(Path(path), serial, out_dir / f"{Path(path).name}.pdf")


if __name__ == "__main__":
    main()
+63 -24
@@ -638,14 +638,7 @@ def _draw_channel_stats_waveform(ax, rd: ReportData) -> None:
         ("Sensor Check", "sensor_check", ""),
     ]
     _draw_stats_table(ax, rd, rows_spec)
-    if rd.peak_vector_sum_ips is not None:
-        line = f"Peak Vector Sum {rd.peak_vector_sum_ips:.3f} in/s"
-        if rd.peak_vector_sum_time_s is not None:
-            line += f" At {rd.peak_vector_sum_time_s:.3f} sec."
-        ax.text(0.0, -0.08, line, fontsize=9, weight="bold",
-                ha="left", va="top", transform=ax.transAxes)
-    ax.text(0.0, -0.18, "NA: Not Applicable", fontsize=7, color="#888",
-            ha="left", va="top", transform=ax.transAxes)
+    _draw_pvs_summary(ax, rd, n_data_rows=len(rows_spec))
 
 
 def _draw_channel_stats_histogram(ax, rd: ReportData) -> None:
@@ -663,20 +656,54 @@ def _draw_channel_stats_histogram(ax, rd: ReportData) -> None:
         ("Sensor Check", "sensor_check", ""),
     ]
     _draw_stats_table(ax, rd, rows_spec)
-    if rd.peak_vector_sum_ips is not None:
-        line = f"Peak Vector Sum {rd.peak_vector_sum_ips:.3f} in/s"
-        # Histograms: "0.091 in/s on May 27, 2026 At 06:06:14"
-        # The when_str is "HH:MM:SS Month DD, YYYY" — reformat for BW match.
-        if rd.peak_vector_sum_when_str:
-            parts = rd.peak_vector_sum_when_str.split(" ", 1)
-            if len(parts) == 2:
-                line += f" on {parts[1]} At {parts[0]}"
-            else:
-                line += f" on {rd.peak_vector_sum_when_str}"
-        ax.text(0.0, -0.08, line, fontsize=9, weight="bold",
-                ha="left", va="top", transform=ax.transAxes)
-    ax.text(0.0, -0.18, "NA: Not Applicable", fontsize=7, color="#888",
-            ha="left", va="top", transform=ax.transAxes)
+    _draw_pvs_summary(ax, rd, n_data_rows=len(rows_spec), histogram_when=True)
+
+
+def _draw_pvs_summary(
+    ax,
+    rd: ReportData,
+    *,
+    n_data_rows: int,
+    histogram_when: bool = False,
+) -> None:
+    """Render the Peak Vector Sum + 'NA: Not Applicable' caption below the
+    stats table.
+
+    Reads ``ax._stats_table_bottom`` (set by ``_draw_stats_table`` when
+    it pins the table via an explicit ``bbox``) so the PVS line lands
+    just below the table's known bottom edge instead of guessing at the
+    geometry.
+
+    Centered horizontally for visual balance (the previous left-aligned
+    x=0 landed under the label column, not the data, which looked off).
+    """
+    if rd.peak_vector_sum_ips is None:
+        return
+
+    line = f"Peak Vector Sum {rd.peak_vector_sum_ips:.3f} in/s"
+    if histogram_when and rd.peak_vector_sum_when_str:
+        # Histogram absolute date+time. when_str is "HH:MM:SS Month DD, YYYY";
+        # reformat to "<value> on <date> At <time>" to match BW.
+        parts = rd.peak_vector_sum_when_str.split(" ", 1)
+        if len(parts) == 2:
+            line += f" on {parts[1]} At {parts[0]}"
+        else:
+            line += f" on {rd.peak_vector_sum_when_str}"
+    elif not histogram_when and rd.peak_vector_sum_time_s is not None:
+        line += f" At {rd.peak_vector_sum_time_s:.3f} sec."
+
+    # _draw_stats_table stashes the bbox bottom on the axes so we don't
+    # have to guess geometry. Falls back to a conservative default if
+    # the bbox approach hasn't run.
+    table_bottom_y = getattr(ax, "_stats_table_bottom", -0.10)
+    pvs_y = table_bottom_y - 0.04  # small gap below the table border
+
+    # Centered for visual balance — looks intentional rather than offset.
+    # The original BW-replica had a "NA: Not Applicable" caption below
+    # this line; dropped because we use "—" for missing values and the
+    # legend was always squished against the PVS line.
+    ax.text(0.5, pvs_y, line, fontsize=9, weight="bold",
+            ha="center", va="top", transform=ax.transAxes)
 
 
 def _draw_stats_table(ax, rd: ReportData, rows_spec: list[tuple[str, str, str]]) -> None:
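
The text-building half of `_draw_pvs_summary` can be exercised without matplotlib. The sketch below lifts that logic into a hypothetical pure function, `build_pvs_line` (the name and the flat keyword arguments are assumptions for illustration; the real function reads these values from `ReportData`), so the two output shapes are easy to compare.

```python
from typing import Optional


def build_pvs_line(
    peak_ips: Optional[float],
    *,
    when_str: Optional[str] = None,
    time_s: Optional[float] = None,
    histogram_when: bool = False,
) -> Optional[str]:
    """Return the Peak Vector Sum caption, or None when there is no PVS value."""
    if peak_ips is None:
        return None
    line = f"Peak Vector Sum {peak_ips:.3f} in/s"
    if histogram_when and when_str:
        # when_str is "HH:MM:SS Month DD, YYYY"; split once into time and date.
        parts = when_str.split(" ", 1)
        if len(parts) == 2:
            line += f" on {parts[1]} At {parts[0]}"
        else:
            line += f" on {when_str}"
    elif not histogram_when and time_s is not None:
        # Waveform events report an offset from the trigger, in seconds.
        line += f" At {time_s:.3f} sec."
    return line


histogram_line = build_pvs_line(
    0.091, when_str="06:06:14 May 27, 2026", histogram_when=True
)
waveform_line = build_pvs_line(1.25, time_s=0.5)
```

With these inputs `histogram_line` is `"Peak Vector Sum 0.091 in/s on May 27, 2026 At 06:06:14"` and `waveform_line` is `"Peak Vector Sum 1.250 in/s At 0.500 sec."`. Keeping the string logic separate from the drawing call would also make the date reordering unit-testable, though the commit itself keeps both in one function.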
@@ -711,16 +738,28 @@ def _draw_stats_table(ax, rd: ReportData, rows_spec: list[tuple[str, str, str]])
             _cell(field_name, "Long"),
             unit,
         ])
+    # Pin the table's position+size via bbox so we know exactly where
+    # the bottom edge lands. Lets _draw_pvs_summary place the PVS line
+    # just below the table without guessing at row heights.
+    #
+    # bbox = [x, y, width, height] in axes coords. Header + data rows
+    # at row_h each; horizontal extent matches sum(colWidths).
+    n_rows = len(table_data)  # header + data rows
+    row_h = 0.12  # axes-fraction per row (fits fontsize=8)
+    table_height = n_rows * row_h
+    table_bottom = 1.0 - table_height
     tbl = ax.table(
-        cellText=table_data, loc="upper left",
+        cellText=table_data,
         colWidths=[0.28, 0.14, 0.14, 0.14, 0.10],
         cellLoc="left", edges="open",
+        bbox=[0.0, table_bottom, 0.80, table_height],
     )
     tbl.auto_set_font_size(False)
     tbl.set_fontsize(8)
-    tbl.scale(1, 1.4)
     for j in range(5):
         tbl[(0, j)].set_text_props(weight="bold", color="#555")
+    # Stash the bottom Y so _draw_pvs_summary can position itself below.
+    ax._stats_table_bottom = table_bottom
 
 
 def _channel_axis_color(ch: str) -> str:
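
The geometry this hunk introduces is plain arithmetic in axes coordinates, so it can be checked without drawing anything. The helper names below (`table_bbox`, `pvs_y_for`) are hypothetical; the constants `0.12`, `0.80`, `0.04` and the `-0.10` fallback are the ones that appear in the diff.

```python
import math

ROW_H = 0.12             # axes-fraction height per table row (from the diff)
TABLE_WIDTH = 0.80       # equals sum(colWidths) = 0.28 + 3 * 0.14 + 0.10
PVS_GAP = 0.04           # gap between the table's bottom edge and the PVS line
FALLBACK_BOTTOM = -0.10  # default used when no table bottom was stashed


def table_bbox(n_rows: int) -> list:
    """Return [x, y, width, height] for a table pinned to the top of the axes."""
    height = n_rows * ROW_H
    bottom = 1.0 - height
    return [0.0, bottom, TABLE_WIDTH, height]


def pvs_y_for(table_bottom=None) -> float:
    """Y position of the PVS caption, just below the table's bottom edge."""
    if table_bottom is None:
        table_bottom = FALLBACK_BOTTOM
    return table_bottom - PVS_GAP


bbox_8 = table_bbox(8)    # header + 7 data rows: fits inside the axes
bbox_10 = table_bbox(10)  # header + 9 data rows: bottom edge falls below y=0
caption_y = pvs_y_for(bbox_8[1])
```

With eight rows the table spans y = 0.04 to 1.0 and the caption sits at roughly y = 0.0. With ten rows the bottom edge lands near y = -0.2, so the table extends past the axes; whether that is acceptable depends on how much room the figure layout leaves below this axes, which the diff does not show.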
+31 -3
@@ -568,6 +568,16 @@ class WaveformStore:
         # precedence over the filename timestamp inside from_report().
         idf_event = IdfEvent.from_report(report_dict, source_path.name)
 
+        # The binary mic peak (psi) isn't carried through from_report() —
+        # IdfReport.from_dict only sees the .txt's dB(L) value. Pull the
+        # binary-derived ``mic_pspl_psi`` onto the typed IdfEvent so the
+        # downstream bridge can populate ``PeakValues.micl`` (psi-shaped)
+        # and the h5 writer's per-count mic factor lands at a sensible
+        # value. Without this, the h5 mic chart auto-scales against the
+        # dB(L) value-as-pseudo-psi and renders ~flat.
+        if binary_peaks is not None and binary_peaks.mic_pspl_psi is not None:
+            idf_event.peaks.mic_pspl_psi = binary_peaks.mic_pspl_psi
+
         # Operator-supplied serial_hint wins over the binary's filename
         # prefix when both are present (e.g. callers passing a known-good
         # serial that overrides a misnamed export).
@@ -600,10 +610,28 @@ class WaveformStore:
             n_samples = max((len(idf_samples.get(ch, [])) for ch in ("Tran", "Vert", "Long", "MicL")), default=0)
             ev.total_samples = ev.total_samples or n_samples
 
-        # 7. Write the .h5 clean-waveform file when we actually have samples.
-        # Histograms (IDFH) don't have waveform samples — skip h5 for those.
+        # For IDFH histograms there are no per-sample waveform arrays — the
+        # device stores one peak ADC count per interval per channel. Synthesise
+        # a 1-sample-per-interval array so the existing h5+renderer pipeline
+        # (which groups samples down to ``n_intervals`` bars via max-per-group)
+        # produces a non-blank histogram chart. Each "sample" is the peak ADC
+        # count for that interval, so the h5 writer's ``count × geo_fs/32768``
+        # conversion yields the right physical value for the bar height.
+        if is_histogram and idf_intervals:
+            hist_samples = {
+                "Tran": [iv.peak_count("Tran") for iv in idf_intervals],
+                "Vert": [iv.peak_count("Vert") for iv in idf_intervals],
+                "Long": [iv.peak_count("Long") for iv in idf_intervals],
+                "MicL": [iv.peak_count("MicL") for iv in idf_intervals],
+            }
+            ev.raw_samples = hist_samples
+            ev.total_samples = ev.total_samples or len(idf_intervals)
+
+        # 7. Write the .h5 clean-waveform file when we have samples to write
+        # (either the IDFW per-sample stream, or the IDFH synthesised per-
+        # interval peak array). The renderer treats both shapes the same way.
         hdf5_filename: Optional[str] = None
-        if idf_samples is not None and not is_histogram:
+        if ev.raw_samples:
             hdf5_path = self.hdf5_path_for(serial, filename)
             try:
                 event_hdf5.write_event_hdf5(