fix(backfill): regenerate IDFH .h5 + merge binary mic_pspl_psi onto bridge

Two gaps in backfill_thor_events.py that left old Thor events showing stale charts after a v0.21.1 backfill pass:

1. IDFH events were skipped from .h5 regeneration (the "have decoded samples" gate was IDFW-only). Histograms kept their pre-v0.21.1 .h5, written from raw_samples = None, which the renderer turned into a near-empty bar chart; older events kept the dB(L)-as-pseudo-psi mic scale that produced "107.7 psi" peaks (atomic-bomb level instead of footstep level).

   Fix: synthesise the same 1-sample-per-interval array save_imported_idf v0.21.1 uses (peak ADC count per channel per interval) so the renderer's bar-chart grouping has data to work with.

2. The IDFW h5 path didn't merge binary_peaks.mic_pspl_psi onto the IdfEvent before to_minimateplus_event(). The live save_imported_idf does this merge. Without it, IdfEvent.from_report() only sees the .txt's dB(L) value, the bridge falls back to the dBL→psi formula (instead of the binary-accurate 2.14e-6 psi/count value), and the h5 writer's per-count mic factor lands on a less-correct value.

   Fix: same merge the live ingest does (lift res.event.peaks.mic_pspl_psi onto idf_event.peaks before the bridge call).

Verified against UM6047_20250804190047.IDFH (250-interval prod histogram): 250 intervals decode, mic_pspl_psi = 2.78e-5 (was being treated as dB(L)=107.7 in the old h5).

Operator: re-run after deploy. `docker compose exec sfm python scripts/backfill_thor_events.py` is idempotent: the existing version check still skips events already at the new TOOL_VERSION, and review state + captured_at are preserved on the second pass.
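For reviewers, here is a minimal sketch of the 1-sample-per-interval synthesis from fix 1. It is illustrative only: `Interval` is a hypothetical stand-in for the decoded `IdfhInterval`, and the `geo_fs = 10.0` in/s full scale is an assumption, not a value taken from the writer.

```python
from __future__ import annotations

from dataclasses import dataclass

CHANNELS = ("Tran", "Vert", "Long", "MicL")


@dataclass
class Interval:
    """Hypothetical stand-in for IdfhInterval: one peak ADC count per channel."""
    peaks: dict[str, int]

    def peak_count(self, channel: str) -> int:
        return self.peaks.get(channel, 0)


def synthesize_raw_samples(intervals: list[Interval]) -> dict[str, list[int]]:
    """Build one pseudo-sample per interval per channel from the peak counts."""
    return {ch: [iv.peak_count(ch) for iv in intervals] for ch in CHANNELS}


def geo_count_to_ips(count: int, geo_fs: float = 10.0) -> float:
    """Apply count x geo_fs / 32768; geo_fs = 10.0 in/s is an assumed full scale."""
    return count * geo_fs / 32768


intervals = [
    Interval({"Tran": 12, "Vert": 30, "Long": 7, "MicL": 13}),
    Interval({"Tran": 40, "Vert": 5, "Long": 9, "MicL": 2}),
]
raw = synthesize_raw_samples(intervals)
# Each synthesized "sample" becomes one bar in the per-interval chart.
bar_heights = [geo_count_to_ips(c) for c in raw["Tran"]]
```

Because every channel gets exactly one value per interval, the array length equals the interval count, which is what lets a renderer group bars by interval without special-casing histograms.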
version bump - 0.21.1
2026-06-01 20:02:54 +00:00 · 2026-06-01 19:33:44 +00:00 · 2026-06-01 18:27:24 +00:00 · 2026-05-31 20:51:09 +00:00 · 2026-05-30 04:37:43 +00:00 · 2026-05-29 22:17:43 +00:00
9 changed files with 684 additions and 46 deletions
@@ -8,6 +8,63 @@ All notable changes to seismo-relay are documented here.
 ---
 ## v0.21.1 — 2026-06-01
 Bug fixes against v0.21.0 surfaced after the first prod redeploy.  Three
 production-visible symptoms — blank waveform charts on most Thor events,
 blank histogram charts on all Thor events, and a mic chart that
 auto-scaled against a dB(L) value treated as psi — all root-caused and
 fixed.
 ### Fixed
 - **Dynamic IDFW body offset.**  The v0.21.0 codec hardcoded the body
  at file offset `0x0f1f` based on the example corpus, but only ~52%
  of production IDFW events use that offset; the rest sit at offsets
  from `0x1033` up to `0x3082` depending on header padding.  At
  `0x0f1f` the codec would find a coincidentally-matching `00 02 00`
  magic, read the 2-byte Tran preamble, and return empty V/L/M
  arrays — producing near-empty .h5 files and blank charts.
  `micromate.idf_file._find_waveform_body_offset()` now scans every
  `00 02 00` magic position past `0x0E00`, trial-decodes each one,
  and picks the offset with the most samples.  Validated across 483
  prod IDFW files: 0 preamble-only events (was ~50%), 355/483 fully
  decode, 126/483 partial (BW codec walker-stops-early on loud
  events — pre-existing limitation, samples reached are correct).
 - **IDFH histograms now render bar charts.**  Histograms previously
  skipped the .h5 write because there are no per-sample arrays, but
  the renderer drives the per-interval bar chart from .h5 channel
  data + `bw_report.histogram.n_intervals`.  `save_imported_idf` now
  synthesizes a 1-sample-per-interval array from the decoded
  `IdfhInterval` peak counts and writes an .h5 so the existing
  renderer works unchanged — each "sample" is the per-interval peak
  ADC count, so the writer's `count × geo_fs/32768` conversion
  yields the right bar height.
 - **Mic chart scaling on Thor events.**  `PeakValues.micl` (consumed
  by the h5 writer's per-count mic scale factor) expects psi, but
  the Thor bridge was stuffing the dB(L) value (~99.4) into it,
  producing a per-count factor 5+ orders of magnitude too large and
  a flat-looking mic chart.  Fixed by adding `IdfPeaks.mic_pspl_psi`
  alongside `mic_pspl_dbl`; `read_idf_file()` computes it from
  binary mic counts (`max(|MicL|) × 2.14e-6 psi/count`) for both
  IDFW and IDFH paths; `save_imported_idf` merges it onto the typed
  event after `IdfEvent.from_report`; the bridge feeds psi to
  `PeakValues.micl` with a dB(L)→psi formula fallback when only the
  dB(L) value is available.  dB(L) for the report header still
  flows through `bw_report.mic.pspl_dbl` unchanged.
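The mic-scaling arithmetic above can be sketched as follows. This is a simplified illustration, not the shipped code: the function names are stand-ins, the two constants come from this entry and from the bridge's fallback formula (`2.9e-9` psi is 20 µPa, the dB(L) reference pressure), and the priority order (binary psi first, dB(L) fallback second) follows the description above.

```python
from typing import Optional

PSI_PER_COUNT = 2.14e-6    # per-count mic factor quoted in this entry
REF_PRESSURE_PSI = 2.9e-9  # 20 µPa expressed in psi (dB(L) reference)


def count_to_psi(count: int) -> float:
    """Convert a mic ADC count to psi (stand-in for the codec helper)."""
    return abs(count) * PSI_PER_COUNT


def dbl_to_psi(dbl: float) -> float:
    """Standard dB(L) -> psi conversion used as the fallback."""
    return REF_PRESSURE_PSI * 10.0 ** (dbl / 20.0)


def resolve_mic_psi(binary_psi: Optional[float], dbl: Optional[float]) -> Optional[float]:
    """Prefer the binary-derived psi; fall back to the dB(L) formula; else None."""
    if binary_psi is not None:
        return binary_psi
    if dbl is not None:
        return dbl_to_psi(dbl)
    return None


dbl = 99.4
true_psi = dbl_to_psi(dbl)        # on the order of 1e-4 psi
inflation = dbl / true_psi        # how far off "dB(L) treated as psi" is
```

Treating 99.4 dB(L) as if it were 99.4 psi overstates the peak by more than five orders of magnitude, which matches the flat-looking mic chart described above.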
 ### Operator
 After deploy, run `python scripts/backfill_thor_events.py` to refresh
 every existing Thor event's sidecar + .h5 with the corrected codec
 output.  The script auto-skips events already at the current
 `TOOL_VERSION`, so the bump from `0.21.0` → `0.21.1` is what triggers
 the refresh.
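A minimal sketch of why the version bump triggers the refresh, assuming the skip check compares versions as integer tuples rather than strings. `version_tuple` and `needs_refresh` are illustrative names, not the script's API.

```python
def version_tuple(s: str) -> tuple:
    """Parse 'MAJOR.MINOR.PATCH' into ints; unparseable input sorts lowest."""
    try:
        return tuple(int(p) for p in str(s).split(".")[:3])
    except ValueError:
        return (0, 0, 0)


def needs_refresh(sidecar_version: str, tool_version: str,
                  has_bw_report: bool = True) -> bool:
    """Refresh when bw_report is missing or the sidecar's stamp is older."""
    if not has_bw_report:
        return True
    return version_tuple(sidecar_version) < version_tuple(tool_version)


stale = needs_refresh("0.21.0", "0.21.1")     # sidecar written before the bump
current = needs_refresh("0.21.1", "0.21.1")   # second pass: already refreshed
# Plain string comparison would get this case wrong ("0.9.0" > "0.21.1").
old_minor = needs_refresh("0.9.0", "0.21.1")
```

Comparing tuples keeps a second run a no-op for events already stamped with the current version, while still catching sidecars with a missing or malformed stamp.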
 ---
 ## v0.21.0 — 2026-05-29
 The "Thor / Series IV codec" release.  Two big pieces landed: (1) the IDF binary codec actually decodes now, both IDFW and IDFH, and (2) a Thor→BW adapter lets Thor events flow through the existing Series III Event Report PDF pipeline.  Combined effect: a Thor event ingested via `/db/import/idf_file` now lands in the DB with the same fidelity as a Blastware event, gets a per-event PDF on demand, and renders in Terra-View's modal chart with the same plotting code as a BW event.
@@ -62,12 +62,23 @@ _THOR_PREFIX = b"\x00\x12\x01\x00\x00\x00"
 _BW_STRAY_PREFIX = b"\x10\x00\x01\x80\x00\x00"
 _INSTANTEL_TAG = b"Instantel"
-# Constant body offset for sig-A IDFW files (verified across 151/154 corpus
+# Most common body offset for sig-A IDFW files (~50% of prod events;
-# files in tests/fixtures/THORDATA_example).  The body is the segment-rotated
+# 151/154 in the original tests/fixtures/THORDATA_example corpus).  The
-# block stream consumed by decode_waveform_v2; bytes [0:3] are the magic
+# body is the segment-rotated block stream consumed by decode_waveform_v2;
-# ``00 02 00`` preamble.
+# bytes [0:3] are the magic ``00 02 00`` preamble.  Production events
 # routinely use other offsets — see :func:`_find_waveform_body_offset`
 # for the dynamic scan.  This constant survives only as the priority hint.
 _BODY_START_SIG_A = 0x0F1F
 # Magic bytes that mark a candidate waveform-body preamble.
 _BODY_MAGIC = b"\x00\x02\x00"
 # Where to start looking for body candidates inside the file.  Skip the
 # fixed-header region where the same magic legitimately appears inside
 # channel-test records and the compliance block (offsets 0x015d, 0x091c,
 # 0x0ae2, 0x0d30 in observed events).
 _BODY_SCAN_FLOOR = 0x0E00
 # Geophone count → in/s, derived from sidecar ground truth: the smallest
 # non-zero sample in the 1,014-file corpus is 0.0003 in/s.
 _GEO_LSB_IPS = 0.0003
@@ -179,17 +190,65 @@ def extract_binary_metadata(buf: bytes) -> IdfBinaryMetadata:
 # ─── Sample decoder + unit conversion ───────────────────────────────────────
 def _find_waveform_body_offset(buf: bytes) -> Optional[int]:
    """Pick the file offset of the waveform body by trial-decoding every
    ``00 02 00`` magic position past the fixed-header region.
    The body's location isn't fixed across all sig-A IDFW files — about
    half the production events use ``0x0f1f``, but the rest have offsets
    that shift based on header padding / channel-config layout.  We
    auto-detect by:
      1. Find every ``00 02 00`` occurrence past ``_BODY_SCAN_FLOOR``.
      2. Try ``decode_waveform_v2()`` on each candidate.
      3. Pick the offset whose decoded sample count is largest.
    Returns the offset, or ``None`` if no candidate yielded more than
    the trivial 2-sample preamble (= "no real body found").
    Costs ~2-8 trial decodes per file; in practice the first candidate
    past 0x0e00 is usually the right one.
    """
    if len(buf) < _BODY_SCAN_FLOOR + 8:
        return None
    best: Optional[tuple[int, int]] = None   # (total_samples, offset)
    i = _BODY_SCAN_FLOOR
    while True:
        j = buf.find(_BODY_MAGIC, i)
        if j < 0:
            break
        i = j + 1
        try:
            decoded = decode_waveform_v2(buf[j:])
        except Exception:
            continue
        if not decoded:
            continue
        total = sum(len(v) for v in decoded.values())
        # A "real" body has more than just the 2-sample preamble.
        if total <= 2:
            continue
        if best is None or total > best[0]:
            best = (total, j)
    return best[1] if best else None
 def _decode_waveform_samples(buf: bytes) -> Optional[dict]:
-    """Decode samples from the sig-A body starting at file offset 0x0f1f.
+    """Decode samples from the sig-A waveform body.
    Returns the raw decoder counts dict — geo LSB = 0.0003 in/s, mic in
    its own count unit (see :func:`mic_count_to_psi`).  Returns None if
-    decoding fails.
+    no usable body is found.
    Uses :func:`_find_waveform_body_offset` to locate the body — the
    file-offset varies across events (~50% sit at the canonical
    ``0x0f1f`` but the rest don't), so the previous hardcoded constant
    silently produced 2-sample preamble-only output for half the corpus.
    """
-    if len(buf) < _BODY_START_SIG_A + 8:
+    off = _find_waveform_body_offset(buf)
+    if off is None:
         return None
-    body = buf[_BODY_START_SIG_A:]
-    return decode_waveform_v2(body)
+    return decode_waveform_v2(buf[off:])
 def geo_count_to_ips(count: int) -> float:
@@ -379,6 +438,10 @@ def read_idf_file(
        peak_tran = max((iv.peak_ips("Tran") for iv in intervals), default=0.0)
        peak_vert = max((iv.peak_ips("Vert") for iv in intervals), default=0.0)
        peak_long = max((iv.peak_ips("Long") for iv in intervals), default=0.0)
        # Mic peak in psi — Thor stores per-interval mic ADC counts in the
        # binary; convert the max count to psi via the per-count factor.
        mic_peak_count = max((iv.peak_count("MicL") for iv in intervals), default=0)
        mic_peak_psi = mic_count_to_psi(mic_peak_count) if mic_peak_count else None
        rep = IdfReport(
            serial_number=md.serial,
            event_type="Full Histogram",
@@ -392,7 +455,8 @@ def read_idf_file(
            vertical_ips=peak_vert,
            longitudinal_ips=peak_long,
            peak_vector_sum_ips=None,
-            mic_pspl_dbl=None,
+            mic_pspl_dbl=None,         # IDFH binary doesn't carry the dB(L) value
            mic_pspl_psi=mic_peak_psi,
        )
        event = IdfEvent(
            serial=md.serial or "UNKNOWN",
@@ -430,6 +494,11 @@ def read_idf_file(
        arr = decoded.get(ch, [])
        return geo_count_to_ips(max((abs(v) for v in arr), default=0))
    # Mic peak psi from binary: max absolute MicL ADC count × 2.14e-6 psi/count.
    mic_arr = decoded.get("MicL", [])
    mic_peak_count = max((abs(v) for v in mic_arr), default=0)
    mic_peak_psi = mic_count_to_psi(mic_peak_count) if mic_peak_count else None
    peaks = IdfPeaks(
        transverse_ips=_peak_ips("Tran"),
        vertical_ips=_peak_ips("Vert"),
@@ -437,7 +506,9 @@ def read_idf_file(
        # PVS requires aligned per-sample √(T²+V²+L²); leave None — the
        # sidecar carries it and the bridge picks it up if present.
        peak_vector_sum_ips=None,
-        mic_pspl_dbl=None,
+        mic_pspl_dbl=None,             # binary IDFW doesn't carry the dB(L) value;
                                       # sidecar .txt fills it via IdfReport.from_dict
        mic_pspl_psi=mic_peak_psi,
    )
    event = IdfEvent(
@@ -159,12 +159,23 @@ class IdfReport:
@dataclass
 class IdfPeaks:
-    """Geophone + mic peak values for one Thor event.  Native Thor units."""
+    """Geophone + mic peak values for one Thor event.  Native Thor units.
    Thor stores the mic peak in two parallel forms — ``mic_pspl_dbl`` is
    what the sidecar's top-level ``MicPSPL`` header field carries (dB(L)),
    used in the report header.  ``mic_pspl_psi`` is the psi value derived
    either from the IDFW sample table / IDFH interval column 9, or from
    the binary mic counts (~2.14e-6 psi/count).  Needed because the
    BW-shaped ``PeakValues.micl`` consumed by ``event_hdf5.write_event_hdf5``
    expects psi — feeding it dB(L) makes the h5 mic-chart scale factor
    blow up.
    """
    transverse_ips:    Optional[float] = None    # in/s
    vertical_ips:      Optional[float] = None    # in/s
    longitudinal_ips:  Optional[float] = None    # in/s
    peak_vector_sum_ips: Optional[float] = None  # in/s
    mic_pspl_dbl:      Optional[float] = None    # dB(L)
    mic_pspl_psi:      Optional[float] = None    # psi
@dataclass
@@ -324,10 +335,14 @@ class IdfEvent:
        machinery without those code paths needing to know about Thor.
        Caveats of the bridge:
-          - ``mic_ppv`` on the produced Event carries Thor's dB(L) value
+          - ``PeakValues.micl`` carries the mic peak in **psi** (matching
-            verbatim — the UI distinguishes via the ``device_family``
+            BW's convention) — set from :attr:`IdfPeaks.mic_pspl_psi`,
-            column (Phase 1).  Don't run the BW psi→dBL converter on
+            with a dB(L)→psi fallback when only the dB(L) value is
-            Series IV rows.
+            available.  This is what the h5 writer's mic-scale-factor
            logic needs.  The dB(L) value still flows through
            ``bw_report.mic.pspl_dbl`` (set by the
            ``idf_to_bw_report`` adapter) and the renderer reads it
            from there for the report header.
          - Many Thor-specific fields (Peak Acceleration / Displacement,
            sensor self-check, calibration) don't have a slot in
            ``Event``.  The full IdfReport is preserved on the
@@ -349,11 +364,17 @@ class IdfEvent:
            minute=self.timestamp.minute,
            second=self.timestamp.second,
        )
        # Resolve mic peak as psi.  Priority: binary-derived mic_pspl_psi
        # (set by read_idf_file) > dB(L)→psi fallback via standard formula
        # (psi = 2.9e-9 × 10^(dBL/20)) > None.
        mic_psi = self.peaks.mic_pspl_psi
        if mic_psi is None and self.peaks.mic_pspl_dbl is not None:
            mic_psi = 2.9e-9 * (10.0 ** (self.peaks.mic_pspl_dbl / 20.0))
        pv = PeakValues(
            tran=self.peaks.transverse_ips,
            vert=self.peaks.vertical_ips,
            long=self.peaks.longitudinal_ips,
-            micl=self.peaks.mic_pspl_dbl,   # dB(L) — see caveat above
+            micl=mic_psi,   # psi, matching BW's convention (h5 scaling depends on this)
            peak_vector_sum=self.peaks.peak_vector_sum_ips,
        )
        pi = ProjectInfo(
@@ -49,7 +49,7 @@ SIDECAR_KIND   = "sfm.event"
 # bumped without a `pip install` re-run — leading to confusing stale
 # version stamps in sidecars.  Bump this constant and CHANGELOG.md
 # together at release time.
-TOOL_VERSION = "0.21.0"
+TOOL_VERSION = "0.21.1"
 try:
    # Best-effort: prefer the installed metadata when it's NEWER than the
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 [project]
 name = "seismo-relay"
-version = "0.21.0"
+version = "0.21.1"
 description = "Python client and REST server for MiniMate Plus seismographs"
 requires-python = ">=3.10"
 dependencies = [
@@ -0,0 +1,331 @@
 """
 scripts/backfill_thor_events.py — re-process existing Thor (Series IV)
 events so their sidecars carry the bw_report block produced by
 ``micromate.idf_to_bw_report.build_bw_report_from_idf`` and their .h5
 chart files are regenerated (IDFW waveforms and IDFH histograms).
 Why this exists
 ───────────────
 Thor events ingested before v0.21.0 (or during the v0.21.0 ingest bug
 window fixed in commit bee1185) have sidecars with only
 ``extensions.idf_report`` — no ``bw_report`` block.  Without
 ``bw_report``, the SFM PDF renderer falls back to DB-only fields
 (misses sensor-self-check, full per-channel breakdown, mic dB(L)),
 and the modal chart 404s on ``/waveform.json`` for IDFW events
 because no .h5 was written when the codec failed at ingest.
 Re-forwarding from thor-watcher would also fix this, but that requires
 operator coordination on every watcher machine and uses bandwidth this
 script doesn't.
 What this does
 ──────────────
 Walks ``<store>/<serial>/<filename>`` for ``.IDFW`` / ``.IDFH`` files
 and, for each one:
  1. Reads the existing sidecar (preserving review state + captured_at).
  2. Re-runs ``micromate.idf_file.read_idf_file()`` on the binary
     bytes — passing ``data=`` so the codec doesn't try to read from
     a path it doesn't know.
  3. Pulls ``extensions.idf_report`` (the raw parsed Thor dict the
     v0.18.0+ ingest path already stashed) and runs the v0.21.0
     ``build_bw_report_from_idf`` adapter against it.
  4. Writes the refreshed sidecar with the new ``bw_report``,
     bumped ``source.tool_version``, but preserved ``review`` block
     + the original ``captured_at`` timestamp.
  5. Regenerates the .h5 waveform file via the existing
     ``event_hdf5`` writer.  For IDFW that's the decoded per-sample
     stream; for IDFH it's a 1-sample-per-interval synthesised array
     (peak ADC count per channel) so the renderer's bar-chart code
     has data to group on.  Mic peak psi from the binary is merged
     onto the IdfEvent before the bridge so the h5 writer's per-count
     mic scale factor lands on a sensible value (without this the
     mic chart on Thor events plots dB(L)-as-pseudo-psi and shows
     bomb-level numbers).
 Idempotent.  Re-running it after a parser/adapter change just
 re-writes sidecars and .h5 files: no DB writes, no thor-watcher
 coordination.
 Usage
 ─────
    python scripts/backfill_thor_events.py [--db-path PATH]
                                           [--store-root PATH]
                                           [--dry-run]
                                           [--skip-hdf5]
                                           [--force]
                                           [-v]
 By default, refreshes any Thor event whose sidecar is missing
 ``bw_report`` OR whose ``source.tool_version`` is older than the
 current ``TOOL_VERSION``.  ``--force`` refreshes every Thor event
 regardless.
 """
 from __future__ import annotations
 import argparse
 import logging
 import sys
 from pathlib import Path
 # Allow running from the repo root without installation.
 sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
 from minimateplus import event_file_io
 from sfm.waveform_store import WaveformStore
 log = logging.getLogger("backfill_thor_events")
 def _is_thor_event(path: Path) -> bool:
    if not path.is_file():
        return False
    if path.name.endswith((".sfm.json", ".h5", "_ASCII.TXT")):
        return False
    return path.suffix.upper() in (".IDFW", ".IDFH")
 def _vtuple(s: str) -> tuple:
    try:
        return tuple(int(p) for p in str(s).split(".")[:3])
    except Exception:
        return (0, 0, 0)
 def main(argv=None) -> int:
    p = argparse.ArgumentParser(description=__doc__)
    p.add_argument(
        "--db-path",
        default=str(Path(__file__).resolve().parent.parent / "bridges" / "captures" / "seismo_relay.db"),
        help="Used only to derive the default --store-root.",
    )
    p.add_argument("--store-root", default=None)
    p.add_argument("--dry-run", action="store_true")
    p.add_argument("--skip-hdf5", action="store_true",
                   help="Don't regenerate .h5 files (IDFW waveforms or IDFH histograms).")
    p.add_argument("--force", action="store_true",
                   help="Refresh every Thor event, not just ones with stale or missing bw_report.")
    p.add_argument("-v", "--verbose", action="store_true")
    args = p.parse_args(argv)
    logging.basicConfig(
        level=logging.DEBUG if args.verbose else logging.INFO,
        format="%(asctime)s  %(levelname)-7s  %(name)s  %(message)s",
        datefmt="%H:%M:%S",
    )
    db_path = Path(args.db_path).expanduser().resolve()
    store_root = (
        Path(args.store_root).expanduser().resolve()
        if args.store_root else db_path.parent / "waveforms"
    )
    if not store_root.exists():
        log.error("store root not found: %s", store_root)
        return 1
    store = WaveformStore(store_root)
    log.info("store root: %s", store_root)
    log.info("current TOOL_VERSION: %s", event_file_io.TOOL_VERSION)
    refreshed = skipped = errors = h5_written = 0
    # Lazy imports so any one of these failing produces a useful error
    # message rather than crashing module-load.
    from micromate.idf_file import read_idf_file
    from micromate.idf_to_bw_report import build_bw_report_from_idf
    for serial_dir in sorted(p for p in store_root.iterdir() if p.is_dir()):
        serial = serial_dir.name
        for path in sorted(serial_dir.iterdir()):
            if not _is_thor_event(path):
                continue
            sidecar_path = store.sidecar_path_for(serial, path.name)
            if not sidecar_path.exists():
                log.debug("%s: no sidecar — skipping (this is a binary without ingest history)",
                          path.name)
                skipped += 1
                continue
            try:
                existing = event_file_io.read_sidecar(sidecar_path)
            except Exception as exc:
                log.warning("%s: failed to read sidecar — %s", path.name, exc)
                errors += 1
                continue
            has_bw_report = bool(existing.get("bw_report"))
            existing_version = (existing.get("source") or {}).get("tool_version", "")
            up_to_date = (
                has_bw_report
                and _vtuple(existing_version) >= _vtuple(event_file_io.TOOL_VERSION)
            )
            if up_to_date and not args.force:
                skipped += 1
                continue
            # Re-decode the binary.  Catch + log; continue with .txt-only
            # data if it fails (matches the live ingest path's behavior).
            idf_samples = None
            idf_intervals = None
            binary_md = None
            res = None   # stays None if the decode below raises
            is_histogram = path.suffix.upper() == ".IDFH"
            try:
                binary_bytes = path.read_bytes()
                res = read_idf_file(path, data=binary_bytes)
                idf_samples = res.samples or None
                idf_intervals = res.intervals
                binary_md = res.binary_metadata
                is_histogram = res.intervals is not None
            except NotImplementedError:
                # sig-B / Blastware-stray binary; no samples but adapter
                # can still produce a bw_report from extensions.idf_report.
                log.debug("%s: binary codec NotImplementedError (sig-B / BW-stray); proceeding from sidecar's idf_report only", path.name)
            except Exception as exc:
                log.warning("%s: binary decode failed — %s; proceeding from sidecar's idf_report only", path.name, exc)
            # Run the adapter.  Pull report_dict from
            # extensions.idf_report (the v0.18.0+ ingest preserved it).
            report_dict = (existing.get("extensions") or {}).get("idf_report") or {}
            if not report_dict and binary_md is None:
                log.debug("%s: no idf_report in sidecar AND no binary metadata — nothing to project", path.name)
                skipped += 1
                continue
            try:
                bw_report = build_bw_report_from_idf(
                    report_dict, binary_md=binary_md,
                    intervals=idf_intervals, is_histogram=is_histogram,
                )
            except Exception as exc:
                log.warning("%s: adapter failed — %s", path.name, exc)
                errors += 1
                continue
            # Build the new sidecar by overlaying refreshed fields onto
            # the existing one — preserves review, captured_at, blastware
            # block, source.kind, etc.
            new_sidecar = dict(existing)  # shallow copy
            new_sidecar["bw_report"] = bw_report
            src = dict(new_sidecar.get("source") or {})
            src["tool_version"] = event_file_io.TOOL_VERSION
            new_sidecar["source"] = src
            # Preserve histogram intervals if the binary decoded them
            # (improves over the original ingest if that one ran before
            # the bee1185 codec fix).
            if idf_intervals is not None:
                ext = dict(new_sidecar.get("extensions") or {})
                ext["idf_intervals"] = [
                    {
                        "offset":     iv.offset,
                        "tran_peak":  iv.peak_count("Tran"),
                        "tran_halfp": iv.tran_halfp,
                        "tran_freq":  iv.freq_hz("Tran"),
                        "vert_peak":  iv.peak_count("Vert"),
                        "vert_halfp": iv.vert_halfp,
                        "vert_freq":  iv.freq_hz("Vert"),
                        "long_peak":  iv.peak_count("Long"),
                        "long_halfp": iv.long_halfp,
                        "long_freq":  iv.freq_hz("Long"),
                        "mic_peak":   iv.peak_count("MicL"),
                        "mic_halfp":  iv.micl_halfp,
                        "mic_freq":   iv.freq_hz("MicL"),
                    }
                    for iv in idf_intervals
                ]
                new_sidecar["extensions"] = ext
            if args.dry_run:
                will_write_h5 = (idf_samples or idf_intervals) and not args.skip_hdf5
                log.info("[DRY] %s/%s — would refresh sidecar (bw_report=%s, h5=%s)",
                         serial, path.name,
                         "wrote" if not has_bw_report else "refreshed",
                         "would write" if will_write_h5 else "skipped")
            else:
                event_file_io.write_sidecar(sidecar_path, new_sidecar)
                log.info("%s/%s — sidecar refreshed (bw_report=%s, intervals=%d)",
                         serial, path.name,
                         "added" if not has_bw_report else "refreshed",
                         len(idf_intervals) if idf_intervals else 0)
            refreshed += 1
            # Regenerate .h5 by replaying the same IdfEvent → Event bridge
            # save_imported_idf uses.  For IDFW we write the decoded per-
            # sample arrays.  For IDFH we synthesise a 1-sample-per-interval
            # array (peak ADC count per channel per interval) so the
            # renderer's bar-chart code has something to group on.
            # Pre-condition: either real samples (IDFW) or decoded intervals
            # (IDFH).  Skip otherwise.
            have_data = bool(idf_samples) or bool(idf_intervals)
            if have_data and not args.skip_hdf5:
                from sfm import event_hdf5
                hdf5_path = store.hdf5_path_for(serial, path.name)
                if args.dry_run:
                    log.debug("[DRY] would write %s", hdf5_path.name)
                else:
                    try:
                        from micromate import IdfEvent
                        from minimateplus.event_file_io import file_sha256
                        idf_event = IdfEvent.from_report(report_dict, path.name)
                        # Merge the binary-derived mic peak psi (only the
                        # binary path knows the proper psi value; the .txt
                        # carries dB(L)).  Without this, the h5 writer's
                        # per-count mic factor is computed against the
                        # dB(L) value-as-pseudo-psi and the mic chart
                        # scales wildly.
                        if (binary_md is not None and res is not None
                                and res.event.peaks.mic_pspl_psi is not None):
                            idf_event.peaks.mic_pspl_psi = res.event.peaks.mic_pspl_psi
                        sha256 = file_sha256(path)
                        waveform_key = bytes.fromhex(sha256)[:16]
                        ev = idf_event.to_minimateplus_event(waveform_key)
                        if is_histogram and idf_intervals:
                            # 1 sample per interval per channel — same
                            # synthesis save_imported_idf uses.  The h5
                            # writer's count×geo_fs/32768 conversion turns
                            # each peak-ADC-count into the bar's physical
                            # value.
                            ev.raw_samples = {
                                "Tran": [iv.peak_count("Tran") for iv in idf_intervals],
                                "Vert": [iv.peak_count("Vert") for iv in idf_intervals],
                                "Long": [iv.peak_count("Long") for iv in idf_intervals],
                                "MicL": [iv.peak_count("MicL") for iv in idf_intervals],
                            }
                            ev.total_samples = ev.total_samples or len(idf_intervals)
                        elif idf_samples:
                            ev.raw_samples = idf_samples
                            n_samp = max(
                                (len(idf_samples.get(ch, []))
                                 for ch in ("Tran", "Vert", "Long", "MicL")),
                                default=0,
                            )
                            ev.total_samples = ev.total_samples or n_samp
                        event_hdf5.write_event_hdf5(
                            hdf5_path, ev,
                            serial=serial,
                            geo_range="normal",
                            source_kind="idf-import",
                            tool_version=event_file_io.TOOL_VERSION,
                        )
                        h5_written += 1
                        log.debug("%s/%s — .h5 written (%s)",
                                  serial, path.name,
                                  f"{len(idf_intervals)} intervals" if is_histogram
                                  else f"{sum(len(v) for v in (idf_samples or {}).values())} samples")
                    except Exception as exc:
                        log.warning("%s/%s — .h5 write failed: %s",
                                    serial, path.name, exc)
    log.info("Done.  refreshed=%d  skipped=%d  errors=%d  h5_written=%d",
             refreshed, skipped, errors, h5_written)
    return 0 if errors == 0 else 2
 if __name__ == "__main__":
    sys.exit(main())
@@ -0,0 +1,91 @@
 """Re-ingest a prod IDFW + IDFH via the patched save_imported_idf and
 render both PDFs to confirm charts have data."""
 from __future__ import annotations
 import sys
 import json
 import datetime
 import tempfile
 from pathlib import Path
 sys.path.insert(0, str(Path(__file__).resolve().parents[1]))
 from sfm.waveform_store import WaveformStore
 from sfm import report_pdf
 import h5py
 class FakeDb:
    def __init__(self, event):
        self.event = event
    def get_event(self, _id):
        return self.event
 def to_ts_iso(ts):
    if ts is None:
        return None
    try:
        return datetime.datetime(ts.year, ts.month, ts.day, ts.hour, ts.minute, ts.second).isoformat()
    except Exception:
        return None
 def render_case(idf_path: Path, serial: str, out_pdf: Path, h5_summary: bool = True):
    with tempfile.TemporaryDirectory() as td:
        store = WaveformStore(Path(td))
        ev, rec = store.save_imported_idf(
            idf_path.read_bytes(),
            idf_path,
            idf_report_text=None,    # production worst case: no .txt
        )
        print(f"=== {idf_path.name} ===")
        print(f"  h5: {rec['hdf5_filename']}, sidecar: {rec['sidecar_filename']}")
        h5p = Path(td) / serial / f"{idf_path.name}.h5"
        if h5p.exists() and h5_summary:
            with h5py.File(h5p) as h:
                for ch in ("Tran", "Vert", "Long", "MicL"):
                    ds = h.get(f"samples/{ch}")
                    if ds is not None:
                        n = ds.shape[0]
                        mx = float(abs(ds[...]).max()) if n else 0
                        print(f"  samples/{ch}: n={n}  max_abs={mx:.5f}")
        record_type = "Histogram" if idf_path.suffix.upper() == ".IDFH" else "Waveform"
        fake_row = {
            "serial":              serial,
            "blastware_filename":  rec["filename"],
            "record_type":         record_type,
            "timestamp":           to_ts_iso(ev.timestamp),
            "sample_rate":         ev.sample_rate,
            "project":             ev.project_info.project if ev.project_info else None,
            "client":              ev.project_info.client if ev.project_info else None,
            "operator":            ev.project_info.operator if ev.project_info else None,
            "sensor_location":     ev.project_info.sensor_location if ev.project_info else None,
            "created_at":          None,
        }
        rd = report_pdf.gather_report_data(FakeDb(fake_row), store, event_id="test-1")
        print(f"  ReportData: channels={ {k: len(v) for k,v in rd.channels.items()} }")
        if rd.is_histogram:
            print(f"  histogram n_intervals={rd.histogram_n_intervals} interval_size={rd.histogram_interval_size}")
        pdf = report_pdf.render_event_report_pdf(rd)
        out_pdf.write_bytes(pdf)
        print(f"  PDF: {out_pdf}  ({len(pdf)} bytes)")
 def main():
    out_dir = Path("/tmp/thor_render_test"); out_dir.mkdir(exist_ok=True)
    cases = [
        # IDFW that decoded to preamble-only under the old codec
        ("/home/serversdown/seismo-relay-prod-snap/waveforms/UM6047/UM6047_20250804154137.IDFW", "UM6047"),
        # IDFW that worked under the old codec (validates no regression)
        ("/home/serversdown/seismo-relay-prod-snap/waveforms/UM6047/UM6047_20250804104450.IDFW", "UM6047"),
        # IDFH histogram
        ("/home/serversdown/seismo-relay-prod-snap/waveforms/UM6047/UM6047_20250804190047.IDFH", "UM6047"),
    ]
    for path, serial in cases:
        render_case(Path(path), serial, out_dir / f"{Path(path).name}.pdf")
 if __name__ == "__main__":
    main()
@@ -638,14 +638,7 @@ def _draw_channel_stats_waveform(ax, rd: ReportData) -> None:
        ("Sensor Check",         "sensor_check",   ""),
    ]
    _draw_stats_table(ax, rd, rows_spec)
-    if rd.peak_vector_sum_ips is not None:
-        line = f"Peak Vector Sum   {rd.peak_vector_sum_ips:.3f} in/s"
-        if rd.peak_vector_sum_time_s is not None:
-            line += f" At {rd.peak_vector_sum_time_s:.3f} sec."
-        ax.text(0.0, -0.08, line, fontsize=9, weight="bold",
-                ha="left", va="top", transform=ax.transAxes)
-        ax.text(0.0, -0.18, "NA: Not Applicable", fontsize=7, color="#888",
-                ha="left", va="top", transform=ax.transAxes)
+    _draw_pvs_summary(ax, rd, n_data_rows=len(rows_spec))
 def _draw_channel_stats_histogram(ax, rd: ReportData) -> None:
@@ -663,20 +656,54 @@ def _draw_channel_stats_histogram(ax, rd: ReportData) -> None:
        ("Sensor Check", "sensor_check",    ""),
    ]
    _draw_stats_table(ax, rd, rows_spec)
-    if rd.peak_vector_sum_ips is not None:
-        line = f"Peak Vector Sum   {rd.peak_vector_sum_ips:.3f} in/s"
-        # Histograms: "0.091 in/s on May 27, 2026 At 06:06:14"
-        # The when_str is "HH:MM:SS Month DD, YYYY" — reformat for BW match.
-        if rd.peak_vector_sum_when_str:
-            parts = rd.peak_vector_sum_when_str.split(" ", 1)
-            if len(parts) == 2:
-                line += f" on {parts[1]} At {parts[0]}"
-            else:
-                line += f" on {rd.peak_vector_sum_when_str}"
-        ax.text(0.0, -0.08, line, fontsize=9, weight="bold",
-                ha="left", va="top", transform=ax.transAxes)
-        ax.text(0.0, -0.18, "NA: Not Applicable", fontsize=7, color="#888",
-                ha="left", va="top", transform=ax.transAxes)
+    _draw_pvs_summary(ax, rd, n_data_rows=len(rows_spec), histogram_when=True)
+
+
+def _draw_pvs_summary(
+    ax,
+    rd: ReportData,
+    *,
+    n_data_rows: int,
+    histogram_when: bool = False,
+) -> None:
+    """Render the Peak Vector Sum + 'NA: Not Applicable' caption below the
+    stats table.
+
+    Reads ``ax._stats_table_bottom`` (set by ``_draw_stats_table`` when
+    it pins the table via an explicit ``bbox``) so the PVS line lands
+    just below the table's known bottom edge instead of guessing at the
+    geometry.
+    Centered horizontally for visual balance (the previous left-aligned
+    x=0 landed under the label column, not the data, which looked off).
+    """
+    if rd.peak_vector_sum_ips is None:
+        return
+    line = f"Peak Vector Sum   {rd.peak_vector_sum_ips:.3f} in/s"
+    if histogram_when and rd.peak_vector_sum_when_str:
+        # Histogram absolute date+time.  when_str is "HH:MM:SS Month DD, YYYY";
+        # reformat to "<value> on <date> At <time>" to match BW.
+        parts = rd.peak_vector_sum_when_str.split(" ", 1)
+        if len(parts) == 2:
+            line += f" on {parts[1]} At {parts[0]}"
+        else:
+            line += f" on {rd.peak_vector_sum_when_str}"
+    elif not histogram_when and rd.peak_vector_sum_time_s is not None:
+        line += f" At {rd.peak_vector_sum_time_s:.3f} sec."
+    # _draw_stats_table stashes the bbox bottom on the axes so we don't
+    # have to guess geometry.  Falls back to a conservative default if
+    # the bbox approach hasn't run.
+    table_bottom_y = getattr(ax, "_stats_table_bottom", -0.10)
+    pvs_y = table_bottom_y - 0.04   # small gap below the table border
+    # Centered for visual balance — looks intentional rather than offset.
+    # The original BW-replica had a "NA: Not Applicable" caption below
+    # this line; dropped because we use "—" for missing values and the
+    # legend was always squished against the PVS line.
+    ax.text(0.5, pvs_y, line, fontsize=9, weight="bold",
+            ha="center", va="top", transform=ax.transAxes)
 def _draw_stats_table(ax, rd: ReportData, rows_spec: list[tuple[str, str, str]]) -> None:
@@ -711,16 +738,28 @@ def _draw_stats_table(ax, rd: ReportData, rows_spec: list[tuple[str, str, str]])
            _cell(field_name, "Long"),
            unit,
        ])
    # Pin the table's position+size via bbox so we know exactly where
    # the bottom edge lands.  Lets _draw_pvs_summary place the PVS line
    # just below the table without guessing at row heights.
    #
    # bbox = [x, y, width, height] in axes coords.  Header + data rows
    # at row_h each; horizontal extent matches sum(colWidths).
    n_rows = len(table_data)        # header + data rows
    row_h  = 0.12                   # axes-fraction per row (fits fontsize=8)
    table_height = n_rows * row_h
    table_bottom = 1.0 - table_height
    tbl = ax.table(
-        cellText=table_data, loc="upper left",
+        cellText=table_data,
        colWidths=[0.28, 0.14, 0.14, 0.14, 0.10],
        cellLoc="left", edges="open",
        bbox=[0.0, table_bottom, 0.80, table_height],
    )
    tbl.auto_set_font_size(False)
    tbl.set_fontsize(8)
    tbl.scale(1, 1.4)
    for j in range(5):
        tbl[(0, j)].set_text_props(weight="bold", color="#555")
    # Stash the bottom Y so _draw_pvs_summary can position itself below.
    ax._stats_table_bottom = table_bottom
 def _channel_axis_color(ch: str) -> str:
@@ -568,6 +568,16 @@ class WaveformStore:
        # precedence over the filename timestamp inside from_report().
        idf_event = IdfEvent.from_report(report_dict, source_path.name)
        # The binary mic peak (psi) isn't carried through from_report() —
        # IdfReport.from_dict only sees the .txt's dB(L) value.  Pull the
        # binary-derived ``mic_pspl_psi`` onto the typed IdfEvent so the
        # downstream bridge can populate ``PeakValues.micl`` (psi-shaped)
        # and the h5 writer's per-count mic factor lands at a sensible
        # value.  Without this, the h5 mic chart auto-scales against the
        # dB(L) value-as-pseudo-psi and renders ~flat.
        if binary_peaks is not None and binary_peaks.mic_pspl_psi is not None:
            idf_event.peaks.mic_pspl_psi = binary_peaks.mic_pspl_psi
        # Operator-supplied serial_hint wins over the binary's filename
        # prefix when both are present (e.g. callers passing a known-good
        # serial that overrides a misnamed export).
@@ -600,10 +610,28 @@ class WaveformStore:
            n_samples = max((len(idf_samples.get(ch, [])) for ch in ("Tran", "Vert", "Long", "MicL")), default=0)
            ev.total_samples = ev.total_samples or n_samples
-        # 7. Write the .h5 clean-waveform file when we actually have samples.
-        # Histograms (IDFH) don't have waveform samples — skip h5 for those.
+        # For IDFH histograms there are no per-sample waveform arrays — the
+        # device stores one peak ADC count per interval per channel.  Synthesise
+        # a 1-sample-per-interval array so the existing h5+renderer pipeline
+        # (which groups samples down to ``n_intervals`` bars via max-per-group)
+        # produces a non-blank histogram chart.  Each "sample" is the peak ADC
+        # count for that interval, so the h5 writer's ``count × geo_fs/32768``
+        # conversion yields the right physical value for the bar height.
+        if is_histogram and idf_intervals:
+            hist_samples = {
+                "Tran": [iv.peak_count("Tran") for iv in idf_intervals],
+                "Vert": [iv.peak_count("Vert") for iv in idf_intervals],
+                "Long": [iv.peak_count("Long") for iv in idf_intervals],
+                "MicL": [iv.peak_count("MicL") for iv in idf_intervals],
+            }
+            ev.raw_samples = hist_samples
+            ev.total_samples = ev.total_samples or len(idf_intervals)
+        # 7. Write the .h5 clean-waveform file when we have samples to write
+        # (either the IDFW per-sample stream, or the IDFH synthesised per-
+        # interval peak array).  The renderer treats both shapes the same way.
        hdf5_filename: Optional[str] = None
-        if idf_samples is not None and not is_histogram:
+        if ev.raw_samples:
            hdf5_path = self.hdf5_path_for(serial, filename)
            try:
                event_hdf5.write_event_hdf5(
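The IDFH hunk above builds one "sample" per interval from the peak ADC count and relies on the h5 writer's `count × geo_fs/32768` conversion for bar height. A minimal standalone sketch of that idea follows. The `Interval` class, the helper names, and the `full_scale=10.0` value are hypothetical stand-ins for illustration, not the project's real types or calibration.

```python
from __future__ import annotations

from dataclasses import dataclass

CHANNELS = ("Tran", "Vert", "Long", "MicL")
ADC_FULL_SCALE = 32768  # divisor taken from the "count × geo_fs/32768" comment


@dataclass
class Interval:
    """Hypothetical stand-in for one decoded IDFH interval."""
    peaks: dict[str, int]

    def peak_count(self, channel: str) -> int:
        return self.peaks[channel]


def synthesise_hist_samples(intervals: list[Interval]) -> dict[str, list[int]]:
    """Build a 1-sample-per-interval array per channel from peak ADC counts."""
    return {ch: [iv.peak_count(ch) for iv in intervals] for ch in CHANNELS}


def counts_to_physical(counts: list[int], full_scale: float) -> list[float]:
    """Apply count * full_scale / 32768; full_scale plays the role of geo_fs."""
    return [c * full_scale / ADC_FULL_SCALE for c in counts]


intervals = [
    Interval({"Tran": 16384, "Vert": 8192, "Long": 0, "MicL": 13}),
    Interval({"Tran": 32768, "Vert": 4096, "Long": 0, "MicL": 7}),
]
samples = synthesise_hist_samples(intervals)
# Assumed full scale of 10.0 in/s, purely for the arithmetic.
tran_ips = counts_to_physical(samples["Tran"], full_scale=10.0)
```

Because each channel list has exactly one entry per interval, a renderer that takes the max per group of samples ends up plotting the stored peak unchanged.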
Author	SHA1	Message	Date
serversdown	25386cab8b	fix(backfill): regenerate IDFH .h5 + merge binary mic_pspl_psi onto bridge Two gaps in backfill_thor_events.py that left old Thor events showing stale charts after a v0.21.1 backfill pass: 1. IDFH events were skipped from .h5 regeneration (the "have decoded samples" gate was IDFW-only). Histograms kept their pre-v0.21.1 .h5 — written from raw_samples = None, which the renderer turned into a near-empty bar chart, or for older events the dB(L)-as-pseudo- psi mic scale that produced "107.7 psi" peaks (atomic-bomb level instead of footstep level). Fix: synthesise the same 1-sample-per- interval array save_imported_idf v0.21.1 uses (peak ADC count per channel per interval) so the renderer's bar-chart grouping has data to work with. 2. The IDFW h5 path didn't merge binary_peaks.mic_pspl_psi onto the IdfEvent before to_minimateplus_event(). The live save_imported_idf does this merge — without it, IdfEvent.from_report() only sees the .txt's dB(L) value, the bridge falls back to the dBL→psi formula (instead of the binary-accurate 2.14e-6 psi/count value), and the h5 writer's per-count mic factor lands on a less-correct value. Fix: same merge the live ingest does (lift res.event.peaks.mic_pspl_psi onto idf_event.peaks before the bridge call). Verified against UM6047_20250804190047.IDFH (250-interval prod histogram): 250 intervals decode, mic_pspl_psi = 2.78e-5 (was being treated as dB(L)=107.7 in the old h5). Operator: re-run after deploy. `docker compose exec sfm python scripts/backfill_thor_events.py` is idempotent — the existing version check still skips events already at the new TOOL_VERSION, and review state + captured_at are preserved on the second pass.	2026-06-01 20:02:54 +00:00
serversdown	6cb619ecc4	version bump - 0.21.1	2026-06-01 19:33:44 +00:00
serversdown	1ed86244d0	fix(thor-events): add parallel field for mic psi. Now shows mic in dbl and psi. (psi for charts)	2026-06-01 18:27:24 +00:00
serversdown	b2c565f217	fix(idf_waveforms): _find_waveform_body_offset() — scans every 00 02 00 magic past offset 0x0E00, runs decode_waveform_v2 on each candidate, picks the one that returns the most samples. Validated on 483 prod IDFW files: 0 preamble-only events (was ~50%), 355/483 fully decode, 126/483 partial (BW codec walker-stops-early on loud events — known issue). IDFH now synthesises a 1-sample-per-interval array from the binary intervals and writes an .h5 so the existing renderer works unchanged. Each "sample" is the per-interval peak ADC count → h5_value = count × geo_fs/32768 yields the right bar height.	2026-05-31 20:51:09 +00:00
serversdown	43f440812a	scripts: add backfill_thor_events.py Refreshes the bw_report sidecar block + .h5 waveform files for Thor events ingested before the v0.21.0 adapter wiring + the `bee1185` codec fix. Those events landed with extensions.idf_report only (no bw_report, no .h5 for IDFW) — symptom on the UI side: the modal chart 404'd on /waveform.json and the PDF rendered from DB-only fields without sensor self-check, full per-channel breakdown, or mic dB(L). Walks <store>/<serial>/<filename>: - Reads the existing sidecar (preserves review state + captured_at) - Re-runs read_idf_file() on the binary bytes (passes data= kwarg so codec doesn't try the broken bare-path Path.read_bytes) - Reads extensions.idf_report from the existing sidecar - Runs build_bw_report_from_idf adapter - Writes refreshed sidecar with bw_report + bumped tool_version, preserving review block and original captured_at - For IDFW: regenerates .h5 by bridging IdfEvent.from_report -> to_minimateplus_event -> write_event_hdf5 (mirrors save_imported_idf steps 4-7) - IDFH events skip .h5 (histograms have no per-sample data) Skips events already at current TOOL_VERSION with bw_report present. --force overrides. --skip-hdf5 limits to sidecar-only refresh. --dry-run for preview. Validated against the prod-snap waveform store: 3,815 Thor sidecars refreshed cleanly with 0 errors, 462 IDFW .h5 files written, 2 skipped (binaries with no sidecar — backfill doesn't conjure events from nothing). Verified one originally-broken IDFW event now serves waveform.json (200, 168KB) and a fully populated PDF (119KB vs the previous 56KB sparse output). Operator workflow on prod: docker exec <sfm-container> python3 /app/scripts/backfill_thor_events.py --dry-run # Inspect counts, then for real: docker exec <sfm-container> python3 /app/scripts/backfill_thor_events.py Idempotent — re-running it is a no-op once everything's at the current TOOL_VERSION. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-30 04:37:43 +00:00
serversdown	23e83908c2	report_pdf: fix PVS overlapping stats table, drop NA caption Three related fixes to the per-channel stats block: 1. Pin the stats table's position via an explicit bbox= on ax.table() so the bottom edge is at a known axes-fraction Y. The previous loc="upper left" + tbl.scale(1, 1.4) combo let matplotlib choose row heights based on text size, which made the table extend further below the axes than the hard-coded PVS line at y=-0.08 expected. Result was the "Peak Vector Sum X in/s" string landing horizontally inside the Peak Displacement row. With bbox=[0, 1-N*0.12, 0.80, N*0.12] the table is pinned to a precise rectangle (12% axes-fraction per row × N rows tall). _draw_stats_table now stashes the bottom Y on the axes for the PVS helper to reference, so the geometry stays in sync. 2. Center PVS horizontally (ha="center" at x=0.5 instead of ha="left" at x=0). The previous left-edge alignment put PVS at the same X as the label column, which read as "off-center" once the rest of the stats data was column-aligned further right. 3. Drop the "NA: Not Applicable" caption. It existed to explain "—" placeholder cells, but "—" is universally understood and the caption was always visually squished against the PVS line below. Less cruft on the page; one fewer position to manage. Verified against a real BE12599 histogram event (5 data rows) and a real UM12947 IDFW waveform event (6 data rows); both layouts clear the table cleanly with no overlap. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-29 22:17:43 +00:00
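
The bbox arithmetic described in the last commit can be checked without matplotlib. The sketch below assumes the constants shown in the diff (0.12 axes-fraction per row, 0.80 table width, 0.04 gap below the table). The function name `stats_table_geometry` is hypothetical and only mirrors what `_draw_stats_table` and `_draw_pvs_summary` compute between them.

```python
def stats_table_geometry(n_data_rows: int, row_h: float = 0.12, gap: float = 0.04):
    """Return (bbox, pvs_y) in axes coordinates for a pinned stats table.

    bbox is [x, y, width, height]; pvs_y is where the PVS line's top edge sits.
    """
    n_rows = n_data_rows + 1            # header row + data rows
    table_height = n_rows * row_h
    table_bottom = 1.0 - table_height   # table is pinned to the top of the axes
    bbox = [0.0, table_bottom, 0.80, table_height]
    pvs_y = table_bottom - gap          # small gap below the table border
    return bbox, pvs_y


# Histogram layout: 5 data rows. Waveform layout: 6 data rows.
hist_bbox, hist_pvs_y = stats_table_geometry(5)
wave_bbox, wave_pvs_y = stats_table_geometry(6)
```

With these constants the PVS line sits at roughly y=0.24 for the 5-row histogram table and y=0.12 for the 6-row waveform table, in both cases below the table's bottom edge by the fixed gap.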