Merge pull request 'update to v0.21.1, thor data import successful' (#29) from dev into main

Reviewed-on: #29
fix(backfill): regenerate IDFH .h5 + merge binary mic_pspl_psi onto bridge
2026-06-01 16:54:23 -04:00 · 2026-06-01 20:02:54 +00:00 · 2026-06-01 19:33:44 +00:00 · 2026-06-01 18:27:24 +00:00 · 2026-05-31 20:51:09 +00:00 · 2026-05-30 04:37:43 +00:00
36 changed files with 2913 additions and 116 deletions
@@ -8,6 +8,97 @@ All notable changes to seismo-relay are documented here.
 ---
 ## v0.21.1 — 2026-06-01
 Fixes for v0.21.0 bugs that surfaced after the first prod redeploy.
 Three production-visible symptoms (blank waveform charts on most Thor
 events, blank histogram charts on all Thor events, and a mic chart
 that auto-scaled against a dB(L) value treated as psi) were all
 root-caused and fixed.
 ### Fixed
 - **Dynamic IDFW body offset.**  The v0.21.0 codec hardcoded the body
  at file offset `0x0f1f` based on the example corpus, but only ~52%
  of production IDFW events use that offset; the rest sit at offsets
  from `0x1033` up to `0x3082` depending on header padding.  At
  `0x0f1f` the codec would find a coincidentally matching `00 02 00`
  magic, read the 2-byte Tran preamble, and return empty V/L/M
  arrays, producing near-empty .h5 files and blank charts.
  `micromate.idf_file._find_waveform_body_offset()` now scans every
  `00 02 00` magic position past `0x0E00`, trial-decodes each one,
  and picks the offset with the most samples.  Validated across 483
  prod IDFW files: 0 preamble-only events (was ~50%), 355/483 fully
  decode, 126/483 partial (the BW codec's pre-existing walker-stops-early
  limitation on loud events; the samples it does reach are correct).
 - **IDFH histograms now render bar charts.**  Histograms previously
  skipped the .h5 write because there are no per-sample arrays, but
  the renderer drives the per-interval bar chart from .h5 channel
  data + `bw_report.histogram.n_intervals`.  `save_imported_idf` now
  synthesizes a 1-sample-per-interval array from the decoded
  `IdfhInterval` peak counts and writes an .h5 so the existing
  renderer works unchanged — each "sample" is the per-interval peak
  ADC count, so the writer's `count × geo_fs/32768` conversion
  yields the right bar height.
 - **Mic chart scaling on Thor events.**  `PeakValues.micl` (consumed
  by the h5 writer's per-count mic scale factor) expects psi, but
  the Thor bridge was stuffing the dB(L) value (~99.4) into it,
  producing a per-count factor 5+ orders of magnitude too large and
  a flat-looking mic chart.  Fixed by adding `IdfPeaks.mic_pspl_psi`
  alongside `mic_pspl_dbl`; `read_idf_file()` computes it from
  binary mic counts (`max(|MicL|) × 2.14e-6 psi/count`) for both
  IDFW and IDFH paths; `save_imported_idf` merges it onto the typed
  event after `IdfEvent.from_report`; the bridge feeds psi to
  `PeakValues.micl` with a dB(L)→psi formula fallback when only the
  dB(L) value is available.  dB(L) for the report header still
  flows through `bw_report.mic.pspl_dbl` unchanged.
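The offset search in the first fix can be sketched as follows.  This is a minimal illustration, not the `_find_waveform_body_offset()` code: the decoder is passed in as a callable standing in for `decode_waveform_v2()`, "most samples" is assumed to mean the total across all channels, a trial decode that raises is scored as zero, and `find_body_offset` is a hypothetical name.

```python
from typing import Callable, Mapping, Optional, Sequence

BODY_MAGIC = b"\x00\x02\x00"   # candidate body-start marker
SCAN_START = 0x0E00            # ignore magic hits inside the header region

# Assumed decoder shape: body bytes -> {channel name: [sample counts]}
Decoder = Callable[[bytes], Mapping[str, Sequence[int]]]


def find_body_offset(buf: bytes, decode: Decoder, start: int = SCAN_START) -> Optional[int]:
    """Trial-decode every magic hit at or past `start` and keep the richest one.

    Hypothetical helper; returns None when no hit yields any samples.
    """
    best_off: Optional[int] = None
    best_n = 0
    pos = buf.find(BODY_MAGIC, start)
    while pos != -1:
        try:
            n = sum(len(ch) for ch in decode(buf[pos:]).values())
        except Exception:
            n = 0  # false-positive magic that the decoder rejects outright
        if n > best_n:  # strict '>' keeps the earliest offset on ties
            best_off, best_n = pos, n
        pos = buf.find(BODY_MAGIC, pos + 1)
    return best_off
```

Scoring by sample count means a coincidental hit that only yields the short Tran preamble loses to the real body without any special-casing; keeping the earliest offset on ties is a judgment call.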
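The psi path for `PeakValues.micl` in the mic fix reduces to two conversions.  The sketch below uses the `2.14e-6 psi/count` factor quoted in that entry; the dB(L) fallback formula is not spelled out there, so the standard sound-pressure-level definition with a 20 µPa reference is an assumption, as are all function names.

```python
from typing import Optional, Sequence

PSI_PER_MIC_COUNT = 2.14e-6        # binary MicL scale quoted in the changelog
PA_TO_PSI = 1.450377e-4            # 1 Pa expressed in psi
P_REF_PSI = 20e-6 * PA_TO_PSI      # assumed 20 uPa SPL reference, in psi


def mic_psi_from_counts(mic_counts: Sequence[int]) -> Optional[float]:
    """Peak mic pressure: max(|MicL|) x psi-per-count; None if there are no samples."""
    if not mic_counts:
        return None
    return max(abs(c) for c in mic_counts) * PSI_PER_MIC_COUNT


def mic_psi_from_dbl(dbl: float) -> float:
    """Invert dB = 20 * log10(p / p_ref) to recover p in psi (assumed formula)."""
    return P_REF_PSI * 10 ** (dbl / 20)


def micl_psi(mic_counts: Sequence[int], dbl: Optional[float]) -> Optional[float]:
    """Prefer the binary-derived psi; fall back to dB(L) only when counts are absent."""
    psi = mic_psi_from_counts(mic_counts)
    if psi is None and dbl is not None:
        psi = mic_psi_from_dbl(dbl)
    return psi
```

Under these assumptions the ~99.4 dB(L) value converts to roughly 2.7e-4 psi, so feeding the raw 99.4 into a psi slot inflates the per-count factor by more than five orders of magnitude, consistent with the symptom described.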
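The histogram synthesis in the second fix can be pictured as below.  This is a hypothetical sketch: the `Interval` dataclass stands in for `IdfhInterval`, whose real fields may differ, and taking each interval's peak as the larger of `|min|` and `|max|` is an assumption.  The scaling mirrors the writer's `count × geo_fs/32768` rule quoted above and covers the geophone channels only.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence

GEO_CHANNELS = ("Tran", "Vert", "Long")


@dataclass
class Interval:
    """Hypothetical stand-in for a decoded IdfhInterval (per-channel ADC counts)."""
    min_counts: Dict[str, int]
    max_counts: Dict[str, int]


def synthesize_channels(intervals: Sequence[Interval]) -> Dict[str, List[int]]:
    """One 'sample' per interval: that interval's peak |count| on each channel."""
    return {
        ch: [max(abs(iv.min_counts[ch]), abs(iv.max_counts[ch])) for iv in intervals]
        for ch in GEO_CHANNELS
    }


def counts_to_ips(counts: Sequence[int], geo_fs: float) -> List[float]:
    """Apply the writer's count x geo_fs / 32768 scaling (geo_fs assumed to be in/s)."""
    return [c * geo_fs / 32768 for c in counts]
```

Because each synthesized sample is already a peak, the unchanged renderer draws one bar per interval at the correct height.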
 ### Operator
 After deploy, run `python scripts/backfill_thor_events.py` to refresh
 every existing Thor event's sidecar + .h5 with the corrected codec
 output.  The script auto-skips events already at the current
 `TOOL_VERSION`, so the bump from `0.21.0` → `0.21.1` is what triggers
 the refresh.
 ---
 ## v0.21.0 — 2026-05-29
 The "Thor / Series IV codec" release.  Two big pieces landed: (1) the IDF binary codec actually decodes now, both IDFW and IDFH, and (2) a Thor→BW adapter lets Thor events flow through the existing Series III Event Report PDF pipeline.  Combined effect: a Thor event ingested via `/db/import/idf_file` now lands in the DB with the same fidelity as a Blastware event, gets a per-event PDF on demand, and renders in Terra-View's modal chart with the same plotting code as a BW event.
 ### Added — Thor IDF binary codec (`micromate/idf_file.read_idf_file`)
 - **IDFW (waveform)** — body sits at fixed file offset `0x0f1f`; reuses the verified `decode_waveform_v2()` walker from `minimateplus.waveform_codec`.  Sample fidelity is **87–99% byte-exact** against the ASCII-sidecar reference values on quiet events; loud events hit the same walker-stops-early limitation as the BW codec on `SP0/SS0/SV0`-style events.
 - **IDFH (histogram)** — dedicated segment-based decoder for the Thor histogram body format: `[len_be][0a 00 00 00][00 NN][05 3f]` framing plus N × 72-byte interval records (4 × 16-byte per-channel min/max/halfp).  **All 859 Thor IDFH corpus files decode**, totalling **181,071 intervals**; per-channel peaks match the sidecar within **~1.8% (ADC quantization)**.
 - **BW-aliased binary detection** — a small number of corpus files (e.g. `BE9439_*.IDFW/IDFH`) are actually Series III Blastware binaries that share the IDF filename convention by accident.  `read_idf_file()` detects them via their BW `STRT` signature and raises `NotImplementedError` pointing the caller at `read_blastware_file()` instead of trying to decode them as IDF.
 - Full field layouts in `docs/idf_protocol_reference.md`; supporting analysis scripts in `analysis_idf/` (decode validators, per-file detail dumps, corpus accuracy reports).
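The IDFH segment framing listed above can be walked with `struct`.  A minimal sketch under stated assumptions: `NN` in `[00 NN]` is read as the segment's interval count; `len_be` is assumed to count every byte after itself (which matches the `02 da` = 730 prefix of the 10-interval, 732-byte segments in the analysis script); a 2-byte tail follows the records; and intervals are returned as raw 72-byte slices rather than decoded per channel.  `parse_segment` and `Segment` are hypothetical names.

```python
import struct
from dataclasses import dataclass
from typing import List, Tuple

HEADER = struct.Struct(">H4sH2s")  # len_be, 0a 00 00 00, 00 NN, 05 3f (10 bytes)
INTERVAL_SIZE = 72
TAIL_SIZE = 2


@dataclass
class Segment:
    length: int              # len_be as stored
    intervals: List[bytes]   # raw 72-byte interval records


def parse_segment(buf: bytes, off: int = 0) -> Tuple[Segment, int]:
    """Parse one framed segment at `off`; return it and the offset just past it."""
    length, marker, n, tag = HEADER.unpack_from(buf, off)
    if marker != b"\x0a\x00\x00\x00" or tag != b"\x05\x3f":
        raise ValueError(f"bad segment framing at 0x{off:x}")
    body = off + HEADER.size
    end = body + n * INTERVAL_SIZE + TAIL_SIZE
    if end > len(buf):
        raise ValueError(f"segment at 0x{off:x} runs past end of buffer")
    if length != end - off - 2:  # assumption: len_be counts the bytes after itself
        raise ValueError(f"len_be {length} disagrees with {n} intervals at 0x{off:x}")
    intervals = [buf[body + i * INTERVAL_SIZE: body + (i + 1) * INTERVAL_SIZE] for i in range(n)]
    return Segment(length, intervals), end
```

Walking a whole body would repeat `seg, off = parse_segment(buf, off)` until the buffer is exhausted; locating the first segment is outside this sketch.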
 ### Added — Thor → BW report adapter (`micromate/idf_to_bw_report.py`)
 - **`build_bw_report_from_idf(report_dict, binary_md=, intervals=, is_histogram=)`** projects a parsed Thor `IdfReport` plus binary-extracted metadata plus decoded IDFH intervals into the `bw_report`-shaped dict that `sfm.report_pdf.gather_report_data` consumes.  No need to duplicate the renderer — Thor data is ~95% the same metric set as BW; the adapter handles the field-name mapping (`MicPSPL` → `pspl_dbl`, `>100` sentinel → `zc_freq_above_range`, free-form `Calibration : Nov 22, 2023 by Instantel` → `calibration_date` + `calibration_by`, etc.).
 - For IDFH events the adapter derives `histogram.interval_times` by stepping `IntervalSize` from `HistogramStartTime`, matching what the BW pipeline expects from a histogram-mode event.
 - **Wired into `WaveformStore.save_imported_idf`** — every Thor event ingested via `/db/import/idf_file` now gets a `bw_report` block in its sidecar in addition to the existing `extensions.idf_report` (the raw parsed Thor payload).  Falls back gracefully (PDF renders from DB-only fields) if the adapter raises — logged as a warning rather than failing the ingest.
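Two of the adapter's field mappings involve more than a rename, so here is a minimal sketch of how they could be parsed.  The regex, the `(value, above_range)` return shape, and the function names are assumptions for illustration, not the adapter's code; `MicPSPL` → `pspl_dbl` is a plain rename and is omitted.

```python
import re
from typing import Optional, Tuple

# e.g. "Calibration : Nov 22, 2023 by Instantel"
_CAL_RE = re.compile(r"^\s*Calibration\s*:\s*(?P<date>.+?)\s+by\s+(?P<by>.+?)\s*$")


def split_calibration(line: str) -> Tuple[Optional[str], Optional[str]]:
    """Split the free-form calibration line into (calibration_date, calibration_by)."""
    m = _CAL_RE.match(line)
    if m is None:
        return None, None
    return m.group("date"), m.group("by")


def parse_zc_freq(raw: str) -> Tuple[Optional[float], bool]:
    """Return (zc_freq_hz, zc_freq_above_range); '>100' maps to (None, True)."""
    text = raw.strip()
    if text.startswith(">"):
        return None, True
    return float(text), False
```

Keeping the above-range flag separate from the numeric value lets the PDF print a `>100 Hz`-style label instead of a fake number.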
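Deriving `histogram.interval_times` amounts to stepping a fixed interval from the start time.  The sketch assumes the interval size has already been converted to seconds (compare the `interval_size_s` field printed by the IDFH report script) and that interval *i* is stamped at `start + i × size`; whether the adapter stamps interval starts or ends is not stated here, and `interval_times` is a hypothetical name.

```python
from datetime import datetime, timedelta
from typing import List


def interval_times(start: datetime, interval_size_s: float, n_intervals: int) -> List[datetime]:
    """Timestamp for each interval, stepping interval_size_s from start (hypothetical helper)."""
    step = timedelta(seconds=interval_size_s)
    return [start + i * step for i in range(n_intervals)]
```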
 ### Companion releases
 - **Terra-View v0.13.0** ships in parallel — closes Phase 1 of the SFM integration.  The shared event-detail modal now renders the SFM event story (Chart.js waveform/histogram chart, inline PDF preview, `.TXT` download, FT/reviewer/notes review form) without operators needing to bounce to the standalone SFM webapp on port 8200.  Uses only existing seismo-relay endpoints — no API changes here, just better consumption.
 ### Migration / Operations
 No DB migration needed.  Existing Thor events already in the store don't automatically pick up the new `bw_report` block; they need a re-ingest (post the IDF binary + paired `.TXT` back to `/db/import/idf_file`) for the adapter to run.  Alternatively, `scripts/backfill_sidecars.py --reparse-txt` could regenerate the block in place, but it currently only re-runs the BW ASCII parser; extending it to handle Thor would be a small follow-up.
 ```bash
 cd /home/serversdown/terra-view
 docker compose build sfm && docker compose up -d sfm
 ```
 The bumped `TOOL_VERSION = "0.21.0"` in `minimateplus/event_file_io.py` means any subsequent `backfill_sidecars.py --force` pass will re-write sidecars with the new version stamp; that's expected and harmless.
 ---
 ## v0.20.0 — 2026-05-28
 The "PDF + parser polish" release.  Closes out the Event-Report PDF iteration started in v0.17.x: histogram layouts now render correctly against BW reference PDFs, the ASCII parser handles the real-world edge cases production events were tripping over (OORANGE, `>100 Hz`, histogram timestamps), and the `.TXT` preservation rollout lets parser fixes be applied retroactively to ingested events.  Adds server-wide timezone support so operator-visible timestamps no longer drift into UTC.  Rolls up the substantial "pre-v0.20" body of work that had accumulated under `[Unreleased]` (PDF generation, histogram codec fix, histogram parser fields, `.TXT` preservation, backfill safety) — see the trailing "pre-v0.20.0 work" section below for the full list.
@@ -2,7 +2,7 @@
 Ground-up Python replacement for **Blastware**, Instantel's Windows-only software for
 managing MiniMate Plus seismographs. Connects over direct RS-232 or cellular modem
-(Sierra Wireless RV50 / RV55). Current version: **v0.20.0**.
+(Sierra Wireless RV50 / RV55). Current version: **v0.21.0**.
 When new information about the protocol is discovered, please update `docs/instantel_protocol_reference.md` with the findings in addition to this document.
@@ -73,6 +73,28 @@ should not import from `sfm/`, must not touch a DB, and have no I/O
 beyond reading files passed as arguments.  Keep them pure — both
 tiers can then depend on them without circularity.
 #### Thor IDF binary codec (2026-05-28)
 `micromate/idf_file.read_idf_file()` decodes both Thor IDFW
 (waveform) and IDFH (histogram) binaries.
 - **IDFW** reuses `decode_waveform_v2()` on the body at fixed file
  offset `0x0f1f`.  Sample fidelity is 87–99% byte-exact on quiet
  events; loud events hit the BW codec's known walker-stops-early
  limitation.
 - **IDFH** has its own segment-based decoder: `[len_be][0a 00 00 00]
  [00 NN][05 3f]` + N × 72-byte interval records (4 × 16-byte
  per-channel min/max/halfp).  All 859 Thor IDFH corpus files
  decode (181,071 intervals); peak matches sidecar within ~1.8%
  (ADC quantization).
 The two outlier `BE9439_*` files in the Thor example corpus are
 actually Series III Blastware binaries that share the `.IDFW`/`.IDFH`
 filename convention by accident.  `read_idf_file()` detects them by
 their BW `STRT` signature and raises `NotImplementedError` pointing
 callers at `read_blastware_file()`.  See
 `docs/idf_protocol_reference.md` for full field layouts.
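The detect-and-redirect behaviour can be sketched as a guard that runs before IDF decoding.  The search window and the assumption that `STRT` appears as plain ASCII inside it are illustrative only; the real check in `read_idf_file()` may look elsewhere in the file.  `looks_like_blastware`, `reject_blastware`, and `PROBE_BYTES` are hypothetical names.

```python
from pathlib import Path

BW_SIGNATURE = b"STRT"
PROBE_BYTES = 0x400  # hypothetical search window at the start of the file


def looks_like_blastware(buf: bytes, probe: int = PROBE_BYTES) -> bool:
    """True if the Series III STRT signature appears in the first `probe` bytes."""
    return BW_SIGNATURE in buf[:probe]


def reject_blastware(path: Path, buf: bytes) -> None:
    """Raise before IDF decoding if the file is really a Blastware binary."""
    if looks_like_blastware(buf):
        raise NotImplementedError(
            f"{path.name} is a Series III Blastware binary; "
            "use read_blastware_file() instead"
        )
```

Raising `NotImplementedError` instead of returning a half-decoded result keeps a misnamed Blastware file from ever being stored as an empty Thor event.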
 ### Practical consequences
 When deciding where new code goes, ask:
@@ -1,4 +1,4 @@
-# seismo-relay  `v0.20.0`
+# seismo-relay  `v0.21.0`
 A ground-up replacement for **Blastware** — Instantel's aging Windows-only
 software for managing seismographs.  Supports both the **MiniMate Plus
@@ -45,6 +45,15 @@ over direct RS-232 or cellular modem (Sierra Wireless RV50 / RV55).
 > `scripts/backfill_sidecars.py --reparse-txt` lets parser fixes be
 > applied retroactively to existing events without re-forwarding,
 > using the `.TXT` files preserved at ingest time.
 > **v0.21.0 (2026-05-29)** is the Thor / Series IV decoder release —
 > `micromate/idf_file.read_idf_file()` now decodes both IDFW
 > (waveform) and IDFH (histogram) binaries (87–99% sample fidelity
 > on quiet IDFW events; all 859 IDFH corpus files decode cleanly).
 > A new `micromate/idf_to_bw_report.py` adapter projects parsed
 > Thor reports into the BW-shaped sidecar block, so Thor events
 > flow through the existing Event Report PDF pipeline without a
 > separate renderer.  Terra-View v0.13.0 ships in parallel and
 > closes Phase 1 of the SFM integration — see its CHANGELOG.
 > See [CHANGELOG.md](CHANGELOG.md) for full version history.
 ---
@@ -68,7 +77,8 @@ seismo-relay/
 ├── micromate/                 ← Series IV (Micromate / Thor) client library (NEW v0.19)
 │   ├── models.py              ←   IdfEvent, IdfReport, IdfPeaks, IdfProjectInfo, IdfSensorCheck (mic in native dB(L))
 │   ├── idf_ascii_report.py    ←   Parse Thor .IDFW.txt / .IDFH.txt event sidecars
-│   └── idf_file.py            ←   Stub for the .IDFW / .IDFH binary codec (reverse-engineering pending)
+│   ├── idf_file.py            ←   Binary codec for .IDFW + .IDFH (v0.21.0+)
 │   └── idf_to_bw_report.py    ←   Adapter projecting Thor IDF into the BW report shape (v0.21.0+)
 │
 ├── sfm/                       ← SFM REST API server (FastAPI, port 8200)
 │   ├── server.py              ←   Live device endpoints + DB query + ingest endpoints + caching
@@ -425,7 +435,7 @@ Use **com0com** or **VSPD** to create the virtual COM pair on Windows.
 - [x] Thor IDF file ingest at `/db/import/idf_file` (paired with `thor-watcher`, v0.18.0+)
 - [x] Native `IdfEvent` / `IdfReport` typed models — mic in dB(L), full title strings, sensor self-check, calibration, firmware version
 - [x] Parser verified against 1,014 paired `.txt` sidecars in `thor-watcher/example-data/`
- [ ] Binary `.IDFW` / `.IDFH` codec — pending (see Roadmap + [`docs/idf_protocol_reference.md`](docs/idf_protocol_reference.md))
+- [x] Binary `.IDFW` / `.IDFH` codec — ✅ v0.21.0.  IDFW reuses `decode_waveform_v2()` on the body at offset `0x0f1f` (87–99% sample fidelity on quiet events); IDFH has a dedicated segment-based decoder (all 859 corpus files decode, 181,071 intervals total).  See `micromate/idf_file.py` + `docs/idf_protocol_reference.md`.
 - [ ] Live-device protocol — pending codec
 **Data persistence:**
@@ -538,7 +548,7 @@ Implementation steps (concrete):
 ### High-impact (unblocks product features)
 - [ ] **Series III waveform body codec reverse-engineering.**  The 5A bulk-stream body is some kind of compressed/encoded format (not raw int16 LE as previously assumed — see §7.6.1 retraction in `docs/instantel_protocol_reference.md`).  Structural framing is ~50% decoded on branch `claude/codec-re-cBGNe` (tagged-block walker, segment counters); per-byte sample mapping is still open.  Until this lands, the in-app waveform viewer renders garbage and BW-import peak values fall back to `_peaks_from_samples()` saturation noise.  Workaround: pair every BW-imported event with its `_ASCII.TXT` so the device-authoritative peaks land in the DB regardless of codec.
- [ ] **Series IV (Thor IDF) binary codec reverse-engineering.**  `.IDFH` / `.IDFW` files are currently stored opaquely by `WaveformStore.save_imported_idf`, with all metadata sourced from the paired `.txt` sidecar.  This works because thor-watcher forwards both files together, but operators who haven't enabled Thor's TXT exporter get rows with NULL peaks.  Cracking the binary closes that gap and unlocks waveform display.  Starting-point reference at [`docs/idf_protocol_reference.md`](docs/idf_protocol_reference.md) — two observed file signatures (1,012 newer-firmware files + 2 old files whose layout matches the Series III STRT-record format), suggested first-session plan (~2-4 hrs), 1,014 paired binary+txt files available as ground truth in `thor-watcher/example-data/`.  Code seam ready at `micromate/idf_file.py`.
+- [x] **Series IV (Thor IDF) binary codec reverse-engineering.** ✅ v0.21.0 — `micromate/idf_file.read_idf_file()` decodes both IDFW (waveform body at offset `0x0f1f`, reusing `decode_waveform_v2()`; 87–99% sample fidelity on quiet events) and IDFH (dedicated segment-based decoder: all 859 corpus files decode, 181,071 intervals, peaks within ~1.8% of sidecar values).  `WaveformStore.save_imported_idf` now also projects parsed Thor data into a `bw_report` block via `micromate/idf_to_bw_report.py` so Thor events render in the existing Event Report PDF pipeline without a separate renderer.
 - [ ] **In-app waveform viewer accuracy.**  Depends on Series III codec decode.  Plot.v1 JSON pipeline + viewer skeleton already exist; will start showing real waveforms automatically once `_decode_a5_waveform` produces correct samples.  Series IV waveforms come online when the IDF codec lands.
 - [ ] **Series IV live-device support.**  Once the IDF binary is decoded, extend `micromate/` with `transport.py` / `framing.py` / `protocol.py` / `client.py` mirroring the `minimateplus/` package layout — depends on capturing Thor's wire protocol (TCP / RS-232 captures TBD).
 - [ ] **Terra-view integration** — seismo-relay router, unit detail page, VISON-style event listing.
@@ -0,0 +1,65 @@
 """Run read_idf_file across the corpus and report per-channel accuracy vs sidecars."""
 from __future__ import annotations
 import sys
 from pathlib import Path
 REPO = Path(__file__).resolve().parents[1]
 sys.path.insert(0, str(REPO))
 from micromate.idf_file import read_idf_file
 from analysis_idf.recon import load_sidecar_samples
 def sidecar_path(idfw: Path) -> Path:
    return idfw.parent / "TXT" / f"{idfw.name}.txt"
 def main():
    root = REPO / "tests/fixtures/THORDATA_example"
    files = [f for f in root.rglob("*.IDFW") if not str(f).endswith(".CDB")]
    files.sort()
    GEO_LSB = 0.0003
    n_ok = n_skip = 0
    overall = {"Tran": [], "Vert": [], "Long": []}
    for f in files:
        try:
            res = read_idf_file(f)
        except Exception:
            n_skip += 1
            continue
        sc_path = sidecar_path(f)
        if not sc_path.exists():
            n_skip += 1
            continue
        try:
            sc = load_sidecar_samples(sc_path)
        except Exception:
            n_skip += 1
            continue
        per_file = {}
        for ch in ("Tran", "Vert", "Long"):
            sc_counts = [int(round(v / GEO_LSB)) for v in sc[ch]]
            dec = res.samples.get(ch, [])
            n = min(len(sc_counts), len(dec))
            if n == 0:
                per_file[ch] = 0.0
                continue
            exact = sum(1 for i in range(n) if sc_counts[i] == dec[i])
            pct = 100.0 * exact / n
            per_file[ch] = pct
            overall[ch].append(pct)
        n_ok += 1
    print(f"Processed {n_ok} files (skipped {n_skip})")
    print("Per-channel exact-match % (mean / min / max):")
    for ch, vals in overall.items():
        if vals:
            avg = sum(vals) / len(vals)
            print(f"  {ch}: mean={avg:.2f}%  min={min(vals):.2f}%  max={max(vals):.2f}%  n={len(vals)}")
 if __name__ == "__main__":
    main()
@@ -0,0 +1,49 @@
 """Find where decoded-vs-sidecar diverges for each channel."""
 from __future__ import annotations
 import sys
 from pathlib import Path
 REPO = Path(__file__).resolve().parents[1]
 sys.path.insert(0, str(REPO))
 from minimateplus.waveform_codec import decode_waveform_v2
 from analysis_idf.recon import TARGET, TXT, load_sidecar_samples
 def main():
    buf = TARGET.read_bytes()
    sc = load_sidecar_samples(TXT)
    decoded = decode_waveform_v2(buf[0x0f1f:])
    GEO_LSB = 0.0003
    for ch in ("Tran", "Vert", "Long"):
        sc_counts = [int(round(v / GEO_LSB)) for v in sc[ch]]
        dec = decoded[ch]
        # Compare only the overlapping prefix: the walker can stop early,
        # so dec and sc_counts may differ in length.
        n = min(len(dec), len(sc_counts))
        # Find the first index where decoded and sidecar samples disagree
        first_diff = next((i for i in range(n) if dec[i] != sc_counts[i]), None)
        if first_diff is None:
            print(f"{ch}: NO MISMATCHES (compared {n} samples)")
            continue
        print(f"{ch}: first diff at idx {first_diff}")
        # Show 3 samples before and 7 after the first mismatch
        for i in range(max(0, first_diff - 3), min(n, first_diff + 8)):
            mark = "  " if dec[i] == sc_counts[i] else "**"
            print(f"  {mark} idx {i:4d}: sc={sc_counts[i]:6d}  dec={dec[i]:6d}  diff={dec[i]-sc_counts[i]:+d}")
        # Count total mismatches and the longest run of consecutive matches
        cur_match_run = 0
        max_match_run = 0
        diff_count = 0
        for i in range(n):
            if dec[i] == sc_counts[i]:
                cur_match_run += 1
                max_match_run = max(max_match_run, cur_match_run)
            else:
                cur_match_run = 0
                diff_count += 1
        print(f"  total mismatches: {diff_count}/{n}, longest run of matches: {max_match_run}")
        print()
 if __name__ == "__main__":
    main()
@@ -0,0 +1,48 @@
 """End-to-end IDFH ingest verification."""
 from __future__ import annotations
 import sys
 import tempfile
 import json
 from pathlib import Path
 REPO = Path(__file__).resolve().parents[1]
 sys.path.insert(0, str(REPO))
 from sfm.waveform_store import WaveformStore
 def main():
    idfh = REPO / "tests/fixtures/THORDATA_example/THORDATA_example/UPMC Presby/UM13981/UM13981_20220805075441.IDFH"
    txt  = idfh.parent / "TXT" / f"{idfh.name}.txt"
    with tempfile.TemporaryDirectory() as td:
        store = WaveformStore(Path(td))
        ev, rec = store.save_imported_idf(
            idfh.read_bytes(),
            idfh,
            idf_report_text=txt.read_text(errors="replace"),
        )
        print("=== save_imported_idf (IDFH) ===")
        print(f"  serial:        {rec['serial']}")
        print(f"  filename:      {rec['filename']}")
        print(f"  filesize:      {rec['filesize']}")
        print(f"  h5:            {rec['hdf5_filename']}")  # v0.21.1+: histograms also get an .h5 (one sample per interval)
        print(f"  sidecar:       {rec['sidecar_filename']}")
        print()
        print("=== Event ===")
        print(f"  timestamp:     {ev.timestamp}")
        print(f"  record_type:   {ev.record_type}")
        print(f"  sample_rate:   {ev.sample_rate}")
        print()
        # Inspect sidecar to confirm intervals were stashed
        sc_path = Path(td) / "UM13981" / f"{idfh.name}.sfm.json"
        sc = json.loads(sc_path.read_text())
        intervals = sc.get("extensions", {}).get("idf_intervals", [])
        print(f"  sidecar intervals: {len(intervals)}")
        if intervals:
            print(f"  first interval:    {intervals[0]}")
            print(f"  last interval:     {intervals[-1]}")
 if __name__ == "__main__":
    main()
@@ -0,0 +1,40 @@
 """Verify the had_report=False path: ingest IDFW with no .txt."""
 from __future__ import annotations
 import sys
 from pathlib import Path
 import tempfile
 REPO = Path(__file__).resolve().parents[1]
 sys.path.insert(0, str(REPO))
 from sfm.waveform_store import WaveformStore
 def main():
    idfw = REPO / "tests/fixtures/THORDATA_example/THORDATA_example/UPMC Presby/UM11719/UM11719_20231219162723.IDFW"
    with tempfile.TemporaryDirectory() as td:
        store = WaveformStore(Path(td))
        ev, rec = store.save_imported_idf(
            idfw.read_bytes(),
            idfw,
            serial_hint=None,
            idf_report_text=None,        # ← no .txt!
        )
        print("=== IDFW without .txt ingest ===")
        print(f"  serial:        {rec['serial']}")
        print(f"  timestamp:     {ev.timestamp}")
        print(f"  sample_rate:   {ev.sample_rate}")
        print(f"  record_type:   {ev.record_type}")
        print(f"  rectime_sec:   {ev.rectime_seconds}")
        nT = len(ev.raw_samples.get('Tran', [])) if ev.raw_samples else 0
        nV = len(ev.raw_samples.get('Vert', [])) if ev.raw_samples else 0
        nL = len(ev.raw_samples.get('Long', [])) if ev.raw_samples else 0
        nM = len(ev.raw_samples.get('MicL', [])) if ev.raw_samples else 0
        print(f"  raw_samples:   Tran={nT} Vert={nV} Long={nL} MicL={nM}")
        if ev.peak_values:
            print(f"  peak_values:   tran={ev.peak_values.tran} vert={ev.peak_values.vert} long={ev.peak_values.long}")
        print(f"  h5 written:    {rec['hdf5_filename']}")
 if __name__ == "__main__":
    main()
@@ -0,0 +1,102 @@
 """End-to-end Thor report PDF rendering.
 Ingests an IDFW + .txt via save_imported_idf, runs gather_report_data
 (faking a minimal DB row), and renders the PDF to disk.
 """
 from __future__ import annotations
 import sys
 import tempfile
 import json
 from pathlib import Path
 REPO = Path(__file__).resolve().parents[1]
 sys.path.insert(0, str(REPO))
 from sfm.waveform_store import WaveformStore
 from sfm import report_pdf
 class FakeDb:
    """Stand-in for SeismoDb.get_event(); the renderer only needs a few cols."""
    def __init__(self, event):
        self.event = event
    def get_event(self, _id):
        return self.event
 def main():
    base = REPO / "tests/fixtures/THORDATA_example/THORDATA_example/UPMC Presby/UM11719"
    idfw = base / "UM11719_20231219162723.IDFW"
    txt  = base / "TXT" / f"{idfw.name}.txt"
    with tempfile.TemporaryDirectory() as td:
        store = WaveformStore(Path(td))
        ev, rec = store.save_imported_idf(
            idfw.read_bytes(),
            idfw,
            idf_report_text=txt.read_text(errors="replace"),
        )
        print(f"save_imported_idf: h5={rec['hdf5_filename']}, sidecar={rec['sidecar_filename']}")
        # Verify sidecar has bw_report block
        sc_path = Path(td) / "UM11719" / f"{idfw.name}.sfm.json"
        sc = json.loads(sc_path.read_text())
        bw = sc.get("bw_report", {})
        print(f"  bw_report.available: {bw.get('available')}")
        print(f"  bw_report.peaks.tran.ppv_ips: {bw.get('peaks', {}).get('tran', {}).get('ppv_ips')}")
        print(f"  bw_report.mic.pspl_dbl: {bw.get('mic', {}).get('pspl_dbl')}")
        print(f"  bw_report.histogram.n_intervals: {bw.get('histogram', {}).get('n_intervals')}")
        # Build a DB-row-shaped dict from the Event for gather_report_data
        import datetime
        ts = ev.timestamp
        ts_iso = None
        if ts is not None:
            try:
                ts_iso = datetime.datetime(ts.year, ts.month, ts.day, ts.hour, ts.minute, ts.second).isoformat()
            except Exception:
                pass
        fake_row = {
            "serial":              "UM11719",
            "blastware_filename":  rec["filename"],
            "record_type":         "Waveform",
            "timestamp":           ts_iso,
            "sample_rate":         ev.sample_rate,
            "project":             ev.project_info.project if ev.project_info else None,
            "client":              ev.project_info.client  if ev.project_info else None,
            "operator":            ev.project_info.operator if ev.project_info else None,
            "sensor_location":     ev.project_info.sensor_location if ev.project_info else None,
            "created_at":          None,
        }
        rd = report_pdf.gather_report_data(FakeDb(fake_row), store, event_id="test-1")
        print()
        print("=== ReportData ===")
        print(f"  event_id:           {rd.event_id}")
        print(f"  serial:             {rd.serial}")
        print(f"  record_type:        {rd.record_type}")
        print(f"  event_datetime:     {rd.event_datetime_str}")
        print(f"  trigger:            {rd.trigger_source}")
        print(f"  geo_range:          {rd.geo_range_str}")
        print(f"  sample_rate:        {rd.sample_rate_str}")
        print(f"  firmware:           {rd.firmware}")
        print(f"  calibration:        {rd.calibration_date} by {rd.calibration_by}")
        print(f"  battery:            {rd.battery_volts}")
        print(f"  PVS:                {rd.peak_vector_sum_ips} in/s at {rd.peak_vector_sum_time_s} sec")
        print(f"  mic_pspl_dbl:       {rd.mic_pspl_dbl}")
        print(f"  mic_zc_freq_hz:     {rd.mic_zc_freq_hz}")
        print(f"  channel_stats:      {len(rd.channel_stats)} rows")
        for cs in rd.channel_stats:
            print(f"    {cs['name']}: PPV={cs['ppv_ips']} ZC={cs['zc_freq_hz']} ToP={cs['time_of_peak_s']} Acc={cs['peak_accel_g']} Disp={cs['peak_disp_in']} Test={cs['sensor_check']}")
        # Render the PDF
        out_path = REPO / "analysis_idf" / "thor_report.pdf"
        pdf_bytes = report_pdf.render_event_report_pdf(rd)
        out_path.write_bytes(pdf_bytes)
        print()
        print(f"  PDF written: {out_path} ({len(pdf_bytes)} bytes)")
 if __name__ == "__main__":
    main()
@@ -0,0 +1,91 @@
 """End-to-end Thor IDFH histogram report PDF rendering."""
 from __future__ import annotations
 import sys
 import tempfile
 import json
 import datetime
 from pathlib import Path
 REPO = Path(__file__).resolve().parents[1]
 sys.path.insert(0, str(REPO))
 from sfm.waveform_store import WaveformStore
 from sfm import report_pdf
 class FakeDb:
    def __init__(self, event):
        self.event = event
    def get_event(self, _id):
        return self.event
 def main():
    # Use the multi-interval IDFH (81 + trigger row)
    idfh = REPO / "tests/fixtures/THORDATA_example/THORDATA_example/UPMC Presby/UM13981/UM13981_20220805075441.IDFH"
    txt  = idfh.parent / "TXT" / f"{idfh.name}.txt"
    with tempfile.TemporaryDirectory() as td:
        store = WaveformStore(Path(td))
        ev, rec = store.save_imported_idf(
            idfh.read_bytes(),
            idfh,
            idf_report_text=txt.read_text(errors="replace"),
        )
        print(f"save_imported_idf: h5={rec['hdf5_filename']}, sidecar={rec['sidecar_filename']}")
        sc_path = Path(td) / "UM13981" / f"{idfh.name}.sfm.json"
        sc = json.loads(sc_path.read_text())
        bw = sc.get("bw_report", {})
        hist = bw.get("histogram", {})
        print(f"  bw_report.histogram.start:           {hist.get('start')}")
        print(f"  bw_report.histogram.stop:            {hist.get('stop')}")
        print(f"  bw_report.histogram.n_intervals:     {hist.get('n_intervals')}")
        print(f"  bw_report.histogram.interval_size:   {hist.get('interval_size')}")
        print(f"  bw_report.histogram.interval_size_s: {hist.get('interval_size_s')}")
        print(f"  bw_report.peaks.tran.ppv_ips:        {bw.get('peaks', {}).get('tran', {}).get('ppv_ips')}")
        ts = ev.timestamp
        ts_iso = None
        if ts is not None:
            try:
                ts_iso = datetime.datetime(ts.year, ts.month, ts.day, ts.hour, ts.minute, ts.second).isoformat()
            except Exception:
                pass
        fake_row = {
            "serial":              "UM13981",
            "blastware_filename":  rec["filename"],
            "record_type":         "Histogram",
            "timestamp":           ts_iso,
            "sample_rate":         ev.sample_rate,
            "project":             ev.project_info.project if ev.project_info else None,
            "client":              ev.project_info.client  if ev.project_info else None,
            "operator":            ev.project_info.operator if ev.project_info else None,
            "sensor_location":     ev.project_info.sensor_location if ev.project_info else None,
            "created_at":          None,
        }
        rd = report_pdf.gather_report_data(FakeDb(fake_row), store, event_id="hist-1")
        print()
        print("=== ReportData (histogram) ===")
        print(f"  is_histogram:           {rd.is_histogram}")
        print(f"  histogram_start:        {rd.histogram_start_str}")
        print(f"  histogram_stop:         {rd.histogram_stop_str}")
        print(f"  histogram_n_intervals:  {rd.histogram_n_intervals}")
        print(f"  histogram_interval_size: {rd.histogram_interval_size}")
        print(f"  histogram_interval_times[:3]: {rd.histogram_interval_times[:3]}")
        print(f"  histogram_interval_times[-2:]: {rd.histogram_interval_times[-2:]}")
        print(f"  channel_stats: {len(rd.channel_stats)} rows")
        for cs in rd.channel_stats:
            print(f"    {cs['name']}: PPV={cs['ppv_ips']} ZC={cs['zc_freq_hz']} peak_date={cs['peak_date']} peak_time={cs['peak_time']}")
        pdf_bytes = report_pdf.render_event_report_pdf(rd)
        out_path = REPO / "analysis_idf" / "thor_report_idfh.pdf"
        out_path.write_bytes(pdf_bytes)
        print()
        print(f"  PDF written: {out_path} ({len(pdf_bytes)} bytes)")
 if __name__ == "__main__":
    main()
@@ -0,0 +1,52 @@
 """End-to-end ingest test: feed an IDFW + .txt to save_imported_idf in a tmp store."""
 from __future__ import annotations
 import sys
 from pathlib import Path
 import tempfile
 REPO = Path(__file__).resolve().parents[1]
 sys.path.insert(0, str(REPO))
 from sfm.waveform_store import WaveformStore
 def main():
    idfw = REPO / "tests/fixtures/THORDATA_example/THORDATA_example/UPMC Presby/UM11719/UM11719_20231219162723.IDFW"
    txt  = idfw.parent / "TXT" / f"{idfw.name}.txt"
    with tempfile.TemporaryDirectory() as td:
        store = WaveformStore(Path(td))
        ev, rec = store.save_imported_idf(
            idfw.read_bytes(),
            idfw,
            serial_hint=None,
            idf_report_text=txt.read_text(errors="replace"),
        )
        print("=== Save result ===")
        print(f"  serial:    {rec['serial']}")
        print(f"  filename:  {rec['filename']}")
        print(f"  filesize:  {rec['filesize']}")
        print(f"  h5:        {rec['hdf5_filename']}")
        print(f"  sidecar:   {rec['sidecar_filename']}")
        print()
        print("=== Event ===")
        print(f"  serial:        {ev.serial if hasattr(ev,'serial') else '(n/a)'}")
        print(f"  timestamp:     {ev.timestamp}")
        print(f"  sample_rate:   {ev.sample_rate}")
        print(f"  record_type:   {ev.record_type}")
        print(f"  rectime_sec:   {ev.rectime_seconds}")
        counts = {ch: len((ev.raw_samples or {}).get(ch, [])) for ch in ("Tran", "Vert", "Long", "MicL")}
        print("  raw_samples:   " + ", ".join(f"{ch}={n}" for ch, n in counts.items()))
        if ev.peak_values:
            print(f"  peaks (txt):   Tran={ev.peak_values.tran} Vert={ev.peak_values.vert} Long={ev.peak_values.long}")
        print()
        # Verify the h5 file actually got written
        h5path = Path(td) / "UM11719" / f"{idfw.name}.h5"
        print(f"  h5 exists:     {h5path.exists()}  size={h5path.stat().st_size if h5path.exists() else 0}")
        sidecar = Path(td) / "UM11719" / f"{idfw.name}.sfm.json"
        print(f"  sidecar exists:{sidecar.exists()}  size={sidecar.stat().st_size if sidecar.exists() else 0}")
 if __name__ == "__main__":
    main()
@@ -0,0 +1,137 @@
 """Decode IDFH histogram intervals + verify against sidecar."""
 from __future__ import annotations
 import sys
 import struct
 from pathlib import Path
 REPO = Path(__file__).resolve().parents[1]
 sys.path.insert(0, str(REPO))
 SEGMENT_MAGIC = b"\x02\xda\x0a\x00\x00\x00"
 SEGMENT_SIZE = 732   # = 10-byte header + 10 × 72-byte intervals + 2-byte tail
 INTERVAL_SIZE = 72
 CHANNELS = ("Tran", "Vert", "Long", "MicL")
 def decode_interval(buf72: bytes) -> dict:
    """Decode one 72-byte interval into per-channel min/max/halfp."""
    out = {}
    for i, ch in enumerate(CHANNELS):
        block = buf72[i*16 : (i+1)*16]
        mn = struct.unpack_from(">h", block, 0)[0]
        mx = struct.unpack_from(">h", block, 2)[0]
        sb = struct.unpack_from(">h", block, 4)[0]
        halfp = struct.unpack_from(">H", block, 6)[0]
        f10 = struct.unpack_from(">H", block, 10)[0]
        f14 = struct.unpack_from(">H", block, 14)[0]
        peak_count = max(abs(mn), abs(mx))
        out[ch] = {
            "min":     mn,
            "max":     mx,
            "field4":  sb,
            "halfp":   halfp,
            "field10": f10,
            "field14": f14,
            "peak":    peak_count,
            "freq_hz": (512.0 / halfp) if halfp > 5 else None,
        }
    out["_tail"] = buf72[64:].hex(" ")
    return out
 def walk_idfh(buf: bytes) -> list:
    """Walk all interval records in an IDFH file."""
    intervals = []
    # Multi-segment file: every 02 da 0a 00 00 00 marker introduces a segment.
    # Single-interval file: just one body header at 0xf96 of form ?? ?? 0a 00 00 00.
    # Find them all.
    i = 0
    while True:
        j = buf.find(b"\x0a\x00\x00\x00", i)
        if j < 0:
            break
        # Validate the candidate: a real segment header is the 2-byte length
        # immediately before j, then 0a 00 00 00, then 00 NN, then 05 3f.
        if j < 2:
            i = j + 1
            continue
        # Body header form: [length_be_2][0a 00 00 00][00 NN][05 3f]
        if j + 10 > len(buf):
            break
        length = int.from_bytes(buf[j-2:j], "big")
        # Verify the segment-marker shape: [length_be][0a 00 00 00][00 NN][05 3f]
        if buf[j+4] != 0x00:
            i = j + 1
            continue
        if buf[j+6:j+8] != b"\x05\x3f":
            i = j + 1
            continue
        # Header layout (10 bytes): [length_be 2B][0a 00 00 00 4B][00 NN 2B][05 3f 2B],
        # followed by N interval records of 72 bytes each, then 2 tail bytes.
        # length = (N × 72) + 10: it counts every byte after the length field
        # itself (8 header bytes + intervals + 2-byte tail), e.g. 0x02da = 730 for N = 10.
        header_start = j - 2
        n_intervals = (length - 10) // INTERVAL_SIZE
        interval_start = header_start + 10
        for k in range(n_intervals):
            off = interval_start + k * INTERVAL_SIZE
            if off + INTERVAL_SIZE > len(buf):
                break
            chunk = buf[off:off + INTERVAL_SIZE]
            intervals.append({"offset": off, **decode_interval(chunk)})
        i = header_start + length + 2
    return intervals
 def main():
    # Test against multi-segment IDFH
    target = REPO / "tests/fixtures/THORDATA_example/THORDATA_example/UPMC Presby/UM13981/UM13981_20220805075441.IDFH"
    sc_path = target.parent / "TXT" / f"{target.name}.txt"
    buf = target.read_bytes()
    intervals = walk_idfh(buf)
    print(f"=== {target.name} ===")
    print(f"  file size: {len(buf)}")
    print(f"  decoded intervals: {len(intervals)}")
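    # Sidecar data rows start with the event year; keep them for side-by-side comparison.
    sc_rows = []
    for line in sc_path.read_text(errors="replace").splitlines():
        if line.startswith(("2022-", "2023-")):
            sc_rows.append(line)
    print(f"  sidecar rows: {len(sc_rows)}")
    print()
    # Show the first two intervals and intervals 78-80 (skipped if the file has fewer).
    for k in [0, 1, 78, 79, 80]: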
        if k >= len(intervals):
            continue
        iv = intervals[k]
        print(f"--- interval {k} @0x{iv['offset']:04x} ---")
        for ch in CHANNELS:
            d = iv[ch]
            peak_ips = d["peak"] / 32768 * 10.0
            print(f"  {ch}: peak={d['peak']:5d} ({peak_ips:.4f} in/s)  halfp={d['halfp']:5d}  freq={d['freq_hz']}")
        # sidecar row
        if k < len(sc_rows):
            print(f"  SC: {sc_rows[k]}")
    # Test single-interval IDFH
    print()
    target2 = REPO / "tests/fixtures/THORDATA_example/THORDATA_example/UPMC Presby/UM11719/UM11719_20231219162648.IDFH"
    sc2 = target2.parent / "TXT" / f"{target2.name}.txt"
    buf2 = target2.read_bytes()
    intervals2 = walk_idfh(buf2)
    print(f"=== {target2.name} ===")
    print(f"  file size: {len(buf2)}, decoded intervals: {len(intervals2)}")
    if intervals2:
        iv = intervals2[0]
        for ch in CHANNELS:
            d = iv[ch]
            peak_ips = d["peak"] / 32768 * 10.0
            print(f"  {ch}: peak={d['peak']:5d} ({peak_ips:.4f} in/s)  halfp={d['halfp']:5d}  freq={d['freq_hz']}")
        sc_rows2 = [l for l in sc2.read_text(errors='replace').splitlines() if l.startswith("2023-")]
        if sc_rows2:
            print(f"  SC: {sc_rows2[0]}")
 if __name__ == "__main__":
    main()
@@ -0,0 +1,41 @@
 """Estimate the IDFH interval size by scoring consistently-zero byte columns for each candidate size."""
 from __future__ import annotations
 import sys
 from pathlib import Path
 from collections import Counter
 REPO = Path(__file__).resolve().parents[1]
 sys.path.insert(0, str(REPO))
 def main():
    target = REPO / "tests/fixtures/THORDATA_example/THORDATA_example/UPMC Presby/UM13981/UM13981_20220805075441.IDFH"
    buf = target.read_bytes()
    body_start = 0xF96
    body_end   = 0x270C
    body = buf[body_start:body_end]
    print(f"body size: {len(body)} bytes (file {len(buf)} bytes)")
    # For each candidate interval size, count how many bytes at fixed offsets within
    # each interval are zero (consistent column-zero pattern indicates correct size).
    print()
    print("=== zero-column score by interval size (higher = more likely) ===")
    best = []
    for sz in range(16, 100):
        n = len(body) // sz
        if n < 30:
            continue
        # For each column position within an interval, count how many of n intervals have zero
        score = 0
        for col in range(sz):
            zeros = sum(1 for i in range(n) if body[i*sz + col] == 0)
            if zeros >= n * 0.9:
                score += 1
        best.append((score, sz, n))
    best.sort(reverse=True)
    for score, sz, n in best[:10]:
        print(f"  size={sz:3d}  n_intervals={n}  consistently-zero-cols={score}")
 if __name__ == "__main__":
    main()
@@ -0,0 +1,40 @@
 """Per-file accuracy + sample-count details."""
 from __future__ import annotations
 import sys
 from pathlib import Path
 REPO = Path(__file__).resolve().parents[1]
 sys.path.insert(0, str(REPO))
 from micromate.idf_file import read_idf_file
 from analysis_idf.recon import load_sidecar_samples
 def main():
    root = REPO / "tests/fixtures/THORDATA_example"
    files = sorted([f for f in root.rglob("*.IDFW") if not str(f).endswith(".CDB")])
    GEO_LSB = 0.0003
    # Limit the detail listing to the first 20 decoded files that have a sidecar.
    shown = 0
    for f in files:
        try:
            res = read_idf_file(f)
        except Exception:
            continue
        sc_path = f.parent / "TXT" / f"{f.name}.txt"
        if not sc_path.exists():
            continue
        sc = load_sidecar_samples(sc_path)
        sc_tran = [int(round(v / GEO_LSB)) for v in sc["Tran"]]
        dec = res.samples.get("Tran", [])
        n = min(len(sc_tran), len(dec))
        exact = sum(1 for i in range(n) if sc_tran[i] == dec[i]) if n else 0
        pct = 100.0 * exact / n if n else 0.0
        print(f"{f.name:40s}  size={f.stat().st_size:6d}  sc_n={len(sc_tran):4d}  dec_n={len(dec):4d}  exact={pct:.1f}%")
        shown += 1
        if shown >= 20:
            break
 if __name__ == "__main__":
    main()
@@ -0,0 +1,64 @@
 """Look at what's at the divergence boundary."""
 from __future__ import annotations
 import sys
 from pathlib import Path
 REPO = Path(__file__).resolve().parents[1]
 sys.path.insert(0, str(REPO))
 from minimateplus.waveform_codec import walk_body, find_data_start, parse_segment_header
 from analysis_idf.recon import TARGET, TXT, load_sidecar_samples
 def main():
    buf = TARGET.read_bytes()
    body = buf[0x0f1f:]
    start = find_data_start(body)
    print(f"data_start: {start}  (= file offset 0x{0x0f1f + start:04x})")
    blocks = walk_body(body, start)
    print(f"{len(blocks)} blocks total")
    print()
    # First 30 blocks
    print("=== first 30 blocks ===")
    for i, b in enumerate(blocks[:30]):
        body_off = 0x0f1f + b.offset
        if b.tag_hi == 0x40:
            hdr = parse_segment_header(b)
            if hdr:
                print(f"  [{i:3d}] @0x{body_off:04x}  {b.kind}  (segment header)  counter={hdr['counter']}  field2={hdr['field2'].hex()}  anchor={hdr['anchor_bytes'].hex()}  tail={hdr['tail'].hex()}")
            else:
                print(f"  [{i:3d}] @0x{body_off:04x}  {b.kind}  (segment header)  counter=?  field2=?  anchor=?  tail=?")
        else:
            print(f"  [{i:3d}] @0x{body_off:04x}  {b.kind}  len={b.length}  data={b.data[:16].hex()}")
    print()
    # Cumulative sample counts per block to find which block contains sample 254
    print("=== cumulative samples through blocks ===")
    cur_ch = "Tran"
    rotation = ["Vert", "Long", "MicL", "Tran"]
    seg_count = 0
    samples_in_curseg = 2  # preamble Tran[0], Tran[1]
    for i, b in enumerate(blocks[:30]):
        if b.tag_hi == 0x40:
            seg_count += 1
            prev_ch = cur_ch
            cur_ch = rotation[(seg_count - 1) % 4]
            print(f"  [{i:3d}] 40 02 -> end of {prev_ch} segment, start {cur_ch} (segment {seg_count})")
            samples_in_curseg = 2  # anchors
        elif (b.tag_hi & 0xF0) == 0x10:
            nn = ((b.tag_hi & 0x0F) << 8) | b.tag_lo
            samples_in_curseg += nn
            print(f"  [{i:3d}] {b.kind} nibble: +{nn} samples, ch={cur_ch}, ch_total~{samples_in_curseg}")
        elif (b.tag_hi & 0xF0) == 0x20:
            nn = ((b.tag_hi & 0x0F) << 8) | b.tag_lo
            samples_in_curseg += nn
            print(f"  [{i:3d}] {b.kind} int8: +{nn} samples, ch={cur_ch}, ch_total~{samples_in_curseg}")
        elif b.tag_hi == 0x00:
            samples_in_curseg += b.tag_lo
            print(f"  [{i:3d}] {b.kind} RLE: +{b.tag_lo}, ch={cur_ch}, ch_total~{samples_in_curseg}")
        elif b.tag_hi == 0x30:
            samples_in_curseg += b.tag_lo
            print(f"  [{i:3d}] {b.kind} packed12: +{b.tag_lo} samples, ch={cur_ch}, ch_total~{samples_in_curseg}")
 if __name__ == "__main__":
    main()
@@ -0,0 +1,89 @@
 """Reconnaissance helpers for cracking the Thor IDFW binary."""
 from __future__ import annotations
 import sys
 from pathlib import Path
 REPO = Path(__file__).resolve().parents[1]
 sys.path.insert(0, str(REPO))
 TARGET = REPO / "tests/fixtures/THORDATA_example/THORDATA_example/UPMC Presby/UM11719/UM11719_20231219162723.IDFW"
 TXT = REPO / "tests/fixtures/THORDATA_example/THORDATA_example/UPMC Presby/UM11719/TXT/UM11719_20231219162723.IDFW.txt"
 def hex_at(buf: bytes, off: int, n: int = 32) -> str:
    chunk = buf[off : off + n]
    hexs = " ".join(f"{b:02x}" for b in chunk)
    asc = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
    return f"{off:04x}: {hexs}  {asc}"
 def find_all(buf: bytes, needle: bytes) -> list[int]:
    out: list[int] = []
    i = 0
    while True:
        j = buf.find(needle, i)
        if j < 0:
            break
        out.append(j)
        i = j + 1
    return out
 def load_sidecar_samples(path: Path) -> dict[str, list[float]]:
    """Parse the txt sample table — Tran/Vert/Long/MicL."""
    out = {"Tran": [], "Vert": [], "Long": [], "MicL": []}
    in_block = False
    for line in path.read_text(errors="replace").splitlines():
        if not in_block:
            if line.strip() == "Waveform Data Channels":
                in_block = True
            continue
        if line.startswith("Waveform Data USB Channels"):
            break
        parts = line.split("\t")
        # First row is the header "\tTran\tVert\tLong\tMicL"
        if len(parts) >= 5 and parts[1] == "Tran":
            continue
        if len(parts) < 5:
            continue
        try:
            out["Tran"].append(float(parts[1]))
            out["Vert"].append(float(parts[2]))
            out["Long"].append(float(parts[3]))
            out["MicL"].append(float(parts[4]))
        except ValueError:
            continue
    return out
 def main():
    buf = TARGET.read_bytes()
    samples = load_sidecar_samples(TXT)
    print(f"file size: {len(buf)} bytes")
    print(f"sample rows: Tran={len(samples['Tran'])} Vert={len(samples['Vert'])} Long={len(samples['Long'])} MicL={len(samples['MicL'])}")
    print(f"first 6 Tran samples: {samples['Tran'][:6]}")
    print(f"first 6 Vert samples: {samples['Vert'][:6]}")
    print(f"first 6 Long samples: {samples['Long'][:6]}")
    print(f"first 6 MicL samples: {samples['MicL'][:6]}")
    print()
    print("=== BW magic '00 02 00' positions ===")
    hits = find_all(buf, b"\x00\x02\x00")
    print(f"{len(hits)} hits")
    for h in hits[:20]:
        print(hex_at(buf, h, 24))
    print()
    print("=== '40 02' segment-header positions ===")
    hits = find_all(buf, b"\x40\x02")
    print(f"{len(hits)} hits")
    for h in hits:
        ctx_pre = buf[max(0, h - 4): h].hex()
        ctx_post = buf[h: h + 20].hex()
        # pre = the 4 bytes before each hit, to tell real segment headers from coincidental 40 02 pairs
        print(f"  0x{h:04x}  pre={ctx_pre}  post={ctx_post}")
 if __name__ == "__main__":
    main()
@@ -0,0 +1,40 @@
 """Per channel, list where decoded samples diverge from or resync with the sidecar, to see whether errors reset."""
 from __future__ import annotations
 import sys
 from pathlib import Path
 REPO = Path(__file__).resolve().parents[1]
 sys.path.insert(0, str(REPO))
 from minimateplus.waveform_codec import decode_waveform_v2
 from analysis_idf.recon import TARGET, TXT, load_sidecar_samples
 def main():
    buf = TARGET.read_bytes()
    sc = load_sidecar_samples(TXT)
    decoded = decode_waveform_v2(buf[0x0f1f:])
    GEO_LSB = 0.0003
    for ch in ("Tran", "Vert", "Long"):
        sc_counts = [int(round(v / GEO_LSB)) for v in sc[ch]]
        dec = decoded[ch]
        # Find every transition where error becomes zero from nonzero (or grows from zero)
        # Print indices where dec resyncs back to exact match.
        n = min(len(sc_counts), len(dec))
        events = []
        prev_match = True
        for i in range(n):
            match = sc_counts[i] == dec[i]
            if match != prev_match:
                kind = "RESYNC" if match else "DIVERGE"
                events.append((i, kind, sc_counts[i], dec[i]))
                prev_match = match
        print(f"{ch}: {len(events)} transitions")
        for i, kind, sc_v, dec_v in events[:20]:
            print(f"  idx {i:4d}  {kind:8s}  sc={sc_v:6d}  dec={dec_v:6d}  diff={dec_v-sc_v:+d}")
        print()
 if __name__ == "__main__":
    main()
@@ -0,0 +1,46 @@
 """Smoke-test read_idf_file on IDFH across the corpus."""
 from __future__ import annotations
 import sys
 from pathlib import Path
 REPO = Path(__file__).resolve().parents[1]
 sys.path.insert(0, str(REPO))
 from micromate.idf_file import read_idf_file
 def main():
    target = REPO / "tests/fixtures/THORDATA_example/THORDATA_example/UPMC Presby/UM11719/UM11719_20231219162648.IDFH"
    result = read_idf_file(target)
    ev = result.event
    print(f"=== {target.name} ===")
    print(f"  signature:   {result.signature}")
    print(f"  serial:      {ev.serial}")
    print(f"  timestamp:   {ev.timestamp}")
    print(f"  sample_rate: {ev.sample_rate}")
    print(f"  kind:        {ev.kind}")
    print(f"  intervals:   {len(result.intervals or [])}")
    print(f"  peaks:       T={ev.peaks.transverse_ips:.4f} V={ev.peaks.vertical_ips:.4f} L={ev.peaks.longitudinal_ips:.4f}")
    print()
    root = REPO / "tests/fixtures/THORDATA_example"
    files = list(root.rglob("*.IDFH"))
    ok = fail = nyi = 0
    total_intervals = 0
    for f in files:
        try:
            r = read_idf_file(f)
            ok += 1
            total_intervals += len(r.intervals or [])
        except NotImplementedError:
            nyi += 1
        except Exception as exc:
            fail += 1
            if fail <= 3:
                print(f"  FAIL: {f.name}: {type(exc).__name__}: {exc}")
    print(f"Corpus: {len(files)} IDFH files | ok={ok} fail={fail} nyi={nyi}")
    print(f"Total intervals decoded: {total_intervals}")
 if __name__ == "__main__":
    main()
@@ -0,0 +1,48 @@
 """Smoke-test read_idf_file across the sample corpus."""
 from __future__ import annotations
 import sys
 from pathlib import Path
 REPO = Path(__file__).resolve().parents[1]
 sys.path.insert(0, str(REPO))
 from micromate.idf_file import read_idf_file, geo_count_to_ips, mic_count_to_psi
 def main():
    target = REPO / "tests/fixtures/THORDATA_example/THORDATA_example/UPMC Presby/UM11719/UM11719_20231219162723.IDFW"
    result = read_idf_file(target)
    ev = result.event
    print(f"=== {target.name} ===")
    print(f"  signature: {result.signature}")
    print(f"  serial:    {ev.serial}")
    print(f"  timestamp: {ev.timestamp}")
    print(f"  sample_rate: {ev.sample_rate}")
    print(f"  record_time: {ev.record_time_sec}")
    print(f"  calibration: {result.binary_metadata.calibration_date}")
    print(f"  Tran samples: {len(result.samples['Tran'])}, peak_ips={ev.peaks.transverse_ips:.4f}")
    print(f"  Vert samples: {len(result.samples['Vert'])}, peak_ips={ev.peaks.vertical_ips:.4f}")
    print(f"  Long samples: {len(result.samples['Long'])}, peak_ips={ev.peaks.longitudinal_ips:.4f}")
    print(f"  MicL samples: {len(result.samples['MicL'])}")
    print()
    # Corpus sweep
    root = REPO / "tests/fixtures/THORDATA_example"
    files = [f for f in root.rglob("*.IDFW") if not str(f).endswith(".CDB")]
    ok = fail = nyi = 0
    for f in files:
        try:
            r = read_idf_file(f)
            ok += 1
        except NotImplementedError:
            nyi += 1
        except Exception as exc:
            fail += 1
            if fail <= 5:
                print(f"  FAIL: {f.name}: {type(exc).__name__}: {exc}")
    print()
    print(f"Corpus: {len(files)} IDFW files | ok={ok} fail={fail} not-implemented={nyi}")
 if __name__ == "__main__":
    main()
@@ -0,0 +1,47 @@
 """Verify build_bw_report_from_idf against a known sidecar."""
 from __future__ import annotations
 import json
 import sys
 from pathlib import Path
 REPO = Path(__file__).resolve().parents[1]
 sys.path.insert(0, str(REPO))
 from micromate.idf_ascii_report import parse_idf_report
 from micromate.idf_to_bw_report import build_bw_report_from_idf
 from micromate.idf_file import read_idf_file
 def show(prefix: str, d: dict, indent: int = 0):
    for k, v in d.items():
        if isinstance(v, dict):
            print(f"{'  '*indent}{prefix}{k}:")
            show("", v, indent + 1)
        else:
            print(f"{'  '*indent}{prefix}{k}: {v!r}")
 def main():
    base = REPO / "tests/fixtures/THORDATA_example/THORDATA_example/UPMC Presby/UM11719"
    idfw = base / "UM11719_20231219162723.IDFW"
    txt  = base / "TXT" / f"{idfw.name}.txt"
    report_dict = parse_idf_report(txt.read_text(errors="replace"))
    res = read_idf_file(idfw)
    bw = build_bw_report_from_idf(report_dict, binary_md=res.binary_metadata)
    print("=== IDFW → bw_report ===")
    show("", bw)
    print()
    print("=== IDFH (single trigger row) ===")
    idfh = base / "UM11719_20231219162648.IDFH"
    txt_h = base / "TXT" / f"{idfh.name}.txt"
    rh = parse_idf_report(txt_h.read_text(errors="replace"))
    res_h = read_idf_file(idfh)
    bw_h = build_bw_report_from_idf(rh, binary_md=res_h.binary_metadata, intervals=res_h.intervals)
    show("", bw_h)
 if __name__ == "__main__":
    main()
@@ -0,0 +1,73 @@
 """Trace Tran sample-by-sample to find exactly where the codec drifts."""
 from __future__ import annotations
 import sys
 from pathlib import Path
 REPO = Path(__file__).resolve().parents[1]
 sys.path.insert(0, str(REPO))
 from analysis_idf.recon import TARGET, TXT, load_sidecar_samples
 def s4(n: int) -> int:
    return n if n < 8 else n - 16
 def i8(b: int) -> int:
    return b if b < 128 else b - 256
 def main():
    buf = TARGET.read_bytes()
    sc = load_sidecar_samples(TXT)
    GEO_LSB = 0.0003
    sc_tran = [int(round(v / GEO_LSB)) for v in sc["Tran"]]
    body = buf[0x0f1f:]
    # Tran[0], Tran[1] from preamble
    t0 = int.from_bytes(body[3:5], "big", signed=True)
    t1 = int.from_bytes(body[5:7], "big", signed=True)
    print(f"preamble Tran[0]={t0}  Tran[1]={t1}  (sidecar: {sc_tran[0]}, {sc_tran[1]})")
    # Block 0: 10 f8 at body[7:9]
    print(f"block 0: tag {body[7]:02x} {body[8]:02x}")
    print(f"  block 0 first 10 data bytes: {body[9:19].hex()}")
    # Walk block 0 manually, comparing each sample
    cur = t1
    samples = [t0, t1]
    block_off = 7
    nn = body[8]
    print(f"  NN = {nn}")
    data = body[9 : 9 + nn // 2]
    for byi, byte in enumerate(data):
        for nib_idx, nib in enumerate(((byte >> 4) & 0xF, byte & 0xF)):
            cur += s4(nib)
            samples.append(cur)
            idx = len(samples) - 1
            if 0 <= idx < len(sc_tran):
                sc_v = sc_tran[idx]
                match = "✓" if sc_v == cur else "✗"
                if idx < 12 or 240 <= idx <= 260:
                    print(f"    idx {idx:3d}: nibble byte={byte:02x} nib={nib:x} delta={s4(nib):+d}  cur={cur:+d}  sc={sc_v:+d}  {match}")
    print(f"end of block 0: cur={cur}, len(samples)={len(samples)}, decoder expected 250 here")
    # Block 1 (expected tag 20 28) starts right after block 0's data: body offset 9 + NN // 2 (= 9 + 124 = 133).
    block1_off = 9 + nn // 2
    print(f"block 1: tag {body[block1_off]:02x} {body[block1_off+1]:02x} (expecting 20 28)")
    nn1 = body[block1_off + 1]
    print(f"  block 1 NN = {nn1}")
    data1 = body[block1_off + 2 : block1_off + 2 + nn1]
    for byi, byte in enumerate(data1):
        cur += i8(byte)
        samples.append(cur)
        idx = len(samples) - 1
        if idx < len(sc_tran):
            sc_v = sc_tran[idx]
            match = "✓" if sc_v == cur else "✗"
            if 248 <= idx <= 295:
                print(f"    idx {idx:3d}: int8 byte={byte:02x} delta={i8(byte):+d}  cur={cur:+d}  sc={sc_v:+d}  {match}")
 if __name__ == "__main__":
    main()
@@ -0,0 +1,42 @@
 """Feed candidate body offsets to the BW codec and compare with sidecar."""
 from __future__ import annotations
 import sys
 from pathlib import Path
 REPO = Path(__file__).resolve().parents[1]
 sys.path.insert(0, str(REPO))
 from minimateplus.waveform_codec import decode_waveform_v2, walk_body, find_data_start
 from analysis_idf.recon import TARGET, TXT, load_sidecar_samples
 def main():
    buf = TARGET.read_bytes()
    sc = load_sidecar_samples(TXT)
    # Sidecar samples in 0.0003 counts (Thor geo LSB).
    sc_tran = [int(round(v / 0.0003)) for v in sc["Tran"][:30]]
    sc_vert = [int(round(v / 0.0003)) for v in sc["Vert"][:30]]
    sc_long = [int(round(v / 0.0003)) for v in sc["Long"][:30]]
    sc_micl = [int(round(v / 1e-6)) for v in sc["MicL"][:30]]  # tentative 1e-6 scale; the true mic constant is still unknown
    print(f"sidecar Tran (counts): {sc_tran}")
    print(f"sidecar Vert (counts): {sc_vert}")
    print(f"sidecar Long (counts): {sc_long}")
    print(f"sidecar MicL (×1e-6):  {sc_micl}")
    print()
    # Try candidate body start offsets.
    for off in (0x0f1f, 0x1057, 0x11f1, 0x1333, 0x1bde, 0x0d30):
        print(f"=== body @ 0x{off:04x} ===")
        body = buf[off:]
        decoded = decode_waveform_v2(body)
        if not decoded:
            print("  decode_waveform_v2 returned None")
            continue
        for ch in ("Tran", "Vert", "Long", "MicL"):
            arr = decoded.get(ch, [])
            print(f"  {ch}[{len(arr)}]: {arr[:20]}")
        print()
 if __name__ == "__main__":
    main()
@@ -0,0 +1,51 @@
 """Verify decode_waveform_v2 against sidecar across all 2304 samples per channel."""
 from __future__ import annotations
 import sys
 from pathlib import Path
 REPO = Path(__file__).resolve().parents[1]
 sys.path.insert(0, str(REPO))
 from minimateplus.waveform_codec import decode_waveform_v2
 from analysis_idf.recon import TARGET, TXT, load_sidecar_samples
 def main():
    buf = TARGET.read_bytes()
    sc = load_sidecar_samples(TXT)
    body = buf[0x0f1f:]
    decoded = decode_waveform_v2(body)
    print(f"Sidecar lengths: Tran={len(sc['Tran'])} Vert={len(sc['Vert'])} Long={len(sc['Long'])} MicL={len(sc['MicL'])}")
    print(f"Decoded lengths: Tran={len(decoded['Tran'])} Vert={len(decoded['Vert'])} Long={len(decoded['Long'])} MicL={len(decoded['MicL'])}")
    print()
    GEO_LSB = 0.0003  # in/s per count
    for ch in ("Tran", "Vert", "Long"):
        sc_counts = [int(round(v / GEO_LSB)) for v in sc[ch]]
        dec = decoded[ch]
        n = min(len(sc_counts), len(dec))
        matches = sum(1 for i in range(n) if sc_counts[i] == dec[i])
        first_mismatch = next((i for i in range(n) if sc_counts[i] != dec[i]), None)
        pct = 100 * matches / n if n else 0.0
        print(f"{ch}: compared {n}, exact matches {matches} ({pct:.2f}%)")
        if first_mismatch is not None:
            i = first_mismatch
            ctx_lo = max(0, i - 2)
            print(f"  first mismatch at idx {i}: sidecar={sc_counts[i]} ({sc[ch][i]}), decoded={dec[i]}")
            # Slice [ctx_lo:i+5] covers indices ctx_lo..i+4 inclusive.
            print(f"  context sidecar[{ctx_lo}..{i+4}]: {sc_counts[ctx_lo:i+5]}")
            print(f"  context decoded[{ctx_lo}..{i+4}]: {dec[ctx_lo:i+5]}")
    # MicL: find the multiplicative factor that fits
    print()
    print("=== MicL scale analysis ===")
    sc_micl = sc["MicL"]
    dec_micl = decoded["MicL"]
    # Skip zero values when computing ratio
    ratios = [sc_micl[i] / dec_micl[i] for i in range(min(50, len(sc_micl), len(dec_micl))) if dec_micl[i] != 0]
    if ratios:
        avg = sum(ratios) / len(ratios)
        print(f"  avg ratio sidecar/decoded over first 50 nonzero: {avg:.4e} (n={len(ratios)})")
        print(f"  ratios sample: {[f'{r:.4e}' for r in ratios[:6]]}")
 if __name__ == "__main__":
    main()
@@ -6,11 +6,68 @@ Series IV event-file format.  Sibling to
 Series III "Rosetta Stone") — this doc holds what we know so far and
 the open questions still to crack.
-**Status (2026-05-20):** ASCII text sidecar fully decoded (1,014
-sample files round-trip).  Binary `.IDFH` / `.IDFW` codec
-**not yet implemented** — binaries are stored opaquely by
-`WaveformStore.save_imported_idf`, with metadata sourced from the
-paired `.txt` sidecar.
+**Status (2026-05-28):** ASCII text sidecar fully decoded (1,014
+sample files round-trip).  **Thor IDFW** binary now decodes via
+`micromate.idf_file.read_idf_file()` — reuses the BW segment-rotated
+block codec verbatim at fixed body offset `0x0f1f`; metadata (serial,
+timestamp, sample_rate, record_time, calibration_date) extracted from
 the binary header.  Sample fidelity is 87–99% byte-exact on quiet
 events; loud events hit the BW codec's known walker-stops-early
 limitation.  Residual ~3% drift on per-sample deltas (likely a
 Thor-specific 12-bit delta refinement not yet modelled).
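A minimal sketch of how the nibble (`10 NN`) and int8 (`20 NN`) delta blocks extend a channel, modelled on the hand trace in the Tran drift script rather than on the production `minimateplus.waveform_codec` walker. Assumptions: the 12-bit sample count is the low nibble of the tag byte plus the next byte, nibbles are consumed high-then-low as signed 4-bit deltas, int8 bytes are signed 8-bit deltas, and an odd nibble count is padded to a whole byte. `decode_delta_blocks` is a hypothetical helper; RLE (`00 NN`), packed12 (`30 NN`) and `40 02` segment headers are deliberately left out.

```python
def s4(n: int) -> int:
    """Sign-extend a 4-bit nibble (0..15) to -8..7."""
    return n if n < 8 else n - 16


def i8(b: int) -> int:
    """Sign-extend an unsigned byte (0..255) to -128..127."""
    return b if b < 128 else b - 256


def decode_delta_blocks(stream: bytes, seed: int) -> list[int]:
    """Accumulate 10 NN (nibble) and 20 NN (int8) delta blocks onto `seed`.

    Stops at the first tag it does not model (RLE, packed12, 40 02 segment
    header) and returns the samples decoded so far.
    """
    samples: list[int] = []
    cur = seed
    pos = 0
    while pos + 2 <= len(stream):
        tag_hi, tag_lo = stream[pos], stream[pos + 1]
        kind = tag_hi & 0xF0
        count = ((tag_hi & 0x0F) << 8) | tag_lo  # 12-bit sample count
        if kind == 0x10:
            # Assumption: an odd nibble count is padded to a whole byte.
            data = stream[pos + 2 : pos + 2 + (count + 1) // 2]
            nibbles = [n for byte in data for n in ((byte >> 4) & 0xF, byte & 0xF)]
            deltas = [s4(n) for n in nibbles[:count]]
        elif kind == 0x20:
            data = stream[pos + 2 : pos + 2 + count]
            deltas = [i8(b) for b in data]
        else:
            break  # tag not modelled in this sketch
        for d in deltas:
            cur += d
            samples.append(cur)
        pos += 2 + len(data)
    return samples
```

For the first block of an IDFW body, the seed would be the preamble's last sample (`Tran[1]`).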
**Thor IDFH histograms also decoded.**  Body has one or more segments;
each 10-byte segment header `[length_be 2B][0a 00 00 00][00 NN][05 3f]`
introduces `N = (length - 10) // 72` interval records of 72 bytes
each, followed by a 2-byte segment tail.  Each interval = 4 × 16-byte
per-channel records plus an 8-byte tail:
`[int16 min][int16 max][int16 ??][uint16 halfp][2B 00][uint16 ??][2B 00][uint16 ??]`.
Geo peak `= max(|min|, |max|) / 32768 × 10` in/s (agrees with the sidecar
to within ~1.8%); freq `= 512 / halfp` Hz (None for halfp ≤ 5 → ">100"
 sentinel).  Corpus: **all 859 Thor IDFH files decode, 181,071
 intervals**.  Wired through `read_idf_file()` →
 `save_imported_idf()` → sidecar's `extensions.idf_intervals`.
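The segment and interval layout above can be sketched as a small decoder. This is an illustration under the stated layout, not the `read_idf_file()` implementation: `parse_segment_header`, `decode_channel` and `decode_interval` are hypothetical names, the `??` fields are skipped, and the in/s conversion is applied only to the geophone channels because the MicL dB(L) constant is still unknown.

```python
import struct

INTERVAL_SIZE = 72
CHANNELS = ("Tran", "Vert", "Long", "MicL")
GEO_CHANNELS = ("Tran", "Vert", "Long")


def parse_segment_header(hdr: bytes) -> int:
    """Return N for a 10-byte header [length_be 2B][0a 00 00 00][00 NN][05 3f]."""
    if (len(hdr) != 10 or hdr[2:6] != b"\x0a\x00\x00\x00"
            or hdr[6] != 0x00 or hdr[8:10] != b"\x05\x3f"):
        raise ValueError("not an IDFH segment header")
    (length,) = struct.unpack_from(">H", hdr, 0)
    return (length - 10) // INTERVAL_SIZE


def decode_channel(record: bytes) -> dict:
    """Decode the known fields of one 16-byte per-channel record."""
    mn, mx, _field4, halfp = struct.unpack_from(">hhhH", record, 0)
    return {
        "peak_counts": max(abs(mn), abs(mx)),
        "halfp": halfp,
        # halfp <= 5 corresponds to the sidecar's ">100" Hz sentinel.
        "freq_hz": 512.0 / halfp if halfp > 5 else None,
    }


def decode_interval(buf72: bytes) -> dict:
    """Split a 72-byte interval into 4 x 16-byte channel records + 8-byte tail."""
    if len(buf72) != INTERVAL_SIZE:
        raise ValueError("interval must be 72 bytes")
    out = {}
    for i, ch in enumerate(CHANNELS):
        rec = decode_channel(buf72[i * 16 : (i + 1) * 16])
        if ch in GEO_CHANNELS:
            rec["peak_ips"] = rec["peak_counts"] / 32768 * 10.0
        out[ch] = rec
    out["_tail"] = buf72[64:]
    return out
```

A header whose length is `0x02da` (730) yields N = 10, which matches the 732-byte segment (730 + the 2-byte length field) used by the IDFH analysis script.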
 **Note on the BE9439 outliers in the example corpus:** Two files
 (`BE9439_20200713131747.IDFW` and `BE9439_20200713124251.IDFH`) are
 **Series III Blastware** binaries, not Thor.  Provenance: TMI tried
 to use Thor to manage auto-call-homes for Series III units; the
 experiment didn't work out, but it did leave a few BW event files
 in Thor's per-serial directory structure with `.IDFW`/`.IDFH`
 extensions — Thor's forwarder applied its own naming convention to
 the BW bodies it was relaying.  Their header `10 00 01 80 00 00
 Instantel STRT ff fe <end_key> <start_key>` is the BW SUB 5A STRT
 record, not a Thor body preamble.  The reader detects them by
 signature and raises `NotImplementedError` pointing callers at
 `read_blastware_file()`, which extracts BW-format peaks from them.
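A sketch of the kind of signature check that can separate these relayed BW bodies from Thor binaries. Assumptions: only the documented 6-byte prefix is treated as fixed, and the separator bytes around `Instantel` and `STRT` are not known exactly, so the helper (hypothetical `looks_like_blastware_strt`) just looks for both tokens, in order, shortly after the prefix.

```python
# Documented fixed prefix of the BW SUB 5A STRT record.
BW_STRT_PREFIX = bytes.fromhex("100001800000")


def looks_like_blastware_strt(head: bytes, window: int = 32) -> bool:
    """Heuristic: does this file open with a BW STRT record rather than a Thor header?"""
    if not head.startswith(BW_STRT_PREFIX):
        return False
    tail = head[len(BW_STRT_PREFIX) : len(BW_STRT_PREFIX) + window]
    i = tail.find(b"Instantel")
    # Require "STRT" to follow "Instantel" within the search window.
    return i >= 0 and tail.find(b"STRT", i + len(b"Instantel")) >= 0
```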
 **Still NYI for Thor IDFH:** per-channel `int16 field4` (possibly
 time-of-peak); the two uint16 fields (probably PVS contributions);
 8-byte interval tail (PVS data); mic dB(L) exact conversion constant.
 ### Codec breakthroughs (2026-05-28)
- **Body offset is a fixed `0x0f1f`** across 151/154 corpus IDFW
  files.  The body is preceded by a 4-byte record-type marker
  (`46 00 00 00`) and opens with the magic preamble
  `00 02 00 [Tran[0] BE] [Tran[1] BE]`.
 - **Sample stream is BW's segment-rotated block codec verbatim.**
  Thor reuses `10 NN` (nibble), `20 NN` (int8), `00 NN` (RLE),
  `30 NN` (packed12), `40 02` (segment header) tags with the same
  semantics.  Channel rotation Tran→Vert→Long→MicL.
 - **Geo LSB = 0.0003 in/s** (not BW's 0.005), because Thor's 16-bit
  ADC range maps to 10 in/s without the 16-count BW quantization step.
 - **Mic ≈ 2.14×10⁻⁶ psi/count** (rough scale; refine after channel
  block calibration constants are decoded).
 - **BW compliance anchor `\xbe\x80\x00\x00\x00\x00` reappears at
  IDFW offset 0x952** — sample_rate at anchor−6 (uint16 BE),
  record_time at anchor+6 (float32 BE), same layout as BW.
 - **Event timestamp at offset 0x97A** — 8 bytes `[day][month]
  [year_be][unk][hour][min][sec]`.  Stop-time mirrors at 0x982.
 - **Serial as null-terminated ASCII at 0x14E**.
 - **Calibration date** at 0x194–0x197 (day, month, year_be).
 - Per-sample residual drift of ~3% suggests Thor encodes int8/nibble
  deltas with an extra refinement bit that BW doesn't carry —
  unsolved; errors resync within a few samples so cumulative impact
  is small.
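The header offsets listed above can be pulled out with `struct`. A minimal sketch, assuming the first occurrence of the compliance anchor in the file is the real one and that the fixed offsets (`0x14E`, `0x194`, `0x97A`) hold for the file at hand; `parse_thor_header` is a hypothetical name, and the production reader may validate more than this.

```python
import struct
from datetime import date, datetime

# Compliance anchor shared with BW event files.
ANCHOR = b"\xbe\x80\x00\x00\x00\x00"


def parse_thor_header(buf: bytes) -> dict:
    """Extract basic metadata from a Thor IDFW header (sketch)."""
    a = buf.find(ANCHOR)
    if a < 6:
        raise ValueError("compliance anchor not found")
    # sample_rate: uint16 BE, 6 bytes before the anchor.
    (sample_rate,) = struct.unpack_from(">H", buf, a - 6)
    # record_time: float32 BE, immediately after the 6-byte anchor.
    (record_time,) = struct.unpack_from(">f", buf, a + 6)
    # Event timestamp: [day][month][year_be 2B][unk][hour][min][sec].
    day, month, year, _unk, hour, minute, sec = struct.unpack_from(">BBHBBBB", buf, 0x97A)
    # Serial: null-terminated ASCII.
    serial = buf[0x14E:].split(b"\x00", 1)[0].decode("ascii")
    # Calibration date: [day][month][year_be 2B].
    c_day, c_month, c_year = struct.unpack_from(">BBH", buf, 0x194)
    return {
        "serial": serial,
        "sample_rate": sample_rate,
        "record_time": record_time,
        "timestamp": datetime(year, month, day, hour, minute, sec),
        "calibration_date": date(c_year, c_month, c_day),
    }
```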
 ---
@@ -210,8 +210,7 @@ def parse_idf_report(text: Union[str, bytes]) -> Dict[str, Any]:
        "long_peak_acceleration",
        "tran_peak_displacement", "vert_peak_displacement",
        "long_peak_displacement",
-        "tran_time_of_peak", "vert_time_of_peak", "long_time_of_peak",
-        "mic_time_of_peak", "mic_zc_freq",
+        "mic_zc_freq",
    )
    for key in float_fields:
        v = raw.get(key)
@@ -223,6 +222,22 @@ def parse_idf_report(text: Union[str, bytes]) -> Dict[str, Any]:
        else:
            out.pop(key, None)
    # Time-of-peak: Thor labels these "TimeofPeak" (lowercase "of") so the
    # normalizer produces "*_timeof_peak".  Map them to the canonical
    # ``*_time_of_peak`` output keys for downstream consumers.
    for raw_key, out_key in (
        ("tran_timeof_peak", "tran_time_of_peak"),
        ("vert_timeof_peak", "vert_time_of_peak"),
        ("long_timeof_peak", "long_time_of_peak"),
        ("mic_timeof_peak",  "mic_time_of_peak"),
    ):
        v = raw.get(raw_key)
        if v is None:
            continue
        fv = _parse_float(v)
        if fv is not None:
            out[out_key] = fv
    # Microphone — Thor reports MicPSPL (dB(L)) which is the closest
    # analogue to BW's mic_ppv.  The raw "99.4 dB(L)" string stays in
    # `out` under the original `mic_pspl` key for display; the parsed
@@ -1,64 +1,530 @@
 """
-micromate/idf_file.py — placeholder for the Thor IDF binary codec.
+micromate/idf_file.py — Thor IDF binary codec.
-Thor's ``.IDFH`` (histogram) and ``.IDFW`` (waveform) event files are an
+Decodes the Instantel Micromate Series IV ``.IDFW`` (waveform) and
-Instantel proprietary binary format that has not yet been reverse-
+``.IDFH`` (histogram) binary on-disk format.  Sister module to
-engineered.  Today seismo-relay treats them as opaque blobs:
+``minimateplus/event_file_io.py``.
 ``WaveformStore.save_imported_idf`` stores the bytes verbatim and reads
 all device-authoritative metadata from the paired ``.IDFW.txt`` /
 ``.IDFH.txt`` ASCII sidecar (parsed by ``idf_ascii_report.py``).
-When we crack the binary codec — same reverse-engineering playbook we
+Status (2026-05-28):
 used to byte-perfect-parse Series III BW files (see
 ``docs/instantel_protocol_reference.md`` and ``minimateplus/event_file_io.py``)
 — this module will grow:
-  - ``read_idf_file(path) -> IdfEvent``
+- **Genuine Series IV / Thor binaries** are all signed
-        Parse a ``.IDFW``/``.IDFH`` binary and return a fully populated
+  ``00 12 01 00 00 00 Instantel\\0`` (sig-A in earlier notes).  Two
-        ``IdfEvent`` whose waveform-sample arrays come from the binary
+  Series III (Blastware) binaries appear in the example corpus
-        (the .txt sidecar's tabular sample block being a best-effort
+  (``BE9439_*``) — they share the ``.IDFW``/``.IDFH`` extension by
-        check).  Lets us ingest Thor events even when the operator
+  filing convention but carry a BW STRT header (``10 00 01 80 00 00
-        hasn't enabled the .txt exporter — closing the
+  Instantel STRT...``) and are NOT Thor data.  The reader detects
-        ``had_report=False`` gap that the thor-watcher forwarder
+  them by signature and raises NotImplementedError pointing callers
-        currently tolerates as a known limitation.
+  at ``minimateplus.event_file_io.read_blastware_file()``.
 - **IDFW waveform body** reuses the BW segment-rotated block codec
  verbatim.  Body usually starts at ``0x0f1f`` (located by trial decode).  Samples
  decoded via ``minimateplus.waveform_codec.decode_waveform_v2``
  with 87–99% byte-exact match against ``.IDFW.txt`` sidecar (quiet
  events).  Loud events hit the BW codec's known walker-stops-early
  limit.  Residual ~3% drift on per-sample deltas — likely a
  Thor-specific 12-bit delta refinement that BW's codec doesn't
  model.  Geo LSB = 0.0003 in/s; mic factor ~2.14e-6 psi/count.
 - **IDFH histogram body**: 10-byte segment header
  ``[len_be 2B] 0a 00 00 00 [00 NN_counter] 05 3f`` introduces a
  segment of ``N`` 72-byte interval records (``N = (len - 10) // 72``).
  Each record holds 4 × 16-byte per-channel min/max/halfp + 8-byte
  tail.  Geo peaks via ``max(|min|, |max|) / 32768 × 10`` in/s
  (matches sidecar within ~1.8%), freq via ``512 / halfp`` Hz.
  **All 859 Thor IDFH files in the corpus decode (181,071 intervals).**
 - Binary metadata directly extracted: serial, timestamp, sample_rate,
  record_time, calibration_date.  Other fields fall back to the paired
  ``.IDFW.txt`` / ``.IDFH.txt`` sidecar (consumed by
  ``WaveformStore.save_imported_idf``).
-  - ``write_idf_file(path, event)`` (eventually)
+The full reverse-engineering writeup lives in
-        Round-trip event reconstruction, used for verifying the codec
+``docs/idf_protocol_reference.md``.
        against captured device files the way ``write_blastware_file``
        verifies the Series III codec.
  - Helpers for decoding the binary's per-channel sample arrays into
    physical units, the per-event flash buffer's monitor-log records,
    etc.
 The reverse-engineering path: pair every ``.IDFW`` binary in
 ``thor-watcher/example-data/`` with its sibling ``.IDFW.txt``, treating
 the txt's "Waveform Data Channels" block as ground-truth, and align
 the binary's per-channel int16-or-similar arrays against it.  Header
 fields (sample rate, channel count, record time, timestamps) sit before
 the sample block — same approach as the BW codec where ASCII strings
 inside the binary (``Project:``, ``Client:``, etc.) anchored field
 discovery.
 """
 from __future__ import annotations
 import datetime
 import struct
 from dataclasses import dataclass
 from pathlib import Path
-from typing import Union
+from typing import Optional, Union
-from .models import IdfEvent
+from minimateplus.waveform_codec import decode_waveform_v2
 from .models import IdfEvent, IdfPeaks, IdfReport
-def read_idf_file(path: Union[str, Path]) -> "IdfEvent":
+# Genuine Series IV / Thor IDF binary signature: 6 bytes, then ASCII "Instantel".
-    """Parse a Thor ``.IDFW``/``.IDFH`` binary into an ``IdfEvent``.
+_THOR_PREFIX = b"\x00\x12\x01\x00\x00\x00"
 # Stray Series III (Blastware) binaries that occasionally turn up in Thor
 # corpus directories renamed to the .IDFW/.IDFH convention.  Their header
 # (`10 00 01 80 00 00 Instantel STRT ...`) is byte-for-byte a BW SUB 5A
 # STRT record, not a Thor binary.  Detected so we can refuse-and-route
 # rather than mis-parse.
 _BW_STRAY_PREFIX = b"\x10\x00\x01\x80\x00\x00"
 _INSTANTEL_TAG = b"Instantel"
-    Not yet implemented.  When implemented, this will be the canonical
+# Most common body offset for sig-A IDFW files (~50% of prod events;
-    entry point for reading Thor binaries — the ASCII sidecar parser
+# 151/154 in the original tests/fixtures/THORDATA_example corpus).  The
-    becomes an optional fast-path metadata supplement rather than the
+# body is the segment-rotated block stream consumed by decode_waveform_v2;
-    sole source of device-authoritative data.
+# bytes [0:3] are the magic ``00 02 00`` preamble.  Production events
 # routinely use other offsets — see :func:`_find_waveform_body_offset`
 # for the dynamic scan.  This constant survives only as the priority hint.
 _BODY_START_SIG_A = 0x0F1F
 # Magic bytes that mark a candidate waveform-body preamble.
 _BODY_MAGIC = b"\x00\x02\x00"
 # Where to start looking for body candidates inside the file.  Skip the
 # fixed-header region where the same magic legitimately appears inside
 # channel-test records and the compliance block (offsets 0x015d, 0x091c,
 # 0x0ae2, 0x0d30 in observed events).
 _BODY_SCAN_FLOOR = 0x0E00
 # Geophone count → in/s, derived from sidecar ground truth: the smallest
 # non-zero sample in the 1,014-file corpus is 0.0003 in/s.
 _GEO_LSB_IPS = 0.0003
 # Microphone count → psi, derived from sidecar regression on 50 sample
 # pairs from UM11719_20231219162723.IDFW (mic-heavy event).
 _MIC_LSB_PSI = 2.14e-6
 # IDFH histogram constants.
 _IDFH_INTERVAL_SIZE = 72        # bytes per per-interval record
 _IDFH_SEGMENT_HEADER = 10       # bytes: [len_be 2B][0a 00 00 00 4B][00 NN 2B][05 3f 2B]
 _IDFH_SEGMENT_TAIL   = 2        # bytes after the interval data block, before next marker
 _IDFH_HALFP_FREQ_NUM = 512.0    # freq_hz = NUM / halfp; halfp ≤ 5 means ">100 Hz" sentinel
 _IDFH_GEO_FULL_SCALE = 10.0     # in/s — Normal range
 _IDFH_INT16_FS = 32768.0
 _IDFH_CHANNELS = ("Tran", "Vert", "Long", "MicL")
 # ─── Binary metadata extraction ─────────────────────────────────────────────
@dataclass
 class IdfBinaryMetadata:
    """Fields recoverable from the sig-A binary header (no .txt needed)."""
    serial:           Optional[str] = None
    event_datetime:   Optional[datetime.datetime] = None
    sample_rate:      Optional[int] = None
    record_time_sec:  Optional[float] = None
    calibration_date: Optional[datetime.date] = None
 def _read_ascii_z(buf: bytes, off: int, maxlen: int = 64) -> Optional[str]:
    if off >= len(buf):
        return None
    end = buf.find(b"\x00", off, off + maxlen)
    if end < 0:
        end = min(off + maxlen, len(buf))
    s = buf[off:end].decode("ascii", errors="replace").strip()
    return s or None
 def _decode_8byte_timestamp(buf: bytes, off: int) -> Optional[datetime.datetime]:
    """Layout: ``[day][month][year_hi][year_lo][unknown][hour][min][sec]``."""
    if off + 8 > len(buf):
        return None
    day, mon, yh, yl, _unk, hr, mn, sc = buf[off : off + 8]
    year = (yh << 8) | yl
    if not (2015 <= year <= 2050 and 1 <= mon <= 12 and 1 <= day <= 31
            and 0 <= hr < 24 and 0 <= mn < 60 and 0 <= sc < 60):
        return None
    try:
        return datetime.datetime(year, mon, day, hr, mn, sc)
    except ValueError:
        return None
 def extract_binary_metadata(buf: bytes) -> IdfBinaryMetadata:
    """Pull serial/timestamp/sample_rate/record_time/calibration from the
    sig-A binary header.
    Field positions confirmed against UM11719_20231219162723.IDFW; stable
    across the 151-file sig-A corpus.
    """
-    raise NotImplementedError(
+    md = IdfBinaryMetadata()
-        "IDF binary codec not yet implemented; the .IDFW/.IDFH binary format "
+
-        "is undecoded.  Use parse_idf_report() on the paired .txt sidecar "
+    # Serial: null-terminated ASCII at 0x14E.
-        "for device-authoritative metadata."
+    md.serial = _read_ascii_z(buf, 0x14E, maxlen=16)
    # Sample rate + record time live in a BW-compatible compliance block.
    # Locate the 6-byte anchor `be 80 00 00 00 00` and read offsets relative
    # to it: anchor-6 = sample_rate uint16 BE; anchor+6 = record_time float32 BE.
    anchor = buf.find(b"\xbe\x80\x00\x00\x00\x00", 0x800, 0xA00)
    if anchor > 0:
        sr_bytes = buf[anchor - 6 : anchor - 4]
        if len(sr_bytes) == 2:
            sr = int.from_bytes(sr_bytes, "big")
            if sr in (256, 512, 1024, 2048, 4096):
                md.sample_rate = sr
        rt_bytes = buf[anchor + 6 : anchor + 10]
        if len(rt_bytes) == 4:
            try:
                rt = struct.unpack(">f", rt_bytes)[0]
                if 0.1 <= rt <= 600.0:
                    md.record_time_sec = float(rt)
            except struct.error:
                pass
    # Event timestamp: 8 bytes.  Position differs between IDFW (0x97A) and
    # IDFH (0x9F8); try both offsets and accept the first valid decode.
    for off in (0x97A, 0x9F8):
        ts = _decode_8byte_timestamp(buf, off)
        if ts is not None:
            md.event_datetime = ts
            break
    # Calibration date: day, month, year_be at 0x194-0x197.
    if len(buf) > 0x197:
        day, mon = buf[0x194], buf[0x195]
        year = int.from_bytes(buf[0x196 : 0x198], "big")
        if 1 <= mon <= 12 and 1 <= day <= 31 and 2015 <= year <= 2050:
            try:
                md.calibration_date = datetime.date(year, mon, day)
            except ValueError:
                pass
    return md
 # ─── Sample decoder + unit conversion ───────────────────────────────────────
 def _find_waveform_body_offset(buf: bytes) -> Optional[int]:
    """Pick the file offset of the waveform body by trial-decoding every
    ``00 02 00`` magic position past the fixed-header region.
    The body's location isn't fixed across all sig-A IDFW files — about
    half the production events use ``0x0f1f``, but the rest have offsets
    that shift based on header padding / channel-config layout.  We
    auto-detect by:
      1. Find every ``00 02 00`` occurrence past ``_BODY_SCAN_FLOOR``.
      2. Try ``decode_waveform_v2()`` on each candidate.
      3. Pick the offset whose decoded sample count is largest.
    Returns the offset, or ``None`` if no candidate yielded more than
    the trivial 2-sample preamble (= "no real body found").
    Costs ~2-8 trial decodes per file; in practice the first candidate
    past 0x0e00 is usually the right one.
    """
    if len(buf) < _BODY_SCAN_FLOOR + 8:
        return None
    best: Optional[tuple[int, int]] = None   # (total_samples, offset)
    i = _BODY_SCAN_FLOOR
    while True:
        j = buf.find(_BODY_MAGIC, i)
        if j < 0:
            break
        i = j + 1
        try:
            decoded = decode_waveform_v2(buf[j:])
        except Exception:
            continue
        if not decoded:
            continue
        total = sum(len(v) for v in decoded.values())
        # A "real" body has more than just the 2-sample preamble.
        if total <= 2:
            continue
        if best is None or total > best[0]:
            best = (total, j)
    return best[1] if best else None
 def _decode_waveform_samples(buf: bytes) -> Optional[dict]:
    """Decode samples from the sig-A waveform body.
    Returns the raw decoder counts dict — geo LSB = 0.0003 in/s, mic in
    its own count unit (see :func:`mic_count_to_psi`).  Returns None if
    no usable body is found.
    Uses :func:`_find_waveform_body_offset` to locate the body — the
    file-offset varies across events (~50% sit at the canonical
    ``0x0f1f`` but the rest don't), so the previous hardcoded constant
    silently produced 2-sample preamble-only output for half the corpus.
    """
    off = _find_waveform_body_offset(buf)
    if off is None:
        return None
    return decode_waveform_v2(buf[off:])
 def geo_count_to_ips(count: int) -> float:
    """Convert a Thor geo decoder count to in/s.  LSB = 0.0003 in/s."""
    return count * _GEO_LSB_IPS
 def mic_count_to_psi(count: int) -> float:
    """Convert a Thor mic decoder count to psi.  Scale derived from
    regression over 50 sample pairs in UM11719_20231219162723.IDFW;
    consistent to ~5%.  Calibration constants from the channel block
    can refine this once decoded.
    """
    return count * _MIC_LSB_PSI
 # ─── IDFH histogram decoder ─────────────────────────────────────────────────
@dataclass
 class IdfhInterval:
    """One decoded histogram interval (typically one minute of monitoring)."""
    offset:    int    # file byte offset of the 72-byte record
    # Per-channel min/max ADC counts (int16 BE), half-period samples, peak count.
    # Peak = max(|min|, |max|).  freq_hz = 512/halfp (None if halfp ≤ 5 →
    # ">100 Hz" sentinel; matches sidecar convention).
    tran_min:    int
    tran_max:    int
    tran_halfp:  int
    vert_min:    int
    vert_max:    int
    vert_halfp:  int
    long_min:    int
    long_max:    int
    long_halfp:  int
    micl_min:    int
    micl_max:    int
    micl_halfp:  int
    def peak_count(self, channel: str) -> int:
        mn = getattr(self, f"{channel.lower()}_min")
        mx = getattr(self, f"{channel.lower()}_max")
        return max(abs(mn), abs(mx))
    def peak_ips(self, channel: str) -> float:
        """Convert peak count to in/s (geo channels only)."""
        return self.peak_count(channel) / _IDFH_INT16_FS * _IDFH_GEO_FULL_SCALE
    def freq_hz(self, channel: str) -> Optional[float]:
        halfp = getattr(self, f"{channel.lower()}_halfp")
        if halfp <= 5:
            return None
        return _IDFH_HALFP_FREQ_NUM / halfp
 def _decode_idfh_interval(buf72: bytes, offset: int) -> IdfhInterval:
    """Decode one 72-byte interval record into per-channel min/max/halfp."""
    fields = []
    for i in range(4):
        block = buf72[i * 16 : (i + 1) * 16]
        mn = struct.unpack_from(">h", block, 0)[0]
        mx = struct.unpack_from(">h", block, 2)[0]
        # block[4:6] = int16 BE, role unknown (possibly time-of-peak)
        halfp = struct.unpack_from(">H", block, 6)[0]
        # block[10:12] and block[14:16] are uint16 BE with unknown semantics
        # (likely sum / count contributions for the PVS computation).
        fields.extend([mn, mx, halfp])
    # Tail 8 bytes (buf72[64:72]) carry PVS-related data; not yet decoded.
    return IdfhInterval(
        offset=offset,
        tran_min=fields[0], tran_max=fields[1], tran_halfp=fields[2],
        vert_min=fields[3], vert_max=fields[4], vert_halfp=fields[5],
        long_min=fields[6], long_max=fields[7], long_halfp=fields[8],
        micl_min=fields[9], micl_max=fields[10], micl_halfp=fields[11],
    )
 def decode_idfh_body(buf: bytes) -> list:
    """Walk an IDFH file and decode every interval record.
    The body has one or more segments; each segment header is 10 bytes:
    ``[length_be 2B][0a 00 00 00][00 NN_counter][05 3f]``, where ``length``
    counts bytes from the start of the length field through the end of the
    interval block (= 10 + 72 × n_intervals).  Each segment is followed by
    a 2-byte tail before the next segment's length field.
    Confirmed against the 859-file corpus (181,071 intervals decoded; 1
    failure is the sig-B BE9439 file).
    """
    intervals: list = []
    i = 0
    while True:
        j = buf.find(b"\x0a\x00\x00\x00", i)
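        # No marker left, or a header that would run past EOF (guards the
        # buf[j + 4] / buf[j + 6 : j + 8] reads below).
        if j < 2 or j + 8 > len(buf):
            break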
        # Validate: [length_be][0a 00 00 00][00 NN][05 3f]
        if buf[j + 4] != 0x00 or buf[j + 6 : j + 8] != b"\x05\x3f":
            i = j + 1
            continue
        length = int.from_bytes(buf[j - 2 : j], "big")
        n = (length - _IDFH_SEGMENT_HEADER) // _IDFH_INTERVAL_SIZE
        if n <= 0:
            i = j + 1
            continue
        header_start = j - 2
        interval_start = header_start + _IDFH_SEGMENT_HEADER
        for k in range(n):
            off = interval_start + k * _IDFH_INTERVAL_SIZE
            if off + _IDFH_INTERVAL_SIZE > len(buf):
                break
            chunk = buf[off : off + _IDFH_INTERVAL_SIZE]
            intervals.append(_decode_idfh_interval(chunk, off))
        # Advance past this segment + the 2-byte tail.
        i = header_start + length + _IDFH_SEGMENT_TAIL
    return intervals
 # ─── Top-level reader ───────────────────────────────────────────────────────
@dataclass
 class IdfReadResult:
    """Return type for :func:`read_idf_file`.
    For waveforms (``.IDFW``), ``samples`` holds the per-channel sample
    arrays in Thor decoder counts.  For histograms (``.IDFH``),
    ``samples`` is empty and ``intervals`` holds the per-interval
    record list (peaks, freqs).
    """
    event:           IdfEvent
    samples:         dict   # {"Tran": [...], ...} for IDFW; empty for IDFH
    binary_metadata: IdfBinaryMetadata
    signature:       str    # always "thor" for now (sig-A genuine Thor)
    intervals:       Optional[list] = None  # list[IdfhInterval] for IDFH; None for IDFW
 def read_idf_file(
    path: Union[str, Path],
    *,
    data: Optional[bytes] = None,
 ) -> IdfReadResult:
    """Parse a Thor ``.IDFW``/``.IDFH`` binary into an ``IdfEvent`` plus
    decoded samples (waveforms) or interval records (histograms).
    Only genuine Thor (signature-A) binaries are decoded.  Stray Series III
    (Blastware) files in IDF-named containers raise NotImplementedError
    pointing at ``read_blastware_file()``; any other signature raises
    ValueError.
    Returns an :class:`IdfReadResult`.  The caller converts int sample
    counts to physical units via :func:`geo_count_to_ips` /
    :func:`mic_count_to_psi`.
    ``path`` is used for filename in error messages and ``.IDFH`` vs
    ``.IDFW`` suffix detection.  When ``data`` is supplied the disk
    read is skipped — useful for ingest paths that already have the
    bytes in memory and where the file may not exist on disk yet.
    """
    p = Path(path)
    buf = data if data is not None else p.read_bytes()
    if len(buf) < 16 or buf[6:16] != _INSTANTEL_TAG + b"\x00":
        raise ValueError(f"{p.name}: not an IDF file (missing Instantel magic)")
    sig_prefix = buf[:6]
    if sig_prefix == _THOR_PREFIX:
        signature = "thor"
    elif sig_prefix == _BW_STRAY_PREFIX:
        raise NotImplementedError(
            f"{p.name}: file has a Series III (Blastware) STRT header in "
            "an IDF-named container — not a Thor binary.  Route through "
            "minimateplus.event_file_io.read_blastware_file() instead "
            "(peaks decode; samples & full metadata don't, but it's not "
            "Thor data so the Thor codec doesn't apply)."
        )
    else:
        raise ValueError(f"{p.name}: unknown IDF signature {sig_prefix.hex()}")
    is_histogram = p.suffix.upper() == ".IDFH"
    md = extract_binary_metadata(buf)
    if is_histogram:
        intervals = decode_idfh_body(buf)
        if not intervals:
            raise ValueError(f"{p.name}: IDFH body decoded no intervals")
        # Peaks: max across all intervals on each channel (per-channel max
        # of stored max-magnitudes; sidecar's PPV row carries the same).
        peak_tran = max((iv.peak_ips("Tran") for iv in intervals), default=0.0)
        peak_vert = max((iv.peak_ips("Vert") for iv in intervals), default=0.0)
        peak_long = max((iv.peak_ips("Long") for iv in intervals), default=0.0)
        # Mic peak in psi — Thor stores per-interval mic ADC counts in the
        # binary; convert the max count to psi via the per-count factor.
        mic_peak_count = max((iv.peak_count("MicL") for iv in intervals), default=0)
        mic_peak_psi = mic_count_to_psi(mic_peak_count) if mic_peak_count else None
        rep = IdfReport(
            serial_number=md.serial,
            event_type="Full Histogram",
            event_datetime=md.event_datetime,
            filename=p.name,
            sample_rate=md.sample_rate,
            record_time_sec=md.record_time_sec,
        )
        peaks = IdfPeaks(
            transverse_ips=peak_tran,
            vertical_ips=peak_vert,
            longitudinal_ips=peak_long,
            peak_vector_sum_ips=None,
            mic_pspl_dbl=None,         # IDFH binary doesn't carry the dB(L) value
            mic_pspl_psi=mic_peak_psi,
        )
        event = IdfEvent(
            serial=md.serial or "UNKNOWN",
            timestamp=md.event_datetime or datetime.datetime(1970, 1, 1),
            kind="Histogram",
            filename=p.name,
            sample_rate=md.sample_rate,
            record_time_sec=md.record_time_sec,
            peaks=peaks,
            report=rep,
        )
        return IdfReadResult(
            event=event,
            samples={},
            binary_metadata=md,
            signature=signature,
            intervals=intervals,
        )
    # Waveform path.
    decoded = _decode_waveform_samples(buf)
    if decoded is None:
        raise ValueError(f"{p.name}: waveform body codec failed")
    rep = IdfReport(
        serial_number=md.serial,
        event_type="Full Waveform",
        event_datetime=md.event_datetime,
        filename=p.name,
        sample_rate=md.sample_rate,
        record_time_sec=md.record_time_sec,
    )
    def _peak_ips(ch: str) -> float:
        arr = decoded.get(ch, [])
        return geo_count_to_ips(max((abs(v) for v in arr), default=0))
    # Mic peak psi from binary: max absolute MicL ADC count × 2.14e-6 psi/count.
    mic_arr = decoded.get("MicL", [])
    mic_peak_count = max((abs(v) for v in mic_arr), default=0)
    mic_peak_psi = mic_count_to_psi(mic_peak_count) if mic_peak_count else None
    peaks = IdfPeaks(
        transverse_ips=_peak_ips("Tran"),
        vertical_ips=_peak_ips("Vert"),
        longitudinal_ips=_peak_ips("Long"),
        # PVS requires aligned per-sample √(T²+V²+L²); leave None — the
        # sidecar carries it and the bridge picks it up if present.
        peak_vector_sum_ips=None,
        mic_pspl_dbl=None,             # binary IDFW doesn't carry the dB(L) value;
                                       # sidecar .txt fills it via IdfReport.from_dict
        mic_pspl_psi=mic_peak_psi,
    )
    event = IdfEvent(
        serial=md.serial or "UNKNOWN",
        timestamp=md.event_datetime or datetime.datetime(1970, 1, 1),
        kind="Waveform",
        filename=p.name,
        sample_rate=md.sample_rate,
        record_time_sec=md.record_time_sec,
        peaks=peaks,
        report=rep,
    )
    return IdfReadResult(
        event=event,
        samples=decoded,
        binary_metadata=md,
        signature=signature,
    )
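To make the IDFH segment framing concrete, the sketch below builds two synthetic segments and walks them with the same length arithmetic as `decode_idfh_body`. It assumes `length` counts from the first byte of the length field, which is what the `header_start + length` step in that function implies. `build_segment` and `walk_segments` are hypothetical helpers, and the 72-byte records are zero-filled placeholders.

```python
from __future__ import annotations

import struct

MARKER = b"\x0a\x00\x00\x00"
HEADER = 10   # [len_be 2B][0a 00 00 00][00 NN][05 3f]
RECORD = 72   # bytes per interval record
TAIL = 2      # bytes after each segment's interval block


def build_segment(counter: int, records: list[bytes]) -> bytes:
    """Assemble one synthetic segment: header, records, 2-byte tail."""
    length = HEADER + RECORD * len(records)
    header = struct.pack(">H", length) + MARKER + bytes([0x00, counter]) + b"\x05\x3f"
    return header + b"".join(records) + bytes(TAIL)


def walk_segments(buf: bytes) -> list[tuple[int, int]]:
    """Return (record_offset, segment_counter) for every interval record."""
    found: list[tuple[int, int]] = []
    i = 0
    while True:
        j = buf.find(MARKER, i)
        if j < 0 or j + 8 > len(buf):   # no marker left / header past EOF
            break
        if j < 2 or buf[j + 4] != 0x00 or buf[j + 6 : j + 8] != b"\x05\x3f":
            i = j + 1                   # not a real segment header
            continue
        start = j - 2                   # first byte of the length field
        length = int.from_bytes(buf[start:j], "big")
        n = (length - HEADER) // RECORD
        if n <= 0:
            i = j + 1
            continue
        for k in range(n):
            off = start + HEADER + k * RECORD
            if off + RECORD > len(buf):
                break
            found.append((off, buf[j + 5]))
        i = start + length + TAIL       # skip the records and the 2-byte tail
    return found
```

Each segment declares its own length, so the walker never needs a global interval count; it only re-synchronises on the next marker.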
@@ -0,0 +1,323 @@
 """
 micromate/idf_to_bw_report.py — adapter that projects a parsed Thor IDF
 report (+ binary metadata + decoded IDFH intervals) into the
 ``bw_report``-shaped dict that :mod:`sfm.report_pdf.gather_report_data`
 consumes.
 Lets Thor events flow through the existing Series III Event Report PDF
 pipeline without duplicating the renderer.  Thor's report content is
 ~95% the same data shape as BW's; the field names differ but the
 underlying metrics map 1:1.
 Caveats
 ───────
 - **Mic units** — Thor records ``MicPSPL`` natively in dB(L).  This
  adapter sets ``bw_report.mic.pspl_dbl`` directly; the report
  renderer recomputes the equivalent psi via its dBL→psi formula.
 - **Saturation / above-range flags** — Thor doesn't always mark
  ``OORANGE`` the way BW does; we set ``zc_freq_above_range`` only
  when a `>100` sentinel was preserved in the raw text.
 - **Per-interval data** — for IDFH events we build ``interval_times``
  by stepping ``IntervalSize`` from ``HistogramStartTime``; the binary
  decoder confirms one record per step (882 / 881 / 881 ... across
  the corpus).
 - **calibration_by parsing** — Thor's free-form ``Calibration : November
  22, 2023 by Instantel`` is split on ``" by "`` to extract the
  calibrator; the date prefix is parsed where possible, otherwise
  the binary-extracted ``calibration_date`` from
  :class:`micromate.idf_file.IdfBinaryMetadata` wins.
 """
 from __future__ import annotations
 import datetime
 import re
 from typing import Any, Dict, List, Optional
 # ─── Helpers ────────────────────────────────────────────────────────────────
 _NUM_RE = re.compile(r"-?\d+(?:\.\d+)?")
 def _parse_first_number(s: Optional[str]) -> Optional[float]:
    """Pull the first numeric token from a string like ``"0.1500 in/s"``."""
    if s is None:
        return None
    m = _NUM_RE.search(str(s))
    if not m:
        return None
    try:
        return float(m.group(0))
    except ValueError:
        return None
 def _parse_interval_size_s(s: Optional[str]) -> Optional[float]:
    """``"60 sec"`` → 60.0, ``"5 min"`` → 300.0, ``"1 hour"`` → 3600."""
    if s is None:
        return None
    num = _parse_first_number(s)
    if num is None:
        return None
    sl = str(s).lower()
    if "hour" in sl or "hr" in sl:
        return num * 3600.0
    if "min" in sl:
        return num * 60.0
    return num   # default to seconds
 def _parse_calibration(text: Optional[str]) -> tuple[Optional[str], Optional[str]]:
    """Split ``"November 22, 2023 by Instantel"`` → (ISO date, calibrator).
    Returns ``(None, None)`` if neither half parses.
    """
    if not text:
        return None, None
    parts = str(text).split(" by ", 1)
    date_part = parts[0].strip() if parts else None
    by_part = parts[1].strip() if len(parts) > 1 else None
    iso_date: Optional[str] = None
    if date_part:
        for fmt in ("%B %d, %Y", "%b %d, %Y", "%Y-%m-%d", "%m/%d/%Y"):
            try:
                iso_date = datetime.datetime.strptime(date_part, fmt).date().isoformat()
                break
            except ValueError:
                continue
    return iso_date, by_part
 def _channel_peaks(idf: Dict[str, Any], ch_lc: str) -> Dict[str, Any]:
    """Map ``tran_ppv`` / ``tran_zc_freq`` / ... → bw_report.peaks.tran shape."""
    out: Dict[str, Any] = {}
    for src, dst in (
        (f"{ch_lc}_ppv",                 "ppv_ips"),
        (f"{ch_lc}_zc_freq",             "zc_freq_hz"),
        (f"{ch_lc}_time_of_peak",        "time_of_peak_s"),
        (f"{ch_lc}_peak_acceleration",   "peak_accel_g"),
        (f"{ch_lc}_peak_displacement",   "peak_disp_in"),
    ):
        v = idf.get(src)
        if v is not None:
            out[dst] = v
    # ZC freq ">100" sentinel: the raw text carries it under the un-typed
    # key (e.g. ``raw["tran_zc_freq"]`` would be ``">100"``), and our parser
    # dropped the typed entry.  Detect that case and flag.
    raw_zc = idf.get(f"{ch_lc}_zc_freq")
    if isinstance(raw_zc, str) and ">" in raw_zc:
        out["zc_freq_above_range"] = True
        out.pop("zc_freq_hz", None)
    return out
 def _sensor_check(idf: Dict[str, Any], ch_lc: str) -> Dict[str, Any]:
    out: Dict[str, Any] = {}
    fr = idf.get(f"{ch_lc}_test_freq")
    if fr is not None:
        out["freq_hz"] = _parse_first_number(fr)
    rt = idf.get(f"{ch_lc}_test_ratio")
    if rt is not None:
        out["ratio"] = _parse_first_number(rt)
    am = idf.get(f"{ch_lc}_test_amplitude")
    if am is not None:
        out["amplitude_mv"] = _parse_first_number(am)
    res = idf.get(f"{ch_lc}_test_results")
    if res is not None:
        out["result"] = str(res).strip()
    return {k: v for k, v in out.items() if v is not None}
 def _interval_times(idf: Dict[str, Any], n_intervals: Optional[int]) -> List[str]:
    """Synthesise per-interval end timestamps: start + interval_size × (k + 1).
    Returns ``[]`` when start time or interval size is unknown.
    """
    if not n_intervals:
        return []
    start_date = idf.get("histogram_start_date") or idf.get("event_date")
    start_time = idf.get("histogram_start_time") or idf.get("event_time")
    iv_str = idf.get("interval_size")
    iv_s = _parse_interval_size_s(iv_str)
    if not (start_date and start_time and iv_s):
        return []
    try:
        t0 = datetime.datetime.strptime(f"{start_date} {start_time}", "%Y-%m-%d %H:%M:%S")
    except ValueError:
        return []
    out = []
    for k in range(int(n_intervals)):
        t = t0 + datetime.timedelta(seconds=iv_s * (k + 1))
        out.append(t.isoformat())
    return out
 # ─── Top-level adapter ──────────────────────────────────────────────────────
 def build_bw_report_from_idf(
    idf_report: Dict[str, Any],
    *,
    binary_md=None,
    intervals: Optional[list] = None,
    is_histogram: Optional[bool] = None,
 ) -> Dict[str, Any]:
    """Project a parsed IDF report dict (and optional binary metadata +
    decoded IDFH intervals) into the BW report sidecar shape.
    The returned dict is structurally identical to what
    ``minimateplus.event_file_io._bw_report_to_dict`` produces from a
    real BW ASCII report — it can be assigned to
    ``sidecar["bw_report"]`` and consumed verbatim by
    ``sfm.report_pdf.gather_report_data``.
    ``intervals`` is the list of :class:`micromate.idf_file.IdfhInterval`
    objects from :func:`micromate.idf_file.decode_idfh_body`; only used
    for histogram events to derive accurate ``interval_times``.
    """
    if is_histogram is None:
        et = str(idf_report.get("event_type", ""))
        is_histogram = et.lower().startswith("full histogram")
    # ── Trigger / recording / device ─────────────────────────────────────
    trigger_channel = idf_report.get("trigger")
    trigger_level   = _parse_first_number(idf_report.get("geo_trigger_level"))
    geo_range_ips   = _parse_first_number(idf_report.get("geo_range"))
    cal_iso, cal_by = _parse_calibration(idf_report.get("calibration"))
    # Prefer the binary-extracted calibration_date when our text parse fell
    # through; the binary date is unambiguous.
    if cal_iso is None and binary_md is not None and binary_md.calibration_date:
        cal_iso = binary_md.calibration_date.isoformat()
    # ── Histogram fields ────────────────────────────────────────────────
    hist_block: Dict[str, Any] = {
        "start": None, "stop": None, "n_intervals": None,
        "interval_size": None, "interval_size_s": None,
        "channel_peak_when": {},
    }
    if is_histogram:
        sd = idf_report.get("histogram_start_date")
        st = idf_report.get("histogram_start_time")
        if sd and st:
            try:
                hist_block["start"] = datetime.datetime.strptime(
                    f"{sd} {st}", "%Y-%m-%d %H:%M:%S"
                ).isoformat()
            except ValueError:
                pass
        ed = idf_report.get("histogram_stop_date")
        et_ = idf_report.get("histogram_stop_time")
        if ed and et_:
            try:
                hist_block["stop"] = datetime.datetime.strptime(
                    f"{ed} {et_}", "%Y-%m-%d %H:%M:%S"
                ).isoformat()
            except ValueError:
                pass
        n_raw = idf_report.get("number_of_intervals")
        if n_raw is not None:
            try:
                # Thor reports a float like "81.04"; truncate it to an int
                # (the BW report uses an int for this column).
                hist_block["n_intervals"] = int(float(str(n_raw)))
            except ValueError:
                pass
        # When the binary decoder gave us the actual interval count, prefer it.
        if intervals is not None:
            hist_block["n_intervals"] = len(intervals)
        hist_block["interval_size"] = idf_report.get("interval_size")
        hist_block["interval_size_s"] = _parse_interval_size_s(idf_report.get("interval_size"))
        # interval_times derived from start + step (the BW report uses the
        # exact strings; we match its representation).
        hist_block["interval_times"] = _interval_times(idf_report, hist_block["n_intervals"])
        # Per-channel peak when (absolute date+time at which the channel's
        # peak occurred over the histogram run).  Thor splits this into
        # ``TranPeakDate`` / ``TranPeakTime`` etc.
        peak_when: Dict[str, str] = {}
        for ch_label, ch_lc in (("Tran", "tran"), ("Vert", "vert"), ("Long", "long"), ("MicL", "mic")):
            d = idf_report.get(f"{ch_lc}_peak_date")
            t = idf_report.get(f"{ch_lc}_peak_time")
            if d and t:
                try:
                    peak_when[ch_label] = datetime.datetime.strptime(
                        f"{d} {t}", "%Y-%m-%d %H:%M:%S"
                    ).isoformat()
                except ValueError:
                    continue
        if peak_when:
            hist_block["channel_peak_when"] = peak_when
    # ── Mic block ────────────────────────────────────────────────────────
    mic_block = {
        "weighting":           "L",                   # Thor mic is ISEE Linear
        "pspl_dbl":            idf_report.get("mic_ppv"),  # the dB(L) float
        "pspl_saturated":      False,
        "zc_freq_hz":          idf_report.get("mic_zc_freq"),
        "zc_freq_above_range": isinstance(idf_report.get("mic_zc_freq"), str)
                               and ">" in str(idf_report.get("mic_zc_freq")),
        "time_of_peak_s":      idf_report.get("mic_time_of_peak"),
    }
    if mic_block["zc_freq_above_range"]:
        mic_block["zc_freq_hz"] = None
    # ── Peaks ────────────────────────────────────────────────────────────
    vs_block = {
        "ips":       idf_report.get("peak_vector_sum"),
        "time_s":    _parse_first_number(idf_report.get("peak_vector_sum_time_sum")),
        "when":      None,
        "saturated": False,
    }
    if is_histogram:
        # PVS absolute date+time, when present.
        vs_d = idf_report.get("peak_vector_sum_date")
        vs_t = idf_report.get("peak_vector_sum_time")
        if vs_d and vs_t:
            try:
                vs_block["when"] = datetime.datetime.strptime(
                    f"{vs_d} {vs_t}", "%Y-%m-%d %H:%M:%S"
                ).isoformat()
            except ValueError:
                pass
    return {
        "available":  True,
        "event_type": idf_report.get("event_type"),
        "version":    idf_report.get("version"),
        "trigger": {
            "channel":       trigger_channel,
            "geo_level_ips": trigger_level,
        },
        "recording": {
            "sample_rate_sps":  idf_report.get("sample_rate"),
            "record_time_s":    idf_report.get("record_time_sec"),
            "pretrig_s":        idf_report.get("pre_trigger_sec"),
            "stop_mode":        idf_report.get("record_stop_mode"),
            "geo_range_ips":    geo_range_ips,
            "units":            idf_report.get("units"),
        },
        "device": {
            "battery_volts":    idf_report.get("battery_volts"),
            "calibration_date": cal_iso,
            "calibration_by":   cal_by,
        },
        "peaks": {
            "tran":       _channel_peaks(idf_report, "tran"),
            "vert":       _channel_peaks(idf_report, "vert"),
            "long":       _channel_peaks(idf_report, "long"),
            "vector_sum": vs_block,
        },
        "mic":          mic_block,
        "sensor_check": {
            "tran": _sensor_check(idf_report, "tran"),
            "vert": _sensor_check(idf_report, "vert"),
            "long": _sensor_check(idf_report, "long"),
            "mic":  _sensor_check(idf_report, "mic"),
        },
        "histogram":    hist_block,
        "monitor_log":  [],
        "pc_sw_version": None,
    }
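A standalone sketch of the histogram timeline logic in `build_bw_report_from_idf`, under stated assumptions: `resolve_n_intervals` and `interval_end_times` are hypothetical names, not functions in `micromate.idf_to_bw_report`. The sketch mirrors the priority the adapter applies (binary-decoded interval count first, then Thor's float text truncated to an int) and the `start + (k + 1) * step` arithmetic visible at the tail of `_interval_times`. It assumes the start string uses the `%Y-%m-%d %H:%M:%S` format the adapter parses.

```python
import datetime
from typing import List, Optional


def resolve_n_intervals(text_value: Optional[str], decoded_count: Optional[int]) -> Optional[int]:
    """Prefer the binary-decoded interval count; otherwise truncate Thor's float text."""
    if decoded_count is not None:
        return decoded_count
    if text_value is None:
        return None
    try:
        # int(float(...)) truncates toward zero: "81.04" -> 81
        return int(float(str(text_value)))
    except ValueError:
        return None


def interval_end_times(start: str, interval_s: float, n_intervals: int) -> List[str]:
    """ISO timestamps marking the END of each interval: start + (k + 1) * step."""
    t0 = datetime.datetime.strptime(start, "%Y-%m-%d %H:%M:%S")
    return [
        (t0 + datetime.timedelta(seconds=interval_s * (k + 1))).isoformat()
        for k in range(n_intervals)
    ]
```

With a 60 s interval starting at 06:00:00, the first three entries are 06:01:00, 06:02:00 and 06:03:00, so each timestamp marks the end of its interval rather than the start.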
@@ -159,12 +159,23 @@ class IdfReport:
@dataclass
 class IdfPeaks:
-    """Geophone + mic peak values for one Thor event.  Native Thor units."""
+    """Geophone + mic peak values for one Thor event.  Native Thor units.
    Thor stores the mic peak in two parallel forms.  ``mic_pspl_dbl`` is
    the dB(L) value carried by the sidecar's top-level ``MicPSPL`` header
    field and shown in the report header.  ``mic_pspl_psi`` is the psi
    value, derived either from the IDFW sample table / IDFH interval
    column 9 or from the binary mic counts (~2.14e-6 psi/count).  The psi
    form is required because the BW-shaped ``PeakValues.micl`` consumed
    by ``event_hdf5.write_event_hdf5`` expects psi; feeding it dB(L)
    inflates the h5 mic-chart scale factor by orders of magnitude.
    """
    transverse_ips:    Optional[float] = None    # in/s
    vertical_ips:      Optional[float] = None    # in/s
    longitudinal_ips:  Optional[float] = None    # in/s
    peak_vector_sum_ips: Optional[float] = None  # in/s
    mic_pspl_dbl:      Optional[float] = None    # dB(L)
    mic_pspl_psi:      Optional[float] = None    # psi
@dataclass
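To make the psi/dB(L) relationship in the `IdfPeaks` docstring concrete, here is a sketch that assumes the approximate ~2.14e-6 psi/count factor quoted there and the 2.9e-9 psi (20 µPa) reference used by the bridge's fallback formula. `mic_counts_to_psi` and `psi_to_dbl` are illustrative helpers, not part of `micromate`.

```python
import math

# Assumed conversion factor: the approximate ~2.14e-6 psi per mic ADC count
# quoted in the IdfPeaks docstring (not an exact device constant).
PSI_PER_MIC_COUNT = 2.14e-6
# 20 µPa reference pressure expressed in psi (the 2.9e-9 used by the bridge).
P_REF_PSI = 2.9e-9


def mic_counts_to_psi(peak_count: int) -> float:
    """Peak pressure magnitude in psi from a signed mic ADC count."""
    return abs(peak_count) * PSI_PER_MIC_COUNT


def psi_to_dbl(psi: float) -> float:
    """Linear (unweighted) sound pressure level in dB re 20 µPa."""
    if psi <= 0:
        raise ValueError("pressure must be positive to express in dB")
    return 20.0 * math.log10(psi / P_REF_PSI)
```

For a 1000-count peak this gives about 0.00214 psi, or roughly 117.4 dB(L). Putting 117.4 into a field that expects psi overstates the pressure by a factor of roughly 5×10^4, which is the mic-chart scaling failure the docstring describes.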
@@ -324,10 +335,14 @@ class IdfEvent:
        machinery without those code paths needing to know about Thor.
        Caveats of the bridge:
-          - ``mic_ppv`` on the produced Event carries Thor's dB(L) value
-            verbatim — the UI distinguishes via the ``device_family``
-            column (Phase 1).  Don't run the BW psi→dBL converter on
-            Series IV rows.
+          - ``PeakValues.micl`` carries the mic peak in **psi** (matching
+            BW's convention) — set from :attr:`IdfPeaks.mic_pspl_psi`,
+            with a dB(L)→psi fallback when only the dB(L) value is
+            available.  This is what the h5 writer's mic-scale-factor
            logic needs.  The dB(L) value still flows through
            ``bw_report.mic.pspl_dbl`` (set by the
            ``idf_to_bw_report`` adapter) and the renderer reads it
            from there for the report header.
          - Many Thor-specific fields (Peak Acceleration / Displacement,
            sensor self-check, calibration) don't have a slot in
            ``Event``.  The full IdfReport is preserved on the
@@ -349,11 +364,17 @@ class IdfEvent:
            minute=self.timestamp.minute,
            second=self.timestamp.second,
        )
        # Resolve mic peak as psi.  Priority: binary-derived mic_pspl_psi
        # (set by read_idf_file) > dB(L)→psi fallback via standard formula
        # (psi = 2.9e-9 × 10^(dBL/20)) > None.
        mic_psi = self.peaks.mic_pspl_psi
        if mic_psi is None and self.peaks.mic_pspl_dbl is not None:
            mic_psi = 2.9e-9 * (10.0 ** (self.peaks.mic_pspl_dbl / 20.0))
        pv = PeakValues(
            tran=self.peaks.transverse_ips,
            vert=self.peaks.vertical_ips,
            long=self.peaks.longitudinal_ips,
-            micl=self.peaks.mic_pspl_dbl,   # dB(L) — see caveat above
+            micl=mic_psi,   # psi, matching BW's convention (h5 scaling depends on this)
            peak_vector_sum=self.peaks.peak_vector_sum_ips,
        )
        pi = ProjectInfo(
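The mic-peak priority in `to_minimateplus_event` (binary psi, then the dB(L)→psi formula, then `None`) reduces to a small pure function. `resolve_mic_psi` is a hypothetical name written for illustration; it uses the same 2.9e-9 psi reference constant as the bridge.

```python
from typing import Optional

# 20 µPa reference pressure in psi, the constant used by the bridge's fallback.
P_REF_PSI = 2.9e-9


def resolve_mic_psi(binary_psi: Optional[float], dbl: Optional[float]) -> Optional[float]:
    """Binary-derived psi first, then dB(L) converted to psi, else None."""
    if binary_psi is not None:
        return binary_psi
    if dbl is not None:
        # Invert SPL = 20 * log10(p / p_ref)  =>  p = p_ref * 10^(SPL / 20)
        return P_REF_PSI * (10.0 ** (dbl / 20.0))
    return None
```

For example, 120 dB(L) maps to 2.9e-9 × 10^6 = 0.0029 psi. A binary-derived value, when present, always wins, even if a dB(L) reading is also available.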
@@ -49,7 +49,7 @@ SIDECAR_KIND   = "sfm.event"
 # bumped without a `pip install` re-run — leading to confusing stale
 # version stamps in sidecars.  Bump this constant and CHANGELOG.md
 # together at release time.
-TOOL_VERSION = "0.20.0"
+TOOL_VERSION = "0.21.1"
 try:
    # Best-effort: prefer the installed metadata when it's NEWER than the
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 [project]
 name = "seismo-relay"
-version = "0.20.0"
+version = "0.21.1"
 description = "Python client and REST server for MiniMate Plus seismographs"
 requires-python = ">=3.10"
 dependencies = [
@@ -0,0 +1,331 @@
 """
 scripts/backfill_thor_events.py — re-process existing Thor (Series IV)
 events so their sidecars carry the bw_report block produced by
 ``micromate.idf_to_bw_report.build_bw_report_from_idf`` and so each
 event gets a regenerated .h5 file (decoded waveform for IDFW,
 synthesised per-interval peaks for IDFH).
 Why this exists
 ───────────────
 Thor events ingested before v0.21.0 (or during the v0.21.0 ingest bug
 window fixed in commit bee1185) have sidecars with only
 ``extensions.idf_report`` — no ``bw_report`` block.  Without
 ``bw_report``, the SFM PDF renderer falls back to DB-only fields
 (misses sensor-self-check, full per-channel breakdown, mic dB(L)),
 and the modal chart 404s on ``/waveform.json`` for IDFW events
 because no .h5 was written when the codec failed at ingest.
 Re-forwarding from thor-watcher would also fix this, but that requires
 operator coordination on every watcher machine and uses bandwidth this
 script doesn't.
 What this does
 ──────────────
 Walks ``<store>/<serial>/<filename>`` for ``.IDFW`` / ``.IDFH`` files
 and, for each one:
  1. Reads the existing sidecar (preserving review state + captured_at).
  2. Re-runs ``micromate.idf_file.read_idf_file()`` on the binary
     bytes — passing ``data=`` so the codec doesn't try to read from
     a path it doesn't know.
  3. Pulls ``extensions.idf_report`` (the raw parsed Thor dict the
     v0.18.0+ ingest path already stashed) and runs the v0.21.0
     ``build_bw_report_from_idf`` adapter against it.
  4. Writes the refreshed sidecar with the new ``bw_report``,
     bumped ``source.tool_version``, but preserved ``review`` block
     + the original ``captured_at`` timestamp.
  5. Regenerates the .h5 waveform file via the existing
     ``event_hdf5`` writer.  For IDFW that's the decoded per-sample
     stream; for IDFH it's a 1-sample-per-interval synthesised array
     (peak ADC count per channel) so the renderer's bar-chart code
     has data to group on.  Mic peak psi from the binary is merged
     onto the IdfEvent before the bridge so the h5 writer's per-count
     mic scale factor lands on a sensible value (without this the
     mic chart on Thor events plots dB(L)-as-pseudo-psi and shows
     bomb-level numbers).
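 Idempotent.  Re-running it after a parser/adapter change re-writes the
 sidecars (and .h5 files) of events whose sidecar predates the current
 TOOL_VERSION, or of every Thor event with ``--force``.  No DB writes, no
 thor-watcher coordination.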
 Usage
 ─────
    python scripts/backfill_thor_events.py [--db-path PATH]
                                           [--store-root PATH]
                                           [--dry-run]
                                           [--skip-hdf5]
                                           [--force]
                                           [-v]
 By default, refreshes any Thor event whose sidecar is missing
 ``bw_report`` OR whose ``source.tool_version`` is older than the
 current ``TOOL_VERSION``.  ``--force`` refreshes every Thor event
 regardless.
 """
 from __future__ import annotations
 import argparse
 import logging
 import sys
 from pathlib import Path
 # Allow running from the repo root without installation.
 sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
 from minimateplus import event_file_io
 from sfm.waveform_store import WaveformStore
 log = logging.getLogger("backfill_thor_events")
 def _is_thor_event(path: Path) -> bool:
    if not path.is_file():
        return False
    if path.name.endswith((".sfm.json", ".h5", "_ASCII.TXT")):
        return False
    return path.suffix.upper() in (".IDFW", ".IDFH")
 def _vtuple(s: str) -> tuple:
    try:
        return tuple(int(p) for p in str(s).split(".")[:3])
    except Exception:
        return (0, 0, 0)
 def main(argv=None) -> int:
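    p = argparse.ArgumentParser(
        description=__doc__,
        # Keep the module docstring's line breaks in --help output.
        formatter_class=argparse.RawDescriptionHelpFormatter,
    )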
    p.add_argument(
        "--db-path",
        default=str(Path(__file__).resolve().parent.parent / "bridges" / "captures" / "seismo_relay.db"),
        help="Used only to derive the default --store-root.",
    )
    p.add_argument("--store-root", default=None)
    p.add_argument("--dry-run", action="store_true")
    p.add_argument("--skip-hdf5", action="store_true",
                   help="Don't regenerate .h5 files (IDFW waveforms or IDFH histograms).")
    p.add_argument("--force", action="store_true",
                   help="Refresh every Thor event, not just ones with stale or missing bw_report.")
    p.add_argument("-v", "--verbose", action="store_true")
    args = p.parse_args(argv)
    logging.basicConfig(
        level=logging.DEBUG if args.verbose else logging.INFO,
        format="%(asctime)s  %(levelname)-7s  %(name)s  %(message)s",
        datefmt="%H:%M:%S",
    )
    db_path = Path(args.db_path).expanduser().resolve()
    store_root = (
        Path(args.store_root).expanduser().resolve()
        if args.store_root else db_path.parent / "waveforms"
    )
    if not store_root.exists():
        log.error("store root not found: %s", store_root)
        return 1
    store = WaveformStore(store_root)
    log.info("store root: %s", store_root)
    log.info("current TOOL_VERSION: %s", event_file_io.TOOL_VERSION)
    refreshed = skipped = errors = h5_written = 0
    # Deferred imports: argument parsing (including --help) still works if
    # one of these modules fails to import; the error surfaces here instead.
    from micromate.idf_file import read_idf_file
    from micromate.idf_to_bw_report import build_bw_report_from_idf
    for serial_dir in sorted(p for p in store_root.iterdir() if p.is_dir()):
        serial = serial_dir.name
        for path in sorted(serial_dir.iterdir()):
            if not _is_thor_event(path):
                continue
            sidecar_path = store.sidecar_path_for(serial, path.name)
            if not sidecar_path.exists():
                log.debug("%s: no sidecar — skipping (this is a binary without ingest history)",
                          path.name)
                skipped += 1
                continue
            try:
                existing = event_file_io.read_sidecar(sidecar_path)
            except Exception as exc:
                log.warning("%s: failed to read sidecar — %s", path.name, exc)
                errors += 1
                continue
            has_bw_report = bool(existing.get("bw_report"))
            existing_version = (existing.get("source") or {}).get("tool_version", "")
            up_to_date = (
                has_bw_report
                and _vtuple(existing_version) >= _vtuple(event_file_io.TOOL_VERSION)
            )
            if up_to_date and not args.force:
                skipped += 1
                continue
            # Re-decode the binary.  Catch + log; continue with .txt-only
            # data if it fails (matches the live ingest path's behavior).
            idf_samples = None
            idf_intervals = None
            binary_md = None
            res = None  # reset per file so a previous file's decode result is never reused
            is_histogram = path.suffix.upper() == ".IDFH"
            try:
                binary_bytes = path.read_bytes()
                res = read_idf_file(path, data=binary_bytes)
                idf_samples = res.samples or None
                idf_intervals = res.intervals
                binary_md = res.binary_metadata
                is_histogram = res.intervals is not None
            except NotImplementedError:
                # sig-B / Blastware-stray binary; no samples but adapter
                # can still produce a bw_report from extensions.idf_report.
                log.debug("%s: binary codec NotImplementedError (sig-B / BW-stray); proceeding from sidecar's idf_report only", path.name)
            except Exception as exc:
                log.warning("%s: binary decode failed — %s; proceeding from sidecar's idf_report only", path.name, exc)
            # Run the adapter.  Pull report_dict from
            # extensions.idf_report (the v0.18.0+ ingest preserved it).
            report_dict = (existing.get("extensions") or {}).get("idf_report") or {}
            if not report_dict and binary_md is None:
                log.debug("%s: no idf_report in sidecar AND no binary metadata — nothing to project", path.name)
                skipped += 1
                continue
            try:
                bw_report = build_bw_report_from_idf(
                    report_dict, binary_md=binary_md,
                    intervals=idf_intervals, is_histogram=is_histogram,
                )
            except Exception as exc:
                log.warning("%s: adapter failed — %s", path.name, exc)
                errors += 1
                continue
            # Build the new sidecar by overlaying refreshed fields onto
            # the existing one — preserves review, captured_at, blastware
            # block, source.kind, etc.
            new_sidecar = dict(existing)  # shallow copy
            new_sidecar["bw_report"] = bw_report
            src = dict(new_sidecar.get("source") or {})
            src["tool_version"] = event_file_io.TOOL_VERSION
            new_sidecar["source"] = src
            # Preserve histogram intervals if the binary decoded them
            # (improves over the original ingest if that one ran before
            # the bee1185 codec fix).
            if idf_intervals is not None:
                ext = dict(new_sidecar.get("extensions") or {})
                ext["idf_intervals"] = [
                    {
                        "offset":     iv.offset,
                        "tran_peak":  iv.peak_count("Tran"),
                        "tran_halfp": iv.tran_halfp,
                        "tran_freq":  iv.freq_hz("Tran"),
                        "vert_peak":  iv.peak_count("Vert"),
                        "vert_halfp": iv.vert_halfp,
                        "vert_freq":  iv.freq_hz("Vert"),
                        "long_peak":  iv.peak_count("Long"),
                        "long_halfp": iv.long_halfp,
                        "long_freq":  iv.freq_hz("Long"),
                        "mic_peak":   iv.peak_count("MicL"),
                        "mic_halfp":  iv.micl_halfp,
                        "mic_freq":   iv.freq_hz("MicL"),
                    }
                    for iv in idf_intervals
                ]
                new_sidecar["extensions"] = ext
            if args.dry_run:
                will_write_h5 = (idf_samples or idf_intervals) and not args.skip_hdf5
                log.info("[DRY] %s/%s — would refresh sidecar (bw_report=%s, h5=%s)",
                         serial, path.name,
                         "added" if not has_bw_report else "refreshed",
                         "would write" if will_write_h5 else "skipped")
            else:
                event_file_io.write_sidecar(sidecar_path, new_sidecar)
                log.info("%s/%s — sidecar refreshed (bw_report=%s, intervals=%d)",
                         serial, path.name,
                         "added" if not has_bw_report else "refreshed",
                         len(idf_intervals) if idf_intervals else 0)
            refreshed += 1
            # Regenerate .h5 by replaying the same IdfEvent → Event bridge
            # save_imported_idf uses.  For IDFW we write the decoded per-
            # sample arrays.  For IDFH we synthesise a 1-sample-per-interval
            # array (peak ADC count per channel per interval) so the
            # renderer's bar-chart code has something to group on.
            # Pre-condition: either real samples (IDFW) or decoded intervals
            # (IDFH).  Skip otherwise.
            have_data = bool(idf_samples) or bool(idf_intervals)
            if have_data and not args.skip_hdf5:
                from sfm import event_hdf5
                hdf5_path = store.hdf5_path_for(serial, path.name)
                if args.dry_run:
                    log.debug("[DRY] would write %s", hdf5_path.name)
                else:
                    try:
                        from micromate import IdfEvent
                        from minimateplus.event_file_io import file_sha256
                        idf_event = IdfEvent.from_report(report_dict, path.name)
                        # Merge the binary-derived mic peak psi (only the
                        # binary path knows the proper psi value; the .txt
                        # carries dB(L)).  Without this, the h5 writer's
                        # per-count mic factor is computed against the
                        # dB(L) value-as-pseudo-psi and the mic chart
                        # scales wildly.
                        if (binary_md is not None and res is not None
                                and res.event.peaks.mic_pspl_psi is not None):
                            idf_event.peaks.mic_pspl_psi = res.event.peaks.mic_pspl_psi
                        sha256 = file_sha256(path)
                        waveform_key = bytes.fromhex(sha256)[:16]
                        ev = idf_event.to_minimateplus_event(waveform_key)
                        if is_histogram and idf_intervals:
                            # 1 sample per interval per channel — same
                            # synthesis save_imported_idf uses.  The h5
                            # writer's count×geo_fs/32768 conversion turns
                            # each peak-ADC-count into the bar's physical
                            # value.
                            ev.raw_samples = {
                                "Tran": [iv.peak_count("Tran") for iv in idf_intervals],
                                "Vert": [iv.peak_count("Vert") for iv in idf_intervals],
                                "Long": [iv.peak_count("Long") for iv in idf_intervals],
                                "MicL": [iv.peak_count("MicL") for iv in idf_intervals],
                            }
                            ev.total_samples = ev.total_samples or len(idf_intervals)
                        elif idf_samples:
                            ev.raw_samples = idf_samples
                            n_samp = max(
                                (len(idf_samples.get(ch, []))
                                 for ch in ("Tran", "Vert", "Long", "MicL")),
                                default=0,
                            )
                            ev.total_samples = ev.total_samples or n_samp
                        event_hdf5.write_event_hdf5(
                            hdf5_path, ev,
                            serial=serial,
                            geo_range="normal",
                            source_kind="idf-import",
                            tool_version=event_file_io.TOOL_VERSION,
                        )
                        h5_written += 1
                        log.debug("%s/%s — .h5 written (%s)",
                                  serial, path.name,
                                  f"{len(idf_intervals)} intervals" if is_histogram
                                  else f"{sum(len(v) for v in (idf_samples or {}).values())} samples")
                    except Exception as exc:
                        log.warning("%s/%s — .h5 write failed: %s",
                                    serial, path.name, exc)
    log.info("Done.  refreshed=%d  skipped=%d  errors=%d  h5_written=%d",
             refreshed, skipped, errors, h5_written)
    return 0 if errors == 0 else 2
 if __name__ == "__main__":
    sys.exit(main())
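The refresh decision in `main()` can be isolated as below. `vtuple` mirrors `_vtuple` (narrowed to catch `ValueError`), and `needs_refresh` is a hypothetical helper rather than something the script defines; the sidecar is modelled as a plain dict with the `bw_report` and `source.tool_version` keys the script reads.

```python
from typing import Any, Dict, Tuple


def vtuple(s: str) -> Tuple[int, ...]:
    """'0.21.1' -> (0, 21, 1); anything unparsable sorts as (0, 0, 0)."""
    try:
        return tuple(int(p) for p in str(s).split(".")[:3])
    except ValueError:
        return (0, 0, 0)


def needs_refresh(sidecar: Dict[str, Any], tool_version: str, force: bool = False) -> bool:
    """True when the sidecar lacks bw_report or was written by an older tool."""
    if force:
        return True
    has_bw_report = bool(sidecar.get("bw_report"))
    existing = (sidecar.get("source") or {}).get("tool_version", "")
    return not (has_bw_report and vtuple(existing) >= vtuple(tool_version))
```

Comparing integer tuples rather than strings matters: as strings `"0.9.0" > "0.21.1"`, but as tuples `(0, 9, 0) < (0, 21, 1)`. A pre-release tag such as `0.21.1rc1` fails to parse and sorts as `(0, 0, 0)`, so a sidecar stamped that way is always treated as stale.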
@@ -0,0 +1,91 @@
 """Re-ingest a prod IDFW + IDFH via the patched save_imported_idf and
 render both PDFs to confirm charts have data."""
 from __future__ import annotations
 import sys
 import datetime
 import tempfile
 from pathlib import Path
 sys.path.insert(0, str(Path(__file__).resolve().parents[1]))
 from sfm.waveform_store import WaveformStore
 from sfm import report_pdf
 import h5py
 class FakeDb:
    def __init__(self, event):
        self.event = event
    def get_event(self, _id):
        return self.event
 def to_ts_iso(ts):
    if ts is None:
        return None
    try:
        return datetime.datetime(ts.year, ts.month, ts.day, ts.hour, ts.minute, ts.second).isoformat()
    except Exception:
        return None
 def render_case(idf_path: Path, serial: str, out_pdf: Path, h5_summary: bool = True):
    with tempfile.TemporaryDirectory() as td:
        store = WaveformStore(Path(td))
        ev, rec = store.save_imported_idf(
            idf_path.read_bytes(),
            idf_path,
            idf_report_text=None,    # production worst case: no .txt
        )
        print(f"=== {idf_path.name} ===")
        print(f"  h5: {rec['hdf5_filename']}, sidecar: {rec['sidecar_filename']}")
        h5p = store.hdf5_path_for(serial, idf_path.name)
        if h5p.exists() and h5_summary:
            with h5py.File(h5p, "r") as h:
                for ch in ("Tran", "Vert", "Long", "MicL"):
                    ds = h.get(f"samples/{ch}")
                    if ds is not None:
                        n = ds.shape[0]
                        mx = float(abs(ds[...]).max()) if n else 0
                        print(f"  samples/{ch}: n={n}  max_abs={mx:.5f}")
        record_type = "Histogram" if idf_path.suffix.upper() == ".IDFH" else "Waveform"
        fake_row = {
            "serial":              serial,
            "blastware_filename":  rec["filename"],
            "record_type":         record_type,
            "timestamp":           to_ts_iso(ev.timestamp),
            "sample_rate":         ev.sample_rate,
            "project":             ev.project_info.project if ev.project_info else None,
            "client":              ev.project_info.client if ev.project_info else None,
            "operator":            ev.project_info.operator if ev.project_info else None,
            "sensor_location":     ev.project_info.sensor_location if ev.project_info else None,
            "created_at":          None,
        }
        rd = report_pdf.gather_report_data(FakeDb(fake_row), store, event_id="test-1")
        print(f"  ReportData: channels={ {k: len(v) for k,v in rd.channels.items()} }")
        if rd.is_histogram:
            print(f"  histogram n_intervals={rd.histogram_n_intervals} interval_size={rd.histogram_interval_size}")
        pdf = report_pdf.render_event_report_pdf(rd)
        out_pdf.write_bytes(pdf)
        print(f"  PDF: {out_pdf}  ({len(pdf)} bytes)")
 def main():
    out_dir = Path("/tmp/thor_render_test")
    out_dir.mkdir(parents=True, exist_ok=True)
    cases = [
        # IDFW that decoded to preamble-only under the old codec
        ("/home/serversdown/seismo-relay-prod-snap/waveforms/UM6047/UM6047_20250804154137.IDFW", "UM6047"),
        # IDFW that worked under the old codec (validates no regression)
        ("/home/serversdown/seismo-relay-prod-snap/waveforms/UM6047/UM6047_20250804104450.IDFW", "UM6047"),
        # IDFH histogram
        ("/home/serversdown/seismo-relay-prod-snap/waveforms/UM6047/UM6047_20250804190047.IDFH", "UM6047"),
    ]
    for path, serial in cases:
        render_case(Path(path), serial, out_dir / f"{Path(path).name}.pdf")
 if __name__ == "__main__":
    main()
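The IDFH branch of the backfill script turns each decoded interval into one "sample" per channel (its peak ADC count), and its comments describe the h5 writer as scaling counts by `geo_fs/32768`. The sketch below reproduces both steps with plain dicts standing in for `IdfhInterval.peak_count(...)`. `synthesize_interval_samples` and `counts_to_ips` are hypothetical helpers, reading `geo_fs` as the geophone full-scale range is an assumption, and the 10.0 in/s full scale in the checks is an arbitrary example value, not one taken from seismo-relay or the device.

```python
from typing import Dict, List, Mapping, Sequence

CHANNELS = ("Tran", "Vert", "Long", "MicL")


def synthesize_interval_samples(intervals: Sequence[Mapping[str, int]]) -> Dict[str, List[int]]:
    """One 'sample' per interval per channel: that interval's peak ADC count.

    Each mapping stands in for IdfhInterval.peak_count(channel).
    """
    return {ch: [iv[ch] for iv in intervals] for ch in CHANNELS}


def counts_to_ips(counts: Sequence[int], geo_full_scale_ips: float) -> List[float]:
    """count * full_scale / 32768, reading geo_fs as the geophone full-scale range."""
    return [c * geo_full_scale_ips / 32768.0 for c in counts]
```

Each channel list has exactly one entry per interval, which is what lets the renderer draw one bar per interval from the .h5 data.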
@@ -638,14 +638,7 @@ def _draw_channel_stats_waveform(ax, rd: ReportData) -> None:
        ("Sensor Check",         "sensor_check",   ""),
    ]
    _draw_stats_table(ax, rd, rows_spec)
-    if rd.peak_vector_sum_ips is not None:
-        line = f"Peak Vector Sum   {rd.peak_vector_sum_ips:.3f} in/s"
-        if rd.peak_vector_sum_time_s is not None:
-            line += f" At {rd.peak_vector_sum_time_s:.3f} sec."
-        ax.text(0.0, -0.08, line, fontsize=9, weight="bold",
-                ha="left", va="top", transform=ax.transAxes)
-        ax.text(0.0, -0.18, "NA: Not Applicable", fontsize=7, color="#888",
-                ha="left", va="top", transform=ax.transAxes)
+    _draw_pvs_summary(ax, rd, n_data_rows=len(rows_spec))
 def _draw_channel_stats_histogram(ax, rd: ReportData) -> None:
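`_draw_pvs_summary` formats the PVS caption differently for waveforms (relative time in seconds) and histograms (absolute date and time). That formatting can be separated from the matplotlib call as a pure function. `pvs_summary_line` is a hypothetical name, and the input strings are assumed to follow the `HH:MM:SS Month DD, YYYY` shape the code comments describe.

```python
from typing import Optional


def pvs_summary_line(ips: Optional[float], *, time_s: Optional[float] = None,
                     when_str: Optional[str] = None,
                     histogram: bool = False) -> Optional[str]:
    """Build the Peak Vector Sum caption, or None when there is no PVS value."""
    if ips is None:
        return None
    line = f"Peak Vector Sum   {ips:.3f} in/s"
    if histogram and when_str:
        # when_str is "HH:MM:SS Month DD, YYYY"; BW prints "on <date> At <time>".
        parts = when_str.split(" ", 1)
        if len(parts) == 2:
            line += f" on {parts[1]} At {parts[0]}"
        else:
            line += f" on {when_str}"
    elif not histogram and time_s is not None:
        # Waveforms: time of the PVS peak relative to the trigger, in seconds.
        line += f" At {time_s:.3f} sec."
    return line
```

For the example in the original histogram comment this produces `Peak Vector Sum   0.091 in/s on May 27, 2026 At 06:06:14`. Keeping the string logic apart from `ax.text` makes it testable without building a figure.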
@@ -663,20 +656,54 @@ def _draw_channel_stats_histogram(ax, rd: ReportData) -> None:
        ("Sensor Check", "sensor_check",    ""),
    ]
    _draw_stats_table(ax, rd, rows_spec)
-    if rd.peak_vector_sum_ips is not None:
-        line = f"Peak Vector Sum   {rd.peak_vector_sum_ips:.3f} in/s"
-        # Histograms: "0.091 in/s on May 27, 2026 At 06:06:14"
-        # The when_str is "HH:MM:SS Month DD, YYYY" — reformat for BW match.
-        if rd.peak_vector_sum_when_str:
-            parts = rd.peak_vector_sum_when_str.split(" ", 1)
-            if len(parts) == 2:
-                line += f" on {parts[1]} At {parts[0]}"
-            else:
-                line += f" on {rd.peak_vector_sum_when_str}"
-        ax.text(0.0, -0.08, line, fontsize=9, weight="bold",
-                ha="left", va="top", transform=ax.transAxes)
-        ax.text(0.0, -0.18, "NA: Not Applicable", fontsize=7, color="#888",
-                ha="left", va="top", transform=ax.transAxes)
+    _draw_pvs_summary(ax, rd, n_data_rows=len(rows_spec), histogram_when=True)
+
+
+def _draw_pvs_summary(
+    ax,
+    rd: ReportData,
+    *,
+    n_data_rows: int,
+    histogram_when: bool = False,
+) -> None:
+    """Render the Peak Vector Sum line below the stats table.
+
+    Reads ``ax._stats_table_bottom`` (set by ``_draw_stats_table`` when
    it pins the table via an explicit ``bbox``) so the PVS line lands
    just below the table's known bottom edge instead of guessing at the
    geometry.
    Centered horizontally for visual balance (the previous left-aligned
    x=0 landed under the label column, not the data, which looked off).
    """
    if rd.peak_vector_sum_ips is None:
        return
    line = f"Peak Vector Sum   {rd.peak_vector_sum_ips:.3f} in/s"
    if histogram_when and rd.peak_vector_sum_when_str:
        # Histogram absolute date+time.  when_str is "HH:MM:SS Month DD, YYYY";
        # reformat to "<value> on <date> At <time>" to match BW.
        parts = rd.peak_vector_sum_when_str.split(" ", 1)
        if len(parts) == 2:
            line += f" on {parts[1]} At {parts[0]}"
        else:
            line += f" on {rd.peak_vector_sum_when_str}"
    elif not histogram_when and rd.peak_vector_sum_time_s is not None:
        line += f" At {rd.peak_vector_sum_time_s:.3f} sec."
    # _draw_stats_table stashes the bbox bottom on the axes so we don't
    # have to guess geometry.  Falls back to a conservative default if
    # the bbox approach hasn't run.
    table_bottom_y = getattr(ax, "_stats_table_bottom", -0.10)
    pvs_y = table_bottom_y - 0.04   # small gap below the table border
    # Centered for visual balance — looks intentional rather than offset.
    # The original BW-replica had a "NA: Not Applicable" caption below
    # this line; dropped because we use "—" for missing values and the
    # legend was always squished against the PVS line.
    ax.text(0.5, pvs_y, line, fontsize=9, weight="bold",
            ha="center", va="top", transform=ax.transAxes)
 def _draw_stats_table(ax, rd: ReportData, rows_spec: list[tuple[str, str, str]]) -> None:
@@ -711,16 +738,28 @@ def _draw_stats_table(ax, rd: ReportData, rows_spec: list[tuple[str, str, str]])
            _cell(field_name, "Long"),
            unit,
        ])
    # Pin the table's position+size via bbox so we know exactly where
    # the bottom edge lands.  Lets _draw_pvs_summary place the PVS line
    # just below the table without guessing at row heights.
    #
    # bbox = [x, y, width, height] in axes coords.  Header + data rows
    # at row_h each; horizontal extent matches sum(colWidths).
    n_rows = len(table_data)        # header + data rows
    row_h  = 0.12                   # axes-fraction per row (fits fontsize=8)
    table_height = n_rows * row_h
    table_bottom = 1.0 - table_height
    tbl = ax.table(
-        cellText=table_data, loc="upper left",
+        cellText=table_data,
        colWidths=[0.28, 0.14, 0.14, 0.14, 0.10],
        cellLoc="left", edges="open",
        bbox=[0.0, table_bottom, 0.80, table_height],
    )
    tbl.auto_set_font_size(False)
    tbl.set_fontsize(8)
    tbl.scale(1, 1.4)
    for j in range(5):
        tbl[(0, j)].set_text_props(weight="bold", color="#555")
    # Stash the bottom Y so _draw_pvs_summary can position itself below.
    ax._stats_table_bottom = table_bottom
 def _channel_axis_color(ch: str) -> str:
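The bbox arithmetic in `_draw_stats_table` and the gap in `_draw_pvs_summary` reduce to a few lines. The sketch below computes where the table and the PVS line land for a given row count. `stats_table_geometry` is a hypothetical helper name, and its constants are copied from the diff. One related detail, based on how matplotlib's `Table` handles an explicit `bbox`: the cells are rescaled at draw time to fill that rectangle, so the earlier `tbl.scale(1, 1.4)` call should no longer change the final extent.

```python
ROW_H = 0.12    # axes-fraction per table row (header included), as in the diff
PVS_GAP = 0.04  # gap between the table's bottom edge and the PVS line
COL_WIDTHS = [0.28, 0.14, 0.14, 0.14, 0.10]


def stats_table_geometry(n_rows: int) -> tuple[list[float], float]:
    """Return (bbox, pvs_y) in axes coordinates for n_rows rows, header included.

    bbox is [x, y, width, height] as passed to ax.table(bbox=...); the table
    hangs from the top of the axes (y = 1.0) and grows downward.
    """
    table_height = n_rows * ROW_H
    table_bottom = 1.0 - table_height
    bbox = [0.0, table_bottom, sum(COL_WIDTHS), table_height]
    pvs_y = table_bottom - PVS_GAP  # PVS text uses va="top", so it hangs below this y
    return bbox, pvs_y
```

For a waveform event with 6 data rows (7 rows including the header), the table bottom sits at y = 0.16 and the PVS line at y = 0.12 in axes coordinates.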
@@ -3287,7 +3287,7 @@ if (currentSection === 'db') {
          <dt id="sc-l-bwsize">File size</dt>   <dd id="sc-f-bwsize">—</dd>
          <dt id="sc-l-sha">File sha256</dt>    <dd id="sc-f-sha">—</dd>
          <dt>Source kind</dt>      <dd id="sc-f-src">—</dd>
-          <dt title="When our server received and stored this event (sfm-db insert time, not the recording time)">Received by server at</dt>
+          <dt title="When SFM received and stored this event — NOT the unit-local trigger time (see Timestamp at the top of the modal for that).">Time received</dt>
                                    <dd id="sc-f-cap">—</dd>
        </dl>
      </div>
@@ -467,21 +467,21 @@ class WaveformStore:
        Ingest a Thor (Micromate Series IV) IDF event file (`.IDFW` or
        `.IDFH`) produced by Thor's TXT exporter.
        Thor binaries are stored as opaque bytes — seismo-relay doesn't
        yet decode the proprietary IDF binary format (codec slot lives
        at ``micromate/idf_file.py``).  Device-authoritative metadata
        comes from the paired ``.IDFW.txt`` / ``.IDFH.txt`` sidecar
        when supplied.
        Workflow:
-          1. Parse the paired TXT report (when supplied) via
-             ``micromate.parse_idf_report`` → dict.
-          2. Wrap parsed dict + filename into a typed ``micromate.IdfEvent``.
-          3. Copy bytes verbatim into ``<root>/<serial>/<filename>``.
-          4. Bridge IdfEvent → ``minimateplus.Event`` (for the existing
-             sidecar / DB insert machinery) via
-             ``IdfEvent.to_minimateplus_event(waveform_key)``.
-          5. Write the ``.sfm.json`` sidecar with
+          1. For sig-A `.IDFW` binaries, decode samples + binary metadata
+             via ``micromate.idf_file.read_idf_file()``.  Failure or
+             non-IDFW path falls through to the .txt-only flow.
+          2. Parse the paired TXT report (when supplied) via
+             ``micromate.parse_idf_report`` → dict.  TXT remains the
+             source of truth for fields the binary doesn't yet supply
+             (full peak set with ZC freq / Time of Peak, sensor self-check,
+             firmware string, project strings).
          3. Wrap parsed dict + filename into a typed ``micromate.IdfEvent``.
          4. Copy bytes verbatim into ``<root>/<serial>/<filename>``.
          5. Bridge IdfEvent → ``minimateplus.Event`` and attach
             ``raw_samples`` from the binary decoder (when available).
          6. Write the `.h5` clean-waveform file when samples decoded.
          7. Write the ``.sfm.json`` sidecar with
             ``source.kind = "idf-import"`` and the full raw IDF report
             under ``extensions.idf_report``.
@@ -490,7 +490,38 @@ class WaveformStore:
        """
        from micromate import IdfEvent, parse_idf_report
-        # Parse the .txt sidecar (best-effort; non-fatal on failure).
+        # 1. Binary decode (sig-A IDFW and IDFH).  Non-fatal: any failure
        # leaves samples / binary metadata unfilled and we proceed with
        # the .txt path as before.
        idf_samples: Optional[dict] = None
        idf_intervals: Optional[list] = None
        binary_md = None
        binary_peaks = None
        is_histogram = False
        try:
            from micromate.idf_file import read_idf_file
            # Pass idf_bytes through `data=` — at this point in the flow
            # the binary hasn't been written to disk yet, so the codec
            # can't read from source_path.  We still pass source_path so
            # the codec has the filename for error messages + .IDFH
            # suffix detection.
            res = read_idf_file(source_path, data=idf_bytes)
            idf_samples = res.samples or None
            idf_intervals = res.intervals
            is_histogram = res.intervals is not None
            binary_md = res.binary_metadata
            binary_peaks = res.event.peaks
        except NotImplementedError:
            # sig-B — codec doesn't handle this yet.
            pass
        except Exception as exc:
            log.warning(
                "save_imported_idf: binary codec failed for %s: %s — "
                "falling back to .txt-only ingest",
                source_path.name, exc,
            )
        # 2. Parse the .txt sidecar (best-effort; non-fatal on failure).
        report_dict: dict = {}
        if idf_report_text is not None:
            try:
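Per commit `bee118506b`, `read_idf_file()` gained a `data: Optional[bytes]` kwarg, so ingest can decode before the binary is written to disk. `path` stays required for the `.IDFH`/`.IDFW` suffix check and for error messages. Below is a minimal sketch of that contract. It assumes nothing about the real codec beyond what the commit message states, and `load_idf_bytes` is a hypothetical name.

```python
from pathlib import Path
from typing import Optional, Union


def load_idf_bytes(path: Union[str, Path], data: Optional[bytes] = None) -> tuple[bytes, bool]:
    """Return (raw_bytes, is_histogram) for an IDF file.

    `path` is always required: it drives the .IDFH/.IDFW suffix check and would
    name the file in error messages.  The disk read happens only when `data`
    is None, so in-memory uploads decode before anything is written.
    """
    p = Path(path)
    is_histogram = p.suffix.upper() == ".IDFH"
    raw = data if data is not None else p.read_bytes()
    return raw, is_histogram
```

When `data` is supplied, the bare filename never touches the filesystem. That is what removes the `FileNotFoundError` described in that commit.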
@@ -501,17 +532,58 @@ class WaveformStore:
                    exc,
                )
-        # Build the typed IdfEvent.  Filename is authoritative for
+        # 3. Backfill report_dict with binary metadata for fields the
        # .txt didn't supply.  Binary takes precedence on tied fields
        # where the binary is more reliable (timestamp, sample_rate),
        # and fills in fields entirely missing from the .txt.
        if binary_md is not None:
            if binary_md.serial and not report_dict.get("serial_number"):
                report_dict["serial_number"] = binary_md.serial
            if binary_md.event_datetime and not report_dict.get("event_datetime"):
                report_dict["event_datetime"] = binary_md.event_datetime
            if binary_md.sample_rate and not report_dict.get("sample_rate"):
                report_dict["sample_rate"] = binary_md.sample_rate
            if binary_md.record_time_sec and not report_dict.get("record_time_sec"):
                report_dict["record_time_sec"] = binary_md.record_time_sec
            # Calibration date (binary) vs calibration text (.txt) cohabit
            # under different keys; no overwrite needed.
            if binary_md.event_datetime and not report_dict.get("event_type"):
                report_dict["event_type"] = (
                    "Full Histogram" if is_histogram else "Full Waveform"
                )
        # Binary-derived peaks fill in when the .txt didn't supply them.
        # They're ~3% low vs the device-authoritative .txt values (residual
        # codec drift), so .txt always wins when present.
        if binary_peaks is not None:
            if binary_peaks.transverse_ips and not report_dict.get("tran_ppv"):
                report_dict["tran_ppv"] = binary_peaks.transverse_ips
            if binary_peaks.vertical_ips and not report_dict.get("vert_ppv"):
                report_dict["vert_ppv"] = binary_peaks.vertical_ips
            if binary_peaks.longitudinal_ips and not report_dict.get("long_ppv"):
                report_dict["long_ppv"] = binary_peaks.longitudinal_ips
        # 4. Build the typed IdfEvent.  Filename is authoritative for
        # (serial, timestamp, kind); the report's event_datetime takes
        # precedence over the filename timestamp inside from_report().
        idf_event = IdfEvent.from_report(report_dict, source_path.name)
        # The binary mic peak (psi) isn't carried through from_report() —
        # IdfReport.from_dict only sees the .txt's dB(L) value.  Pull the
        # binary-derived ``mic_pspl_psi`` onto the typed IdfEvent so the
        # downstream bridge can populate ``PeakValues.micl`` (psi-shaped)
        # and the h5 writer's per-count mic factor lands at a sensible
        # value.  Without this, the h5 mic chart auto-scales against the
        # dB(L) value-as-pseudo-psi and renders ~flat.
        if binary_peaks is not None and binary_peaks.mic_pspl_psi is not None:
            idf_event.peaks.mic_pspl_psi = binary_peaks.mic_pspl_psi
        # Operator-supplied serial_hint wins over the binary's filename
        # prefix when both are present (e.g. callers passing a known-good
        # serial that overrides a misnamed export).
        serial = serial_hint or idf_event.serial or "UNKNOWN"
-        # Filesystem write.
+        # 5. Filesystem write of binary bytes.
        filename = source_path.name
        bw_path = self._serial_dir(serial) / filename
        bw_path.write_bytes(idf_bytes)
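Both backfill passes in step 3 follow one rule: a binary value is copied only when it is truthy and the .txt left the key empty. The sketch below expresses that rule as a single helper. `fill_missing` is a hypothetical name, and plain dicts stand in for `binary_md` / `binary_peaks`. As written, every guard checks `not report_dict.get(key)` first, so a value the .txt already supplies is never overwritten, including timestamp and sample rate. A zero from the binary is also never copied.

```python
from typing import Any, Mapping


def fill_missing(report: dict[str, Any], fallback: Mapping[str, Any]) -> dict[str, Any]:
    """Return a copy of `report` with gaps filled from `fallback`.

    Same guard as `if value and not report_dict.get(key)`: a key is filled only
    when the report has no truthy value for it AND the fallback value is truthy.
    """
    merged = dict(report)  # leave the caller's dict untouched
    for key, value in fallback.items():
        if value and not merged.get(key):
            merged[key] = value
    return merged
```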
@@ -523,13 +595,59 @@ class WaveformStore:
        # surrogate — every distinct binary maps to a distinct row.
        waveform_key = bytes.fromhex(sha256)[:16]
-        # Bridge to minimateplus.Event for the existing sidecar / DB
+        # 6. Bridge to minimateplus.Event for the existing sidecar / DB
        # insert paths.  See IdfEvent.to_minimateplus_event() for the
        # caveats of this bridge (mic units, missing fields → sidecar).
        ev = idf_event.to_minimateplus_event(waveform_key)
-        # Write the sidecar.  Source kind "idf-import" was added to the
-        # allow-list in event_file_io.event_to_sidecar_dict for this.
+        # Attach the decoded sample arrays.  Thor's decoder counts use
+        # LSB = 0.0003 in/s for geo (vs BW's 16-count units at 0.005 in/s)
        # — the .h5 writer's geo_range="normal" yields LSB = 10/32768
        # ≈ 0.000305 in/s, so plotted samples come out ~1.7% high.
        # Acceptable known offset; refine with a Thor-aware h5 path later.
        if idf_samples is not None:
            ev.raw_samples = idf_samples
            n_samples = max((len(idf_samples.get(ch, [])) for ch in ("Tran", "Vert", "Long", "MicL")), default=0)
            ev.total_samples = ev.total_samples or n_samples
        # For IDFH histograms there are no per-sample waveform arrays — the
        # device stores one peak ADC count per interval per channel.  Synthesise
        # a 1-sample-per-interval array so the existing h5+renderer pipeline
        # (which groups samples down to ``n_intervals`` bars via max-per-group)
        # produces a non-blank histogram chart.  Each "sample" is the peak ADC
        # count for that interval, so the h5 writer's ``count × geo_fs/32768``
        # conversion yields the right physical value for the bar height.
        if is_histogram and idf_intervals:
            hist_samples = {
                "Tran": [iv.peak_count("Tran") for iv in idf_intervals],
                "Vert": [iv.peak_count("Vert") for iv in idf_intervals],
                "Long": [iv.peak_count("Long") for iv in idf_intervals],
                "MicL": [iv.peak_count("MicL") for iv in idf_intervals],
            }
            ev.raw_samples = hist_samples
            ev.total_samples = ev.total_samples or len(idf_intervals)
        # 7. Write the .h5 clean-waveform file when we have samples to write
        # (either the IDFW per-sample stream, or the IDFH synthesised per-
        # interval peak array).  The renderer treats both shapes the same way.
        hdf5_filename: Optional[str] = None
        if ev.raw_samples:
            hdf5_path = self.hdf5_path_for(serial, filename)
            try:
                event_hdf5.write_event_hdf5(
                    hdf5_path, ev,
                    serial=serial,
                    geo_range="normal",   # Thor's geo full scale is also 10 in/s (Normal)
                    source_kind="idf-import",
                )
                hdf5_filename = hdf5_path.name
            except Exception as exc:
                log.warning(
                    "save_imported_idf: HDF5 write failed for %s: %s — continuing without .h5",
                    hdf5_path, exc,
                )
        # 8. Write the sidecar.  Source kind "idf-import" is on the allow-list.
        sidecar_path = self.sidecar_path_for(serial, filename)
        existing_review = None
        if sidecar_path.exists():
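The IDFH synthesis and the unit comments in this hunk come down to two small operations. The first takes one peak count per interval per channel. The second applies `count × full_scale / 32768` to get in/s. This is a sketch under stated assumptions. `Interval` is a hypothetical stand-in for the decoder's interval records, and only `peak_count()` is modeled. The 10 in/s full scale is the Normal geo range named in the `write_event_hdf5` call.

```python
from dataclasses import dataclass, field

CHANNELS = ("Tran", "Vert", "Long", "MicL")
FULL_SCALE_COUNTS = 32768  # divisor from the diff's count x geo_fs/32768 conversion


@dataclass
class Interval:
    """Hypothetical stand-in for one decoded IDFH interval record."""
    peaks: dict[str, int] = field(default_factory=dict)

    def peak_count(self, ch: str) -> int:
        return self.peaks.get(ch, 0)  # assumption: missing channel reads as 0


def synthesise_histogram_samples(intervals: list[Interval]) -> dict[str, list[int]]:
    """One 'sample' per interval per channel: that interval's peak ADC count."""
    return {ch: [iv.peak_count(ch) for iv in intervals] for ch in CHANNELS}


def counts_to_ips(count: int, full_scale_ips: float = 10.0) -> float:
    """count x full_scale / 32768; 10 in/s assumes the Normal geo range."""
    return count * full_scale_ips / FULL_SCALE_COUNTS
```

`counts_to_ips(1) / 0.0003` evaluates to about 1.017. That is the ~1.7% high offset the comment accepts for Thor's 0.0003 in/s LSB.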
@@ -554,19 +672,67 @@ class WaveformStore:
        # Time of Peak, sensor self-check, calibration, firmware).
        if report_dict:
            sidecar["extensions"]["idf_report"] = report_dict
        # Project the IDF report into the BW report sidecar shape so the
        # existing Event Report PDF pipeline (sfm/report_pdf.py) can
        # render Thor events without needing a separate code path.  Thor
        # data is 95% the same metric set as BW — the adapter handles
        # the field-name mapping.
        if report_dict or binary_md is not None:
            try:
                from micromate.idf_to_bw_report import build_bw_report_from_idf
                sidecar["bw_report"] = build_bw_report_from_idf(
                    report_dict or {},
                    binary_md=binary_md,
                    intervals=idf_intervals,
                    is_histogram=is_histogram,
                )
            except Exception as exc:
                log.warning(
                    "save_imported_idf: idf→bw_report adapter failed for %s: %s — "
                    "report PDF will fall back to DB-only fields",
                    filename, exc,
                )
        # For histograms, also stash the binary-decoded per-interval
        # records so the UI / report layer doesn't need to re-walk the
        # IDFH file at render time.
        if idf_intervals is not None:
            sidecar["extensions"]["idf_intervals"] = [
                {
                    "offset":     iv.offset,
                    "tran_peak":  iv.peak_count("Tran"),
                    "tran_halfp": iv.tran_halfp,
                    "tran_freq":  iv.freq_hz("Tran"),
                    "vert_peak":  iv.peak_count("Vert"),
                    "vert_halfp": iv.vert_halfp,
                    "vert_freq":  iv.freq_hz("Vert"),
                    "long_peak":  iv.peak_count("Long"),
                    "long_halfp": iv.long_halfp,
                    "long_freq":  iv.freq_hz("Long"),
                    "mic_peak":   iv.peak_count("MicL"),
                    "mic_halfp":  iv.micl_halfp,
                    "mic_freq":   iv.freq_hz("MicL"),
                }
                for iv in idf_intervals
            ]
        event_file_io.write_sidecar(sidecar_path, sidecar)
        log.info(
            "WaveformStore.save_imported_idf serial=%s filename=%s filesize=%d "
-            "report_attached=%s",
-            serial, filename, filesize, bool(report_dict),
+            "kind=%s report_attached=%s binary_decoded=%s h5=%s intervals=%d",
+            serial, filename, filesize,
            "histogram" if is_histogram else "waveform",
            bool(report_dict),
            (idf_samples is not None) or (idf_intervals is not None),
            hdf5_filename or "(skipped)",
            len(idf_intervals) if idf_intervals else 0,
        )
        return ev, {
            "filename":           filename,
            "filesize":           filesize,
            "sha256":             sha256,
            "a5_pickle_filename": None,
-            "hdf5_filename":      None,
+            "hdf5_filename":      hdf5_filename,
            "sidecar_filename":   sidecar_path.name,
            "serial":             serial,
        }
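The histogram comment relies on the renderer collapsing the sample array to `n_intervals` bars by max-per-group. The renderer itself is not in this diff, so the sketch below is an assumption about that step: `group_max` is a hypothetical helper that uses even, contiguous groups. With exactly one sample per interval, each group holds a single value, so the bar equals that interval's peak. In this sketch, an empty array yields all-zero bars.

```python
def group_max(samples: list[float], n_bars: int) -> list[float]:
    """Collapse `samples` into `n_bars` contiguous groups, keeping each group's max.

    Groups are split as evenly as integer division allows; an empty group
    (fewer samples than bars) becomes a 0.0 bar.
    """
    if n_bars <= 0:
        raise ValueError("n_bars must be positive")
    n = len(samples)
    bars: list[float] = []
    for i in range(n_bars):
        lo = i * n // n_bars
        hi = (i + 1) * n // n_bars
        group = samples[lo:hi]
        bars.append(max(group) if group else 0.0)
    return bars
```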
Author	SHA1	Message	Date
serversdown	d0b66368d5	Merge pull request 'update to v0.21.1, thor data import successful' (#29 ) from dev into main Reviewed-on: #29	2026-06-01 16:54:23 -04:00
serversdown	25386cab8b	fix(backfill): regenerate IDFH .h5 + merge binary mic_pspl_psi onto bridge Two gaps in backfill_thor_events.py that left old Thor events showing stale charts after a v0.21.1 backfill pass: 1. IDFH events were skipped from .h5 regeneration (the "have decoded samples" gate was IDFW-only). Histograms kept their pre-v0.21.1 .h5 — written from raw_samples = None, which the renderer turned into a near-empty bar chart, or for older events the dB(L)-as-pseudo- psi mic scale that produced "107.7 psi" peaks (atomic-bomb level instead of footstep level). Fix: synthesise the same 1-sample-per- interval array save_imported_idf v0.21.1 uses (peak ADC count per channel per interval) so the renderer's bar-chart grouping has data to work with. 2. The IDFW h5 path didn't merge binary_peaks.mic_pspl_psi onto the IdfEvent before to_minimateplus_event(). The live save_imported_idf does this merge — without it, IdfEvent.from_report() only sees the .txt's dB(L) value, the bridge falls back to the dBL→psi formula (instead of the binary-accurate 2.14e-6 psi/count value), and the h5 writer's per-count mic factor lands on a less-correct value. Fix: same merge the live ingest does (lift res.event.peaks.mic_pspl_psi onto idf_event.peaks before the bridge call). Verified against UM6047_20250804190047.IDFH (250-interval prod histogram): 250 intervals decode, mic_pspl_psi = 2.78e-5 (was being treated as dB(L)=107.7 in the old h5). Operator: re-run after deploy. `docker compose exec sfm python scripts/backfill_thor_events.py` is idempotent — the existing version check still skips events already at the new TOOL_VERSION, and review state + captured_at are preserved on the second pass.	2026-06-01 20:02:54 +00:00
serversdown	6cb619ecc4	version bump - 0.21.1	2026-06-01 19:33:44 +00:00
serversdown	1ed86244d0	fix(thor-events): add parallel field for mic psi. Now shows mic in dbl and psi. (psi for charts)	2026-06-01 18:27:24 +00:00
serversdown	b2c565f217	fix(idf_waveforms): _find_waveform_body_offset() — scans every 00 02 00 magic past offset 0x0E00, runs decode_waveform_v2 on each candidate, picks the one that returns the most samples. Validated on 483 prod IDFW files: 0 preamble-only events (was ~50%), 355/483 fully decode, 126/483 partial (BW codec walker-stops-early on loud events — known issue). IDFH now synthesises a 1-sample-per-interval array from the binary intervals and writes an .h5 so the existing renderer works unchanged. Each "sample" is the per-interval peak ADC count → h5_value = count × geo_fs/32768 yields the right bar height.	2026-05-31 20:51:09 +00:00
serversdown	43f440812a	scripts: add backfill_thor_events.py Refreshes the bw_report sidecar block + .h5 waveform files for Thor events ingested before the v0.21.0 adapter wiring + the `bee1185` codec fix. Those events landed with extensions.idf_report only (no bw_report, no .h5 for IDFW) — symptom on the UI side: the modal chart 404'd on /waveform.json and the PDF rendered from DB-only fields without sensor self-check, full per-channel breakdown, or mic dB(L). Walks <store>/<serial>/<filename>: - Reads the existing sidecar (preserves review state + captured_at) - Re-runs read_idf_file() on the binary bytes (passes data= kwarg so codec doesn't try the broken bare-path Path.read_bytes) - Reads extensions.idf_report from the existing sidecar - Runs build_bw_report_from_idf adapter - Writes refreshed sidecar with bw_report + bumped tool_version, preserving review block and original captured_at - For IDFW: regenerates .h5 by bridging IdfEvent.from_report -> to_minimateplus_event -> write_event_hdf5 (mirrors save_imported_idf steps 4-7) - IDFH events skip .h5 (histograms have no per-sample data) Skips events already at current TOOL_VERSION with bw_report present. --force overrides. --skip-hdf5 limits to sidecar-only refresh. --dry-run for preview. Validated against the prod-snap waveform store: 3,815 Thor sidecars refreshed cleanly with 0 errors, 462 IDFW .h5 files written, 2 skipped (binaries with no sidecar — backfill doesn't conjure events from nothing). Verified one originally-broken IDFW event now serves waveform.json (200, 168KB) and a fully populated PDF (119KB vs the previous 56KB sparse output). Operator workflow on prod: docker exec <sfm-container> python3 /app/scripts/backfill_thor_events.py --dry-run # Inspect counts, then for real: docker exec <sfm-container> python3 /app/scripts/backfill_thor_events.py Idempotent — re-running it is a no-op once everything's at the current TOOL_VERSION. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-30 04:37:43 +00:00
serversdown	23e83908c2	report_pdf: fix PVS overlapping stats table, drop NA caption Two related fixes to the per-channel stats block: 1. Pin the stats table's position via an explicit bbox= on ax.table() so the bottom edge is at a known axes-fraction Y. The previous loc="upper left" + tbl.scale(1, 1.4) combo let matplotlib choose row heights based on text size, which made the table extend further below the axes than the hard-coded PVS line at y=-0.08 expected. Result was the "Peak Vector Sum X in/s" string landing horizontally inside the Peak Displacement row. With bbox=[0, 1-N*0.12, 0.80, N*0.12] the table is pinned to a precise rectangle (12% axes-fraction per row × N rows tall). _draw_stats_table now stashes the bottom Y on the axes for the PVS helper to reference, so the geometry stays in sync. 2. Center PVS horizontally (ha="center" at x=0.5 instead of ha="left" at x=0). The previous left-edge alignment put PVS at the same X as the label column, which read as "off-center" once the rest of the stats data was column-aligned further right. 3. Drop the "NA: Not Applicable" caption. It existed to explain "—" placeholder cells, but "—" is universally understood and the caption was always visually squished against the PVS line below. Less cruft on the page; one fewer position to manage. Verified against a real BE12599 histogram event (5 data rows) and a real UM12947 IDFW waveform event (6 data rows) — both layouts clear the table cleanly with no overlap. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-29 22:17:43 +00:00
serversdown	bee118506b	fix(idf): decode from in-memory bytes during ingest Bug shipped in v0.21.0: save_imported_idf called read_idf_file() with `source_path` (a bare filename like "UM12947_….IDFW") BEFORE writing the binary to disk. The codec did Path(path).read_bytes() which resolved relative to /app and hit FileNotFoundError. The error was caught + logged as a warning, and ingest fell back to .txt-only — events still landed in the DB but lost the bw_report block + .h5 waveform that the codec was supposed to produce. Observed during a full re-forward from thor-watcher on 2026-05-29: every Thor event logged "binary codec failed for X: [Errno 2] No such file or directory" and got binary_decoded=False. Fix: - read_idf_file() gains a `data: Optional[bytes]` kwarg. When supplied, skips the disk read and decodes the provided bytes directly. `path` stays required (used for filename in error messages + .IDFH vs .IDFW suffix detection); only the read is conditional. Backward compatible — existing positional callers (CLI scripts, tests) continue to work unchanged. - save_imported_idf passes `data=idf_bytes` since the bytes are already in memory from the multipart upload. Filesystem write still happens at step 5 of the existing flow; codec just no longer depends on it. Verified end-to-end against UM11719_20231219162723.IDFW from the example-data corpus: ingest endpoint returns inserted=1, log line shows binary_decoded=True + h5=...IDFW.h5, no warnings. Re-forward existing Thor events from thor-watcher after deploy to backfill the bw_report block — UPSERT preserves review state. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-29 20:09:54 +00:00
serversdown	defd17d9c2	sfm_webapp: harmonize "Received by server at" → "Time received" Matches Terra-View's event-modal relabel from the same iteration. Wording was already clearer here than in Terra-View's "Captured at", but using identical text across both surfaces means operators see the same label whether they're in the native modal or the standalone webapp. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-29 19:51:58 +00:00
serversdown	e42956a20b	release: v0.21.0 — Thor / Series IV codec + Thor→BW adapter Documents two commits that landed on dev since v0.20.0: `9b71ead` series 4 codec work, initial decode success micromate/idf_file.read_idf_file() decodes both IDFW (waveform; 87-99% sample fidelity reusing decode_waveform_v2 at offset 0x0f1f) and IDFH (histogram; dedicated segment-based decoder, all 859 corpus files decode, 181,071 intervals total). `9fd52dd` feat: add thor report generation, pdf generation micromate/idf_to_bw_report.py adapter projects parsed Thor data into the bw_report sidecar shape so Thor events flow through sfm/report_pdf.py without a separate renderer. Wired into save_imported_idf. Net effect: a Thor event ingested via /db/import/idf_file now lands with the same fidelity as a BW event, gets a per-event PDF on demand, and renders in Terra-View's modal chart using the same plotting code as a BW event. Roadmap items closed: - Binary .IDFW / .IDFH codec (was pending) - Series IV (Thor IDF) binary codec reverse-engineering Companion: Terra-View v0.13.0 ships in parallel and closes Phase 1 of the SFM integration. No API changes in seismo-relay for that piece — Terra-View just consumes existing endpoints better. Bumps: - pyproject.toml 0.20.0 → 0.21.0 - minimateplus.event_file_io.TOOL_VERSION 0.20.0 → 0.21.0 (any subsequent backfill_sidecars.py --force will re-stamp existing sidecars; expected + harmless) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-29 19:25:44 +00:00
serversdown	9fd52ddabb	feat: add thor report generation, pdf generation.	2026-05-29 19:03:06 +00:00
serversdown	9b71ead44b	series 4 codec work, inital decode success	2026-05-29 06:33:13 +00:00
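Commit `b2c565f217` replaces the hardcoded `0x0f1f` body offset with a scan. Every `00 02 00` magic past `0x0E00` is trial-decoded, and the candidate returning the most samples wins. Below is a minimal sketch of that search. The real code calls `decode_waveform_v2`; here it is passed in as a hypothetical `trial_decode(data, offset) -> sample_count` callable. Ties going to the earliest offset is an assumption.

```python
from typing import Callable, Optional

MAGIC = b"\x00\x02\x00"   # candidate body marker
SCAN_START = 0x0E00       # candidates before this offset are ignored


def find_body_offset(
    data: bytes,
    trial_decode: Callable[[bytes, int], int],
    start: int = SCAN_START,
) -> Optional[int]:
    """Return the magic offset whose trial decode yields the most samples.

    `trial_decode(data, offset)` returns a sample count; a candidate that raises
    scores 0.  Returns None when no candidate decodes any samples.
    """
    best_offset: Optional[int] = None
    best_count = 0
    pos = data.find(MAGIC, start)
    while pos != -1:
        try:
            count = trial_decode(data, pos)
        except Exception:
            count = 0  # a candidate that fails to decode simply loses
        if count > best_count:  # strict '>' keeps the earliest offset on ties
            best_offset, best_count = pos, count
        pos = data.find(MAGIC, pos + 1)
    return best_offset
```

A candidate that yields only the 2-byte Tran preamble scores low, so a coincidental magic at `0x0f1f` loses to the real body further in.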