diff --git a/CHANGELOG.md b/CHANGELOG.md index 1b92776..77478d7 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -8,6 +8,97 @@ All notable changes to seismo-relay are documented here. --- +## v0.21.1 — 2026-06-01 + +Bug fixes against v0.21.0 surfaced after the first prod redeploy. Three +production-visible symptoms — blank waveform charts on most Thor events, +blank histogram charts on all Thor events, and a mic chart that +auto-scaled against a dB(L) value treated as psi — were all root-caused +and fixed. + +### Fixed + +- **Dynamic IDFW body offset.** The v0.21.0 codec hardcoded the body + at file offset `0x0f1f` based on the example corpus, but only ~52% + of production IDFW events use that offset; the rest sit at offsets + from `0x1033` up to `0x3082` depending on header padding. At + `0x0f1f` the codec would find a coincidentally matching `00 02 00` + magic, read the 2-byte Tran preamble, and return empty V/L/M + arrays — producing near-empty .h5 files and blank charts. + `micromate.idf_file._find_waveform_body_offset()` now scans every + `00 02 00` magic position past `0x0E00`, trial-decodes each one, + and picks the offset with the most samples. Validated across 483 + prod IDFW files: 0 preamble-only events (was ~50%), 355/483 fully + decode, 126/483 partial (the BW codec's walker stops early on loud + events, a pre-existing limitation; the samples it does reach are correct). + +- **IDFH histograms now render bar charts.** Histograms previously + skipped the .h5 write because there are no per-sample arrays, but + the renderer drives the per-interval bar chart from .h5 channel + data + `bw_report.histogram.n_intervals`. `save_imported_idf` now + synthesizes a 1-sample-per-interval array from the decoded + `IdfhInterval` peak counts and writes an .h5 so the existing + renderer works unchanged — each "sample" is the per-interval peak + ADC count, so the writer's `count × geo_fs/32768` conversion + yields the right bar height.
+ +- **Mic chart scaling on Thor events.** `PeakValues.micl` (consumed + by the h5 writer's per-count mic scale factor) expects psi, but + the Thor bridge was stuffing the dB(L) value (~99.4) into it, + producing a per-count factor 5+ orders of magnitude too large and + a flat-looking mic chart. Fixed by adding `IdfPeaks.mic_pspl_psi` + alongside `mic_pspl_dbl`; `read_idf_file()` computes it from + binary mic counts (`max(|MicL|) × 2.14e-6 psi/count`) for both + IDFW and IDFH paths; `save_imported_idf` merges it onto the typed + event after `IdfEvent.from_report`; the bridge feeds psi to + `PeakValues.micl` with a dB(L)→psi formula fallback when only the + dB(L) value is available. dB(L) for the report header still + flows through `bw_report.mic.pspl_dbl` unchanged. + +### Operator + +After deploy, run `python scripts/backfill_thor_events.py` to refresh +every existing Thor event's sidecar + .h5 with the corrected codec +output. The script auto-skips events already at the current +`TOOL_VERSION`, so the bump from `0.21.0` → `0.21.1` is what triggers +the refresh. + +--- + +## v0.21.0 — 2026-05-29 + +The "Thor / Series IV codec" release. Two big pieces landed: (1) the IDF binary codec actually decodes now, both IDFW and IDFH, and (2) a Thor→BW adapter lets Thor events flow through the existing Series III Event Report PDF pipeline. Combined effect: a Thor event ingested via `/db/import/idf_file` now lands in the DB with the same fidelity as a Blastware event, gets a per-event PDF on demand, and renders in Terra-View's modal chart with the same plotting code as a BW event. + +### Added — Thor IDF binary codec (`micromate/idf_file.read_idf_file`) + +- **IDFW (waveform)** — body sits at fixed file offset `0x0f1f`; reuses the verified `decode_waveform_v2()` walker from `minimateplus.waveform_codec`. 
Sample fidelity is **87–99% byte-exact** against the ASCII-sidecar reference values on quiet events; loud events hit the same walker-stops-early limitation as the BW codec on `SP0/SS0/SV0`-style events. +- **IDFH (histogram)** — dedicated segment-based decoder for the Thor histogram body format: `[len_be][0a 00 00 00][00 NN][05 3f]` framing plus N × 72-byte interval records (4 × 16-byte per-channel min/max/halfp). **All 859 Thor IDFH corpus files decode**, totalling **181,071 intervals**; per-channel peaks match the sidecar within **~1.8% (ADC quantization)**. +- **BW-aliased binary detection** — a small number of corpus files (e.g. `BE9439_*.IDFW/IDFH`) are actually Series III Blastware binaries that share the IDF filename convention by accident. `read_idf_file()` detects them via their BW `STRT` signature and raises `NotImplementedError` pointing the caller at `read_blastware_file()` instead of trying to decode them as IDF. +- Full field layouts in `docs/idf_protocol_reference.md`; supporting analysis scripts in `analysis_idf/` (decode validators, per-file detail dumps, corpus accuracy reports). + +### Added — Thor → BW report adapter (`micromate/idf_to_bw_report.py`) + +- **`build_bw_report_from_idf(report_dict, binary_md=, intervals=, is_histogram=)`** projects a parsed Thor `IdfReport` plus binary-extracted metadata plus decoded IDFH intervals into the `bw_report`-shaped dict that `sfm.report_pdf.gather_report_data` consumes. No need to duplicate the renderer — Thor data is ~95% the same metric set as BW; the adapter handles the field-name mapping (`MicPSPL` → `pspl_dbl`, `>100` sentinel → `zc_freq_above_range`, free-form `Calibration : Nov 22, 2023 by Instantel` → `calibration_date` + `calibration_by`, etc.). +- For IDFH events the adapter derives `histogram.interval_times` by stepping `IntervalSize` from `HistogramStartTime`, matching what the BW pipeline expects from a histogram-mode event. 
+- **Wired into `WaveformStore.save_imported_idf`** — every Thor event ingested via `/db/import/idf_file` now gets a `bw_report` block in its sidecar in addition to the existing `extensions.idf_report` (the raw parsed Thor payload). Falls back gracefully (PDF renders from DB-only fields) if the adapter raises — logged as a warning rather than failing the ingest. + +### Companion releases + +- **Terra-View v0.13.0** ships in parallel — closes Phase 1 of the SFM integration. The shared event-detail modal now renders the SFM event story (Chart.js waveform/histogram chart, inline PDF preview, `.TXT` download, FT/reviewer/notes review form) without operators needing to bounce to the standalone SFM webapp on port 8200. Uses only existing seismo-relay endpoints — no API changes here, just better consumption. + +### Migration / Operations + +No DB migration needed. Existing Thor events already in the store don't automatically pick up the new `bw_report` block — they'd need a re-ingest (post the IDF binary + paired `.TXT` back to `/db/import/idf_file`) for the adapter to run. Alternatively, `scripts/backfill_sidecars.py --reparse-txt` could refresh them once it is extended to handle Thor events (it currently only re-runs the BW ASCII parser; that extension would be a small follow-up). + +```bash +cd /home/serversdown/terra-view +docker compose build sfm && docker compose up -d sfm +``` + +The bumped `TOOL_VERSION = "0.21.0"` in `minimateplus/event_file_io.py` means any subsequent `backfill_sidecars.py --force` pass will rewrite sidecars with the new version stamp; that's expected and harmless. + +--- + ## v0.20.0 — 2026-05-28 The "PDF + parser polish" release.
Closes out the Event-Report PDF iteration started in v0.17.x: histogram layouts now render correctly against BW reference PDFs, the ASCII parser handles the real-world edge cases production events were tripping over (OORANGE, `>100 Hz`, histogram timestamps), and the `.TXT` preservation rollout lets parser fixes be applied retroactively to ingested events. Adds server-wide timezone support so operator-visible timestamps no longer drift into UTC. Rolls up the substantial "pre-v0.20" body of work that had accumulated under `[Unreleased]` (PDF generation, histogram codec fix, histogram parser fields, `.TXT` preservation, backfill safety) — see the trailing "pre-v0.20.0 work" section below for the full list. diff --git a/CLAUDE.md b/CLAUDE.md index c2892d6..9198786 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -2,7 +2,7 @@ Ground-up Python replacement for **Blastware**, Instantel's Windows-only software for managing MiniMate Plus seismographs. Connects over direct RS-232 or cellular modem -(Sierra Wireless RV50 / RV55). Current version: **v0.20.0**. +(Sierra Wireless RV50 / RV55). Current version: **v0.21.0**. When new information about the protocol is discovered, please update the instantel_protocol_reference.md with the findings in addition to this document @@ -73,6 +73,28 @@ should not import from `sfm/`, must not touch a DB, and have no I/O beyond reading files passed as arguments. Keep them pure — both tiers can then depend on them without circularity. +#### Thor IDF binary codec (2026-05-28) + +`micromate/idf_file.read_idf_file()` decodes both Thor IDFW +(waveform) and IDFH (histogram) binaries. + +- **IDFW** reuses `decode_waveform_v2()` on the body at fixed file + offset `0x0f1f`. Sample fidelity is 87–99% byte-exact on quiet + events; loud events hit the BW codec's known walker-stops-early + limitation. +- **IDFH** has its own segment-based decoder: `[len_be][0a 00 00 00] + [00 NN][05 3f]` + N × 72-byte interval records (4 × 16-byte + per-channel min/max/halfp). 
All 859 Thor IDFH corpus files + decode (181,071 intervals); peak matches sidecar within ~1.8% + (ADC quantization). + +The two outlier `BE9439_*` files in the Thor example corpus are +actually Series III Blastware binaries that share the `.IDFW`/`.IDFH` +filename convention by accident. `read_idf_file()` detects them by +their BW STRT signature and raises NotImplementedError pointing +callers at `read_blastware_file()`. See +`docs/idf_protocol_reference.md` for full field layouts. + ### Practical consequences When deciding where new code goes, ask: diff --git a/README.md b/README.md index 7522bb1..8d41f7e 100644 --- a/README.md +++ b/README.md @@ -1,4 +1,4 @@ -# seismo-relay `v0.20.0` +# seismo-relay `v0.21.0` A ground-up replacement for **Blastware** — Instantel's aging Windows-only software for managing seismographs. Supports both the **MiniMate Plus @@ -45,6 +45,15 @@ over direct RS-232 or cellular modem (Sierra Wireless RV50 / RV55). > `scripts/backfill_sidecars.py --reparse-txt` lets parser fixes be > applied retroactively to existing events without re-forwarding, > using the `.TXT` files preserved at ingest time. +> **v0.21.0 (2026-05-29)** is the Thor / Series IV decoder release — +> `micromate/idf_file.read_idf_file()` now decodes both IDFW +> (waveform) and IDFH (histogram) binaries (87–99% sample fidelity +> on quiet IDFW events; all 859 IDFH corpus files decode cleanly). +> A new `micromate/idf_to_bw_report.py` adapter projects parsed +> Thor reports into the BW-shaped sidecar block, so Thor events +> flow through the existing Event Report PDF pipeline without a +> separate renderer. Terra-View v0.13.0 ships in parallel and +> closes Phase 1 of the SFM integration — see its CHANGELOG. > See [CHANGELOG.md](CHANGELOG.md) for full version history. 
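The v0.21.1 entry at the top of CHANGELOG.md replaces the fixed `0x0f1f` IDFW body offset with a scan over candidate `00 02 00` magics, keeping whichever trial decode yields the most samples. The following is a minimal Python sketch of that search. It is not the shipped `_find_waveform_body_offset()`. It assumes the body starts exactly at the magic and that the decoder returns a dict of per-channel sample lists, which matches how the analysis scripts index `decode_waveform_v2()` output by channel name. `find_waveform_body_offset`, `fake_decode`, and `demo_buffer` are hypothetical names, and the toy decoder only stands in for the real walker.

```python
from typing import Callable, Dict, List, Optional

# Assumed decoder shape: body bytes in, {channel_name: [samples, ...]} out.
Samples = Dict[str, List[int]]
Decoder = Callable[[bytes], Samples]


def find_waveform_body_offset(
    buf: bytes,
    decode: Decoder,
    magic: bytes = b"\x00\x02\x00",
    min_offset: int = 0x0E00,
) -> Optional[int]:
    """Return the candidate offset whose trial decode yields the most samples.

    Hypothetical sketch: every occurrence of `magic` at or past `min_offset`
    is treated as a possible body start. Returns None if no candidate
    decodes to at least one sample.
    """
    best_offset: Optional[int] = None
    best_count = 0
    pos = buf.find(magic, min_offset)
    while pos != -1:
        try:
            samples = decode(buf[pos:])
        except Exception:
            # A false-positive magic can make the walker fail; score it as zero.
            samples = {}
        count = sum(len(ch) for ch in samples.values())
        if count > best_count:  # strict ">" keeps the earliest offset on ties
            best_offset, best_count = pos, count
        pos = buf.find(magic, pos + 1)
    return best_offset


# --- Usage with a toy decoder (hypothetical; stands in for the real walker) ---

def fake_decode(body: bytes) -> Samples:
    """Toy decoder: byte 3 after the magic is the per-channel sample count."""
    n = body[3]
    if n == 0xFF:
        raise ValueError("walker could not parse this candidate")
    return {"Tran": [0] * n, "Vert": [0] * n}


def demo_buffer() -> bytes:
    """Zero-filled buffer with four planted magics (one below min_offset)."""
    buf = bytearray(0x1100)
    for off, n in [(0x0100, 200), (0x0F1F, 1), (0x1033, 50), (0x1080, 0xFF)]:
        buf[off:off + 4] = b"\x00\x02\x00" + bytes([n])
    return bytes(buf)


if __name__ == "__main__":
    # The 0x0F1F decoy yields 2 samples, 0x1033 yields 100, 0x1080 raises.
    print(hex(find_waveform_body_offset(demo_buffer(), fake_decode)))  # prints 0x1033
```

Scoring by total sample count across channels is what separates a false-positive magic, which yields only the 2-byte preamble, from the real body. The strict `>` keeps the earliest offset on a tie. The sketch returns `None` when nothing decodes. The changelog does not say what the real function does in that case, so a caller using this sketch would need its own fallback.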
--- @@ -68,7 +77,8 @@ seismo-relay/ ├── micromate/ ← Series IV (Micromate / Thor) client library (NEW v0.19) │ ├── models.py ← IdfEvent, IdfReport, IdfPeaks, IdfProjectInfo, IdfSensorCheck (mic in native dB(L)) │ ├── idf_ascii_report.py ← Parse Thor .IDFW.txt / .IDFH.txt event sidecars -│ └── idf_file.py ← Stub for the .IDFW / .IDFH binary codec (reverse-engineering pending) +│ ├── idf_file.py ← Binary codec for .IDFW + .IDFH (v0.21.0+) +│ └── idf_to_bw_report.py ← Adapter projecting Thor IDF into the BW report shape (v0.21.0+) │ ├── sfm/ ← SFM REST API server (FastAPI, port 8200) │ ├── server.py ← Live device endpoints + DB query + ingest endpoints + caching @@ -425,7 +435,7 @@ Use **com0com** or **VSPD** to create the virtual COM pair on Windows. - [x] Thor IDF file ingest at `/db/import/idf_file` (paired with `thor-watcher`, v0.18.0+) - [x] Native `IdfEvent` / `IdfReport` typed models — mic in dB(L), full title strings, sensor self-check, calibration, firmware version - [x] Parser verified against 1,014 paired `.txt` sidecars in `thor-watcher/example-data/` -- [ ] Binary `.IDFW` / `.IDFH` codec — pending (see Roadmap + [`docs/idf_protocol_reference.md`](docs/idf_protocol_reference.md)) +- [x] Binary `.IDFW` / `.IDFH` codec — ✅ v0.21.0. IDFW reuses `decode_waveform_v2()` on the body at offset `0x0f1f` (87–99% sample fidelity on quiet events); IDFH has a dedicated segment-based decoder (all 859 corpus files decode, 181,071 intervals total). See `micromate/idf_file.py` + `docs/idf_protocol_reference.md`. - [ ] Live-device protocol — pending codec **Data persistence:** @@ -538,7 +548,7 @@ Implementation steps (concrete): ### High-impact (unblocks product features) - [ ] **Series III waveform body codec reverse-engineering.** The 5A bulk-stream body is some kind of compressed/encoded format (not raw int16 LE as previously assumed — see §7.6.1 retraction in `docs/instantel_protocol_reference.md`). 
Structural framing is ~50% decoded on branch `claude/codec-re-cBGNe` (tagged-block walker, segment counters); per-byte sample mapping is still open. Until this lands, the in-app waveform viewer renders garbage and BW-import peak values fall back to `_peaks_from_samples()` saturation noise. Workaround: pair every BW-imported event with its `_ASCII.TXT` so the device-authoritative peaks land in the DB regardless of codec. -- [ ] **Series IV (Thor IDF) binary codec reverse-engineering.** `.IDFH` / `.IDFW` files are currently stored opaquely by `WaveformStore.save_imported_idf`, with all metadata sourced from the paired `.txt` sidecar. This works because thor-watcher forwards both files together, but operators who haven't enabled Thor's TXT exporter get rows with NULL peaks. Cracking the binary closes that gap and unlocks waveform display. Starting-point reference at [`docs/idf_protocol_reference.md`](docs/idf_protocol_reference.md) — two observed file signatures (1,012 newer-firmware files + 2 old files whose layout matches the Series III STRT-record format), suggested first-session plan (~2-4 hrs), 1,014 paired binary+txt files available as ground truth in `thor-watcher/example-data/`. Code seam ready at `micromate/idf_file.py`. +- [x] **Series IV (Thor IDF) binary codec reverse-engineering.** ✅ v0.21.0 — `micromate/idf_file.read_idf_file()` decodes both IDFW (waveform body at offset `0x0f1f`, reusing `decode_waveform_v2()`; 87–99% sample fidelity on quiet events) and IDFH (dedicated segment-based decoder: all 859 corpus files decode, 181,071 intervals, peaks within ~1.8% of sidecar values). `WaveformStore.save_imported_idf` now also projects parsed Thor data into a `bw_report` block via `micromate/idf_to_bw_report.py` so Thor events render in the existing Event Report PDF pipeline without a separate renderer. - [ ] **In-app waveform viewer accuracy.** Depends on Series III codec decode. 
Plot.v1 JSON pipeline + viewer skeleton already exist; will start showing real waveforms automatically once `_decode_a5_waveform` produces correct samples. Series IV waveforms come online when the IDF codec lands. - [ ] **Series IV live-device support.** Once the IDF binary is decoded, extend `micromate/` with `transport.py` / `framing.py` / `protocol.py` / `client.py` mirroring the `minimateplus/` package layout — depends on capturing Thor's wire protocol (TCP / RS-232 captures TBD). - [ ] **Terra-view integration** — seismo-relay router, unit detail page, VISON-style event listing. diff --git a/analysis_idf/corpus_accuracy.py b/analysis_idf/corpus_accuracy.py new file mode 100644 index 0000000..acdd0e9 --- /dev/null +++ b/analysis_idf/corpus_accuracy.py @@ -0,0 +1,65 @@ +"""Run read_idf_file across the corpus and report per-channel accuracy vs sidecars.""" +from __future__ import annotations +import sys +from pathlib import Path + +REPO = Path(__file__).resolve().parents[1] +sys.path.insert(0, str(REPO)) + +from micromate.idf_file import read_idf_file +from analysis_idf.recon import load_sidecar_samples + + +def sidecar_path(idfw: Path) -> Path: + return idfw.parent / "TXT" / f"{idfw.name}.txt" + + +def main(): + root = REPO / "tests/fixtures/THORDATA_example" + files = [f for f in root.rglob("*.IDFW") if not str(f).endswith(".CDB")] + files.sort() + GEO_LSB = 0.0003 + + n_ok = n_skip = 0 + overall = {"Tran": [], "Vert": [], "Long": []} + + for f in files: + try: + res = read_idf_file(f) + except Exception: + n_skip += 1 + continue + sc_path = sidecar_path(f) + if not sc_path.exists(): + n_skip += 1 + continue + try: + sc = load_sidecar_samples(sc_path) + except Exception: + n_skip += 1 + continue + + per_file = {} + for ch in ("Tran", "Vert", "Long"): + sc_counts = [int(round(v / GEO_LSB)) for v in sc[ch]] + dec = res.samples.get(ch, []) + n = min(len(sc_counts), len(dec)) + if n == 0: + per_file[ch] = 0.0 + continue + exact = sum(1 for i in range(n) if 
sc_counts[i] == dec[i]) + pct = 100.0 * exact / n + per_file[ch] = pct + overall[ch].append(pct) + n_ok += 1 + + print(f"Processed {n_ok} files (skipped {n_skip})") + print("Per-channel exact-match % (mean / min / max):") + for ch, vals in overall.items(): + if vals: + avg = sum(vals) / len(vals) + print(f" {ch}: mean={avg:.2f}% min={min(vals):.2f}% max={max(vals):.2f}% n={len(vals)}") + + +if __name__ == "__main__": + main() diff --git a/analysis_idf/diff_trail.py b/analysis_idf/diff_trail.py new file mode 100644 index 0000000..a64295b --- /dev/null +++ b/analysis_idf/diff_trail.py @@ -0,0 +1,49 @@ +"""Find where decoded-vs-sidecar diverges for each channel.""" +from __future__ import annotations +import sys +from pathlib import Path + +REPO = Path(__file__).resolve().parents[1] +sys.path.insert(0, str(REPO)) + +from minimateplus.waveform_codec import decode_waveform_v2 +from analysis_idf.recon import TARGET, TXT, load_sidecar_samples + + +def main(): + buf = TARGET.read_bytes() + sc = load_sidecar_samples(TXT) + decoded = decode_waveform_v2(buf[0x0f1f:]) + GEO_LSB = 0.0003 + + for ch in ("Tran", "Vert", "Long"): + sc_counts = [int(round(v / GEO_LSB)) for v in sc[ch]] + dec = decoded[ch] + n = min(len(dec), len(sc_counts)) # compare only the overlapping prefix + # Find the first index where decoded and sidecar samples disagree + first_diff = next((i for i in range(n) if dec[i] != sc_counts[i]), None) + if first_diff is None: + print(f"{ch}: NO MISMATCHES") + continue + print(f"{ch}: first diff at idx {first_diff}") + # Show 3 before, the first mismatch, and 7 after + for i in range(max(0, first_diff - 3), min(n, first_diff + 8)): + mark = " " if dec[i] == sc_counts[i] else "**" + print(f" {mark} idx {i:4d}: sc={sc_counts[i]:6d} dec={dec[i]:6d} diff={dec[i]-sc_counts[i]:+d}") + # Count mismatches and the longest run of consecutive matches
+ cum_match_run = 0 + max_match_run = 0 + diff_count = 0 + for i in range(n): + if dec[i] == sc_counts[i]: + cum_match_run += 1 + max_match_run = max(max_match_run, cum_match_run) + else: + cum_match_run = 0 + diff_count += 1 + print(f" total mismatches: {diff_count}/{n}, longest run of matches: {max_match_run}") + print() + + +if __name__ == "__main__": + main() diff --git a/analysis_idf/e2e_idfh.py b/analysis_idf/e2e_idfh.py new file mode 100644 index 0000000..3f5ec43 --- /dev/null +++ b/analysis_idf/e2e_idfh.py @@ -0,0 +1,48 @@ +"""End-to-end IDFH ingest verification.""" +from __future__ import annotations +import sys +import tempfile +import json +from pathlib import Path + +REPO = Path(__file__).resolve().parents[1] +sys.path.insert(0, str(REPO)) + +from sfm.waveform_store import WaveformStore + + +def main(): + idfh = REPO / "tests/fixtures/THORDATA_example/THORDATA_example/UPMC Presby/UM13981/UM13981_20220805075441.IDFH" + txt = idfh.parent / "TXT" / f"{idfh.name}.txt" + + with tempfile.TemporaryDirectory() as td: + store = WaveformStore(Path(td)) + ev, rec = store.save_imported_idf( + idfh.read_bytes(), + idfh, + idf_report_text=txt.read_text(errors="replace"), + ) + print("=== save_imported_idf (IDFH) ===") + print(f" serial: {rec['serial']}") + print(f" filename: {rec['filename']}") + print(f" filesize: {rec['filesize']}") + print(f" h5: {rec['hdf5_filename']}") # v0.21.1+: histograms get a synthesized per-interval .h5 + print(f" sidecar: {rec['sidecar_filename']}") + print() + print("=== Event ===") + print(f" timestamp: {ev.timestamp}") + print(f" record_type: {ev.record_type}") + print(f" sample_rate: {ev.sample_rate}") + print() + # Inspect sidecar to confirm intervals were stashed + sc_path = Path(td) / "UM13981" / f"{idfh.name}.sfm.json" + sc = json.loads(sc_path.read_text()) + intervals = sc.get("extensions", {}).get("idf_intervals", []) + print(f" sidecar intervals: {len(intervals)}") + if intervals: + print(f" first interval:
{intervals[0]}") + print(f" last interval: {intervals[-1]}") + + +if __name__ == "__main__": + main() diff --git a/analysis_idf/e2e_no_txt.py b/analysis_idf/e2e_no_txt.py new file mode 100644 index 0000000..a9c81b6 --- /dev/null +++ b/analysis_idf/e2e_no_txt.py @@ -0,0 +1,40 @@ +"""Verify the had_report=False path: ingest IDFW with no .txt.""" +from __future__ import annotations +import sys +from pathlib import Path +import tempfile + +REPO = Path(__file__).resolve().parents[1] +sys.path.insert(0, str(REPO)) + +from sfm.waveform_store import WaveformStore + + +def main(): + idfw = REPO / "tests/fixtures/THORDATA_example/THORDATA_example/UPMC Presby/UM11719/UM11719_20231219162723.IDFW" + with tempfile.TemporaryDirectory() as td: + store = WaveformStore(Path(td)) + ev, rec = store.save_imported_idf( + idfw.read_bytes(), + idfw, + serial_hint=None, + idf_report_text=None, # ← no .txt! + ) + print("=== IDFW without .txt ingest ===") + print(f" serial: {rec['serial']}") + print(f" timestamp: {ev.timestamp}") + print(f" sample_rate: {ev.sample_rate}") + print(f" record_type: {ev.record_type}") + print(f" rectime_sec: {ev.rectime_seconds}") + nT = len(ev.raw_samples.get('Tran', [])) if ev.raw_samples else 0 + nV = len(ev.raw_samples.get('Vert', [])) if ev.raw_samples else 0 + nL = len(ev.raw_samples.get('Long', [])) if ev.raw_samples else 0 + nM = len(ev.raw_samples.get('MicL', [])) if ev.raw_samples else 0 + print(f" raw_samples: Tran={nT} Vert={nV} Long={nL} MicL={nM}") + if ev.peak_values: + print(f" peak_values: tran={ev.peak_values.tran} vert={ev.peak_values.vert} long={ev.peak_values.long}") + print(f" h5 written: {rec['hdf5_filename']}") + + +if __name__ == "__main__": + main() diff --git a/analysis_idf/e2e_report.py b/analysis_idf/e2e_report.py new file mode 100644 index 0000000..c4cbb04 --- /dev/null +++ b/analysis_idf/e2e_report.py @@ -0,0 +1,102 @@ +"""End-to-end Thor report PDF rendering. 
+ +Ingests an IDFW + .txt via save_imported_idf, runs gather_report_data +(faking a minimal DB row), and renders the PDF to disk. +""" +from __future__ import annotations +import sys +import tempfile +import json +from pathlib import Path + +REPO = Path(__file__).resolve().parents[1] +sys.path.insert(0, str(REPO)) + +from sfm.waveform_store import WaveformStore +from sfm import report_pdf + + +class FakeDb: + """Stand-in for SeismoDb.get_event(); the renderer only needs a few cols.""" + def __init__(self, event): + self.event = event + + def get_event(self, _id): + return self.event + + +def main(): + base = REPO / "tests/fixtures/THORDATA_example/THORDATA_example/UPMC Presby/UM11719" + idfw = base / "UM11719_20231219162723.IDFW" + txt = base / "TXT" / f"{idfw.name}.txt" + + with tempfile.TemporaryDirectory() as td: + store = WaveformStore(Path(td)) + ev, rec = store.save_imported_idf( + idfw.read_bytes(), + idfw, + idf_report_text=txt.read_text(errors="replace"), + ) + print(f"save_imported_idf: h5={rec['hdf5_filename']}, sidecar={rec['sidecar_filename']}") + + # Verify sidecar has bw_report block + sc_path = Path(td) / "UM11719" / f"{idfw.name}.sfm.json" + sc = json.loads(sc_path.read_text()) + bw = sc.get("bw_report", {}) + print(f" bw_report.available: {bw.get('available')}") + print(f" bw_report.peaks.tran.ppv_ips: {bw.get('peaks', {}).get('tran', {}).get('ppv_ips')}") + print(f" bw_report.mic.pspl_dbl: {bw.get('mic', {}).get('pspl_dbl')}") + print(f" bw_report.histogram.n_intervals: {bw.get('histogram', {}).get('n_intervals')}") + + # Build a DB-row-shaped dict from the Event for gather_report_data + import datetime + ts = ev.timestamp + ts_iso = None + if ts is not None: + try: + ts_iso = datetime.datetime(ts.year, ts.month, ts.day, ts.hour, ts.minute, ts.second).isoformat() + except Exception: + pass + fake_row = { + "serial": "UM11719", + "blastware_filename": rec["filename"], + "record_type": "Waveform", + "timestamp": ts_iso, + "sample_rate": 
ev.sample_rate, + "project": ev.project_info.project if ev.project_info else None, + "client": ev.project_info.client if ev.project_info else None, + "operator": ev.project_info.operator if ev.project_info else None, + "sensor_location": ev.project_info.sensor_location if ev.project_info else None, + "created_at": None, + } + + rd = report_pdf.gather_report_data(FakeDb(fake_row), store, event_id="test-1") + print() + print(f"=== ReportData ===") + print(f" event_id: {rd.event_id}") + print(f" serial: {rd.serial}") + print(f" record_type: {rd.record_type}") + print(f" event_datetime: {rd.event_datetime_str}") + print(f" trigger: {rd.trigger_source}") + print(f" geo_range: {rd.geo_range_str}") + print(f" sample_rate: {rd.sample_rate_str}") + print(f" firmware: {rd.firmware}") + print(f" calibration: {rd.calibration_date} by {rd.calibration_by}") + print(f" battery: {rd.battery_volts}") + print(f" PVS: {rd.peak_vector_sum_ips} in/s at {rd.peak_vector_sum_time_s} sec") + print(f" mic_pspl_dbl: {rd.mic_pspl_dbl}") + print(f" mic_zc_freq_hz: {rd.mic_zc_freq_hz}") + print(f" channel_stats: {len(rd.channel_stats)} rows") + for cs in rd.channel_stats: + print(f" {cs['name']}: PPV={cs['ppv_ips']} ZC={cs['zc_freq_hz']} ToP={cs['time_of_peak_s']} Acc={cs['peak_accel_g']} Disp={cs['peak_disp_in']} Test={cs['sensor_check']}") + + # Render the PDF + out_path = REPO / "analysis_idf" / "thor_report.pdf" + pdf_bytes = report_pdf.render_event_report_pdf(rd) + out_path.write_bytes(pdf_bytes) + print() + print(f" PDF written: {out_path} ({len(pdf_bytes)} bytes)") + + +if __name__ == "__main__": + main() diff --git a/analysis_idf/e2e_report_idfh.py b/analysis_idf/e2e_report_idfh.py new file mode 100644 index 0000000..05e735d --- /dev/null +++ b/analysis_idf/e2e_report_idfh.py @@ -0,0 +1,91 @@ +"""End-to-end Thor IDFH histogram report PDF rendering.""" +from __future__ import annotations +import sys +import tempfile +import json +import datetime +from pathlib import Path + +REPO = 
Path(__file__).resolve().parents[1] +sys.path.insert(0, str(REPO)) + +from sfm.waveform_store import WaveformStore +from sfm import report_pdf + + +class FakeDb: + def __init__(self, event): + self.event = event + + def get_event(self, _id): + return self.event + + +def main(): + # Use the multi-interval IDFH (81 + trigger row) + idfh = REPO / "tests/fixtures/THORDATA_example/THORDATA_example/UPMC Presby/UM13981/UM13981_20220805075441.IDFH" + txt = idfh.parent / "TXT" / f"{idfh.name}.txt" + + with tempfile.TemporaryDirectory() as td: + store = WaveformStore(Path(td)) + ev, rec = store.save_imported_idf( + idfh.read_bytes(), + idfh, + idf_report_text=txt.read_text(errors="replace"), + ) + print(f"save_imported_idf: h5={rec['hdf5_filename']}, sidecar={rec['sidecar_filename']}") + + sc_path = Path(td) / "UM13981" / f"{idfh.name}.sfm.json" + sc = json.loads(sc_path.read_text()) + bw = sc.get("bw_report", {}) + hist = bw.get("histogram", {}) + print(f" bw_report.histogram.start: {hist.get('start')}") + print(f" bw_report.histogram.stop: {hist.get('stop')}") + print(f" bw_report.histogram.n_intervals: {hist.get('n_intervals')}") + print(f" bw_report.histogram.interval_size: {hist.get('interval_size')}") + print(f" bw_report.histogram.interval_size_s: {hist.get('interval_size_s')}") + print(f" bw_report.peaks.tran.ppv_ips: {bw.get('peaks', {}).get('tran', {}).get('ppv_ips')}") + + ts = ev.timestamp + ts_iso = None + if ts is not None: + try: + ts_iso = datetime.datetime(ts.year, ts.month, ts.day, ts.hour, ts.minute, ts.second).isoformat() + except Exception: + pass + fake_row = { + "serial": "UM13981", + "blastware_filename": rec["filename"], + "record_type": "Histogram", + "timestamp": ts_iso, + "sample_rate": ev.sample_rate, + "project": ev.project_info.project if ev.project_info else None, + "client": ev.project_info.client if ev.project_info else None, + "operator": ev.project_info.operator if ev.project_info else None, + "sensor_location": 
ev.project_info.sensor_location if ev.project_info else None, + "created_at": None, + } + rd = report_pdf.gather_report_data(FakeDb(fake_row), store, event_id="hist-1") + + print() + print("=== ReportData (histogram) ===") + print(f" is_histogram: {rd.is_histogram}") + print(f" histogram_start: {rd.histogram_start_str}") + print(f" histogram_stop: {rd.histogram_stop_str}") + print(f" histogram_n_intervals: {rd.histogram_n_intervals}") + print(f" histogram_interval_size:{rd.histogram_interval_size}") + print(f" histogram_interval_times[:3]: {rd.histogram_interval_times[:3]}") + print(f" histogram_interval_times[-2:]: {rd.histogram_interval_times[-2:]}") + print(f" channel_stats: {len(rd.channel_stats)} rows") + for cs in rd.channel_stats: + print(f" {cs['name']}: PPV={cs['ppv_ips']} ZC={cs['zc_freq_hz']} peak_date={cs['peak_date']} peak_time={cs['peak_time']}") + + pdf_bytes = report_pdf.render_event_report_pdf(rd) + out_path = REPO / "analysis_idf" / "thor_report_idfh.pdf" + out_path.write_bytes(pdf_bytes) + print() + print(f" PDF written: {out_path} ({len(pdf_bytes)} bytes)") + + +if __name__ == "__main__": + main() diff --git a/analysis_idf/e2e_save_idf.py b/analysis_idf/e2e_save_idf.py new file mode 100644 index 0000000..87e9650 --- /dev/null +++ b/analysis_idf/e2e_save_idf.py @@ -0,0 +1,52 @@ +"""End-to-end ingest test: feed an IDFW + .txt to save_imported_idf in a tmp store.""" +from __future__ import annotations +import sys +from pathlib import Path +import tempfile +import shutil + +REPO = Path(__file__).resolve().parents[1] +sys.path.insert(0, str(REPO)) + +from sfm.waveform_store import WaveformStore + + +def main(): + idfw = REPO / "tests/fixtures/THORDATA_example/THORDATA_example/UPMC Presby/UM11719/UM11719_20231219162723.IDFW" + txt = idfw.parent / "TXT" / f"{idfw.name}.txt" + + with tempfile.TemporaryDirectory() as td: + store = WaveformStore(Path(td)) + ev, rec = store.save_imported_idf( + idfw.read_bytes(), + idfw, + serial_hint=None, + 
idf_report_text=txt.read_text(errors="replace"), + ) + print("=== Save result ===") + print(f" serial: {rec['serial']}") + print(f" filename: {rec['filename']}") + print(f" filesize: {rec['filesize']}") + print(f" h5: {rec['hdf5_filename']}") + print(f" sidecar: {rec['sidecar_filename']}") + print() + print("=== Event ===") + print(f" serial: {ev.serial if hasattr(ev,'serial') else '(n/a)'}") + print(f" timestamp: {ev.timestamp}") + print(f" sample_rate: {ev.sample_rate}") + print(f" record_type: {ev.record_type}") + print(f" rectime_sec: {ev.rectime_seconds}") + print(f" raw_samples: Tran={len(ev.raw_samples.get('Tran', [])) if ev.raw_samples else 0}, Vert={len(ev.raw_samples.get('Vert', [])) if ev.raw_samples else 0}, Long={len(ev.raw_samples.get('Long', [])) if ev.raw_samples else 0}, MicL={len(ev.raw_samples.get('MicL', [])) if ev.raw_samples else 0}") + if ev.peak_values: + print(f" peaks (txt): Tran={ev.peak_values.tran} Vert={ev.peak_values.vert} Long={ev.peak_values.long}") + print() + + # Verify the h5 file actually got written + h5path = Path(td) / "UM11719" / f"{idfw.name}.h5" + print(f" h5 exists: {h5path.exists()} size={h5path.stat().st_size if h5path.exists() else 0}") + sidecar = Path(td) / "UM11719" / f"{idfw.name}.sfm.json" + print(f" sidecar exists:{sidecar.exists()} size={sidecar.stat().st_size if sidecar.exists() else 0}") + + +if __name__ == "__main__": + main() diff --git a/analysis_idf/idfh_decode.py b/analysis_idf/idfh_decode.py new file mode 100644 index 0000000..ae4354a --- /dev/null +++ b/analysis_idf/idfh_decode.py @@ -0,0 +1,137 @@ +"""Decode IDFH histogram intervals + verify against sidecar.""" +from __future__ import annotations +import sys +import struct +from pathlib import Path + +REPO = Path(__file__).resolve().parents[1] +sys.path.insert(0, str(REPO)) + + +SEGMENT_MAGIC = b"\x02\xda\x0a\x00\x00\x00" +SEGMENT_SIZE = 732 # = 10-byte header + 10 × 72-byte intervals + 2-byte tail +INTERVAL_SIZE = 72 +CHANNELS = ("Tran", "Vert", 
"Long", "MicL") + + +def decode_interval(buf72: bytes) -> dict: + """Decode one 72-byte interval into per-channel min/max/halfp.""" + out = {} + for i, ch in enumerate(CHANNELS): + block = buf72[i*16 : (i+1)*16] + mn = struct.unpack_from(">h", block, 0)[0] + mx = struct.unpack_from(">h", block, 2)[0] + sb = struct.unpack_from(">h", block, 4)[0] + halfp = struct.unpack_from(">H", block, 6)[0] + f10 = struct.unpack_from(">H", block, 10)[0] + f14 = struct.unpack_from(">H", block, 14)[0] + peak_count = max(abs(mn), abs(mx)) + out[ch] = { + "min": mn, + "max": mx, + "field4": sb, + "halfp": halfp, + "field10": f10, + "field14": f14, + "peak": peak_count, + "freq_hz": (512.0 / halfp) if halfp > 5 else None, + } + out["_tail"] = buf72[64:].hex(" ") + return out + + +def walk_idfh(buf: bytes) -> list: + """Walk all interval records in an IDFH file.""" + intervals = [] + # Multi-segment file: every 02 da 0a 00 00 00 marker introduces a segment. + # Single-interval file: just one body header at 0xf96 of form ?? ?? 0a 00 00 00. + # Find them all. + i = 0 + while True: + j = buf.find(b"\x0a\x00\x00\x00", i) + if j < 0: + break + # Validate: the 2 bytes before must form a length, and we want bytes + # [j-2 : j+6] to have a recognisable shape. Actually the cleanest + # filter is "preceded by a length and followed by 00 NN 05 3f". + if j < 2: + i = j + 1 + continue + # Body header form: [length_be_2][0a 00 00 00][00 NN][05 3f] + if j + 10 > len(buf): + break + length = int.from_bytes(buf[j-2:j], "big") + # Verify the segment-marker shape: [length_be][0a 00 00 00][00 NN][05 3f] + if buf[j+4] != 0x00: + i = j + 1 + continue + if buf[j+6:j+8] != b"\x05\x3f": + i = j + 1 + continue + # Header layout (10 bytes): [length_be 2B][0a 00 00 00 4B][00 NN 2B][05 3f 2B] + # Followed by N interval records of 72 bytes each, then 2 tail bytes. + # length value = (N × 72) + 10 (counts bytes from 0x0a... through interval data). 
+ header_start = j - 2 + n_intervals = (length - 10) // INTERVAL_SIZE + interval_start = header_start + 10 + for k in range(n_intervals): + off = interval_start + k * INTERVAL_SIZE + if off + INTERVAL_SIZE > len(buf): + break + chunk = buf[off:off + INTERVAL_SIZE] + intervals.append({"offset": off, **decode_interval(chunk)}) + i = header_start + length + 2 + return intervals + + +def main(): + # Test against multi-segment IDFH + target = REPO / "tests/fixtures/THORDATA_example/THORDATA_example/UPMC Presby/UM13981/UM13981_20220805075441.IDFH" + sc_path = target.parent / "TXT" / f"{target.name}.txt" + buf = target.read_bytes() + intervals = walk_idfh(buf) + print(f"=== {target.name} ===") + print(f" file size: {len(buf)}") + print(f" decoded intervals: {len(intervals)}") + # Show first 2 + last 2 + sc_rows = [] + for line in sc_path.read_text(errors="replace").splitlines(): + if line.startswith("2022-") or line.startswith("2023-"): + sc_rows.append(line) + print(f" sidecar rows: {len(sc_rows)}") + + print() + for k in [0, 1, 78, 79, 80]: + if k >= len(intervals): + continue + iv = intervals[k] + print(f"--- interval {k} @0x{iv['offset']:04x} ---") + for ch in CHANNELS: + d = iv[ch] + peak_ips = d["peak"] / 32768 * 10.0 + print(f" {ch}: peak={d['peak']:5d} ({peak_ips:.4f} in/s) halfp={d['halfp']:5d} freq={d['freq_hz']}") + # sidecar row + if k < len(sc_rows): + print(f" SC: {sc_rows[k]}") + + # Test single-interval IDFH + print() + target2 = REPO / "tests/fixtures/THORDATA_example/THORDATA_example/UPMC Presby/UM11719/UM11719_20231219162648.IDFH" + sc2 = target2.parent / "TXT" / f"{target2.name}.txt" + buf2 = target2.read_bytes() + intervals2 = walk_idfh(buf2) + print(f"=== {target2.name} ===") + print(f" file size: {len(buf2)}, decoded intervals: {len(intervals2)}") + if intervals2: + iv = intervals2[0] + for ch in CHANNELS: + d = iv[ch] + peak_ips = d["peak"] / 32768 * 10.0 + print(f" {ch}: peak={d['peak']:5d} ({peak_ips:.4f} in/s) halfp={d['halfp']:5d} 
freq={d['freq_hz']}") + sc_rows2 = [l for l in sc2.read_text(errors='replace').splitlines() if l.startswith("2023-")] + if sc_rows2: + print(f" SC: {sc_rows2[0]}") + + +if __name__ == "__main__": + main() diff --git a/analysis_idf/idfh_period.py b/analysis_idf/idfh_period.py new file mode 100644 index 0000000..8aad756 --- /dev/null +++ b/analysis_idf/idfh_period.py @@ -0,0 +1,41 @@ +"""Find IDFH interval period via auto-correlation of structural patterns.""" +from __future__ import annotations +import sys +from pathlib import Path +from collections import Counter + +REPO = Path(__file__).resolve().parents[1] +sys.path.insert(0, str(REPO)) + + +def main(): + target = REPO / "tests/fixtures/THORDATA_example/THORDATA_example/UPMC Presby/UM13981/UM13981_20220805075441.IDFH" + buf = target.read_bytes() + body_start = 0xF96 + body_end = 0x270C + body = buf[body_start:body_end] + print(f"body size: {len(body)} bytes (file {len(buf)} bytes)") + + # For each candidate interval size, count how many bytes at fixed offsets within + # each interval are zero (consistent column-zero pattern indicates correct size). 
+ print() + print("=== zero-column score by interval size (higher = more likely) ===") + best = [] + for sz in range(16, 100): + n = len(body) // sz + if n < 30: + continue + # For each column position within an interval, count how many of n intervals have zero + score = 0 + for col in range(sz): + zeros = sum(1 for i in range(n) if body[i*sz + col] == 0) + if zeros >= n * 0.9: + score += 1 + best.append((score, sz, n)) + best.sort(reverse=True) + for score, sz, n in best[:10]: + print(f" size={sz:3d} n_intervals={n} consistently-zero-cols={score}") + + +if __name__ == "__main__": + main() diff --git a/analysis_idf/per_file_detail.py b/analysis_idf/per_file_detail.py new file mode 100644 index 0000000..b9040f3 --- /dev/null +++ b/analysis_idf/per_file_detail.py @@ -0,0 +1,40 @@ +"""Per-file accuracy + sample-count details.""" +from __future__ import annotations +import sys +from pathlib import Path + +REPO = Path(__file__).resolve().parents[1] +sys.path.insert(0, str(REPO)) + +from micromate.idf_file import read_idf_file +from analysis_idf.recon import load_sidecar_samples + + +def main(): + root = REPO / "tests/fixtures/THORDATA_example" + files = sorted([f for f in root.rglob("*.IDFW") if not str(f).endswith(".CDB")]) + GEO_LSB = 0.0003 + # Limit to first 20 successful files for detail.
+ shown = 0 + for f in files: + try: + res = read_idf_file(f) + except Exception: + continue + sc_path = f.parent / "TXT" / f"{f.name}.txt" + if not sc_path.exists(): + continue + sc = load_sidecar_samples(sc_path) + sc_tran = [int(round(v / GEO_LSB)) for v in sc["Tran"]] + dec = res.samples.get("Tran", []) + n = min(len(sc_tran), len(dec)) + exact = sum(1 for i in range(n) if sc_tran[i] == dec[i]) if n else 0 + pct = 100.0 * exact / n if n else 0.0 + print(f"{f.name:40s} size={f.stat().st_size:6d} sc_n={len(sc_tran):4d} dec_n={len(dec):4d} exact={pct:.1f}%") + shown += 1 + if shown >= 20: + break + + +if __name__ == "__main__": + main() diff --git a/analysis_idf/probe_boundary.py b/analysis_idf/probe_boundary.py new file mode 100644 index 0000000..bbf2722 --- /dev/null +++ b/analysis_idf/probe_boundary.py @@ -0,0 +1,64 @@ +"""Look at what's at the divergence boundary.""" +from __future__ import annotations +import sys +from pathlib import Path + +REPO = Path(__file__).resolve().parents[1] +sys.path.insert(0, str(REPO)) + +from minimateplus.waveform_codec import walk_body, find_data_start, parse_segment_header +from analysis_idf.recon import TARGET, TXT, load_sidecar_samples + + +def main(): + buf = TARGET.read_bytes() + body = buf[0x0f1f:] + start = find_data_start(body) + print(f"data_start: {start} (= file offset 0x{0x0f1f + start:04x})") + + blocks = walk_body(body, start) + print(f"{len(blocks)} blocks total") + print() + + # First 30 blocks + print("=== first 30 blocks ===") + for i, b in enumerate(blocks[:30]): + body_off = 0x0f1f + b.offset + if b.tag_hi == 0x40: + hdr = parse_segment_header(b) + print(f" [{i:3d}] @0x{body_off:04x} {b.kind} (segment header) counter={hdr['counter'] if hdr else '?'} field2={hdr['field2'].hex() if hdr else '?'} anchor={hdr['anchor_bytes'].hex() if hdr else '?'} tail={hdr['tail'].hex() if hdr else '?'}") + else: + print(f" [{i:3d}] @0x{body_off:04x} {b.kind} len={b.length} data={b.data[:16].hex()}") + print() + + # Cumulative
sample counts per block to find which block contains sample 254 + print("=== cumulative samples through blocks ===") + cur_ch = "Tran" + rotation = ["Vert", "Long", "MicL", "Tran"] + seg_count = 0 + samples_in_curseg = 2 # preamble Tran[0], Tran[1] + for i, b in enumerate(blocks[:30]): + if b.tag_hi == 0x40: + seg_count += 1 + prev_ch = cur_ch + cur_ch = rotation[(seg_count - 1) % 4] + print(f" [{i:3d}] 40 02 -> end of {prev_ch} segment, start {cur_ch} (segment {seg_count})") + samples_in_curseg = 2 # anchors + elif (b.tag_hi & 0xF0) == 0x10: + nn = ((b.tag_hi & 0x0F) << 8) | b.tag_lo + samples_in_curseg += nn + print(f" [{i:3d}] {b.kind} nibble: +{nn} samples, ch={cur_ch}, ch_total~{samples_in_curseg}") + elif (b.tag_hi & 0xF0) == 0x20: + nn = ((b.tag_hi & 0x0F) << 8) | b.tag_lo + samples_in_curseg += nn + print(f" [{i:3d}] {b.kind} int8: +{nn} samples, ch={cur_ch}, ch_total~{samples_in_curseg}") + elif b.tag_hi == 0x00: + samples_in_curseg += b.tag_lo + print(f" [{i:3d}] {b.kind} RLE: +{b.tag_lo}, ch={cur_ch}, ch_total~{samples_in_curseg}") + elif b.tag_hi == 0x30: + samples_in_curseg += b.tag_lo + print(f" [{i:3d}] {b.kind} packed12: +{b.tag_lo} samples, ch={cur_ch}, ch_total~{samples_in_curseg}") + + +if __name__ == "__main__": + main() diff --git a/analysis_idf/recon.py b/analysis_idf/recon.py new file mode 100644 index 0000000..f87a060 --- /dev/null +++ b/analysis_idf/recon.py @@ -0,0 +1,89 @@ +"""Reconnaissance helpers for cracking the Thor IDFW binary.""" +from __future__ import annotations + +import sys +from pathlib import Path + +REPO = Path(__file__).resolve().parents[1] +sys.path.insert(0, str(REPO)) + +TARGET = REPO / "tests/fixtures/THORDATA_example/THORDATA_example/UPMC Presby/UM11719/UM11719_20231219162723.IDFW" +TXT = REPO / "tests/fixtures/THORDATA_example/THORDATA_example/UPMC Presby/UM11719/TXT/UM11719_20231219162723.IDFW.txt" + + +def hex_at(buf: bytes, off: int, n: int = 32) -> str: + chunk = buf[off : off + n] + hexs = " ".join(f"{b:02x}" 
for b in chunk) + asc = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk) + return f"{off:04x}: {hexs} {asc}" + + +def find_all(buf: bytes, needle: bytes) -> list[int]: + out: list[int] = [] + i = 0 + while True: + j = buf.find(needle, i) + if j < 0: + break + out.append(j) + i = j + 1 + return out + + +def load_sidecar_samples(path: Path) -> dict[str, list[float]]: + """Parse the txt sample table — Tran/Vert/Long/MicL.""" + out = {"Tran": [], "Vert": [], "Long": [], "MicL": []} + in_block = False + for line in path.read_text(errors="replace").splitlines(): + if not in_block: + if line.strip() == "Waveform Data Channels": + in_block = True + continue + if line.startswith("Waveform Data USB Channels"): + break + parts = line.split("\t") + # First row is the header "\tTran\tVert\tLong\tMicL" + if len(parts) >= 5 and parts[1] == "Tran": + continue + if len(parts) < 5: + continue + try: + out["Tran"].append(float(parts[1])) + out["Vert"].append(float(parts[2])) + out["Long"].append(float(parts[3])) + out["MicL"].append(float(parts[4])) + except ValueError: + continue + return out + + +def main(): + buf = TARGET.read_bytes() + samples = load_sidecar_samples(TXT) + print(f"file size: {len(buf)} bytes") + print(f"sample rows: Tran={len(samples['Tran'])} Vert={len(samples['Vert'])} Long={len(samples['Long'])} MicL={len(samples['MicL'])}") + print(f"first 6 Tran samples: {samples['Tran'][:6]}") + print(f"first 6 Vert samples: {samples['Vert'][:6]}") + print(f"first 6 Long samples: {samples['Long'][:6]}") + print(f"first 6 MicL samples: {samples['MicL'][:6]}") + + print() + print("=== BW magic '00 02 00' positions ===") + hits = find_all(buf, b"\x00\x02\x00") + print(f"{len(hits)} hits") + for h in hits[:20]: + print(hex_at(buf, h, 24)) + + print() + print("=== '40 02' segment-header positions ===") + hits = find_all(buf, b"\x40\x02") + print(f"{len(hits)} hits") + for h in hits: + ctx_pre = buf[max(0, h - 4): h].hex() + ctx_post = buf[h: h + 20].hex() + # Show byte 
preceding to help identify real headers vs casual occurrences + print(f" 0x{h:04x} pre={ctx_pre} post={ctx_post}") + + +if __name__ == "__main__": + main() diff --git a/analysis_idf/seg_resync.py b/analysis_idf/seg_resync.py new file mode 100644 index 0000000..6697cc8 --- /dev/null +++ b/analysis_idf/seg_resync.py @@ -0,0 +1,40 @@ +"""Find each segment boundary in the channel and check if errors reset there.""" +from __future__ import annotations +import sys +from pathlib import Path + +REPO = Path(__file__).resolve().parents[1] +sys.path.insert(0, str(REPO)) + +from minimateplus.waveform_codec import decode_waveform_v2 +from analysis_idf.recon import TARGET, TXT, load_sidecar_samples + + +def main(): + buf = TARGET.read_bytes() + sc = load_sidecar_samples(TXT) + decoded = decode_waveform_v2(buf[0x0f1f:]) + GEO_LSB = 0.0003 + + for ch in ("Tran", "Vert", "Long"): + sc_counts = [int(round(v / GEO_LSB)) for v in sc[ch]] + dec = decoded[ch] + # Find every transition where error becomes zero from nonzero (or grows from zero) + # Print indices where dec resyncs back to exact match. 
+ n = min(len(sc_counts), len(dec)) + events = [] + prev_match = True + for i in range(n): + match = sc_counts[i] == dec[i] + if match != prev_match: + kind = "RESYNC" if match else "DIVERGE" + events.append((i, kind, sc_counts[i], dec[i])) + prev_match = match + print(f"{ch}: {len(events)} transitions") + for i, kind, sc_v, dec_v in events[:20]: + print(f" idx {i:4d} {kind:8s} sc={sc_v:6d} dec={dec_v:6d} diff={dec_v-sc_v:+d}") + print() + + +if __name__ == "__main__": + main() diff --git a/analysis_idf/smoke_idfh.py b/analysis_idf/smoke_idfh.py new file mode 100644 index 0000000..ab1eb64 --- /dev/null +++ b/analysis_idf/smoke_idfh.py @@ -0,0 +1,46 @@ +"""Smoke-test read_idf_file on IDFH across the corpus.""" +from __future__ import annotations +import sys +from pathlib import Path + +REPO = Path(__file__).resolve().parents[1] +sys.path.insert(0, str(REPO)) + +from micromate.idf_file import read_idf_file + + +def main(): + target = REPO / "tests/fixtures/THORDATA_example/THORDATA_example/UPMC Presby/UM11719/UM11719_20231219162648.IDFH" + result = read_idf_file(target) + ev = result.event + print(f"=== {target.name} ===") + print(f" signature: {result.signature}") + print(f" serial: {ev.serial}") + print(f" timestamp: {ev.timestamp}") + print(f" sample_rate: {ev.sample_rate}") + print(f" kind: {ev.kind}") + print(f" intervals: {len(result.intervals or [])}") + print(f" peaks: T={ev.peaks.transverse_ips:.4f} V={ev.peaks.vertical_ips:.4f} L={ev.peaks.longitudinal_ips:.4f}") + print() + + root = REPO / "tests/fixtures/THORDATA_example" + files = list(root.rglob("*.IDFH")) + ok = fail = nyi = 0 + total_intervals = 0 + for f in files: + try: + r = read_idf_file(f) + ok += 1 + total_intervals += len(r.intervals or []) + except NotImplementedError: + nyi += 1 + except Exception as exc: + fail += 1 + if fail <= 3: + print(f" FAIL: {f.name}: {type(exc).__name__}: {exc}") + print(f"Corpus: {len(files)} IDFH files | ok={ok} fail={fail} nyi={nyi}") + print(f"Total intervals 
decoded: {total_intervals}") + + +if __name__ == "__main__": + main() diff --git a/analysis_idf/smoke_test.py b/analysis_idf/smoke_test.py new file mode 100644 index 0000000..a0be7c6 --- /dev/null +++ b/analysis_idf/smoke_test.py @@ -0,0 +1,48 @@ +"""Smoke-test read_idf_file across the sample corpus.""" +from __future__ import annotations +import sys +from pathlib import Path + +REPO = Path(__file__).resolve().parents[1] +sys.path.insert(0, str(REPO)) + +from micromate.idf_file import read_idf_file, geo_count_to_ips, mic_count_to_psi + + +def main(): + target = REPO / "tests/fixtures/THORDATA_example/THORDATA_example/UPMC Presby/UM11719/UM11719_20231219162723.IDFW" + result = read_idf_file(target) + ev = result.event + print(f"=== {target.name} ===") + print(f" signature: {result.signature}") + print(f" serial: {ev.serial}") + print(f" timestamp: {ev.timestamp}") + print(f" sample_rate: {ev.sample_rate}") + print(f" record_time: {ev.record_time_sec}") + print(f" calibration: {result.binary_metadata.calibration_date}") + print(f" Tran samples: {len(result.samples['Tran'])}, peak_ips={ev.peaks.transverse_ips:.4f}") + print(f" Vert samples: {len(result.samples['Vert'])}, peak_ips={ev.peaks.vertical_ips:.4f}") + print(f" Long samples: {len(result.samples['Long'])}, peak_ips={ev.peaks.longitudinal_ips:.4f}") + print(f" MicL samples: {len(result.samples['MicL'])}") + print() + + # Corpus sweep + root = REPO / "tests/fixtures/THORDATA_example" + files = [f for f in root.rglob("*.IDFW") if not str(f).endswith(".CDB")] + ok = fail = nyi = 0 + for f in files: + try: + r = read_idf_file(f) + ok += 1 + except NotImplementedError: + nyi += 1 + except Exception as exc: + fail += 1 + if fail <= 5: + print(f" FAIL: {f.name}: {type(exc).__name__}: {exc}") + print() + print(f"Corpus: {len(files)} IDFW files | ok={ok} fail={fail} not-implemented={nyi}") + + +if __name__ == "__main__": + main() diff --git a/analysis_idf/test_adapter.py b/analysis_idf/test_adapter.py new file mode 
100644 index 0000000..9b12d12 --- /dev/null +++ b/analysis_idf/test_adapter.py @@ -0,0 +1,47 @@ +"""Verify build_bw_report_from_idf against a known sidecar.""" +from __future__ import annotations +import json +import sys +from pathlib import Path + +REPO = Path(__file__).resolve().parents[1] +sys.path.insert(0, str(REPO)) + +from micromate.idf_ascii_report import parse_idf_report +from micromate.idf_to_bw_report import build_bw_report_from_idf +from micromate.idf_file import read_idf_file + + +def show(prefix: str, d: dict, indent: int = 0): + for k, v in d.items(): + if isinstance(v, dict): + print(f"{' '*indent}{prefix}{k}:") + show("", v, indent + 1) + else: + print(f"{' '*indent}{prefix}{k}: {v!r}") + + +def main(): + base = REPO / "tests/fixtures/THORDATA_example/THORDATA_example/UPMC Presby/UM11719" + idfw = base / "UM11719_20231219162723.IDFW" + txt = base / "TXT" / f"{idfw.name}.txt" + + report_dict = parse_idf_report(txt.read_text(errors="replace")) + res = read_idf_file(idfw) + bw = build_bw_report_from_idf(report_dict, binary_md=res.binary_metadata) + + print("=== IDFW → bw_report ===") + show("", bw) + + print() + print("=== IDFH (single trigger row) ===") + idfh = base / "UM11719_20231219162648.IDFH" + txt_h = base / "TXT" / f"{idfh.name}.txt" + rh = parse_idf_report(txt_h.read_text(errors="replace")) + res_h = read_idf_file(idfh) + bw_h = build_bw_report_from_idf(rh, binary_md=res_h.binary_metadata, intervals=res_h.intervals) + show("", bw_h) + + +if __name__ == "__main__": + main() diff --git a/analysis_idf/thor_report.pdf b/analysis_idf/thor_report.pdf new file mode 100644 index 0000000..52b3096 Binary files /dev/null and b/analysis_idf/thor_report.pdf differ diff --git a/analysis_idf/thor_report_idfh.pdf b/analysis_idf/thor_report_idfh.pdf new file mode 100644 index 0000000..2cf2b4f Binary files /dev/null and b/analysis_idf/thor_report_idfh.pdf differ diff --git a/analysis_idf/trace_path.py b/analysis_idf/trace_path.py new file mode 100644 index 
0000000..fb9fb04 --- /dev/null +++ b/analysis_idf/trace_path.py @@ -0,0 +1,73 @@ +"""Trace Tran sample-by-sample to find exactly where the codec drifts.""" +from __future__ import annotations +import sys +from pathlib import Path + +REPO = Path(__file__).resolve().parents[1] +sys.path.insert(0, str(REPO)) + +from analysis_idf.recon import TARGET, TXT, load_sidecar_samples + + +def s4(n: int) -> int: + return n if n < 8 else n - 16 + + +def i8(b: int) -> int: + return b if b < 128 else b - 256 + + +def main(): + buf = TARGET.read_bytes() + sc = load_sidecar_samples(TXT) + GEO_LSB = 0.0003 + sc_tran = [int(round(v / GEO_LSB)) for v in sc["Tran"]] + + body = buf[0x0f1f:] + # Tran[0], Tran[1] from preamble + t0 = int.from_bytes(body[3:5], "big", signed=True) + t1 = int.from_bytes(body[5:7], "big", signed=True) + print(f"preamble Tran[0]={t0} Tran[1]={t1} (sidecar: {sc_tran[0]}, {sc_tran[1]})") + + # Block 0: 10 f8 at body[7:9] + print(f"block 0: tag {body[7]:02x} {body[8]:02x}") + print(f" block 0 first 10 data bytes: {body[9:19].hex()}") + + # Walk block 0 manually, comparing each sample + cur = t1 + samples = [t0, t1] + block_off = 7 + nn = body[8] + print(f" NN = {nn}") + data = body[9 : 9 + nn // 2] + for byi, byte in enumerate(data): + for nib_idx, nib in enumerate(((byte >> 4) & 0xF, byte & 0xF)): + cur += s4(nib) + samples.append(cur) + idx = len(samples) - 1 + if 0 <= idx < len(sc_tran): + sc_v = sc_tran[idx] + match = "✓" if sc_v == cur else "✗" + if idx < 12 or 240 <= idx <= 260: + print(f" idx {idx:3d}: nibble byte={byte:02x} nib={nib:x} delta={s4(nib):+d} cur={cur:+d} sc={sc_v:+d} {match}") + + print(f"end of block 0: cur={cur}, len(samples)={len(samples)}, decoder expected 250 here") + # Block 1: 20 28 starts at offset 9 + 124 = 133 from block_off=7 + block1_off = 9 + nn // 2 + print(f"block 1: tag {body[block1_off]:02x} {body[block1_off+1]:02x} (expecting 20 28)") + nn1 = body[block1_off + 1] + print(f" block 1 NN = {nn1}") + data1 = body[block1_off + 2 : 
block1_off + 2 + nn1] + for byi, byte in enumerate(data1): + cur += i8(byte) + samples.append(cur) + idx = len(samples) - 1 + if idx < len(sc_tran): + sc_v = sc_tran[idx] + match = "✓" if sc_v == cur else "✗" + if 248 <= idx <= 295: + print(f" idx {idx:3d}: int8 byte={byte:02x} delta={i8(byte):+d} cur={cur:+d} sc={sc_v:+d} {match}") + + +if __name__ == "__main__": + main() diff --git a/analysis_idf/try_codec.py b/analysis_idf/try_codec.py new file mode 100644 index 0000000..e0f5269 --- /dev/null +++ b/analysis_idf/try_codec.py @@ -0,0 +1,42 @@ +"""Feed candidate body offsets to the BW codec and compare with sidecar.""" +from __future__ import annotations +import sys +from pathlib import Path + +REPO = Path(__file__).resolve().parents[1] +sys.path.insert(0, str(REPO)) + +from minimateplus.waveform_codec import decode_waveform_v2, walk_body, find_data_start +from analysis_idf.recon import TARGET, TXT, load_sidecar_samples + + +def main(): + buf = TARGET.read_bytes() + sc = load_sidecar_samples(TXT) + # Sidecar samples in 0.0003 counts (Thor geo LSB). + sc_tran = [int(round(v / 0.0003)) for v in sc["Tran"][:30]] + sc_vert = [int(round(v / 0.0003)) for v in sc["Vert"][:30]] + sc_long = [int(round(v / 0.0003)) for v in sc["Long"][:30]] + sc_micl = [int(round(v / 1e-6)) for v in sc["MicL"][:30]] # 1 µ unit for mic? Will iterate. + print(f"sidecar Tran (counts): {sc_tran}") + print(f"sidecar Vert (counts): {sc_vert}") + print(f"sidecar Long (counts): {sc_long}") + print(f"sidecar MicL (×1e-6): {sc_micl}") + print() + + # Try candidate body start offsets. 
+ for off in (0x0f1f, 0x1057, 0x11f1, 0x1333, 0x1bde, 0x0d30): + print(f"=== body @ 0x{off:04x} ===") + body = buf[off:] + decoded = decode_waveform_v2(body) + if not decoded: + print(" decode_waveform_v2 returned None") + continue + for ch in ("Tran", "Vert", "Long", "MicL"): + arr = decoded.get(ch, []) + print(f" {ch}[{len(arr)}]: {arr[:20]}") + print() + + +if __name__ == "__main__": + main() diff --git a/analysis_idf/verify_full.py b/analysis_idf/verify_full.py new file mode 100644 index 0000000..ebc8b49 --- /dev/null +++ b/analysis_idf/verify_full.py @@ -0,0 +1,51 @@ +"""Verify decode_waveform_v2 against sidecar across all 2304 samples per channel.""" +from __future__ import annotations +import sys +from pathlib import Path + +REPO = Path(__file__).resolve().parents[1] +sys.path.insert(0, str(REPO)) + +from minimateplus.waveform_codec import decode_waveform_v2 +from analysis_idf.recon import TARGET, TXT, load_sidecar_samples + + +def main(): + buf = TARGET.read_bytes() + sc = load_sidecar_samples(TXT) + body = buf[0x0f1f:] + decoded = decode_waveform_v2(body) + + print(f"Sidecar lengths: Tran={len(sc['Tran'])} Vert={len(sc['Vert'])} Long={len(sc['Long'])} MicL={len(sc['MicL'])}") + print(f"Decoded lengths: Tran={len(decoded['Tran'])} Vert={len(decoded['Vert'])} Long={len(decoded['Long'])} MicL={len(decoded['MicL'])}") + print() + + GEO_LSB = 0.0003 # in/s per count + for ch in ("Tran", "Vert", "Long"): + sc_counts = [int(round(v / GEO_LSB)) for v in sc[ch]] + dec = decoded[ch] + n = min(len(sc_counts), len(dec)) + matches = sum(1 for i in range(n) if sc_counts[i] == dec[i]) + first_mismatch = next((i for i in range(n) if sc_counts[i] != dec[i]), None) + print(f"{ch}: compared {n}, exact matches {matches} ({100*matches/n if n else 0.0:.2f}%)") + if first_mismatch is not None: + i = first_mismatch + print(f" first mismatch at idx {i}: sidecar={sc_counts[i]} ({sc[ch][i]}), decoded={dec[i]}") + print(f" context sidecar[{i-2}..{i+5}]: {sc_counts[max(0,i-2):i+5]}") + print(f"
context decoded[{i-2}..{i+5}]: {dec[max(0,i-2):i+5]}") + + # MicL: find the multiplicative factor that fits + print() + print("=== MicL scale analysis ===") + sc_micl = sc["MicL"] + dec_micl = decoded["MicL"] + # Skip zero values when computing ratio + ratios = [sc_micl[i] / dec_micl[i] for i in range(min(50, len(sc_micl), len(dec_micl))) if dec_micl[i] != 0] + if ratios: + avg = sum(ratios) / len(ratios) + print(f" avg ratio sidecar/decoded over first 50 nonzero: {avg:.4e} (n={len(ratios)})") + print(f" ratios sample: {[f'{r:.4e}' for r in ratios[:6]]}") + + +if __name__ == "__main__": + main() diff --git a/docs/idf_protocol_reference.md b/docs/idf_protocol_reference.md index 643de53..aef3c69 100644 --- a/docs/idf_protocol_reference.md +++ b/docs/idf_protocol_reference.md @@ -6,11 +6,68 @@ Series IV event-file format. Sibling to Series III "Rosetta Stone") — this doc holds what we know so far and the open questions still to crack. -**Status (2026-05-20):** ASCII text sidecar fully decoded (1,014 -sample files round-trip). Binary `.IDFH` / `.IDFW` codec -**not yet implemented** — binaries are stored opaquely by -`WaveformStore.save_imported_idf`, with metadata sourced from the -paired `.txt` sidecar. +**Status (2026-05-28):** ASCII text sidecar fully decoded (1,014 +sample files round-trip). **Thor IDFW** binary now decodes via +`micromate.idf_file.read_idf_file()` — reuses the BW segment-rotated +block codec verbatim at fixed body offset `0x0f1f`; metadata (serial, +timestamp, sample_rate, record_time, calibration_date) extracted from +the binary header. Sample fidelity is 87–99% byte-exact on quiet +events; loud events hit the BW codec's known walker-stops-early +limitation. Residual ~3% drift on per-sample deltas (likely a +Thor-specific 12-bit delta refinement not yet modelled). 
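The header metadata read that the status paragraph describes can be sketched as below. This is a minimal sketch, not the code in `micromate.idf_file`: `read_header_metadata` and the constant names are hypothetical, and the offsets (serial at `0x14E`, calibration date at `0x194`, timestamp at `0x97A`, and the `be 80 00 00 00 00` compliance anchor with sample_rate 6 bytes before it and record_time right after it) are the empirical values listed under "Codec breakthroughs (2026-05-28)", assumed here to be fixed across files.

```python
import datetime
import struct
from typing import Optional

# Empirical offsets from the example corpus (assumption: fixed across files).
SERIAL_OFF = 0x14E      # null-terminated ASCII serial
CAL_DATE_OFF = 0x194    # [day][month][year_be]
TIMESTAMP_OFF = 0x97A   # [day][month][year_be][unk][hour][min][sec]
COMPLIANCE_ANCHOR = b"\xbe\x80\x00\x00\x00\x00"


def _safe_datetime(*parts: int) -> Optional[datetime.datetime]:
    """Return a datetime, or None if the bytes don't form a valid date."""
    try:
        return datetime.datetime(*parts)
    except ValueError:
        return None


def read_header_metadata(buf: bytes) -> dict:
    """Hypothetical helper: pull header metadata out of a Thor IDFW buffer."""
    # Serial: ASCII up to the first NUL at or after SERIAL_OFF.
    end = buf.find(b"\x00", SERIAL_OFF)
    if end < 0:
        end = len(buf)
    serial = buf[SERIAL_OFF:end].decode("ascii", errors="replace")

    # sample_rate (uint16 BE) sits 6 bytes before the anchor; record_time
    # (float32 BE) starts right after the 6-byte anchor.
    sample_rate = record_time = None
    anchor = buf.find(COMPLIANCE_ANCHOR)
    if anchor >= 6 and anchor + 10 <= len(buf):
        sample_rate = struct.unpack_from(">H", buf, anchor - 6)[0]
        record_time = struct.unpack_from(">f", buf, anchor + 6)[0]

    # 8-byte timestamp; the fifth byte is still unidentified.
    day, month, year, _unk, hour, minute, sec = struct.unpack_from(
        ">BBHBBBB", buf, TIMESTAMP_OFF)
    timestamp = _safe_datetime(year, month, day, hour, minute, sec)

    cal_day, cal_month, cal_year = struct.unpack_from(">BBH", buf, CAL_DATE_OFF)
    cal = _safe_datetime(cal_year, cal_month, cal_day)
    return {
        "serial": serial,
        "sample_rate": sample_rate,
        "record_time": record_time,
        "timestamp": timestamp,
        "calibration_date": cal.date() if cal else None,
    }
```

Searching for the compliance anchor instead of hard-coding `0x952` is a hedge against header padding moving it; the other fields are still read at fixed offsets, so values that come back `None` or implausible should fall back to the `.txt` sidecar.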
+ +**Thor IDFH histograms also decoded.** Body has one or more segments; +each 10-byte segment header `[length_be 2B][0a 00 00 00][00 NN][05 3f]` +introduces `N = (length - 10) // 72` interval records of 72 bytes +each. Each interval = 4 × 16-byte per-channel records followed by an 8-byte tail; per-channel layout: +`[int16 min][int16 max][int16 ??][uint16 halfp][2B 00][uint16 ??][2B 00][uint16 ??]`. +Geo peak `= max(|min|, |max|) / 32768 × 10` in/s (matches sidecar +within ~1.8%); freq `= 512 / halfp` Hz (None for halfp ≤ 5 → ">100" +sentinel). Corpus: **all 859 Thor IDFH files decode, 181,071 +intervals**. Wired through `read_idf_file()` → +`save_imported_idf()` → sidecar's `extensions.idf_intervals`. + +**Note on the BE9439 outliers in the example corpus:** Two files +(`BE9439_20200713131747.IDFW` and `BE9439_20200713124251.IDFH`) are +**Series III Blastware** binaries, not Thor. Provenance: TMI tried +to use Thor to manage auto-call-homes for Series III units; the +experiment didn't work out, but it did leave a few BW event files +in Thor's per-serial directory structure with `.IDFW`/`.IDFH` +extensions — Thor's forwarder applied its own naming convention to +the BW bodies it was relaying. Their header `10 00 01 80 00 00 +Instantel STRT ff fe ` is the BW SUB 5A STRT +record, not a Thor body preamble. The reader detects them by +signature and raises `NotImplementedError` pointing callers at +`read_blastware_file()`, which extracts BW-format peaks from them. + +**Still NYI for Thor IDFH:** per-channel `int16 field4` (possibly +time-of-peak); the two uint16 fields (probably PVS contributions); +8-byte interval tail (PVS data); mic dB(L) exact conversion constant. + +### Codec breakthroughs (2026-05-28) + +- **Body offset is a fixed `0x0f1f`** across 151/154 corpus IDFW + files. Preceded by a 4-byte record-type marker (`46 00 00 00`) + + magic preamble `00 02 00 [Tran[0] BE] [Tran[1] BE]`.
+- **Sample stream is BW's segment-rotated block codec verbatim.** + Thor reuses `10 NN` (nibble), `20 NN` (int8), `00 NN` (RLE), + `30 NN` (packed12), `40 02` (segment header) tags with the same + semantics. Channel rotation Tran→Vert→Long→MicL. +- **Geo LSB = 0.0003 in/s** (not BW's 0.005), because Thor's 16-bit + ADC range maps to 10 in/s without the 16-count BW quantization step. +- **Mic ≈ 2.14×10⁻⁶ psi/count** (rough scale; refine after channel + block calibration constants are decoded). +- **BW compliance anchor `\xbe\x80\x00\x00\x00\x00` reappears at + IDFW offset 0x952** — sample_rate at anchor−6 (uint16 BE), + record_time at anchor+6 (float32 BE), same layout as BW. +- **Event timestamp at offset 0x97A** — 8 bytes `[day][month] + [year_be][unk][hour][min][sec]`. Stop-time mirrors at 0x982. +- **Serial as null-terminated ASCII at 0x14E**. +- **Calibration date** at 0x194–0x197 (day, month, year_be). +- Per-sample residual drift of ~3% suggests Thor encodes int8/nibble + deltas with an extra refinement bit that BW doesn't carry — + unsolved; errors resync within a few samples so cumulative impact + is small. --- diff --git a/micromate/idf_ascii_report.py b/micromate/idf_ascii_report.py index 853478d..d4db2c6 100644 --- a/micromate/idf_ascii_report.py +++ b/micromate/idf_ascii_report.py @@ -210,8 +210,7 @@ def parse_idf_report(text: Union[str, bytes]) -> Dict[str, Any]: "long_peak_acceleration", "tran_peak_displacement", "vert_peak_displacement", "long_peak_displacement", - "tran_time_of_peak", "vert_time_of_peak", "long_time_of_peak", - "mic_time_of_peak", "mic_zc_freq", + "mic_zc_freq", ) for key in float_fields: v = raw.get(key) @@ -223,6 +222,22 @@ def parse_idf_report(text: Union[str, bytes]) -> Dict[str, Any]: else: out.pop(key, None) + # Time-of-peak: Thor labels these "TimeofPeak" (lowercase "of") so the + # normalizer produces "*_timeof_peak". Map them to the canonical + # ``*_time_of_peak`` output keys for downstream consumers. 
+ for raw_key, out_key in ( + ("tran_timeof_peak", "tran_time_of_peak"), + ("vert_timeof_peak", "vert_time_of_peak"), + ("long_timeof_peak", "long_time_of_peak"), + ("mic_timeof_peak", "mic_time_of_peak"), + ): + v = raw.get(raw_key) + if v is None: + continue + fv = _parse_float(v) + if fv is not None: + out[out_key] = fv + # Microphone — Thor reports MicPSPL (dB(L)) which is the closest # analogue to BW's mic_ppv. The raw "99.4 dB(L)" string stays in # `out` under the original `mic_pspl` key for display; the parsed diff --git a/micromate/idf_file.py b/micromate/idf_file.py index b3cd669..f3db878 100644 --- a/micromate/idf_file.py +++ b/micromate/idf_file.py @@ -1,64 +1,530 @@ """ -micromate/idf_file.py — placeholder for the Thor IDF binary codec. +micromate/idf_file.py — Thor IDF binary codec. -Thor's ``.IDFH`` (histogram) and ``.IDFW`` (waveform) event files are an -Instantel proprietary binary format that has not yet been reverse- -engineered. Today seismo-relay treats them as opaque blobs: -``WaveformStore.save_imported_idf`` stores the bytes verbatim and reads -all device-authoritative metadata from the paired ``.IDFW.txt`` / -``.IDFH.txt`` ASCII sidecar (parsed by ``idf_ascii_report.py``). +Decodes the Instantel Micromate Series IV ``.IDFW`` (waveform) and +``.IDFH`` (histogram) binary on-disk format. Sister module to +``minimateplus/event_file_io.py``. -When we crack the binary codec — same reverse-engineering playbook we -used to byte-perfect-parse Series III BW files (see -``docs/instantel_protocol_reference.md`` and ``minimateplus/event_file_io.py``) -— this module will grow: +Status (2026-05-28): - - ``read_idf_file(path) -> IdfEvent`` - Parse a ``.IDFW``/``.IDFH`` binary and return a fully populated - ``IdfEvent`` whose waveform-sample arrays come from the binary - (the .txt sidecar's tabular sample block being a best-effort - check). 
Lets us ingest Thor events even when the operator - hasn't enabled the .txt exporter — closing the - ``had_report=False`` gap that the thor-watcher forwarder - currently tolerates as a known limitation. +- **Genuine Series IV / Thor binaries** are all signed + ``00 12 01 00 00 00 Instantel\\0`` (sig-A in earlier notes). Two + Series III (Blastware) binaries appear in the example corpus + (``BE9439_*``) — they share the ``.IDFW``/``.IDFH`` extension by + filing convention but carry a BW STRT header (``10 00 01 80 00 00 + Instantel STRT...``) and are NOT Thor data. The reader detects + them by signature and raises NotImplementedError pointing callers + at ``minimateplus.event_file_io.read_blastware_file()``. +- **IDFW waveform body** reuses the BW segment-rotated block codec + verbatim. The body offset is not fixed: ~50% of events start at + ``0x0f1f`` and the rest anywhere up to ``0x3082`` depending on header + padding, so ``_find_waveform_body_offset`` trial-decodes every + ``00 02 00`` candidate past ``0x0e00`` and keeps the one yielding the + most samples. Samples decoded via ``minimateplus.waveform_codec.decode_waveform_v2`` + with 87–99% byte-exact match against the ``.IDFW.txt`` sidecar (quiet + events). Loud events hit the BW codec's known walker-stops-early + limit. Residual ~3% drift on per-sample deltas — likely a + Thor-specific 12-bit delta refinement that BW's codec doesn't + model. Geo LSB = 0.0003 in/s; mic factor ~2.14e-6 psi/count. +- **IDFH histogram body**: 10-byte segment header + ``[len_be 2B] 0a 00 00 00 [00 NN_counter] 05 3f`` introduces a + segment of ``N`` 72-byte interval records (``N = (len - 10) // 72``). + Each record holds 4 × 16-byte per-channel min/max/halfp + 8-byte + tail. Geo peaks via ``max(|min|, |max|) / 32768 × 10`` in/s + (matches sidecar within ~1.8%), freq via ``512 / halfp`` Hz. + **All 859 Thor IDFH files in the corpus decode (181,071 intervals).** +- Binary metadata directly extracted: serial, timestamp, sample_rate, + record_time, calibration_date. Other fields fall back to the paired + ``.IDFW.txt`` / ``.IDFH.txt`` sidecar (consumed by + ``WaveformStore.save_imported_idf``).
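A minimal sketch of the IDFH segment arithmetic described above. It assumes the 10-byte header `[len_be 2B][0a 00 00 00][00 NN][05 3f]`, 72-byte interval records, and 16-byte channel blocks with min/max as int16 BE at bytes 0-3 and halfp as uint16 BE at bytes 6-7 (as read by `_decode_idfh_interval`). The helpers `build_segment`, `walk_segment` and `channel_peak_and_freq` are hypothetical, and the segment is synthetic. The sketch only shows the peak and `512 / halfp` frequency math for the Tran block. It is not the production walker.

```python
import struct
from typing import List, Optional, Tuple

HEADER = 10  # [len_be 2B][0a 00 00 00][00 NN][05 3f]
RECORD = 72  # 4 x 16-byte channel blocks + 8-byte tail
GEO_FULL_SCALE_IPS = 10.0
INT16_FS = 32768.0


def channel_peak_and_freq(block: bytes) -> Tuple[float, Optional[float]]:
    """Decode one 16-byte channel block into (peak in/s, freq Hz or None)."""
    mn, mx = struct.unpack_from(">hh", block, 0)
    halfp = struct.unpack_from(">H", block, 6)[0]
    peak_ips = max(abs(mn), abs(mx)) / INT16_FS * GEO_FULL_SCALE_IPS
    freq = None if halfp <= 5 else 512.0 / halfp  # halfp <= 5 is the ">100 Hz" sentinel
    return peak_ips, freq


def build_segment(records: List[bytes], counter: int = 1) -> bytes:
    """Hypothetical helper: assemble one synthetic segment (header + records)."""
    length = HEADER + RECORD * len(records)
    header = (
        length.to_bytes(2, "big")
        + b"\x0a\x00\x00\x00"
        + bytes([0x00, counter])
        + b"\x05\x3f"
    )
    return header + b"".join(records)


def walk_segment(seg: bytes) -> List[Tuple[float, Optional[float]]]:
    """Return (peak, freq) of the Tran block (first 16 bytes) of each interval."""
    length = int.from_bytes(seg[0:2], "big")
    n = (length - HEADER) // RECORD
    out = []
    for k in range(n):
        rec = seg[HEADER + k * RECORD : HEADER + (k + 1) * RECORD]
        out.append(channel_peak_and_freq(rec[0:16]))
    return out


# Tran block: min=-3277, max=1000, unknown int16, halfp=64; remaining bytes zero.
tran = struct.pack(">hhhH", -3277, 1000, 0, 64) + bytes(8)
record = tran + bytes(RECORD - len(tran))
result = walk_segment(build_segment([record, record]))
# peak ~= 3277 / 32768 * 10 ~= 1.0 in/s, freq = 512 / 64 = 8.0 Hz
```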
- - ``write_idf_file(path, event)`` (eventually) - Round-trip event reconstruction, used for verifying the codec - against captured device files the way ``write_blastware_file`` - verifies the Series III codec. - - - Helpers for decoding the binary's per-channel sample arrays into - physical units, the per-event flash buffer's monitor-log records, - etc. - -The reverse-engineering path: pair every ``.IDFW`` binary in -``thor-watcher/example-data/`` with its sibling ``.IDFW.txt``, treating -the txt's "Waveform Data Channels" block as ground-truth, and align -the binary's per-channel int16-or-similar arrays against it. Header -fields (sample rate, channel count, record time, timestamps) sit before -the sample block — same approach as the BW codec where ASCII strings -inside the binary (``Project:``, ``Client:``, etc.) anchored field -discovery. +The full reverse-engineering writeup lives in +``docs/idf_protocol_reference.md``. """ from __future__ import annotations +import datetime +import struct +from dataclasses import dataclass from pathlib import Path -from typing import Union +from typing import Optional, Union -from .models import IdfEvent +from minimateplus.waveform_codec import decode_waveform_v2 + +from .models import IdfEvent, IdfPeaks, IdfReport -def read_idf_file(path: Union[str, Path]) -> "IdfEvent": - """Parse a Thor ``.IDFW``/``.IDFH`` binary into an ``IdfEvent``. +# Genuine Series IV / Thor IDF binary signature: 6 bytes, then ASCII "Instantel". +_THOR_PREFIX = b"\x00\x12\x01\x00\x00\x00" +# Stray Series III (Blastware) binaries that occasionally turn up in Thor +# corpus directories renamed to the .IDFW/.IDFH convention. Their header +# (`10 00 01 80 00 00 Instantel STRT ...`) is byte-for-byte a BW SUB 5A +# STRT record, not a Thor binary. Detected so we can refuse-and-route +# rather than mis-parse. +_BW_STRAY_PREFIX = b"\x10\x00\x01\x80\x00\x00" +_INSTANTEL_TAG = b"Instantel" - Not yet implemented. 
When implemented, this will be the canonical - entry point for reading Thor binaries — the ASCII sidecar parser - becomes an optional fast-path metadata supplement rather than the - sole source of device-authoritative data. +# Most common body offset for sig-A IDFW files (~50% of prod events; +# 151/154 in the original tests/fixtures/THORDATA_example corpus). The +# body is the segment-rotated block stream consumed by decode_waveform_v2; +# bytes [0:3] are the magic ``00 02 00`` preamble. Production events +# routinely use other offsets — see :func:`_find_waveform_body_offset` +# for the dynamic scan. This constant survives only as the priority hint. +_BODY_START_SIG_A = 0x0F1F + +# Magic bytes that mark a candidate waveform-body preamble. +_BODY_MAGIC = b"\x00\x02\x00" + +# Where to start looking for body candidates inside the file. Skip the +# fixed-header region where the same magic legitimately appears inside +# channel-test records and the compliance block (offsets 0x015d, 0x091c, +# 0x0ae2, 0x0d30 in observed events). +_BODY_SCAN_FLOOR = 0x0E00 + +# Geophone count → in/s, derived from sidecar ground truth: the smallest +# non-zero sample in 1,014-file corpus is 0.0003 in/s. +_GEO_LSB_IPS = 0.0003 + +# Microphone count → psi, derived from sidecar regression on 50 sample +# pairs from UM11719_20231219162723.IDFW (mic-heavy event). +_MIC_LSB_PSI = 2.14e-6 + +# IDFH histogram constants. 
+_IDFH_INTERVAL_SIZE = 72 # bytes per per-interval record +_IDFH_SEGMENT_HEADER = 10 # bytes: [len_be 2B][0a 00 00 00 4B][00 NN 2B][05 3f 2B] +_IDFH_SEGMENT_TAIL = 2 # bytes after the interval data block, before next marker +_IDFH_HALFP_FREQ_NUM = 512.0 # freq_hz = NUM / halfp; halfp ≤ 5 means ">100 Hz" sentinel +_IDFH_GEO_FULL_SCALE = 10.0 # in/s — Normal range +_IDFH_INT16_FS = 32768.0 +_IDFH_CHANNELS = ("Tran", "Vert", "Long", "MicL") + + +# ─── Binary metadata extraction ───────────────────────────────────────────── + + +@dataclass +class IdfBinaryMetadata: + """Fields recoverable from the sig-A binary header (no .txt needed).""" + serial: Optional[str] = None + event_datetime: Optional[datetime.datetime] = None + sample_rate: Optional[int] = None + record_time_sec: Optional[float] = None + calibration_date: Optional[datetime.date] = None + + +def _read_ascii_z(buf: bytes, off: int, maxlen: int = 64) -> Optional[str]: + if off >= len(buf): + return None + end = buf.find(b"\x00", off, off + maxlen) + if end < 0: + end = min(off + maxlen, len(buf)) + s = buf[off:end].decode("ascii", errors="replace").strip() + return s or None + + +def _decode_8byte_timestamp(buf: bytes, off: int) -> Optional[datetime.datetime]: + """Layout: ``[day][month][year_hi][year_lo][unknown][hour][min][sec]``.""" + if off + 8 > len(buf): + return None + day, mon, yh, yl, _unk, hr, mn, sc = buf[off : off + 8] + year = (yh << 8) | yl + if not (2015 <= year <= 2050 and 1 <= mon <= 12 and 1 <= day <= 31 + and 0 <= hr < 24 and 0 <= mn < 60 and 0 <= sc < 60): + return None + try: + return datetime.datetime(year, mon, day, hr, mn, sc) + except ValueError: + return None + + +def extract_binary_metadata(buf: bytes) -> IdfBinaryMetadata: + """Pull serial/timestamp/sample_rate/record_time/calibration from the + sig-A binary header. + + Field positions confirmed against UM11719_20231219162723.IDFW; stable + across the 151-file sig-A corpus. 
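The anchor-relative header reads can be sketched on a synthetic buffer. This sketch assumes the layout stated in the comments: sample rate as uint16 BE at anchor-6, record time as float32 BE at anchor+6, and the 8-byte `[day][month][year_hi][year_lo][unk][hour][min][sec]` timestamp. The function names and buffer offsets are made up for illustration. The sketch also skips the plausibility range checks that the real `extract_binary_metadata` applies.

```python
import datetime
import struct
from typing import Optional, Tuple

ANCHOR = b"\xbe\x80\x00\x00\x00\x00"  # BW-compatible compliance anchor


def read_rate_and_record_time(buf: bytes) -> Tuple[Optional[int], Optional[float]]:
    """sample_rate = uint16 BE at anchor-6; record_time = float32 BE at anchor+6."""
    anchor = buf.find(ANCHOR)
    if anchor < 6 or anchor + 10 > len(buf):
        return None, None
    rate = int.from_bytes(buf[anchor - 6 : anchor - 4], "big")
    (record_time,) = struct.unpack(">f", buf[anchor + 6 : anchor + 10])
    return rate, record_time


def read_timestamp(buf: bytes, off: int) -> Optional[datetime.datetime]:
    """Decode [day][month][year_hi][year_lo][unknown][hour][min][sec]."""
    if off + 8 > len(buf):
        return None
    day, mon, yh, yl, _unk, hr, mn, sc = buf[off : off + 8]
    try:
        return datetime.datetime((yh << 8) | yl, mon, day, hr, mn, sc)
    except ValueError:
        return None


# Synthetic 64-byte header with the anchor at offset 16.
buf = bytearray(64)
buf[10:12] = (1024).to_bytes(2, "big")    # sample rate at anchor - 6
buf[16:22] = ANCHOR
buf[22:26] = struct.pack(">f", 3.0)       # record time at anchor + 6
buf[40:48] = bytes([19, 12, 0x07, 0xE7, 0, 16, 27, 23])  # 2023-12-19 16:27:23
rate, record_time = read_rate_and_record_time(bytes(buf))
ts = read_timestamp(bytes(buf), 40)
```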
""" - raise NotImplementedError( - "IDF binary codec not yet implemented; the .IDFW/.IDFH binary format " - "is undecoded. Use parse_idf_report() on the paired .txt sidecar " - "for device-authoritative metadata." + md = IdfBinaryMetadata() + + # Serial: null-terminated ASCII at 0x14E. + md.serial = _read_ascii_z(buf, 0x14E, maxlen=16) + + # Sample rate + record time live in a BW-compatible compliance block. + # Locate the 6-byte anchor `be 80 00 00 00 00` and read offsets relative + # to it: anchor-6 = sample_rate uint16 BE; anchor+6 = record_time float32 BE. + anchor = buf.find(b"\xbe\x80\x00\x00\x00\x00", 0x800, 0xA00) + if anchor > 0: + sr_bytes = buf[anchor - 6 : anchor - 4] + if len(sr_bytes) == 2: + sr = int.from_bytes(sr_bytes, "big") + if sr in (256, 512, 1024, 2048, 4096): + md.sample_rate = sr + rt_bytes = buf[anchor + 6 : anchor + 10] + if len(rt_bytes) == 4: + try: + rt = struct.unpack(">f", rt_bytes)[0] + if 0.1 <= rt <= 600.0: + md.record_time_sec = float(rt) + except struct.error: + pass + + # Event timestamp: 8 bytes. Position differs between IDFW (0x97A) and + # IDFH (0x9F8); scan a small range and accept the first valid decode. + for off in (0x97A, 0x9F8): + ts = _decode_8byte_timestamp(buf, off) + if ts is not None: + md.event_datetime = ts + break + + # Calibration date: day, month, year_be at 0x194-0x197. + if len(buf) > 0x197: + day, mon = buf[0x194], buf[0x195] + year = int.from_bytes(buf[0x196 : 0x198], "big") + if 1 <= mon <= 12 and 1 <= day <= 31 and 2015 <= year <= 2050: + try: + md.calibration_date = datetime.date(year, mon, day) + except ValueError: + pass + + return md + + +# ─── Sample decoder + unit conversion ─────────────────────────────────────── + + +def _find_waveform_body_offset(buf: bytes) -> Optional[int]: + """Pick the file offset of the waveform body by trial-decoding every + ``00 02 00`` magic position past the fixed-header region. 
+ + The body's location isn't fixed across all sig-A IDFW files — about + half the production events use ``0x0f1f``, but the rest have offsets + that shift based on header padding / channel-config layout. We + auto-detect by: + + 1. Find every ``00 02 00`` occurrence past ``_BODY_SCAN_FLOOR``. + 2. Try ``decode_waveform_v2()`` on each candidate. + 3. Pick the offset whose decoded sample count is largest. + + Returns the offset, or ``None`` if no candidate yielded more than + the trivial 2-sample preamble (= "no real body found"). + + Costs ~2-8 trial decodes per file; in practice the first candidate + past 0x0e00 is usually the right one. + """ + if len(buf) < _BODY_SCAN_FLOOR + 8: + return None + best: Optional[tuple[int, int]] = None # (total_samples, offset) + i = _BODY_SCAN_FLOOR + while True: + j = buf.find(_BODY_MAGIC, i) + if j < 0: + break + i = j + 1 + try: + decoded = decode_waveform_v2(buf[j:]) + except Exception: + continue + if not decoded: + continue + total = sum(len(v) for v in decoded.values()) + # A "real" body has more than just the 2-sample preamble. + if total <= 2: + continue + if best is None or total > best[0]: + best = (total, j) + return best[1] if best else None + + +def _decode_waveform_samples(buf: bytes) -> Optional[dict]: + """Decode samples from the sig-A waveform body. + + Returns the raw decoder counts dict — geo LSB = 0.0003 in/s, mic in + its own count unit (see :func:`mic_count_to_psi`). Returns None if + no usable body is found. + + Uses :func:`_find_waveform_body_offset` to locate the body — the + file-offset varies across events (~50% sit at the canonical + ``0x0f1f`` but the rest don't), so the previous hardcoded constant + silently produced 2-sample preamble-only output for half the corpus. + """ + off = _find_waveform_body_offset(buf) + if off is None: + return None + return decode_waveform_v2(buf[off:]) + + +def geo_count_to_ips(count: int) -> float: + """Convert a Thor geo decoder count to in/s. 
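The trial-decode scan in `_find_waveform_body_offset` amounts to "keep the magic position that decodes the most samples". Below is a minimal sketch that assumes a pluggable decoder. `toy_decode` is a hypothetical stand-in for `decode_waveform_v2`, whose payload simply runs to the first `0xff`. `floor` plays the role of `_BODY_SCAN_FLOOR`.

```python
from typing import Callable, Dict, List, Optional, Tuple

MAGIC = b"\x00\x02\x00"
Decoder = Callable[[bytes], Dict[str, List[int]]]


def find_body_offset(buf: bytes, decode: Decoder, floor: int = 0) -> Optional[int]:
    """Return the magic offset whose trial decode yields the most samples."""
    best: Optional[Tuple[int, int]] = None  # (total_samples, offset)
    i = floor
    while True:
        j = buf.find(MAGIC, i)
        if j < 0:
            break
        i = j + 1
        try:
            decoded = decode(buf[j:])
        except Exception:
            continue  # a candidate that fails to decode is simply skipped
        total = sum(len(v) for v in decoded.values())
        if total <= 2:  # preamble only: not a real body
            continue
        if best is None or total > best[0]:
            best = (total, j)
    return best[1] if best else None


def toy_decode(chunk: bytes) -> Dict[str, List[int]]:
    """Hypothetical stand-in for decode_waveform_v2: payload runs to the first 0xff."""
    end = chunk.find(b"\xff", 3)
    payload = chunk[3:] if end < 0 else chunk[3:end]
    return {"Tran": list(payload)}


# A decoy magic with a 2-sample payload at offset 4, the real body at offset 10.
buf = bytes(4) + MAGIC + b"\x01\x02\xff" + MAGIC + bytes([5] * 10) + b"\xff"
offset = find_body_offset(buf, toy_decode)
```

Picking the largest decode, rather than the first non-trivial one, is what lets a coincidental `00 02 00` before the real body lose to the real body.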
LSB = 0.0003 in/s.""" + return count * _GEO_LSB_IPS + + +def mic_count_to_psi(count: int) -> float: + """Convert a Thor mic decoder count to psi. Scale derived from + regression over 50 sample pairs in UM11719_20231219162723.IDFW; + consistent to ~5%. Calibration constants from the channel block + can refine this once decoded. + """ + return count * _MIC_LSB_PSI + + +# ─── IDFH histogram decoder ───────────────────────────────────────────────── + + +@dataclass +class IdfhInterval: + """One decoded histogram interval (typically one minute of monitoring).""" + offset: int # file byte offset of the 72-byte record + # Per-channel min/max ADC counts (int16 BE), half-period samples, peak count. + # Peak = max(|min|, |max|). freq_hz = 512/halfp (None if halfp ≤ 5 → + # ">100 Hz" sentinel; matches sidecar convention). + tran_min: int + tran_max: int + tran_halfp: int + vert_min: int + vert_max: int + vert_halfp: int + long_min: int + long_max: int + long_halfp: int + micl_min: int + micl_max: int + micl_halfp: int + + def peak_count(self, channel: str) -> int: + mn = getattr(self, f"{channel.lower()}_min") + mx = getattr(self, f"{channel.lower()}_max") + return max(abs(mn), abs(mx)) + + def peak_ips(self, channel: str) -> float: + """Convert peak count to in/s (geo channels only).""" + return self.peak_count(channel) / _IDFH_INT16_FS * _IDFH_GEO_FULL_SCALE + + def freq_hz(self, channel: str) -> Optional[float]: + halfp = getattr(self, f"{channel.lower()}_halfp") + if halfp <= 5: + return None + return _IDFH_HALFP_FREQ_NUM / halfp + + +def _decode_idfh_interval(buf72: bytes, offset: int) -> IdfhInterval: + """Decode one 72-byte interval record into per-channel min/max/halfp.""" + import struct + fields = [] + for i in range(4): + block = buf72[i * 16 : (i + 1) * 16] + mn = struct.unpack_from(">h", block, 0)[0] + mx = struct.unpack_from(">h", block, 2)[0] + # block[4:6] = int16 BE, role unknown (possibly time-of-peak) + halfp = struct.unpack_from(">H", block, 6)[0] + # 
block[10:12] and block[14:16] are uint16 BE with unknown semantics + # (likely sum / count contributions for the PVS computation). + fields.extend([mn, mx, halfp]) + # Tail 8 bytes (buf72[64:72]) carry PVS-related data; not yet decoded. + return IdfhInterval( + offset=offset, + tran_min=fields[0], tran_max=fields[1], tran_halfp=fields[2], + vert_min=fields[3], vert_max=fields[4], vert_halfp=fields[5], + long_min=fields[6], long_max=fields[7], long_halfp=fields[8], + micl_min=fields[9], micl_max=fields[10], micl_halfp=fields[11], + ) + + +def decode_idfh_body(buf: bytes) -> list: + """Walk an IDFH file and decode every interval record. + + The body has one or more segments; each segment header is 10 bytes: + ``[length_be 2B][0a 00 00 00][00 NN_counter][05 3f]`` where ``length`` + counts bytes from the start of the length field through the end of + the interval block (= 10 + 72 × n_intervals). Segments are separated + by a 2-byte tail + next-segment 2-byte prefix (the bytes before the + next length field). Confirmed against the 859-file corpus (181,071 + intervals decoded; 1 failure is the sig-B BE9439 file). + """ + intervals: list = [] + i = 0 + while True: + j = buf.find(b"\x0a\x00\x00\x00", i) + if j < 0 or j < 2: + break + # Validate: [length_be][0a 00 00 00][00 NN][05 3f]. Slices (not + # indexing) so a match in the last few bytes can't raise IndexError. + if buf[j + 4 : j + 5] != b"\x00" or buf[j + 6 : j + 8] != b"\x05\x3f": + i = j + 1 + continue + length = int.from_bytes(buf[j - 2 : j], "big") + n = (length - _IDFH_SEGMENT_HEADER) // _IDFH_INTERVAL_SIZE + if n <= 0: + i = j + 1 + continue + header_start = j - 2 + interval_start = header_start + _IDFH_SEGMENT_HEADER + for k in range(n): + off = interval_start + k * _IDFH_INTERVAL_SIZE + if off + _IDFH_INTERVAL_SIZE > len(buf): + break + chunk = buf[off : off + _IDFH_INTERVAL_SIZE] + intervals.append(_decode_idfh_interval(chunk, off)) + # Advance past this segment + the 2-byte tail.
+ i = header_start + length + _IDFH_SEGMENT_TAIL + return intervals + + +# ─── Top-level reader ─────────────────────────────────────────────────────── + + +@dataclass +class IdfReadResult: + """Return type for :func:`read_idf_file`. + + For waveforms (``.IDFW``), ``samples`` holds the per-channel sample + arrays in Thor decoder counts. For histograms (``.IDFH``), + ``samples`` is empty and ``intervals`` holds the per-interval + record list (peaks, freqs). + """ + event: IdfEvent + samples: dict # {"Tran": [...], ...} for IDFW; empty for IDFH + binary_metadata: IdfBinaryMetadata + signature: str # always "thor" for now (sig-A genuine Thor) + intervals: Optional[list] = None # list[IdfhInterval] for IDFH; None for IDFW + + +def read_idf_file( + path: Union[str, Path], + *, + data: Optional[bytes] = None, +) -> IdfReadResult: + """Parse a Thor ``.IDFW`` / ``.IDFH`` binary into an ``IdfEvent`` plus + decoded samples (IDFW) or interval records (IDFH). + + Handles genuine Thor (sig-A) waveforms and histograms. A stray + Series III (Blastware) binary filed under an IDF extension raises + NotImplementedError (route it through + ``minimateplus.event_file_io.read_blastware_file()``); any other + signature raises ValueError. + + Returns an :class:`IdfReadResult`. The caller converts int sample + counts to physical units via :func:`geo_count_to_ips` / + :func:`mic_count_to_psi`. + + ``path`` is used for the filename in error messages and for ``.IDFH`` + vs ``.IDFW`` suffix detection. When ``data`` is supplied the disk + read is skipped — useful for ingest paths that already have the + bytes in memory and where the file may not exist on disk yet.
+ """ + p = Path(path) + buf = data if data is not None else p.read_bytes() + + if len(buf) < 16 or buf[6:16] != _INSTANTEL_TAG + b"\x00": + raise ValueError(f"{p.name}: not an IDF file (missing Instantel magic)") + + sig_prefix = buf[:6] + if sig_prefix == _THOR_PREFIX: + signature = "thor" + elif sig_prefix == _BW_STRAY_PREFIX: + raise NotImplementedError( + f"{p.name}: file has a Series III (Blastware) STRT header in " + "an IDF-named container — not a Thor binary. Route through " + "minimateplus.event_file_io.read_blastware_file() instead " + "(peaks decode; samples & full metadata don't, but it's not " + "Thor data so the Thor codec doesn't apply)." + ) + else: + raise ValueError(f"{p.name}: unknown IDF signature {sig_prefix.hex()}") + + is_histogram = p.suffix.upper() == ".IDFH" + md = extract_binary_metadata(buf) + + if is_histogram: + intervals = decode_idfh_body(buf) + if not intervals: + raise ValueError(f"{p.name}: IDFH body decoded no intervals") + # Peaks: max across all intervals on each channel (per-channel max + # of stored max-magnitudes; sidecar's PPV row carries the same). + peak_tran = max((iv.peak_ips("Tran") for iv in intervals), default=0.0) + peak_vert = max((iv.peak_ips("Vert") for iv in intervals), default=0.0) + peak_long = max((iv.peak_ips("Long") for iv in intervals), default=0.0) + # Mic peak in psi — Thor stores per-interval mic ADC counts in the + # binary; convert the max count to psi via the per-count factor. 
+ mic_peak_count = max((iv.peak_count("MicL") for iv in intervals), default=0) + mic_peak_psi = mic_count_to_psi(mic_peak_count) if mic_peak_count else None + rep = IdfReport( + serial_number=md.serial, + event_type="Full Histogram", + event_datetime=md.event_datetime, + filename=p.name, + sample_rate=md.sample_rate, + record_time_sec=md.record_time_sec, + ) + peaks = IdfPeaks( + transverse_ips=peak_tran, + vertical_ips=peak_vert, + longitudinal_ips=peak_long, + peak_vector_sum_ips=None, + mic_pspl_dbl=None, # IDFH binary doesn't carry the dB(L) value + mic_pspl_psi=mic_peak_psi, + ) + event = IdfEvent( + serial=md.serial or "UNKNOWN", + timestamp=md.event_datetime or datetime.datetime(1970, 1, 1), + kind="Histogram", + filename=p.name, + sample_rate=md.sample_rate, + record_time_sec=md.record_time_sec, + peaks=peaks, + report=rep, + ) + return IdfReadResult( + event=event, + samples={}, + binary_metadata=md, + signature=signature, + intervals=intervals, + ) + + # Waveform path. + decoded = _decode_waveform_samples(buf) + if decoded is None: + raise ValueError(f"{p.name}: waveform body codec failed") + + rep = IdfReport( + serial_number=md.serial, + event_type="Full Waveform", + event_datetime=md.event_datetime, + filename=p.name, + sample_rate=md.sample_rate, + record_time_sec=md.record_time_sec, + ) + + def _peak_ips(ch: str) -> float: + arr = decoded.get(ch, []) + return geo_count_to_ips(max((abs(v) for v in arr), default=0)) + + # Mic peak psi from binary: max absolute MicL ADC count × 2.14e-6 psi/count. + mic_arr = decoded.get("MicL", []) + mic_peak_count = max((abs(v) for v in mic_arr), default=0) + mic_peak_psi = mic_count_to_psi(mic_peak_count) if mic_peak_count else None + + peaks = IdfPeaks( + transverse_ips=_peak_ips("Tran"), + vertical_ips=_peak_ips("Vert"), + longitudinal_ips=_peak_ips("Long"), + # PVS requires aligned per-sample √(T²+V²+L²); leave None — the + # sidecar carries it and the bridge picks it up if present. 
+ peak_vector_sum_ips=None, + mic_pspl_dbl=None, # binary IDFW doesn't carry the dB(L) value; + # sidecar .txt fills it via IdfReport.from_dict + mic_pspl_psi=mic_peak_psi, + ) + + event = IdfEvent( + serial=md.serial or "UNKNOWN", + timestamp=md.event_datetime or datetime.datetime(1970, 1, 1), + kind="Waveform", + filename=p.name, + sample_rate=md.sample_rate, + record_time_sec=md.record_time_sec, + peaks=peaks, + report=rep, + ) + + return IdfReadResult( + event=event, + samples=decoded, + binary_metadata=md, + signature=signature, ) diff --git a/micromate/idf_to_bw_report.py b/micromate/idf_to_bw_report.py new file mode 100644 index 0000000..c5d0a01 --- /dev/null +++ b/micromate/idf_to_bw_report.py @@ -0,0 +1,323 @@ +""" +micromate/idf_to_bw_report.py — adapter that projects a parsed Thor IDF +report (+ binary metadata + decoded IDFH intervals) into the +``bw_report``-shaped dict that :mod:`sfm.report_pdf.gather_report_data` +consumes. + +Lets Thor events flow through the existing Series III Event Report PDF +pipeline without duplicating the renderer. Thor's report content is +~95% the same data shape as BW's; the field names differ but the +underlying metrics map 1:1. + +Caveats +─────── + +- **Mic units** — Thor records ``MicPSPL`` natively in dB(L). This + adapter sets ``bw_report.mic.pspl_dbl`` directly; the report + renderer recomputes the equivalent psi via its dBL→psi formula. +- **Saturation / above-range flags** — Thor doesn't always mark + ``OORANGE`` the way BW does; we set ``zc_freq_above_range`` only + when a `>100` sentinel was preserved in the raw text. +- **Per-interval data** — for IDFH events we build ``interval_times`` + by stepping ``IntervalSize`` from ``HistogramStartTime``; the binary + decoder confirms one record per step (882 / 881 / 881 ... across + the corpus). 
+- **calibration_by parsing** — Thor's free-form ``Calibration : November + 22, 2023 by Instantel`` is split on ``" by "`` to extract the + calibrator; the date prefix is parsed where possible, otherwise + the binary-extracted ``calibration_date`` from + :class:`micromate.idf_file.IdfBinaryMetadata` wins. +""" + +from __future__ import annotations + +import datetime +import re +from typing import Any, Dict, List, Optional + + +# ─── Helpers ──────────────────────────────────────────────────────────────── + + +_NUM_RE = re.compile(r"-?\d+(?:\.\d+)?") + + +def _parse_first_number(s: Optional[str]) -> Optional[float]: + """Pull the first numeric token from a string like ``"0.1500 in/s"``.""" + if s is None: + return None + m = _NUM_RE.search(str(s)) + if not m: + return None + try: + return float(m.group(0)) + except ValueError: + return None + + +def _parse_interval_size_s(s: Optional[str]) -> Optional[float]: + """``"60 sec"`` → 60.0, ``"5 min"`` → 300.0, ``"1 hour"`` → 3600.""" + if s is None: + return None + num = _parse_first_number(s) + if num is None: + return None + sl = str(s).lower() + if "hour" in sl or "hr" in sl: + return num * 3600.0 + if "min" in sl: + return num * 60.0 + return num # default to seconds + + +def _parse_calibration(text: Optional[str]) -> tuple[Optional[str], Optional[str]]: + """Split ``"November 22, 2023 by Instantel"`` → (ISO date, calibrator). + + Returns ``(None, None)`` if neither half parses. 
+ """ + if not text: + return None, None + parts = str(text).split(" by ", 1) + date_part = parts[0].strip() if parts else None + by_part = parts[1].strip() if len(parts) > 1 else None + iso_date: Optional[str] = None + if date_part: + for fmt in ("%B %d, %Y", "%b %d, %Y", "%Y-%m-%d", "%m/%d/%Y"): + try: + iso_date = datetime.datetime.strptime(date_part, fmt).date().isoformat() + break + except ValueError: + continue + return iso_date, by_part + + +def _channel_peaks(idf: Dict[str, Any], ch_lc: str) -> Dict[str, Any]: + """Map ``tran_ppv`` / ``tran_zc_freq`` / ... → bw_report.peaks.tran shape.""" + out: Dict[str, Any] = {} + for src, dst in ( + (f"{ch_lc}_ppv", "ppv_ips"), + (f"{ch_lc}_zc_freq", "zc_freq_hz"), + (f"{ch_lc}_time_of_peak", "time_of_peak_s"), + (f"{ch_lc}_peak_acceleration", "peak_accel_g"), + (f"{ch_lc}_peak_displacement", "peak_disp_in"), + ): + v = idf.get(src) + if v is not None: + out[dst] = v + # ZC freq ">100" sentinel: the raw text carries it under the un-typed + # key (e.g. ``raw["tran_zc_freq"]`` would be ``">100"``), and our parser + # dropped the typed entry. Detect that case and flag. 
+ raw_zc = idf.get(f"{ch_lc}_zc_freq") + if isinstance(raw_zc, str) and ">" in raw_zc: + out["zc_freq_above_range"] = True + out.pop("zc_freq_hz", None) + return out + + +def _sensor_check(idf: Dict[str, Any], ch_lc: str) -> Dict[str, Any]: + out: Dict[str, Any] = {} + fr = idf.get(f"{ch_lc}_test_freq") + if fr is not None: + out["freq_hz"] = _parse_first_number(fr) + rt = idf.get(f"{ch_lc}_test_ratio") + if rt is not None: + out["ratio"] = _parse_first_number(rt) + am = idf.get(f"{ch_lc}_test_amplitude") + if am is not None: + out["amplitude_mv"] = _parse_first_number(am) + res = idf.get(f"{ch_lc}_test_results") + if res is not None: + out["result"] = str(res).strip() + return {k: v for k, v in out.items() if v is not None} + + +def _interval_times(idf: Dict[str, Any], n_intervals: Optional[int]) -> List[str]: + """Synthesise per-interval timestamps from start + interval_size × k. + + Returns ``[]`` when start time or interval size is unknown. + """ + if not n_intervals: + return [] + start_date = idf.get("histogram_start_date") or idf.get("event_date") + start_time = idf.get("histogram_start_time") or idf.get("event_time") + iv_str = idf.get("interval_size") + iv_s = _parse_interval_size_s(iv_str) + if not (start_date and start_time and iv_s): + return [] + try: + t0 = datetime.datetime.strptime(f"{start_date} {start_time}", "%Y-%m-%d %H:%M:%S") + except ValueError: + return [] + out = [] + for k in range(int(n_intervals)): + t = t0 + datetime.timedelta(seconds=iv_s * (k + 1)) + out.append(t.isoformat()) + return out + + +# ─── Top-level adapter ────────────────────────────────────────────────────── + + +def build_bw_report_from_idf( + idf_report: Dict[str, Any], + *, + binary_md=None, + intervals: Optional[list] = None, + is_histogram: Optional[bool] = None, +) -> Dict[str, Any]: + """Project a parsed IDF report dict (and optional binary metadata + + decoded IDFH intervals) into the BW report sidecar shape. 
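The histogram timestamp synthesis in `_interval_times` steps `IntervalSize` forward from the start time and stamps each interval at its end (`k + 1`). Below is a simplified sketch that assumes `%Y-%m-%d %H:%M:%S` start strings and the same sec/min/hour suffix rules as `_parse_interval_size_s`. The function names are hypothetical.

```python
import datetime
import re
from typing import List, Optional

_NUM_RE = re.compile(r"-?\d+(?:\.\d+)?")


def interval_size_s(text: str) -> Optional[float]:
    """'60 sec' -> 60.0, '5 min' -> 300.0, '1 hour' -> 3600.0 (seconds by default)."""
    m = _NUM_RE.search(text)
    if m is None:
        return None
    num = float(m.group(0))
    low = text.lower()
    if "hour" in low or "hr" in low:
        return num * 3600.0
    if "min" in low:
        return num * 60.0
    return num


def interval_end_times(start: str, size: str, n_intervals: int) -> List[str]:
    """ISO timestamps for the END of each interval: start + size * (k + 1)."""
    step = interval_size_s(size)
    if not step or n_intervals <= 0:
        return []
    t0 = datetime.datetime.strptime(start, "%Y-%m-%d %H:%M:%S")
    return [
        (t0 + datetime.timedelta(seconds=step * (k + 1))).isoformat()
        for k in range(n_intervals)
    ]


times = interval_end_times("2023-12-19 16:00:00", "5 min", 3)
# ['2023-12-19T16:05:00', '2023-12-19T16:10:00', '2023-12-19T16:15:00']
```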
+ + The returned dict is structurally identical to what + ``minimateplus.event_file_io._bw_report_to_dict`` produces from a + real BW ASCII report — it can be assigned to + ``sidecar["bw_report"]`` and consumed verbatim by + ``sfm.report_pdf.gather_report_data``. + + ``intervals`` is the list of :class:`micromate.idf_file.IdfhInterval` + objects from :func:`micromate.idf_file.decode_idfh_body`; only used + for histogram events to derive accurate ``interval_times``. + """ + if is_histogram is None: + et = str(idf_report.get("event_type", "")) + is_histogram = et.lower().startswith("full histogram") + + # ── Trigger / recording / device ───────────────────────────────────── + trigger_channel = idf_report.get("trigger") + trigger_level = _parse_first_number(idf_report.get("geo_trigger_level")) + geo_range_ips = _parse_first_number(idf_report.get("geo_range")) + + cal_iso, cal_by = _parse_calibration(idf_report.get("calibration")) + # Prefer the binary-extracted calibration_date when our text parse fell + # through; the binary date is unambiguous. 
+ if cal_iso is None and binary_md is not None and binary_md.calibration_date: + cal_iso = binary_md.calibration_date.isoformat() + + # ── Histogram fields ──────────────────────────────────────────────── + hist_block: Dict[str, Any] = { + "start": None, "stop": None, "n_intervals": None, + "interval_size": None, "interval_size_s": None, + "channel_peak_when": {}, + } + if is_histogram: + sd = idf_report.get("histogram_start_date") + st = idf_report.get("histogram_start_time") + if sd and st: + try: + hist_block["start"] = datetime.datetime.strptime( + f"{sd} {st}", "%Y-%m-%d %H:%M:%S" + ).isoformat() + except ValueError: + pass + ed = idf_report.get("histogram_stop_date") + et_ = idf_report.get("histogram_stop_time") + if ed and et_: + try: + hist_block["stop"] = datetime.datetime.strptime( + f"{ed} {et_}", "%Y-%m-%d %H:%M:%S" + ).isoformat() + except ValueError: + pass + n_raw = idf_report.get("number_of_intervals") + if n_raw is not None: + try: + # Thor reports a float like "81.04"; truncate to int (the BW + # report uses an int for the column). + hist_block["n_intervals"] = int(float(str(n_raw))) + except ValueError: + pass + # When the binary decoder gave us the actual interval count, prefer it. + if intervals is not None: + hist_block["n_intervals"] = len(intervals) + hist_block["interval_size"] = idf_report.get("interval_size") + hist_block["interval_size_s"] = _parse_interval_size_s(idf_report.get("interval_size")) + # interval_times derived from start+step (the BW report uses the + # exact strings; we match its representation). + hist_block["interval_times"] = _interval_times(idf_report, hist_block["n_intervals"]) + # Per-channel peak when (absolute date+time at which the channel's + # peak occurred over the histogram run). Thor splits this into + # ``TranPeakDate`` / ``TranPeakTime`` etc.
+ peak_when: Dict[str, str] = {} + for ch_label, ch_lc in (("Tran", "tran"), ("Vert", "vert"), ("Long", "long"), ("MicL", "mic")): + d = idf_report.get(f"{ch_lc}_peak_date") + t = idf_report.get(f"{ch_lc}_peak_time") + if d and t: + try: + peak_when[ch_label] = datetime.datetime.strptime( + f"{d} {t}", "%Y-%m-%d %H:%M:%S" + ).isoformat() + except ValueError: + continue + if peak_when: + hist_block["channel_peak_when"] = peak_when + + # ── Mic block ──────────────────────────────────────────────────────── + mic_block = { + "weighting": "L", # Thor mic is ISEE Linear + "pspl_dbl": idf_report.get("mic_ppv"), # the dB(L) float + "pspl_saturated": False, + "zc_freq_hz": idf_report.get("mic_zc_freq"), + "zc_freq_above_range": isinstance(idf_report.get("mic_zc_freq"), str) + and ">" in str(idf_report.get("mic_zc_freq")), + "time_of_peak_s": idf_report.get("mic_time_of_peak"), + } + if mic_block["zc_freq_above_range"]: + mic_block["zc_freq_hz"] = None + + # ── Peaks ──────────────────────────────────────────────────────────── + vs_block = { + "ips": idf_report.get("peak_vector_sum"), + "time_s": _parse_first_number(idf_report.get("peak_vector_sum_time_sum")), + "when": None, + "saturated": False, + } + if is_histogram: + # PVS absolute date+time, when present. 
+ vs_d = idf_report.get("peak_vector_sum_date") + vs_t = idf_report.get("peak_vector_sum_time") + if vs_d and vs_t: + try: + vs_block["when"] = datetime.datetime.strptime( + f"{vs_d} {vs_t}", "%Y-%m-%d %H:%M:%S" + ).isoformat() + except ValueError: + pass + + return { + "available": True, + "event_type": idf_report.get("event_type"), + "version": idf_report.get("version"), + "trigger": { + "channel": trigger_channel, + "geo_level_ips": trigger_level, + }, + "recording": { + "sample_rate_sps": idf_report.get("sample_rate"), + "record_time_s": idf_report.get("record_time_sec"), + "pretrig_s": idf_report.get("pre_trigger_sec"), + "stop_mode": idf_report.get("record_stop_mode"), + "geo_range_ips": geo_range_ips, + "units": idf_report.get("units"), + }, + "device": { + "battery_volts": idf_report.get("battery_volts"), + "calibration_date": cal_iso, + "calibration_by": cal_by, + }, + "peaks": { + "tran": _channel_peaks(idf_report, "tran"), + "vert": _channel_peaks(idf_report, "vert"), + "long": _channel_peaks(idf_report, "long"), + "vector_sum": vs_block, + }, + "mic": mic_block, + "sensor_check": { + "tran": _sensor_check(idf_report, "tran"), + "vert": _sensor_check(idf_report, "vert"), + "long": _sensor_check(idf_report, "long"), + "mic": _sensor_check(idf_report, "mic"), + }, + "histogram": hist_block, + "monitor_log": [], + "pc_sw_version": None, + } diff --git a/micromate/models.py b/micromate/models.py index d02a37f..68a91a7 100644 --- a/micromate/models.py +++ b/micromate/models.py @@ -159,12 +159,23 @@ class IdfReport: @dataclass class IdfPeaks: - """Geophone + mic peak values for one Thor event. Native Thor units.""" + """Geophone + mic peak values for one Thor event. Native Thor units. + + Thor stores the mic peak in two parallel forms — ``mic_pspl_dbl`` is + what the sidecar's top-level ``MicPSPL`` header field carries (dB(L)), + used in the report header. 
``mic_pspl_psi`` is the psi value derived + either from the IDFW sample table / IDFH interval column 9, or from + the binary mic counts (~2.14e-6 psi/count). Needed because the + BW-shaped ``PeakValues.micl`` consumed by ``event_hdf5.write_event_hdf5`` + expects psi — feeding it dB(L) makes the h5 mic-chart scale factor + blow up. + """ transverse_ips: Optional[float] = None # in/s vertical_ips: Optional[float] = None # in/s longitudinal_ips: Optional[float] = None # in/s peak_vector_sum_ips: Optional[float] = None # in/s mic_pspl_dbl: Optional[float] = None # dB(L) + mic_pspl_psi: Optional[float] = None # psi @dataclass @@ -324,10 +335,14 @@ class IdfEvent: machinery without those code paths needing to know about Thor. Caveats of the bridge: - - ``mic_ppv`` on the produced Event carries Thor's dB(L) value - verbatim — the UI distinguishes via the ``device_family`` - column (Phase 1). Don't run the BW psi→dBL converter on - Series IV rows. + - ``PeakValues.micl`` carries the mic peak in **psi** (matching + BW's convention) — set from :attr:`IdfPeaks.mic_pspl_psi`, + with a dB(L)→psi fallback when only the dB(L) value is + available. This is what the h5 writer's mic-scale-factor + logic needs. The dB(L) value still flows through + ``bw_report.mic.pspl_dbl`` (set by the + ``idf_to_bw_report`` adapter) and the renderer reads it + from there for the report header. - Many Thor-specific fields (Peak Acceleration / Displacement, sensor self-check, calibration) don't have a slot in ``Event``. The full IdfReport is preserved on the @@ -349,11 +364,17 @@ class IdfEvent: minute=self.timestamp.minute, second=self.timestamp.second, ) + # Resolve mic peak as psi. Priority: binary-derived mic_pspl_psi + # (set by read_idf_file) > dB(L)→psi fallback via standard formula + # (psi = 2.9e-9 × 10^(dBL/20)) > None. 
+ mic_psi = self.peaks.mic_pspl_psi + if mic_psi is None and self.peaks.mic_pspl_dbl is not None: + mic_psi = 2.9e-9 * (10.0 ** (self.peaks.mic_pspl_dbl / 20.0)) pv = PeakValues( tran=self.peaks.transverse_ips, vert=self.peaks.vertical_ips, long=self.peaks.longitudinal_ips, - micl=self.peaks.mic_pspl_dbl, # dB(L) — see caveat above + micl=mic_psi, # psi, matching BW's convention (h5 scaling depends on this) peak_vector_sum=self.peaks.peak_vector_sum_ips, ) pi = ProjectInfo( diff --git a/minimateplus/event_file_io.py b/minimateplus/event_file_io.py index 7dc74c1..faa3d98 100644 --- a/minimateplus/event_file_io.py +++ b/minimateplus/event_file_io.py @@ -49,7 +49,7 @@ SIDECAR_KIND = "sfm.event" # bumped without a `pip install` re-run — leading to confusing stale # version stamps in sidecars. Bump this constant and CHANGELOG.md # together at release time. -TOOL_VERSION = "0.20.0" +TOOL_VERSION = "0.21.1" try: # Best-effort: prefer the installed metadata when it's NEWER than the diff --git a/pyproject.toml b/pyproject.toml index 5151f55..dafdc73 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta" [project] name = "seismo-relay" -version = "0.20.0" +version = "0.21.1" description = "Python client and REST server for MiniMate Plus seismographs" requires-python = ">=3.10" dependencies = [ diff --git a/scripts/backfill_thor_events.py b/scripts/backfill_thor_events.py new file mode 100644 index 0000000..41e7935 --- /dev/null +++ b/scripts/backfill_thor_events.py @@ -0,0 +1,331 @@ +""" +scripts/backfill_thor_events.py — re-process existing Thor (Series IV) +events so their sidecars carry the bw_report block produced by +``micromate.idf_to_bw_report.build_bw_report_from_idf``, and regenerate +their .h5 files (per-sample for IDFW, per-interval peaks for IDFH).
+ +Why this exists +─────────────── + +Thor events ingested before v0.21.0 (or during the v0.21.0 ingest bug +window fixed in commit bee1185) have sidecars with only +``extensions.idf_report`` — no ``bw_report`` block. Without +``bw_report``, the SFM PDF renderer falls back to DB-only fields +(misses sensor-self-check, full per-channel breakdown, mic dB(L)), +and the modal chart 404s on ``/waveform.json`` for IDFW events +because no .h5 was written when the codec failed at ingest. + +Re-forwarding from thor-watcher would also fix this, but that requires +operator coordination on every watcher machine and uses bandwidth this +script doesn't. + +What this does +────────────── + +Walks ``//`` for ``.IDFW`` / ``.IDFH`` files +and, for each one: + + 1. Reads the existing sidecar (preserving review state + captured_at). + 2. Re-runs ``micromate.idf_file.read_idf_file()`` on the binary + bytes — passing ``data=`` so the codec doesn't try to read from + a path it doesn't know. + 3. Pulls ``extensions.idf_report`` (the raw parsed Thor dict the + v0.18.0+ ingest path already stashed) and runs the v0.21.0 + ``build_bw_report_from_idf`` adapter against it. + 4. Writes the refreshed sidecar with the new ``bw_report``, + bumped ``source.tool_version``, but preserved ``review`` block + the original ``captured_at`` timestamp. + 5. Regenerates the .h5 waveform file via the existing + ``event_hdf5`` writer. For IDFW that's the decoded per-sample + stream; for IDFH it's a 1-sample-per-interval synthesised array + (peak ADC count per channel) so the renderer's bar-chart code + has data to group on. Mic peak psi from the binary is merged + onto the IdfEvent before the bridge so the h5 writer's per-count + mic scale factor lands on a sensible value (without this the + mic chart on Thor events plots dB(L)-as-pseudo-psi and shows + bomb-level numbers). + +Idempotent. Re-running it after a parser/adapter change just +rewrites sidecars and .h5 files — no DB writes, no thor-watcher coordination.
+ +Usage +───── + + python scripts/backfill_thor_events.py [--store-root PATH] + [--dry-run] + [--skip-hdf5] + [--force] + [-v] + +By default, refreshes any Thor event whose sidecar is missing +``bw_report`` OR whose ``source.tool_version`` is older than the +current ``TOOL_VERSION``. ``--force`` refreshes every Thor event +regardless. +""" + +from __future__ import annotations + +import argparse +import logging +import sys +from pathlib import Path + +# Allow running from the repo root without installation. +sys.path.insert(0, str(Path(__file__).resolve().parent.parent)) + +from minimateplus import event_file_io +from sfm.waveform_store import WaveformStore + +log = logging.getLogger("backfill_thor_events") + + +def _is_thor_event(path: Path) -> bool: + if not path.is_file(): + return False + if path.name.endswith((".sfm.json", ".h5", "_ASCII.TXT")): + return False + return path.suffix.upper() in (".IDFW", ".IDFH") + + +def _vtuple(s: str) -> tuple: + try: + return tuple(int(p) for p in str(s).split(".")[:3]) + except Exception: + return (0, 0, 0) + + +def main(argv=None) -> int: + p = argparse.ArgumentParser(description=__doc__) + p.add_argument( + "--db-path", + default=str(Path(__file__).resolve().parent.parent / "bridges" / "captures" / "seismo_relay.db"), + help="Used only to derive the default --store-root.", + ) + p.add_argument("--store-root", default=None) + p.add_argument("--dry-run", action="store_true") + p.add_argument("--skip-hdf5", action="store_true", + help="Don't regenerate .h5 files (IDFW or IDFH events).") + p.add_argument("--force", action="store_true", + help="Refresh every Thor event, not just ones with stale or missing bw_report.") + p.add_argument("-v", "--verbose", action="store_true") + args = p.parse_args(argv) + + logging.basicConfig( + level=logging.DEBUG if args.verbose else logging.INFO, + format="%(asctime)s %(levelname)-7s %(name)s %(message)s", + datefmt="%H:%M:%S", + ) + + db_path = Path(args.db_path).expanduser().resolve() +
store_root = ( + Path(args.store_root).expanduser().resolve() + if args.store_root else db_path.parent / "waveforms" + ) + if not store_root.exists(): + log.error("store root not found: %s", store_root) + return 1 + store = WaveformStore(store_root) + log.info("store root: %s", store_root) + log.info("current TOOL_VERSION: %s", event_file_io.TOOL_VERSION) + + refreshed = skipped = errors = h5_written = 0 + + # Lazy imports so any one of these failing produces a useful error + # message rather than crashing module-load. + from micromate.idf_file import read_idf_file + from micromate.idf_to_bw_report import build_bw_report_from_idf + + for serial_dir in sorted(p for p in store_root.iterdir() if p.is_dir()): + serial = serial_dir.name + for path in sorted(serial_dir.iterdir()): + if not _is_thor_event(path): + continue + + sidecar_path = store.sidecar_path_for(serial, path.name) + if not sidecar_path.exists(): + log.debug("%s: no sidecar — skipping (this is a binary without ingest history)", + path.name) + skipped += 1 + continue + + try: + existing = event_file_io.read_sidecar(sidecar_path) + except Exception as exc: + log.warning("%s: failed to read sidecar — %s", path.name, exc) + errors += 1 + continue + + has_bw_report = bool(existing.get("bw_report")) + existing_version = (existing.get("source") or {}).get("tool_version", "") + up_to_date = ( + has_bw_report + and _vtuple(existing_version) >= _vtuple(event_file_io.TOOL_VERSION) + ) + if up_to_date and not args.force: + skipped += 1 + continue + + # Re-decode the binary. Catch + log; continue with .txt-only + # data if it fails (matches the live ingest path's behavior). 
+ idf_samples = None + idf_intervals = None + binary_md = None + is_histogram = path.suffix.upper() == ".IDFH" + try: + binary_bytes = path.read_bytes() + res = read_idf_file(path, data=binary_bytes) + idf_samples = res.samples or None + idf_intervals = res.intervals + binary_md = res.binary_metadata + is_histogram = res.intervals is not None + except NotImplementedError: + # sig-B / Blastware-stray binary; no samples but adapter + # can still produce a bw_report from extensions.idf_report. + log.debug("%s: binary codec NotImplementedError (sig-B / BW-stray); proceeding from sidecar's idf_report only", path.name) + except Exception as exc: + log.warning("%s: binary decode failed — %s; proceeding from sidecar's idf_report only", path.name, exc) + + # Run the adapter. Pull report_dict from + # extensions.idf_report (the v0.18.0+ ingest preserved it). + report_dict = (existing.get("extensions") or {}).get("idf_report") or {} + if not report_dict and binary_md is None: + log.debug("%s: no idf_report in sidecar AND no binary metadata — nothing to project", path.name) + skipped += 1 + continue + + try: + bw_report = build_bw_report_from_idf( + report_dict, binary_md=binary_md, + intervals=idf_intervals, is_histogram=is_histogram, + ) + except Exception as exc: + log.warning("%s: adapter failed — %s", path.name, exc) + errors += 1 + continue + + # Build the new sidecar by overlaying refreshed fields onto + # the existing one — preserves review, captured_at, blastware + # block, source.kind, etc. + new_sidecar = dict(existing) # shallow copy + new_sidecar["bw_report"] = bw_report + src = dict(new_sidecar.get("source") or {}) + src["tool_version"] = event_file_io.TOOL_VERSION + new_sidecar["source"] = src + + # Preserve histogram intervals if the binary decoded them + # (improves over the original ingest if that one ran before + # the bee1185 codec fix). 
+ if idf_intervals is not None: + ext = dict(new_sidecar.get("extensions") or {}) + ext["idf_intervals"] = [ + { + "offset": iv.offset, + "tran_peak": iv.peak_count("Tran"), + "tran_halfp": iv.tran_halfp, + "tran_freq": iv.freq_hz("Tran"), + "vert_peak": iv.peak_count("Vert"), + "vert_halfp": iv.vert_halfp, + "vert_freq": iv.freq_hz("Vert"), + "long_peak": iv.peak_count("Long"), + "long_halfp": iv.long_halfp, + "long_freq": iv.freq_hz("Long"), + "mic_peak": iv.peak_count("MicL"), + "mic_halfp": iv.micl_halfp, + "mic_freq": iv.freq_hz("MicL"), + } + for iv in idf_intervals + ] + new_sidecar["extensions"] = ext + + if args.dry_run: + will_write_h5 = (idf_samples or idf_intervals) and not args.skip_hdf5 + log.info("[DRY] %s/%s — would refresh sidecar (bw_report=%s, h5=%s)", + serial, path.name, + "added" if not has_bw_report else "refreshed", + "would write" if will_write_h5 else "skipped") + else: + event_file_io.write_sidecar(sidecar_path, new_sidecar) + log.info("%s/%s — sidecar refreshed (bw_report=%s, intervals=%d)", + serial, path.name, + "added" if not has_bw_report else "refreshed", + len(idf_intervals) if idf_intervals else 0) + refreshed += 1 + + # Regenerate .h5 by replaying the same IdfEvent → Event bridge + # save_imported_idf uses. For IDFW we write the decoded per- + # sample arrays. For IDFH we synthesise a 1-sample-per-interval + # array (peak ADC count per channel per interval) so the + # renderer's bar-chart code has something to group on. + # Pre-condition: either real samples (IDFW) or decoded intervals + # (IDFH). Skip otherwise.
+ have_data = bool(idf_samples) or bool(idf_intervals) + if have_data and not args.skip_hdf5: + from sfm import event_hdf5 + hdf5_path = store.hdf5_path_for(serial, path.name) + if args.dry_run: + log.debug("[DRY] would write %s", hdf5_path.name) + else: + try: + from micromate import IdfEvent + from minimateplus.event_file_io import file_sha256 + idf_event = IdfEvent.from_report(report_dict, path.name) + + # Merge the binary-derived mic peak psi (only the + # binary path knows the proper psi value; the .txt + # carries dB(L)). Without this, the h5 writer's + # per-count mic factor is computed against the + # dB(L) value-as-pseudo-psi and the mic chart + # scales wildly. + if (binary_md is not None and res is not None + and res.event.peaks.mic_pspl_psi is not None): + idf_event.peaks.mic_pspl_psi = res.event.peaks.mic_pspl_psi + + sha256 = file_sha256(path) + waveform_key = bytes.fromhex(sha256)[:16] + ev = idf_event.to_minimateplus_event(waveform_key) + + if is_histogram and idf_intervals: + # 1 sample per interval per channel — same + # synthesis save_imported_idf uses. The h5 + # writer's count×geo_fs/32768 conversion turns + # each peak-ADC-count into the bar's physical + # value. 
+ ev.raw_samples = { + "Tran": [iv.peak_count("Tran") for iv in idf_intervals], + "Vert": [iv.peak_count("Vert") for iv in idf_intervals], + "Long": [iv.peak_count("Long") for iv in idf_intervals], + "MicL": [iv.peak_count("MicL") for iv in idf_intervals], + } + ev.total_samples = ev.total_samples or len(idf_intervals) + elif idf_samples: + ev.raw_samples = idf_samples + n_samp = max( + (len(idf_samples.get(ch, [])) + for ch in ("Tran", "Vert", "Long", "MicL")), + default=0, + ) + ev.total_samples = ev.total_samples or n_samp + + event_hdf5.write_event_hdf5( + hdf5_path, ev, + serial=serial, + geo_range="normal", + source_kind="idf-import", + tool_version=event_file_io.TOOL_VERSION, + ) + h5_written += 1 + log.debug("%s/%s — .h5 written (%s)", + serial, path.name, + f"{len(idf_intervals)} intervals" if is_histogram + else f"{sum(len(v) for v in (idf_samples or {}).values())} samples") + except Exception as exc: + log.warning("%s/%s — .h5 write failed: %s", + serial, path.name, exc) + + log.info("Done. 
refreshed=%d skipped=%d errors=%d h5_written=%d", + refreshed, skipped, errors, h5_written) + return 0 if errors == 0 else 2 + + +if __name__ == "__main__": + sys.exit(main()) diff --git a/scripts/test_thor_render.py b/scripts/test_thor_render.py new file mode 100644 index 0000000..8381cd2 --- /dev/null +++ b/scripts/test_thor_render.py @@ -0,0 +1,91 @@ +"""Re-ingest a prod IDFW + IDFH via the patched save_imported_idf and +render both PDFs to confirm charts have data.""" +from __future__ import annotations +import sys +import json +import datetime +import tempfile +from pathlib import Path + +sys.path.insert(0, str(Path(__file__).resolve().parents[1])) + +from sfm.waveform_store import WaveformStore +from sfm import report_pdf +import h5py + + +class FakeDb: + def __init__(self, event): + self.event = event + def get_event(self, _id): + return self.event + + +def to_ts_iso(ts): + if ts is None: + return None + try: + return datetime.datetime(ts.year, ts.month, ts.day, ts.hour, ts.minute, ts.second).isoformat() + except Exception: + return None + + +def render_case(idf_path: Path, serial: str, out_pdf: Path, h5_summary: bool = True): + with tempfile.TemporaryDirectory() as td: + store = WaveformStore(Path(td)) + ev, rec = store.save_imported_idf( + idf_path.read_bytes(), + idf_path, + idf_report_text=None, # production worst case: no .txt + ) + print(f"=== {idf_path.name} ===") + print(f" h5: {rec['hdf5_filename']}, sidecar: {rec['sidecar_filename']}") + + h5p = Path(td) / serial / f"{idf_path.name}.h5" + if h5p.exists() and h5_summary: + with h5py.File(h5p) as h: + for ch in ("Tran", "Vert", "Long", "MicL"): + ds = h.get(f"samples/{ch}") + if ds is not None: + n = ds.shape[0] + mx = float(abs(ds[...]).max()) if n else 0 + print(f" samples/{ch}: n={n} max_abs={mx:.5f}") + + record_type = "Histogram" if idf_path.suffix.upper() == ".IDFH" else "Waveform" + fake_row = { + "serial": serial, + "blastware_filename": rec["filename"], + "record_type": record_type, + 
"timestamp": to_ts_iso(ev.timestamp), + "sample_rate": ev.sample_rate, + "project": ev.project_info.project if ev.project_info else None, + "client": ev.project_info.client if ev.project_info else None, + "operator": ev.project_info.operator if ev.project_info else None, + "sensor_location": ev.project_info.sensor_location if ev.project_info else None, + "created_at": None, + } + rd = report_pdf.gather_report_data(FakeDb(fake_row), store, event_id="test-1") + print(f" ReportData: channels={ {k: len(v) for k,v in rd.channels.items()} }") + if rd.is_histogram: + print(f" histogram n_intervals={rd.histogram_n_intervals} interval_size={rd.histogram_interval_size}") + pdf = report_pdf.render_event_report_pdf(rd) + out_pdf.write_bytes(pdf) + print(f" PDF: {out_pdf} ({len(pdf)} bytes)") + + +def main(): + out_dir = Path("/tmp/thor_render_test"); out_dir.mkdir(exist_ok=True) + cases = [ + # IDFW that decoded to preamble-only under the old codec + ("/home/serversdown/seismo-relay-prod-snap/waveforms/UM6047/UM6047_20250804154137.IDFW", "UM6047"), + # IDFW that worked under the old codec (validates no regression) + ("/home/serversdown/seismo-relay-prod-snap/waveforms/UM6047/UM6047_20250804104450.IDFW", "UM6047"), + # IDFH histogram + ("/home/serversdown/seismo-relay-prod-snap/waveforms/UM6047/UM6047_20250804190047.IDFH", "UM6047"), + ] + for path, serial in cases: + render_case(Path(path), serial, out_dir / f"{Path(path).name}.pdf") + + +if __name__ == "__main__": + main() diff --git a/sfm/report_pdf.py b/sfm/report_pdf.py index 6618d9a..25859d1 100644 --- a/sfm/report_pdf.py +++ b/sfm/report_pdf.py @@ -638,14 +638,7 @@ def _draw_channel_stats_waveform(ax, rd: ReportData) -> None: ("Sensor Check", "sensor_check", ""), ] _draw_stats_table(ax, rd, rows_spec) - if rd.peak_vector_sum_ips is not None: - line = f"Peak Vector Sum {rd.peak_vector_sum_ips:.3f} in/s" - if rd.peak_vector_sum_time_s is not None: - line += f" At {rd.peak_vector_sum_time_s:.3f} sec." 
- ax.text(0.0, -0.08, line, fontsize=9, weight="bold", - ha="left", va="top", transform=ax.transAxes) - ax.text(0.0, -0.18, "NA: Not Applicable", fontsize=7, color="#888", - ha="left", va="top", transform=ax.transAxes) + _draw_pvs_summary(ax, rd, n_data_rows=len(rows_spec)) def _draw_channel_stats_histogram(ax, rd: ReportData) -> None: @@ -663,20 +656,54 @@ def _draw_channel_stats_histogram(ax, rd: ReportData) -> None: ("Sensor Check", "sensor_check", ""), ] _draw_stats_table(ax, rd, rows_spec) - if rd.peak_vector_sum_ips is not None: - line = f"Peak Vector Sum {rd.peak_vector_sum_ips:.3f} in/s" - # Histograms: "0.091 in/s on May 27, 2026 At 06:06:14" - # The when_str is "HH:MM:SS Month DD, YYYY" — reformat for BW match. - if rd.peak_vector_sum_when_str: - parts = rd.peak_vector_sum_when_str.split(" ", 1) - if len(parts) == 2: - line += f" on {parts[1]} At {parts[0]}" - else: - line += f" on {rd.peak_vector_sum_when_str}" - ax.text(0.0, -0.08, line, fontsize=9, weight="bold", - ha="left", va="top", transform=ax.transAxes) - ax.text(0.0, -0.18, "NA: Not Applicable", fontsize=7, color="#888", - ha="left", va="top", transform=ax.transAxes) + _draw_pvs_summary(ax, rd, n_data_rows=len(rows_spec), histogram_when=True) + + +def _draw_pvs_summary( + ax, + rd: ReportData, + *, + n_data_rows: int, + histogram_when: bool = False, +) -> None: + """Render the Peak Vector Sum + 'NA: Not Applicable' caption below the + stats table. + + Reads ``ax._stats_table_bottom`` (set by ``_draw_stats_table`` when + it pins the table via an explicit ``bbox``) so the PVS line lands + just below the table's known bottom edge instead of guessing at the + geometry. + + Centered horizontally for visual balance (the previous left-aligned + x=0 landed under the label column, not the data, which looked off). 
+ """ + if rd.peak_vector_sum_ips is None: + return + + line = f"Peak Vector Sum {rd.peak_vector_sum_ips:.3f} in/s" + if histogram_when and rd.peak_vector_sum_when_str: + # Histogram absolute date+time. when_str is "HH:MM:SS Month DD, YYYY"; + # reformat to " on At