12 Commits

Author SHA1 Message Date
serversdown d0b66368d5 Merge pull request 'update to v0.21.1, thor data import successful' (#29) from dev into main
Reviewed-on: #29
2026-06-01 16:54:23 -04:00
serversdown 25386cab8b fix(backfill): regenerate IDFH .h5 + merge binary mic_pspl_psi onto bridge
Two gaps in backfill_thor_events.py that left old Thor events showing
stale charts after a v0.21.1 backfill pass:
1. IDFH events were skipped from .h5 regeneration (the "have decoded
   samples" gate was IDFW-only).  Histograms kept their pre-v0.21.1
   .h5 — written from raw_samples = None, which the renderer turned
   into a near-empty bar chart, or for older events the dB(L)-as-pseudo-
   psi mic scale that produced "107.7 psi" peaks (atomic-bomb level
   instead of footstep level).  Fix: synthesise the same 1-sample-per-
   interval array save_imported_idf v0.21.1 uses (peak ADC count per
   channel per interval) so the renderer's bar-chart grouping has
   data to work with.
2. The IDFW h5 path didn't merge binary_peaks.mic_pspl_psi onto the
   IdfEvent before to_minimateplus_event().  The live save_imported_idf
   does this merge — without it, IdfEvent.from_report() only sees the
   .txt's dB(L) value, the bridge falls back to the dBL→psi formula
   (instead of the binary-accurate 2.14e-6 psi/count value), and the
   h5 writer's per-count mic factor lands on a less-correct value.
   Fix: same merge the live ingest does (lift res.event.peaks.mic_pspl_psi
   onto idf_event.peaks before the bridge call).
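
A minimal sketch of the two count-based scale conversions this commit relies on, assuming signed 16-bit ADC counts. The helper names and the `geo_fs` argument are hypothetical; only the `2.14e-6 psi/count` factor and the `count × geo_fs/32768` relation are taken from the commit messages in this PR.

```python
from typing import Sequence

MIC_PSI_PER_COUNT = 2.14e-6  # psi per ADC count, as quoted in this commit
ADC_FULL_SCALE = 32768       # divisor quoted for the geophone conversion


def mic_psi_from_counts(mic_counts: Sequence[int]) -> float:
    """Peak mic pressure in psi: max(|count|) times psi-per-count."""
    if not mic_counts:
        return 0.0
    return max(abs(c) for c in mic_counts) * MIC_PSI_PER_COUNT


def geo_value_from_count(count: int, geo_fs: float) -> float:
    """Scale one per-interval peak count by geo_fs / 32768.

    geo_fs (the channel full-scale value) is supplied by the caller;
    its real source in the codebase is not shown in this PR.
    """
    return count * geo_fs / ADC_FULL_SCALE


print(mic_psi_from_counts([3, -13, 7]))
print(geo_value_from_count(16384, 10.0))
```

A peak of 13 counts lands at roughly 2.78e-5 psi, the same order as the value verified against the production histogram below, whereas reading a dB(L) number such as 107.7 as psi is off by several orders of magnitude.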
Verified against UM6047_20250804190047.IDFH (250-interval prod
histogram): 250 intervals decode, mic_pspl_psi = 2.78e-5 (was being
treated as dB(L)=107.7 in the old h5).
Operator: re-run after deploy.  `docker compose exec sfm python
scripts/backfill_thor_events.py` is idempotent — the existing version
check still skips events already at the new TOOL_VERSION, and review
state + captured_at are preserved on the second pass.
2026-06-01 20:02:54 +00:00
serversdown 6cb619ecc4 version bump - 0.21.1 2026-06-01 19:33:44 +00:00
serversdown 1ed86244d0 fix(thor-events): add parallel field for mic psi. Now shows mic in dbl and psi. (psi for charts) 2026-06-01 18:27:24 +00:00
serversdown b2c565f217 fix(idf_waveforms): _find_waveform_body_offset() — scans every 00 02 00 magic past offset 0x0E00, runs decode_waveform_v2 on each candidate, picks the one that returns the most samples. Validated on 483 prod IDFW files: 0 preamble-only events (was ~50%), 355/483 fully decode, 126/483 partial (BW codec walker-stops-early on loud events — known issue).
IDFH now synthesises a 1-sample-per-interval array from the binary intervals and writes an .h5 so the existing renderer works unchanged. Each "sample" is the per-interval peak ADC count → h5_value = count × geo_fs/32768 yields the right bar height.
2026-05-31 20:51:09 +00:00
serversdown 43f440812a scripts: add backfill_thor_events.py
Refreshes the bw_report sidecar block + .h5 waveform files for Thor
events ingested before the v0.21.0 adapter wiring + the bee1185 codec
fix.  Those events landed with extensions.idf_report only (no
bw_report, no .h5 for IDFW) — symptom on the UI side: the modal chart
404'd on /waveform.json and the PDF rendered from DB-only fields
without sensor self-check, full per-channel breakdown, or mic dB(L).

Walks <store>/<serial>/<filename>:
  - Reads the existing sidecar (preserves review state + captured_at)
  - Re-runs read_idf_file() on the binary bytes (passes data=
    kwarg so codec doesn't try the broken bare-path Path.read_bytes)
  - Reads extensions.idf_report from the existing sidecar
  - Runs build_bw_report_from_idf adapter
  - Writes refreshed sidecar with bw_report + bumped tool_version,
    preserving review block and original captured_at
  - For IDFW: regenerates .h5 by bridging IdfEvent.from_report ->
    to_minimateplus_event -> write_event_hdf5 (mirrors save_imported_idf
    steps 4-7)
  - IDFH events skip .h5 (histograms have no per-sample data)

Skips events already at current TOOL_VERSION with bw_report present.
--force overrides.  --skip-hdf5 limits to sidecar-only refresh.
--dry-run for preview.

Validated against the prod-snap waveform store: 3,815 Thor sidecars
refreshed cleanly with 0 errors, 462 IDFW .h5 files written, 2 skipped
(binaries with no sidecar — backfill doesn't conjure events from
nothing).  Verified one originally-broken IDFW event now serves
waveform.json (200, 168KB) and a fully populated PDF (119KB vs the
previous 56KB sparse output).

Operator workflow on prod:
  docker exec <sfm-container> python3 /app/scripts/backfill_thor_events.py --dry-run
  # Inspect counts, then for real:
  docker exec <sfm-container> python3 /app/scripts/backfill_thor_events.py

Idempotent — re-running it is a no-op once everything's at the current
TOOL_VERSION.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-30 04:37:43 +00:00
serversdown 23e83908c2 report_pdf: fix PVS overlapping stats table, drop NA caption
Three related fixes to the per-channel stats block:

1. Pin the stats table's position via an explicit bbox= on
   ax.table() so the bottom edge is at a known axes-fraction Y.
   The previous loc="upper left" + tbl.scale(1, 1.4) combo let
   matplotlib choose row heights based on text size, which made the
   table extend further below the axes than the hard-coded PVS line
   at y=-0.08 expected.  Result was the "Peak Vector Sum X in/s"
   string landing horizontally inside the Peak Displacement row.

   With bbox=[0, 1-N*0.12, 0.80, N*0.12] the table is pinned to a
   precise rectangle (12% axes-fraction per row × N rows tall).
   _draw_stats_table now stashes the bottom Y on the axes for the
   PVS helper to reference, so the geometry stays in sync.

2. Center PVS horizontally (ha="center" at x=0.5 instead of ha="left"
   at x=0).  The previous left-edge alignment put PVS at the same
   X as the label column, which read as "off-center" once the rest
   of the stats data was column-aligned further right.

3. Drop the "NA: Not Applicable" caption.  It existed to explain
   "—" placeholder cells, but "—" is universally understood and the
   caption was always visually squished against the PVS line below.
   Less cruft on the page; one fewer position to manage.
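
The table geometry in fix 1 can be sketched as plain arithmetic. This is an illustrative helper, not the code in `report_pdf`: the function name is hypothetical, and only the `[0, 1-N*0.12, 0.80, N*0.12]` rectangle comes from the commit message.

```python
from typing import List, Tuple

ROW_HEIGHT = 0.12   # axes-fraction per row, from the commit message
TABLE_WIDTH = 0.80  # axes-fraction width, from the commit message


def stats_table_bbox(n_rows: int) -> Tuple[List[float], float]:
    """Return ([x, y, width, height], bottom_y) for a table pinned to the axes top."""
    height = n_rows * ROW_HEIGHT
    bottom_y = 1.0 - height
    return [0.0, bottom_y, TABLE_WIDTH, height], bottom_y


bbox, bottom_y = stats_table_bbox(5)
print(bbox, bottom_y)
```

The returned list is in the form matplotlib's `ax.table(..., bbox=...)` accepts, and `bottom_y` is the value a caller could store so that the PVS line is positioned relative to the table rather than at a hard-coded Y.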

Verified against a real BE12599 histogram event (5 data rows) and
a real UM12947 IDFW waveform event (6 data rows) — both layouts
clear the table cleanly with no overlap.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-29 22:17:43 +00:00
serversdown bee118506b fix(idf): decode from in-memory bytes during ingest
Bug shipped in v0.21.0: save_imported_idf called read_idf_file()
with `source_path` (a bare filename like "UM12947_….IDFW") BEFORE
writing the binary to disk.  The codec did Path(path).read_bytes()
which resolved relative to /app and hit FileNotFoundError.  The
error was caught + logged as a warning, and ingest fell back to
.txt-only — events still landed in the DB but lost the bw_report
block + .h5 waveform that the codec was supposed to produce.

Observed during a full re-forward from thor-watcher on 2026-05-29:
every Thor event logged "binary codec failed for X: [Errno 2] No
such file or directory" and got binary_decoded=False.

Fix:
- read_idf_file() gains a `data: Optional[bytes]` kwarg.  When
  supplied, skips the disk read and decodes the provided bytes
  directly.  `path` stays required (used for filename in error
  messages + .IDFH vs .IDFW suffix detection); only the read is
  conditional.  Backward compatible — existing positional callers
  (CLI scripts, tests) continue to work unchanged.
- save_imported_idf passes `data=idf_bytes` since the bytes are
  already in memory from the multipart upload.  Filesystem write
  still happens at step 5 of the existing flow; codec just no
  longer depends on it.
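
The "optional in-memory bytes" pattern described in the fix can be sketched as follows. The function name `read_binary` and the example filename are hypothetical; the real `read_idf_file()` does considerably more than return bytes.

```python
from pathlib import Path
from typing import Optional, Union


def read_binary(path: Union[str, Path], data: Optional[bytes] = None) -> bytes:
    """Return file bytes, preferring caller-supplied bytes over a disk read."""
    path = Path(path)  # still needed for suffix detection and error messages
    if data is None:
        data = path.read_bytes()  # only touch the filesystem when no bytes were given
    if path.suffix.upper() not in {".IDFW", ".IDFH"}:
        raise ValueError(f"unsupported suffix for {path.name}")
    return data


# The file does not exist on disk, but passing in-memory bytes still works.
print(len(read_binary("UM00000_example.IDFW", data=b"\x00\x02\x00")))
```

Because `data` defaults to `None`, existing positional callers keep their behavior, which is the backward-compatibility property the commit message calls out.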

Verified end-to-end against UM11719_20231219162723.IDFW from the
example-data corpus: ingest endpoint returns inserted=1, log line
shows binary_decoded=True + h5=...IDFW.h5, no warnings.

Re-forward existing Thor events from thor-watcher after deploy to
backfill the bw_report block — UPSERT preserves review state.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-29 20:09:54 +00:00
serversdown defd17d9c2 sfm_webapp: harmonize "Received by server at" → "Time received"
Matches Terra-View's event-modal relabel from the same iteration.
Wording was already clearer here than in Terra-View's "Captured at",
but using identical text across both surfaces means operators see the
same label whether they're in the native modal or the standalone
webapp.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-29 19:51:58 +00:00
serversdown e42956a20b release: v0.21.0 — Thor / Series IV codec + Thor→BW adapter
Documents two commits that landed on dev since v0.20.0:

  9b71ead  series 4 codec work, initial decode success
           micromate/idf_file.read_idf_file() decodes both IDFW
           (waveform; 87-99% sample fidelity reusing
           decode_waveform_v2 at offset 0x0f1f) and IDFH (histogram;
           dedicated segment-based decoder, all 859 corpus files
           decode, 181,071 intervals total).

  9fd52dd  feat: add thor report generation, pdf generation
           micromate/idf_to_bw_report.py adapter projects parsed
           Thor data into the bw_report sidecar shape so Thor
           events flow through sfm/report_pdf.py without a
           separate renderer.  Wired into save_imported_idf.

Net effect: a Thor event ingested via /db/import/idf_file now
lands with the same fidelity as a BW event, gets a per-event PDF
on demand, and renders in Terra-View's modal chart using the same
plotting code as a BW event.

Roadmap items closed:
- Binary .IDFW / .IDFH codec (was pending)
- Series IV (Thor IDF) binary codec reverse-engineering

Companion: Terra-View v0.13.0 ships in parallel and closes Phase 1
of the SFM integration.  No API changes in seismo-relay for that
piece — Terra-View just consumes existing endpoints better.

Bumps:
- pyproject.toml 0.20.0 → 0.21.0
- minimateplus.event_file_io.TOOL_VERSION 0.20.0 → 0.21.0
  (any subsequent backfill_sidecars.py --force will re-stamp
  existing sidecars; expected + harmless)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-29 19:25:44 +00:00
serversdown 9fd52ddabb feat: add thor report generation, pdf generation. 2026-05-29 19:03:06 +00:00
serversdown 9b71ead44b series 4 codec work, initial decode success 2026-05-29 06:33:13 +00:00
36 changed files with 2913 additions and 116 deletions
+91
@@ -8,6 +8,97 @@ All notable changes to seismo-relay are documented here.
---
## v0.21.1 — 2026-06-01
Bug fixes against v0.21.0 surfaced after the first prod redeploy. Three
production-visible symptoms — blank waveform charts on most Thor events,
blank histogram charts on all Thor events, and a mic chart that
auto-scaled against a dB(L) value treated as psi — all root-caused and
fixed.
### Fixed
- **Dynamic IDFW body offset.** The v0.21.0 codec hardcoded the body
at file offset `0x0f1f` based on the example corpus, but only ~52%
of production IDFW events use that offset; the rest sit at offsets
from `0x1033` up to `0x3082` depending on header padding. At
`0x0f1f` the codec would find a coincidentally-matching `00 02 00`
magic, read the 2-byte Tran preamble, and return empty V/L/M
arrays — producing near-empty .h5 files and blank charts.
`micromate.idf_file._find_waveform_body_offset()` now scans every
`00 02 00` magic position past `0x0E00`, trial-decodes each one,
and picks the offset with the most samples. Validated across 483
prod IDFW files: 0 preamble-only events (was ~50%), 355/483 fully
decode, 126/483 partial (BW codec walker-stops-early on loud
events — pre-existing limitation, samples reached are correct).
- **IDFH histograms now render bar charts.** Histograms previously
skipped the .h5 write because there are no per-sample arrays, but
the renderer drives the per-interval bar chart from .h5 channel
data + `bw_report.histogram.n_intervals`. `save_imported_idf` now
synthesizes a 1-sample-per-interval array from the decoded
`IdfhInterval` peak counts and writes an .h5 so the existing
renderer works unchanged — each "sample" is the per-interval peak
ADC count, so the writer's `count × geo_fs/32768` conversion
yields the right bar height.
- **Mic chart scaling on Thor events.** `PeakValues.micl` (consumed
by the h5 writer's per-count mic scale factor) expects psi, but
the Thor bridge was stuffing the dB(L) value (~99.4) into it,
producing a per-count factor 5+ orders of magnitude too large and
a flat-looking mic chart. Fixed by adding `IdfPeaks.mic_pspl_psi`
alongside `mic_pspl_dbl`; `read_idf_file()` computes it from
binary mic counts (`max(|MicL|) × 2.14e-6 psi/count`) for both
IDFW and IDFH paths; `save_imported_idf` merges it onto the typed
event after `IdfEvent.from_report`; the bridge feeds psi to
`PeakValues.micl` with a dB(L)→psi formula fallback when only the
dB(L) value is available. dB(L) for the report header still
flows through `bw_report.mic.pspl_dbl` unchanged.
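
A minimal sketch of the scan-and-trial-decode idea behind the dynamic body offset fix. The decoder is passed in as a callable and the toy decoder below is purely illustrative; the real `_find_waveform_body_offset()` and `decode_waveform_v2()` signatures are not shown in this PR, so treat every name here as an assumption.

```python
from typing import Callable, Dict, List, Optional

MAGIC = b"\x00\x02\x00"
SCAN_START = 0x0E00

Decoder = Callable[[bytes], Dict[str, List[int]]]


def find_body_offset(buf: bytes, decode: Decoder) -> Optional[int]:
    """Trial-decode at every MAGIC position from SCAN_START on; keep the best."""
    best_offset, best_count = None, 0
    pos = buf.find(MAGIC, SCAN_START)
    while pos != -1:
        try:
            channels = decode(buf[pos:])
            count = sum(len(samples) for samples in channels.values())
        except Exception:
            count = 0  # a candidate that fails to decode simply loses
        if count > best_count:
            best_offset, best_count = pos, count
        pos = buf.find(MAGIC, pos + 1)
    return best_offset


def toy_decode(body: bytes) -> Dict[str, List[int]]:
    # Stand-in decoder: the byte after the magic says how many samples follow.
    n = body[3] if len(body) > 3 else 0
    return {"Tran": list(body[4:4 + n])}


buf = bytearray(0x1100)
buf[0x0F1F:0x0F1F + 4] = MAGIC + b"\x00"      # preamble-only candidate
buf[0x1033:0x1033 + 4] = MAGIC + b"\x05"      # candidate with 5 samples
buf[0x1037:0x103C] = b"\x01\x02\x03\x04\x05"
print(hex(find_body_offset(bytes(buf), toy_decode)))
```

Picking the candidate with the most decoded samples means a coincidental magic match that yields only a preamble cannot win over the real body.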
### Operator
After deploy, run `python scripts/backfill_thor_events.py` to refresh
every existing Thor event's sidecar + .h5 with the corrected codec
output. The script auto-skips events already at the current
`TOOL_VERSION`, so the bump from `0.21.0` → `0.21.1` is what triggers
the refresh.
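
The version-gated skip that makes the backfill idempotent can be sketched like this. The sidecar key names (`tool_version`, `bw_report`) follow the wording of the commit messages in this PR, but the exact sidecar layout is an assumption and the function name is hypothetical.

```python
from typing import Any, Dict

TOOL_VERSION = "0.21.1"


def needs_refresh(sidecar: Dict[str, Any], force: bool = False) -> bool:
    """True when a backfill pass should rewrite this sidecar."""
    if force:
        return True  # corresponds to the script's --force flag
    up_to_date = sidecar.get("tool_version") == TOOL_VERSION
    return not (up_to_date and "bw_report" in sidecar)


print(needs_refresh({"tool_version": "0.21.0", "bw_report": {}}))
print(needs_refresh({"tool_version": "0.21.1", "bw_report": {}}))
```

With a check of this shape, a second run finds every sidecar already stamped with the current version and does nothing.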
---
## v0.21.0 — 2026-05-29
The "Thor / Series IV codec" release. Two big pieces landed: (1) the IDF binary codec actually decodes now, both IDFW and IDFH, and (2) a Thor→BW adapter lets Thor events flow through the existing Series III Event Report PDF pipeline. Combined effect: a Thor event ingested via `/db/import/idf_file` now lands in the DB with the same fidelity as a Blastware event, gets a per-event PDF on demand, and renders in Terra-View's modal chart with the same plotting code as a BW event.
### Added — Thor IDF binary codec (`micromate/idf_file.read_idf_file`)
- **IDFW (waveform)** — body sits at fixed file offset `0x0f1f`; reuses the verified `decode_waveform_v2()` walker from `minimateplus.waveform_codec`. Sample fidelity is **87-99% byte-exact** against the ASCII-sidecar reference values on quiet events; loud events hit the same walker-stops-early limitation as the BW codec on `SP0/SS0/SV0`-style events.
- **IDFH (histogram)** — dedicated segment-based decoder for the Thor histogram body format: `[len_be][0a 00 00 00][00 NN][05 3f]` framing plus N × 72-byte interval records (4 × 16-byte per-channel min/max/halfp). **All 859 Thor IDFH corpus files decode**, totalling **181,071 intervals**; per-channel peaks match the sidecar within **~1.8% (ADC quantization)**.
- **BW-aliased binary detection** — a small number of corpus files (e.g. `BE9439_*.IDFW/IDFH`) are actually Series III Blastware binaries that share the IDF filename convention by accident. `read_idf_file()` detects them via their BW `STRT` signature and raises `NotImplementedError` pointing the caller at `read_blastware_file()` instead of trying to decode them as IDF.
- Full field layouts in `docs/idf_protocol_reference.md`; supporting analysis scripts in `analysis_idf/` (decode validators, per-file detail dumps, corpus accuracy reports).
### Added — Thor → BW report adapter (`micromate/idf_to_bw_report.py`)
- **`build_bw_report_from_idf(report_dict, binary_md=, intervals=, is_histogram=)`** projects a parsed Thor `IdfReport` plus binary-extracted metadata plus decoded IDFH intervals into the `bw_report`-shaped dict that `sfm.report_pdf.gather_report_data` consumes. No need to duplicate the renderer — Thor data is ~95% the same metric set as BW; the adapter handles the field-name mapping (`MicPSPL` → `pspl_dbl`, `>100` sentinel → `zc_freq_above_range`, free-form `Calibration : Nov 22, 2023 by Instantel` → `calibration_date` + `calibration_by`, etc.).
- For IDFH events the adapter derives `histogram.interval_times` by stepping `IntervalSize` from `HistogramStartTime`, matching what the BW pipeline expects from a histogram-mode event.
- **Wired into `WaveformStore.save_imported_idf`** — every Thor event ingested via `/db/import/idf_file` now gets a `bw_report` block in its sidecar in addition to the existing `extensions.idf_report` (the raw parsed Thor payload). Falls back gracefully (PDF renders from DB-only fields) if the adapter raises — logged as a warning rather than failing the ingest.
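
Deriving `histogram.interval_times` by stepping a fixed interval from the start time can be sketched as below. The function name and the 60-second interval are illustrative assumptions, and whether each timestamp marks the start or the end of its interval is not stated in this changelog.

```python
from datetime import datetime, timedelta
from typing import List


def interval_times(start: datetime, interval_seconds: float, n_intervals: int) -> List[datetime]:
    """Step a fixed interval from the histogram start time, one stamp per interval."""
    return [start + timedelta(seconds=i * interval_seconds) for i in range(n_intervals)]


start = datetime(2025, 8, 4, 19, 0, 47)
for t in interval_times(start, 60.0, 3):
    print(t.isoformat())
```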
### Companion releases
- **Terra-View v0.13.0** ships in parallel — closes Phase 1 of the SFM integration. The shared event-detail modal now renders the SFM event story (Chart.js waveform/histogram chart, inline PDF preview, `.TXT` download, FT/reviewer/notes review form) without operators needing to bounce to the standalone SFM webapp on port 8200. Uses only existing seismo-relay endpoints — no API changes here, just better consumption.
### Migration / Operations
No DB migration needed. Existing Thor events already in the store don't automatically pick up the new `bw_report` block — they'd need a re-ingest (post the IDF binary + paired `.TXT` back to `/db/import/idf_file`) for the adapter to run. Alternatively, run `scripts/backfill_sidecars.py --reparse-txt` after a small adapter change (the script currently only re-runs the BW ASCII parser; extending it to handle Thor would be a small follow-up).
```bash
cd /home/serversdown/terra-view
docker compose build sfm && docker compose up -d sfm
```
The bumped `TOOL_VERSION = "0.21.0"` in `minimateplus/event_file_io.py` means any subsequent `backfill_sidecars.py --force` pass will re-write sidecars with the new version stamp; that's expected and harmless.
---
## v0.20.0 — 2026-05-28
The "PDF + parser polish" release. Closes out the Event-Report PDF iteration started in v0.17.x: histogram layouts now render correctly against BW reference PDFs, the ASCII parser handles the real-world edge cases production events were tripping over (OORANGE, `>100 Hz`, histogram timestamps), and the `.TXT` preservation rollout lets parser fixes be applied retroactively to ingested events. Adds server-wide timezone support so operator-visible timestamps no longer drift into UTC. Rolls up the substantial "pre-v0.20" body of work that had accumulated under `[Unreleased]` (PDF generation, histogram codec fix, histogram parser fields, `.TXT` preservation, backfill safety) — see the trailing "pre-v0.20.0 work" section below for the full list.
+23 -1
@@ -2,7 +2,7 @@
Ground-up Python replacement for **Blastware**, Instantel's Windows-only software for
managing MiniMate Plus seismographs. Connects over direct RS-232 or cellular modem
(Sierra Wireless RV50 / RV55). Current version: **v0.20.0**. (Sierra Wireless RV50 / RV55). Current version: **v0.21.0**.
When new information about the protocol is discovered, please update the instantel_protocol_reference.md with the findings in addition to this document
@@ -73,6 +73,28 @@ should not import from `sfm/`, must not touch a DB, and have no I/O
beyond reading files passed as arguments. Keep them pure — both
tiers can then depend on them without circularity.
#### Thor IDF binary codec (2026-05-28)
`micromate/idf_file.read_idf_file()` decodes both Thor IDFW
(waveform) and IDFH (histogram) binaries.
- **IDFW** reuses `decode_waveform_v2()` on the body at fixed file
offset `0x0f1f`. Sample fidelity is 87-99% byte-exact on quiet
events; loud events hit the BW codec's known walker-stops-early
limitation.
- **IDFH** has its own segment-based decoder: `[len_be][0a 00 00 00]
[00 NN][05 3f]` + N × 72-byte interval records (4 × 16-byte
per-channel min/max/halfp). All 859 Thor IDFH corpus files
decode (181,071 intervals); peak matches sidecar within ~1.8%
(ADC quantization).
The two outlier `BE9439_*` files in the Thor example corpus are
actually Series III Blastware binaries that share the `.IDFW`/`.IDFH`
filename convention by accident. `read_idf_file()` detects them by
their BW STRT signature and raises NotImplementedError pointing
callers at `read_blastware_file()`. See
`docs/idf_protocol_reference.md` for full field layouts.
### Practical consequences
When deciding where new code goes, ask:
+14 -4
@@ -1,4 +1,4 @@
# seismo-relay `v0.20.0` # seismo-relay `v0.21.0`
A ground-up replacement for **Blastware** — Instantel's aging Windows-only
software for managing seismographs. Supports both the **MiniMate Plus
@@ -45,6 +45,15 @@ over direct RS-232 or cellular modem (Sierra Wireless RV50 / RV55).
> `scripts/backfill_sidecars.py --reparse-txt` lets parser fixes be
> applied retroactively to existing events without re-forwarding,
> using the `.TXT` files preserved at ingest time.
> **v0.21.0 (2026-05-29)** is the Thor / Series IV decoder release —
> `micromate/idf_file.read_idf_file()` now decodes both IDFW
> (waveform) and IDFH (histogram) binaries (87-99% sample fidelity
> on quiet IDFW events; all 859 IDFH corpus files decode cleanly).
> A new `micromate/idf_to_bw_report.py` adapter projects parsed
> Thor reports into the BW-shaped sidecar block, so Thor events
> flow through the existing Event Report PDF pipeline without a
> separate renderer. Terra-View v0.13.0 ships in parallel and
> closes Phase 1 of the SFM integration — see its CHANGELOG.
> See [CHANGELOG.md](CHANGELOG.md) for full version history.
---
@@ -68,7 +77,8 @@ seismo-relay/
├── micromate/ ← Series IV (Micromate / Thor) client library (NEW v0.19)
│ ├── models.py ← IdfEvent, IdfReport, IdfPeaks, IdfProjectInfo, IdfSensorCheck (mic in native dB(L))
│ ├── idf_ascii_report.py ← Parse Thor .IDFW.txt / .IDFH.txt event sidecars
── idf_file.py ← Stub for the .IDFW / .IDFH binary codec (reverse-engineering pending) ── idf_file.py ← Binary codec for .IDFW + .IDFH (v0.21.0+)
│ └── idf_to_bw_report.py ← Adapter projecting Thor IDF into the BW report shape (v0.21.0+)
├── sfm/ ← SFM REST API server (FastAPI, port 8200)
│ ├── server.py ← Live device endpoints + DB query + ingest endpoints + caching
@@ -425,7 +435,7 @@ Use **com0com** or **VSPD** to create the virtual COM pair on Windows.
- [x] Thor IDF file ingest at `/db/import/idf_file` (paired with `thor-watcher`, v0.18.0+)
- [x] Native `IdfEvent` / `IdfReport` typed models — mic in dB(L), full title strings, sensor self-check, calibration, firmware version
- [x] Parser verified against 1,014 paired `.txt` sidecars in `thor-watcher/example-data/`
- [ ] Binary `.IDFW` / `.IDFH` codec — pending (see Roadmap + [`docs/idf_protocol_reference.md`](docs/idf_protocol_reference.md)) - [x] Binary `.IDFW` / `.IDFH` codec — ✅ v0.21.0. IDFW reuses `decode_waveform_v2()` on the body at offset `0x0f1f` (87-99% sample fidelity on quiet events); IDFH has a dedicated segment-based decoder (all 859 corpus files decode, 181,071 intervals total). See `micromate/idf_file.py` + `docs/idf_protocol_reference.md`.
- [ ] Live-device protocol — pending codec
**Data persistence:**
@@ -538,7 +548,7 @@ Implementation steps (concrete):
### High-impact (unblocks product features)
- [ ] **Series III waveform body codec reverse-engineering.** The 5A bulk-stream body is some kind of compressed/encoded format (not raw int16 LE as previously assumed — see §7.6.1 retraction in `docs/instantel_protocol_reference.md`). Structural framing is ~50% decoded on branch `claude/codec-re-cBGNe` (tagged-block walker, segment counters); per-byte sample mapping is still open. Until this lands, the in-app waveform viewer renders garbage and BW-import peak values fall back to `_peaks_from_samples()` saturation noise. Workaround: pair every BW-imported event with its `_ASCII.TXT` so the device-authoritative peaks land in the DB regardless of codec.
- [ ] **Series IV (Thor IDF) binary codec reverse-engineering.** `.IDFH` / `.IDFW` files are currently stored opaquely by `WaveformStore.save_imported_idf`, with all metadata sourced from the paired `.txt` sidecar. This works because thor-watcher forwards both files together, but operators who haven't enabled Thor's TXT exporter get rows with NULL peaks. Cracking the binary closes that gap and unlocks waveform display. Starting-point reference at [`docs/idf_protocol_reference.md`](docs/idf_protocol_reference.md) — two observed file signatures (1,012 newer-firmware files + 2 old files whose layout matches the Series III STRT-record format), suggested first-session plan (~2-4 hrs), 1,014 paired binary+txt files available as ground truth in `thor-watcher/example-data/`. Code seam ready at `micromate/idf_file.py`. - [x] **Series IV (Thor IDF) binary codec reverse-engineering.** ✅ v0.21.0 — `micromate/idf_file.read_idf_file()` decodes both IDFW (waveform body at offset `0x0f1f`, reusing `decode_waveform_v2()`; 87-99% sample fidelity on quiet events) and IDFH (dedicated segment-based decoder: all 859 corpus files decode, 181,071 intervals, peaks within ~1.8% of sidecar values). `WaveformStore.save_imported_idf` now also projects parsed Thor data into a `bw_report` block via `micromate/idf_to_bw_report.py` so Thor events render in the existing Event Report PDF pipeline without a separate renderer.
- [ ] **In-app waveform viewer accuracy.** Depends on Series III codec decode. Plot.v1 JSON pipeline + viewer skeleton already exist; will start showing real waveforms automatically once `_decode_a5_waveform` produces correct samples. Series IV waveforms come online when the IDF codec lands.
- [ ] **Series IV live-device support.** Once the IDF binary is decoded, extend `micromate/` with `transport.py` / `framing.py` / `protocol.py` / `client.py` mirroring the `minimateplus/` package layout — depends on capturing Thor's wire protocol (TCP / RS-232 captures TBD).
- [ ] **Terra-view integration** — seismo-relay router, unit detail page, VISON-style event listing.
+65
@@ -0,0 +1,65 @@
"""Run read_idf_file across the corpus and report per-channel accuracy vs sidecars."""
from __future__ import annotations
import sys
from pathlib import Path

REPO = Path(__file__).resolve().parents[1]
sys.path.insert(0, str(REPO))
from micromate.idf_file import read_idf_file
from analysis_idf.recon import load_sidecar_samples


def sidecar_path(idfw: Path) -> Path:
    return idfw.parent / "TXT" / f"{idfw.name}.txt"


def main():
    root = REPO / "tests/fixtures/THORDATA_example"
    files = [f for f in root.rglob("*.IDFW") if not str(f).endswith(".CDB")]
    files.sort()
    GEO_LSB = 0.0003
    n_ok = n_skip = 0
    overall = {"Tran": [], "Vert": [], "Long": []}
    for f in files:
        try:
            res = read_idf_file(f)
        except Exception:
            n_skip += 1
            continue
        sc_path = sidecar_path(f)
        if not sc_path.exists():
            n_skip += 1
            continue
        try:
            sc = load_sidecar_samples(sc_path)
        except Exception:
            n_skip += 1
            continue
        per_file = {}
        for ch in ("Tran", "Vert", "Long"):
            sc_counts = [int(round(v / GEO_LSB)) for v in sc[ch]]
            dec = res.samples.get(ch, [])
            n = min(len(sc_counts), len(dec))
            if n == 0:
                per_file[ch] = 0.0
                continue
            exact = sum(1 for i in range(n) if sc_counts[i] == dec[i])
            pct = 100.0 * exact / n
            per_file[ch] = pct
            overall[ch].append(pct)
        n_ok += 1
    print(f"Processed {n_ok} files (skipped {n_skip})")
    print("Per-channel exact-match % (mean / min / max):")
    for ch, vals in overall.items():
        if vals:
            avg = sum(vals) / len(vals)
            print(f" {ch}: mean={avg:.2f}% min={min(vals):.2f}% max={max(vals):.2f}% n={len(vals)}")


if __name__ == "__main__":
    main()
+49
@@ -0,0 +1,49 @@
"""Find where decoded-vs-sidecar diverges for each channel."""
from __future__ import annotations
import sys
from pathlib import Path
REPO = Path(__file__).resolve().parents[1]
sys.path.insert(0, str(REPO))
from minimateplus.waveform_codec import decode_waveform_v2
from analysis_idf.recon import TARGET, TXT, load_sidecar_samples
def main():
buf = TARGET.read_bytes()
sc = load_sidecar_samples(TXT)
decoded = decode_waveform_v2(buf[0x0f1f:])
GEO_LSB = 0.0003
for ch in ("Tran", "Vert", "Long"):
sc_counts = [int(round(v / GEO_LSB)) for v in sc[ch]]
dec = decoded[ch]
# Find ALL transitions where mismatches start/stop
first_diff = next((i for i in range(len(dec)) if dec[i] != sc_counts[i]), None)
if first_diff is None:
print(f"{ch}: NO MISMATCHES")
continue
print(f"{ch}: first diff at idx {first_diff}")
# Show 5 before, 5 after
for i in range(max(0, first_diff - 3), min(len(dec), first_diff + 8)):
mark = " " if dec[i] == sc_counts[i] else "**"
print(f" {mark} idx {i:4d}: sc={sc_counts[i]:6d} dec={dec[i]:6d} diff={dec[i]-sc_counts[i]:+d}")
# Count mismatches and track the longest run of consecutive matches
cum_match_run = 0
max_match_run = 0
diff_count = 0
for i in range(len(dec)):
if dec[i] == sc_counts[i]:
cum_match_run += 1
max_match_run = max(max_match_run, cum_match_run)
else:
cum_match_run = 0
diff_count += 1
print(f" total mismatches: {diff_count}/{len(dec)}, longest run of matches: {max_match_run}")
print()
if __name__ == "__main__":
main()
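The divergence summary above boils down to three numbers per channel: the first mismatching index, the longest run of consecutive matches, and the mismatch count. A minimal sketch on plain lists, assuming a hypothetical helper name `first_diff_and_longest_run` that does not exist in the repository:

```python
def first_diff_and_longest_run(expected, actual):
    """Return (first mismatching index or None, longest match run, mismatch count)."""
    n = min(len(expected), len(actual))  # never index past the shorter list
    first_diff = None
    run = longest = mismatches = 0
    for i in range(n):
        if expected[i] == actual[i]:
            run += 1
            longest = max(longest, run)
        else:
            if first_diff is None:
                first_diff = i
            run = 0
            mismatches += 1
    return first_diff, longest, mismatches


result = first_diff_and_longest_run([1, 2, 3, 4, 5, 6, 7], [1, 2, 9, 4, 5, 6, 0])
print(result)
```

Bounding the loop by the shorter list avoids an IndexError when the decoder emits more samples than the sidecar contains.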
+48
@@ -0,0 +1,48 @@
"""End-to-end IDFH ingest verification."""
from __future__ import annotations
import sys
import tempfile
import json
from pathlib import Path
REPO = Path(__file__).resolve().parents[1]
sys.path.insert(0, str(REPO))
from sfm.waveform_store import WaveformStore
def main():
idfh = REPO / "tests/fixtures/THORDATA_example/THORDATA_example/UPMC Presby/UM13981/UM13981_20220805075441.IDFH"
txt = idfh.parent / "TXT" / f"{idfh.name}.txt"
with tempfile.TemporaryDirectory() as td:
store = WaveformStore(Path(td))
ev, rec = store.save_imported_idf(
idfh.read_bytes(),
idfh,
idf_report_text=txt.read_text(errors="replace"),
)
print("=== save_imported_idf (IDFH) ===")
print(f" serial: {rec['serial']}")
print(f" filename: {rec['filename']}")
print(f" filesize: {rec['filesize']}")
print(f" h5: {rec['hdf5_filename']}") # expect None for histogram
print(f" sidecar: {rec['sidecar_filename']}")
print()
print("=== Event ===")
print(f" timestamp: {ev.timestamp}")
print(f" record_type: {ev.record_type}")
print(f" sample_rate: {ev.sample_rate}")
print()
# Inspect sidecar to confirm intervals were stashed
sc_path = Path(td) / "UM13981" / f"{idfh.name}.sfm.json"
sc = json.loads(sc_path.read_text())
intervals = sc.get("extensions", {}).get("idf_intervals", [])
print(f" sidecar intervals: {len(intervals)}")
if intervals:
print(f" first interval: {intervals[0]}")
print(f" last interval: {intervals[-1]}")
if __name__ == "__main__":
main()
+40
@@ -0,0 +1,40 @@
"""Verify the had_report=False path: ingest IDFW with no .txt."""
from __future__ import annotations
import sys
from pathlib import Path
import tempfile
REPO = Path(__file__).resolve().parents[1]
sys.path.insert(0, str(REPO))
from sfm.waveform_store import WaveformStore
def main():
idfw = REPO / "tests/fixtures/THORDATA_example/THORDATA_example/UPMC Presby/UM11719/UM11719_20231219162723.IDFW"
with tempfile.TemporaryDirectory() as td:
store = WaveformStore(Path(td))
ev, rec = store.save_imported_idf(
idfw.read_bytes(),
idfw,
serial_hint=None,
idf_report_text=None, # ← no .txt!
)
print("=== IDFW without .txt ingest ===")
print(f" serial: {rec['serial']}")
print(f" timestamp: {ev.timestamp}")
print(f" sample_rate: {ev.sample_rate}")
print(f" record_type: {ev.record_type}")
print(f" rectime_sec: {ev.rectime_seconds}")
nT = len(ev.raw_samples.get('Tran', [])) if ev.raw_samples else 0
nV = len(ev.raw_samples.get('Vert', [])) if ev.raw_samples else 0
nL = len(ev.raw_samples.get('Long', [])) if ev.raw_samples else 0
nM = len(ev.raw_samples.get('MicL', [])) if ev.raw_samples else 0
print(f" raw_samples: Tran={nT} Vert={nV} Long={nL} MicL={nM}")
if ev.peak_values:
print(f" peak_values: tran={ev.peak_values.tran} vert={ev.peak_values.vert} long={ev.peak_values.long}")
print(f" h5 written: {rec['hdf5_filename']}")
if __name__ == "__main__":
main()
+102
@@ -0,0 +1,102 @@
"""End-to-end Thor report PDF rendering.
Ingests an IDFW + .txt via save_imported_idf, runs gather_report_data
(faking a minimal DB row), and renders the PDF to disk.
"""
from __future__ import annotations
import sys
import tempfile
import json
from pathlib import Path
REPO = Path(__file__).resolve().parents[1]
sys.path.insert(0, str(REPO))
from sfm.waveform_store import WaveformStore
from sfm import report_pdf
class FakeDb:
"""Stand-in for SeismoDb.get_event(); the renderer only needs a few cols."""
def __init__(self, event):
self.event = event
def get_event(self, _id):
return self.event
def main():
base = REPO / "tests/fixtures/THORDATA_example/THORDATA_example/UPMC Presby/UM11719"
idfw = base / "UM11719_20231219162723.IDFW"
txt = base / "TXT" / f"{idfw.name}.txt"
with tempfile.TemporaryDirectory() as td:
store = WaveformStore(Path(td))
ev, rec = store.save_imported_idf(
idfw.read_bytes(),
idfw,
idf_report_text=txt.read_text(errors="replace"),
)
print(f"save_imported_idf: h5={rec['hdf5_filename']}, sidecar={rec['sidecar_filename']}")
# Verify sidecar has bw_report block
sc_path = Path(td) / "UM11719" / f"{idfw.name}.sfm.json"
sc = json.loads(sc_path.read_text())
bw = sc.get("bw_report", {})
print(f" bw_report.available: {bw.get('available')}")
print(f" bw_report.peaks.tran.ppv_ips: {bw.get('peaks', {}).get('tran', {}).get('ppv_ips')}")
print(f" bw_report.mic.pspl_dbl: {bw.get('mic', {}).get('pspl_dbl')}")
print(f" bw_report.histogram.n_intervals: {bw.get('histogram', {}).get('n_intervals')}")
# Build a DB-row-shaped dict from the Event for gather_report_data
import datetime
ts = ev.timestamp
ts_iso = None
if ts is not None:
try:
ts_iso = datetime.datetime(ts.year, ts.month, ts.day, ts.hour, ts.minute, ts.second).isoformat()
except Exception:
pass
fake_row = {
"serial": "UM11719",
"blastware_filename": rec["filename"],
"record_type": "Waveform",
"timestamp": ts_iso,
"sample_rate": ev.sample_rate,
"project": ev.project_info.project if ev.project_info else None,
"client": ev.project_info.client if ev.project_info else None,
"operator": ev.project_info.operator if ev.project_info else None,
"sensor_location": ev.project_info.sensor_location if ev.project_info else None,
"created_at": None,
}
rd = report_pdf.gather_report_data(FakeDb(fake_row), store, event_id="test-1")
print()
print("=== ReportData ===")
print(f" event_id: {rd.event_id}")
print(f" serial: {rd.serial}")
print(f" record_type: {rd.record_type}")
print(f" event_datetime: {rd.event_datetime_str}")
print(f" trigger: {rd.trigger_source}")
print(f" geo_range: {rd.geo_range_str}")
print(f" sample_rate: {rd.sample_rate_str}")
print(f" firmware: {rd.firmware}")
print(f" calibration: {rd.calibration_date} by {rd.calibration_by}")
print(f" battery: {rd.battery_volts}")
print(f" PVS: {rd.peak_vector_sum_ips} in/s at {rd.peak_vector_sum_time_s} sec")
print(f" mic_pspl_dbl: {rd.mic_pspl_dbl}")
print(f" mic_zc_freq_hz: {rd.mic_zc_freq_hz}")
print(f" channel_stats: {len(rd.channel_stats)} rows")
for cs in rd.channel_stats:
print(f" {cs['name']}: PPV={cs['ppv_ips']} ZC={cs['zc_freq_hz']} ToP={cs['time_of_peak_s']} Acc={cs['peak_accel_g']} Disp={cs['peak_disp_in']} Test={cs['sensor_check']}")
# Render the PDF
out_path = REPO / "analysis_idf" / "thor_report.pdf"
pdf_bytes = report_pdf.render_event_report_pdf(rd)
out_path.write_bytes(pdf_bytes)
print()
print(f" PDF written: {out_path} ({len(pdf_bytes)} bytes)")
if __name__ == "__main__":
main()
+91
@@ -0,0 +1,91 @@
"""End-to-end Thor IDFH histogram report PDF rendering."""
from __future__ import annotations
import sys
import tempfile
import json
import datetime
from pathlib import Path
REPO = Path(__file__).resolve().parents[1]
sys.path.insert(0, str(REPO))
from sfm.waveform_store import WaveformStore
from sfm import report_pdf
class FakeDb:
def __init__(self, event):
self.event = event
def get_event(self, _id):
return self.event
def main():
# Use the multi-interval IDFH (81 + trigger row)
idfh = REPO / "tests/fixtures/THORDATA_example/THORDATA_example/UPMC Presby/UM13981/UM13981_20220805075441.IDFH"
txt = idfh.parent / "TXT" / f"{idfh.name}.txt"
with tempfile.TemporaryDirectory() as td:
store = WaveformStore(Path(td))
ev, rec = store.save_imported_idf(
idfh.read_bytes(),
idfh,
idf_report_text=txt.read_text(errors="replace"),
)
print(f"save_imported_idf: h5={rec['hdf5_filename']}, sidecar={rec['sidecar_filename']}")
sc_path = Path(td) / "UM13981" / f"{idfh.name}.sfm.json"
sc = json.loads(sc_path.read_text())
bw = sc.get("bw_report", {})
hist = bw.get("histogram", {})
print(f" bw_report.histogram.start: {hist.get('start')}")
print(f" bw_report.histogram.stop: {hist.get('stop')}")
print(f" bw_report.histogram.n_intervals: {hist.get('n_intervals')}")
print(f" bw_report.histogram.interval_size: {hist.get('interval_size')}")
print(f" bw_report.histogram.interval_size_s: {hist.get('interval_size_s')}")
print(f" bw_report.peaks.tran.ppv_ips: {bw.get('peaks', {}).get('tran', {}).get('ppv_ips')}")
ts = ev.timestamp
ts_iso = None
if ts is not None:
try:
ts_iso = datetime.datetime(ts.year, ts.month, ts.day, ts.hour, ts.minute, ts.second).isoformat()
except Exception:
pass
fake_row = {
"serial": "UM13981",
"blastware_filename": rec["filename"],
"record_type": "Histogram",
"timestamp": ts_iso,
"sample_rate": ev.sample_rate,
"project": ev.project_info.project if ev.project_info else None,
"client": ev.project_info.client if ev.project_info else None,
"operator": ev.project_info.operator if ev.project_info else None,
"sensor_location": ev.project_info.sensor_location if ev.project_info else None,
"created_at": None,
}
rd = report_pdf.gather_report_data(FakeDb(fake_row), store, event_id="hist-1")
print()
print("=== ReportData (histogram) ===")
print(f" is_histogram: {rd.is_histogram}")
print(f" histogram_start: {rd.histogram_start_str}")
print(f" histogram_stop: {rd.histogram_stop_str}")
print(f" histogram_n_intervals: {rd.histogram_n_intervals}")
print(f" histogram_interval_size:{rd.histogram_interval_size}")
print(f" histogram_interval_times[:3]: {rd.histogram_interval_times[:3]}")
print(f" histogram_interval_times[-2:]: {rd.histogram_interval_times[-2:]}")
print(f" channel_stats: {len(rd.channel_stats)} rows")
for cs in rd.channel_stats:
print(f" {cs['name']}: PPV={cs['ppv_ips']} ZC={cs['zc_freq_hz']} peak_date={cs['peak_date']} peak_time={cs['peak_time']}")
pdf_bytes = report_pdf.render_event_report_pdf(rd)
out_path = REPO / "analysis_idf" / "thor_report_idfh.pdf"
out_path.write_bytes(pdf_bytes)
print()
print(f" PDF written: {out_path} ({len(pdf_bytes)} bytes)")
if __name__ == "__main__":
main()
+52
@@ -0,0 +1,52 @@
"""End-to-end ingest test: feed an IDFW + .txt to save_imported_idf in a tmp store."""
from __future__ import annotations
import sys
from pathlib import Path
import tempfile
REPO = Path(__file__).resolve().parents[1]
sys.path.insert(0, str(REPO))
from sfm.waveform_store import WaveformStore
def main():
idfw = REPO / "tests/fixtures/THORDATA_example/THORDATA_example/UPMC Presby/UM11719/UM11719_20231219162723.IDFW"
txt = idfw.parent / "TXT" / f"{idfw.name}.txt"
with tempfile.TemporaryDirectory() as td:
store = WaveformStore(Path(td))
ev, rec = store.save_imported_idf(
idfw.read_bytes(),
idfw,
serial_hint=None,
idf_report_text=txt.read_text(errors="replace"),
)
print("=== Save result ===")
print(f" serial: {rec['serial']}")
print(f" filename: {rec['filename']}")
print(f" filesize: {rec['filesize']}")
print(f" h5: {rec['hdf5_filename']}")
print(f" sidecar: {rec['sidecar_filename']}")
print()
print("=== Event ===")
print(f" serial: {ev.serial if hasattr(ev,'serial') else '(n/a)'}")
print(f" timestamp: {ev.timestamp}")
print(f" sample_rate: {ev.sample_rate}")
print(f" record_type: {ev.record_type}")
print(f" rectime_sec: {ev.rectime_seconds}")
print(f" raw_samples: Tran={len(ev.raw_samples.get('Tran', [])) if ev.raw_samples else 0}, Vert={len(ev.raw_samples.get('Vert', [])) if ev.raw_samples else 0}, Long={len(ev.raw_samples.get('Long', [])) if ev.raw_samples else 0}, MicL={len(ev.raw_samples.get('MicL', [])) if ev.raw_samples else 0}")
if ev.peak_values:
print(f" peaks (txt): Tran={ev.peak_values.tran} Vert={ev.peak_values.vert} Long={ev.peak_values.long}")
print()
# Verify the h5 file actually got written
h5path = Path(td) / "UM11719" / f"{idfw.name}.h5"
print(f" h5 exists: {h5path.exists()} size={h5path.stat().st_size if h5path.exists() else 0}")
sidecar = Path(td) / "UM11719" / f"{idfw.name}.sfm.json"
print(f" sidecar exists:{sidecar.exists()} size={sidecar.stat().st_size if sidecar.exists() else 0}")
if __name__ == "__main__":
main()
+137
@@ -0,0 +1,137 @@
"""Decode IDFH histogram intervals + verify against sidecar."""
from __future__ import annotations
import sys
import struct
from pathlib import Path
REPO = Path(__file__).resolve().parents[1]
sys.path.insert(0, str(REPO))
SEGMENT_MAGIC = b"\x02\xda\x0a\x00\x00\x00"
SEGMENT_SIZE = 732 # = 10-byte header + 10 × 72-byte intervals + 2-byte tail
INTERVAL_SIZE = 72
CHANNELS = ("Tran", "Vert", "Long", "MicL")
def decode_interval(buf72: bytes) -> dict:
"""Decode one 72-byte interval into per-channel min/max/halfp."""
out = {}
for i, ch in enumerate(CHANNELS):
block = buf72[i*16 : (i+1)*16]
mn = struct.unpack_from(">h", block, 0)[0]
mx = struct.unpack_from(">h", block, 2)[0]
sb = struct.unpack_from(">h", block, 4)[0]
halfp = struct.unpack_from(">H", block, 6)[0]
f10 = struct.unpack_from(">H", block, 10)[0]
f14 = struct.unpack_from(">H", block, 14)[0]
peak_count = max(abs(mn), abs(mx))
out[ch] = {
"min": mn,
"max": mx,
"field4": sb,
"halfp": halfp,
"field10": f10,
"field14": f14,
"peak": peak_count,
"freq_hz": (512.0 / halfp) if halfp > 5 else None,
}
out["_tail"] = buf72[64:].hex(" ")
return out
def walk_idfh(buf: bytes) -> list:
"""Walk all interval records in an IDFH file."""
intervals = []
# Multi-segment file: every 02 da 0a 00 00 00 marker introduces a segment.
# Single-interval file: just one body header at 0xf96 of form ?? ?? 0a 00 00 00.
# Find them all.
i = 0
while True:
j = buf.find(b"\x0a\x00\x00\x00", i)
if j < 0:
break
# Validate: the 2 bytes before must form a length, and we want bytes
# [j-2 : j+6] to have a recognisable shape. Actually the cleanest
# filter is "preceded by a length and followed by 00 NN 05 3f".
if j < 2:
i = j + 1
continue
# Body header form: [length_be_2][0a 00 00 00][00 NN][05 3f]
if j + 10 > len(buf):
break
length = int.from_bytes(buf[j-2:j], "big")
# Verify the segment-marker shape: [length_be][0a 00 00 00][00 NN][05 3f]
if buf[j+4] != 0x00:
i = j + 1
continue
if buf[j+6:j+8] != b"\x05\x3f":
i = j + 1
continue
# Header layout (10 bytes): [length_be 2B][0a 00 00 00 4B][00 NN 2B][05 3f 2B]
# Followed by N interval records of 72 bytes each, then 2 tail bytes.
# length value = (N × 72) + 10 (counts bytes from 0x0a... through interval data).
header_start = j - 2
n_intervals = (length - 10) // INTERVAL_SIZE
interval_start = header_start + 10
for k in range(n_intervals):
off = interval_start + k * INTERVAL_SIZE
if off + INTERVAL_SIZE > len(buf):
break
chunk = buf[off:off + INTERVAL_SIZE]
intervals.append({"offset": off, **decode_interval(chunk)})
i = header_start + length + 2
return intervals
def main():
# Test against multi-segment IDFH
target = REPO / "tests/fixtures/THORDATA_example/THORDATA_example/UPMC Presby/UM13981/UM13981_20220805075441.IDFH"
sc_path = target.parent / "TXT" / f"{target.name}.txt"
buf = target.read_bytes()
intervals = walk_idfh(buf)
print(f"=== {target.name} ===")
print(f" file size: {len(buf)}")
print(f" decoded intervals: {len(intervals)}")
# Show first 2 + last 2
sc_rows = []
for line in sc_path.read_text(errors="replace").splitlines():
if line.startswith("2022-") or line.startswith("2023-"):
sc_rows.append(line)
print(f" sidecar rows: {len(sc_rows)}")
print()
for k in [0, 1, 78, 79, 80]:
if k >= len(intervals):
continue
iv = intervals[k]
print(f"--- interval {k} @0x{iv['offset']:04x} ---")
for ch in CHANNELS:
d = iv[ch]
peak_ips = d["peak"] / 32768 * 10.0
print(f" {ch}: peak={d['peak']:5d} ({peak_ips:.4f} in/s) halfp={d['halfp']:5d} freq={d['freq_hz']}")
# sidecar row
if k < len(sc_rows):
print(f" SC: {sc_rows[k]}")
# Test single-interval IDFH
print()
target2 = REPO / "tests/fixtures/THORDATA_example/THORDATA_example/UPMC Presby/UM11719/UM11719_20231219162648.IDFH"
sc2 = target2.parent / "TXT" / f"{target2.name}.txt"
buf2 = target2.read_bytes()
intervals2 = walk_idfh(buf2)
print(f"=== {target2.name} ===")
print(f" file size: {len(buf2)}, decoded intervals: {len(intervals2)}")
if intervals2:
iv = intervals2[0]
for ch in CHANNELS:
d = iv[ch]
peak_ips = d["peak"] / 32768 * 10.0
print(f" {ch}: peak={d['peak']:5d} ({peak_ips:.4f} in/s) halfp={d['halfp']:5d} freq={d['freq_hz']}")
sc_rows2 = [l for l in sc2.read_text(errors='replace').splitlines() if l.startswith("2023-")]
if sc_rows2:
print(f" SC: {sc_rows2[0]}")
if __name__ == "__main__":
main()
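The arithmetic in the IDFH walker can be checked in isolation on synthetic bytes. The sketch below reuses the constants from the script above (72-byte intervals, 16-byte channel blocks, `512.0 / halfp`, `peak / 32768 * 10.0`); the function names are hypothetical, and the physical meaning of the 512 and 10.0 factors is an assumption carried over from the script rather than something confirmed here.

```python
import struct

INTERVAL_SIZE = 72   # bytes per interval record, as in the script above
LENGTH_OVERHEAD = 10  # constant the script subtracts from the length field


def intervals_in_segment(length_field: int) -> int:
    """Number of 72-byte interval records implied by a segment's length field."""
    return (length_field - LENGTH_OVERHEAD) // INTERVAL_SIZE


def decode_channel_block(block: bytes) -> dict:
    """Decode min/max/half-period from one 16-byte big-endian channel block."""
    mn, mx = struct.unpack_from(">hh", block, 0)   # signed min and max counts
    halfp = struct.unpack_from(">H", block, 6)[0]  # unsigned half-period field
    peak = max(abs(mn), abs(mx))
    return {
        "peak": peak,
        "peak_ips": peak / 32768 * 10.0,
        "freq_hz": (512.0 / halfp) if halfp > 5 else None,
    }


# The multi-segment marker 02 da encodes a length of 0x02DA = 730 bytes.
n = intervals_in_segment(0x02DA)

# Synthetic block: min=-120, max=300, field4=0, halfp=64, remaining fields zero.
block = struct.pack(">hhhHHHHH", -120, 300, 0, 64, 0, 0, 0, 0)
decoded = decode_channel_block(block)
print(n, decoded)
```

With a length of 730 the walker expects 10 intervals, which agrees with the 732-byte `SEGMENT_SIZE` once the 2-byte length prefix is added.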
+41
@@ -0,0 +1,41 @@
"""Find IDFH interval period by scoring consistently-zero byte columns per candidate size."""
from __future__ import annotations
import sys
from pathlib import Path
from collections import Counter
REPO = Path(__file__).resolve().parents[1]
sys.path.insert(0, str(REPO))
def main():
target = REPO / "tests/fixtures/THORDATA_example/THORDATA_example/UPMC Presby/UM13981/UM13981_20220805075441.IDFH"
buf = target.read_bytes()
body_start = 0xF96
body_end = 0x270C
body = buf[body_start:body_end]
print(f"body size: {len(body)} bytes (file {len(buf)} bytes)")
# For each candidate interval size, count how many bytes at fixed offsets within
# each interval are zero (consistent column-zero pattern indicates correct size).
print()
print("=== zero-column score by interval size (higher = more likely) ===")
best = []
for sz in range(16, 100):
n = len(body) // sz
if n < 30:
continue
# For each column position within an interval, count how many of n intervals have zero
score = 0
for col in range(sz):
zeros = sum(1 for i in range(n) if body[i*sz + col] == 0)
if zeros >= n * 0.9:
score += 1
best.append((score, sz, n))
best.sort(reverse=True)
for score, sz, n in best[:10]:
print(f" size={sz:3d} n_intervals={n} consistently-zero-cols={score}")
if __name__ == "__main__":
main()
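The zero-column heuristic can be sanity-checked on synthetic data where the record size is known in advance. This is a minimal sketch, not the repository's code: the helper name `zero_column_score` and the 12-byte synthetic record are assumptions made purely for illustration.

```python
def zero_column_score(body: bytes, size: int, threshold: float = 0.9) -> int:
    """Count byte columns that are zero in at least `threshold` of the records."""
    n = len(body) // size
    if n == 0:
        return 0
    score = 0
    for col in range(size):
        zeros = sum(1 for i in range(n) if body[i * size + col] == 0)
        if zeros >= n * threshold:
            score += 1
    return score


# Synthetic body: 40 records of 12 bytes; columns 4..7 are always zero.
record = bytes([1, 2, 3, 4, 0, 0, 0, 0, 5, 6, 7, 8])
body = record * 40

# Score candidate sizes below twice the true size, so multiples do not compete.
scores = {sz: zero_column_score(body, sz) for sz in range(8, 17)}
best = max(scores, key=scores.get)
print(best, scores[best])
```

Wrong candidate sizes smear the zero bytes across columns, so no column stays zero often enough to score. Exact multiples of the true size score just as well or better, which is worth keeping in mind when reading the top-10 list the script prints.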
+40
@@ -0,0 +1,40 @@
"""Per-file accuracy + sample-count details."""
from __future__ import annotations
import sys
from pathlib import Path
REPO = Path(__file__).resolve().parents[1]
sys.path.insert(0, str(REPO))
from micromate.idf_file import read_idf_file
from analysis_idf.recon import load_sidecar_samples
def main():
root = REPO / "tests/fixtures/THORDATA_example"
files = sorted([f for f in root.rglob("*.IDFW") if not str(f).endswith(".CDB")])
GEO_LSB = 0.0003
# Limit detail output to the first 20 files that decode and have a sidecar.
shown = 0
for f in files:
try:
res = read_idf_file(f)
except Exception:
continue
sc_path = f.parent / "TXT" / f"{f.name}.txt"
if not sc_path.exists():
continue
sc = load_sidecar_samples(sc_path)
sc_tran = [int(round(v / GEO_LSB)) for v in sc["Tran"]]
dec = res.samples.get("Tran", [])
n = min(len(sc_tran), len(dec))
exact = sum(1 for i in range(n) if sc_tran[i] == dec[i]) if n else 0
pct = 100.0 * exact / n if n else 0.0
print(f"{f.name:40s} size={f.stat().st_size:6d} sc_n={len(sc_tran):4d} dec_n={len(dec):4d} exact={pct:.1f}%")
shown += 1
if shown >= 20:
break
if __name__ == "__main__":
main()
+64
@@ -0,0 +1,64 @@
"""Look at what's at the divergence boundary."""
from __future__ import annotations
import sys
from pathlib import Path
REPO = Path(__file__).resolve().parents[1]
sys.path.insert(0, str(REPO))
from minimateplus.waveform_codec import walk_body, find_data_start, parse_segment_header
from analysis_idf.recon import TARGET, TXT, load_sidecar_samples
def main():
buf = TARGET.read_bytes()
body = buf[0x0f1f:]
start = find_data_start(body)
print(f"data_start: {start} (= file offset 0x{0x0f1f + start:04x})")
blocks = walk_body(body, start)
print(f"{len(blocks)} blocks total")
print()
# First 30 blocks
print("=== first 30 blocks ===")
for i, b in enumerate(blocks[:30]):
body_off = 0x0f1f + b.offset
if b.tag_hi == 0x40:
hdr = parse_segment_header(b)
print(f" [{i:3d}] @0x{body_off:04x} {b.kind} (segment header) counter={hdr['counter'] if hdr else '?'} field2={hdr['field2'].hex() if hdr else '?'} anchor={hdr['anchor_bytes'].hex() if hdr else '?'} tail={hdr['tail'].hex() if hdr else '?'}")
else:
print(f" [{i:3d}] @0x{body_off:04x} {b.kind} len={b.length} data={b.data[:16].hex()}")
print()
# Cumulative sample counts per block to find which block contains sample 254
print("=== cumulative samples through blocks ===")
cur_ch = "Tran"
rotation = ["Vert", "Long", "MicL", "Tran"]
seg_count = 0
samples_in_curseg = 2 # preamble Tran[0], Tran[1]
for i, b in enumerate(blocks[:30]):
if b.tag_hi == 0x40:
seg_count += 1
prev_ch = cur_ch
cur_ch = rotation[(seg_count - 1) % 4]
print(f" [{i:3d}] 40 02 -> end of {prev_ch} segment, start {cur_ch} (segment {seg_count})")
samples_in_curseg = 2 # anchors
elif (b.tag_hi & 0xF0) == 0x10:
nn = ((b.tag_hi & 0x0F) << 8) | b.tag_lo
samples_in_curseg += nn
print(f" [{i:3d}] {b.kind} nibble: +{nn} samples, ch={cur_ch}, ch_total~{samples_in_curseg}")
elif (b.tag_hi & 0xF0) == 0x20:
nn = ((b.tag_hi & 0x0F) << 8) | b.tag_lo
samples_in_curseg += nn
print(f" [{i:3d}] {b.kind} int8: +{nn} samples, ch={cur_ch}, ch_total~{samples_in_curseg}")
elif b.tag_hi == 0x00:
samples_in_curseg += b.tag_lo
print(f" [{i:3d}] {b.kind} RLE: +{b.tag_lo}, ch={cur_ch}, ch_total~{samples_in_curseg}")
elif b.tag_hi == 0x30:
samples_in_curseg += b.tag_lo
print(f" [{i:3d}] {b.kind} packed12: +{b.tag_lo} samples, ch={cur_ch}, ch_total~{samples_in_curseg}")
if __name__ == "__main__":
main()
+89
@@ -0,0 +1,89 @@
"""Reconnaissance helpers for cracking the Thor IDFW binary."""
from __future__ import annotations
import sys
from pathlib import Path
REPO = Path(__file__).resolve().parents[1]
sys.path.insert(0, str(REPO))
TARGET = REPO / "tests/fixtures/THORDATA_example/THORDATA_example/UPMC Presby/UM11719/UM11719_20231219162723.IDFW"
TXT = REPO / "tests/fixtures/THORDATA_example/THORDATA_example/UPMC Presby/UM11719/TXT/UM11719_20231219162723.IDFW.txt"
def hex_at(buf: bytes, off: int, n: int = 32) -> str:
chunk = buf[off : off + n]
hexs = " ".join(f"{b:02x}" for b in chunk)
asc = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
return f"{off:04x}: {hexs} {asc}"
def find_all(buf: bytes, needle: bytes) -> list[int]:
out: list[int] = []
i = 0
while True:
j = buf.find(needle, i)
if j < 0:
break
out.append(j)
i = j + 1
return out
def load_sidecar_samples(path: Path) -> dict[str, list[float]]:
"""Parse the txt sample table — Tran/Vert/Long/MicL."""
out = {"Tran": [], "Vert": [], "Long": [], "MicL": []}
in_block = False
for line in path.read_text(errors="replace").splitlines():
if not in_block:
if line.strip() == "Waveform Data Channels":
in_block = True
continue
if line.startswith("Waveform Data USB Channels"):
break
parts = line.split("\t")
# First row is the header "\tTran\tVert\tLong\tMicL"
if len(parts) >= 5 and parts[1] == "Tran":
continue
if len(parts) < 5:
continue
try:
row = [float(p) for p in parts[1:5]]
except ValueError:
continue
# Append only after all four fields parse, so the channel lists stay aligned.
for ch, val in zip(("Tran", "Vert", "Long", "MicL"), row):
out[ch].append(val)
return out
def main():
buf = TARGET.read_bytes()
samples = load_sidecar_samples(TXT)
print(f"file size: {len(buf)} bytes")
print(f"sample rows: Tran={len(samples['Tran'])} Vert={len(samples['Vert'])} Long={len(samples['Long'])} MicL={len(samples['MicL'])}")
print(f"first 6 Tran samples: {samples['Tran'][:6]}")
print(f"first 6 Vert samples: {samples['Vert'][:6]}")
print(f"first 6 Long samples: {samples['Long'][:6]}")
print(f"first 6 MicL samples: {samples['MicL'][:6]}")
print()
print("=== BW magic '00 02 00' positions ===")
hits = find_all(buf, b"\x00\x02\x00")
print(f"{len(hits)} hits")
for h in hits[:20]:
print(hex_at(buf, h, 24))
print()
print("=== '40 02' segment-header positions ===")
hits = find_all(buf, b"\x40\x02")
print(f"{len(hits)} hits")
for h in hits:
ctx_pre = buf[max(0, h - 4): h].hex()
ctx_post = buf[h: h + 20].hex()
# Show byte preceding to help identify real headers vs casual occurrences
print(f" 0x{h:04x} pre={ctx_pre} post={ctx_post}")
if __name__ == "__main__":
main()
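The sidecar table parser can be exercised without touching the fixture files. The sketch below parses an in-memory string laid out the way the script above expects (a "Waveform Data Channels" marker, a tab-separated header row, then one row per sample); the function name `parse_sample_table` and the sample text are hypothetical, and the real .txt layout may contain details not modelled here.

```python
def parse_sample_table(text: str) -> dict:
    """Parse the tab-separated Tran/Vert/Long/MicL sample table from report text."""
    out = {"Tran": [], "Vert": [], "Long": [], "MicL": []}
    in_block = False
    for line in text.splitlines():
        if not in_block:
            # Skip everything up to and including the section marker.
            in_block = line.strip() == "Waveform Data Channels"
            continue
        if line.startswith("Waveform Data USB Channels"):
            break
        parts = line.split("\t")
        if len(parts) < 5 or parts[1] == "Tran":
            continue  # short line or the column-header row
        try:
            row = [float(p) for p in parts[1:5]]
        except ValueError:
            continue  # drop the whole row so the channel lists stay aligned
        for name, value in zip(out, row):
            out[name].append(value)
    return out


text = (
    "Header\n"
    "Waveform Data Channels\n"
    "\tTran\tVert\tLong\tMicL\n"
    "0.000\t0.0003\t-0.0006\t0.0000\t0.00001\n"
    "0.001\t0.0006\tbad\t0.0003\t0.00002\n"
    "Waveform Data USB Channels\n"
    "9\t1\t1\t1\t1\n"
)
samples = parse_sample_table(text)
print(samples)
```

Converting all four fields before appending means a malformed row is skipped as a unit instead of leaving one channel a sample longer than the others.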
+40
@@ -0,0 +1,40 @@
"""Find each segment boundary in the channel and check if errors reset there."""
from __future__ import annotations
import sys
from pathlib import Path
REPO = Path(__file__).resolve().parents[1]
sys.path.insert(0, str(REPO))
from minimateplus.waveform_codec import decode_waveform_v2
from analysis_idf.recon import TARGET, TXT, load_sidecar_samples
def main():
buf = TARGET.read_bytes()
sc = load_sidecar_samples(TXT)
decoded = decode_waveform_v2(buf[0x0f1f:])
GEO_LSB = 0.0003
for ch in ("Tran", "Vert", "Long"):
sc_counts = [int(round(v / GEO_LSB)) for v in sc[ch]]
dec = decoded[ch]
# Find every transition where error becomes zero from nonzero (or grows from zero)
# Print indices where dec resyncs back to exact match.
n = min(len(sc_counts), len(dec))
events = []
prev_match = True
for i in range(n):
match = sc_counts[i] == dec[i]
if match != prev_match:
kind = "RESYNC" if match else "DIVERGE"
events.append((i, kind, sc_counts[i], dec[i]))
prev_match = match
print(f"{ch}: {len(events)} transitions")
for i, kind, sc_v, dec_v in events[:20]:
print(f" idx {i:4d} {kind:8s} sc={sc_v:6d} dec={dec_v:6d} diff={dec_v-sc_v:+d}")
print()
if __name__ == "__main__":
main()
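The DIVERGE/RESYNC bookkeeping above is an edge detector over the per-sample match flag. A minimal sketch on plain lists, assuming a hypothetical helper name `match_transitions` that is not part of the repository:

```python
def match_transitions(expected, actual):
    """List (index, kind) wherever the match state flips between the two sequences."""
    n = min(len(expected), len(actual))
    events = []
    prev_match = True  # start from "in sync", as the script does
    for i in range(n):
        match = expected[i] == actual[i]
        if match != prev_match:
            events.append((i, "RESYNC" if match else "DIVERGE"))
        prev_match = match
    return events


events = match_transitions([0, 1, 2, 3, 4, 5], [0, 1, 7, 8, 4, 5])
print(events)
```

If RESYNC indices line up with segment-header positions, the drift is likely confined to one block type rather than accumulating across the whole channel.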
+46
@@ -0,0 +1,46 @@
"""Smoke-test read_idf_file on IDFH across the corpus."""
from __future__ import annotations
import sys
from pathlib import Path
REPO = Path(__file__).resolve().parents[1]
sys.path.insert(0, str(REPO))
from micromate.idf_file import read_idf_file
def main():
target = REPO / "tests/fixtures/THORDATA_example/THORDATA_example/UPMC Presby/UM11719/UM11719_20231219162648.IDFH"
result = read_idf_file(target)
ev = result.event
print(f"=== {target.name} ===")
print(f" signature: {result.signature}")
print(f" serial: {ev.serial}")
print(f" timestamp: {ev.timestamp}")
print(f" sample_rate: {ev.sample_rate}")
print(f" kind: {ev.kind}")
print(f" intervals: {len(result.intervals or [])}")
print(f" peaks: T={ev.peaks.transverse_ips:.4f} V={ev.peaks.vertical_ips:.4f} L={ev.peaks.longitudinal_ips:.4f}")
print()
root = REPO / "tests/fixtures/THORDATA_example"
files = list(root.rglob("*.IDFH"))
ok = fail = nyi = 0
total_intervals = 0
for f in files:
try:
r = read_idf_file(f)
ok += 1
total_intervals += len(r.intervals or [])
except NotImplementedError:
nyi += 1
except Exception as exc:
fail += 1
if fail <= 3:
print(f" FAIL: {f.name}: {type(exc).__name__}: {exc}")
print(f"Corpus: {len(files)} IDFH files | ok={ok} fail={fail} nyi={nyi}")
print(f"Total intervals decoded: {total_intervals}")
if __name__ == "__main__":
main()
+48
@@ -0,0 +1,48 @@
"""Smoke-test read_idf_file across the sample corpus."""
from __future__ import annotations
import sys
from pathlib import Path
REPO = Path(__file__).resolve().parents[1]
sys.path.insert(0, str(REPO))
from micromate.idf_file import read_idf_file, geo_count_to_ips, mic_count_to_psi
def main():
target = REPO / "tests/fixtures/THORDATA_example/THORDATA_example/UPMC Presby/UM11719/UM11719_20231219162723.IDFW"
result = read_idf_file(target)
ev = result.event
print(f"=== {target.name} ===")
print(f" signature: {result.signature}")
print(f" serial: {ev.serial}")
print(f" timestamp: {ev.timestamp}")
print(f" sample_rate: {ev.sample_rate}")
print(f" record_time: {ev.record_time_sec}")
print(f" calibration: {result.binary_metadata.calibration_date}")
print(f" Tran samples: {len(result.samples['Tran'])}, peak_ips={ev.peaks.transverse_ips:.4f}")
print(f" Vert samples: {len(result.samples['Vert'])}, peak_ips={ev.peaks.vertical_ips:.4f}")
print(f" Long samples: {len(result.samples['Long'])}, peak_ips={ev.peaks.longitudinal_ips:.4f}")
print(f" MicL samples: {len(result.samples['MicL'])}")
print()
# Corpus sweep
root = REPO / "tests/fixtures/THORDATA_example"
files = [f for f in root.rglob("*.IDFW") if not str(f).endswith(".CDB")]
ok = fail = nyi = 0
for f in files:
try:
r = read_idf_file(f)
ok += 1
except NotImplementedError:
nyi += 1
except Exception as exc:
fail += 1
if fail <= 5:
print(f" FAIL: {f.name}: {type(exc).__name__}: {exc}")
print()
print(f"Corpus: {len(files)} IDFW files | ok={ok} fail={fail} not-implemented={nyi}")
if __name__ == "__main__":
main()
+47
@@ -0,0 +1,47 @@
"""Verify build_bw_report_from_idf against a known sidecar."""
from __future__ import annotations
import json
import sys
from pathlib import Path
REPO = Path(__file__).resolve().parents[1]
sys.path.insert(0, str(REPO))
from micromate.idf_ascii_report import parse_idf_report
from micromate.idf_to_bw_report import build_bw_report_from_idf
from micromate.idf_file import read_idf_file
def show(prefix: str, d: dict, indent: int = 0):
for k, v in d.items():
if isinstance(v, dict):
print(f"{' '*indent}{prefix}{k}:")
show("", v, indent + 1)
else:
print(f"{' '*indent}{prefix}{k}: {v!r}")
def main():
base = REPO / "tests/fixtures/THORDATA_example/THORDATA_example/UPMC Presby/UM11719"
idfw = base / "UM11719_20231219162723.IDFW"
txt = base / "TXT" / f"{idfw.name}.txt"
report_dict = parse_idf_report(txt.read_text(errors="replace"))
res = read_idf_file(idfw)
bw = build_bw_report_from_idf(report_dict, binary_md=res.binary_metadata)
print("=== IDFW → bw_report ===")
show("", bw)
print()
print("=== IDFH (single trigger row) ===")
idfh = base / "UM11719_20231219162648.IDFH"
txt_h = base / "TXT" / f"{idfh.name}.txt"
rh = parse_idf_report(txt_h.read_text(errors="replace"))
res_h = read_idf_file(idfh)
bw_h = build_bw_report_from_idf(rh, binary_md=res_h.binary_metadata, intervals=res_h.intervals)
show("", bw_h)
if __name__ == "__main__":
main()
Binary file not shown.
Binary file not shown.
+73
@@ -0,0 +1,73 @@
"""Trace Tran sample-by-sample to find exactly where the codec drifts."""
from __future__ import annotations
import sys
from pathlib import Path
REPO = Path(__file__).resolve().parents[1]
sys.path.insert(0, str(REPO))
from analysis_idf.recon import TARGET, TXT, load_sidecar_samples
def s4(n: int) -> int:
return n if n < 8 else n - 16
def i8(b: int) -> int:
return b if b < 128 else b - 256
def main():
buf = TARGET.read_bytes()
sc = load_sidecar_samples(TXT)
GEO_LSB = 0.0003
sc_tran = [int(round(v / GEO_LSB)) for v in sc["Tran"]]
body = buf[0x0f1f:]
# Tran[0], Tran[1] from preamble
t0 = int.from_bytes(body[3:5], "big", signed=True)
t1 = int.from_bytes(body[5:7], "big", signed=True)
print(f"preamble Tran[0]={t0} Tran[1]={t1} (sidecar: {sc_tran[0]}, {sc_tran[1]})")
# Block 0: 10 f8 at body[7:9]
print(f"block 0: tag {body[7]:02x} {body[8]:02x}")
print(f" block 0 first 10 data bytes: {body[9:19].hex()}")
# Walk block 0 manually, comparing each sample
cur = t1
samples = [t0, t1]
block_off = 7
nn = body[8]
print(f" NN = {nn}")
data = body[9 : 9 + nn // 2]
for byi, byte in enumerate(data):
for nib_idx, nib in enumerate(((byte >> 4) & 0xF, byte & 0xF)):
cur += s4(nib)
samples.append(cur)
idx = len(samples) - 1
if 0 <= idx < len(sc_tran):
sc_v = sc_tran[idx]
match = "ok" if sc_v == cur else "**"
if idx < 12 or 240 <= idx <= 260:
print(f" idx {idx:3d}: nibble byte={byte:02x} nib={nib:x} delta={s4(nib):+d} cur={cur:+d} sc={sc_v:+d} {match}")
print(f"end of block 0: cur={cur}, len(samples)={len(samples)}, decoder expected 250 here")
# Block 1: 20 28 starts at offset 9 + 124 = 133 from block_off=7
block1_off = 9 + nn // 2
print(f"block 1: tag {body[block1_off]:02x} {body[block1_off+1]:02x} (expecting 20 28)")
nn1 = body[block1_off + 1]
print(f" block 1 NN = {nn1}")
data1 = body[block1_off + 2 : block1_off + 2 + nn1]
for byi, byte in enumerate(data1):
cur += i8(byte)
samples.append(cur)
idx = len(samples) - 1
if idx < len(sc_tran):
sc_v = sc_tran[idx]
match = "ok" if sc_v == cur else "**"
if 248 <= idx <= 295:
print(f" idx {idx:3d}: int8 byte={byte:02x} delta={i8(byte):+d} cur={cur:+d} sc={sc_v:+d} {match}")
if __name__ == "__main__":
main()
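The trace above applies two delta encodings to a running value: signed 4-bit nibbles (two per byte, high nibble first) and signed 8-bit bytes. The sketch below isolates both on synthetic input. `s4` and `i8` match the helpers in the script, while `decode_nibble_deltas` and `decode_int8_deltas` are hypothetical names introduced for illustration and are not the codec's actual API.

```python
def s4(n: int) -> int:
    """Interpret a 4-bit value as two's-complement signed (-8..7)."""
    return n if n < 8 else n - 16


def i8(b: int) -> int:
    """Interpret a byte as two's-complement signed (-128..127)."""
    return b if b < 128 else b - 256


def decode_nibble_deltas(start: int, data: bytes) -> list:
    """Accumulate signed-nibble deltas, high nibble first, from a start value."""
    cur, out = start, []
    for byte in data:
        for nib in ((byte >> 4) & 0xF, byte & 0xF):
            cur += s4(nib)
            out.append(cur)
    return out


def decode_int8_deltas(start: int, data: bytes) -> list:
    """Accumulate signed-byte deltas from a start value."""
    cur, out = start, []
    for byte in data:
        cur += i8(byte)
        out.append(cur)
    return out


nibble_samples = decode_nibble_deltas(10, bytes([0x1F, 0x70]))
int8_samples = decode_int8_deltas(nibble_samples[-1], bytes([0xFE, 0x05, 0x80]))
print(nibble_samples, int8_samples)
```

Because every sample depends on the previous one, a single misread delta shifts all later samples in the block, which fits the long mismatch runs the divergence scripts look for.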
+42
@@ -0,0 +1,42 @@
"""Feed candidate body offsets to the BW codec and compare with sidecar."""
from __future__ import annotations

import sys
from pathlib import Path

REPO = Path(__file__).resolve().parents[1]
sys.path.insert(0, str(REPO))

from minimateplus.waveform_codec import decode_waveform_v2, walk_body, find_data_start
from analysis_idf.recon import TARGET, TXT, load_sidecar_samples


def main():
    buf = TARGET.read_bytes()
    sc = load_sidecar_samples(TXT)
    # Sidecar samples in 0.0003 counts (Thor geo LSB).
    sc_tran = [int(round(v / 0.0003)) for v in sc["Tran"][:30]]
    sc_vert = [int(round(v / 0.0003)) for v in sc["Vert"][:30]]
    sc_long = [int(round(v / 0.0003)) for v in sc["Long"][:30]]
    sc_micl = [int(round(v / 1e-6)) for v in sc["MicL"][:30]]  # 1 µ unit for mic? Will iterate.
    print(f"sidecar Tran (counts): {sc_tran}")
    print(f"sidecar Vert (counts): {sc_vert}")
    print(f"sidecar Long (counts): {sc_long}")
    print(f"sidecar MicL (×1e-6): {sc_micl}")
    print()
    # Try candidate body start offsets.
    for off in (0x0f1f, 0x1057, 0x11f1, 0x1333, 0x1bde, 0x0d30):
        print(f"=== body @ 0x{off:04x} ===")
        body = buf[off:]
        decoded = decode_waveform_v2(body)
        if not decoded:
            print(" decode_waveform_v2 returned None")
            continue
        for ch in ("Tran", "Vert", "Long", "MicL"):
            arr = decoded.get(ch, [])
            print(f" {ch}[{len(arr)}]: {arr[:20]}")
        print()


if __name__ == "__main__":
    main()
+51
@@ -0,0 +1,51 @@
"""Verify decode_waveform_v2 against sidecar across all 2304 samples per channel."""
from __future__ import annotations

import sys
from pathlib import Path

REPO = Path(__file__).resolve().parents[1]
sys.path.insert(0, str(REPO))

from minimateplus.waveform_codec import decode_waveform_v2
from analysis_idf.recon import TARGET, TXT, load_sidecar_samples


def main():
    buf = TARGET.read_bytes()
    sc = load_sidecar_samples(TXT)
    body = buf[0x0f1f:]
    decoded = decode_waveform_v2(body)
    if not decoded:
        # Guard: the codec returns a falsy value when no body decodes.
        print("decode_waveform_v2 returned no samples")
        return
    print(f"Sidecar lengths: Tran={len(sc['Tran'])} Vert={len(sc['Vert'])} Long={len(sc['Long'])} MicL={len(sc['MicL'])}")
    print(f"Decoded lengths: Tran={len(decoded['Tran'])} Vert={len(decoded['Vert'])} Long={len(decoded['Long'])} MicL={len(decoded['MicL'])}")
    print()
    GEO_LSB = 0.0003  # in/s per count
    for ch in ("Tran", "Vert", "Long"):
        sc_counts = [int(round(v / GEO_LSB)) for v in sc[ch]]
        dec = decoded[ch]
        n = min(len(sc_counts), len(dec))
        if n == 0:
            # Avoid dividing by zero in the percentage below.
            print(f"{ch}: nothing to compare")
            continue
        matches = sum(1 for i in range(n) if sc_counts[i] == dec[i])
        first_mismatch = next((i for i in range(n) if sc_counts[i] != dec[i]), None)
        print(f"{ch}: compared {n}, exact matches {matches} ({100*matches/n:.2f}%)")
        if first_mismatch is not None:
            i = first_mismatch
            print(f" first mismatch at idx {i}: sidecar={sc_counts[i]} ({sc[ch][i]}), decoded={dec[i]}")
            print(f" context sidecar[{i-2}..{i+5}]: {sc_counts[max(0,i-2):i+5]}")
            print(f" context decoded[{i-2}..{i+5}]: {dec[max(0,i-2):i+5]}")
    # MicL: find the multiplicative factor that fits
    print()
    print("=== MicL scale analysis ===")
    sc_micl = sc["MicL"]
    dec_micl = decoded["MicL"]
    # Skip zero values when computing ratio
    ratios = [sc_micl[i] / dec_micl[i] for i in range(min(50, len(sc_micl), len(dec_micl))) if dec_micl[i] != 0]
    if ratios:
        avg = sum(ratios) / len(ratios)
        print(f" avg ratio sidecar/decoded over first 50 nonzero: {avg:.4e} (n={len(ratios)})")
        print(f" ratios sample: {[f'{r:.4e}' for r in ratios[:6]]}")


if __name__ == "__main__":
    main()
+62 -5
@@ -6,11 +6,68 @@ Series IV event-file format. Sibling to
Series III "Rosetta Stone"); this doc holds what we know so far and
the open questions still to crack.
-**Status (2026-05-20):** ASCII text sidecar fully decoded (1,014
-sample files round-trip). Binary `.IDFH` / `.IDFW` codec
-**not yet implemented**; binaries are stored opaquely by
-`WaveformStore.save_imported_idf`, with metadata sourced from the
-paired `.txt` sidecar.
+**Status (2026-05-28):** ASCII text sidecar fully decoded (1,014
+sample files round-trip). **Thor IDFW** binary now decodes via
+`micromate.idf_file.read_idf_file()`, reusing the BW segment-rotated
+block codec verbatim at fixed body offset `0x0f1f`; metadata (serial,
+timestamp, sample_rate, record_time, calibration_date) extracted from
+the binary header. Sample fidelity is 87–99% byte-exact on quiet
+events; loud events hit the BW codec's known walker-stops-early
+limitation. Residual ~3% drift on per-sample deltas (likely a
+Thor-specific 12-bit delta refinement not yet modelled).
**Thor IDFH histograms also decoded.** Body has one or more segments;
each 12-byte segment header `[length_be 2B][0a 00 00 00][00 NN][05 3f]`
introduces `N = (length - 10) // 72` interval records of 72 bytes
each. Each interval = 4 × 16-byte per-channel records:
`[int16 min][int16 max][int16 ??][uint16 halfp][2B 00][uint16 ??][2B 00][uint16 ??]`.
Geo peak `= max(|min|, |max|) / 32768 × 10` in/s (matches sidecar within
~1.8%); freq `= 512 / halfp` Hz (None for halfp ≤ 5 → ">100"
sentinel). Corpus: **all 859 Thor IDFH files decode, 181,071
intervals**. Wired through `read_idf_file()` →
`save_imported_idf()` → sidecar's `extensions.idf_intervals`.
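The record layout above can be sketched as follows, using only the fields the paragraph identifies (min, max, halfp) and skipping the ones still marked unknown. The names `decode_interval`, `intervals_in_segment`, and `_channel` are hypothetical, and the record is synthetic rather than taken from the corpus.

```python
import struct

INTERVAL_SIZE = 72      # 4 x 16-byte channel records + 8-byte tail
SEGMENT_HEADER = 10     # value subtracted from the segment length field
CHANNELS = ("Tran", "Vert", "Long", "MicL")


def intervals_in_segment(length: int) -> int:
    """N = (length - 10) // 72, as stated in the text above."""
    return (length - SEGMENT_HEADER) // INTERVAL_SIZE


def decode_interval(rec: bytes) -> dict:
    """Decode min/max/halfp for each channel of one 72-byte record."""
    if len(rec) != INTERVAL_SIZE:
        raise ValueError("expected a 72-byte interval record")
    out = {}
    for i, ch in enumerate(CHANNELS):
        # int16 min, int16 max, int16 (unknown), uint16 halfp, all big-endian.
        mn, mx, _unknown, halfp = struct.unpack_from(">hhhH", rec, i * 16)
        peak_count = max(abs(mn), abs(mx))
        out[ch] = {
            "peak_count": peak_count,
            # in/s scaling applies to the geo channels only.
            "peak_ips": peak_count / 32768 * 10.0,
            # halfp <= 5 is the ">100" sentinel, reported as None.
            "freq_hz": None if halfp <= 5 else 512.0 / halfp,
        }
    return out


def _channel(mn: int, mx: int, halfp: int) -> bytes:
    """Build one synthetic 16-byte channel record (unknown fields zeroed)."""
    return struct.pack(">hhhH2xH2xH", mn, mx, 0, halfp, 0, 0)


record = (_channel(-100, 16384, 16) + _channel(0, 0, 3)
          + _channel(0, 0, 3) + _channel(0, 0, 3) + bytes(8))
decoded = decode_interval(record)
print(decoded["Tran"], decoded["Vert"]["freq_hz"])
```

A count of 16384 is half of full scale, so the synthetic Tran channel comes out at 5.0 in/s, and a half-period of 16 samples gives 32 Hz.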
**Note on the BE9439 outliers in the example corpus:** Two files
(`BE9439_20200713131747.IDFW` and `BE9439_20200713124251.IDFH`) are
**Series III Blastware** binaries, not Thor. Provenance: TMI tried
to use Thor to manage auto-call-homes for Series III units; the
experiment didn't work out, but it did leave a few BW event files
in Thor's per-serial directory structure with `.IDFW`/`.IDFH`
extensions — Thor's forwarder applied its own naming convention to
the BW bodies it was relaying. Their header `10 00 01 80 00 00
Instantel STRT ff fe <end_key> <start_key>` is the BW SUB 5A STRT
record, not a Thor body preamble. The reader detects them by
signature and raises `NotImplementedError` pointing callers at
`read_blastware_file()`, which extracts BW-format peaks from them.
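The signature check described above can be sketched as a small classifier. The two 6-byte prefixes are the ones quoted in this change; `classify_idf_header` and its return labels are hypothetical, and the headers below are synthetic. Unlike the reader, this sketch returns a label instead of raising.

```python
THOR_PREFIX = bytes.fromhex("001201000000")      # genuine Series IV / Thor
BW_STRAY_PREFIX = bytes.fromhex("100001800000")  # Series III BW STRT header
TAG = b"Instantel"


def classify_idf_header(buf: bytes) -> str:
    """Return 'thor', 'blastware-stray', 'unknown', or 'not-idf'."""
    # Both variants carry the ASCII tag right after the 6-byte prefix.
    if len(buf) < 6 + len(TAG) or buf[6:6 + len(TAG)] != TAG:
        return "not-idf"
    prefix = buf[:6]
    if prefix == THOR_PREFIX:
        return "thor"
    if prefix == BW_STRAY_PREFIX:
        return "blastware-stray"
    return "unknown"


# Synthetic headers, not real file contents.
thor_header = THOR_PREFIX + TAG + b"\x00" + bytes(8)
stray_header = BW_STRAY_PREFIX + TAG + b" STRT" + bytes(8)
print(classify_idf_header(thor_header), classify_idf_header(stray_header))
```

Classifying on the header bytes rather than the file extension is what lets the `BE9439_*` files be routed to the Blastware reader even though they carry `.IDFW`/`.IDFH` names.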
**Still NYI for Thor IDFH:** per-channel `int16 field4` (possibly
time-of-peak); the two uint16 fields (probably PVS contributions);
8-byte interval tail (PVS data); mic dB(L) exact conversion constant.
### Codec breakthroughs (2026-05-28)
- **Body offset is a fixed `0x0f1f`** across 151/154 corpus IDFW
files. Preceded by a 4-byte record-type marker (`46 00 00 00`)
+ magic preamble `00 02 00 [Tran[0] BE] [Tran[1] BE]`.
- **Sample stream is BW's segment-rotated block codec verbatim.**
Thor reuses `10 NN` (nibble), `20 NN` (int8), `00 NN` (RLE),
`30 NN` (packed12), `40 02` (segment header) tags with the same
semantics. Channel rotation Tran→Vert→Long→MicL.
- **Geo LSB = 0.0003 in/s** (not BW's 0.005), because Thor's 16-bit
ADC range maps to 10 in/s without the 16-count BW quantization step.
- **Mic ≈ 2.14×10⁻⁶ psi/count** (rough scale; refine after channel
block calibration constants are decoded).
- **BW compliance anchor `\xbe\x80\x00\x00\x00\x00` reappears at
IDFW offset 0x952**: sample_rate at anchor-6 (uint16 BE),
record_time at anchor+6 (float32 BE), same layout as BW.
- **Event timestamp at offset 0x97A** — 8 bytes `[day][month]
[year_be][unk][hour][min][sec]`. Stop-time mirrors at 0x982.
- **Serial as null-terminated ASCII at 0x14E**.
- **Calibration date** at 0x194–0x197 (day, month, year_be).
- Per-sample residual drift of ~3% suggests Thor encodes int8/nibble
deltas with an extra refinement bit that BW doesn't carry —
unsolved; errors resync within a few samples so cumulative impact
is small.
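Two of the header fields listed above, the 8-byte timestamp and the values around the compliance anchor, can be read with a few lines of standard-library code. This is a sketch on synthetic bytes: `decode_timestamp` and `read_rate_and_record_time` are hypothetical names, and the range validation a real reader would need is left out.

```python
import datetime
import struct
from typing import Optional, Tuple

ANCHOR = b"\xbe\x80\x00\x00\x00\x00"


def decode_timestamp(raw: bytes) -> datetime.datetime:
    """Layout: [day][month][year_be 2B][unk][hour][min][sec]."""
    day, month, year_hi, year_lo, _unk, hour, minute, second = raw[:8]
    return datetime.datetime((year_hi << 8) | year_lo, month, day,
                             hour, minute, second)


def read_rate_and_record_time(buf: bytes) -> Optional[Tuple[int, float]]:
    """sample_rate at anchor-6 (uint16 BE), record_time at anchor+6 (float32 BE)."""
    anchor = buf.find(ANCHOR)
    if anchor < 6 or anchor + 10 > len(buf):
        return None
    sample_rate = int.from_bytes(buf[anchor - 6:anchor - 4], "big")
    (record_time,) = struct.unpack(">f", buf[anchor + 6:anchor + 10])
    return sample_rate, record_time


# Synthetic bytes: 19 Dec 2023 16:27:23, 1024 samples/s, 3.0 s record time.
ts = decode_timestamp(bytes([19, 12, 0x07, 0xE7, 0x00, 16, 27, 23]))
block = (bytes(8) + (1024).to_bytes(2, "big") + bytes(4)
         + ANCHOR + struct.pack(">f", 3.0))
rate_and_time = read_rate_and_record_time(block)
print(ts, rate_and_time)
```

Reading relative to the anchor rather than at an absolute offset keeps the lookup valid if the compliance block shifts between files.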
---
+17 -2
@@ -210,8 +210,7 @@ def parse_idf_report(text: Union[str, bytes]) -> Dict[str, Any]:
"long_peak_acceleration",
"tran_peak_displacement", "vert_peak_displacement",
"long_peak_displacement",
-"tran_time_of_peak", "vert_time_of_peak", "long_time_of_peak",
-"mic_time_of_peak", "mic_zc_freq",
+"mic_zc_freq",
)
for key in float_fields:
v = raw.get(key)
@@ -223,6 +222,22 @@ def parse_idf_report(text: Union[str, bytes]) -> Dict[str, Any]:
else:
out.pop(key, None)
# Time-of-peak: Thor labels these "TimeofPeak" (lowercase "of") so the
# normalizer produces "*_timeof_peak". Map them to the canonical
# ``*_time_of_peak`` output keys for downstream consumers.
for raw_key, out_key in (
("tran_timeof_peak", "tran_time_of_peak"),
("vert_timeof_peak", "vert_time_of_peak"),
("long_timeof_peak", "long_time_of_peak"),
("mic_timeof_peak", "mic_time_of_peak"),
):
v = raw.get(raw_key)
if v is None:
continue
fv = _parse_float(v)
if fv is not None:
out[out_key] = fv
# Microphone: Thor reports MicPSPL (dB(L)) which is the closest
# analogue to BW's mic_ppv. The raw "99.4 dB(L)" string stays in
# `out` under the original `mic_pspl` key for display; the parsed
+514 -48
@@ -1,64 +1,530 @@
"""
-micromate/idf_file.py - placeholder for the Thor IDF binary codec.
-Thor's ``.IDFH`` (histogram) and ``.IDFW`` (waveform) event files are an
-Instantel proprietary binary format that has not yet been reverse-
-engineered. Today seismo-relay treats them as opaque blobs:
-``WaveformStore.save_imported_idf`` stores the bytes verbatim and reads
-all device-authoritative metadata from the paired ``.IDFW.txt`` /
-``.IDFH.txt`` ASCII sidecar (parsed by ``idf_ascii_report.py``).
-When we crack the binary codec (same reverse-engineering playbook we
-used to byte-perfect-parse Series III BW files, see
-``docs/instantel_protocol_reference.md`` and ``minimateplus/event_file_io.py``),
-this module will grow:
-- ``read_idf_file(path) -> IdfEvent``
-Parse a ``.IDFW``/``.IDFH`` binary and return a fully populated
-``IdfEvent`` whose waveform-sample arrays come from the binary
-(the .txt sidecar's tabular sample block being a best-effort
-check). Lets us ingest Thor events even when the operator
-hasn't enabled the .txt exporter, closing the
-``had_report=False`` gap that the thor-watcher forwarder
-currently tolerates as a known limitation.
-- ``write_idf_file(path, event)`` (eventually)
-Round-trip event reconstruction, used for verifying the codec
-against captured device files the way ``write_blastware_file``
-verifies the Series III codec.
-- Helpers for decoding the binary's per-channel sample arrays into
-physical units, the per-event flash buffer's monitor-log records,
-etc.
-The reverse-engineering path: pair every ``.IDFW`` binary in
-``thor-watcher/example-data/`` with its sibling ``.IDFW.txt``, treating
-the txt's "Waveform Data Channels" block as ground-truth, and align
-the binary's per-channel int16-or-similar arrays against it. Header
-fields (sample rate, channel count, record time, timestamps) sit before
-the sample block, the same approach as the BW codec where ASCII strings
-inside the binary (``Project:``, ``Client:``, etc.) anchored field
-discovery.
+micromate/idf_file.py - Thor IDF binary codec.
+Decodes the Instantel Micromate Series IV ``.IDFW`` (waveform) and
+``.IDFH`` (histogram) binary on-disk format. Sister module to
+``minimateplus/event_file_io.py``.
+Status (2026-05-28):
+- **Genuine Series IV / Thor binaries** are all signed
+``00 12 01 00 00 00 Instantel\\0`` (sig-A in earlier notes). Two
+Series III (Blastware) binaries appear in the example corpus
+(``BE9439_*``); they share the ``.IDFW``/``.IDFH`` extension by
+filing convention but carry a BW STRT header (``10 00 01 80 00 00
+Instantel STRT...``) and are NOT Thor data. The reader detects
+them by signature and raises NotImplementedError pointing callers
+at ``minimateplus.event_file_io.read_blastware_file()``.
+- **IDFW waveform body** reuses the BW segment-rotated block codec
+verbatim. Body most often starts at file offset ``0x0f1f``; other
+offsets are located by ``_find_waveform_body_offset``. Samples
+decoded via ``minimateplus.waveform_codec.decode_waveform_v2``
+with 87–99% byte-exact match against ``.IDFW.txt`` sidecar (quiet
+events). Loud events hit the BW codec's known walker-stops-early
+limit. Residual ~3% drift on per-sample deltas, likely a
+Thor-specific 12-bit delta refinement that BW's codec doesn't
+model. Geo LSB = 0.0003 in/s; mic factor ~2.14e-6 psi/count.
+- **IDFH histogram body**: 12-byte segment header
+``[len_be 2B] 0a 00 00 00 [00 NN_counter] 05 3f`` introduces a
+segment of ``N`` 72-byte interval records (``N = (len - 10) // 72``).
+Each record holds 4 × 16-byte per-channel min/max/halfp + 8-byte
+tail. Geo peaks via ``max(|min|, |max|) / 32768 × 10`` in/s
+(matches sidecar within ~1.8%), freq via ``512 / halfp`` Hz.
+**All 859 Thor IDFH files in the corpus decode (181,071 intervals).**
+- Binary metadata directly extracted: serial, timestamp, sample_rate,
+record_time, calibration_date. Other fields fall back to the paired
+``.IDFW.txt`` / ``.IDFH.txt`` sidecar (consumed by
+``WaveformStore.save_imported_idf``).
+The full reverse-engineering writeup lives in
+``docs/idf_protocol_reference.md``.
"""
from __future__ import annotations
+import datetime
+import struct
+from dataclasses import dataclass
from pathlib import Path
-from typing import Union
-from .models import IdfEvent
+from typing import Optional, Union
+from minimateplus.waveform_codec import decode_waveform_v2
+from .models import IdfEvent, IdfPeaks, IdfReport
-def read_idf_file(path: Union[str, Path]) -> "IdfEvent":
-"""Parse a Thor ``.IDFW``/``.IDFH`` binary into an ``IdfEvent``.
-Not yet implemented. When implemented, this will be the canonical
-entry point for reading Thor binaries; the ASCII sidecar parser
-becomes an optional fast-path metadata supplement rather than the
-sole source of device-authoritative data.
+# Genuine Series IV / Thor IDF binary signature: 6 bytes, then ASCII "Instantel".
+_THOR_PREFIX = b"\x00\x12\x01\x00\x00\x00"
+# Stray Series III (Blastware) binaries that occasionally turn up in Thor
+# corpus directories renamed to the .IDFW/.IDFH convention. Their header
+# (`10 00 01 80 00 00 Instantel STRT ...`) is byte-for-byte a BW SUB 5A
+# STRT record, not a Thor binary. Detected so we can refuse-and-route
+# rather than mis-parse.
+_BW_STRAY_PREFIX = b"\x10\x00\x01\x80\x00\x00"
+_INSTANTEL_TAG = b"Instantel"
+# Most common body offset for sig-A IDFW files (~50% of prod events;
+# 151/154 in the original tests/fixtures/THORDATA_example corpus). The
+# body is the segment-rotated block stream consumed by decode_waveform_v2;
+# bytes [0:3] are the magic ``00 02 00`` preamble. Production events
+# routinely use other offsets; see :func:`_find_waveform_body_offset`
+# for the dynamic scan. This constant survives only as the priority hint.
+_BODY_START_SIG_A = 0x0F1F
# Magic bytes that mark a candidate waveform-body preamble.
_BODY_MAGIC = b"\x00\x02\x00"
# Where to start looking for body candidates inside the file. Skip the
# fixed-header region where the same magic legitimately appears inside
# channel-test records and the compliance block (offsets 0x015d, 0x091c,
# 0x0ae2, 0x0d30 in observed events).
_BODY_SCAN_FLOOR = 0x0E00
# Geophone count → in/s, derived from sidecar ground truth: the smallest
# non-zero sample in the 1,014-file corpus is 0.0003 in/s.
_GEO_LSB_IPS = 0.0003
# Microphone count → psi, derived from sidecar regression on 50 sample
# pairs from UM11719_20231219162723.IDFW (mic-heavy event).
_MIC_LSB_PSI = 2.14e-6
# IDFH histogram constants.
_IDFH_INTERVAL_SIZE = 72 # bytes per per-interval record
_IDFH_SEGMENT_HEADER = 10 # bytes: [len_be 2B][0a 00 00 00 4B][00 NN 2B][05 3f 2B]
_IDFH_SEGMENT_TAIL = 2 # bytes after the interval data block, before next marker
_IDFH_HALFP_FREQ_NUM = 512.0 # freq_hz = NUM / halfp; halfp ≤ 5 means ">100 Hz" sentinel
_IDFH_GEO_FULL_SCALE = 10.0 # in/s — Normal range
_IDFH_INT16_FS = 32768.0
_IDFH_CHANNELS = ("Tran", "Vert", "Long", "MicL")
# ─── Binary metadata extraction ─────────────────────────────────────────────
@dataclass
class IdfBinaryMetadata:
"""Fields recoverable from the sig-A binary header (no .txt needed)."""
serial: Optional[str] = None
event_datetime: Optional[datetime.datetime] = None
sample_rate: Optional[int] = None
record_time_sec: Optional[float] = None
calibration_date: Optional[datetime.date] = None
def _read_ascii_z(buf: bytes, off: int, maxlen: int = 64) -> Optional[str]:
if off >= len(buf):
return None
end = buf.find(b"\x00", off, off + maxlen)
if end < 0:
end = min(off + maxlen, len(buf))
s = buf[off:end].decode("ascii", errors="replace").strip()
return s or None
def _decode_8byte_timestamp(buf: bytes, off: int) -> Optional[datetime.datetime]:
"""Layout: ``[day][month][year_hi][year_lo][unknown][hour][min][sec]``."""
if off + 8 > len(buf):
return None
day, mon, yh, yl, _unk, hr, mn, sc = buf[off : off + 8]
year = (yh << 8) | yl
if not (2015 <= year <= 2050 and 1 <= mon <= 12 and 1 <= day <= 31
and 0 <= hr < 24 and 0 <= mn < 60 and 0 <= sc < 60):
return None
try:
return datetime.datetime(year, mon, day, hr, mn, sc)
except ValueError:
return None
def extract_binary_metadata(buf: bytes) -> IdfBinaryMetadata:
"""Pull serial/timestamp/sample_rate/record_time/calibration from the
sig-A binary header.
Field positions confirmed against UM11719_20231219162723.IDFW; stable
across the 151-file sig-A corpus.
"""
-raise NotImplementedError(
-"IDF binary codec not yet implemented; the .IDFW/.IDFH binary format "
-"is undecoded. Use parse_idf_report() on the paired .txt sidecar "
-"for device-authoritative metadata."
+md = IdfBinaryMetadata()
+# Serial: null-terminated ASCII at 0x14E.
+md.serial = _read_ascii_z(buf, 0x14E, maxlen=16)
# Sample rate + record time live in a BW-compatible compliance block.
# Locate the 6-byte anchor `be 80 00 00 00 00` and read offsets relative
# to it: anchor-6 = sample_rate uint16 BE; anchor+6 = record_time float32 BE.
anchor = buf.find(b"\xbe\x80\x00\x00\x00\x00", 0x800, 0xA00)
if anchor > 0:
sr_bytes = buf[anchor - 6 : anchor - 4]
if len(sr_bytes) == 2:
sr = int.from_bytes(sr_bytes, "big")
if sr in (256, 512, 1024, 2048, 4096):
md.sample_rate = sr
rt_bytes = buf[anchor + 6 : anchor + 10]
if len(rt_bytes) == 4:
try:
rt = struct.unpack(">f", rt_bytes)[0]
if 0.1 <= rt <= 600.0:
md.record_time_sec = float(rt)
except struct.error:
pass
# Event timestamp: 8 bytes. Position differs between IDFW (0x97A) and
# IDFH (0x9F8); try each known offset and accept the first valid decode.
for off in (0x97A, 0x9F8):
ts = _decode_8byte_timestamp(buf, off)
if ts is not None:
md.event_datetime = ts
break
# Calibration date: day, month, year_be at 0x194-0x197.
if len(buf) > 0x197:
day, mon = buf[0x194], buf[0x195]
year = int.from_bytes(buf[0x196 : 0x198], "big")
if 1 <= mon <= 12 and 1 <= day <= 31 and 2015 <= year <= 2050:
try:
md.calibration_date = datetime.date(year, mon, day)
except ValueError:
pass
return md
# ─── Sample decoder + unit conversion ───────────────────────────────────────
def _find_waveform_body_offset(buf: bytes) -> Optional[int]:
"""Pick the file offset of the waveform body by trial-decoding every
``00 02 00`` magic position past the fixed-header region.
The body's location isn't fixed across all sig-A IDFW files — about
half the production events use ``0x0f1f``, but the rest have offsets
that shift based on header padding / channel-config layout. We
auto-detect by:
1. Find every ``00 02 00`` occurrence past ``_BODY_SCAN_FLOOR``.
2. Try ``decode_waveform_v2()`` on each candidate.
3. Pick the offset whose decoded sample count is largest.
Returns the offset, or ``None`` if no candidate yielded more than
the trivial 2-sample preamble (= "no real body found").
Costs ~2-8 trial decodes per file; in practice the first candidate
past 0x0e00 is usually the right one.
"""
if len(buf) < _BODY_SCAN_FLOOR + 8:
return None
best: Optional[tuple[int, int]] = None # (total_samples, offset)
i = _BODY_SCAN_FLOOR
while True:
j = buf.find(_BODY_MAGIC, i)
if j < 0:
break
i = j + 1
try:
decoded = decode_waveform_v2(buf[j:])
except Exception:
continue
if not decoded:
continue
total = sum(len(v) for v in decoded.values())
# A "real" body has more than just the 2-sample preamble.
if total <= 2:
continue
if best is None or total > best[0]:
best = (total, j)
return best[1] if best else None
def _decode_waveform_samples(buf: bytes) -> Optional[dict]:
"""Decode samples from the sig-A waveform body.
Returns the raw decoder counts dict — geo LSB = 0.0003 in/s, mic in
its own count unit (see :func:`mic_count_to_psi`). Returns None if
no usable body is found.
Uses :func:`_find_waveform_body_offset` to locate the body — the
file-offset varies across events (~50% sit at the canonical
``0x0f1f`` but the rest don't), so the previous hardcoded constant
silently produced 2-sample preamble-only output for half the corpus.
"""
off = _find_waveform_body_offset(buf)
if off is None:
return None
return decode_waveform_v2(buf[off:])
def geo_count_to_ips(count: int) -> float:
"""Convert a Thor geo decoder count to in/s. LSB = 0.0003 in/s."""
return count * _GEO_LSB_IPS
def mic_count_to_psi(count: int) -> float:
"""Convert a Thor mic decoder count to psi. Scale derived from
regression over 50 sample pairs in UM11719_20231219162723.IDFW;
consistent to ~5%. Calibration constants from the channel block
can refine this once decoded.
"""
return count * _MIC_LSB_PSI
# ─── IDFH histogram decoder ─────────────────────────────────────────────────
@dataclass
class IdfhInterval:
"""One decoded histogram interval (typically one minute of monitoring)."""
offset: int # file byte offset of the 72-byte record
# Per-channel min/max ADC counts (int16 BE), half-period samples, peak count.
# Peak = max(|min|, |max|). freq_hz = 512/halfp (None if halfp ≤ 5 →
# ">100 Hz" sentinel; matches sidecar convention).
tran_min: int
tran_max: int
tran_halfp: int
vert_min: int
vert_max: int
vert_halfp: int
long_min: int
long_max: int
long_halfp: int
micl_min: int
micl_max: int
micl_halfp: int
def peak_count(self, channel: str) -> int:
mn = getattr(self, f"{channel.lower()}_min")
mx = getattr(self, f"{channel.lower()}_max")
return max(abs(mn), abs(mx))
def peak_ips(self, channel: str) -> float:
"""Convert peak count to in/s (geo channels only)."""
return self.peak_count(channel) / _IDFH_INT16_FS * _IDFH_GEO_FULL_SCALE
def freq_hz(self, channel: str) -> Optional[float]:
halfp = getattr(self, f"{channel.lower()}_halfp")
if halfp <= 5:
return None
return _IDFH_HALFP_FREQ_NUM / halfp
def _decode_idfh_interval(buf72: bytes, offset: int) -> IdfhInterval:
"""Decode one 72-byte interval record into per-channel min/max/halfp."""
import struct
fields = []
for i in range(4):
block = buf72[i * 16 : (i + 1) * 16]
mn = struct.unpack_from(">h", block, 0)[0]
mx = struct.unpack_from(">h", block, 2)[0]
# block[4:6] = int16 BE, role unknown (possibly time-of-peak)
halfp = struct.unpack_from(">H", block, 6)[0]
# block[10:12] and block[14:16] are uint16 BE with unknown semantics
# (likely sum / count contributions for the PVS computation).
fields.extend([mn, mx, halfp])
# Tail 8 bytes (buf72[64:72]) carry PVS-related data; not yet decoded.
return IdfhInterval(
offset=offset,
tran_min=fields[0], tran_max=fields[1], tran_halfp=fields[2],
vert_min=fields[3], vert_max=fields[4], vert_halfp=fields[5],
long_min=fields[6], long_max=fields[7], long_halfp=fields[8],
micl_min=fields[9], micl_max=fields[10], micl_halfp=fields[11],
)
def decode_idfh_body(buf: bytes) -> list:
"""Walk an IDFH file and decode every interval record.
The body has one or more segments; each segment header is 12 bytes:
``[length_be 2B][0a 00 00 00][00 NN_counter][05 3f]`` where ``length``
is bytes from the magic through the end of the interval block
(= 10 + 72 × n_intervals). Segments are separated by a 2-byte tail
+ next-segment 2-byte prefix (the bytes before the next length field).
Confirmed against the 859-file corpus (181,071 intervals decoded; 1
failure is the sig-B BE9439 file).
"""
intervals: list = []
i = 0
while True:
j = buf.find(b"\x0a\x00\x00\x00", i)
# Stop when no magic remains or the header would run past the buffer end.
if j < 0 or j < 2 or j + 8 > len(buf):
break
# Validate: [length_be][0a 00 00 00][00 NN][05 3f]
if buf[j + 4] != 0x00 or buf[j + 6 : j + 8] != b"\x05\x3f":
i = j + 1
continue
length = int.from_bytes(buf[j - 2 : j], "big")
n = (length - _IDFH_SEGMENT_HEADER) // _IDFH_INTERVAL_SIZE
if n <= 0:
i = j + 1
continue
header_start = j - 2
interval_start = header_start + _IDFH_SEGMENT_HEADER
for k in range(n):
off = interval_start + k * _IDFH_INTERVAL_SIZE
if off + _IDFH_INTERVAL_SIZE > len(buf):
break
chunk = buf[off : off + _IDFH_INTERVAL_SIZE]
intervals.append(_decode_idfh_interval(chunk, off))
# Advance past this segment + the 2-byte tail.
i = header_start + length + _IDFH_SEGMENT_TAIL
return intervals
# ─── Top-level reader ───────────────────────────────────────────────────────
@dataclass
class IdfReadResult:
"""Return type for :func:`read_idf_file`.
For waveforms (``.IDFW``), ``samples`` holds the per-channel sample
arrays in Thor decoder counts. For histograms (``.IDFH``),
``samples`` is empty and ``intervals`` holds the per-interval
record list (peaks, freqs).
"""
event: IdfEvent
samples: dict # {"Tran": [...], ...} for IDFW; empty for IDFH
binary_metadata: IdfBinaryMetadata
signature: str # always "thor" for now (sig-A genuine Thor)
intervals: Optional[list] = None # list[IdfhInterval] for IDFH; None for IDFW
def read_idf_file(
path: Union[str, Path],
*,
data: Optional[bytes] = None,
) -> IdfReadResult:
"""Parse a Thor ``.IDFW`` or ``.IDFH`` binary into an ``IdfEvent`` plus decoded data.
Handles genuine Thor (sig-A) waveforms and histograms. Stray Series III
(Blastware) binaries raise NotImplementedError and should be routed through
``minimateplus.event_file_io.read_blastware_file()``; any other signature
raises ValueError. Fields the binary does not carry come from the paired
``.IDFW.txt`` / ``.IDFH.txt`` sidecar via ``parse_idf_report()``.
Returns an :class:`IdfReadResult`. The caller converts int sample
counts to physical units via :func:`geo_count_to_ips` /
:func:`mic_count_to_psi`.
``path`` is used for filename in error messages and ``.IDFH`` vs
``.IDFW`` suffix detection. When ``data`` is supplied the disk
read is skipped — useful for ingest paths that already have the
bytes in memory and where the file may not exist on disk yet.
"""
p = Path(path)
buf = data if data is not None else p.read_bytes()
if len(buf) < 16 or buf[6:16] != _INSTANTEL_TAG + b"\x00":
raise ValueError(f"{p.name}: not an IDF file (missing Instantel magic)")
sig_prefix = buf[:6]
if sig_prefix == _THOR_PREFIX:
signature = "thor"
elif sig_prefix == _BW_STRAY_PREFIX:
raise NotImplementedError(
f"{p.name}: file has a Series III (Blastware) STRT header in "
"an IDF-named container — not a Thor binary. Route through "
"minimateplus.event_file_io.read_blastware_file() instead "
"(peaks decode; samples & full metadata don't, but it's not "
"Thor data so the Thor codec doesn't apply)."
)
else:
raise ValueError(f"{p.name}: unknown IDF signature {sig_prefix.hex()}")
is_histogram = p.suffix.upper() == ".IDFH"
md = extract_binary_metadata(buf)
if is_histogram:
intervals = decode_idfh_body(buf)
if not intervals:
raise ValueError(f"{p.name}: IDFH body decoded no intervals")
# Peaks: max across all intervals on each channel (per-channel max
# of stored max-magnitudes; sidecar's PPV row carries the same).
peak_tran = max((iv.peak_ips("Tran") for iv in intervals), default=0.0)
peak_vert = max((iv.peak_ips("Vert") for iv in intervals), default=0.0)
peak_long = max((iv.peak_ips("Long") for iv in intervals), default=0.0)
# Mic peak in psi — Thor stores per-interval mic ADC counts in the
# binary; convert the max count to psi via the per-count factor.
mic_peak_count = max((iv.peak_count("MicL") for iv in intervals), default=0)
mic_peak_psi = mic_count_to_psi(mic_peak_count) if mic_peak_count else None
rep = IdfReport(
serial_number=md.serial,
event_type="Full Histogram",
event_datetime=md.event_datetime,
filename=p.name,
sample_rate=md.sample_rate,
record_time_sec=md.record_time_sec,
)
peaks = IdfPeaks(
transverse_ips=peak_tran,
vertical_ips=peak_vert,
longitudinal_ips=peak_long,
peak_vector_sum_ips=None,
mic_pspl_dbl=None, # IDFH binary doesn't carry the dB(L) value
mic_pspl_psi=mic_peak_psi,
)
event = IdfEvent(
serial=md.serial or "UNKNOWN",
timestamp=md.event_datetime or datetime.datetime(1970, 1, 1),
kind="Histogram",
filename=p.name,
sample_rate=md.sample_rate,
record_time_sec=md.record_time_sec,
peaks=peaks,
report=rep,
)
return IdfReadResult(
event=event,
samples={},
binary_metadata=md,
signature=signature,
intervals=intervals,
)
# Waveform path.
decoded = _decode_waveform_samples(buf)
if decoded is None:
raise ValueError(f"{p.name}: waveform body codec failed")
rep = IdfReport(
serial_number=md.serial,
event_type="Full Waveform",
event_datetime=md.event_datetime,
filename=p.name,
sample_rate=md.sample_rate,
record_time_sec=md.record_time_sec,
)
def _peak_ips(ch: str) -> float:
arr = decoded.get(ch, [])
return geo_count_to_ips(max((abs(v) for v in arr), default=0))
# Mic peak psi from binary: max absolute MicL ADC count × 2.14e-6 psi/count.
mic_arr = decoded.get("MicL", [])
mic_peak_count = max((abs(v) for v in mic_arr), default=0)
mic_peak_psi = mic_count_to_psi(mic_peak_count) if mic_peak_count else None
peaks = IdfPeaks(
transverse_ips=_peak_ips("Tran"),
vertical_ips=_peak_ips("Vert"),
longitudinal_ips=_peak_ips("Long"),
# PVS requires aligned per-sample √(T²+V²+L²); leave None — the
# sidecar carries it and the bridge picks it up if present.
peak_vector_sum_ips=None,
mic_pspl_dbl=None, # binary IDFW doesn't carry the dB(L) value;
# sidecar .txt fills it via IdfReport.from_dict
mic_pspl_psi=mic_peak_psi,
)
event = IdfEvent(
serial=md.serial or "UNKNOWN",
timestamp=md.event_datetime or datetime.datetime(1970, 1, 1),
kind="Waveform",
filename=p.name,
sample_rate=md.sample_rate,
record_time_sec=md.record_time_sec,
peaks=peaks,
report=rep,
)
return IdfReadResult(
event=event,
samples=decoded,
binary_metadata=md,
signature=signature,
)
+323
@@ -0,0 +1,323 @@
"""
micromate/idf_to_bw_report.py — adapter that projects a parsed Thor IDF
report (+ binary metadata + decoded IDFH intervals) into the
``bw_report``-shaped dict that :mod:`sfm.report_pdf.gather_report_data`
consumes.
Lets Thor events flow through the existing Series III Event Report PDF
pipeline without duplicating the renderer. Thor's report content is
~95% the same data shape as BW's; the field names differ but the
underlying metrics map 1:1.
Caveats
───────
- **Mic units** — Thor records ``MicPSPL`` natively in dB(L). This
adapter sets ``bw_report.mic.pspl_dbl`` directly; the report
renderer recomputes the equivalent psi via its dBL→psi formula.
- **Saturation / above-range flags** — Thor doesn't always mark
``OORANGE`` the way BW does; we set ``zc_freq_above_range`` only
when a `>100` sentinel was preserved in the raw text.
- **Per-interval data** — for IDFH events we build ``interval_times``
by stepping ``IntervalSize`` from ``HistogramStartTime``; the binary
decoder confirms one record per step (882 / 881 / 881 ... across
the corpus).
- **calibration_by parsing** — Thor's free-form ``Calibration : November
22, 2023 by Instantel`` is split on ``" by "`` to extract the
calibrator; the date prefix is parsed where possible, otherwise
the binary-extracted ``calibration_date`` from
:class:`micromate.idf_file.IdfBinaryMetadata` wins.
"""
from __future__ import annotations
import datetime
import re
from typing import Any, Dict, List, Optional
# ─── Helpers ────────────────────────────────────────────────────────────────
_NUM_RE = re.compile(r"-?\d+(?:\.\d+)?")
def _parse_first_number(s: Optional[str]) -> Optional[float]:
"""Pull the first numeric token from a string like ``"0.1500 in/s"``."""
if s is None:
return None
m = _NUM_RE.search(str(s))
if not m:
return None
try:
return float(m.group(0))
except ValueError:
return None
def _parse_interval_size_s(s: Optional[str]) -> Optional[float]:
"""``"60 sec"`` → 60.0, ``"5 min"`` → 300.0, ``"1 hour"`` → 3600."""
if s is None:
return None
num = _parse_first_number(s)
if num is None:
return None
sl = str(s).lower()
if "hour" in sl or "hr" in sl:
return num * 3600.0
if "min" in sl:
return num * 60.0
return num # default to seconds
def _parse_calibration(text: Optional[str]) -> tuple[Optional[str], Optional[str]]:
"""Split ``"November 22, 2023 by Instantel"`` → (ISO date, calibrator).
Returns ``(None, None)`` if neither half parses.
"""
if not text:
return None, None
parts = str(text).split(" by ", 1)
date_part = parts[0].strip() if parts else None
by_part = parts[1].strip() if len(parts) > 1 else None
iso_date: Optional[str] = None
if date_part:
for fmt in ("%B %d, %Y", "%b %d, %Y", "%Y-%m-%d", "%m/%d/%Y"):
try:
iso_date = datetime.datetime.strptime(date_part, fmt).date().isoformat()
break
except ValueError:
continue
return iso_date, by_part
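A minimal standalone sketch of the split-then-parse behaviour above. `split_calibration` is a hypothetical name used only for illustration (it is not part of the module), and the format list simply mirrors the one shown in the listing.

```python
import datetime
from typing import Optional, Tuple

# Hypothetical mirror of the calibration split; names here are illustrative only.
_DATE_FORMATS = ("%B %d, %Y", "%b %d, %Y", "%Y-%m-%d", "%m/%d/%Y")


def split_calibration(text: Optional[str]) -> Tuple[Optional[str], Optional[str]]:
    """Return (ISO date or None, calibrator or None)."""
    if not text:
        return None, None
    # Split once on the literal " by " separator; sep is "" when it is absent.
    date_part, sep, by_part = str(text).partition(" by ")
    calibrator = by_part.strip() if sep else None
    iso_date = None
    for fmt in _DATE_FORMATS:
        try:
            iso_date = datetime.datetime.strptime(date_part.strip(), fmt).date().isoformat()
            break
        except ValueError:
            continue  # try the next known layout
    return iso_date, calibrator


print(split_calibration("November 22, 2023 by Instantel"))
print(split_calibration("sometime last year by Acme"))
print(split_calibration("2023-11-22"))
```

The second call shows why the adapter can still fall back to the binary calibration date: an unparseable date half yields `None` while the calibrator is kept.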
def _channel_peaks(idf: Dict[str, Any], ch_lc: str) -> Dict[str, Any]:
"""Map ``tran_ppv`` / ``tran_zc_freq`` / ... → bw_report.peaks.tran shape."""
out: Dict[str, Any] = {}
for src, dst in (
(f"{ch_lc}_ppv", "ppv_ips"),
(f"{ch_lc}_zc_freq", "zc_freq_hz"),
(f"{ch_lc}_time_of_peak", "time_of_peak_s"),
(f"{ch_lc}_peak_acceleration", "peak_accel_g"),
(f"{ch_lc}_peak_displacement", "peak_disp_in"),
):
v = idf.get(src)
if v is not None:
out[dst] = v
# ZC freq ">100" sentinel: the raw text carries it under the un-typed
# key (e.g. ``raw["tran_zc_freq"]`` would be ``">100"``), and our parser
# dropped the typed entry. Detect that case and flag.
raw_zc = idf.get(f"{ch_lc}_zc_freq")
if isinstance(raw_zc, str) and ">" in raw_zc:
out["zc_freq_above_range"] = True
out.pop("zc_freq_hz", None)
return out
def _sensor_check(idf: Dict[str, Any], ch_lc: str) -> Dict[str, Any]:
out: Dict[str, Any] = {}
fr = idf.get(f"{ch_lc}_test_freq")
if fr is not None:
out["freq_hz"] = _parse_first_number(fr)
rt = idf.get(f"{ch_lc}_test_ratio")
if rt is not None:
out["ratio"] = _parse_first_number(rt)
am = idf.get(f"{ch_lc}_test_amplitude")
if am is not None:
out["amplitude_mv"] = _parse_first_number(am)
res = idf.get(f"{ch_lc}_test_results")
if res is not None:
out["result"] = str(res).strip()
return {k: v for k, v in out.items() if v is not None}
def _interval_times(idf: Dict[str, Any], n_intervals: Optional[int]) -> List[str]:
"""Synthesise per-interval timestamps from start + interval_size × k.
Returns ``[]`` when start time or interval size is unknown.
"""
if not n_intervals:
return []
start_date = idf.get("histogram_start_date") or idf.get("event_date")
start_time = idf.get("histogram_start_time") or idf.get("event_time")
iv_str = idf.get("interval_size")
iv_s = _parse_interval_size_s(iv_str)
if not (start_date and start_time and iv_s):
return []
try:
t0 = datetime.datetime.strptime(f"{start_date} {start_time}", "%Y-%m-%d %H:%M:%S")
except ValueError:
return []
out = []
for k in range(int(n_intervals)):
t = t0 + datetime.timedelta(seconds=iv_s * (k + 1))
out.append(t.isoformat())
return out
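The stepping rule above can be sketched in isolation. `interval_end_times` is a hypothetical helper and the start time is made up for the example. Note that the `(k + 1)` factor puts the first stamp one full interval after the start, which reads as labelling each interval by its end (that interpretation is an assumption, not something the listing states).

```python
import datetime
from typing import List


def interval_end_times(start: datetime.datetime, interval_s: float, n: int) -> List[str]:
    """ISO stamps at start + interval_s * (k + 1) for k = 0 .. n-1."""
    return [
        (start + datetime.timedelta(seconds=interval_s * (k + 1))).isoformat()
        for k in range(n)
    ]


# Illustrative values only: a 60-second histogram with three intervals.
t0 = datetime.datetime(2025, 8, 4, 19, 0, 47)
for stamp in interval_end_times(t0, 60.0, 3):
    print(stamp)
```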
# ─── Top-level adapter ──────────────────────────────────────────────────────
def build_bw_report_from_idf(
idf_report: Dict[str, Any],
*,
binary_md=None,
intervals: Optional[list] = None,
is_histogram: Optional[bool] = None,
) -> Dict[str, Any]:
"""Project a parsed IDF report dict (and optional binary metadata +
decoded IDFH intervals) into the BW report sidecar shape.
The returned dict is structurally identical to what
``minimateplus.event_file_io._bw_report_to_dict`` produces from a
real BW ASCII report — it can be assigned to
``sidecar["bw_report"]`` and consumed verbatim by
``sfm.report_pdf.gather_report_data``.
``intervals`` is the list of :class:`micromate.idf_file.IdfhInterval`
objects from :func:`micromate.idf_file.decode_idfh_body`; only used
for histogram events to derive accurate ``interval_times``.
"""
if is_histogram is None:
et = str(idf_report.get("event_type", ""))
is_histogram = et.lower().startswith("full histogram")
# ── Trigger / recording / device ─────────────────────────────────────
trigger_channel = idf_report.get("trigger")
trigger_level = _parse_first_number(idf_report.get("geo_trigger_level"))
geo_range_ips = _parse_first_number(idf_report.get("geo_range"))
cal_iso, cal_by = _parse_calibration(idf_report.get("calibration"))
# Prefer the binary-extracted calibration_date when our text parse fell
# through; the binary date is unambiguous.
if cal_iso is None and binary_md is not None and binary_md.calibration_date:
cal_iso = binary_md.calibration_date.isoformat()
# ── Histogram fields ────────────────────────────────────────────────
hist_block: Dict[str, Any] = {
"start": None, "stop": None, "n_intervals": None,
"interval_size": None, "interval_size_s": None,
"channel_peak_when": {},
}
if is_histogram:
sd = idf_report.get("histogram_start_date")
st = idf_report.get("histogram_start_time")
if sd and st:
try:
hist_block["start"] = datetime.datetime.strptime(
f"{sd} {st}", "%Y-%m-%d %H:%M:%S"
).isoformat()
except ValueError:
pass
ed = idf_report.get("histogram_stop_date")
et_ = idf_report.get("histogram_stop_time")
if ed and et_:
try:
hist_block["stop"] = datetime.datetime.strptime(
f"{ed} {et_}", "%Y-%m-%d %H:%M:%S"
).isoformat()
except ValueError:
pass
n_raw = idf_report.get("number_of_intervals")
if n_raw is not None:
try:
# Thor reports a float like "81.04"; truncate to int (the BW
# report uses an int for the column).
hist_block["n_intervals"] = int(float(str(n_raw)))
except ValueError:
pass
# When the binary decoder gave us the actual interval count, prefer it.
if intervals is not None:
hist_block["n_intervals"] = len(intervals)
hist_block["interval_size"] = idf_report.get("interval_size")
hist_block["interval_size_s"] = _parse_interval_size_s(idf_report.get("interval_size"))
# interval_times derived from start+step (the BW report uses the
# exact strings; we match its representation).
times = _interval_times(idf_report, hist_block["n_intervals"])
# Per-channel peak when (absolute date+time at which the channel's
# peak occurred over the histogram run). Thor splits this into
# ``TranPeakDate`` / ``TranPeakTime`` etc.
peak_when: Dict[str, str] = {}
for ch_label, ch_lc in (("Tran", "tran"), ("Vert", "vert"), ("Long", "long"), ("MicL", "mic")):
d = idf_report.get(f"{ch_lc}_peak_date")
t = idf_report.get(f"{ch_lc}_peak_time")
if d and t:
try:
peak_when[ch_label] = datetime.datetime.strptime(
f"{d} {t}", "%Y-%m-%d %H:%M:%S"
).isoformat()
except ValueError:
continue
if peak_when:
hist_block["channel_peak_when"] = peak_when
# ── Mic block ────────────────────────────────────────────────────────
mic_block = {
"weighting": "L", # Thor mic is ISEE Linear
"pspl_dbl": idf_report.get("mic_ppv"), # the dB(L) float
"pspl_saturated": False,
"zc_freq_hz": idf_report.get("mic_zc_freq"),
"zc_freq_above_range": isinstance(idf_report.get("mic_zc_freq"), str)
and ">" in str(idf_report.get("mic_zc_freq")),
"time_of_peak_s": idf_report.get("mic_time_of_peak"),
}
if mic_block["zc_freq_above_range"]:
mic_block["zc_freq_hz"] = None
# ── Peaks ────────────────────────────────────────────────────────────
vs_block = {
"ips": idf_report.get("peak_vector_sum"),
"time_s": _parse_first_number(idf_report.get("peak_vector_sum_time_sum")),
"when": None,
"saturated": False,
}
if is_histogram:
# PVS absolute date+time, when present.
vs_d = idf_report.get("peak_vector_sum_date")
vs_t = idf_report.get("peak_vector_sum_time")
if vs_d and vs_t:
try:
vs_block["when"] = datetime.datetime.strptime(
f"{vs_d} {vs_t}", "%Y-%m-%d %H:%M:%S"
).isoformat()
except ValueError:
pass
return {
"available": True,
"event_type": idf_report.get("event_type"),
"version": idf_report.get("version"),
"trigger": {
"channel": trigger_channel,
"geo_level_ips": trigger_level,
},
"recording": {
"sample_rate_sps": idf_report.get("sample_rate"),
"record_time_s": idf_report.get("record_time_sec"),
"pretrig_s": idf_report.get("pre_trigger_sec"),
"stop_mode": idf_report.get("record_stop_mode"),
"geo_range_ips": geo_range_ips,
"units": idf_report.get("units"),
},
"device": {
"battery_volts": idf_report.get("battery_volts"),
"calibration_date": cal_iso,
"calibration_by": cal_by,
},
"peaks": {
"tran": _channel_peaks(idf_report, "tran"),
"vert": _channel_peaks(idf_report, "vert"),
"long": _channel_peaks(idf_report, "long"),
"vector_sum": vs_block,
},
"mic": mic_block,
"sensor_check": {
"tran": _sensor_check(idf_report, "tran"),
"vert": _sensor_check(idf_report, "vert"),
"long": _sensor_check(idf_report, "long"),
"mic": _sensor_check(idf_report, "mic"),
},
"histogram": hist_block,
"monitor_log": [],
"pc_sw_version": None,
}
+27 -6
@@ -159,12 +159,23 @@ class IdfReport:

 @dataclass
 class IdfPeaks:
-    """Geophone + mic peak values for one Thor event. Native Thor units."""
+    """Geophone + mic peak values for one Thor event. Native Thor units.
+
+    Thor stores the mic peak in two parallel forms — ``mic_pspl_dbl`` is
+    what the sidecar's top-level ``MicPSPL`` header field carries (dB(L)),
+    used in the report header. ``mic_pspl_psi`` is the psi value derived
+    either from the IDFW sample table / IDFH interval column 9, or from
+    the binary mic counts (~2.14e-6 psi/count). Needed because the
+    BW-shaped ``PeakValues.micl`` consumed by ``event_hdf5.write_event_hdf5``
+    expects psi — feeding it dB(L) makes the h5 mic-chart scale factor
+    blow up.
+    """
     transverse_ips: Optional[float] = None # in/s
     vertical_ips: Optional[float] = None # in/s
     longitudinal_ips: Optional[float] = None # in/s
     peak_vector_sum_ips: Optional[float] = None # in/s
     mic_pspl_dbl: Optional[float] = None # dB(L)
+    mic_pspl_psi: Optional[float] = None # psi


 @dataclass
@@ -324,10 +335,14 @@ class IdfEvent:
         machinery without those code paths needing to know about Thor.

         Caveats of the bridge:
-        - ``mic_ppv`` on the produced Event carries Thor's dB(L) value
-          verbatim — the UI distinguishes via the ``device_family``
-          column (Phase 1). Don't run the BW psi→dBL converter on
-          Series IV rows.
+        - ``PeakValues.micl`` carries the mic peak in **psi** (matching
+          BW's convention) — set from :attr:`IdfPeaks.mic_pspl_psi`,
+          with a dB(L)→psi fallback when only the dB(L) value is
+          available. This is what the h5 writer's mic-scale-factor
+          logic needs. The dB(L) value still flows through
+          ``bw_report.mic.pspl_dbl`` (set by the
+          ``idf_to_bw_report`` adapter) and the renderer reads it
+          from there for the report header.
         - Many Thor-specific fields (Peak Acceleration / Displacement,
           sensor self-check, calibration) don't have a slot in
           ``Event``. The full IdfReport is preserved on the
@@ -349,11 +364,17 @@ class IdfEvent:
             minute=self.timestamp.minute,
             second=self.timestamp.second,
         )
+        # Resolve mic peak as psi. Priority: binary-derived mic_pspl_psi
+        # (set by read_idf_file) > dB(L)→psi fallback via standard formula
+        # (psi = 2.9e-9 × 10^(dBL/20)) > None.
+        mic_psi = self.peaks.mic_pspl_psi
+        if mic_psi is None and self.peaks.mic_pspl_dbl is not None:
+            mic_psi = 2.9e-9 * (10.0 ** (self.peaks.mic_pspl_dbl / 20.0))
         pv = PeakValues(
             tran=self.peaks.transverse_ips,
             vert=self.peaks.vertical_ips,
             long=self.peaks.longitudinal_ips,
-            micl=self.peaks.mic_pspl_dbl, # dB(L) — see caveat above
+            micl=mic_psi, # psi, matching BW's convention (h5 scaling depends on this)
             peak_vector_sum=self.peaks.peak_vector_sum_ips,
         )
         pi = ProjectInfo(
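For orientation, the dB(L)→psi fallback in this hunk can be tried on its own. The sketch below assumes only the constant shown in the diff (2.9e-9 psi, which corresponds to the 20 µPa reference pressure); `dbl_to_psi` and `psi_to_dbl` are hypothetical helper names, not functions from the codebase.

```python
import math

# 20 µPa reference pressure expressed in psi (constant taken from the diff).
P_REF_PSI = 2.9e-9


def dbl_to_psi(dbl: float) -> float:
    """Convert a linear-weighted sound level in dB(L) to peak pressure in psi."""
    return P_REF_PSI * 10.0 ** (dbl / 20.0)


def psi_to_dbl(psi: float) -> float:
    """Inverse conversion: peak pressure in psi back to dB(L)."""
    return 20.0 * math.log10(psi / P_REF_PSI)


psi = dbl_to_psi(107.7)
print(f"{psi:.3e} psi")
print(f"{psi_to_dbl(psi):.1f} dB(L)")
```

A level of 107.7 dB(L) works out to roughly 7e-4 psi, which gives a sense of how far off a chart would be if the dB(L) number itself were plotted on a psi axis.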
+1 -1
@@ -49,7 +49,7 @@ SIDECAR_KIND = "sfm.event"
 # bumped without a `pip install` re-run — leading to confusing stale
 # version stamps in sidecars. Bump this constant and CHANGELOG.md
 # together at release time.
-TOOL_VERSION = "0.20.0"
+TOOL_VERSION = "0.21.1"

 try:
     # Best-effort: prefer the installed metadata when it's NEWER than the
+1 -1
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

 [project]
 name = "seismo-relay"
-version = "0.20.0"
+version = "0.21.1"
 description = "Python client and REST server for MiniMate Plus seismographs"
 requires-python = ">=3.10"
 dependencies = [
+331
@@ -0,0 +1,331 @@
"""
scripts/backfill_thor_events.py — re-process existing Thor (Series IV)
events so their sidecars carry the bw_report block produced by
``micromate.idf_to_bw_report.build_bw_report_from_idf``, and regenerate
their .h5 files (IDFW waveforms and IDFH histograms).
Why this exists
───────────────
Thor events ingested before v0.21.0 (or during the v0.21.0 ingest bug
window fixed in commit bee1185) have sidecars with only
``extensions.idf_report`` — no ``bw_report`` block. Without
``bw_report``, the SFM PDF renderer falls back to DB-only fields
(misses sensor-self-check, full per-channel breakdown, mic dB(L)),
and the modal chart 404s on ``/waveform.json`` for IDFW events
because no .h5 was written when the codec failed at ingest.
Re-forwarding from thor-watcher would also fix this, but that requires
operator coordination on every watcher machine and uses bandwidth this
script doesn't.
What this does
──────────────
Walks ``<store>/<serial>/<filename>`` for ``.IDFW`` / ``.IDFH`` files
and, for each one:
1. Reads the existing sidecar (preserving review state + captured_at).
2. Re-runs ``micromate.idf_file.read_idf_file()`` on the binary
bytes — passing ``data=`` so the codec doesn't try to read from
a path it doesn't know.
3. Pulls ``extensions.idf_report`` (the raw parsed Thor dict the
v0.18.0+ ingest path already stashed) and runs the v0.21.0
``build_bw_report_from_idf`` adapter against it.
4. Writes the refreshed sidecar with the new ``bw_report``,
bumped ``source.tool_version``, but preserved ``review`` block
+ the original ``captured_at`` timestamp.
5. Regenerates the .h5 waveform file via the existing
``event_hdf5`` writer. For IDFW that's the decoded per-sample
stream; for IDFH it's a 1-sample-per-interval synthesised array
(peak ADC count per channel) so the renderer's bar-chart code
has data to group on. Mic peak psi from the binary is merged
onto the IdfEvent before the bridge so the h5 writer's per-count
mic scale factor lands on a sensible value (without this the
mic chart on Thor events plots dB(L)-as-pseudo-psi and shows
bomb-level numbers).
Idempotent. Re-running it after a parser/adapter change just
re-writes sidecars and .h5 files; it makes no DB writes and needs
no thor-watcher coordination.
Usage
─────
python scripts/backfill_thor_events.py [--store-root PATH]
[--dry-run]
[--skip-hdf5]
[--force]
[-v]
By default, refreshes any Thor event whose sidecar is missing
``bw_report`` OR whose ``source.tool_version`` is older than the
current ``TOOL_VERSION``. ``--force`` refreshes every Thor event
regardless.
"""
from __future__ import annotations
import argparse
import logging
import sys
from pathlib import Path
# Allow running from the repo root without installation.
sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
from minimateplus import event_file_io
from sfm.waveform_store import WaveformStore
log = logging.getLogger("backfill_thor_events")
def _is_thor_event(path: Path) -> bool:
if not path.is_file():
return False
if path.name.endswith((".sfm.json", ".h5", "_ASCII.TXT")):
return False
return path.suffix.upper() in (".IDFW", ".IDFH")
def _vtuple(s: str) -> tuple:
try:
return tuple(int(p) for p in str(s).split(".")[:3])
except Exception:
return (0, 0, 0)
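A short sketch of why the staleness check compares integer tuples rather than raw strings. `vtuple` below is a copy of the helper above under a public name, used purely for illustration.

```python
def vtuple(s) -> tuple:
    """Illustrative copy of the version-tuple helper shown above."""
    try:
        return tuple(int(p) for p in str(s).split(".")[:3])
    except Exception:
        # Anything unparseable sorts as the oldest possible version.
        return (0, 0, 0)


print(vtuple("0.20.0") < vtuple("0.21.1"))  # tuple comparison, numeric per field
print(vtuple("0.9.0") < vtuple("0.10.0"))   # numeric: 9 < 10
print("0.9.0" < "0.10.0")                   # string comparison gets this wrong
print(vtuple(""))                           # missing tool_version
print(vtuple("0.21.1rc1"))                  # non-numeric suffix
```

One consequence worth knowing: a missing or suffixed version string falls back to `(0, 0, 0)`, so such sidecars are always treated as stale and refreshed.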
def main(argv=None) -> int:
p = argparse.ArgumentParser(description=__doc__)
p.add_argument(
"--db-path",
default=str(Path(__file__).resolve().parent.parent / "bridges" / "captures" / "seismo_relay.db"),
help="Used only to derive the default --store-root.",
)
p.add_argument("--store-root", default=None)
p.add_argument("--dry-run", action="store_true")
p.add_argument("--skip-hdf5", action="store_true",
help="Don't regenerate .h5 files (IDFW waveforms or IDFH histograms).")
p.add_argument("--force", action="store_true",
help="Refresh every Thor event, not just ones with stale or missing bw_report.")
p.add_argument("-v", "--verbose", action="store_true")
args = p.parse_args(argv)
logging.basicConfig(
level=logging.DEBUG if args.verbose else logging.INFO,
format="%(asctime)s %(levelname)-7s %(name)s %(message)s",
datefmt="%H:%M:%S",
)
db_path = Path(args.db_path).expanduser().resolve()
store_root = (
Path(args.store_root).expanduser().resolve()
if args.store_root else db_path.parent / "waveforms"
)
if not store_root.exists():
log.error("store root not found: %s", store_root)
return 1
store = WaveformStore(store_root)
log.info("store root: %s", store_root)
log.info("current TOOL_VERSION: %s", event_file_io.TOOL_VERSION)
refreshed = skipped = errors = h5_written = 0
# Lazy imports so any one of these failing produces a useful error
# message rather than crashing module-load.
from micromate.idf_file import read_idf_file
from micromate.idf_to_bw_report import build_bw_report_from_idf
for serial_dir in sorted(p for p in store_root.iterdir() if p.is_dir()):
serial = serial_dir.name
for path in sorted(serial_dir.iterdir()):
if not _is_thor_event(path):
continue
sidecar_path = store.sidecar_path_for(serial, path.name)
if not sidecar_path.exists():
log.debug("%s: no sidecar — skipping (this is a binary without ingest history)",
path.name)
skipped += 1
continue
try:
existing = event_file_io.read_sidecar(sidecar_path)
except Exception as exc:
log.warning("%s: failed to read sidecar — %s", path.name, exc)
errors += 1
continue
has_bw_report = bool(existing.get("bw_report"))
existing_version = (existing.get("source") or {}).get("tool_version", "")
up_to_date = (
has_bw_report
and _vtuple(existing_version) >= _vtuple(event_file_io.TOOL_VERSION)
)
if up_to_date and not args.force:
skipped += 1
continue
# Re-decode the binary. Catch + log; continue with .txt-only
# data if it fails (matches the live ingest path's behavior).
idf_samples = None
idf_intervals = None
binary_md = None
is_histogram = path.suffix.upper() == ".IDFH"
try:
binary_bytes = path.read_bytes()
res = read_idf_file(path, data=binary_bytes)
idf_samples = res.samples or None
idf_intervals = res.intervals
binary_md = res.binary_metadata
is_histogram = res.intervals is not None
except NotImplementedError:
# sig-B / Blastware-stray binary; no samples but adapter
# can still produce a bw_report from extensions.idf_report.
log.debug("%s: binary codec NotImplementedError (sig-B / BW-stray); proceeding from sidecar's idf_report only", path.name)
except Exception as exc:
log.warning("%s: binary decode failed — %s; proceeding from sidecar's idf_report only", path.name, exc)
# Run the adapter. Pull report_dict from
# extensions.idf_report (the v0.18.0+ ingest preserved it).
report_dict = (existing.get("extensions") or {}).get("idf_report") or {}
if not report_dict and binary_md is None:
log.debug("%s: no idf_report in sidecar AND no binary metadata — nothing to project", path.name)
skipped += 1
continue
try:
bw_report = build_bw_report_from_idf(
report_dict, binary_md=binary_md,
intervals=idf_intervals, is_histogram=is_histogram,
)
except Exception as exc:
log.warning("%s: adapter failed — %s", path.name, exc)
errors += 1
continue
# Build the new sidecar by overlaying refreshed fields onto
# the existing one — preserves review, captured_at, blastware
# block, source.kind, etc.
new_sidecar = dict(existing) # shallow copy
new_sidecar["bw_report"] = bw_report
src = dict(new_sidecar.get("source") or {})
src["tool_version"] = event_file_io.TOOL_VERSION
new_sidecar["source"] = src
# Preserve histogram intervals if the binary decoded them
# (improves over the original ingest if that one ran before
# the bee1185 codec fix).
if idf_intervals is not None:
ext = dict(new_sidecar.get("extensions") or {})
ext["idf_intervals"] = [
{
"offset": iv.offset,
"tran_peak": iv.peak_count("Tran"),
"tran_halfp": iv.tran_halfp,
"tran_freq": iv.freq_hz("Tran"),
"vert_peak": iv.peak_count("Vert"),
"vert_halfp": iv.vert_halfp,
"vert_freq": iv.freq_hz("Vert"),
"long_peak": iv.peak_count("Long"),
"long_halfp": iv.long_halfp,
"long_freq": iv.freq_hz("Long"),
"mic_peak": iv.peak_count("MicL"),
"mic_halfp": iv.micl_halfp,
"mic_freq": iv.freq_hz("MicL"),
}
for iv in idf_intervals
]
new_sidecar["extensions"] = ext
if args.dry_run:
will_write_h5 = (idf_samples or idf_intervals) and not args.skip_hdf5
log.info("[DRY] %s/%s — would refresh sidecar (bw_report=%s, h5=%s)",
serial, path.name,
"added" if not has_bw_report else "refreshed",
"would write" if will_write_h5 else "skipped")
else:
event_file_io.write_sidecar(sidecar_path, new_sidecar)
log.info("%s/%s — sidecar refreshed (bw_report=%s, intervals=%d)",
serial, path.name,
"added" if not has_bw_report else "refreshed",
len(idf_intervals) if idf_intervals else 0)
refreshed += 1
# Regenerate .h5 by replaying the same IdfEvent → Event bridge
# save_imported_idf uses. For IDFW we write the decoded per-
# sample arrays. For IDFH we synthesise a 1-sample-per-interval
# array (peak ADC count per channel per interval) so the
# renderer's bar-chart code has something to group on.
# Pre-condition: either real samples (IDFW) or decoded intervals
# (IDFH). Skip otherwise.
have_data = bool(idf_samples) or bool(idf_intervals)
if have_data and not args.skip_hdf5:
from sfm import event_hdf5
hdf5_path = store.hdf5_path_for(serial, path.name)
if args.dry_run:
log.debug("[DRY] would write %s", hdf5_path.name)
else:
try:
from micromate import IdfEvent
from minimateplus.event_file_io import file_sha256
idf_event = IdfEvent.from_report(report_dict, path.name)
# Merge the binary-derived mic peak psi (only the
# binary path knows the proper psi value; the .txt
# carries dB(L)). Without this, the h5 writer's
# per-count mic factor is computed against the
# dB(L) value-as-pseudo-psi and the mic chart
# scales wildly.
if (binary_md is not None and res is not None
and res.event.peaks.mic_pspl_psi is not None):
idf_event.peaks.mic_pspl_psi = res.event.peaks.mic_pspl_psi
sha256 = file_sha256(path)
waveform_key = bytes.fromhex(sha256)[:16]
ev = idf_event.to_minimateplus_event(waveform_key)
if is_histogram and idf_intervals:
# 1 sample per interval per channel — same
# synthesis save_imported_idf uses. The h5
# writer's count×geo_fs/32768 conversion turns
# each peak-ADC-count into the bar's physical
# value.
ev.raw_samples = {
"Tran": [iv.peak_count("Tran") for iv in idf_intervals],
"Vert": [iv.peak_count("Vert") for iv in idf_intervals],
"Long": [iv.peak_count("Long") for iv in idf_intervals],
"MicL": [iv.peak_count("MicL") for iv in idf_intervals],
}
ev.total_samples = ev.total_samples or len(idf_intervals)
elif idf_samples:
ev.raw_samples = idf_samples
n_samp = max(
(len(idf_samples.get(ch, []))
for ch in ("Tran", "Vert", "Long", "MicL")),
default=0,
)
ev.total_samples = ev.total_samples or n_samp
event_hdf5.write_event_hdf5(
hdf5_path, ev,
serial=serial,
geo_range="normal",
source_kind="idf-import",
tool_version=event_file_io.TOOL_VERSION,
)
h5_written += 1
log.debug("%s/%s — .h5 written (%s)",
serial, path.name,
f"{len(idf_intervals)} intervals" if is_histogram
else f"{sum(len(v) for v in (idf_samples or {}).values())} samples")
except Exception as exc:
log.warning("%s/%s — .h5 write failed: %s",
serial, path.name, exc)
log.info("Done. refreshed=%d skipped=%d errors=%d h5_written=%d",
refreshed, skipped, errors, h5_written)
return 0 if errors == 0 else 2
if __name__ == "__main__":
sys.exit(main())
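The overlay step in the script (copy the existing sidecar, replace `bw_report`, bump `source.tool_version`, leave everything else alone) can be sketched as a small pure function. `overlay_sidecar` and the sample field values are hypothetical; only the key names `bw_report`, `source`, `tool_version`, `review` and `captured_at` come from the script above.

```python
def overlay_sidecar(existing: dict, bw_report: dict, tool_version: str) -> dict:
    """Return a refreshed sidecar without mutating the one that was read."""
    new_sidecar = dict(existing)                     # shallow copy of the top level
    new_sidecar["bw_report"] = bw_report
    source = dict(new_sidecar.get("source") or {})   # copy before changing it
    source["tool_version"] = tool_version
    new_sidecar["source"] = source
    return new_sidecar


# Illustrative sidecar; the review payload is made up for the example.
existing = {
    "captured_at": "2025-08-04T19:05:00",
    "review": {"state": "approved"},
    "source": {"kind": "idf-import", "tool_version": "0.20.0"},
}
refreshed = overlay_sidecar(existing, {"available": True}, "0.21.1")
print(refreshed["source"])
print(refreshed["review"], refreshed["captured_at"])
```

Copying the nested `source` dict before editing it is what keeps the original object untouched; a second pass with the same inputs produces an identical result, which is the property the script relies on for idempotence.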
+91
@@ -0,0 +1,91 @@
"""Re-ingest a prod IDFW + IDFH via the patched save_imported_idf and
render both PDFs to confirm charts have data."""
from __future__ import annotations
import sys
import json
import datetime
import tempfile
from pathlib import Path
sys.path.insert(0, str(Path(__file__).resolve().parents[1]))
from sfm.waveform_store import WaveformStore
from sfm import report_pdf
import h5py
class FakeDb:
def __init__(self, event):
self.event = event
def get_event(self, _id):
return self.event
def to_ts_iso(ts):
if ts is None:
return None
try:
return datetime.datetime(ts.year, ts.month, ts.day, ts.hour, ts.minute, ts.second).isoformat()
except Exception:
return None
def render_case(idf_path: Path, serial: str, out_pdf: Path, h5_summary: bool = True):
with tempfile.TemporaryDirectory() as td:
store = WaveformStore(Path(td))
ev, rec = store.save_imported_idf(
idf_path.read_bytes(),
idf_path,
idf_report_text=None, # production worst case: no .txt
)
print(f"=== {idf_path.name} ===")
print(f" h5: {rec['hdf5_filename']}, sidecar: {rec['sidecar_filename']}")
h5p = Path(td) / serial / f"{idf_path.name}.h5"
if h5p.exists() and h5_summary:
with h5py.File(h5p) as h:
for ch in ("Tran", "Vert", "Long", "MicL"):
ds = h.get(f"samples/{ch}")
if ds is not None:
n = ds.shape[0]
mx = float(abs(ds[...]).max()) if n else 0
print(f" samples/{ch}: n={n} max_abs={mx:.5f}")
record_type = "Histogram" if idf_path.suffix.upper() == ".IDFH" else "Waveform"
fake_row = {
"serial": serial,
"blastware_filename": rec["filename"],
"record_type": record_type,
"timestamp": to_ts_iso(ev.timestamp),
"sample_rate": ev.sample_rate,
"project": ev.project_info.project if ev.project_info else None,
"client": ev.project_info.client if ev.project_info else None,
"operator": ev.project_info.operator if ev.project_info else None,
"sensor_location": ev.project_info.sensor_location if ev.project_info else None,
"created_at": None,
}
rd = report_pdf.gather_report_data(FakeDb(fake_row), store, event_id="test-1")
print(f" ReportData: channels={ {k: len(v) for k,v in rd.channels.items()} }")
if rd.is_histogram:
print(f" histogram n_intervals={rd.histogram_n_intervals} interval_size={rd.histogram_interval_size}")
pdf = report_pdf.render_event_report_pdf(rd)
out_pdf.write_bytes(pdf)
print(f" PDF: {out_pdf} ({len(pdf)} bytes)")
def main():
out_dir = Path("/tmp/thor_render_test"); out_dir.mkdir(exist_ok=True)
cases = [
# IDFW that decoded to preamble-only under the old codec
("/home/serversdown/seismo-relay-prod-snap/waveforms/UM6047/UM6047_20250804154137.IDFW", "UM6047"),
# IDFW that worked under the old codec (validates no regression)
("/home/serversdown/seismo-relay-prod-snap/waveforms/UM6047/UM6047_20250804104450.IDFW", "UM6047"),
# IDFH histogram
("/home/serversdown/seismo-relay-prod-snap/waveforms/UM6047/UM6047_20250804190047.IDFH", "UM6047"),
]
for path, serial in cases:
render_case(Path(path), serial, out_dir / f"{Path(path).name}.pdf")
if __name__ == "__main__":
main()
+63 -24
@@ -638,14 +638,7 @@ def _draw_channel_stats_waveform(ax, rd: ReportData) -> None:
         ("Sensor Check", "sensor_check", ""),
     ]
     _draw_stats_table(ax, rd, rows_spec)
-    if rd.peak_vector_sum_ips is not None:
-        line = f"Peak Vector Sum {rd.peak_vector_sum_ips:.3f} in/s"
-        if rd.peak_vector_sum_time_s is not None:
-            line += f" At {rd.peak_vector_sum_time_s:.3f} sec."
-        ax.text(0.0, -0.08, line, fontsize=9, weight="bold",
-                ha="left", va="top", transform=ax.transAxes)
-    ax.text(0.0, -0.18, "NA: Not Applicable", fontsize=7, color="#888",
-            ha="left", va="top", transform=ax.transAxes)
+    _draw_pvs_summary(ax, rd, n_data_rows=len(rows_spec))


 def _draw_channel_stats_histogram(ax, rd: ReportData) -> None:
@@ -663,20 +656,54 @@ def _draw_channel_stats_histogram(ax, rd: ReportData) -> None:
         ("Sensor Check", "sensor_check", ""),
     ]
     _draw_stats_table(ax, rd, rows_spec)
-    if rd.peak_vector_sum_ips is not None:
-        line = f"Peak Vector Sum {rd.peak_vector_sum_ips:.3f} in/s"
-        # Histograms: "0.091 in/s on May 27, 2026 At 06:06:14"
-        # The when_str is "HH:MM:SS Month DD, YYYY" — reformat for BW match.
-        if rd.peak_vector_sum_when_str:
-            parts = rd.peak_vector_sum_when_str.split(" ", 1)
-            if len(parts) == 2:
-                line += f" on {parts[1]} At {parts[0]}"
-            else:
-                line += f" on {rd.peak_vector_sum_when_str}"
-        ax.text(0.0, -0.08, line, fontsize=9, weight="bold",
-                ha="left", va="top", transform=ax.transAxes)
-    ax.text(0.0, -0.18, "NA: Not Applicable", fontsize=7, color="#888",
-            ha="left", va="top", transform=ax.transAxes)
+    _draw_pvs_summary(ax, rd, n_data_rows=len(rows_spec), histogram_when=True)
+
+
+def _draw_pvs_summary(
+    ax,
+    rd: ReportData,
+    *,
+    n_data_rows: int,
+    histogram_when: bool = False,
+) -> None:
+    """Render the Peak Vector Sum + 'NA: Not Applicable' caption below the
+    stats table.
+
+    Reads ``ax._stats_table_bottom`` (set by ``_draw_stats_table`` when
+    it pins the table via an explicit ``bbox``) so the PVS line lands
+    just below the table's known bottom edge instead of guessing at the
+    geometry.
+
+    Centered horizontally for visual balance (the previous left-aligned
+    x=0 landed under the label column, not the data, which looked off).
+    """
+    if rd.peak_vector_sum_ips is None:
+        return
+
+    line = f"Peak Vector Sum {rd.peak_vector_sum_ips:.3f} in/s"
+    if histogram_when and rd.peak_vector_sum_when_str:
+        # Histogram absolute date+time. when_str is "HH:MM:SS Month DD, YYYY";
+        # reformat to "<value> on <date> At <time>" to match BW.
+        parts = rd.peak_vector_sum_when_str.split(" ", 1)
+        if len(parts) == 2:
+            line += f" on {parts[1]} At {parts[0]}"
+        else:
+            line += f" on {rd.peak_vector_sum_when_str}"
+    elif not histogram_when and rd.peak_vector_sum_time_s is not None:
+        line += f" At {rd.peak_vector_sum_time_s:.3f} sec."
+
+    # _draw_stats_table stashes the bbox bottom on the axes so we don't
+    # have to guess geometry. Falls back to a conservative default if
+    # the bbox approach hasn't run.
+    table_bottom_y = getattr(ax, "_stats_table_bottom", -0.10)
+    pvs_y = table_bottom_y - 0.04 # small gap below the table border
+
+    # Centered for visual balance — looks intentional rather than offset.
+    # The original BW-replica had a "NA: Not Applicable" caption below
+    # this line; dropped because we use "—" for missing values and the
+    # legend was always squished against the PVS line.
+    ax.text(0.5, pvs_y, line, fontsize=9, weight="bold",
+            ha="center", va="top", transform=ax.transAxes)


 def _draw_stats_table(ax, rd: ReportData, rows_spec: list[tuple[str, str, str]]) -> None:
@@ -711,16 +738,28 @@ def _draw_stats_table(ax, rd: ReportData, rows_spec: list[tuple[str, str, str]])
             _cell(field_name, "Long"),
             unit,
         ])
+    # Pin the table's position+size via bbox so we know exactly where
+    # the bottom edge lands. Lets _draw_pvs_summary place the PVS line
+    # just below the table without guessing at row heights.
+    #
+    # bbox = [x, y, width, height] in axes coords. Header + data rows
+    # at row_h each; horizontal extent matches sum(colWidths).
+    n_rows = len(table_data) # header + data rows
+    row_h = 0.12 # axes-fraction per row (fits fontsize=8)
+    table_height = n_rows * row_h
+    table_bottom = 1.0 - table_height
     tbl = ax.table(
-        cellText=table_data, loc="upper left",
+        cellText=table_data,
         colWidths=[0.28, 0.14, 0.14, 0.14, 0.10],
         cellLoc="left", edges="open",
+        bbox=[0.0, table_bottom, 0.80, table_height],
     )
     tbl.auto_set_font_size(False)
     tbl.set_fontsize(8)
-    tbl.scale(1, 1.4)
     for j in range(5):
         tbl[(0, j)].set_text_props(weight="bold", color="#555")
+    # Stash the bottom Y so _draw_pvs_summary can position itself below.
+    ax._stats_table_bottom = table_bottom


 def _channel_axis_color(ch: str) -> str:
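The geometry this hunk pins down is plain arithmetic, so it can be checked without matplotlib. The sketch below uses the constants from the diff (`row_h = 0.12`, a 0.04 gap, table width 0.80); `stats_table_layout` is a hypothetical helper and the row counts are made up.

```python
ROW_H = 0.12    # axes-fraction per table row, as in the diff
PVS_GAP = 0.04  # gap between the table's bottom edge and the PVS line


def stats_table_layout(n_data_rows: int) -> dict:
    """Compute the table bbox and the y position of the PVS line (axes coords)."""
    n_rows = n_data_rows + 1              # header row plus data rows
    table_height = n_rows * ROW_H
    table_bottom = 1.0 - table_height     # table is pinned to the top of the axes
    return {
        "bbox": [0.0, table_bottom, 0.80, table_height],
        "pvs_y": table_bottom - PVS_GAP,
    }


for n in (5, 7, 9):
    layout = stats_table_layout(n)
    print(n, [round(v, 2) for v in layout["bbox"]], round(layout["pvs_y"], 2))
```

Because the PVS line is derived from the same `table_bottom` that goes into the bbox, adding or removing a stats row moves both together, which is the point of stashing the value on the axes.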
+1 -1
@@ -3287,7 +3287,7 @@ if (currentSection === 'db') {
 <dt id="sc-l-bwsize">File size</dt> <dd id="sc-f-bwsize"></dd>
 <dt id="sc-l-sha">File sha256</dt> <dd id="sc-f-sha"></dd>
 <dt>Source kind</dt> <dd id="sc-f-src"></dd>
-<dt title="When our server received and stored this event (sfm-db insert time, not the recording time)">Received by server at</dt>
+<dt title="When SFM received and stored this event — NOT the unit-local trigger time (see Timestamp at the top of the modal for that).">Time received</dt>
 <dd id="sc-f-cap"></dd>
 </dl>
 </div>
+189 -23
@@ -467,21 +467,21 @@ class WaveformStore:
         Ingest a Thor (Micromate Series IV) IDF event file (`.IDFW` or
         `.IDFH`) produced by Thor's TXT exporter.
-        Thor binaries are stored as opaque bytes; seismo-relay doesn't
-        yet decode the proprietary IDF binary format (codec slot lives
-        at ``micromate/idf_file.py``). Device-authoritative metadata
-        comes from the paired ``.IDFW.txt`` / ``.IDFH.txt`` sidecar
-        when supplied.
         Workflow:
-          1. Parse the paired TXT report (when supplied) via
-             ``micromate.parse_idf_report`` → dict.
-          2. Wrap parsed dict + filename into a typed ``micromate.IdfEvent``.
-          3. Copy bytes verbatim into ``<root>/<serial>/<filename>``.
-          4. Bridge IdfEvent → ``minimateplus.Event`` (for the existing
-             sidecar / DB insert machinery) via
-             ``IdfEvent.to_minimateplus_event(waveform_key)``.
-          5. Write the ``.sfm.json`` sidecar with
+          1. For sig-A `.IDFW` binaries, decode samples + binary metadata
+             via ``micromate.idf_file.read_idf_file()``. Failure or
+             non-IDFW path falls through to the .txt-only flow.
+          2. Parse the paired TXT report (when supplied) via
+             ``micromate.parse_idf_report`` → dict. TXT remains the
+             source of truth for fields the binary doesn't yet supply
+             (full peak set with ZC freq / Time of Peak, sensor self-check,
+             firmware string, project strings).
+          3. Wrap parsed dict + filename into a typed ``micromate.IdfEvent``.
+          4. Copy bytes verbatim into ``<root>/<serial>/<filename>``.
+          5. Bridge IdfEvent → ``minimateplus.Event`` and attach
+             ``raw_samples`` from the binary decoder (when available).
+          6. Write the `.h5` clean-waveform file when samples decoded.
+          7. Write the ``.sfm.json`` sidecar with
              ``source.kind = "idf-import"`` and the full raw IDF report
              under ``extensions.idf_report``.
@@ -490,7 +490,38 @@ class WaveformStore:
         """
         from micromate import IdfEvent, parse_idf_report
-        # Parse the .txt sidecar (best-effort; non-fatal on failure).
+        # 1. Binary decode (sig-A IDFW and IDFH). Non-fatal: any failure
+        # leaves samples / binary metadata unfilled and we proceed with
+        # the .txt path as before.
+        idf_samples: Optional[dict] = None
+        idf_intervals: Optional[list] = None
+        binary_md = None
+        binary_peaks = None
+        is_histogram = False
+        try:
+            from micromate.idf_file import read_idf_file
+            # Pass idf_bytes through `data=` — at this point in the flow
+            # the binary hasn't been written to disk yet, so the codec
+            # can't read from source_path. We still pass source_path so
+            # the codec has the filename for error messages + .IDFH
+            # suffix detection.
+            res = read_idf_file(source_path, data=idf_bytes)
+            idf_samples = res.samples or None
+            idf_intervals = res.intervals
+            is_histogram = res.intervals is not None
+            binary_md = res.binary_metadata
+            binary_peaks = res.event.peaks
+        except NotImplementedError:
+            # sig-B — codec doesn't handle this yet.
+            pass
+        except Exception as exc:
+            log.warning(
+                "save_imported_idf: binary codec failed for %s: %s; "
+                "falling back to .txt-only ingest",
+                source_path.name, exc,
+            )
+        # 2. Parse the .txt sidecar (best-effort; non-fatal on failure).
         report_dict: dict = {}
         if idf_report_text is not None:
             try:
@@ -501,17 +532,58 @@ class WaveformStore:
                     exc,
                 )
-        # Build the typed IdfEvent. Filename is authoritative for
+        # 3. Backfill report_dict with binary metadata for fields the
+        # .txt didn't supply. Binary takes precedence on tied fields
+        # where the binary is more reliable (timestamp, sample_rate),
+        # and fills in fields entirely missing from the .txt.
+        if binary_md is not None:
+            if binary_md.serial and not report_dict.get("serial_number"):
+                report_dict["serial_number"] = binary_md.serial
+            if binary_md.event_datetime and not report_dict.get("event_datetime"):
+                report_dict["event_datetime"] = binary_md.event_datetime
+            if binary_md.sample_rate and not report_dict.get("sample_rate"):
+                report_dict["sample_rate"] = binary_md.sample_rate
+            if binary_md.record_time_sec and not report_dict.get("record_time_sec"):
+                report_dict["record_time_sec"] = binary_md.record_time_sec
+            # Calibration date (binary) vs calibration text (.txt) cohabit
+            # under different keys; no overwrite needed.
+            if binary_md.event_datetime and not report_dict.get("event_type"):
+                report_dict["event_type"] = (
+                    "Full Histogram" if is_histogram else "Full Waveform"
+                )
+        # Binary-derived peaks fill in when the .txt didn't supply them.
+        # They're ~3% low vs the device-authoritative .txt values (residual
+        # codec drift), so .txt always wins when present.
+        if binary_peaks is not None:
+            if binary_peaks.transverse_ips and not report_dict.get("tran_ppv"):
+                report_dict["tran_ppv"] = binary_peaks.transverse_ips
+            if binary_peaks.vertical_ips and not report_dict.get("vert_ppv"):
+                report_dict["vert_ppv"] = binary_peaks.vertical_ips
+            if binary_peaks.longitudinal_ips and not report_dict.get("long_ppv"):
+                report_dict["long_ppv"] = binary_peaks.longitudinal_ips
+        # 4. Build the typed IdfEvent. Filename is authoritative for
         # (serial, timestamp, kind); the report's event_datetime takes
         # precedence over the filename timestamp inside from_report().
         idf_event = IdfEvent.from_report(report_dict, source_path.name)
+        # The binary mic peak (psi) isn't carried through from_report() —
+        # IdfReport.from_dict only sees the .txt's dB(L) value. Pull the
+        # binary-derived ``mic_pspl_psi`` onto the typed IdfEvent so the
+        # downstream bridge can populate ``PeakValues.micl`` (psi-shaped)
+        # and the h5 writer's per-count mic factor lands at a sensible
+        # value. Without this, the h5 mic chart auto-scales against the
+        # dB(L) value-as-pseudo-psi and renders ~flat.
+        if binary_peaks is not None and binary_peaks.mic_pspl_psi is not None:
+            idf_event.peaks.mic_pspl_psi = binary_peaks.mic_pspl_psi
         # Operator-supplied serial_hint wins over the binary's filename
         # prefix when both are present (e.g. callers passing a known-good
         # serial that overrides a misnamed export).
         serial = serial_hint or idf_event.serial or "UNKNOWN"
-        # Filesystem write.
+        # 5. Filesystem write of binary bytes.
         filename = source_path.name
         bw_path = self._serial_dir(serial) / filename
         bw_path.write_bytes(idf_bytes)
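As written, every guard in the backfill block has the shape `binary value is truthy and report_dict has no truthy value for the key`, so a value from the .txt is never overwritten and the binary only fills gaps. A minimal sketch of that rule, assuming a hypothetical helper `backfill_missing` and plain dicts in place of the typed `binary_md` / `binary_peaks` objects:

```python
def backfill_missing(report: dict, binary: dict, keys) -> dict:
    """Return a copy of `report` with missing keys filled from `binary`.

    A key counts as missing when its value is absent or falsy (None, 0, ""),
    which mirrors the `not report_dict.get(...)` guards in the diff.
    """
    merged = dict(report)
    for key in keys:
        if binary.get(key) and not merged.get(key):
            merged[key] = binary[key]
    return merged


# Hypothetical values, for illustration only.
txt_report = {"serial_number": "UM6047", "sample_rate": None}
from_binary = {"serial_number": "OTHER", "sample_rate": 1024, "record_time_sec": 3.0}

merged = backfill_missing(
    txt_report, from_binary, ("serial_number", "sample_rate", "record_time_sec")
)
print(merged)
```

Because the test is truthiness rather than `is None`, a legitimate zero in the .txt (for example a peak of 0.0) would also be treated as missing. That may be intended; the diff does not say.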
@@ -523,13 +595,59 @@ class WaveformStore:
         # surrogate — every distinct binary maps to a distinct row.
         waveform_key = bytes.fromhex(sha256)[:16]
-        # Bridge to minimateplus.Event for the existing sidecar / DB
+        # 6. Bridge to minimateplus.Event for the existing sidecar / DB
         # insert paths. See IdfEvent.to_minimateplus_event() for the
         # caveats of this bridge (mic units, missing fields → sidecar).
         ev = idf_event.to_minimateplus_event(waveform_key)
-        # Write the sidecar. Source kind "idf-import" was added to the
-        # allow-list in event_file_io.event_to_sidecar_dict for this.
+        # Attach the decoded sample arrays. Thor's decoder counts use
+        # LSB = 0.0003 in/s for geo (vs BW's 16-count units at 0.005 in/s)
+        # — the .h5 writer's geo_range="normal" yields LSB = 10/32768
+        # ≈ 0.000305 in/s, so plotted samples come out ~1.7% high.
+        # Acceptable known offset; refine with a Thor-aware h5 path later.
+        if idf_samples is not None:
+            ev.raw_samples = idf_samples
+            n_samples = max((len(idf_samples.get(ch, [])) for ch in ("Tran", "Vert", "Long", "MicL")), default=0)
+            ev.total_samples = ev.total_samples or n_samples
+        # For IDFH histograms there are no per-sample waveform arrays — the
+        # device stores one peak ADC count per interval per channel. Synthesise
+        # a 1-sample-per-interval array so the existing h5+renderer pipeline
+        # (which groups samples down to ``n_intervals`` bars via max-per-group)
+        # produces a non-blank histogram chart. Each "sample" is the peak ADC
+        # count for that interval, so the h5 writer's ``count × geo_fs/32768``
+        # conversion yields the right physical value for the bar height.
+        if is_histogram and idf_intervals:
+            hist_samples = {
+                "Tran": [iv.peak_count("Tran") for iv in idf_intervals],
+                "Vert": [iv.peak_count("Vert") for iv in idf_intervals],
+                "Long": [iv.peak_count("Long") for iv in idf_intervals],
+                "MicL": [iv.peak_count("MicL") for iv in idf_intervals],
+            }
+            ev.raw_samples = hist_samples
+            ev.total_samples = ev.total_samples or len(idf_intervals)
+        # 7. Write the .h5 clean-waveform file when we have samples to write
+        # (either the IDFW per-sample stream, or the IDFH synthesised per-
+        # interval peak array). The renderer treats both shapes the same way.
+        hdf5_filename: Optional[str] = None
+        if ev.raw_samples:
+            hdf5_path = self.hdf5_path_for(serial, filename)
+            try:
+                event_hdf5.write_event_hdf5(
+                    hdf5_path, ev,
+                    serial=serial,
+                    geo_range="normal",  # Thor's geo full scale is also 10 in/s (Normal)
+                    source_kind="idf-import",
+                )
+                hdf5_filename = hdf5_path.name
+            except Exception as exc:
+                log.warning(
+                    "save_imported_idf: HDF5 write failed for %s: %s — continuing without .h5",
+                    hdf5_path, exc,
+                )
+        # 8. Write the sidecar. Source kind "idf-import" is on the allow-list.
         sidecar_path = self.sidecar_path_for(serial, filename)
         existing_review = None
         if sidecar_path.exists():
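The "~1.7% high" figure quoted in the comment follows directly from the two per-count values it names: 0.0003 in/s for Thor's decoder and 10/32768 in/s for the `.h5` writer at `geo_range="normal"`. A minimal sketch of the arithmetic, applied to a hypothetical per-interval peak array like the one synthesised for IDFH files (the function names and sample counts are illustrative, not from the repository):

```python
GEO_FULL_SCALE_IPS = 10.0                   # "normal" range full scale, per the diff comment
H5_LSB_IPS = GEO_FULL_SCALE_IPS / 32768     # per-count factor the .h5 writer applies
THOR_LSB_IPS = 0.0003                       # Thor's per-count value quoted in the diff


def counts_to_ips(counts, lsb=H5_LSB_IPS):
    """Convert ADC counts to in/s using a per-count factor."""
    return [c * lsb for c in counts]


def percent_high(applied_lsb=H5_LSB_IPS, true_lsb=THOR_LSB_IPS):
    """How far (in percent) values scaled with applied_lsb overshoot true_lsb."""
    return (applied_lsb / true_lsb - 1.0) * 100.0


# Hypothetical peak counts, one per histogram interval.
interval_peaks = [120, 480, 96]
print(counts_to_ips(interval_peaks))
print(round(percent_high(), 2))
```

The offset is a constant scale factor, so it shifts every bar or sample by the same proportion and leaves the shape of the chart unchanged.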
@@ -554,19 +672,67 @@ class WaveformStore:
         # Time of Peak, sensor self-check, calibration, firmware).
         if report_dict:
             sidecar["extensions"]["idf_report"] = report_dict
+        # Project the IDF report into the BW report sidecar shape so the
+        # existing Event Report PDF pipeline (sfm/report_pdf.py) can
+        # render Thor events without needing a separate code path. Thor
+        # data is 95% the same metric set as BW — the adapter handles
+        # the field-name mapping.
+        if report_dict or binary_md is not None:
+            try:
+                from micromate.idf_to_bw_report import build_bw_report_from_idf
+                sidecar["bw_report"] = build_bw_report_from_idf(
+                    report_dict or {},
+                    binary_md=binary_md,
+                    intervals=idf_intervals,
+                    is_histogram=is_histogram,
+                )
+            except Exception as exc:
+                log.warning(
+                    "save_imported_idf: idf→bw_report adapter failed for %s: %s; "
+                    "report PDF will fall back to DB-only fields",
+                    filename, exc,
+                )
+        # For histograms, also stash the binary-decoded per-interval
+        # records so the UI / report layer doesn't need to re-walk the
+        # IDFH file at render time.
+        if idf_intervals is not None:
+            sidecar["extensions"]["idf_intervals"] = [
+                {
+                    "offset": iv.offset,
+                    "tran_peak": iv.peak_count("Tran"),
+                    "tran_halfp": iv.tran_halfp,
+                    "tran_freq": iv.freq_hz("Tran"),
+                    "vert_peak": iv.peak_count("Vert"),
+                    "vert_halfp": iv.vert_halfp,
+                    "vert_freq": iv.freq_hz("Vert"),
+                    "long_peak": iv.peak_count("Long"),
+                    "long_halfp": iv.long_halfp,
+                    "long_freq": iv.freq_hz("Long"),
+                    "mic_peak": iv.peak_count("MicL"),
+                    "mic_halfp": iv.micl_halfp,
+                    "mic_freq": iv.freq_hz("MicL"),
+                }
+                for iv in idf_intervals
+            ]
         event_file_io.write_sidecar(sidecar_path, sidecar)
         log.info(
             "WaveformStore.save_imported_idf serial=%s filename=%s filesize=%d "
-            "report_attached=%s",
-            serial, filename, filesize, bool(report_dict),
+            "kind=%s report_attached=%s binary_decoded=%s h5=%s intervals=%d",
+            serial, filename, filesize,
+            "histogram" if is_histogram else "waveform",
+            bool(report_dict),
+            (idf_samples is not None) or (idf_intervals is not None),
+            hdf5_filename or "(skipped)",
+            len(idf_intervals) if idf_intervals else 0,
         )
         return ev, {
             "filename": filename,
             "filesize": filesize,
             "sha256": sha256,
             "a5_pickle_filename": None,
-            "hdf5_filename": None,
+            "hdf5_filename": hdf5_filename,
             "sidecar_filename": sidecar_path.name,
             "serial": serial,
         }
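The returned `sha256` and the `waveform_key` derived from it earlier in this method (`bytes.fromhex(sha256)[:16]`) are related by a simple truncation. A minimal sketch, assuming `sha256` is the hex digest of the file bytes (the diff shows only the slice, not how the digest is computed) and using a hypothetical helper name:

```python
import hashlib


def waveform_key_for(data: bytes) -> tuple[str, bytes]:
    """Return (sha256 hex digest, 16-byte key) for a binary's contents."""
    sha256_hex = hashlib.sha256(data).hexdigest()
    # Same slice as the diff: the first 16 bytes (128 bits) of the digest.
    return sha256_hex, bytes.fromhex(sha256_hex)[:16]


# Hypothetical file contents, for illustration only.
digest, key = waveform_key_for(b"example IDF bytes")
print(digest)
print(key.hex(), len(key))
```

Truncating to 128 bits keeps the key at a fixed 16-byte width while leaving accidental collisions between distinct files extremely unlikely.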