Merge pull request 'update to v0.21.1, thor data import successful' (#29) from dev into main

Reviewed-on: #29
fix(backfill): regenerate IDFH .h5 + merge binary mic_pspl_psi onto bridge
2026-06-01 16:54:23 -04:00 · 2026-06-01 20:02:54 +00:00 · 2026-06-01 19:33:44 +00:00 · 2026-06-01 18:27:24 +00:00 · 2026-05-31 20:51:09 +00:00 · 2026-05-30 04:37:43 +00:00
122 changed files with 42528 additions and 807 deletions
@@ -0,0 +1,28 @@
 .git
 .gitignore
 .venv
 venv
 env
 __pycache__
 *.pyc
 *.pyo
 *.pyd
 .pytest_cache
 .mypy_cache
 .ruff_cache
 *.db
 *.db-wal
 *.db-shm
 *.sqlite
 *.sqlite3
 sfm/data
 bridges/captures
 example-events
 captures
 logs
 .DS_Store
 Thumbs.db
@@ -1,6 +1,6 @@
 /bridges/captures/
 /example-events/
-
+/tests/fixtures/
 /manuals/
 # Python build artifacts
@@ -4,6 +4,336 @@ All notable changes to seismo-relay are documented here.
 ---
 ## [Unreleased]
 ---
 ## v0.21.1 — 2026-06-01
 Bug fixes for v0.21.0 issues that surfaced after the first prod redeploy.
 Three production-visible symptoms (blank waveform charts on most Thor
 events, blank histogram charts on all Thor events, and a mic chart that
 auto-scaled against a dB(L) value treated as psi) were all root-caused
 and fixed.
 ### Fixed
 - **Dynamic IDFW body offset.**  The v0.21.0 codec hardcoded the body
  at file offset `0x0f1f` based on the example corpus, but only ~52%
  of production IDFW events use that offset; the rest sit at offsets
  from `0x1033` up to `0x3082` depending on header padding.  At
  `0x0f1f` the codec would find a coincidentally-matching `00 02 00`
  magic, read the 2-byte Tran preamble, and return empty V/L/M
  arrays — producing near-empty .h5 files and blank charts.
  `micromate.idf_file._find_waveform_body_offset()` now scans every
  `00 02 00` magic position past `0x0E00`, trial-decodes each one,
  and picks the offset with the most samples.  Validated across 483
  prod IDFW files: 0 preamble-only events (was ~50%), 355/483 fully
  decode, 126/483 partially decode (the BW codec walker stops early
  on loud events, a pre-existing limitation; the samples it does
  reach are correct).
 - **IDFH histograms now render bar charts.**  Histograms previously
  skipped the .h5 write because there are no per-sample arrays, but
  the renderer drives the per-interval bar chart from .h5 channel
  data + `bw_report.histogram.n_intervals`.  `save_imported_idf` now
  synthesizes a 1-sample-per-interval array from the decoded
  `IdfhInterval` peak counts and writes an .h5 so the existing
  renderer works unchanged — each "sample" is the per-interval peak
  ADC count, so the writer's `count × geo_fs/32768` conversion
  yields the right bar height.
 - **Mic chart scaling on Thor events.**  `PeakValues.micl` (consumed
  by the h5 writer's per-count mic scale factor) expects psi, but
  the Thor bridge was stuffing the dB(L) value (~99.4) into it,
  producing a per-count factor 5+ orders of magnitude too large and
  a flat-looking mic chart.  Fixed by adding `IdfPeaks.mic_pspl_psi`
  alongside `mic_pspl_dbl`; `read_idf_file()` computes it from
  binary mic counts (`max(|MicL|) × 2.14e-6 psi/count`) for both
  IDFW and IDFH paths; `save_imported_idf` merges it onto the typed
  event after `IdfEvent.from_report`; the bridge feeds psi to
  `PeakValues.micl` with a dB(L)→psi formula fallback when only the
  dB(L) value is available.  dB(L) for the report header still
  flows through `bw_report.mic.pspl_dbl` unchanged.
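
A minimal sketch of the scan-and-trial-decode approach described for `_find_waveform_body_offset()`. It assumes a caller-supplied `trial_decode(data, offset)` that returns the decoded samples for a candidate offset, or raises. The function name, the callback, and the choice to treat the magic position itself as the candidate offset are all illustrative assumptions. This is not the shipped `micromate.idf_file` code.

```python
from typing import Callable, Optional, Sequence

BODY_MAGIC = b"\x00\x02\x00"  # body magic named in the changelog
SCAN_FLOOR = 0x0E00           # only consider magic positions from this offset on


def find_waveform_body_offset(
    data: bytes,
    trial_decode: Callable[[bytes, int], Sequence[int]],
) -> Optional[int]:
    """Return the candidate offset whose trial decode yields the most samples.

    trial_decode is a hypothetical callback standing in for the real
    waveform decoder; it may raise on garbage input.
    """
    best_offset: Optional[int] = None
    best_count = 0
    pos = data.find(BODY_MAGIC, SCAN_FLOOR)
    while pos != -1:
        try:
            count = len(trial_decode(data, pos))
        except Exception:
            count = 0  # a failed decode simply disqualifies this candidate
        if count > best_count:
            best_offset, best_count = pos, count
        pos = data.find(BODY_MAGIC, pos + 1)
    return best_offset  # None when no candidate decodes any samples
```

The function picks the candidate with the most samples instead of the first match. That is how it avoids a coincidental `00 02 00` at `0x0f1f` that decodes only to the 2-byte preamble.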
 ### Operator
 After deploy, run `python scripts/backfill_thor_events.py` to refresh
 every existing Thor event's sidecar + .h5 with the corrected codec
 output.  The script auto-skips events already at the current
 `TOOL_VERSION`, so the bump from `0.21.0` → `0.21.1` is what triggers
 the refresh.
 ---
 ## v0.21.0 — 2026-05-29
 The "Thor / Series IV codec" release.  Two big pieces landed: (1) the IDF binary codec actually decodes now, both IDFW and IDFH, and (2) a Thor→BW adapter lets Thor events flow through the existing Series III Event Report PDF pipeline.  Combined effect: a Thor event ingested via `/db/import/idf_file` now lands in the DB with the same fidelity as a Blastware event, gets a per-event PDF on demand, and renders in Terra-View's modal chart with the same plotting code as a BW event.
 ### Added — Thor IDF binary codec (`micromate/idf_file.read_idf_file`)
 - **IDFW (waveform)** — body sits at fixed file offset `0x0f1f`; reuses the verified `decode_waveform_v2()` walker from `minimateplus.waveform_codec`.  Sample fidelity is **87–99% byte-exact** against the ASCII-sidecar reference values on quiet events; loud events hit the same walker-stops-early limitation as the BW codec on `SP0/SS0/SV0`-style events.
 - **IDFH (histogram)** — dedicated segment-based decoder for the Thor histogram body format: `[len_be][0a 00 00 00][00 NN][05 3f]` framing plus N × 72-byte interval records (4 × 16-byte per-channel min/max/halfp).  **All 859 Thor IDFH corpus files decode**, totalling **181,071 intervals**; per-channel peaks match the sidecar within **~1.8% (ADC quantization)**.
 - **BW-aliased binary detection** — a small number of corpus files (e.g. `BE9439_*.IDFW/IDFH`) are actually Series III Blastware binaries that share the IDF filename convention by accident.  `read_idf_file()` detects them via their BW `STRT` signature and raises `NotImplementedError` pointing the caller at `read_blastware_file()` instead of trying to decode them as IDF.
 - Full field layouts in `docs/idf_protocol_reference.md`; supporting analysis scripts in `analysis_idf/` (decode validators, per-file detail dumps, corpus accuracy reports).
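
The BW-aliased check can be sketched as a signature test that runs before any IDF decoding. The 64-byte search window and the helper name are hypothetical. The changelog only says the detection keys on the Blastware `STRT` signature and raises `NotImplementedError` pointing at `read_blastware_file()`.

```python
BW_SIGNATURE = b"STRT"
SIGNATURE_WINDOW = 64  # hypothetical: how far into the file to look for the signature


def reject_bw_aliased(data: bytes, filename: str) -> None:
    """Raise if an .IDFW/.IDFH-named file is really a Series III Blastware binary."""
    if BW_SIGNATURE in data[:SIGNATURE_WINDOW]:
        raise NotImplementedError(
            f"{filename} carries a Blastware STRT signature; "
            "use read_blastware_file() instead of read_idf_file()"
        )
```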
 ### Added — Thor → BW report adapter (`micromate/idf_to_bw_report.py`)
 - **`build_bw_report_from_idf(report_dict, binary_md=, intervals=, is_histogram=)`** projects a parsed Thor `IdfReport` plus binary-extracted metadata plus decoded IDFH intervals into the `bw_report`-shaped dict that `sfm.report_pdf.gather_report_data` consumes.  No need to duplicate the renderer — Thor data is ~95% the same metric set as BW; the adapter handles the field-name mapping (`MicPSPL` → `pspl_dbl`, `>100` sentinel → `zc_freq_above_range`, free-form `Calibration : Nov 22, 2023 by Instantel` → `calibration_date` + `calibration_by`, etc.).
 - For IDFH events the adapter derives `histogram.interval_times` by stepping `IntervalSize` from `HistogramStartTime`, matching what the BW pipeline expects from a histogram-mode event.
 - **Wired into `WaveformStore.save_imported_idf`** — every Thor event ingested via `/db/import/idf_file` now gets a `bw_report` block in its sidecar in addition to the existing `extensions.idf_report` (the raw parsed Thor payload).  Falls back gracefully (PDF renders from DB-only fields) if the adapter raises — logged as a warning rather than failing the ingest.
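
The following sketch covers one of the adapter's field mappings: splitting the free-form calibration line into `calibration_date` + `calibration_by`. The regex and the `%b %d, %Y` date format are assumptions fitted to the single example string quoted above. They are not the adapter's actual parsing rules.

```python
import re
from datetime import date, datetime
from typing import Optional, Tuple

# Matches e.g. "Calibration : Nov 22, 2023 by Instantel" (format assumed from one example).
_CALIBRATION_RE = re.compile(
    r"^\s*Calibration\s*:\s*(?P<when>.+?)\s+by\s+(?P<by>.+?)\s*$"
)


def split_calibration(line: str) -> Tuple[Optional[date], Optional[str]]:
    """Return (calibration_date, calibration_by); None for any part that doesn't parse."""
    match = _CALIBRATION_RE.match(line)
    if match is None:
        return None, None
    try:
        when = datetime.strptime(match.group("when"), "%b %d, %Y").date()
    except ValueError:
        when = None  # keep the calibrator name even if the date format is unexpected
    return when, match.group("by")
```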
 ### Companion releases
 - **Terra-View v0.13.0** ships in parallel — closes Phase 1 of the SFM integration.  The shared event-detail modal now renders the SFM event story (Chart.js waveform/histogram chart, inline PDF preview, `.TXT` download, FT/reviewer/notes review form) without operators needing to bounce to the standalone SFM webapp on port 8200.  Uses only existing seismo-relay endpoints — no API changes here, just better consumption.
 ### Migration / Operations
 No DB migration needed.  Existing Thor events already in the store don't automatically pick up the new `bw_report` block; the adapter only runs on a re-ingest (post the IDF binary + paired `.TXT` back to `/db/import/idf_file`).  `scripts/backfill_sidecars.py --reparse-txt` is not yet an alternative: it currently only re-runs the BW ASCII parser, and extending it to handle Thor would be a small follow-up.
 ```bash
 cd /home/serversdown/terra-view
 docker compose build sfm && docker compose up -d sfm
 ```
 The bumped `TOOL_VERSION = "0.21.0"` in `minimateplus/event_file_io.py` means any subsequent `backfill_sidecars.py --force` pass will re-write sidecars with the new version stamp; that's expected and harmless.
 ---
 ## v0.20.0 — 2026-05-28
 The "PDF + parser polish" release.  Closes out the Event-Report PDF iteration started in v0.17.x: histogram layouts now render correctly against BW reference PDFs, the ASCII parser handles the real-world edge cases production events were tripping over (OORANGE, `>100 Hz`, histogram timestamps), and the `.TXT` preservation rollout lets parser fixes be applied retroactively to ingested events.  Adds server-wide timezone support so operator-visible timestamps no longer drift into UTC.  Rolls up the substantial "pre-v0.20" body of work that had accumulated under `[Unreleased]` (PDF generation, histogram codec fix, histogram parser fields, `.TXT` preservation, backfill safety) — see the trailing "pre-v0.20.0 work" section below for the full list.
 ### Added (2026-05-28)
 - **Server-wide display timezone via `TZ` env var.**  Both seismo-relay and terra-view now respect a `TZ` environment variable (default `America/New_York` on prod).  Affects server log timestamps, the PDF report renderer's UTC→local conversions on the "Created" footer line, matplotlib's datetime axes, and any other naïve-vs-aware datetime rendering.  DB columns (`created_at`, etc.) stay UTC regardless — this is a display-side fix, not a storage-side one.  Dockerfile now installs `tzdata` (required for the env var to take effect under `python:slim`).  Override per-deployment via the `TZ` line in `docker-compose.yml`.
 - **ZC Freq "above-range" handling — render `>100 Hz` instead of `—`.**  BW writes `">100 Hz"` literally when the zero-crossing algorithm sees a peak too fast to count (device cuts off at 100 Hz on V10.72).  Previously `_parse_number(">100")` returned None and the PDF stats table rendered `—`.  Now the parser mirrors the OORANGE pattern: stores 100.0 on `zc_freq_hz` and sets a new `zc_freq_above_range` flag.  Flag rides through the sidecar's `bw_report` block.  Renders as `>100` in the PDF (per-channel + mic block), as `· >100 Hz` inline on the event modal's Peaks section, and as a dedicated column on the event-browser stats table.  Verified against the real T190LD5Q.LK0W fixture from 2026-05-27 plus a synthetic test case.
 - **Per-channel ZC Freq surfaced in event modals.**  Neither the main webapp modal (`sfm_webapp.html`) nor the standalone event browser (`event_browser.html`) previously exposed ZC Freq.  Now both do — webapp shows it inline alongside PPV (`0.04500 in/s · 47 Hz`); event-browser gets a dedicated column on its per-channel stats table.  Required wiring a parallel sidecar fetch into the event-browser's `loadEvent()` (it was only fetching `waveform.json`).  Falls back to `—` for events without a preserved `.TXT` (pre-2026-05-27 ingests).
 - **`scripts/backfill_sidecars.py --reparse-txt` flag.**  Before this, the backfill script preserved the `bw_report` block from existing sidecars verbatim — so parser-side fixes (like the `>100 Hz` addition above) couldn't reach old events.  The new flag re-runs the current parser against the preserved `<serial>/<filename>_ASCII.TXT`, overwrites the bw_report block, and cascade-regenerates the sidecar.  Implies sidecar regeneration on every event (bypasses the sha/version skip).  No-op for events without a preserved .TXT (legacy ingests pre-2026-05-27 .TXT-preservation rollout).  Idempotent.  Run with `--skip-hdf5` to skip waveform regen — recommended when only the bw_report needs refreshing.  Validated end-to-end on prod: 9,999 events refreshed cleanly, ZC Freq + OORANGE flags now populated where the original .TXT had them.
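
The `>100 Hz` handling follows the same value-plus-flag pattern as OORANGE. Here is a minimal sketch, assuming the raw field arrives as a string such as `47 Hz` or `>100 Hz`. The function name and return shape are hypothetical stand-ins for the parser's `zc_freq_hz` / `zc_freq_above_range` pair.

```python
from typing import Optional, Tuple


def parse_zc_freq(raw: str) -> Tuple[Optional[float], bool]:
    """Return (zc_freq_hz, zc_freq_above_range) for a BW ZC Freq field."""
    text = raw.strip()
    if text.endswith("Hz"):
        text = text[:-2].strip()
    above_range = text.startswith(">")
    if above_range:
        text = text[1:].strip()  # ">100" stores the ceiling, 100.0, plus the flag
    try:
        return float(text), above_range
    except ValueError:
        return None, False  # unparseable: no value and no flag
```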
 ### Fixed (2026-05-28)
 - **Histogram PDFs no longer 500 on the missing `histogram_interval_size_s` attribute.**  The histogram-interval-times derivation block in `gather_report_data` referenced `rd.histogram_interval_size_s`, but the field was never declared on the `ReportData` dataclass nor read from the sidecar projection (the interval-size handling was inlined into `gather_report_data`, and the numeric seconds field never made it onto the dataclass).  Every histogram PDF render raised `AttributeError → 500`.  Waveform PDFs were unaffected.  Fix: add the field and read it from the projection's existing `bw_report.histogram.interval_size_s` key.
 - **Histogram PDF geo channels now share a single nice-quantized y-axis.**  Previously each geo subplot auto-scaled independently — Tran, Vert, and Long all showed different per-channel maxes, so bar heights weren't directly comparable across channels.  The footer "Amplitude Geo: X in/s/div" label was also computed as `max(first_geo_channel) / 5` with no LSB quantization, producing nonsense values like `0.003 in/s/div` when the geophone LSB is 0.005.  Fix: compute a single shared geo y-axis range from `max(Tran, Vert, Long)`, quantize the per-division step to BW's 1-2-5 sequence rounded to the 0.005 in/s LSB (0.005, 0.01, 0.025, 0.05, 0.1, 0.25, ...), apply the same `ylim` + ticks to all three subplots, and use that step for the footer label.  MicL stays on its own auto-scale (different units).  Matches BW's chart styling.
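
A sketch of the shared geo y-axis computation follows. It assumes five divisions per axis, inferred from the old `max / 5` footer formula. It generates the step ladder from the values the entry lists (0.005, 0.01, 0.025, 0.05, 0.1, 0.25), which use 2.5 rather than 2 as the middle mantissa. The helper names are hypothetical.

```python
from typing import Iterator, Tuple

GEO_LSB_IPS = 0.005  # geophone LSB in in/s, from the changelog
N_DIVISIONS = 5      # assumption: 5 divisions per axis


def step_ladder(max_exponent: int = 3) -> Iterator[float]:
    """Yield 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, ... in ascending order."""
    for exponent in range(-3, max_exponent + 1):
        for mantissa in (1.0, 2.5, 5.0):
            step = round(mantissa * 10.0 ** exponent, 6)
            if step >= GEO_LSB_IPS:  # never go finer than one LSB
                yield step


def shared_geo_axis(tran: float, vert: float, long_: float) -> Tuple[float, float]:
    """Return (step per division, y-axis top) shared by all three geo subplots."""
    peak = max(abs(tran), abs(vert), abs(long_))
    for step in step_ladder():
        if step * N_DIVISIONS >= peak:
            return step, step * N_DIVISIONS
    raise ValueError(f"peak {peak} in/s exceeds the step ladder")
```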
 ### Docs (2026-05-28)
 - **Roadmap entry for a second undecoded histogram body sub-format.**  BE17353 (S353) events observed on 2026-05-28 use a histogram body where `byte[5] = 0x00` (looks like a valid block header by every prior signal) but the walker finds zero data blocks.  Different from the existing `byte[5] != 0` roadmap entry (T190 / O121).  Operationally identical impact — ingestion succeeds, DB peaks come from the bw_report overlay, only the chart is empty.  Sample events captured in the roadmap entry for future RE work.
 ### Migration / Operations
 - **Re-parse existing events to pick up the new parser fields.**  Run on whichever box hosts the live waveform store:
  ```bash
  docker exec terra-view-sfm-1 python /app/scripts/backfill_sidecars.py \
      --reparse-txt --skip-hdf5 --dry-run -v | tail
  # Looks reasonable?  Run for real:
  docker exec terra-view-sfm-1 python /app/scripts/backfill_sidecars.py \
      --reparse-txt --skip-hdf5 -v | tee /tmp/reparse.log | tail -30
  ```
  Idempotent; safe to re-run.  Only touches sidecars on disk — no DB writes.
 - **terra-view docker-compose.yml**: add `TZ=America/New_York` (or your deployment's zone) to both the `terra-view` and `sfm` service `environment:` blocks.  Without this, server-rendered timestamps stay in UTC even on the rebuilt SFM image.
 ### Pre-v0.20.0 work (rolled into this release)
 The bullets below accumulated under `[Unreleased]` between v0.19.0 and v0.20.0; kept here so the historical narrative isn't lost.
 #### Fixed
 - **bw_ascii_report parser now handles `OORANGE` saturation marker.**  BW writes `"OORANGE"` (truncation of "Out Of Range") in PPV / PVS / MicL PSPL fields when the underlying measurement exceeded the channel's full-scale.  Previously our `_parse_number()` returned None → DB ended up with NULL peaks for legitimate high-amplitude events.  Confirmed on real ASCII files pulled 2026-05-27 from the Windows watcher PC: T190LD5Q.LK0W (Vert saturated at Normal range 10 in/s), T438L713.RY0W (all three channels saturated at Sensitive range 1.25 in/s), K557L3YM.OE0W (Tran+Vert saturated + Mic PSPL OORANGE).  New behavior:
   - Per-channel PPV: substitute `geo_range_ips` as a conservative lower bound + set `ppv_saturated` flag
   - Peak Vector Sum: substitute `sqrt(3) * geo_range_ips` (the theoretical max when all 3 channels are simultaneously at full-scale) + `peak_vector_sum_saturated` flag
   - MicL PSPL: substitute 140 dB(L) (conservative NL-43 max) + `pspl_saturated` flag
   - Saturation flags are propagated into the sidecar's `bw_report` block for downstream UI rendering (`> 10 in/s` or similar)
   - Five events on prod (T190 / T438 / K557 + 2 others matching the same fault pattern) will pick up correct DB peaks + saturation flags once re-forwarded
 - **bw_ascii_report parser handles `Peak Vector Sum TimeSum` typo'd label.**  Real BW output uses this misspelled label (a stray `Sum` appended after `Time`) instead of `Peak Vector Sum Time`.  Now accepted as an alias.  Confirmed against all three OORANGE example files; every one has the typo.
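
The OORANGE substitution rules reduce to one value-plus-flag helper with a per-field ceiling. In this minimal sketch, the function names are illustrative. So is the assumption that numeric fields carry a trailing unit (e.g. `0.04500 in/s`). This is not the parser's actual `_parse_number()` code.

```python
import math
from typing import Optional, Tuple

MIC_PSPL_CEILING_DBL = 140.0  # conservative NL-43 max, per the changelog


def _value_or_ceiling(raw: str, ceiling: float) -> Tuple[Optional[float], bool]:
    """Return (value, saturated); OORANGE maps to the field's ceiling."""
    text = raw.strip()
    if text.upper().startswith("OORANGE"):
        return ceiling, True
    try:
        return float(text.split()[0]), False  # drop a trailing unit like "in/s"
    except (ValueError, IndexError):
        return None, False


def parse_ppv(raw: str, geo_range_ips: float) -> Tuple[Optional[float], bool]:
    return _value_or_ceiling(raw, geo_range_ips)


def parse_peak_vector_sum(raw: str, geo_range_ips: float) -> Tuple[Optional[float], bool]:
    # All three channels simultaneously at full scale: sqrt(3) * range.
    return _value_or_ceiling(raw, math.sqrt(3) * geo_range_ips)


def parse_mic_pspl_dbl(raw: str) -> Tuple[Optional[float], bool]:
    return _value_or_ceiling(raw, MIC_PSPL_CEILING_DBL)
```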
 #### Added
 - **Histogram per-interval aggregation in `waveform.json`.**  Histogram events now render with one bar per BW-reported interval (matching the Blastware printout) instead of ~200 bars per event (the raw codec output).  When the sidecar's `bw_report.histogram.n_intervals` is populated (events ingested with the new parser, see next bullet), the `/db/events/{id}/waveform.json` endpoint groups the codec samples into N intervals via max-per-group and returns the aggregated array.  `time_axis` gains `histogram_aggregated: true`, `n_intervals`, `interval_size_s`, and `interval_times` (HH:MM:SS strings).  Both the modal chart and the standalone event browser use those interval timestamps as x-axis labels when present.  Defensive: no-op for events ingested before the parser extension landed (their sidecars lack `histogram.n_intervals`) — those continue to render with raw codec output.
 - **`bw_ascii_report` parser now captures histogram-specific fields.**  Previously the parser dropped these fields silently (Roadmap item closed):
   - `Histogram Start Time` / `Histogram Start Date` (combined into `histogram_start: datetime`)
   - `Histogram Stop Time` / `Histogram Stop Date` (combined into `histogram_stop: datetime`)
   - `Number of Intervals` (`histogram_n_intervals: int`)
   - `Interval Size` ("1 minute" string + parsed seconds: `histogram_interval_size_str`, `histogram_interval_size_s`)
   - `<Channel> Peak Time` + `<Channel> Peak Date` for histogram events (combined into `channel_peak_when: dict`; waveforms continue to use `time_of_peak_s` relative)
   - `Peak Vector Sum Date` (combined with PVS Time into `peak_vector_sum_when: datetime`; clears the previous bogus `peak_vector_sum_time_s` parse that interpreted "22:33:52" as 22.0 seconds)
   - All new fields land in the sidecar's `bw_report.histogram` block via `_bw_report_to_dict`.  Tested against synthetic K558LLB7.V20H-shaped input.
 - **Raw BW ASCII report (.TXT) preservation.**  `save_imported_bw` now writes the paired `_ASCII.TXT` to `<store>/<serial>/<filename>_ASCII.TXT` alongside the binary at ingest time.  Previously the .TXT was parsed into the sidecar's `bw_report` projection and then discarded — meaning parser bug fixes couldn't be applied retroactively without re-forwarding from the watcher PC.  Now the raw .TXT lives in the waveform store permanently (~15 KB per event; ~210 MB total for a 14k-event store; negligible).  Sidecar's `source.txt_filename` field records the saved path; backfill_sidecars preserves it across regens.  New `GET /db/events/{id}/ascii_report.txt` endpoint serves the raw .TXT for any event ingested after this change.  Events ingested before today still return 404 from that endpoint until re-forwarded.  Architectural rationale: with BW Mail / Forwarding Agent being phased out of the operator workflow, the XML/PDF/WMF that those tools produced are no longer available — the binary + .TXT (created by BW ACH itself) are our authoritative source for everything going forward.
 - **Event Report PDF generation** — `GET /db/events/{id}/report.pdf` returns a single-page letter-portrait PDF for any event with waveform data on disk.  Covers every field a Blastware Event Report includes: header metadata (date/time, trigger source, range, sample rate, project/client/operator/location, serial+firmware, battery, calibration, file name), microphone block (PSPL in dB(L) + psi, ZC freq, channel test), per-channel stats table (rows differ for waveform vs histogram), Peak Vector Sum, and the 4-channel plot.  Iterated against real Blastware reference PDFs (uploaded to `example-events/pdfsnstuff/`):
   - **Waveform layout**: header shows Date/Time, Trigger Source, Range, Sample Rate; stats table has PPV / ZC Freq / Time (Rel. to Trig) / Peak Accel / Peak Disp / Sensor Check; bottom plot is 4-channel line waveform (MicL top → Tran bottom), shared time axis in seconds, dashed trigger line + triangle marker at t=0, symmetric Y on geo channels, zero-anchored on mic, "0.0" baseline label on right per BW convention; footer shows `Time X sec/div   Amplitude Geo: Y in/s/div   Mic: 0.001 psi(L)/div` and the trigger window `▶━━◀` marker.  USBM RI8507/OSMRE compliance chart placeholder upper-right.
   - **Histogram layout**: header shows Start / Finish / Intervals At Size / Range / Sample Rate (no Trigger Source — histograms aren't triggered); NO USBM chart; stats table has PPV / ZC Freq / Date / Time / Sensor Check; bottom plot is per-interval bar chart, Y-axis 0-to-peak (never negative), 0.0 baseline at the bottom; footer shows `Time INTERVAL_SIZE /div   Amplitude Geo: Y in/s/div   Mic: 0.001 psi(L)/div`.
   - Backed by matplotlib (vector PDF, no headless-browser dep).  Adds matplotlib>=3.8 to deps.
   - **Known gap**: histogram codec returns per-block granularity (~200 bars for a 4-interval event) instead of BW's per-interval aggregation.  Visual difference vs BW's 4-bar display.  XML-driven data source (parsing the structured `_XML.XML` files BW also exports) is the planned fix; that route also resolves the bw_ascii_report PPV-miss bug.
   - **Stubbed**: USBM RI8507 / OSMRE compliance chart curves (separate work item; requires coding the regulatory piecewise functions).
 - **"Download PDF" button** in the event modal's footer — triggers the new endpoint; opens in a new tab so the browser handles save-or-display + surfaces any 404 / server errors visibly.
 - **SFM webapp now opens to Database view by default** and the History table is fully interactive.  Click any column header to sort ascending / descending (timestamp, serial, per-channel PPV, PVS, mic dB(L), project, client, record type, key — all sortable).  Click any event row to open the event modal, which now renders a **4-channel waveform plot inline** (MicL / Long / Vert / Tran stacked, Instantel-printout order) alongside the existing sidecar review fields.  Headers are sticky so the columns stay visible while scrolling long event lists.  No more "where is the viewer" — pick a unit from the filter dropdown, scan the table, click the event, see the waveform.
 - **Stored-event browser** — new standalone HTML page at `GET /events` (`sfm/event_browser.html`).  Pick a serial from the unit dropdown, scroll through that unit's events (newest-first), click any event to render its decoded waveform via the existing `/db/events/{id}/waveform.json` endpoint.  Dark-themed Chart.js viewer, channels stacked vertically (MicL / Long / Vert / Tran — Instantel printout order, designed PDF-export-ready), trigger line at t=0, peak labels, search/filter, false-trigger flag honored.  Companion to the existing live-device viewer at `/waveform`; the two routes are now clearly delineated in their docstrings.  The webapp's inline plot at `/` is the primary path; `/events` remains a useful diagnostic when you want just a viewer.
 - **Histogram body codec — uint8 peak count fix.**  Per-channel peak fields at `block[6]/[10]/[14]/[18]` are `uint8`, not `uint16 LE` spanning `block[6:8]` etc.  The original interpretation was byte-exact on the N844 fixture corpus only because every annotation byte (`block[7]/[11]/[15]/[19]`) in those fixtures was zero.  On non-N844 events with non-zero annotation bytes (observed across BE9558 Tran-drift and BE18003 Histogram+Continuous units), the old interpretation produced peaks up to 268 in/s per channel and 35× inflated PVS sums when first deployed to prod (rolled back same day; properly fixed in this release).  Cross-correlated against BW's per-interval ASCII export on K558 / T003 / N599 / N844 corpora — 100% byte-exact on T/V/L, 99%+ on M (sub-precision rounding).  Annotation byte preserved on each record as `record["annotations"]` for future RE.  Verified against ~3,500 blocks across 5 in-repo fixtures + a synthetic K558 interval-12 regression block.
 - **`apply_bw_report_dict_to_event` helper** in `minimateplus.event_file_io`.  Mirror of `apply_report_to_event` for the projected sidecar dict shape — used by the backfill path, which has the preserved `bw_report` block but not the original `.TXT` file.  BW's reported peaks (and `sample_rate` / `record_time`) now win over codec output during `--force` backfill, matching ingest-path behavior.
 - **`scripts/check_bw_report_preservation.py`** — two-step snapshot/diff tool to verify that `backfill_sidecars.py` doesn't wipe the `bw_report` block from existing sidecars.  Classifies every sidecar as PRESERVED / CHANGED / WIPED / STILL_MISSING / NEW / ADDED / REMOVED.  Exit code 1 if any WIPED or CHANGED entries are found, so it can gate a CI step or deploy script.
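
The uint8 correction can be illustrated on a raw histogram block. In this minimal sketch the byte offsets come from the histogram-codec entry above. The channel names assigned to each offset (Tran/Vert/Long/Mic, in that order) and the helper names are assumptions. The legacy reader is included only to show how a non-zero annotation byte inflated the old `uint16 LE` reading.

```python
from typing import Dict, List

# Peak byte offsets from the changelog; the channel order is assumed.
PEAK_OFFSETS = {"tran": 6, "vert": 10, "long": 14, "mic": 18}


def decode_block_peaks(block: bytes) -> Dict[str, object]:
    """Read each per-channel peak as a single uint8; keep the next byte as an annotation."""
    peaks = {ch: block[off] for ch, off in PEAK_OFFSETS.items()}
    annotations: List[int] = [block[off + 1] for off in PEAK_OFFSETS.values()]
    return {"peaks": peaks, "annotations": annotations}


def legacy_uint16_peaks(block: bytes) -> Dict[str, int]:
    """The retracted reading: uint16 little-endian spanning block[off:off+2]."""
    return {
        ch: int.from_bytes(block[off:off + 2], "little")
        for ch, off in PEAK_OFFSETS.items()
    }
```

When every annotation byte is zero, as in the N844 fixtures, the two readers agree. That is why the old interpretation passed on that corpus.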
 #### Fixed
 - **`scripts/backfill_sidecars.py` no longer wipes `bw_report`.**  Before this fix, `event_to_sidecar_dict` silently dropped the preserved `bw_report` block during every backfill, since the function only emits a `bw_report` when called with a live `BwAsciiReport` dataclass (which the backfill doesn't have — only the projected sidecar dict).  Now we read the existing sidecar's `bw_report` and overlay it onto the regenerated sidecar, alongside the existing `review` and `extensions` preservation.
 - **`scripts/backfill_sidecars.py --force` no longer overwrites BW-overlaid DB peaks with codec output.**  The backfill path now calls `apply_bw_report_dict_to_event` before the DB upsert, mirroring what the ingest path does (`/db/import/blastware_file` parses the `.TXT` into a `BwAsciiReport`, calls `apply_report_to_event`, then upserts).  Without this, events where the codec doesn't fully decode (waveform walker edge cases on SP0/SS0/SV0-style events, histogram `byte[5]!=0` sub-format) ended up with PVS=0 in the DB after a `--force` backfill; this hit prod on 2026-05-22 and was rolled back the same day.
 - **Thor IDF files no longer attempted as BW events in backfill.**  `scripts/backfill_sidecars.py` now filters out `.IDFW` / `.IDFH` files in `_looks_like_event_file()`; they share the `.X0W` / `.X0H` suffix shape but use a separate ingest path (`WaveformStore.save_imported_idf`) and aren't decodable by `event_file_io.read_blastware_file`.
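
The preservation fix amounts to overlaying selected blocks from the on-disk sidecar onto the regenerated one. This is a minimal sketch over plain dicts, and the function name is hypothetical. Letting the existing block win unconditionally is also an assumption. It matches the default backfill, but not `--reparse-txt`, which deliberately overwrites `bw_report`.

```python
import copy
from typing import Any, Dict

PRESERVED_BLOCKS = ("bw_report", "review", "extensions")


def overlay_preserved_blocks(existing: Dict[str, Any], regenerated: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `regenerated` with preserved blocks carried over from `existing`."""
    merged = copy.deepcopy(regenerated)
    for key in PRESERVED_BLOCKS:
        if existing.get(key) is not None:
            merged[key] = copy.deepcopy(existing[key])
    return merged
```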
 #### Docs
 - **CLAUDE.md** — added a three-tier conceptual architecture model (SFM / SDM / shared codec library) near the top of the file, with a placement rule for where new code goes.  Documents that what is conceptually SDM (database, waveform store, ingest, `/db/*` endpoints) still lives under `sfm/` for historical reasons; rename deferred until the codebase is quiet enough for a clean refactor.
 - **README.md** — added a "Strategic direction" lead-in to the Roadmap that frames seismo-relay as a suite of cooperating components (not a single app), and an explicit "Terra-View ↔ SFM device control" roadmap section with a concrete implementation checklist (auth as hard prerequisite, embedded live-monitor view, action history, Series IV live-device support).
 - **`docs/histogram_codec_re_status.md`** updated with the uint8 retraction and the annotation-byte status.
 - Three known issues recorded in the Roadmap that were discovered during prod validation: (1) `bw_ascii_report` parser misses PPV / `vector_sum` on some `.TXT` formats (5 events on prod); (2) NULL-timestamp duplicate-row dedup needed (2 events on prod); (3) histogram body sub-format with `byte[5] != 0` not yet decoded (~3 events on prod with empty `.h5` plots).
 ---
 ## v0.19.0 — 2026-05-20
 The "device-family separation" release.  Tightens the boundary between Series III (MiniMate Plus / Blastware) and Series IV (Micromate / Thor) so the UI and storage layer dispatch deterministically by family instead of sniffing filename extensions or magnitude heuristics.
 ### Added — Phase 1: `device_family` column on `events`
 - **`events.device_family TEXT`** — new column carrying `"series3"` or `"series4"`.  Populated by every import path (`/db/import/blastware_file`, `/db/import/idf_file`, ACH server, BW CLI, sidecar backfill script).  Returned through `/db/events` since `query_events` uses `SELECT *`.
 - **Self-applying migration** — on startup, `ALTER TABLE ... ADD COLUMN` lands the new column; a follow-on `UPDATE` backfills existing rows from the binary filename extension (`.IDFH`/`.IDFW` → `series4`, everything else → `series3`).  No manual SQL needed.
 - **UPSERT preserves family** — re-imports without an explicit family don't blank existing rows (`COALESCE(?, device_family)`).
 - **UI dispatches on the column** — `sfm_webapp.html` events-table mic formatter now branches on `ev.device_family === 'series4'` (Thor stores native dB(L); BW stores psi).  Modal uses `source.kind === 'idf-import'` from the sidecar (sidecars don't carry the DB column).  Source-files section labels changed from "BW filename / BW filesize / BW sha256" to format-neutral "Event file / File size / File sha256".
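
The Phase 1 migration and the family-preserving UPSERT can be sketched against SQLite with the standard-library `sqlite3` module. The `events` columns used here (`serial`, `timestamp`, `filename`) and the `UNIQUE(serial, timestamp)` dedupe key are assumptions for illustration. Only `device_family`, the extension rule, and the `COALESCE` pattern come from the entries above.

```python
import sqlite3


def migrate_device_family(conn: sqlite3.Connection) -> None:
    """Add events.device_family if missing, then backfill NULLs from the filename extension."""
    columns = {row[1] for row in conn.execute("PRAGMA table_info(events)")}
    if "device_family" not in columns:
        conn.execute("ALTER TABLE events ADD COLUMN device_family TEXT")
    conn.execute(
        """
        UPDATE events
           SET device_family = CASE
                 WHEN upper(filename) LIKE '%.IDFH' OR upper(filename) LIKE '%.IDFW'
                 THEN 'series4' ELSE 'series3' END
         WHERE device_family IS NULL
        """
    )
    conn.commit()


# Re-imports without an explicit family (NULL) keep the stored value.
UPSERT_EVENT = """
INSERT INTO events (serial, timestamp, filename, device_family)
VALUES (?, ?, ?, ?)
ON CONFLICT(serial, timestamp) DO UPDATE SET
    filename = excluded.filename,
    device_family = COALESCE(excluded.device_family, device_family)
"""
```

Inside `DO UPDATE`, the unqualified `device_family` refers to the row's existing value, and `excluded.device_family` is the incoming one. That gives the same behavior as the `COALESCE(?, device_family)` form quoted above.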
 ### Added — Phase 2: `micromate/` package alongside `minimateplus/`
 - **`micromate/`** — new sibling package for the Thor / Micromate Series IV device.  Currently scoped to offline-file ingest; live-device support (TCP transport, framing, protocol, client) will land here when reverse-engineering happens.
  - `micromate/idf_ascii_report.py` — moved from `sfm/idf_ascii_report.py`.  No behaviour change.
  - `micromate/models.py` — typed `IdfReport`, `IdfEvent`, `IdfPeaks`, `IdfProjectInfo`, `IdfSensorCheck`.  Stores mic in native `mic_pspl_dbl` (dB(L)) instead of the pseudo-psi shoehorn that the BW-shaped model uses.  `IdfEvent.from_report()` constructs from a parsed dict + filename; `IdfEvent.to_minimateplus_event(waveform_key)` bridges to the existing sidecar / DB-insert machinery.
  - `micromate/idf_file.py` — placeholder for the binary codec (`.IDFH` / `.IDFW`).  Stubbed `read_idf_file()` raises `NotImplementedError`; documents the planned reverse-engineering path.
 - **`WaveformStore.save_imported_idf`** refactored to use the native `IdfEvent` and bridge at the SQL-insert boundary.  Cleaner separation of "parse a Thor event" (in `micromate/`) from "store it on disk + write a sidecar" (in `sfm/waveform_store.py`).
 - **Tests** — `tests/test_idf_ascii_report.py` imports updated to `micromate.idf_ascii_report`.  All 1,014 example-data sidecars round-trip through `IdfEvent.from_report()` without errors.
 ### Companion releases
 - **thor-watcher** unaffected — it talks to the relay over HTTP only.  No version bump needed.
 - **terra-view** unaffected today; can use `device_family` in its event-detail rendering when convenient.
 ---
 ## v0.18.0 — 2026-05-19
 The "Thor / Series IV ingest adapter" release.  Seismo-relay can now accept event files from Instantel Micromate Series IV (Thor) units alongside the existing MiniMate Plus (Series III) Blastware pipeline.
 ### Added — Thor (Series IV) IDF ingest
 - **`POST /db/import/idf_file`** (`sfm/server.py`) — multipart upload endpoint for `.IDFH` (histogram) and `.IDFW` (waveform) event files plus their `.IDFH.txt` / `.IDFW.txt` ASCII sidecars.  Mirrors the shape of `/db/import/blastware_file`: pairing by filename, optional `serial` query hint, per-file outcome reporting.
 - **`sfm/idf_ascii_report.py`** — parser for Thor's TXT sidecars (verified against 1,014 real-world samples).  Extracts device-authoritative PPV, ZC Freq, Peak Vector Sum, Mic PSPL, calibration date, firmware version, sensor self-check results, and project/client/operator strings.
 - **`WaveformStore.save_imported_idf()`** (`sfm/waveform_store.py`) — stores Thor binaries verbatim in `<root>/<serial>/<filename>`, writes a `.sfm.json` sidecar with `source.kind = "idf-import"` and the full parsed report under `extensions.idf_report`.  Reuses the existing `events` table — Thor events dedupe on (serial, timestamp) and surface in `/db/events` alongside BW events.
 - **`tests/test_idf_ascii_report.py`** — parser tests against the `thor-watcher/example-data/` corpus.
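
Pairing by filename can be sketched as matching each binary to an upload whose name is the binary's name plus `.txt`. The case-insensitive comparison and the helper name are assumptions; the endpoint's real pairing logic may differ.

```python
from typing import Dict, Iterable, Optional

BINARY_SUFFIXES = (".IDFH", ".IDFW")


def pair_idf_uploads(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each .IDFH/.IDFW upload to its .txt sidecar name, or None if unpaired."""
    names = list(names)
    by_upper = {name.upper(): name for name in names}
    pairs: Dict[str, Optional[str]] = {}
    for name in names:
        if name.upper().endswith(BINARY_SUFFIXES):
            pairs[name] = by_upper.get(name.upper() + ".TXT")
    return pairs
```

Returning `None` for an unpaired binary, rather than dropping it, leaves the caller free to report it as its own per-file outcome.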
 ### Changed
 - `event_to_sidecar_dict()` (`minimateplus/event_file_io.py`) allow-list for `source_kind` now includes `"idf-import"` so the existing sidecar machinery can carry Thor imports.
 - Bumped `pyproject.toml` version to `0.18.0`.
 ### Companion release
 This release ships alongside **thor-watcher v0.3.0**, which adds the SFM forwarder that targets the new `/db/import/idf_file` endpoint.  Operators flip the switch in thor-watcher's new "SFM Forward" Settings tab; events POST to seismo-relay just like the series3-watcher BW forwarder does today.
 ---
 ## v0.17.0 — 2026-05-17
 The "field rescue + DB management" release.  Hardened against units that are stuck in a runaway call-home loop, and added an operator-facing path for purging bogus events that those same units dump into the DB before recovery.  All work in this release was driven by the BE9558H incident (full incident log + recovery procedure at `docs/runbooks/wedged_unit_recovery.md`).
 ### Added — wedged-unit recovery toolkit
 A toolkit for breaking the call-home loop on a misbehaving unit whose firmware is too busy to keep up with normal request/response handshakes.  Tested in production against BE9558H (16 May 2026) — a unit with a stuck-triggered Long-axis geophone that had been call-homing the office BW ACH server every 30 seconds for hours.  Endpoints layered from "single attempt" to "siege mode" to suit different contention levels:
 - **`GET /device/events/storage_range`** — SUB 0x06 probe.  POLL + one read; ~2s.  Returns first/last event keys and an `is_empty` flag.  Use to triage whether a unit has stored events without invoking the slow `count_events()` 1E/1F chain (which choked on BE9558H's corrupted event chain).
 - **`GET /device/events/index`** — SUB 0x08 probe.  POLL + one read; ~2s.  Returns the lifetime event counter (does NOT decrement on erase — use `storage_range` for "right now" state).
 - **`POST /device/events/erase`** — full erase sequence `0xA3 → 0x1C → 0x06 → 0xA2` (confirmed 2026-04-11, see the protocol reference).  Resets event keys to `0x01110000`.  Caller's responsibility to disable ACH first if the underlying trigger condition will re-fill the buffer.
 - **`POST /device/rescue`** — one TCP session, short connect+recv timeouts: POLL → disable ACH (compliance config write) → erase events → close.  Designed for race-loop usage when the device is busy in another session.  503 on connect-refused, 502 on protocol failure, 200 on full sequence success.
 - **`POST /device/stop_monitoring_blind`** — fire-and-forget Stop Monitoring (SUB 0x97), TCP-only.  Dumps `SESSION_RESET + POLL_PROBE + SESSION_RESET + POLL_DATA + 0x97 × repeat` and closes without reading any S3 response.  The full POLL preamble is required — write commands without it are silently ignored by the device's protocol parser (false-positive surface area that bit the first version of this endpoint).  Use when the device's firmware can't keep up with full request/response but might process inbound bytes at its own pace.
 - **`POST /device/stop_monitoring_spam`** — server-side hammer loop, duration-bounded.  Open TCP → write the same blind payload → close → repeat as fast as possible until `duration_s` elapses.  Configurable `connect_timeout` (default 500ms) and `repeat` (frames per session).  Reports `sent_ok`, `connect_failed`, `write_failed`, `rate_attempts_per_s`.  Clamped to 5min duration.
 - **`POST /device/stop_monitoring_slow_drip`** — opposite of spam.  Open ONE TCP session, drip the wake handshake + stop frames at `interval_s` (default 3s) for `duration_s` (default 120s, max 10min).  Each drip is ~23 bytes — well under any UART FIFO size.  Opportunistically drains any inbound bytes the device sends back; `bytes_received > 0` in the response strongly suggests the device has started talking and the session is healthy.  **This is the endpoint that saved BE9558H.** Spam mode had been overrunning the device's UART FIFO; slow drip stayed under it.
 - **Rescue scripts** under `scripts/` — thin bash wrappers around the endpoints, default `SFM_BASE_URL=http://localhost:8200` (direct, not via the Terra-View proxy, whose 60s timeout would cut off the longer endpoints):
    - `rescue_device.sh` — race-loop wrapper for `/device/rescue`
    - `blind_stop.sh` — race-loop wrapper for `/device/stop_monitoring_blind`
    - `spam_stop.sh` — single-call burst hammer
    - `slow_drip.sh` — single-call held-session drip
    - `watch_unit.sh` — passive periodic reachability check (every N min, logs to file), useful for unattended overnight monitoring of a wedged unit
 - **`docs/runbooks/wedged_unit_recovery.md`** — symptoms, quick-reference recovery procedure, the modem-layer mechanism (Sierra Wireless serial-port mode-flipping is the real failure mode — not the device firmware), and a table of "why simpler approaches don't work" so the next incident skips the dead ends.
 ### Added — operator event DB management
 Endpoints powering Terra-View's new `/admin/events` page (v0.12.0).  Designed for purging bogus events from a unit that's been forwarding them in bulk (e.g. a stuck-triggered seismograph dumping hundreds of junk events before it's recovered).
 - **`DELETE /db/events/{event_id}`** — hard-delete one event row.  Also unlinks the associated blastware binary (`.AB0*`), `.a5.pkl`, `.sfm.json` sidecar, and `.h5` clean-waveform files via the WaveformStore.  Returns the per-file removal status.  404 if the event doesn't exist.
 - **`POST /db/events/delete_bulk`** — filter-based or id-list-based bulk delete with safety rails:
    - Filters (`serial`, `from_dt`, `to_dt`, `false_trigger`) combine with AND; same semantics as `GET /db/events`.  `ids` is an additional inclusion list.  Refuses to run with no filters (would wipe the whole table — raises 422).
    - `confirm` must be `true` to actually delete.  Otherwise returns a dry-run summary (`status: "dry_run"`, `matched: N`, `sample_serials: [...]`).
    - `max_rows` (default 10,000) caps how many rows can be deleted by-filter in one call.  If exceeded, returns `status: "too_many"` with a hint to narrow or raise the cap.  Bypassed when only `ids` is supplied.
 - **`_cleanup_event_files(row)`** helper in `sfm/server.py` — best-effort `unlink()` of all four sidecar paths derived from the row's `blastware_filename`.  Logged at WARN if a path exists but unlink fails; the DB row deletion still proceeds.
 - **`SeismoDb.delete_event(id)` and `SeismoDb.delete_events_bulk(...)`** in `sfm/database.py` — both return the deleted row dict(s) so callers can do file cleanup.  `delete_events_bulk` raises `ValueError` if no filters are supplied.
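The bulk-delete safety rails above boil down to a small decision function. The sketch below is illustrative, not the server's code: `BulkDeleteRequest`, `plan_bulk_delete` and the `"proceed"` status are hypothetical names, and checking `too_many` before the dry-run branch is an assumed ordering.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BulkDeleteRequest:
    # Illustrative mirror of the request fields described above.
    serial: Optional[str] = None
    from_dt: Optional[str] = None
    to_dt: Optional[str] = None
    false_trigger: Optional[bool] = None
    ids: list = field(default_factory=list)
    confirm: bool = False
    max_rows: int = 10_000


def plan_bulk_delete(req: BulkDeleteRequest, matched_rows: list) -> dict:
    """Apply the safety rails to rows already selected by filters + ids.

    matched_rows: row dicts (each with a "serial" key) selected by the
    AND-combined filters plus the ids inclusion list.
    """
    has_filter = any(
        v is not None for v in (req.serial, req.from_dt, req.to_dt, req.false_trigger)
    )
    if not has_filter and not req.ids:
        # Nothing scopes the delete: refuse rather than wipe the table.
        return {"http_status": 422, "detail": "at least one filter or id is required"}
    # The row cap guards filter-based deletes only; an ids-only request bypasses it.
    if has_filter and len(matched_rows) > req.max_rows:
        return {"status": "too_many", "matched": len(matched_rows)}
    if not req.confirm:
        serials = sorted({row["serial"] for row in matched_rows})
        return {"status": "dry_run", "matched": len(matched_rows), "sample_serials": serials[:5]}
    # Hypothetical marker: the caller would now delete the rows and clean up files.
    return {"status": "proceed", "matched": len(matched_rows)}
```

Passing the selected rows in, rather than querying inside the function, keeps the rail logic testable without a database.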
 ### Changed
 - **Default protocol recv timeout dropped from 30s → 10s** in `_build_client()`.  The unit usually responds in well under a second over cellular; 10s leaves comfortable headroom for retransmits while failing reasonably fast when a unit is wedged.  The two endpoints that perform full 5A waveform downloads still pass `timeout=120.0` explicitly so multi-minute event transfers are unaffected.
 - **`_build_client()` now accepts an optional `connect_timeout`** (TCP-only) so rescue / race-loop endpoints can fail fast on busy modems without affecting the protocol-level recv timeout.
 ### Fixed
 - **`GET /device/monitor/status` returned HTTP 500 + uncaught traceback when the device was unresponsive**.  The retry-on-`Exception` inner block let the second `client.poll()`'s `ProtocolError` propagate out of the handler.  Now wrapped in proper try/except — returns 502 with `{"detail": "Protocol error: No S3 frame received within 10.0s ..."}` on timeout, 502 on connection errors, 500 only for genuinely unexpected exceptions.
 ### Migration
 No schema changes.  No data migration required.
 If you've been running a previous version against a wedged unit and accumulated bogus events, the new `/admin/events` page in Terra-View v0.12.0 (or direct `POST /db/events/delete_bulk` with `confirm: true`) is the cleanup tool.  Watcher state on the upstream DL2 PC does NOT need separate cleaning — the watcher's `sfm_forwarded.json` keys on file sha256 and won't re-forward the same files.
 ### Pairing
 This release pairs with **Terra-View v0.12.0**, which adds the `/admin/events` UI that consumes the new bulk-delete endpoints, the bulk false-trigger flagging on `/unit/{id}`, and the field-deployment workflow that uses the same `series3-watcher` → SFM ingest path as before.
 ---
 ## v0.16.1 — 2026-05-14
 ### Fixed
 - **`record_type` always "Waveform" for forwarded events.**  `read_blastware_file()` hardcoded `ev.record_type = "Waveform"` regardless of the file's actual type.  The watcher-forward pipeline (the main BW ACH ingest path) compounds this by parsing files from a tmp path with a `.bw` suffix, so even a filename-based fallback inside the parser still wouldn't see the original extension.  Now:
  1. New `derive_record_type_from_filename(filename)` helper in `minimateplus/event_file_io.py` derives the type from the LAST character of the filename's extension (V10.72+ AB0T scheme: `H`=Histogram, `W`=Waveform, `M`=Manual, `E`=Event, `C`=Combo).  Falls back to `"Waveform"` for old S338 firmware (3-char extensions ending in `0`) and any unrecognized suffix.
  2. `read_blastware_file()` now calls the helper with its `path.name` so direct callers (the `--dry-run` path in `scripts/import_bw.py`, tests, ad-hoc scripts) get the right value automatically.
  3. `WaveformStore.save_imported_bw()` overrides `ev.record_type` with the **original** filename's derived type after parsing (the tmp file inside the parser doesn't carry the original extension).  This is the path the live watcher-forwarder hits, so the DB column now reflects the actual event type going forward.
  Events ingested before this fix are stuck with `record_type="Waveform"` in the DB; a one-off backfill (`UPDATE events SET record_type = ... WHERE blastware_filename LIKE '%H'`) would fix them retroactively if desired.  Terra-view's event modal also derives client-side from the filename, so the UI already shows the correct type for old events even without the backfill.
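A minimal sketch of the filename rule and of the optional one-off backfill, assuming the `events` table has `id`, `blastware_filename` and `record_type` columns as the SQL above implies. `record_type_from_filename` and `backfill_record_types` are hypothetical names that approximate `derive_record_type_from_filename`. They are not the shipped helper: for example, this sketch upper-cases the extension, which is an assumption.

```python
import sqlite3
from pathlib import PurePath

# Last extension character -> record type (V10.72+ AB0T scheme, per the list above).
_RECORD_TYPES = {"H": "Histogram", "W": "Waveform", "M": "Manual", "E": "Event", "C": "Combo"}


def record_type_from_filename(filename: str) -> str:
    """Approximation of the filename rule; defaults to "Waveform"."""
    ext = PurePath(filename).suffix.lstrip(".")
    if not ext:
        return "Waveform"
    # Old S338 extensions end in "0" (e.g. "AB0") and, like any other
    # unrecognised suffix, fall through to the "Waveform" default.
    return _RECORD_TYPES.get(ext[-1].upper(), "Waveform")


def backfill_record_types(conn: sqlite3.Connection) -> int:
    """Recompute events.record_type from blastware_filename; return rows changed."""
    rows = conn.execute(
        "SELECT id, blastware_filename FROM events WHERE blastware_filename IS NOT NULL"
    ).fetchall()
    changed = 0
    for event_id, filename in rows:
        record_type = record_type_from_filename(filename)
        cur = conn.execute(
            "UPDATE events SET record_type = ? WHERE id = ? AND record_type IS NOT ?",
            (record_type, event_id, record_type),
        )
        changed += cur.rowcount  # 0 when the row already had the right type
    conn.commit()
    return changed
```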
 ---
 ## v0.16.0 — 2026-05-11
 The "BW ACH ingestion" release.  When paired with **series3-watcher v1.5.0**, every Blastware ACH event (binary + `_ASCII.TXT` report) lands in SeismoDb with device-authoritative peaks, project metadata, sensor self-check, and ZC/Time-of-Peak data — without depending on the still-undecoded waveform body codec.  This is the end-to-end product win discussed in v0.15.0's "out of scope" notes: sortable / filterable monthly-summary review of historical events, populated from the BW ASCII export rather than re-decoded samples.
 ### Added — `/db/import/blastware_file` rich-metadata ingestion
 - **Paired BW ASCII reports.**  The endpoint now accepts the `<binary>_<ext>_ASCII.TXT` partner BW writes alongside each event.  Pairing handles both filename conventions: ACH (`M529LK44_AB0_ASCII.TXT`) and manual-export (`M529LK44.AB0.TXT`).  When both present, ACH wins.
 - **`minimateplus/bw_ascii_report.py`** (new) — parser + `BwAsciiReport` dataclass for BW's per-event ASCII export.  Handles every field BW writes: identity, trigger config, per-channel PPV / ZC Freq / Time of Peak / Peak Acceleration / Peak Displacement, Peak Vector Sum + time, MicL PSPL / Time of Peak / ZC Freq, sensor self-check (Test Freq / Test Ratio / Test Amplitude / Pass-Fail per channel), monitor log, PC SW version.
 - **Position-based user-notes parsing.**  BW's Compliance Setup → Notes tab labels (Project / Client / User Name / Seis Loc) are *operator-editable* — an operator can rename them to "Building:", "Site Address:", etc.  Rather than maintain a label-spelling map, the parser uses positional matching between the `Units :` and `Geo Range :` anchors in the ASCII output.  The four canonical slots (project / client / operator / sensor_location) populate by position regardless of label; the original labels BW wrote are preserved in `report.user_note_labels` for downstream UIs (terra-view) to display verbatim.
 - **`bw_report` sidecar block.**  New top-level block in `.sfm.json` carrying the parsed BW report (trigger config, peaks with per-channel stats, mic block, sensor_check, monitor_log, PC SW version, and the operator-edited note labels).
 - **`apply_report_to_event(event, report)` helper.**  Overlays the report's device-authoritative fields onto an in-memory `Event` so `SeismoDb.insert_events()` writes correct DB columns instead of the broken-codec values from `_peaks_from_samples()`.
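The position-based user-notes parsing described above can be sketched as follows, assuming each note line between the `Units :` and `Geo Range :` anchors has the form `<label> : <value>`. The function names are hypothetical, and the real parser in `minimateplus/bw_ascii_report.py` may handle blank or multi-line values differently.

```python
_SLOTS = ("project", "client", "operator", "sensor_location")


def _find_line(lines, prefix: str, start: int = 0) -> int:
    for i in range(start, len(lines)):
        if lines[i].strip().startswith(prefix):
            return i
    raise ValueError(f"anchor {prefix!r} not found")


def parse_user_notes(report_text: str):
    """Return (values, labels) keyed by canonical slot, matched by position."""
    lines = report_text.splitlines()
    start = _find_line(lines, "Units :")
    end = _find_line(lines, "Geo Range :", start + 1)
    # Assumes "<label> : <value>" note lines; lines without a colon are skipped.
    note_lines = [ln for ln in lines[start + 1 : end] if ":" in ln]
    values, labels = {}, {}
    for slot, line in zip(_SLOTS, note_lines):
        label, _, value = line.partition(":")  # split on the first colon only
        labels[slot] = label.strip()           # operator's label, kept verbatim
        values[slot] = value.strip()
    return values, labels
```

Because slots are assigned by order, a report whose operator renamed "Project:" to "Building:" still populates `project`, while `labels` keeps "Building" for display.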
 ### Fixed — three compounding bugs that left forwarded events with garbage data
 - **Import endpoint inserted under `serial="UNKNOWN"`.**  `_serial_from_event(ev)` was a stub that always returned `None`; the BW-filename-decoded serial that `WaveformStore` had already resolved was never surfaced to `db.insert_events`.  Now uses `rec["serial"]` as the authoritative source.  `scripts/repair_unknown_serials.py` repairs existing DB rows.
 - **`/db/units` ignored events from non-ACH ingest paths.**  `query_units()` only aggregated from `ach_sessions` — events that arrived via `save_imported_bw()` were never visible in the fleet overview even though they populated `events` correctly.  Now unions both tables.
 - **Re-imports left stale DB rows.**  The `IntegrityError` handler in `insert_events()` only refreshed filename / sidecar columns when a duplicate `(serial, timestamp)` arrived.  Peak values, project info, sample_rate, record_type stayed locked at whatever the first (often broken-codec) insert wrote.  Now the upsert path refreshes every device-authoritative column from the new data while preserving `false_trigger` and immutable fields (`id`, `created_at`).
 - **Server-side TXT pairing only knew the legacy convention.**  The endpoint stripped `.TXT` and looked up `<binary>` — which works for manual exports (`<binary>.TXT`) but not BW ACH (`<stem>_<ext>_ASCII.TXT`).  Reports were arriving in the multipart but silently dropped.  Now recognises both conventions and registers each report under all matching binary names.
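A simplified sketch of recognising both report-filename conventions and letting the ACH report win when both are uploaded. It maps each report to exactly one binary name, whereas the server registers a report under every matching binary name. `binary_for_report` and `pair_reports` are hypothetical names.

```python
import re
from typing import Iterable, Optional, Tuple

# ACH convention: <stem>_<ext>_ASCII.TXT   e.g. M529LK44_AB0_ASCII.TXT
_ACH_RE = re.compile(r"^(?P<stem>.+)_(?P<ext>[^_]+)_ASCII\.TXT$", re.IGNORECASE)
# Manual-export convention: <binary>.TXT   e.g. M529LK44.AB0.TXT
_MANUAL_RE = re.compile(r"^(?P<binary>[^.]+\.[^.]+)\.TXT$", re.IGNORECASE)


def binary_for_report(report_name: str) -> Optional[Tuple[str, str]]:
    """Return (binary_name, convention) for a report filename, or None."""
    m = _ACH_RE.match(report_name)
    if m:
        return f"{m['stem']}.{m['ext']}", "ach"
    m = _MANUAL_RE.match(report_name)
    if m:
        return m["binary"], "manual"
    return None


def pair_reports(binaries: Iterable[str], reports: Iterable[str]) -> dict:
    """Map each uploaded binary to one report; ACH-style wins over manual."""
    wanted = set(binaries)
    paired, kind_of = {}, {}
    for report in reports:
        hit = binary_for_report(report)
        if hit is None or hit[0] not in wanted:
            continue
        binary, kind = hit
        if kind_of.get(binary) == "ach":
            continue  # an ACH report is already registered; keep it
        paired[binary], kind_of[binary] = report, kind
    return paired
```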
 ### Migration
 For existing deployments where events were forwarded by an older watcher (broken pairing) or imported during the UNKNOWN-bucketing window:
 1. `python -m scripts.repair_unknown_serials --db <path> --apply` to re-attribute `serial="UNKNOWN"` rows.
 2. Delete the watcher's `sfm_forwarded.json` state file and let it re-forward.  The server's upsert path will refresh the existing DB rows with the report's authoritative values.
 3. Operator review state (`false_trigger`, sidecar `review` block) is preserved across the re-import.
 ---
 ## v0.15.0 — 2026-05-07
 ### Added
@@ -2,12 +2,112 @@
 Ground-up Python replacement for **Blastware**, Instantel's Windows-only software for
 managing MiniMate Plus seismographs. Connects over direct RS-232 or cellular modem
-(Sierra Wireless RV50 / RV55). Current version: **v0.14.3**.
+(Sierra Wireless RV50 / RV55). Current version: **v0.21.0**.
 When new information about the protocol is discovered, update `docs/instantel_protocol_reference.md` with the findings in addition to this document.
 ---
 ## Architecture: three-tier conceptual model
 seismo-relay is a **suite of cooperating components**, not a single app.
 The three tiers below are the canonical mental model — the current
 directory layout doesn't fully reflect them yet (some of what is
 conceptually SDM lives under `sfm/` today), but new code should be
 placed and named according to this model.
 ### 1. SFM — the device-side (active connection to physical units)
 Replaces Blastware's *talk-to-the-meter* role.  Lives where a connection
 to a physical seismograph is open.
 In scope:
 - `minimateplus/{transport,framing,protocol,client}.py` — wire protocol
 - `seismo_lab.py` — diagnostic GUI (a thick client for SFM)
 - The `/device/*` HTTP endpoints in `sfm/server.py` —
  `/device/info`, `/device/events`, `/device/monitor/*`, `/device/call_home`,
  etc.  Anything that opens a connection at the moment of the request.
 - Future: a Thor / Micromate live client (mirror `minimateplus/`)
 - Future: a control surface Terra-View can launch into — see the
  README's Roadmap.
 Does NOT own a database.  Outputs `Event` objects.  Has a "spun up when
 needed" runtime profile rather than "always on".
 ### 2. SDM — the data-side (storage, ingest, and serving)
 The new name for the receiving-and-storing role.  Originally called SFM
 because the FastAPI service started life as a thin device proxy, but
 the actual role has migrated heavily toward data management.  **For now
 the directory remains `sfm/`** — renaming requires touching ~30-50
 files in seismo-relay + ~10-15 in terra-view + a Docker volume
 migration; deferred until the codebase is quiet enough to do it as a
 clean refactor.
 In scope:
 - `sfm/database.py` (`SeismoDb`)
 - `sfm/waveform_store.py`, `sfm/event_hdf5.py`
 - The `/db/*` HTTP endpoints — `events`, `units`, `monitor_log`,
  `sessions`, `false_trigger` mutations
 - The `/db/import/*` ingest endpoints — `blastware_file` (series3),
  `idf_file` (series4); anything that receives events FROM somewhere
 - `scripts/backfill_sidecars.py`, `scripts/check_bw_report_preservation.py`,
  and similar data-maintenance tools
 - The `.sfm.json` sidecars and `.h5` files in the waveform store
 - The shape that Terra-View consumes (Terra-View should never need to
  reach into SFM/device-side endpoints to populate its UI)
 Always-on, scaled for storage/serving, has the DB and waveform store.
 ### 3. Codec library — pure data interpretation (used by both sides)
 Neither SFM nor SDM — a shared library both depend on.
 In scope:
 - `minimateplus/{waveform_codec,histogram_codec,event_file_io,bw_ascii_report,blastware_file}.py`
 - `micromate/{idf_ascii_report,idf_file}.py`
 These modules take bytes (off the wire on the SFM side, or from a
 forwarded file on the SDM side) and return `Event` objects.  They
 should not import from `sfm/`, must not touch a DB, and have no I/O
 beyond reading files passed as arguments.  Keep them pure — both
 tiers can then depend on them without circularity.
 #### Thor IDF binary codec (2026-05-28)
 `micromate/idf_file.read_idf_file()` decodes both Thor IDFW
 (waveform) and IDFH (histogram) binaries.
 - **IDFW** reuses `decode_waveform_v2()` on the body at fixed file
  offset `0x0f1f`.  Sample fidelity is 87–99% byte-exact on quiet
  events; loud events hit the BW codec's known walker-stops-early
  limitation.
 - **IDFH** has its own segment-based decoder: `[len_be][0a 00 00 00]
  [00 NN][05 3f]` + N × 72-byte interval records (4 × 16-byte
  per-channel min/max/halfp).  All 859 Thor IDFH corpus files
  decode (181,071 intervals); peak matches sidecar within ~1.8%
  (ADC quantization).
 The two outlier `BE9439_*` files in the Thor example corpus are
 actually Series III Blastware binaries that share the `.IDFW`/`.IDFH`
 filename convention by accident.  `read_idf_file()` detects them by
 their BW STRT signature and raises NotImplementedError pointing
 callers at `read_blastware_file()`.  See
 `docs/idf_protocol_reference.md` for full field layouts.
 ### Practical consequences
 When deciding where new code goes, ask:
 - *Does it need a connection to a device?* → SFM
 - *Does it operate on stored events / sidecars / DB rows?* → SDM
 - *Does it interpret bytes into structured data, with no I/O of its own?* → codec lib
 Terra-View is downstream of SDM for data, and (per the roadmap) will
 eventually invoke into SFM's device-control endpoints to provide a
 "connect to unit" experience.
 ---
 ## Project layout
 ```
@@ -17,6 +117,8 @@ minimateplus/         ← Python client library (primary focus)
  protocol.py         ←   MiniMateProtocol — wire-level read/write methods
  client.py           ←   MiniMateClient — high-level API (connect, get_events, …)
  models.py           ←   DeviceInfo, EventRecord, ComplianceConfig, …
  waveform_codec.py   ←   Body-codec block walker + decode_waveform_v2 (full
                          per-sample decoder — see "Waveform body codec" section below)
 sfm/server.py         ← FastAPI REST server exposing device data over HTTP
 seismo_lab.py         ← Tkinter GUI (Bridge + Analyzer + Console tabs)
@@ -57,6 +159,133 @@ Full read pipeline + write pipeline + erase pipeline + monitor log + call home c
 ---
 ## Waveform body codec — FULLY DECODED (2026-05-11 late)
 > ### ✅ The codec is fully cracked
 >
 > Every block type and every geo channel decodes byte-exact against BW's
 > ASCII export on every sample the walker reaches.  **72,972 ADC samples
 > verified, zero errors** (see the sample-count table below; SS0/SV0 stop
 > 1–7 samples short of the full event).
 > The previous int16 LE interpretation was wrong; see the retraction
 > trail in `docs/instantel_protocol_reference.md §7.6.1`.
 >
 > Authoritative implementation: `minimateplus/waveform_codec.py`
 > (`decode_waveform_v2()`).  Clean working notes:
 > `docs/waveform_codec_re_status.md`.
 >
 > **NOTE:** `client.py:_decode_a5_waveform` now routes through the
 > verified codec (see "Production-code status" below), so `.h5`
 > sidecars carry correct samples for any event without walker edge cases.
 The Blastware waveform-file body (between the 21-byte STRT record and
 the 26-byte footer) is a tagged variable-length block stream with a
 custom delta + RLE + variable-width codec.
 ### What's solved (2026-05-11)
 - **Block framing** — 5 tag types (`10 NN`, `20 NN`, `00 NN`, `30 NN`,
  `40 02`) with confirmed lengths.  Implementation: `walk_body()` in
  `minimateplus/waveform_codec.py`.
 - **Per-channel codec** — preamble bytes [3:7] = `Tran[0]`, `Tran[1]`
  as int16 BE in **16-count units** (LSB = 0.005 in/s).  Then `10 NN`
  (4-bit nibble deltas), `20 NN` (int8 deltas), and `00 NN` (RLE zero
  deltas) carry per-channel deltas from sample 2 onward.
 - **Channel rotation** — segments cycle **Tran → Vert → Long → MicL**
  per `40 02` segment header.  Each segment carries ~512 sample-sets of
  ONE channel.  The initial body (before the first `40 02`) is the
  implicit Tran segment.
 - **Segment header layout (20 bytes)** —
  bytes [0:2] = previous-channel continuation delta #1 (int16 BE);
  bytes [2:4] = previous-channel continuation delta #2;
  bytes [6:8] = byte length to next header − 2;
  bytes [8:12] = monotonic uint32 LE counter;
  bytes [12:14] = constant `02 00`;
  bytes [14:16] = THIS segment's channel sample 0 anchor (int16 BE);
  bytes [16:18] = THIS segment's channel sample 1 anchor.
 - **`decode_waveform_v2()`** returns full per-channel sample dicts.
  Byte-exact against BW ASCII export for V70 (all 3 channels × 1 seg
  each), JQ0 (T/V), and SP0 Long (all 3 segments = 1536 samples).
 - **`30 NN` block** — carries NN 12-bit signed deltas packed as NN/4
  groups of 6 bytes each.  Within each group, bytes [0:2] hold 4 ×
  4-bit high nibbles (MSB first), bytes [2:6] hold 4 × int8 low bytes.
  Each delta = `sign_extend_12((high_nibble << 8) | low_byte)`.  Block
  length = `NN × 1.5 + 2` bytes.  ✅ confirmed against all 14 `30 NN`
  blocks in the fixture bundle.  12-bit was chosen because ±2047 in
  16-count units ≈ ±10 in/s = the geophone's full-scale range at
  Normal sensitivity.
 - **Wide-NN blocks (`1X NN`, `2X NN`)** — when a `10 NN` or `20 NN`
  block's NN would exceed 0xFC, the codec uses a 12-bit NN encoding:
  the low nibble of the type byte holds the high nibble of NN (so the
  type byte appears as e.g. `0x11` instead of `0x10`).  Effective
  NN = `((type_byte & 0x0F) << 8) | nn_byte`.  Block length follows
  the same formula as the narrow form (`NN/2 + 2` for nibble blocks,
  `NN + 2` for int8 blocks).  Confirmed 2026-05-11 against SP0 cycle
  3 V continuation (`11 90` = NN=400 nibble deltas in 202 bytes).
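The block-length rules and the `30 NN` 12-bit packing above can be sketched as follows. This is a sketch, not `walk_body()`. `block_length` and `decode_12bit_deltas` are hypothetical names. `00 NN` and `40 02` are deliberately left out because their lengths are not restated in this list.

```python
def block_length(type_byte: int, nn_byte: int) -> int:
    """Total block length in bytes, including the 2-byte tag, for data blocks.

    Covers `10 NN` / `20 NN` (and their wide-NN `1X` / `2X` forms) and `30 NN`.
    """
    kind = type_byte & 0xF0
    if kind in (0x10, 0x20):
        # Wide-NN form: the low nibble of the type byte is NN's high nibble.
        nn = ((type_byte & 0x0F) << 8) | nn_byte
        # Nibble deltas pack two per byte; int8 deltas take one byte each.
        # (Odd NN isn't covered by the stated formula; this rounds down.)
        return nn // 2 + 2 if kind == 0x10 else nn + 2
    if type_byte == 0x30:
        return nn_byte * 3 // 2 + 2  # NN x 1.5 payload bytes + 2-byte tag
    raise ValueError(f"block {type_byte:02x} {nn_byte:02x} not handled here")


def sign_extend_12(value: int) -> int:
    """Interpret the low 12 bits of value as a two's-complement integer."""
    value &= 0xFFF
    return value - 0x1000 if value & 0x800 else value


def decode_12bit_deltas(payload: bytes, nn: int) -> list:
    """Unpack NN signed 12-bit deltas from a `30 NN` payload (tag stripped)."""
    if nn % 4 or len(payload) < nn * 3 // 2:
        raise ValueError("NN must be a multiple of 4 and the payload NN*1.5 bytes")
    deltas = []
    for g in range(nn // 4):
        group = payload[g * 6 : g * 6 + 6]
        # Bytes [0:2]: four high nibbles, most significant nibble first.
        highs = (group[0] >> 4, group[0] & 0x0F, group[1] >> 4, group[1] & 0x0F)
        # Bytes [2:6]: the four matching low bytes.
        for high, low in zip(highs, group[2:6]):
            deltas.append(sign_extend_12((high << 8) | low))
    return deltas
```

For example, `block_length(0x11, 0x90)` gives 202, which matches the SP0 `11 90` block quoted above.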
 ### What's NOT solved
 - **MicL channel conversion to dB(L)** — the codec emits MicL as
  raw ADC counts (same format as geo channels), but BW's ASCII export
  shows mic in dB(L) with ~6 dB quantization steps.  The hypothesised
  mapping `dB = 20*log10(|counts|) + offset` has since been implemented
  as `waveform_codec.mic_count_to_db()` (see "Production-code status" below).
 - **Walker edge cases** — SP0/SS0/SV0 don't walk the full event due
  to block-length quirks past the first few segments.  Every sample
  reached is correct; the walker just needs robustness improvements.
 ### Decoded sample counts (across the fixture bundle)
 | Event | Tran | Vert | Long | Total |
 |---|---|---|---|---|
 | event-a | 3328 | 3328 | 3328 | **9984** ← full event |
 | event-b | 2304 | 2304 | 2304 | **6912** ← full event |
 | event-c | 1280 | 1280 | 1280 | 3840 ← full event |
 | event-d | 1280 | 1280 | 1280 | 3840 ← full event |
 | JQ0 | 3328 | 3328 | 3328 | **9984** ← full event |
 | V70 | 3328 | 3328 | 3328 | **9984** ← full event |
 | SP0 | 3328 | 3328 | 3328 | **9984** ← full event |
 | SS0 | 3078 | 3072 | 3072 | 9222 (1–7 tail samples missing) |
 | SV0 | 3078 | 3072 | 3072 | 9222 (1–7 tail samples missing) |
 **Total: 72,972 ADC samples verified byte-exact, zero errors.**
 7 of 9 fixture events decode end-to-end across all three geo channels.
 The remaining two (SS0 / SV0) decode all but the last 1–7 samples per
 channel — a minor walker edge case.
 ### Production-code status (updated 2026-05-11 late)
 `client.py:_decode_a5_waveform` now uses the verified codec via
 `waveform_codec.decode_a5_frames()` — which calls
 `blastware_file.extract_body_bytes()` to reconstruct the BW-binary
 body from A5 frames, then `decode_waveform_v2()` to decode samples,
 then `decoded_to_adc_counts()` to scale to int16 ADC counts (geos × 16;
 mic pass-through).  The `.h5` sidecars SFM produces now contain
 correct samples for any event without walker edge cases.
 The original int16 LE decoder is preserved as
 `_decode_a5_waveform_LEGACY` for reference but is not called.
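The scaling step described above (geos × 16, mic pass-through), together with the 16-count unit from the per-channel codec notes, can be sketched as follows. The channel key names, the function names and the Normal-sensitivity assumption behind 0.005 in/s are illustrative. They are not `decoded_to_adc_counts()` itself.

```python
GEO_CHANNELS = ("Tran", "Vert", "Long")
ADC_COUNTS_PER_CODEC_UNIT = 16   # codec geo values are in 16-count units
IN_PER_S_PER_CODEC_UNIT = 0.005  # one codec unit = 0.005 in/s (assumed Normal sensitivity)


def to_adc_counts(decoded: dict) -> dict:
    """Scale geo channels x16 to ADC counts; pass MicL (and anything else) through."""
    return {
        ch: [v * ADC_COUNTS_PER_CODEC_UNIT for v in samples] if ch in GEO_CHANNELS else list(samples)
        for ch, samples in decoded.items()
    }


def adc_counts_to_in_per_s(count: int) -> float:
    """Convert one geo ADC count to in/s: 0.005 / 16 = 0.0003125 in/s per count."""
    return count * IN_PER_S_PER_CODEC_UNIT / ADC_COUNTS_PER_CODEC_UNIT
```

At the 12-bit limit, 2047 codec units × 16 = 32,752 counts ≈ 10.2 in/s, which is consistent with the ±10 in/s full scale noted for `30 NN` blocks.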
 MicL → dB(L) conversion utility:
 `waveform_codec.mic_count_to_db(count)` — `count=±1 → ±81.94 dB`;
 `count=813 → 140.14 dB` (matches BW display).
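The two data points above are consistent with `dB = 20*log10(|count|) + 81.94`, with the count's sign carried onto the result: 20*log10(813) ≈ 58.20, and 58.20 + 81.94 = 140.14. Each doubling of the count adds ≈ 6.02 dB, which matches the ~6 dB quantization steps seen in BW's export. The sketch below assumes that form. `mic_count_to_db_sketch` is a hypothetical name, and the shipped `mic_count_to_db` may handle count 0 or rounding differently.

```python
import math

MIC_DB_AT_ONE_COUNT = 81.94  # from the count = 1 data point above


def mic_count_to_db_sketch(count: int) -> float:
    """Signed dB(L) from a MicL ADC count: 20*log10(|count|) + 81.94, sign kept."""
    if count == 0:
        raise ValueError("count 0 has no finite dB value in this model")
    db = 20 * math.log10(abs(count)) + MIC_DB_AT_ONE_COUNT
    return math.copysign(db, count)
```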
 ### Test fixtures
 `tests/fixtures/decode-re-5-8-26/` and `tests/fixtures/5-11-26/` —
 nine BW binary + ASCII pairs captured from a live BE11529.  The
 5-11-26 high-amplitude bundle (PPV 6–7 in/s) is what cracked the Tran
 codec; the V70 (mic-heavy) + JQ0 (Vert-heavy) pair cracked the `00 NN`
 RLE rule.
 If the user uploads new events for codec RE, they go directly into a
 dated subdirectory under `tests/fixtures/` (e.g. `tests/fixtures/5-18-26/`).
 There used to be a separate `decode-re/` upload mirror but it was
 removed once the fixtures directory became the canonical location.
 ---
 ## Protocol fundamentals
 ### DLE framing
@@ -1353,6 +1582,8 @@ body) because writing a dial string may require DLE escaping for embedded contro
 ## What's next
 **See [README.md → Roadmap (Future)](README.md#roadmap-future) for the canonical deferred-work list.** This section is kept as a status log of in-progress / recently-shipped technical details (encoding schemes, byte layouts, etc.) that are too low-level for the README's roadmap.
 - **Database** — SQLite store for events + monitor log entries; dedup by key; queryable
 - **Histograms** — decode histogram-mode A5 data (noise floor tracking)
 - **Blastware-compatible file output** — `write_blastware_file()` and `write_mlg()` implemented. `blastware_filename()` generates correct Blastware filenames (AB0 for direct, AB0W/AB0H for ACH). **Confirmed BYTE-PERFECT against BW reference (v0.14.3, 2026-05-05):** when fed the BW 5-1-26 3-sec capture's A5 frames, the SFM-built file matches BW's saved `M529LKIQ.G10` byte-for-byte (8708 bytes, 0 differences).  Live SFM downloads of event 0 (3-sec) and event 1 (3-sec continuation) both open cleanly in Blastware with full Event Reports, frequency analysis, and waveform plots.  Body assembly is just contiguous concatenation of frame contributions in stream order (probe → meta@0x1002 → meta@0x1004 → samples → TERM); no stripping, no overlay, no special handling.  Histogram+Continuous mode deferred (5A stream for those events embeds histogram interval records that may need different handling — untested under v0.14.x). Extension mapping: extensions encode timestamp (AB0T for ACH, AB0 for direct), NOT recording mode. Filename format: `<prefix_letter><serial3><4-char-base36-stem><ext>`
@@ -0,0 +1,31 @@
 FROM python:3.11-slim
 WORKDIR /app
 # tzdata is required for the TZ env var to take effect (python:slim
 # omits the timezone database).  Without it, datetime.now() / logging
 # / matplotlib all stay in UTC regardless of TZ.  Default zone gets
 # set further down via ENV; users override per-deployment via the
 # `TZ` env var in docker-compose.
 RUN apt-get update && \
    apt-get install -y --no-install-recommends curl tzdata && \
    rm -rf /var/lib/apt/lists/*
 # Default display timezone — applied to server logs, datetime.now(),
 # matplotlib rendered timestamps, and any naïve-vs-aware datetime
 # conversions in the PDF renderer.  Override via TZ env var in
 # docker-compose; storage in the DB is always UTC regardless.
 ENV TZ=America/New_York
 COPY pyproject.toml requirements.txt ./
 COPY minimateplus ./minimateplus
 COPY micromate    ./micromate
 COPY sfm          ./sfm
 COPY bridges      ./bridges
 COPY scripts      ./scripts
 RUN pip install --no-cache-dir -e .
 EXPOSE 8200
 CMD ["python", "-m", "uvicorn", "sfm.server:app", "--host", "0.0.0.0", "--port", "8200"]
@@ -1,7 +1,11 @@
-# seismo-relay  `v0.15.0`
+# seismo-relay  `v0.21.0`
 A ground-up replacement for **Blastware** — Instantel's aging Windows-only
-software for managing MiniMate Plus seismographs.
+software for managing seismographs.  Supports both the **MiniMate Plus
 (Series III)** and the **Micromate (Series IV / "Thor")** families:
 Series III via the live RS-232 / TCP wire protocol *and* Blastware ACH file
 ingest; Series IV via Thor IDF file ingest (TXT-paired), with binary
 IDFW / IDFH decoding added in v0.21.0.
 Built in Python. Runs on Windows, Linux, or macOS. Connects to instruments
 over direct RS-232 or cellular modem (Sierra Wireless RV50 / RV55).
@@ -14,11 +18,43 @@ over direct RS-232 or cellular modem (Sierra Wireless RV50 / RV55).
 > byte-perfect against Blastware captures across 2-sec, 3-sec, and 10-sec
 > events.** Generated `.G10` / `.AB0` files open cleanly in Blastware with
 > full Event Reports, frequency analysis, and waveform plots.
-> **v0.15.0 (2026-05-07)** adds layered per-event storage (BW binary +
-> raw 5A pickle + HDF5 + `.sfm.json` sidecar), a plot-ready
-> `sfm.plot.v1` JSON shape with server-side ADC-to-physical-units
-> conversion, and a BW-file importer for ingesting externally-produced
-> events.  See [CHANGELOG.md](CHANGELOG.md) for full version history.
+> **v0.16.0 (2026-05-11)** adds BW ASCII report ingestion to
+> `/db/import/blastware_file` — paired with **series3-watcher v1.5.0**,
+> every Blastware ACH event lands in SeismoDb with device-authoritative
+> peaks, project metadata, sensor self-check, and ZC/Time-of-Peak data,
+> without depending on the still-undecoded waveform body codec.
 > **v0.18.0 (2026-05-19)** adds Thor / Micromate Series IV ingest at
 > `/db/import/idf_file` — paired with **thor-watcher v0.3.0**, every
 > `.IDFH` / `.IDFW` event file (plus its `.txt` sidecar) lands in
 > SeismoDb the same way BW events do.  See
 > [`docs/idf_protocol_reference.md`](docs/idf_protocol_reference.md) for
 > the IDF format reference and reverse-engineering plan.
 > **v0.19.0 (2026-05-20)** separates Series III and Series IV at the
 > code level: new `micromate/` package alongside `minimateplus/`, new
 > `events.device_family` DB column ("series3" / "series4") so the UI
 > and storage layer dispatch deterministically instead of sniffing
 > filenames.  Self-applying migration backfills existing rows from the
 > binary filename extension.
 > **v0.20.0 (2026-05-28)** closes out the Event-Report PDF iteration
 > started in v0.17.x: histogram layouts render correctly against BW
 > reference PDFs, the ASCII parser handles real-world edge cases
 > (`OORANGE`, `>100 Hz`, histogram timestamps), and per-channel ZC
 > Freq is surfaced in both modals (event browser + main webapp).
 > Adds a server-wide `TZ` env var so operator-visible timestamps
 > render in local time instead of UTC.  New
 > `scripts/backfill_sidecars.py --reparse-txt` lets parser fixes be
 > applied retroactively to existing events without re-forwarding,
 > using the `.TXT` files preserved at ingest time.
 > **v0.21.0 (2026-05-29)** is the Thor / Series IV decoder release —
 > `micromate/idf_file.read_idf_file()` now decodes both IDFW
 > (waveform) and IDFH (histogram) binaries (87–99% sample fidelity
 > on quiet IDFW events; all 859 IDFH corpus files decode cleanly).
 > A new `micromate/idf_to_bw_report.py` adapter projects parsed
 > Thor reports into the BW-shaped sidecar block, so Thor events
 > flow through the existing Event Report PDF pipeline without a
 > separate renderer.  Terra-View v0.13.0 ships in parallel and
 > closes Phase 1 of the SFM integration — see its CHANGELOG.
 > See [CHANGELOG.md](CHANGELOG.md) for full version history.
 ---
@@ -28,17 +64,26 @@ over direct RS-232 or cellular modem (Sierra Wireless RV50 / RV55).
 seismo-relay/
 ├── seismo_lab.py              ← Main GUI (Bridge + Analyzer + Download + Console tabs)
 │
-├── minimateplus/              ← MiniMate Plus client library
+├── minimateplus/              ← Series III (MiniMate Plus) client library
 │   ├── transport.py           ←   SerialTransport, TcpTransport, SocketTransport
 │   ├── protocol.py            ←   DLE frame layer, SUB command dispatch
 │   ├── client.py              ←   High-level client (connect, get_events, delete_all_events, push_config, get_call_home_config, …)
 │   ├── framing.py             ←   Frame builders, DLE codec, S3FrameParser
 │   ├── models.py              ←   DeviceInfo, Event, ComplianceConfig, MonitorLogEntry, CallHomeConfig, …
 │   ├── bw_ascii_report.py     ←   Parse BW per-event ASCII reports (.TXT sidecars)
 │   ├── event_file_io.py       ←   Read BW binaries, write .sfm.json sidecars
 │   └── blastware_file.py      ←   Write events to Blastware-compatible .AB0 files
 │
 ├── micromate/                 ← Series IV (Micromate / Thor) client library (NEW v0.19)
 │   ├── models.py              ←   IdfEvent, IdfReport, IdfPeaks, IdfProjectInfo, IdfSensorCheck (mic in native dB(L))
 │   ├── idf_ascii_report.py    ←   Parse Thor .IDFW.txt / .IDFH.txt event sidecars
 │   ├── idf_file.py            ←   Binary codec for .IDFW + .IDFH (v0.21.0+)
 │   └── idf_to_bw_report.py    ←   Adapter projecting Thor IDF into the BW report shape (v0.21.0+)
 │
 ├── sfm/                       ← SFM REST API server (FastAPI, port 8200)
-│   ├── server.py              ←   Live device endpoints + DB query endpoints + caching
-│   ├── database.py            ←   SeismoDb — SQLite persistence (events, monitor_log, ach_sessions, sessions table)
+│   ├── server.py              ←   Live device endpoints + DB query + ingest endpoints + caching
+│   ├── database.py            ←   SeismoDb — SQLite persistence (events, monitor_log, ach_sessions)
 │   ├── waveform_store.py      ←   On-disk store for BW + IDF event binaries + .sfm.json sidecars
 │   └── sfm_webapp.html        ←   Embedded web UI with Call Home config tab
 │
 ├── bridges/
@@ -55,7 +100,8 @@ seismo-relay/
 │   └── frame_db.py            ←   SQLite frame database
 │
 └── docs/
-    └── instantel_protocol_reference.md  ← Reverse-engineered protocol spec
+    ├── instantel_protocol_reference.md  ← Series III protocol spec (the Rosetta Stone)
     └── idf_protocol_reference.md        ← Series IV (Thor IDF) format reference + codec RE plan
 ```
 ---
@@ -147,11 +193,23 @@ Query the SQLite database written by `ach_server.py`. All read-only except
 | Method | URL | Description |
 |--------|-----|-------------|
 | `GET` | `/db/units` | All known serials with summary stats |
-| `GET` | `/db/events` | Triggered events (filter by serial, date range, false_trigger) |
+| `GET` | `/db/events` | Triggered events (filter by serial, date range, false_trigger).  Response rows include `device_family` ("series3" / "series4") so clients dispatch on unit type without sniffing filenames. |
 | `GET` | `/db/monitor_log` | Monitoring intervals |
 | `GET` | `/db/sessions` | ACH call-home session history |
 | `PATCH` | `/db/events/{id}/false_trigger?value=true` | Flag / unflag false triggers |
 ### File ingest endpoints
 Used by watcher daemons to push field-collected event files into the SFM DB
 + waveform store.  Both accept multipart uploads of binary event files
 optionally paired with their ASCII sidecar reports; both dedup by
 `(serial, timestamp)` and UPSERT device-authoritative fields on re-import.
 | Method | URL | Description |
 |--------|-----|-------------|
 | `POST` | `/db/import/blastware_file` | Series III: `.AB0*` / `.N00` binaries + paired `_ASCII.TXT`.  Source: `series3-watcher`. |
 | `POST` | `/db/import/idf_file` | Series IV: `.IDFH` / `.IDFW` binaries + paired `.IDFW.txt` / `.IDFH.txt`.  Source: `thor-watcher`. |
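A minimal watcher-side sketch of the Series IV pairing step. It assumes the sidecar name is the binary name plus `.txt` (as in `UM11719_20231219162723.IDFW` / `.IDFW.txt`) and that the binary stem is `<serial>_<YYYYMMDDhhmmss>`. Both function names are hypothetical and are not taken from `thor-watcher`; the parsed `(serial, timestamp)` pair is the same natural key the endpoints dedup on.

```python
from __future__ import annotations

from datetime import datetime
from pathlib import Path

IDF_SUFFIXES = {".IDFW", ".IDFH"}


def pair_idf_files(paths: list[Path]) -> list[tuple[Path, Path | None]]:
    """Return (binary, sidecar or None) for every .IDFW / .IDFH in `paths`."""
    by_name = {p.name: p for p in paths}
    pairs = []
    for p in sorted(paths):
        if p.suffix.upper() in IDF_SUFFIXES:
            # Assumed naming: sidecar = binary name + ".txt"; it may be missing.
            pairs.append((p, by_name.get(p.name + ".txt")))
    return pairs


def serial_and_timestamp(binary: Path) -> tuple[str, datetime]:
    """Split 'UM11719_20231219162723.IDFW' into ('UM11719', datetime(...))."""
    serial, stamp = binary.stem.rsplit("_", 1)
    return serial, datetime.strptime(stamp, "%Y%m%d%H%M%S")
```

A binary without a sidecar still pairs (with `None`), matching the "optionally paired" behaviour described above.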
 ---
 ## minimateplus library
@@ -213,22 +271,77 @@ not per individual event).
 ---
 ## micromate library
 Series IV / Thor support, sibling to `minimateplus`.  Currently scoped to
 offline-file ingest (Thor's `.txt` sidecars and, since v0.21.0, the
 `.IDFW` / `.IDFH` binaries); live-device protocol is deferred until
 Thor's wire protocol is captured.
 ```python
 from micromate import IdfEvent, parse_idf_report
 # Parse a .IDFW.txt / .IDFH.txt sidecar (1014 example files round-trip cleanly)
 text = open("UM11719_20231219162723.IDFW.txt").read()
 report_dict = parse_idf_report(text)        # permissive dict
 # Wrap into a typed event using the device-native binary filename
 event = IdfEvent.from_report(report_dict, "UM11719_20231219162723.IDFW")
 event.serial                     # "UM11719"
 event.kind                       # "Waveform" or "Histogram"
 event.peaks.transverse_ips       # 0.0251  (in/s, native unit)
 event.peaks.mic_pspl_dbl         # 99.4    (dB(L), Thor's native mic unit — NOT psi)
 event.project_info.project       # "UPMC Presby-Loc 3-Level1-1R Elevator Rm"
 event.sensor_check.tran          # True (passed self-check)
 event.firmware_version           # "Micromate ISEE 11.0AK"
 event.calibration_text           # "November 22, 2023 by Instantel"
 # Bridge to the existing minimateplus.Event shape for the DB / sidecar paths
 # (waveform_key is a 16-byte sha256 prefix when ingesting from a binary file)
 bridged_event = event.to_minimateplus_event(waveform_key=b"\x00" * 16)
 ```
 The binary codec (`.IDFW` / `.IDFH` event files themselves) shipped in
 v0.21.0 as `micromate/idf_file.read_idf_file()`; see
 [`docs/idf_protocol_reference.md`](docs/idf_protocol_reference.md)
 for the format reference, the two observed file signatures, and the
 reverse-engineering notes behind the decoder.
 ---
 ## Database
-`ach_server.py` writes to `bridges/captures/seismo_relay.db` (SQLite, WAL mode) using the
-`SeismoDb` persistence layer. Four tables, all unit-keyed by serial number:
+`ach_server.py` and the file-ingest endpoints write to
+`bridges/captures/seismo_relay.db` (SQLite, WAL mode) via the `SeismoDb`
 persistence layer.  Three tables, all unit-keyed by serial number:
 | Table | Key | Contents |
 |-------|-----|----------|
 | `ach_sessions` | UUID | Per-call-home audit record: serial, timestamp, peer IP, events_downloaded, monitor_entries, duration_seconds |
-| `events` | UUID, UNIQUE(serial, waveform_key) | Triggered events: timestamp, Tran/Vert/Long/VectorSum/Mic PPV, project/client/operator/sensor_location strings, sample_rate, record_type, false_trigger flag |
-| `monitor_log` | UUID, UNIQUE(serial, waveform_key) | Monitoring intervals: serial, waveform_key, start_time, stop_time, duration_seconds, geo_threshold_ips |
+| `events` | UUID, UNIQUE(serial, timestamp) | Triggered events: timestamp, Tran/Vert/Long/VectorSum/Mic PPV, project/client/operator/sensor_location strings, sample_rate, record_type, false_trigger flag, **`device_family`** ("series3" / "series4"), `blastware_filename` (binary at-rest in `waveforms/`), sidecar references |
+| `monitor_log` | UUID, UNIQUE(serial, start_time) | Monitoring intervals: serial, waveform_key, start_time, stop_time, duration_seconds, geo_threshold_ips |
 | `events.false_trigger` | Boolean flag | PATCH endpoint to mark/unmark false triggers for review |
-Deduplication is by `(serial, waveform_key)` — repeat call-homes or re-runs never
-produce duplicate rows. Post-erase key reuse is handled automatically via the
-high-water mark in `ach_state.json`. Key-based state tracking allows correct
-handling of device erasures (external or post-download).
+**Deduplication is by `(serial, timestamp)`** — the device clock is the
+stable natural key.  Repeat call-homes or re-runs UPSERT the row in place,
+refreshing every device-authoritative field (peaks, project strings,
+sample_rate, file references) so the latest writer wins.  `false_trigger`
 and `device_family` are preserved across UPSERTs.  Earlier versions used
 `(serial, waveform_key)` for dedup, but the device's event-key counter
 resets to `0x01110000` after every erase, so timestamps are the correct
 dedup field.  Migration handles the transition transparently on first
 startup.
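A minimal sketch of that UPSERT rule with Python's `sqlite3`, assuming a trimmed, hypothetical `events` schema (the real `SeismoDb` table has many more columns). Only device-authoritative columns appear in `DO UPDATE SET`, so `false_trigger` and `device_family` keep whatever the first insert or a reviewer stored. `ON CONFLICT ... DO UPDATE` requires SQLite 3.24 or newer.

```python
import sqlite3
import uuid

# Trimmed, hypothetical schema: the real SeismoDb table has many more columns.
SCHEMA = """
CREATE TABLE IF NOT EXISTS events (
    id             TEXT PRIMARY KEY,
    serial         TEXT NOT NULL,
    timestamp      TEXT,
    ppv_tran_ips   REAL,
    project        TEXT,
    device_family  TEXT,
    false_trigger  INTEGER NOT NULL DEFAULT 0,
    UNIQUE (serial, timestamp)
)
"""

# Only device-authoritative columns are refreshed on conflict; id,
# device_family and false_trigger keep their stored values.
UPSERT = """
INSERT INTO events (id, serial, timestamp, ppv_tran_ips, project, device_family)
VALUES (?, ?, ?, ?, ?, ?)
ON CONFLICT (serial, timestamp) DO UPDATE SET
    ppv_tran_ips = excluded.ppv_tran_ips,
    project      = excluded.project
"""


def upsert_event(conn: sqlite3.Connection, serial: str, timestamp: str,
                 ppv_tran_ips: float, project: str, device_family: str) -> None:
    """Insert a new event or refresh the existing (serial, timestamp) row."""
    conn.execute(UPSERT, (str(uuid.uuid4()), serial, timestamp,
                          ppv_tran_ips, project, device_family))
```

On a re-import, the second call for the same `(serial, timestamp)` overwrites the peak and project in place, while a `false_trigger = 1` set during review survives.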
 **`device_family` (added v0.19.0)** discriminates Series III from Series
 IV at the SQL level.  Set by every import path; the UI dispatches on it
 to render mic units correctly (Series III: psi → dBL conversion; Series
 IV: native dBL passthrough).  Existing rows are backfilled at first
 startup of v0.19.0+ by sniffing the binary filename extension.
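A sketch of both rules, assuming Thor binaries are exactly the `.IDFW` / `.IDFH` extensions (everything else treated as Series III) and the usual 20 µPa reference pressure for dB(L). The function names are hypothetical, and the UI's real conversion may use different constants or rounding.

```python
import math
from pathlib import PurePath

PA_PER_PSI = 6894.757293168   # pascals in one psi
P_REF_PA = 20e-6              # 20 µPa reference pressure (assumed for dB(L))
SERIES4_SUFFIXES = {".IDFW", ".IDFH"}


def family_from_filename(binary_filename: str) -> str:
    """Backfill rule: Thor IDF binaries are Series IV, anything else Series III."""
    suffix = PurePath(binary_filename).suffix.upper()
    return "series4" if suffix in SERIES4_SUFFIXES else "series3"


def mic_dbl(mic_value: float, device_family: str) -> float:
    """Return a mic peak in dB(L) for display, whatever unit the row stores."""
    if device_family == "series4":
        return mic_value  # Thor reports native dB(L): pass through unchanged
    # Series III stores psi: convert to pascals, then to dB re 20 µPa.
    return 20.0 * math.log10(mic_value * PA_PER_PSI / P_REF_PA)
```

Dispatching on the stored column rather than the filename is what keeps a Thor value such as `99.4` from being run through the psi formula a second time.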
 The on-disk waveform store lives at `bridges/captures/waveforms/<serial>/`
 and holds the original event binaries (BW `.AB0*` / `.N00` for Series III,
 `.IDFH` / `.IDFW` for Series IV) plus their `.sfm.json` review/metadata
 sidecars.  Series III events also produce `.a5.pkl` source-frame pickles
 and `.h5` clean-waveform exports; since v0.21.0, Series IV events also get
 `.h5` exports built from the decoded IDF binary.
 ---
@@ -310,18 +423,27 @@ Use **com0com** or **VSPD** to create the virtual COM pair on Windows.
 ## Key Features
-**Device support:**
-- [x] Full read/write/erase pipelines
+**Series III (MiniMate Plus) device support:**
+- [x] Full read/write/erase pipelines over RS-232 or TCP/cellular
 - [x] Compliance config (recording mode, sample rate, histogram interval, geo sensitivity, project strings)
 - [x] Auto Call Home config (read/write ACH settings, dial string, time slots, retries)
 - [x] Monitor control (start/stop, status polling, battery/memory)
 - [x] Monitor log entries (continuous monitoring intervals without full waveform download)
 - [x] Blastware file ingest at `/db/import/blastware_file` (paired with `series3-watcher`)
 **Series IV (Micromate / Thor) device support:**
 - [x] Thor IDF file ingest at `/db/import/idf_file` (paired with `thor-watcher`, v0.18.0+)
 - [x] Native `IdfEvent` / `IdfReport` typed models — mic in dB(L), full title strings, sensor self-check, calibration, firmware version
 - [x] Parser verified against 1,014 paired `.txt` sidecars in `thor-watcher/example-data/`
 - [x] Binary `.IDFW` / `.IDFH` codec — ✅ v0.21.0.  IDFW reuses `decode_waveform_v2()` on the waveform body (87–99% sample fidelity on quiet events); the body offset was hardcoded to `0x0f1f` in v0.21.0 and is located per file by trial decode since v0.21.1.  IDFH has a dedicated segment-based decoder (all 859 corpus files decode, 181,071 intervals total).  See `micromate/idf_file.py` + `docs/idf_protocol_reference.md`.
 - [ ] Live-device protocol — pending Thor wire-protocol capture (TCP / RS-232)
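The v0.21.1 CHANGELOG entry replaces the fixed `0x0f1f` offset with a search: try every `00 02 00` magic past `0x0E00` and keep the offset that decodes the most samples. A minimal sketch of that search, where `decode` is a hypothetical stand-in for the real body decoder (assumed to return a sample sequence and to raise `ValueError` on garbage). This is not the code of `micromate.idf_file._find_waveform_body_offset()`.

```python
from typing import Callable, Optional, Sequence

BODY_MAGIC = b"\x00\x02\x00"   # body magic per the v0.21.1 notes
SCAN_FLOOR = 0x0E00            # candidates before this are ignored


def find_body_offset(
    data: bytes, decode: Callable[[bytes], Sequence[float]]
) -> Optional[int]:
    """Return the magic position whose trial decode yields the most samples.

    `decode` is a hypothetical stand-in for the real body decoder.  Ties go
    to the earliest offset; None means no candidate decoded anything.
    """
    best_offset: Optional[int] = None
    best_count = 0
    pos = data.find(BODY_MAGIC, SCAN_FLOOR)
    while pos != -1:
        try:
            count = len(decode(data[pos:]))
        except ValueError:
            count = 0  # coincidental magic that does not decode
        if count > best_count:
            best_offset, best_count = pos, count
        pos = data.find(BODY_MAGIC, pos + 1)  # allow overlapping matches
    return best_offset
```

Scoring by sample count is what rejects the coincidental early match that used to yield only the 2-byte Tran preamble.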
 **Data persistence:**
-- [x] SQLite database (`seismo_relay.db`) with 4 tables: ach_sessions, events, monitor_log, plus false_trigger flag
-- [x] Deduplication by waveform key (handles re-runs and repeat call-homes)
-- [x] Post-erase key-reuse detection (tracks high-water mark)
-- [x] Session state (`ach_state.json`) with downloaded keys and max key
+- [x] SQLite database (`seismo_relay.db`) with `events`, `monitor_log`, `ach_sessions` tables
+- [x] Per-row `device_family` column ("series3" / "series4") for clean UI / unit-of-measurement dispatch (v0.19.0+)
+- [x] Deduplication by `(serial, timestamp)` — natural key handles post-erase counter resets
+- [x] UPSERT on re-import refreshes every device-authoritative field (peaks, project, sample_rate); preserves operator review state (`false_trigger`)
 - [x] Post-erase key-reuse detection (tracks high-water mark in `ach_state.json`)
 **REST API:**
 - [x] Live device endpoints with in-memory caching (`_LiveCache`)
@@ -329,6 +451,7 @@ Use **com0com** or **VSPD** to create the virtual COM pair on Windows.
 - [x] DB query endpoints (units, events, monitor_log, sessions, false_trigger PATCH)
 - [x] Call Home config read/write endpoints
 - [x] Blastware file download endpoint (`/device/event/{index}/blastware_file`)
 - [x] Import endpoints for both device families (`/db/import/blastware_file`, `/db/import/idf_file`)
 **File output (v0.7+, byte-perfect as of v0.14.3):**
 - [x] Blastware-compatible `.AB0` / `.G10` file generation (waveform + metadata)
@@ -356,10 +479,113 @@ Use **com0com** or **VSPD** to create the virtual COM pair on Windows.
 ## Roadmap (Future)
-- [ ] Verify 30-sec event download — body may exceed `0xFFFF` and force the device into a different `end_key` encoding (none of 2/3/10-sec test cases hit this boundary)
-- [ ] Terra-view integration — seismo-relay router, unit detail page, VISON-style event listing
-- [ ] Vibration summary reports — highest legit PPV per project → Word doc (false trigger filtering first)
-- [ ] Compliance config encoder — build raw write payloads from a `ComplianceConfig` object
-- [ ] Modem manager — push RV50/RV55 configs via Sierra Wireless API
-- [ ] Histogram mode recording support (5A stream analysis for mode 0x03)
-- [ ] Call Home dial_string write support (requires DLE escaping for embedded control characters)
+### Strategic direction — where this is going
+
+seismo-relay is being built as a **suite of cooperating components**
+that together replace and improve on Blastware's role.  Three logical
+tiers:
+
+1. **SFM** (device-side) — owns the active connection to a physical
   unit.  Today: `minimateplus/`, `/device/*` HTTP endpoints,
   `seismo_lab.py`.  Future: live Thor / Micromate support.
 2. **SDM** (data-side) — owns the database, waveform store, ingest
   pipelines, and the read-API that Terra-View consumes.  Today this
   code lives under `sfm/` for historical reasons; the role has
   migrated and the eventual rename is on the long-tail cleanup list.
 3. **Codec library** — pure data-interpretation: `minimateplus/*_codec.py`,
   `bw_ascii_report.py`, `micromate/idf_*.py`.  Used by both SFM and
   SDM, depends on neither.
 Terra-View is downstream of SDM for fleet listings, event detail, etc.
 The long-term vision adds a **second link** from Terra-View → SFM for
 direct device interaction (see below).
 The codec work in this repo isn't trying to replace BW's network
 layer — BW's ACH file forwarding and Thor's IDF call-home are
 battle-tested.  The value is in the receiving and processing side: turn
 the stream of binary+ASCII pairs into something users can search,
 filter, alert on, and report from.
 ### Terra-View ↔ SFM device control (the long-term vision)
 Today Terra-View only reads from SDM (event listings, dashboards,
 project reports).  When a unit goes missing — operator notices in the
 Terra-View dashboard — there's no way to *do* anything from the UI.
 The path of least resistance is to RDP into a Windows box and open
 Blastware, which defeats the purpose of having Terra-View.
 Target experience:
 - Operator notices a unit in Terra-View dashboard hasn't called in.
 - Clicks unit detail → "Connect to Device" button.
 - Terra-View opens an embedded view (modal or side-panel) that talks
  to SFM's `/device/*` endpoints over the network.
 - Live view: device clock, battery, memory, current monitor status.
 - Actions: start/stop monitoring, push compliance config changes, pull
  fresh events, run a sensor self-check, change call-home settings.
 - Audit log: every connect / action recorded in SDM for the unit
  history.
 Implementation steps (concrete):
 - [ ] **SFM authentication & authorization layer.**  Today `/device/*`
      endpoints are unauthenticated — anyone on the network can call
      them.  Need at minimum a token-based auth, ideally with a "who
      can connect to which units" mapping.  Hard prerequisite for
      letting Terra-View users into the control surface.
 - [ ] **Terra-View "Connect to Device" entry point** on the unit
      detail page.  Renders only when unit has connection info on file
      and the user has permission.
 - [ ] **Embedded live-monitor view** in Terra-View — equivalent to
      `seismo_lab.py`'s Bridge tab, but in the browser.  Polls SFM's
      `/device/monitor/status` on an interval; sends start/stop via
      `/device/monitor/{start,stop}`.
 - [ ] **Action history** — every connect / push / action call records
      a row in `unit_history`, viewable on the unit detail page.
 - [ ] **Series IV live-device support in SFM** — currently `/device/*`
      only supports MiniMate Plus.  Blocks "Connect to Device" for
      Thor units until done.  Depends on Thor wire-protocol capture
      and a `micromate/` parallel of the `minimateplus/` modules.
 ### High-impact (unblocks product features)
 - [ ] **Series III waveform body codec reverse-engineering.**  The 5A bulk-stream body is some kind of compressed/encoded format (not raw int16 LE as previously assumed — see §7.6.1 retraction in `docs/instantel_protocol_reference.md`).  Structural framing is ~50% decoded on branch `claude/codec-re-cBGNe` (tagged-block walker, segment counters); per-byte sample mapping is still open.  Until this lands, the in-app waveform viewer renders garbage and BW-import peak values fall back to `_peaks_from_samples()` saturation noise.  Workaround: pair every BW-imported event with its `_ASCII.TXT` so the device-authoritative peaks land in the DB regardless of codec.
 - [x] **Series IV (Thor IDF) binary codec reverse-engineering.** ✅ v0.21.0 — `micromate/idf_file.read_idf_file()` decodes both IDFW (waveform body reusing `decode_waveform_v2()`, hardcoded at offset `0x0f1f` in v0.21.0 and located per file by trial decode since v0.21.1; 87–99% sample fidelity on quiet events) and IDFH (dedicated segment-based decoder: all 859 corpus files decode, 181,071 intervals, peaks within ~1.8% of sidecar values).  `WaveformStore.save_imported_idf` now also projects parsed Thor data into a `bw_report` block via `micromate/idf_to_bw_report.py` so Thor events render in the existing Event Report PDF pipeline without a separate renderer.
 - [ ] **In-app waveform viewer accuracy.**  Depends on Series III codec decode.  Plot.v1 JSON pipeline + viewer skeleton already exist; will start showing real waveforms automatically once `_decode_a5_waveform` produces correct samples.  Series IV waveforms are already fed by the IDF codec (v0.21.0+).
 - [ ] **Series IV live-device support.**  Now that the IDF binary is decoded (v0.21.0), extend `micromate/` with `transport.py` / `framing.py` / `protocol.py` / `client.py` mirroring the `minimateplus/` package layout — depends on capturing Thor's wire protocol (TCP / RS-232 captures TBD).
 - [ ] **Terra-view integration** — seismo-relay router, unit detail page, VISON-style event listing.
 - [ ] **Vibration summary reports** — highest legit PPV per project → Word doc (false-trigger filtering first).
 ### BW ASCII report parser enhancements (built in v0.16.0)
 - [x] **PPV field misses on certain TXT formats.** ✅ v0.20.0 — root cause was the `OORANGE` (Out Of Range) saturation marker that BW writes when a channel exceeds its full-scale; `_parse_number()` returned None for the non-numeric value.  Parser now substitutes `geo_range_ips` as a lower bound + sets `ppv_saturated` flag.  All 5 prod events (T190LD5Q.LK0W, T438L713.RY0W, K557L3YM.OE0W, + 2 others) now parse cleanly.
 - [x] **Histogram-specific structural fields.** ✅ v0.20.0 — `Histogram Start/Stop Time+Date`, `Number of Intervals`, `Interval Size`, per-channel `Peak Time` + `Peak Date`, and `Peak Vector Sum Date` all parse now.  Land in the sidecar's `bw_report.histogram` block.
 - [ ] **Histogram interval bin-table parsing.**  Trailing 792-row table (per-interval Peak/Freq per channel + MicL) in histogram TXTs is unparsed.  Probably too big for the sidecar JSON; may want a separate `.histogram.h5` companion file.
 - [x] **`>100 Hz` value parsing.** ✅ v0.20.0 — parser now mirrors the OORANGE pattern: stores 100.0 on `zc_freq_hz` + sets `zc_freq_above_range` flag.  PDF + both modals render `>100 Hz` instead of `—`.
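A minimal sketch of the two saturation-marker rules, assuming the raw field arrives as a string and that a `>` prefix is the only above-range form. The function names and return dicts are hypothetical; only the flag names `ppv_saturated` and `zc_freq_above_range` come from the notes above, and the real logic lives in `minimateplus/bw_ascii_report.py`.

```python
from typing import Optional


def _number(raw: str) -> Optional[float]:
    """Parse the leading number of a field like '12.5 Hz'; None if absent."""
    try:
        return float(raw.split()[0])
    except (ValueError, IndexError):
        return None


def parse_ppv(raw: str, geo_range_ips: float) -> dict:
    raw = raw.strip()
    if raw.upper() == "OORANGE":
        # Channel exceeded full scale: the geo range is a lower bound on PPV.
        return {"ppv_ips": geo_range_ips, "ppv_saturated": True}
    return {"ppv_ips": _number(raw), "ppv_saturated": False}


def parse_zc_freq(raw: str) -> dict:
    raw = raw.strip()
    if raw.startswith(">"):
        # ">100 Hz": keep the bound, flag it so renderers print ">100 Hz".
        return {"zc_freq_hz": _number(raw[1:]), "zc_freq_above_range": True}
    return {"zc_freq_hz": _number(raw), "zc_freq_above_range": False}
```

Storing the bound plus a flag, rather than `None`, keeps the numeric column sortable while still letting the PDF and modals render the marker text.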
 ### Ingestion gaps
 - [ ] **MLG forwarding.**  `series3-watcher` forwards event binaries + their `_ASCII.TXT` reports, but skips `.MLG` per-unit monitor log files entirely.  Adding a `POST /db/import/mlg_file` endpoint + watcher scan path would populate `monitor_log` for non-ACH-routed units (coverage queries, "was this unit monitoring on date X" lookups).
 - [ ] **0C-record raw bytes persistence in the sidecar.**  Currently on branch `claude/codec-re-cBGNe` as commit `a187124`; cherry-pick if useful as a standalone fix.  Preserves the 210-byte 0C record under `extensions.raw_records.waveform_record_b64` so future field-offset analysis (Peak Acceleration / Time of Peak / etc. — the fields BW computes client-side from samples) can run offline.
 ### Operational
 - [ ] **`series3-watcher` file archive manager** — 90-day-old events moved to `<watch_folder>_archive/<year>/<month>/` subfolders.  Plan drafted in `claude/codec-re-cBGNe`'s plan-mode session; awaiting a 5-minute test on whether Blastware UI walks subfolders before any code lands (determines layout: in-place subfolders vs sibling archive).
 - [ ] **Compliance config encoder** — build raw write payloads from a `ComplianceConfig` object.
 - [ ] **Modem manager** — push RV50/RV55 configs via Sierra Wireless API.
 - [ ] **Call Home dial_string write support** (requires DLE escaping for embedded control characters).
 - [ ] **Histogram mode recording support** (5A stream analysis for mode 0x03 — separate from histogram ASCII parsing above).
 ### Test coverage
 - [ ] Verify 30-sec event download — body may exceed `0xFFFF` and force the device into a different `end_key` encoding (none of the 2/3/10-sec test cases hit this boundary).
 - [ ] Histogram mode (0x03) write via SFM — confirmed working for Single Shot / Continuous / Histogram+Continuous; Histogram (0x03) needs a live test from a non-Histogram starting state.
 ### Lower-priority cleanups
 - [ ] Compliance write anchor-9 cleanup — when changing recording_mode via SFM, a spurious `0x10` may persist after Histogram→other mode transitions.  Doesn't affect device operation but differs from BW's byte-perfect output.
 - [ ] Locate "Sensor Check" byte in compliance config (need capture with Disabled vs Before-monitoring).
 - [ ] Call Home — map time slots 3/4 offsets; confirm `modem_power_relay_enabled`.
 - [ ] RV55 DCD/DTR — newer RV55 firmware doesn't assert DCD by default; units don't resume monitoring after call-home disconnect (`--restart-monitoring` flag deferred).
 - [ ] **NULL-timestamp duplicate-row dedup.**  A small handful of events (2 known on prod as of 2026-05-22) have `events.timestamp IS NULL` because the codec couldn't extract a timestamp from the binary footer.  The `UNIQUE(serial, timestamp)` constraint doesn't fire on `NULL` (SQL semantics: `NULL ≠ NULL`), so every `--force` backfill INSERTs a new row instead of UPSERTing the existing one.  Cleanup: a one-shot SQL query that keeps only the newest row per `(serial, blastware_filename)` and deletes the rest.  Longer-term: extend the unique key to `(serial, COALESCE(timestamp, blastware_filename))` or reject inserts with NULL timestamp.
 - [ ] **Histogram body sub-format with `byte[5] != 0`.**  ~3 events on prod (including `T190LD5Q.LD0H` and `O121L4L1.GU0H`) use a histogram body the walker doesn't recognize — the first block has `byte[5] = 0x01` or `0x07` instead of `0x00`, and the entire body lacks the `1e 0a 00 00` tail signature.  Codec returns 0 valid blocks; their DB PVS comes from the bw_report ASCII overlay (which BW computed from the same binary, so the DB columns are correct).  Only the `.h5` waveform plot is empty.  Cracking the sub-format would unlock the plot.  Needs binary+ASCII pairs from a few `byte[5]!=0` events; same RE approach as the K558 case.
 - [ ] **Histogram body sub-format with `byte[5] == 0x00` but undecodable.**  Observed 2026-05-28 on BE17353 (S353) events: `S353L4H2.FZ0H`, `S353L4H2.P00H`, `S353L4H3.7O0H`, `S353L4H3.E10H`.  Body starts `00 00 00 01 0a 00 XX 00 ...` which LOOKS like a valid histogram block header (marker 0x000a at byte[4:6] ✓, byte[5]=0x00 normal-format ✓), but the walker finds zero data blocks across the whole body.  Likely an extra header before the block stream OR a different tail signature than `1e 0a 00 00`.  Smaller body lengths (1900-2100 bytes) suggest these may be short-recording histogram variants.  Same operational impact as the byte[5]!=0 case: event ingests cleanly, DB peaks correct via bw_report overlay, only the chart is empty.  Worth dumping a hex view of one body to diagnose.
 - [ ] **Sensor-check waveform extraction from the BW binary.**  BW's Event Report PDFs include a narrow panel on the right side of the waveform plot showing each channel's response to the sensor self-check signal (a damped sinusoid for geo, sawtooth-at-test-freq for mic).  Our parser captures the test RESULTS (`test_freq_hz`, `test_ratio`, `test_amplitude_mv`, `test_results` pass/fail) and the PDF + modal display them as text — but BW's per-sample sensor-check waveform isn't accessible to us today.  Two paths to add it:  (a) RE the binary to find where the sensor-check samples are stored — could be a section before STRT, after the footer, or in a separate sub-record; protocol reference doesn't currently mention it.  (b) If samples aren't in the binary, synthesize a representative waveform from the test parameters (damped sinusoid at `test_freq_hz` with damping from `test_ratio`).  Path (a) is the honest answer; path (b) is decorative.  Until either lands, the text-only sensor-check display in the report is fine.
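For the NULL-timestamp dedup item, a minimal `sqlite3` sketch of the one-shot cleanup plus the `COALESCE` key proposed as the longer-term fix. It assumes a trimmed schema and that the highest `rowid` is the newest row (true only if rows are appended, never reused); SQLite accepts `COALESCE` in an index expression because it is deterministic.

```python
import sqlite3

# Keep only the newest NULL-timestamp row per (serial, blastware_filename).
CLEANUP_SQL = """
DELETE FROM events
WHERE timestamp IS NULL
  AND rowid NOT IN (
      SELECT MAX(rowid) FROM events
      WHERE timestamp IS NULL
      GROUP BY serial, blastware_filename
  )
"""

# Longer-term option: NULL timestamps fall back to the filename as the key.
NATURAL_KEY_SQL = """
CREATE UNIQUE INDEX IF NOT EXISTS events_natural_key
ON events (serial, COALESCE(timestamp, blastware_filename))
"""


def dedup_null_timestamps(conn: sqlite3.Connection) -> int:
    """Delete duplicate NULL-timestamp rows, add the natural-key index,
    and return how many rows were deleted."""
    deleted = conn.execute(CLEANUP_SQL).rowcount
    conn.execute(NATURAL_KEY_SQL)  # would fail if duplicates were still present
    return deleted


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (serial TEXT, timestamp TEXT, "
                 "blastware_filename TEXT, UNIQUE (serial, timestamp))")
    # Three --force backfills of one file: UNIQUE never fires (NULL != NULL).
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)",
                     [("SERIAL1", None, "EVENT1.AB0")] * 3)
    print("deleted:", dedup_null_timestamps(conn))
    try:
        conn.execute("INSERT INTO events VALUES ('SERIAL1', NULL, 'EVENT1.AB0')")
    except sqlite3.IntegrityError as exc:
        print("rejected by natural key:", exc)
```

`SERIAL1` and `EVENT1.AB0` are placeholder values, not rows from prod.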
@@ -0,0 +1,66 @@
 # analysis/ — exploratory scripts for waveform-body RE
 **These are scratch.** Run them, read them, copy them, but don't trust
 them as documentation.  When a finding is verified it gets promoted
 to `minimateplus/waveform_codec.py` and `tests/test_waveform_codec.py`;
 when it's wrong it stays here as a fossil.
 Authoritative status lives in:
 - `docs/waveform_codec_re_status.md` (current truth, working note)
 - `minimateplus/waveform_codec.py` (verified implementation + docstring)
 - `tests/test_waveform_codec.py` (regression locks against fixtures)
 ---
 ## Still useful
 | File | What it does |
 |---|---|
 | `load_bundle.py` | Fixture loader.  Parses BW binary + ASCII TXT into a `Bundle` dataclass with samples, metadata, body bytes.  Used by most other scripts here. |
 | `verify_tran.py` | Verifies `decode_tran_initial` against fixture ground truth across all events.  Useful when you change the decoder and want a quick sanity check. |
 | `inspect_5_11.py` | Inspects the 5-11-26 high-amplitude bundle's body structure, prints metadata, peaks, and block counts. |
 | `walk_5_11.py` | Walks blocks for the 5-11-26 bundle and prints offset/tag/length/data. |
 | `seg1_blocks.py` | Dumps all blocks in segment 1 of each event.  The starting point for cracking multi-segment Tran continuation. |
 | `full_tran.py` | Multi-segment Tran decoder attempt (broken — diverges at sample ~512).  Useful as a starting scaffold for the next experiment. |
 | `multi_segment.py` | Earlier multi-segment attempt with different segment-header consumption strategies.  Records what didn't work. |
 | `test_rle.py` | Tests `00 NN` interpretation as zero-RLE with different divisor values.  Documents how the RLE rule was confirmed. |
 ## Superseded — keep for archaeology
 | File | Superseded by |
 |---|---|
 | `walk_v2.py` … `walk_v5.py` | `walk_v6.py` and ultimately `minimateplus/waveform_codec.walk_body`.  Each version represents one round of refinement.  Don't read in isolation — read the diff between them to see what was learned. |
 | `walk_chunks.py` | `walk_v6.py` / production walker |
 | `decode_v1.py` | First naive decoder attempt.  Wrong but readable. |
 ## Pure exploration — read if curious
 | File | What it explored |
 |---|---|
 | `inspect_body.py` | Byte-frequency stats per event.  Established that bytes 0x00 / 0x10 dominate. |
 | `find_blocks.py` | Searched for repeating 2-byte tag patterns. |
 | `find_signal_runs.py` | Searched for stretches of bytes that "look like a smooth signal" (small inter-byte deltas).  Found the `20 NN` literal blocks. |
 | `dump_head.py`, `dump_trailer.py`, `dump_around.py` | Hex dumpers at various body positions. |
 | `compare_cd.py` | Byte-diff between event-c and event-d (same length, similar signal).  Used to identify structural vs data bytes. |
 | `brute_force.py` | Tested 192 combinations (24 channel permutations × nibble-order × sign-convention × init-from-header, two options each) on the quiet bundle.  All failed because the quiet bundle had T[0]=T[1]=0, making the preamble undetectable. |
 | `try_nibbles.py`, `try_layouts.py` | Earlier channel-interleaving hypotheses.  All wrong. |
 | `test_tran_continue.py` | Test of "Tran continues uninterrupted across `30 04` blocks" hypothesis.  Disproven. |
 ---
 ## Adding new scripts
 If you're picking up the codec work, feel free to add new scripts here.
 Suggested conventions:
 - Start the filename with what you're testing: `test_<hypothesis>.py`,
  `verify_<piece>.py`, `inspect_<region>.py`.
 - Print enough output that the reader can see exactly which events
  match / diverge and where.
 - When a finding is solid, move the verified logic to
  `minimateplus/waveform_codec.py` and add a regression test in
  `tests/test_waveform_codec.py` — don't leave the truth only in
  this directory.
 - If a script is fully superseded, leave it in place (don't delete) —
  the fossil record is useful when re-evaluating hypotheses later.
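As an example of the "print where it diverges" convention, a small hypothetical helper a new `verify_<piece>.py` could use. It is not in `analysis/` today, and the ±2-sample context window is an arbitrary choice.

```python
from typing import Optional, Sequence


def first_divergence(pred: Sequence[int], truth: Sequence[int],
                     tol: int = 0) -> Optional[int]:
    """Index of the first sample where |pred - truth| > tol, or of the first
    missing sample if one side is shorter; None when they fully agree."""
    for i, (p, t) in enumerate(zip(pred, truth)):
        if abs(p - t) > tol:
            return i
    if len(pred) != len(truth):
        return min(len(pred), len(truth))
    return None


def report(event: str, channel: str, pred: Sequence[int],
           truth: Sequence[int], tol: int = 0) -> str:
    """One-line MATCH / DIVERGES summary with a little context around the miss."""
    i = first_divergence(pred, truth, tol)
    if i is None:
        return f"{event} {channel}: MATCH ({len(truth)} samples)"
    window = slice(max(0, i - 2), i + 3)
    return (f"{event} {channel}: DIVERGES at sample {i}  "
            f"pred={list(pred[window])}  truth={list(truth[window])}")
```

Printing one such line per event and channel makes a regression like "`full_tran.py` diverges at sample ~512" visible at a glance.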
@@ -0,0 +1,93 @@
 """Brute-force test channel permutations / nibble orders on event-d (simplest signal)."""
 import sys
 import itertools
 sys.path.insert(0, ".")
 from analysis.load_bundle import load_bundle
 from minimateplus.waveform_codec import walk_body
 def s4(n):
    return n if n < 8 else n - 16
 def decode(body, channel_perm, nibble_order, sign_mode, init_from_header):
    """Try one decoder configuration on event-d. Returns first 8 cumulative samples per channel."""
    blocks = walk_body(body)
    # Initial values from bytes [4:7] if init_from_header else 0
    if init_from_header:
        init = [body[4] if body[4] < 128 else body[4] - 256,
                body[5] if body[5] < 128 else body[5] - 256,
                body[6] if body[6] < 128 else body[6] - 256,
                0]
    else:
        init = [0, 0, 0, 0]
    cur = list(init)
    out = [[init[0]], [init[1]], [init[2]], [init[3]]]  # sample 0 = init
    nibble_idx = 0  # within delta stream; channel = channel_perm[nibble_idx % 4]
    # Walk only the 10 NN data blocks
    for blk in blocks:
        if blk.tag_hi != 0x10:
            continue
        for byte in blk.data:
            if nibble_order == 'high_first':
                nib1, nib2 = (byte >> 4) & 0xF, byte & 0xF
            else:
                nib1, nib2 = byte & 0xF, (byte >> 4) & 0xF
            for nib in (nib1, nib2):
                if sign_mode == 'signed':
                    delta = s4(nib)
                else:
                    delta = nib
                ch = channel_perm[nibble_idx % 4]
                cur[ch] += delta
                if (nibble_idx + 1) % 4 == 0:
                    out[0].append(cur[0])
                    out[1].append(cur[1])
                    out[2].append(cur[2])
                    out[3].append(cur[3])
                nibble_idx += 1
                if len(out[0]) >= 16:
                    return out
    return out
 def best_match(pred, truth, n=10):
    """Sum of squared differences in first n samples."""
    n = min(n, len(pred), len(truth))
    return sum((pred[i] - truth[i])**2 for i in range(n))
 def main():
    b = load_bundle("event-d")
    # truth in 16-count units
    tr = {ch: [round(v * 200) for v in b.samples[ch]] for ch in ("Tran", "Vert", "Long")}
    print("Truth event-d first 10 samples:")
    for ch in ("Tran", "Vert", "Long"):
        print(f"  {ch}: {tr[ch][:10]}")
    # Test 192 combinations: 24 channel permutations x 2 nibble orders x 2 sign modes x 2 init modes
    best = []
    for perm in itertools.permutations([0, 1, 2, 3]):
        for nibble_order in ('high_first', 'low_first'):
            for sign in ('signed', 'unsigned'):
                for init_h in (False, True):
                    decoded = decode(b.body, perm, nibble_order, sign, init_h)
                    # Score: sum of squared errors over the first 10 T, V, L samples
                    score = sum(
                        best_match(decoded[i], tr[ch], n=10)
                        for i, ch in enumerate(("Tran", "Vert", "Long"))
                    )
                    label = f"perm={perm} nib={nibble_order[:1]} sign={sign[:3]} init={init_h}"
                    best.append((score, label, decoded))
    best.sort(key=lambda x: x[0])
    print(f"\nTop 10 configurations:")
    for s, lbl, dec in best[:10]:
        print(f"  score={s:>5}  {lbl}  T={dec[0][:8]}  V={dec[1][:8]}  L={dec[2][:8]}")
 if __name__ == "__main__":
    main()
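The search above varies two low-level primitives: 4-bit two's-complement sign extension and the order in which a byte's two nibbles are read. A minimal standalone sketch of both, plus the size of the search space main() enumerates (the helper name `nibbles` is illustrative, not from the codebase):

```python
import itertools


def s4(n: int) -> int:
    """Sign-extend a 4-bit value (0..15) to -8..7."""
    return n if n < 8 else n - 16


def nibbles(byte: int, order: str = "high_first") -> tuple[int, int]:
    """Split one byte into its two nibbles in the requested order."""
    hi, lo = (byte >> 4) & 0xF, byte & 0xF
    return (hi, lo) if order == "high_first" else (lo, hi)


# Same axes as main(): channel permutation x nibble order x sign mode x init mode.
configs = list(itertools.product(
    itertools.permutations(range(4)),
    ("high_first", "low_first"),
    ("signed", "unsigned"),
    (False, True),
))

print(len(configs))                                  # 192
print([s4(n) for n in nibbles(0xF1)])                # [-1, 1]
print([s4(n) for n in nibbles(0xF1, "low_first")])   # [1, -1]
```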
@@ -0,0 +1,42 @@
 """Compare event-c and event-d (same N_samples) to find header vs data bytes."""
 import sys
 sys.path.insert(0, ".")
 from analysis.load_bundle import load_bundle
 def main():
    bc = load_bundle("event-c")
    bd = load_bundle("event-d")
    # Compare prefixes
    nc, nd = len(bc.body), len(bd.body)
    n = min(nc, nd)
    diffs = []
    for i in range(n):
        if bc.body[i] != bd.body[i]:
            diffs.append(i)
    print(f"event-c body={nc}, event-d body={nd}")
    print(f"Total diffs (first {n}): {len(diffs)}")
    # Show common prefix
    same_prefix = 0
    for i in range(n):
        if bc.body[i] == bd.body[i]:
            same_prefix += 1
        else:
            break
    print(f"Common prefix length: {same_prefix}")
    print(f"event-c prefix: {bc.body[:same_prefix].hex(' ')}")
    # First few offsets where the two bodies differ
    print(f"\nFirst 32 diff positions: {diffs[:32]}")
    # Side-by-side "diff fingerprint" of the first 100 bytes ('*' marks a difference)
    print("\n  pos    c     d")
    for i in range(min(100, nc)):
        bd_b = bd.body[i] if i < nd else None
        marker = " " if bc.body[i] == bd_b else "*"
        if bd_b is None:
            print(f"  {i:>3}  {bc.body[i]:02x}{marker}")
        else:
            print(f"  {i:>3}  {bc.body[i]:02x}{marker}  {bd_b:02x}")
 if __name__ == "__main__":
    main()
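Both comparisons in main() reduce to short expressions over `zip()`. A standalone sketch on made-up byte strings (not fixture data); `diff_positions` and `common_prefix_len` are hypothetical names:

```python
from itertools import takewhile


def diff_positions(a: bytes, b: bytes) -> list[int]:
    """Offsets (within the shorter buffer) where the two buffers differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


def common_prefix_len(a: bytes, b: bytes) -> int:
    """Number of leading bytes the two buffers share."""
    return sum(1 for _ in takewhile(lambda p: p[0] == p[1], zip(a, b)))


c = bytes.fromhex("0002001020ff")
d = bytes.fromhex("0002001021fe00")
print(common_prefix_len(c, d))  # 4
print(diff_positions(c, d))     # [4, 5]
```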
@@ -0,0 +1,99 @@
 """
 Decoder v1: nibble-pair signed deltas in 10 NN blocks, 4-channel round-robin.
 """
 import sys
 sys.path.insert(0, ".")
 from analysis.load_bundle import load_bundle
 def s4(n):
    return n if n < 8 else n - 16
 def walk_blocks(body, start):
    i = start
    blocks = []
    while i + 1 < len(body):
        t0, t1 = body[i], body[i + 1]
        if t0 == 0x10 and t1 % 4 == 0 and 0 < t1 <= 0xFC:
            length = t1 // 2 + 2
            data = bytes(body[i + 2 : i + length])
            blocks.append(("10", t1, data))
            i += length
        elif t0 == 0x20 and t1 % 4 == 0 and 0 < t1 <= 0xFC:
            length = t1 + 2
            data = bytes(body[i + 2 : i + length])
            blocks.append(("20", t1, data))
            i += length
        elif t0 == 0x00 and t1 % 4 == 0:
            blocks.append(("00", t1, b""))
            i += 2
        elif t0 == 0x30 and t1 % 4 == 0 and 0 < t1 <= 0x10:
            length = t1 * 4
            data = bytes(body[i + 2 : i + length])
            blocks.append(("30", t1, data))
            i += length
        elif t0 == 0x40 and t1 == 0x02:
            length = 20
            data = bytes(body[i + 2 : i + length])
            blocks.append(("40", t1, data))
            i += length
        else:
            blocks.append(("??", t0, bytes(body[i:i+8])))
            break
    return blocks
 def decode_v1(body, start, n_samples):
    """Decode by accumulating nibble-pair deltas from all 10 NN blocks."""
    blocks = walk_blocks(body, start)
    # 4 channels: T, V, L, M
    cur = [0, 0, 0, 0]
    out = [[], [], [], []]
    sample_index = 0  # values emitted so far; channel = sample_index % 4
    for typ, NN, data in blocks:
        if typ == "10":
            # 2 nibbles per byte, round-robin TVLM
            for byte in data:
                for nib in ((byte >> 4) & 0xF, byte & 0xF):
                    ch = sample_index % 4
                    cur[ch] += s4(nib)
                    out[ch].append(cur[ch])
                    # Advance one channel per nibble: T -> V -> L -> M -> T ...
                    # (emitting per nibble is a guess; the structure is still unclear)
                    sample_index += 1
        elif typ == "20":
            # int8 absolute or delta?
            for byte in data:
                v = byte if byte < 128 else byte - 256
                ch = sample_index % 4
                cur[ch] = v  # treat as absolute
                out[ch].append(cur[ch])
                sample_index += 1
    return out
 def main():
    b = load_bundle("event-c")
    body = b.body
    truth_T = [round(v * 200) for v in b.samples["Tran"]]
    truth_V = [round(v * 200) for v in b.samples["Vert"]]
    truth_L = [round(v * 200) for v in b.samples["Long"]]
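    # Find the first "10 NN" tag in body[0:15]
    start = next(
        (s for s in range(15)
         if body[s] == 0x10 and body[s + 1] % 4 == 0 and 0 < body[s + 1] <= 0xFC),
        None,
    )
    if start is None:
        print("no 10 NN tag found in body[0:15]")
        return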
    blocks = walk_blocks(body, start)
    # Print block-by-block what's in each
    print(f"Total blocks: {len(blocks)}")
    bytes_processed = 0
    for typ, NN, data in blocks[:30]:
        print(f"  type={typ} NN=0x{NN:02x} data_len={len(data)} data_hex={data[:32].hex(' ')}{'...' if len(data) > 32 else ''}")
 if __name__ == "__main__":
    main()
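walk_blocks() encodes its working hypothesis about block sizes as an if/elif chain. The same rules pulled out into a pure function make them easy to check on synthetic bytes. This is a sketch: `block_length` and `tag_sequence` are hypothetical names, and the lengths are the script's hypotheses, not a confirmed format spec.

```python
from typing import Optional


def block_length(t0: int, t1: int) -> Optional[int]:
    """Total block length (2-byte tag + payload) under walk_blocks()'s rules,
    or None if (t0, t1) is not a recognised tag."""
    if t0 == 0x10 and t1 % 4 == 0 and 0 < t1 <= 0xFC:
        return t1 // 2 + 2      # NN packed nibbles -> NN/2 payload bytes
    if t0 == 0x20 and t1 % 4 == 0 and 0 < t1 <= 0xFC:
        return t1 + 2           # NN int8 values -> NN payload bytes
    if t0 == 0x00 and t1 % 4 == 0:
        return 2                # run-length marker, no payload
    if t0 == 0x30 and t1 % 4 == 0 and 0 < t1 <= 0x10:
        return t1 * 4           # hypothesised total length
    if t0 == 0x40 and t1 == 0x02:
        return 20               # segment header
    return None


def tag_sequence(body: bytes, start: int = 0) -> list[tuple[int, int, int]]:
    """(offset, t0, t1) for each block, stopping at the first unknown tag."""
    out, i = [], start
    while i + 1 < len(body):
        n = block_length(body[i], body[i + 1])
        if n is None:
            break
        out.append((i, body[i], body[i + 1]))
        i += n
    return out


demo = bytes([0x10, 0x04, 0xAB, 0xCD, 0x00, 0x08,
              0x20, 0x04, 1, 2, 3, 4, 0xEE, 0xEE])
print(tag_sequence(demo))  # [(0, 16, 4), (4, 0, 8), (6, 32, 4)]
```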
@@ -0,0 +1,27 @@
 """Dump body bytes around a specific offset."""
 import sys
 sys.path.insert(0, ".")
 from analysis.load_bundle import load_bundle
 def dump_around(name: str, center: int, radius: int = 96):
    b = load_bundle(name)
    body = b.body
    start = max(0, center - radius)
    end = min(len(body), center + radius)
    print(f"\n=== {name} body[{start}:{end}] (full body={len(body)}) ===")
    for i in range(start, end, 32):
        row = body[i:i+32]
        marker = "  <-- center" if i <= center < i+32 else ""
        print(f"  +{i:>5}  {row.hex(' ')}{marker}")
 def main():
    # Look at the trailer transitions
    trailer_starts = {"event-a": 7047, "event-b": 6475, "event-c": 4043, "event-d": 3941}
    for name, off in trailer_starts.items():
        dump_around(name, off, 96)
 if __name__ == "__main__":
    main()
@@ -0,0 +1,18 @@
 """Dump the START of each body in 32-byte rows."""
 import sys
 sys.path.insert(0, ".")
 from analysis.load_bundle import load_bundle
 def main():
    for name in ("event-a", "event-c"):
        b = load_bundle(name)
        body = b.body
        print(f"\n=== {name} body[0:512] (full body={len(body)}, samples={len(b.samples['Tran'])}) ===")
        for i in range(0, min(512, len(body)), 32):
            row = body[i:i+32]
            print(f"  +{i:>5}  {row.hex(' ')}")
 if __name__ == "__main__":
    main()
@@ -0,0 +1,24 @@
 """Dump body bytes split into 32-byte rows starting from `start`."""
 import sys
 sys.path.insert(0, ".")
 from analysis.load_bundle import load_bundle
 def dump(body: bytes, name: str, start: int, n_rows: int = 30):
    print(f"\n=== {name} body[{start}:] (full body={len(body)}) ===")
    end = min(start + 32 * n_rows, len(body))
    for i in range(start, end, 32):
        row = body[i:i+32]
        print(f"  +{i:>5}  {row.hex(' ')}")
 def main():
    for name in ("event-a", "event-b", "event-c", "event-d"):
        b = load_bundle(name)
        # Print the last 384 bytes (12 rows x 32) of the body to see the tail structure
        start = max(0, len(b.body) - 32 * 12)
        dump(b.body, name, start, 12)
 if __name__ == "__main__":
    main()
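The three dump scripts above repeat the same row-formatting loop. A sketch of it factored into a generator, plus the tail-offset arithmetic (12 x 32 = 384 bytes) the last script uses; `hex_rows` and `tail_start` are hypothetical names:

```python
from typing import Iterator, Optional


def hex_rows(buf: bytes, start: int = 0, width: int = 32,
             n_rows: Optional[int] = None) -> Iterator[str]:
    """Yield '  +offset  hex bytes' lines, `width` bytes per row."""
    end = len(buf) if n_rows is None else min(len(buf), start + width * n_rows)
    for i in range(start, end, width):
        yield f"  +{i:>5}  {buf[i:i + width].hex(' ')}"


def tail_start(buf: bytes, n_rows: int, width: int = 32) -> int:
    """Offset that leaves the last n_rows * width bytes (or the whole buffer)."""
    return max(0, len(buf) - width * n_rows)


data = bytes(range(40))
for line in hex_rows(data, start=tail_start(data, 1, width=16), width=16):
    print(line)
```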
@@ -0,0 +1,41 @@
 """Search for structural repetition in the body bytes."""
 import sys
 sys.path.insert(0, ".")
 from analysis.load_bundle import load_bundle
 def find_pattern_offsets(body: bytes, pattern: bytes, max_count=20):
    out = []
    i = 0
    while True:
        i = body.find(pattern, i)
        if i < 0:
            break
        out.append(i)
        i += 1
        if len(out) >= max_count:
            break
    return out
 def main():
    for name in ("event-a", "event-b", "event-c", "event-d"):
        b = load_bundle(name)
        body = b.body
        print(f"\n=== {name} (body={len(body)}, N_samples={len(b.samples['Tran'])}) ===")
        # Try to find repeating substructures (2-byte candidate tag pairs)
        for prefix in [b"\x10\x10", b"\x10\x04", b"\x10\x08", b"\x10\x0c", b"\x10\x18",
                       b"\x10\x14", b"\x10\x20", b"\x10\x40", b"\x10\x80", b"\x10\x00",
                       b"\x10\x01", b"\x10\x03", b"\x10\xf0", b"\xf1\x10", b"\x00\x10",
                       b"\x40\x02", b"\x20\x04", b"\x30\x04", b"\x30\x08", b"\x00\x1a"]:
            offs = find_pattern_offsets(body, prefix, max_count=200)
            if offs:
                # First 6 and last 3 offsets; a count of 200 means the search hit max_count
                first = offs[:6]
                last = offs[-3:]
                print(f"  '{prefix.hex()}' x{len(offs):>4}  first={first} last={last}")
 if __name__ == "__main__":
    main()
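find_pattern_offsets() advances by one byte after each hit, so overlapping matches are counted. The same behaviour falls out of a zero-width regex lookahead; a sketch (`find_all` is a hypothetical name) on a synthetic buffer:

```python
import re


def find_all(body: bytes, pattern: bytes, max_count: int = 200) -> list[int]:
    """All (possibly overlapping) offsets of `pattern`, capped at max_count."""
    rx = re.compile(b"(?=" + re.escape(pattern) + b")")
    out = []
    for m in rx.finditer(body):
        out.append(m.start())
        if len(out) >= max_count:
            break
    return out


body = bytes.fromhex("10 10 10 04 10 10")
print(find_all(body, b"\x10\x10"))  # [0, 1, 4]
```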
@@ -0,0 +1,34 @@
 """Find body byte ranges that look like absolute int8 sample data (smooth waveform)."""
 import sys
 sys.path.insert(0, ".")
 from analysis.load_bundle import load_bundle
 def looks_like_smooth_int8(buf):
    """Return the mean absolute successive difference of buf read as int8 (lower = smoother)."""
    if len(buf) < 8:
        return float("inf")  # too short to judge; never rank it as smooth
    vals = [b if b < 128 else b - 256 for b in buf]
    diffs = [abs(vals[i+1] - vals[i]) for i in range(len(vals)-1)]
    avg_diff = sum(diffs) / len(diffs)
    return avg_diff
 def main():
    for name in ("event-a", "event-c"):
        b = load_bundle(name)
        body = b.body
        # Scan with sliding window of 64 bytes; find segments where the bytes look like a smooth wave
        win = 64
        scores = []
        for i in range(len(body) - win + 1):
            scores.append((i, looks_like_smooth_int8(body[i:i+win])))
        # Lowest avg_diff means smoothest
        scores.sort(key=lambda x: x[1])
        print(f"\n=== {name} (body={len(body)}) — smoothest 10 windows ===")
        for off, s in scores[:10]:
            print(f"  +{off:>5}  avg_diff={s:.2f}  bytes={body[off:off+24].hex(' ')}")
 if __name__ == "__main__":
    main()
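The sliding-window scan recomputes every difference for each window, which is O(n x win). A sketch of the same score in O(n) using prefix sums over the difference series (`window_smoothness` is a hypothetical name); for windows of at least 8 bytes it returns the same means as calling looks_like_smooth_int8() on each window:

```python
from itertools import accumulate


def window_smoothness(body: bytes, win: int = 64) -> list[float]:
    """Mean |successive int8 difference| for every window body[i:i+win]."""
    vals = [b - 256 if b >= 128 else b for b in body]
    diffs = [abs(vals[k + 1] - vals[k]) for k in range(len(vals) - 1)]
    prefix = [0, *accumulate(diffs)]  # prefix[k] = sum(diffs[:k])
    # A window of `win` bytes contains win - 1 successive differences.
    return [(prefix[i + win - 1] - prefix[i]) / (win - 1)
            for i in range(len(body) - win + 1)]


body = bytes([0, 1, 2, 3, 250, 5, 6, 7])
scores = window_smoothness(body, win=4)
best = min(range(len(scores)), key=scores.__getitem__)
print(best, scores[best])  # 0 1.0
```

Sorting `enumerate(scores)` by value then reproduces the ranking main() prints.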
@@ -0,0 +1,76 @@
 """Full Tran decoder: continues across segment headers, applying two int16 BE T deltas from header bytes [0:2] and [2:4]."""
 import sys
 sys.path.insert(0, ".")
 from analysis.load_bundle import _parse_txt
 from minimateplus.waveform_codec import walk_body, find_data_start
 def s4(n):
    return n if n < 8 else n - 16
 def i8(b):
    return b if b < 128 else b - 256
 def decode_full_tran(body):
    if len(body) < 7 or body[0:3] != b"\x00\x02\x00":
        return None
    T0 = int.from_bytes(body[3:5], "big", signed=True)
    T1 = int.from_bytes(body[5:7], "big", signed=True)
    i = 7
    while i + 1 < len(body) and body[i] not in (0x00, 0x10, 0x20, 0x30, 0x40):
        i += 1
    blocks = walk_body(body, i)
    T = [T0, T1]
    cur = T1
    for blk in blocks:
        if blk.tag_hi == 0x40:
            # Segment header carries 2 T deltas (int16 BE each) at bytes [0:2] and [2:4]
            if len(blk.data) >= 4:
                delta1 = int.from_bytes(blk.data[0:2], "big", signed=True)
                cur += delta1
                T.append(cur)
                delta2 = int.from_bytes(blk.data[2:4], "big", signed=True)
                cur += delta2
                T.append(cur)
        elif blk.tag_hi == 0x10:
            for byte in blk.data:
                for nib in ((byte >> 4) & 0xF, byte & 0xF):
                    cur += s4(nib)
                    T.append(cur)
        elif blk.tag_hi == 0x20:
            for byte in blk.data:
                cur += i8(byte)
                T.append(cur)
        elif blk.tag_hi == 0x00:
            for _ in range(blk.tag_lo):
                T.append(cur)
        # 30 NN: skip for now
    return T
 def main():
    for stem in ("M529LL1L.V70", "M529LL1L.JQ0", "M529LL1A.SP0", "M529LL1A.SS0", "M529LL1A.SV0"):
        path = f"tests/fixtures/5-11-26/{stem}"
        with open(path, "rb") as f:
            body = f.read()[43:-26]
        _, samples = _parse_txt(path + ".TXT")
        truth_T = [round(v*200) for v in samples["Tran"]]
        n_truth = len(truth_T)
        decoded = decode_full_tran(body)
        if decoded is None:
            print(f"{stem}: body does not start with 00 02 00, skipped")
            continue
        n = min(len(decoded), n_truth)
        matches = sum(1 for i in range(n) if decoded[i] == truth_T[i])
        div_at = -1
        for i in range(n):
            if decoded[i] != truth_T[i]:
                div_at = i
                break
        print(f"{stem}: decoded={len(decoded)}, truth={n_truth}, matches={matches}/{n}, first div={div_at}")
 if __name__ == "__main__":
    main()
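decode_full_tran() treats each tag as a different delta encoding. A minimal sketch of those rules run on hand-built blocks; the `Block` tuple is a hypothetical stand-in for the objects walk_body() returns (only the attributes the decoder reads are modelled), and the rules are this script's hypothesis, not a confirmed codec:

```python
from collections import namedtuple

# Hypothetical stand-in for walk_body()'s block objects.
Block = namedtuple("Block", "tag_hi tag_lo data")


def s4(n): return n if n < 8 else n - 16
def i8(b): return b if b < 128 else b - 256


def accumulate_tran(t0, t1, blocks):
    """Apply the per-tag rules of decode_full_tran() to a block list."""
    out, cur = [t0, t1], t1
    for blk in blocks:
        if blk.tag_hi == 0x40 and len(blk.data) >= 4:
            for k in (0, 2):  # two int16 BE deltas
                cur += int.from_bytes(blk.data[k:k + 2], "big", signed=True)
                out.append(cur)
        elif blk.tag_hi == 0x10:  # packed signed nibbles, high nibble first
            for byte in blk.data:
                for nib in (byte >> 4, byte & 0xF):
                    cur += s4(nib)
                    out.append(cur)
        elif blk.tag_hi == 0x20:  # one int8 delta per byte
            for byte in blk.data:
                cur += i8(byte)
                out.append(cur)
        elif blk.tag_hi == 0x00:  # NN repeats of the current value
            out.extend([cur] * blk.tag_lo)
    return out


blocks = [Block(0x10, 0x04, bytes([0x1F, 0x20])),
          Block(0x00, 0x02, b""),
          Block(0x20, 0x02, bytes([0x05, 0xFE]))]
print(accumulate_tran(3, 4, blocks))  # [3, 4, 5, 4, 6, 6, 6, 6, 11, 9]
```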
@@ -0,0 +1,50 @@
 """Quick inspection of the new high-amplitude events."""
 import os, re, sys
 sys.path.insert(0, ".")
 from analysis.load_bundle import _parse_txt
 from minimateplus.waveform_codec import walk_body, find_data_start
 ROOT = "tests/fixtures/5-11-26"
 def main():
    for stem in ("M529LL1A.SP0", "M529LL1A.SS0", "M529LL1A.SV0"):
        bin_path = os.path.join(ROOT, stem)
        txt_path = bin_path + ".TXT"
        with open(bin_path, "rb") as f:
            raw = f.read()
        body = raw[43:-26]
        meta, samples = _parse_txt(txt_path)
        n = len(samples["Tran"])
        print(f"\n=== {stem} ===")
        print(f"  file={len(raw)}, body={len(body)}, N_samples={n}")
        print(f"  rectime={meta.get('Record Time')} pretrig={meta.get('Pre-trigger Length')}")
        print(f"  PPV(T,V,L)={meta.get('Tran PPV')} / {meta.get('Vert PPV')} / {meta.get('Long PPV')}")
        # Show first few non-trivial samples
        print(f"  First 5 truth samples (in/s):")
        for i in range(5):
            print(f"    T={samples['Tran'][i]:8.3f}  V={samples['Vert'][i]:8.3f}  "
                  f"L={samples['Long'][i]:8.3f}  M={samples['MicL'][i]:8.3f}")
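        # Peak sample positions (time uses the TXT's Sample Rate, default 1024)
        rate = int(re.search(r"\d+", meta.get("Sample Rate", "1024")).group(0))
        for ch in ("Tran", "Vert", "Long"):
            vals = samples[ch]
            peak_i = max(range(n), key=lambda i: abs(vals[i]))
            print(f"  {ch}: peak {vals[peak_i]:.3f} at sample {peak_i} (t={peak_i/rate:.3f}s)")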
        # Body structure
        start = find_data_start(body)
        blocks = walk_body(body, start)
        types = {}
        for b in blocks:
            types[b.tag_hi] = types.get(b.tag_hi, 0) + 1
        print(f"  body start={start}, total blocks walked: {len(blocks)}")
        print(f"  block tag counts: {types}")
        # How far the walker got
        if blocks:
            last = blocks[-1]
            walked = last.offset + last.length
            print(f"  walker stopped at offset {walked}/{len(body)} ({100*walked/len(body):.0f}%)")
 if __name__ == "__main__":
    main()
@@ -0,0 +1,23 @@
 """Print raw body hex + byte-distribution stats for one event."""
 from collections import Counter
 import sys
 sys.path.insert(0, ".")
 from analysis.load_bundle import load_bundle
 def main():
    for name in ("event-a", "event-b", "event-c", "event-d"):
        b = load_bundle(name)
        body = b.body
        print(f"\n=== {name} ({len(body)} body bytes) ===")
        print(f"  STRT: {b.strt.hex()}")
        print(f"  body[0:64]:   {body[:64].hex()}")
        print(f"  body[64:128]: {body[64:128].hex()}")
        print(f"  body[-32:]:   {body[-32:].hex()}")
        cnt = Counter(body)
        print(f"  top 16 bytes: {[(f'0x{k:02x}', f'{v/len(body):.2%}') for k,v in cnt.most_common(16)]}")
 if __name__ == "__main__":
    main()
@@ -0,0 +1,144 @@
 """
 load_bundle.py — extract body bytes from BW binary + parse sample columns from TXT.
 Used by the codec reverse-engineering scripts in this directory.
 """
 from __future__ import annotations
 import os
 import re
 from dataclasses import dataclass
 BUNDLE_ROOT = os.path.join(
    os.path.dirname(__file__), "..", "tests", "fixtures", "decode-re-5-8-26"
 )
 @dataclass
 class Bundle:
    name: str
    bin_path: str
    txt_path: str
    bin: bytes
    body: bytes  # bytes between STRT (43) and footer (last 26)
    strt: bytes  # 21-byte STRT record
    samples: dict  # {"Tran": [...], "Vert": [...], "Long": [...], "MicL": [...]}
    sample_rate: int
    rectime_sec: float
    pretrig_sec: float
    geo_range_ips: float
    ppv: dict  # {"Tran": float, "Vert": float, "Long": float}
    mic_pspl: float
    serial: str
 def _parse_txt(path: str) -> tuple[dict, dict]:
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        text = f.read()
    meta = {}
    samples = {"Tran": [], "Vert": [], "Long": [], "MicL": []}
    # Find header line that starts the columns ("Tran   Vert   Long   MicL").
    # Every line after it that parses as 4 whitespace-separated floats is sample data.
    lines = text.splitlines()
    header_idx = None
    for i, line in enumerate(lines):
        if "Tran" in line and "Vert" in line and "Long" in line and "MicL" in line:
            # The columns header.  Sample lines start a few lines later.
            header_idx = i
            break
    if header_idx is None:
        raise ValueError(f"no Tran/Vert/Long/MicL header in {path}")
    # Parse meta: quoted lines with "Field : value".  The key group is
    # non-greedy so values that contain colons (e.g. clock times) stay whole.
    for line in lines[:header_idx]:
        m = re.match(r'^"([^"]+?)\s*:\s*([^"]*)"', line.strip())
        if m:
            k, v = m.group(1).strip(), m.group(2).strip()
            meta[k] = v
    # Parse samples
    for line in lines[header_idx + 1 :]:
        line = line.strip()
        if not line:
            continue
        parts = re.split(r"\s+", line)
        if len(parts) < 4:
            continue
        try:
            t = float(parts[0])
            v = float(parts[1])
            l = float(parts[2])
            m = float(parts[3])
        except ValueError:
            continue
        samples["Tran"].append(t)
        samples["Vert"].append(v)
        samples["Long"].append(l)
        samples["MicL"].append(m)
    return meta, samples
 def load_bundle(name: str) -> Bundle:
    folder = os.path.join(BUNDLE_ROOT, name)
    files = os.listdir(folder)
    bin_name = next(f for f in files if not f.endswith(".TXT"))
    txt_name = next(f for f in files if f.endswith(".TXT"))
    bin_path = os.path.join(folder, bin_name)
    txt_path = os.path.join(folder, txt_name)
    with open(bin_path, "rb") as f:
        binary = f.read()
    # Header is 22 bytes; STRT at [22:43]; footer at last 26 bytes.
    strt = binary[22:43]
    body = binary[43:-26]
    meta, samples = _parse_txt(txt_path)
    sample_rate = int(re.search(r"(\d+)", meta.get("Sample Rate", "1024")).group(1))
    rectime_sec = float(re.search(r"([\d.]+)", meta.get("Record Time", "3.0")).group(1))
    pretrig_sec = float(re.search(r"-?[\d.]+", meta.get("Pre-trigger Length", "0")).group(0))
    geo_range_ips = float(re.search(r"([\d.]+)", meta.get("Geo Range", "10.0")).group(1))
    serial = meta.get("Serial Number", "").strip()
    def _f(s):
        return float(re.search(r"-?[\d.]+", s).group(0))
    ppv = {
        "Tran": _f(meta.get("Tran PPV", "0")),
        "Vert": _f(meta.get("Vert PPV", "0")),
        "Long": _f(meta.get("Long PPV", "0")),
    }
    mic_pspl = _f(meta.get("MicL PSPL", "0"))
    return Bundle(
        name=name,
        bin_path=bin_path,
        txt_path=txt_path,
        bin=binary,
        body=body,
        strt=strt,
        samples=samples,
        sample_rate=sample_rate,
        rectime_sec=rectime_sec,
        pretrig_sec=pretrig_sec,
        geo_range_ips=geo_range_ips,
        ppv=ppv,
        mic_pspl=mic_pspl,
        serial=serial,
    )
 if __name__ == "__main__":
    for name in ("event-a", "event-b", "event-c", "event-d"):
        b = load_bundle(name)
        n = len(b.samples["Tran"])
        print(f"{name}: body={len(b.body):>6}  N_samples={n}  rate={b.sample_rate}  "
              f"rectime={b.rectime_sec}  pretrig={b.pretrig_sec}  range={b.geo_range_ips}  "
              f"PPV(T,V,L)={b.ppv['Tran']:.3f},{b.ppv['Vert']:.3f},{b.ppv['Long']:.3f}  "
              f"MicL={b.mic_pspl}")
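With a greedy key group, `[^"]+` backtracks to the *last* colon on the line, so a value that itself contains colons (a clock time, say) is split in the wrong place; a non-greedy key stops at the first colon. A standalone check on illustrative lines (not copied from a fixture; `parse_meta_line` is a hypothetical name):

```python
import re

GREEDY_RE = re.compile(r'^"([^"]+)\s*:\s*([^"]*)"')
META_RE = re.compile(r'^"([^"]+?)\s*:\s*([^"]*)"')


def parse_meta_line(line: str):
    """Return (field, value) for a quoted 'Field : value' line, else None."""
    m = META_RE.match(line.strip())
    return (m.group(1).strip(), m.group(2).strip()) if m else None


print(GREEDY_RE.match('"Event Time : 14:02:37"').groups())  # ('Event Time : 14:02', '37')
print(parse_meta_line('"Event Time : 14:02:37"'))          # ('Event Time', '14:02:37')
print(parse_meta_line('"Record Time : 3.0 sec"'))          # ('Record Time', '3.0 sec')
print(parse_meta_line('Tran   Vert   Long   MicL'))        # None
```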
@@ -0,0 +1,81 @@
 """Decode Tran across multiple segments by resetting at 40 02 headers."""
 import sys
 sys.path.insert(0, ".")
 from analysis.load_bundle import _parse_txt
 from minimateplus.waveform_codec import walk_body, find_data_start
 def s4(n):
    return n if n < 8 else n - 16
 def i8(b):
    return b if b < 128 else b - 256
 def decode_full_tran(body):
    """Decode all Tran samples in the body, walking through segments."""
    if len(body) < 7 or body[0:3] != b"\x00\x02\x00":
        return None
    T0 = int.from_bytes(body[3:5], "big", signed=True)
    T1 = int.from_bytes(body[5:7], "big", signed=True)
    # Locate first tag
    i = 7
    while i + 1 < len(body) and body[i] not in (0x00, 0x10, 0x20, 0x30, 0x40):
        i += 1
    blocks = walk_body(body, i)
    T = [T0, T1]
    cur = T1
    for bi, blk in enumerate(blocks):
        if blk.tag_hi == 0x40:
            # Segment header — try interpreting bytes [0:2] as new T anchor
            if len(blk.data) >= 2:
                new_anchor = int.from_bytes(blk.data[0:2], "big", signed=True)
                # The next sample IS this anchor value, NOT a delta from cur.
                T.append(new_anchor)
                cur = new_anchor
        elif blk.tag_hi == 0x10:
            for byte in blk.data:
                for nib in ((byte >> 4) & 0xF, byte & 0xF):
                    cur += s4(nib)
                    T.append(cur)
        elif blk.tag_hi == 0x20:
            for byte in blk.data:
                cur += i8(byte)
                T.append(cur)
        elif blk.tag_hi == 0x00:
            # RLE: append NN zero deltas
            for _ in range(blk.tag_lo):
                T.append(cur)
        # 30 NN: skip
    return T
 def main():
    for stem in ("M529LL1L.V70", "M529LL1L.JQ0", "M529LL1A.SP0", "M529LL1A.SS0", "M529LL1A.SV0"):
        path = f"tests/fixtures/5-11-26/{stem}"
        with open(path, "rb") as f:
            body = f.read()[43:-26]
        _, samples = _parse_txt(path + ".TXT")
        truth_T = [round(v*200) for v in samples["Tran"]]
        n_truth = len(truth_T)
        decoded = decode_full_tran(body)
        if decoded is None:
            print(f"{stem}: body does not start with 00 02 00, skipped")
            continue
        n = min(len(decoded), n_truth)
        matches = sum(1 for i in range(n) if decoded[i] == truth_T[i])
        # Find first divergence
        div_at = -1
        for i in range(n):
            if decoded[i] != truth_T[i]:
                div_at = i
                break
        print(f"{stem}: decoded={len(decoded)}, truth={n_truth}, matches={matches}/{n}, first div={div_at}")
        if div_at >= 0 and div_at < 30:
            print(f"  truth around div [{max(0,div_at-3)}:{div_at+8}]: {truth_T[max(0,div_at-3):div_at+8]}")
            print(f"  pred  around div [{max(0,div_at-3)}:{div_at+8}]: {decoded[max(0,div_at-3):div_at+8]}")
 if __name__ == "__main__":
    main()
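Both Tran scripts hand-roll the same first-divergence loop. An equivalent sketch using `next()` over `zip()`, plus the context slice printed around the mismatch (names are hypothetical; the toy sequences are made up):

```python
from typing import Sequence


def first_divergence(pred: Sequence[int], truth: Sequence[int]) -> int:
    """Index of the first mismatch within the common length, or -1 if none."""
    return next((i for i, (p, t) in enumerate(zip(pred, truth)) if p != t), -1)


def context(seq: Sequence[int], at: int, before: int = 3, after: int = 8) -> list:
    """seq[at-before : at+after], with the start clamped at 0."""
    return list(seq[max(0, at - before):at + after])


truth = [0, 1, 1, 2, 5, 5, 4]
pred = [0, 1, 1, 2, 3, 3, 2]
i = first_divergence(pred, truth)
print(i, context(truth, i), context(pred, i))  # 4 [1, 1, 2, 5, 5, 4] [1, 1, 2, 3, 3, 2]
```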
@@ -0,0 +1,28 @@
 """Dump the first 25 blocks of segment 1 (first 40 02 header through the next one) of each event."""
 import sys
 sys.path.insert(0, ".")
 from minimateplus.waveform_codec import walk_body, find_data_start
 def main():
    for stem in ("M529LL1A.SP0", "M529LL1L.JQ0", "M529LL1L.V70"):
        path = f"tests/fixtures/5-11-26/{stem}"
        with open(path, "rb") as f:
            body = f.read()[43:-26]
        blocks = walk_body(body, find_data_start(body))
        # Find segment 1 (between first and second 40 02)
        seg40_indices = [i for i, b in enumerate(blocks) if b.tag_hi == 0x40]
        if len(seg40_indices) < 2:
            print(f"\n{stem}: only {len(seg40_indices)} segment headers found")
            seg1_blocks = blocks[seg40_indices[0]:] if seg40_indices else []
        else:
            seg1_blocks = blocks[seg40_indices[0]:seg40_indices[1]+1]
        print(f"\n=== {stem} segment 1 ({len(seg1_blocks)} blocks) ===")
        for b in seg1_blocks[:25]:
            tag = f"{b.tag_hi:02x}{b.tag_lo:02x}"
            print(f"  off={b.offset:>5} {tag} NN=0x{b.tag_lo:02x}({b.tag_lo:>3}) len={b.length:>3}  data={b.data[:16].hex(' ')}{'...' if len(b.data)>16 else ''}")
 if __name__ == "__main__":
    main()
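The slicing above picks one segment by index. A sketch that instead groups every block by segment, opening a new group at each 0x40 header; unlike the slice above, a group does not include the *next* segment's header. `Block` is a hypothetical stand-in for walk_body()'s block objects:

```python
from collections import namedtuple

Block = namedtuple("Block", "tag_hi tag_lo")  # hypothetical stand-in


def split_segments(blocks):
    """Return [leading blocks, segment 0, segment 1, ...]; each 0x40 header
    opens its own group."""
    groups = [[]]
    for b in blocks:
        if b.tag_hi == 0x40:
            groups.append([])
        groups[-1].append(b)
    return groups


tags = [0x10, 0x20, 0x40, 0x10, 0x00, 0x40, 0x30]
groups = split_segments([Block(t, 0) for t in tags])
print([[hex(b.tag_hi) for b in g] for g in groups])
# [['0x10', '0x20'], ['0x40', '0x10', '0x0'], ['0x40', '0x30']]
```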
@@ -0,0 +1,195 @@
 """Test 12-bit signed packed deltas hypothesis for 30 NN blocks across all loud events.
 For each 30 NN block in each event, identify what samples it should cover
 (based on the cumulative delta count up to that point) and compare the
 truth deltas against various 12-bit packing schemes.
 """
 import sys
 sys.path.insert(0, ".")
 from analysis.load_bundle import _parse_txt
 from minimateplus.waveform_codec import walk_body, find_data_start
 CHANNEL_ORDER = ["Vert", "Long", "MicL", "Tran"]  # rotation after initial T
 def s12(v):
    """Sign-extend a 12-bit unsigned value to signed int."""
    return v if v < 0x800 else v - 0x1000
 def unpack_12bit_be(data):
    """4 deltas in 6 bytes: the 48 bits read MSB-first and split into 4 x 12-bit fields."""
    # bits 0..47 (MSB-first), split into 4 × 12-bit; only the first 6 bytes are used
    val = int.from_bytes(data[:6], "big")
    out = []
    for i in range(4):
        d = (val >> (12 * (3 - i))) & 0xFFF
        out.append(s12(d))
    return out
 def unpack_12bit_le(data):
    """4 deltas in 6 bytes, LE order: bytes packed as 2 × 24-bit groups."""
    out = []
    # First 3 bytes contain 2 deltas
    b0, b1, b2 = data[0], data[1], data[2]
    d0 = b0 | ((b1 & 0x0F) << 8)
    d1 = (b1 >> 4) | (b2 << 4)
    out.append(s12(d0))
    out.append(s12(d1))
    # Next 3 bytes contain 2 more deltas
    b3, b4, b5 = data[3], data[4], data[5]
    d2 = b3 | ((b4 & 0x0F) << 8)
    d3 = (b4 >> 4) | (b5 << 4)
    out.append(s12(d2))
    out.append(s12(d3))
    return out
 def unpack_12bit_be_per_triplet(data):
    """4 deltas as 2 triplets of (high4, low8) BE within each 3-byte group."""
    out = []
    b0, b1, b2 = data[0], data[1], data[2]
    d0 = (b0 << 4) | (b1 >> 4)
    d1 = ((b1 & 0x0F) << 8) | b2
    out.append(s12(d0))
    out.append(s12(d1))
    b3, b4, b5 = data[3], data[4], data[5]
    d2 = (b3 << 4) | (b4 >> 4)
    d3 = ((b4 & 0x0F) << 8) | b5
    out.append(s12(d2))
    out.append(s12(d3))
    return out
 def truth_deltas_for_block(blocks, block_idx, event_truth, channel):
    """For a 30 NN block at block_idx, determine which samples it covers and
    return the truth deltas for those samples.
    Walks through all blocks before block_idx (within the same segment) and
    counts how many deltas have been emitted for *channel*, starting from the
    segment's anchor pair.
    """
    # Find the segment header that contains this block.
    seg_header_idx = None
    for j in range(block_idx, -1, -1):
        if blocks[j].tag_hi == 0x40:
            seg_header_idx = j
            break
    if seg_header_idx is None:
        # block is in the initial T segment; samples count from sample 2.
        first_sample_in_segment = 2
    else:
        # Anchor pair covers samples [N, N+1] for some N.  Subsequent deltas
        # are samples [N+2, N+2+1, ...].  We don't actually need to know N
        # for this test — just the relative position within the segment.
        first_sample_in_segment = 2  # anchor=0,1; deltas start at 2
    # Count deltas from segment-data start to block_idx.
    delta_count = 0
    start_block = seg_header_idx + 1 if seg_header_idx is not None else 0
    for j in range(start_block, block_idx):
        blk = blocks[j]
        if blk.tag_hi == 0x10:
            delta_count += blk.tag_lo  # NN nibbles = NN deltas
        elif blk.tag_hi == 0x20:
            delta_count += blk.tag_lo  # NN int8 deltas
        elif blk.tag_hi == 0x00:
            delta_count += blk.tag_lo  # RLE zero deltas
    # Now the 30 NN block carries NN deltas.
    nn = blocks[block_idx].tag_lo
    # First sample affected: segment first_sample + delta_count.
    # But we ALSO need to know which segment this is, since the segment maps
    # to a specific channel and a specific starting absolute sample index.
    return first_sample_in_segment + delta_count, nn
 def main():
    for stem in ("M529LL1A.SP0", "M529LL1L.JQ0", "M529LL1L.V70",
                 "M529LL1A.SS0", "M529LL1A.SV0"):
        path = f"tests/fixtures/5-11-26/{stem}"
        with open(path, "rb") as f:
            body = f.read()[43:-26]
        _, samples = _parse_txt(path + ".TXT")
        blocks = walk_body(body, find_data_start(body))
        seg_idx = [i for i, b in enumerate(blocks) if b.tag_hi == 0x40]
        # Find all 30 NN blocks in DATA section (not trailer).
        thirty_blocks = []
        for bi, b in enumerate(blocks):
            if b.tag_hi != 0x30:
                continue
            # Determine which segment this is in
            seg_num = None
            for k, hi in enumerate(seg_idx):
                next_hi = seg_idx[k + 1] if k + 1 < len(seg_idx) else len(blocks)
                if hi < bi < next_hi:
                    seg_num = k
                    break
            if seg_num is None and (not seg_idx or bi < seg_idx[0]):
                seg_num = -1  # initial T segment (before any 40 02 header)
            thirty_blocks.append((bi, b, seg_num))
        if not thirty_blocks:
            continue
        print(f"\n=== {stem} ===")
        for bi, b, seg_num in thirty_blocks:
            # Channel for this segment
            if seg_num == -1:
                channel = "Tran"
                seg_label = "initial T"
            else:
                channel = CHANNEL_ORDER[seg_num % 4]
                seg_label = f"seg {seg_num}"
            # Count deltas before this block within the same segment.
            seg_header_idx = seg_idx[seg_num] if seg_num >= 0 else -1
            start_block = seg_header_idx + 1 if seg_header_idx >= 0 else 0
            delta_count = 0
            for j in range(start_block, bi):
                blk = blocks[j]
                if blk.tag_hi in (0x10, 0x20, 0x00):
                    delta_count += blk.tag_lo
            # First sample this 30 NN block affects (within the segment)
            # = anchor positions + delta_count + 2 (since anchor pair was samples 0,1)
            # But the segment's first absolute sample index in the channel is
            # (seg_num // 4) * 512 (approximately) if segment 0 is the first V seg.
            cycle = (seg_num // 4) if seg_num >= 0 else 0
            base = cycle * 512 + 2  # +2 for anchor pair
            sample_idx = base + delta_count
            truth_ch = [round(v * 200) for v in samples[channel]]
            nn = b.tag_lo
            if sample_idx + nn > len(truth_ch):
                print(f"  block @ {b.offset} ({seg_label} {channel}): out of truth range")
                continue
            # Get the previous sample so we can compute truth deltas
            if sample_idx == 0:
                prev = 0
            else:
                prev = truth_ch[sample_idx - 1]
            truth_deltas = []
            for k in range(nn):
                truth_deltas.append(truth_ch[sample_idx + k] - (prev if k == 0 else truth_ch[sample_idx + k - 1]))
            # Try each packing
            schemes = [
                ("12-bit BE contiguous", unpack_12bit_be(b.data)),
                ("12-bit LE per-triplet", unpack_12bit_le(b.data)),
                ("12-bit BE per-triplet", unpack_12bit_be_per_triplet(b.data)),
            ]
            print(f"  block @ {b.offset:>5} ({seg_label} {channel}, samples {sample_idx}..{sample_idx+nn-1}):")
            print(f"    data:  {b.data.hex(' ')}")
            print(f"    truth: {truth_deltas}")
            for name, pred in schemes:
                match = "✓" if pred == truth_deltas else " "
                n_match = sum(1 for x, y in zip(pred, truth_deltas) if x == y)
                print(f"    {match}{n_match}/{len(truth_deltas)}  {name}: {pred}")
 if __name__ == "__main__":
    main()
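To see how the candidate layouts differ, here is a standalone restatement of the three unpackers (same bit manipulation as above, condensed) on one fixed 6-byte input. It also shows that the contiguous and per-triplet big-endian layouts read identical bit positions, so they always return the same deltas: only two of the three schemes are actually distinct.

```python
import os


def s12(v: int) -> int:
    """Sign-extend a 12-bit value."""
    return v if v < 0x800 else v - 0x1000


def be_contiguous(data: bytes) -> list[int]:
    val = int.from_bytes(data[:6], "big")
    return [s12((val >> (12 * (3 - i))) & 0xFFF) for i in range(4)]


def be_per_triplet(data: bytes) -> list[int]:
    out = []
    for b0, b1, b2 in (data[0:3], data[3:6]):
        out += [s12((b0 << 4) | (b1 >> 4)), s12(((b1 & 0x0F) << 8) | b2)]
    return out


def le_per_triplet(data: bytes) -> list[int]:
    out = []
    for b0, b1, b2 in (data[0:3], data[3:6]):
        out += [s12(b0 | ((b1 & 0x0F) << 8)), s12((b1 >> 4) | (b2 << 4))]
    return out


sample = bytes.fromhex("123456789abc")
print(be_contiguous(sample))   # [291, 1110, 1929, -1348]
print(le_per_triplet(sample))  # [1042, 1379, -1416, -1079]

# Both big-endian layouts read the same 48 bits in the same order.
assert all(be_contiguous(x) == be_per_triplet(x)
           for x in (os.urandom(6) for _ in range(1000)))
```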
@@ -0,0 +1,132 @@
 """Test the '30 NN data = high-nibbles + int8 low-bytes' hypothesis.
 Layout for `30 04` (6 data bytes, 4 deltas):
  bytes [0:2] = 16 bits = 4 × 4-bit high-nibbles (MSB first)
  bytes [2:6] = 4 × int8 low bytes
  Each delta = 12-bit signed = sign-extend((high_nibble << 8) | low_byte)
 """
 import sys
 sys.path.insert(0, ".")
 from analysis.load_bundle import _parse_txt
 from minimateplus.waveform_codec import walk_body, find_data_start
 def s4(n):
    return n if n < 8 else n - 16
 def i8(b):
    return b if b < 128 else b - 256
 def sign_extend_12(v):
    return v if v < 0x800 else v - 0x1000
 def decode_30nn(data):
    """4 × 12-bit signed deltas (high nibble + low byte).
    bytes[0:2] hold the 4 high nibbles (MSB first); bytes[2:6] hold the low bytes.
    """
    if len(data) < 6:
        return []
    # Read high nibbles from bytes 0-1 (4 nibbles MSB-first)
    high_word = (data[0] << 8) | data[1]
    high_nibbles = [
        (high_word >> 12) & 0xF,
        (high_word >> 8) & 0xF,
        (high_word >> 4) & 0xF,
        high_word & 0xF,
    ]
    out = []
    for i in range(4):
        v = (high_nibbles[i] << 8) | data[2 + i]
        out.append(sign_extend_12(v))
    return out
 def simulate_up_to(blocks, target_block_idx, t_preamble):
    """Run decoder up to block_idx; return per-channel sample lists.
    NOW with 30 NN decoded too."""
    out = {"Tran": [], "Vert": [], "Long": [], "MicL": []}
    out["Tran"].extend(t_preamble)
    cur = {"Tran": t_preamble[-1], "Vert": None, "Long": None, "MicL": None}
    rotation = ["Vert", "Long", "MicL", "Tran"]
    current_channel = "Tran"
    seg_counter = -1
    for j in range(target_block_idx):
        blk = blocks[j]
        if blk.tag_hi == 0x40:
            seg_counter += 1
            prev = "Tran" if seg_counter == 0 else rotation[(seg_counter - 1) % 4]
            new_ch = rotation[seg_counter % 4]
            if cur[prev] is not None:
                d0 = int.from_bytes(blk.data[0:2], "big", signed=True)
                d1 = int.from_bytes(blk.data[2:4], "big", signed=True)
                cur[prev] += d0; out[prev].append(cur[prev])
                cur[prev] += d1; out[prev].append(cur[prev])
            c0 = int.from_bytes(blk.data[14:16], "big", signed=True)
            c1 = int.from_bytes(blk.data[16:18], "big", signed=True)
            out[new_ch].extend([c0, c1])
            cur[new_ch] = c1
            current_channel = new_ch
        elif blk.tag_hi == 0x10:
            for byte in blk.data:
                for nib in ((byte >> 4) & 0xF, byte & 0xF):
                    cur[current_channel] += s4(nib)
                    out[current_channel].append(cur[current_channel])
        elif blk.tag_hi == 0x20:
            for byte in blk.data:
                cur[current_channel] += i8(byte)
                out[current_channel].append(cur[current_channel])
        elif blk.tag_hi == 0x00:
            for _ in range(blk.tag_lo):
                out[current_channel].append(cur[current_channel])
        elif blk.tag_hi == 0x30:
            # NEW: decode 30 NN
            deltas = decode_30nn(blk.data)
            for d in deltas:
                cur[current_channel] += d
                out[current_channel].append(cur[current_channel])
    return out, current_channel
 def main():
    for stem in ("M529LL1A.SP0", "M529LL1L.JQ0", "M529LL1L.V70",
                 "M529LL1A.SS0", "M529LL1A.SV0"):
        path = f"tests/fixtures/5-11-26/{stem}"
        with open(path, "rb") as f:
            body = f.read()[43:-26]
        _, samples = _parse_txt(path + ".TXT")
        blocks = walk_body(body, find_data_start(body))
        t0 = int.from_bytes(body[3:5], "big", signed=True)
        t1 = int.from_bytes(body[5:7], "big", signed=True)
        thirty_blocks = [(j, b) for j, b in enumerate(blocks) if b.tag_hi == 0x30]
        if not thirty_blocks:
            continue
        print(f"\n=== {stem} ===")
        for j, blk in thirty_blocks:
            pred, ch = simulate_up_to(blocks, j, [t0, t1])
            cur_before = pred[ch][-1]
            truth = [round(v * 200) for v in samples[ch]]
            n_pred = len(pred[ch])
            nn = blk.tag_lo
            if n_pred + nn > len(truth):
                continue
            # Decode this 30 NN block with hypothesis
            pred_deltas = decode_30nn(blk.data)
            # Compute truth deltas relative to cur_before
            truth_deltas = []
            prev = cur_before
            for k in range(nn):
                truth_deltas.append(truth[n_pred + k] - prev)
                prev = truth[n_pred + k]
            n_match = sum(1 for a, b in zip(pred_deltas, truth_deltas) if a == b)
            tag = "✓" if pred_deltas == truth_deltas else " "
            print(f"  block @ {blk.offset:>5} (chan={ch}, NN={nn}):")
            print(f"    data:  {blk.data.hex(' ')}")
            print(f"    truth: {truth_deltas}")
            print(f"    pred:  {pred_deltas}  {tag}{n_match}/{nn}")
 if __name__ == "__main__":
    main()
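A round-trip check is a cheap way to confirm the hypothesised `30 04` layout is self-consistent before comparing it against truth. The sketch below adds a hypothetical `encode_30nn` (not in the codebase) and repeats the decoding logic so it runs standalone; it only covers the four-delta case the docstring describes.

```python
def sign_extend_12(v: int) -> int:
    return v if v < 0x800 else v - 0x1000


def encode_30nn(deltas):
    """Pack four signed 12-bit deltas under the hypothesised `30 04` layout."""
    if len(deltas) != 4 or any(not -2048 <= d <= 2047 for d in deltas):
        raise ValueError("need four deltas in [-2048, 2047]")
    words = [d & 0xFFF for d in deltas]          # two's-complement 12-bit
    high_word = 0
    for w in words:
        high_word = (high_word << 4) | (w >> 8)  # high nibbles, MSB first
    return high_word.to_bytes(2, "big") + bytes(w & 0xFF for w in words)


def decode_30nn(data: bytes):
    """Same logic as the script's decoder: high nibble + low byte, sign-extended."""
    high_word = int.from_bytes(data[0:2], "big")
    nibbles = [(high_word >> s) & 0xF for s in (12, 8, 4, 0)]
    return [sign_extend_12((nibbles[i] << 8) | data[2 + i]) for i in range(4)]


packed = encode_30nn([5, -3, 300, -2048])
print(packed.hex(" "))      # 0f 18 05 fd 2c 00
print(decode_30nn(packed))  # [5, -3, 300, -2048]
```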
@@ -0,0 +1,141 @@
 """Test 30 NN packing by running the real decoder up to each 30 NN block,
 recording how many samples have been produced for each channel at that point,
 then checking truth deltas immediately after."""
 import sys
 sys.path.insert(0, ".")
 from analysis.load_bundle import _parse_txt
 from minimateplus.waveform_codec import walk_body, find_data_start
 def s4(n):
    return n if n < 8 else n - 16
 def i8(b):
    return b if b < 128 else b - 256
 def s12(v):
    return v if v < 0x800 else v - 0x1000
 def unpack_12bit_be_contiguous(data):
    out = []
    val = int.from_bytes(data, "big")
    n = len(data) * 8 // 12
    for i in range(n):
        d = (val >> (12 * (n - 1 - i))) & 0xFFF
        out.append(s12(d))
    return out
 def unpack_12bit_per_triplet_be(data):
    out = []
    for i in range(0, len(data), 3):
        if i + 2 >= len(data):
            break
        b0, b1, b2 = data[i], data[i + 1], data[i + 2]
        d0 = (b0 << 4) | (b1 >> 4)
        d1 = ((b1 & 0x0F) << 8) | b2
        out.append(s12(d0))
        out.append(s12(d1))
    return out
 def simulate_up_to(blocks, target_block_idx, t_preamble):
    """Run the decoder up to block_idx; return per-channel sample lists."""
    out = {"Tran": [], "Vert": [], "Long": [], "MicL": []}
    out["Tran"].extend(t_preamble)
    cur = {"Tran": t_preamble[-1], "Vert": None, "Long": None, "MicL": None}
    rotation = ["Vert", "Long", "MicL", "Tran"]
    seg_idx = [j for j, b in enumerate(blocks) if b.tag_hi == 0x40]
    # Determine which channel we're CURRENTLY decoding into
    current_channel = "Tran"
    seg_counter = -1  # incremented at each 40 02
    for j in range(target_block_idx):
        blk = blocks[j]
        if blk.tag_hi == 0x40:
            # Switch: extend prev channel, set up new channel
            seg_counter += 1
            prev = "Tran" if seg_counter == 0 else rotation[(seg_counter - 1) % 4]
            new_ch = rotation[seg_counter % 4]
            if cur[prev] is not None:
                d0 = int.from_bytes(blk.data[0:2], "big", signed=True)
                d1 = int.from_bytes(blk.data[2:4], "big", signed=True)
                cur[prev] += d0; out[prev].append(cur[prev])
                cur[prev] += d1; out[prev].append(cur[prev])
            c0 = int.from_bytes(blk.data[14:16], "big", signed=True)
            c1 = int.from_bytes(blk.data[16:18], "big", signed=True)
            out[new_ch].extend([c0, c1])
            cur[new_ch] = c1
            current_channel = new_ch
        elif blk.tag_hi == 0x10:
            for byte in blk.data:
                for nib in ((byte >> 4) & 0xF, byte & 0xF):
                    cur[current_channel] += s4(nib)
                    out[current_channel].append(cur[current_channel])
        elif blk.tag_hi == 0x20:
            for byte in blk.data:
                cur[current_channel] += i8(byte)
                out[current_channel].append(cur[current_channel])
        elif blk.tag_hi == 0x00:
            for _ in range(blk.tag_lo):
                out[current_channel].append(cur[current_channel])
        elif blk.tag_hi == 0x30:
            # Skip for now — we want to know what comes next
            pass
    return out, current_channel
 def main():
    for stem in ("M529LL1A.SP0", "M529LL1L.JQ0", "M529LL1L.V70",
                 "M529LL1A.SS0", "M529LL1A.SV0"):
        path = f"tests/fixtures/5-11-26/{stem}"
        with open(path, "rb") as f:
            body = f.read()[43:-26]
        _, samples = _parse_txt(path + ".TXT")
        blocks = walk_body(body, find_data_start(body))
        t0 = int.from_bytes(body[3:5], "big", signed=True)
        t1 = int.from_bytes(body[5:7], "big", signed=True)
        # Find all 30 NN blocks in data section
        thirty_blocks = [(j, b) for j, b in enumerate(blocks) if b.tag_hi == 0x30]
        if not thirty_blocks:
            continue
        print(f"\n=== {stem} ===")
        for j, blk in thirty_blocks:
            pred, ch = simulate_up_to(blocks, j, [t0, t1])
            n_pred = len(pred[ch])
            # The 30 NN block carries NN deltas for channel `ch` starting at sample n_pred
            truth = [round(v * 200) for v in samples[ch]]
            if n_pred >= len(truth):
                continue
            # Truth deltas: truth[n_pred] - cur, truth[n_pred+1] - truth[n_pred], ...
            cur_val = pred[ch][-1]
            nn = blk.tag_lo
            truth_deltas = []
            prev = cur_val
            for k in range(min(nn, len(truth) - n_pred)):
                truth_deltas.append(truth[n_pred + k] - prev)
                prev = truth[n_pred + k]
            print(f"  block @ {blk.offset:>5} (chan={ch}, after sample {n_pred-1}, "
                  f"NN={nn}, last_val={cur_val}):")
            print(f"    data:  {blk.data.hex(' ')}")
            print(f"    truth: {truth_deltas}")
            schemes = [
                ("12-bit BE contiguous", unpack_12bit_be_contiguous(blk.data)),
                ("12-bit per-triplet BE", unpack_12bit_per_triplet_be(blk.data)),
            ]
            for name, pred_deltas in schemes:
                n_match = sum(1 for a, b in zip(pred_deltas, truth_deltas) if a == b)
                tag = "✓" if pred_deltas == truth_deltas else " "
                print(f"    {tag}{n_match}/{nn}  {name}: {pred_deltas[:nn]}")
 if __name__ == "__main__":
    main()
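Worth noting about the two schemes compared above: when the payload length is a multiple of 3 bytes, "12-bit BE contiguous" and "12-bit per-triplet BE" yield identical deltas, so a match on one is a match on the other. They only diverge on a trailing partial triplet, where the contiguous reader right-aligns (discarding the leading 4 bits) and the per-triplet reader drops the tail. A standalone check reusing the script's logic, on made-up bytes:

```python
def s12(v: int) -> int:
    return v if v < 0x800 else v - 0x1000


def unpack_12bit_be_contiguous(data: bytes):
    val = int.from_bytes(data, "big")
    n = len(data) * 8 // 12
    return [s12((val >> (12 * (n - 1 - i))) & 0xFFF) for i in range(n)]


def unpack_12bit_per_triplet_be(data: bytes):
    out = []
    for i in range(0, len(data) - 2, 3):  # only complete triplets
        b0, b1, b2 = data[i], data[i + 1], data[i + 2]
        out += [s12((b0 << 4) | (b1 >> 4)), s12(((b1 & 0x0F) << 8) | b2)]
    return out


six = bytes.fromhex("0f1805fd2c00")
five = six[:5]
print(unpack_12bit_be_contiguous(six) == unpack_12bit_per_triplet_be(six))  # True
print(unpack_12bit_be_contiguous(five), unpack_12bit_per_triplet_be(five))
# [-232, 95, -724] [241, -2043]
```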
@@ -0,0 +1,86 @@
 """Test: 00 NN markers might be RLE for zero-deltas in current channel."""
 import sys
 sys.path.insert(0, ".")
 from analysis.load_bundle import _parse_txt
 from minimateplus.waveform_codec import walk_body, find_data_start
 def s4(n):
    return n if n < 8 else n - 16
 def i8(b):
    return b if b < 128 else b - 256
 def decode_with_rle(body):
    """Decode Tran assuming:
    - preamble[3:5], [5:7] = T[0], T[1]
    - All 10 NN / 20 NN blocks until segment_header (40 02) are Tran deltas
    - 00 NN markers are RLE: NN/4 zero T deltas (or NN, or NN/2 — try them)
    """
    if len(body) < 9 or body[0:3] != b"\x00\x02\x00":
        return None, None, None
    T0 = int.from_bytes(body[3:5], "big", signed=True)
    T1 = int.from_bytes(body[5:7], "big", signed=True)
    # Find first tag (might be 00 NN, 10 NN, or 20 NN)
    i = 7
    while i + 1 < len(body):
        if body[i] in (0x00, 0x10, 0x20):
            break
        i += 1
    start = i
    blocks = walk_body(body, start)
    results = {}
    for rle_div in (4, 2, 1):  # try different RLE interpretations
        T = [T0, T1]
        cur = T1
        for blk in blocks:
            if blk.tag_hi == 0x40:
                break
            if blk.tag_hi == 0x10:
                for byte in blk.data:
                    for nib in ((byte >> 4) & 0xF, byte & 0xF):
                        cur += s4(nib)
                        T.append(cur)
            elif blk.tag_hi == 0x20:
                for byte in blk.data:
                    cur += i8(byte)
                    T.append(cur)
            elif blk.tag_hi == 0x00:
                # RLE of zero deltas
                n_zeros = blk.tag_lo // rle_div
                for _ in range(n_zeros):
                    T.append(cur)
            # 30 NN: skip for now
        results[rle_div] = T
    return results, T0, T1
 def main():
    for stem in ("M529LL1L.V70", "M529LL1L.JQ0", "M529LL1A.SP0", "M529LL1A.SS0", "M529LL1A.SV0"):
        path = f"tests/fixtures/5-11-26/{stem}"
        with open(path, "rb") as f:
            body = f.read()[43:-26]
        _, samples = _parse_txt(path + ".TXT")
        truth_T = [round(v*200) for v in samples["Tran"]]
        results, T0, T1 = decode_with_rle(body)
        if results is None:
            print(f"\n=== {stem} === body does not start with 00 02 00; skipped")
            continue
        print(f"\n=== {stem} (T[0]={T0}, T[1]={T1}) ===")
        for rle_div, T in results.items():
            n = min(len(T), len(truth_T))
            matches = sum(1 for i in range(n) if T[i] == truth_T[i])
            # Find first divergence
            div_at = -1
            for i in range(n):
                if T[i] != truth_T[i]:
                    div_at = i
                    break
            print(f"  rle_div={rle_div}: decoded {len(T)}, matches {matches}/{n}, first div at sample {div_at}")
 if __name__ == "__main__":
    main()
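Under the RLE hypothesis tested here, a `00 NN` marker contributes `NN // rle_div` repeats of the current value (zero deltas). Isolated as a hypothetical helper, the three candidate divisors give these run lengths for `00 10`:

```python
def expand_zero_run(cur: int, tag_lo: int, rle_div: int):
    """Samples a `00 NN` marker would add: NN // rle_div copies of `cur`."""
    return [cur] * (tag_lo // rle_div)


for div in (4, 2, 1):
    print(div, len(expand_zero_run(-3, 0x10, div)))
# 4 4
# 2 8
# 1 16
```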
@@ -0,0 +1,71 @@
 """Test: does the second '20 NN' block in SS0 continue Tran samples?"""
 import sys
 sys.path.insert(0, ".")
 from analysis.load_bundle import _parse_txt
 from minimateplus.waveform_codec import walk_body, find_data_start
 def s4(n):
    return n if n < 8 else n - 16
 def i8(b):
    return b if b < 128 else b - 256
 def main():
    stem = "M529LL1A.SS0"
    path = f"tests/fixtures/5-11-26/{stem}"
    with open(path, "rb") as f:
        body = f.read()[43:-26]
    _, samples = _parse_txt(path + ".TXT")
    truth_T_16 = [round(v * 200) for v in samples["Tran"]]
    # Preamble
    T0 = int.from_bytes(body[3:5], "big", signed=True)
    T1 = int.from_bytes(body[5:7], "big", signed=True)
    # Walk blocks
    start = find_data_start(body)
    blocks = walk_body(body, start)
    print(f"=== {stem} ===  T[0]={T0} T[1]={T1}")
    # Hypothesis: Tran continues through ALL 10 NN and 20 NN blocks
    # in order, until the next 40 02 segment header (which resets).
    T = [T0, T1]
    cur = T1
    decoded_count = 2  # T[0], T[1] from preamble
    for bi, blk in enumerate(blocks):
        if blk.tag_hi == 0x10:
            for byte in blk.data:
                for nib in ((byte >> 4) & 0xF, byte & 0xF):
                    cur += s4(nib)
                    T.append(cur)
                    decoded_count += 1
        elif blk.tag_hi == 0x20:
            for byte in blk.data:
                cur += i8(byte)
                T.append(cur)
                decoded_count += 1
        elif blk.tag_hi == 0x40:
            # Segment header — stop here for this test
            break
        # 00 and 30 NN don't contribute to Tran (in this hypothesis)
    # Compare to truth
    print(f"  Decoded {len(T)} T samples up to first 40 02")
    matches = sum(1 for i in range(min(len(T), len(truth_T_16))) if T[i] == truth_T_16[i])
    print(f"  Matches in first {min(len(T), len(truth_T_16))}: {matches}")
    # Print first divergence
    for i in range(min(len(T), len(truth_T_16))):
        if T[i] != truth_T_16[i]:
            print(f"  First divergence: sample {i}: pred={T[i]}, truth={truth_T_16[i]}")
            # Show context
            lo = max(0, i - 3)
            print(f"    pred  [{lo}:{i+5}]: {T[lo:i+5]}")
            print(f"    truth [{lo}:{i+5}]: {truth_T_16[lo:i+5]}")
            break
 if __name__ == "__main__":
    main()
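The two delta encodings this hypothesis relies on (`10 NN` as signed 4-bit nibbles, high nibble first; `20 NN` as signed int8 bytes) can be exercised in isolation. A sketch with a hypothetical `accumulate` helper that mirrors the script's loops:

```python
def s4(n: int) -> int:
    return n if n < 8 else n - 16


def i8(b: int) -> int:
    return b if b < 128 else b - 256


def accumulate(cur: int, tag_hi: int, data: bytes):
    """Apply one delta block to `cur`; return (new_cur, samples).

    0x10: two signed 4-bit deltas per byte (high nibble first)
    0x20: one signed 8-bit delta per byte
    """
    if tag_hi == 0x10:
        deltas = [s4(n) for byte in data for n in (byte >> 4, byte & 0xF)]
    elif tag_hi == 0x20:
        deltas = [i8(byte) for byte in data]
    else:
        raise ValueError(f"unsupported tag 0x{tag_hi:02x}")
    samples = []
    for d in deltas:
        cur += d
        samples.append(cur)
    return cur, samples


cur, out = accumulate(10, 0x10, bytes([0x1F, 0x70]))
print(cur, out)  # 17 [11, 10, 17, 17]
cur, out = accumulate(cur, 0x20, bytes([0x80, 0x05]))
print(cur, out)  # -106 [-111, -106]
```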
@@ -0,0 +1,67 @@
 """Try various nibble-level channel interleavings to find which one matches truth."""
 import sys
 sys.path.insert(0, ".")
 from analysis.load_bundle import load_bundle
 def s4(n):
    return n if n < 8 else n - 16
 def run_decoder(body, layout, skip, n_channels=4):
    """layout: function nibble_index -> channel_index. Returns list-of-lists per channel."""
    out = [[] for _ in range(n_channels)]
    cur = [0] * n_channels
    nibbles = []
    for byte in body[skip:]:
        nibbles.append((byte >> 4) & 0xF)
        nibbles.append(byte & 0xF)
    for i, n in enumerate(nibbles):
        ch = layout(i)
        cur[ch] += s4(n)
        out[ch].append(cur[ch])
    return out
 def cmp(pred, truth, n=24):
    n = min(n, len(pred), len(truth))
    return [(pred[i], truth[i]) for i in range(n)]
 def main():
    b = load_bundle("event-c")
    truth_T = [round(v * 200) for v in b.samples["Tran"]]
    truth_V = [round(v * 200) for v in b.samples["Vert"]]
    truth_L = [round(v * 200) for v in b.samples["Long"]]
    print(f"T truth[0:10]: {truth_T[:10]}")
    print(f"V truth[0:10]: {truth_V[:10]}")
    print(f"L truth[0:10]: {truth_L[:10]}")
    # Try several nibble->channel layouts (4 channels)
    layouts = {
        "interleaved TVLM (0,1,2,3,0,1,2,3,...)": lambda i: i % 4,
        "interleaved VLMT": lambda i: (i + 1) % 4,
        "interleaved LMTV": lambda i: (i + 2) % 4,
        "interleaved MTVL": lambda i: (i + 3) % 4,
        # Same mapping as "interleaved TVLM": nibbles 0,1 come from byte0 (high, low), nibbles 2,3 from byte1.
        "byte-based TV LM TV LM (high T low V byte0; high L low M byte1)": lambda i: i % 4,
        # "chunks of 8 nibbles per channel": each channel gets 8 nibbles in a row
        "chunks-8 TVLM": lambda i: (i // 8) % 4,
        "chunks-16 TVLM": lambda i: (i // 16) % 4,
        # planar (full channel sequential): needs the total length, not implemented; skipped below
        "planar T(0..N) V(N..2N) L(2N..3N) M(3N..4N)": None,
    }
    for label, layout_fn in layouts.items():
        if layout_fn is None:
            continue
        for skip in (0, 4, 7, 8, 9, 11, 14):
            out = run_decoder(b.body, layout_fn, skip)
            # Print the first 10 cumulative values for T, V and L
            print(f"  skip={skip:2}  {label}")
            print(f"    T_cum[0:10]: {out[0][:10]}")
            print(f"    V_cum[0:10]: {out[1][:10]}")
            print(f"    L_cum[0:10]: {out[2][:10]}")
 if __name__ == "__main__":
    main()
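Spelling out the byte-based layout nibble by nibble shows it sends each nibble to the same channel as "interleaved TVLM" (0=T, 1=V, 2=L, 3=M, matching the `out[...]` indices printed above), so that layout is effectively tested twice. The function names here are illustrative:

```python
def interleaved_tvlm(i: int) -> int:
    """Nibble i goes to channel i % 4 (0=T, 1=V, 2=L, 3=M)."""
    return i % 4


def byte_based(i: int) -> int:
    """byte0: high nibble -> T, low -> V; byte1: high -> L, low -> M; repeat."""
    byte_idx, is_low = divmod(i, 2)
    return 2 * (byte_idx % 2) + is_low


print([byte_based(i) for i in range(8)])                                # [0, 1, 2, 3, 0, 1, 2, 3]
print(all(byte_based(i) == interleaved_tvlm(i) for i in range(4096)))  # True
```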
@@ -0,0 +1,73 @@
 """Try decoding body as 4-bit signed nibble deltas, 4-channel round-robin."""
 import sys
 sys.path.insert(0, ".")
 from analysis.load_bundle import load_bundle
 CHANNELS = ("Tran", "Vert", "Long", "MicL")
 def s4(n):
    """Sign-extend a 4-bit unsigned to int (0..7 → 0..7, 8..F → -8..-1)."""
    return n if n < 8 else n - 16
 def decode_nibbles(body: bytes, skip_bytes: int = 7, n_channels: int = 4):
    """Read body as 2 nibbles per byte; accumulate as deltas for n_channels round-robin."""
    out = [[] for _ in range(n_channels)]
    cur = [0] * n_channels
    ch = 0
    nibbles = []
    for byte in body[skip_bytes:]:
        nibbles.append((byte >> 4) & 0xF)
        nibbles.append(byte & 0xF)
    for n in nibbles:
        cur[ch] += s4(n)
        out[ch].append(cur[ch])
        ch = (ch + 1) % n_channels
    return out
 def cmp_to_truth(pred, truth, scale=16):
    """Compare predicted ints (in 16-count units) to truth (in 16-count units = txt * 200).
    Return (max_abs_err, mean_abs_err, n_compared).
    """
    n = min(len(pred), len(truth))
    errs = []
    for i in range(n):
        p = pred[i]
        t = truth[i]
        errs.append(abs(p - t))
    if not errs:
        return None
    return (max(errs), sum(errs) / len(errs), n)
 def main():
    for name in ("event-a", "event-c"):
        b = load_bundle(name)
        # Convert TXT samples (in/s) to integer steps of 0.005 in/s (txt * 200).
        # With 1 ADC count ≈ 0.000305 in/s, one 0.005 in/s step is ≈ 16.4 counts,
        # so calling these "16-count units" is approximate. The TXT export only has
        # 0.005 in/s resolution, so txt * 200 is the finest comparison available.
        truth_in_16 = {ch: [round(v * 200) for v in b.samples[ch]] for ch in CHANNELS[:3]}
        # MicL is in dB, skip for now
        # Try decoder with skip_bytes = 7
        decoded = decode_nibbles(b.body, skip_bytes=7, n_channels=4)
        print(f"\n=== {name} ===")
        print(f"  body={len(b.body)}, nibbles={2*(len(b.body)-7)}, samples_per_ch={len(decoded[0])}")
        print(f"  truth samples per ch: {len(truth_in_16['Tran'])}")
        # Print first 24 of each
        for i, chan in enumerate(CHANNELS):
            pred_first = decoded[i][:24]
            if chan in truth_in_16:
                truth_first = truth_in_16[chan][:24]
                print(f"  {chan} pred: {pred_first}")
                print(f"  {chan} truth: {truth_first}")
            else:
                print(f"  {chan} pred: {pred_first}  (truth in dB, skipped)")
 if __name__ == "__main__":
    main()
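The unit arithmetic behind `txt * 200` can be checked directly. This sketch takes the 0.000305 in/s-per-count figure from the script's comment; 10 / 32768 ≈ 0.000305 would match a ±10 in/s range over a signed 16-bit word, but that range is an assumption, not something the source states. `txt_to_steps` is a hypothetical helper.

```python
COUNT_IN_S = 0.000305    # in/s per ADC count, as stated in the script's comment
TXT_STEP_IN_S = 0.005    # resolution of the BW ASCII (TXT) export

counts_per_step = TXT_STEP_IN_S / COUNT_IN_S
print(round(counts_per_step, 2))  # 16.39


def txt_to_steps(v_in_s: float) -> int:
    """TXT velocity (in/s) -> integer 0.005 in/s steps (what round(v * 200) computes)."""
    return round(v_in_s / TXT_STEP_IN_S)


print(txt_to_steps(0.015), txt_to_steps(-0.005))  # 3 -1
```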
@@ -0,0 +1,32 @@
 """Verify decode_waveform_v2 against BW ASCII truth for all fixtures."""
 import sys
 sys.path.insert(0, ".")
 from analysis.load_bundle import _parse_txt
 from minimateplus.waveform_codec import decode_waveform_v2
 def main():
    for stem in ("M529LL1A.SP0", "M529LL1A.SS0", "M529LL1A.SV0",
                 "M529LL1L.JQ0", "M529LL1L.V70"):
        path = f"tests/fixtures/5-11-26/{stem}"
        with open(path, "rb") as f:
            body = f.read()[43:-26]
        _, samples = _parse_txt(path + ".TXT")
        decoded = decode_waveform_v2(body)
        if decoded is None:
            print(f"{stem}: decoder returned None")
            continue
        print(f"\n=== {stem} ===")
        for ch in ("Tran", "Vert", "Long"):
            truth = [round(v * 200) for v in samples[ch]]
            pred = decoded[ch]
            n = min(len(pred), len(truth))
            matches = sum(1 for i in range(n) if pred[i] == truth[i])
            div = next((i for i in range(n) if pred[i] != truth[i]), -1)
            print(f"  {ch}: decoded={len(pred):>5}  truth={len(truth):>5}  "
                  f"matches={matches:>5}/{n:<5}  first div={div}")
 if __name__ == "__main__":
    main()
@@ -0,0 +1,55 @@
 """Run decode_waveform_v2 against the 5-8-26 quiet bundle to test the
 'quiet events should decode fully' hypothesis."""
 import os, sys
 sys.path.insert(0, ".")
 from minimateplus.waveform_codec import decode_waveform_v2, walk_body, find_data_start
 from analysis.load_bundle import _parse_txt
 def main():
    base = "tests/fixtures/decode-re-5-8-26"
    for evt in sorted(os.listdir(base)):
        folder = os.path.join(base, evt)
        if not os.path.isdir(folder):
            continue
        # Find the binary (not .TXT)
        bin_name = next(
            (f for f in os.listdir(folder) if not f.endswith(".TXT")),
            None,
        )
        if not bin_name:
            continue
        bin_path = os.path.join(folder, bin_name)
        txt_path = bin_path + ".TXT"
        if not os.path.exists(txt_path):
            # Sometimes the TXT name differs slightly
            for f in os.listdir(folder):
                if f.endswith(".TXT"):
                    txt_path = os.path.join(folder, f)
                    break
        with open(bin_path, "rb") as f:
            body = f.read()[43:-26]
        decoded = decode_waveform_v2(body)
        _, samples = _parse_txt(txt_path)
        # Count 30 NN blocks
        blocks = walk_body(body, find_data_start(body))
        n_30 = sum(1 for b in blocks if b.tag_hi == 0x30)
        n_40 = sum(1 for b in blocks if b.tag_hi == 0x40)
        print(f"\n=== {evt} === body={len(body)}  segments={n_40}  '30 NN' blocks={n_30}")
        if decoded is None:
            print("  decoder returned None")
            continue
        for ch in ("Tran", "Vert", "Long"):
            truth = [round(v * 200) for v in samples[ch]]
            pred = decoded[ch]
            n = min(len(pred), len(truth))
            matches = sum(1 for i in range(n) if pred[i] == truth[i])
            div = next((i for i in range(n) if pred[i] != truth[i]), -1)
            print(f"  {ch}: decoded={len(pred):>5}  truth={len(truth):>5}  "
                  f"matches={matches:>5}/{n:<5}  first div={div}")
 if __name__ == "__main__":
    main()
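Both verification scripts compute the same three numbers per channel. Factored into a hypothetical `compare` helper, it becomes clear that `-1` means "no divergence within the compared prefix": a decoded stream that is merely shorter than truth still reports `-1`, which is why the decoded and truth lengths are printed separately.

```python
def compare(pred, truth):
    """Return (matches, compared, first_divergence) over the common prefix."""
    n = min(len(pred), len(truth))
    matches = sum(1 for p, t in zip(pred, truth) if p == t)
    first_div = next((i for i in range(n) if pred[i] != truth[i]), -1)
    return matches, n, first_div


print(compare([0, 1, 3, 2], [0, 1, 4, 2, 5]))  # (3, 4, 2)
print(compare([1, 2], [1, 2, 3]))              # (2, 2, -1)
```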
@@ -0,0 +1,71 @@
 """Verify: preamble[3:7] = Tran[0], Tran[1] as int16 BE in 16-count units.
 And first 20/10 NN block = Tran deltas starting at sample 2.
 """
 import os, sys
 sys.path.insert(0, ".")
 from analysis.load_bundle import _parse_txt
 from minimateplus.waveform_codec import walk_body, find_data_start
 def s4(n):
    return n if n < 8 else n - 16
 def i8(b):
    return b if b < 128 else b - 256
 def main():
    for stem in ("M529LL1A.SP0", "M529LL1A.SS0", "M529LL1A.SV0"):
        path = f"tests/fixtures/5-11-26/{stem}"
        with open(path, "rb") as f:
            raw = f.read()
        body = raw[43:-26]
        _, samples = _parse_txt(path + ".TXT")
        truth_T_16 = [round(v * 200) for v in samples["Tran"]]
        # Preamble parse
        T0_pre = int.from_bytes(body[3:5], "big", signed=True)
        T1_pre = int.from_bytes(body[5:7], "big", signed=True)
        print(f"\n=== {stem} ===")
        print(f"  Preamble T[0]={T0_pre} (truth {truth_T_16[0]})  T[1]={T1_pre} (truth {truth_T_16[1]})  match={T0_pre==truth_T_16[0] and T1_pre==truth_T_16[1]}")
        # First block
        start = find_data_start(body)
        blocks = walk_body(body, start)
        if not blocks:
            print("  no blocks found")
            continue
        # Assume first block = Tran deltas from sample 2
        first = blocks[0]
        T = [T0_pre, T1_pre]
        cur_T = T1_pre
        if first.tag_hi == 0x10:
            # Nibble pairs
            for byte in first.data:
                for nib in ((byte >> 4) & 0xF, byte & 0xF):
                    cur_T += s4(nib)
                    T.append(cur_T)
        elif first.tag_hi == 0x20:
            # int8 per byte
            for byte in first.data:
                cur_T += i8(byte)
                T.append(cur_T)
        # Compare against truth
        n_check = min(len(T), len(truth_T_16))
        match_count = sum(1 for i in range(n_check) if T[i] == truth_T_16[i])
        print(f"  First block type=0x{first.tag_hi:02x} NN=0x{first.tag_lo:02x} len={len(first.data)} → {len(T)} T samples decoded")
        print(f"  Tran predicted[0:10]: {T[:10]}")
        print(f"  Tran truth    [0:10]: {truth_T_16[:10]}")
        print(f"  Matches in first {n_check}: {match_count} / {n_check}")
        # Show where it diverges
        for i in range(n_check):
            if T[i] != truth_T_16[i]:
                print(f"  First divergence: sample {i}: pred={T[i]}, truth={truth_T_16[i]}")
                break
 if __name__ == "__main__":
    main()
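The preamble hypothesis verified above (`00 02 00` magic, then Tran[0] and Tran[1] as big-endian int16) fits in a few lines. A sketch with a hypothetical `parse_preamble` and made-up bytes:

```python
def parse_preamble(body: bytes):
    """Check the 00 02 00 magic and return (Tran[0], Tran[1]) as int16 BE."""
    if len(body) < 7 or body[0:3] != b"\x00\x02\x00":
        raise ValueError("body does not start with the 00 02 00 magic")
    t0 = int.from_bytes(body[3:5], "big", signed=True)
    t1 = int.from_bytes(body[5:7], "big", signed=True)
    return t0, t1


print(parse_preamble(bytes.fromhex("000200fffe0003")))  # (-2, 3)
```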
@@ -0,0 +1,20 @@
 """Walk blocks of the new 5-11-26 events and look at what comes after Tran block."""
 import sys
 sys.path.insert(0, ".")
 from minimateplus.waveform_codec import walk_body, find_data_start
 def main():
    for stem in ("M529LL1A.SP0", "M529LL1A.SS0", "M529LL1A.SV0"):
        with open(f"tests/fixtures/5-11-26/{stem}", "rb") as f:
            raw = f.read()
        body = raw[43:-26]
        start = find_data_start(body)
        blocks = walk_body(body, start)
        print(f"\n=== {stem} === body={len(body)} start={start} blocks walked={len(blocks)}")
        for i, b in enumerate(blocks[:20]):
            print(f"  block[{i:>2}] @ {b.offset:>5} tag={b.tag_hi:02x} NN=0x{b.tag_lo:02x}({b.tag_lo}) len={b.length} data[:24]={b.data[:24].hex(' ')}")
 if __name__ == "__main__":
    main()
@@ -0,0 +1,44 @@
 """Walk the body assuming chunks delimited by 0x10 NN tags. Print each chunk's structure."""
 import sys
 sys.path.insert(0, ".")
 from analysis.load_bundle import load_bundle
 def walk(body: bytes, start_offset: int = 7, max_chunks: int = 30):
    """Return every offset where 0x10 is followed by a multiple-of-4 byte (candidate `10 NN` tags).
    Not called by main() below; max_chunks is currently unused."""
    chunks = []
    i = start_offset
    while i < len(body) - 1:
        # Candidate `10 NN` tag: NN must be a multiple of 4. A 0x10 byte that is really data is not filtered out.
        if body[i] == 0x10 and (body[i+1] % 4 == 0):
            chunks.append(i)
        i += 1
    return chunks
 def main():
    for name in ("event-c", "event-d"):
        b = load_bundle(name)
        body = b.body
        positions = []
        i = 7  # skip 7-byte preamble
        while i < len(body) - 1:
            if body[i] == 0x10 and body[i+1] % 4 == 0 and body[i+1] > 0:
                positions.append(i)
                i += 2  # skip past tag
            else:
                i += 1
        print(f"\n=== {name} ===  body={len(body)}, total `10 NN` (NN%4==0, NN>0) tags: {len(positions)}")
        # Print first 30 chunks: show position, NN, gap to next tag
        for k in range(min(30, len(positions))):
            pos = positions[k]
            NN = body[pos + 1]
            next_pos = positions[k+1] if k+1 < len(positions) else len(body)
            gap = next_pos - pos
            data_bytes = body[pos+2 : next_pos]
            print(f"  chunk[{k:>3}] @ {pos:>5}  NN=0x{NN:02x} ({NN:>3}, NN/2={NN//2})  gap={gap:>3}  "
                  f"data={data_bytes[:24].hex(' ')}{'...' if len(data_bytes) > 24 else ''}")
 if __name__ == "__main__":
    main()
@@ -0,0 +1,50 @@
 """Deterministic chunk walker: each chunk = [10 NN][NN/2 bytes data][2 bytes trailer]."""
 import sys
 sys.path.insert(0, ".")
 from analysis.load_bundle import load_bundle
 def walk_chunks(body: bytes, start: int = 7):
    """Yield (offset, NN, data_bytes, trailer_bytes) tuples."""
    i = start
    while i + 1 < len(body):
        if body[i] != 0x10:
            break
        NN = body[i + 1]
        if NN == 0 or NN > 0x80 or NN % 4 != 0:
            break
        chunk_len = NN // 2 + 4
        if i + chunk_len > len(body):
            break
        data = bytes(body[i + 2 : i + 2 + NN // 2])
        trailer = bytes(body[i + 2 + NN // 2 : i + chunk_len])
        yield (i, NN, data, trailer)
        i += chunk_len
 def main():
    for name in ("event-c", "event-d", "event-a", "event-b"):
        b = load_bundle(name)
        body = b.body
        chunks = list(walk_chunks(body))
        print(f"\n=== {name} ===  body={len(body)}  N_samples={len(b.samples['Tran'])}")
        print(f"  chunks parsed: {len(chunks)}")
        if chunks:
            last = chunks[-1]
            end_of_walk = last[0] + last[1] // 2 + 4
            print(f"  walk ended at offset {end_of_walk} (= {len(body) - end_of_walk} bytes from end)")
            # Stats
            total_data_bytes = sum(len(c[2]) for c in chunks)
            print(f"  total data bytes: {total_data_bytes}, total nibbles: {2*total_data_bytes}")
            if name in ("event-c", "event-d"):
                ratio = (2 * total_data_bytes) / (len(b.samples['Tran']) * 4)
                print(f"  nibbles per (sample × channel): {ratio:.3f}")
            # Last byte of each chunk's 2-byte trailer (collected but not printed yet)
            trailer_sums = [c[3][-1] if c[3] else None for c in chunks]
            print(f"  first 10 chunks: {[(c[0], c[1], c[3].hex()) for c in chunks[:10]]}")
            # Print last 10 chunks (likely transition to trailer)
            print(f"  last 10 chunks: {[(c[0], c[1], c[3].hex()) for c in chunks[-10:]]}")
 if __name__ == "__main__":
    main()
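Under this chunk hypothesis every chunk occupies NN/2 + 4 bytes (2-byte tag, NN/2 data bytes, 2-byte trailer). Running the same walking rule on a hand-made byte string (synthetic, not from a fixture) makes the arithmetic concrete; the trailing `ff` is left unconsumed because it cannot start a chunk:

```python
def walk_chunks(body: bytes, start: int = 0):
    """Yield (offset, NN, data, trailer) for [10 NN][NN/2 data bytes][2-byte trailer] chunks."""
    i = start
    while i + 1 < len(body):
        if body[i] != 0x10:
            break
        nn = body[i + 1]
        if nn == 0 or nn > 0x80 or nn % 4 != 0:
            break
        chunk_len = 2 + nn // 2 + 2
        if i + chunk_len > len(body):
            break
        yield i, nn, body[i + 2 : i + 2 + nn // 2], body[i + 2 + nn // 2 : i + chunk_len]
        i += chunk_len


synthetic = bytes.fromhex("1004" "abcd" "0001" "1008" "01020304" "0002" "ff")
for off, nn, data, trailer in walk_chunks(synthetic):
    print(off, nn, data.hex(), trailer.hex())
# 0 4 abcd 0001
# 6 8 01020304 0002
```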
@@ -0,0 +1,51 @@
 """Walk chunks; auto-detect preamble length by finding first 10 NN."""
 import sys
 sys.path.insert(0, ".")
 from analysis.load_bundle import load_bundle
 def walk_chunks(body, start, max_NN=0x80):
    chunks = []
    i = start
    while i + 1 < len(body):
        if body[i] != 0x10:
            break
        NN = body[i + 1]
        if NN == 0 or NN > max_NN or NN % 4 != 0:
            break
        chunk_len = NN // 2 + 4
        if i + chunk_len > len(body):
            break
        data = bytes(body[i + 2 : i + 2 + NN // 2])
        trailer = bytes(body[i + 2 + NN // 2 : i + chunk_len])
        chunks.append((i, NN, data, trailer))
        i += chunk_len
    return chunks, i
 def find_first_chunk_start(body):
    """Locate first byte that begins a `10 NN` chunk (NN ∈ multiples of 4, 4..0x7C)."""
    for i in range(min(20, len(body) - 1)):
        if body[i] == 0x10 and body[i + 1] % 4 == 0 and 0 < body[i + 1] <= 0x7C:
            return i
    return -1
 def main():
    for name in ("event-c", "event-d", "event-a", "event-b"):
        b = load_bundle(name)
        body = b.body
        start = find_first_chunk_start(body)
        if start < 0:
            print(f"\n=== {name} ===  no `10 NN` chunk start in the first 20 bytes")
            continue
        chunks, end = walk_chunks(body, start)
        print(f"\n=== {name} ===  body={len(body)}  N_samples={len(b.samples['Tran'])}  start={start}")
        print(f"  chunks parsed: {len(chunks)}, walk ended at {end}")
        if chunks:
            print(f"  first 5 chunks: {[(c[0], c[1], c[3].hex()) for c in chunks[:5]]}")
            print(f"  last 5 chunks: {[(c[0], c[1], c[3].hex()) for c in chunks[-5:]]}")
            print(f"  bytes around end of walk: {body[end-4:end+12].hex(' ')}")
        else:
            print(f"  bytes at start: {body[start:start+16].hex(' ')}")
 if __name__ == "__main__":
    main()
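A self-contained sketch of the `10 NN` chunk hypothesis that `walk_chunks()` and `find_first_chunk_start()` test above, run on a hand-built byte string instead of a real bundle. Like the script, it assumes a chunk is a 2-byte tag, NN/2 data bytes (NN nibbles), then a 2-byte trailer, for NN/2 + 4 bytes in total. The helper names and sample bytes are illustrative only.

```python
def find_chunk_start(body: bytes, search_len: int = 20) -> int:
    """Index of the first `10 NN` tag with NN a multiple of 4 in 4..0x7C, else -1."""
    for i in range(min(search_len, len(body) - 1)):
        nn = body[i + 1]
        if body[i] == 0x10 and nn % 4 == 0 and 0 < nn <= 0x7C:
            return i
    return -1


def walk_10nn_chunks(body: bytes, start: int, max_nn: int = 0x80):
    """Walk [10 NN][NN/2 data bytes][2-byte trailer] chunks; stop at the first non-chunk."""
    chunks = []
    i = start
    while i + 1 < len(body):
        nn = body[i + 1]
        if body[i] != 0x10 or nn == 0 or nn > max_nn or nn % 4 != 0:
            break
        chunk_len = nn // 2 + 4          # 2 tag bytes + NN/2 data bytes + 2 trailer bytes
        if i + chunk_len > len(body):
            break                         # truncated chunk: leave it unparsed
        data = body[i + 2 : i + 2 + nn // 2]
        trailer = body[i + 2 + nn // 2 : i + chunk_len]
        chunks.append((i, nn, data, trailer))
        i += chunk_len
    return chunks, i


# Hypothetical body: 3-byte preamble, a `10 04` chunk, a `10 08` chunk, one junk byte.
body = bytes([0xAA, 0xBB, 0xCC,
              0x10, 0x04, 0x12, 0x34, 0xEE, 0x01,
              0x10, 0x08, 0x01, 0x02, 0x03, 0x04, 0xEE, 0x02,
              0xFF])
start = find_chunk_start(body)
chunks, end = walk_10nn_chunks(body, start)
print(start, [(off, nn) for off, nn, _, _ in chunks], end)  # 3 [(3, 4), (9, 8)] 17
```

The walk ends at offset 17 because only one byte remains, so no further 2-byte tag can be read.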
@@ -0,0 +1,75 @@
 """
 Walker v4: alternate [10 NN] data chunks and [00 NN] (or other) marker tags.
 Hypothesis:
 - [10 NN]: data block, length NN/2 + 2 bytes (2-byte tag + NN/2 bytes data)
 - [00 NN]: 2-byte marker block (no data)
 - [20/30/40 NN]: special blocks with type-dependent length
 """
 import sys
 sys.path.insert(0, ".")
 from analysis.load_bundle import load_bundle
 def walk(body, start):
    i = start
    blocks = []
    while i + 1 < len(body):
        t0 = body[i]
        t1 = body[i + 1]
        if t0 == 0x10 and t1 % 4 == 0 and 0 < t1 <= 0x80:
            # data chunk: length NN/2 + 2
            length = t1 // 2 + 2
            blocks.append((i, "10", t1, bytes(body[i + 2 : i + length]), length))
            i += length
        elif t0 == 0x00 and t1 % 4 == 0:
            # 2-byte marker
            blocks.append((i, "00", t1, b"", 2))
            i += 2
        elif t0 == 0x20 and t1 % 4 == 0:
            # type 2 — try length 2+t1/2 (similar to 10) OR fixed
            length = t1 // 2 + 2
            blocks.append((i, "20", t1, bytes(body[i + 2 : i + length]), length))
            i += length
        elif t0 == 0x30 and t1 % 4 == 0:
            length = t1 // 2 + 2
            blocks.append((i, "30", t1, bytes(body[i + 2 : i + length]), length))
            i += length
        elif t0 == 0x40 and t1 == 0x02:
            # Special "footer transition" block — try fixed 22 bytes
            length = 22
            blocks.append((i, "40", t1, bytes(body[i + 2 : i + length]), length))
            i += length
        else:
            # Unknown tag — stop
            blocks.append((i, "??", t0, bytes(body[i:i+8]), 0))
            break
    return blocks, i
 def main():
    for name in ("event-c", "event-d", "event-a", "event-b"):
        b = load_bundle(name)
        body = b.body
        # Auto-detect start
        for s in range(15):
            if body[s] == 0x10 and body[s+1] % 4 == 0 and 0 < body[s+1] <= 0x80:
                start = s
                break
        else:
            start = 7
        blocks, end = walk(body, start)
        # Categorize blocks by tag type (bb, not b, so the bundle name isn't shadowed)
        from collections import Counter
        types = Counter(bb[1] for bb in blocks)
        print(f"\n=== {name} === body={len(body)} N={len(b.samples['Tran'])}  start={start}")
        print(f"  total blocks: {len(blocks)}, walk ended at {end}/{len(body)}")
        print(f"  type counts: {dict(types)}")
        # Print last 5 blocks
        print(f"  last 5 blocks: {[(bb[0], bb[1], bb[2]) for bb in blocks[-5:]]}")
        if end < len(body):
            print(f"  bytes at end: {body[end:end+24].hex(' ')}")
 if __name__ == "__main__":
    main()
@@ -0,0 +1,83 @@
 """
 Walker v5: flexible NN range and multiple block-type lengths.
 Hypothesis:
 - [10 NN]: 4-bit-delta data block, length = NN/2 + 2
 - [20 NN]: 8-bit-literal data block, length = NN + 2
 - [00 NN]: 2-byte marker (no payload)
 - [30 NN]: trailer/summary block, length = NN*4
 - [40 NN]: footer-marker block, fixed 22 bytes
 """
 import sys
 sys.path.insert(0, ".")
 from analysis.load_bundle import load_bundle
 from collections import Counter
 def walk(body, start, max_blocks=10000):
    i = start
    blocks = []
    while i + 1 < len(body) and len(blocks) < max_blocks:
        t0 = body[i]
        t1 = body[i + 1]
        if t0 == 0x10 and t1 % 4 == 0 and 0 < t1 <= 0xFC:
            length = t1 // 2 + 2
            if i + length > len(body):
                break
            data = bytes(body[i + 2 : i + length])
            blocks.append((i, "10", t1, data, length))
            i += length
        elif t0 == 0x20 and t1 % 4 == 0 and 0 < t1 <= 0xFC:
            length = t1 + 2
            if i + length > len(body):
                break
            data = bytes(body[i + 2 : i + length])
            blocks.append((i, "20", t1, data, length))
            i += length
        elif t0 == 0x00 and t1 % 4 == 0:
            # 2-byte marker
            blocks.append((i, "00", t1, b"", 2))
            i += 2
        elif t0 == 0x30 and t1 % 4 == 0:
            length = t1 * 4
            if i + length > len(body):
                break
            data = bytes(body[i + 2 : i + length])
            blocks.append((i, "30", t1, data, length))
            i += length
        elif t0 == 0x40 and t1 == 0x02:
            length = 22
            if i + length > len(body):
                break
            data = bytes(body[i + 2 : i + length])
            blocks.append((i, "40", t1, data, length))
            i += length
        else:
            blocks.append((i, "??", t0, bytes(body[i:i+8]), 0))
            break
    return blocks, i
 def main():
    for name in ("event-c", "event-d", "event-a", "event-b"):
        b = load_bundle(name)
        body = b.body
        # Auto-detect the first `10 NN` tag; fall back to offset 7 if none is found
        for s in range(15):
            if body[s] == 0x10 and body[s+1] % 4 == 0 and 0 < body[s+1] <= 0xFC:
                start = s
                break
        else:
            start = 7
        blocks, end = walk(body, start)
        types = Counter(bb[1] for bb in blocks)
        print(f"\n=== {name} === body={len(body)} N={len(b.samples['Tran'])}  start={start}")
        print(f"  total blocks: {len(blocks)}, walk ended at {end}/{len(body)}")
        print(f"  type counts: {dict(types)}")
        if blocks and blocks[-1][1] == "??":
            print(f"  stopped at byte: 0x{blocks[-1][2]:02x}, prev 5 blocks: {[(bb[0], bb[1], bb[2]) for bb in blocks[-6:-1]]}")
        # Sum payload sizes by type
        payload_sizes = {t: sum(len(bb[3]) for bb in blocks if bb[1] == t) for t in types}
        print(f"  payload bytes by type: {payload_sizes}")
 if __name__ == "__main__":
    main()
@@ -0,0 +1,68 @@
 """
 Walker v6: handle 40 02 blocks correctly (length 20).
 Block formats:
 - [10 NN]: 4-bit nibble delta data, length = NN/2 + 2
 - [20 NN]: int8 literal data, length = NN + 2
 - [00 NN]: 2-byte marker
 - [30 NN]: trailer/summary block, length = NN*4
 - [40 02]: segment header, fixed length 20
 """
 import sys
 sys.path.insert(0, ".")
 from analysis.load_bundle import load_bundle
 from collections import Counter
 def walk(body, start, max_blocks=10000):
    i = start
    blocks = []
    while i + 1 < len(body) and len(blocks) < max_blocks:
        t0 = body[i]
        t1 = body[i + 1]
        if t0 == 0x10 and t1 % 4 == 0 and 0 < t1 <= 0xFC:
            length = t1 // 2 + 2
        elif t0 == 0x20 and t1 % 4 == 0 and 0 < t1 <= 0xFC:
            length = t1 + 2
        elif t0 == 0x00 and t1 % 4 == 0:
            length = 2
        elif t0 == 0x30 and t1 % 4 == 0 and 0 < t1 <= 0x10:
            length = t1 * 4
        elif t0 == 0x40 and t1 == 0x02:
            length = 20
        else:
            blocks.append((i, "??", t0, bytes(body[i:i+8]), 0))
            break
        if i + length > len(body):
            break
        data = bytes(body[i + 2 : i + length])
        blocks.append((i, f"{t0:02x}", t1, data, length))
        i += length
    return blocks, i
 def main():
    for name in ("event-c", "event-d", "event-a", "event-b"):
        b = load_bundle(name)
        body = b.body
        # Auto-detect the first `10 NN` tag; fall back to offset 7 if none is found
        for s in range(15):
            if body[s] == 0x10 and body[s+1] % 4 == 0 and 0 < body[s+1] <= 0xFC:
                start = s
                break
        else:
            start = 7
        blocks, end = walk(body, start)
        types = Counter(bb[1] for bb in blocks)
        print(f"\n=== {name} === body={len(body)} N={len(b.samples['Tran'])}  start={start}")
        print(f"  total blocks: {len(blocks)}, walk ended at {end}/{len(body)}")
        print(f"  type counts: {dict(types)}")
        if blocks and blocks[-1][1] == "??":
            print(f"  stopped at byte: 0x{blocks[-1][2]:02x} at offset {blocks[-1][0]}")
            print(f"  prev 5 blocks: {[(bb[0], bb[1], bb[2]) for bb in blocks[-6:-1]]}")
            print(f"  bytes around stop: {body[end-4:end+24].hex(' ')}")
        # Sum payload sizes by block type
        payload_sizes = {t: sum(len(bb[3]) for bb in blocks if bb[1] == t) for t in types}
        print(f"  payload bytes by type: {payload_sizes}")
 if __name__ == "__main__":
    main()
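The v6 rules reduce to a lookup from the two tag bytes to a block length. This sketch pulls that decision into a standalone function (a refactoring for illustration, not code from the repo) and walks a hand-built body. The block contents are arbitrary placeholders, and the rules themselves remain the script's working hypothesis rather than a confirmed format.

```python
from collections import Counter


def block_length(t0: int, t1: int):
    """Byte length of a block under the Walker v6 hypothesis, or None if the tag is unknown."""
    if t0 == 0x10 and t1 % 4 == 0 and 0 < t1 <= 0xFC:
        return t1 // 2 + 2      # [10 NN]: 4-bit nibble delta data
    if t0 == 0x20 and t1 % 4 == 0 and 0 < t1 <= 0xFC:
        return t1 + 2           # [20 NN]: int8 literal data
    if t0 == 0x00 and t1 % 4 == 0:
        return 2                # [00 NN]: bare 2-byte marker
    if t0 == 0x30 and t1 % 4 == 0 and 0 < t1 <= 0x10:
        return t1 * 4           # [30 NN]: trailer/summary block
    if t0 == 0x40 and t1 == 0x02:
        return 20               # [40 02]: segment header
    return None


def walk_tags(body: bytes, start: int = 0):
    """Return (tag bytes as hex strings, offset where the walk stopped)."""
    tags = []
    i = start
    while i + 1 < len(body):
        length = block_length(body[i], body[i + 1])
        if length is None or i + length > len(body):
            break
        tags.append(f"{body[i]:02x}")
        i += length
    return tags, i


body = (bytes([0x10, 0x04, 0xAA, 0xBB,            # data block, 4 bytes
               0x00, 0x00,                        # marker, 2 bytes
               0x20, 0x04, 1, 2, 3, 4])           # literal block, 6 bytes
        + bytes([0x40, 0x02]) + bytes(18)          # segment header, 20 bytes
        + bytes([0x7F, 0x00]))                     # unknown tag: the walk stops here
tags, end = walk_tags(body)
print(tags, end, Counter(tags)["10"])  # ['10', '00', '20', '40'] 32 1
```

Keeping the length rule separate from the loop means trying another hypothesis, such as v5's 22-byte `40 02` block, is a one-line change.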
@@ -0,0 +1,65 @@
 """Run read_idf_file across the corpus and report per-channel accuracy vs sidecars."""
 from __future__ import annotations
 import sys
 from pathlib import Path
 REPO = Path(__file__).resolve().parents[1]
 sys.path.insert(0, str(REPO))
 from micromate.idf_file import read_idf_file
 from analysis_idf.recon import load_sidecar_samples
 def sidecar_path(idfw: Path) -> Path:
    return idfw.parent / "TXT" / f"{idfw.name}.txt"
 def main():
    root = REPO / "tests/fixtures/THORDATA_example"
    files = [f for f in root.rglob("*.IDFW") if not str(f).endswith(".CDB")]
    files.sort()
    GEO_LSB = 0.0003
    n_ok = n_skip = 0
    overall = {"Tran": [], "Vert": [], "Long": []}
    for f in files:
        try:
            res = read_idf_file(f)
        except Exception:
            n_skip += 1
            continue
        sc_path = sidecar_path(f)
        if not sc_path.exists():
            n_skip += 1
            continue
        try:
            sc = load_sidecar_samples(sc_path)
        except Exception:
            n_skip += 1
            continue
        per_file = {}
        for ch in ("Tran", "Vert", "Long"):
            sc_counts = [int(round(v / GEO_LSB)) for v in sc[ch]]
            dec = res.samples.get(ch, [])
            n = min(len(sc_counts), len(dec))
            if n == 0:
                per_file[ch] = 0.0
                continue
            exact = sum(1 for i in range(n) if sc_counts[i] == dec[i])
            pct = 100.0 * exact / n
            per_file[ch] = pct
            overall[ch].append(pct)
        n_ok += 1
    print(f"Processed {n_ok} files (skipped {n_skip})")
    print("Per-channel exact-match % (mean / min / max):")
    for ch, vals in overall.items():
        if vals:
            avg = sum(vals) / len(vals)
            print(f"  {ch}: mean={avg:.2f}%  min={min(vals):.2f}%  max={max(vals):.2f}%  n={len(vals)}")
 if __name__ == "__main__":
    main()
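The per-channel metric above converts each sidecar value (assumed to be in in/s) to counts by dividing by `GEO_LSB` and rounding, then counts exact matches over the overlapping prefix. As a standalone function (the name and sample values are hypothetical):

```python
GEO_LSB = 0.0003  # same constant the accuracy scripts use to convert sidecar values to counts


def exact_match_pct(sidecar_vals, decoded_counts, lsb: float = GEO_LSB) -> float:
    """Percent of overlapping samples where the decoded count equals the sidecar value in counts."""
    # round() rather than int(): v / lsb can land a hair below an integer in floating point.
    sc_counts = [int(round(v / lsb)) for v in sidecar_vals]
    n = min(len(sc_counts), len(decoded_counts))
    if n == 0:
        return 0.0
    exact = sum(1 for a, b in zip(sc_counts, decoded_counts) if a == b)  # zip stops at n
    return 100.0 * exact / n


print(exact_match_pct([0.0, 0.0003, -0.0006, 0.0009], [0, 1, -2, 4]))  # 75.0
```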
@@ -0,0 +1,49 @@
 """Find where decoded-vs-sidecar diverges for each channel."""
 from __future__ import annotations
 import sys
 from pathlib import Path
 REPO = Path(__file__).resolve().parents[1]
 sys.path.insert(0, str(REPO))
 from minimateplus.waveform_codec import decode_waveform_v2
 from analysis_idf.recon import TARGET, TXT, load_sidecar_samples
 def main():
    buf = TARGET.read_bytes()
    sc = load_sidecar_samples(TXT)
    # 0x0f1f is the body offset for this TARGET file only; production IDFW
    # bodies start at varying offsets (see idf_file._find_waveform_body_offset).
    decoded = decode_waveform_v2(buf[0x0f1f:])
    GEO_LSB = 0.0003
    for ch in ("Tran", "Vert", "Long"):
        sc_counts = [int(round(v / GEO_LSB)) for v in sc[ch]]
        dec = decoded[ch]
        # Compare only the overlapping prefix so unequal lengths can't raise IndexError
        n = min(len(dec), len(sc_counts))
        # Locate the first sample where decoded and sidecar disagree
        first_diff = next((i for i in range(n) if dec[i] != sc_counts[i]), None)
        if first_diff is None:
            print(f"{ch}: NO MISMATCHES in {n} compared samples")
            continue
        print(f"{ch}: first diff at idx {first_diff}")
        # Show 3 samples before and 7 after the first mismatch
        for i in range(max(0, first_diff - 3), min(n, first_diff + 8)):
            mark = "  " if dec[i] == sc_counts[i] else "**"
            print(f"  {mark} idx {i:4d}: sc={sc_counts[i]:6d}  dec={dec[i]:6d}  diff={dec[i]-sc_counts[i]:+d}")
        # Count mismatches and find the longest run of consecutive matching samples
        cur_match_run = 0
        max_match_run = 0
        diff_count = 0
        for i in range(n):
            if dec[i] == sc_counts[i]:
                cur_match_run += 1
                max_match_run = max(max_match_run, cur_match_run)
            else:
                cur_match_run = 0
                diff_count += 1
        print(f"  total mismatches: {diff_count}/{n}, longest run of matches: {max_match_run}")
        print()
 if __name__ == "__main__":
    main()
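The three numbers the divergence script reports (first mismatch, mismatch count, longest matching run) can be computed in one pass over a boolean match list. This is a sketch with a hypothetical helper name, using `itertools.groupby` to split the list into runs instead of the manual counter in the script.

```python
from itertools import groupby


def divergence_summary(decoded, reference):
    """Return (first mismatch index or None, mismatch count, longest run of matches)."""
    n = min(len(decoded), len(reference))  # compare the overlapping prefix only
    matches = [decoded[i] == reference[i] for i in range(n)]
    first_diff = next((i for i, ok in enumerate(matches) if not ok), None)
    mismatches = matches.count(False)
    # groupby splits the boolean list into runs; keep the lengths of the True runs
    longest = max((sum(1 for _ in run) for ok, run in groupby(matches) if ok), default=0)
    return first_diff, mismatches, longest


print(divergence_summary([1, 2, 3, 4, 9, 6, 7, 8, 0], [1, 2, 3, 4, 5, 6, 7, 8, 9]))  # (4, 2, 4)
```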
@@ -0,0 +1,48 @@
 """End-to-end IDFH ingest verification."""
 from __future__ import annotations
 import sys
 import tempfile
 import json
 from pathlib import Path
 REPO = Path(__file__).resolve().parents[1]
 sys.path.insert(0, str(REPO))
 from sfm.waveform_store import WaveformStore
 def main():
    idfh = REPO / "tests/fixtures/THORDATA_example/THORDATA_example/UPMC Presby/UM13981/UM13981_20220805075441.IDFH"
    txt  = idfh.parent / "TXT" / f"{idfh.name}.txt"
    with tempfile.TemporaryDirectory() as td:
        store = WaveformStore(Path(td))
        ev, rec = store.save_imported_idf(
            idfh.read_bytes(),
            idfh,
            idf_report_text=txt.read_text(errors="replace"),
        )
        print("=== save_imported_idf (IDFH) ===")
        print(f"  serial:        {rec['serial']}")
        print(f"  filename:      {rec['filename']}")
        print(f"  filesize:      {rec['filesize']}")
        print(f"  h5:            {rec['hdf5_filename']}")  # expect None for histogram
        print(f"  sidecar:       {rec['sidecar_filename']}")
        print()
        print("=== Event ===")
        print(f"  timestamp:     {ev.timestamp}")
        print(f"  record_type:   {ev.record_type}")
        print(f"  sample_rate:   {ev.sample_rate}")
        print()
        # Inspect sidecar to confirm intervals were stashed
        sc_path = Path(td) / "UM13981" / f"{idfh.name}.sfm.json"
        sc = json.loads(sc_path.read_text())
        intervals = sc.get("extensions", {}).get("idf_intervals", [])
        print(f"  sidecar intervals: {len(intervals)}")
        if intervals:
            print(f"  first interval:    {intervals[0]}")
            print(f"  last interval:     {intervals[-1]}")
 if __name__ == "__main__":
    main()
@@ -0,0 +1,40 @@
 """Verify the had_report=False path: ingest IDFW with no .txt."""
 from __future__ import annotations
 import sys
 from pathlib import Path
 import tempfile
 REPO = Path(__file__).resolve().parents[1]
 sys.path.insert(0, str(REPO))
 from sfm.waveform_store import WaveformStore
 def main():
    idfw = REPO / "tests/fixtures/THORDATA_example/THORDATA_example/UPMC Presby/UM11719/UM11719_20231219162723.IDFW"
    with tempfile.TemporaryDirectory() as td:
        store = WaveformStore(Path(td))
        ev, rec = store.save_imported_idf(
            idfw.read_bytes(),
            idfw,
            serial_hint=None,
            idf_report_text=None,        # ← no .txt!
        )
        print("=== IDFW without .txt ingest ===")
        print(f"  serial:        {rec['serial']}")
        print(f"  timestamp:     {ev.timestamp}")
        print(f"  sample_rate:   {ev.sample_rate}")
        print(f"  record_type:   {ev.record_type}")
        print(f"  rectime_sec:   {ev.rectime_seconds}")
        nT = len(ev.raw_samples.get('Tran', [])) if ev.raw_samples else 0
        nV = len(ev.raw_samples.get('Vert', [])) if ev.raw_samples else 0
        nL = len(ev.raw_samples.get('Long', [])) if ev.raw_samples else 0
        nM = len(ev.raw_samples.get('MicL', [])) if ev.raw_samples else 0
        print(f"  raw_samples:   Tran={nT} Vert={nV} Long={nL} MicL={nM}")
        if ev.peak_values:
            print(f"  peak_values:   tran={ev.peak_values.tran} vert={ev.peak_values.vert} long={ev.peak_values.long}")
        print(f"  h5 written:    {rec['hdf5_filename']}")
 if __name__ == "__main__":
    main()
@@ -0,0 +1,102 @@
 """End-to-end Thor report PDF rendering.
 Ingests an IDFW + .txt via save_imported_idf, runs gather_report_data
 (faking a minimal DB row), and renders the PDF to disk.
 """
 from __future__ import annotations
 import sys
 import tempfile
 import json
 from pathlib import Path
 REPO = Path(__file__).resolve().parents[1]
 sys.path.insert(0, str(REPO))
 from sfm.waveform_store import WaveformStore
 from sfm import report_pdf
 class FakeDb:
    """Stand-in for SeismoDb.get_event(); the renderer only needs a few cols."""
    def __init__(self, event):
        self.event = event
    def get_event(self, _id):
        return self.event
 def main():
    base = REPO / "tests/fixtures/THORDATA_example/THORDATA_example/UPMC Presby/UM11719"
    idfw = base / "UM11719_20231219162723.IDFW"
    txt  = base / "TXT" / f"{idfw.name}.txt"
    with tempfile.TemporaryDirectory() as td:
        store = WaveformStore(Path(td))
        ev, rec = store.save_imported_idf(
            idfw.read_bytes(),
            idfw,
            idf_report_text=txt.read_text(errors="replace"),
        )
        print(f"save_imported_idf: h5={rec['hdf5_filename']}, sidecar={rec['sidecar_filename']}")
        # Verify sidecar has bw_report block
        sc_path = Path(td) / "UM11719" / f"{idfw.name}.sfm.json"
        sc = json.loads(sc_path.read_text())
        bw = sc.get("bw_report", {})
        print(f"  bw_report.available: {bw.get('available')}")
        print(f"  bw_report.peaks.tran.ppv_ips: {bw.get('peaks', {}).get('tran', {}).get('ppv_ips')}")
        print(f"  bw_report.mic.pspl_dbl: {bw.get('mic', {}).get('pspl_dbl')}")
        print(f"  bw_report.histogram.n_intervals: {bw.get('histogram', {}).get('n_intervals')}")
        # Build a DB-row-shaped dict from the Event for gather_report_data
        import datetime
        ts = ev.timestamp
        ts_iso = None
        if ts is not None:
            try:
                ts_iso = datetime.datetime(ts.year, ts.month, ts.day, ts.hour, ts.minute, ts.second).isoformat()
            except Exception:
                pass
        fake_row = {
            "serial":              "UM11719",
            "blastware_filename":  rec["filename"],
            "record_type":         "Waveform",
            "timestamp":           ts_iso,
            "sample_rate":         ev.sample_rate,
            "project":             ev.project_info.project if ev.project_info else None,
            "client":              ev.project_info.client  if ev.project_info else None,
            "operator":            ev.project_info.operator if ev.project_info else None,
            "sensor_location":     ev.project_info.sensor_location if ev.project_info else None,
            "created_at":          None,
        }
        rd = report_pdf.gather_report_data(FakeDb(fake_row), store, event_id="test-1")
        print()
        print("=== ReportData ===")
        print(f"  event_id:           {rd.event_id}")
        print(f"  serial:             {rd.serial}")
        print(f"  record_type:        {rd.record_type}")
        print(f"  event_datetime:     {rd.event_datetime_str}")
        print(f"  trigger:            {rd.trigger_source}")
        print(f"  geo_range:          {rd.geo_range_str}")
        print(f"  sample_rate:        {rd.sample_rate_str}")
        print(f"  firmware:           {rd.firmware}")
        print(f"  calibration:        {rd.calibration_date} by {rd.calibration_by}")
        print(f"  battery:            {rd.battery_volts}")
        print(f"  PVS:                {rd.peak_vector_sum_ips} in/s at {rd.peak_vector_sum_time_s} sec")
        print(f"  mic_pspl_dbl:       {rd.mic_pspl_dbl}")
        print(f"  mic_zc_freq_hz:     {rd.mic_zc_freq_hz}")
        print(f"  channel_stats:      {len(rd.channel_stats)} rows")
        for cs in rd.channel_stats:
            print(f"    {cs['name']}: PPV={cs['ppv_ips']} ZC={cs['zc_freq_hz']} ToP={cs['time_of_peak_s']} Acc={cs['peak_accel_g']} Disp={cs['peak_disp_in']} Test={cs['sensor_check']}")
        # Render the PDF
        out_path = REPO / "analysis_idf" / "thor_report.pdf"
        pdf_bytes = report_pdf.render_event_report_pdf(rd)
        out_path.write_bytes(pdf_bytes)
        print()
        print(f"  PDF written: {out_path} ({len(pdf_bytes)} bytes)")
 if __name__ == "__main__":
    main()
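Both report scripts convert `ev.timestamp` to an ISO string with the same inline try/except. A small helper could factor that out. This sketch assumes only that the timestamp object exposes `year` through `second` attributes, as the scripts' own code does; `SimpleNamespace` is a hypothetical stand-in for the real timestamp type, and the sample date is arbitrary.

```python
import datetime
from types import SimpleNamespace


def timestamp_to_iso(ts):
    """Convert an object with year..second attributes to an ISO-8601 string, or None.

    Returns None when ts is missing or its fields don't form a valid datetime
    (for example a zeroed month), mirroring the scripts' fallback.
    """
    if ts is None:
        return None
    try:
        return datetime.datetime(ts.year, ts.month, ts.day,
                                 ts.hour, ts.minute, ts.second).isoformat()
    except (AttributeError, TypeError, ValueError):
        return None


print(timestamp_to_iso(SimpleNamespace(year=2024, month=1, day=2,
                                       hour=3, minute=4, second=5)))  # 2024-01-02T03:04:05
```

Catching the three specific exceptions, instead of a bare `Exception`, keeps genuine bugs elsewhere from being silently turned into a missing timestamp.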
@@ -0,0 +1,91 @@
 """End-to-end Thor IDFH histogram report PDF rendering."""
 from __future__ import annotations
 import sys
 import tempfile
 import json
 import datetime
 from pathlib import Path
 REPO = Path(__file__).resolve().parents[1]
 sys.path.insert(0, str(REPO))
 from sfm.waveform_store import WaveformStore
 from sfm import report_pdf
 class FakeDb:
    def __init__(self, event):
        self.event = event
    def get_event(self, _id):
        return self.event
 def main():
    # Use the multi-interval IDFH (81 + trigger row)
    idfh = REPO / "tests/fixtures/THORDATA_example/THORDATA_example/UPMC Presby/UM13981/UM13981_20220805075441.IDFH"
    txt  = idfh.parent / "TXT" / f"{idfh.name}.txt"
    with tempfile.TemporaryDirectory() as td:
        store = WaveformStore(Path(td))
        ev, rec = store.save_imported_idf(
            idfh.read_bytes(),
            idfh,
            idf_report_text=txt.read_text(errors="replace"),
        )
        print(f"save_imported_idf: h5={rec['hdf5_filename']}, sidecar={rec['sidecar_filename']}")
        sc_path = Path(td) / "UM13981" / f"{idfh.name}.sfm.json"
        sc = json.loads(sc_path.read_text())
        bw = sc.get("bw_report", {})
        hist = bw.get("histogram", {})
        print(f"  bw_report.histogram.start:           {hist.get('start')}")
        print(f"  bw_report.histogram.stop:            {hist.get('stop')}")
        print(f"  bw_report.histogram.n_intervals:     {hist.get('n_intervals')}")
        print(f"  bw_report.histogram.interval_size:   {hist.get('interval_size')}")
        print(f"  bw_report.histogram.interval_size_s: {hist.get('interval_size_s')}")
        print(f"  bw_report.peaks.tran.ppv_ips:        {bw.get('peaks', {}).get('tran', {}).get('ppv_ips')}")
        ts = ev.timestamp
        ts_iso = None
        if ts is not None:
            try:
                ts_iso = datetime.datetime(ts.year, ts.month, ts.day, ts.hour, ts.minute, ts.second).isoformat()
            except Exception:
                pass
        fake_row = {
            "serial":              "UM13981",
            "blastware_filename":  rec["filename"],
            "record_type":         "Histogram",
            "timestamp":           ts_iso,
            "sample_rate":         ev.sample_rate,
            "project":             ev.project_info.project if ev.project_info else None,
            "client":              ev.project_info.client  if ev.project_info else None,
            "operator":            ev.project_info.operator if ev.project_info else None,
            "sensor_location":     ev.project_info.sensor_location if ev.project_info else None,
            "created_at":          None,
        }
        rd = report_pdf.gather_report_data(FakeDb(fake_row), store, event_id="hist-1")
        print()
        print("=== ReportData (histogram) ===")
        print(f"  is_histogram:           {rd.is_histogram}")
        print(f"  histogram_start:        {rd.histogram_start_str}")
        print(f"  histogram_stop:         {rd.histogram_stop_str}")
        print(f"  histogram_n_intervals:  {rd.histogram_n_intervals}")
        print(f"  histogram_interval_size:{rd.histogram_interval_size}")
        print(f"  histogram_interval_times[:3]: {rd.histogram_interval_times[:3]}")
        print(f"  histogram_interval_times[-2:]: {rd.histogram_interval_times[-2:]}")
        print(f"  channel_stats: {len(rd.channel_stats)} rows")
        for cs in rd.channel_stats:
            print(f"    {cs['name']}: PPV={cs['ppv_ips']} ZC={cs['zc_freq_hz']} peak_date={cs['peak_date']} peak_time={cs['peak_time']}")
        pdf_bytes = report_pdf.render_event_report_pdf(rd)
        out_path = REPO / "analysis_idf" / "thor_report_idfh.pdf"
        out_path.write_bytes(pdf_bytes)
        print()
        print(f"  PDF written: {out_path} ({len(pdf_bytes)} bytes)")
 if __name__ == "__main__":
    main()
@@ -0,0 +1,52 @@
 """End-to-end ingest test: feed an IDFW + .txt to save_imported_idf in a tmp store."""
 from __future__ import annotations
 import sys
 from pathlib import Path
 import tempfile
 REPO = Path(__file__).resolve().parents[1]
 sys.path.insert(0, str(REPO))
 from sfm.waveform_store import WaveformStore
 def main():
    idfw = REPO / "tests/fixtures/THORDATA_example/THORDATA_example/UPMC Presby/UM11719/UM11719_20231219162723.IDFW"
    txt  = idfw.parent / "TXT" / f"{idfw.name}.txt"
    with tempfile.TemporaryDirectory() as td:
        store = WaveformStore(Path(td))
        ev, rec = store.save_imported_idf(
            idfw.read_bytes(),
            idfw,
            serial_hint=None,
            idf_report_text=txt.read_text(errors="replace"),
        )
        print("=== Save result ===")
        print(f"  serial:    {rec['serial']}")
        print(f"  filename:  {rec['filename']}")
        print(f"  filesize:  {rec['filesize']}")
        print(f"  h5:        {rec['hdf5_filename']}")
        print(f"  sidecar:   {rec['sidecar_filename']}")
        print()
        print("=== Event ===")
        print(f"  serial:        {getattr(ev, 'serial', '(n/a)')}")
        print(f"  timestamp:     {ev.timestamp}")
        print(f"  sample_rate:   {ev.sample_rate}")
        print(f"  record_type:   {ev.record_type}")
        print(f"  rectime_sec:   {ev.rectime_seconds}")
        counts = {ch: len(ev.raw_samples.get(ch, [])) if ev.raw_samples else 0
                  for ch in ("Tran", "Vert", "Long", "MicL")}
        print(f"  raw_samples:   Tran={counts['Tran']}, Vert={counts['Vert']}, Long={counts['Long']}, MicL={counts['MicL']}")
        if ev.peak_values:
            print(f"  peaks (txt):   Tran={ev.peak_values.tran} Vert={ev.peak_values.vert} Long={ev.peak_values.long}")
        print()
        # Verify the h5 file actually got written
        h5path = Path(td) / "UM11719" / f"{idfw.name}.h5"
        print(f"  h5 exists:     {h5path.exists()}  size={h5path.stat().st_size if h5path.exists() else 0}")
        sidecar = Path(td) / "UM11719" / f"{idfw.name}.sfm.json"
        print(f"  sidecar exists:{sidecar.exists()}  size={sidecar.stat().st_size if sidecar.exists() else 0}")
 if __name__ == "__main__":
    main()
@@ -0,0 +1,137 @@
 """Decode IDFH histogram intervals + verify against sidecar."""
 from __future__ import annotations
 import sys
 import struct
 from pathlib import Path
 REPO = Path(__file__).resolve().parents[1]
 sys.path.insert(0, str(REPO))
 SEGMENT_MAGIC = b"\x02\xda\x0a\x00\x00\x00"
 SEGMENT_SIZE = 732   # = 10-byte header + 10 × 72-byte intervals + 2-byte tail
 INTERVAL_SIZE = 72
 CHANNELS = ("Tran", "Vert", "Long", "MicL")
 def decode_interval(buf72: bytes) -> dict:
    """Decode one 72-byte interval into per-channel min/max/halfp."""
    out = {}
    for i, ch in enumerate(CHANNELS):
        block = buf72[i*16 : (i+1)*16]
        mn = struct.unpack_from(">h", block, 0)[0]
        mx = struct.unpack_from(">h", block, 2)[0]
        sb = struct.unpack_from(">h", block, 4)[0]
        halfp = struct.unpack_from(">H", block, 6)[0]
        f10 = struct.unpack_from(">H", block, 10)[0]
        f14 = struct.unpack_from(">H", block, 14)[0]
        peak_count = max(abs(mn), abs(mx))
        out[ch] = {
            "min":     mn,
            "max":     mx,
            "field4":  sb,
            "halfp":   halfp,
            "field10": f10,
            "field14": f14,
            "peak":    peak_count,
            "freq_hz": (512.0 / halfp) if halfp > 5 else None,
        }
    out["_tail"] = buf72[64:].hex(" ")
    return out
 def walk_idfh(buf: bytes) -> list:
    """Walk all interval records in an IDFH file."""
    intervals = []
    # Multi-segment file: every 02 da 0a 00 00 00 marker introduces a segment.
    # Single-interval file: just one body header at 0xf96 of form ?? ?? 0a 00 00 00.
    # Find them all.
    i = 0
    while True:
        j = buf.find(b"\x0a\x00\x00\x00", i)
        if j < 0:
            break
        # Filter: the match must be preceded by a 2-byte big-endian length and
        # followed by `00 NN 05 3f`, i.e. [length_be][0a 00 00 00][00 NN][05 3f].
        if j < 2:
            i = j + 1
            continue
        if j + 10 > len(buf):
            break
        length = int.from_bytes(buf[j-2:j], "big")
        if buf[j+4] != 0x00:
            i = j + 1
            continue
        if buf[j+6:j+8] != b"\x05\x3f":
            i = j + 1
            continue
        # Header layout (10 bytes): [length_be 2B][0a 00 00 00 4B][00 NN 2B][05 3f 2B],
        # followed by N interval records of 72 bytes each, then 2 tail bytes.
        # The length value counts every byte after the length field (the other
        # 8 header bytes, the interval data, and the tail): length = N × 72 + 10,
        # e.g. 0x02da = 730 for the 10-interval segments in SEGMENT_MAGIC.
        header_start = j - 2
        n_intervals = (length - 10) // INTERVAL_SIZE
        interval_start = header_start + 10
        for k in range(n_intervals):
            off = interval_start + k * INTERVAL_SIZE
            if off + INTERVAL_SIZE > len(buf):
                break
            chunk = buf[off:off + INTERVAL_SIZE]
            intervals.append({"offset": off, **decode_interval(chunk)})
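        # Jump to the next segment, but always advance past j so a bogus zero
        # or tiny length can't make buf.find() return the same j forever.
        i = max(j + 1, header_start + length + 2)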
    return intervals
 def main():
    # Test against multi-segment IDFH
    target = REPO / "tests/fixtures/THORDATA_example/THORDATA_example/UPMC Presby/UM13981/UM13981_20220805075441.IDFH"
    sc_path = target.parent / "TXT" / f"{target.name}.txt"
    buf = target.read_bytes()
    intervals = walk_idfh(buf)
    print(f"=== {target.name} ===")
    print(f"  file size: {len(buf)}")
    print(f"  decoded intervals: {len(intervals)}")
    # Sidecar data rows are the lines that begin with a date
    sc_rows = []
    for line in sc_path.read_text(errors="replace").splitlines():
        if line.startswith(("2022-", "2023-")):
            sc_rows.append(line)
    print(f"  sidecar rows: {len(sc_rows)}")
    print()
    # Show the first two and the last three intervals (indices 78-80)
    for k in [0, 1, 78, 79, 80]:
        if k >= len(intervals):
            continue
        iv = intervals[k]
        print(f"--- interval {k} @0x{iv['offset']:04x} ---")
        for ch in CHANNELS:
            d = iv[ch]
            peak_ips = d["peak"] / 32768 * 10.0
            print(f"  {ch}: peak={d['peak']:5d} ({peak_ips:.4f} in/s)  halfp={d['halfp']:5d}  freq={d['freq_hz']}")
        # sidecar row
        if k < len(sc_rows):
            print(f"  SC: {sc_rows[k]}")
    # Test single-interval IDFH
    print()
    target2 = REPO / "tests/fixtures/THORDATA_example/THORDATA_example/UPMC Presby/UM11719/UM11719_20231219162648.IDFH"
    sc2 = target2.parent / "TXT" / f"{target2.name}.txt"
    buf2 = target2.read_bytes()
    intervals2 = walk_idfh(buf2)
    print(f"=== {target2.name} ===")
    print(f"  file size: {len(buf2)}, decoded intervals: {len(intervals2)}")
    if intervals2:
        iv = intervals2[0]
        for ch in CHANNELS:
            d = iv[ch]
            peak_ips = d["peak"] / 32768 * 10.0
            print(f"  {ch}: peak={d['peak']:5d} ({peak_ips:.4f} in/s)  halfp={d['halfp']:5d}  freq={d['freq_hz']}")
        sc_rows2 = [ln for ln in sc2.read_text(errors="replace").splitlines() if ln.startswith("2023-")]
        if sc_rows2:
            print(f"  SC: {sc_rows2[0]}")
 if __name__ == "__main__":
    main()
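A round-trip check of the segment layout that `walk_idfh()` assumes: pack a synthetic 10-interval segment, confirm it reproduces `SEGMENT_MAGIC` and the 732-byte `SEGMENT_SIZE`, then decode it back. The meaning of the `00 NN` byte isn't established in the script, so the sketch just writes 0; the `512 / halfp` frequency rule is copied from `decode_interval()`. Helper names and sample values are hypothetical.

```python
import struct

INTERVAL_SIZE = 72
CHANNELS = ("Tran", "Vert", "Long", "MicL")


def build_segment(intervals, nn: int = 0x00) -> bytes:
    """Pack a synthetic IDFH segment; each interval is {channel: (min, max, halfp)}."""
    payload = b"\x0a\x00\x00\x00" + bytes([0x00, nn]) + b"\x05\x3f"
    for iv in intervals:
        for ch in CHANNELS:
            mn, mx, halfp = iv[ch]
            # 16-byte channel block: >h min, >h max, >h field4, >H halfp, 8 unknown bytes
            payload += struct.pack(">hhhH", mn, mx, 0, halfp) + bytes(8)
        payload += bytes(INTERVAL_SIZE - 16 * len(CHANNELS))  # bytes 64..71 of the record
    payload += b"\x00\x00"                                    # 2-byte segment tail
    # The length field counts every byte after itself.
    return struct.pack(">H", len(payload)) + payload


def decode_segment(seg: bytes, offset: int = 0):
    """Decode one segment at `offset`; return (intervals, offset of the next segment)."""
    (length,) = struct.unpack_from(">H", seg, offset)
    n_intervals = (length - 10) // INTERVAL_SIZE
    rows = []
    for k in range(n_intervals):
        base = offset + 10 + k * INTERVAL_SIZE
        row = {}
        for c, ch in enumerate(CHANNELS):
            mn, mx, _field4, halfp = struct.unpack_from(">hhhH", seg, base + 16 * c)
            row[ch] = {"peak": max(abs(mn), abs(mx)),
                       "freq_hz": 512.0 / halfp if halfp > 5 else None}
        rows.append(row)
    return rows, offset + 2 + length


iv = {ch: (-100, 250, 128) for ch in CHANNELS}
seg = build_segment([iv] * 10)
print(len(seg), seg[:6].hex(" "))            # 732 02 da 0a 00 00 00
rows, next_off = decode_segment(seg)
print(len(rows), rows[0]["Tran"], next_off)  # 10 {'peak': 250, 'freq_hz': 4.0} 732
```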
@@ -0,0 +1,41 @@
 """Estimate the IDFH interval record size by scoring candidate sizes on consistently-zero byte columns."""
 from __future__ import annotations
 import sys
 from pathlib import Path
 from collections import Counter
 REPO = Path(__file__).resolve().parents[1]
 sys.path.insert(0, str(REPO))
 def main():
    target = REPO / "tests/fixtures/THORDATA_example/THORDATA_example/UPMC Presby/UM13981/UM13981_20220805075441.IDFH"
    buf = target.read_bytes()
    body_start = 0xF96
    body_end   = 0x270C
    body = buf[body_start:body_end]
    print(f"body size: {len(body)} bytes (file {len(buf)} bytes)")
    # For each candidate interval size, count how many bytes at fixed offsets within
    # each interval are zero (consistent column-zero pattern indicates correct size).
    print()
    print("=== zero-column score by interval size (higher = more likely) ===")
    best = []
    for sz in range(16, 100):
        n = len(body) // sz
        if n < 30:
            continue
        # For each column position within an interval, count how many of n intervals have zero
        score = 0
        for col in range(sz):
            zeros = sum(1 for i in range(n) if body[i*sz + col] == 0)
            if zeros >= n * 0.9:
                score += 1
        best.append((score, sz, n))
    best.sort(reverse=True)
    for score, sz, n in best[:10]:
        print(f"  size={sz:3d}  n_intervals={n}  consistently-zero-cols={score}")
 if __name__ == "__main__":
    main()
@@ -0,0 +1,40 @@
 """Per-file accuracy + sample-count details."""
 from __future__ import annotations
 import sys
 from pathlib import Path
 REPO = Path(__file__).resolve().parents[1]
 sys.path.insert(0, str(REPO))
 from micromate.idf_file import read_idf_file
 from analysis_idf.recon import load_sidecar_samples
 def main():
    root = REPO / "tests/fixtures/THORDATA_example"
    files = sorted([f for f in root.rglob("*.IDFW") if not str(f).endswith(".CDB")])
    GEO_LSB = 0.0003
    # Limit to the first 20 files that decode and have a sidecar.
    shown = 0
    for f in files:
        try:
            res = read_idf_file(f)
        except Exception:
            continue
        sc_path = f.parent / "TXT" / f"{f.name}.txt"
        if not sc_path.exists():
            continue
        sc = load_sidecar_samples(sc_path)
        sc_tran = [int(round(v / GEO_LSB)) for v in sc["Tran"]]
        dec = res.samples.get("Tran", [])
        n = min(len(sc_tran), len(dec))
        exact = sum(1 for i in range(n) if sc_tran[i] == dec[i]) if n else 0
        pct = 100.0 * exact / n if n else 0.0
        print(f"{f.name:40s}  size={f.stat().st_size:6d}  sc_n={len(sc_tran):4d}  dec_n={len(dec):4d}  exact={pct:.1f}%")
        shown += 1
        if shown >= 20:
            break
 if __name__ == "__main__":
    main()
@@ -0,0 +1,64 @@
 """Look at what's at the divergence boundary."""
 from __future__ import annotations
 import sys
 from pathlib import Path
 REPO = Path(__file__).resolve().parents[1]
 sys.path.insert(0, str(REPO))
 from minimateplus.waveform_codec import walk_body, find_data_start, parse_segment_header
 from analysis_idf.recon import TARGET, TXT, load_sidecar_samples
 def main():
    buf = TARGET.read_bytes()
    body = buf[0x0f1f:]
    start = find_data_start(body)
    print(f"data_start: {start}  (= file offset 0x{0x0f1f + start:04x})")
    blocks = walk_body(body, start)
    print(f"{len(blocks)} blocks total")
    print()
    # First 30 blocks
    print("=== first 30 blocks ===")
    for i, b in enumerate(blocks[:30]):
        body_off = 0x0f1f + b.offset
        if b.tag_hi == 0x40:
            hdr = parse_segment_header(b)
            print(f"  [{i:3d}] @0x{body_off:04x}  {b.kind}  (segment header)  counter={hdr['counter'] if hdr else '?'}  field2={hdr['field2'].hex() if hdr else '?'}  anchor={hdr['anchor_bytes'].hex() if hdr else '?'}  tail={hdr['tail'].hex() if hdr else '?'}")
        else:
            print(f"  [{i:3d}] @0x{body_off:04x}  {b.kind}  len={b.length}  data={b.data[:16].hex()}")
    print()
    # Cumulative sample counts per block to find which block contains sample 254
    print("=== cumulative samples through blocks ===")
    cur_ch = "Tran"
    rotation = ["Vert", "Long", "MicL", "Tran"]
    seg_count = 0
    samples_in_curseg = 2  # preamble Tran[0], Tran[1]
    for i, b in enumerate(blocks[:30]):
        if b.tag_hi == 0x40:
            seg_count += 1
            prev_ch = cur_ch
            cur_ch = rotation[(seg_count - 1) % 4]
            print(f"  [{i:3d}] 40 02 -> end of {prev_ch} segment, start {cur_ch} (segment {seg_count})")
            samples_in_curseg = 2  # anchors
        elif (b.tag_hi & 0xF0) == 0x10:
            nn = ((b.tag_hi & 0x0F) << 8) | b.tag_lo
            samples_in_curseg += nn
            print(f"  [{i:3d}] {b.kind} nibble: +{nn} samples, ch={cur_ch}, ch_total~{samples_in_curseg}")
        elif (b.tag_hi & 0xF0) == 0x20:
            nn = ((b.tag_hi & 0x0F) << 8) | b.tag_lo
            samples_in_curseg += nn
            print(f"  [{i:3d}] {b.kind} int8: +{nn} samples, ch={cur_ch}, ch_total~{samples_in_curseg}")
        elif b.tag_hi == 0x00:
            samples_in_curseg += b.tag_lo
            print(f"  [{i:3d}] {b.kind} RLE: +{b.tag_lo}, ch={cur_ch}, ch_total~{samples_in_curseg}")
        elif b.tag_hi == 0x30:
            samples_in_curseg += b.tag_lo
            print(f"  [{i:3d}] {b.kind} packed12: +{b.tag_lo} samples, ch={cur_ch}, ch_total~{samples_in_curseg}")
 if __name__ == "__main__":
    main()
@@ -0,0 +1,89 @@
 """Reconnaissance helpers for cracking the Thor IDFW binary."""
 from __future__ import annotations
 import sys
 from pathlib import Path
 REPO = Path(__file__).resolve().parents[1]
 sys.path.insert(0, str(REPO))
 TARGET = REPO / "tests/fixtures/THORDATA_example/THORDATA_example/UPMC Presby/UM11719/UM11719_20231219162723.IDFW"
 TXT = REPO / "tests/fixtures/THORDATA_example/THORDATA_example/UPMC Presby/UM11719/TXT/UM11719_20231219162723.IDFW.txt"
 def hex_at(buf: bytes, off: int, n: int = 32) -> str:
    chunk = buf[off : off + n]
    hexs = " ".join(f"{b:02x}" for b in chunk)
    asc = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
    return f"{off:04x}: {hexs}  {asc}"
 def find_all(buf: bytes, needle: bytes) -> list[int]:
    out: list[int] = []
    i = 0
    while True:
        j = buf.find(needle, i)
        if j < 0:
            break
        out.append(j)
        i = j + 1
    return out
 def load_sidecar_samples(path: Path) -> dict[str, list[float]]:
    """Parse the txt sample table — Tran/Vert/Long/MicL."""
    out = {"Tran": [], "Vert": [], "Long": [], "MicL": []}
    in_block = False
    for line in path.read_text(errors="replace").splitlines():
        if not in_block:
            if line.strip() == "Waveform Data Channels":
                in_block = True
            continue
        if line.startswith("Waveform Data USB Channels"):
            break
        parts = line.split("\t")
        # First row is the header "\tTran\tVert\tLong\tMicL"
        if len(parts) >= 5 and parts[1] == "Tran":
            continue
        if len(parts) < 5:
            continue
        try:
            out["Tran"].append(float(parts[1]))
            out["Vert"].append(float(parts[2]))
            out["Long"].append(float(parts[3]))
            out["MicL"].append(float(parts[4]))
        except ValueError:
            continue
    return out
 def main():
    buf = TARGET.read_bytes()
    samples = load_sidecar_samples(TXT)
    print(f"file size: {len(buf)} bytes")
    print(f"sample rows: Tran={len(samples['Tran'])} Vert={len(samples['Vert'])} Long={len(samples['Long'])} MicL={len(samples['MicL'])}")
    print(f"first 6 Tran samples: {samples['Tran'][:6]}")
    print(f"first 6 Vert samples: {samples['Vert'][:6]}")
    print(f"first 6 Long samples: {samples['Long'][:6]}")
    print(f"first 6 MicL samples: {samples['MicL'][:6]}")
    print()
    print("=== BW magic '00 02 00' positions ===")
    hits = find_all(buf, b"\x00\x02\x00")
    print(f"{len(hits)} hits")
    for h in hits[:20]:
        print(hex_at(buf, h, 24))
    print()
    print("=== '40 02' segment-header positions ===")
    hits = find_all(buf, b"\x40\x02")
    print(f"{len(hits)} hits")
    for h in hits:
        ctx_pre = buf[max(0, h - 4): h].hex()
        ctx_post = buf[h: h + 20].hex()
        # Show byte preceding to help identify real headers vs casual occurrences
        print(f"  0x{h:04x}  pre={ctx_pre}  post={ctx_post}")
 if __name__ == "__main__":
    main()
@@ -0,0 +1,40 @@
 """List where decoded samples stop or start matching the sidecar, to check whether errors reset at segment boundaries."""
 from __future__ import annotations
 import sys
 from pathlib import Path
 REPO = Path(__file__).resolve().parents[1]
 sys.path.insert(0, str(REPO))
 from minimateplus.waveform_codec import decode_waveform_v2
 from analysis_idf.recon import TARGET, TXT, load_sidecar_samples
 def main():
    buf = TARGET.read_bytes()
    sc = load_sidecar_samples(TXT)
    decoded = decode_waveform_v2(buf[0x0f1f:])
    GEO_LSB = 0.0003
    for ch in ("Tran", "Vert", "Long"):
        sc_counts = [int(round(v / GEO_LSB)) for v in sc[ch]]
        dec = decoded[ch]
        # Record every index where the match state flips: DIVERGE when an exact
        # match turns into a mismatch, RESYNC when a mismatch turns back into a match.
        n = min(len(sc_counts), len(dec))
        events = []
        prev_match = True
        for i in range(n):
            match = sc_counts[i] == dec[i]
            if match != prev_match:
                kind = "RESYNC" if match else "DIVERGE"
                events.append((i, kind, sc_counts[i], dec[i]))
                prev_match = match
        print(f"{ch}: {len(events)} transitions")
        for i, kind, sc_v, dec_v in events[:20]:
            print(f"  idx {i:4d}  {kind:8s}  sc={sc_v:6d}  dec={dec_v:6d}  diff={dec_v-sc_v:+d}")
        print()
 if __name__ == "__main__":
    main()
@@ -0,0 +1,46 @@
 """Smoke-test read_idf_file on IDFH across the corpus."""
 from __future__ import annotations
 import sys
 from pathlib import Path
 REPO = Path(__file__).resolve().parents[1]
 sys.path.insert(0, str(REPO))
 from micromate.idf_file import read_idf_file
 def main():
    target = REPO / "tests/fixtures/THORDATA_example/THORDATA_example/UPMC Presby/UM11719/UM11719_20231219162648.IDFH"
    result = read_idf_file(target)
    ev = result.event
    print(f"=== {target.name} ===")
    print(f"  signature:   {result.signature}")
    print(f"  serial:      {ev.serial}")
    print(f"  timestamp:   {ev.timestamp}")
    print(f"  sample_rate: {ev.sample_rate}")
    print(f"  kind:        {ev.kind}")
    print(f"  intervals:   {len(result.intervals or [])}")
    print(f"  peaks:       T={ev.peaks.transverse_ips:.4f} V={ev.peaks.vertical_ips:.4f} L={ev.peaks.longitudinal_ips:.4f}")
    print()
    root = REPO / "tests/fixtures/THORDATA_example"
    files = list(root.rglob("*.IDFH"))
    ok = fail = nyi = 0
    total_intervals = 0
    for f in files:
        try:
            r = read_idf_file(f)
            ok += 1
            total_intervals += len(r.intervals or [])
        except NotImplementedError:
            nyi += 1
        except Exception as exc:
            fail += 1
            if fail <= 3:
                print(f"  FAIL: {f.name}: {type(exc).__name__}: {exc}")
    print(f"Corpus: {len(files)} IDFH files | ok={ok} fail={fail} nyi={nyi}")
    print(f"Total intervals decoded: {total_intervals}")
 if __name__ == "__main__":
    main()
@@ -0,0 +1,48 @@
 """Smoke-test read_idf_file across the sample corpus."""
 from __future__ import annotations
 import sys
 from pathlib import Path
 REPO = Path(__file__).resolve().parents[1]
 sys.path.insert(0, str(REPO))
 from micromate.idf_file import read_idf_file, geo_count_to_ips, mic_count_to_psi
 def main():
    target = REPO / "tests/fixtures/THORDATA_example/THORDATA_example/UPMC Presby/UM11719/UM11719_20231219162723.IDFW"
    result = read_idf_file(target)
    ev = result.event
    print(f"=== {target.name} ===")
    print(f"  signature: {result.signature}")
    print(f"  serial:    {ev.serial}")
    print(f"  timestamp: {ev.timestamp}")
    print(f"  sample_rate: {ev.sample_rate}")
    print(f"  record_time: {ev.record_time_sec}")
    print(f"  calibration: {result.binary_metadata.calibration_date}")
    print(f"  Tran samples: {len(result.samples['Tran'])}, peak_ips={ev.peaks.transverse_ips:.4f}")
    print(f"  Vert samples: {len(result.samples['Vert'])}, peak_ips={ev.peaks.vertical_ips:.4f}")
    print(f"  Long samples: {len(result.samples['Long'])}, peak_ips={ev.peaks.longitudinal_ips:.4f}")
    print(f"  MicL samples: {len(result.samples['MicL'])}")
    print()
    # Corpus sweep
    root = REPO / "tests/fixtures/THORDATA_example"
    files = [f for f in root.rglob("*.IDFW") if not str(f).endswith(".CDB")]
    ok = fail = nyi = 0
    for f in files:
        try:
            r = read_idf_file(f)
            ok += 1
        except NotImplementedError:
            nyi += 1
        except Exception as exc:
            fail += 1
            if fail <= 5:
                print(f"  FAIL: {f.name}: {type(exc).__name__}: {exc}")
    print()
    print(f"Corpus: {len(files)} IDFW files | ok={ok} fail={fail} not-implemented={nyi}")
 if __name__ == "__main__":
    main()
@@ -0,0 +1,47 @@
 """Verify build_bw_report_from_idf against a known sidecar."""
 from __future__ import annotations
 import json
 import sys
 from pathlib import Path
 REPO = Path(__file__).resolve().parents[1]
 sys.path.insert(0, str(REPO))
 from micromate.idf_ascii_report import parse_idf_report
 from micromate.idf_to_bw_report import build_bw_report_from_idf
 from micromate.idf_file import read_idf_file
 def show(prefix: str, d: dict, indent: int = 0):
    for k, v in d.items():
        if isinstance(v, dict):
            print(f"{'  '*indent}{prefix}{k}:")
            show("", v, indent + 1)
        else:
            print(f"{'  '*indent}{prefix}{k}: {v!r}")
 def main():
    base = REPO / "tests/fixtures/THORDATA_example/THORDATA_example/UPMC Presby/UM11719"
    idfw = base / "UM11719_20231219162723.IDFW"
    txt  = base / "TXT" / f"{idfw.name}.txt"
    report_dict = parse_idf_report(txt.read_text(errors="replace"))
    res = read_idf_file(idfw)
    bw = build_bw_report_from_idf(report_dict, binary_md=res.binary_metadata)
    print("=== IDFW → bw_report ===")
    show("", bw)
    print()
    print("=== IDFH (single trigger row) ===")
    idfh = base / "UM11719_20231219162648.IDFH"
    txt_h = base / "TXT" / f"{idfh.name}.txt"
    rh = parse_idf_report(txt_h.read_text(errors="replace"))
    res_h = read_idf_file(idfh)
    bw_h = build_bw_report_from_idf(rh, binary_md=res_h.binary_metadata, intervals=res_h.intervals)
    show("", bw_h)
 if __name__ == "__main__":
    main()
@@ -0,0 +1,73 @@
 """Trace Tran sample-by-sample to find exactly where the codec drifts."""
 from __future__ import annotations
 import sys
 from pathlib import Path
 REPO = Path(__file__).resolve().parents[1]
 sys.path.insert(0, str(REPO))
 from analysis_idf.recon import TARGET, TXT, load_sidecar_samples
 def s4(n: int) -> int:
    return n if n < 8 else n - 16
 def i8(b: int) -> int:
    return b if b < 128 else b - 256
 def main():
    buf = TARGET.read_bytes()
    sc = load_sidecar_samples(TXT)
    GEO_LSB = 0.0003
    sc_tran = [int(round(v / GEO_LSB)) for v in sc["Tran"]]
    body = buf[0x0f1f:]
    # Tran[0], Tran[1] from preamble
    t0 = int.from_bytes(body[3:5], "big", signed=True)
    t1 = int.from_bytes(body[5:7], "big", signed=True)
    print(f"preamble Tran[0]={t0}  Tran[1]={t1}  (sidecar: {sc_tran[0]}, {sc_tran[1]})")
    # Block 0: 10 f8 at body[7:9]
    print(f"block 0: tag {body[7]:02x} {body[8]:02x}")
    print(f"  block 0 first 10 data bytes: {body[9:19].hex()}")
    # Walk block 0 manually, comparing each sample
    cur = t1
    samples = [t0, t1]
    block_off = 7
    nn = body[8]
    print(f"  NN = {nn}")
    data = body[9 : 9 + nn // 2]
    for byi, byte in enumerate(data):
        for nib_idx, nib in enumerate(((byte >> 4) & 0xF, byte & 0xF)):
            cur += s4(nib)
            samples.append(cur)
            idx = len(samples) - 1
            if 0 <= idx < len(sc_tran):
                sc_v = sc_tran[idx]
                match = "✓" if sc_v == cur else "✗"
                if idx < 12 or 240 <= idx <= 260:
                    print(f"    idx {idx:3d}: nibble byte={byte:02x} nib={nib:x} delta={s4(nib):+d}  cur={cur:+d}  sc={sc_v:+d}  {match}")
    print(f"end of block 0: cur={cur}, len(samples)={len(samples)}, decoder expected 250 here")
    # Block 1 (expected tag 20 28) starts right after block 0's data, at body offset 9 + nn // 2 (= 133 for nn = 248).
    block1_off = 9 + nn // 2
    print(f"block 1: tag {body[block1_off]:02x} {body[block1_off+1]:02x} (expecting 20 28)")
    nn1 = body[block1_off + 1]
    print(f"  block 1 NN = {nn1}")
    data1 = body[block1_off + 2 : block1_off + 2 + nn1]
    for byi, byte in enumerate(data1):
        cur += i8(byte)
        samples.append(cur)
        idx = len(samples) - 1
        if idx < len(sc_tran):
            sc_v = sc_tran[idx]
            match = "✓" if sc_v == cur else "✗"
            if 248 <= idx <= 295:
                print(f"    idx {idx:3d}: int8 byte={byte:02x} delta={i8(byte):+d}  cur={cur:+d}  sc={sc_v:+d}  {match}")
 if __name__ == "__main__":
    main()
@@ -0,0 +1,42 @@
 """Feed candidate body offsets to the BW codec and compare with sidecar."""
 from __future__ import annotations
 import sys
 from pathlib import Path
 REPO = Path(__file__).resolve().parents[1]
 sys.path.insert(0, str(REPO))
 from minimateplus.waveform_codec import decode_waveform_v2, walk_body, find_data_start
 from analysis_idf.recon import TARGET, TXT, load_sidecar_samples
 def main():
    buf = TARGET.read_bytes()
    sc = load_sidecar_samples(TXT)
    # Sidecar samples in 0.0003 counts (Thor geo LSB).
    sc_tran = [int(round(v / 0.0003)) for v in sc["Tran"][:30]]
    sc_vert = [int(round(v / 0.0003)) for v in sc["Vert"][:30]]
    sc_long = [int(round(v / 0.0003)) for v in sc["Long"][:30]]
    sc_micl = [int(round(v / 1e-6)) for v in sc["MicL"][:30]]  # 1 µ unit for mic? Will iterate.
    print(f"sidecar Tran (counts): {sc_tran}")
    print(f"sidecar Vert (counts): {sc_vert}")
    print(f"sidecar Long (counts): {sc_long}")
    print(f"sidecar MicL (×1e-6):  {sc_micl}")
    print()
    # Try candidate body start offsets.
    for off in (0x0f1f, 0x1057, 0x11f1, 0x1333, 0x1bde, 0x0d30):
        print(f"=== body @ 0x{off:04x} ===")
        body = buf[off:]
        decoded = decode_waveform_v2(body)
        if not decoded:
            print("  decode_waveform_v2 returned None")
            continue
        for ch in ("Tran", "Vert", "Long", "MicL"):
            arr = decoded.get(ch, [])
            print(f"  {ch}[{len(arr)}]: {arr[:20]}")
        print()
 if __name__ == "__main__":
    main()
@@ -0,0 +1,51 @@
 """Verify decode_waveform_v2 against sidecar across all 2304 samples per channel."""
 from __future__ import annotations
 import sys
 from pathlib import Path
 REPO = Path(__file__).resolve().parents[1]
 sys.path.insert(0, str(REPO))
 from minimateplus.waveform_codec import decode_waveform_v2
 from analysis_idf.recon import TARGET, TXT, load_sidecar_samples
 def main():
    buf = TARGET.read_bytes()
    sc = load_sidecar_samples(TXT)
    body = buf[0x0f1f:]
    decoded = decode_waveform_v2(body)
    print(f"Sidecar lengths: Tran={len(sc['Tran'])} Vert={len(sc['Vert'])} Long={len(sc['Long'])} MicL={len(sc['MicL'])}")
    print(f"Decoded lengths: Tran={len(decoded['Tran'])} Vert={len(decoded['Vert'])} Long={len(decoded['Long'])} MicL={len(decoded['MicL'])}")
    print()
    GEO_LSB = 0.0003  # in/s per count
    for ch in ("Tran", "Vert", "Long"):
        sc_counts = [int(round(v / GEO_LSB)) for v in sc[ch]]
        dec = decoded[ch]
        n = min(len(sc_counts), len(dec))
        matches = sum(1 for i in range(n) if sc_counts[i] == dec[i])
        first_mismatch = next((i for i in range(n) if sc_counts[i] != dec[i]), None)
        pct = 100 * matches / n if n else 0.0
        print(f"{ch}: compared {n}, exact matches {matches} ({pct:.2f}%)")
        if first_mismatch is not None:
            i = first_mismatch
            print(f"  first mismatch at idx {i}: sidecar={sc_counts[i]} ({sc[ch][i]}), decoded={dec[i]}")
            print(f"  context sidecar[{max(0, i-2)}..{i+4}]: {sc_counts[max(0, i-2):i+5]}")
            print(f"  context decoded[{max(0, i-2)}..{i+4}]: {dec[max(0, i-2):i+5]}")
    # MicL: find the multiplicative factor that fits
    print()
    print("=== MicL scale analysis ===")
    sc_micl = sc["MicL"]
    dec_micl = decoded["MicL"]
    # Skip zero values when computing ratio
    ratios = [sc_micl[i] / dec_micl[i] for i in range(min(50, len(sc_micl), len(dec_micl))) if dec_micl[i] != 0]
    if ratios:
        avg = sum(ratios) / len(ratios)
        print(f"  avg ratio sidecar/decoded over nonzero decoded samples among the first 50: {avg:.4e} (n={len(ratios)})")
        print(f"  ratios sample: {[f'{r:.4e}' for r in ratios[:6]]}")
 if __name__ == "__main__":
    main()
@@ -516,6 +516,7 @@ class AchSession:
                        serial=serial or self.peer,
                        session_id=None,
                        waveform_records=waveform_records,
                        device_family="series3",
                    )
                    _ml_ins, _ml_skip = self.db.insert_monitor_log(
                        new_monitor_entries, session_id=None
@@ -0,0 +1,185 @@
 # Histogram body codec — FULLY DECODED (2026-05-20)
 Clean working status doc for the MiniMate Plus histogram-mode event
 body codec.  Companion to `waveform_codec_re_status.md`.  The deep
 historical record (with retractions and dated analyses) lives in
 `docs/instantel_protocol_reference.md §7.6.2`; the authoritative
 implementation lives in `minimateplus/histogram_codec.py`.
 ## TL;DR
 **The codec is fully decoded.**  Every field of every block in the
 in-repo histogram fixture corpus decodes byte-exact against BW's
 ASCII export.
 26 regression tests pass against ~3,500 blocks across 5 in-repo
 fixtures, plus a synthetic regression block taken from a real
 BE9558 prod event to lock in the uint8-peak interpretation.
 **Important correction (2026-05-21):** the per-channel peak count
 is `uint8` at byte[6]/[10]/[14]/[18], NOT `uint16 LE` at byte[6:8]
 etc.  In the N844 fixture corpus the original RE was done against,
 bytes [7]/[11]/[15]/[19] are zero in every block, so the two
 interpretations happened to be equivalent.  Cross-correlating
 non-N844 events (BE9558 Tran-drift, BE18003 Histogram+Continuous)
 against BW's per-interval ASCII export (4 channels × ~1400 blocks
 per event, across multiple events) gives 100% byte-exact agreement
 only when the peak is read as uint8.  Reading it as uint16 LE
 produced per-channel peaks up to 268 in/s and 35× inflated PVS sums
 when first deployed to prod (rolled back, root-caused, and fixed in
 commit 7183b95+1).
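A minimal sketch of why the N844 corpus could not tell the two readings apart. It uses only the standard library, and the byte values are made up for illustration: with a zero annotation byte both readings agree, but any non-zero byte[7] adds 256 counts per unit to the uint16 LE reading.

```python
import struct

PEAK_LSB_IPS = 0.005  # in/s per count at Normal range


def peak_uint8(block: bytes, off: int) -> int:
    """Corrected reading: the peak is the single byte at `off`."""
    return block[off]


def peak_uint16_le(block: bytes, off: int) -> int:
    """Original reading: bytes [off:off+2] as uint16 little-endian."""
    return struct.unpack_from("<H", block, off)[0]


# Hypothetical Tran fields: peak byte[6] = 6, annotation byte[7] = 3.
block = bytearray(32)
block[6], block[7] = 6, 3

print(f"uint8:     {peak_uint8(block, 6) * PEAK_LSB_IPS:.3f} in/s")      # 0.030 in/s
print(f"uint16 LE: {peak_uint16_le(block, 6) * PEAK_LSB_IPS:.3f} in/s")  # 3.870 in/s (774 counts)
```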
 ## Body format
 ```
 body = [stream of 32-byte data blocks] + [small trailing remnant]
 ```
 Each block represents one histogram interval.  Block layout:
 ```
 [0]    0x00                      always-zero tag
 [1]    segment_id (uint8)        0x00..0x03 — 256 blocks per segment
 [2:4]  block_ctr (uint16 LE)     resets each segment (0x0100, 0x0101, …)
 [4:6]  0x000a (uint16 LE)        constant marker (= 10)
 [6]    T_peak_count   uint8      Tran peak (count × 0.005 → in/s at Normal,
                                  max 1.275 in/s — fits in uint8)
 [7]    T_annotation   uint8      empirically non-zero on intervals with sub-Hz
                                  or unmeasurable freq; meaning not fully RE'd
 [8:10] T_halfperiod   uint16 LE  Tran half-period in samples
                                  (freq_Hz = 512 / halfp; ≤ 5 means ">100 Hz")
 [10]   V_peak_count   uint8      Vert peak
 [11]   V_annotation   uint8
 [12:14] V_halfperiod  uint16 LE  Vert freq half-period
 [14]   L_peak_count   uint8      Long peak
 [15]   L_annotation   uint8
 [16:18] L_halfperiod  uint16 LE  Long freq half-period
 [18]   M_peak_count   uint8      MicL peak count
                                  (dB via waveform_codec.mic_count_to_db)
 [19]   M_annotation   uint8
 [20:22] M_halfperiod  uint16 LE  MicL freq half-period
 [22:24] 0x00 0x00                constant
 [24:28] 4-byte variable          purpose unknown — possibly CRC,
                                  timestamp delta, or psi(L) numeric;
                                  not needed for waveform reconstruction
 [28:32] 0x1e 0x0a 0x00 0x00      constant block-end signature
 ```
 Reliable block-identification anchor:
 ```python
 block[22:24] == b"\x00\x00" and block[28:32] == b"\x1e\x0a\x00\x00"
 ```
 (The `1e 0a 00 00` constant tail is the most distinctive signature.)
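The layout and anchor above can be turned into a small block walker. This is a sketch, not `minimateplus/histogram_codec.py`. It assumes blocks start at body offset 0, sit back to back, and that the walk should stop at the first 32-byte chunk that fails the anchor check (the trailing remnant). The field names are illustrative.

```python
import struct
from typing import Iterator

BLOCK_SIZE = 32
BLOCK_TAIL = b"\x1e\x0a\x00\x00"
CHANNELS = ("Tran", "Vert", "Long", "MicL")


def is_histogram_block(block: bytes) -> bool:
    """Anchor check: constant zeros at [22:24] and the 1e 0a 00 00 tail at [28:32]."""
    return (
        len(block) == BLOCK_SIZE
        and block[22:24] == b"\x00\x00"
        and block[28:32] == BLOCK_TAIL
    )


def decode_block(block: bytes) -> dict:
    """Decode one 32-byte block into its raw per-channel fields."""
    block_ctr, marker = struct.unpack_from("<HH", block, 2)  # [2:4], [4:6]
    channels = {}
    for i, ch in enumerate(CHANNELS):
        base = 6 + 4 * i  # 6, 10, 14, 18
        (halfp,) = struct.unpack_from("<H", block, base + 2)
        channels[ch] = {"peak": block[base], "annotation": block[base + 1], "halfp": halfp}
    return {
        "segment_id": block[1],
        "block_ctr": block_ctr,
        "marker": marker,
        "unknown_24_28": bytes(block[24:28]),
        "channels": channels,
    }


def iter_blocks(body: bytes) -> Iterator[dict]:
    """Yield decoded blocks until the first chunk that fails the anchor check."""
    for off in range(0, len(body) - BLOCK_SIZE + 1, BLOCK_SIZE):
        block = body[off:off + BLOCK_SIZE]
        if not is_histogram_block(block):
            break
        yield decode_block(block)
```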
 ## Per-channel encoding
 | Channel | Peak encoding | Frequency encoding |
 |---|---|---|
 | Tran | count × 0.005 = in/s at Normal range | `freq_Hz = 512 / halfperiod` |
 | Vert | same | same |
 | Long | same | same |
 | MicL | count → dB via `mic_count_to_db(count)` (same formula as waveform codec) | same |
 **`>100 Hz` sentinel**: when halfperiod ≤ 5 (giving ≥100 Hz from the
 512/halfp formula), BW displays `>100 Hz`.  Codec's `half_period_to_hz`
 returns `None` in this range.
 ## Verified facts (cross-checked against fixture corpus)
 Example: N844L6Z8.ZR0H block 130 → all 8 decoded fields byte-exact:
 ```
 binary samples [10, 6, 24, 4, 18, 5, 21, 5, 9]
 TXT row        [0.030, 21, 0.020, 28, 0.025, 24, 0.040, 0.000, 95.92, 57]
 slot[0] = 10                                  marker
 slot[1] = 6  × 0.005 = 0.030 in/s         ✓ T_peak
 slot[2] = 24 → 512/24 = 21.3 → 21 Hz      ✓ T_freq
 slot[3] = 4  × 0.005 = 0.020 in/s         ✓ V_peak
 slot[4] = 18 → 512/18 = 28.4 → 28 Hz      ✓ V_freq
 slot[5] = 5  × 0.005 = 0.025 in/s         ✓ L_peak
 slot[6] = 21 → 512/21 = 24.4 → 24 Hz      ✓ L_freq
 slot[7] = 5  → 81.94 + 20·log10(5) = 95.92 dB  ✓ M_peak
 slot[8] = 9  → 512/9 = 56.9 → 57 Hz       ✓ M_freq
 ```
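The worked example can be reproduced from the nine slots with the per-channel formulas. This is a sketch only: `MIC_DB_OFFSET = 81.94` is read off the slot[7] line rather than taken from `waveform_codec.mic_count_to_db`, the helper names are hypothetical, and BW's display rounding is approximated with Python's `round()`. The PVS and psi(L) columns of the TXT row are omitted because they are not stored in the block.

```python
import math
from typing import Optional

GEO_LSB_IPS = 0.005    # in/s per count at Normal range
MIC_DB_OFFSET = 81.94  # dB; read off the slot[7] line above (assumption)


def geo_peak_ips(count: int) -> float:
    return count * GEO_LSB_IPS


def freq_hz(halfp: int) -> Optional[float]:
    """512 / halfp, or None for the '>100 Hz' sentinel (halfp <= 5)."""
    return None if halfp <= 5 else 512 / halfp


def mic_db(count: int) -> Optional[float]:
    """81.94 + 20*log10(count); None for a zero count, where the log is undefined."""
    return None if count <= 0 else MIC_DB_OFFSET + 20 * math.log10(count)


slots = [10, 6, 24, 4, 18, 5, 21, 5, 9]   # N844L6Z8.ZR0H block 130
_marker, tp, tf, vp, vf, lp, lf, mp, mf = slots
row = [
    round(geo_peak_ips(tp), 3), round(freq_hz(tf)),
    round(geo_peak_ips(vp), 3), round(freq_hz(vf)),
    round(geo_peak_ips(lp), 3), round(freq_hz(lf)),
    round(mic_db(mp), 2), round(freq_hz(mf)),
]
print(row)  # [0.03, 21, 0.02, 28, 0.025, 24, 95.92, 57]
```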
 ## Verified test coverage
 `tests/test_histogram_codec.py` (24 tests):
 - Block walking: yields one record per `.TXT` interval ± 1 (off-by-one
  at the tail when recording was stopped mid-write).  Segment-ID
  groups of 256 blocks confirmed.
 - Geo peaks: every block of N844L20G, N844L6Z8, N844L6XE, N844L23B
  matches `.TXT` within the 0.0005 in/s quantization step.
 - Geo freqs: every block of N844L6Z8 and N844L6XE matches `.TXT`
  within 1 Hz (BW display rounds).  `>100 Hz` sentinel handled correctly.
 - Mic dB: every block of N844L6XE, N844L23B, N844L6Z8 matches `.TXT`
  within 0.1 dB (BW display precision).
 - Mic freq: matches `.TXT` within 1 Hz across active blocks.
 ## What's NOT yet decoded
 - **Annotation bytes (`block[7]/[11]/[15]/[19]`)**.  Empirically
  non-zero on intervals where the per-channel ZC frequency comes
  out as `N/A` or sub-Hz (`<1.0`, `1.X`).  Hypothesis tested in the
  RE session: byte != 0 ↔ sub-Hz freq.  Only ~50% correlation
  across the K558 corpus, so the relationship is more complex.
  Possibilities: time-of-peak-within-interval, halfp extension for
  very-long-period signals, or a debug/diagnostic field the firmware
  writes opportunistically.  Doesn't affect peak amplitudes or
  waveform reconstruction.  Captured as `record["annotations"]` for
  future RE.
 - **4-byte variable metadata field (bytes 24:28)**.  Not needed for
  waveform reconstruction.  Speculation: per-block CRC, sub-second
  timestamp offset, or a Mic psi(L) count not in the 9 samples.
  Punt until something needs it.
 - **Geo PVS (TXT col 7, e.g. "0.040 in/s")**.  Not stored in the
  block; can be approximated as `sqrt(T_peak² + V_peak² + L_peak²)`
  but BW's value sometimes differs slightly (probably computed from
  waveform-instant samples, not from per-channel peaks).  Punt — the
  `.h5` consumers don't need PVS as a sample channel.
 - **Mic psi(L) value (TXT col 8)**.  TXT shows it as a small psi value
  derived from the dB measurement.  Not in the 9 samples.  Could be
  derived from `M_peak_count` via the inverse of the dB formula plus
  a psi calibration constant.  Defer.
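Two of the items above can be approximated in a few lines. `approx_pvs` is the vector sum of the per-channel peaks; if BW computes PVS from time-coincident samples, as suspected, this is an upper bound, which fits the worked example (0.044 here vs BW's 0.040 in/s). `db_to_psi` inverts the dB formula assuming the standard 20 µPa acoustic reference; whether BW uses that reference or a device-specific calibration constant is an assumption, not something decoded here. Both helper names are hypothetical.

```python
import math

PA_PER_PSI = 6894.757   # 1 psi in pascals
P_REF_PA = 20e-6        # standard 20 µPa SPL reference (assumption for BW's dB(L))


def approx_pvs(t_ips: float, v_ips: float, l_ips: float) -> float:
    """Vector sum of the per-channel interval peaks.

    If BW's PVS is taken from time-coincident samples, this is an upper bound on it,
    since each channel's interval peak bounds that channel's value at every instant.
    """
    return math.sqrt(t_ips ** 2 + v_ips ** 2 + l_ips ** 2)


def db_to_psi(db: float, p_ref_pa: float = P_REF_PA) -> float:
    """Invert dB = 20*log10(p / p_ref), then convert pascals to psi."""
    return p_ref_pa * 10 ** (db / 20) / PA_PER_PSI


print(f"{approx_pvs(0.030, 0.020, 0.025):.3f}")  # 0.044 (BW's TXT row shows 0.040)
print(f"{db_to_psi(95.92):.6f}")                 # 0.000181 (TXT row shows 0.000)
```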
 ## Output shape
 `decode_histogram_body` returns the standard 4-channel dict that
 mirrors `waveform_codec.decode_waveform_v2`'s output:
 ```python
 {
    "Tran": [peak_count_per_interval, ...],   # 16-count units (LSB = 0.005 in/s)
    "Vert": [..., ...],
    "Long": [..., ...],
    "MicL": [..., ...],                       # raw ADC counts
 }
 ```
 Run through `waveform_codec.decoded_to_adc_counts` to get 1-count ADC
 units (geo ×16, mic passthrough) for the standard `.h5` writer.
 For the full per-interval record with frequencies + metadata, use
 `decode_histogram_body_full()`.
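A sketch of the unit conversion described above, written as a hypothetical stand-in (`to_adc_counts`) rather than the real `waveform_codec.decoded_to_adc_counts`, whose signature may differ: geo channels are multiplied by 16, and MicL passes through unchanged.

```python
from __future__ import annotations

GEO_CHANNELS = ("Tran", "Vert", "Long")
GEO_SCALE = 16  # one histogram peak count = 16 ADC counts on geo channels


def to_adc_counts(decoded: dict[str, list[int]]) -> dict[str, list[int]]:
    """Hypothetical stand-in for decoded_to_adc_counts: geo x16, MicL passthrough."""
    return {
        ch: [v * GEO_SCALE for v in values] if ch in GEO_CHANNELS else list(values)
        for ch, values in decoded.items()
    }


decoded = {"Tran": [6, 0], "Vert": [4, 1], "Long": [5, 2], "MicL": [5, 9]}
print(to_adc_counts(decoded))
# {'Tran': [96, 0], 'Vert': [64, 16], 'Long': [80, 32], 'MicL': [5, 9]}
```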
 ## Where it's wired
 - `minimateplus/event_file_io.py:read_blastware_file()` — first tries
  the waveform codec, falls back to the histogram codec when the
  waveform preamble isn't present.  Same output shape, same
  downstream pipeline.
 - `scripts/backfill_sidecars.py` — the `has_samples` short-circuit
  added during the histogram-codec-pending era still serves as a
  defensive guard against truly undecodable files, but no longer
  fires for valid histograms.
 ## Companion reference
 - `docs/waveform_codec_re_status.md` — sibling status doc for the
  much-more-complex waveform-mode codec.
 - `docs/instantel_protocol_reference.md §7.6.2` — historical
  protocol-reference entry.  Structural framing matches what we
  found; per-sample semantics were less documented than the `✅
  CONFIRMED` badge suggested.  This doc supersedes §7.6.2 where they
  conflict on confidence level.
@@ -0,0 +1,341 @@
 # IDF Protocol Reference — Thor / Micromate Series IV
 Starting-point reference for reverse-engineering Instantel's Micromate
 Series IV event-file format.  Sibling to
 [instantel_protocol_reference.md](instantel_protocol_reference.md) (the
 Series III "Rosetta Stone") — this doc holds what we know so far and
 the open questions still to crack.
 **Status (2026-05-28):** ASCII text sidecar fully decoded (1,014
 sample files round-trip).  **Thor IDFW** binary now decodes via
 `micromate.idf_file.read_idf_file()` — reuses the BW segment-rotated
 block codec verbatim at fixed body offset `0x0f1f`; metadata (serial,
 timestamp, sample_rate, record_time, calibration_date) extracted from
 the binary header.  Sample fidelity is 87–99% byte-exact on quiet
 events; loud events hit the BW codec's known walker-stops-early
 limitation.  Residual ~3% drift on per-sample deltas (likely a
 Thor-specific 12-bit delta refinement not yet modelled).
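A minimal sketch of the first steps of that codec, assuming `body` is a slice that begins at the `00 02 00` magic (the reader described here used file offset `0x0f1f`). It reads the two big-endian Tran preamble samples and decodes nibble (`10 NN`) and int8 (`20 NN`) delta blocks the same way the manual Tran trace script in this PR does: the sample count comes from the low nibble of the first tag byte plus the second tag byte. RLE, packed12 and `40 02` segment headers are left out, an even sample count is assumed for nibble blocks, and the function names are hypothetical.

```python
from __future__ import annotations

import struct

BODY_MAGIC = b"\x00\x02\x00"


def s4(n: int) -> int:
    """Interpret a 4-bit nibble as a signed delta (-8..7)."""
    return n - 16 if n >= 8 else n


def i8(b: int) -> int:
    """Interpret a byte as a signed delta (-128..127)."""
    return b - 256 if b >= 128 else b


def decode_preamble(body: bytes) -> tuple[int, int]:
    """Return (Tran[0], Tran[1]) from `00 02 00 [Tran[0] BE] [Tran[1] BE]`."""
    if not body.startswith(BODY_MAGIC):
        raise ValueError("body does not start with the 00 02 00 magic")
    return struct.unpack_from(">hh", body, 3)


def decode_delta_block(body: bytes, off: int, cur: int) -> tuple[list[int], int]:
    """Decode one nibble or int8 delta block at `off`, continuing from sample `cur`.

    Returns (new samples, offset of the next block).
    """
    tag_hi, tag_lo = body[off], body[off + 1]
    count = ((tag_hi & 0x0F) << 8) | tag_lo   # samples in this block
    start = off + 2
    samples: list[int] = []
    if (tag_hi & 0xF0) == 0x10:               # two signed 4-bit deltas per byte, high nibble first
        end = start + count // 2
        for byte in body[start:end]:
            for nib in (byte >> 4, byte & 0x0F):
                cur += s4(nib)
                samples.append(cur)
    elif (tag_hi & 0xF0) == 0x20:             # one signed 8-bit delta per byte
        end = start + count
        for byte in body[start:end]:
            cur += i8(byte)
            samples.append(cur)
    else:
        raise NotImplementedError(f"tag {tag_hi:02x} {tag_lo:02x} is outside this sketch")
    return samples, end
```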
 **Thor IDFH histograms also decoded.**  Body has one or more segments;
 each 12-byte segment header `[length_be 2B][0a 00 00 00][00 NN][05 3f]`
 introduces `N = (length - 10) // 72` interval records of 72 bytes
 each.  Each interval = 4 × 16-byte per-channel records:
 `[int16 min][int16 max][int16 ??][uint16 halfp][2B 00][uint16 ??][2B 00][uint16 ??]`.
 Geo peak `= max(|min|, |max|) / 32768 × 10` in/s (matches the sidecar
 within ~1.8%); freq `= 512 / halfp` Hz (None for halfp ≤ 5 → ">100"
 sentinel).  Corpus: **all 859 Thor IDFH files decode, 181,071
 intervals**.  Wired through `read_idf_file()` →
 `save_imported_idf()` → sidecar's `extensions.idf_intervals`.
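A sketch of decoding one 72-byte interval with `struct`. The byte order of the per-channel records is not stated above; big-endian is assumed here because other Thor fields (segment length, preamble samples, sample rate) are big-endian. `parse_interval` and `intervals_in_segment` are illustrative names, and only the fields with known meaning are interpreted.

```python
import struct

CHANNELS = ("Tran", "Vert", "Long", "MicL")
CHANNEL_REC = 16                      # bytes per channel record
INTERVAL_SIZE = 4 * CHANNEL_REC + 8   # 72: four records + 8-byte tail
BYTE_ORDER = ">"                      # assumption: big-endian


def parse_interval(rec: bytes, byte_order: str = BYTE_ORDER) -> dict:
    """Decode one 72-byte IDFH interval into per-channel peak / half-period / frequency."""
    if len(rec) != INTERVAL_SIZE:
        raise ValueError(f"expected {INTERVAL_SIZE} bytes, got {len(rec)}")
    out: dict = {}
    for i, ch in enumerate(CHANNELS):
        # [int16 min][int16 max][int16 ??][uint16 halfp]; the record's last 8 bytes are skipped
        mn, mx, _unknown, halfp = struct.unpack_from(byte_order + "hhhH", rec, i * CHANNEL_REC)
        peak = max(abs(mn), abs(mx))
        out[ch] = {
            "peak": peak,
            # geo only: full scale 32768 counts = 10 in/s
            "peak_ips": None if ch == "MicL" else peak / 32768 * 10,
            "halfp": halfp,
            "freq_hz": None if halfp <= 5 else 512 / halfp,  # None = ">100 Hz" sentinel
        }
    out["tail"] = bytes(rec[64:INTERVAL_SIZE])  # 8-byte tail, meaning not decoded
    return out


def intervals_in_segment(length: int) -> int:
    """N = (length - 10) // 72, per the segment-header description."""
    return (length - 10) // INTERVAL_SIZE
```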
 **Note on the BE9439 outliers in the example corpus:** Two files
 (`BE9439_20200713131747.IDFW` and `BE9439_20200713124251.IDFH`) are
 **Series III Blastware** binaries, not Thor.  Provenance: TMI tried
 to use Thor to manage auto-call-homes for Series III units; the
 experiment didn't work out, but it did leave a few BW event files
 in Thor's per-serial directory structure with `.IDFW`/`.IDFH`
 extensions — Thor's forwarder applied its own naming convention to
 the BW bodies it was relaying.  Their header `10 00 01 80 00 00
 Instantel STRT ff fe <end_key> <start_key>` is the BW SUB 5A STRT
 record, not a Thor body preamble.  The reader detects them by
 signature and raises `NotImplementedError` pointing callers at
 `read_blastware_file()`, which extracts BW-format peaks from them.
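A heuristic for that signature check, assuming the `Instantel` and `STRT` tokens are stored as plain ASCII within the first 64 bytes and that `ff fe` immediately follows `STRT`. `looks_like_blastware` and `check_thor_body` are hypothetical helpers, not the reader's actual check.

```python
BW_STRT_PREFIX = b"\x10\x00\x01\x80\x00\x00"


def looks_like_blastware(buf: bytes) -> bool:
    """Heuristic check for a Series III SUB 5A STRT record at the start of the file."""
    head = buf[:64]
    return (
        head.startswith(BW_STRT_PREFIX)
        and b"Instantel" in head
        and b"STRT\xff\xfe" in head
    )


def check_thor_body(buf: bytes) -> None:
    """Refuse Series III bodies, pointing the caller at the BW reader."""
    if looks_like_blastware(buf):
        raise NotImplementedError("Series III Blastware body; use read_blastware_file()")
```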
 **Still NYI for Thor IDFH:** per-channel `int16 field4` (possibly
 time-of-peak); the two uint16 fields (probably PVS contributions);
 8-byte interval tail (PVS data); mic dB(L) exact conversion constant.
 ### Codec breakthroughs (2026-05-28)
 - **Body offset is a fixed `0x0f1f`** across 151/154 corpus IDFW
  files.  Preceded by a 4-byte record-type marker (`46 00 00 00`)
  + magic preamble `00 02 00 [Tran[0] BE] [Tran[1] BE]`.
 - **Sample stream is BW's segment-rotated block codec verbatim.**
  Thor reuses `10 NN` (nibble), `20 NN` (int8), `00 NN` (RLE),
  `30 NN` (packed12), `40 02` (segment header) tags with the same
  semantics.  Channel rotation Tran→Vert→Long→MicL.
 - **Geo LSB = 0.0003 in/s** (10 / 32768 ≈ 0.000305; not BW's 0.005),
  because Thor's 16-bit ADC range maps to 10 in/s without BW's 16-count
  quantization step (16 × 10 / 32768 ≈ 0.005).
 - **Mic ≈ 2.14×10⁻⁶ psi/count** (rough scale; refine after channel
  block calibration constants are decoded).
 - **BW compliance anchor `\xbe\x80\x00\x00\x00\x00` reappears at
  IDFW offset 0x952** — sample_rate at anchor−6 (uint16 BE),
  record_time at anchor+6 (float32 BE), same layout as BW.
 - **Event timestamp at offset 0x97A** — 8 bytes `[day][month]
  [year_be][unk][hour][min][sec]`.  Stop-time mirrors at 0x982.
 - **Serial as null-terminated ASCII at 0x14E**.
 - **Calibration date** at 0x194–0x197 (day, month, year_be).
 - Per-sample residual drift of ~3% suggests Thor encodes int8/nibble
  deltas with an extra refinement bit that BW doesn't carry —
  unsolved; errors resync within a few samples so cumulative impact
  is small.
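The header offsets above can be read with `struct`. This sketch assumes the offsets observed in the example corpus are fixed. Files with different header padding may shift them, so it verifies the compliance anchor before trusting the neighbouring fields. Function and constant names are hypothetical.
```python
import struct
from datetime import date, datetime

COMPLIANCE_ANCHOR = b"\xbe\x80\x00\x00\x00\x00"
ANCHOR_OFFSET = 0x952      # corpus-observed; ASSUMPTION that it is fixed
TIMESTAMP_OFFSET = 0x97A
SERIAL_OFFSET = 0x14E
CAL_DATE_OFFSET = 0x194


def read_cstring(buf: bytes, offset: int) -> str:
    """Null-terminated ASCII string starting at offset."""
    end = buf.index(b"\x00", offset)
    return buf[offset:end].decode("ascii")


def parse_idfw_header(buf: bytes) -> dict:
    if buf[ANCHOR_OFFSET:ANCHOR_OFFSET + 6] != COMPLIANCE_ANCHOR:
        raise ValueError("compliance anchor not at 0x952")
    (sample_rate,) = struct.unpack_from(">H", buf, ANCHOR_OFFSET - 6)   # uint16 BE
    (record_time,) = struct.unpack_from(">f", buf, ANCHOR_OFFSET + 6)   # float32 BE
    # [day][month][year_be][unk][hour][min][sec]
    day, month, year, _unk, hour, minute, sec = struct.unpack_from(
        ">BBHBBBB", buf, TIMESTAMP_OFFSET)
    cal_day, cal_month, cal_year = struct.unpack_from(">BBH", buf, CAL_DATE_OFFSET)
    return {
        "serial": read_cstring(buf, SERIAL_OFFSET),
        "sample_rate": sample_rate,
        "record_time_s": record_time,
        "event_time": datetime(year, month, day, hour, minute, sec),  # device local time
        "calibration_date": date(cal_year, cal_month, cal_day),
    }
```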
 ---
 ## File model
 ### Filename convention
 ```
 <SERIAL>_<YYYYMMDDHHMMSS>.<KIND>
 ```
 - **SERIAL** — literal device serial, two-letter prefix + numeric
  suffix.  Examples seen: `UM11719`, `UM13981`, `UM20147`, `BE9439`.
  Unlike Series III BW filenames (`M529LK44.AB0`, base-36 stem),
  Series IV filenames carry the serial in plain text.
 - **YYYYMMDDHHMMSS** — 14-char ASCII timestamp in **device local
  time** (no timezone marker).
 - **KIND** — `IDFH` for histograms, `IDFW` for waveforms.
 The `.IDFH.txt` / `.IDFW.txt` ASCII sidecar lives in a `TXT/`
 **subfolder** of the unit's directory, not alongside the binary.
 This pairing convention is encoded in
 `event_forwarder.idf_report_path()`.
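This is a standalone sketch of the filename convention and the `TXT/` pairing rule. The regex encodes the serial shape as "two uppercase letters + digits", which is an assumption drawn from the examples above. `parse_idf_name` and `sidecar_path` are hypothetical helpers, not the signature of `event_forwarder.idf_report_path()`.
```python
import re
from datetime import datetime
from pathlib import Path
from typing import NamedTuple

# SERIAL = two uppercase letters + digits; 14-digit local timestamp; IDFH or IDFW.
IDF_NAME_RE = re.compile(r"^(?P<serial>[A-Z]{2}\d+)_(?P<ts>\d{14})\.(?P<kind>IDF[HW])$")


class IdfName(NamedTuple):
    serial: str
    local_time: datetime   # naive: the filename carries no timezone
    kind: str              # "IDFH" or "IDFW"


def parse_idf_name(path: Path) -> IdfName:
    m = IDF_NAME_RE.match(path.name)
    if m is None:
        raise ValueError(f"not an IDF event filename: {path.name}")
    return IdfName(m["serial"], datetime.strptime(m["ts"], "%Y%m%d%H%M%S"), m["kind"])


def sidecar_path(binary: Path) -> Path:
    """<unit dir>/TXT/<binary name>.txt, per the pairing rule above."""
    return binary.parent / "TXT" / (binary.name + ".txt")
```
Because the pattern is anchored at `$`, a `.IDFW.CDB` name is rejected rather than misparsed.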
 ### Directory layout
 ```
 C:\THORDATA\
 └── <Project>\
    └── <UM####>\                  ← unit serial dir
        ├── UM12345_20260520100000.MLG     ← monitor log (not events)
        ├── UM12345_20260520100000.IDFH    ← histogram event (binary)
        ├── UM12345_20260520100000.IDFW    ← waveform event (binary)
        ├── UM12345_20260520100000.IDFW.CDB ← cache-DB variant (skip)
        ├── TXT\
        │   ├── UM12345_20260520100000.IDFH.txt    ← histogram ASCII sidecar
        │   └── UM12345_20260520100000.IDFW.txt    ← waveform  ASCII sidecar
        ├── CSV\, HTML\, PDF\, XML\        ← operator-facing derived exports
        └── ...
 ```
 The `.IDFW.CDB` files share the binary's basename but appear to be a
 separate cache/database variant.  Their first 8 bytes match the
 **old**-firmware Thor signature (see below) regardless of which
 signature the paired `.IDFW` uses.  Purpose unknown; sizes vary
 wildly (observed 123 B → 40,491 B).  Thor-watcher's forwarder
 deliberately skips them.
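The sketch below walks one unit directory in the layout above and pairs each event binary with its sidecar. It yields `None` for the sidecar when Thor's TXT exporter is off. It is hypothetical and is not thor-watcher's forwarder. It relies on `Path.glob` matching the whole filename, so `.IDFW.CDB` and `.MLG` files never match.
```python
from pathlib import Path
from typing import Iterator, Optional, Tuple


def iter_unit_events(unit_dir: Path) -> Iterator[Tuple[Path, Optional[Path]]]:
    """Yield (binary, sidecar-or-None) for each .IDFW / .IDFH in one unit directory.

    Path.glob("*.IDFW") must match the entire filename, so *.IDFW.CDB cache
    files and *.MLG monitor logs are skipped without a separate filter.
    """
    for pattern in ("*.IDFW", "*.IDFH"):
        for binary in sorted(unit_dir.glob(pattern)):
            txt = unit_dir / "TXT" / (binary.name + ".txt")
            yield binary, (txt if txt.is_file() else None)
```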
 ### Sample corpus
 The `thor-watcher/example-data/THORDATA_example/` tree carries
 **1,014 paired .IDFW / .IDFH + .txt files** spanning 2020–2023
 across nine units (UM11719, UM13981, UM20147, …, plus BE9439 from
 2020).  This is the reverse-engineering ground truth.
 ---
 ## ASCII sidecar (`.IDFW.txt` / `.IDFH.txt`) — fully decoded
 Shape: plain text, one `"Key : Value"` line per metadata field,
 followed for waveforms by a tab-separated sample table headed by
 the literal line `Waveform Data Channels`.  Parsed by
 [`micromate/idf_ascii_report.py`](../micromate/idf_ascii_report.py).
 See [`micromate/models.py`](../micromate/models.py) for the typed
 `IdfReport` shape.
 ### Notable conventions
 - **Units are native to Thor** — geophone in **in/s**, microphone in
  **dB(L)** (not psi like Series III BW reports), frequency in Hz,
  acceleration in g, displacement in in.
 - **Below-threshold readings** appear as the literal string
  `<0.005 in/s` (155 occurrences in the sample corpus) — the parser
  strips the `<` and treats the numeric remainder as the value.
 - **Out-of-range / not-measured** values appear as `N/A` — parser
  drops the field rather than letting the string leak into a numeric
  column.
 - **Firmware string** observed: `Micromate ISEE 11.0AK`.
 - **TitleString1..4** are operator-defined free-text slots; Thor's
  default labels map them to Location / Client / Company / Notes,
  which the parser surfaces as `project` / `client` / `operator` /
  `notes`.
 - **Histogram sidecars** use `HistogramStartDate` / `HistogramStartTime`
  in place of waveform's `EventDate` / `EventTime`.  Parser falls
  through to either.
 - **Histogram tabular block** lacks the `Waveform Data Channels`
  marker; instead it's a multi-line column header followed by
  per-interval rows (`<date> <time> <tran-ppv> <freq> ...`).  Parser
  silently ignores lines after the metadata block since they lack a
  colon-separated `key : value` shape (the timestamps DO contain
  colons but produce garbage keys that don't collide with any
  recognised field).
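The conventions above reduce to a small amount of parsing logic. This is a minimal sketch, not the code in `idf_ascii_report.py`. It splits each line on the first colon, so `EventTime : 16:27:23` keeps its full value. It stops at the `Waveform Data Channels` marker, drops `N/A`, and strips a leading `<` when converting to a number. Function names are hypothetical.
```python
from typing import Dict, Iterable, Optional

WAVEFORM_MARKER = "Waveform Data Channels"


def parse_metadata(lines: Iterable[str]) -> Dict[str, str]:
    """Collect "Key : Value" lines until the waveform table marker."""
    meta: Dict[str, str] = {}
    for line in lines:
        if line.strip() == WAVEFORM_MARKER:
            break                       # sample table follows; not metadata
        key, sep, value = line.partition(":")
        if not sep:
            continue                    # no colon: not a key/value line
        value = value.strip()
        if value == "N/A":
            continue                    # drop rather than leak a string into a numeric column
        meta[key.strip()] = value
    return meta


def parse_numeric(value: str) -> Optional[float]:
    """'<0.005 in/s' -> 0.005; '1.25 in/s' -> 1.25; non-numeric -> None."""
    parts = value.strip().lstrip("<").split()
    if not parts:
        return None
    try:
        return float(parts[0])
    except ValueError:
        return None
```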
 ---
 ## Binary header signatures (observed)
 Hex dump of the first 32 bytes across 1,014 sample files reveals
 **two distinct file signatures**, both anchored by the literal
 ASCII string `"\x00Instantel\x00"` at offsets 5–15 (`Instantel` itself at 6–14):
 ### Signature A — newer firmware (1,012 files, 99.8% of corpus)
 ```
 00000000: 0012 0100 0000 496e 7374 616e 7465 6c00   ......Instantel.
 00000010: 0000 a695 002e b500 4f70 6572 6174 6f72   ........Operator
                                ^^^^^^^^^^^^^^^^
                                operator/title string starts at 0x18
 ```
 Header bytes 0–5: `00 12 01 00 00 00`.  Followed immediately by the
 NUL-terminated `Instantel` tag (bytes 6–15), then 8 unknown bytes
 (0x10–0x17), then ASCII operator-supplied strings starting at 0x18
 (Operator name, etc.) and on through the project / client / title
 strings.  No `STRT` record observed in this layout.
 ### Signature B — older firmware (2 files: BE9439 from 2020)
 ```
 00000000: 1000 0180 0000 496e 7374 616e 7465 6c00   ......Instantel.
 00000010: 072c 0012 0300 5354 5254 fffe 0111 2340   .,....STRT....#@
                          ^^^^^^^^^                ^^^^^^^^^
                          STRT magic               4-byte end_key
 00000020: 0111 0000 2e5f 00ac 4600 0000 0200 0000   ....._..F.......
          ^^^^^^^^^             ^^^
          4-byte start_key      0x46 (BW WAVEHDR record-type marker)
 ```
 Header bytes 0–5: `10 00 01 80 00 00`.  The structure after the
 `Instantel` magic is **byte-for-byte identical to a BW SUB 5A
 probe-response STRT record** as documented in
 [instantel_protocol_reference.md → "SUB 5A — STRT record encodes
 end_offset"](instantel_protocol_reference.md).  Specifically:
 | Offset | Bytes               | Meaning (per BW reference)          |
 |--------|---------------------|--------------------------------------|
 | 0x16   | `53 54 52 54`       | `STRT` magic                         |
 | 0x1A   | `ff fe`             | STRT sentinel                        |
 | 0x1C   | `01 11 23 40`       | `end_key` (4 bytes)                  |
 | 0x20   | `01 11 00 00`       | `start_key` (4 bytes)                |
 | 0x28   | `46`                | `0x46` waveform-record type marker   |
 **Hypothesis:** Older Micromate firmware writes a wrapped BW-format
 event into the `.IDFW` file — essentially the same on-disk shape as
 a Series III device, with the new filename convention applied at
 export time.  Newer firmware (signature A) abandoned the
 BW-compatible layout for an Instantel-specific format.
 If that hypothesis holds, the 2 signature-B files can already be
 parsed via `minimateplus/event_file_io.read_blastware_file()` — worth
 testing.  The 1,012 signature-A files are the real reverse-engineering
 target.
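Given the two signatures, a 16-byte check is enough to route a file. In this sketch, which uses hypothetical names, signature-A files would go to the Thor codec and signature-B files to the BW reader. It checks only bytes 0–5 and the `\x00Instantel\x00` tag at bytes 5–15, both taken from the hex dumps above.
```python
INSTANTEL_TAG = b"\x00Instantel\x00"          # bytes 5..15 in both signatures
SIGNATURE_A = bytes.fromhex("00 12 01 00 00 00")
SIGNATURE_B = bytes.fromhex("10 00 01 80 00 00")


def sniff_signature(head: bytes) -> str:
    """Classify an IDF-family file from its first 16 bytes: 'A', 'B' or 'unknown'."""
    if len(head) < 16 or head[5:16] != INSTANTEL_TAG:
        return "unknown"
    if head[:6] == SIGNATURE_A:
        return "A"
    if head[:6] == SIGNATURE_B:
        return "B"
    return "unknown"
```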
 ### `.IDFW.CDB` cache files
 Always carry signature B (`10 00 01 80 ...`), even when the paired
 `.IDFW` carries signature A.  Plausible explanation: the CDB is an
 internal Thor cache-database export that retains the legacy BW-style
 record layout regardless of the user-facing `.IDFW` format version.
 Not currently consumed by the forwarder.
 ---
 ## File-size patterns (Signature A, the main target)
 Survey of 1,012 signature-A files:
 | Event type   | Typical size      | Source of variance                           |
 |--------------|-------------------|----------------------------------------------|
 | `.IDFW` 2-sec | 9,200 – 10,500 B | Operator-supplied strings (TitleString1..4) of varying length |
 | `.IDFH`       | 2,944 – 4,076 B  | Histogram interval count (record duration / interval) |
 **Naive arithmetic for 2-sec waveform:**
 - 4 channels × 2 sec × 1024 sps = 8,192 samples
 - At 2 bytes/sample (int16) = 16,384 sample bytes → file would be > 16 KB
 - Observed: ~9–10 KB
 - → samples are likely **1 byte each** (int8 quantised), **or** stored
  with bit-packing / delta encoding, **or** only one channel's
  full-rate samples are stored with the others reconstructed
  arithmetically.  Verifying this is the **first RE milestone**.
 Project-string–length variance (~1 KB across the corpus) is consistent
 with the file carrying a single copy of each TitleString1..4 plus
 operator + setup-name as null-padded ASCII regions.
 ---
 ## Open questions
 The reverse-engineering targets, roughly in dependency order:
 1. **Sample encoding (signature A)** — int8? int16 LE/BE? Bit-packed?
   Delta-coded?  Per-channel interleaved or sequential blocks?
 2. **Header field layout (signature A)** — where do sample_rate,
   record_time, channel count, and per-channel peaks live in the
   binary?  The ASCII sidecar gives the device-authoritative values,
   so binary fields can be confirmed by diff.
 3. **Operator-string offsets** — `Operator` at 0x18 is the first
   visible string in signature-A files; the rest (project, client,
   notes, setup) follow.  Need to map exact offsets and null-padding
   conventions.
 4. **Signature-B → BW codec compatibility** — does
   `minimateplus/event_file_io.read_blastware_file()` actually parse
   the 2 BE9439 signature-B files as-is?  If yes, the OLD-format
   ingest is free.
 5. **`.IDFW.CDB` purpose** — is it an internal Thor cache, a
   ring-buffer dump, or something else?  Worth a single small effort
   to characterise so we know what we're skipping.
 6. **Footer / checksum** — every BW event file has a footer; does
   IDF?  Where does the per-channel sample block end?
 ---
 ## Reverse-engineering playbook (when we start)
 The Series III BW codec took ~2 months of MITM wire captures
 because we didn't have ground-truth metadata.  Thor's situation is
 **substantially better**:
 - **Ground truth is on disk.**  Every binary in `example-data/`
  has a paired `.IDFW.txt` carrying the full decoded sample table
  (`Waveform Data Channels` block — see any sample file in
  `thor-watcher/example-data/.../TXT/`).  Aligning binary bytes
  to the table's float-per-row values gives an immediate per-byte
  hypothesis test.
 - **Cross-event diffing.**  1,012 signature-A samples from 9 units
  spanning 4 years means any field that varies between events is
  immediately localisable.  Fields that are constant across all
  files (firmware ID, channel labels, format-version word) are also
  immediately localisable by complementary search.
 - **No protocol surface.**  Files at rest, not a wire dialect.  No
  DLE stuffing, no inner-frame parsing, no probe/data two-step.
 Suggested first session (2-4 hours): hand-decode `UM11719_20231219162723.IDFW`
 (10,290 bytes) against its `TXT/UM11719_20231219162723.IDFW.txt`
 sample table (the 2-sec waveform at 1024 sps is 2,048 rows × 4 channels
 = 8,192 samples).  Find the first per-channel sample value (`0.0003` in
 the Tran column at t=0) in the binary.  Confirms sample encoding.
 Everything else flows from there.
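This sketch implements that hypothesis test: it turns a known sidecar value into candidate byte patterns and lists where each pattern occurs in the binary. The default LSB of 10/32768 in/s is an assumption taken from the codec notes above (Geo LSB ≈ 0.0003 in/s). The candidate encodings (int16 BE/LE, int8) and all names are illustrative. A value this small (1 count) will hit many offsets, so the output narrows the search rather than answering it.
```python
import struct
from typing import Dict, List

GEO_LSB_IPS = 10 / 32768   # ASSUMPTION: 16-bit counts spanning 10 in/s


def candidate_encodings(value_ips: float, lsb_ips: float = GEO_LSB_IPS) -> Dict[str, bytes]:
    """Byte patterns a sample of value_ips could take under each hypothesis."""
    count = round(value_ips / lsb_ips)
    enc = {"int16_be": struct.pack(">h", count), "int16_le": struct.pack("<h", count)}
    if -128 <= count <= 127:
        enc["int8"] = struct.pack("b", count)
    return enc


def find_offsets(blob: bytes, pattern: bytes, limit: int = 20) -> List[int]:
    """First `limit` offsets where pattern occurs (overlaps allowed)."""
    hits: List[int] = []
    start = 0
    while len(hits) < limit:
        i = blob.find(pattern, start)
        if i < 0:
            break
        hits.append(i)
        start = i + 1
    return hits


def search_value(blob: bytes, value_ips: float,
                 lsb_ips: float = GEO_LSB_IPS) -> Dict[str, List[int]]:
    return {name: find_offsets(blob, pat)
            for name, pat in candidate_encodings(value_ips, lsb_ips).items()}
```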
 ---
 ## Code seams ready to receive the codec
 When the codec lands, it goes into
 [`micromate/idf_file.py`](../micromate/idf_file.py) (currently a
 stub raising `NotImplementedError`).  Public API:
 ```python
 from pathlib import Path
 from micromate import IdfEvent
 from micromate.idf_file import read_idf_file
 event: IdfEvent = read_idf_file(Path("UM11719_20231219163444.IDFW"))
 # event.peaks.transverse_ips, event.timestamp, event.raw_samples, ...
 ```
 The ingest pipeline (`WaveformStore.save_imported_idf`) currently
 builds the `IdfEvent` from the `.txt` parser only.  Once
 `read_idf_file()` works, the binary becomes authoritative; the
 `.txt` parser drops to fast-path metadata cross-check.  Operators
 who don't enable Thor's TXT exporter still get fully populated
 events.
 ---
 ## See also
 - [instantel_protocol_reference.md](instantel_protocol_reference.md) — Series III BW protocol reference (the Rosetta Stone).  STRT record format, DLE framing, BW filename encoding.
 - [`micromate/idf_ascii_report.py`](../micromate/idf_ascii_report.py) — `.txt` sidecar parser.
 - [`micromate/models.py`](../micromate/models.py) — `IdfEvent`, `IdfReport` typed dataclasses.
 - [`micromate/idf_file.py`](../micromate/idf_file.py) — placeholder for the binary codec.
 - [`thor-watcher/example-data/THORDATA_example/`](../../thor-watcher/example-data/) — 1,014 paired binary + .txt files for codec validation.
@@ -0,0 +1,255 @@
 # Runbook — Recovering a wedged unit stuck in a call-home loop
 **Original incident:** BE9558H at `166.246.130.1:9034`, recovered 2026-05-17.
 A field unit with a stuck-triggered geophone (or any hardware fault causing
 constant event triggering) will record events back-to-back, and if Auto Call
 Home is set to "After Event Recorded" the device will dial the office BW
 ACH server in a tight loop. Combined with a Sierra Wireless modem in
 bidirectional serial-TCP mode, this makes the unit effectively unreachable
 from SFM — every TCP connection we open gets killed when the modem flips
 from server-mode to client-mode to honor the device's next AT dial command.
 This runbook describes how to break the loop and recover control.
 ---
 ## Symptoms
 - Terra-View / SFM `/device/info` either hangs or fails on `count_events()`.
 - `/device/monitor/status` and `/device/rescue` return 502 (protocol timeout
  waiting for POLL response) or 503 (TCP connect refused).
 - ACEmanager serial log shows repeating
  `Connect to IP: <BW_IP> Port: <BW_PORT>` → `Shutdown TCP socket` cycles
  every 30-60 seconds.
 - Spam-mode endpoints (`/device/stop_monitoring_spam`) report many
  `sent_ok` but the device's monitoring state never changes.
 - `slow_drip` reports `[Errno 32] Broken pipe` after sending the preamble
  but before completing the drip loop.
 If you see *all* of these, the unit is in this exact failure mode.
 ---
 ## Quick reference — how to recover
 You need **ACEmanager access** to the unit's modem.
 ### Step 1: stop the modem's mode-flipping
 In ACEmanager → **Serial → Port Configuration**:
 | Field | Set to |
 |---|---|
 | **Destination Address** | clear (blank) |
 | **Destination Port** | `0` |
 Click **Apply**. This removes the modem's auto-dial-out target. The device's
 AT dial commands now error back at the modem instead of triggering a
 mode-flip, so the modem stays in TCP-server mode permanently and our inbound
 TCP sessions stay alive.
 *(Optional belt-and-suspenders: also add the BW server's port to
 **Security → Port Filtering - Outbound** as a blocked port, with
 Outbound Port Filtering Mode = Blocked Ports.)*
 ### Step 2: stop monitoring on the device (slow drip)
 From the SFM host:
 ```bash
 /home/serversdown/seismo-relay/scripts/slow_drip.sh <DEVICE_IP> <PORT>
 ```
 Defaults are 120s duration with a drip every 3s. Watch the response:
 - `duration_s ≈ 120` and `drips_sent ≈ 40` → session held the full duration ✓
 - `bytes_received > 0` → device is responding ✓ (this is the success signal)
 If `duration_s` is small or the response carries `send_error: "Broken pipe"`,
 Step 1 didn't take hold. Re-check ACEmanager; you may need to reboot the modem after Apply.
 ### Step 3: confirm monitoring stopped
 ```bash
 curl 'http://localhost:8200/device/monitor/status?host=<DEVICE_IP>&tcp_port=<PORT>&force=true'
 # expect: {"is_monitoring": false, ...}
 ```
 ### Step 4: disable ACH at the device level + erase corrupted events
 Either fire the rescue endpoint:
 ```bash
 /home/serversdown/seismo-relay/scripts/rescue_device.sh <DEVICE_IP> <PORT>
 ```
 Or do the two steps manually:
 ```bash
 # Disable ACH in the device's compliance config
 curl -X POST 'http://localhost:8200/device/call_home?host=<DEVICE_IP>&tcp_port=<PORT>' \
  -H 'Content-Type: application/json' \
  -d '{"auto_call_home_enabled": false}'
 # Erase corrupted event chain
 curl -X POST 'http://localhost:8200/device/events/erase?host=<DEVICE_IP>&tcp_port=<PORT>'
 ```
 You can also do this via the SFM standalone UI → **Call Home** tab → set
 `Enable Auto Call Home` to `Disabled` → **Write to Device**.
 ### Step 5: restore modem config (housekeeping)
 Once the device-side ACH is disabled, restore the modem's Destination
 Address and Port to the original values (e.g. `50.197.32.92` / `12345`) in
 ACEmanager. The modem will resume normal bidirectional behavior, but the
 unit won't issue any dial commands until ACH is explicitly re-enabled on
 the device.
 ### Step 6: keep ACH disabled until the hardware is repaired
 Do NOT re-enable ACH on this unit until the underlying hardware fault is
 repaired. If you do, the call-home loop starts again immediately and
 you'll be running this runbook a second time.
 ---
 ## Why this works — the failure mode explained
 The Sierra Wireless RV50/RV55 serial port operates in one of two TCP modes
 at any moment:
 - **Server mode** — listens on `Device Port` (e.g. 9034), bridges inbound
  TCP to the device's serial port. This is what we need to interact with
  the device.
 - **Client mode** — when the device sends an AT dial command on its serial
  TX line, the modem opens an outbound TCP to `Destination Address:Port`
  and bridges that to serial.
 A serial port in this configuration is **bidirectional**: the modem flips
 between server and client modes on demand. When the device's firmware is
 healthy and only dials occasionally, this works fine.
 When the unit is constantly triggering events and ACH is set to "After
 Event Recorded", the device sends an AT dial command every few seconds.
 Each one causes the modem to:
 1. Drop any active inbound TCP session
 2. Flip to client mode
 3. Attempt outbound TCP to `Destination Address:Port`
 4. Hang for up to a minute waiting for it to succeed/fail
 5. Drop back to server mode
 **During the entire hang, no inbound TCP can establish.** Even between
 hangs, the modem closes any existing inbound session before flipping. So
 any tool that needs more than a few seconds of held TCP (e.g. POLL +
 config read + write) gets repeatedly kicked off.
 Clearing `Destination Address` removes steps 3–4 from the cycle: the modem
 has nowhere to dial, so it doesn't flip modes when it receives an AT dial
 command. The serial port effectively becomes server-only, and inbound TCP
 sessions can stay open as long as needed.
 **This is a modem-layer issue, not a device firmware issue.** The device
 is alive and responsive the whole time — confirmed in the BE9558H
 recovery by 990 bytes of S3 responses received over a 120s slow-drip
 session once the modem was no longer mode-flipping.
 ---
 ## Why simpler approaches don't work
 | Approach | Why it fails |
 |---|---|
 | Standard `/device/info` | Triggers `count_events()` 1E/1F walk, takes 90s+ and hits corrupted event chain in this scenario |
 | `/device/rescue` race loop | Gets 502 (protocol timeout) because the modem closes the TCP before the POLL handshake can complete |
 | `/device/stop_monitoring_blind` (single frame) | Even if the bytes leave the wire, the device's protocol parser ignores write commands without a preceding POLL handshake (early-version bug, now fixed by including POLL preamble in blind sends) |
 | `/device/stop_monitoring_spam` (sub-second cadence) | Each session is killed by the modem's mode-flip before the device can drain its UART RX buffer; high-rate spam also risks UART FIFO overrun on the device side |
 | Outbound port firewall block alone | Stops the outbound TCP from succeeding, but doesn't stop the modem from *trying* and mode-flipping. Reduces but doesn't eliminate the contention. |
 | Modem reboot | Temporary — as soon as the device starts triggering again, the loop resumes within seconds |
 The combination of `slow_drip` + cleared `Destination Address` works because:
 1. The modem stops mode-flipping → TCP session stays open for the full
   drip duration
 2. Slow drip rate → device's UART RX FIFO never overflows even if
   firmware is busy with event recording
 3. The drip is `SESSION_RESET + STOP_MONITORING` every 3s → many
   independent chances for the parser to land one valid frame
 4. Once one Stop Monitoring is parsed, event recording halts → firmware
   has CPU to spare → subsequent operations are trivially easy
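Items 2 and 3 of that list come down to one held TCP session and a timed loop. This is a minimal sketch, not the `/device/stop_monitoring_slow_drip` implementation in `sfm/server.py`. `frame` stands in for the SESSION_RESET + STOP_MONITORING bytes, whose encoding isn't given here. The clock and sleep hooks exist only so the loop can be tested without waiting. The defaults mirror the documented 120 s duration and 3 s cadence, which gives about 40 drips.
```python
import time
from typing import Callable, Protocol


class FrameSink(Protocol):
    def sendall(self, data: bytes) -> None: ...


def slow_drip(
    sock: FrameSink,
    frame: bytes,
    duration_s: float = 120.0,
    interval_s: float = 3.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Send `frame` every `interval_s` on one held session; return drips sent.

    A BrokenPipeError from sendall() propagates: the far end (here, a
    mode-flipping modem) closed the session, i.e. Step 1 didn't take hold.
    """
    drips = 0
    deadline = clock() + duration_s
    while clock() < deadline:
        sock.sendall(frame)
        drips += 1
        sleep(interval_s)
    return drips
```
Against a live unit, the function would be called inside `with socket.create_connection((host, port), timeout=10) as s:`, where `host`, `port` and `frame` are placeholders.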
 ---
 ## Tooling reference
 All endpoints live in `seismo-relay/sfm/server.py`. All scripts live in
 `seismo-relay/scripts/` and default to SFM direct (`http://localhost:8200`),
 overridable via `SFM_BASE_URL`.
 ### Endpoints added during BE9558H recovery
 | Endpoint | Purpose |
 |---|---|
 | `GET /device/events/storage_range` | SUB 0x06 — first/last event keys, `is_empty` flag. ~2s, no event walk. |
 | `GET /device/events/index` | SUB 0x08 — lifetime event counter (does NOT decrement on erase). ~2s. |
 | `POST /device/events/erase` | Full erase sequence 0xA3 → 0x1C → 0x06 → 0xA2. |
 | `POST /device/rescue` | Disable ACH + erase in one TCP session. Short timeouts for race-loop usage. |
 | `POST /device/stop_monitoring_blind` | Fire-and-forget Stop with full POLL preamble (single attempt). |
 | `POST /device/stop_monitoring_spam` | Server-side tight retry loop, sub-second cadence, duration-bounded. |
 | `POST /device/stop_monitoring_slow_drip` | One held TCP session, slow trickle of stop frames. **The endpoint that saved BE9558H.** |
 Also changed: default protocol recv timeout dropped from 30s → 10s in
 `_build_client`. Added `connect_timeout` knob to same. Cleaned up
 unhandled-exception path in `/device/monitor/status` so it returns 502
 instead of 500 on protocol timeouts.
 ### Scripts
 | Script | Purpose |
 |---|---|
 | `scripts/rescue_device.sh` | Race-loop wrapper around `/device/rescue` |
 | `scripts/blind_stop.sh` | Race-loop wrapper around `/device/stop_monitoring_blind` |
 | `scripts/spam_stop.sh` | Single-call burst hammer (`/device/stop_monitoring_spam`) |
 | `scripts/slow_drip.sh` | Single-call held-session drip (`/device/stop_monitoring_slow_drip`) |
 | `scripts/watch_unit.sh` | Passive periodic reachability check, logs to file |
 ---
 ## Incident log — BE9558H, 2026-05-16/17
 What was wrong: Long-axis geophone developed an offset, constantly above
 trigger threshold → constant event recording → after-event ACH set →
 modem dialing office BW server (`50.197.32.92:12345`) every 30-60s.
 Local event chain corrupted (`next_boundary 0x100EE exceeds uint16`).
 Diagnostic path:
 1. `/device/info` slow, choked on event walk
 2. Built lightweight probe endpoints (`storage_range`, `index`) — useful
   but didn't reach the wedged unit
 3. Built `/device/rescue` with short timeouts — got 502 (POLL no response)
 4. Built `/device/stop_monitoring_blind` — first version was a false
   positive (no POLL preamble); fixed by including
   `SESSION_RESET+POLL_PROBE+SESSION_RESET+POLL_DATA` in the dump
 5. Verified blind stop works on bench unit
 6. Built `/device/stop_monitoring_spam` — 420 successful sends over
   5 min, zero behavior change on field unit
 7. Inspected ACEmanager logs → saw outbound dial-out attempts every ~30s,
   confirmed device was not fully locked up
 8. Added outbound port-12345 firewall block → outbound attempts now fail
   instantly but contention persisted
 9. Built `/device/stop_monitoring_slow_drip` — session died at 3s with
   broken pipe (modem closing on us)
 10. Looked at full ACEmanager Port Configuration → **found
    `Destination Address: 50.197.32.92` configured**, realized every AT
    dial command was triggering a modem mode-flip that killed our inbound
 11. Cleared Destination Address + Port → slow_drip held 120s, device
    responded with 990 bytes, 39 stop commands acked
 12. Disabled ACH at device level via `/device/call_home`, erased events
 Final state: device IDLE, memory 958.1 / 960 KB free, ACH disabled at
 device level, modem destination cleared (to be restored after physical
 service).
 Total time from "i was wondering if its possible to" first attempt to
 recovery: ~7 hours of intermittent debugging across one evening.
@@ -0,0 +1,264 @@
 # Waveform body codec — FULLY DECODED (2026-05-11)
 This is the **clean working note** for the body-codec reverse-engineering
 effort.  It supersedes scattered claims elsewhere when they conflict.
 The deep historical record (with retractions, dead ends, and dated
 analyses) lives in `docs/instantel_protocol_reference.md §7.6.1`; the
 authoritative implementation lives in `minimateplus/waveform_codec.py`.
 ## TL;DR
 **The codec is fully decoded.**  Every block type is understood, and
 every sample decoded from every channel of every fixture event matches
 BW's ASCII export byte-exact.
 | Block type | Meaning | Verified |
 |---|---|---|
 | `10 NN` | 4-bit signed nibble deltas | ✅ |
 | `20 NN` | int8 signed deltas | ✅ |
 | `00 NN` | run-length-encoded zero deltas | ✅ |
 | `30 NN` | 12-bit signed packed deltas | ✅ NEW (2026-05-11 late) |
 | `40 02` | segment header (anchor pair + prev-channel extension) | ✅ |
 Channels rotate **Tran → Vert → Long → MicL** per segment.  Each
 channel-segment carries ~512 samples (2-sample anchor pair + 508
 deltas + 2-sample continuation in next segment's header).
 ## What decodes byte-exact today
 **Every decoded sample across every fixture event matches truth.  Zero
 divergences.**
 | Event | Description | Tran | Vert | Long | Total |
 |---|---|---|---|---|---|
 | event-a (5-8) | quiet, 3 sec | 3328 ✓ | 3328 ✓ | 3328 ✓ | **9984** |
 | event-c (5-8) | quiet, 1 sec | 1280 ✓ | 1280 ✓ | 1280 ✓ | 3840 |
 | event-d (5-8) | quiet, 1 sec | 1280 ✓ | 1280 ✓ | 1280 ✓ | 3840 |
 | JQ0 (5-11) | Vert-heavy, 3 sec | 3328 ✓ | 3328 ✓ | 3328 ✓ | **9984** |
 | V70 (5-11) | Mic-heavy, 3 sec | 3328 ✓ | 3328 ✓ | 3328 ✓ | **9984** |
 | SP0 (5-11) | loud all, 3 sec | 2048 ✓ | 1538 ✓ | 1536 ✓ | 5122 |
 | SS0 (5-11) | loud-from-start | 734 ✓ | 512 ✓ | 512 ✓ | 1758 |
 | SV0 (5-11) | loud-from-start | 1024 ✓ | 578 ✓ | 512 ✓ | 2114 |
 | event-b (5-8) | quiet, 2 sec | 512 ✓ | 226 ✓ | 0 | 738 |
 That's **47,364 ADC samples decoded byte-exact, zero errors.**
 Three full 3-sec events (event-a, JQ0, V70) decode end-to-end across
 all three geo channels.
 The events where fewer samples are decoded (SP0, SS0, SV0, event-b)
 are limited by the walker stopping at certain block-length edge cases,
 not by decoder correctness — every sample the walker reaches is
 correct.
 ## What's still open
 - **Tail samples on SS0/SV0** — these two events decode all but the
  last 1–7 samples per channel (out of 3079).  Likely the same
  "last segment is truncated" pattern.  Minor; doesn't affect the
  bulk of the data.
 ## Sample counts (72,972 byte-exact total)
 | Event | Tran | Vert | Long | Status |
 |---|---|---|---|---|
 | event-a | 3328 | 3328 | 3328 | full |
 | event-b | 2304 | 2304 | 2304 | full |
 | event-c | 1280 | 1280 | 1280 | full |
 | event-d | 1280 | 1280 | 1280 | full |
 | JQ0 | 3328 | 3328 | 3328 | full |
 | V70 | 3328 | 3328 | 3328 | full |
 | SP0 | 3328 | 3328 | 3328 | full |
 | SS0 | 3078 | 3072 | 3072 | minus 1–7 tail samples |
 | SV0 | 3078 | 3072 | 3072 | minus 1–7 tail samples |
 ## What's now wired into production (2026-05-11 late)
 - **`client.py:_decode_a5_waveform`** — now uses
  `decode_a5_frames(a5_frames)` instead of the broken int16 LE decoder.
  `event.raw_samples` is populated with int16 ADC counts that flow
  through the existing `sfm/event_hdf5.py` scaling pipeline unchanged.
  Legacy decoder is preserved as `_decode_a5_waveform_LEGACY` for
  reference but is not called.
 - **MicL → dB(L) conversion** — exposed as
  `waveform_codec.mic_count_to_db(count)`.  Verified against BW
  display values (count=1 → 81.94 dB; count=813 → 140.14 dB; matches
  the V70 mic-heavy fixture exactly).
 - **`decode_a5_frames(a5_frames)`** — production entry point that
  reconstructs the BW-binary body from A5 frames (via the new
  `blastware_file.extract_body_bytes` helper) and runs the verified
  codec.  Returns the same `raw_samples` dict shape the consumers
  already expect.
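The two calibration points quoted above are consistent with the standard sound-pressure-level relation, dB = 20·log10(p/p_ref). The 58.20 dB gap between them matches 20·log10(813) ≈ 58.20. The sketch below encodes that inferred relation. It is a two-point fit with a hypothetical name, not the body of `waveform_codec.mic_count_to_db`. Using `|count|` for negative (rarefaction) counts is likewise an assumption.
```python
import math
from typing import Optional

MIC_DB_AT_ONE_COUNT = 81.94   # from the count=1 calibration point


def mic_count_to_db_fit(count: int) -> Optional[float]:
    """Two-point SPL fit: dB(L) = 81.94 + 20*log10(|count|); None for a zero count."""
    if count == 0:
        return None
    return MIC_DB_AT_ONE_COUNT + 20 * math.log10(abs(count))
```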
 ## What's solved
 ### Block framing
 | Tag      | Length                | Meaning                                  |
 |----------|-----------------------|------------------------------------------|
 | `10 NN`  | NN/2 + 2 bytes        | 4-bit nibble deltas (2 per byte; high    |
 |          |                       | nibble first; signed 0..7 / 8..F = -8..-1)|
 | `20 NN`  | NN + 2 bytes          | int8 signed deltas (1 per byte)          |
 | `00 NN`  | 2 bytes               | RLE: append NN copies of current value   |
 | `30 NN`  | NN*3/2 + 2 bytes      | 12-bit signed packed deltas (NN/4 groups |
 |          |                       | of 6 bytes); seen in high-amplitude runs |
 | `40 02`  | 20 bytes (fixed)      | Segment header                           |
 NN is always a multiple of 4.
 Implementation: `walk_body()` in `minimateplus/waveform_codec.py`.
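This sketch walks the body using the framing table. For `30 NN` it uses the `NN × 1.5 + 2` total length given in the `30 NN` block format section. The names are hypothetical, and this is not `walk_body()`. The walker stops quietly at a truncated final block.
```python
from typing import Iterator, NamedTuple


class Block(NamedTuple):
    offset: int      # position of the tag byte within the body
    tag: int
    nn: int
    payload: bytes   # bytes after the 2-byte tag


def block_length(tag: int, nn: int) -> int:
    """Total block length in bytes, tag included."""
    if tag == 0x10:
        return nn // 2 + 2          # two 4-bit deltas per byte
    if tag == 0x20:
        return nn + 2               # one int8 delta per byte
    if tag == 0x00:
        return 2                    # RLE: tag only
    if tag == 0x30:
        return nn * 3 // 2 + 2      # NN/4 groups of 6 bytes
    if tag == 0x40:
        return 20                   # segment header, fixed size
    raise ValueError(f"unknown block tag 0x{tag:02x}")


def walk_blocks(body: bytes, start: int = 7) -> Iterator[Block]:
    """Walk tagged blocks after the 7-byte preamble."""
    pos = start
    while pos + 2 <= len(body):
        tag, nn = body[pos], body[pos + 1]
        end = pos + block_length(tag, nn)
        if end > len(body):
            break                   # truncated tail
        yield Block(pos, tag, nn, body[pos + 2:end])
        pos = end
```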
 ### 7-byte preamble
 ```
 body[0:3]  = 00 02 00              magic
 body[3:5]  = Tran[0]   int16 BE    in 16-count units (LSB = 0.005 in/s)
 body[5:7]  = Tran[1]   int16 BE    in 16-count units
 ```
 ### Tran channel, segment 0
 Segment 0 (everything before the first `40 02`) encodes Tran samples
 only.  Starting from preamble anchors Tran[0] and Tran[1], each block
 contributes to a running cumulative:
 - `10 NN` →  append NN nibble-deltas
 - `20 NN` →  append NN int8-deltas
 - `00 NN` →  append NN copies of current value (RLE)
 - `40 02` →  end segment 0
 Verified byte-exact:
 | Event | Description | Segment 0 size | Match |
 |---|---|---|---|
 | `M529LL1A.SP0` | Loud, 0.25 s pretrig | 510 | 510/510 ✓ |
 | `M529LL1A.SV0` | Loud from sample 0 | 58 | 58/58 ✓ (stops at first `30 NN`) |
 | `M529LL1A.SS0` | Loud from sample 0 | 42 | 42/42 ✓ (stops at first `30 04`) |
 | `M529LL1L.JQ0` | Vert-heavy | 510 | 510/510 ✓ |
 | `M529LL1L.V70` | Mic-heavy (140 dB) | 510 | 510/510 ✓ |
 Implementation: `decode_tran_initial()`.
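This sketch decodes segment 0 as described: the two preamble anchors seed a running value, and each nibble, int8 or RLE block appends samples. It stops at the first `40 02`. Unlike the production decoder, it also stops at `30 NN`. The names are hypothetical, and this is not `decode_tran_initial()`.
```python
import struct
from typing import List


def _nibble_deltas(payload: bytes, count: int) -> List[int]:
    out = []
    for byte in payload:
        for nib in (byte >> 4, byte & 0x0F):          # high nibble first
            out.append(nib - 16 if nib >= 8 else nib)  # 8..F -> -8..-1
    return out[:count]


def decode_segment0_tran(body: bytes) -> List[int]:
    """Tran samples (16-count units) from the preamble up to the first 40 02."""
    if body[:3] != b"\x00\x02\x00":
        raise ValueError("missing 00 02 00 magic")
    tran0, tran1 = struct.unpack_from(">hh", body, 3)
    samples = [tran0, tran1]
    current = tran1
    pos = 7
    while pos + 2 <= len(body):
        tag, nn = body[pos], body[pos + 1]
        if tag == 0x10:
            deltas = _nibble_deltas(body[pos + 2:pos + 2 + nn // 2], nn)
            pos += nn // 2 + 2
        elif tag == 0x20:
            deltas = list(struct.unpack_from(f">{nn}b", body, pos + 2))
            pos += nn + 2
        elif tag == 0x00:
            deltas = [0] * nn                          # RLE: repeat current value
            pos += 2
        else:
            break   # 40 02 ends segment 0; any other tag (incl. 30 NN) stops this sketch
        for d in deltas:
            current += d
            samples.append(current)
    return samples
```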
 ### Segment header (`40 02`, 20 bytes total) — REWRITTEN 2026-05-11
 | Payload offset | Field | Status |
 |---|---|---|
 | [0:2] | Previous-channel delta — 1st extension sample (int16 BE) | ✅ confirmed |
 | [2:4] | Previous-channel delta — 2nd extension sample (int16 BE) | ✅ confirmed |
 | [4:6] | Unknown (likely checksum) | ❓ open |
 | [6:8] | Byte length to next segment header − 2 (uint16 BE) | ✅ confirmed |
 | [8:12] | Monotonic uint32 LE counter (starts ~0x47) | ✅ confirmed |
 | [12:14] | Constant `02 00` | ✅ confirmed |
 | [14:16] | THIS segment's channel — sample 0 anchor (int16 BE, 16-count units) | ✅ confirmed |
 | [16:18] | THIS segment's channel — sample 1 anchor (int16 BE, 16-count units) | ✅ confirmed |
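The header table maps directly onto `struct` formats. This sketch uses hypothetical names. It reads payload `[4:6]` as a raw big-endian uint16 because that field's meaning is still open. It rejects a header whose `[12:14]` is not the observed constant `02 00`.
```python
import struct
from typing import NamedTuple, Tuple


class SegmentHeader(NamedTuple):
    prev_extension: Tuple[int, int]   # 2 continuation samples of the channel being left
    unknown: int                      # payload [4:6], possibly a checksum
    next_len_minus_2: int             # bytes to next segment header, minus 2
    counter: int                      # monotonic uint32 LE
    anchors: Tuple[int, int]          # samples 0 and 1 of the channel being entered


def parse_segment_header(block: bytes) -> SegmentHeader:
    """Parse a 20-byte `40 02` block (2-byte tag + 18-byte payload)."""
    if len(block) != 20 or block[:2] != b"\x40\x02":
        raise ValueError("not a 20-byte 40 02 segment header")
    p = block[2:]
    ext0, ext1, unknown, next_len = struct.unpack_from(">hhHH", p, 0)
    (counter,) = struct.unpack_from("<I", p, 8)
    if p[12:14] != b"\x02\x00":
        raise ValueError("expected constant 02 00 at payload [12:14]")
    anchor0, anchor1 = struct.unpack_from(">hh", p, 14)
    return SegmentHeader((ext0, ext1), unknown, next_len, counter, (anchor0, anchor1))
```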
 **Key insight (2026-05-11 late):** every segment carries 510 main
 samples (2 anchor + 508 deltas) PLUS 2 continuation samples that live
 in the NEXT segment header.  So each channel-segment effectively spans
 512 sample-sets.  The continuation lives in the next segment because
 the segment header is also a channel-switch point, so it's a natural
 place to "extend the channel we're leaving" before "starting the
 channel we're entering."
 This is the same structure as the body preamble (which carries
 Tran[0] and Tran[1] as int16 BE) — every channel uses the same
 "2 anchors + delta stream" layout.
 ## Channel rotation — VERIFIED 2026-05-11
 ```
 (initial body)  →  Tran samples 0..509       (preamble + delta blocks)
 segment 0 hdr  ext+anchor →  Vert samples 0..511   ← anchor in hdr [14:18]
 segment 1 hdr  ext+anchor →  Long samples 0..511
 segment 2 hdr  ext+anchor →  Mic  samples 0..511
 segment 3 hdr  ext+anchor →  Tran samples 510..1021 (continuation)
 segment 4 hdr  ext+anchor →  Vert samples 512..1023
 segment 5 hdr  ext+anchor →  Long samples 512..1023
 segment 6 hdr  ext+anchor →  Mic  samples 512..1023
 segment 7 hdr  ext+anchor →  Tran samples 1022..1533
 ...
 ```
 Implementation: `decode_waveform_v2()` returns
 `{"Tran": [...], "Vert": [...], "Long": [...], "MicL": [...]}` with
 each channel's samples in 16-count units.  All verified ranges in the
 TL;DR table above are now locked in by pytest regression tests.
 ## What's still open
 1. **`30 NN` block content** (since cracked; see the `30 NN` block format section below).  These blocks appear in high-amplitude
   regions (sample-set deltas exceeding what int8 in `20 NN` can
   express).  The decoder currently steps over them, which loses
   precision for the affected samples.  Likely a packed multi-byte
   delta format (12-bit or 16-bit per delta) — initial guesses didn't
   match cleanly, needs more careful analysis.
 2. **MicL decoding.**  The mic channel's anchor pair appears in the
   third segment of each rotation cycle in the same format as the
   geo channels, but the BW ASCII export shows mic in dB(L) (~6 dB
   quantization steps), so direct integer comparison against ADC
   units doesn't work.  Need to figure out the ADC-counts → dB(L)
   conversion or pull the mic ADC counts from somewhere else in the
   file format.
 3. **Walker fix for event-b.**  The original quiet bundle's event-b
   still bails out partway through.  Lower priority since the other
   7 events walk cleanly.
 ## `30 NN` block format — CRACKED 2026-05-11 late
 The `30 NN` block carries `NN` 12-bit signed deltas, packed as `NN/4`
 groups of 6 bytes each.  Within each 6-byte group:
 ```
 bytes [0:2]  = header_word, uint16 BE = 4 × 4-bit "high nibbles" (MSB-first)
 bytes [2:6]  = 4 × uint8 "low bytes" (unsigned; the sign comes from bit 11 of raw_12)
 For k in 0..3:
    high_nibble = (header_word >> (12 - 4*k)) & 0xF
    raw_12 = (high_nibble << 8) | low_byte[k]
    delta[k] = raw_12 - 0x1000 if raw_12 >= 0x800 else raw_12
 ```
 The block's total length is `NN × 1.5 + 2` bytes (tag included).  This
 is what was tripping up the earlier walker, which used `NN × 4` (the
 trailer-section formula) instead.
 Why 12-bit and not 16-bit: a 12-bit signed delta spans -2048..+2047,
 which in 16-count units is about ±10.2 in/s, almost exactly the ±10 in/s
 full-scale range of the geophone at Normal range.  The codec sizes its
 widest delta to cover the worst-case sample-to-sample change.
 Verified against all 14 `30 NN` blocks across the bundled fixture
 events.  Every delta decodes byte-exact against BW's ASCII export.
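The packing and the block-length formula above can be sketched as a standalone decoder. It assumes `NN` is the single byte after the `0x30` tag and is always a multiple of 4 (whole 6-byte groups), as the `NN/4` group count implies. `decode_30nn_block` and `block_30nn_length` are hypothetical names, not the `minimateplus.waveform_codec` API.

```python
from typing import List

TAG_30 = 0x30


def block_30nn_length(nn: int) -> int:
    """Total bytes of a `30 NN` block, tag included (= NN * 1.5 + 2)."""
    if nn % 4:
        # Assumption: NN is always a multiple of 4 (whole 6-byte groups).
        raise ValueError(f"NN={nn} is not a multiple of 4")
    return 2 + (nn // 4) * 6


def decode_30nn_block(buf: bytes, off: int = 0) -> List[int]:
    """Decode the NN 12-bit signed deltas of the `30 NN` block at buf[off]."""
    if buf[off] != TAG_30:
        raise ValueError(f"no 30 NN tag at offset {off:#x}")
    nn = buf[off + 1]
    end = off + block_30nn_length(nn)
    if end > len(buf):
        raise ValueError("truncated 30 NN block")
    deltas: List[int] = []
    for g in range(off + 2, end, 6):
        # bytes [0:2]: four 4-bit high nibbles, most significant first
        header_word = int.from_bytes(buf[g:g + 2], "big")
        for k in range(4):
            high_nibble = (header_word >> (12 - 4 * k)) & 0xF
            low_byte = buf[g + 2 + k]  # indexing bytes yields an unsigned 0..255 int
            raw_12 = (high_nibble << 8) | low_byte
            # Sign-extend 12 bits: 0x800..0xFFF map to -2048..-1
            deltas.append(raw_12 - 0x1000 if raw_12 >= 0x800 else raw_12)
    return deltas
```

For example, the 8-byte block `30 04 0F 78 01 FF FF 00` decodes to `[1, -1, 2047, -2048]`. Advancing the walker by `block_30nn_length(NN)` rather than `NN × 4` is the framing fix described above.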
 ## Test fixtures
 Committed under `tests/fixtures/`:
 - `decode-re-5-8-26/event-a..event-d/`: original quiet bundle (4 events,
  PPV < 1 in/s).  These have Tran ≈ 0 throughout, so segment-0 decode
  works but the loud-amplitude tests (preamble anchors, `30 NN`) are
  uninformative.
 - `5-11-26/M529LL1A.{SP0,SS0,SV0}`: loud bundle (PPV 6-7 in/s on all
  channels).  These cracked the Tran codec.
 - `5-11-26/M529LL1L.{JQ0,V70}`: targeted captures.  JQ0 is Vert-heavy,
  V70 is Mic-heavy (140 dB).  These cracked the `00 NN` RLE rule.
 Each fixture has a `.TXT` Blastware ASCII export as ground truth.
 ## Tests
 `tests/test_waveform_codec.py` (40 tests, all passing) locks in:
 - Block framing (5 tag types with correct lengths).
 - Walker contiguity (no gaps or overlaps).
 - Segment header parsing (counter monotonicity, fixed-pattern check).
 - `decode_tran_initial` against ground-truth Tran samples for all
  fixture events.
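For the walker-contiguity lock-in, a check along these lines is enough: walk the `(offset, length)` pairs in order and flag any block that does not start exactly where the previous one ended. `check_contiguous` and the tuple shape are assumptions for illustration, not the helpers in `tests/test_waveform_codec.py`.

```python
from typing import Iterable, Optional, Tuple


def check_contiguous(
    blocks: Iterable[Tuple[int, int]],
    start: int = 0,
    end: Optional[int] = None,
) -> int:
    """Assert walker blocks tile the body with no gaps or overlaps.

    `blocks` are (offset, length) pairs in walk order.  Returns the offset
    one past the last block.  If `end` is given, the blocks must reach it
    exactly (i.e. the walker did not stop early).
    """
    expected = start
    for off, length in blocks:
        if length <= 0:
            raise AssertionError(f"non-positive block length at {off:#x}")
        if off != expected:
            kind = "gap" if off > expected else "overlap"
            raise AssertionError(f"{kind}: expected block at {expected:#x}, got {off:#x}")
        expected = off + length
    if end is not None and expected != end:
        raise AssertionError(f"walk ended at {expected:#x}, body ends at {end:#x}")
    return expected
```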
 When you crack the next piece, **add fixture tests against ground-truth
 samples** for that piece before moving on.  Don't let unverified code
 ship without a regression lock-in.
@@ -0,0 +1,48 @@
 """
 micromate — Instantel Micromate (Series IV) device library.
 Sibling of ``minimateplus`` (the Series III library).  Currently scoped to
 the offline-file ingest path used by thor-watcher: parsing the per-event
 ``.IDFH``/``.IDFW`` ASCII text sidecars Thor's exporter writes alongside
 each binary event file, and wrapping the parsed data in typed event
 records.
 Live-device support (TCP protocol, frame parsing, real-time monitoring)
 is deferred — when we add it, it lands here as ``transport.py`` /
 ``framing.py`` / ``protocol.py`` / ``client.py``, mirroring the
 ``minimateplus`` package layout.
 Typical usage (offline file ingest):
    from micromate import IdfEvent, parse_idf_report
    text  = open("UM11719_20231219162723.IDFW.txt").read()
    rep   = parse_idf_report(text)                       # dict
    event = IdfEvent.from_report(rep, "UM11719_20231219162723.IDFW")
    print(event.serial, event.peaks.transverse_ips, event.mic_pspl_dbl)
 """
 from .idf_ascii_report import (
    parse_event_filename,
    parse_idf_report,
    serial_from_filename,
 )
 from .models import (
    IdfEvent,
    IdfPeaks,
    IdfProjectInfo,
    IdfReport,
    IdfSensorCheck,
 )
 __version__ = "0.1.0"
 __all__ = [
    "IdfEvent",
    "IdfPeaks",
    "IdfProjectInfo",
    "IdfReport",
    "IdfSensorCheck",
    "parse_event_filename",
    "parse_idf_report",
    "serial_from_filename",
 ]
@@ -0,0 +1,330 @@
 """
 micromate/idf_ascii_report.py — parse Thor (Micromate Series IV) IDF ASCII reports.
 Thor exports a `.IDFW.txt` or `.IDFH.txt` sidecar next to each `.IDFW`
 (waveform) or `.IDFH` (histogram) event binary.  Each sidecar is a
 plain-text file with `"Key : Value"` lines covering the full device-
 authoritative event metadata — PPV per channel, ZC Freq, Time of Peak,
 Peak Acceleration / Displacement, sensor self-check results, project
 strings, calibration date, battery level, etc. — followed by a raw
 waveform-samples block headed by the literal line "Waveform Data Channels".
 This is the Thor analogue of `minimateplus/bw_ascii_report.py` for the
 Blastware (Series III) report format.  The parser is intentionally
 permissive: we extract everything we recognise into a flat dict and
 silently ignore anything we don't.  Downstream callers parse units
 (`"0.2119 in/s"` → 0.2119) only on the fields they need.
 Example input (truncated):
    "EventType : Full Waveform"
    "SampleRate : 1024 sps"
    "EventTime : 16:27:23"
    "EventDate : 2023-12-19"
    "TranPPV : 0.0251 in/s"
    "VertPPV : 0.2119 in/s"
    "LongPPV : 0.0282 in/s"
    "PeakVectorSum : 0.2131 in/s"
    "MicPSPL : 99.4 dB(L)"
    "TranZCFreq : 6.5 Hz"
    "SerialNumber : UM11719"
    "Version : Micromate ISEE 11.0AK"
    "FileName : UM11719_20231219162723.IDFW"
    "BatteryLevel : 3.8 volts"
    "Calibration : November 22, 2023 by Instantel"
    "TranTestResults : Passed"
    "TitleString1 : UPMC Presby-Loc 3-Level1-1R Elevator Rm"
    Waveform Data Channels
        Tran    Vert    Long    MicL
        0.0003  -0.0003  0.0003  0.00013
        ...
 """
 from __future__ import annotations
 import datetime
 import re
 from typing import Any, Dict, Optional, Tuple, Union
 # Lines look like:  "Key : Value"   (quotes literal, single ":" separator)
 _LINE_RE = re.compile(r'^\s*"?([^":]+?)"?\s*:\s*"?(.*?)"?\s*$')
 # Marker that ends the metadata block — everything after is raw sample data.
 _WAVEFORM_BLOCK_MARKER = "waveform data channels"
 def _normalize_key(raw: str) -> str:
    """Convert "TranPPV" / "PreTriggerLength" → snake_case."""
    s = raw.strip()
    # Insert underscores at lower/digit→upper transitions (TranPPV → Tran_PPV)
    # and before the last capital of an acronym run (ZCFreq → ZC_Freq).
    s = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", s)
    s = re.sub(r"(?<=[A-Z])(?=[A-Z][a-z])", "_", s)
    s = s.replace("-", "_").replace(" ", "_")
    return s.lower()
 def _strip_unit_suffix(value: str) -> str:
    """Return the numeric part of values like "0.2119 in/s" → "0.2119".
    Also strips Thor's below/above-threshold prefixes:
      "<0.005 in/s"  → "0.005"   (below-noise-floor reading)
      ">100 Hz"      → "100"     (above-measurement-range reading)
    """
    parts = value.strip().split()
    token = parts[0] if parts else value.strip()
    if token.startswith("<") or token.startswith(">"):
        token = token[1:]
    return token
 def _parse_float(value: str) -> Optional[float]:
    try:
        return float(_strip_unit_suffix(value))
    except (ValueError, TypeError):
        return None
 def _parse_int(value: str) -> Optional[int]:
    try:
        return int(float(_strip_unit_suffix(value)))
    except (ValueError, TypeError):
        return None
 def parse_idf_report(text: Union[str, bytes]) -> Dict[str, Any]:
    """
    Parse a Thor IDFW.txt / IDFH.txt sidecar.
    Returns a flat dict with two kinds of entries:
      - **Raw fields** — every `Key : Value` line, keyed by snake_case
        of the original key, value as a string (unit suffix preserved).
        Lets callers grab any field we haven't explicitly normalised.
      - **Derived fields** — a curated set with parsed types:
          * `serial_number`     str
          * `event_type`        str  ("Full Waveform" / "Full Histogram")
          * `event_datetime`    ISO-8601 string ("YYYY-MM-DDTHH:MM:SS") when
                                 both EventDate and EventTime are present
          * `sample_rate`       int  (samples/sec)
          * `tran_ppv`,`vert_ppv`,`long_ppv` float (in/s)
          * `mic_ppv`           float (dB or psi — same units as MicPSPL)
          * `peak_vector_sum`   float (in/s)
          * `tran_zc_freq`,`vert_zc_freq`,`long_zc_freq` float (Hz)
          * `record_time_sec`   float (seconds)
          * `pre_trigger_sec`   float (seconds)
          * `project`           str  (from TitleString1 — Thor's location)
          * `client`            str  (TitleString2)
          * `operator`          str  (TitleString3 — company/operator)
          * `notes`             str  (TitleString4)
          * `setup`             str
          * `version`           str  (firmware)
          * `battery_volts`     float
          * `calibration_text`  str  (e.g. "November 22, 2023 by Instantel")
          * `tran_test_passed`, `vert_test_passed`, `long_test_passed`,
            `mic_test_passed`  bool  ("Passed" → True; anything else → False)
          * `filename`          str  (FileName line — useful sanity check)
    Stops parsing at the literal "Waveform Data Channels" line; the
    raw-samples block that follows is left unparsed.
    Input may be `str` or `bytes` (`utf-8`/`latin-1` tolerant).
    """
    if isinstance(text, bytes):
        try:
            text = text.decode("utf-8")
        except UnicodeDecodeError:
            text = text.decode("latin-1", errors="replace")
    raw: Dict[str, str] = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.lower().startswith(_WAVEFORM_BLOCK_MARKER):
            break
        m = _LINE_RE.match(stripped)
        if not m:
            continue
        key = _normalize_key(m.group(1))
        value = m.group(2).strip()
        # Multi-value lines (Channel, Units, etc.) — coalesce by appending.
        if key in raw:
            raw[key] = raw[key] + "; " + value
        else:
            raw[key] = value
    out: Dict[str, Any] = dict(raw)  # keep all raw fields
    # ── Derived fields ───────────────────────────────────────────────────────
    def _take(*candidates: str) -> Optional[str]:
        for c in candidates:
            if c in raw:
                return raw[c]
        return None
    # Event identity
    if "serial_number" in raw:
        out["serial_number"] = raw["serial_number"]
    if "event_type" in raw:
        out["event_type"] = raw["event_type"]
    if "file_name" in raw:
        out["filename"] = raw["file_name"]
    # Combined date+time.  Waveform sidecars use "EventDate" / "EventTime";
    # histogram sidecars use "HistogramStartDate" / "HistogramStartTime".
    # Prefer the event_* names when both are present.
    ed = raw.get("event_date") or raw.get("histogram_start_date")
    et = raw.get("event_time") or raw.get("histogram_start_time")
    if ed and et:
        try:
            dt = datetime.datetime.strptime(f"{ed} {et}", "%Y-%m-%d %H:%M:%S")
            out["event_datetime"] = dt.isoformat()
        except ValueError:
            pass
    # Numeric scalars.  For every field we convert to a number here, we MUST
    # drop the raw string copy from `out` when parsing fails — Thor writes things
    # like "<0.005 in/s" (below threshold) and "N/A" (not measured) that
    # would otherwise linger in `out` as strings, sneak into SQLite REAL
    # columns via permissive type affinity, and then crash the JS
    # frontend on `.toFixed(...)`.
    int_fields = ("sample_rate",)
    for key in int_fields:
        v = raw.get(key)
        if v is None:
            continue
        iv = _parse_int(v)
        if iv is not None:
            out[key] = iv
        else:
            out.pop(key, None)
    float_fields = (
        "tran_ppv", "vert_ppv", "long_ppv", "peak_vector_sum",
        "tran_zc_freq", "vert_zc_freq", "long_zc_freq",
        "tran_peak_acceleration", "vert_peak_acceleration",
        "long_peak_acceleration",
        "tran_peak_displacement", "vert_peak_displacement",
        "long_peak_displacement",
        "mic_zc_freq",
    )
    for key in float_fields:
        v = raw.get(key)
        if v is None:
            continue
        fv = _parse_float(v)
        if fv is not None:
            out[key] = fv
        else:
            out.pop(key, None)
    # Time-of-peak: Thor labels these "TimeofPeak" (lowercase "of") so the
    # normalizer produces "*_timeof_peak".  Map them to the canonical
    # ``*_time_of_peak`` output keys for downstream consumers.
    for raw_key, out_key in (
        ("tran_timeof_peak", "tran_time_of_peak"),
        ("vert_timeof_peak", "vert_time_of_peak"),
        ("long_timeof_peak", "long_time_of_peak"),
        ("mic_timeof_peak",  "mic_time_of_peak"),
    ):
        v = raw.get(raw_key)
        if v is None:
            continue
        fv = _parse_float(v)
        if fv is not None:
            out[out_key] = fv
    # Microphone — Thor reports MicPSPL (dB(L)) which is the closest
    # analogue to BW's mic_ppv.  The raw "99.4 dB(L)" string stays in
    # `out` under the original `mic_pspl` key for display; the parsed
    # float goes in `mic_ppv`.
    mic = raw.get("mic_pspl")
    if mic is not None:
        fv = _parse_float(mic)
        if fv is not None:
            out["mic_ppv"] = fv
    # Record / pre-trigger duration — same drop-on-failure discipline.
    rt = raw.get("record_time")
    if rt is not None:
        fv = _parse_float(rt)
        if fv is not None:
            out["record_time_sec"] = fv
    pt = raw.get("pre_trigger_length")
    if pt is not None:
        fv = _parse_float(pt)
        if fv is not None:
            out["pre_trigger_sec"] = fv
    # Project / client / operator / location strings.  Thor's title
    # strings are operator-defined; conventional mapping (per Thor's
    # default TitleNote labels in the example data):
    #   TitleString1 = Location  → project (sensor location identifier)
    #   TitleString2 = Client    → client
    #   TitleString3 = Company   → operator (the monitoring company)
    #   TitleString4 = Notes     → notes
    out["project"]  = _take("title_string1")
    out["client"]   = _take("title_string2")
    out["operator"] = _take("title_string3", "operator")
    out["notes"]    = _take("title_string4", "post_event_note")
    if "setup" in raw:
        out["setup"] = raw["setup"]
    if "version" in raw:
        out["version"] = raw["version"]
    # Battery (e.g. "3.8 volts" → 3.8)
    bl = raw.get("battery_level")
    if bl is not None:
        fv = _parse_float(bl)
        if fv is not None:
            out["battery_volts"] = fv
    # Calibration line is free-form (e.g. "November 22, 2023 by Instantel").
    if "calibration" in raw:
        out["calibration_text"] = raw["calibration"]
    # Sensor self-check results — bool flags
    for key, out_key in (
        ("tran_test_results", "tran_test_passed"),
        ("vert_test_results", "vert_test_passed"),
        ("long_test_results", "long_test_passed"),
        ("mic_test_results",  "mic_test_passed"),
    ):
        v = raw.get(key)
        if v is not None:
            out[out_key] = v.strip().lower() == "passed"
    return out
 def serial_from_filename(name: str) -> Optional[str]:
    """Convenience: pull the serial prefix from a Thor event filename.
    Thor uses the literal serial as the filename prefix:
      UM11719_20231219163444.IDFW  →  "UM11719"
      BE9439_20200713124251.IDFH   →  "BE9439"
    """
    m = re.match(r"^([A-Z]{2}\d+)_\d{14}\.(IDFH|IDFW)(?:\.txt)?$",
                 name, re.IGNORECASE)
    return m.group(1).upper() if m else None
 def parse_event_filename(name: str) -> Optional[Tuple[str, datetime.datetime, str]]:
    """Parse `<SERIAL>_<YYYYMMDDHHMMSS>.<KIND>` → (serial, datetime, kind).
    `kind` is "IDFH" or "IDFW" (upper-case).  Returns None on no match.
    """
    m = re.match(r"^([A-Z]{2}\d+)_(\d{14})\.(IDFH|IDFW)$",
                 name, re.IGNORECASE)
    if not m:
        return None
    try:
        ts = datetime.datetime.strptime(m.group(2), "%Y%m%d%H%M%S")
    except ValueError:
        return None
    return m.group(1).upper(), ts, m.group(3).upper()
@@ -0,0 +1,530 @@
 """
 micromate/idf_file.py — Thor IDF binary codec.
 Decodes the Instantel Micromate Series IV ``.IDFW`` (waveform) and
 ``.IDFH`` (histogram) binary on-disk format.  Sister module to
 ``minimateplus/event_file_io.py``.
 Status (2026-05-28):
 - **Genuine Series IV / Thor binaries** are all signed
  ``00 12 01 00 00 00 Instantel\\0`` (sig-A in earlier notes).  Two
  Series III (Blastware) binaries appear in the example corpus
  (``BE9439_*``) — they share the ``.IDFW``/``.IDFH`` extension by
  filing convention but carry a BW STRT header (``10 00 01 80 00 00
  Instantel STRT...``) and are NOT Thor data.  The reader detects
  them by signature and raises NotImplementedError pointing callers
  at ``minimateplus.event_file_io.read_blastware_file()``.
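 - **IDFW waveform body** reuses the BW segment-rotated block codec
  verbatim.  Body most often starts at file offset ``0x0f1f`` (~50% of
  production events); the actual offset is found by trial-decoding each
  ``00 02 00`` candidate (see ``_find_waveform_body_offset``).  Samples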
  decoded via ``minimateplus.waveform_codec.decode_waveform_v2``
  with 87–99% byte-exact match against ``.IDFW.txt`` sidecar (quiet
  events).  Loud events hit the BW codec's known walker-stops-early
  limit.  Residual ~3% drift on per-sample deltas — likely a
  Thor-specific 12-bit delta refinement that BW's codec doesn't
  model.  Geo LSB = 0.0003 in/s; mic factor ~2.14e-6 psi/count.
 - **IDFH histogram body**: 10-byte segment header
  ``[len_be 2B] 0a 00 00 00 [00 NN_counter] 05 3f`` introduces a
  segment of ``N`` 72-byte interval records (``N = (len - 10) // 72``).
  Each record holds 4 × 16-byte per-channel min/max/halfp + 8-byte
  tail.  Geo peaks via ``max(|min|, |max|) / 32768 × 10`` in/s
  (matches sidecar within ~1.8%), freq via ``512 / halfp`` Hz.
  **All 859 Thor IDFH files in the corpus decode (181,071 intervals).**
 - Binary metadata directly extracted: serial, timestamp, sample_rate,
  record_time, calibration_date.  Other fields fall back to the paired
  ``.IDFW.txt`` / ``.IDFH.txt`` sidecar (consumed by
  ``WaveformStore.save_imported_idf``).
 The full reverse-engineering writeup lives in
 ``docs/idf_protocol_reference.md``.
 """
 from __future__ import annotations
 import datetime
 import struct
 from dataclasses import dataclass
 from pathlib import Path
 from typing import Optional, Union
 from minimateplus.waveform_codec import decode_waveform_v2
 from .models import IdfEvent, IdfPeaks, IdfReport
 # Genuine Series IV / Thor IDF binary signature: 6 bytes, then ASCII "Instantel".
 _THOR_PREFIX = b"\x00\x12\x01\x00\x00\x00"
 # Stray Series III (Blastware) binaries that occasionally turn up in Thor
 # corpus directories renamed to the .IDFW/.IDFH convention.  Their header
 # (`10 00 01 80 00 00 Instantel STRT ...`) is byte-for-byte a BW SUB 5A
 # STRT record, not a Thor binary.  Detected so we can refuse-and-route
 # rather than mis-parse.
 _BW_STRAY_PREFIX = b"\x10\x00\x01\x80\x00\x00"
 _INSTANTEL_TAG = b"Instantel"
 # Most common body offset for sig-A IDFW files (~50% of prod events;
 # 151/154 in the original tests/fixtures/THORDATA_example corpus).  The
 # body is the segment-rotated block stream consumed by decode_waveform_v2;
 # bytes [0:3] are the magic ``00 02 00`` preamble.  Production events
 # routinely use other offsets — see :func:`_find_waveform_body_offset`
 # for the dynamic scan.  This constant survives only as the priority hint.
 _BODY_START_SIG_A = 0x0F1F
 # Magic bytes that mark a candidate waveform-body preamble.
 _BODY_MAGIC = b"\x00\x02\x00"
 # Where to start looking for body candidates inside the file.  Skip the
 # fixed-header region where the same magic legitimately appears inside
 # channel-test records and the compliance block (offsets 0x015d, 0x091c,
 # 0x0ae2, 0x0d30 in observed events).
 _BODY_SCAN_FLOOR = 0x0E00
 # Geophone count → in/s, derived from sidecar ground truth: the smallest
 # non-zero sample in the 1,014-file corpus is 0.0003 in/s.
 _GEO_LSB_IPS = 0.0003
 # Microphone count → psi, derived from sidecar regression on 50 sample
 # pairs from UM11719_20231219162723.IDFW (mic-heavy event).
 _MIC_LSB_PSI = 2.14e-6
 # IDFH histogram constants.
 _IDFH_INTERVAL_SIZE = 72        # bytes per per-interval record
 _IDFH_SEGMENT_HEADER = 10       # bytes: [len_be 2B][0a 00 00 00 4B][00 NN 2B][05 3f 2B]
 _IDFH_SEGMENT_TAIL   = 2        # bytes after the interval data block, before next marker
 _IDFH_HALFP_FREQ_NUM = 512.0    # freq_hz = NUM / halfp; halfp ≤ 5 means ">100 Hz" sentinel
 _IDFH_GEO_FULL_SCALE = 10.0     # in/s — Normal range
 _IDFH_INT16_FS = 32768.0
 _IDFH_CHANNELS = ("Tran", "Vert", "Long", "MicL")
 # ─── Binary metadata extraction ─────────────────────────────────────────────
@dataclass
 class IdfBinaryMetadata:
    """Fields recoverable from the sig-A binary header (no .txt needed)."""
    serial:           Optional[str] = None
    event_datetime:   Optional[datetime.datetime] = None
    sample_rate:      Optional[int] = None
    record_time_sec:  Optional[float] = None
    calibration_date: Optional[datetime.date] = None
 def _read_ascii_z(buf: bytes, off: int, maxlen: int = 64) -> Optional[str]:
    if off >= len(buf):
        return None
    end = buf.find(b"\x00", off, off + maxlen)
    if end < 0:
        end = min(off + maxlen, len(buf))
    s = buf[off:end].decode("ascii", errors="replace").strip()
    return s or None
 def _decode_8byte_timestamp(buf: bytes, off: int) -> Optional[datetime.datetime]:
    """Layout: ``[day][month][year_hi][year_lo][unknown][hour][min][sec]``."""
    if off + 8 > len(buf):
        return None
    day, mon, yh, yl, _unk, hr, mn, sc = buf[off : off + 8]
    year = (yh << 8) | yl
    if not (2015 <= year <= 2050 and 1 <= mon <= 12 and 1 <= day <= 31
            and 0 <= hr < 24 and 0 <= mn < 60 and 0 <= sc < 60):
        return None
    try:
        return datetime.datetime(year, mon, day, hr, mn, sc)
    except ValueError:
        return None
 def extract_binary_metadata(buf: bytes) -> IdfBinaryMetadata:
    """Pull serial/timestamp/sample_rate/record_time/calibration from the
    sig-A binary header.
    Field positions confirmed against UM11719_20231219162723.IDFW; stable
    across the 151-file sig-A corpus.
    """
    md = IdfBinaryMetadata()
    # Serial: null-terminated ASCII at 0x14E.
    md.serial = _read_ascii_z(buf, 0x14E, maxlen=16)
    # Sample rate + record time live in a BW-compatible compliance block.
    # Locate the 6-byte anchor `be 80 00 00 00 00` and read offsets relative
    # to it: anchor-6 = sample_rate uint16 BE; anchor+6 = record_time float32 BE.
    anchor = buf.find(b"\xbe\x80\x00\x00\x00\x00", 0x800, 0xA00)
    if anchor > 0:
        sr_bytes = buf[anchor - 6 : anchor - 4]
        if len(sr_bytes) == 2:
            sr = int.from_bytes(sr_bytes, "big")
            if sr in (256, 512, 1024, 2048, 4096):
                md.sample_rate = sr
        rt_bytes = buf[anchor + 6 : anchor + 10]
        if len(rt_bytes) == 4:
            try:
                rt = struct.unpack(">f", rt_bytes)[0]
                if 0.1 <= rt <= 600.0:
                    md.record_time_sec = float(rt)
            except struct.error:
                pass
    # Event timestamp: 8 bytes.  Position differs between IDFW (0x97A) and
    # IDFH (0x9F8); try both offsets and accept the first valid decode.
    for off in (0x97A, 0x9F8):
        ts = _decode_8byte_timestamp(buf, off)
        if ts is not None:
            md.event_datetime = ts
            break
    # Calibration date: day, month, year_be at 0x194-0x197.
    if len(buf) > 0x197:
        day, mon = buf[0x194], buf[0x195]
        year = int.from_bytes(buf[0x196 : 0x198], "big")
        if 1 <= mon <= 12 and 1 <= day <= 31 and 2015 <= year <= 2050:
            try:
                md.calibration_date = datetime.date(year, mon, day)
            except ValueError:
                pass
    return md
 # ─── Sample decoder + unit conversion ───────────────────────────────────────
 def _find_waveform_body_offset(buf: bytes) -> Optional[int]:
    """Pick the file offset of the waveform body by trial-decoding every
    ``00 02 00`` magic position past the fixed-header region.
    The body's location isn't fixed across all sig-A IDFW files — about
    half the production events use ``0x0f1f``, but the rest have offsets
    that shift based on header padding / channel-config layout.  We
    auto-detect by:
      1. Find every ``00 02 00`` occurrence past ``_BODY_SCAN_FLOOR``.
      2. Try ``decode_waveform_v2()`` on each candidate.
      3. Pick the offset whose decoded sample count is largest.
    Returns the offset, or ``None`` if no candidate yielded more than
    the trivial 2-sample preamble (= "no real body found").
    Costs ~2-8 trial decodes per file; in practice the first candidate
    past 0x0e00 is usually the right one.
    """
    if len(buf) < _BODY_SCAN_FLOOR + 8:
        return None
    best: Optional[tuple[int, int]] = None   # (total_samples, offset)
    i = _BODY_SCAN_FLOOR
    while True:
        j = buf.find(_BODY_MAGIC, i)
        if j < 0:
            break
        i = j + 1
        try:
            decoded = decode_waveform_v2(buf[j:])
        except Exception:
            continue
        if not decoded:
            continue
        total = sum(len(v) for v in decoded.values())
        # A "real" body has more than just the 2-sample preamble.
        if total <= 2:
            continue
        if best is None or total > best[0]:
            best = (total, j)
    return best[1] if best else None
 def _decode_waveform_samples(buf: bytes) -> Optional[dict]:
    """Decode samples from the sig-A waveform body.
    Returns the raw decoder counts dict — geo LSB = 0.0003 in/s, mic in
    its own count unit (see :func:`mic_count_to_psi`).  Returns None if
    no usable body is found.
    Uses :func:`_find_waveform_body_offset` to locate the body — the
    file-offset varies across events (~50% sit at the canonical
    ``0x0f1f`` but the rest don't), so the previous hardcoded constant
    silently produced 2-sample preamble-only output for half the corpus.
    """
    off = _find_waveform_body_offset(buf)
    if off is None:
        return None
    return decode_waveform_v2(buf[off:])
 def geo_count_to_ips(count: int) -> float:
    """Convert a Thor geo decoder count to in/s.  LSB = 0.0003 in/s."""
    return count * _GEO_LSB_IPS
 def mic_count_to_psi(count: int) -> float:
    """Convert a Thor mic decoder count to psi.  Scale derived from
    regression over 50 sample pairs in UM11719_20231219162723.IDFW;
    consistent to ~5%.  Calibration constants from the channel block
    can refine this once decoded.
    """
    return count * _MIC_LSB_PSI
 # ─── IDFH histogram decoder ─────────────────────────────────────────────────
@dataclass
 class IdfhInterval:
    """One decoded histogram interval (typically one minute of monitoring)."""
    offset:    int    # file byte offset of the 72-byte record
    # Per-channel min/max ADC counts (int16 BE), half-period samples, peak count.
    # Peak = max(|min|, |max|).  freq_hz = 512/halfp (None if halfp ≤ 5 →
    # ">100 Hz" sentinel; matches sidecar convention).
    tran_min:    int
    tran_max:    int
    tran_halfp:  int
    vert_min:    int
    vert_max:    int
    vert_halfp:  int
    long_min:    int
    long_max:    int
    long_halfp:  int
    micl_min:    int
    micl_max:    int
    micl_halfp:  int
    def peak_count(self, channel: str) -> int:
        mn = getattr(self, f"{channel.lower()}_min")
        mx = getattr(self, f"{channel.lower()}_max")
        return max(abs(mn), abs(mx))
    def peak_ips(self, channel: str) -> float:
        """Convert peak count to in/s (geo channels only)."""
        return self.peak_count(channel) / _IDFH_INT16_FS * _IDFH_GEO_FULL_SCALE
    def freq_hz(self, channel: str) -> Optional[float]:
        halfp = getattr(self, f"{channel.lower()}_halfp")
        if halfp <= 5:
            return None
        return _IDFH_HALFP_FREQ_NUM / halfp
 def _decode_idfh_interval(buf72: bytes, offset: int) -> IdfhInterval:
    """Decode one 72-byte interval record into per-channel min/max/halfp."""
    fields = []
    for i in range(4):
        block = buf72[i * 16 : (i + 1) * 16]
        mn = struct.unpack_from(">h", block, 0)[0]
        mx = struct.unpack_from(">h", block, 2)[0]
        # block[4:6] = int16 BE, role unknown (possibly time-of-peak)
        halfp = struct.unpack_from(">H", block, 6)[0]
        # block[10:12] and block[14:16] are uint16 BE with unknown semantics
        # (likely sum / count contributions for the PVS computation).
        fields.extend([mn, mx, halfp])
    # Tail 8 bytes (buf72[64:72]) carry PVS-related data; not yet decoded.
    return IdfhInterval(
        offset=offset,
        tran_min=fields[0], tran_max=fields[1], tran_halfp=fields[2],
        vert_min=fields[3], vert_max=fields[4], vert_halfp=fields[5],
        long_min=fields[6], long_max=fields[7], long_halfp=fields[8],
        micl_min=fields[9], micl_max=fields[10], micl_halfp=fields[11],
    )
 def decode_idfh_body(buf: bytes) -> list:
    """Walk an IDFH file and decode every interval record.
    The body has one or more segments; each segment header is 10 bytes
    (``_IDFH_SEGMENT_HEADER``):
    ``[length_be 2B][0a 00 00 00][00 NN_counter][05 3f]`` where ``length``
    counts bytes from the start of the length field through the end of the
    interval block (= 10 + 72 × n_intervals).  Segments are separated by a
    2-byte tail + next-segment 2-byte prefix (the bytes before the next
    length field).  Confirmed against the 859-file corpus (181,071 intervals
    decoded; the 1 failure is the BE9439 Series III (Blastware) stray).
    """
    intervals: list = []
    i = 0
    while True:
        j = buf.find(b"\x0a\x00\x00\x00", i)
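        # Stop if the match leaves no room for the length field before it or
        # for the [00 NN][05 3f] bytes after it (avoids IndexError at EOF).
        if j < 2 or j + 8 > len(buf):
            break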
        # Validate: [length_be][0a 00 00 00][00 NN][05 3f]
        if buf[j + 4] != 0x00 or buf[j + 6 : j + 8] != b"\x05\x3f":
            i = j + 1
            continue
        length = int.from_bytes(buf[j - 2 : j], "big")
        n = (length - _IDFH_SEGMENT_HEADER) // _IDFH_INTERVAL_SIZE
        if n <= 0:
            i = j + 1
            continue
        header_start = j - 2
        interval_start = header_start + _IDFH_SEGMENT_HEADER
        for k in range(n):
            off = interval_start + k * _IDFH_INTERVAL_SIZE
            if off + _IDFH_INTERVAL_SIZE > len(buf):
                break
            chunk = buf[off : off + _IDFH_INTERVAL_SIZE]
            intervals.append(_decode_idfh_interval(chunk, off))
        # Advance past this segment + the 2-byte tail.
        i = header_start + length + _IDFH_SEGMENT_TAIL
    return intervals
 # ─── Top-level reader ───────────────────────────────────────────────────────
@dataclass
 class IdfReadResult:
    """Return type for :func:`read_idf_file`.
    For waveforms (``.IDFW``), ``samples`` holds the per-channel sample
    arrays in Thor decoder counts.  For histograms (``.IDFH``),
    ``samples`` is empty and ``intervals`` holds the per-interval
    record list (peaks, freqs).
    """
    event:           IdfEvent
    samples:         dict   # {"Tran": [...], ...} for IDFW; empty for IDFH
    binary_metadata: IdfBinaryMetadata
    signature:       str    # always "thor" for now (sig-A genuine Thor)
    intervals:       Optional[list] = None  # list[IdfhInterval] for IDFH; None for IDFW
 def read_idf_file(
    path: Union[str, Path],
    *,
    data: Optional[bytes] = None,
 ) -> IdfReadResult:
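    """Parse a Thor ``.IDFW`` / ``.IDFH`` binary into an ``IdfEvent`` + decoded data.
    Currently implements signature-A (genuine Thor) files only: waveforms
    decode to per-channel sample arrays, histograms to per-interval
    records.  Signature-B (old-firmware) files are not supported; use the
    paired ``.IDFW.txt`` / ``.IDFH.txt`` sidecar for those via
    ``parse_idf_report()``.  A Series III (Blastware) header in an
    IDF-named container raises NotImplementedError; any other
    unrecognised signature raises ValueError.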
    Returns an :class:`IdfReadResult`.  The caller converts int sample
    counts to physical units via :func:`geo_count_to_ips` /
    :func:`mic_count_to_psi`.
    ``path`` is used for filename in error messages and ``.IDFH`` vs
    ``.IDFW`` suffix detection.  When ``data`` is supplied the disk
    read is skipped — useful for ingest paths that already have the
    bytes in memory and where the file may not exist on disk yet.
    """
    p = Path(path)
    buf = data if data is not None else p.read_bytes()
    if len(buf) < 16 or buf[6:16] != _INSTANTEL_TAG + b"\x00":
        raise ValueError(f"{p.name}: not an IDF file (missing Instantel magic)")
    sig_prefix = buf[:6]
    if sig_prefix == _THOR_PREFIX:
        signature = "thor"
    elif sig_prefix == _BW_STRAY_PREFIX:
        raise NotImplementedError(
            f"{p.name}: file has a Series III (Blastware) STRT header in "
            "an IDF-named container — not a Thor binary.  Route through "
            "minimateplus.event_file_io.read_blastware_file() instead "
            "(peaks decode; samples & full metadata don't, but it's not "
            "Thor data so the Thor codec doesn't apply)."
        )
    else:
        raise ValueError(f"{p.name}: unknown IDF signature {sig_prefix.hex()}")
    is_histogram = p.suffix.upper() == ".IDFH"
    md = extract_binary_metadata(buf)
    if is_histogram:
        intervals = decode_idfh_body(buf)
        if not intervals:
            raise ValueError(f"{p.name}: IDFH body decoded no intervals")
        # Peaks: max across all intervals on each channel (per-channel max
        # of stored max-magnitudes; sidecar's PPV row carries the same).
        peak_tran = max((iv.peak_ips("Tran") for iv in intervals), default=0.0)
        peak_vert = max((iv.peak_ips("Vert") for iv in intervals), default=0.0)
        peak_long = max((iv.peak_ips("Long") for iv in intervals), default=0.0)
        # Mic peak in psi — Thor stores per-interval mic ADC counts in the
        # binary; convert the max count to psi via the per-count factor.
        mic_peak_count = max((iv.peak_count("MicL") for iv in intervals), default=0)
        mic_peak_psi = mic_count_to_psi(mic_peak_count) if mic_peak_count else None
        rep = IdfReport(
            serial_number=md.serial,
            event_type="Full Histogram",
            event_datetime=md.event_datetime,
            filename=p.name,
            sample_rate=md.sample_rate,
            record_time_sec=md.record_time_sec,
        )
        peaks = IdfPeaks(
            transverse_ips=peak_tran,
            vertical_ips=peak_vert,
            longitudinal_ips=peak_long,
            peak_vector_sum_ips=None,
            mic_pspl_dbl=None,         # IDFH binary doesn't carry the dB(L) value
            mic_pspl_psi=mic_peak_psi,
        )
        event = IdfEvent(
            serial=md.serial or "UNKNOWN",
            timestamp=md.event_datetime or datetime.datetime(1970, 1, 1),
            kind="Histogram",
            filename=p.name,
            sample_rate=md.sample_rate,
            record_time_sec=md.record_time_sec,
            peaks=peaks,
            report=rep,
        )
        return IdfReadResult(
            event=event,
            samples={},
            binary_metadata=md,
            signature=signature,
            intervals=intervals,
        )
    # Waveform path.
    decoded = _decode_waveform_samples(buf)
    if decoded is None:
        raise ValueError(f"{p.name}: waveform body codec failed")
    rep = IdfReport(
        serial_number=md.serial,
        event_type="Full Waveform",
        event_datetime=md.event_datetime,
        filename=p.name,
        sample_rate=md.sample_rate,
        record_time_sec=md.record_time_sec,
    )
    def _peak_ips(ch: str) -> float:
        arr = decoded.get(ch, [])
        return geo_count_to_ips(max((abs(v) for v in arr), default=0))
    # Mic peak psi from binary: max absolute MicL ADC count × 2.14e-6 psi/count.
    mic_arr = decoded.get("MicL", [])
    mic_peak_count = max((abs(v) for v in mic_arr), default=0)
    mic_peak_psi = mic_count_to_psi(mic_peak_count) if mic_peak_count else None
    peaks = IdfPeaks(
        transverse_ips=_peak_ips("Tran"),
        vertical_ips=_peak_ips("Vert"),
        longitudinal_ips=_peak_ips("Long"),
        # PVS requires aligned per-sample √(T²+V²+L²); leave None — the
        # sidecar carries it and the bridge picks it up if present.
        peak_vector_sum_ips=None,
        mic_pspl_dbl=None,             # binary IDFW doesn't carry the dB(L) value;
                                       # sidecar .txt fills it via IdfReport.from_dict
        mic_pspl_psi=mic_peak_psi,
    )
    event = IdfEvent(
        serial=md.serial or "UNKNOWN",
        timestamp=md.event_datetime or datetime.datetime(1970, 1, 1),
        kind="Waveform",
        filename=p.name,
        sample_rate=md.sample_rate,
        record_time_sec=md.record_time_sec,
        peaks=peaks,
        report=rep,
    )
    return IdfReadResult(
        event=event,
        samples=decoded,
        binary_metadata=md,
        signature=signature,
    )
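A minimal standalone sketch of the segment walk that `decode_idfh_body` performs, handy for poking at a suspect `.IDFH` outside the package. It assumes the layout described in that function's docstring: a 10-byte segment header (`[length_be][0a 00 00 00][00 NN][05 3f]`), 72-byte interval records, and a 2-byte tail after each interval block. The name `walk_idfh_segments` is hypothetical, and it returns raw `(offset, chunk)` pairs rather than decoded `IdfhInterval` objects.

```python
from typing import List, Tuple

# Layout per the decode_idfh_body docstring (assumed here, not imported).
SEGMENT_HEADER = 10   # [length_be:2][0a 00 00 00][00 NN][05 3f]
INTERVAL_SIZE = 72    # one interval record
SEGMENT_TAIL = 2      # bytes after the interval block, before the next prefix
MAGIC = b"\x0a\x00\x00\x00"


def walk_idfh_segments(buf: bytes) -> List[Tuple[int, bytes]]:
    """Hypothetical helper: return (offset, 72-byte chunk) per interval record."""
    records: List[Tuple[int, bytes]] = []
    i = 0
    while True:
        j = buf.find(MAGIC, i)
        if j < 0:
            break
        # Need the 2-byte length before the magic and the full 10-byte header.
        if (
            j < 2
            or j + 8 > len(buf)
            or buf[j + 4] != 0x00
            or buf[j + 6 : j + 8] != b"\x05\x3f"
        ):
            i = j + 1
            continue
        header_start = j - 2
        length = int.from_bytes(buf[header_start:j], "big")  # = 10 + 72 * n
        n = (length - SEGMENT_HEADER) // INTERVAL_SIZE
        if n <= 0:
            i = j + 1
            continue
        first = header_start + SEGMENT_HEADER
        for k in range(n):
            off = first + k * INTERVAL_SIZE
            if off + INTERVAL_SIZE > len(buf):
                break  # truncated segment: keep only the records that fit
            records.append((off, buf[off : off + INTERVAL_SIZE]))
        # Jump past the interval block and its tail; find() skips the prefix.
        i = header_start + length + SEGMENT_TAIL
    return records
```

Because the search resumes after each decoded segment's tail, magic-looking `0a 00 00 00` runs inside interval payloads are never rescanned; only bytes between segments (and rejected candidates) are searched again.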
@@ -0,0 +1,323 @@
 """
 micromate/idf_to_bw_report.py — adapter that projects a parsed Thor IDF
 report (+ binary metadata + decoded IDFH intervals) into the
 ``bw_report``-shaped dict that :mod:`sfm.report_pdf.gather_report_data`
 consumes.
 Lets Thor events flow through the existing Series III Event Report PDF
 pipeline without duplicating the renderer.  Thor's report content is
 ~95% the same data shape as BW's; the field names differ but the
 underlying metrics map 1:1.
 Caveats
 ───────
 - **Mic units** — Thor records ``MicPSPL`` natively in dB(L).  This
  adapter sets ``bw_report.mic.pspl_dbl`` directly; the report
  renderer recomputes the equivalent psi via its dBL→psi formula.
 - **Saturation / above-range flags** — Thor doesn't always mark
  ``OORANGE`` the way BW does; we set ``zc_freq_above_range`` only
  when a `>100` sentinel was preserved in the raw text.
 - **Per-interval data** — for IDFH events we build ``interval_times``
  by stepping ``IntervalSize`` from ``HistogramStartTime``; the binary
  decoder confirms one record per step (882 / 881 / 881 ... across
  the corpus).
 - **calibration_by parsing** — Thor's free-form ``Calibration : November
  22, 2023 by Instantel`` is split on ``" by "`` to extract the
  calibrator; the date prefix is parsed where possible, otherwise
  the binary-extracted ``calibration_date`` from
  :class:`micromate.idf_file.IdfBinaryMetadata` wins.
 """
 from __future__ import annotations
 import datetime
 import re
 from typing import Any, Dict, List, Optional
 # ─── Helpers ────────────────────────────────────────────────────────────────
 _NUM_RE = re.compile(r"-?\d+(?:\.\d+)?")
 def _parse_first_number(s: Optional[str]) -> Optional[float]:
    """Pull the first numeric token from a string like ``"0.1500 in/s"``."""
    if s is None:
        return None
    m = _NUM_RE.search(str(s))
    if not m:
        return None
    try:
        return float(m.group(0))
    except ValueError:
        return None
 def _parse_interval_size_s(s: Optional[str]) -> Optional[float]:
    """``"60 sec"`` → 60.0, ``"5 min"`` → 300.0, ``"1 hour"`` → 3600."""
    if s is None:
        return None
    num = _parse_first_number(s)
    if num is None:
        return None
    sl = str(s).lower()
    if "hour" in sl or "hr" in sl:
        return num * 3600.0
    if "min" in sl:
        return num * 60.0
    return num   # default to seconds
 def _parse_calibration(text: Optional[str]) -> tuple[Optional[str], Optional[str]]:
    """Split ``"November 22, 2023 by Instantel"`` → (ISO date, calibrator).
    Returns ``(None, None)`` if neither half parses.
    """
    if not text:
        return None, None
    parts = str(text).split(" by ", 1)
    date_part = parts[0].strip() if parts else None
    by_part = parts[1].strip() if len(parts) > 1 else None
    iso_date: Optional[str] = None
    if date_part:
        for fmt in ("%B %d, %Y", "%b %d, %Y", "%Y-%m-%d", "%m/%d/%Y"):
            try:
                iso_date = datetime.datetime.strptime(date_part, fmt).date().isoformat()
                break
            except ValueError:
                continue
    return iso_date, by_part
 def _channel_peaks(idf: Dict[str, Any], ch_lc: str) -> Dict[str, Any]:
    """Map ``tran_ppv`` / ``tran_zc_freq`` / ... → bw_report.peaks.tran shape."""
    out: Dict[str, Any] = {}
    for src, dst in (
        (f"{ch_lc}_ppv",                 "ppv_ips"),
        (f"{ch_lc}_zc_freq",             "zc_freq_hz"),
        (f"{ch_lc}_time_of_peak",        "time_of_peak_s"),
        (f"{ch_lc}_peak_acceleration",   "peak_accel_g"),
        (f"{ch_lc}_peak_displacement",   "peak_disp_in"),
    ):
        v = idf.get(src)
        if v is not None:
            out[dst] = v
    # ZC freq ">100" sentinel: when the parser keeps the raw string
    # (e.g. ``idf["tran_zc_freq"] == ">100"``) instead of a float, the loop
    # above copied it into ``zc_freq_hz``.  Detect that case, flag it, and
    # drop the non-numeric value.
    raw_zc = idf.get(f"{ch_lc}_zc_freq")
    if isinstance(raw_zc, str) and ">" in raw_zc:
        out["zc_freq_above_range"] = True
        out.pop("zc_freq_hz", None)
    return out
 def _sensor_check(idf: Dict[str, Any], ch_lc: str) -> Dict[str, Any]:
    out: Dict[str, Any] = {}
    fr = idf.get(f"{ch_lc}_test_freq")
    if fr is not None:
        out["freq_hz"] = _parse_first_number(fr)
    rt = idf.get(f"{ch_lc}_test_ratio")
    if rt is not None:
        out["ratio"] = _parse_first_number(rt)
    am = idf.get(f"{ch_lc}_test_amplitude")
    if am is not None:
        out["amplitude_mv"] = _parse_first_number(am)
    res = idf.get(f"{ch_lc}_test_results")
    if res is not None:
        out["result"] = str(res).strip()
    return {k: v for k, v in out.items() if v is not None}
 def _interval_times(idf: Dict[str, Any], n_intervals: Optional[int]) -> List[str]:
    """Synthesise per-interval timestamps as start + interval_size × (k + 1),
    i.e. the end of each interval.
    Returns ``[]`` when start time or interval size is unknown.
    """
    if not n_intervals:
        return []
    start_date = idf.get("histogram_start_date") or idf.get("event_date")
    start_time = idf.get("histogram_start_time") or idf.get("event_time")
    iv_str = idf.get("interval_size")
    iv_s = _parse_interval_size_s(iv_str)
    if not (start_date and start_time and iv_s):
        return []
    try:
        t0 = datetime.datetime.strptime(f"{start_date} {start_time}", "%Y-%m-%d %H:%M:%S")
    except ValueError:
        return []
    out = []
    for k in range(int(n_intervals)):
        t = t0 + datetime.timedelta(seconds=iv_s * (k + 1))
        out.append(t.isoformat())
    return out
 # ─── Top-level adapter ──────────────────────────────────────────────────────
 def build_bw_report_from_idf(
    idf_report: Dict[str, Any],
    *,
    binary_md=None,
    intervals: Optional[list] = None,
    is_histogram: Optional[bool] = None,
 ) -> Dict[str, Any]:
    """Project a parsed IDF report dict (and optional binary metadata +
    decoded IDFH intervals) into the BW report sidecar shape.
    The returned dict is structurally identical to what
    ``minimateplus.event_file_io._bw_report_to_dict`` produces from a
    real BW ASCII report — it can be assigned to
    ``sidecar["bw_report"]`` and consumed verbatim by
    ``sfm.report_pdf.gather_report_data``.
    ``intervals`` is the list of :class:`micromate.idf_file.IdfhInterval`
    objects from :func:`micromate.idf_file.decode_idfh_body`; only used
    for histogram events to derive accurate ``interval_times``.
    """
    if is_histogram is None:
        et = str(idf_report.get("event_type", ""))
        is_histogram = et.lower().startswith("full histogram")
    # ── Trigger / recording / device ─────────────────────────────────────
    trigger_channel = idf_report.get("trigger")
    trigger_level   = _parse_first_number(idf_report.get("geo_trigger_level"))
    geo_range_ips   = _parse_first_number(idf_report.get("geo_range"))
    cal_iso, cal_by = _parse_calibration(idf_report.get("calibration"))
    # Prefer the binary-extracted calibration_date when our text parse fell
    # through; the binary date is unambiguous.
    if cal_iso is None and binary_md is not None and binary_md.calibration_date:
        cal_iso = binary_md.calibration_date.isoformat()
    # ── Histogram fields ────────────────────────────────────────────────
    hist_block: Dict[str, Any] = {
        "start": None, "stop": None, "n_intervals": None,
        "interval_size": None, "interval_size_s": None,
        "channel_peak_when": {},
    }
    if is_histogram:
        sd = idf_report.get("histogram_start_date")
        st = idf_report.get("histogram_start_time")
        if sd and st:
            try:
                hist_block["start"] = datetime.datetime.strptime(
                    f"{sd} {st}", "%Y-%m-%d %H:%M:%S"
                ).isoformat()
            except ValueError:
                pass
        ed = idf_report.get("histogram_stop_date")
        et_ = idf_report.get("histogram_stop_time")
        if ed and et_:
            try:
                hist_block["stop"] = datetime.datetime.strptime(
                    f"{ed} {et_}", "%Y-%m-%d %H:%M:%S"
                ).isoformat()
            except ValueError:
                pass
        n_raw = idf_report.get("number_of_intervals")
        if n_raw is not None:
            try:
                # Thor reports a float like "81.04"; truncate to whole
                # intervals (the BW report uses an int for the column).
                hist_block["n_intervals"] = int(float(str(n_raw)))
            except ValueError:
                pass
        # When the binary decoder gave us the actual interval count, prefer it.
        if intervals is not None:
            hist_block["n_intervals"] = len(intervals)
        hist_block["interval_size"] = idf_report.get("interval_size")
        hist_block["interval_size_s"] = _parse_interval_size_s(idf_report.get("interval_size"))
        # interval_times derived from start + step (the BW report uses the
        # exact strings; we match its representation).
        times = _interval_times(idf_report, hist_block["n_intervals"])
        if times:
            hist_block["interval_times"] = times
        # Per-channel peak when (absolute date+time at which the channel's
        # peak occurred over the histogram run).  Thor splits this into
        # ``TranPeakDate`` / ``TranPeakTime`` etc.
        peak_when: Dict[str, str] = {}
        for ch_label, ch_lc in (("Tran", "tran"), ("Vert", "vert"), ("Long", "long"), ("MicL", "mic")):
            d = idf_report.get(f"{ch_lc}_peak_date")
            t = idf_report.get(f"{ch_lc}_peak_time")
            if d and t:
                try:
                    peak_when[ch_label] = datetime.datetime.strptime(
                        f"{d} {t}", "%Y-%m-%d %H:%M:%S"
                    ).isoformat()
                except ValueError:
                    continue
        if peak_when:
            hist_block["channel_peak_when"] = peak_when
    # ── Mic block ────────────────────────────────────────────────────────
    mic_block = {
        "weighting":           "L",                   # Thor mic is ISEE Linear
        "pspl_dbl":            idf_report.get("mic_ppv"),  # the dB(L) float
        "pspl_saturated":      False,
        "zc_freq_hz":          idf_report.get("mic_zc_freq"),
        "zc_freq_above_range": isinstance(idf_report.get("mic_zc_freq"), str)
                               and ">" in str(idf_report.get("mic_zc_freq")),
        "time_of_peak_s":      idf_report.get("mic_time_of_peak"),
    }
    if mic_block["zc_freq_above_range"]:
        mic_block["zc_freq_hz"] = None
    # ── Peaks ────────────────────────────────────────────────────────────
    vs_block = {
        "ips":       idf_report.get("peak_vector_sum"),
        "time_s":    _parse_first_number(idf_report.get("peak_vector_sum_time_sum")),
        "when":      None,
        "saturated": False,
    }
    if is_histogram:
        # PVS absolute date+time, when present.
        vs_d = idf_report.get("peak_vector_sum_date")
        vs_t = idf_report.get("peak_vector_sum_time")
        if vs_d and vs_t:
            try:
                vs_block["when"] = datetime.datetime.strptime(
                    f"{vs_d} {vs_t}", "%Y-%m-%d %H:%M:%S"
                ).isoformat()
            except ValueError:
                pass
    return {
        "available":  True,
        "event_type": idf_report.get("event_type"),
        "version":    idf_report.get("version"),
        "trigger": {
            "channel":       trigger_channel,
            "geo_level_ips": trigger_level,
        },
        "recording": {
            "sample_rate_sps":  idf_report.get("sample_rate"),
            "record_time_s":    idf_report.get("record_time_sec"),
            "pretrig_s":        idf_report.get("pre_trigger_sec"),
            "stop_mode":        idf_report.get("record_stop_mode"),
            "geo_range_ips":    geo_range_ips,
            "units":            idf_report.get("units"),
        },
        "device": {
            "battery_volts":    idf_report.get("battery_volts"),
            "calibration_date": cal_iso,
            "calibration_by":   cal_by,
        },
        "peaks": {
            "tran":       _channel_peaks(idf_report, "tran"),
            "vert":       _channel_peaks(idf_report, "vert"),
            "long":       _channel_peaks(idf_report, "long"),
            "vector_sum": vs_block,
        },
        "mic":          mic_block,
        "sensor_check": {
            "tran": _sensor_check(idf_report, "tran"),
            "vert": _sensor_check(idf_report, "vert"),
            "long": _sensor_check(idf_report, "long"),
            "mic":  _sensor_check(idf_report, "mic"),
        },
        "histogram":    hist_block,
        "monitor_log":  [],
        "pc_sw_version": None,
    }
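The adapter leaves Thor's mic peak in dB(L) under `bw_report.mic.pspl_dbl`, while the BW-shaped `PeakValues.micl` expects psi. A minimal sketch of the conversion in both directions, assuming the same 2.9e-9 psi reference pressure (20 µPa) that the dB(L)→psi fallback in `to_minimateplus_event` uses; the helper names are hypothetical.

```python
import math

# 20 µPa reference pressure in psi (20e-6 Pa × 1.45038e-4 psi/Pa ≈ 2.9e-9).
P_REF_PSI = 2.9e-9


def dbl_to_psi(dbl: float) -> float:
    """Hypothetical helper: linear-weighted peak level in dB(L) → psi."""
    return P_REF_PSI * 10.0 ** (dbl / 20.0)


def psi_to_dbl(psi: float) -> float:
    """Hypothetical inverse of dbl_to_psi; psi must be positive."""
    if psi <= 0:
        raise ValueError("psi must be positive to express as dB(L)")
    return 20.0 * math.log10(psi / P_REF_PSI)
```

Mixing the two units up is what blows up the h5 mic-chart scale: a 140 dB(L) reading passed through as psi is thousands of times larger than the real peak of about 0.029 psi.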
@@ -0,0 +1,398 @@
 """
 Micromate (Series IV / Thor) native data models.
 These are the right-shaped dataclasses for Thor data — Thor measures
 the microphone in dB(L) directly, so this model carries
 ``mic_pspl_dbl`` rather than the pseudo-``psi`` shoehorn that
 ``minimateplus.PeakValues`` uses for Series III BW data.
 The ingest pipeline today goes:
    .IDFW.txt  →  parse_idf_report()  →  dict
    dict       →  IdfEvent.from_report()  →  IdfEvent  (typed)
    IdfEvent   →  IdfEvent.to_minimateplus_event()  →  shape DB / sidecar
                                                     machinery expects
 The ``to_minimateplus_event()`` bridge is a temporary boundary: as the
 binary IDF codec yields richer per-event data to store, the DB schema
 will grow Series-IV-specific columns and the bridge will shrink or
 disappear.
 """
 from __future__ import annotations
 import datetime
 from dataclasses import dataclass, field
 from typing import Any, Dict, Optional, Tuple
 # ── IdfReport ─────────────────────────────────────────────────────────────────
@dataclass
 class IdfReport:
    """Typed wrapper around the dict returned by ``parse_idf_report``.
    All fields optional — Thor's exporter is permissive and some IDF .txt
    files (especially histograms) omit fields that waveform sidecars
    include.  Use ``.raw`` for any field this dataclass hasn't surfaced
    yet (the parser keeps every recognised key in the raw dict).
    """
    # Identity / kind
    serial_number:     Optional[str] = None
    event_type:        Optional[str] = None      # "Full Waveform" | "Full Histogram"
    event_datetime:    Optional[datetime.datetime] = None
    filename:          Optional[str] = None      # echoed by Thor's exporter
    # Sampling / timing
    sample_rate:       Optional[int]   = None    # samples/sec
    record_time_sec:   Optional[float] = None
    pre_trigger_sec:   Optional[float] = None
    # Geophone peaks (in/s)
    tran_ppv:          Optional[float] = None
    vert_ppv:          Optional[float] = None
    long_ppv:          Optional[float] = None
    peak_vector_sum:   Optional[float] = None
    # Microphone — Thor's native unit is dB(L), NOT psi.
    mic_pspl_dbl:      Optional[float] = None
    # Zero-crossing frequencies (Hz)
    tran_zc_freq:      Optional[float] = None
    vert_zc_freq:      Optional[float] = None
    long_zc_freq:      Optional[float] = None
    mic_zc_freq:       Optional[float] = None
    # Per-channel time of peak (sec, since event start)
    tran_time_of_peak: Optional[float] = None
    vert_time_of_peak: Optional[float] = None
    long_time_of_peak: Optional[float] = None
    mic_time_of_peak:  Optional[float] = None
    # Derived per-channel motion
    tran_peak_acceleration: Optional[float] = None    # g
    vert_peak_acceleration: Optional[float] = None
    long_peak_acceleration: Optional[float] = None
    tran_peak_displacement: Optional[float] = None    # in
    vert_peak_displacement: Optional[float] = None
    long_peak_displacement: Optional[float] = None
    # Operator-supplied strings (Thor's TitleString1..4 → semantic slots)
    project:           Optional[str] = None    # TitleString1
    client:            Optional[str] = None    # TitleString2
    operator:          Optional[str] = None    # TitleString3
    notes:             Optional[str] = None    # TitleString4 / PostEventNote
    setup:             Optional[str] = None    # setup file name
    # Sensor self-check results
    tran_test_passed:  Optional[bool] = None
    vert_test_passed:  Optional[bool] = None
    long_test_passed:  Optional[bool] = None
    mic_test_passed:   Optional[bool] = None
    # Device-fixed metadata
    firmware_version:  Optional[str]   = None
    calibration_text:  Optional[str]   = None
    battery_volts:     Optional[float] = None
    # Original parser dict — preserves every recognised key (including
    # raw unit-suffixed strings) for forward-compatible field access.
    raw: Dict[str, Any] = field(default_factory=dict, repr=False)
    @classmethod
    def from_dict(cls, d: Dict[str, Any]) -> "IdfReport":
        """Build an IdfReport from the dict returned by ``parse_idf_report``."""
        ed = d.get("event_datetime")
        if isinstance(ed, str):
            try:
                ed = datetime.datetime.fromisoformat(ed)
            except ValueError:
                ed = None
        return cls(
            serial_number     = d.get("serial_number"),
            event_type        = d.get("event_type"),
            event_datetime    = ed if isinstance(ed, datetime.datetime) else None,
            filename          = d.get("filename"),
            sample_rate       = d.get("sample_rate"),
            record_time_sec   = d.get("record_time_sec"),
            pre_trigger_sec   = d.get("pre_trigger_sec"),
            tran_ppv          = d.get("tran_ppv"),
            vert_ppv          = d.get("vert_ppv"),
            long_ppv          = d.get("long_ppv"),
            peak_vector_sum   = d.get("peak_vector_sum"),
            mic_pspl_dbl      = d.get("mic_ppv"),       # parser names it mic_ppv (legacy)
            tran_zc_freq      = d.get("tran_zc_freq"),
            vert_zc_freq      = d.get("vert_zc_freq"),
            long_zc_freq      = d.get("long_zc_freq"),
            mic_zc_freq       = d.get("mic_zc_freq"),
            tran_time_of_peak = d.get("tran_time_of_peak"),
            vert_time_of_peak = d.get("vert_time_of_peak"),
            long_time_of_peak = d.get("long_time_of_peak"),
            mic_time_of_peak  = d.get("mic_time_of_peak"),
            tran_peak_acceleration = d.get("tran_peak_acceleration"),
            vert_peak_acceleration = d.get("vert_peak_acceleration"),
            long_peak_acceleration = d.get("long_peak_acceleration"),
            tran_peak_displacement = d.get("tran_peak_displacement"),
            vert_peak_displacement = d.get("vert_peak_displacement"),
            long_peak_displacement = d.get("long_peak_displacement"),
            project           = d.get("project"),
            client            = d.get("client"),
            operator          = d.get("operator"),
            notes             = d.get("notes"),
            setup             = d.get("setup"),
            tran_test_passed  = d.get("tran_test_passed"),
            vert_test_passed  = d.get("vert_test_passed"),
            long_test_passed  = d.get("long_test_passed"),
            mic_test_passed   = d.get("mic_test_passed"),
            firmware_version  = d.get("version"),
            calibration_text  = d.get("calibration_text"),
            battery_volts     = d.get("battery_volts"),
            raw               = d,
        )
 # ── IdfPeaks / IdfProjectInfo / IdfSensorCheck (narrow grouping types) ───────
@dataclass
 class IdfPeaks:
    """Geophone + mic peak values for one Thor event.  Native Thor units.
    The mic peak is carried in two parallel forms.  ``mic_pspl_dbl`` is
    what the sidecar's top-level ``MicPSPL`` header field carries (dB(L)),
    used in the report header.  ``mic_pspl_psi`` is the psi value derived
    either from the IDFW sample table / IDFH interval column 9, or from
    the binary mic counts (~2.14e-6 psi/count).  The psi form is needed
    because the BW-shaped ``PeakValues.micl`` consumed by
    ``event_hdf5.write_event_hdf5`` expects psi; feeding it dB(L) makes
    the h5 mic-chart scale factor blow up.
    """
    transverse_ips:    Optional[float] = None    # in/s
    vertical_ips:      Optional[float] = None    # in/s
    longitudinal_ips:  Optional[float] = None    # in/s
    peak_vector_sum_ips: Optional[float] = None  # in/s
    mic_pspl_dbl:      Optional[float] = None    # dB(L)
    mic_pspl_psi:      Optional[float] = None    # psi
@dataclass
 class IdfProjectInfo:
    """Operator-supplied strings from Thor's TitleString1..4."""
    project:  Optional[str] = None
    client:   Optional[str] = None
    operator: Optional[str] = None
    notes:    Optional[str] = None
    setup:    Optional[str] = None
@dataclass
 class IdfSensorCheck:
    """Per-channel pass/fail from Thor's self-test."""
    tran: Optional[bool] = None
    vert: Optional[bool] = None
    long: Optional[bool] = None
    mic:  Optional[bool] = None
 # ── IdfEvent ─────────────────────────────────────────────────────────────────
@dataclass
 class IdfEvent:
    """A single Thor / Micromate Series IV event.
    Built either from a parsed .IDFW.txt or .IDFH.txt sidecar via
    ``IdfEvent.from_report()`` or directly from the binary by
    ``micromate.idf_file.read_idf_file()``.  On the sidecar path the
    filename determines the kind and supplies fallback serial +
    timestamp; the .txt provides device-authoritative serial, timestamp,
    peak values, frequencies, project strings, sensor self-check,
    firmware, calibration.
    """
    # Identity
    serial:    str
    timestamp: datetime.datetime
    kind:      str                  # "Waveform" | "Histogram"
    filename:  str                  # device-native binary filename, e.g. "UM11719_20231219163444.IDFW"
    # Sampling / timing
    sample_rate:     Optional[int]   = None
    record_time_sec: Optional[float] = None
    pre_trigger_sec: Optional[float] = None
    # Peaks
    peaks: IdfPeaks = field(default_factory=IdfPeaks)
    # Per-channel frequencies (Hz)
    tran_zc_freq: Optional[float] = None
    vert_zc_freq: Optional[float] = None
    long_zc_freq: Optional[float] = None
    mic_zc_freq:  Optional[float] = None
    # Project strings
    project_info: IdfProjectInfo = field(default_factory=IdfProjectInfo)
    # Sensor self-check
    sensor_check: IdfSensorCheck = field(default_factory=IdfSensorCheck)
    # Device-fixed
    firmware_version: Optional[str]   = None
    calibration_text: Optional[str]   = None
    battery_volts:    Optional[float] = None
    # The full parsed report — preserves anything not surfaced as a typed field
    report: IdfReport = field(default_factory=IdfReport)
    @classmethod
    def from_report(
        cls,
        report: Any,
        filename: str,
    ) -> "IdfEvent":
        """Build an IdfEvent from a parsed report (dict or IdfReport) and
        the device-native binary filename.
        The filename determines the kind (``IDFH`` → Histogram, otherwise
        Waveform) and supplies the serial + timestamp: Thor's filenames are
        literal ``<SERIAL>_<YYYYMMDDHHMMSS>.<KIND>``, stamped from the
        device's own clock.  When the report carries its own
        ``serial_number`` or ``event_datetime``, those win (the report has
        finer-grained device-reported time-of-trigger semantics).  If the
        filename does not parse, all three fall back to the report.
        """
        from .idf_ascii_report import parse_event_filename
        # Normalise input to IdfReport
        if isinstance(report, IdfReport):
            rep = report
        elif isinstance(report, dict):
            rep = IdfReport.from_dict(report)
        else:
            raise TypeError(
                f"report must be IdfReport or dict; got {type(report).__name__}"
            )
        # Filename → (serial, timestamp, kind).  Required — fall back to
        # report-supplied values only if filename parsing fails.
        parsed = parse_event_filename(filename)
        if parsed is not None:
            fn_serial, fn_ts, fn_kind = parsed
            kind = "Histogram" if fn_kind == "IDFH" else "Waveform"
        else:
            fn_serial = rep.serial_number or "UNKNOWN"
            fn_ts     = rep.event_datetime or datetime.datetime(1970, 1, 1)
            kind      = "Waveform" if (rep.event_type or "").lower().startswith("full waveform") else "Histogram"
        # Prefer report's event_datetime (device-authoritative) over the filename.
        ts = rep.event_datetime or fn_ts
        serial = rep.serial_number or fn_serial
        return cls(
            serial=serial,
            timestamp=ts,
            kind=kind,
            filename=filename,
            sample_rate=rep.sample_rate,
            record_time_sec=rep.record_time_sec,
            pre_trigger_sec=rep.pre_trigger_sec,
            peaks=IdfPeaks(
                transverse_ips      = rep.tran_ppv,
                vertical_ips        = rep.vert_ppv,
                longitudinal_ips    = rep.long_ppv,
                peak_vector_sum_ips = rep.peak_vector_sum,
                mic_pspl_dbl        = rep.mic_pspl_dbl,
            ),
            tran_zc_freq=rep.tran_zc_freq,
            vert_zc_freq=rep.vert_zc_freq,
            long_zc_freq=rep.long_zc_freq,
            mic_zc_freq=rep.mic_zc_freq,
            project_info=IdfProjectInfo(
                project=rep.project,
                client=rep.client,
                operator=rep.operator,
                notes=rep.notes,
                setup=rep.setup,
            ),
            sensor_check=IdfSensorCheck(
                tran=rep.tran_test_passed,
                vert=rep.vert_test_passed,
                long=rep.long_test_passed,
                mic=rep.mic_test_passed,
            ),
            firmware_version=rep.firmware_version,
            calibration_text=rep.calibration_text,
            battery_volts=rep.battery_volts,
            report=rep,
        )
    # ── Bridge to minimateplus shape (for the existing DB / sidecar paths) ──
    def to_minimateplus_event(self, waveform_key: bytes) -> Any:
        """Project this Thor event into the shape ``minimateplus.Event``
        carries, so it can flow through the existing
        ``SeismoDb.insert_events()`` and ``event_to_sidecar_dict()``
        machinery without those code paths needing to know about Thor.
        Caveats of the bridge:
          - ``PeakValues.micl`` carries the mic peak in **psi** (matching
            BW's convention) — set from :attr:`IdfPeaks.mic_pspl_psi`,
            with a dB(L)→psi fallback when only the dB(L) value is
            available.  This is what the h5 writer's mic-scale-factor
            logic needs.  The dB(L) value still flows through
            ``bw_report.mic.pspl_dbl`` (set by the
            ``idf_to_bw_report`` adapter) and the renderer reads it
            from there for the report header.
          - Many Thor-specific fields (Peak Acceleration / Displacement,
            sensor self-check, calibration) don't have a slot in
            ``Event``.  The full IdfReport is preserved on the
            ``.sfm.json`` sidecar under ``extensions.idf_report`` via
            ``save_imported_idf`` — that's the source of truth for them.
        """
        from minimateplus.models import (
            Event, PeakValues, ProjectInfo, Timestamp,
        )
        ts_obj = Timestamp(
            raw=bytes(9),
            flag=0,
            year=self.timestamp.year,
            unknown_byte=0,
            month=self.timestamp.month,
            day=self.timestamp.day,
            hour=self.timestamp.hour,
            minute=self.timestamp.minute,
            second=self.timestamp.second,
        )
        # Resolve mic peak as psi.  Priority: binary-derived mic_pspl_psi
        # (set by read_idf_file) > dB(L)→psi fallback via standard formula
        # (psi = 2.9e-9 × 10^(dBL/20)) > None.
        mic_psi = self.peaks.mic_pspl_psi
        if mic_psi is None and self.peaks.mic_pspl_dbl is not None:
            mic_psi = 2.9e-9 * (10.0 ** (self.peaks.mic_pspl_dbl / 20.0))
        pv = PeakValues(
            tran=self.peaks.transverse_ips,
            vert=self.peaks.vertical_ips,
            long=self.peaks.longitudinal_ips,
            micl=mic_psi,   # psi, matching BW's convention (h5 scaling depends on this)
            peak_vector_sum=self.peaks.peak_vector_sum_ips,
        )
        pi = ProjectInfo(
            setup_name=self.project_info.setup,
            project=self.project_info.project,
            client=self.project_info.client,
            operator=self.project_info.operator,
            sensor_location=None,           # Thor folds location into project string
            notes=self.project_info.notes,
        )
        ev = Event(
            index=0,
            timestamp=ts_obj,
            sample_rate=self.sample_rate,
            peak_values=pv,
            project_info=pi,
            record_type=self.kind,
            rectime_seconds=self.record_time_sec,
        )
        ev._waveform_key = waveform_key
        return ev
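The dB(L)→psi fallback above uses the standard sound-pressure-level definition with a 20 µPa reference. The sketch below assumes the usual 1 psi = 6894.757 Pa conversion and shows where the 2.9e-9 constant comes from. `dbl_to_psi`, `psi_to_dbl` and `P_REF_PSI` are hypothetical names and are not part of the module.

```python
import math

# dB SPL is referenced to 20 micropascals; express that reference in psi.
PA_PER_PSI = 6894.757
P_REF_PSI = 20e-6 / PA_PER_PSI  # about 2.9e-9 psi, the constant used in the bridge


def dbl_to_psi(dbl: float) -> float:
    """Convert a peak sound pressure level in dB(L) to psi.

    SPL = 20 * log10(p / p_ref), so p = p_ref * 10 ** (SPL / 20).
    """
    return P_REF_PSI * (10.0 ** (dbl / 20.0))


def psi_to_dbl(psi: float) -> float:
    """Inverse of dbl_to_psi (psi must be > 0)."""
    return 20.0 * math.log10(psi / P_REF_PSI)
```

Replacing the exact ratio with the rounded 2.9e-9 constant changes the result by about 0.03%. For example, 140 dB(L) comes out at roughly 0.029 psi either way.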
@@ -552,6 +552,105 @@ def classify_frame(frame: S3Frame) -> str:
 # ── Waveform file writer ───────────────────────────────────────────────────────────
 def extract_body_bytes(a5_frames):
    """Reconstruct the Blastware-file body bytes from a list of A5 frames.
    Returns ``(strt, body, footer)`` where:
    - ``strt`` is the 21-byte STRT record from the probe frame (or a fixed
      placeholder record, ``STRT`` + ``ff fe`` + zero padding, if the probe
      frame has no STRT marker).
    - ``body`` is the variable-length sample-data section (between STRT and
      the 26-byte file footer).  Empty if no frames decode.
    - ``footer`` is the 26-byte file footer.
    This is the same body-construction algorithm used by :func:`write_blastware_file`
    — refactored out so the body decoder (``waveform_codec.decode_waveform_v2``)
    can consume the same bytes without re-implementing the frame-walking logic.
    Returns ``(b"", b"", b"")`` if *a5_frames* is empty or the recovered
    STRT record is not 21 bytes long.
    """
    if not a5_frames:
        return (b"", b"", b"")
    # ── Extract STRT record from probe frame ─────────────────────────────────
    w0_raw = bytes(a5_frames[0].data[7:])
    w0_stripped = _strip_inner_frame_dles(w0_raw)
    strt_pos_stripped = w0_stripped.find(b"STRT")
    if strt_pos_stripped >= 0:
        strt = bytes(w0_stripped[strt_pos_stripped : strt_pos_stripped + 21])
        # Walk raw bytes to find the raw-domain end of the STRT (= body start).
        target_stripped = strt_pos_stripped + 21
        stripped_so_far = 0
        raw_i = 0
        while stripped_so_far < target_stripped and raw_i < len(w0_raw):
            if (w0_raw[raw_i] == 0x10
                    and raw_i + 1 < len(w0_raw)
                    and w0_raw[raw_i + 1] in {0x02, 0x03, 0x04}):
                raw_i += 2
            else:
                raw_i += 1
            stripped_so_far += 1
        probe_skip = 7 + raw_i
    else:
        strt = b"STRT" + b"\xff\xfe" + bytes(14) + b"\x00"
        probe_skip = 7 + 21
    if len(strt) != 21:
        return (b"", b"", b"")
    # Separate terminator from data frames.
    term_idx: Optional[int] = None
    if a5_frames and a5_frames[-1].page_key != 0x0010:
        term_idx = len(a5_frames) - 1
    if term_idx is not None:
        body_frames = a5_frames[:term_idx]
        term_frame = a5_frames[term_idx]
    else:
        body_frames = a5_frames
        term_frame = None
    all_bytes = bytearray()
    for fi, frame in enumerate(body_frames):
        if fi == 0:
            skip = probe_skip
        elif fi in (1, 2):
            skip = 13   # metadata pages
        else:
            skip = 12   # sample chunks
        all_bytes.extend(_frame_body_bytes(frame, skip))
    if term_frame is not None:
        all_bytes.extend(_frame_body_bytes(term_frame, 11))
    # Find the first `0e 08` footer marker whose big-endian year field
    # (marker offset +4) falls in a plausible range (2015 to 2050).
    footer_pos = -1
    pos = 0
    while True:
        pos = all_bytes.find(b"\x0e\x08", pos)
        if pos < 0 or pos + 26 > len(all_bytes):
            break
        yr = (all_bytes[pos + 4] << 8) | all_bytes[pos + 5]
        if 2015 <= yr <= 2050:
            footer_pos = pos
            break
        pos += 1
    if footer_pos >= 0:
        body = bytes(all_bytes[:footer_pos])
        footer = bytes(all_bytes[footer_pos : footer_pos + 26])
    elif len(all_bytes) >= 26:
        body = bytes(all_bytes[:-26])
        footer = bytes(all_bytes[-26:])
    else:
        body = bytes(all_bytes)
        footer = b""
    return (strt, body, footer)
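Two byte-level steps in `extract_body_bytes` are easy to check in isolation. The first maps an offset in the DLE-stripped probe payload back to the raw payload. The second locates the footer by its `0e 08` marker plus a plausible year.

The sketch below re-implements both under the same assumptions the code makes:
- a `0x10` followed by `0x02`, `0x03` or `0x04` collapses to one stripped byte;
- the footer year is big-endian at marker offset +4.

`raw_offset_for` and `find_footer` are hypothetical helpers.

```python
from typing import Optional

DLE = 0x10
STUFFED_FOLLOWERS = {0x02, 0x03, 0x04}
FOOTER_LEN = 26


def raw_offset_for(raw: bytes, stripped_target: int) -> int:
    """Return the raw offset reached after consuming `stripped_target` stripped bytes."""
    stripped = 0
    i = 0
    while stripped < stripped_target and i < len(raw):
        if raw[i] == DLE and i + 1 < len(raw) and raw[i + 1] in STUFFED_FOLLOWERS:
            i += 2  # a DLE pair is two raw bytes but one stripped byte
        else:
            i += 1
        stripped += 1
    return i


def find_footer(data: bytes, min_year: int = 2015, max_year: int = 2050) -> Optional[int]:
    """Offset of the first `0e 08` marker with room for a footer and a plausible year."""
    pos = data.find(b"\x0e\x08")
    while 0 <= pos and pos + FOOTER_LEN <= len(data):
        year = int.from_bytes(data[pos + 4 : pos + 6], "big")
        if min_year <= year <= max_year:
            return pos
        pos = data.find(b"\x0e\x08", pos + 1)  # skip the decoy marker
    return None
```

A marker with an implausible year (for example a stray `0e 08` inside sample data) is skipped rather than truncating the body early.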
 def write_blastware_file(
    event: Event,
    a5_frames: list[S3Frame],
@@ -0,0 +1,738 @@
 """
 minimateplus/bw_ascii_report.py — parser for Blastware's per-event ASCII
 report (the .TXT file BW writes alongside each saved event binary).
 The ASCII export is the authoritative source for every "rich" per-event
 field that BW computes from the waveform but never persists in the BW
 binary itself:
  - Per-channel PPV (Tran / Vert / Long / MicL)
  - Peak Vector Sum + Peak Vector Sum Time
  - Per-channel ZC Freq, Time of Peak, Peak Acceleration, Peak Displacement
  - MicL PSPL, MicL Time of Peak, MicL ZC Freq
  - Per-channel Sensor Self-Check (Test Freq / Test Ratio / Test Results)
  - MicL Test Amplitude (mV)
  - Battery, calibration date, monitor-log timestamps
 Persisting these values into the SFM database lets the monthly-summary
 review workflow ("show me events at Location X with PVS > 0.5") work
 without depending on the (still-undecoded) waveform body codec.
 Format (verified against decode-re/5-8-26 4-event bundle):
  - One field per line, wrapped in double quotes:   `"Field Name : Value"`
  - Field/value separator: literal ` : ` (space-colon-space).
  - Some field names contain an internal `:` already (e.g. `"Project:"`),
    so we split on the FIRST ` : ` only.
  - Some fields have unit suffixes:  `"0.500 in/s"` / `"7.5 Hz"` / `"533 mv"`.
  - A `"Monitor Log(s)"` marker line is followed by tab-separated rows
    of `start_time<TAB>stop_time<TAB>description`.
  - Final `"PC SW Version : ..."` line ends the metadata block.
  - A blank line separates metadata from the sample table.
  - Sample table starts with `   Tran   <TAB>   Vert   <TAB>...`, then
    one row per sample (tab-separated, right-padded numeric values).
  - Geo channel values are in in/s; MicL in dB(L) (or 0.000 below threshold).
 Because some metadata keys have whitespace quirks ("MicL  Time of
 Peak" has two spaces), we normalise whitespace in the key before
 lookup.  Keys that carry their own colon (e.g. "Project:") are handled
 by splitting on the first ` : ` only.
 """
 from __future__ import annotations
 import datetime
 import re
 from dataclasses import dataclass, field
 from pathlib import Path
 from typing import Dict, List, Optional, Tuple, Union
 # ─────────────────────────────────────────────────────────────────────────────
 # Output dataclasses
 # ─────────────────────────────────────────────────────────────────────────────
@dataclass
 class ChannelStats:
    """Per-channel derived stats, populated from an event report."""
    ppv_ips:           Optional[float] = None      # in/s            (geo channels only)
    zc_freq_hz:        Optional[float] = None      # Hz
    time_of_peak_s:    Optional[float] = None      # seconds (relative to trigger; can be negative)
    peak_accel_g:      Optional[float] = None      # g               (geo channels only)
    peak_disp_in:      Optional[float] = None      # in              (geo channels only)
    # When BW writes "OORANGE" (Out Of Range — truncated) for a PPV
    # value, the true peak exceeded the channel's full-scale range.
    # We substitute the range max (e.g. 10.000 in/s for Normal range)
    # as a lower bound, and flag here so downstream UI / alerts know
    # to render "> 10 in/s" or "saturated" instead of trusting the
    # value as an exact measurement.
    ppv_saturated:     bool = False
    # Set when BW writes ">100 Hz" for ZC Freq — the zero-crossing
    # algorithm's peak frequency exceeded the device's reporting
    # ceiling (typically 100 Hz on V10.72).  zc_freq_hz gets the
    # threshold (100.0) as a lower bound; downstream UI renders ">100".
    zc_freq_above_range: bool = False
@dataclass
 class MicStats:
    """MicL-specific stats."""
    weighting:         Optional[str]   = None      # e.g. "Linear Weighting"
    pspl_dbl:          Optional[float] = None      # dB(L)
    zc_freq_hz:        Optional[float] = None
    time_of_peak_s:    Optional[float] = None
    # Set when BW writes "OORANGE" for PSPL — mic exceeded its
    # measurement range.  pspl_dbl gets 140 dB(L) as a conservative lower
    # bound (typical NL-43 max; some units cap at 148).  Consumers
    # should render "> 140 dB(L)" or similar when this flag is set.
    pspl_saturated:    bool = False
    # Same semantics as ChannelStats.zc_freq_above_range — mic ZC
    # peak exceeded device reporting ceiling.
    zc_freq_above_range: bool = False
@dataclass
 class SensorCheck:
    """Per-channel sensor self-check result.
    Geo channels report a frequency + ratio; MicL reports a frequency +
    amplitude (mV).  All channels also have a Pass/Fail string.
    """
    test_freq_hz:      Optional[float] = None
    test_ratio:        Optional[float] = None      # geo channels only
    test_amplitude_mv: Optional[float] = None      # MicL only
    test_results:      Optional[str]   = None      # "Passed" / "Failed"
@dataclass
 class MonitorLogEntry:
    """One row of the trailing Monitor Log(s) block."""
    start_time:  Optional[datetime.datetime] = None
    stop_time:   Optional[datetime.datetime] = None
    description: Optional[str] = None
 # BW saturation marker — appears in PPV / Peak Vector Sum / similar
 # numeric fields when the underlying measurement exceeded the
 # channel's full-scale range (e.g., a geophone reading > 10 in/s at
 # Normal range, or a mic exceeding its sensitivity ceiling).  Treated
 # as "≥ range_max" + a saturated flag rather than discarded.
 # Appears as: ``"Tran PPV : OORANGE in/s"``
 _OORANGE_MARKERS = ("OORANGE", "OUT OF RANGE")
 def _is_oorange(value: str) -> bool:
    """True when a BW numeric field is an Out-Of-Range saturation marker."""
    s = value.strip().upper()
    return any(m in s for m in _OORANGE_MARKERS)
 def _parse_above_range(value: str) -> Optional[float]:
    """For BW "above-range" markers like ">100 Hz", return the threshold.
    BW writes ZC Freq as ">100 Hz" when the zero-crossing algorithm sees
    a peak too fast to count (device cuts off at 100 Hz).  Returns the
    numeric portion after the '>' (e.g. 100.0), or None if `value` is
    not an above-range marker.
    """
    s = value.strip()
    if not s.startswith(">"):
        return None
    return _parse_number(s[1:])
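The saturation and above-range markers turn a non-numeric BW field into a bound plus a flag rather than a `None`. Below is a self-contained sketch of that policy for a PPV value and a ZC Freq value. It assumes the same leading-number regex and marker strings as above. `parse_ppv_field` and `parse_zc_field` are hypothetical names.

```python
import re
from typing import Optional, Tuple

_NUM = re.compile(r"^-?\d+(?:\.\d+)?")
_OORANGE = ("OORANGE", "OUT OF RANGE")


def _leading_number(value: str) -> Optional[float]:
    m = _NUM.match(value.strip())
    return float(m.group(0)) if m else None


def parse_ppv_field(value: str, geo_range_ips: Optional[float]) -> Tuple[Optional[float], bool]:
    """Return (ppv_ips, saturated); a saturated reading reports the range max as a lower bound."""
    if any(marker in value.strip().upper() for marker in _OORANGE):
        return geo_range_ips, True
    return _leading_number(value), False


def parse_zc_field(value: str) -> Tuple[Optional[float], bool]:
    """Return (zc_freq_hz, above_range); ">100 Hz" yields (100.0, True)."""
    s = value.strip()
    if s.startswith(">"):
        threshold = _leading_number(s[1:])
        if threshold is not None:
            return threshold, True
    return _leading_number(s), False
```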
@dataclass
 class BwAsciiReport:
    """Structured representation of one BW per-event ASCII export."""
    # ── Identity ─────────────────────────────────────────────────────────────
    event_type:        Optional[str] = None         # e.g. "Full Waveform"
    serial:            Optional[str] = None         # e.g. "BE11529"
    version:           Optional[str] = None         # firmware version line
    file_name:         Optional[str] = None         # e.g. "M529LK44.AB0"
    event_datetime:    Optional[datetime.datetime] = None  # parsed from Event Time + Event Date
    # ── Trigger / recording config ──────────────────────────────────────────
    trigger_channel:        Optional[str]   = None  # e.g. "Vert" or "From Unit"
    geo_trigger_level_ips:  Optional[float] = None
    pretrig_s:              Optional[float] = None  # negative seconds
    record_time_s:          Optional[float] = None
    record_stop_mode:       Optional[str]   = None
    sample_rate_sps:        Optional[int]   = None
    battery_volts:          Optional[float] = None
    calibration_date:       Optional[datetime.date] = None
    calibration_by:         Optional[str]   = None  # e.g. "Instantel"
    units:                  Optional[str]   = None  # e.g. "in/s and dB(L)"
    # ── Operator-supplied metadata ──────────────────────────────────────────
    # Parsed by POSITION from the 4-line "User Notes" block BW writes
    # between the `Units :` and `Geo Range :` lines.  Position-based so
    # the values populate correctly even when an operator renames the
    # labels in Blastware's Compliance Setup → Notes tab (the 4 labels
    # are user-editable, e.g. "Seis Loc:" → "Building:" → "Site Address:").
    # The original labels BW wrote are preserved in `user_note_labels`
    # so terra-view can render them as the operator named them.
    project:           Optional[str] = None     # position 1 (BW default label "Project:")
    client:            Optional[str] = None     # position 2 (BW default label "Client:")
    operator:          Optional[str] = None     # position 3 (BW default label "User Name:")
    sensor_location:   Optional[str] = None     # position 4 (BW default label "Seis Loc:")
    # Maps canonical slot name → the literal label BW wrote in the ASCII
    # export.  Empty if the User Notes block wasn't present.  Example
    # when the operator renamed slot 4 to "Building:":
    #     {"project": "Project:", "client": "Client:",
    #      "operator": "User Name:", "sensor_location": "Building:"}
    user_note_labels:  Dict[str, str] = field(default_factory=dict)
    # ── Geo channel scaling ─────────────────────────────────────────────────
    geo_range_ips:     Optional[float] = None       # 10.000 / 1.250
    # ── Per-channel derived stats (geo + mic) ───────────────────────────────
    channels:          Dict[str, ChannelStats] = field(default_factory=dict)
    mic:               MicStats = field(default_factory=MicStats)
    # ── Vector sum ──────────────────────────────────────────────────────────
    peak_vector_sum_ips:    Optional[float] = None
    peak_vector_sum_time_s: Optional[float] = None
    # Saturation flag — set when BW writes "OORANGE" for the PVS.  We
    # then substitute sqrt(3) * geo_range_ips as a conservative upper
    # bound (the theoretical maximum PVS when all 3 geo channels are
    # simultaneously at full-scale).  Consumers should display this as
    # ">{value} in/s" or similar.
    peak_vector_sum_saturated: bool = False
    # Histograms additionally have an absolute date+time for the PVS
    # (it occurred at a specific interval).  Waveform reports show
    # only the relative-time value above.
    peak_vector_sum_when:   Optional[datetime.datetime] = None
    # ── Histogram-specific fields (populated only when Event Type starts
    # with 'Histogram' / 'Full Histogram' / 'Histogram + Continuous') ──
    histogram_start:        Optional[datetime.datetime] = None
    histogram_stop:         Optional[datetime.datetime] = None
    histogram_n_intervals:  Optional[int]   = None      # e.g. 4, 1436
    histogram_interval_size_str: Optional[str]   = None  # "1 minute" / "5 minutes" / "15 seconds"
    histogram_interval_size_s:   Optional[float] = None  # parsed to seconds
    # Per-channel absolute peak time+date (histogram-specific).  For
    # waveform events these are None — those reports use the channel's
    # time_of_peak_s (relative to trigger) instead.  Keyed by channel
    # name ("Tran", "Vert", "Long", "MicL").
    channel_peak_when:      Dict[str, datetime.datetime] = field(default_factory=dict)
    # ── Sensor self-check (per channel) ─────────────────────────────────────
    sensor_check:      Dict[str, SensorCheck] = field(default_factory=dict)
    # ── Monitor log + tooling version ───────────────────────────────────────
    monitor_log:       List[MonitorLogEntry] = field(default_factory=list)
    pc_sw_version:     Optional[str] = None
    # ── Sample table (optional; only parsed if requested) ───────────────────
    # Each entry: (Tran, Vert, Long, MicL) in the report's units (geo
    # channels in in/s, MicL in dB(L)).  None when parse_samples=False.
    samples:           Optional[List[Tuple[float, float, float, float]]] = None
 # ─────────────────────────────────────────────────────────────────────────────
 # Helpers
 # ─────────────────────────────────────────────────────────────────────────────
 _KEY_NORMALISE_RE = re.compile(r"\s+")
 _NUMERIC_RE       = re.compile(r"^-?\d+(?:\.\d+)?")
 def _normalise_key(k: str) -> str:
    """Collapse whitespace runs (incl. tabs) and strip; handles BW's
    "MicL  Time of Peak" double-space quirk."""
    return _KEY_NORMALISE_RE.sub(" ", k).strip()
 def _strip_quotes(line: str) -> str:
    line = line.rstrip("\r\n")
    if len(line) >= 2 and line.startswith('"') and line.endswith('"'):
        return line[1:-1]
    return line
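The per-line handling in `parse_report` can be exercised on its own. It strips the surrounding quotes, splits on the first ` : ` and collapses whitespace in the key. Below is a minimal sketch that combines `_strip_quotes` and `_normalise_key` with the first-separator split. `split_field_line` is a hypothetical helper and is not part of the module.

```python
import re
from typing import Optional, Tuple

_WS = re.compile(r"\s+")


def split_field_line(raw_line: str) -> Optional[Tuple[str, str]]:
    """Split one quoted BW metadata line into (normalised_key, value), or None."""
    line = raw_line.rstrip("\r\n")
    if len(line) >= 2 and line.startswith('"') and line.endswith('"'):
        line = line[1:-1]
    idx = line.find(" : ")  # split on the FIRST separator only
    if idx < 0:
        return None
    key = _WS.sub(" ", line[:idx]).strip()  # "MicL  Time" -> "MicL Time"
    value = line[idx + 3 :].strip()
    return key, value
```

Splitting on the first separator keeps a value that itself contains ` : ` intact. It also lets a key that ends in its own colon, such as `Project:`, come through unchanged.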
 def _parse_number(value: str) -> Optional[float]:
    """Pull the leading numeric portion out of a value like "0.500 in/s"."""
    m = _NUMERIC_RE.match(value.strip())
    if not m:
        return None
    try:
        return float(m.group(0))
    except ValueError:
        return None
 def _parse_int(value: str) -> Optional[int]:
    n = _parse_number(value)
    return None if n is None else int(round(n))
 # Months exactly as BW writes them.
 _MONTHS = {
    "January": 1, "February": 2, "March": 3, "April": 4,
    "May": 5, "June": 6, "July": 7, "August": 8,
    "September": 9, "October": 10, "November": 11, "December": 12,
    # Short forms used in monitor-log rows ("Apr 23 /26").
    "Jan": 1, "Feb": 2, "Mar": 3, "Apr": 4, "Jun": 6, "Jul": 7,
    "Aug": 8, "Sep": 9, "Oct": 10, "Nov": 11, "Dec": 12,
 }
 def _parse_event_date(s: str) -> Optional[datetime.date]:
    """Parse "April 23, 2026" or "May 8, 2026" → date."""
    s = s.strip()
    parts = s.replace(",", " ").split()
    if len(parts) < 3:
        return None
    month_name, day_str, year_str = parts[0], parts[1], parts[2]
    month = _MONTHS.get(month_name)
    if month is None:
        return None
    try:
        return datetime.date(int(year_str), month, int(day_str))
    except ValueError:
        return None
 def _parse_iso_date(s: str) -> Optional[datetime.date]:
    """Parse "2026-05-16" → date.  Histograms use ISO format for their
    Start Date / Stop Date / Peak Date fields; waveforms use the
    "May 8, 2026" long form which `_parse_event_date` handles."""
    s = s.strip()
    try:
        return datetime.date.fromisoformat(s)
    except ValueError:
        return None
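Histogram reports split each absolute timestamp across a `... Time` line and a `... Date` line. The parser stages the time until the matching date arrives. Below is a minimal sketch of the combining step. It assumes the HH:MM:SS and ISO `YYYY-MM-DD` forms described above. `combine_peak_when` is a hypothetical name.

```python
import datetime
from typing import Optional


def _parse_hms(s: str) -> Optional[datetime.time]:
    """Parse "22:31:38" the same way _parse_event_time does."""
    try:
        h, m, sec = (int(x) for x in s.strip().split(":"))
        return datetime.time(h, m, sec)
    except ValueError:
        return None


def combine_peak_when(time_str: str, date_str: str) -> Optional[datetime.datetime]:
    """Pair a time line with an ISO date line; None if either fails to parse."""
    t = _parse_hms(time_str)
    try:
        d = datetime.date.fromisoformat(date_str.strip())
    except ValueError:
        return None
    return datetime.datetime.combine(d, t) if t is not None else None
```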
 _INTERVAL_UNIT_SECONDS = {
    "second": 1, "seconds": 1, "sec": 1, "secs": 1,
    "minute": 60, "minutes": 60, "min": 60, "mins": 60,
    "hour": 3600, "hours": 3600, "hr": 3600, "hrs": 3600,
 }
 def _parse_interval_size(s: str) -> Optional[float]:
    """Parse "1 minute" / "5 minutes" / "15 seconds" / "2 seconds" → seconds.
    Handles the BW Compliance Setup → Histogram Interval values verbatim
    ("2 seconds", "5 seconds", "15 seconds", "1 minute", "5 minutes",
    "15 minutes") plus a few defensive variants.
    """
    if not s:
        return None
    parts = s.strip().split()
    if len(parts) < 2:
        return None
    try:
        n = float(parts[0])
    except ValueError:
        return None
    unit_per_s = _INTERVAL_UNIT_SECONDS.get(parts[1].lower())
    if unit_per_s is None:
        return None
    return n * unit_per_s
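The interval string maps to seconds by multiplying the leading count by a per-unit factor. Below is a trimmed-down sketch of the same lookup. `interval_seconds` is a hypothetical name, and the unit table is reduced to the spellings BW's Compliance Setup uses.

```python
from typing import Optional

_UNIT_SECONDS = {
    "second": 1, "seconds": 1,
    "minute": 60, "minutes": 60,
    "hour": 3600, "hours": 3600,
}


def interval_seconds(s: str) -> Optional[float]:
    """Parse "15 seconds" / "5 minutes" into seconds; None on anything else."""
    parts = s.strip().split()
    if len(parts) < 2:
        return None
    try:
        count = float(parts[0])
    except ValueError:
        return None
    unit = _UNIT_SECONDS.get(parts[1].lower())
    return None if unit is None else count * unit
```

Multiplying the result by `histogram_n_intervals` gives the span the intervals cover. For example, 1436 one-minute intervals is 86 160 s, just under 24 h.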
 def _parse_event_time(s: str) -> Optional[datetime.time]:
    """Parse "15:56:35" → time."""
    s = s.strip()
    try:
        h, m, sec = s.split(":")
        return datetime.time(int(h), int(m), int(sec))
    except (ValueError, IndexError):
        return None
 def _parse_calibration(value: str) -> Tuple[Optional[datetime.date], Optional[str]]:
    """Parse "April 29, 2025 by Instantel" → (date, "Instantel")."""
    parts = value.split(" by ", 1)
    date = _parse_event_date(parts[0])
    by = parts[1].strip() if len(parts) > 1 else None
    return date, by
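For comparison, the same split can be written with `str.partition` and `datetime.strptime`. This is a stricter alternative to `_parse_event_date`:
- it requires the full month name and the comma;
- `%B` matches English month names only under the default C locale.

`parse_calibration_line` is a hypothetical name.

```python
import datetime
from typing import Optional, Tuple


def parse_calibration_line(value: str) -> Tuple[Optional[datetime.date], Optional[str]]:
    """Split "April 29, 2025 by Instantel" into (date, "Instantel")."""
    date_part, sep, by = value.partition(" by ")
    try:
        date = datetime.datetime.strptime(date_part.strip(), "%B %d, %Y").date()
    except ValueError:
        date = None
    calibrated_by = by.strip() or None if sep else None
    return date, calibrated_by
```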
 def _parse_monitor_row(line: str) -> Optional[MonitorLogEntry]:
    """Parse a tab-separated monitor log row.
    Format: `<start>\t<stop>\t<desc>` where each timestamp is BW's
    short form "Mon DD /YY HH:MM:SS" (e.g. "Apr 23 /26 15:46:16").
    Year is encoded as a 2-digit suffix; we expand "/26" → 2026.
    """
    parts = line.split("\t")
    if len(parts) < 2:
        return None
    start = _parse_monitor_ts(parts[0])
    stop  = _parse_monitor_ts(parts[1])
    desc  = parts[2].strip() if len(parts) > 2 else None
    if start is None and stop is None and not desc:
        return None
    return MonitorLogEntry(start_time=start, stop_time=stop, description=desc)
 def _parse_monitor_ts(s: str) -> Optional[datetime.datetime]:
    """Parse "Apr 23 /26 15:46:16" → datetime."""
    s = s.strip()
    parts = s.split()
    if len(parts) < 4:
        return None
    month = _MONTHS.get(parts[0])
    if month is None:
        return None
    try:
        day = int(parts[1])
        # parts[2] looks like "/26"; two-digit years below 80 map to 20xx, the rest to 19xx
        yy = int(parts[2].lstrip("/"))
        year = 2000 + yy if yy < 80 else 1900 + yy
        h, m, sec = (int(x) for x in parts[3].split(":"))
        return datetime.datetime(year, month, day, h, m, sec)
    except (ValueError, IndexError):
        return None
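Monitor-log timestamps carry only a two-digit year, so the parser needs a pivot: values below 80 are read as 20xx and the rest as 19xx. Below is a self-contained sketch of that short-form parser, applied to one tab-separated row. `parse_short_ts` is a hypothetical name, and the pivot of 80 matches `_parse_monitor_ts` above.

```python
import datetime
from typing import Optional

_MON = {
    name: i
    for i, name in enumerate(
        ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
         "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"),
        start=1,
    )
}


def parse_short_ts(s: str) -> Optional[datetime.datetime]:
    """Parse BW's "Apr 23 /26 15:46:16" form, pivoting two-digit years at 80."""
    parts = s.split()
    if len(parts) < 4 or parts[0] not in _MON:
        return None
    try:
        yy = int(parts[2].lstrip("/"))
        year = 2000 + yy if yy < 80 else 1900 + yy
        h, m, sec = (int(x) for x in parts[3].split(":"))
        return datetime.datetime(year, _MON[parts[0]], int(parts[1]), h, m, sec)
    except ValueError:
        return None


row = "Apr 23 /26 15:46:16\tApr 23 /26 16:00:00\tStopped"
start, stop = (parse_short_ts(col) for col in row.split("\t")[:2])
```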
 # ── User-notes positional slot map ──────────────────────────────────────────
 #
 # Blastware's Compliance Setup → Notes tab shows four operator-supplied
 # fields whose LABELS the operator can rename (see screenshot in
 # project archive).  Defaults are "Project:" / "Client:" /
 # "User Name:" / "Seis Loc:", but an operator using a different
 # convention can rename them to anything ("Building:", "Site:",
 # "Address:", etc.).  The ASCII export reflects whatever the operator
 # typed, so label-based matching is fragile.
 #
 # What IS reliable: BW always writes the 4 user-notes lines in the
 # same order, contiguously between the `Units :` line and the
 # `Geo Range :` line.  We parse them by POSITION and preserve the
 # operator's labels in `report.user_note_labels` so terra-view can
 # render them as the operator intended.
 _USER_NOTE_SLOTS = ("project", "client", "operator", "sensor_location")
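Position-based assignment reduces to zipping the slot names with the lines found between `Units :` and `Geo Range :`. Below is a minimal sketch that assumes those lines have already been split into `(label, value)` pairs. `assign_user_notes` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

SLOTS = ("project", "client", "operator", "sensor_location")


def assign_user_notes(
    lines: List[Tuple[str, str]],
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Map (label, value) pairs onto the four canonical slots by position.

    Returns (values_by_slot, labels_by_slot); lines past the fourth are ignored.
    """
    values: Dict[str, str] = {}
    labels: Dict[str, str] = {}
    for slot, (label, value) in zip(SLOTS, lines):
        values[slot] = value
        labels[slot] = label  # keep the operator's label for display
    return values, labels
```

Because the label is kept alongside the value, a renamed fourth slot such as `Building:` still lands in `sensor_location`, and a UI can display the operator's label.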
 # ─────────────────────────────────────────────────────────────────────────────
 # Top-level parser
 # ─────────────────────────────────────────────────────────────────────────────
 def parse_report(text: Union[str, bytes], *, parse_samples: bool = False) -> BwAsciiReport:
    """Parse a BW per-event ASCII export into a structured BwAsciiReport.
    Set ``parse_samples=True`` to also populate ``report.samples`` with
    the trailing sample table.  Default False because the table is
    huge and most callers only want metadata for indexing.
    """
    if isinstance(text, bytes):
        text = text.decode("ascii", errors="replace")
    report = BwAsciiReport()
    # Pre-create channel stat slots so callers can rely on them existing.
    for ch in ("Tran", "Vert", "Long", "MicL"):
        report.channels.setdefault(ch, ChannelStats())
        report.sensor_check.setdefault(ch, SensorCheck())
    lines = text.splitlines()
    i = 0
    n = len(lines)
    in_monitor_log_section = False
    event_time_str: Optional[str] = None
    event_date: Optional[datetime.date] = None
    # User-notes block detection.  We enter the block after parsing
    # the "Units :" line and exit on the "Geo Range :" line.  Inside,
    # the first 4 unmatched `<label> : <value>` lines are assigned to
    # the 4 canonical operator-supplied slots by POSITION (project,
    # client, operator, sensor_location) regardless of what the
    # operator named the labels in BW's Compliance Setup → Notes tab.
    in_user_notes_block = False
    user_note_position = 0
    # Histogram-field staging — BW writes <Channel> Peak Time and
    # <Channel> Peak Date on separate lines (and similarly Histogram
    # Start Time / Date).  We stash the partial value when the time
    # line arrives and combine it when the matching date line arrives.
    _hist_start_time: Optional[datetime.time] = None
    _hist_stop_time:  Optional[datetime.time] = None
    _pending_peak_time: Dict[str, Optional[datetime.time]] = {}
    _pvs_time_raw: Optional[str] = None  # last Peak Vector Sum Time value, raw
    while i < n:
        raw_line = lines[i]
        i += 1
        # Blank line marks the start of the sample table.
        if raw_line.strip() == "":
            break
        line = _strip_quotes(raw_line)
        # Monitor log section: "Monitor Log(s)" header followed by N rows
        # (still inside double-quoted lines), terminated by a non-row line
        # like "PC SW Version : ..." or a blank line.
        if not in_monitor_log_section and line.strip() == "Monitor Log(s)":
            in_monitor_log_section = True
            continue
        if in_monitor_log_section:
            # Heuristic: monitor rows contain a tab; the next "Field : Value"
            # line ends the section.
            if "\t" in line:
                entry = _parse_monitor_row(line)
                if entry:
                    report.monitor_log.append(entry)
                continue
            # Falls through to the field parser below; clear the flag.
            in_monitor_log_section = False
        # "Field : Value" — split on FIRST occurrence of " : "
        idx = line.find(" : ")
        if idx < 0:
            continue
        key = _normalise_key(line[:idx])
        value = line[idx + 3 :].strip()
        # ── Identity / config ────────────────────────────────────────────────
        if   key == "Event Type":           report.event_type = value
        elif key == "Serial Number":        report.serial = value
        elif key == "Version":              report.version = value
        elif key == "File Name":            report.file_name = value
        elif key == "Event Time":           event_time_str = value
        elif key == "Event Date":           event_date = _parse_event_date(value)
        elif key == "Trigger":              report.trigger_channel = value
        elif key == "Geo Trigger Level":    report.geo_trigger_level_ips = _parse_number(value)
        elif key == "Pre-trigger Length":   report.pretrig_s = _parse_number(value)
        elif key == "Record Time":          report.record_time_s = _parse_number(value)
        elif key == "Record Stop Mode":     report.record_stop_mode = value
        elif key == "Sample Rate":          report.sample_rate_sps = _parse_int(value)
        elif key == "Battery Level":        report.battery_volts = _parse_number(value)
        elif key == "Calibration":
            report.calibration_date, report.calibration_by = _parse_calibration(value)
        elif key == "Units":
            report.units = value
            # Entering the user-notes block.  Next ~4 lines until
            # "Geo Range :" are the operator-supplied notes.
            in_user_notes_block = True
            user_note_position = 0
        elif key == "Geo Range":
            # Exiting the user-notes block.
            in_user_notes_block = False
            report.geo_range_ips = _parse_number(value)
        # User-notes block: assign by position (operator may have
        # renamed the labels, so we don't trust them).  Preserve the
        # original labels in `user_note_labels` for downstream UIs
        # (terra-view) that want to display them as the operator
        # named them.
        elif in_user_notes_block and user_note_position < len(_USER_NOTE_SLOTS):
            slot = _USER_NOTE_SLOTS[user_note_position]
            setattr(report, slot, value)
            report.user_note_labels[slot] = key
            user_note_position += 1
        # ── Per-channel stats ────────────────────────────────────────────────
        # All match the pattern "{Channel} <stat-name>"
        elif key in (
            "Tran PPV", "Vert PPV", "Long PPV",
            "Tran ZC Freq", "Vert ZC Freq", "Long ZC Freq",
            "Tran Time of Peak", "Vert Time of Peak", "Long Time of Peak",
            "Tran Peak Acceleration", "Vert Peak Acceleration", "Long Peak Acceleration",
            "Tran Peak Displacement", "Vert Peak Displacement", "Long Peak Displacement",
        ):
            ch_name, stat = key.split(" ", 1)
            cs = report.channels.setdefault(ch_name, ChannelStats())
            if stat == "PPV":
                if _is_oorange(value):
                    # Channel saturated — substitute range max as lower
                    # bound; flag so downstream UI can render "> 10 in/s".
                    cs.ppv_ips       = report.geo_range_ips
                    cs.ppv_saturated = True
                else:
                    cs.ppv_ips = _parse_number(value)
            elif stat == "ZC Freq":
                # ">100 Hz" → store threshold + flag; numeric → parse normally
                threshold = _parse_above_range(value)
                if threshold is not None:
                    cs.zc_freq_hz = threshold
                    cs.zc_freq_above_range = True
                else:
                    cs.zc_freq_hz = _parse_number(value)
            else:
                num = _parse_number(value)
                if   stat == "Time of Peak":        cs.time_of_peak_s = num
                elif stat == "Peak Acceleration":   cs.peak_accel_g   = num
                elif stat == "Peak Displacement":   cs.peak_disp_in   = num
        # ── Histogram-specific fields ────────────────────────────────────────
        # Histograms have Start/Stop time+date pairs + an interval count
        # and size, plus per-channel absolute Peak Time/Date instead of
        # the waveform's relative Time of Peak.
        elif key == "Histogram Start Time":
            _hist_start_time = _parse_event_time(value)
        elif key == "Histogram Start Date":
            _d = _parse_iso_date(value)
            if _d and _hist_start_time:
                report.histogram_start = datetime.datetime.combine(_d, _hist_start_time)
        elif key == "Histogram Stop Time":
            _hist_stop_time = _parse_event_time(value)
        elif key == "Histogram Stop Date":
            _d = _parse_iso_date(value)
            if _d and _hist_stop_time:
                report.histogram_stop = datetime.datetime.combine(_d, _hist_stop_time)
        elif key == "Number of Intervals":
            try:
                report.histogram_n_intervals = int(float(value.strip()))
            except ValueError:
                pass
        elif key == "Interval Size":
            report.histogram_interval_size_str = value.strip()
            report.histogram_interval_size_s   = _parse_interval_size(value)
        # ── Per-channel histogram Peak Date / Peak Time ──
        # Lines like "Tran Peak Time : 22:31:38" + "Tran Peak Date : 2026-05-16"
        elif key in ("Tran Peak Time", "Vert Peak Time", "Long Peak Time", "MicL Time"):
            ch_name = "MicL" if key == "MicL Time" else key.split(" ", 1)[0]
            _pending_peak_time[ch_name] = _parse_event_time(value)
        elif key in ("Tran Peak Date", "Vert Peak Date", "Long Peak Date", "MicL Date"):
            ch_name = "MicL" if key == "MicL Date" else key.split(" ", 1)[0]
            _d = _parse_iso_date(value)
            _t = _pending_peak_time.get(ch_name)
            if _d and _t:
                report.channel_peak_when[ch_name] = datetime.datetime.combine(_d, _t)
        # ── Vector Sum ───────────────────────────────────────────────────────
        elif key == "Peak Vector Sum":
            if _is_oorange(value):
                # PVS saturated — conservative upper bound is
                # sqrt(3) * geo_range_ips (all 3 channels at full-scale).
                # Real PVS could be lower (channels rarely peak
                # simultaneously) but never higher within the range.
                if report.geo_range_ips is not None:
                    import math as _math
                    report.peak_vector_sum_ips = _math.sqrt(3) * report.geo_range_ips
                report.peak_vector_sum_saturated = True
            else:
                report.peak_vector_sum_ips = _parse_number(value)
        # BW writes the PVS-time label with a typo: "Peak Vector Sum TimeSum"
        # (the word "Sum" appears twice).  Accept both forms.  Confirmed
        # against actual BW output on 2026-05-27 — every PVS-time line in
        # the field examples (T190, T438, K557) uses the typo'd label.
        elif key in ("Peak Vector Sum Time", "Peak Vector Sum TimeSum"):
            report.peak_vector_sum_time_s = _parse_number(value)
            _pvs_time_raw = value
        elif key == "Peak Vector Sum Date":
            # Histogram-mode PVS gets paired with a date.  We may have
            # captured 'Peak Vector Sum Time' as either a relative
            # seconds float (waveform) or an HH:MM:SS string we
            # interpreted as a number.  For histograms, BW writes
            # "Peak Vector Sum Time : 22:33:52" which _parse_number
            # parses as 22.0 (loses information).  When Peak Vector Sum
            # Date arrives, re-parse the previous PVS time line as a
            # clock time and combine into an absolute datetime.
            _d = _parse_iso_date(value)
            if _d and _pvs_time_raw is not None:
                _t = _parse_event_time(_pvs_time_raw)
                if _t:
                    report.peak_vector_sum_when = datetime.datetime.combine(_d, _t)
                    # The earlier seconds parse was bogus for histograms;
                    # clear it so consumers don't think it's a real offset.
                    report.peak_vector_sum_time_s = None
        # ── Microphone block ────────────────────────────────────────────────
        elif key == "Microphone":
            report.mic.weighting = value
        elif key == "MicL PSPL":
            if _is_oorange(value):
                # Mic saturated — substitute conservative upper bound 140 dBL.
                report.mic.pspl_dbl       = 140.0
                report.mic.pspl_saturated = True
            else:
                report.mic.pspl_dbl = _parse_number(value)
            # PSPL is dB(L), not in/s, so it stays in MicStats only and is
            # not mirrored onto `channels["MicL"].ppv_ips` (Time of Peak and
            # ZC Freq below are mirrored; their units match the geo channels).
        elif key == "MicL Time of Peak":
            report.mic.time_of_peak_s = _parse_number(value)
            cs = report.channels.setdefault("MicL", ChannelStats())
            cs.time_of_peak_s = report.mic.time_of_peak_s
        elif key == "MicL ZC Freq":
            threshold = _parse_above_range(value)
            if threshold is not None:
                report.mic.zc_freq_hz         = threshold
                report.mic.zc_freq_above_range = True
            else:
                report.mic.zc_freq_hz = _parse_number(value)
            cs = report.channels.setdefault("MicL", ChannelStats())
            cs.zc_freq_hz          = report.mic.zc_freq_hz
            cs.zc_freq_above_range = report.mic.zc_freq_above_range
        # ── Sensor self-check ────────────────────────────────────────────────
        elif key in (
            "Tran Test Freq", "Vert Test Freq", "Long Test Freq", "MicL Test Freq",
            "Tran Test Ratio", "Vert Test Ratio", "Long Test Ratio",
            "MicL Test Amplitude",
            "Tran Test Results", "Vert Test Results", "Long Test Results", "MicL Test Results",
        ):
            ch_name, stat = key.split(" ", 1)
            sc = report.sensor_check.setdefault(ch_name, SensorCheck())
            if   stat == "Test Freq":      sc.test_freq_hz      = _parse_number(value)
            elif stat == "Test Ratio":     sc.test_ratio        = _parse_number(value)
            elif stat == "Test Amplitude": sc.test_amplitude_mv = _parse_number(value)
            elif stat == "Test Results":   sc.test_results      = value
        # ── Trailer ─────────────────────────────────────────────────────────
        elif key == "PC SW Version":
            report.pc_sw_version = value
        # Unknown keys are silently dropped — forward-compat for future
        # BW versions that may add fields.
    # Combine event date + time into a datetime
    if event_date is not None and event_time_str is not None:
        t = _parse_event_time(event_time_str)
        if t is not None:
            report.event_datetime = datetime.datetime.combine(event_date, t)
    if parse_samples:
        report.samples = _parse_sample_table(lines, i)
    return report
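The Peak Vector Sum handling above keeps the raw PVS time text until the matching date line arrives, because a number parse of "22:33:52" keeps only 22.0. Below is a minimal standalone sketch of that pairing. `parse_clock`, `parse_iso_date` and `pvs_when` are hypothetical names. The sketch assumes report lines use a `Key : Value` layout, with HH:MM:SS clock times and ISO dates. It is not the module's `_parse_event_time` / `_parse_iso_date`, whose exact behaviour isn't shown here.

```python
import datetime
from typing import Iterable, Optional

# Both spellings of the PVS-time label, as the parser accepts.
PVS_TIME_LABELS = ("Peak Vector Sum Time", "Peak Vector Sum TimeSum")


def parse_clock(value: str) -> Optional[datetime.time]:
    """Parse an HH:MM:SS clock string; None if the text is not a clock time."""
    try:
        return datetime.datetime.strptime(value.strip(), "%H:%M:%S").time()
    except ValueError:
        return None


def parse_iso_date(value: str) -> Optional[datetime.date]:
    """Parse a YYYY-MM-DD date; None if the text is not an ISO date."""
    try:
        return datetime.date.fromisoformat(value.strip())
    except ValueError:
        return None


def pvs_when(lines: Iterable[str]) -> Optional[datetime.datetime]:
    """Keep the raw PVS time text, then combine it with the PVS date line."""
    raw_time: Optional[str] = None
    for line in lines:
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        if key in PVS_TIME_LABELS:
            raw_time = value  # raw text, not a float: "22:33:52" would become 22.0
        elif key == "Peak Vector Sum Date" and raw_time is not None:
            day, clock = parse_iso_date(value), parse_clock(raw_time)
            if day is not None and clock is not None:
                return datetime.datetime.combine(day, clock)
    return None


print(pvs_when([
    "Peak Vector Sum TimeSum : 22:33:52",
    "Peak Vector Sum Date : 2026-05-16",
]))  # 2026-05-16 22:33:52
```

Waveform reports carry a relative seconds value such as `0.125`. That value fails the clock parse, so no absolute datetime is produced for it.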
 def _parse_sample_table(
    lines: List[str], start: int,
 ) -> List[Tuple[float, float, float, float]]:
    """Parse the trailing sample table.
    The table starts with a header row ("   Tran   <TAB>...") and continues
    until EOF.  Each data row is a tab-separated quartet of numeric values.
    """
    samples: List[Tuple[float, float, float, float]] = []
    seen_header = False
    for line in lines[start:]:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue
        cols = [c.strip() for c in line.split("\t") if c.strip()]
        if not seen_header:
            # Header row contains channel names; numeric rows don't.
            if any(c in ("Tran", "Vert", "Long", "MicL") for c in cols):
                seen_header = True
            continue
        if len(cols) < 4:
            continue
        try:
            samples.append((
                float(cols[0]), float(cols[1]),
                float(cols[2]), float(cols[3]),
            ))
        except ValueError:
            continue
    return samples
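`_parse_sample_table()` skips everything up to the channel-name header. After that, it keeps only rows with four numeric tab-separated columns. The sketch below reproduces that gate on a plain string. It also adds a hypothetical `rows_to_channels()` helper that transposes rows into per-channel lists, which is one plausible shape for feeding a chart. Both names are illustrative and are not part of the module.

```python
from typing import Dict, List, Sequence, Tuple

CHANNELS = ("Tran", "Vert", "Long", "MicL")
Row = Tuple[float, float, float, float]


def parse_sample_rows(text: str) -> List[Row]:
    """Skip lines until the channel-name header, then keep numeric quartets."""
    rows: List[Row] = []
    seen_header = False
    for line in text.splitlines():
        cols = [c.strip() for c in line.split("\t") if c.strip()]
        if not cols:
            continue  # blank line
        if not seen_header:
            # The header row names the channels; anything before it is metadata.
            seen_header = any(c in CHANNELS for c in cols)
            continue
        if len(cols) < 4:
            continue
        try:
            rows.append((float(cols[0]), float(cols[1]),
                         float(cols[2]), float(cols[3])))
        except ValueError:
            continue  # non-numeric row, e.g. a trailing footer
    return rows


def rows_to_channels(rows: Sequence[Row]) -> Dict[str, List[float]]:
    """Transpose sample rows into one list per channel."""
    columns = list(zip(*rows)) if rows else [()] * len(CHANNELS)
    return {name: list(col) for name, col in zip(CHANNELS, columns)}
```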
 def parse_report_file(
    path: Union[str, Path], *, parse_samples: bool = False,
 ) -> BwAsciiReport:
    """Convenience: read a .TXT file from disk and parse it."""
    return parse_report(Path(path).read_bytes(), parse_samples=parse_samples)
@@ -1362,20 +1362,6 @@ def _decode_waveform_record_into(data: bytes, event: Event) -> None:
    Modifies event in-place.
    """
    # ── Always preserve the raw 210 bytes ─────────────────────────────────────
    # The 0C record carries far more than just peaks + project strings:
    # ZC Freq, Time of Peak, Peak Acceleration, Peak Displacement, Vector
    # Sum Time, MicL Time of Peak, and the per-channel sensor self-check
    # results (Test Freq / Ratio / Pass-Fail) all live somewhere in this
    # 210-byte block.  Their byte offsets are not yet mapped — keeping the
    # raw bytes lets us decode those fields offline once we have a paired
    # (raw 0C, BW-report) sample to fit against.  Cheap to keep around
    # (210 bytes per event).
    try:
        event._raw_record = bytes(data[:210])
    except Exception:
        pass
    # ── Record type + format detection ────────────────────────────────────────
    # `record_type` is the user-facing label ("Waveform" for any triggered
    # event regardless of timestamp-header layout).  `fmt` is the internal
@@ -1514,22 +1500,69 @@ def _decode_a5_waveform(
    (BULK_WAVEFORM_STREAM) frame payloads and populate event.raw_samples,
    event.total_samples, event.pretrig_samples, and event.rectime_seconds.
-    This requires ALL A5 frames (stop_after_metadata=False), not just the
-    metadata-bearing subset.
+    Wired up 2026-05-11 to the verified ``decode_waveform_v2`` codec (see
+    ``minimateplus/waveform_codec.py`` and ``docs/waveform_codec_re_status.md``).
    Replaces the legacy int16 LE decoder, which produced full-scale ±32K
    noise on every event because the body bytes are encoded, not raw
    samples.
-    ── Waveform format (confirmed from 4-2-26 blast capture) ───────────────────
-    The blast waveform is 4-channel interleaved signed 16-bit little-endian,
-    8 bytes per sample-set:
+    Output convention (preserved from the legacy decoder):
+      ``event.raw_samples`` is a dict with keys "Tran", "Vert", "Long",
+      "MicL" mapping to lists of **int16 ADC counts**.  Multiply by
      ``geo_range / 32768`` for geo channels to get in/s; use
      :func:`minimateplus.waveform_codec.mic_count_to_db` for mic dB(L).
    ``total_samples`` / ``pretrig_samples`` / ``rectime_seconds`` are set
    to ``None`` so the caller backfills from compliance_config (the
    authoritative source — STRT fields aren't reliable).
    """
    from .waveform_codec import decode_a5_frames
    event.total_samples = None
    event.pretrig_samples = None
    event.rectime_seconds = None
    if not frames_data:
        log.debug("_decode_a5_waveform: no frames provided")
        return
    decoded = decode_a5_frames(frames_data)
    if decoded is None:
        log.warning("_decode_a5_waveform: codec returned no samples")
        return
    event.raw_samples = decoded
    log.debug(
        "_decode_a5_waveform: decoded %d/%d/%d/%d samples (T/V/L/M)",
        len(decoded.get("Tran", [])),
        len(decoded.get("Vert", [])),
        len(decoded.get("Long", [])),
        len(decoded.get("MicL", [])),
    )
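According to the docstring, geo-channel counts scale to in/s by `geo_range / 32768`. The sketch below assumes a `raw_samples` dict shaped like the one `_decode_a5_waveform` stores, plus a known geo range in in/s. `counts_to_ips` and `peak_ips` are hypothetical helpers. The mic channel is deliberately left out, because it goes through `mic_count_to_db` instead.

```python
from typing import Dict, List

GEO_CHANNELS = ("Tran", "Vert", "Long")
INT16_FULL_SCALE = 32768  # divisor from the docstring: geo_range / 32768


def counts_to_ips(raw_samples: Dict[str, List[int]],
                  geo_range_ips: float) -> Dict[str, List[float]]:
    """Scale int16 ADC counts on the geo channels to in/s."""
    scale = geo_range_ips / INT16_FULL_SCALE
    return {ch: [count * scale for count in raw_samples.get(ch, [])]
            for ch in GEO_CHANNELS}


def peak_ips(samples_ips: List[float]) -> float:
    """Largest absolute value (the channel's PPV); 0.0 if empty."""
    return max((abs(s) for s in samples_ips), default=0.0)
```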
 def _decode_a5_waveform_LEGACY(
    frames_data: list[S3Frame],
    event: Event,
 ) -> None:
    """
    LEGACY decoder — kept for reference only.  DO NOT CALL.
    This is the int16 LE decoder that produced full-scale ±32K noise
    on every event.  Retracted 2026-05-08; replaced 2026-05-11 with
    the verified codec in :mod:`minimateplus.waveform_codec`.  See
    ``docs/instantel_protocol_reference.md §7.6.1`` for the full history.
    ── Waveform format (LEGACY — WRONG) ────────────────────────────────
    Claimed 4-channel interleaved signed 16-bit little-endian, 8 bytes
    per sample-set:
        [T_lo T_hi V_lo V_hi L_lo L_hi M_lo M_hi] × N
-    where T=Tran, V=Vert, L=Long, M=Mic.  Channel ordering follows the
+    where T=Tran, V=Vert, L=Long, M=Mic.
    Blastware convention [Tran, Vert, Long, Mic] = [ch0, ch1, ch2, ch3].
-    ⚠️  Channel ordering is a confirmed CONVENTION — the physical ordering on
-        the ADC mux is not independently verifiable from the saturating blast
+    The body bytes are actually a tagged delta+RLE stream — this
+    interpretation was wrong.
        captures we have.  The convention is consistent with Blastware labeling
        (Tran is always the first channel field in the A5 STRT+waveform stream).
    ── Frame structure ──────────────────────────────────────────────────────────
    A5[0] (probe response):
@@ -15,7 +15,6 @@ declared in `event_to_sidecar_dict()`.
 from __future__ import annotations
 import base64
 import datetime
 import hashlib
 import json
@@ -27,6 +26,14 @@ from typing import Optional, Union
 from .models import Event, PeakValues, ProjectInfo, Timestamp
 from . import blastware_file as _bw  # avoid circular reference at module load
 from .bw_ascii_report import BwAsciiReport
 from .waveform_codec import decode_waveform_v2, decoded_to_adc_counts
 from .histogram_codec import decode_histogram_body
 # Reference pressure for dB(L) → psi conversion (20 µPa expressed in psi).
 # Same constant as sfm/sfm_webapp.html so server-side and browser-side
 # conversions agree.
 _DBL_REF_PSI = 2.9e-9
 log = logging.getLogger(__name__)
@@ -42,7 +49,7 @@ SIDECAR_KIND   = "sfm.event"
 # bumped without a `pip install` re-run — leading to confusing stale
 # version stamps in sidecars.  Bump this constant and CHANGELOG.md
 # together at release time.
-TOOL_VERSION = "0.15.0"
+TOOL_VERSION = "0.21.1"
 try:
    # Best-effort: prefer the installed metadata when it's NEWER than the
@@ -95,6 +102,242 @@ def _peak_values_to_dict(pv: Optional[PeakValues]) -> dict:
    }
 def _bw_report_to_dict(report: BwAsciiReport) -> dict:
    """Project a parsed BW ASCII report into the sidecar's `bw_report` block.
    All fields are rendered as plain JSON-compatible types (no datetime
    objects).  Channels are uniformly lowercased for stable JSON keys.
    """
    def _ch(ch_name: str) -> dict:
        cs = report.channels.get(ch_name)
        if cs is None:
            return {}
        out = {
            "ppv_ips":         cs.ppv_ips,
            "zc_freq_hz":      cs.zc_freq_hz,
            "time_of_peak_s":  cs.time_of_peak_s,
            "peak_accel_g":    cs.peak_accel_g,
            "peak_disp_in":    cs.peak_disp_in,
        }
        # Drop all-None entries — keeps the JSON tidy for partial reports.
        out = {k: v for k, v in out.items() if v is not None}
        # Saturation flag (only present when True) — signals that ppv_ips
        # is the channel range max (a lower bound), not an exact reading.
        if getattr(cs, "ppv_saturated", False):
            out["ppv_saturated"] = True
        # ZC Freq above device reporting ceiling (BW ">100 Hz") — value
        # in zc_freq_hz is the threshold, not an exact measurement.
        if getattr(cs, "zc_freq_above_range", False):
            out["zc_freq_above_range"] = True
        return out
    def _sc(ch_name: str) -> dict:
        sc = report.sensor_check.get(ch_name)
        if sc is None:
            return {}
        out = {
            "freq_hz":      sc.test_freq_hz,
            "ratio":        sc.test_ratio,
            "amplitude_mv": sc.test_amplitude_mv,
            "result":       sc.test_results,
        }
        return {k: v for k, v in out.items() if v is not None}
    monitor_log = []
    for entry in report.monitor_log:
        e = {
            "start":       entry.start_time.isoformat() if entry.start_time else None,
            "stop":        entry.stop_time.isoformat()  if entry.stop_time  else None,
            "description": entry.description,
        }
        monitor_log.append({k: v for k, v in e.items() if v is not None})
    return {
        "available":   True,
        "event_type":  report.event_type,
        "version":     report.version,
        "trigger": {
            "channel":       report.trigger_channel,
            "geo_level_ips": report.geo_trigger_level_ips,
        },
        "recording": {
            "sample_rate_sps":  report.sample_rate_sps,
            "record_time_s":    report.record_time_s,
            "pretrig_s":        report.pretrig_s,
            "stop_mode":        report.record_stop_mode,
            "geo_range_ips":    report.geo_range_ips,
            "units":            report.units,
        },
        "device": {
            "battery_volts":    report.battery_volts,
            "calibration_date": report.calibration_date.isoformat() if report.calibration_date else None,
            "calibration_by":   report.calibration_by,
        },
        "peaks": {
            "tran":         _ch("Tran"),
            "vert":         _ch("Vert"),
            "long":         _ch("Long"),
            "vector_sum": {
                "ips":       report.peak_vector_sum_ips,
                "time_s":    report.peak_vector_sum_time_s,
                # Histogram events have an absolute date+time for the PVS
                # (the interval at which it occurred); waveform events
                # only have the time_s offset.
                "when":      report.peak_vector_sum_when.isoformat() if report.peak_vector_sum_when else None,
                # Set when BW reported the PVS as OORANGE — value is the
                # conservative upper bound sqrt(3) * geo_range_ips, not
                # an exact peak.
                "saturated": bool(getattr(report, "peak_vector_sum_saturated", False)),
            },
        },
        "mic": {
            "weighting":             report.mic.weighting,
            "pspl_dbl":              report.mic.pspl_dbl,
            "pspl_saturated":        bool(getattr(report.mic, "pspl_saturated", False)),
            "zc_freq_hz":            report.mic.zc_freq_hz,
            "zc_freq_above_range":   bool(getattr(report.mic, "zc_freq_above_range", False)),
            "time_of_peak_s":        report.mic.time_of_peak_s,
        },
        "sensor_check": {
            "tran": _sc("Tran"),
            "vert": _sc("Vert"),
            "long": _sc("Long"),
            "mic":  _sc("MicL"),
        },
        # Histogram-specific fields (None on waveform-mode events).
        # Per-channel absolute peak time/date for histograms; for
        # waveforms see peaks[<ch>]["time_of_peak_s"] instead.
        "histogram": {
            "start":               report.histogram_start.isoformat() if report.histogram_start else None,
            "stop":                report.histogram_stop.isoformat()  if report.histogram_stop  else None,
            "n_intervals":         report.histogram_n_intervals,
            "interval_size":       report.histogram_interval_size_str,
            "interval_size_s":     report.histogram_interval_size_s,
            "channel_peak_when":   {ch: dt.isoformat() for ch, dt in report.channel_peak_when.items()},
        },
        "monitor_log":   monitor_log,
        "pc_sw_version": report.pc_sw_version,
    }
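The saturated-PVS value in the `vector_sum` block rests on a simple bound: if every geo channel satisfies |v| ≤ R, then sqrt(v_T² + v_V² + v_L²) ≤ sqrt(3R²) = sqrt(3)·R. The check below verifies that arithmetic. It assumes PVS is the vector magnitude of the three geo channels at a common instant. `pvs` and `pvs_saturated_bound` are illustrative names.

```python
import math


def pvs(tran: float, vert: float, long_: float) -> float:
    """Vector magnitude of the three geo channels at one instant."""
    return math.sqrt(tran ** 2 + vert ** 2 + long_ ** 2)


def pvs_saturated_bound(geo_range_ips: float) -> float:
    """sqrt(3) * R: the PVS when all three channels sit at full scale R."""
    return math.sqrt(3) * geo_range_ips
```

For a 10 in/s geo range, the substituted value is about 17.32 in/s.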
 def _dbl_to_psi(pspl_dbl: float) -> float:
    """Convert dB(L) sound pressure level back to psi.  Uses the same
    20 µPa reference (= 2.9e-9 psi) as the webapp so server-side and
    browser-side conversions agree."""
    return _DBL_REF_PSI * (10.0 ** (pspl_dbl / 20.0))
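`_dbl_to_psi` inverts the usual sound-pressure-level definition, L = 20·log10(p / p_ref), with p_ref = 20 µPa ≈ 2.9e-9 psi. The sketch below pairs it with the inverse so that a round trip can be checked. `psi_to_dbl` is a hypothetical helper that is not present in the module.

```python
import math

DBL_REF_PSI = 2.9e-9  # 20 µPa expressed in psi, matching _DBL_REF_PSI


def dbl_to_psi(pspl_dbl: float) -> float:
    """dB(L) -> psi, via p = p_ref * 10**(L / 20)."""
    return DBL_REF_PSI * 10.0 ** (pspl_dbl / 20.0)


def psi_to_dbl(psi: float) -> float:
    """psi -> dB(L), via L = 20 * log10(p / p_ref); needs p > 0."""
    if psi <= 0:
        raise ValueError("pressure must be positive to express in dB")
    return 20.0 * math.log10(psi / DBL_REF_PSI)
```

With these constants, the 140 dB(L) value that the parser substitutes for a saturated mic maps to about 0.029 psi.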
 def apply_report_to_event(event: Event, report: BwAsciiReport) -> None:
    """Overlay device-authoritative fields from a parsed BW ASCII report
    onto an in-memory Event, IN-PLACE.
    Why this exists
    ───────────────
    `read_blastware_file()` parses the BW binary and fills `Event.peak_values`
    via `_peaks_from_samples()`.  With the retracted int16 LE body decoder,
    that produced ±32K-shaped noise on every channel, so peak values landed
    in the SeismoDb event row as ~10 in/s on every event regardless of the
    actual signal.  The verified body codecs can still return incomplete
    samples for some events (unhandled formats, walker edge cases).
    When a paired BW ASCII report is available, the report carries the
    device's own authoritative peak / project / sample-rate / record-time
    values.  This helper folds those onto the Event before it flows to
    `SeismoDb.insert_events()`, so the DB columns reflect the report
    rather than the codec output.
    Fields overlaid (only when the report supplies a usable value; strings
    must be non-empty, sample_rate non-zero, and mic PSPL > 0):
      - peak_values.tran / .vert / .long              (from report.channels)
      - peak_values.peak_vector_sum                   (from report.peak_vector_sum_ips)
      - peak_values.micl  (psi)                       (from report.mic.pspl_dbl → psi)
      - project_info.project / .client / .operator / .sensor_location
      - sample_rate                                   (from report.sample_rate_sps)
      - rectime_seconds                               (from report.record_time_s)
    Fields NOT touched (operator-edit / parser-output preserved):
      - timestamp, raw_samples, record_type, total_samples,
        pretrig_samples, _waveform_key, _a5_frames, _raw_record
      - false_trigger and review state (those live on the sidecar, not on Event)
    """
    if event.peak_values is None:
        event.peak_values = PeakValues()
    pv = event.peak_values
    ch = report.channels
    if (t := ch.get("Tran")) and t.ppv_ips is not None: pv.tran = t.ppv_ips
    if (v := ch.get("Vert")) and v.ppv_ips is not None: pv.vert = v.ppv_ips
    if (l := ch.get("Long")) and l.ppv_ips is not None: pv.long = l.ppv_ips
    if report.peak_vector_sum_ips is not None:
        pv.peak_vector_sum = report.peak_vector_sum_ips
    if report.mic.pspl_dbl is not None and report.mic.pspl_dbl > 0:
        pv.micl = _dbl_to_psi(report.mic.pspl_dbl)
    if event.project_info is None:
        event.project_info = ProjectInfo()
    pi = event.project_info
    if report.project:         pi.project         = report.project
    if report.client:          pi.client          = report.client
    if report.operator:        pi.operator        = report.operator
    if report.sensor_location: pi.sensor_location = report.sensor_location
    if report.sample_rate_sps:
        event.sample_rate = report.sample_rate_sps
    if report.record_time_s is not None:
        event.rectime_seconds = report.record_time_s
 def apply_bw_report_dict_to_event(event: Event, bw_report: dict) -> None:
    """Mirror of ``apply_report_to_event`` for the projected sidecar
    dict shape (as produced by ``_bw_report_to_dict``).
    Why this exists
    ───────────────
    The ingest path holds a live ``BwAsciiReport`` parsed straight from
    the ``_ASCII.TXT`` and uses ``apply_report_to_event`` to overlay
    device-authoritative peaks onto the codec output before insert.
    The backfill path doesn't have the original ``.TXT`` (it's not
    retained in the waveform store), but it does have the preserved
    ``bw_report`` block from the sidecar — which contains the same
    projected fields.  Re-overlaying those during a backfill keeps the
    DB peak columns aligned with what BW reports rather than letting
    the codec output (which may be incomplete for unhandled formats or
    walker edge cases) win by default.
    No-ops cleanly when ``bw_report`` is ``None``, empty, or missing
    any particular sub-field — only fields with a concrete value get
    written.  Mirrors ``apply_report_to_event``'s "report wins where
    present" semantics.
    """
    if not bw_report:
        return
    if event.peak_values is None:
        event.peak_values = PeakValues()
    pv = event.peak_values
    peaks = bw_report.get("peaks") or {}
    tran = (peaks.get("tran") or {}).get("ppv_ips")
    vert = (peaks.get("vert") or {}).get("ppv_ips")
    long = (peaks.get("long") or {}).get("ppv_ips")
    if tran is not None: pv.tran = tran
    if vert is not None: pv.vert = vert
    if long is not None: pv.long = long
    vs_ips = (peaks.get("vector_sum") or {}).get("ips")
    if vs_ips is not None:
        pv.peak_vector_sum = vs_ips
    mic = bw_report.get("mic") or {}
    pspl = mic.get("pspl_dbl")
    if pspl is not None and pspl > 0:
        pv.micl = _dbl_to_psi(pspl)
    rec = bw_report.get("recording") or {}
    sr = rec.get("sample_rate_sps")
    if sr:
        event.sample_rate = sr
    rt = rec.get("record_time_s")
    if rt is not None:
        event.rectime_seconds = rt
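`apply_bw_report_dict_to_event` tolerates missing or `None` sub-blocks by chaining `(d.get(k) or {})`. Below is a standalone sketch of the same "report wins where present" rule. It uses a hypothetical `dig()` helper for the nested lookups and plain dicts in place of `PeakValues`.

```python
from typing import Any, Dict, Mapping, Optional


def dig(data: Any, *keys: str) -> Any:
    """Follow nested keys; a missing, None, or non-dict level yields None."""
    for key in keys:
        if not isinstance(data, Mapping):
            return None
        data = data.get(key)
    return data


def overlay_peaks(peaks: Dict[str, float],
                  bw_report: Optional[Mapping[str, Any]]) -> Dict[str, float]:
    """Return a copy of *peaks* where the report's PPVs win when present."""
    out = dict(peaks)
    for ch in ("tran", "vert", "long"):
        value = dig(bw_report, "peaks", ch, "ppv_ips")
        if value is not None:
            out[ch] = value
    return out
```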
 def _project_info_to_dict(pi: Optional[ProjectInfo]) -> dict:
    if pi is None:
        return {
@@ -119,54 +362,110 @@ def event_to_sidecar_dict(
    blastware_filesize: int,
    blastware_sha256: str,
    source_kind: str = "sfm-live",
    txt_filename: Optional[str] = None,
    a5_pickle_filename: Optional[str] = None,
    tool_version: str = _TOOL_VERSION_DEFAULT,
    captured_at: Optional[datetime.datetime] = None,
    review: Optional[dict] = None,
    extensions: Optional[dict] = None,
    bw_report: Optional[BwAsciiReport] = None,
 ) -> dict:
    """
    Build a v1 sidecar dict from an Event + the surrounding metadata.
    Pure helper — no file I/O.  Callers stitch the result into a sidecar
    via `write_sidecar()` (or POST it back via the PATCH endpoint).
    When *bw_report* is supplied (e.g. by the ACH-forwarded import path
    where Blastware writes a per-event ASCII report alongside the binary),
    its decoded fields are folded into the sidecar:
      - A new top-level ``bw_report`` block carries the rich derived
        per-channel stats (Peak Acceleration, Peak Displacement, ZC Freq,
        Time of Peak), the Peak Vector Sum + time, the per-channel sensor
        self-check results, and monitor-log timestamps.
      - ``peak_values`` is overlaid from the report (the report's PPV/PVS
        values are computed by the device firmware and are authoritative;
        anything ``read_blastware_file()`` derives from decoded samples
        can be incomplete when the body codec stops early).
      - ``project_info`` is overlaid from the report when the report
        supplies a non-empty value (the report mirrors the device's
        compliance config, which is what BW shows in its event report).
      - ``event.timestamp`` is overlaid from the report's Event Date +
        Event Time (BW's report timestamps are second-resolution and
        usually match the binary's footer; we prefer the report value
        because the footer timestamp can drift on some firmware).
    """
-    if source_kind not in {"sfm-live", "sfm-ach", "bw-import"}:
+    if source_kind not in {"sfm-live", "sfm-ach", "bw-import", "idf-import"}:
        raise ValueError(f"unknown source_kind: {source_kind!r}")
    captured_at = captured_at or datetime.datetime.utcnow()
-    # Stash raw 0C record bytes in `extensions.raw_records` so future
-    # field-decoding work (Peak Acceleration, ZC Freq, Time of Peak,
-    # sensor self-check results, etc.) can run offline against committed
-    # sidecars without a live device.  Cheap (~280 bytes base64) and
+    # ── Overlay event fields from the report when present ───────────────────
+    timestamp_iso = _ts_iso(event.timestamp)
+    if bw_report and bw_report.event_datetime:
+        timestamp_iso = bw_report.event_datetime.isoformat()
    # forward-compatible (older readers ignore unknown extensions keys).
    ext_dict: dict = dict(extensions) if extensions else {}
    raw_0c = getattr(event, "_raw_record", None)
    if raw_0c:
        rr = ext_dict.setdefault("raw_records", {})
        # Don't clobber a raw_0c that callers explicitly passed in via
        # `extensions=...` (e.g. round-trip preservation in patch_sidecar).
        rr.setdefault("waveform_record_b64", base64.b64encode(raw_0c).decode("ascii"))
        rr.setdefault("waveform_record_len", len(raw_0c))
-    return {
+    # Build peak_values, optionally overlaid from the report.  The report
    # stores Mic peak as PSPL (dB(L)); we convert to psi to match the
    # existing peak_values.mic_psi field.
    peak_dict = _peak_values_to_dict(event.peak_values)
    if bw_report:
        ch = bw_report.channels
        if (t := ch.get("Tran")) and t.ppv_ips is not None: peak_dict["transverse"]   = t.ppv_ips
        if (v := ch.get("Vert")) and v.ppv_ips is not None: peak_dict["vertical"]     = v.ppv_ips
        if (l := ch.get("Long")) and l.ppv_ips is not None: peak_dict["longitudinal"] = l.ppv_ips
        if bw_report.peak_vector_sum_ips is not None:
            peak_dict["vector_sum"] = bw_report.peak_vector_sum_ips
        if bw_report.mic.pspl_dbl is not None and bw_report.mic.pspl_dbl > 0:
            peak_dict["mic_psi"] = _dbl_to_psi(bw_report.mic.pspl_dbl)
    # Project info: overlay from report (the report mirrors the
    # session-start compliance config that BW renders in event reports).
    proj_dict = _project_info_to_dict(event.project_info)
    if bw_report:
        if bw_report.project:         proj_dict["project"]         = bw_report.project
        if bw_report.client:          proj_dict["client"]          = bw_report.client
        if bw_report.operator:        proj_dict["operator"]        = bw_report.operator
        if bw_report.sensor_location: proj_dict["sensor_location"] = bw_report.sensor_location
    # Event-block fields: overlay from report where available.
    event_block = {
        "serial":           serial,
        "timestamp":        timestamp_iso,
        "waveform_key":     event._waveform_key.hex() if event._waveform_key else None,
        "record_type":      event.record_type,
        "sample_rate":      event.sample_rate,
        "rectime_seconds":  event.rectime_seconds,
        "total_samples":    event.total_samples,
        "pretrig_samples":  event.pretrig_samples,
    }
    if bw_report:
        # Report values are authoritative — they're the user-configured
        # values BW reads back, not STRT-derived guesses.  In particular
        # `event.rectime_seconds` from `read_blastware_file()` reads
        # STRT[18] which is actually the `0x46` record-type marker (= 70)
        # rather than the user's Record Time setting.  Overwrite whenever
        # the report supplies a value.
        if bw_report.sample_rate_sps:
            event_block["sample_rate"] = bw_report.sample_rate_sps
        if bw_report.record_time_s is not None:
            event_block["rectime_seconds"] = bw_report.record_time_s
        # Derive total_samples + pretrig_samples per channel from the
        # report's sample_rate × times.  These match the row count of
        # the report's sample table (verified: event-c reports 1024 sps
        # × (1.0 + 0.25) = 1280 rows).
        if (sr := bw_report.sample_rate_sps) and bw_report.record_time_s is not None:
            pretrig_s = abs(bw_report.pretrig_s) if bw_report.pretrig_s is not None else 0.0
            event_block["total_samples"]   = int(round(sr * (bw_report.record_time_s + pretrig_s)))
            event_block["pretrig_samples"] = int(round(sr * pretrig_s))
    out = {
        "schema_version": SCHEMA_VERSION,
        "kind":           SIDECAR_KIND,
-        "event": {
-            "serial":           serial,
-            "timestamp":        _ts_iso(event.timestamp),
+        "event":        event_block,
+        "peak_values":  peak_dict,
+        "project_info": proj_dict,
            "waveform_key":     event._waveform_key.hex() if event._waveform_key else None,
            "record_type":      event.record_type,
            "sample_rate":      event.sample_rate,
            "rectime_seconds":  event.rectime_seconds,
            "total_samples":    event.total_samples,
            "pretrig_samples":  event.pretrig_samples,
        },
        "peak_values":  _peak_values_to_dict(event.peak_values),
        "project_info": _project_info_to_dict(event.project_info),
        "blastware": {
            "filename":  blastware_filename,
@@ -180,6 +479,7 @@ def event_to_sidecar_dict(
            "captured_at":        captured_at.isoformat() + "Z" if captured_at.tzinfo is None else captured_at.isoformat(),
            "tool_version":       tool_version,
            "a5_pickle_filename": a5_pickle_filename,
            "txt_filename":       txt_filename,
        },
        "review": review or {
@@ -189,9 +489,14 @@ def event_to_sidecar_dict(
            "notes":         "",
        },
-        "extensions": ext_dict,
+        "extensions": extensions or {},
    }
    if bw_report:
        out["bw_report"] = _bw_report_to_dict(bw_report)
    return out
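The `total_samples` / `pretrig_samples` derivation in `event_to_sidecar_dict` is plain arithmetic on the report's sample rate and times. The sketch below reproduces it; `derive_sample_counts` is a hypothetical name. The `abs()` mirrors the source, which suggests that BW reports the pretrigger as a negative time.

```python
from typing import Optional, Tuple


def derive_sample_counts(sample_rate_sps: int, record_time_s: float,
                         pretrig_s: Optional[float]) -> Tuple[int, int]:
    """Return (total_samples, pretrig_samples) per channel."""
    pre = abs(pretrig_s) if pretrig_s is not None else 0.0
    total = int(round(sample_rate_sps * (record_time_s + pre)))
    pretrig = int(round(sample_rate_sps * pre))
    return total, pretrig
```

Take the event-c figures cited in the comment: 1024 sps with a 1.0 s record and a 0.25 s pretrigger. That gives 1280 total samples, 256 of them pretrigger.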
 # ── Sidecar IO ────────────────────────────────────────────────────────────────
@@ -429,6 +734,50 @@ def _peaks_from_samples(samples: dict[str, list[int]]) -> PeakValues:
    )
 _RECORD_TYPE_BY_EXT_SUFFIX = {
    'H': 'Histogram',
    'W': 'Waveform',
    'M': 'Manual',
    'E': 'Event',
    'C': 'Combo',
 }
 def derive_record_type_from_filename(filename, default: str = "Waveform") -> str:
    """Derive a BW Event's record_type from its filename's extension suffix.
    V10.72+ MiniMate Plus firmware encodes the event type as the LAST
    character of the extension (the `T` in BW's `AB0T` scheme):
        ``M529LKIQ.G10H``  →  H  →  ``"Histogram"``
        ``T350L385.VY0W``  →  W  →  ``"Waveform"``
        ``...M``           →  M  →  ``"Manual"``
        ``...E``           →  E  →  ``"Event"``
        ``...C``           →  C  →  ``"Combo"``
    Old S338 firmware uses 3-char extensions ending in ``0`` whose
    encoding is not yet known — those fall through to ``default``.
    Micromate Series 4 uses a different scheme entirely (observed:
    ``IDFH``, ``IDFW``) but the LAST-char convention (H / W) still holds
    for the type code, so it works for both families.
    Returns ``default`` if filename is empty, has no extension, or the
    suffix char isn't a recognized type code.
    """
    if not filename:
        return default
    try:
        name = Path(filename).name
    except (TypeError, ValueError):
        return default
    if '.' not in name:
        return default
    ext = name.rsplit('.', 1)[1]
    if not ext:
        return default
    return _RECORD_TYPE_BY_EXT_SUFFIX.get(ext[-1].upper(), default)
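A standalone sketch of the suffix rule, using a hypothetical ``record_type_for`` helper that roughly mirrors ``derive_record_type_from_filename`` (it uses ``pathlib.PurePath`` only, so nothing touches the filesystem). It also shows that a temporary ``.bw`` name maps to ``"Waveform"`` through its trailing ``w`` rather than through the fallback.

```python
from pathlib import PurePath

# Same table as _RECORD_TYPE_BY_EXT_SUFFIX above.
SUFFIX_TO_TYPE = {"H": "Histogram", "W": "Waveform", "M": "Manual", "E": "Event", "C": "Combo"}


def record_type_for(filename: str, default: str = "Waveform") -> str:
    # Only the LAST character of the extension carries the type code.
    name = PurePath(filename).name
    if "." not in name:
        return default
    ext = name.rsplit(".", 1)[1]
    if not ext:
        return default
    return SUFFIX_TO_TYPE.get(ext[-1].upper(), default)


for fn in ("M529LKIQ.G10H", "T350L385.VY0W", "EVENT.IDFH", "OLD.AB0", "tmp1234.bw"):
    print(fn, "->", record_type_for(fn, default="(default)"))
# Histogram, Waveform, Histogram, (default), Waveform
```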
 def read_blastware_file(path: Union[str, Path]) -> Event:
    """
    Parse a Blastware waveform file into an Event.
@@ -494,11 +843,40 @@ def read_blastware_file(path: Union[str, Path]) -> Event:
    ts1 = _bw._decode_ts_be(footer[2:10])
    ts2 = _bw._decode_ts_be(footer[10:18])
-    # Body: first 6 bytes are the preamble (00 00 ff ff ff ff).  Strip
-    # them before decoding samples.  Any trailing tail past the last
-    # full sample-set is silently truncated by _decode_samples_4ch.
-    sample_bytes = body[6:] if body[:6].hex() in ("0000ffffffff", "0000FFFFFFFF") else body
-    samples = _decode_samples_4ch_int16_le(sample_bytes)
+    # Body: decode via the verified body codecs.  Two formats coexist:
+    #
+    #   1. Waveform-mode (.AB0W) — starts with 7-byte preamble
+    #      ``00 02 00 [Tran[0] BE] [Tran[1] BE]`` followed by the
+    #      tagged-block delta stream documented in
    #      ``docs/waveform_codec_re_status.md`` and §7.6.1 of the
    #      protocol reference.  Decoded by ``waveform_codec.decode_waveform_v2``.
    #
    #   2. Histogram-mode (.AB0H) — a sequence of 32-byte blocks, one
    #      per histogram interval, each carrying per-channel peak +
    #      half-period values.  Decoded by
    #      ``histogram_codec.decode_histogram_body``.  Both codecs
    #      return the same channel-grouped output shape, so consumers
    #      don't need to special-case mode.
    #
    # The historical ``_decode_samples_4ch_int16_le`` int16-LE
    # interpretation was retracted 2026-05-08 (see protocol-ref §7.6.1
    # retraction box) — it produced ±32K noise on every event.
    #
    # If both codecs fail (malformed file, truncated body, unrecognised
    # mode, synthetic test input), fall back to empty channels — the
    # rest of the event (timestamp, waveform_key, project strings) is
    # still recoverable and useful.
    decoded = decode_waveform_v2(body)
    if decoded is None:
        decoded = decode_histogram_body(body)
    if decoded is None:
        log.warning(
            "%s: body codec failed to decode (body starts %s) — "
            "raw_samples will be empty", path, body[:8].hex(" "),
        )
        samples = {"Tran": [], "Vert": [], "Long": [], "MicL": []}
    else:
        samples = decoded_to_adc_counts(decoded)
    # Metadata strings (label-anchored search across the body).
    project = _find_first_string(body, b"Project:")
@@ -510,7 +888,12 @@ def read_blastware_file(path: Union[str, Path]) -> Event:
    ev = Event(index=-1)
    if strt_fields.get("waveform_key"):
        ev._waveform_key = bytes.fromhex(strt_fields["waveform_key"])
-    ev.record_type     = "Waveform"
+    # Derive record_type from the filename's extension suffix (H/W/M/E/C).
    # When called from save_imported_bw the path here is a tmp file with a
    # ".bw" suffix, whose trailing "w" always maps to "Waveform" whatever the
    # original event type, so the caller overrides ev.record_type using the
    # original filename (see waveform_store.save_imported_bw).
    ev.record_type     = derive_record_type_from_filename(path.name)
    ev.rectime_seconds = strt_fields.get("rectime_seconds")
    ev.total_samples   = strt_fields.get("total_samples")
    ev.pretrig_samples = strt_fields.get("pretrig_samples")
@@ -527,7 +910,18 @@ def read_blastware_file(path: Union[str, Path]) -> Event:
        project=project, client=client, operator=user, sensor_location=seisloc,
    )
    ev.raw_samples = samples
-    ev.peak_values = _peaks_from_samples(samples)
+    # Only compute peaks from samples when we actually have samples.
    # For events neither body codec could decode, ``samples`` holds four
    # empty channel lists and ``_peaks_from_samples`` would return
    # PeakValues(0, 0, 0, 0, 0).
    # That would then OVERWRITE existing good DB peak values (e.g. from
    # paired BW ASCII reports) during the backfill UPSERT path.
    # Leaving peak_values=None signals "we don't know" to downstream
    # consumers; the backfill script seeds from the DB row when it sees
    # None, and ``apply_report_to_event`` overlays from a paired ASCII
    # report when one is supplied.
    has_samples = any(samples.get(ch) for ch in ("Tran", "Vert", "Long", "MicL"))
    ev.peak_values = _peaks_from_samples(samples) if has_samples else None
    ev._a5_frames = None  # not recoverable from BW file
    return ev
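The decode path in ``read_blastware_file`` is a fallback chain (waveform codec, then histogram codec, then empty channels) followed by a "peaks only if there are samples" guard. A minimal sketch of that control flow, with hypothetical stand-in decoders and a simplified per-channel max-abs peak in place of the real ``_peaks_from_samples`` / ``PeakValues``:

```python
from typing import Callable, Dict, List, Optional

Channels = Dict[str, List[int]]
Decoder = Callable[[bytes], Optional[Channels]]
CHANNELS = ("Tran", "Vert", "Long", "MicL")


def decode_body(body: bytes, decoders: List[Decoder]) -> Channels:
    # Try each codec in order; the first non-None result wins.
    for decode in decoders:
        result = decode(body)
        if result is not None:
            return result
    # Every codec failed: empty channels, so the metadata is still usable.
    return {ch: [] for ch in CHANNELS}


def peaks_or_none(samples: Channels) -> Optional[Dict[str, int]]:
    # None means "unknown", letting an UPSERT keep the peaks already in the DB.
    if not any(samples.get(ch) for ch in CHANNELS):
        return None
    # Simplified stand-in for _peaks_from_samples: max |sample| per channel.
    return {ch: max((abs(v) for v in samples.get(ch, [])), default=0) for ch in CHANNELS}


# Hypothetical stand-ins for decode_waveform_v2 and decode_histogram_body.
def fake_waveform(body: bytes) -> Optional[Channels]:
    if body[:3] != b"\x00\x02\x00":
        return None
    return {"Tran": [1, -3], "Vert": [2], "Long": [0], "MicL": [5]}


def fake_histogram(body: bytes) -> Optional[Channels]:
    return None


codecs = [fake_waveform, fake_histogram]
print(peaks_or_none(decode_body(b"\x00\x02\x00\x01", codecs)))  # {'Tran': 3, 'Vert': 2, 'Long': 0, 'MicL': 5}
print(peaks_or_none(decode_body(b"\xff\xff", codecs)))          # None
```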
@@ -0,0 +1,283 @@
 """
 histogram_codec.py — decoder for MiniMate Plus histogram-mode event bodies.
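 DECODED 2026-05-20.  The peak and half-period fields of every block
 are verified byte-exact against BW's ASCII export across multiple
 histogram fixtures; the per-channel annotation bytes and the 4-byte
 field at [24:28] are still not understood.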
 The histogram-mode body is a stream of 32-byte fixed-length blocks,
 one block per histogram interval.  Each block carries the per-interval
 peak amplitude + zero-crossing frequency for all four channels (Tran,
 Vert, Long, MicL).
 ────────────────────────────────────────────────────────────────────────────
 Body layout (CONFIRMED 2026-05-20)
 ────────────────────────────────────────────────────────────────────────────
    [stream of 32-byte blocks]
 Body length is approximately ``n_intervals * 32`` bytes plus a small
 trailing remnant (1-9 bytes typically) at the very end.  Walker should
 iterate 32-stride and stop before the tail.
 ────────────────────────────────────────────────────────────────────────────
 32-byte block layout
 ────────────────────────────────────────────────────────────────────────────
    [0]    0x00                      always-zero tag
    [1]    segment_id  (uint8)       0x00..0x03 — 256 blocks per segment
    [2:4]  block_ctr  (uint16 LE)    resets each segment (0x0100, 0x0101, …)
    [4:6]  0x000a (uint16 LE)        constant marker (= 10)
    [6]    T_peak_count   uint8      Tran peak (count × 0.005 → in/s, max 1.275 in/s)
    [7]    T_annotation   uint8      empirically non-zero on intervals with sub-Hz
                                     or unmeasurable Tran freq; meaning not fully RE'd
    [8:10] T_halfperiod   uint16 LE  Tran half-period in samples (freq = 512 / halfp Hz)
    [10]   V_peak_count   uint8
    [11]   V_annotation   uint8
    [12:14] V_halfperiod  uint16 LE
    [14]   L_peak_count   uint8
    [15]   L_annotation   uint8
    [16:18] L_halfperiod  uint16 LE
    [18]   M_peak_count   uint8      MicL peak (count → dB via mic_count_to_db)
    [19]   M_annotation   uint8
    [20:22] M_halfperiod  uint16 LE  MicL half-period in samples (freq = 512 / halfp Hz)
    [22:24] 0x00 0x00                constant
    [24:28] 4-byte variable          purpose unknown (possibly CRC or timestamp delta)
    [28:32] 0x1e 0x0a 0x00 0x00      constant block-end signature
 NOTE on peak-count width: an earlier interpretation treated the peak
 fields as uint16 LE spanning [6:8] / [10:12] / [14:16] / [18:20].
 That happened to be byte-exact against the N844 fixture corpus only
 because every annotation byte in those fixtures was zero, making
 ``uint16 LE == uint8``.  Cross-correlating BE9558 (K558) Tran-drift
 and BE18003 (T003) Histogram+Continuous events against the BW ASCII
 export proved peak is uint8 alone — see test_histogram_codec.py
 and docs/histogram_codec_re_status.md.
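 Block-identification anchor: ``block[28:32] == b"\\x1e\\x0a\\x00\\x00"``
 AND ``block[22:24] == b"\\x00\\x00"``.  The tail signature does most of
 the work (``00 00`` alone matches many false positives); ``_is_data_block``
 also requires ``block[0] == 0x00`` and the ``0x000a`` marker at [4:6].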
 ────────────────────────────────────────────────────────────────────────────
 Per-channel encoding
 ────────────────────────────────────────────────────────────────────────────
 Geophone channels (Tran, Vert, Long):
  - peak_count × 0.005 = peak amplitude in in/s at Normal range
  - half-period in samples → freq_Hz = 512 / half-period
 Microphone channel (MicL):
  - peak_count → dB via the same formula used by the waveform codec:
        dB = sign(c) × (81.94 + 20·log10(|c|))    for |c| ≥ 1
        dB = 0                                    for c == 0
  - half-period → freq_Hz = 512 / half-period (same as geo)
 Frequency `>100 Hz` sentinel: the device emits half-period ≤ 5 when the
 measured zero-crossing rate is too high to resolve (512/5 ≈ 102 Hz; the
 BW display shows anything above 100 Hz as ">100").
 ────────────────────────────────────────────────────────────────────────────
 Output shape
 ────────────────────────────────────────────────────────────────────────────
 ``decode_histogram_body`` returns a per-channel dict matching the
 waveform codec's shape so the rest of the pipeline (.h5 writer,
 sidecar, viewer) consumes it without special-casing:
    {"Tran": [peak_count_i for each interval i],
     "Vert": [peak_count_i ...],
     "Long": [peak_count_i ...],
     "MicL": [peak_count_i ...]}
 Values are in **16-count units for geo** (LSB = 0.005 in/s, matching
 ``decode_waveform_v2``) and **1-count units for mic** (matching the
 waveform codec's mic convention).  Run through
 ``waveform_codec.decoded_to_adc_counts`` to scale geo to 1-count ADC.
 Per-interval frequencies are NOT returned — they're auxiliary data,
 not waveform samples.  Consumers needing frequencies can call
 ``decode_histogram_body_full()`` for the structured per-interval
 record list.
 """
 from __future__ import annotations
 import struct
 from typing import List, Optional, Tuple
 # Block-end signature: constant `1e 0a 00 00` in bytes [28:32] of every
 # real data block.  More distinctive than the byte-22 `00 00` (which
 # matches many false positives), so we anchor on this.
 _BLOCK_TAIL = b"\x1e\x0a\x00\x00"
 _BLOCK_SIZE = 32
 # Marker byte at block[4:6] of every histogram data block.  Used as
 # additional validation that we're looking at a real block.
 _BLOCK_MARKER = 10
 # Geo peak scaling: stored as "count × 0.005 in/s" where 1 count = one
 # 0.005 in/s display quantum.  Equivalent to the waveform codec's
 # 16-count-unit output (1 unit = 0.005 in/s = 16 ADC counts).
 _GEO_LSB_INS = 0.005
 # Frequency formula: freq_Hz = _FREQ_NUMERATOR / half_period_samples.
 # Empirically determined to be 512 (= sample_rate / 2, where sample rate
 # is 1024 sps for the standard MiniMate Plus configuration).
 _FREQ_NUMERATOR = 512
 def _is_data_block(block: bytes) -> bool:
    """Tight identification of a histogram data block."""
    if len(block) < _BLOCK_SIZE:
        return False
    if block[28:32] != _BLOCK_TAIL:
        return False
    if block[22:24] != b"\x00\x00":
        return False
    if block[0] != 0x00:
        return False
    marker = block[4] | (block[5] << 8)
    if marker != _BLOCK_MARKER:
        return False
    return True
 def _decode_block(block: bytes) -> Optional[dict]:
    """Decode one 32-byte histogram block.  Caller must have validated
    with ``_is_data_block`` first.
    Returns a record with per-channel peak counts (uint8) and
    half-periods (uint16 LE).
    """
    # Peak counts are uint8 at bytes [6] / [10] / [14] / [18].  The
    # adjacent bytes [7] / [11] / [15] / [19] hold an annotation field
    # whose meaning isn't fully understood (empirically non-zero in
    # intervals with sub-Hz or unmeasurable geo frequencies, mostly
    # zero otherwise — see test fixtures from BE9558/BE18003 corpora).
    # Crucially, those annotation bytes are NOT the high byte of the
    # peak count: cross-correlating against BW's per-interval ASCII
    # export proves the peak is uint8 alone.
    #
    # Reading the peak as uint16 LE (the original interpretation) was
    # accidentally correct only because every block in the N844 fixture
    # corpus had a zero annotation byte; non-N844 events with non-zero
    # annotation bytes decoded to physically impossible peaks (e.g.
    # 268 in/s per channel) and produced 35× inflated PVS sums when
    # first run against prod data.  See histogram_codec_re_status.md.
    t_peak = block[6]
    v_peak = block[10]
    l_peak = block[14]
    m_peak = block[18]
    t_halfp = block[8]  | (block[9]  << 8)
    v_halfp = block[12] | (block[13] << 8)
    l_halfp = block[16] | (block[17] << 8)
    m_halfp = block[20] | (block[21] << 8)
    segment_id = block[1]
    block_ctr  = block[2] | (block[3] << 8)
    var_meta   = bytes(block[24:28])
    annotations = (block[7], block[11], block[15], block[19])
    return {
        "segment_id":  segment_id,
        "block_ctr":   block_ctr,
        "t_peak":      t_peak,
        "t_halfp":     t_halfp,
        "v_peak":      v_peak,
        "v_halfp":     v_halfp,
        "l_peak":      l_peak,
        "l_halfp":     l_halfp,
        "m_peak":      m_peak,
        "m_halfp":     m_halfp,
        "meta_var":    var_meta,
        "annotations": annotations,
    }
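The 32-byte block layout maps directly onto a ``struct`` format string. A sketch, assuming the layout in the module docstring (``pack_block`` / ``unpack_block`` are hypothetical test helpers, not part of the module), that builds a synthetic block and shows why reading a peak as uint16 LE inflates it once the annotation byte is non-zero:

```python
import struct

CHANNELS = ("Tran", "Vert", "Long", "MicL")
# Little-endian, no padding: tag, segment_id, block_ctr, marker, then
# (peak u8, annotation u8, half-period u16) for each of the 4 channels,
# then 2 zero bytes, the 4-byte variable field and the 4-byte tail.
BLOCK = struct.Struct("<BBHH" + "BBH" * 4 + "2s4s4s")
TAIL = b"\x1e\x0a\x00\x00"


def pack_block(segment_id, block_ctr, chans, var=b"\x00\x00\x00\x00"):
    """Build a synthetic block; chans is four (peak, annotation, half_period) tuples."""
    flat = [value for triple in chans for value in triple]
    return BLOCK.pack(0, segment_id, block_ctr, 10, *flat, b"\x00\x00", var, TAIL)


def unpack_block(block: bytes) -> dict:
    f = BLOCK.unpack(block)
    if f[0] != 0 or f[3] != 10 or f[16] != b"\x00\x00" or f[18] != TAIL:
        raise ValueError("not a histogram data block")
    return {
        "segment_id": f[1],
        "block_ctr": f[2],
        "peaks": dict(zip(CHANNELS, f[4:16:3])),         # bytes 6, 10, 14, 18
        "annotations": dict(zip(CHANNELS, f[5:16:3])),   # bytes 7, 11, 15, 19
        "half_periods": dict(zip(CHANNELS, f[6:16:3])),  # bytes 8, 12, 16, 20 (u16 LE)
        "meta_var": f[17],
    }


blk = pack_block(1, 0x0105, [(40, 0, 64), (12, 0, 128), (3, 7, 600), (9, 0, 5)])
print(unpack_block(blk)["peaks"])  # {'Tran': 40, 'Vert': 12, 'Long': 3, 'MicL': 9}
# Misreading Long's peak as uint16 LE folds the annotation byte in: 3 + 7*256.
print(int.from_bytes(blk[14:16], "little"))  # 1795, i.e. ~8.98 in/s instead of 0.015
```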
 def walk_body(body: bytes) -> List[dict]:
    """Walk the body and return one dict per histogram interval.
    Iterates 32-byte strides from offset 0.  Yields a decoded record
    for every block that passes ``_is_data_block`` validation.  Stops
    when the remaining bytes are too short to form a complete block.
    In Histogram+Continuous mode the body interleaves data blocks with
    other 32-byte content (likely continuous-mode waveform blocks) that
    fail the data-block validation; the walker naturally skips them
    without losing 32-byte alignment.  Use ``block_ctr`` from each
    returned record to map back to the original interval index — the
    record list is sparse when other block types are interleaved.
    """
    records: List[dict] = []
    for off in range(0, len(body) - _BLOCK_SIZE + 1, _BLOCK_SIZE):
        blk = body[off:off + _BLOCK_SIZE]
        if not _is_data_block(blk):
            # Non-data content (e.g. interleaved continuous-mode blocks in
            # Histogram+Continuous mode, or a sync/stream marker).
            # Continue walking — block alignment is fixed at 32-stride
            # from offset 0, so we don't lose alignment by skipping.
            continue
        decoded = _decode_block(blk)
        if decoded is None:
            # Defensive guard: ``_decode_block`` currently always returns a
            # record (uint8 peaks cannot fall outside their range), but if a
            # plausibility check is added later, skip rejected blocks rather
            # than propagating bogus PVS contributions.
            continue
        records.append(decoded)
    return records
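In Histogram+Continuous mode the returned records are sparse, so ``block_ctr`` rather than list position identifies the interval. A minimal sketch of the fixed-stride walk over a synthetic body (one data block, one non-data 32-byte block, another data block, then a 3-byte trailing remnant); the helper names are hypothetical:

```python
from typing import List

BLOCK_SIZE = 32
TAIL = b"\x1e\x0a\x00\x00"


def is_data_block(blk: bytes) -> bool:
    # Same checks as _is_data_block: tail signature, zero pad, zero tag, 0x000a marker.
    return (
        len(blk) == BLOCK_SIZE
        and blk[28:32] == TAIL
        and blk[22:24] == b"\x00\x00"
        and blk[0] == 0x00
        and int.from_bytes(blk[4:6], "little") == 10
    )


def walk_ctrs(body: bytes) -> List[int]:
    """block_ctr of every data block, stepping in fixed 32-byte strides from offset 0."""
    ctrs = []
    for off in range(0, len(body) - BLOCK_SIZE + 1, BLOCK_SIZE):
        blk = body[off:off + BLOCK_SIZE]
        if is_data_block(blk):
            ctrs.append(int.from_bytes(blk[2:4], "little"))
    return ctrs


def synthetic_block(ctr: int) -> bytes:
    # Framing only: tag, segment 0, block_ctr, marker, zeroed channel fields, tail.
    return (b"\x00\x00" + ctr.to_bytes(2, "little") + (10).to_bytes(2, "little")
            + bytes(16) + b"\x00\x00" + bytes(4) + TAIL)


body = synthetic_block(0x0100) + b"\xaa" * 32 + synthetic_block(0x0102) + b"\x05\x00\x00"
print([hex(c) for c in walk_ctrs(body)])  # ['0x100', '0x102']
```

The skipped middle block does not shift alignment, and the 3-byte remnant is never read because the ``range`` stops before it.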
 def decode_histogram_body(body: bytes) -> Optional[dict]:
    """Decode a histogram-mode body into per-channel peak-sample arrays.
    Returns ``{"Tran": [...], "Vert": [...], "Long": [...], "MicL": [...]}``
    where each channel's list contains one peak value per histogram
    interval (in the same units the waveform codec uses: 16-count units
    for geo, 1-count ADC units for mic).  Returns ``None`` if the body
    doesn't contain any valid histogram blocks.
    To convert to physical units:
      - Geo channels: ``count * 0.005`` = peak in in/s at Normal range
        (or run through ``waveform_codec.decoded_to_adc_counts`` first
         to get 1-count ADC values, then ``count / 32767 * 10.0`` for in/s)
      - Mic channel:  use ``waveform_codec.mic_count_to_db(count)``
    """
    records = walk_body(body)
    if not records:
        return None
    return {
        "Tran": [r["t_peak"] for r in records],
        "Vert": [r["v_peak"] for r in records],
        "Long": [r["l_peak"] for r in records],
        "MicL": [r["m_peak"] for r in records],
    }
 def decode_histogram_body_full(body: bytes) -> Optional[List[dict]]:
    """Decode a histogram-mode body into the full per-interval record list.
    Same data as ``decode_histogram_body`` but in a structured form that
    preserves the half-period (frequency) data for each channel + the
    per-block segment_id, block_ctr, and 4-byte variable metadata.
    Useful for diagnostic tools, sidecar enrichment, and future-codec
    work.
    Returns ``None`` if the body has no valid blocks.
    """
    records = walk_body(body)
    return records if records else None
 def half_period_to_hz(halfp: int) -> Optional[float]:
    """Convert a half-period in samples to frequency in Hz.
    Returns ``None`` for half-period ≤ 5 — the device emits values in
    that range when the measured zero-crossing rate exceeds 100 Hz
    (the BW display reports `>100 Hz` for such cases).  Callers can
    treat ``None`` as the `>100 Hz` sentinel.
    """
    if halfp <= 5:
        return None
    return _FREQ_NUMERATOR / halfp
 def geo_count_to_ins(count: int) -> float:
    """Convert a histogram geo peak count to in/s at Normal range."""
    return count * _GEO_LSB_INS
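A worked conversion of one histogram interval into display units, restating the docstring's formulas locally (``mic_db`` and ``describe_geo`` are hypothetical helpers; ``mic_db`` restates the quoted ``mic_count_to_db`` formula rather than importing it):

```python
import math
from typing import Optional

FREQ_NUMERATOR = 512   # sample_rate / 2 at 1024 sps, per the module constants
GEO_LSB_INS = 0.005    # in/s per geo count at Normal range


def half_period_to_hz(halfp: int) -> Optional[float]:
    # Half-periods <= 5 are the device's ">100 Hz" sentinel.
    return None if halfp <= 5 else FREQ_NUMERATOR / halfp


def mic_db(count: int) -> float:
    # Formula as quoted in the docstring: sign(c) * (81.94 + 20*log10(|c|)), 0 for c == 0.
    if count == 0:
        return 0.0
    return math.copysign(81.94 + 20 * math.log10(abs(count)), count)


def describe_geo(peak_count: int, halfp: int) -> str:
    hz = half_period_to_hz(halfp)
    freq = ">100 Hz" if hz is None else f"{hz:.1f} Hz"
    return f"{peak_count * GEO_LSB_INS:.3f} in/s @ {freq}"


print(describe_geo(40, 64))    # 0.200 in/s @ 8.0 Hz
print(describe_geo(255, 5))    # 1.275 in/s @ >100 Hz
print(f"{mic_db(10):.2f} dB")  # 101.94 dB
```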
@@ -0,0 +1,578 @@
 """
 waveform_codec.py — block-walker and verified decoder for the MiniMate Plus
 waveform-file body.
 FULLY DECODED 2026-05-11.  Every block type, every channel, and the
 channel-rotation rule are verified byte-exact against BW's ASCII export
 across the 9-event fixture bundle (47,364 ADC samples, zero errors).
 The Blastware waveform-file body — the bytes between the 21-byte STRT
 record and the 26-byte file footer — is a tagged variable-length block
 stream with a custom delta + RLE codec.  (Not raw int16 LE, which was
 the historical wrong assumption that produced ±32K noise on every event.)
 Current status:
 - Block framing: ✅ solved (5 block types and lengths all confirmed)
 - Per-channel decode: ✅ solved (Tran / Vert / Long / MicL all byte-exact)
 - Channel rotation: ✅ Tran → Vert → Long → MicL per segment
 - Segment header: ✅ fully decoded (anchor pair + prev-channel extension)
 - 30 NN packed-delta block: ✅ NN × 12-bit signed deltas in NN/4 groups
 - MicL → dB(L) conversion: ✅ ``mic_count_to_db`` matches BW display
 - Production wiring: ✅ ``client.py:_decode_a5_waveform`` uses the new
  codec (via ``decode_a5_frames``).  ``.h5`` sidecars now render
  correctly.
 Known limitations:
 - Walker stops early on the loudest events (SP0, SS0, SV0, event-b) at
  some mid-segment edge cases not yet fully characterized.  Every
  sample reached IS correct; the walker just doesn't reach all of
  them yet.  The cleanly-decoded subset is still ~5000–15000 samples
  per loud event.
 ────────────────────────────────────────────────────────────────────────────
 Body layout (CONFIRMED 2026-05-11 against 8 fixture events)
 ────────────────────────────────────────────────────────────────────────────
    [7-byte preamble] [stream of tagged blocks] [trailer]
 The preamble is always exactly 7 bytes:
    body[0:3]  = 00 02 00              magic
    body[3:5]  = Tran[0]   int16 BE    in 16-count units (LSB = 0.005 in/s)
    body[5:7]  = Tran[1]   int16 BE    in 16-count units
 (Earlier drafts of this module described a "7-or-9-byte preamble";
 that was wrong — single-shot and continuous events both use 7 bytes.
 The "extra 2 bytes" on continuous events were the first ``00 NN`` RLE
 marker, not part of the preamble.)
 Block types and lengths:
 | Tag      | Length                | Meaning                                |
 |----------|-----------------------|----------------------------------------|
 | ``10 NN``| NN/2 + 2 bytes        | 4-bit nibble deltas (2 per byte; high  |
 |          |                       | nibble first; signed, 8..F = -8..-1)   |
 | ``1X NN``| W/2 + 2 bytes         | Wide nibble block, W = X*256 + NN      |
 | ``20 NN``| NN + 2 bytes          | int8 signed deltas (1 per byte)        |
 | ``2X NN``| W + 2 bytes           | Wide int8 block, W = X*256 + NN        |
 | ``00 NN``| 2 bytes               | RLE: append NN copies of current value |
 | ``30 NN``| NN*3/2 + 2 in data,   | Data: NN 12-bit signed deltas in NN/4  |
 |          | NN*4 in trailer       | groups (loud events only).  Trailer    |
 |          |                       | content unknown.                       |
 | ``40 02``| 20 bytes (fixed)      | Segment header                         |
 NN (the tag's second byte) is always a multiple of 4.
 ────────────────────────────────────────────────────────────────────────────
 Tran channel, segment 0 (CONFIRMED 2026-05-11)
 ────────────────────────────────────────────────────────────────────────────
 Segment 0 — everything before the first ``40 02`` segment header — encodes
 Tran samples only.  Starting from preamble anchors Tran[0] and Tran[1],
 each subsequent block contributes to the running Tran value:
    10 NN  →  append NN deltas (4-bit signed nibbles)
    20 NN  →  append NN deltas (int8 signed bytes)
    00 NN  →  append NN copies of the current value (RLE zeros)
    40 02  →  segment 0 ends; the next segment carries Vert (see decode_waveform_v2)
 This decodes the first 482–510 samples of Tran for each event with zero
 errors against BW's ASCII export.  The exact segment-0 sample count
 varies per event (it's bounded by a fixed device-flash byte budget, not
 a fixed sample count — quiet events fit more samples because zero
 deltas pack into ``00 NN`` markers compactly).
 Implementation: :func:`decode_tran_initial`.
 ────────────────────────────────────────────────────────────────────────────
 Segment header (40 02, 20 bytes total)
 ────────────────────────────────────────────────────────────────────────────
 The 18-byte payload of the ``40 02`` block:
 | Offset    | Field                                       | Status      |
 |-----------|---------------------------------------------|-------------|
 | [0:2]     | T_delta at first sample of new segment      | ✅ confirmed|
 |           | (int16 BE, in 16-count units)               |             |
 | [2:4]     | Likely T_delta at sample seg_start+1        | 🟡 likely   |
 | [4:6]     | Unknown (varies; possibly checksum)         | ❓ open     |
 | [6:8]     | Byte length to next segment header − 2      | ✅ confirmed|
 |           | (uint16 BE; useful for walker pre-scan)     |             |
 | [8:12]    | Monotonic uint32 LE counter                 | ✅ confirmed|
 |           | (starts ~0x47, increments by 1 per segment) |             |
 | [12:14]   | Constant ``02 00``                          | ✅ confirmed|
 | [14:18]   | 2-sample anchor pair for segment's channel  | ✅ confirmed|
 ────────────────────────────────────────────────────────────────────────────
 What broke the multi-segment decoder (resolved: channel rotation)
 ────────────────────────────────────────────────────────────────────────────
 After segment 0 ends and the segment header T_delta is consumed,
 applying segment 1's blocks as Tran continuation produces values that
 diverge from truth by sample ~512.  The block structure inside segment
 1 is IDENTICAL to segment 0 (same alternating 10 NN / 00 NN pattern),
 and the delta budget matches the segment size exactly (V70 segment 1
 has 264 nibble-deltas + 244 RLE zeros = 508 = the segment's sample
 count).  But the cumulative is wrong.
 The explanation, first proposed as a hypothesis, is that segments
 rotate channels:
    segment 0  →  Tran samples 0..509
    segment 1  →  Vert samples 0..507
    segment 2  →  Long samples 0..507
    segment 3  →  Mic  samples 0..507
    segment 4  →  Tran samples 510..N (continuation)
    ...
 This is consistent with the segment-1 block sums netting to near zero in
 V70 (where all 4 channels are near zero) and with the per-segment delta
 budget matching the segment size for a single channel.  The rotation was
 CONFIRMED on 2026-05-11 (see ``decode_waveform_v2``), with each
 channel-segment's 2-sample anchor pair carried in segment-header bytes
 [14:18].
 See ``docs/waveform_codec_re_status.md`` for the current working notes
 and the suggested next experiment ("segment-channel scoring analyzer").
 """
 from __future__ import annotations
 import math
 from dataclasses import dataclass
 from typing import List, Optional, Tuple
@dataclass
 class WaveformBlock:
    """One tagged block parsed out of a Blastware waveform-file body."""
    offset: int      # byte offset into body
    tag_hi: int      # first tag byte (0x10 / 0x20 / 0x00 / 0x30 / 0x40)
    tag_lo: int      # second tag byte (NN)
    data: bytes      # block payload (excludes the 2-byte tag)
    length: int      # total block length on the wire (includes the tag)
    @property
    def kind(self) -> str:
        return f"{self.tag_hi:02x} {self.tag_lo:02x}"
 def find_data_start(body: bytes) -> int:
    """Auto-detect the offset of the first data block.
    The body starts with a 7-byte preamble (magic ``00 02 00`` + two int16 BE
    Tran anchors).  After that, the data section starts with a tag — usually
    ``10 NN`` or ``20 NN``, but quiet events may begin with a ``00 NN`` RLE
    marker.  We return the offset of the first recognized tag.
    """
    # Try fixed offset 7 first (canonical preamble length).
    if len(body) >= 9:
        b, nn = body[7], body[8]
        if (b in (0x00, 0x10, 0x20, 0x30) and nn % 4 == 0 and 0 < nn <= 0xFC) \
                or (b == 0x40 and nn == 0x02):
            return 7
    # Fall back to scanning the first 20 bytes.
    for i in range(min(20, len(body) - 1)):
        b = body[i]
        nn = body[i + 1]
        if b in (0x10, 0x20) and nn % 4 == 0 and 0 < nn <= 0xFC:
            return i
    return -1
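The 7-byte preamble can be read with ``struct``: three magic bytes, then two signed big-endian int16 Tran anchors in 16-count units. A sketch with a hypothetical ``parse_preamble`` helper and a synthetic body whose first data tag (``10 04``) sits at the canonical offset 7:

```python
import struct
from typing import Optional, Tuple

MAGIC = b"\x00\x02\x00"


def parse_preamble(body: bytes) -> Optional[Tuple[int, int]]:
    """(Tran[0], Tran[1]) in 16-count units, or None when the magic is missing."""
    if len(body) < 7 or body[:3] != MAGIC:
        return None
    t0, t1 = struct.unpack(">hh", body[3:7])  # two signed int16, big-endian
    return t0, t1


body = (MAGIC + (-3).to_bytes(2, "big", signed=True)
        + (12).to_bytes(2, "big", signed=True) + b"\x10\x04\x1f\x80")
print(parse_preamble(body), body[7:9].hex(" "))  # (-3, 12) 10 04
```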
 def walk_body(body: bytes, start: Optional[int] = None) -> List[WaveformBlock]:
    """Walk the tagged-block sequence starting at *start* (auto-detected by default).
    Stops when an unrecognized tag is encountered or end of body is reached.
    Returned blocks are in stream order.
    """
    if start is None:
        start = find_data_start(body)
        if start < 0:
            return []
    blocks: List[WaveformBlock] = []
    i = start
    while i + 1 < len(body):
        t0 = body[i]
        t1 = body[i + 1]
        if t0 == 0x10 and t1 % 4 == 0 and 0 < t1 <= 0xFC:
            length = t1 // 2 + 2
        elif (t0 & 0xF0) == 0x10 and (t0 & 0x0F) != 0 and t1 % 4 == 0:
            # Wide-NN nibble block: ``1X NN`` where X is the high nibble of a
            # 12-bit NN value.  NN = ((t0 & 0x0F) << 8) | t1.  Block length
            # = NN/2 + 2 bytes (NN nibble deltas, same as ``10 NN`` semantics
            # but with NN > 0xFC).  Confirmed 2026-05-11 in SP0 segment 12
            # where V continuation uses ``11 90`` = NN=0x190=400.
            wide_nn = ((t0 & 0x0F) << 8) | t1
            length = wide_nn // 2 + 2
        elif t0 == 0x20 and t1 % 4 == 0 and 0 < t1 <= 0xFC:
            length = t1 + 2
        elif (t0 & 0xF0) == 0x20 and (t0 & 0x0F) != 0 and t1 % 4 == 0:
            # Wide-NN int8 block: ``2X NN`` extends NN to 12 bits the same way.
            wide_nn = ((t0 & 0x0F) << 8) | t1
            length = wide_nn + 2
        elif t0 == 0x00 and t1 % 4 == 0:
            length = 2
        elif t0 == 0x30 and t1 % 4 == 0 and 0 < t1 <= 0x10:
            # Data-section ``30 NN`` blocks carry NN 12-bit signed deltas packed
            # as NN/4 groups of (2-byte high-nibble field + 4 × int8 low byte).
            # Length = NN/4 × 6 + 2 = NN × 1.5 + 2 (= 8 for NN=4, 14 for NN=8,
            # 20 for NN=12, etc.).  Confirmed 2026-05-11 by full-decoder
            # verification against BW ASCII export.
            #
            # Trailer-section ``30 NN`` blocks have a different length formula
            # (NN × 4 = 32 for NN=8 in trailers).  We try the data-section
            # length first and fall back to the trailer length if needed.
            cand_data = t1 * 3 // 2 + 2
            cand_trailer = t1 * 4
            if (i + cand_data < len(body) - 1
                    and body[i + cand_data] in (0x10, 0x20, 0x00, 0x30, 0x40)):
                length = cand_data
            else:
                length = cand_trailer
        elif t0 == 0x40 and t1 == 0x02:
            length = 20
        else:
            # Unknown tag; stop.  Caller can inspect ``i`` to see where.
            break
        if i + length > len(body):
            break
        data = bytes(body[i + 2 : i + length])
        blocks.append(WaveformBlock(offset=i, tag_hi=t0, tag_lo=t1, data=data, length=length))
        i += length
    return blocks
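The length rules in ``walk_body`` can be collected into one function. This sketch (hypothetical ``block_length`` helper) reproduces the branches above, including the 12-bit wide-NN tags and the data-versus-trailer guess for ``30 NN``, and returns ``None`` wherever the walker would stop:

```python
from typing import Optional

NEXT_TAGS = (0x00, 0x10, 0x20, 0x30, 0x40)


def block_length(body: bytes, i: int) -> Optional[int]:
    """Wire length (2-byte tag included) of the block at body[i]; None where walk_body stops."""
    t0, t1 = body[i], body[i + 1]
    if t0 == 0x40 and t1 == 0x02:
        return 20                                  # segment header
    if t1 % 4 != 0:
        return None
    kind, wide = t0 & 0xF0, t0 & 0x0F
    nn = (wide << 8) | t1                          # 12-bit NN for "1X NN" / "2X NN"
    if kind == 0x10 and nn > 0:
        return nn // 2 + 2                         # nibble deltas, two per byte
    if kind == 0x20 and nn > 0:
        return nn + 2                              # int8 deltas, one per byte
    if t0 == 0x00:
        return 2                                   # RLE marker, no payload
    if t0 == 0x30 and 0 < t1 <= 0x10:
        data_len = t1 * 3 // 2 + 2                 # NN/4 groups of 6 bytes
        nxt = i + data_len
        if nxt < len(body) - 1 and body[nxt] in NEXT_TAGS:
            return data_len                        # followed by a plausible tag
        return t1 * 4                              # assume trailer-section 30 NN
    return None


print(block_length(b"\x10\x08" + bytes(4), 0))  # 6
print(block_length(b"\x11\x90", 0))             # 202 (NN = 0x190 = 400)
```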
 def split_segments(blocks: List[WaveformBlock]) -> List[List[WaveformBlock]]:
    """Group consecutive blocks into segments separated by ``40 02`` headers.
    The first segment is whatever runs before the first ``40 02`` header
    (typically the segment-0 Tran data that follows the 7-byte body preamble).
    Subsequent segments start with a ``40 02`` block, then have their
    own data blocks until the next ``40 02``.
    """
    segments: List[List[WaveformBlock]] = []
    current: List[WaveformBlock] = []
    for b in blocks:
        if b.tag_hi == 0x40 and b.tag_lo == 0x02:
            if current:
                segments.append(current)
            current = [b]
        else:
            current.append(b)
    if current:
        segments.append(current)
    return segments
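A sketch of segment splitting on bare ``(tag_hi, tag_lo)`` pairs, labelling each segment with the Tran → Vert → Long → MicL rotation described in ``decode_waveform_v2``. The helper name is hypothetical, and the labels assume the rotation holds for every segment:

```python
from typing import List, Tuple

Tag = Tuple[int, int]
HEADER: Tag = (0x40, 0x02)
ROTATION = ("Tran", "Vert", "Long", "MicL")


def split_on_headers(tags: List[Tag]) -> List[List[Tag]]:
    segments: List[List[Tag]] = []
    current: List[Tag] = []
    for tag in tags:
        if tag == HEADER:            # a header opens a new segment
            if current:
                segments.append(current)
            current = [tag]
        else:
            current.append(tag)
    if current:
        segments.append(current)
    return segments


stream = [(0x10, 0x20), (0x00, 0x08), HEADER, (0x10, 0x40), HEADER, (0x20, 0x04)]
for idx, seg in enumerate(split_on_headers(stream)):
    print(idx, ROTATION[idx % 4], [f"{hi:02x} {lo:02x}" for hi, lo in seg])
# 0 Tran ['10 20', '00 08']
# 1 Vert ['40 02', '10 40']
# 2 Long ['40 02', '20 04']
```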
 def parse_segment_header(block: WaveformBlock) -> Optional[dict]:
    """Decode the 18-byte payload of a ``40 02`` segment header.
    Returns a dict with the labelled fields, or None if *block* is not
    a ``40 02`` header.
    """
    if not (block.tag_hi == 0x40 and block.tag_lo == 0x02):
        return None
    if len(block.data) < 18:
        return None
    p = block.data
    counter = int.from_bytes(p[8:12], "little", signed=False)
    return {
        "anchor_bytes": p[0:4],          # 4-byte field, role unconfirmed
        "field2": p[4:8],                # 4-byte field, role unconfirmed
        "counter": counter,              # uint32 LE — increments by 1 per segment
        "fixed_pattern": p[12:16],       # always b"\x02\x00\x00\x01"
        "tail": p[16:18],                # last 2 bytes
    }
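Reading the confirmed segment-header fields per the table in the module docstring ([0:2] int16 BE delta, [6:8] uint16 BE span, [8:12] uint32 LE counter, [12:14] constant ``02 00``). A sketch with a hypothetical ``header_fields`` helper and a synthetic 18-byte payload; the unresolved fields are left untouched:

```python
import struct


def header_fields(payload: bytes) -> dict:
    """Confirmed fields of an 18-byte ``40 02`` payload (the 2-byte tag excluded)."""
    if len(payload) < 18:
        raise ValueError("segment-header payload must be 18 bytes")
    (first_delta,) = struct.unpack(">h", payload[0:2])  # int16 BE, 16-count units
    (span,) = struct.unpack(">H", payload[6:8])         # bytes to next header, minus 2
    (counter,) = struct.unpack("<I", payload[8:12])     # monotonic uint32 LE
    return {
        "first_delta": first_delta,
        "span": span,
        "counter": counter,
        "const_ok": payload[12:14] == b"\x02\x00",
    }


payload = (struct.pack(">h", -7) + bytes(4) + struct.pack(">H", 0x130)
           + struct.pack("<I", 0x47) + b"\x02\x00" + bytes(4))
print(header_fields(payload))
# {'first_delta': -7, 'span': 304, 'counter': 71, 'const_ok': True}
```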
 def _s4(n: int) -> int:
    """Sign-extend a 4-bit value to signed int (0..7 → 0..7; 8..F → -8..-1)."""
    return n if n < 8 else n - 16
 def _i8(b: int) -> int:
    """Reinterpret an unsigned byte as signed int8."""
    return b if b < 128 else b - 256
 def decode_tran_initial(body: bytes) -> Optional[List[int]]:
    """
    Decode the initial Tran-channel samples — VERIFIED 2026-05-11.
    Returns Tran samples in **16-count units** (LSB = 0.005 in/s at Normal
    range — the same quantization BW uses for its ASCII export).  Returns
    ``None`` if the body cannot be parsed.
    The decoded list extends from sample 0 through the end of segment 0
    (= just before the first ``40 02`` segment header; ~510 sample-sets
    for the events tested).  Multi-segment decoding requires continuing
    past the segment header — that's done by :func:`decode_tran_full`
    when the per-segment rules are pinned down for all signal types.
    Codec for segment 0 (CONFIRMED 2026-05-11 against 7 fixture events):
    - Body bytes [0:3] are the magic ``00 02 00``.
    - Body bytes [3:5] = ``Tran[0]`` as int16 BE in 16-count units.
    - Body bytes [5:7] = ``Tran[1]`` as int16 BE in 16-count units.
    - Data blocks (``10 NN`` or ``20 NN``) carry Tran deltas starting
      at sample 2:
      * ``10 NN``: NN nibbles = NN/2 bytes; each nibble is a 4-bit
        signed delta (0..7 → 0..+7; 8..F → -8..-1).  High nibble of
        each byte comes first.
      * ``20 NN``: NN int8 signed deltas (one delta per byte).
    - ``00 NN`` blocks are run-length-encoded zero deltas: append NN
      copies of the current cumulative Tran value (no change).
    - ``30 NN`` blocks (packed 12-bit deltas; see the ``30 NN`` notes in
      ``walk_body``) appear in segment 0 of loud-from-start events (SS0,
      SV0).  This function steps over them without applying their deltas,
      so Tran values after the first ``30 NN`` block are not reliable here.
    The walk stops at the first ``40 02`` segment header.
    """
    if len(body) < 7 or body[0:3] != b"\x00\x02\x00":
        return None
    t0 = int.from_bytes(body[3:5], "big", signed=True)
    t1 = int.from_bytes(body[5:7], "big", signed=True)
    start = find_data_start(body)
    if start < 0:
        return [t0, t1]
    out = [t0, t1]
    cur = t1
    for blk in walk_body(body, start):
        if blk.tag_hi == 0x40:
            # Segment boundary — stop.  Multi-segment decode is decode_tran_full.
            break
        if blk.tag_hi == 0x10:
            for byte in blk.data:
                for nib in ((byte >> 4) & 0xF, byte & 0xF):
                    cur += _s4(nib)
                    out.append(cur)
        elif blk.tag_hi == 0x20:
            for byte in blk.data:
                cur += _i8(byte)
                out.append(cur)
        elif blk.tag_hi == 0x00:
            # RLE zero deltas: append NN copies of current Tran value.
            for _ in range(blk.tag_lo):
                out.append(cur)
        # 30 NN: skipped here (decode_waveform_v2 decodes these as 12-bit deltas).
    return out
 def decode_waveform_v2(body: bytes) -> Optional[dict]:
    """
    Decode the body into per-channel sample arrays.
    Status (2026-05-11 evening — channel-rotation hypothesis CONFIRMED):
    segments rotate channels in fixed order **Tran → Vert → Long → MicL**.
    Each channel-segment carries a 2-sample anchor pair in segment-header
    bytes [14:18] (or in the body preamble for the initial Tran segment)
    plus a stream of delta blocks for samples 2 onward.
    Returns ``{"Tran": [...], "Vert": [...], "Long": [...], "MicL": [...]}``
    with each channel's decoded samples in 16-count units (LSB = 0.005
    in/s at Normal range).  Returns ``None`` if the body cannot be
    parsed.
    """
    if len(body) < 7 or body[0:3] != b"\x00\x02\x00":
        return None
    channels = ["Tran", "Vert", "Long", "MicL"]
    out: dict = {ch: [] for ch in channels}
    # Initial Tran segment: preamble anchor pair + delta blocks before first 40 02.
    t0 = int.from_bytes(body[3:5], "big", signed=True)
    t1 = int.from_bytes(body[5:7], "big", signed=True)
    out["Tran"].extend([t0, t1])
    start = find_data_start(body)
    if start < 0:
        return out
    blocks = walk_body(body, start)
    seg_idx = [i for i, b in enumerate(blocks) if b.tag_hi == 0x40]
    def apply_blocks(channel: str, anchor: int,
                     block_start: int, block_end: int) -> int:
        """Apply delta blocks [block_start, block_end) to *channel*'s sample
        list, starting from *anchor*.  Returns the final cumulative value."""
        cur = anchor
        for bi in range(block_start, block_end):
            blk = blocks[bi]
            if (blk.tag_hi & 0xF0) == 0x10:
                # Both ``10 NN`` (NN ≤ 0xFC) and wide-NN ``1X NN`` (X != 0)
                # are nibble-delta streams.  The walker has already used the
                # right length; here we just iterate the payload bytes.
                for byte in blk.data:
                    for nib in ((byte >> 4) & 0xF, byte & 0xF):
                        cur += _s4(nib)
                        out[channel].append(cur)
            elif (blk.tag_hi & 0xF0) == 0x20:
                # ``20 NN`` and wide ``2X NN`` both carry int8 deltas.
                for byte in blk.data:
                    cur += _i8(byte)
                    out[channel].append(cur)
            elif blk.tag_hi == 0x00:
                for _ in range(blk.tag_lo):
                    out[channel].append(cur)
            elif blk.tag_hi == 0x30:
                # 12-bit signed deltas, packed as NN/4 groups of 6 bytes each:
                #   bytes [0:2] = 16 bits = 4 × 4-bit high nibbles (MSB first)
                #   bytes [2:6] = 4 × int8 low bytes
                # Each delta = sign_extend_12((high_nibble << 8) | low_byte).
                # Confirmed 2026-05-11 against all 14 ``30 NN`` blocks in the
                # bundled fixtures.
                n_groups = blk.tag_lo // 4
                for g in range(n_groups):
                    grp = blk.data[g * 6 : (g + 1) * 6]
                    if len(grp) < 6:
                        break
                    high_word = (grp[0] << 8) | grp[1]
                    for k in range(4):
                        nib = (high_word >> (12 - 4 * k)) & 0xF
                        v = (nib << 8) | grp[2 + k]
                        if v >= 0x800:
                            v -= 0x1000
                        cur += v
                        out[channel].append(cur)
            # 40 02: should not occur in segment data.
        return cur
    # Initial Tran segment: deltas from start of body up to first 40 02 (or end).
    first_seg = seg_idx[0] if seg_idx else len(blocks)
    last_tran_value = apply_blocks("Tran", t1, 0, first_seg)
    # Subsequent segments rotate channels.  Each segment header carries:
    #   bytes [0:2] and [2:4] = 2 deltas extending the PREVIOUS channel
    #   bytes [14:16] and [16:18] = anchor pair for THIS segment's channel
    #
    # Rotation: V, L, M, T, V, L, M, T, ...  (initial Tran segment is the
    # implicit T in the cycle.)
    rotation = ["Vert", "Long", "MicL", "Tran"]
    # Track each channel's "running cumulative value" so we can apply the
    # previous-channel extension deltas at every segment boundary.
    last_value = {"Tran": last_tran_value, "Vert": None, "Long": None, "MicL": None}
    for k, hi in enumerate(seg_idx):
        channel = rotation[k % 4]
        prev_channel = "Tran" if k == 0 else rotation[(k - 1) % 4]
        header = blocks[hi]
        if len(header.data) < 18:
            continue
        # Validate: real segment headers have bytes [12:14] = `02 00`.
        # Trailer/footer "40 02" markers contain ASCII serial bytes or other
        # non-header data there and would otherwise be mis-interpreted as
        # segment headers, adding spurious samples at the tail.
        if header.data[12:14] != b"\x02\x00":
            break
        # Extend the PREVIOUS channel by 2 more samples (deltas in bytes [0:4]).
        prev_d0 = int.from_bytes(header.data[0:2], "big", signed=True)
        prev_d1 = int.from_bytes(header.data[2:4], "big", signed=True)
        if last_value[prev_channel] is not None:
            v = last_value[prev_channel] + prev_d0
            out[prev_channel].append(v)
            v += prev_d1
            out[prev_channel].append(v)
            last_value[prev_channel] = v
        # Anchor pair for THIS segment's channel.
        c0 = int.from_bytes(header.data[14:16], "big", signed=True)
        c1 = int.from_bytes(header.data[16:18], "big", signed=True)
        out[channel].extend([c0, c1])
        # Apply delta blocks for this segment.
        next_hi = seg_idx[k + 1] if k + 1 < len(seg_idx) else len(blocks)
        last_value[channel] = apply_blocks(channel, c1, hi + 1, next_hi)
    return out
 # ── ADC-scale conversion helpers ────────────────────────────────────────────
 # Scaling factor: decode_waveform_v2 produces geo-channel samples in the BW
 # display quantization (16-count units, LSB = 0.005 in/s at Normal range).
 # The legacy consumer pipeline (sfm/event_hdf5.py) expects raw_samples in
 # 1-count ADC units (× full_scale / 32768 → physical).  To plug the new
 # decoder in without rewriting consumers, multiply geo values by 16.
 #
 # Mic samples are already in raw ADC counts (decoded value 1 = 1 mic ADC count
 # = -81.94 dB on the BW display).  Mic values pass through unchanged.
 _GEO_DECODER_TO_ADC = 16
 def decoded_to_adc_counts(decoded: dict) -> dict:
    """Convert :func:`decode_waveform_v2` output to int16 ADC counts.
    Geo channels are scaled by ×16 (decoder produces 16-count units,
    consumer expects 1-count ADC).  Mic is passed through as raw counts.
    """
    if not decoded:
        return {}
    return {
        "Tran": [v * _GEO_DECODER_TO_ADC for v in decoded.get("Tran", [])],
        "Vert": [v * _GEO_DECODER_TO_ADC for v in decoded.get("Vert", [])],
        "Long": [v * _GEO_DECODER_TO_ADC for v in decoded.get("Long", [])],
        "MicL": list(decoded.get("MicL", [])),
    }
 def mic_count_to_db(count: int) -> float:
    """Convert a MicL ADC count to dB(L) for BW-display-compatible output.
    Empirical formula (confirmed 2026-05-11 against V70 fixture: count=813
    → 140.1 dB; count=±1 → ±81.94 dB; count=±24 → ±109.5 dB):
        dB = sign(count) × (81.94 + 20 × log10(|count|))    for |count| ≥ 1
        dB = 0.0                                            for count == 0
    The constant 81.94 dB equals 20 × log10(k) with k = 10^(81.94/20) ≈ 12500,
    i.e. one mic ADC count corresponds to ≈ 12500× the dB(L) reference
    pressure.  k is almost certainly a calibration constant of the
    device's mic.
    """
    if count == 0:
        return 0.0
    sign = 1.0 if count > 0 else -1.0
    return sign * (81.94 + 20.0 * math.log10(abs(count)))
 # ── A5-frame entry point ────────────────────────────────────────────────────
 def decode_a5_frames(a5_frames) -> Optional[dict]:
    """Decode a list of A5 (BULK_WAVEFORM_STREAM) frames into per-channel
    int16 ADC samples.
    Returns ``{"Tran": [...], "Vert": [...], "Long": [...], "MicL": [...]}``
    with each channel's samples in **1-count ADC units** (the legacy
    ``event.raw_samples`` convention — multiply by ``full_scale / 32768``
    to convert to physical units; for mic, use :func:`mic_count_to_db` or
    a per-count psi factor).
    Returns ``None`` if the frames cannot be parsed.
    This is the wired-up production entry point.  It:
      1. Reconstructs the BW-binary body bytes from the A5 frames
         (``blastware_file.extract_body_bytes``).
      2. Runs the verified codec (``decode_waveform_v2``) on the body.
      3. Converts to int16 ADC counts via :func:`decoded_to_adc_counts`.
    """
    # Local import to avoid a cycle: blastware_file imports models and
    # ultimately client.py imports waveform_codec.
    from .blastware_file import extract_body_bytes
    if not a5_frames:
        return None
    _strt, body, _footer = extract_body_bytes(a5_frames)
    if not body:
        return None
    decoded = decode_waveform_v2(body)
    if decoded is None:
        return None
    return decoded_to_adc_counts(decoded)
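The block-payload rules that `apply_blocks` implements inside `decode_waveform_v2` can be tried in isolation. The following is a minimal standalone sketch, not the module's code. It assumes blocks arrive as plain `(tag_hi, tag_lo, payload)` tuples standing in for the `WaveformBlock` objects produced by `walk_body` (whose framing logic isn't shown here). `apply_payloads` is a hypothetical name.

```python
from typing import Iterable, List, Tuple

# Hypothetical stand-in for WaveformBlock: (tag_hi, tag_lo, payload bytes).
Block = Tuple[int, int, bytes]


def s4(n: int) -> int:
    """4-bit two's-complement nibble: 0..7 -> 0..7, 8..F -> -8..-1."""
    return n if n < 8 else n - 16


def i8(b: int) -> int:
    """Unsigned byte reinterpreted as int8."""
    return b if b < 128 else b - 256


def apply_payloads(anchor: int, blocks: Iterable[Block]) -> List[int]:
    """Accumulate each block's deltas onto *anchor*; one output per delta."""
    out: List[int] = []
    cur = anchor
    for tag_hi, tag_lo, data in blocks:
        kind = tag_hi & 0xF0
        if kind == 0x10:                 # 10 NN / 1X NN: nibble deltas, high nibble first
            for byte in data:
                for nib in (byte >> 4, byte & 0xF):
                    cur += s4(nib)
                    out.append(cur)
        elif kind == 0x20:               # 20 NN / 2X NN: one int8 delta per byte
            for byte in data:
                cur += i8(byte)
                out.append(cur)
        elif tag_hi == 0x00:             # 00 NN: NN zero deltas (RLE)
            out.extend([cur] * tag_lo)
        elif tag_hi == 0x30:             # 30 NN: NN/4 groups of 6 bytes, 4 x 12-bit deltas
            for g in range(tag_lo // 4):
                grp = data[g * 6:(g + 1) * 6]
                if len(grp) < 6:
                    break
                high_word = (grp[0] << 8) | grp[1]   # four high nibbles, MSB first
                for k in range(4):
                    v = (((high_word >> (12 - 4 * k)) & 0xF) << 8) | grp[2 + k]
                    if v >= 0x800:       # sign-extend 12 bits
                        v -= 0x1000
                    cur += v
                    out.append(cur)
        # 40 02 segment headers are handled by the caller, not here.
    return out
```

For example, a `10 04` block with payload `1F 27` contributes the nibble deltas +1, -1, +2, +7, so from an anchor of 100 it yields 101, 100, 102, 109.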
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 [project]
 name = "seismo-relay"
-version = "0.15.0"
+version = "0.21.1"
 description = "Python client and REST server for MiniMate Plus seismographs"
 requires-python = ">=3.10"
 dependencies = [
@@ -15,9 +15,10 @@ dependencies = [
    "python-multipart>=0.0.7",
    "h5py>=3.10",
    "numpy>=1.24",
    "matplotlib>=3.8",
 ]
 [tool.setuptools.packages.find]
-# Auto-discovers minimateplus/, sfm/, bridges/ as packages
+# Auto-discovers minimateplus/, micromate/, sfm/, bridges/ as packages
 where = ["."]
-include = ["minimateplus*", "sfm*", "bridges*"]
+include = ["minimateplus*", "micromate*", "sfm*", "bridges*"]
@@ -5,3 +5,4 @@ pyserial
 python-multipart
 h5py
 numpy
 matplotlib
@@ -0,0 +1,360 @@
 """
 scratch/next_experiment_skeleton.py — segment-channel scoring analyzer.
 This is the suggested NEXT EXPERIMENT for cracking the waveform body codec.
 The goal is to figure out what segments 1+ contain, since segment 0 = Tran
 is solved but multi-segment continuation diverges from truth at sample ~512.
 ────────────────────────────────────────────────────────────────────────────
 The hypothesis to test
 ────────────────────────────────────────────────────────────────────────────
 Segments rotate through channels:
    segment 0  →  Tran samples 0..509
    segment 1  →  Vert samples 0..507
    segment 2  →  Long samples 0..507
    segment 3  →  Mic  samples 0..507
    segment 4  →  Tran samples 510..N (continuation)
    ...
 This would explain why segment 0 works perfectly (it's pure Tran) and why
 applying segment 1's blocks as Tran continuation gives wrong values
 (it's actually Vert).
 ────────────────────────────────────────────────────────────────────────────
 What the analyzer should do
 ────────────────────────────────────────────────────────────────────────────
 For each segment in each fixture event:
 1. Run the segment-0 block-walker + RLE decode (the same algorithm that
   ``decode_tran_initial`` uses) over the segment's blocks.  Start from
   some anchor value and produce a cumulative trajectory of length =
   number-of-deltas-in-segment.
 2. For each candidate channel C ∈ {Tran, Vert, Long, MicL}:
   For each candidate anchor location in the segment-header payload
   (try [0:2], [2:4], [4:6], [14:16], [16:18] as int16 BE):
       Compare the decoded trajectory against truth[C] starting from
       the segment's first sample index.
       Score = number of matches (or sum of squared errors).
 3. Report the best (channel, anchor-location) combination per segment.
 If the rotation hypothesis is correct, you'll see:
    segment 0  →  best score for (Tran, preamble bytes [3:5])    ✓ already known
    segment 1  →  best score for (Vert, <some-header-byte>)
    segment 2  →  best score for (Long, <some-header-byte>)
    segment 3  →  best score for (MicL, <some-header-byte>)
    segment 4  →  best score for (Tran, continuing from segment 0's end)
 If the rotation hypothesis is NOT correct, the scorer will at least narrow
 down what segment 1 actually carries.  Maybe channels interleave at finer
 granularity, or maybe segments alternate by something other than channel.
 ────────────────────────────────────────────────────────────────────────────
 Why this is a scoring analyzer, not a hand-written decoder
 ────────────────────────────────────────────────────────────────────────────
 Direct hand-coding ("assume segment 1 is Vert with anchor at byte X") gets
 stuck when the assumption is wrong because the failure mode is silent —
 you get plausible-looking-but-wrong samples and have to manually diff
 against truth to debug.
 The scorer is brute-force but cheap: every fixture event × every segment ×
 4 channels × 5 anchor-byte candidates is only a few hundred combinations.
 The winning combination jumps out by score.
 ────────────────────────────────────────────────────────────────────────────
 Skeleton
 ────────────────────────────────────────────────────────────────────────────
 """
 from __future__ import annotations
 import os
 import re
 import sys
 from dataclasses import dataclass
 from typing import List, Optional, Tuple
 sys.path.insert(0, os.path.join(os.path.dirname(__file__), ".."))
 from minimateplus.waveform_codec import walk_body, find_data_start, WaveformBlock
 # ── Reusable pieces ──────────────────────────────────────────────────────────
 CHANNELS = ("Tran", "Vert", "Long", "MicL")
 LSB_INV = 200  # 1 in/s / 0.005 in/s/LSB; multiply BW-export floats by this
               # to get 16-count units (the body's native quantization).
 @dataclass
 class FixtureEvent:
    name: str           # e.g. "M529LL1A.SP0"
    bin_path: str
    txt_path: str
    body: bytes
    truth: dict         # {channel: list of int16-quantized samples}
    blocks: List[WaveformBlock]
    segment_starts: List[int]  # block indices of each 40 02 segment header
    segment_sample_starts: List[int]  # for each segment, the truth sample index it starts at
 def s4(n: int) -> int:
    """4-bit signed nibble decode."""
    return n if n < 8 else n - 16
 def i8(b: int) -> int:
    """int8 reinterpret of unsigned byte."""
    return b if b < 128 else b - 256
 def load_fixture(name: str) -> FixtureEvent:
    """Load a fixture event with its truth values and parsed block stream."""
    # Find the fixture (search both subdirs of tests/fixtures/).
    base = os.path.join(os.path.dirname(__file__), "..", "tests", "fixtures")
    candidates = [
        os.path.join(base, "5-11-26", name),
        os.path.join(base, "decode-re-5-8-26", "event-a", name),  # not used directly
    ]
    bin_path = next((c for c in candidates if os.path.exists(c)), None)
    if bin_path is None:
        # Fall back to a recursive os.walk for the 5-8 fixtures (they're in subdirs).
        for root, _, files in os.walk(base):
            if name in files:
                bin_path = os.path.join(root, name)
                break
    if bin_path is None:
        raise FileNotFoundError(name)
    txt_path = bin_path + ".TXT"
    with open(bin_path, "rb") as f:
        raw = f.read()
    body = raw[43:-26]
    truth = _parse_txt(txt_path)
    blocks = walk_body(body, find_data_start(body))
    seg_idx = [i for i, b in enumerate(blocks) if b.tag_hi == 0x40]
    # Segment 0 starts at sample 0; subsequent segments start at the
    # cumulative sample count from previous segment(s).  Tran's segment 0
    # is N samples; if rotation hypothesis is correct, segment 1's data
    # starts at sample 0 for a *different* channel.  The analyzer should
    # try both "continues from previous segment" and "starts at sample 0
    # of a different channel."
    seg_sample_starts = _compute_segment_sample_starts(blocks, seg_idx)
    return FixtureEvent(
        name=name, bin_path=bin_path, txt_path=txt_path,
        body=body, truth=truth, blocks=blocks,
        segment_starts=seg_idx, segment_sample_starts=seg_sample_starts,
    )
 def _parse_txt(path: str) -> dict:
    """Parse BW ASCII TXT export into {channel: [int_samples_in_16_count_units]}."""
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        lines = f.read().splitlines()
    header_idx = next(
        (i for i, l in enumerate(lines)
         if all(c in l for c in CHANNELS)),
        None,
    )
    if header_idx is None:
        return {ch: [] for ch in CHANNELS}
    out = {ch: [] for ch in CHANNELS}
    for line in lines[header_idx + 1:]:
        parts = re.split(r"\s+", line.strip())
        if len(parts) < 4:
            continue
        try:
            vals = [float(p) for p in parts[:4]]
        except ValueError:
            continue
        for ch, v in zip(CHANNELS, vals):
            # Multiply by LSB_INV; geo channels are in in/s, MicL is in dB(L)
            # (which doesn't quantize the same way — leaving raw for MicL is fine,
            # the scorer should treat MicL specially).
            out[ch].append(round(v * LSB_INV) if ch != "MicL" else v)
    return out
 def _compute_segment_sample_starts(
    blocks: List[WaveformBlock], seg_idx: List[int]
 ) -> List[int]:
    """Cumulative sample-count up to each segment header (if all blocks treated
    as Tran continuation).  Useful as one candidate for segment-1-Tran tests.
    The scorer should ALSO try "segment 1 starts at sample 0 of a new channel"
    as the rotation hypothesis predicts.
    """
    starts = []
    cum = 2  # T[0] + T[1] from preamble
    for i, b in enumerate(blocks):
        if i in seg_idx:
            starts.append(cum)
        # 10 NN (NN nibble deltas), 20 NN (NN int8 deltas) and 00 NN
        # (NN RLE zero deltas) each contribute NN samples.
        if b.tag_hi in (0x00, 0x10, 0x20):
            cum += b.tag_lo
        # 30 NN and 40 02 don't contribute samples (for this hypothesis)
    return starts
 # ── The core algorithm: decode a segment's blocks as deltas ─────────────────
 def decode_segment_as_channel(
    blocks: List[WaveformBlock],
    seg_start_block_idx: int,
    seg_end_block_idx: int,
    anchor: int,
 ) -> List[int]:
    """Apply the segment-0 codec rules to a range of blocks, starting from *anchor*.
    Returns a list of cumulative sample values (one per delta).  Does NOT include
    the anchor itself in the output — the first returned value is anchor + first_delta.
    """
    out = []
    cur = anchor
    for bi in range(seg_start_block_idx, seg_end_block_idx):
        blk = blocks[bi]
        if blk.tag_hi == 0x10:
            for byte in blk.data:
                for nib in ((byte >> 4) & 0xF, byte & 0xF):
                    cur += s4(nib)
                    out.append(cur)
        elif blk.tag_hi == 0x20:
            for byte in blk.data:
                cur += i8(byte)
                out.append(cur)
        elif blk.tag_hi == 0x00:
            for _ in range(blk.tag_lo):
                out.append(cur)
        # 30 NN: skip (content unknown)
        # 40 02: shouldn't appear in segment data (it's the segment header)
    return out
 def score_against_truth(
    decoded: List[int],
    truth: List[int],
    truth_start: int,
 ) -> Tuple[int, int]:
    """Compare *decoded* to truth[truth_start : truth_start + len(decoded)].
    Returns (n_matches, n_compared).
    """
    n = min(len(decoded), len(truth) - truth_start)
    if n <= 0:
        return (0, 0)
    matches = sum(1 for i in range(n) if decoded[i] == truth[truth_start + i])
    return (matches, n)
 # ── TODO for the next pass ──────────────────────────────────────────────────
 def score_segment_against_all_channels(
    event: FixtureEvent,
    segment_index: int,
 ) -> List[Tuple[str, int, int, int]]:
    """For segment *segment_index* of *event*, find the best (channel, start_sample)
    fit.
    The segment's blocks are decoded once from anchor 0, giving a relative
    trajectory of N values.  For each geo channel C (MicL is skipped because
    its truth is in dB(L), not counts) and each candidate truth-sample index s,
    the anchor is taken to be truth[C][s] and the N decoded values are scored
    against truth[C][s+1 : s+N+1].
    Returns rows of (channel_name, start_sample, n_matches, n_compared)
    sorted by match-count descending; start_sample is -1 when no position
    produces a single match.
    """
    # Block range of this segment: from the segment header (inclusive) up to
    # the next segment header (exclusive), or end-of-blocks.
    seg_header_idx = event.segment_starts[segment_index]
    next_header_idx = (
        event.segment_starts[segment_index + 1]
        if segment_index + 1 < len(event.segment_starts)
        else len(event.blocks)
    )
    # Decode the segment's data blocks (skip the segment-header block itself).
    # Use anchor=0 — we'll re-anchor when scoring against each channel.
    deltas_trajectory = decode_segment_as_channel(
        event.blocks, seg_header_idx + 1, next_header_idx, anchor=0
    )
    if not deltas_trajectory:
        return []
    n = len(deltas_trajectory)
    results = []
    for ch in ("Tran", "Vert", "Long"):
        truth = event.truth.get(ch)
        if not truth or len(truth) < n + 1:
            continue
        # For each candidate starting sample s in truth, check if applying
        # the deltas starting from truth[s] reproduces truth[s+1:s+n+1].
        best = (0, -1)
        for s in range(len(truth) - n):
            anchor = truth[s]
            # deltas_trajectory was decoded from anchor=0, so re-anchoring is
            # just an offset: decoded value i = anchor + deltas_trajectory[i].
            matches = 0
            for i in range(n):
                if truth[s + i + 1] == anchor + deltas_trajectory[i]:
                    matches += 1
                # Note: we could break early on first mismatch for "matches start",
                # but counting total matches gives a more robust score.
            if matches > best[0]:
                best = (matches, s)
        results.append((ch, best[1], best[0], n))
    results.sort(key=lambda r: -r[2])
    return results
 # ── Driver ──────────────────────────────────────────────────────────────────
 def main():
    """Run the analyzer on all loud-bundle events and print best scores."""
    events = ["M529LL1A.SP0", "M529LL1A.SS0", "M529LL1A.SV0",
              "M529LL1L.JQ0", "M529LL1L.V70"]
    for name in events:
        try:
            event = load_fixture(name)
        except FileNotFoundError:
            print(f"{name}: fixture not found")
            continue
        print(f"\n=== {name} ===")
        print(f"  body bytes: {len(event.body)}")
        print(f"  blocks: {len(event.blocks)}")
        print(f"  segments: {len(event.segment_starts)}")
        print("  segment sample-starts (if all blocks are 1 channel):")
        for si, sample_start in enumerate(event.segment_sample_starts):
            print(f"    seg {si}: sample {sample_start}")
        for si in range(len(event.segment_starts)):
            results = score_segment_against_all_channels(event, si)
            if not results:
                print(f"  seg {si}: (no scorable data)")
                continue
            tag = "✓" if results[0][2] / max(results[0][3], 1) > 0.9 else " "
            top = results[0]
            print(f"  seg {si}: best fit {tag} = {top[0]:<5} "
                  f"starting at sample {top[1]:>5}, {top[2]:>4}/{top[3]:<4} match"
                  + (f"  (next: {results[1][0]} @{results[1][1]} {results[1][2]}/{results[1][3]})"
                     if len(results) > 1 else ""))
 if __name__ == "__main__":
    main()
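The inner loop of `score_segment_against_all_channels` is a brute-force alignment search. The sketch below isolates it on synthetic data; the hypothetical `best_alignment` and `cumulative` helpers and the toy arrays are illustrative only, not fixture values. A relative trajectory decoded from anchor 0 is slid along a truth array and re-anchored at each `truth[s]`; the start with the most exact matches wins.

```python
from typing import List, Sequence, Tuple


def best_alignment(rel: Sequence[int], truth: Sequence[int]) -> Tuple[int, int]:
    """Return (start_sample, n_matches) for the best fit of *rel* in *truth*.

    *rel* is a trajectory decoded from anchor 0; at each candidate start s the
    anchor is truth[s] and truth[s] + rel[i] is compared with truth[s + i + 1].
    start_sample is -1 when nothing matches.
    """
    n = len(rel)
    best_start, best_matches = -1, 0
    for s in range(len(truth) - n):
        anchor = truth[s]
        matches = sum(1 for i in range(n) if truth[s + i + 1] == anchor + rel[i])
        if matches > best_matches:
            best_start, best_matches = s, matches
    return best_start, best_matches


def cumulative(deltas: Sequence[int]) -> List[int]:
    """Running sum of deltas from anchor 0 (the shape decode_segment_as_channel yields)."""
    out, cur = [], 0
    for d in deltas:
        cur += d
        out.append(cur)
    return out


if __name__ == "__main__":
    truth = [0, 3, 5, 5, 4, 10, 12, 11]   # toy data, not fixture values
    rel = cumulative([-1, 6, 2])          # deltas that rebuild truth[4:7] from truth[3]
    print(best_alignment(rel, truth))     # (3, 3)
```

Because the anchor is read from truth rather than from a header byte, this locates where a segment lands without knowing the anchor layout; the header-byte candidates listed in the module docstring would still need a separate pass.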
@@ -0,0 +1,150 @@
 """
 scripts/backfill_record_type.py — fix `record_type` on legacy event
 rows whose value was hardcoded to "Waveform" regardless of actual type.
 Why this is needed
 ──────────────────
 Pre-v0.16.1 the BW file importer (`event_file_io.read_blastware_file`)
 hardcoded `ev.record_type = "Waveform"` for every imported event.  Fixed
 in commit aac1c8e — new ingests now derive the type from the Blastware
 filename's extension last character (H=Histogram, W=Waveform, M=Manual,
 E=Event, C=Combo) per the V10.72+ MiniMate Plus AB0T filename scheme.
 Effect on a server that imported events under the old code: every
 events row has `record_type = "Waveform"`, even for histograms,
 manuals, etc.  Visible in terra-view's event-detail modal under the
 "Record Type" field.  Terra-view also has a client-side workaround
 that derives the type from the filename for display purposes, so
 operators see the correct type in the UI even before this backfill.
 This script makes the DB column match what the UI is already showing,
 which matters for reporting and any downstream consumer that reads
 events.record_type directly.
 This script
 ───────────
 Walks the `events` table and updates each row's `record_type` to the
 derived value from its `blastware_filename`.  Old S338 firmware files
 (3-char extensions ending in `0`) and any unrecognized suffix resolve
 to the `--default` value ("Waveform"), which the pre-fix importer had
 already written, so in practice those rows are left unchanged.
 Idempotent: re-running after a successful backfill finds zero rows
 needing updates and exits cleanly (it always re-derives but only
 writes when the value would change).
 Usage
 ─────
  # Dry-run (default): print what would change, don't touch the DB
  python -m scripts.backfill_record_type --db bridges/captures/seismo_relay.db
  # Apply the backfill
  python -m scripts.backfill_record_type --db bridges/captures/seismo_relay.db --apply
 """
 from __future__ import annotations
 import argparse
 import sqlite3
 import sys
 from collections import Counter
 from pathlib import Path
 # Must stay in sync with minimateplus.event_file_io._RECORD_TYPE_BY_EXT_SUFFIX.
 _TYPE_FROM_SUFFIX = {
    "H": "Histogram",
    "W": "Waveform",
    "M": "Manual",
    "E": "Event",
    "C": "Combo",
 }
 def derive_record_type(filename: str | None, default: str = "Waveform") -> str:
    """Mirror of minimateplus.event_file_io.derive_record_type_from_filename.
    Vendored here so this script runs without needing the seismo-relay
    package on the Python path (useful on prod where you might be
    running it via `docker exec` against a container's DB volume).
    """
    if not filename:
        return default
    name = Path(filename).name
    if "." not in name:
        return default
    ext = name.rsplit(".", 1)[1]
    if not ext:
        return default
    return _TYPE_FROM_SUFFIX.get(ext[-1].upper(), default)
 def main() -> int:
    ap = argparse.ArgumentParser(description=__doc__)
    ap.add_argument("--db", required=True, help="Path to seismo_relay.db")
    ap.add_argument("--apply", action="store_true",
                    help="Actually write changes (default is dry-run).")
    ap.add_argument("--default", default="Waveform",
                    help="Fallback record_type when filename doesn't encode one. "
                         "Default: Waveform (matches the pre-fix bug's behavior).")
    args = ap.parse_args()
    db_path = Path(args.db)
    if not db_path.exists():
        print(f"ERROR: database not found at {db_path}", file=sys.stderr)
        return 1
    conn = sqlite3.connect(str(db_path))
    conn.row_factory = sqlite3.Row
    cur = conn.cursor()
    cur.execute("""
        SELECT id, blastware_filename, record_type
        FROM events
        WHERE blastware_filename IS NOT NULL
          AND blastware_filename != ''
    """)
    rows = cur.fetchall()
    total = len(rows)
    print(f"Scanning {total:,} event rows…")
    print()
    # Tally proposed changes.
    transitions: Counter[tuple[str, str]] = Counter()
    update_ids: list[tuple[str, str]] = []
    unrecognized = 0
    for row in rows:
        derived = derive_record_type(row["blastware_filename"], default=args.default)
        current = row["record_type"] or ""
        if derived == current:
            continue
        transitions[(current, derived)] += 1
        update_ids.append((row["id"], derived))
    if not update_ids:
        print("Nothing to update — all rows already match.")
        conn.close()
        return 0
    print(f"{len(update_ids):,} row(s) need updating:")
    for (old, new), count in sorted(transitions.items(), key=lambda x: -x[1]):
        print(f"  {count:>6,}  {old!r:14s} → {new!r}")
    print()
    if not args.apply:
        print("(dry-run — re-run with --apply to write changes)")
        conn.close()
        return 0
    print("Applying changes…")
    cur.executemany(
        "UPDATE events SET record_type = ? WHERE id = ?",
        [(new, eid) for eid, new in update_ids],
    )
    conn.commit()
    print(f"Done. Updated {cur.rowcount:,} row(s).")
    conn.close()
    return 0
 if __name__ == "__main__":
    sys.exit(main())
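The dry-run/apply split in `backfill_record_type.py` can be exercised against a throwaway in-memory SQLite database. This sketch assumes a minimal `events(id, blastware_filename, record_type)` schema (the real table has more columns), uses made-up filenames, and condenses the script's logic into a hypothetical `backfill()` helper.

```python
import sqlite3
from collections import Counter
from pathlib import Path
from typing import Optional

# Same suffix table as the script (last character of the extension).
_TYPE_FROM_SUFFIX = {"H": "Histogram", "W": "Waveform", "M": "Manual",
                     "E": "Event", "C": "Combo"}


def derive_record_type(filename: Optional[str], default: str = "Waveform") -> str:
    if not filename:
        return default
    name = Path(filename).name
    if "." not in name:
        return default
    ext = name.rsplit(".", 1)[1]
    return _TYPE_FROM_SUFFIX.get(ext[-1].upper(), default) if ext else default


def backfill(conn: sqlite3.Connection, apply: bool) -> Counter:
    """Tally (old, new) record_type transitions; write them only if *apply*."""
    rows = conn.execute(
        "SELECT id, blastware_filename, record_type FROM events "
        "WHERE blastware_filename IS NOT NULL AND blastware_filename != ''"
    ).fetchall()
    transitions: Counter = Counter()
    updates = []
    for eid, fname, current in rows:
        derived = derive_record_type(fname)
        if derived != (current or ""):
            transitions[(current or "", derived)] += 1
            updates.append((derived, eid))
    if apply and updates:
        conn.executemany("UPDATE events SET record_type = ? WHERE id = ?", updates)
        conn.commit()
    return transitions


def make_demo_db() -> sqlite3.Connection:
    """Hypothetical rows, all stamped "Waveform" as the old importer did."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, "
                 "blastware_filename TEXT, record_type TEXT)")
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", [
        (1, "K1234ABC.AB0H", "Waveform"),   # histogram suffix
        (2, "K1234ABC.AB1W", "Waveform"),   # already correct
        (3, "M529LL1A.SP0", "Waveform"),    # S338-style: falls back to default
    ])
    return conn


if __name__ == "__main__":
    db = make_demo_db()
    print(backfill(db, apply=False))   # dry-run tally
    backfill(db, apply=True)
    print(backfill(db, apply=False))   # re-run: nothing left to change
```

Re-deriving every row but writing only the differences is what makes a second run a no-op.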
@@ -12,8 +12,20 @@ Walks `<store_root>/<serial>/<filename>` and for each BW event file:
      parsing the BW binary directly (peaks computed from samples).
  Clean waveform (.h5):
-    - Skip when <filename>.h5 already exists (idempotent).
-    - Else write from .a5.pkl (preferred) or BW binary parse (fallback).
+    - Regenerated whenever the sidecar is regenerated (sha mismatch
+      OR sidecar.source.tool_version < current TOOL_VERSION OR --force).
      The .h5 and the sidecar both come from the same decoder output,
      so if the sidecar is stale the .h5 is too.
    - Written when missing.
    - --skip-hdf5 turns off all .h5 writes.
 Typical use after a decoder upgrade:
    1. Pull the new seismo-relay code (which bumped TOOL_VERSION).
    2. Run this script — every sidecar with an older tool_version
       stamp regenerates, and the associated .h5 cascade-regenerates.
    3. Operator review state (review.false_trigger, notes, reviewer)
       and the sidecar's extensions block are preserved across the
       regen.
 Usage:
    python scripts/backfill_sidecars.py [--store-root PATH]
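The regeneration rules in this docstring reduce to two small predicates. The sketch below restates them under assumptions rather than reproducing the script's code. `version_tuple` is a hypothetical stand-in for the script's `_vt` helper, whose body isn't shown in this diff. `force` stands for `--force`; the script also bypasses the skip for `--reparse-txt`.

```python
import re
from typing import Optional, Tuple


def version_tuple(v: Optional[str]) -> Tuple[int, ...]:
    """'0.21.1' -> (0, 21, 1); each part's leading digits, 0 if none."""
    parts = []
    for piece in (v or "0").split("."):
        m = re.match(r"\d+", piece)
        parts.append(int(m.group()) if m else 0)
    return tuple(parts)


def sidecar_is_stale(stored_sha: Optional[str], current_sha: str,
                     stored_version: Optional[str], tool_version: str,
                     force: bool = False) -> bool:
    """Regenerate unless the BW sha matches AND the stamp is not older."""
    if force:
        return True
    sha_ok = stored_sha == current_sha
    ver_ok = version_tuple(stored_version) >= version_tuple(tool_version)
    return not (sha_ok and ver_ok)


def h5_needs_write(sidecar_stale: bool, h5_exists: bool,
                   skip_hdf5: bool = False) -> bool:
    """The .h5 follows the sidecar: rewrite when the sidecar is stale or the .h5 is missing."""
    if skip_hdf5:
        return False
    return sidecar_stale or not h5_exists
```

Comparing tuples instead of raw strings matters: as strings, `"0.9.0" >= "0.21.1"` is true, which would wrongly treat a 0.9 stamp as current.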
@@ -42,14 +54,26 @@ log = logging.getLogger("backfill_sidecars")
 def _looks_like_event_file(path: Path) -> bool:
-    """Same heuristic as the importer CLI."""
+    """Same heuristic as the importer CLI.
    Filters to BW (Series III) event files only — Thor (Series IV)
    `.IDFW` / `.IDFH` files share the store but have their own ingest
    path (`WaveformStore.save_imported_idf`) and are NOT decodable by
    `event_file_io.read_blastware_file`.  Their sidecars are populated
    at ingest from the paired `.IDFW.txt` ASCII report; nothing the
    backfill regenerates would improve on them, so we exclude them
    from scope.
    """
    if not path.is_file():
        return False
-    if path.name.endswith((".a5.pkl", ".sfm.json")):
+    if path.name.endswith((".a5.pkl", ".sfm.json", ".h5")):
        return False
    ext = path.suffix.lstrip(".")
    if not (3 <= len(ext) <= 4):
        return False
    # Thor IDF files share the .{W,H}-suffix shape but aren't BW.
    if ext.upper() in ("IDFW", "IDFH"):
        return False
    if not (ext[-1].upper() in {"W", "H"} or ext.endswith("0")):
        return False
    try:
@@ -79,6 +103,17 @@ def main(argv=None) -> int:
            "STRT-rectime byte-offset fix in v0.15.x)."
        ),
    )
    p.add_argument(
        "--reparse-txt", action="store_true",
        help=(
            "Re-parse the preserved <serial>/<filename>_ASCII.TXT with the "
            "current bw_ascii_report parser and overwrite the sidecar's "
            "bw_report block.  Use this after upgrading the ASCII parser to "
            "pull in new fields (e.g. zc_freq_above_range for BW '>100 Hz' "
            "ZC peaks).  No-op for events without a preserved .TXT; safely "
            "idempotent when the parser hasn't changed."
        ),
    )
    p.add_argument("-v", "--verbose", action="store_true")
    args = p.parse_args(argv)
@@ -123,7 +158,13 @@ def main(argv=None) -> int:
            #      the sidecar was written by a build that includes any
            #      decoder fixes shipped since).
            # Either part failing → regenerate.  --force bypasses both.
-            if sidecar_path.exists() and not args.force:
+            #
            # Tracks whether we're regenerating the sidecar this iteration
            # so the .h5 logic below knows to refresh that too — staleness
            # of the sidecar implies staleness of the derived .h5 (both
            # come out of the same decoder).
            sidecar_stale = True
            if sidecar_path.exists() and not args.force and not args.reparse_txt:
                try:
                    existing = event_file_io.read_sidecar(sidecar_path)
                    sha_ok = existing.get("blastware", {}).get("sha256") == bw_sha
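The freshness test combines this sha256 match with a version check: `_vt` in the next hunk, and `_vtuple` in backfill_thor_events.py. The body of `_vt` isn't shown here, so this sketch assumes it parses the same way `_vtuple` does. Versions have to be compared as integer tuples, not strings, because as plain strings `"0.9.0" > "0.21.1"`.

The sketch below uses hypothetical names and assumes three-part numeric versions. Anything else, such as a pre-release suffix, falls back to `(0, 0, 0)` and therefore counts as stale.

```python
def version_tuple(s) -> tuple:
    """Parse "MAJOR.MINOR.PATCH" into ints; unparseable input maps to (0, 0, 0)."""
    try:
        return tuple(int(part) for part in str(s).split(".")[:3])
    except ValueError:
        return (0, 0, 0)


def sidecar_is_fresh(stored_sha: str, current_sha: str,
                     stored_version: str, tool_version: str) -> bool:
    """Fresh only if the source bytes match AND the writer is at least as new."""
    return (stored_sha == current_sha
            and version_tuple(stored_version) >= version_tuple(tool_version))


if __name__ == "__main__":
    print("0.9.0" > "0.21.1")                                # True (lexical, wrong)
    print(version_tuple("0.9.0") > version_tuple("0.21.1"))  # False (numeric, right)
```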
@@ -136,6 +177,7 @@ def main(argv=None) -> int:
                    ver_ok = _vt(src_ver) >= _vt(event_file_io.TOOL_VERSION)
                    if sha_ok and ver_ok:
                        skipped += 1
                        sidecar_stale = False
                        continue
                    if sha_ok and not ver_ok:
                        log.info(
@@ -256,19 +298,68 @@ def main(argv=None) -> int:
                            or ev.total_samples < derived // 4):
                        ev.total_samples = derived
-                # Preserve user-edited review state + extensions from the
-                # existing sidecar (false_trigger flag, notes, etc.) so a
-                # backfill never wipes them out.
-                preserved_review = None
-                preserved_ext    = None
+                # Preserve user-edited review state + extensions + the
+                # bw_report block from the existing sidecar so a backfill
+                # never wipes them out.  The bw_report block originates
+                # from the paired .TXT ASCII report parsed at ORIGINAL
+                # import time (ach forward / direct upload); the .TXT
                # file is only saved to the waveform store for events
                # ingested after 2026-05-27, so in general we can't
                # re-derive it from disk.  event_to_sidecar_dict takes a
                # BwAsciiReport dataclass (not a dict), so for bw_report
                # we overlay the existing block after regen instead of
                # passing it as a kwarg.
                preserved_review     = None
                preserved_ext        = None
                preserved_bw_report  = None
                preserved_txt_fn     = None
                if sidecar_path.exists():
                    try:
                        _existing = event_file_io.read_sidecar(sidecar_path)
-                        preserved_review = _existing.get("review")
-                        preserved_ext    = _existing.get("extensions")
+                        preserved_review    = _existing.get("review")
+                        preserved_ext       = _existing.get("extensions")
                        preserved_bw_report = _existing.get("bw_report")
                        # Preserve txt_filename so backfills don't blank out the
                        # pointer to the saved raw .TXT (events ingested after
                        # 2026-05-27 have this).
                        preserved_txt_fn    = (_existing.get("source") or {}).get("txt_filename")
                    except Exception:
                        pass
                # --reparse-txt: if a .TXT is preserved on disk, run the
                # current parser against it and overwrite the bw_report
                # block.  Picks up post-ingest parser fixes (e.g. the
                # 2026-05-28 zc_freq_above_range / ">100 Hz" addition).
                if args.reparse_txt and preserved_txt_fn:
                    try:
                        from minimateplus import bw_ascii_report
                        txt_path = store.txt_path_for(serial, path.name)
                        if txt_path.exists():
                            refreshed = bw_ascii_report.parse_report_file(txt_path)
                            preserved_bw_report = event_file_io._bw_report_to_dict(refreshed)
                            log.debug("reparsed bw_report from %s", txt_path.name)
                        else:
                            log.debug("--reparse-txt: no .TXT at %s (sidecar says %r)",
                                      txt_path, preserved_txt_fn)
                    except Exception as exc:
                        log.warning("--reparse-txt failed for %s: %s", path.name, exc)
                # Overlay BW ASCII report fields onto the rebuilt Event
                # BEFORE the sidecar + DB write.  Mirrors what the ingest
                # path does — BW's reported peaks (and sample_rate /
                # record_time) win over codec output where present.
                #
                # Without this step, --force backfill silently overwrites
                # the bw_report-overlaid DB columns with codec-derived
                # values, which is wrong for events the codec doesn't
                # fully decode (e.g. waveform walker edge cases on
                # SP0/SS0/SV0-style events, or histogram sub-formats with
                # byte[5]!=0 that aren't yet RE'd).  Net effect was PVS=0
                # on three top-10 events on 2026-05-22.
                if preserved_bw_report:
                    event_file_io.apply_bw_report_dict_to_event(
                        ev, preserved_bw_report,
                    )
                sidecar = event_file_io.event_to_sidecar_dict(
                    ev,
                    serial=serial,
@@ -277,16 +368,44 @@ def main(argv=None) -> int:
                    blastware_sha256=bw_sha,
                    source_kind=source_kind,
                    a5_pickle_filename=a5_filename,
                    txt_filename=preserved_txt_fn,
                    review=preserved_review,
                    extensions=preserved_ext,
                )
                if preserved_bw_report is not None:
                    sidecar["bw_report"] = preserved_bw_report
-                # Also emit the .h5 clean-waveform file when missing OR when
-                # --force was passed (so a re-backfill picks up decoder fixes).
+                # Also emit the .h5 clean-waveform file when:
+                #   - it's missing, OR
                #   - --force was passed, OR
                #   - the sidecar is being regenerated this iteration
                #     (sha mismatch / tool_version too old).  The .h5 and
                #     the sidecar are both derived from the same decoder
                #     output, so if the sidecar is stale, so is the .h5.
                #
                # Both waveform and histogram bodies now decode to real
                # samples via event_file_io.read_blastware_file → either
                # waveform_codec.decode_waveform_v2 or histogram_codec.
                # decode_histogram_body.  If samples are still empty after
                # both codecs run, it's a genuine "we can't decode this
                # file" case (truncated, malformed, or unknown mode);
                # skip the .h5 write so we don't replace whatever's
                # there with an empty placeholder.
                has_samples = bool(
                    ev.raw_samples and any(
                        ev.raw_samples.get(ch) for ch in ("Tran", "Vert", "Long", "MicL")
                    )
                )
                hdf5_path = store.hdf5_path_for(serial, path.name)
                hdf5_filename = hdf5_path.name if hdf5_path.exists() else None
                hdf5_action = "kept"
-                need_h5 = not args.skip_hdf5 and (args.force or not hdf5_path.exists())
+                need_h5 = (
                    not args.skip_hdf5
                    and (args.force or not hdf5_path.exists() or sidecar_stale)
                    and has_samples
                )
                if not has_samples and not args.skip_hdf5:
                    hdf5_action = "skipped-undecodable"
                if need_h5:
                    if args.dry_run:
                        hdf5_action = "would (re)write"
@@ -326,6 +445,7 @@ def main(argv=None) -> int:
                            }}
                            if ev._waveform_key else None
                        ),
                        device_family="series3",
                    )
                except Exception as exc:
                    log.warning("DB upsert failed for %s: %s", path.name, exc)
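Together, the sidecar and .h5 changes in this file reduce to a small decision table. The sketch below restates that table as pure functions so each case is easy to check. `has_samples` and `h5_action` are hypothetical names. The `"write"` label for a non-dry-run write is an assumption, because the hunk that performs the write isn't shown.

```python
CHANNELS = ("Tran", "Vert", "Long", "MicL")


def has_samples(raw_samples) -> bool:
    """True if any channel decoded at least one sample."""
    return bool(raw_samples) and any(raw_samples.get(ch) for ch in CHANNELS)


def h5_action(*, skip_hdf5: bool, force: bool, h5_exists: bool,
              sidecar_stale: bool, samples_present: bool,
              dry_run: bool = False) -> str:
    """Mirror of the need_h5 expression plus the action labels around it."""
    need_h5 = (
        not skip_hdf5
        and (force or not h5_exists or sidecar_stale)
        and samples_present
    )
    if need_h5:
        return "would (re)write" if dry_run else "write"
    if not samples_present and not skip_hdf5:
        # Never replace an existing .h5 with an empty placeholder.
        return "skipped-undecodable"
    return "kept"
```

The new input is `sidecar_stale`. Under the removed `need_h5` line, a sidecar regenerated for a tool_version bump left an existing .h5 untouched unless `--force` was passed.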
@@ -0,0 +1,331 @@
 """
 scripts/backfill_thor_events.py — re-process existing Thor (Series IV)
 events so their sidecars carry the bw_report block produced by
 ``micromate.idf_to_bw_report.build_bw_report_from_idf`` + their .h5
 clean-waveform files for IDFW events.
 Why this exists
 ───────────────
 Thor events ingested before v0.21.0 (or during the v0.21.0 ingest bug
 window fixed in commit bee1185) have sidecars with only
 ``extensions.idf_report`` — no ``bw_report`` block.  Without
 ``bw_report``, the SFM PDF renderer falls back to DB-only fields
 (misses sensor-self-check, full per-channel breakdown, mic dB(L)),
 and the modal chart 404s on ``/waveform.json`` for IDFW events
 because no .h5 was written when the codec failed at ingest.
 Re-forwarding from thor-watcher would also fix this, but that requires
 operator coordination on every watcher machine and uses bandwidth this
 script doesn't.
 What this does
 ──────────────
 Walks ``<store>/<serial>/<filename>`` for ``.IDFW`` / ``.IDFH`` files
 and, for each one:
  1. Reads the existing sidecar (preserving review state + captured_at).
  2. Re-runs ``micromate.idf_file.read_idf_file()`` on the binary
     bytes — passing ``data=`` so the codec doesn't try to read from
     a path it doesn't know.
  3. Pulls ``extensions.idf_report`` (the raw parsed Thor dict the
     v0.18.0+ ingest path already stashed) and runs the v0.21.0
     ``build_bw_report_from_idf`` adapter against it.
  4. Writes the refreshed sidecar with the new ``bw_report``,
     bumped ``source.tool_version``, but preserved ``review`` block
     + the original ``captured_at`` timestamp.
  5. Regenerates the .h5 waveform file via the existing
     ``event_hdf5`` writer.  For IDFW that's the decoded per-sample
     stream; for IDFH it's a 1-sample-per-interval synthesised array
     (peak ADC count per channel) so the renderer's bar-chart code
     has data to group on.  Mic peak psi from the binary is merged
     onto the IdfEvent before the bridge so the h5 writer's per-count
     mic scale factor lands on a sensible value (without this the
     mic chart on Thor events plots dB(L)-as-pseudo-psi and shows
     bomb-level numbers).
 Idempotent.  Re-running it after a parser/adapter change just
 re-writes sidecars and .h5 files; no DB writes, no thor-watcher
 coordination.
 Usage
 ─────
    python scripts/backfill_thor_events.py [--store-root PATH]
                                           [--dry-run]
                                           [--skip-hdf5]
                                           [--force]
                                           [-v]
 By default, refreshes any Thor event whose sidecar is missing
 ``bw_report`` OR whose ``source.tool_version`` is older than the
 current ``TOOL_VERSION``.  ``--force`` refreshes every Thor event
 regardless.
 """
 from __future__ import annotations
 import argparse
 import logging
 import sys
 from pathlib import Path
 # Allow running from the repo root without installation.
 sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
 from minimateplus import event_file_io
 from sfm.waveform_store import WaveformStore
 log = logging.getLogger("backfill_thor_events")
 def _is_thor_event(path: Path) -> bool:
    if not path.is_file():
        return False
    if path.name.endswith((".sfm.json", ".h5", "_ASCII.TXT")):
        return False
    return path.suffix.upper() in (".IDFW", ".IDFH")
 def _vtuple(s: str) -> tuple:
    try:
        return tuple(int(p) for p in str(s).split(".")[:3])
    except Exception:
        return (0, 0, 0)
 def main(argv=None) -> int:
    p = argparse.ArgumentParser(description=__doc__)
    p.add_argument(
        "--db-path",
        default=str(Path(__file__).resolve().parent.parent / "bridges" / "captures" / "seismo_relay.db"),
        help="Used only to derive the default --store-root.",
    )
    p.add_argument("--store-root", default=None)
    p.add_argument("--dry-run", action="store_true")
    p.add_argument("--skip-hdf5", action="store_true",
                   help="Don't regenerate .h5 files for IDFW events.")
    p.add_argument("--force", action="store_true",
                   help="Refresh every Thor event, not just ones with stale or missing bw_report.")
    p.add_argument("-v", "--verbose", action="store_true")
    args = p.parse_args(argv)
    logging.basicConfig(
        level=logging.DEBUG if args.verbose else logging.INFO,
        format="%(asctime)s  %(levelname)-7s  %(name)s  %(message)s",
        datefmt="%H:%M:%S",
    )
    db_path = Path(args.db_path).expanduser().resolve()
    store_root = (
        Path(args.store_root).expanduser().resolve()
        if args.store_root else db_path.parent / "waveforms"
    )
    if not store_root.exists():
        log.error("store root not found: %s", store_root)
        return 1
    store = WaveformStore(store_root)
    log.info("store root: %s", store_root)
    log.info("current TOOL_VERSION: %s", event_file_io.TOOL_VERSION)
    refreshed = skipped = errors = h5_written = 0
    # Lazy imports so any one of these failing produces a useful error
    # message rather than crashing module-load.
    from micromate.idf_file import read_idf_file
    from micromate.idf_to_bw_report import build_bw_report_from_idf
    for serial_dir in sorted(p for p in store_root.iterdir() if p.is_dir()):
        serial = serial_dir.name
        for path in sorted(serial_dir.iterdir()):
            if not _is_thor_event(path):
                continue
            sidecar_path = store.sidecar_path_for(serial, path.name)
            if not sidecar_path.exists():
                log.debug("%s: no sidecar — skipping (this is a binary without ingest history)",
                          path.name)
                skipped += 1
                continue
            try:
                existing = event_file_io.read_sidecar(sidecar_path)
            except Exception as exc:
                log.warning("%s: failed to read sidecar — %s", path.name, exc)
                errors += 1
                continue
            has_bw_report = bool(existing.get("bw_report"))
            existing_version = (existing.get("source") or {}).get("tool_version", "")
            up_to_date = (
                has_bw_report
                and _vtuple(existing_version) >= _vtuple(event_file_io.TOOL_VERSION)
            )
            if up_to_date and not args.force:
                skipped += 1
                continue
            # Re-decode the binary.  Catch + log; continue with .txt-only
            # data if it fails (matches the live ingest path's behavior).
            idf_samples = None
            idf_intervals = None
            binary_md = None
            res = None  # stays None if read_idf_file() raises below
            is_histogram = path.suffix.upper() == ".IDFH"
            try:
                binary_bytes = path.read_bytes()
                res = read_idf_file(path, data=binary_bytes)
                idf_samples = res.samples or None
                idf_intervals = res.intervals
                binary_md = res.binary_metadata
                is_histogram = res.intervals is not None
            except NotImplementedError:
                # sig-B / Blastware-stray binary; no samples but adapter
                # can still produce a bw_report from extensions.idf_report.
                log.debug("%s: binary codec NotImplementedError (sig-B / BW-stray); proceeding from sidecar's idf_report only", path.name)
            except Exception as exc:
                log.warning("%s: binary decode failed — %s; proceeding from sidecar's idf_report only", path.name, exc)
            # Run the adapter.  Pull report_dict from
            # extensions.idf_report (the v0.18.0+ ingest preserved it).
            report_dict = (existing.get("extensions") or {}).get("idf_report") or {}
            if not report_dict and binary_md is None:
                log.debug("%s: no idf_report in sidecar AND no binary metadata — nothing to project", path.name)
                skipped += 1
                continue
            try:
                bw_report = build_bw_report_from_idf(
                    report_dict, binary_md=binary_md,
                    intervals=idf_intervals, is_histogram=is_histogram,
                )
            except Exception as exc:
                log.warning("%s: adapter failed — %s", path.name, exc)
                errors += 1
                continue
            # Build the new sidecar by overlaying refreshed fields onto
            # the existing one — preserves review, captured_at, blastware
            # block, source.kind, etc.
            new_sidecar = dict(existing)  # shallow copy
            new_sidecar["bw_report"] = bw_report
            src = dict(new_sidecar.get("source") or {})
            src["tool_version"] = event_file_io.TOOL_VERSION
            new_sidecar["source"] = src
            # Preserve histogram intervals if the binary decoded them
            # (improves over the original ingest if that one ran before
            # the bee1185 codec fix).
            if idf_intervals is not None:
                ext = dict(new_sidecar.get("extensions") or {})
                ext["idf_intervals"] = [
                    {
                        "offset":     iv.offset,
                        "tran_peak":  iv.peak_count("Tran"),
                        "tran_halfp": iv.tran_halfp,
                        "tran_freq":  iv.freq_hz("Tran"),
                        "vert_peak":  iv.peak_count("Vert"),
                        "vert_halfp": iv.vert_halfp,
                        "vert_freq":  iv.freq_hz("Vert"),
                        "long_peak":  iv.peak_count("Long"),
                        "long_halfp": iv.long_halfp,
                        "long_freq":  iv.freq_hz("Long"),
                        "mic_peak":   iv.peak_count("MicL"),
                        "mic_halfp":  iv.micl_halfp,
                        "mic_freq":   iv.freq_hz("MicL"),
                    }
                    for iv in idf_intervals
                ]
                new_sidecar["extensions"] = ext
            if args.dry_run:
                will_write_h5 = (idf_samples or idf_intervals) and not args.skip_hdf5
                log.info("[DRY] %s/%s — would refresh sidecar (bw_report=%s, h5=%s)",
                         serial, path.name,
                         "would add" if not has_bw_report else "would refresh",
                         "would write" if will_write_h5 else "skipped")
            else:
                event_file_io.write_sidecar(sidecar_path, new_sidecar)
                log.info("%s/%s — sidecar refreshed (bw_report=%s, intervals=%d)",
                         serial, path.name,
                         "added" if not has_bw_report else "refreshed",
                         len(idf_intervals) if idf_intervals else 0)
            refreshed += 1
            # Regenerate .h5 by replaying the same IdfEvent → Event bridge
            # save_imported_idf uses.  For IDFW we write the decoded per-
            # sample arrays.  For IDFH we synthesise a 1-sample-per-interval
            # array (peak ADC count per channel per interval) so the
            # renderer's bar-chart code has something to group on.
            # Pre-condition: either real samples (IDFW) or decoded intervals
            # (IDFH).  Skip otherwise.
            have_data = bool(idf_samples) or bool(idf_intervals)
            if have_data and not args.skip_hdf5:
                from sfm import event_hdf5
                hdf5_path = store.hdf5_path_for(serial, path.name)
                if args.dry_run:
                    log.debug("[DRY] would write %s", hdf5_path.name)
                else:
                    try:
                        from micromate import IdfEvent
                        from minimateplus.event_file_io import file_sha256
                        idf_event = IdfEvent.from_report(report_dict, path.name)
                        # Merge the binary-derived mic peak psi (only the
                        # binary path knows the proper psi value; the .txt
                        # carries dB(L)).  Without this, the h5 writer's
                        # per-count mic factor is computed against the
                        # dB(L) value-as-pseudo-psi and the mic chart
                        # scales wildly.
                        if (binary_md is not None and res is not None
                                and res.event.peaks.mic_pspl_psi is not None):
                            idf_event.peaks.mic_pspl_psi = res.event.peaks.mic_pspl_psi
                        sha256 = file_sha256(path)
                        waveform_key = bytes.fromhex(sha256)[:16]
                        ev = idf_event.to_minimateplus_event(waveform_key)
                        if is_histogram and idf_intervals:
                            # 1 sample per interval per channel — same
                            # synthesis save_imported_idf uses.  The h5
                            # writer's count×geo_fs/32768 conversion turns
                            # each peak-ADC-count into the bar's physical
                            # value.
                            ev.raw_samples = {
                                "Tran": [iv.peak_count("Tran") for iv in idf_intervals],
                                "Vert": [iv.peak_count("Vert") for iv in idf_intervals],
                                "Long": [iv.peak_count("Long") for iv in idf_intervals],
                                "MicL": [iv.peak_count("MicL") for iv in idf_intervals],
                            }
                            ev.total_samples = ev.total_samples or len(idf_intervals)
                        elif idf_samples:
                            ev.raw_samples = idf_samples
                            n_samp = max(
                                (len(idf_samples.get(ch, []))
                                 for ch in ("Tran", "Vert", "Long", "MicL")),
                                default=0,
                            )
                            ev.total_samples = ev.total_samples or n_samp
                        event_hdf5.write_event_hdf5(
                            hdf5_path, ev,
                            serial=serial,
                            geo_range="normal",
                            source_kind="idf-import",
                            tool_version=event_file_io.TOOL_VERSION,
                        )
                        h5_written += 1
                        log.debug("%s/%s — .h5 written (%s)",
                                  serial, path.name,
                                  f"{len(idf_intervals)} intervals" if is_histogram
                                  else f"{sum(len(v) for v in (idf_samples or {}).values())} samples")
                    except Exception as exc:
                        log.warning("%s/%s — .h5 write failed: %s",
                                    serial, path.name, exc)
    log.info("Done.  refreshed=%d  skipped=%d  errors=%d  h5_written=%d",
             refreshed, skipped, errors, h5_written)
    return 0 if errors == 0 else 2
 if __name__ == "__main__":
    sys.exit(main())
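For IDFH events, the script synthesises one sample per interval per channel, using that interval's peak ADC count. It then relies on the h5 writer's count×geo_fs/32768 conversion to turn each count into a bar height.

The self-contained sketch below shows that shape. `Interval` is a hypothetical stand-in for the codec's interval objects: only `peak_count(ch)` is modelled, and missing channels default to 0 here. The full-scale value passed to `counts_to_physical` in the test is an arbitrary example, not a real geophone range.

```python
from dataclasses import dataclass

CHANNELS = ("Tran", "Vert", "Long", "MicL")


@dataclass
class Interval:
    """Hypothetical stand-in: maps channel name -> peak ADC count."""
    peaks: dict

    def peak_count(self, ch: str) -> int:
        return self.peaks.get(ch, 0)


def synthesize_histogram_samples(intervals, existing_total: int = 0):
    """One sample per interval per channel, as the backfill does for IDFH."""
    raw = {ch: [iv.peak_count(ch) for iv in intervals] for ch in CHANNELS}
    # Keep a non-zero total_samples already on the event; else use the count.
    total = existing_total or len(intervals)
    return raw, total


def counts_to_physical(count: int, full_scale: float, adc_span: int = 32768) -> float:
    """count * full_scale / 32768, the conversion described for the h5 writer."""
    return count * full_scale / adc_span
```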
@@ -0,0 +1,100 @@
 #!/usr/bin/env bash
 # Fire-and-forget Stop Monitoring loop — for wedged or constantly-triggering units.
 #
 # Hammers POST /device/stop_monitoring_blind in a tight loop.  The endpoint
 # opens TCP, dumps SESSION_RESET + a few copies of the SUB 0x97 frame, and
 # closes — without ever reading an S3 response.  Each TCP-won attempt is
 # ~50ms of wire activity instead of the multi-frame handshake the regular
 # rescue endpoint does, so windows that are too small for the full rescue
 # can still land a stop-monitoring command.
 #
 # Usage:
 #   ./blind_stop.sh <host> [tcp_port]
 #
 # Env:
 #   SFM_BASE_URL    Default: http://localhost:8200 (SFM direct).
 #                   Set to http://localhost:8001/api/sfm to route through
 #                   Terra-View's proxy.
 #   MAX_ATTEMPTS    Default: 600
 #   SLEEP_S         Default: 0  (no backoff — hammer it)
 #   MAX_TIME_S      Default: 15
 #   CONNECT_TIMEOUT Default: 5
 #   REPEAT          Frames per TCP session (default 3 — increases hit rate
 #                   if the device is busy reading its own buffer).
 #   STOP_ON_OK      Default: 1.  Set to 0 to keep hammering until MAX_ATTEMPTS
 #                   is exhausted, even after successful sends (every 503 means
 #                   the device is in *another* session, every 200 means our
 #                   bytes got through, but the device may not have processed
 #                   them).
 set -u
 host="${1:-}"
 tcp_port="${2:-9034}"
 if [[ -z "$host" ]]; then
  echo "usage: $0 <host> [tcp_port]" >&2
  exit 2
 fi
 base="${SFM_BASE_URL:-http://localhost:8200}"
 max_attempts="${MAX_ATTEMPTS:-600}"
 sleep_s="${SLEEP_S:-0}"
 max_time_s="${MAX_TIME_S:-15}"
 connect_timeout="${CONNECT_TIMEOUT:-5}"
 repeat="${REPEAT:-3}"
 stop_on_ok="${STOP_ON_OK:-1}"
 url="${base}/device/stop_monitoring_blind?host=${host}&tcp_port=${tcp_port}&connect_timeout=${connect_timeout}&repeat=${repeat}"
 echo "blind_stop: target ${host}:${tcp_port}  connect_timeout=${connect_timeout}s  repeat=${repeat}"
 echo "blind_stop: POST ${url}"
 echo "blind_stop: up to ${max_attempts} attempts, ${sleep_s}s between, ${max_time_s}s per request"
 echo "blind_stop: stop_on_ok=${stop_on_ok}"
 echo
 ok_count=0
 busy_count=0
 err_count=0
 started=$(date +%s)
 for ((i=1; i<=max_attempts; i++)); do
  printf "[%4d] %s  " "$i" "$(date +%H:%M:%S)"
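  http_code=$(curl -sS -o /tmp/blind_resp.$$ -w "%{http_code}" \
    --max-time "$max_time_s" \
    -X POST "$url" || true)
  # curl's -w already prints "000" when no response arrives, so an
  # "|| echo 000" fallback would yield "000000"; default only if empty.
  http_code="${http_code:-000}"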
  body=$(cat /tmp/blind_resp.$$ 2>/dev/null || true)
  rm -f /tmp/blind_resp.$$
  case "$http_code" in
    200|201)
      ok_count=$((ok_count + 1))
      echo "SENT  $body"
      if [[ "$stop_on_ok" == "1" ]]; then
        elapsed=$(( $(date +%s) - started ))
        echo
        echo "blind_stop: success after ${i} attempts (${elapsed}s).  ok=${ok_count} busy=${busy_count} err=${err_count}"
        echo "blind_stop: NEXT — wait ~10s, then try the full rescue:"
        echo "  /home/serversdown/seismo-relay/scripts/rescue_device.sh ${host} ${tcp_port}"
        exit 0
      fi
      ;;
    503)
      busy_count=$((busy_count + 1))
      echo "busy (503)"
      ;;
    000)
      err_count=$((err_count + 1))
      echo "curl error"
      ;;
    *)
      err_count=$((err_count + 1))
      echo "HTTP $http_code  $body" | head -c 400
      echo
      ;;
  esac
  [[ "$sleep_s" != "0" ]] && sleep "$sleep_s"
 done
 elapsed=$(( $(date +%s) - started ))
 echo
 echo "blind_stop: gave up after ${max_attempts} attempts (${elapsed}s).  ok=${ok_count} busy=${busy_count} err=${err_count}" >&2
 exit 1
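The script's exit logic is easier to reason about outside bash. The Python sketch below uses hypothetical helper names that are not part of seismo-relay. It mirrors the `case` mapping and the STOP_ON_OK early exit, and it also computes the worst-case wall time. With the defaults, if each of the 600 attempts runs into curl's 15 s `--max-time`, the script runs for 600 × 15 = 9000 s, about 2.5 hours, before giving up.

```python
from collections import Counter


def classify(http_code: str) -> str:
    """Same buckets as blind_stop.sh's case statement."""
    if http_code in ("200", "201"):
        return "ok"
    if http_code == "503":
        return "busy"
    return "err"  # "000" (curl failure) and every other status


def run(codes, stop_on_ok: bool = True):
    """Replay a sequence of status codes; return (attempt_that_stopped, tally)."""
    tally = Counter()
    for attempt, code in enumerate(codes, start=1):
        kind = classify(code)
        tally[kind] += 1
        if kind == "ok" and stop_on_ok:
            return attempt, tally
    return None, tally


def worst_case_seconds(max_attempts: int = 600, max_time_s: float = 15,
                       sleep_s: float = 0) -> float:
    """Upper bound if every request hits --max-time (a sleep follows each attempt)."""
    return max_attempts * (max_time_s + sleep_s)
```

With STOP_ON_OK=0, the bash script still prints "gave up" and exits 1 once the attempts run out, even if some sends succeeded. `run` returns `None` in the same case.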
@@ -0,0 +1,185 @@
 """
 scripts/check_bw_report_preservation.py — verify that running backfill_sidecars
 doesn't wipe the `bw_report` block from sidecars that already had one.
 Two-step workflow:
  # Before running backfill — capture a baseline snapshot:
  python scripts/check_bw_report_preservation.py snapshot \
      --store-root /path/to/waveforms \
      --out before.json
  # Run backfill:
  python scripts/backfill_sidecars.py --store-root /path/to/waveforms --force
  # After backfill — diff against the baseline:
  python scripts/check_bw_report_preservation.py diff \
      --store-root /path/to/waveforms \
      --baseline before.json
 The diff classifies every sidecar into one of:
  PRESERVED      had bw_report before, has same hash now  ← GOOD
  CHANGED        had bw_report before, has different hash now  ← suspicious
                 (backfill copies the block verbatim unless --reparse-txt
                  re-parsed the preserved .TXT, so expect CHANGED entries then)
  WIPED          had bw_report before, doesn't now  ← BUG — data loss
  STILL_MISSING  didn't have bw_report before, still doesn't  ← expected
  NEW            didn't have bw_report before, has one now
                 (only possible if a re-ingest happened between snapshots;
                  shouldn't happen during backfill)
  REMOVED        sidecar existed in baseline, file is gone now
  ADDED          sidecar didn't exist in baseline, exists now
 Exit code is 0 if no WIPED or CHANGED entries are found, 1 otherwise.
 """
 from __future__ import annotations
 import argparse
 import hashlib
 import json
 import sys
 from pathlib import Path
 from typing import Optional
 # Allow running from the repo root without installation.
 sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
 from minimateplus import event_file_io
 def _bw_report_hash(sidecar_data: dict) -> Optional[str]:
    """Canonical-JSON hash of the bw_report block, or None if absent."""
    br = sidecar_data.get("bw_report")
    if not br:
        return None
    # sort_keys for stable hashing across dict-ordering differences
    blob = json.dumps(br, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(blob.encode()).hexdigest()
 def _scan_store(store_root: Path) -> dict:
    """Walk every <serial>/<file>.sfm.json and return {relpath: hash_or_None}.
    Relpath is `<serial>/<filename>` — stable across machines/snapshots.
    """
    out: dict[str, Optional[str]] = {}
    for serial_dir in sorted(p for p in store_root.iterdir() if p.is_dir()):
        for sidecar in sorted(serial_dir.glob("*.sfm.json")):
            relpath = f"{serial_dir.name}/{sidecar.name}"
            try:
                data = event_file_io.read_sidecar(sidecar)
            except Exception as exc:
                print(f"  WARN: failed to read {relpath}: {exc}", file=sys.stderr)
                continue
            out[relpath] = _bw_report_hash(data)
    return out
 def cmd_snapshot(args) -> int:
    store_root = Path(args.store_root).expanduser().resolve()
    if not store_root.exists():
        print(f"error: store root does not exist: {store_root}", file=sys.stderr)
        return 2
    out_path = Path(args.out).expanduser().resolve()
    print(f"Scanning {store_root} …")
    snapshot = _scan_store(store_root)
    with_bw    = sum(1 for v in snapshot.values() if v is not None)
    without_bw = sum(1 for v in snapshot.values() if v is None)
    print(f"  total sidecars:     {len(snapshot)}")
    print(f"  with bw_report:     {with_bw}")
    print(f"  without bw_report:  {without_bw}")
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with open(out_path, "w") as f:
        json.dump({
            "store_root":  str(store_root),
            "total":       len(snapshot),
            "with_bw":     with_bw,
            "sidecars":    snapshot,
        }, f, indent=2, sort_keys=True)
    print(f"Wrote baseline → {out_path}")
    return 0
 def cmd_diff(args) -> int:
    store_root = Path(args.store_root).expanduser().resolve()
    if not store_root.exists():
        print(f"error: store root does not exist: {store_root}", file=sys.stderr)
        return 2
    baseline_path = Path(args.baseline).expanduser().resolve()
    if not baseline_path.exists():
        print(f"error: baseline file not found: {baseline_path}", file=sys.stderr)
        return 2
    with open(baseline_path) as f:
        baseline = json.load(f)
    before = baseline["sidecars"]
    print(f"Scanning {store_root} for comparison against {baseline_path.name} …")
    after = _scan_store(store_root)
    classes = {k: [] for k in (
        "PRESERVED", "CHANGED", "WIPED", "STILL_MISSING", "NEW", "REMOVED", "ADDED",
    )}
    all_keys = set(before) | set(after)
    for key in sorted(all_keys):
        b = before.get(key, "__MISSING__")
        a = after.get(key, "__MISSING__")
        if b == "__MISSING__":
            classes["ADDED"].append(key)
        elif a == "__MISSING__":
            classes["REMOVED"].append(key)
        elif b is None and a is None:
            classes["STILL_MISSING"].append(key)
        elif b is None and a is not None:
            classes["NEW"].append(key)
        elif b is not None and a is None:
            classes["WIPED"].append(key)
        elif b == a:
            classes["PRESERVED"].append(key)
        else:
            classes["CHANGED"].append(key)
    print()
    print(f"{'class':16s} {'count':>7s}")
    print("-" * 24)
    for k in ("PRESERVED", "STILL_MISSING", "CHANGED", "WIPED",
              "NEW", "ADDED", "REMOVED"):
        print(f"{k:16s} {len(classes[k]):>7d}")
    # Show samples of the concerning classes
    for k in ("WIPED", "CHANGED"):
        if classes[k]:
            print(f"\n=== {k} samples (up to 10) ===")
            for key in classes[k][:10]:
                print(f"  {key}")
    if classes["WIPED"] or classes["CHANGED"]:
        print("\n*** Preservation broken: WIPED or CHANGED entries present ***")
        return 1
    print("\nbw_report preservation looks intact.")
    return 0
 def main(argv=None) -> int:
    p = argparse.ArgumentParser(description=__doc__)
    sub = p.add_subparsers(dest="cmd", required=True)
    p_snap = sub.add_parser("snapshot", help="capture baseline bw_report hashes")
    p_snap.add_argument("--store-root", required=True)
    p_snap.add_argument("--out", required=True, help="output JSON path")
    p_snap.set_defaults(func=cmd_snapshot)
    p_diff = sub.add_parser("diff", help="diff current store against a baseline")
    p_diff.add_argument("--store-root", required=True)
    p_diff.add_argument("--baseline",   required=True, help="JSON from `snapshot`")
    p_diff.set_defaults(func=cmd_diff)
    args = p.parse_args(argv)
    return args.func(args)
 if __name__ == "__main__":
    sys.exit(main())
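The `cmd_diff` classification can be pulled out as a pure function, which makes the seven buckets easy to unit-test without a waveform store on disk. This is a minimal sketch, not the script's code. `classify` and `preservation_ok` are hypothetical names, and the sketch replaces the `"__MISSING__"` string sentinel with a private `object()` so that no stored value can ever collide with it.

```python
from typing import Optional

# Private sentinel for "relpath absent on this side" (hypothetical; the
# script itself uses the string "__MISSING__").
_MISSING = object()

BUCKETS = ("PRESERVED", "CHANGED", "WIPED", "STILL_MISSING", "NEW", "REMOVED", "ADDED")


def classify(before: dict[str, Optional[str]],
             after: dict[str, Optional[str]]) -> dict[str, list[str]]:
    """Bucket each sidecar relpath by how its bw_report hash moved."""
    classes: dict[str, list[str]] = {k: [] for k in BUCKETS}
    for key in sorted(set(before) | set(after)):
        b = before.get(key, _MISSING)
        a = after.get(key, _MISSING)
        if b is _MISSING:
            classes["ADDED"].append(key)          # sidecar appeared since baseline
        elif a is _MISSING:
            classes["REMOVED"].append(key)        # sidecar disappeared
        elif b is None and a is None:
            classes["STILL_MISSING"].append(key)  # never had a bw_report
        elif b is None:
            classes["NEW"].append(key)            # gained a bw_report
        elif a is None:
            classes["WIPED"].append(key)          # lost its bw_report
        elif a == b:
            classes["PRESERVED"].append(key)      # same hash
        else:
            classes["CHANGED"].append(key)        # different hash
    return classes


def preservation_ok(classes: dict[str, list[str]]) -> bool:
    """Same pass/fail rule as cmd_diff: any WIPED or CHANGED entry fails."""
    return not (classes["WIPED"] or classes["CHANGED"])
```

For example, with a baseline `{"BE1/a": "h1", "BE1/b": None, "BE1/c": "h3"}` and a rescan `{"BE1/a": "h1", "BE1/b": "h2", "BE1/c": None}`, `BE1/a` lands in PRESERVED, `BE1/b` in NEW and `BE1/c` in WIPED, so `preservation_ok` returns `False`.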
@@ -0,0 +1,151 @@
 """
 scripts/repair_unknown_serials.py — re-attribute events stuck under
 `serial = 'UNKNOWN'` to their correct serial by decoding the BW filename.
 Why this is needed
 ──────────────────
 The /db/import/blastware_file endpoint had a bug (fixed in commit a032fa5+1
 on the ach-report-ingestion branch) where every forwarded event was inserted
 with serial='UNKNOWN' because the endpoint's `_serial_from_event(ev)` stub
 returned None and never consulted the BW-filename serial that
 `WaveformStore.save_imported_bw()` had already decoded.
 Effect on a server that ran a buggy version: every forwarded event's
 SeismoDb row has `serial='UNKNOWN'`, even though the on-disk waveform
 store has correctly bucketed the files into `BE<NNNN>/` folders.  So
 the BW binaries / sidecars / HDF5s are fine, but `/db/units` and
 `/db/events?serial=...` queries don't surface the events.
 This script
 ───────────
 Walks the events table looking for rows with `serial='UNKNOWN'` and
 re-attributes each one to the serial decoded from its
 `blastware_filename` column.  If the row's serial would collide with
 an existing row (already-correct duplicate from a later re-forward),
 the UNKNOWN row is deleted.  Otherwise the row's `serial` column is
 updated in-place.
 Idempotent: re-running after a successful repair finds zero matching
 rows and exits cleanly.
 Usage
 ─────
  # Dry-run (default): print what would change, don't touch the DB
  python -m scripts.repair_unknown_serials --db bridges/captures/seismo_relay.db
  # Apply the repair
  python -m scripts.repair_unknown_serials --db bridges/captures/seismo_relay.db --apply
 """
 from __future__ import annotations
 import argparse
 import sqlite3
 import sys
 from pathlib import Path
 # Reach into sfm.waveform_store for the serial decoder.  This script
 # is run from the repo root via `python -m scripts.repair_unknown_serials`.
 sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
 from sfm.waveform_store import _serial_from_bw_filename
 def main(argv: list[str] | None = None) -> int:
    p = argparse.ArgumentParser(
        description="Re-attribute events stuck under serial='UNKNOWN'.",
    )
    p.add_argument(
        "--db", required=True, type=Path,
        help="Path to seismo_relay.db (e.g. bridges/captures/seismo_relay.db)",
    )
    p.add_argument(
        "--apply", action="store_true",
        help="Apply the repair.  Without this flag the script runs in "
             "dry-run mode and only reports what would change.",
    )
    args = p.parse_args(argv)
    if not args.db.exists():
        print(f"DB not found: {args.db}", file=sys.stderr)
        return 2
    conn = sqlite3.connect(str(args.db))
    conn.row_factory = sqlite3.Row
    rows = list(conn.execute(
        "SELECT id, serial, timestamp, blastware_filename "
        "  FROM events "
        " WHERE serial = 'UNKNOWN' "
        " ORDER BY timestamp",
    ))
    print(f"Found {len(rows)} UNKNOWN-serial rows in events table.")
    if not rows:
        return 0
    updated   = 0
    deleted   = 0
    unresolved = 0
    by_serial: dict[str, int] = {}
    for row in rows:
        rid       = row["id"]
        ts        = row["timestamp"]
        bw_name   = row["blastware_filename"]
        new_serial = _serial_from_bw_filename(bw_name) if bw_name else None
        if not new_serial:
            print(f"  ⚠ id={rid[:8]} ts={ts} filename={bw_name!r} — "
                  f"cannot decode serial from filename; skipping")
            unresolved += 1
            continue
        # Check for an existing row at the target (serial, timestamp).
        existing = conn.execute(
            "SELECT id FROM events WHERE serial = ? AND timestamp = ?",
            (new_serial, ts),
        ).fetchone()
        action: str
        if existing is None:
            # Safe to UPDATE in place.
            if args.apply:
                conn.execute(
                    "UPDATE events SET serial = ? WHERE id = ?",
                    (new_serial, rid),
                )
            action = "UPDATE"
            updated += 1
        else:
            # A correctly-attributed row already exists.  Drop the
            # UNKNOWN duplicate.
            if args.apply:
                conn.execute("DELETE FROM events WHERE id = ?", (rid,))
            action = "DELETE (dup)"
            deleted += 1
        by_serial[new_serial] = by_serial.get(new_serial, 0) + 1
        print(f"  {action:14s}  id={rid[:8]}  ts={ts}  "
              f"filename={bw_name}  →  {new_serial}")
    if args.apply:
        conn.commit()
    conn.close()
    print()
    print(f"Summary:")
    print(f"  UNKNOWN rows scanned:       {len(rows)}")
    print(f"  Updated to real serial:     {updated}")
    print(f"  Deleted (duplicate of an    ")
    print(f"   already-correct row):      {deleted}")
    print(f"  Unresolved (bad filename):  {unresolved}")
    print()
    if by_serial:
        print(f"Per-serial breakdown of repaired rows:")
        for serial, count in sorted(by_serial.items()):
            print(f"  {serial:12s}  {count}")
    if not args.apply:
        print()
        print("(dry-run — re-run with --apply to commit)")
    return 0
 if __name__ == "__main__":
    sys.exit(main())
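The UPDATE-or-DELETE decision can be exercised against a throwaway in-memory database. This sketch assumes a stripped-down `events` table containing only the columns the script reads, and takes the filename decoder as a parameter because `_serial_from_bw_filename` is not shown here. `repair_unknown` and the dict-backed decoder used in testing are hypothetical.

```python
import sqlite3
from typing import Callable, Optional


def repair_unknown(conn: sqlite3.Connection,
                   decode: Callable[[str], Optional[str]],
                   apply: bool = False) -> dict[str, int]:
    """Re-attribute serial='UNKNOWN' rows; dry-run unless apply=True."""
    counts = {"updated": 0, "deleted": 0, "unresolved": 0}
    # Materialise the rows first so UPDATE/DELETE never race an open cursor.
    rows = conn.execute(
        "SELECT id, timestamp, blastware_filename FROM events "
        "WHERE serial = 'UNKNOWN' ORDER BY timestamp"
    ).fetchall()
    for rid, ts, name in rows:
        new_serial = decode(name) if name else None
        if not new_serial:
            counts["unresolved"] += 1          # filename missing or undecodable
            continue
        clash = conn.execute(
            "SELECT 1 FROM events WHERE serial = ? AND timestamp = ?",
            (new_serial, ts),
        ).fetchone()
        if clash is None:
            if apply:                          # no collision: fix in place
                conn.execute("UPDATE events SET serial = ? WHERE id = ?",
                             (new_serial, rid))
            counts["updated"] += 1
        else:
            if apply:                          # correct row exists: drop dup
                conn.execute("DELETE FROM events WHERE id = ?", (rid,))
            counts["deleted"] += 1
    if apply:
        conn.commit()
    return counts
```

Because `UNIQUE(serial, timestamp)` already prevents two UNKNOWN rows from sharing a timestamp, each row's collision check is independent of the others, which is why the dry-run counts match the `--apply` counts.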
@@ -0,0 +1,99 @@
 #!/usr/bin/env bash
 # Rescue an uncooperative MiniMate that's busy with another ACH session.
 #
 # Hammers POST /device/rescue in a tight loop with a short timeout.  When the
 # device is in an ACH session our SYN either gets refused or silently dropped
 # (5s connect timeout inside the endpoint) and we retry immediately.  When the
 # device is between sessions, our TCP wins, the endpoint disables Auto Call
 # Home and erases events inside the same session, then returns success.
 #
 # Usage:
 #   ./rescue_device.sh <host> [tcp_port] [--no-erase] [--no-disable-ach]
 #
 # Examples:
 #   ./rescue_device.sh 166.246.130.1 9034
 #   ./rescue_device.sh 166.246.130.1 9034 --no-erase     # just silence it
 #
 # Environment:
 #   SFM_BASE_URL    Defaults to http://localhost:8200 (SFM direct).
 #                   Set to http://localhost:8001/api/sfm to route through
 #                   Terra-View's proxy.  Direct mode avoids the proxy's
 #                   60s timeout, which matters for long-running endpoints.
 #   MAX_ATTEMPTS    Cap on retries (default 600 ≈ 30+ min).
 #   SLEEP_S         Backoff between attempts (default 1).
 #   MAX_TIME_S      Per-request timeout (default 60).
 #   CONNECT_TIMEOUT TCP connect timeout (default 5).
 #   RECV_TIMEOUT    Per-frame S3 recv timeout (default 5).  If POLL or any
 #                   subsequent frame doesn't respond within this window, the
 #                   rescue endpoint bails and this script retries.
 set -u
 host="${1:-}"
 tcp_port="${2:-9034}"
 shift 2 2>/dev/null || shift $# 2>/dev/null
 if [[ -z "$host" ]]; then
  echo "usage: $0 <host> [tcp_port] [--no-erase] [--no-disable-ach]" >&2
  exit 2
 fi
 disable_ach="true"
 erase="true"
 for arg in "$@"; do
  case "$arg" in
    --no-erase)        erase="false" ;;
    --no-disable-ach)  disable_ach="false" ;;
    *) echo "unknown flag: $arg" >&2; exit 2 ;;
  esac
 done
 base="${SFM_BASE_URL:-http://localhost:8200}"
 max_attempts="${MAX_ATTEMPTS:-600}"
 sleep_s="${SLEEP_S:-1}"
 max_time_s="${MAX_TIME_S:-60}"
 connect_timeout="${CONNECT_TIMEOUT:-5}"
 recv_timeout="${RECV_TIMEOUT:-5}"
 url="${base}/device/rescue?host=${host}&tcp_port=${tcp_port}&disable_ach=${disable_ach}&erase=${erase}&connect_timeout=${connect_timeout}&recv_timeout=${recv_timeout}"
 echo "rescue: target ${host}:${tcp_port}  disable_ach=${disable_ach}  erase=${erase}"
 echo "rescue: connect_timeout=${connect_timeout}s  recv_timeout=${recv_timeout}s"
 echo "rescue: POST ${url}"
 echo "rescue: up to ${max_attempts} attempts, ${sleep_s}s between, ${max_time_s}s per request"
 echo
 started=$(date +%s)
 for ((i=1; i<=max_attempts; i++)); do
  printf "[%3d] %s  " "$i" "$(date +%H:%M:%S)"
  http_code=$(curl -sS -o /tmp/rescue_resp.$$ -w "%{http_code}" \
    --max-time "$max_time_s" \
    -X POST "$url" || echo "000")
  body=$(cat /tmp/rescue_resp.$$ 2>/dev/null || true)
  rm -f /tmp/rescue_resp.$$
  case "$http_code" in
    200|201)
      elapsed=$(( $(date +%s) - started ))
      echo "OK  (${elapsed}s total)"
      echo "$body"
      exit 0
      ;;
    503)
      # Connection refused / timeout — device busy in another session.  Retry fast.
      echo "busy (503)"
      ;;
    000)
      echo "curl error (network)"
      ;;
    *)
      echo "HTTP $http_code"
      echo "  $body" | head -c 400
      echo
      ;;
  esac
  sleep "$sleep_s"
 done
 echo "rescue: gave up after ${max_attempts} attempts" >&2
 exit 1
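The retry policy in `rescue_device.sh` reduces to this: succeed on 200/201, otherwise sleep and try again until `MAX_ATTEMPTS` runs out. The Python sketch below implements the same loop. It assumes a hypothetical `post` callable that returns the HTTP status, or raises `OSError` where curl would report `000`, which makes the policy testable without a live device.

```python
import time
from typing import Callable


def rescue_loop(post: Callable[[], int],
                max_attempts: int = 600,
                sleep_s: float = 1.0,
                sleep: Callable[[float], None] = time.sleep) -> tuple[bool, int]:
    """Return (succeeded, attempts_used), mirroring rescue_device.sh."""
    for attempt in range(1, max_attempts + 1):
        try:
            code = post()
        except OSError:
            code = 0  # the script's "000": curl/network failure
        if code in (200, 201):
            return True, attempt
        # 503 means the device is busy in another ACH session; any other
        # status is only logged by the script.  Either way: back off, retry.
        sleep(sleep_s)
    return False, max_attempts
```

Injecting `sleep` lets a test replace the real back-off with a no-op.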
@@ -0,0 +1,44 @@
 #!/usr/bin/env bash
 # Hold a single TCP session open and drip stop-monitoring frames at a slow
 # rate, so the device's UART RX FIFO has time to drain between sends.
 #
 # Use when high-rate spam isn't landing — typically because the device's
 # firmware is too busy to drain its serial buffer fast enough and bytes
 # are being lost to UART overrun.
 #
 # Usage:
 #   ./slow_drip.sh <host> [tcp_port] [duration_s]
 #
 # Env:
 #   DURATION         Default: 120 (seconds; arg 3 overrides). Clamped 1..600.
 #   INTERVAL         Seconds between drip sends (default 3).  Lower = more
 #                    aggressive, more risk of FIFO overrun.  Higher = safer
 #                    but fewer total drips per duration.
 #   CONNECT_TIMEOUT  Default: 5
 #   SFM_BASE_URL     Default: http://localhost:8200 (SFM direct).
 set -u
 host="${1:-}"
 tcp_port="${2:-9034}"
 duration="${3:-${DURATION:-120}}"
 if [[ -z "$host" ]]; then
  echo "usage: $0 <host> [tcp_port] [duration_s]" >&2
  exit 2
 fi
 base="${SFM_BASE_URL:-http://localhost:8200}"
 interval="${INTERVAL:-3}"
 connect_timeout="${CONNECT_TIMEOUT:-5}"
 url="${base}/device/stop_monitoring_slow_drip?host=${host}&tcp_port=${tcp_port}&duration_s=${duration}&interval_s=${interval}&connect_timeout=${connect_timeout}"
 echo "slow_drip: target ${host}:${tcp_port}  duration=${duration}s  interval=${interval}s  connect_timeout=${connect_timeout}s"
 echo "slow_drip: POST ${url}"
 echo
 # Give curl enough slack to wait out the duration plus a buffer
 max_time=$(awk -v d="$duration" 'BEGIN { printf "%d", d + 30 }')
 curl -sS --max-time "$max_time" -X POST "$url"
 echo
@@ -0,0 +1,48 @@
 #!/usr/bin/env bash
 # Hammer a device with blind stop-monitoring sessions as fast as possible.
 # Single HTTP call kicks off the burst inside SFM (no per-attempt HTTP
 # overhead).  Default: 10 s burst at ~500 ms per attempt = ~20 attempts (~2/sec).
 #
 # Usage:
 #   ./spam_stop.sh <host> [tcp_port] [duration_s]
 #
 # Examples:
 #   ./spam_stop.sh 166.246.130.1                  # 10s burst
 #   ./spam_stop.sh 166.246.130.1 9034 30          # 30s burst
 #   DURATION=60 CONNECT_TIMEOUT=0.2 ./spam_stop.sh 166.246.130.1
 #
 # Env:
 #   SFM_BASE_URL     Default: http://localhost:8200 (SFM direct).
 #                    Set to http://localhost:8001/api/sfm to route through
 #                    Terra-View's proxy — but note the proxy has a 60s
 #                    timeout, so long bursts need direct mode.
 #   DURATION         Default: 10 (seconds; arg 3 overrides)
 #   CONNECT_TIMEOUT  Default: 0.5 (seconds)
 #   REPEAT           Default: 3   (stop frames per TCP session)
 set -u
 host="${1:-}"
 tcp_port="${2:-9034}"
 duration="${3:-${DURATION:-10}}"
 if [[ -z "$host" ]]; then
  echo "usage: $0 <host> [tcp_port] [duration_s]" >&2
  exit 2
 fi
 base="${SFM_BASE_URL:-http://localhost:8200}"
 connect_timeout="${CONNECT_TIMEOUT:-0.5}"
 repeat="${REPEAT:-3}"
 url="${base}/device/stop_monitoring_spam?host=${host}&tcp_port=${tcp_port}&duration_s=${duration}&connect_timeout=${connect_timeout}&repeat=${repeat}"
 echo "spam_stop: target ${host}:${tcp_port}  duration=${duration}s  connect_timeout=${connect_timeout}s  repeat=${repeat}"
 echo "spam_stop: POST ${url}"
 echo
 # Give curl enough slack to wait out the duration plus a buffer
 max_time=$(awk -v d="$duration" 'BEGIN { printf "%d", d + 10 }')
 curl -sS --max-time "$max_time" -X POST "$url"
 echo
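The burst figures follow from simple arithmetic. A 10 s burst at roughly 0.5 s per attempt comes to about 20 attempts, or about 2 per second. curl's `--max-time` is the duration plus 10 s of slack, truncated to an integer as awk's `%d` does. The sketch below performs that calculation. It treats the ~500 ms per attempt as the header's rough estimate rather than a measured constant, and `burst_plan` is a hypothetical helper.

```python
import math


def burst_plan(duration_s: float, attempt_s: float = 0.5,
               curl_slack_s: float = 10) -> dict[str, float]:
    """Estimate attempts for a spam_stop burst and curl's time budget."""
    return {
        "attempts": math.floor(duration_s / attempt_s),
        "attempts_per_s": 1 / attempt_s,
        # awk 'printf "%d", d + 10' truncates toward zero, like int()
        "curl_max_time": int(duration_s + curl_slack_s),
    }
```

Under this estimate, the 30 s example in the header works out to about 60 attempts.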
@@ -0,0 +1,91 @@
 """Re-ingest prod IDFW and IDFH files via the patched save_imported_idf and
 render a PDF for each to confirm the charts have data."""
 from __future__ import annotations
 import sys
 import json
 import datetime
 import tempfile
 from pathlib import Path
 sys.path.insert(0, str(Path(__file__).resolve().parents[1]))
 from sfm.waveform_store import WaveformStore
 from sfm import report_pdf
 import h5py
 class FakeDb:
    def __init__(self, event):
        self.event = event
    def get_event(self, _id):
        return self.event
 def to_ts_iso(ts):
    if ts is None:
        return None
    try:
        return datetime.datetime(ts.year, ts.month, ts.day, ts.hour, ts.minute, ts.second).isoformat()
    except Exception:
        return None
 def render_case(idf_path: Path, serial: str, out_pdf: Path, h5_summary: bool = True):
    with tempfile.TemporaryDirectory() as td:
        store = WaveformStore(Path(td))
        ev, rec = store.save_imported_idf(
            idf_path.read_bytes(),
            idf_path,
            idf_report_text=None,    # production worst case: no .txt
        )
        print(f"=== {idf_path.name} ===")
        print(f"  h5: {rec['hdf5_filename']}, sidecar: {rec['sidecar_filename']}")
        h5p = Path(td) / serial / f"{idf_path.name}.h5"
        if h5p.exists() and h5_summary:
            with h5py.File(h5p) as h:
                for ch in ("Tran", "Vert", "Long", "MicL"):
                    ds = h.get(f"samples/{ch}")
                    if ds is not None:
                        n = ds.shape[0]
                        mx = float(abs(ds[...]).max()) if n else 0
                        print(f"  samples/{ch}: n={n}  max_abs={mx:.5f}")
        record_type = "Histogram" if idf_path.suffix.upper() == ".IDFH" else "Waveform"
        fake_row = {
            "serial":              serial,
            "blastware_filename":  rec["filename"],
            "record_type":         record_type,
            "timestamp":           to_ts_iso(ev.timestamp),
            "sample_rate":         ev.sample_rate,
            "project":             ev.project_info.project if ev.project_info else None,
            "client":              ev.project_info.client if ev.project_info else None,
            "operator":            ev.project_info.operator if ev.project_info else None,
            "sensor_location":     ev.project_info.sensor_location if ev.project_info else None,
            "created_at":          None,
        }
        rd = report_pdf.gather_report_data(FakeDb(fake_row), store, event_id="test-1")
        print(f"  ReportData: channels={ {k: len(v) for k,v in rd.channels.items()} }")
        if rd.is_histogram:
            print(f"  histogram n_intervals={rd.histogram_n_intervals} interval_size={rd.histogram_interval_size}")
        pdf = report_pdf.render_event_report_pdf(rd)
        out_pdf.write_bytes(pdf)
        print(f"  PDF: {out_pdf}  ({len(pdf)} bytes)")
 def main():
    out_dir = Path("/tmp/thor_render_test"); out_dir.mkdir(exist_ok=True)
    cases = [
        # IDFW that decoded to preamble-only under the old codec
        ("/home/serversdown/seismo-relay-prod-snap/waveforms/UM6047/UM6047_20250804154137.IDFW", "UM6047"),
        # IDFW that worked under the old codec (validates no regression)
        ("/home/serversdown/seismo-relay-prod-snap/waveforms/UM6047/UM6047_20250804104450.IDFW", "UM6047"),
        # IDFH histogram
        ("/home/serversdown/seismo-relay-prod-snap/waveforms/UM6047/UM6047_20250804190047.IDFH", "UM6047"),
    ]
    for path, serial in cases:
        render_case(Path(path), serial, out_dir / f"{Path(path).name}.pdf")
 if __name__ == "__main__":
    main()
@@ -0,0 +1,58 @@
 #!/usr/bin/env bash
 # Passive monitor for a misbehaving unit.  Every INTERVAL seconds, attempts
 # a single short TCP probe + storage_range read and logs the result.  Designed
 # to run unattended for hours/days and tell you when the unit comes back.
 #
 # Usage:
 #   ./watch_unit.sh <host> [tcp_port]
 #
 # Env:
 #   INTERVAL    Seconds between checks (default 300 = 5 min)
 #   LOG_FILE    Append results here (default /tmp/watch_<host>.log)
 #   SFM_BASE_URL  Default: http://localhost:8200
 set -u
 host="${1:-}"
 tcp_port="${2:-9034}"
 if [[ -z "$host" ]]; then
  echo "usage: $0 <host> [tcp_port]" >&2
  exit 2
 fi
 interval="${INTERVAL:-300}"
 log_file="${LOG_FILE:-/tmp/watch_${host}.log}"
 base="${SFM_BASE_URL:-http://localhost:8200}"
 url="${base}/device/events/storage_range?host=${host}&tcp_port=${tcp_port}"
 echo "watch_unit: target ${host}:${tcp_port}  interval=${interval}s  log=${log_file}"
 echo "watch_unit: Ctrl-C to stop"
 while true; do
  ts=$(date '+%Y-%m-%d %H:%M:%S')
  http_code=$(curl -sS -o /tmp/watch_resp.$$ -w "%{http_code}" \
    --max-time 20 "$url" || echo "000")
  body=$(cat /tmp/watch_resp.$$ 2>/dev/null || true)
  rm -f /tmp/watch_resp.$$
  case "$http_code" in
    200|201)
      # Strip the raw_hex for readability
      summary=$(echo "$body" | sed 's/"raw_hex":"[^"]*",*//; s/,*$//' | head -c 200)
      echo "$ts  REACHABLE  $summary" | tee -a "$log_file"
      ;;
    502|503)
      err=$(echo "$body" | head -c 150)
      echo "$ts  ERROR_$http_code  $err" | tee -a "$log_file"
      ;;
    000)
      echo "$ts  CURL_FAIL  (network/timeout)" | tee -a "$log_file"
      ;;
    *)
      echo "$ts  HTTP_$http_code  $(echo "$body" | head -c 150)" | tee -a "$log_file"
      ;;
  esac
  sleep "$interval"
 done
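`watch_unit.sh` trims `raw_hex` with a sed regex. When `raw_hex` happens to be the last key, that regex leaves a stray comma behind: `{"a":1,"raw_hex":"ab"}` becomes `{"a":1,}`. Parsing the body avoids this, assuming the storage_range response is a JSON object. The sketch below uses a hypothetical `summarize` helper and is not part of the script.

```python
import json


def summarize(body: str, limit: int = 200) -> str:
    """Compact one-line summary of a response body, minus raw_hex."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        return body[:limit]  # not JSON: fall back to plain truncation
    if isinstance(data, dict):
        data.pop("raw_hex", None)  # drop the bulky hex dump wherever it sits
    return json.dumps(data, separators=(",", ":"))[:limit]
```

In the shell, `jq -c 'del(.raw_hex)'` would give the same result.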
@@ -85,6 +85,7 @@ CREATE TABLE IF NOT EXISTS events (
    blastware_filesize      INTEGER,                -- bytes; NULL if no event file saved
    a5_pickle_filename      TEXT,                   -- "<filename>.a5.pkl" sidecar
    sidecar_filename        TEXT,                   -- "<filename>.sfm.json" review/metadata sidecar
    device_family           TEXT,                   -- "series3" (MiniMate Plus / BW) | "series4" (Micromate / Thor) — drives per-family UI rendering (units, labels)
    created_at              TEXT NOT NULL DEFAULT (strftime('%Y-%m-%dT%H:%M:%SZ', 'now')),
    UNIQUE(serial, timestamp)
 );
@@ -198,11 +199,53 @@ class SeismoDb:
            ("blastware_filesize", "INTEGER"),
            ("a5_pickle_filename", "TEXT"),
            ("sidecar_filename",   "TEXT"),
            ("device_family",      "TEXT"),
        ):
            if col not in existing_cols:
                log.info("_migrate: events ADD COLUMN %s %s", col, ddl)
                conn.execute(f"ALTER TABLE events ADD COLUMN {col} {ddl}")
        # Migration 1c: backfill device_family for existing rows by sniffing
        # the device-native binary filename's extension.  Thor (Micromate
        # Series IV) writes `.IDFH` / `.IDFW`; MiniMate Plus (Series III)
        # writes `.AB0*` / `.N00` / `.<base36>` Blastware extensions.  We do
        # this here rather than from sidecars so the migration is fully
        # self-contained (doesn't need the waveform-store root) and runs at
        # DB-init time.  Only fills NULL device_family so re-runs are no-ops.
        rebackfill = conn.execute(
            "SELECT COUNT(*) FROM events WHERE device_family IS NULL"
        ).fetchone()
        if rebackfill and rebackfill[0] > 0:
            log.info("_migrate: backfilling device_family for %d events", rebackfill[0])
            # Series IV (Thor IDF) — extension is exactly .IDFH or .IDFW
            conn.execute(
                """
                UPDATE events
                   SET device_family = 'series4'
                 WHERE device_family IS NULL
                   AND (
                        UPPER(blastware_filename) LIKE '%.IDFH'
                     OR UPPER(blastware_filename) LIKE '%.IDFW'
                   )
                """
            )
            # Everything else with a filename → Series III (Blastware family)
            conn.execute(
                """
                UPDATE events
                   SET device_family = 'series3'
                 WHERE device_family IS NULL
                   AND blastware_filename IS NOT NULL
                """
            )
            # Rows with no filename (e.g. older monitor_log-derived events)
            # stay NULL — UI handles NULL as "unknown family".
            remaining = conn.execute(
                "SELECT COUNT(*) FROM events WHERE device_family IS NULL"
            ).fetchone()[0]
            log.info("_migrate: device_family backfill complete (remaining NULL=%d)",
                     remaining)
        # Migration 2: change monitor_log UNIQUE from (serial, waveform_key) to
        # (serial, start_time) — same reasoning as events.
        row = conn.execute(
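Migration 1c's two-pass backfill can be checked on an in-memory table. This sketch keeps only the columns the migration touches (`id`, `blastware_filename`, `device_family`) and the test uses hypothetical filenames. The pass order matters: the Series IV pass must claim `.IDFH`/`.IDFW` rows before the catch-all Series III pass fills every remaining row that has a filename.

```python
import sqlite3


def backfill_device_family(conn: sqlite3.Connection) -> int:
    """Run the two backfill passes; return how many rows are still NULL."""
    # Pass 1: Thor / Micromate IDF files -> series4
    conn.execute(
        """
        UPDATE events SET device_family = 'series4'
         WHERE device_family IS NULL
           AND (UPPER(blastware_filename) LIKE '%.IDFH'
             OR UPPER(blastware_filename) LIKE '%.IDFW')
        """
    )
    # Pass 2: any other row with a filename -> series3 (Blastware family)
    conn.execute(
        """
        UPDATE events SET device_family = 'series3'
         WHERE device_family IS NULL AND blastware_filename IS NOT NULL
        """
    )
    return conn.execute(
        "SELECT COUNT(*) FROM events WHERE device_family IS NULL"
    ).fetchone()[0]
```

Because both passes filter on `device_family IS NULL`, a row that already has a family is never rewritten, and a second run changes nothing.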
@@ -302,6 +345,7 @@ class SeismoDb:
        serial: str,
        session_id: Optional[str] = None,
        waveform_records: Optional[dict[str, dict]] = None,
        device_family: Optional[str] = None,
    ) -> tuple[int, int]:
        """
        Insert triggered events. Silently skips duplicates (serial+timestamp).
@@ -316,6 +360,11 @@ class SeismoDb:
        (dedup hit), the matching waveform record is upserted onto the
        existing row so a re-download via the live endpoint refreshes the
        file metadata.
        ``device_family`` (optional): "series3" (MiniMate Plus / Blastware) or
        "series4" (Micromate / Thor).  Drives per-family UI rendering — most
        importantly the mic-unit convention (psi vs dB(L)).  Set on every
        insert; on UPSERT it is overwritten only when a non-None value is
        passed (``COALESCE``), so the latest writer that knows the family wins.
        """
        inserted = skipped = 0
        wave_recs = waveform_records or {}
@@ -349,8 +398,9 @@ class SeismoDb:
                             project, client, operator, sensor_location,
                             sample_rate, record_type,
                             blastware_filename, blastware_filesize,
-                             a5_pickle_filename, sidecar_filename)
-                        VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
+                             a5_pickle_filename, sidecar_filename,
+                             device_family)
                        VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
                        """,
                        (
                            self._new_id(), serial, key, session_id, ts,
@@ -369,33 +419,72 @@ class SeismoDb:
                            rec.get("filesize"),
                            rec.get("a5_pickle_filename"),
                            rec.get("sidecar_filename"),
                            device_family,
                        ),
                    )
                    inserted += 1
                except sqlite3.IntegrityError:
                    skipped += 1
-                    # Upsert waveform fields onto the existing dedup row so a
-                    # re-download via the live endpoint refreshes filename /
-                    # size / sidecar without churning the rest of the row.
-                    if rec and ts:
-                        conn.execute(
-                            """
-                            UPDATE events
-                               SET blastware_filename = ?,
-                                   blastware_filesize = ?,
-                                   a5_pickle_filename = ?,
-                                   sidecar_filename   = ?
-                             WHERE serial = ? AND timestamp = ?
-                            """,
-                            (
-                                rec.get("filename"),
-                                rec.get("filesize"),
-                                rec.get("a5_pickle_filename"),
-                                rec.get("sidecar_filename"),
-                                serial,
-                                ts,
-                            ),
-                        )
+                    # UPSERT path: a row for this (serial, timestamp) already
+                    # exists.  Refresh every device-authoritative field from
+                    # the new data so that a re-import with better data (e.g.
+                    # a watcher re-forward where the previous attempt missed
+                    # the paired BW ASCII report) replaces stale peaks /
+                    # project info / sample_rate.
+                    #
+                    # Preserved (not in this UPDATE):
+                    #   id, waveform_key, session_id, created_at  — immutable / FK
+                    #   false_trigger                              — operator review state
+                    #
+                    # Behaviour change vs prior versions: this UPDATE used
+                    # to only refresh filename / filesize / a5_pickle /
+                    # sidecar fields.  As a result, the first insert's
+                    # broken-codec peak values were locked in forever even
+                    # if subsequent re-forwards arrived with correct
+                    # report-derived values.  Now every re-import lifts the
+                    # DB row up to whatever the latest Event carries.
+                    conn.execute(
+                        """
+                        UPDATE events
+                           SET tran_ppv           = ?,
                               vert_ppv           = ?,
                               long_ppv           = ?,
                               peak_vector_sum    = ?,
                               mic_ppv            = ?,
                               project            = ?,
                               client             = ?,
                               operator           = ?,
                               sensor_location    = ?,
                               sample_rate        = ?,
                               record_type        = ?,
                               blastware_filename = ?,
                               blastware_filesize = ?,
                               a5_pickle_filename = ?,
                               sidecar_filename   = ?,
                               device_family      = COALESCE(?, device_family)
                         WHERE serial = ? AND timestamp = ?
                        """,
                        (
                            pv.tran            if pv else None,
                            pv.vert            if pv else None,
                            pv.long            if pv else None,
                            pv.peak_vector_sum if pv else None,
                            pv.micl            if pv else None,
                            pi.project         if pi else None,
                            pi.client          if pi else None,
                            pi.operator        if pi else None,
                            pi.sensor_location if pi else None,
                            ev.sample_rate,
                            ev.record_type,
                            rec.get("filename")           if rec else None,
                            rec.get("filesize")           if rec else None,
                            rec.get("a5_pickle_filename") if rec else None,
                            rec.get("sidecar_filename")   if rec else None,
                            device_family,
                            serial,
                            ts,
                        ),
                    )
        log.debug("insert_events serial=%s inserted=%d skipped=%d",
                  serial, inserted, skipped)
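The new UPSERT branch refreshes device-authoritative columns, leaves `false_trigger` alone, and treats `device_family` with `COALESCE`. That behaviour can be reproduced with a two-column cut of the statement. The table and `upsert` helper below are hypothetical simplifications, not the real `insert_events`.

```python
import sqlite3
from typing import Optional

SCHEMA = """
CREATE TABLE events (
    serial        TEXT NOT NULL,
    timestamp     TEXT NOT NULL,
    mic_ppv       REAL,
    device_family TEXT,
    false_trigger INTEGER NOT NULL DEFAULT 0,  -- operator review state
    UNIQUE(serial, timestamp)
)
"""


def upsert(conn: sqlite3.Connection, serial: str, ts: str,
           mic_ppv: Optional[float], device_family: Optional[str]) -> str:
    try:
        conn.execute(
            "INSERT INTO events (serial, timestamp, mic_ppv, device_family)"
            " VALUES (?, ?, ?, ?)",
            (serial, ts, mic_ppv, device_family),
        )
        return "inserted"
    except sqlite3.IntegrityError:
        # Dedup hit: refresh the peak unconditionally, keep the stored
        # family when the caller passes None, never touch false_trigger.
        conn.execute(
            """
            UPDATE events
               SET mic_ppv       = ?,
                   device_family = COALESCE(?, device_family)
             WHERE serial = ? AND timestamp = ?
            """,
            (mic_ppv, device_family, serial, ts),
        )
        return "updated"
```

One consequence is worth checking in the real statement. The peak and project columns are not wrapped in `COALESCE`, so a re-import whose Event carries no peak values (`pv` is `None`) would write NULL over previously stored peaks.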
@@ -455,6 +544,75 @@ class SeismoDb:
            )
        return cur.rowcount > 0
    def delete_event(self, event_id: str) -> Optional[dict]:
        """
        Hard-delete one event row by id.  Returns the deleted row (so the
        caller can clean up any on-disk files referenced by it) or None
        if no row matched.
        """
        with self._connect() as conn:
            row = conn.execute(
                "SELECT * FROM events WHERE id = ?", (event_id,),
            ).fetchone()
            if row is None:
                return None
            conn.execute("DELETE FROM events WHERE id = ?", (event_id,))
        return dict(row)
    def delete_events_bulk(
        self,
        serial: Optional[str] = None,
        from_dt: Optional[datetime.datetime] = None,
        to_dt: Optional[datetime.datetime] = None,
        false_trigger: Optional[bool] = None,
        ids: Optional[list[str]] = None,
    ) -> list[dict]:
        """
        Hard-delete events matching the given filters.  Returns the list
        of deleted row dicts.  Refuses to delete with no filters at all
        (would wipe the whole table) — raises ValueError.
        Filter semantics match query_events: serial / from_dt / to_dt /
        false_trigger combine with AND.  `ids` is an additional inclusion
        list (id IN (...)); if supplied alongside other filters,
        only rows matching all conditions are deleted.
        """
        clauses: list[str] = []
        params:  list      = []
        if serial:
            clauses.append("serial = ?")
            params.append(serial)
        if from_dt:
            clauses.append("timestamp >= ?")
            params.append(from_dt.isoformat())
        if to_dt:
            clauses.append("timestamp <= ?")
            params.append(to_dt.isoformat())
        if false_trigger is not None:
            clauses.append("false_trigger = ?")
            params.append(1 if false_trigger else 0)
        if ids:
            placeholders = ",".join("?" * len(ids))
            clauses.append(f"id IN ({placeholders})")
            params.extend(ids)
        if not clauses:
            raise ValueError(
                "delete_events_bulk refuses to delete with no filters "
                "(would wipe the entire events table)"
            )
        where = "WHERE " + " AND ".join(clauses)
        with self._connect() as conn:
            rows = conn.execute(
                f"SELECT * FROM events {where}", params,
            ).fetchall()
            if rows:
                conn.execute(f"DELETE FROM events {where}", params)
        return [dict(r) for r in rows]
    def update_event_review(self, event_id: str, review: dict) -> bool:
        """
        Sync derived index columns from a sidecar's `review` block.
@@ -564,21 +722,79 @@ class SeismoDb:
    def query_units(self) -> list[dict]:
        """
-        Return one row per known serial with summary stats:
-        last_seen, total_events, total_monitor_entries.
+        Return one row per known serial with summary stats.
+
        Aggregates from BOTH source tables:
          - `events`        — populated by every ingest path
                              (live ACH, /db/import/blastware_file
                              from the series3-watcher forwarder, etc.)
          - `ach_sessions`  — only populated by the live ACH server;
                              empty for events that came in via the
                              BW-importer route.
        Earlier this method only queried `ach_sessions`, which made
        watcher-forwarded units invisible to the SFM webapp's fleet
        overview even though their events were correctly stored in
        `events`.  It now merges per-serial stats from both tables and
        surfaces every serial with activity in either one.
        Fields:
          serial                — unit serial number (e.g. "BE11529")
          last_seen             — most recent of MAX(events.timestamp)
                                  and MAX(ach_sessions.session_time)
          total_events          — COUNT(*) from `events` (the
                                  authoritative count regardless of
                                  ingest path)
          total_monitor_entries — SUM(monitor_entries) from `ach_sessions`,
                                  0 when absent
          total_sessions        — COUNT(*) from `ach_sessions`, 0 when absent
        """
        with self._connect() as conn:
-            rows = conn.execute(
-                """
-                SELECT
-                    s.serial,
-                    MAX(s.session_time)  AS last_seen,
-                    SUM(s.events_downloaded)  AS total_events,
-                    SUM(s.monitor_entries)    AS total_monitor_entries,
-                    COUNT(*)                  AS total_sessions
-                FROM ach_sessions s
-                GROUP BY s.serial
-                ORDER BY last_seen DESC
-                """
-            ).fetchall()
-        return [dict(r) for r in rows]
+            event_stats = {
+                row["serial"]: row
+                for row in conn.execute(
+                    """
+                    SELECT serial,
+                           MAX(timestamp) AS last_event_at,
+                           COUNT(*)        AS total_events
+                      FROM events
+                     GROUP BY serial
+                    """,
+                ).fetchall()
+            }
+            session_stats = {
+                row["serial"]: row
                for row in conn.execute(
                    """
                    SELECT serial,
                           MAX(session_time)         AS last_session_at,
                           SUM(monitor_entries)      AS total_monitor_entries,
                           COUNT(*)                  AS total_sessions
                      FROM ach_sessions
                     GROUP BY serial
                    """,
                ).fetchall()
            }
        all_serials = set(event_stats) | set(session_stats)
        units = []
        for serial in all_serials:
            e = event_stats.get(serial)
            s = session_stats.get(serial)
            last_event_at   = e["last_event_at"]   if e else None
            last_session_at = s["last_session_at"] if s else None
            # Prefer whichever timestamp is more recent
            last_seen = max(
                (t for t in (last_event_at, last_session_at) if t),
                default=None,
            )
            units.append({
                "serial":                serial,
                "last_seen":             last_seen,
                "total_events":          e["total_events"]          if e else 0,
                "total_monitor_entries": s["total_monitor_entries"] if s else 0,
                "total_sessions":        s["total_sessions"]        if s else 0,
            })
        # Sort by last_seen desc; serials with no timestamp at all sink to the bottom.
        units.sort(key=lambda u: u.get("last_seen") or "", reverse=True)
        return units
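The two patterns added in this hunk, the AND-combined WHERE builder with a no-filter guard in `delete_events_bulk` and the two-table merge in `query_units`, can be exercised against an in-memory SQLite database. This is a minimal sketch, not the SeismoDb code: `build_delete_where`, `merge_unit_stats` and `make_demo_db` are hypothetical helpers, the schemas are trimmed to the columns these queries read, timestamps are passed as ISO strings, and every serial except BE11529 is made up.

```python
import sqlite3


def build_delete_where(serial=None, from_ts=None, to_ts=None,
                       false_trigger=None, ids=None):
    """Hypothetical helper mirroring delete_events_bulk's filter logic.

    Returns (where_sql, params).  Timestamps are ISO strings here.
    """
    clauses, params = [], []
    if serial:
        clauses.append("serial = ?")
        params.append(serial)
    if from_ts:
        clauses.append("timestamp >= ?")
        params.append(from_ts)
    if to_ts:
        clauses.append("timestamp <= ?")
        params.append(to_ts)
    if false_trigger is not None:
        clauses.append("false_trigger = ?")
        params.append(1 if false_trigger else 0)
    if ids:
        # One "?" per id keeps the IN list fully parameterized.
        clauses.append(f"id IN ({','.join('?' * len(ids))})")
        params.extend(ids)
    if not clauses:
        # Same safety guard: an empty WHERE would delete every row.
        raise ValueError("refusing to delete with no filters")
    return "WHERE " + " AND ".join(clauses), params


def merge_unit_stats(conn):
    """Hypothetical query_units-style merge of events + ach_sessions."""
    ev = {r["serial"]: r for r in conn.execute(
        "SELECT serial, MAX(timestamp) AS last_event_at, COUNT(*) AS n "
        "FROM events GROUP BY serial")}
    ss = {r["serial"]: r for r in conn.execute(
        "SELECT serial, MAX(session_time) AS last_session_at, COUNT(*) AS n "
        "FROM ach_sessions GROUP BY serial")}
    units = []
    for serial in set(ev) | set(ss):  # every serial seen in either table
        e, s = ev.get(serial), ss.get(serial)
        stamps = [t for t in (e["last_event_at"] if e is not None else None,
                              s["last_session_at"] if s is not None else None)
                  if t]
        units.append({
            "serial": serial,
            # max() on strings assumes both tables use the same ISO format
            "last_seen": max(stamps, default=None),
            "total_events": e["n"] if e is not None else 0,
            "total_sessions": s["n"] if s is not None else 0,
        })
    units.sort(key=lambda u: u["last_seen"] or "", reverse=True)
    return units


def make_demo_db():
    """In-memory DB with made-up rows; only the columns used above."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row  # lets rows be indexed by column name
    conn.executescript("""
        CREATE TABLE events (id TEXT, serial TEXT, timestamp TEXT,
                             false_trigger INTEGER);
        CREATE TABLE ach_sessions (serial TEXT, session_time TEXT);
        INSERT INTO events VALUES
            ('a', 'BE11529', '2026-05-30T10:00:00', 0),
            ('b', 'BE11529', '2026-06-01T09:00:00', 1),
            ('c', 'UM00001', '2026-05-31T12:00:00', 0);
        INSERT INTO ach_sessions VALUES ('BE00002', '2026-06-01T11:00:00');
    """)
    return conn
```

With the demo data, the events-only serial UM00001 still appears in `merge_unit_stats` (with `total_sessions` of 0), which is the case an `ach_sessions`-only query would miss, and `build_delete_where(serial="BE11529", false_trigger=True)` selects only row `'b'`.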
@@ -1,216 +0,0 @@
 """
 sfm.dump_0c — inspect the raw 210-byte SUB 0C waveform record stored in a
 sidecar JSON's `extensions.raw_records.waveform_record_b64`.
 Usage:
    python -m sfm.dump_0c <sidecar.sfm.json> [<sidecar.sfm.json> ...]
 Prints, for each input:
  - A header summarising the sidecar's metadata-block claims (peaks,
    project, timestamp) — the "what BW says this event measured" view.
  - A 16-byte-wide hex dump of the raw 0C record, annotated with known
    field anchors (STRT, channel labels, project strings).
  - A "candidate float regions" scan that brute-forces every byte
    position as a float32 BE and prints any that yield a value in a
    plausible range (1e-7 to 1e3) — useful for hunting where Peak
    Acceleration / Peak Displacement / ZC Freq / Time of Peak live.
 Pairing the printed candidates with the BW Event Report values lets
 us nail down byte offsets for the missing fields without a live
 device.
 """
 from __future__ import annotations
 import argparse
 import base64
 import json
 import struct
 import sys
 from pathlib import Path
 # ── Annotations for known anchors in a 210-byte 0C record ──────────────────
 # Anchors we look for and label inline in the hex dump.  Each is a needle
 # (bytes to find) and a short label.  Found via .find() — the first
 # occurrence wins.
 _ANCHORS = [
    (b"Tran",            "Tran label  (PPV @ +6, PVS @ -12)"),
    (b"Vert",            "Vert label  (PPV @ +6)"),
    (b"Long",            "Long label  (PPV @ +6)"),
    (b"MicL",            "MicL label  (peak psi @ +6)"),
    (b"Project:",        "Project: label"),
    (b"Client:",         "Client: label"),
    (b"User Name:",      "User Name: label"),
    (b"Seis Loc:",       "Seis Loc: label"),
    (b"Extended Notes",  "Extended Notes label"),
 ]
 def _hex_dump(data: bytes, anchors: dict[int, str]) -> str:
    """Return a 16-byte-wide hex+ASCII dump, with anchor labels printed
    on the line that contains the anchor's start byte."""
    lines = []
    for off in range(0, len(data), 16):
        chunk = data[off : off + 16]
        hex_part   = " ".join(f"{b:02x}" for b in chunk)
        ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        line = f"  {off:04x}  {hex_part:<47}  |{ascii_part}|"
        # If any anchor lands on a byte in this row, append a tag
        tags = [
            f"[{a:#04x}: {label}]"
            for a, label in anchors.items()
            if off <= a < off + 16
        ]
        if tags:
            line += "  " + "  ".join(tags)
        lines.append(line)
    return "\n".join(lines)
 def _scan_float32_be(data: bytes, lo: float, hi: float) -> list[tuple[int, float]]:
    """Brute-force every offset where data[off:off+4] is a float32 BE in
    (lo, hi).  Includes negatives in the symmetric range."""
    hits = []
    for i in range(len(data) - 3):
        try:
            v = struct.unpack_from(">f", data, i)[0]
        except struct.error:
            continue
        if v != v:                     # NaN
            continue
        if abs(v) < 1e-30 or abs(v) > 1e10:   # crap range
            continue
        a = abs(v)
        if lo <= a <= hi:
            hits.append((i, v))
    return hits
 def _scan_uint16_be(data: bytes, lo: int, hi: int) -> list[tuple[int, int]]:
    """Find every offset where uint16 BE is in [lo, hi]."""
    hits = []
    for i in range(len(data) - 1):
        v = (data[i] << 8) | data[i + 1]
        if lo <= v <= hi:
            hits.append((i, v))
    return hits
 def _summarize_sidecar(side: dict) -> str:
    ev   = side.get("event", {})
    pv   = side.get("peak_values", {})
    pi   = side.get("project_info", {})
    bw   = side.get("blastware", {})
    return (
        f"  serial:     {ev.get('serial')}\n"
        f"  timestamp:  {ev.get('timestamp')}\n"
        f"  waveform:   {ev.get('waveform_key')}  ({ev.get('record_type')})\n"
        f"  sample_rate:{ev.get('sample_rate')} sps  rectime:{ev.get('rectime_seconds')}s\n"
        f"  bw file:    {bw.get('filename')}  ({bw.get('filesize')} B)\n"
        f"  peaks:      "
        f"Tran={pv.get('transverse'):.5f}  "
        f"Vert={pv.get('vertical'):.5f}  "
        f"Long={pv.get('longitudinal'):.5f}  "
        f"PVS={pv.get('vector_sum'):.5f} in/s  "
        f"Mic={pv.get('mic_psi'):.6e} psi"
        if all(pv.get(k) is not None for k in
               ("transverse", "vertical", "longitudinal", "vector_sum", "mic_psi"))
        else f"  peaks:      {pv}\n  project:    {pi}"
    ) + (
        f"\n  project:    {pi.get('project')!r}  / {pi.get('client')!r}  / "
        f"operator={pi.get('operator')!r}  loc={pi.get('sensor_location')!r}"
    )
 def dump_one(path: Path) -> int:
    side = json.loads(path.read_text(encoding="utf-8"))
    raw_b64 = (
        side.get("extensions", {})
            .get("raw_records", {})
            .get("waveform_record_b64")
    )
    if not raw_b64:
        print(f"\n=== {path} ===")
        print("  ! no extensions.raw_records.waveform_record_b64 — sidecar")
        print("    pre-dates raw-0C persistence (added in v0.15.x).  Re-save")
        print("    the event from the device to capture the bytes.")
        return 1
    raw = base64.b64decode(raw_b64)
    # Build anchor map
    anchors: dict[int, str] = {}
    for needle, label in _ANCHORS:
        i = raw.find(needle)
        if i >= 0:
            anchors[i] = label
    print(f"\n=== {path} ===")
    print("metadata claimed by sidecar:")
    print(_summarize_sidecar(side))
    print(f"\nraw 0C record  ({len(raw)} bytes):")
    print(_hex_dump(raw, anchors))
    # Float32 BE candidates in geo-relevant ranges
    geo_hits = _scan_float32_be(raw, 1e-5, 50.0)
    # Filter: only show hits that are NOT trivially the per-channel labels'
    # +6 PPV floats already documented (those will land in any sweep too).
    print("\nfloat32 BE candidates (1e-5 .. 50.0):")
    for off, v in geo_hits:
        annotation = ""
        for needle, _ in _ANCHORS[:4]:   # geo + mic labels
            i = raw.find(needle)
            if i >= 0 and off == i + 6:
                annotation = f"  ← {needle.decode()} PPV (label+6)"
                break
        print(f"    {off:#04x}  ({off:3d})  {v:>+15.6f}{annotation}")
    print("\nuint16 BE candidates ZC-Freq-ish (1..200):")
    for off, v in _scan_uint16_be(raw, 1, 200):
        if v < 5:    # too noisy at very low end
            continue
        print(f"    {off:#04x}  ({off:3d})  = {v}")
    print("\nuint16 BE candidates Time-of-Peak-ish if stored as ms (1..30000):")
    for off, v in _scan_uint16_be(raw, 1, 30000):
        if v < 100:  # noise filter
            continue
        # Only the first ~80 are worth showing — too many hits otherwise
        if off > 80:
            break
        print(f"    {off:#04x}  ({off:3d})  = {v} ms ?")
    print()
    return 0
 def main(argv: list[str] | None = None) -> int:
    p = argparse.ArgumentParser(
        description="Inspect a saved 0C waveform record from a sidecar JSON.",
    )
    p.add_argument(
        "sidecars",
        nargs="+",
        type=Path,
        help="Path(s) to <event>.sfm.json sidecar file(s).",
    )
    args = p.parse_args(argv)
    rc = 0
    for path in args.sidecars:
        try:
            rc |= dump_one(path)
        except Exception as exc:
            print(f"\n=== {path} ===\n  ERROR: {exc}", file=sys.stderr)
            rc |= 2
    return rc
 if __name__ == "__main__":
    sys.exit(main())
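The removed `sfm.dump_0c` tool found candidate peak fields by reading every byte offset of the raw 0C record as a big-endian float32 and keeping values in a plausible magnitude range. The sketch below is a trimmed version of that scan, not the original function: it folds the tool's separate sanity prefilter into a single `[lo, hi]` magnitude check, and `demo_record` is a hypothetical 12-byte buffer with a made-up 0.0125 in/s value placed six bytes after a `Tran` label, the label+6 layout listed in the tool's anchor table.

```python
import struct


def scan_float32_be(data: bytes, lo: float, hi: float) -> list[tuple[int, float]]:
    """Return (offset, value) for every offset whose next 4 bytes, read as
    a big-endian IEEE-754 float32, have a magnitude in [lo, hi]."""
    hits = []
    for off in range(len(data) - 3):  # last start offset that still has 4 bytes
        (v,) = struct.unpack_from(">f", data, off)
        if v != v:  # NaN is the only value not equal to itself
            continue
        if lo <= abs(v) <= hi:  # +/-inf fails the upper bound
            hits.append((off, v))
    return hits


# Hypothetical record: 4-byte label, 2 pad bytes, float32 at label+6, 2 pad bytes.
demo_record = b"Tran" + b"\x00\x00" + struct.pack(">f", 0.0125) + b"\x00\x00"

for off, v in scan_float32_be(demo_record, 1e-5, 50.0):
    tag = "  <- Tran PPV (label+6)" if off == demo_record.find(b"Tran") + 6 else ""
    print(f"{off:#04x}  {v:+.6f}{tag}")
```

Only offset 6 survives the filter on this buffer; every other 4-byte window decodes to either a huge exponent or a denormal. On a real 210-byte record the scan can also yield coincidental hits, which is why the tool's docstring pairs its output with the BW Event Report values.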
@@ -0,0 +1,909 @@
 <!DOCTYPE html>
 <html lang="en">
 <head>
  <meta charset="UTF-8" />
  <meta name="viewport" content="width=device-width, initial-scale=1.0" />
  <title>SFM Event Browser</title>
  <script src="https://cdnjs.cloudflare.com/ajax/libs/Chart.js/4.4.1/chart.umd.min.js"></script>
  <style>
    * { box-sizing: border-box; margin: 0; padding: 0; }
    body {
      background: #0d1117;
      color: #c9d1d9;
      font-family: 'Segoe UI', system-ui, sans-serif;
      font-size: 13px;
      height: 100vh;
      display: flex;
      flex-direction: column;
      overflow: hidden;
    }
    header {
      background: #161b22;
      border-bottom: 1px solid #30363d;
      padding: 12px 20px;
      display: flex;
      align-items: center;
      gap: 16px;
      flex-shrink: 0;
    }
    header h1 {
      font-size: 15px;
      font-weight: 600;
      color: #f0f6fc;
      white-space: nowrap;
    }
    label { color: #8b949e; font-size: 12px; }
    select, input[type="text"], input[type="search"] {
      background: #0d1117;
      border: 1px solid #30363d;
      border-radius: 6px;
      color: #c9d1d9;
      padding: 5px 8px;
      font-size: 13px;
    }
    select { min-width: 140px; }
    input[type="search"] { width: 200px; }
    select:focus, input:focus { outline: none; border-color: #388bfd; }
    button {
      background: #1f6feb;
      border: none;
      border-radius: 6px;
      color: #fff;
      cursor: pointer;
      font-size: 13px;
      font-weight: 500;
      padding: 5px 14px;
    }
    button:hover { background: #388bfd; }
    button:disabled { background: #21262d; color: #484f58; cursor: not-allowed; }
    #main {
      flex: 1;
      display: flex;
      overflow: hidden;
    }
    /* ── Event list (left sidebar) ────────────────────────────────── */
    #event-list-wrap {
      width: 320px;
      flex-shrink: 0;
      background: #0d1117;
      border-right: 1px solid #21262d;
      display: flex;
      flex-direction: column;
    }
    #event-list-header {
      padding: 10px 14px;
      border-bottom: 1px solid #21262d;
      font-size: 11px;
      color: #8b949e;
      text-transform: uppercase;
      letter-spacing: 0.06em;
      display: flex;
      justify-content: space-between;
    }
    #event-list {
      flex: 1;
      overflow-y: auto;
    }
    .event-row {
      padding: 8px 14px;
      border-bottom: 1px solid #161b22;
      cursor: pointer;
      transition: background 0.1s;
    }
    .event-row:hover { background: #161b22; }
    .event-row.active { background: #1f3a5f; border-left: 3px solid #58a6ff; padding-left: 11px; }
    .event-row .er-top {
      display: flex;
      justify-content: space-between;
      align-items: center;
      margin-bottom: 2px;
    }
    .event-row .er-ts { font-family: monospace; font-size: 12px; color: #c9d1d9; }
    .event-row .er-pvs { font-family: monospace; font-size: 12px; color: #58a6ff; font-weight: 600; }
    .event-row .er-meta { font-size: 11px; color: #8b949e; }
    .event-row.false_trigger .er-pvs { color: #f85149; text-decoration: line-through; }
    /* ── Main viewer (right side) ─────────────────────────────────── */
    #viewer {
      flex: 1;
      display: flex;
      flex-direction: column;
      overflow: hidden;
    }
    #event-meta {
      padding: 12px 20px;
      background: #161b22;
      border-bottom: 1px solid #21262d;
      display: grid;
      grid-template-columns: repeat(auto-fit, minmax(160px, 1fr));
      gap: 8px 24px;
      flex-shrink: 0;
    }
    .meta-field {
      display: flex;
      flex-direction: column;
      gap: 1px;
    }
    .meta-field .mf-label {
      font-size: 10px;
      color: #484f58;
      text-transform: uppercase;
      letter-spacing: 0.05em;
    }
    .meta-field .mf-value {
      font-family: monospace;
      font-size: 13px;
      color: #c9d1d9;
    }
    .meta-field .mf-value.highlight { color: #58a6ff; font-weight: 600; }
    #charts {
      flex: 1;
      overflow-y: auto;
      padding: 12px 16px;
      display: flex;
      flex-direction: column;
      gap: 10px;
    }
    .chart-wrap {
      background: #161b22;
      border: 1px solid #21262d;
      border-radius: 8px;
      padding: 10px 30px 8px 12px;  /* right padding leaves room for the "0.0" baseline label */
    }
    .chart-label {
      font-size: 11px;
      font-weight: 600;
      letter-spacing: 0.06em;
      text-transform: uppercase;
      margin-bottom: 4px;
      display: flex;
      justify-content: space-between;
    }
    .chart-canvas-wrap { position: relative; height: 130px; }
    .ch-tran { color: #58a6ff; }
    .ch-vert { color: #3fb950; }
    .ch-long { color: #d29922; }
    .ch-micl { color: #bc8cff; }
    #status-bar {
      background: #161b22;
      border-top: 1px solid #21262d;
      padding: 5px 20px;
      font-size: 12px;
      color: #8b949e;
      min-height: 26px;
      flex-shrink: 0;
    }
    #status-bar.error { color: #f85149; }
    #status-bar.ok    { color: #3fb950; }
    #empty-state {
      flex: 1;
      display: flex;
      flex-direction: column;
      align-items: center;
      justify-content: center;
      color: #484f58;
      gap: 8px;
    }
    #empty-state svg { opacity: 0.3; }
    .pill {
      background: #21262d;
      border-radius: 4px;
      padding: 2px 8px;
      color: #c9d1d9;
      font-family: monospace;
      font-size: 11px;
      margin-left: 8px;
    }
    /* Per-channel stats table in the metadata header */
    .stats-table {
      grid-column: 1 / -1;
      border-collapse: collapse;
      font-family: monospace;
      font-size: 12px;
      margin-top: 4px;
    }
    .stats-table th, .stats-table td {
      padding: 3px 14px 3px 0;
      text-align: left;
      color: #c9d1d9;
    }
    .stats-table th {
      color: #484f58;
      font-size: 10px;
      text-transform: uppercase;
      letter-spacing: 0.05em;
      font-weight: 500;
    }
    /* ── Print view (light theme matching the Instantel printout) ─── */
    body.print-view {
      background: #ffffff;
      color: #000000;
    }
    body.print-view header,
    body.print-view #event-list-wrap,
    body.print-view #event-list-header,
    body.print-view #event-meta,
    body.print-view #status-bar,
    body.print-view .chart-wrap {
      background: #ffffff;
      border-color: #cccccc;
      color: #000000;
    }
    body.print-view .event-row { color: #000; border-bottom-color: #eee; }
    body.print-view .event-row:hover { background: #f4f4f4; }
    body.print-view .event-row.active {
      background: #e6f0ff;
      border-left-color: #1f6feb;
    }
    body.print-view .er-ts { color: #000; }
    body.print-view .er-pvs { color: #003a8c; }
    body.print-view .er-meta,
    body.print-view #event-list-header,
    body.print-view .meta-field .mf-label,
    body.print-view .stats-table th {
      color: #666;
    }
    body.print-view .mf-value { color: #000; }
    body.print-view .mf-value.highlight { color: #003a8c; }
    body.print-view label { color: #444; }
    body.print-view input, body.print-view select {
      background: #fff; color: #000; border-color: #ccc;
    }
    /* In print theme, the channel-label colors stay (they identify
       the trace).  Only the chart panel background flips. */
    @media print {
      header, #event-list-wrap, #status-bar, button { display: none !important; }
      body { overflow: visible; height: auto; }
      #main, #viewer { overflow: visible; }
      #charts { overflow: visible; }
    }
  </style>
 </head>
 <body>
 <header>
  <h1>SFM Event Browser</h1>
  <label>Serial</label>
  <select id="serial-select">
    <option value="">Loading…</option>
  </select>
  <input type="search" id="event-filter" placeholder="filter events…" />
  <span class="pill" id="count-pill">—</span>
  <button id="mic-unit-toggle" style="margin-left:auto;background:#21262d"
          onclick="_setMicUnit(_getMicUnit() === 'dBL' ? 'psi' : 'dBL')"
          title="Toggle mic display unit (dBL ↔ psi). Persists across page loads.">
    Mic: dBL
  </button>
  <button id="print-btn" onclick="togglePrintView()" style="background:#21262d">Print view</button>
  <button id="reload-btn" onclick="loadSerials()">Reload</button>
 </header>
 <div id="main">
  <div id="event-list-wrap">
    <div id="event-list-header">
      <span>Events</span>
      <span id="event-list-count">—</span>
    </div>
    <div id="event-list"></div>
  </div>
  <div id="viewer">
    <div id="empty-state">
      <svg width="48" height="48" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="1.5">
        <polyline points="22 12 18 12 15 21 9 3 6 12 2 12"/>
      </svg>
      <p>Select a unit and event to view its waveform.</p>
    </div>
    <div id="event-meta" style="display:none"></div>
    <div id="charts" style="display:none"></div>
  </div>
 </div>
 <div id="status-bar">Ready.</div>
 <script>
 // Channel colors and rendering order mirror Instantel's BW Event Report
 // printout: MicL at the top, Tran at the bottom.  Colors approximate
 // what BW renders (magenta mic, blue long, green vert, red tran).
 const CHANNEL_COLORS = {
  MicL: '#e066ff',
  Long: '#3a80ff',
  Vert: '#3fb950',
  Tran: '#f85149',
 };
 const CHANNEL_ORDER = ['MicL', 'Long', 'Vert', 'Tran'];
 // Reference pressure for dB(L) — 20 µPa expressed in psi (≈ 2.9e-9 psi).
 const DBL_REF = 2.9e-9;
 // User-toggleable mic display unit: 'dBL' (default, matches BW printout
 // + the rest of SFM) or 'psi' (raw sample unit).
 function _getMicUnit() {
  return localStorage.getItem('sfm_mic_unit') === 'psi' ? 'psi' : 'dBL';
 }
 function _setMicUnit(u) {
  localStorage.setItem('sfm_mic_unit', u === 'psi' ? 'psi' : 'dBL');
  _refreshMicUnitToggle();
  if (currentEventId) loadEvent(currentEventId);
 }
 function _refreshMicUnitToggle() {
  const b = document.getElementById('mic-unit-toggle');
  if (b) b.textContent = `Mic: ${_getMicUnit()}`;
 }
 // psi → dB(L).  Null for non-positive (log undefined; Chart.js renders as a gap).
 function _psiToDbl(psi) {
  if (psi == null || !(psi > 0)) return null;
  return 20 * Math.log10(psi / DBL_REF);
 }
 // Per-sample mic chart conversion: rectify the AC waveform, convert to
 // dB(L), and clamp anything below MIC_DBL_FLOOR (60 dB) up to that floor.
 // Unlike raw _psiToDbl, which returns null for non-positive samples and
 // leaves gaps, this gives a continuous baseline instead of a spiky trace.
 const MIC_DBL_FLOOR = 60;
 function _psiToDblForChart(psi) {
  if (psi == null) return MIC_DBL_FLOOR;
  const a = Math.abs(psi);
  if (a === 0) return MIC_DBL_FLOOR;
  const dbl = 20 * Math.log10(a / DBL_REF);
  return dbl > MIC_DBL_FLOOR ? dbl : MIC_DBL_FLOOR;
 }
 // Format an ISO timestamp in the browser's local timezone.  UTC values
 // (with a 'Z' suffix) are converted; naive values are interpreted as
 // local clock time.  Returns '—' for null/empty input and the raw string
 // unchanged when it cannot be parsed.
 function _fmtTsLocal(iso) {
  if (!iso) return '—';
  const d = new Date(iso);
  if (isNaN(d)) return iso;
  return d.toLocaleString();
 }
 // Adaptive decimal formatter: scientific notation only for magnitudes
 // outside [1e-4, 1e4).  Normal-range peaks render as plain decimals with
 // 1 to 5 decimal places depending on magnitude (the previous version
 // forced toExponential(3), which produced ugly "2.500E-2 IN/S" labels).
 function _fmtPeak(v, unit) {
  if (v == null || (typeof v === 'number' && !isFinite(v))) return '';
  if (typeof v !== 'number') return String(v) + (unit ? ' ' + unit : '');
  if (v === 0) return '0' + (unit ? ' ' + unit : '');
  const a = Math.abs(v);
  const u = unit ? ' ' + unit : '';
  if (a >= 0.0001 && a < 10000) {
    const d = a >= 100 ? 1 : a >= 10 ? 2 : a >= 1 ? 3 : a >= 0.1 ? 4 : 5;
    return v.toFixed(d) + u;
  }
  return v.toExponential(2) + u;
 }
 let allEvents = [];
 let filteredEvents = [];
 let currentEventId = null;
 let charts = {};
 const apiBase = window.location.origin;
 function setStatus(msg, cls = '') {
  const bar = document.getElementById('status-bar');
  bar.textContent = msg;
  bar.className = cls;
 }
 async function loadSerials() {
  setStatus('Loading serials…');
  try {
    const r = await fetch(`${apiBase}/db/units`);
    if (!r.ok) throw new Error(r.statusText);
    // /db/units returns a bare list[dict], not {units:[...]}
    const units = await r.json();
    const sel = document.getElementById('serial-select');
    sel.innerHTML = '';
    if (!units || units.length === 0) {
      sel.innerHTML = '<option value="">(no units found)</option>';
      setStatus('No units in DB.', 'error');
      return;
    }
    sel.innerHTML = '<option value="">— pick a unit —</option>' +
      units.map(u => {
        const n = u.total_events ?? 0;
        return `<option value="${u.serial}">${u.serial}  (${n} events)</option>`;
      }).join('');
    setStatus(`Loaded ${units.length} units.`, 'ok');
  } catch (e) {
    setStatus(`Failed to load units: ${e.message}`, 'error');
  }
 }
 async function loadEventsForSerial(serial) {
  if (!serial) {
    allEvents = [];
    renderEventList();
    return;
  }
  setStatus(`Loading events for ${serial}…`);
  try {
    const r = await fetch(`${apiBase}/db/events?serial=${encodeURIComponent(serial)}&limit=500`);
    if (!r.ok) throw new Error(r.statusText);
    const d = await r.json();
    allEvents = d.events || [];
    document.getElementById('count-pill').textContent = `${allEvents.length} events`;
    applyFilter();
    setStatus(`Loaded ${allEvents.length} events for ${serial}.`, 'ok');
  } catch (e) {
    setStatus(`Failed to load events: ${e.message}`, 'error');
  }
 }
 function applyFilter() {
  const q = document.getElementById('event-filter').value.toLowerCase().trim();
  if (!q) {
    filteredEvents = allEvents;
  } else {
    filteredEvents = allEvents.filter(ev =>
      (ev.blastware_filename || '').toLowerCase().includes(q) ||
      (ev.timestamp           || '').toLowerCase().includes(q) ||
      (ev.record_type         || '').toLowerCase().includes(q) ||
      (ev.project             || '').toLowerCase().includes(q)
    );
  }
  document.getElementById('event-list-count').textContent = `${filteredEvents.length} / ${allEvents.length}`;
  renderEventList();
 }
 function renderEventList() {
  const list = document.getElementById('event-list');
  list.innerHTML = '';
  if (filteredEvents.length === 0) {
    list.innerHTML = '<div style="padding:14px;color:#484f58;font-size:12px">No events.</div>';
    return;
  }
  for (const ev of filteredEvents) {
    const row = document.createElement('div');
    row.className = 'event-row' + (ev.false_trigger ? ' false_trigger' : '');
    if (ev.id === currentEventId) row.className += ' active';
    const ts = _fmtTsLocal(ev.timestamp);
    const pvs = ev.peak_vector_sum != null ? `${ev.peak_vector_sum.toFixed(3)} in/s` : '—';
    row.innerHTML = `
      <div class="er-top">
        <span class="er-ts">${ts || '(no ts)'}</span>
        <span class="er-pvs">${pvs}</span>
      </div>
      <div class="er-meta">${ev.record_type || '?'} · ${ev.blastware_filename || ev.id.slice(0,8)}</div>
    `;
    row.onclick = () => loadEvent(ev.id);
    list.appendChild(row);
  }
 }
 async function loadEvent(eventId) {
  currentEventId = eventId;
  renderEventList();
  setStatus('Loading waveform…');
  try {
    // The sidecar fetch runs in parallel with the waveform fetch.  Its
    // bw_report block carries ZC Freq, above-range flags and sensor-check
    // results; the per-channel stats table surfaces the ZC Freq and
    // above-range flags.  Failures are non-fatal (legacy events without a
    // preserved .TXT have no sidecar bw_report).
    const sidecarP = fetch(`${apiBase}/db/events/${eventId}/sidecar`)
      .then(r => r.ok ? r.json() : null)
      .catch(() => null);
    const r = await fetch(`${apiBase}/db/events/${eventId}/waveform.json`);
    if (!r.ok) {
      if (r.status === 404) {
        showEmpty('No waveform data for this event (codec returned no samples).');
        return;
      }
      throw new Error(r.statusText);
    }
    const data = await r.json();
    renderWaveform(data);
    // Look up this event's row in the already-loaded list for a richer header.
    const ev = allEvents.find(e => e.id === eventId);
    const sidecar = await sidecarP;
    renderMeta(data, ev, sidecar);
    setStatus(`Event loaded.`, 'ok');
  } catch (e) {
    setStatus(`Failed to load event: ${e.message}`, 'error');
    showEmpty(`Error: ${e.message}`);
  }
 }
 function showEmpty(msg) {
  document.getElementById('empty-state').style.display = 'flex';
  document.getElementById('empty-state').querySelector('p').textContent = msg;
  document.getElementById('event-meta').style.display = 'none';
  document.getElementById('charts').style.display = 'none';
  Object.values(charts).forEach(c => c.destroy());
  charts = {};
 }
 function renderMeta(data, ev, sidecar) {
  const metaDiv = document.getElementById('event-meta');
  const fields = [
    ['Serial',      data.serial || ev?.serial || '—'],
    ['Timestamp',   _fmtTsLocal(data.timestamp || ev?.timestamp)],
    ['Record',      data.record_type || ev?.record_type || '—'],
    ['Sample rate', data.sample_rate ? `${data.sample_rate} sps` : '—'],
    ['Geo range',   data.geo_range ? `${data.geo_range} (${data.geo_full_scale_ips} in/s FS)` : '—'],
    ['Project',     ev?.project || '—'],
    ['Location',    ev?.sensor_location || '—'],
    ['Peak Vector Sum',
                    ev?.peak_vector_sum != null ? `${ev.peak_vector_sum.toFixed(4)} in/s` : '—'],
  ];
  // Per-channel stats table mirroring the printout's middle block.
  // PPV from the events DB row; ZC Freq + saturation flags from the
  // sidecar's bw_report block (when a .TXT was preserved on ingest).
  const bwrPeaks = (sidecar?.bw_report || {}).peaks || {};
  const bwrMic   = (sidecar?.bw_report || {}).mic   || {};
  const fmt = v => (v == null ? '—' : (typeof v === 'number' ? v.toFixed(3) : v));
  const fmtZc = bwr => {
    if (!bwr || bwr.zc_freq_hz == null) return '—';
    const prefix = bwr.zc_freq_above_range ? '>' : '';
    return `${prefix}${Math.round(bwr.zc_freq_hz)} Hz`;
  };
  const rows = [
    ['Tran', ev?.tran_ppv, fmtZc(bwrPeaks.tran)],
    ['Vert', ev?.vert_ppv, fmtZc(bwrPeaks.vert)],
    ['Long', ev?.long_ppv, fmtZc(bwrPeaks.long)],
  ];
  // Mic display honors the current user preference (dBL default).
  // mic_ppv is stored as raw psi on series3 events; convert when needed.
  const micPsi = ev?.mic_ppv;
  const micUnitDisplay = _getMicUnit();
  let micStr;
  if (micPsi == null) {
    micStr = '—';
  } else if (micUnitDisplay === 'dBL') {
    const d = _psiToDbl(Number(micPsi));
    micStr = (d != null ? d.toFixed(1) : '—') + ' dBL';
  } else {
    micStr = Number(micPsi).toExponential(2) + ' psi';
  }
  const statsHtml = `
    <table class="stats-table">
      <thead>
        <tr><th>Channel</th><th>PPV (in/s)</th><th>ZC Freq</th></tr>
      </thead>
      <tbody>
        ${rows.map(([ch, ppv, zc]) => `<tr><td>${ch}</td><td>${fmt(ppv)}</td><td>${zc}</td></tr>`).join('')}
        <tr><td>MicL</td><td>${micStr}</td><td>${fmtZc(bwrMic)}</td></tr>
      </tbody>
    </table>
  `;
  metaDiv.innerHTML =
    fields.map(([l, v]) =>
      `<div class="meta-field"><span class="mf-label">${l}</span><span class="mf-value${l === 'Peak Vector Sum' ? ' highlight' : ''}">${v}</span></div>`
    ).join('') + statsHtml;
  metaDiv.style.display = 'grid';
 }
 function togglePrintView() {
  document.body.classList.toggle('print-view');
  // Force chart redraw so axis/grid colors are re-evaluated against the
  // new background.  The simplest way is to re-render the current event.
  if (currentEventId) {
    loadEvent(currentEventId);
  }
 }
 function renderWaveform(data) {
  document.getElementById('empty-state').style.display = 'none';
  const chartsDiv = document.getElementById('charts');
  chartsDiv.style.display = 'flex';
  chartsDiv.innerHTML = '';
  Object.values(charts).forEach(c => c.destroy());
  charts = {};
  const channels = data.channels || {};
  // time_axis is METADATA from sfm.plot.v1 — sample_rate, pretrig_samples,
  // t0_ms (first-sample time relative to trigger; negative when pretrig
  // exists), dt_ms.  Trigger is at t=0 by convention.
  const ta    = data.time_axis || {};
  const sr    = ta.sample_rate || 1024;
  const dtMs  = ta.dt_ms || (1000.0 / sr);
  const t0Ms  = ta.t0_ms != null ? ta.t0_ms : 0;
  const isPrintMode = document.body.classList.contains('print-view');
  // Histograms record per-interval peaks (typically one per 1-minute or
  // 5-minute interval), not per-sample waveforms.  Render them as a tight
  // bar graph instead of a line plot, matching the BW Event Report's
  // histogram presentation.
  const isHistogram = String(data.record_type || '').toLowerCase().includes('histogram');
  // Which channels actually have data → determines which one renders the
  // shared x-axis at the bottom (Instantel printout has the time scale
  // only on the bottom-most chart).
  const channelsWithData = CHANNEL_ORDER.filter(ch =>
    channels[ch] && (channels[ch].values || []).length > 0
  );
  const lastDataCh = channelsWithData[channelsWithData.length - 1];
  const micUnit = _getMicUnit();
  for (const ch of CHANNEL_ORDER) {
    const chData = channels[ch];
    if (!chData) continue;
    if ((chData.values || []).length === 0) {
      // Render an empty card so user sees the channel exists but is missing
      const wrap = document.createElement('div');
      wrap.className = 'chart-wrap';
      wrap.innerHTML = `
        <div class="chart-label ch-${ch.toLowerCase()}">
          <span>${ch}</span>
          <span style="color:#484f58">no samples decoded</span>
        </div>
        <div class="chart-canvas-wrap" style="display:flex;align-items:center;justify-content:center;color:#484f58;font-size:12px">empty</div>
      `;
      chartsDiv.appendChild(wrap);
      continue;
    }
    // Mic channel: convert from raw psi to dB(L) when the user prefers dBL
    // (the default).  We mutate `values`, `peak`, and `unit` locally so the
    // chart datasets + axis title + tooltip + peak label all stay aligned.
    let values = chData.values || [];
    let unit  = chData.unit || 'unit';
    let peak  = chData.peak;
    const peakT = chData.peak_t_ms;
    if (ch === 'MicL' && unit === 'psi' && micUnit === 'dBL') {
      // Per-sample chart uses rectified-and-floored conversion so the
      // baseline is continuous; the peak label uses the unrectified
      // converter to preserve the true measurement.
      values = values.map(_psiToDblForChart);
      peak   = _psiToDbl(peak);
      unit   = 'dB(L)';
    }
    const peakLabel = peak != null
      ? `peak ${_fmtPeak(peak, unit)}`
        + (!isHistogram && peakT != null ? ` @ ${peakT.toFixed(1)} ms` : '')
      : '';
    // Hide x-axis on every chart except the bottom-most data channel —
    // gives the "single shared time axis" feel of the BW printout.
    const showXAxis = (ch === lastDataCh);
    const wrap = document.createElement('div');
    wrap.className = 'chart-wrap';
    const lbl = document.createElement('div');
    lbl.className = `chart-label ch-${ch.toLowerCase()}`;
    lbl.innerHTML = `<span>${ch}</span><span style="color:#8b949e;font-weight:normal">${peakLabel}</span>`;
    wrap.appendChild(lbl);
    const canvasWrap = document.createElement('div');
    canvasWrap.className = 'chart-canvas-wrap';
    const canvas = document.createElement('canvas');
    canvasWrap.appendChild(canvas);
    wrap.appendChild(canvasWrap);
    chartsDiv.appendChild(wrap);
    // Waveform: per-sample time in ms relative to trigger (negative for pretrig).
    // Histogram: when the server has aggregated to BW-reported intervals AND
    // provides per-interval timestamps, use those as x-axis labels (HH:MM:SS).
    // Falls back to interval index.
    let times;
    if (isHistogram) {
      const intervalTimes = ta.interval_times || [];
      times = (intervalTimes.length === values.length)
        ? intervalTimes
        : values.map((_, i) => i + 1);
    } else {
      times = values.map((_, i) => t0Ms + i * dtMs);
    }
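    // Decimate to at most MAX_POINTS points for rendering (keep every
    // step-th sample).  This can skip the exact peak sample; the y-axis
    // bounds below are computed from the full `values` array.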
    const MAX_POINTS = 4000;
    let rT = times, rV = values;
    if (values.length > MAX_POINTS) {
      const step = Math.ceil(values.length / MAX_POINTS);
      rT = times.filter((_, i) => i % step === 0);
      rV = values.filter((_, i) => i % step === 0);
    }
    // Tick formatter — round to 1 decimal so we don't get
    // "11.7187040000000002 ms" garbage from floating-point accumulation.
    const xAxisUnit = isHistogram ? '' : ' ms';
    const fmtTick = i => {
      const v = rT[i];
      if (typeof v !== 'number') return String(v) + xAxisUnit;
      return (Number.isInteger(v) ? String(v) : v.toFixed(1)) + xAxisUnit;
    };
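    // Y-axis bounds.  Geophone waveforms render symmetric around zero
    // (seismograph convention: zero line in the middle, signal goes up
    // AND down).  Geophone and psi-mic histograms are pinned at zero with
    // a minimum visible range so quiet events look quiet; mic dB(L) spans
    // from the noise floor to just above the peak.  Only a mic waveform
    // in psi falls through to Chart.js auto-scaling.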
    let yBounds = {};
    const isGeo = ch !== 'MicL';
    if (isGeo && !isHistogram) {
      // Waveform geo: symmetric around zero for full shape detail.
      let absMax = 0;
      for (const v of values) {
        const a = Math.abs(v);
        if (a > absMax) absMax = a;
      }
      const padded = (absMax || 1) * 1.10;
      yBounds = { min: -padded, max: padded };
    } else if (isGeo && isHistogram) {
      // Histogram geo: enforce minimum chart range so quiet events
      // look quiet (matches BW's near-fixed-scale convention).
      const HIST_GEO_MIN_INS = 0.05;
      let p = 0;
      for (const v of values) { const a = Math.abs(v); if (a > p) p = a; }
      yBounds = { min: 0, max: Math.max(p * 1.10, HIST_GEO_MIN_INS) };
    } else if (ch === 'MicL' && micUnit === 'dBL') {
      // Mic dBL: baseline at the noise floor (MIC_DBL_FLOOR), top at
      // peak + 5 dB (100 dB if the peak is unknown), never less than a
      // 20 dB span.
      const peakDbl = (typeof peak === 'number' && isFinite(peak))
        ? peak + 5
        : 100;
      yBounds = { min: MIC_DBL_FLOOR, max: Math.max(peakDbl, MIC_DBL_FLOOR + 20) };
    } else if (ch === 'MicL' && isHistogram && micUnit === 'psi') {
      // Mic histogram in psi: same minimum-range treatment as geo.
      const HIST_MIC_MIN_PSI = 0.001;
      let p = 0;
      for (const v of values) { const a = Math.abs(v); if (a > p) p = a; }
      yBounds = { min: 0, max: Math.max(p * 1.10, HIST_MIC_MIN_PSI) };
    }
    const chart = new Chart(canvas, {
      type: isHistogram ? 'bar' : 'line',
      data: {
        labels: rT.map(t => (typeof t === 'number' ? (Number.isInteger(t) ? String(t) : t.toFixed(2)) : t)),
        datasets: isHistogram ? [{
          data: rV,
          backgroundColor: CHANNEL_COLORS[ch],
          borderWidth: 0,
          barPercentage: 1.0,
          categoryPercentage: 1.0,  // bars touch — tight bargraph
        }] : [{
          data: rV,
          borderColor: CHANNEL_COLORS[ch],
          borderWidth: 1,
          pointRadius: 0,
          tension: 0,
        }],
      },
      options: {
        animation: false,
        responsive: true,
        maintainAspectRatio: false,
        plugins: {
          legend: { display: false },
          tooltip: {
            mode: 'index',
            intersect: false,
            callbacks: {
              title: items => isHistogram
                ? `interval ${items[0].label}`
                : `t = ${items[0].label} ms`,
              label: item => `${ch}: ${_fmtPeak(item.raw, unit)}`,
            },
          },
        },
        scales: {
          x: {
            type: 'category',
            display: showXAxis,
            ticks: {
              color: isPrintMode ? '#666' : '#484f58',
              maxTicksLimit: 10,
              maxRotation: 0,
              callback: (val, i) => fmtTick(i),
            },
            grid: { color: isPrintMode ? '#e0e0e0' : '#21262d', drawTicks: showXAxis },
          },
          y: {
            ...yBounds,
            ticks: { color: isPrintMode ? '#666' : '#484f58', maxTicksLimit: 5 },
            grid: { color: isPrintMode ? '#e0e0e0' : '#21262d' },
            title: { display: true, text: unit,
                     color: isPrintMode ? '#666' : '#484f58', font: { size: 10 } },
          },
        },
      },
      plugins: isHistogram ? [] : [{
        // Trigger line @ t=0 + triangle markers above/below + "0.0"
        // baseline label on the right edge.  Matches the Instantel
        // BW Event Report printout style.  Skipped for histograms —
        // they have no trigger event.
        id: 'instantelOverlays',
        afterDraw(chart) {
          const ctx   = chart.ctx;
          const xAxis = chart.scales.x;
          const yAxis = chart.scales.y;
          const fgPrim = isPrintMode ? '#000' : '#c9d1d9';
          const fgTrigger = '#f85149';
          // Dashed vertical trigger line at t=0
          const zeroIdx = rT.findIndex(t => parseFloat(t) >= 0);
          if (zeroIdx >= 0) {
            const x = xAxis.getPixelForValue(zeroIdx);
            ctx.save();
            ctx.beginPath();
            ctx.moveTo(x, yAxis.top);
            ctx.lineTo(x, yAxis.bottom);
            ctx.strokeStyle = isPrintMode ? '#cc0000' : 'rgba(248, 81, 73, 0.8)';
            ctx.lineWidth = 1.2;
            ctx.setLineDash([4, 3]);
            ctx.stroke();
            ctx.restore();
            // Triangles above and below the chart at the trigger column
            ctx.save();
            ctx.fillStyle = fgTrigger;
            ctx.beginPath();  // top triangle pointing down
            ctx.moveTo(x - 5, yAxis.top - 8);
            ctx.lineTo(x + 5, yAxis.top - 8);
            ctx.lineTo(x,     yAxis.top - 1);
            ctx.closePath();
            ctx.fill();
            ctx.beginPath();  // bottom triangle pointing up
            ctx.moveTo(x - 5, yAxis.bottom + 8);
            ctx.lineTo(x + 5, yAxis.bottom + 8);
            ctx.lineTo(x,     yAxis.bottom + 1);
            ctx.closePath();
            ctx.fill();
            ctx.restore();
          }
          // "0.0" baseline label on the right edge — printout convention.
          // Position vertically at the zero-amplitude level.
          const zeroY = yAxis.getPixelForValue(0);
          if (zeroY >= yAxis.top && zeroY <= yAxis.bottom) {
            ctx.save();
            ctx.strokeStyle = isPrintMode ? '#aaa' : '#30363d';
            ctx.lineWidth = 0.8;
            ctx.setLineDash([2, 2]);
            ctx.beginPath();
            ctx.moveTo(xAxis.left, zeroY);
            ctx.lineTo(xAxis.right, zeroY);
            ctx.stroke();
            ctx.restore();
            ctx.save();
            ctx.fillStyle = fgPrim;
            ctx.font = '11px monospace';
            ctx.textAlign = 'left';
            ctx.textBaseline = 'middle';
            ctx.fillText('0.0', xAxis.right + 6, zeroY);
            ctx.restore();
          }
        },
      }],
    });
    charts[ch] = chart;
  }
 }
 // Wire up handlers
 document.getElementById('serial-select').addEventListener('change', e => {
  loadEventsForSerial(e.target.value);
 });
 document.getElementById('event-filter').addEventListener('input', applyFilter);
 // Reflect any persisted mic-unit preference in the header pill on load
 _refreshMicUnitToggle();
 // Initial load
 loadSerials();
 </script>
 </body>
 </html>
@@ -166,6 +166,7 @@ def main(argv: list[str] | None = None) -> int:
                    {ev._waveform_key.hex(): rec}
                    if ev._waveform_key else None
                ),
                device_family="series3",
            )
            tag = "OK  " if ins else ("SKIP" if sk else "OK  ")
            print(f"  [{tag}] {path.name}  → {rec['filename']}  "
@@ -0,0 +1,939 @@
 """
 sfm/report_pdf.py — generate Instantel-style Event Report PDFs.
 Stub layout for v0.20.0: the exact visual is still being iterated against
 real Blastware reference PDFs (uploaded to docs/reference/instantel/).
 Current output captures all the data fields a real BW Event Report
 contains, but the visual hierarchy / spacing is still approximate.
 Architecture
 ────────────
 1. ``gather_report_data(db, store, event_id)`` assembles a flat
   ``ReportData`` record from three sources: the SeismoDb events row,
   the .sfm.json sidecar (bw_report block), and the .h5 waveform
   samples.  Returns ``None`` when the event doesn't exist or its row
   lacks a serial or Blastware filename; a missing .h5 yields a record
   with empty ``channels``.
 2. ``render_event_report_pdf(rd)`` takes that record and produces a
   single-page letter-sized PDF as bytes, using matplotlib's PDF
   backend (vector output, no rasterization, prints cleanly).
 3. The HTTP endpoint at ``/db/events/{id}/report.pdf`` wires them
   together: fetch event → gather → render → stream bytes back with
   ``Content-Type: application/pdf``.
 What's in the report (every field BW's printout includes):
  Header (left):  Date/Time, Trigger Source, Range, Sample Rate, Notes,
                  Project, Client, User Name, Seis. Loc
  Header (right): Serial + firmware, Battery, Calibration, File Name,
                  Post Event Notes
  Mic block:      PSPL (dBL + psi), ZC Freq, Channel Test result
  Stats table:    per-channel PPV / ZC Freq / Time of Peak /
                  Peak Acceleration / Peak Displacement / Sensor Check
  Peak Vector Sum
  Waveform plot:  4 channels stacked (MicL/Long/Vert/Tran), shared
                  time axis, trigger marker, peak markers
  USBM RI8507/OSMRE compliance chart:  STUBBED — separate work item
 Histogram events: the layout differs (Number of Intervals header
 field, no trigger marker, per-interval bar chart instead of waveform).
 Handled via a record_type branch in ``render_event_report_pdf``.
 """
 from __future__ import annotations
 import io
 import json
 import logging
 import math
 from dataclasses import dataclass, field
 from pathlib import Path
 from typing import Optional
 import matplotlib
 matplotlib.use("Agg")   # headless — no display required
 import matplotlib.pyplot as plt
 import numpy as np
 from matplotlib.backends.backend_pdf import PdfPages
 log = logging.getLogger(__name__)
 # Reference pressure for dB(L) conversion: 20 µPa expressed in psi.
 DBL_REF_PSI = 2.9e-9
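The dB(L) figure is the usual sound-pressure-level ratio, dB(L) = 20·log10(p / p_ref), with p_ref = 20 µPa = 20e-6 / 6894.757 psi ≈ 2.90e-9 psi (the constant above rounds this to 2.9e-9). A minimal round-trip sketch; `psi_to_dbl` and `dbl_to_psi` are hypothetical names for illustration, not functions defined in this module:

```python
import math

# 20 µPa reference pressure expressed in psi (20e-6 Pa / 6894.757 Pa per psi).
DBL_REF_PSI = 20e-6 / 6894.757


def psi_to_dbl(psi: float) -> float:
    """Hypothetical helper: peak sound pressure in psi -> dB(L)."""
    if psi <= 0:
        # log10 is undefined here; a chart would rectify/floor first.
        raise ValueError("dB(L) is undefined for non-positive pressure")
    return 20.0 * math.log10(psi / DBL_REF_PSI)


def dbl_to_psi(dbl: float) -> float:
    """Hypothetical helper: dB(L) -> psi (inverse of psi_to_dbl)."""
    return DBL_REF_PSI * 10 ** (dbl / 20.0)


if __name__ == "__main__":
    # 120 dB(L) is 10**6 times the reference pressure.
    print(f"{dbl_to_psi(120.0):.3e} psi")
```

Because mic samples are signed pressures, a per-sample chart has to rectify and floor them before taking the log (what the viewer's `_psiToDblForChart` comment describes); the sketch simply rejects non-positive input.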
 # ── Data assembly ────────────────────────────────────────────────────────────
@dataclass
 class ReportData:
    """All fields needed to render an Instantel-style Event Report.
    Most fields are Optional — BW's printout shows '—' or just omits
    sections when source data is missing.  The renderer mirrors that.
    """
    # Header — left column
    event_datetime_str: Optional[str] = None
    trigger_source:     Optional[str] = None
    geo_range_str:      Optional[str] = None
    sample_rate_str:    Optional[str] = None
    notes:              Optional[str] = None
    project:            Optional[str] = None
    client:             Optional[str] = None
    operator:           Optional[str] = None
    sensor_location:    Optional[str] = None
    # Header — right column
    serial:                 Optional[str] = None
    firmware:               Optional[str] = None
    battery_volts:          Optional[float] = None
    calibration_date:       Optional[str] = None
    calibration_by:         Optional[str] = None
    file_name:              Optional[str] = None
    post_event_notes:       Optional[str] = None
    # Microphone block
    mic_pspl_dbl:           Optional[float] = None
    mic_pspl_psi:           Optional[float] = None
    mic_pspl_time_s:        Optional[float] = None
    mic_pspl_when_str:      Optional[str] = None    # histogram absolute date+time, BW-formatted
    mic_zc_freq_hz:         Optional[float] = None
    mic_zc_freq_above_range: bool           = False
    mic_channel_test_result: Optional[str] = None
    mic_channel_test_freq_hz: Optional[float] = None
    mic_channel_test_amp_mv: Optional[float] = None
    # Per-channel stats — list of dicts (one per channel)
    # Keys: name, ppv_ips, zc_freq_hz, time_of_peak_s,
    #       peak_accel_g, peak_disp_in, sensor_check
    channel_stats:          list[dict] = field(default_factory=list)
    # Peak Vector Sum
    peak_vector_sum_ips:    Optional[float] = None
    peak_vector_sum_time_s: Optional[float] = None
    # Waveform samples — channels[ch] = list of floats in physical units
    # Time axis derived from sample_rate + pretrig_samples
    channels:               dict = field(default_factory=dict)
    sample_rate_sps:        Optional[int] = None
    pretrig_samples:        Optional[int] = None
    t0_ms:                  Optional[float] = None
    dt_ms:                  Optional[float] = None
    # Record-type discriminator
    record_type:            Optional[str] = None
    is_histogram:           bool = False
    # Histogram-only fields: populated only when record_type starts with 'hist' (case-insensitive)
    histogram_start_str:    Optional[str] = None       # "22:30:38 May 16, 2026"
    histogram_stop_str:     Optional[str] = None
    histogram_n_intervals:  Optional[float] = None     # 4.00
    histogram_interval_size: Optional[str] = None      # "1 minute"
    histogram_interval_size_s: Optional[float] = None  # 60.0 — numeric seconds, used to derive interval_times
    histogram_interval_times: list[str] = field(default_factory=list)  # per-interval timestamps for x-axis
    # Peak Vector Sum metadata (histograms show absolute date+time)
    peak_vector_sum_when_str: Optional[str] = None
    # Bookkeeping
    event_id:               Optional[str] = None
    server_received_at:     Optional[str] = None
    bw_pc_sw_version:       Optional[str] = None
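The waveform fields (`sample_rate_sps`, `pretrig_samples`, `t0_ms`, `dt_ms`) follow the sfm.plot.v1 convention the viewer uses: trigger at t = 0, pretrigger samples at negative times. A sketch of deriving per-sample times; `sample_times_ms` is a hypothetical name, and the fallback t0_ms = -pretrig_samples × dt_ms is an assumption (the .h5 helper normally supplies t0_ms directly):

```python
from typing import List, Optional


def sample_times_ms(
    n_samples: int,
    sample_rate_sps: int,
    pretrig_samples: int = 0,
    t0_ms: Optional[float] = None,
    dt_ms: Optional[float] = None,
) -> List[float]:
    """Per-sample times in ms relative to the trigger (hypothetical helper).

    Assumption: when t0_ms is not supplied, the first sample sits
    pretrig_samples sample periods before the trigger.
    """
    if dt_ms is None:
        dt_ms = 1000.0 / sample_rate_sps
    if t0_ms is None:
        t0_ms = -pretrig_samples * dt_ms
    # Multiply rather than accumulate so rounding error doesn't build up.
    return [t0_ms + i * dt_ms for i in range(n_samples)]


times = sample_times_ms(1024 + 256, sample_rate_sps=1024, pretrig_samples=256)
print(times[0], times[256])  # → -250.0 0.0 (first sample, trigger sample)
```

Computing `t0 + i * dt` per index, as the viewer's `renderWaveform` also does, avoids the accumulated floating-point drift that the tick formatter otherwise has to round away.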
 def gather_report_data(
    db,
    store,
    event_id: str,
 ) -> Optional[ReportData]:
    """Collect every field needed to render an event report.
    Returns ``None`` if the event is unknown or its row has no serial
    or Blastware filename.  Waveform data is optional: when the .h5 is
    missing or unreadable, the record is returned with empty
    ``channels``.
    """
    row = db.get_event(event_id)
    if row is None:
        return None
    serial   = row.get("serial")
    filename = row.get("blastware_filename")
    if not serial or not filename:
        return None
    rd = ReportData(
        event_id=event_id,
        serial=serial,
        file_name=filename,
        record_type=row.get("record_type"),
        is_histogram=str(row.get("record_type", "")).lower().startswith("hist"),
        event_datetime_str=row.get("timestamp"),
        sample_rate_sps=row.get("sample_rate"),
        project=row.get("project"),
        client=row.get("client"),
        operator=row.get("operator"),
        sensor_location=row.get("sensor_location"),
        server_received_at=row.get("created_at"),
    )
    # ── Sidecar bw_report — the rich BW-derived fields ──
    sidecar_path = store.sidecar_path_for(serial, filename)
    if sidecar_path.exists():
        try:
            sc = json.loads(sidecar_path.read_text())
        except Exception as exc:
            log.warning("gather_report_data: sidecar read failed: %s", exc)
            sc = {}
        bw = sc.get("bw_report") or {}
        # Trigger / range / sample-rate display
        trig = bw.get("trigger") or {}
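        trig_ch  = trig.get("channel")
        trig_lvl = trig.get("geo_level_ips")
        if trig_ch and trig_lvl is not None:
            rd.trigger_source = f"{trig_ch}: {trig_lvl} in/s"
        elif trig_lvl is not None:
            rd.trigger_source = f"{trig_lvl} in/s"
        else:
            # Channel alone (or nothing): avoid rendering "None in/s".
            rd.trigger_source = trig_ch or None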
        rec = bw.get("recording") or {}
        rd.geo_range_str = (
            f"Geo: {rec.get('geo_range_ips')} in/s"
            if rec.get("geo_range_ips") is not None else None
        )
        rt = rec.get("record_time_s")
        if rt is not None and rd.sample_rate_sps:
            rd.sample_rate_str = f"{rt:.1f} sec At {rd.sample_rate_sps} Sps"
        # Device block
        dev = bw.get("device") or {}
        rd.battery_volts    = dev.get("battery_volts")
        rd.calibration_date = dev.get("calibration_date")
        rd.calibration_by   = dev.get("calibration_by")
        rd.firmware         = bw.get("version")
        rd.bw_pc_sw_version = bw.get("pc_sw_version")
        # Microphone block
        mic = bw.get("mic") or {}
        rd.mic_pspl_dbl    = mic.get("pspl_dbl")
        if rd.mic_pspl_dbl is not None and rd.mic_pspl_dbl > 0:
            # Inverse of the dBL formula → psi.  Mirrors waveform_codec convention.
            rd.mic_pspl_psi = DBL_REF_PSI * (10 ** (rd.mic_pspl_dbl / 20))
        rd.mic_pspl_time_s = mic.get("time_of_peak_s")
        rd.mic_zc_freq_hz             = mic.get("zc_freq_hz")
        rd.mic_zc_freq_above_range    = bool(mic.get("zc_freq_above_range"))
        sc_mic = (bw.get("sensor_check") or {}).get("mic") or {}
        rd.mic_channel_test_result   = sc_mic.get("result")
        rd.mic_channel_test_freq_hz  = sc_mic.get("freq_hz")
        rd.mic_channel_test_amp_mv   = sc_mic.get("amplitude_mv")
        # Per-channel stats (Tran / Vert / Long).  Per-channel peak
        # date+time for histograms comes from bw_report.histogram.channel_peak_when
        # (populated when the parser captured it; see the bw_ascii_report
        # parser's histogram-fields handler).
        peaks = bw.get("peaks") or {}
        sc_block = bw.get("sensor_check") or {}
        hist_block = bw.get("histogram") or {}
        peak_when = hist_block.get("channel_peak_when") or {}
        for ch_lc, ch_label in (("tran", "Tran"), ("vert", "Vert"), ("long", "Long")):
            ch = peaks.get(ch_lc) or {}
            sc_ch = sc_block.get(ch_lc) or {}
            ch_when_iso = peak_when.get(ch_label)
            peak_date, peak_time = _split_iso_to_date_time(ch_when_iso)
            rd.channel_stats.append({
                "name":               ch_label,
                "ppv_ips":            ch.get("ppv_ips"),
                "zc_freq_hz":         ch.get("zc_freq_hz"),
                "zc_freq_above_range": bool(ch.get("zc_freq_above_range")),
                "time_of_peak_s":     ch.get("time_of_peak_s"),
                "peak_accel_g":       ch.get("peak_accel_g"),
                "peak_disp_in":       ch.get("peak_disp_in"),
                "sensor_check":       sc_ch.get("result"),
                "peak_date":          peak_date,
                "peak_time":          peak_time,
            })
        # MicL peak time (used in the mic block — "PSPL ... on DATE at TIME")
        mic_when_iso = peak_when.get("MicL")
        rd.mic_pspl_when_str = _fmt_iso_to_bw(mic_when_iso) if mic_when_iso else None
        # Peak Vector Sum
        vs = peaks.get("vector_sum") or {}
        rd.peak_vector_sum_ips    = vs.get("ips")
        rd.peak_vector_sum_time_s = vs.get("time_s")
        # PVS absolute date+time (histograms).  Same formatting as Mic.
        pvs_when_iso = vs.get("when")
        rd.peak_vector_sum_when_str = _fmt_iso_to_bw(pvs_when_iso) if pvs_when_iso else None
        # Histogram-specific header fields — keys match the projection in
        # _bw_report_to_dict ("start" / "stop", not "_str" suffixed).
        if rd.is_histogram:
            rd.histogram_start_str   = hist_block.get("start") or rd.event_datetime_str
            rd.histogram_stop_str    = hist_block.get("stop")
            rd.histogram_n_intervals = hist_block.get("n_intervals")
            rd.histogram_interval_size = hist_block.get("interval_size")
            rd.histogram_interval_size_s = hist_block.get("interval_size_s")
            rd.histogram_interval_times = hist_block.get("interval_times") or []
    # ── Waveform samples — from the .h5 via the existing helper ──
    from sfm import event_hdf5
    h5_path = store.hdf5_path_for(serial, filename)
    if h5_path.exists():
        try:
            wf = event_hdf5.plot_json_from_hdf5(h5_path, event_id=event_id)
            rd.channels = {
                ch: (chd.get("values") or [])
                for ch, chd in (wf.get("channels") or {}).items()
            }
            ta = wf.get("time_axis") or {}
            rd.sample_rate_sps  = rd.sample_rate_sps or ta.get("sample_rate")
            rd.pretrig_samples  = ta.get("pretrig_samples")
            rd.t0_ms            = ta.get("t0_ms")
            rd.dt_ms            = ta.get("dt_ms")
        except Exception as exc:
            log.warning("gather_report_data: hdf5 read failed: %s", exc)
    # ── Histogram aggregation ──
    # The codec emits one sample per histogram block (typically 1/sec),
    # but BW reports one bar per configured interval (1 min / 5 min / etc.).
    # When bw_report.histogram.n_intervals is populated (events ingested
    # with the parser extension), split each channel into n contiguous
    # groups (the first len % n groups get one extra sample) and keep each
    # group's peak |value|.  Also derives per-interval timestamps for the
    # x-axis.  No-op for waveform events or when n_intervals is missing.
    if rd.is_histogram and rd.histogram_n_intervals and rd.histogram_n_intervals >= 1:
        n = int(rd.histogram_n_intervals)
        for ch, vals in list(rd.channels.items()):
            if not vals:
                continue
            per_group = len(vals) // n
            remainder = len(vals) % n
            agg: list = []
            offset = 0
            for i in range(n):
                grp_size = per_group + (1 if i < remainder else 0)
                if grp_size > 0:
                    grp = vals[offset:offset + grp_size]
                    agg.append(max((abs(v) for v in grp if v is not None), default=0))
                    offset += grp_size
                else:
                    agg.append(0)
            rd.channels[ch] = agg
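        # Derive per-interval HH:MM:SS labels (each marks the END of its
        # interval) when the start time and interval size are known and
        # the sidecar didn't already supply interval_times.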
        if rd.histogram_start_str and rd.histogram_interval_size_s and not rd.histogram_interval_times:
            try:
                import datetime as _dt
                start = _dt.datetime.fromisoformat(rd.histogram_start_str)
                rd.histogram_interval_times = [
                    (start + _dt.timedelta(seconds=(i + 1) * rd.histogram_interval_size_s)).strftime("%H:%M:%S")
                    for i in range(n)
                ]
            except Exception:
                pass
    return rd
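The aggregation and label derivation above reduce to two small pure functions. This is a restatement for illustration only; `aggregate_intervals` and `interval_end_labels` are hypothetical names, not part of this module:

```python
import datetime as dt
from typing import List, Optional, Sequence


def aggregate_intervals(vals: Sequence[Optional[float]], n: int) -> List[float]:
    """Split vals into n contiguous groups and keep each group's peak |v|.

    The first len(vals) % n groups get one extra sample; if there are
    fewer samples than intervals, trailing groups are empty and report 0.
    """
    per_group, remainder = divmod(len(vals), n)
    out: List[float] = []
    offset = 0
    for i in range(n):
        size = per_group + (1 if i < remainder else 0)
        grp = vals[offset:offset + size]
        offset += size
        out.append(max((abs(v) for v in grp if v is not None), default=0))
    return out


def interval_end_labels(start_iso: str, interval_s: float, n: int) -> List[str]:
    """HH:MM:SS label for the END of each interval, as in the module code."""
    start = dt.datetime.fromisoformat(start_iso)
    return [
        (start + dt.timedelta(seconds=(i + 1) * interval_s)).strftime("%H:%M:%S")
        for i in range(n)
    ]


print(aggregate_intervals([1.0, -5.0, 2.0, 3.0, 4.0], 2))
print(interval_end_labels("2026-05-16T22:30:38", 60.0, 4))
```

Taking the max of absolute values per group mirrors how a histogram interval reports its peak rather than an average, so a short spike inside a long interval stays visible.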
 # ── PDF rendering ────────────────────────────────────────────────────────────
 def render_event_report_pdf(rd: ReportData) -> bytes:
    """Render a ``ReportData`` record to a single-page letter PDF.
    Branches on ``rd.is_histogram`` — waveform and histogram layouts
    differ in their header fields, stats-table rows, and bottom plot.
    Layout modeled on Blastware's Event Report PDFs (samples in
    docs/reference/instantel/).
    """
    # Letter portrait — 8.5"×11"
    fig = plt.figure(figsize=(8.5, 11), dpi=100)
    fig.patch.set_facecolor("white")
    if rd.is_histogram:
        _render_histogram_layout(fig, rd)
    else:
        _render_waveform_layout(fig, rd)
    # Page footer (common to both layouts) — Created date + event id.
    # Pushed to the very page bottom so it doesn't collide with the
    # waveform footer scale / trigger legend lines just above.
    # Convert UTC server_received_at to local for display.
    created_local = _fmt_iso_to_bw(rd.server_received_at) if rd.server_received_at else "—"
    fig.text(
        0.07, 0.005,
        f"Created: {created_local}  •  seismo-relay",
        fontsize=6, color="#888", ha="left",
    )
    fig.text(
        0.93, 0.005,
        f"Event {rd.event_id[:8] if rd.event_id else '—'}",
        fontsize=6, color="#888", ha="right",
    )
    buf = io.BytesIO()
    fig.savefig(buf, format="pdf")
    plt.close(fig)
    return buf.getvalue()
 def _render_waveform_layout(fig, rd: ReportData) -> None:
    """Waveform layout: header / mic+USBM / per-channel stats / waveform plot.
    Stats table includes Time (Rel. to Trig), Peak Accel, Peak Disp.
    Left margin sized to fit the channel labels (MicL/Long/Vert/Tran).
    Extra bottom margin reserves space for x-axis tick labels +
    "Amplitude Geo: X in/s/div Mic: Y psi(L)/div" footer + trigger
    legend without overlap.
    """
    gs = fig.add_gridspec(
        nrows=4, ncols=1,
        left=0.11, right=0.94, top=0.97, bottom=0.12,
        height_ratios=[1.7, 2.0, 1.8, 5.5],
        hspace=0.35,
    )
    ax_header = fig.add_subplot(gs[0]); ax_header.axis("off")
    _draw_header_waveform(ax_header, rd)
    ax_mid = fig.add_subplot(gs[1]); ax_mid.axis("off")
    _draw_mic_and_usbm(ax_mid, rd)
    ax_stats = fig.add_subplot(gs[2]); ax_stats.axis("off")
    _draw_channel_stats_waveform(ax_stats, rd)
    _draw_waveform_subplot(fig, gs[3], rd)
 def _render_histogram_layout(fig, rd: ReportData) -> None:
    """Histogram layout: header / mic-only / per-channel stats / bar plot.
    No USBM compliance chart (it's a waveform-only concept).  Stats table
    uses Date + Time-of-peak instead of relative-time + accel + disp.
    Left margin sized to fit the channel labels.  Extra bottom margin
    leaves room for the x-axis time labels + footer scale legend
    without overlap.
    """
    gs = fig.add_gridspec(
        nrows=4, ncols=1,
        left=0.11, right=0.94, top=0.97, bottom=0.12,
        height_ratios=[1.8, 0.9, 1.7, 5.6],
        hspace=0.35,
    )
    ax_header = fig.add_subplot(gs[0]); ax_header.axis("off")
    _draw_header_histogram(ax_header, rd)
    ax_mic = fig.add_subplot(gs[1]); ax_mic.axis("off")
    _draw_mic_only(ax_mic, rd)
    ax_stats = fig.add_subplot(gs[2]); ax_stats.axis("off")
    _draw_channel_stats_histogram(ax_stats, rd)
    _draw_histogram_subplot(fig, gs[3], rd)
 def _to_display_local(iso: str):
    """Parse an ISO timestamp and return a datetime in the system's local
    timezone (set by the TZ env var, default America/New_York via the
    Dockerfile).
    Behaviour:
      - "...Z" or "...+HH:MM" suffix → tz-aware (UTC or the given offset) → converted to local
      - Naïve "YYYY-MM-DDTHH:MM:SS" (no tz) → returned as-is.  This
        matches the convention used elsewhere in seismo-relay: BW's
        recorded-at timestamps are naïve and ALREADY in the unit's
        local clock; we don't second-guess them.
    """
    import datetime as _dt
    dt = _dt.datetime.fromisoformat(iso.replace("Z", "+00:00"))
    if dt.tzinfo is not None:
        # Convert from UTC (or other tz) → local per the TZ env var.
        # astimezone() without arg uses the system timezone.
        dt = dt.astimezone()
    return dt
 def _fmt_iso_to_bw(iso: Optional[str]) -> Optional[str]:
    """Convert an ISO-8601 timestamp to BW's display format
    '22:30:37 May 16, 2026'.  UTC inputs (with Z suffix) are
    converted to the system's local timezone first; naïve inputs
    are formatted as-is.  Returns input unchanged on parse failure."""
    if not iso or "T" not in iso:
        return iso
    try:
        return _to_display_local(iso).strftime("%H:%M:%S %B %d, %Y").replace(" 0", " ")
    except Exception:
        return iso
 def _split_iso_to_date_time(iso: Optional[str]) -> tuple[Optional[str], Optional[str]]:
    """Split an ISO timestamp into BW-formatted ('May 27 /26', '06:06:14')
    date+time strings.  Used for the histogram stats table where the
    Date and Time rows are presented separately.  UTC inputs are
    converted to local time first.  Returns (None, None) on parse failure."""
    if not iso:
        return (None, None)
    try:
        dt = _to_display_local(iso)
        # BW format: 'May 27 /26' (3-letter month + 2-digit year)
        date_str = dt.strftime("%b %d /%y").replace(" 0", " ")
        time_str = dt.strftime("%H:%M:%S")
        return (date_str, time_str)
    except Exception:
        return (None, None)
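For naive timestamps, the two formatters above reduce to `strftime` plus stripping the day's leading zero. A standalone sketch for checking the expected strings; it deliberately omits the `astimezone()` conversion (which depends on TZ), `%B`/`%b` assume the default C/English locale, and `bw_datetime` / `bw_date_and_time` are hypothetical names:

```python
import datetime as dt
from typing import Optional, Tuple


def bw_datetime(iso: str) -> str:
    """'2026-05-06T08:05:07' -> '08:05:07 May 6, 2026' (naive input only)."""
    d = dt.datetime.fromisoformat(iso)
    # Of the space-preceded fields (month name, day, year), only the day can
    # begin with '0', so this strips just the day's leading zero and keeps
    # zero-padded hours like '08'.
    return d.strftime("%H:%M:%S %B %d, %Y").replace(" 0", " ")


def bw_date_and_time(iso: str) -> Tuple[Optional[str], Optional[str]]:
    """Split into the histogram table's separate Date / Time strings."""
    try:
        d = dt.datetime.fromisoformat(iso)
    except ValueError:
        return (None, None)
    return (d.strftime("%b %d /%y").replace(" 0", " "), d.strftime("%H:%M:%S"))


print(bw_datetime("2026-05-16T22:30:37"))
print(bw_date_and_time("2026-05-27T06:06:14"))
```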
 def _kv(ax, x, y, label, value, *, label_w=0.18):
    """Render a 'Label  Value' row at axes-coordinates (x, y)."""
    ax.text(x, y, label, fontsize=8, color="#555", ha="left", va="top",
            transform=ax.transAxes)
    ax.text(x + label_w, y, _fmt(value), fontsize=8, ha="left", va="top",
            transform=ax.transAxes, family="monospace")
 def _fmt(v):
    """Format any field for display — '—' for None, str otherwise."""
    if v is None:
        return "—"
    if isinstance(v, float):
        return f"{v:.4f}".rstrip("0").rstrip(".")
    return str(v)
 def _draw_header_waveform(ax, rd: ReportData) -> None:
    """Two-column metadata header — waveform variant."""
    rows_left = [
        ("Date/Time",      _fmt_iso_to_bw(rd.event_datetime_str)),
        ("Trigger Source", rd.trigger_source),
        ("Range",          rd.geo_range_str),
        ("Sample Rate",    rd.sample_rate_str),
        ("Notes",          rd.notes),
        ("Project:",       rd.project),
        ("Client:",        rd.client),
        ("User Name:",     rd.operator),
        ("Seis. Loc:",     rd.sensor_location),
    ]
    _draw_header_columns(ax, rows_left, rd)
 def _draw_header_histogram(ax, rd: ReportData) -> None:
    """Two-column metadata header — histogram variant.
    Histograms have Start / Finish / Intervals fields instead of
    Trigger Source (there's no trigger event for a histogram capture).
    """
    intervals_str = None
    if rd.histogram_n_intervals is not None and rd.histogram_interval_size:
        intervals_str = f"{rd.histogram_n_intervals} At {rd.histogram_interval_size}"
    rows_left = [
        ("Start",      _fmt_iso_to_bw(rd.histogram_start_str or rd.event_datetime_str)),
        ("Finish",     _fmt_iso_to_bw(rd.histogram_stop_str)),
        ("Intervals",  intervals_str),
        ("Range",      rd.geo_range_str),
        ("Sample Rate", (f"{rd.sample_rate_sps} Sps" if rd.sample_rate_sps else None)),
        ("Notes",      rd.notes),
        ("Project:",   rd.project),
        ("Client:",    rd.client),
        ("User Name:", rd.operator),
        ("Seis. Loc:", rd.sensor_location),
    ]
    _draw_header_columns(ax, rows_left, rd)
 def _draw_header_columns(ax, rows_left, rd: ReportData) -> None:
    """Shared 2-column header rendering used by both layouts."""
    rows_right = [
        ("Serial Number", f"{rd.serial or '—'}" + (f"  {rd.firmware}" if rd.firmware else "")),
        ("Battery Level", f"{rd.battery_volts:.1f} Volts" if rd.battery_volts is not None else None),
        ("Unit Calibration", (f"{rd.calibration_date}" + (f" by {rd.calibration_by}" if rd.calibration_by else ""))
                              if rd.calibration_date else None),
        ("File Name", rd.file_name),
        ("Post Event Notes", rd.post_event_notes),
    ]
    y = 0.95
    dy = 0.095
    for label, value in rows_left:
        _kv(ax, 0.0, y, label, value, label_w=0.18)
        y -= dy
    y = 0.95
    for label, value in rows_right:
        _kv(ax, 0.55, y, label, value, label_w=0.20)
        y -= dy
 def _draw_mic_only(ax, rd: ReportData) -> None:
    """Mic block (histogram variant — no USBM chart)."""
    ax.text(0.0, 0.95, "Microphone   Linear Weighting", fontsize=8, color="#555",
            transform=ax.transAxes, va="top")
    rows = _mic_rows(rd)
    y = 0.70
    for label, value in rows:
        _kv(ax, 0.0, y, label, value, label_w=0.18)
        y -= 0.22
 def _draw_mic_and_usbm(ax, rd: ReportData) -> None:
    """Mic block on the left + USBM compliance chart placeholder on right.
    (Waveform variant — USBM is a velocity-vs-frequency compliance plot
    that doesn't apply to histograms.)"""
    ax.text(0.0, 0.95, "Microphone   Linear Weighting", fontsize=8, color="#555",
            transform=ax.transAxes, va="top")
    rows = _mic_rows(rd)
    y = 0.80
    for label, value in rows:
        _kv(ax, 0.0, y, label, value, label_w=0.18)
        y -= 0.15
    # USBM chart placeholder, upper-right.  Real piecewise compliance
    # curves are a separate work item; for now this shows only the title
    # and a "coming soon" note so the layout is correct.
    ax.text(0.72, 0.97, "USBM RI8507 And OSMRE",
            fontsize=9, weight="bold", color="#333", ha="center", va="top",
            transform=ax.transAxes)
    ax.text(0.72, 0.50, "[compliance chart\ncoming soon]",
            fontsize=8, color="#bbb", ha="center", va="center",
            transform=ax.transAxes, style="italic")
 def _mic_rows(rd: ReportData) -> list[tuple[str, Optional[str]]]:
    """Build the mic-section value rows (shared by both layouts).
    For histograms, BW formats the PSPL line as
        "125.7 dB(L) on May 27, 2026 at 06:19:14"
    (absolute date+time of peak).  Waveform events show the relative
    "at 0.012 sec." instead.  Both formats covered here based on which
    field is populated.
    """
    rows: list[tuple[str, Optional[str]]] = []
    if rd.mic_pspl_dbl is not None:
        line = f"{rd.mic_pspl_dbl:.1f} dB(L)"
        if rd.mic_pspl_when_str:
            # Histogram-style: "PSPL  125.7 dB(L) on May 27, 2026 at 06:19:14"
            # mic_pspl_when_str is already "HH:MM:SS Month DD, YYYY";
            # reformat to "on Month DD, YYYY at HH:MM:SS" for BW match.
            parts = rd.mic_pspl_when_str.split(" ", 1)
            if len(parts) == 2:
                line += f" on {parts[1]} at {parts[0]}"
            else:
                line += f" on {rd.mic_pspl_when_str}"
        elif rd.mic_pspl_time_s is not None:
            # Waveform-style: relative-to-trigger seconds.
            line += f" at {rd.mic_pspl_time_s:.3f} sec."
        rows.append(("PSPL", line))
    if rd.mic_zc_freq_hz is not None:
        prefix = ">" if rd.mic_zc_freq_above_range else ""
        rows.append(("ZC Freq", f"{prefix}{rd.mic_zc_freq_hz:.0f} Hz"))
    if rd.mic_channel_test_result:
        line = rd.mic_channel_test_result
        if rd.mic_channel_test_freq_hz is not None and rd.mic_channel_test_amp_mv is not None:
            line += (f" (Freq = {rd.mic_channel_test_freq_hz:.1f} Hz, "
                     f"Amp = {rd.mic_channel_test_amp_mv:.0f} mv)")
        rows.append(("Channel Test", line))
    return rows
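`_mic_rows` and `_draw_pvs_summary` both reorder the same "HH:MM:SS Month DD, YYYY" string. They differ only in the capitalisation of "at". A small shared helper could look like the sketch below. `when_to_on_at` is a hypothetical name, and the sketch is derived from the two inline branches.

```python
def when_to_on_at(when: str, at_word: str = "at") -> str:
    """Reorder 'HH:MM:SS Month DD, YYYY' into 'on Month DD, YYYY at HH:MM:SS'.

    `at_word` lets the same helper serve the mic PSPL line ('at') and the
    Peak Vector Sum line ('At').  A string with no space is passed through
    after 'on', mirroring the fallback branch in the code above.
    """
    parts = when.split(" ", 1)
    if len(parts) == 2:
        clock, date = parts
        return f"on {date} {at_word} {clock}"
    return f"on {when}"


print("125.7 dB(L) " + when_to_on_at("06:19:14 May 27, 2026"))
```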
 def _draw_channel_stats_waveform(ax, rd: ReportData) -> None:
    """Waveform stats table — has Time (Rel. to Trig), Peak Accel, Peak Disp.
    Followed by Peak Vector Sum line."""
    rows_spec = [
        ("PPV",                  "ppv_ips",        "in/s"),
        ("ZC Freq",              "zc_freq_hz",     "Hz"),
        ("Time (Rel. to Trig)",  "time_of_peak_s", "sec"),
        ("Peak Acceleration",    "peak_accel_g",   "g"),
        ("Peak Displacement",    "peak_disp_in",   "in"),
        ("Sensor Check",         "sensor_check",   ""),
    ]
    _draw_stats_table(ax, rd, rows_spec)
    _draw_pvs_summary(ax, rd, n_data_rows=len(rows_spec))
 def _draw_channel_stats_histogram(ax, rd: ReportData) -> None:
    """Histogram stats table — PPV, ZC Freq, Date, Time of peak, Sensor Check.
    Followed by Peak Vector Sum line."""
    # Date / Time of peak are per-channel timestamps for the interval at peak.
    # bw_report stores time_of_peak_s as relative seconds, but for histograms
    # BW shows absolute date+time, so these rows read the absolute
    # peak_date / peak_time fields from rd.channel_stats.  There is no
    # fallback to relative seconds: a missing field renders as "—".
    rows_spec = [
        ("PPV",          "ppv_ips",         "in/s"),
        ("ZC Freq",      "zc_freq_hz",      "Hz"),
        ("Date",         "peak_date",       ""),
        ("Time",         "peak_time",       ""),
        ("Sensor Check", "sensor_check",    ""),
    ]
    _draw_stats_table(ax, rd, rows_spec)
    _draw_pvs_summary(ax, rd, n_data_rows=len(rows_spec), histogram_when=True)
 def _draw_pvs_summary(
    ax,
    rd: ReportData,
    *,
    n_data_rows: int,
    histogram_when: bool = False,
 ) -> None:
    """Render the Peak Vector Sum line below the stats table.
    Reads ``ax._stats_table_bottom`` (set by ``_draw_stats_table`` when
    it pins the table via an explicit ``bbox``) so the PVS line lands
    just below the table's known bottom edge instead of guessing at the
    geometry.  ``n_data_rows`` is accepted but not used for placement.
    Centered horizontally for visual balance (the previous left-aligned
    x=0 landed under the label column, not the data, which looked off).
    """
    if rd.peak_vector_sum_ips is None:
        return
    line = f"Peak Vector Sum   {rd.peak_vector_sum_ips:.3f} in/s"
    if histogram_when and rd.peak_vector_sum_when_str:
        # Histogram absolute date+time.  when_str is "HH:MM:SS Month DD, YYYY";
        # reformat to "<value> on <date> At <time>" to match BW.
        parts = rd.peak_vector_sum_when_str.split(" ", 1)
        if len(parts) == 2:
            line += f" on {parts[1]} At {parts[0]}"
        else:
            line += f" on {rd.peak_vector_sum_when_str}"
    elif not histogram_when and rd.peak_vector_sum_time_s is not None:
        line += f" At {rd.peak_vector_sum_time_s:.3f} sec."
    # _draw_stats_table stashes the bbox bottom on the axes so we don't
    # have to guess geometry.  Falls back to a conservative default if
    # the bbox approach hasn't run.
    table_bottom_y = getattr(ax, "_stats_table_bottom", -0.10)
    pvs_y = table_bottom_y - 0.04   # small gap below the table border
    # Centered for visual balance — looks intentional rather than offset.
    # The original BW-replica had a "NA: Not Applicable" caption below
    # this line; dropped because we use "—" for missing values and the
    # legend was always squished against the PVS line.
    ax.text(0.5, pvs_y, line, fontsize=9, weight="bold",
            ha="center", va="top", transform=ax.transAxes)
 def _draw_stats_table(ax, rd: ReportData, rows_spec: list[tuple[str, str, str]]) -> None:
    """Render a per-channel stats table (Tran/Vert/Long).
    rows_spec: list of (label, field_name_in_channel_stats, unit_string)
    """
    headers = ["", "Tran", "Vert", "Long", ""]
    ch_lookup = {c["name"]: c for c in rd.channel_stats}
    def _cell(field, ch_name):
        ch_rec = ch_lookup.get(ch_name, {})
        val = ch_rec.get(field)
        if val is None:
            return "—"
        # ZC Freq is integer-formatted in BW; the ">100 Hz" sentinel is
        # rendered as ">N" (val carries the threshold).  Checked before the
        # float branch so an int-valued frequency keeps its ">" prefix.
        if field == "zc_freq_hz" and isinstance(val, (int, float)):
            prefix = ">" if ch_rec.get("zc_freq_above_range") else ""
            return f"{prefix}{val:.0f}"
        # Every other float gets 3 decimals.
        if isinstance(val, float):
            return f"{val:.3f}"
        return str(val)
    table_data = [headers]
    for label, field_name, unit in rows_spec:
        table_data.append([
            label,
            _cell(field_name, "Tran"),
            _cell(field_name, "Vert"),
            _cell(field_name, "Long"),
            unit,
        ])
    # Pin the table's position+size via bbox so we know exactly where
    # the bottom edge lands.  Lets _draw_pvs_summary place the PVS line
    # just below the table without guessing at row heights.
    #
    # bbox = [x, y, width, height] in axes coords.  Header + data rows
    # at row_h each; horizontal extent matches sum(colWidths).
    n_rows = len(table_data)        # header + data rows
    row_h  = 0.12                   # axes-fraction per row (fits fontsize=8)
    table_height = n_rows * row_h
    table_bottom = 1.0 - table_height
    tbl = ax.table(
        cellText=table_data,
        colWidths=[0.28, 0.14, 0.14, 0.14, 0.10],
        cellLoc="left", edges="open",
        bbox=[0.0, table_bottom, 0.80, table_height],
    )
    tbl.auto_set_font_size(False)
    tbl.set_fontsize(8)
    for j in range(5):
        tbl[(0, j)].set_text_props(weight="bold", color="#555")
    # Stash the bottom Y so _draw_pvs_summary can position itself below.
    ax._stats_table_bottom = table_bottom
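The placement contract between `_draw_stats_table` and `_draw_pvs_summary` is plain arithmetic. The table hangs from the top of the axes, with one 0.12 row per header or data row. The PVS line sits 0.04 below the table's bottom edge. The sketch below restates that arithmetic with hypothetical names (`StatsTableLayout`, `stats_table_layout`) so both layouts can be checked without matplotlib.

```python
import math
from dataclasses import dataclass

# Column widths passed to ax.table() above; as the comment in
# _draw_stats_table notes, they add up to the 0.80 bbox width.
COL_WIDTHS = (0.28, 0.14, 0.14, 0.14, 0.10)
BBOX_WIDTH = 0.80
assert math.isclose(sum(COL_WIDTHS), BBOX_WIDTH)


@dataclass(frozen=True)
class StatsTableLayout:
    bottom: float  # axes-fraction y of the table's lower edge
    height: float  # header + data rows, in axes fraction
    pvs_y: float   # top anchor of the Peak Vector Sum line


def stats_table_layout(n_data_rows: int, row_h: float = 0.12,
                       pvs_gap: float = 0.04) -> StatsTableLayout:
    """Repeat the bbox arithmetic of _draw_stats_table / _draw_pvs_summary."""
    height = (n_data_rows + 1) * row_h  # +1 for the header row
    bottom = 1.0 - height               # table hangs from the top edge
    return StatsTableLayout(bottom=bottom, height=height, pvs_y=bottom - pvs_gap)


waveform = stats_table_layout(6)   # PPV ... Sensor Check (6 rows)
histogram = stats_table_layout(5)  # PPV, ZC Freq, Date, Time, Sensor Check
print(round(waveform.bottom, 3), round(histogram.bottom, 3))  # 0.16 0.28
```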
 def _channel_axis_color(ch: str) -> str:
    return {"MicL": "#cc00cc", "Long": "#0066ff", "Vert": "#009933", "Tran": "#cc0000"}.get(ch, "#444")
 def _draw_waveform_subplot(fig, gridspec_cell, rd: ReportData) -> None:
    """4-channel stacked waveform plot — Instantel printout order
    (MicL on top, Tran on bottom), shared x-axis in SECONDS, a dashed
    trigger line at t=0 on every channel plus a ▼ marker above the top
    one, and a '0.0' baseline label on the right of each."""
    inner = gridspec_cell.subgridspec(4, 1, hspace=0.0)
    order = ["MicL", "Long", "Vert", "Tran"]
    sr = rd.sample_rate_sps or 1024
    # Convert ms-based time axis to seconds for the x-axis
    dt_s = (rd.dt_ms or (1000.0 / sr)) / 1000.0
    t0_s = (rd.t0_ms if rd.t0_ms is not None else 0.0) / 1000.0
    last_idx = len(order) - 1
    for i, ch in enumerate(order):
        ax = fig.add_subplot(inner[i])
        values = rd.channels.get(ch) or []
        times = [t0_s + j * dt_s for j in range(len(values))]
        if values:
            color = _channel_axis_color(ch)
            ax.plot(times, values, color=color, linewidth=0.5)
            # Symmetric y-axis on every channel: geo velocities and mic
            # pressure samples are both signed AC signals centered on zero.
            # An all-zero trace falls back to a tiny span so matplotlib is
            # not handed identical lower and upper limits.
            amax = max(abs(v) for v in values) or 0.001
            ax.set_ylim(-amax * 1.10, amax * 1.10)
        # Channel label on the LEFT (matches BW)
        ax.set_ylabel(ch, fontsize=8, rotation=0, ha="right", va="center",
                      color=_channel_axis_color(ch), weight="bold", labelpad=14)
        # "0.0" on the RIGHT (BW convention)
        ax.text(1.005, 0.5, "0.0", transform=ax.transAxes,
                fontsize=7, color="#555", va="center", ha="left")
        ax.grid(True, linestyle="--", linewidth=0.3, color="#bbb", alpha=0.6)
        # Vertical dashed trigger line at t=0
        ax.axvline(0.0, color="#cc0000", linestyle="--", linewidth=0.6, alpha=0.7)
        # Zero baseline horizontal
        ax.axhline(0.0, color=_channel_axis_color(ch), linestyle="-",
                   linewidth=0.4, alpha=0.5)
        if i != last_idx:
            ax.set_xticklabels([])
            ax.tick_params(axis="x", length=0)
        else:
            ax.tick_params(axis="x", labelsize=7)
        ax.tick_params(axis="y", labelsize=6)
    # Trigger triangle marker ▼ above the top channel at t=0
    top_ax = fig.axes[-4]  # MicL is the first added in this gridspec
    top_ax.plot([0], [top_ax.get_ylim()[1]], marker="v", color="black",
                markersize=8, clip_on=False, zorder=10)
    # Compute scale-per-division for the footer (10 divs across the chart).
    # Geo amp/div comes from the first non-empty geo channel (Tran, Vert,
    # then Long), matching that channel's +/-1.1*amax y-limits.
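    # Time span of the longest channel.  Using ``times``/``values`` left
    # over from the loop would only describe Tran, which can be empty even
    # when the other channels carry data.
    n_max = max(len(rd.channels.get(c) or []) for c in order)
    total_s = (n_max - 1) * dt_s if n_max > 1 else 0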
    div_s = total_s / 10 if total_s > 0 else 0
    geo_amp_div = "—"
    for ch in ("Tran", "Vert", "Long"):
        v = rd.channels.get(ch) or []
        if v:
            amax = max(abs(x) for x in v)
            geo_amp_div = f"{(amax * 1.1 * 2) / 10:.3f}"
            break
    fig.text(
        0.11, 0.030,
        f"Time(Seconds) {div_s:.2f} sec/div   Amplitude Geo: {geo_amp_div} in/s/div   Mic: 0.001 psi(L)/div",
        fontsize=7, color="#444", ha="left",
    )
    fig.text(
        0.11, 0.018,
        "Trigger = ▶━━━━━ ━━━━━━◀",
        fontsize=7, color="#444", ha="left",
    )
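The footer arithmetic in `_draw_waveform_subplot` reduces to two divisions. This sketch isolates it, assuming (as the code does) 10 horizontal divisions and the ±10% headroom used for the geo y-limits. It takes the sample count as an explicit argument. `waveform_footer_scales` is a hypothetical name.

```python
def waveform_footer_scales(n_samples: int, dt_s: float,
                           geo_amax: float, divisions: int = 10) -> tuple[float, float]:
    """Return (sec/div, in/s/div) for the waveform footer.

    Time: the record spans (n_samples - 1) sample intervals, split into
    `divisions` columns.  Amplitude: the y-limits are +/-1.1 * geo_amax,
    so the full 2.2 * geo_amax span is split the same way.
    """
    total_s = (n_samples - 1) * dt_s if n_samples > 1 else 0.0
    return total_s / divisions, (geo_amax * 1.1 * 2) / divisions


# 1025 samples at 1024 sps span exactly 1 s, i.e. 0.10 sec/div.
sec_div, amp_div = waveform_footer_scales(1025, 1 / 1024, 0.5)
print(f"{sec_div:.2f} sec/div  {amp_div:.3f} in/s/div")
```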
 def _nice_geo_step(amax: float) -> float:
    """Pick a "nice" per-division step for the geo y-axis.
    Geo LSB is 0.005 in/s, so sub-LSB steps like 0.003/div are meaningless.
    Quantize to a BW-style 1-2.5-5 sequence (0.005, 0.01, 0.025, 0.05,
    0.1, …) and return the smallest step where 5 divisions >= amax, so the
    top of the chart lands on a tick.  Steps are capped at 10.0/div, so
    amax above 50 in/s is clipped at the chart top.
    """
    if amax <= 0:
        return 0.005
    for step in (0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0):
        if step * 5 >= amax:
            return step
    return 10.0
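`_nice_geo_step` stops at 10.0 in/s per division, so anything above 50 in/s is clipped at the chart top. If that ceiling ever matters, the same 1-2.5-5 progression can be generated instead of tabulated. This is a sketch, not a drop-in replacement. `nice_geo_step` is a hypothetical name. For amax ≤ 50 it should return the same steps as the table above.

```python
import math


def nice_geo_step(amax: float, divisions: int = 5) -> float:
    """Smallest 1-2.5-5 step >= one LSB whose `divisions` ticks cover amax.

    Steps are enumerated in milli-in/s (5, 10, 25, 50, 100, ...) and only
    divided by 1000 at the end, so each step is the correctly rounded
    double for 0.005, 0.025, ... rather than a product of float powers.
    """
    if not math.isfinite(amax):
        raise ValueError("amax must be finite")  # would otherwise loop forever
    if amax <= 0:
        return 0.005
    decade = 1
    while True:
        for mantissa in (1, 2.5, 5):
            step_milli = mantissa * decade
            if step_milli < 5:  # below the 0.005 in/s geo LSB
                continue
            step = step_milli / 1000
            if step * divisions >= amax:
                return step
        decade *= 10


print(nice_geo_step(0.2), nice_geo_step(120))
```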
 def _draw_histogram_subplot(fig, gridspec_cell, rd: ReportData) -> None:
    """4-channel stacked histogram bar chart — per-interval peaks.
    X-axis labeled with the actual times from rd.histogram_interval_times
    when available; otherwise interval index.
    The three geo channels share a single y-axis scale (a BW-style nice
    multiple of the 0.005 in/s LSB) so bar heights are directly
    comparable across channels.  MicL has its own auto-scale.
    """
    inner = gridspec_cell.subgridspec(4, 1, hspace=0.0)
    order = ["MicL", "Long", "Vert", "Tran"]
    last_idx = len(order) - 1
    # X-axis: use absolute time labels if we have them, else interval index
    have_times = bool(rd.histogram_interval_times)
    # Shared geo scale: max across Tran/Vert/Long, quantized to a nice
    # tick step.  Used for ylim + the footer "Amplitude Geo: X in/s/div".
    geo_amax = 0.0
    for gch in ("Tran", "Vert", "Long"):
        gv = rd.channels.get(gch) or []
        if gv:
            geo_amax = max(geo_amax, max((abs(x) for x in gv if x is not None), default=0.0))
    geo_step = _nice_geo_step(geo_amax)
    geo_top  = geo_step * 5  # 5 divisions — top tick lands at this value
    for i, ch in enumerate(order):
        ax = fig.add_subplot(inner[i])
        values = rd.channels.get(ch) or []
        if values:
            # Histograms record per-interval PEAK magnitudes — always
            # non-negative.  Codec output occasionally includes signed
            # values when the underlying .h5 was scaled like a waveform;
            # take the absolute value so the bars rise from zero.
            abs_vals = [abs(v) if v is not None else 0 for v in values]
            xs = np.arange(len(abs_vals))
            color = _channel_axis_color(ch)
            ax.bar(xs, abs_vals, color=color, width=0.85, linewidth=0)
            if ch in ("Tran", "Vert", "Long"):
                ax.set_ylim(0, geo_top)
                ax.set_yticks([j * geo_step for j in range(6)])
            else:
                amax = max(abs_vals, default=0)
                if amax > 0:
                    ax.set_ylim(0, amax * 1.10)
        ax.set_ylabel(ch, fontsize=8, rotation=0, ha="right", va="center",
                      color=_channel_axis_color(ch), weight="bold", labelpad=14)
        ax.text(1.005, 0.02, "0.0", transform=ax.transAxes,
                fontsize=7, color="#555", va="bottom", ha="left")
        ax.grid(True, axis="y", linestyle="--", linewidth=0.3, color="#bbb", alpha=0.6)
        if i != last_idx:
            ax.set_xticklabels([])
            ax.tick_params(axis="x", length=0)
        else:
            if have_times and len(rd.histogram_interval_times) == len(values):
                # Show at most 4 evenly spaced labels; ceiling division keeps
                # e.g. n=7 at ticks 0/2/4/6 instead of labelling every bar.
                n = len(values)
                step = max(1, -(-n // 4))
                tick_positions = list(range(0, n, step))
                ax.set_xticks(tick_positions)
                ax.set_xticklabels([rd.histogram_interval_times[t] for t in tick_positions],
                                   rotation=0, fontsize=6)
            else:
                ax.set_xlabel("Interval", fontsize=8)
            ax.tick_params(axis="x", labelsize=7)
        ax.tick_params(axis="y", labelsize=6)
    # Footer scale info — histograms use minute/div.  Reuses the shared
    # geo_step computed above so the label matches the actual y-axis
    # tick spacing on every subplot.
    interval_str = rd.histogram_interval_size or "—"
    geo_amp_div = f"{geo_step:.3f}"
    fig.text(
        0.11, 0.030,
        f"Time {interval_str} /div   Amplitude Geo: {geo_amp_div} in/s/div   Mic: 0.001 psi(L)/div",
        fontsize=7, color="#444", ha="left",
    )
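The label thinning in the x-axis branch above can be isolated as a helper. Ceiling division for the stride keeps the number of labels at four or fewer for any interval count. Floor division would label all 7 bars of a 7-interval histogram, since `7 // 4 == 1`. This is a minimal sketch; `tick_positions` is a hypothetical name.

```python
def tick_positions(n: int, max_labels: int = 4) -> list[int]:
    """Evenly spaced bar indices to label, never more than `max_labels`."""
    step = max(1, -(-n // max_labels))  # ceiling division
    return list(range(0, n, step))


print(tick_positions(7), tick_positions(10))  # [0, 2, 4, 6] [0, 3, 6, 9]
```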
@@ -499,6 +499,20 @@
      text-align: left;
      border-bottom: 1px solid var(--border);
      white-space: nowrap;
      position: sticky;
      top: 0;
      z-index: 1;
    }
    table.db-table thead th[data-sort]:hover {
      background: var(--border2);
      color: var(--text);
    }
    table.db-table thead th .sort-arrow {
      display: inline-block;
      width: 10px;
      color: var(--accent, #58a6ff);
      font-weight: 900;
      text-align: center;
    }
    table.db-table tbody tr { border-bottom: 1px solid var(--border2); }
    table.db-table tbody tr:last-child { border-bottom: none; }
@@ -758,7 +772,9 @@
      overflow: hidden;
      min-height: 0;
    }
-    #section-db { display: none; }
+    /* Default to Database view on page load — most users are here to
       browse stored events, not connect to a live unit. */
    #section-live { display: none; }
    /* ── Live connect bar (host/port/connect, live section only) ── */
    #live-connect-bar {
@@ -792,8 +808,8 @@
  </div>
  <div class="hdr-sep"></div>
  <div class="section-switcher">
-    <button class="section-btn active" onclick="switchSection('live')">Live Device</button>
+    <button class="section-btn"        onclick="switchSection('live')">Live Device</button>
-    <button class="section-btn"        onclick="switchSection('db')">Database</button>
+    <button class="section-btn active" onclick="switchSection('db')">Database</button>
  </div>
  <div class="hdr-sep"></div>
  <label class="force-toggle" id="force-toggle"
@@ -802,6 +818,12 @@
    <span class="ft-dot"></span>
    <span>Force refresh</span>
  </label>
  <div class="hdr-sep"></div>
  <button id="mic-unit-toggle" class="section-btn"
          onclick="_setMicUnit(_getMicUnit() === 'dBL' ? 'psi' : 'dBL')"
          title="Toggle microphone display unit (dBL ↔ psi) for waveform plots.  Affects all mic charts; persists across page loads.">
    Mic: dBL
  </button>
 </header>
 <!-- ════════════════════════════════════════════════════════════════
@@ -1224,18 +1246,18 @@
    <div class="db-table-wrap" id="hist-table-wrap" style="display:none">
      <table class="db-table" id="hist-table">
        <thead>
-          <tr>
+          <tr id="hist-header-row">
-            <th>Timestamp</th>
+            <th data-sort="timestamp">Timestamp <span class="sort-arrow"></span></th>
-            <th>Serial</th>
+            <th data-sort="serial">Serial <span class="sort-arrow"></span></th>
-            <th>Tran (in/s)</th>
+            <th data-sort="tran_ppv">Tran (in/s) <span class="sort-arrow"></span></th>
-            <th>Vert (in/s)</th>
+            <th data-sort="vert_ppv">Vert (in/s) <span class="sort-arrow"></span></th>
-            <th>Long (in/s)</th>
+            <th data-sort="long_ppv">Long (in/s) <span class="sort-arrow"></span></th>
-            <th>PVS (in/s)</th>
+            <th data-sort="peak_vector_sum">PVS (in/s) <span class="sort-arrow"></span></th>
-            <th>Mic (dBL)</th>
+            <th data-sort="mic_ppv">Mic (dBL) <span class="sort-arrow"></span></th>
-            <th>Project</th>
+            <th data-sort="project">Project <span class="sort-arrow"></span></th>
-            <th>Client</th>
+            <th data-sort="client">Client <span class="sort-arrow"></span></th>
-            <th>Type</th>
+            <th data-sort="record_type">Type <span class="sort-arrow"></span></th>
-            <th>Key</th>
+            <th data-sort="waveform_key">Key <span class="sort-arrow"></span></th>
            <th></th>
          </tr>
        </thead>
@@ -1388,7 +1410,9 @@ function deviceParams() {
 }
 // ── Section switching ─────────────────────────────────────────────────────────
-let currentSection = 'live';
+// Default to Database — most users land here to browse stored events.
 // Live Device is opt-in (click the tab to talk to a unit).
 let currentSection = 'db';
 function switchSection(name) {
  currentSection = name;
@@ -2285,13 +2309,16 @@ let sessLoaded   = false;
 const _unitSerials = new Set();
 function _ppvClass(v) {
-  if (v == null) return '';
+  const n = (v == null) ? null : Number(v);
-  if (v >= 2.0)  return 'ppv-high';
+  if (n == null || !isFinite(n)) return '';
-  if (v >= 0.5)  return 'ppv-warn';
+  if (n >= 2.0)  return 'ppv-high';
  if (n >= 0.5)  return 'ppv-warn';
  return 'ppv-ok';
 }
 function _ppvFmt(v) {
-  return v != null ? v.toFixed(5) : '—';
+  if (v == null) return '—';
  const n = typeof v === 'number' ? v : Number(v);
  return isFinite(n) ? n.toFixed(5) : String(v);
 }
 function _fmtTs(ts) {
  if (!ts) return '—';
@@ -2330,6 +2357,12 @@ async function _fetchUnits() {
 }
 // ── History tab ────────────────────────────────────────────────────────────────
 // Module-level state for the history table — preserved across re-sorts.
 // We sort + re-render without re-fetching.
 let _histEvents = [];
 let _histSortKey = 'timestamp';
 let _histSortDir = 'desc';   // 'asc' | 'desc'
 async function loadHistory() {
  histLoaded = true;
  const serial  = document.getElementById('hist-serial-filter').value;
@@ -2361,10 +2394,20 @@ async function loadHistory() {
  _populateSerialDropdown('monlog-serial-filter');
  _populateSerialDropdown('sess-serial-filter');
-  document.getElementById('hist-count').textContent = `${events.length} event${events.length !== 1 ? 's' : ''}`;
+  _histEvents = events;
  renderHistTable();
 }
 // Re-render the history table from `_histEvents` using the current sort
 // state.  Pulled out of `loadHistory` so column-header clicks can re-sort
 // in-memory without re-fetching from the server.
 function renderHistTable() {
  const events = _histEvents;
  document.getElementById('hist-count').textContent =
    `${events.length} event${events.length !== 1 ? 's' : ''}`;
  const tbody = document.getElementById('hist-tbody');
  tbody.innerHTML = '';
  if (events.length === 0) {
    document.getElementById('hist-empty').style.display = 'block';
    document.getElementById('hist-table-wrap').style.display = 'none';
@@ -2373,11 +2416,31 @@ async function loadHistory() {
  document.getElementById('hist-empty').style.display = 'none';
  document.getElementById('hist-table-wrap').style.display = 'block';
-  for (const ev of events) {
+  // Sort a copy by the current key + direction.  Nulls sink to the bottom
  // regardless of direction.
  const k = _histSortKey;
  const dir = _histSortDir === 'asc' ? 1 : -1;
  const sorted = [...events].sort((a, b) => {
    const av = a[k], bv = b[k];
    if (av == null && bv == null) return 0;
    if (av == null) return 1;
    if (bv == null) return -1;
    if (typeof av === 'number' && typeof bv === 'number') return (av - bv) * dir;
    return String(av).localeCompare(String(bv)) * dir;
  });
  // Update arrow indicators in the headers
  document.querySelectorAll('#hist-header-row th[data-sort]').forEach(th => {
    const arrow = th.querySelector('.sort-arrow');
    if (!arrow) return;
    arrow.textContent = th.dataset.sort === k ? (_histSortDir === 'asc' ? '↑' : '↓') : '';
  });
  for (const ev of sorted) {
    const tr = document.createElement('tr');
    const pvs = ev.peak_vector_sum;
    tr.classList.add('clickable');
-    tr.title = 'Click to review (open sidecar editor)';
+    tr.title = 'Click to view waveform + sidecar';
    tr.dataset.eventId = ev.id;
    tr.innerHTML = `
      <td>${_fmtTs(ev.timestamp)}</td>
@@ -2386,7 +2449,14 @@ async function loadHistory() {
      <td class="${_ppvClass(ev.vert_ppv)}">${_ppvFmt(ev.vert_ppv)}</td>
      <td class="${_ppvClass(ev.long_ppv)}">${_ppvFmt(ev.long_ppv)}</td>
      <td class="${_ppvClass(pvs)}">${_ppvFmt(pvs)}</td>
-      <td class="td-dim">${ev.mic_ppv != null && ev.mic_ppv > 0 ? (20 * Math.log10(ev.mic_ppv / DBL_REF)).toFixed(1) + ' dBL' : '—'}</td>
+      <td class="td-dim">${(() => {
        const m = ev.mic_ppv == null ? null : Number(ev.mic_ppv);
        if (m == null || !isFinite(m) || m <= 0) return '—';
        // Series III (MiniMate Plus / BW) stores mic_ppv as psi → convert.
        // Series IV (Micromate / Thor) already stores dB(L) → display direct.
        if (ev.device_family === 'series4') return m.toFixed(1) + ' dBL';
        return (20 * Math.log10(m / DBL_REF)).toFixed(1) + ' dBL';
      })()}</td>
      <td class="td-text">${ev.project ?? '—'}</td>
      <td class="td-text">${ev.client  ?? '—'}</td>
      <td class="td-dim">${ev.record_type ?? '—'}</td>
@@ -2398,6 +2468,28 @@ async function loadHistory() {
  }
 }
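The Mic column branches on device family. Series IV rows are shown directly, and other rows are converted from psi with 20·log10(p / p_ref). The sketch below restates that rule in Python. It assumes `DBL_REF` is the standard 20 µPa acoustic reference expressed in psi. The page defines its real `DBL_REF` elsewhere, and `mic_cell` is a hypothetical name.

```python
import math
from typing import Optional, Union

# Assumption: 20 µPa reference converted to psi.  The page's own DBL_REF
# is defined elsewhere; this value is only illustrative.
DBL_REF_PSI = 20e-6 / 6894.757  # Pa / (Pa per psi), about 2.9e-9 psi


def mic_cell(mic_ppv: Union[float, str, None], device_family: Optional[str]) -> str:
    """Format the History table's 'Mic (dBL)' cell.

    Series IV rows already hold dB(L) and are shown directly; other rows
    hold psi and are converted.  Missing, non-numeric, and non-positive
    values render as the dash placeholder.
    """
    try:
        m = float(mic_ppv) if mic_ppv is not None else math.nan
    except ValueError:
        m = math.nan
    if not math.isfinite(m) or m <= 0:
        return "—"
    if device_family == "series4":
        return f"{m:.1f} dBL"
    return f"{20 * math.log10(m / DBL_REF_PSI):.1f} dBL"


print(mic_cell(125.7, "series4"), mic_cell(0.001, None))
```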
 // Click a column header → toggle sort.  Click another → set sort to that column.
 document.addEventListener('DOMContentLoaded', () => {
  const headerRow = document.getElementById('hist-header-row');
  if (!headerRow) return;
  headerRow.querySelectorAll('th[data-sort]').forEach(th => {
    th.style.cursor = 'pointer';
    th.style.userSelect = 'none';
    th.addEventListener('click', () => {
      const k = th.dataset.sort;
      if (_histSortKey === k) {
        _histSortDir = _histSortDir === 'asc' ? 'desc' : 'asc';
      } else {
        _histSortKey = k;
        // Default direction: 'desc' for numbers + timestamps (biggest/newest first),
        // 'asc' for text columns (alphabetical).
        _histSortDir = ['serial','project','client','record_type','waveform_key'].includes(k) ? 'asc' : 'desc';
      }
      renderHistTable();
    });
  });
 });
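The comparator in `renderHistTable` has two properties: missing values always go last, whatever the direction, and the input array is left untouched. A Python sketch of the same policy follows. It assumes each column holds one kind of value (all numbers or all strings). One difference: Python orders strings by code point, whereas the JS uses `localeCompare`, so text columns with accents or mixed case may order differently. Python's sort is stable even with `reverse=True`, so ties keep their incoming order. `sort_nulls_last` is a hypothetical name.

```python
from typing import Any


def sort_nulls_last(rows: list[dict[str, Any]], key: str,
                    direction: str = "desc") -> list[dict[str, Any]]:
    """Return a sorted copy of `rows`; rows whose `key` is None go last."""
    present = [r for r in rows if r.get(key) is not None]
    missing = [r for r in rows if r.get(key) is None]
    present.sort(key=lambda r: r[key], reverse=(direction == "desc"))
    return present + missing


events = [{"ppv": 0.5}, {"ppv": None}, {"ppv": 2.0}, {"ppv": 1.0}]
print([e["ppv"] for e in sort_nulls_last(events, "ppv")])  # [2.0, 1.0, 0.5, None]
```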
 // ── Sidecar review modal ───────────────────────────────────────────────────────
 //
 // Opens on row click in the History table.  Loads the .sfm.json sidecar
@@ -2420,23 +2512,373 @@ async function openSidecarModal(eventId) {
  document.getElementById('sc-edit-ft').checked = false;
  document.getElementById('sc-edit-reviewer').value = '';
  document.getElementById('sc-edit-notes').value = '';
  // Reset waveform area
  document.getElementById('sc-waveform-status').textContent = 'Loading waveform…';
  document.getElementById('sc-waveform-charts').innerHTML = '';
  _destroyScCharts();
-  try {
+  // Sidecar + waveform fetched in parallel — neither blocks the other.
-    const r = await fetch(`${api()}/db/events/${eventId}/sidecar`);
+  const sidecarP  = fetch(`${api()}/db/events/${eventId}/sidecar`)
-    if (!r.ok) {
+    .then(async r => {
-      const e = await r.json().catch(() => ({}));
+      if (!r.ok) { const e = await r.json().catch(() => ({})); throw new Error(e.detail || r.statusText); }
-      throw new Error(e.detail || r.statusText);
+      return r.json();
-    }
+    });
-    const data = await r.json();
+  const waveformP = fetch(`${api()}/db/events/${eventId}/waveform.json`)
    .then(async r => {
      if (r.status === 404) return null;  // no waveform available — render empty state
      if (!r.ok) { const e = await r.json().catch(() => ({})); throw new Error(e.detail || r.statusText); }
      return r.json();
    });
  // Sidecar usually loads first (smaller payload).  Each one renders
  // independently so the modal becomes useful as soon as either lands.
  sidecarP.then(data => {
    _scCurrentSidecar = data;
    _renderSidecar(data);
    document.getElementById('sc-status').textContent = '';
-  } catch (e) {
+  }).catch(e => {
    document.getElementById('sc-status').className = 'sc-status error';
-    document.getElementById('sc-status').textContent = `Load failed: ${e.message}`;
+    document.getElementById('sc-status').textContent = `Sidecar load failed: ${e.message}`;
  });
  waveformP.then(data => {
    if (!data) {
      document.getElementById('sc-waveform-status').textContent = 'No waveform data for this event.';
      return;
    }
    _renderScWaveform(data);
  }).catch(e => {
    document.getElementById('sc-waveform-status').textContent = `Waveform load failed: ${e.message}`;
  });
 }
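The structure of `openSidecarModal` can be modelled in Python's asyncio rather than browser promises. Both requests start at once, and each has its own success and failure path, so neither blocks or masks the other. This is a sketch only. `load_modal` and the `render` callback are hypothetical, and a `None` waveform stands in for the 404 "no waveform" response.

```python
import asyncio
from typing import Any, Awaitable, Callable, Optional


async def load_modal(
    fetch_sidecar: Callable[[], Awaitable[Any]],
    fetch_waveform: Callable[[], Awaitable[Optional[Any]]],
    render: Callable[[str, Any], None],
) -> None:
    """Start both loads together; each renders (or reports) on its own."""

    async def sidecar() -> None:
        try:
            render("sidecar", await fetch_sidecar())
        except Exception as exc:  # a sidecar failure must not hide the plot
            render("sidecar-error", str(exc))

    async def waveform() -> None:
        try:
            data = await fetch_waveform()
        except Exception as exc:
            render("waveform-error", str(exc))
            return
        render("waveform" if data is not None else "waveform-empty", data)

    await asyncio.gather(sidecar(), waveform())


async def demo() -> list[tuple[str, Any]]:
    seen: list[tuple[str, Any]] = []

    async def bad_sidecar() -> Any:
        await asyncio.sleep(0.01)  # slower request that then fails
        raise RuntimeError("404 Not Found")

    async def good_waveform() -> Any:
        return {"MicL": [0.0, 0.001]}

    await load_modal(bad_sidecar, good_waveform, lambda kind, d: seen.append((kind, d)))
    return seen


print(asyncio.run(demo()))
```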
 // ── Sidecar-modal waveform plot ──────────────────────────────────────────────
 // Renders the 4-channel decoded waveform fetched from
 // /db/events/{id}/waveform.json — MicL on top, Tran on bottom (matches
 // Instantel BW Event Report layout).  Uses Chart.js (loaded at the top of
 // the page for the live-device viewer).
 const _SC_CHANNEL_COLORS = {
  MicL: '#e066ff',
  Long: '#3a80ff',
  Vert: '#3fb950',
  Tran: '#f85149',
 };
 const _SC_CHANNEL_ORDER = ['MicL', 'Long', 'Vert', 'Tran'];
 let _scCharts = {};
 // User preference for how mic is displayed in plots — dBL (default,
 // matches BW printout convention + the rest of SFM) or psi (the raw
 // sample unit).  Toggleable via the header pill; persists in localStorage.
 function _getMicUnit() {
  return localStorage.getItem('sfm_mic_unit') === 'psi' ? 'psi' : 'dBL';
 }
 function _setMicUnit(u) {
  localStorage.setItem('sfm_mic_unit', u === 'psi' ? 'psi' : 'dBL');
  _refreshMicUnitToggleLabel();
  // Re-render the open modal so the change is immediately visible.
  if (_scCurrentEventId) openSidecarModal(_scCurrentEventId);
 }
 function _refreshMicUnitToggleLabel() {
  const b = document.getElementById('mic-unit-toggle');
  if (b) b.textContent = `Mic: ${_getMicUnit()}`;
 }
 // Convert a psi value to dB(L).  Returns null for non-positive values
 // (the log of zero or a negative number is undefined); Chart.js draws
 // null as a gap in the line.
 function _psiToDbl(psi) {
  if (psi == null || !(psi > 0)) return null;
  return 20 * Math.log10(psi / DBL_REF);
 }
 // Per-sample mic display floor.  Sound pressure AC samples spend most
 // of their time at the digitization noise floor (1-2 ADC counts, roughly
 // 20-40 dBL).  Rendering each of those as null/-inf produces a spiky,
 // discontinuous chart that only shows "moments when sound briefly
 // exceeded 80 dBL", which is confusing.  Instead we rectify the AC
 // waveform (abs), convert to dBL, and floor anything below
 // MIC_DBL_FLOOR so the chart has a continuous baseline with peaks
 // rising above it.  This matches how acoustic engineers expect to see
 // SPL-vs-time.
 const MIC_DBL_FLOOR = 60;
 function _psiToDblForChart(psi) {
  if (psi == null) return MIC_DBL_FLOOR;
  const a = Math.abs(psi);
  if (a === 0) return MIC_DBL_FLOOR;
  const dbl = 20 * Math.log10(a / DBL_REF);
  return dbl > MIC_DBL_FLOOR ? dbl : MIC_DBL_FLOOR;
 }
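The rectify-then-floor mapping used by `_psiToDblForChart`, next to the unrectified `_psiToDbl` used for the peak label, can be sketched as two standalone functions. This is a hedged sketch, not the page's code: `DBL_REF_PSI` is an assumed stand-in for the page's `DBL_REF`, taken to be the standard 20 µPa reference pressure converted to psi (1 psi ≈ 6894.757 Pa, so about 2.9e-9 psi), and `FLOOR_DBL` mirrors `MIC_DBL_FLOOR`.

```javascript
// Assumption: the page's DBL_REF is the standard 20 µPa reference, in psi.
const DBL_REF_PSI = 20e-6 / 6894.757;   // about 2.9e-9 psi
const FLOOR_DBL = 60;                   // mirrors MIC_DBL_FLOOR

// Peak-label path: no rectification; non-positive input has no dB value.
function psiToDbl(psi) {
  if (psi == null || !(psi > 0)) return null;
  return 20 * Math.log10(psi / DBL_REF_PSI);
}

// Chart path: rectify the AC sample, convert, then clamp to the floor so
// quiet samples form a continuous baseline instead of gaps in the line.
function psiToDblForChart(psi) {
  if (psi == null || psi === 0) return FLOOR_DBL;
  const dbl = 20 * Math.log10(Math.abs(psi) / DBL_REF_PSI);
  return Math.max(dbl, FLOOR_DBL);
}

// Quiet, loud-negative, and loud-positive samples side by side.
const demoSamples = [0, 1e-8, -0.001, 0.001, null];
console.log(demoSamples.map(psiToDblForChart));
```

Under that assumed reference, 0.001 psi maps to about 110.7 dB(L) (in line with the "0.001 psi ≈ 110 dBL" note in the histogram branch), while a 1e-8 psi noise-floor sample (about 10.7 dB(L)) is clamped to the 60 dB(L) baseline. A negative sample of the same magnitude charts at the same height because of the rectification.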
 // Adaptive decimal formatter: scientific notation is reserved for truly
 // extreme values (10000 and above, or below 0.0001).  Normal-range values
 // (where most peaks fall) render as decimals with sensible precision.
 // Replaces the previous .toExponential(3) call, which turned every peak
 // into an unreadable "2.500e-2".
 function _fmtPeak(v, unit) {
  if (v == null || (typeof v === 'number' && !isFinite(v))) return '';
  if (typeof v !== 'number') return String(v) + (unit ? ' ' + unit : '');
  if (v === 0) return '0' + (unit ? ' ' + unit : '');
  const a = Math.abs(v);
  const u = unit ? ' ' + unit : '';
  if (a >= 0.0001 && a < 10000) {
    const d = a >= 100 ? 1 : a >= 10 ? 2 : a >= 1 ? 3 : a >= 0.1 ? 4 : 5;
    return v.toFixed(d) + u;
  }
  return v.toExponential(2) + u;
 }
 function _destroyScCharts() {
  Object.values(_scCharts).forEach(c => { try { c.destroy(); } catch {} });
  _scCharts = {};
 }
 function _renderScWaveform(data) {
  document.getElementById('sc-waveform-status').textContent = '';
  const chartsDiv = document.getElementById('sc-waveform-charts');
  chartsDiv.innerHTML = '';
  _destroyScCharts();
  const channels = data.channels || {};
  // time_axis is METADATA, not an array — it carries sample_rate,
  // pretrig_samples, t0_ms (first-sample time relative to trigger,
  // negative when pretrig samples exist), and dt_ms.  Trigger is at
  // t=0 by convention.
  const ta       = data.time_axis || {};
  const sr       = ta.sample_rate || 1024;
  const dtMs     = ta.dt_ms || (1000.0 / sr);
  const t0Ms     = ta.t0_ms != null ? ta.t0_ms : 0;
  // Histogram events have per-interval peaks, not per-sample data.
  // Render as bars (one per interval) instead of a connected line, and
  // suppress trigger/zero overlays which don't apply.  X-axis becomes
  // interval index since the sample_rate-based time math is meaningless
  // here (each "sample" is one interval, typically 1-5 minutes long).
  const isHistogram = String(data.record_type || '').toLowerCase().includes('histogram');
  // Which channels have data — determines which one renders the shared bottom axis.
  const withData = _SC_CHANNEL_ORDER.filter(ch =>
    channels[ch] && (channels[ch].values || []).length > 0
  );
  const lastCh = withData[withData.length - 1];
  const micUnit = _getMicUnit();   // user preference: 'dBL' or 'psi'
  for (const ch of _SC_CHANNEL_ORDER) {
    const chData = channels[ch];
    if (!chData) continue;
    let values = chData.values || [];
    let chUnit = chData.unit || '';
    let chPeak = chData.peak;
    // Mic channel: convert from raw psi to dB(L) when user prefers dBL
    // (default).  Per-sample values use _psiToDblForChart which rectifies
    // (abs) the AC waveform and floors at MIC_DBL_FLOOR so the chart is
    // continuous with a baseline + peaks above it, instead of a sparse
    // pattern of isolated spikes for "moments when sound briefly exceeded
    // the Y-axis bottom".  The peak label uses _psiToDbl with the
    // unrectified peak (preserves the true measurement).
    if (ch === 'MicL' && chUnit === 'psi' && micUnit === 'dBL') {
      values = values.map(_psiToDblForChart);
      chPeak = _psiToDbl(chPeak);
      chUnit = 'dB(L)';
    }
    const wrap = document.createElement('div');
    wrap.style.cssText = 'background:var(--surface);border:1px solid var(--border2);border-radius:6px;padding:6px 30px 4px 10px';
    const lbl = document.createElement('div');
    lbl.style.cssText = `font-size:10px;font-weight:600;letter-spacing:0.05em;text-transform:uppercase;margin-bottom:2px;color:${_SC_CHANNEL_COLORS[ch]};display:flex;justify-content:space-between`;
    const peakStr = chPeak != null
      ? `peak ${_fmtPeak(chPeak, chUnit)}`
      : '';
    lbl.innerHTML = `<span>${ch}</span><span style="color:var(--text-dim);font-weight:normal">${peakStr}</span>`;
    wrap.appendChild(lbl);
    if (values.length === 0) {
      const e = document.createElement('div');
      e.style.cssText = 'height:80px;display:flex;align-items:center;justify-content:center;color:var(--text-dim);font-size:11px';
      e.textContent = 'no samples decoded';
      wrap.appendChild(e);
      chartsDiv.appendChild(wrap);
      continue;
    }
    const canvasWrap = document.createElement('div');
    canvasWrap.style.cssText = 'position:relative;height:100px';
    const canvas = document.createElement('canvas');
    canvasWrap.appendChild(canvas);
    wrap.appendChild(canvasWrap);
    chartsDiv.appendChild(wrap);
    // Waveform: per-sample time in ms relative to trigger (negative for pretrig).
    // Histogram: when the server has aggregated to BW-reported intervals AND
    // provides per-interval timestamps, use those as x-axis labels (HH:MM:SS).
    // Falls back to interval index.
    let times;
    if (isHistogram) {
      const intervalTimes = ta.interval_times || [];
      times = (intervalTimes.length === values.length)
        ? intervalTimes
        : values.map((_, i) => i + 1);
    } else {
      times = values.map((_, i) => t0Ms + i * dtMs);
    }
    // Stride-downsample for rendering when the series exceeds MAX points.
    const MAX = 3000;
    let rT = times, rV = values;
    if (values.length > MAX) {
      const step = Math.ceil(values.length / MAX);
      rT = times.filter((_, i) => i % step === 0);
      rV = values.filter((_, i) => i % step === 0);
    }
    const showX = (ch === lastCh);
    // Tick label formatter: snap floats to 1 decimal place so we don't get
    // "11.7187040000000002 ms" garbage from accumulated floating-point error.
    const xAxisLabel = isHistogram ? '' : ' ms';
    const fmtTick = i => {
      const v = rT[i];
      if (typeof v === 'number') {
        // Whole numbers (intervals) → no decimals.  Sub-integer ms → 1 decimal.
        const s = Number.isInteger(v) ? String(v) : v.toFixed(1);
        return s + xAxisLabel;
      }
      return String(v) + xAxisLabel;
    };
    // Y-axis bounds.  Convention:
    //   - Geophones (Tran/Vert/Long) on waveform-mode events:
    //     symmetric around zero so the zero line sits in the middle and
    //     positive/negative excursions are visually balanced.
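    //   - Histograms (per-interval peaks, always positive): zero at the
    //     bottom, with a minimum range so quiet events don't fill the panel.
    //   - Mic in dB(L): pinned from MIC_DBL_FLOOR up to the peak plus
    //     headroom.  Mic waveforms in psi keep Chart.js auto-scale.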
    let yBounds = {};
    const isGeo = ch !== 'MicL';
    if (isGeo && !isHistogram) {
      // Waveform geo: symmetric around zero, full zoom to shape detail.
      let absMax = 0;
      for (const v of values) {
        const a = Math.abs(v);
        if (a > absMax) absMax = a;
      }
      const padded = (absMax || 1) * 1.10;
      yBounds = { min: -padded, max: padded };
    } else if (isGeo && isHistogram) {
      // Histogram geo: enforce a minimum chart range so a quiet
      // 0.005 in/s event renders as ~10% of chart height instead of
      // filling the panel.  Matches BW's near-fixed-scale convention
      // (their footer is "Geo: 0.002 in/s/div" — a chart-relative scale,
      // not auto-zoom).
      const HIST_GEO_MIN_INS = 0.05;
      let peak = 0;
      for (const v of values) { const a = Math.abs(v); if (a > peak) peak = a; }
      yBounds = { min: 0, max: Math.max(peak * 1.10, HIST_GEO_MIN_INS) };
    } else if (ch === 'MicL' && micUnit === 'dBL') {
      // Mic in dBL — pin baseline at noise-floor minimum (where we floored
      // quiet samples), top at actual peak + a few dB headroom.
      const peakDbl = (typeof chPeak === 'number' && isFinite(chPeak))
        ? chPeak + 5
        : 100;
      yBounds = { min: MIC_DBL_FLOOR, max: Math.max(peakDbl, MIC_DBL_FLOOR + 20) };
    } else if (ch === 'MicL' && isHistogram && micUnit === 'psi') {
      // Mic histogram in psi — same minimum-range treatment as geo.
      // 0.001 psi ≈ 110 dBL — typical "loud" mic peak.  Quiet events
      // sit near the bottom.
      const HIST_MIC_MIN_PSI = 0.001;
      let peak = 0;
      for (const v of values) { const a = Math.abs(v); if (a > peak) peak = a; }
      yBounds = { min: 0, max: Math.max(peak * 1.10, HIST_MIC_MIN_PSI) };
    }
    _scCharts[ch] = new Chart(canvas, {
      type: isHistogram ? 'bar' : 'line',
      data: {
        labels: rT.map(t => (typeof t === 'number' ? (Number.isInteger(t) ? String(t) : t.toFixed(2)) : t)),
        datasets: isHistogram ? [{
          data: rV,
          backgroundColor: _SC_CHANNEL_COLORS[ch],
          borderWidth: 0,
          barPercentage: 1.0,
          categoryPercentage: 1.0,  // bars touch — "tight bargraph" look
        }] : [{
          data: rV,
          borderColor: _SC_CHANNEL_COLORS[ch],
          borderWidth: 1,
          pointRadius: 0,
          tension: 0,
        }],
      },
      options: {
        animation: false, responsive: true, maintainAspectRatio: false,
        plugins: {
          legend: { display: false },
          tooltip: {
            mode: 'index', intersect: false,
            callbacks: {
              title: items => isHistogram
                ? `interval ${items[0].label}`
                : `t = ${items[0].label} ms`,
              label: item => `${ch}: ${_fmtPeak(item.raw, chUnit)}`,
            },
          },
        },
        scales: {
          x: {
            type: 'category', display: showX,
            ticks: { color: '#484f58', maxTicksLimit: 8, maxRotation: 0, callback: (v, i) => fmtTick(i) },
            grid:  { color: '#21262d', drawTicks: showX },
          },
          y: {
            ...yBounds,
            ticks: { color: '#484f58', maxTicksLimit: 4 },
            grid:  { color: '#21262d' },
            title: { display: true, text: chUnit, color: '#484f58', font: { size: 9 } },
          },
        },
      },
      plugins: isHistogram ? [] : [{
        // Trigger line + triangle markers + zero baseline — only meaningful
        // for waveform-mode events.  Histograms have no trigger.
        id: 'overlays',
        afterDraw(chart) {
          const ctx = chart.ctx, x = chart.scales.x, y = chart.scales.y;
          // Dashed trigger line at t=0
          const zi = rT.findIndex(t => parseFloat(t) >= 0);
          if (zi >= 0) {
            const px = x.getPixelForValue(zi);
            ctx.save();
            ctx.beginPath(); ctx.moveTo(px, y.top); ctx.lineTo(px, y.bottom);
            ctx.strokeStyle = 'rgba(248,81,73,0.8)'; ctx.lineWidth = 1.2;
            ctx.setLineDash([4, 3]); ctx.stroke(); ctx.restore();
            // Triangle markers above and below the chart
            ctx.save();
            ctx.fillStyle = '#f85149';
            ctx.beginPath();
            ctx.moveTo(px - 4, y.top - 7); ctx.lineTo(px + 4, y.top - 7); ctx.lineTo(px, y.top - 1);
            ctx.closePath(); ctx.fill();
            ctx.beginPath();
            ctx.moveTo(px - 4, y.bottom + 7); ctx.lineTo(px + 4, y.bottom + 7); ctx.lineTo(px, y.bottom + 1);
            ctx.closePath(); ctx.fill();
            ctx.restore();
          }
          // Zero baseline + label
          const zy = y.getPixelForValue(0);
          if (zy >= y.top && zy <= y.bottom) {
            ctx.save();
            ctx.strokeStyle = '#30363d'; ctx.lineWidth = 0.8;
            ctx.setLineDash([2, 2]);
            ctx.beginPath(); ctx.moveTo(x.left, zy); ctx.lineTo(x.right, zy); ctx.stroke();
            ctx.restore();
            ctx.save();
            ctx.fillStyle = '#c9d1d9'; ctx.font = '10px monospace';
            ctx.textAlign = 'left'; ctx.textBaseline = 'middle';
            ctx.fillText('0.0', x.right + 6, zy);
            ctx.restore();
          }
        },
      }],
    });
  }
 }
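The axis handling in `_renderScWaveform` (sample times from the `time_axis` metadata, histogram labels, stride downsampling to `MAX` points, trigger lookup, and geophone y-bounds) can be pulled out into pure functions, which makes it testable outside Chart.js. A minimal sketch: the function names and signatures are hypothetical, and the constants (1024 sps fallback, 3000-point cap, 10% headroom, 0.05 in/s histogram minimum) are copied from the code above.

```javascript
// Hypothetical pure helpers mirroring the axis logic in _renderScWaveform.

// Waveform x-axis: per-sample time in ms relative to the trigger.  t0_ms is
// negative when pretrigger samples exist, so the trigger sits at t = 0.
function sampleTimes(n, timeAxis = {}) {
  const sr = timeAxis.sample_rate || 1024;
  const dtMs = timeAxis.dt_ms || 1000 / sr;
  const t0Ms = timeAxis.t0_ms != null ? timeAxis.t0_ms : 0;
  return Array.from({ length: n }, (_, i) => t0Ms + i * dtMs);
}

// Histogram x-axis: server-supplied interval times when they line up
// one-to-one with the values, otherwise a 1-based interval index.
function histogramLabels(values, intervalTimes = []) {
  return intervalTimes.length === values.length
    ? intervalTimes
    : values.map((_, i) => i + 1);
}

// Stride decimation: keep every step-th point so at most `max` remain.
function decimate(xs, max = 3000) {
  if (xs.length <= max) return xs;
  const step = Math.ceil(xs.length / max);
  return xs.filter((_, i) => i % step === 0);
}

// First sample at or after the trigger (-1 when there is none).
function triggerIndex(times) {
  return times.findIndex(t => t >= 0);
}

// Geophone y-bounds: symmetric with 10% headroom for waveforms, zero-based
// with a minimum range (default 0.05 in/s) for histograms.
function geoYBounds(values, isHistogram, histMinIns = 0.05) {
  const absMax = values.reduce((m, v) => Math.max(m, Math.abs(v)), 0);
  if (isHistogram) return { min: 0, max: Math.max(absMax * 1.10, histMinIns) };
  const padded = (absMax || 1) * 1.10;
  return { min: -padded, max: padded };
}

const demoTimes = sampleTimes(8, { sample_rate: 1024, t0_ms: -3.90625 });
console.log(triggerIndex(demoTimes), geoYBounds([0.2, -0.4], false));
```

One trade-off worth knowing: stride decimation keeps every step-th sample, so a short spike that falls between kept samples can vanish from the rendered line (the y-bounds still account for it, since they are computed from the full `values`). A per-bucket min/max reduction is the common alternative when visible peaks matter.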
 // Make sure charts get cleaned up when the modal closes.
 function _scCleanupOnClose() { _destroyScCharts(); }
 function _renderSidecar(data) {
  const ev   = data.event        || {};
  const pv   = data.peak_values  || {};
@@ -2444,38 +2886,103 @@ function _renderSidecar(data) {
  const bw   = data.blastware    || {};
  const src  = data.source       || {};
  const rev  = data.review       || {};
  // bw_report carries the per-channel ASCII-derived stats (ZC Freq,
  // saturation flags, peak time, etc.).  Only present on events
  // ingested with a preserved .TXT (post-2026-05-27); falls back to
  // empty for legacy events.
  const bwrPeaks = (data.bw_report || {}).peaks || {};
  const bwrMic   = (data.bw_report || {}).mic   || {};
  document.getElementById('sc-title').textContent = `Event — ${bw.filename || ev.waveform_key || 'unknown'}`;
-  const fmtPpv = v => (v == null ? '—' : Number(v).toFixed(5) + ' in/s');
+  const fmtPpv = v => {
    if (v == null) return '—';
    const n = Number(v);
    return isFinite(n) ? n.toFixed(5) + ' in/s' : String(v);
  };
  // Map sidecar source.kind → device family (Series IV ingest path is
  // "idf-import"; everything else is Series III today).  The events-list
  // table uses ev.device_family from the DB row, but sidecars don't carry
  // that column — source.kind is the equivalent signal here.
  const family = ((src.kind || '') === 'idf-import') ? 'series4' : 'series3';
  const fmtMic = v => {
-    if (v == null || v <= 0) return '—';
-    const dbl = 20 * Math.log10(v / DBL_REF);
-    return `${dbl.toFixed(1)} dBL  (${v.toExponential(2)} psi)`;
+    if (v == null) return '—';
+    const n = Number(v);
+    if (!isFinite(n) || n <= 0) return '—';
    // Series IV (Micromate / Thor) stores mic as dB(L); Series III (BW)
    // stores it as psi and we render both for cross-reference.
    if (family === 'series4') return `${n.toFixed(1)} dBL`;
    const dbl = 20 * Math.log10(n / DBL_REF);
    return `${dbl.toFixed(1)} dBL  (${n.toExponential(2)} psi)`;
  };
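The family split in `fmtMic` amounts to normalising a stored mic peak to dB(L) before formatting. A minimal sketch, assuming `DBL_REF` is the standard 20 µPa reference pressure expressed in psi (its actual value is not shown in this diff); `familyOf` and `micPeakToDbl` are hypothetical names, not functions on the page.

```javascript
// Assumption: DBL_REF is 20 µPa expressed in psi (about 2.9e-9 psi).
const DBL_REF_PSI = 20e-6 / 6894.757;

// Hypothetical helper: derive the device family the way the modal does,
// from the sidecar's source.kind.
function familyOf(source = {}) {
  return (source.kind || '') === 'idf-import' ? 'series4' : 'series3';
}

// Hypothetical helper: normalise a stored mic peak to dB(L).  Series IV
// sidecars already hold dB(L); Series III sidecars hold psi.  Missing,
// non-numeric, or non-positive values map to null (rendered as a dash).
function micPeakToDbl(value, family) {
  if (value == null) return null;
  const n = Number(value);
  if (!Number.isFinite(n) || n <= 0) return null;
  return family === 'series4' ? n : 20 * Math.log10(n / DBL_REF_PSI);
}

console.log(micPeakToDbl(0.001, familyOf({ kind: 'bw-import' })));
console.log(micPeakToDbl(110.8, familyOf({ kind: 'idf-import' })));
```

Keeping the family decision separate from the formatting means the same normalised dB(L) number could feed both the text field and a chart, whichever unit the sidecar stored.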
  document.getElementById('sc-f-serial').textContent   = ev.serial          || '—';
-  document.getElementById('sc-f-ts').textContent       = ev.timestamp       || '—';
+  // Route through _fmtTs so the unit-local naive timestamp shows as
  // "5/27/2026, 6:00:13 AM" instead of "2026-05-27T06:00:13".
  document.getElementById('sc-f-ts').textContent       = _fmtTs(ev.timestamp);
  document.getElementById('sc-f-rt').textContent       = ev.record_type     || '—';
  document.getElementById('sc-f-sr').textContent       = (ev.sample_rate ?? '—') + (ev.sample_rate ? ' sps' : '');
  document.getElementById('sc-f-key').textContent      = ev.waveform_key    || '—';
-  document.getElementById('sc-f-tran').textContent     = fmtPpv(pv.transverse);
-  document.getElementById('sc-f-vert').textContent     = fmtPpv(pv.vertical);
-  document.getElementById('sc-f-long').textContent     = fmtPpv(pv.longitudinal);
+  // Suffix with " · {prefix}{N} Hz" when bw_report has a ZC Freq.
+  // Above-range ZC peaks (BW ">100 Hz") get a literal ">" prefix so
+  // operators see the same indicator the PDF shows.
  const fmtZc = bwr => {
    if (!bwr || bwr.zc_freq_hz == null) return '';
    const prefix = bwr.zc_freq_above_range ? '>' : '';
    return ` · ${prefix}${Math.round(bwr.zc_freq_hz)} Hz`;
  };
  document.getElementById('sc-f-tran').textContent     = fmtPpv(pv.transverse)   + fmtZc(bwrPeaks.tran);
  document.getElementById('sc-f-vert').textContent     = fmtPpv(pv.vertical)     + fmtZc(bwrPeaks.vert);
  document.getElementById('sc-f-long').textContent     = fmtPpv(pv.longitudinal) + fmtZc(bwrPeaks.long);
  document.getElementById('sc-f-pvs').textContent      = fmtPpv(pv.vector_sum);
-  document.getElementById('sc-f-mic').textContent      = fmtMic(pv.mic_psi);
+  document.getElementById('sc-f-mic').textContent      = fmtMic(pv.mic_psi)      + fmtZc(bwrMic);
  document.getElementById('sc-f-project').textContent  = pi.project         || '—';
  document.getElementById('sc-f-client').textContent   = pi.client          || '—';
  document.getElementById('sc-f-operator').textContent = pi.operator        || '—';
  document.getElementById('sc-f-loc').textContent      = pi.sensor_location || '—';
-  document.getElementById('sc-f-bw').textContent       = bw.filename        || '—';
+  // Filename rendered as a clickable download link for the original BW
  // binary.  Same endpoint the live-device viewer uses for stored events
  // (/db/events/{id}/blastware_file).
  const bwCell = document.getElementById('sc-f-bw');
  bwCell.innerHTML = '';
  if (bw.filename && _scCurrentEventId) {
    const a = document.createElement('a');
    a.href = `${api()}/db/events/${_scCurrentEventId}/blastware_file`;
    a.textContent = bw.filename;
    a.download = bw.filename;
    a.title = 'Download original BW event binary';
    a.style.color = 'var(--accent, #58a6ff)';
    a.style.textDecoration = 'underline';
    bwCell.appendChild(a);
  } else {
    bwCell.textContent = '—';
  }
  document.getElementById('sc-f-bwsize').textContent   = bw.filesize != null ? `${bw.filesize} bytes` : '—';
  document.getElementById('sc-f-sha').textContent      = bw.sha256          || '—';
-  document.getElementById('sc-f-src').textContent      = src.kind           || '—';
-  document.getElementById('sc-f-cap').textContent      = src.captured_at    || '—';
+  // Source kind + a download link for the preserved BW ASCII report
+  // (.TXT), when available.  Only events ingested after 2026-05-27
  // have the .TXT preserved; older events show "—".
  const srcCell = document.getElementById('sc-f-src');
  srcCell.innerHTML = '';
  srcCell.appendChild(document.createTextNode(src.kind || '—'));
  if (src.txt_filename && _scCurrentEventId) {
    const a = document.createElement('a');
    a.href = `${api()}/db/events/${_scCurrentEventId}/ascii_report.txt`;
    a.textContent = ' (download .TXT)';
    a.download = src.txt_filename;
    a.title = 'Download preserved BW ASCII report';
    a.style.color = 'var(--accent, #58a6ff)';
    a.style.marginLeft = '8px';
    a.style.fontSize = '11px';
    srcCell.appendChild(a);
  }
  // captured_at has a "Z" suffix (UTC); _fmtTs converts it to browser-local
  // time in the same format as "Recorded at", so operators no longer misread
  // the raw UTC value (e.g. "21:59:57" shown when the local time is ~6 PM).
  document.getElementById('sc-f-cap').textContent      = _fmtTs(src.captured_at);
  document.getElementById('sc-edit-ft').checked        = !!rev.false_trigger;
  document.getElementById('sc-edit-reviewer').value    = rev.reviewer || '';
@@ -2488,6 +2995,19 @@ function closeSidecarModal() {
  document.getElementById('sc-overlay').classList.remove('visible');
  _scCurrentEventId = null;
  _scCurrentSidecar = null;
  _destroyScCharts();
 }
 // Trigger a PDF download for the currently-open event.  The browser
 // handles the actual save dialog from the Content-Disposition header
 // the server sends.
 function downloadEventReport() {
  if (!_scCurrentEventId) return;
  const url = `${api()}/db/events/${_scCurrentEventId}/report.pdf`;
  // Open in a new tab — browser prompts to save or displays inline,
  // and a failed fetch (e.g. 404 for events with no waveform) shows
  // its JSON error in-page rather than silently failing.
  window.open(url, '_blank');
 }
 function onSidecarOverlayClick(e) {
@@ -2698,6 +3218,16 @@ document.addEventListener('keydown', e => {
 // hit localhost:8200, 10.0.0.44:8200, or anything else.
 document.getElementById('api-base').value = window.location.origin;
 // Reflect any persisted mic-unit preference in the header pill on load
 _refreshMicUnitToggleLabel();
 // We default to Database view → trigger initial history + units load
 // (switchSection handles this when clicked, but we never click on first paint).
 if (currentSection === 'db') {
  if (!histLoaded)  loadHistory();
  if (!unitsLoaded) loadUnits();
 }
 // Press Enter in any live connect field to connect
 ['dev-host','dev-port'].forEach(id => {
  document.getElementById(id)?.addEventListener('keydown', e => { if (e.key === 'Enter') connectUnit(); });
@@ -2714,11 +3244,18 @@ document.getElementById('api-base').value = window.location.origin;
      <button class="sc-close" onclick="closeSidecarModal()">×</button>
    </div>
    <div class="sc-body">
      <!-- Waveform plot: 4 channels stacked (MicL, Long, Vert, Tran) -->
      <div class="sc-section" id="sc-section-waveform">
        <h4>Waveform</h4>
        <div id="sc-waveform-status" style="color:var(--text-dim);font-size:11px;margin-bottom:6px">Loading…</div>
        <div id="sc-waveform-charts" style="display:flex;flex-direction:column;gap:6px"></div>
      </div>
      <div class="sc-section">
        <h4>Event</h4>
        <dl class="sc-grid">
          <dt>Serial</dt>           <dd id="sc-f-serial">—</dd>
-          <dt>Timestamp</dt>        <dd id="sc-f-ts">—</dd>
+          <dt title="When the seismograph recorded this event (from the BW report's Event Time field)">Recorded at</dt>
                                    <dd id="sc-f-ts">—</dd>
          <dt>Record type</dt>      <dd id="sc-f-rt">—</dd>
          <dt>Sample rate</dt>      <dd id="sc-f-sr">—</dd>
          <dt>Waveform key</dt>     <dd id="sc-f-key">—</dd>
@@ -2746,11 +3283,12 @@ document.getElementById('api-base').value = window.location.origin;
      <div class="sc-section">
        <h4>Source / files</h4>
        <dl class="sc-grid">
-          <dt>BW filename</dt>      <dd id="sc-f-bw">—</dd>
+          <dt id="sc-l-bw">Event file</dt>      <dd id="sc-f-bw">—</dd>
-          <dt>BW filesize</dt>      <dd id="sc-f-bwsize">—</dd>
+          <dt id="sc-l-bwsize">File size</dt>   <dd id="sc-f-bwsize">—</dd>
-          <dt>BW sha256</dt>        <dd id="sc-f-sha">—</dd>
+          <dt id="sc-l-sha">File sha256</dt>    <dd id="sc-f-sha">—</dd>
          <dt>Source kind</dt>      <dd id="sc-f-src">—</dd>
-          <dt>Captured at</dt>      <dd id="sc-f-cap">—</dd>
+          <dt title="When SFM received and stored this event, NOT the unit-local trigger time (see 'Recorded at' at the top of the modal for that).">Time received</dt>
                                    <dd id="sc-f-cap">—</dd>
        </dl>
      </div>
      <div class="sc-section">
@@ -2773,6 +3311,10 @@ document.getElementById('api-base').value = window.location.origin;
    </div>
    <div class="sc-footer">
      <span class="sc-status" id="sc-status"></span>
      <button class="btn btn-ghost" id="sc-pdf-btn" onclick="downloadEventReport()"
              title="Download an Instantel-style Event Report PDF for this event">
        Download PDF
      </button>
      <button class="btn btn-ghost" onclick="closeSidecarModal()">Cancel</button>
      <button class="btn" id="sc-save-btn" onclick="saveSidecarReview()">Save</button>
    </div>
@@ -34,7 +34,7 @@ import logging
 import pickle
 import shutil
 from pathlib import Path
-from typing import Optional
+from typing import Optional, Union
 from minimateplus import event_file_io
 from minimateplus.blastware_file import blastware_filename, write_blastware_file
@@ -108,11 +108,30 @@ class WaveformStore:
        """Return absolute path to the .h5 clean-waveform file for a given event."""
        return self._serial_dir(serial) / f"{filename}.h5"
    def txt_path_for(self, serial: str, filename: str) -> Path:
        """Return absolute path to the preserved BW ASCII report (.TXT)
        for a given event.
        We name it ``<filename>_ASCII.TXT`` to match BW's own filename
        convention in the ACH folder.  Saved at ingest time alongside
        the binary so parser fixes can be applied retroactively by
        re-parsing, without needing to re-forward from the watcher PC.
        """
        return self._serial_dir(serial) / f"{filename}_ASCII.TXT"
    def open_blastware(self, serial: str, filename: str) -> Optional[Path]:
        """Return absolute path to an existing event file or None."""
        bw_path, _ = self.paths_for(serial, filename)
        return bw_path if bw_path.exists() else None
    def open_txt(self, serial: str, filename: str) -> Optional[Path]:
        """Return absolute path to the preserved BW ASCII report for an
        event, or None if the .TXT wasn't saved at ingest time (events
        ingested before .TXT preservation landed will show None until
        re-forwarded)."""
        p = self.txt_path_for(serial, filename)
        return p if p.exists() else None
    # ── save / load ─────────────────────────────────────────────────────────────
    def save(
@@ -258,6 +277,7 @@ class WaveformStore:
        source_path: Path,
        *,
        serial_hint: Optional[str] = None,
        bw_report_text: Optional[Union[str, bytes]] = None,
    ) -> tuple[Event, dict]:
        """
        Ingest a Blastware event file produced by an external tool
@@ -267,10 +287,17 @@ class WaveformStore:
        Workflow:
          1. Parse the bytes via event_file_io.read_blastware_file (writes
             a temp file to do that, since the parser takes a path).
-          2. Resolve serial from BW filename (`<P><serial3>...`) or use
+          2. Optionally parse a paired BW ASCII event report (the .TXT
             file BW writes alongside the binary).  When supplied, its
             decoded fields land in the sidecar's `bw_report` block AND
             overlay the device-authoritative peak values into the
             top-level `peak_values` block.  This is the right path for
             the ACH-forwarder daemon use case where Blastware's own
             ACH writes both files into the watch folder.
          3. Resolve serial from BW filename (`<P><serial3>...`) or use
             serial_hint.  Falls back to "UNKNOWN".
-          3. Copy the BW bytes verbatim into <root>/<serial>/<filename>.
+          4. Copy the BW bytes verbatim into <root>/<serial>/<filename>.
-          4. Write the .sfm.json sidecar with source.kind = "bw-import"
+          5. Write the .sfm.json sidecar with source.kind = "bw-import"
             and a5_pickle_filename = None.  Does NOT write a .a5.pkl
             (no A5 source available; byte-for-byte regeneration not
             possible — the on-disk BW file IS the byte-for-byte source).
@@ -292,6 +319,47 @@ class WaveformStore:
            except FileNotFoundError:
                pass
        # read_blastware_file derives record_type from its path arg, but
        # that arg is the tmp file (suffix ".bw") — so override with the
        # original filename's encoded type (H/W/M/E/C in the BW AB0T
        # scheme).  Without this override every BW-imported event lands
        # in the DB with record_type="Waveform" regardless of the actual
        # type (Histogram, Manual, etc.).
        ev.record_type = event_file_io.derive_record_type_from_filename(
            source_path.name
        )
        # Parse the BW ASCII report if one was supplied.  Failures here
        # are non-fatal: we still write the binary + sidecar without the
        # rich derived fields.
        bw_report = None
        if bw_report_text is not None:
            try:
                from minimateplus.bw_ascii_report import parse_report
                bw_report = parse_report(bw_report_text)
            except Exception as exc:
                log.warning(
                    "save_imported_bw: BW report parse failed: %s — continuing without it",
                    exc,
                )
        # If we have a report, overlay its device-authoritative fields
        # (peaks, project, sample_rate, record_time) onto the Event
        # BEFORE handing it to db.insert_events().  Without this overlay
        # the DB row gets `peak_values` from _peaks_from_samples(), which
        # runs the still-undecoded waveform codec on the BW body and
        # produces ±10 in/s saturation values on every channel for every
        # event.  The sidecar JSON had the correct values via
        # event_to_sidecar_dict(bw_report=...) but the DB columns didn't.
        if bw_report is not None:
            try:
                event_file_io.apply_report_to_event(ev, bw_report)
            except Exception as exc:
                log.warning(
                    "save_imported_bw: failed to overlay report onto event: %s",
                    exc,
                )
        # Resolve serial.  blastware_filename derives a 4-char prefix from
        # the numeric serial (e.g. BE11529 → M529); we go the other way
        # via the source filename if a hint wasn't given.
@@ -308,6 +376,28 @@ class WaveformStore:
        filesize = bw_path.stat().st_size
        sha256   = event_file_io.file_sha256(bw_path)
        # 1b. preserve the raw BW ASCII report (.TXT) alongside the binary.
        # Saved at <root>/<serial>/<filename>_ASCII.TXT.  Lets us re-parse
        # offline after parser fixes without needing to re-forward from
        # the watcher PC.  Negligible storage cost (~15 KB per event).
        # Skipped silently when no report was supplied (live download path,
        # manual upload without paired TXT).
        txt_filename: Optional[str] = None
        if bw_report_text is not None:
            try:
                txt_path = self.txt_path_for(serial, filename)
                if isinstance(bw_report_text, bytes):
                    txt_path.write_bytes(bw_report_text)
                else:
                    txt_path.write_text(bw_report_text)
                txt_filename = txt_path.name
            except Exception as exc:
                log.warning(
                    "save_imported_bw: failed to save TXT for %s: %s — "
                    "continuing without it",
                    filename, exc,
                )
        # 2. write the .h5 clean-waveform file from the parsed Event.
        # Note: peaks here are computed from raw samples (the BW file
        # doesn't carry the device-authoritative 0C peaks).  Best-effort.
@@ -344,7 +434,9 @@ class WaveformStore:
            blastware_sha256=sha256,
            source_kind="bw-import",
            a5_pickle_filename=None,
            txt_filename=txt_filename,
            review=existing_review,
            bw_report=bw_report,
        )
        event_file_io.write_sidecar(sidecar_path, sidecar)
@@ -360,6 +452,289 @@ class WaveformStore:
            "a5_pickle_filename": None,
            "hdf5_filename":      hdf5_filename,
            "sidecar_filename":   sidecar_path.name,
            "serial":             serial,
        }
    def save_imported_idf(
        self,
        idf_bytes: bytes,
        source_path: Path,
        *,
        serial_hint: Optional[str] = None,
        idf_report_text: Optional[Union[str, bytes]] = None,
    ) -> tuple[Optional["Event"], dict]:
        """
        Ingest a Thor (Micromate Series IV) IDF event file (`.IDFW` or
        `.IDFH`) produced by Thor's TXT exporter.
        Workflow:
          1. For sig-A `.IDFW` and `.IDFH` binaries, decode samples (or
             histogram intervals) + binary metadata via
             ``micromate.idf_file.read_idf_file()``.  Any decode failure,
             including ``NotImplementedError`` for sig-B files, falls
             through to the .txt-only flow.
          2. Parse the paired TXT report (when supplied) via
             ``micromate.parse_idf_report`` → dict.  TXT remains the
             source of truth for fields the binary doesn't yet supply
             (full peak set with ZC freq / Time of Peak, sensor self-check,
             firmware string, project strings).
          3. Wrap parsed dict + filename into a typed ``micromate.IdfEvent``.
          4. Copy bytes verbatim into ``<root>/<serial>/<filename>``.
          5. Bridge IdfEvent → ``minimateplus.Event`` and attach
             ``raw_samples`` from the binary decoder (when available).
          6. Write the `.h5` clean-waveform file when samples decoded.
          7. Write the ``.sfm.json`` sidecar with
             ``source.kind = "idf-import"`` and the full raw IDF report
             under ``extensions.idf_report``.
        Returns ``(event, record_dict)`` so the endpoint can both insert
        into SeismoDb and surface the parsed event.
        """
        from micromate import IdfEvent, parse_idf_report
        # 1. Binary decode (sig-A IDFW and IDFH).  Non-fatal: any failure
        # leaves samples / binary metadata unfilled and we proceed with
        # the .txt path as before.
        idf_samples: Optional[dict] = None
        idf_intervals: Optional[list] = None
        binary_md = None
        binary_peaks = None
        is_histogram = False
        try:
            from micromate.idf_file import read_idf_file
            # Pass idf_bytes through `data=` — at this point in the flow
            # the binary hasn't been written to disk yet, so the codec
            # can't read from source_path.  We still pass source_path so
            # the codec has the filename for error messages + .IDFH
            # suffix detection.
            res = read_idf_file(source_path, data=idf_bytes)
            idf_samples = res.samples or None
            idf_intervals = res.intervals
            is_histogram = res.intervals is not None
            binary_md = res.binary_metadata
            binary_peaks = res.event.peaks
        except NotImplementedError:
            # sig-B — codec doesn't handle this yet.
            pass
        except Exception as exc:
            log.warning(
                "save_imported_idf: binary codec failed for %s: %s — "
                "falling back to .txt-only ingest",
                source_path.name, exc,
            )
        # 2. Parse the .txt sidecar (best-effort; non-fatal on failure).
        report_dict: dict = {}
        if idf_report_text is not None:
            try:
                report_dict = parse_idf_report(idf_report_text)
            except Exception as exc:
                log.warning(
                    "save_imported_idf: report parse failed: %s — continuing without it",
                    exc,
                )
        # 3. Backfill report_dict with binary metadata for fields the
        # .txt didn't supply.  A non-empty .txt value always wins; the
        # binary only fills fields that are missing or empty in the .txt
        # (serial, timestamp, sample_rate, record time, event type).
        if binary_md is not None:
            if binary_md.serial and not report_dict.get("serial_number"):
                report_dict["serial_number"] = binary_md.serial
            if binary_md.event_datetime and not report_dict.get("event_datetime"):
                report_dict["event_datetime"] = binary_md.event_datetime
            if binary_md.sample_rate and not report_dict.get("sample_rate"):
                report_dict["sample_rate"] = binary_md.sample_rate
            if binary_md.record_time_sec and not report_dict.get("record_time_sec"):
                report_dict["record_time_sec"] = binary_md.record_time_sec
            # Calibration date (binary) vs calibration text (.txt) cohabit
            # under different keys; no overwrite needed.
            if binary_md.event_datetime and not report_dict.get("event_type"):
                report_dict["event_type"] = (
                    "Full Histogram" if is_histogram else "Full Waveform"
                )
        # Binary-derived peaks fill in when the .txt didn't supply them.
        # They're ~3% low vs the device-authoritative .txt values (residual
        # codec drift), so .txt always wins when present.
        if binary_peaks is not None:
            if binary_peaks.transverse_ips and not report_dict.get("tran_ppv"):
                report_dict["tran_ppv"] = binary_peaks.transverse_ips
            if binary_peaks.vertical_ips and not report_dict.get("vert_ppv"):
                report_dict["vert_ppv"] = binary_peaks.vertical_ips
            if binary_peaks.longitudinal_ips and not report_dict.get("long_ppv"):
                report_dict["long_ppv"] = binary_peaks.longitudinal_ips
        # 4. Build the typed IdfEvent.  Filename is authoritative for
        # (serial, timestamp, kind); the report's event_datetime takes
        # precedence over the filename timestamp inside from_report().
        idf_event = IdfEvent.from_report(report_dict, source_path.name)
        # The binary mic peak (psi) isn't carried through from_report() —
        # IdfReport.from_dict only sees the .txt's dB(L) value.  Pull the
        # binary-derived ``mic_pspl_psi`` onto the typed IdfEvent so the
        # downstream bridge can populate ``PeakValues.micl`` (psi-shaped)
        # and the h5 writer's per-count mic factor lands at a sensible
        # value.  Without this, the h5 mic chart auto-scales against the
        # dB(L) value-as-pseudo-psi and renders ~flat.
        if binary_peaks is not None and binary_peaks.mic_pspl_psi is not None:
            idf_event.peaks.mic_pspl_psi = binary_peaks.mic_pspl_psi
        # Operator-supplied serial_hint wins over the binary's filename
        # prefix when both are present (e.g. callers passing a known-good
        # serial that overrides a misnamed export).
        serial = serial_hint or idf_event.serial or "UNKNOWN"
        # 5. Filesystem write of binary bytes.
        filename = source_path.name
        bw_path = self._serial_dir(serial) / filename
        bw_path.write_bytes(idf_bytes)
        filesize = bw_path.stat().st_size
        sha256   = event_file_io.file_sha256(bw_path)
        # _waveform_key dedups (serial, timestamp) rows in the events
        # table.  Use the binary's sha256 (first 16 bytes) as a stable
        # surrogate — every distinct binary maps to a distinct row.
        waveform_key = bytes.fromhex(sha256)[:16]
        # 6. Bridge to minimateplus.Event for the existing sidecar / DB
        # insert paths.  See IdfEvent.to_minimateplus_event() for the
        # caveats of this bridge (mic units, missing fields → sidecar).
        ev = idf_event.to_minimateplus_event(waveform_key)
        # Attach the decoded sample arrays.  Thor's decoder counts use
        # LSB = 0.0003 in/s for geo (vs BW's 16-count units at 0.005 in/s)
        # — the .h5 writer's geo_range="normal" yields LSB = 10/32768
        # ≈ 0.000305 in/s, so plotted samples come out ~1.7% high.
        # Acceptable known offset; refine with a Thor-aware h5 path later.
        if idf_samples is not None:
            ev.raw_samples = idf_samples
            n_samples = max(
                (len(idf_samples.get(ch, [])) for ch in ("Tran", "Vert", "Long", "MicL")),
                default=0,
            )
            ev.total_samples = ev.total_samples or n_samples
        # For IDFH histograms there are no per-sample waveform arrays — the
        # device stores one peak ADC count per interval per channel.  Synthesise
        # a 1-sample-per-interval array so the existing h5+renderer pipeline
        # (which groups samples down to ``n_intervals`` bars via max-per-group)
        # produces a non-blank histogram chart.  Each "sample" is the peak ADC
        # count for that interval, so the h5 writer's ``count × geo_fs/32768``
        # conversion yields the right physical value for the bar height.
        if is_histogram and idf_intervals:
            hist_samples = {
                "Tran": [iv.peak_count("Tran") for iv in idf_intervals],
                "Vert": [iv.peak_count("Vert") for iv in idf_intervals],
                "Long": [iv.peak_count("Long") for iv in idf_intervals],
                "MicL": [iv.peak_count("MicL") for iv in idf_intervals],
            }
            ev.raw_samples = hist_samples
            ev.total_samples = ev.total_samples or len(idf_intervals)
        # 7. Write the .h5 clean-waveform file when we have samples to write
        # (either the IDFW per-sample stream, or the IDFH synthesised per-
        # interval peak array).  The renderer treats both shapes the same way.
        hdf5_filename: Optional[str] = None
        if ev.raw_samples:
            hdf5_path = self.hdf5_path_for(serial, filename)
            try:
                event_hdf5.write_event_hdf5(
                    hdf5_path, ev,
                    serial=serial,
                    geo_range="normal",   # Thor's geo full scale is also 10 in/s (Normal)
                    source_kind="idf-import",
                )
                hdf5_filename = hdf5_path.name
            except Exception as exc:
                log.warning(
                    "save_imported_idf: HDF5 write failed for %s: %s — continuing without .h5",
                    hdf5_path, exc,
                )
        # 8. Write the sidecar.  Source kind "idf-import" is on the allow-list.
        sidecar_path = self.sidecar_path_for(serial, filename)
        existing_review = None
        if sidecar_path.exists():
            try:
                existing_review = event_file_io.read_sidecar(sidecar_path).get("review")
            except Exception:
                # Unreadable prior sidecar: proceed without carrying its review over.
                pass
        sidecar = event_file_io.event_to_sidecar_dict(
            ev,
            serial=serial,
            blastware_filename=filename,
            blastware_filesize=filesize,
            blastware_sha256=sha256,
            source_kind="idf-import",
            a5_pickle_filename=None,
            review=existing_review,
        )
        # Stash the full parsed IDF report under extensions so downstream
        # consumers can recover the rich derived fields that don't fit
        # the BW-shaped event model (Peak Acceleration / Displacement,
        # Time of Peak, sensor self-check, calibration, firmware).
        if report_dict:
            sidecar["extensions"]["idf_report"] = report_dict
        # Project the IDF report into the BW report sidecar shape so the
        # existing Event Report PDF pipeline (sfm/report_pdf.py) can
        # render Thor events without a separate code path.  Thor reports
        # mostly the same metrics as BW; the adapter handles the
        # field-name mapping.
        if report_dict or binary_md is not None:
            try:
                from micromate.idf_to_bw_report import build_bw_report_from_idf
                sidecar["bw_report"] = build_bw_report_from_idf(
                    report_dict or {},
                    binary_md=binary_md,
                    intervals=idf_intervals,
                    is_histogram=is_histogram,
                )
            except Exception as exc:
                log.warning(
                    "save_imported_idf: idf→bw_report adapter failed for %s: %s — "
                    "report PDF will fall back to DB-only fields",
                    filename, exc,
                )
        # For histograms, also stash the binary-decoded per-interval
        # records so the UI / report layer doesn't need to re-walk the
        # IDFH file at render time.
        if idf_intervals is not None:
            sidecar["extensions"]["idf_intervals"] = [
                {
                    "offset":     iv.offset,
                    "tran_peak":  iv.peak_count("Tran"),
                    "tran_halfp": iv.tran_halfp,
                    "tran_freq":  iv.freq_hz("Tran"),
                    "vert_peak":  iv.peak_count("Vert"),
                    "vert_halfp": iv.vert_halfp,
                    "vert_freq":  iv.freq_hz("Vert"),
                    "long_peak":  iv.peak_count("Long"),
                    "long_halfp": iv.long_halfp,
                    "long_freq":  iv.freq_hz("Long"),
                    "mic_peak":   iv.peak_count("MicL"),
                    "mic_halfp":  iv.micl_halfp,
                    "mic_freq":   iv.freq_hz("MicL"),
                }
                for iv in idf_intervals
            ]
        event_file_io.write_sidecar(sidecar_path, sidecar)
        log.info(
            "WaveformStore.save_imported_idf serial=%s filename=%s filesize=%d "
            "kind=%s report_attached=%s binary_decoded=%s h5=%s intervals=%d",
            serial, filename, filesize,
            "histogram" if is_histogram else "waveform",
            bool(report_dict),
            (idf_samples is not None) or (idf_intervals is not None),
            hdf5_filename or "(skipped)",
            len(idf_intervals) if idf_intervals else 0,
        )
        return ev, {
            "filename":           filename,
            "filesize":           filesize,
            "sha256":             sha256,
            "a5_pickle_filename": None,
            "hdf5_filename":      hdf5_filename,
            "sidecar_filename":   sidecar_path.name,
            "serial":             serial,
        }
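The backfill step in `save_imported_idf` (keep any non-empty TXT value, use the binary value only when the TXT key is missing or empty) can be sketched in isolation. This is a minimal illustration under stated assumptions, not the project's code: `BinaryMeta`, `BACKFILL_FIELDS`, and `backfill_report` are hypothetical names, the real metadata type in `micromate.idf_file` may carry more fields, and the serial/sample-rate values in the usage lines are made up. The `event_type` and PPV backfills in the method follow the same pattern.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Optional

# Hypothetical stand-in for the binary metadata object; the real type in
# micromate.idf_file may carry more fields than these four.
@dataclass
class BinaryMeta:
    serial: Optional[str] = None
    event_datetime: Optional[str] = None
    sample_rate: Optional[int] = None
    record_time_sec: Optional[float] = None

# report_dict key -> BinaryMeta attribute, mirroring the fields the method backfills.
BACKFILL_FIELDS = {
    "serial_number": "serial",
    "event_datetime": "event_datetime",
    "sample_rate": "sample_rate",
    "record_time_sec": "record_time_sec",
}

def backfill_report(report: dict[str, Any], meta: Optional[BinaryMeta]) -> dict[str, Any]:
    """Return a copy of `report`; binary values fill only keys the TXT left empty."""
    merged = dict(report)
    if meta is None:
        return merged
    for key, attr in BACKFILL_FIELDS.items():
        value = getattr(meta, attr)
        # Same truthiness test as the method: None, "", and 0 all count as
        # "missing" on the TXT side, and a falsy binary value is never copied.
        if value and not merged.get(key):
            merged[key] = value
    return merged

# Illustrative values only.
report = {"serial_number": "UM1234", "sample_rate": None}
meta = BinaryMeta(serial="UM9999", sample_rate=1024, record_time_sec=3.0)
merged = backfill_report(report, meta)
```

Unlike the method, which updates `report_dict` in place, the sketch returns a copy so the TXT-only input stays inspectable; the precedence rule is the same either way.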
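For IDFH histograms the method builds one "sample" per interval per channel, each holding that interval's peak ADC count, and relies on the h5 writer's `count × 10/32768` geo conversion for bar heights. The sketch below shows that shape and the arithmetic behind the ~1.7% offset against Thor's 0.0003 in/s LSB. `Interval`, `synthesise_histogram`, `counts_to_ips`, and `group_max` are hypothetical stand-ins; `group_max` in particular assumes the renderer's max-per-group downsampling uses ceiling-sized chunks, which the source does not specify. The geo conversion is not meant for MicL, which uses a separate per-count mic factor.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for a decoded IDFH interval record.  The real
# record exposes peak_count(channel); here counts live in a plain dict.
@dataclass
class Interval:
    counts: dict = field(default_factory=dict)

    def peak_count(self, channel: str) -> int:
        # Channels absent from the record read as 0 counts.
        return self.counts.get(channel, 0)

CHANNELS = ("Tran", "Vert", "Long", "MicL")
H5_GEO_LSB = 10 / 32768   # geo_range="normal": 10 in/s over 32768 counts
THOR_GEO_LSB = 0.0003     # Thor decoder LSB quoted in the method's comments

def synthesise_histogram(intervals):
    """One 'sample' per interval per channel: that interval's peak ADC count."""
    return {ch: [iv.peak_count(ch) for iv in intervals] for ch in CHANNELS}

def counts_to_ips(counts, lsb=H5_GEO_LSB):
    """Geo channels only: convert ADC counts to in/s with a fixed LSB."""
    return [c * lsb for c in counts]

def group_max(samples, n_bars):
    """Assumed max-per-group downsampling into at most n_bars bars."""
    size = max(1, -(-len(samples) // n_bars))  # ceiling division
    return [max(samples[i:i + size]) for i in range(0, len(samples), size)]

intervals = [Interval({"Tran": 100, "Vert": 250}), Interval({"Tran": 400, "Long": 50})]
samples = synthesise_histogram(intervals)
bars = counts_to_ips(samples["Tran"])                # bar heights in in/s
offset_pct = (H5_GEO_LSB / THOR_GEO_LSB - 1) * 100   # about 1.73 % high
```

With exactly one sample per interval, grouping into `len(intervals)` bars is the identity, which is why the synthesised array renders one bar per interval without any renderer changes.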
    def load_a5(self, serial: str, filename: str) -> Optional[list[S3Frame]]: