diff --git a/.gitignore b/.gitignore index d6e4855..90e5d24 100644 --- a/.gitignore +++ b/.gitignore @@ -1,6 +1,6 @@ /bridges/captures/ /example-events/ - +/tests/fixtures/ /manuals/ # Python build artifacts diff --git a/CHANGELOG.md b/CHANGELOG.md index f2d4f95..1b92776 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -4,6 +4,99 @@ All notable changes to seismo-relay are documented here. --- +## [Unreleased] + +--- + +## v0.20.0 — 2026-05-28 + +The "PDF + parser polish" release. Closes out the Event-Report PDF iteration started in v0.17.x: histogram layouts now render correctly against BW reference PDFs, the ASCII parser handles the real-world edge cases production events were tripping over (OORANGE, `>100 Hz`, histogram timestamps), and the `.TXT` preservation rollout lets parser fixes be applied retroactively to ingested events. Adds server-wide timezone support so operator-visible timestamps no longer drift into UTC. Rolls up the substantial "pre-v0.20" body of work that had accumulated under `[Unreleased]` (PDF generation, histogram codec fix, histogram parser fields, `.TXT` preservation, backfill safety) — see the trailing "pre-v0.20.0 work" section below for the full list. + +### Added (2026-05-28) + +- **Server-wide display timezone via `TZ` env var.** Both seismo-relay and terra-view now respect a `TZ` environment variable (default `America/New_York` on prod). Affects server log timestamps, the PDF report renderer's UTC→local conversions on the "Created" footer line, matplotlib's datetime axes, and any other naïve-vs-aware datetime rendering. DB columns (`created_at`, etc.) stay UTC regardless — this is a display-side fix, not a storage-side one. Dockerfile now installs `tzdata` (required for the env var to take effect under `python:slim`). Override per-deployment via the `TZ` line in `docker-compose.yml`. 
+- **ZC Freq "above-range" handling — render `>100 Hz` instead of `—`.** BW writes `">100 Hz"` literally when the zero-crossing algorithm sees a peak too fast to count (device cuts off at 100 Hz on V10.72). Previously `_parse_number(">100")` returned None and the PDF stats table rendered `—`. Now the parser mirrors the OORANGE pattern: stores 100.0 on `zc_freq_hz` and sets a new `zc_freq_above_range` flag. Flag rides through the sidecar's `bw_report` block. Renders as `>100` in the PDF (per-channel + mic block), as `· >100 Hz` inline on the event modal's Peaks section, and as a dedicated column on the event-browser stats table. Verified against the real T190LD5Q.LK0W fixture from 2026-05-27 plus a synthetic test case. +- **Per-channel ZC Freq surfaced in event modals.** Neither the main webapp modal (`sfm_webapp.html`) nor the standalone event browser (`event_browser.html`) previously exposed ZC Freq. Now both do — webapp shows it inline alongside PPV (`0.04500 in/s · 47 Hz`); event-browser gets a dedicated column on its per-channel stats table. Required wiring a parallel sidecar fetch into the event-browser's `loadEvent()` (it was only fetching `waveform.json`). Falls back to `—` for events without a preserved `.TXT` (pre-2026-05-27 ingests). +- **`scripts/backfill_sidecars.py --reparse-txt` flag.** Before this, the backfill script preserved the `bw_report` block from existing sidecars verbatim — so parser-side fixes (like the `>100 Hz` addition above) couldn't reach old events. The new flag re-runs the current parser against the preserved `/_ASCII.TXT`, overwrites the bw_report block, and cascade-regenerates the sidecar. Implies sidecar regeneration on every event (bypasses the sha/version skip). No-op for events without a preserved .TXT (legacy ingests pre-2026-05-27 .TXT-preservation rollout). Idempotent. Run with `--skip-hdf5` to skip waveform regen — recommended when only the bw_report needs refreshing. 
Validated end-to-end on prod: 9,999 events refreshed cleanly, ZC Freq + OORANGE flags now populated where the original .TXT had them. + +### Fixed (2026-05-28) + +- **Histogram PDFs no longer 500 on the missing `histogram_interval_size_s` attribute.** The histogram-interval-times derivation block in `gather_report_data` referenced `rd.histogram_interval_size_s`, but the field was never declared on the `ReportData` dataclass nor read from the sidecar projection (it was inlined into `gather_report_data` without the seconds-numeric counterpart making it onto the dataclass). Every histogram PDF render raised `AttributeError → 500`. Waveform PDFs were unaffected. Fix: add the field, read it from the projection's existing `bw_report.histogram.interval_size_s` key. +- **Histogram PDF geo channels now share a single nice-quantized y-axis.** Previously each geo subplot auto-scaled independently — Tran, Vert, and Long all showed different per-channel maxes, so bar heights weren't directly comparable across channels. The footer "Amplitude Geo: X in/s/div" label was also computed as `max(first_geo_channel) / 5` with no LSB quantization, producing nonsense values like `0.003 in/s/div` when the geophone LSB is 0.005. Fix: compute a single shared geo y-axis range from `max(Tran, Vert, Long)`, quantize the per-division step to BW's 1-2-5 sequence rounded to the 0.005 in/s LSB (0.005, 0.01, 0.025, 0.05, 0.1, 0.25, ...), apply the same `ylim` + ticks to all three subplots, and use that step for the footer label. MicL stays on its own auto-scale (different units). Matches BW's chart styling. + +### Docs (2026-05-28) + +- **Roadmap entry for a second undecoded histogram body sub-format.** BE17353 (S353) events observed on 2026-05-28 use a histogram body where `byte[5] = 0x00` (looks like a valid block header by every prior signal) but the walker finds zero data blocks. Different from the existing `byte[5] != 0` roadmap entry (T190 / O121). 
Operationally identical impact — ingestion succeeds, DB peaks come from the bw_report overlay, only the chart is empty. Sample events captured in the roadmap entry for future RE work. + +### Migration / Operations + +- **Re-parse existing events to pick up the new parser fields.** Run on whichever box hosts the live waveform store: + ```bash + docker exec terra-view-sfm-1 python /app/scripts/backfill_sidecars.py \ + --reparse-txt --skip-hdf5 --dry-run -v | tail + # Looks reasonable? Run for real: + docker exec terra-view-sfm-1 python /app/scripts/backfill_sidecars.py \ + --reparse-txt --skip-hdf5 -v | tee /tmp/reparse.log | tail -30 + ``` + Idempotent; safe to re-run. Only touches sidecars on disk — no DB writes. +- **terra-view docker-compose.yml**: add `TZ=America/New_York` (or your deployment's zone) to both the `terra-view` and `sfm` service `environment:` blocks. Without this, server-rendered timestamps stay in UTC even on the rebuilt SFM image. + +### Pre-v0.20.0 work (rolled into this release) + +The bullets below accumulated under `[Unreleased]` between v0.19.0 and v0.20.0; kept here so the historical narrative isn't lost. + +#### Fixed + +- **bw_ascii_report parser now handles `OORANGE` saturation marker.** BW writes `"OORANGE"` (truncation of "Out Of Range") in PPV / PVS / MicL PSPL fields when the underlying measurement exceeded the channel's full-scale. Previously our `_parse_number()` returned None → DB ended up with NULL peaks for legitimate high-amplitude events. Confirmed on real ASCII files pulled 2026-05-27 from the Windows watcher PC: T190LD5Q.LK0W (Vert saturated at Normal range 10 in/s), T438L713.RY0W (all three channels saturated at Sensitive range 1.25 in/s), K557L3YM.OE0W (Tran+Vert saturated + Mic PSPL OORANGE). 
New behavior: + - Per-channel PPV: substitute `geo_range_ips` as a conservative lower bound + set `ppv_saturated` flag + - Peak Vector Sum: substitute `sqrt(3) * geo_range_ips` (the theoretical max when all 3 channels are simultaneously at full-scale) + `peak_vector_sum_saturated` flag + - MicL PSPL: substitute 140 dB(L) (conservative NL-43 max) + `pspl_saturated` flag + - Saturation flags are propagated into the sidecar's `bw_report` block for downstream UI rendering (`> 10 in/s` or similar) + - Five events on prod (T190 / T438 / K557 + 2 others matching the same fault pattern) will pick up correct DB peaks + saturation flags once re-forwarded +- **bw_ascii_report parser handles `Peak Vector Sum TimeSum` typo'd label.** Real BW output uses this misspelled label (Sum appended twice instead of "Peak Vector Sum Time"). Now accepted as an alias. Confirmed against all three OORANGE example files — every one has the typo. + +#### Added + +- **Histogram per-interval aggregation in `waveform.json`.** Histogram events now render with one bar per BW-reported interval (matching the Blastware printout) instead of ~200 bars per event (the raw codec output). When the sidecar's `bw_report.histogram.n_intervals` is populated (events ingested with the new parser, see next bullet), the `/db/events/{id}/waveform.json` endpoint groups the codec samples into N intervals via max-per-group and returns the aggregated array. `time_axis` gains `histogram_aggregated: true`, `n_intervals`, `interval_size_s`, and `interval_times` (HH:MM:SS strings). Both the modal chart and the standalone event browser use those interval timestamps as x-axis labels when present. Defensive: no-op for events ingested before the parser extension landed (their sidecars lack `histogram.n_intervals`) — those continue to render with raw codec output. 
+- **`bw_ascii_report` parser now captures histogram-specific fields.** Previously the parser dropped these fields silently (Roadmap item closed): + - `Histogram Start Time` / `Histogram Start Date` (combined into `histogram_start: datetime`) + - `Histogram Stop Time` / `Histogram Stop Date` (combined into `histogram_stop: datetime`) + - `Number of Intervals` (`histogram_n_intervals: int`) + - `Interval Size` ("1 minute" string + parsed seconds: `histogram_interval_size_str`, `histogram_interval_size_s`) + - ` Peak Time` + ` Peak Date` for histogram events (combined into `channel_peak_when: dict`; waveforms continue to use `time_of_peak_s` relative) + - `Peak Vector Sum Date` (combined with PVS Time into `peak_vector_sum_when: datetime`; clears the previous bogus `peak_vector_sum_time_s` parse that interpreted "22:33:52" as 22.0 seconds) + - All new fields land in the sidecar's `bw_report.histogram` block via `_bw_report_to_dict`. Tested against synthetic K558LLB7.V20H-shaped input. +- **Raw BW ASCII report (.TXT) preservation.** `save_imported_bw` now writes the paired `_ASCII.TXT` to `//_ASCII.TXT` alongside the binary at ingest time. Previously the .TXT was parsed into the sidecar's `bw_report` projection and then discarded — meaning parser bug fixes couldn't be applied retroactively without re-forwarding from the watcher PC. Now the raw .TXT lives in the waveform store permanently (~15 KB per event; ~210 MB total for a 14k-event store; negligible). Sidecar's `source.txt_filename` field records the saved path; backfill_sidecars preserves it across regens. New `GET /db/events/{id}/ascii_report.txt` endpoint serves the raw .TXT for any event ingested after this change. Events ingested before today still return 404 from that endpoint until re-forwarded. 
Architectural rationale: with BW Mail / Forwarding Agent being phased out of the operator workflow, the XML/PDF/WMF that those tools produced are no longer available — the binary + .TXT (created by BW ACH itself) are our authoritative source for everything going forward. + +- **Event Report PDF generation** — `GET /db/events/{id}/report.pdf` returns a single-page letter-portrait PDF for any event with waveform data on disk. Covers every field a Blastware Event Report includes: header metadata (date/time, trigger source, range, sample rate, project/client/operator/location, serial+firmware, battery, calibration, file name), microphone block (PSPL in dB(L) + psi, ZC freq, channel test), per-channel stats table (rows differ for waveform vs histogram), Peak Vector Sum, and the 4-channel plot. Iterated against real Blastware reference PDFs (uploaded to `example-events/pdfsnstuff/`): + - **Waveform layout**: header shows Date/Time, Trigger Source, Range, Sample Rate; stats table has PPV / ZC Freq / Time (Rel. to Trig) / Peak Accel / Peak Disp / Sensor Check; bottom plot is 4-channel line waveform (MicL top → Tran bottom), shared time axis in seconds, dashed trigger line + triangle marker at t=0, symmetric Y on geo channels, zero-anchored on mic, "0.0" baseline label on right per BW convention; footer shows `Time X sec/div Amplitude Geo: Y in/s/div Mic: 0.001 psi(L)/div` and the trigger window `▶━━◀` marker. USBM RI8507/OSMRE compliance chart placeholder upper-right. + - **Histogram layout**: header shows Start / Finish / Intervals At Size / Range / Sample Rate (no Trigger Source — histograms aren't triggered); NO USBM chart; stats table has PPV / ZC Freq / Date / Time / Sensor Check; bottom plot is per-interval bar chart, Y-axis 0-to-peak (never negative), 0.0 baseline at the bottom; footer shows `Time INTERVAL_SIZE /div Amplitude Geo: Y in/s/div Mic: 0.001 psi(L)/div`. + - Backed by matplotlib (vector PDF, no headless-browser dep). Adds matplotlib>=3.8 to deps. 
+ - **Known gap**: histogram codec returns per-block granularity (~200 bars for a 4-interval event) instead of BW's per-interval aggregation. Visual difference vs BW's 4-bar display. XML-driven data source (parsing the structured `_XML.XML` files BW also exports) is the planned fix; that route also resolves the bw_ascii_report PPV-miss bug. + - **Stubbed**: USBM RI8507 / OSMRE compliance chart curves (separate work item; requires coding the regulatory piecewise functions). +- **"Download PDF" button** in the event modal's footer — triggers the new endpoint; opens in a new tab so the browser handles save-or-display + surfaces any 404 / server errors visibly. + +- **SFM webapp now opens to Database view by default** and the History table is fully interactive. Click any column header to sort ascending / descending (timestamp, serial, per-channel PPV, PVS, mic dB(L), project, client, record type, key — all sortable). Click any event row to open the event modal, which now renders a **4-channel waveform plot inline** (MicL / Long / Vert / Tran stacked, Instantel-printout order) alongside the existing sidecar review fields. Headers are sticky so the columns stay visible while scrolling long event lists. No more "where is the viewer" — pick a unit from the filter dropdown, scan the table, click the event, see the waveform. +- **Stored-event browser** — new standalone HTML page at `GET /events` (`sfm/event_browser.html`). Pick a serial from the unit dropdown, scroll through that unit's events (newest-first), click any event to render its decoded waveform via the existing `/db/events/{id}/waveform.json` endpoint. Dark-themed Chart.js viewer, channels stacked vertically (MicL / Long / Vert / Tran — Instantel printout order, designed PDF-export-ready), trigger line at t=0, peak labels, search/filter, false-trigger flag honored. Companion to the existing live-device viewer at `/waveform`; the two routes are now clearly delineated in their docstrings. 
The webapp's inline plot at `/` is the primary path; `/events` remains a useful diagnostic when you want just a viewer. +- **Histogram body codec — uint8 peak count fix.** Per-channel peak fields at `block[6]/[10]/[14]/[18]` are `uint8`, not `uint16 LE` spanning `block[6:8]` etc. The original interpretation was byte-exact on the N844 fixture corpus only because every annotation byte (`block[7]/[11]/[15]/[19]`) in those fixtures was zero. On non-N844 events with non-zero annotation bytes (observed across BE9558 Tran-drift and BE18003 Histogram+Continuous units), the old interpretation produced peaks up to 268 in/s per channel and 35× inflated PVS sums when first deployed to prod (rolled back same day; properly fixed in this release). Cross-correlated against BW's per-interval ASCII export on K558 / T003 / N599 / N844 corpora — 100% byte-exact on T/V/L, 99%+ on M (sub-precision rounding). Annotation byte preserved on each record as `record["annotations"]` for future RE. Verified against ~3,500 blocks across 5 in-repo fixtures + a synthetic K558 interval-12 regression block. +- **`apply_bw_report_dict_to_event` helper** in `minimateplus.event_file_io`. Mirror of `apply_report_to_event` for the projected sidecar dict shape — used by the backfill path, which has the preserved `bw_report` block but not the original `.TXT` file. BW's reported peaks (and `sample_rate` / `record_time`) now win over codec output during `--force` backfill, matching ingest-path behavior. +- **`scripts/check_bw_report_preservation.py`** — two-step snapshot/diff tool to verify that `backfill_sidecars.py` doesn't wipe the `bw_report` block from existing sidecars. Classifies every sidecar as PRESERVED / CHANGED / WIPED / STILL_MISSING / NEW / ADDED / REMOVED. Exit code 1 if any WIPED or CHANGED entries are found, so it can gate a CI step or deploy script. 
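The histogram per-interval aggregation described in the Added list above (grouping the raw codec samples into N intervals via max-per-group) can be sketched roughly as follows. This is a minimal illustration, not the endpoint's actual code: the function name `aggregate_max` and the choice of contiguous, near-equal-size groups are assumptions made here for the example.

```python
def aggregate_max(samples: list[float], n_intervals: int) -> list[float]:
    """Collapse raw codec samples into n_intervals bars using max-per-group.

    Hypothetical helper: assumes samples are split into contiguous groups of
    near-equal size. Returns the input unchanged when n_intervals is missing
    or non-positive, mirroring the "no-op for older sidecars" behaviour.
    """
    if n_intervals <= 0 or not samples:
        return list(samples)

    n = len(samples)
    bars = []
    for i in range(n_intervals):
        # Integer boundaries spread any remainder across the groups.
        lo = i * n // n_intervals
        hi = (i + 1) * n // n_intervals
        group = samples[lo:hi]
        # An empty group can only occur when n_intervals > len(samples).
        bars.append(max(group) if group else 0.0)
    return bars
```

Taking the maximum per group keeps each interval's peak, which is the quantity a per-interval bar chart reports. A mean or sum would understate or overstate it.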
+ +#### Fixed + +- **`scripts/backfill_sidecars.py` no longer wipes `bw_report`.** Before this fix, `event_to_sidecar_dict` silently dropped the preserved `bw_report` block during every backfill, since the function only emits a `bw_report` when called with a live `BwAsciiReport` dataclass (which the backfill doesn't have — only the projected sidecar dict). Now we read the existing sidecar's `bw_report` and overlay it onto the regenerated sidecar, alongside the existing `review` and `extensions` preservation. +- **`scripts/backfill_sidecars.py --force` no longer overwrites BW-overlaid DB peaks with codec output.** The backfill path now calls `apply_bw_report_dict_to_event` before the DB upsert, mirroring what the ingest path does (`/db/import/blastware_file` parses the `.TXT` into a `BwAsciiReport`, calls `apply_report_to_event`, then upserts). Without this, events where the codec doesn't fully decode (waveform walker edge cases on SP0/SS0/SV0-style events, histogram `byte[5]!=0` sub-format) ended up with PVS=0 in the DB after a `--force` backfill; bit on prod 2026-05-22, rolled back the same day. +- **Thor IDF files no longer attempted as BW events in backfill.** `scripts/backfill_sidecars.py` now filters out `.IDFW` / `.IDFH` files in `_looks_like_event_file()`; they share the `.X0W` / `.X0H` suffix shape but use a separate ingest path (`WaveformStore.save_imported_idf`) and aren't decodable by `event_file_io.read_blastware_file`. + +#### Docs + +- **CLAUDE.md** — added a three-tier conceptual architecture model (SFM / SDM / shared codec library) near the top of the file, with a placement rule for where new code goes. Documents that what is conceptually SDM (database, waveform store, ingest, `/db/*` endpoints) still lives under `sfm/` for historical reasons; rename deferred until the codebase is quiet enough for a clean refactor. 
+- **README.md** — added a "Strategic direction" lead-in to the Roadmap that frames seismo-relay as a suite of cooperating components (not a single app), and an explicit "Terra-View ↔ SFM device control" roadmap section with a concrete implementation checklist (auth as hard prerequisite, embedded live-monitor view, action history, Series IV live-device support). +- **`docs/histogram_codec_re_status.md`** updated with the uint8 retraction and the annotation-byte status. +- Three known issues recorded in the Roadmap that were discovered during prod validation: (1) `bw_ascii_report` parser misses PPV / `vector_sum` on some `.TXT` formats (5 events on prod); (2) NULL-timestamp duplicate-row dedup needed (2 events on prod); (3) histogram body sub-format with `byte[5] != 0` not yet decoded (~3 events on prod with empty `.h5` plots). + +--- + ## v0.19.0 — 2026-05-20 The "device-family separation" release. Tightens the boundary between Series III (MiniMate Plus / Blastware) and Series IV (Micromate / Thor) so the UI and storage layer dispatch deterministically by family instead of sniffing filename extensions or magnitude heuristics. diff --git a/CLAUDE.md b/CLAUDE.md index 5dd6629..c2892d6 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -2,12 +2,90 @@ Ground-up Python replacement for **Blastware**, Instantel's Windows-only software for managing MiniMate Plus seismographs. Connects over direct RS-232 or cellular modem -(Sierra Wireless RV50 / RV55). Current version: **v0.17.0**. +(Sierra Wireless RV50 / RV55). Current version: **v0.20.0**. When new information about the protocol is discovered, please update the instantel_protocol_reference.md with the findings in addition to this document --- +## Architecture: three-tier conceptual model + +seismo-relay is a **suite of cooperating components**, not a single app. 
+The three tiers below are the canonical mental model — the current +directory layout doesn't fully reflect them yet (some of what is +conceptually SDM lives under `sfm/` today), but new code should be +placed and named according to this model. + +### 1. SFM — the device-side (active connection to physical units) + +Replaces Blastware's *talk-to-the-meter* role. Lives where a connection +to a physical seismograph is open. + +In scope: +- `minimateplus/{transport,framing,protocol,client}.py` — wire protocol +- `seismo_lab.py` — diagnostic GUI (a thick client for SFM) +- The `/device/*` HTTP endpoints in `sfm/server.py` — + `/device/info`, `/device/events`, `/device/monitor/*`, `/device/call_home`, + etc. Anything that opens a connection at the moment of the request. +- Future: a Thor / Micromate live client (mirror `minimateplus/`) +- Future: a control surface Terra-View can launch into — see the + README's Roadmap. + +Does NOT own a database. Outputs `Event` objects. Has a "spun up when +needed" runtime profile rather than "always on". + +### 2. SDM — the data-side (storage, ingest, and serving) + +The new name for the receiving-and-storing role. Originally called SFM +because the FastAPI service started life as a thin device proxy, but +the actual role has migrated heavily toward data management. **For now +the directory remains `sfm/`** — renaming requires touching ~30-50 +files in seismo-relay + ~10-15 in terra-view + a Docker volume +migration; deferred until the codebase is quiet enough to do it as a +clean refactor. 
+ +In scope: +- `sfm/database.py` (`SeismoDb`) +- `sfm/waveform_store.py`, `sfm/event_hdf5.py` +- The `/db/*` HTTP endpoints — `events`, `units`, `monitor_log`, + `sessions`, `false_trigger` mutations +- The `/db/import/*` ingest endpoints — `blastware_file` (series3), + `idf_file` (series4); anything that receives events FROM somewhere +- `scripts/backfill_sidecars.py`, `scripts/check_bw_report_preservation.py`, + and similar data-maintenance tools +- The `.sfm.json` sidecars and `.h5` files in the waveform store +- The shape that Terra-View consumes (Terra-View should never need to + reach into SFM/device-side endpoints to populate its UI) + +Always-on, scaled for storage/serving, has the DB and waveform store. + +### 3. Codec library — pure data interpretation (used by both sides) + +Neither SFM nor SDM — a shared library both depend on. + +In scope: +- `minimateplus/{waveform_codec,histogram_codec,event_file_io,bw_ascii_report,blastware_file}.py` +- `micromate/{idf_ascii_report,idf_file}.py` + +These modules take bytes (off the wire on the SFM side, or from a +forwarded file on the SDM side) and return `Event` objects. They +should not import from `sfm/`, must not touch a DB, and have no I/O +beyond reading files passed as arguments. Keep them pure — both +tiers can then depend on them without circularity. + +### Practical consequences + +When deciding where new code goes, ask: +- *Does it need a connection to a device?* → SFM +- *Does it operate on stored events / sidecars / DB rows?* → SDM +- *Does it interpret bytes into structured data, with no I/O of its own?* → codec lib + +Terra-View is downstream of SDM for data, and (per the roadmap) will +eventually invoke into SFM's device-control endpoints to provide a +"connect to unit" experience. 
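One way to keep the codec tier honest about the "should not import from `sfm/`" rule above is a small static check. The sketch below is an assumption-laden illustration, not something in the repo: `forbidden_imports` is a hypothetical helper, and it only inspects absolute `import` / `from ... import` statements in a source string.

```python
import ast


def forbidden_imports(source: str, banned_root: str = "sfm") -> list[str]:
    """Return module names imported from `banned_root` in the given source.

    Hypothetical helper for illustration. Relative imports are skipped
    because they cannot reach a different top-level package.
    """
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names = [node.module]
        else:
            continue
        # Match the package itself or any submodule, but not look-alikes
        # such as a package that merely starts with the same letters.
        hits += [
            name
            for name in names
            if name == banned_root or name.startswith(banned_root + ".")
        ]
    return hits
```

Run over each codec module's source (for example in a test), a non-empty result would flag a tier violation before it creates a circular dependency.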
+ +--- + ## Project layout ``` diff --git a/Dockerfile b/Dockerfile index 8fb05f7..af55af5 100644 --- a/Dockerfile +++ b/Dockerfile @@ -2,14 +2,27 @@ FROM python:3.11-slim WORKDIR /app +# tzdata is required for the TZ env var to take effect (python:slim +# omits the timezone database). Without it, datetime.now() / logging +# / matplotlib all stay in UTC regardless of TZ. Default zone gets +# set further down via ENV; users override per-deployment via the +# `TZ` env var in docker-compose. RUN apt-get update && \ - apt-get install -y --no-install-recommends curl && \ + apt-get install -y --no-install-recommends curl tzdata && \ rm -rf /var/lib/apt/lists/* +# Default display timezone — applied to server logs, datetime.now(), +# matplotlib rendered timestamps, and any naïve-vs-aware datetime +# conversions in the PDF renderer. Override via TZ env var in +# docker-compose; storage in the DB is always UTC regardless. +ENV TZ=America/New_York + COPY pyproject.toml requirements.txt ./ COPY minimateplus ./minimateplus -COPY sfm ./sfm -COPY bridges ./bridges +COPY micromate ./micromate +COPY sfm ./sfm +COPY bridges ./bridges +COPY scripts ./scripts RUN pip install --no-cache-dir -e . diff --git a/README.md b/README.md index c057f68..7522bb1 100644 --- a/README.md +++ b/README.md @@ -1,4 +1,4 @@ -# seismo-relay `v0.19.0` +# seismo-relay `v0.20.0` A ground-up replacement for **Blastware** — Instantel's aging Windows-only software for managing seismographs. Supports both the **MiniMate Plus @@ -35,6 +35,16 @@ over direct RS-232 or cellular modem (Sierra Wireless RV50 / RV55). > and storage layer dispatch deterministically instead of sniffing > filenames. Self-applying migration backfills existing rows from the > binary filename extension. 
+> **v0.20.0 (2026-05-28)** closes out the Event-Report PDF iteration +> started in v0.17.x: histogram layouts render correctly against BW +> reference PDFs, the ASCII parser handles real-world edge cases +> (`OORANGE`, `>100 Hz`, histogram timestamps), and per-channel ZC +> Freq is surfaced in both modals (event browser + main webapp). +> Adds a server-wide `TZ` env var so operator-visible timestamps +> render in local time instead of UTC. New +> `scripts/backfill_sidecars.py --reparse-txt` lets parser fixes be +> applied retroactively to existing events without re-forwarding, +> using the `.TXT` files preserved at ingest time. > See [CHANGELOG.md](CHANGELOG.md) for full version history. --- @@ -459,6 +469,72 @@ Use **com0com** or **VSPD** to create the virtual COM pair on Windows. ## Roadmap (Future) +### Strategic direction — where this is going + +seismo-relay is being built as a **suite of cooperating components** +that together replace and improve on Blastware's role. Three logical +tiers: + +1. **SFM** (device-side) — owns the active connection to a physical + unit. Today: `minimateplus/`, `/device/*` HTTP endpoints, + `seismo_lab.py`. Future: live Thor / Micromate support. +2. **SDM** (data-side) — owns the database, waveform store, ingest + pipelines, and the read-API that Terra-View consumes. Today this + code lives under `sfm/` for historical reasons; the role has + migrated and the eventual rename is on the long-tail cleanup list. +3. **Codec library** — pure data-interpretation: `minimateplus/*_codec.py`, + `bw_ascii_report.py`, `micromate/idf_*.py`. Used by both SFM and + SDM, depends on neither. + +Terra-View is downstream of SDM for fleet listings, event detail, etc. +The long-term vision adds a **second link** from Terra-View → SFM for +direct device interaction (see below). + +The codec work in this repo isn't trying to replace BW's network +layer — BW's ACH file forwarding and Thor's IDF call-home are +battle-tested. 
The value is in the receiving and processing side: turn +the stream of binary+ASCII pairs into something users can search, +filter, alert on, and report from. + +### Terra-View ↔ SFM device control (the long-term vision) + +Today Terra-View only reads from SDM (event listings, dashboards, +project reports). When a unit goes missing — operator notices in the +Terra-View dashboard — there's no way to *do* anything from the UI. +The path of least resistance is to RDP into a Windows box and open +Blastware, which defeats the purpose of having Terra-View. + +Target experience: +- Operator notices a unit in Terra-View dashboard hasn't called in. +- Clicks unit detail → "Connect to Device" button. +- Terra-View opens an embedded view (modal or side-panel) that talks + to SFM's `/device/*` endpoints over the network. +- Live view: device clock, battery, memory, current monitor status. +- Actions: start/stop monitoring, push compliance config changes, pull + fresh events, run a sensor self-check, change call-home settings. +- Audit log: every connect / action recorded in SDM for the unit + history. + +Implementation steps (concrete): +- [ ] **SFM authentication & authorization layer.** Today `/device/*` + endpoints are unauthenticated — anyone on the network can call + them. Need at minimum a token-based auth, ideally with a "who + can connect to which units" mapping. Hard prerequisite for + letting Terra-View users into the control surface. +- [ ] **Terra-View "Connect to Device" entry point** on the unit + detail page. Renders only when unit has connection info on file + and the user has permission. +- [ ] **Embedded live-monitor view** in Terra-View — equivalent to + `seismo_lab.py`'s Bridge tab, but in the browser. Polls SFM's + `/device/monitor/status` on an interval; sends start/stop via + `/device/monitor/{start,stop}`. +- [ ] **Action history** — every connect / push / action call records + a row in `unit_history`, viewable on the unit detail page. 
+- [ ] **Series IV live-device support in SFM** — currently `/device/*` + only supports MiniMate Plus. Blocks "Connect to Device" for + Thor units until done. Depends on Thor wire-protocol capture + and a `micromate/` parallel of the `minimateplus/` modules. + ### High-impact (unblocks product features) - [ ] **Series III waveform body codec reverse-engineering.** The 5A bulk-stream body is some kind of compressed/encoded format (not raw int16 LE as previously assumed — see §7.6.1 retraction in `docs/instantel_protocol_reference.md`). Structural framing is ~50% decoded on branch `claude/codec-re-cBGNe` (tagged-block walker, segment counters); per-byte sample mapping is still open. Until this lands, the in-app waveform viewer renders garbage and BW-import peak values fall back to `_peaks_from_samples()` saturation noise. Workaround: pair every BW-imported event with its `_ASCII.TXT` so the device-authoritative peaks land in the DB regardless of codec. @@ -470,9 +546,10 @@ Use **com0com** or **VSPD** to create the virtual COM pair on Windows. ### BW ASCII report parser enhancements (built in v0.16.0) -- [ ] **Histogram-specific structural fields.** Current parser handles the shared fields (PPV, ZC Freq, sensor self-check, project) but silently drops histogram-only fields: `Histogram Start/Stop Time`, `Histogram Start/Stop Date`, `Number of Intervals`, `Interval Size`, per-channel `Peak Time` + `Peak Date` (absolute timestamps rather than the waveform's `Time of Peak` relative seconds). +- [x] **PPV field misses on certain TXT formats.** ✅ v0.20.0 — root cause was the `OORANGE` (Out Of Range) saturation marker that BW writes when a channel exceeds its full-scale; `_parse_number()` returned None for the non-numeric value. Parser now substitutes `geo_range_ips` as a lower bound + sets `ppv_saturated` flag. All 5 prod events (T190LD5Q.LK0W, T438L713.RY0W, K557L3YM.OE0W, + 2 others) now parse cleanly. 
+- [x] **Histogram-specific structural fields.** ✅ v0.20.0 — `Histogram Start/Stop Time+Date`, `Number of Intervals`, `Interval Size`, per-channel `Peak Time` + `Peak Date`, and `Peak Vector Sum Date` all parse now. Land in the sidecar's `bw_report.histogram` block. - [ ] **Histogram interval bin-table parsing.** Trailing 792-row table (per-interval Peak/Freq per channel + MicL) in histogram TXTs is unparsed. Probably too big for the sidecar JSON; may want a separate `.histogram.h5` companion file. -- [ ] **`>100 Hz` value parsing.** Histogram TXTs use `>100 Hz` for out-of-range ZC freq; current `_parse_number()` returns `None` for these (loses information). +- [x] **`>100 Hz` value parsing.** ✅ v0.20.0 — parser now mirrors the OORANGE pattern: stores 100.0 on `zc_freq_hz` + sets `zc_freq_above_range` flag. PDF + both modals render `>100 Hz` instead of `—`. ### Ingestion gaps @@ -498,3 +575,7 @@ Use **com0com** or **VSPD** to create the virtual COM pair on Windows. - [ ] Locate "Sensor Check" byte in compliance config (need capture with Disabled vs Before-monitoring). - [ ] Call Home — map time slots 3/4 offsets; confirm `modem_power_relay_enabled`. - [ ] RV55 DCD/DTR — newer RV55 firmware doesn't assert DCD by default; units don't resume monitoring after call-home disconnect (`--restart-monitoring` flag deferred). +- [ ] **NULL-timestamp duplicate-row dedup.** A small handful of events (2 known on prod as of 2026-05-22) have `events.timestamp IS NULL` because the codec couldn't extract a timestamp from the binary footer. The `UNIQUE(serial, timestamp)` constraint doesn't fire on `NULL` (SQL semantics: `NULL ≠ NULL`), so every `--force` backfill INSERTs a new row instead of UPSERTing the existing one. Cleanup: a one-shot SQL query that keeps only the newest row per `(serial, blastware_filename)` and deletes the rest. Longer-term: extend the unique key to `(serial, COALESCE(timestamp, blastware_filename))` or reject inserts with NULL timestamp. 
+- [ ] **Histogram body sub-format with `byte[5] != 0`.** ~3 events on prod (`T190LD5Q.LD0H`, `O121L4L1.GU0H`) use a histogram body my walker doesn't recognize — the first block has `byte[5] = 0x01` or `0x07` instead of `0x00`, and the entire body lacks the `1e 0a 00 00` tail signature. Codec returns 0 valid blocks; their DB PVS comes from the bw_report ASCII overlay (which BW computed from the same binary, so the DB columns are correct). Only the `.h5` waveform plot is empty. Cracking the sub-format would unlock the plot. Needs binary+ASCII pairs from a few `byte[5]!=0` events; same RE approach as the K558 case. +- [ ] **Histogram body sub-format with `byte[5] == 0x00` but undecodable.** Observed 2026-05-28 on BE17353 (S353) events: `S353L4H2.FZ0H`, `S353L4H2.P00H`, `S353L4H3.7O0H`, `S353L4H3.E10H`. Body starts `00 00 00 01 0a 00 XX 00 ...` which LOOKS like a valid histogram block header (marker 0x000a at byte[4:6] ✓, byte[5]=0x00 normal-format ✓), but the walker finds zero data blocks across the whole body. Likely an extra header before the block stream OR a different tail signature than `1e 0a 00 00`. Smaller body lengths (1900-2100 bytes) suggest these may be short-recording histogram variants. Same operational impact as the byte[5]!=0 case: event ingests cleanly, DB peaks correct via bw_report overlay, only the chart is empty. Worth dumping a hex view of one body to diagnose. +- [ ] **Sensor-check waveform extraction from the BW binary.** BW's Event Report PDFs include a narrow panel on the right side of the waveform plot showing each channel's response to the sensor self-check signal (a damped sinusoid for geo, sawtooth-at-test-freq for mic). Our parser captures the test RESULTS (`test_freq_hz`, `test_ratio`, `test_amplitude_mv`, `test_results` pass/fail) and the PDF + modal display them as text — but BW's per-sample sensor-check waveform isn't accessible to us today. 
Two paths to add it: (a) RE the binary to find where the sensor-check samples are stored — could be a section before STRT, after the footer, or in a separate sub-record; protocol reference doesn't currently mention it. (b) If samples aren't in the binary, synthesize a representative waveform from the test parameters (damped sinusoid at `test_freq_hz` with damping from `test_ratio`). Path (a) is the honest answer; path (b) is decorative. Until either lands, the text-only sensor-check display in the report is fine. diff --git a/docs/histogram_codec_re_status.md b/docs/histogram_codec_re_status.md new file mode 100644 index 0000000..6fa388c --- /dev/null +++ b/docs/histogram_codec_re_status.md @@ -0,0 +1,185 @@ +# Histogram body codec — FULLY DECODED (2026-05-20) + +Clean working status doc for the MiniMate Plus histogram-mode event +body codec. Companion to `waveform_codec_re_status.md`. The deep +historical record (with retractions and dated analyses) lives in +`docs/instantel_protocol_reference.md §7.6.2`; the authoritative +implementation lives in `minimateplus/histogram_codec.py`. + +## TL;DR + +**The codec is fully decoded.** Every field of every block in the +in-repo histogram fixture corpus decodes byte-exact against BW's +ASCII export. + +26 regression tests pass against ~3,500 blocks across 5 in-repo +fixtures, plus a synthetic regression block taken from a real +BE9558 prod event to lock in the uint8-peak interpretation. + +**Important correction (2026-05-21):** the per-channel peak count +is `uint8` at byte[6]/[10]/[14]/[18], NOT `uint16 LE` at byte[6:8] +etc. The N844 fixture corpus the original RE was done against has +zero values in bytes [7]/[11]/[15]/[19] for every block, so the +two interpretations happened to be equivalent. 
Cross-correlating +non-N844 events (BE9558 Tran-drift, BE18003 Histogram+Continuous) +against BW's per-interval ASCII export — 4 channels × ~1400 blocks +per event × multiple events = 100% byte-exact only when the peak +is read as uint8. Reading as uint16 LE produced peaks up to 268 +in/s per channel and 35× inflated PVS sums when first deployed to +prod (rolled back, root-caused, and fixed in commit 7183b95+1). + +## Body format + +``` +body = [stream of 32-byte data blocks] + [small trailing remnant] +``` + +Each block represents one histogram interval. Block layout: + +``` +[0] 0x00 always-zero tag +[1] segment_id (uint8) 0x00..0x03 — 256 blocks per segment +[2:4] block_ctr (uint16 LE) resets each segment (0x0100, 0x0101, …) +[4:6] 0x000a (uint16 LE) constant marker (= 10) +[6] T_peak_count uint8 Tran peak (count × 0.005 → in/s at Normal, + max 1.275 in/s — fits in uint8) +[7] T_annotation uint8 empirically non-zero on intervals with sub-Hz + or unmeasurable freq; meaning not fully RE'd +[8:10] T_halfperiod uint16 LE Tran half-period in samples + (freq_Hz = 512 / halfp; ≤ 5 means ">100 Hz") +[10] V_peak_count uint8 Vert peak +[11] V_annotation uint8 +[12:14] V_halfperiod uint16 LE Vert freq half-period +[14] L_peak_count uint8 Long peak +[15] L_annotation uint8 +[16:18] L_halfperiod uint16 LE Long freq half-period +[18] M_peak_count uint8 MicL peak count + (dB via waveform_codec.mic_count_to_db) +[19] M_annotation uint8 +[20:22] M_halfperiod uint16 LE MicL freq half-period +[22:24] 0x00 0x00 constant +[24:28] 4-byte variable purpose unknown — possibly CRC, + timestamp delta, or psi(L) numeric; + not needed for waveform reconstruction +[28:32] 0x1e 0x0a 0x00 0x00 constant block-end signature +``` + +Reliable block-identification anchor: +```python +block[22:24] == b"\x00\x00" and block[28:32] == b"\x1e\x0a\x00\x00" +``` +(The `1e 0a 00 00` constant tail is the most distinctive signature.) 
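As a minimal sketch of how the block layout and identification anchor above could be applied, the following unpacks one 32-byte block with `struct` and walks a body at a 32-byte stride. The names `looks_like_block`, `unpack_block`, and `walk_blocks` are hypothetical and are not the functions in `minimateplus/histogram_codec.py`. Stopping at the first non-matching block is an assumption about how the trailing remnant should be handled.

```python
import struct
from typing import Iterator, Optional

BLOCK_SIZE = 32
BLOCK_TAIL = b"\x1e\x0a\x00\x00"  # constant block-end signature at [28:32]

# (channel name, byte offset of its 4-byte peak/annotation/halfperiod group)
CHANNEL_OFFSETS = (("Tran", 6), ("Vert", 10), ("Long", 14), ("MicL", 18))


def looks_like_block(block: bytes) -> bool:
    """Apply the anchor: zero bytes at [22:24] plus the constant tail."""
    return (
        len(block) == BLOCK_SIZE
        and block[22:24] == b"\x00\x00"
        and block[28:32] == BLOCK_TAIL
    )


def unpack_block(block: bytes) -> Optional[dict]:
    """Split one block into raw fields; returns None if the anchor fails."""
    if not looks_like_block(block):
        return None
    block_ctr, marker = struct.unpack_from("<HH", block, 2)  # bytes [2:4], [4:6]
    channels = {}
    for name, offset in CHANNEL_OFFSETS:
        # uint8 peak, uint8 annotation, uint16 LE half-period
        peak, annotation, halfperiod = struct.unpack_from("<BBH", block, offset)
        channels[name] = {
            "peak_count": peak,
            "annotation": annotation,
            "halfperiod": halfperiod,
        }
    return {
        "segment_id": block[1],
        "block_ctr": block_ctr,
        "marker": marker,
        "channels": channels,
        "unknown_24_28": block[24:28],  # purpose not yet decoded
    }


def walk_blocks(body: bytes) -> Iterator[dict]:
    """Yield decoded blocks at a 32-byte stride, ignoring any short tail."""
    for offset in range(0, len(body) - BLOCK_SIZE + 1, BLOCK_SIZE):
        record = unpack_block(body[offset:offset + BLOCK_SIZE])
        if record is None:
            break  # assumption: first non-block ends the stream
        yield record
```

Reading the peak and annotation as two separate `uint8` fields (`"<BBH"`) keeps the sketch in line with the uint8 correction described in the TL;DR.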
+ +## Per-channel encoding + +| Channel | Peak encoding | Frequency encoding | +|---|---|---| +| Tran | count × 0.005 = in/s at Normal range | `freq_Hz = 512 / halfperiod` | +| Vert | same | same | +| Long | same | same | +| MicL | count → dB via `mic_count_to_db(count)` (same formula as waveform codec) | same | + +**`>100 Hz` sentinel**: when halfperiod ≤ 5 (giving ≥100 Hz from the +512/halfp formula), BW displays `>100 Hz`. Codec's `half_period_to_hz` +returns `None` in this range. + +## Verified facts (cross-checked against fixture corpus) + +Example: N844L6Z8.ZR0H block 130 → all 8 decoded fields byte-exact: + +``` +binary samples [10, 6, 24, 4, 18, 5, 21, 5, 9] +TXT row [0.030, 21, 0.020, 28, 0.025, 24, 0.040, 0.000, 95.92, 57] + +slot[0] = 10 marker +slot[1] = 6 × 0.005 = 0.030 in/s ✓ T_peak +slot[2] = 24 → 512/24 = 21.3 → 21 Hz ✓ T_freq +slot[3] = 4 × 0.005 = 0.020 in/s ✓ V_peak +slot[4] = 18 → 512/18 = 28.4 → 28 Hz ✓ V_freq +slot[5] = 5 × 0.005 = 0.025 in/s ✓ L_peak +slot[6] = 21 → 512/21 = 24.4 → 24 Hz ✓ L_freq +slot[7] = 5 → 81.94 + 20·log10(5) = 95.92 dB ✓ M_peak +slot[8] = 9 → 512/9 = 56.9 → 57 Hz ✓ M_freq +``` + +## Verified test coverage + +`tests/test_histogram_codec.py` (24 tests): + +- Block walking: yields one record per `.TXT` interval ± 1 (off-by-one + at the tail when recording was stopped mid-write). Segment-ID + groups of 256 blocks confirmed. +- Geo peaks: every block of N844L20G, N844L6Z8, N844L6XE, N844L23B + matches `.TXT` within the 0.0005 in/s quantization step. +- Geo freqs: every block of N844L6Z8 and N844L6XE matches `.TXT` + within 1 Hz (BW display rounds). `>100 Hz` sentinel handled correctly. +- Mic dB: every block of N844L6XE, N844L23B, N844L6Z8 matches `.TXT` + within 0.1 dB (BW display precision). +- Mic freq: matches `.TXT` within 1 Hz across active blocks. + +## What's NOT yet decoded + +- **Annotation bytes (`block[7]/[11]/[15]/[19]`)**. 
Empirically + non-zero on intervals where the per-channel ZC frequency comes + out as `N/A` or sub-Hz (`<1.0`, `1.X`). Hypothesis tested in the + RE session: byte != 0 ↔ sub-Hz freq. Only ~50% correlation + across the K558 corpus, so the relationship is more complex. + Possibilities: time-of-peak-within-interval, halfp extension for + very-long-period signals, or a debug/diagnostic field the firmware + writes opportunistically. Doesn't affect peak amplitudes or + waveform reconstruction. Captured as `record["annotations"]` for + future RE. +- **4-byte variable metadata field (bytes 24:28)**. Not needed for + waveform reconstruction. Speculation: per-block CRC, sub-second + timestamp offset, or a Mic psi(L) count not in the 9 samples. + Punt until something needs it. +- **Geo PVS (TXT col 7, e.g. "0.040 in/s")**. Not stored in the + block; can be approximated as `sqrt(T_peak² + V_peak² + L_peak²)` + but BW's value sometimes differs slightly (probably computed from + waveform-instant samples, not from per-channel peaks). Punt — the + `.h5` consumers don't need PVS as a sample channel. +- **Mic psi(L) value (TXT col 8)**. TXT shows it as a small psi value + derived from the dB measurement. Not in the 9 samples. Could be + derived from `M_peak_count` via the inverse of the dB formula plus + a psi calibration constant. Defer. + +## Output shape + +`decode_histogram_body` returns the standard 4-channel dict that +mirrors `waveform_codec.decode_waveform_v2`'s output: + +```python +{ + "Tran": [peak_count_per_interval, ...], # 16-count units (LSB = 0.005 in/s) + "Vert": [..., ...], + "Long": [..., ...], + "MicL": [..., ...], # raw ADC counts +} +``` + +Run through `waveform_codec.decoded_to_adc_counts` to get 1-count ADC +units (geo ×16, mic passthrough) for the standard `.h5` writer. + +For the full per-interval record with frequencies + metadata, use +`decode_histogram_body_full()`. 
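To read the decoded values in physical units, the formulas from the per-channel encoding table can be applied directly. The sketch below is illustrative only: the function names are hypothetical stand-ins (the repo's own helpers are `waveform_codec.mic_count_to_db` and the codec's `half_period_to_hz`), it assumes Normal range for the geophone scale, and it assumes a half-period of 0 should be treated like the `>100 Hz` sentinel.

```python
import math
from typing import Dict, List, Optional

GEO_LSB_IPS = 0.005    # in/s per peak count at Normal range
ZC_NUMERATOR = 512.0   # freq_Hz = 512 / halfperiod
MIC_DB_OFFSET = 81.94  # dB offset from the worked example in this doc


def geo_count_to_ips(count: int) -> float:
    """Geophone peak count -> in/s (Normal range assumed)."""
    return count * GEO_LSB_IPS


def half_period_to_hz_sketch(halfperiod: int) -> Optional[float]:
    """Half-period in samples -> Hz; None for the '>100 Hz' sentinel."""
    if halfperiod <= 5:  # also covers 0 (assumption: unmeasurable)
        return None
    return ZC_NUMERATOR / halfperiod


def mic_count_to_db_sketch(count: int) -> float:
    """Mic peak count -> dB; a zero count maps to 0 dB."""
    if count == 0:
        return 0.0
    return MIC_DB_OFFSET + 20.0 * math.log10(abs(count))


def channels_to_physical(decoded: Dict[str, List[int]]) -> Dict[str, List[float]]:
    """Convert the 4-channel peak-count dict to in/s (geo) and dB (mic)."""
    out = {
        ch: [geo_count_to_ips(c) for c in decoded[ch]]
        for ch in ("Tran", "Vert", "Long")
    }
    out["MicL"] = [mic_count_to_db_sketch(c) for c in decoded["MicL"]]
    return out
```

Fed the block-130 counts from the worked example (peaks 6, 4, 5 and mic count 5), this should reproduce the 0.030 / 0.020 / 0.025 in/s and 95.92 dB values once rounded to BW's display precision.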
+ +## Where it's wired + +- `minimateplus/event_file_io.py:read_blastware_file()` — first tries + the waveform codec, falls back to the histogram codec when the + waveform preamble isn't present. Same output shape, same + downstream pipeline. +- `scripts/backfill_sidecars.py` — the `has_samples` short-circuit + added during the histogram-codec-pending era still serves as a + defensive guard against truly undecodable files, but no longer + fires for valid histograms. + +## Companion reference + +- `docs/waveform_codec_re_status.md` — sibling status doc for the + much-more-complex waveform-mode codec. +- `docs/instantel_protocol_reference.md §7.6.2` — historical + protocol-reference entry. Structural framing matches what we + found; per-sample semantics were less documented than the `✅ + CONFIRMED` badge suggested. This doc supersedes §7.6.2 where they + conflict on confidence level. diff --git a/minimateplus/bw_ascii_report.py b/minimateplus/bw_ascii_report.py index a3aee4b..2f919c4 100644 --- a/minimateplus/bw_ascii_report.py +++ b/minimateplus/bw_ascii_report.py @@ -60,6 +60,18 @@ class ChannelStats: time_of_peak_s: Optional[float] = None # seconds (relative to trigger; can be negative) peak_accel_g: Optional[float] = None # g (geo channels only) peak_disp_in: Optional[float] = None # in (geo channels only) + # When BW writes "OORANGE" (Out Of Range — truncated) for a PPV + # value, the true peak exceeded the channel's full-scale range. + # We substitute the range max (e.g. 10.000 in/s for Normal range) + # as a lower bound, and flag here so downstream UI / alerts know + # to render "> 10 in/s" or "saturated" instead of trusting the + # value as an exact measurement. + ppv_saturated: bool = False + # Set when BW writes ">100 Hz" for ZC Freq — the zero-crossing + # algorithm's peak frequency exceeded the device's reporting + # ceiling (typically 100 Hz on V10.72). zc_freq_hz gets the + # threshold (100.0) as a lower bound; downstream UI renders ">100". 
+ zc_freq_above_range: bool = False @dataclass @@ -69,6 +81,14 @@ class MicStats: pspl_dbl: Optional[float] = None # dB(L) zc_freq_hz: Optional[float] = None time_of_peak_s: Optional[float] = None + # Set when BW writes "OORANGE" for PSPL — mic exceeded its + # measurement range. pspl_dbl gets the conservative upper bound + # 140 dBL (typical NL-43 max; some units cap at 148). Consumers + # should render "> 140 dB(L)" or similar when this flag is set. + pspl_saturated: bool = False + # Same semantics as ChannelStats.zc_freq_above_range — mic ZC + # peak exceeded device reporting ceiling. + zc_freq_above_range: bool = False @dataclass @@ -92,6 +112,35 @@ class MonitorLogEntry: description: Optional[str] = None +# BW saturation marker — appears in PPV / Peak Vector Sum / similar +# numeric fields when the underlying measurement exceeded the +# channel's full-scale range (e.g., a geophone reading > 10 in/s at +# Normal range, or a mic exceeding its sensitivity ceiling). Treated +# as "≥ range_max" + a saturated flag rather than discarded. +# Appears as: ``"Tran PPV : OORANGE in/s"`` +_OORANGE_MARKERS = ("OORANGE", "OUT OF RANGE") + + +def _is_oorange(value: str) -> bool: + """True when a BW numeric field is an Out-Of-Range saturation marker.""" + s = value.strip().upper() + return any(m in s for m in _OORANGE_MARKERS) + + +def _parse_above_range(value: str) -> Optional[float]: + """For BW "above-range" markers like ">100 Hz", return the threshold. + + BW writes ZC Freq as ">100 Hz" when the zero-crossing algorithm sees + a peak too fast to count (device cuts off at 100 Hz). Returns the + numeric portion after the '>' (e.g. 100.0), or None if `value` is + not an above-range marker. 
+ """ + s = value.strip() + if not s.startswith(">"): + return None + return _parse_number(s[1:]) + + @dataclass class BwAsciiReport: """Structured representation of one BW per-event ASCII export.""" @@ -144,6 +193,29 @@ class BwAsciiReport: # ── Vector sum ────────────────────────────────────────────────────────── peak_vector_sum_ips: Optional[float] = None peak_vector_sum_time_s: Optional[float] = None + # Saturation flag — set when BW writes "OORANGE" for the PVS. We + # then substitute sqrt(3) * geo_range_ips as a conservative upper + # bound (the theoretical maximum PVS when all 3 geo channels are + # simultaneously at full-scale). Consumers should display this as + # ">{value} in/s" or similar. + peak_vector_sum_saturated: bool = False + # Histograms additionally have an absolute date+time for the PVS + # (it occurred at a specific interval). Waveform reports show + # only the relative-time value above. + peak_vector_sum_when: Optional[datetime.datetime] = None + + # ── Histogram-specific fields (populated only when Event Type starts + # with 'Histogram' / 'Full Histogram' / 'Histogram + Continuous') ── + histogram_start: Optional[datetime.datetime] = None + histogram_stop: Optional[datetime.datetime] = None + histogram_n_intervals: Optional[int] = None # e.g. 4, 1436 + histogram_interval_size_str: Optional[str] = None # "1 minute" / "5 minutes" / "15 seconds" + histogram_interval_size_s: Optional[float] = None # parsed to seconds + # Per-channel absolute peak time+date (histogram-specific). For + # waveform events these are None — those reports use the channel's + # time_of_peak_s (relative to trigger) instead. Keyed by channel + # name ("Tran", "Vert", "Long", "MicL"). 
+ channel_peak_when: Dict[str, datetime.datetime] = field(default_factory=dict) # ── Sensor self-check (per channel) ───────────────────────────────────── sensor_check: Dict[str, SensorCheck] = field(default_factory=dict) @@ -223,6 +295,46 @@ def _parse_event_date(s: str) -> Optional[datetime.date]: return None +def _parse_iso_date(s: str) -> Optional[datetime.date]: + """Parse "2026-05-16" → date. Histograms use ISO format for their + Start Date / Stop Date / Peak Date fields; waveforms use the + "May 8, 2026" long form which `_parse_event_date` handles.""" + s = s.strip() + try: + return datetime.date.fromisoformat(s) + except ValueError: + return None + + +_INTERVAL_UNIT_SECONDS = { + "second": 1, "seconds": 1, "sec": 1, "secs": 1, + "minute": 60, "minutes": 60, "min": 60, "mins": 60, + "hour": 3600, "hours": 3600, "hr": 3600, "hrs": 3600, +} + + +def _parse_interval_size(s: str) -> Optional[float]: + """Parse "1 minute" / "5 minutes" / "15 seconds" / "2 seconds" → seconds. + + Handles the BW Compliance Setup → Histogram Interval values verbatim + ("2 seconds", "5 seconds", "15 seconds", "1 minute", "5 minutes", + "15 minutes") plus a few defensive variants. + """ + if not s: + return None + parts = s.strip().split() + if len(parts) < 2: + return None + try: + n = float(parts[0]) + except ValueError: + return None + unit_per_s = _INTERVAL_UNIT_SECONDS.get(parts[1].lower()) + if unit_per_s is None: + return None + return n * unit_per_s + + def _parse_event_time(s: str) -> Optional[datetime.time]: """Parse "15:56:35" → time.""" s = s.strip() @@ -336,6 +448,15 @@ def parse_report(text: Union[str, bytes], *, parse_samples: bool = False) -> BwA in_user_notes_block = False user_note_position = 0 + # Histogram-field staging — BW writes Peak Time and + # Peak Date on separate lines (and similarly Histogram + # Start Time / Date). We stash the partial value when the time + # line arrives and combine it when the matching date line arrives. 
+ _hist_start_time: Optional[datetime.time] = None + _hist_stop_time: Optional[datetime.time] = None + _pending_peak_time: Dict[str, Optional[datetime.time]] = {} + _pvs_time_raw: Optional[str] = None # last Peak Vector Sum Time value, raw + while i < n: raw_line = lines[i] i += 1 @@ -420,24 +541,113 @@ def parse_report(text: Union[str, bytes], *, parse_samples: bool = False) -> BwA ): ch_name, stat = key.split(" ", 1) cs = report.channels.setdefault(ch_name, ChannelStats()) - num = _parse_number(value) - if stat == "PPV": cs.ppv_ips = num - elif stat == "ZC Freq": cs.zc_freq_hz = num - elif stat == "Time of Peak": cs.time_of_peak_s = num - elif stat == "Peak Acceleration": cs.peak_accel_g = num - elif stat == "Peak Displacement": cs.peak_disp_in = num + if stat == "PPV": + if _is_oorange(value): + # Channel saturated — substitute range max as lower + # bound; flag so downstream UI can render "> 10 in/s". + cs.ppv_ips = report.geo_range_ips + cs.ppv_saturated = True + else: + cs.ppv_ips = _parse_number(value) + elif stat == "ZC Freq": + # ">100 Hz" → store threshold + flag; numeric → parse normally + threshold = _parse_above_range(value) + if threshold is not None: + cs.zc_freq_hz = threshold + cs.zc_freq_above_range = True + else: + cs.zc_freq_hz = _parse_number(value) + else: + num = _parse_number(value) + if stat == "Time of Peak": cs.time_of_peak_s = num + elif stat == "Peak Acceleration": cs.peak_accel_g = num + elif stat == "Peak Displacement": cs.peak_disp_in = num + + # ── Histogram-specific fields ──────────────────────────────────────── + # Histograms have Start/Stop time+date pairs + an interval count + # and size, plus per-channel absolute Peak Time/Date instead of + # the waveform's relative Time of Peak. 
+ elif key == "Histogram Start Time": + _hist_start_time = _parse_event_time(value) + elif key == "Histogram Start Date": + _d = _parse_iso_date(value) + if _d and _hist_start_time: + report.histogram_start = datetime.datetime.combine(_d, _hist_start_time) + elif key == "Histogram Stop Time": + _hist_stop_time = _parse_event_time(value) + elif key == "Histogram Stop Date": + _d = _parse_iso_date(value) + if _d and _hist_stop_time: + report.histogram_stop = datetime.datetime.combine(_d, _hist_stop_time) + elif key == "Number of Intervals": + try: + report.histogram_n_intervals = int(float(value.strip())) + except ValueError: + pass + elif key == "Interval Size": + report.histogram_interval_size_str = value.strip() + report.histogram_interval_size_s = _parse_interval_size(value) + + # ── Per-channel histogram Peak Date / Peak Time ── + # Lines like "Tran Peak Time : 22:31:38" + "Tran Peak Date : 2026-05-16" + elif key in ("Tran Peak Time", "Vert Peak Time", "Long Peak Time", "MicL Time"): + ch_name = "MicL" if key == "MicL Time" else key.split(" ", 1)[0] + _pending_peak_time[ch_name] = _parse_event_time(value) + elif key in ("Tran Peak Date", "Vert Peak Date", "Long Peak Date", "MicL Date"): + ch_name = "MicL" if key == "MicL Date" else key.split(" ", 1)[0] + _d = _parse_iso_date(value) + _t = _pending_peak_time.get(ch_name) + if _d and _t: + report.channel_peak_when[ch_name] = datetime.datetime.combine(_d, _t) # ── Vector Sum ─────────────────────────────────────────────────────── elif key == "Peak Vector Sum": - report.peak_vector_sum_ips = _parse_number(value) - elif key == "Peak Vector Sum Time": + if _is_oorange(value): + # PVS saturated — conservative upper bound is + # sqrt(3) * geo_range_ips (all 3 channels at full-scale). + # Real PVS could be lower (channels rarely peak + # simultaneously) but never higher within the range. 
+ if report.geo_range_ips is not None: + import math as _math + report.peak_vector_sum_ips = _math.sqrt(3) * report.geo_range_ips + report.peak_vector_sum_saturated = True + else: + report.peak_vector_sum_ips = _parse_number(value) + # BW writes the PVS-time label with a typo: "Peak Vector Sum TimeSum" + # (looks like Sum got appended twice). Accept both forms. Confirmed + # against actual BW output on 2026-05-27 — every PVS-time line in + # the field examples (T190, T438, K557) uses the typo'd label. + elif key in ("Peak Vector Sum Time", "Peak Vector Sum TimeSum"): report.peak_vector_sum_time_s = _parse_number(value) + _pvs_time_raw = value + elif key == "Peak Vector Sum Date": + # Histogram-mode PVS gets paired with a date. We may have + # captured 'Peak Vector Sum Time' as either a relative + # seconds float (waveform) or an HH:MM:SS string we + # interpreted as a number. For histograms, BW writes + # "Peak Vector Sum Time : 22:33:52" which _parse_number + # parses as 22.0 (loses information). When Peak Vector Sum + # Date arrives, re-parse the previous PVS time line as a + # clock time and combine into an absolute datetime. + _d = _parse_iso_date(value) + if _d and _pvs_time_raw is not None: + _t = _parse_event_time(_pvs_time_raw) + if _t: + report.peak_vector_sum_when = datetime.datetime.combine(_d, _t) + # The earlier seconds parse was bogus for histograms; + # clear it so consumers don't think it's a real offset. + report.peak_vector_sum_time_s = None # ── Microphone block ──────────────────────────────────────────────── elif key == "Microphone": report.mic.weighting = value elif key == "MicL PSPL": - report.mic.pspl_dbl = _parse_number(value) + if _is_oorange(value): + # Mic saturated — substitute conservative upper bound 140 dBL. 
+ report.mic.pspl_dbl = 140.0 + report.mic.pspl_saturated = True + else: + report.mic.pspl_dbl = _parse_number(value) # Mirror onto the "MicL" entry in channels so callers querying # `channels["MicL"].ppv_ips` see something — but it's dB(L), not # in/s, so we store as-is in the MicStats and mark the channel. @@ -446,9 +656,15 @@ def parse_report(text: Union[str, bytes], *, parse_samples: bool = False) -> BwA cs = report.channels.setdefault("MicL", ChannelStats()) cs.time_of_peak_s = report.mic.time_of_peak_s elif key == "MicL ZC Freq": - report.mic.zc_freq_hz = _parse_number(value) + threshold = _parse_above_range(value) + if threshold is not None: + report.mic.zc_freq_hz = threshold + report.mic.zc_freq_above_range = True + else: + report.mic.zc_freq_hz = _parse_number(value) cs = report.channels.setdefault("MicL", ChannelStats()) - cs.zc_freq_hz = report.mic.zc_freq_hz + cs.zc_freq_hz = report.mic.zc_freq_hz + cs.zc_freq_above_range = report.mic.zc_freq_above_range # ── Sensor self-check ──────────────────────────────────────────────── elif key in ( diff --git a/minimateplus/event_file_io.py b/minimateplus/event_file_io.py index 9c82718..7dc74c1 100644 --- a/minimateplus/event_file_io.py +++ b/minimateplus/event_file_io.py @@ -27,6 +27,8 @@ from typing import Optional, Union from .models import Event, PeakValues, ProjectInfo, Timestamp from . import blastware_file as _bw # avoid circular reference at module load from .bw_ascii_report import BwAsciiReport +from .waveform_codec import decode_waveform_v2, decoded_to_adc_counts +from .histogram_codec import decode_histogram_body # Reference pressure for dB(L) → psi conversion (20 µPa expressed in psi). # Same constant as sfm/sfm_webapp.html so server-side and browser-side @@ -47,7 +49,7 @@ SIDECAR_KIND = "sfm.event" # bumped without a `pip install` re-run — leading to confusing stale # version stamps in sidecars. Bump this constant and CHANGELOG.md # together at release time. 
-TOOL_VERSION = "0.16.1" +TOOL_VERSION = "0.20.0" try: # Best-effort: prefer the installed metadata when it's NEWER than the @@ -118,7 +120,16 @@ def _bw_report_to_dict(report: BwAsciiReport) -> dict: "peak_disp_in": cs.peak_disp_in, } # Drop all-None entries — keeps the JSON tidy for partial reports. - return {k: v for k, v in out.items() if v is not None} + out = {k: v for k, v in out.items() if v is not None} + # Saturation flag (only present when True) — signals that ppv_ips + # is the channel range max (a lower bound), not an exact reading. + if getattr(cs, "ppv_saturated", False): + out["ppv_saturated"] = True + # ZC Freq above device reporting ceiling (BW ">100 Hz") — value + # in zc_freq_hz is the threshold, not an exact measurement. + if getattr(cs, "zc_freq_above_range", False): + out["zc_freq_above_range"] = True + return out def _sc(ch_name: str) -> dict: sc = report.sensor_check.get(ch_name) @@ -167,15 +178,25 @@ def _bw_report_to_dict(report: BwAsciiReport) -> dict: "vert": _ch("Vert"), "long": _ch("Long"), "vector_sum": { - "ips": report.peak_vector_sum_ips, - "time_s": report.peak_vector_sum_time_s, + "ips": report.peak_vector_sum_ips, + "time_s": report.peak_vector_sum_time_s, + # Histogram events have an absolute date+time for the PVS + # (the interval at which it occurred); waveform events + # only have the time_s offset. + "when": report.peak_vector_sum_when.isoformat() if report.peak_vector_sum_when else None, + # Set when BW reported the PVS as OORANGE — value is the + # conservative upper bound sqrt(3) * geo_range_ips, not + # an exact peak. 
+ "saturated": bool(getattr(report, "peak_vector_sum_saturated", False)), }, }, "mic": { - "weighting": report.mic.weighting, - "pspl_dbl": report.mic.pspl_dbl, - "zc_freq_hz": report.mic.zc_freq_hz, - "time_of_peak_s": report.mic.time_of_peak_s, + "weighting": report.mic.weighting, + "pspl_dbl": report.mic.pspl_dbl, + "pspl_saturated": bool(getattr(report.mic, "pspl_saturated", False)), + "zc_freq_hz": report.mic.zc_freq_hz, + "zc_freq_above_range": bool(getattr(report.mic, "zc_freq_above_range", False)), + "time_of_peak_s": report.mic.time_of_peak_s, }, "sensor_check": { "tran": _sc("Tran"), @@ -183,6 +204,17 @@ def _bw_report_to_dict(report: BwAsciiReport) -> dict: "long": _sc("Long"), "mic": _sc("MicL"), }, + # Histogram-specific fields (None on waveform-mode events). + # Per-channel absolute peak time/date for histograms — for + # waveforms see channels[ch]["time_of_peak_s"] instead. + "histogram": { + "start": report.histogram_start.isoformat() if report.histogram_start else None, + "stop": report.histogram_stop.isoformat() if report.histogram_stop else None, + "n_intervals": report.histogram_n_intervals, + "interval_size": report.histogram_interval_size_str, + "interval_size_s": report.histogram_interval_size_s, + "channel_peak_when": {ch: dt.isoformat() for ch, dt in report.channel_peak_when.items()}, + }, "monitor_log": monitor_log, "pc_sw_version": report.pc_sw_version, } @@ -252,6 +284,60 @@ def apply_report_to_event(event: Event, report: BwAsciiReport) -> None: event.rectime_seconds = report.record_time_s +def apply_bw_report_dict_to_event(event: Event, bw_report: dict) -> None: + """Mirror of ``apply_report_to_event`` for the projected sidecar + dict shape (as produced by ``_bw_report_to_dict``). + + Why this exists + ─────────────── + The ingest path holds a live ``BwAsciiReport`` parsed straight from + the ``_ASCII.TXT`` and uses ``apply_report_to_event`` to overlay + device-authoritative peaks onto the codec output before insert. 
+ + The backfill path doesn't have the original ``.TXT`` (it's not + retained in the waveform store), but it does have the preserved + ``bw_report`` block from the sidecar — which contains the same + projected fields. Re-overlaying those during a backfill keeps the + DB peak columns aligned with what BW reports rather than letting + the codec output (which may be incomplete for unhandled formats or + walker edge cases) win by default. + + No-ops cleanly when ``bw_report`` is ``None``, empty, or missing + any particular sub-field — only fields with a concrete value get + written. Mirrors ``apply_report_to_event``'s "report wins where + present" semantics. + """ + if not bw_report: + return + if event.peak_values is None: + event.peak_values = PeakValues() + pv = event.peak_values + + peaks = bw_report.get("peaks") or {} + tran = (peaks.get("tran") or {}).get("ppv_ips") + vert = (peaks.get("vert") or {}).get("ppv_ips") + long = (peaks.get("long") or {}).get("ppv_ips") + if tran is not None: pv.tran = tran + if vert is not None: pv.vert = vert + if long is not None: pv.long = long + vs_ips = (peaks.get("vector_sum") or {}).get("ips") + if vs_ips is not None: + pv.peak_vector_sum = vs_ips + + mic = bw_report.get("mic") or {} + pspl = mic.get("pspl_dbl") + if pspl is not None and pspl > 0: + pv.micl = _dbl_to_psi(pspl) + + rec = bw_report.get("recording") or {} + sr = rec.get("sample_rate_sps") + if sr: + event.sample_rate = sr + rt = rec.get("record_time_s") + if rt is not None: + event.rectime_seconds = rt + + def _project_info_to_dict(pi: Optional[ProjectInfo]) -> dict: if pi is None: return { @@ -276,6 +362,7 @@ def event_to_sidecar_dict( blastware_filesize: int, blastware_sha256: str, source_kind: str = "sfm-live", + txt_filename: Optional[str] = None, a5_pickle_filename: Optional[str] = None, tool_version: str = _TOOL_VERSION_DEFAULT, captured_at: Optional[datetime.datetime] = None, @@ -392,6 +479,7 @@ def event_to_sidecar_dict( "captured_at": 
captured_at.isoformat() + "Z" if captured_at.tzinfo is None else captured_at.isoformat(), "tool_version": tool_version, "a5_pickle_filename": a5_pickle_filename, + "txt_filename": txt_filename, }, "review": review or { @@ -755,11 +843,40 @@ def read_blastware_file(path: Union[str, Path]) -> Event: ts1 = _bw._decode_ts_be(footer[2:10]) ts2 = _bw._decode_ts_be(footer[10:18]) - # Body: first 6 bytes are the preamble (00 00 ff ff ff ff). Strip - # them before decoding samples. Any trailing tail past the last - # full sample-set is silently truncated by _decode_samples_4ch. - sample_bytes = body[6:] if body[:6].hex() in ("0000ffffffff", "0000FFFFFFFF") else body - samples = _decode_samples_4ch_int16_le(sample_bytes) + # Body: decode via the verified body codecs. Two formats coexist: + # + # 1. Waveform-mode (.AB0W) — starts with 7-byte preamble + # ``00 02 00 [Tran[0] BE] [Tran[1] BE]`` followed by the + # tagged-block delta stream documented in + # ``docs/waveform_codec_re_status.md`` and §7.6.1 of the + # protocol reference. Decoded by ``waveform_codec.decode_waveform_v2``. + # + # 2. Histogram-mode (.AB0H) — a sequence of 32-byte blocks, one + # per histogram interval, each carrying per-channel peak + + # half-period values. Decoded by + # ``histogram_codec.decode_histogram_body``. Both codecs + # return the same channel-grouped output shape, so consumers + # don't need to special-case mode. + # + # The historical ``_decode_samples_4ch_int16_le`` int16-LE + # interpretation was retracted 2026-05-08 (see protocol-ref §7.6.1 + # retraction box) — it produced ±32K noise on every event. + # + # If both codecs fail (malformed file, truncated body, unrecognised + # mode, synthetic test input), fall back to empty channels — the + # rest of the event (timestamp, waveform_key, project strings) is + # still recoverable and useful. 
+ decoded = decode_waveform_v2(body) + if decoded is None: + decoded = decode_histogram_body(body) + if decoded is None: + log.warning( + "%s: body codec failed to decode (body starts %s) — " + "raw_samples will be empty", path, body[:8].hex(" "), + ) + samples = {"Tran": [], "Vert": [], "Long": [], "MicL": []} + else: + samples = decoded_to_adc_counts(decoded) # Metadata strings (label-anchored search across the body). project = _find_first_string(body, b"Project:") @@ -793,7 +910,18 @@ def read_blastware_file(path: Union[str, Path]) -> Event: project=project, client=client, operator=user, sensor_location=seisloc, ) ev.raw_samples = samples - ev.peak_values = _peaks_from_samples(samples) + # Only compute peaks from samples when we actually have samples. + # For events the codec couldn't decode (histogram-mode bodies, until + # the §7.6.2 histogram codec is wired in), samples is an empty dict + # and ``_peaks_from_samples`` would return PeakValues(0, 0, 0, 0, 0). + # That would then OVERWRITE existing good DB peak values (e.g. from + # paired BW ASCII reports) during the backfill UPSERT path. + # Leaving peak_values=None signals "we don't know" to downstream + # consumers; the backfill script seeds from the DB row when it sees + # None, and ``apply_report_to_event`` overlays from a paired ASCII + # report when one is supplied. + has_samples = any(samples.get(ch) for ch in ("Tran", "Vert", "Long", "MicL")) + ev.peak_values = _peaks_from_samples(samples) if has_samples else None ev._a5_frames = None # not recoverable from BW file return ev diff --git a/minimateplus/histogram_codec.py b/minimateplus/histogram_codec.py new file mode 100644 index 0000000..36e399d --- /dev/null +++ b/minimateplus/histogram_codec.py @@ -0,0 +1,283 @@ +""" +histogram_codec.py — decoder for MiniMate Plus histogram-mode event bodies. + +FULLY DECODED 2026-05-20. Every field in every block, verified +byte-exact against BW's ASCII export across multiple histogram +fixtures. 
+ +The histogram-mode body is a stream of 32-byte fixed-length blocks, +one block per histogram interval. Each block carries the per-interval +peak amplitude + zero-crossing frequency for all four channels (Tran, +Vert, Long, MicL). + +──────────────────────────────────────────────────────────────────────────── +Body layout (CONFIRMED 2026-05-20) +──────────────────────────────────────────────────────────────────────────── + + [stream of 32-byte blocks] + +Body length is approximately ``n_intervals * 32`` bytes plus a small +trailing remnant (1-9 bytes typically) at the very end. Walker should +iterate 32-stride and stop before the tail. + +──────────────────────────────────────────────────────────────────────────── +32-byte block layout +──────────────────────────────────────────────────────────────────────────── + + [0] 0x00 always-zero tag + [1] segment_id (uint8) 0x00..0x03 — 256 blocks per segment + [2:4] block_ctr (uint16 LE) resets each segment (0x0100, 0x0101, …) + [4:6] 0x000a (uint16 LE) constant marker (= 10) + [6] T_peak_count uint8 Tran peak (count × 0.005 → in/s, max 1.275 in/s) + [7] T_annotation uint8 empirically non-zero on intervals with sub-Hz + or unmeasurable Tran freq; meaning not fully RE'd + [8:10] T_halfperiod uint16 LE Tran half-period in samples (freq = 512 / halfp Hz) + [10] V_peak_count uint8 + [11] V_annotation uint8 + [12:14] V_halfperiod uint16 LE + [14] L_peak_count uint8 + [15] L_annotation uint8 + [16:18] L_halfperiod uint16 LE + [18] M_peak_count uint8 MicL peak (count → dB via mic_count_to_db) + [19] M_annotation uint8 + [20:22] M_halfperiod uint16 LE MicL half-period in samples (freq = 512 / halfp Hz) + [22:24] 0x00 0x00 constant + [24:28] 4-byte variable purpose unknown (possibly CRC or timestamp delta) + [28:32] 0x1e 0x0a 0x00 0x00 constant block-end signature + +NOTE on peak-count width: an earlier interpretation treated the peak +fields as uint16 LE spanning [6:8] / [10:12] / [14:16] / [18:20]. 
+That happened to be byte-exact against the N844 fixture corpus only +because every annotation byte in those fixtures was zero, making +``uint16 LE == uint8``. Cross-correlating BE9558 (K558) Tran-drift +and BE18003 (T003) Histogram+Continuous events against the BW ASCII +export proved peak is uint8 alone — see test_histogram_codec.py +and docs/histogram_codec_re_status.md. + +Block-identification anchor: ``block[22:24] == b"\\x00\\x00"`` AND +``block[28:32] == b"\\x1e\\x0a\\x00\\x00"``. This is the reliable +distinguisher from non-block content in the file. + +──────────────────────────────────────────────────────────────────────────── +Per-channel encoding +──────────────────────────────────────────────────────────────────────────── + +Geophone channels (Tran, Vert, Long): + - peak_count × 0.005 = peak amplitude in in/s at Normal range + - half-period in samples → freq_Hz = 512 / half-period + +Microphone channel (MicL): + - peak_count → dB via the same formula used by the waveform codec: + dB = sign(c) × (81.94 + 20·log10(|c|)) for |c| ≥ 1 + dB = 0 for c == 0 + - half-period → freq_Hz = 512 / half-period (same as geo) + +Frequency `>100 Hz` sentinel: the device emits half-period ≤ 5 when the +measured zero-crossing rate exceeds the geophone's measurement range +(since 512/5 ≈ 102.4 Hz; the BW display rounds anything > 100 to ">100").
+ +──────────────────────────────────────────────────────────────────────────── +Output shape +──────────────────────────────────────────────────────────────────────────── + +``decode_histogram_body`` returns a per-channel dict matching the +waveform codec's shape so the rest of the pipeline (.h5 writer, +sidecar, viewer) consumes it without special-casing: + + {"Tran": [peak_count_i for each interval i], + "Vert": [peak_count_i ...], + "Long": [peak_count_i ...], + "MicL": [peak_count_i ...]} + +Values are in **16-count units for geo** (LSB = 0.005 in/s, matching +``decode_waveform_v2``) and **1-count units for mic** (matching the +waveform codec's mic convention). Run through +``waveform_codec.decoded_to_adc_counts`` to scale geo to 1-count ADC. + +Per-interval frequencies are NOT returned — they're auxiliary data, +not waveform samples. Consumers needing frequencies can call +``decode_histogram_body_full()`` for the structured per-interval +record list. +""" + +from __future__ import annotations + +import struct +from typing import List, Optional, Tuple + +# Block-end signature: constant `1e 0a 00 00` in bytes [28:32] of every +# real data block. More distinctive than the byte-22 `00 00` (which +# matches many false positives), so we anchor on this. +_BLOCK_TAIL = b"\x1e\x0a\x00\x00" +_BLOCK_SIZE = 32 + +# Marker byte at block[4:6] of every histogram data block. Used as +# additional validation that we're looking at a real block. +_BLOCK_MARKER = 10 + +# Geo peak scaling: stored as "count × 0.005 in/s" where 1 count = one +# 0.005 in/s display quantum. Equivalent to the waveform codec's +# 16-count-unit output (1 unit = 0.005 in/s = 16 ADC counts). +_GEO_LSB_INS = 0.005 + +# Frequency formula: freq_Hz = _FREQ_NUMERATOR / half_period_samples. +# Empirically determined to be 512 (= sample_rate / 2, where sample rate +# is 1024 sps for the standard MiniMate Plus configuration). 
+_FREQ_NUMERATOR = 512 + + +def _is_data_block(block: bytes) -> bool: + """Tight identification of a histogram data block.""" + if len(block) < _BLOCK_SIZE: + return False + if block[28:32] != _BLOCK_TAIL: + return False + if block[22:24] != b"\x00\x00": + return False + if block[0] != 0x00: + return False + marker = block[4] | (block[5] << 8) + if marker != _BLOCK_MARKER: + return False + return True + + +def _decode_block(block: bytes) -> Optional[dict]: + """Decode one 32-byte histogram block. Caller must have validated + with ``_is_data_block`` first. + + Returns a record with per-channel peak counts (uint8) and + half-periods (uint16 LE). + """ + # Peak counts are uint8 at bytes [6] / [10] / [14] / [18]. The + # adjacent bytes [7] / [11] / [15] / [19] hold an annotation field + # whose meaning isn't fully understood (empirically non-zero in + # intervals with sub-Hz or unmeasurable geo frequencies, mostly + # zero otherwise — see test fixtures from BE9558/BE18003 corpora). + # Crucially, those annotation bytes are NOT the high byte of the + # peak count: cross-correlating against BW's per-interval ASCII + # export proves the peak is uint8 alone. + # + # Reading the peak as uint16 LE (the original interpretation) was + # accidentally correct only because every block in the N844 fixture + # corpus had a zero annotation byte; non-N844 events with non-zero + # annotation bytes decoded to physically impossible peaks (e.g. + # 268 in/s per channel) and produced 35× inflated PVS sums when + # first run against prod data. See histogram_codec_re_status.md. 
+ t_peak = block[6] + v_peak = block[10] + l_peak = block[14] + m_peak = block[18] + t_halfp = block[8] | (block[9] << 8) + v_halfp = block[12] | (block[13] << 8) + l_halfp = block[16] | (block[17] << 8) + m_halfp = block[20] | (block[21] << 8) + segment_id = block[1] + block_ctr = block[2] | (block[3] << 8) + var_meta = bytes(block[24:28]) + annotations = (block[7], block[11], block[15], block[19]) + return { + "segment_id": segment_id, + "block_ctr": block_ctr, + "t_peak": t_peak, + "t_halfp": t_halfp, + "v_peak": v_peak, + "v_halfp": v_halfp, + "l_peak": l_peak, + "l_halfp": l_halfp, + "m_peak": m_peak, + "m_halfp": m_halfp, + "meta_var": var_meta, + "annotations": annotations, + } + + +def walk_body(body: bytes) -> List[dict]: + """Walk the body and return one dict per histogram interval. + + Iterates 32-byte strides from offset 0. Yields a decoded record + for every block that passes ``_is_data_block`` validation. Stops + when the remaining bytes are too short to form a complete block. + + In Histogram+Continuous mode the body interleaves data blocks with + other 32-byte content (likely continuous-mode waveform blocks) that + fail the data-block validation; the walker naturally skips them + without losing 32-byte alignment. Use ``block_ctr`` from each + returned record to map back to the original interval index — the + record list is sparse when other block types are interleaved. + """ + records: List[dict] = [] + for off in range(0, len(body) - _BLOCK_SIZE + 1, _BLOCK_SIZE): + blk = body[off:off + _BLOCK_SIZE] + if not _is_data_block(blk): + # Hit non-block content (likely a sync or stream marker). + # Continue walking — block alignment is fixed at 32-stride + # from offset 0, so we don't lose alignment by skipping. + continue + decoded = _decode_block(blk) + if decoded is None: + # Block validated as a histogram block but had peak fields + # outside the plausible range — undocumented extension. + # Skip rather than propagating bogus PVS contributions. 
+ continue + records.append(decoded) + return records + + +def decode_histogram_body(body: bytes) -> Optional[dict]: + """Decode a histogram-mode body into per-channel peak-sample arrays. + + Returns ``{"Tran": [...], "Vert": [...], "Long": [...], "MicL": [...]}`` + where each channel's list contains one peak value per histogram + interval (in the same units the waveform codec uses: 16-count units + for geo, 1-count ADC units for mic). Returns ``None`` if the body + doesn't contain any valid histogram blocks. + + To convert to physical units: + - Geo channels: ``count * 0.005`` = peak in in/s at Normal range + (or run through ``waveform_codec.decoded_to_adc_counts`` first + to get 1-count ADC values, then ``count / 32767 * 10.0`` for in/s) + - Mic channel: use ``waveform_codec.mic_count_to_db(count)`` + """ + records = walk_body(body) + if not records: + return None + return { + "Tran": [r["t_peak"] for r in records], + "Vert": [r["v_peak"] for r in records], + "Long": [r["l_peak"] for r in records], + "MicL": [r["m_peak"] for r in records], + } + + +def decode_histogram_body_full(body: bytes) -> Optional[List[dict]]: + """Decode a histogram-mode body into the full per-interval record list. + + Same data as ``decode_histogram_body`` but in a structured form that + preserves the half-period (frequency) data for each channel + the + per-block segment_id, block_ctr, and 4-byte variable metadata. + Useful for diagnostic tools, sidecar enrichment, and future-codec + work. + + Returns ``None`` if the body has no valid blocks. + """ + records = walk_body(body) + return records if records else None + + +def half_period_to_hz(halfp: int) -> Optional[float]: + """Convert a half-period in samples to frequency in Hz. + + Returns ``None`` for half-period ≤ 5 — the device emits values in + that range when the measured zero-crossing rate exceeds 100 Hz + (the BW display reports `>100 Hz` for such cases). Callers can + treat ``None`` as the `>100 Hz` sentinel. 
+ """ + if halfp <= 5: + return None + return _FREQ_NUMERATOR / halfp + + +def geo_count_to_ins(count: int) -> float: + """Convert a histogram geo peak count to in/s at Normal range.""" + return count * _GEO_LSB_INS diff --git a/pyproject.toml b/pyproject.toml index 7674acc..5151f55 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta" [project] name = "seismo-relay" -version = "0.19.0" +version = "0.20.0" description = "Python client and REST server for MiniMate Plus seismographs" requires-python = ">=3.10" dependencies = [ @@ -15,6 +15,7 @@ dependencies = [ "python-multipart>=0.0.7", "h5py>=3.10", "numpy>=1.24", + "matplotlib>=3.8", ] [tool.setuptools.packages.find] diff --git a/requirements.txt b/requirements.txt index 8b01960..c77bbf7 100644 --- a/requirements.txt +++ b/requirements.txt @@ -5,3 +5,4 @@ pyserial python-multipart h5py numpy +matplotlib diff --git a/scripts/backfill_sidecars.py b/scripts/backfill_sidecars.py index b937e8c..04789a5 100644 --- a/scripts/backfill_sidecars.py +++ b/scripts/backfill_sidecars.py @@ -12,8 +12,20 @@ Walks `//` and for each BW event file: parsing the BW binary directly (peaks computed from samples). Clean waveform (.h5): - - Skip when .h5 already exists (idempotent). - - Else write from .a5.pkl (preferred) or BW binary parse (fallback). + - Regenerated whenever the sidecar is regenerated (sha mismatch + OR sidecar.source.tool_version < current TOOL_VERSION OR --force). + The .h5 and the sidecar both come from the same decoder output, + so if the sidecar is stale the .h5 is too. + - Written when missing. + - --skip-hdf5 turns off all .h5 writes. + +Typical use after a decoder upgrade: + 1. Pull the new seismo-relay code (which bumped TOOL_VERSION). + 2. Run this script — every sidecar with an older tool_version + stamp regenerates, and the associated .h5 cascade-regenerates. + 3. 
Operator review state (review.false_trigger, notes, reviewer) + and the sidecar's extensions block are preserved across the + regen. Usage: python scripts/backfill_sidecars.py [--store-root PATH] @@ -42,14 +54,26 @@ log = logging.getLogger("backfill_sidecars") def _looks_like_event_file(path: Path) -> bool: - """Same heuristic as the importer CLI.""" + """Same heuristic as the importer CLI. + + Filters to BW (Series III) event files only — Thor (Series IV) + `.IDFW` / `.IDFH` files share the store but have their own ingest + path (`WaveformStore.save_imported_idf`) and are NOT decodable by + `event_file_io.read_blastware_file`. Their sidecars are populated + at ingest from the paired `.IDFW.txt` ASCII report; nothing the + backfill regenerates would improve on them, so we exclude them + from scope. + """ if not path.is_file(): return False - if path.name.endswith((".a5.pkl", ".sfm.json")): + if path.name.endswith((".a5.pkl", ".sfm.json", ".h5")): return False ext = path.suffix.lstrip(".") if not (3 <= len(ext) <= 4): return False + # Thor IDF files share the .{W,H}-suffix shape but aren't BW. + if ext.upper() in ("IDFW", "IDFH"): + return False if not (ext[-1].upper() in {"W", "H"} or ext.endswith("0")): return False try: @@ -79,6 +103,17 @@ def main(argv=None) -> int: "STRT-rectime byte-offset fix in v0.15.x)." ), ) + p.add_argument( + "--reparse-txt", action="store_true", + help=( + "Re-parse the preserved /_ASCII.TXT with the " + "current bw_ascii_report parser and overwrite the sidecar's " + "bw_report block. Use this after upgrading the ASCII parser to " + "pull in new fields (e.g. zc_freq_above_range for BW '>100 Hz' " + "ZC peaks). No-op for events without a preserved .TXT; safely " + "idempotent when the parser hasn't changed." + ), + ) p.add_argument("-v", "--verbose", action="store_true") args = p.parse_args(argv) @@ -123,7 +158,13 @@ def main(argv=None) -> int: # the sidecar was written by a build that includes any # decoder fixes shipped since). 
# Either part failing → regenerate. --force bypasses both. - if sidecar_path.exists() and not args.force: + # + # Tracks whether we're regenerating the sidecar this iteration + # so the .h5 logic below knows to refresh that too — staleness + # of the sidecar implies staleness of the derived .h5 (both + # come out of the same decoder). + sidecar_stale = True + if sidecar_path.exists() and not args.force and not args.reparse_txt: try: existing = event_file_io.read_sidecar(sidecar_path) sha_ok = existing.get("blastware", {}).get("sha256") == bw_sha @@ -136,6 +177,7 @@ def main(argv=None) -> int: ver_ok = _vt(src_ver) >= _vt(event_file_io.TOOL_VERSION) if sha_ok and ver_ok: skipped += 1 + sidecar_stale = False continue if sha_ok and not ver_ok: log.info( @@ -256,19 +298,68 @@ def main(argv=None) -> int: or ev.total_samples < derived // 4): ev.total_samples = derived - # Preserve user-edited review state + extensions from the - # existing sidecar (false_trigger flag, notes, etc.) so a - # backfill never wipes them out. - preserved_review = None - preserved_ext = None + # Preserve user-edited review state + extensions + the + # bw_report block from the existing sidecar so a backfill + # never wipes them out. The bw_report block originates + # from the paired .TXT ASCII report parsed at ORIGINAL + # import time (ach forward / direct upload); the .TXT + # file is not in the waveform store, so we can't re-derive + # it from disk. event_to_sidecar_dict takes a + # BwAsciiReport dataclass (not a dict), so for bw_report + # we overlay the existing block after regen instead of + # passing it as a kwarg. 
+ preserved_review = None + preserved_ext = None + preserved_bw_report = None + preserved_txt_fn = None if sidecar_path.exists(): try: _existing = event_file_io.read_sidecar(sidecar_path) - preserved_review = _existing.get("review") - preserved_ext = _existing.get("extensions") + preserved_review = _existing.get("review") + preserved_ext = _existing.get("extensions") + preserved_bw_report = _existing.get("bw_report") + # Preserve txt_filename so backfills don't blank out the + # pointer to the saved raw .TXT (events ingested after + # 2026-05-27 have this). + preserved_txt_fn = (_existing.get("source") or {}).get("txt_filename") except Exception: pass + # --reparse-txt: if a .TXT is preserved on disk, run the + # current parser against it and overwrite the bw_report + # block. Picks up post-ingest parser fixes (e.g. the + # 2026-05-28 zc_freq_above_range / ">100 Hz" addition). + if args.reparse_txt and preserved_txt_fn: + try: + from minimateplus import bw_ascii_report + txt_path = store.txt_path_for(serial, path.name) + if txt_path.exists(): + refreshed = bw_ascii_report.parse_report_file(txt_path) + preserved_bw_report = event_file_io._bw_report_to_dict(refreshed) + log.debug("reparsed bw_report from %s", txt_path.name) + else: + log.debug("--reparse-txt: no .TXT at %s (sidecar says %r)", + txt_path, preserved_txt_fn) + except Exception as exc: + log.warning("--reparse-txt failed for %s: %s", path.name, exc) + + # Overlay BW ASCII report fields onto the rebuilt Event + # BEFORE the sidecar + DB write. Mirrors what the ingest + # path does — BW's reported peaks (and sample_rate / + # record_time) win over codec output where present. + # + # Without this step, --force backfill silently overwrites + # the bw_report-overlaid DB columns with codec-derived + # values, which is wrong for events the codec doesn't + # fully decode (e.g. waveform walker edge cases on + # SP0/SS0/SV0-style events, or histogram sub-formats with + # byte[5]!=0 that aren't yet RE'd). 
Net effect was PVS=0 + # on three top-10 events on 2026-05-22. + if preserved_bw_report: + event_file_io.apply_bw_report_dict_to_event( + ev, preserved_bw_report, + ) + sidecar = event_file_io.event_to_sidecar_dict( ev, serial=serial, @@ -277,16 +368,44 @@ def main(argv=None) -> int: blastware_sha256=bw_sha, source_kind=source_kind, a5_pickle_filename=a5_filename, + txt_filename=preserved_txt_fn, review=preserved_review, extensions=preserved_ext, ) + if preserved_bw_report is not None: + sidecar["bw_report"] = preserved_bw_report - # Also emit the .h5 clean-waveform file when missing OR when - # --force was passed (so a re-backfill picks up decoder fixes). + # Also emit the .h5 clean-waveform file when: + # - it's missing, OR + # - --force was passed, OR + # - the sidecar is being regenerated this iteration + # (sha mismatch / tool_version too old). The .h5 and + # the sidecar are both derived from the same decoder + # output, so if the sidecar is stale, so is the .h5. + # + # Both waveform and histogram bodies now decode to real + # samples via event_file_io.read_blastware_file → either + # waveform_codec.decode_waveform_v2 or histogram_codec. + # decode_histogram_body. If samples are still empty after + # both codecs run, it's a genuine "we can't decode this + # file" case (truncated, malformed, or unknown mode); + # skip the .h5 write so we don't replace whatever's + # there with an empty placeholder. 
+ has_samples = bool( + ev.raw_samples and any( + ev.raw_samples.get(ch) for ch in ("Tran", "Vert", "Long", "MicL") + ) + ) hdf5_path = store.hdf5_path_for(serial, path.name) hdf5_filename = hdf5_path.name if hdf5_path.exists() else None hdf5_action = "kept" - need_h5 = not args.skip_hdf5 and (args.force or not hdf5_path.exists()) + need_h5 = ( + not args.skip_hdf5 + and (args.force or not hdf5_path.exists() or sidecar_stale) + and has_samples + ) + if not has_samples and not args.skip_hdf5: + hdf5_action = "skipped-undecodable" if need_h5: if args.dry_run: hdf5_action = "would (re)write" diff --git a/scripts/check_bw_report_preservation.py b/scripts/check_bw_report_preservation.py new file mode 100644 index 0000000..2402ffe --- /dev/null +++ b/scripts/check_bw_report_preservation.py @@ -0,0 +1,185 @@ +""" +scripts/check_bw_report_preservation.py — verify that running backfill_sidecars +doesn't wipe the `bw_report` block from sidecars that already had one. + +Two-step workflow: + + # Before running backfill — capture a baseline snapshot: + python scripts/check_bw_report_preservation.py snapshot \ + --store-root /path/to/waveforms \ + --out before.json + + # Run backfill: + python scripts/backfill_sidecars.py --store-root /path/to/waveforms --force + + # After backfill — diff against the baseline: + python scripts/check_bw_report_preservation.py diff \ + --store-root /path/to/waveforms \ + --baseline before.json + +The diff classifies every sidecar into one of: + + PRESERVED had bw_report before, has same hash now ← GOOD + CHANGED had bw_report before, has different hash now ← suspicious + (backfill should only ever copy the block verbatim) + WIPED had bw_report before, doesn't now ← BUG — data loss + STILL_MISSING didn't have bw_report before, still doesn't ← expected + NEW didn't have bw_report before, has one now + (only possible if a re-ingest happened between snapshots; + shouldn't happen during backfill) + REMOVED sidecar existed in baseline, file is gone now 
+ ADDED sidecar didn't exist in baseline, exists now + +Exit code is 0 if no WIPED or CHANGED entries are found, 1 otherwise. +""" + +from __future__ import annotations + +import argparse +import hashlib +import json +import sys +from pathlib import Path +from typing import Optional + +# Allow running from the repo root without installation. +sys.path.insert(0, str(Path(__file__).resolve().parent.parent)) + +from minimateplus import event_file_io + + +def _bw_report_hash(sidecar_data: dict) -> Optional[str]: + """Canonical-JSON hash of the bw_report block, or None if absent.""" + br = sidecar_data.get("bw_report") + if not br: + return None + # sort_keys for stable hashing across dict-ordering differences + blob = json.dumps(br, sort_keys=True, separators=(",", ":")) + return hashlib.sha256(blob.encode()).hexdigest() + + +def _scan_store(store_root: Path) -> dict: + """Walk every /.sfm.json and return {relpath: hash_or_None}. + + Relpath is `/` — stable across machines/snapshots. + """ + out: dict[str, Optional[str]] = {} + for serial_dir in sorted(p for p in store_root.iterdir() if p.is_dir()): + for sidecar in sorted(serial_dir.glob("*.sfm.json")): + relpath = f"{serial_dir.name}/{sidecar.name}" + try: + data = event_file_io.read_sidecar(sidecar) + except Exception as exc: + print(f" WARN: failed to read {relpath}: {exc}", file=sys.stderr) + continue + out[relpath] = _bw_report_hash(data) + return out + + +def cmd_snapshot(args) -> int: + store_root = Path(args.store_root).expanduser().resolve() + if not store_root.exists(): + print(f"error: store root does not exist: {store_root}", file=sys.stderr) + return 2 + out_path = Path(args.out).expanduser().resolve() + + print(f"Scanning {store_root} …") + snapshot = _scan_store(store_root) + + with_bw = sum(1 for v in snapshot.values() if v is not None) + without_bw = sum(1 for v in snapshot.values() if v is None) + print(f" total sidecars: {len(snapshot)}") + print(f" with bw_report: {with_bw}") + print(f" without 
bw_report: {without_bw}") + + out_path.parent.mkdir(parents=True, exist_ok=True) + with open(out_path, "w") as f: + json.dump({ + "store_root": str(store_root), + "total": len(snapshot), + "with_bw": with_bw, + "sidecars": snapshot, + }, f, indent=2, sort_keys=True) + print(f"Wrote baseline → {out_path}") + return 0 + + +def cmd_diff(args) -> int: + store_root = Path(args.store_root).expanduser().resolve() + if not store_root.exists(): + print(f"error: store root does not exist: {store_root}", file=sys.stderr) + return 2 + baseline_path = Path(args.baseline).expanduser().resolve() + if not baseline_path.exists(): + print(f"error: baseline file not found: {baseline_path}", file=sys.stderr) + return 2 + + with open(baseline_path) as f: + baseline = json.load(f) + before = baseline["sidecars"] + print(f"Scanning {store_root} for comparison against {baseline_path.name} …") + after = _scan_store(store_root) + + classes = {k: [] for k in ( + "PRESERVED", "CHANGED", "WIPED", "STILL_MISSING", "NEW", "REMOVED", "ADDED", + )} + all_keys = set(before) | set(after) + for key in sorted(all_keys): + b = before.get(key, "__MISSING__") + a = after.get(key, "__MISSING__") + if b == "__MISSING__": + classes["ADDED"].append(key) + elif a == "__MISSING__": + classes["REMOVED"].append(key) + elif b is None and a is None: + classes["STILL_MISSING"].append(key) + elif b is None and a is not None: + classes["NEW"].append(key) + elif b is not None and a is None: + classes["WIPED"].append(key) + elif b == a: + classes["PRESERVED"].append(key) + else: + classes["CHANGED"].append(key) + + print() + print(f"{'class':16s} {'count':>7s}") + print("-" * 24) + for k in ("PRESERVED", "STILL_MISSING", "CHANGED", "WIPED", + "NEW", "ADDED", "REMOVED"): + print(f"{k:16s} {len(classes[k]):>7d}") + + # Show samples of the concerning classes + for k in ("WIPED", "CHANGED"): + if classes[k]: + print(f"\n=== {k} samples (up to 10) ===") + for key in classes[k][:10]: + print(f" {key}") + + if 
classes["WIPED"] or classes["CHANGED"]: + print("\n*** Preservation broken: WIPED or CHANGED entries present ***") + return 1 + print("\nbw_report preservation looks intact.") + return 0 + + +def main(argv=None) -> int: + p = argparse.ArgumentParser(description=__doc__) + sub = p.add_subparsers(dest="cmd", required=True) + + p_snap = sub.add_parser("snapshot", help="capture baseline bw_report hashes") + p_snap.add_argument("--store-root", required=True) + p_snap.add_argument("--out", required=True, help="output JSON path") + p_snap.set_defaults(func=cmd_snapshot) + + p_diff = sub.add_parser("diff", help="diff current store against a baseline") + p_diff.add_argument("--store-root", required=True) + p_diff.add_argument("--baseline", required=True, help="JSON from `snapshot`") + p_diff.set_defaults(func=cmd_diff) + + args = p.parse_args(argv) + return args.func(args) + + +if __name__ == "__main__": + sys.exit(main()) diff --git a/sfm/event_browser.html b/sfm/event_browser.html new file mode 100644 index 0000000..ca19794 --- /dev/null +++ b/sfm/event_browser.html @@ -0,0 +1,909 @@ + + + + + + SFM Event Browser + + + + + +
+ [sfm/event_browser.html markup not recoverable from this extraction; surviving text: page header "SFM Event Browser", "Events" list panel, empty-state message "Select a unit and event to view its waveform.", status line "Ready."]
+ + + + diff --git a/sfm/report_pdf.py b/sfm/report_pdf.py new file mode 100644 index 0000000..6618d9a --- /dev/null +++ b/sfm/report_pdf.py @@ -0,0 +1,900 @@ +""" +sfm/report_pdf.py — generate Instantel-style Event Report PDFs. + +Stub layout for v0.20.0 — the exact visual is iterated against actual +Blastware reference PDFs (uploaded to docs/reference/instantel/). +Current output captures all the data fields a real BW Event Report +contains, but the visual hierarchy / spacing is still approximate. + +Architecture +──────────── +1. ``gather_report_data(event_id)`` — assembles a flat dict from three + sources: the SeismoDb events row, the .sfm.json sidecar (bw_report + block), and the .h5 waveform samples. Returns ``None`` when the + event doesn't exist or has no waveform data on disk. + +2. ``render_event_report_pdf(data)`` — takes that dict and produces a + single-page letter-sized PDF as bytes, using matplotlib's PDF + backend (vector output, no rasterization, prints cleanly). + +3. The HTTP endpoint at ``/db/events/{id}/report.pdf`` wires them + together: fetch event → gather → render → stream bytes back with + ``Content-Type: application/pdf``. + +What's in the report (every field BW's printout includes): + + Header (left): Date/Time, Trigger Source, Range, Sample Rate, Notes, + Project, Client, User Name, Seis. Loc + Header (right): Serial + firmware, Battery, Calibration, File Name, + Post Event Notes + Mic block: PSPL (dBL + psi), ZC Freq, Channel Test result + Stats table: per-channel PPV / ZC Freq / Time of Peak / + Peak Acceleration / Peak Displacement / Sensor Check + Peak Vector Sum + Waveform plot: 4 channels stacked (MicL/Long/Vert/Tran), shared + time axis, trigger marker, peak markers + USBM RI8507/OSMRE compliance chart: STUBBED — separate work item + +Histogram events: the layout differs (Number of Intervals header +field, no trigger marker, per-interval bar chart instead of waveform). 
+Handled via a record_type branch in ``render_event_report_pdf``. +""" + +from __future__ import annotations + +import io +import json +import logging +import math +from dataclasses import dataclass, field +from pathlib import Path +from typing import Optional + +import matplotlib +matplotlib.use("Agg") # headless — no display required +import matplotlib.pyplot as plt +import numpy as np +from matplotlib.backends.backend_pdf import PdfPages + +log = logging.getLogger(__name__) + + +# Reference pressure for dB(L) conversion: 20 µPa expressed in psi. +DBL_REF_PSI = 2.9e-9 + + +# ── Data assembly ──────────────────────────────────────────────────────────── + + +@dataclass +class ReportData: + """All fields needed to render an Instantel-style Event Report. + + Most fields are Optional — BW's printout shows '—' or just omits + sections when source data is missing. The renderer mirrors that. + """ + # Header — left column + event_datetime_str: Optional[str] = None + trigger_source: Optional[str] = None + geo_range_str: Optional[str] = None + sample_rate_str: Optional[str] = None + notes: Optional[str] = None + project: Optional[str] = None + client: Optional[str] = None + operator: Optional[str] = None + sensor_location: Optional[str] = None + + # Header — right column + serial: Optional[str] = None + firmware: Optional[str] = None + battery_volts: Optional[float] = None + calibration_date: Optional[str] = None + calibration_by: Optional[str] = None + file_name: Optional[str] = None + post_event_notes: Optional[str] = None + + # Microphone block + mic_pspl_dbl: Optional[float] = None + mic_pspl_psi: Optional[float] = None + mic_pspl_time_s: Optional[float] = None + mic_pspl_when_str: Optional[str] = None # histogram absolute date+time, BW-formatted + mic_zc_freq_hz: Optional[float] = None + mic_zc_freq_above_range: bool = False + mic_channel_test_result: Optional[str] = None + mic_channel_test_freq_hz: Optional[float] = None + mic_channel_test_amp_mv: Optional[float] = 
None + + # Per-channel stats — list of dicts (one per channel) + # Keys: name, ppv_ips, zc_freq_hz, time_of_peak_s, + # peak_accel_g, peak_disp_in, sensor_check + channel_stats: list[dict] = field(default_factory=list) + + # Peak Vector Sum + peak_vector_sum_ips: Optional[float] = None + peak_vector_sum_time_s: Optional[float] = None + + # Waveform samples — channels[ch] = list of floats in physical units + # Time axis derived from sample_rate + pretrig_samples + channels: dict = field(default_factory=dict) + sample_rate_sps: Optional[int] = None + pretrig_samples: Optional[int] = None + t0_ms: Optional[float] = None + dt_ms: Optional[float] = None + + # Record-type discriminator + record_type: Optional[str] = None + is_histogram: bool = False + + # Histogram-only fields — only populated for record_type starts with 'Hist' + histogram_start_str: Optional[str] = None # "22:30:38 May 16, 2026" + histogram_stop_str: Optional[str] = None + histogram_n_intervals: Optional[float] = None # 4.00 + histogram_interval_size: Optional[str] = None # "1 minute" + histogram_interval_size_s: Optional[float] = None # 60.0 — numeric seconds, used to derive interval_times + histogram_interval_times: list[str] = field(default_factory=list) # per-interval timestamps for x-axis + + # Peak Vector Sum metadata (histograms show absolute date+time) + peak_vector_sum_when_str: Optional[str] = None + + # Bookkeeping + event_id: Optional[str] = None + server_received_at: Optional[str] = None + bw_pc_sw_version: Optional[str] = None + + +def gather_report_data( + db, + store, + event_id: str, +) -> Optional[ReportData]: + """Collect every field needed to render an event report. + + Returns ``None`` if the event is unknown or has no waveform data + on disk (no .h5, no .a5.pkl — same condition the waveform.json + endpoint 404s on). 
+ """ + row = db.get_event(event_id) + if row is None: + return None + serial = row.get("serial") + filename = row.get("blastware_filename") + if not serial or not filename: + return None + + rd = ReportData( + event_id=event_id, + serial=serial, + file_name=filename, + record_type=row.get("record_type"), + is_histogram=str(row.get("record_type", "")).lower().startswith("hist"), + event_datetime_str=row.get("timestamp"), + sample_rate_sps=row.get("sample_rate"), + project=row.get("project"), + client=row.get("client"), + operator=row.get("operator"), + sensor_location=row.get("sensor_location"), + server_received_at=row.get("created_at"), + ) + + # ── Sidecar bw_report — the rich BW-derived fields ── + sidecar_path = store.sidecar_path_for(serial, filename) + if sidecar_path.exists(): + try: + sc = json.loads(sidecar_path.read_text()) + except Exception as exc: + log.warning("gather_report_data: sidecar read failed: %s", exc) + sc = {} + bw = sc.get("bw_report") or {} + + # Trigger / range / sample-rate display + trig = bw.get("trigger") or {} + rd.trigger_source = ( + f"{trig.get('channel','')}: {trig.get('geo_level_ips')} in/s" + if trig.get("channel") or trig.get("geo_level_ips") is not None + else None + ) + rec = bw.get("recording") or {} + rd.geo_range_str = ( + f"Geo: {rec.get('geo_range_ips')} in/s" + if rec.get("geo_range_ips") is not None else None + ) + rt = rec.get("record_time_s") + if rt is not None and rd.sample_rate_sps: + rd.sample_rate_str = f"{rt:.1f} sec At {rd.sample_rate_sps} Sps" + + # Device block + dev = bw.get("device") or {} + rd.battery_volts = dev.get("battery_volts") + rd.calibration_date = dev.get("calibration_date") + rd.calibration_by = dev.get("calibration_by") + rd.firmware = bw.get("version") + rd.bw_pc_sw_version = bw.get("pc_sw_version") + + # Microphone block + mic = bw.get("mic") or {} + rd.mic_pspl_dbl = mic.get("pspl_dbl") + if rd.mic_pspl_dbl is not None and rd.mic_pspl_dbl > 0: + # Inverse of the dBL formula → psi. 
Mirrors waveform_codec convention. + rd.mic_pspl_psi = DBL_REF_PSI * (10 ** (rd.mic_pspl_dbl / 20)) + rd.mic_pspl_time_s = mic.get("time_of_peak_s") + rd.mic_zc_freq_hz = mic.get("zc_freq_hz") + rd.mic_zc_freq_above_range = bool(mic.get("zc_freq_above_range")) + sc_mic = (bw.get("sensor_check") or {}).get("mic") or {} + rd.mic_channel_test_result = sc_mic.get("result") + rd.mic_channel_test_freq_hz = sc_mic.get("freq_hz") + rd.mic_channel_test_amp_mv = sc_mic.get("amplitude_mv") + + # Per-channel stats (Tran / Vert / Long). Per-channel peak + # date+time for histograms comes from bw_report.histogram.channel_peak_when + # (populated when the parser captured it; see the bw_ascii_report + # parser's histogram-fields handler). + peaks = bw.get("peaks") or {} + sc_block = bw.get("sensor_check") or {} + hist_block = bw.get("histogram") or {} + peak_when = hist_block.get("channel_peak_when") or {} + for ch_lc, ch_label in (("tran", "Tran"), ("vert", "Vert"), ("long", "Long")): + ch = peaks.get(ch_lc) or {} + sc_ch = sc_block.get(ch_lc) or {} + ch_when_iso = peak_when.get(ch_label) + peak_date, peak_time = _split_iso_to_date_time(ch_when_iso) + rd.channel_stats.append({ + "name": ch_label, + "ppv_ips": ch.get("ppv_ips"), + "zc_freq_hz": ch.get("zc_freq_hz"), + "zc_freq_above_range": bool(ch.get("zc_freq_above_range")), + "time_of_peak_s": ch.get("time_of_peak_s"), + "peak_accel_g": ch.get("peak_accel_g"), + "peak_disp_in": ch.get("peak_disp_in"), + "sensor_check": sc_ch.get("result"), + "peak_date": peak_date, + "peak_time": peak_time, + }) + + # MicL peak time (used in the mic block — "PSPL ... on DATE at TIME") + mic_when_iso = peak_when.get("MicL") + rd.mic_pspl_when_str = _fmt_iso_to_bw(mic_when_iso) if mic_when_iso else None + + # Peak Vector Sum + vs = peaks.get("vector_sum") or {} + rd.peak_vector_sum_ips = vs.get("ips") + rd.peak_vector_sum_time_s = vs.get("time_s") + # PVS absolute date+time (histograms). Same formatting as Mic. 
+ pvs_when_iso = vs.get("when") + rd.peak_vector_sum_when_str = _fmt_iso_to_bw(pvs_when_iso) if pvs_when_iso else None + + # Histogram-specific header fields — keys match the projection in + # _bw_report_to_dict ("start" / "stop", not "_str" suffixed). + if rd.is_histogram: + rd.histogram_start_str = hist_block.get("start") or rd.event_datetime_str + rd.histogram_stop_str = hist_block.get("stop") + rd.histogram_n_intervals = hist_block.get("n_intervals") + rd.histogram_interval_size = hist_block.get("interval_size") + rd.histogram_interval_size_s = hist_block.get("interval_size_s") + rd.histogram_interval_times = hist_block.get("interval_times") or [] + + # ── Waveform samples — from the .h5 via the existing helper ── + from sfm import event_hdf5 + h5_path = store.hdf5_path_for(serial, filename) + if h5_path.exists(): + try: + wf = event_hdf5.plot_json_from_hdf5(h5_path, event_id=event_id) + rd.channels = { + ch: (chd.get("values") or []) + for ch, chd in (wf.get("channels") or {}).items() + } + ta = wf.get("time_axis") or {} + rd.sample_rate_sps = rd.sample_rate_sps or ta.get("sample_rate") + rd.pretrig_samples = ta.get("pretrig_samples") + rd.t0_ms = ta.get("t0_ms") + rd.dt_ms = ta.get("dt_ms") + except Exception as exc: + log.warning("gather_report_data: hdf5 read failed: %s", exc) + + # ── Histogram aggregation ── + # Codec emits ~N per-block samples (typically 1/sec); BW reports + # one bar per configured interval (1 min / 5 min / etc.). When + # bw_report.histogram.n_intervals is populated (events ingested + # with the parser extension), group max-per-group to match. Also + # derives per-interval timestamps for the x-axis. No-op for + # waveform events or when n_intervals is missing. 
+ if rd.is_histogram and rd.histogram_n_intervals and rd.histogram_n_intervals >= 1: + n = int(rd.histogram_n_intervals) + for ch, vals in list(rd.channels.items()): + if not vals: + continue + per_group = len(vals) // n + remainder = len(vals) % n + agg: list = [] + offset = 0 + for i in range(n): + grp_size = per_group + (1 if i < remainder else 0) + if grp_size > 0: + grp = vals[offset:offset + grp_size] + agg.append(max((abs(v) for v in grp if v is not None), default=0)) + offset += grp_size + else: + agg.append(0) + rd.channels[ch] = agg + # Derive per-interval HH:MM:SS labels if we have the start time + size + if rd.histogram_start_str and rd.histogram_interval_size_s and not rd.histogram_interval_times: + try: + import datetime as _dt + start = _dt.datetime.fromisoformat(rd.histogram_start_str) + rd.histogram_interval_times = [ + (start + _dt.timedelta(seconds=(i + 1) * rd.histogram_interval_size_s)).strftime("%H:%M:%S") + for i in range(n) + ] + except Exception: + pass + + return rd + + +# ── PDF rendering ──────────────────────────────────────────────────────────── + + +def render_event_report_pdf(rd: ReportData) -> bytes: + """Render a ``ReportData`` to a single-page letter PDF. + + Branches on ``rd.is_histogram`` — waveform and histogram layouts + differ in their header fields, stats-table rows, and bottom plot. + Layout modeled on Blastware's Event Report PDFs (samples in + docs/reference/instantel/). + """ + # Letter portrait — 8.5"×11" + fig = plt.figure(figsize=(8.5, 11), dpi=100) + fig.patch.set_facecolor("white") + + if rd.is_histogram: + _render_histogram_layout(fig, rd) + else: + _render_waveform_layout(fig, rd) + + # Page footer (common to both layouts) — Created date + event id. + # Pushed to the very page bottom so it doesn't collide with the + # waveform footer scale / trigger legend lines just above. + # Convert UTC server_received_at to local for display. 
+ created_local = _fmt_iso_to_bw(rd.server_received_at) if rd.server_received_at else "—" + fig.text( + 0.07, 0.005, + f"Created: {created_local} • seismo-relay", + fontsize=6, color="#888", ha="left", + ) + fig.text( + 0.93, 0.005, + f"Event {rd.event_id[:8] if rd.event_id else '—'}", + fontsize=6, color="#888", ha="right", + ) + + buf = io.BytesIO() + fig.savefig(buf, format="pdf") + plt.close(fig) + return buf.getvalue() + + +def _render_waveform_layout(fig, rd: ReportData) -> None: + """Waveform layout: header / mic+USBM / per-channel stats / waveform plot. + + Stats table includes Time (Rel. to Trig), Peak Accel, Peak Disp. + Left margin sized to fit the channel labels (MicL/Long/Vert/Tran). + Extra bottom margin reserves space for x-axis tick labels + + "Amplitude Geo: X in/s/div Mic: Y psi(L)/div" footer + trigger + legend without overlap. + """ + gs = fig.add_gridspec( + nrows=4, ncols=1, + left=0.11, right=0.94, top=0.97, bottom=0.12, + height_ratios=[1.7, 2.0, 1.8, 5.5], + hspace=0.35, + ) + ax_header = fig.add_subplot(gs[0]); ax_header.axis("off") + _draw_header_waveform(ax_header, rd) + + ax_mid = fig.add_subplot(gs[1]); ax_mid.axis("off") + _draw_mic_and_usbm(ax_mid, rd) + + ax_stats = fig.add_subplot(gs[2]); ax_stats.axis("off") + _draw_channel_stats_waveform(ax_stats, rd) + + _draw_waveform_subplot(fig, gs[3], rd) + + +def _render_histogram_layout(fig, rd: ReportData) -> None: + """Histogram layout: header / mic-only / per-channel stats / bar plot. + + No USBM compliance chart (it's a waveform-only concept). Stats table + uses Date + Time-of-peak instead of relative-time + accel + disp. + Left margin sized to fit the channel labels. Extra bottom margin + leaves room for the x-axis time labels + footer scale legend + without overlap. 
+ """ + gs = fig.add_gridspec( + nrows=4, ncols=1, + left=0.11, right=0.94, top=0.97, bottom=0.12, + height_ratios=[1.8, 0.9, 1.7, 5.6], + hspace=0.35, + ) + ax_header = fig.add_subplot(gs[0]); ax_header.axis("off") + _draw_header_histogram(ax_header, rd) + + ax_mic = fig.add_subplot(gs[1]); ax_mic.axis("off") + _draw_mic_only(ax_mic, rd) + + ax_stats = fig.add_subplot(gs[2]); ax_stats.axis("off") + _draw_channel_stats_histogram(ax_stats, rd) + + _draw_histogram_subplot(fig, gs[3], rd) + + +def _to_display_local(iso: str): + """Parse an ISO timestamp and return a datetime in the system's local + timezone (set by the TZ env var, default America/New_York via the + Dockerfile). + + Behaviour: + - "...Z" or "...+HH:MM" suffix → tz-aware UTC → converted to local + - Naïve "YYYY-MM-DDTHH:MM:SS" (no tz) → returned as-is. This + matches the convention used elsewhere in seismo-relay: BW's + recorded-at timestamps are naïve and ALREADY in the unit's + local clock; we don't second-guess them. + """ + import datetime as _dt + dt = _dt.datetime.fromisoformat(iso.replace("Z", "+00:00")) + if dt.tzinfo is not None: + # Convert from UTC (or other tz) → local per the TZ env var. + # astimezone() without arg uses the system timezone. + dt = dt.astimezone() + return dt + + +def _fmt_iso_to_bw(iso: Optional[str]) -> Optional[str]: + """Convert an ISO-8601 timestamp to BW's display format + '22:30:37 May 16, 2026'. UTC inputs (with Z suffix) are + converted to the system's local timezone first; naïve inputs + are formatted as-is. Returns input unchanged on parse failure.""" + if not iso or "T" not in iso: + return iso + try: + return _to_display_local(iso).strftime("%H:%M:%S %B %d, %Y").replace(" 0", " ") + except Exception: + return iso + + +def _split_iso_to_date_time(iso: Optional[str]) -> tuple[Optional[str], Optional[str]]: + """Split an ISO timestamp into BW-formatted ('May 27 /26', '06:06:14') + date+time strings. 
Used for the histogram stats table where the + Date and Time rows are presented separately. UTC inputs are + converted to local time first. Returns (None, None) on parse failure.""" + if not iso: + return (None, None) + try: + dt = _to_display_local(iso) + # BW format: 'May 27 /26' (3-letter month + 2-digit year) + date_str = dt.strftime("%b %d /%y").replace(" 0", " ") + time_str = dt.strftime("%H:%M:%S") + return (date_str, time_str) + except Exception: + return (None, None) + + +def _kv(ax, x, y, label, value, *, label_w=0.18): + """Render a 'Label Value' row at axes-coordinates (x, y).""" + ax.text(x, y, label, fontsize=8, color="#555", ha="left", va="top", + transform=ax.transAxes) + ax.text(x + label_w, y, _fmt(value), fontsize=8, ha="left", va="top", + transform=ax.transAxes, family="monospace") + + +def _fmt(v): + """Format any field for display — '—' for None, str otherwise.""" + if v is None: + return "—" + if isinstance(v, float): + return f"{v:.4f}".rstrip("0").rstrip(".") + return str(v) + + +def _draw_header_waveform(ax, rd: ReportData) -> None: + """Two-column metadata header — waveform variant.""" + rows_left = [ + ("Date/Time", _fmt_iso_to_bw(rd.event_datetime_str)), + ("Trigger Source", rd.trigger_source), + ("Range", rd.geo_range_str), + ("Sample Rate", rd.sample_rate_str), + ("Notes", rd.notes), + ("Project:", rd.project), + ("Client:", rd.client), + ("User Name:", rd.operator), + ("Seis. Loc:", rd.sensor_location), + ] + _draw_header_columns(ax, rows_left, rd) + + +def _draw_header_histogram(ax, rd: ReportData) -> None: + """Two-column metadata header — histogram variant. + + Histograms have Start / Finish / Intervals fields instead of + Trigger Source (there's no trigger event for a histogram capture). 
+ """ + intervals_str = None + if rd.histogram_n_intervals is not None and rd.histogram_interval_size: + intervals_str = f"{rd.histogram_n_intervals} At {rd.histogram_interval_size}" + rows_left = [ + ("Start", _fmt_iso_to_bw(rd.histogram_start_str or rd.event_datetime_str)), + ("Finish", _fmt_iso_to_bw(rd.histogram_stop_str)), + ("Intervals", intervals_str), + ("Range", rd.geo_range_str), + ("Sample Rate", (f"{rd.sample_rate_sps} Sps" if rd.sample_rate_sps else None)), + ("Notes", rd.notes), + ("Project:", rd.project), + ("Client:", rd.client), + ("User Name:", rd.operator), + ("Seis. Loc:", rd.sensor_location), + ] + _draw_header_columns(ax, rows_left, rd) + + +def _draw_header_columns(ax, rows_left, rd: ReportData) -> None: + """Shared 2-column header rendering used by both layouts.""" + rows_right = [ + ("Serial Number", f"{rd.serial or '—'}" + (f" {rd.firmware}" if rd.firmware else "")), + ("Battery Level", f"{rd.battery_volts:.1f} Volts" if rd.battery_volts is not None else None), + ("Unit Calibration", (f"{rd.calibration_date}" + (f" by {rd.calibration_by}" if rd.calibration_by else "")) + if rd.calibration_date else None), + ("File Name", rd.file_name), + ("Post Event Notes", rd.post_event_notes), + ] + y = 0.95 + dy = 0.095 + for label, value in rows_left: + _kv(ax, 0.0, y, label, value, label_w=0.18) + y -= dy + y = 0.95 + for label, value in rows_right: + _kv(ax, 0.55, y, label, value, label_w=0.20) + y -= dy + + +def _draw_mic_only(ax, rd: ReportData) -> None: + """Mic block (histogram variant — no USBM chart).""" + ax.text(0.0, 0.95, "Microphone Linear Weighting", fontsize=8, color="#555", + transform=ax.transAxes, va="top") + rows = _mic_rows(rd) + y = 0.70 + for label, value in rows: + _kv(ax, 0.0, y, label, value, label_w=0.18) + y -= 0.22 + + +def _draw_mic_and_usbm(ax, rd: ReportData) -> None: + """Mic block on the left + USBM compliance chart placeholder on right. 
+ (Waveform variant — USBM is a velocity-vs-frequency compliance plot + that doesn't apply to histograms.)""" + ax.text(0.0, 0.95, "Microphone Linear Weighting", fontsize=8, color="#555", + transform=ax.transAxes, va="top") + rows = _mic_rows(rd) + y = 0.80 + for label, value in rows: + _kv(ax, 0.0, y, label, value, label_w=0.18) + y -= 0.15 + + # USBM chart placeholder — upper-right. Real piecewise compliance + # curves are a separate work item; for now this just shows the title + # + a "see report" message so the layout is correct. + ax.text(0.72, 0.97, "USBM RI8507 And OSMRE", + fontsize=9, weight="bold", color="#333", ha="center", va="top", + transform=ax.transAxes) + ax.text(0.72, 0.50, "[compliance chart\ncoming soon]", + fontsize=8, color="#bbb", ha="center", va="center", + transform=ax.transAxes, style="italic") + + +def _mic_rows(rd: ReportData) -> list[tuple[str, Optional[str]]]: + """Build the mic-section value rows (shared by both layouts). + + For histograms, BW formats the PSPL line as + "125.7 dB(L) on May 27, 2026 at 06:19:14" + (absolute date+time of peak). Waveform events show the relative + "at 0.012 sec." instead. Both formats covered here based on which + field is populated. + """ + rows: list[tuple[str, Optional[str]]] = [] + if rd.mic_pspl_dbl is not None: + line = f"{rd.mic_pspl_dbl:.1f} dB(L)" + if rd.mic_pspl_when_str: + # Histogram-style: "PSPL 125.7 dB(L) on May 27, 2026 at 06:19:14" + # mic_pspl_when_str is already "HH:MM:SS Month DD, YYYY"; + # reformat to "on Month DD, YYYY at HH:MM:SS" for BW match. + parts = rd.mic_pspl_when_str.split(" ", 1) + if len(parts) == 2: + line += f" on {parts[1]} at {parts[0]}" + else: + line += f" on {rd.mic_pspl_when_str}" + elif rd.mic_pspl_time_s is not None: + # Waveform-style: relative-to-trigger seconds. + line += f" at {rd.mic_pspl_time_s:.3f} sec." 
+ rows.append(("PSPL", line)) + if rd.mic_zc_freq_hz is not None: + prefix = ">" if rd.mic_zc_freq_above_range else "" + rows.append(("ZC Freq", f"{prefix}{rd.mic_zc_freq_hz:.0f} Hz")) + if rd.mic_channel_test_result: + line = rd.mic_channel_test_result + if rd.mic_channel_test_freq_hz is not None and rd.mic_channel_test_amp_mv is not None: + line += (f" (Freq = {rd.mic_channel_test_freq_hz:.1f} Hz, " + f"Amp = {rd.mic_channel_test_amp_mv:.0f} mv)") + rows.append(("Channel Test", line)) + return rows + + +def _draw_channel_stats_waveform(ax, rd: ReportData) -> None: + """Waveform stats table — has Time (Rel. to Trig), Peak Accel, Peak Disp. + Followed by Peak Vector Sum line.""" + rows_spec = [ + ("PPV", "ppv_ips", "in/s"), + ("ZC Freq", "zc_freq_hz", "Hz"), + ("Time (Rel. to Trig)", "time_of_peak_s", "sec"), + ("Peak Acceleration", "peak_accel_g", "g"), + ("Peak Displacement", "peak_disp_in", "in"), + ("Sensor Check", "sensor_check", ""), + ] + _draw_stats_table(ax, rd, rows_spec) + if rd.peak_vector_sum_ips is not None: + line = f"Peak Vector Sum {rd.peak_vector_sum_ips:.3f} in/s" + if rd.peak_vector_sum_time_s is not None: + line += f" At {rd.peak_vector_sum_time_s:.3f} sec." + ax.text(0.0, -0.08, line, fontsize=9, weight="bold", + ha="left", va="top", transform=ax.transAxes) + ax.text(0.0, -0.18, "NA: Not Applicable", fontsize=7, color="#888", + ha="left", va="top", transform=ax.transAxes) + + +def _draw_channel_stats_histogram(ax, rd: ReportData) -> None: + """Histogram stats table — PPV, ZC Freq, Date, Time of peak, Sensor Check. + Followed by Peak Vector Sum line.""" + # Date / Time of peak are per-channel timestamps for the interval at peak. + # bw_report stores time_of_peak_s as relative seconds, but for histograms + # BW shows them as absolute date+time. We populate from rd.channel_stats + # if those absolute fields are present; otherwise fall back to relative. 
+ rows_spec = [ + ("PPV", "ppv_ips", "in/s"), + ("ZC Freq", "zc_freq_hz", "Hz"), + ("Date", "peak_date", ""), + ("Time", "peak_time", ""), + ("Sensor Check", "sensor_check", ""), + ] + _draw_stats_table(ax, rd, rows_spec) + if rd.peak_vector_sum_ips is not None: + line = f"Peak Vector Sum {rd.peak_vector_sum_ips:.3f} in/s" + # Histograms: "0.091 in/s on May 27, 2026 At 06:06:14" + # The when_str is "HH:MM:SS Month DD, YYYY" — reformat for BW match. + if rd.peak_vector_sum_when_str: + parts = rd.peak_vector_sum_when_str.split(" ", 1) + if len(parts) == 2: + line += f" on {parts[1]} At {parts[0]}" + else: + line += f" on {rd.peak_vector_sum_when_str}" + ax.text(0.0, -0.08, line, fontsize=9, weight="bold", + ha="left", va="top", transform=ax.transAxes) + ax.text(0.0, -0.18, "NA: Not Applicable", fontsize=7, color="#888", + ha="left", va="top", transform=ax.transAxes) + + +def _draw_stats_table(ax, rd: ReportData, rows_spec: list[tuple[str, str, str]]) -> None: + """Render a per-channel stats table (Tran/Vert/Long). + + rows_spec: list of (label, field_name_in_channel_stats, unit_string) + """ + headers = ["", "Tran", "Vert", "Long", ""] + ch_lookup = {c["name"]: c for c in rd.channel_stats} + + def _cell(field, ch_name): + ch_rec = ch_lookup.get(ch_name, {}) + val = ch_rec.get(field) + if val is None: + return "—" + if isinstance(val, float): + # ZC Freq is integer-formatted in BW; ">100 Hz" sentinel + # rendered as ">N" (val carries the threshold). Everything + # else gets 3 decimals. 
+ if field == "zc_freq_hz": + prefix = ">" if ch_rec.get("zc_freq_above_range") else "" + return f"{prefix}{val:.0f}" + return f"{val:.3f}" + return str(val) + + table_data = [headers] + for label, field_name, unit in rows_spec: + table_data.append([ + label, + _cell(field_name, "Tran"), + _cell(field_name, "Vert"), + _cell(field_name, "Long"), + unit, + ]) + tbl = ax.table( + cellText=table_data, loc="upper left", + colWidths=[0.28, 0.14, 0.14, 0.14, 0.10], + cellLoc="left", edges="open", + ) + tbl.auto_set_font_size(False) + tbl.set_fontsize(8) + tbl.scale(1, 1.4) + for j in range(5): + tbl[(0, j)].set_text_props(weight="bold", color="#555") + + +def _channel_axis_color(ch: str) -> str: + return {"MicL": "#cc00cc", "Long": "#0066ff", "Vert": "#009933", "Tran": "#cc0000"}.get(ch, "#444") + + +def _draw_waveform_subplot(fig, gridspec_cell, rd: ReportData) -> None: + """4-channel stacked waveform plot — Instantel printout order + (MicL on top, Tran on bottom), shared x-axis in SECONDS, trigger + triangle markers at t=0, '0.0' baseline label on right of each.""" + inner = gridspec_cell.subgridspec(4, 1, hspace=0.0) + order = ["MicL", "Long", "Vert", "Tran"] + sr = rd.sample_rate_sps or 1024 + # Convert ms-based time axis to seconds for the x-axis + dt_s = (rd.dt_ms or (1000.0 / sr)) / 1000.0 + t0_s = (rd.t0_ms if rd.t0_ms is not None else 0.0) / 1000.0 + + last_idx = len(order) - 1 + for i, ch in enumerate(order): + ax = fig.add_subplot(inner[i]) + values = rd.channels.get(ch) or [] + times = [t0_s + j * dt_s for j in range(len(values))] + + if values: + color = _channel_axis_color(ch) + ax.plot(times, values, color=color, linewidth=0.5) + # Symmetric y-axis about zero for every channel (geo and mic alike). 
+ if ch != "MicL": + amax = max((abs(v) for v in values), default=0.001) + ax.set_ylim(-amax * 1.10, amax * 1.10) + else: + amax = max((abs(v) for v in values), default=0.001) + ax.set_ylim(-amax * 1.10, amax * 1.10) + + # Channel label on the LEFT (matches BW) + ax.set_ylabel(ch, fontsize=8, rotation=0, ha="right", va="center", + color=_channel_axis_color(ch), weight="bold", labelpad=14) + # "0.0" on the RIGHT (BW convention) + ax.text(1.005, 0.5, "0.0", transform=ax.transAxes, + fontsize=7, color="#555", va="center", ha="left") + + ax.grid(True, linestyle="--", linewidth=0.3, color="#bbb", alpha=0.6) + # Vertical dashed trigger line at t=0 + ax.axvline(0.0, color="#cc0000", linestyle="--", linewidth=0.6, alpha=0.7) + # Zero baseline horizontal + ax.axhline(0.0, color=_channel_axis_color(ch), linestyle="-", + linewidth=0.4, alpha=0.5) + + if i != last_idx: + ax.set_xticklabels([]) + ax.tick_params(axis="x", length=0) + else: + ax.tick_params(axis="x", labelsize=7) + ax.tick_params(axis="y", labelsize=6) + + # Trigger triangle marker ▼ above the top channel at t=0 + top_ax = fig.axes[-4] # MicL is the first added in this gridspec + top_ax.plot([0], [top_ax.get_ylim()[1]], marker="v", color="black", + markersize=8, clip_on=False, zorder=10) + + # Compute scale-per-division for the footer (10 divs across the chart) + # and find peak geo amplitude for the geo amp/div setting. 
+ total_s = times[-1] - times[0] if values else 0 + div_s = total_s / 10 if total_s > 0 else 0 + geo_amp_div = "—" + for ch in ("Tran", "Vert", "Long"): + v = rd.channels.get(ch) or [] + if v: + amax = max(abs(x) for x in v) + geo_amp_div = f"{(amax * 1.1 * 2) / 10:.3f}" + break + fig.text( + 0.11, 0.030, + f"Time(Seconds) {div_s:.2f} sec/div Amplitude Geo: {geo_amp_div} in/s/div Mic: 0.001 psi(L)/div", + fontsize=7, color="#444", ha="left", + ) + fig.text( + 0.11, 0.018, + "Trigger = ▶━━━━━ ━━━━━━◀", + fontsize=7, color="#444", ha="left", + ) + + +def _nice_geo_step(amax: float) -> float: + """Pick a "nice" per-division step for the geo y-axis. + + Geo LSB is 0.005 in/s — sub-LSB steps like 0.003/div are nonsense. + Quantize to the BW-style 1-2-5 sequence (0.005, 0.01, 0.025, 0.05, + …) and return the smallest step where 5 divisions >= amax, so the + top of the chart lands on a tick. + """ + if amax <= 0: + return 0.005 + for step in (0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0): + if step * 5 >= amax: + return step + return 10.0 + + +def _draw_histogram_subplot(fig, gridspec_cell, rd: ReportData) -> None: + """4-channel stacked histogram bar chart — per-interval peaks. + + X-axis labeled with the actual times from rd.histogram_interval_times + when available; otherwise interval index. + + The three geo channels share a single y-axis scale (a BW-style nice + multiple of the 0.005 in/s LSB) so bar heights are directly + comparable across channels. MicL has its own auto-scale. + """ + inner = gridspec_cell.subgridspec(4, 1, hspace=0.0) + order = ["MicL", "Long", "Vert", "Tran"] + last_idx = len(order) - 1 + + # X-axis: use absolute time labels if we have them, else interval index + have_times = bool(rd.histogram_interval_times) + + # Shared geo scale: max across Tran/Vert/Long, quantized to a nice + # tick step. Used for ylim + the footer "Amplitude Geo: X in/s/div". 
+ geo_amax = 0.0 + for gch in ("Tran", "Vert", "Long"): + gv = rd.channels.get(gch) or [] + if gv: + geo_amax = max(geo_amax, max(abs(x) for x in gv if x is not None)) + geo_step = _nice_geo_step(geo_amax) + geo_top = geo_step * 5 # 5 divisions — top tick lands at this value + + for i, ch in enumerate(order): + ax = fig.add_subplot(inner[i]) + values = rd.channels.get(ch) or [] + if values: + # Histograms record per-interval PEAK magnitudes — always + # non-negative. Codec output occasionally includes signed + # values when the underlying .h5 was scaled like a waveform; + # take the absolute value so the bars rise from zero. + abs_vals = [abs(v) if v is not None else 0 for v in values] + xs = np.arange(len(abs_vals)) + color = _channel_axis_color(ch) + ax.bar(xs, abs_vals, color=color, width=0.85, linewidth=0) + if ch in ("Tran", "Vert", "Long"): + ax.set_ylim(0, geo_top) + ax.set_yticks([j * geo_step for j in range(6)]) + else: + amax = max(abs_vals, default=0) + if amax > 0: + ax.set_ylim(0, amax * 1.10) + ax.set_ylabel(ch, fontsize=8, rotation=0, ha="right", va="center", + color=_channel_axis_color(ch), weight="bold", labelpad=14) + ax.text(1.005, 0.02, "0.0", transform=ax.transAxes, + fontsize=7, color="#555", va="bottom", ha="left") + ax.grid(True, axis="y", linestyle="--", linewidth=0.3, color="#bbb", alpha=0.6) + if i != last_idx: + ax.set_xticklabels([]) + ax.tick_params(axis="x", length=0) + else: + if have_times and len(rd.histogram_interval_times) == len(values): + # Show 2-4 labels evenly spaced + n = len(values) + step = max(1, n // 4) + tick_positions = list(range(0, n, step)) + ax.set_xticks(tick_positions) + ax.set_xticklabels([rd.histogram_interval_times[t] for t in tick_positions], + rotation=0, fontsize=6) + else: + ax.set_xlabel("Interval", fontsize=8) + ax.tick_params(axis="x", labelsize=7) + ax.tick_params(axis="y", labelsize=6) + + # Footer scale info — histograms use minute/div. 
Reuses the shared + # geo_step computed above so the label matches the actual y-axis + # tick spacing on every subplot. + interval_str = rd.histogram_interval_size or "—" + geo_amp_div = f"{geo_step:.3f}" + fig.text( + 0.11, 0.030, + f"Time {interval_str} /div Amplitude Geo: {geo_amp_div} in/s/div Mic: 0.001 psi(L)/div", + fontsize=7, color="#444", ha="left", + ) diff --git a/sfm/server.py b/sfm/server.py index 5934cf9..ed42775 100644 --- a/sfm/server.py +++ b/sfm/server.py @@ -46,7 +46,7 @@ from typing import Optional # FastAPI / Pydantic try: - from fastapi import Body, FastAPI, File, HTTPException, Query, UploadFile + from fastapi import Body, FastAPI, File, HTTPException, Query, Response, UploadFile from fastapi.middleware.cors import CORSMiddleware from fastapi.responses import FileResponse, JSONResponse, StreamingResponse from pydantic import BaseModel @@ -381,10 +381,24 @@ def webapp(): @app.get("/waveform", response_class=FileResponse) def waveform_viewer(): - """Serve the standalone waveform viewer.""" + """Serve the standalone LIVE-device waveform viewer. + + Talks to ``/device/*`` endpoints — for plotting events pulled from + a connected unit in real time. For the stored-event browser that + reads from the SeismoDb + WaveformStore, see ``/events``. + """ return str(Path(__file__).parent / "waveform_viewer.html") +@app.get("/events", response_class=FileResponse) +def event_browser(): + """Serve the stored-event browser — pick a serial, list its events, + render any one's waveform from the persisted ``.h5`` via the + ``/db/events/{id}/waveform.json`` endpoint. Standalone HTML + + Chart.js, no auth, no build step.""" + return str(Path(__file__).parent / "event_browser.html") + + @app.get("/device/info") def device_info( port: Optional[str] = Query(None, description="Serial port (e.g. 
COM5, /dev/ttyUSB0)"), @@ -1973,10 +1987,15 @@ def _cleanup_event_files(row: dict) -> dict: base_name = bw_name or a5_name or sc_name if base_name: bw_path, a5_path = store.paths_for(serial, base_name) - sc_path = store.sidecar_path_for(serial, base_name) - h5_path = store.hdf5_path_for(serial, base_name) + sc_path = store.sidecar_path_for(serial, base_name) + h5_path = store.hdf5_path_for(serial, base_name) + # Preserved BW ASCII report (added 2026-05-27 with the .TXT + # preservation feature) — needs to be cleaned up too, otherwise + # deletes leave orphan _ASCII.TXT files behind. + txt_path = store.txt_path_for(serial, base_name) for kind, p in [("blastware", bw_path), ("a5_pickle", a5_path), - ("sidecar", sc_path), ("hdf5", h5_path)]: + ("sidecar", sc_path), ("hdf5", h5_path), + ("txt", txt_path)]: try: if p.exists(): p.unlink() @@ -2164,6 +2183,148 @@ def db_event_blastware_file(event_id: str) -> FileResponse: ) +@app.get("/db/events/{event_id}/ascii_report.txt") +def db_event_ascii_report_txt(event_id: str): + """Serve the raw BW ASCII report (.TXT) for an event, when preserved. + + Returns 404 for events ingested before the .TXT-preservation feature + landed (2026-05-27) — those events have only the parsed ``bw_report`` + block in the sidecar, not the raw .TXT. Re-forwarding from the + watcher PC will populate the .TXT going forward. + """ + row = _get_db().get_event(event_id) + if row is None: + raise HTTPException(status_code=404, detail=f"Event {event_id} not found") + serial = row.get("serial") + filename = row.get("blastware_filename") + if not serial or not filename: + raise HTTPException(status_code=404, detail="Event has no associated BW file") + txt_path = _get_store().open_txt(serial, filename) + if txt_path is None: + raise HTTPException( + status_code=404, + detail=( + f"Raw .TXT not preserved for {filename}. Events ingested " + "before 2026-05-27 don't have it; re-forward from the " + "watcher PC to populate." 
+ ), + ) + return FileResponse( + path=str(txt_path), + media_type="text/plain", + filename=txt_path.name, + ) + + +@app.get("/db/events/{event_id}/report.pdf") +def db_event_report_pdf(event_id: str): + """Render an Instantel-style Event Report as a PDF. + + Single-page letter portrait, matches the BW Event Report's data + coverage and layout (header / mic block / per-channel stats / + waveform plot). V0.20.0 stub — exact visual being iterated + against reference PDFs in ``docs/reference/instantel/``. + + Returns 404 if the event is unknown or has no waveform data on + disk (same condition as /waveform.json). + """ + from sfm import report_pdf + rd = report_pdf.gather_report_data(_get_db(), _get_store(), event_id) + if rd is None: + raise HTTPException(status_code=404, detail=f"Event {event_id} not found or has no waveform") + pdf_bytes = report_pdf.render_event_report_pdf(rd) + # Suggested download filename based on the BW file basename. + fname = (rd.file_name or event_id).replace(".", "_") + return Response( + content=pdf_bytes, + media_type="application/pdf", + headers={"Content-Disposition": f'inline; filename="{fname}_report.pdf"'}, + ) + + +def _maybe_aggregate_histogram(plot: dict, store, serial: str, filename: str, row: dict) -> dict: + """For histogram events, aggregate the codec's per-block samples into + the BW-reported number of intervals. No-op for waveforms or when + we don't have the histogram metadata (interval count + size) in the + sidecar's bw_report block. + + Why: the histogram codec emits one value per internal block (~1 per + second), but BW's printout shows one bar per configured interval + (typically 1-15 minutes). For a 1-minute-interval event the codec + gives ~60 blocks per BW bar. Aggregating max-per-group makes the + SFM chart + PDF visually match BW's display. 
+ """ + record_type = row.get("record_type") or "" + if not record_type.lower().startswith("hist"): + return plot + + # Read interval count + size from the sidecar's bw_report.histogram block + try: + import json as _json + sidecar_path = store.sidecar_path_for(serial, filename) + if not sidecar_path.exists(): + return plot + sc = _json.loads(sidecar_path.read_text()) + hist = (sc.get("bw_report") or {}).get("histogram") or {} + n_intervals = hist.get("n_intervals") + interval_size_s = hist.get("interval_size_s") + start_iso = hist.get("start") + except Exception: + return plot + if not n_intervals or n_intervals < 1: + return plot + + # Aggregate each channel's values into n_intervals groups, max-per-group + channels = plot.get("channels") or {} + aggregated_channels: dict = {} + for ch, chd in channels.items(): + vals = chd.get("values") or [] + if not vals: + aggregated_channels[ch] = chd + continue + # Distribute len(vals) samples across n_intervals groups; uneven + # remainders get distributed across the first few groups. + per_group = len(vals) // n_intervals + remainder = len(vals) % n_intervals + agg: list = [] + offset = 0 + for i in range(n_intervals): + grp_size = per_group + (1 if i < remainder else 0) + if grp_size > 0: + grp = vals[offset:offset + grp_size] + # Max of absolute values (peaks are magnitudes). 
+ agg.append(max((abs(v) for v in grp if v is not None), default=0)) + offset += grp_size + else: + agg.append(0) + aggregated_channels[ch] = {**chd, "values": agg} + + # Build per-interval timestamp labels for the x-axis if we have start time + interval_times: list = [] + if start_iso and interval_size_s: + try: + import datetime as _dt + start = _dt.datetime.fromisoformat(start_iso) + for i in range(int(n_intervals)): + # Show the END of each interval (BW convention — the + # peak reported is for samples taken THROUGH that time) + end = start + _dt.timedelta(seconds=(i + 1) * interval_size_s) + interval_times.append(end.strftime("%H:%M:%S")) + except Exception: + pass + + # Override the time_axis to reflect intervals (not samples). + plot_aggr = {**plot, "channels": aggregated_channels} + plot_aggr["time_axis"] = { + **(plot.get("time_axis") or {}), + "histogram_aggregated": True, + "n_intervals": int(n_intervals), + "interval_size_s": interval_size_s, + "interval_times": interval_times, + } + return plot_aggr + + @app.get("/db/events/{event_id}/waveform.json") def db_event_waveform_json(event_id: str) -> dict: """ @@ -2195,7 +2356,8 @@ def db_event_waveform_json(event_id: str) -> dict: h5_path = store.hdf5_path_for(serial, filename) if h5_path.exists(): try: - return event_hdf5.plot_json_from_hdf5(h5_path, event_id=event_id) + plot = event_hdf5.plot_json_from_hdf5(h5_path, event_id=event_id) + return _maybe_aggregate_histogram(plot, store, serial, filename, row) except Exception as exc: log.warning("HDF5 read failed (%s); falling back to A5 path", exc) diff --git a/sfm/sfm_webapp.html b/sfm/sfm_webapp.html index 576ae94..7f283a4 100644 --- a/sfm/sfm_webapp.html +++ b/sfm/sfm_webapp.html @@ -499,6 +499,20 @@ text-align: left; border-bottom: 1px solid var(--border); white-space: nowrap; + position: sticky; + top: 0; + z-index: 1; + } + table.db-table thead th[data-sort]:hover { + background: var(--border2); + color: var(--text); + } + table.db-table thead th 
.sort-arrow { + display: inline-block; + width: 10px; + color: var(--accent, #58a6ff); + font-weight: 900; + text-align: center; } table.db-table tbody tr { border-bottom: 1px solid var(--border2); } table.db-table tbody tr:last-child { border-bottom: none; } @@ -758,7 +772,9 @@ overflow: hidden; min-height: 0; } - #section-db { display: none; } + /* Default to Database view on page load — most users are here to + browse stored events, not connect to a live unit. */ + #section-live { display: none; } /* ── Live connect bar (host/port/connect, live section only) ── */ #live-connect-bar { @@ -792,8 +808,8 @@
- - + +
+
+ +
+

Waveform

+
Loading…
+
+

Event

Serial
-
Timestamp
+
Recorded at
+
Record type
Sample rate
Waveform key
@@ -2774,7 +3287,8 @@ document.getElementById('api-base').value = window.location.origin;
File size
File sha256
Source kind
-
Captured at
+
Received by server at
+
@@ -2797,6 +3311,10 @@ document.getElementById('api-base').value = window.location.origin;
diff --git a/sfm/waveform_store.py b/sfm/waveform_store.py index 5032dc2..d982dce 100644 --- a/sfm/waveform_store.py +++ b/sfm/waveform_store.py @@ -108,11 +108,30 @@ class WaveformStore: """Return absolute path to the .h5 clean-waveform file for a given event.""" return self._serial_dir(serial) / f"{filename}.h5" + def txt_path_for(self, serial: str, filename: str) -> Path: + """Return absolute path to the preserved BW ASCII report (.TXT) + for a given event. + + We name it ``_ASCII.TXT`` to match BW's own filename + convention in the ACH folder. Saved at ingest time alongside + the binary so the parser bug fixes can be applied retroactively + by re-parsing without needing to re-forward from the watcher PC. + """ + return self._serial_dir(serial) / f"{filename}_ASCII.TXT" + def open_blastware(self, serial: str, filename: str) -> Optional[Path]: """Return absolute path to an existing event file or None.""" bw_path, _ = self.paths_for(serial, filename) return bw_path if bw_path.exists() else None + def open_txt(self, serial: str, filename: str) -> Optional[Path]: + """Return absolute path to the preserved BW ASCII report for an + event, or None if the .TXT wasn't saved at ingest time (events + ingested before .TXT preservation landed will show None until + re-forwarded).""" + p = self.txt_path_for(serial, filename) + return p if p.exists() else None + # ── save / load ───────────────────────────────────────────────────────────── def save( @@ -357,6 +376,28 @@ class WaveformStore: filesize = bw_path.stat().st_size sha256 = event_file_io.file_sha256(bw_path) + # 1b. preserve the raw BW ASCII report (.TXT) alongside the binary. + # Saved at //_ASCII.TXT. Lets us re-parse + # offline after parser fixes without needing to re-forward from + # the watcher PC. Negligible storage cost (~15 KB per event). + # Skipped silently when no report was supplied (live download path, + # manual upload without paired TXT). 
+ txt_filename: Optional[str] = None + if bw_report_text is not None: + try: + txt_path = self.txt_path_for(serial, filename) + if isinstance(bw_report_text, bytes): + txt_path.write_bytes(bw_report_text) + else: + txt_path.write_text(bw_report_text) + txt_filename = txt_path.name + except Exception as exc: + log.warning( + "save_imported_bw: failed to save TXT for %s: %s — " + "continuing without it", + filename, exc, + ) + # 2. write the .h5 clean-waveform file from the parsed Event. # Note: peaks here are computed from raw samples (the BW file # doesn't carry the device-authoritative 0C peaks). Best-effort. @@ -393,6 +434,7 @@ class WaveformStore: blastware_sha256=sha256, source_kind="bw-import", a5_pickle_filename=None, + txt_filename=txt_filename, review=existing_review, bw_report=bw_report, ) diff --git a/tests/test_bw_ascii_report.py b/tests/test_bw_ascii_report.py index 024a9a4..5756fb2 100644 --- a/tests/test_bw_ascii_report.py +++ b/tests/test_bw_ascii_report.py @@ -385,6 +385,98 @@ def test_user_notes_extra_lines_beyond_four_are_dropped(): assert "L5" not in r.user_note_labels.values() +def test_oorange_marker_treated_as_saturation(): + """BW writes 'OORANGE' (Out Of Range — truncated) when a channel + exceeds its full-scale. Verify ppv_ips falls back to geo_range_ips + + saturated flag is set, mirroring the real T190LD5Q.LK0W, + T438L713.RY0W, and K557L3YM.OE0W events from prod 2026-05-27. 
+ """ + txt = """\ +"Event Type : Full Waveform" +"Serial Number : BE18190" +"Geo Range : 10.000 in/s" +"Tran PPV : 2.140 in/s" +"Vert PPV : OORANGE in/s" +"Long PPV : 2.830 in/s" +"Peak Vector Sum : OORANGE in/s" +"Peak Vector Sum TimeSum : 0.007 s" +"MicL PSPL : OORANGE " +""" + r = parse_report(txt) + # Tran/Long parse normally + assert r.channels["Tran"].ppv_ips == 2.14 + assert r.channels["Tran"].ppv_saturated is False + assert r.channels["Long"].ppv_ips == 2.83 + # Vert saturated → range max + flag + assert r.channels["Vert"].ppv_ips == 10.0 + assert r.channels["Vert"].ppv_saturated is True + # PVS saturated → sqrt(3) * range_max as upper bound + flag + import math + assert r.peak_vector_sum_ips == pytest.approx(math.sqrt(3) * 10.0) + assert r.peak_vector_sum_saturated is True + # Mic saturated → 140 dBL conservative upper bound + flag + assert r.mic.pspl_dbl == 140.0 + assert r.mic.pspl_saturated is True + # PVS time still parses despite the BW typo'd label "TimeSum" + assert r.peak_vector_sum_time_s == pytest.approx(0.007) + + +def test_real_oorange_event_t190_parses(): + """End-to-end against the real T190LD5Q.LK0W ASCII file pulled from + a Windows watcher PC on 2026-05-27. 
This is the canonical example + of the parser-PPV-miss bug we fixed in this iteration.""" + fixture_path = ( + Path(__file__).parent.parent / "example-events" / + "ascii-5-27-26" / "T190LD5Q_LK0W_ASCII.TXT" + ) + if not fixture_path.exists(): + pytest.skip("real ASCII fixture not present (local-only)") + r = parse_report_file(fixture_path) + assert r.serial == "BE18190" + assert r.geo_range_ips == 10.0 + # Tran reads cleanly, Vert was OORANGE + assert r.channels["Tran"].ppv_ips == pytest.approx(2.14) + assert r.channels["Vert"].ppv_ips == 10.0 + assert r.channels["Vert"].ppv_saturated is True + assert r.channels["Long"].ppv_ips == pytest.approx(2.83) + assert r.peak_vector_sum_saturated is True + assert r.peak_vector_sum_time_s == pytest.approx(0.007) + # Same fixture: Tran ZC Freq is ">100 Hz" — must parse as 100 + + # above_range flag, not None (which would render as "—" on the PDF). + assert r.channels["Tran"].zc_freq_hz == 100.0 + assert r.channels["Tran"].zc_freq_above_range is True + # Vert/Long are normal numeric values; flag stays False. + assert r.channels["Vert"].zc_freq_above_range is False + assert r.channels["Long"].zc_freq_above_range is False + + +def test_above_range_marker_treated_as_zc_threshold(): + """BW writes '>100 Hz' for ZC Freq when the zero-crossing algorithm + sees a peak too fast to count (cuts off at the device's 100 Hz + reporting ceiling). Parser must store the threshold + flag, not + fall back to None. 
+ """ + txt = """\ +"Event Type : Full Waveform" +"Serial Number : BE18190" +"Tran ZC Freq : >100 Hz" +"Vert ZC Freq : 73 Hz" +"Long ZC Freq : N/A Hz" +"MicL ZC Freq : >100 Hz" +""" + r = parse_report(txt) + assert r.channels["Tran"].zc_freq_hz == 100.0 + assert r.channels["Tran"].zc_freq_above_range is True + assert r.channels["Vert"].zc_freq_hz == 73.0 + assert r.channels["Vert"].zc_freq_above_range is False + # N/A → None, flag stays False + assert r.channels["Long"].zc_freq_hz is None + assert r.channels["Long"].zc_freq_above_range is False + # Mic above-range + assert r.mic.zc_freq_hz == 100.0 + assert r.mic.zc_freq_above_range is True + + def test_real_histogram_fixture_populates_sensor_location(): """End-to-end: the histogram fixture uses 'Seis. Location:' — must successfully populate sensor_location via position-based parsing.""" diff --git a/tests/test_event_file_io.py b/tests/test_event_file_io.py index a1990f0..0e043e8 100644 --- a/tests/test_event_file_io.py +++ b/tests/test_event_file_io.py @@ -289,9 +289,106 @@ def test_read_blastware_file_round_trip(tmp_path: Path): assert parsed.timestamp.second == ev.timestamp.second # No A5 source recoverable. assert parsed._a5_frames is None - # Peaks computed from samples (synthetic = zero samples → zero peaks). - assert parsed.peak_values is not None - assert parsed.peak_values.peak_vector_sum == 0.0 + # The synthetic event has no real waveform body, so the codec can't + # decode samples → read_blastware_file leaves peak_values=None + # (the "we don't know" signal) rather than fabricating all-zero + # peaks that would otherwise overwrite real DB values via UPSERT. + assert parsed.peak_values is None + assert parsed.raw_samples is not None + # Empty channels — codec returned None for the malformed synthetic body. 
+ for ch in ("Tran", "Vert", "Long", "MicL"): + assert parsed.raw_samples[ch] == [] + + +_BW_CODEC_FIXTURES = [ + # (path, expected_n_samples_per_channel, BW-reported Vert PPV in/s for sanity) + ("tests/fixtures/decode-re-5-8-26/event-a/M529LKVQ.6S0", 3328, 0.780), + ("tests/fixtures/decode-re-5-8-26/event-b/M529LK5Q.RG0", 2304, 0.505), + ("tests/fixtures/decode-re-5-8-26/event-c/M529LK44.AB0", 1280, 0.610), + ("tests/fixtures/decode-re-5-8-26/event-d/M529LK2V.470", 1280, 0.565), + ("tests/fixtures/5-11-26/M529LL1L.V70", 3328, 0.010), + ("tests/fixtures/5-11-26/M529LL1L.JQ0", 3328, 3.465), +] + + +@pytest.mark.parametrize("path,expected_n,expected_ppv", _BW_CODEC_FIXTURES) +def test_read_blastware_file_decodes_via_codec(path: str, expected_n: int, expected_ppv: float): + """Regression lock: ``read_blastware_file()`` must use the verified + waveform-body codec (``minimateplus.waveform_codec``), not the + retracted int16-LE assumption. + + Verifies against the real BW fixture corpus: every event in the + bundled fixtures must produce the expected per-channel sample count + and a Vert PPV close to BW's own reported value. Catches any + accidental regression of the body decoder back to the old + ``_decode_samples_4ch_int16_le`` path (which produced ±32K noise + on every event, giving wildly wrong PPVs). + """ + repo_root = Path(__file__).resolve().parent.parent + full_path = repo_root / path + if not full_path.exists(): + pytest.skip(f"fixture missing: {full_path}") + + ev = event_file_io.read_blastware_file(full_path) + assert ev.raw_samples is not None + for ch in ("Tran", "Vert", "Long"): + assert len(ev.raw_samples[ch]) == expected_n, ( + f"{ch}: expected {expected_n} samples, got {len(ev.raw_samples[ch])}" + ) + + # PPV check: the codec produces decoded samples in 1-count ADC units; + # _peaks_from_samples scales by GEO_NORMAL_FS_INS / 32767. 
BW's own + # PPV is computed at slightly different precision/interpolation, so + # we allow a 0.2 in/s tolerance — well under the broken-decoder + # signature (which would produce ~10 in/s saturation). + assert ev.peak_values is not None + assert abs(ev.peak_values.vert - expected_ppv) < 0.2, ( + f"Vert PPV {ev.peak_values.vert:.3f} differs from BW's " + f"{expected_ppv:.3f} by >0.2 in/s — codec regression?" + ) + + +def test_read_blastware_file_v70_samples_match_txt_truth(): + """Strongest regression lock: every one of V70's 3328 decoded + sample-sets must match the .TXT ground truth table within the + 0.005 in/s display quantum.""" + repo_root = Path(__file__).resolve().parent.parent + bw_path = repo_root / "tests/fixtures/5-11-26/M529LL1L.V70" + txt_path = repo_root / "tests/fixtures/5-11-26/M529LL1L.V70.TXT" + if not bw_path.exists() or not txt_path.exists(): + pytest.skip(f"V70 fixture missing") + + import re + ev = event_file_io.read_blastware_file(bw_path) + + # Parse .TXT ground truth sample table + text = txt_path.read_text() + lines = text.splitlines() + hdr_idx = next(i for i, line in enumerate(lines) + if re.match(r"^Tran\s+Vert\s+Long\s+MicL?", line.strip())) + truth = [] + for line in lines[hdr_idx + 1:]: + parts = line.strip().split() + if len(parts) != 4: + continue + try: + truth.append([float(x) for x in parts]) + except ValueError: + continue + assert len(truth) == 3328, f"expected 3328 truth rows, got {len(truth)}" + + def adc_to_ins(count): + return count / 32767.0 * 10.0 + + for i, truth_row in enumerate(truth): + for ch_idx, ch_name in enumerate(("Tran", "Vert", "Long")): + decoded_ips = adc_to_ins(ev.raw_samples[ch_name][i]) + truth_ips = truth_row[ch_idx] + # 0.003 in/s tolerance: <0.005 quantum + small float precision room + assert abs(decoded_ips - truth_ips) < 0.003, ( + f"row {i} {ch_name}: decoded {decoded_ips:+.4f} vs " + f"truth {truth_ips:+.4f} (delta {decoded_ips - truth_ips:+.4f})" + ) def 
test_save_imported_bw_with_paired_report(tmp_path: Path): @@ -432,6 +529,77 @@ def test_save_imported_bw_round_trip(tmp_path: Path): assert stored_path.read_bytes() == src.read_bytes() +# ── apply_bw_report_dict_to_event ──────────────────────────────────────────── + + +def test_apply_bw_report_dict_overlays_peaks_and_recording(): + """Verbatim mirror of the data shape produced by `_bw_report_to_dict` + when projecting a parsed `BwAsciiReport` into the sidecar. Confirms + each field overlays onto Event correctly so the backfill path + matches ingest behavior.""" + from minimateplus.models import PeakValues + ev = Event(index=0) + bw_report = { + "peaks": { + "tran": {"ppv_ips": 9.84375}, + "vert": {"ppv_ips": 0.305}, + "long": {"ppv_ips": 0.405}, + "vector_sum": {"ips": 14.86736}, + }, + "mic": {"pspl_dbl": 115.9}, + "recording": {"sample_rate_sps": 1024, "record_time_s": 3.0}, + } + event_file_io.apply_bw_report_dict_to_event(ev, bw_report) + assert ev.peak_values is not None + assert ev.peak_values.tran == 9.84375 + assert ev.peak_values.vert == 0.305 + assert ev.peak_values.long == 0.405 + assert ev.peak_values.peak_vector_sum == 14.86736 + # MicL is converted dB → psi via _dbl_to_psi — just confirm non-zero + assert ev.peak_values.micl is not None and ev.peak_values.micl > 0 + assert ev.sample_rate == 1024 + assert ev.rectime_seconds == 3.0 + + +def test_apply_bw_report_dict_overwrites_codec_peaks(): + """The whole point of this helper: bw_report wins over whatever the + codec produced. This is what the 2026-05-22 prod backfill missed — + DB peaks got overwritten with codec output (incl. 
PVS=0 on the + three top events) when they should have stayed bw_report-overlaid.""" + from minimateplus.models import PeakValues + ev = Event(index=0) + # Simulate codec output that's clearly wrong (incomplete decode): + ev.peak_values = PeakValues( + tran=2.09, vert=0.0, long=0.0, peak_vector_sum=0.0, + ) + bw_report = { + "peaks": { + "tran": {"ppv_ips": 9.84}, + "vert": {"ppv_ips": 4.95}, + "long": {"ppv_ips": 8.05}, + "vector_sum": {"ips": 14.95}, + }, + } + event_file_io.apply_bw_report_dict_to_event(ev, bw_report) + assert ev.peak_values.tran == 9.84 + assert ev.peak_values.vert == 4.95 + assert ev.peak_values.long == 8.05 + assert ev.peak_values.peak_vector_sum == 14.95 + + +def test_apply_bw_report_dict_no_op_on_empty(): + """None / empty dict / missing keys should leave Event untouched.""" + from minimateplus.models import PeakValues + for empty in (None, {}, {"peaks": {}}, {"peaks": {"tran": {}}}): + ev = Event(index=0) + ev.peak_values = PeakValues(tran=1.0, vert=2.0, long=3.0) + event_file_io.apply_bw_report_dict_to_event(ev, empty) + # Unchanged + assert ev.peak_values.tran == 1.0 + assert ev.peak_values.vert == 2.0 + assert ev.peak_values.long == 3.0 + + if __name__ == "__main__": if pytest is not None: pytest.main([__file__, "-v"]) diff --git a/tests/test_histogram_codec.py b/tests/test_histogram_codec.py new file mode 100644 index 0000000..6a42e27 --- /dev/null +++ b/tests/test_histogram_codec.py @@ -0,0 +1,385 @@ +""" +test_histogram_codec.py — regression locks for the histogram body codec. + +The codec is verified byte-exact against BW's ASCII export across the +in-repo histogram fixture bundle. Each test cross-checks decoded +binary fields against the corresponding .TXT row. 
+ +Run: + python -m pytest tests/test_histogram_codec.py -q +""" + +from __future__ import annotations + +import os +import re +import sys +from pathlib import Path + +import pytest + +sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__)))) + +from minimateplus.blastware_file import _WAVEFORM_HEADER_SIZE +from minimateplus.histogram_codec import ( + _BLOCK_SIZE, + decode_histogram_body, + decode_histogram_body_full, + geo_count_to_ins, + half_period_to_hz, + walk_body, +) +from minimateplus.waveform_codec import mic_count_to_db + + +_FIXTURE_DIR = Path(__file__).resolve().parent.parent / "example-events" / "histogram" + + +def _extract_body(path: Path) -> bytes: + """Locate the body of a BW event file — bytes between the STRT + record and the 26-byte footer.""" + raw = path.read_bytes() + body_start = _WAVEFORM_HEADER_SIZE + 21 + pos = body_start + footer_pos = -1 + while True: + pos = raw.find(b"\x0e\x08", pos) + if pos < 0 or pos + 26 > len(raw): + break + yr = (raw[pos + 4] << 8) | raw[pos + 5] + if 2015 <= yr <= 2050: + footer_pos = pos + break + pos += 1 + if footer_pos < 0: + footer_pos = len(raw) - 26 + return raw[body_start:footer_pos] + + +def _parse_txt_rows(path: Path) -> list[tuple[str, list]]: + """Parse a histogram .TXT into ``[(time_str, [10 col values]), …]``. 
+ + Special tokens: + - ``">100"`` (the BW-display sentinel for freq > 100 Hz) → ``None`` + - non-numeric → ``None`` + """ + text = path.read_text() + lines = text.splitlines() + hdr = None + for i, line in enumerate(lines): + if re.match(r"^Tran\s+", line.strip()): + hdr = i + 3 # skip 2-row header + units row + break + if hdr is None: + return [] + rows: list[tuple[str, list]] = [] + for line in lines[hdr:]: + parts = line.split("\t") + if len(parts) != 11: + continue + vals: list = [] + for p in parts[1:]: + s = p.strip() + if s.startswith(">"): + vals.append(None) # ">100 Hz" sentinel + continue + try: + vals.append(float(s)) + except ValueError: + vals.append(None) + rows.append((parts[0].strip(), vals)) + return rows + + +# ── Block-walker plumbing ──────────────────────────────────────────────────── + + +@pytest.mark.parametrize("fixture", [ + "N844L20G.630H", + "N844L21H.2R0H", + "N844L6Z8.ZR0H", + "N844L6XE.BH0H", + "N844L23B.ND0H", +]) +def test_walk_body_returns_records(fixture: str): + """Walker yields at least one valid block per fixture.""" + path = _FIXTURE_DIR / fixture + if not path.exists(): + pytest.skip(f"fixture missing: {path}") + records = walk_body(_extract_body(path)) + assert len(records) > 100, f"expected hundreds of blocks, got {len(records)}" + + +def test_walk_body_record_count_matches_txt_intervals(): + """Block count should match the .TXT interval count (off-by-one + at the tail is acceptable — last interval may be truncated at + recording stop).""" + bin_path = _FIXTURE_DIR / "N844L20G.630H" + txt_path = _FIXTURE_DIR / "N844L20G_630H_ASCII.TXT" + if not bin_path.exists() or not txt_path.exists(): + pytest.skip("fixture missing") + records = walk_body(_extract_body(bin_path)) + txt_rows = _parse_txt_rows(txt_path) + # Allow off-by-one (final block may have been mid-write at stop) + assert abs(len(records) - len(txt_rows)) <= 1, ( + f"binary {len(records)} blocks vs TXT {len(txt_rows)} intervals" + ) + + +def 
test_walk_body_segment_id_increments_every_256_blocks(): + """Segment ID advances 0→1→2→… after every 256 blocks within + one event.""" + path = _FIXTURE_DIR / "N844L20G.630H" + if not path.exists(): + pytest.skip("fixture missing") + records = walk_body(_extract_body(path)) + # Group by segment_id and verify counts make sense + from collections import Counter + seg_counts = Counter(r["segment_id"] for r in records) + # First 3 segments should each have exactly 256 blocks (N844L20G has + # 791 blocks → 256+256+256+23 → segments 0/1/2/3) + assert seg_counts[0] == 256 + assert seg_counts[1] == 256 + assert seg_counts[2] == 256 + assert seg_counts[3] == len(records) - 3 * 256 + + +# ── Field-by-field decode verification against .TXT ground truth ───────────── + + +@pytest.mark.parametrize("fixture", [ + "N844L20G.630H", + "N844L6Z8.ZR0H", + "N844L6XE.BH0H", + "N844L23B.ND0H", +]) +def test_decoded_geo_peaks_match_txt(fixture: str): + """For every block, decoded Tran/Vert/Long peak (count × 0.005) + matches the corresponding .TXT cell.""" + bin_path = _FIXTURE_DIR / fixture + txt_path = _FIXTURE_DIR / (fixture.replace(".", "_") + "_ASCII.TXT") + if not bin_path.exists() or not txt_path.exists(): + pytest.skip("fixture missing") + records = walk_body(_extract_body(bin_path)) + txt_rows = _parse_txt_rows(txt_path) + n = min(len(records), len(txt_rows)) + assert n > 0 + for i in range(n): + rec = records[i] + _ts, txt = txt_rows[i] + # TXT cols 0/2/4 are T/V/L peak in in/s + for slot, key in (("T", "t_peak"), ("V", "v_peak"), ("L", "l_peak")): + col = {"T": 0, "V": 2, "L": 4}[slot] + decoded_ips = geo_count_to_ins(rec[key]) + expected = txt[col] + assert abs(decoded_ips - expected) < 0.0005, ( + f"{fixture} block {i} {slot}_peak: " + f"decoded={decoded_ips:.4f} vs txt={expected:.4f}" + ) + + +@pytest.mark.parametrize("fixture", [ + "N844L6Z8.ZR0H", + "N844L6XE.BH0H", +]) +def test_decoded_geo_freqs_match_txt(fixture: str): + """Decoded half-period → Hz matches the .TXT 
freq column for blocks + where the freq is in-range (not the `>100 Hz` sentinel).""" + bin_path = _FIXTURE_DIR / fixture + txt_path = _FIXTURE_DIR / (fixture.replace(".", "_") + "_ASCII.TXT") + if not bin_path.exists() or not txt_path.exists(): + pytest.skip("fixture missing") + records = walk_body(_extract_body(bin_path)) + txt_rows = _parse_txt_rows(txt_path) + n = min(len(records), len(txt_rows)) + for i in range(n): + rec = records[i] + _ts, txt = txt_rows[i] + for slot, key, col in (("T", "t_halfp", 1), ("V", "v_halfp", 3), ("L", "l_halfp", 5)): + decoded_hz = half_period_to_hz(rec[key]) + expected = txt[col] + if expected is None: + # TXT shows `>100 Hz` — codec should also yield None + assert decoded_hz is None or decoded_hz > 100, ( + f"{fixture} block {i} {slot}_freq: codec says " + f"{decoded_hz} but TXT says >100" + ) + continue + # TXT rounds; allow ±1 Hz + assert decoded_hz is not None + assert abs(decoded_hz - expected) < 1.0, ( + f"{fixture} block {i} {slot}_freq: " + f"decoded={decoded_hz:.2f} Hz vs txt={expected:.2f} Hz" + ) + + +@pytest.mark.parametrize("fixture", [ + "N844L6XE.BH0H", + "N844L23B.ND0H", + "N844L6Z8.ZR0H", +]) +def test_decoded_mic_db_matches_txt(fixture: str): + """Decoded MicL peak count → dB(L) via mic_count_to_db matches + the .TXT dB(L) column.""" + bin_path = _FIXTURE_DIR / fixture + txt_path = _FIXTURE_DIR / (fixture.replace(".", "_") + "_ASCII.TXT") + if not bin_path.exists() or not txt_path.exists(): + pytest.skip("fixture missing") + records = walk_body(_extract_body(bin_path)) + txt_rows = _parse_txt_rows(txt_path) + n = min(len(records), len(txt_rows)) + for i in range(n): + rec = records[i] + _ts, txt = txt_rows[i] + # TXT col 8 = MicL dB(L) + decoded_db = mic_count_to_db(rec["m_peak"]) + expected = txt[8] + if expected is None: + continue + # BW rounds to 1 decimal place for display. Tolerance 0.1 dB + # absorbs both rounding modes (truncate vs round-half-even). 
+ assert abs(decoded_db - expected) < 0.1, ( + f"{fixture} block {i} M_dB: " + f"decoded={decoded_db:.2f} dB vs txt={expected:.2f} dB" + ) + + +@pytest.mark.parametrize("fixture", [ + "N844L20G.630H", + "N844L6Z8.ZR0H", +]) +def test_decoded_mic_freq_matches_txt(fixture: str): + """Decoded MicL half-period → freq matches the .TXT col 9 freq.""" + bin_path = _FIXTURE_DIR / fixture + txt_path = _FIXTURE_DIR / (fixture.replace(".", "_") + "_ASCII.TXT") + if not bin_path.exists() or not txt_path.exists(): + pytest.skip("fixture missing") + records = walk_body(_extract_body(bin_path)) + txt_rows = _parse_txt_rows(txt_path) + n = min(len(records), len(txt_rows)) + for i in range(n): + rec = records[i] + _ts, txt = txt_rows[i] + decoded_hz = half_period_to_hz(rec["m_halfp"]) + expected = txt[9] + if expected is None: + assert decoded_hz is None or decoded_hz > 100 + continue + assert decoded_hz is not None + assert abs(decoded_hz - expected) < 1.0, ( + f"{fixture} block {i} M_freq: " + f"decoded={decoded_hz:.2f} Hz vs txt={expected:.2f} Hz" + ) + + +# ── Public API ─────────────────────────────────────────────────────────────── + + +def test_decode_histogram_body_returns_four_channels(): + """The public API returns the standard 4-channel dict shape.""" + path = _FIXTURE_DIR / "N844L20G.630H" + if not path.exists(): + pytest.skip("fixture missing") + decoded = decode_histogram_body(_extract_body(path)) + assert decoded is not None + assert set(decoded.keys()) == {"Tran", "Vert", "Long", "MicL"} + # All channels same length (one value per histogram interval) + n = len(decoded["Tran"]) + assert all(len(decoded[ch]) == n for ch in ("Vert", "Long", "MicL")) + assert n > 100 + + +def test_decode_histogram_body_returns_none_for_non_histogram(): + """A waveform-mode body (starts with 00 02 00) doesn't decode as + a histogram body.""" + fake_waveform_body = b"\x00\x02\x00" + b"\x00" * 100 + assert decode_histogram_body(fake_waveform_body) is None + + +def 
test_decode_histogram_body_returns_none_for_garbage(): + """Bytes that don't form valid blocks return None.""" + assert decode_histogram_body(b"\xff" * 256) is None + + +def test_decode_histogram_body_full_preserves_frequency_data(): + """The structured-record API preserves the per-channel half-period + fields that the flat-channel API drops.""" + path = _FIXTURE_DIR / "N844L20G.630H" + if not path.exists(): + pytest.skip("fixture missing") + records = decode_histogram_body_full(_extract_body(path)) + assert records is not None + r0 = records[0] + expected_fields = { + "segment_id", "block_ctr", + "t_peak", "t_halfp", "v_peak", "v_halfp", + "l_peak", "l_halfp", "m_peak", "m_halfp", + "meta_var", + } + assert set(r0.keys()) >= expected_fields + + +# ── Helpers ────────────────────────────────────────────────────────────────── + + +def test_half_period_to_hz_sentinel(): + """Half-period ≤ 5 returns None (the `>100 Hz` sentinel).""" + assert half_period_to_hz(5) is None + assert half_period_to_hz(1) is None + # halfp=6 gives 512/6 = 85.3 Hz — below the >100 threshold + assert half_period_to_hz(6) == pytest.approx(85.33, abs=0.01) + + +def test_geo_count_to_ins_scale(): + """1 count = 0.005 in/s at Normal range.""" + assert geo_count_to_ins(1) == pytest.approx(0.005) + assert geo_count_to_ins(10) == pytest.approx(0.050) + assert geo_count_to_ins(0) == 0.0 + + +# ── Regression: peak is uint8 byte[N], NOT uint16 LE byte[N:N+2] ──────────── +# +# Block taken verbatim from K558LKZU.RE0H (BE9558) interval 12 — a real +# field event where the Tran channel had developed a DC offset and was +# producing sub-Hz drift content the device couldn't characterize. +# The annotation byte at [7] = 0xd2 is non-zero in that case. The +# legacy codec read [6:8] as uint16 LE, producing T_peak = 53763 → +# 268 in/s — physically impossible and 35× too high for the actual +# 0.015 in/s value (T_lo = 3 alone gives the correct count). +# Verified against the paired BW ASCII export. 
+_K558_INTERVAL_12_BLOCK = bytes.fromhex( + "00 00 0c 01 0a 00 03 d2 45 00 02 00 02 00 02 00" + "02 00 10 00 06 00 00 00 0e 91 2f 00 1e 0a 00 00".replace(" ", "") +) + + +def test_extension_byte_does_not_inflate_peak(): + """The annotation byte at [7]/[11]/[15]/[19] must NOT contribute to + the peak count. Decoded T_peak must be 3 (uint8 byte[6]), NOT + 53763 (uint16 LE byte[6:8]).""" + body = _K558_INTERVAL_12_BLOCK + records = decode_histogram_body_full(body) + assert records is not None + assert len(records) == 1 + r = records[0] + assert r["t_peak"] == 3, f"T_peak should be 3 (uint8), got {r['t_peak']}" + assert r["v_peak"] == 2 + assert r["l_peak"] == 2 + assert r["m_peak"] == 16 + # Half-periods unchanged — still uint16 LE. + assert r["t_halfp"] == 0x0045 # 69 → 7.4 Hz + assert r["m_halfp"] == 6 # → 85.3 Hz + # Annotation byte is preserved (for future RE) but does not affect peak. + assert r["annotations"] == (0xd2, 0x00, 0x00, 0x00) + + +def test_extension_byte_decoded_to_correct_in_s(): + """End-to-end: the channel-grouped output for the K558 ext block + should give T = 3 counts = 0.015 in/s, not 53763 counts = 268 in/s.""" + channels = decode_histogram_body(_K558_INTERVAL_12_BLOCK) + assert channels is not None + assert channels["Tran"] == [3] + assert geo_count_to_ins(channels["Tran"][0]) == pytest.approx(0.015) + assert channels["Vert"] == [2] + assert channels["Long"] == [2] + assert channels["MicL"] == [16]
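Reviewer note on the K558 regression tests above: the following is a minimal, standalone sketch of why reading the peak as uint16 LE inflates it. It does not call the real `minimateplus.histogram_codec` module. The byte offsets (peak at `[6]`, annotation at `[7]`, half-period at `[8:10]`) and the `512 / half_period` conversion are assumptions inferred from the test assertions, and the helper names `t_peak_uint8`, `t_peak_uint16_le`, and `half_period_to_hz_sketch` are hypothetical.

```python
# Standalone sketch; offsets and formulas are inferred from the tests in this
# diff, not copied from the codec's source.

# Same 32-byte block as _K558_INTERVAL_12_BLOCK in the test file.
# bytes.fromhex() ignores ASCII whitespace on Python 3.7+.
BLOCK = bytes.fromhex(
    "00 00 0c 01 0a 00 03 d2 45 00 02 00 02 00 02 00"
    "02 00 10 00 06 00 00 00 0e 91 2f 00 1e 0a 00 00"
)

GEO_INS_PER_COUNT = 0.005  # 1 count = 0.005 in/s, per test_geo_count_to_ins_scale


def t_peak_uint8(block: bytes) -> int:
    """Tran peak read as a single byte at offset 6 (the behavior the tests lock in)."""
    return block[6]


def t_peak_uint16_le(block: bytes) -> int:
    """Tran peak read as uint16 LE over [6:8]; this pulls in the annotation byte."""
    return int.from_bytes(block[6:8], "little")


def half_period_to_hz_sketch(halfp: int):
    """Assumed conversion: 512 / half-period, with <= 5 treated as the >100 Hz sentinel."""
    if halfp <= 5:
        return None
    return 512.0 / halfp


if __name__ == "__main__":
    good = t_peak_uint8(BLOCK)
    bad = t_peak_uint16_le(BLOCK)
    t_halfp = int.from_bytes(BLOCK[8:10], "little")
    print(f"uint8 peak:     {good} counts = {good * GEO_INS_PER_COUNT:.3f} in/s")
    print(f"uint16 LE peak: {bad} counts = {bad * GEO_INS_PER_COUNT:.3f} in/s")
    print(f"T half-period:  {t_halfp} -> {half_period_to_hz_sketch(t_halfp):.1f} Hz")
```

With the annotation byte `0xd2` at offset 7, the two-byte read yields `0xd203`, which is where the 53763-count figure in the test comment comes from. The single-byte read gives 3 counts, matching the 0.015 in/s value the tests expect.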