50 Commits

Author SHA1 Message Date
serversdown d0b66368d5 Merge pull request 'update to v0.21.1, thor data import successful' (#29) from dev into main
Reviewed-on: #29
2026-06-01 16:54:23 -04:00
serversdown 25386cab8b fix(backfill): regenerate IDFH .h5 + merge binary mic_pspl_psi onto bridge
Two gaps in backfill_thor_events.py that left old Thor events showing
stale charts after a v0.21.1 backfill pass:
1. IDFH events were skipped from .h5 regeneration (the "have decoded
   samples" gate was IDFW-only).  Histograms kept their pre-v0.21.1
   .h5 — written from raw_samples = None, which the renderer turned
   into a near-empty bar chart, or for older events the dB(L)-as-pseudo-
   psi mic scale that produced "107.7 psi" peaks (atomic-bomb level
   instead of footstep level).  Fix: synthesise the same 1-sample-per-
   interval array save_imported_idf v0.21.1 uses (peak ADC count per
   channel per interval) so the renderer's bar-chart grouping has
   data to work with.
2. The IDFW h5 path didn't merge binary_peaks.mic_pspl_psi onto the
   IdfEvent before to_minimateplus_event().  The live save_imported_idf
   does this merge — without it, IdfEvent.from_report() only sees the
   .txt's dB(L) value, the bridge falls back to the dBL→psi formula
   (instead of the binary-accurate 2.14e-6 psi/count value), and the
   h5 writer's per-count mic factor lands on a less-correct value.
   Fix: same merge the live ingest does (lift res.event.peaks.mic_pspl_psi
   onto idf_event.peaks before the bridge call).
Verified against UM6047_20250804190047.IDFH (250-interval prod
histogram): 250 intervals decode, mic_pspl_psi = 2.78e-5 (was being
treated as dB(L)=107.7 in the old h5).
Operator: re-run after deploy.  `docker compose exec sfm python
scripts/backfill_thor_events.py` is idempotent — the existing version
check still skips events already at the new TOOL_VERSION, and review
state + captured_at are preserved on the second pass.
2026-06-01 20:02:54 +00:00
serversdown 6cb619ecc4 version bump - 0.21.1 2026-06-01 19:33:44 +00:00
serversdown 1ed86244d0 fix(thor-events): add parallel field for mic psi. Now shows mic in dB(L) and psi (psi for charts). 2026-06-01 18:27:24 +00:00
serversdown b2c565f217 fix(idf_waveforms): _find_waveform_body_offset() — scans every 00 02 00 magic past offset 0x0E00, runs decode_waveform_v2 on each candidate, picks the one that returns the most samples. Validated on 483 prod IDFW files: 0 preamble-only events (was ~50%), 355/483 fully decode, 126/483 partial (BW codec walker-stops-early on loud events — known issue).
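
The scan described above can be sketched roughly as follows. This is a minimal illustration, not the project's code: `find_waveform_body_offset` and the `decode` callable are hypothetical stand-ins (the real `decode_waveform_v2` signature is not shown in this log), and only the magic bytes and the 0x0E00 start offset come from the commit message.

```python
from typing import Callable, Optional, Sequence

MAGIC = b"\x00\x02\x00"   # marker bytes named in the commit message
SCAN_START = 0x0E00       # candidates before this offset are ignored


def find_waveform_body_offset(
    data: bytes,
    decode: Callable[[bytes, int], Sequence[int]],
) -> Optional[int]:
    """Return the magic offset whose decode yields the most samples, or None.

    `decode(data, offset)` is a hypothetical decoder hook that returns the
    samples it managed to decode when started at `offset`.
    """
    best_offset: Optional[int] = None
    best_count = 0
    pos = data.find(MAGIC, SCAN_START)
    while pos != -1:
        try:
            count = len(decode(data, pos))
        except Exception:
            count = 0  # a candidate that fails to decode simply loses
        if count > best_count:
            best_offset, best_count = pos, count
        pos = data.find(MAGIC, pos + 1)
    return best_offset
```

Trying every candidate and keeping the one with the most samples trades a little CPU time for not having to trust a fixed body offset.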
IDFH now synthesises a 1-sample-per-interval array from the binary intervals and writes an .h5 so the existing renderer works unchanged. Each "sample" is the per-interval peak ADC count → h5_value = count × geo_fs/32768 yields the right bar height.
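
A minimal sketch of the count-to-bar-height scaling stated above (h5_value = count × geo_fs/32768). The function name and the example full-scale value are illustrative assumptions; only the formula comes from the commit message.

```python
from typing import Iterable, List

ADC_FULL_SCALE = 32768  # divisor from the commit message


def counts_to_h5_values(peak_counts: Iterable[int], geo_fs: float) -> List[float]:
    """Scale per-interval peak ADC counts to engineering units.

    `geo_fs` is the geophone full-scale value (e.g. in/s); the name follows
    the commit message, the function itself is hypothetical.
    """
    return [count * geo_fs / ADC_FULL_SCALE for count in peak_counts]


# One synthetic "sample" per interval, assuming a 10 in/s full scale.
bars = counts_to_h5_values([0, 16384, 32768], geo_fs=10.0)
```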
2026-05-31 20:51:09 +00:00
serversdown 43f440812a scripts: add backfill_thor_events.py
Refreshes the bw_report sidecar block + .h5 waveform files for Thor
events ingested before the v0.21.0 adapter wiring + the bee1185 codec
fix.  Those events landed with extensions.idf_report only (no
bw_report, no .h5 for IDFW) — symptom on the UI side: the modal chart
404'd on /waveform.json and the PDF rendered from DB-only fields
without sensor self-check, full per-channel breakdown, or mic dB(L).

Walks <store>/<serial>/<filename>:
  - Reads the existing sidecar (preserves review state + captured_at)
  - Re-runs read_idf_file() on the binary bytes (passes data=
    kwarg so codec doesn't try the broken bare-path Path.read_bytes)
  - Reads extensions.idf_report from the existing sidecar
  - Runs build_bw_report_from_idf adapter
  - Writes refreshed sidecar with bw_report + bumped tool_version,
    preserving review block and original captured_at
  - For IDFW: regenerates .h5 by bridging IdfEvent.from_report ->
    to_minimateplus_event -> write_event_hdf5 (mirrors save_imported_idf
    steps 4-7)
  - IDFH events skip .h5 (histograms have no per-sample data)

Skips events already at current TOOL_VERSION with bw_report present.
--force overrides.  --skip-hdf5 limits to sidecar-only refresh.
--dry-run for preview.

Validated against the prod-snap waveform store: 3,815 Thor sidecars
refreshed cleanly with 0 errors, 462 IDFW .h5 files written, 2 skipped
(binaries with no sidecar — backfill doesn't conjure events from
nothing).  Verified one originally-broken IDFW event now serves
waveform.json (200, 168KB) and a fully populated PDF (119KB vs the
previous 56KB sparse output).

Operator workflow on prod:
  docker exec <sfm-container> python3 /app/scripts/backfill_thor_events.py --dry-run
  # Inspect counts, then for real:
  docker exec <sfm-container> python3 /app/scripts/backfill_thor_events.py

Idempotent — re-running it is a no-op once everything's at the current
TOOL_VERSION.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-30 04:37:43 +00:00
serversdown 23e83908c2 report_pdf: fix PVS overlapping stats table, drop NA caption
Three related fixes to the per-channel stats block:

1. Pin the stats table's position via an explicit bbox= on
   ax.table() so the bottom edge is at a known axes-fraction Y.
   The previous loc="upper left" + tbl.scale(1, 1.4) combo let
   matplotlib choose row heights based on text size, which made the
   table extend further below the axes than the hard-coded PVS line
   at y=-0.08 expected.  Result was the "Peak Vector Sum X in/s"
   string landing horizontally inside the Peak Displacement row.

   With bbox=[0, 1-N*0.12, 0.80, N*0.12] the table is pinned to a
   precise rectangle (12% axes-fraction per row × N rows tall).
   _draw_stats_table now stashes the bottom Y on the axes for the
   PVS helper to reference, so the geometry stays in sync.

2. Center PVS horizontally (ha="center" at x=0.5 instead of ha="left"
   at x=0).  The previous left-edge alignment put PVS at the same
   X as the label column, which read as "off-center" once the rest
   of the stats data was column-aligned further right.

3. Drop the "NA: Not Applicable" caption.  It existed to explain
   "—" placeholder cells, but "—" is universally understood and the
   caption was always visually squished against the PVS line below.
   Less cruft on the page; one fewer position to manage.
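
The pinned-rectangle geometry from fix 1 can be sketched as below. The helper name is hypothetical; the 0.12 row height and 0.80 width are the values quoted above, and the resulting list is the kind of value one would pass as `bbox=` to matplotlib's `ax.table()`.

```python
from typing import List

ROW_HEIGHT = 0.12   # axes-fraction height per table row (from the commit)
TABLE_WIDTH = 0.80  # axes-fraction table width (from the commit)


def stats_table_bbox(n_rows: int) -> List[float]:
    """Return [x, y, width, height] in axes fractions for an N-row table.

    The table hangs from the top of the axes (y = 1.0), so its bottom edge
    sits at a known Y that other artists (e.g. the PVS line) can reference.
    """
    height = n_rows * ROW_HEIGHT
    return [0.0, 1.0 - height, TABLE_WIDTH, height]


bbox = stats_table_bbox(5)
table_bottom_y = bbox[1]  # where a caption below the table would anchor
```

Deriving the bottom edge from the same numbers that size the table is what keeps the two positions in sync.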

Verified against a real BE12599 histogram event (5 data rows) and
a real UM12947 IDFW waveform event (6 data rows) — both layouts
clear the table cleanly with no overlap.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-29 22:17:43 +00:00
serversdown bee118506b fix(idf): decode from in-memory bytes during ingest
Bug shipped in v0.21.0: save_imported_idf called read_idf_file()
with `source_path` (a bare filename like "UM12947_….IDFW") BEFORE
writing the binary to disk.  The codec did Path(path).read_bytes()
which resolved relative to /app and hit FileNotFoundError.  The
error was caught + logged as a warning, and ingest fell back to
.txt-only — events still landed in the DB but lost the bw_report
block + .h5 waveform that the codec was supposed to produce.

Observed during a full re-forward from thor-watcher on 2026-05-29:
every Thor event logged "binary codec failed for X: [Errno 2] No
such file or directory" and got binary_decoded=False.

Fix:
- read_idf_file() gains a `data: Optional[bytes]` kwarg.  When
  supplied, skips the disk read and decodes the provided bytes
  directly.  `path` stays required (used for filename in error
  messages + .IDFH vs .IDFW suffix detection); only the read is
  conditional.  Backward compatible — existing positional callers
  (CLI scripts, tests) continue to work unchanged.
- save_imported_idf passes `data=idf_bytes` since the bytes are
  already in memory from the multipart upload.  Filesystem write
  still happens at step 5 of the existing flow; codec just no
  longer depends on it.
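
A simplified sketch of the "optional in-memory bytes" pattern described in the fix. `read_idf_bytes` is a hypothetical stand-in for `read_idf_file()`: the real function decodes the payload, whereas this one only shows how `path` stays required for suffix detection while the disk read becomes conditional.

```python
from pathlib import Path
from typing import Optional, Tuple


def read_idf_bytes(path: str, data: Optional[bytes] = None) -> Tuple[str, bytes]:
    """Return (suffix, raw bytes) for an IDF file.

    `path` is always needed for error messages and the .IDFH/.IDFW check.
    When `data` is supplied the file is never opened, so the caller may
    decode before anything has been written to disk.
    """
    suffix = Path(path).suffix.upper()
    if suffix not in (".IDFW", ".IDFH"):
        raise ValueError(f"{path}: expected a .IDFW or .IDFH file")
    raw = data if data is not None else Path(path).read_bytes()
    return suffix, raw


# Ingest path: bytes already in memory, bare filename is fine.
kind, raw = read_idf_bytes("UM12947_example.IDFW", data=b"\x00\x01")
```

Existing callers that pass only a path keep the old read-from-disk behavior.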

Verified end-to-end against UM11719_20231219162723.IDFW from the
example-data corpus: ingest endpoint returns inserted=1, log line
shows binary_decoded=True + h5=...IDFW.h5, no warnings.

Re-forward existing Thor events from thor-watcher after deploy to
backfill the bw_report block — UPSERT preserves review state.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-29 20:09:54 +00:00
serversdown defd17d9c2 sfm_webapp: harmonize "Received by server at" → "Time received"
Matches Terra-View's event-modal relabel from the same iteration.
Wording was already clearer here than in Terra-View's "Captured at",
but using identical text across both surfaces means operators see the
same label whether they're in the native modal or the standalone
webapp.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-29 19:51:58 +00:00
serversdown e42956a20b release: v0.21.0 — Thor / Series IV codec + Thor→BW adapter
Documents two commits that landed on dev since v0.20.0:

  9b71ead  series 4 codec work, initial decode success
           micromate/idf_file.read_idf_file() decodes both IDFW
           (waveform; 87-99% sample fidelity reusing
           decode_waveform_v2 at offset 0x0f1f) and IDFH (histogram;
           dedicated segment-based decoder, all 859 corpus files
           decode, 181,071 intervals total).

  9fd52dd  feat: add thor report generation, pdf generation
           micromate/idf_to_bw_report.py adapter projects parsed
           Thor data into the bw_report sidecar shape so Thor
           events flow through sfm/report_pdf.py without a
           separate renderer.  Wired into save_imported_idf.

Net effect: a Thor event ingested via /db/import/idf_file now
lands with the same fidelity as a BW event, gets a per-event PDF
on demand, and renders in Terra-View's modal chart using the same
plotting code as a BW event.

Roadmap items closed:
- Binary .IDFW / .IDFH codec (was pending)
- Series IV (Thor IDF) binary codec reverse-engineering

Companion: Terra-View v0.13.0 ships in parallel and closes Phase 1
of the SFM integration.  No API changes in seismo-relay for that
piece — Terra-View just consumes existing endpoints better.

Bumps:
- pyproject.toml 0.20.0 → 0.21.0
- minimateplus.event_file_io.TOOL_VERSION 0.20.0 → 0.21.0
  (any subsequent backfill_sidecars.py --force will re-stamp
  existing sidecars; expected + harmless)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-29 19:25:44 +00:00
serversdown 9fd52ddabb feat: add thor report generation, pdf generation. 2026-05-29 19:03:06 +00:00
serversdown 9b71ead44b series 4 codec work, initial decode success 2026-05-29 06:33:13 +00:00
serversdown 2eb1d25028 Merge pull request 'v0.20.0 -- Full s3 event parse and PDF creation.' (#28) from dev into main
Reviewed-on: #28
2026-05-28 17:54:31 -04:00
serversdown 1bccc44b88 release: v0.20.0 — PDF + parser polish
Closes out the Event-Report PDF iteration started in v0.17.x and ships
the parser fixes the real-world events were tripping over.

Today's additions on top of the pre-v0.20 unreleased body:

- Server-wide display TZ via the TZ env var (default America/New_York
  on prod).  Affects server logs, the PDF report's "Created" footer,
  matplotlib datetime axes.  DB columns stay UTC.  Dockerfile now
  installs tzdata.
- ZC Freq "above-range" handling — parser stores 100.0 +
  zc_freq_above_range flag for BW's ">100 Hz" marker.  Renders as
  >100 in the PDF stats table, both modals (inline on webapp Peaks,
  new column on event-browser table).
- scripts/backfill_sidecars.py --reparse-txt — re-runs the current
  parser against the preserved _ASCII.TXT and overwrites the
  sidecar's bw_report block.  Lets parser fixes reach old events
  without re-forwarding.  Validated end-to-end against ~10k prod
  events.

Fixes shipped today:
- histogram_interval_size_s missing from ReportData → every
  histogram PDF render 500'd.
- Histogram PDF geo channels now share a nice-quantized y-axis
  (0.005-LSB-aware 1-2-5 step sequence) instead of auto-scaling
  per channel + inventing sub-LSB "0.003 in/s/div" footer labels.

Roadmap delta: closes the BW ASCII parser "PPV-miss on some TXT
formats", "histogram-specific structural fields", and ">100 Hz value
parsing" items.  Adds a new entry for the byte[5]==0 histogram body
sub-format observed on S353 events.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 21:17:53 +00:00
serversdown a3cc44d30a feat(backfill): --reparse-txt flag to refresh bw_report from preserved .TXT
The existing backfill_sidecars.py PRESERVES the bw_report block across
regenerations — it's treated as the source of truth from the original
ingest pass (the .TXT isn't reachable from the script's normal data
path, so it can't be re-derived).

That means parser-side fixes (like the 2026-05-28 ">100 Hz" ZC Freq
addition) won't reach old events even with --force.  The new
--reparse-txt flag fixes that: when the sidecar's source.txt_filename
points at a preserved <serial>/<filename>_ASCII.TXT, the script re-runs
the current parser against it and overwrites the bw_report block.

Implies sidecar regeneration on every event (bypasses the
sha-up-to-date / version-up-to-date skip), so that the .h5 cascade-
regenerates alongside.  No-op for events without a preserved .TXT
(legacy ingests pre-2026-05-27).  Idempotent — re-running it produces
the same sidecar bytes when the parser hasn't changed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 18:56:23 +00:00
serversdown 6a73523e4d ui: surface per-channel ZC Freq (and ">100") in event modals
The PDF report shows per-channel ZC Freq alongside PPV in the stats
block, but neither modal exposed it.  Now that the sidecar projection
carries zc_freq_hz + zc_freq_above_range, plumb them through:

- sfm_webapp.html: inline suffix on existing Peaks cells, e.g.
  "Tran  0.04500 in/s · >100 Hz".  Empty suffix when no ZC is
  available (legacy events without a preserved .TXT).

- event_browser.html: new ZC Freq column on the per-channel stats
  table.  Required adding a parallel sidecar fetch in loadEvent()
  (waveform.json alone doesn't carry bw_report).  Fetch failure is
  non-fatal — falls back to "—" in the new column.

Above-range ZC peaks (BW ">100 Hz") render with a literal ">"
prefix mirroring the PDF, so operators don't have to generate the
PDF to see when a channel hit the zero-crossing ceiling.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 18:47:37 +00:00
serversdown 780b45a371 feat: render ">100" for above-range ZC Freq instead of "—"
BW writes ">100 Hz" for ZC Freq when the zero-crossing algorithm sees a
peak too fast to count — the device's reporting ceiling is 100 Hz on
V10.72.  Our parser fell back to None via _parse_number (which requires
a leading digit), so the PDF rendered "—" where BW shows ">100".

Mirrors the OORANGE/saturated pattern already used for PPV and PSPL:
parser stores the threshold (100.0) on zc_freq_hz + sets a new
zc_freq_above_range flag.  Projection carries the flag through to the
sidecar; PDF renderer prepends ">" when set.
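
A minimal sketch of the value-plus-flag pattern described above. `parse_zc_freq` and `format_zc_freq` are hypothetical names, and the regex is an assumption about the field's text shape rather than the parser's actual rule.

```python
import re
from typing import Optional, Tuple


def parse_zc_freq(text: str) -> Tuple[Optional[float], bool]:
    """Parse a ZC Freq field into (value_hz, above_range).

    ">100 Hz" yields (100.0, True); a plain number yields (value, False);
    anything non-numeric such as "N/A" yields (None, False).
    """
    cleaned = text.strip()
    above_range = cleaned.startswith(">")
    match = re.match(r"\d+(?:\.\d+)?", cleaned.lstrip("> "))
    if match is None:
        return None, False
    return float(match.group()), above_range


def format_zc_freq(value_hz: Optional[float], above_range: bool) -> str:
    """Render the value the way the report does: '>100', '12.5', or a dash."""
    if value_hz is None:
        return "-"
    return f"{'>' if above_range else ''}{value_hz:g}"
```

Storing the threshold as a real number keeps the column sortable, while the flag stops the UI from presenting 100 Hz as an exact reading.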

Affects both per-channel stats tables (waveform + histogram variants)
and the mic block's ZC Freq row.

Verified on the real T190LD5Q.LK0W fixture: Tran zc_freq_hz=100.0
above_range=True; Vert/Long (normal values) above_range=False; "N/A"
still produces zc_freq_hz=None which renders as "—" (unchanged).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 18:38:49 +00:00
serversdown f6abe3caa0 fix(report_pdf): histogram geo channels share nice-quantized y-axis
Two related visual bugs on histogram PDFs:

1. Per-channel auto-scale meant Tran/Vert/Long had different y-axes
   (e.g. 0-0.015, 0-0.025, 0-0.020) — bars looked taller on the
   channel that happened to be quietest.  Not directly comparable.

2. Footer "Amplitude Geo: X in/s/div" was just amax/5 of the FIRST
   geo channel with data, with no LSB quantization — producing
   nonsense like 0.003 in/s/div when the geophone LSB is 0.005.

Fix: compute a single shared geo y-axis range from max(Tran,Vert,Long),
quantize the per-division step to BW's 1-2-5 sequence rounded to the
0.005 LSB (0.005, 0.01, 0.025, 0.05, 0.1, 0.25, ...), apply the same
ylim + ticks to all three geo subplots, and use that same step for the
footer label.  MicL stays on its own auto-scale (different units).
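
One way to sketch the shared-axis quantization. The step sequence is the one listed above (0.005, 0.01, 0.025, 0.05, 0.1, 0.25, ...); the five-division assumption is inferred from the old amax/5 footer and may not match the renderer, and the function name is hypothetical.

```python
import math
from itertools import count
from typing import Iterable, Tuple

N_DIVISIONS = 5                  # ASSUMPTION: five divisions per axis
MANTISSAS_MILLI = (5, 10, 25)    # 0.005, 0.01, 0.025 in thousandths, per decade


def shared_geo_axis(channel_peaks: Iterable[float]) -> Tuple[float, float]:
    """Return (step_per_div, axis_top) shared by all geo channels.

    Picks the smallest step in the sequence whose N_DIVISIONS multiples
    cover the largest channel peak. Integer thousandths avoid float drift.
    """
    peak = max(channel_peaks)
    if not math.isfinite(peak) or peak < 0:
        raise ValueError("peak must be a finite, non-negative number")
    for decade in count():
        for mantissa in MANTISSAS_MILLI:
            step_milli = mantissa * 10 ** decade
            top_milli = step_milli * N_DIVISIONS
            if peak <= top_milli / 1000 + 1e-12:
                return step_milli / 1000, top_milli / 1000
    raise AssertionError("unreachable")  # loop always returns for finite peaks


step, top = shared_geo_axis([0.015, 0.025, 0.020])  # Tran, Vert, Long peaks
```

With these inputs the sketch reproduces the reported case (geo max 0.025 gives 0.005/div with a top of 0.025).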

Verified across edge cases including the reported event
(geo max 0.025 → 0.005/div, top 0.025), small PVS events, and large
blast amplitudes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 18:22:20 +00:00
serversdown ad2702d4bf fix(report_pdf): add missing histogram_interval_size_s field
The histogram-interval-times derivation block at line 314 references
rd.histogram_interval_size_s, but the field wasn't declared on the
ReportData dataclass — only the string form histogram_interval_size
was.  Result: every PDF render of a histogram event raised
AttributeError → 500 from /db/events/{id}/report.pdf.

Cause: when the histogram aggregation block was inlined into
gather_report_data, the seconds-numeric counterpart that the
projection already carries (bw_report.histogram.interval_size_s) was
never wired into the dataclass.  Waveform PDFs weren't affected
because the offending line is gated on is_histogram.

Fix: add the field, read it from the projection alongside the other
histogram keys.  No-op for waveform events (the field stays None and
the gate skips it).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 18:07:41 +00:00
serversdown 86325b9bab docs: roadmap entry for a SECOND undecoded histogram sub-format (S353)
Observed in fresh ingest logs on 2026-05-28: BE17353 events
(S353L4H2.FZ0H, S353L4H2.P00H, etc.) cause "body codec failed to
decode" warnings.  Different from the byte[5]!=0 case already tracked
(T190 / O121) — these have byte[5]==0x00 with what looks like a
valid block header, but the walker finds zero data blocks anyway.

Operational impact identical to the existing case: ingestion
succeeds, DB peaks come from bw_report overlay, only the chart is
empty.  No data loss.

Pinning so it doesn't get lost — needs a hex dump of one body to
work out what's different about these.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 05:42:18 +00:00
serversdown 6381dcb312 tz: server-wide display timezone via TZ env var (default EST/EDT)
User-reported issue: server logs were timestamped in UTC ("05:36:20"
when local was ~01:36 EDT), and the PDF report's "Created" footer
similarly showed raw UTC.  Inconsistent with the modal which already
converts to browser local via toLocaleString.

Solution: standard Linux TZ env var.  Set once in the container, and:
  - Python's datetime.now() uses local
  - Logging module's timestamps use local
  - matplotlib renderers + report_pdf formatters use local
  - astimezone() conversions resolve to the configured TZ

DB columns stay UTC (created_at uses SQLite's strftime('%Y-...Z', 'now')
which is always UTC, regardless of TZ env var — proper "store UTC,
display local" pattern).

Changes:
  - Dockerfile: install tzdata (python:3.11-slim omits the timezone
    database), set default TZ=America/New_York
  - sfm/report_pdf.py: _fmt_iso_to_bw and _split_iso_to_date_time now
    convert UTC inputs (Z-suffixed) to local via astimezone(); naïve
    inputs (BW recorded-at, already unit-local) returned as-is.
    New _to_display_local helper centralizes the logic.
  - "Created" line in the PDF page footer now uses the converted
    timestamp.
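
A minimal sketch of the "store UTC, display local" conversion described for `_to_display_local`. The function below is an illustrative reimplementation, not the project's helper; the optional `tz` parameter is an addition so the behavior can be shown without depending on the machine's TZ setting.

```python
from datetime import datetime, timedelta, timezone, tzinfo
from typing import Optional


def to_display_local(iso: str, tz: Optional[tzinfo] = None) -> datetime:
    """Convert a Z-suffixed/offset ISO timestamp to display time.

    Aware inputs are converted with astimezone(); tz=None means the
    process-local zone (driven by the TZ env var on Linux). Naive inputs
    are assumed to be unit-local already and are returned unchanged.
    """
    if iso.endswith("Z"):
        iso = iso[:-1] + "+00:00"  # older fromisoformat() rejects a bare "Z"
    parsed = datetime.fromisoformat(iso)
    if parsed.tzinfo is None:
        return parsed
    return parsed.astimezone(tz)


# Fixed UTC-4 offset stands in for America/New_York during EDT.
edt = timezone(timedelta(hours=-4))
received = to_display_local("2026-05-27T21:59:57.213043Z", edt)
recorded = to_display_local("2026-05-27T06:00:13")
```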

Override per-deployment via the TZ env var in docker-compose
(separate commit on terra-view side).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 05:41:10 +00:00
serversdown 53c05d93e2 delete: also clean up preserved _ASCII.TXT file
_cleanup_event_files() removes the on-disk artifacts when an event is
hard-deleted (binary, a5_pickle, sidecar, h5).  Today's .TXT
preservation feature added a new on-disk file (_ASCII.TXT next to the
binary) but the cleanup didn't know about it — so any event deleted
via /db/events/{id} (single) or /db/events/delete_bulk (or the
Terra-View "SFM Event DB Manager" UI which proxies through to those
endpoints) was leaving orphan .TXT files in the store.

Added "txt" to the cleanup list using the new
WaveformStore.txt_path_for().  Safe for old events without a .TXT —
the exists() check skips the unlink.
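
The exists-then-unlink pattern can be sketched as below. `cleanup_event_files` here is a simplified, hypothetical version that takes an explicit mapping of artifact kind to path; the real helper resolves paths through the WaveformStore.

```python
from pathlib import Path
from typing import Dict, List


def cleanup_event_files(paths: Dict[str, Path]) -> List[str]:
    """Delete whichever artifacts exist and return the kinds removed.

    Missing files (for example the .TXT of an event ingested before
    preservation existed) are skipped rather than treated as errors.
    """
    removed = []
    for kind, path in paths.items():
        if path.exists():
            path.unlink()
            removed.append(kind)
    return removed
```

Returning the list of removed kinds is an illustrative choice that makes the outcome easy to log.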

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 05:31:08 +00:00
serversdown a5888e1b5c report_pdf: PDF histogram aggregation + fix footer/x-axis overlap
Two issues spotted on a histogram event PDF:

1. Footer scale ("Time — /div  Amplitude Geo: X in/s/div  Mic: Y
   psi(L)/div") was overlapping horizontally with the x-axis tick
   labels (0, 20, 40, 60...).  Both rendered on the same Y row.
   Fix: bumped gridspec bottom margin from 0.06 → 0.12, moved the
   footer text from y=0.045 → y=0.030 (below the tick labels), moved
   the page-bottom Created/Event line from y=0.015 → y=0.005.
   Trigger legend on waveforms moved 0.030 → 0.018.  Everything
   stacks cleanly now without collision.

2. PDF was showing the raw codec output (~150+ bars per histogram)
   instead of BW's per-interval aggregation.  Why: the aggregation
   I'd added to /db/events/{id}/waveform.json wasn't replicated in
   the PDF gather path.  Now: gather_report_data does the same
   max-per-group aggregation when bw_report.histogram.n_intervals is
   populated, AND derives per-interval HH:MM:SS labels from the
   start time + interval_size_s.  Result: histogram PDFs now match
   BW's display (one bar per BW interval, x-axis labeled with actual
   times) — same fix as the modal chart, applied to the PDF.
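
Deriving the per-interval HH:MM:SS labels from the start time and interval_size_s might look like this. A sketch only: the function name is hypothetical, and labeling each bar with the start of its interval (rather than the end) is an assumption.

```python
from datetime import datetime, timedelta
from typing import List


def interval_labels(start: datetime, interval_size_s: float, n_intervals: int) -> List[str]:
    """Return one HH:MM:SS label per histogram interval.

    ASSUMPTION: each label marks the start of its interval.
    """
    return [
        (start + timedelta(seconds=i * interval_size_s)).strftime("%H:%M:%S")
        for i in range(n_intervals)
    ]


labels = interval_labels(datetime(2026, 5, 27, 6, 0, 0), 60, 3)
```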

For events ingested BEFORE the parser extension (no histogram block
in their sidecar), aggregation is a no-op — they still render with
per-block bars + interval-index x-axis (but the overlap fix applies
to them too).  Re-forwarding repopulates the histogram block.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 04:33:53 +00:00
serversdown b9f8bbb220 viewers: enforce minimum Y-range on histogram channels
Quiet histogram events were filling the chart panel even though the
peak was tiny (0.005 in/s rendered as 90% of chart height because
Chart.js auto-scaled to peak * 1.1).  Made everything look uniformly
loud regardless of actual amplitude.

BW's solution: a near-fixed scale per channel ("Geo: 0.002 in/s/div"
from the footer).  Quiet events render small, loud events render
proportionally tall.

Match the intent without copying BW's "no Y-axis labels at all"
convention.  For histogram channels:

  Geo (in/s):       min Y range 0.05 in/s
  Mic in psi:       min Y range 0.001 psi
  Mic in dBL:       unchanged (the 60 dBL floor + peak+5 top already
                    gives quiet events a sensible baseline)

So a 0.005 in/s geo event renders as ~10% of chart height; a 0.05
event fills it; a 5.0 event still fills it (max(peak*1.1, 0.05) ==
peak*1.1 for any peak > 0.05/1.1 ≈ 0.0455).
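
The minimum-range rule above, sketched in Python for illustration (the viewers themselves are browser-side Chart.js code; the names here are hypothetical, the constants are the ones listed above).

```python
GEO_MIN_RANGE_IPS = 0.05    # minimum Y range for geo channels, in/s
MIC_MIN_RANGE_PSI = 0.001   # minimum Y range for mic in psi


def histogram_y_max(peak: float, min_range: float) -> float:
    """Top of the Y axis: 10% headroom over the peak, but never below min_range."""
    return max(peak * 1.1, min_range)


quiet_top = histogram_y_max(0.005, GEO_MIN_RANGE_IPS)  # floor wins
loud_top = histogram_y_max(5.0, GEO_MIN_RANGE_IPS)     # headroom wins
quiet_fill = 0.005 / quiet_top                         # fraction of chart height
```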

Waveform charts unchanged — they should zoom for shape detail.
Applied to both the modal in sfm_webapp.html and the standalone
/events page in event_browser.html.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 04:23:01 +00:00
serversdown b59f886cb7 docs: roadmap entry for sensor-check waveform extraction
BW's Event Report PDFs include a per-channel sensor-check response
waveform on the right side of the bottom plot (damped sinusoid for
geo channels, sawtooth-at-test-freq for mic).  Looks like real
per-sample data extracted from the binary, not synthesized.

Our parser captures the test results (freq, ratio, amplitude,
pass/fail) but not the waveform samples — so the report shows text
only for sensor check.  Pinning a roadmap entry to investigate the
binary for the sample data (path a) or fall back to synthesized
visualization (path b).

Current text-only display is operationally sufficient.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 04:17:50 +00:00
serversdown 87aec3f4d1 viewers: smoother mic dBL chart + restore binary/TXT download links
Two issues spotted in the modal:

1. Mic dBL chart looked spikey/discontinuous — isolated bars at 80-95
   with gaps in between.  Cause: _psiToDbl() returns null for zero or
   negative samples, and most mic samples on a quiet event sit at the
   digitization noise floor where they're effectively zero.  Result:
   the chart only renders the moments when instantaneous SPL exceeded
   the Y-axis bottom — looks like a sound trigger gate.

   Fix: new _psiToDblForChart() rectifies the AC waveform (abs), then
   converts to dBL, then floors at MIC_DBL_FLOOR=60 dBL.  Chart now
   has a continuous 60 dBL baseline with peaks above it — matches how
   acoustic engineers expect SPL-vs-time.  Y-axis bottom pinned to
   MIC_DBL_FLOOR, top to peak + 5 dB headroom.  Peak label still uses
   the unrectified _psiToDbl so the displayed peak value is exact.

2. Filename in Source/Files block was unlinked.  Endpoint exists
   (/db/events/{id}/blastware_file) — just wasn't wired to the modal.
   Made it a clickable download link.  Same treatment for the
   preserved .TXT — added "(download .TXT)" link next to source kind
   when source.txt_filename is populated (events ingested after the
   .TXT preservation feature landed; older events show no link).
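
The rectify, convert, floor sequence from item 1 can be sketched in Python as follows (the real helper is JavaScript). The reference pressure is an assumption: the standard 20 µPa reference expressed in psi. The project's actual psi to dB(L) formula is not shown in this log and may differ.

```python
import math

PSI_REF = 2.9008e-9     # ASSUMPTION: 20 µPa reference pressure, in psi
MIC_DBL_FLOOR = 60.0    # chart baseline from the commit message


def psi_to_dbl_for_chart(sample_psi: float) -> float:
    """Rectify an AC pressure sample, convert to dB(L), floor at the baseline."""
    magnitude = abs(sample_psi)            # rectify: sign carries no level info
    if magnitude == 0.0:
        return MIC_DBL_FLOOR               # log10(0) is undefined; use the floor
    level = 20.0 * math.log10(magnitude / PSI_REF)
    return max(level, MIC_DBL_FLOOR)
```

Flooring instead of returning null is what gives the chart a continuous baseline; the exact peak label would still come from the unfloored conversion.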

Applied to both the inline modal in sfm_webapp.html and the
standalone /events page in event_browser.html.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 23:08:21 +00:00
serversdown ace542cba5 report_pdf: wire histogram peak date/time + PVS-when + Finish field
Spotted comparing our PDF to BW's reference for T003LLUB.CE0H:
  - Finish blank
  - Per-channel Date / Time rows all dashes
  - MicL PSPL line missing "on May 27, 2026 at 06:19:14"
  - Peak Vector Sum missing "on May 27, 2026 At 06:06:14"

Root cause: I'd added these fields to the projection (write side) in
_bw_report_to_dict but never wired them into gather_report_data
(read side).  Plus the projection used keys "start"/"stop" while
gather was reading "start_str"/"stop_str" — typo'd lookup.

Fixes:
  - gather_report_data now reads bw_report.histogram.start /
    .stop / .channel_peak_when (correct keys, matching the projection)
  - Per-channel "peak_date" / "peak_time" populated from
    channel_peak_when[<channel>] for the histogram stats table
  - MicL PSPL line formats as "PSPL  125.7 dB(L) on May 27, 2026
    at 06:19:14" (BW style) when channel_peak_when["MicL"] is present;
    falls back to the waveform-relative "at 0.012 sec" otherwise
  - PVS line formats as "Peak Vector Sum  0.091 in/s on May 27, 2026
    At 06:06:14" (BW style) when bw_report.peaks.vector_sum.when is
    populated; falls back to the relative time_s for waveforms
  - New _split_iso_to_date_time() helper splits ISO timestamps into
    BW-formatted ("May 27 /26", "06:06:14") date+time pairs for the
    stats table's separate Date and Time rows

Events ingested BEFORE the parser extension landed (most of the
existing prod corpus) still show dashes — their sidecars lack the
histogram block.  Re-forwarding repopulates.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 22:47:53 +00:00
serversdown 8cbda09917 viewers: render timestamps in browser-local time
Spotted on the SFM webapp event modal — "Received by server at" was
showing the raw ISO string "2026-05-27T21:59:57.213043Z" because we
were assigning ev.timestamp / src.captured_at directly to the
textContent of the modal fields, bypassing the existing _fmtTs()
helper that wraps them in toLocaleString().

Net effect for operators: confusing "21:59 vs it's 6 PM" mismatch
when the displayed UTC timestamp didn't match wall-clock time.  The
values were always correct; the display was just ambiguous.

After this fix:
  - "Recorded at" (naive ISO from BW = unit local time) renders
    cleanly as the unit wrote it: "5/27/2026, 6:00:13 AM"
  - "Received by server at" (UTC with Z suffix) converts to browser
    local: "5/27/2026, 5:59:57 PM"
  - Timestamp column in the history table already used _fmtTs —
    unchanged
  - Same fix applied to the standalone /events page (sidebar event
    list + meta header) via a new _fmtTsLocal helper

Note: did NOT add file-mtime-on-watcher-PC tracking as a separate
"Called in at" column — discussed and decided created_at is close
enough for schedule-compliance monitoring (worst case lag = watcher
poll interval ~60s, indistinguishable from BW write time at the
operationally-relevant resolution).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 22:30:43 +00:00
serversdown 3457ed0072 bw_ascii_report: parse OORANGE saturation marker + TimeSum typo
BW writes "OORANGE" (truncation of "Out Of Range") when a channel
exceeds its full-scale, and uses a typo'd label "Peak Vector Sum
TimeSum" for the PVS time field.  Both confirmed against real ASCII
files pulled from a Windows watcher PC 2026-05-27:

  T190LD5Q.LK0W  Vert PPV = OORANGE  (Normal range, 10 in/s exceeded)
  T438L713.RY0W  All three PPVs OORANGE  (Sensitive range, 1.25 in/s)
  K557L3YM.OE0W  Tran+Vert PPV OORANGE + MicL PSPL OORANGE

Previously our _parse_number() returned None for OORANGE → DB columns
ended up NULL → events vanished from filters / sorts / dashboards
despite being legitimate high-amplitude events.

New behavior — substitute a conservative bound + set a saturation flag:
  - Channel PPV       → geo_range_ips + ChannelStats.ppv_saturated
  - Peak Vector Sum   → sqrt(3) * geo_range_ips + peak_vector_sum_saturated
  - MicL PSPL         → 140 dB(L) + MicStats.pspl_saturated
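
A minimal sketch of the substitution rules listed above. The function names and the `field` strings are hypothetical; the bounds are the ones in the list. The sqrt(3) factor follows from three orthogonal channels each sitting at full scale.

```python
import math
from typing import Optional, Tuple

MIC_SATURATION_DBL = 140.0  # bound used for a saturated MicL PSPL


def saturation_bound(field: str, geo_range_ips: float) -> float:
    """Conservative value stored when BW reports OORANGE for `field`."""
    if field == "ppv":
        return geo_range_ips
    if field == "pvs":
        return math.sqrt(3) * geo_range_ips
    if field == "pspl":
        return MIC_SATURATION_DBL
    raise ValueError(f"unknown field: {field}")


def parse_peak(raw: str, field: str, geo_range_ips: float) -> Tuple[Optional[float], bool]:
    """Return (value, saturated) for a peak field from the ASCII report."""
    text = raw.strip()
    if text == "OORANGE":
        return saturation_bound(field, geo_range_ips), True
    try:
        return float(text), False
    except ValueError:
        return None, False  # e.g. "N/A"
```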

Flags propagate to the sidecar's bw_report block so the SFM UI can
render "> 10 in/s" / "> 140 dBL" rather than treating the substituted
value as exact.

Same commit also accepts "Peak Vector Sum TimeSum" as an alias for
"Peak Vector Sum Time" (BW always writes the typo on OORANGE PVS
lines — every example file confirms it).

Tests: new test_oorange_marker_treated_as_saturation (synthetic) +
test_real_oorange_event_t190_parses (skips if real fixture absent).
177/177 tests pass; 16 pre-existing missing-fixture skips unchanged.

Five events on prod (T190, T438, K557, plus 2 others matching the
same fault pattern) will pick up correct peaks + saturation flags
once watchers re-forward.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 20:32:56 +00:00
serversdown d21e3b5298 histogram aggregation + parser extension for BW interval fields
Three layered changes that together make histogram charts visually
match BW's printout (one bar per interval, not per codec block):

1. bw_ascii_report parser captures histogram fields it previously
   dropped:
     - Histogram Start/Stop Time + Date → datetime
     - Number of Intervals + Interval Size (string + parsed seconds)
     - <Channel> Peak Time + Peak Date → datetime (per-channel)
     - Peak Vector Sum Date (combined with PVS Time → datetime;
       clears the bogus seconds parse that interpreted "22:33:52"
       as 22.0)
   New _parse_iso_date() handles BW's ISO format for histograms
   (waveforms use "May 8, 2026" long form).  New _parse_interval_size()
   handles "1 minute" / "5 minutes" / "15 seconds" etc.

2. _bw_report_to_dict() projects the new fields into a new
   bw_report.histogram block in the sidecar.

3. /db/events/{id}/waveform.json wraps the existing path 1 (HDF5)
   output with _maybe_aggregate_histogram(): when the event is a
   histogram AND the sidecar has bw_report.histogram.n_intervals,
   group the codec's per-block samples into N intervals via
   max-per-group and return the aggregated array.  time_axis gains
   histogram_aggregated / n_intervals / interval_size_s / interval_times
   fields.
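
The max-per-group step in item 3 might be sketched as below. How the real `_maybe_aggregate_histogram()` splits blocks into groups is not stated in this log; contiguous, near-equal groups are an assumption, as is the function name.

```python
from typing import List, Sequence


def aggregate_max_per_group(samples: Sequence[float], n_intervals: int) -> List[float]:
    """Collapse per-block samples into n_intervals bars, keeping each group's max.

    ASSUMPTION: groups are contiguous and as equal in size as possible.
    Falls back to the input unchanged when aggregation is not possible.
    """
    total = len(samples)
    if n_intervals <= 0 or total < n_intervals:
        return list(samples)
    bars = []
    for i in range(n_intervals):
        start = i * total // n_intervals
        stop = (i + 1) * total // n_intervals  # always > start here
        bars.append(max(samples[start:stop]))
    return bars
```

Taking the maximum rather than the mean preserves each interval's peak, which is the quantity a histogram bar reports.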

Frontend (both modal chart in sfm_webapp.html + standalone event
browser) uses interval_times as x-axis labels when provided (BW-style
HH:MM:SS), falls back to interval index.

Defensive: aggregation is no-op when the sidecar lacks the histogram
block (events ingested before this change).  Activates automatically
on prod once a watcher re-forward populates new sidecars.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 20:23:05 +00:00
serversdown ad2b553c7b ingest: preserve raw BW ASCII report (.TXT) alongside the binary
Previously the .TXT was parsed into the sidecar's bw_report projection
and then discarded at ingest time.  Now save_imported_bw() writes it
to <store>/<serial>/<filename>_ASCII.TXT permanently.

Rationale: with BW Mail / Forwarding Agent being phased out of the
operator workflow, the XML/PDF/WMF those tools produce won't be
available — the binary + .TXT (created by BW ACH itself) are our
only authoritative inputs going forward.  Keeping the raw .TXT
unlocks:

  - Parser bug fixes can be applied RETROACTIVELY by re-parsing the
    stored .TXT, instead of requiring a re-forward from the watcher
    PC (which lost the .TXT after BW ACH cleanup).
  - Audit trail of what BW actually sent us, for debugging.
  - The five known parser-PPV-miss events will be re-parseable once
    the regex fix lands (instead of staying broken indefinitely).

Storage cost: ~15 KB per event × 14k events = ~210 MB on the
existing prod corpus.  Negligible.

Implementation:
  - WaveformStore gains txt_path_for() + open_txt()
  - save_imported_bw() writes the .TXT when bw_report_text is supplied
  - sidecar source block records the txt_filename
  - backfill_sidecars.py preserves txt_filename across regens
  - New GET /db/events/{id}/ascii_report.txt endpoint serves it
  - Returns 404 for events ingested before this change (no .TXT in
    the store yet) — re-forward to populate
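
The on-disk layout named above, `<store>/<serial>/<filename>_ASCII.TXT`, could be built like this. A sketch only: `ascii_txt_path` is a hypothetical stand-in, not the signature of `txt_path_for()`, and whether `<filename>` keeps the binary's extension is an assumption.

```python
from pathlib import PurePosixPath


def ascii_txt_path(store_root, serial, filename):
    """Return <store>/<serial>/<filename>_ASCII.TXT (hypothetical helper)."""
    # Assumption: the binary's file name is used verbatim, extension included.
    return PurePosixPath(store_root) / serial / f"{filename}_ASCII.TXT"
```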

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 20:01:12 +00:00
serversdown dfbc8b8520 report_pdf: split waveform vs histogram layouts (BW PDF iteration)
Reviewed against real Blastware Event Report PDFs (uploaded to
example-events/pdfsnstuff/) for K558LLB7.V20H (histogram) and
K558LLB8.0E0W (waveform).  Each event type has its own layout because
BW's printouts genuinely differ:

  Waveform header:   Date/Time, Trigger Source, Range, Sample Rate
  Histogram header:  Start, Finish, Intervals At Size, Range, Sample Rate
                     (no trigger field — histograms aren't triggered)

  Waveform stats:    PPV, ZC Freq, Time (Rel. to Trig),
                     Peak Acceleration, Peak Displacement, Sensor Check
  Histogram stats:   PPV, ZC Freq, Date, Time (of peak), Sensor Check

  Waveform plot:     4-channel stacked line, x-axis in SECONDS,
                     trigger triangle + window markers, symmetric Y
                     for geo, zero-anchored mic, "0.0" baseline label
                     on right edge per BW convention
  Histogram plot:    4-channel stacked bars, Y-axis 0-to-peak only
                     (never negative — peaks are magnitudes), 0.0
                     baseline at the bottom

  Waveform footer:   USBM chart placeholder upper-right;
                     "Time X sec/div   Amplitude Geo: Y in/s/div   Mic: 0.001 psi(L)/div"
                     "Trigger = ▶━━◀"
  Histogram footer:  No USBM chart; same scale-info footer with
                     interval-size as the time unit

Other fixes from the first-pass screenshot review:
  - Channel labels (MicL/Long/Vert/Tran) no longer cut off (wider
    left margin)
  - Histogram bars rise from zero baseline (abs of any signed values)
  - ISO timestamp "2026-05-16T22:33:50" → "22:33:50 May 16, 2026"
    matching BW's display format
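
The timestamp reformatting in the last item can be sketched as below. The helper name is hypothetical, and the abbreviated month and unpadded day are assumptions (the one example given, "May 16, 2026", does not distinguish them).

```python
from datetime import datetime

# Explicit table so the result does not depend on the process locale.
_MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def bw_display_timestamp(iso_text):
    """'2026-05-16T22:33:50' -> '22:33:50 May 16, 2026' (hypothetical helper)."""
    dt = datetime.fromisoformat(iso_text)
    return f"{dt:%H:%M:%S} {_MONTHS[dt.month - 1]} {dt.day}, {dt.year}"
```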

Known gaps (separate work):
  - Histogram codec returns per-block granularity (~200 bars for
    BW's 4-interval display).  XML-driven data source is the planned
    fix; the structured BW XML has the per-interval aggregates.
  - USBM RI8507 / OSMRE compliance chart still placeholder

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 18:22:03 +00:00
serversdown 411ef8139e sfm: Event Report PDF generation (v0.20.0 stub layout)
New endpoint GET /db/events/{id}/report.pdf returns a single-page
letter-portrait PDF for any event with waveform data on disk.

Architecture:
  sfm/report_pdf.py — gather_report_data() assembles fields from
    SeismoDb row + .sfm.json sidecar (bw_report block) + .h5 samples;
    render_event_report_pdf() turns that into PDF bytes via matplotlib.
  sfm/server.py — new endpoint wires them together, streams PDF back
    with Content-Disposition: inline so the browser displays it.
  sfm_webapp.html — new "Download PDF" button in the event modal
    footer that opens the endpoint in a new tab.

Fields surfaced — same coverage as a Blastware Event Report:
  Header metadata (date/time, trigger source, range, sample rate,
                   project, client, operator, location, serial+firmware,
                   battery, calibration, file name)
  Microphone block (PSPL in dB(L) + psi, ZC freq, channel test)
  Per-channel stats (PPV, ZC Freq, Time of Peak, Peak Accel,
                     Peak Disp, Sensor Check) for Tran/Vert/Long
  Peak Vector Sum
  Waveform plot (MicL/Long/Vert/Tran stacked, shared time axis,
                 trigger marker, symmetric Y for geo, zero-anchored
                 mic) — OR per-interval bar chart for histograms.

Rendering pipeline = matplotlib only (vector PDF, no headless-browser
dep).  Adds matplotlib>=3.8 to deps.

Visual layout is approximate until reference PDFs from Instantel land
at docs/reference/instantel/ for iteration.  USBM RI8507 / OSMRE
compliance chart is stubbed (placeholder rectangle) — separate work
item.

Smoke-tested on a K558 waveform event: 77 KB valid PDF, all fields
populated correctly from the snapshot DB.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 02:55:58 +00:00
serversdown ed926de3f4 viewers: default mic to dB(L) + add Mic-unit toggle (dBL ↔ psi)
The sidecar-modal waveform plot was rendering mic in raw psi, while the
rest of SFM (history table column, peaks block, live-device chart,
event detail modal mic field) had already converted to dB(L) — matching
the BW Event Report convention.  Unifying.

Both viewers now:
  - Default mic chart values + axis title + peak label to dB(L)
  - Provide a header toggle ("Mic: dBL" pill) to flip to psi
  - Persist the preference via localStorage (sfm_mic_unit)
  - Re-render the open chart immediately on toggle

Conversion: dBL = 20 * log10(psi / 2.9e-9), where 2.9e-9 psi is the
20 µPa reference pressure already defined for the rest of the webapp.
Non-positive psi samples (log undefined) render as null; Chart.js
handles them as gaps in line mode and missing bars in histogram mode.
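
A minimal sketch of the conversion above, written in Python rather than the viewers' JavaScript. The function names are illustrative; only the formula and the 2.9e-9 psi reference come from this commit.

```python
import math

PSI_REF = 2.9e-9  # 20 µPa reference pressure in psi, as stated above


def psi_to_dbl(psi):
    """Convert psi to dB(L); None for non-positive input (log undefined)."""
    if psi is None or psi <= 0:
        return None
    return 20.0 * math.log10(psi / PSI_REF)


def dbl_to_psi(dbl):
    """Inverse of psi_to_dbl, for flipping the toggle back to psi."""
    return PSI_REF * 10.0 ** (dbl / 20.0)
```

Returning `None` for non-positive samples mirrors the null-as-gap behaviour described for Chart.js.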

Also fixes event_browser.html's stats table — the MicL row was
hard-coding "<value> psi"; now honors the same toggle.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 02:30:56 +00:00
serversdown 5d5441604b viewers: symmetric Y-axis on geo waveforms + clarify timestamp labels
Two fixes from the second screenshot review:

1. Geophone waveform Y-axis now renders SYMMETRIC around zero — zero
   line sits in the middle of the chart, signal goes both above and
   below.  Standard seismograph display convention; matches the
   Instantel printout look.  Previously Chart.js auto-scaled to the
   data range so e.g. Vert showing values from -0.005 to -0.015 had
   the zero line completely off-screen.

   Mic channel (sound pressure, always positive) keeps the default
   auto-scale anchored at zero.  Histograms (per-interval peaks, also
   always positive) likewise keep bars rising from a zero baseline.

2. Modal labels clarified to remove the 'Timestamp' vs 'Captured at'
   ambiguity:
     'Timestamp'   →  'Recorded at'         (when the seismograph
                                              recorded the event —
                                              from BW report's Event
                                              Time field)
     'Captured at' →  'Received by server at' (when our sfm-db
                                              inserted the row)
   Both have tooltips explaining the distinction.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 20:26:23 +00:00
serversdown 784f2cca36 viewers: decimal peak labels + bar chart for histograms + clean x-axis ticks
Three polish fixes spotted in the first prod screenshot of the inline
event-modal waveform plot:

1. Peak labels were rendering as "PEAK 2.500E-2 IN/S" because of a
   blanket toExponential(3) call.  New _fmtPeak() formatter picks
   decimal with adaptive precision for normal-range values (0.0001 to
   10000) and falls back to scientific only for truly extreme
   magnitudes.  Same value now reads "peak 0.0250 in/s".

2. Histogram events were being plotted as connected line charts, but
   histograms are per-INTERVAL peaks (one bar per minute, typically),
   not per-sample waveforms.  Now: detect histogram via record_type,
   render as a tight bar graph (bars touch), suppress the trigger line
   + zero baseline overlays (no trigger event on a histogram), and
   label the x-axis with interval number instead of milliseconds.

3. X-axis tick labels were displaying as "11.7187040000000002 ms"
   because the callback used the raw float, not the formatted label.
   Snap to 1 decimal place (or integer for whole-number values like
   histogram intervals).

Applied to both the inline modal plot in sfm_webapp.html and the
standalone /events viewer in event_browser.html — they share the same
data shape and presentation conventions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 19:54:04 +00:00
serversdown 6abfadae4f viewers: render pre-trigger samples (time_axis is metadata, not an array)
The /db/events/{id}/waveform.json endpoint returns `time_axis` as a
metadata object — {sample_rate, pretrig_samples, t0_ms, dt_ms,
n_samples, total_samples, rectime_seconds} — not a per-sample times
array.  Both viewers (sfm_webapp.html sidecar modal + event_browser.html)
were treating it as an array, silently falling back to a derived path
that ignored pretrig entirely and started the time axis at 0.

Symptom: trigger line drawn at the very left edge of every chart, no
visible "leading up to the event" samples even though they're in the
decoded data.

Fix: read time_axis.t0_ms (negative when pretrig samples exist),
time_axis.dt_ms, build per-sample times as `t0_ms + i * dt_ms`.  Trigger
line lands at sample where t crosses 0; pretrig samples render at
negative t to the left of it.

Confirmed on a K558 event with 208 pretrig samples + 2 sec rectime at
1024 sps — time axis now spans -203 ms to +2046 ms, trigger line at
~9% from the left edge as expected.
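
The per-sample time reconstruction can be sketched in Python as below (the viewers themselves are JavaScript). Two assumptions: the sketch reads the sample count from `n_samples`, and the total of 2304 samples is inferred from the stated -203 ms to +2046 ms span rather than quoted from the commit.

```python
def build_times_ms(time_axis):
    """Expand the time_axis metadata object into per-sample times in ms."""
    t0 = time_axis["t0_ms"]
    dt = time_axis["dt_ms"]
    n = time_axis["n_samples"]  # assumption: this key holds the plotted count
    return [t0 + i * dt for i in range(n)]


# K558-style numbers: 1024 sps, 208 pre-trigger samples.
dt_ms = 1000.0 / 1024
axis = {"t0_ms": -208 * dt_ms, "dt_ms": dt_ms, "n_samples": 2304}
times = build_times_ms(axis)
trigger_fraction = 208 / len(times)  # position of t = 0 along the x-axis
```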

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 21:58:20 +00:00
serversdown fd0e28657d sfm_webapp: default to Database view + sortable columns + inline waveform plot
Three UX upgrades to the main SFM webapp at /, all reinforcing the
'browse stored events' flow as the primary entry point:

1. Default section is now Database, not Live Device.  Most users land
   here to look at stored events; Live Device is opt-in (click the tab
   to talk to a unit).  Initial history + units fetch fires on first
   paint so the table is populated when the page loads.

2. History table columns are sortable.  Click any header to sort:
   timestamp, serial, per-channel PPV (Tran/Vert/Long), PVS, mic dB(L),
   project, client, type, key.  Default direction varies by column type
   (desc for numbers + timestamps, asc for text).  Sort arrows appear
   in the active column header.  Headers are sticky so they stay
   visible while scrolling.

3. Click-event-to-see-waveform.  The existing sidecar review modal now
   renders the 4-channel waveform plot inline at the top, fetched from
   /db/events/{id}/waveform.json in parallel with the sidecar fetch.
   Channels stacked MicL / Long / Vert / Tran (Instantel printout
   order), shared bottom time axis, dashed trigger line + triangle
   markers at t=0, zero baseline with "0.0" label on the right edge,
   peak callouts per channel.  Charts cleaned up on modal close.

Resolves the "where is the viewer" surprise — operators no longer need
to know about the /events route to see waveforms.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 19:39:18 +00:00
serversdown c14a8c54db event_browser: Instantel-printout-style polish
Apply the cheap visual wins from the BW Event Report layout:

  1. Channel order reversed → MicL (top), Long, Vert, Tran (bottom)
     to match the Instantel printout.
  2. Shared bottom time axis — x-axis ticks only render on the
     bottom-most data channel; other channels hide ticks so all four
     visually share one time scale.
  3. Triangle trigger markers above and below the t=0 dashed line.
  4. Horizontal zero-baseline (dotted) per channel with "0.0" label
     on the right edge — Instantel convention.
  5. "Print view" toggle that flips dark→light theme (white panels,
     light grids, dark text) so the viewer can render usefully on
     paper-style output / @media print.
  6. Per-channel PPV stats table in the metadata header, with Peak
     Vector Sum displayed prominently.
  7. Colors adjusted to approximate BW trace colors (magenta MicL,
     blue Long, green Vert, red Tran).

Future PDF-export work will reproduce the same layout server-side
once you upload a real example PDF and we pick a rendering pipeline
(weasyprint / chromium --print-to-pdf / etc.).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 07:09:12 +00:00
serversdown 460006e5cd sfm: stored-event browser at /events
New standalone HTML page (sfm/event_browser.html, ~470 lines, Chart.js)
that lets you browse persisted events from the SeismoDb + WaveformStore.
Companion to the existing live-device viewer at /waveform:

  /waveform  — connect to a unit and pull events in real time
  /events    — browse events already stored in the DB

Flow:
  1. Page loads → GET /db/units → populate serial dropdown
  2. Select serial → GET /db/events?serial=X&limit=500 → event list
  3. Click event → GET /db/events/{id}/waveform.json → render

Layout is Instantel-printout-ready: channels stacked vertically in
Tran / Vert / Long / MicL order, trigger line at t=0, peak labels,
clean dark theme.  Frames the future PDF-export feature without
needing extra layout work.

Smoke-tested against the dev prod-snapshot — 4 channels render with
correct peaks for K558 events (L=0.3 in/s = the offset-fault peak
we've been chasing all week).

CHANGELOG entry added under [Unreleased] per the v0.20.0 release plan.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 06:53:48 +00:00
serversdown 8710b8f327 docs: record three known issues discovered during prod deployment
1. bw_ascii_report parser misses PPV/vector_sum fields on certain TXT
   formats (5 events in prod).  Parser extracts every OTHER field for
   the same channels — likely a regex / format mismatch specific to
   some firmware-or-event-type combination.

2. NULL-timestamp duplicate rows.  events.timestamp can come back as
   NULL when the codec can't extract a footer timestamp; UNIQUE(serial,
   timestamp) doesn't fire on NULL, so backfills create new rows
   instead of upserting.  2 affected events on prod, easy SQL cleanup.

3. Histogram body sub-format with byte[5] != 0.  ~3 events on prod
   (T190LD5Q, O121L4L1) use a histogram body the walker doesn't
   recognize.  Codec returns 0 valid blocks; DB peaks come from the
   bw_report ASCII overlay so DB columns are correct, only the .h5
   plot is empty.  Cracking the sub-format unlocks the plot.
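
Issue 2 can be reproduced in isolation. The sketch below uses an in-memory SQLite table as a stand-in; the real schema and database engine are not shown here, so the table definition is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events ("
    " id INTEGER PRIMARY KEY, serial TEXT, timestamp TEXT,"
    " UNIQUE(serial, timestamp))"
)

insert = "INSERT OR IGNORE INTO events (serial, timestamp) VALUES (?, ?)"
for _ in range(2):  # simulate two backfill passes over the same events
    conn.execute(insert, ("BE9558", "2026-05-16T22:33:50"))
    conn.execute(insert, ("BE9558", None))  # codec found no footer timestamp

dated = conn.execute(
    "SELECT COUNT(*) FROM events WHERE timestamp IS NOT NULL").fetchone()[0]
undated = conn.execute(
    "SELECT COUNT(*) FROM events WHERE timestamp IS NULL").fetchone()[0]
```

The dated event stays at one row, while the NULL-timestamp event gains a row on every pass because NULLs never compare equal inside a UNIQUE constraint.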

All three are pre-existing issues that today's deployment surfaced
during validation; none are regressions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 21:02:13 +00:00
serversdown db657bcac9 Merge pull request 'fix: bw_report overlay onto event before DB, prevents data loss docs: three-tier architecture model + strategic roadmap' (#27) from feat/wire-histogram-codec into dev
Reviewed-on: #27
2026-05-22 15:46:46 -04:00
serversdown 35842ac50a backfill: overlay bw_report onto Event before DB upsert
Mirror what the ingest path does: BW's reported peaks (and sample_rate
/ record_time) take precedence over codec output where present.

Without this, --force backfill silently overwrites bw_report-overlaid
DB columns with codec-derived peaks.  Wrong for events where the codec
doesn't fully decode (waveform walker edge cases on SP0/SS0/SV0-style
events, histogram byte[5]!=0 sub-format that isn't yet RE'd), producing
PVS=0 on real high-amplitude events.  This bit prod on 2026-05-22, when
three top-10 waveform events ended up at PVS=0 (rolled back the same
day; this fix is the proper resolution).

New helper minimateplus.event_file_io.apply_bw_report_dict_to_event
operates on the projected sidecar dict shape (the structure
_bw_report_to_dict produces, which is what gets preserved in the
sidecar).  Mirrors apply_report_to_event's semantics: only writes
fields where bw_report has a non-None value, no-ops cleanly on
empty / None input.
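
The "only write non-None fields" rule can be sketched on plain dicts. This is not `apply_bw_report_dict_to_event` itself: the helper name and the field names in the usage are hypothetical.

```python
def overlay_non_none(event_fields, report_fields):
    """Return event_fields with every non-None report value laid on top."""
    merged = dict(event_fields)
    if not report_fields:
        # Empty or None report: clean no-op.
        return merged
    for key, value in report_fields.items():
        if value is not None:
            merged[key] = value
    return merged
```

A codec-derived `pvs` of 0.0 is replaced by the reported value, while a field the report leaves as `None` keeps its codec value.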

Dev validation against prod snapshot:
  pre  : 1839.7315 pvs_sum   356 events with DB PVS ≠ sidecar bw_report
  post : 2016.4902 pvs_sum     2 events still mismatched (both have NULL
                                timestamp + duplicate rows, edge case)

Both edge-case events DO get the correct value written by the new
backfill — their stale rows from prior backfills remain because
UNIQUE(serial, timestamp) doesn't fire on NULL.  Separate dedup
cleanup needed for those 2 events (0.014% of corpus); not blocking.

Backfill remains idempotent + bw_report preservation still passes
(0 WIPED, 0 CHANGED on the 3rd consecutive run).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 18:56:22 +00:00
serversdown 49a524d0d4 docs: three-tier architecture model + strategic roadmap
CLAUDE.md gains an Architecture section near the top describing the
canonical three-tier mental model:

  - SFM: device-side, live connections, /device/* endpoints
  - SDM: data-side, DB + waveform store + /db/* endpoints (currently
    living under sfm/ for historical reasons; rename deferred)
  - Codec library: pure data-interpretation, used by both tiers

Future code should be placed and named according to this model even
though the directory layout doesn't fully reflect it yet.  Decision
rule for where new code goes is documented inline.

README.md's Roadmap section gains two strategic-direction subsections:

  - "Strategic direction" — frames the suite-of-components vision and
    notes that BW ACH + Thor IDF call-home remain the data movers;
    seismo-relay's value is on the receiving and processing side.
  - "Terra-View ↔ SFM device control" — the long-term vision where
    Terra-View can launch into SFM device-control surfaces (operator
    notices missing unit → clicks "Connect to Device" → live view in
    browser).  Includes concrete implementation checklist (auth,
    embedded live-monitor view, action history, series IV live
    support).

The existing tactical roadmap items remain unchanged below.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 18:38:00 +00:00
serversdown 9ef424d098 Merge pull request 'Histogram body codec — full RE + peak-count fix that resolves the prod inflation incident' (#26) from feat/wire-histogram-codec into dev
Reviewed-on: #26
2026-05-22 13:08:03 -04:00
claude cc821f9ee3 hotfix: fix dockerfile on main to fix import bug on prod
2026-05-21 20:42:15 +00:00
serversdown ed6982c512 scripts: bw_report preservation check for backfill safety
Two-step tool to verify that backfill_sidecars doesn't wipe the
bw_report block from existing sidecars.  Workflow:

  1. snapshot --out before.json    (canonical-JSON hash per sidecar)
  2. run backfill
  3. diff --baseline before.json   (classifies every sidecar:
       PRESERVED / CHANGED / WIPED / STILL_MISSING / NEW / ADDED / REMOVED)

Exit code 1 if any WIPED or CHANGED entries found, 0 otherwise — so
it can gate a CI step or a deploy script.
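
A rough sketch of the snapshot/diff idea, assuming SHA-256 over key-sorted JSON as the "canonical-JSON hash". The function names are hypothetical, and only four of the listed statuses are modelled with certainty; labelling a newly gained block `NEW` is a guess at the tool's vocabulary.

```python
import hashlib
import json


def canonical_hash(block):
    """Hash a bw_report block independent of key order; None if absent."""
    if block is None:
        return None
    text = json.dumps(block, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def classify(before, after):
    """Compare one sidecar's hash before and after the backfill."""
    if before is None and after is None:
        return "STILL_MISSING"
    if before is None:
        return "NEW"  # guess: the real tool may call this ADDED
    if after is None:
        return "WIPED"
    return "PRESERVED" if before == after else "CHANGED"


def exit_code(statuses):
    """1 if anything was WIPED or CHANGED, else 0 (usable as a CI gate)."""
    return 1 if any(s in ("WIPED", "CHANGED") for s in statuses) else 0
```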

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 06:13:52 +00:00
serversdown d506ebc103 histogram_codec: peak count is uint8 (not uint16 LE) — properly cracks
the BE9558 / BE18003 extension-byte case

The bytes at [7]/[11]/[15]/[19] are an annotation field (purpose still
unclear — empirically non-zero on intervals with sub-Hz or unmeasurable
freq), NOT the high byte of the peak count.  The N844 fixture corpus
the original RE was done against had zero values in those bytes for
every block, so uint8 and uint16 LE were equivalent there — but on
real BE9558 Tran-drift events and BE18003 Histogram+Continuous events
the uint16 LE interpretation produced peaks up to 268 in/s and 35×
inflated PVS sums.

Cross-correlated against BW's per-interval ASCII export on:
  - K558LKZU/LL1P/LL3K  → 100% T/V/L/M peak match (1435 blocks each)
  - T003LKZR/LL0O/LL1M  → 100% T/V/L, 99.3% M (0.05 dB rounding only)
  - N599LKZS/LL0L        → 100% all channels
  - N844 fixture corpus  → 100% all channels (unchanged)

Annotations preserved on every record for future RE; the defensive
_MAX_PEAK_COUNT bound is no longer needed (uint8 maxes at 1.275 in/s,
well below any physical limit).
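
The difference between the two interpretations is easy to see on a two-byte slice. The byte values below are invented for illustration (they are not the K558LKZU block), and the 0.005 in/s per count scale is inferred from the 1.275 in/s uint8 maximum quoted above.

```python
import struct

IN_PER_S_PER_COUNT = 0.005  # inferred: 255 counts -> 1.275 in/s


def peak_as_uint8(block, offset):
    """Correct reading: one byte of peak count."""
    return block[offset]


def peak_as_uint16_le(block, offset):
    """Old reading: the annotation byte becomes the high byte of the count."""
    return struct.unpack_from("<H", block, offset)[0]


# Invented bytes: peak count 0x0C followed by a non-zero annotation 0xD1.
block = bytes([0x0C, 0xD1])
good = peak_as_uint8(block, 0) * IN_PER_S_PER_COUNT
bad = peak_as_uint16_le(block, 0) * IN_PER_S_PER_COUNT
```

Here `good` is about 0.06 in/s, while `bad` lands near 268 in/s, the same order of inflation the commit describes.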

Synthetic regression test added using the verbatim K558LKZU.RE0H
interval-12 block.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 06:05:19 +00:00
serversdown e949232875 histogram_codec + backfill: tighter peak ceiling, preserve bw_report
histogram_codec: drop _MAX_PEAK_COUNT 4096 → 2200. The old ceiling
let extension-byte blocks slip through at up to 20.48 in/s per
channel, producing 35× inflated PVS sums when first deployed to
prod. 2200 covers Normal-range full-scale (10 in/s = 2000 counts)
plus 10% headroom for quantization edge cases.

backfill_sidecars: also preserve the bw_report block alongside
review + extensions when regenerating sidecars. event_to_sidecar_dict
takes a BwAsciiReport dataclass not a dict, so for bw_report we
overlay the existing block after regen rather than passing as a kwarg.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 02:50:10 +00:00
serversdown 76bce0b5a3 Merge pull request 'v0.20.0 - prerelease features.' (#25) from feat/wire-histogram-codec into dev
- dockerfile fix
- histogram body codec FULLY decoded
- backfill scripts fixed.
- docs added for histogram codec
2026-05-20 21:05:37 -04:00
49 changed files with 6609 additions and 209 deletions
+1 -1
@@ -1,6 +1,6 @@
/bridges/captures/
/example-events/
/tests/fixtures/
/manuals/
# Python build artifacts
+184
@@ -4,6 +4,190 @@ All notable changes to seismo-relay are documented here.
---
## [Unreleased]
---
## v0.21.1 — 2026-06-01
Bug fixes against v0.21.0 surfaced after the first prod redeploy. Three
production-visible symptoms — blank waveform charts on most Thor events,
blank histogram charts on all Thor events, and a mic chart that
auto-scaled against a dB(L) value treated as psi — all root-caused and
fixed.
### Fixed
- **Dynamic IDFW body offset.** The v0.21.0 codec hardcoded the body
at file offset `0x0f1f` based on the example corpus, but only ~52%
of production IDFW events use that offset; the rest sit at offsets
from `0x1033` up to `0x3082` depending on header padding. At
`0x0f1f` the codec would find a coincidentally-matching `00 02 00`
magic, read the 2-byte Tran preamble, and return empty V/L/M
arrays — producing near-empty .h5 files and blank charts.
`micromate.idf_file._find_waveform_body_offset()` now scans every
`00 02 00` magic position past `0x0E00`, trial-decodes each one,
and picks the offset with the most samples. Validated across 483
prod IDFW files: 0 preamble-only events (was ~50%), 355/483 fully
decode, 126/483 partial (BW codec walker-stops-early on loud
events — pre-existing limitation, samples reached are correct).
- **IDFH histograms now render bar charts.** Histograms previously
skipped the .h5 write because there are no per-sample arrays, but
the renderer drives the per-interval bar chart from .h5 channel
data + `bw_report.histogram.n_intervals`. `save_imported_idf` now
synthesizes a 1-sample-per-interval array from the decoded
`IdfhInterval` peak counts and writes an .h5 so the existing
renderer works unchanged — each "sample" is the per-interval peak
ADC count, so the writer's `count × geo_fs/32768` conversion
yields the right bar height.
- **Mic chart scaling on Thor events.** `PeakValues.micl` (consumed
by the h5 writer's per-count mic scale factor) expects psi, but
the Thor bridge was stuffing the dB(L) value (~99.4) into it,
producing a per-count factor 5+ orders of magnitude too large and
a flat-looking mic chart. Fixed by adding `IdfPeaks.mic_pspl_psi`
alongside `mic_pspl_dbl`; `read_idf_file()` computes it from
binary mic counts (`max(|MicL|) × 2.14e-6 psi/count`) for both
IDFW and IDFH paths; `save_imported_idf` merges it onto the typed
event after `IdfEvent.from_report`; the bridge feeds psi to
`PeakValues.micl` with a dB(L)→psi formula fallback when only the
dB(L) value is available. dB(L) for the report header still
flows through `bw_report.mic.pspl_dbl` unchanged.
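
The body-offset scan from the first item in this list can be sketched as below. It is not the code of `_find_waveform_body_offset()`: the function name is hypothetical, `trial_decode` is an injected stand-in for the real walker that returns a sample count, and treating `0x0E00` as an inclusive lower bound is an assumption.

```python
MAGIC = b"\x00\x02\x00"
SCAN_START = 0x0E00  # assumption: the lower bound is inclusive


def find_body_offset(data, trial_decode):
    """Return the magic position whose trial decode yields the most samples.

    trial_decode(data, offset) -> int is a hypothetical callback.
    """
    best_offset, best_count = None, 0
    pos = data.find(MAGIC, SCAN_START)
    while pos != -1:
        count = trial_decode(data, pos)
        if count > best_count:
            best_offset, best_count = pos, count
        pos = data.find(MAGIC, pos + 1)
    return best_offset
```

Choosing by sample count is what rejects a coincidental magic at `0x0f1f`: a preamble-only decode there loses to the real body further into the file.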
### Operator
After deploy, run `python scripts/backfill_thor_events.py` to refresh
every existing Thor event's sidecar + .h5 with the corrected codec
output. The script auto-skips events already at the current
`TOOL_VERSION`, so the bump from `0.21.0` → `0.21.1` is what triggers
the refresh.
---
## v0.21.0 — 2026-05-29
The "Thor / Series IV codec" release. Two big pieces landed: (1) the IDF binary codec actually decodes now, both IDFW and IDFH, and (2) a Thor→BW adapter lets Thor events flow through the existing Series III Event Report PDF pipeline. Combined effect: a Thor event ingested via `/db/import/idf_file` now lands in the DB with the same fidelity as a Blastware event, gets a per-event PDF on demand, and renders in Terra-View's modal chart with the same plotting code as a BW event.
### Added — Thor IDF binary codec (`micromate/idf_file.read_idf_file`)
- **IDFW (waveform)** — body sits at fixed file offset `0x0f1f`; reuses the verified `decode_waveform_v2()` walker from `minimateplus.waveform_codec`. Sample fidelity is **87–99% byte-exact** against the ASCII-sidecar reference values on quiet events; loud events hit the same walker-stops-early limitation as the BW codec on `SP0/SS0/SV0`-style events.
- **IDFH (histogram)** — dedicated segment-based decoder for the Thor histogram body format: `[len_be][0a 00 00 00][00 NN][05 3f]` framing plus N × 72-byte interval records (4 × 16-byte per-channel min/max/halfp). **All 859 Thor IDFH corpus files decode**, totalling **181,071 intervals**; per-channel peaks match the sidecar within **~1.8% (ADC quantization)**.
- **BW-aliased binary detection** — a small number of corpus files (e.g. `BE9439_*.IDFW/IDFH`) are actually Series III Blastware binaries that share the IDF filename convention by accident. `read_idf_file()` detects them via their BW `STRT` signature and raises `NotImplementedError` pointing the caller at `read_blastware_file()` instead of trying to decode them as IDF.
- Full field layouts in `docs/idf_protocol_reference.md`; supporting analysis scripts in `analysis_idf/` (decode validators, per-file detail dumps, corpus accuracy reports).
### Added — Thor → BW report adapter (`micromate/idf_to_bw_report.py`)
- **`build_bw_report_from_idf(report_dict, binary_md=, intervals=, is_histogram=)`** projects a parsed Thor `IdfReport` plus binary-extracted metadata plus decoded IDFH intervals into the `bw_report`-shaped dict that `sfm.report_pdf.gather_report_data` consumes. No need to duplicate the renderer — Thor data is ~95% the same metric set as BW; the adapter handles the field-name mapping (`MicPSPL` → `pspl_dbl`, `>100` sentinel → `zc_freq_above_range`, free-form `Calibration : Nov 22, 2023 by Instantel` → `calibration_date` + `calibration_by`, etc.).
- For IDFH events the adapter derives `histogram.interval_times` by stepping `IntervalSize` from `HistogramStartTime`, matching what the BW pipeline expects from a histogram-mode event.
- **Wired into `WaveformStore.save_imported_idf`** — every Thor event ingested via `/db/import/idf_file` now gets a `bw_report` block in its sidecar in addition to the existing `extensions.idf_report` (the raw parsed Thor payload). Falls back gracefully (PDF renders from DB-only fields) if the adapter raises — logged as a warning rather than failing the ingest.
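
The interval-time derivation for IDFH events (stepping `IntervalSize` from `HistogramStartTime`) might look roughly like this. The helper name is hypothetical, and labelling each interval by its start time rather than its end is an assumption.

```python
from datetime import datetime, timedelta


def interval_times(start, interval_size_s, n_intervals):
    """Return HH:MM:SS labels, one per interval, stepping from the start."""
    step = timedelta(seconds=interval_size_s)
    return [(start + i * step).strftime("%H:%M:%S")
            for i in range(n_intervals)]
```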
### Companion releases
- **Terra-View v0.13.0** ships in parallel — closes Phase 1 of the SFM integration. The shared event-detail modal now renders the SFM event story (Chart.js waveform/histogram chart, inline PDF preview, `.TXT` download, FT/reviewer/notes review form) without operators needing to bounce to the standalone SFM webapp on port 8200. Uses only existing seismo-relay endpoints — no API changes here, just better consumption.
### Migration / Operations
No DB migration needed. Existing Thor events already in the store don't automatically pick up the new `bw_report` block — they'd need a re-ingest (post the IDF binary + paired `.TXT` back to `/db/import/idf_file`) for the adapter to run. Alternatively, run `scripts/backfill_sidecars.py --reparse-txt` after a small adapter change (the script currently only re-runs the BW ASCII parser; extending it to handle Thor would be a small follow-up).
```bash
cd /home/serversdown/terra-view
docker compose build sfm && docker compose up -d sfm
```
The bumped `TOOL_VERSION = "0.21.0"` in `minimateplus/event_file_io.py` means any subsequent `backfill_sidecars.py --force` pass will re-write sidecars with the new version stamp; that's expected and harmless.
---
## v0.20.0 — 2026-05-28
The "PDF + parser polish" release. Closes out the Event-Report PDF iteration started in v0.17.x: histogram layouts now render correctly against BW reference PDFs, the ASCII parser handles the real-world edge cases production events were tripping over (OORANGE, `>100 Hz`, histogram timestamps), and the `.TXT` preservation rollout lets parser fixes be applied retroactively to ingested events. Adds server-wide timezone support so operator-visible timestamps no longer drift into UTC. Rolls up the substantial "pre-v0.20" body of work that had accumulated under `[Unreleased]` (PDF generation, histogram codec fix, histogram parser fields, `.TXT` preservation, backfill safety) — see the trailing "pre-v0.20.0 work" section below for the full list.
### Added (2026-05-28)
- **Server-wide display timezone via `TZ` env var.** Both seismo-relay and terra-view now respect a `TZ` environment variable (default `America/New_York` on prod). Affects server log timestamps, the PDF report renderer's UTC→local conversions on the "Created" footer line, matplotlib's datetime axes, and any other naïve-vs-aware datetime rendering. DB columns (`created_at`, etc.) stay UTC regardless — this is a display-side fix, not a storage-side one. Dockerfile now installs `tzdata` (required for the env var to take effect under `python:slim`). Override per-deployment via the `TZ` line in `docker-compose.yml`.
- **ZC Freq "above-range" handling — render `>100 Hz` instead of `—`.** BW writes `">100 Hz"` literally when the zero-crossing algorithm sees a peak too fast to count (device cuts off at 100 Hz on V10.72). Previously `_parse_number(">100")` returned None and the PDF stats table rendered `—`. Now the parser mirrors the OORANGE pattern: stores 100.0 on `zc_freq_hz` and sets a new `zc_freq_above_range` flag. Flag rides through the sidecar's `bw_report` block. Renders as `>100` in the PDF (per-channel + mic block), as `· >100 Hz` inline on the event modal's Peaks section, and as a dedicated column on the event-browser stats table. Verified against the real T190LD5Q.LK0W fixture from 2026-05-27 plus a synthetic test case.
- **Per-channel ZC Freq surfaced in event modals.** Neither the main webapp modal (`sfm_webapp.html`) nor the standalone event browser (`event_browser.html`) previously exposed ZC Freq. Now both do — webapp shows it inline alongside PPV (`0.04500 in/s · 47 Hz`); event-browser gets a dedicated column on its per-channel stats table. Required wiring a parallel sidecar fetch into the event-browser's `loadEvent()` (it was only fetching `waveform.json`). Falls back to `—` for events without a preserved `.TXT` (pre-2026-05-27 ingests).
- **`scripts/backfill_sidecars.py --reparse-txt` flag.** Before this, the backfill script preserved the `bw_report` block from existing sidecars verbatim — so parser-side fixes (like the `>100 Hz` addition above) couldn't reach old events. The new flag re-runs the current parser against the preserved `<serial>/<filename>_ASCII.TXT`, overwrites the bw_report block, and cascade-regenerates the sidecar. Implies sidecar regeneration on every event (bypasses the sha/version skip). No-op for events without a preserved .TXT (legacy ingests pre-2026-05-27 .TXT-preservation rollout). Idempotent. Run with `--skip-hdf5` to skip waveform regen — recommended when only the bw_report needs refreshing. Validated end-to-end on prod: 9,999 events refreshed cleanly, ZC Freq + OORANGE flags now populated where the original .TXT had them.
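
The `>100 Hz` handling described above can be sketched roughly as follows. This is an illustration only, not the actual `bw_ascii_report` code: `parse_zc_freq` and the `ZcFreq` container are hypothetical names, and only the behavior (store the numeric bound on `zc_freq_hz`, set `zc_freq_above_range`) is taken from the entry.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class ZcFreq:
    """Hypothetical container for one parsed ZC Freq field."""
    zc_freq_hz: Optional[float]
    zc_freq_above_range: bool = False


_NUMBER = re.compile(r"[-+]?\d+(?:\.\d+)?")


def parse_zc_freq(raw: str) -> ZcFreq:
    """Parse values such as '47 Hz' or '>100 Hz'.

    A leading '>' keeps the numeric bound and sets the above-range flag
    instead of discarding the field.
    """
    text = raw.strip()
    above_range = text.startswith(">")
    match = _NUMBER.search(text)
    if match is None:
        # Nothing numeric to keep; the caller renders a placeholder.
        return ZcFreq(None, False)
    return ZcFreq(float(match.group()), above_range)
```

A renderer can then print `>100` when the flag is set and the plain number otherwise.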
### Fixed (2026-05-28)
- **Histogram PDFs no longer 500 on the missing `histogram_interval_size_s` attribute.** The histogram-interval-times derivation block in `gather_report_data` referenced `rd.histogram_interval_size_s`, but the field was never declared on the `ReportData` dataclass nor read from the sidecar projection (it was inlined into `gather_report_data` without the seconds-numeric counterpart making it onto the dataclass). Every histogram PDF render raised `AttributeError → 500`. Waveform PDFs were unaffected. Fix: add the field, read it from the projection's existing `bw_report.histogram.interval_size_s` key.
- **Histogram PDF geo channels now share a single nice-quantized y-axis.** Previously each geo subplot auto-scaled independently — Tran, Vert, and Long all showed different per-channel maxes, so bar heights weren't directly comparable across channels. The footer "Amplitude Geo: X in/s/div" label was also computed as `max(first_geo_channel) / 5` with no LSB quantization, producing nonsense values like `0.003 in/s/div` when the geophone LSB is 0.005. Fix: compute a single shared geo y-axis range from `max(Tran, Vert, Long)`, quantize the per-division step to BW's 1-2-5 sequence rounded to the 0.005 in/s LSB (0.005, 0.01, 0.025, 0.05, 0.1, 0.25, ...), apply the same `ylim` + ticks to all three subplots, and use that step for the footer label. MicL stays on its own auto-scale (different units). Matches BW's chart styling.
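
A minimal sketch of the shared-axis quantization described above, assuming five y-axis divisions (the old `max / 5` footer formula suggests this, but the renderer's real division count is not stated here). `nice_geo_step` and `shared_geo_axis` are hypothetical names. Working in whole LSB units makes the 1-2-5 sequence come out as 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, and so on.

```python
import math

GEO_LSB_IPS = 0.005  # geophone LSB quoted in the entry above
N_DIVISIONS = 5      # assumption: number of y-axis divisions


def nice_geo_step(peak_ips, lsb=GEO_LSB_IPS, n_div=N_DIVISIONS):
    """Smallest 1-2-5 multiple of the LSB whose n_div steps cover peak_ips."""
    # Required step expressed in whole LSB units (small epsilon guards
    # against float noise pushing an exact value over the next integer).
    needed = max(1, math.ceil(peak_ips / (n_div * lsb) - 1e-9))
    decade = 1
    while True:
        for mantissa in (1, 2, 5):
            if mantissa * decade >= needed:
                return round(mantissa * decade * lsb, 6)
        decade *= 10


def shared_geo_axis(tran, vert, long_):
    """Return (step, y_max) shared by the three geo subplots."""
    peak = max(max(tran), max(vert), max(long_))
    step = nice_geo_step(peak)
    return step, round(step * N_DIVISIONS, 6)
```

The same `step` would feed both the `ylim`/ticks on all three subplots and the footer label, so the two cannot disagree.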
### Docs (2026-05-28)
- **Roadmap entry for a second undecoded histogram body sub-format.** BE17353 (S353) events observed on 2026-05-28 use a histogram body where `byte[5] = 0x00` (looks like a valid block header by every prior signal) but the walker finds zero data blocks. Different from the existing `byte[5] != 0` roadmap entry (T190 / O121). Operationally identical impact — ingestion succeeds, DB peaks come from the bw_report overlay, only the chart is empty. Sample events captured in the roadmap entry for future RE work.
### Migration / Operations
- **Re-parse existing events to pick up the new parser fields.** Run on whichever box hosts the live waveform store:
```bash
docker exec terra-view-sfm-1 python /app/scripts/backfill_sidecars.py \
--reparse-txt --skip-hdf5 --dry-run -v | tail
# Looks reasonable? Run for real:
docker exec terra-view-sfm-1 python /app/scripts/backfill_sidecars.py \
--reparse-txt --skip-hdf5 -v | tee /tmp/reparse.log | tail -30
```
Idempotent; safe to re-run. Only touches sidecars on disk — no DB writes.
- **terra-view docker-compose.yml**: add `TZ=America/New_York` (or your deployment's zone) to both the `terra-view` and `sfm` service `environment:` blocks. Without this, server-rendered timestamps stay in UTC even on the rebuilt SFM image.
### Pre-v0.20.0 work (rolled into this release)
The bullets below accumulated under `[Unreleased]` between v0.19.0 and v0.20.0; kept here so the historical narrative isn't lost.
#### Fixed
- **bw_ascii_report parser now handles `OORANGE` saturation marker.** BW writes `"OORANGE"` (truncation of "Out Of Range") in PPV / PVS / MicL PSPL fields when the underlying measurement exceeded the channel's full-scale. Previously our `_parse_number()` returned None → DB ended up with NULL peaks for legitimate high-amplitude events. Confirmed on real ASCII files pulled 2026-05-27 from the Windows watcher PC: T190LD5Q.LK0W (Vert saturated at Normal range 10 in/s), T438L713.RY0W (all three channels saturated at Sensitive range 1.25 in/s), K557L3YM.OE0W (Tran+Vert saturated + Mic PSPL OORANGE). New behavior:
- Per-channel PPV: substitute `geo_range_ips` as a conservative lower bound + set `ppv_saturated` flag
- Peak Vector Sum: substitute `sqrt(3) * geo_range_ips` (the theoretical max when all 3 channels are simultaneously at full-scale) + `peak_vector_sum_saturated` flag
- MicL PSPL: substitute 140 dB(L) (conservative NL-43 max) + `pspl_saturated` flag
- Saturation flags are propagated into the sidecar's `bw_report` block for downstream UI rendering (`> 10 in/s` or similar)
- Five events on prod (T190 / T438 / K557 + 2 others matching the same fault pattern) will pick up correct DB peaks + saturation flags once re-forwarded
- **bw_ascii_report parser handles `Peak Vector Sum TimeSum` typo'd label.** Real BW output uses this misspelled label (Sum appended twice instead of "Peak Vector Sum Time"). Now accepted as an alias. Confirmed against all three OORANGE example files — every one has the typo.
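
The substitution rules above can be sketched as below. This is a hedged illustration rather than the parser's real code: the function names and the `(value, saturated)` return shape are assumptions, while the substituted values (`geo_range_ips`, `sqrt(3) * geo_range_ips`, 140 dB(L)) come from the bullets.

```python
import math
from typing import Optional, Tuple

OORANGE = "OORANGE"
MIC_SATURATED_DBL = 140.0  # conservative mic maximum quoted above


def _value_or_saturated(raw: str, substitute: float) -> Tuple[Optional[float], bool]:
    """Return (value, saturated); fall back to `substitute` on OORANGE."""
    text = raw.strip()
    if text.upper().startswith(OORANGE):
        return substitute, True
    try:
        # Accept values with a trailing unit, e.g. '0.045 in/s'.
        return float(text.split()[0]), False
    except (ValueError, IndexError):
        return None, False


def parse_ppv(raw: str, geo_range_ips: float):
    # Per-channel PPV: the full-scale range is a conservative lower bound.
    return _value_or_saturated(raw, geo_range_ips)


def parse_peak_vector_sum(raw: str, geo_range_ips: float):
    # All three channels at full scale at once: sqrt(3) * range.
    return _value_or_saturated(raw, math.sqrt(3) * geo_range_ips)


def parse_mic_pspl(raw: str):
    return _value_or_saturated(raw, MIC_SATURATED_DBL)
```

Returning the flag alongside the value lets the caller store a usable peak in the DB while still marking it as a lower bound for the UI.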
#### Added
- **Histogram per-interval aggregation in `waveform.json`.** Histogram events now render with one bar per BW-reported interval (matching the Blastware printout) instead of ~200 bars per event (the raw codec output). When the sidecar's `bw_report.histogram.n_intervals` is populated (events ingested with the new parser, see next bullet), the `/db/events/{id}/waveform.json` endpoint groups the codec samples into N intervals via max-per-group and returns the aggregated array. `time_axis` gains `histogram_aggregated: true`, `n_intervals`, `interval_size_s`, and `interval_times` (HH:MM:SS strings). Both the modal chart and the standalone event browser use those interval timestamps as x-axis labels when present. Defensive: no-op for events ingested before the parser extension landed (their sidecars lack `histogram.n_intervals`) — those continue to render with raw codec output.
- **`bw_ascii_report` parser now captures histogram-specific fields.** Previously the parser dropped these fields silently (Roadmap item closed):
- `Histogram Start Time` / `Histogram Start Date` (combined into `histogram_start: datetime`)
- `Histogram Stop Time` / `Histogram Stop Date` (combined into `histogram_stop: datetime`)
- `Number of Intervals` (`histogram_n_intervals: int`)
- `Interval Size` ("1 minute" string + parsed seconds: `histogram_interval_size_str`, `histogram_interval_size_s`)
- `<Channel> Peak Time` + `<Channel> Peak Date` for histogram events (combined into `channel_peak_when: dict`; waveforms continue to use `time_of_peak_s` relative)
- `Peak Vector Sum Date` (combined with PVS Time into `peak_vector_sum_when: datetime`; clears the previous bogus `peak_vector_sum_time_s` parse that interpreted "22:33:52" as 22.0 seconds)
- All new fields land in the sidecar's `bw_report.histogram` block via `_bw_report_to_dict`. Tested against synthetic K558LLB7.V20H-shaped input.
- **Raw BW ASCII report (.TXT) preservation.** `save_imported_bw` now writes the paired `_ASCII.TXT` to `<store>/<serial>/<filename>_ASCII.TXT` alongside the binary at ingest time. Previously the .TXT was parsed into the sidecar's `bw_report` projection and then discarded — meaning parser bug fixes couldn't be applied retroactively without re-forwarding from the watcher PC. Now the raw .TXT lives in the waveform store permanently (~15 KB per event; ~210 MB total for a 14k-event store; negligible). Sidecar's `source.txt_filename` field records the saved path; backfill_sidecars preserves it across regens. New `GET /db/events/{id}/ascii_report.txt` endpoint serves the raw .TXT for any event ingested after this change. Events ingested before today still return 404 from that endpoint until re-forwarded. Architectural rationale: with BW Mail / Forwarding Agent being phased out of the operator workflow, the XML/PDF/WMF that those tools produced are no longer available — the binary + .TXT (created by BW ACH itself) are our authoritative source for everything going forward.
- **Event Report PDF generation** — `GET /db/events/{id}/report.pdf` returns a single-page letter-portrait PDF for any event with waveform data on disk. Covers every field a Blastware Event Report includes: header metadata (date/time, trigger source, range, sample rate, project/client/operator/location, serial+firmware, battery, calibration, file name), microphone block (PSPL in dB(L) + psi, ZC freq, channel test), per-channel stats table (rows differ for waveform vs histogram), Peak Vector Sum, and the 4-channel plot. Iterated against real Blastware reference PDFs (uploaded to `example-events/pdfsnstuff/`):
- **Waveform layout**: header shows Date/Time, Trigger Source, Range, Sample Rate; stats table has PPV / ZC Freq / Time (Rel. to Trig) / Peak Accel / Peak Disp / Sensor Check; bottom plot is 4-channel line waveform (MicL top → Tran bottom), shared time axis in seconds, dashed trigger line + triangle marker at t=0, symmetric Y on geo channels, zero-anchored on mic, "0.0" baseline label on right per BW convention; footer shows `Time X sec/div Amplitude Geo: Y in/s/div Mic: 0.001 psi(L)/div` and the trigger window `▶━━◀` marker. USBM RI8507/OSMRE compliance chart placeholder upper-right.
- **Histogram layout**: header shows Start / Finish / Intervals At Size / Range / Sample Rate (no Trigger Source — histograms aren't triggered); NO USBM chart; stats table has PPV / ZC Freq / Date / Time / Sensor Check; bottom plot is per-interval bar chart, Y-axis 0-to-peak (never negative), 0.0 baseline at the bottom; footer shows `Time INTERVAL_SIZE /div Amplitude Geo: Y in/s/div Mic: 0.001 psi(L)/div`.
- Backed by matplotlib (vector PDF, no headless-browser dep). Adds matplotlib>=3.8 to deps.
- **Known gap**: histogram codec returns per-block granularity (~200 bars for a 4-interval event) instead of BW's per-interval aggregation. Visual difference vs BW's 4-bar display. XML-driven data source (parsing the structured `_XML.XML` files BW also exports) is the planned fix; that route also resolves the bw_ascii_report PPV-miss bug.
- **Stubbed**: USBM RI8507 / OSMRE compliance chart curves (separate work item; requires coding the regulatory piecewise functions).
- **"Download PDF" button** in the event modal's footer — triggers the new endpoint; opens in a new tab so the browser handles save-or-display + surfaces any 404 / server errors visibly.
- **SFM webapp now opens to Database view by default** and the History table is fully interactive. Click any column header to sort ascending / descending (timestamp, serial, per-channel PPV, PVS, mic dB(L), project, client, record type, key — all sortable). Click any event row to open the event modal, which now renders a **4-channel waveform plot inline** (MicL / Long / Vert / Tran stacked, Instantel-printout order) alongside the existing sidecar review fields. Headers are sticky so the columns stay visible while scrolling long event lists. No more "where is the viewer" — pick a unit from the filter dropdown, scan the table, click the event, see the waveform.
- **Stored-event browser** — new standalone HTML page at `GET /events` (`sfm/event_browser.html`). Pick a serial from the unit dropdown, scroll through that unit's events (newest-first), click any event to render its decoded waveform via the existing `/db/events/{id}/waveform.json` endpoint. Dark-themed Chart.js viewer, channels stacked vertically (MicL / Long / Vert / Tran — Instantel printout order, designed PDF-export-ready), trigger line at t=0, peak labels, search/filter, false-trigger flag honored. Companion to the existing live-device viewer at `/waveform`; the two routes are now clearly delineated in their docstrings. The webapp's inline plot at `/` is the primary path; `/events` remains a useful diagnostic when you want just a viewer.
- **Histogram body codec — uint8 peak count fix.** Per-channel peak fields at `block[6]/[10]/[14]/[18]` are `uint8`, not `uint16 LE` spanning `block[6:8]` etc. The original interpretation was byte-exact on the N844 fixture corpus only because every annotation byte (`block[7]/[11]/[15]/[19]`) in those fixtures was zero. On non-N844 events with non-zero annotation bytes (observed across BE9558 Tran-drift and BE18003 Histogram+Continuous units), the old interpretation produced peaks up to 268 in/s per channel and 35× inflated PVS sums when first deployed to prod (rolled back same day; properly fixed in this release). Cross-correlated against BW's per-interval ASCII export on K558 / T003 / N599 / N844 corpora — 100% byte-exact on T/V/L, 99%+ on M (sub-precision rounding). Annotation byte preserved on each record as `record["annotations"]` for future RE. Verified against ~3,500 blocks across 5 in-repo fixtures + a synthetic K558 interval-12 regression block.
- **`apply_bw_report_dict_to_event` helper** in `minimateplus.event_file_io`. Mirror of `apply_report_to_event` for the projected sidecar dict shape — used by the backfill path, which has the preserved `bw_report` block but not the original `.TXT` file. BW's reported peaks (and `sample_rate` / `record_time`) now win over codec output during `--force` backfill, matching ingest-path behavior.
- **`scripts/check_bw_report_preservation.py`** — two-step snapshot/diff tool to verify that `backfill_sidecars.py` doesn't wipe the `bw_report` block from existing sidecars. Classifies every sidecar as PRESERVED / CHANGED / WIPED / STILL_MISSING / NEW / ADDED / REMOVED. Exit code 1 if any WIPED or CHANGED entries are found, so it can gate a CI step or deploy script.
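
The per-interval aggregation in the first bullet of this list (max-per-group over the raw codec samples) could look roughly like this. It is a sketch under stated assumptions: the samples are split into contiguous, near-equal groups, and each label is the interval's start time. Neither detail is confirmed by the entry, and both function names are hypothetical.

```python
from datetime import datetime, timedelta


def aggregate_max_per_interval(samples, n_intervals):
    """Collapse raw codec samples into one max value per interval.

    No-op when n_intervals is missing, matching the defensive behavior
    described for sidecars without `histogram.n_intervals`.
    """
    if not n_intervals or n_intervals < 0 or not samples:
        return list(samples)
    n = len(samples)
    out = []
    for i in range(n_intervals):
        # Contiguous, near-equal slices (assumption about the grouping).
        lo = i * n // n_intervals
        hi = (i + 1) * n // n_intervals
        group = samples[lo:hi]
        out.append(max(group) if group else 0)
    return out


def interval_times(start: datetime, n_intervals: int, interval_size_s: float):
    """HH:MM:SS label per interval, assuming labels mark interval starts."""
    return [
        (start + timedelta(seconds=i * interval_size_s)).strftime("%H:%M:%S")
        for i in range(n_intervals)
    ]
```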
#### Fixed
- **`scripts/backfill_sidecars.py` no longer wipes `bw_report`.** Before this fix, `event_to_sidecar_dict` silently dropped the preserved `bw_report` block during every backfill, since the function only emits a `bw_report` when called with a live `BwAsciiReport` dataclass (which the backfill doesn't have — only the projected sidecar dict). Now we read the existing sidecar's `bw_report` and overlay it onto the regenerated sidecar, alongside the existing `review` and `extensions` preservation.
- **`scripts/backfill_sidecars.py --force` no longer overwrites BW-overlaid DB peaks with codec output.** The backfill path now calls `apply_bw_report_dict_to_event` before the DB upsert, mirroring what the ingest path does (`/db/import/blastware_file` parses the `.TXT` into a `BwAsciiReport`, calls `apply_report_to_event`, then upserts). Without this, events where the codec doesn't fully decode (waveform walker edge cases on SP0/SS0/SV0-style events, histogram `byte[5]!=0` sub-format) ended up with PVS=0 in the DB after a `--force` backfill; this bit prod on 2026-05-22 and was rolled back the same day.
- **Thor IDF files no longer attempted as BW events in backfill.** `scripts/backfill_sidecars.py` now filters out `.IDFW` / `.IDFH` files in `_looks_like_event_file()`; they share the `.X0W` / `.X0H` suffix shape but use a separate ingest path (`WaveformStore.save_imported_idf`) and aren't decodable by `event_file_io.read_blastware_file`.
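
The preservation behavior from the first fix in this list amounts to overlaying a few blocks from the existing sidecar onto the regenerated one. A minimal sketch, with `overlay_preserved_blocks` as a hypothetical helper name and plain dicts standing in for the sidecar JSON:

```python
from typing import Optional

# Blocks that the entry above says must survive a backfill.
PRESERVED_KEYS = ("bw_report", "review", "extensions")


def overlay_preserved_blocks(regenerated: dict, existing: Optional[dict]) -> dict:
    """Copy preserved blocks from the on-disk sidecar onto a fresh one."""
    merged = dict(regenerated)
    if not existing:
        return merged
    for key in PRESERVED_KEYS:
        if existing.get(key) is not None:
            merged[key] = existing[key]
    return merged
```

Working on a copy keeps the regenerated dict untouched, which makes a before/after comparison (as `check_bw_report_preservation.py` does at the file level) straightforward.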
#### Docs
- **CLAUDE.md** — added a three-tier conceptual architecture model (SFM / SDM / shared codec library) near the top of the file, with a placement rule for where new code goes. Documents that what is conceptually SDM (database, waveform store, ingest, `/db/*` endpoints) still lives under `sfm/` for historical reasons; rename deferred until the codebase is quiet enough for a clean refactor.
- **README.md** — added a "Strategic direction" lead-in to the Roadmap that frames seismo-relay as a suite of cooperating components (not a single app), and an explicit "Terra-View ↔ SFM device control" roadmap section with a concrete implementation checklist (auth as hard prerequisite, embedded live-monitor view, action history, Series IV live-device support).
- **`docs/histogram_codec_re_status.md`** updated with the uint8 retraction and the annotation-byte status.
- Three known issues recorded in the Roadmap that were discovered during prod validation: (1) `bw_ascii_report` parser misses PPV / `vector_sum` on some `.TXT` formats (5 events on prod); (2) NULL-timestamp duplicate-row dedup needed (2 events on prod); (3) histogram body sub-format with `byte[5] != 0` not yet decoded (~3 events on prod with empty `.h5` plots).
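
For readers of the uint8 retraction, the difference between the two interpretations of the histogram block comes down to how the peak byte and its neighbouring annotation byte are read. The sketch below uses the offsets quoted in this release's notes (`block[6]/[10]/[14]/[18]` for peaks, the following byte for annotations); the function names are hypothetical and the channel order of the four offsets is deliberately left unnamed.

```python
import struct

PEAK_OFFSETS = (6, 10, 14, 18)


def read_peaks_uint8(block: bytes):
    """Current interpretation: one uint8 peak plus one annotation byte."""
    peaks = [block[o] for o in PEAK_OFFSETS]
    annotations = [block[o + 1] for o in PEAK_OFFSETS]
    return peaks, annotations


def read_peaks_uint16_le_legacy(block: bytes):
    """Retracted interpretation: uint16 LE spanning peak + annotation."""
    return [struct.unpack_from("<H", block, o)[0] for o in PEAK_OFFSETS]
```

With every annotation byte at zero the two readings agree, which is why the old interpretation looked byte-exact on a corpus where those bytes happened to be zero; any non-zero annotation byte adds a multiple of 256 to the legacy value.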
---
## v0.19.0 — 2026-05-20
The "device-family separation" release. Tightens the boundary between Series III (MiniMate Plus / Blastware) and Series IV (Micromate / Thor) so the UI and storage layer dispatch deterministically by family instead of sniffing filename extensions or magnitude heuristics.
+101 -1
@@ -2,12 +2,112 @@
Ground-up Python replacement for **Blastware**, Instantel's Windows-only software for
managing MiniMate Plus seismographs. Connects over direct RS-232 or cellular modem
(Sierra Wireless RV50 / RV55). Current version: **v0.17.0**.
(Sierra Wireless RV50 / RV55). Current version: **v0.21.0**.
When new information about the protocol is discovered, update `instantel_protocol_reference.md` with the findings in addition to this document.
---
## Architecture: three-tier conceptual model
seismo-relay is a **suite of cooperating components**, not a single app.
The three tiers below are the canonical mental model — the current
directory layout doesn't fully reflect them yet (some of what is
conceptually SDM lives under `sfm/` today), but new code should be
placed and named according to this model.
### 1. SFM — the device-side (active connection to physical units)
Replaces Blastware's *talk-to-the-meter* role. Lives where a connection
to a physical seismograph is open.
In scope:
- `minimateplus/{transport,framing,protocol,client}.py` — wire protocol
- `seismo_lab.py` — diagnostic GUI (a thick client for SFM)
- The `/device/*` HTTP endpoints in `sfm/server.py`
`/device/info`, `/device/events`, `/device/monitor/*`, `/device/call_home`,
etc. Anything that opens a connection at the moment of the request.
- Future: a Thor / Micromate live client (mirror `minimateplus/`)
- Future: a control surface Terra-View can launch into — see the
README's Roadmap.
Does NOT own a database. Outputs `Event` objects. Has a "spun up when
needed" runtime profile rather than "always on".
### 2. SDM — the data-side (storage, ingest, and serving)
The new name for the receiving-and-storing role. Originally called SFM
because the FastAPI service started life as a thin device proxy, but
the actual role has migrated heavily toward data management. **For now
the directory remains `sfm/`** — renaming requires touching ~30-50
files in seismo-relay + ~10-15 in terra-view + a Docker volume
migration; deferred until the codebase is quiet enough to do it as a
clean refactor.
In scope:
- `sfm/database.py` (`SeismoDb`)
- `sfm/waveform_store.py`, `sfm/event_hdf5.py`
- The `/db/*` HTTP endpoints — `events`, `units`, `monitor_log`,
`sessions`, `false_trigger` mutations
- The `/db/import/*` ingest endpoints — `blastware_file` (series3),
`idf_file` (series4); anything that receives events FROM somewhere
- `scripts/backfill_sidecars.py`, `scripts/check_bw_report_preservation.py`,
and similar data-maintenance tools
- The `.sfm.json` sidecars and `.h5` files in the waveform store
- The shape that Terra-View consumes (Terra-View should never need to
reach into SFM/device-side endpoints to populate its UI)
Always-on, scaled for storage/serving, has the DB and waveform store.
### 3. Codec library — pure data interpretation (used by both sides)
Neither SFM nor SDM — a shared library both depend on.
In scope:
- `minimateplus/{waveform_codec,histogram_codec,event_file_io,bw_ascii_report,blastware_file}.py`
- `micromate/{idf_ascii_report,idf_file}.py`
These modules take bytes (off the wire on the SFM side, or from a
forwarded file on the SDM side) and return `Event` objects. They
should not import from `sfm/`, must not touch a DB, and have no I/O
beyond reading files passed as arguments. Keep them pure — both
tiers can then depend on them without circularity.
#### Thor IDF binary codec (2026-05-28)
`micromate/idf_file.read_idf_file()` decodes both Thor IDFW
(waveform) and IDFH (histogram) binaries.
- **IDFW** reuses `decode_waveform_v2()` on the body at fixed file
offset `0x0f1f`. Sample fidelity is 87-99% byte-exact on quiet
events; loud events hit the BW codec's known walker-stops-early
limitation.
- **IDFH** has its own segment-based decoder: `[len_be][0a 00 00 00]
[00 NN][05 3f]` + N × 72-byte interval records (4 × 16-byte
per-channel min/max/halfp). All 859 Thor IDFH corpus files
decode (181,071 intervals); peak matches sidecar within ~1.8%
(ADC quantization).
The two outlier `BE9439_*` files in the Thor example corpus are
actually Series III Blastware binaries that share the `.IDFW`/`.IDFH`
filename convention by accident. `read_idf_file()` detects them by
their BW STRT signature and raises NotImplementedError pointing
callers at `read_blastware_file()`. See
`docs/idf_protocol_reference.md` for full field layouts.
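
A caller can lean on the `NotImplementedError` behavior described above to route mislabelled Series III files. The sketch below is an assumption-heavy illustration: the two readers are injected as callables because their real signatures are not shown here, and `read_event_file` plus the `"series3"` / `"series4"` tags are hypothetical.

```python
from pathlib import Path
from typing import Any, Callable, Tuple


def read_event_file(
    path,
    read_idf: Callable[[Path], Any],
    read_bw: Callable[[Path], Any],
) -> Tuple[str, Any]:
    """Try the Thor IDF reader first; fall back to the Blastware reader.

    `read_idf` is expected to raise NotImplementedError for Series III
    binaries that merely carry an .IDFW/.IDFH filename.
    """
    p = Path(path)
    try:
        return "series4", read_idf(p)
    except NotImplementedError:
        return "series3", read_bw(p)
```

In real code the callables would be `read_idf_file` and `read_blastware_file`, assuming both accept a file path.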
### Practical consequences
When deciding where new code goes, ask:
- *Does it need a connection to a device?* → SFM
- *Does it operate on stored events / sidecars / DB rows?* → SDM
- *Does it interpret bytes into structured data, with no I/O of its own?* → codec lib
Terra-View is downstream of SDM for data, and (per the roadmap) will
eventually invoke into SFM's device-control endpoints to provide a
"connect to unit" experience.
---
## Project layout
```
+12 -1
@@ -2,10 +2,21 @@ FROM python:3.11-slim
WORKDIR /app
# tzdata is required for the TZ env var to take effect (python:slim
# omits the timezone database). Without it, datetime.now() / logging
# / matplotlib all stay in UTC regardless of TZ. Default zone gets
# set further down via ENV; users override per-deployment via the
# `TZ` env var in docker-compose.
RUN apt-get update && \
apt-get install -y --no-install-recommends curl && \
apt-get install -y --no-install-recommends curl tzdata && \
rm -rf /var/lib/apt/lists/*
# Default display timezone — applied to server logs, datetime.now(),
# matplotlib rendered timestamps, and any naïve-vs-aware datetime
# conversions in the PDF renderer. Override via TZ env var in
# docker-compose; storage in the DB is always UTC regardless.
ENV TZ=America/New_York
COPY pyproject.toml requirements.txt ./
COPY minimateplus ./minimateplus
COPY micromate ./micromate
+97 -6
@@ -1,4 +1,4 @@
# seismo-relay `v0.19.0`
# seismo-relay `v0.21.0`
A ground-up replacement for **Blastware** — Instantel's aging Windows-only
software for managing seismographs. Supports both the **MiniMate Plus
@@ -35,6 +35,25 @@ over direct RS-232 or cellular modem (Sierra Wireless RV50 / RV55).
> and storage layer dispatch deterministically instead of sniffing
> filenames. Self-applying migration backfills existing rows from the
> binary filename extension.
> **v0.20.0 (2026-05-28)** closes out the Event-Report PDF iteration
> started in v0.17.x: histogram layouts render correctly against BW
> reference PDFs, the ASCII parser handles real-world edge cases
> (`OORANGE`, `>100 Hz`, histogram timestamps), and per-channel ZC
> Freq is surfaced in both modals (event browser + main webapp).
> Adds a server-wide `TZ` env var so operator-visible timestamps
> render in local time instead of UTC. New
> `scripts/backfill_sidecars.py --reparse-txt` lets parser fixes be
> applied retroactively to existing events without re-forwarding,
> using the `.TXT` files preserved at ingest time.
> **v0.21.0 (2026-05-29)** is the Thor / Series IV decoder release —
> `micromate/idf_file.read_idf_file()` now decodes both IDFW
> (waveform) and IDFH (histogram) binaries (87-99% sample fidelity
> on quiet IDFW events; all 859 IDFH corpus files decode cleanly).
> A new `micromate/idf_to_bw_report.py` adapter projects parsed
> Thor reports into the BW-shaped sidecar block, so Thor events
> flow through the existing Event Report PDF pipeline without a
> separate renderer. Terra-View v0.13.0 ships in parallel and
> closes Phase 1 of the SFM integration — see its CHANGELOG.
> See [CHANGELOG.md](CHANGELOG.md) for full version history.
---
@@ -58,7 +77,8 @@ seismo-relay/
├── micromate/ ← Series IV (Micromate / Thor) client library (NEW v0.19)
│ ├── models.py ← IdfEvent, IdfReport, IdfPeaks, IdfProjectInfo, IdfSensorCheck (mic in native dB(L))
│ ├── idf_ascii_report.py ← Parse Thor .IDFW.txt / .IDFH.txt event sidecars
│ └── idf_file.py ← Stub for the .IDFW / .IDFH binary codec (reverse-engineering pending)
│ ├── idf_file.py ← Binary codec for .IDFW + .IDFH (v0.21.0+)
│ └── idf_to_bw_report.py ← Adapter projecting Thor IDF into the BW report shape (v0.21.0+)
├── sfm/ ← SFM REST API server (FastAPI, port 8200)
│ ├── server.py ← Live device endpoints + DB query + ingest endpoints + caching
@@ -415,7 +435,7 @@ Use **com0com** or **VSPD** to create the virtual COM pair on Windows.
- [x] Thor IDF file ingest at `/db/import/idf_file` (paired with `thor-watcher`, v0.18.0+)
- [x] Native `IdfEvent` / `IdfReport` typed models — mic in dB(L), full title strings, sensor self-check, calibration, firmware version
- [x] Parser verified against 1,014 paired `.txt` sidecars in `thor-watcher/example-data/`
- [ ] Binary `.IDFW` / `.IDFH` codec — pending (see Roadmap + [`docs/idf_protocol_reference.md`](docs/idf_protocol_reference.md))
- [x] Binary `.IDFW` / `.IDFH` codec — ✅ v0.21.0. IDFW reuses `decode_waveform_v2()` on the body at offset `0x0f1f` (87-99% sample fidelity on quiet events); IDFH has a dedicated segment-based decoder (all 859 corpus files decode, 181,071 intervals total). See `micromate/idf_file.py` + `docs/idf_protocol_reference.md`.
- [ ] Live-device protocol — pending codec
**Data persistence:**
@@ -459,10 +479,76 @@ Use **com0com** or **VSPD** to create the virtual COM pair on Windows.
## Roadmap (Future)
### Strategic direction — where this is going
seismo-relay is being built as a **suite of cooperating components**
that together replace and improve on Blastware's role. Three logical
tiers:
1. **SFM** (device-side) — owns the active connection to a physical
unit. Today: `minimateplus/`, `/device/*` HTTP endpoints,
`seismo_lab.py`. Future: live Thor / Micromate support.
2. **SDM** (data-side) — owns the database, waveform store, ingest
pipelines, and the read-API that Terra-View consumes. Today this
code lives under `sfm/` for historical reasons; the role has
migrated and the eventual rename is on the long-tail cleanup list.
3. **Codec library** — pure data-interpretation: `minimateplus/*_codec.py`,
`bw_ascii_report.py`, `micromate/idf_*.py`. Used by both SFM and
SDM, depends on neither.
Terra-View is downstream of SDM for fleet listings, event detail, etc.
The long-term vision adds a **second link** from Terra-View → SFM for
direct device interaction (see below).
The codec work in this repo isn't trying to replace BW's network
layer — BW's ACH file forwarding and Thor's IDF call-home are
battle-tested. The value is in the receiving and processing side: turn
the stream of binary+ASCII pairs into something users can search,
filter, alert on, and report from.
### Terra-View ↔ SFM device control (the long-term vision)
Today Terra-View only reads from SDM (event listings, dashboards,
project reports). When a unit goes missing — operator notices in the
Terra-View dashboard — there's no way to *do* anything from the UI.
The path of least resistance is to RDP into a Windows box and open
Blastware, which defeats the purpose of having Terra-View.
Target experience:
- Operator notices a unit in Terra-View dashboard hasn't called in.
- Clicks unit detail → "Connect to Device" button.
- Terra-View opens an embedded view (modal or side-panel) that talks
to SFM's `/device/*` endpoints over the network.
- Live view: device clock, battery, memory, current monitor status.
- Actions: start/stop monitoring, push compliance config changes, pull
fresh events, run a sensor self-check, change call-home settings.
- Audit log: every connect / action recorded in SDM for the unit
history.
Implementation steps (concrete):
- [ ] **SFM authentication & authorization layer.** Today `/device/*`
endpoints are unauthenticated — anyone on the network can call
them. Need at minimum a token-based auth, ideally with a "who
can connect to which units" mapping. Hard prerequisite for
letting Terra-View users into the control surface.
- [ ] **Terra-View "Connect to Device" entry point** on the unit
detail page. Renders only when unit has connection info on file
and the user has permission.
- [ ] **Embedded live-monitor view** in Terra-View — equivalent to
`seismo_lab.py`'s Bridge tab, but in the browser. Polls SFM's
`/device/monitor/status` on an interval; sends start/stop via
`/device/monitor/{start,stop}`.
- [ ] **Action history** — every connect / push / action call records
a row in `unit_history`, viewable on the unit detail page.
- [ ] **Series IV live-device support in SFM** — currently `/device/*`
only supports MiniMate Plus. Blocks "Connect to Device" for
Thor units until done. Depends on Thor wire-protocol capture
and a `micromate/` parallel of the `minimateplus/` modules.
### High-impact (unblocks product features)
- [ ] **Series III waveform body codec reverse-engineering.** The 5A bulk-stream body is some kind of compressed/encoded format (not raw int16 LE as previously assumed — see §7.6.1 retraction in `docs/instantel_protocol_reference.md`). Structural framing is ~50% decoded on branch `claude/codec-re-cBGNe` (tagged-block walker, segment counters); per-byte sample mapping is still open. Until this lands, the in-app waveform viewer renders garbage and BW-import peak values fall back to `_peaks_from_samples()` saturation noise. Workaround: pair every BW-imported event with its `_ASCII.TXT` so the device-authoritative peaks land in the DB regardless of codec.
- [ ] **Series IV (Thor IDF) binary codec reverse-engineering.** `.IDFH` / `.IDFW` files are currently stored opaquely by `WaveformStore.save_imported_idf`, with all metadata sourced from the paired `.txt` sidecar. This works because thor-watcher forwards both files together, but operators who haven't enabled Thor's TXT exporter get rows with NULL peaks. Cracking the binary closes that gap and unlocks waveform display. Starting-point reference at [`docs/idf_protocol_reference.md`](docs/idf_protocol_reference.md) — two observed file signatures (1,012 newer-firmware files + 2 old files whose layout matches the Series III STRT-record format), suggested first-session plan (~2-4 hrs), 1,014 paired binary+txt files available as ground truth in `thor-watcher/example-data/`. Code seam ready at `micromate/idf_file.py`.
- [x] **Series IV (Thor IDF) binary codec reverse-engineering.** ✅ v0.21.0 — `micromate/idf_file.read_idf_file()` decodes both IDFW (waveform body at offset `0x0f1f`, reusing `decode_waveform_v2()`; 87-99% sample fidelity on quiet events) and IDFH (dedicated segment-based decoder: all 859 corpus files decode, 181,071 intervals, peaks within ~1.8% of sidecar values). `WaveformStore.save_imported_idf` now also projects parsed Thor data into a `bw_report` block via `micromate/idf_to_bw_report.py` so Thor events render in the existing Event Report PDF pipeline without a separate renderer.
- [ ] **In-app waveform viewer accuracy.** Depends on Series III codec decode. Plot.v1 JSON pipeline + viewer skeleton already exist; will start showing real waveforms automatically once `_decode_a5_waveform` produces correct samples. Series IV waveforms come online when the IDF codec lands.
- [ ] **Series IV live-device support.** Once the IDF binary is decoded, extend `micromate/` with `transport.py` / `framing.py` / `protocol.py` / `client.py` mirroring the `minimateplus/` package layout — depends on capturing Thor's wire protocol (TCP / RS-232 captures TBD).
- [ ] **Terra-view integration** — seismo-relay router, unit detail page, VISON-style event listing.
@@ -470,9 +556,10 @@ Use **com0com** or **VSPD** to create the virtual COM pair on Windows.
### BW ASCII report parser enhancements (built in v0.16.0)
- [ ] **Histogram-specific structural fields.** Current parser handles the shared fields (PPV, ZC Freq, sensor self-check, project) but silently drops histogram-only fields: `Histogram Start/Stop Time`, `Histogram Start/Stop Date`, `Number of Intervals`, `Interval Size`, per-channel `Peak Time` + `Peak Date` (absolute timestamps rather than the waveform's `Time of Peak` relative seconds).
- [x] **PPV field misses on certain TXT formats.** ✅ v0.20.0 — root cause was the `OORANGE` (Out Of Range) saturation marker that BW writes when a channel exceeds its full-scale; `_parse_number()` returned None for the non-numeric value. Parser now substitutes `geo_range_ips` as a lower bound + sets `ppv_saturated` flag. All 5 prod events (T190LD5Q.LK0W, T438L713.RY0W, K557L3YM.OE0W, + 2 others) now parse cleanly.
- [x] **Histogram-specific structural fields.** ✅ v0.20.0 — `Histogram Start/Stop Time+Date`, `Number of Intervals`, `Interval Size`, per-channel `Peak Time` + `Peak Date`, and `Peak Vector Sum Date` all parse now. Land in the sidecar's `bw_report.histogram` block.
- [ ] **Histogram interval bin-table parsing.** Trailing 792-row table (per-interval Peak/Freq per channel + MicL) in histogram TXTs is unparsed. Probably too big for the sidecar JSON; may want a separate `.histogram.h5` companion file.
- [ ] **`>100 Hz` value parsing.** Histogram TXTs use `>100 Hz` for out-of-range ZC freq; current `_parse_number()` returns `None` for these (loses information).
- [x] **`>100 Hz` value parsing.** ✅ v0.20.0 — parser now mirrors the OORANGE pattern: stores 100.0 on `zc_freq_hz` + sets `zc_freq_above_range` flag. PDF + both modals render `>100 Hz` instead of `—`.
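The OORANGE and `>100 Hz` items above share one pattern: keep a numeric bound and record a flag instead of returning `None`. A minimal sketch of that pattern, assuming a hypothetical helper `parse_bounded()` and a hypothetical `Parsed` result type (neither is the real `_parse_number()` API, whose signature is not shown here):

```python
from __future__ import annotations

import re
from typing import NamedTuple, Optional


class Parsed(NamedTuple):
    value: Optional[float]  # numeric value, or the substituted bound
    out_of_range: bool      # True when the source text was a range marker


def parse_bounded(text: str, full_scale: float) -> Parsed:
    """Hypothetical helper: parse a report field that may hold a range marker."""
    s = text.strip()
    if s.upper() == "OORANGE":
        # Saturated channel: the true value is at least full scale.
        return Parsed(full_scale, True)
    m = re.fullmatch(r">\s*([0-9]+(?:\.[0-9]+)?)(?:\s*Hz)?", s)
    if m:
        # ">100 Hz": keep the stated bound and flag it as a lower bound.
        return Parsed(float(m.group(1)), True)
    m = re.fullmatch(r"([-+]?[0-9]*\.?[0-9]+)(?:\s*[A-Za-z/()]+)?", s)
    if m:
        # Plain number with an optional unit suffix.
        return Parsed(float(m.group(1)), False)
    # Anything else stays unparsed.
    return Parsed(None, False)
```

A caller would map `out_of_range` onto `ppv_saturated` or `zc_freq_above_range` depending on the field, so the renderer can print `>100 Hz` rather than a bare number.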
### Ingestion gaps
@@ -498,3 +585,7 @@ Use **com0com** or **VSPD** to create the virtual COM pair on Windows.
- [ ] Locate "Sensor Check" byte in compliance config (need capture with Disabled vs Before-monitoring).
- [ ] Call Home — map time slots 3/4 offsets; confirm `modem_power_relay_enabled`.
- [ ] RV55 DCD/DTR — newer RV55 firmware doesn't assert DCD by default; units don't resume monitoring after call-home disconnect (`--restart-monitoring` flag deferred).
- [ ] **NULL-timestamp duplicate-row dedup.** A small handful of events (2 known on prod as of 2026-05-22) have `events.timestamp IS NULL` because the codec couldn't extract a timestamp from the binary footer. The `UNIQUE(serial, timestamp)` constraint doesn't fire on `NULL` (SQL semantics: `NULL ≠ NULL`), so every `--force` backfill INSERTs a new row instead of UPSERTing the existing one. Cleanup: a one-shot SQL query that keeps only the newest row per `(serial, blastware_filename)` and deletes the rest. Longer-term: extend the unique key to `(serial, COALESCE(timestamp, blastware_filename))` or reject inserts with NULL timestamp.
- [ ] **Histogram body sub-format with `byte[5] != 0`.** ~3 events on prod (`T190LD5Q.LD0H`, `O121L4L1.GU0H`) use a histogram body the walker doesn't recognize — the first block has `byte[5] = 0x01` or `0x07` instead of `0x00`, and the entire body lacks the `1e 0a 00 00` tail signature. Codec returns 0 valid blocks; their DB PVS comes from the bw_report ASCII overlay (which BW computed from the same binary, so the DB columns are correct). Only the `.h5` waveform plot is empty. Cracking the sub-format would unlock the plot. Needs binary+ASCII pairs from a few `byte[5]!=0` events; same RE approach as the K558 case.
- [ ] **Histogram body sub-format with `byte[5] == 0x00` but undecodable.** Observed 2026-05-28 on BE17353 (S353) events: `S353L4H2.FZ0H`, `S353L4H2.P00H`, `S353L4H3.7O0H`, `S353L4H3.E10H`. Body starts `00 00 00 01 0a 00 XX 00 ...` which LOOKS like a valid histogram block header (marker 0x000a at byte[4:6] ✓, byte[5]=0x00 normal-format ✓), but the walker finds zero data blocks across the whole body. Likely an extra header before the block stream OR a different tail signature than `1e 0a 00 00`. Smaller body lengths (1900-2100 bytes) suggest these may be short-recording histogram variants. Same operational impact as the byte[5]!=0 case: event ingests cleanly, DB peaks correct via bw_report overlay, only the chart is empty. Worth dumping a hex view of one body to diagnose.
- [ ] **Sensor-check waveform extraction from the BW binary.** BW's Event Report PDFs include a narrow panel on the right side of the waveform plot showing each channel's response to the sensor self-check signal (a damped sinusoid for geo, sawtooth-at-test-freq for mic). Our parser captures the test RESULTS (`test_freq_hz`, `test_ratio`, `test_amplitude_mv`, `test_results` pass/fail) and the PDF + modal display them as text — but BW's per-sample sensor-check waveform isn't accessible to us today. Two paths to add it: (a) RE the binary to find where the sensor-check samples are stored — could be a section before STRT, after the footer, or in a separate sub-record; protocol reference doesn't currently mention it. (b) If samples aren't in the binary, synthesize a representative waveform from the test parameters (damped sinusoid at `test_freq_hz` with damping from `test_ratio`). Path (a) is the honest answer; path (b) is decorative. Until either lands, the text-only sensor-check display in the report is fine.
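The NULL-timestamp item above can be reproduced and cleaned up in a few lines. This is a sketch under stated assumptions: it uses an in-memory SQLite database, a simplified `events` table, and an autoincrementing `id` column as the stand-in for "newest row". The real schema, database engine, and ordering column may differ, and the serial and filename values are placeholders.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Build a toy events table (assumed schema) with duplicate NULL-timestamp rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE events (
               id INTEGER PRIMARY KEY,
               serial TEXT NOT NULL,
               timestamp TEXT,
               blastware_filename TEXT,
               UNIQUE(serial, timestamp)
           )"""
    )
    # Three forced backfills of the same event: UNIQUE does not fire on NULL.
    for _ in range(3):
        conn.execute(
            "INSERT INTO events (serial, timestamp, blastware_filename) "
            "VALUES (?, NULL, ?)",
            ("UNIT1", "EVENT_A"),
        )
    # A normal row with a real timestamp, which must survive the cleanup.
    conn.execute(
        "INSERT INTO events (serial, timestamp, blastware_filename) "
        "VALUES (?, ?, ?)",
        ("UNIT1", "2026-05-22T10:00:00", "EVENT_B"),
    )
    return conn


def dedup_null_timestamps(conn: sqlite3.Connection) -> int:
    """Keep the highest-id row per (serial, blastware_filename); return rows deleted."""
    cur = conn.execute(
        """DELETE FROM events
           WHERE timestamp IS NULL
             AND id NOT IN (
                 SELECT MAX(id) FROM events
                 WHERE timestamp IS NULL
                 GROUP BY serial, blastware_filename
             )"""
    )
    return cur.rowcount
```

Running `dedup_null_timestamps(make_demo_db())` removes the two older duplicates and leaves the timestamped row alone. The cleanup is idempotent, so a second pass deletes nothing.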
+65
@@ -0,0 +1,65 @@
"""Run read_idf_file across the corpus and report per-channel accuracy vs sidecars."""
from __future__ import annotations

import sys
from pathlib import Path

REPO = Path(__file__).resolve().parents[1]
sys.path.insert(0, str(REPO))

from micromate.idf_file import read_idf_file
from analysis_idf.recon import load_sidecar_samples


def sidecar_path(idfw: Path) -> Path:
    return idfw.parent / "TXT" / f"{idfw.name}.txt"


def main():
    root = REPO / "tests/fixtures/THORDATA_example"
    files = [f for f in root.rglob("*.IDFW") if not str(f).endswith(".CDB")]
    files.sort()
    GEO_LSB = 0.0003
    n_ok = n_skip = 0
    overall = {"Tran": [], "Vert": [], "Long": []}
    for f in files:
        try:
            res = read_idf_file(f)
        except Exception:
            n_skip += 1
            continue
        sc_path = sidecar_path(f)
        if not sc_path.exists():
            n_skip += 1
            continue
        try:
            sc = load_sidecar_samples(sc_path)
        except Exception:
            n_skip += 1
            continue
        per_file = {}
        for ch in ("Tran", "Vert", "Long"):
            sc_counts = [int(round(v / GEO_LSB)) for v in sc[ch]]
            dec = res.samples.get(ch, [])
            n = min(len(sc_counts), len(dec))
            if n == 0:
                per_file[ch] = 0.0
                continue
            exact = sum(1 for i in range(n) if sc_counts[i] == dec[i])
            pct = 100.0 * exact / n
            per_file[ch] = pct
            overall[ch].append(pct)
        n_ok += 1
    print(f"Processed {n_ok} files (skipped {n_skip})")
    print("Per-channel exact-match % (mean / min / max):")
    for ch, vals in overall.items():
        if vals:
            avg = sum(vals) / len(vals)
            print(f" {ch}: mean={avg:.2f}% min={min(vals):.2f}% max={max(vals):.2f}% n={len(vals)}")


if __name__ == "__main__":
    main()
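The accuracy metric in the script above reduces to two steps: convert sidecar values to ADC counts by dividing by `GEO_LSB`, then count exact matches over the overlapping length. A standalone sketch with made-up sample values (the helper name `exact_match_pct` is hypothetical, and the 0.0003 LSB is taken from the script rather than independently verified):

```python
GEO_LSB = 0.0003  # per-count step used by the script above


def exact_match_pct(sidecar_values, decoded_counts, lsb=GEO_LSB):
    """Percent of overlapping samples whose rounded sidecar count equals the decoded count."""
    sc_counts = [int(round(v / lsb)) for v in sidecar_values]
    n = min(len(sc_counts), len(decoded_counts))
    if n == 0:
        return 0.0
    exact = sum(1 for i in range(n) if sc_counts[i] == decoded_counts[i])
    return 100.0 * exact / n


# Made-up data: the last decoded sample is off by one count.
demo_pct = exact_match_pct([0.0, 0.0003, -0.0006, 0.0012], [0, 1, -2, 5])
```

Rounding before the comparison matters because the sidecar stores decimal text, so a value such as 0.0012 does not divide to an exact integer in floating point.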
+49
@@ -0,0 +1,49 @@
"""Find where decoded-vs-sidecar diverges for each channel."""
from __future__ import annotations

import sys
from pathlib import Path

REPO = Path(__file__).resolve().parents[1]
sys.path.insert(0, str(REPO))

from minimateplus.waveform_codec import decode_waveform_v2
from analysis_idf.recon import TARGET, TXT, load_sidecar_samples


def main():
    buf = TARGET.read_bytes()
    sc = load_sidecar_samples(TXT)
    decoded = decode_waveform_v2(buf[0x0f1f:])
    GEO_LSB = 0.0003
    for ch in ("Tran", "Vert", "Long"):
        sc_counts = [int(round(v / GEO_LSB)) for v in sc[ch]]
        dec = decoded[ch]
        # Compare only the overlap so a length mismatch cannot raise IndexError
        n = min(len(dec), len(sc_counts))
        # Find the first index where decoded and sidecar counts differ
        first_diff = next((i for i in range(n) if dec[i] != sc_counts[i]), None)
        if first_diff is None:
            print(f"{ch}: NO MISMATCHES")
            continue
        print(f"{ch}: first diff at idx {first_diff}")
        # Show 3 samples before and 7 after the first mismatch
        for i in range(max(0, first_diff - 3), min(n, first_diff + 8)):
            mark = " " if dec[i] == sc_counts[i] else "**"
            print(f" {mark} idx {i:4d}: sc={sc_counts[i]:6d} dec={dec[i]:6d} diff={dec[i]-sc_counts[i]:+d}")
        # Count mismatches and track the longest run of consecutive matches
        cum_match_run = 0
        max_match_run = 0
        diff_count = 0
        for i in range(n):
            if dec[i] == sc_counts[i]:
                cum_match_run += 1
                max_match_run = max(max_match_run, cum_match_run)
            else:
                cum_match_run = 0
                diff_count += 1
        print(f" total mismatches: {diff_count}/{n}, longest run of matches: {max_match_run}")
        print()


if __name__ == "__main__":
    main()
+48
@@ -0,0 +1,48 @@
"""End-to-end IDFH ingest verification."""
from __future__ import annotations

import sys
import tempfile
import json
from pathlib import Path

REPO = Path(__file__).resolve().parents[1]
sys.path.insert(0, str(REPO))

from sfm.waveform_store import WaveformStore


def main():
    idfh = REPO / "tests/fixtures/THORDATA_example/THORDATA_example/UPMC Presby/UM13981/UM13981_20220805075441.IDFH"
    txt = idfh.parent / "TXT" / f"{idfh.name}.txt"
    with tempfile.TemporaryDirectory() as td:
        store = WaveformStore(Path(td))
        ev, rec = store.save_imported_idf(
            idfh.read_bytes(),
            idfh,
            idf_report_text=txt.read_text(errors="replace"),
        )
        print("=== save_imported_idf (IDFH) ===")
        print(f" serial: {rec['serial']}")
        print(f" filename: {rec['filename']}")
        print(f" filesize: {rec['filesize']}")
        print(f" h5: {rec['hdf5_filename']}")  # expect None for histogram
        print(f" sidecar: {rec['sidecar_filename']}")
        print()
        print("=== Event ===")
        print(f" timestamp: {ev.timestamp}")
        print(f" record_type: {ev.record_type}")
        print(f" sample_rate: {ev.sample_rate}")
        print()
        # Inspect sidecar to confirm intervals were stashed
        sc_path = Path(td) / "UM13981" / f"{idfh.name}.sfm.json"
        sc = json.loads(sc_path.read_text())
        intervals = sc.get("extensions", {}).get("idf_intervals", [])
        print(f" sidecar intervals: {len(intervals)}")
        if intervals:
            print(f" first interval: {intervals[0]}")
            print(f" last interval: {intervals[-1]}")


if __name__ == "__main__":
    main()
+40
@@ -0,0 +1,40 @@
"""Verify the had_report=False path: ingest IDFW with no .txt."""
from __future__ import annotations

import sys
from pathlib import Path
import tempfile

REPO = Path(__file__).resolve().parents[1]
sys.path.insert(0, str(REPO))

from sfm.waveform_store import WaveformStore


def main():
    idfw = REPO / "tests/fixtures/THORDATA_example/THORDATA_example/UPMC Presby/UM11719/UM11719_20231219162723.IDFW"
    with tempfile.TemporaryDirectory() as td:
        store = WaveformStore(Path(td))
        ev, rec = store.save_imported_idf(
            idfw.read_bytes(),
            idfw,
            serial_hint=None,
            idf_report_text=None,  # ← no .txt!
        )
        print("=== IDFW without .txt ingest ===")
        print(f" serial: {rec['serial']}")
        print(f" timestamp: {ev.timestamp}")
        print(f" sample_rate: {ev.sample_rate}")
        print(f" record_type: {ev.record_type}")
        print(f" rectime_sec: {ev.rectime_seconds}")
        nT = len(ev.raw_samples.get('Tran', [])) if ev.raw_samples else 0
        nV = len(ev.raw_samples.get('Vert', [])) if ev.raw_samples else 0
        nL = len(ev.raw_samples.get('Long', [])) if ev.raw_samples else 0
        nM = len(ev.raw_samples.get('MicL', [])) if ev.raw_samples else 0
        print(f" raw_samples: Tran={nT} Vert={nV} Long={nL} MicL={nM}")
        if ev.peak_values:
            print(f" peak_values: tran={ev.peak_values.tran} vert={ev.peak_values.vert} long={ev.peak_values.long}")
        print(f" h5 written: {rec['hdf5_filename']}")


if __name__ == "__main__":
    main()
+102
@@ -0,0 +1,102 @@
"""End-to-end Thor report PDF rendering.
Ingests an IDFW + .txt via save_imported_idf, runs gather_report_data
(faking a minimal DB row), and renders the PDF to disk.
"""
from __future__ import annotations

import sys
import tempfile
import json
from pathlib import Path

REPO = Path(__file__).resolve().parents[1]
sys.path.insert(0, str(REPO))

from sfm.waveform_store import WaveformStore
from sfm import report_pdf


class FakeDb:
    """Stand-in for SeismoDb.get_event(); the renderer only needs a few cols."""

    def __init__(self, event):
        self.event = event

    def get_event(self, _id):
        return self.event


def main():
    base = REPO / "tests/fixtures/THORDATA_example/THORDATA_example/UPMC Presby/UM11719"
    idfw = base / "UM11719_20231219162723.IDFW"
    txt = base / "TXT" / f"{idfw.name}.txt"
    with tempfile.TemporaryDirectory() as td:
        store = WaveformStore(Path(td))
        ev, rec = store.save_imported_idf(
            idfw.read_bytes(),
            idfw,
            idf_report_text=txt.read_text(errors="replace"),
        )
        print(f"save_imported_idf: h5={rec['hdf5_filename']}, sidecar={rec['sidecar_filename']}")
        # Verify sidecar has bw_report block
        sc_path = Path(td) / "UM11719" / f"{idfw.name}.sfm.json"
        sc = json.loads(sc_path.read_text())
        bw = sc.get("bw_report", {})
        print(f" bw_report.available: {bw.get('available')}")
        print(f" bw_report.peaks.tran.ppv_ips: {bw.get('peaks', {}).get('tran', {}).get('ppv_ips')}")
        print(f" bw_report.mic.pspl_dbl: {bw.get('mic', {}).get('pspl_dbl')}")
        print(f" bw_report.histogram.n_intervals: {bw.get('histogram', {}).get('n_intervals')}")
        # Build a DB-row-shaped dict from the Event for gather_report_data
        import datetime
        ts = ev.timestamp
        ts_iso = None
        if ts is not None:
            try:
                ts_iso = datetime.datetime(ts.year, ts.month, ts.day, ts.hour, ts.minute, ts.second).isoformat()
            except Exception:
                pass
        fake_row = {
            "serial": "UM11719",
            "blastware_filename": rec["filename"],
            "record_type": "Waveform",
            "timestamp": ts_iso,
            "sample_rate": ev.sample_rate,
            "project": ev.project_info.project if ev.project_info else None,
            "client": ev.project_info.client if ev.project_info else None,
            "operator": ev.project_info.operator if ev.project_info else None,
            "sensor_location": ev.project_info.sensor_location if ev.project_info else None,
            "created_at": None,
        }
        rd = report_pdf.gather_report_data(FakeDb(fake_row), store, event_id="test-1")
        print()
        print(f"=== ReportData ===")
        print(f" event_id: {rd.event_id}")
        print(f" serial: {rd.serial}")
        print(f" record_type: {rd.record_type}")
        print(f" event_datetime: {rd.event_datetime_str}")
        print(f" trigger: {rd.trigger_source}")
        print(f" geo_range: {rd.geo_range_str}")
        print(f" sample_rate: {rd.sample_rate_str}")
        print(f" firmware: {rd.firmware}")
        print(f" calibration: {rd.calibration_date} by {rd.calibration_by}")
        print(f" battery: {rd.battery_volts}")
        print(f" PVS: {rd.peak_vector_sum_ips} in/s at {rd.peak_vector_sum_time_s} sec")
        print(f" mic_pspl_dbl: {rd.mic_pspl_dbl}")
        print(f" mic_zc_freq_hz: {rd.mic_zc_freq_hz}")
        print(f" channel_stats: {len(rd.channel_stats)} rows")
        for cs in rd.channel_stats:
            print(f" {cs['name']}: PPV={cs['ppv_ips']} ZC={cs['zc_freq_hz']} ToP={cs['time_of_peak_s']} Acc={cs['peak_accel_g']} Disp={cs['peak_disp_in']} Test={cs['sensor_check']}")
        # Render the PDF
        out_path = REPO / "analysis_idf" / "thor_report.pdf"
        pdf_bytes = report_pdf.render_event_report_pdf(rd)
        out_path.write_bytes(pdf_bytes)
        print()
        print(f" PDF written: {out_path} ({len(pdf_bytes)} bytes)")


if __name__ == "__main__":
    main()
+91
@@ -0,0 +1,91 @@
"""End-to-end Thor IDFH histogram report PDF rendering."""
from __future__ import annotations

import sys
import tempfile
import json
import datetime
from pathlib import Path

REPO = Path(__file__).resolve().parents[1]
sys.path.insert(0, str(REPO))

from sfm.waveform_store import WaveformStore
from sfm import report_pdf


class FakeDb:
    def __init__(self, event):
        self.event = event

    def get_event(self, _id):
        return self.event


def main():
    # Use the multi-interval IDFH (81 + trigger row)
    idfh = REPO / "tests/fixtures/THORDATA_example/THORDATA_example/UPMC Presby/UM13981/UM13981_20220805075441.IDFH"
    txt = idfh.parent / "TXT" / f"{idfh.name}.txt"
    with tempfile.TemporaryDirectory() as td:
        store = WaveformStore(Path(td))
        ev, rec = store.save_imported_idf(
            idfh.read_bytes(),
            idfh,
            idf_report_text=txt.read_text(errors="replace"),
        )
        print(f"save_imported_idf: h5={rec['hdf5_filename']}, sidecar={rec['sidecar_filename']}")
        sc_path = Path(td) / "UM13981" / f"{idfh.name}.sfm.json"
        sc = json.loads(sc_path.read_text())
        bw = sc.get("bw_report", {})
        hist = bw.get("histogram", {})
        print(f" bw_report.histogram.start: {hist.get('start')}")
        print(f" bw_report.histogram.stop: {hist.get('stop')}")
        print(f" bw_report.histogram.n_intervals: {hist.get('n_intervals')}")
        print(f" bw_report.histogram.interval_size: {hist.get('interval_size')}")
        print(f" bw_report.histogram.interval_size_s: {hist.get('interval_size_s')}")
        print(f" bw_report.peaks.tran.ppv_ips: {bw.get('peaks', {}).get('tran', {}).get('ppv_ips')}")
        ts = ev.timestamp
        ts_iso = None
        if ts is not None:
            try:
                ts_iso = datetime.datetime(ts.year, ts.month, ts.day, ts.hour, ts.minute, ts.second).isoformat()
            except Exception:
                pass
        fake_row = {
            "serial": "UM13981",
            "blastware_filename": rec["filename"],
            "record_type": "Histogram",
            "timestamp": ts_iso,
            "sample_rate": ev.sample_rate,
            "project": ev.project_info.project if ev.project_info else None,
            "client": ev.project_info.client if ev.project_info else None,
            "operator": ev.project_info.operator if ev.project_info else None,
            "sensor_location": ev.project_info.sensor_location if ev.project_info else None,
            "created_at": None,
        }
        rd = report_pdf.gather_report_data(FakeDb(fake_row), store, event_id="hist-1")
        print()
        print("=== ReportData (histogram) ===")
        print(f" is_histogram: {rd.is_histogram}")
        print(f" histogram_start: {rd.histogram_start_str}")
        print(f" histogram_stop: {rd.histogram_stop_str}")
        print(f" histogram_n_intervals: {rd.histogram_n_intervals}")
        print(f" histogram_interval_size:{rd.histogram_interval_size}")
        print(f" histogram_interval_times[:3]: {rd.histogram_interval_times[:3]}")
        print(f" histogram_interval_times[-2:]: {rd.histogram_interval_times[-2:]}")
        print(f" channel_stats: {len(rd.channel_stats)} rows")
        for cs in rd.channel_stats:
            print(f" {cs['name']}: PPV={cs['ppv_ips']} ZC={cs['zc_freq_hz']} peak_date={cs['peak_date']} peak_time={cs['peak_time']}")
        pdf_bytes = report_pdf.render_event_report_pdf(rd)
        out_path = REPO / "analysis_idf" / "thor_report_idfh.pdf"
        out_path.write_bytes(pdf_bytes)
        print()
        print(f" PDF written: {out_path} ({len(pdf_bytes)} bytes)")


if __name__ == "__main__":
    main()
+52
@@ -0,0 +1,52 @@
"""End-to-end ingest test: feed an IDFW + .txt to save_imported_idf in a tmp store."""
from __future__ import annotations

import sys
from pathlib import Path
import tempfile
import shutil

REPO = Path(__file__).resolve().parents[1]
sys.path.insert(0, str(REPO))

from sfm.waveform_store import WaveformStore


def main():
    idfw = REPO / "tests/fixtures/THORDATA_example/THORDATA_example/UPMC Presby/UM11719/UM11719_20231219162723.IDFW"
    txt = idfw.parent / "TXT" / f"{idfw.name}.txt"
    with tempfile.TemporaryDirectory() as td:
        store = WaveformStore(Path(td))
        ev, rec = store.save_imported_idf(
            idfw.read_bytes(),
            idfw,
            serial_hint=None,
            idf_report_text=txt.read_text(errors="replace"),
        )
        print("=== Save result ===")
        print(f" serial: {rec['serial']}")
        print(f" filename: {rec['filename']}")
        print(f" filesize: {rec['filesize']}")
        print(f" h5: {rec['hdf5_filename']}")
        print(f" sidecar: {rec['sidecar_filename']}")
        print()
        print("=== Event ===")
        print(f" serial: {ev.serial if hasattr(ev,'serial') else '(n/a)'}")
        print(f" timestamp: {ev.timestamp}")
        print(f" sample_rate: {ev.sample_rate}")
        print(f" record_type: {ev.record_type}")
        print(f" rectime_sec: {ev.rectime_seconds}")
        print(f" raw_samples: Tran={len(ev.raw_samples.get('Tran', [])) if ev.raw_samples else 0}, Vert={len(ev.raw_samples.get('Vert', [])) if ev.raw_samples else 0}, Long={len(ev.raw_samples.get('Long', [])) if ev.raw_samples else 0}, MicL={len(ev.raw_samples.get('MicL', [])) if ev.raw_samples else 0}")
        if ev.peak_values:
            print(f" peaks (txt): Tran={ev.peak_values.tran} Vert={ev.peak_values.vert} Long={ev.peak_values.long}")
        print()
        # Verify the h5 file actually got written
        h5path = Path(td) / "UM11719" / f"{idfw.name}.h5"
        print(f" h5 exists: {h5path.exists()} size={h5path.stat().st_size if h5path.exists() else 0}")
        sidecar = Path(td) / "UM11719" / f"{idfw.name}.sfm.json"
        print(f" sidecar exists:{sidecar.exists()} size={sidecar.stat().st_size if sidecar.exists() else 0}")


if __name__ == "__main__":
    main()
+137
@@ -0,0 +1,137 @@
"""Decode IDFH histogram intervals + verify against sidecar."""
from __future__ import annotations

import sys
import struct
from pathlib import Path

REPO = Path(__file__).resolve().parents[1]
sys.path.insert(0, str(REPO))

SEGMENT_MAGIC = b"\x02\xda\x0a\x00\x00\x00"
SEGMENT_SIZE = 732  # = 10-byte header + 10 × 72-byte intervals + 2-byte tail
INTERVAL_SIZE = 72
CHANNELS = ("Tran", "Vert", "Long", "MicL")


def decode_interval(buf72: bytes) -> dict:
    """Decode one 72-byte interval into per-channel min/max/halfp."""
    out = {}
    # Four 16-byte channel blocks (64 bytes), then an 8-byte tail.
    for i, ch in enumerate(CHANNELS):
        block = buf72[i*16 : (i+1)*16]
        mn = struct.unpack_from(">h", block, 0)[0]
        mx = struct.unpack_from(">h", block, 2)[0]
        sb = struct.unpack_from(">h", block, 4)[0]
        halfp = struct.unpack_from(">H", block, 6)[0]
        f10 = struct.unpack_from(">H", block, 10)[0]
        f14 = struct.unpack_from(">H", block, 14)[0]
        peak_count = max(abs(mn), abs(mx))
        out[ch] = {
            "min": mn,
            "max": mx,
            "field4": sb,
            "halfp": halfp,
            "field10": f10,
            "field14": f14,
            "peak": peak_count,
            "freq_hz": (512.0 / halfp) if halfp > 5 else None,
        }
    out["_tail"] = buf72[64:].hex(" ")
    return out


def walk_idfh(buf: bytes) -> list:
    """Walk all interval records in an IDFH file."""
    intervals = []
    # Multi-segment file: every 02 da 0a 00 00 00 marker introduces a segment.
    # Single-interval file: just one body header at 0xf96 of form ?? ?? 0a 00 00 00.
    # Find them all.
    i = 0
    while True:
        j = buf.find(b"\x0a\x00\x00\x00", i)
        if j < 0:
            break
        # The 2 bytes before the marker must form a big-endian length, so a
        # hit in the first 2 bytes of the file cannot be a header.
        if j < 2:
            i = j + 1
            continue
        # Header layout (10 bytes): [length_be 2B][0a 00 00 00 4B][00 NN 2B][05 3f 2B]
        if j + 10 > len(buf):
            break
        length = int.from_bytes(buf[j-2:j], "big")
        # Verify the segment-marker shape: [length_be][0a 00 00 00][00 NN][05 3f]
        if buf[j+4] != 0x00:
            i = j + 1
            continue
        if buf[j+6:j+8] != b"\x05\x3f":
            i = j + 1
            continue
        # Followed by N interval records of 72 bytes each, then 2 tail bytes.
        # length value = (N × 72) + 10 (counts bytes from 0x0a... through interval data).
        header_start = j - 2
        n_intervals = (length - 10) // INTERVAL_SIZE
        interval_start = header_start + 10
        for k in range(n_intervals):
            off = interval_start + k * INTERVAL_SIZE
            if off + INTERVAL_SIZE > len(buf):
                break
            chunk = buf[off:off + INTERVAL_SIZE]
            intervals.append({"offset": off, **decode_interval(chunk)})
        # Always advance past the current marker so a zero length field
        # cannot make find() return the same offset forever.
        i = max(j + 1, header_start + length + 2)
    return intervals


def main():
    # Test against multi-segment IDFH
    target = REPO / "tests/fixtures/THORDATA_example/THORDATA_example/UPMC Presby/UM13981/UM13981_20220805075441.IDFH"
    sc_path = target.parent / "TXT" / f"{target.name}.txt"
    buf = target.read_bytes()
    intervals = walk_idfh(buf)
    print(f"=== {target.name} ===")
    print(f" file size: {len(buf)}")
    print(f" decoded intervals: {len(intervals)}")
    # Collect the sidecar's per-interval rows (they start with a date)
    sc_rows = []
    for line in sc_path.read_text(errors="replace").splitlines():
        if line.startswith("2022-") or line.startswith("2023-"):
            sc_rows.append(line)
    print(f" sidecar rows: {len(sc_rows)}")
    print()
    # Show the first 2 and last 3 intervals (indices hard-coded for this file)
    for k in [0, 1, 78, 79, 80]:
        if k >= len(intervals):
            continue
        iv = intervals[k]
        print(f"--- interval {k} @0x{iv['offset']:04x} ---")
        for ch in CHANNELS:
            d = iv[ch]
            peak_ips = d["peak"] / 32768 * 10.0
            print(f" {ch}: peak={d['peak']:5d} ({peak_ips:.4f} in/s) halfp={d['halfp']:5d} freq={d['freq_hz']}")
        # sidecar row
        if k < len(sc_rows):
            print(f" SC: {sc_rows[k]}")
    # Test single-interval IDFH
    print()
    target2 = REPO / "tests/fixtures/THORDATA_example/THORDATA_example/UPMC Presby/UM11719/UM11719_20231219162648.IDFH"
    sc2 = target2.parent / "TXT" / f"{target2.name}.txt"
    buf2 = target2.read_bytes()
    intervals2 = walk_idfh(buf2)
    print(f"=== {target2.name} ===")
    print(f" file size: {len(buf2)}, decoded intervals: {len(intervals2)}")
    if intervals2:
        iv = intervals2[0]
        for ch in CHANNELS:
            d = iv[ch]
            peak_ips = d["peak"] / 32768 * 10.0
            print(f" {ch}: peak={d['peak']:5d} ({peak_ips:.4f} in/s) halfp={d['halfp']:5d} freq={d['freq_hz']}")
        sc_rows2 = [l for l in sc2.read_text(errors='replace').splitlines() if l.startswith("2023-")]
        if sc_rows2:
            print(f" SC: {sc_rows2[0]}")


if __name__ == "__main__":
    main()
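The interval layout used by the script above can be checked without a fixture file by packing a synthetic 72-byte record and decoding it again. This is a sketch under the script's own assumptions (big-endian fields, 16 bytes per channel, `512 / halfp` for frequency, counts scaled by `/ 32768 * 10.0`); the helper names `pack_channel` and `peak_and_freq` are hypothetical, and the sample values are made up.

```python
import struct

INTERVAL_SIZE = 72
CHANNELS = ("Tran", "Vert", "Long", "MicL")

# Segment arithmetic from the script: 10-byte header + 10 intervals + 2-byte tail.
SEGMENT_SIZE = 10 + 10 * INTERVAL_SIZE + 2


def pack_channel(mn: int, mx: int, halfp: int) -> bytes:
    """Build one 16-byte channel block: min, max, field4 (zero), halfp, then padding."""
    block = bytearray(16)
    struct.pack_into(">hhhH", block, 0, mn, mx, 0, halfp)
    return bytes(block)


def peak_and_freq(buf72: bytes, channel_index: int):
    """Return (peak_count, peak_ips, freq_hz) for one channel of a 72-byte interval."""
    block = buf72[channel_index * 16:(channel_index + 1) * 16]
    mn, mx = struct.unpack_from(">hh", block, 0)
    halfp = struct.unpack_from(">H", block, 6)[0]
    peak = max(abs(mn), abs(mx))
    freq = (512.0 / halfp) if halfp > 5 else None
    return peak, peak / 32768 * 10.0, freq


# Synthetic interval: Tran carries a signal, the other channels are quiet.
interval = (
    pack_channel(-120, 98, 32)
    + pack_channel(0, 0, 0) * 3
    + bytes(8)  # 8-byte tail after the four channel blocks
)
```

Here `peak_and_freq(interval, 0)` reports a peak of 120 counts at 16 Hz, while a quiet channel with `halfp` of 5 or less yields no frequency, matching the guard in `decode_interval()`.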
+41
@@ -0,0 +1,41 @@
"""Find IDFH interval period via auto-correlation of structural patterns."""
from __future__ import annotations

import sys
from pathlib import Path
from collections import Counter

REPO = Path(__file__).resolve().parents[1]
sys.path.insert(0, str(REPO))


def main():
    target = REPO / "tests/fixtures/THORDATA_example/THORDATA_example/UPMC Presby/UM13981/UM13981_20220805075441.IDFH"
    buf = target.read_bytes()
    body_start = 0xF96
    body_end = 0x270C
    body = buf[body_start:body_end]
    print(f"body size: {len(body)} bytes (file {len(buf)} bytes)")
    # For each candidate interval size, count how many bytes at fixed offsets within
    # each interval are zero (consistent column-zero pattern indicates correct size).
    print()
    print("=== zero-column score by interval size (higher = more likely) ===")
    best = []
    for sz in range(16, 100):
        n = len(body) // sz
        if n < 30:
            continue
        # For each column position within an interval, count how many of n intervals have zero
        score = 0
        for col in range(sz):
            zeros = sum(1 for i in range(n) if body[i*sz + col] == 0)
            if zeros >= n * 0.9:
                score += 1
        best.append((score, sz, n))
    best.sort(reverse=True)
    for score, sz, n in best[:10]:
        print(f" size={sz:3d} n_intervals={n} consistently-zero-cols={score}")


if __name__ == "__main__":
    main()
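The zero-column score used above is easy to sanity-check on synthetic data where the record size is known in advance. A minimal sketch, with the scoring loop pulled into a hypothetical function `zero_column_score()` and a made-up 12-byte record (8 non-zero bytes followed by 4 zero bytes):

```python
def zero_column_score(body: bytes, size: int, threshold: float = 0.9) -> int:
    """Count column positions that are zero in at least `threshold` of the records."""
    n = len(body) // size
    if n == 0:
        return 0
    score = 0
    for col in range(size):
        zeros = sum(1 for i in range(n) if body[i * size + col] == 0)
        if zeros >= n * threshold:
            score += 1
    return score


# Synthetic body: 40 records, each 8 non-zero bytes then 4 zero bytes.
record = bytes(range(1, 9)) + bytes(4)
body = record * 40

scores = {size: zero_column_score(body, size) for size in (11, 12, 24)}
```

With the true size the four padding columns line up and score 4. A wrong size such as 11 smears the zeros across columns and scores 0. Note that an exact multiple such as 24 scores even higher (8), so when ranking candidates it is worth preferring the smallest size among the strong scores.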
+40
@@ -0,0 +1,40 @@
"""Per-file accuracy + sample-count details."""
from __future__ import annotations

import sys
from pathlib import Path

REPO = Path(__file__).resolve().parents[1]
sys.path.insert(0, str(REPO))

from micromate.idf_file import read_idf_file
from analysis_idf.recon import load_sidecar_samples


def main():
    root = REPO / "tests/fixtures/THORDATA_example"
    files = sorted([f for f in root.rglob("*.IDFW") if not str(f).endswith(".CDB")])
    GEO_LSB = 0.0003
    # Limit to first 20 successful files for detail.
    shown = 0
    for f in files:
        try:
            res = read_idf_file(f)
        except Exception:
            continue
        sc_path = f.parent / "TXT" / f"{f.name}.txt"
        if not sc_path.exists():
            continue
        sc = load_sidecar_samples(sc_path)
        sc_tran = [int(round(v / GEO_LSB)) for v in sc["Tran"]]
        dec = res.samples.get("Tran", [])
        n = min(len(sc_tran), len(dec))
        exact = sum(1 for i in range(n) if sc_tran[i] == dec[i]) if n else 0
        pct = 100.0 * exact / n if n else 0.0
        print(f"{f.name:40s} size={f.stat().st_size:6d} sc_n={len(sc_tran):4d} dec_n={len(dec):4d} exact={pct:.1f}%")
        shown += 1
        if shown >= 20:
            break


if __name__ == "__main__":
    main()
+64
@@ -0,0 +1,64 @@
"""Look at what's at the divergence boundary."""
from __future__ import annotations

import sys
from pathlib import Path

REPO = Path(__file__).resolve().parents[1]
sys.path.insert(0, str(REPO))

from minimateplus.waveform_codec import walk_body, find_data_start, parse_segment_header
from analysis_idf.recon import TARGET, TXT, load_sidecar_samples


def main():
    buf = TARGET.read_bytes()
    body = buf[0x0f1f:]
    start = find_data_start(body)
    print(f"data_start: {start} (= file offset 0x{0x0f1f + start:04x})")
    blocks = walk_body(body, start)
    print(f"{len(blocks)} blocks total")
    print()
    # First 30 blocks
    print("=== first 30 blocks ===")
    for i, b in enumerate(blocks[:30]):
        body_off = 0x0f1f + b.offset
        if b.tag_hi == 0x40:
            hdr = parse_segment_header(b)
            print(f" [{i:3d}] @0x{body_off:04x} {b.kind} (segment header) counter={hdr['counter'] if hdr else '?'} field2={hdr['field2'].hex() if hdr else '?'} anchor={hdr['anchor_bytes'].hex() if hdr else '?'} tail={hdr['tail'].hex() if hdr else '?'}")
        else:
            print(f" [{i:3d}] @0x{body_off:04x} {b.kind} len={b.length} data={b.data[:16].hex()}")
    print()
    # Cumulative sample counts per block to find which block contains sample 254
    print("=== cumulative samples through blocks ===")
    cur_ch = "Tran"
    rotation = ["Vert", "Long", "MicL", "Tran"]
    seg_count = 0
    samples_in_curseg = 2  # preamble Tran[0], Tran[1]
    for i, b in enumerate(blocks[:30]):
        if b.tag_hi == 0x40:
            seg_count += 1
            prev_ch = cur_ch
            cur_ch = rotation[(seg_count - 1) % 4]
            print(f" [{i:3d}] 40 02 -> end of {prev_ch} segment, start {cur_ch} (segment {seg_count})")
            samples_in_curseg = 2  # anchors
        elif (b.tag_hi & 0xF0) == 0x10:
            nn = ((b.tag_hi & 0x0F) << 8) | b.tag_lo
            samples_in_curseg += nn
            print(f" [{i:3d}] {b.kind} nibble: +{nn} samples, ch={cur_ch}, ch_total~{samples_in_curseg}")
        elif (b.tag_hi & 0xF0) == 0x20:
            nn = ((b.tag_hi & 0x0F) << 8) | b.tag_lo
            samples_in_curseg += nn
            print(f" [{i:3d}] {b.kind} int8: +{nn} samples, ch={cur_ch}, ch_total~{samples_in_curseg}")
        elif b.tag_hi == 0x00:
            samples_in_curseg += b.tag_lo
            print(f" [{i:3d}] {b.kind} RLE: +{b.tag_lo}, ch={cur_ch}, ch_total~{samples_in_curseg}")
        elif b.tag_hi == 0x30:
            samples_in_curseg += b.tag_lo
            print(f" [{i:3d}] {b.kind} packed12: +{b.tag_lo} samples, ch={cur_ch}, ch_total~{samples_in_curseg}")


if __name__ == "__main__":
    main()
+89
@@ -0,0 +1,89 @@
"""Reconnaissance helpers for cracking the Thor IDFW binary."""
from __future__ import annotations

import sys
from pathlib import Path

REPO = Path(__file__).resolve().parents[1]
sys.path.insert(0, str(REPO))

TARGET = REPO / "tests/fixtures/THORDATA_example/THORDATA_example/UPMC Presby/UM11719/UM11719_20231219162723.IDFW"
TXT = REPO / "tests/fixtures/THORDATA_example/THORDATA_example/UPMC Presby/UM11719/TXT/UM11719_20231219162723.IDFW.txt"


def hex_at(buf: bytes, off: int, n: int = 32) -> str:
    chunk = buf[off : off + n]
    hexs = " ".join(f"{b:02x}" for b in chunk)
    asc = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
    return f"{off:04x}: {hexs} {asc}"


def find_all(buf: bytes, needle: bytes) -> list[int]:
    out: list[int] = []
    i = 0
    while True:
        j = buf.find(needle, i)
        if j < 0:
            break
        out.append(j)
        i = j + 1
    return out


def load_sidecar_samples(path: Path) -> dict[str, list[float]]:
    """Parse the txt sample table: Tran/Vert/Long/MicL."""
    out = {"Tran": [], "Vert": [], "Long": [], "MicL": []}
    in_block = False
    for line in path.read_text(errors="replace").splitlines():
        if not in_block:
            if line.strip() == "Waveform Data Channels":
                in_block = True
            continue
        if line.startswith("Waveform Data USB Channels"):
            break
        parts = line.split("\t")
        # First row is the header "\tTran\tVert\tLong\tMicL"
        if len(parts) >= 5 and parts[1] == "Tran":
            continue
        if len(parts) < 5:
            continue
        # Parse all four columns before appending, so a bad cell cannot
        # leave the channel lists with different lengths.
        try:
            tran, vert, long_, mic = (float(p) for p in parts[1:5])
        except ValueError:
            continue
        out["Tran"].append(tran)
        out["Vert"].append(vert)
        out["Long"].append(long_)
        out["MicL"].append(mic)
    return out


def main():
    buf = TARGET.read_bytes()
    samples = load_sidecar_samples(TXT)
    print(f"file size: {len(buf)} bytes")
    print(f"sample rows: Tran={len(samples['Tran'])} Vert={len(samples['Vert'])} Long={len(samples['Long'])} MicL={len(samples['MicL'])}")
    print(f"first 6 Tran samples: {samples['Tran'][:6]}")
    print(f"first 6 Vert samples: {samples['Vert'][:6]}")
    print(f"first 6 Long samples: {samples['Long'][:6]}")
    print(f"first 6 MicL samples: {samples['MicL'][:6]}")
    print()
    print("=== BW magic '00 02 00' positions ===")
    hits = find_all(buf, b"\x00\x02\x00")
    print(f"{len(hits)} hits")
    for h in hits[:20]:
        print(hex_at(buf, h, 24))
    print()
    print("=== '40 02' segment-header positions ===")
    hits = find_all(buf, b"\x40\x02")
    print(f"{len(hits)} hits")
    for h in hits:
        ctx_pre = buf[max(0, h - 4): h].hex()
        ctx_post = buf[h: h + 20].hex()
        # Show byte preceding to help identify real headers vs casual occurrences
        print(f" 0x{h:04x} pre={ctx_pre} post={ctx_post}")


if __name__ == "__main__":
    main()
+40
@@ -0,0 +1,40 @@
"""Find each segment boundary in the channel and check if errors reset there."""
from __future__ import annotations
import sys
from pathlib import Path
REPO = Path(__file__).resolve().parents[1]
sys.path.insert(0, str(REPO))
from minimateplus.waveform_codec import decode_waveform_v2
from analysis_idf.recon import TARGET, TXT, load_sidecar_samples
def main():
buf = TARGET.read_bytes()
sc = load_sidecar_samples(TXT)
decoded = decode_waveform_v2(buf[0x0f1f:])
GEO_LSB = 0.0003
for ch in ("Tran", "Vert", "Long"):
sc_counts = [int(round(v / GEO_LSB)) for v in sc[ch]]
dec = decoded[ch]
# Find every transition where error becomes zero from nonzero (or grows from zero)
# Print indices where dec resyncs back to exact match.
n = min(len(sc_counts), len(dec))
events = []
prev_match = True
for i in range(n):
match = sc_counts[i] == dec[i]
if match != prev_match:
kind = "RESYNC" if match else "DIVERGE"
events.append((i, kind, sc_counts[i], dec[i]))
prev_match = match
print(f"{ch}: {len(events)} transitions")
for i, kind, sc_v, dec_v in events[:20]:
print(f" idx {i:4d} {kind:8s} sc={sc_v:6d} dec={dec_v:6d} diff={dec_v-sc_v:+d}")
print()
if __name__ == "__main__":
main()
+46
@@ -0,0 +1,46 @@
"""Smoke-test read_idf_file on IDFH across the corpus."""
from __future__ import annotations
import sys
from pathlib import Path
REPO = Path(__file__).resolve().parents[1]
sys.path.insert(0, str(REPO))
from micromate.idf_file import read_idf_file
def main():
target = REPO / "tests/fixtures/THORDATA_example/THORDATA_example/UPMC Presby/UM11719/UM11719_20231219162648.IDFH"
result = read_idf_file(target)
ev = result.event
print(f"=== {target.name} ===")
print(f" signature: {result.signature}")
print(f" serial: {ev.serial}")
print(f" timestamp: {ev.timestamp}")
print(f" sample_rate: {ev.sample_rate}")
print(f" kind: {ev.kind}")
print(f" intervals: {len(result.intervals or [])}")
print(f" peaks: T={ev.peaks.transverse_ips:.4f} V={ev.peaks.vertical_ips:.4f} L={ev.peaks.longitudinal_ips:.4f}")
print()
root = REPO / "tests/fixtures/THORDATA_example"
files = list(root.rglob("*.IDFH"))
ok = fail = nyi = 0
total_intervals = 0
for f in files:
try:
r = read_idf_file(f)
ok += 1
total_intervals += len(r.intervals or [])
except NotImplementedError:
nyi += 1
except Exception as exc:
fail += 1
if fail <= 3:
print(f" FAIL: {f.name}: {type(exc).__name__}: {exc}")
print(f"Corpus: {len(files)} IDFH files | ok={ok} fail={fail} nyi={nyi}")
print(f"Total intervals decoded: {total_intervals}")
if __name__ == "__main__":
main()
+48
@@ -0,0 +1,48 @@
"""Smoke-test read_idf_file across the sample corpus."""
from __future__ import annotations
import sys
from pathlib import Path
REPO = Path(__file__).resolve().parents[1]
sys.path.insert(0, str(REPO))
from micromate.idf_file import read_idf_file, geo_count_to_ips, mic_count_to_psi
def main():
target = REPO / "tests/fixtures/THORDATA_example/THORDATA_example/UPMC Presby/UM11719/UM11719_20231219162723.IDFW"
result = read_idf_file(target)
ev = result.event
print(f"=== {target.name} ===")
print(f" signature: {result.signature}")
print(f" serial: {ev.serial}")
print(f" timestamp: {ev.timestamp}")
print(f" sample_rate: {ev.sample_rate}")
print(f" record_time: {ev.record_time_sec}")
print(f" calibration: {result.binary_metadata.calibration_date}")
print(f" Tran samples: {len(result.samples['Tran'])}, peak_ips={ev.peaks.transverse_ips:.4f}")
print(f" Vert samples: {len(result.samples['Vert'])}, peak_ips={ev.peaks.vertical_ips:.4f}")
print(f" Long samples: {len(result.samples['Long'])}, peak_ips={ev.peaks.longitudinal_ips:.4f}")
print(f" MicL samples: {len(result.samples['MicL'])}")
print()
# Corpus sweep
root = REPO / "tests/fixtures/THORDATA_example"
files = [f for f in root.rglob("*.IDFW") if not str(f).endswith(".CDB")]
ok = fail = nyi = 0
for f in files:
try:
r = read_idf_file(f)
ok += 1
except NotImplementedError:
nyi += 1
except Exception as exc:
fail += 1
if fail <= 5:
print(f" FAIL: {f.name}: {type(exc).__name__}: {exc}")
print()
print(f"Corpus: {len(files)} IDFW files | ok={ok} fail={fail} not-implemented={nyi}")
if __name__ == "__main__":
main()
+47
@@ -0,0 +1,47 @@
"""Verify build_bw_report_from_idf against a known sidecar."""
from __future__ import annotations
import json
import sys
from pathlib import Path
REPO = Path(__file__).resolve().parents[1]
sys.path.insert(0, str(REPO))
from micromate.idf_ascii_report import parse_idf_report
from micromate.idf_to_bw_report import build_bw_report_from_idf
from micromate.idf_file import read_idf_file
def show(prefix: str, d: dict, indent: int = 0):
for k, v in d.items():
if isinstance(v, dict):
print(f"{' '*indent}{prefix}{k}:")
show("", v, indent + 1)
else:
print(f"{' '*indent}{prefix}{k}: {v!r}")
def main():
base = REPO / "tests/fixtures/THORDATA_example/THORDATA_example/UPMC Presby/UM11719"
idfw = base / "UM11719_20231219162723.IDFW"
txt = base / "TXT" / f"{idfw.name}.txt"
report_dict = parse_idf_report(txt.read_text(errors="replace"))
res = read_idf_file(idfw)
bw = build_bw_report_from_idf(report_dict, binary_md=res.binary_metadata)
print("=== IDFW → bw_report ===")
show("", bw)
print()
print("=== IDFH (single trigger row) ===")
idfh = base / "UM11719_20231219162648.IDFH"
txt_h = base / "TXT" / f"{idfh.name}.txt"
rh = parse_idf_report(txt_h.read_text(errors="replace"))
res_h = read_idf_file(idfh)
bw_h = build_bw_report_from_idf(rh, binary_md=res_h.binary_metadata, intervals=res_h.intervals)
show("", bw_h)
if __name__ == "__main__":
main()
Binary file not shown.
Binary file not shown.
+73
@@ -0,0 +1,73 @@
"""Trace Tran sample-by-sample to find exactly where the codec drifts."""
from __future__ import annotations
import sys
from pathlib import Path
REPO = Path(__file__).resolve().parents[1]
sys.path.insert(0, str(REPO))
from analysis_idf.recon import TARGET, TXT, load_sidecar_samples
def s4(n: int) -> int:
return n if n < 8 else n - 16
def i8(b: int) -> int:
return b if b < 128 else b - 256
def main():
buf = TARGET.read_bytes()
sc = load_sidecar_samples(TXT)
GEO_LSB = 0.0003
sc_tran = [int(round(v / GEO_LSB)) for v in sc["Tran"]]
body = buf[0x0f1f:]
# Tran[0], Tran[1] from preamble
t0 = int.from_bytes(body[3:5], "big", signed=True)
t1 = int.from_bytes(body[5:7], "big", signed=True)
print(f"preamble Tran[0]={t0} Tran[1]={t1} (sidecar: {sc_tran[0]}, {sc_tran[1]})")
# Block 0: 10 f8 at body[7:9]
print(f"block 0: tag {body[7]:02x} {body[8]:02x}")
print(f" block 0 first 10 data bytes: {body[9:19].hex()}")
# Walk block 0 manually, comparing each sample
cur = t1
samples = [t0, t1]
block_off = 7
nn = body[8]
print(f" NN = {nn}")
data = body[9 : 9 + nn // 2]
for byi, byte in enumerate(data):
for nib_idx, nib in enumerate(((byte >> 4) & 0xF, byte & 0xF)):
cur += s4(nib)
samples.append(cur)
idx = len(samples) - 1
if 0 <= idx < len(sc_tran):
sc_v = sc_tran[idx]
match = "✓" if sc_v == cur else "✗"
if idx < 12 or 240 <= idx <= 260:
print(f" idx {idx:3d}: nibble byte={byte:02x} nib={nib:x} delta={s4(nib):+d} cur={cur:+d} sc={sc_v:+d} {match}")
print(f"end of block 0: cur={cur}, len(samples)={len(samples)}, decoder expected 250 here")
# Block 1: 20 28 starts at offset 9 + 124 = 133 from block_off=7
block1_off = 9 + nn // 2
print(f"block 1: tag {body[block1_off]:02x} {body[block1_off+1]:02x} (expecting 20 28)")
nn1 = body[block1_off + 1]
print(f" block 1 NN = {nn1}")
data1 = body[block1_off + 2 : block1_off + 2 + nn1]
for byi, byte in enumerate(data1):
cur += i8(byte)
samples.append(cur)
idx = len(samples) - 1
if idx < len(sc_tran):
sc_v = sc_tran[idx]
match = "✓" if sc_v == cur else "✗"
if 248 <= idx <= 295:
print(f" idx {idx:3d}: int8 byte={byte:02x} delta={i8(byte):+d} cur={cur:+d} sc={sc_v:+d} {match}")
if __name__ == "__main__":
main()
+42
@@ -0,0 +1,42 @@
"""Feed candidate body offsets to the BW codec and compare with sidecar."""
from __future__ import annotations
import sys
from pathlib import Path
REPO = Path(__file__).resolve().parents[1]
sys.path.insert(0, str(REPO))
from minimateplus.waveform_codec import decode_waveform_v2, walk_body, find_data_start
from analysis_idf.recon import TARGET, TXT, load_sidecar_samples
def main():
buf = TARGET.read_bytes()
sc = load_sidecar_samples(TXT)
# Sidecar samples in 0.0003 counts (Thor geo LSB).
sc_tran = [int(round(v / 0.0003)) for v in sc["Tran"][:30]]
sc_vert = [int(round(v / 0.0003)) for v in sc["Vert"][:30]]
sc_long = [int(round(v / 0.0003)) for v in sc["Long"][:30]]
sc_micl = [int(round(v / 1e-6)) for v in sc["MicL"][:30]] # 1 µ unit for mic? Will iterate.
print(f"sidecar Tran (counts): {sc_tran}")
print(f"sidecar Vert (counts): {sc_vert}")
print(f"sidecar Long (counts): {sc_long}")
print(f"sidecar MicL (×1e-6): {sc_micl}")
print()
# Try candidate body start offsets.
for off in (0x0f1f, 0x1057, 0x11f1, 0x1333, 0x1bde, 0x0d30):
print(f"=== body @ 0x{off:04x} ===")
body = buf[off:]
decoded = decode_waveform_v2(body)
if not decoded:
print(" decode_waveform_v2 returned None")
continue
for ch in ("Tran", "Vert", "Long", "MicL"):
arr = decoded.get(ch, [])
print(f" {ch}[{len(arr)}]: {arr[:20]}")
print()
if __name__ == "__main__":
main()
+51
@@ -0,0 +1,51 @@
"""Verify decode_waveform_v2 against sidecar across all 2304 samples per channel."""
from __future__ import annotations
import sys
from pathlib import Path
REPO = Path(__file__).resolve().parents[1]
sys.path.insert(0, str(REPO))
from minimateplus.waveform_codec import decode_waveform_v2
from analysis_idf.recon import TARGET, TXT, load_sidecar_samples
def main():
buf = TARGET.read_bytes()
sc = load_sidecar_samples(TXT)
body = buf[0x0f1f:]
decoded = decode_waveform_v2(body)
print(f"Sidecar lengths: Tran={len(sc['Tran'])} Vert={len(sc['Vert'])} Long={len(sc['Long'])} MicL={len(sc['MicL'])}")
print(f"Decoded lengths: Tran={len(decoded['Tran'])} Vert={len(decoded['Vert'])} Long={len(decoded['Long'])} MicL={len(decoded['MicL'])}")
print()
GEO_LSB = 0.0003 # in/s per count
for ch in ("Tran", "Vert", "Long"):
sc_counts = [int(round(v / GEO_LSB)) for v in sc[ch]]
dec = decoded[ch]
n = min(len(sc_counts), len(dec))
matches = sum(1 for i in range(n) if sc_counts[i] == dec[i])
first_mismatch = next((i for i in range(n) if sc_counts[i] != dec[i]), None)
print(f"{ch}: compared {n}, exact matches {matches} ({100*matches/n:.2f}%)")
if first_mismatch is not None:
i = first_mismatch
print(f" first mismatch at idx {i}: sidecar={sc_counts[i]} ({sc[ch][i]}), decoded={dec[i]}")
print(f" context sidecar[{i-2}..{i+5}]: {sc_counts[max(0,i-2):i+5]}")
print(f" context decoded[{i-2}..{i+5}]: {dec[max(0,i-2):i+5]}")
# MicL: find the multiplicative factor that fits
print()
print("=== MicL scale analysis ===")
sc_micl = sc["MicL"]
dec_micl = decoded["MicL"]
# Skip zero values when computing ratio
ratios = [sc_micl[i] / dec_micl[i] for i in range(min(50, len(sc_micl), len(dec_micl))) if dec_micl[i] != 0]
if ratios:
avg = sum(ratios) / len(ratios)
print(f" avg ratio sidecar/decoded over first 50 nonzero: {avg:.4e} (n={len(ratios)})")
print(f" ratios sample: {[f'{r:.4e}' for r in ratios[:6]]}")
if __name__ == "__main__":
main()
+35 -5
@@ -12,7 +12,21 @@ implementation lives in `minimateplus/histogram_codec.py`.
in-repo histogram fixture corpus decodes byte-exact against BW's
ASCII export.
24 regression tests pass against ~3,500 blocks across 5 fixtures.
26 regression tests pass against ~3,500 blocks across 5 in-repo
fixtures, plus a synthetic regression block taken from a real
BE9558 prod event to lock in the uint8-peak interpretation.
**Important correction (2026-05-21):** the per-channel peak count
is `uint8` at byte[6]/[10]/[14]/[18], NOT `uint16 LE` at byte[6:8]
etc. The N844 fixture corpus the original RE was done against has
zero values in bytes [7]/[11]/[15]/[19] for every block, so the
two interpretations happened to be equivalent. Cross-correlating
non-N844 events (BE9558 Tran-drift, BE18003 Histogram+Continuous)
against BW's per-interval ASCII export (4 channels × ~1400 blocks
per event, across multiple events) is 100% byte-exact only when the
peak is read as uint8. Reading as uint16 LE produced peaks up to 268
in/s per channel and 35× inflated PVS sums when first deployed to
prod (rolled back, root-caused, and fixed in commit 7183b95+1).
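To make the difference concrete, here is a minimal sketch using a synthetic block. The byte values and the 32-byte buffer size are invented for illustration and are not taken from a fixture. It reads the Tran peak both ways and applies the 0.005 in/s-per-count Normal-range factor; `read_peak_uint8` and `read_peak_uint16_le` are hypothetical helper names, not functions in `minimateplus/histogram_codec.py`.

```python
GEO_COUNT_TO_IPS = 0.005  # in/s per count at Normal range


def read_peak_uint8(block: bytes, off: int) -> int:
    """Corrected reading: the peak count is the single byte at `off`."""
    return block[off]


def read_peak_uint16_le(block: bytes, off: int) -> int:
    """Earlier reading: folds the annotation byte at off+1 into the peak."""
    return int.from_bytes(block[off:off + 2], "little")


# Synthetic, zero-filled buffer (size is an assumption, just long enough
# to hold the fields): Tran peak count 12 at byte[6], annotation 3 at byte[7].
scratch = bytearray(32)
scratch[6] = 12
scratch[7] = 3
block = bytes(scratch)

peak_u8 = read_peak_uint8(block, 6) * GEO_COUNT_TO_IPS
peak_u16 = read_peak_uint16_le(block, 6) * GEO_COUNT_TO_IPS
print(f"uint8: {peak_u8:.3f} in/s   uint16 LE: {peak_u16:.3f} in/s")
```

With a zero annotation byte the two readings agree, which is why the N844 corpus could not distinguish them.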
## Body format
@@ -27,15 +41,21 @@ Each block represents one histogram interval. Block layout:
[1] segment_id (uint8) 0x00..0x03 — 256 blocks per segment
[2:4] block_ctr (uint16 LE) resets each segment (0x0100, 0x0101, …)
[4:6] 0x000a (uint16 LE) constant marker (= 10)
[6:8] T_peak_count uint16 LE Tran peak (count × 0.005 → in/s at Normal)
[6] T_peak_count uint8 Tran peak (count × 0.005 → in/s at Normal,
max 1.275 in/s — fits in uint8)
[7] T_annotation uint8 empirically non-zero on intervals with sub-Hz
or unmeasurable freq; meaning not fully RE'd
[8:10] T_halfperiod uint16 LE Tran half-period in samples
(freq_Hz = 512 / halfp; ≤ 5 means ">100 Hz")
[10:12] V_peak_count uint16 LE Vert peak
[10] V_peak_count uint8 Vert peak
[11] V_annotation uint8
[12:14] V_halfperiod uint16 LE Vert freq half-period
[14:16] L_peak_count uint16 LE Long peak
[14] L_peak_count uint8 Long peak
[15] L_annotation uint8
[16:18] L_halfperiod uint16 LE Long freq half-period
[18:20] M_peak_count uint16 LE MicL peak count
[18] M_peak_count uint8 MicL peak count
(dB via waveform_codec.mic_count_to_db)
[19] M_annotation uint8
[20:22] M_halfperiod uint16 LE MicL freq half-period
[22:24] 0x00 0x00 constant
[24:28] 4-byte variable purpose unknown — possibly CRC,
@@ -99,6 +119,16 @@ slot[8] = 9 → 512/9 = 56.9 → 57 Hz ✓ M_freq
## What's NOT yet decoded
- **Annotation bytes (`block[7]/[11]/[15]/[19]`)**. Empirically
non-zero on intervals where the per-channel ZC frequency comes
out as `N/A` or sub-Hz (`<1.0`, `1.X`). Hypothesis tested in the
RE session: byte != 0 ↔ sub-Hz freq. Only ~50% correlation
across the K558 corpus, so the relationship is more complex.
Possibilities: time-of-peak-within-interval, halfp extension for
very-long-period signals, or a debug/diagnostic field the firmware
writes opportunistically. Doesn't affect peak amplitudes or
waveform reconstruction. Captured as `record["annotations"]` for
future RE.
- **4-byte variable metadata field (bytes 24:28)**. Not needed for
waveform reconstruction. Speculation: per-block CRC, sub-second
timestamp offset, or a Mic psi(L) count not in the 9 samples.
+62 -5
@@ -6,11 +6,68 @@ Series IV event-file format. Sibling to
Series III "Rosetta Stone") — this doc holds what we know so far and
the open questions still to crack.
**Status (2026-05-20):** ASCII text sidecar fully decoded (1,014
sample files round-trip). Binary `.IDFH` / `.IDFW` codec
**not yet implemented** — binaries are stored opaquely by
`WaveformStore.save_imported_idf`, with metadata sourced from the
paired `.txt` sidecar.
**Status (2026-05-28):** ASCII text sidecar fully decoded (1,014
sample files round-trip). **Thor IDFW** binary now decodes via
`micromate.idf_file.read_idf_file()` — reuses the BW segment-rotated
block codec verbatim at fixed body offset `0x0f1f`; metadata (serial,
timestamp, sample_rate, record_time, calibration_date) extracted from
the binary header. Sample fidelity is 87-99% byte-exact on quiet
events; loud events hit the BW codec's known walker-stops-early
limitation. Residual ~3% drift on per-sample deltas (likely a
Thor-specific 12-bit delta refinement not yet modelled).
**Thor IDFH histograms also decoded.** Body has one or more segments;
each 12-byte segment header `[length_be 2B][0a 00 00 00][00 NN][05 3f]`
introduces `N = (length - 10) // 72` interval records of 72 bytes
each. Each interval = 4 × 16-byte per-channel records:
`[int16 min][int16 max][int16 ??][uint16 halfp][2B 00][uint16 ??][2B 00][uint16 ??]`.
Geo peak `= max(|min|, |max|) / 32768 × 10` in/s (matches sidecar
within ~1.8%); freq `= 512 / halfp` Hz (None for halfp ≤ 5 → ">100"
sentinel). Corpus: **all 859 Thor IDFH files decode, 181,071
intervals**. Wired through `read_idf_file()` →
`save_imported_idf()` → sidecar's `extensions.idf_intervals`.
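As a rough illustration of the per-channel record layout, the sketch below unpacks one 16-byte record and applies the peak and frequency formulas. It assumes big-endian fields, matching how the reader in `micromate/idf_file.py` unpacks them. `decode_channel_record` and the constant names are hypothetical, and the sample values are synthetic.

```python
import struct
from typing import Optional, Tuple

INT16_FULL_SCALE = 32768.0
GEO_FULL_SCALE_IPS = 10.0  # in/s
HALFP_FREQ_NUM = 512.0     # freq_hz = 512 / halfp


def decode_channel_record(rec: bytes) -> Tuple[float, Optional[float]]:
    """Return (peak in in/s, frequency in Hz or None) for one channel record."""
    if len(rec) != 16:
        raise ValueError("expected a 16-byte per-channel record")
    mn, mx = struct.unpack_from(">hh", rec, 0)    # int16 min, int16 max
    (halfp,) = struct.unpack_from(">H", rec, 6)   # uint16 half-period
    peak_ips = max(abs(mn), abs(mx)) / INT16_FULL_SCALE * GEO_FULL_SCALE_IPS
    # halfp <= 5 is the ">100 Hz" sentinel, reported here as None.
    freq_hz = None if halfp <= 5 else HALFP_FREQ_NUM / halfp
    return peak_ips, freq_hz


# Synthetic record: min=-1200, max=900, unknown int16=0, halfp=16, 8 pad bytes.
rec = struct.pack(">hhhH8x", -1200, 900, 0, 16)
peak, freq = decode_channel_record(rec)
print(f"peak={peak:.4f} in/s freq={freq} Hz")
```

The remaining bytes of the record (the unknown int16 and the two unknown uint16 fields) are skipped, since their meaning is still open.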
**Note on the BE9439 outliers in the example corpus:** Two files
(`BE9439_20200713131747.IDFW` and `BE9439_20200713124251.IDFH`) are
**Series III Blastware** binaries, not Thor. Provenance: TMI tried
to use Thor to manage auto-call-homes for Series III units; the
experiment didn't work out, but it did leave a few BW event files
in Thor's per-serial directory structure with `.IDFW`/`.IDFH`
extensions — Thor's forwarder applied its own naming convention to
the BW bodies it was relaying. Their header `10 00 01 80 00 00
Instantel STRT ff fe <end_key> <start_key>` is the BW SUB 5A STRT
record, not a Thor body preamble. The reader detects them by
signature and raises `NotImplementedError` pointing callers at
`read_blastware_file()`, which extracts BW-format peaks from them.
**Still NYI for Thor IDFH:** per-channel `int16 field4` (possibly
time-of-peak); the two uint16 fields (probably PVS contributions);
8-byte interval tail (PVS data); mic dB(L) exact conversion constant.
### Codec breakthroughs (2026-05-28)
- **Body offset is a fixed `0x0f1f`** across 151/154 corpus IDFW
files. Preceded by a 4-byte record-type marker (`46 00 00 00`)
+ magic preamble `00 02 00 [Tran[0] BE] [Tran[1] BE]`.
- **Sample stream is BW's segment-rotated block codec verbatim.**
Thor reuses `10 NN` (nibble), `20 NN` (int8), `00 NN` (RLE),
`30 NN` (packed12), `40 02` (segment header) tags with the same
semantics. Channel rotation Tran→Vert→Long→MicL.
- **Geo LSB = 0.0003 in/s** (not BW's 0.005), because Thor's 16-bit
ADC range maps to 10 in/s without the 16-count BW quantization step.
- **Mic ≈ 2.14×10⁻⁶ psi/count** (rough scale; refine after channel
block calibration constants are decoded).
- **BW compliance anchor `\xbe\x80\x00\x00\x00\x00` reappears at
IDFW offset 0x952**: sample_rate at anchor-6 (uint16 BE),
record_time at anchor+6 (float32 BE), same layout as BW.
- **Event timestamp at offset 0x97A** — 8 bytes `[day][month]
[year_be][unk][hour][min][sec]`. Stop-time mirrors at 0x982.
- **Serial as null-terminated ASCII at 0x14E**.
- **Calibration date** at 0x194-0x197 (day, month, year_be).
- Per-sample residual drift of ~3% suggests Thor encodes int8/nibble
deltas with an extra refinement bit that BW doesn't carry —
unsolved; errors resync within a few samples so cumulative impact
is small.
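The nibble (`10 NN`) and int8 (`20 NN`) delta tags in the list above can be illustrated with a small sketch. It follows the manual walk in the trace script from this change: signed 4-bit deltas with the high nibble first, and signed 8-bit deltas, each added to a running sample value. It takes the block's data bytes directly and ignores RLE, packed12, segment headers, and channel rotation. The function names are hypothetical and this is not the `decode_waveform_v2` implementation.

```python
from typing import List

GEO_LSB_IPS = 0.0003  # in/s per count for Thor geo channels


def s4(n: int) -> int:
    """Interpret a 4-bit value as a signed delta (-8..7)."""
    return n if n < 8 else n - 16


def i8(b: int) -> int:
    """Interpret a byte as a signed delta (-128..127)."""
    return b if b < 128 else b - 256


def apply_nibble_block(start: int, data: bytes) -> List[int]:
    """Accumulate two signed 4-bit deltas per byte, high nibble first."""
    out, cur = [], start
    for byte in data:
        for nib in ((byte >> 4) & 0xF, byte & 0xF):
            cur += s4(nib)
            out.append(cur)
    return out


def apply_int8_block(start: int, data: bytes) -> List[int]:
    """Accumulate one signed 8-bit delta per byte."""
    out, cur = [], start
    for byte in data:
        cur += i8(byte)
        out.append(cur)
    return out


# Synthetic data: start at count 5, then deltas +1, -1, +7, 0 (nibbles)
# followed by -2, +3 (int8).
nib_samples = apply_nibble_block(5, bytes([0x1F, 0x70]))
int8_samples = apply_int8_block(nib_samples[-1], bytes([0xFE, 0x03]))
print(nib_samples, int8_samples)
print([round(c * GEO_LSB_IPS, 4) for c in int8_samples])
```

Because each sample depends on the previous one, a single mis-decoded delta shifts everything after it until the next absolute value, which fits the resync behaviour described in the last bullet.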
---
+17 -2
@@ -210,8 +210,7 @@ def parse_idf_report(text: Union[str, bytes]) -> Dict[str, Any]:
"long_peak_acceleration",
"tran_peak_displacement", "vert_peak_displacement",
"long_peak_displacement",
"tran_time_of_peak", "vert_time_of_peak", "long_time_of_peak",
"mic_time_of_peak", "mic_zc_freq",
"mic_zc_freq",
)
for key in float_fields:
v = raw.get(key)
@@ -223,6 +222,22 @@ def parse_idf_report(text: Union[str, bytes]) -> Dict[str, Any]:
else:
out.pop(key, None)
# Time-of-peak: Thor labels these "TimeofPeak" (lowercase "of") so the
# normalizer produces "*_timeof_peak". Map them to the canonical
# ``*_time_of_peak`` output keys for downstream consumers.
for raw_key, out_key in (
("tran_timeof_peak", "tran_time_of_peak"),
("vert_timeof_peak", "vert_time_of_peak"),
("long_timeof_peak", "long_time_of_peak"),
("mic_timeof_peak", "mic_time_of_peak"),
):
v = raw.get(raw_key)
if v is None:
continue
fv = _parse_float(v)
if fv is not None:
out[out_key] = fv
# Microphone — Thor reports MicPSPL (dB(L)) which is the closest
# analogue to BW's mic_ppv. The raw "99.4 dB(L)" string stays in
# `out` under the original `mic_pspl` key for display; the parsed
+514 -48
@@ -1,64 +1,530 @@
"""
micromate/idf_file.py — placeholder for the Thor IDF binary codec.
micromate/idf_file.py — Thor IDF binary codec.
Thor's ``.IDFH`` (histogram) and ``.IDFW`` (waveform) event files are an
Instantel proprietary binary format that has not yet been reverse-
engineered. Today seismo-relay treats them as opaque blobs:
``WaveformStore.save_imported_idf`` stores the bytes verbatim and reads
all device-authoritative metadata from the paired ``.IDFW.txt`` /
``.IDFH.txt`` ASCII sidecar (parsed by ``idf_ascii_report.py``).
Decodes the Instantel Micromate Series IV ``.IDFW`` (waveform) and
``.IDFH`` (histogram) binary on-disk format. Sister module to
``minimateplus/event_file_io.py``.
When we crack the binary codec — same reverse-engineering playbook we
used to byte-perfect-parse Series III BW files (see
``docs/instantel_protocol_reference.md`` and ``minimateplus/event_file_io.py``)
— this module will grow:
Status (2026-05-28):
- ``read_idf_file(path) -> IdfEvent``
Parse a ``.IDFW``/``.IDFH`` binary and return a fully populated
``IdfEvent`` whose waveform-sample arrays come from the binary
(the .txt sidecar's tabular sample block being a best-effort
check). Lets us ingest Thor events even when the operator
hasn't enabled the .txt exporter — closing the
``had_report=False`` gap that the thor-watcher forwarder
currently tolerates as a known limitation.
- **Genuine Series IV / Thor binaries** are all signed
``00 12 01 00 00 00 Instantel\\0`` (sig-A in earlier notes). Two
Series III (Blastware) binaries appear in the example corpus
(``BE9439_*``) — they share the ``.IDFW``/``.IDFH`` extension by
filing convention but carry a BW STRT header (``10 00 01 80 00 00
Instantel STRT...``) and are NOT Thor data. The reader detects
them by signature and raises NotImplementedError pointing callers
at ``minimateplus.event_file_io.read_blastware_file()``.
- **IDFW waveform body** reuses the BW segment-rotated block codec
verbatim. Body always starts at file offset ``0x0f1f``. Samples
decoded via ``minimateplus.waveform_codec.decode_waveform_v2``
with 87-99% byte-exact match against ``.IDFW.txt`` sidecar (quiet
events). Loud events hit the BW codec's known walker-stops-early
limit. Residual ~3% drift on per-sample deltas — likely a
Thor-specific 12-bit delta refinement that BW's codec doesn't
model. Geo LSB = 0.0003 in/s; mic factor ~2.14e-6 psi/count.
- **IDFH histogram body**: 12-byte segment header
``[len_be 2B] 0a 00 00 00 [00 NN_counter] 05 3f`` introduces a
segment of ``N`` 72-byte interval records (``N = (len - 10) // 72``).
Each record holds 4 × 16-byte per-channel min/max/halfp + 8-byte
tail. Geo peaks via ``max(|min|, |max|) / 32768 × 10`` in/s
(matches sidecar within ~1.8%), freq via ``512 / halfp`` Hz.
**All 859 Thor IDFH files in the corpus decode (181,071 intervals).**
- Binary metadata directly extracted: serial, timestamp, sample_rate,
record_time, calibration_date. Other fields fall back to the paired
``.IDFW.txt`` / ``.IDFH.txt`` sidecar (consumed by
``WaveformStore.save_imported_idf``).
- ``write_idf_file(path, event)`` (eventually)
Round-trip event reconstruction, used for verifying the codec
against captured device files the way ``write_blastware_file``
verifies the Series III codec.
- Helpers for decoding the binary's per-channel sample arrays into
physical units, the per-event flash buffer's monitor-log records,
etc.
The reverse-engineering path: pair every ``.IDFW`` binary in
``thor-watcher/example-data/`` with its sibling ``.IDFW.txt``, treating
the txt's "Waveform Data Channels" block as ground-truth, and align
the binary's per-channel int16-or-similar arrays against it. Header
fields (sample rate, channel count, record time, timestamps) sit before
the sample block — same approach as the BW codec where ASCII strings
inside the binary (``Project:``, ``Client:``, etc.) anchored field
discovery.
The full reverse-engineering writeup lives in
``docs/idf_protocol_reference.md``.
"""
from __future__ import annotations
import datetime
import struct
from dataclasses import dataclass
from pathlib import Path
from typing import Union
from typing import Optional, Union
from .models import IdfEvent
from minimateplus.waveform_codec import decode_waveform_v2
from .models import IdfEvent, IdfPeaks, IdfReport
def read_idf_file(path: Union[str, Path]) -> "IdfEvent":
"""Parse a Thor ``.IDFW``/``.IDFH`` binary into an ``IdfEvent``.
# Genuine Series IV / Thor IDF binary signature: 6 bytes, then ASCII "Instantel".
_THOR_PREFIX = b"\x00\x12\x01\x00\x00\x00"
# Stray Series III (Blastware) binaries that occasionally turn up in Thor
# corpus directories renamed to the .IDFW/.IDFH convention. Their header
# (`10 00 01 80 00 00 Instantel STRT ...`) is byte-for-byte a BW SUB 5A
# STRT record, not a Thor binary. Detected so we can refuse-and-route
# rather than mis-parse.
_BW_STRAY_PREFIX = b"\x10\x00\x01\x80\x00\x00"
_INSTANTEL_TAG = b"Instantel"
Not yet implemented. When implemented, this will be the canonical
entry point for reading Thor binaries — the ASCII sidecar parser
becomes an optional fast-path metadata supplement rather than the
sole source of device-authoritative data.
# Most common body offset for sig-A IDFW files (~50% of prod events;
# 151/154 in the original tests/fixtures/THORDATA_example corpus). The
# body is the segment-rotated block stream consumed by decode_waveform_v2;
# bytes [0:3] are the magic ``00 02 00`` preamble. Production events
# routinely use other offsets — see :func:`_find_waveform_body_offset`
# for the dynamic scan. This constant survives only as the priority hint.
_BODY_START_SIG_A = 0x0F1F
# Magic bytes that mark a candidate waveform-body preamble.
_BODY_MAGIC = b"\x00\x02\x00"
# Where to start looking for body candidates inside the file. Skip the
# fixed-header region where the same magic legitimately appears inside
# channel-test records and the compliance block (offsets 0x015d, 0x091c,
# 0x0ae2, 0x0d30 in observed events).
_BODY_SCAN_FLOOR = 0x0E00
# Geophone count → in/s, derived from sidecar ground truth: the smallest
# non-zero sample in the 1,014-file corpus is 0.0003 in/s.
_GEO_LSB_IPS = 0.0003
# Microphone count → psi, derived from sidecar regression on 50 sample
# pairs from UM11719_20231219162723.IDFW (mic-heavy event).
_MIC_LSB_PSI = 2.14e-6
# IDFH histogram constants.
_IDFH_INTERVAL_SIZE = 72 # bytes per per-interval record
_IDFH_SEGMENT_HEADER = 10 # bytes: [len_be 2B][0a 00 00 00 4B][00 NN 2B][05 3f 2B]
_IDFH_SEGMENT_TAIL = 2 # bytes after the interval data block, before next marker
_IDFH_HALFP_FREQ_NUM = 512.0 # freq_hz = NUM / halfp; halfp ≤ 5 means ">100 Hz" sentinel
_IDFH_GEO_FULL_SCALE = 10.0 # in/s — Normal range
_IDFH_INT16_FS = 32768.0
_IDFH_CHANNELS = ("Tran", "Vert", "Long", "MicL")
# ─── Binary metadata extraction ─────────────────────────────────────────────
@dataclass
class IdfBinaryMetadata:
"""Fields recoverable from the sig-A binary header (no .txt needed)."""
serial: Optional[str] = None
event_datetime: Optional[datetime.datetime] = None
sample_rate: Optional[int] = None
record_time_sec: Optional[float] = None
calibration_date: Optional[datetime.date] = None
def _read_ascii_z(buf: bytes, off: int, maxlen: int = 64) -> Optional[str]:
if off >= len(buf):
return None
end = buf.find(b"\x00", off, off + maxlen)
if end < 0:
end = min(off + maxlen, len(buf))
s = buf[off:end].decode("ascii", errors="replace").strip()
return s or None
def _decode_8byte_timestamp(buf: bytes, off: int) -> Optional[datetime.datetime]:
"""Layout: ``[day][month][year_hi][year_lo][unknown][hour][min][sec]``."""
if off + 8 > len(buf):
return None
day, mon, yh, yl, _unk, hr, mn, sc = buf[off : off + 8]
year = (yh << 8) | yl
if not (2015 <= year <= 2050 and 1 <= mon <= 12 and 1 <= day <= 31
and 0 <= hr < 24 and 0 <= mn < 60 and 0 <= sc < 60):
return None
try:
return datetime.datetime(year, mon, day, hr, mn, sc)
except ValueError:
return None
def extract_binary_metadata(buf: bytes) -> IdfBinaryMetadata:
"""Pull serial/timestamp/sample_rate/record_time/calibration from the
sig-A binary header.
Field positions confirmed against UM11719_20231219162723.IDFW; stable
across the 151-file sig-A corpus.
"""
raise NotImplementedError(
"IDF binary codec not yet implemented; the .IDFW/.IDFH binary format "
"is undecoded. Use parse_idf_report() on the paired .txt sidecar "
"for device-authoritative metadata."
md = IdfBinaryMetadata()
# Serial: null-terminated ASCII at 0x14E.
md.serial = _read_ascii_z(buf, 0x14E, maxlen=16)
# Sample rate + record time live in a BW-compatible compliance block.
# Locate the 6-byte anchor `be 80 00 00 00 00` and read offsets relative
# to it: anchor-6 = sample_rate uint16 BE; anchor+6 = record_time float32 BE.
anchor = buf.find(b"\xbe\x80\x00\x00\x00\x00", 0x800, 0xA00)
if anchor > 0:
sr_bytes = buf[anchor - 6 : anchor - 4]
if len(sr_bytes) == 2:
sr = int.from_bytes(sr_bytes, "big")
if sr in (256, 512, 1024, 2048, 4096):
md.sample_rate = sr
rt_bytes = buf[anchor + 6 : anchor + 10]
if len(rt_bytes) == 4:
try:
rt = struct.unpack(">f", rt_bytes)[0]
if 0.1 <= rt <= 600.0:
md.record_time_sec = float(rt)
except struct.error:
pass
# Event timestamp: 8 bytes. Position differs between IDFW (0x97A) and
# IDFH (0x9F8); try both known offsets and accept the first valid decode.
for off in (0x97A, 0x9F8):
ts = _decode_8byte_timestamp(buf, off)
if ts is not None:
md.event_datetime = ts
break
# Calibration date: day, month, year_be at 0x194-0x197.
if len(buf) > 0x197:
day, mon = buf[0x194], buf[0x195]
year = int.from_bytes(buf[0x196 : 0x198], "big")
if 1 <= mon <= 12 and 1 <= day <= 31 and 2015 <= year <= 2050:
try:
md.calibration_date = datetime.date(year, mon, day)
except ValueError:
pass
return md
# ─── Sample decoder + unit conversion ───────────────────────────────────────
def _find_waveform_body_offset(buf: bytes) -> Optional[int]:
"""Pick the file offset of the waveform body by trial-decoding every
``00 02 00`` magic position past the fixed-header region.
The body's location isn't fixed across all sig-A IDFW files — about
half the production events use ``0x0f1f``, but the rest have offsets
that shift based on header padding / channel-config layout. We
auto-detect by:
1. Find every ``00 02 00`` occurrence past ``_BODY_SCAN_FLOOR``.
2. Try ``decode_waveform_v2()`` on each candidate.
3. Pick the offset whose decoded sample count is largest.
Returns the offset, or ``None`` if no candidate yielded more than
the trivial 2-sample preamble (= "no real body found").
Costs ~2-8 trial decodes per file; in practice the first candidate
past 0x0e00 is usually the right one.
"""
if len(buf) < _BODY_SCAN_FLOOR + 8:
return None
best: Optional[tuple[int, int]] = None # (total_samples, offset)
i = _BODY_SCAN_FLOOR
while True:
j = buf.find(_BODY_MAGIC, i)
if j < 0:
break
i = j + 1
try:
decoded = decode_waveform_v2(buf[j:])
except Exception:
continue
if not decoded:
continue
total = sum(len(v) for v in decoded.values())
# A "real" body has more than just the 2-sample preamble.
if total <= 2:
continue
if best is None or total > best[0]:
best = (total, j)
return best[1] if best else None
def _decode_waveform_samples(buf: bytes) -> Optional[dict]:
"""Decode samples from the sig-A waveform body.
Returns the raw decoder counts dict — geo LSB = 0.0003 in/s, mic in
its own count unit (see :func:`mic_count_to_psi`). Returns None if
no usable body is found.
Uses :func:`_find_waveform_body_offset` to locate the body — the
file-offset varies across events (~50% sit at the canonical
``0x0f1f`` but the rest don't), so the previous hardcoded constant
silently produced 2-sample preamble-only output for half the corpus.
"""
off = _find_waveform_body_offset(buf)
if off is None:
return None
return decode_waveform_v2(buf[off:])
def geo_count_to_ips(count: int) -> float:
"""Convert a Thor geo decoder count to in/s. LSB = 0.0003 in/s."""
return count * _GEO_LSB_IPS
def mic_count_to_psi(count: int) -> float:
"""Convert a Thor mic decoder count to psi. Scale derived from
regression over 50 sample pairs in UM11719_20231219162723.IDFW;
consistent to ~5%. Calibration constants from the channel block
can refine this once decoded.
"""
return count * _MIC_LSB_PSI
# ─── IDFH histogram decoder ─────────────────────────────────────────────────
@dataclass
class IdfhInterval:
"""One decoded histogram interval (typically one minute of monitoring)."""
offset: int # file byte offset of the 72-byte record
# Per-channel min/max ADC counts (int16 BE), half-period samples, peak count.
# Peak = max(|min|, |max|). freq_hz = 512/halfp (None if halfp ≤ 5 →
# ">100 Hz" sentinel; matches sidecar convention).
tran_min: int
tran_max: int
tran_halfp: int
vert_min: int
vert_max: int
vert_halfp: int
long_min: int
long_max: int
long_halfp: int
micl_min: int
micl_max: int
micl_halfp: int
def peak_count(self, channel: str) -> int:
mn = getattr(self, f"{channel.lower()}_min")
mx = getattr(self, f"{channel.lower()}_max")
return max(abs(mn), abs(mx))
def peak_ips(self, channel: str) -> float:
"""Convert peak count to in/s (geo channels only)."""
return self.peak_count(channel) / _IDFH_INT16_FS * _IDFH_GEO_FULL_SCALE
def freq_hz(self, channel: str) -> Optional[float]:
halfp = getattr(self, f"{channel.lower()}_halfp")
if halfp <= 5:
return None
return _IDFH_HALFP_FREQ_NUM / halfp
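The half-period rule in the comment above (freq_hz = 512/halfp, with halfp ≤ 5 treated as the ">100 Hz" sentinel) can be sketched on its own. The constant names here are illustrative stand-ins, not the module's identifiers.

```python
from typing import Optional

HALFP_FREQ_NUM = 512.0   # numerator from the comment "freq_hz = 512/halfp"
HALFP_SENTINEL_MAX = 5   # halfp <= 5 is reported as ">100 Hz"


def halfp_to_freq_hz(halfp: int) -> Optional[float]:
    """Half-period sample count -> frequency, or None for the >100 Hz sentinel."""
    if halfp <= HALFP_SENTINEL_MAX:
        return None
    return HALFP_FREQ_NUM / halfp


def peak_count(mn: int, mx: int) -> int:
    """Peak magnitude of an interval from its signed min/max counts."""
    return max(abs(mn), abs(mx))


freqs = {h: halfp_to_freq_hz(h) for h in (5, 6, 16, 64)}
print(freqs)
print(peak_count(-120, 75))
```

Note that the smallest non-sentinel half-period (6) maps to about 85.3 Hz, so under this rule no decoded frequency falls between 85.3 Hz and the sentinel.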
def _decode_idfh_interval(buf72: bytes, offset: int) -> IdfhInterval:
"""Decode one 72-byte interval record into per-channel min/max/halfp."""
import struct
fields = []
for i in range(4):
block = buf72[i * 16 : (i + 1) * 16]
mn = struct.unpack_from(">h", block, 0)[0]
mx = struct.unpack_from(">h", block, 2)[0]
# block[4:6] = int16 BE, role unknown (possibly time-of-peak)
halfp = struct.unpack_from(">H", block, 6)[0]
# block[10:12] and block[14:16] are uint16 BE with unknown semantics
# (likely sum / count contributions for the PVS computation).
fields.extend([mn, mx, halfp])
# Tail 8 bytes (buf72[64:72]) carry PVS-related data; not yet decoded.
return IdfhInterval(
offset=offset,
tran_min=fields[0], tran_max=fields[1], tran_halfp=fields[2],
vert_min=fields[3], vert_max=fields[4], vert_halfp=fields[5],
long_min=fields[6], long_max=fields[7], long_halfp=fields[8],
micl_min=fields[9], micl_max=fields[10], micl_halfp=fields[11],
)
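To make the record layout concrete, the following sketch builds a synthetic 72-byte interval record and decodes it with the same offsets the function above uses (min at 0, max at 2, half-period at 6 within each 16-byte channel block). `make_block` and `decode_block` are hypothetical helpers, and the undecoded fields are simply zero-filled.

```python
import struct


def decode_block(block: bytes) -> tuple[int, int, int]:
    """Return (min, max, halfp) from one 16-byte channel block."""
    mn = struct.unpack_from(">h", block, 0)[0]     # int16 big-endian
    mx = struct.unpack_from(">h", block, 2)[0]     # int16 big-endian
    halfp = struct.unpack_from(">H", block, 6)[0]  # uint16 big-endian
    return mn, mx, halfp


def make_block(mn: int, mx: int, halfp: int) -> bytes:
    """Build a synthetic block; fields with unknown semantics are zeroed."""
    return struct.pack(">hhhH", mn, mx, 0, halfp) + bytes(8)


# Tran, Vert, Long, MicL in file order (made-up values).
channels = [(-120, 75, 16), (-30, 44, 20), (-9, 12, 5), (-2, 3, 64)]
record = b"".join(make_block(*c) for c in channels) + bytes(8)  # 64 + 8-byte tail

decoded = [decode_block(record[i * 16:(i + 1) * 16]) for i in range(4)]
print(len(record), decoded)
```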
def decode_idfh_body(buf: bytes) -> list:
"""Walk an IDFH file and decode every interval record.
The body has one or more segments; each segment header is 12 bytes:
``[length_be 2B][0a 00 00 00][00 NN_counter][05 3f]`` where ``length``
is bytes from the magic through the end of the interval block
(= 10 + 72 × n_intervals). Segments are separated by a 2-byte tail
+ next-segment 2-byte prefix (the bytes before the next length field).
Confirmed against the 859-file corpus (181,071 intervals decoded; 1
failure is the sig-B BE9439 file).
"""
intervals: list = []
i = 0
while True:
j = buf.find(b"\x0a\x00\x00\x00", i)
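if j < 0:
break
# Need the 2-byte length before the magic and 8 readable bytes from it;
# otherwise skip this hit instead of aborting the scan or indexing past
# the end of the buffer.
if j < 2 or j + 8 > len(buf):
i = j + 1
continue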
# Validate: [length_be][0a 00 00 00][00 NN][05 3f]
if buf[j + 4] != 0x00 or buf[j + 6 : j + 8] != b"\x05\x3f":
i = j + 1
continue
length = int.from_bytes(buf[j - 2 : j], "big")
n = (length - _IDFH_SEGMENT_HEADER) // _IDFH_INTERVAL_SIZE
if n <= 0:
i = j + 1
continue
header_start = j - 2
interval_start = header_start + _IDFH_SEGMENT_HEADER
for k in range(n):
off = interval_start + k * _IDFH_INTERVAL_SIZE
if off + _IDFH_INTERVAL_SIZE > len(buf):
break
chunk = buf[off : off + _IDFH_INTERVAL_SIZE]
intervals.append(_decode_idfh_interval(chunk, off))
# Advance past this segment + the 2-byte tail.
i = header_start + length + _IDFH_SEGMENT_TAIL
return intervals
# ─── Top-level reader ───────────────────────────────────────────────────────
@dataclass
class IdfReadResult:
"""Return type for :func:`read_idf_file`.
For waveforms (``.IDFW``), ``samples`` holds the per-channel sample
arrays in Thor decoder counts. For histograms (``.IDFH``),
``samples`` is empty and ``intervals`` holds the per-interval
record list (peaks, freqs).
"""
event: IdfEvent
samples: dict # {"Tran": [...], ...} for IDFW; empty for IDFH
binary_metadata: IdfBinaryMetadata
signature: str # always "thor" for now (sig-A genuine Thor)
intervals: Optional[list] = None # list[IdfhInterval] for IDFH; None for IDFW
def read_idf_file(
path: Union[str, Path],
*,
data: Optional[bytes] = None,
) -> IdfReadResult:
"""Parse a Thor ``.IDFW`` or ``.IDFH`` binary into an ``IdfEvent`` plus
decoded samples (waveforms) or interval records (histograms).
Only signature-A (genuine Thor) files are decoded. A Series III
(Blastware) STRT header in an IDF-named container raises
NotImplementedError, and any other signature raises ValueError; for
files this reader rejects, use the paired ``.IDFW.txt`` /
``.IDFH.txt`` sidecar via ``parse_idf_report()``.
Returns an :class:`IdfReadResult`. The caller converts int sample
counts to physical units via :func:`geo_count_to_ips` /
:func:`mic_count_to_psi`.
``path`` is used for filename in error messages and ``.IDFH`` vs
``.IDFW`` suffix detection. When ``data`` is supplied the disk
read is skipped — useful for ingest paths that already have the
bytes in memory and where the file may not exist on disk yet.
"""
p = Path(path)
buf = data if data is not None else p.read_bytes()
if len(buf) < 16 or buf[6:16] != _INSTANTEL_TAG + b"\x00":
raise ValueError(f"{p.name}: not an IDF file (missing Instantel magic)")
sig_prefix = buf[:6]
if sig_prefix == _THOR_PREFIX:
signature = "thor"
elif sig_prefix == _BW_STRAY_PREFIX:
raise NotImplementedError(
f"{p.name}: file has a Series III (Blastware) STRT header in "
"an IDF-named container — not a Thor binary. Route through "
"minimateplus.event_file_io.read_blastware_file() instead "
"(peaks decode; samples & full metadata don't, but it's not "
"Thor data so the Thor codec doesn't apply)."
)
else:
raise ValueError(f"{p.name}: unknown IDF signature {sig_prefix.hex()}")
is_histogram = p.suffix.upper() == ".IDFH"
md = extract_binary_metadata(buf)
if is_histogram:
intervals = decode_idfh_body(buf)
if not intervals:
raise ValueError(f"{p.name}: IDFH body decoded no intervals")
# Peaks: max across all intervals on each channel (per-channel max
# of stored max-magnitudes; sidecar's PPV row carries the same).
peak_tran = max((iv.peak_ips("Tran") for iv in intervals), default=0.0)
peak_vert = max((iv.peak_ips("Vert") for iv in intervals), default=0.0)
peak_long = max((iv.peak_ips("Long") for iv in intervals), default=0.0)
# Mic peak in psi — Thor stores per-interval mic ADC counts in the
# binary; convert the max count to psi via the per-count factor.
mic_peak_count = max((iv.peak_count("MicL") for iv in intervals), default=0)
mic_peak_psi = mic_count_to_psi(mic_peak_count) if mic_peak_count else None
rep = IdfReport(
serial_number=md.serial,
event_type="Full Histogram",
event_datetime=md.event_datetime,
filename=p.name,
sample_rate=md.sample_rate,
record_time_sec=md.record_time_sec,
)
peaks = IdfPeaks(
transverse_ips=peak_tran,
vertical_ips=peak_vert,
longitudinal_ips=peak_long,
peak_vector_sum_ips=None,
mic_pspl_dbl=None, # IDFH binary doesn't carry the dB(L) value
mic_pspl_psi=mic_peak_psi,
)
event = IdfEvent(
serial=md.serial or "UNKNOWN",
timestamp=md.event_datetime or datetime.datetime(1970, 1, 1),
kind="Histogram",
filename=p.name,
sample_rate=md.sample_rate,
record_time_sec=md.record_time_sec,
peaks=peaks,
report=rep,
)
return IdfReadResult(
event=event,
samples={},
binary_metadata=md,
signature=signature,
intervals=intervals,
)
# Waveform path.
decoded = _decode_waveform_samples(buf)
if decoded is None:
raise ValueError(f"{p.name}: waveform body codec failed")
rep = IdfReport(
serial_number=md.serial,
event_type="Full Waveform",
event_datetime=md.event_datetime,
filename=p.name,
sample_rate=md.sample_rate,
record_time_sec=md.record_time_sec,
)
def _peak_ips(ch: str) -> float:
arr = decoded.get(ch, [])
return geo_count_to_ips(max((abs(v) for v in arr), default=0))
# Mic peak psi from binary: max absolute MicL ADC count × 2.14e-6 psi/count.
mic_arr = decoded.get("MicL", [])
mic_peak_count = max((abs(v) for v in mic_arr), default=0)
mic_peak_psi = mic_count_to_psi(mic_peak_count) if mic_peak_count else None
peaks = IdfPeaks(
transverse_ips=_peak_ips("Tran"),
vertical_ips=_peak_ips("Vert"),
longitudinal_ips=_peak_ips("Long"),
# PVS requires aligned per-sample √(T²+V²+L²); leave None — the
# sidecar carries it and the bridge picks it up if present.
peak_vector_sum_ips=None,
mic_pspl_dbl=None, # binary IDFW doesn't carry the dB(L) value;
# sidecar .txt fills it via IdfReport.from_dict
mic_pspl_psi=mic_peak_psi,
)
event = IdfEvent(
serial=md.serial or "UNKNOWN",
timestamp=md.event_datetime or datetime.datetime(1970, 1, 1),
kind="Waveform",
filename=p.name,
sample_rate=md.sample_rate,
record_time_sec=md.record_time_sec,
peaks=peaks,
report=rep,
)
return IdfReadResult(
event=event,
samples=decoded,
binary_metadata=md,
signature=signature,
)
+323
@@ -0,0 +1,323 @@
"""
micromate/idf_to_bw_report.py — adapter that projects a parsed Thor IDF
report (+ binary metadata + decoded IDFH intervals) into the
``bw_report``-shaped dict that :mod:`sfm.report_pdf.gather_report_data`
consumes.
Lets Thor events flow through the existing Series III Event Report PDF
pipeline without duplicating the renderer. Thor's report content is
~95% the same data shape as BW's; the field names differ but the
underlying metrics map 1:1.
Caveats
───────
- **Mic units** — Thor records ``MicPSPL`` natively in dB(L). This
adapter sets ``bw_report.mic.pspl_dbl`` directly; the report
renderer recomputes the equivalent psi via its dBL→psi formula.
- **Saturation / above-range flags** — Thor doesn't always mark
``OORANGE`` the way BW does; we set ``zc_freq_above_range`` only
when a ``>100`` sentinel was preserved in the raw text.
- **Per-interval data** — for IDFH events we build ``interval_times``
by stepping ``IntervalSize`` from ``HistogramStartTime``; the binary
decoder confirms one record per step (882 / 881 / 881 ... across
the corpus).
- **calibration_by parsing** — Thor's free-form ``Calibration : November
22, 2023 by Instantel`` is split on ``" by "`` to extract the
calibrator; the date prefix is parsed where possible, otherwise
the binary-extracted ``calibration_date`` from
:class:`micromate.idf_file.IdfBinaryMetadata` wins.
"""
from __future__ import annotations
import datetime
import re
from typing import Any, Dict, List, Optional
# ─── Helpers ────────────────────────────────────────────────────────────────
_NUM_RE = re.compile(r"-?\d+(?:\.\d+)?")
def _parse_first_number(s: Optional[str]) -> Optional[float]:
"""Pull the first numeric token from a string like ``"0.1500 in/s"``."""
if s is None:
return None
m = _NUM_RE.search(str(s))
if not m:
return None
try:
return float(m.group(0))
except ValueError:
return None
def _parse_interval_size_s(s: Optional[str]) -> Optional[float]:
"""``"60 sec"`` → 60.0, ``"5 min"`` → 300.0, ``"1 hour"`` → 3600.0."""
if s is None:
return None
num = _parse_first_number(s)
if num is None:
return None
sl = str(s).lower()
if "hour" in sl or "hr" in sl:
return num * 3600.0
if "min" in sl:
return num * 60.0
return num # default to seconds
def _parse_calibration(text: Optional[str]) -> tuple[Optional[str], Optional[str]]:
"""Split ``"November 22, 2023 by Instantel"`` → (ISO date, calibrator).
Returns ``(None, None)`` if neither half parses.
"""
if not text:
return None, None
parts = str(text).split(" by ", 1)
date_part = parts[0].strip() if parts else None
by_part = parts[1].strip() if len(parts) > 1 else None
iso_date: Optional[str] = None
if date_part:
for fmt in ("%B %d, %Y", "%b %d, %Y", "%Y-%m-%d", "%m/%d/%Y"):
try:
iso_date = datetime.datetime.strptime(date_part, fmt).date().isoformat()
break
except ValueError:
continue
return iso_date, by_part
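A standalone sketch of the calibration split may help show which inputs produce which halves. It mirrors the format list above but uses `str.partition` instead of `split`, which is a small variation rather than the module's code; `parse_calibration` is a hypothetical name.

```python
import datetime
from typing import Optional

_DATE_FORMATS = ("%B %d, %Y", "%b %d, %Y", "%Y-%m-%d", "%m/%d/%Y")


def parse_calibration(text: Optional[str]) -> tuple[Optional[str], Optional[str]]:
    """Split 'DATE by WHO' into (ISO date or None, calibrator or None)."""
    if not text:
        return None, None
    date_part, sep, by_part = str(text).partition(" by ")
    iso: Optional[str] = None
    for fmt in _DATE_FORMATS:
        try:
            iso = datetime.datetime.strptime(date_part.strip(), fmt).date().isoformat()
            break
        except ValueError:
            continue
    # sep is empty when " by " was absent, so there is no calibrator half.
    return iso, (by_part.strip() if sep else None)


full = parse_calibration("November 22, 2023 by Instantel")
date_only = parse_calibration("2023-11-22")
by_only = parse_calibration("sometime last year by Acme")
print(full, date_only, by_only)
```

The month-name formats (`%B`, `%b`) depend on the process locale, so this assumes the default English names.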
def _channel_peaks(idf: Dict[str, Any], ch_lc: str) -> Dict[str, Any]:
"""Map ``tran_ppv`` / ``tran_zc_freq`` / ... → bw_report.peaks.tran shape."""
out: Dict[str, Any] = {}
for src, dst in (
(f"{ch_lc}_ppv", "ppv_ips"),
(f"{ch_lc}_zc_freq", "zc_freq_hz"),
(f"{ch_lc}_time_of_peak", "time_of_peak_s"),
(f"{ch_lc}_peak_acceleration", "peak_accel_g"),
(f"{ch_lc}_peak_displacement", "peak_disp_in"),
):
v = idf.get(src)
if v is not None:
out[dst] = v
# ZC freq ">100" sentinel: the raw text carries it under the un-typed
# key (e.g. ``raw["tran_zc_freq"]`` would be ``">100"``), and our parser
# dropped the typed entry. Detect that case and flag.
raw_zc = idf.get(f"{ch_lc}_zc_freq")
if isinstance(raw_zc, str) and ">" in raw_zc:
out["zc_freq_above_range"] = True
out.pop("zc_freq_hz", None)
return out
def _sensor_check(idf: Dict[str, Any], ch_lc: str) -> Dict[str, Any]:
out: Dict[str, Any] = {}
fr = idf.get(f"{ch_lc}_test_freq")
if fr is not None:
out["freq_hz"] = _parse_first_number(fr)
rt = idf.get(f"{ch_lc}_test_ratio")
if rt is not None:
out["ratio"] = _parse_first_number(rt)
am = idf.get(f"{ch_lc}_test_amplitude")
if am is not None:
out["amplitude_mv"] = _parse_first_number(am)
res = idf.get(f"{ch_lc}_test_results")
if res is not None:
out["result"] = str(res).strip()
return {k: v for k, v in out.items() if v is not None}
def _interval_times(idf: Dict[str, Any], n_intervals: Optional[int]) -> List[str]:
"""Synthesise per-interval timestamps as start + interval_size × (k + 1),
i.e. each stamp marks the end of interval k (k = 0 … n_intervals − 1).
Returns ``[]`` when start time or interval size is unknown.
"""
if not n_intervals:
return []
start_date = idf.get("histogram_start_date") or idf.get("event_date")
start_time = idf.get("histogram_start_time") or idf.get("event_time")
iv_str = idf.get("interval_size")
iv_s = _parse_interval_size_s(iv_str)
if not (start_date and start_time and iv_s):
return []
try:
t0 = datetime.datetime.strptime(f"{start_date} {start_time}", "%Y-%m-%d %H:%M:%S")
except ValueError:
return []
out = []
for k in range(int(n_intervals)):
t = t0 + datetime.timedelta(seconds=iv_s * (k + 1))
out.append(t.isoformat())
return out
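The stepping rule used above can be checked with a small standalone sketch. The start time and step are made-up values, and `interval_end_times` is a hypothetical name for illustration.

```python
import datetime


def interval_end_times(start: datetime.datetime, step_s: float, n: int) -> list[str]:
    """ISO timestamps at start + step * (k + 1) for k in 0..n-1."""
    return [
        (start + datetime.timedelta(seconds=step_s * (k + 1))).isoformat()
        for k in range(n)
    ]


t0 = datetime.datetime(2025, 8, 4, 19, 0, 47)   # illustrative start time
stamps = interval_end_times(t0, 60.0, 3)
print(stamps)
```

Because the offset is `k + 1`, the first stamp is one full interval after the start, so each stamp labels the end of its interval rather than the beginning.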
# ─── Top-level adapter ──────────────────────────────────────────────────────
def build_bw_report_from_idf(
idf_report: Dict[str, Any],
*,
binary_md=None,
intervals: Optional[list] = None,
is_histogram: Optional[bool] = None,
) -> Dict[str, Any]:
"""Project a parsed IDF report dict (and optional binary metadata +
decoded IDFH intervals) into the BW report sidecar shape.
The returned dict is structurally identical to what
``minimateplus.event_file_io._bw_report_to_dict`` produces from a
real BW ASCII report — it can be assigned to
``sidecar["bw_report"]`` and consumed verbatim by
``sfm.report_pdf.gather_report_data``.
``intervals`` is the list of :class:`micromate.idf_file.IdfhInterval`
objects from :func:`micromate.idf_file.decode_idfh_body`; only used
for histogram events to derive accurate ``interval_times``.
"""
if is_histogram is None:
et = str(idf_report.get("event_type", ""))
is_histogram = et.lower().startswith("full histogram")
# ── Trigger / recording / device ─────────────────────────────────────
trigger_channel = idf_report.get("trigger")
trigger_level = _parse_first_number(idf_report.get("geo_trigger_level"))
geo_range_ips = _parse_first_number(idf_report.get("geo_range"))
cal_iso, cal_by = _parse_calibration(idf_report.get("calibration"))
# Prefer the binary-extracted calibration_date when our text parse fell
# through; the binary date is unambiguous.
if cal_iso is None and binary_md is not None and binary_md.calibration_date:
cal_iso = binary_md.calibration_date.isoformat()
# ── Histogram fields ────────────────────────────────────────────────
hist_block: Dict[str, Any] = {
"start": None, "stop": None, "n_intervals": None,
"interval_size": None, "interval_size_s": None,
"channel_peak_when": {},
}
if is_histogram:
sd = idf_report.get("histogram_start_date")
st = idf_report.get("histogram_start_time")
if sd and st:
try:
hist_block["start"] = datetime.datetime.strptime(
f"{sd} {st}", "%Y-%m-%d %H:%M:%S"
).isoformat()
except ValueError:
pass
ed = idf_report.get("histogram_stop_date")
et_ = idf_report.get("histogram_stop_time")
if ed and et_:
try:
hist_block["stop"] = datetime.datetime.strptime(
f"{ed} {et_}", "%Y-%m-%d %H:%M:%S"
).isoformat()
except ValueError:
pass
n_raw = idf_report.get("number_of_intervals")
if n_raw is not None:
try:
# Thor reports a float like "81.04"; truncate to int (the BW
# report uses an int for the column).
hist_block["n_intervals"] = int(float(str(n_raw)))
except ValueError:
pass
# When the binary decoder gave us the actual interval count, prefer it.
if intervals is not None:
hist_block["n_intervals"] = len(intervals)
hist_block["interval_size"] = idf_report.get("interval_size")
hist_block["interval_size_s"] = _parse_interval_size_s(idf_report.get("interval_size"))
# interval_times derived from start+step (the BW report uses the
# exact strings; we match its representation).
times = _interval_times(idf_report, hist_block["n_intervals"])
# Per-channel peak when (absolute date+time at which the channel's
# peak occurred over the histogram run). Thor splits this into
# ``TranPeakDate`` / ``TranPeakTime`` etc.
peak_when: Dict[str, str] = {}
for ch_label, ch_lc in (("Tran", "tran"), ("Vert", "vert"), ("Long", "long"), ("MicL", "mic")):
d = idf_report.get(f"{ch_lc}_peak_date")
t = idf_report.get(f"{ch_lc}_peak_time")
if d and t:
try:
peak_when[ch_label] = datetime.datetime.strptime(
f"{d} {t}", "%Y-%m-%d %H:%M:%S"
).isoformat()
except ValueError:
continue
if peak_when:
hist_block["channel_peak_when"] = peak_when
# ── Mic block ────────────────────────────────────────────────────────
mic_block = {
"weighting": "L", # Thor mic is ISEE Linear
"pspl_dbl": idf_report.get("mic_ppv"), # the dB(L) float
"pspl_saturated": False,
"zc_freq_hz": idf_report.get("mic_zc_freq"),
"zc_freq_above_range": isinstance(idf_report.get("mic_zc_freq"), str)
and ">" in str(idf_report.get("mic_zc_freq")),
"time_of_peak_s": idf_report.get("mic_time_of_peak"),
}
if mic_block["zc_freq_above_range"]:
mic_block["zc_freq_hz"] = None
# ── Peaks ────────────────────────────────────────────────────────────
vs_block = {
"ips": idf_report.get("peak_vector_sum"),
"time_s": _parse_first_number(idf_report.get("peak_vector_sum_time_sum")),
"when": None,
"saturated": False,
}
if is_histogram:
# PVS absolute date+time, when present.
vs_d = idf_report.get("peak_vector_sum_date")
vs_t = idf_report.get("peak_vector_sum_time")
if vs_d and vs_t:
try:
vs_block["when"] = datetime.datetime.strptime(
f"{vs_d} {vs_t}", "%Y-%m-%d %H:%M:%S"
).isoformat()
except ValueError:
pass
return {
"available": True,
"event_type": idf_report.get("event_type"),
"version": idf_report.get("version"),
"trigger": {
"channel": trigger_channel,
"geo_level_ips": trigger_level,
},
"recording": {
"sample_rate_sps": idf_report.get("sample_rate"),
"record_time_s": idf_report.get("record_time_sec"),
"pretrig_s": idf_report.get("pre_trigger_sec"),
"stop_mode": idf_report.get("record_stop_mode"),
"geo_range_ips": geo_range_ips,
"units": idf_report.get("units"),
},
"device": {
"battery_volts": idf_report.get("battery_volts"),
"calibration_date": cal_iso,
"calibration_by": cal_by,
},
"peaks": {
"tran": _channel_peaks(idf_report, "tran"),
"vert": _channel_peaks(idf_report, "vert"),
"long": _channel_peaks(idf_report, "long"),
"vector_sum": vs_block,
},
"mic": mic_block,
"sensor_check": {
"tran": _sensor_check(idf_report, "tran"),
"vert": _sensor_check(idf_report, "vert"),
"long": _sensor_check(idf_report, "long"),
"mic": _sensor_check(idf_report, "mic"),
},
"histogram": hist_block,
"monitor_log": [],
"pc_sw_version": None,
}
+27 -6
@@ -159,12 +159,23 @@ class IdfReport:
@dataclass
class IdfPeaks:
-"""Geophone + mic peak values for one Thor event. Native Thor units."""
+"""Geophone + mic peak values for one Thor event. Native Thor units.
+Thor stores the mic peak in two parallel forms: ``mic_pspl_dbl`` is
+what the sidecar's top-level ``MicPSPL`` header field carries (dB(L)),
+used in the report header. ``mic_pspl_psi`` is the psi value derived
+either from the IDFW sample table / IDFH interval column 9, or from
+the binary mic counts (~2.14e-6 psi/count). Needed because the
+BW-shaped ``PeakValues.micl`` consumed by ``event_hdf5.write_event_hdf5``
+expects psi; feeding it dB(L) makes the h5 mic-chart scale factor
+blow up.
+"""
transverse_ips: Optional[float] = None # in/s
vertical_ips: Optional[float] = None # in/s
longitudinal_ips: Optional[float] = None # in/s
peak_vector_sum_ips: Optional[float] = None # in/s
mic_pspl_dbl: Optional[float] = None # dB(L)
mic_pspl_psi: Optional[float] = None # psi
@dataclass
@@ -324,10 +335,14 @@ class IdfEvent:
machinery without those code paths needing to know about Thor.
Caveats of the bridge:
-- ``mic_ppv`` on the produced Event carries Thor's dB(L) value
-verbatim; the UI distinguishes via the ``device_family``
-column (Phase 1). Don't run the BW psi→dBL converter on
-Series IV rows.
+- ``PeakValues.micl`` carries the mic peak in **psi** (matching
+BW's convention), set from :attr:`IdfPeaks.mic_pspl_psi`,
+with a dB(L)→psi fallback when only the dB(L) value is
+available. This is what the h5 writer's mic-scale-factor
+logic needs. The dB(L) value still flows through
+``bw_report.mic.pspl_dbl`` (set by the
+``idf_to_bw_report`` adapter) and the renderer reads it
+from there for the report header.
- Many Thor-specific fields (Peak Acceleration / Displacement,
sensor self-check, calibration) don't have a slot in
``Event``. The full IdfReport is preserved on the
@@ -349,11 +364,17 @@ class IdfEvent:
minute=self.timestamp.minute,
second=self.timestamp.second,
)
# Resolve mic peak as psi. Priority: binary-derived mic_pspl_psi
# (set by read_idf_file) > dB(L)→psi fallback via standard formula
# (psi = 2.9e-9 × 10^(dBL/20)) > None.
mic_psi = self.peaks.mic_pspl_psi
if mic_psi is None and self.peaks.mic_pspl_dbl is not None:
mic_psi = 2.9e-9 * (10.0 ** (self.peaks.mic_pspl_dbl / 20.0))
pv = PeakValues(
tran=self.peaks.transverse_ips,
vert=self.peaks.vertical_ips,
long=self.peaks.longitudinal_ips,
-micl=self.peaks.mic_pspl_dbl, # dB(L); see caveat above
+micl=mic_psi, # psi, matching BW's convention (h5 scaling depends on this)
peak_vector_sum=self.peaks.peak_vector_sum_ips,
)
pi = ProjectInfo(
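The dB(L)→psi fallback in the bridge above uses psi = 2.9e-9 × 10^(dBL/20), where 2.9e-9 psi is the 20 µPa reference pressure. A minimal sketch of the conversion and its inverse follows; the function names are illustrative, and 107.7 dB(L) is just a sample input.

```python
import math

P_REF_PSI = 2.9e-9   # 20 µPa expressed in psi (20e-6 Pa / 6894.76 Pa per psi)


def dbl_to_psi(dbl: float) -> float:
    """Linear sound pressure level in dB(L) -> peak pressure in psi."""
    return P_REF_PSI * 10.0 ** (dbl / 20.0)


def psi_to_dbl(psi: float) -> float:
    """Inverse conversion; psi must be positive."""
    return 20.0 * math.log10(psi / P_REF_PSI)


psi = dbl_to_psi(107.7)
print(f"{psi:.3e} psi")                    # on the order of 7e-4 psi
print(f"{psi_to_dbl(psi):.1f} dB(L)")
```

Treating the dB(L) number itself as psi is therefore wrong by several orders of magnitude, which is why the bridge passes psi to `PeakValues.micl`.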
+227 -11
@@ -60,6 +60,18 @@ class ChannelStats:
time_of_peak_s: Optional[float] = None # seconds (relative to trigger; can be negative)
peak_accel_g: Optional[float] = None # g (geo channels only)
peak_disp_in: Optional[float] = None # in (geo channels only)
# When BW writes "OORANGE" (Out Of Range — truncated) for a PPV
# value, the true peak exceeded the channel's full-scale range.
# We substitute the range max (e.g. 10.000 in/s for Normal range)
# as a lower bound, and flag here so downstream UI / alerts know
# to render "> 10 in/s" or "saturated" instead of trusting the
# value as an exact measurement.
ppv_saturated: bool = False
# Set when BW writes ">100 Hz" for ZC Freq — the zero-crossing
# algorithm's peak frequency exceeded the device's reporting
# ceiling (typically 100 Hz on V10.72). zc_freq_hz gets the
# threshold (100.0) as a lower bound; downstream UI renders ">100".
zc_freq_above_range: bool = False
@dataclass
@@ -69,6 +81,14 @@ class MicStats:
pspl_dbl: Optional[float] = None # dB(L)
zc_freq_hz: Optional[float] = None
time_of_peak_s: Optional[float] = None
# Set when BW writes "OORANGE" for PSPL, i.e. the mic exceeded its
# measurement range. pspl_dbl gets 140 dBL as a conservative floor
# (typical NL-43 max; some units cap at 148), so it is a lower bound on
# the true level. Consumers should render "> 140 dB(L)" or similar.
pspl_saturated: bool = False
# Same semantics as ChannelStats.zc_freq_above_range — mic ZC
# peak exceeded device reporting ceiling.
zc_freq_above_range: bool = False
@dataclass
@@ -92,6 +112,35 @@ class MonitorLogEntry:
description: Optional[str] = None
# BW saturation marker — appears in PPV / Peak Vector Sum / similar
# numeric fields when the underlying measurement exceeded the
# channel's full-scale range (e.g., a geophone reading > 10 in/s at
# Normal range, or a mic exceeding its sensitivity ceiling). Treated
# as "≥ range_max" + a saturated flag rather than discarded.
# Appears as: ``"Tran PPV : OORANGE in/s"``
_OORANGE_MARKERS = ("OORANGE", "OUT OF RANGE")
def _is_oorange(value: str) -> bool:
"""True when a BW numeric field is an Out-Of-Range saturation marker."""
s = value.strip().upper()
return any(m in s for m in _OORANGE_MARKERS)
def _parse_above_range(value: str) -> Optional[float]:
"""For BW "above-range" markers like ">100 Hz", return the threshold.
BW writes ZC Freq as ">100 Hz" when the zero-crossing algorithm sees
a peak too fast to count (device cuts off at 100 Hz). Returns the
numeric portion after the '>' (e.g. 100.0), or None if `value` is
not an above-range marker.
"""
s = value.strip()
if not s.startswith(">"):
return None
return _parse_number(s[1:])
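The two marker helpers above can be exercised together in a short sketch. `parse_number` is a hypothetical stand-in for the module's `_parse_number` (whose body is not shown in this diff), and `parse_ppv` is an illustrative wrapper that mirrors the "substitute the range max and flag" behaviour described in the comments.

```python
import re
from typing import Optional

_OORANGE_MARKERS = ("OORANGE", "OUT OF RANGE")
_NUM_RE = re.compile(r"-?\d+(?:\.\d+)?")


def parse_number(value: str) -> Optional[float]:
    """Hypothetical stand-in: first numeric token in the string, if any."""
    m = _NUM_RE.search(value)
    return float(m.group(0)) if m else None


def is_oorange(value: str) -> bool:
    s = value.strip().upper()
    return any(m in s for m in _OORANGE_MARKERS)


def parse_above_range(value: str) -> Optional[float]:
    s = value.strip()
    if not s.startswith(">"):
        return None
    return parse_number(s[1:])


def parse_ppv(value: str, geo_range_ips: float) -> tuple[Optional[float], bool]:
    """Return (ppv_ips, saturated); a saturated channel reports the range max."""
    if is_oorange(value):
        return geo_range_ips, True
    return parse_number(value), False


print(parse_ppv("OORANGE in/s", 10.0))
print(parse_ppv("0.1500 in/s", 10.0))
print(parse_above_range(">100 Hz"), parse_above_range("42.7 Hz"))
```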
@dataclass
class BwAsciiReport:
"""Structured representation of one BW per-event ASCII export."""
@@ -144,6 +193,29 @@ class BwAsciiReport:
# ── Vector sum ──────────────────────────────────────────────────────────
peak_vector_sum_ips: Optional[float] = None
peak_vector_sum_time_s: Optional[float] = None
# Saturation flag — set when BW writes "OORANGE" for the PVS. We
# then substitute sqrt(3) * geo_range_ips as a conservative upper
# bound (the theoretical maximum PVS when all 3 geo channels are
# simultaneously at full-scale). Consumers should display this as
# ">{value} in/s" or similar.
peak_vector_sum_saturated: bool = False
# Histograms additionally have an absolute date+time for the PVS
# (it occurred at a specific interval). Waveform reports show
# only the relative-time value above.
peak_vector_sum_when: Optional[datetime.datetime] = None
# ── Histogram-specific fields (populated only when Event Type starts
# with 'Histogram' / 'Full Histogram' / 'Histogram + Continuous') ──
histogram_start: Optional[datetime.datetime] = None
histogram_stop: Optional[datetime.datetime] = None
histogram_n_intervals: Optional[int] = None # e.g. 4, 1436
histogram_interval_size_str: Optional[str] = None # "1 minute" / "5 minutes" / "15 seconds"
histogram_interval_size_s: Optional[float] = None # parsed to seconds
# Per-channel absolute peak time+date (histogram-specific). For
# waveform events these are None — those reports use the channel's
# time_of_peak_s (relative to trigger) instead. Keyed by channel
# name ("Tran", "Vert", "Long", "MicL").
channel_peak_when: Dict[str, datetime.datetime] = field(default_factory=dict)
# ── Sensor self-check (per channel) ─────────────────────────────────────
sensor_check: Dict[str, SensorCheck] = field(default_factory=dict)
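The sqrt(3) × geo_range_ips substitute for a saturated PVS follows directly from the vector-sum definition √(T² + V² + L²) with all three channels at full scale. A small sketch, assuming a 10 in/s range purely for illustration:

```python
import math


def pvs(t: float, v: float, l: float) -> float:
    """Vector sum of three orthogonal geo channel values (in/s)."""
    return math.sqrt(t * t + v * v + l * l)


def pvs_saturation_bound(geo_range_ips: float) -> float:
    """Largest PVS representable when every channel sits at full scale."""
    return math.sqrt(3) * geo_range_ips


bound = pvs_saturation_bound(10.0)
print(f"{bound:.2f} in/s")
print(pvs(10.0, 0.0, 0.0) <= bound)
```

With a single channel at full scale and the others at zero the PVS is only 10 in/s, so the substitute is the ceiling of what in-range channels can produce, not a measurement.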
@@ -223,6 +295,46 @@ def _parse_event_date(s: str) -> Optional[datetime.date]:
return None
def _parse_iso_date(s: str) -> Optional[datetime.date]:
"""Parse "2026-05-16" → date. Histograms use ISO format for their
Start Date / Stop Date / Peak Date fields; waveforms use the
"May 8, 2026" long form which `_parse_event_date` handles."""
s = s.strip()
try:
return datetime.date.fromisoformat(s)
except ValueError:
return None
_INTERVAL_UNIT_SECONDS = {
"second": 1, "seconds": 1, "sec": 1, "secs": 1,
"minute": 60, "minutes": 60, "min": 60, "mins": 60,
"hour": 3600, "hours": 3600, "hr": 3600, "hrs": 3600,
}
def _parse_interval_size(s: str) -> Optional[float]:
"""Parse "1 minute" / "5 minutes" / "15 seconds" / "2 seconds" → seconds.
Handles the BW Compliance Setup → Histogram Interval values verbatim
("2 seconds", "5 seconds", "15 seconds", "1 minute", "5 minutes",
"15 minutes") plus a few defensive variants.
"""
if not s:
return None
parts = s.strip().split()
if len(parts) < 2:
return None
try:
n = float(parts[0])
except ValueError:
return None
unit_per_s = _INTERVAL_UNIT_SECONDS.get(parts[1].lower())
if unit_per_s is None:
return None
return n * unit_per_s
def _parse_event_time(s: str) -> Optional[datetime.time]:
"""Parse "15:56:35" → time."""
s = s.strip()
@@ -336,6 +448,15 @@ def parse_report(text: Union[str, bytes], *, parse_samples: bool = False) -> BwA
in_user_notes_block = False
user_note_position = 0
# Histogram-field staging — BW writes <Channel> Peak Time and
# <Channel> Peak Date on separate lines (and similarly Histogram
# Start Time / Date). We stash the partial value when the time
# line arrives and combine it when the matching date line arrives.
_hist_start_time: Optional[datetime.time] = None
_hist_stop_time: Optional[datetime.time] = None
_pending_peak_time: Dict[str, Optional[datetime.time]] = {}
_pvs_time_raw: Optional[str] = None # last Peak Vector Sum Time value, raw
while i < n:
raw_line = lines[i]
i += 1
@@ -420,24 +541,113 @@ def parse_report(text: Union[str, bytes], *, parse_samples: bool = False) -> BwA
):
ch_name, stat = key.split(" ", 1)
cs = report.channels.setdefault(ch_name, ChannelStats())
-num = _parse_number(value)
-if stat == "PPV": cs.ppv_ips = num
-elif stat == "ZC Freq": cs.zc_freq_hz = num
-elif stat == "Time of Peak": cs.time_of_peak_s = num
-elif stat == "Peak Acceleration": cs.peak_accel_g = num
-elif stat == "Peak Displacement": cs.peak_disp_in = num
+if stat == "PPV":
+if _is_oorange(value):
+# Channel saturated: substitute range max as lower
+# bound; flag so downstream UI can render "> 10 in/s".
+cs.ppv_ips = report.geo_range_ips
+cs.ppv_saturated = True
+else:
+cs.ppv_ips = _parse_number(value)
+elif stat == "ZC Freq":
+# ">100 Hz" → store threshold + flag; numeric → parse normally
+threshold = _parse_above_range(value)
+if threshold is not None:
+cs.zc_freq_hz = threshold
+cs.zc_freq_above_range = True
+else:
+cs.zc_freq_hz = _parse_number(value)
+else:
+num = _parse_number(value)
+if stat == "Time of Peak": cs.time_of_peak_s = num
+elif stat == "Peak Acceleration": cs.peak_accel_g = num
+elif stat == "Peak Displacement": cs.peak_disp_in = num
# ── Histogram-specific fields ────────────────────────────────────────
# Histograms have Start/Stop time+date pairs + an interval count
# and size, plus per-channel absolute Peak Time/Date instead of
# the waveform's relative Time of Peak.
elif key == "Histogram Start Time":
_hist_start_time = _parse_event_time(value)
elif key == "Histogram Start Date":
_d = _parse_iso_date(value)
if _d and _hist_start_time:
report.histogram_start = datetime.datetime.combine(_d, _hist_start_time)
elif key == "Histogram Stop Time":
_hist_stop_time = _parse_event_time(value)
elif key == "Histogram Stop Date":
_d = _parse_iso_date(value)
if _d and _hist_stop_time:
report.histogram_stop = datetime.datetime.combine(_d, _hist_stop_time)
elif key == "Number of Intervals":
try:
report.histogram_n_intervals = int(float(value.strip()))
except ValueError:
pass
elif key == "Interval Size":
report.histogram_interval_size_str = value.strip()
report.histogram_interval_size_s = _parse_interval_size(value)
# ── Per-channel histogram Peak Date / Peak Time ──
# Lines like "Tran Peak Time : 22:31:38" + "Tran Peak Date : 2026-05-16"
elif key in ("Tran Peak Time", "Vert Peak Time", "Long Peak Time", "MicL Time"):
ch_name = "MicL" if key == "MicL Time" else key.split(" ", 1)[0]
_pending_peak_time[ch_name] = _parse_event_time(value)
elif key in ("Tran Peak Date", "Vert Peak Date", "Long Peak Date", "MicL Date"):
ch_name = "MicL" if key == "MicL Date" else key.split(" ", 1)[0]
_d = _parse_iso_date(value)
_t = _pending_peak_time.get(ch_name)
if _d and _t:
report.channel_peak_when[ch_name] = datetime.datetime.combine(_d, _t)
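The time-then-date staging used for the histogram fields above can be sketched in isolation. This is a simplified illustration that handles only keys of the form "<Channel> Peak Time" / "<Channel> Peak Date" (the parser's special-casing of the "MicL Time" / "MicL Date" keys is left out), and `combine_peak_lines` is a hypothetical name.

```python
import datetime


def combine_peak_lines(lines: list[tuple[str, str]]) -> dict[str, datetime.datetime]:
    """Pair each '<Ch> Peak Time' line with the '<Ch> Peak Date' line after it."""
    pending: dict[str, datetime.time] = {}
    out: dict[str, datetime.datetime] = {}
    for key, value in lines:
        ch, _, stat = key.partition(" ")
        if stat == "Peak Time":
            try:
                pending[ch] = datetime.time.fromisoformat(value.strip())
            except ValueError:
                pending.pop(ch, None)   # unparseable time: forget any stale value
        elif stat == "Peak Date" and ch in pending:
            try:
                day = datetime.date.fromisoformat(value.strip())
            except ValueError:
                continue
            out[ch] = datetime.datetime.combine(day, pending[ch])
    return out


rows = [
    ("Tran Peak Time", "22:31:38"),
    ("Tran Peak Date", "2026-05-16"),
    ("Vert Peak Date", "2026-05-16"),   # date with no preceding time: skipped
]
when = combine_peak_lines(rows)
print(when)
```

A date line that arrives without a staged time is dropped rather than guessed, matching the "combine only when both halves parsed" behaviour of the parser.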
# ── Vector Sum ───────────────────────────────────────────────────────
elif key == "Peak Vector Sum":
-report.peak_vector_sum_ips = _parse_number(value)
-elif key == "Peak Vector Sum Time":
+if _is_oorange(value):
+# PVS saturated: conservative upper bound is
+# sqrt(3) * geo_range_ips (all 3 channels at full-scale).
+# Real PVS could be lower (channels rarely peak
+# simultaneously) but never higher within the range.
+if report.geo_range_ips is not None:
+import math as _math
+report.peak_vector_sum_ips = _math.sqrt(3) * report.geo_range_ips
+report.peak_vector_sum_saturated = True
+else:
+report.peak_vector_sum_ips = _parse_number(value)
+# BW writes the PVS-time label with a typo: "Peak Vector Sum TimeSum"
+# (looks like Sum got appended twice). Accept both forms. Confirmed
+# against actual BW output on 2026-05-27; every PVS-time line in
+# the field examples (T190, T438, K557) uses the typo'd label.
+elif key in ("Peak Vector Sum Time", "Peak Vector Sum TimeSum"):
report.peak_vector_sum_time_s = _parse_number(value)
_pvs_time_raw = value
elif key == "Peak Vector Sum Date":
# Histogram-mode PVS gets paired with a date. We may have
# captured 'Peak Vector Sum Time' as either a relative
# seconds float (waveform) or an HH:MM:SS string we
# interpreted as a number. For histograms, BW writes
# "Peak Vector Sum Time : 22:33:52" which _parse_number
# parses as 22.0 (loses information). When Peak Vector Sum
# Date arrives, re-parse the previous PVS time line as a
# clock time and combine into an absolute datetime.
_d = _parse_iso_date(value)
if _d and _pvs_time_raw is not None:
_t = _parse_event_time(_pvs_time_raw)
if _t:
report.peak_vector_sum_when = datetime.datetime.combine(_d, _t)
# The earlier seconds parse was bogus for histograms;
# clear it so consumers don't think it's a real offset.
report.peak_vector_sum_time_s = None
# ── Microphone block ────────────────────────────────────────────────
elif key == "Microphone":
report.mic.weighting = value
elif key == "MicL PSPL":
report.mic.pspl_dbl = _parse_number(value)
if _is_oorange(value):
# Mic saturated — substitute conservative upper bound 140 dBL.
report.mic.pspl_dbl = 140.0
report.mic.pspl_saturated = True
else:
report.mic.pspl_dbl = _parse_number(value)
# Mirror onto the "MicL" entry in channels so callers querying
# `channels["MicL"].ppv_ips` see something — but it's dB(L), not
# in/s, so we store as-is in the MicStats and mark the channel.
@@ -446,9 +656,15 @@ def parse_report(text: Union[str, bytes], *, parse_samples: bool = False) -> BwA
cs = report.channels.setdefault("MicL", ChannelStats())
cs.time_of_peak_s = report.mic.time_of_peak_s
elif key == "MicL ZC Freq":
report.mic.zc_freq_hz = _parse_number(value)
threshold = _parse_above_range(value)
if threshold is not None:
report.mic.zc_freq_hz = threshold
report.mic.zc_freq_above_range = True
else:
report.mic.zc_freq_hz = _parse_number(value)
cs = report.channels.setdefault("MicL", ChannelStats())
cs.zc_freq_hz = report.mic.zc_freq_hz
cs.zc_freq_hz = report.mic.zc_freq_hz
cs.zc_freq_above_range = report.mic.zc_freq_above_range
# ── Sensor self-check ────────────────────────────────────────────────
elif key in (
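The OORANGE branch for "Peak Vector Sum" earlier in this hunk substitutes sqrt(3) * geo_range_ips for the missing value. A minimal sketch of why that figure bounds the vector sum, assuming every channel is clipped at the range maximum. The helper names and the 10 in/s range are illustrative and are not part of the parser:

```python
import math


def pvs(tran: float, vert: float, long_: float) -> float:
    """Vector sum of three orthogonal channel velocities (in/s)."""
    return math.sqrt(tran**2 + vert**2 + long_**2)


def pvs_upper_bound(geo_range_ips: float) -> float:
    """Largest vector sum possible when no channel exceeds the range."""
    return math.sqrt(3) * geo_range_ips


geo_range = 10.0  # assumed full-scale range in in/s (illustrative)
bound = pvs_upper_bound(geo_range)

# All three channels at full scale reach the bound ...
assert math.isclose(pvs(geo_range, geo_range, geo_range), bound)
# ... and any in-range combination stays at or below it.
assert pvs(10.0, 4.0, 2.5) <= bound
print(f"upper bound for a {geo_range} in/s range: {bound:.3f} in/s")
```

Because the three channels rarely peak at the same instant, the true value is usually well below this bound, which is why the parser also sets a saturation flag rather than presenting the number as a measurement.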
+94 -8
@@ -49,7 +49,7 @@ SIDECAR_KIND = "sfm.event"
# bumped without a `pip install` re-run — leading to confusing stale
# version stamps in sidecars. Bump this constant and CHANGELOG.md
# together at release time.
TOOL_VERSION = "0.20.0"
TOOL_VERSION = "0.21.1"
try:
# Best-effort: prefer the installed metadata when it's NEWER than the
@@ -120,7 +120,16 @@ def _bw_report_to_dict(report: BwAsciiReport) -> dict:
"peak_disp_in": cs.peak_disp_in,
}
# Drop all-None entries — keeps the JSON tidy for partial reports.
return {k: v for k, v in out.items() if v is not None}
out = {k: v for k, v in out.items() if v is not None}
# Saturation flag (only present when True) — signals that ppv_ips
# is the channel range max (a lower bound), not an exact reading.
if getattr(cs, "ppv_saturated", False):
out["ppv_saturated"] = True
# ZC Freq above device reporting ceiling (BW ">100 Hz") — value
# in zc_freq_hz is the threshold, not an exact measurement.
if getattr(cs, "zc_freq_above_range", False):
out["zc_freq_above_range"] = True
return out
def _sc(ch_name: str) -> dict:
sc = report.sensor_check.get(ch_name)
@@ -169,15 +178,25 @@ def _bw_report_to_dict(report: BwAsciiReport) -> dict:
"vert": _ch("Vert"),
"long": _ch("Long"),
"vector_sum": {
"ips": report.peak_vector_sum_ips,
"time_s": report.peak_vector_sum_time_s,
"ips": report.peak_vector_sum_ips,
"time_s": report.peak_vector_sum_time_s,
# Histogram events have an absolute date+time for the PVS
# (the interval at which it occurred); waveform events
# only have the time_s offset.
"when": report.peak_vector_sum_when.isoformat() if report.peak_vector_sum_when else None,
# Set when BW reported the PVS as OORANGE — value is the
# conservative upper bound sqrt(3) * geo_range_ips, not
# an exact peak.
"saturated": bool(getattr(report, "peak_vector_sum_saturated", False)),
},
},
"mic": {
"weighting": report.mic.weighting,
"pspl_dbl": report.mic.pspl_dbl,
"zc_freq_hz": report.mic.zc_freq_hz,
"time_of_peak_s": report.mic.time_of_peak_s,
"weighting": report.mic.weighting,
"pspl_dbl": report.mic.pspl_dbl,
"pspl_saturated": bool(getattr(report.mic, "pspl_saturated", False)),
"zc_freq_hz": report.mic.zc_freq_hz,
"zc_freq_above_range": bool(getattr(report.mic, "zc_freq_above_range", False)),
"time_of_peak_s": report.mic.time_of_peak_s,
},
"sensor_check": {
"tran": _sc("Tran"),
@@ -185,6 +204,17 @@ def _bw_report_to_dict(report: BwAsciiReport) -> dict:
"long": _sc("Long"),
"mic": _sc("MicL"),
},
# Histogram-specific fields (None on waveform-mode events).
# Per-channel absolute peak time/date for histograms — for
# waveforms see channels[ch]["time_of_peak_s"] instead.
"histogram": {
"start": report.histogram_start.isoformat() if report.histogram_start else None,
"stop": report.histogram_stop.isoformat() if report.histogram_stop else None,
"n_intervals": report.histogram_n_intervals,
"interval_size": report.histogram_interval_size_str,
"interval_size_s": report.histogram_interval_size_s,
"channel_peak_when": {ch: dt.isoformat() for ch, dt in report.channel_peak_when.items()},
},
"monitor_log": monitor_log,
"pc_sw_version": report.pc_sw_version,
}
@@ -254,6 +284,60 @@ def apply_report_to_event(event: Event, report: BwAsciiReport) -> None:
event.rectime_seconds = report.record_time_s
def apply_bw_report_dict_to_event(event: Event, bw_report: dict) -> None:
"""Mirror of ``apply_report_to_event`` for the projected sidecar
dict shape (as produced by ``_bw_report_to_dict``).
Why this exists
───────────────
The ingest path holds a live ``BwAsciiReport`` parsed straight from
the ``_ASCII.TXT`` and uses ``apply_report_to_event`` to overlay
device-authoritative peaks onto the codec output before insert.
The backfill path doesn't have the original ``.TXT`` (it's not
retained in the waveform store), but it does have the preserved
``bw_report`` block from the sidecar — which contains the same
projected fields. Re-overlaying those during a backfill keeps the
DB peak columns aligned with what BW reports rather than letting
the codec output (which may be incomplete for unhandled formats or
walker edge cases) win by default.
No-ops cleanly when ``bw_report`` is ``None``, empty, or missing
any particular sub-field — only fields with a concrete value get
written. Mirrors ``apply_report_to_event``'s "report wins where
present" semantics.
"""
if not bw_report:
return
if event.peak_values is None:
event.peak_values = PeakValues()
pv = event.peak_values
peaks = bw_report.get("peaks") or {}
tran = (peaks.get("tran") or {}).get("ppv_ips")
vert = (peaks.get("vert") or {}).get("ppv_ips")
long = (peaks.get("long") or {}).get("ppv_ips")
if tran is not None: pv.tran = tran
if vert is not None: pv.vert = vert
if long is not None: pv.long = long
vs_ips = (peaks.get("vector_sum") or {}).get("ips")
if vs_ips is not None:
pv.peak_vector_sum = vs_ips
mic = bw_report.get("mic") or {}
pspl = mic.get("pspl_dbl")
if pspl is not None and pspl > 0:
pv.micl = _dbl_to_psi(pspl)
rec = bw_report.get("recording") or {}
sr = rec.get("sample_rate_sps")
if sr:
event.sample_rate = sr
rt = rec.get("record_time_s")
if rt is not None:
event.rectime_seconds = rt
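The "report wins where present" rule that `apply_bw_report_dict_to_event` follows can be illustrated in isolation. This is a sketch only: `Peaks` is a hypothetical stand-in for `PeakValues`, and `overlay_peaks` covers just the three geophone channels.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Peaks:
    """Hypothetical stand-in for PeakValues (geophone channels only)."""
    tran: Optional[float] = None
    vert: Optional[float] = None
    long: Optional[float] = None


def overlay_peaks(pv: Peaks, bw_report: Optional[dict]) -> None:
    """Copy ppv_ips values from a sidecar-shaped dict onto pv, skipping gaps."""
    if not bw_report:
        return
    peaks = bw_report.get("peaks") or {}
    for ch in ("tran", "vert", "long"):
        # `or {}` tolerates both a missing key and an explicit None entry.
        value = (peaks.get(ch) or {}).get("ppv_ips")
        if value is not None:
            setattr(pv, ch, value)


codec = Peaks(tran=0.0, vert=0.12, long=0.08)  # codec-derived values
report = {"peaks": {"tran": {"ppv_ips": 0.45}, "vert": None}}
overlay_peaks(codec, report)

# tran comes from the report; vert and long keep the codec values.
assert codec == Peaks(tran=0.45, vert=0.12, long=0.08)
```

The `(d.get(key) or {}).get(...)` idiom is what lets a partial or empty `bw_report` pass through without raising.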
def _project_info_to_dict(pi: Optional[ProjectInfo]) -> dict:
if pi is None:
return {
@@ -278,6 +362,7 @@ def event_to_sidecar_dict(
blastware_filesize: int,
blastware_sha256: str,
source_kind: str = "sfm-live",
txt_filename: Optional[str] = None,
a5_pickle_filename: Optional[str] = None,
tool_version: str = _TOOL_VERSION_DEFAULT,
captured_at: Optional[datetime.datetime] = None,
@@ -394,6 +479,7 @@ def event_to_sidecar_dict(
"captured_at": captured_at.isoformat() + "Z" if captured_at.tzinfo is None else captured_at.isoformat(),
"tool_version": tool_version,
"a5_pickle_filename": a5_pickle_filename,
"txt_filename": txt_filename,
},
"review": review or {
+54 -38
@@ -28,18 +28,32 @@ iterate 32-stride and stop before the tail.
[1] segment_id (uint8) 0x00..0x03 256 blocks per segment
[2:4] block_ctr (uint16 LE) resets each segment (0x0100, 0x0101, ...)
[4:6] 0x000a (uint16 LE) constant marker (= 10)
[6:8] T_peak_count uint16 LE Tran peak (count × 0.005 in/s)
[6] T_peak_count uint8 Tran peak (count × 0.005 in/s, max 1.275 in/s)
[7] T_annotation uint8 empirically non-zero on intervals with sub-Hz
or unmeasurable Tran freq; meaning not fully RE'd
[8:10] T_halfperiod uint16 LE Tran half-period in samples (freq = 512 / halfp Hz)
[10:12] V_peak_count uint16 LE
[10] V_peak_count uint8
[11] V_annotation uint8
[12:14] V_halfperiod uint16 LE
[14:16] L_peak_count uint16 LE
[14] L_peak_count uint8
[15] L_annotation uint8
[16:18] L_halfperiod uint16 LE
[18:20] M_peak_count uint16 LE MicL peak (count → dB via mic_count_to_db)
[18] M_peak_count uint8 MicL peak (count → dB via mic_count_to_db)
[19] M_annotation uint8
[20:22] M_halfperiod uint16 LE MicL half-period in samples (freq = 512 / halfp Hz)
[22:24] 0x00 0x00 constant
[24:28] 4-byte variable purpose unknown (possibly CRC or timestamp delta)
[28:32] 0x1e 0x0a 0x00 0x00 constant block-end signature
NOTE on peak-count width: an earlier interpretation treated the peak
fields as uint16 LE spanning [6:8] / [10:12] / [14:16] / [18:20].
That happened to be byte-exact against the N844 fixture corpus only
because every annotation byte in those fixtures was zero, making
``uint16 LE == uint8``. Cross-correlating BE9558 (K558) Tran-drift
and BE18003 (T003) Histogram+Continuous events against the BW ASCII
export proved peak is uint8 alone; see test_histogram_codec.py
and docs/histogram_codec_re_status.md.
Block-identification anchor: ``block[22:24] == b"\\x00\\x00"`` AND
``block[28:32] == b"\\x1e\\x0a\\x00\\x00"``. This is the reliable
distinguisher from non-block content in the file.
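To make the peak-width note concrete, here is a small sketch that reads the Tran fields of one block both ways. The block is synthetic (not taken from any unit), and `is_data_block` / `decode_tran` are illustrative names rather than the codec's own functions; only the offsets, the 0.005 in/s scale and the 512 / halfp formula come from the layout above.

```python
import struct

GEO_IPS_PER_COUNT = 0.005          # layout table: count x 0.005 in/s
END_SIGNATURE = b"\x1e\x0a\x00\x00"


def is_data_block(block: bytes) -> bool:
    """Anchor test: bytes [22:24] are zero and the end signature matches."""
    return (len(block) == 32
            and block[22:24] == b"\x00\x00"
            and block[28:32] == END_SIGNATURE)


def decode_tran(block: bytes) -> dict:
    """Read Tran as uint8 peak, uint8 annotation, uint16 LE half-period."""
    peak, annotation, halfp = struct.unpack_from("<BBH", block, 6)
    return {
        "peak_ips": peak * GEO_IPS_PER_COUNT,
        "annotation": annotation,
        "freq_hz": 512 / halfp if halfp else None,
        # The earlier (wrong) reading: bytes [6:8] as one uint16 LE.
        "peak_if_uint16": struct.unpack_from("<H", block, 6)[0],
    }


# Synthetic block: peak 20 counts, annotation 0x79, half-period 32 samples.
raw = bytearray(32)
struct.pack_into("<H", raw, 2, 0x0100)            # block_ctr
struct.pack_into("<H", raw, 4, 10)                # constant marker
struct.pack_into("<BBH", raw, 6, 20, 0x79, 32)    # Tran fields
raw[28:32] = END_SIGNATURE
block = bytes(raw)

rec = decode_tran(block)
print(is_data_block(block), rec)
```

With a non-zero annotation byte the uint16 reading lands in the tens of thousands of counts, which at 0.005 in/s per count is far beyond any geophone range; the uint8 reading stays physically plausible.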
@@ -101,23 +115,6 @@ _BLOCK_SIZE = 32
# additional validation that we're looking at a real block.
_BLOCK_MARKER = 10
# Maximum plausible peak-count value. Normal-range geophone tops out
# at 10 in/s = 2000 counts at the 0.005 in/s per count scale; even
# Sensitive range (1.25 in/s FS) wouldn't exceed ~250. Mic counts run
# 0..~400 in observed data. 4096 leaves comfortable headroom for any
# legitimate value across all modes.
#
# Some prod blocks have been observed with peak-count fields whose
# HIGH byte is non-zero (block[7] != 0 etc.) — observed across BE9558
# and BE18003 units in Histogram-mode events. Reading these as
# uint16 LE produces values like 30981 / 41733 / 62469, which scale
# to physically impossible peaks (150+ in/s). Best guess: an
# undocumented "time-of-peak-within-interval" extension byte the
# device writes in some sub-mode (possibly Histogram+Continuous).
# Until reverse-engineered, blocks exceeding this bound are skipped
# rather than propagating bogus values into PVS computations.
_MAX_PEAK_COUNT = 4096
# Geo peak scaling: stored as "count × 0.005 in/s" where 1 count = one
# 0.005 in/s display quantum. Equivalent to the waveform codec's
# 16-count-unit output (1 unit = 0.005 in/s = 16 ADC counts).
@@ -149,23 +146,36 @@ def _decode_block(block: bytes) -> Optional[dict]:
"""Decode one 32-byte histogram block. Caller must have validated
with ``_is_data_block`` first.
Returns ``None`` if any peak field exceeds ``_MAX_PEAK_COUNT``;
those blocks contain an undocumented extension byte format whose
naive uint16 LE interpretation gives physically impossible peaks.
Skipping the block is safer than propagating bogus values into
PVS computations downstream.
Returns a record with per-channel peak counts (uint8) and
half-periods (uint16 LE).
"""
# All 16-bit fields are little-endian unsigned. Peak counts are
# always non-negative; half-periods are always positive when valid.
t_peak, t_halfp, v_peak, v_halfp, l_peak, l_halfp, m_peak, m_halfp = struct.unpack_from(
"<HHHHHHHH", block, 6
)
if (t_peak > _MAX_PEAK_COUNT or v_peak > _MAX_PEAK_COUNT
or l_peak > _MAX_PEAK_COUNT or m_peak > _MAX_PEAK_COUNT):
return None
# Peak counts are uint8 at bytes [6] / [10] / [14] / [18]. The
# adjacent bytes [7] / [11] / [15] / [19] hold an annotation field
# whose meaning isn't fully understood (empirically non-zero in
# intervals with sub-Hz or unmeasurable geo frequencies, mostly
# zero otherwise — see test fixtures from BE9558/BE18003 corpora).
# Crucially, those annotation bytes are NOT the high byte of the
# peak count: cross-correlating against BW's per-interval ASCII
# export proves the peak is uint8 alone.
#
# Reading the peak as uint16 LE (the original interpretation) was
# accidentally correct only because every block in the N844 fixture
# corpus had a zero annotation byte; non-N844 events with non-zero
# annotation bytes decoded to physically impossible peaks (e.g.
# 268 in/s per channel) and produced 35× inflated PVS sums when
# first run against prod data. See histogram_codec_re_status.md.
t_peak = block[6]
v_peak = block[10]
l_peak = block[14]
m_peak = block[18]
t_halfp = block[8] | (block[9] << 8)
v_halfp = block[12] | (block[13] << 8)
l_halfp = block[16] | (block[17] << 8)
m_halfp = block[20] | (block[21] << 8)
segment_id = block[1]
block_ctr = block[2] | (block[3] << 8)
var_meta = bytes(block[24:28])
annotations = (block[7], block[11], block[15], block[19])
return {
"segment_id": segment_id,
"block_ctr": block_ctr,
@@ -178,6 +188,7 @@ def _decode_block(block: bytes) -> Optional[dict]:
"m_peak": m_peak,
"m_halfp": m_halfp,
"meta_var": var_meta,
"annotations": annotations,
}
@@ -185,10 +196,15 @@ def walk_body(body: bytes) -> List[dict]:
"""Walk the body and return one dict per histogram interval.
Iterates 32-byte strides from offset 0. Yields a decoded record
for every block that passes ``_is_data_block`` validation AND has
plausible peak values (``_decode_block`` returns None for blocks
with out-of-bound peaks). Stops when the remaining bytes are too
short to form a complete block.
for every block that passes ``_is_data_block`` validation. Stops
when the remaining bytes are too short to form a complete block.
In Histogram+Continuous mode the body interleaves data blocks with
other 32-byte content (likely continuous-mode waveform blocks) that
fail the data-block validation; the walker naturally skips them
without losing 32-byte alignment. Use ``block_ctr`` from each
returned record to map back to the original interval index; the
record list is sparse when other block types are interleaved.
"""
records: List[dict] = []
for off in range(0, len(body) - _BLOCK_SIZE + 1, _BLOCK_SIZE):
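The stride behaviour described in the `walk_body` docstring can be sketched on synthetic data. The names below (`walk`, `make_block`, `looks_like_data_block`) are illustrative, and the validation is reduced to the two anchor checks from the layout notes; the real `_is_data_block` may test more.

```python
from typing import List

BLOCK_SIZE = 32
END_SIGNATURE = b"\x1e\x0a\x00\x00"


def looks_like_data_block(block: bytes) -> bool:
    """Simplified anchor check (assumption: the real validator may do more)."""
    return block[22:24] == b"\x00\x00" and block[28:32] == END_SIGNATURE


def walk(body: bytes) -> List[dict]:
    """Return one record per 32-byte stride that passes validation."""
    records = []
    for off in range(0, len(body) - BLOCK_SIZE + 1, BLOCK_SIZE):
        block = body[off:off + BLOCK_SIZE]
        if not looks_like_data_block(block):
            continue  # interleaved non-histogram content; alignment is kept
        records.append({
            "offset": off,
            "block_ctr": int.from_bytes(block[2:4], "little"),
        })
    return records


def make_block(ctr: int) -> bytes:
    """Build a synthetic data block that carries only a counter."""
    b = bytearray(BLOCK_SIZE)
    b[2:4] = ctr.to_bytes(2, "little")
    b[28:32] = END_SIGNATURE
    return bytes(b)


filler = b"\xff" * BLOCK_SIZE  # stands in for a non-data 32-byte block
body = make_block(0x0100) + filler + make_block(0x0102) + b"\x00" * 7
records = walk(body)
print(records)
```

The gap in `block_ctr` between the two returned records is the signal that something else occupied the skipped stride, which is why the docstring recommends mapping by counter rather than by list position.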
+2 -1
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
[project]
name = "seismo-relay"
version = "0.19.0"
version = "0.21.1"
description = "Python client and REST server for MiniMate Plus seismographs"
requires-python = ">=3.10"
dependencies = [
@@ -15,6 +15,7 @@ dependencies = [
"python-multipart>=0.0.7",
"h5py>=3.10",
"numpy>=1.24",
"matplotlib>=3.8",
]
[tool.setuptools.packages.find]
+1
@@ -5,3 +5,4 @@ pyserial
python-multipart
h5py
numpy
matplotlib
+71 -8
@@ -103,6 +103,17 @@ def main(argv=None) -> int:
"STRT-rectime byte-offset fix in v0.15.x)."
),
)
p.add_argument(
"--reparse-txt", action="store_true",
help=(
"Re-parse the preserved <serial>/<filename>_ASCII.TXT with the "
"current bw_ascii_report parser and overwrite the sidecar's "
"bw_report block. Use this after upgrading the ASCII parser to "
"pull in new fields (e.g. zc_freq_above_range for BW '>100 Hz' "
"ZC peaks). No-op for events without a preserved .TXT; safely "
"idempotent when the parser hasn't changed."
),
)
p.add_argument("-v", "--verbose", action="store_true")
args = p.parse_args(argv)
@@ -153,7 +164,7 @@ def main(argv=None) -> int:
# of the sidecar implies staleness of the derived .h5 (both
# come out of the same decoder).
sidecar_stale = True
if sidecar_path.exists() and not args.force:
if sidecar_path.exists() and not args.force and not args.reparse_txt:
try:
existing = event_file_io.read_sidecar(sidecar_path)
sha_ok = existing.get("blastware", {}).get("sha256") == bw_sha
@@ -287,19 +298,68 @@ def main(argv=None) -> int:
or ev.total_samples < derived // 4):
ev.total_samples = derived
# Preserve user-edited review state + extensions from the
# existing sidecar (false_trigger flag, notes, etc.) so a
# backfill never wipes them out.
preserved_review = None
preserved_ext = None
# Preserve user-edited review state + extensions + the
# bw_report block from the existing sidecar so a backfill
# never wipes them out. The bw_report block originates
# from the paired .TXT ASCII report parsed at ORIGINAL
# import time (ach forward / direct upload); the .TXT
# file is not in the waveform store, so we can't re-derive
# it from disk. event_to_sidecar_dict takes a
# BwAsciiReport dataclass (not a dict), so for bw_report
# we overlay the existing block after regen instead of
# passing it as a kwarg.
preserved_review = None
preserved_ext = None
preserved_bw_report = None
preserved_txt_fn = None
if sidecar_path.exists():
try:
_existing = event_file_io.read_sidecar(sidecar_path)
preserved_review = _existing.get("review")
preserved_ext = _existing.get("extensions")
preserved_review = _existing.get("review")
preserved_ext = _existing.get("extensions")
preserved_bw_report = _existing.get("bw_report")
# Preserve txt_filename so backfills don't blank out the
# pointer to the saved raw .TXT (events ingested after
# 2026-05-27 have this).
preserved_txt_fn = (_existing.get("source") or {}).get("txt_filename")
except Exception:
pass
# --reparse-txt: if a .TXT is preserved on disk, run the
# current parser against it and overwrite the bw_report
# block. Picks up post-ingest parser fixes (e.g. the
# 2026-05-28 zc_freq_above_range / ">100 Hz" addition).
if args.reparse_txt and preserved_txt_fn:
try:
from minimateplus import bw_ascii_report
txt_path = store.txt_path_for(serial, path.name)
if txt_path.exists():
refreshed = bw_ascii_report.parse_report_file(txt_path)
preserved_bw_report = event_file_io._bw_report_to_dict(refreshed)
log.debug("reparsed bw_report from %s", txt_path.name)
else:
log.debug("--reparse-txt: no .TXT at %s (sidecar says %r)",
txt_path, preserved_txt_fn)
except Exception as exc:
log.warning("--reparse-txt failed for %s: %s", path.name, exc)
# Overlay BW ASCII report fields onto the rebuilt Event
# BEFORE the sidecar + DB write. Mirrors what the ingest
# path does — BW's reported peaks (and sample_rate /
# record_time) win over codec output where present.
#
# Without this step, --force backfill silently overwrites
# the bw_report-overlaid DB columns with codec-derived
# values, which is wrong for events the codec doesn't
# fully decode (e.g. waveform walker edge cases on
# SP0/SS0/SV0-style events, or histogram sub-formats with
# byte[5]!=0 that aren't yet RE'd). Net effect was PVS=0
# on three top-10 events on 2026-05-22.
if preserved_bw_report:
event_file_io.apply_bw_report_dict_to_event(
ev, preserved_bw_report,
)
sidecar = event_file_io.event_to_sidecar_dict(
ev,
serial=serial,
@@ -308,9 +368,12 @@ def main(argv=None) -> int:
blastware_sha256=bw_sha,
source_kind=source_kind,
a5_pickle_filename=a5_filename,
txt_filename=preserved_txt_fn,
review=preserved_review,
extensions=preserved_ext,
)
if preserved_bw_report is not None:
sidecar["bw_report"] = preserved_bw_report
# Also emit the .h5 clean-waveform file when:
# - it's missing, OR
+331
@@ -0,0 +1,331 @@
"""
scripts/backfill_thor_events.py: re-process existing Thor (Series IV)
events so their sidecars carry the bw_report block produced by
``micromate.idf_to_bw_report.build_bw_report_from_idf`` + their .h5
clean-waveform files for IDFW events.
Why this exists
Thor events ingested before v0.21.0 (or during the v0.21.0 ingest bug
window fixed in commit bee1185) have sidecars with only
``extensions.idf_report`` and no ``bw_report`` block. Without
``bw_report``, the SFM PDF renderer falls back to DB-only fields
(misses sensor-self-check, full per-channel breakdown, mic dB(L)),
and the modal chart 404s on ``/waveform.json`` for IDFW events
because no .h5 was written when the codec failed at ingest.
Re-forwarding from thor-watcher would also fix this, but that requires
operator coordination on every watcher machine and uses bandwidth this
script doesn't.
What this does
Walks ``<store>/<serial>/<filename>`` for ``.IDFW`` / ``.IDFH`` files
and, for each one:
1. Reads the existing sidecar (preserving review state + captured_at).
2. Re-runs ``micromate.idf_file.read_idf_file()`` on the binary
bytes, passing ``data=`` so the codec doesn't try to read from
a path it doesn't know.
3. Pulls ``extensions.idf_report`` (the raw parsed Thor dict the
v0.18.0+ ingest path already stashed) and runs the v0.21.0
``build_bw_report_from_idf`` adapter against it.
4. Writes the refreshed sidecar with the new ``bw_report``,
bumped ``source.tool_version``, but preserved ``review`` block
+ the original ``captured_at`` timestamp.
5. Regenerates the .h5 waveform file via the existing
``event_hdf5`` writer. For IDFW that's the decoded per-sample
stream; for IDFH it's a 1-sample-per-interval synthesised array
(peak ADC count per channel) so the renderer's bar-chart code
has data to group on. Mic peak psi from the binary is merged
onto the IdfEvent before the bridge so the h5 writer's per-count
mic scale factor lands on a sensible value (without this the
mic chart on Thor events plots dB(L)-as-pseudo-psi and shows
bomb-level numbers).
Idempotent. Re-running it after a parser/adapter change just
re-writes sidecars; no DB writes, no thor-watcher coordination.
Usage
python scripts/backfill_thor_events.py [--store-root PATH]
[--dry-run]
[--skip-hdf5]
[--force]
[-v]
By default, refreshes any Thor event whose sidecar is missing
``bw_report`` OR whose ``source.tool_version`` is older than the
current ``TOOL_VERSION``. ``--force`` refreshes every Thor event
regardless.
"""
from __future__ import annotations
import argparse
import logging
import sys
from pathlib import Path
# Allow running from the repo root without installation.
sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
from minimateplus import event_file_io
from sfm.waveform_store import WaveformStore
log = logging.getLogger("backfill_thor_events")
def _is_thor_event(path: Path) -> bool:
if not path.is_file():
return False
if path.name.endswith((".sfm.json", ".h5", "_ASCII.TXT")):
return False
return path.suffix.upper() in (".IDFW", ".IDFH")
def _vtuple(s: str) -> tuple:
try:
return tuple(int(p) for p in str(s).split(".")[:3])
except Exception:
return (0, 0, 0)
def main(argv=None) -> int:
p = argparse.ArgumentParser(description=__doc__)
p.add_argument(
"--db-path",
default=str(Path(__file__).resolve().parent.parent / "bridges" / "captures" / "seismo_relay.db"),
help="Used only to derive the default --store-root.",
)
p.add_argument("--store-root", default=None)
p.add_argument("--dry-run", action="store_true")
p.add_argument("--skip-hdf5", action="store_true",
help="Don't regenerate .h5 files (IDFW waveforms or IDFH histograms).")
p.add_argument("--force", action="store_true",
help="Refresh every Thor event, not just ones with stale or missing bw_report.")
p.add_argument("-v", "--verbose", action="store_true")
args = p.parse_args(argv)
logging.basicConfig(
level=logging.DEBUG if args.verbose else logging.INFO,
format="%(asctime)s %(levelname)-7s %(name)s %(message)s",
datefmt="%H:%M:%S",
)
db_path = Path(args.db_path).expanduser().resolve()
store_root = (
Path(args.store_root).expanduser().resolve()
if args.store_root else db_path.parent / "waveforms"
)
if not store_root.exists():
log.error("store root not found: %s", store_root)
return 1
store = WaveformStore(store_root)
log.info("store root: %s", store_root)
log.info("current TOOL_VERSION: %s", event_file_io.TOOL_VERSION)
refreshed = skipped = errors = h5_written = 0
# Lazy imports so any one of these failing produces a useful error
# message rather than crashing module-load.
from micromate.idf_file import read_idf_file
from micromate.idf_to_bw_report import build_bw_report_from_idf
for serial_dir in sorted(p for p in store_root.iterdir() if p.is_dir()):
serial = serial_dir.name
for path in sorted(serial_dir.iterdir()):
if not _is_thor_event(path):
continue
sidecar_path = store.sidecar_path_for(serial, path.name)
if not sidecar_path.exists():
log.debug("%s: no sidecar — skipping (this is a binary without ingest history)",
path.name)
skipped += 1
continue
try:
existing = event_file_io.read_sidecar(sidecar_path)
except Exception as exc:
log.warning("%s: failed to read sidecar — %s", path.name, exc)
errors += 1
continue
has_bw_report = bool(existing.get("bw_report"))
existing_version = (existing.get("source") or {}).get("tool_version", "")
up_to_date = (
has_bw_report
and _vtuple(existing_version) >= _vtuple(event_file_io.TOOL_VERSION)
)
if up_to_date and not args.force:
skipped += 1
continue
# Re-decode the binary. Catch + log; continue with .txt-only
# data if it fails (matches the live ingest path's behavior).
idf_samples = None
idf_intervals = None
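binary_md = None
res = None  # stays None if read_idf_file raises, so the later `res is not None` guard is meaningful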
is_histogram = path.suffix.upper() == ".IDFH"
try:
binary_bytes = path.read_bytes()
res = read_idf_file(path, data=binary_bytes)
idf_samples = res.samples or None
idf_intervals = res.intervals
binary_md = res.binary_metadata
is_histogram = res.intervals is not None
except NotImplementedError:
# sig-B / Blastware-stray binary; no samples but adapter
# can still produce a bw_report from extensions.idf_report.
log.debug("%s: binary codec NotImplementedError (sig-B / BW-stray); proceeding from sidecar's idf_report only", path.name)
except Exception as exc:
log.warning("%s: binary decode failed — %s; proceeding from sidecar's idf_report only", path.name, exc)
# Run the adapter. Pull report_dict from
# extensions.idf_report (the v0.18.0+ ingest preserved it).
report_dict = (existing.get("extensions") or {}).get("idf_report") or {}
if not report_dict and binary_md is None:
log.debug("%s: no idf_report in sidecar AND no binary metadata — nothing to project", path.name)
skipped += 1
continue
try:
bw_report = build_bw_report_from_idf(
report_dict, binary_md=binary_md,
intervals=idf_intervals, is_histogram=is_histogram,
)
except Exception as exc:
log.warning("%s: adapter failed — %s", path.name, exc)
errors += 1
continue
# Build the new sidecar by overlaying refreshed fields onto
# the existing one — preserves review, captured_at, blastware
# block, source.kind, etc.
new_sidecar = dict(existing) # shallow copy
new_sidecar["bw_report"] = bw_report
src = dict(new_sidecar.get("source") or {})
src["tool_version"] = event_file_io.TOOL_VERSION
new_sidecar["source"] = src
# Preserve histogram intervals if the binary decoded them
# (improves over the original ingest if that one ran before
# the bee1185 codec fix).
if idf_intervals is not None:
ext = dict(new_sidecar.get("extensions") or {})
ext["idf_intervals"] = [
{
"offset": iv.offset,
"tran_peak": iv.peak_count("Tran"),
"tran_halfp": iv.tran_halfp,
"tran_freq": iv.freq_hz("Tran"),
"vert_peak": iv.peak_count("Vert"),
"vert_halfp": iv.vert_halfp,
"vert_freq": iv.freq_hz("Vert"),
"long_peak": iv.peak_count("Long"),
"long_halfp": iv.long_halfp,
"long_freq": iv.freq_hz("Long"),
"mic_peak": iv.peak_count("MicL"),
"mic_halfp": iv.micl_halfp,
"mic_freq": iv.freq_hz("MicL"),
}
for iv in idf_intervals
]
new_sidecar["extensions"] = ext
if args.dry_run:
will_write_h5 = (idf_samples or idf_intervals) and not args.skip_hdf5
log.info("[DRY] %s/%s — would refresh sidecar (bw_report=%s, h5=%s)",
serial, path.name,
"would add" if not has_bw_report else "would refresh",
"would write" if will_write_h5 else "skipped")
else:
event_file_io.write_sidecar(sidecar_path, new_sidecar)
log.info("%s/%s — sidecar refreshed (bw_report=%s, intervals=%d)",
serial, path.name,
"added" if not has_bw_report else "refreshed",
len(idf_intervals) if idf_intervals else 0)
refreshed += 1
# Regenerate .h5 by replaying the same IdfEvent → Event bridge
# save_imported_idf uses. For IDFW we write the decoded per-
# sample arrays. For IDFH we synthesise a 1-sample-per-interval
# array (peak ADC count per channel per interval) so the
# renderer's bar-chart code has something to group on.
# Pre-condition: either real samples (IDFW) or decoded intervals
# (IDFH). Skip otherwise.
have_data = bool(idf_samples) or bool(idf_intervals)
if have_data and not args.skip_hdf5:
from sfm import event_hdf5
hdf5_path = store.hdf5_path_for(serial, path.name)
if args.dry_run:
log.debug("[DRY] would write %s", hdf5_path.name)
else:
try:
from micromate import IdfEvent
from minimateplus.event_file_io import file_sha256
idf_event = IdfEvent.from_report(report_dict, path.name)
# Merge the binary-derived mic peak psi (only the
# binary path knows the proper psi value; the .txt
# carries dB(L)). Without this, the h5 writer's
# per-count mic factor is computed against the
# dB(L) value-as-pseudo-psi and the mic chart
# scales wildly.
if (binary_md is not None and res is not None
and res.event.peaks.mic_pspl_psi is not None):
idf_event.peaks.mic_pspl_psi = res.event.peaks.mic_pspl_psi
sha256 = file_sha256(path)
waveform_key = bytes.fromhex(sha256)[:16]
ev = idf_event.to_minimateplus_event(waveform_key)
if is_histogram and idf_intervals:
# 1 sample per interval per channel — same
# synthesis save_imported_idf uses. The h5
# writer's count×geo_fs/32768 conversion turns
# each peak-ADC-count into the bar's physical
# value.
ev.raw_samples = {
"Tran": [iv.peak_count("Tran") for iv in idf_intervals],
"Vert": [iv.peak_count("Vert") for iv in idf_intervals],
"Long": [iv.peak_count("Long") for iv in idf_intervals],
"MicL": [iv.peak_count("MicL") for iv in idf_intervals],
}
ev.total_samples = ev.total_samples or len(idf_intervals)
elif idf_samples:
ev.raw_samples = idf_samples
n_samp = max(
(len(idf_samples.get(ch, []))
for ch in ("Tran", "Vert", "Long", "MicL")),
default=0,
)
ev.total_samples = ev.total_samples or n_samp
event_hdf5.write_event_hdf5(
hdf5_path, ev,
serial=serial,
geo_range="normal",
source_kind="idf-import",
tool_version=event_file_io.TOOL_VERSION,
)
h5_written += 1
log.debug("%s/%s — .h5 written (%s)",
serial, path.name,
f"{len(idf_intervals)} intervals" if is_histogram
else f"{sum(len(v) for v in (idf_samples or {}).values())} samples")
except Exception as exc:
log.warning("%s/%s — .h5 write failed: %s",
serial, path.name, exc)
log.info("Done. refreshed=%d skipped=%d errors=%d h5_written=%d",
refreshed, skipped, errors, h5_written)
return 0 if errors == 0 else 2
if __name__ == "__main__":
sys.exit(main())
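The skip rule in this script compares version stamps as integer tuples rather than as strings. A short sketch of why that matters, reusing the logic of `_vtuple` and the `up_to_date` test; `needs_refresh` is an illustrative name that folds the `--force` check into one predicate:

```python
def vtuple(s: str) -> tuple:
    """Parse 'MAJOR.MINOR.PATCH' into ints; unparseable input sorts lowest."""
    try:
        return tuple(int(p) for p in str(s).split(".")[:3])
    except ValueError:
        return (0, 0, 0)


def needs_refresh(sidecar: dict, current: str, force: bool = False) -> bool:
    """True when bw_report is missing, the stamp is older, or force is set."""
    has_bw = bool(sidecar.get("bw_report"))
    stamped = (sidecar.get("source") or {}).get("tool_version", "")
    return force or not has_bw or vtuple(stamped) < vtuple(current)


# String comparison orders these the wrong way round; tuples do not.
assert "0.9.0" > "0.21.1"
assert vtuple("0.9.0") < vtuple("0.21.1")

old = {"bw_report": {"peaks": {}}, "source": {"tool_version": "0.20.0"}}
new = {"bw_report": {"peaks": {}}, "source": {"tool_version": "0.21.1"}}
print(needs_refresh(old, "0.21.1"), needs_refresh(new, "0.21.1"))
```

A sidecar with no stamp at all parses to `(0, 0, 0)` and is therefore always treated as stale, which matches the script's default of refreshing anything it cannot prove current.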
+185
@@ -0,0 +1,185 @@
"""
scripts/check_bw_report_preservation.py: verify that running backfill_sidecars
doesn't wipe the `bw_report` block from sidecars that already had one.
Two-step workflow:
# Before running backfill — capture a baseline snapshot:
python scripts/check_bw_report_preservation.py snapshot \
--store-root /path/to/waveforms \
--out before.json
# Run backfill:
python scripts/backfill_sidecars.py --store-root /path/to/waveforms --force
# After backfill — diff against the baseline:
python scripts/check_bw_report_preservation.py diff \
--store-root /path/to/waveforms \
--baseline before.json
The diff classifies every sidecar into one of:
PRESERVED      had bw_report before, has same hash now         → GOOD
CHANGED        had bw_report before, has different hash now    → suspicious
               (backfill should only ever copy the block verbatim)
WIPED          had bw_report before, doesn't now               ← BUG: data loss
STILL_MISSING  didn't have bw_report before, still doesn't     → expected
NEW            didn't have bw_report before, has one now
               (only possible if a re-ingest happened between snapshots;
               shouldn't happen during backfill)
REMOVED        sidecar existed in baseline, file is gone now
ADDED          sidecar didn't exist in baseline, exists now
Exit code is 0 if no WIPED or CHANGED entries are found, 1 otherwise.
"""
from __future__ import annotations
import argparse
import hashlib
import json
import sys
from pathlib import Path
from typing import Optional
# Allow running from the repo root without installation.
sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
from minimateplus import event_file_io
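The classification rules listed in the module docstring can be sketched as a single function over a (before, after) pair. This is an assumption about how the rules compose, not a copy of `cmd_diff`; it borrows the `"__MISSING__"` sentinel the diff loop uses for a sidecar that is absent from one side.

```python
MISSING = "__MISSING__"  # sentinel: sidecar absent from that snapshot


def classify(before, after) -> str:
    """Map a (before, after) pair of bw_report hashes to a class name.

    Each argument is a hex digest, None (sidecar without bw_report),
    or MISSING (no sidecar file in that snapshot).
    """
    if before == MISSING:
        return "ADDED"
    if after == MISSING:
        return "REMOVED"
    if before is None:
        return "STILL_MISSING" if after is None else "NEW"
    if after is None:
        return "WIPED"
    return "PRESERVED" if before == after else "CHANGED"


for pair in [("abc", "abc"), ("abc", "def"), ("abc", None),
             (None, None), (None, "abc"), ("abc", MISSING), (MISSING, "abc")]:
    print(pair, "->", classify(*pair))
```

Only WIPED and CHANGED indicate a problem with the backfill itself, which is why the docstring ties the non-zero exit code to those two classes.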
def _bw_report_hash(sidecar_data: dict) -> Optional[str]:
"""Canonical-JSON hash of the bw_report block, or None if absent."""
br = sidecar_data.get("bw_report")
if not br:
return None
# sort_keys for stable hashing across dict-ordering differences
blob = json.dumps(br, sort_keys=True, separators=(",", ":"))
return hashlib.sha256(blob.encode()).hexdigest()
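A quick check of what the canonical-JSON hash in `_bw_report_hash` buys: two blocks with the same content but different key order hash identically, while any value change produces a different digest. The sample dicts are made up for illustration.

```python
import hashlib
import json


def canonical_hash(block: dict) -> str:
    """SHA-256 over compact JSON with sorted keys (same recipe as above)."""
    blob = json.dumps(block, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(blob.encode()).hexdigest()


a = {"mic": {"pspl_dbl": 107.7, "weighting": "Linear"}, "peaks": {"tran": 0.45}}
b = {"peaks": {"tran": 0.45}, "mic": {"weighting": "Linear", "pspl_dbl": 107.7}}

# Key order, including inside nested dicts, does not affect the digest.
assert canonical_hash(a) == canonical_hash(b)

# Any value change does.
b["peaks"]["tran"] = 0.46
assert canonical_hash(a) != canonical_hash(b)
```

One consequence worth keeping in mind: a backfill that re-serialises a float differently (for example 0.45 versus 0.450000001) will register as CHANGED, so the check is strict by design.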
def _scan_store(store_root: Path) -> dict:
"""Walk every <serial>/<file>.sfm.json and return {relpath: hash_or_None}.
Relpath is `<serial>/<filename>`, which is stable across machines/snapshots.
"""
out: dict[str, Optional[str]] = {}
for serial_dir in sorted(p for p in store_root.iterdir() if p.is_dir()):
for sidecar in sorted(serial_dir.glob("*.sfm.json")):
relpath = f"{serial_dir.name}/{sidecar.name}"
try:
data = event_file_io.read_sidecar(sidecar)
except Exception as exc:
print(f" WARN: failed to read {relpath}: {exc}", file=sys.stderr)
continue
out[relpath] = _bw_report_hash(data)
return out
def cmd_snapshot(args) -> int:
    store_root = Path(args.store_root).expanduser().resolve()
    if not store_root.exists():
        print(f"error: store root does not exist: {store_root}", file=sys.stderr)
        return 2
    out_path = Path(args.out).expanduser().resolve()
    print(f"Scanning {store_root}")
    snapshot = _scan_store(store_root)
    with_bw = sum(1 for v in snapshot.values() if v is not None)
    without_bw = sum(1 for v in snapshot.values() if v is None)
    print(f" total sidecars: {len(snapshot)}")
    print(f" with bw_report: {with_bw}")
    print(f" without bw_report: {without_bw}")
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with open(out_path, "w") as f:
        json.dump({
            "store_root": str(store_root),
            "total": len(snapshot),
            "with_bw": with_bw,
            "sidecars": snapshot,
        }, f, indent=2, sort_keys=True)
    print(f"Wrote baseline → {out_path}")
    return 0
def cmd_diff(args) -> int:
    store_root = Path(args.store_root).expanduser().resolve()
    if not store_root.exists():
        print(f"error: store root does not exist: {store_root}", file=sys.stderr)
        return 2
    baseline_path = Path(args.baseline).expanduser().resolve()
    if not baseline_path.exists():
        print(f"error: baseline file not found: {baseline_path}", file=sys.stderr)
        return 2
    with open(baseline_path) as f:
        baseline = json.load(f)
    before = baseline["sidecars"]
    print(f"Scanning {store_root} for comparison against {baseline_path.name}")
    after = _scan_store(store_root)
    classes = {k: [] for k in (
        "PRESERVED", "CHANGED", "WIPED", "STILL_MISSING", "NEW", "REMOVED", "ADDED",
    )}
    all_keys = set(before) | set(after)
    for key in sorted(all_keys):
        # "__MISSING__" marks a sidecar absent from that snapshot; None means
        # the sidecar exists but carries no bw_report block.
        b = before.get(key, "__MISSING__")
        a = after.get(key, "__MISSING__")
        if b == "__MISSING__":
            classes["ADDED"].append(key)
        elif a == "__MISSING__":
            classes["REMOVED"].append(key)
        elif b is None and a is None:
            classes["STILL_MISSING"].append(key)
        elif b is None and a is not None:
            classes["NEW"].append(key)
        elif b is not None and a is None:
            classes["WIPED"].append(key)
        elif b == a:
            classes["PRESERVED"].append(key)
        else:
            classes["CHANGED"].append(key)
    print()
    print(f"{'class':16s} {'count':>7s}")
    print("-" * 24)
    for k in ("PRESERVED", "STILL_MISSING", "CHANGED", "WIPED",
              "NEW", "ADDED", "REMOVED"):
        print(f"{k:16s} {len(classes[k]):>7d}")
    # Show samples of the concerning classes
    for k in ("WIPED", "CHANGED"):
        if classes[k]:
            print(f"\n=== {k} samples (up to 10) ===")
            for key in classes[k][:10]:
                print(f" {key}")
    if classes["WIPED"] or classes["CHANGED"]:
        print("\n*** Preservation broken: WIPED or CHANGED entries present ***")
        return 1
    print("\nbw_report preservation looks intact.")
    return 0
def main(argv=None) -> int:
    p = argparse.ArgumentParser(description=__doc__)
    sub = p.add_subparsers(dest="cmd", required=True)
    p_snap = sub.add_parser("snapshot", help="capture baseline bw_report hashes")
    p_snap.add_argument("--store-root", required=True)
    p_snap.add_argument("--out", required=True, help="output JSON path")
    p_snap.set_defaults(func=cmd_snapshot)
    p_diff = sub.add_parser("diff", help="diff current store against a baseline")
    p_diff.add_argument("--store-root", required=True)
    p_diff.add_argument("--baseline", required=True, help="JSON from `snapshot`")
    p_diff.set_defaults(func=cmd_diff)
    args = p.parse_args(argv)
    return args.func(args)


if __name__ == "__main__":
    sys.exit(main())
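The classification rules in the script above can be tried in isolation. The following is a minimal sketch, not the script itself: `MISSING`, `block_hash`, `classify`, and `exit_code` are hypothetical names chosen for illustration (the script uses the string `"__MISSING__"` as its sentinel), and the sample `bw_report` blocks are made up.

```python
from __future__ import annotations

import hashlib
import json
from typing import Optional

# Sentinel for "sidecar not present in this snapshot" (hypothetical name;
# the script itself uses the string "__MISSING__").
MISSING = object()


def block_hash(block: Optional[dict]) -> Optional[str]:
    """SHA-256 over canonical JSON, or None when the block is absent or empty."""
    if not block:
        return None
    # sort_keys makes the hash independent of dict insertion order.
    blob = json.dumps(block, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(blob.encode()).hexdigest()


def classify(before, after) -> str:
    """Map one sidecar's (before, after) hash pair to a class name."""
    if before is MISSING:
        return "ADDED"
    if after is MISSING:
        return "REMOVED"
    if before is None and after is None:
        return "STILL_MISSING"
    if before is None:
        return "NEW"
    if after is None:
        return "WIPED"
    return "PRESERVED" if before == after else "CHANGED"


def exit_code(classes: list[str]) -> int:
    """1 when any entry signals broken preservation, else 0."""
    return 1 if any(c in ("WIPED", "CHANGED") for c in classes) else 0


# Same content, different key order: the hashes match.
h1 = block_hash({"peaks": {"tran": 1}, "mic": {"zc_freq_hz": 12}})
h2 = block_hash({"mic": {"zc_freq_hz": 12}, "peaks": {"tran": 1}})
print(classify(h1, h2))    # PRESERVED
print(classify(h1, None))  # WIPED
```

Because the hash is taken over sorted-key JSON, a backfill that copies the block verbatim but rewrites the file with a different key order should still classify as PRESERVED.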
+91
@@ -0,0 +1,91 @@
"""Re-ingest a prod IDFW + IDFH via the patched save_imported_idf and
render both PDFs to confirm charts have data."""
from __future__ import annotations
import sys
import json
import datetime
import tempfile
from pathlib import Path
sys.path.insert(0, str(Path(__file__).resolve().parents[1]))
from sfm.waveform_store import WaveformStore
from sfm import report_pdf
import h5py
class FakeDb:
    """Minimal stand-in for the events DB: always returns the one row given."""

    def __init__(self, event):
        self.event = event

    def get_event(self, _id):
        return self.event
def to_ts_iso(ts):
    if ts is None:
        return None
    try:
        return datetime.datetime(ts.year, ts.month, ts.day, ts.hour, ts.minute, ts.second).isoformat()
    except Exception:
        return None
def render_case(idf_path: Path, serial: str, out_pdf: Path, h5_summary: bool = True):
    with tempfile.TemporaryDirectory() as td:
        store = WaveformStore(Path(td))
        ev, rec = store.save_imported_idf(
            idf_path.read_bytes(),
            idf_path,
            idf_report_text=None,  # production worst case: no .txt
        )
        print(f"=== {idf_path.name} ===")
        print(f" h5: {rec['hdf5_filename']}, sidecar: {rec['sidecar_filename']}")
        h5p = Path(td) / serial / f"{idf_path.name}.h5"
        if h5p.exists() and h5_summary:
            with h5py.File(h5p) as h:
                for ch in ("Tran", "Vert", "Long", "MicL"):
                    ds = h.get(f"samples/{ch}")
                    if ds is not None:
                        n = ds.shape[0]
                        mx = float(abs(ds[...]).max()) if n else 0
                        print(f" samples/{ch}: n={n} max_abs={mx:.5f}")
        record_type = "Histogram" if idf_path.suffix.upper() == ".IDFH" else "Waveform"
        fake_row = {
            "serial": serial,
            "blastware_filename": rec["filename"],
            "record_type": record_type,
            "timestamp": to_ts_iso(ev.timestamp),
            "sample_rate": ev.sample_rate,
            "project": ev.project_info.project if ev.project_info else None,
            "client": ev.project_info.client if ev.project_info else None,
            "operator": ev.project_info.operator if ev.project_info else None,
            "sensor_location": ev.project_info.sensor_location if ev.project_info else None,
            "created_at": None,
        }
        rd = report_pdf.gather_report_data(FakeDb(fake_row), store, event_id="test-1")
        print(f" ReportData: channels={ {k: len(v) for k,v in rd.channels.items()} }")
        if rd.is_histogram:
            print(f" histogram n_intervals={rd.histogram_n_intervals} interval_size={rd.histogram_interval_size}")
        pdf = report_pdf.render_event_report_pdf(rd)
        out_pdf.write_bytes(pdf)
        print(f" PDF: {out_pdf} ({len(pdf)} bytes)")
def main():
    out_dir = Path("/tmp/thor_render_test")
    out_dir.mkdir(exist_ok=True)
    cases = [
        # IDFW that decoded to preamble-only under the old codec
        ("/home/serversdown/seismo-relay-prod-snap/waveforms/UM6047/UM6047_20250804154137.IDFW", "UM6047"),
        # IDFW that worked under the old codec (validates no regression)
        ("/home/serversdown/seismo-relay-prod-snap/waveforms/UM6047/UM6047_20250804104450.IDFW", "UM6047"),
        # IDFH histogram
        ("/home/serversdown/seismo-relay-prod-snap/waveforms/UM6047/UM6047_20250804190047.IDFH", "UM6047"),
    ]
    for path, serial in cases:
        render_case(Path(path), serial, out_dir / f"{Path(path).name}.pdf")


if __name__ == "__main__":
    main()
+909
@@ -0,0 +1,909 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>SFM Event Browser</title>
<script src="https://cdnjs.cloudflare.com/ajax/libs/Chart.js/4.4.1/chart.umd.min.js"></script>
<style>
* { box-sizing: border-box; margin: 0; padding: 0; }
body {
background: #0d1117;
color: #c9d1d9;
font-family: 'Segoe UI', system-ui, sans-serif;
font-size: 13px;
height: 100vh;
display: flex;
flex-direction: column;
overflow: hidden;
}
header {
background: #161b22;
border-bottom: 1px solid #30363d;
padding: 12px 20px;
display: flex;
align-items: center;
gap: 16px;
flex-shrink: 0;
}
header h1 {
font-size: 15px;
font-weight: 600;
color: #f0f6fc;
white-space: nowrap;
}
label { color: #8b949e; font-size: 12px; }
select, input[type="text"], input[type="search"] {
background: #0d1117;
border: 1px solid #30363d;
border-radius: 6px;
color: #c9d1d9;
padding: 5px 8px;
font-size: 13px;
}
select { min-width: 140px; }
input[type="search"] { width: 200px; }
select:focus, input:focus { outline: none; border-color: #388bfd; }
button {
background: #1f6feb;
border: none;
border-radius: 6px;
color: #fff;
cursor: pointer;
font-size: 13px;
font-weight: 500;
padding: 5px 14px;
}
button:hover { background: #388bfd; }
button:disabled { background: #21262d; color: #484f58; cursor: not-allowed; }
#main {
flex: 1;
display: flex;
overflow: hidden;
}
/* ── Event list (left sidebar) ────────────────────────────────── */
#event-list-wrap {
width: 320px;
flex-shrink: 0;
background: #0d1117;
border-right: 1px solid #21262d;
display: flex;
flex-direction: column;
}
#event-list-header {
padding: 10px 14px;
border-bottom: 1px solid #21262d;
font-size: 11px;
color: #8b949e;
text-transform: uppercase;
letter-spacing: 0.06em;
display: flex;
justify-content: space-between;
}
#event-list {
flex: 1;
overflow-y: auto;
}
.event-row {
padding: 8px 14px;
border-bottom: 1px solid #161b22;
cursor: pointer;
transition: background 0.1s;
}
.event-row:hover { background: #161b22; }
.event-row.active { background: #1f3a5f; border-left: 3px solid #58a6ff; padding-left: 11px; }
.event-row .er-top {
display: flex;
justify-content: space-between;
align-items: center;
margin-bottom: 2px;
}
.event-row .er-ts { font-family: monospace; font-size: 12px; color: #c9d1d9; }
.event-row .er-pvs { font-family: monospace; font-size: 12px; color: #58a6ff; font-weight: 600; }
.event-row .er-meta { font-size: 11px; color: #8b949e; }
.event-row.false_trigger .er-pvs { color: #f85149; text-decoration: line-through; }
/* ── Main viewer (right side) ─────────────────────────────────── */
#viewer {
flex: 1;
display: flex;
flex-direction: column;
overflow: hidden;
}
#event-meta {
padding: 12px 20px;
background: #161b22;
border-bottom: 1px solid #21262d;
display: grid;
grid-template-columns: repeat(auto-fit, minmax(160px, 1fr));
gap: 8px 24px;
flex-shrink: 0;
}
.meta-field {
display: flex;
flex-direction: column;
gap: 1px;
}
.meta-field .mf-label {
font-size: 10px;
color: #484f58;
text-transform: uppercase;
letter-spacing: 0.05em;
}
.meta-field .mf-value {
font-family: monospace;
font-size: 13px;
color: #c9d1d9;
}
.meta-field .mf-value.highlight { color: #58a6ff; font-weight: 600; }
#charts {
flex: 1;
overflow-y: auto;
padding: 12px 16px;
display: flex;
flex-direction: column;
gap: 10px;
}
.chart-wrap {
background: #161b22;
border: 1px solid #21262d;
border-radius: 8px;
padding: 10px 30px 8px 12px; /* right padding leaves room for the "0.0" baseline label */
}
.chart-label {
font-size: 11px;
font-weight: 600;
letter-spacing: 0.06em;
text-transform: uppercase;
margin-bottom: 4px;
display: flex;
justify-content: space-between;
}
.chart-canvas-wrap { position: relative; height: 130px; }
.ch-tran { color: #58a6ff; }
.ch-vert { color: #3fb950; }
.ch-long { color: #d29922; }
.ch-micl { color: #bc8cff; }
#status-bar {
background: #161b22;
border-top: 1px solid #21262d;
padding: 5px 20px;
font-size: 12px;
color: #8b949e;
min-height: 26px;
flex-shrink: 0;
}
#status-bar.error { color: #f85149; }
#status-bar.ok { color: #3fb950; }
#empty-state {
flex: 1;
display: flex;
flex-direction: column;
align-items: center;
justify-content: center;
color: #484f58;
gap: 8px;
}
#empty-state svg { opacity: 0.3; }
.pill {
background: #21262d;
border-radius: 4px;
padding: 2px 8px;
color: #c9d1d9;
font-family: monospace;
font-size: 11px;
margin-left: 8px;
}
/* Per-channel stats table in the metadata header */
.stats-table {
grid-column: 1 / -1;
border-collapse: collapse;
font-family: monospace;
font-size: 12px;
margin-top: 4px;
}
.stats-table th, .stats-table td {
padding: 3px 14px 3px 0;
text-align: left;
color: #c9d1d9;
}
.stats-table th {
color: #484f58;
font-size: 10px;
text-transform: uppercase;
letter-spacing: 0.05em;
font-weight: 500;
}
/* ── Print view (light theme matching the Instantel printout) ─── */
body.print-view {
background: #ffffff;
color: #000000;
}
body.print-view header,
body.print-view #event-list-wrap,
body.print-view #event-list-header,
body.print-view #event-meta,
body.print-view #status-bar,
body.print-view .chart-wrap {
background: #ffffff;
border-color: #cccccc;
color: #000000;
}
body.print-view .event-row { color: #000; border-bottom-color: #eee; }
body.print-view .event-row:hover { background: #f4f4f4; }
body.print-view .event-row.active {
background: #e6f0ff;
border-left-color: #1f6feb;
}
body.print-view .er-ts { color: #000; }
body.print-view .er-pvs { color: #003a8c; }
body.print-view .er-meta,
body.print-view #event-list-header,
body.print-view .meta-field .mf-label,
body.print-view .stats-table th {
color: #666;
}
body.print-view .mf-value { color: #000; }
body.print-view .mf-value.highlight { color: #003a8c; }
body.print-view label { color: #444; }
body.print-view input, body.print-view select {
background: #fff; color: #000; border-color: #ccc;
}
/* In print theme, the channel-label colors stay (they identify
the trace). Only the chart panel background flips. */
@media print {
header, #event-list-wrap, #status-bar, button { display: none !important; }
body { overflow: visible; height: auto; }
#main, #viewer { overflow: visible; }
#charts { overflow: visible; }
}
</style>
</head>
<body>
<header>
<h1>SFM Event Browser</h1>
<label>Serial</label>
<select id="serial-select">
<option value="">Loading…</option>
</select>
<input type="search" id="event-filter" placeholder="filter events…" />
<span class="pill" id="count-pill"></span>
<button id="mic-unit-toggle" style="margin-left:auto;background:#21262d"
onclick="_setMicUnit(_getMicUnit() === 'dBL' ? 'psi' : 'dBL')"
title="Toggle mic display unit (dBL ↔ psi). Persists across page loads.">
Mic: dBL
</button>
<button id="print-btn" onclick="togglePrintView()" style="background:#21262d">Print view</button>
<button id="reload-btn" onclick="loadSerials()">Reload</button>
</header>
<div id="main">
<div id="event-list-wrap">
<div id="event-list-header">
<span>Events</span>
<span id="event-list-count"></span>
</div>
<div id="event-list"></div>
</div>
<div id="viewer">
<div id="empty-state">
<svg width="48" height="48" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="1.5">
<polyline points="22 12 18 12 15 21 9 3 6 12 2 12"/>
</svg>
<p>Select a unit and event to view its waveform.</p>
</div>
<div id="event-meta" style="display:none"></div>
<div id="charts" style="display:none"></div>
</div>
</div>
<div id="status-bar">Ready.</div>
<script>
// Channel colors and rendering order mirror Instantel's BW Event Report
// printout: MicL at the top, Tran at the bottom. Colors approximate
// what BW renders (magenta mic, blue long, green vert, red tran).
const CHANNEL_COLORS = {
MicL: '#e066ff',
Long: '#3a80ff',
Vert: '#3fb950',
Tran: '#f85149',
};
const CHANNEL_ORDER = ['MicL', 'Long', 'Vert', 'Tran'];
// Reference pressure for dB(L) — 20 µPa expressed in psi (≈ 2.9e-9 psi).
const DBL_REF = 2.9e-9;
// User-toggleable mic display unit: 'dBL' (default, matches BW printout
// + the rest of SFM) or 'psi' (raw sample unit).
function _getMicUnit() {
return localStorage.getItem('sfm_mic_unit') === 'psi' ? 'psi' : 'dBL';
}
function _setMicUnit(u) {
localStorage.setItem('sfm_mic_unit', u === 'psi' ? 'psi' : 'dBL');
_refreshMicUnitToggle();
if (currentEventId) loadEvent(currentEventId);
}
function _refreshMicUnitToggle() {
const b = document.getElementById('mic-unit-toggle');
if (b) b.textContent = `Mic: ${_getMicUnit()}`;
}
// psi → dB(L). Null for non-positive (log undefined; Chart.js renders as a gap).
function _psiToDbl(psi) {
if (psi == null || !(psi > 0)) return null;
return 20 * Math.log10(psi / DBL_REF);
}
// Per-sample mic chart conversion — rectify the AC waveform, dBL,
// floor below the noise-floor minimum. Gives a continuous baseline
// instead of the spikey/discontinuous look you get from raw _psiToDbl.
const MIC_DBL_FLOOR = 60;
function _psiToDblForChart(psi) {
if (psi == null) return MIC_DBL_FLOOR;
const a = Math.abs(psi);
if (a === 0) return MIC_DBL_FLOOR;
const dbl = 20 * Math.log10(a / DBL_REF);
return dbl > MIC_DBL_FLOOR ? dbl : MIC_DBL_FLOOR;
}
// Format an ISO timestamp in the browser's local timezone — UTC values
// (with 'Z' suffix) convert; naive values are interpreted as local clock.
// Returns '—' for null/empty/unparseable.
function _fmtTsLocal(iso) {
if (!iso) return '—';
const d = new Date(iso);
if (isNaN(d)) return iso;
return d.toLocaleString();
}
// Adaptive decimal formatter — scientific notation only for truly extreme
// values. Normal-range peaks render as plain decimals with sensible
// precision (was previously forcing toExponential(3) which produced ugly
// "2.500E-2 IN/S" labels).
function _fmtPeak(v, unit) {
if (v == null || (typeof v === 'number' && !isFinite(v))) return '';
if (typeof v !== 'number') return String(v) + (unit ? ' ' + unit : '');
if (v === 0) return '0' + (unit ? ' ' + unit : '');
const a = Math.abs(v);
const u = unit ? ' ' + unit : '';
if (a >= 0.0001 && a < 10000) {
const d = a >= 100 ? 1 : a >= 10 ? 2 : a >= 1 ? 3 : a >= 0.1 ? 4 : 5;
return v.toFixed(d) + u;
}
return v.toExponential(2) + u;
}
let allEvents = [];
let filteredEvents = [];
let currentEventId = null;
let charts = {};
const apiBase = window.location.origin;
function setStatus(msg, cls = '') {
const bar = document.getElementById('status-bar');
bar.textContent = msg;
bar.className = cls;
}
async function loadSerials() {
setStatus('Loading serials…');
try {
const r = await fetch(`${apiBase}/db/units`);
if (!r.ok) throw new Error(r.statusText);
// /db/units returns a bare list[dict], not {units:[...]}
const units = await r.json();
const sel = document.getElementById('serial-select');
sel.innerHTML = '';
if (!units || units.length === 0) {
sel.innerHTML = '<option value="">(no units found)</option>';
setStatus('No units in DB.', 'error');
return;
}
sel.innerHTML = '<option value="">— pick a unit —</option>' +
units.map(u => {
const n = u.total_events ?? 0;
return `<option value="${u.serial}">${u.serial} (${n} events)</option>`;
}).join('');
setStatus(`Loaded ${units.length} units.`, 'ok');
} catch (e) {
setStatus(`Failed to load units: ${e.message}`, 'error');
}
}
async function loadEventsForSerial(serial) {
if (!serial) {
allEvents = [];
renderEventList();
return;
}
setStatus(`Loading events for ${serial}…`);
try {
const r = await fetch(`${apiBase}/db/events?serial=${encodeURIComponent(serial)}&limit=500`);
if (!r.ok) throw new Error(r.statusText);
const d = await r.json();
allEvents = d.events || [];
document.getElementById('count-pill').textContent = `${allEvents.length} events`;
applyFilter();
setStatus(`Loaded ${allEvents.length} events for ${serial}.`, 'ok');
} catch (e) {
setStatus(`Failed to load events: ${e.message}`, 'error');
}
}
function applyFilter() {
const q = document.getElementById('event-filter').value.toLowerCase().trim();
if (!q) {
filteredEvents = allEvents;
} else {
filteredEvents = allEvents.filter(ev =>
(ev.blastware_filename || '').toLowerCase().includes(q) ||
(ev.timestamp || '').toLowerCase().includes(q) ||
(ev.record_type || '').toLowerCase().includes(q) ||
(ev.project || '').toLowerCase().includes(q)
);
}
document.getElementById('event-list-count').textContent = `${filteredEvents.length} / ${allEvents.length}`;
renderEventList();
}
function renderEventList() {
const list = document.getElementById('event-list');
list.innerHTML = '';
if (filteredEvents.length === 0) {
list.innerHTML = '<div style="padding:14px;color:#484f58;font-size:12px">No events.</div>';
return;
}
for (const ev of filteredEvents) {
const row = document.createElement('div');
row.className = 'event-row' + (ev.false_trigger ? ' false_trigger' : '');
if (ev.id === currentEventId) row.className += ' active';
const ts = _fmtTsLocal(ev.timestamp);
const pvs = ev.peak_vector_sum != null ? `${ev.peak_vector_sum.toFixed(3)} in/s` : '—';
row.innerHTML = `
<div class="er-top">
<span class="er-ts">${ts || '(no ts)'}</span>
<span class="er-pvs">${pvs}</span>
</div>
<div class="er-meta">${ev.record_type || '?'} · ${ev.blastware_filename || ev.id.slice(0,8)}</div>
`;
row.onclick = () => loadEvent(ev.id);
list.appendChild(row);
}
}
async function loadEvent(eventId) {
currentEventId = eventId;
renderEventList();
setStatus('Loading waveform…');
try {
// Sidecar fetch runs in parallel — its bw_report block carries ZC
// Freq + above-range flags + sensor-check results that the per-
// channel stats table surfaces. Failures are non-fatal (legacy
// events without a preserved .TXT have no sidecar bw_report).
const sidecarP = fetch(`${apiBase}/db/events/${eventId}/sidecar`)
.then(r => r.ok ? r.json() : null)
.catch(() => null);
const r = await fetch(`${apiBase}/db/events/${eventId}/waveform.json`);
if (!r.ok) {
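if (r.status === 404) {
showEmpty('No waveform data for this event (codec returned no samples).');
// Clear the "Loading waveform…" message so the status bar doesn't stay stuck.
setStatus('No waveform data for this event.', 'error');
return;
}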
throw new Error(r.statusText);
}
const data = await r.json();
renderWaveform(data);
// Also fetch metadata from the events list for richer header
const ev = allEvents.find(e => e.id === eventId);
const sidecar = await sidecarP;
renderMeta(data, ev, sidecar);
setStatus(`Event loaded.`, 'ok');
} catch (e) {
setStatus(`Failed to load event: ${e.message}`, 'error');
showEmpty(`Error: ${e.message}`);
}
}
function showEmpty(msg) {
document.getElementById('empty-state').style.display = 'flex';
document.getElementById('empty-state').querySelector('p').textContent = msg;
document.getElementById('event-meta').style.display = 'none';
document.getElementById('charts').style.display = 'none';
Object.values(charts).forEach(c => c.destroy());
charts = {};
}
function renderMeta(data, ev, sidecar) {
const metaDiv = document.getElementById('event-meta');
const fields = [
['Serial', data.serial || ev?.serial || '—'],
['Timestamp', _fmtTsLocal(data.timestamp || ev?.timestamp)],
['Record', data.record_type || ev?.record_type || '—'],
['Sample rate', data.sample_rate ? `${data.sample_rate} sps` : '—'],
['Geo range', data.geo_range ? `${data.geo_range} (${data.geo_full_scale_ips} in/s FS)` : '—'],
['Project', ev?.project || '—'],
['Location', ev?.sensor_location || '—'],
['Peak Vector Sum',
ev?.peak_vector_sum != null ? `${ev.peak_vector_sum.toFixed(4)} in/s` : '—'],
];
// Per-channel stats table mirroring the printout's middle block.
// PPV from the events DB row; ZC Freq + saturation flags from the
// sidecar's bw_report block (when a .TXT was preserved on ingest).
const bwrPeaks = (sidecar?.bw_report || {}).peaks || {};
const bwrMic = (sidecar?.bw_report || {}).mic || {};
const fmt = v => (v == null ? '—' : (typeof v === 'number' ? v.toFixed(3) : v));
const fmtZc = bwr => {
if (!bwr || bwr.zc_freq_hz == null) return '—';
const prefix = bwr.zc_freq_above_range ? '>' : '';
return `${prefix}${Math.round(bwr.zc_freq_hz)} Hz`;
};
const rows = [
['Tran', ev?.tran_ppv, fmtZc(bwrPeaks.tran)],
['Vert', ev?.vert_ppv, fmtZc(bwrPeaks.vert)],
['Long', ev?.long_ppv, fmtZc(bwrPeaks.long)],
];
// Mic display honors the current user preference (dBL default).
// mic_ppv is stored as raw psi on series3 events; convert when needed.
const micPsi = ev?.mic_ppv;
const micUnitDisplay = _getMicUnit();
let micStr;
if (micPsi == null) {
micStr = '—';
} else if (micUnitDisplay === 'dBL') {
const d = _psiToDbl(Number(micPsi));
micStr = (d != null ? d.toFixed(1) : '—') + ' dBL';
} else {
micStr = Number(micPsi).toExponential(2) + ' psi';
}
const statsHtml = `
<table class="stats-table">
<thead>
<tr><th>Channel</th><th>PPV (in/s)</th><th>ZC Freq</th></tr>
</thead>
<tbody>
${rows.map(([ch, ppv, zc]) => `<tr><td>${ch}</td><td>${fmt(ppv)}</td><td>${zc}</td></tr>`).join('')}
<tr><td>MicL</td><td>${micStr}</td><td>${fmtZc(bwrMic)}</td></tr>
</tbody>
</table>
`;
metaDiv.innerHTML =
fields.map(([l, v]) =>
`<div class="meta-field"><span class="mf-label">${l}</span><span class="mf-value${l === 'Peak Vector Sum' ? ' highlight' : ''}">${v}</span></div>`
).join('') + statsHtml;
metaDiv.style.display = 'grid';
}
function togglePrintView() {
document.body.classList.toggle('print-view');
// Force chart redraw so axis/grid colors are re-evaluated against the
// new background. Easiest: re-render the current event.
if (currentEventId) {
loadEvent(currentEventId);
}
}
function renderWaveform(data) {
document.getElementById('empty-state').style.display = 'none';
const chartsDiv = document.getElementById('charts');
chartsDiv.style.display = 'flex';
chartsDiv.innerHTML = '';
Object.values(charts).forEach(c => c.destroy());
charts = {};
const channels = data.channels || {};
// time_axis is METADATA from sfm.plot.v1 — sample_rate, pretrig_samples,
// t0_ms (first-sample time relative to trigger; negative when pretrig
// exists), dt_ms. Trigger is at t=0 by convention.
const ta = data.time_axis || {};
const sr = ta.sample_rate || 1024;
const dtMs = ta.dt_ms || (1000.0 / sr);
const t0Ms = ta.t0_ms != null ? ta.t0_ms : 0;
const isPrintMode = document.body.classList.contains('print-view');
// Histograms record per-interval peaks (typically 1 per minute/5-min),
// not per-sample waveforms. Render as a tight bar graph instead of a
// line plot — matches the BW Event Report's histogram presentation.
const isHistogram = String(data.record_type || '').toLowerCase().includes('histogram');
// Which channels actually have data → determines which one renders the
// shared x-axis at the bottom (Instantel printout has the time scale
// only on the bottom-most chart).
const channelsWithData = CHANNEL_ORDER.filter(ch =>
channels[ch] && (channels[ch].values || []).length > 0
);
const lastDataCh = channelsWithData[channelsWithData.length - 1];
const micUnit = _getMicUnit();
for (const ch of CHANNEL_ORDER) {
const chData = channels[ch];
if (!chData) continue;
if ((chData.values || []).length === 0) {
// Render an empty card so user sees the channel exists but is missing
const wrap = document.createElement('div');
wrap.className = 'chart-wrap';
wrap.innerHTML = `
<div class="chart-label ch-${ch.toLowerCase()}">
<span>${ch}</span>
<span style="color:#484f58">no samples decoded</span>
</div>
<div class="chart-canvas-wrap" style="display:flex;align-items:center;justify-content:center;color:#484f58;font-size:12px">empty</div>
`;
chartsDiv.appendChild(wrap);
continue;
}
// Mic channel: convert from raw psi to dB(L) when the user prefers dBL
// (the default). We mutate `values`, `peak`, and `unit` locally so the
// chart datasets + axis title + tooltip + peak label all stay aligned.
let values = chData.values || [];
let unit = chData.unit || 'unit';
let peak = chData.peak;
const peakT = chData.peak_t_ms;
if (ch === 'MicL' && unit === 'psi' && micUnit === 'dBL') {
// Per-sample chart uses rectified-and-floored conversion so the
// baseline is continuous; the peak label uses the unrectified
// converter to preserve the true measurement.
values = values.map(_psiToDblForChart);
peak = _psiToDbl(peak);
unit = 'dB(L)';
}
const peakLabel = peak != null
? `peak ${_fmtPeak(peak, unit)}`
+ (!isHistogram && peakT != null ? ` @ ${peakT.toFixed(1)} ms` : '')
: '';
// Hide x-axis on every chart except the bottom-most data channel —
// gives the "single shared time axis" feel of the BW printout.
const showXAxis = (ch === lastDataCh);
const wrap = document.createElement('div');
wrap.className = 'chart-wrap';
const lbl = document.createElement('div');
lbl.className = `chart-label ch-${ch.toLowerCase()}`;
lbl.innerHTML = `<span>${ch}</span><span style="color:#8b949e;font-weight:normal">${peakLabel}</span>`;
wrap.appendChild(lbl);
const canvasWrap = document.createElement('div');
canvasWrap.className = 'chart-canvas-wrap';
const canvas = document.createElement('canvas');
canvasWrap.appendChild(canvas);
wrap.appendChild(canvasWrap);
chartsDiv.appendChild(wrap);
// Waveform: per-sample time in ms relative to trigger (negative for pretrig).
// Histogram: when the server has aggregated to BW-reported intervals AND
// provides per-interval timestamps, use those as x-axis labels (HH:MM:SS).
// Falls back to interval index.
let times;
if (isHistogram) {
const intervalTimes = ta.interval_times || [];
times = (intervalTimes.length === values.length)
? intervalTimes
: values.map((_, i) => i + 1);
} else {
times = values.map((_, i) => t0Ms + i * dtMs);
}
// Downsample for rendering
const MAX_POINTS = 4000;
let rT = times, rV = values;
if (values.length > MAX_POINTS) {
const step = Math.ceil(values.length / MAX_POINTS);
rT = times.filter((_, i) => i % step === 0);
rV = values.filter((_, i) => i % step === 0);
}
// Tick formatter — round to 1 decimal so we don't get
// "11.7187040000000002 ms" garbage from floating-point accumulation.
const xAxisUnit = isHistogram ? '' : ' ms';
const fmtTick = i => {
const v = rT[i];
if (typeof v !== 'number') return String(v) + xAxisUnit;
return (Number.isInteger(v) ? String(v) : v.toFixed(1)) + xAxisUnit;
};
// Y-axis bounds. Geophone waveforms render symmetric around zero
// (seismograph convention — zero line in the middle, signal goes
// up AND down). Mic + histograms keep default auto-scale (always
// positive values; zero at the bottom).
let yBounds = {};
const isGeo = ch !== 'MicL';
if (isGeo && !isHistogram) {
// Waveform geo: symmetric around zero for full shape detail.
let absMax = 0;
for (const v of values) {
const a = Math.abs(v);
if (a > absMax) absMax = a;
}
const padded = (absMax || 1) * 1.10;
yBounds = { min: -padded, max: padded };
} else if (isGeo && isHistogram) {
// Histogram geo: enforce minimum chart range so quiet events
// look quiet (matches BW's near-fixed-scale convention).
const HIST_GEO_MIN_INS = 0.05;
let p = 0;
for (const v of values) { const a = Math.abs(v); if (a > p) p = a; }
yBounds = { min: 0, max: Math.max(p * 1.10, HIST_GEO_MIN_INS) };
} else if (ch === 'MicL' && micUnit === 'dBL') {
// Mic dBL: baseline at noise-floor minimum, top at peak + 5 dB.
const peakDbl = (typeof peak === 'number' && isFinite(peak))
? peak + 5
: 100;
yBounds = { min: MIC_DBL_FLOOR, max: Math.max(peakDbl, MIC_DBL_FLOOR + 20) };
} else if (ch === 'MicL' && isHistogram && micUnit === 'psi') {
// Mic histogram in psi: same minimum-range treatment as geo.
const HIST_MIC_MIN_PSI = 0.001;
let p = 0;
for (const v of values) { const a = Math.abs(v); if (a > p) p = a; }
yBounds = { min: 0, max: Math.max(p * 1.10, HIST_MIC_MIN_PSI) };
}
const chart = new Chart(canvas, {
type: isHistogram ? 'bar' : 'line',
data: {
labels: rT.map(t => (typeof t === 'number' ? (Number.isInteger(t) ? String(t) : t.toFixed(2)) : t)),
datasets: isHistogram ? [{
data: rV,
backgroundColor: CHANNEL_COLORS[ch],
borderWidth: 0,
barPercentage: 1.0,
categoryPercentage: 1.0, // bars touch — tight bargraph
}] : [{
data: rV,
borderColor: CHANNEL_COLORS[ch],
borderWidth: 1,
pointRadius: 0,
tension: 0,
}],
},
options: {
animation: false,
responsive: true,
maintainAspectRatio: false,
plugins: {
legend: { display: false },
tooltip: {
mode: 'index',
intersect: false,
callbacks: {
title: items => isHistogram
? `interval ${items[0].label}`
: `t = ${items[0].label} ms`,
label: item => `${ch}: ${_fmtPeak(item.raw, unit)}`,
},
},
},
scales: {
x: {
type: 'category',
display: showXAxis,
ticks: {
color: isPrintMode ? '#666' : '#484f58',
maxTicksLimit: 10,
maxRotation: 0,
callback: (val, i) => fmtTick(i),
},
grid: { color: isPrintMode ? '#e0e0e0' : '#21262d', drawTicks: showXAxis },
},
y: {
...yBounds,
ticks: { color: isPrintMode ? '#666' : '#484f58', maxTicksLimit: 5 },
grid: { color: isPrintMode ? '#e0e0e0' : '#21262d' },
title: { display: true, text: unit,
color: isPrintMode ? '#666' : '#484f58', font: { size: 10 } },
},
},
},
plugins: isHistogram ? [] : [{
// Trigger line @ t=0 + triangle markers above/below + "0.0"
// baseline label on the right edge. Matches the Instantel
// BW Event Report printout style. Skipped for histograms —
// they have no trigger event.
id: 'instantelOverlays',
afterDraw(chart) {
const ctx = chart.ctx;
const xAxis = chart.scales.x;
const yAxis = chart.scales.y;
const fgPrim = isPrintMode ? '#000' : '#c9d1d9';
const fgTrigger = '#f85149';
// Dashed vertical trigger line at t=0
const zeroIdx = rT.findIndex(t => parseFloat(t) >= 0);
if (zeroIdx >= 0) {
const x = xAxis.getPixelForValue(zeroIdx);
ctx.save();
ctx.beginPath();
ctx.moveTo(x, yAxis.top);
ctx.lineTo(x, yAxis.bottom);
ctx.strokeStyle = isPrintMode ? '#cc0000' : 'rgba(248, 81, 73, 0.8)';
ctx.lineWidth = 1.2;
ctx.setLineDash([4, 3]);
ctx.stroke();
ctx.restore();
// Triangles above and below the chart at the trigger column
ctx.save();
ctx.fillStyle = fgTrigger;
ctx.beginPath(); // top triangle pointing down
ctx.moveTo(x - 5, yAxis.top - 8);
ctx.lineTo(x + 5, yAxis.top - 8);
ctx.lineTo(x, yAxis.top - 1);
ctx.closePath();
ctx.fill();
ctx.beginPath(); // bottom triangle pointing up
ctx.moveTo(x - 5, yAxis.bottom + 8);
ctx.lineTo(x + 5, yAxis.bottom + 8);
ctx.lineTo(x, yAxis.bottom + 1);
ctx.closePath();
ctx.fill();
ctx.restore();
}
// "0.0" baseline label on the right edge — printout convention.
// Position vertically at the zero-amplitude level.
const zeroY = yAxis.getPixelForValue(0);
if (zeroY >= yAxis.top && zeroY <= yAxis.bottom) {
ctx.save();
ctx.strokeStyle = isPrintMode ? '#aaa' : '#30363d';
ctx.lineWidth = 0.8;
ctx.setLineDash([2, 2]);
ctx.beginPath();
ctx.moveTo(xAxis.left, zeroY);
ctx.lineTo(xAxis.right, zeroY);
ctx.stroke();
ctx.restore();
ctx.save();
ctx.fillStyle = fgPrim;
ctx.font = '11px monospace';
ctx.textAlign = 'left';
ctx.textBaseline = 'middle';
ctx.fillText('0.0', xAxis.right + 6, zeroY);
ctx.restore();
}
},
}],
});
charts[ch] = chart;
}
}
// Wire up handlers
document.getElementById('serial-select').addEventListener('change', e => {
loadEventsForSerial(e.target.value);
});
document.getElementById('event-filter').addEventListener('input', applyFilter);
// Reflect any persisted mic-unit preference in the header pill on load
_refreshMicUnitToggle();
// Initial load
loadSerials();
</script>
</body>
</html>
@@ -0,0 +1,939 @@
"""
sfm/report_pdf.py: generate Instantel-style Event Report PDFs.
Stub layout for v0.20.0; the exact visual is iterated against actual
Blastware reference PDFs (uploaded to docs/reference/instantel/).
Current output captures all the data fields a real BW Event Report
contains, but the visual hierarchy / spacing is still approximate.
Architecture
1. ``gather_report_data(db, store, event_id)`` assembles a ``ReportData``
from three sources: the SeismoDb events row, the .sfm.json sidecar
(bw_report block), and the .h5 waveform samples. Returns ``None`` when the
event doesn't exist or has no waveform data on disk.
2. ``render_event_report_pdf(rd)`` takes that ``ReportData`` and produces a
single-page letter-sized PDF as bytes, using matplotlib's PDF
backend (vector output, no rasterization, prints cleanly).
3. The HTTP endpoint at ``/db/events/{id}/report.pdf`` wires them
together: fetch event -> gather -> render -> stream bytes back with
``Content-Type: application/pdf``.
What's in the report (every field BW's printout includes):
Header (left): Date/Time, Trigger Source, Range, Sample Rate, Notes,
Project, Client, User Name, Seis. Loc
Header (right): Serial + firmware, Battery, Calibration, File Name,
Post Event Notes
Mic block: PSPL (dBL + psi), ZC Freq, Channel Test result
Stats table: per-channel PPV / ZC Freq / Time of Peak /
Peak Acceleration / Peak Displacement / Sensor Check
Peak Vector Sum
Waveform plot: 4 channels stacked (MicL/Long/Vert/Tran), shared
time axis, trigger marker, peak markers
USBM RI8507/OSMRE compliance chart: STUBBED (separate work item)
Histogram events: the layout differs (Number of Intervals header
field, no trigger marker, per-interval bar chart instead of waveform).
Handled via a record_type branch in ``render_event_report_pdf``.
"""
from __future__ import annotations
import io
import json
import logging
import math
from dataclasses import dataclass, field
from pathlib import Path
from typing import Optional
import matplotlib
matplotlib.use("Agg") # headless — no display required
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.backends.backend_pdf import PdfPages
log = logging.getLogger(__name__)
# Reference pressure for dB(L) conversion: 20 µPa expressed in psi.
DBL_REF_PSI = 2.9e-9
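For reference, the constant above anchors the usual dB(L) relation, dB(L) = 20 * log10(p / p_ref). The block below is a minimal sketch of the conversion in both directions, assuming the same 2.9e-9 psi reference. `dbl_to_psi` and `psi_to_dbl` are hypothetical helper names used only for illustration; they are not functions in this module.

```python
import math

DBL_REF_PSI = 2.9e-9  # 20 µPa expressed in psi (same value as the module constant)


def dbl_to_psi(dbl: float) -> float:
    """Invert dB(L) = 20 * log10(p / p_ref) to get pressure in psi."""
    return DBL_REF_PSI * (10 ** (dbl / 20))


def psi_to_dbl(psi: float) -> float:
    """Forward conversion from psi to dB(L); psi must be positive."""
    return 20 * math.log10(psi / DBL_REF_PSI)


if __name__ == "__main__":
    # 120 dB(L) is a factor of 10**6 above the reference pressure.
    print(dbl_to_psi(120.0))
    print(psi_to_dbl(dbl_to_psi(107.7)))
```

Under this reference, 120 dB(L) works out to roughly 0.0029 psi, which is a quick sanity check when a mic peak looks implausibly large.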
# ── Data assembly ────────────────────────────────────────────────────────────
@dataclass
class ReportData:
"""All fields needed to render an Instantel-style Event Report.
Most fields are Optional: BW's printout shows '' or just omits
sections when source data is missing. The renderer mirrors that.
"""
# Header — left column
event_datetime_str: Optional[str] = None
trigger_source: Optional[str] = None
geo_range_str: Optional[str] = None
sample_rate_str: Optional[str] = None
notes: Optional[str] = None
project: Optional[str] = None
client: Optional[str] = None
operator: Optional[str] = None
sensor_location: Optional[str] = None
# Header — right column
serial: Optional[str] = None
firmware: Optional[str] = None
battery_volts: Optional[float] = None
calibration_date: Optional[str] = None
calibration_by: Optional[str] = None
file_name: Optional[str] = None
post_event_notes: Optional[str] = None
# Microphone block
mic_pspl_dbl: Optional[float] = None
mic_pspl_psi: Optional[float] = None
mic_pspl_time_s: Optional[float] = None
mic_pspl_when_str: Optional[str] = None # histogram absolute date+time, BW-formatted
mic_zc_freq_hz: Optional[float] = None
mic_zc_freq_above_range: bool = False
mic_channel_test_result: Optional[str] = None
mic_channel_test_freq_hz: Optional[float] = None
mic_channel_test_amp_mv: Optional[float] = None
# Per-channel stats — list of dicts (one per channel)
# Keys: name, ppv_ips, zc_freq_hz, time_of_peak_s,
# peak_accel_g, peak_disp_in, sensor_check
channel_stats: list[dict] = field(default_factory=list)
# Peak Vector Sum
peak_vector_sum_ips: Optional[float] = None
peak_vector_sum_time_s: Optional[float] = None
# Waveform samples — channels[ch] = list of floats in physical units
# Time axis derived from sample_rate + pretrig_samples
channels: dict = field(default_factory=dict)
sample_rate_sps: Optional[int] = None
pretrig_samples: Optional[int] = None
t0_ms: Optional[float] = None
dt_ms: Optional[float] = None
# Record-type discriminator
record_type: Optional[str] = None
is_histogram: bool = False
# Histogram-only fields: populated only when record_type starts with 'Hist'
histogram_start_str: Optional[str] = None # "22:30:38 May 16, 2026"
histogram_stop_str: Optional[str] = None
histogram_n_intervals: Optional[float] = None # 4.00
histogram_interval_size: Optional[str] = None # "1 minute"
histogram_interval_size_s: Optional[float] = None # 60.0 — numeric seconds, used to derive interval_times
histogram_interval_times: list[str] = field(default_factory=list) # per-interval timestamps for x-axis
# Peak Vector Sum metadata (histograms show absolute date+time)
peak_vector_sum_when_str: Optional[str] = None
# Bookkeeping
event_id: Optional[str] = None
server_received_at: Optional[str] = None
bw_pc_sw_version: Optional[str] = None
def gather_report_data(
db,
store,
event_id: str,
) -> Optional[ReportData]:
"""Collect every field needed to render an event report.
Returns ``None`` if the event is unknown or has no waveform data
on disk (no .h5, no .a5.pkl; the same condition the waveform.json
endpoint 404s on).
"""
row = db.get_event(event_id)
if row is None:
return None
serial = row.get("serial")
filename = row.get("blastware_filename")
if not serial or not filename:
return None
rd = ReportData(
event_id=event_id,
serial=serial,
file_name=filename,
record_type=row.get("record_type"),
is_histogram=str(row.get("record_type", "")).lower().startswith("hist"),
event_datetime_str=row.get("timestamp"),
sample_rate_sps=row.get("sample_rate"),
project=row.get("project"),
client=row.get("client"),
operator=row.get("operator"),
sensor_location=row.get("sensor_location"),
server_received_at=row.get("created_at"),
)
# ── Sidecar bw_report — the rich BW-derived fields ──
sidecar_path = store.sidecar_path_for(serial, filename)
if sidecar_path.exists():
try:
sc = json.loads(sidecar_path.read_text())
except Exception as exc:
log.warning("gather_report_data: sidecar read failed: %s", exc)
sc = {}
bw = sc.get("bw_report") or {}
# Trigger / range / sample-rate display
trig = bw.get("trigger") or {}
rd.trigger_source = (
f"{trig.get('channel','')}: {trig.get('geo_level_ips')} in/s"
if trig.get("channel") or trig.get("geo_level_ips") is not None
else None
)
rec = bw.get("recording") or {}
rd.geo_range_str = (
f"Geo: {rec.get('geo_range_ips')} in/s"
if rec.get("geo_range_ips") is not None else None
)
rt = rec.get("record_time_s")
if rt is not None and rd.sample_rate_sps:
rd.sample_rate_str = f"{rt:.1f} sec At {rd.sample_rate_sps} Sps"
# Device block
dev = bw.get("device") or {}
rd.battery_volts = dev.get("battery_volts")
rd.calibration_date = dev.get("calibration_date")
rd.calibration_by = dev.get("calibration_by")
rd.firmware = bw.get("version")
rd.bw_pc_sw_version = bw.get("pc_sw_version")
# Microphone block
mic = bw.get("mic") or {}
rd.mic_pspl_dbl = mic.get("pspl_dbl")
if rd.mic_pspl_dbl is not None and rd.mic_pspl_dbl > 0:
# Inverse of the dBL formula → psi. Mirrors waveform_codec convention.
rd.mic_pspl_psi = DBL_REF_PSI * (10 ** (rd.mic_pspl_dbl / 20))
rd.mic_pspl_time_s = mic.get("time_of_peak_s")
rd.mic_zc_freq_hz = mic.get("zc_freq_hz")
rd.mic_zc_freq_above_range = bool(mic.get("zc_freq_above_range"))
sc_mic = (bw.get("sensor_check") or {}).get("mic") or {}
rd.mic_channel_test_result = sc_mic.get("result")
rd.mic_channel_test_freq_hz = sc_mic.get("freq_hz")
rd.mic_channel_test_amp_mv = sc_mic.get("amplitude_mv")
# Per-channel stats (Tran / Vert / Long). Per-channel peak
# date+time for histograms comes from bw_report.histogram.channel_peak_when
# (populated when the parser captured it; see the bw_ascii_report
# parser's histogram-fields handler).
peaks = bw.get("peaks") or {}
sc_block = bw.get("sensor_check") or {}
hist_block = bw.get("histogram") or {}
peak_when = hist_block.get("channel_peak_when") or {}
for ch_lc, ch_label in (("tran", "Tran"), ("vert", "Vert"), ("long", "Long")):
ch = peaks.get(ch_lc) or {}
sc_ch = sc_block.get(ch_lc) or {}
ch_when_iso = peak_when.get(ch_label)
peak_date, peak_time = _split_iso_to_date_time(ch_when_iso)
rd.channel_stats.append({
"name": ch_label,
"ppv_ips": ch.get("ppv_ips"),
"zc_freq_hz": ch.get("zc_freq_hz"),
"zc_freq_above_range": bool(ch.get("zc_freq_above_range")),
"time_of_peak_s": ch.get("time_of_peak_s"),
"peak_accel_g": ch.get("peak_accel_g"),
"peak_disp_in": ch.get("peak_disp_in"),
"sensor_check": sc_ch.get("result"),
"peak_date": peak_date,
"peak_time": peak_time,
})
# MicL peak time (used in the mic block — "PSPL ... on DATE at TIME")
mic_when_iso = peak_when.get("MicL")
rd.mic_pspl_when_str = _fmt_iso_to_bw(mic_when_iso) if mic_when_iso else None
# Peak Vector Sum
vs = peaks.get("vector_sum") or {}
rd.peak_vector_sum_ips = vs.get("ips")
rd.peak_vector_sum_time_s = vs.get("time_s")
# PVS absolute date+time (histograms). Same formatting as Mic.
pvs_when_iso = vs.get("when")
rd.peak_vector_sum_when_str = _fmt_iso_to_bw(pvs_when_iso) if pvs_when_iso else None
# Histogram-specific header fields — keys match the projection in
# _bw_report_to_dict ("start" / "stop", not "_str" suffixed).
if rd.is_histogram:
rd.histogram_start_str = hist_block.get("start") or rd.event_datetime_str
rd.histogram_stop_str = hist_block.get("stop")
rd.histogram_n_intervals = hist_block.get("n_intervals")
rd.histogram_interval_size = hist_block.get("interval_size")
rd.histogram_interval_size_s = hist_block.get("interval_size_s")
rd.histogram_interval_times = hist_block.get("interval_times") or []
# ── Waveform samples — from the .h5 via the existing helper ──
from sfm import event_hdf5
h5_path = store.hdf5_path_for(serial, filename)
if h5_path.exists():
try:
wf = event_hdf5.plot_json_from_hdf5(h5_path, event_id=event_id)
rd.channels = {
ch: (chd.get("values") or [])
for ch, chd in (wf.get("channels") or {}).items()
}
ta = wf.get("time_axis") or {}
rd.sample_rate_sps = rd.sample_rate_sps or ta.get("sample_rate")
rd.pretrig_samples = ta.get("pretrig_samples")
rd.t0_ms = ta.get("t0_ms")
rd.dt_ms = ta.get("dt_ms")
except Exception as exc:
log.warning("gather_report_data: hdf5 read failed: %s", exc)
# ── Histogram aggregation ──
# Codec emits ~N per-block samples (typically 1/sec); BW reports
# one bar per configured interval (1 min / 5 min / etc.). When
# bw_report.histogram.n_intervals is populated (events ingested
# with the parser extension), group max-per-group to match. Also
# derives per-interval timestamps for the x-axis. No-op for
# waveform events or when n_intervals is missing.
if rd.is_histogram and rd.histogram_n_intervals and rd.histogram_n_intervals >= 1:
n = int(rd.histogram_n_intervals)
for ch, vals in list(rd.channels.items()):
if not vals:
continue
per_group = len(vals) // n
remainder = len(vals) % n
agg: list = []
offset = 0
for i in range(n):
grp_size = per_group + (1 if i < remainder else 0)
if grp_size > 0:
grp = vals[offset:offset + grp_size]
agg.append(max((abs(v) for v in grp if v is not None), default=0))
offset += grp_size
else:
agg.append(0)
rd.channels[ch] = agg
# Derive per-interval HH:MM:SS labels if we have the start time + size
if rd.histogram_start_str and rd.histogram_interval_size_s and not rd.histogram_interval_times:
try:
import datetime as _dt
start = _dt.datetime.fromisoformat(rd.histogram_start_str)
rd.histogram_interval_times = [
(start + _dt.timedelta(seconds=(i + 1) * rd.histogram_interval_size_s)).strftime("%H:%M:%S")
for i in range(n)
]
except Exception:
pass
return rd
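The histogram aggregation step in `gather_report_data` can be read in isolation as a "max per contiguous group" reduction. The sketch below restates it as a standalone function so the group-size arithmetic is easier to check; `aggregate_peaks` is a hypothetical name and the sample values are made up for illustration.

```python
from typing import List, Optional, Sequence


def aggregate_peaks(vals: Sequence[Optional[float]], n: int) -> List[float]:
    """Split vals into n contiguous groups and keep max |v| per group.

    The first len(vals) % n groups take one extra sample, so group sizes
    differ by at most one. Groups that end up empty (n > len(vals)) and
    groups containing only None both yield 0.
    """
    per_group, remainder = divmod(len(vals), n)
    out: List[float] = []
    offset = 0
    for i in range(n):
        size = per_group + (1 if i < remainder else 0)
        grp = vals[offset:offset + size]
        out.append(max((abs(v) for v in grp if v is not None), default=0))
        offset += size
    return out


if __name__ == "__main__":
    # 7 samples into 3 intervals -> group sizes 3, 2, 2
    print(aggregate_peaks([1, -3, 2, None, 0.5, -0.2, 4], 3))
```

Distributing the remainder over the leading groups keeps every sample in exactly one interval, at the cost of slightly uneven group sizes when the sample count is not a multiple of the interval count.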
# ── PDF rendering ────────────────────────────────────────────────────────────
def render_event_report_pdf(rd: ReportData) -> bytes:
"""Render a ``ReportData`` to a single-page letter PDF.
Branches on ``rd.is_histogram``: waveform and histogram layouts
differ in their header fields, stats-table rows, and bottom plot.
Layout modeled on Blastware's Event Report PDFs (samples in
docs/reference/instantel/).
"""
# Letter portrait — 8.5"×11"
fig = plt.figure(figsize=(8.5, 11), dpi=100)
fig.patch.set_facecolor("white")
if rd.is_histogram:
_render_histogram_layout(fig, rd)
else:
_render_waveform_layout(fig, rd)
# Page footer (common to both layouts) — Created date + event id.
# Pushed to the very page bottom so it doesn't collide with the
# waveform footer scale / trigger legend lines just above.
# Convert UTC server_received_at to local for display.
created_local = _fmt_iso_to_bw(rd.server_received_at) if rd.server_received_at else ""
fig.text(
0.07, 0.005,
f"Created: {created_local} • seismo-relay",
fontsize=6, color="#888", ha="left",
)
fig.text(
0.93, 0.005,
f"Event {rd.event_id[:8] if rd.event_id else ''}",
fontsize=6, color="#888", ha="right",
)
buf = io.BytesIO()
fig.savefig(buf, format="pdf")
plt.close(fig)
return buf.getvalue()
def _render_waveform_layout(fig, rd: ReportData) -> None:
"""Waveform layout: header / mic+USBM / per-channel stats / waveform plot.
Stats table includes Time (Rel. to Trig), Peak Accel, Peak Disp.
Left margin sized to fit the channel labels (MicL/Long/Vert/Tran).
Extra bottom margin reserves space for x-axis tick labels +
"Amplitude Geo: X in/s/div Mic: Y psi(L)/div" footer + trigger
legend without overlap.
"""
gs = fig.add_gridspec(
nrows=4, ncols=1,
left=0.11, right=0.94, top=0.97, bottom=0.12,
height_ratios=[1.7, 2.0, 1.8, 5.5],
hspace=0.35,
)
ax_header = fig.add_subplot(gs[0]); ax_header.axis("off")
_draw_header_waveform(ax_header, rd)
ax_mid = fig.add_subplot(gs[1]); ax_mid.axis("off")
_draw_mic_and_usbm(ax_mid, rd)
ax_stats = fig.add_subplot(gs[2]); ax_stats.axis("off")
_draw_channel_stats_waveform(ax_stats, rd)
_draw_waveform_subplot(fig, gs[3], rd)
def _render_histogram_layout(fig, rd: ReportData) -> None:
"""Histogram layout: header / mic-only / per-channel stats / bar plot.
No USBM compliance chart (it's a waveform-only concept). Stats table
uses Date + Time-of-peak instead of relative-time + accel + disp.
Left margin sized to fit the channel labels. Extra bottom margin
leaves room for the x-axis time labels + footer scale legend
without overlap.
"""
gs = fig.add_gridspec(
nrows=4, ncols=1,
left=0.11, right=0.94, top=0.97, bottom=0.12,
height_ratios=[1.8, 0.9, 1.7, 5.6],
hspace=0.35,
)
ax_header = fig.add_subplot(gs[0]); ax_header.axis("off")
_draw_header_histogram(ax_header, rd)
ax_mic = fig.add_subplot(gs[1]); ax_mic.axis("off")
_draw_mic_only(ax_mic, rd)
ax_stats = fig.add_subplot(gs[2]); ax_stats.axis("off")
_draw_channel_stats_histogram(ax_stats, rd)
_draw_histogram_subplot(fig, gs[3], rd)
def _to_display_local(iso: str):
"""Parse an ISO timestamp and return a datetime in the system's local
timezone (set by the TZ env var, default America/New_York via the
Dockerfile).
Behaviour:
- "...Z" or "...+HH:MM" suffix -> tz-aware UTC -> converted to local
- Naïve "YYYY-MM-DDTHH:MM:SS" (no tz) -> returned as-is. This
matches the convention used elsewhere in seismo-relay: BW's
recorded-at timestamps are naïve and ALREADY in the unit's
local clock; we don't second-guess them.
"""
import datetime as _dt
dt = _dt.datetime.fromisoformat(iso.replace("Z", "+00:00"))
if dt.tzinfo is not None:
# Convert from UTC (or other tz) → local per the TZ env var.
# astimezone() without arg uses the system timezone.
dt = dt.astimezone()
return dt
def _fmt_iso_to_bw(iso: Optional[str]) -> Optional[str]:
"""Convert an ISO-8601 timestamp to BW's display format
'22:30:37 May 16, 2026'. UTC inputs (with Z suffix) are
converted to the system's local timezone first; naïve inputs
are formatted as-is. Returns input unchanged on parse failure."""
if not iso or "T" not in iso:
return iso
try:
return _to_display_local(iso).strftime("%H:%M:%S %B %d, %Y").replace(" 0", " ")
except Exception:
return iso
def _split_iso_to_date_time(iso: Optional[str]) -> tuple[Optional[str], Optional[str]]:
"""Split an ISO timestamp into BW-formatted ('May 27 /26', '06:06:14')
date+time strings. Used for the histogram stats table where the
Date and Time rows are presented separately. UTC inputs are
converted to local time first. Returns (None, None) on parse failure."""
if not iso:
return (None, None)
try:
dt = _to_display_local(iso)
# BW format: 'May 27 /26' (3-letter month + 2-digit year)
date_str = dt.strftime("%b %d /%y").replace(" 0", " ")
time_str = dt.strftime("%H:%M:%S")
return (date_str, time_str)
except Exception:
return (None, None)
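The three timestamp helpers above share one rule: tz-aware input is converted before formatting, naive input is formatted untouched. The sketch below shows that rule with an explicit target zone so the result does not depend on the machine's TZ setting. `to_display` and `fmt_bw` are hypothetical stand-ins for `_to_display_local` and `_fmt_iso_to_bw`, and the fixed UTC-4 offset is an assumption chosen only to make the example deterministic.

```python
import datetime as dt
from typing import Optional


def to_display(iso: str, tz: Optional[dt.tzinfo] = None) -> dt.datetime:
    """Parse ISO-8601; convert tz-aware values to tz (None = system local)."""
    parsed = dt.datetime.fromisoformat(iso.replace("Z", "+00:00"))
    if parsed.tzinfo is not None:
        parsed = parsed.astimezone(tz)
    return parsed  # naive input is returned unchanged


def fmt_bw(iso: str, tz: Optional[dt.tzinfo] = None) -> str:
    """Format as 'HH:MM:SS Month D, YYYY', dropping the day's leading zero."""
    return to_display(iso, tz).strftime("%H:%M:%S %B %d, %Y").replace(" 0", " ")


if __name__ == "__main__":
    utc_minus_4 = dt.timezone(dt.timedelta(hours=-4))
    print(fmt_bw("2026-05-06T22:30:37"))                # naive: formatted as-is
    print(fmt_bw("2026-05-17T02:30:37Z", utc_minus_4))  # UTC: shifted, then formatted
```

The `.replace(" 0", " ")` trick only touches the day field here because the time comes first in the string (no leading space) and the year never starts with a zero.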
def _kv(ax, x, y, label, value, *, label_w=0.18):
"""Render a 'Label Value' row at axes-coordinates (x, y)."""
ax.text(x, y, label, fontsize=8, color="#555", ha="left", va="top",
transform=ax.transAxes)
ax.text(x + label_w, y, _fmt(value), fontsize=8, ha="left", va="top",
transform=ax.transAxes, family="monospace")
def _fmt(v):
"""Format any field for display — '' for None, str otherwise."""
if v is None:
return ""
if isinstance(v, float):
return f"{v:.4f}".rstrip("0").rstrip(".")
return str(v)
def _draw_header_waveform(ax, rd: ReportData) -> None:
"""Two-column metadata header — waveform variant."""
rows_left = [
("Date/Time", _fmt_iso_to_bw(rd.event_datetime_str)),
("Trigger Source", rd.trigger_source),
("Range", rd.geo_range_str),
("Sample Rate", rd.sample_rate_str),
("Notes", rd.notes),
("Project:", rd.project),
("Client:", rd.client),
("User Name:", rd.operator),
("Seis. Loc:", rd.sensor_location),
]
_draw_header_columns(ax, rows_left, rd)
def _draw_header_histogram(ax, rd: ReportData) -> None:
"""Two-column metadata header — histogram variant.
Histograms have Start / Finish / Intervals fields instead of
Trigger Source (there's no trigger event for a histogram capture).
"""
intervals_str = None
if rd.histogram_n_intervals is not None and rd.histogram_interval_size:
intervals_str = f"{rd.histogram_n_intervals} At {rd.histogram_interval_size}"
rows_left = [
("Start", _fmt_iso_to_bw(rd.histogram_start_str or rd.event_datetime_str)),
("Finish", _fmt_iso_to_bw(rd.histogram_stop_str)),
("Intervals", intervals_str),
("Range", rd.geo_range_str),
("Sample Rate", (f"{rd.sample_rate_sps} Sps" if rd.sample_rate_sps else None)),
("Notes", rd.notes),
("Project:", rd.project),
("Client:", rd.client),
("User Name:", rd.operator),
("Seis. Loc:", rd.sensor_location),
]
_draw_header_columns(ax, rows_left, rd)
def _draw_header_columns(ax, rows_left, rd: ReportData) -> None:
"""Shared 2-column header rendering used by both layouts."""
rows_right = [
("Serial Number", f"{rd.serial or ''}" + (f" {rd.firmware}" if rd.firmware else "")),
("Battery Level", f"{rd.battery_volts:.1f} Volts" if rd.battery_volts is not None else None),
("Unit Calibration", (f"{rd.calibration_date}" + (f" by {rd.calibration_by}" if rd.calibration_by else ""))
if rd.calibration_date else None),
("File Name", rd.file_name),
("Post Event Notes", rd.post_event_notes),
]
y = 0.95
dy = 0.095
for label, value in rows_left:
_kv(ax, 0.0, y, label, value, label_w=0.18)
y -= dy
y = 0.95
for label, value in rows_right:
_kv(ax, 0.55, y, label, value, label_w=0.20)
y -= dy
def _draw_mic_only(ax, rd: ReportData) -> None:
"""Mic block (histogram variant — no USBM chart)."""
ax.text(0.0, 0.95, "Microphone Linear Weighting", fontsize=8, color="#555",
transform=ax.transAxes, va="top")
rows = _mic_rows(rd)
y = 0.70
for label, value in rows:
_kv(ax, 0.0, y, label, value, label_w=0.18)
y -= 0.22
def _draw_mic_and_usbm(ax, rd: ReportData) -> None:
"""Mic block on the left + USBM compliance chart placeholder on right.
(Waveform variant only: USBM is a velocity-vs-frequency compliance plot
that doesn't apply to histograms.)"""
ax.text(0.0, 0.95, "Microphone Linear Weighting", fontsize=8, color="#555",
transform=ax.transAxes, va="top")
rows = _mic_rows(rd)
y = 0.80
for label, value in rows:
_kv(ax, 0.0, y, label, value, label_w=0.18)
y -= 0.15
# USBM chart placeholder (upper-right). Real piecewise compliance
# curves are a separate work item; for now this just shows the title
# + a "coming soon" message so the layout is correct.
ax.text(0.72, 0.97, "USBM RI8507 And OSMRE",
fontsize=9, weight="bold", color="#333", ha="center", va="top",
transform=ax.transAxes)
ax.text(0.72, 0.50, "[compliance chart\ncoming soon]",
fontsize=8, color="#bbb", ha="center", va="center",
transform=ax.transAxes, style="italic")
def _mic_rows(rd: ReportData) -> list[tuple[str, Optional[str]]]:
"""Build the mic-section value rows (shared by both layouts).
For histograms, BW formats the PSPL line as
"125.7 dB(L) on May 27, 2026 at 06:19:14"
(absolute date+time of peak). Waveform events show the relative
"at 0.012 sec." instead. Both formats covered here based on which
field is populated.
"""
rows: list[tuple[str, Optional[str]]] = []
if rd.mic_pspl_dbl is not None:
line = f"{rd.mic_pspl_dbl:.1f} dB(L)"
if rd.mic_pspl_when_str:
# Histogram-style: "PSPL 125.7 dB(L) on May 27, 2026 at 06:19:14"
# mic_pspl_when_str is already "HH:MM:SS Month DD, YYYY";
# reformat to "on Month DD, YYYY at HH:MM:SS" for BW match.
parts = rd.mic_pspl_when_str.split(" ", 1)
if len(parts) == 2:
line += f" on {parts[1]} at {parts[0]}"
else:
line += f" on {rd.mic_pspl_when_str}"
elif rd.mic_pspl_time_s is not None:
# Waveform-style: relative-to-trigger seconds.
line += f" at {rd.mic_pspl_time_s:.3f} sec."
rows.append(("PSPL", line))
if rd.mic_zc_freq_hz is not None:
prefix = ">" if rd.mic_zc_freq_above_range else ""
rows.append(("ZC Freq", f"{prefix}{rd.mic_zc_freq_hz:.0f} Hz"))
if rd.mic_channel_test_result:
line = rd.mic_channel_test_result
if rd.mic_channel_test_freq_hz is not None and rd.mic_channel_test_amp_mv is not None:
line += (f" (Freq = {rd.mic_channel_test_freq_hz:.1f} Hz, "
f"Amp = {rd.mic_channel_test_amp_mv:.0f} mv)")
rows.append(("Channel Test", line))
return rows
def _draw_channel_stats_waveform(ax, rd: ReportData) -> None:
"""Waveform stats table — has Time (Rel. to Trig), Peak Accel, Peak Disp.
Followed by Peak Vector Sum line."""
rows_spec = [
("PPV", "ppv_ips", "in/s"),
("ZC Freq", "zc_freq_hz", "Hz"),
("Time (Rel. to Trig)", "time_of_peak_s", "sec"),
("Peak Acceleration", "peak_accel_g", "g"),
("Peak Displacement", "peak_disp_in", "in"),
("Sensor Check", "sensor_check", ""),
]
_draw_stats_table(ax, rd, rows_spec)
_draw_pvs_summary(ax, rd, n_data_rows=len(rows_spec))
def _draw_channel_stats_histogram(ax, rd: ReportData) -> None:
"""Histogram stats table — PPV, ZC Freq, Date, Time of peak, Sensor Check.
Followed by Peak Vector Sum line."""
# Date / Time of peak are per-channel timestamps for the interval at peak.
# bw_report stores time_of_peak_s as relative seconds, but for histograms
# BW shows them as absolute date+time. We populate from rd.channel_stats
# if those absolute fields are present; otherwise fall back to relative.
rows_spec = [
("PPV", "ppv_ips", "in/s"),
("ZC Freq", "zc_freq_hz", "Hz"),
("Date", "peak_date", ""),
("Time", "peak_time", ""),
("Sensor Check", "sensor_check", ""),
]
_draw_stats_table(ax, rd, rows_spec)
_draw_pvs_summary(ax, rd, n_data_rows=len(rows_spec), histogram_when=True)
def _draw_pvs_summary(
ax,
rd: ReportData,
*,
n_data_rows: int,
histogram_when: bool = False,
) -> None:
"""Render the Peak Vector Sum line below the
stats table.
Reads ``ax._stats_table_bottom`` (set by ``_draw_stats_table`` when
it pins the table via an explicit ``bbox``) so the PVS line lands
just below the table's known bottom edge instead of guessing at the
geometry.
Centered horizontally for visual balance (the previous left-aligned
x=0 landed under the label column, not the data, which looked off).
"""
if rd.peak_vector_sum_ips is None:
return
line = f"Peak Vector Sum {rd.peak_vector_sum_ips:.3f} in/s"
if histogram_when and rd.peak_vector_sum_when_str:
# Histogram absolute date+time. when_str is "HH:MM:SS Month DD, YYYY";
# reformat to "<value> on <date> At <time>" to match BW.
parts = rd.peak_vector_sum_when_str.split(" ", 1)
if len(parts) == 2:
line += f" on {parts[1]} At {parts[0]}"
else:
line += f" on {rd.peak_vector_sum_when_str}"
elif not histogram_when and rd.peak_vector_sum_time_s is not None:
line += f" At {rd.peak_vector_sum_time_s:.3f} sec."
# _draw_stats_table stashes the bbox bottom on the axes so we don't
# have to guess geometry. Falls back to a conservative default if
# the bbox approach hasn't run.
table_bottom_y = getattr(ax, "_stats_table_bottom", -0.10)
pvs_y = table_bottom_y - 0.04 # small gap below the table border
# Centered for visual balance — looks intentional rather than offset.
# The original BW-replica had a "NA: Not Applicable" caption below
# this line; dropped because we use "—" for missing values and the
# legend was always squished against the PVS line.
ax.text(0.5, pvs_y, line, fontsize=9, weight="bold",
ha="center", va="top", transform=ax.transAxes)
def _draw_stats_table(ax, rd: ReportData, rows_spec: list[tuple[str, str, str]]) -> None:
"""Render a per-channel stats table (Tran/Vert/Long).
rows_spec: list of (label, field_name_in_channel_stats, unit_string)
"""
headers = ["", "Tran", "Vert", "Long", ""]
ch_lookup = {c["name"]: c for c in rd.channel_stats}
def _cell(field, ch_name):
ch_rec = ch_lookup.get(ch_name, {})
val = ch_rec.get(field)
if val is None:
return ""
if isinstance(val, float):
# ZC Freq is integer-formatted in BW; ">100 Hz" sentinel
# rendered as ">N" (val carries the threshold). Everything
# else gets 3 decimals.
if field == "zc_freq_hz":
prefix = ">" if ch_rec.get("zc_freq_above_range") else ""
return f"{prefix}{val:.0f}"
return f"{val:.3f}"
return str(val)
table_data = [headers]
for label, field_name, unit in rows_spec:
table_data.append([
label,
_cell(field_name, "Tran"),
_cell(field_name, "Vert"),
_cell(field_name, "Long"),
unit,
])
# Pin the table's position+size via bbox so we know exactly where
# the bottom edge lands. Lets _draw_pvs_summary place the PVS line
# just below the table without guessing at row heights.
#
# bbox = [x, y, width, height] in axes coords. Header + data rows
# at row_h each; horizontal extent matches sum(colWidths).
n_rows = len(table_data) # header + data rows
row_h = 0.12 # axes-fraction per row (fits fontsize=8)
table_height = n_rows * row_h
table_bottom = 1.0 - table_height
tbl = ax.table(
cellText=table_data,
colWidths=[0.28, 0.14, 0.14, 0.14, 0.10],
cellLoc="left", edges="open",
bbox=[0.0, table_bottom, 0.80, table_height],
)
tbl.auto_set_font_size(False)
tbl.set_fontsize(8)
for j in range(5):
tbl[(0, j)].set_text_props(weight="bold", color="#555")
# Stash the bottom Y so _draw_pvs_summary can position itself below.
ax._stats_table_bottom = table_bottom
def _channel_axis_color(ch: str) -> str:
return {"MicL": "#cc00cc", "Long": "#0066ff", "Vert": "#009933", "Tran": "#cc0000"}.get(ch, "#444")
def _draw_waveform_subplot(fig, gridspec_cell, rd: ReportData) -> None:
"""4-channel stacked waveform plot — Instantel printout order
(MicL on top, Tran on bottom), shared x-axis in SECONDS, trigger
triangle markers at t=0, '0.0' baseline label on right of each."""
inner = gridspec_cell.subgridspec(4, 1, hspace=0.0)
order = ["MicL", "Long", "Vert", "Tran"]
sr = rd.sample_rate_sps or 1024
# Convert ms-based time axis to seconds for the x-axis
dt_s = (rd.dt_ms or (1000.0 / sr)) / 1000.0
t0_s = (rd.t0_ms if rd.t0_ms is not None else 0.0) / 1000.0
last_idx = len(order) - 1
for i, ch in enumerate(order):
ax = fig.add_subplot(inner[i])
values = rd.channels.get(ch) or []
times = [t0_s + j * dt_s for j in range(len(values))]
if values:
color = _channel_axis_color(ch)
ax.plot(times, values, color=color, linewidth=0.5)
# Symmetric y-axis around zero for every channel (geo and mic alike).
amax = max((abs(v) for v in values), default=0.001)
ax.set_ylim(-amax * 1.10, amax * 1.10)
# Channel label on the LEFT (matches BW)
ax.set_ylabel(ch, fontsize=8, rotation=0, ha="right", va="center",
color=_channel_axis_color(ch), weight="bold", labelpad=14)
# "0.0" on the RIGHT (BW convention)
ax.text(1.005, 0.5, "0.0", transform=ax.transAxes,
fontsize=7, color="#555", va="center", ha="left")
ax.grid(True, linestyle="--", linewidth=0.3, color="#bbb", alpha=0.6)
# Vertical dashed trigger line at t=0
ax.axvline(0.0, color="#cc0000", linestyle="--", linewidth=0.6, alpha=0.7)
# Zero baseline horizontal
ax.axhline(0.0, color=_channel_axis_color(ch), linestyle="-",
linewidth=0.4, alpha=0.5)
if i != last_idx:
ax.set_xticklabels([])
ax.tick_params(axis="x", length=0)
else:
ax.tick_params(axis="x", labelsize=7)
ax.tick_params(axis="y", labelsize=6)
# Trigger triangle marker ▼ above the top channel at t=0
top_ax = fig.axes[-4] # MicL is the first added in this gridspec
top_ax.plot([0], [top_ax.get_ylim()[1]], marker="v", color="black",
markersize=8, clip_on=False, zorder=10)
# Compute scale-per-division for the footer (10 divs across the chart).
# The geo amp/div setting is taken from the first geo channel that has data.
total_s = times[-1] - times[0] if values else 0
div_s = total_s / 10 if total_s > 0 else 0
geo_amp_div = ""
for ch in ("Tran", "Vert", "Long"):
v = rd.channels.get(ch) or []
if v:
amax = max(abs(x) for x in v)
geo_amp_div = f"{(amax * 1.1 * 2) / 10:.3f}"
break
fig.text(
0.11, 0.030,
f"Time(Seconds) {div_s:.2f} sec/div Amplitude Geo: {geo_amp_div} in/s/div Mic: 0.001 psi(L)/div",
fontsize=7, color="#444", ha="left",
)
fig.text(
0.11, 0.018,
"Trigger = ▶━━━━━ ━━━━━━◀",
fontsize=7, color="#444", ha="left",
)
def _nice_geo_step(amax: float) -> float:
"""Pick a "nice" per-division step for the geo y-axis.
Geo LSB is 0.005 in/s, so sub-LSB steps like 0.003/div are nonsense.
Quantize to the BW-style 1-2.5-5 sequence (0.005, 0.01, 0.025, 0.05,
...) and return the smallest step where 5 divisions >= amax, so the
top of the chart lands on a tick.
"""
if amax <= 0:
return 0.005
for step in (0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0):
if step * 5 >= amax:
return step
return 10.0
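A few worked values make the step selection easier to verify against a rendered chart. The sketch below repeats the lookup under the hypothetical name `nice_step` and derives the six tick values the histogram subplot would place on a geo axis; `ticks_for` is also an illustrative helper, not part of this module.

```python
from typing import List

STEPS = (0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0)


def nice_step(amax: float) -> float:
    """Smallest step from STEPS whose 5 divisions cover amax."""
    if amax <= 0:
        return 0.005
    for step in STEPS:
        if step * 5 >= amax:
            return step
    return 10.0  # fallback: peaks above 50 would exceed the top tick


def ticks_for(amax: float) -> List[float]:
    """Tick values at 0 through 5 divisions, rounded for display."""
    step = nice_step(amax)
    return [round(j * step, 3) for j in range(6)]


if __name__ == "__main__":
    for peak in (0.0, 0.02, 0.12, 0.3):
        print(peak, nice_step(peak), ticks_for(peak))
```

For a 0.12 in/s peak this picks 0.025 in/s per division, so the axis tops out at 0.125. One property worth noting: the fallback step of 10.0 caps the axis at 50, so any larger peak would be clipped, although that is far outside the geophone ranges shown in these reports.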
def _draw_histogram_subplot(fig, gridspec_cell, rd: ReportData) -> None:
"""4-channel stacked histogram bar chart — per-interval peaks.
X-axis labeled with the actual times from rd.histogram_interval_times
when available; otherwise interval index.
The three geo channels share a single y-axis scale (a BW-style nice
multiple of the 0.005 in/s LSB) so bar heights are directly
comparable across channels. MicL has its own auto-scale.
"""
inner = gridspec_cell.subgridspec(4, 1, hspace=0.0)
order = ["MicL", "Long", "Vert", "Tran"]
last_idx = len(order) - 1
# X-axis: use absolute time labels if we have them, else interval index
have_times = bool(rd.histogram_interval_times)
# Shared geo scale: max across Tran/Vert/Long, quantized to a nice
# tick step. Used for ylim + the footer "Amplitude Geo: X in/s/div".
geo_amax = 0.0
for gch in ("Tran", "Vert", "Long"):
gv = rd.channels.get(gch) or []
if gv:
geo_amax = max(geo_amax, max((abs(x) for x in gv if x is not None), default=0.0))
geo_step = _nice_geo_step(geo_amax)
geo_top = geo_step * 5 # 5 divisions — top tick lands at this value
for i, ch in enumerate(order):
ax = fig.add_subplot(inner[i])
values = rd.channels.get(ch) or []
if values:
# Histograms record per-interval PEAK magnitudes — always
# non-negative. Codec output occasionally includes signed
# values when the underlying .h5 was scaled like a waveform;
# take the absolute value so the bars rise from zero.
abs_vals = [abs(v) if v is not None else 0 for v in values]
xs = np.arange(len(abs_vals))
color = _channel_axis_color(ch)
ax.bar(xs, abs_vals, color=color, width=0.85, linewidth=0)
if ch in ("Tran", "Vert", "Long"):
ax.set_ylim(0, geo_top)
ax.set_yticks([j * geo_step for j in range(6)])
else:
amax = max(abs_vals, default=0)
if amax > 0:
ax.set_ylim(0, amax * 1.10)
ax.set_ylabel(ch, fontsize=8, rotation=0, ha="right", va="center",
color=_channel_axis_color(ch), weight="bold", labelpad=14)
ax.text(1.005, 0.02, "0.0", transform=ax.transAxes,
fontsize=7, color="#555", va="bottom", ha="left")
ax.grid(True, axis="y", linestyle="--", linewidth=0.3, color="#bbb", alpha=0.6)
if i != last_idx:
ax.set_xticklabels([])
ax.tick_params(axis="x", length=0)
else:
if have_times and len(rd.histogram_interval_times) == len(values):
# Show roughly 4-7 evenly spaced labels (fewer when n < 4)
n = len(values)
step = max(1, n // 4)
tick_positions = list(range(0, n, step))
ax.set_xticks(tick_positions)
ax.set_xticklabels([rd.histogram_interval_times[t] for t in tick_positions],
rotation=0, fontsize=6)
else:
ax.set_xlabel("Interval", fontsize=8)
ax.tick_params(axis="x", labelsize=7)
ax.tick_params(axis="y", labelsize=6)
# Footer scale info — histograms use minute/div. Reuses the shared
# geo_step computed above so the label matches the actual y-axis
# tick spacing on every subplot.
interval_str = rd.histogram_interval_size or ""
geo_amp_div = f"{geo_step:.3f}"
fig.text(
0.11, 0.030,
f"Time {interval_str} /div Amplitude Geo: {geo_amp_div} in/s/div Mic: 0.001 psi(L)/div",
fontsize=7, color="#444", ha="left",
)
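The shared geo scale described in the docstrings above can be checked in isolation. The following is a minimal sketch, not the module's actual code: `nice_geo_step` mirrors the step table shown in the diff, while `shared_geo_scale`, `GEO_STEPS`, and `DIVISIONS` are hypothetical names introduced here for illustration, and a plain dict stands in for `rd.channels`.

```python
from typing import Dict, List, Optional, Tuple

# Step table copied from the diff; the constant names are illustrative only.
GEO_STEPS = (0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0)
DIVISIONS = 5  # the histogram y-axis is drawn with 5 divisions


def nice_geo_step(amax: float) -> float:
    """Return the smallest step whose 5 divisions cover amax."""
    if amax <= 0:
        return GEO_STEPS[0]
    for step in GEO_STEPS:
        if step * DIVISIONS >= amax:
            return step
    return GEO_STEPS[-1]


def shared_geo_scale(
    channels: Dict[str, List[Optional[float]]],
) -> Tuple[float, float]:
    """Compute (step, top) shared by Tran/Vert/Long, skipping None samples."""
    amax = 0.0
    for name in ("Tran", "Vert", "Long"):
        vals = channels.get(name) or []
        # default=0.0 keeps an all-None channel from raising ValueError.
        amax = max(amax, max((abs(v) for v in vals if v is not None), default=0.0))
    step = nice_geo_step(amax)
    return step, step * DIVISIONS


if __name__ == "__main__":
    step, top = shared_geo_scale({"Tran": [0.01, None], "Vert": [-0.12], "Long": []})
    print(f"step={step} in/s/div, top={top} in/s")
```

Because all three geo channels use the same `step`, bar heights stay comparable across subplots and the footer label matches the tick spacing.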
+168 -6
@@ -46,7 +46,7 @@ from typing import Optional
# FastAPI / Pydantic
try:
from fastapi import Body, FastAPI, File, HTTPException, Query, UploadFile
from fastapi import Body, FastAPI, File, HTTPException, Query, Response, UploadFile
from fastapi.middleware.cors import CORSMiddleware
from fastapi.responses import FileResponse, JSONResponse, StreamingResponse
from pydantic import BaseModel
@@ -381,10 +381,24 @@ def webapp():
@app.get("/waveform", response_class=FileResponse)
def waveform_viewer():
"""Serve the standalone waveform viewer."""
"""Serve the standalone LIVE-device waveform viewer.
Talks to ``/device/*`` endpoints for plotting events pulled from
a connected unit in real time. For the stored-event browser that
reads from the SeismoDb + WaveformStore, see ``/events``.
"""
return str(Path(__file__).parent / "waveform_viewer.html")
@app.get("/events", response_class=FileResponse)
def event_browser():
"""Serve the stored-event browser — pick a serial, list its events,
render any one's waveform from the persisted ``.h5`` via the
``/db/events/{id}/waveform.json`` endpoint. Standalone HTML +
Chart.js, no auth, no build step."""
return str(Path(__file__).parent / "event_browser.html")
@app.get("/device/info")
def device_info(
port: Optional[str] = Query(None, description="Serial port (e.g. COM5, /dev/ttyUSB0)"),
@@ -1973,10 +1987,15 @@ def _cleanup_event_files(row: dict) -> dict:
base_name = bw_name or a5_name or sc_name
if base_name:
bw_path, a5_path = store.paths_for(serial, base_name)
sc_path = store.sidecar_path_for(serial, base_name)
h5_path = store.hdf5_path_for(serial, base_name)
sc_path = store.sidecar_path_for(serial, base_name)
h5_path = store.hdf5_path_for(serial, base_name)
# Preserved BW ASCII report (added 2026-05-27 with the .TXT
# preservation feature) — needs to be cleaned up too, otherwise
# deletes leave orphan _ASCII.TXT files behind.
txt_path = store.txt_path_for(serial, base_name)
for kind, p in [("blastware", bw_path), ("a5_pickle", a5_path),
("sidecar", sc_path), ("hdf5", h5_path)]:
("sidecar", sc_path), ("hdf5", h5_path),
("txt", txt_path)]:
try:
if p.exists():
p.unlink()
@@ -2164,6 +2183,148 @@ def db_event_blastware_file(event_id: str) -> FileResponse:
)
@app.get("/db/events/{event_id}/ascii_report.txt")
def db_event_ascii_report_txt(event_id: str):
"""Serve the raw BW ASCII report (.TXT) for an event, when preserved.
Returns 404 for events ingested before the .TXT-preservation feature
landed (2026-05-27); those events have only the parsed ``bw_report``
block in the sidecar, not the raw .TXT. Re-forwarding from the
watcher PC will populate the .TXT going forward.
"""
row = _get_db().get_event(event_id)
if row is None:
raise HTTPException(status_code=404, detail=f"Event {event_id} not found")
serial = row.get("serial")
filename = row.get("blastware_filename")
if not serial or not filename:
raise HTTPException(status_code=404, detail="Event has no associated BW file")
txt_path = _get_store().open_txt(serial, filename)
if txt_path is None:
raise HTTPException(
status_code=404,
detail=(
f"Raw .TXT not preserved for {filename}. Events ingested "
"before 2026-05-27 don't have it; re-forward from the "
"watcher PC to populate."
),
)
return FileResponse(
path=str(txt_path),
media_type="text/plain",
filename=txt_path.name,
)
@app.get("/db/events/{event_id}/report.pdf")
def db_event_report_pdf(event_id: str):
"""Render an Instantel-style Event Report as a PDF.
Single-page letter portrait, matches the BW Event Report's data
coverage and layout (header / mic block / per-channel stats /
waveform plot). V0.20.0 stub; the exact visual is still being iterated
against reference PDFs in ``docs/reference/instantel/``.
Returns 404 if the event is unknown or has no waveform data on
disk (same condition as /waveform.json).
"""
from sfm import report_pdf
rd = report_pdf.gather_report_data(_get_db(), _get_store(), event_id)
if rd is None:
raise HTTPException(status_code=404, detail=f"Event {event_id} not found or has no waveform")
pdf_bytes = report_pdf.render_event_report_pdf(rd)
# Suggested download filename based on the BW file basename.
fname = (rd.file_name or event_id).replace(".", "_")
return Response(
content=pdf_bytes,
media_type="application/pdf",
headers={"Content-Disposition": f'inline; filename="{fname}_report.pdf"'},
)
def _maybe_aggregate_histogram(plot: dict, store, serial: str, filename: str, row: dict) -> dict:
"""For histogram events, aggregate the codec's per-block samples into
the BW-reported number of intervals. No-op for waveforms or when
we don't have the histogram metadata (interval count + size) in the
sidecar's bw_report block.
Why: the histogram codec emits one value per internal block (~1 per
second), but BW's printout shows one bar per configured interval
(typically 1-15 minutes). For a 1-minute-interval event the codec
gives ~60 blocks per BW bar. Aggregating max-per-group makes the
SFM chart + PDF visually match BW's display.
"""
record_type = row.get("record_type") or ""
if not record_type.lower().startswith("hist"):
return plot
# Read interval count + size from the sidecar's bw_report.histogram block
try:
import json as _json
sidecar_path = store.sidecar_path_for(serial, filename)
if not sidecar_path.exists():
return plot
sc = _json.loads(sidecar_path.read_text())
hist = (sc.get("bw_report") or {}).get("histogram") or {}
n_intervals = hist.get("n_intervals")
interval_size_s = hist.get("interval_size_s")
start_iso = hist.get("start")
except Exception:
return plot
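# Coerce to int so a string or float count from the sidecar JSON
# cannot break the comparison or range() below.
try:
n_intervals = int(n_intervals)
except (TypeError, ValueError):
return plot
if n_intervals < 1:
return plot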
# Aggregate each channel's values into n_intervals groups, max-per-group
channels = plot.get("channels") or {}
aggregated_channels: dict = {}
for ch, chd in channels.items():
vals = chd.get("values") or []
if not vals:
aggregated_channels[ch] = chd
continue
# Distribute len(vals) samples across n_intervals groups; uneven
# remainders get distributed across the first few groups.
per_group = len(vals) // n_intervals
remainder = len(vals) % n_intervals
agg: list = []
offset = 0
for i in range(n_intervals):
grp_size = per_group + (1 if i < remainder else 0)
if grp_size > 0:
grp = vals[offset:offset + grp_size]
# Max of absolute values (peaks are magnitudes).
agg.append(max((abs(v) for v in grp if v is not None), default=0))
offset += grp_size
else:
agg.append(0)
aggregated_channels[ch] = {**chd, "values": agg}
# Build per-interval timestamp labels for the x-axis if we have start time
interval_times: list = []
if start_iso and interval_size_s:
try:
import datetime as _dt
start = _dt.datetime.fromisoformat(start_iso)
for i in range(int(n_intervals)):
# Show the END of each interval (BW convention — the
# peak reported is for samples taken THROUGH that time)
end = start + _dt.timedelta(seconds=(i + 1) * interval_size_s)
interval_times.append(end.strftime("%H:%M:%S"))
except Exception:
pass
# Override the time_axis to reflect intervals (not samples).
plot_aggr = {**plot, "channels": aggregated_channels}
plot_aggr["time_axis"] = {
**(plot.get("time_axis") or {}),
"histogram_aggregated": True,
"n_intervals": int(n_intervals),
"interval_size_s": interval_size_s,
"interval_times": interval_times,
}
return plot_aggr
@app.get("/db/events/{event_id}/waveform.json")
def db_event_waveform_json(event_id: str) -> dict:
"""
@@ -2195,7 +2356,8 @@ def db_event_waveform_json(event_id: str) -> dict:
h5_path = store.hdf5_path_for(serial, filename)
if h5_path.exists():
try:
return event_hdf5.plot_json_from_hdf5(h5_path, event_id=event_id)
plot = event_hdf5.plot_json_from_hdf5(h5_path, event_id=event_id)
return _maybe_aggregate_histogram(plot, store, serial, filename, row)
except Exception as exc:
log.warning("HDF5 read failed (%s); falling back to A5 path", exc)
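The max-per-group aggregation and the end-of-interval labels in `_maybe_aggregate_histogram` can be exercised without a database or sidecar. The sketch below is a standalone approximation under stated assumptions: the helper names `aggregate_max_per_group` and `interval_end_labels` are hypothetical, and the inputs are plain lists rather than the plot dict the endpoint uses.

```python
import datetime as dt
from typing import List, Optional, Sequence


def aggregate_max_per_group(
    vals: Sequence[Optional[float]], n_groups: int
) -> List[float]:
    """Split vals into n_groups contiguous groups and keep each group's
    largest magnitude. Uneven remainders go to the first few groups."""
    if n_groups < 1:
        raise ValueError("n_groups must be >= 1")
    per_group, remainder = divmod(len(vals), n_groups)
    out: List[float] = []
    offset = 0
    for i in range(n_groups):
        size = per_group + (1 if i < remainder else 0)
        grp = vals[offset:offset + size]
        # Empty groups and all-None groups both fall back to 0.
        out.append(max((abs(v) for v in grp if v is not None), default=0))
        offset += size
    return out


def interval_end_labels(
    start_iso: str, n_intervals: int, interval_size_s: float
) -> List[str]:
    """Label each interval with its END time as HH:MM:SS."""
    start = dt.datetime.fromisoformat(start_iso)
    return [
        (start + dt.timedelta(seconds=(i + 1) * interval_size_s)).strftime("%H:%M:%S")
        for i in range(n_intervals)
    ]


if __name__ == "__main__":
    print(aggregate_max_per_group([1, -5, 2, 3, 0, 4, 7], 3))
    print(interval_end_labels("2026-01-01T12:00:00", 3, 300))
```

When the stored array already holds one sample per interval, each group has size 1 and the aggregation reduces to taking absolute values.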
+557 -39
@@ -499,6 +499,20 @@
text-align: left;
border-bottom: 1px solid var(--border);
white-space: nowrap;
position: sticky;
top: 0;
z-index: 1;
}
table.db-table thead th[data-sort]:hover {
background: var(--border2);
color: var(--text);
}
table.db-table thead th .sort-arrow {
display: inline-block;
width: 10px;
color: var(--accent, #58a6ff);
font-weight: 900;
text-align: center;
}
table.db-table tbody tr { border-bottom: 1px solid var(--border2); }
table.db-table tbody tr:last-child { border-bottom: none; }
@@ -758,7 +772,9 @@
overflow: hidden;
min-height: 0;
}
#section-db { display: none; }
/* Default to Database view on page load — most users are here to
browse stored events, not connect to a live unit. */
#section-live { display: none; }
/* ── Live connect bar (host/port/connect, live section only) ── */
#live-connect-bar {
@@ -792,8 +808,8 @@
</div>
<div class="hdr-sep"></div>
<div class="section-switcher">
<button class="section-btn active" onclick="switchSection('live')">Live Device</button>
<button class="section-btn" onclick="switchSection('db')">Database</button>
<button class="section-btn" onclick="switchSection('live')">Live Device</button>
<button class="section-btn active" onclick="switchSection('db')">Database</button>
</div>
<div class="hdr-sep"></div>
<label class="force-toggle" id="force-toggle"
@@ -802,6 +818,12 @@
<span class="ft-dot"></span>
<span>Force refresh</span>
</label>
<div class="hdr-sep"></div>
<button id="mic-unit-toggle" class="section-btn"
onclick="_setMicUnit(_getMicUnit() === 'dBL' ? 'psi' : 'dBL')"
title="Toggle microphone display unit (dBL ↔ psi) for waveform plots. Affects all mic charts; persists across page loads.">
Mic: dBL
</button>
</header>
<!-- ════════════════════════════════════════════════════════════════
@@ -1224,18 +1246,18 @@
<div class="db-table-wrap" id="hist-table-wrap" style="display:none">
<table class="db-table" id="hist-table">
<thead>
<tr>
<th>Timestamp</th>
<th>Serial</th>
<th>Tran (in/s)</th>
<th>Vert (in/s)</th>
<th>Long (in/s)</th>
<th>PVS (in/s)</th>
<th>Mic (dBL)</th>
<th>Project</th>
<th>Client</th>
<th>Type</th>
<th>Key</th>
<tr id="hist-header-row">
<th data-sort="timestamp">Timestamp <span class="sort-arrow"></span></th>
<th data-sort="serial">Serial <span class="sort-arrow"></span></th>
<th data-sort="tran_ppv">Tran (in/s) <span class="sort-arrow"></span></th>
<th data-sort="vert_ppv">Vert (in/s) <span class="sort-arrow"></span></th>
<th data-sort="long_ppv">Long (in/s) <span class="sort-arrow"></span></th>
<th data-sort="peak_vector_sum">PVS (in/s) <span class="sort-arrow"></span></th>
<th data-sort="mic_ppv">Mic (dBL) <span class="sort-arrow"></span></th>
<th data-sort="project">Project <span class="sort-arrow"></span></th>
<th data-sort="client">Client <span class="sort-arrow"></span></th>
<th data-sort="record_type">Type <span class="sort-arrow"></span></th>
<th data-sort="waveform_key">Key <span class="sort-arrow"></span></th>
<th></th>
</tr>
</thead>
@@ -1388,7 +1410,9 @@ function deviceParams() {
}
// ── Section switching ─────────────────────────────────────────────────────────
let currentSection = 'live';
// Default to Database — most users land here to browse stored events.
// Live Device is opt-in (click the tab to talk to a unit).
let currentSection = 'db';
function switchSection(name) {
currentSection = name;
@@ -2333,6 +2357,12 @@ async function _fetchUnits() {
}
// ── History tab ────────────────────────────────────────────────────────────────
// Module-level state for the history table — preserved across re-sorts.
// We sort + re-render without re-fetching.
let _histEvents = [];
let _histSortKey = 'timestamp';
let _histSortDir = 'desc'; // 'asc' | 'desc'
async function loadHistory() {
histLoaded = true;
const serial = document.getElementById('hist-serial-filter').value;
@@ -2364,10 +2394,20 @@ async function loadHistory() {
_populateSerialDropdown('monlog-serial-filter');
_populateSerialDropdown('sess-serial-filter');
document.getElementById('hist-count').textContent = `${events.length} event${events.length !== 1 ? 's' : ''}`;
_histEvents = events;
renderHistTable();
}
// Re-render the history table from `_histEvents` using the current sort
// state. Pulled out of `loadHistory` so column-header clicks can re-sort
// in-memory without re-fetching from the server.
function renderHistTable() {
const events = _histEvents;
document.getElementById('hist-count').textContent =
`${events.length} event${events.length !== 1 ? 's' : ''}`;
const tbody = document.getElementById('hist-tbody');
tbody.innerHTML = '';
if (events.length === 0) {
document.getElementById('hist-empty').style.display = 'block';
document.getElementById('hist-table-wrap').style.display = 'none';
@@ -2376,11 +2416,31 @@ async function loadHistory() {
document.getElementById('hist-empty').style.display = 'none';
document.getElementById('hist-table-wrap').style.display = 'block';
for (const ev of events) {
// Sort a copy by current key + direction (the spread leaves _histEvents
// untouched). Nulls sink to the bottom regardless of direction.
const k = _histSortKey;
const dir = _histSortDir === 'asc' ? 1 : -1;
const sorted = [...events].sort((a, b) => {
const av = a[k], bv = b[k];
if (av == null && bv == null) return 0;
if (av == null) return 1;
if (bv == null) return -1;
if (typeof av === 'number' && typeof bv === 'number') return (av - bv) * dir;
return String(av).localeCompare(String(bv)) * dir;
});
// Update arrow indicators in the headers
document.querySelectorAll('#hist-header-row th[data-sort]').forEach(th => {
const arrow = th.querySelector('.sort-arrow');
if (!arrow) return;
arrow.textContent = th.dataset.sort === k ? (_histSortDir === 'asc' ? '↑' : '↓') : '';
});
for (const ev of sorted) {
const tr = document.createElement('tr');
const pvs = ev.peak_vector_sum;
tr.classList.add('clickable');
tr.title = 'Click to review (open sidecar editor)';
tr.title = 'Click to view waveform + sidecar';
tr.dataset.eventId = ev.id;
tr.innerHTML = `
<td>${_fmtTs(ev.timestamp)}</td>
@@ -2408,6 +2468,28 @@ async function loadHistory() {
}
}
// Click a column header → toggle sort. Click another → set sort to that column.
document.addEventListener('DOMContentLoaded', () => {
const headerRow = document.getElementById('hist-header-row');
if (!headerRow) return;
headerRow.querySelectorAll('th[data-sort]').forEach(th => {
th.style.cursor = 'pointer';
th.style.userSelect = 'none';
th.addEventListener('click', () => {
const k = th.dataset.sort;
if (_histSortKey === k) {
_histSortDir = _histSortDir === 'asc' ? 'desc' : 'asc';
} else {
_histSortKey = k;
// Default direction: 'desc' for numbers + timestamps (biggest/newest first),
// 'asc' for text columns (alphabetical).
_histSortDir = ['serial','project','client','record_type','waveform_key'].includes(k) ? 'asc' : 'desc';
}
renderHistTable();
});
});
});
// ── Sidecar review modal ───────────────────────────────────────────────────────
//
// Opens on row click in the History table. Loads the .sfm.json sidecar
@@ -2430,23 +2512,373 @@ async function openSidecarModal(eventId) {
document.getElementById('sc-edit-ft').checked = false;
document.getElementById('sc-edit-reviewer').value = '';
document.getElementById('sc-edit-notes').value = '';
// Reset waveform area
document.getElementById('sc-waveform-status').textContent = 'Loading waveform…';
document.getElementById('sc-waveform-charts').innerHTML = '';
_destroyScCharts();
try {
const r = await fetch(`${api()}/db/events/${eventId}/sidecar`);
if (!r.ok) {
const e = await r.json().catch(() => ({}));
throw new Error(e.detail || r.statusText);
}
const data = await r.json();
// Sidecar + waveform fetched in parallel — neither blocks the other.
const sidecarP = fetch(`${api()}/db/events/${eventId}/sidecar`)
.then(async r => {
if (!r.ok) { const e = await r.json().catch(() => ({})); throw new Error(e.detail || r.statusText); }
return r.json();
});
const waveformP = fetch(`${api()}/db/events/${eventId}/waveform.json`)
.then(async r => {
if (r.status === 404) return null; // no waveform available — render empty state
if (!r.ok) { const e = await r.json().catch(() => ({})); throw new Error(e.detail || r.statusText); }
return r.json();
});
// Sidecar usually loads first (smaller payload). Each one renders
// independently so the modal becomes useful as soon as either lands.
sidecarP.then(data => {
_scCurrentSidecar = data;
_renderSidecar(data);
document.getElementById('sc-status').textContent = '';
} catch (e) {
}).catch(e => {
document.getElementById('sc-status').className = 'sc-status error';
document.getElementById('sc-status').textContent = `Load failed: ${e.message}`;
document.getElementById('sc-status').textContent = `Sidecar load failed: ${e.message}`;
});
waveformP.then(data => {
if (!data) {
document.getElementById('sc-waveform-status').textContent = 'No waveform data for this event.';
return;
}
_renderScWaveform(data);
}).catch(e => {
document.getElementById('sc-waveform-status').textContent = `Waveform load failed: ${e.message}`;
});
}
// ── Sidecar-modal waveform plot ──────────────────────────────────────────────
// Renders the 4-channel decoded waveform fetched from
// /db/events/{id}/waveform.json — MicL on top, Tran on bottom (matches
// Instantel BW Event Report layout). Uses Chart.js (loaded at the top of
// the page for the live-device viewer).
const _SC_CHANNEL_COLORS = {
MicL: '#e066ff',
Long: '#3a80ff',
Vert: '#3fb950',
Tran: '#f85149',
};
const _SC_CHANNEL_ORDER = ['MicL', 'Long', 'Vert', 'Tran'];
let _scCharts = {};
// User preference for how mic is displayed in plots — dBL (default,
// matches BW printout convention + the rest of SFM) or psi (the raw
// sample unit). Toggleable via the header pill; persists in localStorage.
function _getMicUnit() {
return localStorage.getItem('sfm_mic_unit') === 'psi' ? 'psi' : 'dBL';
}
function _setMicUnit(u) {
localStorage.setItem('sfm_mic_unit', u === 'psi' ? 'psi' : 'dBL');
_refreshMicUnitToggleLabel();
// Re-render the open modal so the change is immediately visible.
if (_scCurrentEventId) openSidecarModal(_scCurrentEventId);
}
function _refreshMicUnitToggleLabel() {
const b = document.getElementById('mic-unit-toggle');
if (b) b.textContent = `Mic: ${_getMicUnit()}`;
}
// Convert a psi value to dB(L). Returns null for non-positive values
// (log of zero is undefined) — Chart.js handles null as a gap in the line.
function _psiToDbl(psi) {
if (psi == null || !(psi > 0)) return null;
return 20 * Math.log10(psi / DBL_REF);
}
// Per-sample mic display floor. Sound pressure AC samples spend most
// of their time at the digitization noise floor (1-2 ADC counts ≈ 20-40
// dBL). Rendering each one as null/-inf produces a spikey discontinuous
// chart of "moments when sound briefly exceeded 80 dBL" — confusing.
// Instead we rectify (abs the AC waveform), convert to dBL, and floor
// anything below MIC_DBL_FLOOR so the chart has a continuous baseline
// with peaks rising above it. Matches how acoustic engineers expect to
// see SPL-vs-time.
const MIC_DBL_FLOOR = 60;
function _psiToDblForChart(psi) {
if (psi == null) return MIC_DBL_FLOOR;
const a = Math.abs(psi);
if (a === 0) return MIC_DBL_FLOOR;
const dbl = 20 * Math.log10(a / DBL_REF);
return dbl > MIC_DBL_FLOOR ? dbl : MIC_DBL_FLOOR;
}
// Adaptive decimal formatter — scientific notation is reserved for truly
// extreme values (10000+ or sub-0.0001). Normal-range values (most peaks
// fall here) render as decimals with sensible precision. Replaces the
// previous .toExponential(3) call that turned every peak into an ugly "2.500e-2".
function _fmtPeak(v, unit) {
if (v == null || (typeof v === 'number' && !isFinite(v))) return '';
if (typeof v !== 'number') return String(v) + (unit ? ' ' + unit : '');
if (v === 0) return '0' + (unit ? ' ' + unit : '');
const a = Math.abs(v);
const u = unit ? ' ' + unit : '';
if (a >= 0.0001 && a < 10000) {
const d = a >= 100 ? 1 : a >= 10 ? 2 : a >= 1 ? 3 : a >= 0.1 ? 4 : 5;
return v.toFixed(d) + u;
}
return v.toExponential(2) + u;
}
function _destroyScCharts() {
Object.values(_scCharts).forEach(c => { try { c.destroy(); } catch {} });
_scCharts = {};
}
function _renderScWaveform(data) {
document.getElementById('sc-waveform-status').textContent = '';
const chartsDiv = document.getElementById('sc-waveform-charts');
chartsDiv.innerHTML = '';
_destroyScCharts();
const channels = data.channels || {};
// time_axis is METADATA, not an array — it carries sample_rate,
// pretrig_samples, t0_ms (first-sample time relative to trigger,
// negative when pretrig samples exist), and dt_ms. Trigger is at
// t=0 by convention.
const ta = data.time_axis || {};
const sr = ta.sample_rate || 1024;
const dtMs = ta.dt_ms || (1000.0 / sr);
const t0Ms = ta.t0_ms != null ? ta.t0_ms : 0;
// Histogram events have per-interval peaks, not per-sample data.
// Render as bars (one per interval) instead of a connected line, and
// suppress trigger/zero overlays which don't apply. X-axis uses the
// server's per-interval timestamps when present, else the interval index,
// since the sample_rate-based time math is meaningless here (each
// "sample" is one interval, typically 1-5 minutes long).
const isHistogram = String(data.record_type || '').toLowerCase().includes('histogram');
// Which channels have data — determines which one renders the shared bottom axis.
const withData = _SC_CHANNEL_ORDER.filter(ch =>
channels[ch] && (channels[ch].values || []).length > 0
);
const lastCh = withData[withData.length - 1];
const micUnit = _getMicUnit(); // user preference: 'dBL' or 'psi'
for (const ch of _SC_CHANNEL_ORDER) {
const chData = channels[ch];
if (!chData) continue;
let values = chData.values || [];
let chUnit = chData.unit || '';
let chPeak = chData.peak;
// Mic channel: convert from raw psi to dB(L) when user prefers dBL
// (default). Per-sample values use _psiToDblForChart which rectifies
// (abs) the AC waveform and floors at MIC_DBL_FLOOR so the chart is
// continuous with a baseline + peaks above it, instead of a sparse
// pattern of isolated spikes for "moments when sound briefly exceeded
// the Y-axis bottom". The peak label uses _psiToDbl with the
// unrectified peak (preserves the true measurement).
if (ch === 'MicL' && chUnit === 'psi' && micUnit === 'dBL') {
values = values.map(_psiToDblForChart);
chPeak = _psiToDbl(chPeak);
chUnit = 'dB(L)';
}
const wrap = document.createElement('div');
wrap.style.cssText = 'background:var(--surface);border:1px solid var(--border2);border-radius:6px;padding:6px 30px 4px 10px';
const lbl = document.createElement('div');
lbl.style.cssText = `font-size:10px;font-weight:600;letter-spacing:0.05em;text-transform:uppercase;margin-bottom:2px;color:${_SC_CHANNEL_COLORS[ch]};display:flex;justify-content:space-between`;
const peakStr = chPeak != null
? `peak ${_fmtPeak(chPeak, chUnit)}`
: '';
lbl.innerHTML = `<span>${ch}</span><span style="color:var(--text-dim);font-weight:normal">${peakStr}</span>`;
wrap.appendChild(lbl);
if (values.length === 0) {
const e = document.createElement('div');
e.style.cssText = 'height:80px;display:flex;align-items:center;justify-content:center;color:var(--text-dim);font-size:11px';
e.textContent = 'no samples decoded';
wrap.appendChild(e);
chartsDiv.appendChild(wrap);
continue;
}
const canvasWrap = document.createElement('div');
canvasWrap.style.cssText = 'position:relative;height:100px';
const canvas = document.createElement('canvas');
canvasWrap.appendChild(canvas);
wrap.appendChild(canvasWrap);
chartsDiv.appendChild(wrap);
// Waveform: per-sample time in ms relative to trigger (negative for pretrig).
// Histogram: when the server has aggregated to BW-reported intervals AND
// provides per-interval timestamps, use those as x-axis labels (HH:MM:SS).
// Falls back to interval index.
let times;
if (isHistogram) {
const intervalTimes = ta.interval_times || [];
times = (intervalTimes.length === values.length)
? intervalTimes
: values.map((_, i) => i + 1);
} else {
times = values.map((_, i) => t0Ms + i * dtMs);
}
// Downsample for rendering when very long.
const MAX = 3000;
let rT = times, rV = values;
if (values.length > MAX) {
const step = Math.ceil(values.length / MAX);
rT = times.filter((_, i) => i % step === 0);
rV = values.filter((_, i) => i % step === 0);
}
const showX = (ch === lastCh);
// Tick label formatter: snap floats to 1 decimal place so we don't get
// "11.7187040000000002 ms" garbage from accumulated floating-point error.
const xAxisLabel = isHistogram ? '' : ' ms';
const fmtTick = i => {
const v = rT[i];
if (typeof v === 'number') {
// Whole numbers (intervals) → no decimals. Sub-integer ms → 1 decimal.
const s = Number.isInteger(v) ? String(v) : v.toFixed(1);
return s + xAxisLabel;
}
return String(v) + xAxisLabel;
};
// Y-axis bounds. Convention:
// - Geophones (Tran/Vert/Long) on waveform-mode events:
// symmetric around zero so the zero line sits in the middle and
// positive/negative excursions are visually balanced.
// - Mic (always positive sound pressure) + histograms (per-interval
// peaks, always positive): default auto-scale, zero at the bottom.
let yBounds = {};
const isGeo = ch !== 'MicL';
if (isGeo && !isHistogram) {
// Waveform geo: symmetric around zero, full zoom to shape detail.
let absMax = 0;
for (const v of values) {
const a = Math.abs(v);
if (a > absMax) absMax = a;
}
const padded = (absMax || 1) * 1.10;
yBounds = { min: -padded, max: padded };
} else if (isGeo && isHistogram) {
// Histogram geo: enforce a minimum chart range so a quiet
// 0.005 in/s event renders as ~10% of chart height instead of
// filling the panel. Matches BW's near-fixed-scale convention
// (their footer is "Geo: 0.002 in/s/div" — a chart-relative scale,
// not auto-zoom).
const HIST_GEO_MIN_INS = 0.05;
let peak = 0;
for (const v of values) { const a = Math.abs(v); if (a > peak) peak = a; }
yBounds = { min: 0, max: Math.max(peak * 1.10, HIST_GEO_MIN_INS) };
} else if (ch === 'MicL' && micUnit === 'dBL') {
// Mic in dBL — pin baseline at noise-floor minimum (where we floored
// quiet samples), top at actual peak + a few dB headroom.
const peakDbl = (typeof chPeak === 'number' && isFinite(chPeak))
? chPeak + 5
: 100;
yBounds = { min: MIC_DBL_FLOOR, max: Math.max(peakDbl, MIC_DBL_FLOOR + 20) };
} else if (ch === 'MicL' && isHistogram && micUnit === 'psi') {
// Mic histogram in psi — same minimum-range treatment as geo.
// 0.001 psi ≈ 110 dBL — typical "loud" mic peak. Quiet events
// sit near the bottom.
const HIST_MIC_MIN_PSI = 0.001;
let peak = 0;
for (const v of values) { const a = Math.abs(v); if (a > peak) peak = a; }
yBounds = { min: 0, max: Math.max(peak * 1.10, HIST_MIC_MIN_PSI) };
}
_scCharts[ch] = new Chart(canvas, {
type: isHistogram ? 'bar' : 'line',
data: {
labels: rT.map(t => (typeof t === 'number' ? (Number.isInteger(t) ? String(t) : t.toFixed(2)) : t)),
datasets: isHistogram ? [{
data: rV,
backgroundColor: _SC_CHANNEL_COLORS[ch],
borderWidth: 0,
barPercentage: 1.0,
categoryPercentage: 1.0, // bars touch — "tight bargraph" look
}] : [{
data: rV,
borderColor: _SC_CHANNEL_COLORS[ch],
borderWidth: 1,
pointRadius: 0,
tension: 0,
}],
},
options: {
animation: false, responsive: true, maintainAspectRatio: false,
plugins: {
legend: { display: false },
tooltip: {
mode: 'index', intersect: false,
callbacks: {
title: items => isHistogram
? `interval ${items[0].label}`
: `t = ${items[0].label} ms`,
label: item => `${ch}: ${_fmtPeak(item.raw, chUnit)}`,
},
},
},
scales: {
x: {
type: 'category', display: showX,
ticks: { color: '#484f58', maxTicksLimit: 8, maxRotation: 0, callback: (v, i) => fmtTick(i) },
grid: { color: '#21262d', drawTicks: showX },
},
y: {
...yBounds,
ticks: { color: '#484f58', maxTicksLimit: 4 },
grid: { color: '#21262d' },
title: { display: true, text: chUnit, color: '#484f58', font: { size: 9 } },
},
},
},
plugins: isHistogram ? [] : [{
// Trigger line + triangle markers + zero baseline — only meaningful
// for waveform-mode events. Histograms have no trigger.
id: 'overlays',
afterDraw(chart) {
const ctx = chart.ctx, x = chart.scales.x, y = chart.scales.y;
// Dashed trigger line at t=0
const zi = rT.findIndex(t => parseFloat(t) >= 0);
if (zi >= 0) {
const px = x.getPixelForValue(zi);
ctx.save();
ctx.beginPath(); ctx.moveTo(px, y.top); ctx.lineTo(px, y.bottom);
ctx.strokeStyle = 'rgba(248,81,73,0.8)'; ctx.lineWidth = 1.2;
ctx.setLineDash([4, 3]); ctx.stroke(); ctx.restore();
// Triangle markers above and below the chart
ctx.save();
ctx.fillStyle = '#f85149';
ctx.beginPath();
ctx.moveTo(px - 4, y.top - 7); ctx.lineTo(px + 4, y.top - 7); ctx.lineTo(px, y.top - 1);
ctx.closePath(); ctx.fill();
ctx.beginPath();
ctx.moveTo(px - 4, y.bottom + 7); ctx.lineTo(px + 4, y.bottom + 7); ctx.lineTo(px, y.bottom + 1);
ctx.closePath(); ctx.fill();
ctx.restore();
}
// Zero baseline + label
const zy = y.getPixelForValue(0);
if (zy >= y.top && zy <= y.bottom) {
ctx.save();
ctx.strokeStyle = '#30363d'; ctx.lineWidth = 0.8;
ctx.setLineDash([2, 2]);
ctx.beginPath(); ctx.moveTo(x.left, zy); ctx.lineTo(x.right, zy); ctx.stroke();
ctx.restore();
ctx.save();
ctx.fillStyle = '#c9d1d9'; ctx.font = '10px monospace';
ctx.textAlign = 'left'; ctx.textBaseline = 'middle';
ctx.fillText('0.0', x.right + 6, zy);
ctx.restore();
}
},
}],
});
}
}
// Make sure charts get cleaned up when the modal closes.
function _scCleanupOnClose() { _destroyScCharts(); }
function _renderSidecar(data) {
const ev = data.event || {};
const pv = data.peak_values || {};
@@ -2454,6 +2886,12 @@ function _renderSidecar(data) {
const bw = data.blastware || {};
const src = data.source || {};
const rev = data.review || {};
// bw_report carries the per-channel ASCII-derived stats (ZC Freq,
// saturation flags, peak time, etc.). Only present on events
// ingested with a preserved .TXT (post-2026-05-27); falls back to
// empty for legacy events.
const bwrPeaks = (data.bw_report || {}).peaks || {};
const bwrMic = (data.bw_report || {}).mic || {};
document.getElementById('sc-title').textContent = `Event — ${bw.filename || ev.waveform_key || 'unknown'}`;
@@ -2479,27 +2917,72 @@ function _renderSidecar(data) {
};
document.getElementById('sc-f-serial').textContent = ev.serial || '—';
document.getElementById('sc-f-ts').textContent = ev.timestamp || '—';
// Route through _fmtTs so the unit-local naive timestamp shows as
// "5/27/2026, 6:00:13 AM" instead of "2026-05-27T06:00:13".
document.getElementById('sc-f-ts').textContent = _fmtTs(ev.timestamp);
document.getElementById('sc-f-rt').textContent = ev.record_type || '—';
document.getElementById('sc-f-sr').textContent = (ev.sample_rate ?? '—') + (ev.sample_rate ? ' sps' : '');
document.getElementById('sc-f-key').textContent = ev.waveform_key || '—';
document.getElementById('sc-f-tran').textContent = fmtPpv(pv.transverse);
document.getElementById('sc-f-vert').textContent = fmtPpv(pv.vertical);
document.getElementById('sc-f-long').textContent = fmtPpv(pv.longitudinal);
// Suffix with " · {prefix}{N} Hz" when bw_report has a ZC Freq.
// Above-range ZC peaks (BW ">100 Hz") get a literal ">" prefix so
// operators see the same indicator the PDF shows.
const fmtZc = bwr => {
if (!bwr || bwr.zc_freq_hz == null) return '';
const prefix = bwr.zc_freq_above_range ? '>' : '';
return ` · ${prefix}${Math.round(bwr.zc_freq_hz)} Hz`;
};
document.getElementById('sc-f-tran').textContent = fmtPpv(pv.transverse) + fmtZc(bwrPeaks.tran);
document.getElementById('sc-f-vert').textContent = fmtPpv(pv.vertical) + fmtZc(bwrPeaks.vert);
document.getElementById('sc-f-long').textContent = fmtPpv(pv.longitudinal) + fmtZc(bwrPeaks.long);
document.getElementById('sc-f-pvs').textContent = fmtPpv(pv.vector_sum);
document.getElementById('sc-f-mic').textContent = fmtMic(pv.mic_psi);
document.getElementById('sc-f-mic').textContent = fmtMic(pv.mic_psi) + fmtZc(bwrMic);
document.getElementById('sc-f-project').textContent = pi.project || '—';
document.getElementById('sc-f-client').textContent = pi.client || '—';
document.getElementById('sc-f-operator').textContent = pi.operator || '—';
document.getElementById('sc-f-loc').textContent = pi.sensor_location || '—';
document.getElementById('sc-f-bw').textContent = bw.filename || '—';
// Filename rendered as a clickable download link for the original BW
// binary. Same endpoint the live-device viewer uses for stored events
// (/db/events/{id}/blastware_file).
const bwCell = document.getElementById('sc-f-bw');
bwCell.innerHTML = '';
if (bw.filename && _scCurrentEventId) {
const a = document.createElement('a');
a.href = `${api()}/db/events/${_scCurrentEventId}/blastware_file`;
a.textContent = bw.filename;
a.download = bw.filename;
a.title = 'Download original BW event binary';
a.style.color = 'var(--accent, #58a6ff)';
a.style.textDecoration = 'underline';
bwCell.appendChild(a);
} else {
bwCell.textContent = '—';
}
document.getElementById('sc-f-bwsize').textContent = bw.filesize != null ? `${bw.filesize} bytes` : '—';
document.getElementById('sc-f-sha').textContent = bw.sha256 || '—';
document.getElementById('sc-f-src').textContent = src.kind || '—';
document.getElementById('sc-f-cap').textContent = src.captured_at || '—';
// Source kind + a download link for the preserved BW ASCII report
// (.TXT), when available. Only events ingested after 2026-05-27
// have the .TXT preserved; older events show "—".
const srcCell = document.getElementById('sc-f-src');
srcCell.innerHTML = '';
srcCell.appendChild(document.createTextNode(src.kind || '—'));
if (src.txt_filename && _scCurrentEventId) {
const a = document.createElement('a');
a.href = `${api()}/db/events/${_scCurrentEventId}/ascii_report.txt`;
a.textContent = ' (download .TXT)';
a.download = src.txt_filename;
a.title = 'Download preserved BW ASCII report';
a.style.color = 'var(--accent, #58a6ff)';
a.style.marginLeft = '8px';
a.style.fontSize = '11px';
srcCell.appendChild(a);
}
// captured_at has a "Z" suffix (UTC); _fmtTs converts to browser local
// — matches the BW-reported recorded-at, no more "21:59:57 vs it's 6 PM"
// confusion from operators reading the raw UTC value.
document.getElementById('sc-f-cap').textContent = _fmtTs(src.captured_at);
document.getElementById('sc-edit-ft').checked = !!rev.false_trigger;
document.getElementById('sc-edit-reviewer').value = rev.reviewer || '';
@@ -2512,6 +2995,19 @@ function closeSidecarModal() {
document.getElementById('sc-overlay').classList.remove('visible');
_scCurrentEventId = null;
_scCurrentSidecar = null;
_destroyScCharts();
}
// Trigger a PDF download for the currently-open event. The browser
// handles the actual save dialog from the Content-Disposition header
// the server sends.
function downloadEventReport() {
if (!_scCurrentEventId) return;
const url = `${api()}/db/events/${_scCurrentEventId}/report.pdf`;
// Open in a new tab — browser prompts to save or displays inline,
// and a failed fetch (e.g. 404 for events with no waveform) shows
// its JSON error in-page rather than silently failing.
window.open(url, '_blank');
}
function onSidecarOverlayClick(e) {
@@ -2722,6 +3218,16 @@ document.addEventListener('keydown', e => {
// hit localhost:8200, 10.0.0.44:8200, or anything else.
document.getElementById('api-base').value = window.location.origin;
// Reflect any persisted mic-unit preference in the header pill on load
_refreshMicUnitToggleLabel();
// We default to Database view → trigger initial history + units load
// (switchSection handles this when clicked, but we never click on first paint).
if (currentSection === 'db') {
if (!histLoaded) loadHistory();
if (!unitsLoaded) loadUnits();
}
// Press Enter in any live connect field to connect
['dev-host','dev-port'].forEach(id => {
document.getElementById(id)?.addEventListener('keydown', e => { if (e.key === 'Enter') connectUnit(); });
@@ -2738,11 +3244,18 @@ document.getElementById('api-base').value = window.location.origin;
<button class="sc-close" onclick="closeSidecarModal()">×</button>
</div>
<div class="sc-body">
<!-- Waveform plot — 4 channels stacked (MicL, Long, Vert, Tran) — -->
<div class="sc-section" id="sc-section-waveform">
<h4>Waveform</h4>
<div id="sc-waveform-status" style="color:var(--text-dim);font-size:11px;margin-bottom:6px">Loading…</div>
<div id="sc-waveform-charts" style="display:flex;flex-direction:column;gap:6px"></div>
</div>
<div class="sc-section">
<h4>Event</h4>
<dl class="sc-grid">
<dt>Serial</dt> <dd id="sc-f-serial"></dd>
<dt>Timestamp</dt> <dd id="sc-f-ts"></dd>
<dt title="When the seismograph recorded this event (from the BW report's Event Time field)">Recorded at</dt>
<dd id="sc-f-ts"></dd>
<dt>Record type</dt> <dd id="sc-f-rt"></dd>
<dt>Sample rate</dt> <dd id="sc-f-sr"></dd>
<dt>Waveform key</dt> <dd id="sc-f-key"></dd>
@@ -2774,7 +3287,8 @@ document.getElementById('api-base').value = window.location.origin;
<dt id="sc-l-bwsize">File size</dt> <dd id="sc-f-bwsize"></dd>
<dt id="sc-l-sha">File sha256</dt> <dd id="sc-f-sha"></dd>
<dt>Source kind</dt> <dd id="sc-f-src"></dd>
<dt>Captured at</dt> <dd id="sc-f-cap"></dd>
<dt title="When SFM received and stored this event, NOT the unit-local trigger time (see 'Recorded at' at the top of the modal for that).">Time received</dt>
<dd id="sc-f-cap"></dd>
</dl>
</div>
<div class="sc-section">
@@ -2797,6 +3311,10 @@ document.getElementById('api-base').value = window.location.origin;
</div>
<div class="sc-footer">
<span class="sc-status" id="sc-status"></span>
<button class="btn btn-ghost" id="sc-pdf-btn" onclick="downloadEventReport()"
title="Download an Instantel-style Event Report PDF for this event">
Download PDF
</button>
<button class="btn btn-ghost" onclick="closeSidecarModal()">Cancel</button>
<button class="btn" id="sc-save-btn" onclick="saveSidecarReview()">Save</button>
</div>
+231 -23
@@ -108,11 +108,30 @@ class WaveformStore:
"""Return absolute path to the .h5 clean-waveform file for a given event."""
return self._serial_dir(serial) / f"{filename}.h5"
def txt_path_for(self, serial: str, filename: str) -> Path:
"""Return absolute path to the preserved BW ASCII report (.TXT)
for a given event.
We name it ``<filename>_ASCII.TXT`` to match BW's own filename
convention in the ACH folder. Saved at ingest time alongside
the binary so the parser bug fixes can be applied retroactively
by re-parsing without needing to re-forward from the watcher PC.
"""
return self._serial_dir(serial) / f"{filename}_ASCII.TXT"
def open_blastware(self, serial: str, filename: str) -> Optional[Path]:
"""Return absolute path to an existing event file or None."""
bw_path, _ = self.paths_for(serial, filename)
return bw_path if bw_path.exists() else None
def open_txt(self, serial: str, filename: str) -> Optional[Path]:
"""Return absolute path to the preserved BW ASCII report for an
event, or None if the .TXT wasn't saved at ingest time (events
ingested before .TXT preservation landed will show None until
re-forwarded)."""
p = self.txt_path_for(serial, filename)
return p if p.exists() else None
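The `.TXT` naming convention used by `txt_path_for` and `open_txt` can be tried in isolation. This is a minimal sketch, assuming a flat `<root>/<serial>/` layout; `ascii_report_path` and `open_ascii_report` are hypothetical stand-ins, not the `WaveformStore` methods themselves.

```python
from pathlib import Path
from typing import Optional


def ascii_report_path(root: Path, serial: str, filename: str) -> Path:
    """Hypothetical stand-in for WaveformStore.txt_path_for().

    Appends the _ASCII.TXT suffix to the full event filename,
    extension included, and places it in the per-serial directory.
    """
    return root / serial / f"{filename}_ASCII.TXT"


def open_ascii_report(root: Path, serial: str, filename: str) -> Optional[Path]:
    """Return the report path only if the file exists on disk, else None."""
    p = ascii_report_path(root, serial, filename)
    return p if p.exists() else None
```

Returning `None` for a missing file lets callers treat events ingested before `.TXT` preservation the same way as events that never had a report.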
# ── save / load ─────────────────────────────────────────────────────────────
def save(
@@ -357,6 +376,28 @@ class WaveformStore:
filesize = bw_path.stat().st_size
sha256 = event_file_io.file_sha256(bw_path)
# 1b. preserve the raw BW ASCII report (.TXT) alongside the binary.
# Saved at <root>/<serial>/<filename>_ASCII.TXT. Lets us re-parse
# offline after parser fixes without needing to re-forward from
# the watcher PC. Negligible storage cost (~15 KB per event).
# Skipped silently when no report was supplied (live download path,
# manual upload without paired TXT).
txt_filename: Optional[str] = None
if bw_report_text is not None:
try:
txt_path = self.txt_path_for(serial, filename)
if isinstance(bw_report_text, bytes):
txt_path.write_bytes(bw_report_text)
else:
txt_path.write_text(bw_report_text)
txt_filename = txt_path.name
except Exception as exc:
log.warning(
"save_imported_bw: failed to save TXT for %s: %s; "
"continuing without it",
filename, exc,
)
# 2. write the .h5 clean-waveform file from the parsed Event.
# Note: peaks here are computed from raw samples (the BW file
# doesn't carry the device-authoritative 0C peaks). Best-effort.
@@ -393,6 +434,7 @@ class WaveformStore:
blastware_sha256=sha256,
source_kind="bw-import",
a5_pickle_filename=None,
txt_filename=txt_filename,
review=existing_review,
bw_report=bw_report,
)
@@ -425,21 +467,21 @@ class WaveformStore:
Ingest a Thor (Micromate Series IV) IDF event file (`.IDFW` or
`.IDFH`) produced by Thor's TXT exporter.
Thor binaries are stored as opaque bytes; seismo-relay doesn't
yet decode the proprietary IDF binary format (codec slot lives
at ``micromate/idf_file.py``). Device-authoritative metadata
comes from the paired ``.IDFW.txt`` / ``.IDFH.txt`` sidecar
when supplied.
Workflow:
1. Parse the paired TXT report (when supplied) via
``micromate.parse_idf_report`` into a dict.
2. Wrap parsed dict + filename into a typed ``micromate.IdfEvent``.
3. Copy bytes verbatim into ``<root>/<serial>/<filename>``.
4. Bridge IdfEvent to ``minimateplus.Event`` (for the existing
sidecar / DB insert machinery) via
``IdfEvent.to_minimateplus_event(waveform_key)``.
5. Write the ``.sfm.json`` sidecar with
1. For sig-A `.IDFW` binaries, decode samples + binary metadata
via ``micromate.idf_file.read_idf_file()``. Failure or
non-IDFW path falls through to the .txt-only flow.
2. Parse the paired TXT report (when supplied) via
``micromate.parse_idf_report`` into a dict. TXT remains the
source of truth for fields the binary doesn't yet supply
(full peak set with ZC freq / Time of Peak, sensor self-check,
firmware string, project strings).
3. Wrap parsed dict + filename into a typed ``micromate.IdfEvent``.
4. Copy bytes verbatim into ``<root>/<serial>/<filename>``.
5. Bridge IdfEvent to ``minimateplus.Event`` and attach
``raw_samples`` from the binary decoder (when available).
6. Write the `.h5` clean-waveform file when samples decoded.
7. Write the ``.sfm.json`` sidecar with
``source.kind = "idf-import"`` and the full raw IDF report
under ``extensions.idf_report``.
@@ -448,7 +490,38 @@ class WaveformStore:
"""
from micromate import IdfEvent, parse_idf_report
# Parse the .txt sidecar (best-effort; non-fatal on failure).
# 1. Binary decode (sig-A IDFW and IDFH). Non-fatal: any failure
# leaves samples / binary metadata unfilled and we proceed with
# the .txt path as before.
idf_samples: Optional[dict] = None
idf_intervals: Optional[list] = None
binary_md = None
binary_peaks = None
is_histogram = False
try:
from micromate.idf_file import read_idf_file
# Pass idf_bytes through `data=` — at this point in the flow
# the binary hasn't been written to disk yet, so the codec
# can't read from source_path. We still pass source_path so
# the codec has the filename for error messages + .IDFH
# suffix detection.
res = read_idf_file(source_path, data=idf_bytes)
idf_samples = res.samples or None
idf_intervals = res.intervals
is_histogram = res.intervals is not None
binary_md = res.binary_metadata
binary_peaks = res.event.peaks
except NotImplementedError:
# sig-B — codec doesn't handle this yet.
pass
except Exception as exc:
log.warning(
"save_imported_idf: binary codec failed for %s: %s; "
"falling back to .txt-only ingest",
source_path.name, exc,
)
# 2. Parse the .txt sidecar (best-effort; non-fatal on failure).
report_dict: dict = {}
if idf_report_text is not None:
try:
@@ -459,17 +532,58 @@ class WaveformStore:
exc,
)
# Build the typed IdfEvent. Filename is authoritative for
# 3. Backfill report_dict with binary metadata for fields the
# .txt didn't supply. Each check below only fills a key the .txt
# left empty; a value already present from the .txt is never
# overwritten, even for timestamp and sample_rate.
if binary_md is not None:
if binary_md.serial and not report_dict.get("serial_number"):
report_dict["serial_number"] = binary_md.serial
if binary_md.event_datetime and not report_dict.get("event_datetime"):
report_dict["event_datetime"] = binary_md.event_datetime
if binary_md.sample_rate and not report_dict.get("sample_rate"):
report_dict["sample_rate"] = binary_md.sample_rate
if binary_md.record_time_sec and not report_dict.get("record_time_sec"):
report_dict["record_time_sec"] = binary_md.record_time_sec
# Calibration date (binary) vs calibration text (.txt) cohabit
# under different keys; no overwrite needed.
if binary_md.event_datetime and not report_dict.get("event_type"):
report_dict["event_type"] = (
"Full Histogram" if is_histogram else "Full Waveform"
)
# Binary-derived peaks fill in when the .txt didn't supply them.
# They're ~3% low vs the device-authoritative .txt values (residual
# codec drift), so .txt always wins when present.
if binary_peaks is not None:
if binary_peaks.transverse_ips and not report_dict.get("tran_ppv"):
report_dict["tran_ppv"] = binary_peaks.transverse_ips
if binary_peaks.vertical_ips and not report_dict.get("vert_ppv"):
report_dict["vert_ppv"] = binary_peaks.vertical_ips
if binary_peaks.longitudinal_ips and not report_dict.get("long_ppv"):
report_dict["long_ppv"] = binary_peaks.longitudinal_ips
# 4. Build the typed IdfEvent. Filename is authoritative for
# (serial, timestamp, kind); the report's event_datetime takes
# precedence over the filename timestamp inside from_report().
idf_event = IdfEvent.from_report(report_dict, source_path.name)
# The binary mic peak (psi) isn't carried through from_report() —
# IdfReport.from_dict only sees the .txt's dB(L) value. Pull the
# binary-derived ``mic_pspl_psi`` onto the typed IdfEvent so the
# downstream bridge can populate ``PeakValues.micl`` (psi-shaped)
# and the h5 writer's per-count mic factor lands at a sensible
# value. Without this, the h5 mic chart auto-scales against the
# dB(L) value-as-pseudo-psi and renders ~flat.
if binary_peaks is not None and binary_peaks.mic_pspl_psi is not None:
idf_event.peaks.mic_pspl_psi = binary_peaks.mic_pspl_psi
# Operator-supplied serial_hint wins over the binary's filename
# prefix when both are present (e.g. callers passing a known-good
# serial that overrides a misnamed export).
serial = serial_hint or idf_event.serial or "UNKNOWN"
# Filesystem write.
# 5. Filesystem write of binary bytes.
filename = source_path.name
bw_path = self._serial_dir(serial) / filename
bw_path.write_bytes(idf_bytes)
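The backfill steps above all follow one pattern: a binary-derived value is copied in only when the `.txt` report left that key empty. A minimal sketch of that pattern, assuming plain dicts on both sides; `fill_missing` is a hypothetical helper and the sample values are made up for illustration.

```python
from typing import Any, Dict, Mapping


def fill_missing(report: Dict[str, Any], fallback: Mapping[str, Any]) -> Dict[str, Any]:
    """Copy fallback values into report only where report has no truthy value.

    Mirrors the `if binary_value and not report_dict.get(key)` checks:
    an existing .txt value always wins, and falsy fallbacks are skipped.
    """
    for key, value in fallback.items():
        if value and not report.get(key):
            report[key] = value
    return report


# Illustrative values only (not taken from a real event).
report = {"serial_number": "UM6047", "tran_ppv": 0.125}
binary = {"serial_number": "UM0000", "sample_rate": 1024, "tran_ppv": 0.121}
fill_missing(report, binary)
# serial_number and tran_ppv keep the .txt values; sample_rate is filled in.
```

One consequence of testing truthiness rather than `is None`: a legitimate zero in the report (for example a PPV of 0.0) counts as missing and would be replaced by a non-zero binary value.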
@@ -481,13 +595,59 @@ class WaveformStore:
# surrogate — every distinct binary maps to a distinct row.
waveform_key = bytes.fromhex(sha256)[:16]
# Bridge to minimateplus.Event for the existing sidecar / DB
# 6. Bridge to minimateplus.Event for the existing sidecar / DB
# insert paths. See IdfEvent.to_minimateplus_event() for the
# caveats of this bridge (mic units, missing fields → sidecar).
ev = idf_event.to_minimateplus_event(waveform_key)
# Write the sidecar. Source kind "idf-import" was added to the
# allow-list in event_file_io.event_to_sidecar_dict for this.
# Attach the decoded sample arrays. Thor's decoder counts use
# LSB = 0.0003 in/s for geo (vs BW's 16-count units at 0.005 in/s)
# — the .h5 writer's geo_range="normal" yields LSB = 10/32768
# ≈ 0.000305 in/s, so plotted samples come out ~1.7% high.
# Acceptable known offset; refine with a Thor-aware h5 path later.
if idf_samples is not None:
ev.raw_samples = idf_samples
n_samples = max((len(idf_samples.get(ch, [])) for ch in ("Tran", "Vert", "Long", "MicL")), default=0)
ev.total_samples = ev.total_samples or n_samples
# For IDFH histograms there are no per-sample waveform arrays — the
# device stores one peak ADC count per interval per channel. Synthesise
# a 1-sample-per-interval array so the existing h5+renderer pipeline
# (which groups samples down to ``n_intervals`` bars via max-per-group)
# produces a non-blank histogram chart. Each "sample" is the peak ADC
# count for that interval, so the h5 writer's ``count × geo_fs/32768``
# conversion yields the right physical value for the bar height.
if is_histogram and idf_intervals:
hist_samples = {
"Tran": [iv.peak_count("Tran") for iv in idf_intervals],
"Vert": [iv.peak_count("Vert") for iv in idf_intervals],
"Long": [iv.peak_count("Long") for iv in idf_intervals],
"MicL": [iv.peak_count("MicL") for iv in idf_intervals],
}
ev.raw_samples = hist_samples
ev.total_samples = ev.total_samples or len(idf_intervals)
# 7. Write the .h5 clean-waveform file when we have samples to write
# (either the IDFW per-sample stream, or the IDFH synthesised per-
# interval peak array). The renderer treats both shapes the same way.
hdf5_filename: Optional[str] = None
if ev.raw_samples:
hdf5_path = self.hdf5_path_for(serial, filename)
try:
event_hdf5.write_event_hdf5(
hdf5_path, ev,
serial=serial,
geo_range="normal", # Thor's geo full scale is also 10 in/s (Normal)
source_kind="idf-import",
)
hdf5_filename = hdf5_path.name
except Exception as exc:
log.warning(
"save_imported_idf: HDF5 write failed for %s: %s — continuing without .h5",
hdf5_path, exc,
)
# 8. Write the sidecar. Source kind "idf-import" is on the allow-list.
sidecar_path = self.sidecar_path_for(serial, filename)
existing_review = None
if sidecar_path.exists():
@@ -512,19 +672,67 @@ class WaveformStore:
# Time of Peak, sensor self-check, calibration, firmware).
if report_dict:
sidecar["extensions"]["idf_report"] = report_dict
# Project the IDF report into the BW report sidecar shape so the
# existing Event Report PDF pipeline (sfm/report_pdf.py) can
# render Thor events without needing a separate code path. Thor
# data is 95% the same metric set as BW — the adapter handles
# the field-name mapping.
if report_dict or binary_md is not None:
try:
from micromate.idf_to_bw_report import build_bw_report_from_idf
sidecar["bw_report"] = build_bw_report_from_idf(
report_dict or {},
binary_md=binary_md,
intervals=idf_intervals,
is_histogram=is_histogram,
)
except Exception as exc:
log.warning(
"save_imported_idf: idf→bw_report adapter failed for %s: %s; "
"report PDF will fall back to DB-only fields",
filename, exc,
)
# For histograms, also stash the binary-decoded per-interval
# records so the UI / report layer doesn't need to re-walk the
# IDFH file at render time.
if idf_intervals is not None:
sidecar["extensions"]["idf_intervals"] = [
{
"offset": iv.offset,
"tran_peak": iv.peak_count("Tran"),
"tran_halfp": iv.tran_halfp,
"tran_freq": iv.freq_hz("Tran"),
"vert_peak": iv.peak_count("Vert"),
"vert_halfp": iv.vert_halfp,
"vert_freq": iv.freq_hz("Vert"),
"long_peak": iv.peak_count("Long"),
"long_halfp": iv.long_halfp,
"long_freq": iv.freq_hz("Long"),
"mic_peak": iv.peak_count("MicL"),
"mic_halfp": iv.micl_halfp,
"mic_freq": iv.freq_hz("MicL"),
}
for iv in idf_intervals
]
event_file_io.write_sidecar(sidecar_path, sidecar)
log.info(
"WaveformStore.save_imported_idf serial=%s filename=%s filesize=%d "
"report_attached=%s",
serial, filename, filesize, bool(report_dict),
"kind=%s report_attached=%s binary_decoded=%s h5=%s intervals=%d",
serial, filename, filesize,
"histogram" if is_histogram else "waveform",
bool(report_dict),
(idf_samples is not None) or (idf_intervals is not None),
hdf5_filename or "(skipped)",
len(idf_intervals) if idf_intervals else 0,
)
return ev, {
"filename": filename,
"filesize": filesize,
"sha256": sha256,
"a5_pickle_filename": None,
"hdf5_filename": None,
"hdf5_filename": hdf5_filename,
"sidecar_filename": sidecar_path.name,
"serial": serial,
}
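The "~1.7% high" figure quoted in the comment on attaching decoded samples follows from the two LSB values given there. A short worked check, assuming only the numbers stated in that comment (10 in/s full scale over 32768 counts for the `.h5` writer, 0.0003 in/s per count for Thor); the constant and function names are illustrative.

```python
FULL_SCALE_IPS = 10.0               # geo_range="normal" full scale, per the comment
H5_LSB = FULL_SCALE_IPS / 32768     # in/s per count applied by the .h5 writer
THOR_LSB = 0.0003                   # in/s per count quoted for Thor's decoder


def counts_to_ips(count: int, lsb: float = H5_LSB) -> float:
    """Convert a signed ADC count to in/s with the given LSB."""
    return count * lsb


# Relative overshoot when Thor counts are scaled with the .h5 writer's LSB.
overshoot_pct = (H5_LSB / THOR_LSB - 1.0) * 100.0
print(f"h5 LSB = {H5_LSB:.6f} in/s, overshoot = {overshoot_pct:.2f}%")
```

The same `count × geo_fs/32768` conversion applies to the synthesised IDFH arrays, where each "sample" is one interval's peak count.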
+92
@@ -385,6 +385,98 @@ def test_user_notes_extra_lines_beyond_four_are_dropped():
assert "L5" not in r.user_note_labels.values()
def test_oorange_marker_treated_as_saturation():
"""BW writes 'OORANGE' (Out Of Range — truncated) when a channel
exceeds its full-scale. Verify ppv_ips falls back to geo_range_ips
+ saturated flag is set, mirroring the real T190LD5Q.LK0W,
T438L713.RY0W, and K557L3YM.OE0W events from prod 2026-05-27.
"""
txt = """\
"Event Type : Full Waveform"
"Serial Number : BE18190"
"Geo Range : 10.000 in/s"
"Tran PPV : 2.140 in/s"
"Vert PPV : OORANGE in/s"
"Long PPV : 2.830 in/s"
"Peak Vector Sum : OORANGE in/s"
"Peak Vector Sum TimeSum : 0.007 s"
"MicL PSPL : OORANGE "
"""
r = parse_report(txt)
# Tran/Long parse normally
assert r.channels["Tran"].ppv_ips == 2.14
assert r.channels["Tran"].ppv_saturated is False
assert r.channels["Long"].ppv_ips == 2.83
# Vert saturated → range max + flag
assert r.channels["Vert"].ppv_ips == 10.0
assert r.channels["Vert"].ppv_saturated is True
# PVS saturated → sqrt(3) * range_max as upper bound + flag
import math
assert r.peak_vector_sum_ips == pytest.approx(math.sqrt(3) * 10.0)
assert r.peak_vector_sum_saturated is True
# Mic saturated → 140 dBL conservative upper bound + flag
assert r.mic.pspl_dbl == 140.0
assert r.mic.pspl_saturated is True
# PVS time still parses despite the BW typo'd label "TimeSum"
assert r.peak_vector_sum_time_s == pytest.approx(0.007)
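The saturation fallback this test exercises can be sketched on its own. This is a hypothetical illustration of the behaviour the assertions describe, not the parser's actual code; `parse_ppv` and `pvs_upper_bound` are made-up names.

```python
import math
from typing import Optional, Tuple


def parse_ppv(raw: str, geo_range_ips: float) -> Tuple[Optional[float], bool]:
    """Return (ppv_ips, saturated) for a field value such as '2.140 in/s'.

    An 'OORANGE' token falls back to the configured geo range and
    sets the saturated flag; anything unparseable yields (None, False).
    """
    parts = raw.strip().split()
    token = parts[0] if parts else ""
    if token == "OORANGE":
        return geo_range_ips, True
    try:
        return float(token), False
    except ValueError:
        return None, False


def pvs_upper_bound(geo_range_ips: float) -> float:
    """Upper bound on the vector sum: all three axes at range max at once."""
    return math.sqrt(3) * geo_range_ips
```

The `sqrt(3)` factor comes from `sqrt(R² + R² + R²)` for three orthogonal channels each pinned at the range maximum `R`, so it is a ceiling rather than a measurement.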
def test_real_oorange_event_t190_parses():
"""End-to-end against the real T190LD5Q.LK0W ASCII file pulled from
a Windows watcher PC on 2026-05-27. This is the canonical example
of the parser-PPV-miss bug we fixed in this iteration."""
fixture_path = (
Path(__file__).parent.parent / "example-events" /
"ascii-5-27-26" / "T190LD5Q_LK0W_ASCII.TXT"
)
if not fixture_path.exists():
pytest.skip("real ASCII fixture not present (local-only)")
r = parse_report_file(fixture_path)
assert r.serial == "BE18190"
assert r.geo_range_ips == 10.0
# Tran reads cleanly, Vert was OORANGE
assert r.channels["Tran"].ppv_ips == pytest.approx(2.14)
assert r.channels["Vert"].ppv_ips == 10.0
assert r.channels["Vert"].ppv_saturated is True
assert r.channels["Long"].ppv_ips == pytest.approx(2.83)
assert r.peak_vector_sum_saturated is True
assert r.peak_vector_sum_time_s == pytest.approx(0.007)
# Same fixture: Tran ZC Freq is ">100 Hz" — must parse as 100 +
# above_range flag, not None (which would render as "—" on the PDF).
assert r.channels["Tran"].zc_freq_hz == 100.0
assert r.channels["Tran"].zc_freq_above_range is True
# Vert/Long are normal numeric values; flag stays False.
assert r.channels["Vert"].zc_freq_above_range is False
assert r.channels["Long"].zc_freq_above_range is False
def test_above_range_marker_treated_as_zc_threshold():
"""BW writes '>100 Hz' for ZC Freq when the zero-crossing algorithm
sees a peak too fast to count (cuts off at the device's 100 Hz
reporting ceiling). Parser must store the threshold + flag, not
fall back to None.
"""
txt = """\
"Event Type : Full Waveform"
"Serial Number : BE18190"
"Tran ZC Freq : >100 Hz"
"Vert ZC Freq : 73 Hz"
"Long ZC Freq : N/A Hz"
"MicL ZC Freq : >100 Hz"
"""
r = parse_report(txt)
assert r.channels["Tran"].zc_freq_hz == 100.0
assert r.channels["Tran"].zc_freq_above_range is True
assert r.channels["Vert"].zc_freq_hz == 73.0
assert r.channels["Vert"].zc_freq_above_range is False
# N/A → None, flag stays False
assert r.channels["Long"].zc_freq_hz is None
assert r.channels["Long"].zc_freq_above_range is False
# Mic above-range
assert r.mic.zc_freq_hz == 100.0
assert r.mic.zc_freq_above_range is True
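The three cases covered here (`>100`, a plain number, `N/A`) map onto a small value-plus-flag parser. A minimal sketch under that assumption; `parse_zc_freq` is a hypothetical name and the real parser may be structured differently.

```python
from typing import Optional, Tuple


def parse_zc_freq(raw: str) -> Tuple[Optional[float], bool]:
    """Return (zc_freq_hz, above_range) for a field value such as '>100 Hz'.

    A leading '>' keeps the numeric threshold and sets the flag.
    Non-numeric tokens such as 'N/A' give (None, False).
    """
    parts = raw.strip().split()
    if not parts:
        return None, False
    token = parts[0]
    above = token.startswith(">")
    try:
        return float(token.lstrip(">")), above
    except ValueError:
        return None, False
```

Keeping the threshold as a number, instead of `None`, is what lets the UI and PDF render `>100 Hz` rather than a blank placeholder.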
def test_real_histogram_fixture_populates_sensor_location():
"""End-to-end: the histogram fixture uses 'Seis. Location:' — must
successfully populate sensor_location via position-based parsing."""
+71
@@ -529,6 +529,77 @@ def test_save_imported_bw_round_trip(tmp_path: Path):
assert stored_path.read_bytes() == src.read_bytes()
# ── apply_bw_report_dict_to_event ────────────────────────────────────────────
def test_apply_bw_report_dict_overlays_peaks_and_recording():
"""Verbatim mirror of the data shape produced by `_bw_report_to_dict`
when projecting a parsed `BwAsciiReport` into the sidecar. Confirms
each field overlays onto Event correctly so the backfill path
matches ingest behavior."""
from minimateplus.models import PeakValues
ev = Event(index=0)
bw_report = {
"peaks": {
"tran": {"ppv_ips": 9.84375},
"vert": {"ppv_ips": 0.305},
"long": {"ppv_ips": 0.405},
"vector_sum": {"ips": 14.86736},
},
"mic": {"pspl_dbl": 115.9},
"recording": {"sample_rate_sps": 1024, "record_time_s": 3.0},
}
event_file_io.apply_bw_report_dict_to_event(ev, bw_report)
assert ev.peak_values is not None
assert ev.peak_values.tran == 9.84375
assert ev.peak_values.vert == 0.305
assert ev.peak_values.long == 0.405
assert ev.peak_values.peak_vector_sum == 14.86736
# MicL is converted dB → psi via _dbl_to_psi — just confirm non-zero
assert ev.peak_values.micl is not None and ev.peak_values.micl > 0
assert ev.sample_rate == 1024
assert ev.rectime_seconds == 3.0
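The test above only checks that the mic value is non-zero after the dB to psi conversion. For orientation, here is the textbook sound-pressure conversion, assuming the standard 20 µPa reference; the source does not show `_dbl_to_psi`, so whether it uses exactly this formula is an assumption.

```python
P_REF_PA = 20e-6                 # standard reference pressure for dB SPL, in pascals
PA_PER_PSI = 6894.757293168      # pascals per pound-force per square inch


def dbl_to_psi(db: float) -> float:
    """Convert a linear-weighted sound pressure level in dB to psi.

    Textbook formula: p = p_ref * 10 ** (dB / 20), then Pa -> psi.
    """
    return P_REF_PA * 10 ** (db / 20.0) / PA_PER_PSI


print(f"115.9 dB(L) = {dbl_to_psi(115.9):.3e} psi")
```

Because the scale is logarithmic, treating a dB(L) figure directly as psi overstates the pressure by several orders of magnitude, which is the failure mode the backfill commit describes.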
def test_apply_bw_report_dict_overwrites_codec_peaks():
"""The whole point of this helper: bw_report wins over whatever the
codec produced. This is what the 2026-05-22 prod backfill missed:
DB peaks got overwritten with codec output (incl. PVS=0 on the
three top events) when they should have stayed bw_report-overlaid."""
from minimateplus.models import PeakValues
ev = Event(index=0)
# Simulate codec output that's clearly wrong (incomplete decode):
ev.peak_values = PeakValues(
tran=2.09, vert=0.0, long=0.0, peak_vector_sum=0.0,
)
bw_report = {
"peaks": {
"tran": {"ppv_ips": 9.84},
"vert": {"ppv_ips": 4.95},
"long": {"ppv_ips": 8.05},
"vector_sum": {"ips": 14.95},
},
}
event_file_io.apply_bw_report_dict_to_event(ev, bw_report)
assert ev.peak_values.tran == 9.84
assert ev.peak_values.vert == 4.95
assert ev.peak_values.long == 8.05
assert ev.peak_values.peak_vector_sum == 14.95
def test_apply_bw_report_dict_no_op_on_empty():
"""None / empty dict / missing keys should leave Event untouched."""
from minimateplus.models import PeakValues
for empty in (None, {}, {"peaks": {}}, {"peaks": {"tran": {}}}):
ev = Event(index=0)
ev.peak_values = PeakValues(tran=1.0, vert=2.0, long=3.0)
event_file_io.apply_bw_report_dict_to_event(ev, empty)
# Unchanged
assert ev.peak_values.tran == 1.0
assert ev.peak_values.vert == 2.0
assert ev.peak_values.long == 3.0
if __name__ == "__main__":
if pytest is not None:
pytest.main([__file__, "-v"])
+48
@@ -335,3 +335,51 @@ def test_geo_count_to_ins_scale():
assert geo_count_to_ins(1) == pytest.approx(0.005)
assert geo_count_to_ins(10) == pytest.approx(0.050)
assert geo_count_to_ins(0) == 0.0
# ── Regression: peak is uint8 byte[N], NOT uint16 LE byte[N:N+2] ────────────
#
# Block taken verbatim from K558LKZU.RE0H (BE9558) interval 12 — a real
# field event where the Tran channel had developed a DC offset and was
# producing sub-Hz drift content the device couldn't characterize.
# The annotation byte at [7] = 0xd2 is non-zero in that case. The
# legacy codec read [6:8] as uint16 LE, producing T_peak = 53763 →
# 268 in/s: physically impossible and roughly 18,000× too high for the
# actual 0.015 in/s value (T_lo = 3 alone gives the correct count).
# Verified against the paired BW ASCII export.
_K558_INTERVAL_12_BLOCK = bytes.fromhex(
"00 00 0c 01 0a 00 03 d2 45 00 02 00 02 00 02 00"
"02 00 10 00 06 00 00 00 0e 91 2f 00 1e 0a 00 00".replace(" ", "")
)
def test_extension_byte_does_not_inflate_peak():
"""The annotation byte at [7]/[11]/[15]/[19] must NOT contribute to
the peak count. Decoded T_peak must be 3 (uint8 byte[6]), NOT
53763 (uint16 LE byte[6:8])."""
body = _K558_INTERVAL_12_BLOCK
records = decode_histogram_body_full(body)
assert records is not None
assert len(records) == 1
r = records[0]
assert r["t_peak"] == 3, f"T_peak should be 3 (uint8), got {r['t_peak']}"
assert r["v_peak"] == 2
assert r["l_peak"] == 2
assert r["m_peak"] == 16
# Half-periods unchanged — still uint16 LE.
assert r["t_halfp"] == 0x0045 # 69 → 7.4 Hz
assert r["m_halfp"] == 6 # → 85.3 Hz
# Annotation byte is preserved (for future RE) but does not affect peak.
assert r["annotations"] == (0xd2, 0x00, 0x00, 0x00)
def test_extension_byte_decoded_to_correct_in_s():
"""End-to-end: the channel-grouped output for the K558 ext block
should give T = 3 counts = 0.015 in/s, not 53763 counts = 268 in/s."""
channels = decode_histogram_body(_K558_INTERVAL_12_BLOCK)
assert channels is not None
assert channels["Tran"] == [3]
assert geo_count_to_ins(channels["Tran"][0]) == pytest.approx(0.015)
assert channels["Vert"] == [2]
assert channels["Long"] == [2]
assert channels["MicL"] == [16]
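The uint8 versus uint16 difference these regression tests pin down can be reproduced directly on the K558 block. This is a sketch under assumptions: the per-channel offsets (6, 10, 14, 18) and the peak / annotation / half-period layout are inferred from the test assertions, the frequency formula assumes a 1024 sps sample rate (consistent with the 7.4 Hz and 85.3 Hz annotations), and `decode_channel` is a hypothetical helper, not the project's codec.

```python
import struct

BLOCK = bytes.fromhex(
    "00 00 0c 01 0a 00 03 d2 45 00 02 00 02 00 02 00"
    "02 00 10 00 06 00 00 00 0e 91 2f 00 1e 0a 00 00"
)

# Assumed byte offset of each channel's 4-byte group within the block.
CHANNEL_OFFSETS = {"Tran": 6, "Vert": 10, "Long": 14, "MicL": 18}


def decode_channel(block: bytes, offset: int):
    """Return (peak, annotation, half_period) for one channel group."""
    peak = block[offset]                 # uint8 peak count
    annotation = block[offset + 1]       # kept, but not part of the peak
    (halfp,) = struct.unpack_from("<H", block, offset + 2)  # uint16 LE
    return peak, annotation, halfp


def halfp_to_hz(halfp: int, sample_rate: int = 1024) -> float:
    """Assumed relation: one full cycle spans two half-periods."""
    return sample_rate / (2 * halfp)


tran = decode_channel(BLOCK, CHANNEL_OFFSETS["Tran"])
(legacy_peak,) = struct.unpack_from("<H", BLOCK, 6)  # the old, wrong read
print(tran, legacy_peak, round(halfp_to_hz(tran[2]), 1))
```

Reading two bytes folds the `0xd2` annotation into the high byte of the count, which is where the 53763 figure comes from.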