From d21e3b52986c6f7dae04dbc314d8f63f74edad72 Mon Sep 17 00:00:00 2001
From: serversdown
Date: Wed, 27 May 2026 20:23:05 +0000
Subject: [PATCH] histogram aggregation + parser extension for BW interval fields
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Three layered changes that together make histogram charts visually
match BW's printout (one bar per interval, not per codec block):

1. bw_ascii_report parser captures histogram fields it previously
   dropped:
   - Histogram Start/Stop Time + Date → datetime
   - Number of Intervals + Interval Size (string + parsed seconds)
   - Peak Time + Peak Date → datetime (per-channel)
   - Peak Vector Sum Date (combined with PVS Time → datetime; clears
     the bogus seconds parse that interpreted "22:33:52" as 22.0)
   New _parse_iso_date() handles BW's ISO format for histograms
   (waveforms use "May 8, 2026" long form). New _parse_interval_size()
   handles "1 minute" / "5 minutes" / "15 seconds" etc.

2. _bw_report_to_dict() projects the new fields into a new
   bw_report.histogram block in the sidecar.

3. /db/events/{id}/waveform.json wraps the existing path 1 (HDF5)
   output with _maybe_aggregate_histogram(): when the event is a
   histogram AND the sidecar has bw_report.histogram.n_intervals,
   group the codec's per-block samples into N intervals via
   max-per-group and return the aggregated array. time_axis gains
   histogram_aggregated / n_intervals / interval_size_s /
   interval_times fields.

Frontend (both modal chart in sfm_webapp.html + standalone event
browser) uses interval_times as x-axis labels when provided (BW-style
HH:MM:SS), falls back to interval index.

Defensive: aggregation is a no-op when the sidecar lacks the histogram
block (events ingested before this change). Activates automatically on
prod once a watcher re-forward populates new sidecars.
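For reference, the max-per-group step in item 3 can be sketched on its own as below. This is a minimal illustration, not the code in sfm/server.py: the function names aggregate_max_per_group and interval_end_labels are hypothetical, and the sample values are made up. It assumes, as the patch does, that leftover samples go to the first few groups and that each label marks the end of its interval.

```python
import datetime


def aggregate_max_per_group(values, n_intervals):
    """Split values into n_intervals contiguous groups; keep max |v| per group.

    Hypothetical helper mirroring the grouping described in item 3.
    None samples are skipped; an empty group yields 0.
    """
    per_group, remainder = divmod(len(values), n_intervals)
    out = []
    offset = 0
    for i in range(n_intervals):
        # The first `remainder` groups each take one extra sample.
        size = per_group + (1 if i < remainder else 0)
        group = values[offset:offset + size]
        out.append(max((abs(v) for v in group if v is not None), default=0))
        offset += size
    return out


def interval_end_labels(start, interval_size_s, n_intervals):
    """HH:MM:SS label for the END of each interval (hypothetical helper)."""
    return [
        (start + datetime.timedelta(seconds=(i + 1) * interval_size_s)).strftime("%H:%M:%S")
        for i in range(n_intervals)
    ]


# Made-up input: 7 codec blocks folded into 3 one-minute intervals.
samples = [0.1, -0.4, 0.2, 0.3, None, 0.05, 0.25]
bars = aggregate_max_per_group(samples, 3)
labels = interval_end_labels(datetime.datetime(2026, 5, 16, 22, 30, 0), 60, 3)
print(bars)    # [0.4, 0.3, 0.25]
print(labels)  # ['22:31:00', '22:32:00', '22:33:00']
```

With 7 samples and 3 intervals the groups have sizes 3, 2, 2, so the frontend receives exactly one value and one label per interval.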
Co-Authored-By: Claude Opus 4.7 (1M context)
---
 CHANGELOG.md                    |   9 +++
 minimateplus/bw_ascii_report.py | 121 ++++++++++++++++++++++++++++++++
 minimateplus/event_file_io.py   |  15 ++++
 sfm/event_browser.html          |  17 +++--
 sfm/server.py                   |  86 ++++++++++++++++++++++-
 sfm/sfm_webapp.html             |  16 +++--
 6 files changed, 254 insertions(+), 10 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 0847c73..1e1324c 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -8,6 +8,15 @@ All notable changes to seismo-relay are documented here.
 
 ### Added
 
+- **Histogram per-interval aggregation in `waveform.json`.** Histogram events now render with one bar per BW-reported interval (matching the Blastware printout) instead of ~200 bars per event (the raw codec output). When the sidecar's `bw_report.histogram.n_intervals` is populated (events ingested with the new parser, see next bullet), the `/db/events/{id}/waveform.json` endpoint groups the codec samples into N intervals via max-per-group and returns the aggregated array. `time_axis` gains `histogram_aggregated: true`, `n_intervals`, `interval_size_s`, and `interval_times` (HH:MM:SS strings). Both the modal chart and the standalone event browser use those interval timestamps as x-axis labels when present. Defensive: no-op for events ingested before the parser extension landed (their sidecars lack `histogram.n_intervals`); those continue to render with raw codec output.
+- **`bw_ascii_report` parser now captures histogram-specific fields.** Previously the parser dropped these fields silently (Roadmap item closed):
+  - `Histogram Start Time` / `Histogram Start Date` (combined into `histogram_start: datetime`)
+  - `Histogram Stop Time` / `Histogram Stop Date` (combined into `histogram_stop: datetime`)
+  - `Number of Intervals` (`histogram_n_intervals: int`)
+  - `Interval Size` ("1 minute" string + parsed seconds: `histogram_interval_size_str`, `histogram_interval_size_s`)
+  - Per-channel `Peak Time` + `Peak Date` (e.g. `Tran Peak Time`) for histogram events (combined into `channel_peak_when: dict`; waveforms continue to use the relative `time_of_peak_s`)
+  - `Peak Vector Sum Date` (combined with PVS Time into `peak_vector_sum_when: datetime`; clears the previous bogus `peak_vector_sum_time_s` parse that interpreted "22:33:52" as 22.0 seconds)
+  - All new fields land in the sidecar's `bw_report.histogram` block via `_bw_report_to_dict`. Tested against synthetic K558LLB7.V20H-shaped input.
 - **Raw BW ASCII report (.TXT) preservation.** `save_imported_bw` now writes the paired `_ASCII.TXT` to `//_ASCII.TXT` alongside the binary at ingest time. Previously the .TXT was parsed into the sidecar's `bw_report` projection and then discarded, meaning parser bug fixes couldn't be applied retroactively without re-forwarding from the watcher PC. Now the raw .TXT lives in the waveform store permanently (~15 KB per event; ~210 MB total for a 14k-event store; negligible). Sidecar's `source.txt_filename` field records the saved path; backfill_sidecars preserves it across regens. New `GET /db/events/{id}/ascii_report.txt` endpoint serves the raw .TXT for any event ingested after this change. Events ingested before today still return 404 from that endpoint until re-forwarded.
Architectural rationale: with BW Mail / Forwarding Agent being phased out of the operator workflow, the XML/PDF/WMF that those tools produced are no longer available — the binary + .TXT (created by BW ACH itself) are our authoritative source for everything going forward. - **Event Report PDF generation** — `GET /db/events/{id}/report.pdf` returns a single-page letter-portrait PDF for any event with waveform data on disk. Covers every field a Blastware Event Report includes: header metadata (date/time, trigger source, range, sample rate, project/client/operator/location, serial+firmware, battery, calibration, file name), microphone block (PSPL in dB(L) + psi, ZC freq, channel test), per-channel stats table (rows differ for waveform vs histogram), Peak Vector Sum, and the 4-channel plot. Iterated against real Blastware reference PDFs (uploaded to `example-events/pdfsnstuff/`): diff --git a/minimateplus/bw_ascii_report.py b/minimateplus/bw_ascii_report.py index a3aee4b..5ccb10a 100644 --- a/minimateplus/bw_ascii_report.py +++ b/minimateplus/bw_ascii_report.py @@ -144,6 +144,23 @@ class BwAsciiReport: # ── Vector sum ────────────────────────────────────────────────────────── peak_vector_sum_ips: Optional[float] = None peak_vector_sum_time_s: Optional[float] = None + # Histograms additionally have an absolute date+time for the PVS + # (it occurred at a specific interval). Waveform reports show + # only the relative-time value above. + peak_vector_sum_when: Optional[datetime.datetime] = None + + # ── Histogram-specific fields (populated only when Event Type starts + # with 'Histogram' / 'Full Histogram' / 'Histogram + Continuous') ── + histogram_start: Optional[datetime.datetime] = None + histogram_stop: Optional[datetime.datetime] = None + histogram_n_intervals: Optional[int] = None # e.g. 
4, 1436 + histogram_interval_size_str: Optional[str] = None # "1 minute" / "5 minutes" / "15 seconds" + histogram_interval_size_s: Optional[float] = None # parsed to seconds + # Per-channel absolute peak time+date (histogram-specific). For + # waveform events these are None — those reports use the channel's + # time_of_peak_s (relative to trigger) instead. Keyed by channel + # name ("Tran", "Vert", "Long", "MicL"). + channel_peak_when: Dict[str, datetime.datetime] = field(default_factory=dict) # ── Sensor self-check (per channel) ───────────────────────────────────── sensor_check: Dict[str, SensorCheck] = field(default_factory=dict) @@ -223,6 +240,46 @@ def _parse_event_date(s: str) -> Optional[datetime.date]: return None +def _parse_iso_date(s: str) -> Optional[datetime.date]: + """Parse "2026-05-16" → date. Histograms use ISO format for their + Start Date / Stop Date / Peak Date fields; waveforms use the + "May 8, 2026" long form which `_parse_event_date` handles.""" + s = s.strip() + try: + return datetime.date.fromisoformat(s) + except ValueError: + return None + + +_INTERVAL_UNIT_SECONDS = { + "second": 1, "seconds": 1, "sec": 1, "secs": 1, + "minute": 60, "minutes": 60, "min": 60, "mins": 60, + "hour": 3600, "hours": 3600, "hr": 3600, "hrs": 3600, +} + + +def _parse_interval_size(s: str) -> Optional[float]: + """Parse "1 minute" / "5 minutes" / "15 seconds" / "2 seconds" → seconds. + + Handles the BW Compliance Setup → Histogram Interval values verbatim + ("2 seconds", "5 seconds", "15 seconds", "1 minute", "5 minutes", + "15 minutes") plus a few defensive variants. 
+ """ + if not s: + return None + parts = s.strip().split() + if len(parts) < 2: + return None + try: + n = float(parts[0]) + except ValueError: + return None + unit_per_s = _INTERVAL_UNIT_SECONDS.get(parts[1].lower()) + if unit_per_s is None: + return None + return n * unit_per_s + + def _parse_event_time(s: str) -> Optional[datetime.time]: """Parse "15:56:35" → time.""" s = s.strip() @@ -336,6 +393,15 @@ def parse_report(text: Union[str, bytes], *, parse_samples: bool = False) -> BwA in_user_notes_block = False user_note_position = 0 + # Histogram-field staging — BW writes Peak Time and + # Peak Date on separate lines (and similarly Histogram + # Start Time / Date). We stash the partial value when the time + # line arrives and combine it when the matching date line arrives. + _hist_start_time: Optional[datetime.time] = None + _hist_stop_time: Optional[datetime.time] = None + _pending_peak_time: Dict[str, Optional[datetime.time]] = {} + _pvs_time_raw: Optional[str] = None # last Peak Vector Sum Time value, raw + while i < n: raw_line = lines[i] i += 1 @@ -427,11 +493,66 @@ def parse_report(text: Union[str, bytes], *, parse_samples: bool = False) -> BwA elif stat == "Peak Acceleration": cs.peak_accel_g = num elif stat == "Peak Displacement": cs.peak_disp_in = num + # ── Histogram-specific fields ──────────────────────────────────────── + # Histograms have Start/Stop time+date pairs + an interval count + # and size, plus per-channel absolute Peak Time/Date instead of + # the waveform's relative Time of Peak. 
+ elif key == "Histogram Start Time": + _hist_start_time = _parse_event_time(value) + elif key == "Histogram Start Date": + _d = _parse_iso_date(value) + if _d and _hist_start_time: + report.histogram_start = datetime.datetime.combine(_d, _hist_start_time) + elif key == "Histogram Stop Time": + _hist_stop_time = _parse_event_time(value) + elif key == "Histogram Stop Date": + _d = _parse_iso_date(value) + if _d and _hist_stop_time: + report.histogram_stop = datetime.datetime.combine(_d, _hist_stop_time) + elif key == "Number of Intervals": + try: + report.histogram_n_intervals = int(float(value.strip())) + except ValueError: + pass + elif key == "Interval Size": + report.histogram_interval_size_str = value.strip() + report.histogram_interval_size_s = _parse_interval_size(value) + + # ── Per-channel histogram Peak Date / Peak Time ── + # Lines like "Tran Peak Time : 22:31:38" + "Tran Peak Date : 2026-05-16" + elif key in ("Tran Peak Time", "Vert Peak Time", "Long Peak Time", "MicL Time"): + ch_name = "MicL" if key == "MicL Time" else key.split(" ", 1)[0] + _pending_peak_time[ch_name] = _parse_event_time(value) + elif key in ("Tran Peak Date", "Vert Peak Date", "Long Peak Date", "MicL Date"): + ch_name = "MicL" if key == "MicL Date" else key.split(" ", 1)[0] + _d = _parse_iso_date(value) + _t = _pending_peak_time.get(ch_name) + if _d and _t: + report.channel_peak_when[ch_name] = datetime.datetime.combine(_d, _t) + # ── Vector Sum ─────────────────────────────────────────────────────── elif key == "Peak Vector Sum": report.peak_vector_sum_ips = _parse_number(value) elif key == "Peak Vector Sum Time": report.peak_vector_sum_time_s = _parse_number(value) + _pvs_time_raw = value + elif key == "Peak Vector Sum Date": + # Histogram-mode PVS gets paired with a date. We may have + # captured 'Peak Vector Sum Time' as either a relative + # seconds float (waveform) or an HH:MM:SS string we + # interpreted as a number. 
For histograms, BW writes + # "Peak Vector Sum Time : 22:33:52" which _parse_number + # parses as 22.0 (loses information). When Peak Vector Sum + # Date arrives, re-parse the previous PVS time line as a + # clock time and combine into an absolute datetime. + _d = _parse_iso_date(value) + if _d and _pvs_time_raw is not None: + _t = _parse_event_time(_pvs_time_raw) + if _t: + report.peak_vector_sum_when = datetime.datetime.combine(_d, _t) + # The earlier seconds parse was bogus for histograms; + # clear it so consumers don't think it's a real offset. + report.peak_vector_sum_time_s = None # ── Microphone block ──────────────────────────────────────────────── elif key == "Microphone": diff --git a/minimateplus/event_file_io.py b/minimateplus/event_file_io.py index e513ad3..b455bc0 100644 --- a/minimateplus/event_file_io.py +++ b/minimateplus/event_file_io.py @@ -171,6 +171,10 @@ def _bw_report_to_dict(report: BwAsciiReport) -> dict: "vector_sum": { "ips": report.peak_vector_sum_ips, "time_s": report.peak_vector_sum_time_s, + # Histogram events have an absolute date+time for the PVS + # (the interval at which it occurred); waveform events + # only have the time_s offset. + "when": report.peak_vector_sum_when.isoformat() if report.peak_vector_sum_when else None, }, }, "mic": { @@ -185,6 +189,17 @@ def _bw_report_to_dict(report: BwAsciiReport) -> dict: "long": _sc("Long"), "mic": _sc("MicL"), }, + # Histogram-specific fields (None on waveform-mode events). + # Per-channel absolute peak time/date for histograms — for + # waveforms see channels[ch]["time_of_peak_s"] instead. 
+ "histogram": { + "start": report.histogram_start.isoformat() if report.histogram_start else None, + "stop": report.histogram_stop.isoformat() if report.histogram_stop else None, + "n_intervals": report.histogram_n_intervals, + "interval_size": report.histogram_interval_size_str, + "interval_size_s": report.histogram_interval_size_s, + "channel_peak_when": {ch: dt.isoformat() for ch, dt in report.channel_peak_when.items()}, + }, "monitor_log": monitor_log, "pc_sw_version": report.pc_sw_version, } diff --git a/sfm/event_browser.html b/sfm/event_browser.html index 1ef883b..9f5fd31 100644 --- a/sfm/event_browser.html +++ b/sfm/event_browser.html @@ -656,11 +656,18 @@ function renderWaveform(data) { chartsDiv.appendChild(wrap); // Waveform: per-sample time in ms relative to trigger (negative for pretrig). - // Histogram: interval index (1..N); sample_rate-based time math doesn't - // apply to per-interval peaks. - const times = isHistogram - ? values.map((_, i) => i + 1) - : values.map((_, i) => t0Ms + i * dtMs); + // Histogram: when the server has aggregated to BW-reported intervals AND + // provides per-interval timestamps, use those as x-axis labels (HH:MM:SS). + // Falls back to interval index. + let times; + if (isHistogram) { + const intervalTimes = ta.interval_times || []; + times = (intervalTimes.length === values.length) + ? intervalTimes + : values.map((_, i) => i + 1); + } else { + times = values.map((_, i) => t0Ms + i * dtMs); + } // Downsample for rendering const MAX_POINTS = 4000; diff --git a/sfm/server.py b/sfm/server.py index aee5532..93ee110 100644 --- a/sfm/server.py +++ b/sfm/server.py @@ -2237,6 +2237,89 @@ def db_event_report_pdf(event_id: str): ) +def _maybe_aggregate_histogram(plot: dict, store, serial: str, filename: str, row: dict) -> dict: + """For histogram events, aggregate the codec's per-block samples into + the BW-reported number of intervals. 
No-op for waveforms or when + we don't have the histogram metadata (interval count + size) in the + sidecar's bw_report block. + + Why: the histogram codec emits one value per internal block (~1 per + second), but BW's printout shows one bar per configured interval + (typically 1-15 minutes). For a 1-minute-interval event the codec + gives ~60 blocks per BW bar. Aggregating max-per-group makes the + SFM chart + PDF visually match BW's display. + """ + record_type = row.get("record_type") or "" + if not record_type.lower().startswith("hist"): + return plot + + # Read interval count + size from the sidecar's bw_report.histogram block + try: + import json as _json + sidecar_path = store.sidecar_path_for(serial, filename) + if not sidecar_path.exists(): + return plot + sc = _json.loads(sidecar_path.read_text()) + hist = (sc.get("bw_report") or {}).get("histogram") or {} + n_intervals = hist.get("n_intervals") + interval_size_s = hist.get("interval_size_s") + start_iso = hist.get("start") + except Exception: + return plot + if not n_intervals or n_intervals < 1: + return plot + + # Aggregate each channel's values into n_intervals groups, max-per-group + channels = plot.get("channels") or {} + aggregated_channels: dict = {} + for ch, chd in channels.items(): + vals = chd.get("values") or [] + if not vals: + aggregated_channels[ch] = chd + continue + # Distribute len(vals) samples across n_intervals groups; uneven + # remainders get distributed across the first few groups. + per_group = len(vals) // n_intervals + remainder = len(vals) % n_intervals + agg: list = [] + offset = 0 + for i in range(n_intervals): + grp_size = per_group + (1 if i < remainder else 0) + if grp_size > 0: + grp = vals[offset:offset + grp_size] + # Max of absolute values (peaks are magnitudes). 
+ agg.append(max((abs(v) for v in grp if v is not None), default=0)) + offset += grp_size + else: + agg.append(0) + aggregated_channels[ch] = {**chd, "values": agg} + + # Build per-interval timestamp labels for the x-axis if we have start time + interval_times: list = [] + if start_iso and interval_size_s: + try: + import datetime as _dt + start = _dt.datetime.fromisoformat(start_iso) + for i in range(int(n_intervals)): + # Show the END of each interval (BW convention — the + # peak reported is for samples taken THROUGH that time) + end = start + _dt.timedelta(seconds=(i + 1) * interval_size_s) + interval_times.append(end.strftime("%H:%M:%S")) + except Exception: + pass + + # Override the time_axis to reflect intervals (not samples). + plot_aggr = {**plot, "channels": aggregated_channels} + plot_aggr["time_axis"] = { + **(plot.get("time_axis") or {}), + "histogram_aggregated": True, + "n_intervals": int(n_intervals), + "interval_size_s": interval_size_s, + "interval_times": interval_times, + } + return plot_aggr + + @app.get("/db/events/{event_id}/waveform.json") def db_event_waveform_json(event_id: str) -> dict: """ @@ -2268,7 +2351,8 @@ def db_event_waveform_json(event_id: str) -> dict: h5_path = store.hdf5_path_for(serial, filename) if h5_path.exists(): try: - return event_hdf5.plot_json_from_hdf5(h5_path, event_id=event_id) + plot = event_hdf5.plot_json_from_hdf5(h5_path, event_id=event_id) + return _maybe_aggregate_histogram(plot, store, serial, filename, row) except Exception as exc: log.warning("HDF5 read failed (%s); falling back to A5 path", exc) diff --git a/sfm/sfm_webapp.html b/sfm/sfm_webapp.html index 188f4d5..9b0f862 100644 --- a/sfm/sfm_webapp.html +++ b/sfm/sfm_webapp.html @@ -2684,10 +2684,18 @@ function _renderScWaveform(data) { chartsDiv.appendChild(wrap); // Waveform: per-sample time in ms relative to trigger (negative for pretrig). - // Histogram: interval index (1..N); time math doesn't apply to per-interval peaks. 
-    const times = isHistogram
-      ? values.map((_, i) => i + 1)
-      : values.map((_, i) => t0Ms + i * dtMs);
+    // Histogram: when the server has aggregated to BW-reported intervals AND
+    // provides per-interval timestamps, use those as x-axis labels (HH:MM:SS).
+    // Falls back to interval index.
+    let times;
+    if (isHistogram) {
+      const intervalTimes = ta.interval_times || [];
+      times = (intervalTimes.length === values.length)
+        ? intervalTimes
+        : values.map((_, i) => i + 1);
+    } else {
+      times = values.map((_, i) => t0Ms + i * dtMs);
+    }
 
     // Downsample for rendering when very long.
     const MAX = 3000;