From 49a524d0d49f832b1f2870f9b135a350cf62eae9 Mon Sep 17 00:00:00 2001 From: serversdown Date: Fri, 22 May 2026 18:38:00 +0000 Subject: [PATCH 1/2] docs: three-tier architecture model + strategic roadmap MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit CLAUDE.md gains an Architecture section near the top describing the canonical three-tier mental model: - SFM: device-side, live connections, /device/* endpoints - SDM: data-side, DB + waveform store + /db/* endpoints (currently living under sfm/ for historical reasons; rename deferred) - Codec library: pure data-interpretation, used by both tiers Future code should be placed and named according to this model even though the directory layout doesn't fully reflect it yet. Decision rule for where new code goes is documented inline. README.md's Roadmap section gains two strategic-direction subsections: - "Strategic direction" — frames the suite-of-components vision and notes that BW ACH + Thor IDF call-home remain the data movers; seismo-relay's value is on the receiving and processing side. - "Terra-View ↔ SFM device control" — the long-term vision where Terra-View can launch into SFM device-control surfaces (operator notices missing unit → clicks "Connect to Device" → live view in browser). Includes concrete implementation checklist (auth, embedded live-monitor view, action history, series IV live support). The existing tactical roadmap items remain unchanged below. 
Co-Authored-By: Claude Opus 4.7 (1M context) --- CLAUDE.md | 78 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ README.md | 66 ++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 144 insertions(+) diff --git a/CLAUDE.md b/CLAUDE.md index 5dd6629..e46b30b 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -8,6 +8,84 @@ When new information about the protocol is discovered, please update the instant --- +## Architecture: three-tier conceptual model + +seismo-relay is a **suite of cooperating components**, not a single app. +The three tiers below are the canonical mental model — the current +directory layout doesn't fully reflect them yet (some of what is +conceptually SDM lives under `sfm/` today), but new code should be +placed and named according to this model. + +### 1. SFM — the device-side (active connection to physical units) + +Replaces Blastware's *talk-to-the-meter* role. Lives where a connection +to a physical seismograph is open. + +In scope: +- `minimateplus/{transport,framing,protocol,client}.py` — wire protocol +- `seismo_lab.py` — diagnostic GUI (a thick client for SFM) +- The `/device/*` HTTP endpoints in `sfm/server.py` — + `/device/info`, `/device/events`, `/device/monitor/*`, `/device/call_home`, + etc. Anything that opens a connection at the moment of the request. +- Future: a Thor / Micromate live client (mirror `minimateplus/`) +- Future: a control surface Terra-View can launch into — see the + README's Roadmap. + +Does NOT own a database. Outputs `Event` objects. Has a "spun up when +needed" runtime profile rather than "always on". + +### 2. SDM — the data-side (storage, ingest, and serving) + +The new name for the receiving-and-storing role. Originally called SFM +because the FastAPI service started life as a thin device proxy, but +the actual role has migrated heavily toward data management. 
**For now +the directory remains `sfm/`** — renaming requires touching ~30-50 +files in seismo-relay + ~10-15 in terra-view + a Docker volume +migration; deferred until the codebase is quiet enough to do it as a +clean refactor. + +In scope: +- `sfm/database.py` (`SeismoDb`) +- `sfm/waveform_store.py`, `sfm/event_hdf5.py` +- The `/db/*` HTTP endpoints — `events`, `units`, `monitor_log`, + `sessions`, `false_trigger` mutations +- The `/db/import/*` ingest endpoints — `blastware_file` (series3), + `idf_file` (series4); anything that receives events FROM somewhere +- `scripts/backfill_sidecars.py`, `scripts/check_bw_report_preservation.py`, + and similar data-maintenance tools +- The `.sfm.json` sidecars and `.h5` files in the waveform store +- The shape that Terra-View consumes (Terra-View should never need to + reach into SFM/device-side endpoints to populate its UI) + +Always-on, scaled for storage/serving, has the DB and waveform store. + +### 3. Codec library — pure data interpretation (used by both sides) + +Neither SFM nor SDM — a shared library both depend on. + +In scope: +- `minimateplus/{waveform_codec,histogram_codec,event_file_io,bw_ascii_report,blastware_file}.py` +- `micromate/{idf_ascii_report,idf_file}.py` + +These modules take bytes (off the wire on the SFM side, or from a +forwarded file on the SDM side) and return `Event` objects. They +should not import from `sfm/`, must not touch a DB, and have no I/O +beyond reading files passed as arguments. Keep them pure — both +tiers can then depend on them without circularity. 
+ +### Practical consequences + +When deciding where new code goes, ask: +- *Does it need a connection to a device?* → SFM +- *Does it operate on stored events / sidecars / DB rows?* → SDM +- *Does it interpret bytes into structured data, with no I/O of its own?* → codec lib + +Terra-View is downstream of SDM for data, and (per the roadmap) will +eventually invoke into SFM's device-control endpoints to provide a +"connect to unit" experience. + +--- + ## Project layout ``` diff --git a/README.md b/README.md index c057f68..6433158 100644 --- a/README.md +++ b/README.md @@ -459,6 +459,72 @@ Use **com0com** or **VSPD** to create the virtual COM pair on Windows. ## Roadmap (Future) +### Strategic direction — where this is going + +seismo-relay is being built as a **suite of cooperating components** +that together replace and improve on Blastware's role. Three logical +tiers: + +1. **SFM** (device-side) — owns the active connection to a physical + unit. Today: `minimateplus/`, `/device/*` HTTP endpoints, + `seismo_lab.py`. Future: live Thor / Micromate support. +2. **SDM** (data-side) — owns the database, waveform store, ingest + pipelines, and the read-API that Terra-View consumes. Today this + code lives under `sfm/` for historical reasons; the role has + migrated and the eventual rename is on the long-tail cleanup list. +3. **Codec library** — pure data-interpretation: `minimateplus/*_codec.py`, + `bw_ascii_report.py`, `micromate/idf_*.py`. Used by both SFM and + SDM, depends on neither. + +Terra-View is downstream of SDM for fleet listings, event detail, etc. +The long-term vision adds a **second link** from Terra-View → SFM for +direct device interaction (see below). + +The codec work in this repo isn't trying to replace BW's network +layer — BW's ACH file forwarding and Thor's IDF call-home are +battle-tested. 
The value is in the receiving and processing side: turn +the stream of binary+ASCII pairs into something users can search, +filter, alert on, and report from. + +### Terra-View ↔ SFM device control (the long-term vision) + +Today Terra-View only reads from SDM (event listings, dashboards, +project reports). When a unit goes missing — operator notices in the +Terra-View dashboard — there's no way to *do* anything from the UI. +The path of least resistance is to RDP into a Windows box and open +Blastware, which defeats the purpose of having Terra-View. + +Target experience: +- Operator notices a unit in Terra-View dashboard hasn't called in. +- Clicks unit detail → "Connect to Device" button. +- Terra-View opens an embedded view (modal or side-panel) that talks + to SFM's `/device/*` endpoints over the network. +- Live view: device clock, battery, memory, current monitor status. +- Actions: start/stop monitoring, push compliance config changes, pull + fresh events, run a sensor self-check, change call-home settings. +- Audit log: every connect / action recorded in SDM for the unit + history. + +Implementation steps (concrete): +- [ ] **SFM authentication & authorization layer.** Today `/device/*` + endpoints are unauthenticated — anyone on the network can call + them. Need at minimum a token-based auth, ideally with a "who + can connect to which units" mapping. Hard prerequisite for + letting Terra-View users into the control surface. +- [ ] **Terra-View "Connect to Device" entry point** on the unit + detail page. Renders only when unit has connection info on file + and the user has permission. +- [ ] **Embedded live-monitor view** in Terra-View — equivalent to + `seismo_lab.py`'s Bridge tab, but in the browser. Polls SFM's + `/device/monitor/status` on an interval; sends start/stop via + `/device/monitor/{start,stop}`. +- [ ] **Action history** — every connect / push / action call records + a row in `unit_history`, viewable on the unit detail page. 
+- [ ] **Series IV live-device support in SFM** — currently `/device/*` + only supports MiniMate Plus. Blocks "Connect to Device" for + Thor units until done. Depends on Thor wire-protocol capture + and a `micromate/` parallel of the `minimateplus/` modules. + ### High-impact (unblocks product features) - [ ] **Series III waveform body codec reverse-engineering.** The 5A bulk-stream body is some kind of compressed/encoded format (not raw int16 LE as previously assumed — see §7.6.1 retraction in `docs/instantel_protocol_reference.md`). Structural framing is ~50% decoded on branch `claude/codec-re-cBGNe` (tagged-block walker, segment counters); per-byte sample mapping is still open. Until this lands, the in-app waveform viewer renders garbage and BW-import peak values fall back to `_peaks_from_samples()` saturation noise. Workaround: pair every BW-imported event with its `_ASCII.TXT` so the device-authoritative peaks land in the DB regardless of codec. -- 2.52.0 From 35842ac50a8b225ad193f16100b3422002669f19 Mon Sep 17 00:00:00 2001 From: serversdown Date: Fri, 22 May 2026 18:56:22 +0000 Subject: [PATCH 2/2] backfill: overlay bw_report onto Event before DB upsert MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Mirror what the ingest path does: BW's reported peaks (and sample_rate / record_time) take precedence over codec output where present. Without this, --force backfill silently overwrites bw_report-overlaid DB columns with codec-derived peaks. Wrong for events where the codec doesn't fully decode (waveform walker edge cases on SP0/SS0/SV0-style events, histogram byte[5]!=0 sub-format that isn't yet RE'd), producing PVS=0 on real high-amplitude events. Bit on prod 2026-05-22 with three top-10 waveform events ending up at PVS=0 (rolled back same day, this fix is the proper resolution). 
New helper minimateplus.event_file_io.apply_bw_report_dict_to_event operates on the projected sidecar dict shape (the structure _bw_report_to_dict produces, which is what gets preserved in the sidecar). Mirrors apply_report_to_event's semantics: only writes fields where bw_report has a non-None value, no-ops cleanly on empty / None input. Dev validation against prod snapshot: pre : 1839.7315 pvs_sum 356 events with DB PVS ≠ sidecar bw_report post : 2016.4902 pvs_sum 2 events still mismatched (both have NULL timestamp + duplicate rows, edge case) Both edge-case events DO get the correct value written by the new backfill — their stale rows from prior backfills remain because UNIQUE(serial, timestamp) doesn't fire on NULL. Separate dedup cleanup needed for those 2 events (0.014% of corpus); not blocking. Backfill remains idempotent + bw_report preservation still passes (0 WIPED, 0 CHANGED on the 3rd consecutive run). Co-Authored-By: Claude Opus 4.7 (1M context) --- minimateplus/event_file_io.py | 54 ++++++++++++++++++++++++++ scripts/backfill_sidecars.py | 17 +++++++++ tests/test_event_file_io.py | 71 +++++++++++++++++++++++++++++++++++ 3 files changed, 142 insertions(+) diff --git a/minimateplus/event_file_io.py b/minimateplus/event_file_io.py index 6e5674d..66a4b68 100644 --- a/minimateplus/event_file_io.py +++ b/minimateplus/event_file_io.py @@ -254,6 +254,60 @@ def apply_report_to_event(event: Event, report: BwAsciiReport) -> None: event.rectime_seconds = report.record_time_s +def apply_bw_report_dict_to_event(event: Event, bw_report: dict) -> None: + """Mirror of ``apply_report_to_event`` for the projected sidecar + dict shape (as produced by ``_bw_report_to_dict``). + + Why this exists + ─────────────── + The ingest path holds a live ``BwAsciiReport`` parsed straight from + the ``_ASCII.TXT`` and uses ``apply_report_to_event`` to overlay + device-authoritative peaks onto the codec output before insert. 
+ + The backfill path doesn't have the original ``.TXT`` (it's not + retained in the waveform store), but it does have the preserved + ``bw_report`` block from the sidecar — which contains the same + projected fields. Re-overlaying those during a backfill keeps the + DB peak columns aligned with what BW reports rather than letting + the codec output (which may be incomplete for unhandled formats or + walker edge cases) win by default. + + No-ops cleanly when ``bw_report`` is ``None``, empty, or missing + any particular sub-field — only fields with a concrete value get + written. Mirrors ``apply_report_to_event``'s "report wins where + present" semantics. + """ + if not bw_report: + return + if event.peak_values is None: + event.peak_values = PeakValues() + pv = event.peak_values + + peaks = bw_report.get("peaks") or {} + tran = (peaks.get("tran") or {}).get("ppv_ips") + vert = (peaks.get("vert") or {}).get("ppv_ips") + long = (peaks.get("long") or {}).get("ppv_ips") + if tran is not None: pv.tran = tran + if vert is not None: pv.vert = vert + if long is not None: pv.long = long + vs_ips = (peaks.get("vector_sum") or {}).get("ips") + if vs_ips is not None: + pv.peak_vector_sum = vs_ips + + mic = bw_report.get("mic") or {} + pspl = mic.get("pspl_dbl") + if pspl is not None and pspl > 0: + pv.micl = _dbl_to_psi(pspl) + + rec = bw_report.get("recording") or {} + sr = rec.get("sample_rate_sps") + if sr: + event.sample_rate = sr + rt = rec.get("record_time_s") + if rt is not None: + event.rectime_seconds = rt + + def _project_info_to_dict(pi: Optional[ProjectInfo]) -> dict: if pi is None: return { diff --git a/scripts/backfill_sidecars.py b/scripts/backfill_sidecars.py index bbe0d0f..9c4bf5d 100644 --- a/scripts/backfill_sidecars.py +++ b/scripts/backfill_sidecars.py @@ -309,6 +309,23 @@ def main(argv=None) -> int: except Exception: pass + # Overlay BW ASCII report fields onto the rebuilt Event + # BEFORE the sidecar + DB write. 
Mirrors what the ingest + # path does — BW's reported peaks (and sample_rate / + # record_time) win over codec output where present. + # + # Without this step, --force backfill silently overwrites + # the bw_report-overlaid DB columns with codec-derived + # values, which is wrong for events the codec doesn't + # fully decode (e.g. waveform walker edge cases on + # SP0/SS0/SV0-style events, or histogram sub-formats with + # byte[5]!=0 that aren't yet RE'd). Net effect was PVS=0 + # on three top-10 events on 2026-05-22. + if preserved_bw_report: + event_file_io.apply_bw_report_dict_to_event( + ev, preserved_bw_report, + ) + sidecar = event_file_io.event_to_sidecar_dict( ev, serial=serial, diff --git a/tests/test_event_file_io.py b/tests/test_event_file_io.py index 6e08dae..0e043e8 100644 --- a/tests/test_event_file_io.py +++ b/tests/test_event_file_io.py @@ -529,6 +529,77 @@ def test_save_imported_bw_round_trip(tmp_path: Path): assert stored_path.read_bytes() == src.read_bytes() +# ── apply_bw_report_dict_to_event ──────────────────────────────────────────── + + +def test_apply_bw_report_dict_overlays_peaks_and_recording(): + """Verbatim mirror of the data shape produced by `_bw_report_to_dict` + when projecting a parsed `BwAsciiReport` into the sidecar. 
Confirms + each field overlays onto Event correctly so the backfill path + matches ingest behavior.""" + from minimateplus.models import PeakValues + ev = Event(index=0) + bw_report = { + "peaks": { + "tran": {"ppv_ips": 9.84375}, + "vert": {"ppv_ips": 0.305}, + "long": {"ppv_ips": 0.405}, + "vector_sum": {"ips": 14.86736}, + }, + "mic": {"pspl_dbl": 115.9}, + "recording": {"sample_rate_sps": 1024, "record_time_s": 3.0}, + } + event_file_io.apply_bw_report_dict_to_event(ev, bw_report) + assert ev.peak_values is not None + assert ev.peak_values.tran == 9.84375 + assert ev.peak_values.vert == 0.305 + assert ev.peak_values.long == 0.405 + assert ev.peak_values.peak_vector_sum == 14.86736 + # MicL is converted dB → psi via _dbl_to_psi — just confirm non-zero + assert ev.peak_values.micl is not None and ev.peak_values.micl > 0 + assert ev.sample_rate == 1024 + assert ev.rectime_seconds == 3.0 + + +def test_apply_bw_report_dict_overwrites_codec_peaks(): + """The whole point of this helper: bw_report wins over whatever the + codec produced. This is what the 2026-05-22 prod backfill missed — + DB peaks got overwritten with codec output (incl. 
PVS=0 on the + three top events) when they should have stayed bw_report-overlaid.""" + from minimateplus.models import PeakValues + ev = Event(index=0) + # Simulate codec output that's clearly wrong (incomplete decode): + ev.peak_values = PeakValues( + tran=2.09, vert=0.0, long=0.0, peak_vector_sum=0.0, + ) + bw_report = { + "peaks": { + "tran": {"ppv_ips": 9.84}, + "vert": {"ppv_ips": 4.95}, + "long": {"ppv_ips": 8.05}, + "vector_sum": {"ips": 14.95}, + }, + } + event_file_io.apply_bw_report_dict_to_event(ev, bw_report) + assert ev.peak_values.tran == 9.84 + assert ev.peak_values.vert == 4.95 + assert ev.peak_values.long == 8.05 + assert ev.peak_values.peak_vector_sum == 14.95 + + +def test_apply_bw_report_dict_no_op_on_empty(): + """None / empty dict / missing keys should leave Event untouched.""" + from minimateplus.models import PeakValues + for empty in (None, {}, {"peaks": {}}, {"peaks": {"tran": {}}}): + ev = Event(index=0) + ev.peak_values = PeakValues(tran=1.0, vert=2.0, long=3.0) + event_file_io.apply_bw_report_dict_to_event(ev, empty) + # Unchanged + assert ev.peak_values.tran == 1.0 + assert ev.peak_values.vert == 2.0 + assert ev.peak_values.long == 3.0 + + if __name__ == "__main__": if pytest is not None: pytest.main([__file__, "-v"]) -- 2.52.0
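Review note on PATCH 2/2: the "report wins where present" rule is easiest to see with a partial `bw_report`, a case the added tests do not exercise directly. The sketch below is a minimal, standalone illustration of that rule, not the code in `minimateplus/event_file_io.py`. `Peaks` and `overlay_peaks` are hypothetical stand-ins for `PeakValues` and `apply_bw_report_dict_to_event`; only the dict keys (`peaks`, `ppv_ips`, `vector_sum`, `ips`) are taken from the patch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Peaks:
    """Hypothetical stand-in for minimateplus.models.PeakValues."""
    tran: Optional[float] = None
    vert: Optional[float] = None
    long: Optional[float] = None
    peak_vector_sum: Optional[float] = None


# (attribute on Peaks, key under "peaks", field inside that sub-dict)
_FIELDS = (
    ("tran", "tran", "ppv_ips"),
    ("vert", "vert", "ppv_ips"),
    ("long", "long", "ppv_ips"),
    ("peak_vector_sum", "vector_sum", "ips"),
)


def overlay_peaks(codec: Peaks, bw_report: Optional[dict]) -> Peaks:
    """Overwrite codec peaks only where the report carries a value."""
    peaks = (bw_report or {}).get("peaks") or {}
    for attr, key, field in _FIELDS:
        value = (peaks.get(key) or {}).get(field)
        if value is not None:  # a missing or None field keeps the codec value
            setattr(codec, attr, value)
    return codec


# Codec output from an incomplete decode, then a report with two fields only.
codec = Peaks(tran=2.09, vert=0.0, long=0.0, peak_vector_sum=0.0)
partial = {"peaks": {"tran": {"ppv_ips": 9.84}, "vector_sum": {"ips": 14.95}}}
result = overlay_peaks(codec, partial)

# A None report leaves the codec values untouched.
untouched = overlay_peaks(Peaks(tran=1.0), None)
```

With a partial report, `tran` and `peak_vector_sum` take the report values while `vert` and `long` keep whatever the codec produced. If the real helper is meant to behave the same way, a test for this mixed case may be worth adding alongside the three in `tests/test_event_file_io.py`.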