feat: v0.15.0
### Added
- **Layered event storage architecture.** Each event now lands as four
  files in the per-serial waveform store, each with a clear role:
  - `<filename>` — the Blastware-readable binary (BW file). Untouched.
  - `<filename>.a5.pkl` — the raw 5A frames (regenerative source).
  - `<filename>.h5` — clean per-channel waveform arrays in physical
    units (in/s for geo, psi for mic) plus event metadata (HDF5 with
    gzip compression). This is the canonical format for downstream
    analysis tools.
  - `<filename>.sfm.json` — the modern review/metadata sidecar (peaks,
    project, source provenance, review state, extensions).

  SQLite (`seismo_relay.db`) is the searchable index over all four.
- **Plot-ready waveform JSON (`sfm.plot.v1`).** The `/device/event/{idx}/waveform`
and `/db/events/{id}/waveform.json` endpoints now return samples in
physical units with explicit time-axis metadata, peak markers, and
per-channel unit hints — no more guessing the ADC-to-velocity scale
client-side. The webapp waveform viewer was rewritten to consume
this shape.
- **In-app waveform viewer accuracy fix.** The standalone SFM webapp
viewer was scaling geophone amplitudes by `geoAdcScale / 32767`
(≈ 6.206 / 32767), where `geoAdcScale = 6.206053` is the device's
*in/s per V* hardware constant — not the ADC-counts-to-velocity
factor. This silently scaled every plot ~38% too low for Normal-range
geophones (the correct full-scale is 10.0 in/s, or 1.25 in/s for
Sensitive). Conversion is now done server-side using the geo_range
from compliance config; the client just plots.
- New `sfm/event_hdf5.py` module: `write_event_hdf5()`,
`read_event_hdf5()`, plus a plot-JSON helper.
- Backfill script extended to also emit `.h5` for existing events.
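
The ~38% figure in the viewer fix follows from the two constants quoted above. A minimal sketch of the arithmetic, assuming a linear mapping in which 32767 ADC counts correspond to full scale (the function and constant names are illustrative, not the names used in `sfm/event_hdf5.py`):

```python
# Illustrative constants taken from the changelog entry above.
ADC_FULL_SCALE_COUNTS = 32767          # divisor used by the old client formula
GEO_ADC_SCALE = 6.206053               # in/s per V hardware constant, not a counts factor
FULL_SCALE_IN_S = {"normal": 10.0, "sensitive": 1.25}


def counts_to_in_s(counts: int, geo_range: str = "normal") -> float:
    """Assumed linear conversion: full-scale counts map to full-scale velocity."""
    return counts * FULL_SCALE_IN_S[geo_range] / ADC_FULL_SCALE_COUNTS


def old_client_in_s(counts: int) -> float:
    """The stale client-side formula described in the fix."""
    return counts * GEO_ADC_SCALE / ADC_FULL_SCALE_COUNTS


# Relative shortfall of the old formula for a Normal-range geophone.
shortfall = 1 - old_client_in_s(1000) / counts_to_in_s(1000, "normal")
print(f"{shortfall:.1%}")  # prints 37.9%
```

Under the same assumption, the stale factor would err in the opposite direction for a Sensitive-range unit, since 6.206 / 1.25 is roughly 4.96.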
### Dependencies
- Added `h5py>=3.10` and `numpy>=1.24` for the HDF5 storage layer.
- Added `python-multipart>=0.0.7` (required by FastAPI for the
`/db/import/blastware_file` endpoint introduced in this release).
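
For orientation, the four-file layout from the first entry under Added can be sketched with `pathlib` alone. The helper name `event_paths` and the role keys are hypothetical; the real accessors are `paths_for()`, `sidecar_path_for()` and `hdf5_path_for()` on `WaveformStore`.

```python
from pathlib import Path

# Suffix appended to the Blastware filename for each artifact role.
SUFFIXES = {
    "bw": "",                 # Blastware-readable binary
    "a5_pickle": ".a5.pkl",   # raw frame pickle (regenerative source)
    "hdf5": ".h5",            # clean waveform arrays
    "sidecar": ".sfm.json",   # review/metadata sidecar
}


def event_paths(root: Path, serial: str, filename: str) -> dict[str, Path]:
    """Return the per-serial path of every artifact for one event."""
    serial_dir = root / serial
    return {role: serial_dir / f"{filename}{suffix}" for role, suffix in SUFFIXES.items()}


paths = event_paths(Path("waveforms"), "BE11529", "M529LKIQ.7M0W")
for role, path in paths.items():
    print(role, path.name)
```

Because the Blastware extension encodes the timestamp, each suffix is appended to the full filename rather than replacing its extension.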
+297
-22
@@ -1,34 +1,46 @@
 """
 sfm/waveform_store.py — On-disk store for Blastware-format event files.

-Layout (flat per-serial):
+Layout (flat per-serial, four files per event):

-    <root>/<serial>/<filename>            ← event file (Blastware-readable binary)
+    <root>/<serial>/<filename>            ← event file (BW-readable binary)
     <root>/<serial>/<filename>.a5.pkl     ← pickled list of A5 S3Frame dicts
+    <root>/<serial>/<filename>.h5         ← clean waveform arrays (HDF5)
+    <root>/<serial>/<filename>.sfm.json   ← modern sidecar (peaks, project,
+                                            review state, extensions)

 `<filename>` is whatever `minimateplus.blastware_file.blastware_filename`
-produces for the event. The extension is NOT a fixed type tag — it encodes
-the event timestamp (`AB0T` format: 2-char base-36 of `total_seconds %
-1296`, literal `0`, then `W`=Full Waveform / `H`=Full Histogram for ACH
-downloads, or 3-char `AB0` for direct/manual downloads). Every event's
-filename therefore contains its own timestamp + record-type fingerprint and
-collisions across the same physical event don't occur.
+produces for the event. The extension is NOT a fixed type tag — it
+encodes the event timestamp (`AB0T` format).

-The `.a5.pkl` sidecar lets the event file be regenerated later if the
-encoder changes — captures the raw 5A frame stream as serializable dicts so
-the schema isn't tied to the `S3Frame` dataclass layout.
+Roles:
+  - BW binary: what Blastware reads. Untouched. The user-facing review
+    waveform viewer.
+  - .a5.pkl: regenerative source. Lets the BW binary be rebuilt
+    byte-for-byte if the encoder changes. Never delete.
+  - .h5: clean per-channel waveform arrays in physical units (in/s for
+    geo, psi for mic) plus event metadata. Canonical format for
+    downstream analysis tools and the `/device/event/{idx}/waveform`
+    endpoint's plot-JSON output.
+  - .sfm.json: small, queryable metadata + review state. SQL
+    `events.false_trigger` is a derived index kept in sync via
+    `patch_sidecar()`.
 """

 from __future__ import annotations

+import datetime
 import logging
 import pickle
+import shutil
 from pathlib import Path
 from typing import Optional

+from minimateplus import event_file_io
 from minimateplus.blastware_file import blastware_filename, write_blastware_file
 from minimateplus.framing import S3Frame
 from minimateplus.models import Event
+from sfm import event_hdf5

 log = logging.getLogger("sfm.waveform_store")

@@ -80,10 +92,22 @@ class WaveformStore:
         return d

     def paths_for(self, serial: str, filename: str) -> tuple[Path, Path]:
-        """Return (blastware_path, a5_pickle_path) for a given serial+filename."""
+        """Return (blastware_path, a5_pickle_path) for a given serial+filename.
+
+        For the sidecar path use `sidecar_path_for()` — kept separate so
+        existing callers don't need to unpack a 3-tuple.
+        """
         d = self._serial_dir(serial)
         return d / filename, d / f"{filename}.a5.pkl"

+    def sidecar_path_for(self, serial: str, filename: str) -> Path:
+        """Return absolute path to the .sfm.json sidecar for a given event."""
+        return self._serial_dir(serial) / f"{filename}.sfm.json"
+
+    def hdf5_path_for(self, serial: str, filename: str) -> Path:
+        """Return absolute path to the .h5 clean-waveform file for a given event."""
+        return self._serial_dir(serial) / f"{filename}.h5"
+
     def open_blastware(self, serial: str, filename: str) -> Optional[Path]:
         """Return absolute path to an existing event file or None."""
         bw_path, _ = self.paths_for(serial, filename)
@@ -96,23 +120,43 @@ class WaveformStore:
         ev: Event,
         serial: str,
         a5_frames: list[S3Frame],
+        *,
+        source_kind: str = "sfm-live",
+        geo_range = "normal",
     ) -> dict:
         """
-        Write the event file and its .a5.pkl sidecar for one event.
+        Write all four event-file artifacts for one event:
+          - <filename>             BW binary
+          - <filename>.a5.pkl      raw A5 frame pickle
+          - <filename>.h5          clean waveform (HDF5)
+          - <filename>.sfm.json    modern sidecar (metadata + review)

         Returns a record dict suitable for persisting alongside the DB row:

             {
                 "filename": "M529LKIQ.7M0W",
                 "filesize": 8708,
+                "sha256": "a1b2c3...",
                 "a5_pickle_filename": "M529LKIQ.7M0W.a5.pkl",
+                "hdf5_filename": "M529LKIQ.7M0W.h5",
+                "sidecar_filename": "M529LKIQ.7M0W.sfm.json",
             }

-        The exact extension is timestamp-encoded per event (see
-        `minimateplus.blastware_file.blastware_filename`).
+        `source_kind` flows into `sidecar.source.kind` — callers should
+        pass "sfm-live" (default) for the live endpoint and "sfm-ach" for
+        the ACH ingestion path. BW-imported events use save_imported_bw()
+        instead.

-        Idempotent: if the event file already exists, it is overwritten with
-        the freshly-encoded version (same bytes for the same a5_frames).
+        `geo_range` controls the ADC-counts → in/s scaling in the HDF5
+        file ("normal" = 10 in/s FS, "sensitive" = 1.25 in/s FS).
+        Defaults to "normal" — callers with compliance-config access
+        should pass the actual unit setting so the saved samples are in
+        the right units.
+
+        Idempotent: if the event file already exists, it is overwritten
+        with the freshly-encoded version (same bytes for the same
+        a5_frames) and the sidecar's review block is preserved across
+        re-saves.
         """
         if not a5_frames:
             raise ValueError("WaveformStore.save: a5_frames is empty")
@@ -121,17 +165,18 @@ class WaveformStore:

         filename = blastware_filename(ev, serial)
         bw_path, a5_path = self.paths_for(serial, filename)
+        sidecar_path = self.sidecar_path_for(serial, filename)
+        hdf5_path = self.hdf5_path_for(serial, filename)

-        # 1. encode the event file
-        # Delete any stale file at this path so partial writes never leak
-        # trailing bytes from a previous larger file (matches the live
-        # endpoint's defensive unlink).
+        # 1. encode the event file (defensive unlink prevents trailing-byte
+        # leaks from a previous larger file on synced/odd filesystems).
         try:
             bw_path.unlink()
         except FileNotFoundError:
             pass
         write_blastware_file(ev, a5_frames, bw_path)
         filesize = bw_path.stat().st_size
+        sha256 = event_file_io.file_sha256(bw_path)

         # 2. write the .a5.pkl sidecar
         try:
@@ -145,14 +190,176 @@ class WaveformStore:
         with a5_path.open("wb") as fp:
             pickle.dump(payload, fp, protocol=pickle.HIGHEST_PROTOCOL)

+        # 3. write the .h5 clean-waveform file (samples in physical units).
+        # Best-effort: a write failure shouldn't sink the rest of the save
+        # (the HDF5 can be regenerated later from the .a5.pkl).
+        hdf5_filename: Optional[str] = None
+        try:
+            event_hdf5.write_event_hdf5(
+                hdf5_path, ev,
+                serial=serial,
+                geo_range=geo_range,
+                source_kind=source_kind,
+            )
+            hdf5_filename = hdf5_path.name
+        except Exception as exc:
+            log.warning(
+                "save: HDF5 write failed for %s: %s — continuing without .h5",
+                hdf5_path, exc,
+            )
+
+        # 4. write the .sfm.json sidecar. Preserve any existing review
+        # block + extensions across re-saves so user edits aren't lost
+        # when the same event is re-downloaded (e.g. via Force refresh).
+        existing_review = None
+        existing_extensions = None
+        if sidecar_path.exists():
+            try:
+                old = event_file_io.read_sidecar(sidecar_path)
+                existing_review = old.get("review")
+                existing_extensions = old.get("extensions")
+            except Exception as exc:
+                log.warning(
+                    "save: existing sidecar at %s unreadable (%s); overwriting",
+                    sidecar_path, exc,
+                )
+
+        sidecar = event_file_io.event_to_sidecar_dict(
+            ev,
+            serial=serial,
+            blastware_filename=filename,
+            blastware_filesize=filesize,
+            blastware_sha256=sha256,
+            source_kind=source_kind,
+            a5_pickle_filename=a5_path.name,
+            review=existing_review,
+            extensions=existing_extensions,
+        )
+        event_file_io.write_sidecar(sidecar_path, sidecar)
+
         log.info(
-            "WaveformStore.save serial=%s filename=%s filesize=%d frames=%d",
+            "WaveformStore.save serial=%s filename=%s filesize=%d frames=%d "
+            "h5=%s sidecar=%s",
             serial, filename, filesize, len(a5_frames),
+            hdf5_filename or "(skipped)", sidecar_path.name,
         )
         return {
             "filename": filename,
             "filesize": filesize,
+            "sha256": sha256,
             "a5_pickle_filename": a5_path.name,
+            "hdf5_filename": hdf5_filename,
+            "sidecar_filename": sidecar_path.name,
         }

+    def save_imported_bw(
+        self,
+        bw_bytes: bytes,
+        source_path: Path,
+        *,
+        serial_hint: Optional[str] = None,
+    ) -> tuple[Event, dict]:
+        """
+        Ingest a Blastware event file produced by an external tool
+        (Blastware's own ACH, manual download, etc.) where the source A5
+        frames aren't available.
+
+        Workflow:
+          1. Parse the bytes via event_file_io.read_blastware_file (writes
+             a temp file to do that, since the parser takes a path).
+          2. Resolve serial from BW filename (`<P><serial3>...`) or use
+             serial_hint. Falls back to "UNKNOWN".
+          3. Copy the BW bytes verbatim into <root>/<serial>/<filename>.
+          4. Write the .sfm.json sidecar with source.kind = "bw-import"
+             and a5_pickle_filename = None. Does NOT write a .a5.pkl
+             (no A5 source available; byte-for-byte regeneration not
+             possible — the on-disk BW file IS the byte-for-byte source).
+
+        Returns (event, record_dict) so callers can both insert into
+        SeismoDb and surface the parsed Event.
+        """
+        # Stash the bytes to a temp path so read_blastware_file (path-based)
+        # can parse without us duplicating its logic.
+        import tempfile
+        with tempfile.NamedTemporaryFile(suffix=".bw", delete=False) as tmp:
+            tmp.write(bw_bytes)
+            tmp_path = Path(tmp.name)
+        try:
+            ev = event_file_io.read_blastware_file(tmp_path)
+        finally:
+            try:
+                tmp_path.unlink()
+            except FileNotFoundError:
+                pass
+
+        # Resolve serial. blastware_filename derives a 4-char prefix from
+        # the numeric serial (e.g. BE11529 → M529); we go the other way
+        # via the source filename if a hint wasn't given.
+        serial = serial_hint or _serial_from_bw_filename(source_path.name) or "UNKNOWN"
+
+        # Use the source filename verbatim — it already encodes timestamp
+        # + record type per BW's AB0T scheme, and we want to preserve it
+        # so the file BW knows about can be opened back in BW.
+        filename = source_path.name
+        bw_path = self._serial_dir(serial) / filename
+
+        # 1. copy bytes
+        bw_path.write_bytes(bw_bytes)
+        filesize = bw_path.stat().st_size
+        sha256 = event_file_io.file_sha256(bw_path)
+
+        # 2. write the .h5 clean-waveform file from the parsed Event.
+        # Note: peaks here are computed from raw samples (the BW file
+        # doesn't carry the device-authoritative 0C peaks). Best-effort.
+        hdf5_path = self.hdf5_path_for(serial, filename)
+        hdf5_filename: Optional[str] = None
+        try:
+            event_hdf5.write_event_hdf5(
+                hdf5_path, ev,
+                serial=serial,
+                geo_range="normal",  # BW file doesn't carry the range; assume Normal
+                source_kind="bw-import",
+            )
+            hdf5_filename = hdf5_path.name
+        except Exception as exc:
+            log.warning(
+                "save_imported_bw: HDF5 write failed for %s: %s — continuing",
+                hdf5_path, exc,
+            )
+
+        # 3. write sidecar with source.kind = bw-import
+        sidecar_path = self.sidecar_path_for(serial, filename)
+        existing_review = None
+        if sidecar_path.exists():
+            try:
+                existing_review = event_file_io.read_sidecar(sidecar_path).get("review")
+            except Exception:
+                pass
+
+        sidecar = event_file_io.event_to_sidecar_dict(
+            ev,
+            serial=serial,
+            blastware_filename=filename,
+            blastware_filesize=filesize,
+            blastware_sha256=sha256,
+            source_kind="bw-import",
+            a5_pickle_filename=None,
+            review=existing_review,
+        )
+        event_file_io.write_sidecar(sidecar_path, sidecar)
+
+        log.info(
+            "WaveformStore.save_imported_bw serial=%s filename=%s filesize=%d "
+            "h5=%s (no .a5.pkl — A5 source unavailable for BW-imported files)",
+            serial, filename, filesize, hdf5_filename or "(skipped)",
+        )
+        return ev, {
+            "filename": filename,
+            "filesize": filesize,
+            "sha256": sha256,
+            "a5_pickle_filename": None,
+            "hdf5_filename": hdf5_filename,
+            "sidecar_filename": sidecar_path.name,
+        }
+
     def load_a5(self, serial: str, filename: str) -> Optional[list[S3Frame]]:
@@ -169,3 +376,71 @@ class WaveformStore:
             log.warning("WaveformStore.load_a5: malformed sidecar at %s", a5_path)
             return None
         return [_dict_to_frame(d) for d in payload["frames"]]
+
+    # ── modern .sfm.json sidecar accessors ──────────────────────────────────────
+
+    def load_sidecar(self, serial: str, filename: str) -> Optional[dict]:
+        """Return the parsed .sfm.json sidecar dict, or None if missing."""
+        path = self.sidecar_path_for(serial, filename)
+        if not path.exists():
+            return None
+        try:
+            return event_file_io.read_sidecar(path)
+        except Exception as exc:
+            log.warning("load_sidecar: failed to read %s: %s", path, exc)
+            return None
+
+    def patch_sidecar(
+        self,
+        serial: str,
+        filename: str,
+        *,
+        review: Optional[dict] = None,
+        extensions: Optional[dict] = None,
+        reviewer_now: bool = True,
+    ) -> Optional[dict]:
+        """
+        JSON-merge-patch the .sfm.json sidecar's review/extensions blocks.
+        Returns the new full dict, or None if the sidecar doesn't exist.
+        """
+        path = self.sidecar_path_for(serial, filename)
+        if not path.exists():
+            return None
+        return event_file_io.patch_sidecar(
+            path,
+            review=review,
+            extensions=extensions,
+            reviewer_now=reviewer_now,
+        )
+
+
+# ── helpers ─────────────────────────────────────────────────────────────────────
+
+def _serial_from_bw_filename(name: str) -> Optional[str]:
+    """
+    Reverse of `blastware_filename`'s serial-prefix encoding.
+
+    BW filename format (V10.72): `<P><serial3><stem4>.<ext>`
+      where P = chr(ord('B') + floor(serial // 1000))
+      and serial3 = f"{serial % 1000:03d}".
+
+    Examples (from CLAUDE.md verification archive):
+      P036... → BE14036    H907... → BE6907
+      M529... → BE11529    T003... → BE18003
+
+    Returns the inferred BE-prefix serial (e.g. "BE11529") or None when
+    the filename doesn't match the expected pattern.
+    """
+    if not name:
+        return None
+    # First letter encodes the thousands group; next 3 chars encode the
+    # last 3 digits of the serial.
+    base = name.split(".", 1)[0]
+    if len(base) < 4 or not base[0].isalpha() or not base[1:4].isdigit():
+        return None
+    prefix_letter = base[0].upper()
+    if prefix_letter < "B":
+        return None
+    thousands = ord(prefix_letter) - ord("B")
+    serial_num = thousands * 1000 + int(base[1:4])
+    return f"BE{serial_num}"
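
The prefix scheme documented in `_serial_from_bw_filename` can be checked with a small round trip. This is a sketch under the formula stated in that docstring: `encode_serial_prefix` is a hypothetical stand-in for the prefix logic inside `blastware_filename`, which this diff does not show, and `decode_serial_prefix` mirrors the helper above.

```python
from typing import Optional


def encode_serial_prefix(serial_num: int) -> str:
    """Hypothetical encoder: thousands group as a letter from 'B', then the last 3 digits."""
    return chr(ord("B") + serial_num // 1000) + f"{serial_num % 1000:03d}"


def decode_serial_prefix(name: str) -> Optional[str]:
    """Mirror of the helper above: recover 'BE<serial>' from a BW filename."""
    base = name.split(".", 1)[0]
    if len(base) < 4 or not base[0].isalpha() or not base[1:4].isdigit():
        return None
    letter = base[0].upper()
    if letter < "B":
        return None
    return f"BE{(ord(letter) - ord('B')) * 1000 + int(base[1:4])}"


# Round trip over the four examples listed in the docstring.
for serial_num in (14036, 6907, 11529, 18003):
    prefix = encode_serial_prefix(serial_num)
    assert decode_serial_prefix(prefix + "LKIQ.7M0W") == f"BE{serial_num}"
```

A single letter from `B` to `Z` covers 25 thousands groups, so this sketch only round-trips serial numbers below 25000.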