4 Commits

Author SHA1 Message Date
serversdown 25386cab8b fix(backfill): regenerate IDFH .h5 + merge binary mic_pspl_psi onto bridge
Two gaps in backfill_thor_events.py that left old Thor events showing
stale charts after a v0.21.1 backfill pass:
1. IDFH events were skipped from .h5 regeneration (the "have decoded
   samples" gate was IDFW-only).  Histograms kept their pre-v0.21.1
   .h5, which was written from raw_samples = None.  The renderer turned
   that into a near-empty bar chart; on older events it applied the
   dB(L)-as-pseudo-psi mic scale and produced "107.7 psi" peaks
   (atomic-bomb level instead of footstep level).  Fix: synthesise the
   same 1-sample-per-interval array that save_imported_idf uses as of
   v0.21.1 (peak ADC count per channel per interval) so the renderer's
   bar-chart grouping has data to work with.
2. The IDFW h5 path didn't merge binary_peaks.mic_pspl_psi onto the
   IdfEvent before to_minimateplus_event().  The live save_imported_idf
   does this merge — without it, IdfEvent.from_report() only sees the
   .txt's dB(L) value, the bridge falls back to the dBL→psi formula
   (instead of the binary-accurate 2.14e-6 psi/count value), and the
   h5 writer's per-count mic factor lands on a less-correct value.
   Fix: same merge the live ingest does (lift res.event.peaks.mic_pspl_psi
   onto idf_event.peaks before the bridge call).
Verified against UM6047_20250804190047.IDFH (250-interval prod
histogram): 250 intervals decode, mic_pspl_psi = 2.78e-5 (was being
treated as dB(L)=107.7 in the old h5).
Operator: re-run after deploy.  `docker compose exec sfm python
scripts/backfill_thor_events.py` is idempotent — the existing version
check still skips events already at the new TOOL_VERSION, and review
state + captured_at are preserved on the second pass.
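
The mic-peak priority described in gap 2 can be sketched roughly as follows. This is a minimal illustration, not the project's code: `count_to_psi`, `dbl_to_psi` and `resolve_mic_psi` are hypothetical names, and the only values taken from this changeset are the 2.14e-6 psi/count factor and the 2.9e-9 psi reference used by the dB(L)→psi fallback.

```python
from typing import Optional

PSI_PER_MIC_COUNT = 2.14e-6  # per-count factor quoted in the commit message
PSI_REFERENCE = 2.9e-9       # reference pressure used by the dB(L) fallback formula


def count_to_psi(count: int) -> float:
    """Convert a peak mic ADC count to psi (hypothetical stand-in helper)."""
    return abs(count) * PSI_PER_MIC_COUNT


def dbl_to_psi(dbl: float) -> float:
    """Fallback conversion: psi = 2.9e-9 * 10^(dBL / 20)."""
    return PSI_REFERENCE * 10.0 ** (dbl / 20.0)


def resolve_mic_psi(binary_psi: Optional[float],
                    dbl: Optional[float]) -> Optional[float]:
    """Prefer the binary-derived psi, then the dB(L) fallback, then None."""
    if binary_psi is not None:
        return binary_psi
    if dbl is not None:
        return dbl_to_psi(dbl)
    return None
```

Passing a dB(L) number straight through as if it were psi skips both branches above, which is why the old charts scaled against values several orders of magnitude too large.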
2026-06-01 20:02:54 +00:00
serversdown 6cb619ecc4 version bump - 0.21.1 2026-06-01 19:33:44 +00:00
serversdown 1ed86244d0 fix(thor-events): add parallel field for mic psi. Now shows mic in dbl and psi. (psi for charts) 2026-06-01 18:27:24 +00:00
serversdown b2c565f217 fix(idf_waveforms): _find_waveform_body_offset() — scans every 00 02 00 magic past offset 0x0E00, runs decode_waveform_v2 on each candidate, picks the one that returns the most samples. Validated on 483 prod IDFW files: 0 preamble-only events (was ~50%), 355/483 fully decode, 126/483 partial (BW codec walker-stops-early on loud events — known issue).
IDFH now synthesises a 1-sample-per-interval array from the binary intervals and writes an .h5 so the existing renderer works unchanged. Each "sample" is the per-interval peak ADC count → h5_value = count × geo_fs/32768 yields the right bar height.
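
The offset scan this commit describes can be sketched as below. It is an illustration under stated assumptions: `find_best_offset` and `toy_decode` are hypothetical names, and `toy_decode` is a made-up stand-in for `decode_waveform_v2` (it treats the byte after the magic as a sample count), used only so the sketch runs on its own.

```python
from typing import Callable, Dict, List, Optional

BODY_MAGIC = b"\x00\x02\x00"  # candidate body preamble
SCAN_FLOOR = 0x0E00           # skip the fixed-header region

Decoded = Dict[str, List[int]]


def find_best_offset(buf: bytes,
                     decode: Callable[[bytes], Decoded],
                     floor: int = SCAN_FLOOR) -> Optional[int]:
    """Trial-decode every magic position past `floor`; keep the largest result."""
    best_total, best_off = 2, None  # a candidate must beat the 2-sample preamble
    pos = buf.find(BODY_MAGIC, floor)
    while pos >= 0:
        try:
            total = sum(len(v) for v in decode(buf[pos:]).values())
        except Exception:
            total = 0  # a candidate that fails to decode is simply not chosen
        if total > best_total:
            best_total, best_off = total, pos
        pos = buf.find(BODY_MAGIC, pos + 1)
    return best_off


def toy_decode(body: bytes) -> Decoded:
    """Hypothetical decoder: byte 3 is a sample count, samples follow it."""
    n = body[3] if len(body) > 3 else 0
    return {"Tran": list(body[4:4 + n])}


# Two candidates: a preamble-only one at 0x0E00 and a real body at 0x0E06.
demo = (bytes(SCAN_FLOOR) + BODY_MAGIC + b"\x02\x01\x01"
        + BODY_MAGIC + b"\x05\x01\x02\x03\x04\x05")
best = find_best_offset(demo, toy_decode)
```

Ties go to the earliest candidate because the comparison is strict, which matches the "first candidate is usually right" observation in the function's docstring.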
2026-05-31 20:51:09 +00:00
8 changed files with 348 additions and 41 deletions
+57
@@ -8,6 +8,63 @@ All notable changes to seismo-relay are documented here.
---
## v0.21.1 — 2026-06-01
Bug fixes against v0.21.0 surfaced after the first prod redeploy. Three
production-visible symptoms — blank waveform charts on most Thor events,
blank histogram charts on all Thor events, and a mic chart that
auto-scaled against a dB(L) value treated as psi — all root-caused and
fixed.
### Fixed
- **Dynamic IDFW body offset.** The v0.21.0 codec hardcoded the body
at file offset `0x0f1f` based on the example corpus, but only ~52%
of production IDFW events use that offset; the rest sit at offsets
from `0x1033` up to `0x3082` depending on header padding. At
`0x0f1f` the codec would find a coincidentally-matching `00 02 00`
magic, read the 2-byte Tran preamble, and return empty V/L/M
arrays — producing near-empty .h5 files and blank charts.
`micromate.idf_file._find_waveform_body_offset()` now scans every
`00 02 00` magic position past `0x0E00`, trial-decodes each one,
and picks the offset with the most samples. Validated across 483
prod IDFW files: 0 preamble-only events (was ~50%), 355/483 fully
decode, 126/483 partial (BW codec walker-stops-early on loud
events — pre-existing limitation, samples reached are correct).
- **IDFH histograms now render bar charts.** Histograms previously
skipped the .h5 write because there are no per-sample arrays, but
the renderer drives the per-interval bar chart from .h5 channel
data + `bw_report.histogram.n_intervals`. `save_imported_idf` now
synthesizes a 1-sample-per-interval array from the decoded
`IdfhInterval` peak counts and writes an .h5 so the existing
renderer works unchanged — each "sample" is the per-interval peak
ADC count, so the writer's `count × geo_fs/32768` conversion
yields the right bar height.
- **Mic chart scaling on Thor events.** `PeakValues.micl` (consumed
by the h5 writer's per-count mic scale factor) expects psi, but
the Thor bridge was stuffing the dB(L) value (~99.4) into it,
producing a per-count factor 5+ orders of magnitude too large and
a flat-looking mic chart. Fixed by adding `IdfPeaks.mic_pspl_psi`
alongside `mic_pspl_dbl`; `read_idf_file()` computes it from
binary mic counts (`max(|MicL|) × 2.14e-6 psi/count`) for both
IDFW and IDFH paths; `save_imported_idf` merges it onto the typed
event after `IdfEvent.from_report`; the bridge feeds psi to
`PeakValues.micl` with a dB(L)→psi formula fallback when only the
dB(L) value is available. dB(L) for the report header still
flows through `bw_report.mic.pspl_dbl` unchanged.
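
A minimal sketch of the histogram synthesis and the `count × geo_fs/32768` conversion mentioned above. The names are hypothetical, plain dicts stand in for `IdfhInterval` objects, and the `geo_fs` value in the usage note is an arbitrary illustrative full-scale figure, not one taken from the project.

```python
from typing import Dict, List, Sequence

CHANNELS = ("Tran", "Vert", "Long", "MicL")
ADC_FULL_SCALE = 32768  # divisor from the changelog's conversion formula


def synthesise_histogram_samples(
        intervals: Sequence[Dict[str, int]]) -> Dict[str, List[int]]:
    """Build one 'sample' per interval per channel from peak ADC counts."""
    return {ch: [iv.get(ch, 0) for iv in intervals] for ch in CHANNELS}


def count_to_bar_height(count: int, geo_fs: float) -> float:
    """Scale a peak ADC count to a physical bar height."""
    return count * geo_fs / ADC_FULL_SCALE


# Two made-up intervals; the second one only has a Tran reading.
samples = synthesise_histogram_samples([
    {"Tran": 100, "Vert": 50, "Long": 25, "MicL": 13},
    {"Tran": 200},
])
```

With an assumed `geo_fs` of 10.0, a count of 16384 (half of full scale) maps to a bar height of 5.0, so each synthesised sample lands at the interval's true peak.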
### Operator
After deploy, run `python scripts/backfill_thor_events.py` to refresh
every existing Thor event's sidecar + .h5 with the corrected codec
output. The script auto-skips events already at the current
`TOOL_VERSION`, so the bump from `0.21.0` → `0.21.1` is what triggers
the refresh.
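
The version-gated skip can be pictured with the sketch below. It assumes the gate is a simple equality check on `source.tool_version`; the backfill script's actual comparison is not shown in this changeset. `needs_refresh` and `refresh_sidecar` are hypothetical names, and the sidecar contents are made-up sample data.

```python
from typing import Any, Dict

TOOL_VERSION = "0.21.1"


def needs_refresh(sidecar: Dict[str, Any], current: str = TOOL_VERSION) -> bool:
    """An event is refreshed only when its stamped version differs (assumed rule)."""
    return sidecar.get("source", {}).get("tool_version") != current


def refresh_sidecar(sidecar: Dict[str, Any], new_bw_report: Dict[str, Any],
                    current: str = TOOL_VERSION) -> Dict[str, Any]:
    """Replace bw_report and the version stamp; leave everything else as-is."""
    out = dict(sidecar)  # shallow copy, so review and captured_at carry over
    out["bw_report"] = new_bw_report
    out["source"] = {**sidecar.get("source", {}), "tool_version": current}
    return out


old = {
    "source": {"tool_version": "0.21.0"},
    "review": {"status": "accepted"},      # made-up review state
    "captured_at": "2025-08-04T19:00:47",  # made-up timestamp
}
new = refresh_sidecar(old, {"histogram": {"n_intervals": 250}})
```

Because the refreshed sidecar carries the new stamp, a second pass sees `needs_refresh(new)` as false and leaves the event alone.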
---
## v0.21.0 — 2026-05-29
The "Thor / Series IV codec" release. Two big pieces landed: (1) the IDF binary codec actually decodes now, both IDFW and IDFH, and (2) a Thor→BW adapter lets Thor events flow through the existing Series III Event Report PDF pipeline. Combined effect: a Thor event ingested via `/db/import/idf_file` now lands in the DB with the same fidelity as a Blastware event, gets a per-event PDF on demand, and renders in Terra-View's modal chart with the same plotting code as a BW event.
+82 -11
@@ -62,12 +62,23 @@ _THOR_PREFIX = b"\x00\x12\x01\x00\x00\x00"
 _BW_STRAY_PREFIX = b"\x10\x00\x01\x80\x00\x00"
 _INSTANTEL_TAG = b"Instantel"
-# Constant body offset for sig-A IDFW files (verified across 151/154 corpus
-# files in tests/fixtures/THORDATA_example). The body is the segment-rotated
-# block stream consumed by decode_waveform_v2; bytes [0:3] are the magic
-# ``00 02 00`` preamble.
+# Most common body offset for sig-A IDFW files (~50% of prod events;
+# 151/154 in the original tests/fixtures/THORDATA_example corpus). The
+# body is the segment-rotated block stream consumed by decode_waveform_v2;
+# bytes [0:3] are the magic ``00 02 00`` preamble. Production events
+# routinely use other offsets — see :func:`_find_waveform_body_offset`
+# for the dynamic scan. This constant survives only as the priority hint.
 _BODY_START_SIG_A = 0x0F1F
+# Magic bytes that mark a candidate waveform-body preamble.
+_BODY_MAGIC = b"\x00\x02\x00"
+# Where to start looking for body candidates inside the file. Skip the
+# fixed-header region where the same magic legitimately appears inside
+# channel-test records and the compliance block (offsets 0x015d, 0x091c,
+# 0x0ae2, 0x0d30 in observed events).
+_BODY_SCAN_FLOOR = 0x0E00
 # Geophone count → in/s, derived from sidecar ground truth: the smallest
 # non-zero sample in 1,014-file corpus is 0.0003 in/s.
 _GEO_LSB_IPS = 0.0003
@@ -179,17 +190,65 @@ def extract_binary_metadata(buf: bytes) -> IdfBinaryMetadata:
 # ─── Sample decoder + unit conversion ───────────────────────────────────────
+def _find_waveform_body_offset(buf: bytes) -> Optional[int]:
+    """Pick the file offset of the waveform body by trial-decoding every
+    ``00 02 00`` magic position past the fixed-header region.
+    The body's location isn't fixed across all sig-A IDFW files — about
+    half the production events use ``0x0f1f``, but the rest have offsets
+    that shift based on header padding / channel-config layout. We
+    auto-detect by:
+      1. Find every ``00 02 00`` occurrence past ``_BODY_SCAN_FLOOR``.
+      2. Try ``decode_waveform_v2()`` on each candidate.
+      3. Pick the offset whose decoded sample count is largest.
+    Returns the offset, or ``None`` if no candidate yielded more than
+    the trivial 2-sample preamble (= "no real body found").
+    Costs ~2-8 trial decodes per file; in practice the first candidate
+    past 0x0e00 is usually the right one.
+    """
+    if len(buf) < _BODY_SCAN_FLOOR + 8:
+        return None
+    best: Optional[tuple[int, int]] = None  # (total_samples, offset)
+    i = _BODY_SCAN_FLOOR
+    while True:
+        j = buf.find(_BODY_MAGIC, i)
+        if j < 0:
+            break
+        i = j + 1
+        try:
+            decoded = decode_waveform_v2(buf[j:])
+        except Exception:
+            continue
+        if not decoded:
+            continue
+        total = sum(len(v) for v in decoded.values())
+        # A "real" body has more than just the 2-sample preamble.
+        if total <= 2:
+            continue
+        if best is None or total > best[0]:
+            best = (total, j)
+    return best[1] if best else None
 def _decode_waveform_samples(buf: bytes) -> Optional[dict]:
-    """Decode samples from the sig-A body starting at file offset 0x0f1f.
+    """Decode samples from the sig-A waveform body.
     Returns the raw decoder counts dict — geo LSB = 0.0003 in/s, mic in
     its own count unit (see :func:`mic_count_to_psi`). Returns None if
-    decoding fails.
+    no usable body is found.
+    Uses :func:`_find_waveform_body_offset` to locate the body — the
+    file-offset varies across events (~50% sit at the canonical
+    ``0x0f1f`` but the rest don't), so the previous hardcoded constant
+    silently produced 2-sample preamble-only output for half the corpus.
     """
-    if len(buf) < _BODY_START_SIG_A + 8:
+    off = _find_waveform_body_offset(buf)
+    if off is None:
         return None
-    body = buf[_BODY_START_SIG_A:]
-    return decode_waveform_v2(body)
+    return decode_waveform_v2(buf[off:])
 def geo_count_to_ips(count: int) -> float:
@@ -379,6 +438,10 @@ def read_idf_file(
         peak_tran = max((iv.peak_ips("Tran") for iv in intervals), default=0.0)
         peak_vert = max((iv.peak_ips("Vert") for iv in intervals), default=0.0)
         peak_long = max((iv.peak_ips("Long") for iv in intervals), default=0.0)
+        # Mic peak in psi — Thor stores per-interval mic ADC counts in the
+        # binary; convert the max count to psi via the per-count factor.
+        mic_peak_count = max((iv.peak_count("MicL") for iv in intervals), default=0)
+        mic_peak_psi = mic_count_to_psi(mic_peak_count) if mic_peak_count else None
         rep = IdfReport(
             serial_number=md.serial,
             event_type="Full Histogram",
@@ -392,7 +455,8 @@ def read_idf_file(
             vertical_ips=peak_vert,
             longitudinal_ips=peak_long,
             peak_vector_sum_ips=None,
-            mic_pspl_dbl=None,
+            mic_pspl_dbl=None,  # IDFH binary doesn't carry the dB(L) value
+            mic_pspl_psi=mic_peak_psi,
         )
         event = IdfEvent(
             serial=md.serial or "UNKNOWN",
@@ -430,6 +494,11 @@ def read_idf_file(
             arr = decoded.get(ch, [])
             return geo_count_to_ips(max((abs(v) for v in arr), default=0))
+        # Mic peak psi from binary: max absolute MicL ADC count × 2.14e-6 psi/count.
+        mic_arr = decoded.get("MicL", [])
+        mic_peak_count = max((abs(v) for v in mic_arr), default=0)
+        mic_peak_psi = mic_count_to_psi(mic_peak_count) if mic_peak_count else None
         peaks = IdfPeaks(
             transverse_ips=_peak_ips("Tran"),
             vertical_ips=_peak_ips("Vert"),
@@ -437,7 +506,9 @@ def read_idf_file(
             # PVS requires aligned per-sample √(T²+V²+L²); leave None — the
             # sidecar carries it and the bridge picks it up if present.
             peak_vector_sum_ips=None,
-            mic_pspl_dbl=None,
+            mic_pspl_dbl=None,  # binary IDFW doesn't carry the dB(L) value;
+                                # sidecar .txt fills it via IdfReport.from_dict
+            mic_pspl_psi=mic_peak_psi,
         )
         event = IdfEvent(
+27 -6
@@ -159,12 +159,23 @@ class IdfReport:
 @dataclass
 class IdfPeaks:
-    """Geophone + mic peak values for one Thor event. Native Thor units."""
+    """Geophone + mic peak values for one Thor event. Native Thor units.
+    Thor stores the mic peak in two parallel forms — ``mic_pspl_dbl`` is
+    what the sidecar's top-level ``MicPSPL`` header field carries (dB(L)),
+    used in the report header. ``mic_pspl_psi`` is the psi value derived
+    either from the IDFW sample table / IDFH interval column 9, or from
+    the binary mic counts (~2.14e-6 psi/count). Needed because the
+    BW-shaped ``PeakValues.micl`` consumed by ``event_hdf5.write_event_hdf5``
+    expects psi — feeding it dB(L) makes the h5 mic-chart scale factor
+    blow up.
+    """
     transverse_ips: Optional[float] = None  # in/s
     vertical_ips: Optional[float] = None  # in/s
     longitudinal_ips: Optional[float] = None  # in/s
     peak_vector_sum_ips: Optional[float] = None  # in/s
     mic_pspl_dbl: Optional[float] = None  # dB(L)
+    mic_pspl_psi: Optional[float] = None  # psi
 @dataclass
@@ -324,10 +335,14 @@ class IdfEvent:
         machinery without those code paths needing to know about Thor.
         Caveats of the bridge:
-          - ``mic_ppv`` on the produced Event carries Thor's dB(L) value
-            verbatim — the UI distinguishes via the ``device_family``
-            column (Phase 1). Don't run the BW psi→dBL converter on
-            Series IV rows.
+          - ``PeakValues.micl`` carries the mic peak in **psi** (matching
+            BW's convention) — set from :attr:`IdfPeaks.mic_pspl_psi`,
+            with a dB(L)→psi fallback when only the dB(L) value is
+            available. This is what the h5 writer's mic-scale-factor
+            logic needs. The dB(L) value still flows through
+            ``bw_report.mic.pspl_dbl`` (set by the
+            ``idf_to_bw_report`` adapter) and the renderer reads it
+            from there for the report header.
           - Many Thor-specific fields (Peak Acceleration / Displacement,
             sensor self-check, calibration) don't have a slot in
             ``Event``. The full IdfReport is preserved on the
@@ -349,11 +364,17 @@ class IdfEvent:
             minute=self.timestamp.minute,
             second=self.timestamp.second,
         )
+        # Resolve mic peak as psi. Priority: binary-derived mic_pspl_psi
+        # (set by read_idf_file) > dB(L)→psi fallback via standard formula
+        # (psi = 2.9e-9 × 10^(dBL/20)) > None.
+        mic_psi = self.peaks.mic_pspl_psi
+        if mic_psi is None and self.peaks.mic_pspl_dbl is not None:
+            mic_psi = 2.9e-9 * (10.0 ** (self.peaks.mic_pspl_dbl / 20.0))
         pv = PeakValues(
             tran=self.peaks.transverse_ips,
             vert=self.peaks.vertical_ips,
             long=self.peaks.longitudinal_ips,
-            micl=self.peaks.mic_pspl_dbl,  # dB(L) — see caveat above
+            micl=mic_psi,  # psi, matching BW's convention (h5 scaling depends on this)
             peak_vector_sum=self.peaks.peak_vector_sum_ips,
         )
         pi = ProjectInfo(
+1 -1
@@ -49,7 +49,7 @@ SIDECAR_KIND = "sfm.event"
 # bumped without a `pip install` re-run — leading to confusing stale
 # version stamps in sidecars. Bump this constant and CHANGELOG.md
 # together at release time.
-TOOL_VERSION = "0.21.0"
+TOOL_VERSION = "0.21.1"
 try:
     # Best-effort: prefer the installed metadata when it's NEWER than the
+1 -1
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 [project]
 name = "seismo-relay"
-version = "0.21.0"
+version = "0.21.1"
 description = "Python client and REST server for MiniMate Plus seismographs"
 requires-python = ">=3.10"
 dependencies = [
+51 -12
@@ -35,8 +35,15 @@ and, for each one:
 4. Writes the refreshed sidecar with the new ``bw_report``,
    bumped ``source.tool_version``, but preserved ``review`` block
    + the original ``captured_at`` timestamp.
-5. For IDFW events with decoded samples, regenerates the .h5
-   waveform file via the existing ``event_hdf5`` writer.
+5. Regenerates the .h5 waveform file via the existing
+   ``event_hdf5`` writer. For IDFW that's the decoded per-sample
+   stream; for IDFH it's a 1-sample-per-interval synthesised array
+   (peak ADC count per channel) so the renderer's bar-chart code
+   has data to group on. Mic peak psi from the binary is merged
+   onto the IdfEvent before the bridge so the h5 writer's per-count
+   mic scale factor lands on a sensible value (without this the
+   mic chart on Thor events plots dB(L)-as-pseudo-psi and shows
+   bomb-level numbers).
 Idempotent. Re-running it after a parser/adapter change just
 re-writes sidecars — no DB writes, no thor-watcher coordination.
@@ -231,10 +238,11 @@ def main(argv=None) -> int:
         new_sidecar["extensions"] = ext
         if args.dry_run:
+            will_write_h5 = (idf_samples or idf_intervals) and not args.skip_hdf5
             log.info("[DRY] %s/%s — would refresh sidecar (bw_report=%s, h5=%s)",
                      serial, path.name,
                      "wrote" if not has_bw_report else "refreshed",
-                     "would write" if (idf_samples and not args.skip_hdf5) else "skipped")
+                     "would write" if will_write_h5 else "skipped")
         else:
             event_file_io.write_sidecar(sidecar_path, new_sidecar)
             log.info("%s/%s — sidecar refreshed (bw_report=%s, intervals=%d)",
@@ -243,10 +251,15 @@ def main(argv=None) -> int:
                      len(idf_intervals) if idf_intervals else 0)
             refreshed += 1
-        # Regenerate .h5 for IDFW events with decoded samples by
-        # replaying the same IdfEvent → Event bridge save_imported_idf
-        # uses. IDFH events have no per-sample data; skip.
-        if idf_samples and not args.skip_hdf5 and not is_histogram:
+        # Regenerate .h5 by replaying the same IdfEvent → Event bridge
+        # save_imported_idf uses. For IDFW we write the decoded per-
+        # sample arrays. For IDFH we synthesise a 1-sample-per-interval
+        # array (peak ADC count per channel per interval) so the
+        # renderer's bar-chart code has something to group on.
+        # Pre-condition: either real samples (IDFW) or decoded intervals
+        # (IDFH). Skip otherwise.
+        have_data = bool(idf_samples) or bool(idf_intervals)
+        if have_data and not args.skip_hdf5:
             from sfm import event_hdf5
             hdf5_path = store.hdf5_path_for(serial, path.name)
             if args.dry_run:
@@ -255,13 +268,36 @@ def main(argv=None) -> int:
                 try:
                     from micromate import IdfEvent
                     from minimateplus.event_file_io import file_sha256
+                    # Bridge: parsed idf_report dict → IdfEvent →
+                    # minimateplus Event → write_event_hdf5. Mirrors
+                    # save_imported_idf steps 4-7.
                     idf_event = IdfEvent.from_report(report_dict, path.name)
+                    # Merge the binary-derived mic peak psi (only the
+                    # binary path knows the proper psi value; the .txt
+                    # carries dB(L)). Without this, the h5 writer's
+                    # per-count mic factor is computed against the
+                    # dB(L) value-as-pseudo-psi and the mic chart
+                    # scales wildly.
+                    if (binary_md is not None and res is not None
+                            and res.event.peaks.mic_pspl_psi is not None):
+                        idf_event.peaks.mic_pspl_psi = res.event.peaks.mic_pspl_psi
                     sha256 = file_sha256(path)
                     waveform_key = bytes.fromhex(sha256)[:16]
                     ev = idf_event.to_minimateplus_event(waveform_key)
+                    if is_histogram and idf_intervals:
+                        # 1 sample per interval per channel — same
+                        # synthesis save_imported_idf uses. The h5
+                        # writer's count×geo_fs/32768 conversion turns
+                        # each peak-ADC-count into the bar's physical
+                        # value.
+                        ev.raw_samples = {
+                            "Tran": [iv.peak_count("Tran") for iv in idf_intervals],
+                            "Vert": [iv.peak_count("Vert") for iv in idf_intervals],
+                            "Long": [iv.peak_count("Long") for iv in idf_intervals],
+                            "MicL": [iv.peak_count("MicL") for iv in idf_intervals],
+                        }
+                        ev.total_samples = ev.total_samples or len(idf_intervals)
+                    elif idf_samples:
                         ev.raw_samples = idf_samples
                         n_samp = max(
                             (len(idf_samples.get(ch, []))
@@ -269,6 +305,7 @@ def main(argv=None) -> int:
                             default=0,
                         )
                         ev.total_samples = ev.total_samples or n_samp
                     event_hdf5.write_event_hdf5(
                         hdf5_path, ev,
                         serial=serial,
@@ -277,8 +314,10 @@ def main(argv=None) -> int:
                         tool_version=event_file_io.TOOL_VERSION,
                     )
                     h5_written += 1
-                    log.debug("%s/%s — .h5 written (%d samples)",
-                              serial, path.name, n_samp)
+                    log.debug("%s/%s — .h5 written (%s)",
+                              serial, path.name,
+                              f"{len(idf_intervals)} intervals" if is_histogram
+                              else f"{sum(len(v) for v in (idf_samples or {}).values())} samples")
                 except Exception as exc:
                     log.warning("%s/%s — .h5 write failed: %s",
                                 serial, path.name, exc)
+91
@@ -0,0 +1,91 @@
"""Re-ingest a prod IDFW + IDFH via the patched save_imported_idf and
render both PDFs to confirm charts have data."""
from __future__ import annotations
import sys
import json
import datetime
import tempfile
from pathlib import Path
sys.path.insert(0, str(Path(__file__).resolve().parents[1]))
from sfm.waveform_store import WaveformStore
from sfm import report_pdf
import h5py

class FakeDb:
    def __init__(self, event):
        self.event = event

    def get_event(self, _id):
        return self.event


def to_ts_iso(ts):
    if ts is None:
        return None
    try:
        return datetime.datetime(ts.year, ts.month, ts.day, ts.hour, ts.minute, ts.second).isoformat()
    except Exception:
        return None


def render_case(idf_path: Path, serial: str, out_pdf: Path, h5_summary: bool = True):
    with tempfile.TemporaryDirectory() as td:
        store = WaveformStore(Path(td))
        ev, rec = store.save_imported_idf(
            idf_path.read_bytes(),
            idf_path,
            idf_report_text=None,  # production worst case: no .txt
        )
        print(f"=== {idf_path.name} ===")
        print(f" h5: {rec['hdf5_filename']}, sidecar: {rec['sidecar_filename']}")
        h5p = Path(td) / serial / f"{idf_path.name}.h5"
        if h5p.exists() and h5_summary:
            with h5py.File(h5p) as h:
                for ch in ("Tran", "Vert", "Long", "MicL"):
                    ds = h.get(f"samples/{ch}")
                    if ds is not None:
                        n = ds.shape[0]
                        mx = float(abs(ds[...]).max()) if n else 0
                        print(f" samples/{ch}: n={n} max_abs={mx:.5f}")
        record_type = "Histogram" if idf_path.suffix.upper() == ".IDFH" else "Waveform"
        fake_row = {
            "serial": serial,
            "blastware_filename": rec["filename"],
            "record_type": record_type,
            "timestamp": to_ts_iso(ev.timestamp),
            "sample_rate": ev.sample_rate,
            "project": ev.project_info.project if ev.project_info else None,
            "client": ev.project_info.client if ev.project_info else None,
            "operator": ev.project_info.operator if ev.project_info else None,
            "sensor_location": ev.project_info.sensor_location if ev.project_info else None,
            "created_at": None,
        }
        rd = report_pdf.gather_report_data(FakeDb(fake_row), store, event_id="test-1")
        print(f" ReportData: channels={ {k: len(v) for k,v in rd.channels.items()} }")
        if rd.is_histogram:
            print(f" histogram n_intervals={rd.histogram_n_intervals} interval_size={rd.histogram_interval_size}")
        pdf = report_pdf.render_event_report_pdf(rd)
        out_pdf.write_bytes(pdf)
        print(f" PDF: {out_pdf} ({len(pdf)} bytes)")


def main():
    out_dir = Path("/tmp/thor_render_test"); out_dir.mkdir(exist_ok=True)
    cases = [
        # IDFW that decoded to preamble-only under the old codec
        ("/home/serversdown/seismo-relay-prod-snap/waveforms/UM6047/UM6047_20250804154137.IDFW", "UM6047"),
        # IDFW that worked under the old codec (validates no regression)
        ("/home/serversdown/seismo-relay-prod-snap/waveforms/UM6047/UM6047_20250804104450.IDFW", "UM6047"),
        # IDFH histogram
        ("/home/serversdown/seismo-relay-prod-snap/waveforms/UM6047/UM6047_20250804190047.IDFH", "UM6047"),
    ]
    for path, serial in cases:
        render_case(Path(path), serial, out_dir / f"{Path(path).name}.pdf")


if __name__ == "__main__":
    main()
+31 -3
@@ -568,6 +568,16 @@ class WaveformStore:
         # precedence over the filename timestamp inside from_report().
         idf_event = IdfEvent.from_report(report_dict, source_path.name)
+        # The binary mic peak (psi) isn't carried through from_report() —
+        # IdfReport.from_dict only sees the .txt's dB(L) value. Pull the
+        # binary-derived ``mic_pspl_psi`` onto the typed IdfEvent so the
+        # downstream bridge can populate ``PeakValues.micl`` (psi-shaped)
+        # and the h5 writer's per-count mic factor lands at a sensible
+        # value. Without this, the h5 mic chart auto-scales against the
+        # dB(L) value-as-pseudo-psi and renders ~flat.
+        if binary_peaks is not None and binary_peaks.mic_pspl_psi is not None:
+            idf_event.peaks.mic_pspl_psi = binary_peaks.mic_pspl_psi
         # Operator-supplied serial_hint wins over the binary's filename
         # prefix when both are present (e.g. callers passing a known-good
         # serial that overrides a misnamed export).
@@ -600,10 +610,28 @@ class WaveformStore:
             n_samples = max((len(idf_samples.get(ch, [])) for ch in ("Tran", "Vert", "Long", "MicL")), default=0)
             ev.total_samples = ev.total_samples or n_samples
-        # 7. Write the .h5 clean-waveform file when we actually have samples.
-        #    Histograms (IDFH) don't have waveform samples — skip h5 for those.
+        # For IDFH histograms there are no per-sample waveform arrays — the
+        # device stores one peak ADC count per interval per channel. Synthesise
+        # a 1-sample-per-interval array so the existing h5+renderer pipeline
+        # (which groups samples down to ``n_intervals`` bars via max-per-group)
+        # produces a non-blank histogram chart. Each "sample" is the peak ADC
+        # count for that interval, so the h5 writer's ``count × geo_fs/32768``
+        # conversion yields the right physical value for the bar height.
+        if is_histogram and idf_intervals:
+            hist_samples = {
+                "Tran": [iv.peak_count("Tran") for iv in idf_intervals],
+                "Vert": [iv.peak_count("Vert") for iv in idf_intervals],
+                "Long": [iv.peak_count("Long") for iv in idf_intervals],
+                "MicL": [iv.peak_count("MicL") for iv in idf_intervals],
+            }
+            ev.raw_samples = hist_samples
+            ev.total_samples = ev.total_samples or len(idf_intervals)
+        # 7. Write the .h5 clean-waveform file when we have samples to write
+        #    (either the IDFW per-sample stream, or the IDFH synthesised per-
+        #    interval peak array). The renderer treats both shapes the same way.
         hdf5_filename: Optional[str] = None
-        if idf_samples is not None and not is_histogram:
+        if ev.raw_samples:
             hdf5_path = self.hdf5_path_for(serial, filename)
             try:
                 event_hdf5.write_event_hdf5(