ingest: preserve raw BW ASCII report (.TXT) alongside the binary

Previously the .TXT was parsed into the sidecar's bw_report projection
and then discarded at ingest time. Now save_imported_bw() writes it
to <store>/<serial>/<filename>_ASCII.TXT permanently.

Rationale: with BW Mail / Forwarding Agent being phased out of the
operator workflow, the XML/PDF/WMF those tools produce won't be
available, so the binary + .TXT (created by BW ACH itself) are our
only authoritative inputs going forward. Keeping the raw .TXT
unlocks:

- Parser bug fixes can be applied RETROACTIVELY by re-parsing the
  stored .TXT, instead of requiring a re-forward from the watcher
  PC (which lost the .TXT after BW ACH cleanup).
- Audit trail of what BW actually sent us, for debugging.
- The five known parser-PPV-miss events will be re-parseable once
  the regex fix lands (instead of staying broken indefinitely).

Storage cost: ~15 KB per event × 14k events = ~210 MB on the
existing prod corpus. Negligible.

Implementation:

- WaveformStore gains txt_path_for() + open_txt()
- save_imported_bw() writes the .TXT when bw_report_text is supplied
- sidecar source block records the txt_filename
- backfill_sidecars.py preserves txt_filename across regens
- New GET /db/events/{id}/ascii_report.txt endpoint serves it
  - Returns 404 for events ingested before this change (no .TXT in
    the store yet); re-forward to populate

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
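
The retroactive re-parse that the commit message describes could look roughly like the sketch below. It assumes only the `<store>/<serial>/<filename>_ASCII.TXT` layout stated above; `iter_preserved_reports`, `reparse_all`, and the `parse_report` callable are hypothetical names for illustration, since the real parser entry point is not shown in this commit.

```python
from pathlib import Path
from typing import Callable, Dict, Iterator, Tuple

ASCII_SUFFIX = "_ASCII.TXT"  # naming convention introduced by this commit


def iter_preserved_reports(store_root: Path) -> Iterator[Tuple[str, str, Path]]:
    """Yield (serial, bw_filename, txt_path) for every preserved report.

    Assumes the <store>/<serial>/<filename>_ASCII.TXT layout.
    """
    for txt_path in sorted(store_root.glob(f"*/*{ASCII_SUFFIX}")):
        serial = txt_path.parent.name
        # Strip the suffix to recover the binary's filename.
        bw_filename = txt_path.name[: -len(ASCII_SUFFIX)]
        yield serial, bw_filename, txt_path


def reparse_all(
    store_root: Path,
    parse_report: Callable[[bytes], object],
) -> Dict[Tuple[str, str], object]:
    """Re-run a parser over every stored report.

    `parse_report` is a hypothetical callable that takes the raw bytes
    and returns the parsed projection.
    """
    results = {}
    for serial, bw_filename, txt_path in iter_preserved_reports(store_root):
        results[(serial, bw_filename)] = parse_report(txt_path.read_bytes())
    return results
```

Reading the files as bytes sidesteps any guess about the report's text encoding; decoding is left to whatever parser is plugged in.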
@@ -8,6 +8,8 @@ All notable changes to seismo-relay are documented here.

### Added

- **Raw BW ASCII report (.TXT) preservation.** `save_imported_bw` now writes the paired `_ASCII.TXT` to `<store>/<serial>/<filename>_ASCII.TXT` alongside the binary at ingest time. Previously the .TXT was parsed into the sidecar's `bw_report` projection and then discarded, so parser bug fixes couldn't be applied retroactively without re-forwarding from the watcher PC. The raw .TXT now lives in the waveform store permanently (~15 KB per event; ~210 MB total for a 14k-event store; negligible). The sidecar's `source.txt_filename` field records the saved filename; `backfill_sidecars.py` preserves it across regens. The new `GET /db/events/{id}/ascii_report.txt` endpoint serves the raw .TXT for any event ingested with a paired .TXT after this change. Events ingested before 2026-05-27 return 404 from that endpoint until re-forwarded. Architectural rationale: with BW Mail / Forwarding Agent being phased out of the operator workflow, the XML/PDF/WMF that those tools produced will no longer be available, so the binary + .TXT (created by BW ACH itself) are our only authoritative inputs going forward.

- **Event Report PDF generation**: `GET /db/events/{id}/report.pdf` returns a single-page letter-portrait PDF for any event with waveform data on disk. Covers every field a Blastware Event Report includes: header metadata (date/time, trigger source, range, sample rate, project/client/operator/location, serial+firmware, battery, calibration, file name), microphone block (PSPL in dB(L) + psi, ZC freq, channel test), per-channel stats table (rows differ for waveform vs histogram), Peak Vector Sum, and the 4-channel plot. Iterated against real Blastware reference PDFs (uploaded to `example-events/pdfsnstuff/`):
  - **Waveform layout**: header shows Date/Time, Trigger Source, Range, Sample Rate; stats table has PPV / ZC Freq / Time (Rel. to Trig) / Peak Accel / Peak Disp / Sensor Check; bottom plot is 4-channel line waveform (MicL top → Tran bottom), shared time axis in seconds, dashed trigger line + triangle marker at t=0, symmetric Y on geo channels, zero-anchored on mic, "0.0" baseline label on right per BW convention; footer shows `Time X sec/div Amplitude Geo: Y in/s/div Mic: 0.001 psi(L)/div` and the trigger window `▶━━◀` marker. USBM RI8507/OSMRE compliance chart placeholder upper-right.
  - **Histogram layout**: header shows Start / Finish / Intervals At Size / Range / Sample Rate (no Trigger Source, since histograms aren't triggered); NO USBM chart; stats table has PPV / ZC Freq / Date / Time / Sensor Check; bottom plot is per-interval bar chart, Y-axis 0-to-peak (never negative), 0.0 baseline at the bottom; footer shows `Time INTERVAL_SIZE /div Amplitude Geo: Y in/s/div Mic: 0.001 psi(L)/div`.

@@ -332,6 +332,7 @@ def event_to_sidecar_dict(
    blastware_filesize: int,
    blastware_sha256: str,
    source_kind: str = "sfm-live",
    txt_filename: Optional[str] = None,
    a5_pickle_filename: Optional[str] = None,
    tool_version: str = _TOOL_VERSION_DEFAULT,
    captured_at: Optional[datetime.datetime] = None,
@@ -448,6 +449,7 @@ def event_to_sidecar_dict(
            "captured_at": captured_at.isoformat() + "Z" if captured_at.tzinfo is None else captured_at.isoformat(),
            "tool_version": tool_version,
            "a5_pickle_filename": a5_pickle_filename,
            "txt_filename": txt_filename,
        },

        "review": review or {
@@ -300,12 +300,17 @@ def main(argv=None) -> int:
        preserved_review = None
        preserved_ext = None
        preserved_bw_report = None
        preserved_txt_fn = None
        if sidecar_path.exists():
            try:
                _existing = event_file_io.read_sidecar(sidecar_path)
                preserved_review = _existing.get("review")
                preserved_ext = _existing.get("extensions")
                preserved_bw_report = _existing.get("bw_report")
                # Preserve txt_filename so backfills don't blank out the
                # pointer to the saved raw .TXT (events ingested after
                # 2026-05-27 have this).
                preserved_txt_fn = (_existing.get("source") or {}).get("txt_filename")
            except Exception:
                pass

@@ -334,6 +339,7 @@ def main(argv=None) -> int:
                blastware_sha256=bw_sha,
                source_kind=source_kind,
                a5_pickle_filename=a5_filename,
                txt_filename=preserved_txt_fn,
                review=preserved_review,
                extensions=preserved_ext,
            )

@@ -2178,6 +2178,39 @@ def db_event_blastware_file(event_id: str) -> FileResponse:
    )


@app.get("/db/events/{event_id}/ascii_report.txt")
def db_event_ascii_report_txt(event_id: str):
    """Serve the raw BW ASCII report (.TXT) for an event, when preserved.

    Returns 404 for events ingested before the .TXT-preservation feature
    landed (2026-05-27); those events have only the parsed ``bw_report``
    block in the sidecar, not the raw .TXT. Re-forwarding from the
    watcher PC will populate the .TXT going forward.
    """
    row = _get_db().get_event(event_id)
    if row is None:
        raise HTTPException(status_code=404, detail=f"Event {event_id} not found")
    serial = row.get("serial")
    filename = row.get("blastware_filename")
    if not serial or not filename:
        raise HTTPException(status_code=404, detail="Event has no associated BW file")
    txt_path = _get_store().open_txt(serial, filename)
    if txt_path is None:
        raise HTTPException(
            status_code=404,
            detail=(
                f"Raw .TXT not preserved for {filename}. Events ingested "
                "before 2026-05-27 don't have it; re-forward from the "
                "watcher PC to populate."
            ),
        )
    return FileResponse(
        path=str(txt_path),
        media_type="text/plain",
        filename=txt_path.name,
    )


@app.get("/db/events/{event_id}/report.pdf")
def db_event_report_pdf(event_id: str):
    """Render an Instantel-style Event Report as a PDF.

@@ -108,11 +108,30 @@ class WaveformStore:
        """Return absolute path to the .h5 clean-waveform file for a given event."""
        return self._serial_dir(serial) / f"{filename}.h5"

    def txt_path_for(self, serial: str, filename: str) -> Path:
        """Return absolute path to the preserved BW ASCII report (.TXT)
        for a given event.

        We name it ``<filename>_ASCII.TXT`` to match BW's own filename
        convention in the ACH folder. Saved at ingest time alongside
        the binary so the parser bug fixes can be applied retroactively
        by re-parsing without needing to re-forward from the watcher PC.
        """
        return self._serial_dir(serial) / f"{filename}_ASCII.TXT"

    def open_blastware(self, serial: str, filename: str) -> Optional[Path]:
        """Return absolute path to an existing event file or None."""
        bw_path, _ = self.paths_for(serial, filename)
        return bw_path if bw_path.exists() else None

    def open_txt(self, serial: str, filename: str) -> Optional[Path]:
        """Return absolute path to the preserved BW ASCII report for an
        event, or None if the .TXT wasn't saved at ingest time (events
        ingested before .TXT preservation landed will show None until
        re-forwarded)."""
        p = self.txt_path_for(serial, filename)
        return p if p.exists() else None

    # ── save / load ─────────────────────────────────────────────────────────────

    def save(
@@ -357,6 +376,28 @@ class WaveformStore:
        filesize = bw_path.stat().st_size
        sha256 = event_file_io.file_sha256(bw_path)

        # 1b. preserve the raw BW ASCII report (.TXT) alongside the binary.
        # Saved at <root>/<serial>/<filename>_ASCII.TXT. Lets us re-parse
        # offline after parser fixes without needing to re-forward from
        # the watcher PC. Negligible storage cost (~15 KB per event).
        # Skipped silently when no report was supplied (live download path,
        # manual upload without paired TXT).
        txt_filename: Optional[str] = None
        if bw_report_text is not None:
            try:
                txt_path = self.txt_path_for(serial, filename)
                if isinstance(bw_report_text, bytes):
                    txt_path.write_bytes(bw_report_text)
                else:
                    txt_path.write_text(bw_report_text)
                txt_filename = txt_path.name
            except Exception as exc:
                log.warning(
                    "save_imported_bw: failed to save TXT for %s: %s; "
                    "continuing without it",
                    filename, exc,
                )

        # 2. write the .h5 clean-waveform file from the parsed Event.
        # Note: peaks here are computed from raw samples (the BW file
        # doesn't carry the device-authoritative 0C peaks). Best-effort.
@@ -393,6 +434,7 @@ class WaveformStore:
            blastware_sha256=sha256,
            source_kind="bw-import",
            a5_pickle_filename=None,
            txt_filename=txt_filename,
            review=existing_review,
            bw_report=bw_report,
        )

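
For consumers of the sidecar, the `source.txt_filename` pointer written above can be resolved back to a file on disk. The following is a minimal sketch, not code from this commit: `txt_path_from_sidecar` is a hypothetical helper, and it assumes (as the `txt_path.name` assignment in `save_imported_bw` suggests) that the field holds a bare filename relative to the event's serial directory.

```python
from pathlib import Path
from typing import Optional


def txt_path_from_sidecar(sidecar: dict, serial_dir: Path) -> Optional[Path]:
    """Resolve the preserved .TXT for a sidecar, or None if absent.

    Tolerates sidecars with no "source" block or a null txt_filename,
    using the same `(... .get("source") or {}).get("txt_filename")`
    lookup as the backfill script.
    """
    txt_filename = (sidecar.get("source") or {}).get("txt_filename")
    if not txt_filename:
        # Event predates .TXT preservation, or no report was supplied.
        return None
    candidate = serial_dir / txt_filename
    # The pointer may outlive the file, so confirm it still exists.
    return candidate if candidate.exists() else None
```

Returning `None` for both a missing pointer and a missing file matches the behavior of `open_txt`, so callers can treat the two cases the same way.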