ingest: preserve raw BW ASCII report (.TXT) alongside the binary

Previously the .TXT was parsed into the sidecar's bw_report projection
and then discarded at ingest time.  Now save_imported_bw() writes it
to <store>/<serial>/<filename>_ASCII.TXT permanently.

Rationale: with BW Mail / Forwarding Agent being phased out of the
operator workflow, the XML/PDF/WMF those tools produce won't be
available — the binary + .TXT (created by BW ACH itself) are our
only authoritative inputs going forward.  Keeping the raw .TXT
unlocks:

  - Parser bug fixes can be applied RETROACTIVELY by re-parsing the
    stored .TXT, instead of requiring a re-forward from the watcher
    PC (which lost the .TXT after BW ACH cleanup).
  - Audit trail of what BW actually sent us, for debugging.
  - The five known parser-PPV-miss events will be re-parseable once
    the regex fix lands (instead of staying broken indefinitely).

Storage cost: ~15 KB per event × 14k events = ~210 MB on the
existing prod corpus.  Negligible.

Implementation:
  - WaveformStore gains txt_path_for() + open_txt()
  - save_imported_bw() writes the .TXT when bw_report_text is supplied
  - sidecar source block records the txt_filename
  - backfill_sidecars.py preserves txt_filename across regens
  - New GET /db/events/{id}/ascii_report.txt endpoint serves it
  - Returns 404 for events ingested before this change (no .TXT in
    the store yet) — re-forward to populate

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-05-27 20:01:12 +00:00
parent dfbc8b8520
commit ad2b553c7b
5 changed files with 85 additions and 0 deletions
+2
View File
@@ -8,6 +8,8 @@ All notable changes to seismo-relay are documented here.
### Added
- **Raw BW ASCII report (.TXT) preservation.** `save_imported_bw` now writes the paired `_ASCII.TXT` to `<store>/<serial>/<filename>_ASCII.TXT` alongside the binary at ingest time. Previously the .TXT was parsed into the sidecar's `bw_report` projection and then discarded — meaning parser bug fixes couldn't be applied retroactively without re-forwarding from the watcher PC. Now the raw .TXT lives in the waveform store permanently (~15 KB per event; ~210 MB total for a 14k-event store; negligible). Sidecar's `source.txt_filename` field records the saved path; backfill_sidecars preserves it across regens. New `GET /db/events/{id}/ascii_report.txt` endpoint serves the raw .TXT for any event ingested after this change. Events ingested before today still return 404 from that endpoint until re-forwarded. Architectural rationale: with BW Mail / Forwarding Agent being phased out of the operator workflow, the XML/PDF/WMF that those tools produced are no longer available — the binary + .TXT (created by BW ACH itself) are our authoritative source for everything going forward.
- **Event Report PDF generation** — `GET /db/events/{id}/report.pdf` returns a single-page letter-portrait PDF for any event with waveform data on disk. Covers every field a Blastware Event Report includes: header metadata (date/time, trigger source, range, sample rate, project/client/operator/location, serial+firmware, battery, calibration, file name), microphone block (PSPL in dB(L) + psi, ZC freq, channel test), per-channel stats table (rows differ for waveform vs histogram), Peak Vector Sum, and the 4-channel plot. Iterated against real Blastware reference PDFs (uploaded to `example-events/pdfsnstuff/`):
- **Waveform layout**: header shows Date/Time, Trigger Source, Range, Sample Rate; stats table has PPV / ZC Freq / Time (Rel. to Trig) / Peak Accel / Peak Disp / Sensor Check; bottom plot is 4-channel line waveform (MicL top → Tran bottom), shared time axis in seconds, dashed trigger line + triangle marker at t=0, symmetric Y on geo channels, zero-anchored on mic, "0.0" baseline label on right per BW convention; footer shows `Time X sec/div Amplitude Geo: Y in/s/div Mic: 0.001 psi(L)/div` and the trigger window `▶━━◀` marker. USBM RI8507/OSMRE compliance chart placeholder upper-right.
- **Histogram layout**: header shows Start / Finish / Intervals At Size / Range / Sample Rate (no Trigger Source — histograms aren't triggered); NO USBM chart; stats table has PPV / ZC Freq / Date / Time / Sensor Check; bottom plot is per-interval bar chart, Y-axis 0-to-peak (never negative), 0.0 baseline at the bottom; footer shows `Time INTERVAL_SIZE /div Amplitude Geo: Y in/s/div Mic: 0.001 psi(L)/div`.
+2
View File
@@ -332,6 +332,7 @@ def event_to_sidecar_dict(
blastware_filesize: int,
blastware_sha256: str,
source_kind: str = "sfm-live",
txt_filename: Optional[str] = None,
a5_pickle_filename: Optional[str] = None,
tool_version: str = _TOOL_VERSION_DEFAULT,
captured_at: Optional[datetime.datetime] = None,
@@ -448,6 +449,7 @@ def event_to_sidecar_dict(
"captured_at": captured_at.isoformat() + "Z" if captured_at.tzinfo is None else captured_at.isoformat(),
"tool_version": tool_version,
"a5_pickle_filename": a5_pickle_filename,
"txt_filename": txt_filename,
},
"review": review or {
+6
View File
@@ -300,12 +300,17 @@ def main(argv=None) -> int:
preserved_review = None
preserved_ext = None
preserved_bw_report = None
preserved_txt_fn = None
if sidecar_path.exists():
try:
_existing = event_file_io.read_sidecar(sidecar_path)
preserved_review = _existing.get("review")
preserved_ext = _existing.get("extensions")
preserved_bw_report = _existing.get("bw_report")
# Preserve txt_filename so backfills don't blank out the
# pointer to the saved raw .TXT (events ingested after
# 2026-05-27 have this).
preserved_txt_fn = (_existing.get("source") or {}).get("txt_filename")
except Exception:
pass
@@ -334,6 +339,7 @@ def main(argv=None) -> int:
blastware_sha256=bw_sha,
source_kind=source_kind,
a5_pickle_filename=a5_filename,
txt_filename=preserved_txt_fn,
review=preserved_review,
extensions=preserved_ext,
)
+33
View File
@@ -2178,6 +2178,39 @@ def db_event_blastware_file(event_id: str) -> FileResponse:
)
@app.get("/db/events/{event_id}/ascii_report.txt")
def db_event_ascii_report_txt(event_id: str):
"""Serve the raw BW ASCII report (.TXT) for an event, when preserved.
Returns 404 for events ingested before the .TXT-preservation feature
landed (2026-05-27) those events have only the parsed ``bw_report``
block in the sidecar, not the raw .TXT. Re-forwarding from the
watcher PC will populate the .TXT going forward.
"""
row = _get_db().get_event(event_id)
if row is None:
raise HTTPException(status_code=404, detail=f"Event {event_id} not found")
serial = row.get("serial")
filename = row.get("blastware_filename")
if not serial or not filename:
raise HTTPException(status_code=404, detail="Event has no associated BW file")
txt_path = _get_store().open_txt(serial, filename)
if txt_path is None:
raise HTTPException(
status_code=404,
detail=(
f"Raw .TXT not preserved for {filename}. Events ingested "
"before 2026-05-27 don't have it; re-forward from the "
"watcher PC to populate."
),
)
return FileResponse(
path=str(txt_path),
media_type="text/plain",
filename=txt_path.name,
)
@app.get("/db/events/{event_id}/report.pdf")
def db_event_report_pdf(event_id: str):
"""Render an Instantel-style Event Report as a PDF.
+42
View File
@@ -108,11 +108,30 @@ class WaveformStore:
"""Return absolute path to the .h5 clean-waveform file for a given event."""
return self._serial_dir(serial) / f"{filename}.h5"
def txt_path_for(self, serial: str, filename: str) -> Path:
"""Return absolute path to the preserved BW ASCII report (.TXT)
for a given event.
We name it ``<filename>_ASCII.TXT`` to match BW's own filename
convention in the ACH folder. Saved at ingest time alongside
the binary so the parser bug fixes can be applied retroactively
by re-parsing without needing to re-forward from the watcher PC.
"""
return self._serial_dir(serial) / f"{filename}_ASCII.TXT"
def open_blastware(self, serial: str, filename: str) -> Optional[Path]:
"""Return absolute path to an existing event file or None."""
bw_path, _ = self.paths_for(serial, filename)
return bw_path if bw_path.exists() else None
def open_txt(self, serial: str, filename: str) -> Optional[Path]:
"""Return absolute path to the preserved BW ASCII report for an
event, or None if the .TXT wasn't saved at ingest time (events
ingested before .TXT preservation landed will show None until
re-forwarded)."""
p = self.txt_path_for(serial, filename)
return p if p.exists() else None
# ── save / load ─────────────────────────────────────────────────────────────
def save(
@@ -357,6 +376,28 @@ class WaveformStore:
filesize = bw_path.stat().st_size
sha256 = event_file_io.file_sha256(bw_path)
# 1b. preserve the raw BW ASCII report (.TXT) alongside the binary.
# Saved at <root>/<serial>/<filename>_ASCII.TXT. Lets us re-parse
# offline after parser fixes without needing to re-forward from
# the watcher PC. Negligible storage cost (~15 KB per event).
# Skipped silently when no report was supplied (live download path,
# manual upload without paired TXT).
txt_filename: Optional[str] = None
if bw_report_text is not None:
try:
txt_path = self.txt_path_for(serial, filename)
if isinstance(bw_report_text, bytes):
txt_path.write_bytes(bw_report_text)
else:
txt_path.write_text(bw_report_text)
txt_filename = txt_path.name
except Exception as exc:
log.warning(
"save_imported_bw: failed to save TXT for %s: %s"
"continuing without it",
filename, exc,
)
# 2. write the .h5 clean-waveform file from the parsed Event.
# Note: peaks here are computed from raw samples (the BW file
# doesn't carry the device-authoritative 0C peaks). Best-effort.
@@ -393,6 +434,7 @@ class WaveformStore:
blastware_sha256=sha256,
source_kind="bw-import",
a5_pickle_filename=None,
txt_filename=txt_filename,
review=existing_review,
bw_report=bw_report,
)