ingest: preserve raw BW ASCII report (.TXT) alongside the binary

Previously the .TXT was parsed into the sidecar's bw_report projection
and then discarded at ingest time. Now save_imported_bw() writes it
to <store>/<serial>/<filename>_ASCII.TXT permanently.

Rationale: with BW Mail / Forwarding Agent being phased out of the
operator workflow, the XML/PDF/WMF those tools produce won't be
available, so the binary + .TXT (created by BW ACH itself) are our
only authoritative inputs going forward. Keeping the raw .TXT
unlocks:

- Parser bug fixes can be applied RETROACTIVELY by re-parsing the
  stored .TXT, instead of requiring a re-forward from the watcher
  PC (which lost the .TXT after BW ACH cleanup).
- Audit trail of what BW actually sent us, for debugging.
- The five known parser-PPV-miss events will be re-parseable once
  the regex fix lands (instead of staying broken indefinitely).

Storage cost: ~15 KB per event × 14k events = ~210 MB on the
existing prod corpus. Negligible.

Implementation:

- WaveformStore gains txt_path_for() + open_txt()
- save_imported_bw() writes the .TXT when bw_report_text is supplied
- sidecar source block records the txt_filename
- backfill_sidecars.py preserves txt_filename across regens
- New GET /db/events/{id}/ascii_report.txt endpoint serves it
  - Returns 404 for events ingested before this change (no .TXT in
    the store yet); re-forward to populate

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
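
The retroactive re-parse described in the rationale could look roughly like the sketch below. It is a minimal illustration, not code from this commit: it assumes only the `<store>/<serial>/<filename>_ASCII.TXT` layout stated above, and `iter_stored_reports`, `parse_ppv`, `reparse_all`, and the `PPV: <number>` line format are hypothetical stand-ins for the real parser and report format.

```python
import re
from pathlib import Path
from typing import Dict, Iterator, Optional, Tuple

# Suffix taken from the commit message: <filename>_ASCII.TXT
SUFFIX = "_ASCII.TXT"


def iter_stored_reports(store_root: Path) -> Iterator[Tuple[str, str, Path]]:
    """Yield (serial, filename, txt_path) for every preserved report."""
    for txt_path in sorted(store_root.glob(f"*/*{SUFFIX}")):
        serial = txt_path.parent.name
        filename = txt_path.name[: -len(SUFFIX)]
        yield serial, filename, txt_path


# Hypothetical parser: the real BW report format is NOT shown in this
# commit, so the "PPV: 1.27" line shape here is invented for illustration.
_PPV_RE = re.compile(r"^\s*PPV\s*[:=]\s*([0-9]+(?:\.[0-9]+)?)", re.MULTILINE)


def parse_ppv(text: str) -> Optional[float]:
    """Return the first PPV value found in the report text, or None."""
    match = _PPV_RE.search(text)
    return float(match.group(1)) if match else None


def reparse_all(store_root: Path) -> Dict[Tuple[str, str], Optional[float]]:
    """Re-run the (fixed) parser over every stored report, offline."""
    results: Dict[Tuple[str, str], Optional[float]] = {}
    for serial, filename, txt_path in iter_stored_reports(store_root):
        # errors="replace" keeps one badly encoded report from aborting the run.
        text = txt_path.read_text(encoding="utf-8", errors="replace")
        results[(serial, filename)] = parse_ppv(text)
    return results
```

Calling `reparse_all(Path("/path/to/store"))` after a parser fix would return a mapping keyed by `(serial, filename)`. Events whose value is still `None` are the ones the fix did not cover.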
@@ -108,11 +108,30 @@ class WaveformStore:
         """Return absolute path to the .h5 clean-waveform file for a given event."""
         return self._serial_dir(serial) / f"{filename}.h5"

+    def txt_path_for(self, serial: str, filename: str) -> Path:
+        """Return absolute path to the preserved BW ASCII report (.TXT)
+        for a given event.
+
+        We name it ``<filename>_ASCII.TXT`` to match BW's own filename
+        convention in the ACH folder. Saved at ingest time alongside
+        the binary so the parser bug fixes can be applied retroactively
+        by re-parsing without needing to re-forward from the watcher PC.
+        """
+        return self._serial_dir(serial) / f"{filename}_ASCII.TXT"
+
     def open_blastware(self, serial: str, filename: str) -> Optional[Path]:
         """Return absolute path to an existing event file or None."""
         bw_path, _ = self.paths_for(serial, filename)
         return bw_path if bw_path.exists() else None

+    def open_txt(self, serial: str, filename: str) -> Optional[Path]:
+        """Return absolute path to the preserved BW ASCII report for an
+        event, or None if the .TXT wasn't saved at ingest time (events
+        ingested before .TXT preservation landed will show None until
+        re-forwarded)."""
+        p = self.txt_path_for(serial, filename)
+        return p if p.exists() else None
+
     # ── save / load ─────────────────────────────────────────────────────────────

     def save(
@@ -357,6 +376,28 @@ class WaveformStore:
         filesize = bw_path.stat().st_size
         sha256 = event_file_io.file_sha256(bw_path)

+        # 1b. preserve the raw BW ASCII report (.TXT) alongside the binary.
+        # Saved at <root>/<serial>/<filename>_ASCII.TXT. Lets us re-parse
+        # offline after parser fixes without needing to re-forward from
+        # the watcher PC. Negligible storage cost (~15 KB per event).
+        # Skipped silently when no report was supplied (live download path,
+        # manual upload without paired TXT).
+        txt_filename: Optional[str] = None
+        if bw_report_text is not None:
+            try:
+                txt_path = self.txt_path_for(serial, filename)
+                if isinstance(bw_report_text, bytes):
+                    txt_path.write_bytes(bw_report_text)
+                else:
+                    txt_path.write_text(bw_report_text)
+                txt_filename = txt_path.name
+            except Exception as exc:
+                log.warning(
+                    "save_imported_bw: failed to save TXT for %s: %s — "
+                    "continuing without it",
+                    filename, exc,
+                )
+
         # 2. write the .h5 clean-waveform file from the parsed Event.
         # Note: peaks here are computed from raw samples (the BW file
         # doesn't carry the device-authoritative 0C peaks). Best-effort.
@@ -393,6 +434,7 @@ class WaveformStore:
             blastware_sha256=sha256,
             source_kind="bw-import",
             a5_pickle_filename=None,
+            txt_filename=txt_filename,
             review=existing_review,
             bw_report=bw_report,
         )
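
To make the contract in this diff concrete, here is a self-contained sketch of the save and lookup behaviour, including the 404 case mentioned in the commit message. It is not the real `WaveformStore`: `MiniStore`, `save_txt`, and `ascii_report_response` are hypothetical names, the directory-creating `_serial_dir` and the explicit UTF-8 encoding are assumptions, and the lookup takes `serial`/`filename` directly because the mapping from the endpoint's `{id}` is not shown in the diff.

```python
import logging
from pathlib import Path
from typing import Optional, Tuple, Union

log = logging.getLogger(__name__)


class MiniStore:
    """Simplified stand-in for WaveformStore (only the .TXT helpers)."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def _serial_dir(self, serial: str) -> Path:
        # Assumption: the per-serial directory is created on demand.
        d = self.root / serial
        d.mkdir(parents=True, exist_ok=True)
        return d

    def txt_path_for(self, serial: str, filename: str) -> Path:
        return self._serial_dir(serial) / f"{filename}_ASCII.TXT"

    def open_txt(self, serial: str, filename: str) -> Optional[Path]:
        p = self.txt_path_for(serial, filename)
        return p if p.exists() else None

    def save_txt(
        self, serial: str, filename: str, report: Union[str, bytes, None]
    ) -> Optional[str]:
        """Best-effort write; returns the stored file name or None."""
        if report is None:
            return None  # no report supplied: skip silently
        try:
            p = self.txt_path_for(serial, filename)
            if isinstance(report, bytes):
                p.write_bytes(report)
            else:
                # Assumption: pin the encoding so the stored file does not
                # depend on the platform default.
                p.write_text(report, encoding="utf-8")
            return p.name
        except OSError as exc:
            log.warning("failed to save TXT for %s: %s", filename, exc)
            return None


def ascii_report_response(
    store: MiniStore, serial: str, filename: str
) -> Tuple[int, bytes]:
    """Framework-free sketch of the endpoint: (status code, body)."""
    p = store.open_txt(serial, filename)
    if p is None:
        # Event predates .TXT preservation, or no report was supplied.
        return 404, b"ASCII report not preserved for this event"
    return 200, p.read_bytes()
```

The sketch catches `OSError` where the diff catches `Exception`. Either way the design choice is the same: a failed .TXT write is logged and the ingest continues, with `txt_filename` left as `None` in the sidecar.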