ingest: preserve raw BW ASCII report (.TXT) alongside the binary
Previously the .TXT was parsed into the sidecar's bw_report projection
and then discarded at ingest time. Now save_imported_bw() writes it
to <store>/<serial>/<filename>_ASCII.TXT permanently.
Rationale: with BW Mail / Forwarding Agent being phased out of the
operator workflow, the XML/PDF/WMF those tools produce won't be
available — the binary + .TXT (created by BW ACH itself) are our
only authoritative inputs going forward. Keeping the raw .TXT
unlocks:
- Parser bug fixes can be applied RETROACTIVELY by re-parsing the
stored .TXT, instead of requiring a re-forward from the watcher
PC (which lost the .TXT after BW ACH cleanup).
- Audit trail of what BW actually sent us, for debugging.
- The five known parser-PPV-miss events will be re-parseable once
the regex fix lands (instead of staying broken indefinitely).
Storage cost: ~15 KB per event × 14k events = ~210 MB on the
existing prod corpus. Negligible.
Implementation:
- WaveformStore gains txt_path_for() + open_txt()
- save_imported_bw() writes the .TXT when bw_report_text is supplied
- sidecar source block records the txt_filename
- backfill_sidecars.py preserves txt_filename across regens
- New GET /db/events/{id}/ascii_report.txt endpoint serves it
- Returns 404 for events ingested before this change (no .TXT in
the store yet) — re-forward to populate
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -300,12 +300,17 @@ def main(argv=None) -> int:
|
||||
preserved_review = None
|
||||
preserved_ext = None
|
||||
preserved_bw_report = None
|
||||
preserved_txt_fn = None
|
||||
if sidecar_path.exists():
|
||||
try:
|
||||
_existing = event_file_io.read_sidecar(sidecar_path)
|
||||
preserved_review = _existing.get("review")
|
||||
preserved_ext = _existing.get("extensions")
|
||||
preserved_bw_report = _existing.get("bw_report")
|
||||
# Preserve txt_filename so backfills don't blank out the
|
||||
# pointer to the saved raw .TXT (events ingested after
|
||||
# 2026-05-27 have this).
|
||||
preserved_txt_fn = (_existing.get("source") or {}).get("txt_filename")
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
@@ -334,6 +339,7 @@ def main(argv=None) -> int:
|
||||
blastware_sha256=bw_sha,
|
||||
source_kind=source_kind,
|
||||
a5_pickle_filename=a5_filename,
|
||||
txt_filename=preserved_txt_fn,
|
||||
review=preserved_review,
|
||||
extensions=preserved_ext,
|
||||
)
|
||||
|
||||
Reference in New Issue
Block a user