ingest: preserve raw BW ASCII report (.TXT) alongside the binary
Previously the .TXT was parsed into the sidecar's bw_report projection
and then discarded at ingest time. Now save_imported_bw() writes it
to <store>/<serial>/<filename>_ASCII.TXT permanently.
Rationale: with BW Mail / Forwarding Agent being phased out of the
operator workflow, the XML/PDF/WMF those tools produce won't be
available — the binary + .TXT (created by BW ACH itself) are our
only authoritative inputs going forward. Keeping the raw .TXT
unlocks:
- Parser bug fixes can be applied RETROACTIVELY by re-parsing the
stored .TXT, instead of requiring a re-forward from the watcher
PC (which lost the .TXT after BW ACH cleanup).
- Audit trail of what BW actually sent us, for debugging.
- The five known parser-PPV-miss events will be re-parseable once
the regex fix lands (instead of staying broken indefinitely).
Storage cost: ~15 KB per event × 14k events = ~210 MB on the
existing prod corpus. Negligible.
Implementation:
- WaveformStore gains txt_path_for() + open_txt()
- save_imported_bw() writes the .TXT when bw_report_text is supplied
- sidecar source block records the txt_filename
- backfill_sidecars.py preserves txt_filename across regens
- New GET /db/events/{id}/ascii_report.txt endpoint serves it
- Returns 404 for events ingested before this change (no .TXT in
the store yet) — re-forward to populate
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -332,6 +332,7 @@ def event_to_sidecar_dict(
|
||||
blastware_filesize: int,
|
||||
blastware_sha256: str,
|
||||
source_kind: str = "sfm-live",
|
||||
txt_filename: Optional[str] = None,
|
||||
a5_pickle_filename: Optional[str] = None,
|
||||
tool_version: str = _TOOL_VERSION_DEFAULT,
|
||||
captured_at: Optional[datetime.datetime] = None,
|
||||
@@ -448,6 +449,7 @@ def event_to_sidecar_dict(
|
||||
"captured_at": captured_at.isoformat() + "Z" if captured_at.tzinfo is None else captured_at.isoformat(),
|
||||
"tool_version": tool_version,
|
||||
"a5_pickle_filename": a5_pickle_filename,
|
||||
"txt_filename": txt_filename,
|
||||
},
|
||||
|
||||
"review": review or {
|
||||
|
||||
Reference in New Issue
Block a user