fix(idf): decode from in-memory bytes during ingest

Bug shipped in v0.21.0: save_imported_idf called read_idf_file() with `source_path` (a bare filename like "UM12947_….IDFW") BEFORE writing the binary to disk. The codec did Path(path).read_bytes(), which resolved relative to /app and hit FileNotFoundError. The error was caught and logged as a warning, and ingest fell back to .txt-only: events still landed in the DB but lost the bw_report block and the .h5 waveform that the codec was supposed to produce.

Observed during a full re-forward from thor-watcher on 2026-05-29: every Thor event logged "binary codec failed for X: [Errno 2] No such file or directory" and got binary_decoded=False.

Fix:
- read_idf_file() gains a `data: Optional[bytes]` kwarg. When supplied, it skips the disk read and decodes the provided bytes directly. `path` stays required (used for the filename in error messages and for .IDFH vs .IDFW suffix detection); only the read is conditional. Backward compatible: existing positional callers (CLI scripts, tests) continue to work unchanged.
- save_imported_idf passes `data=idf_bytes`, since the bytes are already in memory from the multipart upload. The filesystem write still happens at step 5 of the existing flow; the codec just no longer depends on it.

Verified end-to-end against UM11719_20231219162723.IDFW from the example-data corpus: the ingest endpoint returns inserted=1, the log line shows binary_decoded=True + h5=...IDFW.h5, and there are no warnings.

Re-forward existing Thor events from thor-watcher after deploy to backfill the bw_report block; UPSERT preserves review state.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -326,7 +326,11 @@ class IdfReadResult:
     intervals: Optional[list] = None # list[IdfhInterval] for IDFH; None for IDFW
 
 
-def read_idf_file(path: Union[str, Path]) -> IdfReadResult:
+def read_idf_file(
+    path: Union[str, Path],
+    *,
+    data: Optional[bytes] = None,
+) -> IdfReadResult:
     """Parse a Thor ``.IDFW`` binary into an ``IdfEvent`` + decoded samples.
 
     Currently implements signature-A waveforms only. Signature-B
@@ -337,9 +341,14 @@ def read_idf_file(path: Union[str, Path]) -> IdfReadResult:
     Returns an :class:`IdfReadResult`. The caller converts int sample
     counts to physical units via :func:`geo_count_to_ips` /
     :func:`mic_count_to_psi`.
+
+    ``path`` is used for filename in error messages and ``.IDFH`` vs
+    ``.IDFW`` suffix detection. When ``data`` is supplied the disk
+    read is skipped — useful for ingest paths that already have the
+    bytes in memory and where the file may not exist on disk yet.
     """
     p = Path(path)
-    buf = p.read_bytes()
+    buf = data if data is not None else p.read_bytes()
 
     if len(buf) < 16 or buf[6:16] != _INSTANTEL_TAG + b"\x00":
         raise ValueError(f"{p.name}: not an IDF file (missing Instantel magic)")