fix(idf): decode from in-memory bytes during ingest

Bug shipped in v0.21.0: save_imported_idf called read_idf_file()
with `source_path` (a bare filename like "UM12947_….IDFW") BEFORE
writing the binary to disk.  The codec called Path(path).read_bytes(),
which resolved the bare name against the working directory (/app) and
raised FileNotFoundError.  The error was caught and logged as a
warning, and ingest fell back to
.txt-only — events still landed in the DB but lost the bw_report
block + .h5 waveform that the codec was supposed to produce.
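The failure mode is easy to reproduce outside the service.  This is a
minimal sketch that assumes only that the working directory does not
contain the file.  A bare filename passed to `Path.read_bytes()` resolves
against the process's current working directory, so the read raises
FileNotFoundError with errno 2 (ENOENT).  The helper names and the
filename are hypothetical, and an empty temporary directory stands in
for /app.

```python
import os
import tempfile
from pathlib import Path


def read_bytes_as_given(name: str) -> bytes:
    # Mirrors the pre-fix codec behaviour: read whatever path string it got.
    # A bare filename has no directory part, so it resolves against the
    # process's current working directory.
    return Path(name).read_bytes()


def reproduce_missing_file_errno() -> int:
    """Return the errno raised when reading a bare name from an empty cwd."""
    original_cwd = os.getcwd()
    with tempfile.TemporaryDirectory() as workdir:
        # The empty temp dir stands in for /app: the upload was never written here.
        os.chdir(workdir)
        try:
            read_bytes_as_given("UM00000_example.IDFW")  # hypothetical name
        except FileNotFoundError as exc:
            return exc.errno
        finally:
            # Restore the cwd before the temp dir is cleaned up.
            os.chdir(original_cwd)
    return 0  # file unexpectedly present
```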

Observed during a full re-forward from thor-watcher on 2026-05-29:
every Thor event logged "binary codec failed for X: [Errno 2] No
such file or directory" and got binary_decoded=False.

Fix:
- read_idf_file() gains a `data: Optional[bytes]` kwarg.  When
  supplied, skips the disk read and decodes the provided bytes
  directly.  `path` stays required (used for filename in error
  messages + .IDFH vs .IDFW suffix detection); only the read is
  conditional.  Backward compatible — existing positional callers
  (CLI scripts, tests) continue to work unchanged.
- save_imported_idf passes `data=idf_bytes` since the bytes are
  already in memory from the multipart upload.  Filesystem write
  still happens at step 5 of the existing flow; codec just no
  longer depends on it.
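The optional-bytes pattern described in the Fix list can be sketched as
follows.  This is an illustration under stated assumptions, not the
micromate codec: `read_idf_file_sketch`, `IdfResult` and `_decode` are
hypothetical names.  The decoder is a placeholder because the IDF binary
layout is not part of this commit.  Only the `samples` and `intervals`
attributes the diff reads are modelled.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional, Union


@dataclass
class IdfResult:
    # Hypothetical stand-in for the codec's result object.
    samples: List[int]
    intervals: Optional[List[int]]


def _decode(name: str, raw: bytes, is_histogram: bool) -> IdfResult:
    # Placeholder decoder: treats each byte as one value so the sketch runs.
    if not raw:
        raise ValueError(f"{name}: empty IDF payload")
    values = list(raw)
    if is_histogram:
        return IdfResult(samples=[], intervals=values)
    return IdfResult(samples=values, intervals=None)


def read_idf_file_sketch(
    path: Union[str, Path], data: Optional[bytes] = None
) -> IdfResult:
    p = Path(path)
    # `path` stays required: it supplies the filename for error messages
    # and the .IDFH (histogram) vs .IDFW (waveform) suffix check.
    is_histogram = p.suffix.upper() == ".IDFH"
    # Only the disk read is conditional: in-memory bytes win when given.
    raw = data if data is not None else p.read_bytes()
    return _decode(p.name, raw, is_histogram)
```

The sketch checks `data is not None` rather than relying on truthiness.
An empty upload (`b""`) therefore surfaces as a decode error instead of
silently falling back to a disk read of a file that may not exist yet.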

Verified end-to-end against UM11719_20231219162723.IDFW from the
example-data corpus: ingest endpoint returns inserted=1, log line
shows binary_decoded=True + h5=...IDFW.h5, no warnings.

Re-forward existing Thor events from thor-watcher after deploy to
backfill the bw_report block — UPSERT preserves review state.
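The backfill relies on the UPSERT leaving review state alone.  That
property can be illustrated with SQLite's `INSERT ... ON CONFLICT DO
UPDATE`, which requires SQLite 3.24 or later.  This is a sketch only.
The `events` table, its columns and the helper names are hypothetical,
since the real schema and database are not shown in this commit.  The
key point is that `review_state` is absent from the SET list, so a
re-forward refreshes `bw_report` without touching it.

```python
import sqlite3
from typing import Optional


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema for illustration only.
    conn.execute(
        """
        CREATE TABLE events (
            event_id     TEXT PRIMARY KEY,
            bw_report    TEXT,
            review_state TEXT NOT NULL DEFAULT 'unreviewed'
        )
        """
    )
    return conn


def upsert_event(
    conn: sqlite3.Connection, event_id: str, bw_report: Optional[str]
) -> None:
    # On conflict, refresh only the decoded payload; review_state is not in
    # the SET list, so whatever a reviewer recorded is left untouched.
    conn.execute(
        """
        INSERT INTO events (event_id, bw_report) VALUES (?, ?)
        ON CONFLICT(event_id) DO UPDATE SET bw_report = excluded.bw_report
        """,
        (event_id, bw_report),
    )
```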

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-29 20:09:54 +00:00
parent defd17d9c2
commit bee118506b
2 changed files with 17 additions and 3 deletions
+6 -1
@@ -500,7 +500,12 @@ class WaveformStore:
 is_histogram = False
 try:
     from micromate.idf_file import read_idf_file
-    res = read_idf_file(source_path)
+    # Pass idf_bytes through `data=` — at this point in the flow
+    # the binary hasn't been written to disk yet, so the codec
+    # can't read from source_path. We still pass source_path so
+    # the codec has the filename for error messages + .IDFH
+    # suffix detection.
+    res = read_idf_file(source_path, data=idf_bytes)
     idf_samples = res.samples or None
     idf_intervals = res.intervals
     is_histogram = res.intervals is not None