feat(forward): re-pair late-arriving TXTs on subsequent scans

When a binary is forwarded WITHOUT its paired _ASCII.TXT (because
the TXT wasn't quiescent within the grace period — BW slow to
write, AV scanning, etc.), the old behaviour was to permanently
mark the binary as "done" in the state file, even though the TXT
might land seconds later.  Result: that event lived in SFM forever
with broken-codec peak values and no project info.

Fix: state entries now carry a had_report flag.  Forwards without
a TXT set had_report=False.  On subsequent scans, the watcher
treats had_report=False entries as re-pair candidates — they get
re-forwarded once the TXT appears, and the SFM server's upsert
path (in seismo-relay's insert_events IntegrityError handler)
refreshes the DB row with the report's authoritative values.
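
The server side of that flow can be pictured with a minimal
sketch. It is not seismo-relay's actual insert_events code: the
`events` table, its columns, and the `upsert_event` helper are all
hypothetical, and sqlite3 stands in for whatever database SFM uses.
It only illustrates the insert-then-refresh-on-IntegrityError idea.

```python
import sqlite3


def upsert_event(conn, sha256, peak, project):
    """Insert an event row; if the sha256 already exists, refresh it.

    Hypothetical helper: table and column names are assumptions.
    """
    try:
        conn.execute(
            "INSERT INTO events (sha256, peak, project) VALUES (?, ?, ?)",
            (sha256, peak, project),
        )
    except sqlite3.IntegrityError:
        # Duplicate primary key: the binary was forwarded earlier,
        # so overwrite the stored values with the ones from the report.
        conn.execute(
            "UPDATE events SET peak = ?, project = ? WHERE sha256 = ?",
            (peak, project, sha256),
        )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (sha256 TEXT PRIMARY KEY, peak REAL, project TEXT)"
)
upsert_event(conn, "abc123", 0.0, None)        # first forward: binary only
upsert_event(conn, "abc123", 1.25, "Site-A")   # re-pair: values from the TXT
rows = conn.execute("SELECT sha256, peak, project FROM events").fetchall()
```

After the second call the table still holds a single row for the
digest, now carrying the report's values.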

ForwardState.status(sha256) returns one of three values:
  None  — never forwarded.  First-forward path.
  True  — forwarded successfully WITH report (or legacy entry
          without the had_report field).  Permanently done.
  False — forwarded WITHOUT report.  Re-pair if TXT now exists.
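
As a rough illustration of how the scan acts on those three
values, the decision can be written as a pure function. The name
`next_action`, its boolean parameters, and the returned labels are
hypothetical; the real logic lives inline in find_pending_events.

```python
from typing import Optional


def next_action(status: Optional[bool], txt_ready: bool, grace_elapsed: bool) -> str:
    """Map a forwarding status plus TXT availability to a scan decision.

    status        -- value returned by ForwardState.status(sha256)
    txt_ready     -- a paired TXT exists and is quiescent
    grace_elapsed -- the missing-report grace period has passed
    """
    if status is True:
        # Forwarded with its report (or legacy entry): permanently done.
        return "skip"
    if status is False:
        # Forwarded alone earlier: only worth re-sending with a TXT.
        return "re-forward" if txt_ready else "skip"
    # status is None: never forwarded.
    if txt_ready:
        return "forward-with-report"
    return "forward-alone" if grace_elapsed else "wait"
```

Note the checks use `is True` / `is False` rather than truthiness,
since None and False must lead to different branches.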

Backward compat: legacy state-file entries (no had_report key)
default to True so existing deployments don't unexpectedly
re-forward every entry on upgrade.

Tests cover:
  - re-pair when TXT appears after a had_report=False forward
  - had_report=True entries stay skipped permanently
  - legacy entries (missing field) treated as fully forwarded
  - state.status() returns None for unknown sha
  - re-marking had_report=False then True promotes to fully-done

36 watcher tests pass (was 31, +5 new).
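
For readers without the test file at hand, two of the cases above
might look roughly like this. It is a sketch, not the committed
tests: `InMemoryState` is a hypothetical stand-in for ForwardState
that skips the JSON persistence and the forwarded_at timestamp.

```python
from typing import Dict, Optional


class InMemoryState:
    """Hypothetical stand-in for ForwardState, without file I/O."""

    def __init__(self, forwarded: Optional[Dict[str, dict]] = None) -> None:
        self._forwarded: Dict[str, dict] = dict(forwarded or {})

    def status(self, sha256: str) -> Optional[bool]:
        entry = self._forwarded.get(sha256)
        if entry is None:
            return None
        # Entries written before had_report existed count as complete.
        return bool(entry.get("had_report", True))

    def mark_forwarded(
        self, sha256: str, filename: str, size: int, had_report: bool = True
    ) -> None:
        self._forwarded[sha256] = {
            "filename": filename,
            "size": size,
            "had_report": had_report,
        }


def test_remark_promotes_to_fully_done() -> None:
    state = InMemoryState()
    state.mark_forwarded("abc", "event.bin", 128, had_report=False)
    assert state.status("abc") is False
    state.mark_forwarded("abc", "event.bin", 128, had_report=True)
    assert state.status("abc") is True


def test_legacy_entry_treated_as_forwarded() -> None:
    # No had_report key, as in a state file from before this change.
    state = InMemoryState({"old": {"filename": "old.bin", "size": 64}})
    assert state.status("old") is True
    assert state.status("unknown") is None
```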
2026-05-11 16:22:53 +00:00
parent e6c25ab941
commit 65b3af90ae
3 changed files with 166 additions and 6 deletions
+76 -6
@@ -218,11 +218,53 @@ class ForwardState:
     def is_forwarded(self, sha256: str) -> bool:
         return sha256 in self._data["forwarded"]

-    def mark_forwarded(self, sha256: str, filename: str, size: int) -> None:
+    def status(self, sha256: str) -> Optional[bool]:
+        """Return forwarding status for *sha256*.
+
+        Returns:
+            None  — never forwarded. Eligible for a fresh forward.
+            True  — forwarded successfully with its paired report
+                    (or in a legacy entry that pre-dates the
+                    had_report field — assumed complete for safety).
+                    NOT a candidate for re-forward.
+            False — forwarded WITHOUT its paired ``_ASCII.TXT``
+                    (BW's TXT-write lagged past the grace period).
+                    Eligible for re-forward IF the TXT now exists,
+                    so the SFM server's upsert path can refresh the
+                    DB row with the report's authoritative values.
+
+        Legacy state-file entries without a ``had_report`` key default
+        to ``True`` so an upgrade doesn't unexpectedly re-forward
+        every entry the operator has accumulated.
+        """
+        entry = self._data["forwarded"].get(sha256)
+        if entry is None:
+            return None
+        return bool(entry.get("had_report", True))
+
+    def mark_forwarded(
+        self,
+        sha256: str,
+        filename: str,
+        size: int,
+        had_report: bool = True,
+    ) -> None:
+        """Record a successful forward.
+
+        Set ``had_report=False`` when the forward shipped the binary
+        without its paired ASCII report. Such entries are re-checked
+        on subsequent scans and re-forwarded once the TXT appears, so
+        SFM's upsert refreshes the DB row with the device-authoritative
+        peak/project values.
+
+        Idempotent: re-marking an existing sha256 with ``had_report=True``
+        is the explicit promotion path used when a re-pair succeeds.
+        """
         self._data["forwarded"][sha256] = {
             "filename": filename,
             "size": size,
             "forwarded_at": datetime.now(timezone.utc).isoformat().replace("+00:00", "Z"),
+            "had_report": had_report,
         }
         self._save()
@@ -340,12 +382,19 @@ def find_pending_events(
             continue

         # Idempotency: skip if we already forwarded this content
+        # successfully. Three cases via state.status(digest):
+        #   True  — forwarded WITH report → permanently done, skip.
+        #   False — forwarded WITHOUT report → re-pair candidate.
+        #           Forward again only if a paired TXT is now present
+        #           so SFM's upsert refreshes the DB row.
+        #   None  — never forwarded → normal first-forward path.
         try:
             digest = sha256_of_file(e.path)
         except OSError as exc:
             log.warning("forward scan: sha256 failed for %s: %s", e.path, exc)
             continue
-        if state.is_forwarded(digest):
+        fwd_status = state.status(digest)
+        if fwd_status is True:
             skipped_already_forwarded += 1
             continue
@@ -380,9 +429,19 @@ def find_pending_events(
                 txt_path = candidate
             # else: TXT is mid-write; treat as not-yet-paired and defer.

-        if txt_path is None:
-            # No TXT (or not yet quiescent). Wait for the grace
-            # period before forwarding alone.
+        if fwd_status is False:
+            # Previously forwarded WITHOUT report. We're here looking
+            # for a re-pair opportunity. If the TXT is now present
+            # and quiescent, include in pending for re-forward (the
+            # SFM server's upsert will refresh the DB row with the
+            # report's authoritative values). Otherwise skip — no
+            # point re-forwarding the same binary alone again.
+            if txt_path is None:
+                skipped_already_forwarded += 1
+                continue
+        elif txt_path is None:
+            # First-time forward and TXT not yet present. Wait for the
+            # grace period before forwarding alone.
             if (now_ts - mtime) < missing_report_grace_seconds:
                 skipped_inflight += 1
                 continue
@@ -575,7 +634,18 @@ def forward_pending(
         try:
             digest = sha256_of_file(binary_path)
             size = os.path.getsize(binary_path)
-            state.mark_forwarded(digest, os.path.basename(binary_path), size)
+            # Record whether this forward shipped a paired TXT.
+            # Forwards without a TXT are flagged had_report=False so
+            # subsequent scans re-check whether the TXT has since
+            # appeared and trigger a re-forward (the SFM server's
+            # upsert path refreshes the DB row with the report's
+            # authoritative values).
+            state.mark_forwarded(
+                digest,
+                os.path.basename(binary_path),
+                size,
+                had_report=(txt_path is not None),
+            )
         except OSError as exc:
             _log(f"[forward] post-success state save failed for "
                  f"{os.path.basename(binary_path)}: {exc}")