feat(sfm): Phase 5a — bulk-backfill projects/locations/assignments from event metadata
Operator clicks one button. Parser reads SFM's events table (operator-typed
project / client / sensor_location strings), clusters by serial + time +
metadata, fuzzy-matches against existing projects, and proposes
Project / MonitoringLocation / UnitAssignment chains to create.
Auto-applies high-confidence non-conflicting clusters in bulk; queues
medium/low confidence for individual review.
Verified against real data: 10,052 events → 59 clusters → 37 high-
confidence + 14 medium + 8 low. Test-applied one cluster end-to-end;
Project + Module + Location + Assignment + UnitHistory + Decision rows
all created correctly, and Phase 2's attribution walk picked up the
events automatically on the new location's detail page.
Pipeline (backend/services/metadata_backfill.py, ~700 lines):
1. Pull all SFM events via /db/events per serial.
2. Pre-filter: drop events already covered by an existing UnitAssignment
window (Phase 2 handles those automatically).
3. Time-cluster what's left: serial + 7-day gap is the cluster identity.
4. Metadata-split each time-cluster on persistent metadata transitions
(≥ 2 consecutive events) so a single typo doesn't fork the cluster.
5. Match against existing graph (rapidfuzz.WRatio multi-signal scoring,
normalisation that handles abbreviations / reorders / separator
variations). Thresholds: 0.95 exact, 0.80 fuzzy; the shorter input must
be at least 5 chars, which guards against false positives on single
common words.
6. Score confidence (high/medium/low) using event count, span,
blank-meta, conflict, ambiguity rules.
7. Detect conflicts: overlap with existing UnitAssignment at a different
location for the same serial → blocking. Operator must reconcile.
8. Apply: ensure auto_imported ProjectType exists, ensure
vibration_monitoring ProjectModule on the project, write
Project / MonitoringLocation / UnitAssignment / UnitHistory all in
one transaction.
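
Steps 3 and 4 above can be sketched roughly as follows. This is an illustrative sketch only, not the code in backend/services/metadata_backfill.py: the `Event` shape and the function names are hypothetical, and the handling of short runs (absorbing them into the neighbouring segment) is one assumed reading of "persistent metadata transitions".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from itertools import groupby

GAP = timedelta(days=7)  # 7-day gap, per the commit message
MIN_RUN = 2              # a transition must persist for >= 2 consecutive events


@dataclass(frozen=True)
class Event:
    # Hypothetical event shape; the real SFM rows carry more fields.
    serial: str
    ts: datetime
    project: str
    location: str


def time_clusters(events):
    """Group events per serial; a gap longer than GAP starts a new cluster."""
    clusters = []
    ordered = sorted(events, key=lambda e: (e.serial, e.ts))
    for _, group in groupby(ordered, key=lambda e: e.serial):
        current = []
        for ev in group:
            if current and ev.ts - current[-1].ts > GAP:
                clusters.append(current)
                current = []
            current.append(ev)
        clusters.append(current)
    return clusters


def split_on_metadata(cluster):
    """Split one time-cluster where (project, location) changes and stays changed.

    Runs shorter than MIN_RUN (for example a single typo) are absorbed into
    the segment before them instead of forking a new one.
    """
    segments = []  # list of (metadata key, events)
    for key, run in groupby(cluster, key=lambda e: (e.project, e.location)):
        run = list(run)
        if segments and (len(run) < MIN_RUN or key == segments[-1][0]):
            segments[-1][1].extend(run)
        else:
            segments.append((key, run))
    # A too-short leading segment is folded into the one that follows it.
    if len(segments) > 1 and len(segments[0][1]) < MIN_RUN:
        head = segments.pop(0)[1]
        segments[0] = (segments[0][0], head + segments[0][1])
    return [evs for _, evs in segments]
```

With this shape, a sequence such as A, A, Typo, A for one serial stays a single segment, while A, A, B, B splits into two.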
Migration (backend/migrate_add_metadata_backfill.py): adds
unit_assignments.source column (default 'manual') and
metadata_backfill_decisions table. Idempotent, non-destructive.
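
For illustration, an idempotent, non-destructive migration of this kind might look like the sketch below. It assumes a SQLite database, which the commit message does not state, and creates only a subset of the decision table's columns; the helper names are hypothetical.

```python
import sqlite3


def column_exists(conn, table, column):
    """Return True if `table` already has `column` (SQLite PRAGMA lookup)."""
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return any(row[1] == column for row in rows)  # row[1] is the column name


def migrate(conn):
    """Safe to run repeatedly: every step checks before it changes anything."""
    if not column_exists(conn, "unit_assignments", "source"):
        conn.execute(
            "ALTER TABLE unit_assignments "
            "ADD COLUMN source TEXT NOT NULL DEFAULT 'manual'"
        )
    # Abbreviated column list; the real table has more fields.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS metadata_backfill_decisions (
            cluster_id  TEXT PRIMARY KEY,
            status      TEXT NOT NULL,
            confidence  TEXT NOT NULL,
            serial      TEXT NOT NULL,
            event_count INTEGER NOT NULL DEFAULT 0
        )
        """
    )
    conn.commit()
```

Existing rows pick up the 'manual' default, so nothing is rewritten or dropped, and a second run is a no-op.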
API (backend/routers/metadata_backfill.py):
GET /api/admin/metadata_backfill/scan — clusters + suggestions
POST /api/admin/metadata_backfill/apply — bulk apply by cluster_ids
    w/ optional per-cluster project/location overrides
POST /api/admin/metadata_backfill/skip — mark skipped (persistent)
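
The cluster_ids that apply and skip accept are described in the model docstring as a deterministic SHA1 over (serial, first_event_date, last_event_date). A minimal sketch of such an id is below; the field separator and date serialisation are assumptions, so the real ids will not necessarily match these.

```python
import hashlib
from datetime import date


def cluster_id(serial: str, first_event_date: date, last_event_date: date) -> str:
    """Stable id: the same (serial, first, last) always hashes to the same hex."""
    # Assumed serialisation: pipe-separated ISO dates.
    payload = f"{serial}|{first_event_date.isoformat()}|{last_event_date.isoformat()}"
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()
```

Because the id depends only on the serial and the date range, a re-scan can look up an earlier applied or skipped decision for the same cluster without storing any extra state.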
UI (templates/admin/metadata_backfill.html, accessible at
/settings/developer/metadata-backfill via the Developer tab of Settings):
- One-button "Run scan" entry.
- Summary KPI tiles (scanned / already attributed / pending / conflicts).
- "Apply all high-confidence" bulk button at the top — primary path.
- Per-cluster cards below with Apply / Skip / Preview event actions.
- Blank-meta clusters get inline input fields for operator-typed project +
location names before applying.
- Blocking-conflict clusters render with the conflicting assignment
information and a disabled Apply button.
- Live progress toast during apply.
- Reuses the Phase 1+2+4 event-detail modal for "Preview event" — operator
can sanity-check the BW report data against the cluster's sample event.
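
The blocking-conflict rule behind the disabled Apply button (pipeline step 7) amounts to an interval-overlap check. The sketch below is an assumed shape, not the project's code: the `Window` fields are hypothetical, and treating a missing end date as an open-ended assignment is a guess.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Window:
    # Hypothetical stand-in for an existing UnitAssignment row.
    serial: str
    location_id: str
    start: datetime
    end: Optional[datetime]  # None is assumed to mean "still assigned"


def overlaps(a_start, a_end, b_start, b_end) -> bool:
    """Closed-interval overlap; a missing end is treated as open-ended."""
    a_end = a_end or datetime.max
    b_end = b_end or datetime.max
    return a_start <= b_end and b_start <= a_end


def blocking_conflicts(serial, target_location_id, first_ts, last_ts,
                       assignments: List[Window]) -> List[Window]:
    """Existing assignments for this serial, at another location, that overlap."""
    return [
        a for a in assignments
        if a.serial == serial
        and a.location_id != target_location_id
        and overlaps(first_ts, last_ts, a.start, a.end)
    ]
```

A non-empty result would mark the cluster as blocked and supply the conflicting assignment details shown on the card.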
Dependencies: rapidfuzz==3.10.1 added to requirements.txt. Pre-built C
wheels for all platforms, ~5s docker build hit.
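
As a rough illustration of how the new dependency serves step 5, the sketch below scores two normalised names with `rapidfuzz.fuzz.WRatio` and applies the thresholds from the commit message. It is deliberately simpler than the multi-signal scoring described above: `normalise` only handles case and separators (not abbreviations or reorders), the function names are hypothetical, and the `difflib` fallback exists only so the sketch runs where rapidfuzz is not installed.

```python
import re

try:
    from rapidfuzz import fuzz

    def similarity(a: str, b: str) -> float:
        # WRatio returns 0-100; rescale to match the 0-1 thresholds.
        return fuzz.WRatio(a, b) / 100.0
except ImportError:
    from difflib import SequenceMatcher

    def similarity(a: str, b: str) -> float:
        # Stand-in scorer, not equivalent to WRatio.
        return SequenceMatcher(None, a, b).ratio()

EXACT, FUZZY, MIN_LEN = 0.95, 0.80, 5  # thresholds from the commit message


def normalise(name: str) -> str:
    """Lower-case and collapse any run of separators into a single space."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", name.lower()).split())


def classify(raw: str, candidate: str) -> str:
    """Return 'exact', 'fuzzy' or 'none' for an operator-typed name."""
    a, b = normalise(raw), normalise(candidate)
    # Assumption: the length guard applies before any scoring at all.
    if min(len(a), len(b)) < MIN_LEN:
        return "none"
    score = similarity(a, b)
    if score >= EXACT:
        return "exact"
    if score >= FUZZY:
        return "fuzzy"
    return "none"
```

For example, `classify("Main-St_Bridge", "main st bridge")` yields "exact" because both sides normalise to the same string, while a four-character name such as "Site" is rejected by the length guard.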
Phase 5b (deferred to next session): swap-detection daily background job,
notification inbox for auto-applied swaps, recently-applied audit view,
"Tidy" page for renaming/merging auto-created projects.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@@ -259,9 +259,48 @@ class UnitAssignment(Base):
    device_type = Column(String, nullable=False)  # "slm" | "seismograph"
    project_id = Column(String, nullable=False, index=True)  # FK to Project.id

    # Provenance: how was this assignment created? Used for auditing,
    # bulk-undo of parser actions, and the Phase 4 deployment timeline.
    #   "manual"                 — operator created via UI
    #   "metadata_backfill"      — auto-created by the metadata parser
    #                              from operator-typed BW event metadata
    #                              (bulk backfill workflow)
    #   "metadata_backfill_swap" — auto-created by swap-detection
    #                              background job
    source = Column(String, nullable=False, default="manual")

    created_at = Column(DateTime, default=datetime.utcnow)


class MetadataBackfillDecision(Base):
    """
    Per-cluster decisions tracked by the metadata-backfill parser.

    `cluster_id` is the deterministic SHA1 hash of
    (serial, first_event_date, last_event_date), so the same cluster
    produces the same id across re-scans. The decisions table lets the
    parser remember "I already applied this" or "operator skipped this"
    across scan invocations.
    """
    __tablename__ = "metadata_backfill_decisions"

    cluster_id = Column(String, primary_key=True)
    status = Column(String, nullable=False)  # pending | applied | skipped | conflict
    confidence = Column(String, nullable=False)  # high | medium | low
    decided_at = Column(DateTime, nullable=True)
    decided_by = Column(String, nullable=True)  # background | operator | auto-high
    applied_assignment_id = Column(String, nullable=True)  # FK to unit_assignments.id
    notes = Column(Text, nullable=True)
    first_seen_at = Column(DateTime, nullable=False, default=datetime.utcnow)
    last_seen_at = Column(DateTime, nullable=False, default=datetime.utcnow)
    serial = Column(String, nullable=False, index=True)
    project_raw = Column(String, nullable=True)
    location_raw = Column(String, nullable=True)
    first_event_ts = Column(DateTime, nullable=True)
    last_event_ts = Column(DateTime, nullable=True)
    event_count = Column(Integer, nullable=False, default=0)


class ScheduledAction(Base):
    """
    Scheduled actions: automation for recording start/stop/download.
