42de06f441
Operator clicks one button. Parser reads SFM's events table (operator-typed
project / client / sensor_location strings), clusters by serial + time +
metadata, fuzzy-matches against existing projects, and proposes
Project / MonitoringLocation / UnitAssignment chains to create.
Auto-applies high-confidence non-conflicting clusters in bulk; queues
medium/low confidence for individual review.

Verified against real data: 10,052 events → 59 clusters → 37 high-confidence
+ 14 medium + 8 low. Test-applied one cluster end-to-end: Project + Module +
Location + Assignment + UnitHistory + Decision rows were all created
correctly, and Phase 2's attribution walk picked up the events automatically
on the new location's detail page.

Pipeline (backend/services/metadata_backfill.py, ~700 lines):
  1. Pull all SFM events via /db/events per serial.
  2. Pre-filter: drop events already covered by an existing UnitAssignment
     window (Phase 2 handles those automatically).
  3. Time-cluster what's left per serial: a 7-day gap between events
     starts a new cluster, so serial + gap is the cluster identity.
  4. Metadata-split each time-cluster on persistent metadata transitions
     (≥ 2 consecutive events) so a single typo doesn't fork the cluster.
  5. Match against the existing graph (rapidfuzz.WRatio multi-signal scoring,
     with normalisation that handles abbreviations / reorders / separator
     variations). Thresholds: 0.95 exact, 0.80 fuzzy; the shorter input must
     be at least 5 chars to guard against false positives on single common
     words.
  6. Score confidence (high/medium/low) using event count, span,
     blank-meta, conflict, and ambiguity rules.
  7. Detect conflicts: overlap with an existing UnitAssignment at a different
     location for the same serial → blocking. Operator must reconcile.
  8. Apply: ensure the auto_imported ProjectType exists, ensure a
     vibration_monitoring ProjectModule on the project, then write
     Project / MonitoringLocation / UnitAssignment / UnitHistory all in
     one transaction.
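
Steps 3 and 4 can be sketched roughly as follows. This is a minimal
illustration, not the code in metadata_backfill.py: the Event shape, the
function names, the sample serials, and the choice that a gap strictly
longer than 7 days splits a cluster are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

GAP = timedelta(days=7)  # step 3: a 7-day gap separates clusters
MIN_RUN = 2              # step 4: a metadata change must persist >= 2 events


@dataclass(frozen=True)
class Event:
    """Hypothetical event shape; real SFM rows carry more fields."""
    serial: str
    ts: datetime
    project: str
    location: str


def time_clusters(events):
    """Group events per serial; a gap longer than GAP starts a new cluster."""
    clusters = []
    for ev in sorted(events, key=lambda e: (e.serial, e.ts)):
        prev = clusters[-1][-1] if clusters else None
        if prev is not None and prev.serial == ev.serial and ev.ts - prev.ts <= GAP:
            clusters[-1].append(ev)
        else:
            clusters.append([ev])
    return clusters


def split_on_metadata(cluster):
    """Split where (project, location) changes and the new value persists."""
    def key(e):
        return (e.project, e.location)

    parts = [[cluster[0]]]
    current = key(cluster[0])
    for i in range(1, len(cluster)):
        k = key(cluster[i])
        run = cluster[i:i + MIN_RUN]
        if k != current and len(run) == MIN_RUN and all(key(r) == k for r in run):
            parts.append([])  # persistent transition: start a new sub-cluster
            current = k
        parts[-1].append(cluster[i])  # one-off typos stay in the current part
    return parts


# Made-up sample data: one typo ("Brdge") and one real project change.
t0 = datetime(2024, 1, 1)
labels = ["Bridge", "Bridge", "Brdge", "Bridge", "Bridge", "Tunnel", "Tunnel"]
events = [Event("BE0001", t0 + timedelta(days=d), p, "North")
          for d, p in enumerate(labels)]
events.append(Event("BE0001", t0 + timedelta(days=30), "Tunnel", "North"))
events.append(Event("BE0002", t0, "Bridge", "South"))

clusters = time_clusters(events)
parts = split_on_metadata(clusters[0])
print(len(clusters), [len(p) for p in parts])  # prints: 3 [5, 2]
```

The single "Brdge" event stays inside the first sub-cluster, while the
two consecutive "Tunnel" events fork a second one.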

Migration (backend/migrate_add_metadata_backfill.py): adds the
unit_assignments.source column (default 'manual') and the
metadata_backfill_decisions table. Idempotent, non-destructive.

API (backend/routers/metadata_backfill.py):
  GET  /api/admin/metadata_backfill/scan   - clusters + suggestions
  POST /api/admin/metadata_backfill/apply  - bulk apply by cluster_ids
                                             w/ optional per-cluster
                                             project/location overrides
  POST /api/admin/metadata_backfill/skip   - mark skipped (persistent)
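
A client call to the apply endpoint might look like the sketch below. Only
the path and the cluster_ids field come from this commit message; the base
URL, the "overrides" key, its per-cluster shape, and the sample values are
assumptions, so check the router for the real request model. The request is
built but not sent.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:8000"  # hypothetical host and port


def build_apply_request(cluster_ids, overrides=None):
    """Build (but do not send) a POST to the bulk-apply endpoint."""
    payload = {"cluster_ids": list(cluster_ids)}
    if overrides:
        # Assumed shape: {cluster_id: {"project": ..., "location": ...}}
        payload["overrides"] = overrides
    return Request(
        BASE_URL + "/api/admin/metadata_backfill/apply",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_apply_request(
    ["abc123"],
    {"abc123": {"project": "Main St Bridge", "location": "North pier"}},
)
print(req.get_method(), req.full_url)
```

Passing the result to urllib.request.urlopen would send it.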

UI (templates/admin/metadata_backfill.html, accessible at
/settings/developer/metadata-backfill via the Developer tab of Settings):
  - One-button "Run scan" entry.
  - Summary KPI tiles (scanned / already attributed / pending / conflicts).
  - "Apply all high-confidence" bulk button at the top (the primary path).
  - Per-cluster cards below with Apply / Skip / Preview event actions.
  - Blank-meta clusters get inline input fields for operator-typed project +
    location names before applying.
  - Blocking-conflict clusters render with the conflicting assignment
    information and a disabled Apply button.
  - Live progress toast during apply.
  - Reuses the Phase 1+2+4 event-detail modal for "Preview event", so the
    operator can sanity-check the BW report data against the cluster's
    sample event.

Dependencies: rapidfuzz==3.10.1 added to requirements.txt. Pre-built C
wheels for all platforms; adds ~5s to the docker build.

Phase 5b (deferred to next session): swap-detection daily background job,
notification inbox for auto-applied swaps, recently-applied audit view,
"Tidy" page for renaming/merging auto-created projects.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
95 lines
3.6 KiB
Python
"""
Migration: add metadata-backfill support.

Adds:
  1. `unit_assignments.source` column (TEXT, default 'manual').
     Lets us audit which assignments were created by the metadata-backfill
     parser vs by a human, and bulk-undo parser actions if needed.

  2. `metadata_backfill_decisions` table. Tracks operator decisions per
     cluster_id so the wizard remembers what's been skipped, what's
     been applied, and what's pending across re-scans.

Idempotent: safe to re-run.
Non-destructive: adds only.

Run with:
  docker exec terra-view-terra-view-1 python3 /app/backend/migrate_add_metadata_backfill.py
"""

import os
import sqlite3

DB_PATH = "./data/seismo_fleet.db"


def migrate_database():
    if not os.path.exists(DB_PATH):
        print(f"Database not found at {DB_PATH}")
        return

    print(f"Migrating database: {DB_PATH}")
    conn = sqlite3.connect(DB_PATH)
    cur = conn.cursor()

    # ── 1. unit_assignments.source column ──────────────────────────────────
    cur.execute("PRAGMA table_info(unit_assignments)")
    cols = {row[1] for row in cur.fetchall()}
    if "source" not in cols:
        print("Adding unit_assignments.source column (default 'manual') ...")
        cur.execute(
            "ALTER TABLE unit_assignments ADD COLUMN source TEXT DEFAULT 'manual'"
        )
        # Backfill: any existing row gets source='manual'
        cur.execute("UPDATE unit_assignments SET source='manual' WHERE source IS NULL")
        conn.commit()
        print("  Done.")
    else:
        print("unit_assignments.source already exists, skipping")

    # ── 2. metadata_backfill_decisions table ──────────────────────────────
    cur.execute(
        "SELECT name FROM sqlite_master WHERE type='table' AND name='metadata_backfill_decisions'"
    )
    if cur.fetchone() is None:
        print("Creating metadata_backfill_decisions table ...")
        cur.execute("""
            CREATE TABLE metadata_backfill_decisions (
                cluster_id            TEXT PRIMARY KEY,  -- deterministic hash
                status                TEXT NOT NULL,     -- pending | applied | skipped | conflict
                confidence            TEXT NOT NULL,     -- high | medium | low (at time of decision)
                decided_at            TEXT,              -- when applied/skipped
                decided_by            TEXT,              -- 'background' | 'operator' | 'auto-high'
                applied_assignment_id TEXT,              -- FK to unit_assignments (if applied)
                notes                 TEXT,
                first_seen_at         TEXT NOT NULL,
                last_seen_at          TEXT NOT NULL,
                serial                TEXT NOT NULL,
                project_raw           TEXT,
                location_raw          TEXT,
                first_event_ts        TEXT,
                last_event_ts         TEXT,
                event_count           INTEGER NOT NULL DEFAULT 0
            )
        """)
        cur.execute(
            "CREATE INDEX idx_mbd_status ON metadata_backfill_decisions(status)"
        )
        cur.execute(
            "CREATE INDEX idx_mbd_last_seen ON metadata_backfill_decisions(last_seen_at)"
        )
        cur.execute(
            "CREATE INDEX idx_mbd_serial ON metadata_backfill_decisions(serial)"
        )
        conn.commit()
        print("  Done.")
    else:
        print("metadata_backfill_decisions table already exists, skipping")

    conn.close()
    print("\nMigration complete.")


if __name__ == "__main__":
    migrate_database()
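
The check-then-alter pattern that makes this migration safe to re-run can be
tried against a throwaway in-memory database. The sketch below is
illustrative only: add_column_if_missing is a hypothetical helper, and the
one-column unit_assignments table is a toy stand-in for the real schema.

```python
import sqlite3


def add_column_if_missing(conn, table, column, ddl):
    """Add a column only if PRAGMA table_info does not already list it.

    Returns True when the column was added, False when it already existed.
    Table and column names are interpolated, so pass trusted values only.
    """
    cols = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if column in cols:
        return False
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {ddl}")
    conn.commit()
    return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE unit_assignments (id TEXT PRIMARY KEY)")  # toy schema
conn.execute("INSERT INTO unit_assignments (id) VALUES ('a1')")

first = add_column_if_missing(conn, "unit_assignments", "source", "TEXT DEFAULT 'manual'")
second = add_column_if_missing(conn, "unit_assignments", "source", "TEXT DEFAULT 'manual'")
row = conn.execute("SELECT source FROM unit_assignments WHERE id = 'a1'").fetchone()
print(first, second, row[0])  # prints: True False manual
```

The second call is a no-op, which is the behaviour a re-run of the migration
relies on. The pre-existing row already reads back 'manual' from the column
default, so the explicit UPDATE in the migration serves as an extra safeguard.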