Phase 5b first slice. Surfaces near-duplicate projects (typo variants,
abbreviation differences, spacing variations like "SR81" vs "SR 81")
as side-by-side pairs the operator can merge with one click.
Backend (backend/services/project_tidy.py):
- find_duplicate_pairs(db, threshold=0.85) walks all active projects and
computes rapidfuzz.WRatio similarity for every pair. Pre-filters
too-short normalised names (< 4 chars) to avoid noise. Skips
soft-deleted projects. Returns pairs sorted by score desc, then by
total content (more assignments → review first).
- Each pair carries a suggested merge target with a human-readable
reason. Priorities (in order): manual source over parser source,
populated project_number, more locations, more assignments, shorter
name. Operator can override the suggestion by clicking the OTHER
direction button.
- O(N^2) over project count. Fine up to ~500 projects. Token-prefix
blocking is the obvious next optimisation if it becomes slow.
Backend (backend/routers/projects.py):
- GET /api/projects/admin/duplicate_pairs?threshold=&max_pairs= returns
pairs as JSON for the Tidy page.
Frontend (templates/admin/project_tidy.html):
- New admin page at /settings/developer/project-tidy. Threshold selector
(95% / 90% / 85% / 80%) at the top; rescan button next to it; auto-
scans on load.
- Each pair card shows side-by-side project summaries (name, project_
number, client, source-badge, location/assignment counts) with the
suggested target visually highlighted (orange border). Three buttons:
"Merge A → B", "Merge B → A", "Not a dup" (hide locally).
- Click-to-merge opens a native confirm with the preview totals
(assignments/sessions/data files moving, consolidations) — same data
the project_header.html merge modal shows. On confirm, hits the
existing /merge_into endpoint and re-scans automatically.
- Source badges distinguish parser-created (`metadata_backfill`) from
manual projects — at a glance the operator can see "this duplicate is
parser-generated; safe to merge into the manual one".
Frontend (templates/admin/metadata_backfill.html):
- Apply-result handling now surfaces failed[] cluster reasons in a
dedicated failure panel (bottom-left, dismissable). Previously a 200
OK with all-failures showed a misleading "1 cluster applied" success
toast because the count and the failure list weren't being reconciled.
This bit us during the DB-revert recovery earlier — the
project_modules table was missing, every apply silently rolled back,
user saw success toasts. Fixed.
Smoke-verified against current state (10K events, 9 projects, post-
merge): tool correctly finds 0 pairs at threshold 0.85 (data is clean),
1 false-positive at 0.70 (two unrelated projects sharing the token "81"
— example of why the 0.85 default is correct).
Settings link added under Developer → Project Tidy.
Phase 5c (swap-detection daily background job + notification inbox)
remains deferred to the next session.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Operator clicks one button. Parser reads SFM's events table (operator-typed
project / client / sensor_location strings), clusters by serial + time +
metadata, fuzzy-matches against existing projects, and proposes
Project / MonitoringLocation / UnitAssignment chains to create.
Auto-applies high-confidence non-conflicting clusters in bulk; queues
medium/low confidence for individual review.
Verified against real data: 10,052 events → 59 clusters → 37 high-
confidence + 14 medium + 8 low. Test-applied one cluster end-to-end;
Project + Module + Location + Assignment + UnitHistory + Decision rows
all created correctly, and Phase 2's attribution walk picked up the
events automatically on the new location's detail page.
Pipeline (backend/services/metadata_backfill.py, ~700 lines):
1. Pull all SFM events via /db/events per serial.
2. Pre-filter: drop events already covered by an existing UnitAssignment
window (Phase 2 handles those automatically).
3. Time-cluster what's left: serial + 7-day gap is the cluster identity.
4. Metadata-split each time-cluster on persistent metadata transitions
(≥ 2 consecutive events) so a single typo doesn't fork the cluster.
5. Match against existing graph (rapidfuzz.WRatio multi-signal scoring,
normalisation that handles abbreviations / reorders / separator
variations). Thresholds: 0.95 exact, 0.80 fuzzy, min-shorter-input
5 chars to guardrail false positives on single common words.
6. Score confidence (high/medium/low) using event count, span,
blank-meta, conflict, ambiguity rules.
7. Detect conflicts: overlap with existing UnitAssignment at a different
location for the same serial → blocking. Operator must reconcile.
8. Apply: ensure auto_imported ProjectType exists, ensure
vibration_monitoring ProjectModule on the project, write
Project / MonitoringLocation / UnitAssignment / UnitHistory all in
one transaction.
Migration (backend/migrate_add_metadata_backfill.py): adds
unit_assignments.source column (default 'manual') and
metadata_backfill_decisions table. Idempotent, non-destructive.
API (backend/routers/metadata_backfill.py):
GET /api/admin/metadata_backfill/scan — clusters + suggestions
POST /api/admin/metadata_backfill/apply — bulk apply by cluster_ids
w/ optional per-cluster
project/location overrides
POST /api/admin/metadata_backfill/skip — mark skipped (persistent)
UI (templates/admin/metadata_backfill.html, accessible at
/settings/developer/metadata-backfill via the Developer tab of Settings):
- One-button "Run scan" entry.
- Summary KPI tiles (scanned / already attributed / pending / conflicts).
- "Apply all high-confidence" bulk button at the top — primary path.
- Per-cluster cards below with Apply / Skip / Preview event actions.
- Blank-meta clusters get inline input fields for operator-typed project +
location names before applying.
- Blocking-conflict clusters render with the conflicting assignment
information and a disabled Apply button.
- Live progress toast during apply.
- Reuses the Phase 1+2+4 event-detail modal for "Preview event" — operator
can sanity-check the BW report data against the cluster's sample event.
Dependencies: rapidfuzz==3.10.1 added to requirements.txt. Pre-built C
wheels for all platforms, ~5s docker build hit.
Phase 5b (deferred to next session): swap-detection daily background job,
notification inbox for auto-applied swaps, recently-applied audit view,
"Tidy" page for renaming/merging auto-created projects.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Introduced a new section for displaying soft-deleted projects.
- Implemented loading of deleted projects via an API call.
- Added restore and permanently delete options for each deleted project.
- Integrated loading of deleted projects when the data tab is shown.
- Updated roster.html to include a new option for Sound Level Meter in the device type selection.
- Added specific fields for Sound Level Meter information, including model, host/IP address, TCP and FTP ports, serial number, frequency weighting, and time weighting.
- Enhanced JavaScript to handle the visibility and state of Sound Level Meter fields based on the selected device type.
- Modified the unit editing functionality to populate Sound Level Meter fields with existing data when editing a unit.
- Updated settings.html to change the deployment status display from badges to radio buttons for better user interaction.
- Adjusted the toggleDeployed function to accept the new state directly instead of the current state.
- Changed the edit button in unit_detail.html to redirect to the roster edit page with the appropriate unit ID.