Phase 5b first slice. Surfaces near-duplicate projects (typo variants,
abbreviation differences, spacing variations like "SR81" vs "SR 81")
as side-by-side pairs the operator can merge with one click.
Backend (backend/services/project_tidy.py):
- find_duplicate_pairs(db, threshold=0.85) walks all active
(non-soft-deleted) projects and computes rapidfuzz.fuzz.WRatio
similarity for every pair. Names whose normalised form is shorter
than 4 chars are filtered out first to avoid noise. Returns pairs
sorted by score desc, then by total content desc (more assignments →
review first).
- Each pair carries a suggested merge target with a human-readable
reason. Priorities (in order): manual source over parser source,
populated project_number, more locations, more assignments, shorter
name. The operator can override the suggestion by clicking the button
for the opposite merge direction.
- O(N^2) over project count. Fine up to ~500 projects. Token-prefix
blocking is the obvious next optimisation if it becomes slow.
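A minimal sketch of the scan and the target-priority rule described
above, not the code in project_tidy.py. Assumptions: the Project
dataclass, normalise(), similarity() and target_rank() are hypothetical
names; the stdlib difflib.SequenceMatcher is swapped in for
rapidfuzz's WRatio so the sketch runs without dependencies (scores
will differ from the real scorer); normalisation is assumed to
lowercase and strip non-alphanumerics.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from itertools import combinations
from typing import Optional
import re


@dataclass
class Project:  # hypothetical shape; the real model lives in the ORM
    name: str
    source: str = "manual"  # "manual" or "metadata_backfill"
    project_number: Optional[str] = None
    locations: int = 0
    assignments: int = 0
    deleted: bool = False


def normalise(name: str) -> str:
    """Assumed normalisation: lowercase, keep only letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def similarity(a: str, b: str) -> float:
    """Stand-in scorer in [0, 1]; the real code uses rapidfuzz's WRatio."""
    return SequenceMatcher(None, a, b).ratio()


def target_rank(p: Project) -> tuple:
    """Sort key for the merge target: the smaller tuple wins."""
    return (
        p.source != "manual",   # manual source beats parser source
        not p.project_number,   # populated project_number first
        -p.locations,           # more locations first
        -p.assignments,         # more assignments first
        len(p.name),            # shorter name first
    )


def find_duplicate_pairs(projects, threshold=0.85, min_len=4):
    # Skip soft-deleted projects and names too short to compare reliably.
    candidates = [(p, normalise(p.name)) for p in projects if not p.deleted]
    candidates = [(p, n) for p, n in candidates if len(n) >= min_len]

    pairs = []
    # O(N^2): every unordered pair of candidates is scored once.
    for (a, na), (b, nb) in combinations(candidates, 2):
        score = similarity(na, nb)
        if score >= threshold:
            target = min((a, b), key=target_rank)
            pairs.append((score, a, b, target))

    # Highest score first, then pairs with more total assignments.
    pairs.sort(key=lambda t: (-t[0], -(t[1].assignments + t[2].assignments)))
    return pairs
```

With this normalisation "SR81" and "SR 81" collapse to the same string,
so the pair scores 1.0 and the manual project is suggested as target.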
Backend (backend/routers/projects.py):
- GET /api/projects/admin/duplicate_pairs?threshold=&max_pairs= returns
pairs as JSON for the Tidy page.
Frontend (templates/admin/project_tidy.html):
- New admin page at /settings/developer/project-tidy. Threshold selector
(95% / 90% / 85% / 80%) at the top; rescan button next to it; auto-
scans on load.
- Each pair card shows side-by-side project summaries (name, project_
number, client, source-badge, location/assignment counts) with the
suggested target visually highlighted (orange border). Three buttons:
"Merge A → B", "Merge B → A", "Not a dup" (hide locally).
- Click-to-merge opens a native confirm with the preview totals
(assignments/sessions/data files moving, consolidations) — same data
the project_header.html merge modal shows. On confirm, hits the
existing /merge_into endpoint and re-scans automatically.
- Source badges distinguish parser-created (`metadata_backfill`) from
manual projects — at a glance the operator can see "this duplicate is
parser-generated; safe to merge into the manual one".
Frontend (templates/admin/metadata_backfill.html):
- Apply-result handling now surfaces failed[] cluster reasons in a
dedicated failure panel (bottom-left, dismissable). Previously a 200
OK with all-failures showed a misleading "1 cluster applied" success
toast because the count and the failure list weren't being reconciled.
This bit us during the DB-revert recovery earlier: the
project_modules table was missing, every apply silently rolled back,
and the user still saw success toasts. Fixed.
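The reconciliation rule behind that fix, sketched in Python for
brevity (the real handling lives in the template's client-side code).
The payload shape is an assumption: "requested" and "failed" with a
per-cluster "reason" are hypothetical field names, as is
summarise_apply().

```python
def summarise_apply(response: dict) -> dict:
    """Pick the UI message level for an apply response.

    Assumed payload: {"requested": int, "failed": [{"reason": str}, ...]}.
    """
    failed = response.get("failed") or []
    requested = int(response.get("requested") or 0)

    # Reconcile: a cluster only counts as applied if it did not fail.
    applied = max(requested - len(failed), 0)

    if failed and applied == 0:
        level = "error"      # all failed: never show a success toast
    elif failed:
        level = "partial"    # show both the count and the failure panel
    else:
        level = "success"

    reasons = [f.get("reason", "unknown error") for f in failed]
    return {"level": level, "applied": applied, "reasons": reasons}
```

An HTTP 200 whose only cluster failed then yields level "error" with
applied 0, instead of a "1 cluster applied" toast.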
Smoke-verified against current state (10K events, 9 projects,
post-merge): the tool finds 0 pairs at threshold 0.85 (data is clean)
and 1 false positive at 0.70 (two unrelated projects sharing the token
"81", which illustrates why 0.85 is the default).
Settings link added under Developer → Project Tidy.
Phase 5c (swap-detection daily background job + notification inbox)
remains deferred to the next session.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>