feat(projects): Tidy page for fuzzy-detecting + bulk-merging duplicate projects

Phase 5b first slice.  Surfaces near-duplicate projects (typo variants,
abbreviation differences, spacing variations like "SR81" vs "SR 81")
as side-by-side pairs the operator can merge with one click.

Backend (backend/services/project_tidy.py):
- find_duplicate_pairs(db, threshold=0.85) walks all active projects and
  computes rapidfuzz.fuzz.WRatio similarity for every pair.  Names whose
  normalised form is shorter than 4 chars are filtered out up front to
  avoid noise, and soft-deleted projects are skipped.  Returns pairs
  sorted by score desc, then by total content (more assignments →
  review first).
- Each pair carries a suggested merge target with a human-readable
  reason.  Priorities (in order): manual source over parser source,
  populated project_number, more locations, more assignments, shorter
  name.  Operator can override the suggestion by clicking the OTHER
  direction button.
- O(N^2) over project count.  Fine up to ~500 projects.  Token-prefix
  blocking is the obvious next optimisation if it becomes slow.
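
A minimal sketch of the pairwise scan and target suggestion described in
the list above.  It is not the code in project_tidy.py: the Project
dataclass, normalise(), target_rank() and find_pairs() are hypothetical
names, the normalisation rule is an assumption, and difflib's
SequenceMatcher stands in for rapidfuzz's WRatio so the snippet runs on
the standard library alone.

```python
import re
from dataclasses import dataclass
from difflib import SequenceMatcher
from itertools import combinations
from typing import Optional


@dataclass
class Project:  # hypothetical stand-in for the ORM model
    id: str
    name: str
    source: str = "manual"  # "manual" or "metadata_backfill"
    project_number: Optional[str] = None
    location_count: int = 0
    assignment_count: int = 0


def normalise(name):
    # Assumed rule: lowercase and keep only letters/digits, so that
    # "SR81" and "SR 81" compare as equal.
    return re.sub(r"[^a-z0-9]", "", name.lower())


def similarity(a, b):
    # Stand-in scorer in the 0..1 range (the commit uses WRatio instead).
    return SequenceMatcher(None, a, b).ratio()


def target_rank(p):
    # Smaller tuple = preferred merge target, following the priority
    # order: manual source, populated project_number, more locations,
    # more assignments, shorter name.
    return (
        p.source != "manual",
        not p.project_number,
        -p.location_count,
        -p.assignment_count,
        len(p.name),
    )


def find_pairs(projects, threshold=0.85, min_len=4):
    keyed = [(p, normalise(p.name)) for p in projects]
    keyed = [(p, k) for p, k in keyed if len(k) >= min_len]  # drop noisy short names
    pairs = []
    for (a, ka), (b, kb) in combinations(keyed, 2):  # O(N^2) comparisons
        score = similarity(ka, kb)
        if score >= threshold:
            target = min(a, b, key=target_rank)
            pairs.append((score, a, b, target.id))
    # Highest score first, then pairs carrying more assignments.
    pairs.sort(key=lambda t: (-t[0], -(t[1].assignment_count + t[2].assignment_count)))
    return pairs
```

Encoding the priority list as a sort key keeps the tie-breaking order in
one place; the real service also returns a human-readable reason, which
this sketch omits.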

Backend (backend/routers/projects.py):
- GET /api/projects/admin/duplicate_pairs?threshold=&max_pairs=  returns
  pairs as JSON for the Tidy page.
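
For illustration, a small client-side sketch of how the endpoint's URL
and response could be handled.  The path, query parameters and response
keys come from this commit; BASE_URL and the two helper functions are
assumptions made for the example.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8000"  # assumption: local dev server


def duplicate_pairs_url(threshold=0.85, max_pairs=200, base=BASE_URL):
    # Build the GET URL with both query parameters set explicitly.
    query = urlencode({"threshold": threshold, "max_pairs": max_pairs})
    return f"{base}/api/projects/admin/duplicate_pairs?{query}"


def summarise(payload):
    # Turn the JSON response into one readable line per suggested merge.
    rows = []
    for pair in payload["pairs"]:
        a, b = pair["a"], pair["b"]
        target = a if pair["suggested_target_id"] == a["id"] else b
        source = b if target is a else a
        rows.append(
            f'{pair["score"]:.3f}  merge "{source["name"]}" '
            f'into "{target["name"]}" ({pair["reason"]})'
        )
    return rows
```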

Frontend (templates/admin/project_tidy.html):
- New admin page at /settings/developer/project-tidy.  Threshold selector
  (95% / 90% / 85% / 80%) at the top; rescan button next to it; auto-
  scans on load.
- Each pair card shows side-by-side project summaries (name,
  project_number, client, source-badge, location/assignment counts) with
  the suggested target visually highlighted (orange border).  Three
  buttons: "Merge A → B", "Merge B → A", "Not a dup" (hide locally).
- Click-to-merge opens a native confirm with the preview totals
  (assignments/sessions/data files moving, consolidations) — same data
  the project_header.html merge modal shows.  On confirm, hits the
  existing /merge_into endpoint and re-scans automatically.
- Source badges distinguish parser-created (`metadata_backfill`) from
  manual projects — at a glance the operator can see "this duplicate is
  parser-generated; safe to merge into the manual one".

Frontend (templates/admin/metadata_backfill.html):
- Apply-result handling now surfaces failed[] cluster reasons in a
  dedicated failure panel (bottom-left, dismissible).  Previously a 200
  OK in which every cluster failed still showed a misleading "1 cluster
  applied" success toast, because the count and the failure list
  weren't being reconciled.  This bit us during the DB-revert recovery
  earlier: the project_modules table was missing, every apply silently
  rolled back, and the user saw success toasts.  Fixed.
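
The reconciliation logic can be sketched as follows.  The real handling
lives in the template's JavaScript; Python is used here only to show the
decision rule.  The "applied" field name, the shape of each failed[]
entry, and the assumption that "applied" counts only clusters that
actually succeeded are all hypothetical.

```python
def summarise_apply(result):
    """Pick a notification level from an apply response.

    Returns (level, message, reasons); a non-empty failed[] list never
    produces a plain success.
    """
    failed = result.get("failed", [])
    applied = result.get("applied", 0)  # hypothetical field name
    reasons = [f.get("reason", "unknown") for f in failed]
    if failed and applied == 0:
        return "error", f"0 clusters applied, {len(failed)} failed", reasons
    if failed:
        return "warning", f"{applied} applied, {len(failed)} failed", reasons
    return "success", f"{applied} applied", reasons
```

Checking the failure list before the count is the point: an HTTP 200
alone says nothing about whether any cluster was actually applied.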

Smoke-verified against current state (10K events, 9 projects, post-
merge): the tool finds 0 pairs at threshold 0.85 (data is clean) and
1 false positive at 0.70 (two unrelated projects sharing the token
"81"), which supports keeping 0.85 as the default.

Settings link added under Developer → Project Tidy.

Phase 5c (swap-detection daily background job + notification inbox)
remains deferred to the next session.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-12 21:29:50 +00:00
parent b1c2a1d778
commit 77483c2186
6 changed files with 624 additions and 3 deletions
@@ -729,6 +729,49 @@ async def project_merge_preview(
}


@router.get("/admin/duplicate_pairs")
async def get_duplicate_pairs(
    threshold: float = 0.85,
    max_pairs: int = 200,
    db: Session = Depends(get_db),
):
    """Return all active-project pairs whose names fuzzy-match above the
    threshold.  Used by the Tidy page to surface duplicates that would
    otherwise have to be hunted down one at a time.

    Each pair carries a suggested merge-target with the reasoning so the
    operator can decide direction with one click.
    """
    from backend.services import project_tidy as pt

    pairs = pt.find_duplicate_pairs(db, threshold=threshold, max_pairs=max_pairs)

    def _ps(p):
        return {
            "id": p.id,
            "name": p.name,
            "project_number": p.project_number,
            "client_name": p.client_name,
            "source": p.source,
            "status": p.status,
            "location_count": p.location_count,
            "assignment_count": p.assignment_count,
        }

    return {
        "pairs": [
            {
                "a": _ps(pair.a),
                "b": _ps(pair.b),
                "score": round(pair.score, 3),
                "suggested_target_id": pair.suggested_target_id,
                "reason": pair.reason,
            }
            for pair in pairs
        ],
        "threshold": threshold,
    }


@router.post("/{source_id}/merge_into")
async def project_merge_execute(
source_id: str,