feat(projects): Tidy page for fuzzy-detecting + bulk-merging duplicate projects
Phase 5b first slice. Surfaces near-duplicate projects (typo variants,
abbreviation differences, spacing variations like "SR81" vs "SR 81") as
side-by-side pairs the operator can merge with one click.

Backend (backend/services/project_tidy.py):
- find_duplicate_pairs(db, threshold=0.85) walks all active projects and
  computes rapidfuzz.WRatio similarity for every pair. Pre-filters
  too-short normalised names (< 4 chars) to avoid noise. Skips
  soft-deleted projects. Returns pairs sorted by score desc, then by
  total content (more assignments → review first).
- Each pair carries a suggested merge target with a human-readable
  reason. Priorities (in order): manual source over parser source,
  populated project_number, more locations, more assignments, shorter
  name. Operator can override the suggestion by clicking the OTHER
  direction button.
- O(N^2) over project count. Fine up to ~500 projects. Token-prefix
  blocking is the obvious next optimisation if it becomes slow.

Backend (backend/routers/projects.py):
- GET /api/projects/admin/duplicate_pairs?threshold=&max_pairs= returns
  pairs as JSON for the Tidy page.

Frontend (templates/admin/project_tidy.html):
- New admin page at /settings/developer/project-tidy. Threshold selector
  (95% / 90% / 85% / 80%) at the top; rescan button next to it;
  auto-scans on load.
- Each pair card shows side-by-side project summaries (name,
  project_number, client, source-badge, location/assignment counts) with
  the suggested target visually highlighted (orange border). Three
  buttons: "Merge A → B", "Merge B → A", "Not a dup" (hide locally).
- Click-to-merge opens a native confirm with the preview totals
  (assignments/sessions/data files moving, consolidations), the same
  data the project_header.html merge modal shows. On confirm, hits the
  existing /merge_into endpoint and re-scans automatically.
- Source badges distinguish parser-created (`metadata_backfill`) from
  manual projects, so at a glance the operator can see "this duplicate
  is parser-generated; safe to merge into the manual one".

Frontend (templates/admin/metadata_backfill.html):
- Apply-result handling now surfaces failed[] cluster reasons in a
  dedicated failure panel (bottom-left, dismissable). Previously a 200
  OK with all-failures showed a misleading "1 cluster applied" success
  toast because the count and the failure list weren't being reconciled.
  This bit us during the DB-revert recovery earlier: the project_modules
  table was missing, every apply silently rolled back, and the user saw
  success toasts. Fixed.

Smoke-verified against current state (10K events, 9 projects,
post-merge): tool correctly finds 0 pairs at threshold 0.85 (data is
clean), 1 false-positive at 0.70 (two unrelated projects sharing the
token "81", an example of why the 0.85 default is correct).

Settings link added under Developer → Project Tidy.

Phase 5c (swap-detection daily background job + notification inbox)
remains deferred to the next session.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
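For orientation, the pairwise scan and target suggestion described in the commit message could look roughly like the sketch below. It is not the code in backend/services/project_tidy.py, which this page does not show. The `Project` dataclass, `normalise`, `target_key` and `find_pairs` are hypothetical names, the stdlib `difflib.SequenceMatcher` stands in for `rapidfuzz.WRatio` so the example has no third-party dependency, and "total content" is assumed to mean the combined assignment count.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from itertools import combinations
from typing import Optional


@dataclass
class Project:
    # Hypothetical stand-in for the ORM model; field names follow the commit message.
    id: str
    name: str
    source: str = "manual"  # "manual" or "metadata_backfill" (parser-created)
    project_number: Optional[str] = None
    location_count: int = 0
    assignment_count: int = 0
    deleted: bool = False


def normalise(name: str) -> str:
    # Lower-case and keep only letters and digits, so "SR81" and "SR 81" compare equal.
    return "".join(ch for ch in name.lower() if ch.isalnum())


def target_key(p: Project) -> tuple:
    # Smaller tuple wins. Order follows the stated priorities: manual source,
    # populated project_number, more locations, more assignments, shorter name.
    return (
        p.source != "manual",
        not p.project_number,
        -p.location_count,
        -p.assignment_count,
        len(p.name),
    )


def find_pairs(projects, threshold=0.85):
    # Skip soft-deleted projects and names too short to compare meaningfully.
    active = [p for p in projects if not p.deleted and len(normalise(p.name)) >= 4]
    pairs = []
    # O(N^2): every unordered pair of active projects is scored once.
    for a, b in combinations(active, 2):
        score = SequenceMatcher(None, normalise(a.name), normalise(b.name)).ratio()
        if score >= threshold:
            target = min((a, b), key=target_key)
            pairs.append((score, a, b, target.id))
    # Highest score first; ties go to the pair with more assignments (assumption).
    pairs.sort(key=lambda t: (-t[0], -(t[1].assignment_count + t[2].assignment_count)))
    return pairs
```

Encoding the priorities as one sort key keeps the rule order in a single place. A real implementation would also need to produce the human-readable reason, which this sketch leaves out.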
@@ -729,6 +729,49 @@ async def project_merge_preview(
     }
 
 
+@router.get("/admin/duplicate_pairs")
+async def get_duplicate_pairs(
+    threshold: float = 0.85,
+    max_pairs: int = 200,
+    db: Session = Depends(get_db),
+):
+    """Return all active-project pairs whose names fuzzy-match above the
+    threshold. Used by the Tidy page to surface duplicates that would
+    otherwise have to be hunted down one at a time.
+
+    Each pair carries a suggested merge-target with the reasoning so the
+    operator can decide direction with one click.
+    """
+    from backend.services import project_tidy as pt
+    pairs = pt.find_duplicate_pairs(db, threshold=threshold, max_pairs=max_pairs)
+
+    def _ps(p):
+        return {
+            "id": p.id,
+            "name": p.name,
+            "project_number": p.project_number,
+            "client_name": p.client_name,
+            "source": p.source,
+            "status": p.status,
+            "location_count": p.location_count,
+            "assignment_count": p.assignment_count,
+        }
+
+    return {
+        "pairs": [
+            {
+                "a": _ps(pair.a),
+                "b": _ps(pair.b),
+                "score": round(pair.score, 3),
+                "suggested_target_id": pair.suggested_target_id,
+                "reason": pair.reason,
+            }
+            for pair in pairs
+        ],
+        "threshold": threshold,
+    }
+
+
 @router.post("/{source_id}/merge_into")
 async def project_merge_execute(
     source_id: str,
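As an illustration of the response shape returned by `get_duplicate_pairs`, the sketch below turns a payload into suggested merge directions, roughly what the Tidy page has to do before highlighting a target. The function name `merge_labels` and the sample payload values are hypothetical; only the keys (`pairs`, `a`, `b`, `score`, `suggested_target_id`, `reason`) come from the diff above.

```python
def merge_labels(payload: dict) -> list:
    """Turn a duplicate_pairs response into 'source -> target' labels."""
    labels = []
    for pair in payload["pairs"]:
        a, b = pair["a"], pair["b"]
        # The suggested target survives; the other project is merged into it.
        if pair["suggested_target_id"] == a["id"]:
            source, target = b, a
        else:
            source, target = a, b
        labels.append(
            f'{source["name"]} -> {target["name"]} '
            f'({pair["score"]:.0%}): {pair["reason"]}'
        )
    return labels


# Hypothetical payload, trimmed to the fields the function reads.
sample = {
    "pairs": [
        {
            "a": {"id": "p1", "name": "SR81 Bridge"},
            "b": {"id": "p2", "name": "SR 81 Bridge"},
            "score": 0.923,
            "suggested_target_id": "p2",
            "reason": "manual source",
        }
    ],
    "threshold": 0.85,
}
print(merge_labels(sample))
```

Because the endpoint reports only the target id, the caller derives the source as "the other project in the pair", which is also what lets the operator flip the direction by picking the opposite button.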