77483c2186fdc315557fafba77975855b6b89b0c
4 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
77483c2186 |
feat(projects): Tidy page for fuzzy-detecting + bulk-merging duplicate projects
Phase 5b first slice. Surfaces near-duplicate projects (typo variants, abbreviation differences, spacing variations like "SR81" vs "SR 81") as side-by-side pairs the operator can merge with one click. Backend (backend/services/project_tidy.py): - find_duplicate_pairs(db, threshold=0.85) walks all active projects and computes rapidfuzz.WRatio similarity for every pair. Pre-filters too-short normalised names (< 4 chars) to avoid noise. Skips soft-deleted projects. Returns pairs sorted by score desc, then by total content (more assignments → review first). - Each pair carries a suggested merge target with a human-readable reason. Priorities (in order): manual source over parser source, populated project_number, more locations, more assignments, shorter name. Operator can override the suggestion by clicking the OTHER direction button. - O(N^2) over project count. Fine up to ~500 projects. Token-prefix blocking is the obvious next optimisation if it becomes slow. Backend (backend/routers/projects.py): - GET /api/projects/admin/duplicate_pairs?threshold=&max_pairs= returns pairs as JSON for the Tidy page. Frontend (templates/admin/project_tidy.html): - New admin page at /settings/developer/project-tidy. Threshold selector (95% / 90% / 85% / 80%) at the top; rescan button next to it; auto- scans on load. - Each pair card shows side-by-side project summaries (name, project_ number, client, source-badge, location/assignment counts) with the suggested target visually highlighted (orange border). Three buttons: "Merge A → B", "Merge B → A", "Not a dup" (hide locally). - Click-to-merge opens a native confirm with the preview totals (assignments/sessions/data files moving, consolidations) — same data the project_header.html merge modal shows. On confirm, hits the existing /merge_into endpoint and re-scans automatically. - Source badges distinguish parser-created (`metadata_backfill`) from manual projects — at a glance the operator can see "this duplicate is parser-generated; safe to merge into the manual one". Frontend (templates/admin/metadata_backfill.html): - Apply-result handling now surfaces failed[] cluster reasons in a dedicated failure panel (bottom-left, dismissable). Previously a 200 OK with all-failures showed a misleading "1 cluster applied" success toast because the count and the failure list weren't being reconciled. This bit us during the DB-revert recovery earlier — the project_modules table was missing, every apply silently rolled back, user saw success toasts. Fixed. Smoke-verified against current state (10K events, 9 projects, post- merge): tool correctly finds 0 pairs at threshold 0.85 (data is clean), 1 false-positive at 0.70 (two unrelated projects sharing the token "81" — example of why the 0.85 default is correct). Settings link added under Developer → Project Tidy. Phase 5c (swap-detection daily background job + notification inbox) remains deferred to the next session. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> |
||
|
|
d3b5a3fd26 |
feat(sfm): inline typeahead override of project + location on each cluster card
Operator no longer has to accept the parser's suggested project /
location verbatim. Each cluster card now has editable typeahead inputs
that search existing projects (and existing locations within the chosen
project), with a "Create new: <typed>" fallback always available.
Solves the I-80-North-Fork case: of the 20+ cluster variants
("I-80-North Fork Bridges-I80 E. Abutment", "I-80- North Fork
Bridges-543 Plank Rd", etc.), operator types "I-80" in the Project
input, picks the existing project from the dropdown, and the cluster
attaches to it. Repeat for the other variants. No need to pre-create
the canonical project — though pre-creation still works fine if you'd
rather.
Backend (backend/routers/metadata_backfill.py):
- GET /api/admin/metadata_backfill/projects_search?q=&limit=
Returns existing projects matching by case-insensitive substring OR
rapidfuzz WRatio score >= 0.50. Substring matches sort to the top
(treated as exact for ordering). Includes location_count and
project_number/client_name in each result for disambiguation. Always
emits a "Create new: <q>" suggestion alongside the matches.
- GET /api/admin/metadata_backfill/locations_search?project_id=&q=&limit=
Same shape, scoped to a single project's vibration locations.
- POST /api/admin/metadata_backfill/apply now accepts four override
keys per cluster (was previously two):
project_id → attach to existing Project (operator picked from
typeahead)
project_name → create new with this name (operator typed a
custom name; existing project_name behaviour)
location_id → attach to existing MonitoringLocation; validated
against the chosen project_id so a stale location
FK can't sneak in
location_name → create new location with this name
Frontend (templates/admin/metadata_backfill.html):
- Each non-blank-meta cluster card now has two editable typeahead inputs
(Project + Location) pre-populated with the parser's suggested
values. Old static "Project: + Create new: X" / "≈ Fuzzy match" pills
replaced with compact hint lines under the inputs showing what the
current value will do.
- Typeahead dropdown opens on focus, debounced 150ms on type. Shows
matched existing entities with score badges (exact / NN%) plus a
"Create new: <typed>" option at the bottom. Click-to-pick fills the
text input and writes the entity id into a hidden field.
- Picking a new project clears the location id (forces re-pick under
the new project, avoids cross-project location FKs).
- _gatherOverrides re-wired to emit the new project_id / location_id
keys when the operator picked from the dropdown, falling back to
*_name when they typed free-form.
Backward-compatible: blank-meta clusters keep their existing "project_name
/ location_name" plain inputs and the override path still honours them.
Verified end-to-end:
- /projects_search?q=I-80 returns the existing "I-80 - North Fork
Bridge" project (score 1.0, has 4 locations) plus a "Create new"
option.
- /locations_search requires project_id (400 without it).
- Wizard page renders with typeahead wiring confirmed in HTML.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
||
|
|
6ebbe28308 |
feat(sfm): strip "- Loc N" suffix from operator-typed project names
Operators sometimes bake location identifiers into the project string for email-readability — "Fay - Locks & Dam No3 - Loc 2 - 735 Bunola" where "Fay - Locks & Dam No3" is the actual project and "- Loc 2 - 735 Bunola" is location info that already lives in sensor_location. Without stripping, every "- Loc N" variant became a separate project, fragmenting what should be one project with several locations. Backend: - New _extract_project_root() helper. Regex matches " - Loc N" / "-Loc3" / " - Location #5" / etc. with case-insensitive multi-dash support; strips from that marker forward and cleans up dangling separators. Strings without a Loc-marker pass through unchanged. - Cluster dataclass adds project_root field alongside project_raw. project_raw stays the operator-typed string for display ("hover to see what was actually typed"). project_root is what gets normalised for matching and used as the suggested project name. - _ensure_project + _ensure_location now do normalisation-aware dedup before creating: a cluster of "SR81" and a cluster of "SR 81" (which normalise to the same string) collapse into one project on apply, even when applied in the same bulk operation. Avoids UNIQUE constraint collisions and duplicate-named-by-spacing projects. Frontend: - Wizard cluster cards show "↳ stripped trailing 'Loc N' suffix; operator typed: <raw>" when project_root differs from project_raw, so the operator can see at a glance what the parser did to the string. Real-data results: against the same 10,055 SFM events, confidence distribution improved from 37/14/8 (high/med/low) to 43/9/7. "Fay - Locks & Dam No3" now appears as ONE project across 6 cluster instances spanning 3 serials and 6 different locations — exactly the "one project, many locations" model the user described. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> |
||
|
|
42de06f441 |
feat(sfm): Phase 5a — bulk-backfill projects/locations/assignments from event metadata
Operator clicks one button. Parser reads SFM's events table (operator-typed
project / client / sensor_location strings), clusters by serial + time +
metadata, fuzzy-matches against existing projects, and proposes
Project / MonitoringLocation / UnitAssignment chains to create.
Auto-applies high-confidence non-conflicting clusters in bulk; queues
medium/low confidence for individual review.
Verified against real data: 10,052 events → 59 clusters → 37 high-
confidence + 14 medium + 8 low. Test-applied one cluster end-to-end;
Project + Module + Location + Assignment + UnitHistory + Decision rows
all created correctly, and Phase 2's attribution walk picked up the
events automatically on the new location's detail page.
Pipeline (backend/services/metadata_backfill.py, ~700 lines):
1. Pull all SFM events via /db/events per serial.
2. Pre-filter: drop events already covered by an existing UnitAssignment
window (Phase 2 handles those automatically).
3. Time-cluster what's left: serial + 7-day gap is the cluster identity.
4. Metadata-split each time-cluster on persistent metadata transitions
(≥ 2 consecutive events) so a single typo doesn't fork the cluster.
5. Match against existing graph (rapidfuzz.WRatio multi-signal scoring,
normalisation that handles abbreviations / reorders / separator
variations). Thresholds: 0.95 exact, 0.80 fuzzy, min-shorter-input
5 chars to guardrail false positives on single common words.
6. Score confidence (high/medium/low) using event count, span,
blank-meta, conflict, ambiguity rules.
7. Detect conflicts: overlap with existing UnitAssignment at a different
location for the same serial → blocking. Operator must reconcile.
8. Apply: ensure auto_imported ProjectType exists, ensure
vibration_monitoring ProjectModule on the project, write
Project / MonitoringLocation / UnitAssignment / UnitHistory all in
one transaction.
Migration (backend/migrate_add_metadata_backfill.py): adds
unit_assignments.source column (default 'manual') and
metadata_backfill_decisions table. Idempotent, non-destructive.
API (backend/routers/metadata_backfill.py):
GET /api/admin/metadata_backfill/scan — clusters + suggestions
POST /api/admin/metadata_backfill/apply — bulk apply by cluster_ids
w/ optional per-cluster
project/location overrides
POST /api/admin/metadata_backfill/skip — mark skipped (persistent)
UI (templates/admin/metadata_backfill.html, accessible at
/settings/developer/metadata-backfill via the Developer tab of Settings):
- One-button "Run scan" entry.
- Summary KPI tiles (scanned / already attributed / pending / conflicts).
- "Apply all high-confidence" bulk button at the top — primary path.
- Per-cluster cards below with Apply / Skip / Preview event actions.
- Blank-meta clusters get inline input fields for operator-typed project +
location names before applying.
- Blocking-conflict clusters render with the conflicting assignment
information and a disabled Apply button.
- Live progress toast during apply.
- Reuses the Phase 1+2+4 event-detail modal for "Preview event" — operator
can sanity-check the BW report data against the cluster's sample event.
Dependencies: rapidfuzz==3.10.1 added to requirements.txt. Pre-built C
wheels for all platforms, ~5s docker build hit.
Phase 5b (deferred to next session): swap-detection daily background job,
notification inbox for auto-applied swaps, recently-applied audit view,
"Tidy" page for renaming/merging auto-created projects.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|