fix(sfm): broaden Loc-N suffix regex to catch '.Loc' and 'Loc No.' variants

Operators use more separator variations than the original regex caught:
  - "Trumbull-Brayman-JV- Mont.Dam.Loc 2-R-25" — period as separator
  - "CMU - RKM Hall - Loc No. 3 - 4615 Forbes" — "No." between Loc and digit

Added a period to the separator character class and an optional "No."
token before the digit.  This catches both patterns above, plus
near-variants such as "Loc No 5", without false positives on normal
project strings.
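The Python check below makes the before/after behaviour concrete. It rebuilds both versions of `_PROJECT_LOC_SUFFIX` from the diff under the hypothetical names `OLD_LOC_SUFFIX` and `NEW_LOC_SUFFIX`. It then runs both against the two operator strings quoted above. This is an illustrative sketch, not code from the commit.

```python
import re

# Hypothetical name: the pattern as it was before this commit
# (left-hand side of the diff).
OLD_LOC_SUFFIX = re.compile(
    r"""
    \s*                 # any leading whitespace
    [-–—]               # hyphen, en-dash, or em-dash
    \s*
    (?:loc|location)    # 'Loc' or 'Location'
    \.?                 # optional period
    \s*
    \#?                 # optional '#'
    \s*
    \d+                 # required digit
    \b
    """,
    re.IGNORECASE | re.VERBOSE,
)

# Hypothetical name: the pattern after this commit. Period added to the
# separator class, plus an optional "No." / "No " token before the digit.
NEW_LOC_SUFFIX = re.compile(
    r"""
    \s*
    [-–—.]              # separator now also accepts a period
    \s*
    (?:loc|location)
    \.?
    \s*
    (?:no\.?\s*)?       # new: optional "No." or "No "
    \#?
    \s*
    \d+
    \b
    """,
    re.IGNORECASE | re.VERBOSE,
)

SAMPLES = [
    "Trumbull-Brayman-JV- Mont.Dam.Loc 2-R-25",
    "CMU - RKM Hall - Loc No. 3 - 4615 Forbes",
]

for s in SAMPLES:
    old = OLD_LOC_SUFFIX.search(s)
    new = NEW_LOC_SUFFIX.search(s)
    print(repr(s))
    print("  old:", old.group() if old else None)
    print("  new:", new.group() if new else None)
```

The old pattern finds no match in either string. The new pattern's `search()` finds `.Loc 2` in the first string and ` - Loc No. 3` in the second.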

Real-data impact: 5 more clusters now auto-strip cleanly, including the
1,903-event Trumbull-Brayman-JV- Mont.Dam cluster.  Confidence
distribution: 43 → 44 high.
2026-05-12 19:19:46 +00:00
parent 6ebbe28308
commit d46f9fccf8
+14 -10
@@ -103,16 +103,20 @@ def _normalise(s: Optional[str]) -> str:
 # their full project_raw and the operator can edit them in the wizard.
 _PROJECT_LOC_SUFFIX = re.compile(
     r"""
     \s*                 # any leading whitespace
-    [-–—]               # hyphen or em-dash (separator before the Loc marker)
-    \s*                 # optional spaces
+    [-–—.]              # separator: hyphen, em-dash, or period
+                        # (operators use any of these — see
+                        #  "Mont.Dam.Loc 2-R-25")
+    \s*
     (?:loc|location)    # 'Loc' or 'Location'
-    \.?                 # optional period
-    \s*                 # optional space
+    \.?                 # optional trailing period after Loc
+    \s*
+    (?:no\.?\s*)?       # optional "No." or "No " before the digit
+                        # (e.g. "Loc No. 3", "Loc No 5")
     \#?                 # optional '#'
-    \s*                 # optional space
+    \s*
     \d+                 # required digit
-    \b                  # word boundary
+    \b
     """,
     re.IGNORECASE | re.VERBOSE,
 )
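The commit message says matched clusters "auto-strip". One plausible reading, consistent with the resulting cluster name "Trumbull-Brayman-JV- Mont.Dam", is that everything from the match start onward is dropped. The sketch below assumes exactly that. `strip_loc_suffix` is a hypothetical helper, not a function from this codebase.

```python
import re
from typing import Optional

# Post-commit pattern from the diff, written compactly (VERBOSE ignores
# the whitespace between tokens).
_PROJECT_LOC_SUFFIX = re.compile(
    r"""
    \s* [-–—.] \s*
    (?:loc|location) \.? \s*
    (?:no\.?\s*)?
    \#? \s*
    \d+ \b
    """,
    re.IGNORECASE | re.VERBOSE,
)


def strip_loc_suffix(project_raw: str) -> Optional[str]:
    """Hypothetical helper: drop everything from the Loc-N marker onward.

    Assumption: stripping means truncating at the start of the match.
    Returns None when no marker is found, so a caller can fall back to
    the full project_raw.
    """
    m = _PROJECT_LOC_SUFFIX.search(project_raw)
    if m is None:
        return None
    return project_raw[: m.start()].rstrip()


if __name__ == "__main__":
    for s in (
        "Trumbull-Brayman-JV- Mont.Dam.Loc 2-R-25",
        "CMU - RKM Hall - Loc No. 3 - 4615 Forbes",
        "CMU - RKM Hall",
    ):
        print(repr(s), "->", repr(strip_loc_suffix(s)))
```

On the two example strings it returns "Trumbull-Brayman-JV- Mont.Dam" and "CMU - RKM Hall". A string with no Loc marker yields None. Truncating at the match start, rather than calling `re.sub`, also drops trailing text such as "-R-25" that follows the marker.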