Operator Authentication — Design & Build Plan
Status: in development (feat/operator-auth) · Targets: 0.15.x · Date: 2026-06-17
Adds a login + roles to the internal Terra-View app — the operator-facing surface that today has zero auth. This is the prerequisite that makes the app safe to expose to the internet (the office-deployment sequencing: operator auth → expose). Expands the "Deferred A" section of 2026-06-15-portal-auth-design.md into a standalone spec.
Goal
Anyone reaching the internal app must log in. Three known users to start (you +
two parents), two effective roles, and a dead-simple password-reset story for a
family-run shop. Reuses the building blocks the client portal already shipped: the
argon2 hasher (backend/auth_passwords.py) and the HMAC signed-cookie pattern
(backend/portal_auth.py).
Scope
v1 (this spec): email + password login (argon2) · long-lived "remember this
device" session · brute-force lockout · a deny-by-default gate over the whole
internal app · superadmin/admin roles · superadmin-only user management ·
password reset (superadmin-resets-anyone + self-service change + forced change)
· a seed CLI to bootstrap · the OPERATOR_AUTH_ENABLED feature-flag rollout.
Deferred (designed-not-built): TOTP 2FA (near-term follow-up, superadmin
account first) · the operator restricted role · email-based self-service
password reset (needs the email infra coming with the report work).
Principles
- Deny by default. Every route requires a login except an explicit allow-list. A route added next year is protected automatically — you can't forget to gate it.
- Can't lock yourself out. Ship dark behind a feature flag; seed + verify before enforcing; the flag is an instant escape hatch; a CLI is the break-glass.
- Reuse, don't reinvent. argon2 + the signed-cookie HMAC already exist and are tested. Operator auth is a thin new layer, not a parallel crypto stack.
- Easy recovery. For a 3-person shop, "forgot my password" must be a 10-second fix — the superadmin resets it, no email round-trip required.
Architecture
```
                             ┌ OPERATOR_AUTH_ENABLED=false ──▶ pass everything (app as today)
request ──▶ gate middleware ─┤
                             └ enabled ─▶ path exempt? ──yes──▶ serve (no login)
                                             │   exempt: /login /logout /health
                                             │           /static/* /portal/* + 3 machine endpoints
                                             └no─▶ valid operator session?
                                                     ├ no ─▶ HTML: 303 → /login?next=…
                                                     │       /api/*: 401 JSON
                                                     ├ must_change_password ─▶ 303 → /change-password
                                                     └ yes ─▶ request.state.operator = user
                                                              ─▶ route runs; require_role() may 403
```
One Starlette HTTP middleware is the gate (not per-route dependencies — a
middleware can't miss a route). It resolves the operator from the cookie using its
own SessionLocal() (same pattern the portal WS handler uses), stashes the user on
request.state.operator, and a require_role(...) dependency reads it for the few
routes that need more than "logged in."
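A minimal sketch of the gate's decision tree, written as a pure function so it can be unit-tested without Starlette. The names gate_decision, is_exempt, Operator and Decision are hypothetical; the real middleware would resolve the operator via current_operator, call something like this, and turn the result into a redirect, a 401 JSON body, or a pass-through to the route. The exempt set mirrors this spec's allow-list.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import quote

# Allow-list from this spec: exact paths plus prefix matches.
EXEMPT_EXACT = {
    "/login", "/logout", "/health",
    "/manifest.json", "/sw.js", "/favicon.ico",                               # PWA assets
    "/emitters/report", "/api/series3/heartbeat", "/api/series4/heartbeat",   # machine endpoints
}
EXEMPT_PREFIXES = ("/static/", "/portal/")


@dataclass
class Operator:
    """Hypothetical stand-in for the resolved OperatorUser row."""
    email: str
    must_change_password: bool = False


@dataclass
class Decision:
    action: str                    # "serve", "redirect" or "unauthorized"
    status: Optional[int] = None   # HTTP status for redirect/unauthorized
    location: Optional[str] = None


def is_exempt(path: str) -> bool:
    return path in EXEMPT_EXACT or path.startswith(EXEMPT_PREFIXES)


def gate_decision(path: str, enabled: bool, operator: Optional[Operator], query: str = "") -> Decision:
    if not enabled:
        return Decision("serve")                  # flag off: app behaves as today
    if is_exempt(path):
        return Decision("serve")                  # allow-listed: no login required
    if operator is None:
        if path.startswith("/api/"):
            return Decision("unauthorized", 401)  # API callers get 401 JSON
        target = path + ("?" + query if query else "")
        return Decision("redirect", 303, "/login?next=" + quote(target, safe=""))
    if operator.must_change_password and path != "/change-password":
        return Decision("redirect", 303, "/change-password")  # no loop on the form itself
    return Decision("serve")                      # caller sets request.state.operator
```

Keeping the decision separate from the Starlette plumbing makes the "flag off → everything passes" and "machine endpoints pass with no cookie" tests trivial.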
Data model
New table operator_users (brand-new → create_all builds it on startup, no
migration needed, same as the portal's clients table):
| Column | Type | Notes |
|---|---|---|
| id | str UUID | caller-supplied str(uuid.uuid4()) (codebase convention) |
| email | str, unique, indexed | login handle, stored lowercased |
| display_name | str | "Brian", "Dad"; shown in UI + history |
| password_hash | str | argon2id via auth_passwords.hash_password |
| role | str | "superadmin" or "admin" ("operator" reserved, deferred) |
| active | bool, default True | disable a login without deleting |
| must_change_password | bool, default False | set on create/reset → forces a change on next login |
| sessions_valid_from | datetime, default utcnow | bump to invalidate ALL of a user's sessions |
| failed_login_count | int, default 0 | lockout counter |
| locked_until | datetime, nullable | set after too many bad tries |
| created_at | datetime, default utcnow | |
| last_login_at | datetime, nullable | |
(Deferred columns, not in v1: totp_secret, totp_enabled.)
Role ladder — a rank map so checks read naturally and operator slots in later:
```python
_ROLE_RANK = {"operator": 10, "admin": 20, "superadmin": 30}
```
require_role("admin") = admin or above; require_role("superadmin") for account mgmt.
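A minimal sketch of the comparison require_role could make with that rank map. role_at_least is a hypothetical helper name; raising on an unknown required role is an assumption meant to fail fast on a typo in a require_role(...) call.

```python
_ROLE_RANK = {"operator": 10, "admin": 20, "superadmin": 30}


def role_at_least(user_role: str, required: str) -> bool:
    """True when user_role ranks at or above required; unknown user roles never pass."""
    if required not in _ROLE_RANK:
        raise ValueError(f"unknown required role: {required!r}")  # typo in require_role(...)
    return _ROLE_RANK.get(user_role, -1) >= _ROLE_RANK[required]
```

Because the check is a numeric comparison, adding the deferred operator role later needs no changes at call sites: require_role("admin") keeps excluding it.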
Sessions
New shared module backend/auth_cookies.py — lift the generic signer out so both
auth systems share one implementation:
```python
import os

SECRET_KEY = os.getenv("SECRET_KEY", "dev-insecure-change-me")  # same env the portal reads
COOKIE_SECURE = os.getenv("COOKIE_SECURE", "false").lower() in {"1", "true", "yes"}  # truthy set assumed

def sign(payload: dict) -> str: ...  # f"{b64url(json)}.{hmac_sha256(b64, SECRET_KEY)}"
def read(raw: str, max_age: int) -> dict | None: ...  # verify sig (compare_digest) + iat expiry; None on tamper/expiry
```
Operator auth uses it now. (Portal's existing cookie helpers keep working untouched;
migrating them onto auth_cookies is an optional later dedupe, gated on the portal
tests staying green — don't destabilize the shipped portal for it.)
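A minimal stdlib sketch of the sign/read pair, assuming a hex HMAC digest and compact JSON; the spec fixes only the b64url(json).hmac_sha256 shape, so treat those encoding details as assumptions rather than the portal's exact format.

```python
from __future__ import annotations

import base64
import hashlib
import hmac
import json
import os
import time

SECRET_KEY = os.getenv("SECRET_KEY", "dev-insecure-change-me")


def _b64url(data: bytes) -> str:
    # URL-safe base64 without "=" padding, so the token is cookie-friendly.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _unb64url(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _mac(body: str) -> str:
    # HMAC-SHA256 over the encoded body, keyed by SECRET_KEY (hex digest is an assumption).
    return hmac.new(SECRET_KEY.encode(), body.encode("utf-8"), hashlib.sha256).hexdigest()


def sign(payload: dict) -> str:
    body = _b64url(json.dumps(payload, separators=(",", ":")).encode("utf-8"))
    return f"{body}.{_mac(body)}"


def read(raw: str, max_age: int) -> dict | None:
    body, dot, sig = raw.rpartition(".")
    if not dot:
        return None                              # not a signed token at all
    # Constant-time comparison on bytes (str inputs would have to be ASCII-only).
    if not hmac.compare_digest(sig.encode("utf-8"), _mac(body).encode("utf-8")):
        return None                              # tampered or signed with another key
    try:
        payload = json.loads(_unb64url(body))
    except ValueError:                           # bad base64 or bad JSON
        return None
    if not isinstance(payload, dict):
        return None
    iat = payload.get("iat")
    if not isinstance(iat, (int, float)) or time.time() - iat > max_age:
        return None                              # missing or expired issue time
    return payload
```

Expiry is enforced server-side from iat, so a client cannot extend a session by editing the cookie's own max_age.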
Operator session cookie: name tv_session (distinct from the portal's
portal_session), payload {"uid": <id>, "iat": <epoch>}, max_age 30 days
(= the "remember this device" — a small trusted set re-logs in rarely), httponly,
samesite=lax, secure=COOKIE_SECURE.
Validation each request (current_operator(request, db)): read+verify cookie →
load OperatorUser by uid → require active, iat >= sessions_valid_from
(epoch), and not locked_until > now. Any failure → no session. Bumping
sessions_valid_from (on password change / "log out everywhere") instantly kills all
live cookies with no session table.
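A minimal sketch of the checks current_operator would run once the cookie has been verified and the row loaded. validate_session and the dataclass are hypothetical stand-ins for the ORM code. Two details are assumptions worth flagging: naive utcnow datetimes are treated as UTC (calling .timestamp() on a naive value would use local time), and the comparison is done in whole seconds so a cookie issued in the same second as a bump is not rejected because of the bump's fractional part.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class OperatorUser:
    """Hypothetical stand-in holding only the columns the check reads."""
    id: str
    active: bool
    sessions_valid_from: datetime
    locked_until: Optional[datetime] = None


def _as_utc(dt: datetime) -> datetime:
    # Columns default to utcnow (naive); treat naive values as UTC, not local time.
    return dt if dt.tzinfo is not None else dt.replace(tzinfo=timezone.utc)


def validate_session(payload: Optional[dict], user: Optional[OperatorUser], now: datetime) -> Optional[OperatorUser]:
    """Return the user if the verified cookie payload still grants a session, else None."""
    if payload is None or user is None or payload.get("uid") != user.id:
        return None
    if not user.active:
        return None                      # disabled logins lose access immediately
    # iat is an int epoch, so compare against the bump truncated to whole seconds.
    valid_from = int(_as_utc(user.sessions_valid_from).timestamp())
    iat = payload.get("iat")
    if not isinstance(iat, (int, float)) or iat < valid_from:
        return None                      # issued before a password change / "log out everywhere"
    if user.locked_until is not None and _as_utc(user.locked_until) > _as_utc(now):
        return None                      # currently locked out
    return user
```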
Authorization
The gate (middleware) exempt list:
- /login, /logout, /health, /static/*, plus PWA assets (/manifest.json, /sw.js, /favicon.ico)
- /portal/* (the client portal keeps its own, separate auth)
- machine endpoints (LAN-only, automated, no human): /emitters/report, /api/series3/heartbeat, /api/series4/heartbeat
/change-password is not exempt — it requires a logged-in session (you change
your own password). It's only excluded from the must_change_password redirect,
so a forced-change user can actually reach it (no redirect loop).
Permission split — minimal by design. Because the operator role is deferred,
every real v1 user is admin or superadmin, so "logged in" already means "full
app." The only thing gated above plain-admin is account management →
require_role("superadmin") on the user-management routes. Everything else just
requires a valid session (the middleware). One extra guard, not a sprawling matrix.
The flag governs everything. Both the middleware and require_role respect
OPERATOR_AUTH_ENABLED: when it's off, neither enforces anything (no session is set,
and require_role passes through) — the app behaves exactly as it does today. When
it's on, the middleware guarantees request.state.operator is set before any
require_role check runs.
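A minimal sketch of a flag-aware require_role, framework-agnostic so it runs without FastAPI or Starlette. Forbidden stands in for whatever 403 error the app raises, the checker receives the operator's role rather than the request, and the accepted truthy spellings for OPERATOR_AUTH_ENABLED are assumptions; all of these names are hypothetical.

```python
import os
from typing import Callable, Optional

_ROLE_RANK = {"operator": 10, "admin": 20, "superadmin": 30}


class Forbidden(Exception):
    """Hypothetical stand-in for the framework's 403 error."""


def auth_enabled() -> bool:
    # Assumed parsing: "1", "true" or "yes" (any case) enable enforcement; unset means off.
    return os.getenv("OPERATOR_AUTH_ENABLED", "false").strip().lower() in {"1", "true", "yes"}


def require_role(minimum: str) -> Callable[[Optional[str]], None]:
    if minimum not in _ROLE_RANK:
        raise ValueError(f"unknown role: {minimum!r}")
    needed = _ROLE_RANK[minimum]

    def check(operator_role: Optional[str]) -> None:
        if not auth_enabled():
            return                        # flag off: pass through, app as today
        # Flag on: the middleware has already required a session, so a missing
        # operator here means mis-wiring; refuse rather than allow.
        if operator_role is None or _ROLE_RANK.get(operator_role, -1) < needed:
            raise Forbidden(f"requires role {minimum!r} or above")

    return check
```

Failing closed when the operator is missing (flag on) keeps a future routing mistake from silently exposing account management.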
Password management & reset (the emphasized requirement)
Three paths, no email infra required:
- Superadmin resets anyone: from the user-management UI, "Reset password" → generates a strong password (auth_passwords.generate_password), stores its hash, sets must_change_password=True, and shows the temp password once for you to hand off. Covers "easy for me to reset their password."
- Self-service change: /change-password (any logged-in user): current + new. Used for routine changes and the forced post-reset change. On success, bump sessions_valid_from (logs out other devices) and clear must_change_password.
- Forced change: after a reset/first login, must_change_password=True → the gate routes them to /change-password until they set their own.
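A minimal sketch of the superadmin reset path. The hasher is injected (in the app it would be auth_passwords.hash_password) and generate_temp_password is a hypothetical stand-in for auth_passwords.generate_password. Bumping sessions_valid_from and clearing the lockout are assumptions: they follow from "password-changed-elsewhere → bounced" and from using reset as the locked-out break-glass, but the spec does not list them under reset.

```python
import secrets
import string
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Optional


@dataclass
class OperatorUser:
    """Hypothetical stand-in with just the columns a reset touches."""
    email: str
    password_hash: str
    must_change_password: bool = False
    failed_login_count: int = 0
    locked_until: Optional[datetime] = None
    sessions_valid_from: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def generate_temp_password(length: int = 16) -> str:
    # Stand-in for auth_passwords.generate_password: CSPRNG choice over letters + digits.
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))


def reset_password(user: OperatorUser, hash_password: Callable[[str], str]) -> str:
    temp = generate_temp_password()
    user.password_hash = hash_password(temp)               # only the hash is stored
    user.must_change_password = True                       # gate forces /change-password next login
    user.sessions_valid_from = datetime.now(timezone.utc)  # assumption: log out existing devices
    user.failed_login_count = 0                            # assumption: a reset also clears a lockout
    user.locked_until = None
    return temp  # shown once to the superadmin; never logged or persisted
```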
Forgot it entirely (can't log in): v1 has no email reset — /login shows
"Forgot your password? Contact your administrator," and you (superadmin) reset it via
the UI or CLI. For a 3-person shop that's a text message, not a feature. (Email-based
self-service is the deferred follow-up once email infra lands.)
Bootstrapping — seed CLI
backend/operator_admin.py (modeled on the existing portal_admin.py), run inside
the container against the live DB:
```
create-superadmin --email you@x.com --name "Brian"        # prompts for a password (or --generate)
create-user --email dad@x.com --name "Dad" --role admin   # generates a temp password, must_change=True
reset-password --email dad@x.com                          # generates a temp, must_change=True
list                                                      # users + roles + active/locked state
disable --email dad@x.com
enable --email dad@x.com
```
The CLI is the bootstrap (first superadmin, before any UI is reachable) and the break-glass (locked out / forgot everything).
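A minimal argparse sketch of that command surface, modeled only on the commands listed above; the help strings, the role choices and the "admin" default are assumptions, and the handlers that would talk to the DB (and prompt via getpass.getpass when --generate is absent) are left out.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="operator_admin", description="Seed / break-glass CLI for operator accounts.")
    sub = parser.add_subparsers(dest="command", required=True)

    # Emails are lowercased on the way in to match how operator_users stores them.
    p = sub.add_parser("create-superadmin", help="bootstrap the first superadmin")
    p.add_argument("--email", required=True, type=str.lower)
    p.add_argument("--name", required=True)
    p.add_argument("--generate", action="store_true", help="generate a password instead of prompting")

    p = sub.add_parser("create-user", help="add an operator with a temp password")
    p.add_argument("--email", required=True, type=str.lower)
    p.add_argument("--name", required=True)
    p.add_argument("--role", choices=["admin", "superadmin"], default="admin")

    for name in ("reset-password", "disable", "enable"):
        sub.add_parser(name).add_argument("--email", required=True, type=str.lower)

    sub.add_parser("list", help="users + roles + active/locked state")
    return parser
```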
Account-management UI (superadmin-only)
GET /admin/users (page, require_role("superadmin")) + JSON endpoints:
- list operators (name, email, role, active, locked, last login)
- add operator (email, name, role) → temp password shown once
- reset password → temp shown once
- enable / disable, change role
Template: templates/admin/users.html. Admins (parents) don't see this page; it's superadmin-only.
Login / logout / change-password
- GET /login → templates/login.html (email + password, optional ?next=).
- POST /login → lowercase the email, check the lockout, argon2-verify. On success: set tv_session, stamp last_login_at, clear failed_login_count, and redirect to next or /; if must_change_password is set, redirect to /change-password instead. On failure: increment the counter and show a generic "invalid email or password" (no user enumeration); after 5 failures, lock for 15 min.
- GET /logout → clear the cookie → /login.
- GET/POST /change-password → templates/change_password.html.
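A minimal sketch of the lockout bookkeeping behind POST /login. verify stands in for the argon2 check, LoginState for the two columns, and attempt_login is a hypothetical name. Resetting the counter when the lock is applied is an assumption (otherwise a single typo after the lock expires would re-lock immediately).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Optional

MAX_FAILED = 5                    # from the spec: lock after 5 bad tries
LOCKOUT = timedelta(minutes=15)   # ... for 15 minutes


@dataclass
class LoginState:
    """Hypothetical stand-in for the two lockout columns on operator_users."""
    failed_login_count: int = 0
    locked_until: Optional[datetime] = None


def is_locked(state: LoginState, now: datetime) -> bool:
    return state.locked_until is not None and state.locked_until > now


def attempt_login(state: LoginState, verify: Callable[[], bool], now: datetime) -> str:
    """Return "ok", "invalid" or "locked"; verify() runs the argon2 check."""
    if is_locked(state, now):
        return "locked"                   # refused before checking the password at all
    if not verify():
        state.failed_login_count += 1
        if state.failed_login_count >= MAX_FAILED:
            state.locked_until = now + LOCKOUT
            state.failed_login_count = 0  # assumption: a fresh 5 tries after the lock expires
            return "locked"
        return "invalid"                  # UI shows the generic "invalid email or password"
    state.failed_login_count = 0          # success clears the counter
    state.locked_until = None
    return "ok"
```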
Error handling
- Wrong email/password → generic message, increment fail count.
- ≥5 fails → "too many attempts, try again in 15 minutes" (locked_until).
- No/expired/forged cookie → HTML routes 303 → /login?next=…; /api/* → 401 JSON.
- Disabled / role-changed / password-changed-elsewhere → bounced on the next request (the session is re-validated against the DB on every request).
- Superadmin-only route hit by an admin → 403.
Rollout — the no-self-lockout sequence
- Ship with OPERATOR_AUTH_ENABLED=false (default) → the middleware short-circuits and the app behaves exactly as today. Deploying can't break or lock anything.
- Seed your superadmin via operator_admin.py.
- Hit /login and confirm you get a session while the flag is still off (the login routes work regardless of the flag).
- Flip OPERATOR_AUTH_ENABLED=true → the gate enforces. Your cookie is valid → you're in. Anything wrong → flip it back off (instant escape hatch).
- Create your parents' accounts from /admin/users (temp passwords; they change them on first login).
- Break-glass: operator_admin.py reset-password / create-superadmin in the container, or turn the flag off.
Testing
Reuses the pytest harness from the portal work (docker exec … python -m pytest).
- Middleware: flag off → every path passes; flag on → exempt paths + the 3 machine endpoints pass with no cookie, a gated HTML path 303s to /login, a gated /api/* path 401s, and a must_change_password user is routed to /change-password.
- Login: success sets tv_session; a wrong password is rejected and counted; 5 wrong → locked (even the correct password is refused).
- Roles: require_role("superadmin") route → admin gets 403, superadmin 200.
- Sessions: bumping sessions_valid_from invalidates an existing cookie.
- Password: self-change works and clears must_change_password; superadmin reset sets a new hash + must_change_password and returns the raw password once.
- Machine endpoints: /api/series3/heartbeat etc. still return 200 with the gate ON and no cookie (regression guard so we never silently break the watchers).
File structure
| File | Responsibility |
|---|---|
| backend/auth_cookies.py (new) | generic sign/read + SECRET_KEY/COOKIE_SECURE |
| backend/models.py | add OperatorUser |
| backend/operator_auth.py (new) | current_operator, require_role, the gate middleware, login/lockout helpers |
| backend/routers/operator_auth_routes.py (new) | /login, /logout, /change-password |
| backend/routers/operator_users.py (new) | /admin/users page + CRUD (superadmin) |
| backend/operator_admin.py (new) | seed/break-glass CLI |
| backend/main.py | register the gate middleware + routers; OPERATOR_AUTH_ENABLED |
| templates/login.html, templates/change_password.html, templates/admin/users.html (new) | UI |
Going to prod
- New table auto-creates; no migration. Just code + seeding.
- Set a real SECRET_KEY (shared with the portal cookie) and COOKIE_SECURE=true once on HTTPS; the same env knobs are already wired in docker-compose.yml.
- Operator auth is what makes internet-exposing the internal app safe; pair it with the (deferred) office deployment + reverse-proxy/TLS work.
Security notes
- Deny-by-default; client-supplied ids never trusted; every request re-validates the session against the DB (instant revoke via active / sessions_valid_from).
- Passwords argon2-hashed; generic login errors (no user enumeration); lockout on brute force; raw temp passwords shown once, never stored or logged.
- Cookies HttpOnly + SameSite=Lax + Secure (on TLS), HMAC-signed with server-side iat expiry.
- Known residual until deploy: without TLS the password crosses the wire in cleartext. The fix is the deployment-phase TLS (Synology Let's Encrypt / Cloudflare Tunnel). The login is still a massive improvement over today's zero-auth exposure.
- TOTP 2FA is the near-term follow-up (superadmin first), especially without the UniFi edge in front on the home network.