# Operator Authentication — Design & Build Plan

**Status:** in development (`feat/operator-auth`) · **Targets:** 0.15.x · **Date:** 2026-06-17

Adds a login + roles to the **internal** Terra-View app — the operator-facing surface that today has **zero auth**. This is the prerequisite that makes the app safe to expose to the internet (the office-deployment sequencing: operator auth → expose). Expands the "Deferred A" section of [2026-06-15-portal-auth-design.md](2026-06-15-portal-auth-design.md) into a standalone spec.

## Goal

Anyone reaching the internal app must log in. Three known users to start (you + two parents), two effective roles, and a **dead-simple password-reset story** for a family-run shop. Reuses the building blocks the client portal already shipped: the argon2 hasher (`backend/auth_passwords.py`) and the HMAC signed-cookie pattern (`backend/portal_auth.py`).

## Scope

**v1 (this spec):** email + password login (argon2) · long-lived "remember this device" session · brute-force lockout · a **deny-by-default gate** over the whole internal app · `superadmin`/`admin` roles · **superadmin-only user management** · **password reset** (superadmin-resets-anyone + self-service change + forced change) · a **seed CLI** to bootstrap · the `OPERATOR_AUTH_ENABLED` **feature-flag rollout**.

**Deferred (designed-not-built):** TOTP 2FA (near-term follow-up, `superadmin` account first) · the `operator` restricted role · email-based self-service password reset (needs the email infra coming with the report work).

## Principles

1. **Deny by default.** Every route requires a login *except* an explicit allow-list. A route added next year is protected automatically — you can't forget to gate it.
2. **Can't lock yourself out.** Ship dark behind a feature flag; seed + verify before enforcing; the flag is an instant escape hatch; a CLI is the break-glass.
3. **Reuse, don't reinvent.** argon2 + the signed-cookie HMAC already exist and are tested.
   Operator auth is a thin new layer, not a parallel crypto stack.
4. **Easy recovery.** For a 3-person shop, "forgot my password" must be a 10-second fix — the superadmin resets it, no email round-trip required.

## Architecture

```
                              OPERATOR_AUTH_ENABLED=false ──▶ pass everything (app as today)
request ──▶ gate middleware ─┤
                              └ enabled ─▶ path exempt? ──yes──▶ serve (no login)
                                            │   exempt: /login /logout /health
                                            │           /static/* /portal/* + 3 machine endpoints
                                            └no─▶ valid operator session?
                                                   ├ no  ─▶ HTML: 303 → /login?next=…
                                                   │        /api/*: 401 JSON
                                                   ├ must_change_password ─▶ 303 → /change-password
                                                   └ yes ─▶ request.state.operator = user
                                                            ─▶ route runs; require_role() may 403
```

One **Starlette HTTP middleware** is the gate (not per-route dependencies — a middleware can't miss a route). It resolves the operator from the cookie using its own `SessionLocal()` (same pattern the portal WS handler uses), stashes the user on `request.state.operator`, and a `require_role(...)` dependency reads it for the few routes that need more than "logged in."
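
The decision tree in the diagram can be sketched as a pure function, independent of Starlette, which keeps it easy to unit test. This is a minimal sketch and not the shipped middleware: `gate`, `Decision`, `is_exempt`, and the `Operator` stand-in are hypothetical names, and the exempt paths are copied from the Authorization section of this spec.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import quote

# Exempt paths as listed in this spec; exact matches and prefixes kept separate.
EXEMPT_EXACT = {
    "/login", "/logout", "/health",
    "/manifest.json", "/sw.js", "/favicon.ico",
    "/emitters/report", "/api/series3/heartbeat", "/api/series4/heartbeat",
}
EXEMPT_PREFIXES = ("/static/", "/portal/")


@dataclass
class Operator:
    """Hypothetical stand-in for the OperatorUser row the gate resolves."""
    role: str
    must_change_password: bool = False


@dataclass
class Decision:
    action: str                     # "serve", "redirect" or "unauthorized"
    status: int = 200
    location: Optional[str] = None


def is_exempt(path: str) -> bool:
    return path in EXEMPT_EXACT or path.startswith(EXEMPT_PREFIXES)


def gate(path: str, enabled: bool, operator: Optional[Operator]) -> Decision:
    """Return what the gate should do for one request."""
    if not enabled or is_exempt(path):
        return Decision("serve")
    if operator is None:
        if path.startswith("/api/"):
            return Decision("unauthorized", 401)
        return Decision("redirect", 303, "/login?next=" + quote(path, safe="/"))
    # /change-password itself is skipped here so a forced-change user can reach it.
    if operator.must_change_password and path != "/change-password":
        return Decision("redirect", 303, "/change-password")
    return Decision("serve")
```

A real middleware would translate each `Decision` into the matching response and set `request.state.operator` before calling the next handler; checking the exempt list before the `/api/` branch is what lets the heartbeat endpoints through without a cookie.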
## Data model

New table **`operator_users`** (brand-new → `create_all` builds it on startup, **no migration needed**, same as the portal's `clients` table):

| Column | Type | Notes |
|---|---|---|
| `id` | str UUID | caller-supplied `str(uuid.uuid4())` (codebase convention) |
| `email` | str, unique, indexed | login handle, stored lowercased |
| `display_name` | str | "Brian", "Dad" — shown in UI + history |
| `password_hash` | str | argon2id via `auth_passwords.hash_password` |
| `role` | str | `"superadmin"` \| `"admin"` (`"operator"` reserved, deferred) |
| `active` | bool, default True | disable a login without deleting |
| `must_change_password` | bool, default False | set on create/reset → forces a change on next login |
| `sessions_valid_from` | datetime, default `utcnow` | bump to invalidate ALL of a user's sessions |
| `failed_login_count` | int, default 0 | lockout counter |
| `locked_until` | datetime, nullable | set after too many bad tries |
| `created_at` | datetime, default `utcnow` | |
| `last_login_at` | datetime, nullable | |

(Deferred columns, not in v1: `totp_secret`, `totp_enabled`.)

**Role ladder** — a rank map so checks read naturally and `operator` slots in later:

```python
_ROLE_RANK = {"operator": 10, "admin": 20, "superadmin": 30}
```

`require_role("admin")` = admin or above; `require_role("superadmin")` for account mgmt.

## Sessions

**New shared module `backend/auth_cookies.py`** — lift the generic signer out so both auth systems share one implementation:

```python
def sign(payload: dict) -> str                      # f"{b64url(json)}.{hmac_sha256(b64, SECRET_KEY)}"
def read(raw: str, max_age: int) -> dict | None     # verify sig (compare_digest) + iat expiry; None on tamper/expiry
SECRET_KEY = os.getenv("SECRET_KEY", "dev-insecure-change-me")   # same env the portal reads
COOKIE_SECURE = os.getenv("COOKIE_SECURE", "false") in truthy
```

Operator auth uses it now.
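
The `sign`/`read` pair outlined above could look roughly like the following. This is a sketch under assumptions rather than the contents of `backend/auth_cookies.py`: the hex encoding of the MAC, the compact JSON separators, and the extra `key` and `now` parameters (added so the functions can be tested without touching the environment or the clock) are not specified by this document.

```python
import base64
import hashlib
import hmac
import json
import os
import time
from typing import Optional

SECRET_KEY = os.getenv("SECRET_KEY", "dev-insecure-change-me")


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _mac(body: str, key: str) -> str:
    # Assumption: hex digest; the spec only says HMAC-SHA256 over the b64 body.
    return hmac.new(key.encode(), body.encode(), hashlib.sha256).hexdigest()


def sign(payload: dict, key: str = SECRET_KEY) -> str:
    body = _b64url(json.dumps(payload, separators=(",", ":")).encode())
    return f"{body}.{_mac(body, key)}"


def read(raw: str, max_age: int, key: str = SECRET_KEY,
         now: Optional[float] = None) -> Optional[dict]:
    """Return the payload, or None if the cookie is malformed, forged or expired."""
    body, sep, sig = raw.partition(".")
    # Compare as bytes so non-ASCII input cannot raise inside compare_digest.
    if not sep or not hmac.compare_digest(sig.encode(), _mac(body, key).encode()):
        return None
    try:
        padded = body + "=" * (-len(body) % 4)
        payload = json.loads(base64.urlsafe_b64decode(padded))
    except ValueError:
        return None
    if not isinstance(payload, dict):
        return None
    iat = payload.get("iat")
    current = time.time() if now is None else now
    if not isinstance(iat, (int, float)) or current - iat > max_age:
        return None
    return payload
```

Verifying the signature before decoding the body means untrusted bytes are never parsed as JSON unless they were produced by the holder of `SECRET_KEY`.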
(Portal's existing cookie helpers keep working untouched; migrating them onto `auth_cookies` is an optional later dedupe, gated on the portal tests staying green — don't destabilize the shipped portal for it.)

**Operator session cookie:** name **`tv_session`** (distinct from the portal's `portal_session`), payload `{"uid": user.id, "iat": issued_at_epoch}`, `max_age` 30 days (= the "remember this device" — a small trusted set re-logs in rarely), `httponly`, `samesite=lax`, `secure=COOKIE_SECURE`.

**Validation each request** (`current_operator(request, db)`): read+verify cookie → load `OperatorUser` by `uid` → require `active`, `iat >= sessions_valid_from` (epoch), and not `locked_until > now`. Any failure → no session. Bumping `sessions_valid_from` (on password change / "log out everywhere") instantly kills all live cookies with no session table.

## Authorization

**The gate (middleware) exempt list:**

- `/login`, `/logout`, `/health`, `/static/*`, plus PWA assets (`/manifest.json`, `/sw.js`, `/favicon.ico`)
- `/portal/*` — the client portal keeps its own (separate) auth
- **machine endpoints (LAN-only, automated, no human):** `/emitters/report`, `/api/series3/heartbeat`, `/api/series4/heartbeat`

`/change-password` is **not** exempt — it requires a logged-in session (you change *your own* password). It's only *excluded from the `must_change_password` redirect*, so a forced-change user can actually reach it (no redirect loop).

**Permission split — minimal by design.** Because the `operator` role is deferred, every real v1 user is `admin` or `superadmin`, so "logged in" already means "full app." The *only* thing gated above plain-admin is **account management** → `require_role("superadmin")` on the user-management routes. Everything else just requires a valid session (the middleware). One extra guard, not a sprawling matrix.
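
The per-request validation rules and the role-rank check described above can be sketched together without a database. The names `session_is_valid` and `has_role`, and the in-memory `OperatorUser` dataclass, are hypothetical stand-ins; the sketch also assumes timezone-aware UTC datetimes, because calling `.timestamp()` on a naive `utcnow()` value would interpret it as local time.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

_ROLE_RANK = {"operator": 10, "admin": 20, "superadmin": 30}


@dataclass
class OperatorUser:
    """Hypothetical in-memory stand-in for the ORM row."""
    id: str
    role: str
    active: bool
    sessions_valid_from: datetime          # assumed timezone-aware (UTC)
    locked_until: Optional[datetime] = None


def session_is_valid(user: Optional[OperatorUser], payload: dict, now: datetime) -> bool:
    """Apply the checks listed for current_operator to a verified cookie payload."""
    if user is None or not user.active or payload.get("uid") != user.id:
        return False
    iat = payload.get("iat")
    if not isinstance(iat, (int, float)):
        return False
    # Compare at whole-second granularity so a cookie issued in the same second
    # as a sessions_valid_from bump is not rejected by sub-second rounding.
    if int(iat) < int(user.sessions_valid_from.timestamp()):
        return False
    if user.locked_until is not None and user.locked_until > now:
        return False
    return True


def has_role(user: OperatorUser, minimum: str) -> bool:
    """True when the user's rank is at or above the required role."""
    return _ROLE_RANK.get(user.role, 0) >= _ROLE_RANK[minimum]
```

An unknown role ranks as 0 here, so a row with an unexpected `role` value fails every role check instead of passing one by accident.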

**The flag governs everything.** Both the middleware *and* `require_role` respect `OPERATOR_AUTH_ENABLED`: when it's off, neither enforces anything (no session is set, and `require_role` passes through) — the app behaves exactly as it does today. When it's on, the middleware guarantees `request.state.operator` is set before any `require_role` check runs.

## Password management & reset *(the emphasized requirement)*

Three paths, no email infra required:

1. **Superadmin resets anyone** — from the user-management UI, "Reset password" → generates a strong password (`auth_passwords.generate_password`), stores its hash, sets `must_change_password=True`, **shows the temp password once** for you to hand off. Covers "easy for *me* to reset *their* password."
2. **Self-service change** — `/change-password` (any logged-in user): current + new. Used for routine changes **and** the forced post-reset change. On success, bump `sessions_valid_from` (logs out other devices) and clear `must_change_password`.
3. **Forced change** — after a reset/first login, `must_change_password=True` → the gate routes them to `/change-password` until they set their own.

**Forgot it entirely (can't log in):** v1 has **no email reset** — `/login` shows "Forgot your password? Contact your administrator," and you (superadmin) reset it via the UI or CLI. For a 3-person shop that's a text message, not a feature. (Email-based self-service is the deferred follow-up once email infra lands.)
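
The state changes behind the reset and self-service paths above can be sketched as two small functions. This is illustrative only: the hasher and verifier are injected as callables so the sketch does not depend on `auth_passwords`, the name `verify_password` is hypothetical, and `secrets.token_urlsafe` stands in for `auth_passwords.generate_password`.

```python
import secrets
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable

Hasher = Callable[[str], str]
Verifier = Callable[[str, str], bool]


@dataclass
class OperatorUser:
    """Hypothetical stand-in holding only the columns these flows touch."""
    password_hash: str
    must_change_password: bool
    sessions_valid_from: datetime


def reset_password(user: OperatorUser, hash_password: Hasher,
                   generate_password: Callable[[], str] = lambda: secrets.token_urlsafe(16)) -> str:
    """Superadmin reset: store a new hash, force a change, return the temp once."""
    temp = generate_password()
    user.password_hash = hash_password(temp)
    user.must_change_password = True
    return temp  # shown to the superadmin once; never stored or logged


def change_password(user: OperatorUser, current: str, new: str,
                    hash_password: Hasher, verify_password: Verifier,
                    now: datetime) -> bool:
    """Self-service change: verify the current password, then rotate."""
    if not verify_password(current, user.password_hash):
        return False
    user.password_hash = hash_password(new)
    user.must_change_password = False
    user.sessions_valid_from = now  # invalidates cookies issued before this moment
    return True
```

Because `change_password` bumps `sessions_valid_from`, the caller would need to issue a fresh `tv_session` cookie afterwards if the current device is meant to stay logged in.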
## Bootstrapping — seed CLI

`backend/operator_admin.py` (modeled on the existing `portal_admin.py`), run inside the container against the live DB:

```
create-superadmin --email you@x.com --name "Brian"         # prompts for a password (or --generate)
create-user --email dad@x.com --name "Dad" --role admin    # generates a temp password, must_change=True
reset-password --email dad@x.com                           # generates a temp, must_change=True
list                                                       # users + roles + active/locked state
disable --email dad@x.com  /  enable --email dad@x.com
```

The CLI is the bootstrap (first superadmin, before any UI is reachable) **and** the break-glass (locked out / forgot everything).

## Account-management UI (superadmin-only)

`GET /admin/users` (page, `require_role("superadmin")`) + JSON endpoints:

- list operators (name, email, role, active, locked, last login)
- add operator (email, name, role) → temp password shown once
- reset password → temp shown once
- enable / disable, change role

Template `templates/admin/users.html`. Admins (parents) don't see this; superadmin only.

## Login / logout / change-password

- `GET /login` → `templates/login.html` (email + password, optional `?next=`).
- `POST /login` → lowercase email, lockout check, argon2 verify; on success set `tv_session`, stamp `last_login_at`, clear `failed_login_count`, redirect to `next` or `/`; on `must_change_password` → `/change-password`; on fail → increment + generic "invalid email or password" (no user-enumeration), lock after 5 → 15 min.
- `GET /logout` → clear cookie → `/login`.
- `GET/POST /change-password` → `templates/change_password.html`.

## Error handling

- Wrong email/password → generic message, increment fail count.
- ≥5 fails → "too many attempts, try again in 15 minutes" (`locked_until`).
- No/expired/forged cookie → HTML routes 303→`/login?next=…`; `/api/*` → 401 JSON.
- Disabled / role-changed / password-changed-elsewhere → bounced on next request (re-validated against the DB every request).
- Superadmin-only route hit by an admin → 403.

## Rollout — the no-self-lockout sequence

1. Ship with `OPERATOR_AUTH_ENABLED=false` (default) → the middleware short-circuits, app behaves **exactly as today**. Deploying can't break or lock anything.
2. Seed your `superadmin` via `operator_admin.py`.
3. Hit `/login` and confirm you get a session **while the flag is still off** (the login routes work regardless of the flag).
4. Flip `OPERATOR_AUTH_ENABLED=true` → the gate enforces. Your cookie is valid → you're in. Anything wrong → flip it back off (instant escape hatch).
5. Create your parents' accounts from `/admin/users` (temp passwords, they change on first login).

- **Break-glass:** `operator_admin.py reset-password` / `create-superadmin` in the container; or flag off.

## Testing

Reuses the pytest harness from the portal work (`docker exec … python -m pytest`).

- **Middleware:** flag off → every path passes; flag on → exempt paths + the 3 machine endpoints pass with no cookie, a gated HTML path 303s to `/login`, a gated `/api/*` path 401s, `must_change_password` user is routed to `/change-password`.
- **Login:** success sets `tv_session`; wrong password rejected + counts; 5 wrong → locked (even correct password refused).
- **Roles:** `require_role("superadmin")` route → admin gets 403, superadmin 200.
- **Sessions:** bumping `sessions_valid_from` invalidates an existing cookie.
- **Password:** self-change works + clears `must_change_password`; superadmin reset sets a new hash + `must_change_password` + returns the raw once.
- **Machine endpoints:** `/api/series3/heartbeat` etc. still 200 with the gate ON and no cookie (regression guard so we never silently break the watchers).
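
The lockout behaviour that the Login tests above exercise (5 wrong attempts, then a 15-minute lock that refuses even the correct password) can be sketched as a small state function. `attempt_login`, its string return values, and the trimmed-down `OperatorUser` are hypothetical; the real handler would also run the argon2 check and set the cookie.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

MAX_FAILURES = 5
LOCKOUT = timedelta(minutes=15)


@dataclass
class OperatorUser:
    """Hypothetical stand-in holding only the lockout-related columns."""
    failed_login_count: int = 0
    locked_until: Optional[datetime] = None
    last_login_at: Optional[datetime] = None


def attempt_login(user: OperatorUser, password_ok: bool, now: datetime) -> str:
    """Return "ok", "invalid" or "locked" and update the counters in place."""
    # The lock is checked first, so a correct password is refused while locked.
    if user.locked_until is not None and user.locked_until > now:
        return "locked"
    if password_ok:
        user.failed_login_count = 0
        user.locked_until = None
        user.last_login_at = now
        return "ok"
    user.failed_login_count += 1
    if user.failed_login_count >= MAX_FAILURES:
        user.locked_until = now + LOCKOUT
        return "locked"
    return "invalid"
```

In this sketch the counter is only cleared on a successful login, following the text literally, so one more wrong attempt after the lock expires locks the account again; whether to reset the counter when the lock is applied is a choice this document leaves open.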
## File structure

| File | Responsibility |
|---|---|
| `backend/auth_cookies.py` *(new)* | generic `sign`/`read` + `SECRET_KEY`/`COOKIE_SECURE` |
| `backend/models.py` | add `OperatorUser` |
| `backend/operator_auth.py` *(new)* | `current_operator`, `require_role`, the gate middleware, login/lockout helpers |
| `backend/routers/operator_auth_routes.py` *(new)* | `/login`, `/logout`, `/change-password` |
| `backend/routers/operator_users.py` *(new)* | `/admin/users` page + CRUD (superadmin) |
| `backend/operator_admin.py` *(new)* | seed/break-glass CLI |
| `backend/main.py` | register the gate middleware + routers; `OPERATOR_AUTH_ENABLED` |
| `templates/login.html`, `templates/change_password.html`, `templates/admin/users.html` *(new)* | UI |

## Going to prod

- New table auto-creates; **no migration**. Just code + seeding.
- Set a real `SECRET_KEY` (shared with the portal cookie) and `COOKIE_SECURE=true` once on HTTPS — same env knobs already wired in `docker-compose.yml`.
- Operator auth is what makes internet-exposing the internal app safe; pair with the (deferred) office deployment + reverse-proxy/TLS work.

## Security notes

- Deny-by-default; client-supplied ids never trusted; every request re-validates the session against the DB (instant revoke via `active` / `sessions_valid_from`).
- Passwords argon2-hashed; generic login errors (no user-enumeration); lockout on brute force; raw temp passwords shown once, never stored or logged.
- Cookies `HttpOnly` + `SameSite=Lax` + `Secure` (on TLS), HMAC-signed with server-side `iat` expiry.
- **Known residual until deploy:** without TLS the password crosses the wire in cleartext — fix is the deployment-phase TLS (Synology Let's Encrypt / Cloudflare Tunnel). The login is still a massive improvement over today's zero-auth exposure.
- TOTP 2FA is the near-term follow-up (superadmin first), especially without the UniFi edge in front on the home network.