From 59c19291ca07f8fc005065c825fdb4281178b0b4 Mon Sep 17 00:00:00 2001
From: serversdown
Date: Wed, 17 Jun 2026 17:39:49 +0000
Subject: [PATCH] docs: operator-auth design spec (v1 password login + roles + easy reset; 2FA/operator-role deferred)

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 .../specs/2026-06-17-operator-auth-design.md | 266 ++++++++++++++++++
 1 file changed, 266 insertions(+)
 create mode 100644 docs/superpowers/specs/2026-06-17-operator-auth-design.md

diff --git a/docs/superpowers/specs/2026-06-17-operator-auth-design.md b/docs/superpowers/specs/2026-06-17-operator-auth-design.md
new file mode 100644
index 0000000..dc5485f
--- /dev/null
+++ b/docs/superpowers/specs/2026-06-17-operator-auth-design.md
@@ -0,0 +1,266 @@
+# Operator Authentication — Design & Build Plan
+
+**Status:** in development (`feat/operator-auth`) · **Targets:** 0.15.x · **Date:** 2026-06-17
+
+Adds a login + roles to the **internal** Terra-View app — the operator-facing
+surface that today has **zero auth**. This is the prerequisite that makes the app
+safe to expose to the internet (the office-deployment sequencing: operator auth →
+expose). Expands the "Deferred A" section of
+[2026-06-15-portal-auth-design.md](2026-06-15-portal-auth-design.md) into a
+standalone spec.
+
+## Goal
+
+Anyone reaching the internal app must log in. Three known users to start (you +
+two parents), two effective roles, and a **dead-simple password-reset story** for a
+family-run shop. Reuses the building blocks the client portal already shipped: the
+argon2 hasher (`backend/auth_passwords.py`) and the HMAC signed-cookie pattern
+(`backend/portal_auth.py`).
+
+## Scope
+
+**v1 (this spec):** email + password login (argon2) · long-lived "remember this
+device" session · brute-force lockout · a **deny-by-default gate** over the whole
+internal app · `superadmin`/`admin` roles · **superadmin-only user management** ·
+**password reset** (superadmin-resets-anyone + self-service change + forced change)
+· a **seed CLI** to bootstrap · the `OPERATOR_AUTH_ENABLED` **feature-flag rollout**.
+
+**Deferred (designed-not-built):** TOTP 2FA (near-term follow-up, `superadmin`
+account first) · the `operator` restricted role · email-based self-service
+password reset (needs the email infra coming with the report work).
+
+## Principles
+
+1. **Deny by default.** Every route requires a login *except* an explicit allow-list.
+   A route added next year is protected automatically — you can't forget to gate it.
+2. **Can't lock yourself out.** Ship dark behind a feature flag; seed + verify before
+   enforcing; the flag is an instant escape hatch; a CLI is the break-glass.
+3. **Reuse, don't reinvent.** argon2 + the signed-cookie HMAC already exist and are
+   tested. Operator auth is a thin new layer, not a parallel crypto stack.
+4. **Easy recovery.** For a 3-person shop, "forgot my password" must be a 10-second
+   fix — the superadmin resets it, no email round-trip required.
+
+## Architecture
+
+```
+                                OPERATOR_AUTH_ENABLED=false ──▶ pass everything (app as today)
+ request ──▶ gate middleware ─┤
+                              └ enabled ─▶ path exempt? ──yes──▶ serve (no login)
+                                           │   exempt: /login /logout /health
+                                           │           /static/* /portal/* + 3 machine endpoints
+                                           └no─▶ valid operator session?
+                                                 ├ no ─▶ HTML: 303 → /login?next=…
+                                                 │       /api/*: 401 JSON
+                                                 ├ must_change_password ─▶ 303 → /change-password
+                                                 └ yes ─▶ request.state.operator = user
+                                                          ─▶ route runs; require_role() may 403
+```
+
+One **Starlette HTTP middleware** is the gate (not per-route dependencies — a
+middleware can't miss a route).
+It resolves the operator from the cookie using its
+own `SessionLocal()` (same pattern the portal WS handler uses), stashes the user on
+`request.state.operator`, and a `require_role(...)` dependency reads it for the few
+routes that need more than "logged in."
+
+## Data model
+
+New table **`operator_users`** (brand-new → `create_all` builds it on startup, **no
+migration needed**, same as the portal's `clients` table):
+
+| Column | Type | Notes |
+|---|---|---|
+| `id` | str UUID | caller-supplied `str(uuid.uuid4())` (codebase convention) |
+| `email` | str, unique, indexed | login handle, stored lowercased |
+| `display_name` | str | "Brian", "Dad" — shown in UI + history |
+| `password_hash` | str | argon2id via `auth_passwords.hash_password` |
+| `role` | str | `"superadmin"` \| `"admin"` (`"operator"` reserved, deferred) |
+| `active` | bool, default True | disable a login without deleting |
+| `must_change_password` | bool, default False | set on create/reset → forces a change on next login |
+| `sessions_valid_from` | datetime, default `utcnow` | bump to invalidate ALL of a user's sessions |
+| `failed_login_count` | int, default 0 | lockout counter |
+| `locked_until` | datetime, nullable | set after too many bad tries |
+| `created_at` | datetime, default `utcnow` | |
+| `last_login_at` | datetime, nullable | |
+
+(Deferred columns, not in v1: `totp_secret`, `totp_enabled`.)
+
+**Role ladder** — a rank map so checks read naturally and `operator` slots in later:
+```python
+_ROLE_RANK = {"operator": 10, "admin": 20, "superadmin": 30}
+```
+`require_role("admin")` = admin or above; `require_role("superadmin")` for account mgmt.
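
The "admin or above" reading of the rank map can be sketched as a single comparison. This is a minimal illustration, not the spec's implementation: the helper name `role_at_least` is hypothetical (the spec only names `require_role`), and treating an unknown role as rank 0 is an assumption made here to keep the check deny-by-default.

```python
# Rank map copied from the spec above.
_ROLE_RANK = {"operator": 10, "admin": 20, "superadmin": 30}


def role_at_least(user_role: str, minimum: str) -> bool:
    """Return True when user_role ranks at or above minimum.

    Assumption: an unknown user role ranks as 0, so it fails every check.
    An unknown minimum raises KeyError, since that is a programming error
    in the calling route rather than a denied request.
    """
    return _ROLE_RANK.get(user_role, 0) >= _ROLE_RANK[minimum]
```

A `require_role("superadmin")` dependency could call this with `request.state.operator.role` and respond 403 when it returns False.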
+
+## Sessions
+
+**New shared module `backend/auth_cookies.py`** — lift the generic signer out so both
+auth systems share one implementation:
+```python
+def sign(payload: dict) -> str  # f"{b64url(json)}.{hmac_sha256(b64, SECRET_KEY)}"
+def read(raw: str, max_age: int) -> dict | None  # verify sig (compare_digest) + iat expiry; None on tamper/expiry
+SECRET_KEY = os.getenv("SECRET_KEY", "dev-insecure-change-me")  # same env the portal reads
+COOKIE_SECURE = os.getenv("COOKIE_SECURE", "false") in truthy
+```
+Operator auth uses it now. (Portal's existing cookie helpers keep working untouched;
+migrating them onto `auth_cookies` is an optional later dedupe, gated on the portal
+tests staying green — don't destabilize the shipped portal for it.)
+
+**Operator session cookie:** name **`tv_session`** (distinct from the portal's
+`portal_session`), payload `{"uid": , "iat": }`, `max_age` 30 days
+(= the "remember this device" — a small trusted set re-logs in rarely), `httponly`,
+`samesite=lax`, `secure=COOKIE_SECURE`.
+
+**Validation each request** (`current_operator(request, db)`): read+verify cookie →
+load `OperatorUser` by `uid` → require `active`, `iat >= sessions_valid_from`
+(epoch), and not `locked_until > now`. Any failure → no session. Bumping
+`sessions_valid_from` (on password change / "log out everywhere") instantly kills all
+live cookies with no session table.
+
+## Authorization
+
+**The gate (middleware) exempt list:**
+- `/login`, `/logout`, `/health`, `/static/*`, plus PWA assets
+  (`/manifest.json`, `/sw.js`, `/favicon.ico`)
+- `/portal/*` — the client portal keeps its own (separate) auth
+- **machine endpoints (LAN-only, automated, no human):** `/emitters/report`,
+  `/api/series3/heartbeat`, `/api/series4/heartbeat`
+
+`/change-password` is **not** exempt — it requires a logged-in session (you change
+*your own* password).
+It's only *excluded from the `must_change_password` redirect*,
+so a forced-change user can actually reach it (no redirect loop).
+
+**Permission split — minimal by design.** Because the `operator` role is deferred,
+every real v1 user is `admin` or `superadmin`, so "logged in" already means "full
+app." The *only* thing gated above plain-admin is **account management** →
+`require_role("superadmin")` on the user-management routes. Everything else just
+requires a valid session (the middleware). One extra guard, not a sprawling matrix.
+
+**The flag governs everything.** Both the middleware *and* `require_role` respect
+`OPERATOR_AUTH_ENABLED`: when it's off, neither enforces anything (no session is set,
+and `require_role` passes through) — the app behaves exactly as it does today. When
+it's on, the middleware guarantees `request.state.operator` is set before any
+`require_role` check runs.
+
+## Password management & reset *(the emphasized requirement)*
+
+Three paths, no email infra required:
+
+1. **Superadmin resets anyone** — from the user-management UI, "Reset password" →
+   generates a strong password (`auth_passwords.generate_password`), stores its hash,
+   sets `must_change_password=True`, **shows the temp password once** for you to hand
+   off. Covers "easy for *me* to reset *their* password."
+2. **Self-service change** — `/change-password` (any logged-in user): current + new.
+   Used for routine changes **and** the forced post-reset change. On success, bump
+   `sessions_valid_from` (logs out other devices) and clear `must_change_password`.
+3. **Forced change** — after a reset/first login, `must_change_password=True` → the
+   gate routes them to `/change-password` until they set their own.
+
+**Forgot it entirely (can't log in):** v1 has **no email reset** — `/login` shows
+"Forgot your password? Contact your administrator," and you (superadmin) reset it via
+the UI or CLI. For a 3-person shop that's a text message, not a feature.
+(Email-based
+self-service is the deferred follow-up once email infra lands.)
+
+## Bootstrapping — seed CLI
+
+`backend/operator_admin.py` (modeled on the existing `portal_admin.py`), run inside
+the container against the live DB:
+```
+create-superadmin --email you@x.com --name "Brian"        # prompts for a password (or --generate)
+create-user --email dad@x.com --name "Dad" --role admin   # generates a temp password, must_change=True
+reset-password --email dad@x.com                          # generates a temp, must_change=True
+list                                                      # users + roles + active/locked state
+disable --email dad@x.com / enable --email dad@x.com
+```
+The CLI is the bootstrap (first superadmin, before any UI is reachable) **and** the
+break-glass (locked out / forgot everything).
+
+## Account-management UI (superadmin-only)
+
+`GET /admin/users` (page, `require_role("superadmin")`) + JSON endpoints:
+- list operators (name, email, role, active, locked, last login)
+- add operator (email, name, role) → temp password shown once
+- reset password → temp shown once
+- enable / disable, change role
+Template `templates/admin/users.html`. Admins (parents) don't see this; superadmin only.
+
+## Login / logout / change-password
+
+- `GET /login` → `templates/login.html` (email + password, optional `?next=`).
+- `POST /login` → lowercase email, lockout check, argon2 verify; on success set
+  `tv_session`, stamp `last_login_at`, clear `failed_login_count`, redirect to `next`
+  or `/`; on `must_change_password` → `/change-password`; on fail → increment +
+  generic "invalid email or password" (no user-enumeration), lock after 5 → 15 min.
+- `GET /logout` → clear cookie → `/login`.
+- `GET/POST /change-password` → `templates/change_password.html`.
+
+## Error handling
+
+- Wrong email/password → generic message, increment fail count.
+- ≥5 fails → "too many attempts, try again in 15 minutes" (`locked_until`).
+- No/expired/forged cookie → HTML routes 303→`/login?next=…`; `/api/*` → 401 JSON.
+- Disabled / role-changed / password-changed-elsewhere → bounced on next request
+  (re-validated against the DB every request).
+- Superadmin-only route hit by an admin → 403.
+
+## Rollout — the no-self-lockout sequence
+
+1. Ship with `OPERATOR_AUTH_ENABLED=false` (default) → the middleware short-circuits,
+   app behaves **exactly as today**. Deploying can't break or lock anything.
+2. Seed your `superadmin` via `operator_admin.py`.
+3. Hit `/login` and confirm you get a session **while the flag is still off** (the
+   login routes work regardless of the flag).
+4. Flip `OPERATOR_AUTH_ENABLED=true` → the gate enforces. Your cookie is valid → you're
+   in. Anything wrong → flip it back off (instant escape hatch).
+5. Create your parents' accounts from `/admin/users` (temp passwords, they change on
+   first login).
+- **Break-glass:** `operator_admin.py reset-password` / `create-superadmin` in the
+  container; or flag off.
+
+## Testing
+
+Reuses the pytest harness from the portal work (`docker exec … python -m pytest`).
+- **Middleware:** flag off → every path passes; flag on → exempt paths + the 3 machine
+  endpoints pass with no cookie, a gated HTML path 303s to `/login`, a gated `/api/*`
+  path 401s, `must_change_password` user is routed to `/change-password`.
+- **Login:** success sets `tv_session`; wrong password rejected + counts; 5 wrong →
+  locked (even correct password refused).
+- **Roles:** `require_role("superadmin")` route → admin gets 403, superadmin 200.
+- **Sessions:** bumping `sessions_valid_from` invalidates an existing cookie.
+- **Password:** self-change works + clears `must_change_password`; superadmin reset
+  sets a new hash + `must_change_password` + returns the raw once.
+- **Machine endpoints:** `/api/series3/heartbeat` etc. still 200 with the gate ON and
+  no cookie (regression guard so we never silently break the watchers).
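
The middleware cases listed above are easier to test if the gate's branching lives in a pure function that the middleware then maps onto responses. The following is a sketch under stated assumptions, not the spec's implementation: `gate_decision`, its string return values, and the dict stand-in for an `OperatorUser` row are all hypothetical, and matching `/static/*` and `/portal/*` by prefix is one possible reading of the exempt list.

```python
from typing import Optional

# Exempt paths as listed in the spec's Authorization section.
EXEMPT_EXACT = {
    "/login", "/logout", "/health",
    "/manifest.json", "/sw.js", "/favicon.ico",
    "/emitters/report", "/api/series3/heartbeat", "/api/series4/heartbeat",
}
# Assumption: the "/static/*" and "/portal/*" globs are treated as prefixes.
EXEMPT_PREFIXES = ("/static/", "/portal/")


def gate_decision(path: str, enabled: bool, operator: Optional[dict]) -> str:
    """Return 'pass', 'redirect_login', 'unauthorized', or 'redirect_change_password'.

    operator is None when no valid session was resolved; otherwise it is a
    dict with a 'must_change_password' key (a stand-in for the user row).
    """
    if not enabled:
        return "pass"  # flag off: the app behaves as it does today
    if path in EXEMPT_EXACT or path.startswith(EXEMPT_PREFIXES):
        return "pass"  # allow-listed, no login needed
    if operator is None:
        # API callers get a 401; browsers get a 303 to the login page.
        return "unauthorized" if path.startswith("/api/") else "redirect_login"
    if operator.get("must_change_password") and path != "/change-password":
        return "redirect_change_password"
    return "pass"
```

Because the exempt check runs before the `/api/` branch, the heartbeat endpoints pass without a cookie even though they sit under `/api/`, which is the regression the last testing bullet guards against.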
+
+## File structure
+
+| File | Responsibility |
+|---|---|
+| `backend/auth_cookies.py` *(new)* | generic `sign`/`read` + `SECRET_KEY`/`COOKIE_SECURE` |
+| `backend/models.py` | add `OperatorUser` |
+| `backend/operator_auth.py` *(new)* | `current_operator`, `require_role`, the gate middleware, login/lockout helpers |
+| `backend/routers/operator_auth_routes.py` *(new)* | `/login`, `/logout`, `/change-password` |
+| `backend/routers/operator_users.py` *(new)* | `/admin/users` page + CRUD (superadmin) |
+| `backend/operator_admin.py` *(new)* | seed/break-glass CLI |
+| `backend/main.py` | register the gate middleware + routers; `OPERATOR_AUTH_ENABLED` |
+| `templates/login.html`, `templates/change_password.html`, `templates/admin/users.html` *(new)* | UI |
+
+## Going to prod
+
+- New table auto-creates; **no migration**. Just code + seeding.
+- Set a real `SECRET_KEY` (shared with the portal cookie) and `COOKIE_SECURE=true`
+  once on HTTPS — same env knobs already wired in `docker-compose.yml`.
+- Operator auth is what makes internet-exposing the internal app safe; pair with the
+  (deferred) office deployment + reverse-proxy/TLS work.
+
+## Security notes
+
+- Deny-by-default; client-supplied ids never trusted; every request re-validates the
+  session against the DB (instant revoke via `active` / `sessions_valid_from`).
+- Passwords argon2-hashed; generic login errors (no user-enumeration); lockout on
+  brute force; raw temp passwords shown once, never stored or logged.
+- Cookies `HttpOnly` + `SameSite=Lax` + `Secure` (on TLS), HMAC-signed with server-side
+  `iat` expiry.
+- **Known residual until deploy:** without TLS the password crosses the wire in
+  cleartext — fix is the deployment-phase TLS (Synology Let's Encrypt / Cloudflare
+  Tunnel). The login is still a massive improvement over today's zero-auth exposure.
+- TOTP 2FA is the near-term follow-up (superadmin first), especially without the UniFi
+  edge in front on the home network.
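
For reference, the "HMAC-signed with server-side `iat` expiry" cookie format could look roughly like the sketch below. It follows the `sign`/`read` signatures and the `b64url(json).hmac` layout given in the Sessions section, but several details are assumptions rather than the shipped code: the hex encoding of the MAC, the hard-coded placeholder key (the spec reads `SECRET_KEY` from the environment), and the extra `now` parameter, added here only so expiry can be exercised deterministically.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

SECRET_KEY = "dev-insecure-change-me"  # placeholder; the spec reads this from the environment


def _mac(body: str) -> str:
    """Hex HMAC-SHA256 of the encoded payload (hex encoding is an assumption)."""
    return hmac.new(SECRET_KEY.encode(), body.encode(), hashlib.sha256).hexdigest()


def sign(payload: dict) -> str:
    """Serialize payload as '<b64url(json)>.<hmac_sha256>'."""
    body = base64.urlsafe_b64encode(json.dumps(payload).encode()).decode()
    return f"{body}.{_mac(body)}"


def read(raw: str, max_age: int, now: Optional[float] = None) -> Optional[dict]:
    """Return the payload, or None if the cookie is malformed, forged, or expired."""
    body, sep, sig = raw.partition(".")
    # Compare as bytes so non-ASCII input cannot make compare_digest raise.
    if not sep or not hmac.compare_digest(sig.encode(), _mac(body).encode()):
        return None
    try:
        payload = json.loads(base64.urlsafe_b64decode(body))
        age = (time.time() if now is None else now) - payload["iat"]
    except (ValueError, KeyError, TypeError):
        return None
    # Reject cookies older than max_age, and any that claim a future iat.
    return payload if 0 <= age <= max_age else None
```

The signature is verified before the payload is decoded, so tampered input is never parsed. Revocation through `sessions_valid_from` is a separate database check layered on top of this.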