# Operator Authentication: Design & Build Plan

**Status:** in development (`feat/operator-auth`) · **Targets:** 0.15.x · **Date:** 2026-06-17

Adds a login and roles to the **internal** Terra-View app, the operator-facing
surface that today has **zero auth**. This is the prerequisite that makes the app
safe to expose to the internet (the office-deployment sequencing: operator auth →
expose). Expands the "Deferred A" section of
[2026-06-15-portal-auth-design.md](2026-06-15-portal-auth-design.md) into a
standalone spec.

## Goal

Anyone reaching the internal app must log in. Three known users to start (you +
two parents), two effective roles, and a **dead-simple password-reset story** for a
family-run shop. Reuses the building blocks the client portal already shipped: the
argon2 hasher (`backend/auth_passwords.py`) and the HMAC signed-cookie pattern
(`backend/portal_auth.py`).

## Scope

**v1 (this spec):** email + password login (argon2) · long-lived "remember this
device" session · brute-force lockout · a **deny-by-default gate** over the whole
internal app · `superadmin`/`admin` roles · **superadmin-only user management** ·
**password reset** (superadmin-resets-anyone + self-service change + forced change)
· a **seed CLI** to bootstrap · the `OPERATOR_AUTH_ENABLED` **feature-flag rollout**.

**Deferred (designed-not-built):** TOTP 2FA (near-term follow-up, `superadmin`
account first) · the `operator` restricted role · email-based self-service
password reset (needs the email infra coming with the report work).

## Principles

1. **Deny by default.** Every route requires a login *except* an explicit allow-list.
   A route added next year is protected automatically; you can't forget to gate it.
2. **Can't lock yourself out.** Ship dark behind a feature flag; seed + verify before
   enforcing; the flag is an instant escape hatch; a CLI is the break-glass.
3. **Reuse, don't reinvent.** argon2 + the signed-cookie HMAC already exist and are
   tested. Operator auth is a thin new layer, not a parallel crypto stack.
4. **Easy recovery.** For a 3-person shop, "forgot my password" must be a 10-second
   fix: the superadmin resets it, no email round-trip required.

## Architecture

```
                             ┌ OPERATOR_AUTH_ENABLED=false ──▶ pass everything (app as today)
request ──▶ gate middleware ─┤
                             └ enabled ─▶ path exempt? ──yes──▶ serve (no login)
                                          │   exempt: /login /logout /health
                                          │           /static/* /portal/* + 3 machine endpoints
                                          └no─▶ valid operator session?
                                                ├ no ─▶ HTML: 303 → /login?next=…
                                                │       /api/*: 401 JSON
                                                ├ must_change_password ─▶ 303 → /change-password
                                                └ yes ─▶ request.state.operator = user
                                                      ─▶ route runs; require_role() may 403
```

One **Starlette HTTP middleware** is the gate (not per-route dependencies: a
middleware can't miss a route). It resolves the operator from the cookie using its
own `SessionLocal()` (same pattern the portal WS handler uses) and stashes the user on
`request.state.operator`. A `require_role(...)` dependency then reads it for the few
routes that need more than "logged in."

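The branching in the diagram can be sketched as a framework-free decision function, which also makes it easy to unit-test apart from Starlette. This is a minimal sketch, not the project's middleware: `gate_decision`, the `Operator` dataclass, and the `(action, detail)` return shape are all hypothetical names chosen for illustration, and URL-encoding `next` with `quote` is an assumption.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import quote


@dataclass
class Operator:
    """Hypothetical stand-in for the OperatorUser row the gate resolved."""
    email: str
    must_change_password: bool = False


def gate_decision(path: str, *, enabled: bool, exempt: bool,
                  operator: Optional[Operator]) -> tuple[str, Optional[str]]:
    """Return (action, detail) for one request, following the diagram."""
    if not enabled:
        return ("pass", None)  # flag off: app behaves as today
    if exempt:
        return ("pass", None)  # allow-listed path, no login needed
    if operator is None:
        if path.startswith("/api/"):
            return ("401", None)  # API callers get a JSON 401, not a redirect
        return ("303", f"/login?next={quote(path, safe='/')}")
    if operator.must_change_password and path != "/change-password":
        return ("303", "/change-password")  # forced change, no redirect loop
    return ("serve", None)  # caller would set request.state.operator here
```

A real middleware would translate these tuples into Starlette responses; keeping the decision pure means the flag-off, exempt, and forced-change paths can each be asserted directly.
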
## Data model

New table **`operator_users`** (brand-new → `create_all` builds it on startup, **no
migration needed**, same as the portal's `clients` table):

| Column | Type | Notes |
|---|---|---|
| `id` | str UUID | caller-supplied `str(uuid.uuid4())` (codebase convention) |
| `email` | str, unique, indexed | login handle, stored lowercased |
| `display_name` | str | "Brian", "Dad"; shown in UI + history |
| `password_hash` | str | argon2id via `auth_passwords.hash_password` |
| `role` | str | `"superadmin"` \| `"admin"` (`"operator"` reserved, deferred) |
| `active` | bool, default True | disable a login without deleting |
| `must_change_password` | bool, default False | set on create/reset → forces a change on next login |
| `sessions_valid_from` | datetime, default `utcnow` | bump to invalidate ALL of a user's sessions |
| `failed_login_count` | int, default 0 | lockout counter |
| `locked_until` | datetime, nullable | set after too many bad tries |
| `created_at` | datetime, default `utcnow` | |
| `last_login_at` | datetime, nullable | |

(Deferred columns, not in v1: `totp_secret`, `totp_enabled`.)

**Role ladder:** a rank map so checks read naturally and `operator` slots in later:
```python
_ROLE_RANK = {"operator": 10, "admin": 20, "superadmin": 30}
```
`require_role("admin")` = admin or above; `require_role("superadmin")` for account mgmt.

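The comparison behind `require_role` could look like the sketch below. `role_at_least` is a hypothetical helper name, and treating an unrecognized stored role as rank 0 (never sufficient) is an assumption rather than something the spec states.

```python
_ROLE_RANK = {"operator": 10, "admin": 20, "superadmin": 30}


def role_at_least(user_role: str, minimum: str) -> bool:
    """True when user_role ranks at or above minimum."""
    if minimum not in _ROLE_RANK:
        # A typo in a route's required role should fail loudly, not open the route.
        raise ValueError(f"unknown role: {minimum!r}")
    # Unknown stored roles rank 0, so they can never satisfy a check.
    return _ROLE_RANK.get(user_role, 0) >= _ROLE_RANK[minimum]
```

With this shape, adding the deferred `operator` role later needs no change to existing `admin` or `superadmin` checks.
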
## Sessions

**New shared module `backend/auth_cookies.py`.** Lift the generic signer out so both
auth systems can share one implementation:
```python
import os
SECRET_KEY = os.getenv("SECRET_KEY", "dev-insecure-change-me")  # same env the portal reads
# Accepted truthy spellings are an assumption; match whatever the portal already uses.
COOKIE_SECURE = os.getenv("COOKIE_SECURE", "false").lower() in {"1", "true", "yes"}

def sign(payload: dict) -> str: ...  # f"{b64url(json)}.{hmac_sha256(b64, SECRET_KEY)}"
def read(raw: str, max_age: int) -> dict | None: ...  # verify sig (compare_digest) + iat expiry; None on tamper/expiry
```
Operator auth uses it now. (Portal's existing cookie helpers keep working untouched;
migrating them onto `auth_cookies` is an optional later dedupe, gated on the portal
tests staying green. Don't destabilize the shipped portal for it.)

**Operator session cookie:** name **`tv_session`** (distinct from the portal's
`portal_session`), payload `{"uid": <id>, "iat": <epoch>}`, `max_age` 30 days
(this is the "remember this device" behavior: a small trusted set of users rarely
needs to log in again), `httponly`, `samesite=lax`, `secure=COOKIE_SECURE`.

**Validation each request** (`current_operator(request, db)`): read+verify cookie →
load `OperatorUser` by `uid` → require `active`, `iat >= sessions_valid_from`
(compared as epoch seconds), and not `locked_until > now`. Any failure → no session.
Bumping `sessions_valid_from` (on password change / "log out everywhere") instantly
invalidates every cookie issued before the bump, with no session table needed.

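The three per-request checks can be sketched as a pure function over the loaded row. `OperatorRow` and `session_is_valid` are hypothetical names, and two details are assumptions: the sketch uses timezone-aware datetimes (a naive `utcnow` value would be read as local time by `.timestamp()`), and it truncates `sessions_valid_from` to whole seconds so that a cookie issued in the same second as a bump still validates.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class OperatorRow:
    """Hypothetical subset of the operator_users columns used for validation."""
    active: bool
    sessions_valid_from: datetime
    locked_until: Optional[datetime] = None


def session_is_valid(user: OperatorRow, iat: int, now: datetime) -> bool:
    """Apply the active, sessions_valid_from, and lockout checks to one cookie."""
    if not user.active:
        return False  # disabled accounts lose access on the next request
    if iat < int(user.sessions_valid_from.timestamp()):
        return False  # cookie predates the last "log out everywhere" bump
    if user.locked_until is not None and user.locked_until > now:
        return False  # account is currently locked out
    return True
```

The whole-second truncation trades a sub-second window, in which an older cookie from that same second also survives, for not rejecting a freshly reissued one.
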
## Authorization

**The gate (middleware) exempt list:**
- `/login`, `/logout`, `/health`, `/static/*`, plus PWA assets
  (`/manifest.json`, `/sw.js`, `/favicon.ico`)
- `/portal/*`: the client portal keeps its own (separate) auth
- **machine endpoints (LAN-only, automated, no human):** `/emitters/report`,
  `/api/series3/heartbeat`, `/api/series4/heartbeat`

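The allow-list above could be expressed as an exact-match set plus a tuple of prefixes. This is a sketch under assumptions: `is_exempt`, `EXEMPT_EXACT`, and `EXEMPT_PREFIXES` are hypothetical names, and matching the wildcard entries on a trailing slash (so `/portalx` is not treated as `/portal/*`) is an interpretation of the `/*` notation.

```python
# Paths that match exactly: login pages, health check, PWA assets, machine endpoints.
EXEMPT_EXACT = frozenset({
    "/login", "/logout", "/health",
    "/manifest.json", "/sw.js", "/favicon.ico",
    "/emitters/report", "/api/series3/heartbeat", "/api/series4/heartbeat",
})

# Wildcard entries; the trailing slash keeps look-alike paths out.
EXEMPT_PREFIXES = ("/static/", "/portal/")


def is_exempt(path: str) -> bool:
    """True when the gate should serve this path without a session."""
    return path in EXEMPT_EXACT or path.startswith(EXEMPT_PREFIXES)
```

Because the machine endpoints are exact matches, a new `/api/...` route added later stays gated by default.
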
`/change-password` is **not** exempt: it requires a logged-in session (you change
*your own* password). It's only *excluded from the `must_change_password` redirect*,
so a forced-change user can actually reach it (no redirect loop).

**Permission split: minimal by design.** Because the `operator` role is deferred,
every real v1 user is `admin` or `superadmin`, so "logged in" already means "full
app." The *only* thing gated above plain-admin is **account management** →
`require_role("superadmin")` on the user-management routes. Everything else just
requires a valid session (the middleware). One extra guard, not a sprawling matrix.

**The flag governs everything.** Both the middleware *and* `require_role` respect
`OPERATOR_AUTH_ENABLED`: when it's off, neither enforces anything (the middleware
sets no `request.state.operator`, and `require_role` passes through), so the app
behaves exactly as it does today. When it's on, the middleware guarantees
`request.state.operator` is set before any `require_role` check runs.

## Password management & reset *(the emphasized requirement)*

Three paths, no email infra required:

1. **Superadmin resets anyone:** from the user-management UI, "Reset password" →
   generates a strong password (`auth_passwords.generate_password`), stores its hash,
   sets `must_change_password=True`, and **shows the temp password once** for you to
   hand off. Covers "easy for *me* to reset *their* password."
2. **Self-service change:** `/change-password` (any logged-in user): current + new.
   Used for routine changes **and** the forced post-reset change. On success, bump
   `sessions_valid_from` (logs out other devices; the current device needs a freshly
   issued cookie to stay signed in) and clear `must_change_password`.
3. **Forced change:** after a reset/first login, `must_change_password=True` → the
   gate routes them to `/change-password` until they set their own.

**Forgot it entirely (can't log in):** v1 has **no email reset**. `/login` shows
"Forgot your password? Contact your administrator," and you (superadmin) reset it via
the UI or CLI. For a 3-person shop that's a text message, not a feature. (Email-based
self-service is the deferred follow-up once email infra lands.)

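The superadmin reset path (item 1) boils down to three state changes and one return value. The sketch below is illustrative only: `reset_password` and `OperatorRow` are hypothetical names, `secrets.token_urlsafe` stands in for `auth_passwords.generate_password`, and the hasher is passed in as a parameter so the example runs without argon2 installed.

```python
import secrets
from dataclasses import dataclass
from typing import Callable


@dataclass
class OperatorRow:
    """Hypothetical subset of the operator_users columns touched by a reset."""
    password_hash: str
    must_change_password: bool = False


def reset_password(user: OperatorRow, hash_password: Callable[[str], str]) -> str:
    """Set a fresh temp password and return the raw value exactly once."""
    temp = secrets.token_urlsafe(12)  # stand-in for auth_passwords.generate_password
    user.password_hash = hash_password(temp)  # only the hash is stored
    user.must_change_password = True  # gate forces a change on next login
    return temp  # caller shows it once; never log or persist it
```

In use, the route or CLI command would call this with `auth_passwords.hash_password`, commit the row, and display the returned string a single time.
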
## Bootstrapping: seed CLI

`backend/operator_admin.py` (modeled on the existing `portal_admin.py`), run inside
the container against the live DB:
```
create-superadmin --email you@x.com --name "Brian"       # prompts for a password (or --generate)
create-user --email dad@x.com --name "Dad" --role admin  # generates a temp password, must_change_password=True
reset-password --email dad@x.com                         # generates a temp password, must_change_password=True
list                                                     # users + roles + active/locked state
disable --email dad@x.com
enable --email dad@x.com
```
The CLI is the bootstrap (first superadmin, before any UI is reachable) **and** the
break-glass (locked out / forgot everything).

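The subcommand surface above maps naturally onto `argparse` subparsers. This is a sketch of the argument parsing only, with no database work; `build_parser` is a hypothetical name, and restricting `--role` to `admin`/`superadmin` with `admin` as the default is an assumption based on the v1 role set.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the operator_admin.py commands listed above."""
    parser = argparse.ArgumentParser(prog="operator_admin")
    sub = parser.add_subparsers(dest="command", required=True)

    p = sub.add_parser("create-superadmin")
    p.add_argument("--email", required=True)
    p.add_argument("--name", required=True)
    p.add_argument("--generate", action="store_true")  # skip the password prompt

    p = sub.add_parser("create-user")
    p.add_argument("--email", required=True)
    p.add_argument("--name", required=True)
    p.add_argument("--role", choices=["admin", "superadmin"], default="admin")

    # These three commands only need to identify the account.
    for name in ("reset-password", "disable", "enable"):
        sub.add_parser(name).add_argument("--email", required=True)

    sub.add_parser("list")
    return parser
```

Dispatching on `args.command` then keeps each command's handler small and separately testable.
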
## Account-management UI (superadmin-only)

`GET /admin/users` (page, `require_role("superadmin")`) + JSON endpoints:
- list operators (name, email, role, active, locked, last login)
- add operator (email, name, role) → temp password shown once
- reset password → temp password shown once
- enable / disable, change role

Template `templates/admin/users.html`. Admins (parents) don't see this; superadmin only.

## Login / logout / change-password

- `GET /login` → `templates/login.html` (email + password, optional `?next=`).
- `POST /login` → lowercase the email, check lockout, argon2-verify. On success: set
  `tv_session`, stamp `last_login_at`, clear `failed_login_count`, then redirect to
  `next` or `/` (or to `/change-password` when `must_change_password` is set). On
  failure: increment the counter and show a generic "invalid email or password" (no
  user-enumeration); after 5 failures, lock the account for 15 min.
- `GET /logout` → clear cookie → `/login`.
- `GET/POST /change-password` → `templates/change_password.html`.

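The lockout bookkeeping in `POST /login` can be sketched as a small state update on the user row. The names `record_login_attempt`, `is_locked`, and `OperatorRow` are hypothetical, and resetting the counter when the lock is set (so the user gets five fresh tries after the 15 minutes) is an assumption the spec does not settle.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

MAX_FAILED_LOGINS = 5
LOCKOUT = timedelta(minutes=15)


@dataclass
class OperatorRow:
    """Hypothetical subset of the operator_users columns used at login."""
    failed_login_count: int = 0
    locked_until: Optional[datetime] = None
    last_login_at: Optional[datetime] = None


def is_locked(user: OperatorRow, now: datetime) -> bool:
    return user.locked_until is not None and user.locked_until > now


def record_login_attempt(user: OperatorRow, password_ok: bool, now: datetime) -> bool:
    """Update lockout state for one attempt; return True only on a real login."""
    if is_locked(user, now):
        return False  # refused even when the password is correct
    if password_ok:
        user.failed_login_count = 0
        user.locked_until = None
        user.last_login_at = now
        return True
    user.failed_login_count += 1
    if user.failed_login_count >= MAX_FAILED_LOGINS:
        user.locked_until = now + LOCKOUT
        user.failed_login_count = 0  # assumption: fresh tries after the lock expires
    return False
```

The route would return the same generic error for every `False`, so the response never reveals whether the email exists or the account is locked.
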
## Error handling

- Wrong email/password → generic message, increment fail count.
- ≥5 fails → "too many attempts, try again in 15 minutes" (`locked_until`).
- No/expired/forged cookie → HTML routes 303→`/login?next=…`; `/api/*` → 401 JSON.
- Disabled, role-changed, or password-changed-elsewhere → takes effect on the next
  request (the session is re-validated against the DB every request).
- Superadmin-only route hit by an admin → 403.

## Rollout: the no-self-lockout sequence

1. Ship with `OPERATOR_AUTH_ENABLED=false` (default) → the middleware short-circuits,
   app behaves **exactly as today**. Deploying can't break or lock anything.
2. Seed your `superadmin` via `operator_admin.py`.
3. Hit `/login` and confirm you get a `tv_session` cookie **while the flag is still
   off** (the login routes work regardless of the flag).
4. Flip `OPERATOR_AUTH_ENABLED=true` → the gate enforces. Your cookie is valid → you're
   in. Anything wrong → flip it back off (instant escape hatch).
5. Create your parents' accounts from `/admin/users` (temp passwords, they change on
   first login).
- **Break-glass:** `operator_admin.py reset-password` / `create-superadmin` in the
  container; or flag off.

## Testing

Reuses the pytest harness from the portal work (`docker exec … python -m pytest`).
- **Middleware:** flag off → every path passes; flag on → exempt paths + the 3 machine
  endpoints pass with no cookie, a gated HTML path 303s to `/login`, a gated `/api/*`
  path 401s, a `must_change_password` user is routed to `/change-password`.
- **Login:** success sets `tv_session`; wrong password rejected + counted; 5 wrong →
  locked (even the correct password is refused).
- **Roles:** `require_role("superadmin")` route → admin gets 403, superadmin 200.
- **Sessions:** bumping `sessions_valid_from` invalidates an existing cookie.
- **Password:** self-change works + clears `must_change_password`; superadmin reset
  sets a new hash + `must_change_password` + returns the raw password once.
- **Machine endpoints:** `/api/series3/heartbeat` etc. still 200 with the gate ON and
  no cookie (regression guard so we never silently break the watchers).

## File structure

| File | Responsibility |
|---|---|
| `backend/auth_cookies.py` *(new)* | generic `sign`/`read` + `SECRET_KEY`/`COOKIE_SECURE` |
| `backend/models.py` | add `OperatorUser` |
| `backend/operator_auth.py` *(new)* | `current_operator`, `require_role`, the gate middleware, login/lockout helpers |
| `backend/routers/operator_auth_routes.py` *(new)* | `/login`, `/logout`, `/change-password` |
| `backend/routers/operator_users.py` *(new)* | `/admin/users` page + CRUD (superadmin) |
| `backend/operator_admin.py` *(new)* | seed/break-glass CLI |
| `backend/main.py` | register the gate middleware + routers; `OPERATOR_AUTH_ENABLED` |
| `templates/login.html`, `templates/change_password.html`, `templates/admin/users.html` *(new)* | UI |

## Going to prod

- New table auto-creates; **no migration**. Just code + seeding.
- Set a real `SECRET_KEY` (shared with the portal cookie) and `COOKIE_SECURE=true`
  once on HTTPS; these are the same env knobs already wired in `docker-compose.yml`.
- Operator auth is what makes internet-exposing the internal app safe; pair with the
  (deferred) office deployment + reverse-proxy/TLS work.

## Security notes

- Deny-by-default; client-supplied ids never trusted; every request re-validates the
  session against the DB (instant revoke via `active` / `sessions_valid_from`).
- Passwords argon2-hashed; generic login errors (no user-enumeration); lockout on
  brute force; raw temp passwords shown once, never stored or logged.
- Cookies `HttpOnly` + `SameSite=Lax` + `Secure` (on TLS), HMAC-signed with server-side
  `iat` expiry.
- **Known residual until deploy:** without TLS the password crosses the wire in
  cleartext; the fix is the deployment-phase TLS (Synology Let's Encrypt / Cloudflare
  Tunnel). The login is still a massive improvement over today's zero-auth exposure.
- TOTP 2FA is the near-term follow-up (superadmin first), especially without the UniFi
  edge in front on the home network.