serversdown/terra-view

v0.14.0 3f0b53c46c

v0.14.0 - SLMM expansion, client portal, authentication pt.1 Stable

serversdown released this 2026-06-17 16:43:32 -04:00 | 0 commits to main since this release
SLM live monitoring: fan-out feed + cache-first reads. Targets 0.14.0. The throughline: the NL-43 allows exactly one TCP connection at a time, so every page that opened its own device stream (or sent its own Measure?/DOD on load) competed for that single connection. A second viewer saw nothing, and dashboard loads stole polling resolution from the live feed. This release moves Terra-View entirely onto SLMM's shared, cached monitoring: one DOD poll loop per device, fanned out to all viewers; dashboards that read SLMM's cache (a DB read on SLMM's side) instead of touching the device; and live panels that populate instantly from cache on open and upgrade to the live WS only on demand. It is paired with the SLMM-side work (adaptive poll rate, unreachable backoff, device-offline alert) on SLMM branch dev.
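The fan-out idea can be illustrated with a minimal asyncio sketch: one poll loop per device, with each viewer reading from its own bounded queue. This is not SLMM's implementation; `DeviceFanOut`, the `poll` callable, the queue size, and the drop-oldest policy are all assumptions made for illustration.

```python
import asyncio


class DeviceFanOut:
    """One poll loop per device; every viewer gets its own bounded queue."""

    def __init__(self, poll, interval=1.0):
        self._poll = poll          # hypothetical coroutine that reads the device once
        self._interval = interval
        self._subscribers = set()
        self.latest = None         # last frame, handed to new viewers immediately

    def subscribe(self):
        queue = asyncio.Queue(maxsize=10)
        if self.latest is not None:
            queue.put_nowait(self.latest)
        self._subscribers.add(queue)
        return queue

    def unsubscribe(self, queue):
        self._subscribers.discard(queue)

    async def run(self, max_polls=None):
        polls = 0
        while max_polls is None or polls < max_polls:
            frame = await self._poll()      # the only code path that touches the device
            self.latest = frame
            for queue in list(self._subscribers):
                if queue.full():
                    queue.get_nowait()      # assumed policy: drop the oldest frame for slow viewers
                queue.put_nowait(frame)
            polls += 1
            await asyncio.sleep(self._interval)


async def demo():
    calls = 0

    async def fake_poll():
        nonlocal calls
        calls += 1
        return {"seq": calls}

    hub = DeviceFanOut(fake_poll, interval=0)
    a, b = hub.subscribe(), hub.subscribe()
    await hub.run(max_polls=3)
    return calls, a.qsize(), b.qsize()
```

In the demo, two viewers each receive all three frames while the fake device is polled only three times in total, which is the property that removes contention for the single connection.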

Added
- Fan-out /monitor feed consumption. The unit live view (partials/slm_live_view.html) and the dashboard live tile (sound_level_meters.html) now subscribe to SLMM's shared per-device monitor over WS /api/slmm/{unit}/monitor instead of each opening its own device stream. Any number of clients attach without each consuming the NL-43's single connection — the "second viewer sees nothing" contention is gone. A WS proxy handler for /monitor was added to backend/routers/slmm.py.
- L1/L10 percentile lines + cards. Both the per-unit live chart and the dashboard card chart now plot L1 (purple) and L10 (orange) alongside Lp/Leq, and the KPI cards show L1/L10. Sourced from the DOD feed's ln1/ln2 (DRD streaming can't carry percentiles, DOD can). Missing/-.- values leave a gap rather than dropping the line to 0.
- Live-chart backfill on open. Charts seed from SLMM's downsampled DOD trail (GET /api/slmm/{unit}/history?hours=2) so a viewer sees recent trend immediately instead of a blank chart that fills one point per second.
- Live Measurements panel auto-populates from cache. Opening the dashboard panel fills the KPI cards from cached /status and backfills the chart from /history — pure cache reads, no device hit. Shows a measuring badge (● Measuring / ■ Stopped) and a freshness stamp ("as of 3:48 PM (10s ago)", amber + "cached" when stale). Re-polls the cache every 15s while open; Start Live Stream upgrades to the live WS and no longer wipes the backfilled trail (chart point cap raised 60 → 600).
- Refresh buttons — one per device-list row, one in the panel header. On-demand, user-initiated single device read via GET /api/slmm/{unit}/live (which also refreshes SLMM's cache), with a spinner + success/error toast, then reloads the device list.
- Per-unit live-monitoring (keepalive) toggle on /admin/slmm — turns a device's server-side keepalive feed on/off (POST /monitor/start|stop), so alerting can keep a device's feed running with no browser attached.
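The gap behaviour described for the L1/L10 lines can be sketched as a small parsing step: anything that is missing or not a finite number becomes `None` (JSON `null`), so the plotted line breaks instead of dropping to 0. The function names and the frame shape here are hypothetical; only the `ln1`/`ln2` keys and the `-.-` placeholder come from the notes above.

```python
import math


def parse_level(raw):
    """Return a float dB value, or None so the chart leaves a gap."""
    if raw is None:
        return None
    try:
        value = float(str(raw).strip())
    except ValueError:
        return None  # covers the "-.-" placeholder and empty strings
    return value if math.isfinite(value) else None


def chart_points(frames, key):
    """Map a list of feed frames to plottable points for one series."""
    return [parse_level(frame.get(key)) for frame in frames]
```

For example, `chart_points(frames, "ln1")` would produce the L1 series and `chart_points(frames, "ln2")` the L10 series, each with `None` wherever the device reported no value.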
Changed
- Dashboard device list + command center read SLMM's cache, not the device. slm_dashboard.py's get_slm_units pulls each unit's cached status from SLMM's /roster (one call, a SLMM DB read) for the badge + freshness; the command-center get_live_view reads cached /status instead of sending Measure? + a fresh DOD on every load. This stops dashboard loads from stealing the device's single connection from the live monitor. The elapsed-measurement timer still works because measurement_start_time is now included in the cached /status response.
- Device-list freshness reflects real monitoring. The "Last check" line now uses SLMM's cached last_seen (which the monitor advances on every successful poll) via unit.cache_last_seen, instead of the slm_last_check roster field the monitor never updates. The status badge also treats Measure as Measuring, matching the panel and SLMM's cache.
- Status badge relocated to the card's bottom meta row (next to "Last check"), off the top-right corner where it collided with the chart/gear/refresh action icons.
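A rough sketch of how a freshness label might be derived from a cached `last_seen` timestamp follows. The 60-second stale threshold, the label wording, and the function name are assumptions; the notes above do not state when a reading counts as stale.

```python
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(seconds=60)  # assumption: the real threshold is not given


def freshness(last_seen, now):
    """Describe how old a cached reading is and whether to flag it as stale."""
    if last_seen is None:
        return {"label": "No recent check-in", "stale": True}
    age = now - last_seen
    seconds = max(int(age.total_seconds()), 0)
    if seconds < 60:
        ago = f"{seconds}s ago"
    elif seconds < 3600:
        ago = f"{seconds // 60}m ago"
    else:
        ago = f"{seconds // 3600}h ago"
    return {"label": ago, "stale": age > STALE_AFTER}
```

Because the timestamp comes from the cache, computing the label never requires a device read; a template could use the `stale` flag to decide whether to apply the amber "cached" styling.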
Fixed
- Deploy/bench threw can't access property "dispatchEvent", e is null. toggleSLMDeployed() and the save-config path called htmx.trigger('#slm-list', 'load') guarded only by typeof htmx !== 'undefined'; no page has a #slm-list, so htmx resolved null and called null.dispatchEvent(...). The deploy POST had already succeeded, so the operator saw both the green success and a red error. Both call sites now guard on the element existing (slm_settings_modal.html).
- Monitor WS proxy leaked CancelledError / "task exception never retrieved" on stream stop — the cleanup awaited pending tasks but only caught Exception, missing CancelledError (a BaseException).
- "No recent check-in" shown even on an actively-monitored device — the row read the stale slm_last_check roster field instead of SLMM's live cache (see Changed).
- L1/L10 KPI cards populated but the chart drew no L1/L10 lines — the card chart only had Lp + Leq datasets.
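The CancelledError leak in the Fixed list is a common asyncio pitfall: since Python 3.8, `asyncio.CancelledError` derives from `BaseException`, so `except Exception` does not catch it. One way to drain cancelled tasks cleanly is sketched below; this is a generic pattern, not necessarily the fix applied in backend/routers/slmm.py, and `cancel_and_drain` is a hypothetical helper name.

```python
import asyncio


async def cancel_and_drain(tasks):
    """Cancel pending tasks and retrieve their outcomes so nothing is left unretrieved."""
    for task in tasks:
        task.cancel()
    # return_exceptions=True collects each task's CancelledError as a result
    # instead of raising it into the cleanup code.
    return await asyncio.gather(*tasks, return_exceptions=True)


async def demo():
    async def pump():
        await asyncio.sleep(3600)  # stands in for a long-lived WS relay loop

    tasks = [asyncio.create_task(pump()) for _ in range(2)]
    await asyncio.sleep(0)  # let the tasks start running
    results = await cancel_and_drain(tasks)
    return [isinstance(result, asyncio.CancelledError) for result in results]
```

An alternative with the same effect is to await each task inside `try/except asyncio.CancelledError`; either way, the cancellation must be handled explicitly rather than through `except Exception`.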
Upgrade Notes

Requires the matching SLMM build (branch dev) — Terra-View now depends on SLMM's fan-out /monitor feed, /history trail, /status carrying ln1/ln2 + measurement_start_time, cached /roster status, and the monitor_enabled keepalive flag.
```
# SLMM (branch dev) — REBUILD + MIGRATE (or you'll get `no such column: nl43_status.ln1` 500s)
cd /home/serversdown/slmm && docker compose build slmm && docker compose up -d slmm
docker exec terra-view-slmm-1 python3 migrate_add_ln_percentiles.py
docker exec terra-view-slmm-1 python3 migrate_add_monitor_enabled.py

# Terra-View — NO migration; templates are baked into the image, so rebuild (don't just restart)
cd /home/serversdown/terra-view && docker compose build terra-view && docker compose up -d terra-view
```
The two builds must ship together. Note that the Terra-View container defined in docker-compose.yml was renamed for clarity (now terra-view-terra-view-1); adjust any docker exec scripts that referenced the old name.

Client portal (new — read-only client-facing view)

A scoped, read-only portal at /portal/* where a client sees only their
locations, live. Built inside Terra-View (no new service), reusing the cached
SLMM feed; every route resolves the client through one swappable
get_current_client gate, so the interim magic/open-link auth can be replaced
(M4) without touching routes or templates. Strictly read-only — no device control.

Added
- Per-client scoping + interim auth. New Client, ClientAccessToken, and a
  Project.client_id FK. A signed (HMAC) session cookie carries the access-token
  id, re-validated against the DB each request (revoke kills live sessions, with
  server-side expiry). Entry via a magic link (/portal/enter/{token}) or a
  dev-only plain link (/portal/open/{id}, PORTAL_OPEN_LINKS, default off).
- Live location view. KPI cards (Lp/Leq/Lmax/L1/L10) + chart populate
  instantly from cache, then upgrade to a real ~1 Hz WebSocket stream scoped to
  the client's unit (a scrubbed bridge to the SLMM fan-out feed). The stream
  auto-closes when the tab is hidden (Page Visibility) and after a 15-min idle
  cap, so an abandoned tab can't pin the device at 1 Hz / burn cellular.
- Locations overview. Live status map (level-colored dots, dark/light CARTO
  tiles) + a status rollup (live/offline counts, "loudest now"). Leq is the
  headline metric.
- Alerts (config → surface → 24/7). Threshold-rule config on the SLM detail
  page (proxying SLMM's alert CRUD); breach history + ack internally and a
  read-only, scrubbed history + current-alarm banner + "your alert limits" panel
  in the portal; enabling a rule pins that device's monitor on so alerts evaluate
  round-the-clock.
- Operator sharing tools. A "View client portal" preview button and a
  "Copy client link" modal (mint / list / revoke magic links) on the project
  page, plus a backend/portal_admin.py CLI.
- Field-instrument design. Distinctive themed portal — Hanken Grotesk UI +
  IBM Plex Mono readouts, panel system, pulsing live dot, staggered reveal — with a
  light/dark toggle (light default, persisted, no-flash).
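The signed session cookie described in the scoping item can be sketched with the standard library. The cookie layout (`id.expiry.signature`), the function names, and the hard-coded key are assumptions for illustration only; the portal's actual format is not documented here.

```python
import hashlib
import hmac
import time

SECRET_KEY = b"dev-only-secret"  # assumption: real deployments load this from the environment


def sign_session(token_id, expires_at, key=SECRET_KEY):
    """Build a cookie value carrying the access-token id and an expiry timestamp."""
    payload = f"{token_id}.{expires_at}"
    sig = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}.{sig}"


def verify_session(cookie, now=None, key=SECRET_KEY):
    """Return the token id if the signature and expiry check out, else None."""
    try:
        token_id, expires_at, sig = cookie.split(".")
        token_id_int, expires = int(token_id), int(expires_at)
    except ValueError:
        return None
    expected = hmac.new(key, f"{token_id}.{expires_at}".encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison on bytes avoids timing leaks and non-ASCII errors.
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return None
    if (time.time() if now is None else now) >= expires:
        return None
    return token_id_int
```

A valid signature only proves the cookie was issued by the server. As the notes say, the token id is still re-validated against the DB on each request, which is what lets a revoke end live sessions.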
Security
- All scoping enforced server-side (404-not-403, no existence leak); client
  endpoints return scrubbed projections (no device-health/internal ids); WS
  frames whitelisted; operator-set strings HTML-escaped before injection (XSS).
  Pre-merge code review hardened cookie expiry, open-links default, and the slug
  collision. Remaining hardening (reverse proxy, TLS, SECRET_KEY, M4 auth) is
  tracked in docs/CLIENT_PORTAL.md → "Security hardening backlog".
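Two of the measures above, whitelisted WS frames and escaped operator strings, can be sketched in a few lines. The key names in `ALLOWED_FRAME_KEYS` and both function names are hypothetical; the real portal's field list may differ.

```python
import html

# Assumed field names; only these keys are ever forwarded to a client.
ALLOWED_FRAME_KEYS = ("timestamp", "lp", "leq", "lmax", "ln1", "ln2")


def scrub_frame(frame):
    """Copy only whitelisted keys so internal fields never reach the client."""
    return {key: frame[key] for key in ALLOWED_FRAME_KEYS if key in frame}


def safe_label(text):
    """Escape an operator-set string before it is placed into HTML."""
    return html.escape(text, quote=True)
```

Building the outgoing frame from an allow-list, rather than deleting known-sensitive keys, means a field added upstream later stays hidden from clients until someone opts it in.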
Upgrade Notes
- Migration: docker compose exec web-app python3 backend/migrate_add_client_portal.py
  (adds projects.client_id; the clients / client_access_tokens tables
  auto-create).
- Set a real SECRET_KEY in any internet-facing env (signs session cookies),
  and keep PORTAL_OPEN_LINKS=false there.
- Portal alerts depend on the SLMM dev alert engine (rules/events/evaluator +
  cooldown + keepalive coupling) — same build pairing as above.
Portal authentication (Phase 1)
- Each project's client portal is now gated by a secure per-project link + shared password (argon2-hashed). Operators manage it from the project page's Portal access panel (enable, generate password, copy link).
- Per-project session isolation (a session for one project can't read another's data); brute-force lockout (5 tries / 15 min) on the password gate.
- Retired the interim magic-link / PORTAL_OPEN_LINKS open links and the portal_admin.py mint-link command.
- Upgrade: new argon2-cffi dependency → rebuild the image, then run python3 backend/migrate_add_project_portal_auth.py per DB (adds the projects.portal_* columns). SECRET_KEY and COOKIE_SECURE are now passed through in docker-compose.yml (settable via a .env file) — set a real SECRET_KEY (and COOKIE_SECURE=true once on HTTPS) before the portal faces the internet.
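The brute-force lockout could be modelled as a sliding window of failed attempts per project. The sketch below reads "5 tries / 15 min" as five failures within any 15-minute window; that reading, the in-memory storage, and the `LoginLockout` class are assumptions, not the shipped implementation.

```python
import time
from collections import defaultdict, deque

MAX_TRIES = 5
WINDOW_SECONDS = 15 * 60


class LoginLockout:
    """Track failed password attempts per key (for example, a project id)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._failures = defaultdict(deque)

    def _prune(self, key):
        # Forget failures that have aged out of the window.
        cutoff = self._clock() - WINDOW_SECONDS
        attempts = self._failures[key]
        while attempts and attempts[0] <= cutoff:
            attempts.popleft()
        return attempts

    def is_locked(self, key):
        return len(self._prune(key)) >= MAX_TRIES

    def record_failure(self, key):
        self._prune(key).append(self._clock())

    def reset(self, key):
        # Call after a successful login.
        self._failures.pop(key, None)
```

A password gate would call `is_locked` before verifying the argon2 hash, `record_failure` on a mismatch, and `reset` on success. A multi-process deployment would need shared storage instead of this in-memory dictionary.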