diff --git a/CHANGELOG.md b/CHANGELOG.md index edba477..908f277 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -5,6 +5,50 @@ All notable changes to Terra-View will be documented in this file. The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). +## [Unreleased] + +SLM live monitoring — fan-out feed + cache-first reads. Targets **0.14.0**. The throughline: the NL-43 allows exactly **one** TCP connection at a time, so every page that opened its own device stream (or sent its own `Measure?`/DOD on load) was competing for that single connection — a second viewer saw nothing, and dashboard loads stole polling resolution from the live feed. This release moves Terra-View entirely onto SLMM's shared, cached monitoring: one DOD poll loop per device, fanned out to all viewers; dashboards read SLMM's cache (a DB read on SLMM's side) instead of touching the device; and the live panels populate instantly from cache on open, upgrading to the live WS only on demand. Paired with the SLMM-side work (adaptive poll rate, unreachable backoff, device-offline alert) on SLMM branch `dev`. + +### Added + +- **Fan-out `/monitor` feed consumption.** The unit live view (`partials/slm_live_view.html`) and the dashboard live tile (`sound_level_meters.html`) now subscribe to SLMM's shared per-device monitor over `WS /api/slmm/{unit}/monitor` instead of each opening its own device stream. Any number of clients attach without each consuming the NL-43's single connection — the "second viewer sees nothing" contention is gone. A WS proxy handler for `/monitor` was added to `backend/routers/slmm.py`. +- **L1/L10 percentile lines + cards.** Both the per-unit live chart and the dashboard card chart now plot L1 (purple) and L10 (orange) alongside Lp/Leq, and the KPI cards show L1/L10. Sourced from the DOD feed's `ln1`/`ln2` (DRD streaming can't carry percentiles, DOD can). 
Missing/`-.-` values leave a gap rather than dropping the line to 0. +- **Live-chart backfill on open.** Charts seed from SLMM's downsampled DOD trail (`GET /api/slmm/{unit}/history?hours=2`) so a viewer sees recent trend immediately instead of a blank chart that fills one point per second. +- **Live Measurements panel auto-populates from cache.** Opening the dashboard panel fills the KPI cards from cached `/status` and backfills the chart from `/history` — pure cache reads, no device hit. Shows a measuring badge (● Measuring / ■ Stopped) and a freshness stamp ("as of 3:48 PM (10s ago)", amber + "cached" when stale). Re-polls the cache every 15s while open; **Start Live Stream** upgrades to the live WS and no longer wipes the backfilled trail (chart point cap raised 60 → 600). +- **Refresh buttons** — one per device-list row, one in the panel header. On-demand, user-initiated single device read via `GET /api/slmm/{unit}/live` (which also refreshes SLMM's cache), with a spinner + success/error toast, then reloads the device list. +- **Per-unit live-monitoring (keepalive) toggle on `/admin/slmm`** — turns a device's server-side keepalive feed on/off (`POST /monitor/start|stop`), so alerting can keep a device's feed running with no browser attached. + +### Changed + +- **Dashboard device list + command center read SLMM's cache, not the device.** `slm_dashboard.py`'s `get_slm_units` pulls each unit's cached status from SLMM's `/roster` (one call, a SLMM DB read) for the badge + freshness; the command-center `get_live_view` reads cached `/status` instead of sending `Measure?` + a fresh DOD on every load. This stops dashboard loads from stealing the device's single connection from the live monitor. The elapsed-measurement timer still works because `measurement_start_time` is now included in the cached `/status` response. 
+- **Device-list freshness reflects real monitoring.** The "Last check" line now uses SLMM's cached `last_seen` (which the monitor advances on every successful poll) via `unit.cache_last_seen`, instead of the `slm_last_check` roster field the monitor never updates. The status badge also treats `Measure` as Measuring, matching the panel and SLMM's cache. +- **Status badge relocated** to the card's bottom meta row (next to "Last check"), off the top-right corner where it collided with the chart/gear/refresh action icons. + +### Fixed + +- **Deploy/bench threw `can't access property "dispatchEvent", e is null`.** `toggleSLMDeployed()` and the save-config path called `htmx.trigger('#slm-list', 'load')` guarded only by `typeof htmx !== 'undefined'`; no page has a `#slm-list`, so htmx resolved null and called `null.dispatchEvent(...)`. The deploy POST had already succeeded, so the operator saw both the green success **and** a red error. Both call sites now guard on the element existing (`slm_settings_modal.html`). +- **Monitor WS proxy leaked `CancelledError` / "task exception never retrieved"** on stream stop — the cleanup awaited pending tasks but only caught `Exception`, missing `CancelledError` (a `BaseException`). +- **"No recent check-in" shown even on an actively-monitored device** — the row read the stale `slm_last_check` roster field instead of SLMM's live cache (see Changed). +- **L1/L10 KPI cards populated but the chart drew no L1/L10 lines** — the card chart only had Lp + Leq datasets. + +### Upgrade Notes + +Requires the **matching SLMM build (branch `dev`)** — Terra-View now depends on SLMM's fan-out `/monitor` feed, `/history` trail, `/status` carrying `ln1`/`ln2` + `measurement_start_time`, cached `/roster` status, and the `monitor_enabled` keepalive flag. 
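As a rough pre-flight check for the dependency above, the sketch below inspects a `/roster` payload for the fields this release reads. It assumes the payload shape that `get_slm_units` and the `/admin/slmm` toggle consume in this diff (a `devices` list whose entries carry `unit_id`, `monitor_enabled`, and a `status` object with `last_seen` and `measurement_state`). `missing_roster_fields` and the sample payloads are hypothetical, not part of Terra-View or SLMM.

```python
# Hypothetical pre-flight helper: not part of Terra-View or SLMM.
# Assumed payload shape (taken from how this diff reads /roster):
#   {"devices": [{"unit_id": ..., "monitor_enabled": ..., "status": {...}}]}

REQUIRED_DEVICE_KEYS = ("unit_id", "monitor_enabled")
REQUIRED_STATUS_KEYS = ("last_seen", "measurement_state")


def missing_roster_fields(payload: dict) -> dict[str, list[str]]:
    """Map each device to the expected keys absent from its roster entry."""
    problems: dict[str, list[str]] = {}
    for index, dev in enumerate(payload.get("devices") or []):
        label = str(dev.get("unit_id") or f"device[{index}]")
        missing = [k for k in REQUIRED_DEVICE_KEYS if k not in dev]
        status = dev.get("status")
        if isinstance(status, dict):
            missing += [f"status.{k}" for k in REQUIRED_STATUS_KEYS if k not in status]
        else:
            # No cached status at all (for example, a device never polled yet).
            missing.append("status")
        if missing:
            problems[label] = missing
    return problems


# Illustrative payloads only; the unit id and values are made up.
complete_payload = {"devices": [{
    "unit_id": "nl43-a",
    "monitor_enabled": True,
    "status": {"last_seen": "2026-06-05T12:00:00", "measurement_state": "Start"},
}]}
partial_payload = {"devices": [{
    "unit_id": "nl43-a",
    "status": {"measurement_state": "Start"},
}]}

print(missing_roster_fields(complete_payload))
print(missing_roster_fields(partial_payload))
```

An empty result only shows that the keys are present in the response. It does not confirm that the migrations ran or that the values are fresh.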
+ +```bash +# SLMM (branch dev) — REBUILD + MIGRATE (or you'll get `no such column: nl43_status.ln1` 500s) +cd /home/serversdown/slmm && docker compose build slmm && docker compose up -d slmm +docker exec terra-view-slmm-1 python3 migrate_add_ln_percentiles.py +docker exec terra-view-slmm-1 python3 migrate_add_monitor_enabled.py + +# Terra-View — NO migration; templates are baked into the image, so rebuild (don't just restart) +cd /home/serversdown/terra-view && docker compose build terra-view && docker compose up -d terra-view +``` + +The two builds must ship **together**. Note the `docker-compose.yml` container was renamed for clarity (now `terra-view-terra-view-1`) — adjust any `docker exec` scripts that referenced the old name. + +--- + ## [0.13.3] - 2026-06-05 Calibration sync from SFM events. Closes the manual data-entry loop on calibration dates — Terra-View now pulls `device.calibration_date` from each seismograph's most recent event sidecar once a day and updates `RosterUnit.last_calibrated` when the device reports something fresher than what's stored. Manual edits still win when they're newer than the latest event; a fresh event arriving later supersedes the manual edit. Adds a "Sync now" button under Settings → Advanced → Calibration Defaults for on-demand runs, and a `docs/ROADMAP.md` to track in-flight + deferred work. diff --git a/backend/routers/slm_dashboard.py b/backend/routers/slm_dashboard.py index 3b93488..d35746c 100644 --- a/backend/routers/slm_dashboard.py +++ b/backend/routers/slm_dashboard.py @@ -91,29 +91,43 @@ async def get_slm_units( one_hour_ago = datetime.utcnow() - timedelta(hours=1) for unit in units: + # Legacy default from the roster field; refined from SLMM's cached status below. 
unit.is_recent = bool(unit.slm_last_check and unit.slm_last_check > one_hour_ago) + unit.measurement_state = None + unit.cache_last_seen = None # SLMM cache last_seen (real monitoring freshness) if include_measurement: - async def fetch_measurement_state(client: httpx.AsyncClient, unit_id: str) -> str | None: - try: - response = await client.get(f"{SLMM_BASE_URL}/api/nl43/{unit_id}/measurement-state") - if response.status_code == 200: - return response.json().get("measurement_state") - except Exception: - return None - return None - - deployed_units = [unit for unit in units if unit.deployed and not unit.retired] - if deployed_units: + # SLMM's /roster carries each unit's CACHED status (last_seen, + # measurement_state) from NL43Status — a DB read on SLMM's side, NOT a device + # call. The live monitor refreshes that cache ~every 1.3s, so this reflects + # real monitoring without sending Measure? to the device (which the old + # /measurement-state did) and competing with DOD polling. One call covers all. + slmm_status = {} + try: async with httpx.AsyncClient(timeout=3.0) as client: - tasks = [fetch_measurement_state(client, unit.id) for unit in deployed_units] - results = await asyncio.gather(*tasks, return_exceptions=True) + r = await client.get(f"{SLMM_BASE_URL}/api/nl43/roster") + if r.status_code == 200: + for dev in (r.json().get("devices") or []): + slmm_status[dev.get("unit_id")] = dev.get("status") or {} + except Exception: + slmm_status = {} - for unit, state in zip(deployed_units, results): - if isinstance(state, Exception): - unit.measurement_state = None - else: - unit.measurement_state = state + # "Recent" = the monitor has a fresh successful read. last_seen only advances + # on a successful poll, so staleness == the device isn't being reached. 
+ recent_cutoff = datetime.utcnow() - timedelta(minutes=5) + for unit in units: + st = slmm_status.get(unit.id) + if not st: + continue + unit.measurement_state = st.get("measurement_state") + last_seen = st.get("last_seen") + if last_seen: + try: + ls = datetime.fromisoformat(last_seen.replace("Z", "")) + unit.is_recent = ls > recent_cutoff + unit.cache_last_seen = ls # the real freshness the monitor updates + except Exception: + pass return templates.TemplateResponse("partials/slm_device_list.html", { "request": request, @@ -157,25 +171,18 @@ async def get_live_view(request: Request, unit_id: str, db: Session = Depends(ge is_measuring = False try: - async with httpx.AsyncClient(timeout=10.0) as client: - # Get measurement state - state_response = await client.get( - f"{SLMM_BASE_URL}/api/nl43/{unit_id}/measurement-state" - ) - if state_response.status_code == 200: - state_data = state_response.json() - measurement_state = state_data.get("measurement_state", "Unknown") - is_measuring = state_data.get("is_measuring", False) - - # Get live status (measurement_start_time is already stored in SLMM database) - status_response = await client.get( - f"{SLMM_BASE_URL}/api/nl43/{unit_id}/live" - ) - if status_response.status_code == 200: - status_data = status_response.json() - current_status = status_data.get("data", {}) + # Read SLMM's CACHED status (NL43Status) — no device call. The live monitor + # keeps it fresh (~1.3s) and the live-stream WS provides ongoing updates, so we + # no longer fire Measure? + a fresh DOD read at the device on every command- + # center load (which competed with DOD polling for the single connection). 
+ async with httpx.AsyncClient(timeout=5.0) as client: + r = await client.get(f"{SLMM_BASE_URL}/api/nl43/{unit_id}/status") + if r.status_code == 200: + current_status = r.json().get("data", {}) + measurement_state = current_status.get("measurement_state") + is_measuring = measurement_state in ("Start", "Measure") except Exception as e: - logger.error(f"Failed to get status for {unit_id}: {e}") + logger.error(f"Failed to get cached status for {unit_id}: {e}") return templates.TemplateResponse("partials/slm_live_view.html", { "request": request, diff --git a/backend/routers/slmm.py b/backend/routers/slmm.py index 1c73f5e..62a0385 100644 --- a/backend/routers/slmm.py +++ b/backend/routers/slmm.py @@ -231,6 +231,76 @@ async def proxy_websocket_live(websocket: WebSocket, unit_id: str): logger.info(f"WebSocket proxy closed for {unit_id} (live)") +@router.websocket("/{unit_id}/monitor") +async def proxy_websocket_monitor(websocket: WebSocket, unit_id: str): + """ + Proxy WebSocket connections to SLMM's /monitor (fan-out DOD feed). + + This is the shared ~1Hz DOD feed: many clients subscribe to one device feed + (no single-connection contention) and it carries L1/L10 (which the DRD + /stream cannot). Preferred over /stream for the live view. 
+ """ + await websocket.accept() + logger.info(f"WebSocket accepted for SLMM unit {unit_id} (monitor)") + + target_ws_url = f"{SLMM_WS_BASE_URL}/api/nl43/{unit_id}/monitor" + backend_ws = None + + try: + backend_ws = await websockets.connect(target_ws_url) + logger.info(f"Connected to SLMM monitor feed for {unit_id}") + + async def forward_to_client(): + """Backend monitor frames -> browser.""" + async for message in backend_ws: + await websocket.send_text(message) + + async def watch_client(): + """Drain client frames; raises WebSocketDisconnect on close so we can + tear the pair down (the monitor feed is server->client only).""" + while True: + await websocket.receive_text() + + # When EITHER side ends (browser disconnects or backend closes), cancel the + # other immediately — avoids sending into a closed socket (the + # "Unexpected ASGI message after close" race that asyncio.gather leaves open). + tasks = [asyncio.ensure_future(forward_to_client()), + asyncio.ensure_future(watch_client())] + done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED) + for t in pending: + t.cancel() + # Await ALL tasks (the done one AND the cancelled one) and swallow both + # the expected WebSocketDisconnect and CancelledError. CancelledError is a + # BaseException, so a bare `except Exception` misses it — that's what leaked + # the traceback on stop; and awaiting only `pending` left the done task's + # exception unretrieved. 
+ for t in tasks: + try: + await t + except (asyncio.CancelledError, Exception): + pass + + except websockets.exceptions.WebSocketException as e: + logger.error(f"WebSocket error connecting to SLMM monitor for {unit_id}: {e}") + try: + await websocket.send_json({"error": "Failed to connect to SLMM monitor", "detail": str(e)}) + except Exception: + pass + except Exception as e: + logger.error(f"Unexpected error in monitor proxy for {unit_id}: {e}") + finally: + if backend_ws: + try: + await backend_ws.close() + except Exception: + pass + try: + await websocket.close() + except Exception: + pass + logger.info(f"WebSocket monitor proxy closed for {unit_id}") + + # HTTP catch-all route MUST come after specific routes (including WebSocket routes) @router.api_route("/{path:path}", methods=["GET", "POST", "PUT", "DELETE", "PATCH"]) async def proxy_to_slmm(path: str, request: Request): diff --git a/templates/admin_slmm.html b/templates/admin_slmm.html index c9056b1..2b88dcd 100644 --- a/templates/admin_slmm.html +++ b/templates/admin_slmm.html @@ -42,6 +42,18 @@ + +
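The teardown used in `proxy_websocket_monitor` can be reproduced in isolation. The sketch below is a minimal stand-in, assuming two plain coroutines in place of the real WebSocket pumps (`forward`, `watch`, and `run_pair` are illustrative names, not code from this change). It waits for the first task to finish, cancels the other, then awaits both so that neither exception is left unretrieved.

```python
import asyncio


async def forward():
    # Stand-in for the backend -> browser pump; ends first with an error.
    await asyncio.sleep(0.01)
    raise ConnectionError("backend closed")


async def watch():
    # Stand-in for the client drain loop; never finishes on its own.
    while True:
        await asyncio.sleep(3600)


async def run_pair():
    tasks = [asyncio.ensure_future(forward()), asyncio.ensure_future(watch())]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for t in pending:
        t.cancel()

    outcomes = []
    # Await every task, including the one that already finished, so its
    # exception is retrieved instead of being reported at garbage collection.
    for t in tasks:
        try:
            await t
            outcomes.append("ok")
        except asyncio.CancelledError:
            # Not a subclass of Exception, so it needs its own clause.
            outcomes.append("cancelled")
        except Exception as exc:
            outcomes.append(type(exc).__name__)
    return outcomes


result = asyncio.run(run_pair())
print(result)
```

`asyncio.CancelledError` has derived from `BaseException` since Python 3.8, which is why a handler that catches only `Exception` lets it escape.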
+ Live Monitoring (keepalive)
+ Keepalive runs the 1 Hz DOD feed 24/7 (even with no viewer), which powers the live-chart
+ trail and continuous threshold alerts. Toggling persists and survives restarts.
+ Loading…
 Raw API Tester
@@ -132,7 +144,60 @@ async function sendRaw() { } } +async function loadMonitors() { + const el = document.getElementById('monitor-list'); + try { + const r = await fetch('/api/slmm/roster'); + if (!r.ok) throw new Error('HTTP ' + r.status); + const d = await r.json(); + const devices = d.devices || []; + if (!devices.length) { + el.innerHTML = '

No devices configured.

'; + return; + } + el.innerHTML = devices.map(dev => { + const on = !!dev.monitor_enabled; + const reach = dev.status ? dev.status.is_reachable : null; + const reachDot = reach === false + ? '' + : ''; + return ` +
+
+ ${reachDot} + ${_esc(dev.unit_id)} + ${_esc(dev.host)}:${_esc(dev.tcp_port)} +
+
+ ${on ? '24/7 ON' : 'OFF'} + +
+
`; + }).join(''); + } catch (e) { + el.innerHTML = `

Failed to load devices: ${_esc(e.message)}

`; + } +} + +async function toggleMonitor(unitId, enable) { + const action = enable ? 'start' : 'stop'; + try { + const r = await fetch(`/api/slmm/${encodeURIComponent(unitId)}/monitor/${action}`, { method: 'POST' }); + if (!r.ok) throw new Error('HTTP ' + r.status); + await loadMonitors(); + } catch (e) { + alert('Toggle failed: ' + e.message); + } +} + loadSlmmOverview(); -setInterval(loadSlmmOverview, 30000); +loadMonitors(); +setInterval(() => { loadSlmmOverview(); loadMonitors(); }, 30000); {% endblock %} diff --git a/templates/partials/slm_device_list.html b/templates/partials/slm_device_list.html index 117decb..56e47a7 100644 --- a/templates/partials/slm_device_list.html +++ b/templates/partials/slm_device_list.html @@ -2,7 +2,14 @@ {% if units %} {% for unit in units %}
-
+
+
- - @@ -432,6 +434,24 @@ function initializeChart() { tension: 0.3, borderWidth: 2, pointRadius: 0 + }, + { + label: 'L1', + data: [], + borderColor: 'rgb(139, 92, 246)', + backgroundColor: 'rgba(139, 92, 246, 0.1)', + tension: 0.3, + borderWidth: 2, + pointRadius: 0 + }, + { + label: 'L10', + data: [], + borderColor: 'rgb(245, 158, 11)', + backgroundColor: 'rgba(245, 158, 11, 0.1)', + tension: 0.3, + borderWidth: 2, + pointRadius: 0 } ] }, @@ -493,7 +513,37 @@ if (typeof window.currentWebSocket === 'undefined') { window.currentWebSocket = null; } -function initLiveDataStream(unitId) { +// Backfill the chart with the recent DOD trail so it opens with context. +async function backfillChart(unitId) { + try { + const r = await fetch(`/api/slmm/${encodeURIComponent(unitId)}/history?hours=2`); + if (!r.ok) return; + const d = await r.json(); + const readings = d.readings || []; + if (!window.chartData) return; + for (const row of readings) { + // Trail timestamps are naive UTC; append 'Z' so they convert to local + // consistently with the live frames (which use local Date.now()). + window.chartData.timestamps.push(row.timestamp ? 
new Date(row.timestamp + 'Z').toLocaleTimeString() : ''); + window.chartData.lp.push(parseFloat(row.lp || 0)); + window.chartData.leq.push(parseFloat(row.leq || 0)); + window.chartData.ln1.push(parseFloat(row.ln1 || 0)); + window.chartData.ln2.push(parseFloat(row.ln2 || 0)); + } + if (window.liveChart) { + window.liveChart.data.labels = window.chartData.timestamps; + window.liveChart.data.datasets[0].data = window.chartData.lp; + window.liveChart.data.datasets[1].data = window.chartData.leq; + window.liveChart.data.datasets[2].data = window.chartData.ln1; + window.liveChart.data.datasets[3].data = window.chartData.ln2; + window.liveChart.update('none'); + } + } catch (e) { + console.warn('Chart backfill failed:', e); + } +} + +async function initLiveDataStream(unitId) { // Close existing connection if any if (window.currentWebSocket) { window.currentWebSocket.close(); @@ -504,17 +554,24 @@ function initLiveDataStream(unitId) { window.chartData.timestamps = []; window.chartData.lp = []; window.chartData.leq = []; + window.chartData.ln1 = []; + window.chartData.ln2 = []; } if (window.liveChart && window.liveChart.data && window.liveChart.data.datasets) { window.liveChart.data.labels = []; - window.liveChart.data.datasets[0].data = []; - window.liveChart.data.datasets[1].data = []; + window.liveChart.data.datasets.forEach(ds => ds.data = []); window.liveChart.update(); } - // WebSocket URL for SLMM backend via proxy + // Seed the chart with recent history BEFORE opening the live socket, so live + // frames append after the backfill (right order) and the chart isn't blank. + await backfillChart(unitId); + + // WebSocket URL for SLMM backend via proxy. + // /monitor = the shared fan-out DOD feed (many viewers, one device connection, + // and it carries L1/L10 which the DRD /stream cannot). const wsProtocol = window.location.protocol === 'https:' ? 
'wss:' : 'ws:'; - const wsUrl = `${wsProtocol}//${window.location.host}/api/slmm/${unitId}/live`; + const wsUrl = `${wsProtocol}//${window.location.host}/api/slmm/${unitId}/monitor`; window.currentWebSocket = new WebSocket(wsUrl); @@ -530,7 +587,11 @@ function initLiveDataStream(unitId) { window.currentWebSocket.onmessage = function(event) { try { const data = JSON.parse(event.data); - console.log('WebSocket data received:', data); + // The DOD monitor sends keepalive 'heartbeat' frames (no metrics) and a + // 'feed_status' on each frame. Reflect status, but don't let a heartbeat + // or an 'unreachable' frame blank the cards / spike the chart with zeros. + updateFeedStatus(data.feed_status); + if (data.heartbeat || data.feed_status === 'unreachable') return; updateLiveMetrics(data); updateLiveChart(data); } catch (error) { @@ -559,6 +620,21 @@ function stopLiveDataStream() { } } +// Reflect device reachability from the monitor feed's feed_status. Safe no-op +// if the badge element isn't on the page. 
+function updateFeedStatus(status) { + const el = document.getElementById('live-feed-status'); + if (!el || status == null) return; + if (status === 'unreachable') { + el.textContent = 'Device offline'; + el.className = 'text-xs font-medium px-2 py-0.5 rounded bg-red-100 text-red-700 dark:bg-red-900/40 dark:text-red-300'; + } else { + el.textContent = 'Live'; + el.className = 'text-xs font-medium px-2 py-0.5 rounded bg-green-100 text-green-700 dark:bg-green-900/40 dark:text-green-300'; + } + el.style.display = ''; +} + // Update metrics display function updateLiveMetrics(data) { if (document.getElementById('live-lp')) { @@ -592,7 +668,9 @@ if (typeof window.chartData === 'undefined') { window.chartData = { timestamps: [], lp: [], - leq: [] + leq: [], + ln1: [], + ln2: [] }; } @@ -602,12 +680,17 @@ function updateLiveChart(data) { window.chartData.timestamps.push(now.toLocaleTimeString()); window.chartData.lp.push(parseFloat(data.lp || 0)); window.chartData.leq.push(parseFloat(data.leq || 0)); + window.chartData.ln1.push(parseFloat(data.ln1 || 0)); + window.chartData.ln2.push(parseFloat(data.ln2 || 0)); - // Keep only last 60 data points - if (window.chartData.timestamps.length > 60) { + // Keep a rolling window large enough to hold the ~2h backfill (one point/min) + // plus a good run of live points before the oldest scroll off. 
+ if (window.chartData.timestamps.length > 600) { window.chartData.timestamps.shift(); window.chartData.lp.shift(); window.chartData.leq.shift(); + window.chartData.ln1.shift(); + window.chartData.ln2.shift(); } // Update chart if available @@ -615,6 +698,8 @@ function updateLiveChart(data) { window.liveChart.data.labels = window.chartData.timestamps; window.liveChart.data.datasets[0].data = window.chartData.lp; window.liveChart.data.datasets[1].data = window.chartData.leq; + window.liveChart.data.datasets[2].data = window.chartData.ln1; + window.liveChart.data.datasets[3].data = window.chartData.ln2; window.liveChart.update('none'); } } diff --git a/templates/partials/slm_settings_modal.html b/templates/partials/slm_settings_modal.html index 02e9ac6..0b89025 100644 --- a/templates/partials/slm_settings_modal.html +++ b/templates/partials/slm_settings_modal.html @@ -528,7 +528,7 @@ async function saveSLMSettings(event) { if (typeof checkFTPStatus === 'function') { checkFTPStatus(unitId); } - if (typeof htmx !== 'undefined') { + if (typeof htmx !== 'undefined' && document.getElementById('slm-list')) { htmx.trigger('#slm-list', 'load'); } }, 1500); @@ -604,8 +604,10 @@ async function toggleSLMDeployed() { successDiv.classList.remove('hidden'); setTimeout(() => successDiv.classList.add('hidden'), 3000); - // Refresh any SLM list on the page - if (typeof htmx !== 'undefined') { + // Refresh any SLM list on the page (only if one is actually present — + // the detail/dashboard pages have no #slm-list, and htmx.trigger on a + // null target throws "can't access property dispatchEvent, e is null"). + if (typeof htmx !== 'undefined' && document.getElementById('slm-list')) { htmx.trigger('#slm-list', 'load'); } } catch (error) { diff --git a/templates/sound_level_meters.html b/templates/sound_level_meters.html index 697f6e7..123e1d8 100644 --- a/templates/sound_level_meters.html +++ b/templates/sound_level_meters.html @@ -51,13 +51,31 @@