serversdown/slmm - slmm - Serversdown Labs

Author	SHA1	Message	Date
serversdown	9d34779171	perf: monitor caches run state, ~halving live-feed latency. Each monitor poll was sending DOD? + Measure? (two commands), and the NL43 enforces >=1s between commands, so updates were ~2.5s apart. The run state changes rarely, so cache it and refresh via Measure? only every MONITOR_STATE_REFRESH_S (default 30s); most polls now send just DOD? (one rate-limited command) -> ~1.3s/update. Also trim MONITOR_POLL_INTERVAL to 0.25s since the device rate-limit is the real pacer. request_dod() gains an optional measurement_state arg: when supplied it reuses that state and skips the Measure? round-trip; None preserves the old query-every-time behavior. ~1Hz is the device floor for DOD (the >=1s command spacing); DRD's 10Hz push isn't reachable via polling, but ~1s is a normal cadence for SLM levels. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 18:52:13 +00:00
serversdown	ba622c67d8	feat: monitor heartbeat + background poller skips active-monitored units. Heartbeat: if nothing has been broadcast in MONITOR_HEARTBEAT_S (default 25s), e.g. because the device is offline and silent, send a non-cached keepalive frame so a reverse proxy (NPM) doesn't drop the idle WS. New subscribers still get the last real frame, not a heartbeat. Poller-skip: the 60s background poller now skips any unit with a running monitor (MonitorManager.is_active). The monitor already polls it ~1Hz and keeps the status cache fresh, so the background poll was redundant and just added load/lock-contention on the device's single connection (and churn, which matters for the cellular wedge). Trade-off: the FTP start-time sync (only in the poller) doesn't run while a unit is actively monitored; that is fine, since reports take the authoritative start time from the FTP .rnd data. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 17:33:29 +00:00
serversdown	6b1ec75396	feat: harden fan-out for live clients (instant first frame + offline status). For multiple clients connecting to a live feed (e.g. the client portal): (1) cache the last broadcast frame and replay it to a new subscriber on connect, so a client sees data immediately instead of waiting a full poll cycle; (2) broadcast a {"feed_status":"unreachable"} frame once on transition (after 3 consecutive poll failures) so clients can render an offline state instead of a frozen chart; data frames now carry "feed_status":"ok". The cached frame reflects current state, so a client connecting while offline gets "unreachable" right away too. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 17:13:21 +00:00
serversdown	aa3e088b64	feat: per-device live monitor (fan-out) + alert evaluator (POC). The piece the live-view + alerting work was building toward. monitor.py: one DOD poll loop per device, broadcast to many subscribers: (1) browser WebSockets (fixes the single-connection "second viewer sees nothing" contention; browsers no longer each open a device stream); (2) the alert evaluator (can keep a feed running with no browser via /monitor/start, so alerting runs continuously); (3) persistence (each snapshot written like the poller). DOD-sourced, so the broadcast carries ln1/ln2 (which DRD cannot). All polls go through the existing per-device lock + pool, so it serializes safely with the background poller and on-demand commands. alerts.py: pluggable POC evaluator that fires (logs) when ALERT_METRIC exceeds ALERT_THRESHOLD_DB, with an ALERT_COOLDOWN_SECONDS cooldown. The rule (instantaneous vs sustained vs L10) is the single swap point; dispatch is a server log for now (email/SMS later). Endpoints: WS /api/nl43/{unit_id}/monitor (subscribe to the shared feed); POST /api/nl43/{unit_id}/monitor/start (keep feed alive w/o a browser); POST /api/nl43/{unit_id}/monitor/stop (drop the keep-alive); GET /api/nl43/_monitor/status (running/subscribers/keepalive). The WS endpoint races queue.get() against a disconnect watcher so an idle feed still detects client drop and doesn't leak a subscription. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-08 23:27:05 +00:00
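
The fan-out behavior described in commits aa3e088b64 and 6b1ec75396 (one producer, many subscribers, last-frame replay, and a single "unreachable" frame after 3 consecutive poll failures) can be sketched roughly as below. This is an illustration only, not the code in monitor.py: the class name `FanOut`, its method names, and the `lp` field in the usage note are hypothetical. Only the `feed_status` values and the 3-failure threshold come from the commit messages.

```python
import asyncio
from typing import Optional


class FanOut:
    """Hypothetical broadcaster: one poll loop feeds many subscriber queues."""

    def __init__(self, fail_limit: int = 3) -> None:
        self._subscribers: set = set()
        self._last_frame: Optional[dict] = None  # replayed to new subscribers
        self._failures = 0
        self._fail_limit = fail_limit

    def subscribe(self) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue()
        # Replay the cached frame so a new client sees data immediately
        # instead of waiting for the next poll cycle.
        if self._last_frame is not None:
            queue.put_nowait(self._last_frame)
        self._subscribers.add(queue)
        return queue

    def unsubscribe(self, queue: asyncio.Queue) -> None:
        self._subscribers.discard(queue)

    def _broadcast(self, frame: dict) -> None:
        self._last_frame = frame
        for queue in self._subscribers:
            queue.put_nowait(frame)

    def on_poll_ok(self, data: dict) -> None:
        self._failures = 0
        self._broadcast({**data, "feed_status": "ok"})

    def on_poll_failed(self) -> None:
        self._failures += 1
        # Broadcast once, on the transition, not on every later failure.
        if self._failures == self._fail_limit:
            self._broadcast({"feed_status": "unreachable"})
```

In this sketch, a subscriber that connects after `on_poll_ok({"lp": 61.2})` finds that frame already waiting in its queue, and a subscriber that connects while the device is offline receives the cached "unreachable" frame first.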

4 Commits
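
The run-state caching from commit 9d34779171 (refresh via Measure? only every MONITOR_STATE_REFRESH_S, otherwise reuse the cached value) amounts to a time-bounded cache. A minimal sketch follows, assuming a hypothetical `RunStateCache` class and an injected `query_state` callable standing in for the Measure? round-trip; neither name is taken from the repository.

```python
import time
from typing import Callable, Optional


class RunStateCache:
    """Hypothetical cache: re-query the run state at most every refresh_s seconds."""

    def __init__(
        self,
        query_state: Callable[[], str],  # stand-in for the Measure? command
        refresh_s: float = 30.0,         # mirrors the MONITOR_STATE_REFRESH_S default
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._query_state = query_state
        self._refresh_s = refresh_s
        self._clock = clock
        self._state: Optional[str] = None
        self._fetched_at = 0.0

    def get(self) -> str:
        now = self._clock()
        stale = now - self._fetched_at >= self._refresh_s
        if self._state is None or stale:
            # Only this branch costs a device command.
            self._state = self._query_state()
            self._fetched_at = now
        return self._state
```

A poll loop built this way would call `get()` on every cycle and pass the result along (per the commit message, as the `measurement_state` argument of `request_dod()`), so most cycles send a single rate-limited command.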