Each monitor poll sent two commands (DOD? + Measure?), and the NL43
enforces >=1s between commands, so updates arrived ~2.5s apart. The run state
changes rarely, so cache it and refresh it via Measure? only every
MONITOR_STATE_REFRESH_S (default 30s). Most polls now send just DOD? (one
rate-limited command), which gives ~1.3s/update. Also trim
MONITOR_POLL_INTERVAL to 0.25s, since the device rate limit is the real pacer.
request_dod() gains an optional measurement_state arg: when supplied it
reuses that state and skips the Measure? round-trip; None preserves the old
query-every-time behavior.
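A minimal sketch of the caching idea, not the actual monitor code. StateCache, poll_once, and the "measurement_state" key on the returned snapshot are hypothetical names chosen for illustration, and request_dod is shown as a plain synchronous callable for brevity (the real one may well be async):

```python
import time
from typing import Callable, Optional

MONITOR_STATE_REFRESH_S = 30.0  # default named in the commit message


class StateCache:
    """Remembers the last measurement state and when it was fetched."""

    def __init__(self, refresh_s: float = MONITOR_STATE_REFRESH_S,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._refresh_s = refresh_s
        self._clock = clock
        self._state: Optional[str] = None
        self._fetched_at: Optional[float] = None

    def fresh_state(self) -> Optional[str]:
        """Return the cached state, or None if it is missing or stale."""
        if self._fetched_at is None:
            return None
        if self._clock() - self._fetched_at >= self._refresh_s:
            return None
        return self._state

    def store(self, state: str) -> None:
        self._state = state
        self._fetched_at = self._clock()


def poll_once(cache: StateCache, request_dod: Callable[..., dict]) -> dict:
    """One poll: reuse the cached state if fresh, else let request_dod query it.

    Assumes (hypothetically) that request_dod returns a dict that includes
    the measurement state it used or fetched.
    """
    cached = cache.fresh_state()
    # None preserves the old behavior: request_dod sends Measure? itself.
    snapshot = request_dod(measurement_state=cached)
    if cached is None:
        cache.store(snapshot["measurement_state"])
    return snapshot
```

With this shape, only the first poll and one poll per refresh window pay for the extra Measure? command; every other poll costs a single rate-limited DOD?.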
~1s per update is the device floor for DOD (the >=1s command spacing), so
polling tops out at ~1Hz. DRD's 10Hz push rate isn't reachable via polling,
but ~1s is a normal cadence for SLM levels.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Heartbeat: if nothing has been broadcast in MONITOR_HEARTBEAT_S (default
25s) — e.g. device offline and silent — send a non-cached keepalive frame
so a reverse proxy (NPM) doesn't drop the idle WS. New subscribers still
get the last real frame, not a heartbeat.
- Poller-skip: the 60s background poller now skips any unit with a running
monitor (MonitorManager.is_active). The monitor already polls it ~1Hz and
keeps the status cache fresh, so the background poll was redundant and just
added load/lock-contention on the device's single connection (and churn,
which matters for the cellular wedge). Trade-off: the FTP start-time sync
(only in the poller) doesn't run while a unit is actively monitored — fine,
since reports take the authoritative start time from the FTP .rnd data.
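The two checks above can be sketched as pure functions. This is an illustration only: needs_heartbeat and units_for_background_poll are hypothetical names, and is_active stands in for MonitorManager.is_active, whose real signature is not shown here:

```python
from typing import Callable, Iterable, List

MONITOR_HEARTBEAT_S = 25.0  # default named in the commit message


def needs_heartbeat(last_broadcast_at: float, now: float,
                    heartbeat_s: float = MONITOR_HEARTBEAT_S) -> bool:
    """True when nothing has been broadcast for at least heartbeat_s seconds."""
    return now - last_broadcast_at >= heartbeat_s


def units_for_background_poll(unit_ids: Iterable[str],
                              is_active: Callable[[str], bool]) -> List[str]:
    """Drop units that already have a running monitor from the poller's list."""
    return [unit_id for unit_id in unit_ids if not is_active(unit_id)]
```

The heartbeat frame itself would be sent without updating the cached last frame, so a new subscriber still receives the last real frame.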
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
For multiple clients connecting to a live feed (e.g. the client portal):
- cache the last broadcast frame and replay it to a new subscriber on
connect, so a client sees data immediately instead of waiting a full
poll cycle.
- broadcast a {"feed_status":"unreachable"} frame once on transition (after
3 consecutive poll failures) so clients can render an offline state
instead of a frozen chart; data frames now carry "feed_status":"ok".
The cached frame reflects current state, so a client connecting while
offline gets "unreachable" right away too.
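A rough sketch of the replay and offline-transition behavior described above, assuming one asyncio.Queue per subscriber. The Feed class, its method names, and the constant UNREACHABLE_AFTER are hypothetical; only the frame fields and the 3-failure threshold come from this message:

```python
import asyncio
from typing import List, Optional

UNREACHABLE_AFTER = 3  # consecutive poll failures before the offline frame


class Feed:
    """Fans frames out to subscribers and remembers the last one sent."""

    def __init__(self) -> None:
        self.subscribers: List[asyncio.Queue] = []
        self.last_frame: Optional[dict] = None
        self._failures = 0

    def subscribe(self) -> asyncio.Queue:
        """Register a subscriber and replay the cached frame immediately."""
        queue: asyncio.Queue = asyncio.Queue()
        if self.last_frame is not None:
            queue.put_nowait(self.last_frame)
        self.subscribers.append(queue)
        return queue

    def _broadcast(self, frame: dict) -> None:
        self.last_frame = frame  # cached frame always reflects current state
        for queue in self.subscribers:
            queue.put_nowait(frame)

    def on_poll_ok(self, data: dict) -> None:
        self._failures = 0
        self._broadcast({**data, "feed_status": "ok"})

    def on_poll_failed(self) -> None:
        self._failures += 1
        # Equality, not >=, so the offline frame goes out once per outage.
        if self._failures == UNREACHABLE_AFTER:
            self._broadcast({"feed_status": "unreachable"})
```

Because the offline frame replaces the cached frame, a late subscriber sees the same state as everyone already connected.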
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The piece the live-view + alerting work was building toward.
monitor.py — one DOD poll loop per device, broadcast to many subscribers:
- browser WebSockets (fixes the single-connection "second viewer sees
nothing" contention — browsers no longer each open a device stream)
- the alert evaluator (can keep a feed running with no browser via
/monitor/start, so alerting runs continuously)
- persistence (each snapshot written like the poller)
The feed is DOD-sourced, so the broadcast carries ln1/ln2 (which DRD cannot
provide). All polls go through the existing per-device lock + pool, so they
serialize safely with the background poller and on-demand commands.
alerts.py — pluggable POC evaluator: fires (logs) when ALERT_METRIC exceeds
ALERT_THRESHOLD_DB with an ALERT_COOLDOWN_SECONDS cooldown. The rule
(instantaneous vs sustained vs L10) is the single swap point; dispatch is a
server log for now (email/SMS later).
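A minimal sketch of a threshold-plus-cooldown evaluator with a swappable rule, to illustrate the shape described above. The class, the rule signature, and the default values for the three settings are assumptions, not the contents of alerts.py:

```python
import logging
import time
from typing import Callable, Optional

log = logging.getLogger("alerts")

# Setting names come from the commit message; the values are placeholders.
ALERT_METRIC = "lp"
ALERT_THRESHOLD_DB = 85.0
ALERT_COOLDOWN_SECONDS = 300.0


def instantaneous_rule(snapshot: dict, metric: str, threshold_db: float) -> bool:
    """Simplest rule: the latest reading alone exceeds the threshold."""
    value = snapshot.get(metric)
    return value is not None and value > threshold_db


class AlertEvaluator:
    """Applies a rule to each snapshot and rate-limits firing with a cooldown."""

    def __init__(self, rule: Callable[[dict, str, float], bool] = instantaneous_rule,
                 metric: str = ALERT_METRIC,
                 threshold_db: float = ALERT_THRESHOLD_DB,
                 cooldown_s: float = ALERT_COOLDOWN_SECONDS,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._rule = rule  # the single swap point (sustained, L10, ...)
        self._metric = metric
        self._threshold_db = threshold_db
        self._cooldown_s = cooldown_s
        self._clock = clock
        self._last_fired: Optional[float] = None

    def evaluate(self, snapshot: dict) -> bool:
        """Return True if an alert fired for this snapshot."""
        if not self._rule(snapshot, self._metric, self._threshold_db):
            return False
        now = self._clock()
        if self._last_fired is not None and now - self._last_fired < self._cooldown_s:
            return False  # still cooling down from the previous alert
        self._last_fired = now
        # Dispatch is only a log line here; email/SMS would hook in at this point.
        log.warning("ALERT: %s=%s exceeds %s dB",
                    self._metric, snapshot.get(self._metric), self._threshold_db)
        return True
```

Keeping the rule as a plain callable means a sustained or L10-based rule can replace instantaneous_rule without touching the cooldown or dispatch logic.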
Endpoints:
- WS   /api/nl43/{unit_id}/monitor        subscribe to the shared feed
- POST /api/nl43/{unit_id}/monitor/start  keep feed alive w/o a browser
- POST /api/nl43/{unit_id}/monitor/stop   drop the keep-alive
- GET  /api/nl43/_monitor/status          running/subscribers/keepalive
The WS endpoint races queue.get() against a disconnect watcher, so a client
drop is still detected on an idle feed and the subscription doesn't leak.
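The race can be sketched with asyncio.wait. This is an illustration under assumptions: the disconnect watcher is modelled as an asyncio.Event that something else sets when the client goes away, and next_frame_or_disconnect is a hypothetical helper name; the real endpoint may watch the socket differently:

```python
import asyncio
from typing import Optional


async def next_frame_or_disconnect(queue: asyncio.Queue,
                                   disconnected: asyncio.Event) -> Optional[dict]:
    """Wait for the next frame, or return None if the client drops first."""
    get_task = asyncio.ensure_future(queue.get())
    drop_task = asyncio.ensure_future(disconnected.wait())
    try:
        done, pending = await asyncio.wait(
            {get_task, drop_task}, return_when=asyncio.FIRST_COMPLETED
        )
    except asyncio.CancelledError:
        # The caller was cancelled: do not leave either task running.
        get_task.cancel()
        drop_task.cancel()
        raise
    # Cancel the loser and let the cancellation settle before returning.
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
    if get_task in done:
        return get_task.result()
    return None
```

A caller would loop on this helper, send each returned frame, and unsubscribe in a finally block as soon as it returns None.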
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>