Lets consumers (e.g. the command center) read the elapsed-time clock from
the cached status instead of issuing a fresh device /live read. The clock
is added to both the GET and POST /status data dicts.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Lets a viewer see the recent trend when the live view opens instead of a
blank chart. For viewing only: reports still use the device's FTP .rnd data.
- NL43Reading table (auto-creates; no migration): unit_id, timestamp,
lp/leq/lmax/ln1/ln2.
- Monitor stores one downsampled reading per MONITOR_TRAIL_SAMPLE_S
(default 60s) from its keepalive poll loop, pruning rows older than
MONITOR_TRAIL_RETENTION_HOURS (default 24h). ~1440 rows/unit max.
- GET /api/nl43/{unit}/history?hours=N -> the trail for the last N hours
(clamped 0.1-48h), oldest-first.
Because keepalive runs 24/7, the trail fills continuously, so the history
is there whenever someone opens the live view.
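A minimal sketch of the sample/prune/clamp behaviour described above, not
the actual monitor code. The in-memory list stands in for the NL43Reading
table, and Trail, offer, history, and clamp_hours are hypothetical names
used only for illustration.

```python
from datetime import datetime, timedelta

SAMPLE_S = 60          # stands in for MONITOR_TRAIL_SAMPLE_S
RETENTION_HOURS = 24   # stands in for MONITOR_TRAIL_RETENTION_HOURS


def clamp_hours(hours: float) -> float:
    """Clamp the ?hours=N query value to the 0.1-48h window."""
    return max(0.1, min(48.0, hours))


class Trail:
    """Hypothetical in-memory stand-in for one unit's trail rows."""

    def __init__(self):
        self.rows = []      # (timestamp, lp) tuples, oldest first
        self._last = None   # timestamp of the last stored row

    def offer(self, ts: datetime, lp: float) -> bool:
        """Store at most one reading per SAMPLE_S, then prune old rows."""
        if self._last is not None and (ts - self._last).total_seconds() < SAMPLE_S:
            return False    # too soon since the last stored sample
        self.rows.append((ts, lp))
        self._last = ts
        cutoff = ts - timedelta(hours=RETENTION_HOURS)
        self.rows = [r for r in self.rows if r[0] >= cutoff]
        return True

    def history(self, now: datetime, hours: float):
        """Return the rows from the last N (clamped) hours, oldest first."""
        cutoff = now - timedelta(hours=clamp_hours(hours))
        return [r for r in self.rows if r[0] >= cutoff]
```

Calling offer() from every keepalive poll keeps the write rate bounded
regardless of how fast the poll loop runs.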
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Makes live monitoring (and therefore alerting) genuinely 24/7 and
restart-surviving, instead of runtime-only keepalive.
- NL43Config.monitor_enabled (default True) + migrate_add_monitor_enabled.py.
- On startup, auto-start keepalive monitors for every monitor_enabled +
tcp_enabled unit — so feeds/alerts resume after a restart with no manual step.
- /monitor/start and /monitor/stop now PERSIST monitor_enabled (start=True,
stop=False) in addition to applying keepalive at runtime, so the toggle
sticks. Roster output includes monitor_enabled for the admin UI to read.
On by default: configure a unit -> it's monitored 24/7 unless toggled off.
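The startup selection and the persisted toggle can be sketched as below.
This is an illustration under assumptions: UnitConfig is a plain dataclass
standing in for the NL43Config row, and units_to_autostart and set_monitor
are hypothetical helper names.

```python
from dataclasses import dataclass


@dataclass
class UnitConfig:
    """Stand-in for an NL43Config row (only the fields used here)."""
    unit_id: str
    tcp_enabled: bool = True
    monitor_enabled: bool = True   # on by default


def units_to_autostart(configs):
    """Units whose keepalive monitor should be started at boot."""
    return [c.unit_id for c in configs if c.monitor_enabled and c.tcp_enabled]


def set_monitor(config: UnitConfig, running: bool) -> None:
    """Mirror /monitor/start (True) and /monitor/stop (False) into the config.

    In the real service this value would be committed to the database so
    that it survives a restart; here it only mutates the dataclass.
    """
    config.monitor_enabled = running
```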
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replaces the POC single-threshold check with a real per-rule engine over
the live monitor feed.
- AlertRule / AlertEvent tables (auto-created via create_all; no migration).
Rule = {metric, comparison, threshold_db, duration_s, clear_margin_db,
schedule, channels, recipients}.
- alerts.py: per-(unit,rule) state machine IDLE->ACTIVE->IDLE with duration
debounce (both edges) + clear_margin hysteresis; onset/clear are distinct
events; optional nighttime schedule; rule cache w/ invalidation. The
state-machine core (_evaluate_step) is pure (no DB/clock) for testing.
- Dispatch is a server log (POC); _dispatch() is the seam for a Terra-View
webhook (email/SMS) later.
- CRUD: POST/GET/PUT/DELETE /{unit}/alerts/rules, GET /{unit}/alerts/events,
POST /{unit}/alerts/events/{id}/ack.
- test_alert_evaluator.py: synthetic level series proves onset debounce,
spike rejection, hysteresis hold, and below-comparison (4/4 pass, no device).
Source-agnostic: the same rules transfer unchanged if a unit's feed is later
sourced from FTP intervals instead of the DOD monitor.
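A sketch of what a pure evaluation step with two-edge debounce and
clear-margin hysteresis could look like. It is not the code in alerts.py:
Rule, State, evaluate_step, and run are illustrative names, only a subset
of the rule fields is modelled, and the comparison values "above"/"below"
are assumed.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Rule:
    threshold_db: float
    duration_s: float
    clear_margin_db: float
    comparison: str = "above"   # assumed values: "above" or "below"


@dataclass(frozen=True)
class State:
    active: bool = False
    pending_since: Optional[float] = None  # when the flip condition began


def evaluate_step(state: State, rule: Rule, level: float, t: float
                  ) -> Tuple[State, Optional[str]]:
    """Pure step: no DB and no clock; the caller passes the time t."""
    sign = 1.0 if rule.comparison == "above" else -1.0
    excess = sign * (level - rule.threshold_db)
    if state.active:
        # Clearing requires retreating past the threshold by the margin.
        wants_flip = excess < -rule.clear_margin_db
    else:
        wants_flip = excess > 0
    if not wants_flip:
        return State(state.active, None), None   # reset the debounce timer
    since = t if state.pending_since is None else state.pending_since
    if t - since >= rule.duration_s:
        new_active = not state.active
        return State(new_active, None), ("onset" if new_active else "clear")
    return State(state.active, since), None


def run(rule: Rule, levels):
    """Feed one level per second and collect (second, event) pairs."""
    state, events = State(), []
    for t, level in enumerate(levels):
        state, event = evaluate_step(state, rule, level, float(t))
        if event:
            events.append((t, event))
    return events
```

Because the step takes the timestamp as an argument, a synthetic series
such as run(Rule(70.0, 5.0, 3.0), [80] * 8) can exercise the debounce
without a device or a real clock.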
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The piece the live-view + alerting work was building toward.
monitor.py — one DOD poll loop per device, broadcast to many subscribers:
- browser WebSockets (fixes the single-connection "second viewer sees
nothing" contention — browsers no longer each open a device stream)
- the alert evaluator (can keep a feed running with no browser via
/monitor/start, so alerting runs continuously)
- persistence (each snapshot written like the poller)
DOD-sourced, so the broadcast carries ln1/ln2 (which the DRD stream cannot).
All polls go through the existing per-device lock + pool, so they serialize
safely with the background poller and on-demand commands.
alerts.py — pluggable POC evaluator: fires (logs) when ALERT_METRIC exceeds
ALERT_THRESHOLD_DB with an ALERT_COOLDOWN_SECONDS cooldown. The rule
(instantaneous vs sustained vs L10) is the single swap point; dispatch is a
server log for now (email/SMS later).
Endpoints:
- WS /api/nl43/{unit_id}/monitor subscribe to the shared feed
- POST /api/nl43/{unit_id}/monitor/start keep feed alive w/o a browser
- POST /api/nl43/{unit_id}/monitor/stop drop the keep-alive
- GET /api/nl43/_monitor/status running/subscribers/keepalive
WS endpoint races queue.get() against a disconnect watcher so an idle feed
still detects client drop and doesn't leak a subscription.
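The race between queue.get() and the disconnect watcher can be sketched
with plain asyncio, as below. This is a framework-free illustration: the
asyncio.Event stands in for whatever the real watcher awaits on the
WebSocket, and pump and send are hypothetical names.

```python
import asyncio


async def pump(queue: asyncio.Queue, disconnected: asyncio.Event, send) -> None:
    """Forward feed items to send() until the client is reported gone."""
    watcher = asyncio.create_task(disconnected.wait())
    try:
        while True:
            getter = asyncio.create_task(queue.get())
            done, _ = await asyncio.wait(
                {getter, watcher}, return_when=asyncio.FIRST_COMPLETED
            )
            if watcher in done:
                # Client dropped: abandon the pending get and return so the
                # caller can remove this subscription from the feed.
                getter.cancel()
                return
            await send(getter.result())
    finally:
        watcher.cancel()   # no-op if the watcher already finished
```

Without the watcher, an idle feed would leave the handler blocked in
queue.get() indefinitely, and the drop would go unnoticed until the next
snapshot arrived.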
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Lets an instance stop occupying a device's single TCP connection slot so
another instance (e.g. prod) can take over.
Per-unit:
- POST /api/nl43/{unit_id}/deactivate — poll_enabled=False (persisted) +
drop the connection (waits up to 10s for in-flight ops via the device
lock, then discards). Unit stays dormant across restarts.
- POST /api/nl43/{unit_id}/activate — re-enable polling.
Global standby:
- POST /api/nl43/_system/standby — poller idles and releases ALL
connections; the loop keeps re-releasing so the instance holds no slots.
- POST /api/nl43/_system/resume — resume polling.
- GET /api/nl43/_system/status — active vs standby + active_connections.
- SLMM_POLLING_ENABLED=false starts an instance in standby (persistent
way to keep a dev box from latching onto a prod-owned device).
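One way the startup flag could be read, as a sketch. Only the literal
value "false" is taken from the note above; treating it case-insensitively
and defaulting to active when the variable is unset are assumptions, and
starts_in_standby is a hypothetical helper name.

```python
import os


def starts_in_standby(env=os.environ) -> bool:
    """True when SLMM_POLLING_ENABLED is set to "false" (any letter case)."""
    value = env.get("SLMM_POLLING_ENABLED", "true")
    return value.strip().lower() == "false"
```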
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
POST /api/nl43/{unit_id}/disconnect cleanly closes (TCP FIN + wait_closed)
and drops the pooled connection for a single device, freeing the NL43's
one connection slot. Previously only /_connections/flush existed, which
tears down every device at once.
Idempotent; no-op if nothing is cached. Releases the idle pooled
connection only — an active DRD stream/command has the socket checked out
of the pool, so close the stream WebSocket to end a live stream.
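A sketch of the idempotent single-device release, assuming the pool keeps
idle asyncio stream pairs keyed by unit ID. IdlePool, put, and disconnect
are illustrative names, not the project's actual pool API.

```python
import asyncio


class IdlePool:
    """Hypothetical pool of idle (reader, writer) pairs keyed by unit ID."""

    def __init__(self):
        self._idle = {}

    def put(self, unit_id, reader, writer) -> None:
        self._idle[unit_id] = (reader, writer)

    async def disconnect(self, unit_id) -> bool:
        """Close and drop one device's idle connection.

        Returns False (a no-op) when nothing is cached, so repeated calls
        are safe. A socket that is checked out for a stream or command is
        not in the pool and is therefore not touched.
        """
        conn = self._idle.pop(unit_id, None)
        if conn is None:
            return False
        _, writer = conn
        writer.close()                  # starts the TCP shutdown
        try:
            await writer.wait_closed()  # wait for the close to complete
        except (ConnectionError, OSError):
            pass                        # peer already gone; slot is free
        return True
```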
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Completes the SLMM side of the L1/L10 live-display contract. The NL-43's
DOD response carries percentile slots LN1-LN5 (channel 1, parts[5]/[6]);
parse the first two and expose them as ln1/ln2 end to end:
- NL43Snapshot dataclass: ln1/ln2 fields
- NL43Status model: ln1/ln2 columns (+ migrate_add_ln_percentiles.py)
- DOD parser: snap.ln1=parts[5], snap.ln2=parts[6]
- persist_snapshot writes them
- all /status data dicts, StatusPayload, and the DRD stream payload emit
ln1/ln2 (null on the DRD stream itself, which doesn't carry percentiles)
Labels: device LN1 defaults to L5, not L1 — Terra-View defaults the label
to L1/L10, so the device's Ln1/Ln2 slots must be set to 1%/10% for the
labels to be accurate (dynamic label emission is a follow-up).
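The percentile extraction can be sketched as follows. Only the parts[5]
and parts[6] positions come from the note above; the comma-separated
layout, the Snapshot stand-in, and the parse_percentiles helper are
assumptions for illustration, and the other DOD fields are left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Snapshot:
    """Stand-in for NL43Snapshot, limited to the two new fields."""
    ln1: Optional[float] = None
    ln2: Optional[float] = None


def _to_float(text: str) -> Optional[float]:
    """Parse a level, returning None for blanks or placeholder values."""
    try:
        return float(text.strip())
    except ValueError:
        return None


def parse_percentiles(payload: str) -> Snapshot:
    """Read the first two percentile slots from a DOD-style payload."""
    parts = payload.split(",")   # assumed field separator
    snap = Snapshot()
    if len(parts) > 6:
        snap.ln1 = _to_float(parts[5])
        snap.ln2 = _to_float(parts[6])
    return snap
```

Short or unparsable payloads leave both fields as None, which matches the
null values emitted on the DRD stream.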
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- DB cache dump on diagnostics request.
- Individual device logs, DB, and files.
- Device logs API endpoints and diagnostics UI.
Fix:
- SLMM standalone now uses the local time zone (previously UTC only).
- Fixed measurement start time logic.
- Implemented a new `/roster` endpoint to retrieve and manage device configurations.
- Added HTML template for the roster page with a table to display device status and actions.
- Introduced functionality to add, edit, and delete devices via the roster interface.
- Enhanced `ConfigPayload` model to include polling options.
- Updated the main application to serve the new roster page and link to it from the index.
- Added validation for polling interval in the configuration payload.
- Created detailed documentation for the roster management features and API endpoints.
- Introduced a new communication guide detailing protocol basics, transport modes, and a quick startup checklist.
- Added a detailed list of commands with their functions and usage for NL-43/NL-53 devices.
- Created a verified quick reference for command formats to prevent common mistakes.
- Implemented an improvements document outlining critical fixes, security enhancements, reliability upgrades, and code quality improvements for the SLMM project.
- Enhanced the frontend with a new button to retrieve all device settings, along with corresponding JavaScript functionality.
- Added a test script for the new settings retrieval API endpoint to demonstrate its usage and validate functionality.
- Implement migration script to add ftp_username and ftp_password columns to nl43_config table.
- Create set_ftp_credentials.py script for updating FTP credentials in the database.
- Update requirements.txt to include aioftp for FTP functionality.
- Enhance index.html with FTP controls including enable, disable, check status, and list files features.
- Add JavaScript functions for handling FTP operations and displaying file lists.