serversdown/slmm - slmm - Serversdown Labs

Author	SHA1	Message	Date
serversdown	9c43e68534	feat: alert engine stage 1 — rules, events, state machine, CRUD Replaces the POC single-threshold check with a real per-rule engine over the live monitor feed. - AlertRule / AlertEvent tables (auto-created via create_all; no migration). Rule = {metric, comparison, threshold_db, duration_s, clear_margin_db, schedule, channels, recipients}. - alerts.py: per-(unit,rule) state machine IDLE->ACTIVE->IDLE with duration debounce (both edges) + clear_margin hysteresis; onset/clear are distinct events; optional nighttime schedule; rule cache w/ invalidation. The state-machine core (_evaluate_step) is pure (no DB/clock) for testing. - Dispatch is a server log (POC); _dispatch() is the seam for a Terra-View webhook (email/SMS) later. - CRUD: POST/GET/PUT/DELETE /{unit}/alerts/rules, GET /{unit}/alerts/events, POST /{unit}/alerts/events/{id}/ack. - test_alert_evaluator.py: synthetic level series proves onset debounce, spike rejection, hysteresis hold, and below-comparison (4/4 pass, no device). Source-agnostic: the same rules transfer unchanged if a unit's feed is later sourced from FTP intervals instead of the DOD monitor. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 01:04:03 +00:00
serversdown	aa3e088b64	feat: per-device live monitor (fan-out) + alert evaluator (POC) The piece the live-view + alerting work was building toward. monitor.py — one DOD poll loop per device, broadcast to many subscribers: - browser WebSockets (fixes the single-connection "second viewer sees nothing" contention — browsers no longer each open a device stream) - the alert evaluator (can keep a feed running with no browser via /monitor/start, so alerting runs continuously) - persistence (each snapshot written like the poller) DOD-sourced, so the broadcast carries ln1/ln2 (which DRD cannot). All polls go through the existing per-device lock + pool, so it serializes safely with the background poller and on-demand commands. alerts.py — pluggable POC evaluator: fires (logs) when ALERT_METRIC exceeds ALERT_THRESHOLD_DB with an ALERT_COOLDOWN_SECONDS cooldown. The rule (instantaneous vs sustained vs L10) is the single swap point; dispatch is a server log for now (email/SMS later). Endpoints: - WS /api/nl43/{unit_id}/monitor subscribe to the shared feed - POST /api/nl43/{unit_id}/monitor/start keep feed alive w/o a browser - POST /api/nl43/{unit_id}/monitor/stop drop the keep-alive - GET /api/nl43/_monitor/status running/subscribers/keepalive WS endpoint races queue.get() against a disconnect watcher so an idle feed still detects client drop and doesn't leak a subscription. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-08 23:27:05 +00:00
serversdown	8c17af4849	fix: ignore garbled measurement-state reads (phantom STOPPED/STARTED) A buffer desync on the shared persistent connection (commonly right after a DRD/DOD test) can make a Measure? read return a stray value. The state classifier treated anything not in {"Start","Measure"} as "not measuring", so a garbled read logged a phantom STOPPED, the next clean read logged STARTED, and that reset measurement_start_time — producing constant STOPPED/STARTED device-log pairs and a drifting elapsed timer. Now only recognized states drive transitions: {"Start","Measure"} = measuring, {"Stop"} = stopped, anything else = no change. Garbled reads are also not persisted as the cached state, so they can't poison the next transition check. Builds on the earlier Start<->Measure normalization. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-08 22:50:18 +00:00
serversdown	b954eb8c89	feat: per-unit deactivate and global SLMM standby Lets an instance stop occupying a device's single TCP connection slot so another instance (e.g. prod) can take over. Per-unit: - POST /api/nl43/{unit_id}/deactivate — poll_enabled=False (persisted) + drop the connection (waits up to 10s for in-flight ops via the device lock, then discards). Unit stays dormant across restarts. - POST /api/nl43/{unit_id}/activate — re-enable polling. Global standby: - POST /api/nl43/_system/standby — poller idles and releases ALL connections; the loop keeps re-releasing so the instance holds no slots. - POST /api/nl43/_system/resume — resume polling. - GET /api/nl43/_system/status — active vs standby + active_connections. - SLMM_POLLING_ENABLED=false starts an instance in standby (persistent way to keep a dev box from latching onto a prod-owned device). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-08 22:45:52 +00:00
serversdown	0793e7df01	feat: add per-device disconnect endpoint POST /api/nl43/{unit_id}/disconnect cleanly closes (TCP FIN + wait_closed) and drops the pooled connection for a single device, freeing the NL43's one connection slot. Previously only /_connections/flush existed, which tears down every device at once. Idempotent; no-op if nothing is cached. Releases the idle pooled connection only — an active DRD stream/command has the socket checked out of the pool, so close the stream WebSocket to end a live stream. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-08 22:40:56 +00:00
serversdown	51dd6b682d	feat: surface LN1/LN2 (L1/L10) percentiles through SLMM Completes the SLMM side of the L1/L10 live-display contract. The NL-43's DOD response carries percentile slots LN1-LN5 (channel 1, parts[5]/[6]); parse the first two and expose them as ln1/ln2 end to end: - NL43Snapshot dataclass: ln1/ln2 fields - NL43Status model: ln1/ln2 columns (+ migrate_add_ln_percentiles.py) - DOD parser: snap.ln1=parts[5], snap.ln2=parts[6] - persist_snapshot writes them - all /status data dicts, StatusPayload, and the DRD stream payload emit ln1/ln2 (null on the DRD stream itself, which doesn't carry percentiles) Labels: device LN1 defaults to L5, not L1 — Terra-View defaults the label to L1/L10, so the device's Ln1/Ln2 slots must be set to 1%/10% for the labels to be accurate (dynamic label emission is a follow-up). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-08 22:01:31 +00:00
serversdown	a7983d2958	fix: correct DOD field parsing and stop measurement-time resets Two device-data bugs surfaced while scoping the live-feed work: 1. DOD parser misalignment. DOD's response has no leading counter and includes LE + LN1-LN5, but the parser reused the DRD field map (parts[0]=counter). That shifted everything: Lp was stored as the counter, Leq as Lp, LE as Leq, and LN1 as Lpeak (visible because "Lpeak" came out below Lmax, which is impossible). Parse DOD with its own map: Lp=0, Leq=1, Lmax=3, Lmin=4, Lpeak=10 (channel 1 = main). 2. measurement_start_time reset on every live-stream open/close. The DOD path tags state "Start"; the DRD stream path tags "Measure". The transition detector treated only "Start" as measuring, so opening the stream ("Start"->"Measure") read as a stop (cleared start time) and closing it ("Measure"->"Start") read as a start (reset to now). Every viewer reset the elapsed measurement time. Treat {"Start","Measure"} both as measuring. LN1/LN2 (L1/L10) parsing + model/serialization is the next step. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-08 21:53:00 +00:00
serversdown	d6dd2e736b	Merge pull request 'fix: improve connection pool idle and max age checks to allow disabling' (#3) from dev-persistent into main Reviewed-on: #3	2026-06-08 16:56:33 -04:00
serversdown	af86cf713e	fix: reuse pooled TCP connection for DRD streaming stream_drd() discarded the pooled connection and forced a fresh connect. The NL43 allows only one TCP connection at a time; over a cellular link the device does not free its single slot fast enough for an immediate reconnect, so the fresh connect times out — the live DRD stream fails while start/stop commands (which reuse the warm pooled socket) keep working. This surfaced once the persistent connection pool was enabled (TCP_PERSISTENT_ENABLED=true). Stream over the already-open pooled connection via acquire() instead of discard()+_open_connection(), and release() it back to the pool on exit (after sending SUB to stop the stream) so commands keep reusing the same single socket. The per-device lock is held for the whole streaming session, so the poller can't touch the socket concurrently. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-08 19:00:35 +00:00
serversdown	e3f9ca7f5b	fix: use request-first TemplateResponse signature Modern Starlette requires `request` as the first positional arg to TemplateResponse. The old `TemplateResponse(name, context)` form caused the context dict to be passed as the template name, which Jinja2 then tried to use as a cache key -> TypeError: unhashable type: 'dict' (500 on GET / and /roster). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-08 17:59:39 +00:00
serversdown	450509d210	stop tracking dev runtime data	2026-03-12 22:46:37 +00:00
serversdown	fefa9eace8	chore: gitignore clean up	2026-03-12 21:34:14 +00:00
serversdown	98a8d357e5	chore: data-dev folder added to gitignore	2026-03-12 21:33:43 +00:00
claude	0a7422eceb	Merge branch 'dev-persistent' of ssh://10.0.0.2:2222/serversdown/slmm into dev-persistent	2026-03-12 20:26:56 +00:00
claude	996b993cb9	chore: gitignore dev data	2026-03-12 20:26:53 +00:00
claude	01337696b3	feat: add connection pool status logging every 15 minutes	2026-02-19 15:09:50 +00:00
claude	a302fd15d4	fix: change debug logs to info level for connection pool events	2026-02-19 06:04:34 +00:00
claude	af5ecc1a92	fix: improve connection pool idle and max age checks to allow disabling	2026-02-19 01:25:01 +00:00
serversdown	ad1a40e0aa	Merge pull request 'v0.3.0, persistent polling update. Persistent TCP connection pool with all features Connection pool diagnostics (API + UI) All 6 new environment variables Changes to health check, diagnostics, and DRD streaming Technical architecture details and cellular' (#2) from dev-persistent into main Reviewed-on: #2 v0.3.0	2026-02-16 21:57:37 -05:00
claude	b62e84f8b3	v0.3.0, persistent polling update.	2026-02-17 02:56:11 +00:00
claude	a5f8d1b2c7	Persistent polling interval increased. Healthcheck now uses poll instead of separate handshakes.	2026-02-17 02:41:09 +00:00
claude	a1a80bbb4d	add: new persistent connection approach, env variables for TCP keepalive and persist, added connection pool class.	2026-02-16 04:25:51 +00:00
claude	005e0091fe	fix: delay added to ensure TCP commands don't talk over each other	2026-02-16 02:42:41 +00:00
claude	e6ac80df6c	chore: add pcap files to gitignore	2026-02-10 21:12:19 +00:00
claude	7070b948a8	add: stress test script for diagnosing TCP connection issues. chore: clean up .gitignore	2026-02-10 07:07:34 +00:00
claude	3b6e9ad3f0	fix: time added to FTP enable step to prevent commands getting messed up	2026-02-06 17:37:10 +00:00
claude	eb0cbcc077	fix: 24hr restart schedule enhanced. Step 0: Pause polling Step 1: Stop measurement → wait 10s Step 2: Disable FTP → wait 10s Step 3: Enable FTP → wait 10s Step 4: Download data Step 5: Wait 30s for device to settle Step 6: Start new measurement Step 7: Re-enable polling	2026-01-31 05:15:00 +00:00
claude	cc0a5bdf84	chore cleanup	2026-01-29 22:44:20 +00:00
claude	bf5f222511	Add: - db cache dump on diagnostics request. - individual device logs, db and files. - Device logs API endpoints and diagnostics UI. Fix: - slmm standalone now uses local TZ (was UTC only before) - fixed measurement start time logic.	2026-01-29 18:50:47 +00:00
claude	eb39a9d1d0	add: device communication lock. To send a TCP command, slmm must now acquire a connection lock to prevent flooding the unit. fixed: background poller intervals increased.	2026-01-29 07:54:49 +00:00
claude	67d63b4173	Merge branch 'main' of ssh://10.0.0.2:2222/serversdown/slmm	2026-01-23 08:29:27 +00:00
claude	25cf9528d0	docs: update to 0.2.1	2026-01-23 08:26:23 +00:00
serversdown	738ad7878e	doc update	2026-01-22 15:30:06 -05:00
claude	152377d608	feat: terra-view scheduler implementation added. start_cycle and stop_cycle functions added.	2026-01-22 20:25:47 +00:00
claude	4868381053	Enhance FTP logging with detailed phases for connection, authentication, and data transfer	2026-01-21 08:05:38 +00:00
claude	b4bbfd2b01	chore: fixed api.md to confirm FTP/TCP interactions are working.	2026-01-17 08:13:19 +00:00
claude	82651f71b5	Add roster management interface and related API endpoints - Implemented a new `/roster` endpoint to retrieve and manage device configurations. - Added HTML template for the roster page with a table to display device status and actions. - Introduced functionality to add, edit, and delete devices via the roster interface. - Enhanced `ConfigPayload` model to include polling options. - Updated the main application to serve the new roster page and link to it from the index. - Added validation for polling interval in the configuration payload. - Created detailed documentation for the roster management features and API endpoints.	2026-01-17 08:00:05 +00:00
claude	182920809d	chore: docs and scripts organized. clutter cleared.	2026-01-16 19:06:38 +00:00
claude	2a3589ca5c	Add endpoint to delete device configuration and associated status data	2026-01-16 07:39:26 +00:00
claude	d43ef7427f	v0.2.0: async status polling added.	2026-01-16 06:24:13 +00:00
claude	d2b47156d8	Simple diagnostics heartbeat test program added, for debugging.	2026-01-15 20:52:08 +00:00
claude	5b31c2e567	Add endpoint to sync measurement start time from FTP folder timestamp	2026-01-14 21:58:45 +00:00
claude	b74360b6bb	Implement automatic sleep mode disable for NL43/NL53 during config updates and measurements	2026-01-14 19:58:22 +00:00
claude	3d445daf1f	fixed FTP port support to NL43 configuration and client	2026-01-14 01:44:53 +00:00
claude	2cb96a7a1c	Add configurable timezone support with environment variables	2026-01-12 16:31:33 +00:00
claude	6b363b0788	Added: Ability to change store name and overwrite protection	2026-01-08 19:16:59 +00:00
claude	1fb786c262	Fix NL43 DRD field mapping to match official specification Corrected the parsing of NL43 DRD (Dynamic Range Data) and DOD (Data On Demand) responses according to the NL43 Communications Guide. The previous implementation incorrectly mapped d0 (counter field) as a measurement. Changes: - Updated DRD/DOD parsing to skip d0 (counter: 1-600) - Correctly map d1-d5 to lp/leq/lmax/lmin/lpeak measurements - Added inline documentation referencing DRD format specification - Included database migration script to revert incorrect field names DRD format per NL43 spec: - d0 = counter (1-600) - NOT a measurement - d1 = Lp (instantaneous sound pressure level) - d2 = Leq (equivalent continuous sound level) - d3 = Lmax (maximum level) - d4 = Lmin (minimum level) - d5 = Lpeak (peak level) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-01-07 03:42:26 +00:00
claude	50c9370b8e	Containerized for TV deployment	2026-01-07 01:32:25 +00:00
claude	a297e6c5fe	cleanup time	2026-01-02 21:19:57 +00:00
claude	6ac60eb380	api command reference doc added	2025-12-27 08:01:08 +00:00
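
Commit 9c43e68534 describes a per-(unit, rule) state machine, IDLE->ACTIVE->IDLE, with duration debounce on both edges and clear_margin hysteresis, whose core step is pure (no DB, no clock). The following is a minimal sketch of that idea only, not the repository's alerts.py. The field names threshold_db, duration_s, clear_margin_db and comparison are borrowed from the commit message; Rule, State, evaluate_step and run are hypothetical names chosen for illustration, and the exact comparison and timing semantics are assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Rule:
    # Field names follow the commit message; the class itself is hypothetical.
    threshold_db: float
    duration_s: float
    clear_margin_db: float
    comparison: str = "above"  # "above" or "below"


@dataclass(frozen=True)
class State:
    active: bool = False
    # Time at which the condition that would flip the state first held.
    pending_since: Optional[float] = None


def evaluate_step(
    rule: Rule, state: State, level_db: float, now: float
) -> Tuple[State, Optional[str]]:
    """Pure step: the caller supplies the level and the timestamp."""
    if rule.comparison == "above":
        breach = level_db > rule.threshold_db
        cleared = level_db < rule.threshold_db - rule.clear_margin_db
    else:
        breach = level_db < rule.threshold_db
        cleared = level_db > rule.threshold_db + rule.clear_margin_db

    # While idle we wait for a breach; while active we wait for a clear.
    flip = cleared if state.active else breach
    if not flip:
        # Condition interrupted: restart the debounce timer.
        return State(state.active, None), None

    since = state.pending_since if state.pending_since is not None else now
    if now - since >= rule.duration_s:
        new_active = not state.active
        return State(new_active, None), ("onset" if new_active else "clear")
    return State(state.active, since), None


def run(rule: Rule, samples: List[Tuple[float, float]]) -> List[Tuple[float, str]]:
    """Feed (timestamp, level_db) samples through the step function."""
    state = State()
    events = []
    for now, level_db in samples:
        state, event = evaluate_step(rule, state, level_db, now)
        if event is not None:
            events.append((now, event))
    return events
```

Because the step takes the timestamp as an argument, a synthetic level series can exercise spike rejection, onset debounce and the hysteresis hold without a device, which is the kind of test the commit attributes to test_alert_evaluator.py.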


62 Commits
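
Commit 8c17af4849 describes a three-way measurement-state classifier: {"Start","Measure"} means measuring, {"Stop"} means stopped, and anything else causes no change and is not cached. A small sketch of that rule, assuming a cached boolean state; classify and next_transition are hypothetical names and do not come from the repository.

```python
from typing import Optional, Tuple

MEASURING = {"Start", "Measure"}
STOPPED = {"Stop"}


def classify(raw: str) -> Optional[bool]:
    """True = measuring, False = stopped, None = unrecognized (garbled) read."""
    value = raw.strip()
    if value in MEASURING:
        return True
    if value in STOPPED:
        return False
    return None


def next_transition(
    cached: Optional[bool], raw: str
) -> Tuple[Optional[bool], Optional[str]]:
    """Return (new cached state, event), where event is STARTED, STOPPED or None."""
    current = classify(raw)
    if current is None:
        # Garbled read: keep the cached state and log no transition.
        return cached, None
    if cached is None or current == cached:
        return current, None
    return current, ("STARTED" if current else "STOPPED")
```

With this shape, a stray value between two clean "Start" reads leaves the cached state untouched, so it cannot produce the phantom STOPPED/STARTED pair the commit describes.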