Merge pull request 'Update to v 0.4.0' (#6 ) from dev into main

Reviewed-on: #6
chore: version bump
2026-06-22 18:07:36 -04:00 · 2026-06-22 20:54:43 +00:00 · 2026-06-21 20:22:39 +00:00 · 2026-06-11 23:40:52 +00:00 · 2026-06-11 22:47:39 +00:00 · 2026-06-11 19:36:16 +00:00
38 changed files with 8096 additions and 455 deletions
@@ -1,5 +1,8 @@
 /manuals/
 /data/
 /data-dev/
 /SLM-stress-test/stress_test_logs/
 /SLM-stress-test/tcpdump-runs/
 # Python cache
 __pycache__/
@@ -12,3 +15,5 @@ __pycache__/
 *.egg-info/
 dist/
 build/
 *.pcap
@@ -0,0 +1,251 @@
 # Changelog
 All notable changes to SLMM (Sound Level Meter Manager) will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 ## [0.4.0] - 2026-06-22
 ### Added
 #### Live Monitor (fan-out feed)
 - **Per-device fan-out monitor** - one shared, cached live feed per device. Multiple clients (dashboards, portal, charts) subscribe to the same stream instead of each fighting for the NL-43's single TCP connection: one poller reads the device, all subscribers get the same frames.
 - **WebSocket monitor** - `WS /api/nl43/{unit_id}/monitor` delivers an instant first frame from cache, then live updates.
 - **Monitor control** - `POST /api/nl43/{unit_id}/monitor/{start|stop}`, `GET /api/nl43/_monitor/status`. A persistent `monitor_enabled` flag auto-starts the keepalive on boot.
 - **Adaptive polling** - poll rate adapts to demand; unreachable devices back off; a device-offline alert fires when a monitored unit drops.
 - **De-duplication** - the background poller skips units already covered by an active monitor (no double-polling); a heartbeat keeps the feed warm.
 - **Lower latency** - the monitor caches run state, roughly halving live-feed latency; fan-out emits an instant first frame + offline status to new clients.
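The fan-out idea above can be sketched in a few lines. This is a minimal illustration, not SLMM's implementation: the `FanOutFeed` class, its method names, and the queue size are assumptions made for the example. It shows one cached frame per device, an instant first frame for new subscribers, and one publish call reaching every subscriber.

```python
import asyncio


class FanOutFeed:
    """Hypothetical sketch: one cached live feed per device, many subscribers."""

    def __init__(self):
        self.last_frame = None      # cache used for the instant first frame
        self.subscribers = set()    # one asyncio.Queue per connected client

    def subscribe(self):
        queue = asyncio.Queue(maxsize=10)
        if self.last_frame is not None:
            queue.put_nowait(self.last_frame)   # instant first frame from cache
        self.subscribers.add(queue)
        return queue

    def unsubscribe(self, queue):
        self.subscribers.discard(queue)

    def publish(self, frame):
        """Called by the single poller after each device read."""
        self.last_frame = frame
        for queue in self.subscribers:
            if queue.full():
                queue.get_nowait()  # drop the oldest frame for a slow client
            queue.put_nowait(frame)


async def demo():
    feed = FanOutFeed()
    feed.publish({"lp": 61.2})      # a read that happened before anyone connected
    a, b = feed.subscribe(), feed.subscribe()
    feed.publish({"lp": 63.0})      # one device read, delivered to both clients
    return [await a.get(), await a.get()], [await b.get(), await b.get()]


frames_a, frames_b = asyncio.run(demo())
```

Both subscribers receive the cached frame first and then the live one, while the device itself is read only once per frame.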
 #### Alert Engine
 - **Threshold rules** - per-device alert rules (metric + threshold + cooldown) with full CRUD: `POST/GET/PUT/DELETE /api/nl43/{unit_id}/alerts/rules[/{rule_id}]`.
 - **Events + state machine** - onset/clear tracking via `GET /api/nl43/{unit_id}/alerts/events`; acknowledge with `POST .../events/{event_id}/ack`. A `cooldown_s` is enforced between onsets.
 - **24/7 evaluation** - enabled rules pin the monitor on, so rules evaluate continuously even with no UI client connected.
 - **Resilience** - editing or deleting a rule resets its state and closes any open event; device-offline events are raised when a monitored unit goes unreachable.
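The onset/clear behaviour with a cooldown can be illustrated as a small state machine. This is a simplified sketch under stated assumptions: it handles only an "above threshold" comparison, ignores hysteresis and sustained-duration settings, and the `RuleState` class and its `evaluate` method are hypothetical names, not SLMM's code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RuleState:
    """Hypothetical per-rule state: threshold plus cooldown between onsets."""
    threshold_db: float
    cooldown_s: float
    active: bool = False
    last_onset: Optional[float] = None

    def evaluate(self, value, now):
        """Return "onset", "clear", or None for one sample at time `now` (seconds)."""
        if not self.active and value > self.threshold_db:
            in_cooldown = (
                self.last_onset is not None
                and now - self.last_onset < self.cooldown_s
            )
            if in_cooldown:
                return None         # too soon after the previous onset
            self.active = True
            self.last_onset = now
            return "onset"
        if self.active and value <= self.threshold_db:
            self.active = False
            return "clear"
        return None


rule = RuleState(threshold_db=70.0, cooldown_s=300.0)
samples = [(0, 60.0), (10, 72.0), (20, 73.0), (30, 65.0), (40, 75.0), (400, 75.0)]
events = []
for now, value in samples:
    event = rule.evaluate(value, now)
    if event:
        events.append((now, event))
```

In this run the exceedance at 40 s is suppressed because it falls inside the 300 s cooldown that started at the 10 s onset; the next onset is only recorded at 400 s.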
 #### Data & History
 - **Live-chart backfill** - a downsampled DOD trail is persisted to a new `nl43_readings` table, exposed via `GET /api/nl43/{unit_id}/history` so charts can backfill recent history on load.
 - **LN1/LN2 percentiles** - L1/L10 (configurable percentiles) surfaced through SLMM in the status and live-feed payloads.
 - **measurement_start_time** included in the cached `/status` response.
 #### Device control
 - **Per-device disconnect** - `POST /api/nl43/{unit_id}/disconnect` drops a device's pooled connection.
 - **Deactivate / standby** - `POST /api/nl43/{unit_id}/deactivate` and global `POST /api/nl43/_system/standby` to quiesce polling/monitoring.
 ### Changed
 - **DRD streaming reuses the pooled connection** rather than opening a separate socket, avoiding contention with the persistent pool on a single-connection device.
 - **Connection pool** - idle-TTL / max-age checks can now be disabled; pool status is logged periodically.
 ### Fixed
 - **Measurement-start confirmation** - `/start` now recognizes the device's `Start` state. It previously waited for `Measure`, which never matched, so the start cycle ran the full retry loop and Terra-View's proxy timed out with a misleading "Unknown error" even though the device had started.
 - **Garbled reads** - corrupted measurement-state reads that produced phantom STOPPED/STARTED transitions are now ignored.
 - **DOD parsing** - corrected field parsing and stopped spurious measurement-time resets.
 - **Monitor WebSocket** - quieted a send-after-close race on client disconnect.
 ### Database
 - **New tables** (auto-created on startup via `Base.metadata.create_all`): `alert_rules`, `alert_events`, `nl43_readings`.
 - **Migrations for existing tables** (run once per database): `migrate_add_ln_percentiles.py` (LN1/LN2 on `nl43_status`), `migrate_add_monitor_enabled.py` (`monitor_enabled` on `nl43_config`).
 ### Notes
 - Pairs with the matching Terra-View `dev` build, which reads SLMM's `/monitor` fan-out feed for live SLM dashboards (L1/L10 lines, live-chart backfill). Ship the two together.
 ---
 ## [0.3.0] - 2026-02-17
 ### Added
 #### Persistent TCP Connection Pool
 - **Connection reuse** - TCP connections are cached per device and reused across commands, eliminating repeated TCP handshakes over cellular modems
 - **OS-level TCP keepalive** - Configurable keepalive probes keep cellular NAT tables alive and detect dead connections early (default: probe after 15s idle, every 10s, 3 failures = dead)
 - **Transparent retry** - If a cached connection goes stale, the system automatically retries once with a fresh connection, so a stale socket alone does not surface as an error to the caller
 - **Stale connection detection** - Multi-layer detection via idle TTL, max age, transport state, and reader EOF checks
 - **Background cleanup** - Periodic task (every 30s) evicts expired connections from the pool
 - **Master switch** - Set `TCP_PERSISTENT_ENABLED=false` to revert to per-request connection behavior
 #### Connection Pool Diagnostics
 - `GET /api/nl43/_connections/status` - View pool configuration, active connections, age/idle times, and keepalive settings
 - `POST /api/nl43/_connections/flush` - Force-close all cached connections (useful for debugging)
 - **Connections tab on roster page** - Live UI showing pool config, active connections with age/idle/alive status, auto-refreshes every 5s, and flush button
 #### Environment Variables
 - `TCP_PERSISTENT_ENABLED` (default: `true`) - Master switch for persistent connections
 - `TCP_IDLE_TTL` (default: `300`) - Close idle connections after N seconds
 - `TCP_MAX_AGE` (default: `1800`) - Force reconnect after N seconds
 - `TCP_KEEPALIVE_IDLE` (default: `15`) - Seconds idle before keepalive probes start
 - `TCP_KEEPALIVE_INTERVAL` (default: `10`) - Seconds between keepalive probes
 - `TCP_KEEPALIVE_COUNT` (default: `3`) - Failed probes before declaring connection dead
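As an illustration of how the three keepalive variables could map onto socket options, here is a minimal sketch using Python's standard `socket` module. The helper names `keepalive_settings` and `apply_keepalive` are assumptions for the example, and the per-socket tuning constants are platform-specific (they exist on Linux), so the sketch skips any that the platform does not expose.

```python
import os
import socket


def keepalive_settings(env=os.environ):
    """Read the keepalive variables, falling back to the documented defaults."""
    return {
        "idle": int(env.get("TCP_KEEPALIVE_IDLE", "15")),
        "interval": int(env.get("TCP_KEEPALIVE_INTERVAL", "10")),
        "count": int(env.get("TCP_KEEPALIVE_COUNT", "3")),
    }


def apply_keepalive(sock, settings):
    """Enable OS-level keepalive and tune it where the platform allows."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    tuning = (
        ("TCP_KEEPIDLE", "idle"),
        ("TCP_KEEPINTVL", "interval"),
        ("TCP_KEEPCNT", "count"),
    )
    for option_name, key in tuning:
        option = getattr(socket, option_name, None)   # missing on some platforms
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, settings[key])


settings = keepalive_settings({})   # empty mapping, so the defaults apply
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
apply_keepalive(sock, settings)
keepalive_on = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)
sock.close()
```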
 ### Changed
 - **Health check endpoint** (`/health/devices`) - Now uses connection pool instead of opening throwaway TCP connections; checks for existing live connections first (zero-cost), only opens new connection through pool if needed
 - **Diagnostics endpoint** - Removed the separate port 443 modem check, which cost an extra handshake; the TCP reachability test now uses the connection pool
 - **DRD streaming** - Streaming connections now get TCP keepalive options set; cached connections are evicted before opening dedicated streaming socket
 - **Default timeouts tuned for cellular** - Idle TTL raised to 300s (5 min), max age raised to 1800s (30 min) to survive typical polling intervals over cellular links
 ### Technical Details
 #### Architecture
 - `ConnectionPool` class in `services.py` manages a single cached connection per device key (NL-43 only supports one TCP connection at a time)
 - Uses existing per-device asyncio locks and rate limiting — no changes to concurrency model
 - Pool is a module-level singleton initialized from environment variables at import time
 - Lifecycle managed via FastAPI lifespan: cleanup task starts on startup, all connections closed on shutdown
 - `_send_command_unlocked()` refactored to use acquire/release/discard pattern with single-retry fallback
 - Command parsing extracted to `_execute_command()` method for reuse between primary and retry paths
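The acquire/release/discard pattern with a single retry can be sketched as follows. Everything here is a hypothetical stand-in (`TinyPool`, `FakeConn`, `StaleConnection`, `send_with_retry`, and the placeholder command string); the real `ConnectionPool` and `_send_command_unlocked()` are async and are not shown in this changelog.

```python
class StaleConnection(Exception):
    """Raised when a cached connection turns out to be dead."""


class TinyPool:
    """Hypothetical stand-in: at most one cached connection per device key."""

    def __init__(self, connect):
        self._connect = connect
        self._cached = {}

    def acquire(self, key):
        conn = self._cached.pop(key, None)
        return conn if conn is not None else self._connect(key)

    def release(self, key, conn):
        self._cached[key] = conn    # healthy: keep it for the next command

    def discard(self, conn):
        conn.close()                # broken: never return it to the pool


def send_with_retry(pool, key, command):
    for attempt in range(2):        # primary path plus a single retry
        conn = pool.acquire(key)
        try:
            reply = conn.send(command)
        except StaleConnection:
            pool.discard(conn)
            if attempt == 1:
                raise               # the fresh connection failed too
            continue
        pool.release(key, conn)
        return reply


class FakeConn:
    def __init__(self, stale=False):
        self.stale = stale
        self.closed = False

    def send(self, command):
        if self.stale:
            raise StaleConnection(command)
        return f"ok:{command}"

    def close(self):
        self.closed = True


stale = FakeConn(stale=True)
pool = TinyPool(connect=lambda key: FakeConn())
pool.release("meter-001", stale)    # simulate a cached socket that went stale
reply = send_with_retry(pool, "meter-001", "CMD?")
```

The first attempt hits the stale cached connection, which is discarded; the second attempt finds the cache empty, opens a fresh connection, and succeeds, so the caller only sees the reply.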
 #### Cellular Modem Optimizations
 - Keepalive probes at 15s prevent cellular NAT tables from expiring (typically 30-60s timeout)
 - 300s idle TTL ensures connections survive between polling cycles (default 60s interval)
 - 1800s max age allows a single socket to serve ~30 minutes of polling before forced reconnect
 - Health checks and diagnostics produce zero additional TCP handshakes when a pooled connection exists
 - Stale `$` prompt bytes drained from idle connections before command reuse
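The timing figures above can be checked with simple arithmetic. The sketch below uses only the defaults quoted in this changelog; the dead-connection estimate (idle time plus interval times probe count) is the usual rule of thumb for OS keepalive and is stated here as an assumption about the host's TCP stack.

```python
KEEPALIVE_IDLE = 15       # seconds idle before the first probe
KEEPALIVE_INTERVAL = 10   # seconds between probes
KEEPALIVE_COUNT = 3       # unanswered probes before the socket is declared dead
IDLE_TTL = 300            # pool closes connections idle longer than this
MAX_AGE = 1800            # pool forces a reconnect after this
POLL_INTERVAL = 60        # default background polling interval
NAT_TIMEOUT_MIN = 30      # lower end of the 30-60 s NAT timeout quoted above

# Worst-case time to notice a dead peer on an idle connection.
dead_detect_s = KEEPALIVE_IDLE + KEEPALIVE_INTERVAL * KEEPALIVE_COUNT

# The first probe goes out before the shortest quoted NAT timeout expires.
probe_beats_nat = KEEPALIVE_IDLE < NAT_TIMEOUT_MIN

# A connection idle between two polls is still within its TTL.
survives_poll_gap = POLL_INTERVAL < IDLE_TTL

# Number of default-interval polls one socket can serve before max age.
polls_per_socket = MAX_AGE // POLL_INTERVAL

print(dead_detect_s, probe_beats_nat, survives_poll_gap, polls_per_socket)
```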
 ### Breaking Changes
 None. This release is fully backward-compatible with v0.2.x. Set `TCP_PERSISTENT_ENABLED=false` for identical behavior to previous versions.
 ---
 ## [0.2.1] - 2026-01-23
 ### Added
 - **Roster management**: UI and API endpoints for managing device rosters.
 - **Delete config endpoint**: Remove device configuration alongside cached status data.
 - **Scheduler hooks**: `start_cycle` and `stop_cycle` helpers for Terra-View scheduling integration.
 ### Changed
 - **FTP logging**: Connection, authentication, and transfer phases now log explicitly.
 - **Documentation**: Reorganized docs/scripts and updated API notes for FTP/TCP verification.
 ## [0.2.0] - 2026-01-15
 ### Added
 #### Background Polling System
 - **Continuous automatic device polling** - Background service that continuously polls configured devices
 - **Per-device configurable intervals** - Each device can have custom polling interval (10-3600 seconds, default 60)
 - **Automatic offline detection** - Devices automatically marked unreachable after 3 consecutive failures
 - **Reachability tracking** - Database fields track device health with failure counters and error messages
 - **Dynamic sleep scheduling** - Polling service adjusts sleep intervals based on device configurations
 - **Graceful lifecycle management** - Background poller starts on application startup and stops cleanly on shutdown
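The offline-detection and sleep-scheduling rules above can be sketched like this. The `DeviceHealth` class and the `record_poll` and `next_sleep` helpers are hypothetical names for illustration; whether the real poller clears `last_error` on success, or what it sleeps for when no device is configured, is not stated here, so those details are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

MAX_FAILURES = 3    # unreachable after 3 consecutive failures


@dataclass
class DeviceHealth:
    is_reachable: bool = True
    consecutive_failures: int = 0
    last_error: Optional[str] = None


def record_poll(health, ok, error=None):
    """Update reachability after one poll attempt."""
    if ok:
        health.consecutive_failures = 0
        health.is_reachable = True
    else:
        health.consecutive_failures += 1
        health.last_error = (error or "")[:500]     # truncated to 500 chars
        if health.consecutive_failures >= MAX_FAILURES:
            health.is_reachable = False
    return health


def next_sleep(now, next_due, idle_sleep=1.0):
    """Sleep until the earliest device is due; `idle_sleep` is an assumed fallback."""
    if not next_due:
        return idle_sleep
    return max(0.0, min(next_due.values()) - now)


health = DeviceHealth()
reachable_after = []
for ok in (False, False, False, True):
    record_poll(health, ok, error="Errno 111 (connection refused)")
    reachable_after.append(health.is_reachable)

sleep_s = next_sleep(100.0, {"meter-001": 130.0, "meter-002": 160.0})
```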
 #### New API Endpoints
 - `GET /api/nl43/{unit_id}/polling/config` - Get device polling configuration
 - `PUT /api/nl43/{unit_id}/polling/config` - Update polling interval and enable/disable per-device polling
 - `GET /api/nl43/_polling/status` - Get global polling status for all devices with reachability info
 #### Database Schema Changes
 - **NL43Config table**:
  - `poll_interval_seconds` (Integer, default 60) - Polling interval in seconds
  - `poll_enabled` (Boolean, default true) - Enable/disable background polling per device
 - **NL43Status table**:
  - `is_reachable` (Boolean, default true) - Current device reachability status
  - `consecutive_failures` (Integer, default 0) - Count of consecutive poll failures
  - `last_poll_attempt` (DateTime) - Last time background poller attempted to poll
  - `last_success` (DateTime) - Last successful poll timestamp
  - `last_error` (Text) - Last error message (truncated to 500 chars)
 #### New Files
 - `app/background_poller.py` - Background polling service implementation
 - `migrate_add_polling_fields.py` - Database migration script for v0.2.0 schema changes
 - `test_polling.sh` - Comprehensive test script for polling functionality
 - `CHANGELOG.md` - This changelog file
 ### Changed
 - **Enhanced status endpoint** - `GET /api/nl43/{unit_id}/status` now includes polling-related fields (is_reachable, consecutive_failures, last_poll_attempt, last_success, last_error)
 - **Application startup** - Added lifespan context manager in `app/main.py` to manage background poller lifecycle
 - **Performance improvement** - Terra-View requests now return cached data instantly (<100ms) instead of waiting for device queries (1-2 seconds)
 ### Technical Details
 #### Architecture
 - Background poller runs as async task using `asyncio.create_task()`
 - Uses existing `NL43Client` and `persist_snapshot()` functions - no code duplication
 - Respects existing 1-second rate limiting per device
 - Efficient resource usage - skips work when no devices configured
 - WebSocket streaming remains unaffected - separate real-time data path
 #### Default Behavior
 - Existing devices automatically get 60-second polling interval
 - Existing status records default to `is_reachable=true`
 - Migration is additive-only - no data loss
 - Polling can be disabled per-device via `poll_enabled=false`
 #### Recommended Intervals
 - Critical monitoring: 30 seconds
 - Normal monitoring: 60 seconds (default)
 - Battery conservation: 300 seconds (5 minutes)
 - Development/testing: 10 seconds (minimum allowed)
 ### Migration Notes
 To upgrade from v0.1.x to v0.2.0:
 1. **Stop the service** (if running):
   ```bash
   docker compose down slmm
   # OR
   # Stop your uvicorn process
   ```
 2. **Update code**:
   ```bash
   git pull
   # OR copy new files
   ```
 3. **Run migration**:
   ```bash
   cd slmm
   python3 migrate_add_polling_fields.py
   ```
 4. **Restart service**:
   ```bash
   docker compose up -d --build slmm
   # OR
   uvicorn app.main:app --host 0.0.0.0 --port 8100
   ```
 5. **Verify polling is active**:
   ```bash
   curl http://localhost:8100/api/nl43/_polling/status | jq '.'
   ```
 You should see `"poller_running": true` and all configured devices listed.
 ### Breaking Changes
 None. This release is fully backward-compatible with v0.1.x. All existing endpoints and functionality remain unchanged.
 ---
 ## [0.1.0] - 2025-12-XX
 ### Added
 - Initial release
 - REST API for NL43/NL53 sound level meter control
 - TCP command protocol implementation
 - FTP file download support
 - WebSocket streaming for real-time data (DRD)
 - Device configuration management
 - Measurement control (start, stop, pause, resume, reset, store)
 - Device information endpoints (battery, clock, results)
 - Measurement settings management (frequency/time weighting)
 - Sleep mode control
 - Rate limiting (1-second minimum between commands)
 - SQLite database for device configs and status cache
 - Health check endpoints
 - Comprehensive API documentation
 - NL43 protocol documentation
 ### Database Schema (v0.1.0)
 - **NL43Config table** - Device connection configuration
 - **NL43Status table** - Measurement snapshot cache
 ---
 ## Version History Summary
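 - **v0.4.0** (2026-06-22) - Live monitor fan-out feed, alert engine, live-chart history, LN1/LN2 percentiles
 - **v0.3.0** (2026-02-17) - Persistent TCP connections with keepalive for cellular modem reliability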
 - **v0.2.1** (2026-01-23) - Roster management, scheduler hooks, FTP logging, doc cleanup
 - **v0.2.0** (2026-01-15) - Background Polling System
 - **v0.1.0** (2025-12-XX) - Initial Release
@@ -1,15 +1,23 @@
 # SLMM - Sound Level Meter Manager
 **Version 0.4.0**
 Backend API service for controlling and monitoring Rion NL-43/NL-53 Sound Level Meters via TCP and FTP protocols.
 ## Overview
 SLMM is a standalone backend module that provides REST API routing and command translation for NL43/NL53 sound level meters. This service acts as a bridge between the hardware devices and frontend applications, handling all device communication, data persistence, and protocol management.
-**Note:** This is a backend-only service. Actual user interfacing is done via [SFM/Terra-View](https://github.com/your-org/terra-view) frontend applications.
+**Note:** This is a backend-only service. User interaction happens through custom front ends or a CLI.
 ## Features
 - **Live Monitor (fan-out)**: One shared cached live feed per device — many clients subscribe to the same stream instead of fighting over the meter's single TCP connection
 - **Alert Engine**: Per-device threshold rules with onset/clear events, cooldowns, acks, and 24/7 evaluation
 - **History & Percentiles**: Downsampled DOD trail + history endpoint for live-chart backfill; LN1/LN2 (L1/L10) percentiles surfaced through the feed
 - **Persistent TCP Connections**: Cached per-device connections with OS-level keepalive, tuned for cellular modem reliability
 - **Background Polling**: Continuous automatic polling of devices with configurable intervals
 - **Offline Detection**: Automatic device reachability tracking with failure counters
 - **Device Management**: Configure and manage multiple NL43/NL53 devices
 - **Real-time Monitoring**: Stream live measurement data via WebSocket
 - **Measurement Control**: Start, stop, pause, resume, and reset measurements
@@ -18,22 +26,72 @@ SLMM is a standalone backend module that provides REST API routing and command t
 - **Device Configuration**: Manage frequency/time weighting, clock sync, and more
 - **Rate Limiting**: Automatic 1-second delay enforcement between device commands
 - **Persistent Storage**: SQLite database for device configs and measurement cache
 - **Connection Diagnostics**: Live UI and API endpoints for monitoring TCP connection pool status
 ## Architecture
 ```
-┌─────────────────┐         ┌──────────────┐         ┌─────────────────┐
+┌─────────────────┐         ┌──────────────────────────────┐         ┌─────────────────┐
-│  Terra-View UI  │◄───────►│  SLMM API    │◄───────►│  NL43/NL53      │
+│                 │◄───────►│  SLMM API                    │◄───────►│  NL43/NL53      │
-│  (Frontend)     │  HTTP   │  (Backend)   │  TCP    │  Sound Meters   │
+│  (Frontend)     │  HTTP   │  • REST Endpoints            │  TCP    │  Sound Meters   │
-└─────────────────┘         └──────────────┘         └─────────────────┘
+└─────────────────┘         │  • WebSocket Streaming       │  (kept  │  (via cellular  │
                            │  • Background Poller         │  alive) │   modem)        │
                            │  • Connection Pool (v0.3)    │         └─────────────────┘
                            └──────────────────────────────┘
                                          │
                                          ▼
                                  ┌──────────────┐
                                  │  SQLite DB   │
-                            │  (Cache)     │
+                                  │  • Config    │
                                  │  • Status    │
                                  └──────────────┘
 ```
 ### Live Monitor — Fan-Out Feed (v0.4.0)
 The NL-43 allows only one TCP control connection at a time, so multiple clients
 polling the same device directly would contend for it. The monitor solves this
 with a single shared, cached feed per device:
 - **One reader, many subscribers**: a single poller reads the device; every
  WebSocket subscriber (`WS /api/nl43/{unit_id}/monitor`) receives the same
  frames — an instant first frame from cache, then live updates.
 - **Persistent + auto-start**: a `monitor_enabled` flag keeps the feed running
  and auto-starts it on boot. Enabled alert rules pin the monitor on for 24/7
  evaluation even with no UI connected.
 - **Adaptive & deduplicated**: poll rate adapts to demand, unreachable devices
  back off, and the background poller skips units already covered by a monitor.
 ### Alert Engine (v0.4.0)
 Per-device threshold alerting evaluated against the live feed:
 - **Rules**: metric + threshold + `cooldown_s`, full CRUD per device
 - **Events**: onset/clear state machine, acknowledgement, and a device-offline
  alert when a monitored unit drops
 - **Robust**: editing/deleting a rule resets its state and closes open events
 ### Persistent TCP Connection Pool (v0.3.0)
 SLMM maintains persistent TCP connections to devices with OS-level keepalive, designed for reliable operation over cellular modems:
 - **Connection Reuse**: One cached TCP socket per device, reused across all commands (no repeated handshakes)
 - **TCP Keepalive**: Probes keep cellular NAT tables alive and detect dead connections early
 - **Transparent Retry**: Stale cached connections automatically retry with a fresh socket
 - **Configurable**: Idle TTL (300s), max age (1800s), and keepalive timing via environment variables
 - **Diagnostics**: Live UI on the roster page and API endpoints for monitoring pool status
 ### Background Polling (v0.2.0)
 Background polling service continuously queries devices and updates the status cache:
 - **Automatic Updates**: Devices are polled at configurable intervals (10-3600 seconds)
 - **Offline Detection**: Devices marked unreachable after 3 consecutive failures
 - **Per-Device Configuration**: Each device can have a custom polling interval
 - **Resource Efficient**: Dynamic sleep intervals and smart scheduling
 Status requests return cached data instantly (<100ms) instead of waiting for device queries (1-2 seconds).
 ## Quick Start
 ### Prerequisites
@@ -77,9 +135,18 @@ Once running, visit:
 ### Environment Variables
 **Server:**
 - `PORT`: Server port (default: 8100)
 - `CORS_ORIGINS`: Comma-separated list of allowed origins (default: "*")
 **TCP Connection Pool:**
 - `TCP_PERSISTENT_ENABLED`: Enable persistent connections (default: "true")
 - `TCP_IDLE_TTL`: Close idle connections after N seconds (default: 300)
 - `TCP_MAX_AGE`: Force reconnect after N seconds (default: 1800)
 - `TCP_KEEPALIVE_IDLE`: Seconds idle before keepalive probes (default: 15)
 - `TCP_KEEPALIVE_INTERVAL`: Seconds between keepalive probes (default: 10)
 - `TCP_KEEPALIVE_COUNT`: Failed probes before declaring dead (default: 3)
 ### Database
 The SQLite database is automatically created at [data/slmm.db](data/slmm.db) on first run.
@@ -103,10 +170,49 @@ Logs are written to:
 | Method | Endpoint | Description |
 |--------|----------|-------------|
-| GET | `/api/nl43/{unit_id}/status` | Get cached measurement snapshot |
+| GET | `/api/nl43/{unit_id}/status` | Get cached measurement snapshot (updated by background poller) |
-| GET | `/api/nl43/{unit_id}/live` | Request fresh DOD data from device |
+| GET | `/api/nl43/{unit_id}/live` | Request fresh DOD data from device (bypasses cache) |
 | GET | `/api/nl43/{unit_id}/history` | Downsampled DOD trail for live-chart backfill |
 | WS | `/api/nl43/{unit_id}/stream` | WebSocket stream for real-time DRD data |
 ### Live Monitor (fan-out feed)
 | Method | Endpoint | Description |
 |--------|----------|-------------|
 | WS | `/api/nl43/{unit_id}/monitor` | Subscribe to the shared cached live feed (instant first frame) |
 | POST | `/api/nl43/{unit_id}/monitor/start` | Start the device's monitor feed |
 | POST | `/api/nl43/{unit_id}/monitor/stop` | Stop the device's monitor feed |
 | GET | `/api/nl43/_monitor/status` | Global monitor status across devices |
 | POST | `/api/nl43/{unit_id}/disconnect` | Drop the device's pooled TCP connection |
 | POST | `/api/nl43/{unit_id}/deactivate` | Quiesce polling/monitoring for one device |
 | POST | `/api/nl43/_system/standby` | Global standby — quiesce all polling/monitoring |
 ### Alerts
 | Method | Endpoint | Description |
 |--------|----------|-------------|
 | GET | `/api/nl43/{unit_id}/alerts/rules` | List alert rules for a device |
 | POST | `/api/nl43/{unit_id}/alerts/rules` | Create an alert rule (metric, threshold, cooldown) |
 | PUT | `/api/nl43/{unit_id}/alerts/rules/{rule_id}` | Update a rule (resets its state, closes open events) |
 | DELETE | `/api/nl43/{unit_id}/alerts/rules/{rule_id}` | Delete a rule |
 | GET | `/api/nl43/{unit_id}/alerts/events` | List alert events (onset/clear) |
 | POST | `/api/nl43/{unit_id}/alerts/events/{event_id}/ack` | Acknowledge an event |
 ### Background Polling
 | Method | Endpoint | Description |
 |--------|----------|-------------|
 | GET | `/api/nl43/{unit_id}/polling/config` | Get device polling configuration |
 | PUT | `/api/nl43/{unit_id}/polling/config` | Update polling interval and enable/disable polling |
 | GET | `/api/nl43/_polling/status` | Get global polling status for all devices |
 ### Connection Pool
 | Method | Endpoint | Description |
 |--------|----------|-------------|
 | GET | `/api/nl43/_connections/status` | Get pool config, active connections, age/idle times |
 | POST | `/api/nl43/_connections/flush` | Force-close all cached TCP connections |
 ### Measurement Control
 | Method | Endpoint | Description |
@@ -167,6 +273,7 @@ slmm/
 │   ├── routers.py           # API route definitions
 │   ├── models.py            # SQLAlchemy database models
 │   ├── services.py          # NL43Client and business logic
 │   ├── background_poller.py # Background polling service ⭐ NEW
 │   └── database.py          # Database configuration
 ├── data/
 │   ├── slmm.db              # SQLite database (auto-created)
@@ -175,9 +282,12 @@ slmm/
 ├── templates/
 │   └── index.html           # Simple web interface (optional)
 ├── manuals/                 # Device documentation
 ├── migrate_add_polling_fields.py  # Database migration for v0.2.0 ⭐ NEW
 ├── test_polling.sh          # Polling feature test script ⭐ NEW
 ├── API.md                   # Detailed API documentation
 ├── COMMUNICATION_GUIDE.md   # NL43 protocol documentation
 ├── NL43_COMMANDS.md         # Command reference
 ├── CHANGELOG.md             # Version history ⭐ NEW
 ├── requirements.txt         # Python dependencies
 └── README.md                # This file
 ```
@@ -194,12 +304,16 @@ Stores device connection configuration:
 - `ftp_username`: FTP authentication username
 - `ftp_password`: FTP authentication password
 - `web_enabled`: Enable/disable web interface access
 - `poll_interval_seconds`: Polling interval in seconds (10-3600, default: 60) ⭐ NEW
 - `poll_enabled`: Enable/disable background polling for this device ⭐ NEW
 ### NL43Status Table
 Caches latest measurement snapshot:
 - `unit_id` (PK): Unique device identifier
 - `last_seen`: Timestamp of last update
 - `measurement_state`: Current state (Start/Stop)
 - `measurement_start_time`: When measurement started (UTC)
 - `counter`: Measurement interval counter (1-600)
 - `lp`: Instantaneous sound pressure level
 - `leq`: Equivalent continuous sound level
 - `lmax`: Maximum sound level
@@ -210,11 +324,43 @@ Caches latest measurement snapshot:
 - `sd_remaining_mb`: Free SD card space (MB)
 - `sd_free_ratio`: SD card free space ratio
 - `raw_payload`: Raw device response data
 - `is_reachable`: Device reachability status (Boolean)
 - `consecutive_failures`: Count of consecutive poll failures
 - `last_poll_attempt`: Last time background poller attempted to poll
 - `last_success`: Last successful poll timestamp
 - `last_error`: Last error message (truncated to 500 chars)
 - `ln1` / `ln2`: LN1/LN2 (L1/L10) percentile levels ⭐ v0.4.0
 ### NL43Readings Table ⭐ v0.4.0
 Downsampled DOD trail backing the live-chart history endpoint (one row/minute,
 pruned to a retention window — viewing only, not the report source):
 - `id` (PK), `unit_id`, `timestamp`
 - `lp` / `leq` / `lmax` / `ln1` / `ln2`: cached level samples
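One way to get a one-row-per-minute trail is to bucket samples by minute. The sketch below keeps the last sample seen in each UTC minute; that policy, the `downsample_per_minute` helper, and the sample values are assumptions for illustration, since the actual downsampling rule is not described here.

```python
from datetime import datetime, timezone


def downsample_per_minute(samples):
    """Keep the last sample in each minute (an assumed policy), sorted by time."""
    buckets = {}
    for ts, levels in samples:
        minute = ts.replace(second=0, microsecond=0)
        buckets[minute] = levels    # later samples overwrite earlier ones
    return sorted(buckets.items())


def at(minute, second):
    return datetime(2026, 6, 22, 12, minute, second, tzinfo=timezone.utc)


samples = [
    (at(0, 5), {"leq": 61.0}),
    (at(0, 50), {"leq": 62.5}),     # same minute as the previous sample
    (at(1, 10), {"leq": 60.2}),
]
rows = downsample_per_minute(samples)
```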
 ### AlertRule Table ⭐ v0.4.0
 Per-device threshold alert rules:
 - `id` (PK), `unit_id`, `name`, `enabled`
 - `metric`, `comparison` (above/below), `threshold_db`, `clear_margin_db` (hysteresis)
 - `duration_s` (sustained), `cooldown_s` (min seconds between onsets)
 - `channels` / `recipients`, optional `schedule_start`/`schedule_end`/`schedule_days`
 ### AlertEvent Table ⭐ v0.4.0
 Alert onset/clear events for history, inbox, and acknowledgement:
 - `id` (PK), `unit_id`, `rule_id`, `rule_name`, `metric`, `threshold_db`
 - `onset_at` / `onset_value`, `peak_value`, `clear_at`, `status` (active/cleared)
 - `acknowledged_at` / `acknowledged_by`, `notes`
 > New tables (`alert_rules`, `alert_events`, `nl43_readings`) auto-create on
 > startup. Existing-table columns ship with migrations:
 > `migrate_add_ln_percentiles.py`, `migrate_add_monitor_enabled.py`.
 ## Protocol Details
 ### TCP Communication
 - Uses ASCII command protocol over TCP
 - Persistent connections with OS-level keepalive (tuned for cellular modems)
 - Connections cached per device and reused across commands
 - Transparent retry on stale connections
 - Enforces ≥1 second delay between commands to same device
 - Two-line response format:
  - Line 1: Result code (R+0000 for success)
@@ -253,11 +399,43 @@ curl -X PUT http://localhost:8100/api/nl43/meter-001/config \
 curl -X POST http://localhost:8100/api/nl43/meter-001/start
 ```
-### Get Live Status
+### Get Cached Status (Fast - from background poller)
 ```bash
 curl http://localhost:8100/api/nl43/meter-001/status
 ```
 ### Get Live Status (Bypasses cache)
 ```bash
 curl http://localhost:8100/api/nl43/meter-001/live
 ```
 ### Configure Background Polling ⭐ NEW
 ```bash
 # Set polling interval to 30 seconds
 curl -X PUT http://localhost:8100/api/nl43/meter-001/polling/config \
  -H "Content-Type: application/json" \
  -d '{
    "poll_interval_seconds": 30,
    "poll_enabled": true
  }'
 # Get polling configuration
 curl http://localhost:8100/api/nl43/meter-001/polling/config
 # Check global polling status
 curl http://localhost:8100/api/nl43/_polling/status
 ```
 ### Check Connection Pool Status
 ```bash
 curl http://localhost:8100/api/nl43/_connections/status | jq '.'
 ```
 ### Flush All Cached Connections
 ```bash
 curl -X POST http://localhost:8100/api/nl43/_connections/flush
 ```
 ### Verify Device Settings
 ```bash
 curl http://localhost:8100/api/nl43/meter-001/settings
@@ -326,11 +504,19 @@ See [API.md](API.md) for detailed integration examples.
 ## Troubleshooting
 ### Connection Issues
 - Check connection pool status: `curl http://localhost:8100/api/nl43/_connections/status`
 - Flush stale connections: `curl -X POST http://localhost:8100/api/nl43/_connections/flush`
 - Verify device IP address and port in configuration
 - Ensure device is on the same network
 - Check firewall rules allow TCP/FTP connections
 - Verify RX55 network adapter is properly configured on device
 ### Cellular Modem Issues
 - If the modem wedges from too many handshakes, ensure `TCP_PERSISTENT_ENABLED=true` (default)
 - Increase `TCP_IDLE_TTL` if connections expire between poll cycles
 - Keepalive probes (default: start after 15s idle, then every 10s) keep NAT tables alive; adjust `TCP_KEEPALIVE_IDLE` or `TCP_KEEPALIVE_INTERVAL` if needed
 - Set `TCP_PERSISTENT_ENABLED=false` to disable pooling for debugging
 ### Rate Limiting
 - API automatically enforces 1-second delay between commands
 - A delay of about one second between consecutive commands to the same device is expected; it is the enforced spacing, not a fault
@@ -356,13 +542,31 @@ pytest
 ### Database Migrations
 ```bash
-# Migrate existing database to add FTP credentials
+# Migrate to v0.2.0 (add background polling fields)
 python3 migrate_add_polling_fields.py
 # Legacy: Migrate to add FTP credentials
 python migrate_add_ftp_credentials.py
 # Set FTP credentials for a device
 python set_ftp_credentials.py <unit_id> <username> <password>
 ```
 ### Testing Background Polling
 ```bash
 # Run comprehensive polling tests
 ./test_polling.sh [unit_id]
 # Test settings endpoint
 python3 test_settings_endpoint.py <unit_id>
 # Test sleep mode auto-disable
 python3 test_sleep_mode_auto_disable.py <unit_id>
 ```
 ### Legacy Scripts
 Old migration scripts and manual polling tools have been moved to `archive/` for reference. See [archive/README.md](archive/README.md) for details.
 ## Contributing
 This is a standalone module kept separate from the SFM/Terra-View codebase. When contributing:
@@ -0,0 +1,403 @@
 # NL-43 + RX55 TCP “Wedge” Investigation (2255 Refusal) — Full Log & Next Steps
 **Last updated:** 2026-02-18  
 **Owner:** Brian / serversdown  
 **Context:** Terra-View / SLMM / field-deployed Rion NL-43 behind Sierra Wireless RX55
 ---
 ## 0) What this document is
 This is a **comprehensive, chronological** record of the debugging we did to isolate a failure where the **NL-43’s TCP control port (2255) eventually stops accepting connections** (“wedges”), while other services (notably FTP/21) remain reachable.
 This is written to be fed back into future troubleshooting, so it intentionally includes the **full reasoning chain, experiments, commands, packet evidence, and conclusions**.
 ---
 ## 1) Architecture (as tested)
 ### Network path
 - **Server (SLMM host):** `10.0.0.40`
 - **RX55 WAN IP:** `63.45.161.30`
 - **RX55 LAN subnet:** `192.168.1.0/24`
 - **RX55 LAN gateway:** `192.168.1.1`
 - **NL-43 LAN IP:** `192.168.1.10` (confirmed via ARP OUI + ping; see LAN validation)
 ### RX55 details
 - **Sierra Wireless RX55**
 - **OS:** 5.2
 - **Firmware:** `01.14.24.00`
 - **Carrier:** Verizon LTE (Band 66)
 ### Port forwarding rules (RX55)
 - **WAN:2255 → NL-43:2255**  (NL-43 TCP control)
 - **WAN:21   → NL-43:21**    (NL-43 FTP control)
 We also experimented with additional forwards:
 - **WAN:2253 → NL-43:2255** (test)
 - **WAN:2253 → NL-43:2253** (test)
 - **WAN:4450 → NL-43:4450** (test)
 **Important:** The rule's “Input zone / interface” field was set to **WAN-NAT**, and Source IP was left as **Any IPv4**. This is correct for inbound port-forward behavior on Sierra OS 5.x.
 ---
 ## 2) Original problem statement (the “wedge”)
 After running for hours, the NL-43 becomes unreachable over TCP control.
 ### Symptom signature (WAN-side)
 - Client attempts to connect to `63.45.161.30:2255`
 - Instead of timing out, the client gets **connection refused** quickly.
 - Packet-level: SYN from client → **RST,ACK** back (meaning active refusal vs silent drop)
 ### Critical operational behavior
 - **Power cycling the NL-43 fixes it.**
 - **Power cycling the RX55 does NOT fix it.**
 - FTP sometimes remains available even while TCP control (2255) is dead.
 This combination is what forced us to determine whether:
 - The RX55 is rejecting connections, OR
 - The NL-43 is no longer listening on 2255, OR
 - Something about the RX55 path triggers the NL-43’s control listener to die.
 ---
 ## 3) Event timeline evidence (SLMM logs)
 A concrete wedge window was observed on **2026-02-18**:
 - 10:55:46 AM — Poll success (Start)
 - 11:00:28 AM — Measurement STOPPED (scheduled stop/download cycle succeeded)
 - 11:55:50 AM — Poll success (Stop)
 - 12:55:55 PM — Poll success (Stop)
 - **1:55:58 PM — Poll failed (attempt 1/3): Errno 111 (connection refused)**
 - 2:56:02 PM — Poll failed (attempt 2/3): Errno 111 (connection refused)
 Key interpretation:
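 - The wedge occurred sometime between the last successful poll (**12:55:55 PM**) and the first failed poll (**1:55:58 PM**); with roughly hourly polls, the onset cannot be pinned down more precisely from these logs.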
 - The failure type is **refused**, not timeout.
 ---
 ## 4) Early hypotheses (before proof)
 We considered two main buckets:
 ### A) NL-43-side failure (most suspicious)
 - NL-43 TCP control service crashes / exits / unbinds from 2255
 - socket leak / accept backlog exhaustion
 - “single control session allowed” and it gets stuck thinking a session is active
 - mode/service manager bug (service restart fails after other activities)
 - firmware bug in TCP daemon
 ### B) RX55-side failure (possible trigger / less likely once FTP works)
 - NAT/forwarding table corruption
 - firewall behavior
 - helper/ALG interference
 - MSS/MTU weirdness causing edge-case behavior
 - session churn behavior causing downstream issues
 ---
 ## 5) Key experiments and what they proved
 ### 5.1) LAN-only stability test (No RX55 path)
 **Test:** NL-43 tested directly on LAN (no modem path involved).
 - Ran **24+ hours**
 - Scheduler start/stop cycles worked
 - Stress test: **500 commands @ 1/sec** → no failure
 - Response times trended downward over the run (no sign of degradation)
 **Result:** The NL-43 appears stable in a “pure LAN” environment.
 **Interpretation:** The trigger is likely related to the RX55/WAN environment, connection patterns, or service switching patterns—not just simple uptime.
 ---
 ### 5.2) Port-forward behavior: timeout vs refused (RX55 behavior characterization)
 You observed:
 - **If a WAN port is NOT forwarded (no rule):** connecting to that port **times out** (silent drop)
 - **If a WAN port IS forwarded to NL-43 but nothing listens:** it **actively refuses** (RST)
 Concrete example:
 - Port **4450** with no rule → timeout
 - Port **4450 → NL-43:4450** rule created → connection refused
 **Interpretation:** This confirms the RX55 is actually forwarding packets to the NL-43 when a rule exists. “Refused” is consistent with the NL-43 (or RX55 relay behavior) responding quickly because the packet reached the target.
 Important nuance:
 - A “refused” on forwarded ports does **not** automatically prove the NL-43 is the one generating RST, because NAT hides the inside host and the RX55 could reject on behalf of an unreachable target. We needed a LAN-side proof test to close the loop.
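The timeout-versus-refused distinction above can be checked programmatically rather than read off `nc` output. The following is a minimal sketch, not SLMM code: `probe_tcp` and its return labels are hypothetical names chosen for illustration, and the 3-second timeout is an assumed value.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify one TCP connect attempt (hypothetical helper, not part of SLMM).

    Returns 'open' if the handshake completes, 'refused' if a RST came back,
    'timeout' if nothing answered within `timeout` seconds (silent drop).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"   # Errno 111 on Linux: something answered with RST
    except socket.timeout:
        return "timeout"   # no reply at all: consistent with no forward rule
    except OSError as exc:
        return f"error: {exc}"  # e.g. no route to host


if __name__ == "__main__":
    # Addresses and ports taken from the architecture section above.
    for port in (2255, 21, 4450):
        print(port, probe_tcp("63.45.161.30", port))
```

As with `nc`, a "refused" result from the WAN side only shows that a RST came back through the RX55; it does not by itself say which device generated it.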
 ---
 ### 5.3) UDP test confusion (and resolution)
 You ran:
 ```bash
 nc -vzu 63.45.161.30 2255
 nc -vz  63.45.161.30 2255
 ```
 Observed:
 - UDP: “succeeded”
 - TCP: “connection refused”
 Resolution:
 - UDP has **no handshake**. netcat prints “succeeded” if it doesn’t immediately receive an ICMP unreachable. It does **not** mean a UDP service exists.
 - TCP refused is meaningful: a RST implies “no listener” or “actively rejected.”
 **Net effect:** UDP test did not change the diagnosis.
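To illustrate why the UDP "success" carries no information, here is a small sketch under the assumption of an ordinary, unconnected UDP socket. The function name is hypothetical; the point is only that sending a datagram completes locally whether or not anything is listening.

```python
import socket


def udp_send_completes(host: str, port: int) -> bool:
    """Send one datagram and report whether the local send call completed.

    Illustrative helper only. There is no handshake, so this returning True
    says nothing about whether a UDP service exists on the far end.
    """
    payload = b"ping"
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sent = sock.sendto(payload, (host, port))
        return sent == len(payload)
```

A TCP connect, by contrast, needs a reply from the far end to complete, which is why the TCP result is the one that matters for diagnosis.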
 ---
 ### 5.4) Packet capture proof (WAN-side)
 You captured a Wireshark/tcpdump summary with these key patterns:
 #### Port 2255 (TCP control)
 Example:
 - `10.0.0.40 → 63.45.161.30:2255` SYN
 - `63.45.161.30 → 10.0.0.40` **RST, ACK** within ~50ms
 This happened repeatedly.
 #### Port 2253 (test port)
 Multiple SYN attempts to 2253 showed **retransmissions and no response**, i.e., **silent drop** (consistent with no rule or not forwarded at that moment).
 #### Port 21 (FTP)
 Clean 3-way handshake:
 - SYN → SYN/ACK → ACK
 Then:
 - FTP server banner: `220 Connection Ready`
 Then:
 - `530 Not logged in` (because SLMM was sending non-FTP “requests” as an experiment)
 Session closes cleanly.
 **Key takeaway from capture:**
 - TCP transport to NL-43 via RX55 is definitely working (port 21 proves it).
 - Port 2255 is being actively refused.
 This strongly suggested “2255 listener is gone,” but still didn’t fully prove whether the refusal was generated internally by NL-43 or by RX55 on behalf of NL-43.
 ---
 ## 6) The decisive experiment: LAN-side test while wedged (final proof)
 Because the RX55 does not offer SSH, the plan was to test from **inside the LAN behind the RX55**.
 ### 6.1) Physical LAN tap setup
 Constraint:
 - NL-43 has only one Ethernet port.
 Solution:
 - Insert an unmanaged switch:
  - RX55 LAN → switch
  - NL-43 → switch
  - Windows 10 laptop → switch
 This creates a shared L2 segment where the laptop can test NL-43 directly.
 ### 6.2) Windows LAN validation
 On the Windows laptop:
 - `ipconfig` showed:
  - IP: `192.168.1.100`
  - Gateway: `192.168.1.1` (RX55)
 - Initial `arp -a` only showed RX55, not NL-43.
 You then:
 - pinged likely host addresses and discovered NL-43 responds on **192.168.1.10**
 - `arp -a` then showed:
  - `192.168.1.10 → 00-10-50-14-0a-d8`
  - OUI `00-10-50` recognized as **Rion** (matches NL-43)
 So LAN identities were confirmed:
 - RX55: `192.168.1.1`
 - NL-43: `192.168.1.10`
 ### 6.3) The LAN port tests (the smoking gun)
 From Windows:
 ```powershell
 Test-NetConnection -ComputerName 192.168.1.10 -Port 2255
 Test-NetConnection -ComputerName 192.168.1.10 -Port 21
 ```
 Results (while the unit was “wedged” from the WAN perspective):
 - **2255:** `TcpTestSucceeded : False`
 - **21:**   `TcpTestSucceeded : True`
 **Conclusion (PROVEN):**
 - The NL-43 is reachable on the LAN
 - FTP port 21 is alive
 - **The NL-43 is NOT listening on TCP port 2255**
 - Therefore the RX55 is not the root cause of the refusal. The WAN refusal is consistent with the NL-43 having no listener on 2255.
 This is now settled.
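The decision logic behind this conclusion can be written down as a small pure function. This is a sketch only: the function name and the state labels (`"open"`, `"refused"`, `"wedged"`, and so on) are assumptions made for illustration, not anything SLMM defines.

```python
def classify_unit(control_2255: str, ftp_21: str) -> str:
    """Map two LAN-side port results to a diagnosis (hypothetical labels).

    Inputs are 'open', 'refused', or 'timeout' for ports 2255 and 21.
    """
    if control_2255 == "open":
        return "healthy"
    if ftp_21 == "open":
        # The device and its TCP stack are alive, but nothing accepts on 2255.
        # This is the signature observed in section 6.3.
        return "wedged"
    # Neither port answers: power, cabling, or addressing, not the wedge.
    return "unreachable"
```

Run from the LAN side, a `"wedged"` result reproduces the finding above without any NAT in the path.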
 ---
 ## 7) What we learned (final conclusions)
 ### 7.1) RX55 innocence (for this failure mode)
 The RX55 is not “randomly rejecting” or “breaking TCP” in the way originally feared.
 It successfully forwards and supports TCP to the NL-43 on port 21, and the LAN-side test proves the 2255 failure exists *even without NAT/WAN involvement*.
 ### 7.2) NL-43 control listener failure
 The NL-43’s TCP control service (port 2255) stops listening while:
 - the device remains alive
 - the LAN stack remains alive (ping)
 - FTP remains alive (port 21)
 This looks like one of:
 - control daemon crash/exit
 - service unbind
 - stuck service state (e.g., “busy” / “session active forever”)
 - resource leak (sockets/file descriptors) specific to the control service
 - firmware service manager bug (start/stop of services fails after certain sequences)
 ---
 ## 8) Additional constraint discovered: “Web App mode” conflicts
 You noted an important operational constraint:
 > Turning on the web app disables other interfaces like TCP and FTP.
 Meaning the NL-43 appears to have mutually exclusive service/mode behavior (or at least serious conflicts). That matters because:
 - If any workflow toggles modes (explicitly or implicitly), it could destabilize the service lifecycle.
 - It reduces the possibility of using “web UI toggle” as an easy remote recovery mechanism **if** it disables the services needed.
 We have not yet run a controlled long test to determine whether:
 - mode switching contributes directly to the 2255 listener dying, OR
 - it happens even in a pure TCP-only mode with no switching.
 ---
 ## 9) Immediate operational decision (field tomorrow)
 Because the device is needed in the field immediately, you chose:
 - **Old-school manual deployment**
 - **Manual SD card downloads**
 - Avoid reliance on 2255/TCP control and remote workflows for now.
 **Important operational note:**
 The 2255 listener dying does not necessarily stop the NL-43 from measuring; it primarily breaks remote control/polling. Manual SD workflow sidesteps the entire remote control dependency.
 ---
 ## 10) What’s next (future work — when the unit is back)
 Because long tests can’t be run before tomorrow, the plan is to resume in a few weeks with controlled experiments designed to isolate the trigger and develop an operational mitigation.
 ### 10.1) Controlled experiment matrix (recommended)
 Run each test for 24–72 hours, or until wedge occurs, and record:
 - number of TCP connects
 - whether connections are persistent
 - whether FTP is used
 - whether any mode toggling is performed
 - time-to-wedge
 #### Test A — TCP-only (ideal baseline)
 - TCP control only (2255)
 - **True persistent connection** (open once, keep forever)
 - No FTP
 - No web mode toggling
 Outcome interpretation:
 - If stable: connection churn and/or FTP/mode switching is the trigger.
 - If it wedges anyway: pure 2255 daemon leak/bug.
 #### Test B — TCP with connection churn
 - Same as A but intentionally reconnect on a schedule (current SLMM behavior)
 - No FTP
 Outcome:
 - If this wedges but A doesn’t: churn is the trigger.
 #### Test C — FTP activity + TCP
 - Introduce scheduled FTP sessions (downloads) while using TCP control
 - Observe whether wedge correlates with FTP use or with post-download periods.
 Outcome:
 - If wedge correlates with FTP, suspect internal service lifecycle conflict.
 #### Test D — Web mode interaction (only if safe/possible)
 - Evaluate what toggling web mode does to TCP/FTP services.
 - Determine if any remote-safe “soft reset” exists.
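One way to keep the matrix results comparable is to log each run in a fixed record and apply the outcome rules mechanically. The sketch below is an assumption about how that could look: `RunRecord`, its field names, and `interpret` are hypothetical, and only the Test A and Test B rules are encoded because Tests C and D need correlation data that a single flag cannot capture.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RunRecord:
    """One row of the experiment matrix (field names are illustrative)."""
    test: str                              # "A", "B", "C", or "D"
    tcp_connects: int                      # number of TCP connects during the run
    persistent: bool                       # was one connection held open throughout?
    ftp_used: bool
    mode_toggled: bool
    hours_to_wedge: Optional[float] = None  # None means no wedge during the run

    @property
    def wedged(self) -> bool:
        return self.hours_to_wedge is not None


def interpret(runs: List[RunRecord]) -> List[str]:
    """Apply the Test A / Test B outcome rules from the matrix above."""
    by_test = {r.test: r for r in runs}
    a, b = by_test.get("A"), by_test.get("B")
    notes: List[str] = []
    if a is not None:
        if a.wedged:
            notes.append("A wedged: points to a pure 2255 daemon leak/bug")
        else:
            notes.append("A stable: trigger is churn and/or FTP/mode switching")
    if a is not None and b is not None and b.wedged and not a.wedged:
        notes.append("B wedged but A did not: churn is the trigger")
    return notes
```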
 ---
 ## 11) Mitigation options (ranked)
 ### Option 1 — Make SLMM truly persistent (highest probability of success)
 If the NL-43 wedges due to session churn or leaked socket states, the best mitigation is:
 - Open one TCP socket per device
 - Keep it open indefinitely
 - Use OS keepalive
 - Do **not** rotate connections on timers
 - Reconnect only when the socket actually dies
 This reduces:
 - connect/close cycles
 - NAT edge-case exposure
 - resource churn inside NL-43
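A minimal sketch of the "open once, enable OS keepalive" idea using plain blocking sockets. This is not SLMM's connection pool, which may be built differently; the function name and the keepalive timings (60 s idle, 15 s interval, 4 probes) are assumed values for illustration. The `TCP_KEEP*` options are applied only where the platform exposes them.

```python
import socket


def open_persistent(host: str, port: int, idle_s: int = 60,
                    interval_s: int = 15, probes: int = 4,
                    timeout: float = 5.0) -> socket.socket:
    """Open one TCP connection with OS keepalive enabled (illustrative only)."""
    sock = socket.create_connection((host, port), timeout=timeout)
    # Ask the kernel to probe an idle connection so a dead peer is noticed
    # without closing and reopening the socket on a timer.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Fine-grained timing knobs exist on Linux; skip any the platform lacks.
    for name, value in (("TCP_KEEPIDLE", idle_s),
                        ("TCP_KEEPINTVL", interval_s),
                        ("TCP_KEEPCNT", probes)):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
    return sock
```

The caller would keep the returned socket for the life of the process and reconnect only after a send or receive actually fails, which is the behavior Option 1 describes.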
 ### Option 2 — Service “soft reset” (if possible without disabling required services)
 If there exists any way to restart the 2255 service without power cycling:
 - LAN TCP toggle (if it doesn’t require web mode)
 - any “restart comms” command (unknown)
 - any maintenance menu sequence
 then SLMM could:
 - detect wedge
 - trigger soft reset
 - recover automatically
 Current constraint: web app mode appears to disable other services, so this may not be viable.
 ### Option 3 — Hardware watchdog power cycle (industrial but reliable)
 If this is a firmware bug with no clean workaround:
 - Add a remotely controlled relay/power switch
 - On wedge detection, power-cycle NL-43 automatically
 - Optionally schedule a nightly power cycle to prevent leak accumulation
 This is “field reality” and often the only long-term move with embedded devices.
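The "on wedge detection, power-cycle" step needs some guard against cycling the unit on a single bad poll or in a tight loop. The sketch below shows one possible policy; the class name, the three-poll confirmation, and the one-hour minimum gap are all assumptions, and the relay call itself is left out because no relay hardware or API has been chosen.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WatchdogPolicy:
    """Decide when a wedge observation should trigger a power cycle.

    Hypothetical policy: thresholds are assumed values, not measured ones.
    """
    wedge_polls_required: int = 3      # consecutive wedged polls before acting
    min_cycle_gap_s: float = 3600.0    # never power-cycle more often than this
    _consecutive: int = 0
    _last_cycle: Optional[float] = None

    def observe(self, wedged: bool, now: float) -> bool:
        """Feed one poll result; return True if a power cycle should fire now."""
        if not wedged:
            self._consecutive = 0
            return False
        self._consecutive += 1
        if self._consecutive < self.wedge_polls_required:
            return False
        if self._last_cycle is not None and now - self._last_cycle < self.min_cycle_gap_s:
            return False  # too soon after the previous cycle; keep waiting
        self._last_cycle = now
        self._consecutive = 0
        return True
```

`now` is passed in rather than read from the clock so the policy can be exercised with synthetic timestamps before it is wired to real hardware.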
 ### Option 4 — Vendor escalation (Rion)
 You now have excellent evidence:
 - LAN-side proof: 2255 dead while 21 alive
 - WAN packet evidence
 - clear isolation of RX55 innocence
 This is strong enough to send to Rion support as a firmware defect report.
 ---
 ## 12) Repro “wedge bundle” checklist (for future captures)
 When the wedge happens again, capture these before power cycling:
 1) From server:
 - `nc -vz 63.45.161.30 2255` (expect refused)
 - `nc -vz 63.45.161.30 21`   (expect success if FTP alive)
 2) From LAN side (via switch/laptop):
 - `Test-NetConnection 192.168.1.10 -Port 2255`
 - `Test-NetConnection 192.168.1.10 -Port 21`
 3) Optional: packet capture around the refused attempt.
 4) Record:
 - last successful poll timestamp
 - last FTP session timestamp
 - any scheduled start/stop/download cycles near wedge time
 - SLMM connection reuse/rotation settings in effect
 ---
 ## 13) Final, current-state summary (as of 2026-02-18)
 - The issue is **NOT** the RX55 rejecting inbound connections.
 - The NL-43 is **alive**, reachable on LAN, and FTP works.
 - The NL-43’s **TCP control listener on 2255 stops listening** while the device remains otherwise healthy.
 - The wedge can occur hours after successful operations.
 - The unit is needed in the field immediately, so investigation pauses.
 - Next phase: controlled tests to isolate trigger + implement mitigation (persistent socket or watchdog reset).
 ---
 ## 14) Notes / misc observations
 - The Wireshark trace showed repeated FTP sessions were opened and closed cleanly, but SLMM’s “FTP requests” were not valid FTP (causing `530 Not logged in`). That was part of experimentation, not a normal workflow.
 - UDP “success” via netcat is not meaningful because UDP has no handshake; it simply indicates no ICMP unreachable was returned.
 ---
 **End of document.**
@@ -0,0 +1,322 @@
 """
 Threshold alert engine.
 Each unit can have any number of AlertRules. A rule is evaluated against the
 unit's live monitor snapshots via a small per-(unit, rule) state machine:
    IDLE  --(metric exceeds threshold for duration_s)-->  ACTIVE   (fire ONSET)
    ACTIVE --(metric recovers past hysteresis for duration_s)--> IDLE (fire CLEAR)
 duration_s debounces both edges; clear_margin_db adds hysteresis so a level
 hovering at the threshold doesn't flap. Onset and clear are distinct events.
 The state-machine logic (`_evaluate_step`) is intentionally pure — no DB, no
 real clock — so it can be unit-tested with a synthetic level series and a fake
 clock. The AlertEvaluator wraps it with rule loading, scheduling, persistence,
 and dispatch. Dispatch is a server log for now (POC); the seam to POST events to
 a Terra-View webhook (email/SMS) is _dispatch_raw(), which _dispatch() wraps.
 """
 import asyncio
 import logging
 import os
 from dataclasses import dataclass
 from datetime import datetime, timedelta
 from typing import Dict, List, Optional, Tuple
 logger = logging.getLogger(__name__)
 # Local timezone offset for schedule windows (same env var services.py uses).
 _TZ_OFFSET_HOURS = float(os.getenv("TIMEZONE_OFFSET", "-5"))
 # How long to cache a unit's rules before re-querying the DB (rules change rarely).
 _RULE_CACHE_TTL_S = 15.0
@dataclass
 class RuleState:
    """In-memory runtime state for one (unit, rule)."""
    phase: str = "idle"                 # "idle" | "active"
    edge_since: Optional[float] = None  # when the current edge condition began (clock time)
    peak: float = 0.0
    event_id: Optional[int] = None      # the open AlertEvent row (for the clear update)
    last_onset: Optional[float] = None  # time of the last onset (for cooldown)
 def _exceeds(value: float, rule) -> bool:
    if rule.comparison == "below":
        return value < rule.threshold_db
    return value > rule.threshold_db
 def _recovered(value: float, rule) -> bool:
    margin = rule.clear_margin_db or 0.0
    if rule.comparison == "below":
        return value > rule.threshold_db + margin
    return value < rule.threshold_db - margin
 def _evaluate_step(state: RuleState, value: float, now: float, rule) -> Optional[str]:
    """Advance the state machine by one reading.
    Pure: mutates `state`, returns 'onset' | 'clear' | None. `now` is injected so
    tests can drive a fake clock.
    """
    duration = rule.duration_s or 0
    if state.phase == "idle":
        if _exceeds(value, rule):
            if state.edge_since is None:
                state.edge_since = now
            if now - state.edge_since >= duration:
                # Cooldown: suppress a new onset within cooldown_s of the last one
                # (stops a repeatedly-breaching signal from flooding the history).
                # Hold edge_since so it fires the moment cooldown lapses if still
                # breaching — don't reset it here.
                cooldown = getattr(rule, "cooldown_s", 0) or 0
                if state.last_onset is not None and (now - state.last_onset) < cooldown:
                    return None
                state.phase = "active"
                state.edge_since = None
                state.peak = value
                state.last_onset = now
                return "onset"
        else:
            state.edge_since = None
        return None
    # active
    if rule.comparison == "below":
        state.peak = min(state.peak, value)
    else:
        state.peak = max(state.peak, value)
    if _recovered(value, rule):
        if state.edge_since is None:
            state.edge_since = now
        if now - state.edge_since >= duration:
            state.phase = "idle"
            state.edge_since = None
            return "clear"
    else:
        state.edge_since = None
    return None
 def _in_window(now_minutes: int, start: str, end: str) -> bool:
    """Is now_minutes (minutes since local midnight) within [start, end)?
    Handles wraparound windows like 22:00–07:00."""
    def _m(s: str) -> int:
        h, m = s.split(":")
        return int(h) * 60 + int(m)
    s, e = _m(start), _m(end)
    if s == e:
        return True
    if s < e:
        return s <= now_minutes < e
    return now_minutes >= s or now_minutes < e  # wraparound
 class AlertEvaluator:
    def __init__(self):
        self._states: Dict[Tuple[str, int], RuleState] = {}
        self._rule_cache: Dict[str, Tuple[float, list]] = {}  # unit_id -> (fetched_at, rules)
        self._offline_events: Dict[str, int] = {}  # unit_id -> open connectivity AlertEvent id
        logger.info("[ALERT] rule-based evaluator ready")
    async def evaluate(self, unit_id: str, snap) -> None:
        """Evaluate every enabled rule for this unit against one snapshot."""
        rules = self._get_rules(unit_id)
        if not rules:
            return
        now = asyncio.get_running_loop().time()
        for rule in rules:
            if not self._in_schedule(rule):
                continue
            raw = getattr(snap, rule.metric, None)
            try:
                value = float(raw)
            except (TypeError, ValueError):
                continue  # missing / non-numeric ("-.-")
            state = self._states.setdefault((unit_id, rule.id), RuleState())
            action = _evaluate_step(state, value, now, rule)
            if action == "onset":
                await self._on_onset(unit_id, rule, value, state)
            elif action == "clear":
                await self._on_clear(unit_id, rule, value, state)
    # -- rule loading (cached) ----------------------------------------------
    def _get_rules(self, unit_id: str) -> list:
        loop_now = asyncio.get_running_loop().time()
        cached = self._rule_cache.get(unit_id)
        if cached and loop_now - cached[0] < _RULE_CACHE_TTL_S:
            return cached[1]
        rules = self._load_rules(unit_id)
        self._rule_cache[unit_id] = (loop_now, rules)
        return rules
    def _load_rules(self, unit_id: str) -> list:
        from app.database import SessionLocal
        from app.models import AlertRule
        db = SessionLocal()
        try:
            return db.query(AlertRule).filter_by(unit_id=unit_id, enabled=True).all()
        except Exception as e:
            logger.warning(f"[ALERT] failed to load rules for {unit_id}: {e}")
            return []
        finally:
            db.close()
    def invalidate(self, unit_id: Optional[str] = None) -> None:
        """Drop cached rules so a change is picked up immediately."""
        if unit_id is None:
            self._rule_cache.clear()
        else:
            self._rule_cache.pop(unit_id, None)
    def forget_rule(self, unit_id: str, rule_id: int) -> None:
        """Drop a rule's per-(unit, rule) state machine after the rule is edited or
        deleted, so a stale 'active' phase / open event_id from the old config
        doesn't bleed into the new one (mis-firing a clear or suppressing an onset)."""
        self._states.pop((unit_id, rule_id), None)
    # -- scheduling ----------------------------------------------------------
    def _in_schedule(self, rule) -> bool:
        if not rule.schedule_start or not rule.schedule_end:
            # No time window configured: only the day-of-week filter applies.
            return self._day_ok(rule)
        local = datetime.utcnow() + timedelta(hours=_TZ_OFFSET_HOURS)
        if not self._day_ok(rule, local):
            return False
        return _in_window(local.hour * 60 + local.minute, rule.schedule_start, rule.schedule_end)
    @staticmethod
    def _day_ok(rule, local: Optional[datetime] = None) -> bool:
        if not rule.schedule_days:
            return True
        if local is None:
            local = datetime.utcnow() + timedelta(hours=_TZ_OFFSET_HOURS)
        allowed = {int(d) for d in str(rule.schedule_days).split(",") if d.strip() != ""}
        return local.weekday() in allowed  # Mon=0
    # -- event persistence + dispatch ---------------------------------------
    async def _on_onset(self, unit_id: str, rule, value: float, state: RuleState) -> None:
        from app.database import SessionLocal
        from app.models import AlertEvent
        db = SessionLocal()
        try:
            evt = AlertEvent(
                rule_id=rule.id, unit_id=unit_id, rule_name=rule.name,
                metric=rule.metric, threshold_db=rule.threshold_db,
                onset_value=value, peak_value=value, status="active",
            )
            db.add(evt)
            db.commit()
            db.refresh(evt)
            state.event_id = evt.id
        except Exception as e:
            logger.warning(f"[ALERT] failed to record onset for {unit_id}: {e}")
        finally:
            db.close()
        await self._dispatch(
            "ONSET", unit_id, rule,
            f"{rule.metric.upper()}={value:.1f} dB "
            f"{'<' if rule.comparison == 'below' else '>'} {rule.threshold_db:.1f} dB"
            f"{f' for {rule.duration_s}s' if rule.duration_s else ''}",
        )
    async def _on_clear(self, unit_id: str, rule, value: float, state: RuleState) -> None:
        peak = state.peak
        from app.database import SessionLocal
        from app.models import AlertEvent
        db = SessionLocal()
        try:
            if state.event_id is not None:
                evt = db.query(AlertEvent).filter_by(id=state.event_id).first()
                if evt:
                    evt.clear_at = datetime.utcnow()
                    evt.peak_value = peak
                    evt.status = "cleared"
                    db.commit()
        except Exception as e:
            logger.warning(f"[ALERT] failed to record clear for {unit_id}: {e}")
        finally:
            db.close()
        state.event_id = None
        await self._dispatch(
            "CLEAR", unit_id, rule,
            f"recovered to {value:.1f} dB (peak {peak:.1f} dB)",
        )
    # -- connectivity (device offline/online) -------------------------------
    #
    # Raised by the live monitor when it loses / regains contact with a device.
    # Persisted as an AlertEvent (sentinel rule_id=0, metric="connectivity") so it
    # lands in the same events/inbox/ack pipeline as threshold alerts. The in-memory
    # map dedupes; the DB query also dedupes across a process restart.
    async def device_offline(self, unit_id: str) -> None:
        if unit_id in self._offline_events:
            return  # already flagged offline
        from app.database import SessionLocal
        from app.models import AlertEvent
        db = SessionLocal()
        try:
            existing = db.query(AlertEvent).filter_by(
                unit_id=unit_id, metric="connectivity", status="active").first()
            if existing:  # already open in the DB (e.g. carried across a restart)
                self._offline_events[unit_id] = existing.id
                return
            evt = AlertEvent(
                rule_id=0, unit_id=unit_id, rule_name="Device unreachable",
                metric="connectivity", threshold_db=0.0, status="active",
            )
            db.add(evt)
            db.commit()
            db.refresh(evt)
            self._offline_events[unit_id] = evt.id
        except Exception as e:
            logger.warning(f"[ALERT] failed to record offline for {unit_id}: {e}")
        finally:
            db.close()
        await self._dispatch_raw("OFFLINE", unit_id, "Device unreachable",
                                 "live monitor lost contact with the device")
    async def device_online(self, unit_id: str) -> None:
        self._offline_events.pop(unit_id, None)
        from app.database import SessionLocal
        from app.models import AlertEvent
        db = SessionLocal()
        cleared = 0
        try:
            opened = db.query(AlertEvent).filter_by(
                unit_id=unit_id, metric="connectivity", status="active").all()
            for evt in opened:
                evt.clear_at = datetime.utcnow()
                evt.status = "cleared"
                cleared += 1
            if cleared:
                db.commit()
        except Exception as e:
            logger.warning(f"[ALERT] failed to record online for {unit_id}: {e}")
        finally:
            db.close()
        if cleared:  # only announce recovery if it was actually flagged offline
            await self._dispatch_raw("ONLINE", unit_id, "Device recovered",
                                     "live monitor regained contact with the device")
    # -- dispatch ------------------------------------------------------------
    async def _dispatch(self, kind: str, unit_id: str, rule, detail: str) -> None:
        await self._dispatch_raw(kind, unit_id, rule.name, detail)
    async def _dispatch_raw(self, kind: str, unit_id: str, name: str, detail: str) -> None:
        """POC dispatch: server log. Swap in a Terra-View webhook (email/SMS) here."""
        logger.warning(f"[ALERT:{kind}] {unit_id} '{name}': {detail}")
 # Module-level singleton (the monitor calls alert_evaluator.evaluate per snapshot)
 alert_evaluator = AlertEvaluator()
@@ -0,0 +1,411 @@
 """
 Background polling service for NL43 devices.
 This module provides continuous, automatic polling of configured NL43 devices
 at configurable intervals. Status snapshots are persisted to the database
 for fast API access without querying devices on every request.
 """
 import asyncio
 import logging
 import os
 from datetime import datetime, timedelta
 from typing import Optional
 from sqlalchemy.orm import Session
 from app.database import SessionLocal
 from app.models import NL43Config, NL43Status
 from app.services import NL43Client, persist_snapshot, sync_measurement_start_time_from_ftp
 from app.device_logger import log_device_event, cleanup_old_logs
 logger = logging.getLogger(__name__)
 # Global polling default. Set SLMM_POLLING_ENABLED=false to start an instance in
 # standby (running but not polling and not holding device connections) — e.g. a
 # dev box that must not latch onto a device that a prod instance owns.
 POLLING_ENABLED_DEFAULT = os.getenv("SLMM_POLLING_ENABLED", "true").lower() == "true"
 class BackgroundPoller:
    """
    Background task that continuously polls NL43 devices and updates status cache.
    Features:
    - Per-device configurable poll intervals (30 seconds to 6 hours)
    - Automatic offline detection (marks unreachable after 3 consecutive failures)
    - Dynamic sleep intervals based on device configurations
    - Graceful shutdown on application stop
    - Respects existing rate limiting (1-second minimum between commands)
    """
    def __init__(self):
        self._task: Optional[asyncio.Task] = None
        self._running = False
        self._logger = logger
        self._last_cleanup = None  # Track last log cleanup time
        self._last_pool_log = None  # Track last connection pool heartbeat log
        self._active = POLLING_ENABLED_DEFAULT  # Global polling on/off (standby toggle)
    async def start(self):
        """Start the background polling task."""
        if self._running:
            self._logger.warning("Background poller already running")
            return
        self._running = True
        self._task = asyncio.create_task(self._poll_loop())
        self._logger.info("Background poller task created")
    async def stop(self):
        """Gracefully stop the background polling task."""
        if not self._running:
            return
        self._logger.info("Stopping background poller...")
        self._running = False
        if self._task:
            try:
                await asyncio.wait_for(self._task, timeout=5.0)
            except asyncio.TimeoutError:
                self._logger.warning("Background poller task did not stop gracefully, cancelling...")
                self._task.cancel()
                try:
                    await self._task
                except asyncio.CancelledError:
                    pass
        self._logger.info("Background poller stopped")
    def is_active(self) -> bool:
        """Whether background polling is currently active (vs standby)."""
        return self._active
    async def set_active(self, active: bool):
        """Globally enable/disable polling at runtime.
        When deactivated, the loop stays alive but polls nothing and releases all
        device connections, so this SLMM instance stops occupying the devices'
        single connection slots (e.g. so a prod instance can take over). Runtime
        state only — on restart the instance returns to SLMM_POLLING_ENABLED.
        """
        self._active = active
        if active:
            self._logger.info("[SYSTEM] Background polling ACTIVATED")
        else:
            self._logger.info("[SYSTEM] Background polling DEACTIVATED (standby) — releasing connections")
            await self._release_all_connections()
    async def _release_all_connections(self):
        """Gracefully close every pooled device connection (no-op if none)."""
        from app.services import _connection_pool
        for device_key in list(_connection_pool.get_stats().get("connections", {})):
            await _connection_pool.discard(device_key)
    async def _poll_loop(self):
        """Main polling loop that runs continuously."""
        self._logger.info("Background polling loop started")
        while self._running:
            if self._active:
                try:
                    await self._poll_all_devices()
                except Exception as e:
                    self._logger.error(f"Error in poll loop: {e}", exc_info=True)
            else:
                # Standby: poll nothing and hold no device connection slots, so
                # another SLMM instance (e.g. prod) can talk to the devices.
                try:
                    await self._release_all_connections()
                except Exception as e:
                    self._logger.warning(f"Standby connection release failed: {e}")
            # Run log cleanup once per hour
            try:
                now = datetime.utcnow()
                if self._last_cleanup is None or (now - self._last_cleanup).total_seconds() > 3600:
                    cleanup_old_logs()
                    self._last_cleanup = now
            except Exception as e:
                self._logger.warning(f"Log cleanup failed: {e}")
            # Log connection pool status every 15 minutes
            try:
                now = datetime.utcnow()
                if self._last_pool_log is None or (now - self._last_pool_log).total_seconds() > 900:
                    from app.services import _connection_pool
                    stats = _connection_pool.get_stats()
                    conns = stats.get("connections", {})
                    if conns:
                        for key, c in conns.items():
                            self._logger.info(
                                f"[POOL] {key} — age={c['age_seconds']}s idle={c['idle_seconds']}s alive={c['alive']}"
                            )
                    else:
                        self._logger.info("[POOL] No active connections in pool")
                    self._last_pool_log = now
            except Exception as e:
                self._logger.warning(f"Pool status log failed: {e}")
            # Calculate dynamic sleep interval
            sleep_time = self._calculate_sleep_interval()
            self._logger.debug(f"Sleeping for {sleep_time} seconds until next poll cycle")
            # Sleep in small intervals to allow graceful shutdown
            for _ in range(int(sleep_time)):
                if not self._running:
                    break
                await asyncio.sleep(1)
        self._logger.info("Background polling loop exited")
    async def _poll_all_devices(self):
        """Poll all configured devices that are due for polling."""
        db: Session = SessionLocal()
        try:
            # Get all devices with TCP and polling enabled
            configs = db.query(NL43Config).filter_by(
                tcp_enabled=True,
                poll_enabled=True
            ).all()
            if not configs:
                self._logger.debug("No devices configured for polling")
                return
            self._logger.debug(f"Checking {len(configs)} devices for polling")
            now = datetime.utcnow()
            polled_count = 0
            from app.monitor import monitor_manager
            for cfg in configs:
                if not self._running:
                    break
                # Skip units with an active live monitor: it polls them at ~1Hz and
                # keeps the status cache fresh, so a redundant background poll would just
                # add load/lock-contention on the device's single connection.
                if monitor_manager.is_active(cfg.unit_id):
                    self._logger.debug(f"Skipping {cfg.unit_id} — live monitor active")
                    continue
                # Get current status
                status = db.query(NL43Status).filter_by(unit_id=cfg.unit_id).first()
                # Check if device should be polled
                if self._should_poll(cfg, status, now):
                    await self._poll_device(cfg, db)
                    polled_count += 1
                else:
                    self._logger.debug(f"Skipping {cfg.unit_id} - interval not elapsed")
            if polled_count > 0:
                self._logger.info(f"Polled {polled_count}/{len(configs)} devices")
        finally:
            db.close()
    def _should_poll(self, cfg: NL43Config, status: Optional[NL43Status], now: datetime) -> bool:
        """
        Determine if a device should be polled based on interval and last poll time.
        Args:
            cfg: Device configuration
            status: Current device status (may be None if never polled)
            now: Current UTC timestamp
        Returns:
            True if device should be polled, False otherwise
        """
        # If never polled before, poll now
        if not status or not status.last_poll_attempt:
            self._logger.debug(f"Device {cfg.unit_id} never polled, polling now")
            return True
        # Calculate elapsed time since last poll attempt
        interval = cfg.poll_interval_seconds or 60
        elapsed = (now - status.last_poll_attempt).total_seconds()
        should_poll = elapsed >= interval
        if should_poll:
            self._logger.debug(
                f"Device {cfg.unit_id} due for polling: {elapsed:.1f}s elapsed, interval={interval}s"
            )
        return should_poll
    async def _poll_device(self, cfg: NL43Config, db: Session):
        """
        Poll a single device and update its status in the database.
        Args:
            cfg: Device configuration
            db: Database session
        """
        unit_id = cfg.unit_id
        self._logger.info(f"Polling device {unit_id} at {cfg.host}:{cfg.tcp_port}")
        # Get or create status record
        status = db.query(NL43Status).filter_by(unit_id=unit_id).first()
        if not status:
            status = NL43Status(unit_id=unit_id)
            db.add(status)
        # Update last_poll_attempt immediately
        status.last_poll_attempt = datetime.utcnow()
        db.commit()
        # Create client and attempt to poll
        client = NL43Client(
            cfg.host,
            cfg.tcp_port,
            timeout=5.0,
            ftp_username=cfg.ftp_username,
            ftp_password=cfg.ftp_password,
            ftp_port=cfg.ftp_port or 21
        )
        try:
            # Send DOD? command to get device status
            snap = await client.request_dod()
            snap.unit_id = unit_id
            # Success - persist snapshot and reset failure counter
            persist_snapshot(snap, db)
            status.is_reachable = True
            status.consecutive_failures = 0
            status.last_success = datetime.utcnow()
            status.last_error = None
            db.commit()
            self._logger.info(f"✓ Successfully polled {unit_id}")
            # Log to device log
            log_device_event(
                unit_id, "INFO", "POLL",
                f"Poll success: state={snap.measurement_state}, Leq={snap.leq}, Lp={snap.lp}",
                db
            )
            # Check if device is measuring but has no start time recorded
            # This happens if measurement was started before SLMM began polling
            # or after a service restart
            status = db.query(NL43Status).filter_by(unit_id=unit_id).first()
            # Reset the sync flag when measurement stops (so next measurement can sync)
            if status and status.measurement_state != "Start":
                if status.start_time_sync_attempted:
                    status.start_time_sync_attempted = False
                    db.commit()
                    self._logger.debug(f"Reset FTP sync flag for {unit_id} (measurement stopped)")
                    log_device_event(unit_id, "DEBUG", "STATE", "Measurement stopped, reset FTP sync flag", db)
            # Attempt FTP sync if:
            # - Device is measuring
            # - No start time recorded
            # - FTP sync not already attempted for this measurement
            # - FTP is configured
            if (status and
                status.measurement_state == "Start" and
                status.measurement_start_time is None and
                not status.start_time_sync_attempted and
                cfg.ftp_enabled and
                cfg.ftp_username and
                cfg.ftp_password):
                self._logger.info(
                    f"Device {unit_id} is measuring but has no start time - "
                    f"attempting FTP sync"
                )
                log_device_event(unit_id, "INFO", "SYNC", "Attempting FTP sync for measurement start time", db)
                # Mark that we attempted sync (prevents repeated attempts on failure)
                status.start_time_sync_attempted = True
                db.commit()
                try:
                    synced = await sync_measurement_start_time_from_ftp(
                        unit_id=unit_id,
                        host=cfg.host,
                        tcp_port=cfg.tcp_port,
                        ftp_port=cfg.ftp_port or 21,
                        ftp_username=cfg.ftp_username,
                        ftp_password=cfg.ftp_password,
                        db=db
                    )
                    if synced:
                        self._logger.info(f"✓ FTP sync succeeded for {unit_id}")
                        log_device_event(unit_id, "INFO", "SYNC", "FTP sync succeeded - measurement start time updated", db)
                    else:
                        self._logger.warning(f"FTP sync returned False for {unit_id}")
                        log_device_event(unit_id, "WARNING", "SYNC", "FTP sync returned False", db)
                except Exception as sync_err:
                    self._logger.warning(
                        f"FTP sync failed for {unit_id}: {sync_err}"
                    )
                    log_device_event(unit_id, "ERROR", "SYNC", f"FTP sync failed: {sync_err}", db)
        except Exception as e:
            # Failure - increment counter and potentially mark offline
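            # Treat a NULL counter (e.g. a row that predates this column) as zero
            status.consecutive_failures = (status.consecutive_failures or 0) + 1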
            error_msg = str(e)[:500]  # Truncate to prevent bloat
            status.last_error = error_msg
            # Mark unreachable after 3 consecutive failures
            if status.consecutive_failures >= 3:
                if status.is_reachable:  # Only log transition
                    self._logger.warning(
                        f"Device {unit_id} marked unreachable after {status.consecutive_failures} failures: {error_msg}"
                    )
                    log_device_event(unit_id, "ERROR", "POLL", f"Device marked UNREACHABLE after {status.consecutive_failures} failures: {error_msg}", db)
                status.is_reachable = False
            else:
                self._logger.warning(
                    f"Poll failed for {unit_id} (attempt {status.consecutive_failures}/3): {error_msg}"
                )
                log_device_event(unit_id, "WARNING", "POLL", f"Poll failed (attempt {status.consecutive_failures}/3): {error_msg}", db)
            db.commit()
    def _calculate_sleep_interval(self) -> int:
        """
        Calculate the next sleep interval based on all device poll intervals.
        Returns a dynamic sleep time that ensures responsive polling:
        - Minimum 30 seconds (prevents tight loops)
        - Maximum 300 seconds / 5 minutes (ensures reasonable responsiveness for long intervals)
        - Generally half the minimum device interval
        Returns:
            Sleep interval in seconds
        """
        db: Session = SessionLocal()
        try:
            configs = db.query(NL43Config).filter_by(
                tcp_enabled=True,
                poll_enabled=True
            ).all()
            if not configs:
                return 60  # Default sleep when no devices configured
            # Get all intervals
            intervals = [cfg.poll_interval_seconds or 60 for cfg in configs]
            min_interval = min(intervals)
            # Use half the minimum interval, but cap between 30-300 seconds
            # This allows longer sleep times when polling intervals are long (e.g., hourly)
            sleep_time = max(30, min(300, min_interval // 2))
            return sleep_time
        finally:
            db.close()
 # Global singleton instance
 poller = BackgroundPoller()
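The scheduling arithmetic in `_should_poll` and `_calculate_sleep_interval` can be checked in isolation. The sketch below restates both rules as pure functions with no database access; the names `calculate_sleep_interval` and `is_due` are illustrative and are not part of this PR.

```python
from typing import Iterable, Optional


def calculate_sleep_interval(intervals: Iterable[Optional[int]]) -> int:
    """Restates the clamp from _calculate_sleep_interval without the DB lookup.

    `intervals` stands in for the poll_interval_seconds of every enabled device;
    a missing value falls back to 60, as in the poller.
    """
    resolved = [i or 60 for i in intervals]
    if not resolved:
        return 60  # no devices configured
    # Half the shortest device interval, clamped to the 30-300 second range
    return max(30, min(300, min(resolved) // 2))


def is_due(elapsed_s: float, interval_s: Optional[int]) -> bool:
    """Restates the interval check from _should_poll."""
    return elapsed_s >= (interval_s or 60)


if __name__ == "__main__":
    for sample in ([], [10], [400], [3600], [None, 900]):
        print(sample, "->", calculate_sleep_interval(sample))
```

One consequence of the 30 second floor: a device configured with a 10 second `poll_interval_seconds` is still checked no more often than about every 30 seconds by this loop.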
@@ -0,0 +1,277 @@
 """
 Per-device logging system.
 Provides dual output: database entries for structured queries and file logs for backup.
 Each device gets its own log file in data/logs/{unit_id}.log with rotation.
 """
 import logging
 import os
 from datetime import datetime, timedelta
 from logging.handlers import RotatingFileHandler
 from pathlib import Path
 from typing import Optional
 from sqlalchemy.orm import Session
 from app.database import SessionLocal
 from app.models import DeviceLog
 # Configure base logger
 logger = logging.getLogger(__name__)
 # Log directory (persisted in Docker volume)
 LOG_DIR = Path(os.path.dirname(os.path.dirname(__file__))) / "data" / "logs"
 LOG_DIR.mkdir(parents=True, exist_ok=True)
 # Per-device file loggers (cached)
 _device_file_loggers: dict = {}
 # Log retention (days)
 LOG_RETENTION_DAYS = int(os.getenv("LOG_RETENTION_DAYS", "7"))
 def _get_file_logger(unit_id: str) -> logging.Logger:
    """Get or create a file logger for a specific device."""
    if unit_id in _device_file_loggers:
        return _device_file_loggers[unit_id]
    # Create device-specific logger
    device_logger = logging.getLogger(f"device.{unit_id}")
    device_logger.setLevel(logging.DEBUG)
    # Avoid duplicate handlers
    if not device_logger.handlers:
        # Create rotating file handler (5 MB max, keep 3 backups)
        log_file = LOG_DIR / f"{unit_id}.log"
        handler = RotatingFileHandler(
            log_file,
            maxBytes=5 * 1024 * 1024,  # 5 MB
            backupCount=3,
            encoding="utf-8"
        )
        handler.setLevel(logging.DEBUG)
        # Format: timestamp [LEVEL] [CATEGORY] message
        formatter = logging.Formatter(
            "%(asctime)s [%(levelname)s] [%(category)s] %(message)s",
            datefmt="%Y-%m-%d %H:%M:%S"
        )
        handler.setFormatter(formatter)
        device_logger.addHandler(handler)
        # Don't propagate to root logger
        device_logger.propagate = False
    _device_file_loggers[unit_id] = device_logger
    return device_logger
 def log_device_event(
    unit_id: str,
    level: str,
    category: str,
    message: str,
    db: Optional[Session] = None
 ):
    """
    Log an event for a specific device.
    Writes to both:
    1. Database (DeviceLog table) for structured queries
    2. File (data/logs/{unit_id}.log) for backup/debugging
    Args:
        unit_id: Device identifier
        level: Log level (DEBUG, INFO, WARNING, ERROR)
        category: Event category (TCP, FTP, POLL, COMMAND, STATE, SYNC)
        message: Log message
        db: Optional database session (creates one if not provided)
    """
    timestamp = datetime.utcnow()
    # Write to file log
    try:
        file_logger = _get_file_logger(unit_id)
        log_func = getattr(file_logger, level.lower(), file_logger.info)
        # Pass category as extra for formatter
        log_func(message, extra={"category": category})
    except Exception as e:
        logger.warning(f"Failed to write file log for {unit_id}: {e}")
    # Write to database
    close_db = False
    try:
        if db is None:
            db = SessionLocal()
            close_db = True
        log_entry = DeviceLog(
            unit_id=unit_id,
            timestamp=timestamp,
            level=level.upper(),
            category=category.upper(),
            message=message
        )
        db.add(log_entry)
        db.commit()
    except Exception as e:
        logger.warning(f"Failed to write DB log for {unit_id}: {e}")
        if db:
            db.rollback()
    finally:
        if close_db and db:
            db.close()
 def cleanup_old_logs(retention_days: Optional[int] = None, db: Optional[Session] = None):
    """
    Delete log entries older than retention period.
    Args:
        retention_days: Days to retain (default: LOG_RETENTION_DAYS env var or 7)
        db: Optional database session
    """
    if retention_days is None:
        retention_days = LOG_RETENTION_DAYS
    cutoff = datetime.utcnow() - timedelta(days=retention_days)
    close_db = False
    try:
        if db is None:
            db = SessionLocal()
            close_db = True
        deleted = db.query(DeviceLog).filter(DeviceLog.timestamp < cutoff).delete()
        db.commit()
        if deleted > 0:
            logger.info(f"Cleaned up {deleted} log entries older than {retention_days} days")
    except Exception as e:
        logger.error(f"Failed to cleanup old logs: {e}")
        if db:
            db.rollback()
    finally:
        if close_db and db:
            db.close()
 def get_device_logs(
    unit_id: str,
    limit: int = 100,
    offset: int = 0,
    level: Optional[str] = None,
    category: Optional[str] = None,
    since: Optional[datetime] = None,
    db: Optional[Session] = None
 ) -> list:
    """
    Query log entries for a specific device.
    Args:
        unit_id: Device identifier
        limit: Max entries to return (default: 100)
        offset: Number of entries to skip (default: 0)
        level: Filter by level (DEBUG, INFO, WARNING, ERROR)
        category: Filter by category (TCP, FTP, POLL, COMMAND, STATE, SYNC)
        since: Filter entries after this timestamp
        db: Optional database session
    Returns:
        List of log entries as dicts
    """
    close_db = False
    try:
        if db is None:
            db = SessionLocal()
            close_db = True
        query = db.query(DeviceLog).filter(DeviceLog.unit_id == unit_id)
        if level:
            query = query.filter(DeviceLog.level == level.upper())
        if category:
            query = query.filter(DeviceLog.category == category.upper())
        if since:
            query = query.filter(DeviceLog.timestamp >= since)
        # Order by newest first
        query = query.order_by(DeviceLog.timestamp.desc())
        # Apply pagination
        entries = query.offset(offset).limit(limit).all()
        return [
            {
                "id": e.id,
                "timestamp": e.timestamp.isoformat() + "Z",
                "level": e.level,
                "category": e.category,
                "message": e.message
            }
            for e in entries
        ]
    finally:
        if close_db and db:
            db.close()
 def get_log_stats(unit_id: str, db: Optional[Session] = None) -> dict:
    """
    Get log statistics for a device.
    Returns:
        Dict with counts by level and category
    """
    close_db = False
    try:
        if db is None:
            db = SessionLocal()
            close_db = True
        total = db.query(DeviceLog).filter(DeviceLog.unit_id == unit_id).count()
        # Count by level
        level_counts = {}
        for level in ["DEBUG", "INFO", "WARNING", "ERROR"]:
            count = db.query(DeviceLog).filter(
                DeviceLog.unit_id == unit_id,
                DeviceLog.level == level
            ).count()
            if count > 0:
                level_counts[level] = count
        # Count by category
        category_counts = {}
        for category in ["TCP", "FTP", "POLL", "COMMAND", "STATE", "SYNC", "GENERAL"]:
            count = db.query(DeviceLog).filter(
                DeviceLog.unit_id == unit_id,
                DeviceLog.category == category
            ).count()
            if count > 0:
                category_counts[category] = count
        # Get oldest and newest
        oldest = db.query(DeviceLog).filter(
            DeviceLog.unit_id == unit_id
        ).order_by(DeviceLog.timestamp.asc()).first()
        newest = db.query(DeviceLog).filter(
            DeviceLog.unit_id == unit_id
        ).order_by(DeviceLog.timestamp.desc()).first()
        return {
            "total": total,
            "by_level": level_counts,
            "by_category": category_counts,
            "oldest": oldest.timestamp.isoformat() + "Z" if oldest else None,
            "newest": newest.timestamp.isoformat() + "Z" if newest else None
        }
    finally:
        if close_db and db:
            db.close()
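The file formatter above references `%(category)s`, which is not a standard `LogRecord` field, so every call has to supply it through `extra`. The sketch below reproduces that pattern against an in-memory stream instead of a `RotatingFileHandler`; `make_logger` and `log_event` are illustrative names, not functions from this module.

```python
import io
import logging


def make_logger(name: str, stream: io.StringIO) -> logging.Logger:
    """Build a non-propagating logger whose format includes a custom field."""
    log = logging.getLogger(name)
    log.setLevel(logging.DEBUG)
    log.propagate = False
    if not log.handlers:  # avoid stacking duplicate handlers on repeat calls
        handler = logging.StreamHandler(stream)
        handler.setFormatter(
            logging.Formatter("[%(levelname)s] [%(category)s] %(message)s")
        )
        log.addHandler(handler)
    return log


def log_event(log: logging.Logger, level: str, category: str, message: str) -> None:
    """Dispatch by level name, falling back to info() for unknown levels."""
    log_func = getattr(log, level.lower(), log.info)
    # The formatter needs `category`, so it is passed on every call
    log_func(message, extra={"category": category})


if __name__ == "__main__":
    buf = io.StringIO()
    demo = make_logger("demo.device", buf)
    log_event(demo, "WARNING", "POLL", "Poll failed")
    print(buf.getvalue(), end="")
```

Because of the `getattr` fallback, an unrecognized level string is written at INFO in the file log rather than raising.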
@@ -1,5 +1,6 @@
 import os
 import logging
 from contextlib import asynccontextmanager
 from fastapi import FastAPI, Request
 from fastapi.middleware.cors import CORSMiddleware
 from fastapi.responses import HTMLResponse
@@ -7,6 +8,7 @@ from fastapi.templating import Jinja2Templates
 from app.database import Base, engine
 from app import routers
 from app.background_poller import poller
 # Configure logging
 logging.basicConfig(
@@ -23,10 +25,54 @@ logger = logging.getLogger(__name__)
 Base.metadata.create_all(bind=engine)
 logger.info("Database tables initialized")
@asynccontextmanager
 async def lifespan(app: FastAPI):
    """Manage application lifecycle - startup and shutdown events."""
    from app.services import _connection_pool
    # Startup
    logger.info("Starting TCP connection pool cleanup task...")
    _connection_pool.start_cleanup()
    logger.info("Starting background poller...")
    await poller.start()
    logger.info("Background poller started")
    # Auto-start keepalive live monitors for units configured for 24/7 monitoring
    # (monitor_enabled). This is what keeps alerting running unattended across
    # restarts — without it a feed only runs while someone has the live view open.
    try:
        from app.monitor import monitor_manager
        from app.database import SessionLocal
        from app.models import NL43Config
        db = SessionLocal()
        try:
            units = db.query(NL43Config).filter_by(monitor_enabled=True, tcp_enabled=True).all()
            for cfg in units:
                m = await monitor_manager.get(cfg.unit_id)
                await m.set_keepalive(True)
                logger.info(f"Auto-started keepalive monitor for {cfg.unit_id}")
        finally:
            db.close()
    except Exception as e:
        logger.error(f"Failed to auto-start monitors: {e}")
    yield  # Application runs
    # Shutdown
    logger.info("Stopping background poller...")
    await poller.stop()
    logger.info("Background poller stopped")
    logger.info("Closing TCP connection pool...")
    await _connection_pool.close_all()
    logger.info("TCP connection pool closed")
 app = FastAPI(
    title="SLMM NL43 Addon",
-    description="Standalone module for NL43 configuration and status APIs",
+    description="Standalone module for NL43 configuration and status APIs with background polling",
-    version="0.1.0",
+    version="0.4.0",
    lifespan=lifespan,
 )
 # CORS configuration - use environment variable for allowed origins
@@ -49,7 +95,12 @@ app.include_router(routers.router)
@app.get("/", response_class=HTMLResponse)
 def index(request: Request):
-    return templates.TemplateResponse("index.html", {"request": request})
+    return templates.TemplateResponse(request, "index.html")
@app.get("/roster", response_class=HTMLResponse)
 def roster(request: Request):
    return templates.TemplateResponse(request, "roster.html")
@app.get("/health")
@@ -60,10 +111,14 @@ async def health():
@app.get("/health/devices")
 async def health_devices():
-    """Enhanced health check that tests device connectivity."""
+    """Enhanced health check that tests device connectivity.
    Uses the connection pool to avoid unnecessary TCP handshakes — if a
    cached connection exists and is alive, the device is reachable.
    """
    from sqlalchemy.orm import Session
    from app.database import SessionLocal
-    from app.services import NL43Client
+    from app.services import _connection_pool
    from app.models import NL43Config
    db: Session = SessionLocal()
@@ -73,7 +128,7 @@ async def health_devices():
        configs = db.query(NL43Config).filter_by(tcp_enabled=True).all()
        for cfg in configs:
-            client = NL43Client(cfg.host, cfg.tcp_port, timeout=2.0, ftp_username=cfg.ftp_username, ftp_password=cfg.ftp_password)
+            device_key = f"{cfg.host}:{cfg.tcp_port}"
            status = {
                "unit_id": cfg.unit_id,
                "host": cfg.host,
@@ -83,14 +138,22 @@ async def health_devices():
            }
            try:
-                # Try to connect (don't send command to avoid rate limiting issues)
-                import asyncio
-                reader, writer = await asyncio.wait_for(
-                    asyncio.open_connection(cfg.host, cfg.tcp_port), timeout=2.0
-                )
-                writer.close()
-                await writer.wait_closed()
+                # Check if pool already has a live connection (zero-cost check)
+                pool_stats = _connection_pool.get_stats()
+                conn_info = pool_stats["connections"].get(device_key)
+                if conn_info and conn_info["alive"]:
+                    status["reachable"] = True
+                    status["source"] = "pool"
+                else:
+                    # No cached connection — do a lightweight acquire/release
+                    # This opens a connection if needed but keeps it in the pool
+                    import asyncio
+                    reader, writer, from_cache = await _connection_pool.acquire(
+                        device_key, cfg.host, cfg.tcp_port, timeout=2.0
+                    )
+                    await _connection_pool.release(device_key, reader, writer, cfg.host, cfg.tcp_port)
+                    status["reachable"] = True
+                    status["source"] = "cached" if from_cache else "new"
            except Exception as e:
                status["error"] = str(type(e).__name__)
                logger.warning(f"Device {cfg.unit_id} health check failed: {e}")
@@ -1,4 +1,4 @@
-from sqlalchemy import Column, String, DateTime, Boolean, Integer, Text, func
+from sqlalchemy import Column, String, DateTime, Boolean, Integer, Float, Text, func
 from app.database import Base
@@ -19,6 +19,14 @@ class NL43Config(Base):
    ftp_password = Column(String, nullable=True)  # FTP login password
    web_enabled = Column(Boolean, default=False)
    # Background polling configuration
    poll_interval_seconds = Column(Integer, nullable=True, default=60)  # Polling interval (10-3600 seconds)
    poll_enabled = Column(Boolean, default=True)  # Enable/disable background polling for this device
    # Live monitor (fan-out DOD feed). Keepalive runs it 24/7 even with no viewer,
    # which is what makes alerting continuous. On by default; toggleable from the UI.
    monitor_enabled = Column(Boolean, default=True)
 class NL43Status(Base):
    """
@@ -37,8 +45,107 @@ class NL43Status(Base):
    lmax = Column(String, nullable=True)  # Maximum level
    lmin = Column(String, nullable=True)  # Minimum level
    lpeak = Column(String, nullable=True)  # Peak level
    ln1 = Column(String, nullable=True)  # Percentile slot LN1 (configurable; device default L5, contract L1)
    ln2 = Column(String, nullable=True)  # Percentile slot LN2 (configurable; device default L10)
    battery_level = Column(String, nullable=True)
    power_source = Column(String, nullable=True)
    sd_remaining_mb = Column(String, nullable=True)
    sd_free_ratio = Column(String, nullable=True)
    raw_payload = Column(Text, nullable=True)
    # Background polling status
    is_reachable = Column(Boolean, default=True)  # Device reachability status
    consecutive_failures = Column(Integer, default=0)  # Count of consecutive poll failures
    last_poll_attempt = Column(DateTime, nullable=True)  # Last time background poller attempted to poll
    last_success = Column(DateTime, nullable=True)  # Last successful poll timestamp
    last_error = Column(Text, nullable=True)  # Last error message (truncated to 500 chars)
    # FTP start time sync tracking
    start_time_sync_attempted = Column(Boolean, default=False)  # True if FTP sync was attempted for current measurement
 class DeviceLog(Base):
    """
    Per-device log entries for debugging and audit trail.
    Stores events like commands, state changes, errors, and FTP operations.
    """
    __tablename__ = "device_logs"
    id = Column(Integer, primary_key=True, autoincrement=True)
    unit_id = Column(String, index=True, nullable=False)
    timestamp = Column(DateTime, default=func.now(), index=True)
    level = Column(String, default="INFO")  # DEBUG, INFO, WARNING, ERROR
    category = Column(String, default="GENERAL")  # TCP, FTP, POLL, COMMAND, STATE, SYNC
    message = Column(Text, nullable=False)
 class AlertRule(Base):
    """A threshold-alert rule evaluated against a unit's live monitor feed.
    Source-agnostic: today it runs over the DOD monitor; the same rule transfers
    unchanged if a unit's feed is later sourced from FTP intervals.
    """
    __tablename__ = "alert_rules"
    id = Column(Integer, primary_key=True, autoincrement=True)
    unit_id = Column(String, index=True, nullable=False)
    name = Column(String, nullable=False, default="Alert")
    metric = Column(String, nullable=False, default="lp")  # lp/leq/lmax/lmin/lpeak/ln1/ln2
    comparison = Column(String, nullable=False, default="above")  # above | below
    threshold_db = Column(Float, nullable=False)
    duration_s = Column(Integer, nullable=False, default=0)       # sustained seconds (0 = instant)
    clear_margin_db = Column(Float, nullable=False, default=2.0)  # hysteresis band
    cooldown_s = Column(Integer, nullable=False, default=300)     # min seconds between onsets
    # Optional time-of-day scoping (local time). schedule_start/end as "HH:MM";
    # null = always active. schedule_days = CSV of 0-6 (Mon=0); null = every day.
    schedule_start = Column(String, nullable=True)
    schedule_end = Column(String, nullable=True)
    schedule_days = Column(String, nullable=True)
    channels = Column(String, nullable=False, default="log")  # CSV: log,email,sms
    recipients = Column(Text, nullable=True)                  # CSV of emails/phones
    enabled = Column(Boolean, default=True)
    created_at = Column(DateTime, default=func.now())
 class AlertEvent(Base):
    """A fired alert (onset → clear), for history / inbox / acknowledgement."""
    __tablename__ = "alert_events"
    id = Column(Integer, primary_key=True, autoincrement=True)
    rule_id = Column(Integer, index=True, nullable=False)
    unit_id = Column(String, index=True, nullable=False)
    rule_name = Column(String, nullable=True)
    metric = Column(String, nullable=False)
    threshold_db = Column(Float, nullable=False)
    onset_at = Column(DateTime, default=func.now(), index=True)
    onset_value = Column(Float, nullable=True)
    peak_value = Column(Float, nullable=True)
    clear_at = Column(DateTime, nullable=True)
    status = Column(String, default="active")  # active | cleared
    acknowledged_at = Column(DateTime, nullable=True)
    acknowledged_by = Column(String, nullable=True)
    notes = Column(Text, nullable=True)
 class NL43Reading(Base):
    """Downsampled time-series of live-monitor readings, for the live-chart
    backfill (so a viewer sees recent trend on open, not a blank chart).
    Viewing only — NOT the report source. Reports use the device's authoritative
    FTP .rnd intervals. This is a short, capped trail (one row/minute, pruned to
    a retention window) fed by the monitor's keepalive poll loop.
    """
    __tablename__ = "nl43_readings"
    id = Column(Integer, primary_key=True, autoincrement=True)
    unit_id = Column(String, index=True, nullable=False)
    timestamp = Column(DateTime, default=func.now(), index=True)
    lp = Column(String, nullable=True)
    leq = Column(String, nullable=True)
    lmax = Column(String, nullable=True)
    ln1 = Column(String, nullable=True)
    ln2 = Column(String, nullable=True)
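The `AlertRule` columns (`threshold_db`, `duration_s`, `clear_margin_db`, `cooldown_s`) suggest an onset/clear state machine with hysteresis. The evaluator itself (`app.alerts`) is not part of this excerpt, so the sketch below is only one plausible reading of those fields, not the project's implementation. `RuleState`, `step`, and the exact boundary conditions (`>=` for a breach, clearing once the value retreats by the full margin) are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RuleState:
    """Hypothetical per-rule runtime state (not a model from this PR)."""
    active: bool = False
    breach_since: Optional[float] = None
    last_onset: Optional[float] = None


def step(state: RuleState, value: float, now: float, *, threshold_db: float,
         comparison: str = "above", duration_s: int = 0,
         clear_margin_db: float = 2.0, cooldown_s: int = 300) -> Optional[str]:
    """Feed one reading; return "onset", "clear", or None. `now` is in seconds."""
    sign = 1.0 if comparison == "above" else -1.0
    excess = sign * (value - threshold_db)  # >= 0 means the rule is breached

    if state.active:
        # Hysteresis: clear only once the value retreats past the margin
        if excess <= -clear_margin_db:
            state.active = False
            state.breach_since = None
            return "clear"
        return None

    if excess < 0:
        state.breach_since = None  # breach interrupted, restart the timer
        return None

    if state.breach_since is None:
        state.breach_since = now
    sustained = now - state.breach_since >= duration_s
    cooled = state.last_onset is None or now - state.last_onset >= cooldown_s
    if sustained and cooled:
        state.active = True
        state.last_onset = now
        return "onset"
    return None
```

With a 70 dB threshold and the default 2 dB margin, a reading of 69 dB keeps an active alert open, and only a reading at or below 68 dB clears it. That gap is what stops a level hovering near the threshold from generating a stream of events.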
@@ -0,0 +1,322 @@
 """
 Per-device live monitor (fan-out hub).
 ONE DOD poll loop per device, broadcast to many subscribers:
 - browser WebSocket clients (live view) — they no longer each open their own
  device stream, so the NL43's single-connection limit stops causing the
  "second viewer sees nothing" contention.
 - the alert evaluator (threshold alerts), which can keep a device's feed running
  even with no browser attached.
 - persistence (each snapshot is written to NL43Status, like the poller does).
 The device's one TCP connection is respected: every poll goes through the same
 per-device lock + connection pool in services.py, so the monitor, the background
 poller, and on-demand commands all serialize safely.
 """
 import asyncio
 import logging
 import os
 from datetime import datetime
 from typing import Dict, Optional, Set
 from app.database import SessionLocal
 from app.models import NL43Config, NL43Status
 from app.services import NL43Client, persist_snapshot
 from app.alerts import alert_evaluator
 logger = logging.getLogger(__name__)
 # Extra idle between DOD polls WHEN A BROWSER IS WATCHING. The 1s device rate-limit
 # already paces consecutive DOD? commands, so this just needs to be small — the
 # rate-limit is the real floor (~1.25s/poll effective).
 MONITOR_POLL_INTERVAL = float(os.getenv("MONITOR_POLL_INTERVAL", "0.25"))
 # Idle cadence when NO browser is subscribed and the feed is only kept alive for
 # alerting. Same data, ~8x fewer polls -> ~8x less cellular traffic on a metered
 # SIM (~1 GB/device/month at full rate -> ~125 MB). NOTE: this also sets the alert
 # sampling resolution when nobody is watching, so keep it <= the smallest alert
 # duration_s you rely on (default 10s comfortably catches a "sustained 30/60s" rule).
 MONITOR_IDLE_POLL_INTERVAL = float(os.getenv("MONITOR_IDLE_POLL_INTERVAL", "10"))
 # Exponential backoff once the device is unreachable, so a powered-off / asleep /
 # out-of-signal device stops churning reconnects every cycle (log spam + a trickle
 # of wasted cellular data on failed SYNs). delay = min(BASE * 2**(fails-1), MAX),
 # reset to full-rate on the first good poll. While a browser is actively watching we
 # cap the backoff lower (WATCHED_MAX) so a recovery surfaces quickly for the viewer.
 MONITOR_BACKOFF_BASE_S = float(os.getenv("MONITOR_BACKOFF_BASE_S", "1"))
 MONITOR_BACKOFF_MAX_S = float(os.getenv("MONITOR_BACKOFF_MAX_S", "60"))
 MONITOR_BACKOFF_WATCHED_MAX_S = float(os.getenv("MONITOR_BACKOFF_WATCHED_MAX_S", "5"))
 # How often to refresh the run state (Measure?). It changes rarely, so we cache it
 # and skip that second rate-limited command on most polls — roughly halving the
 # per-update latency (~2.5s -> ~1.3s).
 MONITOR_STATE_REFRESH_S = float(os.getenv("MONITOR_STATE_REFRESH_S", "30"))
 # Downsampled trail for the live-chart backfill: store one reading per
 # TRAIL_SAMPLE_S and keep TRAIL_RETENTION_HOURS of it (pruned). Viewing only —
 # reports use the device's FTP .rnd data, not this.
 TRAIL_SAMPLE_S = float(os.getenv("MONITOR_TRAIL_SAMPLE_S", "60"))
 TRAIL_RETENTION_HOURS = float(os.getenv("MONITOR_TRAIL_RETENTION_HOURS", "24"))
 # If nothing has been broadcast in this many seconds (e.g. device offline and
 # silent), send a keepalive frame so reverse proxies don't drop the idle WS.
 MONITOR_HEARTBEAT_S = float(os.getenv("MONITOR_HEARTBEAT_S", "25"))
 def _snapshot_payload(snap, unit_id: str, measurement_start_time) -> dict:
    """Build the broadcast payload — same shape as the DRD stream, but DOD-sourced
    so it carries ln1/ln2 (which DRD cannot)."""
    return {
        "unit_id": unit_id,
        "timestamp": datetime.utcnow().isoformat(),
        "measurement_state": snap.measurement_state,
        "measurement_start_time": measurement_start_time,
        "counter": snap.counter,
        "lp": snap.lp,
        "leq": snap.leq,
        "lmax": snap.lmax,
        "lmin": snap.lmin,
        "lpeak": snap.lpeak,
        "ln1": snap.ln1,
        "ln2": snap.ln2,
        "raw_payload": snap.raw_payload,
    }
 class DeviceMonitor:
    """Owns a single DOD poll loop for one device and fans each snapshot out to
    all subscribers. Runs while it has at least one browser subscriber OR the
    server-side keep-alive (alerting) flag is set."""
    def __init__(self, unit_id: str):
        self.unit_id = unit_id
        self._subscribers: Set[asyncio.Queue] = set()
        self._keepalive = False
        self._task: Optional[asyncio.Task] = None
        self._lock = asyncio.Lock()
        self._last_payload: Optional[dict] = None  # replayed to new subscribers
        self._consec_fail = 0
        self._reachable = True  # last broadcast reachability (for transition frames)
        self._cached_state: Optional[str] = None  # run state, refreshed periodically
        self._last_state_refresh = 0.0
        self._last_trail_store = 0.0  # downsample throttle for the backfill trail
    @property
    def running(self) -> bool:
        return self._task is not None and not self._task.done()
    def subscriber_count(self) -> int:
        return len(self._subscribers)
    def _has_demand(self) -> bool:
        return bool(self._subscribers) or self._keepalive
    def _ensure_task(self) -> None:
        if self._task is None or self._task.done():
            self._task = asyncio.create_task(self._run())
    async def subscribe(self) -> asyncio.Queue:
        q: asyncio.Queue = asyncio.Queue(maxsize=5)
        async with self._lock:
            self._subscribers.add(q)
            # Replay the last frame so a client connecting mid-stream sees data
            # (or the current 'unreachable' state) immediately, not after a poll.
            if self._last_payload is not None:
                try:
                    q.put_nowait(self._last_payload)
                except asyncio.QueueFull:
                    pass
            self._ensure_task()
        return q
    async def unsubscribe(self, q: asyncio.Queue) -> None:
        async with self._lock:
            self._subscribers.discard(q)
    async def set_keepalive(self, on: bool) -> None:
        async with self._lock:
            self._keepalive = on
            if on:
                self._ensure_task()
    async def _run(self) -> None:
        logger.info(f"[MONITOR] {self.unit_id}: feed started")
        loop = asyncio.get_running_loop()
        last_send = loop.time()
        try:
            while self._has_demand():
                snap, mst = await self._poll_once()
                if snap is not None:
                    if not self._reachable:
                        # Recovered from an outage — clear the connectivity alert.
                        try:
                            await alert_evaluator.device_online(self.unit_id)
                        except Exception as e:
                            logger.warning(f"[MONITOR] {self.unit_id}: online alert failed: {e}")
                    self._consec_fail = 0
                    self._reachable = True
                    payload = _snapshot_payload(snap, self.unit_id, mst)
                    payload["feed_status"] = "ok"
                    self._broadcast(payload)
                    last_send = loop.time()
                    try:
                        await alert_evaluator.evaluate(self.unit_id, snap)
                    except Exception as e:
                        logger.warning(f"[MONITOR] {self.unit_id}: alert eval failed: {e}")
                else:
                    # Tell clients the device went offline — once, on transition, after a
                    # few failures so a momentary blip doesn't flap the UI. Same edge
                    # raises the device-offline alert.
                    self._consec_fail += 1
                    if self._reachable and self._consec_fail >= 3:
                        self._reachable = False
                        self._broadcast({
                            "unit_id": self.unit_id,
                            "timestamp": datetime.utcnow().isoformat(),
                            "feed_status": "unreachable",
                        })
                        last_send = loop.time()
                        try:
                            await alert_evaluator.device_offline(self.unit_id)
                        except Exception as e:
                            logger.warning(f"[MONITOR] {self.unit_id}: offline alert failed: {e}")
                # Heartbeat: during quiet/offline stretches, send a keepalive so an
                # idle WS isn't dropped by a reverse proxy. Not cached (new subscribers
                # should still get the last real frame, not a heartbeat).
                if loop.time() - last_send >= MONITOR_HEARTBEAT_S:
                    self._broadcast({
                        "unit_id": self.unit_id,
                        "timestamp": datetime.utcnow().isoformat(),
                        "feed_status": "ok" if self._reachable else "unreachable",
                        "heartbeat": True,
                    }, cache=False)
                    last_send = loop.time()
                await asyncio.sleep(self._next_delay())
        finally:
            logger.info(f"[MONITOR] {self.unit_id}: feed stopped")
    def _next_delay(self) -> float:
        """Inter-poll delay: exponential backoff while unreachable, full-rate while a
        browser is watching, relaxed cadence when the feed is keepalive-only."""
        if self._consec_fail > 0:
            shift = min(self._consec_fail - 1, 6)  # cap growth at 2**6 = 64x base
            delay = min(MONITOR_BACKOFF_BASE_S * (2 ** shift), MONITOR_BACKOFF_MAX_S)
            if self._subscribers:
                delay = min(delay, MONITOR_BACKOFF_WATCHED_MAX_S)
            return delay
        if self._subscribers:
            return MONITOR_POLL_INTERVAL       # a browser is watching — smooth chart
        return MONITOR_IDLE_POLL_INTERVAL      # keepalive-only (alerting) — save data
    async def _poll_once(self):
        """One DOD poll: read, persist, return (snapshot, measurement_start_iso)."""
        db = SessionLocal()
        try:
            cfg = db.query(NL43Config).filter_by(unit_id=self.unit_id).first()
            if not cfg or not cfg.tcp_enabled:
                return None, None
            client = NL43Client(
                cfg.host, cfg.tcp_port,
                ftp_username=cfg.ftp_username, ftp_password=cfg.ftp_password,
                ftp_port=cfg.ftp_port or 21,
            )
            # Refresh the run state only every MONITOR_STATE_REFRESH_S; reuse the
            # cached state otherwise so most polls send just DOD? (one rate-limited
            # command) instead of DOD? + Measure?.
            now = asyncio.get_running_loop().time()
            refresh_state = (self._cached_state is None
                             or now - self._last_state_refresh >= MONITOR_STATE_REFRESH_S)
            snap = await client.request_dod(
                measurement_state=None if refresh_state else self._cached_state
            )
            if refresh_state:
                self._cached_state = snap.measurement_state
                self._last_state_refresh = now
            snap.unit_id = self.unit_id
            persist_snapshot(snap, db)
            db.commit()
            # Append to the downsampled backfill trail (~one row per TRAIL_SAMPLE_S).
            if now - self._last_trail_store >= TRAIL_SAMPLE_S:
                self._last_trail_store = now
                self._store_trail(snap, db)
            status = db.query(NL43Status).filter_by(unit_id=self.unit_id).first()
            mst = (status.measurement_start_time.isoformat()
                   if status and status.measurement_start_time else None)
            return snap, mst
        except Exception as e:
            logger.warning(f"[MONITOR] {self.unit_id}: poll failed: {e}")
            return None, None
        finally:
            db.close()
    def _store_trail(self, snap, db) -> None:
        """Append one downsampled reading to the backfill trail and prune old rows."""
        from datetime import datetime, timedelta
        from app.models import NL43Reading
        try:
            db.add(NL43Reading(
                unit_id=self.unit_id, timestamp=datetime.utcnow(),
                lp=snap.lp, leq=snap.leq, lmax=snap.lmax, ln1=snap.ln1, ln2=snap.ln2,
            ))
            cutoff = datetime.utcnow() - timedelta(hours=TRAIL_RETENTION_HOURS)
            db.query(NL43Reading).filter(
                NL43Reading.unit_id == self.unit_id,
                NL43Reading.timestamp < cutoff,
            ).delete()
            db.commit()
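        except Exception as e:
            # Roll back so a failed insert/prune does not leave the shared session
            # unusable for the NL43Status query that follows in _poll_once().
            db.rollback()
            logger.warning(f"[MONITOR] {self.unit_id}: trail store failed: {e}")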
    def _broadcast(self, payload: dict, cache: bool = True) -> None:
        if cache:
            self._last_payload = payload  # replayed to new subscribers
        for q in list(self._subscribers):
            try:
                q.put_nowait(payload)
            except asyncio.QueueFull:
                # Slow consumer — drop this frame rather than stall the whole feed.
                pass
 class MonitorManager:
    """Registry of per-device monitors (one per unit_id)."""
    def __init__(self):
        self._monitors: Dict[str, DeviceMonitor] = {}
        self._lock = asyncio.Lock()
    async def get(self, unit_id: str) -> DeviceMonitor:
        async with self._lock:
            m = self._monitors.get(unit_id)
            if m is None:
                m = DeviceMonitor(unit_id)
                self._monitors[unit_id] = m
            return m
    def is_active(self, unit_id: str) -> bool:
        """True if this unit has a running monitor feed (so the background poller
        can skip it — the monitor already polls it more often)."""
        m = self._monitors.get(unit_id)
        return m is not None and m.running
    def status(self) -> dict:
        return {
            uid: {
                "running": m.running,
                "subscribers": m.subscriber_count(),
                "keepalive": m._keepalive,
                "reachable": m._reachable,
                # what cadence the loop is currently using, for observability
                "mode": ("backoff" if m._consec_fail > 0
                         else "watched" if m._subscribers
                         else "idle"),
            }
            for uid, m in self._monitors.items()
        }
 # Module-level singleton
 monitor_manager = MonitorManager()
@@ -0,0 +1,67 @@
 # SLMM Archive
 This directory contains legacy scripts that are no longer needed for normal operation but are preserved for reference.
 ## Legacy Migrations (`legacy_migrations/`)
 These migration scripts were used during SLMM development (v0.1.x) to incrementally add database fields. They are **no longer needed** because:
 1. **Fresh databases** get the complete schema automatically from `app/models.py`
 2. **Existing databases** should already have these fields from previous runs
 3. **Current migration** is `migrate_add_polling_fields.py` (v0.2.0) in the parent directory
 ### Archived Migration Files
 - `migrate_add_counter.py` - Added `counter` field to NL43Status
 - `migrate_add_measurement_start_time.py` - Added `measurement_start_time` field
 - `migrate_add_ftp_port.py` - Added `ftp_port` field to NL43Config
 - `migrate_field_names.py` - Renamed fields for consistency (one-time fix)
 - `migrate_revert_field_names.py` - Rollback for the rename migration
 **Do not delete** - These provide historical context for database schema evolution.
 ---
 ## Legacy Tools
 ### `nl43_dod_poll.py`
 Manual polling script that queries a single NL-43 device for DOD (Device On-Demand) data.
 **Status**: Replaced by background polling system in v0.2.0
 **Why archived**:
 - Background poller (`app/background_poller.py`) now handles continuous polling automatically
 - No need for manual polling scripts
 - Kept for reference in case manual querying is needed for debugging
 **How to use** (if needed):
 ```bash
 cd /home/serversdown/tmi/slmm/archive
 python3 nl43_dod_poll.py <host> <port> <unit_id>
 ```
 ---
 ## Active Scripts (Still in Parent Directory)
 These scripts are **actively used** and documented in the main README:
 ### Migrations
 - `migrate_add_polling_fields.py` - **v0.2.0 migration** - Adds background polling fields
 - `migrate_add_ftp_credentials.py` - **Legacy FTP migration** - Adds FTP auth fields
 ### Testing
 - `test_polling.sh` - Comprehensive test suite for background polling features
 - `test_settings_endpoint.py` - Tests device settings API
 - `test_sleep_mode_auto_disable.py` - Tests automatic sleep mode handling
 ### Utilities
 - `set_ftp_credentials.py` - Command-line tool to set FTP credentials for a device
 ---
 ## Version History
 - **v0.2.0** (2026-01-15) - Background polling system added, manual polling scripts archived
 - **v0.1.0** (2025-12-XX) - Initial release with incremental migrations
@@ -483,7 +483,7 @@ POST /{unit_id}/ftp/enable
 ```
 Enables FTP server on the device.
-**Note:** FTP and TCP are mutually exclusive. Enabling FTP will temporarily disable TCP control.
+**Note:** ~~FTP and TCP are mutually exclusive. Enabling FTP will temporarily disable TCP control.~~ As of v0.2.0, FTP and TCP work in tandem; just avoid sending rapid bursts of requests to either.
 ### Disable FTP
 ```
@@ -0,0 +1,246 @@
 # SLMM Roster Management
 The SLMM standalone application now includes a roster management interface for viewing and configuring all Sound Level Meter devices.
 ## Features
 ### Web Interface
 Access the roster at: **http://localhost:8100/roster**
 The roster page provides:
 - **Device List Table**: View all configured SLMs with their connection details
 - **Real-time Status**: See device connectivity status (Online/Offline/Stale)
 - **Add Device**: Create new device configurations with a user-friendly modal form
 - **Edit Device**: Modify existing device configurations
 - **Delete Device**: Remove device configurations (does not affect physical devices)
 - **Test Connection**: Run diagnostics on individual devices
 ### Table Columns
 | Column | Description |
 |--------|-------------|
 | Unit ID | Unique identifier for the device |
 | Host / IP | Device IP address or hostname |
 | TCP Port | TCP control port (default: 2255) |
 | FTP Port | FTP file transfer port (default: 21) |
 | TCP | Whether TCP control is enabled |
 | FTP | Whether FTP file transfer is enabled |
 | Polling | Whether background polling is enabled |
 | Status | Device connectivity status (Online/Offline/Stale) |
 | Actions | Test, Edit, Delete buttons |
 ### Status Indicators
 - **Online** (green): Device responded within the last 5 minutes
 - **Stale** (yellow): Device hasn't responded recently but was seen before
 - **Offline** (red): Device is unreachable or has consecutive failures
 - **Unknown** (gray): No status data available yet
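
The status rules above can be sketched as a small classifier. This is a minimal illustration, not SLMM's actual implementation: the helper name `classify_status` and the choice to treat any consecutive failure as Offline are assumptions. Only the 5-minute Online window and the field names (`last_seen`, `is_reachable`, `consecutive_failures`) come from this document.

```python
from datetime import datetime, timedelta
from typing import Optional

# "Device responded within the last 5 minutes" (from the list above).
ONLINE_WINDOW = timedelta(minutes=5)


def classify_status(status: Optional[dict], now: datetime) -> str:
    """Map a roster `status` object to a display state (hypothetical helper)."""
    if not status:
        return "Unknown"  # no status data available yet
    # Assumption: an unreachable flag or any consecutive failure means Offline.
    if status.get("is_reachable") is False or status.get("consecutive_failures", 0) > 0:
        return "Offline"
    if not status.get("last_seen"):
        return "Unknown"
    last_seen = datetime.fromisoformat(status["last_seen"])
    if now - last_seen <= ONLINE_WINDOW:
        return "Online"
    return "Stale"  # seen before, but not recently
```

Passing `now` explicitly keeps the function deterministic, which makes the thresholds easy to test.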
 ## API Endpoints
 ### List All Devices
 ```bash
 GET /api/nl43/roster
 ```
 Returns all configured devices with their status information.
 **Response:**
 ```json
 {
  "status": "ok",
  "devices": [
    {
      "unit_id": "SLM-43-01",
      "host": "192.168.1.100",
      "tcp_port": 2255,
      "ftp_port": 21,
      "tcp_enabled": true,
      "ftp_enabled": true,
      "ftp_username": "USER",
      "ftp_password": "0000",
      "web_enabled": false,
      "poll_enabled": true,
      "poll_interval_seconds": 60,
      "status": {
        "last_seen": "2026-01-16T20:00:00",
        "measurement_state": "Start",
        "is_reachable": true,
        "consecutive_failures": 0,
        "last_success": "2026-01-16T20:00:00",
        "last_error": null
      }
    }
  ],
  "total": 1
 }
 ```
 ### Create New Device
 ```bash
 POST /api/nl43/roster
 Content-Type: application/json
 {
  "unit_id": "SLM-43-01",
  "host": "192.168.1.100",
  "tcp_port": 2255,
  "ftp_port": 21,
  "tcp_enabled": true,
  "ftp_enabled": false,
  "poll_enabled": true,
  "poll_interval_seconds": 60
 }
 ```
 **Required Fields:**
 - `unit_id`: Unique device identifier
 - `host`: IP address or hostname
 **Optional Fields:**
 - `tcp_port`: TCP control port (default: 2255)
 - `ftp_port`: FTP port (default: 21)
 - `tcp_enabled`: Enable TCP control (default: true)
 - `ftp_enabled`: Enable FTP transfers (default: false)
 - `ftp_username`: FTP username (only if ftp_enabled)
 - `ftp_password`: FTP password (only if ftp_enabled)
 - `poll_enabled`: Enable background polling (default: true)
 - `poll_interval_seconds`: Polling interval 10-3600 seconds (default: 60)
 **Response:**
 ```json
 {
  "status": "ok",
  "message": "Device SLM-43-01 created successfully",
  "data": {
    "unit_id": "SLM-43-01",
    "host": "192.168.1.100",
    "tcp_port": 2255,
    "tcp_enabled": true,
    "ftp_enabled": false,
    "poll_enabled": true,
    "poll_interval_seconds": 60
  }
 }
 ```
 ### Update Device
 ```bash
 PUT /api/nl43/{unit_id}/config
 Content-Type: application/json
 {
  "host": "192.168.1.101",
  "tcp_port": 2255,
  "poll_interval_seconds": 120
 }
 ```
 All fields are optional. Only include fields you want to update.
 ### Delete Device
 ```bash
 DELETE /api/nl43/{unit_id}/config
 ```
 Removes the device configuration and associated status data. Does not affect the physical device.
 **Response:**
 ```json
 {
  "status": "ok",
  "message": "Deleted device SLM-43-01"
 }
 ```
 ## Usage Examples
 ### Via Web Interface
 1. Navigate to http://localhost:8100/roster
 2. Click "Add Device" to create a new configuration
 3. Fill in the device details (unit ID, IP address, ports)
 4. Configure TCP, FTP, and polling settings
 5. Click "Save Device"
 6. Use "Test" button to verify connectivity
 7. Edit or delete devices as needed
 ### Via API (curl)
 **Add a new device:**
 ```bash
 curl -X POST http://localhost:8100/api/nl43/roster \
  -H "Content-Type: application/json" \
  -d '{
    "unit_id": "slm-site-a",
    "host": "192.168.1.100",
    "tcp_port": 2255,
    "tcp_enabled": true,
    "ftp_enabled": true,
    "ftp_username": "USER",
    "ftp_password": "0000",
    "poll_enabled": true,
    "poll_interval_seconds": 60
  }'
 ```
 **Update device host:**
 ```bash
 curl -X PUT http://localhost:8100/api/nl43/slm-site-a/config \
  -H "Content-Type: application/json" \
  -d '{"host": "192.168.1.101"}'
 ```
 **Delete device:**
 ```bash
 curl -X DELETE http://localhost:8100/api/nl43/slm-site-a/config
 ```
 **List all devices:**
 ```bash
 curl http://localhost:8100/api/nl43/roster | python3 -m json.tool
 ```
 ## Integration with Terra-View
 When SLMM is used as a module within Terra-View:
 1. Terra-View manages device configurations in its own database
 2. Terra-View syncs configurations to SLMM via `PUT /api/nl43/{unit_id}/config`
 3. Terra-View can query device status via `GET /api/nl43/{unit_id}/status`
 4. SLMM's roster page can be used for standalone testing and diagnostics
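
The sync flow above might look like the following from the Terra-View side. This is a sketch under stated assumptions: the helper names `build_config_sync` and `build_status_query` are hypothetical, and `BASE_URL` reuses the standalone address from this document, which may differ in a real deployment. The functions only build the requests; sending them is left to the caller.

```python
import json
from urllib import request

# Assumption: SLMM is reachable at the standalone address used in this document.
BASE_URL = "http://localhost:8100"


def build_config_sync(unit_id: str, fields: dict) -> request.Request:
    """Build the PUT that pushes a device configuration to SLMM (hypothetical helper)."""
    return request.Request(
        f"{BASE_URL}/api/nl43/{unit_id}/config",
        data=json.dumps(fields).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def build_status_query(unit_id: str) -> request.Request:
    """Build the GET that reads a device's current status (hypothetical helper)."""
    return request.Request(f"{BASE_URL}/api/nl43/{unit_id}/status", method="GET")


# Sending is left to the caller, for example:
#   with request.urlopen(build_config_sync("slm-site-a", {"host": "192.168.1.101"})) as resp:
#       print(json.load(resp))
```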
 ## Background Polling
 Devices with `poll_enabled: true` are automatically polled at their configured interval:
 - Polls device status every `poll_interval_seconds` (10-3600 seconds)
 - Updates `NL43Status` table with latest measurements
 - Tracks device reachability and failure counts
 - Provides real-time status updates in the roster
 **Note**: Polling respects the NL43 protocol's 1-second rate limit between commands.
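
One way to picture the 1-second rate limit is a per-device gate that delays each command until enough time has passed since the previous one. The class below is a hypothetical sketch (`CommandRateLimiter` and `wait_turn` are illustrative names), not the mechanism SLMM actually uses.

```python
import asyncio
from typing import Optional


class CommandRateLimiter:
    """Enforce a minimum gap between commands sent to one device (illustrative)."""

    def __init__(self, min_gap_s: float = 1.0):
        self.min_gap_s = min_gap_s
        self._last_sent: Optional[float] = None  # loop time of the previous command
        self._lock = asyncio.Lock()  # serializes callers sharing one device

    async def wait_turn(self) -> float:
        """Sleep until the gap has elapsed; return the number of seconds slept."""
        async with self._lock:
            loop = asyncio.get_running_loop()
            slept = 0.0
            if self._last_sent is not None:
                remaining = self.min_gap_s - (loop.time() - self._last_sent)
                if remaining > 0:
                    await asyncio.sleep(remaining)
                    slept = remaining
            self._last_sent = loop.time()
            return slept
```

A poller would call `await limiter.wait_turn()` immediately before each command, so back-to-back commands are spaced out while an isolated command goes through without delay.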
 ## Validation
 The roster system validates:
 - **Unit ID**: Must be unique across all devices
 - **Host**: Valid IP address or hostname format
 - **Ports**: Must be between 1 and 65535
 - **Poll Interval**: Must be between 10 and 3600 seconds
 - **Duplicate Check**: Returns 409 Conflict if unit_id already exists
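
The checks listed above can be approximated in a few lines. This is a rough sketch, not the server's validation code: `validate_roster_payload` and `HOSTNAME_RE` are hypothetical names, and the hostname pattern is a deliberately loose assumption. The defaults and ranges are taken from the field descriptions in this document.

```python
import re
from typing import List, Set

# Assumption: a loose pattern that accepts dotted IPv4 addresses and simple hostnames.
HOSTNAME_RE = re.compile(r"^[A-Za-z0-9]([A-Za-z0-9.-]*[A-Za-z0-9])?$")


def validate_roster_payload(payload: dict, existing_ids: Set[str]) -> List[str]:
    """Return a list of validation errors; an empty list means the payload is valid."""
    errors = []
    unit_id = payload.get("unit_id")
    if not unit_id:
        errors.append("unit_id is required")
    elif unit_id in existing_ids:
        errors.append("unit_id already exists (409 Conflict)")
    host = payload.get("host")
    if not host or not HOSTNAME_RE.match(host):
        errors.append("host must be a valid IP address or hostname")
    # Missing ports fall back to the documented defaults.
    for field, default in (("tcp_port", 2255), ("ftp_port", 21)):
        port = payload.get(field, default)
        if not isinstance(port, int) or not 1 <= port <= 65535:
            errors.append(f"{field} must be between 1 and 65535")
    interval = payload.get("poll_interval_seconds", 60)
    if not isinstance(interval, int) or not 10 <= interval <= 3600:
        errors.append("poll_interval_seconds must be between 10 and 3600")
    return errors
```

Collecting all errors instead of stopping at the first lets a client fix every problem in one round trip.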
 ## Notes
 - Deleting a device from the roster does NOT affect the physical device
 - Device configurations are stored in the SLMM database (`data/slmm.db`)
 - Status information is updated by the background polling system
 - The roster page auto-refreshes status indicators
 - Test button runs full diagnostics (connectivity, TCP, FTP if enabled)
@@ -0,0 +1,26 @@
 # SLMM Feature Documentation
 This directory contains detailed documentation for specific SLMM features and enhancements.
 ## Feature Documents
 ### FEATURE_SUMMARY.md
 Overview of all major features in SLMM.
 ### SETTINGS_ENDPOINT.md
 Documentation of the device settings endpoint and verification system.
 ### TIMEZONE_CONFIGURATION.md
 Timezone handling and configuration for SLMM timestamps.
 ### SLEEP_MODE_AUTO_DISABLE.md
 Automatic sleep mode wake-up system for background polling.
 ### UI_UPDATE.md
 UI/UX improvements and interface updates.
 ## Related Documentation
 - [../README.md](../../README.md) - Main SLMM documentation
 - [../CHANGELOG.md](../../CHANGELOG.md) - Version history
 - [../API.md](../../API.md) - Complete API reference
@@ -0,0 +1,73 @@
 #!/usr/bin/env python3
 """
 Database migration: Add device_logs table.
 This table stores per-device log entries for debugging and audit trail.
 Run this once to add the new table.
 """
 import sqlite3
 import os
 # Path to the SLMM database
 DB_PATH = os.path.join(os.path.dirname(__file__), "data", "slmm.db")
 def migrate():
    print(f"Adding device_logs table to: {DB_PATH}")
    if not os.path.exists(DB_PATH):
        print("Database does not exist yet. Table will be created automatically on first run.")
        return
    conn = sqlite3.connect(DB_PATH)
    cursor = conn.cursor()
    try:
        # Check if table already exists
        cursor.execute("""
            SELECT name FROM sqlite_master
            WHERE type='table' AND name='device_logs'
        """)
        if cursor.fetchone():
            print("✓ device_logs table already exists, no migration needed")
            return
        # Create the table
        print("Creating device_logs table...")
        cursor.execute("""
            CREATE TABLE device_logs (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                unit_id VARCHAR NOT NULL,
                timestamp DATETIME DEFAULT CURRENT_TIMESTAMP,
                level VARCHAR DEFAULT 'INFO',
                category VARCHAR DEFAULT 'GENERAL',
                message TEXT NOT NULL
            )
        """)
        # Create indexes for efficient querying
        print("Creating indexes...")
        cursor.execute("CREATE INDEX ix_device_logs_unit_id ON device_logs (unit_id)")
        cursor.execute("CREATE INDEX ix_device_logs_timestamp ON device_logs (timestamp)")
        conn.commit()
        print("✓ Created device_logs table with indexes")
        # Verify
        cursor.execute("""
            SELECT name FROM sqlite_master
            WHERE type='table' AND name='device_logs'
        """)
        if not cursor.fetchone():
            raise Exception("device_logs table was not created successfully")
        print("✓ Migration completed successfully")
    finally:
        conn.close()
 if __name__ == "__main__":
    migrate()
@@ -0,0 +1,58 @@
 #!/usr/bin/env python3
 """
 Migration script to add ln1 and ln2 percentile columns to the nl43_status table.
 The NL-43 DOD response carries percentile slots LN1-LN5; the live SLM display
 (Terra-View) shows two of them (default L1/L10). This adds storage for the two
 surfaced slots. Run once per database to update existing schema.
 """
 import sqlite3
 import sys
 from pathlib import Path
 DB_PATH = Path(__file__).parent / "data" / "slmm.db"
 def migrate():
    """Add ln1 and ln2 columns to the nl43_status table."""
    if not DB_PATH.exists():
        print(f"Database not found at {DB_PATH}")
        print("No migration needed - database will be created with new schema")
        return
    conn = sqlite3.connect(DB_PATH)
    cursor = conn.cursor()
    try:
        cursor.execute("PRAGMA table_info(nl43_status)")
        columns = [row[1] for row in cursor.fetchall()]
        if "ln1" in columns and "ln2" in columns:
            print("✓ ln1/ln2 columns already exist, no migration needed")
            return
        if "ln1" not in columns:
            print("Adding ln1 column...")
            cursor.execute("ALTER TABLE nl43_status ADD COLUMN ln1 TEXT")
            print("✓ Added ln1 column")
        if "ln2" not in columns:
            print("Adding ln2 column...")
            cursor.execute("ALTER TABLE nl43_status ADD COLUMN ln2 TEXT")
            print("✓ Added ln2 column")
        conn.commit()
        print("\n✓ Migration completed successfully!")
    except Exception as e:
        conn.rollback()
        print(f"✗ Migration failed: {e}", file=sys.stderr)
        sys.exit(1)
    finally:
        conn.close()
 if __name__ == "__main__":
    migrate()
@@ -0,0 +1,48 @@
 #!/usr/bin/env python3
 """
 Migration: add monitor_enabled column to nl43_config.
 Controls whether the live fan-out DOD monitor is kept alive 24/7 for a unit
 (which is what makes alerting continuous). Defaults to enabled. Run once per DB.
 """
 import sqlite3
 import sys
 from pathlib import Path
 DB_PATH = Path(__file__).parent / "data" / "slmm.db"
 def migrate():
    if not DB_PATH.exists():
        print(f"Database not found at {DB_PATH}")
        print("No migration needed - database will be created with new schema")
        return
    conn = sqlite3.connect(DB_PATH)
    cursor = conn.cursor()
    try:
        cursor.execute("PRAGMA table_info(nl43_config)")
        columns = [row[1] for row in cursor.fetchall()]
        if "monitor_enabled" in columns:
            print("✓ monitor_enabled column already exists, no migration needed")
            return
        print("Adding monitor_enabled column (default enabled)...")
        # SQLite stores booleans as 0/1; default 1 = enabled.
        cursor.execute("ALTER TABLE nl43_config ADD COLUMN monitor_enabled BOOLEAN DEFAULT 1")
        conn.commit()
        print("✓ Added monitor_enabled column")
        print("\n✓ Migration completed successfully!")
    except Exception as e:
        conn.rollback()
        print(f"✗ Migration failed: {e}", file=sys.stderr)
        sys.exit(1)
    finally:
        conn.close()
 if __name__ == "__main__":
    migrate()
@@ -0,0 +1,136 @@
 #!/usr/bin/env python3
 """
 Migration script to add polling-related fields to nl43_config and nl43_status tables.
 Adds to nl43_config:
 - poll_interval_seconds (INTEGER, default 60)
 - poll_enabled (BOOLEAN, default 1/True)
 Adds to nl43_status:
 - is_reachable (BOOLEAN, default 1/True)
 - consecutive_failures (INTEGER, default 0)
 - last_poll_attempt (DATETIME, nullable)
 - last_success (DATETIME, nullable)
 - last_error (TEXT, nullable)
 Usage:
    python migrate_add_polling_fields.py
 """
 import sqlite3
 import sys
 from pathlib import Path
 def migrate():
    db_path = Path("data/slmm.db")
    if not db_path.exists():
        print(f"❌ Database not found at {db_path}")
        print("   Run this script from the slmm directory")
        return False
    try:
        conn = sqlite3.connect(db_path)
        cursor = conn.cursor()
        # Check nl43_config columns
        cursor.execute("PRAGMA table_info(nl43_config)")
        config_columns = [row[1] for row in cursor.fetchall()]
        # Check nl43_status columns
        cursor.execute("PRAGMA table_info(nl43_status)")
        status_columns = [row[1] for row in cursor.fetchall()]
        changes_made = False
        # Add nl43_config columns
        if "poll_interval_seconds" not in config_columns:
            print("Adding poll_interval_seconds to nl43_config...")
            cursor.execute("""
                ALTER TABLE nl43_config
                ADD COLUMN poll_interval_seconds INTEGER DEFAULT 60
            """)
            changes_made = True
        else:
            print("✓ poll_interval_seconds already exists in nl43_config")
        if "poll_enabled" not in config_columns:
            print("Adding poll_enabled to nl43_config...")
            cursor.execute("""
                ALTER TABLE nl43_config
                ADD COLUMN poll_enabled BOOLEAN DEFAULT 1
            """)
            changes_made = True
        else:
            print("✓ poll_enabled already exists in nl43_config")
        # Add nl43_status columns
        if "is_reachable" not in status_columns:
            print("Adding is_reachable to nl43_status...")
            cursor.execute("""
                ALTER TABLE nl43_status
                ADD COLUMN is_reachable BOOLEAN DEFAULT 1
            """)
            changes_made = True
        else:
            print("✓ is_reachable already exists in nl43_status")
        if "consecutive_failures" not in status_columns:
            print("Adding consecutive_failures to nl43_status...")
            cursor.execute("""
                ALTER TABLE nl43_status
                ADD COLUMN consecutive_failures INTEGER DEFAULT 0
            """)
            changes_made = True
        else:
            print("✓ consecutive_failures already exists in nl43_status")
        if "last_poll_attempt" not in status_columns:
            print("Adding last_poll_attempt to nl43_status...")
            cursor.execute("""
                ALTER TABLE nl43_status
                ADD COLUMN last_poll_attempt DATETIME
            """)
            changes_made = True
        else:
            print("✓ last_poll_attempt already exists in nl43_status")
        if "last_success" not in status_columns:
            print("Adding last_success to nl43_status...")
            cursor.execute("""
                ALTER TABLE nl43_status
                ADD COLUMN last_success DATETIME
            """)
            changes_made = True
        else:
            print("✓ last_success already exists in nl43_status")
        if "last_error" not in status_columns:
            print("Adding last_error to nl43_status...")
            cursor.execute("""
                ALTER TABLE nl43_status
                ADD COLUMN last_error TEXT
            """)
            changes_made = True
        else:
            print("✓ last_error already exists in nl43_status")
        if changes_made:
            conn.commit()
            print("\n✓ Migration completed successfully")
            print("  Added polling-related fields to nl43_config and nl43_status")
        else:
            print("\n✓ All polling fields already exist - no changes needed")
        conn.close()
        return True
    except Exception as e:
        print(f"❌ Migration failed: {e}")
        return False
 if __name__ == "__main__":
    success = migrate()
    sys.exit(0 if success else 1)
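The seven column checks in this migration all follow the same check-then-`ALTER` pattern. A minimal sketch of how that pattern could be factored into one helper is shown below. `add_column_if_missing`, `COLUMNS`, and `add_polling_columns` are hypothetical names used for illustration and are not part of this PR; the column names, types, and defaults are copied from the script above.

```python
import sqlite3


def add_column_if_missing(cursor, table, column, ddl):
    """Add `column` to `table` unless PRAGMA table_info already lists it.

    Hypothetical helper, not part of the PR. `table`, `column`, and `ddl`
    are interpolated into SQL, so they must be trusted constants.
    """
    cursor.execute(f"PRAGMA table_info({table})")
    existing = {row[1] for row in cursor.fetchall()}  # row[1] is the column name
    if column in existing:
        return False
    cursor.execute(f"ALTER TABLE {table} ADD COLUMN {column} {ddl}")
    return True


# Same tables, columns, and defaults as the migration above.
COLUMNS = (
    ("nl43_config", "poll_interval_seconds", "INTEGER DEFAULT 60"),
    ("nl43_config", "poll_enabled", "BOOLEAN DEFAULT 1"),
    ("nl43_status", "is_reachable", "BOOLEAN DEFAULT 1"),
    ("nl43_status", "consecutive_failures", "INTEGER DEFAULT 0"),
    ("nl43_status", "last_poll_attempt", "DATETIME"),
    ("nl43_status", "last_success", "DATETIME"),
    ("nl43_status", "last_error", "TEXT"),
)


def add_polling_columns(conn):
    """Apply every entry in COLUMNS; return the names that were actually added."""
    cursor = conn.cursor()
    added = [
        column
        for table, column, ddl in COLUMNS
        if add_column_if_missing(cursor, table, column, ddl)
    ]
    if added:
        conn.commit()
    return added
```

Running `add_polling_columns` a second time against the same database should return an empty list, which mirrors the "already exists" branches in the script.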
@@ -0,0 +1,60 @@
 #!/usr/bin/env python3
 """
 Database migration: Add start_time_sync_attempted field to nl43_status table.
 This field tracks whether FTP sync has been attempted for the current measurement,
 preventing repeated sync attempts when FTP fails.
 Run this once to add the new column.
 """
 import sqlite3
 import os
 # Path to the SLMM database
 DB_PATH = os.path.join(os.path.dirname(__file__), "data", "slmm.db")
 def migrate():
    print(f"Adding start_time_sync_attempted field to: {DB_PATH}")
    if not os.path.exists(DB_PATH):
        print("Database does not exist yet. Column will be created automatically.")
        return
    conn = sqlite3.connect(DB_PATH)
    cursor = conn.cursor()
    try:
        # Check if column already exists
        cursor.execute("PRAGMA table_info(nl43_status)")
        columns = [col[1] for col in cursor.fetchall()]
        if 'start_time_sync_attempted' in columns:
            print("✓ start_time_sync_attempted column already exists, no migration needed")
            return
        # Add the column
        print("Adding start_time_sync_attempted column...")
        cursor.execute("""
            ALTER TABLE nl43_status
            ADD COLUMN start_time_sync_attempted BOOLEAN DEFAULT 0
        """)
        conn.commit()
        print("✓ Added start_time_sync_attempted column")
        # Verify
        cursor.execute("PRAGMA table_info(nl43_status)")
        columns = [col[1] for col in cursor.fetchall()]
        if 'start_time_sync_attempted' not in columns:
            raise RuntimeError("start_time_sync_attempted column was not added successfully")
        print("✓ Migration completed successfully")
    finally:
        conn.close()
 if __name__ == "__main__":
    migrate()
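The migration adds a `BOOLEAN DEFAULT 0` column and does not backfill existing rows. The sketch below illustrates why no backfill should be needed: SQLite reports the column default for rows that existed before `ADD COLUMN` ran. It uses an in-memory database and a one-column stand-in for `nl43_status` (an assumption; the real table has more columns), and `demo_default_backfill` is a hypothetical name.

```python
import sqlite3


def demo_default_backfill():
    """Show that a pre-existing row reads back the new column's default."""
    conn = sqlite3.connect(":memory:")
    try:
        # Stand-in schema: only the key column, for illustration.
        conn.execute("CREATE TABLE nl43_status (unit_id TEXT PRIMARY KEY)")
        conn.execute("INSERT INTO nl43_status (unit_id) VALUES ('nl43-1')")
        conn.execute(
            "ALTER TABLE nl43_status "
            "ADD COLUMN start_time_sync_attempted BOOLEAN DEFAULT 0"
        )
        conn.commit()
        return conn.execute(
            "SELECT unit_id, start_time_sync_attempted FROM nl43_status"
        ).fetchone()
    finally:
        # Runs on every exit path, like the finally block in migrate().
        conn.close()
```

The row inserted before the `ALTER TABLE` comes back with `start_time_sync_attempted` equal to `0`.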
@@ -31,6 +31,11 @@
 <body>
  <h1>SLMM NL43 Standalone</h1>
  <p>Configure a unit (host/port), then use controls to Start/Stop and fetch live status.</p>
  <p style="margin-bottom: 16px;">
    <a href="/roster" style="color: #0969da; text-decoration: none; font-weight: 600;">📊 View Device Roster</a>
    <span style="margin: 0 8px; color: #d0d7de;">|</span>
    <a href="/docs" style="color: #0969da; text-decoration: none;">API Documentation</a>
  </p>
  <fieldset>
    <legend>🔍 Connection Diagnostics</legend>
@@ -40,13 +45,34 @@
  </fieldset>
  <fieldset>
-    <legend>Unit Config</legend>
+    <legend>Unit Selection &amp; Config</legend>
    <div style="display: flex; gap: 8px; align-items: flex-end; margin-bottom: 12px;">
      <div style="flex: 1;">
        <label>Select Device</label>
        <select id="deviceSelector" onchange="loadSelectedDevice()" style="width: 100%; padding: 8px; margin-bottom: 0;">
          <option value="">-- Select a device --</option>
        </select>
      </div>
      <button onclick="refreshDeviceList()" style="padding: 8px 12px;">↻ Refresh</button>
    </div>
    <div style="padding: 12px; background: #f6f8fa; border: 1px solid #d0d7de; border-radius: 4px; margin-bottom: 12px;">
      <div style="display: flex; gap: 16px;">
        <div style="flex: 1;">
          <label>Unit ID</label>
          <input id="unitId" value="nl43-1" />
        </div>
        <div style="flex: 2;">
          <label>Host</label>
          <input id="host" value="127.0.0.1" />
-    <label>Port</label>
+        </div>
-    <input id="port" type="number" value="80" />
+        <div style="flex: 1;">
          <label>TCP Port</label>
          <input id="port" type="number" value="2255" />
        </div>
      </div>
    </div>
    <div style="margin: 12px 0;">
      <label style="display: inline-flex; align-items: center; margin-right: 16px;">
@@ -66,8 +92,10 @@
      <input id="ftpPassword" type="password" value="0000" />
    </div>
-    <button onclick="saveConfig()" style="margin-top: 12px;">Save Config</button>
+    <div style="margin-top: 12px;">
      <button onclick="saveConfig()">Save Config</button>
      <button onclick="loadConfig()">Load Config</button>
    </div>
  </fieldset>
  <fieldset>
@@ -148,6 +176,7 @@
    let ws = null;
    let streamUpdateCount = 0;
    let availableDevices = [];
    function log(msg) {
      logEl.textContent += msg + "\n";
@@ -160,9 +189,97 @@
      ftpCredentials.style.display = ftpEnabled ? 'block' : 'none';
    }
-    // Add event listener for FTP checkbox
+    // Load device list from roster
    async function refreshDeviceList() {
      try {
        const res = await fetch('/api/nl43/roster');
        const data = await res.json();
        if (!res.ok) {
          log('Failed to load device list');
          return;
        }
        availableDevices = data.devices || [];
        const selector = document.getElementById('deviceSelector');
        // Save current selection
        const currentSelection = selector.value;
        // Clear and rebuild options
        selector.innerHTML = '<option value="">-- Select a device --</option>';
        availableDevices.forEach(device => {
          const option = document.createElement('option');
          option.value = device.unit_id;
          // Add status indicator
          let statusIcon = '⚪';
          if (device.status) {
            if (device.status.is_reachable === false) {
              statusIcon = '🔴';
            } else if (device.status.last_success) {
              const lastSeen = new Date(device.status.last_success);
              const ageMinutes = Math.floor((Date.now() - lastSeen) / 60000);
              statusIcon = ageMinutes < 5 ? '🟢' : '🟡';
            }
          }
          option.textContent = `${statusIcon} ${device.unit_id} (${device.host})`;
          selector.appendChild(option);
        });
        // Restore selection if it still exists
        if (currentSelection && availableDevices.find(d => d.unit_id === currentSelection)) {
          selector.value = currentSelection;
        }
        log(`Loaded ${availableDevices.length} device(s) from roster`);
      } catch (err) {
        log(`Error loading device list: ${err.message}`);
      }
    }
    // Load selected device configuration
    function loadSelectedDevice() {
      const selector = document.getElementById('deviceSelector');
      const unitId = selector.value;
      if (!unitId) {
        return;
      }
      const device = availableDevices.find(d => d.unit_id === unitId);
      if (!device) {
        log(`Device ${unitId} not found in list`);
        return;
      }
      // Populate form fields
      document.getElementById('unitId').value = device.unit_id;
      document.getElementById('host').value = device.host;
      document.getElementById('port').value = device.tcp_port || 2255;
      document.getElementById('tcpEnabled').checked = device.tcp_enabled || false;
      document.getElementById('ftpEnabled').checked = device.ftp_enabled || false;
      if (device.ftp_username) {
        document.getElementById('ftpUsername').value = device.ftp_username;
      }
      if (device.ftp_password) {
        document.getElementById('ftpPassword').value = device.ftp_password;
      }
      toggleFtpCredentials();
      log(`Loaded configuration for ${device.unit_id}`);
    }
    // Add event listeners
    document.addEventListener('DOMContentLoaded', function() {
      document.getElementById('ftpEnabled').addEventListener('change', toggleFtpCredentials);
      // Load device list on page load
      refreshDeviceList();
    });
    async function runDiagnostics() {
@@ -216,6 +333,134 @@
        html += `<p style="margin-top: 12px; font-size: 0.9em; color: #666;">Last run: ${new Date(data.timestamp).toLocaleString()}</p>`;
        // Add database dump section if available
        if (data.database_dump) {
          html += `<div style="margin-top: 16px; border-top: 1px solid #d0d7de; padding-top: 12px;">`;
          html += `<h4 style="margin: 0 0 12px 0;">📦 Database Dump</h4>`;
          // Config section
          if (data.database_dump.config) {
            const cfg = data.database_dump.config;
            html += `<div style="background: #f0f4f8; padding: 12px; border-radius: 4px; margin-bottom: 12px;">`;
            html += `<strong>Configuration (nl43_config)</strong>`;
            html += `<table style="width: 100%; margin-top: 8px; font-size: 0.9em;">`;
            html += `<tr><td style="padding: 2px 8px; color: #666;">Host</td><td>${cfg.host}:${cfg.tcp_port}</td></tr>`;
            html += `<tr><td style="padding: 2px 8px; color: #666;">TCP Enabled</td><td>${cfg.tcp_enabled ? '✓' : '✗'}</td></tr>`;
            html += `<tr><td style="padding: 2px 8px; color: #666;">FTP Enabled</td><td>${cfg.ftp_enabled ? '✓' : '✗'}${cfg.ftp_enabled ? ` (port ${cfg.ftp_port}, user: ${cfg.ftp_username || 'none'})` : ''}</td></tr>`;
            html += `<tr><td style="padding: 2px 8px; color: #666;">Background Polling</td><td>${cfg.poll_enabled ? `✓ every ${cfg.poll_interval_seconds}s` : '✗ disabled'}</td></tr>`;
            html += `</table></div>`;
          }
          // Status cache section
          if (data.database_dump.status_cache) {
            const cache = data.database_dump.status_cache;
            html += `<div style="background: #f0f8f4; padding: 12px; border-radius: 4px; margin-bottom: 12px;">`;
            html += `<strong>Status Cache (nl43_status)</strong>`;
            html += `<table style="width: 100%; margin-top: 8px; font-size: 0.9em;">`;
            // Measurement state and timing
            html += `<tr><td style="padding: 2px 8px; color: #666;">Measurement State</td><td><strong>${cache.measurement_state || 'unknown'}</strong></td></tr>`;
            if (cache.measurement_start_time) {
              const startTime = new Date(cache.measurement_start_time);
              const elapsed = Math.floor((Date.now() - startTime) / 1000);
              const elapsedStr = elapsed >= 3600 ? `${Math.floor(elapsed/3600)}h ${Math.floor((elapsed%3600)/60)}m` : elapsed >= 60 ? `${Math.floor(elapsed/60)}m ${elapsed%60}s` : `${elapsed}s`;
              html += `<tr><td style="padding: 2px 8px; color: #666;">Measurement Started</td><td>${startTime.toLocaleString()} (${elapsedStr} ago)</td></tr>`;
            }
            html += `<tr><td style="padding: 2px 8px; color: #666;">Counter (d0)</td><td>${cache.counter || 'N/A'}</td></tr>`;
            // Sound levels
            html += `<tr><td colspan="2" style="padding: 8px 8px 2px 8px; font-weight: 600; border-top: 1px solid #d0d7de;">Sound Levels (dB)</td></tr>`;
            html += `<tr><td style="padding: 2px 8px; color: #666;">Lp (Instantaneous)</td><td>${cache.lp || 'N/A'}</td></tr>`;
            html += `<tr><td style="padding: 2px 8px; color: #666;">Leq (Equivalent)</td><td>${cache.leq || 'N/A'}</td></tr>`;
            html += `<tr><td style="padding: 2px 8px; color: #666;">Lmax / Lmin</td><td>${cache.lmax || 'N/A'} / ${cache.lmin || 'N/A'}</td></tr>`;
            html += `<tr><td style="padding: 2px 8px; color: #666;">Lpeak</td><td>${cache.lpeak || 'N/A'}</td></tr>`;
            // Device status
            html += `<tr><td colspan="2" style="padding: 8px 8px 2px 8px; font-weight: 600; border-top: 1px solid #d0d7de;">Device Status</td></tr>`;
            html += `<tr><td style="padding: 2px 8px; color: #666;">Battery</td><td>${cache.battery_level || 'N/A'}${cache.power_source ? ` (${cache.power_source})` : ''}</td></tr>`;
            html += `<tr><td style="padding: 2px 8px; color: #666;">SD Card</td><td>${cache.sd_remaining_mb ? `${cache.sd_remaining_mb} MB` : 'N/A'}${cache.sd_free_ratio ? ` (${cache.sd_free_ratio} free)` : ''}</td></tr>`;
            // Polling status
            html += `<tr><td colspan="2" style="padding: 8px 8px 2px 8px; font-weight: 600; border-top: 1px solid #d0d7de;">Polling Status</td></tr>`;
            html += `<tr><td style="padding: 2px 8px; color: #666;">Reachable</td><td>${cache.is_reachable ? '🟢 Yes' : '🔴 No'}</td></tr>`;
            if (cache.last_seen) {
              html += `<tr><td style="padding: 2px 8px; color: #666;">Last Seen</td><td>${new Date(cache.last_seen).toLocaleString()}</td></tr>`;
            }
            if (cache.last_success) {
              html += `<tr><td style="padding: 2px 8px; color: #666;">Last Success</td><td>${new Date(cache.last_success).toLocaleString()}</td></tr>`;
            }
            if (cache.last_poll_attempt) {
              html += `<tr><td style="padding: 2px 8px; color: #666;">Last Poll Attempt</td><td>${new Date(cache.last_poll_attempt).toLocaleString()}</td></tr>`;
            }
            html += `<tr><td style="padding: 2px 8px; color: #666;">Consecutive Failures</td><td>${cache.consecutive_failures || 0}</td></tr>`;
            if (cache.last_error) {
              html += `<tr><td style="padding: 2px 8px; color: #666;">Last Error</td><td style="color: #d00; font-size: 0.85em;">${String(cache.last_error).replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;')}</td></tr>`;
            }
            html += `</table></div>`;
            // Raw payload (collapsible)
            if (cache.raw_payload) {
              html += `<details style="margin-top: 8px;"><summary style="cursor: pointer; color: #666; font-size: 0.9em;">📄 Raw Payload</summary>`;
              html += `<pre style="background: #f6f8fa; padding: 8px; border-radius: 4px; font-size: 0.8em; overflow-x: auto; margin-top: 8px;">${String(cache.raw_payload).replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;')}</pre></details>`;
            }
          } else {
            html += `<p style="color: #888; font-style: italic;">No cached status available for this unit.</p>`;
          }
          html += `</div>`;
        }
        // Fetch and display device logs
        try {
          const logsRes = await fetch(`/api/nl43/${encodeURIComponent(unitId)}/logs?limit=50`);
          if (logsRes.ok) {
            const logsData = await logsRes.json();
            if (logsData.logs && logsData.logs.length > 0) {
              html += `<div style="margin-top: 16px; border-top: 1px solid #d0d7de; padding-top: 12px;">`;
              html += `<h4 style="margin: 0 0 12px 0;">📋 Device Logs (${logsData.stats.total} total)</h4>`;
              // Stats summary
              if (logsData.stats.by_level) {
                html += `<div style="margin-bottom: 8px; font-size: 0.85em; color: #666;">`;
                const levels = logsData.stats.by_level;
                const parts = [];
                if (levels.ERROR) parts.push(`<span style="color: #d00;">${levels.ERROR} errors</span>`);
                if (levels.WARNING) parts.push(`<span style="color: #fa0;">${levels.WARNING} warnings</span>`);
                if (levels.INFO) parts.push(`${levels.INFO} info`);
                html += parts.join(' · ');
                html += `</div>`;
              }
              // Log entries (collapsible)
              html += `<details open><summary style="cursor: pointer; font-size: 0.9em; margin-bottom: 8px;">Recent entries (${logsData.logs.length})</summary>`;
              html += `<div style="max-height: 300px; overflow-y: auto; background: #f6f8fa; border: 1px solid #d0d7de; border-radius: 4px; padding: 8px; font-size: 0.8em; font-family: monospace;">`;
              logsData.logs.forEach(entry => {
                const levelColor = {
                  'ERROR': '#d00',
                  'WARNING': '#b86e00',
                  'INFO': '#0969da',
                  'DEBUG': '#888'
                }[entry.level] || '#666';
                const time = new Date(entry.timestamp).toLocaleString();
                html += `<div style="margin-bottom: 4px; border-bottom: 1px solid #eee; padding-bottom: 4px;">`;
                html += `<span style="color: #888;">${time}</span> `;
                html += `<span style="color: ${levelColor}; font-weight: 600;">[${entry.level}]</span> `;
                html += `<span style="color: #666;">[${entry.category}]</span> `;
                html += String(entry.message).replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
                html += `</div>`;
              });
              html += `</div></details>`;
              html += `</div>`;
            }
          }
        } catch (logErr) {
          console.log('Could not fetch device logs:', logErr);
        }
        resultsEl.innerHTML = html;
        log(`Diagnostics complete: ${data.overall_status}`);
@@ -0,0 +1,901 @@
 <!DOCTYPE html>
 <html lang="en">
 <head>
  <meta charset="UTF-8" />
  <meta name="viewport" content="width=device-width, initial-scale=1.0" />
  <title>SLMM - Device Roster &amp; Connections</title>
  <style>
    * { box-sizing: border-box; }
    body {
      font-family: system-ui, -apple-system, sans-serif;
      margin: 0;
      padding: 24px;
      background: #f6f8fa;
    }
    .container { max-width: 1400px; margin: 0 auto; }
    .header {
      display: flex;
      justify-content: space-between;
      align-items: center;
      margin-bottom: 24px;
      padding: 16px;
      background: white;
      border-radius: 6px;
      box-shadow: 0 1px 3px rgba(0,0,0,0.1);
    }
    h1 { margin: 0; font-size: 24px; }
    .nav { display: flex; gap: 12px; }
    .btn {
      padding: 8px 16px;
      border: 1px solid #d0d7de;
      background: white;
      border-radius: 6px;
      cursor: pointer;
      text-decoration: none;
      color: #24292f;
      font-size: 14px;
      transition: background 0.2s;
    }
    .btn:hover { background: #f6f8fa; }
    .btn-primary {
      background: #2da44e;
      color: white;
      border-color: #2da44e;
    }
    .btn-primary:hover { background: #2c974b; }
    .btn-danger {
      background: #cf222e;
      color: white;
      border-color: #cf222e;
    }
    .btn-danger:hover { background: #a40e26; }
    .btn-small {
      padding: 4px 8px;
      font-size: 12px;
      margin-right: 4px;
    }
    .table-container {
      background: white;
      border-radius: 6px;
      box-shadow: 0 1px 3px rgba(0,0,0,0.1);
      overflow-x: auto;
    }
    table {
      width: 100%;
      border-collapse: collapse;
    }
    th {
      background: #f6f8fa;
      padding: 12px;
      text-align: left;
      font-weight: 600;
      border-bottom: 2px solid #d0d7de;
      font-size: 13px;
      white-space: nowrap;
    }
    td {
      padding: 12px;
      border-bottom: 1px solid #d0d7de;
      font-size: 13px;
    }
    tr:hover { background: #f6f8fa; }
    .status-badge {
      display: inline-block;
      padding: 2px 8px;
      border-radius: 12px;
      font-size: 11px;
      font-weight: 600;
      text-transform: uppercase;
    }
    .status-ok {
      background: #dafbe1;
      color: #1a7f37;
    }
    .status-unknown {
      background: #eaeef2;
      color: #57606a;
    }
    .status-error {
      background: #ffebe9;
      color: #cf222e;
    }
    .checkbox-cell {
      text-align: center;
      width: 80px;
    }
    .checkbox-cell input[type="checkbox"] {
      cursor: pointer;
      width: 16px;
      height: 16px;
    }
    .actions-cell {
      white-space: nowrap;
      width: 200px;
    }
    .empty-state {
      text-align: center;
      padding: 48px;
      color: #57606a;
    }
    .empty-state-icon {
      font-size: 48px;
      margin-bottom: 16px;
    }
    .modal {
      display: none;
      position: fixed;
      top: 0;
      left: 0;
      width: 100%;
      height: 100%;
      background: rgba(0,0,0,0.5);
      z-index: 1000;
      align-items: center;
      justify-content: center;
    }
    .modal.active { display: flex; }
    .modal-content {
      background: white;
      padding: 24px;
      border-radius: 6px;
      max-width: 600px;
      width: 90%;
      max-height: 80vh;
      overflow-y: auto;
    }
    .modal-header {
      display: flex;
      justify-content: space-between;
      align-items: center;
      margin-bottom: 16px;
    }
    .modal-header h2 {
      margin: 0;
      font-size: 20px;
    }
    .close-btn {
      background: none;
      border: none;
      font-size: 24px;
      cursor: pointer;
      color: #57606a;
      padding: 0;
      width: 32px;
      height: 32px;
    }
    .close-btn:hover { color: #24292f; }
    .form-group {
      margin-bottom: 16px;
    }
    .form-group label {
      display: block;
      margin-bottom: 6px;
      font-weight: 600;
      font-size: 14px;
    }
    .form-group input[type="text"],
    .form-group input[type="number"],
    .form-group input[type="password"] {
      width: 100%;
      padding: 8px 12px;
      border: 1px solid #d0d7de;
      border-radius: 6px;
      font-size: 14px;
    }
    .form-group input[type="checkbox"] {
      width: auto;
      margin-right: 8px;
    }
    .checkbox-label {
      display: flex;
      align-items: center;
      font-weight: normal;
      cursor: pointer;
    }
    .form-actions {
      display: flex;
      justify-content: flex-end;
      gap: 8px;
      margin-top: 24px;
    }
    .toast {
      position: fixed;
      top: 24px;
      right: 24px;
      padding: 12px 16px;
      background: #24292f;
      color: white;
      border-radius: 6px;
      box-shadow: 0 4px 12px rgba(0,0,0,0.15);
      z-index: 2000;
      display: none;
      min-width: 300px;
    }
    .toast.active {
      display: block;
      animation: slideIn 0.3s ease-out;
    }
    @keyframes slideIn {
      from {
        transform: translateX(400px);
        opacity: 0;
      }
      to {
        transform: translateX(0);
        opacity: 1;
      }
    }
    .toast-success { background: #2da44e; }
    .toast-error { background: #cf222e; }
    /* Tabs */
    .tabs {
      display: flex;
      gap: 0;
      margin-bottom: 0;
      border-bottom: 2px solid #d0d7de;
    }
    .tab-btn {
      padding: 10px 20px;
      border: none;
      background: none;
      cursor: pointer;
      font-size: 14px;
      font-weight: 600;
      color: #57606a;
      border-bottom: 2px solid transparent;
      margin-bottom: -2px;
      transition: color 0.2s, border-color 0.2s;
    }
    .tab-btn:hover { color: #24292f; }
    .tab-btn.active {
      color: #24292f;
      border-bottom-color: #fd8c73;
    }
    .tab-panel { display: none; }
    .tab-panel.active { display: block; }
    /* Connection pool panel */
    .pool-config {
      display: grid;
      grid-template-columns: repeat(auto-fill, minmax(180px, 1fr));
      gap: 12px;
      margin-bottom: 20px;
    }
    .pool-config-card {
      background: #f6f8fa;
      border: 1px solid #d0d7de;
      border-radius: 6px;
      padding: 12px;
    }
    .pool-config-card .label {
      font-size: 11px;
      color: #57606a;
      text-transform: uppercase;
      font-weight: 600;
      margin-bottom: 4px;
    }
    .pool-config-card .value {
      font-size: 18px;
      font-weight: 600;
      color: #24292f;
    }
    .conn-card {
      background: white;
      border: 1px solid #d0d7de;
      border-radius: 6px;
      padding: 16px;
      margin-bottom: 12px;
    }
    .conn-card-header {
      display: flex;
      justify-content: space-between;
      align-items: center;
      margin-bottom: 12px;
    }
    .conn-card-header strong { font-size: 15px; }
    .conn-card-grid {
      display: grid;
      grid-template-columns: repeat(auto-fill, minmax(140px, 1fr));
      gap: 8px;
    }
    .conn-stat .label {
      font-size: 11px;
      color: #57606a;
      text-transform: uppercase;
      font-weight: 600;
    }
    .conn-stat .value {
      font-size: 14px;
      font-weight: 600;
      color: #24292f;
    }
    .conn-empty {
      text-align: center;
      padding: 32px;
      color: #57606a;
    }
    .pool-actions {
      display: flex;
      gap: 8px;
      margin-bottom: 16px;
    }
  </style>
 </head>
 <body>
  <div class="container">
    <div class="header">
      <h1>SLMM - Roster &amp; Connections</h1>
      <div class="nav">
        <a href="/" class="btn">&larr; Back to Control Panel</a>
        <button class="btn btn-primary" onclick="openAddModal()">+ Add Device</button>
      </div>
    </div>
    <div class="tabs">
      <button class="tab-btn active" onclick="switchTab('roster')">Device Roster</button>
      <button class="tab-btn" onclick="switchTab('connections')">Connections</button>
    </div>
    <!-- Roster Tab -->
    <div id="tab-roster" class="tab-panel active">
      <div class="table-container" style="border-top-left-radius: 0; border-top-right-radius: 0;">
        <table id="rosterTable">
          <thead>
            <tr>
              <th>Unit ID</th>
              <th>Host / IP</th>
              <th>TCP Port</th>
              <th>FTP Port</th>
              <th class="checkbox-cell">TCP</th>
              <th class="checkbox-cell">FTP</th>
              <th class="checkbox-cell">Polling</th>
              <th>Status</th>
              <th class="actions-cell">Actions</th>
            </tr>
          </thead>
          <tbody id="rosterBody">
            <tr>
              <td colspan="9" style="text-align: center; padding: 24px;">
                Loading...
              </td>
            </tr>
          </tbody>
        </table>
      </div>
    </div>
    <!-- Connections Tab -->
    <div id="tab-connections" class="tab-panel">
      <div class="table-container" style="padding: 20px; border-top-left-radius: 0; border-top-right-radius: 0;">
        <div class="pool-actions">
          <button class="btn" onclick="loadConnections()">Refresh</button>
          <button class="btn btn-danger" onclick="flushConnections()">Flush All Connections</button>
        </div>
        <h3 style="margin: 0 0 12px 0; font-size: 16px;">Pool Configuration</h3>
        <div id="poolConfig" class="pool-config">
          <div class="pool-config-card">
            <div class="label">Status</div>
            <div class="value" id="poolEnabled">--</div>
          </div>
        </div>
        <h3 style="margin: 20px 0 12px 0; font-size: 16px;">Active Connections</h3>
        <div id="connectionsList">
          <div class="conn-empty">Loading...</div>
        </div>
      </div>
    </div>
  </div>
  <!-- Add/Edit Modal -->
  <div id="deviceModal" class="modal">
    <div class="modal-content">
      <div class="modal-header">
        <h2 id="modalTitle">Add Device</h2>
        <button class="close-btn" onclick="closeModal()">&times;</button>
      </div>
      <form id="deviceForm" onsubmit="saveDevice(event)">
        <div class="form-group">
          <label for="unitId">Unit ID *</label>
          <input type="text" id="unitId" required placeholder="e.g., nl43-1, slm-site-a" />
        </div>
        <div class="form-group">
          <label for="host">Host / IP Address *</label>
          <input type="text" id="host" required placeholder="e.g., 192.168.1.100" />
        </div>
        <div class="form-group">
          <label for="tcpPort">TCP Port *</label>
          <input type="number" id="tcpPort" required value="2255" min="1" max="65535" />
        </div>
        <div class="form-group">
          <label for="ftpPort">FTP Port</label>
          <input type="number" id="ftpPort" value="21" min="1" max="65535" />
        </div>
        <div class="form-group">
          <label class="checkbox-label">
            <input type="checkbox" id="tcpEnabled" checked />
            TCP Enabled (required for remote control)
          </label>
        </div>
        <div class="form-group">
          <label class="checkbox-label">
            <input type="checkbox" id="ftpEnabled" onchange="toggleFtpCredentials()" />
            FTP Enabled (for file downloads)
          </label>
        </div>
        <div id="ftpCredentialsSection" style="display: none; padding: 12px; background: #f6f8fa; border-radius: 6px; margin-bottom: 16px;">
          <div class="form-group">
            <label for="ftpUsername">FTP Username</label>
            <input type="text" id="ftpUsername" placeholder="Default: USER" />
          </div>
          <div class="form-group">
            <label for="ftpPassword">FTP Password</label>
            <input type="password" id="ftpPassword" placeholder="Default: 0000" />
          </div>
        </div>
        <div class="form-group">
          <label class="checkbox-label">
            <input type="checkbox" id="pollEnabled" checked />
            Enable background polling (status updates)
          </label>
        </div>
        <div class="form-group">
          <label for="pollInterval">Polling Interval (seconds)</label>
          <input type="number" id="pollInterval" value="60" min="10" max="3600" />
        </div>
        <div class="form-actions">
          <button type="button" class="btn" onclick="closeModal()">Cancel</button>
          <button type="submit" class="btn btn-primary">Save Device</button>
        </div>
      </form>
    </div>
  </div>
  <!-- Toast Notification -->
  <div id="toast" class="toast"></div>
  <script>
    let devices = [];
    let editingDeviceId = null;
    // Load roster on page load
    document.addEventListener('DOMContentLoaded', () => {
      loadRoster();
    });
    async function loadRoster() {
      try {
        const res = await fetch('/api/nl43/roster');
        const data = await res.json();
        if (!res.ok) {
          showToast('Failed to load roster', 'error');
          return;
        }
        devices = data.devices || [];
        renderRoster();
      } catch (err) {
        showToast('Error loading roster: ' + err.message, 'error');
        console.error('Load roster error:', err);
      }
    }
    function renderRoster() {
      const tbody = document.getElementById('rosterBody');
      if (devices.length === 0) {
        tbody.innerHTML = `
          <tr>
            <td colspan="9" class="empty-state">
              <div class="empty-state-icon">📭</div>
              <div><strong>No devices configured</strong></div>
              <div style="margin-top: 8px; font-size: 14px;">Click "Add Device" to configure your first sound level meter</div>
            </td>
          </tr>
        `;
        return;
      }
      tbody.innerHTML = devices.map(device => `
        <tr>
          <td><strong>${escapeHtml(device.unit_id)}</strong></td>
          <td>${escapeHtml(device.host)}</td>
          <td>${device.tcp_port}</td>
          <td>${device.ftp_port || 21}</td>
          <td class="checkbox-cell">
            <input type="checkbox" ${device.tcp_enabled ? 'checked' : ''} disabled />
          </td>
          <td class="checkbox-cell">
            <input type="checkbox" ${device.ftp_enabled ? 'checked' : ''} disabled />
          </td>
          <td class="checkbox-cell">
            <input type="checkbox" ${device.poll_enabled ? 'checked' : ''} disabled />
          </td>
          <td>
            ${getStatusBadge(device)}
          </td>
          <td class="actions-cell">
            <button class="btn btn-small" onclick="testDevice(${escapeHtml(JSON.stringify(device.unit_id))})">Test</button>
            <button class="btn btn-small" onclick="openEditModal(${escapeHtml(JSON.stringify(device.unit_id))})">Edit</button>
            <button class="btn btn-small btn-danger" onclick="deleteDevice(${escapeHtml(JSON.stringify(device.unit_id))})">Delete</button>
          </td>
        </tr>
      `).join('');
    }
    function getStatusBadge(device) {
      if (!device.status) {
        return '<span class="status-badge status-unknown">Unknown</span>';
      }
      if (device.status.is_reachable === false) {
        return '<span class="status-badge status-error">Offline</span>';
      }
      if (device.status.last_success) {
        const lastSeen = new Date(device.status.last_success);
        const ago = Math.floor((Date.now() - lastSeen) / 1000);
        if (ago < 300) { // Less than 5 minutes
          return '<span class="status-badge status-ok">Online</span>';
        } else {
          return `<span class="status-badge status-unknown">Stale (${Math.floor(ago / 60)}m ago)</span>`;
        }
      }
      return '<span class="status-badge status-unknown">Unknown</span>';
    }
    function escapeHtml(text) {
      const map = {
        '&': '&amp;',
        '<': '&lt;',
        '>': '&gt;',
        '"': '&quot;',
        "'": '&#039;'
      };
      return String(text).replace(/[&<>"']/g, m => map[m]);
    }
    function openAddModal() {
      editingDeviceId = null;
      document.getElementById('modalTitle').textContent = 'Add Device';
      document.getElementById('deviceForm').reset();
      document.getElementById('unitId').disabled = false;
      document.getElementById('tcpEnabled').checked = true;
      document.getElementById('ftpEnabled').checked = false;
      document.getElementById('pollEnabled').checked = true;
      document.getElementById('tcpPort').value = 2255;
      document.getElementById('ftpPort').value = 21;
      document.getElementById('pollInterval').value = 60;
      toggleFtpCredentials();
      document.getElementById('deviceModal').classList.add('active');
    }
    function openEditModal(unitId) {
      const device = devices.find(d => d.unit_id === unitId);
      if (!device) {
        showToast('Device not found', 'error');
        return;
      }
      editingDeviceId = unitId;
      document.getElementById('modalTitle').textContent = 'Edit Device';
      document.getElementById('unitId').value = device.unit_id;
      document.getElementById('unitId').disabled = true;
      document.getElementById('host').value = device.host;
      document.getElementById('tcpPort').value = device.tcp_port;
      document.getElementById('ftpPort').value = device.ftp_port || 21;
      document.getElementById('tcpEnabled').checked = device.tcp_enabled;
      document.getElementById('ftpEnabled').checked = device.ftp_enabled;
      document.getElementById('ftpUsername').value = device.ftp_username || '';
      document.getElementById('ftpPassword').value = device.ftp_password || '';
      document.getElementById('pollEnabled').checked = device.poll_enabled;
      document.getElementById('pollInterval').value = device.poll_interval_seconds || 60;
      toggleFtpCredentials();
      document.getElementById('deviceModal').classList.add('active');
    }
    function closeModal() {
      document.getElementById('deviceModal').classList.remove('active');
      editingDeviceId = null;
    }
    function toggleFtpCredentials() {
      const ftpEnabled = document.getElementById('ftpEnabled').checked;
      document.getElementById('ftpCredentialsSection').style.display = ftpEnabled ? 'block' : 'none';
    }
    async function saveDevice(event) {
      event.preventDefault();
      const unitId = document.getElementById('unitId').value.trim();
      const payload = {
        host: document.getElementById('host').value.trim(),
        tcp_port: parseInt(document.getElementById('tcpPort').value, 10),
        ftp_port: parseInt(document.getElementById('ftpPort').value, 10),
        tcp_enabled: document.getElementById('tcpEnabled').checked,
        ftp_enabled: document.getElementById('ftpEnabled').checked,
        poll_enabled: document.getElementById('pollEnabled').checked,
        poll_interval_seconds: parseInt(document.getElementById('pollInterval').value, 10)
      };
      if (payload.ftp_enabled) {
        const username = document.getElementById('ftpUsername').value.trim();
        const password = document.getElementById('ftpPassword').value.trim();
        if (username) payload.ftp_username = username;
        if (password) payload.ftp_password = password;
      }
      try {
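        // Encode the unit ID so characters like "/" or "#" cannot alter the request path
        const url = editingDeviceId
          ? `/api/nl43/${encodeURIComponent(editingDeviceId)}/config`
          : `/api/nl43/roster`;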
        const method = editingDeviceId ? 'PUT' : 'POST';
        const body = editingDeviceId
          ? payload
          : { unit_id: unitId, ...payload };
        const res = await fetch(url, {
          method,
          headers: { 'Content-Type': 'application/json' },
          body: JSON.stringify(body)
        });
        const data = await res.json();
        if (!res.ok) {
          showToast(data.detail || 'Failed to save device', 'error');
          return;
        }
        showToast(editingDeviceId ? 'Device updated successfully' : 'Device added successfully', 'success');
        closeModal();
        await loadRoster();
      } catch (err) {
        showToast('Error saving device: ' + err.message, 'error');
        console.error('Save device error:', err);
      }
    }
    async function deleteDevice(unitId) {
      if (!confirm(`Are you sure you want to delete "${unitId}"?\n\nThis will remove the device configuration but will not affect the physical device.`)) {
        return;
      }
      try {
        const res = await fetch(`/api/nl43/${encodeURIComponent(unitId)}/config`, {
          method: 'DELETE'
        });
        const data = await res.json();
        if (!res.ok) {
          showToast(data.detail || 'Failed to delete device', 'error');
          return;
        }
        showToast('Device deleted successfully', 'success');
        await loadRoster();
      } catch (err) {
        showToast('Error deleting device: ' + err.message, 'error');
        console.error('Delete device error:', err);
      }
    }
    async function testDevice(unitId) {
      showToast('Testing device connection...', 'success');
      try {
        const res = await fetch(`/api/nl43/${encodeURIComponent(unitId)}/diagnostics`);
        const data = await res.json();
        if (!res.ok) {
          showToast('Device test failed', 'error');
          return;
        }
        const statusText = {
          'pass': 'All systems operational ✓',
          'fail': 'Connection failed ✗',
          'degraded': 'Partial connectivity ⚠'
        };
        showToast(statusText[data.overall_status] || 'Test complete',
                  data.overall_status === 'pass' ? 'success' : 'error');
        // Reload to update status
        await loadRoster();
      } catch (err) {
        showToast('Error testing device: ' + err.message, 'error');
        console.error('Test device error:', err);
      }
    }
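    function showToast(message, type = 'success') {
      const toast = document.getElementById('toast');
      toast.textContent = message;
      toast.className = `toast toast-${type} active`;
      // Cancel any pending hide so an earlier toast's timer cannot dismiss this one early
      clearTimeout(showToast.hideTimer);
      showToast.hideTimer = setTimeout(() => {
        toast.classList.remove('active');
      }, 3000);
    }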
    // Close modal when clicking outside
    document.getElementById('deviceModal').addEventListener('click', (e) => {
      if (e.target.id === 'deviceModal') {
        closeModal();
      }
    });
    // ========== Tab Switching ==========
    function switchTab(tabName) {
      document.querySelectorAll('.tab-btn').forEach(btn => btn.classList.remove('active'));
      document.querySelectorAll('.tab-panel').forEach(panel => panel.classList.remove('active'));
      document.querySelector(`.tab-btn[onclick="switchTab('${tabName}')"]`).classList.add('active');
      document.getElementById(`tab-${tabName}`).classList.add('active');
      if (tabName === 'connections') {
        loadConnections();
      }
    }
    // ========== Connection Pool ==========
    let connectionsRefreshTimer = null;
    async function loadConnections() {
      try {
        const res = await fetch('/api/nl43/_connections/status');
        const data = await res.json();
        if (!res.ok) {
          showToast('Failed to load connection pool status', 'error');
          return;
        }
        const pool = data.pool;
        renderPoolConfig(pool);
        renderConnections(pool.connections);
        // Auto-refresh while tab is active
        clearTimeout(connectionsRefreshTimer);
        if (document.getElementById('tab-connections').classList.contains('active')) {
          connectionsRefreshTimer = setTimeout(loadConnections, 5000);
        }
      } catch (err) {
        showToast('Error loading connections: ' + err.message, 'error');
        console.error('Load connections error:', err);
      }
    }
    function renderPoolConfig(pool) {
      document.getElementById('poolConfig').innerHTML = `
        <div class="pool-config-card">
          <div class="label">Persistent</div>
          <div class="value" style="color: ${pool.enabled ? '#1a7f37' : '#cf222e'}">${pool.enabled ? 'Enabled' : 'Disabled'}</div>
        </div>
        <div class="pool-config-card">
          <div class="label">Active</div>
          <div class="value">${pool.active_connections}</div>
        </div>
        <div class="pool-config-card">
          <div class="label">Idle TTL</div>
          <div class="value">${pool.idle_ttl}s</div>
        </div>
        <div class="pool-config-card">
          <div class="label">Max Age</div>
          <div class="value">${pool.max_age}s</div>
        </div>
        <div class="pool-config-card">
          <div class="label">KA Idle</div>
          <div class="value">${pool.keepalive_idle}s</div>
        </div>
        <div class="pool-config-card">
          <div class="label">KA Interval</div>
          <div class="value">${pool.keepalive_interval}s</div>
        </div>
        <div class="pool-config-card">
          <div class="label">KA Probes</div>
          <div class="value">${pool.keepalive_count}</div>
        </div>
      `;
    }
    function renderConnections(connections) {
      const container = document.getElementById('connectionsList');
      const keys = Object.keys(connections);
      if (keys.length === 0) {
        container.innerHTML = `
          <div class="conn-empty">
            <div style="font-size: 32px; margin-bottom: 8px;">~</div>
            <div><strong>No active connections</strong></div>
            <div style="margin-top: 4px; font-size: 13px;">
              Connections appear here when devices are actively being polled and the connection is cached between commands.
            </div>
          </div>
        `;
        return;
      }
      container.innerHTML = keys.map(key => {
        const conn = connections[key];
        const aliveText = conn.alive ? 'Alive' : 'Stale';
        return `
          <div class="conn-card">
            <div class="conn-card-header">
              <strong>${escapeHtml(key)}</strong>
              <span class="status-badge ${conn.alive ? 'status-ok' : 'status-error'}">${aliveText}</span>
            </div>
            <div class="conn-card-grid">
              <div class="conn-stat">
                <div class="label">Host</div>
                <div class="value">${escapeHtml(conn.host)}</div>
              </div>
              <div class="conn-stat">
                <div class="label">Port</div>
                <div class="value">${conn.port}</div>
              </div>
              <div class="conn-stat">
                <div class="label">Age</div>
                <div class="value">${formatSeconds(conn.age_seconds)}</div>
              </div>
              <div class="conn-stat">
                <div class="label">Idle</div>
                <div class="value">${formatSeconds(conn.idle_seconds)}</div>
              </div>
            </div>
          </div>
        `;
      }).join('');
    }
    function formatSeconds(s) {
      // Round once up front so the seconds part can never display as "60s"
      const total = Math.round(s);
      if (total < 60) return total + 's';
      if (total < 3600) return Math.floor(total / 60) + 'm ' + (total % 60) + 's';
      return Math.floor(total / 3600) + 'h ' + Math.floor((total % 3600) / 60) + 'm';
    }
    async function flushConnections() {
      if (!confirm('Close all cached TCP connections?\n\nDevices will reconnect on the next poll cycle.')) {
        return;
      }
      try {
        const res = await fetch('/api/nl43/_connections/flush', { method: 'POST' });
        const data = await res.json();
        if (!res.ok) {
          showToast(data.detail || 'Failed to flush connections', 'error');
          return;
        }
        showToast('All connections flushed', 'success');
        await loadConnections();
      } catch (err) {
        showToast('Error flushing connections: ' + err.message, 'error');
        console.error('Flush connections error:', err);
      }
    }
  </script>
 </body>
 </html>
@@ -0,0 +1,68 @@
 """
 Synthetic unit test for the alert state machine — no DB, no device.
 Drives `_evaluate_step` with a fake clock + a level series and checks that
 onset/clear fire with the right debounce + hysteresis. Run:
    docker compose exec -T slmm python3 test_alert_evaluator.py
    # or, if app.alerts imports cleanly standalone:  python3 test_alert_evaluator.py
 """
 from types import SimpleNamespace
 from app.alerts import RuleState, _evaluate_step
 def rule(**kw):
    base = dict(threshold_db=85.0, duration_s=3, clear_margin_db=2.0, comparison="above")
    base.update(kw)
    return SimpleNamespace(**base)
 def run(series, r):
    st = RuleState()
    events = [(now, a) for value, now in series
              if (a := _evaluate_step(st, value, now, r))]
    return events, st
 def main():
    failures = 0
    def check(label, cond, detail=""):
        nonlocal failures
        print(("PASS" if cond else "FAIL"), label, detail)
        if not cond:
            failures += 1
    # 1) sustained exceedance -> onset after duration; recovery -> clear after duration
    r = rule(threshold_db=85, duration_s=3, clear_margin_db=2)
    ev, _ = run([(80, 0), (86, 1), (87, 2), (88, 3), (88, 4),
                 (88, 5), (82, 6), (82, 7), (82, 8), (82, 9)], r)
    onsets = [t for t, a in ev if a == "onset"]
    clears = [t for t, a in ev if a == "clear"]
    check("1 sustained onset@4 / clear@9", onsets == [4] and clears == [9], str(ev))
    # 2) brief spike under duration -> no onset (debounce)
    ev, _ = run([(80, 0), (90, 1), (90, 2), (80, 3), (80, 4)], rule(duration_s=3))
    check("2 brief spike debounced", ev == [], str(ev))
    # 3) hysteresis: a dip into the margin (below threshold, above threshold-margin)
    #    does NOT clear
    r = rule(threshold_db=85, duration_s=0, clear_margin_db=3)
    ev, st = run([(86, 0), (84, 1), (84, 2), (84, 3)], r)
    check("3 hysteresis holds ACTIVE", ev == [(0, "onset")] and st.phase == "active",
          f"{ev} phase={st.phase}")
    # 4) 'below' comparison (device too quiet) -> onset when value < threshold
    ev, _ = run([(30, 0), (15, 1)], rule(threshold_db=20, duration_s=0,
                                         clear_margin_db=2, comparison="below"))
    check("4 below-comparison onset@1", ev == [(1, "onset")], str(ev))
    print()
    print("ALL PASS" if failures == 0 else f"{failures} FAILURE(S)")
    return failures
 if __name__ == "__main__":
    import sys
    sys.exit(1 if main() else 0)
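
The test above exercises `_evaluate_step` without showing it. As a reading aid, here is a minimal sketch of a state machine that would satisfy the four cases, assuming a two-phase model ("idle" / "active"), strict comparisons against the threshold, and the same `duration_s` debounce for both onset and clear. `SketchState`, `evaluate_step`, and `run_series` are hypothetical names for illustration; the real `RuleState` and `_evaluate_step` in `app.alerts` may differ.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Optional


@dataclass
class SketchState:
    phase: str = "idle"            # assumed phases: "idle" or "active"
    since: Optional[float] = None  # start of the current breach/recovery run


def evaluate_step(st, value, now, rule):
    """Process one sample; return "onset", "clear", or None."""
    above = rule.comparison == "above"
    if st.phase == "idle":
        # Candidate condition while idle: the value is past the threshold.
        cond = value > rule.threshold_db if above else value < rule.threshold_db
        next_phase, event = "active", "onset"
    else:
        # Hysteresis: recovery only counts once the value is back past the
        # threshold by clear_margin_db, so dips into the margin are ignored.
        if above:
            cond = value < rule.threshold_db - rule.clear_margin_db
        else:
            cond = value > rule.threshold_db + rule.clear_margin_db
        next_phase, event = "idle", "clear"

    if not cond:
        st.since = None            # run broken, restart the debounce timer
        return None
    if st.since is None:
        st.since = now
    # Debounce: the condition must hold continuously for duration_s.
    if now - st.since >= rule.duration_s:
        st.phase, st.since = next_phase, None
        return event
    return None


def run_series(series, rule):
    """Feed (value, now) pairs through the sketch and collect events."""
    st = SketchState()
    events = []
    for value, now in series:
        action = evaluate_step(st, value, now, rule)
        if action:
            events.append((now, action))
    return events, st


demo_rule = SimpleNamespace(threshold_db=85.0, duration_s=3,
                            clear_margin_db=2.0, comparison="above")
demo_events, _ = run_series(
    [(80, 0), (86, 1), (87, 2), (88, 3), (88, 4),
     (88, 5), (82, 6), (82, 7), (82, 8), (82, 9)], demo_rule)
print(demo_events)  # [(4, 'onset'), (9, 'clear')]
```

Resetting `since` whenever the condition breaks is what makes a brief spike (case 2) produce no event, and checking the margin only in the active phase is what keeps case 3 in the active state.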
@@ -0,0 +1,167 @@
 #!/bin/bash
 # Manual test script for background polling functionality
 # Usage: ./test_polling.sh [UNIT_ID]
 BASE_URL="http://localhost:8100/api/nl43"
 UNIT_ID="${1:-NL43-001}"
 echo "=========================================="
 echo "Background Polling Test Script"
 echo "=========================================="
 echo "Testing device: $UNIT_ID"
 echo "Base URL: $BASE_URL"
 echo ""
 # Color codes for output
 GREEN='\033[0;32m'
 YELLOW='\033[1;33m'
 RED='\033[0;31m'
 NC='\033[0m' # No Color
 # Function to print test header
 test_header() {
    echo ""
    echo "=========================================="
    echo "$1"
    echo "=========================================="
 }
 # Function to print success
 success() {
    echo -e "${GREEN}✓${NC} $1"
 }
 # Function to print warning
 warning() {
    echo -e "${YELLOW}⚠${NC} $1"
 }
 # Function to print error
 error() {
    echo -e "${RED}✗${NC} $1"
 }
 # Test 1: Get current polling configuration
 test_header "Test 1: Get Current Polling Configuration"
 RESPONSE=$(curl -s "$BASE_URL/$UNIT_ID/polling/config")
 echo "$RESPONSE" | jq '.'
 if echo "$RESPONSE" | jq -e '.status == "ok"' > /dev/null; then
    success "Successfully retrieved polling configuration"
    CURRENT_INTERVAL=$(echo "$RESPONSE" | jq -r '.data.poll_interval_seconds')
    CURRENT_ENABLED=$(echo "$RESPONSE" | jq -r '.data.poll_enabled')
    echo "  Current interval: ${CURRENT_INTERVAL}s"
    echo "  Polling enabled: $CURRENT_ENABLED"
 else
    error "Failed to retrieve polling configuration"
    exit 1
 fi
 # Test 2: Update polling interval to 30 seconds
 test_header "Test 2: Update Polling Interval to 30 Seconds"
 RESPONSE=$(curl -s -X PUT "$BASE_URL/$UNIT_ID/polling/config" \
  -H "Content-Type: application/json" \
  -d '{"poll_interval_seconds": 30}')
 echo "$RESPONSE" | jq '.'
 if echo "$RESPONSE" | jq -e '.status == "ok"' > /dev/null; then
    success "Successfully updated polling interval to 30s"
 else
    error "Failed to update polling interval"
 fi
 # Test 3: Check global polling status
 test_header "Test 3: Check Global Polling Status"
 RESPONSE=$(curl -s "$BASE_URL/_polling/status")
 echo "$RESPONSE" | jq '.'
 if echo "$RESPONSE" | jq -e '.status == "ok"' > /dev/null; then
    success "Successfully retrieved global polling status"
    POLLER_RUNNING=$(echo "$RESPONSE" | jq -r '.data.poller_running')
    TOTAL_DEVICES=$(echo "$RESPONSE" | jq -r '.data.total_devices')
    echo "  Poller running: $POLLER_RUNNING"
    echo "  Total devices: $TOTAL_DEVICES"
 else
    error "Failed to retrieve global polling status"
 fi
 # Test 4: Wait for automatic poll to occur
 test_header "Test 4: Wait for Automatic Poll (35 seconds)"
 warning "Waiting 35 seconds for automatic poll to occur..."
 for i in {35..1}; do
    echo -ne "  ${i}s remaining...\r"
    sleep 1
 done
 echo ""
 success "Wait complete"
 # Test 5: Check if status was updated by background poller
 test_header "Test 5: Verify Background Poll Occurred"
 RESPONSE=$(curl -s "$BASE_URL/$UNIT_ID/status")
 echo "$RESPONSE" | jq '.data | {last_poll_attempt, last_success, is_reachable, consecutive_failures}'
 if echo "$RESPONSE" | jq -e '.status == "ok"' > /dev/null; then
    LAST_POLL=$(echo "$RESPONSE" | jq -r '.data.last_poll_attempt')
    IS_REACHABLE=$(echo "$RESPONSE" | jq -r '.data.is_reachable')
    FAILURES=$(echo "$RESPONSE" | jq -r '.data.consecutive_failures')
    if [ "$LAST_POLL" != "null" ]; then
        success "Device was polled by background poller"
        echo "  Last poll: $LAST_POLL"
        echo "  Reachable: $IS_REACHABLE"
        echo "  Failures: $FAILURES"
    else
        warning "No automatic poll detected yet"
    fi
 else
    error "Failed to retrieve device status"
 fi
 # Test 6: Disable polling
 test_header "Test 6: Disable Background Polling"
 RESPONSE=$(curl -s -X PUT "$BASE_URL/$UNIT_ID/polling/config" \
  -H "Content-Type: application/json" \
  -d '{"poll_enabled": false}')
 echo "$RESPONSE" | jq '.'
 if echo "$RESPONSE" | jq -e '.status == "ok"' > /dev/null; then
    success "Successfully disabled background polling"
 else
    error "Failed to disable polling"
 fi
 # Test 7: Verify polling is disabled
 test_header "Test 7: Verify Polling Disabled in Global Status"
 RESPONSE=$(curl -s "$BASE_URL/_polling/status")
 DEVICE_ENABLED=$(echo "$RESPONSE" | jq --arg uid "$UNIT_ID" '.data.devices[] | select(.unit_id == $uid) | .poll_enabled')
 if [ "$DEVICE_ENABLED" == "false" ]; then
    success "Polling correctly shows as disabled for $UNIT_ID"
 else
    warning "Device still appears in polling list or shows as enabled"
 fi
 # Test 8: Re-enable polling with original interval
 test_header "Test 8: Re-enable Polling with Original Interval"
 RESPONSE=$(curl -s -X PUT "$BASE_URL/$UNIT_ID/polling/config" \
  -H "Content-Type: application/json" \
  -d "{\"poll_enabled\": true, \"poll_interval_seconds\": $CURRENT_INTERVAL}")
 echo "$RESPONSE" | jq '.'
 if echo "$RESPONSE" | jq -e '.status == "ok"' > /dev/null; then
    success "Successfully re-enabled polling with ${CURRENT_INTERVAL}s interval"
 else
    error "Failed to re-enable polling"
 fi
 # Summary
 test_header "Test Summary"
 echo "All tests completed!"
 echo ""
 echo "Key endpoints tested:"
 echo "  GET  $BASE_URL/{unit_id}/polling/config"
 echo "  PUT  $BASE_URL/{unit_id}/polling/config"
 echo "  GET  $BASE_URL/_polling/status"
 echo "  GET  $BASE_URL/{unit_id}/status (with polling fields)"
 echo ""
 echo "Review the output above: any ✗ line marks a step that failed."
@@ -1,128 +0,0 @@
 #!/usr/bin/env python3
 """
 Test script to verify that sleep mode is automatically disabled when:
 1. Device configuration is created/updated with TCP enabled
 2. Measurements are started
 This script tests the API endpoints, not the actual device communication.
 """
 import requests
 import json
 BASE_URL = "http://localhost:8100/api/nl43"
 UNIT_ID = "test-nl43-001"
 def test_config_update():
    """Test that config update works (actual sleep mode disable requires real device)"""
    print("\n=== Testing Config Update ===")
    # Create/update a device config
    config_data = {
        "host": "192.168.1.100",
        "tcp_port": 2255,
        "tcp_enabled": True,
        "ftp_enabled": False,
        "ftp_username": "admin",
        "ftp_password": "password"
    }
    print(f"Updating config for {UNIT_ID}...")
    response = requests.put(f"{BASE_URL}/{UNIT_ID}/config", json=config_data)
    if response.status_code == 200:
        print("✓ Config updated successfully")
        print(f"Response: {json.dumps(response.json(), indent=2)}")
        print("\nNote: Sleep mode disable was attempted (will succeed if device is reachable)")
        return True
    else:
        print(f"✗ Config update failed: {response.status_code}")
        print(f"Error: {response.text}")
        return False
 def test_get_config():
    """Test retrieving the config"""
    print("\n=== Testing Get Config ===")
    response = requests.get(f"{BASE_URL}/{UNIT_ID}/config")
    if response.status_code == 200:
        print("✓ Config retrieved successfully")
        print(f"Response: {json.dumps(response.json(), indent=2)}")
        return True
    elif response.status_code == 404:
        print("✗ Config not found (create one first)")
        return False
    else:
        print(f"✗ Request failed: {response.status_code}")
        print(f"Error: {response.text}")
        return False
 def test_start_measurement():
    """Test that start measurement attempts to disable sleep mode"""
    print("\n=== Testing Start Measurement ===")
    print(f"Attempting to start measurement on {UNIT_ID}...")
    response = requests.post(f"{BASE_URL}/{UNIT_ID}/start")
    if response.status_code == 200:
        print("✓ Start command accepted")
        print(f"Response: {json.dumps(response.json(), indent=2)}")
        print("\nNote: Sleep mode was disabled before starting measurement")
        return True
    elif response.status_code == 404:
        print("✗ Device config not found (create config first)")
        return False
    elif response.status_code == 502:
        print("✗ Device not reachable (expected if no physical device)")
        print(f"Response: {response.text}")
        print("\nNote: This is expected behavior when testing without a physical device")
        return True  # This is actually success - the endpoint tried to communicate
    else:
        print(f"✗ Request failed: {response.status_code}")
        print(f"Error: {response.text}")
        return False
 def main():
    print("=" * 60)
    print("Sleep Mode Auto-Disable Test")
    print("=" * 60)
    print("\nThis test verifies that sleep mode is automatically disabled")
    print("when device configs are updated or measurements are started.")
    print("\nNote: Without a physical device, some operations will fail at")
    print("the device communication level, but the API logic will execute.")
    # Run tests
    results = []
    # Test 1: Update config (should attempt to disable sleep mode)
    results.append(("Config Update", test_config_update()))
    # Test 2: Get config
    results.append(("Get Config", test_get_config()))
    # Test 3: Start measurement (should attempt to disable sleep mode)
    results.append(("Start Measurement", test_start_measurement()))
    # Summary
    print("\n" + "=" * 60)
    print("Test Summary")
    print("=" * 60)
    for test_name, result in results:
        status = "✓ PASS" if result else "✗ FAIL"
        print(f"{status}: {test_name}")
    print("\n" + "=" * 60)
    print("Implementation Details:")
    print("=" * 60)
    print("1. Config endpoint is now async and calls ensure_sleep_mode_disabled()")
    print("   when TCP is enabled")
    print("2. Start measurement endpoint calls ensure_sleep_mode_disabled()")
    print("   before starting the measurement")
    print("3. Sleep mode check is non-blocking - config/start will succeed")
    print("   even if the device is unreachable")
    print("=" * 60)
 if __name__ == "__main__":
    main()