diff --git a/CHANGELOG.md b/CHANGELOG.md
index b265e55..d83bab1 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -5,6 +5,59 @@ All notable changes to SLMM (Sound Level Meter Manager) will be documented in th
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [0.3.0] - 2026-02-17
+
+### Added
+
+#### Persistent TCP Connection Pool
+- **Connection reuse** - TCP connections are cached per device and reused across commands, eliminating repeated TCP handshakes over cellular modems
+- **OS-level TCP keepalive** - Configurable keepalive probes keep cellular NAT tables alive and detect dead connections early (default: probe after 15s idle, every 10s, 3 failures = dead)
+- **Transparent retry** - If a cached connection goes stale, the system automatically retries with a fresh connection so failures are never visible to the caller
+- **Stale connection detection** - Multi-layer detection via idle TTL, max age, transport state, and reader EOF checks
+- **Background cleanup** - Periodic task (every 30s) evicts expired connections from the pool
+- **Master switch** - Set `TCP_PERSISTENT_ENABLED=false` to revert to per-request connection behavior
+
+#### Connection Pool Diagnostics
+- `GET /api/nl43/_connections/status` - View pool configuration, active connections, age/idle times, and keepalive settings
+- `POST /api/nl43/_connections/flush` - Force-close all cached connections (useful for debugging)
+- **Connections tab on roster page** - Live UI showing pool config, active connections with age/idle/alive status, auto-refreshes every 5s, and flush button
+
+#### Environment Variables
+- `TCP_PERSISTENT_ENABLED` (default: `true`) - Master switch for persistent connections
+- `TCP_IDLE_TTL` (default: `300`) - Close idle connections after N seconds
+- `TCP_MAX_AGE` (default: `1800`) - Force reconnect after N seconds
+- `TCP_KEEPALIVE_IDLE` (default: `15`) - Seconds idle before keepalive probes start
+- `TCP_KEEPALIVE_INTERVAL` (default: `10`) - Seconds between keepalive probes
+- `TCP_KEEPALIVE_COUNT` (default: `3`) - Failed probes before declaring connection dead
+
+### Changed
+- **Health check endpoint** (`/health/devices`) - Now uses connection pool instead of opening throwaway TCP connections; checks for existing live connections first (zero-cost), only opens new connection through pool if needed
+- **Diagnostics endpoint** - Removed separate port 443 modem check (extra handshake waste); TCP reachability test now uses connection pool
+- **DRD streaming** - Streaming connections now get TCP keepalive options set; cached connections are evicted before opening dedicated streaming socket
+- **Default timeouts tuned for cellular** - Idle TTL raised to 300s (5 min), max age raised to 1800s (30 min) to survive typical polling intervals over cellular links
+
+### Technical Details
+
+#### Architecture
+- `ConnectionPool` class in `services.py` manages a single cached connection per device key (NL-43 only supports one TCP connection at a time)
+- Uses existing per-device asyncio locks and rate limiting — no changes to concurrency model
+- Pool is a module-level singleton initialized from environment variables at import time
+- Lifecycle managed via FastAPI lifespan: cleanup task starts on startup, all connections closed on shutdown
+- `_send_command_unlocked()` refactored to use acquire/release/discard pattern with single-retry fallback
+- Command parsing extracted to `_execute_command()` method for reuse between primary and retry paths
+
+#### Cellular Modem Optimizations
+- Keepalive probes at 15s prevent cellular NAT tables from expiring (typically 30-60s timeout)
+- 300s idle TTL ensures connections survive between polling cycles (default 60s interval)
+- 1800s max age allows a single socket to serve ~30 minutes of polling before forced reconnect
+- Health checks and diagnostics produce zero additional TCP handshakes when a pooled connection exists
+- Stale `$` prompt bytes drained from idle connections before command reuse
+
+### Breaking Changes
+None. This release is fully backward-compatible with v0.2.x. Set `TCP_PERSISTENT_ENABLED=false` for identical behavior to previous versions.
+
+---
+
 ## [0.2.1] - 2026-01-23
 
 ### Added
@@ -146,6 +199,7 @@ None. This release is fully backward-compatible with v0.1.x. All existing endpoi
 
 ## Version History Summary
 
+- **v0.3.0** (2026-02-17) - Persistent TCP connections with keepalive for cellular modem reliability
 - **v0.2.1** (2026-01-23) - Roster management, scheduler hooks, FTP logging, doc cleanup
 - **v0.2.0** (2026-01-15) - Background Polling System
 - **v0.1.0** (2025-12-XX) - Initial Release
diff --git a/README.md b/README.md
index 1347e35..441a1e6 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,6 @@
 # SLMM - Sound Level Meter Manager
 
-**Version 0.2.1**
+**Version 0.3.0**
 
 Backend API service for controlling and monitoring Rion NL-43/NL-53 Sound Level Meters via TCP and FTP protocols.
 
@@ -12,8 +12,9 @@ SLMM is a standalone backend module that provides REST API routing and command t
 
 ## Features
 
-- **Background Polling** ⭐ NEW: Continuous automatic polling of devices with configurable intervals
-- **Offline Detection** ⭐ NEW: Automatic device reachability tracking with failure counters
+- **Persistent TCP Connections**: Cached per-device connections with OS-level keepalive, tuned for cellular modem reliability
+- **Background Polling**: Continuous automatic polling of devices with configurable intervals
+- **Offline Detection**: Automatic device reachability tracking with failure counters
 - **Device Management**: Configure and manage multiple NL43/NL53 devices
 - **Real-time Monitoring**: Stream live measurement data via WebSocket
 - **Measurement Control**: Start, stop, pause, resume, and reset measurements
@@ -22,6 +23,7 @@ SLMM is a standalone backend module that provides REST API routing and command t
 - **Device Configuration**: Manage frequency/time weighting, clock sync, and more
 - **Rate Limiting**: Automatic 1-second delay enforcement between device commands
 - **Persistent Storage**: SQLite database for device configs and measurement cache
+- **Connection Diagnostics**: Live UI and API endpoints for monitoring TCP connection pool status
 
 ## Architecture
 
@@ -29,29 +31,39 @@ SLMM is a standalone backend module that provides REST API routing and command t
 ┌─────────────────┐         ┌──────────────────────────────┐         ┌─────────────────┐
 │                 │◄───────►│      SLMM API                │◄───────►│  NL43/NL53      │
 │   (Frontend)    │  HTTP   │  • REST Endpoints            │  TCP    │  Sound Meters   │
-└─────────────────┘         │  • WebSocket Streaming       │         └─────────────────┘
-                            │  • Background Poller ⭐ NEW   │              ▲
-                            └──────────────────────────────┘              │
-                                           │                         Continuous
-                                           ▼                           Polling
-                                    ┌──────────────┐                      │
-                                    │  SQLite DB   │◄─────────────────────┘
+└─────────────────┘         │  • WebSocket Streaming       │ (kept   │ (via cellular   │
+                            │  • Background Poller         │ alive)  │  modem)         │
+                            │  • Connection Pool (v0.3)    │         └─────────────────┘
+                            └──────────────────────────────┘
+                                           │
+                                           ▼
+                                    ┌──────────────┐
+                                    │  SQLite DB   │
                                     │  • Config    │
                                     │  • Status    │
                                     └──────────────┘
 ```
 
+### Persistent TCP Connection Pool (v0.3.0)
+
+SLMM maintains persistent TCP connections to devices with OS-level keepalive, designed for reliable operation over cellular modems:
+
+- **Connection Reuse**: One cached TCP socket per device, reused across all commands (no repeated handshakes)
+- **TCP Keepalive**: Probes keep cellular NAT tables alive and detect dead connections early
+- **Transparent Retry**: Stale cached connections automatically retry with a fresh socket
+- **Configurable**: Idle TTL (300s), max age (1800s), and keepalive timing via environment variables
+- **Diagnostics**: Live UI on the roster page and API endpoints for monitoring pool status
+
 ### Background Polling (v0.2.0)
 
-SLMM now includes a background polling service that continuously queries devices and updates the status cache:
+Background polling service continuously queries devices and updates the status cache:
 
 - **Automatic Updates**: Devices are polled at configurable intervals (10-3600 seconds)
 - **Offline Detection**: Devices marked unreachable after 3 consecutive failures
 - **Per-Device Configuration**: Each device can have a custom polling interval
 - **Resource Efficient**: Dynamic sleep intervals and smart scheduling
-- **Graceful Shutdown**: Background task stops cleanly on service shutdown
 
-This makes Terra-View significantly more responsive - status requests return cached data instantly (<100ms) instead of waiting for device queries (1-2 seconds).
+Status requests return cached data instantly (<100ms) instead of waiting for device queries (1-2 seconds).
 
 ## Quick Start
 
@@ -96,9 +108,18 @@ Once running, visit:
 
 ### Environment Variables
 
+**Server:**
 - `PORT`: Server port (default: 8100)
 - `CORS_ORIGINS`: Comma-separated list of allowed origins (default: "*")
 
+**TCP Connection Pool:**
+- `TCP_PERSISTENT_ENABLED`: Enable persistent connections (default: "true")
+- `TCP_IDLE_TTL`: Close idle connections after N seconds (default: 300)
+- `TCP_MAX_AGE`: Force reconnect after N seconds (default: 1800)
+- `TCP_KEEPALIVE_IDLE`: Seconds idle before keepalive probes (default: 15)
+- `TCP_KEEPALIVE_INTERVAL`: Seconds between keepalive probes (default: 10)
+- `TCP_KEEPALIVE_COUNT`: Failed probes before declaring dead (default: 3)
+
 ### Database
 
 The SQLite database is automatically created at [data/slmm.db](data/slmm.db) on first run.
@@ -126,7 +147,7 @@ Logs are written to:
 | GET | `/api/nl43/{unit_id}/live` | Request fresh DOD data from device (bypasses cache) |
 | WS | `/api/nl43/{unit_id}/stream` | WebSocket stream for real-time DRD data |
 
-### Background Polling Configuration ⭐ NEW
+### Background Polling
 
 | Method | Endpoint | Description |
 |--------|----------|-------------|
@@ -134,6 +155,13 @@ Logs are written to:
 | PUT | `/api/nl43/{unit_id}/polling/config` | Update polling interval and enable/disable polling |
 | GET | `/api/nl43/_polling/status` | Get global polling status for all devices |
 
+### Connection Pool
+
+| Method | Endpoint | Description |
+|--------|----------|-------------|
+| GET | `/api/nl43/_connections/status` | Get pool config, active connections, age/idle times |
+| POST | `/api/nl43/_connections/flush` | Force-close all cached TCP connections |
+
 ### Measurement Control
 
 | Method | Endpoint | Description |
@@ -255,6 +283,9 @@ Caches latest measurement snapshot:
 
 ### TCP Communication
 - Uses ASCII command protocol over TCP
+- Persistent connections with OS-level keepalive (tuned for cellular modems)
+- Connections cached per device and reused across commands
+- Transparent retry on stale connections
 - Enforces ≥1 second delay between commands to same device
 - Two-line response format:
   - Line 1: Result code (R+0000 for success)
@@ -320,6 +351,16 @@ curl http://localhost:8100/api/nl43/meter-001/polling/config
 curl http://localhost:8100/api/nl43/_polling/status
 ```
 
+### Check Connection Pool Status
+```bash
+curl http://localhost:8100/api/nl43/_connections/status | jq '.'
+```
+
+### Flush All Cached Connections
+```bash
+curl -X POST http://localhost:8100/api/nl43/_connections/flush
+```
+
 ### Verify Device Settings
 ```bash
 curl http://localhost:8100/api/nl43/meter-001/settings
@@ -388,11 +429,19 @@ See [API.md](API.md) for detailed integration examples.
 ## Troubleshooting
 
 ### Connection Issues
+- Check connection pool status: `curl http://localhost:8100/api/nl43/_connections/status`
+- Flush stale connections: `curl -X POST http://localhost:8100/api/nl43/_connections/flush`
 - Verify device IP address and port in configuration
 - Ensure device is on the same network
 - Check firewall rules allow TCP/FTP connections
 - Verify RX55 network adapter is properly configured on device
 
+### Cellular Modem Issues
+- If modem wedges from too many handshakes, ensure `TCP_PERSISTENT_ENABLED=true` (default)
+- Increase `TCP_IDLE_TTL` if connections expire between poll cycles
+- Keepalive probes (default: every 15s) keep NAT tables alive — adjust `TCP_KEEPALIVE_IDLE` if needed
+- Set `TCP_PERSISTENT_ENABLED=false` to disable pooling for debugging
+
 ### Rate Limiting
 - API automatically enforces 1-second delay between commands
 - If experiencing delays, this is normal device behavior
diff --git a/app/main.py b/app/main.py
index abf0879..ffce6a2 100644
--- a/app/main.py
+++ b/app/main.py
@@ -52,7 +52,7 @@ async def lifespan(app: FastAPI):
 app = FastAPI(
     title="SLMM NL43 Addon",
     description="Standalone module for NL43 configuration and status APIs with background polling",
-    version="0.3.0",
+    version="0.3.0",
     lifespan=lifespan,
 )
 
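
Reviewer note: this diff documents the `TCP_KEEPALIVE_*` variables but does not include the `services.py` code that applies them. The sketch below shows one way such settings could be read and applied to a socket, using only the Python standard library. The function names `keepalive_settings` and `apply_keepalive` are hypothetical and are not taken from the SLMM codebase; the defaults mirror the values stated in the changelog. The `TCP_KEEPIDLE`, `TCP_KEEPINTVL`, and `TCP_KEEPCNT` constants are platform-dependent, so the sketch skips any that the running platform does not expose.

```python
import os
import socket


def keepalive_settings(env=os.environ):
    """Read keepalive timing from the environment (hypothetical helper).

    Falls back to the defaults documented in the changelog: 15s idle,
    10s between probes, 3 failed probes before the connection is dead.
    """
    return {
        "idle": int(env.get("TCP_KEEPALIVE_IDLE", "15")),
        "interval": int(env.get("TCP_KEEPALIVE_INTERVAL", "10")),
        "count": int(env.get("TCP_KEEPALIVE_COUNT", "3")),
    }


def apply_keepalive(sock, settings):
    """Enable OS-level keepalive on a TCP socket (hypothetical helper)."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # These option names are not available on every platform,
    # so each one is applied only if the socket module exposes it.
    for opt_name, key in (
        ("TCP_KEEPIDLE", "idle"),
        ("TCP_KEEPINTVL", "interval"),
        ("TCP_KEEPCNT", "count"),
    ):
        opt = getattr(socket, opt_name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, settings[key])


if __name__ == "__main__":
    cfg = keepalive_settings()
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    apply_keepalive(s, cfg)
    # Rough time to declare an idle, unreachable peer dead.
    print("detection window (s):", cfg["idle"] + cfg["interval"] * cfg["count"])
    s.close()
```

With the documented defaults, an idle connection to an unreachable peer would be declared dead after roughly 15 + 10 × 3 = 45 seconds, which is the figure to compare against the modem's NAT timeout when tuning these variables.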