v0.3.0, persistent polling update. Persistent TCP connection pool with all features Connection pool diagnostics (API + UI) All 6 new environment variables Changes to health check, diagnostics, and DRD streaming Technical architecture details and cellular #2

Merged
serversdown merged 3 commits from dev-persistent into main 2026-02-16 21:57:38 -05:00
3 changed files with 118 additions and 15 deletions
Showing only changes of commit b62e84f8b3 - Show all commits

View File

@@ -5,6 +5,59 @@ All notable changes to SLMM (Sound Level Meter Manager) will be documented in th
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [0.3.0] - 2026-02-17
### Added
#### Persistent TCP Connection Pool
- **Connection reuse** - TCP connections are cached per device and reused across commands, eliminating repeated TCP handshakes over cellular modems
- **OS-level TCP keepalive** - Configurable keepalive probes keep cellular NAT tables alive and detect dead connections early (default: probe after 15s idle, every 10s, 3 failures = dead)
- **Transparent retry** - If a cached connection goes stale, the system automatically retries with a fresh connection so failures are never visible to the caller
- **Stale connection detection** - Multi-layer detection via idle TTL, max age, transport state, and reader EOF checks
- **Background cleanup** - Periodic task (every 30s) evicts expired connections from the pool
- **Master switch** - Set `TCP_PERSISTENT_ENABLED=false` to revert to per-request connection behavior
#### Connection Pool Diagnostics
- `GET /api/nl43/_connections/status` - View pool configuration, active connections, age/idle times, and keepalive settings
- `POST /api/nl43/_connections/flush` - Force-close all cached connections (useful for debugging)
- **Connections tab on roster page** - Live UI showing pool config, active connections with age/idle/alive status, auto-refreshes every 5s, and flush button
#### Environment Variables
- `TCP_PERSISTENT_ENABLED` (default: `true`) - Master switch for persistent connections
- `TCP_IDLE_TTL` (default: `300`) - Close idle connections after N seconds
- `TCP_MAX_AGE` (default: `1800`) - Force reconnect after N seconds
- `TCP_KEEPALIVE_IDLE` (default: `15`) - Seconds idle before keepalive probes start
- `TCP_KEEPALIVE_INTERVAL` (default: `10`) - Seconds between keepalive probes
- `TCP_KEEPALIVE_COUNT` (default: `3`) - Failed probes before declaring connection dead
### Changed
- **Health check endpoint** (`/health/devices`) - Now uses connection pool instead of opening throwaway TCP connections; checks for existing live connections first (zero-cost), only opens new connection through pool if needed
- **Diagnostics endpoint** - Removed separate port 443 modem check (extra handshake waste); TCP reachability test now uses connection pool
- **DRD streaming** - Streaming connections now get TCP keepalive options set; cached connections are evicted before opening dedicated streaming socket
- **Default timeouts tuned for cellular** - Idle TTL raised to 300s (5 min), max age raised to 1800s (30 min) to survive typical polling intervals over cellular links
### Technical Details
#### Architecture
- `ConnectionPool` class in `services.py` manages a single cached connection per device key (NL-43 only supports one TCP connection at a time)
- Uses existing per-device asyncio locks and rate limiting — no changes to concurrency model
- Pool is a module-level singleton initialized from environment variables at import time
- Lifecycle managed via FastAPI lifespan: cleanup task starts on startup, all connections closed on shutdown
- `_send_command_unlocked()` refactored to use acquire/release/discard pattern with single-retry fallback
- Command parsing extracted to `_execute_command()` method for reuse between primary and retry paths
#### Cellular Modem Optimizations
- Keepalive probes at 15s prevent cellular NAT tables from expiring (typically 30-60s timeout)
- 300s idle TTL ensures connections survive between polling cycles (default 60s interval)
- 1800s max age allows a single socket to serve ~30 minutes of polling before forced reconnect
- Health checks and diagnostics produce zero additional TCP handshakes when a pooled connection exists
- Stale `$` prompt bytes drained from idle connections before command reuse
### Breaking Changes
None. This release is fully backward-compatible with v0.2.x. Set `TCP_PERSISTENT_ENABLED=false` for identical behavior to previous versions.
---
## [0.2.1] - 2026-01-23 ## [0.2.1] - 2026-01-23
### Added ### Added
@@ -146,6 +199,7 @@ None. This release is fully backward-compatible with v0.1.x. All existing endpoi
## Version History Summary ## Version History Summary
- **v0.3.0** (2026-02-17) - Persistent TCP connections with keepalive for cellular modem reliability
- **v0.2.1** (2026-01-23) - Roster management, scheduler hooks, FTP logging, doc cleanup - **v0.2.1** (2026-01-23) - Roster management, scheduler hooks, FTP logging, doc cleanup
- **v0.2.0** (2026-01-15) - Background Polling System - **v0.2.0** (2026-01-15) - Background Polling System
- **v0.1.0** (2025-12-XX) - Initial Release - **v0.1.0** (2025-12-XX) - Initial Release

View File

@@ -1,6 +1,6 @@
# SLMM - Sound Level Meter Manager # SLMM - Sound Level Meter Manager
**Version 0.2.1** **Version 0.3.0**
Backend API service for controlling and monitoring Rion NL-43/NL-53 Sound Level Meters via TCP and FTP protocols. Backend API service for controlling and monitoring Rion NL-43/NL-53 Sound Level Meters via TCP and FTP protocols.
@@ -12,8 +12,9 @@ SLMM is a standalone backend module that provides REST API routing and command t
## Features ## Features
- **Background Polling** ⭐ NEW: Continuous automatic polling of devices with configurable intervals - **Persistent TCP Connections**: Cached per-device connections with OS-level keepalive, tuned for cellular modem reliability
- **Offline Detection** ⭐ NEW: Automatic device reachability tracking with failure counters - **Background Polling**: Continuous automatic polling of devices with configurable intervals
- **Offline Detection**: Automatic device reachability tracking with failure counters
- **Device Management**: Configure and manage multiple NL43/NL53 devices - **Device Management**: Configure and manage multiple NL43/NL53 devices
- **Real-time Monitoring**: Stream live measurement data via WebSocket - **Real-time Monitoring**: Stream live measurement data via WebSocket
- **Measurement Control**: Start, stop, pause, resume, and reset measurements - **Measurement Control**: Start, stop, pause, resume, and reset measurements
@@ -22,6 +23,7 @@ SLMM is a standalone backend module that provides REST API routing and command t
- **Device Configuration**: Manage frequency/time weighting, clock sync, and more - **Device Configuration**: Manage frequency/time weighting, clock sync, and more
- **Rate Limiting**: Automatic 1-second delay enforcement between device commands - **Rate Limiting**: Automatic 1-second delay enforcement between device commands
- **Persistent Storage**: SQLite database for device configs and measurement cache - **Persistent Storage**: SQLite database for device configs and measurement cache
- **Connection Diagnostics**: Live UI and API endpoints for monitoring TCP connection pool status
## Architecture ## Architecture
@@ -29,29 +31,39 @@ SLMM is a standalone backend module that provides REST API routing and command t
┌─────────────────┐ ┌──────────────────────────────┐ ┌─────────────────┐ ┌─────────────────┐ ┌──────────────────────────────┐ ┌─────────────────┐
│ │◄───────►│ SLMM API │◄───────►│ NL43/NL53 │ │ │◄───────►│ SLMM API │◄───────►│ NL43/NL53 │
│ (Frontend) │ HTTP │ • REST Endpoints │ TCP │ Sound Meters │ │ (Frontend) │ HTTP │ • REST Endpoints │ TCP │ Sound Meters │
└─────────────────┘ │ • WebSocket Streaming │ └─────────────────┘ └─────────────────┘ │ • WebSocket Streaming │ (kept (via cellular │
│ • Background Poller ⭐ NEW │ │ • Background Poller │ alive) │ modem)
└──────────────────────────────┘ │ • Connection Pool (v0.3) │ └─────────────────┘
│ Continuous └──────────────────────────────┘
▼ Polling
┌──────────────┐ │
│ SQLite DB │◄───────────────────── ──────────────
│ SQLite DB │
│ • Config │ │ • Config │
│ • Status │ │ • Status │
└──────────────┘ └──────────────┘
``` ```
### Persistent TCP Connection Pool (v0.3.0)
SLMM maintains persistent TCP connections to devices with OS-level keepalive, designed for reliable operation over cellular modems:
- **Connection Reuse**: One cached TCP socket per device, reused across all commands (no repeated handshakes)
- **TCP Keepalive**: Probes keep cellular NAT tables alive and detect dead connections early
- **Transparent Retry**: Stale cached connections automatically retry with a fresh socket
- **Configurable**: Idle TTL (300s), max age (1800s), and keepalive timing via environment variables
- **Diagnostics**: Live UI on the roster page and API endpoints for monitoring pool status
### Background Polling (v0.2.0) ### Background Polling (v0.2.0)
SLMM now includes a background polling service that continuously queries devices and updates the status cache: Background polling service continuously queries devices and updates the status cache:
- **Automatic Updates**: Devices are polled at configurable intervals (10-3600 seconds) - **Automatic Updates**: Devices are polled at configurable intervals (10-3600 seconds)
- **Offline Detection**: Devices marked unreachable after 3 consecutive failures - **Offline Detection**: Devices marked unreachable after 3 consecutive failures
- **Per-Device Configuration**: Each device can have a custom polling interval - **Per-Device Configuration**: Each device can have a custom polling interval
- **Resource Efficient**: Dynamic sleep intervals and smart scheduling - **Resource Efficient**: Dynamic sleep intervals and smart scheduling
- **Graceful Shutdown**: Background task stops cleanly on service shutdown
This makes Terra-View significantly more responsive - status requests return cached data instantly (<100ms) instead of waiting for device queries (1-2 seconds). Status requests return cached data instantly (<100ms) instead of waiting for device queries (1-2 seconds).
## Quick Start ## Quick Start
@@ -96,9 +108,18 @@ Once running, visit:
### Environment Variables ### Environment Variables
**Server:**
- `PORT`: Server port (default: 8100) - `PORT`: Server port (default: 8100)
- `CORS_ORIGINS`: Comma-separated list of allowed origins (default: "*") - `CORS_ORIGINS`: Comma-separated list of allowed origins (default: "*")
**TCP Connection Pool:**
- `TCP_PERSISTENT_ENABLED`: Enable persistent connections (default: "true")
- `TCP_IDLE_TTL`: Close idle connections after N seconds (default: 300)
- `TCP_MAX_AGE`: Force reconnect after N seconds (default: 1800)
- `TCP_KEEPALIVE_IDLE`: Seconds idle before keepalive probes (default: 15)
- `TCP_KEEPALIVE_INTERVAL`: Seconds between keepalive probes (default: 10)
- `TCP_KEEPALIVE_COUNT`: Failed probes before declaring dead (default: 3)
### Database ### Database
The SQLite database is automatically created at [data/slmm.db](data/slmm.db) on first run. The SQLite database is automatically created at [data/slmm.db](data/slmm.db) on first run.
@@ -126,7 +147,7 @@ Logs are written to:
| GET | `/api/nl43/{unit_id}/live` | Request fresh DOD data from device (bypasses cache) | | GET | `/api/nl43/{unit_id}/live` | Request fresh DOD data from device (bypasses cache) |
| WS | `/api/nl43/{unit_id}/stream` | WebSocket stream for real-time DRD data | | WS | `/api/nl43/{unit_id}/stream` | WebSocket stream for real-time DRD data |
### Background Polling Configuration ⭐ NEW ### Background Polling
| Method | Endpoint | Description | | Method | Endpoint | Description |
|--------|----------|-------------| |--------|----------|-------------|
@@ -134,6 +155,13 @@ Logs are written to:
| PUT | `/api/nl43/{unit_id}/polling/config` | Update polling interval and enable/disable polling | | PUT | `/api/nl43/{unit_id}/polling/config` | Update polling interval and enable/disable polling |
| GET | `/api/nl43/_polling/status` | Get global polling status for all devices | | GET | `/api/nl43/_polling/status` | Get global polling status for all devices |
### Connection Pool
| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/api/nl43/_connections/status` | Get pool config, active connections, age/idle times |
| POST | `/api/nl43/_connections/flush` | Force-close all cached TCP connections |
### Measurement Control ### Measurement Control
| Method | Endpoint | Description | | Method | Endpoint | Description |
@@ -255,6 +283,9 @@ Caches latest measurement snapshot:
### TCP Communication ### TCP Communication
- Uses ASCII command protocol over TCP - Uses ASCII command protocol over TCP
- Persistent connections with OS-level keepalive (tuned for cellular modems)
- Connections cached per device and reused across commands
- Transparent retry on stale connections
- Enforces ≥1 second delay between commands to same device - Enforces ≥1 second delay between commands to same device
- Two-line response format: - Two-line response format:
- Line 1: Result code (R+0000 for success) - Line 1: Result code (R+0000 for success)
@@ -320,6 +351,16 @@ curl http://localhost:8100/api/nl43/meter-001/polling/config
curl http://localhost:8100/api/nl43/_polling/status curl http://localhost:8100/api/nl43/_polling/status
``` ```
### Check Connection Pool Status
```bash
curl http://localhost:8100/api/nl43/_connections/status | jq '.'
```
### Flush All Cached Connections
```bash
curl -X POST http://localhost:8100/api/nl43/_connections/flush
```
### Verify Device Settings ### Verify Device Settings
```bash ```bash
curl http://localhost:8100/api/nl43/meter-001/settings curl http://localhost:8100/api/nl43/meter-001/settings
@@ -388,11 +429,19 @@ See [API.md](API.md) for detailed integration examples.
## Troubleshooting ## Troubleshooting
### Connection Issues ### Connection Issues
- Check connection pool status: `curl http://localhost:8100/api/nl43/_connections/status`
- Flush stale connections: `curl -X POST http://localhost:8100/api/nl43/_connections/flush`
- Verify device IP address and port in configuration - Verify device IP address and port in configuration
- Ensure device is on the same network - Ensure device is on the same network
- Check firewall rules allow TCP/FTP connections - Check firewall rules allow TCP/FTP connections
- Verify RX55 network adapter is properly configured on device - Verify RX55 network adapter is properly configured on device
### Cellular Modem Issues
- If modem wedges from too many handshakes, ensure `TCP_PERSISTENT_ENABLED=true` (default)
- Increase `TCP_IDLE_TTL` if connections expire between poll cycles
- Keepalive probes (default: every 15s) keep NAT tables alive — adjust `TCP_KEEPALIVE_IDLE` if needed
- Set `TCP_PERSISTENT_ENABLED=false` to disable pooling for debugging
### Rate Limiting ### Rate Limiting
- API automatically enforces 1-second delay between commands - API automatically enforces 1-second delay between commands
- If experiencing delays, this is normal device behavior - If experiencing delays, this is normal device behavior

View File

@@ -52,7 +52,7 @@ async def lifespan(app: FastAPI):
app = FastAPI( app = FastAPI(
title="SLMM NL43 Addon", title="SLMM NL43 Addon",
description="Standalone module for NL43 configuration and status APIs with background polling", description="Standalone module for NL43 configuration and status APIs with background polling",
version="0.2.0", version="0.3.0",
lifespan=lifespan, lifespan=lifespan,
) )