
Changelog

All notable changes to SLMM (Sound Level Meter Manager) will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[0.3.0] - 2026-02-17

Added

Persistent TCP Connection Pool

  • Connection reuse - TCP connections are cached per device and reused across commands, eliminating repeated TCP handshakes over cellular modems
  • OS-level TCP keepalive - Configurable keepalive probes keep cellular NAT mappings alive and detect dead connections early (default: probe after 15s idle, then every 10s; 3 failed probes = dead)
  • Transparent retry - If a cached connection goes stale, the system automatically retries with a fresh connection so failures are never visible to the caller
  • Stale connection detection - Multi-layer detection via idle TTL, max age, transport state, and reader EOF checks
  • Background cleanup - Periodic task (every 30s) evicts expired connections from the pool
  • Master switch - Set TCP_PERSISTENT_ENABLED=false to revert to per-request connection behavior
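
The keepalive defaults above correspond to standard OS socket options. A minimal sketch of how they can be set from Python, assuming Linux option names (`apply_keepalive` is an illustrative helper, not part of the SLMM API):

```python
import socket

def apply_keepalive(sock: socket.socket, idle: int = 15,
                    interval: int = 10, count: int = 3) -> None:
    """Enable OS-level TCP keepalive on a socket (Linux option names)."""
    # Turn keepalive on for this socket
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Start probing after `idle` seconds of silence
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    # Re-probe every `interval` seconds
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    # Declare the peer dead after `count` unanswered probes
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
```

With the defaults above, a dead cellular link is detected roughly 15 + 3 × 10 = 45 seconds after the last traffic.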

Connection Pool Diagnostics

  • GET /api/nl43/_connections/status - View pool configuration, active connections, age/idle times, and keepalive settings
  • POST /api/nl43/_connections/flush - Force-close all cached connections (useful for debugging)
  • Connections tab on roster page - Live UI showing pool config, active connections with age/idle/alive status, auto-refreshes every 5s, and flush button

Environment Variables

  • TCP_PERSISTENT_ENABLED (default: true) - Master switch for persistent connections
  • TCP_IDLE_TTL (default: 300) - Close idle connections after N seconds
  • TCP_MAX_AGE (default: 1800) - Force reconnect after N seconds
  • TCP_KEEPALIVE_IDLE (default: 15) - Seconds idle before keepalive probes start
  • TCP_KEEPALIVE_INTERVAL (default: 10) - Seconds between keepalive probes
  • TCP_KEEPALIVE_COUNT (default: 3) - Failed probes before declaring connection dead
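
These variables are read once at import time. A hedged sketch of how such a config might be parsed (the `PoolConfig` class is illustrative; only the variable names and defaults come from this changelog):

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class PoolConfig:
    enabled: bool = True
    idle_ttl: int = 300
    max_age: int = 1800
    keepalive_idle: int = 15
    keepalive_interval: int = 10
    keepalive_count: int = 3

    @classmethod
    def from_env(cls) -> "PoolConfig":
        truthy = {"1", "true", "yes", "on"}
        return cls(
            enabled=os.getenv("TCP_PERSISTENT_ENABLED", "true").lower() in truthy,
            idle_ttl=int(os.getenv("TCP_IDLE_TTL", "300")),
            max_age=int(os.getenv("TCP_MAX_AGE", "1800")),
            keepalive_idle=int(os.getenv("TCP_KEEPALIVE_IDLE", "15")),
            keepalive_interval=int(os.getenv("TCP_KEEPALIVE_INTERVAL", "10")),
            keepalive_count=int(os.getenv("TCP_KEEPALIVE_COUNT", "3")),
        )
```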

Changed

  • Health check endpoint (/health/devices) - Now uses the connection pool instead of opening throwaway TCP connections; it checks for an existing live connection first (zero cost) and only opens a new one through the pool when needed
  • Diagnostics endpoint - Removed the separate port-443 modem check, which cost an extra TCP handshake per run; the TCP reachability test now goes through the connection pool
  • DRD streaming - Streaming connections now get TCP keepalive options set; cached connections are evicted before opening dedicated streaming socket
  • Default timeouts tuned for cellular - Idle TTL raised to 300s (5 min), max age raised to 1800s (30 min) to survive typical polling intervals over cellular links

Technical Details

Architecture

  • ConnectionPool class in services.py manages a single cached connection per device key (NL-43 only supports one TCP connection at a time)
  • Uses existing per-device asyncio locks and rate limiting — no changes to concurrency model
  • Pool is a module-level singleton initialized from environment variables at import time
  • Lifecycle managed via FastAPI lifespan: cleanup task starts on startup, all connections closed on shutdown
  • _send_command_unlocked() refactored to use acquire/release/discard pattern with single-retry fallback
  • Command parsing extracted to _execute_command() method for reuse between primary and retry paths

Cellular Modem Optimizations

  • Keepalive probes starting at 15s prevent cellular NAT mappings from expiring (carrier NATs typically drop idle entries after 30-60s)
  • 300s idle TTL ensures connections survive between polling cycles (default 60s interval)
  • 1800s max age allows a single socket to serve ~30 minutes of polling before forced reconnect
  • Health checks and diagnostics produce zero additional TCP handshakes when a pooled connection exists
  • Stale $ prompt bytes drained from idle connections before command reuse
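
Draining stale prompt bytes before reuse amounts to a short read loop with a small timeout; a sketch using asyncio streams (the helper name and timeout value are illustrative):

```python
import asyncio

async def drain_stale_bytes(reader: asyncio.StreamReader,
                            timeout: float = 0.05) -> bytes:
    """Read and discard any bytes (e.g. leftover '$' prompts) already
    buffered on an idle connection, so they are not mistaken for a reply."""
    drained = b""
    try:
        while True:
            chunk = await asyncio.wait_for(reader.read(1024), timeout)
            if not chunk:          # EOF: peer closed the connection
                break
            drained += chunk
    except asyncio.TimeoutError:
        pass                       # nothing more buffered: connection is quiet
    return drained
```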

Breaking Changes

None. This release is fully backward-compatible with v0.2.x. Set TCP_PERSISTENT_ENABLED=false for identical behavior to previous versions.


[0.2.1] - 2026-01-23

Added

  • Roster management: UI and API endpoints for managing device rosters.
  • Delete config endpoint: Remove device configuration alongside cached status data.
  • Scheduler hooks: start_cycle and stop_cycle helpers for Terra-View scheduling integration.

Changed

  • FTP logging: Connection, authentication, and transfer phases now log explicitly.
  • Documentation: Reorganized docs/scripts and updated API notes for FTP/TCP verification.

[0.2.0] - 2026-01-15

Added

Background Polling System

  • Continuous automatic device polling - Background service that continuously polls configured devices
  • Per-device configurable intervals - Each device can have a custom polling interval (10-3600 seconds, default 60)
  • Automatic offline detection - Devices automatically marked unreachable after 3 consecutive failures
  • Reachability tracking - Database fields track device health with failure counters and error messages
  • Dynamic sleep scheduling - Polling service adjusts sleep intervals based on device configurations
  • Graceful lifecycle management - Background poller starts on application startup and stops cleanly on shutdown

New API Endpoints

  • GET /api/nl43/{unit_id}/polling/config - Get device polling configuration
  • PUT /api/nl43/{unit_id}/polling/config - Update polling interval and enable/disable per-device polling
  • GET /api/nl43/_polling/status - Get global polling status for all devices with reachability info

Database Schema Changes

  • NL43Config table:

    • poll_interval_seconds (Integer, default 60) - Polling interval in seconds
    • poll_enabled (Boolean, default true) - Enable/disable background polling per device
  • NL43Status table:

    • is_reachable (Boolean, default true) - Current device reachability status
    • consecutive_failures (Integer, default 0) - Count of consecutive poll failures
    • last_poll_attempt (DateTime) - Last time background poller attempted to poll
    • last_success (DateTime) - Last successful poll timestamp
    • last_error (Text) - Last error message (truncated to 500 chars)

New Files

  • app/background_poller.py - Background polling service implementation
  • migrate_add_polling_fields.py - Database migration script for v0.2.0 schema changes
  • test_polling.sh - Comprehensive test script for polling functionality
  • CHANGELOG.md - This changelog file

Changed

  • Enhanced status endpoint - GET /api/nl43/{unit_id}/status now includes polling-related fields (is_reachable, consecutive_failures, last_poll_attempt, last_success, last_error)
  • Application startup - Added lifespan context manager in app/main.py to manage background poller lifecycle
  • Performance improvement - Terra-View requests now return cached data instantly (<100ms) instead of waiting for device queries (1-2 seconds)

Technical Details

Architecture

  • Background poller runs as async task using asyncio.create_task()
  • Uses existing NL43Client and persist_snapshot() functions - no code duplication
  • Respects existing 1-second rate limiting per device
  • Efficient resource usage - skips work when no devices configured
  • WebSocket streaming remains unaffected - separate real-time data path
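
The dynamic sleep scheduling listed above reduces to a small pure function: sleep until the earliest next-due device rather than a fixed tick. A hedged sketch (the helper name and signature are illustrative, not the background_poller.py API):

```python
def seconds_until_next_poll(
    now: float,
    last_polled: dict[str, float],
    intervals: dict[str, float],
    default: float = 60.0,
    floor: float = 1.0,
) -> float:
    """Return how long the poller should sleep before the next device is due.
    With no devices configured, fall back to the default interval."""
    due_times = [t + intervals.get(dev, default) for dev, t in last_polled.items()]
    if not due_times:
        return default           # nothing to poll: skip work, sleep the default
    return max(floor, min(due_times) - now)
```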

Default Behavior

  • Existing devices automatically get 60-second polling interval
  • Existing status records default to is_reachable=true
  • Migration is additive-only - no data loss
  • Polling can be disabled per-device via poll_enabled=false

Recommended Polling Intervals

  • Critical monitoring: 30 seconds
  • Normal monitoring: 60 seconds (default)
  • Battery conservation: 300 seconds (5 minutes)
  • Development/testing: 10 seconds (minimum allowed)

Migration Notes

To upgrade from v0.1.x to v0.2.0:

  1. Stop the service (if running):

    docker compose down slmm
    # OR
    # Stop your uvicorn process
    
  2. Update code:

    git pull
    # OR copy new files
    
  3. Run migration:

    cd slmm
    python3 migrate_add_polling_fields.py
    
  4. Restart service:

    docker compose up -d --build slmm
    # OR
    uvicorn app.main:app --host 0.0.0.0 --port 8100
    
  5. Verify polling is active:

    curl http://localhost:8100/api/nl43/_polling/status | jq '.'
    

You should see "poller_running": true and all configured devices listed.

Breaking Changes

None. This release is fully backward-compatible with v0.1.x. All existing endpoints and functionality remain unchanged.


[0.1.0] - 2025-12-XX

Added

  • Initial release
  • REST API for NL43/NL53 sound level meter control
  • TCP command protocol implementation
  • FTP file download support
  • WebSocket streaming for real-time data (DRD)
  • Device configuration management
  • Measurement control (start, stop, pause, resume, reset, store)
  • Device information endpoints (battery, clock, results)
  • Measurement settings management (frequency/time weighting)
  • Sleep mode control
  • Rate limiting (1-second minimum between commands)
  • SQLite database for device configs and status cache
  • Health check endpoints
  • Comprehensive API documentation
  • NL43 protocol documentation
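
The 1-second rate limit noted above can be enforced with a small per-device limiter; a minimal sketch of one way to do it (class name is illustrative, not the SLMM implementation):

```python
import asyncio
import time

class MinIntervalLimiter:
    """Enforce a minimum spacing between commands to one device."""

    def __init__(self, min_interval: float = 1.0):
        self.min_interval = min_interval
        self._last = 0.0
        self._lock = asyncio.Lock()  # serialize callers for the same device

    async def wait(self) -> None:
        async with self._lock:
            now = time.monotonic()
            delay = self._last + self.min_interval - now
            if delay > 0:
                await asyncio.sleep(delay)  # pad out to the minimum spacing
            self._last = time.monotonic()
```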

Database Schema (v0.1.0)

  • NL43Config table - Device connection configuration
  • NL43Status table - Measurement snapshot cache

Version History Summary

  • v0.3.0 (2026-02-17) - Persistent TCP connections with keepalive for cellular modem reliability
  • v0.2.1 (2026-01-23) - Roster management, scheduler hooks, FTP logging, doc cleanup
  • v0.2.0 (2026-01-15) - Background Polling System
  • v0.1.0 (2025-12-XX) - Initial Release