Compare commits

17 Commits

- `450509d210` stop tracking dev runtime data (2026-03-12 22:46:37 +00:00)
- `fefa9eace8` chore: gitignore clean up (2026-03-12 21:34:14 +00:00)
- `98a8d357e5` chore: data-dev folder added to gitignore (2026-03-12 21:33:43 +00:00)
- `0a7422eceb` Merge branch 'dev-persistent' of ssh://10.0.0.2:2222/serversdown/slmm into dev-persistent (serversdwn, 2026-03-12 20:26:56 +00:00)
- `996b993cb9` chore: gitignore dev data (serversdwn, 2026-03-12 20:26:53 +00:00)
- `01337696b3` feat: add connection pool status logging every 15 minutes (serversdwn, 2026-02-19 15:09:50 +00:00)
- `a302fd15d4` fix: change debug logs to info level for connection pool events (serversdwn, 2026-02-19 06:04:34 +00:00)
- `af5ecc1a92` fix: improve connection pool idle and max age checks to allow disabling (serversdwn, 2026-02-19 01:25:01 +00:00)
- `b62e84f8b3` v0.3.0, persistent polling update. (serversdwn, 2026-02-17 02:56:11 +00:00)
- `a5f8d1b2c7` Persistent polling interval increased. Healthcheck now uses poll instead of separate handshakes. (serversdwn, 2026-02-17 02:41:09 +00:00)
- `a1a80bbb4d` add: new persisent connection approach, env variables for tcp keepalive and persist, added connection pool class. (serversdwn, 2026-02-16 04:25:51 +00:00)
- `005e0091fe` fix: delay added to ensure tcp commands dont talk over eachother (serversdwn, 2026-02-16 02:42:41 +00:00)
- `e6ac80df6c` chore: add pcap files to gitignore (serversdwn, 2026-02-10 21:12:19 +00:00)
- `7070b948a8` add: stress test script for diagnosing TCP connection issues. chore: clean up .gitignore (serversdwn, 2026-02-10 07:07:34 +00:00)
- `3b6e9ad3f0` fix: time added to FTP enable step to prevent commands getting messed up (serversdwn, 2026-02-06 17:37:10 +00:00)
- `eb0cbcc077` fix: 24hr restart schedule enchanced. (serversdwn, 2026-01-31 05:15:00 +00:00)
  - Step 0: Pause polling
  - Step 1: Stop measurement → wait 10s
  - Step 2: Disable FTP → wait 10s
  - Step 3: Enable FTP → wait 10s
  - Step 4: Download data
  - Step 5: Wait 30s for device to settle
  - Step 6: Start new measurement
  - Step 7: Re-enable polling
- `cc0a5bdf84` chore cleanup (serversdwn, 2026-01-29 22:44:20 +00:00)
11 changed files with 2848 additions and 320 deletions

.gitignore (vendored)

```diff
@@ -1,5 +1,8 @@
 /manuals/
 /data/
+/data-dev/
+/SLM-stress-test/stress_test_logs/
+/SLM-stress-test/tcpdump-runs/
 # Python cache
 __pycache__/
@@ -12,3 +15,5 @@ __pycache__/
 *.egg-info/
 dist/
 build/
+*.pcap
```

CHANGELOG.md

```diff
@@ -5,6 +5,59 @@ All notable changes to SLMM (Sound Level Meter Manager) will be documented in th
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+
+## [0.3.0] - 2026-02-17
+
+### Added
+
+#### Persistent TCP Connection Pool
+- **Connection reuse** - TCP connections are cached per device and reused across commands, eliminating repeated TCP handshakes over cellular modems
+- **OS-level TCP keepalive** - Configurable keepalive probes keep cellular NAT tables alive and detect dead connections early (default: probe after 15s idle, every 10s, 3 failures = dead)
+- **Transparent retry** - If a cached connection goes stale, the system automatically retries with a fresh connection so failures are never visible to the caller
+- **Stale connection detection** - Multi-layer detection via idle TTL, max age, transport state, and reader EOF checks
+- **Background cleanup** - Periodic task (every 30s) evicts expired connections from the pool
+- **Master switch** - Set `TCP_PERSISTENT_ENABLED=false` to revert to per-request connection behavior
+
+#### Connection Pool Diagnostics
+- `GET /api/nl43/_connections/status` - View pool configuration, active connections, age/idle times, and keepalive settings
+- `POST /api/nl43/_connections/flush` - Force-close all cached connections (useful for debugging)
+- **Connections tab on roster page** - Live UI showing pool config, active connections with age/idle/alive status, auto-refreshes every 5s, and flush button
+
+#### Environment Variables
+- `TCP_PERSISTENT_ENABLED` (default: `true`) - Master switch for persistent connections
+- `TCP_IDLE_TTL` (default: `300`) - Close idle connections after N seconds
+- `TCP_MAX_AGE` (default: `1800`) - Force reconnect after N seconds
+- `TCP_KEEPALIVE_IDLE` (default: `15`) - Seconds idle before keepalive probes start
+- `TCP_KEEPALIVE_INTERVAL` (default: `10`) - Seconds between keepalive probes
+- `TCP_KEEPALIVE_COUNT` (default: `3`) - Failed probes before declaring connection dead
+
+### Changed
+- **Health check endpoint** (`/health/devices`) - Now uses connection pool instead of opening throwaway TCP connections; checks for existing live connections first (zero-cost), only opens new connection through pool if needed
+- **Diagnostics endpoint** - Removed separate port 443 modem check (extra handshake waste); TCP reachability test now uses connection pool
+- **DRD streaming** - Streaming connections now get TCP keepalive options set; cached connections are evicted before opening dedicated streaming socket
+- **Default timeouts tuned for cellular** - Idle TTL raised to 300s (5 min), max age raised to 1800s (30 min) to survive typical polling intervals over cellular links
+
+### Technical Details
+
+#### Architecture
+- `ConnectionPool` class in `services.py` manages a single cached connection per device key (NL-43 only supports one TCP connection at a time)
+- Uses existing per-device asyncio locks and rate limiting — no changes to concurrency model
+- Pool is a module-level singleton initialized from environment variables at import time
+- Lifecycle managed via FastAPI lifespan: cleanup task starts on startup, all connections closed on shutdown
+- `_send_command_unlocked()` refactored to use acquire/release/discard pattern with single-retry fallback
+- Command parsing extracted to `_execute_command()` method for reuse between primary and retry paths
+
+#### Cellular Modem Optimizations
+- Keepalive probes at 15s prevent cellular NAT tables from expiring (typically 30-60s timeout)
+- 300s idle TTL ensures connections survive between polling cycles (default 60s interval)
+- 1800s max age allows a single socket to serve ~30 minutes of polling before forced reconnect
+- Health checks and diagnostics produce zero additional TCP handshakes when a pooled connection exists
+- Stale `$` prompt bytes drained from idle connections before command reuse
+
+### Breaking Changes
+None. This release is fully backward-compatible with v0.2.x. Set `TCP_PERSISTENT_ENABLED=false` for identical behavior to previous versions.
+
+---
+
 ## [0.2.1] - 2026-01-23
 ### Added
@@ -146,6 +199,7 @@ None. This release is fully backward-compatible with v0.1.x. All existing endpoi
 ## Version History Summary
+- **v0.3.0** (2026-02-17) - Persistent TCP connections with keepalive for cellular modem reliability
 - **v0.2.1** (2026-01-23) - Roster management, scheduler hooks, FTP logging, doc cleanup
 - **v0.2.0** (2026-01-15) - Background Polling System
 - **v0.1.0** (2025-12-XX) - Initial Release
```
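The "transparent retry" and acquire/release/discard pattern described in the changelog can be sketched as follows. This is a simplified illustration: the `acquire`/`discard`/`execute` callables are stand-ins for the pool's real methods, not SLMM's actual API.

```python
import asyncio

async def send_with_retry(acquire, discard, execute, command):
    """Sketch of the single-retry fallback: run the command on a (possibly
    cached) connection; if it turns out to be stale, discard it and retry
    exactly once on a fresh connection. A failure on the retry propagates
    to the caller."""
    conn = await acquire()
    try:
        return await execute(conn, command)
    except (ConnectionError, asyncio.IncompleteReadError):
        await discard(conn)        # drop the stale cached connection
        conn = await acquire()     # pool opens a fresh socket this time
        return await execute(conn, command)
```

The point of the pattern is that callers never see a stale-connection failure: the first error is absorbed, and only a second consecutive failure surfaces.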

README.md

````diff
@@ -1,6 +1,6 @@
 # SLMM - Sound Level Meter Manager
-**Version 0.2.1**
+**Version 0.3.0**
 Backend API service for controlling and monitoring Rion NL-43/NL-53 Sound Level Meters via TCP and FTP protocols.
@@ -12,8 +12,9 @@ SLMM is a standalone backend module that provides REST API routing and command t
 ## Features
-- **Background Polling** ⭐ NEW: Continuous automatic polling of devices with configurable intervals
-- **Offline Detection** ⭐ NEW: Automatic device reachability tracking with failure counters
+- **Persistent TCP Connections**: Cached per-device connections with OS-level keepalive, tuned for cellular modem reliability
+- **Background Polling**: Continuous automatic polling of devices with configurable intervals
+- **Offline Detection**: Automatic device reachability tracking with failure counters
 - **Device Management**: Configure and manage multiple NL43/NL53 devices
 - **Real-time Monitoring**: Stream live measurement data via WebSocket
 - **Measurement Control**: Start, stop, pause, resume, and reset measurements
@@ -22,6 +23,7 @@ SLMM is a standalone backend module that provides REST API routing and command t
 - **Device Configuration**: Manage frequency/time weighting, clock sync, and more
 - **Rate Limiting**: Automatic 1-second delay enforcement between device commands
 - **Persistent Storage**: SQLite database for device configs and measurement cache
+- **Connection Diagnostics**: Live UI and API endpoints for monitoring TCP connection pool status
 ## Architecture
@@ -29,29 +31,39 @@ SLMM is a standalone backend module that provides REST API routing and command t
 ```
 ┌─────────────────┐          ┌──────────────────────────────┐          ┌─────────────────┐
 │                 │◄───────►│  SLMM API                    │◄───────►│  NL43/NL53      │
 │   (Frontend)    │   HTTP   │  • REST Endpoints            │   TCP    │  Sound Meters   │
 └─────────────────┘          │  • WebSocket Streaming       │  (kept   │  (via cellular  │
                              │  • Background Poller         │  alive)  │   modem)        │
+                             │  • Connection Pool (v0.3)    │          └─────────────────┘
                              └──────────────────────────────┘
                                      │
                                      ▼
                              ┌──────────────┐
                              │  SQLite DB   │
                              │  • Config    │
                              │  • Status    │
                              └──────────────┘
 ```
+### Persistent TCP Connection Pool (v0.3.0)
+SLMM maintains persistent TCP connections to devices with OS-level keepalive, designed for reliable operation over cellular modems:
+- **Connection Reuse**: One cached TCP socket per device, reused across all commands (no repeated handshakes)
+- **TCP Keepalive**: Probes keep cellular NAT tables alive and detect dead connections early
+- **Transparent Retry**: Stale cached connections automatically retry with a fresh socket
+- **Configurable**: Idle TTL (300s), max age (1800s), and keepalive timing via environment variables
+- **Diagnostics**: Live UI on the roster page and API endpoints for monitoring pool status
 ### Background Polling (v0.2.0)
-SLMM now includes a background polling service that continuously queries devices and updates the status cache:
+Background polling service continuously queries devices and updates the status cache:
 - **Automatic Updates**: Devices are polled at configurable intervals (10-3600 seconds)
 - **Offline Detection**: Devices marked unreachable after 3 consecutive failures
 - **Per-Device Configuration**: Each device can have a custom polling interval
 - **Resource Efficient**: Dynamic sleep intervals and smart scheduling
+- **Graceful Shutdown**: Background task stops cleanly on service shutdown
-This makes Terra-View significantly more responsive - status requests return cached data instantly (<100ms) instead of waiting for device queries (1-2 seconds).
+Status requests return cached data instantly (<100ms) instead of waiting for device queries (1-2 seconds).
 ## Quick Start
@@ -96,9 +108,18 @@ Once running, visit:
 ### Environment Variables
+**Server:**
 - `PORT`: Server port (default: 8100)
 - `CORS_ORIGINS`: Comma-separated list of allowed origins (default: "*")
+**TCP Connection Pool:**
+- `TCP_PERSISTENT_ENABLED`: Enable persistent connections (default: "true")
+- `TCP_IDLE_TTL`: Close idle connections after N seconds (default: 300)
+- `TCP_MAX_AGE`: Force reconnect after N seconds (default: 1800)
+- `TCP_KEEPALIVE_IDLE`: Seconds idle before keepalive probes (default: 15)
+- `TCP_KEEPALIVE_INTERVAL`: Seconds between keepalive probes (default: 10)
+- `TCP_KEEPALIVE_COUNT`: Failed probes before declaring dead (default: 3)
 ### Database
 The SQLite database is automatically created at [data/slmm.db](data/slmm.db) on first run.
@@ -126,7 +147,7 @@ Logs are written to:
 | GET | `/api/nl43/{unit_id}/live` | Request fresh DOD data from device (bypasses cache) |
 | WS | `/api/nl43/{unit_id}/stream` | WebSocket stream for real-time DRD data |
-### Background Polling Configuration ⭐ NEW
+### Background Polling
 | Method | Endpoint | Description |
 |--------|----------|-------------|
@@ -134,6 +155,13 @@ Logs are written to:
 | PUT | `/api/nl43/{unit_id}/polling/config` | Update polling interval and enable/disable polling |
 | GET | `/api/nl43/_polling/status` | Get global polling status for all devices |
+### Connection Pool
+| Method | Endpoint | Description |
+|--------|----------|-------------|
+| GET | `/api/nl43/_connections/status` | Get pool config, active connections, age/idle times |
+| POST | `/api/nl43/_connections/flush` | Force-close all cached TCP connections |
 ### Measurement Control
 | Method | Endpoint | Description |
@@ -255,6 +283,9 @@ Caches latest measurement snapshot:
 ### TCP Communication
 - Uses ASCII command protocol over TCP
+- Persistent connections with OS-level keepalive (tuned for cellular modems)
+- Connections cached per device and reused across commands
+- Transparent retry on stale connections
 - Enforces ≥1 second delay between commands to same device
 - Two-line response format:
   - Line 1: Result code (R+0000 for success)
@@ -320,6 +351,16 @@ curl http://localhost:8100/api/nl43/meter-001/polling/config
 curl http://localhost:8100/api/nl43/_polling/status
 ```
+### Check Connection Pool Status
+```bash
+curl http://localhost:8100/api/nl43/_connections/status | jq '.'
+```
+### Flush All Cached Connections
+```bash
+curl -X POST http://localhost:8100/api/nl43/_connections/flush
+```
 ### Verify Device Settings
 ```bash
 curl http://localhost:8100/api/nl43/meter-001/settings
@@ -388,11 +429,19 @@ See [API.md](API.md) for detailed integration examples.
 ## Troubleshooting
 ### Connection Issues
+- Check connection pool status: `curl http://localhost:8100/api/nl43/_connections/status`
+- Flush stale connections: `curl -X POST http://localhost:8100/api/nl43/_connections/flush`
 - Verify device IP address and port in configuration
 - Ensure device is on the same network
 - Check firewall rules allow TCP/FTP connections
 - Verify RX55 network adapter is properly configured on device
+### Cellular Modem Issues
+- If modem wedges from too many handshakes, ensure `TCP_PERSISTENT_ENABLED=true` (default)
+- Increase `TCP_IDLE_TTL` if connections expire between poll cycles
+- Keepalive probes (default: every 15s) keep NAT tables alive — adjust `TCP_KEEPALIVE_IDLE` if needed
+- Set `TCP_PERSISTENT_ENABLED=false` to disable pooling for debugging
 ### Rate Limiting
 - API automatically enforces 1-second delay between commands
 - If experiencing delays, this is normal device behavior
````
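For scripts or sidecar tooling that need to mirror the pool settings, the documented environment variables and defaults can be loaded with a small helper. This is a sketch: `PoolConfig` is a hypothetical class for illustration, not part of SLMM.

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class PoolConfig:
    """Hypothetical mirror of SLMM's documented TCP_* env vars and defaults."""
    enabled: bool = True          # TCP_PERSISTENT_ENABLED
    idle_ttl: int = 300           # TCP_IDLE_TTL
    max_age: int = 1800           # TCP_MAX_AGE
    keepalive_idle: int = 15      # TCP_KEEPALIVE_IDLE
    keepalive_interval: int = 10  # TCP_KEEPALIVE_INTERVAL
    keepalive_count: int = 3      # TCP_KEEPALIVE_COUNT

    @classmethod
    def from_env(cls, env=os.environ) -> "PoolConfig":
        # Any mapping works here, which makes the loader easy to unit-test.
        return cls(
            enabled=env.get("TCP_PERSISTENT_ENABLED", "true").lower() != "false",
            idle_ttl=int(env.get("TCP_IDLE_TTL", 300)),
            max_age=int(env.get("TCP_MAX_AGE", 1800)),
            keepalive_idle=int(env.get("TCP_KEEPALIVE_IDLE", 15)),
            keepalive_interval=int(env.get("TCP_KEEPALIVE_INTERVAL", 10)),
            keepalive_count=int(env.get("TCP_KEEPALIVE_COUNT", 3)),
        )
```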

New file (wedge investigation notes):

@@ -0,0 +1,403 @@
# NL-43 + RX55 TCP “Wedge” Investigation (2255 Refusal) — Full Log & Next Steps
**Last updated:** 2026-02-18
**Owner:** Brian / serversdown
**Context:** Terra-View / SLMM / field-deployed Rion NL-43 behind Sierra Wireless RX55
---
## 0) What this document is
This is a **comprehensive, chronological** record of the debugging we did to isolate a failure where the **NL-43's TCP control port (2255) eventually stops accepting connections** (“wedges”), while other services (notably FTP/21) remain reachable.
This is written to be fed back into future troubleshooting, so it intentionally includes the **full reasoning chain, experiments, commands, packet evidence, and conclusions**.
---
## 1) Architecture (as tested)
### Network path
- **Server (SLMM host):** `10.0.0.40`
- **RX55 WAN IP:** `63.45.161.30`
- **RX55 LAN subnet:** `192.168.1.0/24`
- **RX55 LAN gateway:** `192.168.1.1`
- **NL-43 LAN IP:** `192.168.1.10` (confirmed via ARP OUI + ping; see LAN validation)
### RX55 details
- **Sierra Wireless RX55**
- **OS:** 5.2
- **Firmware:** `01.14.24.00`
- **Carrier:** Verizon LTE (Band 66)
### Port forwarding rules (RX55)
- **WAN:2255 → NL-43:2255** (NL-43 TCP control)
- **WAN:21 → NL-43:21** (NL-43 FTP control)
You also experimented with additional forwards:
- **WAN:2253 → NL-43:2255** (test)
- **WAN:2253 → NL-43:2253** (test)
- **WAN:4450 → NL-43:4450** (test)
**Important:** Rule “Input zone / interface” was set to **WAN-NAT**, and Source IP left as **Any IPv4**. This is correct for inbound port-forward behavior on Sierra OS 5.x.
---
## 2) Original problem statement (the “wedge”)
After running for hours, the NL-43 becomes unreachable over TCP control.
### Symptom signature (WAN-side)
- Client attempts to connect to `63.45.161.30:2255`
- Instead of timing out, the client gets **connection refused** quickly.
- Packet-level: SYN from client → **RST,ACK** back (meaning active refusal vs silent drop)
### Critical operational behavior
- **Power cycling the NL-43 fixes it.**
- **Power cycling the RX55 does NOT fix it.**
- FTP sometimes remains available even while TCP control (2255) is dead.
This combination is what forced us to determine whether:
- The RX55 is rejecting connections, OR
- The NL-43 is no longer listening on 2255, OR
- Something about the RX55 path triggers the NL-43's control listener to die.
---
## 3) Event timeline evidence (SLMM logs)
A concrete wedge window was observed on **2026-02-18**:
- 10:55:46 AM — Poll success (Start)
- 11:00:28 AM — Measurement STOPPED (scheduled stop/download cycle succeeded)
- 11:55:50 AM — Poll success (Stop)
- 12:55:55 PM — Poll success (Stop)
- **1:55:58 PM — Poll failed (attempt 1/3): Errno 111 (connection refused)**
- 2:56:02 PM — Poll failed (attempt 2/3): Errno 111 (connection refused)
Key interpretation:
- The wedge occurred sometime between **12:55 and 1:55**.
- The failure type is **refused**, not timeout.
---
## 4) Early hypotheses (before proof)
We considered two main buckets:
### A) NL-43-side failure (most suspicious)
- NL-43 TCP control service crashes / exits / unbinds from 2255
- socket leak / accept backlog exhaustion
- “single control session allowed” and it gets stuck thinking a session is active
- mode/service manager bug (service restart fails after other activities)
- firmware bug in TCP daemon
### B) RX55-side failure (possible trigger / less likely once FTP works)
- NAT/forwarding table corruption
- firewall behavior
- helper/ALG interference
- MSS/MTU weirdness causing edge-case behavior
- session churn behavior causing downstream issues
---
## 5) Key experiments and what they proved
### 5.1) LAN-only stability test (No RX55 path)
**Test:** NL-43 tested directly on LAN (no modem path involved).
- Ran **24+ hours**
- Scheduler start/stop cycles worked
- Stress test: **500 commands @ 1/sec** → no failure
- Response time trend decreased (not degrading)
**Result:** The NL-43 appears stable in a “pure LAN” environment.
**Interpretation:** The trigger is likely related to the RX55/WAN environment, connection patterns, or service switching patterns—not just simple uptime.
---
### 5.2) Port-forward behavior: timeout vs refused (RX55 behavior characterization)
You observed:
- **If a WAN port is NOT forwarded (no rule):** connecting to that port **times out** (silent drop)
- **If a WAN port IS forwarded to NL-43 but nothing listens:** it **actively refuses** (RST)
Concrete example:
- Port **4450** with no rule → timeout
- Port **4450 → NL-43:4450** rule created → connection refused
**Interpretation:** This confirms the RX55 is actually forwarding packets to the NL-43 when a rule exists. “Refused” is consistent with the NL-43 (or RX55 relay behavior) responding quickly because the packet reached the target.
Important nuance:
- A “refused” on forwarded ports does **not** automatically prove the NL-43 is the one generating RST, because NAT hides the inside host and the RX55 could reject on behalf of an unreachable target. We needed a LAN-side proof test to close the loop.
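The refused-vs-timeout distinction above is easy to reproduce from any client with a few lines of Python; a minimal sketch using only the standard `socket` module:

```python
import socket

def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify a TCP port the way section 5.2 does:
    'open'    -> full handshake completed (listener present)
    'refused' -> SYN answered with RST (packet reached a host, no listener)
    'timeout' -> SYN silently dropped (no forwarding rule / filtered)"""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(timeout)
    try:
        s.connect((host, port))
        return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    finally:
        s.close()
```

Run against `63.45.161.30` this would distinguish the three WAN behaviors observed (forwarded-and-listening, forwarded-but-dead, and no rule) in one pass.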
---
### 5.3) UDP test confusion (and resolution)
You ran:
```bash
nc -vzu 63.45.161.30 2255
nc -vz 63.45.161.30 2255
```
Observed:
- UDP: “succeeded”
- TCP: “connection refused”
Resolution:
- UDP has **no handshake**. netcat prints “succeeded” if it doesn't immediately receive an ICMP unreachable. It does **not** mean a UDP service exists.
- TCP refused is meaningful: a RST implies “no listener” or “actively rejected.”
**Net effect:** UDP test did not change the diagnosis.
---
### 5.4) Packet capture proof (WAN-side)
You captured a Wireshark/tcpdump summary with these key patterns:
#### Port 2255 (TCP control)
Example:
- `10.0.0.40 → 63.45.161.30:2255` SYN
- `63.45.161.30 → 10.0.0.40` **RST, ACK** within ~50ms
This happened repeatedly.
#### Port 2253 (test port)
Multiple SYN attempts to 2253 showed **retransmissions and no response**, i.e., **silent drop** (consistent with no rule or not forwarded at that moment).
#### Port 21 (FTP)
Clean 3-way handshake:
- SYN → SYN/ACK → ACK
Then:
- FTP server banner: `220 Connection Ready`
Then:
- `530 Not logged in` (because SLMM was sending non-FTP “requests” as an experiment)
Session closes cleanly.
**Key takeaway from capture:**
- TCP transport to NL-43 via RX55 is definitely working (port 21 proves it).
- Port 2255 is being actively refused.
This strongly suggested “2255 listener is gone,” but still didn't fully prove whether the refusal was generated internally by the NL-43 or by the RX55 on behalf of the NL-43.
---
## 6) The decisive experiment: LAN-side test while wedged (final proof)
Because the RX55 does not offer SSH, the plan was to test from **inside the LAN behind the RX55**.
### 6.1) Physical LAN tap setup
Constraint:
- NL-43 has only one Ethernet port.
Solution:
- Insert an unmanaged switch:
- RX55 LAN → switch
- NL-43 → switch
- Windows 10 laptop → switch
This creates a shared L2 segment where the laptop can test NL-43 directly.
### 6.2) Windows LAN validation
On the Windows laptop:
- `ipconfig` showed:
- IP: `192.168.1.100`
- Gateway: `192.168.1.1` (RX55)
- Initial `arp -a` only showed RX55, not NL-43.
You then:
- pinged likely host addresses and discovered NL-43 responds on **192.168.1.10**
- `arp -a` then showed:
- `192.168.1.10 → 00-10-50-14-0a-d8`
- OUI `00-10-50` recognized as **Rion** (matches NL-43)
So LAN identities were confirmed:
- RX55: `192.168.1.1`
- NL-43: `192.168.1.10`
### 6.3) The LAN port tests (the smoking gun)
From Windows:
```powershell
Test-NetConnection -ComputerName 192.168.1.10 -Port 2255
Test-NetConnection -ComputerName 192.168.1.10 -Port 21
```
Results (while the unit was “wedged” from the WAN perspective):
- **2255:** `TcpTestSucceeded : False`
- **21:** `TcpTestSucceeded : True`
**Conclusion (PROVEN):**
- The NL-43 is reachable on the LAN
- FTP port 21 is alive
- **The NL-43 is NOT listening on TCP port 2255**
- Therefore the RX55 is not the root cause of the refusal. The WAN refusal is consistent with the NL-43 having no listener on 2255.
This is now settled.
---
## 7) What we learned (final conclusions)
### 7.1) RX55 innocence (for this failure mode)
The RX55 is not “randomly rejecting” or “breaking TCP” in the way originally feared.
It successfully forwards and supports TCP to the NL-43 on port 21, and the LAN-side test proves the 2255 failure exists *even without NAT/WAN involvement*.
### 7.2) NL-43 control listener failure
The NL-43's TCP control service (port 2255) stops listening while:
- the device remains alive
- the LAN stack remains alive (ping)
- FTP remains alive (port 21)
This looks like one of:
- control daemon crash/exit
- service unbind
- stuck service state (e.g., “busy” / “session active forever”)
- resource leak (sockets/file descriptors) specific to the control service
- firmware service manager bug (start/stop of services fails after certain sequences)
---
## 8) Additional constraint discovered: “Web App mode” conflicts
You noted an important operational constraint:
> Turning on the web app disables other interfaces like TCP and FTP.
Meaning the NL-43 appears to have mutually exclusive service/mode behavior (or at least serious conflicts). That matters because:
- If any workflow toggles modes (explicitly or implicitly), it could destabilize the service lifecycle.
- It reduces the possibility of using “web UI toggle” as an easy remote recovery mechanism **if** it disables the services needed.
We have not yet run a controlled long test to determine whether:
- mode switching contributes directly to the 2255 listener dying, OR
- it happens even in a pure TCP-only mode with no switching.
---
## 9) Immediate operational decision (field tomorrow)
Because the device is needed in the field immediately, you chose:
- **Old-school manual deployment**
- **Manual SD card downloads**
- Avoid reliance on 2255/TCP control and remote workflows for now.
**Important operational note:**
The 2255 listener dying does not necessarily stop the NL-43 from measuring; it primarily breaks remote control/polling. Manual SD workflow sidesteps the entire remote control dependency.
---
## 10) What's next (future work — when the unit is back)
Because long tests can't be run before tomorrow, the plan is to resume in a few weeks with controlled experiments designed to isolate the trigger and develop an operational mitigation.
### 10.1) Controlled experiment matrix (recommended)
Run each test for 24-72 hours, or until a wedge occurs, and record:
- number of TCP connects
- whether connections are persistent
- whether FTP is used
- whether any mode toggling is performed
- time-to-wedge
#### Test A — TCP-only (ideal baseline)
- TCP control only (2255)
- **True persistent connection** (open once, keep forever)
- No FTP
- No web mode toggling
Outcome interpretation:
- If stable: connection churn and/or FTP/mode switching is the trigger.
- If wedges anyway: pure 2255 daemon leak/bug.
#### Test B — TCP with connection churn
- Same as A but intentionally reconnect on a schedule (current SLMM behavior)
- No FTP
Outcome:
- If this wedges but A doesn't: churn is the trigger.
#### Test C — FTP activity + TCP
- Introduce scheduled FTP sessions (downloads) while using TCP control
- Observe whether wedge correlates with FTP use or with post-download periods.
Outcome:
- If wedge correlates with FTP, suspect internal service lifecycle conflict.
#### Test D — Web mode interaction (only if safe/possible)
- Evaluate what toggling web mode does to TCP/FTP services.
- Determine if any remote-safe “soft reset” exists.
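Test B's connection churn can be harnessed with a short script; a sketch (the `DOD?` command string and timings are placeholders for illustration, not a claim about NL-43 protocol framing):

```python
import socket
import time

def churn_test(host: str, port: int, cycles: int, delay: float = 0.0):
    """Open a fresh TCP connection per cycle (the churn pattern under test),
    send one small payload, and close. Returns (completed_cycles, error):
    error is None if every cycle succeeded, otherwise the OSError from the
    first failed cycle -- i.e. the moment the port wedged."""
    for i in range(cycles):
        try:
            with socket.create_connection((host, port), timeout=5) as s:
                s.sendall(b"DOD?\r\n")  # placeholder payload
        except OSError as e:
            return i, e
        time.sleep(delay)
    return cycles, None
```

Logging `completed_cycles` alongside wall-clock time-to-wedge gives exactly the "number of TCP connects" and "time-to-wedge" data points the experiment matrix calls for.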
---
## 11) Mitigation options (ranked)
### Option 1 — Make SLMM truly persistent (highest probability of success)
If the NL-43 wedges due to session churn or leaked socket states, the best mitigation is:
- Open one TCP socket per device
- Keep it open indefinitely
- Use OS keepalive
- Do **not** rotate connections on timers
- Reconnect only when the socket actually dies
This reduces:
- connect/close cycles
- NAT edge-case exposure
- resource churn inside NL-43
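The OS keepalive piece of Option 1 looks roughly like this (a sketch using Linux socket option names; the 15/10/3 values mirror SLMM's v0.3 defaults, and other platforms spell these options differently):

```python
import socket

def enable_keepalive(sock: socket.socket, idle: int = 15,
                     interval: int = 10, count: int = 3) -> None:
    """Turn on OS-level TCP keepalive on an existing socket (Linux names).
    With these values the kernel probes after `idle` seconds of silence,
    every `interval` seconds, and declares the peer dead after `count`
    unanswered probes -- at which point the app reconnects."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
```

Because the probes are empty ACK segments generated by the kernel, they keep the cellular NAT entry warm without sending any application-level commands to the NL-43.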
### Option 2 — Service “soft reset” (if possible without disabling required services)
If there exists any way to restart the 2255 service without power cycling:
- LAN TCP toggle (if it doesn't require web mode)
- any “restart comms” command (unknown)
- any maintenance menu sequence
then SLMM could:
- detect wedge
- trigger soft reset
- recover automatically
Current constraint: web app mode appears to disable other services, so this may not be viable.
### Option 3 — Hardware watchdog power cycle (industrial but reliable)
If this is a firmware bug with no clean workaround:
- Add a remotely controlled relay/power switch
- On wedge detection, power-cycle NL-43 automatically
- Optionally schedule a nightly power cycle to prevent leak accumulation
This is “field reality” and often the only long-term move with embedded devices.
### Option 4 — Vendor escalation (Rion)
You now have excellent evidence:
- LAN-side proof: 2255 dead while 21 alive
- WAN packet evidence
- clear isolation of RX55 innocence
This is strong enough to send to Rion support as a firmware defect report.
---
## 12) Repro “wedge bundle” checklist (for future captures)
When the wedge happens again, capture these before power cycling:
1) From server:
- `nc -vz 63.45.161.30 2255` (expect refused)
- `nc -vz 63.45.161.30 21` (expect success if FTP alive)
2) From LAN side (via switch/laptop):
- `Test-NetConnection 192.168.1.10 -Port 2255`
- `Test-NetConnection 192.168.1.10 -Port 21`
3) Optional: packet capture around the refused attempt.
4) Record:
- last successful poll timestamp
- last FTP session timestamp
- any scheduled start/stop/download cycles near wedge time
- SLMM connection reuse/rotation settings in effect
---
## 13) Final, current-state summary (as of 2026-02-18)
- The issue is **NOT** the RX55 rejecting inbound connections.
- The NL-43 is **alive**, reachable on LAN, and FTP works.
- The NL-43's **TCP control listener on 2255 stops listening** while the device remains otherwise healthy.
- The wedge can occur hours after successful operations.
- The unit is needed in the field immediately, so investigation pauses.
- Next phase: controlled tests to isolate trigger + implement mitigation (persistent socket or watchdog reset).
---
## 14) Notes / misc observations
- The Wireshark trace showed repeated FTP sessions were opened and closed cleanly, but SLMM's “FTP requests” were not valid FTP (causing `530 Not logged in`). That was part of experimentation, not a normal workflow.
- UDP “success” via netcat is not meaningful because UDP has no handshake; it simply indicates no ICMP unreachable was returned.
---
**End of document.**


@@ -38,6 +38,7 @@ class BackgroundPoller:
         self._running = False
         self._logger = logger
         self._last_cleanup = None  # Track last log cleanup time
+        self._last_pool_log = None  # Track last connection pool heartbeat log

     async def start(self):
         """Start the background polling task."""
@@ -89,6 +90,24 @@ class BackgroundPoller:
             except Exception as e:
                 self._logger.warning(f"Log cleanup failed: {e}")

+            # Log connection pool status every 15 minutes
+            try:
+                now = datetime.utcnow()
+                if self._last_pool_log is None or (now - self._last_pool_log).total_seconds() > 900:
+                    from app.services import _connection_pool
+                    stats = _connection_pool.get_stats()
+                    conns = stats.get("connections", {})
+                    if conns:
+                        for key, c in conns.items():
+                            self._logger.info(
+                                f"[POOL] {key} — age={c['age_seconds']}s idle={c['idle_seconds']}s alive={c['alive']}"
+                            )
+                    else:
+                        self._logger.info("[POOL] No active connections in pool")
+                    self._last_pool_log = now
+            except Exception as e:
+                self._logger.warning(f"Pool status log failed: {e}")
+
             # Calculate dynamic sleep interval
             sleep_time = self._calculate_sleep_interval()
             self._logger.debug(f"Sleeping for {sleep_time} seconds until next poll cycle")


@@ -29,7 +29,11 @@ logger.info("Database tables initialized")
 @asynccontextmanager
 async def lifespan(app: FastAPI):
     """Manage application lifecycle - startup and shutdown events."""
+    from app.services import _connection_pool
+
     # Startup
+    logger.info("Starting TCP connection pool cleanup task...")
+    _connection_pool.start_cleanup()
     logger.info("Starting background poller...")
     await poller.start()
     logger.info("Background poller started")
@@ -40,12 +44,15 @@ async def lifespan(app: FastAPI):
     logger.info("Stopping background poller...")
     await poller.stop()
     logger.info("Background poller stopped")
+    logger.info("Closing TCP connection pool...")
+    await _connection_pool.close_all()
+    logger.info("TCP connection pool closed")

 app = FastAPI(
     title="SLMM NL43 Addon",
     description="Standalone module for NL43 configuration and status APIs with background polling",
-    version="0.2.0",
+    version="0.3.0",
     lifespan=lifespan,
 )
@@ -85,10 +92,14 @@ async def health():
 @app.get("/health/devices")
 async def health_devices():
-    """Enhanced health check that tests device connectivity."""
+    """Enhanced health check that tests device connectivity.
+
+    Uses the connection pool to avoid unnecessary TCP handshakes — if a
+    cached connection exists and is alive, the device is reachable.
+    """
     from sqlalchemy.orm import Session
     from app.database import SessionLocal
-    from app.services import NL43Client
+    from app.services import _connection_pool
     from app.models import NL43Config

     db: Session = SessionLocal()
@@ -98,7 +109,7 @@ async def health_devices():
         configs = db.query(NL43Config).filter_by(tcp_enabled=True).all()
         for cfg in configs:
-            client = NL43Client(cfg.host, cfg.tcp_port, timeout=2.0, ftp_username=cfg.ftp_username, ftp_password=cfg.ftp_password)
+            device_key = f"{cfg.host}:{cfg.tcp_port}"
             status = {
                 "unit_id": cfg.unit_id,
                 "host": cfg.host,
@@ -108,14 +119,22 @@ async def health_devices():
             }
             try:
-                # Try to connect (don't send command to avoid rate limiting issues)
-                import asyncio
-                reader, writer = await asyncio.wait_for(
-                    asyncio.open_connection(cfg.host, cfg.tcp_port), timeout=2.0
-                )
-                writer.close()
-                await writer.wait_closed()
-                status["reachable"] = True
+                # Check if pool already has a live connection (zero-cost check)
+                pool_stats = _connection_pool.get_stats()
+                conn_info = pool_stats["connections"].get(device_key)
+                if conn_info and conn_info["alive"]:
+                    status["reachable"] = True
+                    status["source"] = "pool"
+                else:
+                    # No cached connection — do a lightweight acquire/release
+                    # This opens a connection if needed but keeps it in the pool
+                    import asyncio
+                    reader, writer, from_cache = await _connection_pool.acquire(
+                        device_key, cfg.host, cfg.tcp_port, timeout=2.0
+                    )
+                    await _connection_pool.release(device_key, reader, writer, cfg.host, cfg.tcp_port)
+                    status["reachable"] = True
+                    status["source"] = "cached" if from_cache else "new"
             except Exception as e:
                 status["error"] = str(type(e).__name__)
                 logger.warning(f"Device {cfg.unit_id} health check failed: {e}")


@@ -93,6 +93,34 @@ class PollingConfigPayload(BaseModel):
     poll_enabled: bool | None = Field(None, description="Enable or disable background polling for this device")

+# ============================================================================
+# TCP CONNECTION POOL ENDPOINTS (must be before /{unit_id} routes)
+# ============================================================================
+
+@router.get("/_connections/status")
+async def get_connection_pool_status():
+    """Get status of the persistent TCP connection pool.
+
+    Returns information about cached connections, keepalive settings,
+    and per-device connection age/idle times.
+    """
+    from app.services import _connection_pool
+    return {"status": "ok", "pool": _connection_pool.get_stats()}
+
+
+@router.post("/_connections/flush")
+async def flush_connection_pool():
+    """Close all cached TCP connections.
+
+    Useful for debugging or forcing fresh connections to all devices.
+    """
+    from app.services import _connection_pool
+    await _connection_pool.close_all()
+    # Restart cleanup task since close_all cancels it
+    _connection_pool.start_cleanup()
+    return {"status": "ok", "message": "All cached connections closed"}
+
 # ============================================================================
 # GLOBAL POLLING STATUS ENDPOINT (must be before /{unit_id} routes)
 # ============================================================================
@@ -545,12 +573,6 @@ async def stop_measurement(unit_id: str, db: Session = Depends(get_db)):
     try:
         await client.stop()
         logger.info(f"Stopped measurement on unit {unit_id}")
-
-        # Query device status to update database with "Stop" state
-        snap = await client.request_dod()
-        snap.unit_id = unit_id
-        persist_snapshot(snap, db)
     except ConnectionError as e:
         logger.error(f"Failed to stop measurement on {unit_id}: {e}")
         raise HTTPException(status_code=502, detail="Failed to communicate with device")
@@ -560,6 +582,15 @@ async def stop_measurement(unit_id: str, db: Session = Depends(get_db)):
     except Exception as e:
         logger.error(f"Unexpected error stopping measurement on {unit_id}: {e}")
         raise HTTPException(status_code=500, detail="Internal server error")

+    # Query device status to update database — non-fatal if this fails
+    try:
+        snap = await client.request_dod()
+        snap.unit_id = unit_id
+        persist_snapshot(snap, db)
+    except Exception as e:
+        logger.warning(f"Stop succeeded but failed to update status for {unit_id}: {e}")
+
     return {"status": "ok", "message": "Measurement stopped"}
@@ -657,8 +688,9 @@ async def stop_cycle(unit_id: str, payload: StopCyclePayload = None, db: Session
         return {"status": "ok", "unit_id": unit_id, **result}
     except Exception as e:
-        logger.error(f"Stop cycle failed for {unit_id}: {e}")
-        raise HTTPException(status_code=502, detail=str(e))
+        error_msg = str(e) if str(e) else f"{type(e).__name__}: No details available"
+        logger.error(f"Stop cycle failed for {unit_id}: {error_msg}")
+        raise HTTPException(status_code=502, detail=error_msg)

 @router.post("/{unit_id}/store")
@@ -1723,74 +1755,38 @@ async def run_diagnostics(unit_id: str, db: Session = Depends(get_db)):
             "message": "TCP communication enabled"
         }

-        # Test 3: Modem/Router reachable (check port 443 HTTPS)
+        # Test 3: TCP connection reachable (device port) — uses connection pool
+        # This avoids extra TCP handshakes over cellular. If a cached connection
+        # exists and is alive, we skip the handshake entirely.
+        from app.services import _connection_pool
+        device_key = f"{cfg.host}:{cfg.tcp_port}"
         try:
-            reader, writer = await asyncio.wait_for(
-                asyncio.open_connection(cfg.host, 443), timeout=3.0
-            )
-            writer.close()
-            await writer.wait_closed()
-            diagnostics["tests"]["modem_reachable"] = {
-                "status": "pass",
-                "message": f"Modem/router reachable at {cfg.host}"
-            }
-        except asyncio.TimeoutError:
-            diagnostics["tests"]["modem_reachable"] = {
-                "status": "fail",
-                "message": f"Modem/router timeout at {cfg.host} (network issue)"
-            }
-            diagnostics["overall_status"] = "fail"
-            return diagnostics
-        except ConnectionRefusedError:
-            # Connection refused means host is up but port 443 closed - that's ok
-            diagnostics["tests"]["modem_reachable"] = {
-                "status": "pass",
-                "message": f"Modem/router reachable at {cfg.host} (HTTPS closed)"
-            }
-        except Exception as e:
-            diagnostics["tests"]["modem_reachable"] = {
-                "status": "fail",
-                "message": f"Cannot reach modem/router at {cfg.host}: {str(e)}"
-            }
-            diagnostics["overall_status"] = "fail"
-            return diagnostics
-
-        # Test 4: TCP connection reachable (device port)
-        try:
-            reader, writer = await asyncio.wait_for(
-                asyncio.open_connection(cfg.host, cfg.tcp_port), timeout=3.0
-            )
-            writer.close()
-            await writer.wait_closed()
-            diagnostics["tests"]["tcp_connection"] = {
-                "status": "pass",
-                "message": f"TCP connection successful to {cfg.host}:{cfg.tcp_port}"
-            }
-        except asyncio.TimeoutError:
-            diagnostics["tests"]["tcp_connection"] = {
-                "status": "fail",
-                "message": f"Connection timeout to {cfg.host}:{cfg.tcp_port}"
-            }
-            diagnostics["overall_status"] = "fail"
-            return diagnostics
-        except ConnectionRefusedError:
-            diagnostics["tests"]["tcp_connection"] = {
-                "status": "fail",
-                "message": f"Connection refused by {cfg.host}:{cfg.tcp_port}"
-            }
-            diagnostics["overall_status"] = "fail"
-            return diagnostics
+            pool_stats = _connection_pool.get_stats()
+            conn_info = pool_stats["connections"].get(device_key)
+            if conn_info and conn_info["alive"]:
+                # Pool already has a live connection — device is reachable
+                diagnostics["tests"]["tcp_connection"] = {
+                    "status": "pass",
+                    "message": f"TCP connection alive in pool for {cfg.host}:{cfg.tcp_port}"
+                }
+            else:
+                # Acquire through the pool (opens new if needed, keeps it cached)
+                reader, writer, from_cache = await _connection_pool.acquire(
+                    device_key, cfg.host, cfg.tcp_port, timeout=3.0
+                )
+                await _connection_pool.release(device_key, reader, writer, cfg.host, cfg.tcp_port)
+                diagnostics["tests"]["tcp_connection"] = {
+                    "status": "pass",
+                    "message": f"TCP connection successful to {cfg.host}:{cfg.tcp_port}"
+                }
        except Exception as e:
             diagnostics["tests"]["tcp_connection"] = {
                 "status": "fail",
-                "message": f"Connection error: {str(e)}"
+                "message": f"Connection error to {cfg.host}:{cfg.tcp_port}: {str(e)}"
             }
             diagnostics["overall_status"] = "fail"
             return diagnostics

-        # Wait a bit after connection test to let device settle
-        await asyncio.sleep(1.5)

         # Test 5: Device responds to commands
         # Use longer timeout to account for rate limiting (device requires ≥1s between commands)
         client = NL43Client(cfg.host, cfg.tcp_port, timeout=10.0, ftp_username=cfg.ftp_username, ftp_password=cfg.ftp_password)


@@ -1,20 +1,22 @@
 """
 NL43 TCP connector and snapshot persistence.

-Implements simple per-request TCP calls to avoid long-lived socket complexity.
-Extend to pooled connections/DRD streaming later.
+Implements persistent per-device TCP connections with OS-level keepalive
+to reduce handshake overhead and survive cellular modem NAT timeouts.
+Falls back to per-request connections on error with transparent retry.
 """

 import asyncio
 import contextlib
 import logging
+import socket
 import time
 import os
 import zipfile
 import tempfile
-from dataclasses import dataclass
+from dataclasses import dataclass, field
 from datetime import datetime, timezone, timedelta
-from typing import Optional, List, Dict
+from typing import Optional, List, Dict, Tuple
 from sqlalchemy.orm import Session
 from ftplib import FTP
 from pathlib import Path
@@ -234,6 +236,293 @@ async def _get_device_lock(device_key: str) -> asyncio.Lock:
     return _device_locks[device_key]

+
+# ---------------------------------------------------------------------------
+# Persistent TCP connection pool with OS-level keepalive
+# ---------------------------------------------------------------------------
+
+# Configuration via environment variables
+TCP_PERSISTENT_ENABLED = os.getenv("TCP_PERSISTENT_ENABLED", "true").lower() == "true"
+TCP_IDLE_TTL = float(os.getenv("TCP_IDLE_TTL", "300"))  # Close idle connections after N seconds
+TCP_MAX_AGE = float(os.getenv("TCP_MAX_AGE", "1800"))  # Force reconnect after N seconds
+TCP_KEEPALIVE_IDLE = int(os.getenv("TCP_KEEPALIVE_IDLE", "15"))  # Seconds idle before probes
+TCP_KEEPALIVE_INTERVAL = int(os.getenv("TCP_KEEPALIVE_INTERVAL", "10"))  # Seconds between probes
+TCP_KEEPALIVE_COUNT = int(os.getenv("TCP_KEEPALIVE_COUNT", "3"))  # Failed probes before dead
+
+logger.info(
+    f"TCP connection pool: persistent={TCP_PERSISTENT_ENABLED}, "
+    f"idle_ttl={TCP_IDLE_TTL}s, max_age={TCP_MAX_AGE}s, "
+    f"keepalive_idle={TCP_KEEPALIVE_IDLE}s, keepalive_interval={TCP_KEEPALIVE_INTERVAL}s, "
+    f"keepalive_count={TCP_KEEPALIVE_COUNT}"
+)
+
+
+@dataclass
+class DeviceConnection:
+    """Tracks a cached TCP connection and its metadata."""
+    reader: asyncio.StreamReader
+    writer: asyncio.StreamWriter
+    device_key: str
+    host: str
+    port: int
+    created_at: float = field(default_factory=time.time)
+    last_used_at: float = field(default_factory=time.time)
+
+
+class ConnectionPool:
+    """Per-device persistent TCP connection cache with OS-level keepalive.
+
+    Each NL-43 device supports only one TCP connection at a time. This pool
+    caches that single connection per device key and reuses it across commands,
+    avoiding repeated TCP handshakes over high-latency cellular links.
+    Keepalive probes keep cellular NAT tables alive and detect dead connections
+    before the next command attempt.
+    """
+
+    def __init__(
+        self,
+        enable_persistent: bool = True,
+        idle_ttl: float = 120.0,
+        max_age: float = 300.0,
+        keepalive_idle: int = 15,
+        keepalive_interval: int = 10,
+        keepalive_count: int = 3,
+    ):
+        self._connections: Dict[str, DeviceConnection] = {}
+        self._lock = asyncio.Lock()
+        self._enable_persistent = enable_persistent
+        self._idle_ttl = idle_ttl
+        self._max_age = max_age
+        self._keepalive_idle = keepalive_idle
+        self._keepalive_interval = keepalive_interval
+        self._keepalive_count = keepalive_count
+        self._cleanup_task: Optional[asyncio.Task] = None
+
+    # -- lifecycle ----------------------------------------------------------
+
+    def start_cleanup(self):
+        """Start background task that evicts stale connections."""
+        if self._enable_persistent and self._cleanup_task is None:
+            self._cleanup_task = asyncio.create_task(self._cleanup_loop())
+            logger.info("Connection pool cleanup task started")
+
+    async def close_all(self):
+        """Close all cached connections (called at shutdown)."""
+        if self._cleanup_task is not None:
+            self._cleanup_task.cancel()
+            with contextlib.suppress(asyncio.CancelledError):
+                await self._cleanup_task
+            self._cleanup_task = None
+        async with self._lock:
+            for key, conn in list(self._connections.items()):
+                await self._close_connection(conn, reason="shutdown")
+            self._connections.clear()
+        logger.info("Connection pool: all connections closed")
+
+    # -- public API ---------------------------------------------------------
+
+    async def acquire(
+        self, device_key: str, host: str, port: int, timeout: float
+    ) -> Tuple[asyncio.StreamReader, asyncio.StreamWriter, bool]:
+        """Get a connection for a device (cached or fresh).
+
+        Returns:
+            (reader, writer, from_cache) — from_cache is True if reused.
+        """
+        if self._enable_persistent:
+            async with self._lock:
+                conn = self._connections.pop(device_key, None)
+                if conn is not None:
+                    if self._is_alive(conn):
+                        self._drain_buffer(conn.reader)
+                        conn.last_used_at = time.time()
+                        logger.info(f"Pool hit for {device_key} (age={time.time() - conn.created_at:.0f}s)")
+                        return conn.reader, conn.writer, True
+                    else:
+                        await self._close_connection(conn, reason="stale")
+
+        # Open fresh connection
+        reader, writer = await self._open_connection(host, port, timeout)
+        logger.info(f"New connection opened for {device_key}")
+        return reader, writer, False
+
+    async def release(self, device_key: str, reader: asyncio.StreamReader, writer: asyncio.StreamWriter, host: str, port: int):
+        """Return a connection to the pool for reuse."""
+        if not self._enable_persistent:
+            self._close_writer(writer)
+            return
+
+        # Check transport is still healthy before caching
+        if writer.transport.is_closing() or reader.at_eof():
+            self._close_writer(writer)
+            return
+
+        conn = DeviceConnection(
+            reader=reader,
+            writer=writer,
+            device_key=device_key,
+            host=host,
+            port=port,
+        )
+        async with self._lock:
+            # Evict any existing connection for this device (shouldn't happen
+            # under normal locking, but be safe)
+            old = self._connections.pop(device_key, None)
+            if old is not None:
+                await self._close_connection(old, reason="replaced")
+            self._connections[device_key] = conn
+
+    async def discard(self, device_key: str):
+        """Close and remove a connection from the pool (called on errors)."""
+        async with self._lock:
+            conn = self._connections.pop(device_key, None)
+            if conn is not None:
+                await self._close_connection(conn, reason="discarded")
+                logger.debug(f"Pool discard for {device_key}")
+
+    def get_stats(self) -> dict:
+        """Return pool status for diagnostics."""
+        now = time.time()
+        connections = {}
+        for key, conn in self._connections.items():
+            connections[key] = {
+                "host": conn.host,
+                "port": conn.port,
+                "age_seconds": round(now - conn.created_at, 1),
+                "idle_seconds": round(now - conn.last_used_at, 1),
+                "alive": self._is_alive(conn),
+            }
+        return {
+            "enabled": self._enable_persistent,
+            "active_connections": len(self._connections),
+            "idle_ttl": self._idle_ttl,
+            "max_age": self._max_age,
+            "keepalive_idle": self._keepalive_idle,
+            "keepalive_interval": self._keepalive_interval,
+            "keepalive_count": self._keepalive_count,
+            "connections": connections,
+        }
+
+    # -- internals ----------------------------------------------------------
+
+    async def _open_connection(
+        self, host: str, port: int, timeout: float
+    ) -> Tuple[asyncio.StreamReader, asyncio.StreamWriter]:
+        """Open a new TCP connection with keepalive options set."""
+        try:
+            reader, writer = await asyncio.wait_for(
+                asyncio.open_connection(host, port), timeout=timeout
+            )
+        except asyncio.TimeoutError:
+            raise ConnectionError(f"Failed to connect to device at {host}:{port}")
+        except Exception as e:
+            raise ConnectionError(f"Failed to connect to device: {e}")
+
+        # Set TCP keepalive on the underlying socket
+        self._set_keepalive(writer)
+        return reader, writer
+
+    def _set_keepalive(self, writer: asyncio.StreamWriter):
+        """Configure OS-level TCP keepalive on the connection socket."""
+        try:
+            sock = writer.transport.get_extra_info("socket")
+            if sock is None:
+                logger.warning("Could not access underlying socket for keepalive")
+                return
+            sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
+            # Linux-specific keepalive tuning
+            if hasattr(socket, "TCP_KEEPIDLE"):
+                sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, self._keepalive_idle)
+            if hasattr(socket, "TCP_KEEPINTVL"):
+                sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, self._keepalive_interval)
+            if hasattr(socket, "TCP_KEEPCNT"):
+                sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, self._keepalive_count)
+            logger.debug(
+                f"TCP keepalive set: idle={self._keepalive_idle}s, "
+                f"interval={self._keepalive_interval}s, count={self._keepalive_count}"
+            )
+        except OSError as e:
+            logger.warning(f"Failed to set TCP keepalive: {e}")
+
+    def _is_alive(self, conn: DeviceConnection) -> bool:
+        """Check whether a cached connection is still usable."""
+        now = time.time()
+        # Age / idle checks (value of -1 disables the check)
+        if self._idle_ttl >= 0 and now - conn.last_used_at > self._idle_ttl:
+            logger.debug(f"Connection {conn.device_key} idle too long ({now - conn.last_used_at:.0f}s > {self._idle_ttl}s)")
+            return False
+        if self._max_age >= 0 and now - conn.created_at > self._max_age:
+            logger.debug(f"Connection {conn.device_key} too old ({now - conn.created_at:.0f}s > {self._max_age}s)")
+            return False
+        # Transport-level checks
+        transport = conn.writer.transport
+        if transport.is_closing():
+            logger.debug(f"Connection {conn.device_key} transport is closing")
+            return False
+        if conn.reader.at_eof():
+            logger.debug(f"Connection {conn.device_key} reader at EOF")
+            return False
+        return True
+
+    @staticmethod
+    def _drain_buffer(reader: asyncio.StreamReader):
+        """Drain any pending bytes (e.g. '$' prompt) from an idle connection."""
+        buf = reader._buffer  # noqa: SLF001 — internal but stable across CPython
+        if buf:
+            pending = bytes(buf)
+            buf.clear()
+            logger.debug(f"Drained {len(pending)} bytes from cached connection: {pending!r}")
+
+    @staticmethod
+    def _close_writer(writer: asyncio.StreamWriter):
+        """Close a writer, suppressing errors."""
+        try:
+            writer.close()
+        except Exception:
+            pass
+
+    async def _close_connection(self, conn: DeviceConnection, reason: str = ""):
+        """Fully close a cached connection."""
+        logger.debug(f"Closing connection {conn.device_key} ({reason})")
+        conn.writer.close()
+        with contextlib.suppress(Exception):
+            await conn.writer.wait_closed()
+
+    async def _cleanup_loop(self):
+        """Periodically evict idle/expired connections."""
+        try:
+            while True:
+                await asyncio.sleep(30)
+                async with self._lock:
+                    for key in list(self._connections):
+                        conn = self._connections[key]
+                        if not self._is_alive(conn):
+                            del self._connections[key]
+                            await self._close_connection(conn, reason="cleanup")
+        except asyncio.CancelledError:
+            pass
+
+
+# Module-level pool singleton
+_connection_pool = ConnectionPool(
+    enable_persistent=TCP_PERSISTENT_ENABLED,
+    idle_ttl=TCP_IDLE_TTL,
+    max_age=TCP_MAX_AGE,
+    keepalive_idle=TCP_KEEPALIVE_IDLE,
+    keepalive_interval=TCP_KEEPALIVE_INTERVAL,
+    keepalive_count=TCP_KEEPALIVE_COUNT,
+)
 class NL43Client:
     def __init__(self, host: str, port: int, timeout: float = 5.0, ftp_username: str = None, ftp_password: str = None, ftp_port: int = 21):
         self.host = host
@@ -245,7 +534,12 @@ class NL43Client:
         self.device_key = f"{host}:{port}"

     async def _enforce_rate_limit(self):
-        """Ensure ≥1 second between commands to the same device."""
+        """Ensure ≥1 second between commands to the same device.
+
+        NL43 protocol requires ≥1s after the device responds before sending
+        the next command. The timestamp is recorded after each command completes
+        (connection closed), so we measure from completion, not from send time.
+        """
         async with _rate_limit_lock:
             last_time = _last_command_time.get(self.device_key, 0)
             elapsed = time.time() - last_time
@@ -253,7 +547,6 @@ class NL43Client:
                 wait_time = 1.0 - elapsed
                 logger.debug(f"Rate limiting: waiting {wait_time:.2f}s for {self.device_key}")
                 await asyncio.sleep(wait_time)
-            _last_command_time[self.device_key] = time.time()

     async def _send_command(self, cmd: str) -> str:
         """Send ASCII command to NL43 device via TCP.
@@ -271,23 +564,62 @@ class NL43Client:
             return await self._send_command_unlocked(cmd)

     async def _send_command_unlocked(self, cmd: str) -> str:
-        """Internal: send command without acquiring device lock (lock must be held by caller)."""
+        """Internal: send command without acquiring device lock (lock must be held by caller).
+
+        Uses the connection pool to reuse cached TCP connections when possible.
+        If a cached connection fails, retries once with a fresh connection.
+        """
         await self._enforce_rate_limit()

         logger.info(f"Sending command to {self.device_key}: {cmd.strip()}")
         try:
-            reader, writer = await asyncio.wait_for(
-                asyncio.open_connection(self.host, self.port), timeout=self.timeout
+            reader, writer, from_cache = await _connection_pool.acquire(
+                self.device_key, self.host, self.port, self.timeout
             )
-        except asyncio.TimeoutError:
-            logger.error(f"Connection timeout to {self.device_key}")
-            raise ConnectionError(f"Failed to connect to device at {self.host}:{self.port}")
-        except Exception as e:
-            logger.error(f"Connection failed to {self.device_key}: {e}")
-            raise ConnectionError(f"Failed to connect to device: {str(e)}")
+        except ConnectionError:
+            logger.error(f"Connection failed to {self.device_key}")
+            raise

         try:
+            response = await self._execute_command(reader, writer, cmd)
+            # Success — return connection to pool for reuse
+            await _connection_pool.release(self.device_key, reader, writer, self.host, self.port)
+            _last_command_time[self.device_key] = time.time()
+            return response
+        except Exception as e:
+            # Discard the bad connection
+            await _connection_pool.discard(self.device_key)
+            ConnectionPool._close_writer(writer)
+
+            if from_cache:
+                # Retry once with a fresh connection — the cached one may have gone stale
+                logger.warning(f"Cached connection failed for {self.device_key}, retrying fresh: {e}")
+                await self._enforce_rate_limit()
+                try:
+                    reader, writer, _ = await _connection_pool.acquire(
+                        self.device_key, self.host, self.port, self.timeout
+                    )
+                except ConnectionError:
+                    logger.error(f"Retry connection also failed to {self.device_key}")
+                    raise
+                try:
+                    response = await self._execute_command(reader, writer, cmd)
+                    await _connection_pool.release(self.device_key, reader, writer, self.host, self.port)
+                    _last_command_time[self.device_key] = time.time()
+                    return response
+                except Exception:
+                    await _connection_pool.discard(self.device_key)
+                    ConnectionPool._close_writer(writer)
+                    raise
+            else:
+                raise
+
+    async def _execute_command(self, reader: asyncio.StreamReader, writer: asyncio.StreamWriter, cmd: str) -> str:
+        """Send a command over an existing connection and parse the NL43 response."""
         writer.write(cmd.encode("ascii"))
         await writer.drain()
@@ -303,7 +635,7 @@ class NL43Client:
         # Check result code
         if result_code == "R+0000":
-            # Success - for query commands, read the second line with actual data
+            # Success for query commands, read the second line with actual data
             is_query = cmd.strip().endswith("?")
             if is_query:
                 data_line = await asyncio.wait_for(reader.readuntil(b"\n"), timeout=self.timeout)
@@ -311,7 +643,7 @@ class NL43Client:
                 logger.debug(f"Data line from {self.device_key}: {response}")
                 return response
             else:
-                # Setting command - return success code
+                # Setting command return success code
                 return result_code
         elif result_code == "R+0001":
             raise ValueError("Command error - device did not recognize command")
@@ -324,17 +656,6 @@ class NL43Client:
         else:
             raise ValueError(f"Unknown result code: {result_code}")
-        except asyncio.TimeoutError:
-            logger.error(f"Response timeout from {self.device_key}")
-            raise TimeoutError(f"Device did not respond within {self.timeout}s")
-        except Exception as e:
-            logger.error(f"Communication error with {self.device_key}: {e}")
-            raise
-        finally:
-            writer.close()
-            with contextlib.suppress(Exception):
-                await writer.wait_closed()
async def request_dod(self) -> NL43Snapshot: async def request_dod(self) -> NL43Snapshot:
"""Request DOD (Data Output Display) snapshot from device. """Request DOD (Data Output Display) snapshot from device.
@@ -575,20 +896,19 @@ class NL43Client:
# Acquire per-device lock - held for entire streaming session # Acquire per-device lock - held for entire streaming session
device_lock = await _get_device_lock(self.device_key) device_lock = await _get_device_lock(self.device_key)
async with device_lock: async with device_lock:
# Evict any cached connection — streaming needs its own dedicated socket
await _connection_pool.discard(self.device_key)
await self._enforce_rate_limit() await self._enforce_rate_limit()
logger.info(f"Starting DRD stream for {self.device_key}") logger.info(f"Starting DRD stream for {self.device_key}")
try: try:
reader, writer = await asyncio.wait_for( reader, writer = await _connection_pool._open_connection(
asyncio.open_connection(self.host, self.port), timeout=self.timeout self.host, self.port, self.timeout
) )
except asyncio.TimeoutError: except ConnectionError:
logger.error(f"DRD stream connection timeout to {self.device_key}") logger.error(f"DRD stream connection failed to {self.device_key}")
raise ConnectionError(f"Failed to connect to device at {self.host}:{self.port}") raise
except Exception as e:
logger.error(f"DRD stream connection failed to {self.device_key}: {e}")
raise ConnectionError(f"Failed to connect to device: {str(e)}")
try: try:
# Start DRD streaming # Start DRD streaming
@@ -1381,11 +1701,42 @@ class NL43Client:
            result["stopped"] = True
            logger.info(f"[STOP-CYCLE] Measurement stopped")

-        # Step 2: Enable FTP
-        logger.info(f"[STOP-CYCLE] Step 2: Enabling FTP")
+        # Step 2: Reset FTP (disable then enable) to clear any stale state
+        logger.info(f"[STOP-CYCLE] Step 2: Resetting FTP (disable then enable)")
+        try:
+            await self.disable_ftp()
+            logger.info(f"[STOP-CYCLE] FTP disabled")
+        except Exception as e:
+            logger.warning(f"[STOP-CYCLE] FTP disable failed (may already be off): {e}")
         await self.enable_ftp()
-        result["ftp_enabled"] = True
-        logger.info(f"[STOP-CYCLE] FTP enabled")
+        logger.info(f"[STOP-CYCLE] FTP enable command sent")
+
+        # Step 2b: Wait and verify FTP is ready (NL-43 needs time to start FTP server)
+        ftp_ready_timeout = 30  # seconds
+        ftp_check_interval = 2  # seconds
+        ftp_ready = False
+        elapsed = 0
+        logger.info(f"[STOP-CYCLE] Step 2b: Waiting up to {ftp_ready_timeout}s for FTP server to be ready")
+        while elapsed < ftp_ready_timeout:
+            await asyncio.sleep(ftp_check_interval)
+            elapsed += ftp_check_interval
+            try:
+                ftp_status = await self.get_ftp_status()
+                logger.info(f"[STOP-CYCLE] FTP status check at {elapsed}s: {ftp_status}")
+                if ftp_status.lower() == "on":
+                    ftp_ready = True
+                    logger.info(f"[STOP-CYCLE] FTP server confirmed ready after {elapsed}s")
+                    break
+            except Exception as e:
+                logger.warning(f"[STOP-CYCLE] FTP status check failed at {elapsed}s: {e}")
+
+        if ftp_ready:
+            result["ftp_enabled"] = True
+            logger.info(f"[STOP-CYCLE] FTP enabled and verified")
+        else:
+            logger.warning(f"[STOP-CYCLE] FTP not confirmed ready after {ftp_ready_timeout}s, proceeding anyway")
+            result["ftp_enabled"] = True  # Command was sent, just not verified

        if not download:
            logger.info(f"[STOP-CYCLE] === Cycle complete (download=False) ===")
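The Step 2b loop in the hunk above is a poll-until-ready pattern: send the command, then poll status at a fixed interval up to a deadline, treating transient errors as "not ready yet". The same pattern as a reusable sketch (a hypothetical `wait_until` helper, not code from the repo):

```python
import asyncio


async def wait_until(check, timeout: float = 30.0, interval: float = 2.0) -> bool:
    """Poll `check()` (an async callable returning bool) every `interval` seconds.

    Returns True as soon as a check succeeds, False once `timeout` elapses.
    Exceptions raised by `check` count as "not ready yet", mirroring how the
    FTP loop logs and retries failed status checks.
    """
    elapsed = 0.0
    while elapsed < timeout:
        await asyncio.sleep(interval)
        elapsed += interval
        try:
            if await check():
                return True
        except Exception:
            pass  # transient failure: keep polling until the deadline
    return False
```

Proceeding even when `wait_until` returns False, as the diff does, is a deliberate trade-off: the download step gets attempted either way, and its own error handling catches a genuinely dead FTP server.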


@@ -3,7 +3,7 @@
 <head>
   <meta charset="UTF-8" />
   <meta name="viewport" content="width=device-width, initial-scale=1.0" />
-  <title>SLMM Roster - Sound Level Meter Configuration</title>
+  <title>SLMM - Device Roster &amp; Connections</title>
   <style>
     * { box-sizing: border-box; }
     body {
@@ -227,19 +227,119 @@
     }
     .toast-success { background: #2da44e; }
     .toast-error { background: #cf222e; }
/* Tabs */
.tabs {
display: flex;
gap: 0;
margin-bottom: 0;
border-bottom: 2px solid #d0d7de;
}
.tab-btn {
padding: 10px 20px;
border: none;
background: none;
cursor: pointer;
font-size: 14px;
font-weight: 600;
color: #57606a;
border-bottom: 2px solid transparent;
margin-bottom: -2px;
transition: color 0.2s, border-color 0.2s;
}
.tab-btn:hover { color: #24292f; }
.tab-btn.active {
color: #24292f;
border-bottom-color: #fd8c73;
}
.tab-panel { display: none; }
.tab-panel.active { display: block; }
/* Connection pool panel */
.pool-config {
display: grid;
grid-template-columns: repeat(auto-fill, minmax(180px, 1fr));
gap: 12px;
margin-bottom: 20px;
}
.pool-config-card {
background: #f6f8fa;
border: 1px solid #d0d7de;
border-radius: 6px;
padding: 12px;
}
.pool-config-card .label {
font-size: 11px;
color: #57606a;
text-transform: uppercase;
font-weight: 600;
margin-bottom: 4px;
}
.pool-config-card .value {
font-size: 18px;
font-weight: 600;
color: #24292f;
}
.conn-card {
background: white;
border: 1px solid #d0d7de;
border-radius: 6px;
padding: 16px;
margin-bottom: 12px;
}
.conn-card-header {
display: flex;
justify-content: space-between;
align-items: center;
margin-bottom: 12px;
}
.conn-card-header strong { font-size: 15px; }
.conn-card-grid {
display: grid;
grid-template-columns: repeat(auto-fill, minmax(140px, 1fr));
gap: 8px;
}
.conn-stat .label {
font-size: 11px;
color: #57606a;
text-transform: uppercase;
font-weight: 600;
}
.conn-stat .value {
font-size: 14px;
font-weight: 600;
color: #24292f;
}
.conn-empty {
text-align: center;
padding: 32px;
color: #57606a;
}
.pool-actions {
display: flex;
gap: 8px;
margin-bottom: 16px;
}
  </style>
</head>
<body>
  <div class="container">
    <div class="header">
-      <h1>📊 Sound Level Meter Roster</h1>
+      <h1>SLMM - Roster &amp; Connections</h1>
      <div class="nav">
-        <a href="/" class="btn"> Back to Control Panel</a>
+        <a href="/" class="btn">&larr; Back to Control Panel</a>
        <button class="btn btn-primary" onclick="openAddModal()">+ Add Device</button>
      </div>
    </div>

-    <div class="table-container">
+    <div class="tabs">
+      <button class="tab-btn active" onclick="switchTab('roster')">Device Roster</button>
+      <button class="tab-btn" onclick="switchTab('connections')">Connections</button>
+    </div>
+
+    <!-- Roster Tab -->
+    <div id="tab-roster" class="tab-panel active">
+      <div class="table-container" style="border-top-left-radius: 0; border-top-right-radius: 0;">
      <table id="rosterTable">
        <thead>
          <tr>
@@ -265,6 +365,30 @@
      </div>
    </div>
<!-- Connections Tab -->
<div id="tab-connections" class="tab-panel">
<div class="table-container" style="padding: 20px; border-top-left-radius: 0; border-top-right-radius: 0;">
<div class="pool-actions">
<button class="btn" onclick="loadConnections()">Refresh</button>
<button class="btn btn-danger" onclick="flushConnections()">Flush All Connections</button>
</div>
<h3 style="margin: 0 0 12px 0; font-size: 16px;">Pool Configuration</h3>
<div id="poolConfig" class="pool-config">
<div class="pool-config-card">
<div class="label">Status</div>
<div class="value" id="poolEnabled">--</div>
</div>
</div>
<h3 style="margin: 20px 0 12px 0; font-size: 16px;">Active Connections</h3>
<div id="connectionsList">
<div class="conn-empty">Loading...</div>
</div>
</div>
</div>
</div>
  <!-- Add/Edit Modal -->
  <div id="deviceModal" class="modal">
    <div class="modal-content">
@@ -619,6 +743,159 @@
        closeModal();
      }
    });
// ========== Tab Switching ==========
function switchTab(tabName) {
document.querySelectorAll('.tab-btn').forEach(btn => btn.classList.remove('active'));
document.querySelectorAll('.tab-panel').forEach(panel => panel.classList.remove('active'));
document.querySelector(`.tab-btn[onclick="switchTab('${tabName}')"]`).classList.add('active');
document.getElementById(`tab-${tabName}`).classList.add('active');
if (tabName === 'connections') {
loadConnections();
}
}
// ========== Connection Pool ==========
let connectionsRefreshTimer = null;
async function loadConnections() {
try {
const res = await fetch('/api/nl43/_connections/status');
const data = await res.json();
if (!res.ok) {
showToast('Failed to load connection pool status', 'error');
return;
}
const pool = data.pool;
renderPoolConfig(pool);
renderConnections(pool.connections);
// Auto-refresh while tab is active
clearTimeout(connectionsRefreshTimer);
if (document.getElementById('tab-connections').classList.contains('active')) {
connectionsRefreshTimer = setTimeout(loadConnections, 5000);
}
} catch (err) {
showToast('Error loading connections: ' + err.message, 'error');
console.error('Load connections error:', err);
}
}
function renderPoolConfig(pool) {
document.getElementById('poolConfig').innerHTML = `
<div class="pool-config-card">
<div class="label">Persistent</div>
<div class="value" style="color: ${pool.enabled ? '#1a7f37' : '#cf222e'}">${pool.enabled ? 'Enabled' : 'Disabled'}</div>
</div>
<div class="pool-config-card">
<div class="label">Active</div>
<div class="value">${pool.active_connections}</div>
</div>
<div class="pool-config-card">
<div class="label">Idle TTL</div>
<div class="value">${pool.idle_ttl}s</div>
</div>
<div class="pool-config-card">
<div class="label">Max Age</div>
<div class="value">${pool.max_age}s</div>
</div>
<div class="pool-config-card">
<div class="label">KA Idle</div>
<div class="value">${pool.keepalive_idle}s</div>
</div>
<div class="pool-config-card">
<div class="label">KA Interval</div>
<div class="value">${pool.keepalive_interval}s</div>
</div>
<div class="pool-config-card">
<div class="label">KA Probes</div>
<div class="value">${pool.keepalive_count}</div>
</div>
`;
}
function renderConnections(connections) {
const container = document.getElementById('connectionsList');
const keys = Object.keys(connections);
if (keys.length === 0) {
container.innerHTML = `
<div class="conn-empty">
<div style="font-size: 32px; margin-bottom: 8px;">~</div>
<div><strong>No active connections</strong></div>
<div style="margin-top: 4px; font-size: 13px;">
Connections appear here when devices are actively being polled and the connection is cached between commands.
</div>
</div>
`;
return;
}
container.innerHTML = keys.map(key => {
const conn = connections[key];
const aliveColor = conn.alive ? '#1a7f37' : '#cf222e';
const aliveText = conn.alive ? 'Alive' : 'Stale';
return `
<div class="conn-card">
<div class="conn-card-header">
<strong>${escapeHtml(key)}</strong>
<span class="status-badge ${conn.alive ? 'status-ok' : 'status-error'}">${aliveText}</span>
</div>
<div class="conn-card-grid">
<div class="conn-stat">
<div class="label">Host</div>
<div class="value">${escapeHtml(conn.host)}</div>
</div>
<div class="conn-stat">
<div class="label">Port</div>
<div class="value">${conn.port}</div>
</div>
<div class="conn-stat">
<div class="label">Age</div>
<div class="value">${formatSeconds(conn.age_seconds)}</div>
</div>
<div class="conn-stat">
<div class="label">Idle</div>
<div class="value">${formatSeconds(conn.idle_seconds)}</div>
</div>
</div>
</div>
`;
}).join('');
}
function formatSeconds(s) {
if (s < 60) return Math.round(s) + 's';
if (s < 3600) return Math.floor(s / 60) + 'm ' + Math.round(s % 60) + 's';
return Math.floor(s / 3600) + 'h ' + Math.floor((s % 3600) / 60) + 'm';
}
async function flushConnections() {
if (!confirm('Close all cached TCP connections?\n\nDevices will reconnect on the next poll cycle.')) {
return;
}
try {
const res = await fetch('/api/nl43/_connections/flush', { method: 'POST' });
const data = await res.json();
if (!res.ok) {
showToast(data.detail || 'Failed to flush connections', 'error');
return;
}
showToast('All connections flushed', 'success');
await loadConnections();
} catch (err) {
showToast('Error flushing connections: ' + err.message, 'error');
}
}
  </script>
</body>
</html>


@@ -1,128 +0,0 @@
#!/usr/bin/env python3
"""
Test script to verify that sleep mode is automatically disabled when:
1. Device configuration is created/updated with TCP enabled
2. Measurements are started

This script tests the API endpoints, not the actual device communication.
"""

import requests
import json

BASE_URL = "http://localhost:8100/api/nl43"
UNIT_ID = "test-nl43-001"


def test_config_update():
    """Test that config update works (actual sleep mode disable requires real device)"""
    print("\n=== Testing Config Update ===")

    # Create/update a device config
    config_data = {
        "host": "192.168.1.100",
        "tcp_port": 2255,
        "tcp_enabled": True,
        "ftp_enabled": False,
        "ftp_username": "admin",
        "ftp_password": "password"
    }

    print(f"Updating config for {UNIT_ID}...")
    response = requests.put(f"{BASE_URL}/{UNIT_ID}/config", json=config_data)

    if response.status_code == 200:
        print("✓ Config updated successfully")
        print(f"Response: {json.dumps(response.json(), indent=2)}")
        print("\nNote: Sleep mode disable was attempted (will succeed if device is reachable)")
        return True
    else:
        print(f"✗ Config update failed: {response.status_code}")
        print(f"Error: {response.text}")
        return False


def test_get_config():
    """Test retrieving the config"""
    print("\n=== Testing Get Config ===")

    response = requests.get(f"{BASE_URL}/{UNIT_ID}/config")

    if response.status_code == 200:
        print("✓ Config retrieved successfully")
        print(f"Response: {json.dumps(response.json(), indent=2)}")
        return True
    elif response.status_code == 404:
        print("✗ Config not found (create one first)")
        return False
    else:
        print(f"✗ Request failed: {response.status_code}")
        print(f"Error: {response.text}")
        return False


def test_start_measurement():
    """Test that start measurement attempts to disable sleep mode"""
    print("\n=== Testing Start Measurement ===")

    print(f"Attempting to start measurement on {UNIT_ID}...")
    response = requests.post(f"{BASE_URL}/{UNIT_ID}/start")

    if response.status_code == 200:
        print("✓ Start command accepted")
        print(f"Response: {json.dumps(response.json(), indent=2)}")
        print("\nNote: Sleep mode was disabled before starting measurement")
        return True
    elif response.status_code == 404:
        print("✗ Device config not found (create config first)")
        return False
    elif response.status_code == 502:
        print("✗ Device not reachable (expected if no physical device)")
        print(f"Response: {response.text}")
        print("\nNote: This is expected behavior when testing without a physical device")
        return True  # This is actually success - the endpoint tried to communicate
    else:
        print(f"✗ Request failed: {response.status_code}")
        print(f"Error: {response.text}")
        return False


def main():
    print("=" * 60)
    print("Sleep Mode Auto-Disable Test")
    print("=" * 60)
    print("\nThis test verifies that sleep mode is automatically disabled")
    print("when device configs are updated or measurements are started.")
    print("\nNote: Without a physical device, some operations will fail at")
    print("the device communication level, but the API logic will execute.")

    # Run tests
    results = []

    # Test 1: Update config (should attempt to disable sleep mode)
    results.append(("Config Update", test_config_update()))

    # Test 2: Get config
    results.append(("Get Config", test_get_config()))

    # Test 3: Start measurement (should attempt to disable sleep mode)
    results.append(("Start Measurement", test_start_measurement()))

    # Summary
    print("\n" + "=" * 60)
    print("Test Summary")
    print("=" * 60)
    for test_name, result in results:
        status = "✓ PASS" if result else "✗ FAIL"
        print(f"{status}: {test_name}")

    print("\n" + "=" * 60)
    print("Implementation Details:")
    print("=" * 60)
    print("1. Config endpoint is now async and calls ensure_sleep_mode_disabled()")
    print("   when TCP is enabled")
    print("2. Start measurement endpoint calls ensure_sleep_mode_disabled()")
    print("   before starting the measurement")
    print("3. Sleep mode check is non-blocking - config/start will succeed")
    print("   even if the device is unreachable")
    print("=" * 60)


if __name__ == "__main__":
    main()