# Compare commits

21 commits: `738ad7878e ... dev-persis`
| Author | SHA1 | Date |
|--------|------|------|
| | `450509d210` | |
| | `fefa9eace8` | |
| | `98a8d357e5` | |
| | `0a7422eceb` | |
| | `996b993cb9` | |
| | `01337696b3` | |
| | `a302fd15d4` | |
| | `af5ecc1a92` | |
| | `b62e84f8b3` | |
| | `a5f8d1b2c7` | |
| | `a1a80bbb4d` | |
| | `005e0091fe` | |
| | `e6ac80df6c` | |
| | `7070b948a8` | |
| | `3b6e9ad3f0` | |
| | `eb0cbcc077` | |
| | `cc0a5bdf84` | |
| | `bf5f222511` | |
| | `eb39a9d1d0` | |
| | `67d63b4173` | |
| | `25cf9528d0` | |
## .gitignore (vendored, 7 changes)
```diff
@@ -1,5 +1,8 @@
 /manuals/
 /data/
+/data-dev/
+/SLM-stress-test/stress_test_logs/
+/SLM-stress-test/tcpdump-runs/

 # Python cache
 __pycache__/
@@ -11,4 +14,6 @@ __pycache__/
 *.egg
 *.egg-info/
 dist/
 build/
+
+*.pcap
```
## CHANGELOG.md (66 changes)
All notable changes to SLMM (Sound Level Meter Manager) will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.3.0] - 2026-02-17

### Added

#### Persistent TCP Connection Pool

- **Connection reuse** - TCP connections are cached per device and reused across commands, eliminating repeated TCP handshakes over cellular modems
- **OS-level TCP keepalive** - Configurable keepalive probes keep cellular NAT tables alive and detect dead connections early (default: probe after 15s idle, then every 10s; 3 failures = dead)
- **Transparent retry** - If a cached connection goes stale, the system automatically retries with a fresh connection, so failures are never visible to the caller
- **Stale connection detection** - Multi-layer detection via idle TTL, max age, transport state, and reader EOF checks
- **Background cleanup** - Periodic task (every 30s) evicts expired connections from the pool
- **Master switch** - Set `TCP_PERSISTENT_ENABLED=false` to revert to per-request connection behavior

#### Connection Pool Diagnostics

- `GET /api/nl43/_connections/status` - View pool configuration, active connections, age/idle times, and keepalive settings
- `POST /api/nl43/_connections/flush` - Force-close all cached connections (useful for debugging)
- **Connections tab on roster page** - Live UI showing pool config and active connections with age/idle/alive status; auto-refreshes every 5s and includes a flush button

#### Environment Variables

- `TCP_PERSISTENT_ENABLED` (default: `true`) - Master switch for persistent connections
- `TCP_IDLE_TTL` (default: `300`) - Close idle connections after N seconds
- `TCP_MAX_AGE` (default: `1800`) - Force reconnect after N seconds
- `TCP_KEEPALIVE_IDLE` (default: `15`) - Seconds idle before keepalive probes start
- `TCP_KEEPALIVE_INTERVAL` (default: `10`) - Seconds between keepalive probes
- `TCP_KEEPALIVE_COUNT` (default: `3`) - Failed probes before declaring the connection dead
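The keepalive knobs above correspond to standard Linux TCP socket options. A minimal sketch of how they might be applied to a connected socket (the constant names are from Python's stdlib `socket` module; the helper name is illustrative, not SLMM's actual code):

```python
import socket

def apply_keepalive(sock: socket.socket, idle: int = 15,
                    interval: int = 10, count: int = 3) -> None:
    """Enable OS-level TCP keepalive on a socket (Linux option names)."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)          # turn keepalive on
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)      # first probe after `idle` s
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval) # probe every `interval` s
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)      # dead after `count` failures
```

With the defaults above, a silently dropped cellular link is detected after roughly 15 + 3 × 10 = 45 seconds of idleness.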
### Changed

- **Health check endpoint** (`/health/devices`) - Now uses the connection pool instead of opening throwaway TCP connections; checks for an existing live connection first (zero cost) and only opens a new connection through the pool if needed
- **Diagnostics endpoint** - Removed the separate port-443 modem check (extra handshake waste); the TCP reachability test now uses the connection pool
- **DRD streaming** - Streaming connections now get TCP keepalive options set; cached connections are evicted before the dedicated streaming socket is opened
- **Default timeouts tuned for cellular** - Idle TTL raised to 300s (5 min) and max age to 1800s (30 min) to survive typical polling intervals over cellular links

### Technical Details

#### Architecture

- `ConnectionPool` class in `services.py` manages a single cached connection per device key (the NL-43 only supports one TCP connection at a time)
- Uses the existing per-device asyncio locks and rate limiting; no changes to the concurrency model
- The pool is a module-level singleton initialized from environment variables at import time
- Lifecycle managed via FastAPI lifespan: the cleanup task starts on startup, and all connections are closed on shutdown
- `_send_command_unlocked()` refactored to an acquire/release/discard pattern with a single-retry fallback
- Command parsing extracted to an `_execute_command()` method for reuse between the primary and retry paths

#### Cellular Modem Optimizations

- Keepalive probes at 15s prevent cellular NAT table entries from expiring (typically a 30-60s timeout)
- 300s idle TTL ensures connections survive between polling cycles (default 60s interval)
- 1800s max age allows a single socket to serve ~30 minutes of polling before a forced reconnect
- Health checks and diagnostics produce zero additional TCP handshakes when a pooled connection exists
- Stale `$` prompt bytes are drained from idle connections before command reuse
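The acquire/release/discard pattern with a single-retry fallback can be sketched as follows. This is a condensed illustration, not the actual SLMM `services.py` code: `pool`, `conn.execute`, and the exception type are assumptions.

```python
import asyncio

class StaleConnectionError(Exception):
    """Raised when a cached connection turns out to be dead mid-command."""

async def send_command(pool, key: str, command: str) -> str:
    """Run one command through the pool; retry once on a stale socket."""
    for attempt in (1, 2):
        conn = await pool.acquire(key)            # cached socket, or a fresh one
        try:
            result = await conn.execute(command)  # send command, parse reply
        except StaleConnectionError:
            await pool.discard(key, conn)         # drop the dead socket
            if attempt == 2:
                raise                             # second failure is surfaced
            continue                              # retry once with a fresh socket
        await pool.release(key, conn)             # healthy: keep it cached
        return result
```

The key property is that a stale cached socket costs one extra round trip but never a visible error, matching the "transparent retry" behavior described above.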
### Breaking Changes

None. This release is fully backward-compatible with v0.2.x. Set `TCP_PERSISTENT_ENABLED=false` for behavior identical to previous versions.

---

## [0.2.1] - 2026-01-23

### Added

- **Roster management**: UI and API endpoints for managing device rosters.
- **Delete config endpoint**: Remove device configuration alongside cached status data.
- **Scheduler hooks**: `start_cycle` and `stop_cycle` helpers for Terra-View scheduling integration.

### Changed

- **FTP logging**: Connection, authentication, and transfer phases now log explicitly.
- **Documentation**: Reorganized docs/scripts and updated API notes for FTP/TCP verification.

## [0.2.0] - 2026-01-15

### Added

[…]

## Version History Summary

- **v0.3.0** (2026-02-17) - Persistent TCP connections with keepalive for cellular modem reliability
- **v0.2.1** (2026-01-23) - Roster management, scheduler hooks, FTP logging, doc cleanup
- **v0.2.0** (2026-01-15) - Background Polling System
- **v0.1.0** (2025-12-XX) - Initial Release
## README.md (77 changes)
# SLMM - Sound Level Meter Manager

**Version 0.3.0**

Backend API service for controlling and monitoring Rion NL-43/NL-53 Sound Level Meters via TCP and FTP protocols.

[…]
## Features

- **Persistent TCP Connections**: Cached per-device connections with OS-level keepalive, tuned for cellular modem reliability
- **Background Polling**: Continuous automatic polling of devices with configurable intervals
- **Offline Detection**: Automatic device reachability tracking with failure counters
- **Device Management**: Configure and manage multiple NL43/NL53 devices
- **Real-time Monitoring**: Stream live measurement data via WebSocket
- **Measurement Control**: Start, stop, pause, resume, and reset measurements

[…]

- **Device Configuration**: Manage frequency/time weighting, clock sync, and more
- **Rate Limiting**: Automatic 1-second delay enforcement between device commands
- **Persistent Storage**: SQLite database for device configs and measurement cache
- **Connection Diagnostics**: Live UI and API endpoints for monitoring TCP connection pool status

## Architecture
```
┌─────────────────┐         ┌──────────────────────────────┐         ┌─────────────────┐
│                 │◄───────►│           SLMM API           │◄───────►│    NL43/NL53    │
│   (Frontend)    │  HTTP   │  • REST Endpoints            │   TCP   │  Sound Meters   │
└─────────────────┘         │  • WebSocket Streaming       │  (kept  │  (via cellular  │
                            │  • Background Poller         │  alive) │   modem)        │
                            │  • Connection Pool (v0.3)    │         └─────────────────┘
                            └──────────────────────────────┘
                                           │
                                           ▼
                                    ┌──────────────┐
                                    │  SQLite DB   │
                                    │  • Config    │
                                    │  • Status    │
                                    └──────────────┘
```
### Persistent TCP Connection Pool (v0.3.0)

SLMM maintains persistent TCP connections to devices with OS-level keepalive, designed for reliable operation over cellular modems:

- **Connection Reuse**: One cached TCP socket per device, reused across all commands (no repeated handshakes)
- **TCP Keepalive**: Probes keep cellular NAT tables alive and detect dead connections early
- **Transparent Retry**: Stale cached connections are automatically retried with a fresh socket
- **Configurable**: Idle TTL (300s), max age (1800s), and keepalive timing via environment variables
- **Diagnostics**: Live UI on the roster page and API endpoints for monitoring pool status

### Background Polling (v0.2.0)

The background polling service continuously queries devices and updates the status cache:

- **Automatic Updates**: Devices are polled at configurable intervals (10-3600 seconds)
- **Offline Detection**: Devices are marked unreachable after 3 consecutive failures
- **Per-Device Configuration**: Each device can have a custom polling interval
- **Resource Efficient**: Dynamic sleep intervals and smart scheduling
- **Graceful Shutdown**: The background task stops cleanly on service shutdown

Status requests return cached data instantly (<100ms) instead of waiting for device queries (1-2 seconds).
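The polling behavior above (per-device intervals, dynamic sleep, graceful shutdown) can be sketched as a single asyncio loop. The names here are illustrative, not the actual SLMM poller:

```python
import asyncio
import time

async def polling_loop(devices: dict, poll_fn, stop: asyncio.Event) -> None:
    """Poll each device on its own interval, sleeping only until the next due poll.

    `devices` maps unit_id -> interval in seconds; `poll_fn(unit_id)` is the
    per-device poll coroutine that updates the status cache.
    """
    next_due = {uid: 0.0 for uid in devices}
    while not stop.is_set():
        now = time.monotonic()
        for uid, interval in devices.items():
            if now >= next_due[uid]:
                await poll_fn(uid)               # refresh this device's cache entry
                next_due[uid] = now + interval   # schedule its next cycle
        delay = max(0.05, min(next_due.values()) - time.monotonic())
        try:
            # Dynamic sleep that also wakes immediately on shutdown.
            await asyncio.wait_for(stop.wait(), timeout=delay)
        except asyncio.TimeoutError:
            pass
```

Setting the `stop` event wakes the sleep immediately, which is what makes shutdown graceful rather than waiting out the longest interval.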
## Quick Start

[…]
### Environment Variables

**Server:**

- `PORT`: Server port (default: 8100)
- `CORS_ORIGINS`: Comma-separated list of allowed origins (default: `"*"`)

**TCP Connection Pool:**

- `TCP_PERSISTENT_ENABLED`: Enable persistent connections (default: `"true"`)
- `TCP_IDLE_TTL`: Close idle connections after N seconds (default: 300)
- `TCP_MAX_AGE`: Force reconnect after N seconds (default: 1800)
- `TCP_KEEPALIVE_IDLE`: Seconds idle before keepalive probes start (default: 15)
- `TCP_KEEPALIVE_INTERVAL`: Seconds between keepalive probes (default: 10)
- `TCP_KEEPALIVE_COUNT`: Failed probes before declaring the connection dead (default: 3)

### Database

The SQLite database is automatically created at [data/slmm.db](data/slmm.db) on first run.
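Reading these variables once at startup with typed defaults might look like this. This is an illustrative sketch; SLMM's actual config object is not shown in this diff:

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class PoolConfig:
    """TCP connection pool settings, read once from the environment."""
    enabled: bool
    idle_ttl: int
    max_age: int
    keepalive_idle: int
    keepalive_interval: int
    keepalive_count: int

def load_pool_config(env=os.environ) -> PoolConfig:
    # Any value other than "false" keeps the master switch on (the default).
    return PoolConfig(
        enabled=env.get("TCP_PERSISTENT_ENABLED", "true").lower() != "false",
        idle_ttl=int(env.get("TCP_IDLE_TTL", "300")),
        max_age=int(env.get("TCP_MAX_AGE", "1800")),
        keepalive_idle=int(env.get("TCP_KEEPALIVE_IDLE", "15")),
        keepalive_interval=int(env.get("TCP_KEEPALIVE_INTERVAL", "10")),
        keepalive_count=int(env.get("TCP_KEEPALIVE_COUNT", "3")),
    )
```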
[…]

| GET | `/api/nl43/{unit_id}/live` | Request fresh DOD data from device (bypasses cache) |
| WS | `/api/nl43/{unit_id}/stream` | WebSocket stream for real-time DRD data |

### Background Polling

| Method | Endpoint | Description |
|--------|----------|-------------|
| PUT | `/api/nl43/{unit_id}/polling/config` | Update polling interval and enable/disable polling |
| GET | `/api/nl43/_polling/status` | Get global polling status for all devices |

### Connection Pool

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/api/nl43/_connections/status` | Get pool config, active connections, age/idle times |
| POST | `/api/nl43/_connections/flush` | Force-close all cached TCP connections |

### Measurement Control

| Method | Endpoint | Description |
|--------|----------|-------------|

[…]

### TCP Communication

- Uses an ASCII command protocol over TCP
- Persistent connections with OS-level keepalive (tuned for cellular modems)
- Connections cached per device and reused across commands
- Transparent retry on stale connections
- Enforces a ≥1 second delay between commands to the same device
- Two-line response format:
  - Line 1: Result code (`R+0000` for success)
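The two-line reply format can be parsed in a few lines. A sketch: the success code `R+0000` is from this README, while the payload format varies by command and the function name is hypothetical:

```python
def parse_nl43_reply(raw: str) -> str:
    """Parse the NL-43 two-line reply: line 1 is the result code, line 2 the payload.

    Returns the payload line on success; raises on any non-R+0000 code.
    """
    lines = raw.strip().splitlines()
    code = lines[0].strip()
    if code != "R+0000":
        raise RuntimeError(f"device returned error code {code!r}")
    # Some commands may return only the result code, with no payload line.
    return lines[1].strip() if len(lines) > 1 else ""
```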
[…]

```bash
curl http://localhost:8100/api/nl43/meter-001/polling/config
curl http://localhost:8100/api/nl43/_polling/status
```

### Check Connection Pool Status

```bash
curl http://localhost:8100/api/nl43/_connections/status | jq '.'
```

### Flush All Cached Connections

```bash
curl -X POST http://localhost:8100/api/nl43/_connections/flush
```

### Verify Device Settings

```bash
curl http://localhost:8100/api/nl43/meter-001/settings
```

[…]

See [API.md](API.md) for detailed integration examples.

## Troubleshooting

### Connection Issues

- Check connection pool status: `curl http://localhost:8100/api/nl43/_connections/status`
- Flush stale connections: `curl -X POST http://localhost:8100/api/nl43/_connections/flush`
- Verify the device IP address and port in the configuration
- Ensure the device is on the same network
- Check that firewall rules allow TCP/FTP connections
- Verify the RX55 network adapter is properly configured on the device

### Cellular Modem Issues

- If the modem wedges from too many handshakes, ensure `TCP_PERSISTENT_ENABLED=true` (the default)
- Increase `TCP_IDLE_TTL` if connections expire between poll cycles
- Keepalive probes (default: every 15s after going idle) keep NAT tables alive; adjust `TCP_KEEPALIVE_IDLE` if needed
- Set `TCP_PERSISTENT_ENABLED=false` to disable pooling for debugging

### Rate Limiting

- The API automatically enforces a 1-second delay between commands
- If you experience delays, this is normal device behavior
## SLM-stress-test/NL43_RX55_TCP_Wedge_Investigation_2026-02-18.md (new file, 403 lines)

# NL-43 + RX55 TCP “Wedge” Investigation (2255 Refusal) — Full Log & Next Steps

**Last updated:** 2026-02-18
**Owner:** Brian / serversdown
**Context:** Terra-View / SLMM / field-deployed Rion NL-43 behind Sierra Wireless RX55

---

## 0) What this document is

This is a **comprehensive, chronological** record of the debugging we did to isolate a failure where the **NL-43’s TCP control port (2255) eventually stops accepting connections** (“wedges”), while other services (notably FTP/21) remain reachable.

It is written to be fed back into future troubleshooting, so it intentionally includes the **full reasoning chain, experiments, commands, packet evidence, and conclusions**.

---
## 1) Architecture (as tested)

### Network path

- **Server (SLMM host):** `10.0.0.40`
- **RX55 WAN IP:** `63.45.161.30`
- **RX55 LAN subnet:** `192.168.1.0/24`
- **RX55 LAN gateway:** `192.168.1.1`
- **NL-43 LAN IP:** `192.168.1.10` (confirmed via ARP OUI + ping; see LAN validation)

### RX55 details

- **Sierra Wireless RX55**
- **OS:** 5.2
- **Firmware:** `01.14.24.00`
- **Carrier:** Verizon LTE (Band 66)

### Port forwarding rules (RX55)

- **WAN:2255 → NL-43:2255** (NL-43 TCP control)
- **WAN:21 → NL-43:21** (NL-43 FTP control)

We also experimented with additional forwards:

- **WAN:2253 → NL-43:2255** (test)
- **WAN:2253 → NL-43:2253** (test)
- **WAN:4450 → NL-43:4450** (test)

**Important:** The rule’s “Input zone / interface” was set to **WAN-NAT**, and the Source IP was left as **Any IPv4**. This is correct for inbound port-forward behavior on Sierra OS 5.x.

---
## 2) Original problem statement (the “wedge”)

After running for hours, the NL-43 becomes unreachable over TCP control.

### Symptom signature (WAN-side)

- The client attempts to connect to `63.45.161.30:2255`
- Instead of timing out, the client gets **connection refused** quickly
- Packet-level: SYN from client → **RST,ACK** back (an active refusal, not a silent drop)

### Critical operational behavior

- **Power cycling the NL-43 fixes it.**
- **Power cycling the RX55 does NOT fix it.**
- FTP sometimes remains available even while TCP control (2255) is dead.

This combination is what forced us to determine whether:

- the RX55 is rejecting connections, OR
- the NL-43 is no longer listening on 2255, OR
- something about the RX55 path triggers the NL-43’s control listener to die.

---
## 3) Event timeline evidence (SLMM logs)

A concrete wedge window was observed on **2026-02-18**:

- 10:55:46 AM — Poll success (Start)
- 11:00:28 AM — Measurement STOPPED (scheduled stop/download cycle succeeded)
- 11:55:50 AM — Poll success (Stop)
- 12:55:55 PM — Poll success (Stop)
- **1:55:58 PM — Poll failed (attempt 1/3): Errno 111 (connection refused)**
- 2:56:02 PM — Poll failed (attempt 2/3): Errno 111 (connection refused)

Key interpretation:

- The wedge occurred sometime between **12:55 and 1:55 PM**.
- The failure type is **refused**, not timeout.

---
## 4) Early hypotheses (before proof)

We considered two main buckets:

### A) NL-43-side failure (most suspicious)

- NL-43 TCP control service crashes / exits / unbinds from 2255
- Socket leak / accept-backlog exhaustion
- “Single control session allowed” and the device gets stuck thinking a session is active
- Mode/service manager bug (service restart fails after other activities)
- Firmware bug in the TCP daemon

### B) RX55-side failure (possible trigger; less likely once FTP works)

- NAT/forwarding table corruption
- Firewall behavior
- Helper/ALG interference
- MSS/MTU weirdness causing edge-case behavior
- Session churn causing downstream issues

---
## 5) Key experiments and what they proved

### 5.1) LAN-only stability test (no RX55 path)

**Test:** NL-43 tested directly on the LAN (no modem path involved).

- Ran **24+ hours**
- Scheduler start/stop cycles worked
- Stress test: **500 commands @ 1/sec** → no failures
- Response-time trend decreased (not degrading)

**Result:** The NL-43 appears stable in a “pure LAN” environment.

**Interpretation:** The trigger is likely related to the RX55/WAN environment, connection patterns, or service-switching patterns, not simply uptime.

---
### 5.2) Port-forward behavior: timeout vs refused (RX55 behavior characterization)

We observed:

- **If a WAN port is NOT forwarded (no rule):** connecting to that port **times out** (silent drop)
- **If a WAN port IS forwarded to the NL-43 but nothing listens:** it **actively refuses** (RST)

Concrete example:

- Port **4450** with no rule → timeout
- Port **4450 → NL-43:4450** rule created → connection refused

**Interpretation:** This confirms the RX55 is actually forwarding packets to the NL-43 when a rule exists. “Refused” is consistent with the NL-43 (or RX55 relay behavior) responding quickly because the packet reached the target.

Important nuance:

- A “refused” on forwarded ports does **not** automatically prove the NL-43 is the one generating the RST, because NAT hides the inside host and the RX55 could reject on behalf of an unreachable target. We needed a LAN-side proof test to close the loop.

---
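The timeout-vs-refused distinction above is easy to reproduce programmatically. A small Python probe (a hypothetical helper, not part of SLMM) that classifies the three outcomes:

```python
import socket

def probe(host: str, port: int, timeout: float = 5.0) -> str:
    """Classify a TCP connect attempt: 'open', 'refused' (RST), or 'timeout' (silent drop)."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(timeout)
    try:
        s.connect((host, port))
        return "open"                      # a listener completed the handshake
    except ConnectionRefusedError:
        return "refused"                   # RST came back: packet reached a host with no listener
    except (socket.timeout, TimeoutError):
        return "timeout"                   # silent drop: no forwarding rule / firewalled
    finally:
        s.close()
```

Running this against `63.45.161.30` on ports 2255, 2253, and 21 would mechanize the exact comparison made with `nc` below.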
### 5.3) UDP test confusion (and resolution)

We ran:

```bash
nc -vzu 63.45.161.30 2255
nc -vz 63.45.161.30 2255
```

Observed:

- UDP: “succeeded”
- TCP: “connection refused”

Resolution:

- UDP has **no handshake**. netcat prints “succeeded” if it doesn’t immediately receive an ICMP unreachable; it does **not** mean a UDP service exists.
- The TCP “refused” is meaningful: an RST implies “no listener” or “actively rejected.”

**Net effect:** The UDP test did not change the diagnosis.

---
### 5.4) Packet capture proof (WAN-side)

We captured a Wireshark/tcpdump summary with these key patterns:

#### Port 2255 (TCP control)

Example:

- `10.0.0.40 → 63.45.161.30:2255` SYN
- `63.45.161.30 → 10.0.0.40` **RST, ACK** within ~50ms

This happened repeatedly.

#### Port 2253 (test port)

Multiple SYN attempts to 2253 showed **retransmissions and no response**, i.e., a **silent drop** (consistent with no rule, or not forwarded at that moment).

#### Port 21 (FTP)

Clean 3-way handshake (SYN → SYN/ACK → ACK), then:

- FTP server banner: `220 Connection Ready`
- `530 Not logged in` (because SLMM was sending non-FTP “requests” as an experiment)
- Session closes cleanly

**Key takeaway from the capture:**

- TCP transport to the NL-43 via the RX55 is definitely working (port 21 proves it).
- Port 2255 is being actively refused.

This strongly suggested “the 2255 listener is gone,” but still didn’t fully prove whether the refusal was generated internally by the NL-43 or by the RX55 on its behalf.

---
## 6) The decisive experiment: LAN-side test while wedged (final proof)

Because the RX55 does not offer SSH, the plan was to test from **inside the LAN behind the RX55**.

### 6.1) Physical LAN tap setup

Constraint:

- The NL-43 has only one Ethernet port.

Solution: insert an unmanaged switch:

- RX55 LAN → switch
- NL-43 → switch
- Windows 10 laptop → switch

This creates a shared L2 segment where the laptop can test the NL-43 directly.

### 6.2) Windows LAN validation

On the Windows laptop:

- `ipconfig` showed:
  - IP: `192.168.1.100`
  - Gateway: `192.168.1.1` (RX55)
- The initial `arp -a` only showed the RX55, not the NL-43.

We then:

- pinged likely host addresses and discovered the NL-43 responds on **192.168.1.10**
- `arp -a` then showed:
  - `192.168.1.10 → 00-10-50-14-0a-d8`
  - OUI `00-10-50` recognized as **Rion** (matches the NL-43)

So LAN identities were confirmed:

- RX55: `192.168.1.1`
- NL-43: `192.168.1.10`
### 6.3) The LAN port tests (the smoking gun)

From Windows:

```powershell
Test-NetConnection -ComputerName 192.168.1.10 -Port 2255
Test-NetConnection -ComputerName 192.168.1.10 -Port 21
```

Results (while the unit was “wedged” from the WAN perspective):

- **2255:** `TcpTestSucceeded : False`
- **21:** `TcpTestSucceeded : True`

**Conclusion (PROVEN):**

- The NL-43 is reachable on the LAN
- FTP port 21 is alive
- **The NL-43 is NOT listening on TCP port 2255**
- Therefore the RX55 is not the root cause of the refusal. The WAN refusal is consistent with the NL-43 having no listener on 2255.

This is now settled.

---
## 7) What we learned (final conclusions)

### 7.1) RX55 innocence (for this failure mode)

The RX55 is not “randomly rejecting” or “breaking TCP” in the way originally feared.

It successfully forwards and supports TCP to the NL-43 on port 21, and the LAN-side test proves the 2255 failure exists *even without NAT/WAN involvement*.

### 7.2) NL-43 control listener failure

The NL-43’s TCP control service (port 2255) stops listening while:

- the device remains alive
- the LAN stack remains alive (ping)
- FTP remains alive (port 21)

This looks like one of:

- control daemon crash/exit
- service unbind
- stuck service state (e.g., “busy” / “session active forever”)
- resource leak (sockets/file descriptors) specific to the control service
- firmware service-manager bug (start/stop of services fails after certain sequences)

---
## 8) Additional constraint discovered: “Web App mode” conflicts

We noted an important operational constraint:

> Turning on the web app disables other interfaces like TCP and FTP.

This means the NL-43 appears to have mutually exclusive service/mode behavior (or at least serious conflicts). That matters because:

- If any workflow toggles modes (explicitly or implicitly), it could destabilize the service lifecycle.
- It reduces the viability of a “web UI toggle” as an easy remote recovery mechanism **if** toggling disables the services needed.

We have not yet run a controlled long test to determine whether:

- mode switching contributes directly to the 2255 listener dying, OR
- it happens even in a pure TCP-only mode with no switching.

---
## 9) Immediate operational decision (field tomorrow)

Because the device is needed in the field immediately, we chose:

- **Old-school manual deployment**
- **Manual SD card downloads**
- Avoiding reliance on 2255/TCP control and remote workflows for now

**Important operational note:** the 2255 listener dying does not necessarily stop the NL-43 from measuring; it primarily breaks remote control/polling. The manual SD workflow sidesteps the entire remote-control dependency.

---
## 10) What’s next (future work — when the unit is back)

Because long tests can’t be run before tomorrow, the plan is to resume in a few weeks with controlled experiments designed to isolate the trigger and develop an operational mitigation.

### 10.1) Controlled experiment matrix (recommended)

Run each test for 24–72 hours, or until wedge occurs, and record:

- number of TCP connects
- whether connections are persistent
- whether FTP is used
- whether any mode toggling is performed
- time-to-wedge

#### Test A — TCP-only (ideal baseline)

- TCP control only (2255)
- **True persistent connection** (open once, keep forever)
- No FTP
- No web mode toggling

Outcome interpretation:

- If stable: connection churn and/or FTP/mode switching is the trigger.
- If it wedges anyway: pure 2255 daemon leak/bug.

#### Test B — TCP with connection churn

- Same as A, but intentionally reconnect on a schedule (current SLMM behavior)
- No FTP

Outcome:

- If this wedges but A doesn’t: churn is the trigger.

#### Test C — FTP activity + TCP

- Introduce scheduled FTP sessions (downloads) while using TCP control
- Observe whether the wedge correlates with FTP use or with post-download periods.

Outcome:

- If the wedge correlates with FTP, suspect an internal service lifecycle conflict.

#### Test D — Web mode interaction (only if safe/possible)

- Evaluate what toggling web mode does to TCP/FTP services.
- Determine if any remote-safe “soft reset” exists.

---

## 11) Mitigation options (ranked)

### Option 1 — Make SLMM truly persistent (highest probability of success)

If the NL-43 wedges due to session churn or leaked socket states, the best mitigation is:

- Open one TCP socket per device
- Keep it open indefinitely
- Use OS keepalive
- Do **not** rotate connections on timers
- Reconnect only when the socket actually dies

This reduces:

- connect/close cycles
- NAT edge-case exposure
- resource churn inside NL-43

### Option 2 — Service “soft reset” (if possible without disabling required services)

If there exists any way to restart the 2255 service without power cycling:

- LAN TCP toggle (if it doesn’t require web mode)
- any “restart comms” command (unknown)
- any maintenance menu sequence

then SLMM could:

- detect the wedge
- trigger a soft reset
- recover automatically

Current constraint: web app mode appears to disable other services, so this may not be viable.

### Option 3 — Hardware watchdog power cycle (industrial but reliable)

If this is a firmware bug with no clean workaround:

- Add a remotely controlled relay/power switch
- On wedge detection, power-cycle the NL-43 automatically
- Optionally schedule a nightly power cycle to prevent leak accumulation

This is “field reality” and often the only long-term move with embedded devices.

### Option 4 — Vendor escalation (Rion)

You now have excellent evidence:

- LAN-side proof: 2255 dead while 21 alive
- WAN packet evidence
- clear isolation of RX55 innocence

This is strong enough to send to Rion support as a firmware defect report.

---

## 12) Repro “wedge bundle” checklist (for future captures)

When the wedge happens again, capture these before power cycling:

1) From server:

   - `nc -vz 63.45.161.30 2255` (expect refused)
   - `nc -vz 63.45.161.30 21` (expect success if FTP alive)

2) From LAN side (via switch/laptop):

   - `Test-NetConnection 192.168.1.10 -Port 2255`
   - `Test-NetConnection 192.168.1.10 -Port 21`

3) Optional: packet capture around the refused attempt.

4) Record:

   - last successful poll timestamp
   - last FTP session timestamp
   - any scheduled start/stop/download cycles near wedge time
   - SLMM connection reuse/rotation settings in effect

---

## 13) Final, current-state summary (as of 2026-02-18)

- The issue is **NOT** the RX55 rejecting inbound connections.
- The NL-43 is **alive**, reachable on LAN, and FTP works.
- The NL-43’s **TCP control listener on 2255 stops listening** while the device remains otherwise healthy.
- The wedge can occur hours after successful operations.
- The unit is needed in the field immediately, so investigation pauses.
- Next phase: controlled tests to isolate trigger + implement mitigation (persistent socket or watchdog reset).

---

## 14) Notes / misc observations

- The Wireshark trace showed repeated FTP sessions were opened and closed cleanly, but SLMM’s “FTP requests” were not valid FTP (causing `530 Not logged in`). That was part of experimentation, not a normal workflow.
- UDP “success” via netcat is not meaningful because UDP has no handshake; it simply indicates no ICMP unreachable was returned.

---

**End of document.**
SLM-stress-test/nl43_stress_test.py (new file, 1483 lines): file diff suppressed because it is too large.
```diff
@@ -15,7 +15,8 @@ from sqlalchemy.orm import Session
 
 from app.database import SessionLocal
 from app.models import NL43Config, NL43Status
-from app.services import NL43Client, persist_snapshot
+from app.services import NL43Client, persist_snapshot, sync_measurement_start_time_from_ftp
+from app.device_logger import log_device_event, cleanup_old_logs
 
 logger = logging.getLogger(__name__)
 
@@ -25,7 +26,7 @@ class BackgroundPoller:
     Background task that continuously polls NL43 devices and updates status cache.
 
     Features:
-    - Per-device configurable poll intervals (10-3600 seconds)
+    - Per-device configurable poll intervals (30 seconds to 6 hours)
     - Automatic offline detection (marks unreachable after 3 consecutive failures)
     - Dynamic sleep intervals based on device configurations
     - Graceful shutdown on application stop
@@ -36,6 +37,8 @@ class BackgroundPoller:
         self._task: Optional[asyncio.Task] = None
         self._running = False
        self._logger = logger
+        self._last_cleanup = None  # Track last log cleanup time
+        self._last_pool_log = None  # Track last connection pool heartbeat log
 
     async def start(self):
         """Start the background polling task."""
@@ -78,6 +81,33 @@ class BackgroundPoller:
             except Exception as e:
                 self._logger.error(f"Error in poll loop: {e}", exc_info=True)
 
+            # Run log cleanup once per hour
+            try:
+                now = datetime.utcnow()
+                if self._last_cleanup is None or (now - self._last_cleanup).total_seconds() > 3600:
+                    cleanup_old_logs()
+                    self._last_cleanup = now
+            except Exception as e:
+                self._logger.warning(f"Log cleanup failed: {e}")
+
+            # Log connection pool status every 15 minutes
+            try:
+                now = datetime.utcnow()
+                if self._last_pool_log is None or (now - self._last_pool_log).total_seconds() > 900:
+                    from app.services import _connection_pool
+                    stats = _connection_pool.get_stats()
+                    conns = stats.get("connections", {})
+                    if conns:
+                        for key, c in conns.items():
+                            self._logger.info(
+                                f"[POOL] {key} — age={c['age_seconds']}s idle={c['idle_seconds']}s alive={c['alive']}"
+                            )
+                    else:
+                        self._logger.info("[POOL] No active connections in pool")
+                    self._last_pool_log = now
+            except Exception as e:
+                self._logger.warning(f"Pool status log failed: {e}")
+
             # Calculate dynamic sleep interval
             sleep_time = self._calculate_sleep_interval()
             self._logger.debug(f"Sleeping for {sleep_time} seconds until next poll cycle")
@@ -205,6 +235,71 @@ class BackgroundPoller:
                 db.commit()
                 self._logger.info(f"✓ Successfully polled {unit_id}")
 
+                # Log to device log
+                log_device_event(
+                    unit_id, "INFO", "POLL",
+                    f"Poll success: state={snap.measurement_state}, Leq={snap.leq}, Lp={snap.lp}",
+                    db
+                )
+
+                # Check if device is measuring but has no start time recorded
+                # This happens if measurement was started before SLMM began polling
+                # or after a service restart
+                status = db.query(NL43Status).filter_by(unit_id=unit_id).first()
+
+                # Reset the sync flag when measurement stops (so next measurement can sync)
+                if status and status.measurement_state != "Start":
+                    if status.start_time_sync_attempted:
+                        status.start_time_sync_attempted = False
+                        db.commit()
+                        self._logger.debug(f"Reset FTP sync flag for {unit_id} (measurement stopped)")
+                        log_device_event(unit_id, "DEBUG", "STATE", "Measurement stopped, reset FTP sync flag", db)
+
+                # Attempt FTP sync if:
+                # - Device is measuring
+                # - No start time recorded
+                # - FTP sync not already attempted for this measurement
+                # - FTP is configured
+                if (status and
+                        status.measurement_state == "Start" and
+                        status.measurement_start_time is None and
+                        not status.start_time_sync_attempted and
+                        cfg.ftp_enabled and
+                        cfg.ftp_username and
+                        cfg.ftp_password):
+
+                    self._logger.info(
+                        f"Device {unit_id} is measuring but has no start time - "
+                        f"attempting FTP sync"
+                    )
+                    log_device_event(unit_id, "INFO", "SYNC", "Attempting FTP sync for measurement start time", db)
+
+                    # Mark that we attempted sync (prevents repeated attempts on failure)
+                    status.start_time_sync_attempted = True
+                    db.commit()
+
+                    try:
+                        synced = await sync_measurement_start_time_from_ftp(
+                            unit_id=unit_id,
+                            host=cfg.host,
+                            tcp_port=cfg.tcp_port,
+                            ftp_port=cfg.ftp_port or 21,
+                            ftp_username=cfg.ftp_username,
+                            ftp_password=cfg.ftp_password,
+                            db=db
+                        )
+                        if synced:
+                            self._logger.info(f"✓ FTP sync succeeded for {unit_id}")
+                            log_device_event(unit_id, "INFO", "SYNC", "FTP sync succeeded - measurement start time updated", db)
+                        else:
+                            self._logger.warning(f"FTP sync returned False for {unit_id}")
+                            log_device_event(unit_id, "WARNING", "SYNC", "FTP sync returned False", db)
+                    except Exception as sync_err:
+                        self._logger.warning(
+                            f"FTP sync failed for {unit_id}: {sync_err}"
+                        )
+                        log_device_event(unit_id, "ERROR", "SYNC", f"FTP sync failed: {sync_err}", db)
+
             except Exception as e:
                 # Failure - increment counter and potentially mark offline
                 status.consecutive_failures += 1
@@ -217,11 +312,13 @@ class BackgroundPoller:
                     self._logger.warning(
                         f"Device {unit_id} marked unreachable after {status.consecutive_failures} failures: {error_msg}"
                     )
+                    log_device_event(unit_id, "ERROR", "POLL", f"Device marked UNREACHABLE after {status.consecutive_failures} failures: {error_msg}", db)
                    status.is_reachable = False
                else:
                    self._logger.warning(
                        f"Poll failed for {unit_id} (attempt {status.consecutive_failures}/3): {error_msg}"
                    )
+                    log_device_event(unit_id, "WARNING", "POLL", f"Poll failed (attempt {status.consecutive_failures}/3): {error_msg}", db)
 
                 db.commit()
 
@@ -230,8 +327,8 @@ class BackgroundPoller:
         Calculate the next sleep interval based on all device poll intervals.
 
         Returns a dynamic sleep time that ensures responsive polling:
-        - Minimum 10 seconds (prevents tight loops)
-        - Maximum 30 seconds (ensures responsiveness)
+        - Minimum 30 seconds (prevents tight loops)
+        - Maximum 300 seconds / 5 minutes (ensures reasonable responsiveness for long intervals)
         - Generally half the minimum device interval
 
         Returns:
@@ -245,14 +342,15 @@ class BackgroundPoller:
         ).all()
 
         if not configs:
-            return 30  # Default sleep when no devices configured
+            return 60  # Default sleep when no devices configured
 
         # Get all intervals
         intervals = [cfg.poll_interval_seconds or 60 for cfg in configs]
         min_interval = min(intervals)
 
-        # Use half the minimum interval, but cap between 10-30 seconds
-        sleep_time = max(10, min(30, min_interval // 2))
+        # Use half the minimum interval, but cap between 30-300 seconds
+        # This allows longer sleep times when polling intervals are long (e.g., hourly)
+        sleep_time = max(30, min(300, min_interval // 2))
 
         return sleep_time
```
app/device_logger.py (new file, 277 lines):

```python
"""
Per-device logging system.

Provides dual output: database entries for structured queries and file logs for backup.
Each device gets its own log file in data/logs/{unit_id}.log with rotation.
"""

import logging
import os
from datetime import datetime, timedelta
from logging.handlers import RotatingFileHandler
from pathlib import Path
from typing import Optional

from sqlalchemy.orm import Session

from app.database import SessionLocal
from app.models import DeviceLog

# Configure base logger
logger = logging.getLogger(__name__)

# Log directory (persisted in Docker volume)
LOG_DIR = Path(os.path.dirname(os.path.dirname(__file__))) / "data" / "logs"
LOG_DIR.mkdir(parents=True, exist_ok=True)

# Per-device file loggers (cached)
_device_file_loggers: dict = {}

# Log retention (days)
LOG_RETENTION_DAYS = int(os.getenv("LOG_RETENTION_DAYS", "7"))


def _get_file_logger(unit_id: str) -> logging.Logger:
    """Get or create a file logger for a specific device."""
    if unit_id in _device_file_loggers:
        return _device_file_loggers[unit_id]

    # Create device-specific logger
    device_logger = logging.getLogger(f"device.{unit_id}")
    device_logger.setLevel(logging.DEBUG)

    # Avoid duplicate handlers
    if not device_logger.handlers:
        # Create rotating file handler (5 MB max, keep 3 backups)
        log_file = LOG_DIR / f"{unit_id}.log"
        handler = RotatingFileHandler(
            log_file,
            maxBytes=5 * 1024 * 1024,  # 5 MB
            backupCount=3,
            encoding="utf-8"
        )
        handler.setLevel(logging.DEBUG)

        # Format: timestamp [LEVEL] [CATEGORY] message
        formatter = logging.Formatter(
            "%(asctime)s [%(levelname)s] [%(category)s] %(message)s",
            datefmt="%Y-%m-%d %H:%M:%S"
        )
        handler.setFormatter(formatter)
        device_logger.addHandler(handler)

    # Don't propagate to root logger
    device_logger.propagate = False

    _device_file_loggers[unit_id] = device_logger
    return device_logger


def log_device_event(
    unit_id: str,
    level: str,
    category: str,
    message: str,
    db: Optional[Session] = None
):
    """
    Log an event for a specific device.

    Writes to both:
    1. Database (DeviceLog table) for structured queries
    2. File (data/logs/{unit_id}.log) for backup/debugging

    Args:
        unit_id: Device identifier
        level: Log level (DEBUG, INFO, WARNING, ERROR)
        category: Event category (TCP, FTP, POLL, COMMAND, STATE, SYNC)
        message: Log message
        db: Optional database session (creates one if not provided)
    """
    timestamp = datetime.utcnow()

    # Write to file log
    try:
        file_logger = _get_file_logger(unit_id)
        log_func = getattr(file_logger, level.lower(), file_logger.info)
        # Pass category as extra for formatter
        log_func(message, extra={"category": category})
    except Exception as e:
        logger.warning(f"Failed to write file log for {unit_id}: {e}")

    # Write to database
    close_db = False
    try:
        if db is None:
            db = SessionLocal()
            close_db = True

        log_entry = DeviceLog(
            unit_id=unit_id,
            timestamp=timestamp,
            level=level.upper(),
            category=category.upper(),
            message=message
        )
        db.add(log_entry)
        db.commit()

    except Exception as e:
        logger.warning(f"Failed to write DB log for {unit_id}: {e}")
        if db:
            db.rollback()
    finally:
        if close_db and db:
            db.close()


def cleanup_old_logs(retention_days: Optional[int] = None, db: Optional[Session] = None):
    """
    Delete log entries older than retention period.

    Args:
        retention_days: Days to retain (default: LOG_RETENTION_DAYS env var or 7)
        db: Optional database session
    """
    if retention_days is None:
        retention_days = LOG_RETENTION_DAYS

    cutoff = datetime.utcnow() - timedelta(days=retention_days)

    close_db = False
    try:
        if db is None:
            db = SessionLocal()
            close_db = True

        deleted = db.query(DeviceLog).filter(DeviceLog.timestamp < cutoff).delete()
        db.commit()

        if deleted > 0:
            logger.info(f"Cleaned up {deleted} log entries older than {retention_days} days")

    except Exception as e:
        logger.error(f"Failed to cleanup old logs: {e}")
        if db:
            db.rollback()
    finally:
        if close_db and db:
            db.close()


def get_device_logs(
    unit_id: str,
    limit: int = 100,
    offset: int = 0,
    level: Optional[str] = None,
    category: Optional[str] = None,
    since: Optional[datetime] = None,
    db: Optional[Session] = None
) -> list:
    """
    Query log entries for a specific device.

    Args:
        unit_id: Device identifier
        limit: Max entries to return (default: 100)
        offset: Number of entries to skip (default: 0)
        level: Filter by level (DEBUG, INFO, WARNING, ERROR)
        category: Filter by category (TCP, FTP, POLL, COMMAND, STATE, SYNC)
        since: Filter entries after this timestamp
        db: Optional database session

    Returns:
        List of log entries as dicts
    """
    close_db = False
    try:
        if db is None:
            db = SessionLocal()
            close_db = True

        query = db.query(DeviceLog).filter(DeviceLog.unit_id == unit_id)

        if level:
            query = query.filter(DeviceLog.level == level.upper())
        if category:
            query = query.filter(DeviceLog.category == category.upper())
        if since:
            query = query.filter(DeviceLog.timestamp >= since)

        # Order by newest first
        query = query.order_by(DeviceLog.timestamp.desc())

        # Apply pagination
        entries = query.offset(offset).limit(limit).all()

        return [
            {
                "id": e.id,
                "timestamp": e.timestamp.isoformat() + "Z",
                "level": e.level,
                "category": e.category,
                "message": e.message
            }
            for e in entries
        ]

    finally:
        if close_db and db:
            db.close()


def get_log_stats(unit_id: str, db: Optional[Session] = None) -> dict:
    """
    Get log statistics for a device.

    Returns:
        Dict with counts by level and category
    """
    close_db = False
    try:
        if db is None:
            db = SessionLocal()
            close_db = True

        total = db.query(DeviceLog).filter(DeviceLog.unit_id == unit_id).count()

        # Count by level
        level_counts = {}
        for level in ["DEBUG", "INFO", "WARNING", "ERROR"]:
            count = db.query(DeviceLog).filter(
                DeviceLog.unit_id == unit_id,
                DeviceLog.level == level
            ).count()
            if count > 0:
                level_counts[level] = count

        # Count by category
        category_counts = {}
        for category in ["TCP", "FTP", "POLL", "COMMAND", "STATE", "SYNC", "GENERAL"]:
            count = db.query(DeviceLog).filter(
                DeviceLog.unit_id == unit_id,
                DeviceLog.category == category
            ).count()
            if count > 0:
                category_counts[category] = count

        # Get oldest and newest
        oldest = db.query(DeviceLog).filter(
            DeviceLog.unit_id == unit_id
        ).order_by(DeviceLog.timestamp.asc()).first()

        newest = db.query(DeviceLog).filter(
            DeviceLog.unit_id == unit_id
        ).order_by(DeviceLog.timestamp.desc()).first()

        return {
            "total": total,
            "by_level": level_counts,
            "by_category": category_counts,
            "oldest": oldest.timestamp.isoformat() + "Z" if oldest else None,
            "newest": newest.timestamp.isoformat() + "Z" if newest else None
        }

    finally:
        if close_db and db:
            db.close()
```
app/main.py (43 changed lines):

```diff
@@ -29,7 +29,11 @@ logger.info("Database tables initialized")
 @asynccontextmanager
 async def lifespan(app: FastAPI):
     """Manage application lifecycle - startup and shutdown events."""
+    from app.services import _connection_pool
+
     # Startup
+    logger.info("Starting TCP connection pool cleanup task...")
+    _connection_pool.start_cleanup()
     logger.info("Starting background poller...")
     await poller.start()
     logger.info("Background poller started")
@@ -40,12 +44,15 @@ async def lifespan(app: FastAPI):
     logger.info("Stopping background poller...")
     await poller.stop()
     logger.info("Background poller stopped")
+    logger.info("Closing TCP connection pool...")
+    await _connection_pool.close_all()
+    logger.info("TCP connection pool closed")
 
 
 app = FastAPI(
     title="SLMM NL43 Addon",
     description="Standalone module for NL43 configuration and status APIs with background polling",
-    version="0.2.0",
+    version="0.3.0",
     lifespan=lifespan,
 )
 
@@ -85,10 +92,14 @@ async def health():
 
 @app.get("/health/devices")
 async def health_devices():
-    """Enhanced health check that tests device connectivity."""
+    """Enhanced health check that tests device connectivity.
+
+    Uses the connection pool to avoid unnecessary TCP handshakes — if a
+    cached connection exists and is alive, the device is reachable.
+    """
     from sqlalchemy.orm import Session
     from app.database import SessionLocal
-    from app.services import NL43Client
+    from app.services import _connection_pool
     from app.models import NL43Config
 
     db: Session = SessionLocal()
@@ -98,7 +109,7 @@ async def health_devices():
     configs = db.query(NL43Config).filter_by(tcp_enabled=True).all()
 
     for cfg in configs:
-        client = NL43Client(cfg.host, cfg.tcp_port, timeout=2.0, ftp_username=cfg.ftp_username, ftp_password=cfg.ftp_password)
+        device_key = f"{cfg.host}:{cfg.tcp_port}"
         status = {
             "unit_id": cfg.unit_id,
             "host": cfg.host,
@@ -108,14 +119,22 @@ async def health_devices():
         }
 
         try:
-            # Try to connect (don't send command to avoid rate limiting issues)
-            import asyncio
-            reader, writer = await asyncio.wait_for(
-                asyncio.open_connection(cfg.host, cfg.tcp_port), timeout=2.0
-            )
-            writer.close()
-            await writer.wait_closed()
-            status["reachable"] = True
+            # Check if pool already has a live connection (zero-cost check)
+            pool_stats = _connection_pool.get_stats()
+            conn_info = pool_stats["connections"].get(device_key)
+            if conn_info and conn_info["alive"]:
+                status["reachable"] = True
+                status["source"] = "pool"
+            else:
+                # No cached connection — do a lightweight acquire/release
+                # This opens a connection if needed but keeps it in the pool
+                import asyncio
+                reader, writer, from_cache = await _connection_pool.acquire(
+                    device_key, cfg.host, cfg.tcp_port, timeout=2.0
+                )
+                await _connection_pool.release(device_key, reader, writer, cfg.host, cfg.tcp_port)
+                status["reachable"] = True
+                status["source"] = "cached" if from_cache else "new"
         except Exception as e:
             status["error"] = str(type(e).__name__)
             logger.warning(f"Device {cfg.unit_id} health check failed: {e}")
```
```diff
@@ -53,3 +53,22 @@ class NL43Status(Base):
     last_poll_attempt = Column(DateTime, nullable=True)  # Last time background poller attempted to poll
     last_success = Column(DateTime, nullable=True)  # Last successful poll timestamp
     last_error = Column(Text, nullable=True)  # Last error message (truncated to 500 chars)
+
+    # FTP start time sync tracking
+    start_time_sync_attempted = Column(Boolean, default=False)  # True if FTP sync was attempted for current measurement
+
+
+class DeviceLog(Base):
+    """
+    Per-device log entries for debugging and audit trail.
+    Stores events like commands, state changes, errors, and FTP operations.
+    """
+
+    __tablename__ = "device_logs"
+
+    id = Column(Integer, primary_key=True, autoincrement=True)
+    unit_id = Column(String, index=True, nullable=False)
+    timestamp = Column(DateTime, default=func.now(), index=True)
+    level = Column(String, default="INFO")  # DEBUG, INFO, WARNING, ERROR
+    category = Column(String, default="GENERAL")  # TCP, FTP, POLL, COMMAND, STATE, SYNC
+    message = Column(Text, nullable=False)
```
app/routers.py (279 lines changed)
@@ -3,6 +3,7 @@ from fastapi.responses import FileResponse
 from sqlalchemy.orm import Session
 from datetime import datetime
 from pydantic import BaseModel, field_validator, Field
+from typing import Optional
 import logging
 import ipaddress
 import json
@@ -81,17 +82,45 @@ class ConfigPayload(BaseModel):
     @field_validator("poll_interval_seconds")
     @classmethod
     def validate_poll_interval(cls, v):
-        if v is not None and not (10 <= v <= 3600):
-            raise ValueError("Poll interval must be between 10 and 3600 seconds")
+        if v is not None and not (30 <= v <= 21600):
+            raise ValueError("Poll interval must be between 30 and 21600 seconds (30s to 6 hours)")
         return v


 class PollingConfigPayload(BaseModel):
     """Payload for updating device polling configuration."""
-    poll_interval_seconds: int | None = Field(None, ge=10, le=3600, description="Polling interval in seconds (10-3600)")
+    poll_interval_seconds: int | None = Field(None, ge=30, le=21600, description="Polling interval in seconds (30s to 6 hours)")
     poll_enabled: bool | None = Field(None, description="Enable or disable background polling for this device")


+# ============================================================================
+# TCP CONNECTION POOL ENDPOINTS (must be before /{unit_id} routes)
+# ============================================================================
+
+@router.get("/_connections/status")
+async def get_connection_pool_status():
+    """Get status of the persistent TCP connection pool.
+
+    Returns information about cached connections, keepalive settings,
+    and per-device connection age/idle times.
+    """
+    from app.services import _connection_pool
+    return {"status": "ok", "pool": _connection_pool.get_stats()}
+
+
+@router.post("/_connections/flush")
+async def flush_connection_pool():
+    """Close all cached TCP connections.
+
+    Useful for debugging or forcing fresh connections to all devices.
+    """
+    from app.services import _connection_pool
+    await _connection_pool.close_all()
+    # Restart cleanup task since close_all cancels it
+    _connection_pool.start_cleanup()
+    return {"status": "ok", "message": "All cached connections closed"}
+
+
 # ============================================================================
 # GLOBAL POLLING STATUS ENDPOINT (must be before /{unit_id} routes)
 # ============================================================================
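The same 30 s to 6 h bound appears in both the `Field(ge=30, le=21600)` constraint and the `validate_poll_interval` validator above. The check itself, lifted out of the diff into a plain function, can be exercised standalone:

```python
def validate_poll_interval(v):
    """Same bounds as the updated validator: 30 s to 21600 s (6 h); None passes through."""
    if v is not None and not (30 <= v <= 21600):
        raise ValueError("Poll interval must be between 30 and 21600 seconds (30s to 6 hours)")
    return v

accepted = validate_poll_interval(3600)      # 1 hour: within the new range
passthrough = validate_poll_interval(None)   # unset: left alone
try:
    validate_poll_interval(10)               # valid under the old 10-3600 rule, now rejected
    rejected = False
except ValueError:
    rejected = True
```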
@@ -233,8 +262,8 @@ class RosterCreatePayload(BaseModel):
     @field_validator("poll_interval_seconds")
     @classmethod
     def validate_poll_interval(cls, v):
-        if v is not None and not (10 <= v <= 3600):
-            raise ValueError("Poll interval must be between 10 and 3600 seconds")
+        if v is not None and not (30 <= v <= 21600):
+            raise ValueError("Poll interval must be between 30 and 21600 seconds (30s to 6 hours)")
         return v
@@ -544,12 +573,6 @@ async def stop_measurement(unit_id: str, db: Session = Depends(get_db)):
     try:
         await client.stop()
         logger.info(f"Stopped measurement on unit {unit_id}")
-
-        # Query device status to update database with "Stop" state
-        snap = await client.request_dod()
-        snap.unit_id = unit_id
-        persist_snapshot(snap, db)
-
     except ConnectionError as e:
         logger.error(f"Failed to stop measurement on {unit_id}: {e}")
         raise HTTPException(status_code=502, detail="Failed to communicate with device")
@@ -559,6 +582,15 @@ async def stop_measurement(unit_id: str, db: Session = Depends(get_db)):
     except Exception as e:
         logger.error(f"Unexpected error stopping measurement on {unit_id}: {e}")
         raise HTTPException(status_code=500, detail="Internal server error")
+
+    # Query device status to update database — non-fatal if this fails
+    try:
+        snap = await client.request_dod()
+        snap.unit_id = unit_id
+        persist_snapshot(snap, db)
+    except Exception as e:
+        logger.warning(f"Stop succeeded but failed to update status for {unit_id}: {e}")
+
     return {"status": "ok", "message": "Measurement stopped"}
@@ -656,8 +688,9 @@ async def stop_cycle(unit_id: str, payload: StopCyclePayload = None, db: Session
         return {"status": "ok", "unit_id": unit_id, **result}

     except Exception as e:
-        logger.error(f"Stop cycle failed for {unit_id}: {e}")
-        raise HTTPException(status_code=502, detail=str(e))
+        error_msg = str(e) if str(e) else f"{type(e).__name__}: No details available"
+        logger.error(f"Stop cycle failed for {unit_id}: {error_msg}")
+        raise HTTPException(status_code=502, detail=error_msg)


 @router.post("/{unit_id}/store")
@@ -1722,74 +1755,38 @@ async def run_diagnostics(unit_id: str, db: Session = Depends(get_db)):
             "message": "TCP communication enabled"
         }

-    # Test 3: Modem/Router reachable (check port 443 HTTPS)
+    # Test 3: TCP connection reachable (device port) — uses connection pool
+    # This avoids extra TCP handshakes over cellular. If a cached connection
+    # exists and is alive, we skip the handshake entirely.
+    from app.services import _connection_pool
+    device_key = f"{cfg.host}:{cfg.tcp_port}"
     try:
-        reader, writer = await asyncio.wait_for(
-            asyncio.open_connection(cfg.host, 443), timeout=3.0
-        )
-        writer.close()
-        await writer.wait_closed()
-        diagnostics["tests"]["modem_reachable"] = {
-            "status": "pass",
-            "message": f"Modem/router reachable at {cfg.host}"
-        }
-    except asyncio.TimeoutError:
-        diagnostics["tests"]["modem_reachable"] = {
-            "status": "fail",
-            "message": f"Modem/router timeout at {cfg.host} (network issue)"
-        }
-        diagnostics["overall_status"] = "fail"
-        return diagnostics
-    except ConnectionRefusedError:
-        # Connection refused means host is up but port 443 closed - that's ok
-        diagnostics["tests"]["modem_reachable"] = {
-            "status": "pass",
-            "message": f"Modem/router reachable at {cfg.host} (HTTPS closed)"
-        }
-    except Exception as e:
-        diagnostics["tests"]["modem_reachable"] = {
-            "status": "fail",
-            "message": f"Cannot reach modem/router at {cfg.host}: {str(e)}"
-        }
-        diagnostics["overall_status"] = "fail"
-        return diagnostics
-
-    # Test 4: TCP connection reachable (device port)
-    try:
-        reader, writer = await asyncio.wait_for(
-            asyncio.open_connection(cfg.host, cfg.tcp_port), timeout=3.0
-        )
-        writer.close()
-        await writer.wait_closed()
-        diagnostics["tests"]["tcp_connection"] = {
-            "status": "pass",
-            "message": f"TCP connection successful to {cfg.host}:{cfg.tcp_port}"
-        }
-    except asyncio.TimeoutError:
-        diagnostics["tests"]["tcp_connection"] = {
-            "status": "fail",
-            "message": f"Connection timeout to {cfg.host}:{cfg.tcp_port}"
-        }
-        diagnostics["overall_status"] = "fail"
-        return diagnostics
-    except ConnectionRefusedError:
-        diagnostics["tests"]["tcp_connection"] = {
-            "status": "fail",
-            "message": f"Connection refused by {cfg.host}:{cfg.tcp_port}"
-        }
-        diagnostics["overall_status"] = "fail"
-        return diagnostics
+        pool_stats = _connection_pool.get_stats()
+        conn_info = pool_stats["connections"].get(device_key)
+        if conn_info and conn_info["alive"]:
+            # Pool already has a live connection — device is reachable
+            diagnostics["tests"]["tcp_connection"] = {
+                "status": "pass",
+                "message": f"TCP connection alive in pool for {cfg.host}:{cfg.tcp_port}"
+            }
+        else:
+            # Acquire through the pool (opens new if needed, keeps it cached)
+            reader, writer, from_cache = await _connection_pool.acquire(
+                device_key, cfg.host, cfg.tcp_port, timeout=3.0
+            )
+            await _connection_pool.release(device_key, reader, writer, cfg.host, cfg.tcp_port)
+            diagnostics["tests"]["tcp_connection"] = {
+                "status": "pass",
+                "message": f"TCP connection successful to {cfg.host}:{cfg.tcp_port}"
+            }
     except Exception as e:
         diagnostics["tests"]["tcp_connection"] = {
             "status": "fail",
-            "message": f"Connection error: {str(e)}"
+            "message": f"Connection error to {cfg.host}:{cfg.tcp_port}: {str(e)}"
         }
         diagnostics["overall_status"] = "fail"
         return diagnostics

-    # Wait a bit after connection test to let device settle
-    await asyncio.sleep(1.5)
-
     # Test 5: Device responds to commands
     # Use longer timeout to account for rate limiting (device requires ≥1s between commands)
     client = NL43Client(cfg.host, cfg.tcp_port, timeout=10.0, ftp_username=cfg.ftp_username, ftp_password=cfg.ftp_password)
@@ -1842,9 +1839,134 @@ async def run_diagnostics(unit_id: str, db: Session = Depends(get_db)):

     # All tests passed
     diagnostics["overall_status"] = "pass"
+
+    # Add database dump: config and status cache
+    diagnostics["database_dump"] = {
+        "config": {
+            "unit_id": cfg.unit_id,
+            "host": cfg.host,
+            "tcp_port": cfg.tcp_port,
+            "tcp_enabled": cfg.tcp_enabled,
+            "ftp_enabled": cfg.ftp_enabled,
+            "ftp_port": cfg.ftp_port,
+            "ftp_username": cfg.ftp_username,
+            "ftp_password": "***" if cfg.ftp_password else None,  # Mask password
+            "web_enabled": cfg.web_enabled,
+            "poll_interval_seconds": cfg.poll_interval_seconds,
+            "poll_enabled": cfg.poll_enabled
+        },
+        "status_cache": None
+    }
+
+    # Get cached status if available
+    status = db.query(NL43Status).filter_by(unit_id=unit_id).first()
+    if status:
+        # Helper to format datetime as ISO with Z suffix to indicate UTC
+        def to_utc_iso(dt):
+            return dt.isoformat() + 'Z' if dt else None
+
+        diagnostics["database_dump"]["status_cache"] = {
+            "unit_id": status.unit_id,
+            "last_seen": to_utc_iso(status.last_seen),
+            "measurement_state": status.measurement_state,
+            "measurement_start_time": to_utc_iso(status.measurement_start_time),
+            "counter": status.counter,
+            "lp": status.lp,
+            "leq": status.leq,
+            "lmax": status.lmax,
+            "lmin": status.lmin,
+            "lpeak": status.lpeak,
+            "battery_level": status.battery_level,
+            "power_source": status.power_source,
+            "sd_remaining_mb": status.sd_remaining_mb,
+            "sd_free_ratio": status.sd_free_ratio,
+            "is_reachable": status.is_reachable,
+            "consecutive_failures": status.consecutive_failures,
+            "last_poll_attempt": to_utc_iso(status.last_poll_attempt),
+            "last_success": to_utc_iso(status.last_success),
+            "last_error": status.last_error,
+            "raw_payload": status.raw_payload
+        }
+
     return diagnostics
+
+
+# ============================================================================
+# DEVICE LOGS ENDPOINTS
+# ============================================================================
+
+@router.get("/{unit_id}/logs")
+def get_device_logs(
+    unit_id: str,
+    limit: int = 100,
+    offset: int = 0,
+    level: Optional[str] = None,
+    category: Optional[str] = None,
+    db: Session = Depends(get_db)
+):
+    """
+    Get log entries for a specific device.
+
+    Query parameters:
+    - limit: Max entries to return (default: 100, max: 1000)
+    - offset: Number of entries to skip (for pagination)
+    - level: Filter by level (DEBUG, INFO, WARNING, ERROR)
+    - category: Filter by category (TCP, FTP, POLL, COMMAND, STATE, SYNC)
+
+    Returns newest entries first.
+    """
+    from app.device_logger import get_device_logs as fetch_logs, get_log_stats
+
+    # Validate limit
+    limit = min(limit, 1000)
+
+    logs = fetch_logs(
+        unit_id=unit_id,
+        limit=limit,
+        offset=offset,
+        level=level,
+        category=category,
+        db=db
+    )
+
+    stats = get_log_stats(unit_id, db)
+
+    return {
+        "status": "ok",
+        "unit_id": unit_id,
+        "logs": logs,
+        "count": len(logs),
+        "stats": stats,
+        "filters": {
+            "level": level,
+            "category": category
+        },
+        "pagination": {
+            "limit": limit,
+            "offset": offset
+        }
+    }
+
+
+@router.delete("/{unit_id}/logs")
+def clear_device_logs(unit_id: str, db: Session = Depends(get_db)):
+    """
+    Clear all log entries for a specific device.
+    """
+    from app.models import DeviceLog
+
+    deleted = db.query(DeviceLog).filter(DeviceLog.unit_id == unit_id).delete()
+    db.commit()
+
+    logger.info(f"Cleared {deleted} log entries for device {unit_id}")
+
+    return {
+        "status": "ok",
+        "message": f"Cleared {deleted} log entries for {unit_id}",
+        "deleted_count": deleted
+    }
+
+
 # ============================================================================
 # BACKGROUND POLLING CONFIGURATION ENDPOINTS
 # ============================================================================
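The `/logs` endpoint's pagination semantics (newest first, `limit` clamped to 1000, `offset` skip, optional `level` filter) can be sketched against a throwaway SQLite table. The table and helper below are illustrative stand-ins, not the actual `device_logger` module:

```python
import sqlite3

# Throwaway stand-in for the device_logs table; columns mirror the model above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE device_logs (id INTEGER PRIMARY KEY, unit_id TEXT, level TEXT, message TEXT)"
)
conn.executemany(
    "INSERT INTO device_logs (unit_id, level, message) VALUES (?, ?, ?)",
    [("NL43-01", "INFO", f"event {i}") for i in range(5)],
)

def fetch_logs(unit_id, limit=100, offset=0, level=None):
    limit = min(limit, 1000)  # same clamp as the endpoint
    query = "SELECT message FROM device_logs WHERE unit_id = ?"
    params = [unit_id]
    if level:
        query += " AND level = ?"
        params.append(level)
    query += " ORDER BY id DESC LIMIT ? OFFSET ?"  # newest entries first
    params += [limit, offset]
    return [row[0] for row in conn.execute(query, params)]

page = fetch_logs("NL43-01", limit=2, offset=1)  # skip the newest, take the next two
```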
@@ -1880,7 +2002,7 @@ def update_polling_config(
     """
     Update background polling configuration for a device.

-    Allows configuring the polling interval (10-3600 seconds) and
+    Allows configuring the polling interval (30-21600 seconds, i.e. 30s to 6 hours) and
     enabling/disabling automatic background polling per device.

     Changes take effect on the next polling cycle.
@@ -1891,10 +2013,15 @@ def update_polling_config(

     # Update interval if provided
     if payload.poll_interval_seconds is not None:
-        if payload.poll_interval_seconds < 10:
+        if payload.poll_interval_seconds < 30:
             raise HTTPException(
                 status_code=400,
-                detail="Polling interval must be at least 10 seconds"
+                detail="Polling interval must be at least 30 seconds"
+            )
+        if payload.poll_interval_seconds > 21600:
+            raise HTTPException(
+                status_code=400,
+                detail="Polling interval must be at most 21600 seconds (6 hours)"
             )
         cfg.poll_interval_seconds = payload.poll_interval_seconds
app/services.py (804 lines changed)
@@ -1,20 +1,22 @@
 """
 NL43 TCP connector and snapshot persistence.

-Implements simple per-request TCP calls to avoid long-lived socket complexity.
-Extend to pooled connections/DRD streaming later.
+Implements persistent per-device TCP connections with OS-level keepalive
+to reduce handshake overhead and survive cellular modem NAT timeouts.
+Falls back to per-request connections on error with transparent retry.
 """

 import asyncio
 import contextlib
 import logging
+import socket
 import time
 import os
 import zipfile
 import tempfile
-from dataclasses import dataclass
+from dataclasses import dataclass, field
 from datetime import datetime, timezone, timedelta
-from typing import Optional, List
+from typing import Optional, List, Dict, Tuple
 from sqlalchemy.orm import Session
 from ftplib import FTP
 from pathlib import Path
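The new docstring and `import socket` point at OS-level keepalive. The changelog's defaults (first probe after 15 s idle, one every 10 s, 3 failed probes = dead) map onto standard socket options; the helper below is a sketch of how those options would be applied, not SLMM's actual code, and the `TCP_KEEP*` names are the Linux spellings (hence the `hasattr` guards):

```python
import socket

def enable_keepalive(sock: socket.socket, idle: int = 15, interval: int = 10, count: int = 3) -> None:
    """Enable OS-level TCP keepalive: first probe after `idle` seconds of
    silence, one probe every `interval` seconds, `count` failed probes marks
    the connection dead. Option names are platform-specific."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(s)
keepalive_on = s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)
s.close()
```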
@@ -76,10 +78,22 @@ def persist_snapshot(s: NL43Snapshot, db: Session):
             # Measurement just started - record the start time
             row.measurement_start_time = datetime.utcnow()
             logger.info(f"✓ Measurement started on {s.unit_id} at {row.measurement_start_time}")
+            # Log state change (lazy import to avoid circular dependency)
+            try:
+                from app.device_logger import log_device_event
+                log_device_event(s.unit_id, "INFO", "STATE", f"Measurement STARTED at {row.measurement_start_time}", db)
+            except Exception:
+                pass
         elif was_measuring and not is_measuring:
             # Measurement stopped - clear the start time
             row.measurement_start_time = None
             logger.info(f"✓ Measurement stopped on {s.unit_id}")
+            # Log state change
+            try:
+                from app.device_logger import log_device_event
+                log_device_event(s.unit_id, "INFO", "STATE", "Measurement STOPPED", db)
+            except Exception:
+                pass

         row.measurement_state = new_state
         row.counter = s.counter
@@ -101,10 +115,413 @@ def persist_snapshot(s: NL43Snapshot, db: Session):
|
|||||||
raise
|
raise
|
||||||
|
|
||||||
|
|
||||||
|
async def sync_measurement_start_time_from_ftp(
|
||||||
|
unit_id: str,
|
||||||
|
host: str,
|
||||||
|
tcp_port: int,
|
||||||
|
ftp_port: int,
|
||||||
|
ftp_username: str,
|
||||||
|
ftp_password: str,
|
||||||
|
db: Session
|
||||||
|
) -> bool:
|
||||||
|
"""
|
||||||
|
Sync measurement start time from the FTP folder timestamp.
|
||||||
|
|
||||||
|
This is called when SLMM detects a device is already measuring but doesn't
|
||||||
|
have a recorded start time (e.g., after service restart or if measurement
|
||||||
|
was started before SLMM began polling).
|
||||||
|
|
||||||
|
The workflow:
|
||||||
|
1. Disable FTP (reset)
|
||||||
|
2. Enable FTP
|
||||||
|
3. List NL-43 folder to get measurement folder timestamps
|
||||||
|
4. Use the most recent folder's timestamp as the start time
|
||||||
|
5. Update the database
|
||||||
|
|
||||||
|
Args:
|
||||||
|
unit_id: Device identifier
|
||||||
|
host: Device IP/hostname
|
||||||
|
tcp_port: TCP control port
|
||||||
|
ftp_port: FTP port (usually 21)
|
||||||
|
ftp_username: FTP username (usually "USER")
|
||||||
|
ftp_password: FTP password (usually "0000")
|
||||||
|
db: Database session
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
True if sync succeeded, False otherwise
|
||||||
|
"""
|
||||||
|
logger.info(f"[FTP-SYNC] Attempting to sync measurement start time for {unit_id} via FTP")
|
||||||
|
|
||||||
|
client = NL43Client(
|
||||||
|
host, tcp_port,
|
||||||
|
ftp_username=ftp_username,
|
||||||
|
ftp_password=ftp_password,
|
||||||
|
ftp_port=ftp_port
|
||||||
|
)
|
||||||
|
|
||||||
|
try:
|
||||||
|
# Step 1: Disable FTP to reset it
|
||||||
|
logger.info(f"[FTP-SYNC] Step 1: Disabling FTP on {unit_id}")
|
||||||
|
await client.disable_ftp()
|
||||||
|
await asyncio.sleep(1.5) # Wait for device to process
|
||||||
|
|
||||||
|
# Step 2: Enable FTP
|
||||||
|
logger.info(f"[FTP-SYNC] Step 2: Enabling FTP on {unit_id}")
|
||||||
|
await client.enable_ftp()
|
||||||
|
await asyncio.sleep(2.0) # Wait for FTP server to start
|
||||||
|
|
||||||
|
# Step 3: List NL-43 folder
|
||||||
|
logger.info(f"[FTP-SYNC] Step 3: Listing /NL-43 folder on {unit_id}")
|
||||||
|
files = await client.list_ftp_files("/NL-43")
|
||||||
|
|
||||||
|
# Filter for directories only (measurement folders)
|
||||||
|
folders = [f for f in files if f.get('is_dir', False)]
|
||||||
|
|
||||||
|
if not folders:
|
||||||
|
logger.warning(f"[FTP-SYNC] No measurement folders found on {unit_id}")
|
||||||
|
return False
|
||||||
|
|
||||||
|
# Sort by modified timestamp (newest first)
|
||||||
|
folders.sort(key=lambda f: f.get('modified_timestamp', ''), reverse=True)
|
||||||
|
|
||||||
|
latest_folder = folders[0]
|
||||||
|
folder_name = latest_folder['name']
|
||||||
|
logger.info(f"[FTP-SYNC] Found latest measurement folder: {folder_name}")
|
||||||
|
|
||||||
|
# Step 4: Parse timestamp
|
||||||
|
if 'modified_timestamp' in latest_folder and latest_folder['modified_timestamp']:
|
||||||
|
timestamp_str = latest_folder['modified_timestamp']
|
||||||
|
# Parse ISO format timestamp (already in UTC from SLMM FTP listing)
|
||||||
|
start_time = datetime.fromisoformat(timestamp_str.replace('Z', ''))
|
||||||
|
|
||||||
|
# Step 5: Update database
|
||||||
|
status = db.query(NL43Status).filter_by(unit_id=unit_id).first()
|
||||||
|
if status:
|
||||||
|
old_time = status.measurement_start_time
|
||||||
|
status.measurement_start_time = start_time
|
||||||
|
db.commit()
|
||||||
|
|
||||||
|
logger.info(f"[FTP-SYNC] ✓ Successfully synced start time for {unit_id}")
|
||||||
|
logger.info(f"[FTP-SYNC] Folder: {folder_name}")
|
||||||
|
logger.info(f"[FTP-SYNC] Old start time: {old_time}")
|
||||||
|
logger.info(f"[FTP-SYNC] New start time: {start_time}")
|
||||||
|
return True
|
||||||
|
else:
|
||||||
|
logger.warning(f"[FTP-SYNC] Status record not found for {unit_id}")
|
||||||
|
return False
|
||||||
|
else:
|
||||||
|
logger.warning(f"[FTP-SYNC] Could not parse timestamp from folder {folder_name}")
|
||||||
|
return False
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
logger.error(f"[FTP-SYNC] Failed to sync start time for {unit_id}: {e}")
|
||||||
|
return False
|
||||||
|
|
||||||
|
|
||||||
# Rate limiting: NL43 requires ≥1 second between commands
|
# Rate limiting: NL43 requires ≥1 second between commands
|
||||||
_last_command_time = {}
|
_last_command_time = {}
|
||||||
_rate_limit_lock = asyncio.Lock()
|
_rate_limit_lock = asyncio.Lock()
|
||||||
|
|
||||||
|
# Per-device connection locks: NL43 devices only support one TCP connection at a time
|
||||||
|
# This prevents concurrent connections from fighting for the device
|
||||||
|
_device_locks: Dict[str, asyncio.Lock] = {}
|
||||||
|
_device_locks_lock = asyncio.Lock()
|
||||||
|
|
||||||
|
|
||||||
|
async def _get_device_lock(device_key: str) -> asyncio.Lock:
|
||||||
|
"""Get or create a lock for a specific device."""
|
||||||
|
async with _device_locks_lock:
|
||||||
|
if device_key not in _device_locks:
|
||||||
|
_device_locks[device_key] = asyncio.Lock()
|
||||||
|
return _device_locks[device_key]
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Persistent TCP connection pool with OS-level keepalive
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
# Configuration via environment variables
|
||||||
|
TCP_PERSISTENT_ENABLED = os.getenv("TCP_PERSISTENT_ENABLED", "true").lower() == "true"
|
||||||
|
TCP_IDLE_TTL = float(os.getenv("TCP_IDLE_TTL", "300")) # Close idle connections after N seconds
|
||||||
|
TCP_MAX_AGE = float(os.getenv("TCP_MAX_AGE", "1800")) # Force reconnect after N seconds
|
||||||
|
TCP_KEEPALIVE_IDLE = int(os.getenv("TCP_KEEPALIVE_IDLE", "15")) # Seconds idle before probes
|
||||||
|
TCP_KEEPALIVE_INTERVAL = int(os.getenv("TCP_KEEPALIVE_INTERVAL", "10")) # Seconds between probes
|
||||||
|
TCP_KEEPALIVE_COUNT = int(os.getenv("TCP_KEEPALIVE_COUNT", "3")) # Failed probes before dead
|
||||||
|
|
||||||
|
logger.info(
|
||||||
|
f"TCP connection pool: persistent={TCP_PERSISTENT_ENABLED}, "
|
||||||
|
f"idle_ttl={TCP_IDLE_TTL}s, max_age={TCP_MAX_AGE}s, "
|
||||||
|
f"keepalive_idle={TCP_KEEPALIVE_IDLE}s, keepalive_interval={TCP_KEEPALIVE_INTERVAL}s, "
|
||||||
|
f"keepalive_count={TCP_KEEPALIVE_COUNT}"
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class DeviceConnection:
|
||||||
|
"""Tracks a cached TCP connection and its metadata."""
|
||||||
|
reader: asyncio.StreamReader
|
||||||
|
writer: asyncio.StreamWriter
|
||||||
|
device_key: str
|
||||||
|
host: str
|
||||||
|
port: int
|
||||||
|
created_at: float = field(default_factory=time.time)
|
||||||
|
last_used_at: float = field(default_factory=time.time)
|
||||||
|
|
||||||
|
|
||||||
|
class ConnectionPool:
|
||||||
|
"""Per-device persistent TCP connection cache with OS-level keepalive.
|
||||||
|
|
||||||
|
Each NL-43 device supports only one TCP connection at a time. This pool
|
||||||
|
caches that single connection per device key and reuses it across commands,
|
||||||
|
avoiding repeated TCP handshakes over high-latency cellular links.
|
||||||
|
|
||||||
|
Keepalive probes keep cellular NAT tables alive and detect dead connections
|
||||||
|
before the next command attempt.
|
||||||
|
"""
|
||||||
|
|
||||||
|
def __init__(
|
||||||
|
self,
|
||||||
|
enable_persistent: bool = True,
|
||||||
|
idle_ttl: float = 120.0,
|
||||||
|
max_age: float = 300.0,
|
||||||
|
keepalive_idle: int = 15,
|
||||||
|
keepalive_interval: int = 10,
|
||||||
|
keepalive_count: int = 3,
|
||||||
|
):
|
||||||
|
self._connections: Dict[str, DeviceConnection] = {}
|
||||||
|
self._lock = asyncio.Lock()
|
||||||
|
self._enable_persistent = enable_persistent
|
||||||
|
self._idle_ttl = idle_ttl
|
||||||
|
self._max_age = max_age
|
||||||
|
self._keepalive_idle = keepalive_idle
|
||||||
|
self._keepalive_interval = keepalive_interval
|
||||||
|
self._keepalive_count = keepalive_count
|
||||||
|
self._cleanup_task: Optional[asyncio.Task] = None
|
||||||
|
|
||||||
|
# -- lifecycle ----------------------------------------------------------
|
||||||
|
|
||||||
|
def start_cleanup(self):
|
||||||
|
"""Start background task that evicts stale connections."""
|
||||||
|
if self._enable_persistent and self._cleanup_task is None:
|
||||||
|
self._cleanup_task = asyncio.create_task(self._cleanup_loop())
|
||||||
|
logger.info("Connection pool cleanup task started")
|
||||||
|
|
||||||
|
async def close_all(self):
|
||||||
|
"""Close all cached connections (called at shutdown)."""
|
||||||
|
if self._cleanup_task is not None:
|
||||||
|
self._cleanup_task.cancel()
|
||||||
|
with contextlib.suppress(asyncio.CancelledError):
|
||||||
|
await self._cleanup_task
|
||||||
|
self._cleanup_task = None
|
||||||
|
|
||||||
|
async with self._lock:
|
||||||
|
for key, conn in list(self._connections.items()):
|
||||||
|
await self._close_connection(conn, reason="shutdown")
|
||||||
|
self._connections.clear()
|
||||||
|
logger.info("Connection pool: all connections closed")
|
||||||
|
|
||||||
|
# -- public API ---------------------------------------------------------
|
||||||
|
|
||||||
|
async def acquire(
|
||||||
|
self, device_key: str, host: str, port: int, timeout: float
|
||||||
|
) -> Tuple[asyncio.StreamReader, asyncio.StreamWriter, bool]:
|
||||||
|
"""Get a connection for a device (cached or fresh).
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
(reader, writer, from_cache) — from_cache is True if reused.
|
||||||
|
"""
|
||||||
|
if self._enable_persistent:
|
||||||
|
async with self._lock:
|
||||||
|
conn = self._connections.pop(device_key, None)
|
||||||
|
|
||||||
|
if conn is not None:
|
||||||
|
if self._is_alive(conn):
|
||||||
|
self._drain_buffer(conn.reader)
|
||||||
|
conn.last_used_at = time.time()
|
||||||
|
logger.info(f"Pool hit for {device_key} (age={time.time() - conn.created_at:.0f}s)")
|
||||||
|
return conn.reader, conn.writer, True
|
||||||
|
else:
|
||||||
|
await self._close_connection(conn, reason="stale")
|
||||||
|
|
||||||
|
# Open fresh connection
|
||||||
|
reader, writer = await self._open_connection(host, port, timeout)
|
||||||
|
logger.info(f"New connection opened for {device_key}")
|
||||||
|
return reader, writer, False
|
||||||
|
|
||||||
|
    async def release(self, device_key: str, reader: asyncio.StreamReader, writer: asyncio.StreamWriter, host: str, port: int):
        """Return a connection to the pool for reuse."""
        if not self._enable_persistent:
            self._close_writer(writer)
            return

        # Check transport is still healthy before caching
        if writer.transport.is_closing() or reader.at_eof():
            self._close_writer(writer)
            return

        conn = DeviceConnection(
            reader=reader,
            writer=writer,
            device_key=device_key,
            host=host,
            port=port,
        )

        async with self._lock:
            # Evict any existing connection for this device (shouldn't happen
            # under normal locking, but be safe)
            old = self._connections.pop(device_key, None)
            if old is not None:
                await self._close_connection(old, reason="replaced")
            self._connections[device_key] = conn
    async def discard(self, device_key: str):
        """Close and remove a connection from the pool (called on errors)."""
        async with self._lock:
            conn = self._connections.pop(device_key, None)
            if conn is not None:
                await self._close_connection(conn, reason="discarded")
                logger.debug(f"Pool discard for {device_key}")
    def get_stats(self) -> dict:
        """Return pool status for diagnostics."""
        now = time.time()
        connections = {}
        for key, conn in self._connections.items():
            connections[key] = {
                "host": conn.host,
                "port": conn.port,
                "age_seconds": round(now - conn.created_at, 1),
                "idle_seconds": round(now - conn.last_used_at, 1),
                "alive": self._is_alive(conn),
            }
        return {
            "enabled": self._enable_persistent,
            "active_connections": len(self._connections),
            "idle_ttl": self._idle_ttl,
            "max_age": self._max_age,
            "keepalive_idle": self._keepalive_idle,
            "keepalive_interval": self._keepalive_interval,
            "keepalive_count": self._keepalive_count,
            "connections": connections,
        }

    # -- internals ----------------------------------------------------------
    async def _open_connection(
        self, host: str, port: int, timeout: float
    ) -> Tuple[asyncio.StreamReader, asyncio.StreamWriter]:
        """Open a new TCP connection with keepalive options set."""
        try:
            reader, writer = await asyncio.wait_for(
                asyncio.open_connection(host, port), timeout=timeout
            )
        except asyncio.TimeoutError:
            raise ConnectionError(f"Failed to connect to device at {host}:{port}")
        except Exception as e:
            raise ConnectionError(f"Failed to connect to device: {e}")

        # Set TCP keepalive on the underlying socket
        self._set_keepalive(writer)
        return reader, writer
    def _set_keepalive(self, writer: asyncio.StreamWriter):
        """Configure OS-level TCP keepalive on the connection socket."""
        try:
            sock = writer.transport.get_extra_info("socket")
            if sock is None:
                logger.warning("Could not access underlying socket for keepalive")
                return

            sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

            # Linux-specific keepalive tuning
            if hasattr(socket, "TCP_KEEPIDLE"):
                sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, self._keepalive_idle)
            if hasattr(socket, "TCP_KEEPINTVL"):
                sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, self._keepalive_interval)
            if hasattr(socket, "TCP_KEEPCNT"):
                sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, self._keepalive_count)

            logger.debug(
                f"TCP keepalive set: idle={self._keepalive_idle}s, "
                f"interval={self._keepalive_interval}s, count={self._keepalive_count}"
            )
        except OSError as e:
            logger.warning(f"Failed to set TCP keepalive: {e}")
    def _is_alive(self, conn: DeviceConnection) -> bool:
        """Check whether a cached connection is still usable."""
        now = time.time()

        # Age / idle checks (value of -1 disables the check)
        if self._idle_ttl >= 0 and now - conn.last_used_at > self._idle_ttl:
            logger.debug(f"Connection {conn.device_key} idle too long ({now - conn.last_used_at:.0f}s > {self._idle_ttl}s)")
            return False
        if self._max_age >= 0 and now - conn.created_at > self._max_age:
            logger.debug(f"Connection {conn.device_key} too old ({now - conn.created_at:.0f}s > {self._max_age}s)")
            return False

        # Transport-level checks
        transport = conn.writer.transport
        if transport.is_closing():
            logger.debug(f"Connection {conn.device_key} transport is closing")
            return False
        if conn.reader.at_eof():
            logger.debug(f"Connection {conn.device_key} reader at EOF")
            return False

        return True
    @staticmethod
    def _drain_buffer(reader: asyncio.StreamReader):
        """Drain any pending bytes (e.g. '$' prompt) from an idle connection."""
        buf = reader._buffer  # noqa: SLF001 — internal but stable across CPython
        if buf:
            pending = bytes(buf)
            buf.clear()
            logger.debug(f"Drained {len(pending)} bytes from cached connection: {pending!r}")
    @staticmethod
    def _close_writer(writer: asyncio.StreamWriter):
        """Close a writer, suppressing errors."""
        try:
            writer.close()
        except Exception:
            pass
    async def _close_connection(self, conn: DeviceConnection, reason: str = ""):
        """Fully close a cached connection."""
        logger.debug(f"Closing connection {conn.device_key} ({reason})")
        conn.writer.close()
        with contextlib.suppress(Exception):
            await conn.writer.wait_closed()
    async def _cleanup_loop(self):
        """Periodically evict idle/expired connections."""
        try:
            while True:
                await asyncio.sleep(30)
                async with self._lock:
                    for key in list(self._connections):
                        conn = self._connections[key]
                        if not self._is_alive(conn):
                            del self._connections[key]
                            await self._close_connection(conn, reason="cleanup")
        except asyncio.CancelledError:
            pass
# Module-level pool singleton
_connection_pool = ConnectionPool(
    enable_persistent=TCP_PERSISTENT_ENABLED,
    idle_ttl=TCP_IDLE_TTL,
    max_age=TCP_MAX_AGE,
    keepalive_idle=TCP_KEEPALIVE_IDLE,
    keepalive_interval=TCP_KEEPALIVE_INTERVAL,
    keepalive_count=TCP_KEEPALIVE_COUNT,
)
class NL43Client:
    def __init__(self, host: str, port: int, timeout: float = 5.0, ftp_username: str = None, ftp_password: str = None, ftp_port: int = 21):
@@ -117,7 +534,12 @@ class NL43Client:
        self.device_key = f"{host}:{port}"

    async def _enforce_rate_limit(self):
        """Ensure ≥1 second between commands to the same device.

        NL43 protocol requires ≥1s after the device responds before sending
        the next command. The timestamp is recorded after each command completes
        (connection closed), so we measure from completion, not from send time.
        """
        async with _rate_limit_lock:
            last_time = _last_command_time.get(self.device_key, 0)
            elapsed = time.time() - last_time
@@ -125,7 +547,6 @@ class NL43Client:
            wait_time = 1.0 - elapsed
            logger.debug(f"Rate limiting: waiting {wait_time:.2f}s for {self.device_key}")
            await asyncio.sleep(wait_time)

    async def _send_command(self, cmd: str) -> str:
        """Send ASCII command to NL43 device via TCP.
@@ -133,69 +554,107 @@ class NL43Client:
        NL43 protocol returns two lines for query commands:
        Line 1: Result code (R+0000 for success, error codes otherwise)
        Line 2: Actual data (for query commands ending with '?')

        This method acquires a per-device lock to ensure only one TCP connection
        is active at a time (NL43 devices only support single connections).
        """
        # Acquire per-device lock to prevent concurrent connections
        device_lock = await _get_device_lock(self.device_key)
        async with device_lock:
            return await self._send_command_unlocked(cmd)
    async def _send_command_unlocked(self, cmd: str) -> str:
        """Internal: send command without acquiring device lock (lock must be held by caller).

        Uses the connection pool to reuse cached TCP connections when possible.
        If a cached connection fails, retries once with a fresh connection.
        """
        await self._enforce_rate_limit()

        logger.info(f"Sending command to {self.device_key}: {cmd.strip()}")
        try:
            reader, writer, from_cache = await _connection_pool.acquire(
                self.device_key, self.host, self.port, self.timeout
            )
        except ConnectionError:
            logger.error(f"Connection failed to {self.device_key}")
            raise

        try:
            response = await self._execute_command(reader, writer, cmd)
            # Success — return connection to pool for reuse
            await _connection_pool.release(self.device_key, reader, writer, self.host, self.port)
            _last_command_time[self.device_key] = time.time()
            return response
        except Exception as e:
            # Discard the bad connection
            await _connection_pool.discard(self.device_key)
            ConnectionPool._close_writer(writer)

            if from_cache:
                # Retry once with a fresh connection — the cached one may have gone stale
                logger.warning(f"Cached connection failed for {self.device_key}, retrying fresh: {e}")
                await self._enforce_rate_limit()

                try:
                    reader, writer, _ = await _connection_pool.acquire(
                        self.device_key, self.host, self.port, self.timeout
                    )
                except ConnectionError:
                    logger.error(f"Retry connection also failed to {self.device_key}")
                    raise

                try:
                    response = await self._execute_command(reader, writer, cmd)
                    await _connection_pool.release(self.device_key, reader, writer, self.host, self.port)
                    _last_command_time[self.device_key] = time.time()
                    return response
                except Exception:
                    await _connection_pool.discard(self.device_key)
                    ConnectionPool._close_writer(writer)
                    raise
            else:
                raise
    async def _execute_command(self, reader: asyncio.StreamReader, writer: asyncio.StreamWriter, cmd: str) -> str:
        """Send a command over an existing connection and parse the NL43 response."""
        writer.write(cmd.encode("ascii"))
        await writer.drain()

        # Read first line (result code)
        first_line_data = await asyncio.wait_for(reader.readuntil(b"\n"), timeout=self.timeout)
        result_code = first_line_data.decode(errors="ignore").strip()

        # Remove leading $ prompt if present
        if result_code.startswith("$"):
            result_code = result_code[1:].strip()

        logger.info(f"Result code from {self.device_key}: {result_code}")

        # Check result code
        if result_code == "R+0000":
            # Success — for query commands, read the second line with actual data
            is_query = cmd.strip().endswith("?")
            if is_query:
                data_line = await asyncio.wait_for(reader.readuntil(b"\n"), timeout=self.timeout)
                response = data_line.decode(errors="ignore").strip()
                logger.debug(f"Data line from {self.device_key}: {response}")
                return response
            else:
                # Setting command — return success code
                return result_code
        elif result_code == "R+0001":
            raise ValueError("Command error - device did not recognize command")
        elif result_code == "R+0002":
            raise ValueError("Parameter error - invalid parameter value")
        elif result_code == "R+0003":
            raise ValueError("Spec/type error - command not supported by this device model")
        elif result_code == "R+0004":
            raise ValueError("Status error - device is in wrong state for this command")
        else:
            raise ValueError(f"Unknown result code: {result_code}")
    async def request_dod(self) -> NL43Snapshot:
        """Request DOD (Data Output Display) snapshot from device.
@@ -429,105 +888,111 @@ class NL43Client:
        The stream continues until an exception occurs or the connection is closed.
        Send SUB character (0x1A) to stop the stream.

        NOTE: This method holds the device lock for the entire duration of streaming,
        blocking other commands to this device. This is intentional since NL43 devices
        only support one TCP connection at a time.
        """
        # Acquire per-device lock - held for entire streaming session
        device_lock = await _get_device_lock(self.device_key)
        async with device_lock:
            # Evict any cached connection — streaming needs its own dedicated socket
            await _connection_pool.discard(self.device_key)
            await self._enforce_rate_limit()

            logger.info(f"Starting DRD stream for {self.device_key}")

            try:
                reader, writer = await _connection_pool._open_connection(
                    self.host, self.port, self.timeout
                )
            except ConnectionError:
                logger.error(f"DRD stream connection failed to {self.device_key}")
                raise

            try:
                # Start DRD streaming
                writer.write(b"DRD?\r\n")
                await writer.drain()

                # Read initial result code
                first_line_data = await asyncio.wait_for(reader.readuntil(b"\n"), timeout=self.timeout)
                result_code = first_line_data.decode(errors="ignore").strip()

                if result_code.startswith("$"):
                    result_code = result_code[1:].strip()

                logger.debug(f"DRD stream result code from {self.device_key}: {result_code}")

                if result_code != "R+0000":
                    raise ValueError(f"DRD stream failed to start: {result_code}")

                logger.info(f"DRD stream started successfully for {self.device_key}")

                # Continuously read data lines
                while True:
                    try:
                        line_data = await asyncio.wait_for(reader.readuntil(b"\n"), timeout=30.0)
                        line = line_data.decode(errors="ignore").strip()

                        if not line:
                            continue

                        # Remove leading $ if present
                        if line.startswith("$"):
                            line = line[1:].strip()

                        # Parse the DRD data (same format as DOD)
                        parts = [p.strip() for p in line.split(",") if p.strip() != ""]

                        if len(parts) < 2:
                            logger.warning(f"Malformed DRD data from {self.device_key}: {line}")
                            continue

                        snap = NL43Snapshot(unit_id="", raw_payload=line, measurement_state="Measure")

                        # Parse known positions (DRD format - same as DOD)
                        # DRD format: d0=counter, d1=Lp, d2=Leq, d3=Lmax, d4=Lmin, d5=Lpeak, d6=LIeq, ...
                        try:
                            # Capture d0 (counter) for timer synchronization
                            if len(parts) >= 1:
                                snap.counter = parts[0]  # d0: Measurement interval counter (1-600)
                            if len(parts) >= 2:
                                snap.lp = parts[1]  # d1: Instantaneous sound pressure level
                            if len(parts) >= 3:
                                snap.leq = parts[2]  # d2: Equivalent continuous sound level
                            if len(parts) >= 4:
                                snap.lmax = parts[3]  # d3: Maximum level
                            if len(parts) >= 5:
                                snap.lmin = parts[4]  # d4: Minimum level
                            if len(parts) >= 6:
                                snap.lpeak = parts[5]  # d5: Peak level
                        except (IndexError, ValueError) as e:
                            logger.warning(f"Error parsing DRD data points: {e}")

                        # Call the callback with the snapshot
                        await callback(snap)

                    except asyncio.TimeoutError:
                        logger.warning(f"DRD stream timeout (no data for 30s) from {self.device_key}")
                        break
                    except asyncio.IncompleteReadError:
                        logger.info(f"DRD stream closed by device {self.device_key}")
                        break

            finally:
                # Send SUB character to stop streaming
                try:
                    writer.write(b"\x1A")
                    await writer.drain()
                except Exception:
                    pass

                writer.close()
                with contextlib.suppress(Exception):
                    await writer.wait_closed()

                logger.info(f"DRD stream ended for {self.device_key}")
    async def set_measurement_time(self, preset: str):
        """Set measurement time preset.
@@ -1236,11 +1701,42 @@ class NL43Client:
        result["stopped"] = True
        logger.info(f"[STOP-CYCLE] Measurement stopped")

        # Step 2: Reset FTP (disable then enable) to clear any stale state
        logger.info(f"[STOP-CYCLE] Step 2: Resetting FTP (disable then enable)")
        try:
            await self.disable_ftp()
            logger.info(f"[STOP-CYCLE] FTP disabled")
        except Exception as e:
            logger.warning(f"[STOP-CYCLE] FTP disable failed (may already be off): {e}")
        await self.enable_ftp()
        logger.info(f"[STOP-CYCLE] FTP enable command sent")

        # Step 2b: Wait and verify FTP is ready (NL-43 needs time to start FTP server)
        ftp_ready_timeout = 30  # seconds
        ftp_check_interval = 2  # seconds
        ftp_ready = False
        elapsed = 0

        logger.info(f"[STOP-CYCLE] Step 2b: Waiting up to {ftp_ready_timeout}s for FTP server to be ready")
        while elapsed < ftp_ready_timeout:
            await asyncio.sleep(ftp_check_interval)
            elapsed += ftp_check_interval
            try:
                ftp_status = await self.get_ftp_status()
                logger.info(f"[STOP-CYCLE] FTP status check at {elapsed}s: {ftp_status}")
                if ftp_status.lower() == "on":
                    ftp_ready = True
                    logger.info(f"[STOP-CYCLE] FTP server confirmed ready after {elapsed}s")
                    break
            except Exception as e:
                logger.warning(f"[STOP-CYCLE] FTP status check failed at {elapsed}s: {e}")

        if ftp_ready:
            result["ftp_enabled"] = True
            logger.info(f"[STOP-CYCLE] FTP enabled and verified")
        else:
            logger.warning(f"[STOP-CYCLE] FTP not confirmed ready after {ftp_ready_timeout}s, proceeding anyway")
            result["ftp_enabled"] = True  # Command was sent, just not verified

        if not download:
            logger.info(f"[STOP-CYCLE] === Cycle complete (download=False) ===")
migrate_add_device_logs.py (new file, 73 lines)
@@ -0,0 +1,73 @@
#!/usr/bin/env python3
"""
Database migration: Add device_logs table.

This table stores per-device log entries for debugging and audit trail.

Run this once to add the new table.
"""

import sqlite3
import os

# Path to the SLMM database
DB_PATH = os.path.join(os.path.dirname(__file__), "data", "slmm.db")


def migrate():
    print(f"Adding device_logs table to: {DB_PATH}")

    if not os.path.exists(DB_PATH):
        print("Database does not exist yet. Table will be created automatically on first run.")
        return

    conn = sqlite3.connect(DB_PATH)
    cursor = conn.cursor()

    try:
        # Check if table already exists
        cursor.execute("""
            SELECT name FROM sqlite_master
            WHERE type='table' AND name='device_logs'
        """)
        if cursor.fetchone():
            print("✓ device_logs table already exists, no migration needed")
            return

        # Create the table
        print("Creating device_logs table...")
        cursor.execute("""
            CREATE TABLE device_logs (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                unit_id VARCHAR NOT NULL,
                timestamp DATETIME DEFAULT CURRENT_TIMESTAMP,
                level VARCHAR DEFAULT 'INFO',
                category VARCHAR DEFAULT 'GENERAL',
                message TEXT NOT NULL
            )
        """)

        # Create indexes for efficient querying
        print("Creating indexes...")
        cursor.execute("CREATE INDEX ix_device_logs_unit_id ON device_logs (unit_id)")
        cursor.execute("CREATE INDEX ix_device_logs_timestamp ON device_logs (timestamp)")

        conn.commit()
        print("✓ Created device_logs table with indexes")

        # Verify
        cursor.execute("""
            SELECT name FROM sqlite_master
            WHERE type='table' AND name='device_logs'
        """)
        if not cursor.fetchone():
            raise Exception("device_logs table was not created successfully")

        print("✓ Migration completed successfully")

    finally:
        conn.close()


if __name__ == "__main__":
    migrate()
migrate_add_start_time_sync_flag.py (new file, 60 lines)
@@ -0,0 +1,60 @@
#!/usr/bin/env python3
"""
Database migration: Add start_time_sync_attempted field to nl43_status table.

This field tracks whether FTP sync has been attempted for the current measurement,
preventing repeated sync attempts when FTP fails.

Run this once to add the new column.
"""

import sqlite3
import os

# Path to the SLMM database
DB_PATH = os.path.join(os.path.dirname(__file__), "data", "slmm.db")


def migrate():
    print(f"Adding start_time_sync_attempted field to: {DB_PATH}")

    if not os.path.exists(DB_PATH):
        print("Database does not exist yet. Column will be created automatically.")
        return

    conn = sqlite3.connect(DB_PATH)
    cursor = conn.cursor()

    try:
        # Check if column already exists
        cursor.execute("PRAGMA table_info(nl43_status)")
        columns = [col[1] for col in cursor.fetchall()]

        if 'start_time_sync_attempted' in columns:
            print("✓ start_time_sync_attempted column already exists, no migration needed")
            return

        # Add the column
        print("Adding start_time_sync_attempted column...")
        cursor.execute("""
            ALTER TABLE nl43_status
            ADD COLUMN start_time_sync_attempted BOOLEAN DEFAULT 0
        """)
        conn.commit()
        print("✓ Added start_time_sync_attempted column")

        # Verify
        cursor.execute("PRAGMA table_info(nl43_status)")
        columns = [col[1] for col in cursor.fetchall()]

        if 'start_time_sync_attempted' not in columns:
            raise Exception("start_time_sync_attempted column was not added successfully")

        print("✓ Migration completed successfully")

    finally:
        conn.close()


if __name__ == "__main__":
    migrate()
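The column-add pattern used by both migrations above (check `PRAGMA table_info`, then `ALTER TABLE ... ADD COLUMN` only if missing) can be exercised against a throwaway SQLite database. A minimal sketch — the helper function is illustrative, not part of the repo:

```python
import sqlite3

def add_column_if_missing(conn, table, column, decl):
    """Idempotent ALTER TABLE: add the column only if it is not already present."""
    cols = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    if column in cols:
        return False  # already migrated, nothing to do
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {decl}")
    conn.commit()
    return True

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nl43_status (unit_id TEXT PRIMARY KEY)")

# First run performs the migration, second run is a no-op
assert add_column_if_missing(conn, "nl43_status", "start_time_sync_attempted", "BOOLEAN DEFAULT 0") is True
assert add_column_if_missing(conn, "nl43_status", "start_time_sync_attempted", "BOOLEAN DEFAULT 0") is False
```

Running the migration twice is therefore safe, which is why both scripts print "no migration needed" rather than fail on a second invocation.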
@@ -333,6 +333,134 @@
   html += `<p style="margin-top: 12px; font-size: 0.9em; color: #666;">Last run: ${new Date(data.timestamp).toLocaleString()}</p>`;
+
+  // Add database dump section if available
+  if (data.database_dump) {
+    html += `<div style="margin-top: 16px; border-top: 1px solid #d0d7de; padding-top: 12px;">`;
+    html += `<h4 style="margin: 0 0 12px 0;">📦 Database Dump</h4>`;
+
+    // Config section
+    if (data.database_dump.config) {
+      const cfg = data.database_dump.config;
+      html += `<div style="background: #f0f4f8; padding: 12px; border-radius: 4px; margin-bottom: 12px;">`;
+      html += `<strong>Configuration (nl43_config)</strong>`;
+      html += `<table style="width: 100%; margin-top: 8px; font-size: 0.9em;">`;
+      html += `<tr><td style="padding: 2px 8px; color: #666;">Host</td><td>${cfg.host}:${cfg.tcp_port}</td></tr>`;
+      html += `<tr><td style="padding: 2px 8px; color: #666;">TCP Enabled</td><td>${cfg.tcp_enabled ? '✓' : '✗'}</td></tr>`;
+      html += `<tr><td style="padding: 2px 8px; color: #666;">FTP Enabled</td><td>${cfg.ftp_enabled ? '✓' : '✗'}${cfg.ftp_enabled ? ` (port ${cfg.ftp_port}, user: ${cfg.ftp_username || 'none'})` : ''}</td></tr>`;
+      html += `<tr><td style="padding: 2px 8px; color: #666;">Background Polling</td><td>${cfg.poll_enabled ? `✓ every ${cfg.poll_interval_seconds}s` : '✗ disabled'}</td></tr>`;
+      html += `</table></div>`;
+    }
+
+    // Status cache section
+    if (data.database_dump.status_cache) {
+      const cache = data.database_dump.status_cache;
+      html += `<div style="background: #f0f8f4; padding: 12px; border-radius: 4px; margin-bottom: 12px;">`;
+      html += `<strong>Status Cache (nl43_status)</strong>`;
+      html += `<table style="width: 100%; margin-top: 8px; font-size: 0.9em;">`;
+
+      // Measurement state and timing
+      html += `<tr><td style="padding: 2px 8px; color: #666;">Measurement State</td><td><strong>${cache.measurement_state || 'unknown'}</strong></td></tr>`;
+      if (cache.measurement_start_time) {
+        const startTime = new Date(cache.measurement_start_time);
+        const elapsed = Math.floor((Date.now() - startTime) / 1000);
+        const elapsedStr = elapsed > 3600 ? `${Math.floor(elapsed/3600)}h ${Math.floor((elapsed%3600)/60)}m` : elapsed > 60 ? `${Math.floor(elapsed/60)}m ${elapsed%60}s` : `${elapsed}s`;
+        html += `<tr><td style="padding: 2px 8px; color: #666;">Measurement Started</td><td>${startTime.toLocaleString()} (${elapsedStr} ago)</td></tr>`;
+      }
+      html += `<tr><td style="padding: 2px 8px; color: #666;">Counter (d0)</td><td>${cache.counter || 'N/A'}</td></tr>`;
+
+      // Sound levels
+      html += `<tr><td colspan="2" style="padding: 8px 8px 2px 8px; font-weight: 600; border-top: 1px solid #d0d7de;">Sound Levels (dB)</td></tr>`;
+      html += `<tr><td style="padding: 2px 8px; color: #666;">Lp (Instantaneous)</td><td>${cache.lp || 'N/A'}</td></tr>`;
+      html += `<tr><td style="padding: 2px 8px; color: #666;">Leq (Equivalent)</td><td>${cache.leq || 'N/A'}</td></tr>`;
+      html += `<tr><td style="padding: 2px 8px; color: #666;">Lmax / Lmin</td><td>${cache.lmax || 'N/A'} / ${cache.lmin || 'N/A'}</td></tr>`;
+      html += `<tr><td style="padding: 2px 8px; color: #666;">Lpeak</td><td>${cache.lpeak || 'N/A'}</td></tr>`;
+
+      // Device status
+      html += `<tr><td colspan="2" style="padding: 8px 8px 2px 8px; font-weight: 600; border-top: 1px solid #d0d7de;">Device Status</td></tr>`;
+      html += `<tr><td style="padding: 2px 8px; color: #666;">Battery</td><td>${cache.battery_level || 'N/A'}${cache.power_source ? ` (${cache.power_source})` : ''}</td></tr>`;
+      html += `<tr><td style="padding: 2px 8px; color: #666;">SD Card</td><td>${cache.sd_remaining_mb ? `${cache.sd_remaining_mb} MB` : 'N/A'}${cache.sd_free_ratio ? ` (${cache.sd_free_ratio} free)` : ''}</td></tr>`;
+
+      // Polling status
+      html += `<tr><td colspan="2" style="padding: 8px 8px 2px 8px; font-weight: 600; border-top: 1px solid #d0d7de;">Polling Status</td></tr>`;
+      html += `<tr><td style="padding: 2px 8px; color: #666;">Reachable</td><td>${cache.is_reachable ? '🟢 Yes' : '🔴 No'}</td></tr>`;
+      if (cache.last_seen) {
+        html += `<tr><td style="padding: 2px 8px; color: #666;">Last Seen</td><td>${new Date(cache.last_seen).toLocaleString()}</td></tr>`;
+      }
+      if (cache.last_success) {
+        html += `<tr><td style="padding: 2px 8px; color: #666;">Last Success</td><td>${new Date(cache.last_success).toLocaleString()}</td></tr>`;
+      }
+      if (cache.last_poll_attempt) {
+        html += `<tr><td style="padding: 2px 8px; color: #666;">Last Poll Attempt</td><td>${new Date(cache.last_poll_attempt).toLocaleString()}</td></tr>`;
+      }
+      html += `<tr><td style="padding: 2px 8px; color: #666;">Consecutive Failures</td><td>${cache.consecutive_failures || 0}</td></tr>`;
+      if (cache.last_error) {
+        html += `<tr><td style="padding: 2px 8px; color: #666;">Last Error</td><td style="color: #d00; font-size: 0.85em;">${cache.last_error}</td></tr>`;
+      }
+
+      html += `</table></div>`;
+
+      // Raw payload (collapsible)
+      if (cache.raw_payload) {
+        html += `<details style="margin-top: 8px;"><summary style="cursor: pointer; color: #666; font-size: 0.9em;">📄 Raw Payload</summary>`;
+        html += `<pre style="background: #f6f8fa; padding: 8px; border-radius: 4px; font-size: 0.8em; overflow-x: auto; margin-top: 8px;">${cache.raw_payload}</pre></details>`;
+      }
+    } else {
+      html += `<p style="color: #888; font-style: italic;">No cached status available for this unit.</p>`;
+    }
+
+    html += `</div>`;
+  }
+
+  // Fetch and display device logs
+  try {
+    const logsRes = await fetch(`/api/nl43/${unitId}/logs?limit=50`);
+    if (logsRes.ok) {
+      const logsData = await logsRes.json();
+      if (logsData.logs && logsData.logs.length > 0) {
+        html += `<div style="margin-top: 16px; border-top: 1px solid #d0d7de; padding-top: 12px;">`;
+        html += `<h4 style="margin: 0 0 12px 0;">📋 Device Logs (${logsData.stats.total} total)</h4>`;
+
+        // Stats summary
+        if (logsData.stats.by_level) {
+          html += `<div style="margin-bottom: 8px; font-size: 0.85em; color: #666;">`;
+          const levels = logsData.stats.by_level;
+          const parts = [];
+          if (levels.ERROR) parts.push(`<span style="color: #d00;">${levels.ERROR} errors</span>`);
+          if (levels.WARNING) parts.push(`<span style="color: #fa0;">${levels.WARNING} warnings</span>`);
+          if (levels.INFO) parts.push(`${levels.INFO} info`);
+          html += parts.join(' · ');
+          html += `</div>`;
+        }
+
+        // Log entries (collapsible)
+        html += `<details open><summary style="cursor: pointer; font-size: 0.9em; margin-bottom: 8px;">Recent entries (${logsData.logs.length})</summary>`;
+        html += `<div style="max-height: 300px; overflow-y: auto; background: #f6f8fa; border: 1px solid #d0d7de; border-radius: 4px; padding: 8px; font-size: 0.8em; font-family: monospace;">`;
+
+        logsData.logs.forEach(entry => {
+          const levelColor = {
+            'ERROR': '#d00',
+            'WARNING': '#b86e00',
+            'INFO': '#0969da',
+            'DEBUG': '#888'
+          }[entry.level] || '#666';
+
+          const time = new Date(entry.timestamp).toLocaleString();
+          html += `<div style="margin-bottom: 4px; border-bottom: 1px solid #eee; padding-bottom: 4px;">`;
+          html += `<span style="color: #888;">${time}</span> `;
+          html += `<span style="color: ${levelColor}; font-weight: 600;">[${entry.level}]</span> `;
+          html += `<span style="color: #666;">[${entry.category}]</span> `;
+          html += `${entry.message}`;
+          html += `</div>`;
+        });
+
+        html += `</div></details>`;
+        html += `</div>`;
+      }
+    }
+  } catch (logErr) {
+    console.log('Could not fetch device logs:', logErr);
+  }
+
   resultsEl.innerHTML = html;
   log(`Diagnostics complete: ${data.overall_status}`);
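The elapsed-time formatting in the hunk above (seconds rendered as `Xh Ym`, `Xm Ys`, or `Xs` via a chained ternary) is easy to get wrong at the minute/hour boundaries. A Python sketch of the same logic, for reference only — this helper does not exist in the repo:

```python
def format_elapsed(elapsed: int) -> str:
    """Mirror of the dashboard's elapsed-time formatting (input: whole seconds)."""
    if elapsed > 3600:
        return f"{elapsed // 3600}h {(elapsed % 3600) // 60}m"
    if elapsed > 60:
        return f"{elapsed // 60}m {elapsed % 60}s"
    return f"{elapsed}s"

print(format_elapsed(45))    # 45s
print(format_elapsed(125))   # 2m 5s
print(format_elapsed(7500))  # 2h 5m
```

Note that, like the JS version, exactly 60s or 3600s falls into the smaller unit (`60s`, `60m 0s`), since the comparisons are strict.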
@@ -3,7 +3,7 @@
 <head>
   <meta charset="UTF-8" />
   <meta name="viewport" content="width=device-width, initial-scale=1.0" />
-  <title>SLMM Roster - Sound Level Meter Configuration</title>
+  <title>SLMM - Device Roster & Connections</title>
  <style>
    * { box-sizing: border-box; }
    body {
@@ -227,41 +227,165 @@
   }
   .toast-success { background: #2da44e; }
   .toast-error { background: #cf222e; }
+
+  /* Tabs */
+  .tabs {
+    display: flex;
+    gap: 0;
+    margin-bottom: 0;
+    border-bottom: 2px solid #d0d7de;
+  }
+  .tab-btn {
+    padding: 10px 20px;
+    border: none;
+    background: none;
+    cursor: pointer;
+    font-size: 14px;
+    font-weight: 600;
+    color: #57606a;
+    border-bottom: 2px solid transparent;
+    margin-bottom: -2px;
+    transition: color 0.2s, border-color 0.2s;
+  }
+  .tab-btn:hover { color: #24292f; }
+  .tab-btn.active {
+    color: #24292f;
+    border-bottom-color: #fd8c73;
+  }
+  .tab-panel { display: none; }
+  .tab-panel.active { display: block; }
+
+  /* Connection pool panel */
+  .pool-config {
+    display: grid;
+    grid-template-columns: repeat(auto-fill, minmax(180px, 1fr));
+    gap: 12px;
+    margin-bottom: 20px;
+  }
+  .pool-config-card {
+    background: #f6f8fa;
+    border: 1px solid #d0d7de;
+    border-radius: 6px;
+    padding: 12px;
+  }
+  .pool-config-card .label {
+    font-size: 11px;
+    color: #57606a;
+    text-transform: uppercase;
+    font-weight: 600;
+    margin-bottom: 4px;
+  }
+  .pool-config-card .value {
+    font-size: 18px;
+    font-weight: 600;
+    color: #24292f;
+  }
+  .conn-card {
+    background: white;
+    border: 1px solid #d0d7de;
+    border-radius: 6px;
+    padding: 16px;
+    margin-bottom: 12px;
+  }
+  .conn-card-header {
+    display: flex;
+    justify-content: space-between;
+    align-items: center;
+    margin-bottom: 12px;
+  }
+  .conn-card-header strong { font-size: 15px; }
+  .conn-card-grid {
+    display: grid;
+    grid-template-columns: repeat(auto-fill, minmax(140px, 1fr));
+    gap: 8px;
+  }
+  .conn-stat .label {
+    font-size: 11px;
+    color: #57606a;
+    text-transform: uppercase;
+    font-weight: 600;
+  }
+  .conn-stat .value {
+    font-size: 14px;
+    font-weight: 600;
+    color: #24292f;
+  }
+  .conn-empty {
+    text-align: center;
+    padding: 32px;
+    color: #57606a;
+  }
+  .pool-actions {
+    display: flex;
+    gap: 8px;
+    margin-bottom: 16px;
+  }
 </style>
 </head>
 <body>
 <div class="container">
   <div class="header">
-    <h1>📊 Sound Level Meter Roster</h1>
+    <h1>SLMM - Roster & Connections</h1>
     <div class="nav">
       <a href="/" class="btn">← Back to Control Panel</a>
       <button class="btn btn-primary" onclick="openAddModal()">+ Add Device</button>
     </div>
   </div>

-  <div class="table-container">
-    <table id="rosterTable">
-      <thead>
-        <tr>
-          <th>Unit ID</th>
-          <th>Host / IP</th>
-          <th>TCP Port</th>
-          <th>FTP Port</th>
-          <th class="checkbox-cell">TCP</th>
-          <th class="checkbox-cell">FTP</th>
-          <th class="checkbox-cell">Polling</th>
-          <th>Status</th>
-          <th class="actions-cell">Actions</th>
-        </tr>
-      </thead>
-      <tbody id="rosterBody">
-        <tr>
-          <td colspan="9" style="text-align: center; padding: 24px;">
-            Loading...
-          </td>
-        </tr>
-      </tbody>
-    </table>
-  </div>
+  <div class="tabs">
+    <button class="tab-btn active" onclick="switchTab('roster')">Device Roster</button>
+    <button class="tab-btn" onclick="switchTab('connections')">Connections</button>
+  </div>
+
+  <!-- Roster Tab -->
+  <div id="tab-roster" class="tab-panel active">
+    <div class="table-container" style="border-top-left-radius: 0; border-top-right-radius: 0;">
+      <table id="rosterTable">
+        <thead>
+          <tr>
+            <th>Unit ID</th>
+            <th>Host / IP</th>
+            <th>TCP Port</th>
+            <th>FTP Port</th>
+            <th class="checkbox-cell">TCP</th>
+            <th class="checkbox-cell">FTP</th>
+            <th class="checkbox-cell">Polling</th>
+            <th>Status</th>
+            <th class="actions-cell">Actions</th>
+          </tr>
+        </thead>
+        <tbody id="rosterBody">
+          <tr>
+            <td colspan="9" style="text-align: center; padding: 24px;">
+              Loading...
+            </td>
+          </tr>
+        </tbody>
+      </table>
+    </div>
+  </div>
+
+  <!-- Connections Tab -->
+  <div id="tab-connections" class="tab-panel">
+    <div class="table-container" style="padding: 20px; border-top-left-radius: 0; border-top-right-radius: 0;">
+      <div class="pool-actions">
+        <button class="btn" onclick="loadConnections()">Refresh</button>
+        <button class="btn btn-danger" onclick="flushConnections()">Flush All Connections</button>
+      </div>
+
+      <h3 style="margin: 0 0 12px 0; font-size: 16px;">Pool Configuration</h3>
+      <div id="poolConfig" class="pool-config">
+        <div class="pool-config-card">
+          <div class="label">Status</div>
+          <div class="value" id="poolEnabled">--</div>
+        </div>
+      </div>

+      <h3 style="margin: 20px 0 12px 0; font-size: 16px;">Active Connections</h3>
+      <div id="connectionsList">
+        <div class="conn-empty">Loading...</div>
+      </div>
+    </div>
   </div>
 </div>

@@ -619,6 +743,159 @@
       closeModal();
     }
   });
+
+  // ========== Tab Switching ==========
+
+  function switchTab(tabName) {
+    document.querySelectorAll('.tab-btn').forEach(btn => btn.classList.remove('active'));
+    document.querySelectorAll('.tab-panel').forEach(panel => panel.classList.remove('active'));
+
+    document.querySelector(`.tab-btn[onclick="switchTab('${tabName}')"]`).classList.add('active');
+    document.getElementById(`tab-${tabName}`).classList.add('active');
+
+    if (tabName === 'connections') {
+      loadConnections();
+    }
+  }
+
+  // ========== Connection Pool ==========
+
+  let connectionsRefreshTimer = null;
+
+  async function loadConnections() {
+    try {
+      const res = await fetch('/api/nl43/_connections/status');
+      const data = await res.json();
+
+      if (!res.ok) {
+        showToast('Failed to load connection pool status', 'error');
+        return;
+      }
+
+      const pool = data.pool;
+      renderPoolConfig(pool);
+      renderConnections(pool.connections);
+
+      // Auto-refresh while tab is active
+      clearTimeout(connectionsRefreshTimer);
+      if (document.getElementById('tab-connections').classList.contains('active')) {
+        connectionsRefreshTimer = setTimeout(loadConnections, 5000);
+      }
+    } catch (err) {
+      showToast('Error loading connections: ' + err.message, 'error');
+      console.error('Load connections error:', err);
+    }
+  }
+
+  function renderPoolConfig(pool) {
+    document.getElementById('poolConfig').innerHTML = `
+      <div class="pool-config-card">
+        <div class="label">Persistent</div>
+        <div class="value" style="color: ${pool.enabled ? '#1a7f37' : '#cf222e'}">${pool.enabled ? 'Enabled' : 'Disabled'}</div>
+      </div>
+      <div class="pool-config-card">
+        <div class="label">Active</div>
+        <div class="value">${pool.active_connections}</div>
+      </div>
+      <div class="pool-config-card">
+        <div class="label">Idle TTL</div>
+        <div class="value">${pool.idle_ttl}s</div>
+      </div>
+      <div class="pool-config-card">
+        <div class="label">Max Age</div>
+        <div class="value">${pool.max_age}s</div>
+      </div>
+      <div class="pool-config-card">
+        <div class="label">KA Idle</div>
+        <div class="value">${pool.keepalive_idle}s</div>
+      </div>
+      <div class="pool-config-card">
+        <div class="label">KA Interval</div>
+        <div class="value">${pool.keepalive_interval}s</div>
+      </div>
+      <div class="pool-config-card">
+        <div class="label">KA Probes</div>
+        <div class="value">${pool.keepalive_count}</div>
+      </div>
+    `;
+  }
+
+  function renderConnections(connections) {
+    const container = document.getElementById('connectionsList');
+    const keys = Object.keys(connections);
+
+    if (keys.length === 0) {
+      container.innerHTML = `
+        <div class="conn-empty">
+          <div style="font-size: 32px; margin-bottom: 8px;">~</div>
+          <div><strong>No active connections</strong></div>
+          <div style="margin-top: 4px; font-size: 13px;">
+            Connections appear here when devices are actively being polled and the connection is cached between commands.
+          </div>
+        </div>
+      `;
+      return;
+    }
+
+    container.innerHTML = keys.map(key => {
+      const conn = connections[key];
+      const aliveColor = conn.alive ? '#1a7f37' : '#cf222e';
+      const aliveText = conn.alive ? 'Alive' : 'Stale';
+      return `
+        <div class="conn-card">
+          <div class="conn-card-header">
+            <strong>${escapeHtml(key)}</strong>
+            <span class="status-badge ${conn.alive ? 'status-ok' : 'status-error'}">${aliveText}</span>
+          </div>
+          <div class="conn-card-grid">
+            <div class="conn-stat">
+              <div class="label">Host</div>
+              <div class="value">${escapeHtml(conn.host)}</div>
+            </div>
+            <div class="conn-stat">
+              <div class="label">Port</div>
+              <div class="value">${conn.port}</div>
+            </div>
+            <div class="conn-stat">
+              <div class="label">Age</div>
+              <div class="value">${formatSeconds(conn.age_seconds)}</div>
+            </div>
+            <div class="conn-stat">
+              <div class="label">Idle</div>
+              <div class="value">${formatSeconds(conn.idle_seconds)}</div>
+            </div>
+          </div>
+        </div>
+      `;
+    }).join('');
+  }
+
+  function formatSeconds(s) {
+    if (s < 60) return Math.round(s) + 's';
+    if (s < 3600) return Math.floor(s / 60) + 'm ' + Math.round(s % 60) + 's';
+    return Math.floor(s / 3600) + 'h ' + Math.floor((s % 3600) / 60) + 'm';
+  }
+
+  async function flushConnections() {
+    if (!confirm('Close all cached TCP connections?\n\nDevices will reconnect on the next poll cycle.')) {
+      return;
+    }
+
+    try {
+      const res = await fetch('/api/nl43/_connections/flush', { method: 'POST' });
+      const data = await res.json();
+
+      if (!res.ok) {
+        showToast(data.detail || 'Failed to flush connections', 'error');
+        return;
+      }
+
+      showToast('All connections flushed', 'success');
+      await loadConnections();
+    } catch (err) {
+      showToast('Error flushing connections: ' + err.message, 'error');
+    }
+  }
 </script>
 </body>
 </html>
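The keepalive parameters surfaced in the pool cards (KA Idle / KA Interval / KA Probes) map onto OS-level TCP socket options. A minimal sketch of how a pooled connection might arm them — the helper name is illustrative; the option constants are the standard Python `socket` module names, and the 15s/10s/3 defaults match the changelog:

```python
import socket

def arm_keepalive(sock: socket.socket, idle: int = 15, interval: int = 10, probes: int = 3) -> None:
    """Enable OS-level TCP keepalive: keeps cellular NAT mappings warm, detects dead peers."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Linux-specific tuning knobs; guarded because they are absent on some platforms
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)      # probe after N seconds idle
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)  # seconds between probes
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)      # failed probes before dead

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
arm_keepalive(s)
print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE))  # nonzero when keepalive is enabled
s.close()
```

With these defaults, a silently dropped link is declared dead roughly 15 + 3 × 10 = 45 seconds after the last traffic, which is when a pooled connection would be evicted and rebuilt.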
(deleted file, 128 lines)
@@ -1,128 +0,0 @@
#!/usr/bin/env python3
"""
Test script to verify that sleep mode is automatically disabled when:
1. Device configuration is created/updated with TCP enabled
2. Measurements are started

This script tests the API endpoints, not the actual device communication.
"""

import requests
import json

BASE_URL = "http://localhost:8100/api/nl43"
UNIT_ID = "test-nl43-001"


def test_config_update():
    """Test that config update works (actual sleep mode disable requires real device)"""
    print("\n=== Testing Config Update ===")

    # Create/update a device config
    config_data = {
        "host": "192.168.1.100",
        "tcp_port": 2255,
        "tcp_enabled": True,
        "ftp_enabled": False,
        "ftp_username": "admin",
        "ftp_password": "password"
    }

    print(f"Updating config for {UNIT_ID}...")
    response = requests.put(f"{BASE_URL}/{UNIT_ID}/config", json=config_data)

    if response.status_code == 200:
        print("✓ Config updated successfully")
        print(f"Response: {json.dumps(response.json(), indent=2)}")
        print("\nNote: Sleep mode disable was attempted (will succeed if device is reachable)")
        return True
    else:
        print(f"✗ Config update failed: {response.status_code}")
        print(f"Error: {response.text}")
        return False


def test_get_config():
    """Test retrieving the config"""
    print("\n=== Testing Get Config ===")

    response = requests.get(f"{BASE_URL}/{UNIT_ID}/config")

    if response.status_code == 200:
        print("✓ Config retrieved successfully")
        print(f"Response: {json.dumps(response.json(), indent=2)}")
        return True
    elif response.status_code == 404:
        print("✗ Config not found (create one first)")
        return False
    else:
        print(f"✗ Request failed: {response.status_code}")
        print(f"Error: {response.text}")
        return False


def test_start_measurement():
    """Test that start measurement attempts to disable sleep mode"""
    print("\n=== Testing Start Measurement ===")

    print(f"Attempting to start measurement on {UNIT_ID}...")
    response = requests.post(f"{BASE_URL}/{UNIT_ID}/start")

    if response.status_code == 200:
        print("✓ Start command accepted")
        print(f"Response: {json.dumps(response.json(), indent=2)}")
        print("\nNote: Sleep mode was disabled before starting measurement")
        return True
    elif response.status_code == 404:
        print("✗ Device config not found (create config first)")
        return False
    elif response.status_code == 502:
        print("✗ Device not reachable (expected if no physical device)")
        print(f"Response: {response.text}")
        print("\nNote: This is expected behavior when testing without a physical device")
        return True  # This is actually success - the endpoint tried to communicate
    else:
        print(f"✗ Request failed: {response.status_code}")
        print(f"Error: {response.text}")
        return False


def main():
    print("=" * 60)
    print("Sleep Mode Auto-Disable Test")
    print("=" * 60)
    print("\nThis test verifies that sleep mode is automatically disabled")
    print("when device configs are updated or measurements are started.")
    print("\nNote: Without a physical device, some operations will fail at")
    print("the device communication level, but the API logic will execute.")

    # Run tests
    results = []

    # Test 1: Update config (should attempt to disable sleep mode)
    results.append(("Config Update", test_config_update()))

    # Test 2: Get config
    results.append(("Get Config", test_get_config()))

    # Test 3: Start measurement (should attempt to disable sleep mode)
    results.append(("Start Measurement", test_start_measurement()))

    # Summary
    print("\n" + "=" * 60)
    print("Test Summary")
    print("=" * 60)

    for test_name, result in results:
        status = "✓ PASS" if result else "✗ FAIL"
        print(f"{status}: {test_name}")

    print("\n" + "=" * 60)
    print("Implementation Details:")
    print("=" * 60)
    print("1. Config endpoint is now async and calls ensure_sleep_mode_disabled()")
    print("   when TCP is enabled")
    print("2. Start measurement endpoint calls ensure_sleep_mode_disabled()")
    print("   before starting the measurement")
    print("3. Sleep mode check is non-blocking - config/start will succeed")
    print("   even if the device is unreachable")
    print("=" * 60)


if __name__ == "__main__":
    main()