# NL-43 + RX55 TCP “Wedge” Investigation (2255 Refusal) — Full Log & Next Steps **Last updated:** 2026-02-18 **Owner:** Brian / serversdown **Context:** Terra-View / SLMM / field-deployed Rion NL-43 behind Sierra Wireless RX55 --- ## 0) What this document is This is a **comprehensive, chronological** record of the debugging we did to isolate a failure where the **NL-43’s TCP control port (2255) eventually stops accepting connections** (“wedges”), while other services (notably FTP/21) remain reachable. This is written to be fed back into future troubleshooting, so it intentionally includes the **full reasoning chain, experiments, commands, packet evidence, and conclusions**. --- ## 1) Architecture (as tested) ### Network path - **Server (SLMM host):** `10.0.0.40` - **RX55 WAN IP:** `63.45.161.30` - **RX55 LAN subnet:** `192.168.1.0/24` - **RX55 LAN gateway:** `192.168.1.1` - **NL-43 LAN IP:** `192.168.1.10` (confirmed via ARP OUI + ping; see LAN validation) ### RX55 details - **Sierra Wireless RX55** - **OS:** 5.2 - **Firmware:** `01.14.24.00` - **Carrier:** Verizon LTE (Band 66) ### Port forwarding rules (RX55) - **WAN:2255 → NL-43:2255** (NL-43 TCP control) - **WAN:21 → NL-43:21** (NL-43 FTP control) You also experimented with additional forwards: - **WAN:2253 → NL-43:2255** (test) - **WAN:2253 → NL-43:2253** (test) - **WAN:4450 → NL-43:4450** (test) **Important:** Rule “Input zone / interface” was set to **WAN-NAT**, and Source IP left as **Any IPv4**. This is correct for inbound port-forward behavior on Sierra OS 5.x. --- ## 2) Original problem statement (the “wedge”) After running for hours, the NL-43 becomes unreachable over TCP control. ### Symptom signature (WAN-side) - Client attempts to connect to `63.45.161.30:2255` - Instead of timing out, the client gets **connection refused** quickly. - Packet-level: SYN from client → **RST,ACK** back (meaning active refusal vs silent drop) ### Critical operational behavior - **Power cycling the NL-43 fixes it.** - **Power cycling the RX55 does NOT fix it.** - FTP sometimes remains available even while TCP control (2255) is dead. This combination is what forced us to determine whether: - The RX55 is rejecting connections, OR - The NL-43 is no longer listening on 2255, OR - Something about the RX55 path triggers the NL-43’s control listener to die. --- ## 3) Event timeline evidence (SLMM logs) A concrete wedge window was observed on **2026-02-18**: - 10:55:46 AM — Poll success (Start) - 11:00:28 AM — Measurement STOPPED (scheduled stop/download cycle succeeded) - 11:55:50 AM — Poll success (Stop) - 12:55:55 PM — Poll success (Stop) - **1:55:58 PM — Poll failed (attempt 1/3): Errno 111 (connection refused)** - 2:56:02 PM — Poll failed (attempt 2/3): Errno 111 (connection refused) Key interpretation: - The wedge occurred sometime between **12:55 and 1:55**. - The failure type is **refused**, not timeout. --- ## 4) Early hypotheses (before proof) We considered two main buckets: ### A) NL-43-side failure (most suspicious) - NL-43 TCP control service crashes / exits / unbinds from 2255 - socket leak / accept backlog exhaustion - “single control session allowed” and it gets stuck thinking a session is active - mode/service manager bug (service restart fails after other activities) - firmware bug in TCP daemon ### B) RX55-side failure (possible trigger / less likely once FTP works) - NAT/forwarding table corruption - firewall behavior - helper/ALG interference - MSS/MTU weirdness causing edge-case behavior - session churn behavior causing downstream issues --- ## 5) Key experiments and what they proved ### 5.1) LAN-only stability test (No RX55 path) **Test:** NL-43 tested directly on LAN (no modem path involved). - Ran **24+ hours** - Scheduler start/stop cycles worked - Stress test: **500 commands @ 1/sec** → no failure - Response time trend decreased (not degrading) **Result:** The NL-43 appears stable in a “pure LAN” environment. **Interpretation:** The trigger is likely related to the RX55/WAN environment, connection patterns, or service switching patterns—not just simple uptime. --- ### 5.2) Port-forward behavior: timeout vs refused (RX55 behavior characterization) You observed: - **If a WAN port is NOT forwarded (no rule):** connecting to that port **times out** (silent drop) - **If a WAN port IS forwarded to NL-43 but nothing listens:** it **actively refuses** (RST) Concrete example: - Port **4450** with no rule → timeout - Port **4450 → NL-43:4450** rule created → connection refused **Interpretation:** This confirms the RX55 is actually forwarding packets to the NL-43 when a rule exists. “Refused” is consistent with the NL-43 (or RX55 relay behavior) responding quickly because the packet reached the target. Important nuance: - A “refused” on forwarded ports does **not** automatically prove the NL-43 is the one generating RST, because NAT hides the inside host and the RX55 could reject on behalf of an unreachable target. We needed a LAN-side proof test to close the loop. --- ### 5.3) UDP test confusion (and resolution) You ran: ```bash nc -vzu 63.45.161.30 2255 nc -vz 63.45.161.30 2255 ``` Observed: - UDP: “succeeded” - TCP: “connection refused” Resolution: - UDP has **no handshake**. netcat prints “succeeded” if it doesn’t immediately receive an ICMP unreachable. It does **not** mean a UDP service exists. - TCP refused is meaningful: a RST implies “no listener” or “actively rejected.” **Net effect:** UDP test did not change the diagnosis. --- ### 5.4) Packet capture proof (WAN-side) You captured a Wireshark/tcpdump summary with these key patterns: #### Port 2255 (TCP control) Example: - `10.0.0.40 → 63.45.161.30:2255` SYN - `63.45.161.30 → 10.0.0.40` **RST, ACK** within ~50ms This happened repeatedly. #### Port 2253 (test port) Multiple SYN attempts to 2253 showed **retransmissions and no response**, i.e., **silent drop** (consistent with no rule or not forwarded at that moment). #### Port 21 (FTP) Clean 3-way handshake: - SYN → SYN/ACK → ACK Then: - FTP server banner: `220 Connection Ready` Then: - `530 Not logged in` (because SLMM was sending non-FTP “requests” as an experiment) Session closes cleanly. **Key takeaway from capture:** - TCP transport to NL-43 via RX55 is definitely working (port 21 proves it). - Port 2255 is being actively refused. This strongly suggested “2255 listener is gone,” but still didn’t fully prove whether the refusal was generated internally by NL-43 or by RX55 on behalf of NL-43. --- ## 6) The decisive experiment: LAN-side test while wedged (final proof) Because the RX55 does not offer SSH, the plan was to test from **inside the LAN behind the RX55**. ### 6.1) Physical LAN tap setup Constraint: - NL-43 has only one Ethernet port. Solution: - Insert an unmanaged switch: - RX55 LAN → switch - NL-43 → switch - Windows 10 laptop → switch This creates a shared L2 segment where the laptop can test NL-43 directly. ### 6.2) Windows LAN validation On the Windows laptop: - `ipconfig` showed: - IP: `192.168.1.100` - Gateway: `192.168.1.1` (RX55) - Initial `arp -a` only showed RX55, not NL-43. You then: - pinged likely host addresses and discovered NL-43 responds on **192.168.1.10** - `arp -a` then showed: - `192.168.1.10 → 00-10-50-14-0a-d8` - OUI `00-10-50` recognized as **Rion** (matches NL-43) So LAN identities were confirmed: - RX55: `192.168.1.1` - NL-43: `192.168.1.10` ### 6.3) The LAN port tests (the smoking gun) From Windows: ```powershell Test-NetConnection -ComputerName 192.168.1.10 -Port 2255 Test-NetConnection -ComputerName 192.168.1.10 -Port 21 ``` Results (while the unit was “wedged” from the WAN perspective): - **2255:** `TcpTestSucceeded : False` - **21:** `TcpTestSucceeded : True` **Conclusion (PROVEN):** - The NL-43 is reachable on the LAN - FTP port 21 is alive - **The NL-43 is NOT listening on TCP port 2255** - Therefore the RX55 is not the root cause of the refusal. The WAN refusal is consistent with the NL-43 having no listener on 2255. This is now settled. --- ## 7) What we learned (final conclusions) ### 7.1) RX55 innocence (for this failure mode) The RX55 is not “randomly rejecting” or “breaking TCP” in the way originally feared. It successfully forwards and supports TCP to the NL-43 on port 21, and the LAN-side test proves the 2255 failure exists *even without NAT/WAN involvement*. ### 7.2) NL-43 control listener failure The NL-43’s TCP control service (port 2255) stops listening while: - the device remains alive - the LAN stack remains alive (ping) - FTP remains alive (port 21) This looks like one of: - control daemon crash/exit - service unbind - stuck service state (e.g., “busy” / “session active forever”) - resource leak (sockets/file descriptors) specific to the control service - firmware service manager bug (start/stop of services fails after certain sequences) --- ## 8) Additional constraint discovered: “Web App mode” conflicts You noted an important operational constraint: > Turning on the web app disables other interfaces like TCP and FTP. Meaning the NL-43 appears to have mutually exclusive service/mode behavior (or at least serious conflicts). That matters because: - If any workflow toggles modes (explicitly or implicitly), it could destabilize the service lifecycle. - It reduces the possibility of using “web UI toggle” as an easy remote recovery mechanism **if** it disables the services needed. We have not yet run a controlled long test to determine whether: - mode switching contributes directly to the 2255 listener dying, OR - it happens even in a pure TCP-only mode with no switching. --- ## 9) Immediate operational decision (field tomorrow) Because the device is needed in the field immediately, you chose: - **Old-school manual deployment** - **Manual SD card downloads** - Avoid reliance on 2255/TCP control and remote workflows for now. **Important operational note:** The 2255 listener dying does not necessarily stop the NL-43 from measuring; it primarily breaks remote control/polling. Manual SD workflow sidesteps the entire remote control dependency. --- ## 10) What’s next (future work — when the unit is back) Because long tests can’t be run before tomorrow, the plan is to resume in a few weeks with controlled experiments designed to isolate the trigger and develop an operational mitigation. ### 10.1) Controlled experiment matrix (recommended) Run each test for 24–72 hours, or until wedge occurs, and record: - number of TCP connects - whether connections are persistent - whether FTP is used - whether any mode toggling is performed - time-to-wedge #### Test A — TCP-only (ideal baseline) - TCP control only (2255) - **True persistent connection** (open once, keep forever) - No FTP - No web mode toggling Outcome interpretation: - If stable: connection churn and/or FTP/mode switching is the trigger. - If wedges anyway: pure 2255 daemon leak/bug. #### Test B — TCP with connection churn - Same as A but intentionally reconnect on a schedule (current SLMM behavior) - No FTP Outcome: - If this wedges but A doesn’t: churn is the trigger. #### Test C — FTP activity + TCP - Introduce scheduled FTP sessions (downloads) while using TCP control - Observe whether wedge correlates with FTP use or with post-download periods. Outcome: - If wedge correlates with FTP, suspect internal service lifecycle conflict. #### Test D — Web mode interaction (only if safe/possible) - Evaluate what toggling web mode does to TCP/FTP services. - Determine if any remote-safe “soft reset” exists. --- ## 11) Mitigation options (ranked) ### Option 1 — Make SLMM truly persistent (highest probability of success) If the NL-43 wedges due to session churn or leaked socket states, the best mitigation is: - Open one TCP socket per device - Keep it open indefinitely - Use OS keepalive - Do **not** rotate connections on timers - Reconnect only when the socket actually dies This reduces: - connect/close cycles - NAT edge-case exposure - resource churn inside NL-43 ### Option 2 — Service “soft reset” (if possible without disabling required services) If there exists any way to restart the 2255 service without power cycling: - LAN TCP toggle (if it doesn’t require web mode) - any “restart comms” command (unknown) - any maintenance menu sequence then SLMM could: - detect wedge - trigger soft reset - recover automatically Current constraint: web app mode appears to disable other services, so this may not be viable. ### Option 3 — Hardware watchdog power cycle (industrial but reliable) If this is a firmware bug with no clean workaround: - Add a remotely controlled relay/power switch - On wedge detection, power-cycle NL-43 automatically - Optionally schedule a nightly power cycle to prevent leak accumulation This is “field reality” and often the only long-term move with embedded devices. ### Option 4 — Vendor escalation (Rion) You now have excellent evidence: - LAN-side proof: 2255 dead while 21 alive - WAN packet evidence - clear isolation of RX55 innocence This is strong enough to send to Rion support as a firmware defect report. --- ## 12) Repro “wedge bundle” checklist (for future captures) When the wedge happens again, capture these before power cycling: 1) From server: - `nc -vz 63.45.161.30 2255` (expect refused) - `nc -vz 63.45.161.30 21` (expect success if FTP alive) 2) From LAN side (via switch/laptop): - `Test-NetConnection 192.168.1.10 -Port 2255` - `Test-NetConnection 192.168.1.10 -Port 21` 3) Optional: packet capture around the refused attempt. 4) Record: - last successful poll timestamp - last FTP session timestamp - any scheduled start/stop/download cycles near wedge time - SLMM connection reuse/rotation settings in effect --- ## 13) Final, current-state summary (as of 2026-02-18) - The issue is **NOT** the RX55 rejecting inbound connections. - The NL-43 is **alive**, reachable on LAN, and FTP works. - The NL-43’s **TCP control listener on 2255 stops listening** while the device remains otherwise healthy. - The wedge can occur hours after successful operations. - The unit is needed in the field immediately, so investigation pauses. - Next phase: controlled tests to isolate trigger + implement mitigation (persistent socket or watchdog reset). --- ## 14) Notes / misc observations - The Wireshark trace showed repeated FTP sessions were opened and closed cleanly, but SLMM’s “FTP requests” were not valid FTP (causing `530 Not logged in`). That was part of experimentation, not a normal workflow. - UDP “success” via netcat is not meaningful because UDP has no handshake; it simply indicates no ICMP unreachable was returned. --- **End of document.**