doc: §2, §10, Appendix C | **MILESTONE — Link-layer grammar formally confirmed.**

This commit is contained in:
serversdwn
2026-03-04 17:42:15 -05:00
parent a684d3e642
commit fa9873cf4a

View File

@@ -47,6 +47,7 @@
| 2026-03-02 | Appendix A | **UPDATED:** New capture architecture: two flat raw wire dumps per session (`raw_s3.bin`, `raw_bw.bin`), one per direction, no record wrapper. Replaces structured `.bin` format for parser input. |
| 2026-03-02 | Appendix A | **PARSER:** Deterministic DLE state machine implemented (`s3_parser.py`). Three states: `IDLE → IN_FRAME → AFTER_DLE`. Replaces heuristic global scanning. Properly handles DLE stuffing (`10 10` → literal `10`). Only complete STX→ETX pairs counted as frames. |
| 2026-03-02 | Appendix A | **VALIDATED:** `raw_bw.bin` yields 7 complete frames via state machine. `raw_s3.bin` contains large structured responses (first frame payload ~3922 bytes). Both files confirmed lossless. BW bare `0x02` pattern confirmed as asymmetric framing (BW sends bare STX, S3 sends DLE+STX). |
| 2026-03-03 | §2, §10, Appendix C | **MILESTONE — Link-layer grammar formally confirmed.** BW start marker is `41 02` (ACK+STX as a unit — bare `02` alone is not sufficient). BW frame boundary is structural sequence `03 41 02`. ETX lookahead: bare `03` only accepted as ETX when followed by `41 02` or at EOF. Checksum confirmed split: small frames use SUM8, large frames use unknown algorithm. Checksum is semantic, not a framing primitive. `s3_parser.py` v0.2.2 implements dual `--mode s3/bw`. |
---
@@ -71,22 +72,29 @@
The two sides of the connection use **fully asymmetric framing**. DLE stuffing applies on both sides.
| Direction | STX (frame start) | ETX (frame end) | Stuffing | Notes |
| Direction | Start marker | End marker | Stuffing | Notes |
|---|---|---|---|---|
| S3 → BW (device) | `0x10 0x02` (DLE+STX) | `0x10 0x03` (DLE+ETX) | `0x10``0x10 0x10` | Full DLE framing |
| BW → S3 (Blastware) | `0x02` (bare STX) | `0x03` (bare ETX) | `0x10``0x10 0x10` | Bare delimiters, DLE stuffing only |
| S3 → BW (device) | `10 02` (DLE+STX) | `10 03` (DLE+ETX) | `10``10 10` | Full DLE framing |
| BW → S3 (Blastware) | `41 02` (ACK+STX) | `03` (bare ETX) | `10``10 10` | ACK is part of start marker |
**BW start marker:** `41 02` is treated as a single two-byte start signature. Bare `02` alone is **not** sufficient to start a BW frame — it must be preceded by `41` (ACK). This prevents false triggering on `02` bytes inside payload data.
**BW ETX rule:** Bare `03` is accepted as frame end **only** when followed by `41 02` (next frame's ACK+STX) or at EOF. The structural boundary pattern is:
```
... [payload] 03 41 02 [next payload] ...
```
In-payload `03` bytes are preserved as data when not followed by `41 02`.
**Evidence:**
- 91/98 BW frames validate checksum when parsed with bare `0x03` as ETX
- All `10 03` sequences in `raw_bw.bin` are in-payload data — none are followed by `41 02` (next frame start)
- `10 03` appearing in BW payload is always `10 10 03` origin (stuffed DLE + literal `03`) — the S3 device correctly parses this via its own state machine without false ETX detection
- S3 captures consistently terminate with `10 03` confirmed via HxD
- 98/98 BW frames extracted from `raw_bw.bin` using `41 02` start + `03 41 02` structural boundary
- 91/98 small BW frames validate SUM8 checksum; 7 large config/write frames do not match any known checksum algorithm
- All `10 03` sequences in `raw_bw.bin` confirmed as in-payload data (none followed by `41 02`)
- `s3_parser.py v0.2.2` implements both modes; BW ETX lookahead confirmed working
**Practical impact for parsers:**
- Parser on `raw_s3.bin`: trigger on `10 02`, terminate on `10 03`
- Parser on `raw_bw.bin`: trigger on bare `02`, terminate on bare `03`
- Both parsers must handle `10 10` → literal `10` unstuffing
- ETX detection must be state-machine-aware (not raw byte search) to avoid false matches on stuffed sequences
**Checksum is NOT a framing primitive:**
- Small frames (e.g. keepalive SUB `5B`): SUM8 validates consistently
- Large frames (e.g. SUB `71` config writes): checksum algorithm unknown — does not match SUM8, CRC-16/IBM, CRC-16/CCITT-FALSE, or CRC-16/X25
- Frame boundaries are determined structurally; checksum validation is a semantic-layer concern only
### Frame Structure by Direction
@@ -658,24 +666,27 @@ ESCAPE:
### Parser State Machine — BW→S3 direction (Blastware commands)
Trigger on bare STX, terminate on bare ETX. DLE only appears in stuffing context.
Trigger on `41 02` (ACK+STX as a unit). ETX accepted only when followed by `41 02` or at EOF.
```
IDLE:
receive 0x41 → emit ACK event, stay IDLE
receive 0x02 → frame started, goto IN_FRAME
receive anything → discard, stay IDLE
receive 0x41 + next==0x02 → frame started (consume both), goto IN_FRAME
receive anything → discard, stay IDLE
IN_FRAME:
receive 0x10 → goto ESCAPE
receive 0x03 → frame complete — validate checksum, process buffer, goto IDLE
receive any byte → append to buffer, stay IN_FRAME
receive 0x10 → goto ESCAPE
receive 0x03 + lookahead==0x41 0x02, or EOF
→ frame complete — validate checksum, process buffer, goto IDLE
receive 0x03 (no lookahead) → append to buffer (in-payload 03), stay IN_FRAME
receive any byte → append to buffer, stay IN_FRAME
ESCAPE:
receive 0x10 → append single 0x10 to buffer, goto IN_FRAME (stuffed literal)
receive anything → append DLE + byte to buffer (recovery), goto IN_FRAME
receive 0x10 → append single 0x10 to buffer, goto IN_FRAME (stuffed literal)
receive anything → append DLE + byte to buffer (recovery), goto IN_FRAME
```
**Architectural note:** Checksum validation is optional and informational only. Frame boundaries are determined structurally via the `03 41 02` sequence — never by checksum gating.
---
## 11. Checksum Reference Implementation
@@ -829,10 +840,10 @@ As of `s3_bridge v0.5.0`, captures are produced as **two flat raw wire dump file
Every byte on the wire is written verbatim — no modification, no record headers, no timestamps. `0x10 0x03` (DLE+ETX) is preserved intact.
**Practical impact for parsing:**
- `raw_s3.bin`: trigger on `0x10 0x02`, terminate on `0x10 0x03` (DLE+ETX)
- `raw_bw.bin`: trigger on bare `0x02`, terminate on bare `0x03`
- Both: handle `0x10 0x10` → literal `0x10` unstuffing
- ETX detection must be state-machine-aware on both sides to avoid false matches on stuffed sequences
- `raw_s3.bin`: trigger on `10 02`, terminate on `10 03` (state-machine-aware)
- `raw_bw.bin`: trigger on `41 02` (ACK+STX as a unit), terminate on `03` only when followed by `41 02` or at EOF
- Both: handle `10 10` → literal `10` unstuffing
- Use `s3_parser.py --mode s3` and `--mode bw` respectively
---
@@ -840,6 +851,7 @@ Every byte on the wire is written verbatim — no modification, no record header
| Question | Priority | Added | Notes |
|---|---|---|---|
| **Large BW frame checksum algorithm** — Small frames (SUB `5B` keepalive etc.) validate with SUM8. Large config/write frames (SUB `71`, `68`, `69` etc.) do not match SUM8, CRC-16/IBM, CRC-16/CCITT-FALSE, or CRC-16/X25 in either endianness. Unknown whether it covers full payload or excludes header bytes, or whether different SUB types use different algorithms. | MEDIUM | 2026-03-03 | NEW |
| Byte at timestamp offset 3 — hours, minutes, or padding? | MEDIUM | 2026-02-26 | |
| `trail[0]` in serial number response — unit-specific byte, derivation unknown. `trail[1]` resolved as firmware minor version. | MEDIUM | 2026-02-26 | |
| Full channel ID mapping in SUB `5A` stream (01/02/03/04 → which sensor?) | MEDIUM | 2026-02-26 | |
@@ -920,7 +932,7 @@ As of 2026-03-02 the capture pipeline produces two flat raw wire dump files per
No record headers, no timestamps, no framing logic applied by the dumper. Files are flat concatenations of `serial.read()` chunks. Frame boundaries must be recovered by the parser.
### C.3 Parser Design — DLE State Machine
### C.3 Parser Design — Dual-Mode State Machine (`s3_parser.py v0.2.2`)
A deterministic state machine replaces all prior heuristic scanning.
@@ -948,14 +960,15 @@ STATE_AFTER_DLE — last byte was 0x10, awaiting qualifier
**BW→S3 parser states:**
| Current State | Byte | Action | Next State |
| Current State | Condition | Action | Next State |
|---|---|---|---|
| IDLE | `02` | Begin new frame | IN_FRAME |
| IDLE | byte==`41` AND next==`02` | Begin new frame (consume both) | IN_FRAME |
| IDLE | any | Discard | IDLE |
| IN_FRAME | `03` | Frame complete, emit | IDLE |
| IN_FRAME | `10` | — | AFTER_DLE |
| IN_FRAME | byte==`03` AND (next two==`41 02` OR at EOF) | Frame complete, emit | IDLE |
| IN_FRAME | byte==`03` (no lookahead match) | Append `03` to payload | IN_FRAME |
| IN_FRAME | byte==`10` | — | AFTER_DLE |
| IN_FRAME | other | Append to payload | IN_FRAME |
| AFTER_DLE | `10` | Append literal `0x10` | IN_FRAME |
| AFTER_DLE | byte==`10` | Append literal `10` | IN_FRAME |
| AFTER_DLE | other | Append DLE + byte (recovery) | IN_FRAME |
**Properties:**
@@ -967,9 +980,10 @@ STATE_AFTER_DLE — last byte was 0x10, awaiting qualifier
### C.4 Observed Traffic (Validation Captures)
**`raw_bw.bin`** (Blastware → S3):
- 98 complete frames via state machine (bare STX + bare ETX mode)
- 91/98 checksums validate; 7 failures are large frames containing in-payload `10 03` sequences that a naive scanner misreads as ETX
- Bare `0x02` STX and bare `0x03` ETX confirmed; DLE used for stuffing only
- 98 complete frames via `41 02` start + `03 41 02` structural boundary detection
- 91/98 small frames validate SUM8 checksum; 7 large config/write frames fail all known checksum algorithms
- `41 02` confirmed as two-byte start signature; bare `02` alone is insufficient
- Bare `03` ETX confirmed; in-payload `03` bytes correctly preserved via lookahead rule
- Contains project metadata strings: `"Standard Recording Setup.set"`, `"Claude test2"`, `"Location #1 - Brians House"`
**`raw_s3.bin`** (S3 → Blastware):
@@ -983,7 +997,10 @@ STATE_AFTER_DLE — last byte was 0x10, awaiting qualifier
1. **Global byte counting ≠ frame counting.** `0x10 0x02` appears inside payloads. Only state machine transitions produce valid frame boundaries.
2. **STX count ≠ frame count.** Only STX→ETX pairs within proper state transitions count.
3. **EOF mid-frame is normal.** Capture termination during active traffic produces an incomplete trailing frame. Not an error.
4. **Layer separation.** The parser extracts frames only. Decoding block IDs, validating checksums, and interpreting semantics are responsibilities of a separate protocol decoder layer above it.
4. **Start marker must be the full signature.** In BW mode, `41 02` is the start marker — not bare `02`. Bare `02` appears in payload data and would cause phantom frames.
5. **ETX lookahead prevents false termination.** In BW mode, `03` is only a frame terminator when followed by `41 02` or at EOF. In-payload `03` bytes are common in large config frames.
6. **Framing is structural. Checksum is semantic.** Frame boundaries are determined by grammar patterns — never by checksum validation. Checksum belongs to the protocol decoder layer, not the framing layer.
7. **Layer separation.** The parser extracts frames only. Decoding block IDs, validating checksums, and interpreting semantics are responsibilities of a separate protocol decoder layer above it.
### C.6 Parser Layer Architecture