diff --git a/parsers/README_s3_parser.md b/parsers/README_s3_parser.md
new file mode 100644
index 0000000..7694586
--- /dev/null
+++ b/parsers/README_s3_parser.md
@@ -0,0 +1,125 @@
+# s3_parser.py
+
+## Purpose
+
+`s3_parser.py` extracts complete DLE-framed packets from raw serial
+capture files produced by the `s3_bridge` logger.
+
+It operates strictly at the **framing layer**. It does **not** decode
+higher-level protocol structures.
+
+This parser is designed specifically for Instantel / Series 3--style
+serial traffic using:
+
+- `DLE STX` (`0x10 0x02`) to start a frame
+- `DLE ETX` (`0x10 0x03`) to end a frame
+- DLE byte stuffing (`0x10 0x10` → literal `0x10`)
+
+------------------------------------------------------------------------
+
+## Design Philosophy
+
+This parser:
+
+- Uses a deterministic state machine (no regex, no global scanning).
+- Assumes raw wire framing is preserved (`DLE+ETX` is present).
+- Does **not** attempt auto-detection of framing style.
+- Extracts only complete `STX → ETX` frame pairs.
+- Safely ignores incomplete trailing frames at EOF.
+
+Separation of concerns is intentional:
+
+- **Parser = framing extraction**
+- **Decoder = protocol interpretation (future layer)**
+
+Do not add message-level logic here.
+
+------------------------------------------------------------------------
+
+## Input
+
+Raw binary `.bin` files captured from:
+
+- `--raw-bw` tap (Blastware → S3)
+- `--raw-s3` tap (S3 → Blastware)
+
+These must preserve raw serial bytes.
+
+------------------------------------------------------------------------
+
+## Usage
+
+Basic frame extraction:
+
+``` bash
+python s3_parser.py raw_s3.bin --trailer-len 2
+```
+
+Options:
+
+- `--trailer-len N`
+    - Number of bytes to capture after `DLE ETX`
+    - Often `2` (CRC16)
+- `--crc`
+    - Attempts CRC16 validation against first 2 trailer bytes
+    - Tries several common CRC16 variants
+- `--crc-endian {little|big}`
+    - Endianness for interpreting trailer bytes (default: little)
+- `--out frames.jsonl`
+    - Writes full JSONL output instead of printing summary
+
+------------------------------------------------------------------------
+
+## Output Format
+
+Each extracted frame produces:
+
+``` json
+{
+  "index": 0,
+  "start_offset": 20,
+  "end_offset": 4033,
+  "payload_len": 3922,
+  "payload_hex": "...",
+  "trailer_hex": "000f",
+  "crc_match": null
+}
+```
+
+Where:
+
+- `payload_hex` = unescaped payload bytes (DLE stuffing removed)
+- `trailer_hex` = bytes immediately following `DLE ETX`
+- `crc_match` = matched CRC algorithm (if `--crc` enabled)
+
+------------------------------------------------------------------------
+
+## Known Behavior
+
+- Frames that start but never receive a matching `DLE ETX` before EOF
+  are discarded.
+- Embedded `0x10 0x02` inside payload does not trigger a new frame
+  (correct behavior).
+- Embedded `0x10 0x10` is correctly unescaped to a single `0x10`.
+
+------------------------------------------------------------------------
+
+## What This Parser Does NOT Do
+
+- It does not decode Instantel message structure.
+- It does not interpret block IDs or message types.
+- It does not validate protocol-level fields.
+- It does not reconstruct multi-frame logical responses.
+
+That is the responsibility of a higher-level decoder.
+
+------------------------------------------------------------------------
+
+## Status
+
+Framing layer verified against:
+
+- raw_bw.bin (command/control direction)
+- raw_s3.bin (device response direction)
+
+State machine validated via start/end instrumentation.
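
For review context, the framing rules the README describes (`DLE STX` opens a frame, `DLE ETX` closes it, `0x10 0x10` unescapes to one `0x10`, and unterminated frames are dropped at EOF) can be sketched as a small state machine. This is an illustrative sketch only, not the code in `s3_parser.py`: the function name `extract_frames`, the returned tuple layout, and the choice to report the end offset as the position just past `DLE ETX` are all assumptions. Keeping a `DLE` followed by any other byte (including `STX`) as two literal payload bytes is also an assumption, made to match the stated behavior that an embedded `0x10 0x02` does not start a new frame.

```python
from typing import List, Tuple

DLE, STX, ETX = 0x10, 0x02, 0x03

# Hypothetical result layout: (start_offset, end_offset, payload, trailer)
Frame = Tuple[int, int, bytes, bytes]


def extract_frames(data: bytes, trailer_len: int = 0) -> List[Frame]:
    """Extract complete DLE STX ... DLE ETX frames from raw bytes."""
    frames: List[Frame] = []
    i, n = 0, len(data)
    in_frame = False
    start = 0
    payload = bytearray()
    while i < n:
        b = data[i]
        if not in_frame:
            # Idle state: only DLE STX matters, everything else is skipped.
            if b == DLE and i + 1 < n and data[i + 1] == STX:
                in_frame, start, payload = True, i, bytearray()
                i += 2
            else:
                i += 1
            continue
        if b != DLE:
            payload.append(b)
            i += 1
            continue
        if i + 1 >= n:
            break  # lone DLE at EOF: the frame is incomplete, so drop it
        nxt = data[i + 1]
        if nxt == ETX:
            end = i + 2  # assumption: end offset is just past DLE ETX
            trailer = data[end:end + trailer_len]
            frames.append((start, end, bytes(payload), trailer))
            in_frame = False
            i = end + len(trailer)
            continue
        if nxt == DLE:
            payload.append(DLE)  # stuffed DLE becomes one literal 0x10
        else:
            # Assumption: any other pair (including DLE STX) stays in the
            # payload as two literal bytes and does not restart the frame.
            payload.extend((DLE, nxt))
        i += 2
    # A frame still open here never saw DLE ETX and is discarded.
    return frames


sample = bytes([0x10, 0x02, 0x41, 0x10, 0x10, 0x42, 0x10, 0x03, 0xAA, 0xBB])
for start, end, payload, trailer in extract_frames(sample, trailer_len=2):
    print(start, end, payload.hex(), trailer.hex())  # 0 8 411042 aabb
```

A single forward pass with an explicit in-frame flag keeps the behavior deterministic, which is the property the README asks for when it rules out regex and global scanning.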