Files
seismo-relay/parsers/README_s3_parser.md
2026-03-03 02:37:43 -05:00

126 lines
3.3 KiB
Markdown

# s3_parser.py
## Purpose
`s3_parser.py` extracts complete DLE-framed packets from raw serial
capture files produced by the `s3_bridge` logger.
It operates strictly at the **framing layer**. It does **not** decode
higher-level protocol structures.
This parser is designed specifically for Instantel / Series 3--style
serial traffic using:
- `DLE STX` (`0x10 0x02`) to start a frame
- `DLE ETX` (`0x10 0x03`) to end a frame
- DLE byte stuffing (`0x10 0x10` → literal `0x10`)
------------------------------------------------------------------------
## Design Philosophy
This parser:
- Uses a deterministic state machine (no regex, no global scanning).
- Assumes raw wire framing is preserved (`DLE+ETX` is present).
- Does **not** attempt auto-detection of framing style.
- Extracts only complete `STX → ETX` frame pairs.
- Safely ignores incomplete trailing frames at EOF.
Separation of concerns is intentional:
- **Parser = framing extraction**
- **Decoder = protocol interpretation (future layer)**
Do not add message-level logic here.
------------------------------------------------------------------------
## Input
Raw binary `.bin` files captured from:
- `--raw-bw` tap (Blastware → S3)
- `--raw-s3` tap (S3 → Blastware)
These must preserve raw serial bytes.
------------------------------------------------------------------------
## Usage
Basic frame extraction:
``` bash
python s3_parser.py raw_s3.bin --trailer-len 2
```
Options:
- `--trailer-len N`
- Number of bytes to capture after `DLE ETX`
- Often `2` (CRC16)
- `--crc`
- Attempts CRC16 validation against first 2 trailer bytes
- Tries several common CRC16 variants
- `--crc-endian {little|big}`
- Endianness for interpreting trailer bytes (default: little)
- `--out frames.jsonl`
- Writes full JSONL output instead of printing summary
------------------------------------------------------------------------
## Output Format
Each extracted frame produces:
``` json
{
"index": 0,
"start_offset": 20,
"end_offset": 4033,
"payload_len": 3922,
"payload_hex": "...",
"trailer_hex": "000f",
"crc_match": null
}
```
Where:
- `payload_hex` = unescaped payload bytes (DLE stuffing removed)
- `trailer_hex` = bytes immediately following `DLE ETX`
- `crc_match` = matched CRC algorithm (if `--crc` enabled)
------------------------------------------------------------------------
## Known Behavior
- Frames that start but never receive a matching `DLE ETX` before EOF
are discarded.
- Embedded `0x10 0x02` inside payload does not trigger a new frame
(correct behavior).
- Embedded `0x10 0x10` is correctly unescaped to a single `0x10`.
------------------------------------------------------------------------
## What This Parser Does NOT Do
- It does not decode Instantel message structure.
- It does not interpret block IDs or message types.
- It does not validate protocol-level fields.
- It does not reconstruct multi-frame logical responses.
That is the responsibility of a higher-level decoder.
------------------------------------------------------------------------
## Status
Framing layer verified against:
- raw_bw.bin (command/control direction)
- raw_s3.bin (device response direction)
State machine validated via start/end instrumentation.