feat: preserve and encode raw 0C record in sidecar extensions for offline analysis

doc(fix): retract raw int16 LE sample-set assumptions.
feat(protocol): implement v0.14.0 SUB 5A protocol rewrite with enhanced chunk handling and new helpers
2026-05-08 21:50:01 +00:00 · 2026-05-08 19:26:25 +00:00 · 2026-05-08 19:11:55 +00:00 · 2026-05-08 19:06:26 +00:00
11 changed files with 718 additions and 45 deletions
@@ -121,6 +121,65 @@ All notable changes to seismo-relay are documented here.
 ---
 ## v0.14.0 — 2026-05-02
 ### Changed (major rewrite)
 - **`read_bulk_waveform_stream` — STRT-bounded chunk walk.**  Replaces the
  earlier `0x0400`-step / `max(key4[2:4], 0x0400)` chunk-counter formula,
  which over-read ~5× past the actual event end into post-event circular-
  buffer garbage.  The new walk:
    1. Probe at `counter = start_offset` (event 1: `0x0000`; event N:
       `cur_key[2:4]`).
    2. Parse `end_offset` from the STRT record at `data[17]` of the probe
       response (`end_key[2:4]` field).
    3. For event 1 only, read the two fixed metadata pages at counter
       `0x1002` and `0x1004` — these contain the global session-start
       compliance setup (Project / Client / User Name / Seis Loc /
       Extended Notes ASCII strings).  Continuation events skip these
       (BW caches them across the session).
    4. Walk sample chunks at **`0x0200` increments (NOT `0x0400`)**, bounded
       by `end_offset` — the loop exits when
       `next_chunk_counter + 0x0200 > end_offset`.
    5. Send the proper TERM frame (see new `bulk_waveform_term_v2()`) with
       `offset_word = end_offset - next_boundary` and
       `params[2:4] = next_boundary BE`.  The TERM response carries the
       partial last chunk + 26-byte file footer.
 - **New helpers:** `bulk_waveform_term_v2(key4, end_offset, last_chunk_counter)`
  and `parse_strt_end_offset(a5_data)` in `minimateplus.framing`.
 - **`stop_after_metadata` / `extra_chunks_after_metadata` kwargs are now
  no-ops** under the v0.14.x walk.  They are retained on the
  `read_bulk_waveform_stream` signature for backward compatibility but log a
  DEBUG line when set.  The old "scan for `b'Project:'` and stop one chunk
  later" workaround is obsolete — the loop is deterministically bounded by
  the STRT-derived `end_offset`.
 - **Project / Client / User Name / Seis Loc string source corrected.**
  These come from the dedicated metadata pages at counter `0x1002` /
  `0x1004`, not from "A5 frame 7" of the sample-chunk stream.  The
  earlier "A5 frame 7" claim was an artifact of the broken `0x0400`-step
  walk where the bad counter formula coincidentally landed sample-chunk
  fi=7 on top of the 0x1002 metadata page.
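The five walk steps above reduce to a small piece of counter arithmetic. The following is a minimal sketch, not the `read_bulk_waveform_stream` implementation. `plan_walk`, `WalkPlan` and the `first_sample_counter` argument are hypothetical names. The event-1 first sample counter of `0x0600` is taken from the 5-1-26 regression capture. `end_offset` is passed in already parsed from STRT (step 2). `next_boundary` is assumed to be the counter at which the sample loop exits.

```python
from dataclasses import dataclass

CHUNK_STEP = 0x0200                  # sample-chunk increment (NOT 0x0400)
METADATA_PAGES = (0x1002, 0x1004)    # fetched for event 1 only


@dataclass
class WalkPlan:
    requests: list          # (kind, counter) pairs in wire order, TERM excluded
    next_boundary: int      # assumed: counter at which the sample loop exits
    term_offset_word: int   # end_offset - next_boundary
    term_boundary_be: bytes # TERM params[2:4]: next_boundary, big-endian


def plan_walk(start_offset: int, end_offset: int, first_sample_counter: int,
              *, first_event: bool) -> WalkPlan:
    """Hypothetical planner for the STRT-bounded 5A walk (sketch only)."""
    requests = [("probe", start_offset)]            # step 1
    if first_event:                                 # step 3
        requests += [("metadata", page) for page in METADATA_PAGES]
    counter = first_sample_counter
    # Step 4: stop once a full 0x0200 chunk would run past end_offset.
    while counter + CHUNK_STEP <= end_offset:
        requests.append(("sample", counter))
        counter += CHUNK_STEP
    # Step 5: the TERM frame covers the residual between counter and end_offset.
    return WalkPlan(
        requests=requests,
        next_boundary=counter,
        term_offset_word=end_offset - counter,
        term_boundary_be=counter.to_bytes(2, "big"),
    )


plan = plan_walk(0x0000, 0x21F2, 0x0600, first_event=True)
print(len(plan.requests) + 1, hex(plan.term_offset_word))  # prints: 17 0x1f2
```

For the 5-1-26 3-sec event (`end_offset = 0x21F2`) this yields the probe, two metadata pages, 13 sample chunks (`0x0600`..`0x1E00`) and a TERM at `0x2000`. That is the same 17-frame sequence the regression test expects.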
 ### Verified
 - Three independent BW MITM captures (4-27-26 + 5-1-26 + 5-4-26) confirm
  the new walk matches BW's behaviour event-for-event.
 - `end_offset` values verified across 3 events: `0x1ABE` (4-27-26 2-sec),
  `0x21F2` (5-1-26 3-sec), `0x417E` (5-1-26 event-2).
 ### Notes
 - Earlier v0.13.0 / v0.13.1 / v0.13.2 entries describe partial steps along
  the way (some of the file builder fixes, filename bugs, etc.) that were
  superseded by the full rewrite.  Treat this v0.14.0 entry as the
  definitive landing point for the corrected SUB 5A protocol.
 ---
 ## v0.14.1 — 2026-05-04
 ### Fixed
@@ -1,4 +1,4 @@
-# seismo-relay  `v0.14.3`
+# seismo-relay  `v0.15.0`
 A ground-up replacement for **Blastware** — Instantel's aging Windows-only
 software for managing MiniMate Plus seismographs.
@@ -14,7 +14,11 @@ over direct RS-232 or cellular modem (Sierra Wireless RV50 / RV55).
 > byte-perfect against Blastware captures across 2-sec, 3-sec, and 10-sec
 > events.** Generated `.G10` / `.AB0` files open cleanly in Blastware with
 > full Event Reports, frequency analysis, and waveform plots.
-> See [CHANGELOG.md](CHANGELOG.md) for full version history.
+> **v0.15.0 (2026-05-07)** adds layered per-event storage (BW binary +
 > raw 5A pickle + HDF5 + `.sfm.json` sidecar), a plot-ready
 > `sfm.plot.v1` JSON shape with server-side ADC-to-physical-units
 > conversion, and a BW-file importer for ingesting externally-produced
 > events.  See [CHANGELOG.md](CHANGELOG.md) for full version history.
 ---
@@ -11,6 +11,7 @@
 | Date | Section | Change |
 |---|---|---|
 | 2026-05-08 | §7.6.1 (RETRACTION) | **❌ RETRACTED — "raw int16 LE 8 bytes/sample-set" body codec was never validated.** The original 4-2-26 confirmation was based on misreading broken-decoder output (full-scale ±32K noise) as evidence the signal had saturated. BW's own 0C peaks for that capture (Tran=0.420 / Vert=3.870 / Long=0.495 in/s) prove the signal was NOT saturated — none of those exceed 13K ADC counts. No event in the project's archive has ever come close to saturation, yet the decoder consistently produces ±32K noise on every event. Conclusion: the body codec is not raw int16 LE; the actual encoding is open. Body byte distribution is heavily skewed (24% `0x00`, 10.5% `0x10`, lots of `10 XX` pairs) — likely a delta encoding with `0x10` as escape, but unverified. Retraction box added at top of §7.6.1; "fully-saturating event" claim removed from channel-identification note. The histogram codec in §7.6.2 IS verified and decoded correctly (different recording mode, 32-byte blocks); use it as a structural hint when reverse-engineering the waveform codec. |
 | 2026-02-26 | Initial | Document created from first hex dump analysis |
 | 2026-02-26 | §2 Frame Structure | **CORRECTED:** Frame uses DLE-STX (`0x10 0x02`) and DLE-ETX (`0x10 0x03`), not bare `0x02`/`0x03`. `0x41` confirmed as ACK not STX. DLE stuffing rule added. |
 | 2026-02-26 | §8 Timestamp | **UPDATED:** Year `0x07CB = 1995` confirmed as MiniMate hardware default date when RTC battery is disconnected. Not an encoding error. Confidence upgraded from ❓ to 🔶. |
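The byte-distribution evidence in the 2026-05-08 row can be re-derived from any extracted waveform body. The following is a minimal exploratory sketch. It assumes the body has already been sliced out of the A5 frames into a `bytes` object. `byte_profile` and `s16le_extreme_fraction` are hypothetical helpers, not part of seismo-relay.

```python
from collections import Counter


def byte_profile(body: bytes, top: int = 6) -> dict:
    """Most common byte values, plus which bytes follow each 0x10."""
    n = len(body)
    if n == 0:
        return {"len": 0, "top": [], "after_10": Counter()}
    counts = Counter(body)
    # If 0x10 is an escape/marker byte, the values that follow it should
    # have a distinctive distribution of their own.
    after_10 = Counter(body[i + 1] for i in range(n - 1) if body[i] == 0x10)
    return {
        "len": n,
        "top": [(value, count / n) for value, count in counts.most_common(top)],
        "after_10": after_10,
    }


def s16le_extreme_fraction(body: bytes, threshold: int = 16384) -> float:
    """Fraction of LE int16 words with |v| >= threshold.

    Under the retracted reading, a quiet event (peaks well under 13K counts)
    should score near 0; uniformly random bytes score about 0.5.
    A trailing odd byte is ignored.
    """
    words = [int.from_bytes(body[i:i + 2], "little", signed=True)
             for i in range(0, len(body) - 1, 2)]
    if not words:
        return 0.0
    return sum(abs(w) >= threshold for w in words) / len(words)
```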
@@ -851,14 +852,59 @@ MicL:  39 64 1D AA  =  0.0000875 psi
 >   strings actually live — NOT in any sample-chunk frame)
 > - **§7.8.8** — multi-event "Download All" sequence
 >
-> The waveform sample encoding (4-channel interleaved s16 LE, 8 bytes per sample-set) described in §7.6.1
-> below is still correct.  Only the frame-indexing claims and metadata-source claims are wrong.
+> The waveform sample encoding described in §7.6.1 below (4-channel interleaved s16 LE, 8 bytes
+> per sample-set) is **NOT actually verified** — see the retraction note at the top of §7.6.1.
 > The frame-indexing claims and metadata-source claims in §7.6 are also wrong; use §7.8.5–§7.8.8.
 **Two distinct formats exist depending on recording mode.  Both confirmed from captures.**
 ---
-#### 7.6.1 Blast / Waveform mode — ✅ CONFIRMED (4-2-26 capture)
+#### 7.6.1 Blast / Waveform mode — ❌ NOT VERIFIED (retracted 2026-05-08)
 > ## ⚠️ RETRACTION (2026-05-08)
 >
 > The "4-channel interleaved s16 LE, 8 bytes per sample-set" claim
 > below was **never actually validated**.  It got into this document
 > because the decoder built around that assumption produced full-scale
 > ±32K counts on every channel of the 4-2-26 capture, and the
 > ±32K-shaped output was misread as "the signal must have saturated."
 >
 > Cross-checking the BW-reported peaks proves the opposite:
 >
 > | Channel | BW PPV (in/s) | Expected ADC counts at 10 in/s full scale (PPV / 10 × 32,768) |
 > |---|---|---|
 > | Tran | 0.420 | **1,376** |
 > | Vert | 3.870 | **12,681** |
 > | Long | 0.495 | **1,622** |
 >
 > None of these are anywhere near ±32K saturation.  No event in the
 > project's archive (across all captures from 1-2-26 onward) has
 > ever come close to saturation either.  Yet the decoder has
 > consistently produced ±32K-shaped noise on every event.  The right
 > conclusion is that the byte-to-sample interpretation has been wrong
 > the whole time, NOT that every event happened to saturate.
 >
 > What's actually known about the body bytes:
 >
 > - The byte distribution is heavily skewed (24% `0x00`, 10.5% `0x10`,
 >   plus high frequencies of `0x01 / 0x04 / 0x0F / 0xF0 / 0xF1`).  Lots
 >   of `10 XX` pairs.  Reading them as LE int16 produces uniform ±32K
 >   noise — the signature of mis-aligned or encoded data.
 > - The CHANGELOG note for v0.14.2 calls the body a "delta-encoded
 >   ADC stream" — that hint plus the byte distribution points toward
 >   a delta encoding with `0x10` as an escape marker, but no decoder
 >   has been worked out yet.
 > - The histogram-mode codec in §7.6.2 IS verified and decoded
 >   correctly (different format: 32-byte blocks with 9× int16 LE
 >   samples + metadata).  The same firmware emits both formats, so
 >   §7.6.2 may share encoding primitives with the waveform codec
 >   and is worth using as a structural hint when reverse-engineering.
 >
 > **Treat the spec below as a starting hypothesis to disprove, not
 > ground truth.**  The frame-layout pieces (STRT location, preamble,
 > chunk header) appear correct; the per-byte sample interpretation
 > is the open question.
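Two pieces of the argument above are easy to make concrete. The first function reproduces the arithmetic behind the retraction table's last column. It assumes a linear ±32,768-count scale where 10 in/s is full scale. The second spells out the retracted interpretation as code, so it can be tested against BW's rendering and then rejected or refined. It assumes the body is already aligned to a sample-set boundary. Both are hypothetical helpers, not seismo-relay APIs.

```python
import struct

COUNTS_AT_FULL_SCALE = 32768   # assumed: signed 16-bit ADC range
FULL_SCALE_IN_S = 10.0         # geophone range used in the retraction table


def expected_counts(ppv_in_s: float, full_scale_in_s: float = FULL_SCALE_IN_S) -> float:
    """ADC counts a linear converter would report for a given PPV."""
    return ppv_in_s / full_scale_in_s * COUNTS_AT_FULL_SCALE


def decode_s16le_hypothesis(body: bytes) -> list:
    """RETRACTED hypothesis: 4 interleaved s16 LE channels, 8 bytes per set.

    Returns (ch0, ch1, ch2, ch3) tuples; trailing bytes that do not fill a
    whole sample-set are ignored.
    """
    usable = len(body) - len(body) % 8
    return list(struct.iter_unpack("<4h", body[:usable]))


for label, ppv in (("Tran", 0.420), ("Vert", 3.870), ("Long", 0.495)):
    print(f"{label}: {expected_counts(ppv):,.0f} counts")
```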
 4-channel interleaved signed 16-bit little-endian, 8 bytes per sample-set:
@@ -923,11 +969,18 @@ Total:     7633B  → 954 naive sample-sets, 948 alignment-corrected
 Only 948 of 9306 sample-sets captured (10%) — `stop_after_metadata=True` terminated
 download after A5[7] was received.
-**Channel identification note:**  The 4-2-26 blast saturated all four geophone channels
-to near-maximum ADC output (~32000–32617 counts).  Channel ordering [Tran, Vert, Long, Mic]
-= [ch0, ch1, ch2, ch3] is the Blastware convention and is consistent with per-channel PPV
-values (Tran=0.420, Vert=3.870, Long=0.495 in/s from 0C record), but cannot be
-independently confirmed from a fully-saturating event alone.
+**Channel identification note:**  Channel ordering [Tran, Vert, Long, Mic] = [ch0, ch1, ch2, ch3]
+is the Blastware convention.  This ordering has not been independently verified end-to-end,
+since no decoder yet produces samples that match BW's own rendering of the same event (see
+the retraction at the top of §7.6.1).  Once the body codec is decoded, the per-channel PPV
+values from the 0C record (Tran=0.420, Vert=3.870, Long=0.495 in/s for the 4-2-26 capture)
 provide the cross-check that pins down channel order.
 > **Historical note:** earlier revisions of this section claimed the 4-2-26 blast had
 > "saturated all four channels to ~32000–32617 counts," citing that as evidence the s16 LE
 > interpretation was correct.  That claim was wrong — the ±32K values were the broken
 > decoder's output, not the actual signal amplitude (which the 0C peaks above show was
 > nowhere near saturation).  Retracted 2026-05-08.
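Once a candidate decoder exists, the channel-order cross-check amounts to trying every assignment of decoded channels to BW's geophone labels. The assignment whose peaks best match the 0C PPVs is kept. The following is a minimal sketch under the same assumed 10 in/s full-scale conversion as the retraction table. `best_channel_order` is hypothetical. MicL is left out because its peak is reported in psi, not in/s. The decoder peaks in the usage line are made-up illustrative values.

```python
from itertools import permutations

COUNTS_PER_IN_S = 32768 / 10.0   # assumed: 10 in/s full scale


def best_channel_order(peak_counts, bw_ppv):
    """Map each BW label (in bw_ppv order) to a decoded channel index.

    peak_counts: peak |ADC count| per decoded channel.
    bw_ppv:      BW's PPV per label from the 0C record, in in/s.
    Returns (order, worst_relative_error); order[k] is the channel for label k.
    """
    best = None
    for order in permutations(range(len(peak_counts)), len(bw_ppv)):
        worst = max(
            abs(peak_counts[ch] / COUNTS_PER_IN_S - ppv) / max(abs(ppv), 1e-12)
            for ch, ppv in zip(order, bw_ppv)
        )
        if best is None or worst < best[1]:
            best = (order, worst)
    return best


# 4-2-26 PPVs in [Tran, Vert, Long] order, against made-up decoder peaks.
order, err = best_channel_order([1622, 1376, 12681], (0.420, 3.870, 0.495))
```

A small worst-case error supports a particular ordering. A large error for every permutation points at the decoder rather than the ordering.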
 ---
@@ -639,7 +639,7 @@ def write_blastware_file(
        strt = b"STRT" + b"\xff\xfe" + key4 + bytes(14) + bytes([rectime & 0xFF])
        probe_skip = 7 + 21
-    log.warning(
+    log.debug(
        "write_blastware_file: strt_pos_stripped=%d  probe_skip=%d  "
        "probe_data_len=%d  strt_hex=%s",
        strt_pos_stripped if strt_pos_stripped >= 0 else -1,
@@ -708,8 +708,8 @@ def write_blastware_file(
            skip = 12   # sample chunks
        contribution = _frame_body_bytes(frame, skip)
-        log.warning("write_blastware_file: fi=%d skip=%d raw_data=%d contribution=%d",
-                    fi, skip, len(frame.data), len(contribution))
+        log.debug("write_blastware_file: fi=%d skip=%d raw_data=%d contribution=%d",
+                  fi, skip, len(frame.data), len(contribution))
        all_bytes.extend(contribution)
    # Terminator contributes its content, which ends with the 26-byte footer.
@@ -717,7 +717,7 @@ def write_blastware_file(
    # one shorter than chunk frames' 5-byte inner header.  Confirmed 2026-04-21.
    if term_frame is not None:
        term_contribution = _frame_body_bytes(term_frame, 11)
-        log.warning(
+        log.debug(
            "write_blastware_file: term_frame data_len=%d  skip=11  "
            "contribution_len=%d  first8=%s",
            len(term_frame.data),
@@ -726,7 +726,7 @@ def write_blastware_file(
        )
        all_bytes.extend(term_contribution)
-    log.warning(
+    log.debug(
        "write_blastware_file: all_bytes total=%d  last28=%s",
        len(all_bytes),
        bytes(all_bytes[-28:]).hex() if len(all_bytes) >= 28 else bytes(all_bytes).hex(),
@@ -760,7 +760,7 @@ def write_blastware_file(
    if footer_pos >= 0:
        body   = bytes(all_bytes[:footer_pos])
        footer = bytes(all_bytes[footer_pos:footer_pos + 26])
-        log.warning(
+        log.debug(
            "write_blastware_file: real 0e 08 footer at all_bytes[%d]; "
            "truncating %d post-footer bytes",
            footer_pos, len(all_bytes) - footer_pos - 26,
@@ -1362,6 +1362,20 @@ def _decode_waveform_record_into(data: bytes, event: Event) -> None:
    Modifies event in-place.
    """
    # ── Always preserve the raw 210 bytes ─────────────────────────────────────
    # The 0C record carries far more than just peaks + project strings:
    # ZC Freq, Time of Peak, Peak Acceleration, Peak Displacement, Vector
    # Sum Time, MicL Time of Peak, and the per-channel sensor self-check
    # results (Test Freq / Ratio / Pass-Fail) all live somewhere in this
    # 210-byte block.  Their byte offsets are not yet mapped — keeping the
    # raw bytes lets us decode those fields offline once we have a paired
    # (raw 0C, BW-report) sample to fit against.  Cheap to keep around
    # (210 bytes per event).
    try:
        event._raw_record = bytes(data[:210])
    except Exception:
        pass
    # ── Record type + format detection ────────────────────────────────────────
    # `record_type` is the user-facing label ("Waveform" for any triggered
    # event regardless of timestamp-header layout).  `fmt` is the internal
@@ -15,6 +15,7 @@ declared in `event_to_sidecar_dict()`.
 from __future__ import annotations
 import base64
 import datetime
 import hashlib
 import json
@@ -135,6 +136,20 @@ def event_to_sidecar_dict(
    captured_at = captured_at or datetime.datetime.utcnow()
    # Stash raw 0C record bytes in `extensions.raw_records` so future
    # field-decoding work (Peak Acceleration, ZC Freq, Time of Peak,
    # sensor self-check results, etc.) can run offline against committed
    # sidecars without a live device.  Cheap (~280 bytes base64) and
    # forward-compatible (older readers ignore unknown extensions keys).
    ext_dict: dict = dict(extensions) if extensions else {}
    raw_0c = getattr(event, "_raw_record", None)
    if raw_0c:
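        # Copy the nested dict: `dict(extensions)` above is a shallow copy, so
        # writing into the caller's own `raw_records` dict would mutate it.
        rr = dict(ext_dict.get("raw_records") or {})
        # Don't clobber a raw_0c that callers explicitly passed in via
        # `extensions=...` (e.g. round-trip preservation in patch_sidecar).
        # Set both keys together so the recorded length always describes the
        # payload actually stored.
        if "waveform_record_b64" not in rr:
            rr["waveform_record_b64"] = base64.b64encode(raw_0c).decode("ascii")
            rr["waveform_record_len"] = len(raw_0c)
        ext_dict["raw_records"] = rr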
    return {
        "schema_version": SCHEMA_VERSION,
        "kind":           SIDECAR_KIND,
@@ -174,7 +189,7 @@ def event_to_sidecar_dict(
            "notes":         "",
        },
-        "extensions": extensions or {},
+        "extensions": ext_dict,
    }
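The `extensions.raw_records` layout above round-trips through plain `base64`. The following is a minimal sketch of both directions, assuming only the two keys shown in the diff. `attach_raw_0c` and `read_raw_0c` are hypothetical names; `sfm.dump_0c` performs the read side inline. The sketch copies the nested `raw_records` dict so a caller's `extensions` argument is never mutated, and it keeps caller-supplied values.

```python
import base64
from typing import Optional


def attach_raw_0c(extensions: Optional[dict], raw_0c: Optional[bytes]) -> dict:
    """Return a new extensions dict carrying the raw 0C record."""
    ext = dict(extensions or {})
    if raw_0c:
        rr = dict(ext.get("raw_records") or {})   # copy: don't touch the caller's dict
        if "waveform_record_b64" not in rr:       # caller-supplied record wins
            rr["waveform_record_b64"] = base64.b64encode(raw_0c).decode("ascii")
            rr["waveform_record_len"] = len(raw_0c)
        ext["raw_records"] = rr
    return ext


def read_raw_0c(sidecar: dict) -> Optional[bytes]:
    """Decode the raw 0C record from a sidecar dict, or None if absent."""
    rr = sidecar.get("extensions", {}).get("raw_records", {})
    b64 = rr.get("waveform_record_b64")
    if not b64:
        return None
    raw = base64.b64decode(b64)
    expected = rr.get("waveform_record_len")
    if expected is not None and expected != len(raw):
        raise ValueError(f"raw 0C length {len(raw)} != recorded {expected}")
    return raw
```

A 210-byte record encodes to exactly 280 base64 characters (210 is a multiple of 3, so there is no padding). This matches the "~280 bytes" estimate in the comment above.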
@@ -111,14 +111,15 @@ def build_5a_frame(offset_word: int, raw_params: bytes) -> bytes:
    verified against this algorithm on 2026-04-02).
    Args:
-        offset_word: 16-bit offset (0x1004 for probe/chunks, 0x005A for term).
-        raw_params:  10 or 11 params bytes (from bulk_waveform_params or
-                     bulk_waveform_term_params). 0x10 bytes in params are
-                     written RAW — NOT DLE-stuffed. Confirmed 2026-04-06 by
-                     comparing wire bytes: BW sends bare `10 04` for chunk 1
-                     (counter=0x1004), not stuffed `10 10 04`. Device reads
-                     params at fixed byte positions; stuffing shifts the bytes
-                     and corrupts the counter, causing device to ignore the frame.
+        offset_word: 16-bit offset.  For probe/chunks/metadata pages this is
+                     `0x1002`.  For the proper TERM frame this is computed by
+                     `bulk_waveform_term_v2()` from the STRT-derived
+                     `end_offset`.
+        raw_params:  10, 11, or 12 params bytes (from `bulk_waveform_params`
+                     for probes/samples, `bulk_waveform_term_v2` for TERM, or
+                     a manually-built 12-byte block for the metadata pages
+                     0x1002 / 0x1004).  See gotcha #3 below — params region
                     uses partial DLE stuffing of 0x10 bytes.
    Returns:
        Complete frame bytes: [ACK][STX][stuffed_section][chk][ETX]
@@ -433,21 +434,26 @@ def bulk_waveform_params(key4: bytes, counter: int, *, is_probe: bool = False) -
 def bulk_waveform_term_params(key4: bytes, counter: int) -> bytes:
    """
-    DEPRECATED 2026-05-01 — see bulk_waveform_term_v2().
-    Build the 10-byte params block for the SUB 5A termination request, OLD layout
-    (used in conjunction with the fixed offset_word=0x005A).  Kept for backward
-    compatibility — produces a tiny ~100-byte device-side terminator response
-    rather than the proper partial-last-chunk + footer payload that BW gets.
+    ⛔ DEPRECATED — DO NOT USE IN NEW CODE.
+    This is the v1 termination params helper, paired with the broken
+    `_BULK_TERM_OFFSET = 0x005A` magic offset_word.  Together they produce a
+    ~100-byte device-side terminator response that does NOT contain the
+    partial-last-chunk waveform tail or the 26-byte file footer.  Files
    reconstructed using this terminator are missing their last ~512 bytes of
    waveform data and have a synthesized footer that disagrees with what BW
    would have written.
-      params[0]   = key4[0]
-      params[1]   = key4[1]
-      params[2]   = (counter >> 8) & 0xFF
-      params[3:]  = zeros
-    Use bulk_waveform_term_v2() for new code — it computes the verified
-    offset_word + params from end_offset (extracted from STRT) and the last
-    chunk counter.
+    **For new code, use `bulk_waveform_term_v2(key4, end_offset, last_chunk_counter)`**
+    which computes the correct offset_word + params from the STRT-derived
+    `end_offset`.  v2 produces wire bytes that match BW exactly across all
+    tested events (4-27-26 / 5-1-26 / 5-4-26 captures).
+    This function is retained ONLY for the defensive fallback path in
+    `read_bulk_waveform_stream()` that triggers when STRT parsing fails or no
+    chunks are fetched (= a malformed event or an unexpected device state).
    The fallback already logs a WARNING when it activates; if you see that
    warning, the bug is upstream — STRT should have been parseable.
    """
    if len(key4) != 4:
        raise ValueError(f"waveform key must be 4 bytes, got {len(key4)}")
@@ -937,7 +937,7 @@ class MiniMateProtocol:
                continue
            chunk = data_rsp.data[11:]
-            log.warning(
+            log.debug(
                "read_compliance_config: frame %s  page=0x%04X  data=%d  cfg_chunk=%d  running_total=%d",
                step_name, data_rsp.page_key, len(data_rsp.data),
                len(chunk), len(config) + len(chunk),
@@ -957,17 +957,18 @@ class MiniMateProtocol:
        except TimeoutError:
            pass
-        log.warning(
+        log.info(
            "read_compliance_config: done — %d cfg bytes total",
            len(config),
        )
-        # Hex dump first 128 bytes for field mapping
-        for row in range(0, min(len(config), 128), 16):
-            row_bytes = bytes(config[row:row + 16])
-            hex_part = ' '.join(f'{b:02x}' for b in row_bytes)
-            asc_part = ''.join(chr(b) if 32 <= b < 127 else '.' for b in row_bytes)
-            log.warning("  cfg[%04x]: %-48s  %s", row, hex_part, asc_part)
+        # Hex dump first 128 bytes — useful only for field-mapping work, not normal operation.
+        if log.isEnabledFor(logging.DEBUG):
+            for row in range(0, min(len(config), 128), 16):
+                row_bytes = bytes(config[row:row + 16])
+                hex_part = ' '.join(f'{b:02x}' for b in row_bytes)
+                asc_part = ''.join(chr(b) if 32 <= b < 127 else '.' for b in row_bytes)
                log.debug("  cfg[%04x]: %-48s  %s", row, hex_part, asc_part)
        return bytes(config)
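The same hex+ASCII row formatting now appears here and in `sfm.dump_0c._hex_dump`. If it were factored out, a shared helper might look like the following sketch. `hex_rows` and `log_hex_dump` are hypothetical names. The `isEnabledFor` guard mirrors the one added above, so the formatting cost is only paid when DEBUG logging is on.

```python
import logging
from typing import Iterator, Optional, Tuple


def hex_rows(data: bytes, width: int = 16,
             limit: Optional[int] = None) -> Iterator[Tuple[int, str, str]]:
    """Yield (offset, hex_part, ascii_part) for each row of a hex dump."""
    end = len(data) if limit is None else min(len(data), limit)
    for off in range(0, end, width):
        row = data[off:min(off + width, end)]
        hex_part = " ".join(f"{b:02x}" for b in row)
        # Printable ASCII is shown as-is, everything else as '.'.
        ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in row)
        yield off, hex_part, ascii_part


def log_hex_dump(log: logging.Logger, data: bytes, limit: int = 128) -> None:
    """Emit a DEBUG hex dump of the first `limit` bytes, if DEBUG is enabled."""
    if not log.isEnabledFor(logging.DEBUG):
        return  # skip all formatting work at INFO and above
    for off, hex_part, ascii_part in hex_rows(data, limit=limit):
        log.debug("  cfg[%04x]: %-48s  %s", off, hex_part, ascii_part)
```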
@@ -0,0 +1,216 @@
 """
 sfm.dump_0c — inspect the raw 210-byte SUB 0C waveform record stored in a
 sidecar JSON's `extensions.raw_records.waveform_record_b64`.
 Usage:
    python -m sfm.dump_0c <sidecar.sfm.json> [<sidecar.sfm.json> ...]
 Prints, for each input:
  - A header summarising the sidecar's metadata-block claims (peaks,
    project, timestamp) — the "what BW says this event measured" view.
  - A 16-byte-wide hex dump of the raw 0C record, annotated with known
    field anchors (STRT, channel labels, project strings).
  - A "candidate float regions" scan that brute-forces every byte
    position as a float32 BE and prints any whose magnitude falls in a
    plausible range (1e-5 to 50.0), plus uint16 BE scans for
    ZC-Freq-like and Time-of-Peak-like values.  Useful for hunting where
    Peak Acceleration / Peak Displacement / ZC Freq / Time of Peak live.
 Pairing the printed candidates with the BW Event Report values lets
 us nail down byte offsets for the missing fields without a live
 device.
 """
 from __future__ import annotations
 import argparse
 import base64
 import json
 import struct
 import sys
 from pathlib import Path
 # ── Annotations for known anchors in a 210-byte 0C record ──────────────────
 # Anchors we look for and label inline in the hex dump.  Each is a needle
 # (bytes to find) and a short label.  Found via .find() — the first
 # occurrence wins.
 _ANCHORS = [
    (b"Tran",            "Tran label  (PPV @ +6, PVS @ -12)"),
    (b"Vert",            "Vert label  (PPV @ +6)"),
    (b"Long",            "Long label  (PPV @ +6)"),
    (b"MicL",            "MicL label  (peak psi @ +6)"),
    (b"Project:",        "Project: label"),
    (b"Client:",         "Client: label"),
    (b"User Name:",      "User Name: label"),
    (b"Seis Loc:",       "Seis Loc: label"),
    (b"Extended Notes",  "Extended Notes label"),
 ]
 def _hex_dump(data: bytes, anchors: dict[int, str]) -> str:
    """Return a 16-byte-wide hex+ASCII dump, with anchor labels printed
    on the line that contains the anchor's start byte."""
    lines = []
    for off in range(0, len(data), 16):
        chunk = data[off : off + 16]
        hex_part   = " ".join(f"{b:02x}" for b in chunk)
        ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        line = f"  {off:04x}  {hex_part:<47}  |{ascii_part}|"
        # If any anchor lands on a byte in this row, append a tag
        tags = [
            f"[{a:#04x}: {label}]"
            for a, label in anchors.items()
            if off <= a < off + 16
        ]
        if tags:
            line += "  " + "  ".join(tags)
        lines.append(line)
    return "\n".join(lines)
 def _scan_float32_be(data: bytes, lo: float, hi: float) -> list[tuple[int, float]]:
    """Brute-force every offset where data[off:off+4] is a float32 BE in
    [lo, hi] by absolute value, so negative values match too."""
    hits = []
    for i in range(len(data) - 3):
        try:
            v = struct.unpack_from(">f", data, i)[0]
        except struct.error:
            continue
        if v != v:                     # NaN
            continue
        if abs(v) < 1e-30 or abs(v) > 1e10:   # crap range
            continue
        a = abs(v)
        if lo <= a <= hi:
            hits.append((i, v))
    return hits
 def _scan_uint16_be(data: bytes, lo: int, hi: int) -> list[tuple[int, int]]:
    """Find every offset where uint16 BE is in [lo, hi]."""
    hits = []
    for i in range(len(data) - 1):
        v = (data[i] << 8) | data[i + 1]
        if lo <= v <= hi:
            hits.append((i, v))
    return hits
 def _summarize_sidecar(side: dict) -> str:
    ev   = side.get("event", {})
    pv   = side.get("peak_values", {})
    pi   = side.get("project_info", {})
    bw   = side.get("blastware", {})
    header = (
        f"  serial:     {ev.get('serial')}\n"
        f"  timestamp:  {ev.get('timestamp')}\n"
        f"  waveform:   {ev.get('waveform_key')}  ({ev.get('record_type')})\n"
        f"  sample_rate:{ev.get('sample_rate')} sps  rectime:{ev.get('rectime_seconds')}s\n"
        f"  bw file:    {bw.get('filename')}  ({bw.get('filesize')} B)\n"
    )
    # Adjacent f-strings concatenate before the conditional is parsed, so a
    # single `A if cond else B` expression would drop the whole header (and
    # print the project twice) whenever a peak is missing.  Build the peaks
    # line separately instead.
    peak_keys = ("transverse", "vertical", "longitudinal", "vector_sum", "mic_psi")
    if all(pv.get(k) is not None for k in peak_keys):
        peaks = (
            f"  peaks:      "
            f"Tran={pv['transverse']:.5f}  "
            f"Vert={pv['vertical']:.5f}  "
            f"Long={pv['longitudinal']:.5f}  "
            f"PVS={pv['vector_sum']:.5f} in/s  "
            f"Mic={pv['mic_psi']:.6e} psi"
        )
    else:
        # Numeric format specs would raise on None; print the raw dict.
        peaks = f"  peaks:      {pv}"
    project = (
        f"\n  project:    {pi.get('project')!r}  / {pi.get('client')!r}  / "
        f"operator={pi.get('operator')!r}  loc={pi.get('sensor_location')!r}"
    )
    return header + peaks + project
 def dump_one(path: Path) -> int:
    side = json.loads(path.read_text(encoding="utf-8"))
    raw_b64 = (
        side.get("extensions", {})
            .get("raw_records", {})
            .get("waveform_record_b64")
    )
    if not raw_b64:
        print(f"\n=== {path} ===")
        print("  ! no extensions.raw_records.waveform_record_b64 — sidecar")
        print("    pre-dates raw-0C persistence (added in v0.15.x).  Re-save")
        print("    the event from the device to capture the bytes.")
        return 1
    raw = base64.b64decode(raw_b64)
    # Build anchor map
    anchors: dict[int, str] = {}
    for needle, label in _ANCHORS:
        i = raw.find(needle)
        if i >= 0:
            anchors[i] = label
    print(f"\n=== {path} ===")
    print("metadata claimed by sidecar:")
    print(_summarize_sidecar(side))
    print(f"\nraw 0C record  ({len(raw)} bytes):")
    print(_hex_dump(raw, anchors))
    # Float32 BE candidates in geo-relevant ranges
    geo_hits = _scan_float32_be(raw, 1e-5, 50.0)
    # Annotate (rather than hide) hits at label+6: those are the per-channel
    # peak floats already documented in _ANCHORS, and any sweep finds them.
    print("\nfloat32 BE candidates (1e-5 .. 50.0):")
    for off, v in geo_hits:
        annotation = ""
        for needle, _ in _ANCHORS[:4]:   # Tran / Vert / Long / MicL labels
            i = raw.find(needle)
            if i >= 0 and off == i + 6:
                what = "peak psi" if needle == b"MicL" else "PPV"
                annotation = f"  ← {needle.decode()} {what} (label+6)"
                break
        # 'g' keeps significant digits for tiny values such as MicL psi.
        print(f"    {off:#04x}  ({off:3d})  {v:>+15.6g}{annotation}")
    print("\nuint16 BE candidates ZC-Freq-ish (5..200):")
    # Values below 5 are too common in the record to be informative.
    for off, v in _scan_uint16_be(raw, 5, 200):
        print(f"    {off:#04x}  ({off:3d})  = {v}")
    print("\nuint16 BE candidates Time-of-Peak-ish if stored as ms (100..30000):")
    # Values below 100 are filtered as noise, and only byte offsets up to 80
    # are printed; scanning the whole record yields too many hits.
    for off, v in _scan_uint16_be(raw, 100, 30000):
        if off > 80:
            break
        print(f"    {off:#04x}  ({off:3d})  = {v} ms ?")
    print()
    return 0
 def main(argv: list[str] | None = None) -> int:
    p = argparse.ArgumentParser(
        description="Inspect a saved 0C waveform record from a sidecar JSON.",
    )
    p.add_argument(
        "sidecars",
        nargs="+",
        type=Path,
        help="Path(s) to <event>.sfm.json sidecar file(s).",
    )
    args = p.parse_args(argv)
    rc = 0
    for path in args.sidecars:
        try:
            rc |= dump_one(path)
        except Exception as exc:
            print(f"\n=== {path} ===\n  ERROR: {exc}", file=sys.stderr)
            rc |= 2
    return rc
 if __name__ == "__main__":
    sys.exit(main())
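The docstring's pairing step (matching scan candidates against BW Event Report values) can be automated once more than one sidecar is available. A field stored at a fixed offset should decode to the BW-reported value in every record. The following is a minimal sketch. It assumes the field is a float32 BE like the PPV floats and that the BW values are transcribed by hand. `consistent_float_offsets` is a hypothetical helper, not part of `sfm.dump_0c`. Targets should be nonzero, since a zero target would match every zero-filled region.

```python
import struct
from typing import Iterable, Set, Tuple


def float32_be_offsets(raw: bytes, target: float, rel_tol: float = 1e-3) -> Set[int]:
    """Offsets whose float32 BE value is within rel_tol of a nonzero target."""
    hits = set()
    for off in range(len(raw) - 3):
        v = struct.unpack_from(">f", raw, off)[0]
        # NaN compares unequal to itself; inf never passes the tolerance test.
        if v == v and abs(v - target) <= rel_tol * abs(target):
            hits.add(off)
    return hits


def consistent_float_offsets(pairs: Iterable[Tuple[bytes, float]],
                             rel_tol: float = 1e-3) -> Set[int]:
    """Offsets that match the BW value in *every* (raw_0c, bw_value) pair."""
    result = None
    for raw, target in pairs:
        hits = float32_be_offsets(raw, target, rel_tol)
        result = hits if result is None else result & hits
    return result or set()
```

Each additional event narrows the candidate set. A decoy value that happens to match in one record rarely survives the intersection.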
@@ -0,0 +1,252 @@
 """
 test_5a_protocol.py — Regression test for the v0.14.x SUB 5A protocol fixes.
 Verifies that SFM's framing helpers reproduce Blastware's exact wire bytes
 for every 5A request frame in the 5-1-26 "bwcap3sec" capture, AND that the
 file builder produces a byte-identical file when fed the BW capture's A5
 responses.
 Together these two checks protect all four v0.14.x fixes:
  v0.14.0 — STRT-bounded chunk walk (probe @ 0, metadata pages @ 0x1002 +
            0x1004, samples @ 0x0600..0x1E00 step 0x0200, TERM at residual)
  v0.14.1 — event-N probe counter is `start_offset`, not `start_offset+0x46`
            (covered by the multi-event captures, not this 3-sec event-1
            capture — but the helpers are the same code path)
  v0.14.2 — file body assembly is contiguous concatenation, no de-duplication
  v0.14.3 — partial DLE stuffing of `0x10` bytes in 5A params (counter=0x1000
            wire bytes are `10 10 00`, not `10 00`)
 If any of these fixes regresses, this test fails immediately with a clear
 byte-level diff.
 Run:
    python -m pytest tests/test_5a_protocol.py -v
 or:
    python tests/test_5a_protocol.py
 """
 from __future__ import annotations
 import os
 import sys
 import pytest
 # Allow running from the project root without installation
 sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
 from minimateplus.framing import (
    S3FrameParser,
    build_5a_frame,
    bulk_waveform_params,
    bulk_waveform_term_v2,
 )
 # ── Capture loading ────────────────────────────────────────────────────────────
 ROOT = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
 # Reference BW MITM capture: BW saving a 3-sec event 0 (start_key=01110000,
 # end_offset=0x21F2). 17 5A frames: probe + 2 metadata pages + 13 samples + TERM.
 BW_TX_PATH = os.path.join(
    ROOT,
    "bridges/captures/5-1-26/comcheck/bwcap3sec/"
    "raw_bw_20260501_165723_copy_3sec_waveform_to_disk.bin",
 )
 BW_S3_PATH = os.path.join(
    ROOT,
    "bridges/captures/5-1-26/comcheck/bwcap3sec/"
    "raw_s3_20260501_165723_copy_3sec_waveform_to_disk.bin",
 )
 # BW's saved Blastware file for the same event (used for file-builder verification).
 BW_SAVED_FILE = os.path.join(
    ROOT, "example-events/decode_test/5-1-26/bw/M529LKIQ.G10",
 )
 def _split_bw_frames(data: bytes) -> list[bytes]:
    """Split BW TX bytes into individual frames (ACK STX … bare ETX)."""
    frames: list[bytes] = []
    i = 0
    while i < len(data):
        if data[i] != 0x41 or i + 1 >= len(data) or data[i + 1] != 0x02:
            i += 1
            continue
        j = i + 2
        while j < len(data):
            if data[j] == 0x03:
                break
            if data[j] == 0x10 and j + 1 < len(data):
                j += 2
                continue
            j += 1
        if j >= len(data):
            break
        frames.append(data[i : j + 1])
        i = j + 1
    return frames
@pytest.fixture(scope="module")
 def bw_5a_frames() -> list[bytes]:
    """All 5A frames from the BW TX capture, in wire order."""
    if not os.path.exists(BW_TX_PATH):
        pytest.skip(f"BW capture not found: {BW_TX_PATH}")
    with open(BW_TX_PATH, "rb") as fh:
        raw = fh.read()
    frames = [
        f for f in _split_bw_frames(raw)
        if len(f) >= 6 and f[5] == 0x5A   # body[3] == 0x5A (SUB)
    ]
    assert len(frames) == 17, f"expected 17 5A frames in capture, got {len(frames)}"
    return frames
@pytest.fixture(scope="module")
 def bw_a5_frames():
    """All A5 (response) frames from the matching S3 capture."""
    if not os.path.exists(BW_S3_PATH):
        pytest.skip(f"BW S3 capture not found: {BW_S3_PATH}")
    with open(BW_S3_PATH, "rb") as fh:
        raw = fh.read()
    p = S3FrameParser()
    p.feed(raw)
    a5 = [f for f in p.frames if f.sub == 0xA5]
    assert len(a5) == 17, f"expected 17 A5 frames in capture, got {len(a5)}"
    return a5
 # ── 5A request frame byte-perfect verification ────────────────────────────────
 KEY4 = bytes.fromhex("01110000")   # start_key for the 3-sec event 0
 END_OFFSET = 0x21F2                # parsed from STRT in the BW capture
 LAST_CHUNK_COUNTER = 0x1E00        # last full 0x0200-byte chunk before TERM
 SAMPLE_COUNTERS = (
    0x0600, 0x0800, 0x0A00, 0x0C00, 0x0E00,
    0x1000, 0x1200, 0x1400, 0x1600, 0x1800,
    0x1A00, 0x1C00, 0x1E00,
 )
 def _meta_params(key: bytes, counter: int) -> bytes:
    """Build the 12-byte metadata-page params block (matches BW for 0x1002 / 0x1004)."""
    return bytes(
        [
            0x00, key[0], key[1],
            (counter >> 8) & 0xFF, counter & 0xFF,
            0, 0, 0, 0, 0, 0, 0,
        ]
    )


def test_probe_frame_byte_perfect(bw_5a_frames):
    """Probe @ counter=0x0000 (frame 0)."""
    sfm = build_5a_frame(0x1002, bulk_waveform_params(KEY4, 0, is_probe=True))
    assert sfm == bw_5a_frames[0], (
        f"\nSFM:    {sfm.hex()}\nBW:     {bw_5a_frames[0].hex()}"
    )


@pytest.mark.parametrize("idx,counter", [(1, 0x1002), (2, 0x1004)])
def test_metadata_page_frames_byte_perfect(bw_5a_frames, idx, counter):
    """Metadata pages @ counter=0x1002 and 0x1004 (frames 1 and 2)."""
    sfm = build_5a_frame(0x1002, _meta_params(KEY4, counter))
    assert sfm == bw_5a_frames[idx], (
        f"\nSFM:    {sfm.hex()}\nBW:     {bw_5a_frames[idx].hex()}"
    )


@pytest.mark.parametrize("i,counter", list(enumerate(SAMPLE_COUNTERS)))
def test_sample_chunk_frames_byte_perfect(bw_5a_frames, i, counter):
    """
    Sample chunks @ counter=0x0600..0x1E00, step 0x0200 (frames 3..15).
    Critically, frame 8 (counter=0x1000) requires the v0.14.3 partial DLE
    stuffing fix: the counter's 0x10 high byte must be DLE-escaped, so the
    wire params carry `10 10 00` for the counter, not `10 00`.
    """
    sfm = build_5a_frame(0x1002, bulk_waveform_params(KEY4, counter))
    bw_idx = 3 + i
    assert sfm == bw_5a_frames[bw_idx], (
        f"\ncounter=0x{counter:04X}"
        f"\nSFM:    {sfm.hex()}"
        f"\nBW:     {bw_5a_frames[bw_idx].hex()}"
    )


def test_term_frame_byte_perfect(bw_5a_frames):
    """TERM frame requesting the residual after the last full chunk (frame 16)."""
    offset_word, params = bulk_waveform_term_v2(KEY4, END_OFFSET, LAST_CHUNK_COUNTER)
    sfm = build_5a_frame(offset_word, params)
    assert sfm == bw_5a_frames[16], (
        f"\nSFM:    {sfm.hex()}\nBW:     {bw_5a_frames[16].hex()}"
    )


def test_strt_end_offset_parsing(bw_a5_frames):
    """The probe response (A5[0]) carries STRT at byte 17 with end_offset=0x21F2."""
    from minimateplus.framing import parse_strt_end_offset
    end_offset = parse_strt_end_offset(bw_a5_frames[0].data)
    assert end_offset == END_OFFSET, (
        f"expected end_offset=0x{END_OFFSET:04X}, got "
        f"{f'0x{end_offset:04X}' if end_offset is not None else 'None'}"
    )


# ── File builder byte-perfect verification ────────────────────────────────────
def test_blastware_file_builder_byte_perfect(bw_a5_frames):
    """
    Feed the BW capture's A5 frames into write_blastware_file() and verify the
    output is byte-identical to BW's saved M529LKIQ.G10 reference file.
    This protects the v0.14.2 strip-removal fix and the file-builder skip
    values (probe=38, meta=13, samples=12, TERM=11).
    """
    if not os.path.exists(BW_SAVED_FILE):
        pytest.skip(f"BW saved file not found: {BW_SAVED_FILE}")
    import tempfile
    from minimateplus.blastware_file import write_blastware_file
    from minimateplus.models import Event
    ev = Event(index=0)
    ev._waveform_key = KEY4
    ev.rectime_seconds = 3
    ev.timestamp = None   # let the builder pull the footer from the TERM frame
    with tempfile.NamedTemporaryFile(suffix=".G10", delete=False) as tf:
        tmp_path = tf.name
    try:
        write_blastware_file(ev, bw_a5_frames, tmp_path)
        with open(tmp_path, "rb") as fh:
            sfm_bytes = fh.read()
    finally:
        os.unlink(tmp_path)
    with open(BW_SAVED_FILE, "rb") as fh:
        bw_bytes = fh.read()
    assert len(sfm_bytes) == len(bw_bytes), (
        f"file size mismatch: SFM={len(sfm_bytes)} BW={len(bw_bytes)}"
    )
    if sfm_bytes != bw_bytes:
        # Find first diff for actionable error message
        for i in range(len(bw_bytes)):
            if bw_bytes[i] != sfm_bytes[i]:
                ctx_start = max(0, i - 8)
                ctx_end = min(len(bw_bytes), i + 16)
                pytest.fail(
                    f"file diverges at byte 0x{i:04X}\n"
                    f"  BW :  {bw_bytes[ctx_start:ctx_end].hex()}\n"
                    f"  SFM:  {sfm_bytes[ctx_start:ctx_end].hex()}\n"
                    f"        {'  ' * (i - ctx_start)}^^"
                )


# ── Standalone runner ─────────────────────────────────────────────────────────
if __name__ == "__main__":
    sys.exit(pytest.main([__file__, "-v"]))
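The constants in this test module encode one concrete STRT-bounded walk: full 0x0200-byte sample chunks from 0x0600 through `LAST_CHUNK_COUNTER`, then a TERM frame for the remainder up to `END_OFFSET`. As a sketch of that arithmetic (not the `read_bulk_waveform_stream` implementation), the hypothetical helpers below recompute those values from `END_OFFSET`. The first sample counter, 0x0600, is taken from this capture rather than derived, and only the two TERM fields the changelog describes (`offset_word` and `params[2:4]`) are modelled.

```python
CHUNK_STEP = 0x0200  # sample-chunk increment per the v0.14.0 changelog


def plan_chunk_walk(first_counter: int, end_offset: int) -> tuple[list[int], int]:
    """Hypothetical helper: list the full-chunk counters of an STRT-bounded walk.

    Returns (counters, next_boundary); next_boundary is the counter the loop
    stopped at, i.e. where the partial last chunk begins.
    """
    counters: list[int] = []
    counter = first_counter
    # Keep requesting full chunks while one more fits below end_offset;
    # the walk exits once counter + CHUNK_STEP > end_offset.
    while counter + CHUNK_STEP <= end_offset:
        counters.append(counter)
        counter += CHUNK_STEP
    return counters, counter


def term_fields(end_offset: int, next_boundary: int) -> tuple[int, bytes]:
    """Hypothetical helper: the two TERM values described in the changelog.

    offset_word = end_offset - next_boundary; params[2:4] carries
    next_boundary big-endian. The rest of the params block is not modelled.
    """
    return end_offset - next_boundary, next_boundary.to_bytes(2, "big")


if __name__ == "__main__":
    counters, boundary = plan_chunk_walk(0x0600, 0x21F2)
    print(len(counters), hex(counters[-1]), hex(boundary))  # 13 0x1e00 0x2000
    offset_word, boundary_be = term_fields(0x21F2, boundary)
    print(hex(offset_word), boundary_be.hex())  # 0x1f2 2000
```

With `END_OFFSET = 0x21F2` the walk yields the 13 counters in `SAMPLE_COUNTERS`, and the stopping boundary 0x2000 equals `LAST_CHUNK_COUNTER + 0x0200`, leaving an `offset_word` of 0x01F2 for the TERM frame.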
@@ -127,6 +127,59 @@ def test_sidecar_write_and_read_round_trip(tmp_path: Path):
    assert loaded["source"]["kind"] == "sfm-ach"


def test_sidecar_persists_raw_0c_record_in_extensions(tmp_path: Path):
    """An Event with _raw_record populated should land its 210 bytes
    base64-encoded in extensions.raw_records.waveform_record_b64, so
    later analysis (e.g. mapping Peak Acceleration / Time of Peak / ZC
    Freq byte offsets) can run offline against the saved sidecar."""
    import base64
    ev, _ = _make_synthetic_event()
    # Synthesize a 210-byte 0C record with embedded label needles so
    # the dump tool's anchor scan has something to find.
    raw = bytearray(210)
    raw[10:14]   = b"Tran"
    raw[60:64]   = b"Vert"
    raw[110:114] = b"Long"
    raw[160:164] = b"MicL"
    ev._raw_record = bytes(raw)
    d = event_file_io.event_to_sidecar_dict(
        ev, serial="BE11529",
        blastware_filename="M529LKIQ.7M0W", blastware_filesize=1024,
        blastware_sha256="x" * 64, source_kind="sfm-live",
    )
    rr = d["extensions"]["raw_records"]
    assert rr["waveform_record_len"] == 210
    decoded = base64.b64decode(rr["waveform_record_b64"])
    assert decoded == ev._raw_record
    # Round-trip through write/read
    path = tmp_path / "raw0c.sfm.json"
    event_file_io.write_sidecar(path, d)
    loaded = event_file_io.read_sidecar(path)
    assert (
        base64.b64decode(loaded["extensions"]["raw_records"]["waveform_record_b64"])
        == ev._raw_record
    )


def test_sidecar_omits_raw_records_when_event_has_no_0c(tmp_path: Path):
    """Events without a _raw_record (e.g. those built by importers that
    never see a 0C record) should NOT get an empty raw_records block, so
    the sidecar stays clean for those flows."""
    ev, _ = _make_synthetic_event()
    assert ev._raw_record is None
    d = event_file_io.event_to_sidecar_dict(
        ev, serial="BE11529",
        blastware_filename="M529LKIQ.7M0W", blastware_filesize=1024,
        blastware_sha256="x" * 64, source_kind="bw-import",
    )
    assert d["extensions"] == {}


def test_sidecar_rejects_unsupported_schema_version(tmp_path: Path):
    path = tmp_path / "future.sfm.json"
    path.write_text(json.dumps({
Author	SHA1	Message	Date
serversdown	a18712442f	feat: preserve and encode raw 0C record in sidecar extensions for offline analysis	2026-05-08 21:50:01 +00:00
serversdown	8aea46b8a0	doc(fix): retracts raw int16 LE sample set assumptions.	2026-05-08 19:26:25 +00:00
serversdown	9123269b1f	feat(protocol): implement v0.14.0 SUB 5A protocol rewrite with enhanced chunk handling and new helpers test: add regression tests for v0.14.x SUB 5A protocol fixes refactor(logging): change warning logs to debug for less verbosity in write_blastware_file	2026-05-08 19:11:55 +00:00
serversdown	9400f59167	doc: update readme to 0.15.0	2026-05-08 19:06:26 +00:00
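Taken together, `test_sidecar_persists_raw_0c_record_in_extensions` and `test_sidecar_omits_raw_records_when_event_has_no_0c` pin down a small contract: an event with a raw 0C record gets `extensions.raw_records` holding the base64 text and its length, and an event without one leaves `extensions` empty. A minimal sketch of that contract, assuming the record is plain `bytes`; `raw_records_extension` and `decode_raw_record` are hypothetical helpers, not the `event_file_io` API, and a JSON round-trip stands in for `write_sidecar` / `read_sidecar`.

```python
import base64
import json
from typing import Optional


def raw_records_extension(raw_record: Optional[bytes]) -> dict:
    """Hypothetical helper: build the extensions block for one event."""
    if raw_record is None:
        # No 0C record seen (e.g. bw-import flows): leave extensions empty.
        return {}
    return {
        "raw_records": {
            "waveform_record_b64": base64.b64encode(raw_record).decode("ascii"),
            "waveform_record_len": len(raw_record),
        }
    }


def decode_raw_record(extensions: dict) -> Optional[bytes]:
    """Hypothetical inverse: recover the raw bytes from a loaded sidecar."""
    rr = extensions.get("raw_records")
    if rr is None:
        return None
    raw = base64.b64decode(rr["waveform_record_b64"])
    # Assumed sanity check: reject truncated or hand-edited sidecars.
    if len(raw) != rr["waveform_record_len"]:
        raise ValueError("raw_records length mismatch")
    return raw


if __name__ == "__main__":
    record = bytearray(210)
    record[10:14] = b"Tran"
    ext = raw_records_extension(bytes(record))
    # JSON round-trip stands in for write_sidecar / read_sidecar.
    loaded = json.loads(json.dumps(ext))
    assert decode_raw_record(loaded) == bytes(record)
    assert raw_records_extension(None) == {}
```

Keeping `waveform_record_len` next to the base64 text lets an offline reader notice a damaged sidecar before mapping byte offsets against it.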