3 Commits

Author SHA1 Message Date
serversdown 0f7630c10d Merge pull request 'doc: update readme to 0.15.0' (#17) from sfm-waveform-store into main
Reviewed-on: #17
2026-05-08 15:15:36 -04:00
serversdown e1a73b2c44 Merge pull request 'feat: add waveform store handling' (#16) from sfm-waveform-store into main
Reviewed-on: #16
2026-05-08 15:03:32 -04:00
serversdown 429c6ac87a feat(protocol): implement v0.14.0 SUB 5A protocol rewrite with enhanced chunk handling and new helpers
test: add regression tests for v0.14.x SUB 5A protocol fixes
refactor(logging): change warning logs to debug for less verbosity in write_blastware_file
2026-05-06 14:18:31 -04:00
5 changed files with 9 additions and 360 deletions
+8 -61
@@ -11,7 +11,6 @@
 | Date | Section | Change |
 |---|---|---|
-| 2026-05-08 | §7.6.1 (RETRACTION) | **❌ RETRACTED — "raw int16 LE 8 bytes/sample-set" body codec was never validated.** The original 4-2-26 confirmation was based on misreading broken-decoder output (full-scale ±32K noise) as evidence the signal had saturated. BW's own 0C peaks for that capture (Tran=0.420 / Vert=3.870 / Long=0.495 in/s) prove the signal was NOT saturated — none of those exceed 13K ADC counts. No event in the project's archive has ever come close to saturation, yet the decoder consistently produces ±32K noise on every event. Conclusion: the body codec is not raw int16 LE; the actual encoding is open. Body byte distribution is heavily skewed (24% `0x00`, 10.5% `0x10`, lots of `10 XX` pairs) — likely a delta encoding with `0x10` as escape, but unverified. Retraction box added at top of §7.6.1; "fully-saturating event" claim removed from channel-identification note. The histogram codec in §7.6.2 IS verified and decoded correctly (different recording mode, 32-byte blocks); use it as a structural hint when reverse-engineering the waveform codec. |
 | 2026-02-26 | Initial | Document created from first hex dump analysis |
 | 2026-02-26 | §2 Frame Structure | **CORRECTED:** Frame uses DLE-STX (`0x10 0x02`) and DLE-ETX (`0x10 0x03`), not bare `0x02`/`0x03`. `0x41` confirmed as ACK not STX. DLE stuffing rule added. |
 | 2026-02-26 | §8 Timestamp | **UPDATED:** Year `0x07CB = 1995` confirmed as MiniMate hardware default date when RTC battery is disconnected. Not an encoding error. Confidence upgraded from ❓ to 🔶. |
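The saturation argument in the 2026-05-08 row comes down to a linear scaling check. Below is a minimal sketch of that arithmetic, assuming a signed 16-bit ADC whose ±32768 counts map linearly onto a 10 in/s full-scale range (the range quoted in the retraction table). `ppv_to_counts` and both constants are illustrative names, not part of the project, and results may differ by a few counts from the documented figures depending on rounding and the exact full-scale constant.

```python
FULL_SCALE_IN_S = 10.0      # assumed geophone full-scale range, in/s
FULL_SCALE_COUNTS = 32768   # assumed signed 16-bit ADC span


def ppv_to_counts(ppv_in_s: float) -> float:
    """Convert a peak particle velocity to expected ADC counts (linear scaling)."""
    return ppv_in_s / FULL_SCALE_IN_S * FULL_SCALE_COUNTS


# Peaks reported in the 0C record for the 4-2-26 capture.
peaks = {"Tran": 0.420, "Vert": 3.870, "Long": 0.495}

for channel, ppv in peaks.items():
    counts = ppv_to_counts(ppv)
    share = counts / FULL_SCALE_COUNTS
    print(f"{channel}: {ppv:.3f} in/s -> ~{counts:,.0f} counts ({share:.1%} of full scale)")
```

Under these assumptions every channel stays well below half of full scale, which is the basis for rejecting the "signal saturated" reading.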
@@ -852,59 +851,14 @@ MicL: 39 64 1D AA = 0.0000875 psi
 > strings actually live — NOT in any sample-chunk frame)
 > - **§7.8.8** — multi-event "Download All" sequence
 >
-> The waveform sample encoding described in §7.6.1 below (4-channel interleaved s16 LE, 8 bytes
-> per sample-set) is **NOT actually verified** — see the retraction note at the top of §7.6.1.
-> The frame-indexing claims and metadata-source claims in §7.6 are also wrong; use §7.8.5–§7.8.8.
+> The waveform sample encoding (4-channel interleaved s16 LE, 8 bytes per sample-set) described in §7.6.1
+> below is still correct. Only the frame-indexing claims and metadata-source claims are wrong.
 **Two distinct formats exist depending on recording mode. Both confirmed from captures.**
 ---
-#### 7.6.1 Blast / Waveform mode — ❌ NOT VERIFIED (retracted 2026-05-08)
+#### 7.6.1 Blast / Waveform mode — ✅ CONFIRMED (4-2-26 capture)
-> ## ⚠️ RETRACTION (2026-05-08)
->
-> The "4-channel interleaved s16 LE, 8 bytes per sample-set" claim
-> below was **never actually validated**. It got into this document
-> because the decoder built around that assumption produced full-scale
-> ±32K counts on every channel of the 4-2-26 capture, and the
-> ±32K-shaped output was misread as "the signal must have saturated."
->
-> Cross-checking the BW-reported peaks proves the opposite:
->
-> | Channel | BW PPV (in/s) | Expected ADC counts at 10 in/s FS |
-> |---|---|---|
-> | Tran | 0.420 | **1,376** |
-> | Vert | 3.870 | **12,686** |
-> | Long | 0.495 | **1,622** |
->
-> None of these are anywhere near ±32K saturation. No event in the
-> project's archive (across all captures from 1-2-26 onward) has
-> ever come close to saturation either. Yet the decoder has
-> consistently produced ±32K-shaped noise on every event. The right
-> conclusion is that the byte-to-sample interpretation has been wrong
-> the whole time, NOT that every event happened to saturate.
->
-> What's actually known about the body bytes:
->
-> - The byte distribution is heavily skewed (24% `0x00`, 10.5% `0x10`,
->   plus high frequencies of `0x01 / 0x04 / 0x0F / 0xF0 / 0xF1`). Lots
->   of `10 XX` pairs. Reading them as LE int16 produces uniform ±32K
->   noise — the signature of mis-aligned or encoded data.
-> - The CHANGELOG note for v0.14.2 calls the body a "delta-encoded
->   ADC stream" — that hint plus the byte distribution points toward
->   a delta encoding with `0x10` as an escape marker, but no decoder
->   has been worked out yet.
-> - The histogram-mode codec in §7.6.2 IS verified and decoded
->   correctly (different format: 32-byte blocks with 9× int16 LE
->   samples + metadata). The same firmware emits both formats, so
->   §7.6.2 may share encoding primitives with the waveform codec
->   and is worth using as a structural hint when reverse-engineering.
->
-> **Treat the spec below as a starting hypothesis to disprove, not
-> ground truth.** The frame-layout pieces (STRT location, preamble,
-> chunk header) appear correct; the per-byte sample interpretation
-> is the open question.
 4-channel interleaved signed 16-bit little-endian, 8 bytes per sample-set:
@@ -969,18 +923,11 @@ Total: 7633B → 954 naive sample-sets, 948 alignment-corrected
 Only 948 of 9306 sample-sets captured (10%) — `stop_after_metadata=True` terminated
 download after A5[7] was received.
-**Channel identification note:** Channel ordering [Tran, Vert, Long, Mic] = [ch0, ch1, ch2, ch3]
-is the Blastware convention. This ordering has not been independently verified end-to-end,
-since no decoder yet produces samples that match BW's own rendering of the same event (see
-the retraction at the top of §7.6.1). Once the body codec is decoded, the per-channel PPV
-values from the 0C record (Tran=0.420, Vert=3.870, Long=0.495 in/s for the 4-2-26 capture)
-provide the cross-check that pins down channel order.
+**Channel identification note:** The 4-2-26 blast saturated all four geophone channels
+to near-maximum ADC output (~32000–32617 counts). Channel ordering [Tran, Vert, Long, Mic]
+= [ch0, ch1, ch2, ch3] is the Blastware convention and is consistent with per-channel PPV
+values (Tran=0.420, Vert=3.870, Long=0.495 in/s from 0C record), but cannot be
+independently confirmed from a fully-saturating event alone.
-> **Historical note:** earlier revisions of this section claimed the 4-2-26 blast had
-> "saturated all four channels to ~32000–32617 counts," citing that as evidence the s16 LE
-> interpretation was correct. That claim was wrong — the ±32K values were the broken
-> decoder's output, not the actual signal amplitude (which the 0C peaks above show was
-> nowhere near saturation). Retracted 2026-05-08.
 ---
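The byte-distribution observations in the documentation diff above (a heavy skew toward `0x00` and `0x10`, plus many `10 XX` pairs) can be reproduced with a simple frequency count. The following is a rough sketch only: `byte_stats` is a hypothetical helper, the sample bytes are synthetic, and the pair scan is naive (it does not treat `10 10` specially, which a real escape scheme might).

```python
from collections import Counter


def byte_stats(body: bytes, top: int = 5):
    """Return (most common bytes as fractions, counts of bytes following 0x10)."""
    counts = Counter(body)
    total = len(body) or 1  # avoid dividing by zero on an empty body
    top_bytes = [(value, n / total) for value, n in counts.most_common(top)]
    # Tally which byte follows each 0x10, to see whether 0x10 acts like an escape.
    followers = Counter(
        body[i + 1] for i in range(len(body) - 1) if body[i] == 0x10
    )
    return top_bytes, followers


# Synthetic stand-in for a waveform body; real input would be captured bytes.
sample = bytes([0x00, 0x00, 0x10, 0x04, 0x01, 0x10, 0xF0, 0x00])
top_bytes, followers = byte_stats(sample)

for value, fraction in top_bytes:
    print(f"0x{value:02X}: {fraction:.1%}")
for value, n in followers.items():
    print(f"10 {value:02X}: {n}")
```

Running this over real bodies from several events would show whether the skew and the set of bytes following `0x10` are stable, which is the kind of evidence an escape-based delta encoding would need.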
-14
@@ -1362,20 +1362,6 @@ def _decode_waveform_record_into(data: bytes, event: Event) -> None:
     Modifies event in-place.
     """
-    # ── Always preserve the raw 210 bytes ─────────────────────────────────────
-    # The 0C record carries far more than just peaks + project strings:
-    # ZC Freq, Time of Peak, Peak Acceleration, Peak Displacement, Vector
-    # Sum Time, MicL Time of Peak, and the per-channel sensor self-check
-    # results (Test Freq / Ratio / Pass-Fail) all live somewhere in this
-    # 210-byte block. Their byte offsets are not yet mapped — keeping the
-    # raw bytes lets us decode those fields offline once we have a paired
-    # (raw 0C, BW-report) sample to fit against. Cheap to keep around
-    # (210 bytes per event).
-    try:
-        event._raw_record = bytes(data[:210])
-    except Exception:
-        pass
     # ── Record type + format detection ────────────────────────────────────────
     # `record_type` is the user-facing label ("Waveform" for any triggered
     # event regardless of timestamp-header layout). `fmt` is the internal
+1 -16
@@ -15,7 +15,6 @@ declared in `event_to_sidecar_dict()`.
 from __future__ import annotations
-import base64
 import datetime
 import hashlib
 import json
@@ -136,20 +135,6 @@ def event_to_sidecar_dict(
     captured_at = captured_at or datetime.datetime.utcnow()
-    # Stash raw 0C record bytes in `extensions.raw_records` so future
-    # field-decoding work (Peak Acceleration, ZC Freq, Time of Peak,
-    # sensor self-check results, etc.) can run offline against committed
-    # sidecars without a live device. Cheap (~280 bytes base64) and
-    # forward-compatible (older readers ignore unknown extensions keys).
-    ext_dict: dict = dict(extensions) if extensions else {}
-    raw_0c = getattr(event, "_raw_record", None)
-    if raw_0c:
-        rr = ext_dict.setdefault("raw_records", {})
-        # Don't clobber a raw_0c that callers explicitly passed in via
-        # `extensions=...` (e.g. round-trip preservation in patch_sidecar).
-        rr.setdefault("waveform_record_b64", base64.b64encode(raw_0c).decode("ascii"))
-        rr.setdefault("waveform_record_len", len(raw_0c))
     return {
         "schema_version": SCHEMA_VERSION,
         "kind": SIDECAR_KIND,
@@ -189,7 +174,7 @@ def event_to_sidecar_dict(
             "notes": "",
         },
-        "extensions": ext_dict,
+        "extensions": extensions or {},
     }
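For readers who want to see the removed persistence behaviour in isolation, here is a minimal standalone sketch of the same idea: base64-encode the raw record into `extensions.raw_records` without overwriting anything the caller already supplied. `stash_raw_record` is a hypothetical helper written for illustration; unlike the removed lines, it copies the nested `raw_records` dict so the caller's input is never mutated.

```python
import base64
from typing import Optional


def stash_raw_record(extensions: Optional[dict], raw: Optional[bytes]) -> dict:
    """Return a copy of `extensions` with the raw record added under raw_records."""
    ext = dict(extensions) if extensions else {}
    if raw:
        # Copy the nested dict so the caller's object is left untouched.
        rr = dict(ext.get("raw_records", {}))
        # setdefault keeps any value the caller passed in explicitly.
        rr.setdefault("waveform_record_b64", base64.b64encode(raw).decode("ascii"))
        rr.setdefault("waveform_record_len", len(raw))
        ext["raw_records"] = rr
    return ext


record = bytes(210)  # placeholder for a 210-byte 0C record
ext = stash_raw_record(None, record)
print(len(ext["raw_records"]["waveform_record_b64"]))  # base64 length of 210 bytes
```

A 210-byte record encodes to 280 base64 characters, which matches the "~280 bytes" cost quoted in the removed comment.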
-216
@@ -1,216 +0,0 @@
-"""
-sfm.dump_0c: inspect the raw 210-byte SUB 0C waveform record stored in a
-sidecar JSON's `extensions.raw_records.waveform_record_b64`.
-
-Usage:
-    python -m sfm.dump_0c <sidecar.sfm.json> [<sidecar.sfm.json> ...]
-
-Prints, for each input:
-  - A header summarising the sidecar's metadata-block claims (peaks,
-    project, timestamp): the "what BW says this event measured" view.
-  - A 16-byte-wide hex dump of the raw 0C record, annotated with known
-    field anchors (STRT, channel labels, project strings).
-  - A "candidate float regions" scan that brute-forces every byte
-    position as a float32 BE and prints any that yield a value in a
-    plausible range (1e-7 to 1e3), useful for hunting where Peak
-    Acceleration / Peak Displacement / ZC Freq / Time of Peak live.
-
-Pairing the printed candidates with the BW Event Report values lets
-us nail down byte offsets for the missing fields without a live
-device.
-"""
-
-from __future__ import annotations
-
-import argparse
-import base64
-import json
-import struct
-import sys
-from pathlib import Path
-
-# ── Annotations for known anchors in a 210-byte 0C record ──────────────────
-# Anchors we look for and label inline in the hex dump. Each is a needle
-# (bytes to find) and a short label. Found via .find() — the first
-# occurrence wins.
-_ANCHORS = [
-    (b"Tran", "Tran label (PPV @ +6, PVS @ -12)"),
-    (b"Vert", "Vert label (PPV @ +6)"),
-    (b"Long", "Long label (PPV @ +6)"),
-    (b"MicL", "MicL label (peak psi @ +6)"),
-    (b"Project:", "Project: label"),
-    (b"Client:", "Client: label"),
-    (b"User Name:", "User Name: label"),
-    (b"Seis Loc:", "Seis Loc: label"),
-    (b"Extended Notes", "Extended Notes label"),
-]
-
-def _hex_dump(data: bytes, anchors: dict[int, str]) -> str:
-    """Return a 16-byte-wide hex+ASCII dump, with anchor labels printed
-    on the line that contains the anchor's start byte."""
-    lines = []
-    for off in range(0, len(data), 16):
-        chunk = data[off : off + 16]
-        hex_part = " ".join(f"{b:02x}" for b in chunk)
-        ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
-        line = f" {off:04x} {hex_part:<47} |{ascii_part}|"
-        # If any anchor lands on a byte in this row, append a tag
-        tags = [
-            f"[{a:#04x}: {label}]"
-            for a, label in anchors.items()
-            if off <= a < off + 16
-        ]
-        if tags:
-            line += " " + " ".join(tags)
-        lines.append(line)
-    return "\n".join(lines)
-
-def _scan_float32_be(data: bytes, lo: float, hi: float) -> list[tuple[int, float]]:
-    """Brute-force every offset where data[off:off+4] is a float32 BE in
-    (lo, hi). Includes negatives in the symmetric range."""
-    hits = []
-    for i in range(len(data) - 3):
-        try:
-            v = struct.unpack_from(">f", data, i)[0]
-        except struct.error:
-            continue
-        if v != v:  # NaN
-            continue
-        if abs(v) < 1e-30 or abs(v) > 1e10:  # crap range
-            continue
-        a = abs(v)
-        if lo <= a <= hi:
-            hits.append((i, v))
-    return hits
-
-def _scan_uint16_be(data: bytes, lo: int, hi: int) -> list[tuple[int, int]]:
-    """Find every offset where uint16 BE is in [lo, hi]."""
-    hits = []
-    for i in range(len(data) - 1):
-        v = (data[i] << 8) | data[i + 1]
-        if lo <= v <= hi:
-            hits.append((i, v))
-    return hits
-
-def _summarize_sidecar(side: dict) -> str:
-    ev = side.get("event", {})
-    pv = side.get("peak_values", {})
-    pi = side.get("project_info", {})
-    bw = side.get("blastware", {})
-    return (
-        f" serial: {ev.get('serial')}\n"
-        f" timestamp: {ev.get('timestamp')}\n"
-        f" waveform: {ev.get('waveform_key')} ({ev.get('record_type')})\n"
-        f" sample_rate:{ev.get('sample_rate')} sps rectime:{ev.get('rectime_seconds')}s\n"
-        f" bw file: {bw.get('filename')} ({bw.get('filesize')} B)\n"
-        f" peaks: "
-        f"Tran={pv.get('transverse'):.5f} "
-        f"Vert={pv.get('vertical'):.5f} "
-        f"Long={pv.get('longitudinal'):.5f} "
-        f"PVS={pv.get('vector_sum'):.5f} in/s "
-        f"Mic={pv.get('mic_psi'):.6e} psi"
-        if all(pv.get(k) is not None for k in
-               ("transverse", "vertical", "longitudinal", "vector_sum", "mic_psi"))
-        else f" peaks: {pv}\n project: {pi}"
-    ) + (
-        f"\n project: {pi.get('project')!r} / {pi.get('client')!r} / "
-        f"operator={pi.get('operator')!r} loc={pi.get('sensor_location')!r}"
-    )
-
-def dump_one(path: Path) -> int:
-    side = json.loads(path.read_text(encoding="utf-8"))
-    raw_b64 = (
-        side.get("extensions", {})
-        .get("raw_records", {})
-        .get("waveform_record_b64")
-    )
-    if not raw_b64:
-        print(f"\n=== {path} ===")
-        print(" ! no extensions.raw_records.waveform_record_b64 — sidecar")
-        print(" pre-dates raw-0C persistence (added in v0.15.x). Re-save")
-        print(" the event from the device to capture the bytes.")
-        return 1
-    raw = base64.b64decode(raw_b64)
-    # Build anchor map
-    anchors: dict[int, str] = {}
-    for needle, label in _ANCHORS:
-        i = raw.find(needle)
-        if i >= 0:
-            anchors[i] = label
-    print(f"\n=== {path} ===")
-    print("metadata claimed by sidecar:")
-    print(_summarize_sidecar(side))
-    print(f"\nraw 0C record ({len(raw)} bytes):")
-    print(_hex_dump(raw, anchors))
-    # Float32 BE candidates in geo-relevant ranges
-    geo_hits = _scan_float32_be(raw, 1e-5, 50.0)
-    # Filter: only show hits that are NOT trivially the per-channel labels'
-    # +6 PPV floats already documented (those will land in any sweep too).
-    print("\nfloat32 BE candidates (1e-5 .. 50.0):")
-    for off, v in geo_hits:
-        annotation = ""
-        for needle, _ in _ANCHORS[:4]:  # geo + mic labels
-            i = raw.find(needle)
-            if i >= 0 and off == i + 6:
-                annotation = f"{needle.decode()} PPV (label+6)"
-                break
-        print(f" {off:#04x} ({off:3d}) {v:>+15.6f}{annotation}")
-    print("\nuint16 BE candidates ZC-Freq-ish (1..200):")
-    for off, v in _scan_uint16_be(raw, 1, 200):
-        if v < 5:  # too noisy at very low end
-            continue
-        print(f" {off:#04x} ({off:3d}) = {v}")
-    print("\nuint16 BE candidates Time-of-Peak-ish if stored as ms (1..30000):")
-    for off, v in _scan_uint16_be(raw, 1, 30000):
-        if v < 100:  # noise filter
-            continue
-        # Only the first ~80 are worth showing — too many hits otherwise
-        if off > 80:
-            break
-        print(f" {off:#04x} ({off:3d}) = {v} ms ?")
-    print()
-    return 0
-
-def main(argv: list[str] | None = None) -> int:
-    p = argparse.ArgumentParser(
-        description="Inspect a saved 0C waveform record from a sidecar JSON.",
-    )
-    p.add_argument(
-        "sidecars",
-        nargs="+",
-        type=Path,
-        help="Path(s) to <event>.sfm.json sidecar file(s).",
-    )
-    args = p.parse_args(argv)
-    rc = 0
-    for path in args.sidecars:
-        try:
-            rc |= dump_one(path)
-        except Exception as exc:
-            print(f"\n=== {path} ===\n ERROR: {exc}", file=sys.stderr)
-            rc |= 2
-    return rc
-
-if __name__ == "__main__":
-    sys.exit(main())
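The removed tool's docstring describes pairing the scanned float candidates with values from the BW Event Report to locate field offsets. A targeted variant of that idea is sketched below: search the record for offsets whose big-endian float32 matches one known report value. `find_float32_be` and the synthetic record are assumptions for illustration and are not taken from the repository.

```python
import math
import struct


def find_float32_be(data: bytes, target: float, rel_tol: float = 1e-3) -> list[int]:
    """Return every offset whose big-endian float32 is within rel_tol of target."""
    hits = []
    for off in range(len(data) - 3):
        (value,) = struct.unpack_from(">f", data, off)
        # math.isclose returns False for NaN, so garbage bit patterns are skipped.
        if math.isclose(value, target, rel_tol=rel_tol):
            hits.append(off)
    return hits


# Synthetic record: plant a known peak value at offset 10, zeros elsewhere.
record = bytearray(32)
struct.pack_into(">f", record, 10, 3.870)

offsets = find_float32_be(bytes(record), 3.870)
print(offsets)
```

Searching for one report value at a time keeps the hit list short. An offset that matches across several events with different peaks is a much stronger candidate than one that matches a single capture.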
-53
@@ -127,59 +127,6 @@ def test_sidecar_write_and_read_round_trip(tmp_path: Path):
     assert loaded["source"]["kind"] == "sfm-ach"
-def test_sidecar_persists_raw_0c_record_in_extensions(tmp_path: Path):
-    """An Event with _raw_record populated should land its 210 bytes
-    base64-encoded in extensions.raw_records.waveform_record_b64, so
-    later analysis (e.g. mapping Peak Acceleration / Time of Peak / ZC
-    Freq byte offsets) can run offline against the saved sidecar."""
-    import base64
-    ev, _ = _make_synthetic_event()
-    # Synthesize a 210-byte 0C record with embedded label needles so
-    # the dump tool's anchor scan has something to find.
-    raw = bytearray(210)
-    raw[10:14] = b"Tran"
-    raw[60:64] = b"Vert"
-    raw[110:114] = b"Long"
-    raw[160:164] = b"MicL"
-    ev._raw_record = bytes(raw)
-    d = event_file_io.event_to_sidecar_dict(
-        ev, serial="BE11529",
-        blastware_filename="M529LKIQ.7M0W", blastware_filesize=1024,
-        blastware_sha256="x" * 64, source_kind="sfm-live",
-    )
-    rr = d["extensions"]["raw_records"]
-    assert rr["waveform_record_len"] == 210
-    decoded = base64.b64decode(rr["waveform_record_b64"])
-    assert decoded == ev._raw_record
-    # Round-trip through write/read
-    path = tmp_path / "raw0c.sfm.json"
-    event_file_io.write_sidecar(path, d)
-    loaded = event_file_io.read_sidecar(path)
-    assert (
-        base64.b64decode(loaded["extensions"]["raw_records"]["waveform_record_b64"])
-        == ev._raw_record
-    )
-
-def test_sidecar_omits_raw_records_when_event_has_no_0c(tmp_path: Path):
-    """Events without a _raw_record (e.g. constructed by importers that
-    never see 0C) should NOT add an empty raw_records block; keep the
-    sidecar clean for those flows."""
-    ev, _ = _make_synthetic_event()
-    assert ev._raw_record is None
-    d = event_file_io.event_to_sidecar_dict(
-        ev, serial="BE11529",
-        blastware_filename="M529LKIQ.7M0W", blastware_filesize=1024,
-        blastware_sha256="x" * 64, source_kind="bw-import",
-    )
-    assert d["extensions"] == {}
-
 def test_sidecar_rejects_unsupported_schema_version(tmp_path: Path):
     path = tmp_path / "future.sfm.json"
     path.write_text(json.dumps({