codec-re: solve waveform body block framing; per-byte sample mapping still open
Decoded the structural framing of the Blastware waveform body — the bytes between the 21-byte STRT record and the 26-byte file footer. The body is a sequence of tagged variable-length blocks, NOT raw int16 LE. Five tag types (10/20/00/30/40 NN) and their lengths are now confirmed against the 4-event May 2026 fixture bundle. Body splits cleanly into ~16 segments (for a 1280-sample event) separated by 40 02 segment headers carrying a monotonically incrementing uint32 LE counter at bytes [8:12]. What's done: - minimateplus/waveform_codec.py — block walker, segment splitter, segment header parser. decode_waveform_v2 is a stub returning None until the byte-to-sample mapping is solved; client.py is unchanged. - tests/test_waveform_codec.py — 31 tests covering block detection, lengths, contiguous-walk, segment splitting, segment-header parsing, and counter monotonicity. All pass. - tests/fixtures/decode-re-5-8-26/ — bundled fixtures (4 events, BW binary + Blastware ASCII export each). - docs/instantel_protocol_reference.md §7.6.1 — replaced retraction box with the verified structural decoding plus an explicit list of what's still open. What's still open: the per-byte mapping inside 10 NN / 20 NN blocks. 96 channel-permutation × nibble-order × sign-convention combinations were brute-force tested; none match BW's ASCII export to within ±1 ADC count. The codec is more elaborate than uniform 4-bit deltas — likely a hybrid variable-bit-width scheme with segment-anchor resync points. Next recommended step: capture an event with a known calibration tone to pin down magnitude scaling. Walker also bails out partway through event-b (open issue documented in both the module and the protocol reference).
This commit is contained in:
@@ -860,127 +860,160 @@ MicL: 39 64 1D AA = 0.0000875 psi
|
||||
|
||||
---
|
||||
|
||||
#### 7.6.1 Blast / Waveform mode — ❌ NOT VERIFIED (retracted 2026-05-08)
|
||||
#### 7.6.1 Blast / Waveform mode — 🟡 STRUCTURAL FRAMING DECODED (2026-05-08)
|
||||
|
||||
> ## ⚠️ RETRACTION (2026-05-08)
|
||||
> **Status (2026-05-08):** Block-level framing is solved and verified
|
||||
> against the 4-event May 8 2026 bundle (3 sec / 2 sec / 1 sec / 1 sec
|
||||
> events captured live from BE11529). The per-byte mapping from block
|
||||
> data to ADC samples is **still open** — the previous int16 LE claim
|
||||
> is REFUTED (see history below).
|
||||
>
|
||||
> The "4-channel interleaved s16 LE, 8 bytes per sample-set" claim
|
||||
> below was **never actually validated**. It got into this document
|
||||
> because the decoder built around that assumption produced full-scale
|
||||
> ±32K counts on every channel of the 4-2-26 capture, and the
|
||||
> ±32K-shaped output was misread as "the signal must have saturated."
|
||||
>
|
||||
> Cross-checking the BW-reported peaks proves the opposite:
|
||||
>
|
||||
> | Channel | BW PPV (in/s) | Expected ADC counts at 10 in/s FS |
|
||||
> |---|---|---|
|
||||
> | Tran | 0.420 | **1,376** |
|
||||
> | Vert | 3.870 | **12,686** |
|
||||
> | Long | 0.495 | **1,622** |
|
||||
>
|
||||
> None of these are anywhere near ±32K saturation. No event in the
|
||||
> project's archive (across all captures from 1-2-26 onward) has
|
||||
> ever come close to saturation either. Yet the decoder has
|
||||
> consistently produced ±32K-shaped noise on every event. The right
|
||||
> conclusion is that the byte-to-sample interpretation has been wrong
|
||||
> the whole time, NOT that every event happened to saturate.
|
||||
>
|
||||
> What's actually known about the body bytes:
|
||||
>
|
||||
> - The byte distribution is heavily skewed (24% `0x00`, 10.5% `0x10`,
|
||||
> plus high frequencies of `0x01 / 0x04 / 0x0F / 0xF0 / 0xF1`). Lots
|
||||
> of `10 XX` pairs. Reading them as LE int16 produces uniform ±32K
|
||||
> noise — the signature of mis-aligned or encoded data.
|
||||
> - The CHANGELOG note for v0.14.2 calls the body a "delta-encoded
|
||||
> ADC stream" — that hint plus the byte distribution points toward
|
||||
> a delta encoding with `0x10` as an escape marker, but no decoder
|
||||
> has been worked out yet.
|
||||
> - The histogram-mode codec in §7.6.2 IS verified and decoded
|
||||
> correctly (different format: 32-byte blocks with 9× int16 LE
|
||||
> samples + metadata). The same firmware emits both formats, so
|
||||
> §7.6.2 may share encoding primitives with the waveform codec
|
||||
> and is worth using as a structural hint when reverse-engineering.
|
||||
>
|
||||
> **Treat the spec below as a starting hypothesis to disprove, not
|
||||
> ground truth.** The frame-layout pieces (STRT location, preamble,
|
||||
> chunk header) appear correct; the per-byte sample interpretation
|
||||
> is the open question.
|
||||
> The earlier "4-channel interleaved s16 LE, 8 bytes per sample-set"
|
||||
> claim was never validated and was wrong. No event in the project's
|
||||
> archive ever came close to ADC saturation, yet the int16 LE decoder
|
||||
> consistently produced full-scale ±32K noise — that was the signature
|
||||
> of mis-aligned encoded data, not signal saturation.
|
||||
|
||||
4-channel interleaved signed 16-bit little-endian, 8 bytes per sample-set:
|
||||
##### Body file layout
|
||||
|
||||
A Blastware waveform-file body (the variable-length section between
|
||||
the 21-byte STRT record and the 26-byte file footer) is composed of
|
||||
**tagged variable-length blocks**, NOT raw int16 samples.
|
||||
|
||||
```
|
||||
[T_lo T_hi V_lo V_hi L_lo L_hi M_lo M_hi] × N sample-sets
|
||||
[preamble: 7 or 9 bytes]
|
||||
[stream of tagged blocks]
|
||||
[trailer: per-channel summary blocks]
|
||||
```
|
||||
|
||||
- **T** = Transverse (Tran), **V** = Vertical (Vert), **L** = Longitudinal (Long), **M** = Microphone
|
||||
- Channel order follows the Blastware convention: Tran is always first (ch[0]).
|
||||
- Encoding: signed int16 little-endian. Full scale = ±32768 counts.
|
||||
- Sample rate: set by compliance config (typical: 1024 Hz for blast monitoring).
|
||||
- Each A5 frame chunk carries a different number of waveform bytes. Frame sizes
|
||||
are NOT multiples of 8, so naive concatenation scrambles channel assignments at
|
||||
frame boundaries. **Always track cumulative byte offset mod 8 to correct alignment.**
|
||||
**Preamble:** starts with the 4-byte magic ``00 02 00 00``. Single-shot
|
||||
events have a 7-byte preamble; continuous events have a 9-byte preamble
|
||||
(the 4 events in the May 8 2026 bundle split 2/2 between the two
|
||||
lengths). Bytes [4:9] of the preamble appear to encode initial
|
||||
per-channel state but the layout has not been pinned down — for some
|
||||
events byte [4] equals truth Tran[0] in 16-count units (0.005 in/s
|
||||
LSB), but other channel-byte assignments don't fit consistently.
|
||||
|
||||
**A5[0] frame layout:**
|
||||
##### Block tags (CONFIRMED 2026-05-08)
|
||||
|
||||
Every block starts with a 2-byte tag. Five tag types are confirmed:
|
||||
|
||||
| Tag (hex) | Block type | On-wire length |
|
||||
|-----------|-------------------------------------|-----------------------|
|
||||
| ``10 NN`` | Small-delta data block | NN/2 + 2 bytes |
|
||||
| ``20 NN`` | Literal data block (int8-shaped) | NN + 2 bytes |
|
||||
| ``00 NN`` | 2-byte marker between data blocks | 2 bytes |
|
||||
| ``30 NN`` | Trailer summary block | NN × 4 bytes |
|
||||
| ``40 02`` | Segment header | 20 bytes (fixed) |
|
||||
|
||||
NN is always a multiple of 4. ``10 NN`` and ``20 NN`` data blocks
|
||||
alternate with ``00 NN`` markers — every ``10/20 NN`` block is
|
||||
followed by a ``00 NN`` marker before the next data block.
|
||||
|
||||
##### Segments
|
||||
|
||||
The body is divided into ~16 SEGMENTS for a 1280-sample event (= 1
|
||||
segment per ~80 sample-sets), separated by ``40 02`` segment headers.
|
||||
A 3328-sample event has ~42 segments.
|
||||
|
||||
The 18-byte ``40 02`` payload structure (CONFIRMED across all 4
|
||||
fixtures by inspecting the increment of bytes [8:12]):
|
||||
|
||||
| Offset | Length | Field |
|
||||
|--------|--------|--------------------------------------------------|
|
||||
| 0 | 4 | Anchor / channel state (open — see below) |
|
||||
| 4 | 4 | Variable field (open) |
|
||||
| 8 | 4 | uint32 LE counter — increments by 1 per segment |
|
||||
| 12 | 4 | Fixed pattern ``02 00 00 01`` |
|
||||
| 16 | 2 | Variable tail |
|
||||
|
||||
The counter at bytes [8:12] starts in the 0x40s for a freshly-erased
|
||||
device and increments cleanly — useful as a structural sanity check.
|
||||
|
||||
Examples from event-c (1 sec single-shot):
|
||||
|
||||
```
|
||||
db[7:]: [11-byte header] [21-byte STRT record] [6-byte preamble] [waveform ...]
|
||||
STRT: offset 11 in db[7:]
|
||||
+0..3 b'STRT' magic
|
||||
+8..9 uint16 BE total_samples (full-record expected sample-set count)
|
||||
+16..17 uint16 BE pretrig_samples (pre-trigger window, in sample-sets)
|
||||
+18 uint8 rectime_seconds
|
||||
preamble: +19..20 0x00 0x00 null padding
|
||||
+21..24 0xFF × 4 synchronisation sentinel
|
||||
Waveform: starts at strt_pos + 27 within db[7:]
|
||||
Segment header 1 (offset 235):
|
||||
40 02 | 00 00 00 00 | 0a 4b 01 1e | 47 00 00 00 | 02 00 00 01 | 00 01
|
||||
^counter=0x47
|
||||
Segment header 2 (offset 523):
|
||||
40 02 | ff fe ff fe | 13 f5 01 06 | 48 00 00 00 | 02 00 00 01 | 00 02
|
||||
^counter=0x48 (+1)
|
||||
```
|
||||
|
||||
**A5[1..N] frame layout (non-metadata frames):**
|
||||
##### Trailer
|
||||
|
||||
```
|
||||
db[7:]: [8-byte per-frame header] [waveform ...]
|
||||
Header: [counter LE uint16, 0x00 × 6] — frame sequence counter (0, 8, 12, 16, 20, …×0x400)
|
||||
Waveform: starts at byte 8 of db[7:]
|
||||
```
|
||||
The trailer (after the last segment's data) is a sequence of 32-byte
|
||||
``30 08`` blocks plus a final ``30 04`` / ``20 04`` / ``40 02`` summary
|
||||
ending in the constant 2-byte tail ``00 1A``. These contain
|
||||
per-channel statistics (peak times, peak values, mean offsets — bytes
|
||||
in the form ``f3/f4/f5`` near ``20 10`` markers strongly resemble
|
||||
int8 channel-bias values around -12). Detailed decoding of the
|
||||
trailer is outside the path needed for sample reconstruction.
|
||||
|
||||
**Special frames:**
|
||||
##### What's still open
|
||||
|
||||
| Frame index | Contents |
|
||||
- **The byte → sample mapping inside ``10 NN`` and ``20 NN`` blocks.**
|
||||
Tested hypotheses that did not match BW's ASCII export to within ±1
|
||||
ADC count:
|
||||
|
||||
1. ``10 NN`` data = 4-bit signed nibble deltas, channel-interleaved,
|
||||
all 24 channel permutations × 2 nibble orders × 2 sign conventions
|
||||
× 2 init-from-header settings (= 96 combinations). All produce
|
||||
values that diverge from truth after the first ~7 sample-sets.
|
||||
2. ``20 NN`` data = int8 absolute or delta samples for one channel.
|
||||
Magnitudes in observed blocks (peak ±34 in event-c at offset 351)
|
||||
do not match any channel's PPV at any plausible ADC quantization
|
||||
(1-count, 4-count, 8-count, 16-count).
|
||||
3. ``00 NN`` marker = "skip N sample-sets with zero deltas". Sums
|
||||
of NN/4 across markers do not consistently match the 80
|
||||
sample-sets-per-segment count.
|
||||
|
||||
The codec is more elaborate than uniform 4-bit deltas. A hybrid
|
||||
variable-bit-width scheme (4-bit deltas in ``10 NN``, 8-bit deltas
|
||||
or absolutes in ``20 NN``, segment-header anchors after each
|
||||
``40 02``) is the most plausible remaining hypothesis.
|
||||
|
||||
- **The role of byte [4:9] of the preamble.** Byte 4 == Tran[0]
|
||||
truth value (in 16-count units) for events a/b/d, but doesn't
|
||||
fit consistently for event-c. Bytes [5:9] don't match a simple
|
||||
per-channel encoding.
|
||||
|
||||
- **Walker correctness past offset ~427 in event-b.** The walker
|
||||
bails out partway through event-b — there is at least one block
|
||||
whose length doesn't fit the lengths confirmed for the other
|
||||
three events. Likely a ``20 NN`` with NN > 0xFC (currently
|
||||
rejected by the walker), or a different length formula in some
|
||||
context.
|
||||
|
||||
##### Recommended next step
|
||||
|
||||
A capture with a known external waveform (calibration tone of known
|
||||
frequency and amplitude) would unlock the magnitude scaling and
|
||||
disambiguate which channel a ``20 NN`` block belongs to. Multiple
|
||||
captures of the same signal at different ``geo_range`` settings
|
||||
(Normal 10 in/s vs Sensitive 1.25 in/s) would also pin down whether
|
||||
sample values are scaled at the codec layer or only at the BW
|
||||
display layer.
|
||||
|
||||
##### Reference module
|
||||
|
||||
``minimateplus/waveform_codec.py`` implements the verified block
|
||||
walker (:func:`walk_body`, :func:`split_segments`,
|
||||
:func:`parse_segment_header`). ``decode_waveform_v2`` is a stub that
|
||||
returns ``None`` until a verified per-byte sample decoder is wired
|
||||
up; production code (``minimateplus/client.py``) continues to use
|
||||
the legacy int16 LE decoder, which produces wrong samples but stable
|
||||
output shape — keep the ``.h5`` sidecars marked as
|
||||
"sample-codec unverified" until the byte-to-sample mapping lands.
|
||||
|
||||
##### History (do not re-derive)
|
||||
|
||||
| Date | Note |
|
||||
|---|---|
|
||||
| A5[0] | Probe response: STRT record + first waveform chunk |
|
||||
| A5[7] | Event-time metadata strings only (no waveform data) |
|
||||
| A5[9] | Terminator frame (page_key=0x0000) — ignored |
|
||||
| A5[1..6,8] | Waveform chunks |
|
||||
|
||||
**Confirmed from 4-2-26 blast capture (total_samples=9306, pretrig=298, rate=1024 Hz):**
|
||||
|
||||
```
|
||||
Frame Waveform bytes Cumulative Align(mod 8)
|
||||
A5[0] 933B 933B 0
|
||||
A5[1] 963B 1896B 5
|
||||
A5[2] 946B 2842B 0
|
||||
A5[3] 960B 3802B 2
|
||||
A5[4] 952B 4754B 2
|
||||
A5[5] 946B 5700B 2
|
||||
A5[6] 941B 6641B 4
|
||||
A5[8] 992B 7633B 1
|
||||
Total: 7633B → 954 naive sample-sets, 948 alignment-corrected
|
||||
```
|
||||
|
||||
Only 948 of 9306 sample-sets captured (10%) — `stop_after_metadata=True` terminated
|
||||
download after A5[7] was received.
|
||||
|
||||
**Channel identification note:** Channel ordering [Tran, Vert, Long, Mic] = [ch0, ch1, ch2, ch3]
|
||||
is the Blastware convention. This ordering has not been independently verified end-to-end,
|
||||
since no decoder yet produces samples that match BW's own rendering of the same event (see
|
||||
the retraction at the top of §7.6.1). Once the body codec is decoded, the per-channel PPV
|
||||
values from the 0C record (Tran=0.420, Vert=3.870, Long=0.495 in/s for the 4-2-26 capture)
|
||||
provide the cross-check that pins down channel order.
|
||||
|
||||
> **Historical note:** earlier revisions of this section claimed the 4-2-26 blast had
|
||||
> "saturated all four channels to ~32000–32617 counts," citing that as evidence the s16 LE
|
||||
> interpretation was correct. That claim was wrong — the ±32K values were the broken
|
||||
> decoder's output, not the actual signal amplitude (which the 0C peaks above show was
|
||||
> nowhere near saturation). Retracted 2026-05-08.
|
||||
| 2026-05-08 | Block tagging confirmed against the 4-event May 2026 bundle. All bodies parse cleanly through `walk_body` for events a/c/d. Event-b walks partway and stops at offset 427 (open issue). |
|
||||
| 2026-05-08 | Earlier "4-channel interleaved s16 LE" claim formally retracted — never validated, produced full-scale ±32K noise on every event because the bytes are encoded, not raw samples. |
|
||||
| 2026-04-02 | "Frame 7 metadata", "Frame 9 terminator", and `0x0400`-step chunk-counter claims documented as-was; later proved to be artifacts of an over-reading 5A walk (now superseded by §7.8.5–7.8.7). |
|
||||
|
||||
---
|
||||
|
||||
|
||||
Reference in New Issue
Block a user