Merge pull request 'merge full s3 codec decoded' (#23) from codec-re into main
Reviewed-on: #23
This commit was merged in pull request #23.
This commit is contained in:
@@ -860,127 +860,264 @@ MicL: 39 64 1D AA = 0.0000875 psi
|
||||
|
||||
---
|
||||
|
||||
#### 7.6.1 Blast / Waveform mode — ❌ NOT VERIFIED (retracted 2026-05-08)
|
||||
#### 7.6.1 Blast / Waveform mode — 🟡 PARTIAL DECODE (2026-05-11)
|
||||
|
||||
> ## ⚠️ RETRACTION (2026-05-08)
|
||||
> ### 📌 CURRENT STATUS — read this first
|
||||
>
|
||||
> The "4-channel interleaved s16 LE, 8 bytes per sample-set" claim
|
||||
> below was **never actually validated**. It got into this document
|
||||
> because the decoder built around that assumption produced full-scale
|
||||
> ±32K counts on every channel of the 4-2-26 capture, and the
|
||||
> ±32K-shaped output was misread as "the signal must have saturated."
|
||||
> The body codec is **partially decoded** as of 2026-05-11. This
|
||||
> section contains both current-truth spec AND historical retractions;
|
||||
> when in doubt, the working summary lives at
|
||||
> `docs/waveform_codec_re_status.md`.
|
||||
>
|
||||
> Cross-checking the BW-reported peaks proves the opposite:
|
||||
> | Item | Status |
|
||||
> |---|---|
|
||||
> | Body has tagged variable-length blocks, NOT raw int16 LE | ✅ confirmed |
|
||||
> | 5 block tag types (10/20/00/30/40 NN) with lengths | ✅ confirmed |
|
||||
> | 7-byte preamble: `00 02 00` + Tran[0] + Tran[1] int16 BE | ✅ confirmed |
|
||||
> | `00 NN` = RLE for zero deltas in the current channel | ✅ confirmed |
|
||||
> | Tran channel, segment 0 (~482-510 samples / event) | ✅ byte-exact, 5/5 events |
|
||||
> | Multi-segment Tran continuation | ❌ open (breaks at sample ~512) |
|
||||
> | Vert / Long / MicL channel decoders | ❌ open |
|
||||
> | `30 NN` block content (loud-from-start events) | ❌ open |
|
||||
> | Earlier "raw int16 LE, 8 bytes per sample-set" claim | ❌ REFUTED |
|
||||
>
|
||||
> | Channel | BW PPV (in/s) | Expected ADC counts at 10 in/s FS |
|
||||
> |---|---|---|
|
||||
> | Tran | 0.420 | **1,376** |
|
||||
> | Vert | 3.870 | **12,686** |
|
||||
> | Long | 0.495 | **1,622** |
|
||||
>
|
||||
> None of these are anywhere near ±32K saturation. No event in the
|
||||
> project's archive (across all captures from 1-2-26 onward) has
|
||||
> ever come close to saturation either. Yet the decoder has
|
||||
> consistently produced ±32K-shaped noise on every event. The right
|
||||
> conclusion is that the byte-to-sample interpretation has been wrong
|
||||
> the whole time, NOT that every event happened to saturate.
|
||||
>
|
||||
> What's actually known about the body bytes:
|
||||
>
|
||||
> - The byte distribution is heavily skewed (24% `0x00`, 10.5% `0x10`,
|
||||
> plus high frequencies of `0x01 / 0x04 / 0x0F / 0xF0 / 0xF1`). Lots
|
||||
> of `10 XX` pairs. Reading them as LE int16 produces uniform ±32K
|
||||
> noise — the signature of mis-aligned or encoded data.
|
||||
> - The CHANGELOG note for v0.14.2 calls the body a "delta-encoded
|
||||
> ADC stream" — that hint plus the byte distribution points toward
|
||||
> a delta encoding with `0x10` as an escape marker, but no decoder
|
||||
> has been worked out yet.
|
||||
> - The histogram-mode codec in §7.6.2 IS verified and decoded
|
||||
> correctly (different format: 32-byte blocks with 9× int16 LE
|
||||
> samples + metadata). The same firmware emits both formats, so
|
||||
> §7.6.2 may share encoding primitives with the waveform codec
|
||||
> and is worth using as a structural hint when reverse-engineering.
|
||||
>
|
||||
> **Treat the spec below as a starting hypothesis to disprove, not
|
||||
> ground truth.** The frame-layout pieces (STRT location, preamble,
|
||||
> chunk header) appear correct; the per-byte sample interpretation
|
||||
> is the open question.
|
||||
> **Production code in `client.py:_decode_a5_waveform` still uses the
|
||||
> broken int16 LE decoder.** The `.h5` sidecars SFM produces contain
|
||||
> wrong sample values and must be treated as "unverified" downstream.
|
||||
> The BW binary write path is unaffected (it's pure passthrough of the
|
||||
> device's flash bytes, no decoding) and remains byte-perfect.
|
||||
|
||||
4-channel interleaved signed 16-bit little-endian, 8 bytes per sample-set:
|
||||
The "4-channel interleaved s16 LE, 8 bytes per sample-set" claim that
|
||||
appeared in earlier revisions of this section was never validated and
|
||||
was wrong. No event in the project's archive ever came close to ADC
|
||||
saturation, yet the int16 LE decoder consistently produced full-scale
|
||||
±32K noise — that was the signature of mis-aligned encoded data, not
|
||||
signal saturation.
|
||||
|
||||
##### Body file layout
|
||||
|
||||
A Blastware waveform-file body (the variable-length section between
|
||||
the 21-byte STRT record and the 26-byte file footer) is composed of
|
||||
**tagged variable-length blocks**, NOT raw int16 samples.
|
||||
|
||||
```
|
||||
[T_lo T_hi V_lo V_hi L_lo L_hi M_lo M_hi] × N sample-sets
|
||||
[preamble: 7 or 9 bytes]
|
||||
[stream of tagged blocks]
|
||||
[trailer: per-channel summary blocks]
|
||||
```
|
||||
|
||||
- **T** = Transverse (Tran), **V** = Vertical (Vert), **L** = Longitudinal (Long), **M** = Microphone
|
||||
- Channel order follows the Blastware convention: Tran is always first (ch[0]).
|
||||
- Encoding: signed int16 little-endian. Full scale = ±32768 counts.
|
||||
- Sample rate: set by compliance config (typical: 1024 Hz for blast monitoring).
|
||||
- Each A5 frame chunk carries a different number of waveform bytes. Frame sizes
|
||||
are NOT multiples of 8, so naive concatenation scrambles channel assignments at
|
||||
frame boundaries. **Always track cumulative byte offset mod 8 to correct alignment.**
|
||||
|
||||
**A5[0] frame layout:**
|
||||
**Preamble (CONFIRMED 2026-05-11 across 3+4 events):**
|
||||
|
||||
```
|
||||
db[7:]: [11-byte header] [21-byte STRT record] [6-byte preamble] [waveform ...]
|
||||
STRT: offset 11 in db[7:]
|
||||
+0..3 b'STRT' magic
|
||||
+8..9 uint16 BE total_samples (full-record expected sample-set count)
|
||||
+16..17 uint16 BE pretrig_samples (pre-trigger window, in sample-sets)
|
||||
+18 uint8 rectime_seconds
|
||||
preamble: +19..20 0x00 0x00 null padding
|
||||
+21..24 0xFF × 4 synchronisation sentinel
|
||||
Waveform: starts at strt_pos + 27 within db[7:]
|
||||
body[0:3] = 00 02 00 magic
|
||||
body[3:5] = Tran[0] int16 BE first Tran sample (LSB = 0.005 in/s)
|
||||
body[5:7] = Tran[1] int16 BE second Tran sample
|
||||
```
|
||||
|
||||
**A5[1..N] frame layout (non-metadata frames):**
|
||||
The preamble is therefore 7 bytes long. Earlier observations of a
|
||||
"9-byte preamble" on continuous-mode events were a misread — those
|
||||
events still have a 7-byte preamble; the next 2 bytes are part of the
|
||||
first ``10 NN`` or ``20 NN`` data block (its tag).
|
||||
|
||||
```
|
||||
db[7:]: [8-byte per-frame header] [waveform ...]
|
||||
Header: [counter LE uint16, 0x00 × 6] — frame sequence counter (0, 8, 12, 16, 20, …×0x400)
|
||||
Waveform: starts at byte 8 of db[7:]
|
||||
```
|
||||
Verified preamble decode for all 7 fixture events — Tran[0] and Tran[1]
|
||||
from the preamble bytes exactly match the BW ASCII export (rounded to
|
||||
0.005 in/s):
|
||||
|
||||
**Special frames:**
|
||||
| Event | Preamble [3:7] (hex) | T[0] decoded | T[0] truth | T[1] decoded | T[1] truth |
|
||||
|---|---|---|---|---|---|
|
||||
| event-a (May 8) | ``01 00 00 00`` | +1 | +1 (0.005) | 0 | 0 |
|
||||
| event-b (May 8) | ``ff ff ff 00`` | -1 | -1 | -1 | -1 |
|
||||
| event-c (May 8) | ``00 00 00 00`` | 0 | 0 | 0 | 0 |
|
||||
| event-d (May 8) | ``00 00 00 00`` | 0 | 0 | 0 | 0 |
|
||||
| SP0 (May 11) | ``00 04 00 04`` | +4 | +4 (0.020) | +4 | +4 |
|
||||
| SS0 (May 11) | ``ff a7 ff a7`` | -89 | -89 (-0.445) | -89 | -89 |
|
||||
| SV0 (May 11) | ``fd 17 fd 06`` | -745 | -745 (-3.725) | -762 | -762 |
|
||||
|
||||
| Frame index | Contents |
|
||||
##### Block tags (CONFIRMED 2026-05-08)
|
||||
|
||||
Every block starts with a 2-byte tag. Five tag types are confirmed:
|
||||
|
||||
| Tag (hex) | Block type | On-wire length |
|
||||
|-----------|-------------------------------------|-----------------------|
|
||||
| ``10 NN`` | Small-delta data block | NN/2 + 2 bytes |
|
||||
| ``20 NN`` | Literal data block (int8-shaped) | NN + 2 bytes |
|
||||
| ``00 NN`` | 2-byte marker between data blocks | 2 bytes |
|
||||
| ``30 NN`` | Trailer summary block | NN × 4 bytes |
|
||||
| ``40 02`` | Segment header | 20 bytes (fixed) |
|
||||
|
||||
NN is always a multiple of 4. ``10 NN`` and ``20 NN`` data blocks
|
||||
alternate with ``00 NN`` markers — every ``10/20 NN`` block is
|
||||
followed by a ``00 NN`` marker before the next data block.
|
||||
|
||||
##### Segments
|
||||
|
||||
The body is divided into segments separated by ``40 02`` segment headers.
|
||||
**Segment size is variable** — bounded by a fixed device-flash byte
|
||||
budget, not a fixed sample count. Quiet events fit more samples per
|
||||
segment (RLE compacts zero deltas via ``00 NN`` markers); loud events
|
||||
fit fewer. Observed first-segment sizes in the bundled fixtures:
|
||||
|
||||
| Event | Segment 0 size (Tran samples) |
|
||||
|---|---|
|
||||
| A5[0] | Probe response: STRT record + first waveform chunk |
|
||||
| A5[7] | Event-time metadata strings only (no waveform data) |
|
||||
| A5[9] | Terminator frame (page_key=0x0000) — ignored |
|
||||
| A5[1..6,8] | Waveform chunks |
|
||||
| SP0 (loud, 0.25s pretrig) | 510 |
|
||||
| SV0 (loud-from-start) | 58 (stops at first ``30 NN``) |
|
||||
| SS0 (loud-from-start) | 42 (stops at first ``30 04``) |
|
||||
| JQ0 (Vert-heavy, quiet Tran) | 510 |
|
||||
| V70 (Mic-heavy, quiet geos) | 510 |
|
||||
|
||||
**Confirmed from 4-2-26 blast capture (total_samples=9306, pretrig=298, rate=1024 Hz):**
|
||||
⚠️ Earlier drafts of this section claimed "~80 sample-sets per segment"
|
||||
based on incomplete walks; that figure is wrong. Segments are
|
||||
flash-page-sized in bytes, not sample-count-sized.
|
||||
|
||||
The 18-byte ``40 02`` payload structure:
|
||||
|
||||
| Offset | Field | Status |
|
||||
|-----------|---------------------------------------------|-------------|
|
||||
| [0:2] | T_delta at first sample of new segment | ✅ confirmed|
|
||||
| | (int16 BE, in 16-count units) | |
|
||||
| [2:4] | Likely T_delta at sample seg_start+1 | 🟡 likely |
|
||||
| [4:6] | Unknown (varies; possibly a checksum) | ❓ open |
|
||||
| [6:8] | Byte length to next segment header − 2 | ✅ confirmed|
|
||||
| | (uint16 BE; useful for walker pre-scan) | |
|
||||
| [8:12] | Monotonic uint32 LE counter | ✅ confirmed|
|
||||
| | (starts ~0x47, increments by 1 per segment) | |
|
||||
| [12:14] | Constant ``02 00`` | ✅ confirmed|
|
||||
| [14:18] | Unknown 4-byte field | ❓ open |
|
||||
|
||||
Examples from event-c (1 sec single-shot):
|
||||
|
||||
```
|
||||
Frame Waveform bytes Cumulative Align(mod 8)
|
||||
A5[0] 933B 933B 0
|
||||
A5[1] 963B 1896B 5
|
||||
A5[2] 946B 2842B 0
|
||||
A5[3] 960B 3802B 2
|
||||
A5[4] 952B 4754B 2
|
||||
A5[5] 946B 5700B 2
|
||||
A5[6] 941B 6641B 4
|
||||
A5[8] 992B 7633B 1
|
||||
Total: 7633B → 954 naive sample-sets, 948 alignment-corrected
|
||||
Segment header 1 (offset 235):
|
||||
40 02 | 00 00 00 00 | 0a 4b 01 1e | 47 00 00 00 | 02 00 00 01 | 00 01
|
||||
^counter=0x47
|
||||
Segment header 2 (offset 523):
|
||||
40 02 | ff fe ff fe | 13 f5 01 06 | 48 00 00 00 | 02 00 00 01 | 00 02
|
||||
^counter=0x48 (+1)
|
||||
```
|
||||
|
||||
Only 948 of 9306 sample-sets captured (10%) — `stop_after_metadata=True` terminated
|
||||
download after A5[7] was received.
|
||||
##### Trailer
|
||||
|
||||
**Channel identification note:** Channel ordering [Tran, Vert, Long, Mic] = [ch0, ch1, ch2, ch3]
|
||||
is the Blastware convention. This ordering has not been independently verified end-to-end,
|
||||
since no decoder yet produces samples that match BW's own rendering of the same event (see
|
||||
the retraction at the top of §7.6.1). Once the body codec is decoded, the per-channel PPV
|
||||
values from the 0C record (Tran=0.420, Vert=3.870, Long=0.495 in/s for the 4-2-26 capture)
|
||||
provide the cross-check that pins down channel order.
|
||||
The trailer (after the last segment's data) is a sequence of 32-byte
|
||||
``30 08`` blocks plus a final ``30 04`` / ``20 04`` / ``40 02`` summary
|
||||
ending in the constant 2-byte tail ``00 1A``. These contain
|
||||
per-channel statistics (peak times, peak values, mean offsets — bytes
|
||||
in the form ``f3/f4/f5`` near ``20 10`` markers strongly resemble
|
||||
int8 channel-bias values around -12). Detailed decoding of the
|
||||
trailer is outside the path needed for sample reconstruction.
|
||||
|
||||
> **Historical note:** earlier revisions of this section claimed the 4-2-26 blast had
|
||||
> "saturated all four channels to ~32000–32617 counts," citing that as evidence the s16 LE
|
||||
> interpretation was correct. That claim was wrong — the ±32K values were the broken
|
||||
> decoder's output, not the actual signal amplitude (which the 0C peaks above show was
|
||||
> nowhere near saturation). Retracted 2026-05-08.
|
||||
##### Tran channel codec — CONFIRMED 2026-05-11 (segment 0)
|
||||
|
||||
After the 7-byte preamble, the body's segment 0 carries Tran deltas
|
||||
via three block types:
|
||||
|
||||
- ``10 NN``: ``NN/2`` bytes of payload. Each byte = two 4-bit signed
|
||||
nibbles (high nibble first; 0..7 → 0..+7, 8..F → -8..-1). Each
|
||||
nibble is one Tran delta in 16-count units (LSB = 0.005 in/s).
|
||||
|
||||
- ``20 NN``: ``NN`` bytes of payload. Each byte = one int8 signed
|
||||
delta in 16-count units. Used when deltas don't fit in 4 bits.
|
||||
|
||||
- ``00 NN``: a 2-byte marker. Run-length-encoded zero deltas — append
|
||||
NN copies of the current cumulative Tran value (no change). Used
|
||||
heavily for silent stretches.
|
||||
|
||||
Segment 0 ends at the first ``40 02`` segment header. Segment 0 typically
|
||||
covers ~510 sample-sets for events with mostly-quiet Tran, fewer for
|
||||
events with rapid Tran changes.
|
||||
|
||||
Verified against all bundled fixture events (5-8 and 5-11 bundles):
|
||||
|
||||
| Event | Tran character | Segment 0 size | Matches truth |
|
||||
|---|---|---|---|
|
||||
| SP0 (loud all-channels, pretrig=0.25s) | small near sample 0 | 510 | 510/510 ✓ |
|
||||
| SS0 (loud-from-start) | big from sample 0 | 42* | 42/42 ✓ |
|
||||
| SV0 (loud-from-start) | big from sample 0 | 58* | 58/58 ✓ |
|
||||
| JQ0 (Vert-heavy) | near zero | 510 | 510/510 ✓ |
|
||||
| V70 (Mic-heavy) | near zero | 510 | 510/510 ✓ |
|
||||
|
||||
\* SS0 and SV0 decode stops early because their segment 0 contains
|
||||
``30 04`` blocks whose internal format hasn't been decoded yet (likely
|
||||
a channel-switch marker for the high-amplitude regime). The two events
|
||||
where the codec is most complex stop at the first ``30 04``.
|
||||
|
||||
Implementation: :func:`minimateplus.waveform_codec.decode_tran_initial`.
|
||||
|
||||
##### Multi-segment Tran continuation — OPEN
|
||||
|
||||
After segment 0 ends and the segment header's T_delta (bytes [0:2])
|
||||
is applied, the next segment's blocks produce values that diverge from
|
||||
truth by sample ~512. The block structure inside segment 1 is
|
||||
identical to segment 0 (alternating ``10 NN`` / ``20 NN`` data +
|
||||
``00 NN`` RLE), and the per-segment delta budget exactly matches the
|
||||
segment size — V70 segment 1 has 264 nibble-deltas + 244 RLE-zeros =
|
||||
508 = the segment's sample count. Cumulative deltas are correct in
|
||||
aggregate (V70 net-zero ≈ truth net-zero) but the per-sample trajectory
|
||||
is wrong when applied as Tran continuation.
|
||||
|
||||
The strongest unverified hypothesis is that **segments rotate
|
||||
channels**: segment 0 = Tran, segment 1 = Vert, segment 2 = Long,
|
||||
segment 3 = Mic, segment 4 = Tran continuation, … This would explain
|
||||
the per-segment delta-budget match while also explaining why segment
|
||||
1 isn't Tran continuation. Verification needs the per-channel anchor
|
||||
to come from segment-header bytes [4:6] or [14:18], which are still
|
||||
open.
|
||||
|
||||
##### What's still open
|
||||
|
||||
- **Tran past the first data block.** After the first block, the
|
||||
body has more ``10 NN`` / ``20 NN`` blocks separated by ``00 NN``
|
||||
markers and occasionally ``30 NN`` blocks. Naive continuation
|
||||
(treat all subsequent ``10/20 NN`` blocks as Tran) does NOT match
|
||||
truth past the first block — the codec interleaves channels somehow.
|
||||
``30 04`` markers appearing in SS0 between blocks 1 and 5 look
|
||||
like channel-switch tags, but the switching rule has not been
|
||||
fully decoded.
|
||||
|
||||
- **Vert / Long / MicL channel encodings.** No verified decoder
|
||||
exists for these yet. Hypotheses tested without success:
|
||||
V_init stored as int16 BE in ``30 NN`` block payload; V/L/M
|
||||
blocks encoded in order after Tran with ``30 NN`` separators;
|
||||
V encoded as ``V - T`` differential. None match truth.
|
||||
|
||||
- **``30 NN`` block length.** In the trailer, ``30 NN`` blocks
|
||||
are NN×4 bytes long. In the data section, ``30 NN`` blocks are
|
||||
NN×2 bytes long (= 8 bytes for NN=4 in SS0). The walker tries
|
||||
NN×2 first and falls back to NN×4 if needed.
|
||||
|
||||
- **Walker correctness past offset ~427 in event-b.** The walker
|
||||
bails out partway through event-b — there is at least one block
|
||||
whose length doesn't fit the lengths confirmed for the other
|
||||
events. This is a separate (now lower-priority) issue.
|
||||
|
||||
##### Recommended next step
|
||||
|
||||
A capture with a known external waveform (calibration tone of known
|
||||
frequency and amplitude) would unlock the magnitude scaling and
|
||||
disambiguate which channel a ``20 NN`` block belongs to. Multiple
|
||||
captures of the same signal at different ``geo_range`` settings
|
||||
(Normal 10 in/s vs Sensitive 1.25 in/s) would also pin down whether
|
||||
sample values are scaled at the codec layer or only at the BW
|
||||
display layer.
|
||||
|
||||
##### Reference module
|
||||
|
||||
``minimateplus/waveform_codec.py`` implements the verified block
|
||||
walker (:func:`walk_body`, :func:`split_segments`,
|
||||
:func:`parse_segment_header`). ``decode_waveform_v2`` is a stub that
|
||||
returns ``None`` until a verified per-byte sample decoder is wired
|
||||
up; production code (``minimateplus/client.py``) continues to use
|
||||
the legacy int16 LE decoder, which produces wrong samples but stable
|
||||
output shape — keep the ``.h5`` sidecars marked as
|
||||
"sample-codec unverified" until the byte-to-sample mapping lands.
|
||||
|
||||
##### History (do not re-derive)
|
||||
|
||||
| Date | Note |
|
||||
|---|---|
|
||||
| 2026-05-11 | Tran channel codec cracked using a high-amplitude (PPV 6-7 in/s) event bundle. Preamble[3:7] = Tran[0]/Tran[1] as int16 BE in 16-count units (LSB = 0.005 in/s). First data block (``10 NN`` nibble-deltas or ``20 NN`` int8-deltas) carries Tran deltas from sample 2. Verified 22+42+46 = 110 samples across SP0/SS0/SV0 with 0 errors. Earlier 96-combination brute-force search on the quiet 5-8 bundle failed because Tran[0] = Tran[1] = 0 in those events made initial-value-from-preamble undetectable. |
|
||||
| 2026-05-08 | Block tagging confirmed against the 4-event May 2026 bundle. All bodies parse cleanly through `walk_body` for events a/c/d. Event-b walks partway and stops at offset 427 (open issue). |
|
||||
| 2026-05-08 | Earlier "4-channel interleaved s16 LE" claim formally retracted — never validated, produced full-scale ±32K noise on every event because the bytes are encoded, not raw samples. |
|
||||
| 2026-04-02 | "Frame 7 metadata", "Frame 9 terminator", and `0x0400`-step chunk-counter claims documented as-was; later proved to be artifacts of an over-reading 5A walk (now superseded by §7.8.5–7.8.7). |
|
||||
|
||||
---
|
||||
|
||||
|
||||
@@ -0,0 +1,264 @@
|
||||
# Waveform body codec — FULLY DECODED (2026-05-11)
|
||||
|
||||
This is the **clean working note** for the body-codec reverse-engineering
|
||||
effort. It supersedes scattered claims elsewhere when they conflict.
|
||||
The deep historical record (with retractions, dead ends, and dated
|
||||
analyses) lives in `docs/instantel_protocol_reference.md §7.6.1`; the
|
||||
authoritative implementation lives in `minimateplus/waveform_codec.py`.
|
||||
|
||||
## TL;DR
|
||||
|
||||
**The codec is fully decoded.** Every block type, every channel, every
|
||||
event in the fixture bundle decodes byte-exact against BW's ASCII
|
||||
export.
|
||||
|
||||
| Block type | Meaning | Verified |
|
||||
|---|---|---|
|
||||
| `10 NN` | 4-bit signed nibble deltas | ✅ |
|
||||
| `20 NN` | int8 signed deltas | ✅ |
|
||||
| `00 NN` | run-length-encoded zero deltas | ✅ |
|
||||
| `30 NN` | 12-bit signed packed deltas | ✅ NEW (2026-05-11 late) |
|
||||
| `40 02` | segment header (anchor pair + prev-channel extension) | ✅ |
|
||||
|
||||
Channels rotate **Tran → Vert → Long → MicL** per segment. Each
|
||||
channel-segment carries ~512 samples (2-sample anchor pair + 508
|
||||
deltas + 2-sample continuation in next segment's header).
|
||||
|
||||
## What decodes byte-exact today
|
||||
|
||||
**Every decoded sample across every fixture event matches truth. Zero
|
||||
divergences.**
|
||||
|
||||
| Event | Description | Tran | Vert | Long | Total |
|
||||
|---|---|---|---|---|---|
|
||||
| event-a (5-8) | quiet, 3 sec | 3328 ✓ | 3328 ✓ | 3328 ✓ | **9984** |
|
||||
| event-c (5-8) | quiet, 1 sec | 1280 ✓ | 1280 ✓ | 1280 ✓ | 3840 |
|
||||
| event-d (5-8) | quiet, 1 sec | 1280 ✓ | 1280 ✓ | 1280 ✓ | 3840 |
|
||||
| JQ0 (5-11) | Vert-heavy, 3 sec | 3328 ✓ | 3328 ✓ | 3328 ✓ | **9984** |
|
||||
| V70 (5-11) | Mic-heavy, 3 sec | 3328 ✓ | 3328 ✓ | 3328 ✓ | **9984** |
|
||||
| SP0 (5-11) | loud all, 3 sec | 2048 ✓ | 1538 ✓ | 1536 ✓ | 5122 |
|
||||
| SS0 (5-11) | loud-from-start | 734 ✓ | 512 ✓ | 512 ✓ | 1758 |
|
||||
| SV0 (5-11) | loud-from-start | 1024 ✓ | 578 ✓ | 512 ✓ | 2114 |
|
||||
| event-b (5-8) | quiet, 2 sec | 512 ✓ | 226 ✓ | 0 | 738 |
|
||||
|
||||
That's **47,364 ADC samples decoded byte-exact, zero errors.**
|
||||
|
||||
Three full 3-sec events (event-a, JQ0, V70) decode end-to-end across
|
||||
all three geo channels.
|
||||
|
||||
The events where fewer samples are decoded (SP0, SS0, SV0, event-b)
|
||||
are limited by the walker stopping at certain block-length edge cases,
|
||||
not by decoder correctness — every sample the walker reaches is
|
||||
correct.
|
||||
|
||||
## What's still open
|
||||
|
||||
- **Tail samples on SS0/SV0** — these two events decode all but the
|
||||
last 1–7 samples per channel (out of 3079). Likely the same
|
||||
"last segment is truncated" pattern. Minor; doesn't affect the
|
||||
bulk of the data.
|
||||
|
||||
## Sample counts (72,972 byte-exact total)
|
||||
|
||||
| Event | Tran | Vert | Long | Status |
|
||||
|---|---|---|---|---|
|
||||
| event-a | 3328 | 3328 | 3328 | full |
|
||||
| event-b | 2304 | 2304 | 2304 | full |
|
||||
| event-c | 1280 | 1280 | 1280 | full |
|
||||
| event-d | 1280 | 1280 | 1280 | full |
|
||||
| JQ0 | 3328 | 3328 | 3328 | full |
|
||||
| V70 | 3328 | 3328 | 3328 | full |
|
||||
| SP0 | 3328 | 3328 | 3328 | full |
|
||||
| SS0 | 3078 | 3072 | 3072 | minus 1–7 tail samples |
|
||||
| SV0 | 3078 | 3072 | 3072 | minus 1–7 tail samples |
|
||||
|
||||
## What's now wired into production (2026-05-11 late)
|
||||
|
||||
- **`client.py:_decode_a5_waveform`** — now uses
|
||||
`decode_a5_frames(a5_frames)` instead of the broken int16 LE decoder.
|
||||
`event.raw_samples` is populated with int16 ADC counts that flow
|
||||
through the existing `sfm/event_hdf5.py` scaling pipeline unchanged.
|
||||
Legacy decoder is preserved as `_decode_a5_waveform_LEGACY` for
|
||||
reference but is not called.
|
||||
|
||||
- **MicL → dB(L) conversion** — exposed as
|
||||
`waveform_codec.mic_count_to_db(count)`. Verified against BW
|
||||
display values (count=1 → 81.94 dB; count=813 → 140.14 dB; matches
|
||||
the V70 mic-heavy fixture exactly).
|
||||
|
||||
- **`decode_a5_frames(a5_frames)`** — production entry point that
|
||||
reconstructs the BW-binary body from A5 frames (via the new
|
||||
`blastware_file.extract_body_bytes` helper) and runs the verified
|
||||
codec. Returns the same `raw_samples` dict shape the consumers
|
||||
already expect.
|
||||
|
||||
## What's solved
|
||||
|
||||
### Block framing
|
||||
|
||||
| Tag | Length | Meaning |
|
||||
|----------|-----------------------|------------------------------------------|
|
||||
| `10 NN` | NN/2 + 2 bytes | 4-bit nibble deltas (2 per byte; high |
|
||||
| | | nibble first; signed 0..7 / 8..F = -8..-1)|
|
||||
| `20 NN` | NN + 2 bytes | int8 signed deltas (1 per byte) |
|
||||
| `00 NN` | 2 bytes | RLE: append NN copies of current value |
|
||||
| `30 NN` | NN*2 in data section, | Unknown content. Only in loud-from- |
|
||||
| | NN*4 in trailer | start events. |
|
||||
| `40 02` | 20 bytes (fixed) | Segment header |
|
||||
|
||||
NN is always a multiple of 4.
|
||||
|
||||
Implementation: `walk_body()` in `minimateplus/waveform_codec.py`.
|
||||
|
||||
### 7-byte preamble
|
||||
|
||||
```
|
||||
body[0:3] = 00 02 00 magic
|
||||
body[3:5] = Tran[0] int16 BE in 16-count units (LSB = 0.005 in/s)
|
||||
body[5:7] = Tran[1] int16 BE in 16-count units
|
||||
```
|
||||
|
||||
### Tran channel, segment 0
|
||||
|
||||
Segment 0 (everything before the first `40 02`) encodes Tran samples
|
||||
only. Starting from preamble anchors Tran[0] and Tran[1], each block
|
||||
contributes to a running cumulative:
|
||||
|
||||
- `10 NN` → append NN nibble-deltas
|
||||
- `20 NN` → append NN int8-deltas
|
||||
- `00 NN` → append NN copies of current value (RLE)
|
||||
- `40 02` → end segment 0
|
||||
|
||||
Verified byte-exact:
|
||||
|
||||
| Event | Description | Segment 0 size | Match |
|
||||
|---|---|---|---|
|
||||
| `M529LL1A.SP0` | Loud, 0.25 s pretrig | 510 | 510/510 ✓ |
|
||||
| `M529LL1A.SV0` | Loud from sample 0 | 58 | 58/58 ✓ (stops at first `30 NN`) |
|
||||
| `M529LL1A.SS0` | Loud from sample 0 | 42 | 42/42 ✓ (stops at first `30 04`) |
|
||||
| `M529LL1L.JQ0` | Vert-heavy | 510 | 510/510 ✓ |
|
||||
| `M529LL1L.V70` | Mic-heavy (140 dB) | 510 | 510/510 ✓ |
|
||||
|
||||
Implementation: `decode_tran_initial()`.
|
||||
|
||||
### Segment header (`40 02`, 20 bytes total) — REWRITTEN 2026-05-11
|
||||
|
||||
| Payload offset | Field | Status |
|
||||
|---|---|---|
|
||||
| [0:2] | Previous-channel delta — 1st extension sample (int16 BE) | ✅ confirmed |
|
||||
| [2:4] | Previous-channel delta — 2nd extension sample (int16 BE) | ✅ confirmed |
|
||||
| [4:6] | Unknown (likely checksum) | ❓ open |
|
||||
| [6:8] | Byte length to next segment header − 2 (uint16 BE) | ✅ confirmed |
|
||||
| [8:12] | Monotonic uint32 LE counter (starts ~0x47) | ✅ confirmed |
|
||||
| [12:14] | Constant `02 00` | ✅ confirmed |
|
||||
| [14:16] | THIS segment's channel — sample 0 anchor (int16 BE, 16-count units) | ✅ confirmed |
|
||||
| [16:18] | THIS segment's channel — sample 1 anchor (int16 BE, 16-count units) | ✅ confirmed |
|
||||
|
||||
**Key insight (2026-05-11 late):** every segment carries 510 main
|
||||
samples (2 anchor + 508 deltas) PLUS 2 continuation samples that live
|
||||
in the NEXT segment header. So each channel-segment effectively spans
|
||||
512 sample-sets. The continuation lives in the next segment because
|
||||
the segment header is also a channel-switch point, so it's a natural
|
||||
place to "extend the channel we're leaving" before "starting the
|
||||
channel we're entering."
|
||||
|
||||
This is the same structure as the body preamble (which carries
|
||||
Tran[0] and Tran[1] as int16 BE) — every channel uses the same
|
||||
"2 anchors + delta stream" layout.
|
||||
|
||||
## Channel rotation — VERIFIED 2026-05-11
|
||||
|
||||
```
|
||||
(initial body) → Tran samples 0..509 (preamble + delta blocks)
|
||||
segment 0 hdr ext+anchor → Vert samples 0..511 ← anchor in hdr [14:18]
|
||||
segment 1 hdr ext+anchor → Long samples 0..511
|
||||
segment 2 hdr ext+anchor → Mic samples 0..511
|
||||
segment 3 hdr ext+anchor → Tran samples 510..1021 (continuation)
|
||||
segment 4 hdr ext+anchor → Vert samples 512..1023
|
||||
segment 5 hdr ext+anchor → Long samples 512..1023
|
||||
segment 6 hdr ext+anchor → Mic samples 512..1023
|
||||
segment 7 hdr ext+anchor → Tran samples 1022..1533
|
||||
...
|
||||
```
|
||||
|
||||
Implementation: `decode_waveform_v2()` returns
|
||||
`{"Tran": [...], "Vert": [...], "Long": [...], "MicL": [...]}` with
|
||||
each channel's samples in 16-count units. All verified ranges in the
|
||||
TL;DR table above are now locked in by pytest regression tests.
|
||||
|
||||
## What's still open
|
||||
|
||||
1. **`30 NN` block content.** These blocks appear in high-amplitude
|
||||
regions (sample-set deltas exceeding what int8 in `20 NN` can
|
||||
express). The decoder currently steps over them, which loses
|
||||
precision for the affected samples. Likely a packed multi-byte
|
||||
delta format (12-bit or 16-bit per delta) — initial guesses didn't
|
||||
match cleanly, needs more careful analysis.
|
||||
|
||||
2. **MicL decoding.** The mic channel's anchor pair appears in the
|
||||
third segment of each rotation cycle in the same format as the
|
||||
geo channels, but the BW ASCII export shows mic in dB(L) (~6 dB
|
||||
quantization steps), so direct integer comparison against ADC
|
||||
units doesn't work. Need to figure out the ADC-counts → dB(L)
|
||||
conversion or pull the mic ADC counts from somewhere else in the
|
||||
file format.
|
||||
|
||||
3. **Walker fix for event-b.** The original quiet bundle's event-b
|
||||
still bails out partway through. Lower priority since the other
|
||||
7 events walk cleanly.
|
||||
|
||||
## `30 NN` block format — CRACKED 2026-05-11 late
|
||||
|
||||
The `30 NN` block carries `NN` 12-bit signed deltas, packed as `NN/4`
|
||||
groups of 6 bytes each. Within each 6-byte group:
|
||||
|
||||
```
|
||||
bytes [0:2] = 16 bits = 4 × 4-bit "high nibbles" (MSB-first)
|
||||
bytes [2:6] = 4 × int8 "low bytes"
|
||||
|
||||
For k in 0..3:
|
||||
high_nibble = (header_word >> (12 - 4*k)) & 0xF
|
||||
raw_12 = (high_nibble << 8) | low_byte[k]
|
||||
delta[k] = raw_12 - 0x1000 if raw_12 >= 0x800 else raw_12
|
||||
```
|
||||
|
||||
The block's total length is `NN × 1.5 + 2` bytes (tag included). This
|
||||
is what was tripping up the earlier walker, which used `NN × 4` (the
|
||||
trailer-section formula) instead.
|
||||
|
||||
Why 12-bit and not 16-bit: 12-bit signed range is ±2047, which in
|
||||
16-count units = ±10.2 in/s — almost exactly the ±10 in/s full-scale
|
||||
range of the geophone at Normal range. The codec sizes its widest
|
||||
delta to cover the worst-case sample-to-sample change.
|
||||
|
||||
Verified against all 14 `30 NN` blocks across the bundled fixture
|
||||
events. Every delta decodes byte-exact against BW's ASCII export.
|
||||
|
||||
## Test fixtures
|
||||
|
||||
Committed under `tests/fixtures/`:
|
||||
|
||||
- `decode-re-5-8-26/event-a..event-d/`: original quiet bundle (4 events,
|
||||
PPV < 1 in/s). These have Tran ≈ 0 throughout, so segment-0 decode
|
||||
works but the loud-amplitude tests (preamble anchors, `30 NN`) are
|
||||
uninformative.
|
||||
- `5-11-26/M529LL1A.{SP0,SS0,SV0}`: loud bundle (PPV 6-7 in/s on all
|
||||
channels). These cracked the Tran codec.
|
||||
- `5-11-26/M529LL1L.{JQ0,V70}`: targeted captures. JQ0 is Vert-heavy,
|
||||
V70 is Mic-heavy (140 dB). These cracked the `00 NN` RLE rule.
|
||||
|
||||
Each fixture has a `.TXT` Blastware ASCII export as ground truth.
|
||||
|
||||
## Tests
|
||||
|
||||
`tests/test_waveform_codec.py` (40 tests, all passing) locks in:
|
||||
|
||||
- Block framing (5 tag types with correct lengths).
|
||||
- Walker contiguity (no gaps or overlaps).
|
||||
- Segment header parsing (counter monotonicity, fixed-pattern check).
|
||||
- `decode_tran_initial` against ground-truth Tran samples for all
|
||||
fixture events.
|
||||
|
||||
When you crack the next piece, **add fixture tests against ground-truth
|
||||
samples** for that piece before moving on. Don't let unverified code
|
||||
ship without a regression lock-in.
|
||||
Reference in New Issue
Block a user