docs: clean up waveform-codec doc layers per review
Three "truth layers" had drifted apart between commits. Fixed:

1. waveform_codec.py docstring rewritten from the 2026-05-08 "structural framing only" state to the 2026-05-11 "Tran segment 0 solved + segment-header partially decoded" state. Killed stale "~80 sample-sets per segment" language (real segments are flash-page-byte-sized, not sample-count-sized; observed first-segment sizes are 42-510 samples depending on signal). Killed stale "preamble is 7 or 9 bytes" language (always 7).
2. docs/instantel_protocol_reference.md §7.6.1: added a clear "CURRENT STATUS" box at the top with a status table. Replaced the stale "~80 sample-sets" line with the verified per-event segment sizes. Merged two redundant segment-header field-table sections.
3. docs/waveform_codec_re_status.md (NEW): clean working-status doc. Solved / not solved / hypothesis / next experiment / fixtures / tests. The protocol reference remains the historical Rosetta Stone; this new file is the current-truth working note that shouldn't accumulate fossil layers.
4. CLAUDE.md §"Waveform body codec": prominent warning box at top: "DO NOT TRUST decoded sample arrays yet." BW binary passthrough is the only sample-bearing output to trust until the decoder lands. Added a "Next experiment" subsection pointing the next pass at the segment-channel scoring analyzer.

40 tests still pass.
@@ -61,10 +61,24 @@ Full read pipeline + write pipeline + erase pipeline + monitor log + call home c
|
|||||||
|
|
||||||
## Waveform body codec — PARTIAL (2026-05-11)
|
## Waveform body codec — PARTIAL (2026-05-11)
|
||||||
|
|
||||||
|
> ### ⛔️ DO NOT TRUST decoded sample arrays yet
|
||||||
|
>
|
||||||
|
> `client.py:_decode_a5_waveform` still uses the broken legacy int16 LE
|
||||||
|
> decoder. The `.h5` sidecars SFM writes contain WRONG sample values
|
||||||
|
> for every event. Treat decoded sample arrays as "unverified" in all
|
||||||
|
> downstream consumers.
|
||||||
|
>
|
||||||
|
> The **BW binary write path** (`blastware_file.py`) is unaffected —
|
||||||
|
> it's pure passthrough of device flash bytes and remains byte-perfect.
|
||||||
|
> Use the `.bw` binary as the authoritative waveform output until the
|
||||||
|
> codec is fully decoded.
|
||||||
|
>
|
||||||
|
> Clean working-status doc: `docs/waveform_codec_re_status.md`.
|
||||||
|
> Full archaeological record: `docs/instantel_protocol_reference.md §7.6.1`.
|
||||||
|
|
||||||
The **per-byte decoding** of the Blastware waveform-file body (between the
|
The **per-byte decoding** of the Blastware waveform-file body (between the
|
||||||
21-byte STRT record and the 26-byte footer) was historically claimed to be
|
21-byte STRT record and the 26-byte footer) was historically claimed to be
|
||||||
"raw int16 LE, 8 bytes per sample-set." That was wrong — see the
|
"raw int16 LE, 8 bytes per sample-set." That was wrong. The body
|
||||||
retraction in `docs/instantel_protocol_reference.md §7.6.1`. The body
|
|
||||||
is actually a tagged-block stream with a custom delta+RLE codec.
|
is actually a tagged-block stream with a custom delta+RLE codec.
|
||||||
|
|
||||||
### What's solved (2026-05-11)
|
### What's solved (2026-05-11)
|
||||||
@@ -96,13 +110,26 @@ is actually a tagged-block stream with a custom delta+RLE codec.
|
|||||||
(SS0, SV0) and breaks the simple Tran walk there. Probably a channel-
|
(SS0, SV0) and breaks the simple Tran walk there. Probably a channel-
|
||||||
switch or alternative-encoding marker for high-amplitude regions.
|
switch or alternative-encoding marker for high-amplitude regions.
|
||||||
|
|
||||||
|
### Next experiment
|
||||||
|
|
||||||
|
**Don't hero-code the full decoder.** Build a small analysis tool — a
|
||||||
|
segment-channel scoring analyzer. For each segment of each fixture
|
||||||
|
event, run the segment-0 Tran block-walk + RLE decode and score the
|
||||||
|
cumulative trajectory against the BW ASCII truth for each of {Tran,
|
||||||
|
Vert, Long, MicL} over that segment's sample range, trying different
|
||||||
|
anchor-bytes candidates from the segment header. The winning
|
||||||
|
(channel, anchor-location) combination for each segment reveals
|
||||||
|
whether segments rotate channels and which header bytes encode the
|
||||||
|
per-segment channel anchors.
|
||||||
|
|
||||||
|
See `docs/waveform_codec_re_status.md` for the full specification of
|
||||||
|
the next experiment.
|
||||||
|
|
||||||
### Production-code status
|
### Production-code status
|
||||||
|
|
||||||
`client.py:_decode_a5_waveform` still uses the old (broken) int16 LE
|
`client.py:_decode_a5_waveform` still uses the old (broken) int16 LE
|
||||||
decoder. Until the multi-channel decoder lands, the `.h5` sidecars
|
decoder (see warning at the top of this section). `decode_waveform_v2()`
|
||||||
produced by SFM contain WRONG samples — keep treating them as
|
in `minimateplus/waveform_codec.py` returns `None` as a placeholder.
|
||||||
"unverified" downstream. `decode_waveform_v2()` returns `None` as a
|
|
||||||
placeholder.
|
|
||||||
|
|
||||||
### Test fixtures
|
### Test fixtures
|
||||||
|
|
||||||
|
|||||||
@@ -860,20 +860,39 @@ MicL: 39 64 1D AA = 0.0000875 psi
|
|||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
#### 7.6.1 Blast / Waveform mode — 🟡 STRUCTURAL FRAMING + TRAN CODEC DECODED (2026-05-11)
|
#### 7.6.1 Blast / Waveform mode — 🟡 PARTIAL DECODE (2026-05-11)
|
||||||
|
|
||||||
> **Status (2026-05-11):** Block-level framing is solved. The Tran-channel
|
> ### 📌 CURRENT STATUS — read this first
|
||||||
> encoding (preamble + first data block) is **fully verified** against the
|
|
||||||
> 3-event May 11 2026 high-amplitude bundle (PPV 6-7 in/s) and the 4-event
|
|
||||||
> May 8 bundle. Verts / Long / MicL channel encodings and multi-block
|
|
||||||
> Tran continuation are **still open**. The previous int16 LE claim
|
|
||||||
> remains REFUTED (see history below).
|
|
||||||
>
|
>
|
||||||
> The earlier "4-channel interleaved s16 LE, 8 bytes per sample-set"
|
> The body codec is **partially decoded** as of 2026-05-11. This
|
||||||
> claim was never validated and was wrong. No event in the project's
|
> section contains both current-truth spec AND historical retractions;
|
||||||
> archive ever came close to ADC saturation, yet the int16 LE decoder
|
> when in doubt, the working summary lives at
|
||||||
> consistently produced full-scale ±32K noise — that was the signature
|
> `docs/waveform_codec_re_status.md`.
|
||||||
> of mis-aligned encoded data, not signal saturation.
|
>
|
||||||
|
> | Item | Status |
|
||||||
|
> |---|---|
|
||||||
|
> | Body has tagged variable-length blocks, NOT raw int16 LE | ✅ confirmed |
|
||||||
|
> | 5 block tag types (10/20/00/30/40 NN) with lengths | ✅ confirmed |
|
||||||
|
> | 7-byte preamble: `00 02 00` + Tran[0] + Tran[1] int16 BE | ✅ confirmed |
|
||||||
|
> | `00 NN` = RLE for zero deltas in the current channel | ✅ confirmed |
|
||||||
|
> | Tran channel, segment 0 (~482-510 samples / event) | ✅ byte-exact, 5/5 events |
|
||||||
|
> | Multi-segment Tran continuation | ❌ open (breaks at sample ~512) |
|
||||||
|
> | Vert / Long / MicL channel decoders | ❌ open |
|
||||||
|
> | `30 NN` block content (loud-from-start events) | ❌ open |
|
||||||
|
> | Earlier "raw int16 LE, 8 bytes per sample-set" claim | ❌ REFUTED |
|
||||||
|
>
|
||||||
|
> **Production code in `client.py:_decode_a5_waveform` still uses the
|
||||||
|
> broken int16 LE decoder.** The `.h5` sidecars SFM produces contain
|
||||||
|
> wrong sample values and must be treated as "unverified" downstream.
|
||||||
|
> The BW binary write path is unaffected (it's pure passthrough of the
|
||||||
|
> device's flash bytes, no decoding) and remains byte-perfect.
|
||||||
|
|
||||||
|
The "4-channel interleaved s16 LE, 8 bytes per sample-set" claim that
|
||||||
|
appeared in earlier revisions of this section was never validated and
|
||||||
|
was wrong. No event in the project's archive ever came close to ADC
|
||||||
|
saturation, yet the int16 LE decoder consistently produced full-scale
|
||||||
|
±32K noise — that was the signature of mis-aligned encoded data, not
|
||||||
|
signal saturation.
|
||||||
|
|
||||||
##### Body file layout
|
##### Body file layout
|
||||||
|
|
||||||
@@ -932,23 +951,38 @@ followed by a ``00 NN`` marker before the next data block.
|
|||||||
|
|
||||||
##### Segments
|
##### Segments
|
||||||
|
|
||||||
The body is divided into ~16 SEGMENTS for a 1280-sample event (= 1
|
The body is divided into segments separated by ``40 02`` segment headers.
|
||||||
segment per ~80 sample-sets), separated by ``40 02`` segment headers.
|
**Segment size is variable** — bounded by a fixed device-flash byte
|
||||||
A 3328-sample event has ~42 segments.
|
budget, not a fixed sample count. Quiet events fit more samples per
|
||||||
|
segment (RLE compacts zero deltas via ``00 NN`` markers); loud events
|
||||||
|
fit fewer. Observed first-segment sizes in the bundled fixtures:
|
||||||
|
|
||||||
The 18-byte ``40 02`` payload structure (CONFIRMED across all 4
|
| Event | Segment 0 size (Tran samples) |
|
||||||
fixtures by inspecting the increment of bytes [8:12]):
|
|---|---|
|
||||||
|
| SP0 (loud, 0.25s pretrig) | 510 |
|
||||||
|
| SV0 (loud-from-start) | 58 (stops at first ``30 NN``) |
|
||||||
|
| SS0 (loud-from-start) | 42 (stops at first ``30 04``) |
|
||||||
|
| JQ0 (Vert-heavy, quiet Tran) | 510 |
|
||||||
|
| V70 (Mic-heavy, quiet geos) | 510 |
|
||||||
|
|
||||||
| Offset | Length | Field |
|
⚠️ Earlier drafts of this section claimed "~80 sample-sets per segment"
|
||||||
|--------|--------|--------------------------------------------------|
|
based on incomplete walks; that figure is wrong. Segments are
|
||||||
| 0 | 4 | Anchor / channel state (open — see below) |
|
flash-page-sized in bytes, not sample-count-sized.
|
||||||
| 4 | 4 | Variable field (open) |
|
|
||||||
| 8 | 4 | uint32 LE counter — increments by 1 per segment |
|
|
||||||
| 12 | 4 | Fixed pattern ``02 00 00 01`` |
|
|
||||||
| 16 | 2 | Variable tail |
|
|
||||||
|
|
||||||
The counter at bytes [8:12] starts in the 0x40s for a freshly-erased
|
The 18-byte ``40 02`` payload structure:
|
||||||
device and increments cleanly — useful as a structural sanity check.
|
|
||||||
|
| Offset | Field | Status |
|
||||||
|
|-----------|---------------------------------------------|-------------|
|
||||||
|
| [0:2] | T_delta at first sample of new segment | ✅ confirmed|
|
||||||
|
| | (int16 BE, in 16-count units) | |
|
||||||
|
| [2:4] | Likely T_delta at sample seg_start+1 | 🟡 likely |
|
||||||
|
| [4:6] | Unknown (varies; possibly a checksum) | ❓ open |
|
||||||
|
| [6:8] | Byte length to next segment header − 2 | ✅ confirmed|
|
||||||
|
| | (uint16 BE; useful for walker pre-scan) | |
|
||||||
|
| [8:12] | Monotonic uint32 LE counter | ✅ confirmed|
|
||||||
|
| | (starts ~0x47, increments by 1 per segment) | |
|
||||||
|
| [12:14] | Constant ``02 00`` | ✅ confirmed|
|
||||||
|
| [14:18] | Unknown 4-byte field | ❓ open |
|
||||||
|
|
||||||
Examples from event-c (1 sec single-shot):
|
Examples from event-c (1 sec single-shot):
|
||||||
|
|
||||||
@@ -1008,26 +1042,25 @@ where the codec is most complex stop at the first ``30 04``.
|
|||||||
|
|
||||||
Implementation: :func:`minimateplus.waveform_codec.decode_tran_initial`.
|
Implementation: :func:`minimateplus.waveform_codec.decode_tran_initial`.
|
||||||
|
|
||||||
##### Segment header T-delta (PARTIAL 2026-05-11)
|
##### Multi-segment Tran continuation — OPEN
|
||||||
|
|
||||||
The 20-byte ``40 02`` segment header has its first 2 bytes ([0:2] of
|
After segment 0 ends and the segment header's T_delta (bytes [0:2])
|
||||||
payload) as an int16 BE Tran delta for the first sample of the new
|
is applied, the next segment's blocks produce values that diverge from
|
||||||
segment. Verified across V70 (3 segments with 0 deltas) and SP0/JQ0
|
truth by sample ~512. The block structure inside segment 1 is
|
||||||
(1 segment with +1 delta). Other bytes of the segment header payload
|
identical to segment 0 (alternating ``10 NN`` / ``20 NN`` data +
|
||||||
are partially understood:
|
``00 NN`` RLE), and the per-segment delta budget exactly matches the
|
||||||
|
segment size — V70 segment 1 has 264 nibble-deltas + 244 RLE-zeros =
|
||||||
|
508 = the segment's sample count. Cumulative deltas are correct in
|
||||||
|
aggregate (V70 net-zero ≈ truth net-zero) but the per-sample trajectory
|
||||||
|
is wrong when applied as Tran continuation.
|
||||||
|
|
||||||
| Payload offset | Field | Status |
|
The strongest unverified hypothesis is that **segments rotate
|
||||||
|---|---|---|
|
channels**: segment 0 = Tran, segment 1 = Vert, segment 2 = Long,
|
||||||
| [0:2] | T_delta at first sample of new segment (int16 BE) | ✅ confirmed |
|
segment 3 = Mic, segment 4 = Tran continuation, … This would explain
|
||||||
| [2:4] | unknown (often 0; not a simple V or T delta) | ❓ open |
|
the per-segment delta-budget match while also explaining why segment
|
||||||
| [4:6] | unknown (varies per event; possibly a checksum) | ❓ open |
|
1 isn't Tran continuation. Verification needs the per-channel anchor
|
||||||
| [6:8] | byte length to next segment header − 2 (uint16 BE) | ✅ confirmed |
|
to come from segment-header bytes [4:6] or [14:18], which are still
|
||||||
| [8:12] | monotonic uint32 LE counter | ✅ confirmed |
|
open.
|
||||||
| [12:14] | constant ``02 00`` | ✅ confirmed |
|
|
||||||
| [14:18] | unknown 4-byte field | ❓ open |
|
|
||||||
|
|
||||||
Multi-segment Tran decoding diverges after sample ~512 — the per-segment
|
|
||||||
channel ordering after the header is still unknown.
|
|
||||||
|
|
||||||
##### What's still open
|
##### What's still open
|
||||||
|
|
||||||
|
|||||||
@@ -0,0 +1,172 @@
|
|||||||
|
# Waveform body codec — current working status (2026-05-11)
|
||||||
|
|
||||||
|
This is the **clean working note** for the body-codec reverse-engineering
|
||||||
|
effort. It supersedes scattered claims elsewhere when they conflict.
|
||||||
|
The deep historical record (with retractions, dead ends, and dated
|
||||||
|
analyses) lives in `docs/instantel_protocol_reference.md §7.6.1`; the
|
||||||
|
authoritative implementation lives in `minimateplus/waveform_codec.py`.
|
||||||
|
|
||||||
|
## TL;DR
|
||||||
|
|
||||||
|
The Blastware waveform-file body is a **tagged variable-length block
|
||||||
|
stream**, NOT raw int16 LE samples. Block framing is solved. Tran
|
||||||
|
channel segment-0 decoding is solved (byte-exact vs BW's ASCII export
|
||||||
|
across all 5 high-amplitude fixture events). Multi-segment continuation
|
||||||
|
and the Vert / Long / MicL channel decoders are still open.
|
||||||
|
|
||||||
|
**Production code in `minimateplus/client.py:_decode_a5_waveform` still
|
||||||
|
uses the broken legacy int16 LE decoder.** Sample arrays it writes to
|
||||||
|
the `.h5` sidecars are wrong and must be treated as "unverified" by all
|
||||||
|
downstream consumers. The BW binary write path (`blastware_file.py`)
|
||||||
|
is unaffected — it's pure passthrough and remains byte-perfect.
|
||||||
|
|
||||||
|
## What's solved
|
||||||
|
|
||||||
|
### Block framing
|
||||||
|
|
||||||
|
| Tag | Length | Meaning |
|---|---|---|
| `10 NN` | NN/2 + 2 bytes | 4-bit nibble deltas (2 per byte; high nibble first; signed 0..7 / 8..F = -8..-1) |
| `20 NN` | NN + 2 bytes | int8 signed deltas (1 per byte) |
| `00 NN` | 2 bytes | RLE: append NN copies of current value |
| `30 NN` | NN×2 in data section, NN×4 in trailer | Unknown content. Only in loud-from-start events. |
| `40 02` | 20 bytes (fixed) | Segment header |
|
||||||
|
|
||||||
|
NN is always a multiple of 4.
|
||||||
|
|
||||||
|
Implementation: `walk_body()` in `minimateplus/waveform_codec.py`.
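
As a rough illustration of the `10 NN` nibble rule in the table above, the sketch below unpacks a payload into signed 4-bit deltas (high nibble first; 0..7 stay positive, 8..F map to -8..-1). The name `unpack_nibble_deltas` is hypothetical and is not claimed to match anything in `minimateplus/waveform_codec.py`.

```python
def unpack_nibble_deltas(payload: bytes) -> list[int]:
    """Split each payload byte into two signed 4-bit deltas, high nibble first."""
    deltas = []
    for byte in payload:
        for nibble in (byte >> 4, byte & 0x0F):
            # 0..7 are non-negative; 8..F wrap around to -8..-1.
            deltas.append(nibble - 16 if nibble >= 8 else nibble)
    return deltas
```

A `10 NN` block with an NN/2-byte payload yields NN deltas this way, which is consistent with the "NN/2 + 2 bytes" length in the table.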
|
||||||
|
|
||||||
|
### 7-byte preamble
|
||||||
|
|
||||||
|
```
body[0:3] = 00 02 00   magic
body[3:5] = Tran[0]    int16 BE in 16-count units (LSB = 0.005 in/s)
body[5:7] = Tran[1]    int16 BE in 16-count units
```
|
||||||
|
|
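
The preamble layout above could be read with `struct` roughly as follows. This is a minimal sketch, not the project's parser: `parse_preamble` and `PREAMBLE_MAGIC` are illustrative names, and the returned anchors are left in the raw 16-count units the layout describes.

```python
import struct

PREAMBLE_MAGIC = bytes.fromhex("000200")  # body[0:3], per the layout above


def parse_preamble(body: bytes) -> tuple[int, int]:
    """Return (Tran[0], Tran[1]) from the 7-byte preamble, in raw units."""
    if len(body) < 7 or body[:3] != PREAMBLE_MAGIC:
        raise ValueError("body does not start with the 00 02 00 preamble magic")
    # Two consecutive big-endian signed 16-bit values at body[3:5] and body[5:7].
    tran0, tran1 = struct.unpack(">hh", body[3:7])
    return tran0, tran1
```

Rejecting a bad magic early keeps a mis-aligned body from silently producing plausible-looking anchors.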
||||||
|
### Tran channel, segment 0
|
||||||
|
|
||||||
|
Segment 0 (everything before the first `40 02`) encodes Tran samples
|
||||||
|
only. Starting from preamble anchors Tran[0] and Tran[1], each block
|
||||||
|
contributes to a running cumulative:
|
||||||
|
|
||||||
|
- `10 NN` → append NN nibble-deltas
- `20 NN` → append NN int8-deltas
- `00 NN` → append NN copies of current value (RLE)
- `40 02` → end segment 0
|
||||||
|
|
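
The running-cumulative rule listed above can be sketched as below. This is an illustrative toy, not `decode_tran_initial()`: it assumes the blocks have already been split into hypothetical `(tag, NN, payload)` tuples, that the running value starts at Tran[1], and that deltas share the preamble's units.

```python
import struct


def decode_segment0_sketch(tran0, tran1, blocks):
    """Accumulate Tran samples from (tag, nn, payload) tuples until 40 02."""
    samples = [tran0, tran1]
    current = tran1  # assumption: deltas continue from the second anchor
    for tag, nn, payload in blocks:
        if tag == 0x10:  # NN signed 4-bit deltas, high nibble first
            deltas = []
            for byte in payload[: nn // 2]:
                for nib in (byte >> 4, byte & 0x0F):
                    deltas.append(nib - 16 if nib >= 8 else nib)
        elif tag == 0x20:  # NN signed 8-bit deltas
            deltas = list(struct.unpack(f">{nn}b", payload[:nn]))
        elif tag == 0x00:  # RLE: NN repeats of the current value
            deltas = [0] * nn
        else:  # 40 02 ends segment 0; 30 NN is not decoded here
            break
        for delta in deltas:
            current += delta
            samples.append(current)
    return samples
```

Stopping at any tag other than `10`, `20`, or `00` mirrors the fixture results, where the loud-from-start events stop at the first `30 NN`.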
||||||
|
Verified byte-exact:
|
||||||
|
|
||||||
|
| Event | Description | Segment 0 size | Match |
|---|---|---|---|
| `M529LL1A.SP0` | Loud, 0.25 s pretrig | 510 | 510/510 ✓ |
| `M529LL1A.SV0` | Loud from sample 0 | 58 | 58/58 ✓ (stops at first `30 NN`) |
| `M529LL1A.SS0` | Loud from sample 0 | 42 | 42/42 ✓ (stops at first `30 04`) |
| `M529LL1L.JQ0` | Vert-heavy | 510 | 510/510 ✓ |
| `M529LL1L.V70` | Mic-heavy (140 dB) | 510 | 510/510 ✓ |
|
||||||
|
|
||||||
|
Implementation: `decode_tran_initial()`.
|
||||||
|
|
||||||
|
### Segment header (`40 02`, 20 bytes total)
|
||||||
|
|
||||||
|
| Payload offset | Field | Status |
|---|---|---|
| [0:2] | T_delta at first sample of new segment (int16 BE) | ✅ confirmed |
| [2:4] | Likely T_delta at sample seg_start+1 | 🟡 likely |
| [4:6] | Unknown (possibly checksum) | ❓ open |
| [6:8] | Byte length to next segment header − 2 (uint16 BE) | ✅ confirmed |
| [8:12] | Monotonic uint32 LE counter (starts ~0x47) | ✅ confirmed |
| [12:14] | Constant `02 00` | ✅ confirmed |
| [14:18] | Unknown 4-byte field | ❓ open |
|
||||||
|
|
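
A minimal sketch of unpacking the 18-byte `40 02` payload according to the field table above. The `SegmentHeader` type and `parse_segment_header` name are hypothetical; the open fields are kept as raw bytes, and treating [2:4] as int16 BE follows the "likely" row rather than a confirmed fact.

```python
import struct
from typing import NamedTuple


class SegmentHeader(NamedTuple):
    t_delta: int           # [0:2] int16 BE, confirmed
    t_delta_next: int      # [2:4] int16 BE, only "likely"
    unknown_4_6: bytes     # [4:6] open
    bytes_to_next: int     # [6:8] uint16 BE, byte length to next header minus 2
    counter: int           # [8:12] uint32 LE, monotonic
    unknown_14_18: bytes   # [14:18] open


def parse_segment_header(payload: bytes) -> SegmentHeader:
    """Parse the 18 payload bytes that follow a 40 02 tag."""
    if len(payload) != 18:
        raise ValueError("segment header payload must be 18 bytes")
    if payload[12:14] != b"\x02\x00":
        raise ValueError("expected constant 02 00 at payload[12:14]")
    t_delta, t_delta_next = struct.unpack(">hh", payload[0:4])
    (bytes_to_next,) = struct.unpack(">H", payload[6:8])
    (counter,) = struct.unpack("<I", payload[8:12])
    return SegmentHeader(t_delta, t_delta_next, payload[4:6],
                         bytes_to_next, counter, payload[14:18])
```

Checking that `counter` increases by 1 from one parsed header to the next gives a cheap structural sanity check while walking a body.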
||||||
|
## What's still open
|
||||||
|
|
||||||
|
1. **Multi-segment Tran continuation.** After segment 0, applying
   segment 1's blocks as Tran continuation diverges from truth by
   sample ~512. Block structure is identical to segment 0 and the
   per-segment delta budget matches the segment size, but the
   per-sample trajectory is wrong.

2. **Vert / Long / MicL channel decoders.** No verified decoder for
   any non-Tran channel.

3. **`30 NN` block content.** Only appears in loud-from-start events.
   Probably a channel-switch or alternative-encoding marker for
   high-amplitude regions. Walker steps over it without decoding.
|
||||||
|
|
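
The "delta budget" check in item 1 amounts to summing NN over the sample-producing blocks of a segment. A hedged sketch follows; `segment_delta_budget` is an illustrative name, and the `(tag, NN)` pairs in the usage lines are made-up splits that merely add up to the V70 segment 1 totals quoted in the protocol reference (264 nibble-deltas + 244 RLE zeros).

```python
SAMPLE_TAGS = {0x10, 0x20, 0x00}  # blocks that each contribute NN samples


def segment_delta_budget(blocks):
    """Sum NN over (tag, nn) pairs for blocks that append samples."""
    return sum(nn for tag, nn in blocks if tag in SAMPLE_TAGS)


# Illustrative block split only; the real V70 block sequence is not shown here.
v70_like = [(0x10, 132), (0x00, 244), (0x10, 132)]
print(segment_delta_budget(v70_like))
```

If the budget equals the segment's sample count for one channel, the segment plausibly carries a single channel rather than four interleaved ones.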
||||||
|
## Strongest unverified hypothesis
|
||||||
|
|
||||||
|
Segments rotate channels:
|
||||||
|
|
||||||
|
```
segment 0 → Tran samples 0..509
segment 1 → Vert samples 0..507
segment 2 → Long samples 0..507
segment 3 → Mic  samples 0..507
segment 4 → Tran samples 510..N (continuation)
...
```
|
||||||
|
|
||||||
|
This would explain:
|
||||||
|
- Why segment-0 = Tran works perfectly.
|
||||||
|
- Why segment 1 has the same block structure but applying it as Tran
|
||||||
|
continuation gives wrong values.
|
||||||
|
- Why the per-segment delta budget matches the segment size for a
|
||||||
|
*single* channel (508 deltas per segment, not 4 × 508).
|
||||||
|
|
||||||
|
Not yet verified because the per-channel anchor at segment-start isn't
|
||||||
|
identified in the segment header. Bytes [4:6] and [14:18] of the
|
||||||
|
header are the prime candidates.
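
Stated as code, the unverified rotation hypothesis is just a modulo-4 mapping from segment index to channel. The helper below is purely illustrative (the name and channel labels are assumptions) and encodes the hypothesis, not a confirmed property of the format.

```python
ROTATION = ("Tran", "Vert", "Long", "Mic")  # hypothesised order, unverified


def hypothesised_channel(segment_index: int) -> str:
    """Channel a segment would carry if segments rotate through 4 channels."""
    return ROTATION[segment_index % len(ROTATION)]
```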
|
||||||
|
|
||||||
|
## Next experiment — segment-channel scoring analyzer
|
||||||
|
|
||||||
|
Don't try to hero-code the full decoder. Instead, build a small
|
||||||
|
analysis tool that:
|
||||||
|
|
||||||
|
1. For each segment in every fixture event, runs the segment-0 Tran
   decoder (block-walk + RLE) and produces a cumulative trajectory
   of 508 deltas.
2. Scores that trajectory against the BW ASCII truth for *each* of
   {Tran, Vert, Long, MicL} over the segment's sample range, starting
   from different anchor-byte candidates from the segment header.
3. Reports which (channel, anchor-bytes-location) combination produces
   the lowest error for each segment.
|
||||||
|
|
||||||
|
If the rotation hypothesis is right, segment 0 should clearly score
|
||||||
|
best against Tran, segment 1 against Vert, etc. The winning
|
||||||
|
anchor-bytes-location will reveal which segment-header bytes encode
|
||||||
|
the per-segment channel anchors.
|
||||||
|
|
||||||
|
If the rotation hypothesis is *not* right, the scorer will at least
|
||||||
|
narrow down what segment 1 actually carries.
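
One possible shape for the scoring step, as a sketch under stated assumptions: deltas for a segment are already decoded, ground truth is a dict of per-channel sample lists, each anchor candidate is an integer starting value taken from some header location, and error is a plain sum of absolute differences. `score_segment` and the anchor labels are hypothetical names.

```python
def score_segment(deltas, truth, start, anchors):
    """Rank (error, channel, anchor_name) for one segment, best first.

    deltas  : decoded per-sample deltas for the segment
    truth   : {channel: [samples...]} from the BW ASCII export
    start   : index of the segment's first sample in the truth series
    anchors : {label: starting value} candidates from the segment header
    """
    results = []
    for channel, series in truth.items():
        window = series[start : start + len(deltas)]
        if len(window) < len(deltas):
            continue  # truth too short to cover this segment
        for anchor_name, anchor in anchors.items():
            value, error = anchor, 0
            for delta, expected in zip(deltas, window):
                value += delta
                error += abs(value - expected)
            results.append((error, channel, anchor_name))
    return sorted(results)
```

Running this per segment and tabulating the top-ranked channel would show at a glance whether the winners follow the Tran, Vert, Long, Mic order or some other pattern.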
|
||||||
|
|
||||||
|
## Test fixtures
|
||||||
|
|
||||||
|
Committed under `tests/fixtures/`:
|
||||||
|
|
||||||
|
- `decode-re-5-8-26/event-a..event-d/`: original quiet bundle (4 events,
|
||||||
|
PPV < 1 in/s). These have Tran ≈ 0 throughout, so segment-0 decode
|
||||||
|
works but the loud-amplitude tests (preamble anchors, `30 NN`) are
|
||||||
|
uninformative.
|
||||||
|
- `5-11-26/M529LL1A.{SP0,SS0,SV0}`: loud bundle (PPV 6-7 in/s on all
|
||||||
|
channels). These cracked the Tran codec.
|
||||||
|
- `5-11-26/M529LL1L.{JQ0,V70}`: targeted captures. JQ0 is Vert-heavy,
|
||||||
|
V70 is Mic-heavy (140 dB). These cracked the `00 NN` RLE rule.
|
||||||
|
|
||||||
|
Each fixture has a `.TXT` Blastware ASCII export as ground truth.
|
||||||
|
|
||||||
|
## Tests
|
||||||
|
|
||||||
|
`tests/test_waveform_codec.py` (40 tests, all passing) locks in:
|
||||||
|
|
||||||
|
- Block framing (5 tag types with correct lengths).
|
||||||
|
- Walker contiguity (no gaps or overlaps).
|
||||||
|
- Segment header parsing (counter monotonicity, fixed-pattern check).
|
||||||
|
- `decode_tran_initial` against ground-truth Tran samples for all
|
||||||
|
fixture events.
|
||||||
|
|
||||||
|
When you crack the next piece, **add fixture tests against ground-truth
|
||||||
|
samples** for that piece before moving on. Don't let unverified code
|
||||||
|
ship without a regression lock-in.
|
||||||
+103
-89
@@ -1,119 +1,133 @@
|
|||||||
"""
|
"""
|
||||||
waveform_codec.py — block-walker for the MiniMate Plus waveform body codec.
|
waveform_codec.py — block-walker and partial decoder for the MiniMate Plus
|
||||||
|
waveform-file body.
|
||||||
|
|
||||||
PARTIAL REVERSE-ENGINEERING — 2026-05-08.
|
PARTIAL REVERSE-ENGINEERING — last updated 2026-05-11.
|
||||||
|
|
||||||
Status: STRUCTURAL FRAMING confirmed; per-block sample interpretation OPEN.
|
The Blastware waveform-file body — the bytes between the 21-byte STRT
|
||||||
|
record and the 26-byte file footer — is NOT raw int16 LE samples (the
|
||||||
|
historical assumption that produced full-scale ±32K noise on every
|
||||||
|
event). It is a tagged variable-length block stream with a custom
|
||||||
|
delta + RLE codec.
|
||||||
|
|
||||||
This module replaces the int16-LE assumption that produced full-scale ±32K
|
Current status:
|
||||||
noise on every event. The body is NOT raw int16 LE: it is a sequence of
|
|
||||||
tagged variable-length blocks. The block framing is solved here. The
|
|
||||||
mapping from block bytes to ADC samples is **NOT yet pinned down** — the
|
|
||||||
work-in-progress decoder ``decode_waveform_v2`` returns ``None`` until
|
|
||||||
a verified algorithm is wired in.
|
|
||||||
|
|
||||||
Until ``decode_waveform_v2`` returns a verified result, callers that need
|
- Block framing: ✅ solved (block types and lengths all confirmed)
|
||||||
sample data should keep relying on the legacy decoder in ``client.py``
|
- Tran channel, segment 0: ✅ solved (decode_tran_initial returns
|
||||||
(known-broken, but at least stable in shape) and not consume this
|
byte-exact values vs BW's ASCII export, across 5 of 5 loud-bundle
|
||||||
module's sample output.
|
events; first ~510 samples per event)
|
||||||
|
- Multi-segment Tran continuation: ❌ open (every hypothesis breaks
|
||||||
|
at the segment-1 boundary around sample 512)
|
||||||
|
- Vert / Long / Mic channel decoders: ❌ open
|
||||||
|
- 30 NN block content: ❌ open (only appears in loud-from-start events)
|
||||||
|
|
||||||
|
Production code in client.py still uses the broken int16 LE decoder.
|
||||||
|
``decode_waveform_v2`` here returns ``None`` as a placeholder. Callers
|
||||||
|
that need sample arrays should treat the legacy decoder's output as
|
||||||
|
"unverified" — the BW binary write path is the only sample-bearing
|
||||||
|
output that is currently trustworthy.
|
||||||
|
|
||||||
────────────────────────────────────────────────────────────────────────────
|
────────────────────────────────────────────────────────────────────────────
|
||||||
Body structure (CONFIRMED 2026-05-08 against decode-re/5-8-26 4-event bundle)
|
Body layout (CONFIRMED 2026-05-11 against 8 fixture events)
|
||||||
────────────────────────────────────────────────────────────────────────────
|
────────────────────────────────────────────────────────────────────────────
|
||||||
|
|
||||||
The Blastware waveform-file body lives between bytes [22+21=43] and the
|
[7-byte preamble] [stream of tagged blocks] [trailer]
|
||||||
26-byte file footer (``[: -26]``). Layout:
|
|
||||||
|
|
||||||
[preamble: 7 or 9 bytes]
|
The preamble is always exactly 7 bytes:
|
||||||
[data section: a stream of tagged blocks]
|
|
||||||
[trailer: per-channel summary blocks]
|
|
||||||
|
|
||||||
The preamble starts with the magic ``00 02 00 00``. After that there is
|
body[0:3] = 00 02 00 magic
|
||||||
either 3 or 5 bytes of header before the first ``10 NN`` block tag — in
|
body[3:5] = Tran[0] int16 BE in 16-count units (LSB = 0.005 in/s)
|
||||||
the 4-event bundle, single-shot events have a 7-byte preamble and
|
body[5:7] = Tran[1] int16 BE in 16-count units
|
||||||
continuous events have 9. The exact meaning of bytes [4:9] is open
|
|
||||||
(empirically: byte [4] for event-a == truth Tran[0]; byte [4] for
|
|
||||||
event-b == truth Tran[0]; events c/d = 0; treating it as a per-channel
|
|
||||||
"initial value" partially matches but is inconsistent across events).
|
|
||||||
|
|
||||||
Blocks have 2-byte tags and these confirmed lengths:
|
(Earlier drafts of this module described a "7-or-9-byte preamble";
|
||||||
|
that was wrong — single-shot and continuous events both use 7 bytes.
|
||||||
|
The "extra 2 bytes" on continuous events were the first ``00 NN`` RLE
|
||||||
|
marker, not part of the preamble.)
|
||||||
|
|
||||||
| Tag (hex) | Block type | Total length |
|
Block types and lengths (all confirmed):
|
||||||
|-----------|--------------------------------------|-----------------|
|
|
||||||
| ``10 NN`` | Small-delta data block | NN/2 + 2 bytes |
|
|
||||||
| ``20 NN`` | Literal data block (looks int8-ish) | NN + 2 bytes |
|
|
||||||
| ``00 NN`` | 2-byte marker between data blocks | 2 bytes |
|
|
||||||
| ``30 NN`` | Trailer summary block | NN × 4 bytes |
|
|
||||||
| ``40 02`` | Segment header | 20 bytes |
|
|
||||||
|
|
||||||
In the 4-event bundle, every event's body parses as a clean sequence of
|
| Tag | Length | Meaning |
|
||||||
these blocks all the way through the trailer (when the walker is given
|
|----------|-----------------------|----------------------------------------|
|
||||||
the right preamble length). No "??" stops occur once the start offset
|
| ``10 NN``| NN/2 + 2 bytes | 4-bit nibble deltas (2 per byte; high |
|
||||||
is correct.
|
| | | nibble first; signed 0..7 / 8..F = -8..-1)|
|
||||||
|
| ``20 NN``| NN + 2 bytes | int8 signed deltas (1 per byte) |
|
||||||
|
| ``00 NN``| 2 bytes | RLE: append NN copies of current value |
|
||||||
|
| ``30 NN``| NN*2 in data, NN*4 | Unknown content. Only in loud events. |
|
||||||
|
| | in trailer | |
|
||||||
|
| ``40 02``| 20 bytes (fixed) | Segment header |
|
||||||
|
|
||||||
Segments and the ``40 02`` header
|
NN is always a multiple of 4.
|
||||||
────────────────────────────────────
|
|
||||||
|
|
||||||
The body is divided into ~16 SEGMENTS, each separated by a ``40 02``
|
────────────────────────────────────────────────────────────────────────────
|
||||||
header. Each segment carries ~80 sample-sets (1280-sample event = 16
|
Tran channel, segment 0 (CONFIRMED 2026-05-11)
|
||||||
segments × 80 sample-sets, 3328-sample event = ~42 segments). The 18-byte
|
────────────────────────────────────────────────────────────────────────────
|
||||||
``40 02`` payload contains:
|
|
||||||
|
|
||||||
bytes 0..3 4-byte channel anchor / state (varies per segment)
|
Segment 0 — everything before the first ``40 02`` segment header — encodes
|
||||||
bytes 4..7 4-byte field, varies (RMS/peak per channel?)
|
Tran samples only. Starting from preamble anchors Tran[0] and Tran[1],
|
||||||
bytes 8..11 4-byte uint32 LE counter (increments by 1 per segment;
|
each subsequent block contributes to the running Tran value:
|
||||||
starts at e.g. 0x47 for the first in-data segment)
|
|
||||||
bytes 12..15 4-byte fixed pattern: 02 00 00 01
|
|
||||||
bytes 16..17 2-byte segment-relative payload counter
|
|
||||||
|
|
||||||
The counter at bytes [8..11] increments cleanly across segments — useful
|
10 NN → append NN deltas (4-bit signed nibbles)
|
||||||
as a sanity check. The role of bytes [0..3] (anchor candidates) and
|
20 NN → append NN deltas (int8 signed bytes)
|
||||||
[4..7] is not pinned down: simple "channel state at segment boundary"
|
00 NN → append NN copies of the current value (RLE zeros)
|
||||||
hypotheses do NOT match truth across all four sample bundles tested.
|
40 02 → segment 0 ends; multi-segment continuation is open
|
||||||
|
|
||||||
What's open
|
This decodes the first 482–510 samples of Tran for each event with zero
|
||||||
────────────
|
errors against BW's ASCII export. The exact segment-0 sample count
|
||||||
|
varies per event (it's bounded by a fixed device-flash byte budget, not
|
||||||
|
a fixed sample count — quiet events fit more samples because zero
|
||||||
|
deltas pack into ``00 NN`` markers compactly).
|
||||||
|
|
||||||
The mapping ``block bytes → ADC samples`` is the open question. Tested
|
Implementation: :func:`decode_tran_initial`.
|
||||||
hypotheses that did **not** match BW's ASCII export to within the
|
|
||||||
required ±1 ADC count:
|
|
||||||
|
|
||||||
1. ``10 NN`` data = 4-bit signed nibble deltas, channel-interleaved
|
────────────────────────────────────────────────────────────────────────────
|
||||||
(TVLM/VTLM/LMTV/all 24 permutations × 2 nibble orders × 2 sign
|
Segment header (40 02, 20 bytes total)
|
||||||
conventions = 96 combinations tested). All produce values that
|
────────────────────────────────────────────────────────────────────────────
|
||||||
diverge from truth after the first ~7 sample-sets.
|
|
||||||
|
|
||||||
2. ``20 NN`` data = int8 absolute samples for one channel. Magnitudes
|
The 18-byte payload of the ``40 02`` block:
|
||||||
in observed blocks (peak ~±34 in the smoothest event-c block at
|
|
||||||
offset 351) do not match any channel's PPV at any plausible
|
|
||||||
ADC-count quantization (1-count, 4-count, 8-count, 16-count).
|
|
||||||
|
|
||||||
3. ``00 NN`` marker = "skip N sample-sets". Sums of NN/4 across markers
|
| Offset | Field | Status |
|
||||||
do not match 80 sample-sets per segment.
|
|-----------|---------------------------------------------|-------------|
|
||||||
|
| [0:2] | T_delta at first sample of new segment | ✅ confirmed|
|
||||||
|
| | (int16 BE, in 16-count units) | |
|
||||||
|
| [2:4] | Likely T_delta at sample seg_start+1 | 🟡 likely |
|
||||||
|
| [4:6] | Unknown (varies; possibly checksum) | ❓ open |
|
||||||
|
| [6:8] | Byte length to next segment header − 2 | ✅ confirmed|
|
||||||
|
| | (uint16 BE; useful for walker pre-scan) | |
|
||||||
|
| [8:12] | Monotonic uint32 LE counter | ✅ confirmed|
|
||||||
|
| | (starts ~0x47, increments by 1 per segment) | |
|
||||||
|
| [12:14] | Constant ``02 00`` | ✅ confirmed|
|
||||||
|
| [14:18] | Unknown 4-byte field | ❓ open |
|
||||||
|
|
||||||
4. Concatenating ALL ``10 NN`` payload bytes and reading as a continuous
|
────────────────────────────────────────────────────────────────────────────
|
||||||
nibble stream (TVLM round-robin) produces the same 96-combination
|
What breaks the multi-segment decoder (the main open question)
|
||||||
problem as (1).
|
────────────────────────────────────────────────────────────────────────────
|
||||||
|
|
||||||
The most promising lead — that ``20 NN`` blocks carry literal int8
|
After segment 0 ends and the segment header T_delta is consumed,
|
||||||
sample-sequences for the largest-amplitude channel within a segment —
|
applying segment 1's blocks as Tran continuation produces values that
|
||||||
is consistent with the smooth waveform shape of those payloads, but
|
diverge from truth by sample ~512. The block structure inside segment
|
||||||
the magnitude scaling has not been pinned down. It's possible that
|
1 is IDENTICAL to segment 0 (same alternating 10 NN / 00 NN pattern),
|
||||||
``10 NN`` and ``20 NN`` blocks carry different bit-widths of the same
|
and the delta budget matches the segment size exactly (V70 segment 1
|
||||||
channel-interleaved delta stream (variable-width like Rice coding)
|
has 264 nibble-deltas + 244 RLE zeros = 508 = the segment's sample
|
||||||
with 4-bit deltas as default and 8-bit deltas as escape.
|
count). But the cumulative is wrong.
|
||||||
|
|
||||||
Potential next steps for whoever picks this up:
|
The strongest unverified hypothesis is that segments rotate channels:
|
||||||
|
|
||||||
- Capture an event with a KNOWN external waveform (e.g. a calibration
|
segment 0 → Tran samples 0..509
|
||||||
signal of known frequency/amplitude) so the truth is unambiguous and
|
segment 1 → Vert samples 0..507
|
||||||
the magnitude scaling is unambiguous.
|
segment 2 → Long samples 0..507
|
||||||
- Capture multiple events with the SAME signal but DIFFERENT geo_range
|
segment 3 → Mic samples 0..507
|
||||||
(Normal 10 in/s vs Sensitive 1.25 in/s) to disambiguate scaling.
|
segment 4 → Tran samples 510..N (continuation)
|
||||||
- Examine sequential 0x10 segment headers for a single event — the
|
...
|
||||||
4-byte "anchor" should reflect cumulative sample state at the
|
|
||||||
boundary; matching it to truth at that sample index would unlock
|
This is consistent with the segment-1 block sums net-to-near-zero in
|
||||||
the per-segment delta decode.
|
V70 (where all 4 channels are near zero) and with the per-segment delta
|
||||||
|
budget matching the segment size for a single channel. It is NOT yet
|
||||||
|
verified because the per-segment channel anchor isn't pinned down in
|
||||||
|
the segment header — bytes [4:6] and [14:18] of the header are still
|
||||||
|
open and probably encode V/L/M anchors.
|
||||||
|
|
||||||
|
See ``docs/waveform_codec_re_status.md`` for the current working notes
|
||||||
|
and the suggested next experiment ("segment-channel scoring analyzer").
|
||||||
"""
|
"""
|
||||||
|
|
||||||
from __future__ import annotations
|
from __future__ import annotations
|
||||||
|
|||||||