docs: clean up waveform-codec doc layers per review
Three "truth layers" had drifted apart between commits. Fixed:

1. waveform_codec.py docstring rewritten from the 2026-05-08 "structural
   framing only" state to the 2026-05-11 "Tran segment 0 solved +
   segment-header partially decoded" state. Killed stale "~80 sample-sets
   per segment" language (real segments are flash-page-byte-sized, not
   sample-count-sized; observed first-segment sizes are 42-510 samples
   depending on signal). Killed stale "preamble is 7 or 9 bytes" language
   (always 7).

2. docs/instantel_protocol_reference.md §7.6.1: added a clear "CURRENT
   STATUS" box at the top with a status table. Replaced the stale "~80
   sample-sets" line with the verified per-event segment sizes. Merged two
   redundant segment-header field-table sections.

3. docs/waveform_codec_re_status.md (NEW): clean working-status doc.
   Solved / not solved / hypothesis / next experiment / fixtures / tests.
   The protocol reference remains the historical Rosetta Stone; this new
   file is the current-truth working note that shouldn't accumulate fossil
   layers.

4. CLAUDE.md §"Waveform body codec": prominent warning box at top: "DO NOT
   TRUST decoded sample arrays yet." BW binary passthrough is the only
   sample-bearing output to trust until the decoder lands. Added a "Next
   experiment" subsection pointing the next pass at the segment-channel
   scoring analyzer.

40 tests still pass.
+103
-89
@@ -1,119 +1,133 @@
 """
-waveform_codec.py — block-walker for the MiniMate Plus waveform body codec.
+waveform_codec.py — block-walker and partial decoder for the MiniMate Plus
+waveform-file body.
 
-PARTIAL REVERSE-ENGINEERING — 2026-05-08.
+PARTIAL REVERSE-ENGINEERING — last updated 2026-05-11.
 
-Status: STRUCTURAL FRAMING confirmed; per-block sample interpretation OPEN.
+The Blastware waveform-file body — the bytes between the 21-byte STRT
+record and the 26-byte file footer — is NOT raw int16 LE samples (the
+historical assumption that produced full-scale ±32K noise on every
+event). It is a tagged variable-length block stream with a custom
+delta + RLE codec.
 
-This module replaces the int16-LE assumption that produced full-scale ±32K
-noise on every event. The body is NOT raw int16 LE: it is a sequence of
-tagged variable-length blocks. The block framing is solved here. The
-mapping from block bytes to ADC samples is **NOT yet pinned down** — the
-work-in-progress decoder ``decode_waveform_v2`` returns ``None`` until
-a verified algorithm is wired in.
+Current status:
 
-Until ``decode_waveform_v2`` returns a verified result, callers that need
-sample data should keep relying on the legacy decoder in ``client.py``
-(known-broken, but at least stable in shape) and not consume this
-module's sample output.
+- Block framing: ✅ solved (block types and lengths all confirmed)
+- Tran channel, segment 0: ✅ solved (decode_tran_initial returns
+byte-exact values vs BW's ASCII export, across 5 of 5 loud-bundle
+events; first ~510 samples per event)
+- Multi-segment Tran continuation: ❌ open (every hypothesis breaks
+at the segment-1 boundary around sample 512)
+- Vert / Long / Mic channel decoders: ❌ open
+- 30 NN block content: ❌ open (only appears in loud-from-start events)
 
+Production code in client.py still uses the broken int16 LE decoder.
+``decode_waveform_v2`` here returns ``None`` as a placeholder. Callers
+that need sample arrays should treat the legacy decoder's output as
+"unverified" — the BW binary write path is the only sample-bearing
+output that is currently trustworthy.
+
 ────────────────────────────────────────────────────────────────────────────
-Body structure (CONFIRMED 2026-05-08 against decode-re/5-8-26 4-event bundle)
+Body layout (CONFIRMED 2026-05-11 against 8 fixture events)
 ────────────────────────────────────────────────────────────────────────────
 
-The Blastware waveform-file body lives between bytes [22+21=43] and the
-26-byte file footer (``[: -26]``). Layout:
+[7-byte preamble] [stream of tagged blocks] [trailer]
 
-[preamble: 7 or 9 bytes]
-[data section: a stream of tagged blocks]
-[trailer: per-channel summary blocks]
+The preamble is always exactly 7 bytes:
 
-The preamble starts with the magic ``00 02 00 00``. After that there is
-either 3 or 5 bytes of header before the first ``10 NN`` block tag — in
-the 4-event bundle, single-shot events have a 7-byte preamble and
-continuous events have 9. The exact meaning of bytes [4:9] is open
-(empirically: byte [4] for event-a == truth Tran[0]; byte [4] for
-event-b == truth Tran[0]; events c/d = 0; treating it as a per-channel
-"initial value" partially matches but is inconsistent across events).
+body[0:3] = 00 02 00 magic
+body[3:5] = Tran[0] int16 BE in 16-count units (LSB = 0.005 in/s)
+body[5:7] = Tran[1] int16 BE in 16-count units
 
-Blocks have 2-byte tags and these confirmed lengths:
+(Earlier drafts of this module described a "7-or-9-byte preamble";
+that was wrong — single-shot and continuous events both use 7 bytes.
+The "extra 2 bytes" on continuous events were the first ``00 NN`` RLE
+marker, not part of the preamble.)
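The 7-byte preamble layout in the added lines above maps directly onto two big-endian int16 reads. A minimal sketch, assuming only that layout; `parse_preamble` is a hypothetical helper name, not a function from waveform_codec.py, and the values are left in the raw 16-count units rather than converted:

```python
import struct
from typing import Tuple

PREAMBLE_LEN = 7                  # per the docstring: always exactly 7 bytes
MAGIC = bytes.fromhex("000200")   # body[0:3]


def parse_preamble(body: bytes) -> Tuple[int, int]:
    """Return (Tran[0], Tran[1]) anchors in raw 16-count units.

    Hypothetical helper: follows the body[0:3] / body[3:5] / body[5:7]
    layout from the docstring and nothing else.
    """
    if len(body) < PREAMBLE_LEN:
        raise ValueError("body is shorter than the 7-byte preamble")
    if body[:3] != MAGIC:
        raise ValueError("preamble magic 00 02 00 not found")
    # Two signed 16-bit big-endian integers at body[3:5] and body[5:7].
    tran0, tran1 = struct.unpack(">hh", body[3:PREAMBLE_LEN])
    return tran0, tran1
```

The block stream then starts at offset 7, which is where a walker would pick up.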
 
-| Tag (hex) | Block type                           | Total length    |
-|-----------|--------------------------------------|-----------------|
-| ``10 NN`` | Small-delta data block               | NN/2 + 2 bytes  |
-| ``20 NN`` | Literal data block (looks int8-ish)  | NN + 2 bytes    |
-| ``00 NN`` | 2-byte marker between data blocks    | 2 bytes         |
-| ``30 NN`` | Trailer summary block                | NN × 4 bytes    |
-| ``40 02`` | Segment header                       | 20 bytes        |
+Block types and lengths (all confirmed):
 
-In the 4-event bundle, every event's body parses as a clean sequence of
-these blocks all the way through the trailer (when the walker is given
-the right preamble length). No "??" stops occur once the start offset
-is correct.
+| Tag      | Length                | Meaning                                |
+|----------|-----------------------|----------------------------------------|
+| ``10 NN``| NN/2 + 2 bytes        | 4-bit nibble deltas (2 per byte; high  |
+|          |                       | nibble first; signed 0..7 / 8..F = -8..-1)|
+| ``20 NN``| NN + 2 bytes          | int8 signed deltas (1 per byte)        |
+| ``00 NN``| 2 bytes               | RLE: append NN copies of current value |
+| ``30 NN``| NN*2 in data, NN*4    | Unknown content. Only in loud events.  |
+|          | in trailer            |                                        |
+| ``40 02``| 20 bytes (fixed)      | Segment header                         |
 
-Segments and the ``40 02`` header
-────────────────────────────────────
+NN is always a multiple of 4.
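The length rules in the new table are enough to step through a body one block at a time. A rough sketch under those rules only; `walk_blocks` is an illustrative name rather than the module's real walker, and because the table lists two different sizes for ``30 NN`` the sketch stops there instead of guessing:

```python
from typing import Iterator, Tuple

PREAMBLE_LEN = 7  # per the docstring: the preamble is always 7 bytes


def walk_blocks(body: bytes, start: int = PREAMBLE_LEN) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (offset, tag, nn, total_length) for each recognised block.

    Illustrative only. Stops at the first ``30 NN`` or unknown tag, and at
    any block that would run past the end of the buffer.
    """
    pos = start
    while pos + 2 <= len(body):
        tag, nn = body[pos], body[pos + 1]
        if tag == 0x10:                    # nibble deltas: NN/2 payload bytes
            length = nn // 2 + 2
        elif tag == 0x20:                  # int8 deltas: NN payload bytes
            length = nn + 2
        elif tag == 0x00:                  # RLE marker: tag bytes only
            length = 2
        elif tag == 0x40 and nn == 0x02:   # segment header: fixed 20 bytes
            length = 20
        else:
            return                         # 30 NN or unrecognised: stop here
        if pos + length > len(body):
            return                         # truncated block
        yield pos, tag, nn, length
        pos += length
```

Returning offsets rather than payload copies keeps the walker cheap and lets a caller slice `body[offset + 2 : offset + total_length]` only for the blocks it cares about.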
 
-The body is divided into ~16 SEGMENTS, each separated by a ``40 02``
-header. Each segment carries ~80 sample-sets (1280-sample event = 16
-segments × 80 sample-sets, 3328-sample event = ~42 segments). The 18-byte
-``40 02`` payload contains:
+────────────────────────────────────────────────────────────────────────────
+Tran channel, segment 0 (CONFIRMED 2026-05-11)
+────────────────────────────────────────────────────────────────────────────
 
-bytes 0..3 4-byte channel anchor / state (varies per segment)
-bytes 4..7 4-byte field, varies (RMS/peak per channel?)
-bytes 8..11 4-byte uint32 LE counter (increments by 1 per segment;
-starts at e.g. 0x47 for the first in-data segment)
-bytes 12..15 4-byte fixed pattern: 02 00 00 01
-bytes 16..17 2-byte segment-relative payload counter
+Segment 0 — everything before the first ``40 02`` segment header — encodes
+Tran samples only. Starting from preamble anchors Tran[0] and Tran[1],
+each subsequent block contributes to the running Tran value:
 
-The counter at bytes [8..11] increments cleanly across segments — useful
-as a sanity check. The role of bytes [0..3] (anchor candidates) and
-[4..7] is not pinned down: simple "channel state at segment boundary"
-hypotheses do NOT match truth across all four sample bundles tested.
+10 NN → append NN deltas (4-bit signed nibbles)
+20 NN → append NN deltas (int8 signed bytes)
+00 NN → append NN copies of the current value (RLE zeros)
+40 02 → segment 0 ends; multi-segment continuation is open
 
-What's open
-────────────
+This decodes the first 482–510 samples of Tran for each event with zero
+errors against BW's ASCII export. The exact segment-0 sample count
+varies per event (it's bounded by a fixed device-flash byte budget, not
+a fixed sample count — quiet events fit more samples because zero
+deltas pack into ``00 NN`` markers compactly).
 
-The mapping ``block bytes → ADC samples`` is the open question. Tested
-hypotheses that did **not** match BW's ASCII export to within the
-required ±1 ADC count:
+Implementation: :func:`decode_tran_initial`.
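The segment-0 rules above can be sketched as a running-sum loop. This is not the body of `decode_tran_initial`, which the diff does not show; `nibble_deltas` and `sketch_decode_tran_segment0` are hypothetical names, and the sketch assumes that each delta is simply added to the previous value in the same units as the preamble anchors:

```python
import struct
from typing import List


def nibble_deltas(payload: bytes) -> List[int]:
    """Split bytes into signed 4-bit deltas, high nibble first (8..F = -8..-1)."""
    out = []
    for b in payload:
        for nib in (b >> 4, b & 0x0F):
            out.append(nib - 16 if nib >= 8 else nib)
    return out


def sketch_decode_tran_segment0(body: bytes) -> List[int]:
    """Accumulate Tran values until the first 40 02 header (or an unknown tag).

    ASSUMPTION: value[n] = value[n-1] + delta, in the anchors' own units.
    """
    tran0, tran1 = struct.unpack(">hh", body[3:7])   # preamble anchors
    samples = [tran0, tran1]
    pos = 7
    while pos + 2 <= len(body):
        tag, nn = body[pos], body[pos + 1]
        if tag == 0x10:
            payload = body[pos + 2 : pos + 2 + nn // 2]
            if len(payload) < nn // 2:
                break                                # truncated block
            deltas = nibble_deltas(payload)
        elif tag == 0x20:
            payload = body[pos + 2 : pos + 2 + nn]
            if len(payload) < nn:
                break                                # truncated block
            deltas = [b - 256 if b >= 128 else b for b in payload]
        elif tag == 0x00:
            payload = b""
            deltas = [0] * nn                        # RLE: repeat current value
        else:
            break                                    # 40 02 ends segment 0
        for d in deltas:
            samples.append(samples[-1] + d)
        pos += 2 + len(payload)
    return samples
```

Stopping at the first unrecognised tag, instead of raising, mirrors the docstring's statement that anything past the first ``40 02`` header is still open.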
 
-1. ``10 NN`` data = 4-bit signed nibble deltas, channel-interleaved
-(TVLM/VTLM/LMTV/all 24 permutations × 2 nibble orders × 2 sign
-conventions = 96 combinations tested). All produce values that
-diverge from truth after the first ~7 sample-sets.
+────────────────────────────────────────────────────────────────────────────
+Segment header (40 02, 20 bytes total)
+────────────────────────────────────────────────────────────────────────────
 
-2. ``20 NN`` data = int8 absolute samples for one channel. Magnitudes
-in observed blocks (peak ~±34 in the smoothest event-c block at
-offset 351) do not match any channel's PPV at any plausible
-ADC-count quantization (1-count, 4-count, 8-count, 16-count).
+The 18-byte payload of the ``40 02`` block:
 
-3. ``00 NN`` marker = "skip N sample-sets". Sums of NN/4 across markers
-do not match 80 sample-sets per segment.
+| Offset    | Field                                       | Status      |
+|-----------|---------------------------------------------|-------------|
+| [0:2]     | T_delta at first sample of new segment      | ✅ confirmed|
+|           | (int16 BE, in 16-count units)               |             |
+| [2:4]     | Likely T_delta at sample seg_start+1        | 🟡 likely   |
+| [4:6]     | Unknown (varies; possibly checksum)         | ❓ open     |
+| [6:8]     | Byte length to next segment header − 2      | ✅ confirmed|
+|           | (uint16 BE; useful for walker pre-scan)     |             |
+| [8:12]    | Monotonic uint32 LE counter                 | ✅ confirmed|
+|           | (starts ~0x47, increments by 1 per segment) |             |
+| [12:14]   | Constant ``02 00``                          | ✅ confirmed|
+| [14:18]   | Unknown 4-byte field                        | ❓ open     |
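The header table above translates into a handful of fixed-offset reads, with the mixed byte order (big-endian int16/uint16, little-endian uint32) being the easy thing to get wrong. A sketch assuming the offsets are relative to the 18-byte payload, as the table heading says; `SegmentHeader` and `parse_segment_header` are illustrative names, and the open fields are kept as raw bytes rather than interpreted:

```python
import struct
from dataclasses import dataclass


@dataclass
class SegmentHeader:
    t_delta_first: int          # [0:2]   int16 BE, confirmed
    t_delta_second_likely: int  # [2:4]   int16 BE, only "likely"
    unknown_4_6: bytes          # [4:6]   open
    next_header_len_field: int  # [6:8]   uint16 BE: "byte length to next
                                #         segment header - 2" per the table
    counter: int                # [8:12]  uint32 LE, monotonic
    unknown_14_18: bytes        # [14:18] open


def parse_segment_header(block: bytes) -> SegmentHeader:
    """Parse one 20-byte ``40 02`` block (2-byte tag + 18-byte payload)."""
    if len(block) != 20 or block[:2] != b"\x40\x02":
        raise ValueError("expected a 20-byte block starting with 40 02")
    p = block[2:]
    if p[12:14] != b"\x02\x00":
        raise ValueError("constant 02 00 missing at payload[12:14]")
    t0, t1 = struct.unpack(">hh", p[0:4])        # big-endian signed
    (length_field,) = struct.unpack(">H", p[6:8])  # big-endian unsigned
    (counter,) = struct.unpack("<I", p[8:12])    # little-endian unsigned
    return SegmentHeader(t0, t1, bytes(p[4:6]), length_field, counter, bytes(p[14:18]))
```

Checking the constant at payload[12:14] and the +1 step of `counter` between consecutive headers gives a cheap sanity check that a walker has not lost block alignment.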
 
-4. Concatenating ALL ``10 NN`` payload bytes and reading as a continuous
-nibble stream (TVLM round-robin) produces the same 96-combination
-problem as (1).
+────────────────────────────────────────────────────────────────────────────
+What breaks the multi-segment decoder (the main open question)
+────────────────────────────────────────────────────────────────────────────
 
-The most promising lead — that ``20 NN`` blocks carry literal int8
-sample-sequences for the largest-amplitude channel within a segment —
-is consistent with the smooth waveform shape of those payloads, but
-the magnitude scaling has not been pinned down. It's possible that
-``10 NN`` and ``20 NN`` blocks carry different bit-widths of the same
-channel-interleaved delta stream (variable-width like Rice coding)
-with 4-bit deltas as default and 8-bit deltas as escape.
+After segment 0 ends and the segment header T_delta is consumed,
+applying segment 1's blocks as Tran continuation produces values that
+diverge from truth by sample ~512. The block structure inside segment
+1 is IDENTICAL to segment 0 (same alternating 10 NN / 00 NN pattern),
+and the delta budget matches the segment size exactly (V70 segment 1
+has 264 nibble-deltas + 244 RLE zeros = 508 = the segment's sample
+count). But the cumulative is wrong.
 
-Potential next steps for whoever picks this up:
+The strongest unverified hypothesis is that segments rotate channels:
 
-- Capture an event with a KNOWN external waveform (e.g. a calibration
-signal of known frequency/amplitude) so the truth is unambiguous and
-the magnitude scaling is unambiguous.
-- Capture multiple events with the SAME signal but DIFFERENT geo_range
-(Normal 10 in/s vs Sensitive 1.25 in/s) to disambiguate scaling.
-- Examine sequential 0x10 segment headers for a single event — the
-4-byte "anchor" should reflect cumulative sample state at the
-boundary; matching it to truth at that sample index would unlock
-the per-segment delta decode.
+segment 0 → Tran samples 0..509
+segment 1 → Vert samples 0..507
+segment 2 → Long samples 0..507
+segment 3 → Mic samples 0..507
+segment 4 → Tran samples 510..N (continuation)
+...
+
+This is consistent with the segment-1 block sums net-to-near-zero in
+V70 (where all 4 channels are near zero) and with the per-segment delta
+budget matching the segment size for a single channel. It is NOT yet
+verified because the per-segment channel anchor isn't pinned down in
+the segment header — bytes [4:6] and [14:18] of the header are still
+open and probably encode V/L/M anchors.
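Stated as code, the rotation hypothesis is just "segment k belongs to channel k mod 4". A tiny sketch to make the unverified hypothesis concrete; `hypothesized_channel` is an illustrative name, and the Tran/Vert/Long/Mic order is taken from the listing above, not from any confirmed decode:

```python
CHANNEL_ROTATION = ("Tran", "Vert", "Long", "Mic")  # order from the hypothesis


def hypothesized_channel(segment_index: int) -> str:
    """Channel a segment would carry IF segments rotate T, V, L, M (unverified)."""
    if segment_index < 0:
        raise ValueError("segment_index must be non-negative")
    return CHANNEL_ROTATION[segment_index % len(CHANNEL_ROTATION)]
```

Under this mapping segment 4 is the first Tran continuation, which matches the listing's "segment 4 → Tran samples 510..N".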
+
+See ``docs/waveform_codec_re_status.md`` for the current working notes
+and the suggested next experiment ("segment-channel scoring analyzer").
 """
 
 from __future__ import annotations