docs: clean up waveform-codec doc layers per review

Three "truth layers" had drifted apart between commits.  Fixed:

1. waveform_codec.py docstring rewritten from the 2026-05-08
   "structural framing only" state to the 2026-05-11 "Tran segment 0
   solved + segment-header partially decoded" state.  Killed stale
   "~80 sample-sets per segment" language (real segments are
   flash-page-byte-sized, not sample-count-sized; observed first-segment
   sizes are 42-510 samples depending on signal).  Killed stale
   "preamble is 7 or 9 bytes" language (always 7).

2. docs/instantel_protocol_reference.md §7.6.1: added a clear
   "CURRENT STATUS" box at the top with a status table.  Replaced the
   stale "~80 sample-sets" line with the verified per-event segment
   sizes.  Merged two redundant segment-header field-table sections.

3. docs/waveform_codec_re_status.md (NEW): clean working-status doc.
   Solved / not solved / hypothesis / next experiment / fixtures /
   tests.  The protocol reference remains the historical Rosetta
   Stone; this new file is the current-truth working note that
   shouldn't accumulate fossil layers.

4. CLAUDE.md §"Waveform body codec": prominent warning box at top —
   "DO NOT TRUST decoded sample arrays yet."  BW binary passthrough
   is the only sample-bearing output to trust until the decoder
   lands.  Added a "Next experiment" subsection pointing the next
   pass at the segment-channel scoring analyzer.

40 tests still pass.
Claude
2026-05-12 02:43:25 +00:00
committed by serversdown
parent 5bf5329369
commit f68ee9f0f9
4 changed files with 385 additions and 139 deletions
CLAUDE.md (+33 -6)
@@ -61,10 +61,24 @@ Full read pipeline + write pipeline + erase pipeline + monitor log + call home c
## Waveform body codec — PARTIAL (2026-05-11)
> ### ⛔️ DO NOT TRUST decoded sample arrays yet
>
> `client.py:_decode_a5_waveform` still uses the broken legacy int16 LE
> decoder. The `.h5` sidecars SFM writes contain WRONG sample values
> for every event. Treat decoded sample arrays as "unverified" in all
> downstream consumers.
>
> The **BW binary write path** (`blastware_file.py`) is unaffected —
> it's pure passthrough of device flash bytes and remains byte-perfect.
> Use the `.bw` binary as the authoritative waveform output until the
> codec is fully decoded.
>
> Clean working-status doc: `docs/waveform_codec_re_status.md`.
> Full archaeological record: `docs/instantel_protocol_reference.md §7.6.1`.

The **per-byte decoding** of the Blastware waveform-file body (between the
21-byte STRT record and the 26-byte footer) was historically claimed to be
"raw int16 LE, 8 bytes per sample-set." That was wrong. The body
is actually a tagged-block stream with a custom delta+RLE codec.
### What's solved (2026-05-11)
@@ -96,13 +110,26 @@ is actually a tagged-block stream with a custom delta+RLE codec.
(SS0, SV0) and breaks the simple Tran walk there. Probably a channel-
switch or alternative-encoding marker for high-amplitude regions.
### Next experiment
**Don't hero-code the full decoder.** Build a small analysis tool — a
segment-channel scoring analyzer. For each segment of each fixture
event, run the segment-0 Tran block-walk + RLE decode and score the
cumulative trajectory against the BW ASCII truth for each of {Tran,
Vert, Long, MicL} over that segment's sample range, trying different
anchor-bytes candidates from the segment header. The winning
(channel, anchor-location) combination for each segment reveals
whether segments rotate channels and which header bytes encode the
per-segment channel anchors.
See `docs/waveform_codec_re_status.md` for the full specification of
the next experiment.
### Production-code status
`client.py:_decode_a5_waveform` still uses the old (broken) int16 LE
decoder (see warning at the top of this section). `decode_waveform_v2()`
in `minimateplus/waveform_codec.py` returns `None` as a placeholder.
### Test fixtures
docs/instantel_protocol_reference.md (+77 -44)
@@ -860,20 +860,39 @@ MicL: 39 64 1D AA = 0.0000875 psi
---
#### 7.6.1 Blast / Waveform mode — 🟡 PARTIAL DECODE (2026-05-11)
> ### 📌 CURRENT STATUS — read this first
>
> The body codec is **partially decoded** as of 2026-05-11. This
> section contains both current-truth spec AND historical retractions;
> when in doubt, the working summary lives at
> `docs/waveform_codec_re_status.md`.
>
> | Item | Status |
> |---|---|
> | Body has tagged variable-length blocks, NOT raw int16 LE | ✅ confirmed |
> | 5 block tag types (10/20/00/30/40 NN) with lengths | ✅ confirmed |
> | 7-byte preamble: `00 02 00` + Tran[0] + Tran[1] int16 BE | ✅ confirmed |
> | `00 NN` = RLE for zero deltas in the current channel | ✅ confirmed |
> | Tran channel, segment 0 (~482-510 samples / event) | ✅ byte-exact, 5/5 events |
> | Multi-segment Tran continuation | ❌ open (breaks at sample ~512) |
> | Vert / Long / MicL channel decoders | ❌ open |
> | `30 NN` block content (loud-from-start events) | ❌ open |
> | Earlier "raw int16 LE, 8 bytes per sample-set" claim | ❌ REFUTED |
>
> **Production code in `client.py:_decode_a5_waveform` still uses the
> broken int16 LE decoder.** The `.h5` sidecars SFM produces contain
> wrong sample values and must be treated as "unverified" downstream.
> The BW binary write path is unaffected (it's pure passthrough of the
> device's flash bytes, no decoding) and remains byte-perfect.

The "4-channel interleaved s16 LE, 8 bytes per sample-set" claim that
appeared in earlier revisions of this section was never validated and
was wrong. No event in the project's archive ever came close to ADC
saturation, yet the int16 LE decoder consistently produced full-scale
±32K noise. That was the signature of mis-aligned encoded data, not
signal saturation.
##### Body file layout
@@ -932,23 +951,38 @@ followed by a ``00 NN`` marker before the next data block.
##### Segments
The body is divided into segments separated by ``40 02`` segment headers.
**Segment size is variable** — bounded by a fixed device-flash byte
budget, not a fixed sample count. Quiet events fit more samples per
segment (RLE compacts zero deltas via ``00 NN`` markers); loud events
fit fewer. Observed first-segment sizes in the bundled fixtures:

| Event | Segment 0 size (Tran samples) |
|---|---|
| SP0 (loud, 0.25s pretrig) | 510 |
| SV0 (loud-from-start) | 58 (stops at first ``30 NN``) |
| SS0 (loud-from-start) | 42 (stops at first ``30 04``) |
| JQ0 (Vert-heavy, quiet Tran) | 510 |
| V70 (Mic-heavy, quiet geos) | 510 |

⚠️ Earlier drafts of this section claimed "~80 sample-sets per segment"
based on incomplete walks; that figure is wrong. Segments are
flash-page-sized in bytes, not sample-count-sized.

The 18-byte ``40 02`` payload structure:

| Offset | Field | Status |
|---|---|---|
| [0:2] | T_delta at first sample of new segment (int16 BE, in 16-count units) | ✅ confirmed |
| [2:4] | Likely T_delta at sample seg_start+1 | 🟡 likely |
| [4:6] | Unknown (varies; possibly a checksum) | ❓ open |
| [6:8] | Byte length to next segment header 2 (uint16 BE; useful for walker pre-scan) | ✅ confirmed |
| [8:12] | Monotonic uint32 LE counter (starts ~0x47, increments by 1 per segment) | ✅ confirmed |
| [12:14] | Constant ``02 00`` | ✅ confirmed |
| [14:18] | Unknown 4-byte field | ❓ open |

Examples from event-c (1 sec single-shot):
@@ -1008,26 +1042,25 @@ where the codec is most complex stop at the first ``30 04``.
Implementation: :func:`minimateplus.waveform_codec.decode_tran_initial`.
##### Multi-segment Tran continuation — OPEN

After segment 0 ends and the segment header's T_delta (bytes [0:2])
is applied, the next segment's blocks produce values that diverge from
truth by sample ~512. The block structure inside segment 1 is
identical to segment 0 (alternating ``10 NN`` / ``20 NN`` data +
``00 NN`` RLE), and the per-segment delta budget exactly matches the
segment size — V70 segment 1 has 264 nibble-deltas + 244 RLE-zeros =
508 = the segment's sample count. Cumulative deltas are correct in
aggregate (V70 net-zero ≈ truth net-zero) but the per-sample trajectory
is wrong when applied as Tran continuation.

The strongest unverified hypothesis is that **segments rotate
channels**: segment 0 = Tran, segment 1 = Vert, segment 2 = Long,
segment 3 = Mic, segment 4 = Tran continuation, … This would explain
the per-segment delta-budget match while also explaining why segment
1 isn't Tran continuation. Verifying it requires locating the
per-channel anchor at each segment start, most likely in segment-header
bytes [4:6] or [14:18], which are still open.
##### What's still open
docs/waveform_codec_re_status.md (new file, +172)
@@ -0,0 +1,172 @@
# Waveform body codec — current working status (2026-05-11)
This is the **clean working note** for the body-codec reverse-engineering
effort. It supersedes scattered claims elsewhere when they conflict.
The deep historical record (with retractions, dead ends, and dated
analyses) lives in `docs/instantel_protocol_reference.md §7.6.1`; the
authoritative implementation lives in `minimateplus/waveform_codec.py`.
## TL;DR
The Blastware waveform-file body is a **tagged variable-length block
stream**, NOT raw int16 LE samples. Block framing is solved. Tran
channel segment-0 decoding is solved (byte-exact vs BW's ASCII export
across all 5 high-amplitude fixture events). Multi-segment continuation
and the Vert / Long / MicL channel decoders are still open.

**Production code in `minimateplus/client.py:_decode_a5_waveform` still
uses the broken legacy int16 LE decoder.** Sample arrays it writes to
the `.h5` sidecars are wrong and must be treated as "unverified" by all
downstream consumers. The BW binary write path (`blastware_file.py`)
is unaffected — it's pure passthrough and remains byte-perfect.
## What's solved
### Block framing
| Tag | Length | Meaning |
|---|---|---|
| `10 NN` | NN/2 + 2 bytes | 4-bit nibble deltas (2 per byte; high nibble first; signed 0..7 / 8..F = -8..-1) |
| `20 NN` | NN + 2 bytes | int8 signed deltas (1 per byte) |
| `00 NN` | 2 bytes | RLE: append NN copies of current value |
| `30 NN` | NN*2 in data section, NN*4 in trailer | Unknown content. Only in loud-from-start events. |
| `40 02` | 20 bytes (fixed) | Segment header |

NN is always a multiple of 4.

Implementation: `walk_body()` in `minimateplus/waveform_codec.py`.
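
As a rough illustration of the framing rules, the sketch below walks a body from just after the 7-byte preamble and yields one record per block. It is not the real `walk_body()`, whose signature isn't documented here. It assumes the lengths in the table are totals that include the 2-byte tag. It treats every `30 NN` as a data-section block, so the trailer's NN*4 form is not handled. `walk_blocks_sketch`, `block_length` and `Block` are hypothetical names.

```python
from __future__ import annotations

from typing import Iterator, NamedTuple


class Block(NamedTuple):
    offset: int  # byte offset of the 2-byte tag within the body
    tag: int     # first tag byte: 0x10, 0x20, 0x00, 0x30 or 0x40
    nn: int      # second tag byte (the NN in "10 NN")
    length: int  # total block length in bytes, tag included


def block_length(tag: int, nn: int) -> int:
    """Total length of a data-section block, per the framing table."""
    if tag == 0x10:
        return nn // 2 + 2  # NN nibble deltas, packed 2 per byte
    if tag == 0x20:
        return nn + 2       # NN int8 deltas, 1 per byte
    if tag == 0x00:
        return 2            # RLE marker is just the tag
    if tag == 0x30:
        return nn * 2       # ASSUMPTION: data-section form, tag included
    if tag == 0x40 and nn == 0x02:
        return 20           # segment header: tag + 18-byte payload
    raise ValueError(f"unknown tag {tag:02X} {nn:02X}")


def walk_blocks_sketch(body: bytes, start: int = 7) -> Iterator[Block]:
    """Yield contiguous blocks, starting just past the 7-byte preamble."""
    pos = start
    while pos + 2 <= len(body):
        tag, nn = body[pos], body[pos + 1]
        length = block_length(tag, nn)
        if length < 2 or pos + length > len(body):
            raise ValueError(f"bad block length {length} at offset {pos}")
        yield Block(pos, tag, nn, length)
        pos += length  # next block starts where this one ends: no gaps
```

Consider a synthetic body `00 02 00 | 00 10 00 11 | 10 04 1F 80 | 00 04 | 20 04 05 FB 7F 80 | 40 02` followed by 18 zero bytes. The sketch yields blocks at offsets 7, 11, 13 and 19, and the last one ends exactly at the end of the body. Each offset is the previous offset plus its length, so the walk is contiguous by construction.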
### 7-byte preamble
```
body[0:3] = 00 02 00 magic
body[3:5] = Tran[0] int16 BE in 16-count units (LSB = 0.005 in/s)
body[5:7] = Tran[1] int16 BE in 16-count units
```
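
A minimal parser for the preamble layout above. It assumes only what the layout states: a 3-byte magic followed by two big-endian int16 anchors. `parse_preamble` and `to_in_per_s` are hypothetical helper names, and the 0.005 in/s scale is the LSB quoted above.

```python
from __future__ import annotations

import struct

MAGIC = b"\x00\x02\x00"
TRAN_LSB_IN_PER_S = 0.005  # LSB of one 16-count unit, as quoted above


def parse_preamble(body: bytes) -> tuple[int, int]:
    """Return (Tran[0], Tran[1]) in 16-count units from body[0:7]."""
    if len(body) < 7:
        raise ValueError("body is shorter than the 7-byte preamble")
    if body[0:3] != MAGIC:
        raise ValueError(f"unexpected preamble magic {body[0:3].hex()}")
    tran0, tran1 = struct.unpack(">hh", body[3:7])  # two signed int16 BE
    return tran0, tran1


def to_in_per_s(units: int) -> float:
    """Convert a value in 16-count units to in/s."""
    return units * TRAN_LSB_IN_PER_S
```

For example, `parse_preamble(bytes.fromhex("0002000010fff0"))` gives `(16, -16)`, which is roughly ±0.08 in/s.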
### Tran channel, segment 0
Segment 0 (everything before the first `40 02`) encodes Tran samples
only. Starting from preamble anchors Tran[0] and Tran[1], each block
contributes to a running cumulative:
- `10 NN` → append NN nibble-deltas
- `20 NN` → append NN int8-deltas
- `00 NN` → append NN copies of current value (RLE)
- `40 02` → end segment 0

Verified byte-exact:

| Event | Description | Segment 0 size | Match |
|---|---|---|---|
| `M529LL1A.SP0` | Loud, 0.25 s pretrig | 510 | 510/510 ✓ |
| `M529LL1A.SV0` | Loud from sample 0 | 58 | 58/58 ✓ (stops at first `30 NN`) |
| `M529LL1A.SS0` | Loud from sample 0 | 42 | 42/42 ✓ (stops at first `30 04`) |
| `M529LL1L.JQ0` | Vert-heavy | 510 | 510/510 ✓ |
| `M529LL1L.V70` | Mic-heavy (140 dB) | 510 | 510/510 ✓ |

Implementation: `decode_tran_initial()`.
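
The segment-0 walk can be restated as a short, self-contained sketch. It is a hypothetical re-statement for illustration, not the real `decode_tran_initial()`. It assumes the deltas use the same 16-count units as the preamble anchors. Matching the fixture behaviour, it stops at the first `30 NN` block as well as at the first `40 02`.

```python
from __future__ import annotations

import struct


def nibble_deltas(payload: bytes) -> list[int]:
    """Unpack signed 4-bit deltas, high nibble first (8..F map to -8..-1)."""
    out: list[int] = []
    for byte in payload:
        for nib in (byte >> 4, byte & 0x0F):
            out.append(nib - 16 if nib >= 8 else nib)
    return out


def int8_deltas(payload: bytes) -> list[int]:
    """Reinterpret each byte as a signed int8 delta."""
    return [b - 256 if b >= 128 else b for b in payload]


def decode_segment0_sketch(body: bytes) -> list[int]:
    """Tran samples (16-count units) from the preamble through segment 0."""
    tran = list(struct.unpack(">hh", body[3:7]))  # anchors Tran[0], Tran[1]
    cur = tran[-1]
    pos = 7
    while pos + 2 <= len(body):
        tag, nn = body[pos], body[pos + 1]
        if tag == 0x10:
            deltas = nibble_deltas(body[pos + 2 : pos + 2 + nn // 2])
            pos += nn // 2 + 2
        elif tag == 0x20:
            deltas = int8_deltas(body[pos + 2 : pos + 2 + nn])
            pos += nn + 2
        elif tag == 0x00:
            deltas = [0] * nn  # RLE: NN copies of the current value
            pos += 2
        else:
            break  # 40 02 ends segment 0; 30 NN is not decoded, so stop too
        for d in deltas:
            cur += d
            tran.append(cur)
    return tran
```

Take anchors 16 and 17 followed by `10 04 1F 80`, `00 04`, `20 04 05 FB 7F 80` and a `40 02` header. The sketch returns `[16, 17, 18, 17, 9, 9, 9, 9, 9, 9, 14, 9, 136, 8]`: two anchors plus 4 + 4 + 4 deltas. Treating RLE as NN zero deltas keeps all three block types on one code path.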
### Segment header (`40 02`, 20 bytes total)
| Payload offset | Field | Status |
|---|---|---|
| [0:2] | T_delta at first sample of new segment (int16 BE) | ✅ confirmed |
| [2:4] | Likely T_delta at sample seg_start+1 | 🟡 likely |
| [4:6] | Unknown (possibly checksum) | ❓ open |
| [6:8] | Byte length to next segment header 2 (uint16 BE) | ✅ confirmed |
| [8:12] | Monotonic uint32 LE counter (starts ~0x47) | ✅ confirmed |
| [12:14] | Constant `02 00` | ✅ confirmed |
| [14:18] | Unknown 4-byte field | ❓ open |
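
A sketch of reading the 20-byte header with `struct`. Reading `[2:4]` as an int16 BE delta is only the 🟡 hypothesis from the table. The ❓ fields are returned as raw bytes, and `parse_segment_header`, `SegmentHeader` and `counters_are_monotonic` are hypothetical names. The last helper applies the counter-monotonicity sanity check to a run of headers.

```python
from __future__ import annotations

import struct
from typing import NamedTuple


class SegmentHeader(NamedTuple):
    t_delta: int          # [0:2] int16 BE, confirmed
    t_delta_next: int     # [2:4] int16 BE, HYPOTHESIS ("likely" T_delta)
    unknown_4_6: bytes    # [4:6] open, kept raw
    byte_length: int      # [6:8] uint16 BE length field, confirmed
    counter: int          # [8:12] uint32 LE, +1 per segment
    unknown_14_18: bytes  # [14:18] open, kept raw


def parse_segment_header(block: bytes) -> SegmentHeader:
    """Parse a 20-byte ``40 02`` block (2-byte tag + 18-byte payload)."""
    if len(block) != 20 or block[0:2] != b"\x40\x02":
        raise ValueError("not a 20-byte 40 02 segment header")
    p = block[2:]
    t0, t1 = struct.unpack_from(">hh", p, 0)
    (length,) = struct.unpack_from(">H", p, 6)
    (counter,) = struct.unpack_from("<I", p, 8)
    if p[12:14] != b"\x02\x00":
        raise ValueError(f"bytes [12:14] are {p[12:14].hex()}, expected 0200")
    return SegmentHeader(t0, t1, p[4:6], length, counter, p[14:18])


def counters_are_monotonic(headers: list[SegmentHeader]) -> bool:
    """True if each header's counter is exactly one more than the previous."""
    return all(b.counter == a.counter + 1 for a, b in zip(headers, headers[1:]))
```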
## What's still open
1. **Multi-segment Tran continuation.** After segment 0, applying
segment 1's blocks as Tran continuation diverges from truth by
sample ~512. Block structure is identical to segment 0 and the
per-segment delta budget matches the segment size — but the per-
sample trajectory is wrong.
2. **Vert / Long / MicL channel decoders.** No verified decoder for
any non-Tran channel.
3. **`30 NN` block content.** Only appears in loud-from-start events.
Probably a channel-switch or alternative-encoding marker for high-
amplitude regions. Walker steps over it without decoding.
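
The delta-budget check in item 1 is simple bookkeeping. Each `10 NN`, `20 NN` or `00 NN` block contributes NN samples, so a segment's sample count is the sum of its NNs. For V70 segment 1, that is the reported 264 nibble deltas + 244 RLE zeros = 508. Below is a minimal sketch over already-walked `(tag, NN)` pairs. `delta_budget` is a hypothetical helper, and it skips `30 NN` blocks because their content isn't decoded.

```python
from __future__ import annotations


def delta_budget(blocks: list[tuple[int, int]]) -> dict[str, int]:
    """Count samples contributed by (tag, NN) pairs up to the next 40 02."""
    budget = {"nibble": 0, "int8": 0, "rle": 0}
    for tag, nn in blocks:
        if tag == 0x40:
            break  # the next segment header closes this segment
        if tag == 0x10:
            budget["nibble"] += nn
        elif tag == 0x20:
            budget["int8"] += nn
        elif tag == 0x00:
            budget["rle"] += nn
        # 30 NN: content unknown, contributes nothing here
    budget["total"] = budget["nibble"] + budget["int8"] + budget["rle"]
    return budget
```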
## Strongest unverified hypothesis
Segments rotate channels:
```
segment 0 → Tran samples 0..509
segment 1 → Vert samples 0..507
segment 2 → Long samples 0..507
segment 3 → Mic samples 0..507
segment 4 → Tran samples 510..N (continuation)
...
```
This would explain:
- Why segment-0 = Tran works perfectly.
- Why segment 1 has the same block structure but applying it as Tran
continuation gives wrong values.
- Why the per-segment delta budget matches the segment size for a
*single* channel (508 deltas per segment, not 4 × 508).

It is not yet verified because the per-channel anchor at the start of
each segment hasn't been located in the segment header. Bytes [4:6]
and [14:18] of the header are the prime candidates.
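
The sketch below makes the hypothesised schedule concrete. It assigns each segment a channel and a sample range, carrying each channel's position forward so that segment 4 resumes Tran where segment 0 stopped. It encodes only the unverified rotation hypothesis. The channel order and the `rotation_schedule` name are assumptions. Ranges are half-open, so `0..509` above becomes `(0, 510)` here.

```python
from __future__ import annotations

CHANNELS = ("Tran", "Vert", "Long", "Mic")  # hypothesised rotation order


def rotation_schedule(segment_sizes: list[int]) -> list[tuple[str, int, int]]:
    """(channel, first_sample, end_sample) per segment, UNVERIFIED model."""
    next_sample = dict.fromkeys(CHANNELS, 0)
    schedule: list[tuple[str, int, int]] = []
    for i, size in enumerate(segment_sizes):
        channel = CHANNELS[i % len(CHANNELS)]
        start = next_sample[channel]
        schedule.append((channel, start, start + size))
        next_sample[channel] = start + size  # continuation resumes here
    return schedule
```

With sizes `[510, 508, 508, 508, 508]`, the last entry is `("Tran", 510, 1018)`, which matches the "segment 4 → Tran samples 510..N" row.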
## Next experiment — segment-channel scoring analyzer
Don't try to hero-code the full decoder. Instead, build a small
analysis tool that:
1. For each segment in every fixture event, runs the segment-0 Tran
decoder (block-walk + RLE) and produces a cumulative trajectory
of 508 deltas.
2. Scores that trajectory against the BW ASCII truth for *each* of
{Tran, Vert, Long, MicL} over the segment's sample range, starting
from different anchor-byte candidates from the segment header.
3. Reports which (channel, anchor-bytes-location) combination produces
the lowest error for each segment.

If the rotation hypothesis is right, segment 0 should clearly score
best against Tran, segment 1 against Vert, etc. The winning
anchor-bytes-location will reveal which segment-header bytes encode
the per-segment channel anchors.

If the rotation hypothesis is *not* right, the scorer will at least
narrow down what segment 1 actually carries.
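
A minimal sketch of the scoring step. It assumes each segment has already been decoded into a list of deltas, as in steps 1-2. It also assumes BW ASCII truth is available per channel in the same 16-count units. The anchor is taken to be the value just before the segment's first delta, and mean absolute error serves as the score. `score_segment`, `trajectory` and the argument layout are all hypothetical.

```python
from __future__ import annotations

from itertools import accumulate


def trajectory(anchor: int, deltas: list[int]) -> list[int]:
    """Samples produced by applying `deltas` cumulatively to `anchor`."""
    return [anchor + s for s in accumulate(deltas)]


def score_segment(
    deltas: list[int],
    anchors: dict[str, int],      # candidate label -> anchor value
    truth: dict[str, list[int]],  # channel -> BW ASCII truth samples
    starts: dict[str, int],       # channel -> first truth index to compare
) -> list[tuple[float, str, str]]:
    """Return (mean_abs_error, channel, anchor_label), best match first."""
    if not deltas:
        return []
    results: list[tuple[float, str, str]] = []
    for channel, samples in truth.items():
        start = starts.get(channel, 0)
        window = samples[start : start + len(deltas)]
        if len(window) < len(deltas):
            continue  # truth too short for this range: skip, don't pad
        for label, anchor in anchors.items():
            est = trajectory(anchor, deltas)
            err = sum(abs(e - t) for e, t in zip(est, window)) / len(deltas)
            results.append((err, channel, label))
    return sorted(results)
```

Candidate anchors would be values decoded from header bytes such as [4:6] or [14:18]. Which bytes to use, and in which byte order, is exactly what the experiment is meant to find. Under the rotation hypothesis, `starts` would come from each channel's running sample offset rather than from the segment's absolute position.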
## Test fixtures
Committed under `tests/fixtures/`:
- `decode-re-5-8-26/event-a..event-d/`: original quiet bundle (4 events,
PPV < 1 in/s). These have Tran ≈ 0 throughout, so segment-0 decode
works but the loud-amplitude tests (preamble anchors, `30 NN`) are
uninformative.
- `5-11-26/M529LL1A.{SP0,SS0,SV0}`: loud bundle (PPV 6-7 in/s on all
channels). These cracked the Tran codec.
- `5-11-26/M529LL1L.{JQ0,V70}`: targeted captures. JQ0 is Vert-heavy,
V70 is Mic-heavy (140 dB). These cracked the `00 NN` RLE rule.

Each fixture has a `.TXT` Blastware ASCII export as ground truth.
## Tests
`tests/test_waveform_codec.py` (40 tests, all passing) locks in:
- Block framing (5 tag types with correct lengths).
- Walker contiguity (no gaps or overlaps).
- Segment header parsing (counter monotonicity, fixed-pattern check).
- `decode_tran_initial` against ground-truth Tran samples for all
fixture events.

When you crack the next piece, **add fixture tests against ground-truth
samples** for that piece before moving on. Don't let unverified code
ship without a regression lock-in.
minimateplus/waveform_codec.py (+103 -89)
@@ -1,119 +1,133 @@
"""
waveform_codec.py: block-walker and partial decoder for the MiniMate Plus
waveform-file body.

PARTIAL REVERSE-ENGINEERING, last updated 2026-05-11.

The Blastware waveform-file body (the bytes between the 21-byte STRT
record and the 26-byte file footer) is NOT raw int16 LE samples, the
historical assumption that produced full-scale ±32K noise on every
event. It is a tagged variable-length block stream with a custom
delta + RLE codec.

Current status:

- Block framing: solved (block types and lengths all confirmed).
- Tran channel, segment 0: solved. decode_tran_initial returns
  byte-exact values vs BW's ASCII export across 5 of 5 loud-bundle
  events (first ~510 samples per event).
- Multi-segment Tran continuation: open (every hypothesis breaks
  at the segment-1 boundary, around sample 512).
- Vert / Long / Mic channel decoders: open.
- 30 NN block content: open (only appears in loud-from-start events).

Production code in client.py still uses the broken int16 LE decoder,
and ``decode_waveform_v2`` here returns ``None`` as a placeholder.
Callers that need sample arrays should treat the legacy decoder's
output as "unverified": the BW binary write path is the only
sample-bearing output that is currently trustworthy.

Body layout (CONFIRMED 2026-05-11 against 8 fixture events)

The Blastware waveform-file body lives between bytes [22+21=43] and the
26-byte file footer (``[: -26]``). Layout:

    [7-byte preamble] [stream of tagged blocks] [trailer]

The preamble is always exactly 7 bytes:

    body[0:3] = 00 02 00   magic
    body[3:5] = Tran[0]    int16 BE in 16-count units (LSB = 0.005 in/s)
    body[5:7] = Tran[1]    int16 BE in 16-count units

(Earlier drafts of this module described a "7-or-9-byte preamble";
that was wrong: single-shot and continuous events both use 7 bytes.
The "extra 2 bytes" on continuous events were the first ``00 NN`` RLE
marker, not part of the preamble.)

Block types and lengths (all confirmed):

| Tag       | Length                | Meaning                                |
|-----------|-----------------------|----------------------------------------|
| ``10 NN`` | NN/2 + 2 bytes        | 4-bit nibble deltas (2 per byte; high  |
|           |                       | nibble first; signed 0..7 / 8..F =     |
|           |                       | -8..-1)                                |
| ``20 NN`` | NN + 2 bytes          | int8 signed deltas (1 per byte)        |
| ``00 NN`` | 2 bytes               | RLE: append NN copies of current value |
| ``30 NN`` | NN*2 in data section, | Unknown content. Only in loud events.  |
|           | NN*4 in trailer       |                                        |
| ``40 02`` | 20 bytes (fixed)      | Segment header                         |

NN is always a multiple of 4.

Tran channel, segment 0 (CONFIRMED 2026-05-11)

Segment 0 (everything before the first ``40 02`` segment header) encodes
Tran samples only. Starting from preamble anchors Tran[0] and Tran[1],
each subsequent block contributes to the running Tran value:

    10 NN  →  append NN deltas (4-bit signed nibbles)
    20 NN  →  append NN deltas (int8 signed bytes)
    00 NN  →  append NN copies of the current value (RLE zeros)
    40 02  →  segment 0 ends; multi-segment continuation is open

This decodes the first 482-510 samples of Tran for each event with zero
errors against BW's ASCII export. The exact segment-0 sample count
varies per event: it is bounded by a fixed device-flash byte budget,
not a fixed sample count, so quiet events fit more samples because zero
deltas pack compactly into ``00 NN`` markers.

Implementation: :func:`decode_tran_initial`.

Segment header (``40 02``, 20 bytes total)

The 18-byte payload of the ``40 02`` block:

| Offset  | Field                                       | Status      |
|---------|---------------------------------------------|-------------|
| [0:2]   | T_delta at first sample of new segment      | ✅ confirmed |
|         | (int16 BE, in 16-count units)               |             |
| [2:4]   | Likely T_delta at sample seg_start+1        | 🟡 likely    |
| [4:6]   | Unknown (varies; possibly checksum)         | ❓ open      |
| [6:8]   | Byte length to next segment header 2        | ✅ confirmed |
|         | (uint16 BE; useful for walker pre-scan)     |             |
| [8:12]  | Monotonic uint32 LE counter                 | ✅ confirmed |
|         | (starts ~0x47, increments by 1 per segment) |             |
| [12:14] | Constant ``02 00``                          | ✅ confirmed |
| [14:18] | Unknown 4-byte field                        | ❓ open      |

What breaks the multi-segment decoder (the main open question)

After segment 0 ends and the segment header T_delta is consumed,
applying segment 1's blocks as Tran continuation produces values that
diverge from truth by sample ~512. The block structure inside segment
1 is IDENTICAL to segment 0 (same alternating 10 NN / 00 NN pattern),
and the delta budget matches the segment size exactly (V70 segment 1
has 264 nibble-deltas + 244 RLE zeros = 508 = the segment's sample
count). Yet the cumulative trajectory is wrong.

The strongest unverified hypothesis is that segments rotate channels:

    segment 0  →  Tran samples 0..509
    segment 1  →  Vert samples 0..507
    segment 2  →  Long samples 0..507
    segment 3  →  Mic  samples 0..507
    segment 4  →  Tran samples 510..N (continuation)
    ...

This is consistent with segment 1's block sums netting to near zero in
V70 (where all 4 channels are near zero) and with the per-segment delta
budget matching the segment size for a single channel. It is NOT yet
verified because the per-segment channel anchor isn't pinned down in
the segment header: bytes [4:6] and [14:18] of the header are still
open and probably encode the V/L/M anchors.

See ``docs/waveform_codec_re_status.md`` for the current working notes
and the suggested next experiment ("segment-channel scoring analyzer").
"""
from __future__ import annotations