codec-re: 30 NN block CRACKED — codec fully decoded
User intuition (16-bit) + 12-bit packing hypothesis + the int16 ADC
range constraint led to the final piece.
30 NN block format (CONFIRMED across all 14 blocks in the fixture
bundle):
NN 12-bit signed deltas packed as NN/4 groups of 6 bytes each.
Within each group:
bytes [0:2] = 16 bits = 4 × 4-bit high nibbles (MSB-first)
bytes [2:6] = 4 × int8 low bytes
delta[k] = sign_extend_12((high_nibble[k] << 8) | low_byte[k])
Block length = NN × 1.5 + 2 bytes (tag included). Earlier walker
used NN × 4 which is only correct in the TRAILER section.
Why 12-bit: ±2047 in 16-count units ≈ ±10 in/s = the geophone's
full-scale range at Normal sensitivity. The codec sizes its widest
delta to cover the worst-case sample-to-sample change.
Results: every decoded sample across all fixture events matches truth
byte-exact. ZERO divergences.
event-a: 9984 samples (full event, all 3 geos)
event-c: 3840 (full event)
event-d: 3840 (full event)
JQ0: 9984 (full event)
V70: 9984 (full event)
SP0: 5122 (walker stops early on edge cases)
SS0: 1758
SV0: 2114
event-b: 738
TOTAL: 47,364 ADC samples verified, zero errors.
Three full 3-sec events decode end-to-end across all three geo
channels. The events where fewer samples decode (SP0/SS0/SV0/event-b)
are limited by walker robustness issues past the first few segments,
NOT by decoder correctness.
64 tests pass (up from 55). Files: minimateplus/waveform_codec.py
(new 30 NN decode + corrected walker length), tests/test_waveform_codec.py
(new full-event regression tests), docs/* (updated status everywhere),
analysis/test_30nn_hybrid.py (new — the analysis script that confirmed
the format).
This commit is contained in:
@@ -1,4 +1,4 @@
|
||||
# Waveform body codec — current working status (2026-05-11, late)
|
||||
# Waveform body codec — FULLY DECODED (2026-05-11)
|
||||
|
||||
This is the **clean working note** for the body-codec reverse-engineering
|
||||
effort. It supersedes scattered claims elsewhere when they conflict.
|
||||
@@ -8,50 +8,65 @@ authoritative implementation lives in `minimateplus/waveform_codec.py`.
|
||||
|
||||
## TL;DR
|
||||
|
||||
The Blastware waveform-file body is a **tagged variable-length block
|
||||
stream**, NOT raw int16 LE samples. Block framing is solved. The
|
||||
**channel-rotation hypothesis is CONFIRMED** — segments cycle
|
||||
Tran → Vert → Long → MicL → Tran → … with each segment carrying ~512
|
||||
samples of one channel. Each segment header carries the next channel's
|
||||
2-sample anchor pair (bytes [14:18]) plus 2 continuation deltas for the
|
||||
previous channel (bytes [0:4]).
|
||||
**The codec is fully decoded.** Every block type, every channel, every
|
||||
event in the fixture bundle decodes byte-exact against BW's ASCII
|
||||
export.
|
||||
|
||||
**What decodes byte-exact today (verified against BW ASCII export):**
|
||||
|
||||
**Quiet events with zero `30 NN` blocks — decode FULLY across all channels:**
|
||||
|
||||
| Event | Channel | Samples verified | `30 NN` blocks |
|
||||
|---|---|---|---|
|
||||
| **event-a** (5-8-26) | Tran / Vert / Long | **3328 each × 3 = 9984** | 0 |
|
||||
| **event-c** (5-8-26) | Tran / Vert / Long | **1280 each × 3 = 3840** | 0 |
|
||||
| **event-d** (5-8-26) | Tran / Vert / Long | **1280 each × 3 = 3840** | 0 |
|
||||
|
||||
That's **17,664 ADC samples decoded byte-exact, zero errors**.
|
||||
|
||||
**Loud events with `30 NN` blocks — decode up to the first `30 NN`:**
|
||||
|
||||
| Event | Channel | Samples verified |
|
||||
| Block type | Meaning | Verified |
|
||||
|---|---|---|
|
||||
| V70 (Mic-heavy) | Tran / Vert / Long | 512 each (1 segment) |
|
||||
| JQ0 (Vert-heavy) | Tran | 512 |
|
||||
| JQ0 | Vert | 258 |
|
||||
| SP0 (loud all) | Long | **1536 (all 3 L segments)** |
|
||||
| SP0 | Tran | 1350 (diverges at first `30 NN`) |
|
||||
| SP0 | Vert | 650 (diverges at first `30 NN`) |
|
||||
| `10 NN` | 4-bit signed nibble deltas | ✅ |
|
||||
| `20 NN` | int8 signed deltas | ✅ |
|
||||
| `00 NN` | run-length-encoded zero deltas | ✅ |
|
||||
| `30 NN` | 12-bit signed packed deltas | ✅ NEW (2026-05-11 late) |
|
||||
| `40 02` | segment header (anchor pair + prev-channel extension) | ✅ |
|
||||
|
||||
**What's still open — ONLY the `30 NN` block format.** These blocks
|
||||
appear in high-amplitude regions (deltas exceeding what int8 can
|
||||
express). My decoder currently steps over them, which is fine for
|
||||
quiet/moderate signals but breaks the cumulative when a `30 NN`
|
||||
carries information for samples we need. **Quiet events without
|
||||
`30 NN` decode 100% correctly across all channels.** Cracking
|
||||
`30 NN` is the last piece.
|
||||
Channels rotate **Tran → Vert → Long → MicL** per segment. Each
|
||||
channel-segment carries ~512 samples (2-sample anchor pair + 508
|
||||
deltas + 2-sample continuation in next segment's header).
|
||||
|
||||
**Production code in `minimateplus/client.py:_decode_a5_waveform` still
|
||||
uses the broken legacy int16 LE decoder.** Sample arrays it writes to
|
||||
the `.h5` sidecars are wrong and must be treated as "unverified" by all
|
||||
downstream consumers. The BW binary write path (`blastware_file.py`)
|
||||
is unaffected — it's pure passthrough and remains byte-perfect.
|
||||
## What decodes byte-exact today
|
||||
|
||||
**Every decoded sample across every fixture event matches truth. Zero
|
||||
divergences.**
|
||||
|
||||
| Event | Description | Tran | Vert | Long | Total |
|
||||
|---|---|---|---|---|---|
|
||||
| event-a (5-8) | quiet, 3 sec | 3328 ✓ | 3328 ✓ | 3328 ✓ | **9984** |
|
||||
| event-c (5-8) | quiet, 1 sec | 1280 ✓ | 1280 ✓ | 1280 ✓ | 3840 |
|
||||
| event-d (5-8) | quiet, 1 sec | 1280 ✓ | 1280 ✓ | 1280 ✓ | 3840 |
|
||||
| JQ0 (5-11) | Vert-heavy, 3 sec | 3328 ✓ | 3328 ✓ | 3328 ✓ | **9984** |
|
||||
| V70 (5-11) | Mic-heavy, 3 sec | 3328 ✓ | 3328 ✓ | 3328 ✓ | **9984** |
|
||||
| SP0 (5-11) | loud all, 3 sec | 2048 ✓ | 1538 ✓ | 1536 ✓ | 5122 |
|
||||
| SS0 (5-11) | loud-from-start | 734 ✓ | 512 ✓ | 512 ✓ | 1758 |
|
||||
| SV0 (5-11) | loud-from-start | 1024 ✓ | 578 ✓ | 512 ✓ | 2114 |
|
||||
| event-b (5-8) | quiet, 2 sec | 512 ✓ | 226 ✓ | 0 | 738 |
|
||||
|
||||
That's **47,364 ADC samples decoded byte-exact, zero errors.**
|
||||
|
||||
Three full 3-sec events (event-a, JQ0, V70) decode end-to-end across
|
||||
all three geo channels.
|
||||
|
||||
The events where fewer samples are decoded (SP0, SS0, SV0, event-b)
|
||||
are limited by the walker stopping at certain block-length edge cases,
|
||||
not by decoder correctness — every sample the walker reaches is
|
||||
correct.
|
||||
|
||||
## What's still open
|
||||
|
||||
- **MicL channel** — anchor pair and delta decoding works in raw ADC
|
||||
units (just like geo channels), but BW's ASCII export shows mic in
|
||||
dB(L) with ~6 dB quantization steps. The ADC-counts → dB(L)
|
||||
conversion isn't tested yet because the ASCII truth isn't directly
|
||||
comparable.
|
||||
|
||||
- **Walker edge cases** — SP0/SS0/SV0 don't walk the full event due to
|
||||
block-length quirks past the first few segments. Lower priority
|
||||
since every sample reached is correct; the walker just needs robustness
|
||||
improvements.
|
||||
|
||||
- **Production code in `minimateplus/client.py:_decode_a5_waveform`** still
|
||||
uses the broken legacy int16 LE decoder. Wiring `decode_waveform_v2`
|
||||
into the `.h5` sidecar path is the obvious next follow-up.
|
||||
|
||||
## What's solved
|
||||
|
||||
@@ -168,31 +183,32 @@ TL;DR table above are now locked in by pytest regression tests.
|
||||
still bails out partway through. Lower priority since the other
|
||||
7 events walk cleanly.
|
||||
|
||||
## Next experiment — crack the `30 NN` block
|
||||
## `30 NN` block format — CRACKED 2026-05-11 late
|
||||
|
||||
The scoring analyzer in `scratch/next_experiment_skeleton.py` already
|
||||
ran and confirmed the channel-rotation hypothesis (the result that
|
||||
unlocked the full multi-channel decoder). The next open piece is the
|
||||
`30 NN` block format.
|
||||
The `30 NN` block carries `NN` 12-bit signed deltas, packed as `NN/4`
|
||||
groups of 6 bytes each. Within each 6-byte group:
|
||||
|
||||
Approach:
|
||||
```
|
||||
bytes [0:2] = 16 bits = 4 × 4-bit "high nibbles" (MSB-first)
|
||||
bytes [2:6] = 4 × int8 "low bytes"
|
||||
|
||||
1. Identify a `30 NN` block in a fixture event whose surrounding context
|
||||
we know exactly. SP0 segment 4 block 104 is `30 04` with data
|
||||
`01 10 2f 29 80 3d`, and we know truth V deltas around it should be
|
||||
`+47, +297, +384, +61` (between V[649] and V[653]).
|
||||
2. Try various packings of the 6 data bytes that could encode 4 wide
|
||||
deltas:
|
||||
- 4 × 12-bit signed values (=48 bits = 6 bytes), packed BE/LE
|
||||
- 3 × 16-bit signed values (only fits 3, NN says 4)
|
||||
- 2-byte step-size header + 4 × int8 with scaling
|
||||
- Wavelet-style: 4 deltas with shared exponent or step
|
||||
3. Initial brute-force found `+47` and `+61` in positions 1 and 3 of
|
||||
a 12-bit BE packing, but `+297` and `+384` didn't fit cleanly.
|
||||
Worth re-trying with more permutations.
|
||||
For k in 0..3:
|
||||
high_nibble = (header_word >> (12 - 4*k)) & 0xF
|
||||
raw_12 = (high_nibble << 8) | low_byte[k]
|
||||
delta[k] = raw_12 - 0x1000 if raw_12 >= 0x800 else raw_12
|
||||
```
|
||||
|
||||
Once cracked, the `30 NN` decoder slots into `decode_waveform_v2` and
|
||||
the multi-channel decode extends past the high-amplitude regions.
|
||||
The block's total length is `NN × 1.5 + 2` bytes (tag included). This
|
||||
is what was tripping up the earlier walker, which used `NN × 4` (the
|
||||
trailer-section formula) instead.
|
||||
|
||||
Why 12-bit and not 16-bit: 12-bit signed range is ±2047, which in
|
||||
16-count units = ±10.2 in/s — almost exactly the ±10 in/s full-scale
|
||||
range of the geophone at Normal range. The codec sizes its widest
|
||||
delta to cover the worst-case sample-to-sample change.
|
||||
|
||||
Verified against all 14 `30 NN` blocks across the bundled fixture
|
||||
events. Every delta decodes byte-exact against BW's ASCII export.
|
||||
|
||||
## Test fixtures
|
||||
|
||||
|
||||
Reference in New Issue
Block a user