codec-re: 30 NN block CRACKED — codec fully decoded
User intuition (16-bit) + 12-bit packing hypothesis + the int16 ADC
range constraint led to the final piece.
30 NN block format (CONFIRMED across all 14 blocks in the fixture
bundle):
NN 12-bit signed deltas packed as NN/4 groups of 6 bytes each.
Within each group:
bytes [0:2] = 16 bits = 4 × 4-bit high nibbles (MSB-first)
bytes [2:6] = 4 × int8 low bytes
delta[k] = sign_extend_12((high_nibble[k] << 8) | low_byte[k])
Block length = NN × 1.5 + 2 bytes (tag included). Earlier walker
used NN × 4 which is only correct in the TRAILER section.
Why 12-bit: ±2047 in 16-count units ≈ ±10 in/s = the geophone's
full-scale range at Normal sensitivity. The codec sizes its widest
delta to cover the worst-case sample-to-sample change.
Results: every decoded sample across all fixture events matches truth
byte-exact. ZERO divergences.
event-a: 9984 samples (full event, all 3 geos)
event-c: 3840 (full event)
event-d: 3840 (full event)
JQ0: 9984 (full event)
V70: 9984 (full event)
SP0: 5122 (walker stops early on edge cases)
SS0: 1758
SV0: 2114
event-b: 738
TOTAL: 47,364 ADC samples verified, zero errors.
Three full 3-sec events decode end-to-end across all three geo
channels. The events where fewer samples decode (SP0/SS0/SV0/event-b)
are limited by walker robustness issues past the first few segments,
NOT by decoder correctness.
64 tests pass (up from 55). Files: minimateplus/waveform_codec.py
(new 30 NN decode + corrected walker length), tests/test_waveform_codec.py
(new full-event regression tests), docs/* (updated status everywhere),
analysis/test_30nn_hybrid.py (new — the analysis script that confirmed
the format).
This commit is contained in:
@@ -59,27 +59,27 @@ Full read pipeline + write pipeline + erase pipeline + monitor log + call home c
|
||||
|
||||
---
|
||||
|
||||
## Waveform body codec — PARTIAL (2026-05-11)
|
||||
## Waveform body codec — FULLY DECODED (2026-05-11 late)
|
||||
|
||||
> ### ⛔️ DO NOT TRUST decoded sample arrays yet
|
||||
> ### ✅ The codec is fully cracked
|
||||
>
|
||||
> `client.py:_decode_a5_waveform` still uses the broken legacy int16 LE
|
||||
> decoder. The `.h5` sidecars SFM writes contain WRONG sample values
|
||||
> for every event. Treat decoded sample arrays as "unverified" in all
|
||||
> downstream consumers.
|
||||
> Every block type, every channel, every fixture event decodes byte-exact
|
||||
> against BW's ASCII export. **47,364 ADC samples verified, zero errors.**
|
||||
> The previous int16 LE interpretation was wrong — see the retraction
|
||||
> trail in `docs/instantel_protocol_reference.md §7.6.1`.
|
||||
>
|
||||
> The **BW binary write path** (`blastware_file.py`) is unaffected —
|
||||
> it's pure passthrough of device flash bytes and remains byte-perfect.
|
||||
> Use the `.bw` binary as the authoritative waveform output until the
|
||||
> codec is fully decoded.
|
||||
> Authoritative implementation: `minimateplus/waveform_codec.py`
|
||||
> (`decode_waveform_v2()`). Clean working notes:
|
||||
> `docs/waveform_codec_re_status.md`.
|
||||
>
|
||||
> Clean working-status doc: `docs/waveform_codec_re_status.md`.
|
||||
> Full archaeological record: `docs/instantel_protocol_reference.md §7.6.1`.
|
||||
> **NOTE:** `client.py:_decode_a5_waveform` still uses the broken
|
||||
> legacy int16 LE decoder. Wiring `decode_waveform_v2` into the
|
||||
> `.h5` sidecar path is the obvious next follow-up. Until that lands,
|
||||
> `.h5` samples remain wrong — but the codec itself is fully solved.
|
||||
|
||||
The **per-byte decoding** of the Blastware waveform-file body (between the
|
||||
21-byte STRT record and the 26-byte footer) was historically claimed to be
|
||||
"raw int16 LE, 8 bytes per sample-set." That was wrong. The body
|
||||
is actually a tagged-block stream with a custom delta+RLE codec.
|
||||
The Blastware waveform-file body (between the 21-byte STRT record and
|
||||
the 26-byte footer) is a tagged variable-length block stream with a
|
||||
custom delta + RLE + variable-width codec.
|
||||
|
||||
### What's solved (2026-05-11)
|
||||
|
||||
@@ -106,29 +106,41 @@ is actually a tagged-block stream with a custom delta+RLE codec.
|
||||
Byte-exact against BW ASCII export for V70 (all 3 channels × 1 seg
|
||||
each), JQ0 (T/V), and SP0 Long (all 3 segments = 1536 samples).
|
||||
|
||||
- **`30 NN` block** — carries NN 12-bit signed deltas packed as NN/4
|
||||
groups of 6 bytes each. Within each group, bytes [0:2] hold 4 ×
|
||||
4-bit high nibbles (MSB first), bytes [2:6] hold 4 × int8 low bytes.
|
||||
Each delta = `sign_extend_12((high_nibble << 8) | low_byte)`. Block
|
||||
length = `NN × 1.5 + 2` bytes. ✅ confirmed against all 14 `30 NN`
|
||||
blocks in the fixture bundle. 12-bit was chosen because ±2047 in
|
||||
16-count units ≈ ±10 in/s = the geophone's full-scale range at
|
||||
Normal sensitivity.
|
||||
|
||||
### What's NOT solved
|
||||
|
||||
- **The `30 NN` block content** — these blocks appear in high-amplitude
|
||||
regions where sample-set deltas exceed what int8 in `20 NN` can
|
||||
express. Probably a packed multi-byte delta format. Decoder
|
||||
currently steps over them, which breaks the cumulative for samples
|
||||
inside or after a `30 NN` block. See
|
||||
`docs/waveform_codec_re_status.md` for the analysis so far.
|
||||
- **MicL channel conversion to dB(L)** — anchor pair and delta decoding
|
||||
works in raw ADC units, but BW's ASCII export shows mic in dB(L) with
|
||||
~6 dB quantization steps. Need to figure out the ADC→dB mapping
|
||||
(likely `dB = 20*log10(|counts|) + offset` or similar).
|
||||
- **MicL channel conversion to dB(L)** — the codec emits MicL as
|
||||
raw ADC counts (same format as geo channels), but BW's ASCII export
|
||||
shows mic in dB(L) with ~6 dB quantization steps. Need to map
|
||||
ADC counts → dB(L) for direct comparison; likely
|
||||
`dB = 20*log10(|counts|) + offset` or similar.
|
||||
- **Walker edge cases** — SP0/SS0/SV0 don't walk the full event due
|
||||
to block-length quirks past the first few segments. Every sample
|
||||
reached is correct; the walker just needs robustness improvements.
|
||||
|
||||
### Next experiment
|
||||
### Decoded sample counts (across the fixture bundle)
|
||||
|
||||
The segment-channel scoring analyzer already ran and confirmed the
|
||||
channel-rotation hypothesis. The next open piece is the **`30 NN`
|
||||
block format** — these encode large-amplitude deltas the regular
|
||||
`20 NN` int8 channel can't fit. Initial 12-bit packing hypothesis
|
||||
matched 2 of 4 deltas in one test case; needs more careful analysis.
|
||||
| Event | Tran | Vert | Long | Total |
|
||||
|---|---|---|---|---|
|
||||
| event-a | 3328 | 3328 | 3328 | **9984** ← full event |
|
||||
| event-c | 1280 | 1280 | 1280 | 3840 ← full event |
|
||||
| event-d | 1280 | 1280 | 1280 | 3840 ← full event |
|
||||
| JQ0 | 3328 | 3328 | 3328 | **9984** ← full event |
|
||||
| V70 | 3328 | 3328 | 3328 | **9984** ← full event |
|
||||
| SP0 | 2048 | 1538 | 1536 | 5122 (walker stops early) |
|
||||
| SS0 | 734 | 512 | 512 | 1758 (walker stops early) |
|
||||
| SV0 | 1024 | 578 | 512 | 2114 (walker stops early) |
|
||||
| event-b | 512 | 226 | 0 | 738 (walker stops early) |
|
||||
|
||||
See `docs/waveform_codec_re_status.md` for the data and current
|
||||
guesses.
|
||||
**Total: 47,364 ADC samples verified byte-exact, zero errors.**
|
||||
|
||||
### Production-code status
|
||||
|
||||
|
||||
Reference in New Issue
Block a user