codec-re: crack Tran channel codec with high-amplitude May 11 bundle

User uploaded 3 high-amplitude events (PPV 6-7 in/s — shook the geophone
hard) to decode-re/5-11-26/.  These cracked the Tran codec:

- Preamble bytes [3:5] and [5:7] = Tran[0] and Tran[1] as int16 BE
  in 16-count units (LSB = 0.005 in/s).  Confirmed across all 7
  fixtures.
- First data block carries Tran deltas from sample 2 onward:
  * 10 NN block: NN/2 bytes of payload, each byte = two 4-bit signed
    nibble deltas (high nibble first)
  * 20 NN block: NN int8 signed deltas

Verified 22+42+46 = 110 Tran samples across SP0/SS0/SV0 with 0 errors
against BW's ASCII export.

Why the earlier 96-combination brute force failed: the quiet 5-8
events all had T[0] = T[1] ≈ 0 so the preamble's per-channel encoding
was undetectable.  Loud events made the encoding obvious.

What's solved:
- minimateplus.waveform_codec.decode_tran_initial: returns first
  N Tran samples in 16-count units for any body.
- Walker length formula for in-data 30 NN blocks (NN*2 instead of NN*4).
- Walker now handles bodies that start with 20 NN (in addition to 10 NN).

What's still open:
- Tran past the first data block (multi-block channel switching).
- Vert / Long / MicL channel encodings.
- Walker correctness past offset ~427 in event-b.

Tests: 36 pass.  decode_waveform_v2 still returns None — the full
multi-channel decoder is not wired up.  decode_tran_initial is the
new verified entry point.

Files: minimateplus/waveform_codec.py, tests/test_waveform_codec.py
(adds 5-11-26 fixtures + decode_tran_initial tests), and
docs/instantel_protocol_reference.md §7.6.1 (Tran codec spec).
This commit is contained in:
Claude
2026-05-11 18:30:56 +00:00
committed by serversdown
parent d3f77d1d96
commit 6ac126e05c
14 changed files with 10113 additions and 50 deletions
+74 -39
@@ -860,13 +860,14 @@ MicL: 39 64 1D AA = 0.0000875 psi
---
#### 7.6.1 Blast / Waveform mode — 🟡 STRUCTURAL FRAMING DECODED (2026-05-08)
#### 7.6.1 Blast / Waveform mode — 🟡 STRUCTURAL FRAMING + TRAN CODEC DECODED (2026-05-11)
> **Status (2026-05-08):** Block-level framing is solved and verified
> against the 4-event May 8 2026 bundle (3 sec / 2 sec / 1 sec / 1 sec
> events captured live from BE11529). The per-byte mapping from block
> data to ADC samples is **still open** — the previous int16 LE claim
> is REFUTED (see history below).
> **Status (2026-05-11):** Block-level framing is solved. The Tran-channel
> encoding (preamble + first data block) is **fully verified** against the
> 3-event May 11 2026 high-amplitude bundle (PPV 6-7 in/s) and the 4-event
> May 8 bundle. Vert / Long / MicL channel encodings and multi-block
> Tran continuation are **still open**. The previous int16 LE claim
> remains REFUTED (see history below).
>
> The earlier "4-channel interleaved s16 LE, 8 bytes per sample-set"
> claim was never validated and was wrong. No event in the project's
@@ -886,13 +887,32 @@ the 21-byte STRT record and the 26-byte file footer) is composed of
[trailer: per-channel summary blocks]
```
**Preamble:** starts with the 4-byte magic ``00 02 00 00``. Single-shot
events have a 7-byte preamble; continuous events have a 9-byte preamble
(the 4 events in the May 8 2026 bundle split 2/2 between the two
lengths). Bytes [4:9] of the preamble appear to encode initial
per-channel state but the layout has not been pinned down — for some
events byte [4] equals truth Tran[0] in 16-count units (0.005 in/s
LSB), but other channel-byte assignments don't fit consistently.
**Preamble (CONFIRMED 2026-05-11 across 3+4 events):**
```
body[0:3] = 00 02 00 magic
body[3:5] = Tran[0] int16 BE first Tran sample (LSB = 0.005 in/s)
body[5:7] = Tran[1] int16 BE second Tran sample
```
The preamble is therefore 7 bytes long. Earlier observations of a
"9-byte preamble" on continuous-mode events were a misread — those
events still have a 7-byte preamble; the next 2 bytes are part of the
first ``10 NN`` or ``20 NN`` data block (its tag).
Verified preamble decode for all 7 fixture events — Tran[0] and Tran[1]
from the preamble bytes exactly match the BW ASCII export (rounded to
0.005 in/s):
| Event | Preamble [3:7] (hex) | T[0] decoded | T[0] truth | T[1] decoded | T[1] truth |
|---|---|---|---|---|---|
| event-a (May 8) | ``00 01 00 00`` | +1 | +1 (0.005) | 0 | 0 |
| event-b (May 8) | ``ff ff ff ff`` | -1 | -1 | -1 | -1 |
| event-c (May 8) | ``00 00 00 00`` | 0 | 0 | 0 | 0 |
| event-d (May 8) | ``00 00 00 00`` | 0 | 0 | 0 | 0 |
| SP0 (May 11) | ``00 04 00 04`` | +4 | +4 (0.020) | +4 | +4 |
| SS0 (May 11) | ``ff a7 ff a7`` | -89 | -89 (-0.445) | -89 | -89 |
| SV0 (May 11) | ``fd 17 fd 06`` | -745 | -745 (-3.725) | -762 | -762 |
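The preamble layout above can be checked with a few lines of Python. This is a minimal sketch, not the project's implementation: ``decode_preamble``, ``MAGIC`` and ``LSB_IN_PER_S`` are hypothetical names introduced here for illustration, and the only facts used are the 3-byte magic, the two int16 BE fields, and the 0.005 in/s LSB stated in this section.

```python
import struct

MAGIC = bytes.fromhex("000200")   # body[0:3], per the layout above
LSB_IN_PER_S = 0.005              # one 16-count unit expressed in in/s


def decode_preamble(body: bytes) -> tuple[int, int]:
    """Return (Tran[0], Tran[1]) in 16-count units from the 7-byte preamble."""
    if len(body) < 7 or body[:3] != MAGIC:
        raise ValueError("body does not start with a 7-byte 00 02 00 preamble")
    # Two big-endian signed 16-bit integers at body[3:5] and body[5:7].
    return struct.unpack(">hh", body[3:7])


# SV0 row from the table: preamble bytes [3:7] = fd 17 fd 06
t0, t1 = decode_preamble(bytes.fromhex("000200" "fd17fd06"))
print(t0, t1, round(t0 * LSB_IN_PER_S, 3))
```

For the SV0 bytes this yields ``(-745, -762)``, matching the table row. Multiplying by ``LSB_IN_PER_S`` converts the 16-count units to in/s.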
##### Block tags (CONFIRMED 2026-05-08)
@@ -951,40 +971,54 @@ in the form ``f3/f4/f5`` near ``20 10`` markers strongly resemble
int8 channel-bias values around -12). Detailed decoding of the
trailer is outside the path needed for sample reconstruction.
##### Tran channel codec — CONFIRMED 2026-05-11
The first data block (immediately after the 7-byte preamble) carries
Tran-channel deltas starting at sample 2. Two block types in alternation:
- ``10 NN``: ``NN/2`` bytes of payload. Each byte = two 4-bit signed
nibbles (high nibble first; 0..7 → 0..+7, 8..F → -8..-1). Each
nibble is one Tran delta in 16-count units.
- ``20 NN``: ``NN`` bytes of payload. Each byte = one int8 signed delta
in 16-count units.
Verified against all 3 May-11 fixture events:
| Event | First block | # T samples decoded | Matches truth |
|---|---|---|---|
| SP0 | ``10 14`` (10 bytes / 20 nibbles) | 22 (= 2 preamble + 20 deltas) | 22/22 ✓ |
| SS0 | ``10 28`` (20 bytes / 40 nibbles) | 42 | 42/42 ✓ |
| SV0 | ``20 2c`` (44 int8 bytes) | 46 | 46/46 ✓ |
Implementation: :func:`minimateplus.waveform_codec.decode_tran_initial`.
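As an illustration of the rules above, the following sketch rebuilds the first Tran samples from a body. It is not the code of ``decode_tran_initial``; the function name ``decode_first_tran_block`` and the helper ``_signed_nibble`` are hypothetical. It assumes each delta is added to the previous sample, starting from Tran[1], which is how "deltas starting at sample 2" is read here.

```python
import struct


def _signed_nibble(n: int) -> int:
    """Map 0..7 to 0..+7 and 8..15 to -8..-1 (4-bit two's complement)."""
    return n - 16 if n >= 8 else n


def decode_first_tran_block(body: bytes) -> list[int]:
    """Return Tran samples (16-count units) from the preamble and first block."""
    if len(body) < 9:
        raise ValueError("body too short for preamble plus block tag")
    t0, t1 = struct.unpack(">hh", body[3:7])   # Tran[0], Tran[1] from preamble
    tag, nn = body[7], body[8]                 # first block tag: 10 NN or 20 NN

    if tag == 0x10:
        # NN/2 payload bytes, two signed nibbles per byte, high nibble first.
        payload = body[9:9 + nn // 2]
        if len(payload) != nn // 2:
            raise ValueError("truncated 10 NN payload")
        deltas = [_signed_nibble(x) for b in payload for x in (b >> 4, b & 0x0F)]
    elif tag == 0x20:
        # NN payload bytes, one signed int8 delta per byte.
        payload = body[9:9 + nn]
        if len(payload) != nn:
            raise ValueError("truncated 20 NN payload")
        deltas = list(struct.unpack(f"{nn}b", payload))
    else:
        raise ValueError(f"unexpected first block tag 0x{tag:02x}")

    samples = [t0, t1]
    for d in deltas:                           # assumption: running sum from Tran[1]
        samples.append(samples[-1] + d)
    return samples
```

With this reading, a ``10 14`` block yields 2 + 20 = 22 samples and a ``20 2c`` block yields 2 + 44 = 46, which agrees with the sample counts in the table.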
##### What's still open
- **The byte → sample mapping inside ``10 NN`` and ``20 NN`` blocks.**
Tested hypotheses that did not match BW's ASCII export to within ±1
ADC count:
- **Tran past the first data block.** After the first block, the
body has more ``10 NN`` / ``20 NN`` blocks separated by ``00 NN``
markers and occasionally ``30 NN`` blocks. Naive continuation
(treat all subsequent ``10/20 NN`` blocks as Tran) does NOT match
truth past the first block — the codec interleaves channels somehow.
``30 04`` markers appearing in SS0 between blocks 1 and 5 look
like channel-switch tags, but the switching rule has not been
fully decoded.
1. ``10 NN`` data = 4-bit signed nibble deltas, channel-interleaved,
all 24 channel permutations × 2 nibble orders × 2 sign conventions
× 2 init-from-header settings (= 96 combinations). All produce
values that diverge from truth after the first ~7 sample-sets.
2. ``20 NN`` data = int8 absolute or delta samples for one channel.
Magnitudes in observed blocks (peak ±34 in event-c at offset 351)
do not match any channel's PPV at any plausible ADC quantization
(1-count, 4-count, 8-count, 16-count).
3. ``00 NN`` marker = "skip N sample-sets with zero deltas". Sums
of NN/4 across markers do not consistently match the 80
sample-sets-per-segment count.
- **Vert / Long / MicL channel encodings.** No verified decoder
exists for these yet. Hypotheses tested without success:
V_init stored as int16 BE in ``30 NN`` block payload; V/L/M
blocks encoded in order after Tran with ``30 NN`` separators;
V encoded as ``V - T`` differential. None match truth.
The codec is more elaborate than uniform 4-bit deltas. A hybrid
variable-bit-width scheme (4-bit deltas in ``10 NN``, 8-bit deltas
or absolutes in ``20 NN``, segment-header anchors after each
``40 02``) is the most plausible remaining hypothesis.
- **The role of byte [4:9] of the preamble.** Byte 4 == Tran[0]
truth value (in 16-count units) for events a/b/d, but doesn't
fit consistently for event-c. Bytes [5:9] don't match a simple
per-channel encoding.
- **``30 NN`` block length.** In the trailer, ``30 NN`` blocks
are NN×4 bytes long. In the data section, ``30 NN`` blocks are
NN×2 bytes long (= 8 bytes for NN=4 in SS0). The walker tries
NN×2 first and falls back to NN×4 if needed.
- **Walker correctness past offset ~427 in event-b.** The walker
bails out partway through event-b — there is at least one block
whose length doesn't fit the lengths confirmed for the other
three events. Likely a ``20 NN`` with NN > 0xFC (currently
rejected by the walker), or a different length formula in some
context.
events. This is a separate (now lower-priority) issue.
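The ``30 NN`` length fallback listed among the open items can be sketched as follows. This is only a guess at the shape of the logic, not the walker's actual code: ``block30_payload_length`` is a hypothetical name, and the acceptance test (the byte after the payload must be one of the tag bytes named in this section, or the end of the body) is an assumption, since the text only says the walker falls back "if needed".

```python
# Assumption: tag bytes mentioned in this section (00, 10, 20, 30, 40).
KNOWN_TAGS = {0x00, 0x10, 0x20, 0x30, 0x40}


def block30_payload_length(body: bytes, offset: int) -> int:
    """Return the payload length of the 30 NN block whose tag is at `offset`."""
    if body[offset] != 0x30:
        raise ValueError("no 30 NN block at this offset")
    nn = body[offset + 1]
    for factor in (2, 4):                      # try NN*2 first, then NN*4
        nxt = offset + 2 + nn * factor         # offset of the following tag
        if nxt == len(body) or (nxt < len(body) and body[nxt] in KNOWN_TAGS):
            return nn * factor
    raise ValueError("neither NN*2 nor NN*4 lands on a known tag")
```

A check this weak can accept the wrong length whenever a payload byte happens to equal a tag value, so a real walker would likely need to validate more than one block ahead.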
##### Recommended next step
@@ -1011,6 +1045,7 @@ output shape — keep the ``.h5`` sidecars marked as
| Date | Note |
|---|---|
| 2026-05-11 | Tran channel codec cracked using a high-amplitude (PPV 6-7 in/s) event bundle. Preamble[3:7] = Tran[0]/Tran[1] as int16 BE in 16-count units (LSB = 0.005 in/s). First data block (``10 NN`` nibble-deltas or ``20 NN`` int8-deltas) carries Tran deltas from sample 2. Verified 22+42+46 = 110 samples across SP0/SS0/SV0 with 0 errors. Earlier 96-combination brute-force search on the quiet 5-8 bundle failed because Tran[0] = Tran[1] = 0 in those events made initial-value-from-preamble undetectable. |
| 2026-05-08 | Block tagging confirmed against the 4-event May 2026 bundle. All bodies parse cleanly through `walk_body` for events a/c/d. Event-b walks partway and stops at offset 427 (open issue). |
| 2026-05-08 | Earlier "4-channel interleaved s16 LE" claim formally retracted — never validated, produced full-scale ±32K noise on every event because the bytes are encoded, not raw samples. |
| 2026-04-02 | "Frame 7 metadata", "Frame 9 terminator", and `0x0400`-step chunk-counter claims documented as-was; later proved to be artifacts of an over-reading 5A walk (now superseded by §7.8.5–7.8.7). |