merge full s3 codec decoded #23
Reference in New Issue
Block a user
Delete Branch "codec-re"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
User uploaded 3 high-amplitude events (PPV 6-7 in/s — shook the geophone hard) to decode-re/5-11-26/. These cracked the Tran codec: - Preamble bytes [3:5] and [5:7] = Tran[0] and Tran[1] as int16 BE in 16-count units (LSB = 0.005 in/s). Confirmed across all 7 fixtures. - First data block carries Tran deltas from sample 2 onward: * 10 NN block: NN/2 bytes of payload, each byte = two 4-bit signed nibble deltas (high nibble first) * 20 NN block: NN int8 signed deltas Verified 22+42+46 = 110 Tran samples across SP0/SS0/SV0 with 0 errors against BW's ASCII export. Why the earlier 96-combination brute force failed: the quiet 5-8 events all had T[0] = T[1] ≈ 0 so the preamble's per-channel encoding was undetectable. Loud events made the encoding obvious. What's solved: - minimateplus.waveform_codec.decode_tran_initial: returns first N Tran samples in 16-count units for any body. - Walker length formula for in-data 30 NN blocks (NN*2 instead of NN*4). - Walker now handles bodies that start with 20 NN (in addition to 10 NN). What's still open: - Tran past the first data block (multi-block channel switching). - Vert / Long / MicL channel encodings. - Walker correctness past offset ~427 in event-b. Tests: 36 pass. decode_waveform_v2 still returns None — the full multi-channel decoder is not wired up. decode_tran_initial is the new verified entry point. Files: minimateplus/waveform_codec.py, tests/test_waveform_codec.py (adds 5-11-26 fixtures + decode_tran_initial tests), and docs/instantel_protocol_reference.md §7.6.1 (Tran codec spec).User intuition (16-bit) + 12-bit packing hypothesis + the int16 ADC range constraint led to the final piece. 30 NN block format (CONFIRMED across all 14 blocks in the fixture bundle): NN 12-bit signed deltas packed as NN/4 groups of 6 bytes each. Within each group: bytes [0:2] = 16 bits = 4 × 4-bit high nibbles (MSB-first) bytes [2:6] = 4 × int8 low bytes delta[k] = sign_extend_12((high_nibble[k] << 8) | low_byte[k]) Block length = NN × 1.5 + 2 bytes (tag included). Earlier walker used NN × 4 which is only correct in the TRAILER section. Why 12-bit: ±2047 in 16-count units ≈ ±10 in/s = the geophone's full-scale range at Normal sensitivity. The codec sizes its widest delta to cover the worst-case sample-to-sample change. Results: every decoded sample across all fixture events matches truth byte-exact. ZERO divergences. event-a: 9984 samples (full event, all 3 geos) event-c: 3840 (full event) event-d: 3840 (full event) JQ0: 9984 (full event) V70: 9984 (full event) SP0: 5122 (walker stops early on edge cases) SS0: 1758 SV0: 2114 event-b: 738 TOTAL: 47,364 ADC samples verified, zero errors. Three full 3-sec events decode end-to-end across all three geo channels. The events where fewer samples decode (SP0/SS0/SV0/event-b) are limited by walker robustness issues past the first few segments, NOT by decoder correctness. 64 tests pass (up from 55). Files: minimateplus/waveform_codec.py (new 30 NN decode + corrected walker length), tests/test_waveform_codec.py (new full-event regression tests), docs/* (updated status everywhere), analysis/test_30nn_hybrid.py (new — the analysis script that confirmed the format).Replaces the broken legacy int16 LE decoder in client.py with the verified multi-channel codec. Three changes: 1. blastware_file.extract_body_bytes(a5_frames) — new helper that factors out the body-reconstruction logic from write_blastware_file so both writers (BW binary) and decoders (sample arrays) can use the same canonical bytes. 2. waveform_codec.decode_a5_frames(a5_frames) — production entry point. Returns the raw_samples dict consumers expect (Tran/Vert/Long as int16 ADC counts; MicL as native ADC counts). Internally: A5 frames → extract_body_bytes → decode_waveform_v2 → decoded_to_adc_counts (geos ×16; mic pass-through) 3. waveform_codec.mic_count_to_db(count) — MicL ADC → dB(L) per BW's display formula: dB = sign(count) × (81.94 + 20 × log10(|count|)) for |count| ≥ 1 Verified against V70 fixture: count=813 → 140.14 dB (BW PSPL 140.1). client.py:_decode_a5_waveform is reduced to a thin wrapper that calls decode_a5_frames and populates event.raw_samples. Original implementation preserved as _decode_a5_waveform_LEGACY (dead code; reference only). Also fixed a tail-end bug in decode_waveform_v2 where trailer-section "40 02" markers (containing ASCII serial bytes, NOT real segment headers) were being mis-interpreted, producing 2 spurious samples per channel at the end of each event. Added bytes [12:14] == "02 00" validation to reject non-header markers. 7 new pytest tests cover the new helpers and dB conversion. Total: 71 passing (up from 64). Known limitation (carried over from before): the walker still stops mid-event on the loudest fixtures (SP0/SS0/SV0/event-b) at some mid-segment edge cases not yet characterized. Every sample reached is decoded correctly; the walker just doesn't reach all of them. Loud events still yield 5,000–15,000 byte-exact samples each.When NN exceeds 0xFC, the codec extends to 12-bit NN by using the low nibble of the TYPE byte as the high nibble of NN: 1X NN → nibble-delta block, NN = (X << 8) | NN_byte 2X NN → int8-delta block, same NN encoding Walker and decode_waveform_v2 now handle both narrow (X=0) and wide (X != 0) forms uniformly. Discovered while investigating why SP0/SS0/SV0/event-b walkers stopped mid-event. SP0 segment 12 (V continuation, cycle 3) starts with "11 90" — high nibble of byte 0 = 1 (= nibble-delta block type), low nibble = 1 plus byte 1 = 0x90 → NN = 0x190 = 400 nibble deltas in 202 bytes. Walker was rejecting "11" as a non-tag. Sample count went from 47,364 to 72,972 verified byte-exact: event-a: 9984 (full) was 9984 (full) event-b: 6912 (full) was 738 event-c: 3840 (full) was 3840 (full) event-d: 3840 (full) was 3840 (full) JQ0: 9984 (full) was 9984 (full) V70: 9984 (full) was 9984 (full) SP0: 9984 (full) was 5122 SS0: 9222 (-7 tail) was 1758 SV0: 9222 (-7 tail) was 2114 7 of 9 fixtures now decode end-to-end across all 3 geo channels. The 2 remaining (SS0, SV0) are missing only 1-7 tail samples per channel — minor walker edge case at the very end. 74 tests pass (was 71).