Commit Graph

6 Commits

Author SHA1 Message Date
Claude 0466bb4f44 codec: crack wide-NN blocks (1X NN / 2X NN); loud events now fully decode
When NN exceeds 0xFC, the codec extends to 12-bit NN by using the
low nibble of the TYPE byte as the high nibble of NN:

    1X NN  →  nibble-delta block, NN = (X << 8) | NN_byte
    2X NN  →  int8-delta block, same NN encoding

Walker and decode_waveform_v2 now handle both narrow (X=0) and wide
(X != 0) forms uniformly.

Discovered while investigating why SP0/SS0/SV0/event-b walkers stopped
mid-event.  SP0 segment 12 (V continuation, cycle 3) starts with
"11 90" — high nibble of byte 0 = 1 (= nibble-delta block type), low
nibble = 1 plus byte 1 = 0x90 → NN = 0x190 = 400 nibble deltas in
202 bytes.  Walker was rejecting "11" as a non-tag.

Sample count went from 47,364 to 72,972 verified byte-exact:

  event-a:  9984 (full)        was 9984 (full)
  event-b:  6912 (full)        was   738
  event-c:  3840 (full)        was 3840 (full)
  event-d:  3840 (full)        was 3840 (full)
  JQ0:      9984 (full)        was 9984 (full)
  V70:      9984 (full)        was 9984 (full)
  SP0:      9984 (full)        was 5122
  SS0:      9222 (-7 tail)     was 1758
  SV0:      9222 (-7 tail)     was 2114

7 of 9 fixtures now decode end-to-end across all 3 geo channels.
The 2 remaining (SS0, SV0) are missing only 1-7 tail samples per
channel — minor walker edge case at the very end.

74 tests pass (was 71).
2026-05-20 17:28:54 +00:00
Claude 85f4bcfe86 codec: wire decode_waveform_v2 into production; add MicL dB helper
Replaces the broken legacy int16 LE decoder in client.py with the
verified multi-channel codec.  Three changes:

1. blastware_file.extract_body_bytes(a5_frames) — new helper that
   factors out the body-reconstruction logic from write_blastware_file
   so both writers (BW binary) and decoders (sample arrays) can use
   the same canonical bytes.

2. waveform_codec.decode_a5_frames(a5_frames) — production entry point.
   Returns the raw_samples dict consumers expect (Tran/Vert/Long as
   int16 ADC counts; MicL as native ADC counts).  Internally:
     A5 frames → extract_body_bytes → decode_waveform_v2
                → decoded_to_adc_counts (geos ×16; mic pass-through)

3. waveform_codec.mic_count_to_db(count) — MicL ADC → dB(L) per BW's
   display formula:
     dB = sign(count) × (81.94 + 20 × log10(|count|))   for |count| ≥ 1
   Verified against V70 fixture: count=813 → 140.14 dB (BW PSPL 140.1).

client.py:_decode_a5_waveform is reduced to a thin wrapper that calls
decode_a5_frames and populates event.raw_samples.  Original implementation
preserved as _decode_a5_waveform_LEGACY (dead code; reference only).

Also fixed a tail-end bug in decode_waveform_v2 where trailer-section
"40 02" markers (containing ASCII serial bytes, NOT real segment headers)
were being mis-interpreted, producing 2 spurious samples per channel at
the end of each event.  Added bytes [12:14] == "02 00" validation to
reject non-header markers.

7 new pytest tests cover the new helpers and dB conversion.  Total:
71 passing (up from 64).

Known limitation (carried over from before): the walker still stops
mid-event on the loudest fixtures (SP0/SS0/SV0/event-b) at some
mid-segment edge cases not yet characterized.  Every sample reached
is decoded correctly; the walker just doesn't reach all of them.
Loud events still yield 5,000–15,000 byte-exact samples each.
2026-05-20 17:28:54 +00:00
Claude 2ff2762eec codec-re: 30 NN block CRACKED — codec fully decoded
User intuition (16-bit) + 12-bit packing hypothesis + the int16 ADC
range constraint led to the final piece.

30 NN block format (CONFIRMED across all 14 blocks in the fixture
bundle):

  NN 12-bit signed deltas packed as NN/4 groups of 6 bytes each.
  Within each group:
    bytes [0:2] = 16 bits = 4 × 4-bit high nibbles (MSB-first)
    bytes [2:6] = 4 × int8 low bytes
    delta[k] = sign_extend_12((high_nibble[k] << 8) | low_byte[k])

  Block length = NN × 1.5 + 2 bytes (tag included).  Earlier walker
  used NN × 4 which is only correct in the TRAILER section.

Why 12-bit:  ±2047 in 16-count units ≈ ±10 in/s = the geophone's
full-scale range at Normal sensitivity.  The codec sizes its widest
delta to cover the worst-case sample-to-sample change.

Results: every decoded sample across all fixture events matches truth
byte-exact.  ZERO divergences.

  event-a:  9984 samples (full event, all 3 geos)
  event-c:  3840 (full event)
  event-d:  3840 (full event)
  JQ0:      9984 (full event)
  V70:      9984 (full event)
  SP0:      5122 (walker stops early on edge cases)
  SS0:      1758
  SV0:      2114
  event-b:   738

  TOTAL: 47,364 ADC samples verified, zero errors.

Three full 3-sec events decode end-to-end across all three geo
channels.  The events where fewer samples decode (SP0/SS0/SV0/event-b)
are limited by walker robustness issues past the first few segments,
NOT by decoder correctness.

64 tests pass (up from 55).  Files: minimateplus/waveform_codec.py
(new 30 NN decode + corrected walker length), tests/test_waveform_codec.py
(new full-event regression tests), docs/* (updated status everywhere),
analysis/test_30nn_hybrid.py (new — the analysis script that confirmed
the format).
2026-05-20 17:28:54 +00:00
Claude ce5dc640ba codec-re: quiet bundle decodes FULLY (17k samples, zero errors)
User asked the right question: do events without 30 NN blocks decode
fully?  Answer: YES.

  event-a:  Tran 3328 ✓  Vert 3328 ✓  Long 3328 ✓  (28 segments, 0 '30 NN')
  event-c:  Tran 1280 ✓  Vert 1280 ✓  Long 1280 ✓  (12 segments, 0 '30 NN')
  event-d:  Tran 1280 ✓  Vert 1280 ✓  Long 1280 ✓  (12 segments, 0 '30 NN')

17,664 ADC samples decoded byte-exact against BW's ASCII export.
Zero divergences across event-a, event-c, event-d.

This means the codec is FULLY SOLVED for any event without 30 NN
blocks.  The remaining gap is the 30 NN block format only — used for
high-amplitude regions where deltas exceed int8 range.  For quiet
events (or quiet stretches of loud events), the decoder is complete.

9 new regression tests bring the total to 55, all passing.

Files: tests/test_waveform_codec.py + docs/waveform_codec_re_status.md
+ new analysis/verify_quiet_bundle.py.
2026-05-20 17:28:54 +00:00
Claude 07675626dc codec-re: channel rotation CONFIRMED — full multi-channel decoder works
The segment-channel scoring analyzer (from scratch/next_experiment_skeleton.py)
ran and immediately confirmed the rotation hypothesis:

  SP0 seg 0: best fit Vert  508/508  ✓
  SP0 seg 1: best fit Long  508/508  ✓
  SP0 seg 3: best fit Tran  508/508  ✓  (Tran continuation)
  SP0 seg 5: best fit Long  508/508  ✓
  SP0 seg 9: best fit Long  508/508  ✓
  V70 seg 0: best fit Vert  508/508  ✓
  V70 seg 1: best fit Long  508/508  ✓

Channels rotate Tran → Vert → Long → MicL per 40 02 segment header.

Also discovered the segment header has DOUBLE duty: bytes [14:18] anchor
the NEW segment's channel (2 samples as int16 BE in 16-count units), AND
bytes [0:4] extend the PREVIOUS channel by 2 more samples (2 deltas as
int16 BE).  This is the same "2 anchors + delta stream" structure as the
body preamble for Tran.

decode_waveform_v2 now returns full per-channel sample dicts.
Byte-exact verified ranges:
  V70: Tran 512, Vert 512, Long 512   (all first segments)
  JQ0: Tran 512, Vert 258
  SP0: Long 1536 (all 3 L segments)

Still open: the 30 NN block format (high-amplitude packed deltas) —
appears mid-segment when single-byte deltas can't carry the magnitude.

6 new tests bring the count to 46.  All passing.
2026-05-20 17:28:54 +00:00
Claude f68ee9f0f9 docs: clean up waveform-codec doc layers per review
Three "truth layers" had drifted apart between commits.  Fixed:

1. waveform_codec.py docstring rewritten from the 2026-05-08
   "structural framing only" state to the 2026-05-11 "Tran segment 0
   solved + segment-header partially decoded" state.  Killed stale
   "~80 sample-sets per segment" language (real segments are
   flash-page-byte-sized, not sample-count-sized; observed first-segment
   sizes are 42-510 samples depending on signal).  Killed stale
   "preamble is 7 or 9 bytes" language (always 7).

2. docs/instantel_protocol_reference.md §7.6.1: added a clear
   "CURRENT STATUS" box at the top with a status table.  Replaced the
   stale "~80 sample-sets" line with the verified per-event segment
   sizes.  Merged two redundant segment-header field-table sections.

3. docs/waveform_codec_re_status.md (NEW): clean working-status doc.
   Solved / not solved / hypothesis / next experiment / fixtures /
   tests.  The protocol reference remains the historical Rosetta
   Stone; this new file is the current-truth working note that
   shouldn't accumulate fossil layers.

4. CLAUDE.md §"Waveform body codec": prominent warning box at top —
   "DO NOT TRUST decoded sample arrays yet."  BW binary passthrough
   is the only sample-bearing output to trust until the decoder
   lands.  Added a "Next experiment" subsection pointing the next
   pass at the segment-channel scoring analyzer.

40 tests still pass.
2026-05-20 17:28:54 +00:00