9 Commits

Author SHA1 Message Date
Claude 2ff2762eec codec-re: 30 NN block CRACKED — codec fully decoded
User intuition (16-bit) + 12-bit packing hypothesis + the int16 ADC
range constraint led to the final piece.

30 NN block format (CONFIRMED across all 14 blocks in the fixture
bundle):

  NN 12-bit signed deltas packed as NN/4 groups of 6 bytes each.
  Within each group:
    bytes [0:2] = 16 bits = 4 × 4-bit high nibbles (MSB-first)
    bytes [2:6] = 4 × low bytes (read as unsigned 8-bit values)
    delta[k] = sign_extend_12((high_nibble[k] << 8) | low_byte[k])

  Block length = NN × 1.5 + 2 bytes (tag included).  Earlier walker
  used NN × 4 which is only correct in the TRAILER section.
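
The layout above can be sketched in Python as follows.  This is a minimal
illustration, not the code in minimateplus/waveform_codec.py: the names
sign_extend_12, block_length_30nn and decode_30nn_payload are hypothetical,
the low bytes are assumed to be read as unsigned values before the OR, and
the example payload is rebuilt from the SP0 truth deltas quoted in the
partial-finding commit rather than copied from a fixture.

```python
def sign_extend_12(value: int) -> int:
    """Interpret the low 12 bits of value as a two's-complement integer."""
    value &= 0xFFF
    return value - 0x1000 if value & 0x800 else value


def block_length_30nn(nn: int) -> int:
    """Total in-data block length in bytes, 2-byte tag included."""
    return 2 + (nn // 4) * 6  # equals NN * 1.5 + 2 when NN is a multiple of 4


def decode_30nn_payload(payload: bytes, nn: int) -> list[int]:
    """Unpack NN 12-bit signed deltas from NN/4 groups of 6 bytes."""
    if nn % 4 != 0 or len(payload) != (nn // 4) * 6:
        raise ValueError("payload must hold exactly NN/4 six-byte groups")
    deltas = []
    for start in range(0, len(payload), 6):
        group = payload[start:start + 6]
        highs = int.from_bytes(group[0:2], "big")  # four high nibbles, MSB-first
        for k in range(4):
            high_nibble = (highs >> (12 - 4 * k)) & 0xF
            low_byte = group[2 + k]  # assumption: unsigned 0..255
            deltas.append(sign_extend_12((high_nibble << 8) | low_byte))
    return deltas


# Illustrative payload rebuilt from truth deltas 47, 297, 384, 61.
example = bytes.fromhex("01102f29803d")
print(decode_30nn_payload(example, 4))  # [47, 297, 384, 61]
print(block_length_30nn(4))             # 8
```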

Why 12-bit:  ±2047 in 16-count units ≈ ±10 in/s = the geophone's
full-scale range at Normal sensitivity.  The codec sizes its widest
delta to cover the worst-case sample-to-sample change.
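
The range argument can be checked with a few lines of arithmetic.  The
sketch assumes one 16-count unit equals 16 raw ADC counts and uses the
0.005 in/s per unit scale quoted in the Tran codec commit; the constant
names are illustrative only.

```python
COUNTS_PER_UNIT = 16        # raw ADC counts per 16-count unit (assumed)
IN_PER_S_PER_UNIT = 0.005   # LSB of one 16-count unit, from the Tran codec notes
MAX_DELTA_12BIT = 2047      # largest positive 12-bit signed value

raw_counts = MAX_DELTA_12BIT * COUNTS_PER_UNIT
velocity = MAX_DELTA_12BIT * IN_PER_S_PER_UNIT

print(raw_counts)           # 32752, just under the int16 maximum of 32767
print(round(velocity, 3))   # 10.235 (in/s)
```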

Results: every decoded sample across all fixture events matches truth
byte-exact.  ZERO divergences.

  event-a:  9984 samples (full event, all 3 geos)
  event-c:  3840 (full event)
  event-d:  3840 (full event)
  JQ0:      9984 (full event)
  V70:      9984 (full event)
  SP0:      5122 (walker stops early on edge cases)
  SS0:      1758
  SV0:      2114
  event-b:   738

  TOTAL: 47,364 ADC samples verified, zero errors.

Three full 3-sec events decode end-to-end across all three geo
channels.  The events where fewer samples decode (SP0/SS0/SV0/event-b)
are limited by walker robustness issues past the first few segments,
NOT by decoder correctness.

64 tests pass (up from 55).  Files: minimateplus/waveform_codec.py
(new 30 NN decode + corrected walker length), tests/test_waveform_codec.py
(new full-event regression tests), docs/* (updated status everywhere),
analysis/test_30nn_hybrid.py (new — the analysis script that confirmed
the format).
2026-05-20 17:28:54 +00:00
Claude d4cdce77fa codec-re: 30 NN partial finding — sum matches but per-sample distribution doesn't
Tested the 12-bit signed packed delta hypothesis (motivated by the
observation that ±2047 in 16-count units ≈ ±32K raw ADC counts, almost
exactly the int16 ADC range — a strong design hint).

Result: mixed.  For SP0 block @1689 (V seg 4, samples 650..653):
  truth deltas:                47, 297, 384, 61   (sum = 789)
  12-bit BE contiguous pred:   17,  47, 664, 61   (sum = 789)

pred[1] = 47 matches truth[0] and pred[3] = 61 matches truth[3]
exactly, AND the total sum across all 4 positions matches.  But
pred[0] = 17 and pred[2] = 664 don't match any truth value.
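
For reference, the contiguous big-endian reading tested here can be
sketched as below.  The six payload bytes are an assumption: they were
rebuilt from the truth deltas using the grouped layout confirmed in the
2ff2762eec commit, not copied from the SP0 fixture.  The function name
unpack_12bit_be is hypothetical.

```python
def unpack_12bit_be(payload: bytes) -> list[int]:
    """Read consecutive 12-bit fields, MSB first, as signed values."""
    total_bits = len(payload) * 8
    if total_bits % 12 != 0:
        raise ValueError("payload length must be a multiple of 1.5 bytes")
    word = int.from_bytes(payload, "big")
    values = []
    for shift in range(total_bits - 12, -1, -12):
        field = (word >> shift) & 0xFFF
        values.append(field - 0x1000 if field & 0x800 else field)
    return values


truth = [47, 297, 384, 61]
pred = unpack_12bit_be(bytes.fromhex("01102f29803d"))  # reconstructed bytes
print(pred)                     # [17, 47, 664, 61]
print(sum(pred) == sum(truth))  # True
```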

Hypothesis space narrows to:
- 12-bit deltas WITH a specific re-ordering or interleaving
- 12-bit deltas with one of the positions being a "step size" or
  "checksum-like" repacked value
- A nonlinear / coded format where the underlying total displacement
  is preserved but per-sample distribution is encoded differently

Two analysis scripts committed (test_30nn_12bit.py, test_30nn_v2.py).
The v2 script uses a real-decoder simulation to get the exact channel
+ sample-index for each 30 NN block, eliminating off-by-one errors in
the truth lookup.
2026-05-20 17:28:54 +00:00
Claude ce5dc640ba codec-re: quiet bundle decodes FULLY (17k samples, zero errors)
User asked the right question: do events without 30 NN blocks decode
fully?  Answer: YES.

  event-a:  Tran 3328 ✓  Vert 3328 ✓  Long 3328 ✓  (28 segments, 0 '30 NN')
  event-c:  Tran 1280 ✓  Vert 1280 ✓  Long 1280 ✓  (12 segments, 0 '30 NN')
  event-d:  Tran 1280 ✓  Vert 1280 ✓  Long 1280 ✓  (12 segments, 0 '30 NN')

17,664 ADC samples decoded byte-exact against BW's ASCII export.
Zero divergences across event-a, event-c, event-d.
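
A byte-exact check of this kind amounts to finding the first index where
decoded and reference samples differ.  A minimal sketch with a
hypothetical helper name and made-up sample values; the actual comparison
in analysis/verify_quiet_bundle.py may be structured differently.

```python
from typing import Optional, Sequence


def first_divergence(decoded: Sequence[int], truth: Sequence[int]) -> Optional[int]:
    """Return the first index where the sequences differ, or None if identical."""
    for i, (d, t) in enumerate(zip(decoded, truth)):
        if d != t:
            return i
    if len(decoded) != len(truth):
        return min(len(decoded), len(truth))  # one sequence ended early
    return None


print(first_divergence([0, 1, 2], [0, 1, 2]))  # None
print(first_divergence([0, 1, 9], [0, 1, 2]))  # 2
```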

This means the codec is FULLY SOLVED for any event without 30 NN
blocks.  The remaining gap is the 30 NN block format only — used for
high-amplitude regions where deltas exceed int8 range.  For quiet
events (or quiet stretches of loud events), the decoder is complete.

9 new regression tests bring the total to 55, all passing.

Files: tests/test_waveform_codec.py + docs/waveform_codec_re_status.md
+ new analysis/verify_quiet_bundle.py.
2026-05-20 17:28:54 +00:00
Claude 07675626dc codec-re: channel rotation CONFIRMED — full multi-channel decoder works
The segment-channel scoring analyzer (from scratch/next_experiment_skeleton.py)
ran and immediately confirmed the rotation hypothesis:

  SP0 seg 0: best fit Vert  508/508  ✓
  SP0 seg 1: best fit Long  508/508  ✓
  SP0 seg 3: best fit Tran  508/508  ✓  (Tran continuation)
  SP0 seg 5: best fit Long  508/508  ✓
  SP0 seg 9: best fit Long  508/508  ✓
  V70 seg 0: best fit Vert  508/508  ✓
  V70 seg 1: best fit Long  508/508  ✓

Channels rotate Tran → Vert → Long → MicL per 40 02 segment header.
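
One way to express the rotation, as a sketch under two assumptions
inferred from the fits listed above: segment indices count 40 02 headers
from 0, and the stream before the first header carries Tran, so segment 0
lands on Vert.  The function name is hypothetical.

```python
ROTATION = ("Tran", "Vert", "Long", "MicL")


def channel_for_segment(seg_index: int) -> str:
    """Channel carried by the segment that follows the seg_index-th 40 02 header."""
    # Offset by one because the pre-header stream is assumed to be Tran.
    return ROTATION[(seg_index + 1) % len(ROTATION)]


for seg in (0, 1, 3, 5, 9):
    print(seg, channel_for_segment(seg))
```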

Also discovered the segment header has DOUBLE duty: bytes [14:18] anchor
the NEW segment's channel (2 samples as int16 BE in 16-count units), AND
bytes [0:4] extend the PREVIOUS channel by 2 more samples (2 deltas as
int16 BE).  This is the same "2 anchors + delta stream" structure as the
body preamble for Tran.
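
The two roles of the header can be sketched with struct.  Assumptions:
the byte offsets are relative to the header payload (after the 2-byte
40 02 tag), and the type and function names are hypothetical.  The
example payload is synthetic.

```python
import struct
from typing import NamedTuple, Tuple


class SegmentHeaderFields(NamedTuple):
    prev_deltas: Tuple[int, int]  # two int16 BE deltas extending the previous channel
    new_anchors: Tuple[int, int]  # two int16 BE samples anchoring the new channel


def parse_segment_header_payload(payload: bytes) -> SegmentHeaderFields:
    """Extract the previous-channel deltas and new-channel anchors."""
    if len(payload) < 18:
        raise ValueError("header payload must be at least 18 bytes")
    prev_deltas = struct.unpack(">hh", payload[0:4])
    new_anchors = struct.unpack(">hh", payload[14:18])
    return SegmentHeaderFields(prev_deltas, new_anchors)


# Synthetic payload: deltas (3, -2), ten filler bytes, anchors (100, 101).
demo = struct.pack(">hh", 3, -2) + bytes(10) + struct.pack(">hh", 100, 101)
print(parse_segment_header_payload(demo))
```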

decode_waveform_v2 now returns full per-channel sample dicts.
Byte-exact verified ranges:
  V70: Tran 512, Vert 512, Long 512   (all first segments)
  JQ0: Tran 512, Vert 258
  SP0: Long 1536 (all 3 L segments)

Still open: the 30 NN block format (high-amplitude packed deltas) —
appears mid-segment when single-byte deltas can't carry the magnitude.

6 new tests bring the count to 46.  All passing.
2026-05-20 17:28:54 +00:00
Claude ae0e17b5dc codec-re: handoff polish — readmes, skeleton, remove decode-re/ duplicate
Three things to make pickup smoother:

1. analysis/README.md (NEW): catalogues the ~25 scratch scripts.
   Categorizes them as "still useful" / "superseded — keep for
   archaeology" / "pure exploration".  Tells a fresh engineer which
   files to read first and which to ignore.

2. scratch/next_experiment_skeleton.py (NEW): stub + spec for the
   segment-channel scoring analyzer.  Includes the fixture loader,
   block walker, and decode-segment-as-channel helper — just enough
   scaffolding that the next pass starts from "fill in
   score_segment_against_all_channels()" rather than from scratch.
   Already runs and confirms 13 segments per 3-sec event with sample
   starts going to 6590 (way past the 3328 actual samples) — strong
   evidence that not all segments carry Tran.

3. Removed decode-re/ duplicate.  It was a mirror of tests/fixtures/.
   Analysis scripts that hardcoded decode-re/ paths updated to point
   at tests/fixtures/.  CLAUDE.md note updated: future event uploads
   go directly into a dated subdirectory under tests/fixtures/.

All 40 tests still pass.  Skeleton runs.
2026-05-20 17:28:54 +00:00
Claude 9ed6f2a8d8 codec-re: add segment 1 block dumper for analysis
Investigated multi-segment Tran continuation but couldn't crack it.
Each hypothesis tried (segment header consumes 0/1/2 T deltas, blocks
continue Tran with various interpretations) breaks at sample ~512.

Block budget for V70 segment 1: 264 nibbles + 244 RLE zeros = 508
deltas — exactly the segment size. So the block structure CAN encode
508 single-channel samples, but applying segment 1 blocks as Tran
gives wrong values.

Most likely the channel ordering changes in segment 1+ (e.g., segment
0 = Tran, segment 1 = Vert, segment 2 = Long, etc.) but I couldn't
verify cleanly.  Stopping here — segment-0 Tran decode is solid and
multi-segment work needs more fresh thinking.
2026-05-20 17:28:54 +00:00
Claude a0c9a482c7 codec-re: 00 NN is RLE; full Tran segment-0 decode (4 of 5 events)
User uploaded a Vert-heavy event (JQ0) and a Mic-heavy event (V70).
Those two were exactly what was needed to crack the next piece:

- 00 NN block = run-length-encoded zero deltas in the current channel.
  Append NN copies of the current cumulative value (no change).
- find_data_start now recognizes 00 NN as a valid first tag (some events
  begin with a leading 00 NN RLE block).
- decode_tran_initial now decodes the FULL segment 0 (not just the first
  data block).
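
The RLE rule is small enough to sketch directly.  The helper name is
hypothetical and the sample values are made up; the sketch assumes at
least one anchor sample already exists when the block is applied.

```python
def apply_rle_zeros(samples: list[int], nn: int) -> None:
    """00 NN: append NN copies of the last cumulative value (NN zero deltas)."""
    if not samples:
        raise ValueError("need at least one anchor sample before an RLE block")
    samples.extend([samples[-1]] * nn)


channel = [5, 7]
apply_rle_zeros(channel, 3)
print(channel)  # [5, 7, 7, 7, 7]
```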

Results across 5 fixture events:
  - M529LL1A.SP0 (loud-all-channels)  : 510 / 510  ✓
  - M529LL1L.JQ0 (Vert-heavy)         : 510 / 510  ✓
  - M529LL1L.V70 (Mic-heavy)          : 510 / 510  ✓
  - M529LL1A.SV0 (loud-from-start)    :  58 /  58  ✓
  - M529LL1A.SS0 (loud-from-start)    :  42 / 502  (stops at first 30 04)

The 30 04 block (only seen in loud-from-start events) hasn't been
decoded yet — likely a channel-switch marker for the high-amplitude
regime.

Also discovered: segment header (40 02) payload bytes [0:2] = T_delta
at first sample of new segment, [6:8] = byte length to next segment.
Multi-segment Tran decoding still diverges after sample 512 because
the per-segment channel ordering after the header is unknown.

Tests: 40 pass (up from 36).

Files:
- minimateplus/waveform_codec.py: find_data_start fix, RLE handling,
  full segment-0 decode in decode_tran_initial
- tests/test_waveform_codec.py: synthetic RLE test, full segment 0
  tests for JQ0 and V70
- tests/fixtures/5-11-26/: M529LL1L.JQ0, M529LL1L.V70 + TXT exports
- docs/instantel_protocol_reference.md §7.6.1: RLE + segment-header docs
2026-05-20 17:28:54 +00:00
Claude 6ac126e05c codec-re: crack Tran channel codec with high-amplitude May 11 bundle
User uploaded 3 high-amplitude events (PPV 6-7 in/s — shook the geophone
hard) to decode-re/5-11-26/.  These cracked the Tran codec:

- Preamble bytes [3:5] and [5:7] = Tran[0] and Tran[1] as int16 BE
  in 16-count units (LSB = 0.005 in/s).  Confirmed across all 7
  fixtures.
- First data block carries Tran deltas from sample 2 onward:
  * 10 NN block: NN/2 bytes of payload, each byte = two 4-bit signed
    nibble deltas (high nibble first)
  * 20 NN block: NN int8 signed deltas
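
A sketch of how the anchors and the two delta encodings fit together.
Assumptions: the 4-bit deltas are two's complement (8..15 map to -8..-1),
each delta is added to the previous sample starting from Tran[1], and all
function names are hypothetical.  Input bytes are synthetic.

```python
import struct


def tran_anchors(preamble: bytes) -> tuple:
    """Tran[0] and Tran[1] from preamble bytes [3:5] and [5:7], int16 BE."""
    return struct.unpack(">hh", preamble[3:7])


def nibble_deltas(payload: bytes) -> list[int]:
    """10 NN payload: two signed 4-bit deltas per byte, high nibble first."""
    out = []
    for b in payload:
        for nib in (b >> 4, b & 0xF):
            out.append(nib - 16 if nib >= 8 else nib)
    return out


def int8_deltas(payload: bytes) -> list[int]:
    """20 NN payload: one signed 8-bit delta per byte."""
    return [b - 256 if b >= 128 else b for b in payload]


def accumulate(anchors, deltas) -> list[int]:
    """Rebuild samples (16-count units) from two anchors plus a delta stream."""
    samples = list(anchors)
    for d in deltas:
        samples.append(samples[-1] + d)
    return samples


anchors = tran_anchors(bytes(3) + struct.pack(">hh", 10, 12))
print(accumulate(anchors, nibble_deltas(b"\x1f\x80")))  # [10, 12, 13, 12, 4, 4]
print(int8_deltas(b"\x7f\x80\xff"))                     # [127, -128, -1]
```

Multiplying a sample by 0.005 converts 16-count units to in/s.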

Verified 22+42+46 = 110 Tran samples across SP0/SS0/SV0 with 0 errors
against BW's ASCII export.

Why the earlier 96-combination brute force failed: the quiet events in
the 5-8-26 bundle all had T[0] = T[1] ≈ 0, so the preamble's per-channel
encoding was undetectable.  Loud events made the encoding obvious.

What's solved:
- minimateplus.waveform_codec.decode_tran_initial: returns first
  N Tran samples in 16-count units for any body.
- Walker length formula for in-data 30 NN blocks (NN*2 instead of NN*4).
- Walker now handles bodies that start with 20 NN (in addition to 10 NN).

What's still open:
- Tran past the first data block (multi-block channel switching).
- Vert / Long / MicL channel encodings.
- Walker correctness past offset ~427 in event-b.

Tests: 36 pass.  decode_waveform_v2 still returns None — the full
multi-channel decoder is not wired up.  decode_tran_initial is the
new verified entry point.

Files: minimateplus/waveform_codec.py, tests/test_waveform_codec.py
(adds 5-11-26 fixtures + decode_tran_initial tests), and
docs/instantel_protocol_reference.md §7.6.1 (Tran codec spec).
2026-05-20 17:28:54 +00:00
Claude d3f77d1d96 codec-re: solve waveform body block framing; per-byte sample mapping still open
Decoded the structural framing of the Blastware waveform body — the bytes
between the 21-byte STRT record and the 26-byte file footer.  The body is
a sequence of tagged variable-length blocks, NOT raw int16 LE.  Five tag
types (10/20/00/30/40 NN) and their lengths are now confirmed against the
4-event May 2026 fixture bundle.  Body splits cleanly into ~16 segments
(for a 1280-sample event) separated by 40 02 segment headers carrying a
monotonically incrementing uint32 LE counter at bytes [8:12].
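
The counter check can be sketched as follows.  Assumptions: bytes [8:12]
are offsets into the header payload (after the 2-byte tag), and
"monotonically incrementing" is read as strictly increasing.  Function
names and the synthetic payloads are illustrative.

```python
import struct


def segment_counter(header_payload: bytes) -> int:
    """uint32 LE counter at bytes [8:12] of a 40 02 header payload."""
    if len(header_payload) < 12:
        raise ValueError("header payload must be at least 12 bytes")
    return struct.unpack("<I", header_payload[8:12])[0]


def counters_increase(header_payloads) -> bool:
    """True when every counter is larger than the one before it."""
    counters = [segment_counter(p) for p in header_payloads]
    return all(b > a for a, b in zip(counters, counters[1:]))


def fake_header(counter: int) -> bytes:
    """Synthetic payload: 8 filler bytes, the counter, 6 more filler bytes."""
    return bytes(8) + struct.pack("<I", counter) + bytes(6)


print(counters_increase([fake_header(n) for n in (1, 2, 3)]))  # True
print(counters_increase([fake_header(n) for n in (1, 3, 2)]))  # False
```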

What's done:
- minimateplus/waveform_codec.py — block walker, segment splitter, segment
  header parser.  decode_waveform_v2 is a stub returning None until the
  byte-to-sample mapping is solved; client.py is unchanged.
- tests/test_waveform_codec.py — 31 tests covering block detection, lengths,
  contiguous-walk, segment splitting, segment-header parsing, and counter
  monotonicity.  All pass.
- tests/fixtures/decode-re-5-8-26/ — bundled fixtures (4 events, BW binary
  + Blastware ASCII export each).
- docs/instantel_protocol_reference.md §7.6.1 — replaced retraction box
  with the verified structural decoding plus an explicit list of what's
  still open.

What's still open: the per-byte mapping inside 10 NN / 20 NN blocks.  96
channel-permutation × nibble-order × sign-convention combinations were
brute-force tested; none match BW's ASCII export to within ±1 ADC count.
The codec is more elaborate than uniform 4-bit deltas — likely a hybrid
variable-bit-width scheme with segment-anchor resync points.  Next
recommended step: capture an event with a known calibration tone to pin
down magnitude scaling.
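
For scale, a grid of this size can be enumerated with itertools.  The
factorization below (24 channel orders × 2 nibble orders × 2 sign
conventions = 96) is a guess that happens to match the count; the commit
does not list the actual options, and the labels are hypothetical.

```python
from itertools import permutations, product

CHANNELS = ("Tran", "Vert", "Long", "MicL")
NIBBLE_ORDERS = ("high_first", "low_first")               # hypothetical labels
SIGN_CONVENTIONS = ("twos_complement", "sign_magnitude")  # hypothetical labels

hypotheses = list(product(permutations(CHANNELS), NIBBLE_ORDERS, SIGN_CONVENTIONS))
print(len(hypotheses))  # 96
print(hypotheses[0])    # (('Tran', 'Vert', 'Long', 'MicL'), 'high_first', 'twos_complement')
```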

Walker also bails out partway through event-b (open issue documented in
both the module and the protocol reference).
2026-05-20 17:28:54 +00:00