seismo-relay/docs/histogram_codec_re_status.md
serversdown c3c7fe559c docs: histogram body codec RE — starting-point status doc
Captures everything learned in the 2026-05-20 session before scope
forced a pause:

  - Block framing is solved: 32-byte blocks, one per histogram
    interval, signature byte pattern `[22:24]=0x0000` +
    `[28:32]=0x1e 0x0a 0x00 0x00` reliably identifies data blocks.
  - Block count = interval count (791 blocks in N844L20G.630H for
    a TXT-reported 792 intervals).
  - Sample[0] = Tran peak in 0.0005 in/s/count units (verified on
    one event — needs cross-event confirmation).
  - Samples 1-8 → channel/metric mapping is still open.  None of
    the obvious layouts (peak-then-freq alternating, all-peaks-
    then-all-freqs, per-channel 3-tuples) match the TXT values
    across multiple blocks.  Likely needs a higher-activity
    fixture (current N844 corpus is all noise-floor data) to
    disambiguate.
  - `>100 Hz` sentinel encoding in the binary is unknown.
  - 4-byte variable metadata field at block[24:28] needs
    correlation work against TXT columns.

Doc mirrors the structure of docs/waveform_codec_re_status.md so
a future RE session has a familiar entry point.  Includes the
suggested attack plan + the code seam where the eventual decoder
will land (minimateplus/histogram_codec.py).

The §7.6.2 spec in instantel_protocol_reference.md is structurally
correct but doesn't pin down per-sample semantics — this doc
supersedes it where they conflict on confidence level.

No code shipped on this branch.  When the codec is cracked, the
plan is to land minimateplus/histogram_codec.py + wire into
event_file_io.read_blastware_file() + remove the has_samples
short-circuit from scripts/backfill_sidecars.py.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 21:13:26 +00:00


Histogram body codec — IN PROGRESS (started 2026-05-20)

Working notes for the Series III histogram-mode event body codec reverse-engineering effort. Mirrors the structure of waveform_codec_re_status.md (the now-completed waveform codec). The historical context lives in docs/instantel_protocol_reference.md §7.6.2; this doc is the active scratchpad.

TL;DR (current state)

Block framing is solved. Sample-to-channel mapping is open.

| Component | Status |
| --- | --- |
| 32-byte block structure | confirmed |
| Block count vs interval count | confirmed (1 block per interval) |
| Sample-0 = Tran_peak at 0.0005 in/s/count scale | confirmed against one event |
| Remaining samples 1-8 → channel mapping | open |
| Frequency encoding (TXT shows >100 Hz, binary shows 1) | open |
| Mic dB encoding | open |

The §7.6.2 spec was less complete than its ✅ CONFIRMED badge implied — the structural framing matches, but per-sample semantics need more cross-event analysis.

Confirmed structure (2026-05-20)

Body layout

body = [stream of 32-byte blocks]

Body length isn't always a multiple of 32: 1-byte and 9-byte trailing remnants have been observed. A walker should iterate in 32-byte strides and stop before the tail.

32-byte block layout

[0]    0x00                   always-zero (probably a fixed format tag)
[1]    segment_id (uint8)     0x00, 0x01, 0x02, 0x03 — 256 blocks per segment
[2:4]  block_ctr (uint16 LE)  resets each segment (0x0100, 0x0101, ...)
[4:22] 9× int16 LE samples
[22:24] 0x00 0x00              constant
[24:28] 4-byte variable        unknown — possibly timestamp delta or CRC
[28:30] 0x1e 0x0a              constant signature (`30, 10`)
[30:32] 0x00 0x00              constant
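
The layout above maps onto a fixed-size little-endian struct. The following is a minimal sketch, assuming the field boundaries listed are correct; `HistogramBlock` and `parse_block` are hypothetical names for illustration, not existing repo code.

```python
import struct
from typing import NamedTuple, Tuple

BLOCK_SIZE = 32

# "<" = little-endian, no alignment padding.
# B B H 9h 2s 4s 2s 2s = 1 + 1 + 2 + 18 + 2 + 4 + 2 + 2 = 32 bytes.
_BLOCK = struct.Struct("<BBH9h2s4s2s2s")


class HistogramBlock(NamedTuple):
    tag: int                   # [0]     always 0x00 so far
    segment_id: int            # [1]
    block_ctr: int             # [2:4]   uint16 LE
    samples: Tuple[int, ...]   # [4:22]  nine int16 LE values
    pad: bytes                 # [22:24] constant 00 00
    meta: bytes                # [24:28] unknown 4-byte field, kept raw
    signature: bytes           # [28:30] constant 1e 0a
    tail: bytes                # [30:32] constant 00 00


def parse_block(block: bytes) -> HistogramBlock:
    """Split one 32-byte block into its fields without interpreting them."""
    if len(block) != BLOCK_SIZE:
        raise ValueError(f"expected {BLOCK_SIZE} bytes, got {len(block)}")
    tag, seg, ctr, *rest = _BLOCK.unpack(block)
    samples = tuple(rest[:9])
    pad, meta, signature, tail = rest[9:]
    return HistogramBlock(tag, seg, ctr, samples, pad, meta, signature, tail)
```

The unknown field at [24:28] is deliberately left as raw bytes so that no interpretation is baked in before it has been correlated against the TXT.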

Anchor for finding data blocks during a body walk: block[22:24] == b"\x00\x00" AND block[28:32] == b"\x1e\x0a\x00\x00". The constant signature at bytes 28-31 is the most reliable distinguisher from any other 32-byte content in the file.
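
A body walk using this anchor could look like the sketch below. It assumes blocks are aligned to 32-byte strides from offset 0 of the body, which these notes do not explicitly confirm; `is_data_block` and `iter_data_blocks` are hypothetical helper names.

```python
from typing import Iterator, Tuple

BLOCK_SIZE = 32


def is_data_block(block: bytes) -> bool:
    """Apply the two-part signature check described above."""
    return (
        len(block) == BLOCK_SIZE
        and block[22:24] == b"\x00\x00"
        and block[28:32] == b"\x1e\x0a\x00\x00"
    )


def iter_data_blocks(body: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (offset, block) for every 32-byte stride that passes the check.

    Assumption: blocks start at offset 0. The trailing remnant
    (1 or 9 bytes in the observed files) is never read.
    """
    usable = len(body) - (len(body) % BLOCK_SIZE)
    for offset in range(0, usable, BLOCK_SIZE):
        block = body[offset:offset + BLOCK_SIZE]
        if is_data_block(block):
            yield offset, block
```

Yielding the offset alongside the block makes it easy to spot strides that fail the check, which would hint at non-data content interleaved in the body.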

Block count = interval count

Confirmed against example-events/histogram/N844L20G.630H:

  • TXT reports Number of Intervals : 792.00
  • Binary contains 791 data blocks (one per interval, one short of the TXT count at the tail; the last interval was probably truncated mid-write when recording stopped)

Implication: each block represents exactly one histogram interval (1 minute in this fixture, configurable per device). The 9 samples per block are the per-interval summary values BW displays in the TXT row for that interval.
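
To repeat this check on other fixtures, the TXT interval count can be pulled out and compared with the number of data blocks. This is a rough sketch that assumes the TXT line looks exactly like the one quoted above (`Number of Intervals : 792.00`); the function names are hypothetical.

```python
import re
from typing import Optional

# Assumption: label text and "value.00" formatting match the line quoted above.
_INTERVALS_RE = re.compile(r"Number of Intervals\s*:\s*([0-9]+(?:\.[0-9]+)?)")


def txt_interval_count(txt: str) -> Optional[int]:
    """Return the interval count reported in the TXT, or None if absent."""
    match = _INTERVALS_RE.search(txt)
    return int(float(match.group(1))) if match else None


def missing_intervals(txt: str, n_blocks: int) -> Optional[int]:
    """TXT intervals minus binary data blocks (1 for N844L20G.630H)."""
    reported = txt_interval_count(txt)
    return None if reported is None else reported - n_blocks
```

If the difference is consistently 1 across fixtures, the truncated-last-interval explanation becomes more convincing; other values would point to a different cause.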

What sample 0 means

Confirmed: sample[0] / 2000 = Tran peak amplitude in in/s for the Normal-range geophone. Equivalently, sample[0] is in units of 0.0005 in/s per count (NOT the 0.005 in/s display quantum or the 1-count ADC quantum).

Verified for block 0 of N844L20G.630H:

  • binary sample[0] = 10
  • TXT Tran_peak[0] = 0.005 in/s
  • check: 10 × 0.0005 = 0.005 ✓

Worth verifying this holds across blocks with non-trivial Tran peaks before generalizing.
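
The conversion and the per-block check can be written down as a small sketch. The tolerance of half a count is an assumption, since how BW rounds the TXT value is not established here; both function names are hypothetical.

```python
COUNTS_PER_IN_PER_S = 2000  # 1 count = 0.0005 in/s


def sample0_to_tran_peak(sample0: int) -> float:
    """Convert sample[0] to Tran peak in in/s under the current hypothesis."""
    return sample0 / COUNTS_PER_IN_PER_S


def matches_txt(sample0: int, txt_peak: float, tol: float = 0.00025) -> bool:
    """True if the decoded peak agrees with the TXT value within `tol` in/s.

    Assumption: half a count (0.00025 in/s) is a reasonable default;
    widen it if the TXT turns out to round to a coarser display quantum.
    """
    return abs(sample0_to_tran_peak(sample0) - txt_peak) <= tol
```

For block 0 of N844L20G.630H, `matches_txt(10, 0.005)` holds, while a 1-count ADC interpretation would not fit the same TXT value.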

Open mappings

Samples 1-8 → channel + metric

TXT structure is 10 columns per interval:

Tran  Tran  Vert  Vert  Long  Long  Geo   MicL  MicL   MicL
Peak  Freq  Peak  Freq  Peak  Freq  PVS   psi   dB(L)  Freq
in/s  Hz    in/s  Hz    in/s  Hz    in/s  psi   dB     Hz

Binary has 9 samples per block (one short of the column count). None of the obvious mappings work:

| Hypothesis | Why it fails |
| --- | --- |
| (T_peak, T_freq, V_peak, V_freq, L_peak, L_freq, Geo, M_peak, M_freq) | Sample[1]=1 doesn't decode to >100 Hz under any obvious scale |
| (T_peak, V_peak, L_peak, T_freq, V_freq, L_freq, Geo, M_peak, M_freq) | Sample[1]=1 as V_peak would compute to 0.0005 in/s at the sample-0 scale, but TXT shows 0.005 for some intervals and 0.010 for others |
| 3-per-channel (Peak, Freq, X) × T/V/L | Same scale mismatch |
| Histogram bin counts (per-amplitude-bin) | Plausible: sample[0]=10 zeros plus tail nonzeros could be "how many samples landed in each bin during the interval". But then the sample[0] = T_peak match would be a suspicious coincidence. |

`>100 Hz` is a sentinel BW writes when the measured zero-crossing frequency exceeds the geophone's measurement range. The binary encoding of this sentinel is unknown. Common candidates:

  • Special value (e.g. 0xFFFF / 0x7FFF / 0)
  • A flag bit in the metadata bytes (especially the 4-byte variable field at [24:28])
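
One way to test the special-value candidate is to tally the values seen in each sample slot and look for the suspects. The sketch below assumes samples have already been decoded as signed int16, so 0xFFFF appears as -1; the helper names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Sequence

# 0xFFFF reads as -1 when the slot is decoded as signed int16.
SENTINEL_CANDIDATES = (0, -1, 0x7FFF)


def tally_slot_values(
    blocks: Iterable[Sequence[int]], n_slots: int = 9
) -> List[Counter]:
    """Count how often each value occurs in each of the sample slots."""
    tallies: List[Counter] = [Counter() for _ in range(n_slots)]
    for samples in blocks:
        for slot, value in enumerate(samples[:n_slots]):
            tallies[slot][value] += 1
    return tallies


def candidate_hits(tallies: List[Counter]) -> List[Dict[int, int]]:
    """Per slot, report only the sentinel candidates that actually occur."""
    return [
        {value: tally[value] for value in SENTINEL_CANDIDATES if tally[value]}
        for tally in tallies
    ]
```

A slot in which a candidate value appears only on intervals where the TXT shows `>100 Hz` would be strong evidence; a candidate that never appears anywhere shifts suspicion toward the metadata bytes.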

Metadata 4-byte variable field (bytes 24:28)

Examples from the first 8 blocks of N844L20G.630H:

block 0: 03 90 2a 00
block 1: 04 f2 84 00
block 2: 03 2b e7 00
block 3: 03 fe 11 00
block 4: 03 f7 91 00
block 5: 03 e9 4e 00
block 6: 03 4c 5c 00
block 7: 03 99 aa 00

First byte is 0x03 in seven of these eight blocks (0, 2-7) and 0x04 in block 1; the last byte is 0x00 in all eight. Could be a CRC, timestamp delta, or per-interval status byte. Worth correlating against TXT columns that vary block-to-block.
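
As a starting point for that correlation, the eight examples can be split into sub-fields and tallied. The split used here (first byte, middle two bytes as uint16 LE, last byte) is only one guess at the structure, not an established layout.

```python
from collections import Counter

# The eight example values listed above, blocks 0 to 7.
META_HEX = [
    "03 90 2a 00", "04 f2 84 00", "03 2b e7 00", "03 fe 11 00",
    "03 f7 91 00", "03 e9 4e 00", "03 4c 5c 00", "03 99 aa 00",
]

fields = [bytes.fromhex(h) for h in META_HEX]

# Tally the first and last bytes; both look nearly constant in this sample.
first_bytes = Counter(f[0] for f in fields)
last_bytes = Counter(f[3] for f in fields)

# Assumption: treat the two middle bytes as one little-endian uint16.
middle_u16_le = [int.from_bytes(f[1:3], "little") for f in fields]

for index, (field, mid) in enumerate(zip(fields, middle_u16_le)):
    print(f"block {index}: first=0x{field[0]:02x} "
          f"mid=0x{mid:04x} last=0x{field[3]:02x}")
```

Running the same tally over all 791 blocks would show whether the first byte takes more than two values and whether the last byte is ever nonzero.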

Fixture corpus

In-repo histogram fixtures (paired binary + ASCII TXT):

example-events/histogram/N844L20G.630H       (27 KB, 791 blocks, 792 intervals)
example-events/histogram/N844L21H.2R0H       (22 KB)
example-events/histogram/N844L22A.VT0H       (27 KB)
example-events/histogram/N844L23B.ND0H       ...
example-events/histogram/N844L27U.U30H       ...
example-events/histogram/N844L28V.NA0H       ...
example-events/histogram/N844L6QT.IQ0H       ...
example-events/histogram/N844L6RU.BO0H       ...
example-events/histogram/N844L6SO.6I0H       ...
example-events/histogram/N844L6TP.2R0H       (and more)

All from BE12844 (a single MiniMate Plus unit), recorded over 2025-08-10 at 1-minute histogram intervals. All "noise floor" events — mostly silent intervals with rare spikes.

Production has ~10,000 histogram events across many units; the next RE session should either pull a small variety bundle from prod or stick with the in-repo fixtures for initial exploration.

Suggested attack plan for next session

  1. Verify sample[0] = T_peak hypothesis across all 791 blocks of N844L20G.630H — confirms the scale factor isn't a coincidence.
  2. Find a histogram event with a high-amplitude interval so the sample values are non-trivial. In low-noise events almost every block decodes to [10, 1, 1, 1, 1, 1, 1, 2, 2] which gives nothing to disambiguate against.
  3. Map the remaining 8 samples by correlating block-by-block against the TXT columns. Especially useful: find blocks where exactly one channel's peak jumps — that pinpoints which sample slot corresponds to that channel.
  4. Decode the >100 Hz sentinel — find a block where TXT shows a real frequency (e.g. 73.1 Hz) and reverse the binary value.
  5. Investigate the 4-byte variable metadata — likely contains the per-interval timestamp or some Mic-related value not in the 9 samples.
  6. Wire into read_blastware_file() alongside the waveform codec (try waveform first; fall back to histogram when the 00 02 00 preamble is missing).
  7. Update scripts/backfill_sidecars.py to remove the has_samples short-circuit so histogram .h5 files regenerate too.
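
Steps 1 and 3 both reduce to the same question: in what fraction of blocks does a given sample slot, at a given scale, reproduce a given TXT column? A rough sketch follows, with a hypothetical function name and the assumption that the TXT column has already been trimmed to the block count (791 blocks against 792 TXT rows in N844L20G.630H) and aligned from the first interval.

```python
from typing import List, Sequence


def slot_match_fractions(
    samples_per_block: Sequence[Sequence[int]],
    txt_column: Sequence[float],
    counts_per_unit: float = 2000.0,
    tol: float = 0.00025,
) -> List[float]:
    """For each slot, the fraction of blocks where sample / counts_per_unit
    lands within `tol` of the TXT value for that interval.

    Assumptions: rows are aligned one block per TXT interval from the
    start, and `tol` (half a count at the sample-0 scale) suits the
    TXT rounding.
    """
    if len(samples_per_block) != len(txt_column):
        raise ValueError("trim the TXT column to the block count first")
    if not samples_per_block:
        return []
    n_slots = len(samples_per_block[0])
    hits = [0] * n_slots
    for samples, expected in zip(samples_per_block, txt_column):
        for slot in range(n_slots):
            if abs(samples[slot] / counts_per_unit - expected) <= tol:
                hits[slot] += 1
    return [h / len(txt_column) for h in hits]
```

Calling it once per TXT column (and per candidate scale) gives a slot-by-column score matrix. On noise-floor fixtures many slots will tie, which is why step 2's higher-activity event matters.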

Code seam for the eventual decoder

minimateplus/histogram_codec.py (to-be-created) should mirror minimateplus/waveform_codec.py:

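from typing import Optional


def decode_histogram_body(body: bytes) -> Optional[dict]:
    """Decode a histogram-mode body into per-channel sample arrays.

    Returns ``{"Tran": [...], "Vert": [...], "Long": [...], "MicL": [...]}``
    with each channel's per-interval peak values in ADC counts.
    Returns ``None`` if the body cannot be parsed.
    """
    # Placeholder until the sample-to-channel mapping is solved:
    # report "cannot be parsed" so callers take their fallback path.
    return None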

Then in event_file_io.read_blastware_file():

decoded = decode_waveform_v2(body)
if decoded is None:
    decoded = decode_histogram_body(body)
if decoded is None:
    log.warning(...)
    samples = {"Tran": [], ...}
else:
    samples = decoded_to_adc_counts(decoded)

Related docs

  • Waveform body codec: docs/waveform_codec_re_status.md (done)
  • Protocol reference for histogram mode: docs/instantel_protocol_reference.md §7.6.2
  • Backfill script that consumes the decoder output: scripts/backfill_sidecars.py