From c3c7fe559c95c2197c2faf62a0f406664f075fdb Mon Sep 17 00:00:00 2001
From: serversdown
Date: Wed, 20 May 2026 21:13:26 +0000
Subject: [PATCH] =?UTF-8?q?docs:=20histogram=20body=20codec=20RE=20?=
 =?UTF-8?q?=E2=80=94=20starting-point=20status=20doc?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Captures everything learned in the 2026-05-20 session before scope
forced a pause:

- Block framing is solved: 32-byte blocks, one per histogram interval,
  signature byte pattern `[22:24]=0x0000` + `[28:32]=0x1e 0x0a 0x00 0x00`
  reliably identifies data blocks.
- Block count = interval count (791 blocks in N844L20G.630H for a
  TXT-reported 792 intervals).
- Sample[0] = Tran peak in 0.0005 in/s/count units (verified on one
  event — needs cross-event confirmation).
- Samples 1-8 → channel/metric mapping is still open. None of the
  obvious layouts (peak-then-freq alternating, all-peaks-then-all-freqs,
  per-channel 3-tuples) match the TXT values across multiple blocks.
  Likely needs a higher-activity fixture (current N844 corpus is all
  noise-floor data) to disambiguate.
- `>100 Hz` sentinel encoding in the binary is unknown.
- 4-byte variable metadata field at block[24:28] needs correlation work
  against TXT columns.

Doc mirrors the structure of docs/waveform_codec_re_status.md so a
future RE session has a familiar entry point. Includes the suggested
attack plan + the code seam where the eventual decoder will land
(minimateplus/histogram_codec.py).

The §7.6.2 spec in instantel_protocol_reference.md is structurally
correct but doesn't pin down per-sample semantics — this doc supersedes
it where they conflict on confidence level.

No code shipped on this branch. When the codec is cracked, the plan is
to land minimateplus/histogram_codec.py + wire into
event_file_io.read_blastware_file() + remove the has_samples
short-circuit from scripts/backfill_sidecars.py.
Co-Authored-By: Claude Opus 4.7 (1M context)
---
 docs/histogram_codec_re_status.md | 212 ++++++++++++++++++++++++++++++
 1 file changed, 212 insertions(+)
 create mode 100644 docs/histogram_codec_re_status.md

diff --git a/docs/histogram_codec_re_status.md b/docs/histogram_codec_re_status.md
new file mode 100644
index 0000000..1a35d14
--- /dev/null
+++ b/docs/histogram_codec_re_status.md
@@ -0,0 +1,212 @@
+# Histogram body codec — IN PROGRESS (started 2026-05-20)
+
+Working notes for the Series III histogram-mode event body codec
+reverse-engineering effort. Mirrors the structure of
+`waveform_codec_re_status.md` (the now-completed waveform codec). The
+historical context lives in `docs/instantel_protocol_reference.md
+§7.6.2`; this doc is the active scratchpad.
+
+## TL;DR (current state)
+
+**Block framing is solved. Sample-to-channel mapping is open.**
+
+| Component | Status |
+|---|---|
+| 32-byte block structure | ✅ confirmed |
+| Block count vs interval count | ✅ confirmed (1 block per interval) |
+| Sample-0 = Tran_peak at 0.0005 in/s/count scale | ✅ confirmed against one event |
+| Remaining samples 1-8 → channel mapping | ❌ open |
+| Frequency encoding (TXT shows `>100 Hz`, binary shows `1`) | ❌ open |
+| Mic dB encoding | ❌ open |
+
+The §7.6.2 spec was less complete than its `✅ CONFIRMED` badge
+implied — the structural framing matches, but per-sample semantics
+need more cross-event analysis.
+
+## Confirmed structure (2026-05-20)
+
+### Body layout
+
+```
+body = [stream of 32-byte blocks]
+```
+
+Body length isn't always a multiple of 32 — observed 1-byte and
+9-byte trailing remnants. Walker should iterate 32-stride and stop
+before the tail.
+
+### 32-byte block header
+
+```
+[0]      0x00                   always-zero (probably a fixed format tag)
+[1]      segment_id (uint8)     0x00, 0x01, 0x02, 0x03 — 256 blocks per segment
+[2:4]    block_ctr (uint16 LE)  resets each segment (0x0100, 0x0101, ...)
+[4:22]   9× int16 LE samples
+[22:24]  0x00 0x00              constant
+[24:28]  4-byte variable        unknown — possibly timestamp delta or CRC
+[28:30]  0x1e 0x0a              constant signature (`30, 10`)
+[30:32]  0x00 0x00              constant
+```
+
+Anchor for finding data blocks during a body walk: `block[22:24] ==
+b"\x00\x00"` AND `block[28:32] == b"\x1e\x0a\x00\x00"`. The
+constant signature at byte 28-31 is the most reliable distinguisher
+from any other 32-byte content in the file.
+
+### Block count = interval count
+
+Confirmed against `example-events/histogram/N844L20G.630H`:
+- TXT reports `Number of Intervals : 792.00`
+- Binary contains 791 data blocks (one per interval, off-by-one at
+  the tail — probably the last interval is truncated mid-write at
+  recording stop)
+
+Implication: each block represents exactly one histogram interval
+(1 minute in this fixture, configurable per device). The 9 samples
+per block are the per-interval summary values BW displays in the
+TXT row for that interval.
+
+### What sample 0 means
+
+Confirmed: `sample[0] / 2000 = Tran peak amplitude in in/s` for
+the Normal-range geophone. Equivalently, sample[0] is in units of
+**0.0005 in/s per count** (NOT the 0.005 in/s display quantum or the
+1-count ADC quantum).
+
+Verified for block 0 of N844L20G.630H:
+- binary sample[0] = 10
+- TXT Tran_peak[0] = 0.005 in/s
+- check: 10 × 0.0005 = 0.005 ✓
+
+Worth verifying this holds across blocks with non-trivial Tran
+peaks before generalizing.
+
+## Open mappings
+
+### Samples 1-8 → channel + metric
+
+TXT structure is **10 columns per interval**:
+
+```
+Tran  Tran  Vert  Vert  Long  Long  Geo   MicL  MicL   MicL
+Peak  Freq  Peak  Freq  Peak  Freq  PVS   psi   dB(L)  Freq
+in/s  Hz    in/s  Hz    in/s  Hz    in/s  psi   dB     Hz
+```
+
+Binary has **9 samples per block** (one short of the column count).
+None of the obvious mappings work:
+
+| Hypothesis | Why it fails |
+|---|---|
+| (T_peak, T_freq, V_peak, V_freq, L_peak, L_freq, Geo, M_peak, M_freq) | Sample[1]=1 doesn't decode to `>100 Hz` under any obvious scale |
+| (T_peak, V_peak, L_peak, T_freq, V_freq, L_freq, Geo, M_peak, M_freq) | Sample[1]=1 would need to mean 0.005 in/s, but the sample[0] scale gives 0.0005; TXT also shows 0.005 for some intervals and 0.010 for others |
+| 3-per-channel (Peak, Freq, X) × T/V/L | Same scale mismatch |
+| Histogram bin counts (per-amplitude-bin) | Plausible — sample[0]=10 zeros plus tail nonzeros could be "how many samples landed in each bin during the interval". But then sample[0] = T_peak coincidence is suspicious. |
+
+`>100 Hz` is a sentinel BW writes when the measured zero-crossing
+frequency exceeds the geophone's measurement range. The binary
+encoding of this sentinel is unknown. Common candidates:
+- Special value (e.g. 0xFFFF / 0x7FFF / 0)
+- A flag bit in the metadata bytes (especially the 4-byte variable
+  field at [24:28])
+
+### Metadata 4-byte variable field (bytes 24:28)
+
+Examples from the first 8 blocks of N844L20G.630H:
+```
+block 0: 03 90 2a 00
+block 1: 04 f2 84 00
+block 2: 03 2b e7 00
+block 3: 03 fe 11 00
+block 4: 03 f7 91 00
+block 5: 03 e9 4e 00
+block 6: 03 4c 5c 00
+block 7: 03 99 aa 00
+```
+
+First byte is mostly `0x03` (blocks 0,2-7) and sometimes `0x04` (block
+1). Could be a CRC, timestamp delta, or per-interval status byte.
+Worth correlating against TXT columns that vary block-to-block.
+
+## Fixture corpus
+
+In-repo histogram fixtures (paired binary + ASCII TXT):
+
+```
+example-events/histogram/N844L20G.630H (27 KB, 791 blocks, 792 intervals)
+example-events/histogram/N844L21H.2R0H (22 KB)
+example-events/histogram/N844L22A.VT0H (27 KB)
+example-events/histogram/N844L23B.ND0H ...
+example-events/histogram/N844L27U.U30H ...
+example-events/histogram/N844L28V.NA0H ...
+example-events/histogram/N844L6QT.IQ0H ...
+example-events/histogram/N844L6RU.BO0H ...
+example-events/histogram/N844L6SO.6I0H ...
+example-events/histogram/N844L6TP.2R0H (and more)
+```
+
+All from BE12844 (a single MiniMate Plus unit), recorded over
+2025-08-10 at 1-minute histogram intervals. All "noise floor"
+events — mostly silent intervals with rare spikes.
+
+Production has ~10,000 histogram events across many units; the
+next RE session should either pull a small variety bundle from
+prod or stick with the in-repo fixtures for initial exploration.
+
+## Suggested attack plan for next session
+
+1. **Verify sample[0] = T_peak hypothesis across all 791 blocks
+   of N844L20G.630H** — confirms the scale factor isn't a coincidence.
+2. **Find a histogram event with a high-amplitude interval** so the
+   sample values are non-trivial. In low-noise events almost every
+   block decodes to `[10, 1, 1, 1, 1, 1, 1, 2, 2]`, which leaves nothing
+   to disambiguate against.
+3. **Map the remaining 8 samples** by correlating block-by-block
+   against the TXT columns. Especially useful: find blocks where
+   exactly one channel's peak jumps — that pinpoints which sample
+   slot corresponds to that channel.
+4. **Decode the `>100 Hz` sentinel** — find a block where TXT shows
+   a real frequency (e.g. `73.1 Hz`) and reverse the binary value.
+5. **Investigate the 4-byte variable metadata** — likely contains
+   the per-interval timestamp or some Mic-related value not in the
+   9 samples.
+6. **Wire into `read_blastware_file()`** alongside the waveform
+   codec (try waveform first, fall back to histogram when the
+   `00 02 00` preamble is missing).
+7. **Update `scripts/backfill_sidecars.py`** to remove the
+   `has_samples` short-circuit so histogram `.h5` files regenerate
+   too.
+
+## Code seam for the eventual decoder
+
+`minimateplus/histogram_codec.py` (to-be-created) should mirror
+`minimateplus/waveform_codec.py`:
+
+```python
+def decode_histogram_body(body: bytes) -> Optional[dict]:
+    """Decode a histogram-mode body into per-channel sample arrays.
+
+    Returns ``{"Tran": [...], "Vert": [...], "Long": [...], "MicL": [...]}``
+    with each channel's per-interval peak values in ADC counts.
+    Returns ``None`` if the body cannot be parsed.
+    """
+```
+
+Then in `event_file_io.read_blastware_file()`:
+
+```python
+decoded = decode_waveform_v2(body)
+if decoded is None:
+    decoded = decode_histogram_body(body)
+if decoded is None:
+    log.warning(...)
+    samples = {"Tran": [], ...}
+else:
+    samples = decoded_to_adc_counts(decoded)
+```
+
+## Related work
+
+- Waveform body codec — `docs/waveform_codec_re_status.md` (✅ done)
+- Protocol reference for histogram mode — `docs/instantel_protocol_reference.md §7.6.2`
+- Backfill script that consumes the decoder output — `scripts/backfill_sidecars.py`
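Reviewer sketch for step 1 of the attack plan, not part of the patch. It walks the body in 32-byte strides, keeps only blocks matching the documented anchor (`block[22:24]` zero and `block[28:32] == 1e 0a 00 00`), unpacks the nine int16 LE samples at `[4:22]`, and compares `sample[0] × 0.0005` against the TXT Tran peak. The names `iter_data_blocks`, `check_tran_peak`, and `txt_tran_peaks` are hypothetical, and TXT parsing is left out. Two assumptions: block *i* lines up with TXT row *i* (the doc reports the one-row shortfall at the tail), and a tolerance of half the 0.005 in/s display quantum is a reasonable allowance for TXT rounding.

```python
import struct
from typing import Iterator, List, Sequence, Tuple

BLOCK_SIZE = 32
SIGNATURE = b"\x1e\x0a\x00\x00"   # documented constant at block[28:32]
IN_PER_S_PER_COUNT = 0.0005       # sample[0] scale; confirmed on one event only


def iter_data_blocks(body: bytes) -> Iterator[Tuple[int, Tuple[int, ...]]]:
    """Yield (offset, nine int16 samples) for each block matching the anchor."""
    # The range stops before any trailing remnant shorter than one block.
    for off in range(0, len(body) - BLOCK_SIZE + 1, BLOCK_SIZE):
        block = body[off:off + BLOCK_SIZE]
        if block[22:24] != b"\x00\x00" or block[28:32] != SIGNATURE:
            continue  # not a data block under the documented signature
        yield off, struct.unpack_from("<9h", block, 4)


def check_tran_peak(
    body: bytes,
    txt_tran_peaks: Sequence[float],
    tol: float = 0.0025,  # assumption: half the 0.005 in/s display quantum
) -> List[int]:
    """Return block indices where sample[0] * 0.0005 disagrees with the TXT peak."""
    mismatches = []
    # zip() stops at the shorter input, so a missing final block is ignored.
    for i, ((_, samples), txt_peak) in enumerate(
        zip(iter_data_blocks(body), txt_tran_peaks)
    ):
        if abs(samples[0] * IN_PER_S_PER_COUNT - txt_peak) > tol:
            mismatches.append(i)
    return mismatches
```

An empty list from `check_tran_peak` over all blocks of a fixture would support the scale factor. With the current noise-floor corpus nearly every block carries `sample[0] = 10`, so a pass there is weak evidence until a higher-amplitude event is checked.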