# Histogram body codec — FULLY DECODED (2026-05-20) Clean working status doc for the MiniMate Plus histogram-mode event body codec. Companion to `waveform_codec_re_status.md`. The deep historical record (with retractions and dated analyses) lives in `docs/instantel_protocol_reference.md §7.6.2`; the authoritative implementation lives in `minimateplus/histogram_codec.py`. ## TL;DR **The codec is fully decoded.** Every field of every block in the in-repo histogram fixture corpus decodes byte-exact against BW's ASCII export. 24 regression tests pass against ~3,500 blocks across 5 fixtures. ## Body format ``` body = [stream of 32-byte data blocks] + [small trailing remnant] ``` Each block represents one histogram interval. Block layout: ``` [0] 0x00 always-zero tag [1] segment_id (uint8) 0x00..0x03 — 256 blocks per segment [2:4] block_ctr (uint16 LE) resets each segment (0x0100, 0x0101, …) [4:6] 0x000a (uint16 LE) constant marker (= 10) [6:8] T_peak_count uint16 LE Tran peak (count × 0.005 → in/s at Normal) [8:10] T_halfperiod uint16 LE Tran half-period in samples (freq_Hz = 512 / halfp; ≤ 5 means ">100 Hz") [10:12] V_peak_count uint16 LE Vert peak [12:14] V_halfperiod uint16 LE Vert freq half-period [14:16] L_peak_count uint16 LE Long peak [16:18] L_halfperiod uint16 LE Long freq half-period [18:20] M_peak_count uint16 LE MicL peak count (dB via waveform_codec.mic_count_to_db) [20:22] M_halfperiod uint16 LE MicL freq half-period [22:24] 0x00 0x00 constant [24:28] 4-byte variable purpose unknown — possibly CRC, timestamp delta, or psi(L) numeric; not needed for waveform reconstruction [28:32] 0x1e 0x0a 0x00 0x00 constant block-end signature ``` Reliable block-identification anchor: ```python block[22:24] == b"\x00\x00" and block[28:32] == b"\x1e\x0a\x00\x00" ``` (The `1e 0a 00 00` constant tail is the most distinctive signature.) ## Per-channel encoding | Channel | Peak encoding | Frequency encoding | |---|---|---| | Tran | count × 0.005 = in/s at Normal range | `freq_Hz = 512 / halfperiod` | | Vert | same | same | | Long | same | same | | MicL | count → dB via `mic_count_to_db(count)` (same formula as waveform codec) | same | **`>100 Hz` sentinel**: when halfperiod ≤ 5 (giving ≥100 Hz from the 512/halfp formula), BW displays `>100 Hz`. Codec's `half_period_to_hz` returns `None` in this range. ## Verified facts (cross-checked against fixture corpus) Example: N844L6Z8.ZR0H block 130 → all 8 decoded fields byte-exact: ``` binary samples [10, 6, 24, 4, 18, 5, 21, 5, 9] TXT row [0.030, 21, 0.020, 28, 0.025, 24, 0.040, 0.000, 95.92, 57] slot[0] = 10 marker slot[1] = 6 × 0.005 = 0.030 in/s ✓ T_peak slot[2] = 24 → 512/24 = 21.3 → 21 Hz ✓ T_freq slot[3] = 4 × 0.005 = 0.020 in/s ✓ V_peak slot[4] = 18 → 512/18 = 28.4 → 28 Hz ✓ V_freq slot[5] = 5 × 0.005 = 0.025 in/s ✓ L_peak slot[6] = 21 → 512/21 = 24.4 → 24 Hz ✓ L_freq slot[7] = 5 → 81.94 + 20·log10(5) = 95.92 dB ✓ M_peak slot[8] = 9 → 512/9 = 56.9 → 57 Hz ✓ M_freq ``` ## Verified test coverage `tests/test_histogram_codec.py` (24 tests): - Block walking: yields one record per `.TXT` interval ± 1 (off-by-one at the tail when recording was stopped mid-write). Segment-ID groups of 256 blocks confirmed. - Geo peaks: every block of N844L20G, N844L6Z8, N844L6XE, N844L23B matches `.TXT` within the 0.0005 in/s quantization step. - Geo freqs: every block of N844L6Z8 and N844L6XE matches `.TXT` within 1 Hz (BW display rounds). `>100 Hz` sentinel handled correctly. - Mic dB: every block of N844L6XE, N844L23B, N844L6Z8 matches `.TXT` within 0.1 dB (BW display precision). - Mic freq: matches `.TXT` within 1 Hz across active blocks. ## What's NOT yet decoded - **4-byte variable metadata field (bytes 24:28)**. Not needed for waveform reconstruction. Speculation: per-block CRC, sub-second timestamp offset, or a Mic psi(L) count not in the 9 samples. Punt until something needs it. - **Geo PVS (TXT col 7, e.g. "0.040 in/s")**. Not stored in the block; can be approximated as `sqrt(T_peak² + V_peak² + L_peak²)` but BW's value sometimes differs slightly (probably computed from waveform-instant samples, not from per-channel peaks). Punt — the `.h5` consumers don't need PVS as a sample channel. - **Mic psi(L) value (TXT col 8)**. TXT shows it as a small psi value derived from the dB measurement. Not in the 9 samples. Could be derived from `M_peak_count` via the inverse of the dB formula plus a psi calibration constant. Defer. ## Output shape `decode_histogram_body` returns the standard 4-channel dict that mirrors `waveform_codec.decode_waveform_v2`'s output: ```python { "Tran": [peak_count_per_interval, ...], # 16-count units (LSB = 0.005 in/s) "Vert": [..., ...], "Long": [..., ...], "MicL": [..., ...], # raw ADC counts } ``` Run through `waveform_codec.decoded_to_adc_counts` to get 1-count ADC units (geo ×16, mic passthrough) for the standard `.h5` writer. For the full per-interval record with frequencies + metadata, use `decode_histogram_body_full()`. ## Where it's wired - `minimateplus/event_file_io.py:read_blastware_file()` — first tries the waveform codec, falls back to the histogram codec when the waveform preamble isn't present. Same output shape, same downstream pipeline. - `scripts/backfill_sidecars.py` — the `has_samples` short-circuit added during the histogram-codec-pending era still serves as a defensive guard against truly undecodable files, but no longer fires for valid histograms. ## Companion reference - `docs/waveform_codec_re_status.md` — sibling status doc for the much-more-complex waveform-mode codec. - `docs/instantel_protocol_reference.md §7.6.2` — historical protocol-reference entry. Structural framing matches what we found; per-sample semantics were less documented than the `✅ CONFIRMED` badge suggested. This doc supersedes §7.6.2 where they conflict on confidence level.