v0.20.0 - prerelease features. #25
@@ -0,0 +1,212 @@
# Histogram body codec: IN PROGRESS (started 2026-05-20)

Working notes for the Series III histogram-mode event body codec
reverse-engineering effort. Mirrors the structure of
`waveform_codec_re_status.md` (the now-completed waveform codec). The
historical context lives in `docs/instantel_protocol_reference.md §7.6.2`;
this doc is the active scratchpad.

## TL;DR (current state)

**Block framing is solved. Sample-to-channel mapping is open.**

| Component | Status |
|---|---|
| 32-byte block structure | ✅ confirmed |
| Block count vs interval count | ✅ confirmed (1 block per interval; 791 blocks vs 792 intervals in the one fixture checked) |
| Sample 0 = Tran_peak at 0.0005 in/s per count | ✅ confirmed against one event |
| Samples 1-8 → channel mapping | ❌ open |
| Frequency encoding (TXT shows `>100 Hz`, binary shows `1`) | ❌ open |
| Mic dB encoding | ❌ open |

The §7.6.2 spec was less complete than its `✅ CONFIRMED` badge
implied: the structural framing matches, but the per-sample semantics
need more cross-event analysis.

## Confirmed structure (2026-05-20)

### Body layout

```
body = [stream of 32-byte blocks]
```

Body length isn't always a multiple of 32: 1-byte and 9-byte trailing
remnants have been observed. A walker should step through the body in
32-byte strides and stop before the tail.

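
A minimal sketch of such a walker, assuming the first block starts at offset 0 of the body (the notes do not say otherwise). The names `iter_blocks` and `trailing_remnant` are hypothetical, not existing functions in the repo.

```python
from typing import Iterator, Tuple

BLOCK_SIZE = 32  # block size from the body layout above


def iter_blocks(body: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (offset, block) for every complete 32-byte block in the body."""
    # Round the length down to a multiple of 32 so the tail is never yielded.
    usable = len(body) - (len(body) % BLOCK_SIZE)
    for offset in range(0, usable, BLOCK_SIZE):
        yield offset, body[offset:offset + BLOCK_SIZE]


def trailing_remnant(body: bytes) -> bytes:
    """Return the leftover bytes after the last complete block (may be empty)."""
    usable = len(body) - (len(body) % BLOCK_SIZE)
    return body[usable:]
```

Keeping the remnant available separately makes it easy to log its length per file, which may help explain the 1-byte and 9-byte tails later.
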
### 32-byte block layout

```
[0]      0x00                   always zero (probably a fixed format tag)
[1]      segment_id (uint8)     0x00, 0x01, 0x02, 0x03; 256 blocks per segment
[2:4]    block_ctr (uint16 LE)  resets each segment (0x0100, 0x0101, ...)
[4:22]   9× int16 LE samples
[22:24]  0x00 0x00              constant
[24:28]  4-byte variable        unknown; possibly timestamp delta or CRC
[28:30]  0x1e 0x0a              constant signature (`30, 10`)
[30:32]  0x00 0x00              constant
```

Anchor for finding data blocks during a body walk:
`block[22:24] == b"\x00\x00"` AND `block[28:32] == b"\x1e\x0a\x00\x00"`.
The constant signature at bytes 28-31 is the most reliable way to
distinguish a data block from any other 32-byte content in the file.

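
The layout and anchor check above can be sketched with the standard `struct` module. This is an illustration under the field offsets listed in these notes; `HistogramBlock` and `parse_block` are hypothetical names, and byte 0 is read but not validated because the anchor rule only covers bytes 22-23 and 28-31.

```python
import struct
from typing import NamedTuple, Optional, Tuple

# "<" = little-endian, no padding. Fields in order: tag (B), segment_id (B),
# block_ctr (H), 9 int16 samples (9h), 2 constant bytes, 4 unknown bytes,
# 4 signature bytes. Total size is 32 bytes.
_BLOCK = struct.Struct("<BBH9h2s4s4s")
_SIGNATURE = b"\x1e\x0a\x00\x00"


class HistogramBlock(NamedTuple):
    segment_id: int
    block_ctr: int
    samples: Tuple[int, ...]
    unknown: bytes  # bytes [24:28], meaning still open


def parse_block(block: bytes) -> Optional[HistogramBlock]:
    """Return the parsed block, or None if it fails the anchor check."""
    if len(block) != _BLOCK.size:
        return None
    fields = _BLOCK.unpack(block)
    segment_id, block_ctr = fields[1], fields[2]
    samples = fields[3:12]
    constant, unknown, signature = fields[12], fields[13], fields[14]
    if constant != b"\x00\x00" or signature != _SIGNATURE:
        return None
    return HistogramBlock(segment_id, block_ctr, tuple(samples), unknown)
```

Returning `None` for non-matching content lets a caller walk the whole body and simply skip anything that is not a data block.
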
### Block count = interval count

Confirmed against `example-events/histogram/N844L20G.630H`:
- TXT reports `Number of Intervals : 792.00`
- Binary contains 791 data blocks (one per interval, with an off-by-one
  at the tail; probably the last interval is truncated mid-write at
  recording stop)

Implication: each block represents exactly one histogram interval
(1 minute in this fixture, configurable per device). The 9 samples
per block are the per-interval summary values that Blastware (BW)
displays in the TXT row for that interval.

### What sample 0 means

Confirmed: `sample[0] / 2000 = Tran peak amplitude in in/s` for
the Normal-range geophone. Equivalently, sample[0] is in units of
**0.0005 in/s per count** (NOT the 0.005 in/s display quantum or the
1-count ADC quantum).

Verified for block 0 of N844L20G.630H:
- binary sample[0] = 10
- TXT Tran_peak[0] = 0.005 in/s
- check: 10 × 0.0005 = 0.005 ✓

It is worth verifying that this holds across blocks with non-trivial
Tran peaks before generalizing.

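
As a sketch of how that verification could be scripted, the conversion and a tolerance check might look like the following. The function names are hypothetical, and the half-display-quantum tolerance is an assumption (the TXT value may be rounded to 0.005 in/s steps), not something these notes have confirmed.

```python
COUNTS_PER_IN_PER_S = 2000  # sample[0] / 2000 = in/s, i.e. 0.0005 in/s per count


def tran_peak_in_per_s(sample0: int) -> float:
    """Convert a raw sample[0] count to a Tran peak amplitude in in/s."""
    return sample0 / COUNTS_PER_IN_PER_S


def matches_txt(sample0: int, txt_peak: float, display_quantum: float = 0.005) -> bool:
    """True if the decoded peak is within half a display quantum of the TXT value."""
    # Assumption: TXT values are rounded to the display quantum, so an exact
    # match is not required.
    return abs(tran_peak_in_per_s(sample0) - txt_peak) <= display_quantum / 2
```

For the block 0 values above, `matches_txt(10, 0.005)` holds; running the same check over every block and its TXT row would show whether the scale factor generalizes.
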
## Open mappings

### Samples 1-8 → channel + metric

TXT structure is **10 columns per interval**:

```
Tran   Tran   Vert   Vert   Long   Long   Geo    MicL   MicL    MicL
Peak   Freq   Peak   Freq   Peak   Freq   PVS    psi    dB(L)   Freq
in/s   Hz     in/s   Hz     in/s   Hz     in/s   psi    dB      Hz
```

Binary has **9 samples per block** (one short of the column count).
None of the obvious mappings work:

| Hypothesis | Why it fails |
|---|---|
| (T_peak, T_freq, V_peak, V_freq, L_peak, L_freq, Geo, M_peak, M_freq) | Sample[1]=1 doesn't decode to `>100 Hz` under any obvious scale |
| (T_peak, V_peak, L_peak, T_freq, V_freq, L_freq, Geo, M_peak, M_freq) | Sample[1]=1 would compute to a V_peak of 0.0005 in/s at the sample[0] scale, but TXT shows 0.005 in/s for some intervals and 0.010 in/s for others |
| 3-per-channel (Peak, Freq, X) × T/V/L | Same scale mismatch |
| Histogram bin counts (per-amplitude-bin) | Plausible: sample[0]=10 zeros plus tail nonzeros could be "how many samples landed in each bin during the interval". But then the sample[0] = T_peak match would be a suspicious coincidence. |

`>100 Hz` is a sentinel BW writes when the measured zero-crossing
frequency exceeds the geophone's measurement range. The binary
encoding of this sentinel is unknown. Common candidates:
- Special value (e.g. 0xFFFF / 0x7FFF / 0)
- A flag bit in the metadata bytes (especially the 4-byte variable
  field at [24:28])

### Metadata 4-byte variable field (bytes 24:28)

Examples from the first 8 blocks of N844L20G.630H:

```
block 0: 03 90 2a 00
block 1: 04 f2 84 00
block 2: 03 2b e7 00
block 3: 03 fe 11 00
block 4: 03 f7 91 00
block 5: 03 e9 4e 00
block 6: 03 4c 5c 00
block 7: 03 99 aa 00
```

The first byte is mostly `0x03` (blocks 0 and 2-7) and sometimes `0x04`
(block 1); the fourth byte is `0x00` in all eight. Could be a CRC,
timestamp delta, or per-interval status byte. Worth correlating
against TXT columns that vary block-to-block.

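
To make that correlation easier, the eight example values can be sliced several ways and printed side by side. The sketch below is exploratory only: the particular splits (first byte, middle two bytes, last byte, whole-field integer) are hypothetical views chosen for comparison, not claims about what the field means.

```python
from collections import Counter

# The eight [24:28] values listed above (blocks 0-7 of N844L20G.630H).
FIELD_EXAMPLES = [bytes.fromhex(h) for h in (
    "03902a00", "04f28400", "032be700", "03fe1100",
    "03f79100", "03e94e00", "034c5c00", "0399aa00",
)]


def candidate_views(raw: bytes) -> dict:
    """Slice one 4-byte field into several hypothetical interpretations."""
    return {
        "first_byte": raw[0],
        "mid_u16_le": int.from_bytes(raw[1:3], "little"),
        "mid_u16_be": int.from_bytes(raw[1:3], "big"),
        "last_byte": raw[3],
        "u32_le": int.from_bytes(raw, "little"),
    }


# How often each first-byte value occurs across the examples.
first_byte_counts = Counter(raw[0] for raw in FIELD_EXAMPLES)

if __name__ == "__main__":
    for index, raw in enumerate(FIELD_EXAMPLES):
        print(index, raw.hex(" "), candidate_views(raw))
```

Dumping these views next to the TXT row for each interval is one way to spot whether any slice moves with a column such as the Mic values or the interval time.
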
## Fixture corpus

In-repo histogram fixtures (paired binary + ASCII TXT):

```
example-events/histogram/N844L20G.630H  (27 KB, 791 blocks, 792 intervals)
example-events/histogram/N844L21H.2R0H  (22 KB)
example-events/histogram/N844L22A.VT0H  (27 KB)
example-events/histogram/N844L23B.ND0H  ...
example-events/histogram/N844L27U.U30H  ...
example-events/histogram/N844L28V.NA0H  ...
example-events/histogram/N844L6QT.IQ0H  ...
example-events/histogram/N844L6RU.BO0H  ...
example-events/histogram/N844L6SO.6I0H  ...
example-events/histogram/N844L6TP.2R0H  (and more)
```

All from BE12844 (a single MiniMate Plus unit), recorded over
2025-08-10 at 1-minute histogram intervals. All are "noise floor"
events: mostly silent intervals with rare spikes.

Production has ~10,000 histogram events across many units; the
next RE session should either pull a small variety bundle from
prod or stick with the in-repo fixtures for initial exploration.

## Suggested attack plan for next session

1. **Verify the sample[0] = T_peak hypothesis across all 791 blocks
   of N844L20G.630H** to confirm the scale factor isn't a coincidence.
2. **Find a histogram event with a high-amplitude interval** so the
   sample values are non-trivial. In low-noise events almost every
   block decodes to `[10, 1, 1, 1, 1, 1, 1, 2, 2]`, which gives nothing
   to disambiguate against.
3. **Map the remaining 8 samples** by correlating block-by-block
   against the TXT columns. Especially useful: find blocks where
   exactly one channel's peak jumps, since that pinpoints which sample
   slot corresponds to that channel.
4. **Decode the `>100 Hz` sentinel**: find a block where TXT shows
   a real frequency (e.g. `73.1 Hz`) and reverse the binary value.
5. **Investigate the 4-byte variable metadata**, which likely contains
   the per-interval timestamp or some Mic-related value not in the
   9 samples.
6. **Wire into `read_blastware_file()`** alongside the waveform
   codec (try waveform first, fall back to histogram when the
   `00 02 00` preamble is missing).
7. **Update `scripts/backfill_sidecars.py`** to remove the
   `has_samples` short-circuit so histogram `.h5` files regenerate
   too.

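
Step 3 of the plan could start from something like the sketch below, which looks for sample slots that change whenever a given TXT column changes. It is a rough heuristic under stated assumptions: `slots_tracking_column` is a hypothetical helper, the TXT column is assumed to be already parsed into one value per interval, and because TXT values are displayed at a coarser quantum than the binary, a slot is only required to change at least as often as the column does.

```python
from typing import List, Sequence


def slots_tracking_column(
    blocks: Sequence[Sequence[int]], column: Sequence[float]
) -> List[int]:
    """Return sample-slot indices that change on every interval where the column changes."""
    # Use the shorter length so a one-row mismatch (791 blocks vs 792 intervals)
    # does not raise.
    n = min(len(blocks), len(column))
    col_changes = {i for i in range(1, n) if column[i] != column[i - 1]}
    if n == 0 or not col_changes:
        return []  # a constant column cannot discriminate between slots
    matches = []
    for slot in range(len(blocks[0])):
        slot_changes = {
            i for i in range(1, n) if blocks[i][slot] != blocks[i - 1][slot]
        }
        if col_changes <= slot_changes:
            matches.append(slot)
    return matches
```

On noise-floor fixtures most columns barely move, so this is only informative on events with at least a few intervals where a single channel jumps.
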
## Code seam for the eventual decoder

`minimateplus/histogram_codec.py` (to-be-created) should mirror
`minimateplus/waveform_codec.py`:

```python
from typing import Optional


def decode_histogram_body(body: bytes) -> Optional[dict]:
    """Decode a histogram-mode body into per-channel sample arrays.

    Returns ``{"Tran": [...], "Vert": [...], "Long": [...], "MicL": [...]}``
    with each channel's per-interval peak values in ADC counts.
    Returns ``None`` if the body cannot be parsed.
    """
```

Then in `event_file_io.read_blastware_file()`:

```python
# Try the waveform codec first, then fall back to the histogram codec.
decoded = decode_waveform_v2(body)
if decoded is None:
    decoded = decode_histogram_body(body)
if decoded is None:
    log.warning(...)
    samples = {"Tran": [], ...}
else:
    samples = decoded_to_adc_counts(decoded)
```

## Related work

- Waveform body codec: `docs/waveform_codec_re_status.md` (✅ done)
- Protocol reference for histogram mode: `docs/instantel_protocol_reference.md §7.6.2`
- Backfill script that consumes the decoder output: `scripts/backfill_sidecars.py`