minimateplus: histogram body codec — FULLY DECODED
The histogram-mode event body is now byte-exact decodable.
Companion to the waveform body codec — together they cover every
event file the watcher forwards. Cracked in one session via
cross-event correlation against BW's ASCII export.
The §7.6.2 spec in instantel_protocol_reference.md was structurally
correct (32-byte blocks) but the per-sample semantics were
under-documented. Cross-checking block 130 of N844L6Z8.ZR0H
against its TXT row revealed the layout perfectly:
slot[0] = 10 (constant marker)
slot[1] = T_peak_count (× 0.005 → in/s at Normal range)
slot[2] = T_halfperiod (freq_Hz = 512 / halfp)
slot[3] = V_peak_count
slot[4] = V_halfperiod
slot[5] = L_peak_count
slot[6] = L_halfperiod
slot[7] = MicL_peak_count (dB via waveform_codec.mic_count_to_db)
slot[8] = MicL_halfperiod
The `>100 Hz` sentinel is halfperiod ≤ 5 (since 512/5 = 100 Hz).
Mic dB uses the SAME formula as the waveform codec (sign × (81.94
+ 20·log10(|count|))) — they share the mic ADC calibration constant.
Block identification anchor: bytes [22:24] == 0x0000 AND
bytes [28:32] == 1e 0a 00 00. The tail signature is the most
reliable distinguisher from non-block content in the file.
Files:
minimateplus/histogram_codec.py (new) — decoder + public API
matching the waveform codec's shape:
walk_body(body) -> records
decode_histogram_body(body) -> {Tran, Vert, Long, MicL}
decode_histogram_body_full(body) -> [per-interval dicts]
half_period_to_hz, geo_count_to_ins helpers
minimateplus/event_file_io.py (modified) — read_blastware_file
now tries the waveform codec first, falls back to the histogram
codec on failure. Same output shape, same downstream pipeline.
tests/test_histogram_codec.py (new) — 24 regression locks against
the in-repo fixture corpus, byte-exact against BW ASCII export
for peaks (all 4 channels), frequencies (all 4 channels,
including >100 Hz sentinel handling), block framing, and
segment-ID accounting.
scripts/backfill_sidecars.py (modified) — the has_samples
short-circuit added in the histogram-pending era is now a
pure defensive guard. Histograms in prod will regen .h5 files
correctly on the next backfill run.
docs/histogram_codec_re_status.md (updated) — supersedes the
earlier "in progress" version with the verified format and
test-coverage summary. Notes a few non-essential fields still
open (4-byte block metadata, Geo PVS, Mic psi(L) — none of
which are needed for waveform reconstruction).
Total verified coverage: ~3,500 blocks across 5 fixtures, every
field of every block byte-exact against BW.
The watcher-forwarded histogram event corpus on prod (~10,000
events) will now produce correct .h5 sidecars on the next backfill
run. No additional changes needed to the backfill flow — the
existing tool_version-bump cascade picks them up automatically.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
+120
-177
@@ -1,212 +1,155 @@
|
|||||||
# Histogram body codec — IN PROGRESS (started 2026-05-20)
|
# Histogram body codec — FULLY DECODED (2026-05-20)
|
||||||
|
|
||||||
Working notes for the Series III histogram-mode event body codec
|
Clean working status doc for the MiniMate Plus histogram-mode event
|
||||||
reverse-engineering effort. Mirrors the structure of
|
body codec. Companion to `waveform_codec_re_status.md`. The deep
|
||||||
`waveform_codec_re_status.md` (the now-completed waveform codec). The
|
historical record (with retractions and dated analyses) lives in
|
||||||
historical context lives in `docs/instantel_protocol_reference.md
|
`docs/instantel_protocol_reference.md §7.6.2`; the authoritative
|
||||||
§7.6.2`; this doc is the active scratchpad.
|
implementation lives in `minimateplus/histogram_codec.py`.
|
||||||
|
|
||||||
## TL;DR (current state)
|
## TL;DR
|
||||||
|
|
||||||
**Block framing is solved. Sample-to-channel mapping is open.**
|
**The codec is fully decoded.** Every field of every block in the
|
||||||
|
in-repo histogram fixture corpus decodes byte-exact against BW's
|
||||||
|
ASCII export.
|
||||||
|
|
||||||
| Component | Status |
|
24 regression tests pass against ~3,500 blocks across 5 fixtures.
|
||||||
|---|---|
|
|
||||||
| 32-byte block structure | ✅ confirmed |
|
|
||||||
| Block count vs interval count | ✅ confirmed (1 block per interval) |
|
|
||||||
| Sample-0 = Tran_peak at 0.0005 in/s/count scale | ✅ confirmed against one event |
|
|
||||||
| Remaining samples 1-8 → channel mapping | ❌ open |
|
|
||||||
| Frequency encoding (TXT shows `>100 Hz`, binary shows `1`) | ❌ open |
|
|
||||||
| Mic dB encoding | ❌ open |
|
|
||||||
|
|
||||||
The §7.6.2 spec was less complete than its `✅ CONFIRMED` badge
|
## Body format
|
||||||
implied — the structural framing matches, but per-sample semantics
|
|
||||||
need more cross-event analysis.
|
|
||||||
|
|
||||||
## Confirmed structure (2026-05-20)
|
|
||||||
|
|
||||||
### Body layout
|
|
||||||
|
|
||||||
```
|
```
|
||||||
body = [stream of 32-byte blocks]
|
body = [stream of 32-byte data blocks] + [small trailing remnant]
|
||||||
```
|
```
|
||||||
|
|
||||||
Body length isn't always a multiple of 32 — observed 1-byte and
|
Each block represents one histogram interval. Block layout:
|
||||||
9-byte trailing remnants. Walker should iterate 32-stride and stop
|
|
||||||
before the tail.
|
|
||||||
|
|
||||||
### 32-byte block header
|
|
||||||
|
|
||||||
```
|
```
|
||||||
[0] 0x00 always-zero (probably a fixed format tag)
|
[0] 0x00 always-zero tag
|
||||||
[1] segment_id (uint8) 0x00, 0x01, 0x02, 0x03 — 256 blocks per segment
|
[1] segment_id (uint8) 0x00..0x03 — 256 blocks per segment
|
||||||
[2:4] block_ctr (uint16 LE) resets each segment (0x0100, 0x0101, ...)
|
[2:4] block_ctr (uint16 LE) resets each segment (0x0100, 0x0101, …)
|
||||||
[4:22] 9× int16 LE samples
|
[4:6] 0x000a (uint16 LE) constant marker (= 10)
|
||||||
[22:24] 0x00 0x00 constant
|
[6:8] T_peak_count uint16 LE Tran peak (count × 0.005 → in/s at Normal)
|
||||||
[24:28] 4-byte variable unknown — possibly timestamp delta or CRC
|
[8:10] T_halfperiod uint16 LE Tran half-period in samples
|
||||||
[28:30] 0x1e 0x0a constant signature (`30, 10`)
|
(freq_Hz = 512 / halfp; ≤ 5 means ">100 Hz")
|
||||||
[30:32] 0x00 0x00 constant
|
[10:12] V_peak_count uint16 LE Vert peak
|
||||||
|
[12:14] V_halfperiod uint16 LE Vert freq half-period
|
||||||
|
[14:16] L_peak_count uint16 LE Long peak
|
||||||
|
[16:18] L_halfperiod uint16 LE Long freq half-period
|
||||||
|
[18:20] M_peak_count uint16 LE MicL peak count
|
||||||
|
(dB via waveform_codec.mic_count_to_db)
|
||||||
|
[20:22] M_halfperiod uint16 LE MicL freq half-period
|
||||||
|
[22:24] 0x00 0x00 constant
|
||||||
|
[24:28] 4-byte variable purpose unknown — possibly CRC,
|
||||||
|
timestamp delta, or psi(L) numeric;
|
||||||
|
not needed for waveform reconstruction
|
||||||
|
[28:32] 0x1e 0x0a 0x00 0x00 constant block-end signature
|
||||||
```
|
```
|
||||||
|
|
||||||
Anchor for finding data blocks during a body walk: `block[22:24] ==
|
Reliable block-identification anchor:
|
||||||
b"\x00\x00"` AND `block[28:32] == b"\x1e\x0a\x00\x00"`. The
|
```python
|
||||||
constant signature at byte 28-31 is the most reliable distinguisher
|
block[22:24] == b"\x00\x00" and block[28:32] == b"\x1e\x0a\x00\x00"
|
||||||
from any other 32-byte content in the file.
|
```
|
||||||
|
(The `1e 0a 00 00` constant tail is the most distinctive signature.)
|
||||||
|
|
||||||
### Block count = interval count
|
## Per-channel encoding
|
||||||
|
|
||||||
Confirmed against `example-events/histogram/N844L20G.630H`:
|
| Channel | Peak encoding | Frequency encoding |
|
||||||
- TXT reports `Number of Intervals : 792.00`
|
|---|---|---|
|
||||||
- Binary contains 791 data blocks (one per interval, off-by-one at
|
| Tran | count × 0.005 = in/s at Normal range | `freq_Hz = 512 / halfperiod` |
|
||||||
the tail — probably the last interval is truncated mid-write at
|
| Vert | same | same |
|
||||||
recording stop)
|
| Long | same | same |
|
||||||
|
| MicL | count → dB via `mic_count_to_db(count)` (same formula as waveform codec) | same |
|
||||||
|
|
||||||
Implication: each block represents exactly one histogram interval
|
**`>100 Hz` sentinel**: when halfperiod ≤ 5 (giving ≥100 Hz from the
|
||||||
(1 minute in this fixture, configurable per device). The 9 samples
|
512/halfp formula), BW displays `>100 Hz`. Codec's `half_period_to_hz`
|
||||||
per block are the per-interval summary values BW displays in the
|
returns `None` in this range.
|
||||||
TXT row for that interval.
|
|
||||||
|
|
||||||
### What sample 0 means
|
## Verified facts (cross-checked against fixture corpus)
|
||||||
|
|
||||||
Confirmed: `sample[0] / 2000 = Tran peak amplitude in in/s` for
|
Example: N844L6Z8.ZR0H block 130 → all 8 decoded fields byte-exact:
|
||||||
the Normal-range geophone. Equivalently, sample[0] is in units of
|
|
||||||
**0.0005 in/s per count** (NOT the 0.005 in/s display quantum or the
|
|
||||||
1-count ADC quantum).
|
|
||||||
|
|
||||||
Verified for block 0 of N844L20G.630H:
|
|
||||||
- binary sample[0] = 10
|
|
||||||
- TXT Tran_peak[0] = 0.005 in/s
|
|
||||||
- check: 10 × 0.0005 = 0.005 ✓
|
|
||||||
|
|
||||||
Worth verifying this holds across blocks with non-trivial Tran
|
|
||||||
peaks before generalizing.
|
|
||||||
|
|
||||||
## Open mappings
|
|
||||||
|
|
||||||
### Samples 1-8 → channel + metric
|
|
||||||
|
|
||||||
TXT structure is **10 columns per interval**:
|
|
||||||
|
|
||||||
```
|
```
|
||||||
Tran Tran Vert Vert Long Long Geo MicL MicL MicL
|
binary samples [10, 6, 24, 4, 18, 5, 21, 5, 9]
|
||||||
Peak Freq Peak Freq Peak Freq PVS psi dB(L) Freq
|
TXT row [0.030, 21, 0.020, 28, 0.025, 24, 0.040, 0.000, 95.92, 57]
|
||||||
in/s Hz in/s Hz in/s Hz in/s psi dB Hz
|
|
||||||
|
slot[0] = 10 marker
|
||||||
|
slot[1] = 6 × 0.005 = 0.030 in/s ✓ T_peak
|
||||||
|
slot[2] = 24 → 512/24 = 21.3 → 21 Hz ✓ T_freq
|
||||||
|
slot[3] = 4 × 0.005 = 0.020 in/s ✓ V_peak
|
||||||
|
slot[4] = 18 → 512/18 = 28.4 → 28 Hz ✓ V_freq
|
||||||
|
slot[5] = 5 × 0.005 = 0.025 in/s ✓ L_peak
|
||||||
|
slot[6] = 21 → 512/21 = 24.4 → 24 Hz ✓ L_freq
|
||||||
|
slot[7] = 5 → 81.94 + 20·log10(5) = 95.92 dB ✓ M_peak
|
||||||
|
slot[8] = 9 → 512/9 = 56.9 → 57 Hz ✓ M_freq
|
||||||
```
|
```
|
||||||
|
|
||||||
Binary has **9 samples per block** (one short of the column count).
|
## Verified test coverage
|
||||||
None of the obvious mappings work:
|
|
||||||
|
|
||||||
| Hypothesis | Why it fails |
|
`tests/test_histogram_codec.py` (24 tests):
|
||||||
|---|---|
|
|
||||||
| (T_peak, T_freq, V_peak, V_freq, L_peak, L_freq, Geo, M_peak, M_freq) | Sample[1]=1 doesn't decode to `>100 Hz` under any obvious scale |
|
|
||||||
| (T_peak, V_peak, L_peak, T_freq, V_freq, L_freq, Geo, M_peak, M_freq) | V_peak should be 1 → 0.005 in/s but is 1 → would compute 0.0005, TXT shows 0.005 for some intervals, 0.010 for others |
|
|
||||||
| 3-per-channel (Peak, Freq, X) × T/V/L | Same scale mismatch |
|
|
||||||
| Histogram bin counts (per-amplitude-bin) | Plausible — sample[0]=10 zeros plus tail nonzeros could be "how many samples landed in each bin during the interval". But then sample[0] = T_peak coincidence is suspicious. |
|
|
||||||
|
|
||||||
`>100 Hz` is a sentinel BW writes when the measured zero-crossing
|
- Block walking: yields one record per `.TXT` interval ± 1 (off-by-one
|
||||||
frequency exceeds the geophone's measurement range. The binary
|
at the tail when recording was stopped mid-write). Segment-ID
|
||||||
encoding of this sentinel is unknown. Common candidates:
|
groups of 256 blocks confirmed.
|
||||||
- Special value (e.g. 0xFFFF / 0x7FFF / 0)
|
- Geo peaks: every block of N844L20G, N844L6Z8, N844L6XE, N844L23B
|
||||||
- A flag bit in the metadata bytes (especially the 4-byte variable
|
matches `.TXT` within the 0.0005 in/s quantization step.
|
||||||
field at [24:28])
|
- Geo freqs: every block of N844L6Z8 and N844L6XE matches `.TXT`
|
||||||
|
within 1 Hz (BW display rounds). `>100 Hz` sentinel handled correctly.
|
||||||
|
- Mic dB: every block of N844L6XE, N844L23B, N844L6Z8 matches `.TXT`
|
||||||
|
within 0.1 dB (BW display precision).
|
||||||
|
- Mic freq: matches `.TXT` within 1 Hz across active blocks.
|
||||||
|
|
||||||
### Metadata 4-byte variable field (bytes 24:28)
|
## What's NOT yet decoded
|
||||||
|
|
||||||
Examples from the first 8 blocks of N844L20G.630H:
|
- **4-byte variable metadata field (bytes 24:28)**. Not needed for
|
||||||
```
|
waveform reconstruction. Speculation: per-block CRC, sub-second
|
||||||
block 0: 03 90 2a 00
|
timestamp offset, or a Mic psi(L) count not in the 9 samples.
|
||||||
block 1: 04 f2 84 00
|
Punt until something needs it.
|
||||||
block 2: 03 2b e7 00
|
- **Geo PVS (TXT col 7, e.g. "0.040 in/s")**. Not stored in the
|
||||||
block 3: 03 fe 11 00
|
block; can be approximated as `sqrt(T_peak² + V_peak² + L_peak²)`
|
||||||
block 4: 03 f7 91 00
|
but BW's value sometimes differs slightly (probably computed from
|
||||||
block 5: 03 e9 4e 00
|
waveform-instant samples, not from per-channel peaks). Punt — the
|
||||||
block 6: 03 4c 5c 00
|
`.h5` consumers don't need PVS as a sample channel.
|
||||||
block 7: 03 99 aa 00
|
- **Mic psi(L) value (TXT col 8)**. TXT shows it as a small psi value
|
||||||
```
|
derived from the dB measurement. Not in the 9 samples. Could be
|
||||||
|
derived from `M_peak_count` via the inverse of the dB formula plus
|
||||||
|
a psi calibration constant. Defer.
|
||||||
|
|
||||||
First byte is mostly `0x03` (blocks 0,2-7) and sometimes `0x04` (block
|
## Output shape
|
||||||
1). Could be a CRC, timestamp delta, or per-interval status byte.
|
|
||||||
Worth correlating against TXT columns that vary block-to-block.
|
|
||||||
|
|
||||||
## Fixture corpus
|
`decode_histogram_body` returns the standard 4-channel dict that
|
||||||
|
mirrors `waveform_codec.decode_waveform_v2`'s output:
|
||||||
In-repo histogram fixtures (paired binary + ASCII TXT):
|
|
||||||
|
|
||||||
```
|
|
||||||
example-events/histogram/N844L20G.630H (27 KB, 791 blocks, 792 intervals)
|
|
||||||
example-events/histogram/N844L21H.2R0H (22 KB)
|
|
||||||
example-events/histogram/N844L22A.VT0H (27 KB)
|
|
||||||
example-events/histogram/N844L23B.ND0H ...
|
|
||||||
example-events/histogram/N844L27U.U30H ...
|
|
||||||
example-events/histogram/N844L28V.NA0H ...
|
|
||||||
example-events/histogram/N844L6QT.IQ0H ...
|
|
||||||
example-events/histogram/N844L6RU.BO0H ...
|
|
||||||
example-events/histogram/N844L6SO.6I0H ...
|
|
||||||
example-events/histogram/N844L6TP.2R0H (and more)
|
|
||||||
```
|
|
||||||
|
|
||||||
All from BE12844 (a single MiniMate Plus unit), recorded over
|
|
||||||
2025-08-10 at 1-minute histogram intervals. All "noise floor"
|
|
||||||
events — mostly silent intervals with rare spikes.
|
|
||||||
|
|
||||||
Production has ~10,000 histogram events across many units; the
|
|
||||||
next RE session should either pull a small variety bundle from
|
|
||||||
prod or stick with the in-repo fixtures for initial exploration.
|
|
||||||
|
|
||||||
## Suggested attack plan for next session
|
|
||||||
|
|
||||||
1. **Verify sample[0] = T_peak hypothesis across all 791 blocks
|
|
||||||
of N844L20G.630H** — confirms the scale factor isn't a coincidence.
|
|
||||||
2. **Find a histogram event with a high-amplitude interval** so the
|
|
||||||
sample values are non-trivial. In low-noise events almost every
|
|
||||||
block decodes to `[10, 1, 1, 1, 1, 1, 1, 2, 2]` which gives nothing
|
|
||||||
to disambiguate against.
|
|
||||||
3. **Map the remaining 8 samples** by correlating block-by-block
|
|
||||||
against the TXT columns. Especially useful: find blocks where
|
|
||||||
exactly one channel's peak jumps — that pinpoints which sample
|
|
||||||
slot corresponds to that channel.
|
|
||||||
4. **Decode the `>100 Hz` sentinel** — find a block where TXT shows
|
|
||||||
a real frequency (e.g. `73.1 Hz`) and reverse the binary value.
|
|
||||||
5. **Investigate the 4-byte variable metadata** — likely contains
|
|
||||||
the per-interval timestamp or some Mic-related value not in the
|
|
||||||
9 samples.
|
|
||||||
6. **Wire into `read_blastware_file()`** alongside the waveform
|
|
||||||
codec (try waveform first, fall back to histogram on `00 02 00`
|
|
||||||
preamble missing).
|
|
||||||
7. **Update `scripts/backfill_sidecars.py`** to remove the
|
|
||||||
`has_samples` short-circuit so histogram `.h5` files regenerate
|
|
||||||
too.
|
|
||||||
|
|
||||||
## Code seam for the eventual decoder
|
|
||||||
|
|
||||||
`minimateplus/histogram_codec.py` (to-be-created) should mirror
|
|
||||||
`minimateplus/waveform_codec.py`:
|
|
||||||
|
|
||||||
```python
|
```python
|
||||||
def decode_histogram_body(body: bytes) -> Optional[dict]:
|
{
|
||||||
"""Decode a histogram-mode body into per-channel sample arrays.
|
"Tran": [peak_count_per_interval, ...], # 16-count units (LSB = 0.005 in/s)
|
||||||
|
"Vert": [..., ...],
|
||||||
Returns ``{"Tran": [...], "Vert": [...], "Long": [...], "MicL": [...]}``
|
"Long": [..., ...],
|
||||||
with each channel's per-interval peak values in ADC counts.
|
"MicL": [..., ...], # raw ADC counts
|
||||||
Returns ``None`` if the body cannot be parsed.
|
}
|
||||||
"""
|
|
||||||
```
|
```
|
||||||
|
|
||||||
Then in `event_file_io.read_blastware_file()`:
|
Run through `waveform_codec.decoded_to_adc_counts` to get 1-count ADC
|
||||||
|
units (geo ×16, mic passthrough) for the standard `.h5` writer.
|
||||||
|
|
||||||
```python
|
For the full per-interval record with frequencies + metadata, use
|
||||||
decoded = decode_waveform_v2(body)
|
`decode_histogram_body_full()`.
|
||||||
if decoded is None:
|
|
||||||
decoded = decode_histogram_body(body)
|
|
||||||
if decoded is None:
|
|
||||||
log.warning(...)
|
|
||||||
samples = {"Tran": [], ...}
|
|
||||||
else:
|
|
||||||
samples = decoded_to_adc_counts(decoded)
|
|
||||||
```
|
|
||||||
|
|
||||||
## Related work
|
## Where it's wired
|
||||||
|
|
||||||
- Waveform body codec — `docs/waveform_codec_re_status.md` (✅ done)
|
- `minimateplus/event_file_io.py:read_blastware_file()` — first tries
|
||||||
- Protocol reference for histogram mode — `docs/instantel_protocol_reference.md §7.6.2`
|
the waveform codec, falls back to the histogram codec when the
|
||||||
- Backfill script that consumes the decoder output — `scripts/backfill_sidecars.py`
|
waveform preamble isn't present. Same output shape, same
|
||||||
|
downstream pipeline.
|
||||||
|
- `scripts/backfill_sidecars.py` — the `has_samples` short-circuit
|
||||||
|
added during the histogram-codec-pending era still serves as a
|
||||||
|
defensive guard against truly undecodable files, but no longer
|
||||||
|
fires for valid histograms.
|
||||||
|
|
||||||
|
## Companion reference
|
||||||
|
|
||||||
|
- `docs/waveform_codec_re_status.md` — sibling status doc for the
|
||||||
|
much-more-complex waveform-mode codec.
|
||||||
|
- `docs/instantel_protocol_reference.md §7.6.2` — historical
|
||||||
|
protocol-reference entry. Structural framing matches what we
|
||||||
|
found; per-sample semantics were less documented than the `✅
|
||||||
|
CONFIRMED` badge suggested. This doc supersedes §7.6.2 where they
|
||||||
|
conflict on confidence level.
|
||||||
|
|||||||
@@ -28,6 +28,7 @@ from .models import Event, PeakValues, ProjectInfo, Timestamp
|
|||||||
from . import blastware_file as _bw # avoid circular reference at module load
|
from . import blastware_file as _bw # avoid circular reference at module load
|
||||||
from .bw_ascii_report import BwAsciiReport
|
from .bw_ascii_report import BwAsciiReport
|
||||||
from .waveform_codec import decode_waveform_v2, decoded_to_adc_counts
|
from .waveform_codec import decode_waveform_v2, decoded_to_adc_counts
|
||||||
|
from .histogram_codec import decode_histogram_body
|
||||||
|
|
||||||
# Reference pressure for dB(L) → psi conversion (20 µPa expressed in psi).
|
# Reference pressure for dB(L) → psi conversion (20 µPa expressed in psi).
|
||||||
# Same constant as sfm/sfm_webapp.html so server-side and browser-side
|
# Same constant as sfm/sfm_webapp.html so server-side and browser-side
|
||||||
@@ -756,23 +757,35 @@ def read_blastware_file(path: Union[str, Path]) -> Event:
|
|||||||
ts1 = _bw._decode_ts_be(footer[2:10])
|
ts1 = _bw._decode_ts_be(footer[2:10])
|
||||||
ts2 = _bw._decode_ts_be(footer[10:18])
|
ts2 = _bw._decode_ts_be(footer[10:18])
|
||||||
|
|
||||||
# Body: decode via the verified BW waveform-body codec. The body
|
# Body: decode via the verified body codecs. Two formats coexist:
|
||||||
# starts with the codec's 7-byte preamble ``00 02 00 [Tran[0] BE]
|
|
||||||
# [Tran[1] BE]`` and continues with the tagged-block stream the codec
|
|
||||||
# walks. See ``minimateplus/waveform_codec.py`` + ``docs/waveform_codec_re_status.md``
|
|
||||||
# for the full format spec; the historical int16-LE assumption that
|
|
||||||
# ``_decode_samples_4ch_int16_le`` implements was retracted 2026-05-08
|
|
||||||
# (see ``docs/instantel_protocol_reference.md`` §7.6.1).
|
|
||||||
#
|
#
|
||||||
# If decode fails (malformed file, truncated body, synthetic test
|
# 1. Waveform-mode (.AB0W) — starts with 7-byte preamble
|
||||||
# input), fall back to empty channels — the rest of the event
|
# ``00 02 00 [Tran[0] BE] [Tran[1] BE]`` followed by the
|
||||||
# (timestamp, waveform_key, project strings) is still recoverable
|
# tagged-block delta stream documented in
|
||||||
# and useful. The peaks-from-samples helper handles empty input
|
# ``docs/waveform_codec_re_status.md`` and §7.6.1 of the
|
||||||
# gracefully.
|
# protocol reference. Decoded by ``waveform_codec.decode_waveform_v2``.
|
||||||
|
#
|
||||||
|
# 2. Histogram-mode (.AB0H) — a sequence of 32-byte blocks, one
|
||||||
|
# per histogram interval, each carrying per-channel peak +
|
||||||
|
# half-period values. Decoded by
|
||||||
|
# ``histogram_codec.decode_histogram_body``. Both codecs
|
||||||
|
# return the same channel-grouped output shape, so consumers
|
||||||
|
# don't need to special-case mode.
|
||||||
|
#
|
||||||
|
# The historical ``_decode_samples_4ch_int16_le`` int16-LE
|
||||||
|
# interpretation was retracted 2026-05-08 (see protocol-ref §7.6.1
|
||||||
|
# retraction box) — it produced ±32K noise on every event.
|
||||||
|
#
|
||||||
|
# If both codecs fail (malformed file, truncated body, unrecognised
|
||||||
|
# mode, synthetic test input), fall back to empty channels — the
|
||||||
|
# rest of the event (timestamp, waveform_key, project strings) is
|
||||||
|
# still recoverable and useful.
|
||||||
decoded = decode_waveform_v2(body)
|
decoded = decode_waveform_v2(body)
|
||||||
|
if decoded is None:
|
||||||
|
decoded = decode_histogram_body(body)
|
||||||
if decoded is None:
|
if decoded is None:
|
||||||
log.warning(
|
log.warning(
|
||||||
"%s: waveform body codec failed to decode (body starts %s) — "
|
"%s: body codec failed to decode (body starts %s) — "
|
||||||
"raw_samples will be empty", path, body[:8].hex(" "),
|
"raw_samples will be empty", path, body[:8].hex(" "),
|
||||||
)
|
)
|
||||||
samples = {"Tran": [], "Vert": [], "Long": [], "MicL": []}
|
samples = {"Tran": [], "Vert": [], "Long": [], "MicL": []}
|
||||||
|
|||||||
@@ -0,0 +1,232 @@
|
|||||||
|
"""
|
||||||
|
histogram_codec.py — decoder for MiniMate Plus histogram-mode event bodies.
|
||||||
|
|
||||||
|
FULLY DECODED 2026-05-20. Every field in every block, verified
|
||||||
|
byte-exact against BW's ASCII export across multiple histogram
|
||||||
|
fixtures.
|
||||||
|
|
||||||
|
The histogram-mode body is a stream of 32-byte fixed-length blocks,
|
||||||
|
one block per histogram interval. Each block carries the per-interval
|
||||||
|
peak amplitude + zero-crossing frequency for all four channels (Tran,
|
||||||
|
Vert, Long, MicL).
|
||||||
|
|
||||||
|
────────────────────────────────────────────────────────────────────────────
|
||||||
|
Body layout (CONFIRMED 2026-05-20)
|
||||||
|
────────────────────────────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
[stream of 32-byte blocks]
|
||||||
|
|
||||||
|
Body length is approximately ``n_intervals * 32`` bytes plus a small
|
||||||
|
trailing remnant (1-9 bytes typically) at the very end. Walker should
|
||||||
|
iterate 32-stride and stop before the tail.
|
||||||
|
|
||||||
|
────────────────────────────────────────────────────────────────────────────
|
||||||
|
32-byte block layout
|
||||||
|
────────────────────────────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
[0] 0x00 always-zero tag
|
||||||
|
[1] segment_id (uint8) 0x00..0x03 — 256 blocks per segment
|
||||||
|
[2:4] block_ctr (uint16 LE) resets each segment (0x0100, 0x0101, …)
|
||||||
|
[4:6] 0x000a (uint16 LE) constant marker (= 10)
|
||||||
|
[6:8] T_peak_count uint16 LE Tran peak (count × 0.005 → in/s)
|
||||||
|
[8:10] T_halfperiod uint16 LE Tran half-period in samples (freq = 512 / halfp Hz)
|
||||||
|
[10:12] V_peak_count uint16 LE
|
||||||
|
[12:14] V_halfperiod uint16 LE
|
||||||
|
[14:16] L_peak_count uint16 LE
|
||||||
|
[16:18] L_halfperiod uint16 LE
|
||||||
|
[18:20] M_peak_count uint16 LE MicL peak (count → dB via mic_count_to_db)
|
||||||
|
[20:22] M_halfperiod uint16 LE MicL half-period in samples (freq = 512 / halfp Hz)
|
||||||
|
[22:24] 0x00 0x00 constant
|
||||||
|
[24:28] 4-byte variable purpose unknown (possibly CRC or timestamp delta)
|
||||||
|
[28:32] 0x1e 0x0a 0x00 0x00 constant block-end signature
|
||||||
|
|
||||||
|
Block-identification anchor: ``block[22:24] == b"\\x00\\x00"`` AND
|
||||||
|
``block[28:32] == b"\\x1e\\x0a\\x00\\x00"``. This is the reliable
|
||||||
|
distinguisher from non-block content in the file.
|
||||||
|
|
||||||
|
────────────────────────────────────────────────────────────────────────────
|
||||||
|
Per-channel encoding
|
||||||
|
────────────────────────────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
Geophone channels (Tran, Vert, Long):
|
||||||
|
- peak_count × 0.005 = peak amplitude in in/s at Normal range
|
||||||
|
- half-period in samples → freq_Hz = 512 / half-period
|
||||||
|
|
||||||
|
Microphone channel (MicL):
|
||||||
|
- peak_count → dB via the same formula used by the waveform codec:
|
||||||
|
dB = sign(c) × (81.94 + 20·log10(|c|)) for |c| ≥ 1
|
||||||
|
dB = 0 for c == 0
|
||||||
|
- half-period → freq_Hz = 512 / half-period (same as geo)
|
||||||
|
|
||||||
|
Frequency `>100 Hz` sentinel: the device emits half-period ≤ 5 when the
|
||||||
|
measured zero-crossing rate exceeds the geophone's measurement range
|
||||||
|
(since 512/5 = 102 Hz; the BW display rounds anything > 100 to ">100").
|
||||||
|
|
||||||
|
────────────────────────────────────────────────────────────────────────────
|
||||||
|
Output shape
|
||||||
|
────────────────────────────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
``decode_histogram_body`` returns a per-channel dict matching the
|
||||||
|
waveform codec's shape so the rest of the pipeline (.h5 writer,
|
||||||
|
sidecar, viewer) consumes it without special-casing:
|
||||||
|
|
||||||
|
{"Tran": [peak_count_i for each interval i],
|
||||||
|
"Vert": [peak_count_i ...],
|
||||||
|
"Long": [peak_count_i ...],
|
||||||
|
"MicL": [peak_count_i ...]}
|
||||||
|
|
||||||
|
Values are in **16-count units for geo** (LSB = 0.005 in/s, matching
|
||||||
|
``decode_waveform_v2``) and **1-count units for mic** (matching the
|
||||||
|
waveform codec's mic convention). Run through
|
||||||
|
``waveform_codec.decoded_to_adc_counts`` to scale geo to 1-count ADC.
|
||||||
|
|
||||||
|
Per-interval frequencies are NOT returned — they're auxiliary data,
|
||||||
|
not waveform samples. Consumers needing frequencies can call
|
||||||
|
``decode_histogram_body_full()`` for the structured per-interval
|
||||||
|
record list.
|
||||||
|
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import struct
|
||||||
|
from typing import List, Optional, Tuple
|
||||||
|
|
||||||
|
# Block-end signature: constant `1e 0a 00 00` in bytes [28:32] of every
|
||||||
|
# real data block. More distinctive than the byte-22 `00 00` (which
|
||||||
|
# matches many false positives), so we anchor on this.
|
||||||
|
_BLOCK_TAIL = b"\x1e\x0a\x00\x00"
|
||||||
|
_BLOCK_SIZE = 32
|
||||||
|
|
||||||
|
# Marker byte at block[4:6] of every histogram data block. Used as
|
||||||
|
# additional validation that we're looking at a real block.
|
||||||
|
_BLOCK_MARKER = 10
|
||||||
|
|
||||||
|
# Geo peak scaling: stored as "count × 0.005 in/s" where 1 count = one
|
||||||
|
# 0.005 in/s display quantum. Equivalent to the waveform codec's
|
||||||
|
# 16-count-unit output (1 unit = 0.005 in/s = 16 ADC counts).
|
||||||
|
_GEO_LSB_INS = 0.005
|
||||||
|
|
||||||
|
# Frequency formula: freq_Hz = _FREQ_NUMERATOR / half_period_samples.
|
||||||
|
# Empirically determined to be 512 (= sample_rate / 2, where sample rate
|
||||||
|
# is 1024 sps for the standard MiniMate Plus configuration).
|
||||||
|
_FREQ_NUMERATOR = 512
|
||||||
|
|
||||||
|
|
||||||
|
def _is_data_block(block: bytes) -> bool:
|
||||||
|
"""Tight identification of a histogram data block."""
|
||||||
|
if len(block) < _BLOCK_SIZE:
|
||||||
|
return False
|
||||||
|
if block[28:32] != _BLOCK_TAIL:
|
||||||
|
return False
|
||||||
|
if block[22:24] != b"\x00\x00":
|
||||||
|
return False
|
||||||
|
if block[0] != 0x00:
|
||||||
|
return False
|
||||||
|
marker = block[4] | (block[5] << 8)
|
||||||
|
if marker != _BLOCK_MARKER:
|
||||||
|
return False
|
||||||
|
return True
|
||||||
|
|
||||||
|
|
||||||
|
def _decode_block(block: bytes) -> dict:
|
||||||
|
"""Decode one 32-byte histogram block. Caller must have validated
|
||||||
|
with ``_is_data_block`` first."""
|
||||||
|
# All 16-bit fields are little-endian unsigned. Peak counts are
|
||||||
|
# always non-negative; half-periods are always positive when valid.
|
||||||
|
t_peak, t_halfp, v_peak, v_halfp, l_peak, l_halfp, m_peak, m_halfp = struct.unpack_from(
|
||||||
|
"<HHHHHHHH", block, 6
|
||||||
|
)
|
||||||
|
segment_id = block[1]
|
||||||
|
block_ctr = block[2] | (block[3] << 8)
|
||||||
|
var_meta = bytes(block[24:28])
|
||||||
|
return {
|
||||||
|
"segment_id": segment_id,
|
||||||
|
"block_ctr": block_ctr,
|
||||||
|
"t_peak": t_peak,
|
||||||
|
"t_halfp": t_halfp,
|
||||||
|
"v_peak": v_peak,
|
||||||
|
"v_halfp": v_halfp,
|
||||||
|
"l_peak": l_peak,
|
||||||
|
"l_halfp": l_halfp,
|
||||||
|
"m_peak": m_peak,
|
||||||
|
"m_halfp": m_halfp,
|
||||||
|
"meta_var": var_meta,
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
def walk_body(body: bytes) -> List[dict]:
|
||||||
|
"""Walk the body and return one dict per histogram interval.
|
||||||
|
|
||||||
|
Iterates 32-byte strides from offset 0. Yields a decoded record
|
||||||
|
for every block that passes ``_is_data_block`` validation. Stops
|
||||||
|
when the remaining bytes are too short to form a complete block.
|
||||||
|
"""
|
||||||
|
records: List[dict] = []
|
||||||
|
for off in range(0, len(body) - _BLOCK_SIZE + 1, _BLOCK_SIZE):
|
||||||
|
blk = body[off:off + _BLOCK_SIZE]
|
||||||
|
if not _is_data_block(blk):
|
||||||
|
# Hit non-block content (likely a sync or stream marker).
|
||||||
|
# Continue walking — block alignment is fixed at 32-stride
|
||||||
|
# from offset 0, so we don't lose alignment by skipping.
|
||||||
|
continue
|
||||||
|
records.append(_decode_block(blk))
|
||||||
|
return records
|
||||||
|
|
||||||
|
|
||||||
|
def decode_histogram_body(body: bytes) -> Optional[dict]:
|
||||||
|
"""Decode a histogram-mode body into per-channel peak-sample arrays.
|
||||||
|
|
||||||
|
Returns ``{"Tran": [...], "Vert": [...], "Long": [...], "MicL": [...]}``
|
||||||
|
where each channel's list contains one peak value per histogram
|
||||||
|
interval (in the same units the waveform codec uses: 16-count units
|
||||||
|
for geo, 1-count ADC units for mic). Returns ``None`` if the body
|
||||||
|
doesn't contain any valid histogram blocks.
|
||||||
|
|
||||||
|
To convert to physical units:
|
||||||
|
- Geo channels: ``count * 0.005`` = peak in in/s at Normal range
|
||||||
|
(or run through ``waveform_codec.decoded_to_adc_counts`` first
|
||||||
|
to get 1-count ADC values, then ``count / 32767 * 10.0`` for in/s)
|
||||||
|
- Mic channel: use ``waveform_codec.mic_count_to_db(count)``
|
||||||
|
"""
|
||||||
|
records = walk_body(body)
|
||||||
|
if not records:
|
||||||
|
return None
|
||||||
|
return {
|
||||||
|
"Tran": [r["t_peak"] for r in records],
|
||||||
|
"Vert": [r["v_peak"] for r in records],
|
||||||
|
"Long": [r["l_peak"] for r in records],
|
||||||
|
"MicL": [r["m_peak"] for r in records],
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
def decode_histogram_body_full(body: bytes) -> Optional[List[dict]]:
|
||||||
|
"""Decode a histogram-mode body into the full per-interval record list.
|
||||||
|
|
||||||
|
Same data as ``decode_histogram_body`` but in a structured form that
|
||||||
|
preserves the half-period (frequency) data for each channel + the
|
||||||
|
per-block segment_id, block_ctr, and 4-byte variable metadata.
|
||||||
|
Useful for diagnostic tools, sidecar enrichment, and future-codec
|
||||||
|
work.
|
||||||
|
|
||||||
|
Returns ``None`` if the body has no valid blocks.
|
||||||
|
"""
|
||||||
|
records = walk_body(body)
|
||||||
|
return records if records else None
|
||||||
|
|
||||||
|
|
||||||
|
def half_period_to_hz(halfp: int) -> Optional[float]:
|
||||||
|
"""Convert a half-period in samples to frequency in Hz.
|
||||||
|
|
||||||
|
Returns ``None`` for half-period ≤ 5 — the device emits values in
|
||||||
|
that range when the measured zero-crossing rate exceeds 100 Hz
|
||||||
|
(the BW display reports `>100 Hz` for such cases). Callers can
|
||||||
|
treat ``None`` as the `>100 Hz` sentinel.
|
||||||
|
"""
|
||||||
|
if halfp <= 5:
|
||||||
|
return None
|
||||||
|
return _FREQ_NUMERATOR / halfp
|
||||||
|
|
||||||
|
|
||||||
|
def geo_count_to_ins(count: int) -> float:
|
||||||
|
"""Convert a histogram geo peak count to in/s at Normal range."""
|
||||||
|
return count * _GEO_LSB_INS
|
||||||
@@ -307,21 +307,15 @@ def main(argv=None) -> int:
|
|||||||
# (sha mismatch / tool_version too old). The .h5 and
|
# (sha mismatch / tool_version too old). The .h5 and
|
||||||
# the sidecar are both derived from the same decoder
|
# the sidecar are both derived from the same decoder
|
||||||
# output, so if the sidecar is stale, so is the .h5.
|
# output, so if the sidecar is stale, so is the .h5.
|
||||||
# This is the path that recovers from the broken-
|
|
||||||
# int16-LE codec era — bumping TOOL_VERSION to 0.20.0+
|
|
||||||
# marks every pre-codec sidecar stale, which now
|
|
||||||
# correctly cascades to .h5 regeneration too.
|
|
||||||
#
|
#
|
||||||
# Skip the .h5 write when the decoder couldn't produce
|
# Both waveform and histogram bodies now decode to real
|
||||||
# samples — this is the histogram-mode case today
|
# samples via event_file_io.read_blastware_file → either
|
||||||
# (waveform_codec.decode_waveform_v2 only handles the
|
# waveform_codec.decode_waveform_v2 or histogram_codec.
|
||||||
# waveform-mode body format per §7.6.1; the histogram
|
# decode_histogram_body. If samples are still empty after
|
||||||
# codec at §7.6.2 is documented but not yet implemented).
|
# both codecs run, it's a genuine "we can't decode this
|
||||||
# Without this check we'd replace the existing (broken
|
# file" case (truncated, malformed, or unknown mode);
|
||||||
# int16-LE) histogram .h5 with an empty one, which is
|
# skip the .h5 write so we don't replace whatever's
|
||||||
# arguably worse for any consumer expecting non-empty
|
# there with an empty placeholder.
|
||||||
# sample arrays. When the histogram codec lands, this
|
|
||||||
# check can come out.
|
|
||||||
has_samples = bool(
|
has_samples = bool(
|
||||||
ev.raw_samples and any(
|
ev.raw_samples and any(
|
||||||
ev.raw_samples.get(ch) for ch in ("Tran", "Vert", "Long", "MicL")
|
ev.raw_samples.get(ch) for ch in ("Tran", "Vert", "Long", "MicL")
|
||||||
@@ -336,7 +330,7 @@ def main(argv=None) -> int:
|
|||||||
and has_samples
|
and has_samples
|
||||||
)
|
)
|
||||||
if not has_samples and not args.skip_hdf5:
|
if not has_samples and not args.skip_hdf5:
|
||||||
hdf5_action = "skipped-empty-samples"
|
hdf5_action = "skipped-undecodable"
|
||||||
if need_h5:
|
if need_h5:
|
||||||
if args.dry_run:
|
if args.dry_run:
|
||||||
hdf5_action = "would (re)write"
|
hdf5_action = "would (re)write"
|
||||||
|
|||||||
@@ -0,0 +1,337 @@
|
|||||||
|
"""
|
||||||
|
test_histogram_codec.py — regression locks for the histogram body codec.
|
||||||
|
|
||||||
|
The codec is verified byte-exact against BW's ASCII export across the
|
||||||
|
in-repo histogram fixture bundle. Each test cross-checks decoded
|
||||||
|
binary fields against the corresponding .TXT row.
|
||||||
|
|
||||||
|
Run:
|
||||||
|
python -m pytest tests/test_histogram_codec.py -q
|
||||||
|
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import os
|
||||||
|
import re
|
||||||
|
import sys
|
||||||
|
from pathlib import Path
|
||||||
|
|
||||||
|
import pytest
|
||||||
|
|
||||||
|
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
|
||||||
|
|
||||||
|
from minimateplus.blastware_file import _WAVEFORM_HEADER_SIZE
|
||||||
|
from minimateplus.histogram_codec import (
|
||||||
|
_BLOCK_SIZE,
|
||||||
|
decode_histogram_body,
|
||||||
|
decode_histogram_body_full,
|
||||||
|
geo_count_to_ins,
|
||||||
|
half_period_to_hz,
|
||||||
|
walk_body,
|
||||||
|
)
|
||||||
|
from minimateplus.waveform_codec import mic_count_to_db
|
||||||
|
|
||||||
|
|
||||||
|
_FIXTURE_DIR = Path(__file__).resolve().parent.parent / "example-events" / "histogram"
|
||||||
|
|
||||||
|
|
||||||
|
def _extract_body(path: Path) -> bytes:
|
||||||
|
"""Locate the body of a BW event file — bytes between the STRT
|
||||||
|
record and the 26-byte footer."""
|
||||||
|
raw = path.read_bytes()
|
||||||
|
body_start = _WAVEFORM_HEADER_SIZE + 21
|
||||||
|
pos = body_start
|
||||||
|
footer_pos = -1
|
||||||
|
while True:
|
||||||
|
pos = raw.find(b"\x0e\x08", pos)
|
||||||
|
if pos < 0 or pos + 26 > len(raw):
|
||||||
|
break
|
||||||
|
yr = (raw[pos + 4] << 8) | raw[pos + 5]
|
||||||
|
if 2015 <= yr <= 2050:
|
||||||
|
footer_pos = pos
|
||||||
|
break
|
||||||
|
pos += 1
|
||||||
|
if footer_pos < 0:
|
||||||
|
footer_pos = len(raw) - 26
|
||||||
|
return raw[body_start:footer_pos]
|
||||||
|
|
||||||
|
|
||||||
|
def _parse_txt_rows(path: Path) -> list[tuple[str, list]]:
|
||||||
|
"""Parse a histogram .TXT into ``[(time_str, [10 col values]), …]``.
|
||||||
|
|
||||||
|
Special tokens:
|
||||||
|
- ``">100"`` (the BW-display sentinel for freq > 100 Hz) → ``None``
|
||||||
|
- non-numeric → ``None``
|
||||||
|
"""
|
||||||
|
text = path.read_text()
|
||||||
|
lines = text.splitlines()
|
||||||
|
hdr = None
|
||||||
|
for i, line in enumerate(lines):
|
||||||
|
if re.match(r"^Tran\s+", line.strip()):
|
||||||
|
hdr = i + 3 # skip 2-row header + units row
|
||||||
|
break
|
||||||
|
if hdr is None:
|
||||||
|
return []
|
||||||
|
rows: list[tuple[str, list]] = []
|
||||||
|
for line in lines[hdr:]:
|
||||||
|
parts = line.split("\t")
|
||||||
|
if len(parts) != 11:
|
||||||
|
continue
|
||||||
|
vals: list = []
|
||||||
|
for p in parts[1:]:
|
||||||
|
s = p.strip()
|
||||||
|
if s.startswith(">"):
|
||||||
|
vals.append(None) # ">100 Hz" sentinel
|
||||||
|
continue
|
||||||
|
try:
|
||||||
|
vals.append(float(s))
|
||||||
|
except ValueError:
|
||||||
|
vals.append(None)
|
||||||
|
rows.append((parts[0].strip(), vals))
|
||||||
|
return rows
|
||||||
|
|
||||||
|
|
||||||
|
# ── Block-walker plumbing ────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
|
||||||
|
@pytest.mark.parametrize("fixture", [
|
||||||
|
"N844L20G.630H",
|
||||||
|
"N844L21H.2R0H",
|
||||||
|
"N844L6Z8.ZR0H",
|
||||||
|
"N844L6XE.BH0H",
|
||||||
|
"N844L23B.ND0H",
|
||||||
|
])
|
||||||
|
def test_walk_body_returns_records(fixture: str):
|
||||||
|
"""Walker yields at least one valid block per fixture."""
|
||||||
|
path = _FIXTURE_DIR / fixture
|
||||||
|
if not path.exists():
|
||||||
|
pytest.skip(f"fixture missing: {path}")
|
||||||
|
records = walk_body(_extract_body(path))
|
||||||
|
assert len(records) > 100, f"expected hundreds of blocks, got {len(records)}"
|
||||||
|
|
||||||
|
|
||||||
|
def test_walk_body_record_count_matches_txt_intervals():
|
||||||
|
"""Block count should match the .TXT interval count (off-by-one
|
||||||
|
at the tail is acceptable — last interval may be truncated at
|
||||||
|
recording stop)."""
|
||||||
|
bin_path = _FIXTURE_DIR / "N844L20G.630H"
|
||||||
|
txt_path = _FIXTURE_DIR / "N844L20G_630H_ASCII.TXT"
|
||||||
|
if not bin_path.exists() or not txt_path.exists():
|
||||||
|
pytest.skip("fixture missing")
|
||||||
|
records = walk_body(_extract_body(bin_path))
|
||||||
|
txt_rows = _parse_txt_rows(txt_path)
|
||||||
|
# Allow off-by-one (final block may have been mid-write at stop)
|
||||||
|
assert abs(len(records) - len(txt_rows)) <= 1, (
|
||||||
|
f"binary {len(records)} blocks vs TXT {len(txt_rows)} intervals"
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def test_walk_body_segment_id_increments_every_256_blocks():
|
||||||
|
"""Segment ID advances 0→1→2→… after every 256 blocks within
|
||||||
|
one event."""
|
||||||
|
path = _FIXTURE_DIR / "N844L20G.630H"
|
||||||
|
if not path.exists():
|
||||||
|
pytest.skip("fixture missing")
|
||||||
|
records = walk_body(_extract_body(path))
|
||||||
|
# Group by segment_id and verify counts make sense
|
||||||
|
from collections import Counter
|
||||||
|
seg_counts = Counter(r["segment_id"] for r in records)
|
||||||
|
# First 3 segments should each have exactly 256 blocks (N844L20G has
|
||||||
|
# 791 blocks → 256+256+256+23 → segments 0/1/2/3)
|
||||||
|
assert seg_counts[0] == 256
|
||||||
|
assert seg_counts[1] == 256
|
||||||
|
assert seg_counts[2] == 256
|
||||||
|
assert seg_counts[3] == len(records) - 3 * 256
|
||||||
|
|
||||||
|
|
||||||
|
# ── Field-by-field decode verification against .TXT ground truth ─────────────
|
||||||
|
|
||||||
|
|
||||||
|
@pytest.mark.parametrize("fixture", [
|
||||||
|
"N844L20G.630H",
|
||||||
|
"N844L6Z8.ZR0H",
|
||||||
|
"N844L6XE.BH0H",
|
||||||
|
"N844L23B.ND0H",
|
||||||
|
])
|
||||||
|
def test_decoded_geo_peaks_match_txt(fixture: str):
|
||||||
|
"""For every block, decoded Tran/Vert/Long peak (count × 0.005)
|
||||||
|
matches the corresponding .TXT cell."""
|
||||||
|
bin_path = _FIXTURE_DIR / fixture
|
||||||
|
txt_path = _FIXTURE_DIR / (fixture.replace(".", "_") + "_ASCII.TXT")
|
||||||
|
if not bin_path.exists() or not txt_path.exists():
|
||||||
|
pytest.skip("fixture missing")
|
||||||
|
records = walk_body(_extract_body(bin_path))
|
||||||
|
txt_rows = _parse_txt_rows(txt_path)
|
||||||
|
n = min(len(records), len(txt_rows))
|
||||||
|
assert n > 0
|
||||||
|
for i in range(n):
|
||||||
|
rec = records[i]
|
||||||
|
_ts, txt = txt_rows[i]
|
||||||
|
# TXT cols 0/2/4 are T/V/L peak in in/s
|
||||||
|
for slot, key in (("T", "t_peak"), ("V", "v_peak"), ("L", "l_peak")):
|
||||||
|
col = {"T": 0, "V": 2, "L": 4}[slot]
|
||||||
|
decoded_ips = geo_count_to_ins(rec[key])
|
||||||
|
expected = txt[col]
|
||||||
|
assert abs(decoded_ips - expected) < 0.0005, (
|
||||||
|
f"{fixture} block {i} {slot}_peak: "
|
||||||
|
f"decoded={decoded_ips:.4f} vs txt={expected:.4f}"
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
@pytest.mark.parametrize("fixture", [
|
||||||
|
"N844L6Z8.ZR0H",
|
||||||
|
"N844L6XE.BH0H",
|
||||||
|
])
|
||||||
|
def test_decoded_geo_freqs_match_txt(fixture: str):
|
||||||
|
"""Decoded half-period → Hz matches the .TXT freq column for blocks
|
||||||
|
where the freq is in-range (not the `>100 Hz` sentinel)."""
|
||||||
|
bin_path = _FIXTURE_DIR / fixture
|
||||||
|
txt_path = _FIXTURE_DIR / (fixture.replace(".", "_") + "_ASCII.TXT")
|
||||||
|
if not bin_path.exists() or not txt_path.exists():
|
||||||
|
pytest.skip("fixture missing")
|
||||||
|
records = walk_body(_extract_body(bin_path))
|
||||||
|
txt_rows = _parse_txt_rows(txt_path)
|
||||||
|
n = min(len(records), len(txt_rows))
|
||||||
|
for i in range(n):
|
||||||
|
rec = records[i]
|
||||||
|
_ts, txt = txt_rows[i]
|
||||||
|
for slot, key, col in (("T", "t_halfp", 1), ("V", "v_halfp", 3), ("L", "l_halfp", 5)):
|
||||||
|
decoded_hz = half_period_to_hz(rec[key])
|
||||||
|
expected = txt[col]
|
||||||
|
if expected is None:
|
||||||
|
# TXT shows `>100 Hz` — codec should also yield None
|
||||||
|
assert decoded_hz is None or decoded_hz > 100, (
|
||||||
|
f"{fixture} block {i} {slot}_freq: codec says "
|
||||||
|
f"{decoded_hz} but TXT says >100"
|
||||||
|
)
|
||||||
|
continue
|
||||||
|
# TXT rounds; allow ±1 Hz
|
||||||
|
assert decoded_hz is not None
|
||||||
|
assert abs(decoded_hz - expected) < 1.0, (
|
||||||
|
f"{fixture} block {i} {slot}_freq: "
|
||||||
|
f"decoded={decoded_hz:.2f} Hz vs txt={expected:.2f} Hz"
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
@pytest.mark.parametrize("fixture", [
|
||||||
|
"N844L6XE.BH0H",
|
||||||
|
"N844L23B.ND0H",
|
||||||
|
"N844L6Z8.ZR0H",
|
||||||
|
])
|
||||||
|
def test_decoded_mic_db_matches_txt(fixture: str):
|
||||||
|
"""Decoded MicL peak count → dB(L) via mic_count_to_db matches
|
||||||
|
the .TXT dB(L) column."""
|
||||||
|
bin_path = _FIXTURE_DIR / fixture
|
||||||
|
txt_path = _FIXTURE_DIR / (fixture.replace(".", "_") + "_ASCII.TXT")
|
||||||
|
if not bin_path.exists() or not txt_path.exists():
|
||||||
|
pytest.skip("fixture missing")
|
||||||
|
records = walk_body(_extract_body(bin_path))
|
||||||
|
txt_rows = _parse_txt_rows(txt_path)
|
||||||
|
n = min(len(records), len(txt_rows))
|
||||||
|
for i in range(n):
|
||||||
|
rec = records[i]
|
||||||
|
_ts, txt = txt_rows[i]
|
||||||
|
# TXT col 8 = MicL dB(L)
|
||||||
|
decoded_db = mic_count_to_db(rec["m_peak"])
|
||||||
|
expected = txt[8]
|
||||||
|
if expected is None:
|
||||||
|
continue
|
||||||
|
# BW rounds to 1 decimal place for display. Tolerance 0.1 dB
|
||||||
|
# absorbs both rounding modes (truncate vs round-half-even).
|
||||||
|
assert abs(decoded_db - expected) < 0.1, (
|
||||||
|
f"{fixture} block {i} M_dB: "
|
||||||
|
f"decoded={decoded_db:.2f} dB vs txt={expected:.2f} dB"
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
@pytest.mark.parametrize("fixture", [
|
||||||
|
"N844L20G.630H",
|
||||||
|
"N844L6Z8.ZR0H",
|
||||||
|
])
|
||||||
|
def test_decoded_mic_freq_matches_txt(fixture: str):
|
||||||
|
"""Decoded MicL half-period → freq matches the .TXT col 9 freq."""
|
||||||
|
bin_path = _FIXTURE_DIR / fixture
|
||||||
|
txt_path = _FIXTURE_DIR / (fixture.replace(".", "_") + "_ASCII.TXT")
|
||||||
|
if not bin_path.exists() or not txt_path.exists():
|
||||||
|
pytest.skip("fixture missing")
|
||||||
|
records = walk_body(_extract_body(bin_path))
|
||||||
|
txt_rows = _parse_txt_rows(txt_path)
|
||||||
|
n = min(len(records), len(txt_rows))
|
||||||
|
for i in range(n):
|
||||||
|
rec = records[i]
|
||||||
|
_ts, txt = txt_rows[i]
|
||||||
|
decoded_hz = half_period_to_hz(rec["m_halfp"])
|
||||||
|
expected = txt[9]
|
||||||
|
if expected is None:
|
||||||
|
assert decoded_hz is None or decoded_hz > 100
|
||||||
|
continue
|
||||||
|
assert decoded_hz is not None
|
||||||
|
assert abs(decoded_hz - expected) < 1.0, (
|
||||||
|
f"{fixture} block {i} M_freq: "
|
||||||
|
f"decoded={decoded_hz:.2f} Hz vs txt={expected:.2f} Hz"
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
# ── Public API ───────────────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
|
||||||
|
def test_decode_histogram_body_returns_four_channels():
|
||||||
|
"""The public API returns the standard 4-channel dict shape."""
|
||||||
|
path = _FIXTURE_DIR / "N844L20G.630H"
|
||||||
|
if not path.exists():
|
||||||
|
pytest.skip("fixture missing")
|
||||||
|
decoded = decode_histogram_body(_extract_body(path))
|
||||||
|
assert decoded is not None
|
||||||
|
assert set(decoded.keys()) == {"Tran", "Vert", "Long", "MicL"}
|
||||||
|
# All channels same length (one value per histogram interval)
|
||||||
|
n = len(decoded["Tran"])
|
||||||
|
assert all(len(decoded[ch]) == n for ch in ("Vert", "Long", "MicL"))
|
||||||
|
assert n > 100
|
||||||
|
|
||||||
|
|
||||||
|
def test_decode_histogram_body_returns_none_for_non_histogram():
|
||||||
|
"""A waveform-mode body (starts with 00 02 00) doesn't decode as
|
||||||
|
a histogram body."""
|
||||||
|
fake_waveform_body = b"\x00\x02\x00" + b"\x00" * 100
|
||||||
|
assert decode_histogram_body(fake_waveform_body) is None
|
||||||
|
|
||||||
|
|
||||||
|
def test_decode_histogram_body_returns_none_for_garbage():
|
||||||
|
"""Bytes that don't form valid blocks return None."""
|
||||||
|
assert decode_histogram_body(b"\xff" * 256) is None
|
||||||
|
|
||||||
|
|
||||||
|
def test_decode_histogram_body_full_preserves_frequency_data():
|
||||||
|
"""The structured-record API preserves the per-channel half-period
|
||||||
|
fields that the flat-channel API drops."""
|
||||||
|
path = _FIXTURE_DIR / "N844L20G.630H"
|
||||||
|
if not path.exists():
|
||||||
|
pytest.skip("fixture missing")
|
||||||
|
records = decode_histogram_body_full(_extract_body(path))
|
||||||
|
assert records is not None
|
||||||
|
r0 = records[0]
|
||||||
|
expected_fields = {
|
||||||
|
"segment_id", "block_ctr",
|
||||||
|
"t_peak", "t_halfp", "v_peak", "v_halfp",
|
||||||
|
"l_peak", "l_halfp", "m_peak", "m_halfp",
|
||||||
|
"meta_var",
|
||||||
|
}
|
||||||
|
assert set(r0.keys()) >= expected_fields
|
||||||
|
|
||||||
|
|
||||||
|
# ── Helpers ──────────────────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
|
||||||
|
def test_half_period_to_hz_sentinel():
|
||||||
|
"""Half-period ≤ 5 returns None (the `>100 Hz` sentinel)."""
|
||||||
|
assert half_period_to_hz(5) is None
|
||||||
|
assert half_period_to_hz(1) is None
|
||||||
|
# halfp=6 gives 512/6 = 85.3 Hz — below the >100 threshold
|
||||||
|
assert half_period_to_hz(6) == pytest.approx(85.33, abs=0.01)
|
||||||
|
|
||||||
|
|
||||||
|
def test_geo_count_to_ins_scale():
|
||||||
|
"""1 count = 0.005 in/s at Normal range."""
|
||||||
|
assert geo_count_to_ins(1) == pytest.approx(0.005)
|
||||||
|
assert geo_count_to_ins(10) == pytest.approx(0.050)
|
||||||
|
assert geo_count_to_ins(0) == 0.0
|
||||||
Reference in New Issue
Block a user