2 Commits

Author SHA1 Message Date
serversdown d506ebc103 histogram_codec: peak count is uint8 (not uint16 LE) — properly cracks
the BE9558 / BE18003 extension-byte case

The bytes at [7]/[11]/[15]/[19] are an annotation field (purpose still
unclear — empirically non-zero on intervals with sub-Hz or unmeasurable
freq), NOT the high byte of the peak count.  The N844 fixture corpus
the original RE was done against had zero values in those bytes for
every block, so uint8 and uint16 LE were equivalent there — but on
real BE9558 Tran-drift events and BE18003 Histogram+Continuous events
the uint16 LE interpretation produced peaks up to 268 in/s and 35×
inflated PVS sums.
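
The two reads can be compared directly on the interval-12 block that the regression test in this commit quotes. This is a minimal sketch that assumes only the byte offsets and the 0.005 in/s-per-count scale stated above; the helper names `peak_uint8` and `peak_uint16_le` are hypothetical and are not part of `histogram_codec`.

```python
import struct

# Interval-12 block from K558LKZU.RE0H, as quoted in the regression test.
BLOCK = bytes.fromhex(
    "00 00 0c 01 0a 00 03 d2 45 00 02 00 02 00 02 00 "
    "02 00 10 00 06 00 00 00 0e 91 2f 00 1e 0a 00 00"
)
IN_S_PER_COUNT = 0.005  # geo scale stated in the commit message


def peak_uint8(block: bytes, offset: int) -> int:
    """Corrected read: the peak count is the single byte at `offset`."""
    return block[offset]


def peak_uint16_le(block: bytes, offset: int) -> int:
    """Legacy read: folds the annotation byte in as the high byte."""
    return struct.unpack_from("<H", block, offset)[0]


t_new = peak_uint8(BLOCK, 6)      # byte [6] only
t_old = peak_uint16_le(BLOCK, 6)  # bytes [6:8], annotation byte included
print("uint8 read:    ", t_new, "counts,", t_new * IN_S_PER_COUNT, "in/s")
print("uint16 LE read:", t_old, "counts,", t_old * IN_S_PER_COUNT, "in/s")
```

On a block whose annotation byte is zero both helpers return the same number, which is why the N844 corpus could not tell the two interpretations apart.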

Cross-correlated against BW's per-interval ASCII export on:
  - K558LKZU/LL1P/LL3K  → 100% T/V/L/M peak match (1435 blocks each)
  - T003LKZR/LL0O/LL1M  → 100% T/V/L, 99.3% M (0.05 dB rounding only)
  - N599LKZS/LL0L        → 100% all channels
  - N844 fixture corpus  → 100% all channels (unchanged)

Annotations preserved on every record for future RE; the defensive
_MAX_PEAK_COUNT bound is no longer needed (uint8 maxes at 1.275 in/s,
well below any physical limit).

Synthetic regression test added using the verbatim K558LKZU.RE0H
interval-12 block.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 06:05:19 +00:00
serversdown e949232875 histogram_codec + backfill: tighter peak ceiling, preserve bw_report
histogram_codec: drop _MAX_PEAK_COUNT 4096 → 2200. The old ceiling
let extension-byte blocks slip through at up to 20.48 in/s per
channel, producing 35× inflated PVS sums when first deployed to
prod. 2200 covers Normal-range full-scale (10 in/s = 2000 counts)
plus 10% headroom for quantization edge cases.
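
The ceiling arithmetic in this message can be checked in a few lines. This is a sketch under the stated 0.005 in/s-per-count scale; `counts_to_in_s` is a hypothetical helper, and the ceiling itself was removed again by the later commit d506ebc103.

```python
IN_S_PER_COUNT = 0.005  # geo scale: one count is 0.005 in/s


def counts_to_in_s(count: int) -> float:
    """Convert a geo peak count to in/s."""
    return count * IN_S_PER_COUNT


OLD_CEILING = 4096
NEW_CEILING = 2200

# Normal-range full scale is 10 in/s, expressed here in counts.
full_scale_counts = round(10.0 / IN_S_PER_COUNT)
# Fractional headroom the new ceiling leaves above full scale.
headroom = NEW_CEILING / full_scale_counts - 1.0

print("old ceiling:", counts_to_in_s(OLD_CEILING), "in/s")
print("new ceiling:", counts_to_in_s(NEW_CEILING), "in/s")
print("full scale: ", full_scale_counts, "counts, headroom", round(headroom, 2))
```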

backfill_sidecars: also preserve the bw_report block alongside
review + extensions when regenerating sidecars. event_to_sidecar_dict
takes a BwAsciiReport dataclass not a dict, so for bw_report we
overlay the existing block after regen rather than passing as a kwarg.
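
The overlay-after-regen pattern described here can be sketched as follows. `regenerate_sidecar` is a hypothetical stand-in for `event_to_sidecar_dict` (whose real signature is not shown in this commit), and the sample dictionaries are made up for illustration.

```python
from typing import Any, Dict, Optional


def regenerate_sidecar(review: Any, extensions: Any) -> Dict[str, Any]:
    """Hypothetical stand-in for event_to_sidecar_dict: rebuilds the
    sidecar from the event and has no way to accept a bw_report dict."""
    return {"review": review, "extensions": extensions, "bw_report": None}


def backfill(existing: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Regenerate a sidecar, carrying user state and bw_report across."""
    existing = existing or {}
    sidecar = regenerate_sidecar(
        review=existing.get("review"),
        extensions=existing.get("extensions"),
    )
    # bw_report cannot be passed as a kwarg, so overlay it after regen.
    preserved_bw_report = existing.get("bw_report")
    if preserved_bw_report is not None:
        sidecar["bw_report"] = preserved_bw_report
    return sidecar


old = {"review": {"false_trigger": True}, "bw_report": {"serial": "example"}}
new = backfill(old)
print(new["bw_report"])
```

Overlaying only when the preserved block is not `None` keeps a freshly generated value from being replaced by a missing one.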

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 02:50:10 +00:00
5 changed files with 156 additions and 51 deletions
+1 -1
@@ -1,6 +1,6 @@
 /bridges/captures/
 /example-events/
/tests/fixtures/
 /manuals/
 # Python build artifacts
+35 -5
@@ -12,7 +12,21 @@ implementation lives in `minimateplus/histogram_codec.py`.
 in-repo histogram fixture corpus decodes byte-exact against BW's
 ASCII export.
-24 regression tests pass against ~3,500 blocks across 5 fixtures.
+26 regression tests pass against ~3,500 blocks across 5 in-repo
+fixtures, plus a synthetic regression block taken from a real
+BE9558 prod event to lock in the uint8-peak interpretation.
+**Important correction (2026-05-21):** the per-channel peak count
+is `uint8` at byte[6]/[10]/[14]/[18], NOT `uint16 LE` at byte[6:8]
+etc. The N844 fixture corpus the original RE was done against has
+zero values in bytes [7]/[11]/[15]/[19] for every block, so the
+two interpretations happened to be equivalent. Cross-correlating
+non-N844 events (BE9558 Tran-drift, BE18003 Histogram+Continuous)
+against BW's per-interval ASCII export (4 channels × ~1400 blocks
+per event × multiple events) is 100% byte-exact only when the peak
+is read as uint8. Reading as uint16 LE produced peaks up to 268
+in/s per channel and 35× inflated PVS sums when first deployed to
+prod (rolled back, root-caused, and fixed in commit 7183b95+1).
 ## Body format
@@ -27,15 +41,21 @@ Each block represents one histogram interval. Block layout:
 [1]     segment_id    (uint8)      0x00..0x03 — 256 blocks per segment
 [2:4]   block_ctr     (uint16 LE)  resets each segment (0x0100, 0x0101, …)
 [4:6]   0x000a        (uint16 LE)  constant marker (= 10)
-[6:8]   T_peak_count  uint16 LE    Tran peak (count × 0.005 → in/s at Normal)
+[6]     T_peak_count  uint8        Tran peak (count × 0.005 → in/s at Normal,
+                                   max 1.275 in/s — fits in uint8)
+[7]     T_annotation  uint8        empirically non-zero on intervals with sub-Hz
+                                   or unmeasurable freq; meaning not fully RE'd
 [8:10]  T_halfperiod  uint16 LE    Tran half-period in samples
                                    (freq_Hz = 512 / halfp; ≤ 5 means ">100 Hz")
-[10:12] V_peak_count  uint16 LE    Vert peak
+[10]    V_peak_count  uint8        Vert peak
+[11]    V_annotation  uint8
 [12:14] V_halfperiod  uint16 LE    Vert freq half-period
-[14:16] L_peak_count  uint16 LE    Long peak
+[14]    L_peak_count  uint8        Long peak
+[15]    L_annotation  uint8
 [16:18] L_halfperiod  uint16 LE    Long freq half-period
-[18:20] M_peak_count  uint16 LE    MicL peak count
+[18]    M_peak_count  uint8        MicL peak count
                                    (dB via waveform_codec.mic_count_to_db)
+[19]    M_annotation  uint8
 [20:22] M_halfperiod  uint16 LE    MicL freq half-period
 [22:24] 0x00 0x00                  constant
 [24:28] 4-byte variable            purpose unknown — possibly CRC,
@@ -99,6 +119,16 @@ slot[8] = 9 → 512/9 = 56.9 → 57 Hz ✓ M_freq
 ## What's NOT yet decoded
+- **Annotation bytes (`block[7]/[11]/[15]/[19]`)**. Empirically
+  non-zero on intervals where the per-channel ZC frequency comes
+  out as `N/A` or sub-Hz (`<1.0`, `1.X`). Hypothesis tested in the
+  RE session: byte != 0 ↔ sub-Hz freq. Only ~50% correlation
+  across the K558 corpus, so the relationship is more complex.
+  Possibilities: time-of-peak-within-interval, halfp extension for
+  very-long-period signals, or a debug/diagnostic field the firmware
+  writes opportunistically. Doesn't affect peak amplitudes or
+  waveform reconstruction. Captured as `record["annotations"]` for
+  future RE.
 - **4-byte variable metadata field (bytes 24:28)**. Not needed for
   waveform reconstruction. Speculation: per-block CRC, sub-second
   timestamp offset, or a Mic psi(L) count not in the 9 samples.
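
The half-period fields in the block layout convert to a displayed frequency through `freq_Hz = 512 / halfp`, with half-periods of 5 samples or fewer reported as ">100 Hz". The sketch below illustrates that rule; `format_freq` is a hypothetical helper, and mapping a zero half-period to `N/A` is an assumption rather than something this document confirms.

```python
def format_freq(halfp: int) -> str:
    """Render a half-period (in samples) as a frequency string in Hz."""
    if halfp <= 0:
        # Assumption: a zero half-period means no measurable frequency.
        return "N/A"
    if halfp <= 5:
        # Stated rule: a half-period of 5 or fewer samples means ">100 Hz".
        return ">100"
    return f"{512 / halfp:.1f}"


# Half-periods that appear in the document: 69 and 6 (regression test),
# 9 (the slot[8] worked example).
for halfp in (69, 9, 6, 5, 0):
    print(halfp, "->", format_freq(halfp))
```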
+54 -38
@@ -28,18 +28,32 @@ iterate 32-stride and stop before the tail.
 [1]     segment_id    (uint8)      0x00..0x03 — 256 blocks per segment
 [2:4]   block_ctr     (uint16 LE)  resets each segment (0x0100, 0x0101, …)
 [4:6]   0x000a        (uint16 LE)  constant marker (= 10)
-[6:8]   T_peak_count  uint16 LE    Tran peak (count × 0.005 → in/s)
+[6]     T_peak_count  uint8        Tran peak (count × 0.005 → in/s, max 1.275 in/s)
+[7]     T_annotation  uint8        empirically non-zero on intervals with sub-Hz
+                                   or unmeasurable Tran freq; meaning not fully RE'd
 [8:10]  T_halfperiod  uint16 LE    Tran half-period in samples (freq = 512 / halfp Hz)
-[10:12] V_peak_count  uint16 LE
+[10]    V_peak_count  uint8
+[11]    V_annotation  uint8
 [12:14] V_halfperiod  uint16 LE
-[14:16] L_peak_count  uint16 LE
+[14]    L_peak_count  uint8
+[15]    L_annotation  uint8
 [16:18] L_halfperiod  uint16 LE
-[18:20] M_peak_count  uint16 LE    MicL peak (count → dB via mic_count_to_db)
+[18]    M_peak_count  uint8        MicL peak (count → dB via mic_count_to_db)
+[19]    M_annotation  uint8
 [20:22] M_halfperiod  uint16 LE    MicL half-period in samples (freq = 512 / halfp Hz)
 [22:24] 0x00 0x00                  constant
 [24:28] 4-byte variable            purpose unknown (possibly CRC or timestamp delta)
 [28:32] 0x1e 0x0a 0x00 0x00        constant block-end signature
+NOTE on peak-count width: an earlier interpretation treated the peak
+fields as uint16 LE spanning [6:8] / [10:12] / [14:16] / [18:20].
+That happened to be byte-exact against the N844 fixture corpus only
+because every annotation byte in those fixtures was zero, making
+``uint16 LE == uint8``. Cross-correlating BE9558 (K558) Tran-drift
+and BE18003 (T003) Histogram+Continuous events against the BW ASCII
+export proved peak is uint8 alone — see test_histogram_codec.py
+and docs/histogram_codec_re_status.md.
 Block-identification anchor: ``block[22:24] == b"\\x00\\x00"`` AND
 ``block[28:32] == b"\\x1e\\x0a\\x00\\x00"``. This is the reliable
 distinguisher from non-block content in the file.
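
The block-identification anchor described in this docstring amounts to two slice comparisons. The sketch below is illustrative only: `looks_like_data_block` is a hypothetical name, it is not the module's `_is_data_block` (whose body is not shown in this diff), and it leaves out the `[4:6]` marker check that the module applies as additional validation.

```python
BLOCK_SIZE = 32
ZERO_PAD = b"\x00\x00"               # expected at block[22:24]
END_SIGNATURE = b"\x1e\x0a\x00\x00"  # expected at block[28:32]


def looks_like_data_block(block: bytes) -> bool:
    """Apply the two-part anchor test from the docstring."""
    return (
        len(block) == BLOCK_SIZE
        and block[22:24] == ZERO_PAD
        and block[28:32] == END_SIGNATURE
    )


# Interval-12 block quoted in the regression test of this commit.
sample = bytes.fromhex(
    "00 00 0c 01 0a 00 03 d2 45 00 02 00 02 00 02 00 "
    "02 00 10 00 06 00 00 00 0e 91 2f 00 1e 0a 00 00"
)
print(looks_like_data_block(sample))
print(looks_like_data_block(bytes(BLOCK_SIZE)))  # all-zero filler
```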
@@ -101,23 +115,6 @@ _BLOCK_SIZE = 32
 # additional validation that we're looking at a real block.
 _BLOCK_MARKER = 10
-# Maximum plausible peak-count value. Normal-range geophone tops out
-# at 10 in/s = 2000 counts at the 0.005 in/s per count scale; even
-# Sensitive range (1.25 in/s FS) wouldn't exceed ~250. Mic counts run
-# 0..~400 in observed data. 4096 leaves comfortable headroom for any
-# legitimate value across all modes.
-#
-# Some prod blocks have been observed with peak-count fields whose
-# HIGH byte is non-zero (block[7] != 0 etc.) — observed across BE9558
-# and BE18003 units in Histogram-mode events. Reading these as
-# uint16 LE produces values like 30981 / 41733 / 62469, which scale
-# to physically impossible peaks (150+ in/s). Best guess: an
-# undocumented "time-of-peak-within-interval" extension byte the
-# device writes in some sub-mode (possibly Histogram+Continuous).
-# Until reverse-engineered, blocks exceeding this bound are skipped
-# rather than propagating bogus values into PVS computations.
-_MAX_PEAK_COUNT = 4096
 # Geo peak scaling: stored as "count × 0.005 in/s" where 1 count = one
 # 0.005 in/s display quantum. Equivalent to the waveform codec's
 # 16-count-unit output (1 unit = 0.005 in/s = 16 ADC counts).
@@ -149,23 +146,36 @@ def _decode_block(block: bytes) -> Optional[dict]:
     """Decode one 32-byte histogram block. Caller must have validated
     with ``_is_data_block`` first.
-    Returns ``None`` if any peak field exceeds ``_MAX_PEAK_COUNT`` —
-    those blocks contain an undocumented extension byte format whose
-    naive uint16 LE interpretation gives physically impossible peaks.
-    Skipping the block is safer than propagating bogus values into
-    PVS computations downstream.
+    Returns a record with per-channel peak counts (uint8) and
+    half-periods (uint16 LE).
     """
-    # All 16-bit fields are little-endian unsigned. Peak counts are
-    # always non-negative; half-periods are always positive when valid.
-    t_peak, t_halfp, v_peak, v_halfp, l_peak, l_halfp, m_peak, m_halfp = struct.unpack_from(
-        "<HHHHHHHH", block, 6
-    )
-    if (t_peak > _MAX_PEAK_COUNT or v_peak > _MAX_PEAK_COUNT
-            or l_peak > _MAX_PEAK_COUNT or m_peak > _MAX_PEAK_COUNT):
-        return None
+    # Peak counts are uint8 at bytes [6] / [10] / [14] / [18]. The
+    # adjacent bytes [7] / [11] / [15] / [19] hold an annotation field
+    # whose meaning isn't fully understood (empirically non-zero in
+    # intervals with sub-Hz or unmeasurable geo frequencies, mostly
+    # zero otherwise — see test fixtures from BE9558/BE18003 corpora).
+    # Crucially, those annotation bytes are NOT the high byte of the
+    # peak count: cross-correlating against BW's per-interval ASCII
+    # export proves the peak is uint8 alone.
+    #
+    # Reading the peak as uint16 LE (the original interpretation) was
+    # accidentally correct only because every block in the N844 fixture
+    # corpus had a zero annotation byte; non-N844 events with non-zero
+    # annotation bytes decoded to physically impossible peaks (e.g.
+    # 268 in/s per channel) and produced 35× inflated PVS sums when
+    # first run against prod data. See histogram_codec_re_status.md.
+    t_peak = block[6]
+    v_peak = block[10]
+    l_peak = block[14]
+    m_peak = block[18]
+    t_halfp = block[8] | (block[9] << 8)
+    v_halfp = block[12] | (block[13] << 8)
+    l_halfp = block[16] | (block[17] << 8)
+    m_halfp = block[20] | (block[21] << 8)
     segment_id = block[1]
     block_ctr = block[2] | (block[3] << 8)
     var_meta = bytes(block[24:28])
+    annotations = (block[7], block[11], block[15], block[19])
     return {
         "segment_id": segment_id,
         "block_ctr": block_ctr,
@@ -178,6 +188,7 @@ def _decode_block(block: bytes) -> Optional[dict]:
         "m_peak": m_peak,
         "m_halfp": m_halfp,
         "meta_var": var_meta,
+        "annotations": annotations,
     }
@@ -185,10 +196,15 @@ def walk_body(body: bytes) -> List[dict]:
     """Walk the body and return one dict per histogram interval.
     Iterates 32-byte strides from offset 0. Yields a decoded record
-    for every block that passes ``_is_data_block`` validation AND has
-    plausible peak values (``_decode_block`` returns None for blocks
-    with out-of-bound peaks). Stops when the remaining bytes are too
-    short to form a complete block.
+    for every block that passes ``_is_data_block`` validation. Stops
+    when the remaining bytes are too short to form a complete block.
+
+    In Histogram+Continuous mode the body interleaves data blocks with
+    other 32-byte content (likely continuous-mode waveform blocks) that
+    fail the data-block validation; the walker naturally skips them
+    without losing 32-byte alignment. Use ``block_ctr`` from each
+    returned record to map back to the original interval index — the
+    record list is sparse when other block types are interleaved.
     """
     records: List[dict] = []
     for off in range(0, len(body) - _BLOCK_SIZE + 1, _BLOCK_SIZE):
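
The stride-and-skip behaviour described in the `walk_body` docstring can be sketched on a small synthetic body. This is an illustration under stated assumptions: `is_data_block` and `walk` are hypothetical simplifications of the module's functions, the validation uses only the documented anchor bytes, and the interleaved filler block is made up.

```python
from typing import Dict, List

BLOCK_SIZE = 32


def is_data_block(block: bytes) -> bool:
    """Simplified validation: the documented anchor bytes only."""
    return block[22:24] == b"\x00\x00" and block[28:32] == b"\x1e\x0a\x00\x00"


def walk(body: bytes) -> List[Dict[str, int]]:
    """Walk fixed 32-byte strides, keeping only blocks that validate."""
    records: List[Dict[str, int]] = []
    for off in range(0, len(body) - BLOCK_SIZE + 1, BLOCK_SIZE):
        block = body[off:off + BLOCK_SIZE]
        if not is_data_block(block):
            continue  # non-data content: skip it, alignment is unaffected
        records.append({
            "block_ctr": block[2] | (block[3] << 8),  # uint16 LE
            "t_peak": block[6],                       # uint8
        })
    return records


first = bytes.fromhex(
    "00 00 0c 01 0a 00 03 d2 45 00 02 00 02 00 02 00 "
    "02 00 10 00 06 00 00 00 0e 91 2f 00 1e 0a 00 00"
)
second = bytearray(first)
second[2] = 0x0D              # made-up follow-on block: counter low byte + 1
filler = bytes(BLOCK_SIZE)    # made-up non-data block interleaved in between
body = first + filler + bytes(second) + b"\x00" * 5  # short tail is ignored

records = walk(body)
print([hex(r["block_ctr"]) for r in records])
```

Because the filler block is dropped, list position no longer equals interval position, which is why the docstring points callers at `block_ctr`.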
+14 -3
@@ -287,16 +287,25 @@ def main(argv=None) -> int:
         or ev.total_samples < derived // 4):
     ev.total_samples = derived
-# Preserve user-edited review state + extensions from the
-# existing sidecar (false_trigger flag, notes, etc.) so a
-# backfill never wipes them out.
+# Preserve user-edited review state + extensions + the
+# bw_report block from the existing sidecar so a backfill
+# never wipes them out. The bw_report block originates
+# from the paired .TXT ASCII report parsed at ORIGINAL
+# import time (ach forward / direct upload); the .TXT
+# file is not in the waveform store, so we can't re-derive
+# it from disk. event_to_sidecar_dict takes a
+# BwAsciiReport dataclass (not a dict), so for bw_report
+# we overlay the existing block after regen instead of
+# passing it as a kwarg.
 preserved_review = None
 preserved_ext = None
+preserved_bw_report = None
 if sidecar_path.exists():
     try:
         _existing = event_file_io.read_sidecar(sidecar_path)
         preserved_review = _existing.get("review")
         preserved_ext = _existing.get("extensions")
+        preserved_bw_report = _existing.get("bw_report")
     except Exception:
         pass
@@ -311,6 +320,8 @@ def main(argv=None) -> int:
     review=preserved_review,
     extensions=preserved_ext,
 )
+if preserved_bw_report is not None:
+    sidecar["bw_report"] = preserved_bw_report
 # Also emit the .h5 clean-waveform file when:
 # - it's missing, OR
+48
@@ -335,3 +335,51 @@ def test_geo_count_to_ins_scale():
     assert geo_count_to_ins(1) == pytest.approx(0.005)
     assert geo_count_to_ins(10) == pytest.approx(0.050)
     assert geo_count_to_ins(0) == 0.0
+
+
+# ── Regression: peak is uint8 byte[N], NOT uint16 LE byte[N:N+2] ────────────
+#
+# Block taken verbatim from K558LKZU.RE0H (BE9558) interval 12 — a real
+# field event where the Tran channel had developed a DC offset and was
+# producing sub-Hz drift content the device couldn't characterize.
+# The annotation byte at [7] = 0xd2 is non-zero in that case. The
+# legacy codec read [6:8] as uint16 LE, producing T_peak = 53763 →
+# 268 in/s, which is physically impossible and roughly 17,900× the
+# actual 0.015 in/s value (T_lo = 3 alone gives the correct count).
+# Verified against the paired BW ASCII export.
+_K558_INTERVAL_12_BLOCK = bytes.fromhex(
+    "00 00 0c 01 0a 00 03 d2 45 00 02 00 02 00 02 00"
+    "02 00 10 00 06 00 00 00 0e 91 2f 00 1e 0a 00 00".replace(" ", "")
+)
+
+
+def test_extension_byte_does_not_inflate_peak():
+    """The annotation byte at [7]/[11]/[15]/[19] must NOT contribute to
+    the peak count. Decoded T_peak must be 3 (uint8 byte[6]), NOT
+    53763 (uint16 LE byte[6:8])."""
+    body = _K558_INTERVAL_12_BLOCK
+    records = decode_histogram_body_full(body)
+    assert records is not None
+    assert len(records) == 1
+    r = records[0]
+    assert r["t_peak"] == 3, f"T_peak should be 3 (uint8), got {r['t_peak']}"
+    assert r["v_peak"] == 2
+    assert r["l_peak"] == 2
+    assert r["m_peak"] == 16
+    # Half-periods unchanged — still uint16 LE.
+    assert r["t_halfp"] == 0x0045 # 69 → 7.4 Hz
+    assert r["m_halfp"] == 6 # → 85.3 Hz
+    # Annotation byte is preserved (for future RE) but does not affect peak.
+    assert r["annotations"] == (0xd2, 0x00, 0x00, 0x00)
+
+
+def test_extension_byte_decoded_to_correct_in_s():
+    """End-to-end: the channel-grouped output for the K558 ext block
+    should give T = 3 counts = 0.015 in/s, not 53763 counts = 268 in/s."""
+    channels = decode_histogram_body(_K558_INTERVAL_12_BLOCK)
+    assert channels is not None
+    assert channels["Tran"] == [3]
+    assert geo_count_to_ins(channels["Tran"][0]) == pytest.approx(0.015)
+    assert channels["Vert"] == [2]
+    assert channels["Long"] == [2]
+    assert channels["MicL"] == [16]