Merge pull request 'Histogram body codec — full RE + peak-count fix that resolves the prod inflation incident' (#26) from feat/wire-histogram-codec into dev
Reviewed-on: #26
This commit was merged in pull request #26.
+1
-1
@@ -1,6 +1,6 @@
|
|||||||
/bridges/captures/
|
/bridges/captures/
|
||||||
/example-events/
|
/example-events/
|
||||||
|
/tests/fixtures/
|
||||||
/manuals/
|
/manuals/
|
||||||
|
|
||||||
# Python build artifacts
|
# Python build artifacts
|
||||||
|
|||||||
@@ -12,7 +12,21 @@ implementation lives in `minimateplus/histogram_codec.py`.
|
|||||||
in-repo histogram fixture corpus decodes byte-exact against BW's
|
in-repo histogram fixture corpus decodes byte-exact against BW's
|
||||||
ASCII export.
|
ASCII export.
|
||||||
|
|
||||||
24 regression tests pass against ~3,500 blocks across 5 fixtures.
|
26 regression tests pass against ~3,500 blocks across 5 in-repo
|
||||||
|
fixtures, plus a synthetic regression block taken from a real
|
||||||
|
BE9558 prod event to lock in the uint8-peak interpretation.
|
||||||
|
|
||||||
|
**Important correction (2026-05-21):** the per-channel peak count
|
||||||
|
is `uint8` at byte[6]/[10]/[14]/[18], NOT `uint16 LE` at byte[6:8]
|
||||||
|
etc. The N844 fixture corpus the original RE was done against has
|
||||||
|
zero values in bytes [7]/[11]/[15]/[19] for every block, so the
|
||||||
|
two interpretations happened to be equivalent. Cross-correlating
|
||||||
|
non-N844 events (BE9558 Tran-drift, BE18003 Histogram+Continuous)
|
||||||
|
against BW's per-interval ASCII export — 4 channels × ~1400 blocks
|
||||||
|
per event × multiple events = 100% byte-exact only when the peak
|
||||||
|
is read as uint8. Reading as uint16 LE produced peaks up to 268
|
||||||
|
in/s per channel and 35× inflated PVS sums when first deployed to
|
||||||
|
prod (rolled back, root-caused, and fixed in commit 7183b95+1).
|
||||||
|
|
||||||
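The two interpretations can be checked by hand on bytes `[6:8]` of the BE9558 regression block (`03 d2`). The following is a minimal sketch, assuming only the 0.005 in/s-per-count scale quoted in this document; the variable names are illustrative and are not taken from `histogram_codec.py`.

```python
# Bytes [6] and [7] of the BE9558 interval-12 block: peak count, then annotation.
pair = bytes([0x03, 0xD2])

peak_uint8 = pair[0]                          # corrected reading: byte [6] alone
peak_uint16 = int.from_bytes(pair, "little")  # legacy reading: annotation becomes the high byte

GEO_SCALE = 0.005  # in/s per count at Normal range, as stated in the layout

print("uint8 :", peak_uint8, "counts ->", round(peak_uint8 * GEO_SCALE, 3), "in/s")
print("uint16:", peak_uint16, "counts ->", round(peak_uint16 * GEO_SCALE, 3), "in/s")
```

The uint8 reading gives 3 counts (0.015 in/s), while the uint16 LE reading gives 53763 counts, which is where the roughly 268 in/s figure comes from. When byte `[7]` is zero, as in every N844 block, both readings produce the same number.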
## Body format
|
## Body format
|
||||||
|
|
||||||
@@ -27,15 +41,21 @@ Each block represents one histogram interval. Block layout:
|
|||||||
[1] segment_id (uint8) 0x00..0x03 — 256 blocks per segment
|
[1] segment_id (uint8) 0x00..0x03 — 256 blocks per segment
|
||||||
[2:4] block_ctr (uint16 LE) resets each segment (0x0100, 0x0101, …)
|
[2:4] block_ctr (uint16 LE) resets each segment (0x0100, 0x0101, …)
|
||||||
[4:6] 0x000a (uint16 LE) constant marker (= 10)
|
[4:6] 0x000a (uint16 LE) constant marker (= 10)
|
||||||
[6:8] T_peak_count uint16 LE Tran peak (count × 0.005 → in/s at Normal)
|
[6] T_peak_count uint8 Tran peak (count × 0.005 → in/s at Normal,
|
||||||
|
max 1.275 in/s — fits in uint8)
|
||||||
|
[7] T_annotation uint8 empirically non-zero on intervals with sub-Hz
|
||||||
|
or unmeasurable freq; meaning not fully RE'd
|
||||||
[8:10] T_halfperiod uint16 LE Tran half-period in samples
|
[8:10] T_halfperiod uint16 LE Tran half-period in samples
|
||||||
(freq_Hz = 512 / halfp; ≤ 5 means ">100 Hz")
|
(freq_Hz = 512 / halfp; ≤ 5 means ">100 Hz")
|
||||||
[10:12] V_peak_count uint16 LE Vert peak
|
[10] V_peak_count uint8 Vert peak
|
||||||
|
[11] V_annotation uint8
|
||||||
[12:14] V_halfperiod uint16 LE Vert freq half-period
|
[12:14] V_halfperiod uint16 LE Vert freq half-period
|
||||||
[14:16] L_peak_count uint16 LE Long peak
|
[14] L_peak_count uint8 Long peak
|
||||||
|
[15] L_annotation uint8
|
||||||
[16:18] L_halfperiod uint16 LE Long freq half-period
|
[16:18] L_halfperiod uint16 LE Long freq half-period
|
||||||
[18:20] M_peak_count uint16 LE MicL peak count
|
[18] M_peak_count uint8 MicL peak count
|
||||||
(dB via waveform_codec.mic_count_to_db)
|
(dB via waveform_codec.mic_count_to_db)
|
||||||
|
[19] M_annotation uint8
|
||||||
[20:22] M_halfperiod uint16 LE MicL freq half-period
|
[20:22] M_halfperiod uint16 LE MicL freq half-period
|
||||||
[22:24] 0x00 0x00 constant
|
[22:24] 0x00 0x00 constant
|
||||||
[24:28] 4-byte variable purpose unknown — possibly CRC,
|
[24:28] 4-byte variable purpose unknown — possibly CRC,
|
||||||
@@ -99,6 +119,16 @@ slot[8] = 9 → 512/9 = 56.9 → 57 Hz ✓ M_freq
|
|||||||
|
|
||||||
## What's NOT yet decoded
|
## What's NOT yet decoded
|
||||||
|
|
||||||
|
- **Annotation bytes (`block[7]/[11]/[15]/[19]`)**. Empirically
|
||||||
|
non-zero on intervals where the per-channel ZC frequency comes
|
||||||
|
out as `N/A` or sub-Hz (`<1.0`, `1.X`). Hypothesis tested in the
|
||||||
|
RE session: byte != 0 ↔ sub-Hz freq. Only ~50% correlation
|
||||||
|
across the K558 corpus, so the relationship is more complex.
|
||||||
|
Possibilities: time-of-peak-within-interval, halfp extension for
|
||||||
|
very-long-period signals, or a debug/diagnostic field the firmware
|
||||||
|
writes opportunistically. Doesn't affect peak amplitudes or
|
||||||
|
waveform reconstruction. Captured as `record["annotations"]` for
|
||||||
|
future RE.
|
||||||
- **4-byte variable metadata field (bytes 24:28)**. Not needed for
|
- **4-byte variable metadata field (bytes 24:28)**. Not needed for
|
||||||
waveform reconstruction. Speculation: per-block CRC, sub-second
|
waveform reconstruction. Speculation: per-block CRC, sub-second
|
||||||
timestamp offset, or a Mic psi(L) count not in the 9 samples.
|
timestamp offset, or a Mic psi(L) count not in the 9 samples.
|
||||||
|
|||||||
@@ -28,18 +28,32 @@ iterate 32-stride and stop before the tail.
|
|||||||
[1] segment_id (uint8) 0x00..0x03 — 256 blocks per segment
|
[1] segment_id (uint8) 0x00..0x03 — 256 blocks per segment
|
||||||
[2:4] block_ctr (uint16 LE) resets each segment (0x0100, 0x0101, …)
|
[2:4] block_ctr (uint16 LE) resets each segment (0x0100, 0x0101, …)
|
||||||
[4:6] 0x000a (uint16 LE) constant marker (= 10)
|
[4:6] 0x000a (uint16 LE) constant marker (= 10)
|
||||||
[6:8] T_peak_count uint16 LE Tran peak (count × 0.005 → in/s)
|
[6] T_peak_count uint8 Tran peak (count × 0.005 → in/s, max 1.275 in/s)
|
||||||
|
[7] T_annotation uint8 empirically non-zero on intervals with sub-Hz
|
||||||
|
or unmeasurable Tran freq; meaning not fully RE'd
|
||||||
[8:10] T_halfperiod uint16 LE Tran half-period in samples (freq = 512 / halfp Hz)
|
[8:10] T_halfperiod uint16 LE Tran half-period in samples (freq = 512 / halfp Hz)
|
||||||
[10:12] V_peak_count uint16 LE
|
[10] V_peak_count uint8
|
||||||
|
[11] V_annotation uint8
|
||||||
[12:14] V_halfperiod uint16 LE
|
[12:14] V_halfperiod uint16 LE
|
||||||
[14:16] L_peak_count uint16 LE
|
[14] L_peak_count uint8
|
||||||
|
[15] L_annotation uint8
|
||||||
[16:18] L_halfperiod uint16 LE
|
[16:18] L_halfperiod uint16 LE
|
||||||
[18:20] M_peak_count uint16 LE MicL peak (count → dB via mic_count_to_db)
|
[18] M_peak_count uint8 MicL peak (count → dB via mic_count_to_db)
|
||||||
|
[19] M_annotation uint8
|
||||||
[20:22] M_halfperiod uint16 LE MicL half-period in samples (freq = 512 / halfp Hz)
|
[20:22] M_halfperiod uint16 LE MicL half-period in samples (freq = 512 / halfp Hz)
|
||||||
[22:24] 0x00 0x00 constant
|
[22:24] 0x00 0x00 constant
|
||||||
[24:28] 4-byte variable purpose unknown (possibly CRC or timestamp delta)
|
[24:28] 4-byte variable purpose unknown (possibly CRC or timestamp delta)
|
||||||
[28:32] 0x1e 0x0a 0x00 0x00 constant block-end signature
|
[28:32] 0x1e 0x0a 0x00 0x00 constant block-end signature
|
||||||
|
|
||||||
|
NOTE on peak-count width: an earlier interpretation treated the peak
|
||||||
|
fields as uint16 LE spanning [6:8] / [10:12] / [14:16] / [18:20].
|
||||||
|
That happened to be byte-exact against the N844 fixture corpus only
|
||||||
|
because every annotation byte in those fixtures was zero, making
|
||||||
|
``uint16 LE == uint8``. Cross-correlating BE9558 (K558) Tran-drift
|
||||||
|
and BE18003 (T003) Histogram+Continuous events against the BW ASCII
|
||||||
|
export proved peak is uint8 alone — see test_histogram_codec.py
|
||||||
|
and docs/histogram_codec_re_status.md.
|
||||||
|
|
||||||
Block-identification anchor: ``block[22:24] == b"\\x00\\x00"`` AND
|
Block-identification anchor: ``block[22:24] == b"\\x00\\x00"`` AND
|
||||||
``block[28:32] == b"\\x1e\\x0a\\x00\\x00"``. This is the reliable
|
``block[28:32] == b"\\x1e\\x0a\\x00\\x00"``. This is the reliable
|
||||||
distinguisher from non-block content in the file.
|
distinguisher from non-block content in the file.
|
||||||
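The layout and anchor rule described in this docstring can be exercised on the K558 block used by the regression test. This is a hedged sketch, not the module's code: `is_data_block`, `decode_block`, `u16le`, and `halfp_to_hz` are stand-in names, the validator checks only the two documented anchor fields (the real `_is_data_block` may check more), and treating a zero half-period as "no measurement" is an assumption.

```python
BLOCK_SIZE = 32
END_SIGNATURE = b"\x1e\x0a\x00\x00"


def is_data_block(block: bytes) -> bool:
    """Anchor check only: zero bytes at [22:24] and the end signature at [28:32]."""
    return (
        len(block) == BLOCK_SIZE
        and block[22:24] == b"\x00\x00"
        and block[28:32] == END_SIGNATURE
    )


def u16le(block: bytes, off: int) -> int:
    """Read an unsigned 16-bit little-endian value at `off`."""
    return block[off] | (block[off + 1] << 8)


def halfp_to_hz(halfp: int):
    """freq = 512 / halfp; zero is treated as 'no measurement' (an assumption)."""
    return 512 / halfp if halfp else None


def decode_block(block: bytes) -> dict:
    """Split one validated block into the fields named in the layout (T, V, L, M order)."""
    return {
        "segment_id": block[1],
        "block_ctr": u16le(block, 2),
        "peaks": (block[6], block[10], block[14], block[18]),        # uint8 each
        "annotations": (block[7], block[11], block[15], block[19]),  # meaning unknown
        "halfps": tuple(u16le(block, o) for o in (8, 12, 16, 20)),   # uint16 LE each
    }


# Same bytes as _K558_INTERVAL_12_BLOCK in the test file.
sample = bytes.fromhex(
    "00 00 0c 01 0a 00 03 d2 45 00 02 00 02 00 02 00"
    "02 00 10 00 06 00 00 00 0e 91 2f 00 1e 0a 00 00"
)

record = decode_block(sample) if is_data_block(sample) else None
print(record)
```

For this block the Tran half-period is 69 samples, so `halfp_to_hz(69)` lands near 7.4 Hz, matching the comment in the regression test.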
@@ -128,17 +142,40 @@ def _is_data_block(block: bytes) -> bool:
|
|||||||
return True
|
return True
|
||||||
|
|
||||||
|
|
||||||
def _decode_block(block: bytes) -> dict:
|
def _decode_block(block: bytes) -> Optional[dict]:
|
||||||
"""Decode one 32-byte histogram block. Caller must have validated
|
"""Decode one 32-byte histogram block. Caller must have validated
|
||||||
with ``_is_data_block`` first."""
|
with ``_is_data_block`` first.
|
||||||
# All 16-bit fields are little-endian unsigned. Peak counts are
|
|
||||||
# always non-negative; half-periods are always positive when valid.
|
Returns a record with per-channel peak counts (uint8) and
|
||||||
t_peak, t_halfp, v_peak, v_halfp, l_peak, l_halfp, m_peak, m_halfp = struct.unpack_from(
|
half-periods (uint16 LE).
|
||||||
"<HHHHHHHH", block, 6
|
"""
|
||||||
)
|
# Peak counts are uint8 at bytes [6] / [10] / [14] / [18]. The
|
||||||
|
# adjacent bytes [7] / [11] / [15] / [19] hold an annotation field
|
||||||
|
# whose meaning isn't fully understood (empirically non-zero in
|
||||||
|
# intervals with sub-Hz or unmeasurable geo frequencies, mostly
|
||||||
|
# zero otherwise — see test fixtures from BE9558/BE18003 corpora).
|
||||||
|
# Crucially, those annotation bytes are NOT the high byte of the
|
||||||
|
# peak count: cross-correlating against BW's per-interval ASCII
|
||||||
|
# export proves the peak is uint8 alone.
|
||||||
|
#
|
||||||
|
# Reading the peak as uint16 LE (the original interpretation) was
|
||||||
|
# accidentally correct only because every block in the N844 fixture
|
||||||
|
# corpus had a zero annotation byte; non-N844 events with non-zero
|
||||||
|
# annotation bytes decoded to physically impossible peaks (e.g.
|
||||||
|
# 268 in/s per channel) and produced 35× inflated PVS sums when
|
||||||
|
# first run against prod data. See histogram_codec_re_status.md.
|
||||||
|
t_peak = block[6]
|
||||||
|
v_peak = block[10]
|
||||||
|
l_peak = block[14]
|
||||||
|
m_peak = block[18]
|
||||||
|
t_halfp = block[8] | (block[9] << 8)
|
||||||
|
v_halfp = block[12] | (block[13] << 8)
|
||||||
|
l_halfp = block[16] | (block[17] << 8)
|
||||||
|
m_halfp = block[20] | (block[21] << 8)
|
||||||
segment_id = block[1]
|
segment_id = block[1]
|
||||||
block_ctr = block[2] | (block[3] << 8)
|
block_ctr = block[2] | (block[3] << 8)
|
||||||
var_meta = bytes(block[24:28])
|
var_meta = bytes(block[24:28])
|
||||||
|
annotations = (block[7], block[11], block[15], block[19])
|
||||||
return {
|
return {
|
||||||
"segment_id": segment_id,
|
"segment_id": segment_id,
|
||||||
"block_ctr": block_ctr,
|
"block_ctr": block_ctr,
|
||||||
@@ -151,6 +188,7 @@ def _decode_block(block: bytes) -> dict:
|
|||||||
"m_peak": m_peak,
|
"m_peak": m_peak,
|
||||||
"m_halfp": m_halfp,
|
"m_halfp": m_halfp,
|
||||||
"meta_var": var_meta,
|
"meta_var": var_meta,
|
||||||
|
"annotations": annotations,
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
@@ -160,6 +198,13 @@ def walk_body(body: bytes) -> List[dict]:
|
|||||||
Iterates 32-byte strides from offset 0. Yields a decoded record
|
Iterates 32-byte strides from offset 0. Yields a decoded record
|
||||||
for every block that passes ``_is_data_block`` validation. Stops
|
for every block that passes ``_is_data_block`` validation. Stops
|
||||||
when the remaining bytes are too short to form a complete block.
|
when the remaining bytes are too short to form a complete block.
|
||||||
|
|
||||||
|
In Histogram+Continuous mode the body interleaves data blocks with
|
||||||
|
other 32-byte content (likely continuous-mode waveform blocks) that
|
||||||
|
fail the data-block validation; the walker naturally skips them
|
||||||
|
without losing 32-byte alignment. Use ``block_ctr`` from each
|
||||||
|
returned record to map back to the original interval index — the
|
||||||
|
record list is sparse when other block types are interleaved.
|
||||||
"""
|
"""
|
||||||
records: List[dict] = []
|
records: List[dict] = []
|
||||||
for off in range(0, len(body) - _BLOCK_SIZE + 1, _BLOCK_SIZE):
|
for off in range(0, len(body) - _BLOCK_SIZE + 1, _BLOCK_SIZE):
|
||||||
@@ -169,7 +214,13 @@ def walk_body(body: bytes) -> List[dict]:
|
|||||||
# Continue walking — block alignment is fixed at 32-stride
|
# Continue walking — block alignment is fixed at 32-stride
|
||||||
# from offset 0, so we don't lose alignment by skipping.
|
# from offset 0, so we don't lose alignment by skipping.
|
||||||
continue
|
continue
|
||||||
records.append(_decode_block(blk))
|
decoded = _decode_block(blk)
|
||||||
|
if decoded is None:
|
||||||
|
# Block validated as a histogram block but had peak fields
|
||||||
|
# outside the plausible range — undocumented extension.
|
||||||
|
# Skip rather than propagating bogus PVS contributions.
|
||||||
|
continue
|
||||||
|
records.append(decoded)
|
||||||
return records
|
return records
|
||||||
|
|
||||||
|
|
||||||
|
|||||||
@@ -54,14 +54,26 @@ log = logging.getLogger("backfill_sidecars")
|
|||||||
|
|
||||||
|
|
||||||
def _looks_like_event_file(path: Path) -> bool:
|
def _looks_like_event_file(path: Path) -> bool:
|
||||||
"""Same heuristic as the importer CLI."""
|
"""Same heuristic as the importer CLI.
|
||||||
|
|
||||||
|
Filters to BW (Series III) event files only — Thor (Series IV)
|
||||||
|
`.IDFW` / `.IDFH` files share the store but have their own ingest
|
||||||
|
path (`WaveformStore.save_imported_idf`) and are NOT decodable by
|
||||||
|
`event_file_io.read_blastware_file`. Their sidecars are populated
|
||||||
|
at ingest from the paired `.IDFW.txt` ASCII report; nothing the
|
||||||
|
backfill regenerates would improve on them, so we exclude them
|
||||||
|
from scope.
|
||||||
|
"""
|
||||||
if not path.is_file():
|
if not path.is_file():
|
||||||
return False
|
return False
|
||||||
if path.name.endswith((".a5.pkl", ".sfm.json")):
|
if path.name.endswith((".a5.pkl", ".sfm.json", ".h5")):
|
||||||
return False
|
return False
|
||||||
ext = path.suffix.lstrip(".")
|
ext = path.suffix.lstrip(".")
|
||||||
if not (3 <= len(ext) <= 4):
|
if not (3 <= len(ext) <= 4):
|
||||||
return False
|
return False
|
||||||
|
# Thor IDF files share the .{W,H}-suffix shape but aren't BW.
|
||||||
|
if ext.upper() in ("IDFW", "IDFH"):
|
||||||
|
return False
|
||||||
if not (ext[-1].upper() in {"W", "H"} or ext.endswith("0")):
|
if not (ext[-1].upper() in {"W", "H"} or ext.endswith("0")):
|
||||||
return False
|
return False
|
||||||
try:
|
try:
|
||||||
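The name-based part of the heuristic shown in this hunk can be tried in isolation. The sketch below is an assumption-laden reduction: `has_bw_event_name` is a hypothetical helper that omits the `path.is_file()` check and whatever the truncated `try:` block goes on to do, and the file names in the loop are invented apart from `K558LKZU.RE0H`, which appears in the test file.

```python
from pathlib import PurePath

SIDECAR_SUFFIXES = (".a5.pkl", ".sfm.json", ".h5")
THOR_EXTENSIONS = ("IDFW", "IDFH")


def has_bw_event_name(name: str) -> bool:
    """Name-only filter; no filesystem access and no content sniffing."""
    if name.endswith(SIDECAR_SUFFIXES):
        return False
    ext = PurePath(name).suffix.lstrip(".")
    if not (3 <= len(ext) <= 4):
        return False
    # Thor IDF files match the W/H-suffix shape but are not BW event files.
    if ext.upper() in THOR_EXTENSIONS:
        return False
    return ext[-1].upper() in {"W", "H"} or ext.endswith("0")


for candidate in (
    "K558LKZU.RE0H",
    "K558LKZU.RE0H.sfm.json",
    "K558LKZU.RE0H.h5",
    "example.IDFW",
    "example.idfh",
    "notes.txt",
):
    print(f"{candidate:28s} {has_bw_event_name(candidate)}")
```

Only the first name passes. The explicit Thor check matters because `IDFW` and `IDFH` are four characters long and end in `W` or `H`, so they would otherwise satisfy the extension-shape test.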
@@ -275,16 +287,25 @@ def main(argv=None) -> int:
|
|||||||
or ev.total_samples < derived // 4):
|
or ev.total_samples < derived // 4):
|
||||||
ev.total_samples = derived
|
ev.total_samples = derived
|
||||||
|
|
||||||
# Preserve user-edited review state + extensions from the
|
# Preserve user-edited review state + extensions + the
|
||||||
# existing sidecar (false_trigger flag, notes, etc.) so a
|
# bw_report block from the existing sidecar so a backfill
|
||||||
# backfill never wipes them out.
|
# never wipes them out. The bw_report block originates
|
||||||
|
# from the paired .TXT ASCII report parsed at ORIGINAL
|
||||||
|
# import time (ach forward / direct upload); the .TXT
|
||||||
|
# file is not in the waveform store, so we can't re-derive
|
||||||
|
# it from disk. event_to_sidecar_dict takes a
|
||||||
|
# BwAsciiReport dataclass (not a dict), so for bw_report
|
||||||
|
# we overlay the existing block after regen instead of
|
||||||
|
# passing it as a kwarg.
|
||||||
preserved_review = None
|
preserved_review = None
|
||||||
preserved_ext = None
|
preserved_ext = None
|
||||||
|
preserved_bw_report = None
|
||||||
if sidecar_path.exists():
|
if sidecar_path.exists():
|
||||||
try:
|
try:
|
||||||
_existing = event_file_io.read_sidecar(sidecar_path)
|
_existing = event_file_io.read_sidecar(sidecar_path)
|
||||||
preserved_review = _existing.get("review")
|
preserved_review = _existing.get("review")
|
||||||
preserved_ext = _existing.get("extensions")
|
preserved_ext = _existing.get("extensions")
|
||||||
|
preserved_bw_report = _existing.get("bw_report")
|
||||||
except Exception:
|
except Exception:
|
||||||
pass
|
pass
|
||||||
|
|
||||||
@@ -299,6 +320,8 @@ def main(argv=None) -> int:
|
|||||||
review=preserved_review,
|
review=preserved_review,
|
||||||
extensions=preserved_ext,
|
extensions=preserved_ext,
|
||||||
)
|
)
|
||||||
|
if preserved_bw_report is not None:
|
||||||
|
sidecar["bw_report"] = preserved_bw_report
|
||||||
|
|
||||||
# Also emit the .h5 clean-waveform file when:
|
# Also emit the .h5 clean-waveform file when:
|
||||||
# - it's missing, OR
|
# - it's missing, OR
|
||||||
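The overlay-after-regeneration pattern used for `bw_report` in the hunks above can be sketched as a small pure function. This is illustrative only: `overlay_preserved_block` is a hypothetical name, and the dictionary contents below are made-up placeholders rather than real sidecar fields.

```python
from typing import Optional


def overlay_preserved_block(
    regenerated: dict, existing: Optional[dict], key: str = "bw_report"
) -> dict:
    """Copy `key` from the previous sidecar into the regenerated one, if present."""
    merged = dict(regenerated)  # shallow copy; the regenerated dict is left untouched
    preserved = existing.get(key) if existing else None
    if preserved is not None:
        merged[key] = preserved
    return merged


# Placeholder data: the old sidecar carries a bw_report, the fresh one does not.
old_sidecar = {"bw_report": {"placeholder_field": "from paired .TXT"}, "review": None}
fresh_sidecar = {"event": {"placeholder_field": "regenerated"}, "review": None}

merged = overlay_preserved_block(fresh_sidecar, old_sidecar)
untouched = overlay_preserved_block(fresh_sidecar, None)
print(sorted(merged), sorted(untouched))
```

Overlaying after regeneration keeps the block byte-for-byte as it was stored, which is the property the preservation checker later verifies by hash.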
|
|||||||
@@ -0,0 +1,185 @@
|
|||||||
|
"""
|
||||||
|
scripts/check_bw_report_preservation.py — verify that running backfill_sidecars
|
||||||
|
doesn't wipe the `bw_report` block from sidecars that already had one.
|
||||||
|
|
||||||
|
Two-step workflow:
|
||||||
|
|
||||||
|
# Before running backfill — capture a baseline snapshot:
|
||||||
|
python scripts/check_bw_report_preservation.py snapshot \
|
||||||
|
--store-root /path/to/waveforms \
|
||||||
|
--out before.json
|
||||||
|
|
||||||
|
# Run backfill:
|
||||||
|
python scripts/backfill_sidecars.py --store-root /path/to/waveforms --force
|
||||||
|
|
||||||
|
# After backfill — diff against the baseline:
|
||||||
|
python scripts/check_bw_report_preservation.py diff \
|
||||||
|
--store-root /path/to/waveforms \
|
||||||
|
--baseline before.json
|
||||||
|
|
||||||
|
The diff classifies every sidecar into one of:
|
||||||
|
|
||||||
|
PRESERVED had bw_report before, has same hash now ← GOOD
|
||||||
|
CHANGED had bw_report before, has different hash now ← suspicious
|
||||||
|
(backfill should only ever copy the block verbatim)
|
||||||
|
WIPED had bw_report before, doesn't now ← BUG — data loss
|
||||||
|
STILL_MISSING didn't have bw_report before, still doesn't ← expected
|
||||||
|
NEW didn't have bw_report before, has one now
|
||||||
|
(only possible if a re-ingest happened between snapshots;
|
||||||
|
shouldn't happen during backfill)
|
||||||
|
REMOVED sidecar existed in baseline, file is gone now
|
||||||
|
ADDED sidecar didn't exist in baseline, exists now
|
||||||
|
|
||||||
|
Exit code is 0 if no WIPED or CHANGED entries are found, 1 otherwise.
|
||||||
|
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import argparse
|
||||||
|
import hashlib
|
||||||
|
import json
|
||||||
|
import sys
|
||||||
|
from pathlib import Path
|
||||||
|
from typing import Optional
|
||||||
|
|
||||||
|
# Allow running from the repo root without installation.
|
||||||
|
sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
|
||||||
|
|
||||||
|
from minimateplus import event_file_io
|
||||||
|
|
||||||
|
|
||||||
|
def _bw_report_hash(sidecar_data: dict) -> Optional[str]:
|
||||||
|
"""Canonical-JSON hash of the bw_report block, or None if absent."""
|
||||||
|
br = sidecar_data.get("bw_report")
|
||||||
|
if not br:
|
||||||
|
return None
|
||||||
|
# sort_keys for stable hashing across dict-ordering differences
|
||||||
|
blob = json.dumps(br, sort_keys=True, separators=(",", ":"))
|
||||||
|
return hashlib.sha256(blob.encode()).hexdigest()
|
||||||
|
|
||||||
|
|
||||||
|
def _scan_store(store_root: Path) -> dict:
|
||||||
|
"""Walk every <serial>/<file>.sfm.json and return {relpath: hash_or_None}.
|
||||||
|
|
||||||
|
Relpath is `<serial>/<filename>` — stable across machines/snapshots.
|
||||||
|
"""
|
||||||
|
out: dict[str, Optional[str]] = {}
|
||||||
|
for serial_dir in sorted(p for p in store_root.iterdir() if p.is_dir()):
|
||||||
|
for sidecar in sorted(serial_dir.glob("*.sfm.json")):
|
||||||
|
relpath = f"{serial_dir.name}/{sidecar.name}"
|
||||||
|
try:
|
||||||
|
data = event_file_io.read_sidecar(sidecar)
|
||||||
|
except Exception as exc:
|
||||||
|
print(f" WARN: failed to read {relpath}: {exc}", file=sys.stderr)
|
||||||
|
continue
|
||||||
|
out[relpath] = _bw_report_hash(data)
|
||||||
|
return out
|
||||||
|
|
||||||
|
|
||||||
|
def cmd_snapshot(args) -> int:
|
||||||
|
store_root = Path(args.store_root).expanduser().resolve()
|
||||||
|
if not store_root.exists():
|
||||||
|
print(f"error: store root does not exist: {store_root}", file=sys.stderr)
|
||||||
|
return 2
|
||||||
|
out_path = Path(args.out).expanduser().resolve()
|
||||||
|
|
||||||
|
print(f"Scanning {store_root} …")
|
||||||
|
snapshot = _scan_store(store_root)
|
||||||
|
|
||||||
|
with_bw = sum(1 for v in snapshot.values() if v is not None)
|
||||||
|
without_bw = sum(1 for v in snapshot.values() if v is None)
|
||||||
|
print(f" total sidecars: {len(snapshot)}")
|
||||||
|
print(f" with bw_report: {with_bw}")
|
||||||
|
print(f" without bw_report: {without_bw}")
|
||||||
|
|
||||||
|
out_path.parent.mkdir(parents=True, exist_ok=True)
|
||||||
|
with open(out_path, "w") as f:
|
||||||
|
json.dump({
|
||||||
|
"store_root": str(store_root),
|
||||||
|
"total": len(snapshot),
|
||||||
|
"with_bw": with_bw,
|
||||||
|
"sidecars": snapshot,
|
||||||
|
}, f, indent=2, sort_keys=True)
|
||||||
|
print(f"Wrote baseline → {out_path}")
|
||||||
|
return 0
|
||||||
|
|
||||||
|
|
||||||
|
def cmd_diff(args) -> int:
|
||||||
|
store_root = Path(args.store_root).expanduser().resolve()
|
||||||
|
if not store_root.exists():
|
||||||
|
print(f"error: store root does not exist: {store_root}", file=sys.stderr)
|
||||||
|
return 2
|
||||||
|
baseline_path = Path(args.baseline).expanduser().resolve()
|
||||||
|
if not baseline_path.exists():
|
||||||
|
print(f"error: baseline file not found: {baseline_path}", file=sys.stderr)
|
||||||
|
return 2
|
||||||
|
|
||||||
|
with open(baseline_path) as f:
|
||||||
|
baseline = json.load(f)
|
||||||
|
before = baseline["sidecars"]
|
||||||
|
print(f"Scanning {store_root} for comparison against {baseline_path.name} …")
|
||||||
|
after = _scan_store(store_root)
|
||||||
|
|
||||||
|
classes = {k: [] for k in (
|
||||||
|
"PRESERVED", "CHANGED", "WIPED", "STILL_MISSING", "NEW", "REMOVED", "ADDED",
|
||||||
|
)}
|
||||||
|
all_keys = set(before) | set(after)
|
||||||
|
for key in sorted(all_keys):
|
||||||
|
b = before.get(key, "__MISSING__")
|
||||||
|
a = after.get(key, "__MISSING__")
|
||||||
|
if b == "__MISSING__":
|
||||||
|
classes["ADDED"].append(key)
|
||||||
|
elif a == "__MISSING__":
|
||||||
|
classes["REMOVED"].append(key)
|
||||||
|
elif b is None and a is None:
|
||||||
|
classes["STILL_MISSING"].append(key)
|
||||||
|
elif b is None and a is not None:
|
||||||
|
classes["NEW"].append(key)
|
||||||
|
elif b is not None and a is None:
|
||||||
|
classes["WIPED"].append(key)
|
||||||
|
elif b == a:
|
||||||
|
classes["PRESERVED"].append(key)
|
||||||
|
else:
|
||||||
|
classes["CHANGED"].append(key)
|
||||||
|
|
||||||
|
print()
|
||||||
|
print(f"{'class':16s} {'count':>7s}")
|
||||||
|
print("-" * 24)
|
||||||
|
for k in ("PRESERVED", "STILL_MISSING", "CHANGED", "WIPED",
|
||||||
|
"NEW", "ADDED", "REMOVED"):
|
||||||
|
print(f"{k:16s} {len(classes[k]):>7d}")
|
||||||
|
|
||||||
|
# Show samples of the concerning classes
|
||||||
|
for k in ("WIPED", "CHANGED"):
|
||||||
|
if classes[k]:
|
||||||
|
print(f"\n=== {k} samples (up to 10) ===")
|
||||||
|
for key in classes[k][:10]:
|
||||||
|
print(f" {key}")
|
||||||
|
|
||||||
|
if classes["WIPED"] or classes["CHANGED"]:
|
||||||
|
print("\n*** Preservation broken: WIPED or CHANGED entries present ***")
|
||||||
|
return 1
|
||||||
|
print("\nbw_report preservation looks intact.")
|
||||||
|
return 0
|
||||||
|
|
||||||
|
|
||||||
|
def main(argv=None) -> int:
|
||||||
|
p = argparse.ArgumentParser(description=__doc__)
|
||||||
|
sub = p.add_subparsers(dest="cmd", required=True)
|
||||||
|
|
||||||
|
p_snap = sub.add_parser("snapshot", help="capture baseline bw_report hashes")
|
||||||
|
p_snap.add_argument("--store-root", required=True)
|
||||||
|
p_snap.add_argument("--out", required=True, help="output JSON path")
|
||||||
|
p_snap.set_defaults(func=cmd_snapshot)
|
||||||
|
|
||||||
|
p_diff = sub.add_parser("diff", help="diff current store against a baseline")
|
||||||
|
p_diff.add_argument("--store-root", required=True)
|
||||||
|
p_diff.add_argument("--baseline", required=True, help="JSON from `snapshot`")
|
||||||
|
p_diff.set_defaults(func=cmd_diff)
|
||||||
|
|
||||||
|
args = p.parse_args(argv)
|
||||||
|
return args.func(args)
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
sys.exit(main())
|
||||||
@@ -335,3 +335,51 @@ def test_geo_count_to_ins_scale():
|
|||||||
assert geo_count_to_ins(1) == pytest.approx(0.005)
|
assert geo_count_to_ins(1) == pytest.approx(0.005)
|
||||||
assert geo_count_to_ins(10) == pytest.approx(0.050)
|
assert geo_count_to_ins(10) == pytest.approx(0.050)
|
||||||
assert geo_count_to_ins(0) == 0.0
|
assert geo_count_to_ins(0) == 0.0
|
||||||
|
|
||||||
|
|
||||||
|
# ── Regression: peak is uint8 byte[N], NOT uint16 LE byte[N:N+2] ────────────
|
||||||
|
#
|
||||||
|
# Block taken verbatim from K558LKZU.RE0H (BE9558) interval 12 — a real
|
||||||
|
# field event where the Tran channel had developed a DC offset and was
|
||||||
|
# producing sub-Hz drift content the device couldn't characterize.
|
||||||
|
# The annotation byte at [7] = 0xd2 is non-zero in that case. The
|
||||||
|
# legacy codec read [6:8] as uint16 LE, producing T_peak = 53763 →
|
||||||
|
# 268 in/s, which is physically impossible and roughly 17,900× the actual
|
||||||
|
# 0.015 in/s value (T_lo = 3 alone gives the correct count).
|
||||||
|
# Verified against the paired BW ASCII export.
|
||||||
|
_K558_INTERVAL_12_BLOCK = bytes.fromhex(
|
||||||
|
"00 00 0c 01 0a 00 03 d2 45 00 02 00 02 00 02 00"
|
||||||
|
"02 00 10 00 06 00 00 00 0e 91 2f 00 1e 0a 00 00".replace(" ", "")
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def test_extension_byte_does_not_inflate_peak():
|
||||||
|
"""The annotation byte at [7]/[11]/[15]/[19] must NOT contribute to
|
||||||
|
the peak count. Decoded T_peak must be 3 (uint8 byte[6]), NOT
|
||||||
|
53763 (uint16 LE byte[6:8])."""
|
||||||
|
body = _K558_INTERVAL_12_BLOCK
|
||||||
|
records = decode_histogram_body_full(body)
|
||||||
|
assert records is not None
|
||||||
|
assert len(records) == 1
|
||||||
|
r = records[0]
|
||||||
|
assert r["t_peak"] == 3, f"T_peak should be 3 (uint8), got {r['t_peak']}"
|
||||||
|
assert r["v_peak"] == 2
|
||||||
|
assert r["l_peak"] == 2
|
||||||
|
assert r["m_peak"] == 16
|
||||||
|
# Half-periods unchanged — still uint16 LE.
|
||||||
|
assert r["t_halfp"] == 0x0045 # 69 → 7.4 Hz
|
||||||
|
assert r["m_halfp"] == 6 # → 85.3 Hz
|
||||||
|
# Annotation byte is preserved (for future RE) but does not affect peak.
|
||||||
|
assert r["annotations"] == (0xd2, 0x00, 0x00, 0x00)
|
||||||
|
|
||||||
|
|
||||||
|
def test_extension_byte_decoded_to_correct_in_s():
|
||||||
|
"""End-to-end: the channel-grouped output for the K558 ext block
|
||||||
|
should give T = 3 counts = 0.015 in/s, not 53763 counts = 268 in/s."""
|
||||||
|
channels = decode_histogram_body(_K558_INTERVAL_12_BLOCK)
|
||||||
|
assert channels is not None
|
||||||
|
assert channels["Tran"] == [3]
|
||||||
|
assert geo_count_to_ins(channels["Tran"][0]) == pytest.approx(0.015)
|
||||||
|
assert channels["Vert"] == [2]
|
||||||
|
assert channels["Long"] == [2]
|
||||||
|
assert channels["MicL"] == [16]
|
||||||
|
|||||||