v0.20.0 - prerelease features. #25
Reference in New Issue
Block a user
Delete Branch "feat/wire-histogram-codec"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Fixes a data-loss bug discovered while dry-running the backfill against the prod store. Symptom: every histogram event in the store has its body decoded by read_blastware_file → codec returns None → samples = empty dict → ``ev.peak_values = _peaks_from_samples(empty)`` returns ``PeakValues(0, 0, 0, 0, 0)`` (NOT None). The backfill script's existing "seed from DB row when peak_values is None" branch then correctly *skips* the seeding, and the all-zeros PeakValues flows into ``db.insert_events()``'s UPSERT path, OVERWRITING the existing good DB peak values for that event (which were populated from the paired BW ASCII report at ingest). Net effect: running the backfill on prod would have wiped the PPV / mic / vector-sum columns for ~10,000 histogram events. Fix: only compute peaks-from-samples when there are actually samples. For events the codec couldn't decode (histogram-mode bodies, until the §7.6.2 histogram codec is wired in), leave peak_values=None as the "we don't know" signal. Downstream consumers: - backfill_sidecars.py — its existing ``if ev.peak_values is None:`` branch (line 243) seeds from the DB row, preserving the real BW-report peaks across the regen. - WaveformStore.save_imported_bw — apply_report_to_event overlays peaks from the paired BW ASCII report when one was uploaded. Histogram imports without a paired report end up with NULL peaks in the DB, which is correct (better than zeros — clearly says "no peak data available" rather than "peaks are exactly zero"). Updated the existing synthetic-event round-trip test to expect peak_values=None for the no-real-body case, which is the truth now. The 7 fixture-corpus regression tests for real BW waveforms continue to pass — those have decodable samples, so peak_values is still populated from the codec output as before. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>Captures everything learned in the 2026-05-20 session before scope forced a pause: - Block framing is solved: 32-byte blocks, one per histogram interval, signature byte pattern `[22:24]=0x0000` + `[28:32]=0x1e 0x0a 0x00 0x00` reliably identifies data blocks. - Block count = interval count (791 blocks in N844L20G.630H for a TXT-reported 792 intervals). - Sample[0] = Tran peak in 0.0005 in/s/count units (verified on one event — needs cross-event confirmation). - Samples 1-8 → channel/metric mapping is still open. None of the obvious layouts (peak-then-freq alternating, all-peaks- then-all-freqs, per-channel 3-tuples) match the TXT values across multiple blocks. Likely needs a higher-activity fixture (current N844 corpus is all noise-floor data) to disambiguate. - `>100 Hz` sentinel encoding in the binary is unknown. - 4-byte variable metadata field at block[24:28] needs correlation work against TXT columns. Doc mirrors the structure of docs/waveform_codec_re_status.md so a future RE session has a familiar entry point. Includes the suggested attack plan + the code seam where the eventual decoder will land (minimateplus/histogram_codec.py). The §7.6.2 spec in instantel_protocol_reference.md is structurally correct but doesn't pin down per-sample semantics — this doc supersedes it where they conflict on confidence level. No code shipped on this branch. When the codec is cracked, the plan is to land minimateplus/histogram_codec.py + wire into event_file_io.read_blastware_file() + remove the has_samples short-circuit from scripts/backfill_sidecars.py. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>The histogram-mode event body is now byte-exact decodable. Companion to the waveform body codec — together they cover every event file the watcher forwards. Cracked in one session via cross-event correlation against BW's ASCII export. The §7.6.2 spec in instantel_protocol_reference.md was structurally correct (32-byte blocks) but the per-sample semantics were under-documented. Cross-checking block 130 of N844L6Z8.ZR0H against its TXT row revealed the layout perfectly: slot[0] = 10 (constant marker) slot[1] = T_peak_count (× 0.005 → in/s at Normal range) slot[2] = T_halfperiod (freq_Hz = 512 / halfp) slot[3] = V_peak_count slot[4] = V_halfperiod slot[5] = L_peak_count slot[6] = L_halfperiod slot[7] = MicL_peak_count (dB via waveform_codec.mic_count_to_db) slot[8] = MicL_halfperiod The `>100 Hz` sentinel is halfperiod ≤ 5 (since 512/5 = 100 Hz). Mic dB uses the SAME formula as the waveform codec (sign × (81.94 + 20·log10(|count|))) — they share the mic ADC calibration constant. Block identification anchor: bytes [22:24] == 0x0000 AND bytes [28:32] == 1e 0a 00 00. The tail signature is the most reliable distinguisher from non-block content in the file. Files: minimateplus/histogram_codec.py (new) — decoder + public API matching the waveform codec's shape: walk_body(body) -> records decode_histogram_body(body) -> {Tran, Vert, Long, MicL} decode_histogram_body_full(body) -> [per-interval dicts] half_period_to_hz, geo_count_to_ins helpers minimateplus/event_file_io.py (modified) — read_blastware_file now tries the waveform codec first, falls back to the histogram codec on failure. Same output shape, same downstream pipeline. tests/test_histogram_codec.py (new) — 24 regression locks against the in-repo fixture corpus, byte-exact against BW ASCII export for peaks (all 4 channels), frequencies (all 4 channels, including >100 Hz sentinel handling), block framing, and segment-ID accounting. scripts/backfill_sidecars.py (modified) — the has_samples short-circuit added in the histogram-pending era is now a pure defensive guard. Histograms in prod will regen .h5 files correctly on the next backfill run. docs/histogram_codec_re_status.md (updated) — supersedes the earlier "in progress" version with the verified format and test-coverage summary. Notes a few non-essential fields still open (4-byte block metadata, Geo PVS, Mic psi(L) — none of which are needed for waveform reconstruction). Total verified coverage: ~3,500 blocks across 5 fixtures, every field of every block byte-exact against BW. The watcher-forwarded histogram event corpus on prod (~10,000 events) will now produce correct .h5 sidecars on the next backfill run. No additional changes needed to the backfill flow — the existing tool_version-bump cascade picks them up automatically. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>