Mirror what the ingest path does: BW's reported peaks (and sample_rate
/ record_time) take precedence over codec output where present.
Without this, --force backfill silently overwrites bw_report-overlaid
DB columns with codec-derived peaks. Wrong for events where the codec
doesn't fully decode (waveform walker edge cases on SP0/SS0/SV0-style
events, histogram byte[5]!=0 sub-format that isn't yet RE'd), producing
PVS=0 on real high-amplitude events. Bit on prod 2026-05-22 with
three top-10 waveform events ending up at PVS=0 (rolled back same day,
this fix is the proper resolution).
New helper minimateplus.event_file_io.apply_bw_report_dict_to_event
operates on the projected sidecar dict shape (the structure
_bw_report_to_dict produces, which is what gets preserved in the
sidecar). Mirrors apply_report_to_event's semantics: only writes
fields where bw_report has a non-None value, no-ops cleanly on
empty / None input.
Dev validation against prod snapshot:
pre : 1839.7315 pvs_sum 356 events with DB PVS ≠ sidecar bw_report
post : 2016.4902 pvs_sum 2 events still mismatched (both have NULL
timestamp + duplicate rows, edge case)
Both edge-case events DO get the correct value written by the new
backfill — their stale rows from prior backfills remain because
UNIQUE(serial, timestamp) doesn't fire on NULL. Separate dedup
cleanup needed for those 2 events (0.014% of corpus); not blocking.
Backfill remains idempotent + bw_report preservation still passes
(0 WIPED, 0 CHANGED on the 3rd consecutive run).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
the BE9558 / BE18003 extension-byte case
The bytes at [7]/[11]/[15]/[19] are an annotation field (purpose still
unclear — empirically non-zero on intervals with sub-Hz or unmeasurable
freq), NOT the high byte of the peak count. The N844 fixture corpus
the original RE was done against had zero values in those bytes for
every block, so uint8 and uint16 LE were equivalent there — but on
real BE9558 Tran-drift events and BE18003 Histogram+Continuous events
the uint16 LE interpretation produced peaks up to 268 in/s and 35×
inflated PVS sums.
Cross-correlated against BW's per-interval ASCII export on:
- K558LKZU/LL1P/LL3K → 100% T/V/L/M peak match (1435 blocks each)
- T003LKZR/LL0O/LL1M → 100% T/V/L, 99.3% M (0.05 dB rounding only)
- N599LKZS/LL0L → 100% all channels
- N844 fixture corpus → 100% all channels (unchanged)
Annotations preserved on every record for future RE; the defensive
_MAX_PEAK_COUNT bound is no longer needed (uint8 maxes at 1.275 in/s,
well below any physical limit).
Synthetic regression test added using the verbatim K558LKZU.RE0H
interval-12 block.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
histogram_codec: drop _MAX_PEAK_COUNT 4096 → 2200. The old ceiling
let extension-byte blocks slip through at up to 20.48 in/s per
channel, producing 35× inflated PVS sums when first deployed to
prod. 2200 covers Normal-range full-scale (10 in/s = 2000 counts)
plus 10% headroom for quantization edge cases.
backfill_sidecars: also preserve the bw_report block alongside
review + extensions when regenerating sidecars. event_to_sidecar_dict
takes a BwAsciiReport dataclass not a dict, so for bw_report we
overlay the existing block after regen rather than passing as a kwarg.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Discovered while running the backfill on prod: certain histogram
blocks contain an undocumented extension byte format whose naive
uint16 LE interpretation yields physically impossible peak values
(150+ in/s when the device max is 10). Concrete example from
K558LKSG.3I0H block at body+7424:
bytes [6:10] = 05 79 69 00
current code: T_peak = uint16 LE = 0x7905 = 30981 → 154.9 in/s
reality: T_peak = byte[6] = 5 → 0.025 in/s (matches BW display)
The high byte (0x79 here) appears to be an extension field — possibly
"time of peak within interval" or a Histogram+Continuous sub-mode
marker. Observed across BE9558 and BE18003 units in prod data; never
appeared in the BE12844 fixture corpus the codec was originally
verified against.
Effect on prod: 26 out of 1433 blocks in this one event had inflated
peaks, plus dozens of similar events across the fleet → sum(PVS)
inflated from baseline 988 to 34501 (35x). Rolled back via the
pre-backfill snapshot before any UI exposure.
Defensive fix: bounds-check peak counts in `_decode_block`. Any
field exceeding `_MAX_PEAK_COUNT` (4096 = ~20 in/s, well past the
device's 10 in/s Normal-range FS) causes the block to be skipped
entirely. Other valid blocks in the same event still decode
correctly.
Trade-off: those skipped blocks lose their per-interval data
(peaks + frequencies). Acceptable until the extension format is
reverse-engineered — better than propagating bogus values into PVS
computations downstream.
The 24 existing tests all still pass — the fixtures used during the
original codec development don't exercise the extension-byte case.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The histogram-mode event body is now byte-exact decodable.
Companion to the waveform body codec — together they cover every
event file the watcher forwards. Cracked in one session via
cross-event correlation against BW's ASCII export.
The §7.6.2 spec in instantel_protocol_reference.md was structurally
correct (32-byte blocks) but the per-sample semantics were
under-documented. Cross-checking block 130 of N844L6Z8.ZR0H
against its TXT row revealed the layout perfectly:
slot[0] = 10 (constant marker)
slot[1] = T_peak_count (× 0.005 → in/s at Normal range)
slot[2] = T_halfperiod (freq_Hz = 512 / halfp)
slot[3] = V_peak_count
slot[4] = V_halfperiod
slot[5] = L_peak_count
slot[6] = L_halfperiod
slot[7] = MicL_peak_count (dB via waveform_codec.mic_count_to_db)
slot[8] = MicL_halfperiod
The `>100 Hz` sentinel is halfperiod ≤ 5 (since 512/5 = 100 Hz).
Mic dB uses the SAME formula as the waveform codec (sign × (81.94
+ 20·log10(|count|))) — they share the mic ADC calibration constant.
Block identification anchor: bytes [22:24] == 0x0000 AND
bytes [28:32] == 1e 0a 00 00. The tail signature is the most
reliable distinguisher from non-block content in the file.
Files:
minimateplus/histogram_codec.py (new) — decoder + public API
matching the waveform codec's shape:
walk_body(body) -> records
decode_histogram_body(body) -> {Tran, Vert, Long, MicL}
decode_histogram_body_full(body) -> [per-interval dicts]
half_period_to_hz, geo_count_to_ins helpers
minimateplus/event_file_io.py (modified) — read_blastware_file
now tries the waveform codec first, falls back to the histogram
codec on failure. Same output shape, same downstream pipeline.
tests/test_histogram_codec.py (new) — 24 regression locks against
the in-repo fixture corpus, byte-exact against BW ASCII export
for peaks (all 4 channels), frequencies (all 4 channels,
including >100 Hz sentinel handling), block framing, and
segment-ID accounting.
scripts/backfill_sidecars.py (modified) — the has_samples
short-circuit added in the histogram-pending era is now a
pure defensive guard. Histograms in prod will regen .h5 files
correctly on the next backfill run.
docs/histogram_codec_re_status.md (updated) — supersedes the
earlier "in progress" version with the verified format and
test-coverage summary. Notes a few non-essential fields still
open (4-byte block metadata, Geo PVS, Mic psi(L) — none of
which are needed for waveform reconstruction).
Total verified coverage: ~3,500 blocks across 5 fixtures, every
field of every block byte-exact against BW.
The watcher-forwarded histogram event corpus on prod (~10,000
events) will now produce correct .h5 sidecars on the next backfill
run. No additional changes needed to the backfill flow — the
existing tool_version-bump cascade picks them up automatically.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fixes a data-loss bug discovered while dry-running the backfill against
the prod store.
Symptom: every histogram event in the store has its body decoded by
read_blastware_file → codec returns None → samples = empty dict →
``ev.peak_values = _peaks_from_samples(empty)`` returns
``PeakValues(0, 0, 0, 0, 0)`` (NOT None). The backfill script's
existing "seed from DB row when peak_values is None" branch then
correctly *skips* the seeding, and the all-zeros PeakValues flows into
``db.insert_events()``'s UPSERT path, OVERWRITING the existing good DB
peak values for that event (which were populated from the paired BW
ASCII report at ingest).
Net effect: running the backfill on prod would have wiped the PPV /
mic / vector-sum columns for ~10,000 histogram events.
Fix: only compute peaks-from-samples when there are actually samples.
For events the codec couldn't decode (histogram-mode bodies, until
the §7.6.2 histogram codec is wired in), leave peak_values=None as
the "we don't know" signal. Downstream consumers:
- backfill_sidecars.py — its existing ``if ev.peak_values is None:``
branch (line 243) seeds from the DB row, preserving the real
BW-report peaks across the regen.
- WaveformStore.save_imported_bw — apply_report_to_event overlays
peaks from the paired BW ASCII report when one was uploaded.
Histogram imports without a paired report end up with NULL peaks
in the DB, which is correct (better than zeros — clearly says
"no peak data available" rather than "peaks are exactly zero").
Updated the existing synthetic-event round-trip test to expect
peak_values=None for the no-real-body case, which is the truth now.
The 7 fixture-corpus regression tests for real BW waveforms continue
to pass — those have decodable samples, so peak_values is still
populated from the codec output as before.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two coupled changes that close the rollout gap left by the
read_blastware_file codec wiring:
1. minimateplus/event_file_io.py: bump TOOL_VERSION from 0.16.1 to
0.20.0. This is the version stamp the backfill script reads from
each sidecar's source.tool_version field to detect "this sidecar
was written before the current decoder shipped, regenerate it."
Bumping past every value baked into existing prod sidecars flags
them all as stale on the next backfill run — which is exactly what
we want, since every pre-codec-wiring sidecar was written by the
retracted int16-LE decoder.
2. scripts/backfill_sidecars.py: when the sidecar is being
regenerated this iteration (sha mismatch, tool_version too old,
or --force), also regenerate the .h5. Previously the .h5 logic
only rewrote when --force was passed or the file was missing —
so a tool_version-driven sidecar regen left the broken .h5 in
place forever. Added a `sidecar_stale` boolean to track the
"we're rewriting the sidecar this iteration" state and wired it
into the h5 need-rewrite check.
Path coverage (verified by trace):
- sidecar missing → both regen
- --force → both regen
- sha mismatch → both regen
- tool_ver too old → both regen (THE post-codec-wiring case)
- everything OK → skip iteration entirely (h5 untouched)
Operator review state (review.false_trigger, reviewer, notes) and
the sidecar's extensions block are preserved across regen by the
existing read-existing-sidecar / pass-into-event_to_sidecar_dict
path — unchanged from prior behavior.
Deploy procedure (on prod):
1. Pull this change + the read_blastware_file codec wiring.
2. `python scripts/backfill_sidecars.py --dry-run` to preview.
Every sidecar with source.tool_version<0.20.0 will show as
"would (re)write".
3. Run for real (drop --dry-run). Expect every pre-fix event
to regen. Big stores may take a while.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`read_blastware_file()` was still calling `_decode_samples_4ch_int16_le`
(the retracted int16-LE-interleaved hypothesis) on the body bytes,
producing ±32K noise on every channel of every BW file read from disk.
This was the path watcher-forwarded events take into the system
(via the import endpoint → save_imported_bw → read_blastware_file,
since the watcher doesn't ship A5 frames), so every .h5 sidecar
generated for a forwarded event has been wrong since the feature
shipped.
The fix is mechanical: pass the body bytes straight to
`waveform_codec.decode_waveform_v2()` and run the result through
`decoded_to_adc_counts()` for the 16x geo scaling. The body already
starts with the codec's exact 7-byte preamble `00 02 00 [Tran[0] BE]
[Tran[1] BE]` — confirmed by `body[:3].hex()` across all 9 fixture
events. No body-slice adjustment needed.
If the codec returns None (truncated/malformed file, synthetic test
input with no real waveform), fall back to empty channels with a log
warning. The rest of the event (timestamp, waveform_key, project
strings, sensor_location, peaks-from-samples=0) is still recoverable.
Verified against the bundled fixture corpus:
V70 Tran/Vert/Long 3328/3328 sample-sets match .TXT ground truth
within the 0.005 in/s display quantum, every row
6S0/RG0/AB0/470 (5-8-26) 3328/2304/1280/1280 samples; Vert PPVs
match BW's own report within 0.02 in/s
JQ0 3328 samples, Vert PPV 3.384 vs BW 3.465
SP0/SS0/SV0 (loud events) 3072–3328 samples; known walker
tail-truncation 1–7 samples per channel, samples reached are
byte-exact
Existing `test_read_blastware_file_round_trip` (synthetic empty event)
continues to pass thanks to the None-fallback. Codec verify scripts
(`analysis/verify_quiet_bundle.py`, `analysis/verify_full_decode.py`)
re-run unchanged.
Added two regression-lock tests in tests/test_event_file_io.py:
- test_read_blastware_file_decodes_via_codec[6 fixtures]
— verifies sample count + Vert PPV per fixture
- test_read_blastware_file_v70_samples_match_txt_truth
— verifies every one of V70's 3328 sample-sets across Tran/Vert/Long
matches the .TXT ground truth row-by-row within 0.003 in/s
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When NN exceeds 0xFC, the codec extends to 12-bit NN by using the
low nibble of the TYPE byte as the high nibble of NN:
1X NN → nibble-delta block, NN = (X << 8) | NN_byte
2X NN → int8-delta block, same NN encoding
Walker and decode_waveform_v2 now handle both narrow (X=0) and wide
(X != 0) forms uniformly.
Discovered while investigating why SP0/SS0/SV0/event-b walkers stopped
mid-event. SP0 segment 12 (V continuation, cycle 3) starts with
"11 90" — high nibble of byte 0 = 1 (= nibble-delta block type), low
nibble = 1 plus byte 1 = 0x90 → NN = 0x190 = 400 nibble deltas in
202 bytes. Walker was rejecting "11" as a non-tag.
Sample count went from 47,364 to 72,972 verified byte-exact:
event-a: 9984 (full) was 9984 (full)
event-b: 6912 (full) was 738
event-c: 3840 (full) was 3840 (full)
event-d: 3840 (full) was 3840 (full)
JQ0: 9984 (full) was 9984 (full)
V70: 9984 (full) was 9984 (full)
SP0: 9984 (full) was 5122
SS0: 9222 (-7 tail) was 1758
SV0: 9222 (-7 tail) was 2114
7 of 9 fixtures now decode end-to-end across all 3 geo channels.
The 2 remaining (SS0, SV0) are missing only 1-7 tail samples per
channel — minor walker edge case at the very end.
74 tests pass (was 71).
Replaces the broken legacy int16 LE decoder in client.py with the
verified multi-channel codec. Three changes:
1. blastware_file.extract_body_bytes(a5_frames) — new helper that
factors out the body-reconstruction logic from write_blastware_file
so both writers (BW binary) and decoders (sample arrays) can use
the same canonical bytes.
2. waveform_codec.decode_a5_frames(a5_frames) — production entry point.
Returns the raw_samples dict consumers expect (Tran/Vert/Long as
int16 ADC counts; MicL as native ADC counts). Internally:
A5 frames → extract_body_bytes → decode_waveform_v2
→ decoded_to_adc_counts (geos ×16; mic pass-through)
3. waveform_codec.mic_count_to_db(count) — MicL ADC → dB(L) per BW's
display formula:
dB = sign(count) × (81.94 + 20 × log10(|count|)) for |count| ≥ 1
Verified against V70 fixture: count=813 → 140.14 dB (BW PSPL 140.1).
client.py:_decode_a5_waveform is reduced to a thin wrapper that calls
decode_a5_frames and populates event.raw_samples. Original implementation
preserved as _decode_a5_waveform_LEGACY (dead code; reference only).
Also fixed a tail-end bug in decode_waveform_v2 where trailer-section
"40 02" markers (containing ASCII serial bytes, NOT real segment headers)
were being mis-interpreted, producing 2 spurious samples per channel at
the end of each event. Added bytes [12:14] == "02 00" validation to
reject non-header markers.
7 new pytest tests cover the new helpers and dB conversion. Total:
71 passing (up from 64).
Known limitation (carried over from before): the walker still stops
mid-event on the loudest fixtures (SP0/SS0/SV0/event-b) at some
mid-segment edge cases not yet characterized. Every sample reached
is decoded correctly; the walker just doesn't reach all of them.
Loud events still yield 5,000–15,000 byte-exact samples each.
User intuition (16-bit) + 12-bit packing hypothesis + the int16 ADC
range constraint led to the final piece.
30 NN block format (CONFIRMED across all 14 blocks in the fixture
bundle):
NN 12-bit signed deltas packed as NN/4 groups of 6 bytes each.
Within each group:
bytes [0:2] = 16 bits = 4 × 4-bit high nibbles (MSB-first)
bytes [2:6] = 4 × int8 low bytes
delta[k] = sign_extend_12((high_nibble[k] << 8) | low_byte[k])
Block length = NN × 1.5 + 2 bytes (tag included). Earlier walker
used NN × 4 which is only correct in the TRAILER section.
Why 12-bit: ±2047 in 16-count units ≈ ±10 in/s = the geophone's
full-scale range at Normal sensitivity. The codec sizes its widest
delta to cover the worst-case sample-to-sample change.
Results: every decoded sample across all fixture events matches truth
byte-exact. ZERO divergences.
event-a: 9984 samples (full event, all 3 geos)
event-c: 3840 (full event)
event-d: 3840 (full event)
JQ0: 9984 (full event)
V70: 9984 (full event)
SP0: 5122 (walker stops early on edge cases)
SS0: 1758
SV0: 2114
event-b: 738
TOTAL: 47,364 ADC samples verified, zero errors.
Three full 3-sec events decode end-to-end across all three geo
channels. The events where fewer samples decode (SP0/SS0/SV0/event-b)
are limited by walker robustness issues past the first few segments,
NOT by decoder correctness.
64 tests pass (up from 55). Files: minimateplus/waveform_codec.py
(new 30 NN decode + corrected walker length), tests/test_waveform_codec.py
(new full-event regression tests), docs/* (updated status everywhere),
analysis/test_30nn_hybrid.py (new — the analysis script that confirmed
the format).
The segment-channel scoring analyzer (from scratch/next_experiment_skeleton.py)
ran and immediately confirmed the rotation hypothesis:
SP0 seg 0: best fit Vert 508/508 ✓
SP0 seg 1: best fit Long 508/508 ✓
SP0 seg 3: best fit Tran 508/508 ✓ (Tran continuation)
SP0 seg 5: best fit Long 508/508 ✓
SP0 seg 9: best fit Long 508/508 ✓
V70 seg 0: best fit Vert 508/508 ✓
V70 seg 1: best fit Long 508/508 ✓
Channels rotate Tran → Vert → Long → MicL per 40 02 segment header.
Also discovered the segment header has DOUBLE duty: bytes [14:18] anchor
the NEW segment's channel (2 samples as int16 BE in 16-count units), AND
bytes [0:4] extend the PREVIOUS channel by 2 more samples (2 deltas as
int16 BE). This is the same "2 anchors + delta stream" structure as the
body preamble for Tran.
decode_waveform_v2 now returns full per-channel sample dicts.
Byte-exact verified ranges:
V70: Tran 512, Vert 512, Long 512 (all first segments)
JQ0: Tran 512, Vert 258
SP0: Long 1536 (all 3 L segments)
Still open: the 30 NN block format (high-amplitude packed deltas) —
appears mid-segment when single-byte deltas can't carry the magnitude.
6 new tests bring the count to 46. All passing.
Three "truth layers" had drifted apart between commits. Fixed:
1. waveform_codec.py docstring rewritten from the 2026-05-08
"structural framing only" state to the 2026-05-11 "Tran segment 0
solved + segment-header partially decoded" state. Killed stale
"~80 sample-sets per segment" language (real segments are
flash-page-byte-sized, not sample-count-sized; observed first-segment
sizes are 42-510 samples depending on signal). Killed stale
"preamble is 7 or 9 bytes" language (always 7).
2. docs/instantel_protocol_reference.md §7.6.1: added a clear
"CURRENT STATUS" box at the top with a status table. Replaced the
stale "~80 sample-sets" line with the verified per-event segment
sizes. Merged two redundant segment-header field-table sections.
3. docs/waveform_codec_re_status.md (NEW): clean working-status doc.
Solved / not solved / hypothesis / next experiment / fixtures /
tests. The protocol reference remains the historical Rosetta
Stone; this new file is the current-truth working note that
shouldn't accumulate fossil layers.
4. CLAUDE.md §"Waveform body codec": prominent warning box at top —
"DO NOT TRUST decoded sample arrays yet." BW binary passthrough
is the only sample-bearing output to trust until the decoder
lands. Added a "Next experiment" subsection pointing the next
pass at the segment-channel scoring analyzer.
40 tests still pass.
User uploaded a Vert-heavy event (JQ0) and a Mic-heavy event (V70).
Those two were exactly what was needed to crack the next piece:
- 00 NN block = run-length-encoded zero deltas in the current channel.
Append NN copies of the current cumulative value (no change).
- find_data_start now recognizes 00 NN as a valid first tag (some events
begin with a leading 00 NN RLE block).
- decode_tran_initial now decodes the FULL segment 0 (not just the first
data block).
Results across 5 fixture events:
- M529LL1A.SP0 (loud-all-channels) : 510 / 510 ✓
- M529LL1L.JQ0 (Vert-heavy) : 510 / 510 ✓
- M529LL1L.V70 (Mic-heavy) : 510 / 510 ✓
- M529LL1A.SV0 (loud-from-start) : 58 / 58 ✓
- M529LL1A.SS0 (loud-from-start) : 42 / 502 (stops at first 30 04)
The 30 04 block (only seen in loud-from-start events) hasn't been
decoded yet — likely a channel-switch marker for the high-amplitude
regime.
Also discovered: segment header (40 02) payload bytes [0:2] = T_delta
at first sample of new segment, [6:8] = byte length to next segment.
Multi-segment Tran decoding still diverges after sample 512 because
the per-segment channel ordering after the header is unknown.
Tests: 40 pass (up from 36).
Files:
- minimateplus/waveform_codec.py: find_data_start fix, RLE handling,
full segment-0 decode in decode_tran_initial
- tests/test_waveform_codec.py: synthetic RLE test, full segment 0
tests for JQ0 and V70
- tests/fixtures/5-11-26/: M529LL1L.JQ0, M529LL1L.V70 + TXT exports
- docs/instantel_protocol_reference.md §7.6.1: RLE + segment-header docs
User uploaded 3 high-amplitude events (PPV 6-7 in/s — shook the geophone
hard) to decode-re/5-11-26/. These cracked the Tran codec:
- Preamble bytes [3:5] and [5:7] = Tran[0] and Tran[1] as int16 BE
in 16-count units (LSB = 0.005 in/s). Confirmed across all 7
fixtures.
- First data block carries Tran deltas from sample 2 onward:
* 10 NN block: NN/2 bytes of payload, each byte = two 4-bit signed
nibble deltas (high nibble first)
* 20 NN block: NN int8 signed deltas
Verified 22+42+46 = 110 Tran samples across SP0/SS0/SV0 with 0 errors
against BW's ASCII export.
Why the earlier 96-combination brute force failed: the quiet 5-8
events all had T[0] = T[1] ≈ 0 so the preamble's per-channel encoding
was undetectable. Loud events made the encoding obvious.
What's solved:
- minimateplus.waveform_codec.decode_tran_initial: returns first
N Tran samples in 16-count units for any body.
- Walker length formula for in-data 30 NN blocks (NN*2 instead of NN*4).
- Walker now handles bodies that start with 20 NN (in addition to 10 NN).
What's still open:
- Tran past the first data block (multi-block channel switching).
- Vert / Long / MicL channel encodings.
- Walker correctness past offset ~427 in event-b.
Tests: 36 pass. decode_waveform_v2 still returns None — the full
multi-channel decoder is not wired up. decode_tran_initial is the
new verified entry point.
Files: minimateplus/waveform_codec.py, tests/test_waveform_codec.py
(adds 5-11-26 fixtures + decode_tran_initial tests), and
docs/instantel_protocol_reference.md §7.6.1 (Tran codec spec).
Decoded the structural framing of the Blastware waveform body — the bytes
between the 21-byte STRT record and the 26-byte file footer. The body is
a sequence of tagged variable-length blocks, NOT raw int16 LE. Five tag
types (10/20/00/30/40 NN) and their lengths are now confirmed against the
4-event May 2026 fixture bundle. Body splits cleanly into ~16 segments
(for a 1280-sample event) separated by 40 02 segment headers carrying a
monotonically incrementing uint32 LE counter at bytes [8:12].
What's done:
- minimateplus/waveform_codec.py — block walker, segment splitter, segment
header parser. decode_waveform_v2 is a stub returning None until the
byte-to-sample mapping is solved; client.py is unchanged.
- tests/test_waveform_codec.py — 31 tests covering block detection, lengths,
contiguous-walk, segment splitting, segment-header parsing, and counter
monotonicity. All pass.
- tests/fixtures/decode-re-5-8-26/ — bundled fixtures (4 events, BW binary
+ Blastware ASCII export each).
- docs/instantel_protocol_reference.md §7.6.1 — replaced retraction box
with the verified structural decoding plus an explicit list of what's
still open.
What's still open: the per-byte mapping inside 10 NN / 20 NN blocks. 96
channel-permutation × nibble-order × sign-convention combinations were
brute-force tested; none match BW's ASCII export to within ±1 ADC count.
The codec is more elaborate than uniform 4-bit deltas — likely a hybrid
variable-bit-width scheme with segment-anchor resync points. Next
recommended step: capture an event with a known calibration tone to pin
down magnitude scaling.
Walker also bails out partway through event-b (open issue documented in
both the module and the protocol reference).
The BW ACH ingest path was inserting every event with
record_type="Waveform" regardless of the actual type because
read_blastware_file() had `ev.record_type = "Waveform"` hardcoded, and
the live watcher-forward path parses files from a tmp path (suffix
".bw") that doesn't carry the original extension.
V10.72+ MiniMate Plus firmware encodes the event type as the last
character of the AB0T extension scheme (H=Histogram, W=Waveform,
M=Manual, E=Event, C=Combo). This change:
1. Adds derive_record_type_from_filename() public helper in
minimateplus/event_file_io.py
2. Uses it inside read_blastware_file() so direct callers (the
--dry-run path of scripts/import_bw.py, tests, ad-hoc scripts)
get correct types automatically
3. Overrides ev.record_type in WaveformStore.save_imported_bw()
using the ORIGINAL filename (source_path.name) — required
because the parser sees only the tmp file
Old S338 firmware (3-char extensions ending in `0`) and any
unrecognized suffix fall back to "Waveform".
Existing DB rows ingested before this fix are stuck with
record_type="Waveform" — a one-off SQL backfill would fix them
retroactively if desired. Terra-view's event modal also derives
client-side from the filename, so the UI already shows the correct
type for old events even without the backfill.
Version bumped to 0.16.1 in pyproject.toml, event_file_io.py
TOOL_VERSION, sfm/server.py FastAPI version, and CHANGELOG.md.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The "BW ACH ingestion" release. Paired with series3-watcher v1.5.0,
every Blastware ACH event (binary + _ASCII.TXT report) lands in
SeismoDb with device-authoritative peaks, project metadata, sensor
self-check, and ZC/Time-of-Peak data — without depending on the
still-undecoded waveform body codec.
Bumps pyproject.toml + minimateplus/event_file_io.py TOOL_VERSION
to 0.16.0. README banner + CHANGELOG entry summarise the work
that landed across commits cdfe4ad..f83993a on this branch.
Two compounding bugs caused forwarded events to land in the DB with
broken-codec peak values (~10 in/s saturation on every channel) and
no project info, even when the watcher correctly paired a BW ASCII
report with the binary.
Bug 1: save_imported_bw built the sidecar JSON with the report's
authoritative peak / project values via event_to_sidecar_dict(
bw_report=...), but never overlaid those onto the in-memory Event
that flows to db.insert_events(). So the DB row got peak_values
from read_blastware_file()._peaks_from_samples() — which runs the
still-undecoded waveform body codec assuming raw int16 LE and
produces ±32K-shaped noise (= ±10 in/s at Normal range) regardless
of the actual signal. The sidecar JSON had the truth but the DB
columns (which the webapp queries for fast filter/sort) lied.
Bug 2: insert_events' IntegrityError handler only refreshed the
filename/filesize/a5_pickle/sidecar columns when a duplicate
(serial, timestamp) was seen. Peak values, project info,
sample_rate, record_type stayed locked in at whatever the FIRST
insert wrote. So even after Bug 1 was fixed, the historical
events in the DB (already inserted with broken-codec peaks) would
never get their values corrected, because a re-forward would just
hit IntegrityError and skip the field refresh.
Fix 1 (minimateplus/event_file_io.py + sfm/waveform_store.py):
- New apply_report_to_event(event, report) helper folds the BW
report's device-authoritative fields onto the Event in-place:
per-channel PPV, peak vector sum, mic PSPL→psi, project /
client / operator / sensor_location, sample_rate, record_time.
- save_imported_bw() calls the helper right after parsing the
report. The Event that flows to insert_events() now carries
correct values.
Fix 2 (sfm/database.py):
- insert_events()'s IntegrityError UPDATE now refreshes every
device-authoritative column from the new data: tran_ppv,
vert_ppv, long_ppv, peak_vector_sum, mic_ppv, project, client,
operator, sensor_location, sample_rate, record_type, plus
the existing filename/filesize/a5_pickle/sidecar fields.
- Preserves: id, waveform_key, session_id, created_at (immutable
/ FK fields), and false_trigger (operator review state).
End-to-end simulation verified:
- Step 1: import without report → DB has ±10 in/s peaks, no project
- Step 2: re-import WITH report → upsert path fires, DB now has
device-authoritative 0.005 in/s peaks + sensor_location
- Step 3: operator sets false_trigger=1, re-import again → flag
preserved, peaks remain correct
For the user's situation: deleting the watcher state file forces a
re-forward of all events. Each re-forward now pairs with its
_ASCII.TXT, applies the report onto the Event, and the upsert
refreshes the DB row. No DB nuke needed.
Full SFM suite: 62 passed, 44 skipped.
The four operator-supplied note fields in BW's Compliance Setup →
Notes tab (Project / Client / User Name / Seis Loc) have
USER-EDITABLE LABELS — an operator can rename them in BW's UI to
"Building:", "Site Address:", "Inspector:", or anything else, and
the ASCII export writes those literal labels verbatim. The
previous label-normalisation map approach (just added in commit
6a7e8c6) was fragile: it could only match label spellings we'd
enumerated in advance. An operator using "Site:" instead of
"Seis Loc:" would have their sensor location silently dropped.
What IS reliable: BW always writes the 4 user-notes lines
contiguously, in the same order, between the "Units :" line and
the "Geo Range :" line of the export. So parse them by POSITION:
position 1 → project
position 2 → client
position 3 → operator
position 4 → sensor_location
The original labels BW wrote are preserved in a new
`BwAsciiReport.user_note_labels` dict (canonical slot → literal
label string) so terra-view can render them as the operator named
them.
Removes the `_OPERATOR_LABEL_MAP` / `_normalise_label_for_lookup`
helpers and the elif-by-normalised-label branch in `parse_report`.
Replaces with a small state machine that flips on the "Units" line
and flips off on the "Geo Range" line.
Tests:
- Default-label fixtures (waveform + histogram) still populate
correctly, with operator's labels captured.
- Synthetic custom-labelled exports ("Building:" / "Site Address:" /
etc.) populate the right slots by position.
- Histogram-specific "Seis. Location:" works.
- Lines outside the Units→Geo Range range are ignored even if
they look like user notes (defensive against malformed exports).
- Partial blocks (fewer than 4 lines) leave later slots None.
- Extra lines beyond 4 are dropped (5th slot doesn't exist).
26 tests in test_bw_ascii_report.py (was 33; net drop reflects
parametrised label tests collapsed into 6 focused position tests).
Full SFM suite: 62 passed, 44 skipped.
Pairs with series3-watcher v1.5.0 which fixes the filename pairing
so the report reaches this parser in the first place.
Blastware writes the operator-supplied fields with different label
spellings across firmware versions and recording modes — most
notably "Seis. Location" on histogram exports vs "Seis Loc:" on
waveform exports. Previous parser only matched the latter, so
every histogram event silently lost its sensor_location field.
Replace the four hardcoded `key.rstrip(":") == "X"` branches with
a single `_OPERATOR_LABEL_MAP` dispatch table keyed by normalised
label (lowercase, trailing colon/period stripped, internal
whitespace collapsed). Adds these variants on day 1:
project: "Project:" / "Project"
client: "Client:" / "Client"
operator: "User Name:" / "User Name"
sensor_location: "Seis Loc:" / "Seis. Location" / "Seis Location"
/ "Sensor Location" / "Seis Loc"
To absorb future BW label drift, add a one-line dict entry — no
new elif branch.
14 new tests cover:
- Each label variant routes to the correct field (parametrised)
- Case-insensitive matching ("seis loc" / "SEIS LOC" / "SeIs LoC")
- Whitespace-collapse ("Seis Loc" with double-space)
- End-to-end parse of a real histogram fixture from
example-events/histogram/ — sensor_location ('Loc #1 - 2652 Hepner...')
populates correctly even though the file uses "Seis. Location"
Total bw_ascii_report tests: 19 → 33. Full SFM suite still green
(69 passed, 44 skipped — pre-existing skips for h5py-dep tests).
Pairs with series3-watcher v1.5.4 (which fixes the filename pairing
so histograms actually reach this parser in the first place).
Blastware's ACH writes a per-event ASCII report (.TXT) alongside each
event binary, containing the rich derived per-channel fields BW
computes (PPV, ZC Freq, Time of Peak, Peak Acceleration, Peak
Displacement, Peak Vector Sum + time, sensor self-check Pass/Fail,
monitor-log timestamps). None of this lives in the BW binary itself.
When the watcher daemon forwards both files to /db/import/blastware_file
in one multipart POST, we now:
- Pair binaries with their .TXT partners by filename match
- Parse the report into a structured BwAsciiReport
- Land the rich fields in a new top-level `bw_report` block of the
sidecar JSON
- Overlay the report's peaks/project_info/timestamp/sample_rate/
record_time/total_samples/pretrig_samples onto the canonical
sidecar fields (the report values are device-authoritative; the
BW-binary STRT-derived values had bugs like reading the 0x46
record-type marker as rectime)
This unblocks the monthly-summary review workflow — events become
sortable/filterable by peak, location, project, etc. — without
depending on the still-undecoded waveform body codec.
### Added
- **Layered event storage architecture.** Each event now lands as four
files in the per-serial waveform store, each with a clear role:
- `<filename>` — the Blastware-readable binary (BW file). Untouched.
- `<filename>.a5.pkl` — the raw 5A frames (regenerative source).
- `<filename>.h5` — clean per-channel waveform arrays in physical
units (in/s for geo, psi for mic) plus event metadata (HDF5 with
gzip compression). This is the canonical format for downstream
analysis tools.
- `<filename>.sfm.json` — the modern review/metadata sidecar (peaks,
project, source provenance, review state, extensions).
SQLite (`seismo_relay.db`) is the searchable index over all four.
- **Plot-ready waveform JSON (`sfm.plot.v1`).** The `/device/event/{idx}/waveform`
and `/db/events/{id}/waveform.json` endpoints now return samples in
physical units with explicit time-axis metadata, peak markers, and
per-channel unit hints — no more guessing the ADC-to-velocity scale
client-side. The webapp waveform viewer was rewritten to consume
this shape.
- **In-app waveform viewer accuracy fix.** The standalone SFM webapp
viewer was scaling geophone amplitudes by `geoAdcScale / 32767`
(≈ 6.206 / 32767), where `geoAdcScale = 6.206053` is the device's
*in/s per V* hardware constant — not the ADC-counts-to-velocity
factor. This silently scaled every plot ~38% too low for Normal-range
geophones (the correct full-scale is 10.0 in/s, or 1.25 in/s for
Sensitive). Conversion is now done server-side using the geo_range
from compliance config; the client just plots.
- New `sfm/event_hdf5.py` module: `write_event_hdf5()`,
`read_event_hdf5()`, plus a plot-JSON helper.
- Backfill script extended to also emit `.h5` for existing events.
### Dependencies
- Added `h5py>=3.10` and `numpy>=1.24` for the HDF5 storage layer.
- Added `python-multipart>=0.0.7` (required by FastAPI for the
`/db/import/blastware_file` endpoint introduced in this release).
- Added `waveform_key` and `event_timestamp` columns to `CachedEvent` and `CachedWaveform` for integrity verification.
- Implemented logic to flush the cache when a mismatch in (waveform_key, event_timestamp) is detected during event and waveform updates.
- Enhanced `set_events` and `set_waveform` methods to check for mismatches and trigger cache eviction as necessary.
- Introduced a new `LiveCache` class to manage in-memory caching of live device data, separating it from the server logic for better testability.
- Added tests to verify the correctness of cache invalidation logic, particularly for post-erase key reuse scenarios.
- Updated web application to include a "Force refresh" toggle, allowing users to bypass the cache and re-fetch data from the device.
test: add regression tests for v0.14.x SUB 5A protocol fixes
refactor(logging): change warning logs to debug for less verbosity in write_blastware_file
## v0.13.0 — 2026-05-01
### Fixed
- **SUB 5A bulk waveform stream — over-read bug for events ≥ 2 sec.**
`read_bulk_waveform_stream` was walking the chunk counter past the actual
end of the event, picking up post-event circular-buffer garbage that
corrupted reconstructed Blastware files for any waveform > ~1 sec. The
loop now extracts the event's `end_offset` from the STRT record at
`data[23:27]` of the probe response and stops the chunk walk when the next
counter would step past it. Verified against three BW MITM captures
(4-27-26 + 5-1-26): 2-sec event drops from 37 over-read chunks to 7
bounded chunks; 3-sec drops to 9; non-zero-start "event 2" drops to 9.
### Added
- `framing.bulk_waveform_term_v2(key4, end_offset, last_chunk_counter)` —
computes the corrected SUB 5A TERM frame's `(offset_word, params)` per the
formula confirmed across all 3 BW captures. Not yet wired into
`read_bulk_waveform_stream` (the legacy TERM is still used to preserve the
existing `blastware_file.write_blastware_file` frame-structure expectations);
available for the next iteration that switches to BW's 0x0200 chunk step.
- `framing.parse_strt_end_offset(a5_data)` — extracts the event-end pointer
from the STRT record in an A5 response payload.
Adds a new CapturingTransport wrapper in minimateplus.transport that mirrors
every TX/RX byte to two raw .bin files using the same on-wire format as
bridges/ach_mitm.py, so the resulting captures are byte-for-byte compatible
with the existing Blastware MITM captures and load directly in the Analyzer.
A new "Download" tab in seismo_lab.py lets the user connect to a device over
TCP or serial and run connect / list-keys / download-events while the wrapper
saves raw_bw_<ts>.bin (our TX) and raw_s3_<ts>.bin (device TX) into a
seismo_dl_<ts>[_<label>]/ session directory. On completion, the panel hands
both files to the Analyzer and switches tabs, mirroring the UX of the
existing Bridge capture flow.
Frame 0 is always the probe; frames 1+ are always data (waveform ADC
chunks, compliance config, compliance continuation). Gating on
classify_frame() at fi>0 produces false positives: ADC binary data
can coincidentally contain b"STRT\xff\xfe", causing frames 1 and 5
to be silently dropped from the body (confirmed from live capture on
event key=01110000). Remove all type-based filtering; include every
frame unconditionally with the standard index-based skip amounts.
Add classify_frame() which categorises each A5 frame by content:
terminator — page_key == 0x0000
probe_or_strt — contains b"STRT"
metadata — contains compliance-config ASCII markers
(Project:, Client:, Standard Recording Setup, …)
waveform — binary-heavy (< 20% printable ASCII), i.e. raw ADC data
unknown — fallback
Update write_blastware_file() body loop: frame 0 (probe) is still
always processed; frames 1+ are only included when classify_frame
returns "waveform". Metadata frames (compliance config block with
Project:/Client:/etc.) and any stray STRT-bearing frames are skipped
with a warning/debug log. Terminator frame handling is unchanged.
Adds temporary print() diagnostics so each frame's classification is
visible in the server log to aid debugging.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
For 2-second events at 1024 sps the "Project:" metadata frame appears
beyond chunk 32 (the old default cap), causing the safety limit to be
hit and ~34 KB of waveform data to be downloaded instead of stopping
at the metadata frame. Raising max_chunks to 128 ensures
stop_after_metadata=True can locate the metadata frame for record
times up to ~4 seconds.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>