253 Commits

Author SHA1 Message Date
serversdown 25386cab8b fix(backfill): regenerate IDFH .h5 + merge binary mic_pspl_psi onto bridge
Two gaps in backfill_thor_events.py that left old Thor events showing
stale charts after a v0.21.1 backfill pass:
1. IDFH events were skipped from .h5 regeneration (the "have decoded
   samples" gate was IDFW-only).  Histograms kept their pre-v0.21.1
   .h5 — written from raw_samples = None, which the renderer turned
   into a near-empty bar chart, or for older events the dB(L)-as-pseudo-
   psi mic scale that produced "107.7 psi" peaks (atomic-bomb level
   instead of footstep level).  Fix: synthesise the same 1-sample-per-
   interval array save_imported_idf v0.21.1 uses (peak ADC count per
   channel per interval) so the renderer's bar-chart grouping has
   data to work with.
2. The IDFW h5 path didn't merge binary_peaks.mic_pspl_psi onto the
   IdfEvent before to_minimateplus_event().  The live save_imported_idf
   does this merge — without it, IdfEvent.from_report() only sees the
   .txt's dB(L) value, the bridge falls back to the dBL→psi formula
   (instead of the binary-accurate 2.14e-6 psi/count value), and the
   h5 writer's per-count mic factor lands on a less-correct value.
   Fix: same merge the live ingest does (lift res.event.peaks.mic_pspl_psi
   onto idf_event.peaks before the bridge call).
Verified against UM6047_20250804190047.IDFH (250-interval prod
histogram): 250 intervals decode, mic_pspl_psi = 2.78e-5 (was being
treated as dB(L)=107.7 in the old h5).
Operator: re-run after deploy.  `docker compose exec sfm python
scripts/backfill_thor_events.py` is idempotent — the existing version
check still skips events already at the new TOOL_VERSION, and review
state + captured_at are preserved on the second pass.
2026-06-01 20:02:54 +00:00
serversdown 6cb619ecc4 version bump - 0.21.1 2026-06-01 19:33:44 +00:00
serversdown 1ed86244d0 fix(thor-events): add parallel field for mic psi. Mic now shows in both dB(L) and psi (psi is used for charts). 2026-06-01 18:27:24 +00:00
serversdown b2c565f217 fix(idf_waveforms): _find_waveform_body_offset() — scans every 00 02 00 magic past offset 0x0E00, runs decode_waveform_v2 on each candidate, picks the one that returns the most samples. Validated on 483 prod IDFW files: 0 preamble-only events (was ~50%), 355/483 fully decode, 126/483 partial (BW codec walker-stops-early on loud events — known issue).
IDFH now synthesises a 1-sample-per-interval array from the binary intervals and writes an .h5 so the existing renderer works unchanged. Each "sample" is the per-interval peak ADC count → h5_value = count × geo_fs/32768 yields the right bar height.
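
Illustrative sketch of the candidate scan described above: try every magic hit from 0x0E00 onward and keep the offset that decodes the most samples. This is not the shipped _find_waveform_body_offset(); the name find_body_offset, its signature, and the decoder-as-callable shape are assumptions made for the example, since the real decode_waveform_v2 call shape is not shown in this log.

```python
from typing import Callable, Optional, Sequence

MAGIC = b"\x00\x02\x00"   # marker bytes named in the commit message
SEARCH_START = 0x0E00     # scan from this offset onward


def find_body_offset(
    data: bytes,
    decode: Callable[[bytes, int], Sequence[int]],
) -> Optional[int]:
    """Return the magic offset whose decode yields the most samples."""
    best_offset, best_count = None, 0
    pos = data.find(MAGIC, SEARCH_START)
    while pos != -1:
        try:
            count = len(decode(data, pos))
        except Exception:
            count = 0  # a candidate that fails to decode never wins
        if count > best_count:
            best_offset, best_count = pos, count
        pos = data.find(MAGIC, pos + 1)
    return best_offset
```

Returning None when no candidate decodes any samples lets the caller tell "no body found" apart from a body at offset 0.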
2026-05-31 20:51:09 +00:00
serversdown 43f440812a scripts: add backfill_thor_events.py
Refreshes the bw_report sidecar block + .h5 waveform files for Thor
events ingested before the v0.21.0 adapter wiring + the bee1185 codec
fix.  Those events landed with extensions.idf_report only (no
bw_report, no .h5 for IDFW) — symptom on the UI side: the modal chart
404'd on /waveform.json and the PDF rendered from DB-only fields
without sensor self-check, full per-channel breakdown, or mic dB(L).

Walks <store>/<serial>/<filename>:
  - Reads the existing sidecar (preserves review state + captured_at)
  - Re-runs read_idf_file() on the binary bytes (passes data=
    kwarg so codec doesn't try the broken bare-path Path.read_bytes)
  - Reads extensions.idf_report from the existing sidecar
  - Runs build_bw_report_from_idf adapter
  - Writes refreshed sidecar with bw_report + bumped tool_version,
    preserving review block and original captured_at
  - For IDFW: regenerates .h5 by bridging IdfEvent.from_report ->
    to_minimateplus_event -> write_event_hdf5 (mirrors save_imported_idf
    steps 4-7)
  - IDFH events skip .h5 (histograms have no per-sample data)

Skips events already at current TOOL_VERSION with bw_report present.
--force overrides.  --skip-hdf5 limits to sidecar-only refresh.
--dry-run for preview.

Validated against the prod-snap waveform store: 3,815 Thor sidecars
refreshed cleanly with 0 errors, 462 IDFW .h5 files written, 2 skipped
(binaries with no sidecar — backfill doesn't conjure events from
nothing).  Verified one originally-broken IDFW event now serves
waveform.json (200, 168KB) and a fully populated PDF (119KB vs the
previous 56KB sparse output).

Operator workflow on prod:
  docker exec <sfm-container> python3 /app/scripts/backfill_thor_events.py --dry-run
  # Inspect counts, then for real:
  docker exec <sfm-container> python3 /app/scripts/backfill_thor_events.py

Idempotent — re-running it is a no-op once everything's at the current
TOOL_VERSION.
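
A minimal sketch of the skip rule that makes the re-run a no-op, assuming the sidecar is a dict with top-level "tool_version" and "bw_report" keys. Those key names and the function name are hypothetical; the real sidecar layout is not spelled out in this message.

```python
def should_skip(sidecar: dict, current_version: str, force: bool = False) -> bool:
    """True when the event is already refreshed and --force was not given."""
    if force:
        return False  # --force overrides the version check
    up_to_date = sidecar.get("tool_version") == current_version
    has_report = sidecar.get("bw_report") is not None
    return up_to_date and has_report
```

Requiring both conditions means a sidecar stamped with the current version but missing bw_report is still refreshed.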

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-30 04:37:43 +00:00
serversdown 23e83908c2 report_pdf: fix PVS overlapping stats table, drop NA caption
Three related fixes to the per-channel stats block:

1. Pin the stats table's position via an explicit bbox= on
   ax.table() so the bottom edge is at a known axes-fraction Y.
   The previous loc="upper left" + tbl.scale(1, 1.4) combo let
   matplotlib choose row heights based on text size, which made the
   table extend further below the axes than the hard-coded PVS line
   at y=-0.08 expected.  Result was the "Peak Vector Sum X in/s"
   string landing horizontally inside the Peak Displacement row.

   With bbox=[0, 1-N*0.12, 0.80, N*0.12] the table is pinned to a
   precise rectangle (12% axes-fraction per row × N rows tall).
   _draw_stats_table now stashes the bottom Y on the axes for the
   PVS helper to reference, so the geometry stays in sync.

2. Center PVS horizontally (ha="center" at x=0.5 instead of ha="left"
   at x=0).  The previous left-edge alignment put PVS at the same
   X as the label column, which read as "off-center" once the rest
   of the stats data was column-aligned further right.

3. Drop the "NA: Not Applicable" caption.  It existed to explain
   "—" placeholder cells, but "—" is universally understood and the
   caption was always visually squished against the PVS line below.
   Less cruft on the page; one fewer position to manage.

Verified against a real BE12599 histogram event (5 data rows) and
a real UM12947 IDFW waveform event (6 data rows) — both layouts
clear the table cleanly with no overlap.
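
For reference, the pinned-rectangle arithmetic from item 1 can be sketched as a pure function. The helper name stats_table_bbox is hypothetical; the 0.12 and 0.80 constants come from the bbox expression quoted above. The returned list is in the [left, bottom, width, height] form that matplotlib's ax.table(bbox=...) accepts.

```python
ROW_HEIGHT = 0.12    # axes-fraction per table row (from the commit message)
TABLE_WIDTH = 0.80   # axes-fraction width of the stats table


def stats_table_bbox(n_rows: int) -> list:
    """Return [left, bottom, width, height] for a table pinned to the top-left."""
    height = n_rows * ROW_HEIGHT
    bottom = 1.0 - height  # known bottom edge that the PVS line can sit below
    return [0.0, bottom, TABLE_WIDTH, height]
```

With 5 rows the bottom edge lands at about 0.40 axes-fraction, and with 6 rows at about 0.28, so anything placed relative to the second element follows the table as rows are added.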

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-29 22:17:43 +00:00
serversdown bee118506b fix(idf): decode from in-memory bytes during ingest
Bug shipped in v0.21.0: save_imported_idf called read_idf_file()
with `source_path` (a bare filename like "UM12947_….IDFW") BEFORE
writing the binary to disk.  The codec did Path(path).read_bytes()
which resolved relative to /app and hit FileNotFoundError.  The
error was caught + logged as a warning, and ingest fell back to
.txt-only — events still landed in the DB but lost the bw_report
block + .h5 waveform that the codec was supposed to produce.

Observed during a full re-forward from thor-watcher on 2026-05-29:
every Thor event logged "binary codec failed for X: [Errno 2] No
such file or directory" and got binary_decoded=False.

Fix:
- read_idf_file() gains a `data: Optional[bytes]` kwarg.  When
  supplied, skips the disk read and decodes the provided bytes
  directly.  `path` stays required (used for filename in error
  messages + .IDFH vs .IDFW suffix detection); only the read is
  conditional.  Backward compatible — existing positional callers
  (CLI scripts, tests) continue to work unchanged.
- save_imported_idf passes `data=idf_bytes` since the bytes are
  already in memory from the multipart upload.  Filesystem write
  still happens at step 5 of the existing flow; codec just no
  longer depends on it.

Verified end-to-end against UM11719_20231219162723.IDFW from the
example-data corpus: ingest endpoint returns inserted=1, log line
shows binary_decoded=True + h5=...IDFW.h5, no warnings.

Re-forward existing Thor events from thor-watcher after deploy to
backfill the bw_report block — UPSERT preserves review state.
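
A sketch of the "bytes if supplied, disk otherwise" pattern described in the Fix section. load_idf_bytes is a hypothetical stand-in, not the real read_idf_file(); only the data kwarg and the suffix detection from the path are taken from this message.

```python
from pathlib import Path
from typing import Optional, Tuple


def load_idf_bytes(path: str, data: Optional[bytes] = None) -> Tuple[str, bytes]:
    """Return (suffix, raw bytes); read from disk only when data is None."""
    suffix = Path(path).suffix.upper()
    if suffix not in (".IDFH", ".IDFW"):
        raise ValueError(f"not an IDF file: {path}")
    # path is still required for the suffix, even when the read is skipped
    raw = data if data is not None else Path(path).read_bytes()
    return suffix, raw
```

Checking `data is not None` rather than truthiness keeps an empty upload from silently falling back to a disk read.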

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-29 20:09:54 +00:00
serversdown defd17d9c2 sfm_webapp: harmonize "Received by server at" → "Time received"
Matches Terra-View's event-modal relabel from the same iteration.
Wording was already clearer here than in Terra-View's "Captured at",
but using identical text across both surfaces means operators see the
same label whether they're in the native modal or the standalone
webapp.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-29 19:51:58 +00:00
serversdown e42956a20b release: v0.21.0 — Thor / Series IV codec + Thor→BW adapter
Documents two commits that landed on dev since v0.20.0:

  9b71ead  series 4 codec work, initial decode success
           micromate/idf_file.read_idf_file() decodes both IDFW
           (waveform; 87-99% sample fidelity reusing
           decode_waveform_v2 at offset 0x0f1f) and IDFH (histogram;
           dedicated segment-based decoder, all 859 corpus files
           decode, 181,071 intervals total).

  9fd52dd  feat: add thor report generation, pdf generation
           micromate/idf_to_bw_report.py adapter projects parsed
           Thor data into the bw_report sidecar shape so Thor
           events flow through sfm/report_pdf.py without a
           separate renderer.  Wired into save_imported_idf.

Net effect: a Thor event ingested via /db/import/idf_file now
lands with the same fidelity as a BW event, gets a per-event PDF
on demand, and renders in Terra-View's modal chart using the same
plotting code as a BW event.

Roadmap items closed:
- Binary .IDFW / .IDFH codec (was pending)
- Series IV (Thor IDF) binary codec reverse-engineering

Companion: Terra-View v0.13.0 ships in parallel and closes Phase 1
of the SFM integration.  No API changes in seismo-relay for that
piece — Terra-View just consumes existing endpoints better.

Bumps:
- pyproject.toml 0.20.0 → 0.21.0
- minimateplus.event_file_io.TOOL_VERSION 0.20.0 → 0.21.0
  (any subsequent backfill_sidecars.py --force will re-stamp
  existing sidecars; expected + harmless)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-29 19:25:44 +00:00
serversdown 9fd52ddabb feat: add thor report generation, pdf generation. 2026-05-29 19:03:06 +00:00
serversdown 9b71ead44b series 4 codec work, initial decode success 2026-05-29 06:33:13 +00:00
serversdown 1bccc44b88 release: v0.20.0 — PDF + parser polish
Closes out the Event-Report PDF iteration started in v0.17.x and ships
the parser fixes the real-world events were tripping over.

Today's additions on top of the pre-v0.20 unreleased body:

- Server-wide display TZ via the TZ env var (default America/New_York
  on prod).  Affects server logs, the PDF report's "Created" footer,
  matplotlib datetime axes.  DB columns stay UTC.  Dockerfile now
  installs tzdata.
- ZC Freq "above-range" handling — parser stores 100.0 +
  zc_freq_above_range flag for BW's ">100 Hz" marker.  Renders as
  >100 in the PDF stats table, both modals (inline on webapp Peaks,
  new column on event-browser table).
- scripts/backfill_sidecars.py --reparse-txt — re-runs the current
  parser against the preserved _ASCII.TXT and overwrites the
  sidecar's bw_report block.  Lets parser fixes reach old events
  without re-forwarding.  Validated end-to-end against ~10k prod
  events.

Fixes shipped today:
- histogram_interval_size_s missing from ReportData → every
  histogram PDF render 500'd.
- Histogram PDF geo channels now share a nice-quantized y-axis
  (0.005-LSB-aware 1-2-5 step sequence) instead of auto-scaling
  per channel + inventing sub-LSB "0.003 in/s/div" footer labels.

Roadmap delta: closes the BW ASCII parser "PPV-miss on some TXT
formats", "histogram-specific structural fields", and ">100 Hz value
parsing" items.  Adds a new entry for the byte[5]==0 histogram body
sub-format observed on S353 events.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 21:17:53 +00:00
serversdown a3cc44d30a feat(backfill): --reparse-txt flag to refresh bw_report from preserved .TXT
The existing backfill_sidecars.py PRESERVES the bw_report block across
regenerations — it's treated as the source of truth from the original
ingest pass (the .TXT isn't reachable from the script's normal data
path, so it can't be re-derived).

That means parser-side fixes (like the 2026-05-28 ">100 Hz" ZC Freq
addition) won't reach old events even with --force.  The new
--reparse-txt flag fixes that: when the sidecar's source.txt_filename
points at a preserved <serial>/<filename>_ASCII.TXT, the script re-runs
the current parser against it and overwrites the bw_report block.

Implies sidecar regeneration on every event (bypasses the
sha-up-to-date / version-up-to-date skip), so that the .h5 cascade-
regenerates alongside.  No-op for events without a preserved .TXT
(legacy ingests pre-2026-05-27).  Idempotent — re-running it produces
the same sidecar bytes when the parser hasn't changed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 18:56:23 +00:00
serversdown 6a73523e4d ui: surface per-channel ZC Freq (and ">100") in event modals
The PDF report shows per-channel ZC Freq alongside PPV in the stats
block, but neither modal exposed it.  Now that the sidecar projection
carries zc_freq_hz + zc_freq_above_range, plumb them through:

- sfm_webapp.html: inline suffix on existing Peaks cells, e.g.
  "Tran  0.04500 in/s · >100 Hz".  Empty suffix when no ZC is
  available (legacy events without a preserved .TXT).

- event_browser.html: new ZC Freq column on the per-channel stats
  table.  Required adding a parallel sidecar fetch in loadEvent()
  (waveform.json alone doesn't carry bw_report).  Fetch failure is
  non-fatal — falls back to "—" in the new column.

Above-range ZC peaks (BW ">100 Hz") render with a literal ">"
prefix mirroring the PDF, so operators don't have to generate the
PDF to see when a channel hit the zero-crossing ceiling.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 18:47:37 +00:00
serversdown 780b45a371 feat: render ">100" for above-range ZC Freq instead of "—"
BW writes ">100 Hz" for ZC Freq when the zero-crossing algorithm sees a
peak too fast to count — the device's reporting ceiling is 100 Hz on
V10.72.  Our parser fell back to None via _parse_number (which requires
a leading digit), so the PDF rendered "—" where BW shows ">100".

Mirrors the OORANGE/saturated pattern already used for PPV and PSPL:
parser stores the threshold (100.0) on zc_freq_hz + sets a new
zc_freq_above_range flag.  Projection carries the flag through to the
sidecar; PDF renderer prepends ">" when set.

Affects both per-channel stats tables (waveform + histogram variants)
and the mic block's ZC Freq row.

Verified on the real T190LD5Q.LK0W fixture: Tran zc_freq_hz=100.0
above_range=True; Vert/Long (normal values) above_range=False; "N/A"
still produces zc_freq_hz=None which renders as "—" (unchanged).
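
The value-plus-flag pattern can be sketched as below. parse_zc_freq is a hypothetical helper written for illustration, not the project's parser; it only shows how a leading ">" can be carried as a flag while the numeric threshold is still stored.

```python
import re
from typing import Optional, Tuple


def parse_zc_freq(text: str) -> Tuple[Optional[float], bool]:
    """Return (zc_freq_hz, above_range) for a BW ZC Freq field."""
    s = text.strip()
    above = s.startswith(">")
    if above:
        s = s[1:]  # keep the number after the marker
    m = re.match(r"\s*(\d+(?:\.\d+)?)", s)
    if m is None:
        return None, False  # e.g. "N/A": no value, no flag
    return float(m.group(1)), above
```

Storing the threshold instead of None keeps the field sortable and filterable, and the flag tells the renderer to prepend ">".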

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 18:38:49 +00:00
serversdown f6abe3caa0 fix(report_pdf): histogram geo channels share nice-quantized y-axis
Two related visual bugs on histogram PDFs:

1. Per-channel auto-scale meant Tran/Vert/Long had different y-axes
   (e.g. 0-0.015, 0-0.025, 0-0.020) — bars looked taller on the
   channel that happened to be quietest.  Not directly comparable.

2. Footer "Amplitude Geo: X in/s/div" was just amax/5 of the FIRST
   geo channel with data, with no LSB quantization — producing
   nonsense like 0.003 in/s/div when the geophone LSB is 0.005.

Fix: compute a single shared geo y-axis range from max(Tran,Vert,Long),
quantize the per-division step to BW's 1-2-5 sequence rounded to the
0.005 LSB (0.005, 0.01, 0.025, 0.05, 0.1, 0.25, ...), apply the same
ylim + ticks to all three geo subplots, and use that same step for the
footer label.  MicL stays on its own auto-scale (different units).

Verified across edge cases including the reported event
(geo max 0.025 → 0.005/div, top 0.025), small PVS events, and large
blast amplitudes.
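
A rough sketch of the shared-axis step selection, under two assumptions that this message does not confirm: the axis has 5 divisions, and the step list continues past 0.25 in the same pattern. The function name is hypothetical.

```python
# First six steps are quoted in the commit message; the rest are assumed.
STEPS = [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0]


def shared_geo_axis(channel_peaks, divisions=5):
    """Return (step_per_div, axis_top) shared by all geo channels."""
    amax = max(channel_peaks)
    for step in STEPS:
        # small tolerance so a peak exactly at step * divisions still fits
        if step * divisions >= amax - 1e-9:
            return step, step * divisions
    return STEPS[-1], STEPS[-1] * divisions
```

For peaks of 0.015, 0.025 and 0.020 in/s this picks 0.005 per division with the top at 0.025, matching the reported event above.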

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 18:22:20 +00:00
serversdown ad2702d4bf fix(report_pdf): add missing histogram_interval_size_s field
The histogram-interval-times derivation block at line 314 references
rd.histogram_interval_size_s, but the field wasn't declared on the
ReportData dataclass — only the string form histogram_interval_size
was.  Result: every PDF render of a histogram event raised
AttributeError → 500 from /db/events/{id}/report.pdf.

Cause: when the histogram aggregation block was inlined into
gather_report_data, the seconds-numeric counterpart that the
projection already carries (bw_report.histogram.interval_size_s) was
never wired into the dataclass.  Waveform PDFs weren't affected
because the offending line is gated on is_histogram.

Fix: add the field, read it from the projection alongside the other
histogram keys.  No-op for waveform events (the field stays None and
the gate skips it).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 18:07:41 +00:00
serversdown 86325b9bab docs: roadmap entry for a SECOND undecoded histogram sub-format (S353)
Observed in fresh ingest logs on 2026-05-28: BE17353 events
(S353L4H2.FZ0H, S353L4H2.P00H, etc.) cause "body codec failed to
decode" warnings.  Different from the byte[5]!=0 case already tracked
(T190 / O121) — these have byte[5]==0x00 with what looks like a
valid block header, but the walker finds zero data blocks anyway.

Operational impact identical to the existing case: ingestion
succeeds, DB peaks come from bw_report overlay, only the chart is
empty.  No data loss.

Pinning so it doesn't get lost — needs a hex dump of one body to
work out what's different about these.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 05:42:18 +00:00
serversdown 6381dcb312 tz: server-wide display timezone via TZ env var (default EST/EDT)
User-reported issue: server logs were timestamped in UTC ("05:36:20"
when local was ~01:36 EDT), and the PDF report's "Created" footer
similarly showed raw UTC.  Inconsistent with the modal which already
converts to browser local via toLocaleString.

Solution: standard Linux TZ env var.  Set once in the container, and:
  - Python's datetime.now() uses local
  - Logging module's timestamps use local
  - matplotlib renderers + report_pdf formatters use local
  - astimezone() conversions resolve to the configured TZ

DB columns stay UTC (created_at uses SQLite's strftime('%Y-...Z', 'now')
which is always UTC, regardless of TZ env var — proper "store UTC,
display local" pattern).

Changes:
  - Dockerfile: install tzdata (python:3.11-slim omits the timezone
    database), set default TZ=America/New_York
  - sfm/report_pdf.py: _fmt_iso_to_bw and _split_iso_to_date_time now
    convert UTC inputs (Z-suffixed) to local via astimezone(); naïve
    inputs (BW recorded-at, already unit-local) returned as-is.
    New _to_display_local helper centralizes the logic.
  - "Created" line in the PDF page footer now uses the converted
    timestamp.

Override per-deployment via the TZ env var in docker-compose
(separate commit on terra-view side).
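
A sketch of the "convert Z-suffixed, pass naive through" rule. to_display_local here is a simplified stand-in for the helper named above, with an explicit tz parameter added only so the example is deterministic; passing tz=None falls back to the process-local zone, which is what the TZ env var controls.

```python
from datetime import datetime, timedelta, timezone, tzinfo
from typing import Optional

EDT = timezone(timedelta(hours=-4))  # fixed offset used only for this demo


def to_display_local(iso: str, tz: Optional[tzinfo] = None) -> datetime:
    """Convert UTC (Z-suffixed) input to display time; leave naive input alone."""
    if iso.endswith("Z"):
        aware = datetime.fromisoformat(iso[:-1] + "+00:00")
        return aware.astimezone(tz)  # tz=None means system local time
    return datetime.fromisoformat(iso)  # naive: already unit-local
```

For example, to_display_local("2026-05-27T21:59:57.213043Z", EDT) gives 17:59:57 local, while a naive "2026-05-27T06:00:13" comes back unchanged.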

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 05:41:10 +00:00
serversdown 53c05d93e2 delete: also clean up preserved _ASCII.TXT file
_cleanup_event_files() removes the on-disk artifacts when an event is
hard-deleted (binary, a5_pickle, sidecar, h5).  Today's .TXT
preservation feature added a new on-disk file (_ASCII.TXT next to the
binary) but the cleanup didn't know about it — so any event deleted
via /db/events/{id} (single) or /db/events/delete_bulk (or the
Terra-View "SFM Event DB Manager" UI which proxies through to those
endpoints) was leaving orphan .TXT files in the store.

Added "txt" to the cleanup list using the new
WaveformStore.txt_path_for().  Safe for old events without a .TXT —
the exists() check skips the unlink.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 05:31:08 +00:00
serversdown a5888e1b5c report_pdf: PDF histogram aggregation + fix footer/x-axis overlap
Two issues spotted on a histogram event PDF:

1. Footer scale ("Time — /div  Amplitude Geo: X in/s/div  Mic: Y
   psi(L)/div") was overlapping horizontally with the x-axis tick
   labels (0, 20, 40, 60...).  Both rendered on the same Y row.
   Fix: bumped gridspec bottom margin from 0.06 → 0.12, moved the
   footer text from y=0.045 → y=0.030 (below the tick labels), moved
   the page-bottom Created/Event line from y=0.015 → y=0.005.
   Trigger legend on waveforms moved 0.030 → 0.018.  Everything
   stacks cleanly now without collision.

2. PDF was showing the raw codec output (~150+ bars per histogram)
   instead of BW's per-interval aggregation.  Why: the aggregation
   I'd added to /db/events/{id}/waveform.json wasn't replicated in
   the PDF gather path.  Now: gather_report_data does the same
   max-per-group aggregation when bw_report.histogram.n_intervals is
   populated, AND derives per-interval HH:MM:SS labels from the
   start time + interval_size_s.  Result: histogram PDFs now match
   BW's display (one bar per BW interval, x-axis labeled with actual
   times) — same fix as the modal chart, applied to the PDF.

For events ingested BEFORE the parser extension (no histogram block
in their sidecar), aggregation is a no-op — they still render with
per-block bars + interval-index x-axis (but the overlap fix applies
to them too).  Re-forwarding repopulates the histogram block.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 04:33:53 +00:00
serversdown b9f8bbb220 viewers: enforce minimum Y-range on histogram channels
Quiet histogram events were filling the chart panel even though the
peak was tiny (0.005 in/s rendered as 90% of chart height because
Chart.js auto-scaled to peak * 1.1).  Made everything look uniformly
loud regardless of actual amplitude.

BW's solution: a near-fixed scale per channel ("Geo: 0.002 in/s/div"
from the footer).  Quiet events render small, loud events render
proportionally tall.

Match the intent without copying BW's "no Y-axis labels at all"
convention.  For histogram channels:

  Geo (in/s):       min Y range 0.05 in/s
  Mic in psi:       min Y range 0.001 psi
  Mic in dBL:       unchanged (the 60 dBL floor + peak+5 top already
                    gives quiet events a sensible baseline)

So a 0.005 in/s geo event renders as ~10% of chart height; a 0.05
event fills it; a 5.0 event still fills it (max(peak*1.1, 0.05) ==
peak*1.1 for any peak above ~0.0455).

Waveform charts unchanged — they should zoom for shape detail.
Applied to both the modal in sfm_webapp.html and the standalone
/events page in event_browser.html.
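
The floor rule reduces to one max() call. The sketch below is in Python for illustration only (the shipped code lives in the two HTML viewers); the dictionary keys and function name are hypothetical.

```python
# Minimum Y ranges from the table above, keyed by hypothetical channel kinds.
HISTOGRAM_MIN_Y = {"geo_ips": 0.05, "mic_psi": 0.001}


def histogram_y_max(peak: float, kind: str) -> float:
    """Top of the Y axis: 10% headroom over the peak, but never below the floor."""
    return max(peak * 1.1, HISTOGRAM_MIN_Y[kind])
```

A 0.005 in/s peak gets a 0.05 axis top (so the bar is about 10% tall), while a 5.0 in/s peak gets 5.5 and is unaffected by the floor.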

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 04:23:01 +00:00
serversdown b59f886cb7 docs: roadmap entry for sensor-check waveform extraction
BW's Event Report PDFs include a per-channel sensor-check response
waveform on the right side of the bottom plot (damped sinusoid for
geo channels, sawtooth-at-test-freq for mic).  Looks like real
per-sample data extracted from the binary, not synthesized.

Our parser captures the test results (freq, ratio, amplitude,
pass/fail) but not the waveform samples — so the report shows text
only for sensor check.  Pinning a roadmap entry to investigate the
binary for the sample data (path a) or fall back to synthesized
visualization (path b).

Current text-only display is operationally sufficient.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 04:17:50 +00:00
serversdown 87aec3f4d1 viewers: smoother mic dBL chart + restore binary/TXT download links
Two issues spotted in the modal:

1. Mic dBL chart looked spikey/discontinuous — isolated bars at 80-95
   with gaps in between.  Cause: _psiToDbl() returns null for zero or
   negative samples, and most mic samples on a quiet event sit at the
   digitization noise floor where they're effectively zero.  Result:
   the chart only renders the moments when instantaneous SPL exceeded
   the Y-axis bottom — looks like a sound trigger gate.

   Fix: new _psiToDblForChart() rectifies the AC waveform (abs), then
   converts to dBL, then floors at MIC_DBL_FLOOR=60 dBL.  Chart now
   has a continuous 60 dBL baseline with peaks above it — matches how
   acoustic engineers expect SPL-vs-time.  Y-axis bottom pinned to
   MIC_DBL_FLOOR, top to peak + 5 dB headroom.  Peak label still uses
   the unrectified _psiToDbl so the displayed peak value is exact.

2. Filename in Source/Files block was unlinked.  Endpoint exists
   (/db/events/{id}/blastware_file) — just wasn't wired to the modal.
   Made it a clickable download link.  Same treatment for the
   preserved .TXT — added "(download .TXT)" link next to source kind
   when source.txt_filename is populated (events ingested after the
   .TXT preservation feature landed; older events show no link).

Applied to both the inline modal in sfm_webapp.html and the
standalone /events page in event_browser.html.
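
The rectify, convert, floor sequence from item 1 can be sketched as follows. This is a Python illustration, not the viewers' JavaScript. The reference pressure is an assumption: it uses the standard 20 µPa acoustic reference expressed in psi, and the project's _psiToDbl may use a different constant.

```python
import math

PSI_REF = 20e-6 / 6894.757   # 20 micropascals in psi (assumed reference)
MIC_DBL_FLOOR = 60.0         # chart baseline named in the commit message


def psi_to_dbl_for_chart(sample_psi: float) -> float:
    """Rectify an AC mic sample, convert to dBL, and clamp at the floor."""
    magnitude = abs(sample_psi)          # rectify
    if magnitude == 0.0:
        return MIC_DBL_FLOOR             # log10(0) is undefined; use the floor
    dbl = 20.0 * math.log10(magnitude / PSI_REF)
    return max(dbl, MIC_DBL_FLOOR)
```

Because zero and negative samples now map to the floor instead of null, the plotted series has no gaps.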

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 23:08:21 +00:00
serversdown ace542cba5 report_pdf: wire histogram peak date/time + PVS-when + Finish field
Spotted comparing our PDF to BW's reference for T003LLUB.CE0H:
  - Finish blank
  - Per-channel Date / Time rows all dashes
  - MicL PSPL line missing "on May 27, 2026 at 06:19:14"
  - Peak Vector Sum missing "on May 27, 2026 At 06:06:14"

Root cause: I'd added these fields to the projection (write side) in
_bw_report_to_dict but never wired them into gather_report_data
(read side).  Plus the projection used keys "start"/"stop" while
gather was reading "start_str"/"stop_str" — typo'd lookup.

Fixes:
  - gather_report_data now reads bw_report.histogram.start /
    .stop / .channel_peak_when (correct keys, matching the projection)
  - Per-channel "peak_date" / "peak_time" populated from
    channel_peak_when[<channel>] for the histogram stats table
  - MicL PSPL line formats as "PSPL  125.7 dB(L) on May 27, 2026
    at 06:19:14" (BW style) when channel_peak_when["MicL"] is present;
    falls back to the waveform-relative "at 0.012 sec" otherwise
  - PVS line formats as "Peak Vector Sum  0.091 in/s on May 27, 2026
    At 06:06:14" (BW style) when bw_report.peaks.vector_sum.when is
    populated; falls back to the relative time_s for waveforms
  - New _split_iso_to_date_time() helper splits ISO timestamps into
    BW-formatted ("May 27 /26", "06:06:14") date+time pairs for the
    stats table's separate Date and Time rows

Events ingested BEFORE the parser extension landed (most of the
existing prod corpus) still show dashes — their sidecars lack the
histogram block.  Re-forwarding repopulates.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 22:47:53 +00:00
serversdown 8cbda09917 viewers: render timestamps in browser-local time
Spotted on the SFM webapp event modal — "Received by server at" was
showing the raw ISO string "2026-05-27T21:59:57.213043Z" because we
were assigning ev.timestamp / src.captured_at directly to the
textContent of the modal fields, bypassing the existing _fmtTs()
helper that wraps them in toLocaleString().

Net effect for operators: confusing "21:59 vs it's 6 PM" mismatch
when the displayed UTC timestamp didn't match wall-clock time.  The
values were always correct; the display was just ambiguous.

After this fix:
  - "Recorded at" (naive ISO from BW = unit local time) renders
    cleanly as the unit wrote it: "5/27/2026, 6:00:13 AM"
  - "Received by server at" (UTC with Z suffix) converts to browser
    local: "5/27/2026, 5:59:57 PM"
  - Timestamp column in the history table already used _fmtTs —
    unchanged
  - Same fix applied to the standalone /events page (sidebar event
    list + meta header) via a new _fmtTsLocal helper

Note: did NOT add file-mtime-on-watcher-PC tracking as a separate
"Called in at" column — discussed and decided created_at is close
enough for schedule-compliance monitoring (worst case lag = watcher
poll interval ~60s, indistinguishable from BW write time at the
operationally-relevant resolution).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 22:30:43 +00:00
serversdown 3457ed0072 bw_ascii_report: parse OORANGE saturation marker + TimeSum typo
BW writes "OORANGE" (truncation of "Out Of Range") when a channel
exceeds its full-scale, and uses a typo'd label "Peak Vector Sum
TimeSum" for the PVS time field.  Both confirmed against real ASCII
files pulled from a Windows watcher PC 2026-05-27:

  T190LD5Q.LK0W  Vert PPV = OORANGE  (Normal range, 10 in/s exceeded)
  T438L713.RY0W  All three PPVs OORANGE  (Sensitive range, 1.25 in/s)
  K557L3YM.OE0W  Tran+Vert PPV OORANGE + MicL PSPL OORANGE

Previously our _parse_number() returned None for OORANGE → DB columns
ended up NULL → events vanished from filters / sorts / dashboards
despite being legitimate high-amplitude events.

New behavior — substitute a conservative bound + set a saturation flag:
  - Channel PPV       → geo_range_ips + ChannelStats.ppv_saturated
  - Peak Vector Sum   → sqrt(3) * geo_range_ips + peak_vector_sum_saturated
  - MicL PSPL         → 140 dB(L) + MicStats.pspl_saturated

Flags propagate to the sidecar's bw_report block so the SFM UI can
render "> 10 in/s" / "> 140 dBL" rather than treating the substituted
value as exact.

Same commit also accepts "Peak Vector Sum TimeSum" as an alias for
"Peak Vector Sum Time" (BW always writes the typo on OORANGE PVS
lines — every example file confirms it).

Tests: new test_oorange_marker_treated_as_saturation (synthetic) +
test_real_oorange_event_t190_parses (skips if real fixture absent).
177/177 tests pass; 16 pre-existing missing-fixture skips unchanged.

Five events on prod (T190, T438, K557, plus 2 others matching the
same fault pattern) will pick up correct peaks + saturation flags
once watchers re-forward.
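
A minimal sketch of the substitute-a-bound-and-flag behavior for PPV and PVS. The function names and the assumption that the value is the first whitespace-separated token of the field are illustrative, not taken from bw_ascii_report.

```python
import math
from typing import Optional, Tuple


def parse_ppv(text: str, geo_range_ips: float) -> Tuple[Optional[float], bool]:
    """Return (ppv_ips, saturated); OORANGE maps to the geophone range."""
    parts = text.split()
    if not parts:
        return None, False
    token = parts[0]
    if token.upper() == "OORANGE":
        return geo_range_ips, True   # conservative bound plus saturation flag
    try:
        return float(token), False
    except ValueError:
        return None, False


def saturated_pvs_bound(geo_range_ips: float) -> float:
    """Bound for a saturated Peak Vector Sum: all three axes at full scale."""
    return math.sqrt(3) * geo_range_ips
```

On the Normal range (10 in/s) the PVS bound works out to about 17.32 in/s.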

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 20:32:56 +00:00
serversdown d21e3b5298 histogram aggregation + parser extension for BW interval fields
Three layered changes that together make histogram charts visually
match BW's printout (one bar per interval, not per codec block):

1. bw_ascii_report parser captures histogram fields it previously
   dropped:
     - Histogram Start/Stop Time + Date → datetime
     - Number of Intervals + Interval Size (string + parsed seconds)
     - <Channel> Peak Time + Peak Date → datetime (per-channel)
     - Peak Vector Sum Date (combined with PVS Time → datetime;
       clears the bogus seconds parse that interpreted "22:33:52"
       as 22.0)
   New _parse_iso_date() handles BW's ISO format for histograms
   (waveforms use "May 8, 2026" long form).  New _parse_interval_size()
   handles "1 minute" / "5 minutes" / "15 seconds" etc.

2. _bw_report_to_dict() projects the new fields into a new
   bw_report.histogram block in the sidecar.

3. /db/events/{id}/waveform.json wraps the existing path 1 (HDF5)
   output with _maybe_aggregate_histogram(): when the event is a
   histogram AND the sidecar has bw_report.histogram.n_intervals,
   group the codec's per-block samples into N intervals via
   max-per-group and return the aggregated array.  time_axis gains
   histogram_aggregated / n_intervals / interval_size_s / interval_times
   fields.

Frontend (both modal chart in sfm_webapp.html + standalone event
browser) uses interval_times as x-axis labels when provided (BW-style
HH:MM:SS), falls back to interval index.

Defensive: aggregation is no-op when the sidecar lacks the histogram
block (events ingested before this change).  Activates automatically
on prod once a watcher re-forward populates new sidecars.
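
The max-per-group step in item 3 can be sketched as below. How the real _maybe_aggregate_histogram() splits blocks into intervals is not stated here, so the near-equal contiguous chunks are an assumption, as is labeling each interval by its start time. Both function names are hypothetical.

```python
from datetime import datetime, timedelta


def aggregate_max_per_group(samples, n_intervals):
    """Collapse per-block samples into n_intervals bars using the max of each group."""
    n = len(samples)
    if n_intervals <= 0 or n <= n_intervals:
        return list(samples)  # nothing to aggregate
    out = []
    for i in range(n_intervals):
        start = i * n // n_intervals
        end = (i + 1) * n // n_intervals
        out.append(max(samples[start:end]))
    return out


def interval_labels(start, interval_size_s, n_intervals):
    """HH:MM:SS label per interval, derived from start time + interval size."""
    step = timedelta(seconds=interval_size_s)
    return [(start + i * step).strftime("%H:%M:%S") for i in range(n_intervals)]
```

Taking the max per group preserves the peak in each interval, which is the quantity a histogram bar is meant to show.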

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 20:23:05 +00:00
serversdown ad2b553c7b ingest: preserve raw BW ASCII report (.TXT) alongside the binary
Previously the .TXT was parsed into the sidecar's bw_report projection
and then discarded at ingest time.  Now save_imported_bw() writes it
to <store>/<serial>/<filename>_ASCII.TXT permanently.

Rationale: with BW Mail / Forwarding Agent being phased out of the
operator workflow, the XML/PDF/WMF those tools produce won't be
available — the binary + .TXT (created by BW ACH itself) are our
only authoritative inputs going forward.  Keeping the raw .TXT
unlocks:

  - Parser bug fixes can be applied RETROACTIVELY by re-parsing the
    stored .TXT, instead of requiring a re-forward from the watcher
    PC (which lost the .TXT after BW ACH cleanup).
  - Audit trail of what BW actually sent us, for debugging.
  - The five known parser-PPV-miss events will be re-parseable once
    the regex fix lands (instead of staying broken indefinitely).

Storage cost: ~15 KB per event × 14k events = ~210 MB on the
existing prod corpus.  Negligible.

Implementation:
  - WaveformStore gains txt_path_for() + open_txt()
  - save_imported_bw() writes the .TXT when bw_report_text is supplied
  - sidecar source block records the txt_filename
  - backfill_sidecars.py preserves txt_filename across regens
  - New GET /db/events/{id}/ascii_report.txt endpoint serves it
  - Returns 404 for events ingested before this change (no .TXT in
    the store yet) — re-forward to populate

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 20:01:12 +00:00
serversdown dfbc8b8520 report_pdf: split waveform vs histogram layouts (BW PDF iteration)
Reviewed against real Blastware Event Report PDFs (uploaded to
example-events/pdfsnstuff/) for K558LLB7.V20H (histogram) and
K558LLB8.0E0W (waveform).  Each event type has its own layout because
BW's printouts genuinely differ:

  Waveform header:   Date/Time, Trigger Source, Range, Sample Rate
  Histogram header:  Start, Finish, Intervals At Size, Range, Sample Rate
                     (no trigger field — histograms aren't triggered)

  Waveform stats:    PPV, ZC Freq, Time (Rel. to Trig),
                     Peak Acceleration, Peak Displacement, Sensor Check
  Histogram stats:   PPV, ZC Freq, Date, Time (of peak), Sensor Check

  Waveform plot:     4-channel stacked line, x-axis in SECONDS,
                     trigger triangle + window markers, symmetric Y
                     for geo, zero-anchored mic, "0.0" baseline label
                     on right edge per BW convention
  Histogram plot:    4-channel stacked bars, Y-axis 0-to-peak only
                     (never negative — peaks are magnitudes), 0.0
                     baseline at the bottom

  Waveform footer:   USBM chart placeholder upper-right;
                     "Time X sec/div   Amplitude Geo: Y in/s/div   Mic: 0.001 psi(L)/div"
                     "Trigger = ▶━━◀"
  Histogram footer:  No USBM chart; same scale-info footer with
                     interval-size as the time unit

Other fixes from the first-pass screenshot review:
  - Channel labels (MicL/Long/Vert/Tran) no longer cut off (wider
    left margin)
  - Histogram bars rise from zero baseline (abs of any signed values)
  - ISO timestamp "2026-05-16T22:33:50" → "22:33:50 May 16, 2026"
    matching BW's display format

Known gaps (separate work):
  - Histogram codec returns per-block granularity (~200 bars for
    BW's 4-interval display).  XML-driven data source is the planned
    fix; the structured BW XML has the per-interval aggregates.
  - USBM RI8507 / OSMRE compliance chart still placeholder

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 18:22:03 +00:00
serversdown 411ef8139e sfm: Event Report PDF generation (v0.20.0 stub layout)
New endpoint GET /db/events/{id}/report.pdf returns a single-page
letter-portrait PDF for any event with waveform data on disk.

Architecture:
  sfm/report_pdf.py — gather_report_data() assembles fields from
    SeismoDb row + .sfm.json sidecar (bw_report block) + .h5 samples;
    render_event_report_pdf() turns that into PDF bytes via matplotlib.
  sfm/server.py — new endpoint wires them together, streams PDF back
    with Content-Disposition: inline so the browser displays it.
  sfm_webapp.html — new "Download PDF" button in the event modal
    footer that opens the endpoint in a new tab.

Fields surfaced — same coverage as a Blastware Event Report:
  Header metadata (date/time, trigger source, range, sample rate,
                   project, client, operator, location, serial+firmware,
                   battery, calibration, file name)
  Microphone block (PSPL in dB(L) + psi, ZC freq, channel test)
  Per-channel stats (PPV, ZC Freq, Time of Peak, Peak Accel,
                     Peak Disp, Sensor Check) for Tran/Vert/Long
  Peak Vector Sum
  Waveform plot (MicL/Long/Vert/Tran stacked, shared time axis,
                 trigger marker, symmetric Y for geo, zero-anchored
                 mic) — OR per-interval bar chart for histograms.

Rendering pipeline = matplotlib only (vector PDF, no headless-browser
dep).  Adds matplotlib>=3.8 to deps.

Visual layout is approximate until reference PDFs from Instantel land
at docs/reference/instantel/ for iteration.  USBM RI8507 / OSMRE
compliance chart is stubbed (placeholder rectangle) — separate work
item.

Smoke-tested on a K558 waveform event: 77 KB valid PDF, all fields
populated correctly from the snapshot DB.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 02:55:58 +00:00
serversdown ed926de3f4 viewers: default mic to dB(L) + add Mic-unit toggle (dBL ↔ psi)
The sidecar-modal waveform plot was rendering mic in raw psi, while the
rest of SFM (history table column, peaks block, live-device chart,
event detail modal mic field) had already converted to dB(L) — matching
the BW Event Report convention.  Unifying.

Both viewers now:
  - Default mic chart values + axis title + peak label to dB(L)
  - Provide a header toggle ("Mic: dBL" pill) to flip to psi
  - Persist the preference via localStorage (sfm_mic_unit)
  - Re-render the open chart immediately on toggle

Conversion: dBL = 20 * log10(psi / 2.9e-9), where 2.9e-9 psi is the
20 µPa reference pressure already defined for the rest of the webapp.
Non-positive psi samples (log undefined) render as null; Chart.js
handles them as gaps in line mode and missing bars in histogram mode.
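
The conversion is implemented in the viewers' JavaScript; a Python
rendering of the same formula, with illustrative names, would look
like this:

```python
import math
from typing import Optional

P_REF_PSI = 2.9e-9  # 20 µPa reference pressure expressed in psi


def psi_to_dbl(psi: float) -> Optional[float]:
    """Convert a mic pressure sample in psi to dB(L).

    Non-positive samples have no defined logarithm, so they map to None
    (the chart then shows a gap or a missing bar).
    """
    if psi <= 0:
        return None
    return 20.0 * math.log10(psi / P_REF_PSI)
```

A sample equal to the reference pressure comes out at 0 dB(L), and
each factor of 10 in psi adds 20 dB(L).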

Also fixes event_browser.html's stats table — the MicL row was
hard-coding "<value> psi"; now honors the same toggle.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 02:30:56 +00:00
serversdown 5d5441604b viewers: symmetric Y-axis on geo waveforms + clarify timestamp labels
Two fixes from the second screenshot review:

1. Geophone waveform Y-axis now renders SYMMETRIC around zero — zero
   line sits in the middle of the chart, signal goes both above and
   below.  Standard seismograph display convention; matches the
   Instantel printout look.  Previously Chart.js auto-scaled to the
   data range so e.g. Vert showing values from -0.005 to -0.015 had
   the zero line completely off-screen.

   Mic channel (sound pressure, always positive) keeps the default
   auto-scale anchored at zero.  Histograms (per-interval peaks, also
   always positive) likewise keep bars rising from a zero baseline.

2. Modal labels clarified to remove the 'Timestamp' vs 'Captured at'
   ambiguity:
     'Timestamp'   →  'Recorded at'         (when the seismograph
                                              recorded the event —
                                              from BW report's Event
                                              Time field)
     'Captured at' →  'Received by server at' (when our sfm-db
                                              inserted the row)
   Both have tooltips explaining the distinction.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 20:26:23 +00:00
serversdown 784f2cca36 viewers: decimal peak labels + bar chart for histograms + clean x-axis ticks
Three polish fixes spotted in the first prod screenshot of the inline
event-modal waveform plot:

1. Peak labels were rendering as "PEAK 2.500E-2 IN/S" because of a
   blanket toExponential(3) call.  New _fmtPeak() formatter picks
   decimal with adaptive precision for normal-range values (0.0001 to
   10000) and falls back to scientific only for truly extreme
   magnitudes.  Same value now reads "peak 0.0250 in/s".

2. Histogram events were being plotted as connected line charts, but
   histograms are per-INTERVAL peaks (one bar per minute, typically),
   not per-sample waveforms.  Now: detect histogram via record_type,
   render as a tight bar graph (bars touch), suppress the trigger line
   + zero baseline overlays (no trigger event on a histogram), and
   label the x-axis with interval number instead of milliseconds.

3. X-axis tick labels were displaying as "11.7187040000000002 ms"
   because the callback used the raw float, not the formatted label.
   Snap to 1 decimal place (or integer for whole-number values like
   histogram intervals).
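
The adaptive-precision idea in (1) can be sketched in Python as
follows.  The real _fmtPeak() is JavaScript; the three-significant-
digit rule here is an assumption chosen so that 0.025 renders as
"0.0250".

```python
import math


def fmt_peak(value: float, sig: int = 3) -> str:
    """Format a peak as plain decimal in the normal range, scientific otherwise."""
    if value == 0:
        return "0"
    mag = abs(value)
    if mag < 1e-4 or mag >= 1e4:
        # Extreme magnitudes fall back to scientific notation.
        return f"{value:.{sig}e}"
    # Keep `sig` significant digits: more decimals for smaller values.
    decimals = max(0, sig - 1 - math.floor(math.log10(mag)))
    return f"{value:.{decimals}f}"
```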

Applied to both the inline modal plot in sfm_webapp.html and the
standalone /events viewer in event_browser.html — they share the same
data shape and presentation conventions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 19:54:04 +00:00
serversdown 6abfadae4f viewers: render pre-trigger samples (time_axis is metadata, not an array)
The /db/events/{id}/waveform.json endpoint returns `time_axis` as a
metadata object — {sample_rate, pretrig_samples, t0_ms, dt_ms,
n_samples, total_samples, rectime_seconds} — not a per-sample times
array.  Both viewers (sfm_webapp.html sidecar modal + event_browser.html)
were treating it as an array, silently falling back to a derived path
that ignored pretrig entirely and started the time axis at 0.

Symptom: trigger line drawn at the very left edge of every chart, no
visible "leading up to the event" samples even though they're in the
decoded data.

Fix: read time_axis.t0_ms (negative when pretrig samples exist),
time_axis.dt_ms, build per-sample times as `t0_ms + i * dt_ms`.  Trigger
line lands at sample where t crosses 0; pretrig samples render at
negative t to the left of it.
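
A Python sketch of the same derivation (the viewers do this in
JavaScript).  It assumes total_samples counts pretrig plus post-trigger
samples, which is not spelled out above.

```python
from typing import Dict, List


def build_times_ms(time_axis: Dict[str, float]) -> List[float]:
    """Expand the time_axis metadata object into per-sample times in ms."""
    t0 = time_axis["t0_ms"]   # negative when pretrig samples exist
    dt = time_axis["dt_ms"]
    n = int(time_axis["total_samples"])  # assumption: includes pretrig
    return [t0 + i * dt for i in range(n)]


# Illustrative metadata: 1024 sps with 208 pretrig samples.
dt_ms = 1000.0 / 1024
axis = {"t0_ms": -208 * dt_ms, "dt_ms": dt_ms, "total_samples": 300}
times = build_times_ms(axis)
```

Here times[0] is -203.125 ms and times[208] is 0.0, so the trigger
line lands on the first post-trigger sample.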

Confirmed on a K558 event with 208 pretrig samples + 2 sec rectime at
1024 sps — time axis now spans -203 ms to +2046 ms, trigger line at
~9% from the left edge as expected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 21:58:20 +00:00
serversdown fd0e28657d sfm_webapp: default to Database view + sortable columns + inline waveform plot
Three UX upgrades to the main SFM webapp at /, all reinforcing the
'browse stored events' flow as the primary entry point:

1. Default section is now Database, not Live Device.  Most users land
   here to look at stored events; Live Device is opt-in (click the tab
   to talk to a unit).  Initial history + units fetch fires on first
   paint so the table is populated when the page loads.

2. History table columns are sortable.  Click any header to sort:
   timestamp, serial, per-channel PPV (Tran/Vert/Long), PVS, mic dB(L),
   project, client, type, key.  Default direction varies by column type
   (desc for numbers + timestamps, asc for text).  Sort arrows appear
   in the active column header.  Headers are sticky so they stay
   visible while scrolling.

3. Click-event-to-see-waveform.  The existing sidecar review modal now
   renders the 4-channel waveform plot inline at the top, fetched from
   /db/events/{id}/waveform.json in parallel with the sidecar fetch.
   Channels stacked MicL / Long / Vert / Tran (Instantel printout
   order), shared bottom time axis, dashed trigger line + triangle
   markers at t=0, zero baseline with "0.0" label on the right edge,
   peak callouts per channel.  Charts cleaned up on modal close.

Resolves the "where is the viewer" surprise — operators no longer need
to know about the /events route to see waveforms.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 19:39:18 +00:00
serversdown c14a8c54db event_browser: Instantel-printout-style polish
Apply the cheap visual wins from the BW Event Report layout:

  1. Channel order reversed → MicL (top), Long, Vert, Tran (bottom)
     to match the Instantel printout.
  2. Shared bottom time axis — x-axis ticks only render on the
     bottom-most data channel; other channels hide ticks so all four
     visually share one time scale.
  3. Triangle trigger markers above and below the t=0 dashed line.
  4. Horizontal zero-baseline (dotted) per channel with "0.0" label
     on the right edge — Instantel convention.
  5. "Print view" toggle that flips dark→light theme (white panels,
     light grids, dark text) so the viewer can render usefully on
     paper-style output / @media print.
  6. Per-channel PPV stats table in the metadata header, with Peak
     Vector Sum displayed prominently.
  7. Colors adjusted to approximate BW trace colors (magenta MicL,
     blue Long, green Vert, red Tran).

Future PDF-export work will reproduce the same layout server-side
once you upload a real example PDF and we pick a rendering pipeline
(weasyprint / chromium --print-to-pdf / etc.).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 07:09:12 +00:00
serversdown 460006e5cd sfm: stored-event browser at /events
New standalone HTML page (sfm/event_browser.html, ~470 lines, Chart.js)
that lets you browse persisted events from the SeismoDb + WaveformStore.
Companion to the existing live-device viewer at /waveform:

  /waveform  — connect to a unit and pull events in real time
  /events    — browse events already stored in the DB

Flow:
  1. Page loads → GET /db/units → populate serial dropdown
  2. Select serial → GET /db/events?serial=X&limit=500 → event list
  3. Click event → GET /db/events/{id}/waveform.json → render

Layout is Instantel-printout-ready: channels stacked vertically in
Tran / Vert / Long / MicL order, trigger line at t=0, peak labels,
clean dark theme.  Frames the future PDF-export feature without
needing extra layout work.

Smoke-tested against the dev prod-snapshot — 4 channels render with
correct peaks for K558 events (L=0.3 in/s = the offset-fault peak
we've been chasing all week).

CHANGELOG entry added under [Unreleased] per the v0.20.0 release plan.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 06:53:48 +00:00
serversdown 8710b8f327 docs: record three known issues discovered during prod deployment
1. bw_ascii_report parser misses PPV/vector_sum fields on certain TXT
   formats (5 events in prod).  Parser extracts every OTHER field for
   the same channels — likely a regex / format mismatch specific to
   some firmware-or-event-type combination.

2. NULL-timestamp duplicate rows.  events.timestamp can come back as
   NULL when the codec can't extract a footer timestamp; UNIQUE(serial,
   timestamp) doesn't fire on NULL, so backfills create new rows
   instead of upserting.  2 affected events on prod, easy SQL cleanup.

3. Histogram body sub-format with byte[5] != 0.  ~3 events on prod
   (T190LD5Q, O121L4L1) use a histogram body the walker doesn't
   recognize.  Codec returns 0 valid blocks; DB peaks come from the
   bw_report ASCII overlay so DB columns are correct, only the .h5
   plot is empty.  Cracking the sub-format unlocks the plot.

All three are pre-existing issues that today's deployment surfaced
during validation; none are regressions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 21:02:13 +00:00
serversdown db657bcac9 Merge pull request 'fix: bw_report overlay onto event before DB, prevents data loss docs: three-tier architecture model + strategic roadmap' (#27) from feat/wire-histogram-codec into dev
Reviewed-on: #27
2026-05-22 15:46:46 -04:00
serversdown 35842ac50a backfill: overlay bw_report onto Event before DB upsert
Mirror what the ingest path does: BW's reported peaks (and sample_rate
/ record_time) take precedence over codec output where present.

Without this, --force backfill silently overwrites bw_report-overlaid
DB columns with codec-derived peaks.  Wrong for events where the codec
doesn't fully decode (waveform walker edge cases on SP0/SS0/SV0-style
events, histogram byte[5]!=0 sub-format that isn't yet RE'd), producing
PVS=0 on real high-amplitude events.  This hit prod on 2026-05-22, when
three top-10 waveform events ended up at PVS=0 (rolled back the same
day; this fix is the proper resolution).

New helper minimateplus.event_file_io.apply_bw_report_dict_to_event
operates on the projected sidecar dict shape (the structure
_bw_report_to_dict produces, which is what gets preserved in the
sidecar).  Mirrors apply_report_to_event's semantics: only writes
fields where bw_report has a non-None value, no-ops cleanly on
empty / None input.
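
The overlay semantics can be sketched as below.  The Event stand-in
and the flat field names are hypothetical; the real helper walks the
projected sidecar dict shape, which is not reproduced here.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass
class Event:  # hypothetical stand-in for the real Event class
    pvs: Optional[float] = None
    sample_rate: Optional[int] = None
    record_time: Optional[float] = None


def apply_report_dict(event: Event, report: Optional[Mapping[str, Any]]) -> Event:
    """Overlay report values onto the event, skipping missing/None fields."""
    if not report:
        return event  # empty or None input is a clean no-op
    for name in ("pvs", "sample_rate", "record_time"):
        value = report.get(name)
        if value is not None:
            setattr(event, name, value)
    return event


ev = apply_report_dict(Event(pvs=0.0, sample_rate=1024),
                       {"pvs": 1.25, "sample_rate": None})
```

The codec-derived pvs of 0.0 is replaced by the reported 1.25, while
sample_rate keeps its codec value because the report carries None.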

Dev validation against prod snapshot:
  pre  : 1839.7315 pvs_sum   356 events with DB PVS ≠ sidecar bw_report
  post : 2016.4902 pvs_sum     2 events still mismatched (both have NULL
                                timestamp + duplicate rows, edge case)

Both edge-case events DO get the correct value written by the new
backfill — their stale rows from prior backfills remain because
UNIQUE(serial, timestamp) doesn't fire on NULL.  Separate dedup
cleanup needed for those 2 events (0.014% of corpus); not blocking.

Backfill remains idempotent + bw_report preservation still passes
(0 WIPED, 0 CHANGED on the 3rd consecutive run).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 18:56:22 +00:00
serversdown 49a524d0d4 docs: three-tier architecture model + strategic roadmap
CLAUDE.md gains an Architecture section near the top describing the
canonical three-tier mental model:

  - SFM: device-side, live connections, /device/* endpoints
  - SDM: data-side, DB + waveform store + /db/* endpoints (currently
    living under sfm/ for historical reasons; rename deferred)
  - Codec library: pure data-interpretation, used by both tiers

Future code should be placed and named according to this model even
though the directory layout doesn't fully reflect it yet.  Decision
rule for where new code goes is documented inline.

README.md's Roadmap section gains two strategic-direction subsections:

  - "Strategic direction" — frames the suite-of-components vision and
    notes that BW ACH + Thor IDF call-home remain the data movers;
    seismo-relay's value is on the receiving and processing side.
  - "Terra-View ↔ SFM device control" — the long-term vision where
    Terra-View can launch into SFM device-control surfaces (operator
    notices missing unit → clicks "Connect to Device" → live view in
    browser).  Includes concrete implementation checklist (auth,
    embedded live-monitor view, action history, series IV live
    support).

The existing tactical roadmap items remain unchanged below.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 18:38:00 +00:00
serversdown 9ef424d098 Merge pull request 'Histogram body codec — full RE + peak-count fix that resolves the prod inflation incident' (#26) from feat/wire-histogram-codec into dev
Reviewed-on: #26
2026-05-22 13:08:03 -04:00
serversdown ed6982c512 scripts: bw_report preservation check for backfill safety
Two-step tool to verify that backfill_sidecars doesn't wipe the
bw_report block from existing sidecars.  Workflow:

  1. snapshot --out before.json    (canonical-JSON hash per sidecar)
  2. run backfill
  3. diff --baseline before.json   (classifies every sidecar:
       PRESERVED / CHANGED / WIPED / STILL_MISSING / NEW / ADDED / REMOVED)

Exit code 1 if any WIPED or CHANGED entries found, 0 otherwise — so
it can gate a CI step or a deploy script.
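
A rough sketch of the hash-and-classify idea.  The label meanings
below are guesses from the names, ADDED / REMOVED are left out, and
SHA-256 is an assumed choice of hash.

```python
import hashlib
import json
from typing import Optional


def bw_report_hash(sidecar: dict) -> Optional[str]:
    """Hash the bw_report block in canonical JSON form (None if absent)."""
    block = sidecar.get("bw_report")
    if block is None:
        return None
    # sort_keys + fixed separators make the hash independent of key order.
    canon = json.dumps(block, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canon.encode("utf-8")).hexdigest()


def classify(before: Optional[str], after: Optional[str]) -> str:
    """Compare one sidecar's pre- and post-backfill hashes (assumed meanings)."""
    if before is None and after is None:
        return "STILL_MISSING"
    if before is None:
        return "NEW"
    if after is None:
        return "WIPED"
    return "PRESERVED" if before == after else "CHANGED"
```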

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 06:13:52 +00:00
serversdown d506ebc103 histogram_codec: peak count is uint8 (not uint16 LE) — properly cracks
the BE9558 / BE18003 extension-byte case

The bytes at [7]/[11]/[15]/[19] are an annotation field (purpose still
unclear — empirically non-zero on intervals with sub-Hz or unmeasurable
freq), NOT the high byte of the peak count.  The N844 fixture corpus
the original RE was done against had zero values in those bytes for
every block, so uint8 and uint16 LE were equivalent there — but on
real BE9558 Tran-drift events and BE18003 Histogram+Continuous events
the uint16 LE interpretation produced peaks up to 268 in/s and 35×
inflated PVS sums.

Cross-correlated against BW's per-interval ASCII export on:
  - K558LKZU/LL1P/LL3K  → 100% T/V/L/M peak match (1435 blocks each)
  - T003LKZR/LL0O/LL1M  → 100% T/V/L, 99.3% M (0.05 dB rounding only)
  - N599LKZS/LL0L        → 100% all channels
  - N844 fixture corpus  → 100% all channels (unchanged)

Annotations preserved on every record for future RE; the defensive
_MAX_PEAK_COUNT bound is no longer needed (uint8 maxes at 1.275 in/s,
well below any physical limit).

Synthetic regression test added using the verbatim K558LKZU.RE0H
interval-12 block.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 06:05:19 +00:00
serversdown e949232875 histogram_codec + backfill: tighter peak ceiling, preserve bw_report
histogram_codec: drop _MAX_PEAK_COUNT 4096 → 2200. The old ceiling
let extension-byte blocks slip through at up to 20.48 in/s per
channel, producing 35× inflated PVS sums when first deployed to
prod. 2200 covers Normal-range full-scale (10 in/s = 2000 counts)
plus 10% headroom for quantization edge cases.

backfill_sidecars: also preserve the bw_report block alongside
review + extensions when regenerating sidecars. event_to_sidecar_dict
takes a BwAsciiReport dataclass not a dict, so for bw_report we
overlay the existing block after regen rather than passing as a kwarg.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 02:50:10 +00:00
serversdown bc5a2d3f19 histogram_codec: defensive bounds-check on peak counts
Discovered while running the backfill on prod: certain histogram
blocks contain an undocumented extension byte format whose naive
uint16 LE interpretation yields physically impossible peak values
(150+ in/s when the device max is 10).  Concrete example from
K558LKSG.3I0H block at body+7424:

  bytes [6:10] = 05 79 69 00
  current code: T_peak = uint16 LE = 0x7905 = 30981 → 154.9 in/s
  reality:     T_peak = byte[6] = 5 → 0.025 in/s (matches BW display)
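
The two readings of those bytes can be reproduced with a short
sketch.  The 0.005 in/s per count factor is inferred from the figures
above (5 counts = 0.025 in/s); variable names are illustrative.

```python
import struct

IN_S_PER_COUNT = 0.005  # inferred: 5 counts -> 0.025 in/s

raw = bytes.fromhex("05796900")  # block bytes [6:10] from the example

# Naive reading: bytes [6:8] as one little-endian uint16.
(as_u16,) = struct.unpack_from("<H", raw, 0)

# Reading that matches the BW display: byte[6] alone.
as_u8 = raw[0]

naive_in_s = as_u16 * IN_S_PER_COUNT   # about 154.9 in/s
actual_in_s = as_u8 * IN_S_PER_COUNT   # about 0.025 in/s
```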

The high byte (0x79 here) appears to be an extension field — possibly
"time of peak within interval" or a Histogram+Continuous sub-mode
marker.  Observed across BE9558 and BE18003 units in prod data; never
appeared in the BE12844 fixture corpus the codec was originally
verified against.

Effect on prod: 26 out of 1433 blocks in this one event had inflated
peaks, plus dozens of similar events across the fleet → sum(PVS)
inflated from baseline 988 to 34501 (35x).  Rolled back via the
pre-backfill snapshot before any UI exposure.

Defensive fix: bounds-check peak counts in `_decode_block`.  Any
field exceeding `_MAX_PEAK_COUNT` (4096 = ~20 in/s, well past the
device's 10 in/s Normal-range FS) causes the block to be skipped
entirely.  Other valid blocks in the same event still decode
correctly.

Trade-off: those skipped blocks lose their per-interval data
(peaks + frequencies).  Acceptable until the extension format is
reverse-engineered — better than propagating bogus values into PVS
computations downstream.

The 24 existing tests all still pass — the fixtures used during the
original codec development don't exercise the extension-byte case.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 02:17:33 +00:00
serversdown 88549bc659 backfill_sidecars: filter out Thor IDF files
Discovered while dry-running the backfill on prod: the waveform store
contains both BW (.AB0*/.N00) and Thor IDF (.IDFW/.IDFH) event files
side-by-side because both go through the same per-serial directory
layout.  The script's `_looks_like_event_file` heuristic accepted any
3-4 char extension ending in W or H, which matched both BW and IDF.

The script then routes everything through
`event_file_io.read_blastware_file`, which rejects IDF files with
"not a Blastware file (bad header prefix)" — 3807 errors on prod
out of 7201 total events.

Thor IDF events have their own ingest path
(`WaveformStore.save_imported_idf`) and their sidecars are populated
at ingest from the paired `.IDFW.txt` ASCII report.  The backfill
script has no value to add for them — there's no decoder to refresh,
and the sidecar metadata is already correct.  Filter them out.
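
One way the filter could look, as a sketch only: the function name is
hypothetical and the extension rule is taken from the description
above (3-4 characters ending in W or H), not from the script itself.

```python
from pathlib import Path

IDF_EXTENSIONS = {".IDFW", ".IDFH"}  # Thor IDF event files


def looks_like_bw_event_file(path: str) -> bool:
    """Accept BW-style event extensions, rejecting Thor IDF files first."""
    ext = Path(path).suffix.upper()
    if ext in IDF_EXTENSIONS:
        return False
    body = ext[1:]  # drop the leading dot
    return 3 <= len(body) <= 4 and body[-1] in "WH"
```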

After this fix, the prod backfill should run clean: ~3392 BW events
get sidecar+h5 regen as expected; the ~3807 Thor IDF events are
silently skipped.

The proper "IDF backfill" (refresh tool_version stamp on IDF
sidecars by re-running event_to_sidecar_dict against the stored
DB row + sidecar extensions block) is a separate, narrower
follow-up — not blocking the BW backfill rollout.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 01:20:08 +00:00
serversdown 76bce0b5a3 Merge pull request 'v0.20.0 - prerelease features.' (#25) from feat/wire-histogram-codec into dev
- dockerfile fix
- histogram body codec FULLY decoded
- backfill scripts fixed.
- docs added for histogram codec
2026-05-20 21:05:37 -04:00
serversdown 7183b953e4 minimateplus: histogram body codec — FULLY DECODED
The histogram-mode event body is now byte-exact decodable.
Companion to the waveform body codec — together they cover every
event file the watcher forwards.  Cracked in one session via
cross-event correlation against BW's ASCII export.

The §7.6.2 spec in instantel_protocol_reference.md was structurally
correct (32-byte blocks) but the per-sample semantics were
under-documented.  Cross-checking block 130 of N844L6Z8.ZR0H
against its TXT row revealed the layout perfectly:

  slot[0] = 10 (constant marker)
  slot[1] = T_peak_count    (× 0.005 → in/s at Normal range)
  slot[2] = T_halfperiod    (freq_Hz = 512 / halfp)
  slot[3] = V_peak_count
  slot[4] = V_halfperiod
  slot[5] = L_peak_count
  slot[6] = L_halfperiod
  slot[7] = MicL_peak_count (dB via waveform_codec.mic_count_to_db)
  slot[8] = MicL_halfperiod

The `>100 Hz` sentinel is halfperiod ≤ 5 (since 512/5 = 102.4 Hz).
Mic dB uses the SAME formula as the waveform codec (sign × (81.94
+ 20·log10(|count|))) — they share the mic ADC calibration constant.

Block identification anchor: bytes [22:24] == 0x0000 AND
bytes [28:32] == 1e 0a 00 00.  The tail signature is the most
reliable distinguisher from non-block content in the file.
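
The block anchor and the unit conversions can be sketched as below.
Signatures are illustrative rather than the module's actual API, slot
byte offsets are deliberately left out because they are not stated
here, and returning None for a zero count or half-period is an
assumption.

```python
import math
from typing import Optional

BLOCK_SIZE = 32
TAIL_SIGNATURE = bytes.fromhex("1e0a0000")  # bytes [28:32]


def looks_like_block(block: bytes) -> bool:
    """Apply the anchor: zero bytes at [22:24] plus the tail signature."""
    return (
        len(block) == BLOCK_SIZE
        and block[22:24] == b"\x00\x00"
        and block[28:32] == TAIL_SIGNATURE
    )


def half_period_to_hz(halfp: int) -> Optional[float]:
    """freq_Hz = 512 / halfp; None for a zero half-period (assumption)."""
    return 512.0 / halfp if halfp > 0 else None


def geo_count_to_ins(count: int) -> float:
    """Peak count to in/s at Normal range (0.005 in/s per count)."""
    return count * 0.005


def mic_count_to_db(count: int) -> Optional[float]:
    """sign * (81.94 + 20*log10(|count|)); None for count 0 (assumption)."""
    if count == 0:
        return None
    sign = 1 if count > 0 else -1
    return sign * (81.94 + 20.0 * math.log10(abs(count)))
```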

Files:

  minimateplus/histogram_codec.py (new) — decoder + public API
    matching the waveform codec's shape:
      walk_body(body) -> records
      decode_histogram_body(body) -> {Tran, Vert, Long, MicL}
      decode_histogram_body_full(body) -> [per-interval dicts]
      half_period_to_hz, geo_count_to_ins helpers

  minimateplus/event_file_io.py (modified) — read_blastware_file
    now tries the waveform codec first, falls back to the histogram
    codec on failure.  Same output shape, same downstream pipeline.

  tests/test_histogram_codec.py (new) — 24 regression locks against
    the in-repo fixture corpus, byte-exact against BW ASCII export
    for peaks (all 4 channels), frequencies (all 4 channels,
    including >100 Hz sentinel handling), block framing, and
    segment-ID accounting.

  scripts/backfill_sidecars.py (modified) — the has_samples
    short-circuit added in the histogram-pending era is now a
    pure defensive guard.  Histograms in prod will regen .h5 files
    correctly on the next backfill run.

  docs/histogram_codec_re_status.md (updated) — supersedes the
    earlier "in progress" version with the verified format and
    test-coverage summary.  Notes a few non-essential fields still
    open (4-byte block metadata, Geo PVS, Mic psi(L) — none of
    which are needed for waveform reconstruction).

Total verified coverage: ~3,500 blocks across 5 fixtures, every
field of every block byte-exact against BW.

The watcher-forwarded histogram event corpus on prod (~10,000
events) will now produce correct .h5 sidecars on the next backfill
run.  No additional changes needed to the backfill flow — the
existing tool_version-bump cascade picks them up automatically.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 23:05:13 +00:00
serversdown c3c7fe559c docs: histogram body codec RE — starting-point status doc
Captures everything learned in the 2026-05-20 session before scope
forced a pause:

  - Block framing is solved: 32-byte blocks, one per histogram
    interval, signature byte pattern `[22:24]=0x0000` +
    `[28:32]=0x1e 0x0a 0x00 0x00` reliably identifies data blocks.
  - Block count tracks interval count to within one (791 blocks in
    N844L20G.630H for a TXT-reported 792 intervals).
  - Sample[0] = Tran peak in 0.0005 in/s/count units (verified on
    one event — needs cross-event confirmation).
  - Samples 1-8 → channel/metric mapping is still open.  None of
    the obvious layouts (peak-then-freq alternating, all-peaks-
    then-all-freqs, per-channel 3-tuples) match the TXT values
    across multiple blocks.  Likely needs a higher-activity
    fixture (current N844 corpus is all noise-floor data) to
    disambiguate.
  - `>100 Hz` sentinel encoding in the binary is unknown.
  - 4-byte variable metadata field at block[24:28] needs
    correlation work against TXT columns.

Doc mirrors the structure of docs/waveform_codec_re_status.md so
a future RE session has a familiar entry point.  Includes the
suggested attack plan + the code seam where the eventual decoder
will land (minimateplus/histogram_codec.py).

The §7.6.2 spec in instantel_protocol_reference.md is structurally
correct but doesn't pin down per-sample semantics — this doc
supersedes it where they conflict on confidence level.

No code shipped on this branch.  When the codec is cracked, the
plan is to land minimateplus/histogram_codec.py + wire into
event_file_io.read_blastware_file() + remove the has_samples
short-circuit from scripts/backfill_sidecars.py.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 21:13:26 +00:00
serversdown fa9d3cdef2 read_blastware_file: leave peak_values=None when samples can't be decoded
Fixes a data-loss bug discovered while dry-running the backfill against
the prod store.

Symptom: every histogram event in the store has its body decoded by
read_blastware_file → codec returns None → samples = empty dict →
``ev.peak_values = _peaks_from_samples(empty)`` returns
``PeakValues(0, 0, 0, 0, 0)`` (NOT None).  The backfill script's
existing "seed from DB row when peak_values is None" branch then
correctly *skips* the seeding, and the all-zeros PeakValues flows into
``db.insert_events()``'s UPSERT path, OVERWRITING the existing good DB
peak values for that event (which were populated from the paired BW
ASCII report at ingest).

Net effect: running the backfill on prod would have wiped the PPV /
mic / vector-sum columns for ~10,000 histogram events.

Fix: only compute peaks-from-samples when there are actually samples.
For events the codec couldn't decode (histogram-mode bodies, until
the §7.6.2 histogram codec is wired in), leave peak_values=None as
the "we don't know" signal.  Downstream consumers:

  - backfill_sidecars.py — its existing ``if ev.peak_values is None:``
    branch (line 243) seeds from the DB row, preserving the real
    BW-report peaks across the regen.
  - WaveformStore.save_imported_bw — apply_report_to_event overlays
    peaks from the paired BW ASCII report when one was uploaded.
    Histogram imports without a paired report end up with NULL peaks
    in the DB, which is correct (better than zeros — clearly says
    "no peak data available" rather than "peaks are exactly zero").

Updated the existing synthetic-event round-trip test to expect
peak_values=None for the no-real-body case, which is the truth now.

The 7 fixture-corpus regression tests for real BW waveforms continue
to pass — those have decodable samples, so peak_values is still
populated from the codec output as before.
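
A minimal sketch of the guard described in this commit, for illustration
only.  The `PeakValues` field names and the `peaks_or_none` helper are
assumptions made for this example, not the repository's actual
definitions; the point is only that "no samples" yields `None` rather
than an all-zeros object.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Dict, List, Optional


@dataclass
class PeakValues:
    # Hypothetical field names, chosen for illustration only.
    tran: float
    vert: float
    long: float
    mic: float
    vector_sum: float


def peaks_or_none(samples: Dict[str, List[int]]) -> Optional[PeakValues]:
    """Compute peaks only when at least one channel actually has samples."""
    if not any(samples.get(ch) for ch in ("Tran", "Vert", "Long", "MicL")):
        # Nothing decoded: return None ("we don't know") instead of zeros,
        # so a later UPSERT cannot overwrite good peaks with 0.0.
        return None

    def peak(ch: str) -> float:
        return float(max((abs(v) for v in samples.get(ch, [])), default=0))

    geo = zip(samples.get("Tran", []), samples.get("Vert", []), samples.get("Long", []))
    vector_sum = max((sqrt(t * t + v * v + l * l) for t, v, l in geo), default=0.0)
    return PeakValues(peak("Tran"), peak("Vert"), peak("Long"), peak("MicL"), vector_sum)
```

A caller can then treat `None` as the cue to seed peaks from another
source (the DB row or the paired report) instead of storing zeros.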

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 20:30:53 +00:00
serversdown c4648c1959 scripts/backfill_sidecars: skip .h5 write when decoder returned no samples
Discovered while dry-running the backfill on the prod store: ~10,000
of ~10,059 events are histogram-mode (filename extension `*H`), and
the waveform-body codec wired in via the previous commit doesn't
handle histogram-mode bodies — only the waveform-mode codec at
§7.6.1 is implemented; the histogram-mode codec at §7.6.2 of the
protocol reference is documented but no Python implementation
exists yet.

Without this guard, every histogram event's .h5 file would be
*replaced* with an empty one — strictly worse than today's
broken-int16-LE .h5 because any downstream viewer expecting
non-empty sample arrays would now error out instead of just
rendering wrong values.

Fix: after the decoder runs, check whether any channel has samples.
If not, skip the .h5 write entirely.  The sidecar still regenerates
(refreshing the tool_version stamp and any peaks/project info from
the DB row), but the existing .h5 is left untouched.
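
The gate can be sketched roughly as follows.  `has_samples` is named
after the check mentioned in this log, but this body and the `h5_plan`
helper are assumptions written for illustration; the two labels are the
ones quoted in the dry-run output in this message.

```python
from typing import Mapping, Optional, Sequence


def has_samples(raw_samples: Optional[Mapping[str, Sequence[int]]]) -> bool:
    """True when at least one channel holds at least one sample."""
    if not raw_samples:
        return False
    return any(len(channel) > 0 for channel in raw_samples.values())


def h5_plan(raw_samples: Optional[Mapping[str, Sequence[int]]]) -> str:
    """Label for the .h5 step; the sidecar is regenerated either way."""
    return "would (re)write" if has_samples(raw_samples) else "skipped-empty-samples"
```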

This is a *temporary* gate.  When the histogram codec lands (next
branch: `feat/wire-histogram-codec`), the has_samples check can be
removed and the backfill will then correctly regenerate all .h5
files, histogram and waveform alike.

Observed effect (dry-run on prod store, 10,059 events):
  - waveform events (~5%): "[DRY ] would write … + .h5 (would (re)write)"
  - histogram events (~95%): "[DRY ] would write … + .h5 (skipped-empty-samples)"
  - sidecar tool_version bump succeeds for both

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 20:16:31 +00:00
serversdown 0e89125495 docker: fix dockerfile to include scripts and micromate folders 2026-05-20 19:58:54 +00:00
serversdown fffb363b2b Merge pull request 'minimateplus: wire read_blastware_file to verified body codec' (#24) from feat/wire-codec-to-import-path into dev
Reviewed-on: #24
2026-05-20 15:26:15 -04:00
serversdown e8682d49ad scripts/backfill_sidecars: cascade h5 regen when sidecar is stale + bump TOOL_VERSION
Two coupled changes that close the rollout gap left by the
read_blastware_file codec wiring:

1. minimateplus/event_file_io.py: bump TOOL_VERSION from 0.16.1 to
   0.20.0.  This is the version stamp the backfill script reads from
   each sidecar's source.tool_version field to detect "this sidecar
   was written before the current decoder shipped, regenerate it."
   Bumping past every value baked into existing prod sidecars flags
   them all as stale on the next backfill run — which is exactly what
   we want, since every pre-codec-wiring sidecar was written by the
   retracted int16-LE decoder.

2. scripts/backfill_sidecars.py: when the sidecar is being
   regenerated this iteration (sha mismatch, tool_version too old,
   or --force), also regenerate the .h5.  Previously the .h5 logic
   only rewrote when --force was passed or the file was missing —
   so a tool_version-driven sidecar regen left the broken .h5 in
   place forever.  Added a `sidecar_stale` boolean to track the
   "we're rewriting the sidecar this iteration" state and wired it
   into the h5 need-rewrite check.

   Path coverage (verified by trace):
     - sidecar missing  → both regen
     - --force          → both regen
     - sha mismatch     → both regen
     - tool_ver too old → both regen (THE post-codec-wiring case)
     - everything OK    → skip iteration entirely (h5 untouched)
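
   The path table above can be restated as a small decision function.
   This is a sketch under assumptions: `plan_regen` and `parse_version`
   are hypothetical names, and the real script may track more state
   (for example a missing .h5 next to a fresh sidecar, which this sketch
   does not model).

```python
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """'0.16.1' -> (0, 16, 1), so versions compare numerically, not as strings."""
    return tuple(int(part) for part in text.split("."))


def plan_regen(
    sidecar_version: Optional[str],  # None when the sidecar is missing
    sha_matches: bool,
    force: bool = False,
    current_version: str = "0.20.0",
) -> Tuple[bool, bool]:
    """Return (rewrite_sidecar, rewrite_h5) for one event."""
    sidecar_stale = (
        force
        or sidecar_version is None
        or not sha_matches
        or parse_version(sidecar_version) < parse_version(current_version)
    )
    # The .h5 follows the sidecar: both regenerate, or the iteration is skipped.
    return sidecar_stale, sidecar_stale
```

   Comparing parsed tuples rather than raw strings matters here, since
   a plain string comparison would rank "0.9.0" above "0.20.0".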

Operator review state (review.false_trigger, reviewer, notes) and
the sidecar's extensions block are preserved across regen by the
existing read-existing-sidecar / pass-into-event_to_sidecar_dict
path — unchanged from prior behavior.

Deploy procedure (on prod):
  1. Pull this change + the read_blastware_file codec wiring.
  2. `python scripts/backfill_sidecars.py --dry-run` to preview.
     Every sidecar with source.tool_version<0.20.0 will show as
     "would (re)write".
  3. Run for real (drop --dry-run).  Expect every pre-fix event
     to regen.  Big stores may take a while.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 18:24:06 +00:00
serversdown 31d691b40b minimateplus: wire read_blastware_file to verified body codec
`read_blastware_file()` was still calling `_decode_samples_4ch_int16_le`
(the retracted int16-LE-interleaved hypothesis) on the body bytes,
producing ±32K noise on every channel of every BW file read from disk.
This was the path watcher-forwarded events take into the system
(via the import endpoint → save_imported_bw → read_blastware_file,
since the watcher doesn't ship A5 frames), so every .h5 sidecar
generated for a forwarded event has been wrong since the feature
shipped.

The fix is mechanical: pass the body bytes straight to
`waveform_codec.decode_waveform_v2()` and run the result through
`decoded_to_adc_counts()` for the 16x geo scaling.  The body already
starts with the codec's exact 7-byte preamble `00 02 00 [Tran[0] BE]
[Tran[1] BE]` — confirmed by `body[:3].hex()` across all 9 fixture
events.  No body-slice adjustment needed.

If the codec returns None (truncated/malformed file, synthetic test
input with no real waveform), fall back to empty channels with a log
warning.  The rest of the event (timestamp, waveform_key, project
strings, sensor_location, peaks-from-samples=0) is still recoverable.

Verified against the bundled fixture corpus:

  V70  Tran/Vert/Long 3328/3328 sample-sets match .TXT ground truth
       within the 0.005 in/s display quantum, every row
  6S0/RG0/AB0/470 (5-8-26)  3328/2304/1280/1280 samples; Vert PPVs
       match BW's own report within 0.02 in/s
  JQ0  3328 samples, Vert PPV 3.384 vs BW 3.465
  SP0/SS0/SV0 (loud events)  3072–3328 samples; known walker
       tail-truncation 1–7 samples per channel, samples reached are
       byte-exact

Existing `test_read_blastware_file_round_trip` (synthetic empty event)
continues to pass thanks to the None-fallback.  Codec verify scripts
(`analysis/verify_quiet_bundle.py`, `analysis/verify_full_decode.py`)
re-run unchanged.

Added two regression-lock tests in tests/test_event_file_io.py:
  - test_read_blastware_file_decodes_via_codec[6 fixtures]
    — verifies sample count + Vert PPV per fixture
  - test_read_blastware_file_v70_samples_match_txt_truth
    — verifies every one of V70's 3328 sample-sets across Tran/Vert/Long
      matches the .TXT ground truth row-by-row within 0.003 in/s

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 18:13:24 +00:00
serversdown beca5de06e docs: clean up and verify s3 protocol docs 2026-05-20 17:55:02 +00:00
serversdown d85df4c886 Merge pull request 'merge full s3 codec decoded' (#23) from codec-re into main
Reviewed-on: #23
2026-05-20 13:45:32 -04:00
Claude 0466bb4f44 codec: crack wide-NN blocks (1X NN / 2X NN); loud events now fully decode
When NN exceeds 0xFC, the codec extends to 12-bit NN by using the
low nibble of the TYPE byte as the high nibble of NN:

    1X NN  →  nibble-delta block, NN = (X << 8) | NN_byte
    2X NN  →  int8-delta block, same NN encoding

Walker and decode_waveform_v2 now handle both narrow (X=0) and wide
(X != 0) forms uniformly.
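
A small sketch of the tag parsing described above.  `parse_delta_tag`
is a hypothetical helper, not the walker's real code; it covers only
the 1X and 2X delta blocks and returns None for every other tag.

```python
from typing import Optional, Tuple

# High nibble of the TYPE byte selects the delta block kind.
BLOCK_KINDS = {0x1: "nibble-delta", 0x2: "int8-delta"}


def parse_delta_tag(type_byte: int, nn_byte: int) -> Optional[Tuple[str, int]]:
    """Return (kind, NN) for a 1X NN / 2X NN tag, or None for other tags."""
    kind = BLOCK_KINDS.get(type_byte >> 4)
    if kind is None:
        return None
    # Low nibble X of the TYPE byte supplies bits 8..11 of the 12-bit NN.
    nn = ((type_byte & 0x0F) << 8) | nn_byte
    return kind, nn
```

With X = 0 this reduces to the narrow form, so one code path serves
both cases.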

Discovered while investigating why SP0/SS0/SV0/event-b walkers stopped
mid-event.  SP0 segment 12 (V continuation, cycle 3) starts with
"11 90": the high nibble of byte 0 is 1 (the nibble-delta block type),
and the low nibble 1 combined with byte 1 = 0x90 gives NN = 0x190 = 400
nibble deltas in 202 bytes.  Walker was rejecting "11" as a non-tag.

Sample count went from 47,364 to 72,972 verified byte-exact:

  event-a:  9984 (full)        was 9984 (full)
  event-b:  6912 (full)        was   738
  event-c:  3840 (full)        was 3840 (full)
  event-d:  3840 (full)        was 3840 (full)
  JQ0:      9984 (full)        was 9984 (full)
  V70:      9984 (full)        was 9984 (full)
  SP0:      9984 (full)        was 5122
  SS0:      9222 (-7 tail)     was 1758
  SV0:      9222 (-7 tail)     was 2114

7 of 9 fixtures now decode end-to-end across all 3 geo channels.
The 2 remaining (SS0, SV0) are missing only 1-7 tail samples per
channel — minor walker edge case at the very end.

74 tests pass (was 71).
2026-05-20 17:28:54 +00:00
Claude 85f4bcfe86 codec: wire decode_waveform_v2 into production; add MicL dB helper
Replaces the broken legacy int16 LE decoder in client.py with the
verified multi-channel codec.  Three changes:

1. blastware_file.extract_body_bytes(a5_frames) — new helper that
   factors out the body-reconstruction logic from write_blastware_file
   so both writers (BW binary) and decoders (sample arrays) can use
   the same canonical bytes.

2. waveform_codec.decode_a5_frames(a5_frames) — production entry point.
   Returns the raw_samples dict consumers expect (Tran/Vert/Long as
   int16 ADC counts; MicL as native ADC counts).  Internally:
     A5 frames → extract_body_bytes → decode_waveform_v2
                → decoded_to_adc_counts (geos ×16; mic pass-through)

3. waveform_codec.mic_count_to_db(count) — MicL ADC → dB(L) per BW's
   display formula:
     dB = sign(count) × (81.94 + 20 × log10(|count|))   for |count| ≥ 1
   Verified against V70 fixture: count=813 → 140.14 dB (BW PSPL 140.1).
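
   The display formula can be re-derived in a few lines.  This is a
   sketch written from the formula above, not the code in
   waveform_codec.py; returning 0.0 for |count| < 1 is an assumption,
   since the formula is only stated for |count| ≥ 1.

```python
import math


def mic_count_to_db(count: int) -> float:
    """MicL ADC count -> signed dB(L) per the formula quoted above."""
    if abs(count) < 1:
        return 0.0  # assumption: the formula is undefined below 1 count
    magnitude = 81.94 + 20.0 * math.log10(abs(count))
    return math.copysign(magnitude, count)


# The V70 check quoted above: 813 counts lands near 140.14 dB.
v70_db = mic_count_to_db(813)
```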

client.py:_decode_a5_waveform is reduced to a thin wrapper that calls
decode_a5_frames and populates event.raw_samples.  Original implementation
preserved as _decode_a5_waveform_LEGACY (dead code; reference only).

Also fixed a tail-end bug in decode_waveform_v2 where trailer-section
"40 02" markers (containing ASCII serial bytes, NOT real segment headers)
were being mis-interpreted, producing 2 spurious samples per channel at
the end of each event.  Added bytes [12:14] == "02 00" validation to
reject non-header markers.

7 new pytest tests cover the new helpers and dB conversion.  Total:
71 passing (up from 64).

Known limitation (carried over from before): the walker still stops
mid-event on the loudest fixtures (SP0/SS0/SV0/event-b) at some
mid-segment edge cases not yet characterized.  Every sample reached
is decoded correctly; the walker just doesn't reach all of them.
Loud events still yield 5,000–15,000 byte-exact samples each.
2026-05-20 17:28:54 +00:00
Claude 2ff2762eec codec-re: 30 NN block CRACKED — codec fully decoded
User intuition (16-bit) + 12-bit packing hypothesis + the int16 ADC
range constraint led to the final piece.

30 NN block format (CONFIRMED across all 14 blocks in the fixture
bundle):

  NN 12-bit signed deltas packed as NN/4 groups of 6 bytes each.
  Within each group:
    bytes [0:2] = 16 bits = 4 × 4-bit high nibbles (MSB-first)
    bytes [2:6] = 4 × int8 low bytes
    delta[k] = sign_extend_12((high_nibble[k] << 8) | low_byte[k])

  Block length = NN × 1.5 + 2 bytes (tag included).  Earlier walker
  used NN × 4 which is only correct in the TRAILER section.
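
A sketch of the group layout described above.  The function names are
hypothetical and the low bytes are treated as unsigned when combined
with the high nibble (an assumption about how the "int8 low bytes" are
merged); NN is assumed to be a multiple of 4, as the NN/4 grouping
implies.

```python
from typing import List


def sign_extend_12(value: int) -> int:
    """Interpret the low 12 bits of value as a two's-complement integer."""
    value &= 0xFFF
    return value - 0x1000 if value & 0x800 else value


def decode_30nn_payload(payload: bytes, nn: int) -> List[int]:
    """Decode NN 12-bit deltas packed as NN/4 groups of 6 bytes each."""
    if nn % 4 != 0 or len(payload) != nn * 3 // 2:
        raise ValueError("payload must hold NN/4 groups of 6 bytes")
    deltas: List[int] = []
    for start in range(0, len(payload), 6):
        # bytes [0:2]: four 4-bit high nibbles, MSB-first.
        highs = int.from_bytes(payload[start:start + 2], "big")
        for k in range(4):
            high_nibble = (highs >> (12 - 4 * k)) & 0xF
            low_byte = payload[start + 2 + k]  # bytes [2:6]: four low bytes
            deltas.append(sign_extend_12((high_nibble << 8) | low_byte))
    return deltas


def block_length_30nn(nn: int) -> int:
    """Total block length in bytes, 2-byte tag included."""
    return nn * 3 // 2 + 2


# Illustrative bytes assembled by hand for this example, not read from a fixture.
example = decode_30nn_payload(bytes.fromhex("01102f29803d"), 4)
```

Reading the same six illustrative bytes as four contiguous big-endian
12-bit fields would instead give 17, 47, 664, 61, which is why the
split high-nibble layout matters.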

Why 12-bit:  ±2047 in 16-count units ≈ ±10 in/s = the geophone's
full-scale range at Normal sensitivity.  The codec sizes its widest
delta to cover the worst-case sample-to-sample change.

Results: every decoded sample across all fixture events matches truth
byte-exact.  ZERO divergences.

  event-a:  9984 samples (full event, all 3 geos)
  event-c:  3840 (full event)
  event-d:  3840 (full event)
  JQ0:      9984 (full event)
  V70:      9984 (full event)
  SP0:      5122 (walker stops early on edge cases)
  SS0:      1758
  SV0:      2114
  event-b:   738

  TOTAL: 47,364 ADC samples verified, zero errors.

Three full 3-sec events decode end-to-end across all three geo
channels.  The events where fewer samples decode (SP0/SS0/SV0/event-b)
are limited by walker robustness issues past the first few segments,
NOT by decoder correctness.

64 tests pass (up from 55).  Files: minimateplus/waveform_codec.py
(new 30 NN decode + corrected walker length), tests/test_waveform_codec.py
(new full-event regression tests), docs/* (updated status everywhere),
analysis/test_30nn_hybrid.py (new — the analysis script that confirmed
the format).
2026-05-20 17:28:54 +00:00
Claude d4cdce77fa codec-re: 30 NN partial finding — sum matches but per-sample distribution doesn't
Tested the 12-bit signed packed delta hypothesis (motivated by the
observation that ±2047 in 16-count units ≈ ±32K raw ADC counts, almost
exactly the int16 ADC range — a strong design hint).

Result: mixed.  For SP0 block @1689 (V seg 4, samples 650..653):
  truth deltas:                47, 297, 384, 61   (sum = 789)
  12-bit BE contiguous pred:   17,  47, 664, 61   (sum = 789)

Pred position 1 (47) equals truth position 0, and pred position 3 (61)
equals truth position 3; the totals across all 4 positions also match.
But pred positions 0 and 2 (17, 664) don't match any truth value.

Hypothesis space narrows to:
- 12-bit deltas WITH a specific re-ordering or interleaving
- 12-bit deltas with one of the positions being a "step size" or
  "checksum-like" repacked value
- A nonlinear / coded format where the underlying total displacement
  is preserved but per-sample distribution is encoded differently

Two analysis scripts committed (test_30nn_12bit.py, test_30nn_v2.py).
The v2 script uses a real-decoder simulation to get the exact channel
+ sample-index for each 30 NN block, eliminating off-by-one errors in
the truth lookup.
2026-05-20 17:28:54 +00:00
Claude ce5dc640ba codec-re: quiet bundle decodes FULLY (17k samples, zero errors)
User asked the right question: do events without 30 NN blocks decode
fully?  Answer: YES.

  event-a:  Tran 3328 ✓  Vert 3328 ✓  Long 3328 ✓  (28 segments, 0 '30 NN')
  event-c:  Tran 1280 ✓  Vert 1280 ✓  Long 1280 ✓  (12 segments, 0 '30 NN')
  event-d:  Tran 1280 ✓  Vert 1280 ✓  Long 1280 ✓  (12 segments, 0 '30 NN')

17,664 ADC samples decoded byte-exact against BW's ASCII export.
Zero divergences across event-a, event-c, event-d.

This means the codec is FULLY SOLVED for any event without 30 NN
blocks.  The remaining gap is the 30 NN block format only — used for
high-amplitude regions where deltas exceed int8 range.  For quiet
events (or quiet stretches of loud events), the decoder is complete.

9 new regression tests bring the total to 55, all passing.

Files: tests/test_waveform_codec.py + docs/waveform_codec_re_status.md
+ new analysis/verify_quiet_bundle.py.
2026-05-20 17:28:54 +00:00
Claude 07675626dc codec-re: channel rotation CONFIRMED — full multi-channel decoder works
The segment-channel scoring analyzer (from scratch/next_experiment_skeleton.py)
ran and immediately confirmed the rotation hypothesis:

  SP0 seg 0: best fit Vert  508/508  ✓
  SP0 seg 1: best fit Long  508/508  ✓
  SP0 seg 3: best fit Tran  508/508  ✓  (Tran continuation)
  SP0 seg 5: best fit Long  508/508  ✓
  SP0 seg 9: best fit Long  508/508  ✓
  V70 seg 0: best fit Vert  508/508  ✓
  V70 seg 1: best fit Long  508/508  ✓

Channels rotate Tran → Vert → Long → MicL per 40 02 segment header.

Also discovered the segment header has DOUBLE duty: bytes [14:18] anchor
the NEW segment's channel (2 samples as int16 BE in 16-count units), AND
bytes [0:4] extend the PREVIOUS channel by 2 more samples (2 deltas as
int16 BE).  This is the same "2 anchors + delta stream" structure as the
body preamble for Tran.
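
A rough sketch of the header's two roles.  The helper names are
hypothetical, and the offsets are assumed to be counted from the start
of the header payload; this message does not state whether the 40 02
tag bytes are included in that count.

```python
import struct
from typing import Tuple

CHANNEL_ORDER = ("Tran", "Vert", "Long", "MicL")


def next_channel(current: str) -> str:
    """Channel that the next 40 02 segment header switches to."""
    index = CHANNEL_ORDER.index(current)
    return CHANNEL_ORDER[(index + 1) % len(CHANNEL_ORDER)]


def split_segment_header(payload: bytes) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    """Return (previous-channel deltas, new-channel anchors), int16 BE each."""
    if len(payload) < 18:
        raise ValueError("segment header payload too short")
    prev_deltas = struct.unpack(">hh", payload[0:4])    # extend previous channel
    new_anchors = struct.unpack(">hh", payload[14:18])  # first 2 samples, new channel
    return prev_deltas, new_anchors
```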

decode_waveform_v2 now returns full per-channel sample dicts.
Byte-exact verified ranges:
  V70: Tran 512, Vert 512, Long 512   (all first segments)
  JQ0: Tran 512, Vert 258
  SP0: Long 1536 (all 3 L segments)

Still open: the 30 NN block format (high-amplitude packed deltas) —
appears mid-segment when single-byte deltas can't carry the magnitude.

6 new tests bring the count to 46.  All passing.
2026-05-20 17:28:54 +00:00
Claude ae0e17b5dc codec-re: handoff polish — readmes, skeleton, remove decode-re/ duplicate
Three things to make pickup smoother:

1. analysis/README.md (NEW): catalogues the ~25 scratch scripts.
   Categorizes them as "still useful" / "superseded — keep for
   archaeology" / "pure exploration".  Tells a fresh engineer which
   files to read first and which to ignore.

2. scratch/next_experiment_skeleton.py (NEW): stub + spec for the
   segment-channel scoring analyzer.  Includes the fixture loader,
   block walker, and decode-segment-as-channel helper — just enough
   scaffolding that the next pass starts from "fill in
   score_segment_against_all_channels()" rather than from scratch.
   Already runs and confirms 13 segments per 3-sec event with sample
   starts going to 6590 (way past the 3328 actual samples) — strong
   evidence that not all segments carry Tran.

3. Removed decode-re/ duplicate.  It was a mirror of tests/fixtures/.
   Analysis scripts that hardcoded decode-re/ paths updated to point
   at tests/fixtures/.  CLAUDE.md note updated: future event uploads
   go directly into a dated subdirectory under tests/fixtures/.

All 40 tests still pass.  Skeleton runs.
2026-05-20 17:28:54 +00:00
Claude f68ee9f0f9 docs: clean up waveform-codec doc layers per review
Three "truth layers" had drifted apart between commits.  Fixed:

1. waveform_codec.py docstring rewritten from the 2026-05-08
   "structural framing only" state to the 2026-05-11 "Tran segment 0
   solved + segment-header partially decoded" state.  Killed stale
   "~80 sample-sets per segment" language (real segments are
   flash-page-byte-sized, not sample-count-sized; observed first-segment
   sizes are 42-510 samples depending on signal).  Killed stale
   "preamble is 7 or 9 bytes" language (always 7).

2. docs/instantel_protocol_reference.md §7.6.1: added a clear
   "CURRENT STATUS" box at the top with a status table.  Replaced the
   stale "~80 sample-sets" line with the verified per-event segment
   sizes.  Merged two redundant segment-header field-table sections.

3. docs/waveform_codec_re_status.md (NEW): clean working-status doc.
   Solved / not solved / hypothesis / next experiment / fixtures /
   tests.  The protocol reference remains the historical Rosetta
   Stone; this new file is the current-truth working note that
   shouldn't accumulate fossil layers.

4. CLAUDE.md §"Waveform body codec": prominent warning box at top —
   "DO NOT TRUST decoded sample arrays yet."  BW binary passthrough
   is the only sample-bearing output to trust until the decoder
   lands.  Added a "Next experiment" subsection pointing the next
   pass at the segment-channel scoring analyzer.

40 tests still pass.
2026-05-20 17:28:54 +00:00
Claude 5bf5329369 codec-re: add Waveform body codec section to CLAUDE.md
Mirrors the structural findings now documented in
docs/instantel_protocol_reference.md §7.6.1: block framing solved,
Tran segment-0 decode verified across 5 fixture events, multi-segment
continuation still open. Also adds waveform_codec.py to the project
layout map.
2026-05-20 17:28:54 +00:00
Claude 9ed6f2a8d8 codec-re: add segment 1 block dumper for analysis
Investigated multi-segment Tran continuation but couldn't crack it.
Each hypothesis tried (segment header consumes 0/1/2 T deltas, blocks
continue Tran with various interpretations) breaks at sample ~512.

Block budget for V70 segment 1: 264 nibbles + 244 RLE zeros = 508
deltas — exactly the segment size. So the block structure CAN encode
508 single-channel samples, but applying segment 1 blocks as Tran
gives wrong values.

Most likely the channel ordering changes in segment 1+ (e.g., segment
0 = Tran, segment 1 = Vert, segment 2 = Long, etc.) but I couldn't
verify cleanly.  Stopping here — segment-0 Tran decode is solid and
multi-segment work needs more fresh thinking.
2026-05-20 17:28:54 +00:00
Claude a0c9a482c7 codec-re: 00 NN is RLE; full Tran segment-0 decode (4 of 5 events)
User uploaded a Vert-heavy event (JQ0) and a Mic-heavy event (V70).
Those two were exactly what was needed to crack the next piece:

- 00 NN block = run-length-encoded zero deltas in the current channel.
  Append NN copies of the current cumulative value (no change).
- find_data_start now recognizes 00 NN as a valid first tag (some events
  begin with a leading 00 NN RLE block).
- decode_tran_initial now decodes the FULL segment 0 (not just the first
  data block).
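
The RLE rule is small enough to sketch directly.  `apply_rle_zero_block`
is a hypothetical helper name; it assumes the channel already has at
least one cumulative value to repeat.

```python
from typing import List


def apply_rle_zero_block(samples: List[int], nn: int) -> None:
    """00 NN block: NN zero deltas, so repeat the current value NN times."""
    if not samples:
        raise ValueError("need a current cumulative value before an RLE block")
    samples.extend([samples[-1]] * nn)


channel = [12, 15]
apply_rle_zero_block(channel, 3)  # channel is now [12, 15, 15, 15, 15]
```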

Results across 5 fixture events:
  - M529LL1A.SP0 (loud-all-channels)  : 510 / 510  ✓
  - M529LL1L.JQ0 (Vert-heavy)         : 510 / 510  ✓
  - M529LL1L.V70 (Mic-heavy)          : 510 / 510  ✓
  - M529LL1A.SV0 (loud-from-start)    :  58 /  58  ✓
  - M529LL1A.SS0 (loud-from-start)    :  42 / 502  (stops at first 30 04)

The 30 04 block (only seen in loud-from-start events) hasn't been
decoded yet — likely a channel-switch marker for the high-amplitude
regime.

Also discovered: segment header (40 02) payload bytes [0:2] = T_delta
at first sample of new segment, [6:8] = byte length to next segment.
Multi-segment Tran decoding still diverges after sample 512 because
the per-segment channel ordering after the header is unknown.

Tests: 40 pass (up from 36).

Files:
- minimateplus/waveform_codec.py: find_data_start fix, RLE handling,
  full segment-0 decode in decode_tran_initial
- tests/test_waveform_codec.py: synthetic RLE test, full segment 0
  tests for JQ0 and V70
- tests/fixtures/5-11-26/: M529LL1L.JQ0, M529LL1L.V70 + TXT exports
- docs/instantel_protocol_reference.md §7.6.1: RLE + segment-header docs
2026-05-20 17:28:54 +00:00
Claude 6ac126e05c codec-re: crack Tran channel codec with high-amplitude May 11 bundle
User uploaded 3 high-amplitude events (PPV 6-7 in/s — shook the geophone
hard) to decode-re/5-11-26/.  These cracked the Tran codec:

- Preamble bytes [3:5] and [5:7] = Tran[0] and Tran[1] as int16 BE
  in 16-count units (LSB = 0.005 in/s).  Confirmed across all 7
  fixtures.
- First data block carries Tran deltas from sample 2 onward:
  * 10 NN block: NN/2 bytes of payload, each byte = two 4-bit signed
    nibble deltas (high nibble first)
  * 20 NN block: NN int8 signed deltas
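
The preamble and the two delta block types can be sketched as below.
All function names are hypothetical, and the 4-bit nibbles are assumed
to be two's-complement (range -8..7); this message only says "signed".

```python
import struct
from typing import List, Tuple


def parse_preamble(body: bytes) -> Tuple[int, int]:
    """Tran[0] and Tran[1] from preamble bytes [3:5] and [5:7], int16 BE."""
    if len(body) < 7:
        raise ValueError("body shorter than the 7-byte preamble")
    return struct.unpack(">hh", body[3:7])


def sign_extend_4(nibble: int) -> int:
    """Assumed two's-complement 4-bit value."""
    return nibble - 16 if nibble & 0x8 else nibble


def nibble_deltas(payload: bytes) -> List[int]:
    """10 NN payload: two 4-bit signed deltas per byte, high nibble first."""
    deltas: List[int] = []
    for byte in payload:
        deltas.append(sign_extend_4(byte >> 4))
        deltas.append(sign_extend_4(byte & 0x0F))
    return deltas


def int8_deltas(payload: bytes) -> List[int]:
    """20 NN payload: one signed byte per delta."""
    return list(struct.unpack(f"{len(payload)}b", payload))


def accumulate(last_sample: int, deltas: List[int]) -> List[int]:
    """Turn deltas into samples (16-count units), starting after last_sample."""
    samples: List[int] = []
    for delta in deltas:
        last_sample += delta
        samples.append(last_sample)
    return samples
```

Decoding then amounts to taking the two anchors from the preamble and
accumulating each block's deltas onto the most recent sample.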

Verified 22+42+46 = 110 Tran samples across SP0/SS0/SV0 with 0 errors
against BW's ASCII export.

Why the earlier 96-combination brute force failed: the quiet 5-8
events all had T[0] = T[1] ≈ 0 so the preamble's per-channel encoding
was undetectable.  Loud events made the encoding obvious.

What's solved:
- minimateplus.waveform_codec.decode_tran_initial: returns first
  N Tran samples in 16-count units for any body.
- Walker length formula for in-data 30 NN blocks (NN*2 instead of NN*4).
- Walker now handles bodies that start with 20 NN (in addition to 10 NN).

What's still open:
- Tran past the first data block (multi-block channel switching).
- Vert / Long / MicL channel encodings.
- Walker correctness past offset ~427 in event-b.

Tests: 36 pass.  decode_waveform_v2 still returns None — the full
multi-channel decoder is not wired up.  decode_tran_initial is the
new verified entry point.

Files: minimateplus/waveform_codec.py, tests/test_waveform_codec.py
(adds 5-11-26 fixtures + decode_tran_initial tests), and
docs/instantel_protocol_reference.md §7.6.1 (Tran codec spec).
2026-05-20 17:28:54 +00:00
Claude d3f77d1d96 codec-re: solve waveform body block framing; per-byte sample mapping still open
Decoded the structural framing of the Blastware waveform body — the bytes
between the 21-byte STRT record and the 26-byte file footer.  The body is
a sequence of tagged variable-length blocks, NOT raw int16 LE.  Five tag
types (10/20/00/30/40 NN) and their lengths are now confirmed against the
4-event May 2026 fixture bundle.  Body splits cleanly into ~16 segments
(for a 1280-sample event) separated by 40 02 segment headers carrying a
monotonically incrementing uint32 LE counter at bytes [8:12].

What's done:
- minimateplus/waveform_codec.py — block walker, segment splitter, segment
  header parser.  decode_waveform_v2 is a stub returning None until the
  byte-to-sample mapping is solved; client.py is unchanged.
- tests/test_waveform_codec.py — 31 tests covering block detection, lengths,
  contiguous-walk, segment splitting, segment-header parsing, and counter
  monotonicity.  All pass.
- tests/fixtures/decode-re-5-8-26/ — bundled fixtures (4 events, BW binary
  + Blastware ASCII export each).
- docs/instantel_protocol_reference.md §7.6.1 — replaced retraction box
  with the verified structural decoding plus an explicit list of what's
  still open.

What's still open: the per-byte mapping inside 10 NN / 20 NN blocks.  96
channel-permutation × nibble-order × sign-convention combinations were
brute-force tested; none match BW's ASCII export to within ±1 ADC count.
The codec is more elaborate than uniform 4-bit deltas — likely a hybrid
variable-bit-width scheme with segment-anchor resync points.  Next
recommended step: capture an event with a known calibration tone to pin
down magnitude scaling.

Walker also bails out partway through event-b (open issue documented in
both the module and the protocol reference).
2026-05-20 17:28:54 +00:00
serversdown 7bd0f8badf Pull in v0.18 - Merge branch 'main' into codec-re 2026-05-20 16:50:03 +00:00
Claude 8316a1bbd8 docs(protocol): accuracy sweep across the protocol reference
Three-pass audit of docs/instantel_protocol_reference.md against
CLAUDE.md and the minimateplus/ implementation. Closes long-standing
discrepancies that had accumulated as the protocol understanding
evolved month over month.

Major corrections:
- §2/§3: S3 frames terminate on bare ETX, not DLE+ETX; payload
  byte[1] is flags / byte[2] is SUB (was wrongly DLE/ADDR).
- §4.2: probe responses do not carry data length; DATA_LENGTH
  is a per-SUB hardcoded constant.
- §5.1: dropped stale duplicate "SUB 1C = TRIGGER CONFIG READ"
  row; SUB 0A lengths corrected from 0x30/0x26 to 0x46/0x2C.
- §5.3: added the missing write-frame mechanics (BW_CMD-only
  doubling, DLE-aware checksum, offset = data[1]+2, ack format,
  SUB 71 chunk parameters).
- §7.6.x: switched compliance-anchor convention from the unstable
  10-byte form to the canonical 6-byte `\xbe\x80\x00\x00\x00\x00`;
  recording_mode confirmed at anchor−8 in both read and write
  (the prior anchor−3/−4 split caused anchor drift on write).
  Sample_rate at anchor−6, histogram_interval at anchor−4,
  record_time at anchor+6. Geo_range row added at channel_label+33.
- §7.5b/§8: added the 10-byte sub_code=0x03 continuous-mode
  timestamp variant; peak vector sum location corrected from
  fixed offset 87 to label-relative tran_pos−12.
- §7.7.2: SUB 1E/1F token byte at params[7], not params[6].
- §7.7.3: SUB 0A length disambiguation rewritten.
- §7.8.4/§7.8.7: fi==9 skip marked FIXED; metadata-page TODO
  replaced with current decoder state.
- §11: POLL example wire bytes corrected; SUB 5A row added to
  checksum table.
- §13/§14: device-under-test updated to BE11529/S338.17; TCP
  Idle Timeout consistency fix (0→2 min); Data Forwarding
  Timeout units clarified.
- §15 (renumbered from second §14): open-question entries
  already resolved in CLAUDE.md closed out.
- Appendix D: extension taxonomy rewritten — extensions encode
  a timestamp (AB0T scheme), not recording mode.

Navigation note added to §7 acknowledging the organic-growth
duplicate section numbers (§7.5/§7.5b, §7.6, §7.7, §7.8, §7.9)
and pointing readers to the canonical sections for each topic.

https://claude.ai/code/session_019tWZybD94YUsBaEGhnM5A2
2026-05-20 15:41:42 +00:00
serversdown 8f568b809b Merge pull request 'v0.19.0 - minimate compatibility + family separation' (#22) from dev into main
## v0.19.0 — 2026-05-20

The "device-family separation" release.  Tightens the boundary between Series III (MiniMate Plus / Blastware) and Series IV (Micromate / Thor) so the UI and storage layer dispatch deterministically by family instead of sniffing filename extensions or magnitude heuristics.

### Added — Phase 1: `device_family` column on `events`

- **`events.device_family TEXT`** — new column carrying `"series3"` or `"series4"`.  Populated by every import path (`/db/import/blastware_file`, `/db/import/idf_file`, ACH server, BW CLI, sidecar backfill script).  Returned through `/db/events` since `query_events` uses `SELECT *`.
- **Self-applying migration** — on startup, `ALTER TABLE ... ADD COLUMN` lands the new column; a follow-on `UPDATE` backfills existing rows from the binary filename extension (`.IDFH`/`.IDFW` → `series4`, everything else → `series3`).  No manual SQL needed.
- **UPSERT preserves family** — re-imports without an explicit family don't blank existing rows (`COALESCE(?, device_family)`).
- **UI dispatches on the column** — `sfm_webapp.html` events-table mic formatter now branches on `ev.device_family === 'series4'` (Thor stores native dB(L); BW stores psi).  Modal uses `source.kind === 'idf-import'` from the sidecar (sidecars don't carry the DB column).  Source-files section labels changed from "BW filename / BW filesize / BW sha256" to format-neutral "Event file / File size / File sha256".
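
The migration and the family-preserving UPSERT might look roughly like this against an in-memory SQLite database. This is a sketch under assumptions: the table is reduced to a few hypothetical columns, the serials and filenames are made up, and the uniqueness key `(serial, timestamp)` is borrowed from the dedupe rule described in the v0.18.0 notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (serial TEXT, timestamp TEXT, filename TEXT,"
    " UNIQUE (serial, timestamp))"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("THOR-DEMO", "2026-05-19T10:00:00", "demo_event.IDFW"),  # made-up rows
        ("BW-DEMO", "2026-05-19T11:00:00", "demo_event.AB0"),
    ],
)

# Migration: add the column, then backfill from the filename extension.
conn.execute("ALTER TABLE events ADD COLUMN device_family TEXT")
conn.execute(
    """
    UPDATE events
       SET device_family = CASE
               WHEN upper(filename) LIKE '%.IDFH' OR upper(filename) LIKE '%.IDFW'
               THEN 'series4' ELSE 'series3' END
     WHERE device_family IS NULL
    """
)

UPSERT = """
INSERT INTO events (serial, timestamp, filename, device_family)
VALUES (?, ?, ?, ?)
ON CONFLICT (serial, timestamp) DO UPDATE SET
    filename = excluded.filename,
    device_family = COALESCE(excluded.device_family, events.device_family)
"""
# Re-import without an explicit family: the existing value must survive.
conn.execute(UPSERT, ("THOR-DEMO", "2026-05-19T10:00:00", "demo_event.IDFW", None))

families = dict(conn.execute("SELECT serial, device_family FROM events"))
```

Because `COALESCE` returns its first non-NULL argument, passing `None` for the family on a re-import leaves the stored value in place.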

### Added — Phase 2: `micromate/` package alongside `minimateplus/`

- **`micromate/`** — new sibling package for the Thor / Micromate Series IV device.  Currently scoped to offline-file ingest; live-device support (TCP transport, framing, protocol, client) will land here when reverse-engineering happens.
  - `micromate/idf_ascii_report.py` — moved from `sfm/idf_ascii_report.py`.  No behaviour change.
  - `micromate/models.py` — typed `IdfReport`, `IdfEvent`, `IdfPeaks`, `IdfProjectInfo`, `IdfSensorCheck`.  Stores mic in native `mic_pspl_dbl` (dB(L)) instead of the pseudo-psi shoehorn that the BW-shaped model uses.  `IdfEvent.from_report()` constructs from a parsed dict + filename; `IdfEvent.to_minimateplus_event(waveform_key)` bridges to the existing sidecar / DB-insert machinery.
  - `micromate/idf_file.py` — placeholder for the binary codec (`.IDFH` / `.IDFW`).  Stubbed `read_idf_file()` raises `NotImplementedError`; documents the planned reverse-engineering path.
- **`WaveformStore.save_imported_idf`** refactored to use the native `IdfEvent` and bridge at the SQL-insert boundary.  Cleaner separation of "parse a Thor event" (in `micromate/`) from "store it on disk + write a sidecar" (in `sfm/waveform_store.py`).
- **Tests** — `tests/test_idf_ascii_report.py` imports updated to `micromate.idf_ascii_report`.  All 1,014 example-data sidecars round-trip through `IdfEvent.from_report()` without errors.

### Companion releases

- **thor-watcher** unaffected — it talks to the relay over HTTP only.  No version bump needed.
- **terra-view** unaffected today; can use `device_family` in its event-detail rendering when convenient.

---

## v0.18.0 — 2026-05-19

The "Thor / Series IV ingest adapter" release.  Seismo-relay can now accept event files from Instantel Micromate Series IV (Thor) units alongside the existing MiniMate Plus (Series III) Blastware pipeline.

### Added — Thor (Series IV) IDF ingest

- **`POST /db/import/idf_file`** (`sfm/server.py`) — multipart upload endpoint for `.IDFH` (histogram) and `.IDFW` (waveform) event files plus their `.IDFH.txt` / `.IDFW.txt` ASCII sidecars.  Mirrors the shape of `/db/import/blastware_file`: pairing by filename, optional `serial` query hint, per-file outcome reporting.
- **`sfm/idf_ascii_report.py`** — parser for Thor's TXT sidecars (verified against 1,014 real-world samples).  Extracts device-authoritative PPV, ZC Freq, Peak Vector Sum, Mic PSPL, calibration date, firmware version, sensor self-check results, and project/client/operator strings.
- **`WaveformStore.save_imported_idf()`** (`sfm/waveform_store.py`) — stores Thor binaries verbatim in `<root>/<serial>/<filename>`, writes a `.sfm.json` sidecar with `source.kind = "idf-import"` and the full parsed report under `extensions.idf_report`.  Reuses the existing `events` table — Thor events dedupe on (serial, timestamp) and surface in `/db/events` alongside BW events.
- **`tests/test_idf_ascii_report.py`** — parser tests against the `thor-watcher/example-data/` corpus.

### Changed

- `event_to_sidecar_dict()` (`minimateplus/event_file_io.py`) allow-list for `source_kind` now includes `"idf-import"` so the existing sidecar machinery can carry Thor imports.
- Bumped `pyproject.toml` version to `0.18.0`.

### Companion release

This release ships alongside **thor-watcher v0.3.0**, which adds the SFM forwarder that targets the new `/db/import/idf_file` endpoint.  Operators flip the switch in thor-watcher's new "SFM Forward" Settings tab; events POST to seismo-relay just like the series3-watcher BW forwarder does today.
2026-05-20 11:22:54 -04:00
serversdown ecc935482b seismo-relay v0.19.0 — device-family separation + micromate/ package
Tighten the Series III / Series IV boundary so UI and storage dispatch
on a clean signal instead of sniffing filenames or applying magnitude
heuristics.

Phase 1 — events.device_family column ("series3" | "series4"):
  self-applying migration with filename-based backfill of existing rows
  (1,132 backfilled on prod 2026-05-20); plumbed through every import
  path (BW endpoint, IDF endpoint, ACH server, BW CLI, sidecar
  backfill); UPSERT preserves via COALESCE; UI dispatches on it.

Phase 2 — extract micromate/ package alongside minimateplus/:
  native IdfEvent / IdfReport / IdfPeaks / IdfProjectInfo /
  IdfSensorCheck (mic in dB(L), not pseudo-psi); moved
  idf_ascii_report.py from sfm/ to micromate/; refactored
  save_imported_idf to use IdfEvent and bridge to minimateplus.Event at
  the SQL-insert boundary; idf_file.py stub for the future binary codec.

Phase 3 prep — docs/idf_protocol_reference.md captures the two
observed Thor binary header signatures (1,012 newer-firmware files vs
2 old files whose layout is byte-for-byte BW-STRT-compatible), file-size
hints suggesting int8 sample encoding, open questions in dependency
order, and a concrete first-session plan for cracking the codec.

Also rolled in the v0.18.1 hotfixes that motivated this work:
  - idf_ascii_report parser now handles "<0.005 in/s" (below-threshold)
    and "N/A" markers without leaving raw strings in numeric DB columns.
  - sfm_webapp.html: defensive _ppvFmt / mic formatter so future
    data-shape drift can't kill the whole events table render.
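
A minimal sketch of how the below-threshold and "N/A" markers could be kept out of numeric columns. The function name `parse_measurement` and the `(value, below_threshold)` return shape are assumptions made for illustration; the commit does not say what the real parser stores for a `<0.005 in/s` reading.

```python
import re
from typing import Optional, Tuple

# Optional "<" marker followed by a decimal number; any unit text after it is ignored.
_NUMBER = re.compile(r"^\s*(<)?\s*([0-9]*\.?[0-9]+)")


def parse_measurement(text: str) -> Tuple[Optional[float], bool]:
    """Return (value, below_threshold) and never a raw string.

    Hypothetical helper: "N/A" and unparseable text become (None, False);
    "<0.005 in/s" becomes (0.005, True).
    """
    if text.strip().upper() == "N/A":
        return None, False
    match = _NUMBER.match(text)
    if match is None:
        return None, False
    return float(match.group(2)), match.group(1) is not None
```

Returning `None` rather than the raw text means a numeric DB column only ever receives a float or NULL.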

All 1,014 example-data sidecars round-trip through the new package.
See CHANGELOG.md for full notes.
2026-05-20 15:19:49 +00:00
serversdown e95ac692ee feat: add device family to separate s3 and s4 events. 2026-05-20 06:15:50 +00:00
serversdown 3265ad6fa3 fix: apply psi dbL conversion rule 2026-05-20 05:43:52 +00:00
serversdown 350f81f8b5 fix: add thor specific ascii parser. 2026-05-20 05:22:28 +00:00
serversdown cd20be2eff feat: add thor/micromate compatibility v0.18.0 2026-05-19 04:32:43 +00:00
serversdown f7c5c9fed3 Merge branch 'main' into codec-re 2026-05-17 23:30:29 +00:00
serversdown 512d82c720 merge: update to 0.17.0 (#21) from ach-report-ingestion into main
Reviewed-on: #21

## v0.17.0 — 2026-05-17

The "field rescue + DB management" release.  Hardened against units that are stuck in a runaway call-home loop, and added an operator-facing path for purging bogus events that those same units dump into the DB before recovery.  All work in this release was driven by the BE9558H incident (full incident log + recovery procedure at `docs/runbooks/wedged_unit_recovery.md`).

### Added — wedged-unit recovery toolkit

A toolkit for breaking the call-home loop on a misbehaving unit whose firmware is too busy to keep up with normal request/response handshakes.  Tested in production against BE9558H (16 May 2026) — a unit with a stuck-triggered Long-axis geophone that had been call-homing the office BW ACH server every 30 seconds for hours.  Endpoints layered from "single attempt" to "siege mode" to suit different contention levels:

- **`GET /device/events/storage_range`** — SUB 0x06 probe.  POLL + one read; ~2s.  Returns first/last event keys and an `is_empty` flag.  Use to triage whether a unit has stored events without invoking the slow `count_events()` 1E/1F chain (which choked on BE9558H's corrupted event chain).
- **`GET /device/events/index`** — SUB 0x08 probe.  POLL + one read; ~2s.  Returns the lifetime event counter (does NOT decrement on erase — use `storage_range` for "right now" state).
- **`POST /device/events/erase`** — full erase sequence `0xA3 → 0x1C → 0x06 → 0xA2` (confirmed 2026-04-11, see the protocol reference).  Resets event keys to `0x01110000`.  Caller's responsibility to disable ACH first if the underlying trigger condition will re-fill the buffer.
- **`POST /device/rescue`** — one TCP session, short connect+recv timeouts: POLL → disable ACH (compliance config write) → erase events → close.  Designed for race-loop usage when the device is busy in another session.  503 on connect-refused, 502 on protocol failure, 200 on full sequence success.
- **`POST /device/stop_monitoring_blind`** — fire-and-forget Stop Monitoring (SUB 0x97), TCP-only.  Dumps `SESSION_RESET + POLL_PROBE + SESSION_RESET + POLL_DATA + 0x97 × repeat` and closes without reading any S3 response.  The full POLL preamble is required — write commands without it are silently ignored by the device's protocol parser (false-positive surface area that bit the first version of this endpoint).  Use when the device's firmware can't keep up with full request/response but might process inbound bytes at its own pace.
- **`POST /device/stop_monitoring_spam`** — server-side hammer loop, duration-bounded.  Open TCP → write the same blind payload → close → repeat as fast as possible until `duration_s` elapses.  Configurable `connect_timeout` (default 500ms) and `repeat` (frames per session).  Reports `sent_ok`, `connect_failed`, `write_failed`, `rate_attempts_per_s`.  Clamped to 5min duration.
- **`POST /device/stop_monitoring_slow_drip`** — opposite of spam.  Open ONE TCP session, drip the wake handshake + stop frames at `interval_s` (default 3s) for `duration_s` (default 120s, max 10min).  Each drip is ~23 bytes — well under any UART FIFO size.  Opportunistically drains any inbound bytes the device sends back; `bytes_received > 0` in the response strongly suggests the device has started talking and the session is healthy.  **This is the endpoint that saved BE9558H.** Spam mode had been overrunning the device's UART FIFO; slow drip stayed under it.
- **Five rescue scripts** under `scripts/`: thin bash wrappers around the endpoints, default `SFM_BASE_URL=http://localhost:8200` (direct, not via Terra-View proxy whose 60s timeout would cut off the longer endpoints):
    - `rescue_device.sh` — race-loop wrapper for `/device/rescue`
    - `blind_stop.sh` — race-loop wrapper for `/device/stop_monitoring_blind`
    - `spam_stop.sh` — single-call burst hammer
    - `slow_drip.sh` — single-call held-session drip
    - `watch_unit.sh` — passive periodic reachability check (every N min, logs to file), useful for unattended overnight monitoring of a wedged unit
- **`docs/runbooks/wedged_unit_recovery.md`** — symptoms, quick-reference recovery procedure, the modem-layer mechanism (Sierra Wireless serial-port mode-flipping is the real failure mode — not the device firmware), and a table of "why simpler approaches don't work" so the next incident skips the dead ends.

### Added — operator event DB management

Endpoints powering Terra-View's new `/admin/events` page (v0.12.0).  Designed for purging bogus events from a unit that's been forwarding them in bulk (e.g. a stuck-triggered seismograph dumping hundreds of junk events before it's recovered).

- **`DELETE /db/events/{event_id}`** — hard-delete one event row.  Also unlinks the associated blastware binary (`.AB0*`), `.a5.pkl`, `.sfm.json` sidecar, and `.h5` clean-waveform files via the WaveformStore.  Returns the per-file removal status.  404 if the event doesn't exist.
- **`POST /db/events/delete_bulk`** — filter-based or id-list-based bulk delete with safety rails:
    - Filters (`serial`, `from_dt`, `to_dt`, `false_trigger`) combine with AND; same semantics as `GET /db/events`.  `ids` is an additional inclusion list.  Refuses to run with no filters (would wipe the whole table — raises 422).
    - `confirm` must be `true` to actually delete.  Otherwise returns a dry-run summary (`status: "dry_run"`, `matched: N`, `sample_serials: [...]`).
    - `max_rows` (default 10,000) caps how many rows can be deleted by-filter in one call.  If exceeded, returns `status: "too_many"` with a hint to narrow or raise the cap.  Bypassed when only `ids` is supplied.
- **`_cleanup_event_files(row)`** helper in `sfm/server.py` — best-effort `unlink()` of all four sidecar paths derived from the row's `blastware_filename`.  Logged at WARN if a path exists but unlink fails; the DB row deletion still proceeds.
- **`SeismoDb.delete_event(id)` and `SeismoDb.delete_events_bulk(...)`** in `sfm/database.py` — both return the deleted row dict(s) so callers can do file cleanup.  `delete_events_bulk` raises `ValueError` if no filters are supplied.
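
The safety rails described above can be sketched against an in-memory list. This is an illustration only, not the code in `sfm/database.py`: `EventRow`, the `"deleted"` status string, and treating `ids` as a union with the filter matches are assumptions, and the `from_dt` / `to_dt` filters are omitted for brevity.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class EventRow:  # hypothetical stand-in for a row of the events table
    id: int
    serial: str
    false_trigger: bool = False


def delete_bulk(events: List[EventRow], *, serial: Optional[str] = None,
                false_trigger: Optional[bool] = None,
                ids: Optional[Iterable[int]] = None,
                confirm: bool = False, max_rows: int = 10_000) -> dict:
    has_filters = serial is not None or false_trigger is not None
    id_set = set(ids or [])
    if not has_filters and not id_set:
        # Mirrors the ValueError / HTTP 422 rail: never match the whole table.
        raise ValueError("no filters or ids supplied")

    def matches_filters(ev: EventRow) -> bool:
        if not has_filters:
            return False
        if serial is not None and ev.serial != serial:
            return False
        if false_trigger is not None and ev.false_trigger != false_trigger:
            return False
        return True  # every supplied filter matched (AND semantics)

    matched = [ev for ev in events if matches_filters(ev) or ev.id in id_set]
    # The cap applies to by-filter deletes only; an ids-only call bypasses it.
    if has_filters and len(matched) > max_rows:
        return {"status": "too_many", "matched": len(matched)}
    if not confirm:
        return {"status": "dry_run", "matched": len(matched),
                "sample_serials": sorted({ev.serial for ev in matched})[:5]}
    doomed = {ev.id for ev in matched}
    events[:] = [ev for ev in events if ev.id not in doomed]
    return {"status": "deleted", "matched": len(matched)}
```

Calling it once without `confirm` and checking `matched` before repeating the call with `confirm=True` is the workflow the dry-run default is meant to encourage.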

### Changed

- **Default protocol recv timeout dropped from 30s → 10s** in `_build_client()`.  The unit usually responds in well under a second over cellular; 10s leaves comfortable headroom for retransmits while failing reasonably fast when a unit is wedged.  The two endpoints that perform full 5A waveform downloads still pass `timeout=120.0` explicitly so multi-minute event transfers are unaffected.
- **`_build_client()` now accepts an optional `connect_timeout`** (TCP-only) so rescue / race-loop endpoints can fail fast on busy modems without affecting the protocol-level recv timeout.

### Fixed

- **`GET /device/monitor/status` returned HTTP 500 + uncaught traceback when the device was unresponsive**.  The retry-on-`Exception` inner block let the second `client.poll()`'s `ProtocolError` propagate out of the handler.  Now wrapped in proper try/except — returns 502 with `{"detail": "Protocol error: No S3 frame received within 10.0s ..."}` on timeout, 502 on connection errors, 500 only for genuinely unexpected exceptions.

### Migration

No schema changes.  No data migration required.

If you've been running a previous version against a wedged unit and accumulated bogus events, the new `/admin/events` page in Terra-View v0.12.0 (or direct `POST /db/events/delete_bulk` with `confirm: true`) is the cleanup tool.  Watcher state on the upstream DL2 PC does NOT need separate cleaning — the watcher's `sfm_forwarded.json` keys on file sha256 and won't re-forward the same files.

### Pairing

This release pairs with **Terra-View v0.12.0**, which adds the `/admin/events` UI that consumes the new bulk-delete endpoints, the bulk false-trigger flagging on `/unit/{id}`, and the field-deployment workflow that uses the same `series3-watcher` → SFM ingest path as before.

---

## v0.16.1 — 2026-05-14

### Fixed

- **`record_type` always "Waveform" for forwarded events.**  `read_blastware_file()` hardcoded `ev.record_type = "Waveform"` regardless of the file's actual type.  The watcher-forward pipeline (the main BW ACH ingest path) compounds this by parsing files from a tmp path with a `.bw` suffix, so even a filename-based fallback inside the parser still wouldn't see the original extension.  Now:

  1. New `derive_record_type_from_filename(filename)` helper in `minimateplus/event_file_io.py` derives the type from the LAST character of the filename's extension (V10.72+ AB0T scheme: `H`=Histogram, `W`=Waveform, `M`=Manual, `E`=Event, `C`=Combo).  Falls back to `"Waveform"` for old S338 firmware (3-char extensions ending in `0`) and any unrecognized suffix.
  2. `read_blastware_file()` now calls the helper with its `path.name` so direct callers (the `--dry-run` path in `scripts/import_bw.py`, tests, ad-hoc scripts) get the right value automatically.
  3. `WaveformStore.save_imported_bw()` overrides `ev.record_type` with the **original** filename's derived type after parsing (the tmp file inside the parser doesn't carry the original extension).  This is the path the live watcher-forwarder hits, so the DB column now reflects the actual event type going forward.

  Events ingested before this fix are stuck with `record_type="Waveform"` in the DB; a one-off backfill (`UPDATE events SET record_type = ... WHERE blastware_filename LIKE '%H'`) would fix them retroactively if desired.  Terra-view's event modal also derives client-side from the filename, so the UI already shows the correct type for old events even without the backfill.
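
The suffix rule from step 1 can be sketched as follows. The function name and the case-sensitive lookup are assumptions; the real helper in `minimateplus/event_file_io.py` may differ in detail.

```python
from pathlib import PurePath

# Last character of the AB0T-style extension -> record type (V10.72+ firmware).
_SUFFIX_TO_TYPE = {
    "H": "Histogram",
    "W": "Waveform",
    "M": "Manual",
    "E": "Event",
    "C": "Combo",
}


def record_type_from_filename(filename: str) -> str:
    """Illustrative sketch: derive the record type from the extension's last character."""
    extension = PurePath(filename).suffix   # e.g. ".0E0H", or "" when there is none
    last_char = extension[-1:]              # "" when there is no extension
    # Old firmware extensions ending in "0" and unknown suffixes fall back to Waveform.
    return _SUFFIX_TO_TYPE.get(last_char, "Waveform")
```

Passing the original filename rather than the temporary `.bw` path is what makes the lookup meaningful on the watcher-forward path.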

---
2026-05-17 19:13:56 -04:00
serversdown 57287a2ade chore: update to 0.17.0 2026-05-17 23:07:12 +00:00
serversdown 1fff8179d6 Add runbook for recovering wedged units and new scripts for device management
- Created a comprehensive runbook (`wedged_unit_recovery.md`) detailing the recovery process for units stuck in a call-home loop, including symptoms, recovery steps, and explanations of the failure mode.
- Added `blind_stop.sh` script to send stop-monitoring commands in a tight loop for unresponsive devices.
- Introduced `rescue_device.sh` script to disable Auto Call Home and erase events from a busy device.
- Implemented `slow_drip.sh` script to send stop-monitoring frames at a slow rate to prevent UART overrun.
- Developed `spam_stop.sh` script to rapidly send stop-monitoring commands to a device.
- Created `watch_unit.sh` script for passive monitoring of device reachability, logging results over time.
2026-05-17 07:58:13 +00:00
serversdown ae7edac83f chore(doc): bump to 0.16.1 2026-05-15 23:35:35 +00:00
serversdown b6911009ff scripts: backfill record_type on legacy events imported with hardcoded "Waveform"
Pre-v0.16.1 (commit aac1c8e), every event ingested through
read_blastware_file got record_type="Waveform" regardless of actual
type because the field was hardcoded.  New ingests derive correctly
from the AB0T filename scheme (H/W/M/E/C).  Existing rows still hold
the wrong value.

This script walks the events table, derives the correct record_type
from each row's blastware_filename, and bulk-updates rows that differ.
Idempotent + dry-run by default.

Usage:
  python -m scripts.backfill_record_type --db bridges/captures/seismo_relay.db
  python -m scripts.backfill_record_type --db bridges/captures/seismo_relay.db --apply

Terra-view's event-detail modal already derives the record_type
client-side from the filename for display, so operators see the
correct type in the UI even before this backfill runs.  This script
brings the DB column in line with what the UI is already showing —
matters for reporting and any downstream consumer that reads the
column directly.
2026-05-15 06:38:09 +00:00
serversdown aac1c8e06d fix(import): derive record_type from filename suffix instead of hardcoding "Waveform"
The BW ACH ingest path was inserting every event with
record_type="Waveform" regardless of the actual type because
read_blastware_file() had `ev.record_type = "Waveform"` hardcoded, and
the live watcher-forward path parses files from a tmp path (suffix
".bw") that doesn't carry the original extension.

V10.72+ MiniMate Plus firmware encodes the event type as the last
character of the AB0T extension scheme (H=Histogram, W=Waveform,
M=Manual, E=Event, C=Combo).  This change:

  1. Adds derive_record_type_from_filename() public helper in
     minimateplus/event_file_io.py
  2. Uses it inside read_blastware_file() so direct callers (the
     --dry-run path of scripts/import_bw.py, tests, ad-hoc scripts)
     get correct types automatically
  3. Overrides ev.record_type in WaveformStore.save_imported_bw()
     using the ORIGINAL filename (source_path.name) — required
     because the parser sees only the tmp file

Old S338 firmware (3-char extensions ending in `0`) and any
unrecognized suffix fall back to "Waveform".

Existing DB rows ingested before this fix are stuck with
record_type="Waveform" — a one-off SQL backfill would fix them
retroactively if desired.  Terra-view's event modal also derives
client-side from the filename, so the UI already shows the correct
type for old events even without the backfill.

Version bumped to 0.16.1 in pyproject.toml, event_file_io.py
TOOL_VERSION, sfm/server.py FastAPI version, and CHANGELOG.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-14 21:09:21 +00:00
serversdown 84ee68f889 Merge branch 'main' into codec-re 2026-05-11 22:27:25 -04:00
serversdown 20519383fe add additional events for decode 2026-05-11 18:13:24 -04:00
serversdown 87675ac2d8 Merge pull request 'docker: add .dockerignore and Dockerfile for containerization.' (#20) from dockerize into main
Reviewed-on: #20
2026-05-11 17:40:56 -04:00
serversdown 83d69b9220 chore(server): update inline version to 0.16.0 2026-05-11 21:40:18 +00:00
serversdown 3e247e2182 docker: add .dockerignore and Dockerfile for containerization. 2026-05-11 21:38:03 +00:00
serversdown d2e48c62b5 Merge pull request 'feat(import): v0.16.0 - Fully implemented series 3 BW-ACH pipeline stabilized.' (#19) from ach-report-ingestion into main
Reviewed-on: #19
2026-05-11 15:55:23 -04:00
serversdown 3402b4d11a add additional events for decode-RE 2026-05-11 14:17:21 -04:00
serversdown 988d26c03d docs: capture deferred work in README Roadmap
Consolidates everything that was floating in chat-only "parking
lot" status into the README's Roadmap (Future) section:

  High-impact (unblocks product features):
    - Waveform body codec reverse-engineering
    - In-app waveform viewer accuracy (depends on codec)
    - Terra-view integration
    - Vibration summary reports

  BW ASCII report parser enhancements:
    - Histogram-specific structural fields
    - Histogram interval bin-table parsing
    - ">100 Hz" value parsing

  Ingestion gaps:
    - MLG forwarding (watcher + SFM endpoint)
    - 0C-record raw bytes persistence in sidecar

  Operational:
    - series3-watcher file archive manager
    - Existing operational items (compliance encoder, modem manager,
      Call Home dial_string write, histogram mode 5A stream)

  Test coverage + lower-priority cleanups.

CLAUDE.md "What's next" section now points to the README as the
canonical deferred-work list, and keeps its own low-level technical
status log for byte-layout details that don't belong in the
roadmap.
2026-05-11 16:08:02 +00:00
serversdown 197c0630e2 chore(release): v0.16.0 — BW ACH ingestion
The "BW ACH ingestion" release.  Paired with series3-watcher v1.5.0,
every Blastware ACH event (binary + _ASCII.TXT report) lands in
SeismoDb with device-authoritative peaks, project metadata, sensor
self-check, and ZC/Time-of-Peak data — without depending on the
still-undecoded waveform body codec.

Bumps pyproject.toml + minimateplus/event_file_io.py TOOL_VERSION
to 0.16.0.  README banner + CHANGELOG entry summarise the work
that landed across commits cdfe4ad..f83993a on this branch.
2026-05-11 07:33:48 +00:00
serversdown f83993ad1d fix(import): pair _ASCII.TXT reports on the SFM server side too
The series3-watcher v1.5.0 fix taught the WATCHER to look for BW
ACH's _ASCII.TXT report alongside each binary.  But the SFM
SERVER's import endpoint only knew about the legacy <binary>.TXT
naming when building its TXT lookup table.

Effect: even though the watcher correctly shipped both files in
the multipart POST (and logged "+ <name>_ASCII.TXT attached"),
the server's reports dict was keyed on the wrong name, so
report_bytes resolved to None for every event.  Without the
report, save_imported_bw fell back to broken-codec peak values
and no project info — exactly the same symptom as before the
watcher fix landed, just for a different reason.

Fix: when stripping the ".TXT" suffix, also recognise the
"_ASCII" trailer and reconstruct the binary's filename by
converting the last "_" back to ".".  Register the report under
BOTH possible binary names so the subsequent lookup matches
whichever convention the operator's BW installation uses.

  ACH convention (Blastware ACH):
    binary T003L2G6.0E0H  + report T003L2G6_0E0H_ASCII.TXT  
  Manual export (operator clicks Save As Text in BW):
    binary M529LK44.AB0   + report M529LK44.AB0.TXT          
  Both for same event (e.g. ACH + operator manual save):
    register under both names; binary lookup wins             
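
The name reconstruction described above can be sketched as a small pure function. The name `binary_names_for_report` and the exact return shape are assumptions for illustration; the server's real lookup-table code is not shown in this commit.

```python
from typing import List


def binary_names_for_report(report_name: str) -> List[str]:
    """Return candidate binary filenames that a .TXT report may belong to."""
    if not report_name.upper().endswith(".TXT"):
        return []
    stem = report_name[:-len(".TXT")]
    candidates = [stem]  # manual-export convention: <binary>.TXT
    if stem.upper().endswith("_ASCII"):
        base = stem[:-len("_ASCII")]
        # ACH convention: the binary's "." was written as "_", so turn the
        # last "_" back into "." to recover the binary's filename.
        head, sep, tail = base.rpartition("_")
        if sep:
            candidates.append(f"{head}.{tail}")
    return candidates
```

Registering the report under every candidate lets the later binary lookup match whichever convention produced the file.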

Smoke-tested against the four real fixture filenames in the
project archive.  Full SFM suite still 62 pass.

For the user's situation: pull, restart, and the NEXT re-forward
pass (after deleting watcher state file again if needed) will
hit this code path, parse the report correctly, apply the
overlay onto the Event, and the upsert path will land
authoritative peak values + project info in the DB.
2026-05-11 07:25:04 +00:00
serversdown 6b2a44ff02 fix(import): overlay BW report onto Event + upsert DB row on re-import
Two compounding bugs caused forwarded events to land in the DB with
broken-codec peak values (~10 in/s saturation on every channel) and
no project info, even when the watcher correctly paired a BW ASCII
report with the binary.

Bug 1: save_imported_bw built the sidecar JSON with the report's
authoritative peak / project values via event_to_sidecar_dict(
bw_report=...), but never overlaid those onto the in-memory Event
that flows to db.insert_events().  So the DB row got peak_values
from read_blastware_file()._peaks_from_samples() — which runs the
still-undecoded waveform body codec assuming raw int16 LE and
produces ±32K-shaped noise (= ±10 in/s at Normal range) regardless
of the actual signal.  The sidecar JSON had the truth but the DB
columns (which the webapp queries for fast filter/sort) lied.

Bug 2: insert_events' IntegrityError handler only refreshed the
filename/filesize/a5_pickle/sidecar columns when a duplicate
(serial, timestamp) was seen.  Peak values, project info,
sample_rate, record_type stayed locked in at whatever the FIRST
insert wrote.  So even after Bug 1 was fixed, the historical
events in the DB (already inserted with broken-codec peaks) would
never get their values corrected, because a re-forward would just
hit IntegrityError and skip the field refresh.

Fix 1 (minimateplus/event_file_io.py + sfm/waveform_store.py):
  - New apply_report_to_event(event, report) helper folds the BW
    report's device-authoritative fields onto the Event in-place:
    per-channel PPV, peak vector sum, mic PSPL→psi, project /
    client / operator / sensor_location, sample_rate, record_time.
  - save_imported_bw() calls the helper right after parsing the
    report.  The Event that flows to insert_events() now carries
    correct values.

Fix 2 (sfm/database.py):
  - insert_events()'s IntegrityError UPDATE now refreshes every
    device-authoritative column from the new data: tran_ppv,
    vert_ppv, long_ppv, peak_vector_sum, mic_ppv, project, client,
    operator, sensor_location, sample_rate, record_type, plus
    the existing filename/filesize/a5_pickle/sidecar fields.
  - Preserves: id, waveform_key, session_id, created_at (immutable
    / FK fields), and false_trigger (operator review state).

End-to-end simulation verified:
  - Step 1: import without report → DB has ±10 in/s peaks, no project
  - Step 2: re-import WITH report → upsert path fires, DB now has
            device-authoritative 0.005 in/s peaks + sensor_location
  - Step 3: operator sets false_trigger=1, re-import again → flag
            preserved, peaks remain correct
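
The upsert behaviour in Fix 2 can be sketched with the standard-library `sqlite3` module. The table here is a cut-down, hypothetical version of `events` with only a few columns; the real schema and column list live in `sfm/database.py`.

```python
import sqlite3


def make_events_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE events (
               id INTEGER PRIMARY KEY,
               serial TEXT NOT NULL,
               timestamp TEXT NOT NULL,
               tran_ppv REAL,
               project TEXT,
               false_trigger INTEGER NOT NULL DEFAULT 0,
               UNIQUE (serial, timestamp)
           )"""
    )
    return conn


def upsert_event(conn, serial, timestamp, tran_ppv, project):
    try:
        conn.execute(
            "INSERT INTO events (serial, timestamp, tran_ppv, project) "
            "VALUES (?, ?, ?, ?)",
            (serial, timestamp, tran_ppv, project),
        )
    except sqlite3.IntegrityError:
        # Duplicate (serial, timestamp): refresh the device-authoritative
        # columns but leave id and false_trigger (operator review state) alone.
        conn.execute(
            "UPDATE events SET tran_ppv = ?, project = ? "
            "WHERE serial = ? AND timestamp = ?",
            (tran_ppv, project, serial, timestamp),
        )
```

SQLite's `INSERT ... ON CONFLICT DO UPDATE` clause would express the same thing in one statement; the try/except form is used here because it mirrors the IntegrityError handler the commit describes.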

For the user's situation: deleting the watcher state file forces a
re-forward of all events.  Each re-forward now pairs with its
_ASCII.TXT, applies the report onto the Event, and the upsert
refreshes the DB row.  No DB nuke needed.

Full SFM suite: 62 passed, 44 skipped.
2026-05-11 05:51:39 +00:00
serversdown cc57a8e618 fix(db): /db/units surfaces events-only serials too
Previous query_units() only joined on ach_sessions, which is created
exclusively by the live ACH server.  The BW-importer path
(/db/import/blastware_file → WaveformStore.save_imported_bw →
SeismoDb.insert_events) populates `events` but never creates an
ach_sessions row.  Consequence: every serial whose events flowed in
through the series3-watcher forwarder was invisible to
/db/units (and therefore to the SFM webapp's fleet overview / units
list), even though the events were correctly populated in the
events table with proper serial attribution.

Rewrite query_units() to aggregate from BOTH tables and union the
serials:
  - total_events / last_event_at  come from `events` (every ingest path)
  - last_session_at / total_monitor_entries / total_sessions
                                  come from `ach_sessions` (ACH-only),
                                  0 when no sessions exist for the serial
  - last_seen = max(last_event_at, last_session_at)
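
A rough sketch of the two-table aggregation, using `sqlite3` and a deliberately tiny schema. The column names `timestamp` and `started_at` are assumptions, and `total_monitor_entries` is left out; only the union-of-serials idea is taken from the commit.

```python
import sqlite3


def make_units_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE events (serial TEXT, timestamp TEXT);
        CREATE TABLE ach_sessions (serial TEXT, started_at TEXT);
        """
    )
    return conn


def query_units(conn: sqlite3.Connection) -> list:
    rows = conn.execute(
        """
        SELECT s.serial,
               (SELECT COUNT(*) FROM events e WHERE e.serial = s.serial),
               (SELECT MAX(e.timestamp) FROM events e WHERE e.serial = s.serial),
               (SELECT COUNT(*) FROM ach_sessions a WHERE a.serial = s.serial),
               (SELECT MAX(a.started_at) FROM ach_sessions a WHERE a.serial = s.serial)
        FROM (SELECT serial FROM events
              UNION
              SELECT serial FROM ach_sessions) AS s
        ORDER BY s.serial
        """
    ).fetchall()
    units = []
    for serial, total_events, last_event_at, total_sessions, last_session_at in rows:
        # ISO-8601 strings sort chronologically, so max() works on the text.
        seen = [t for t in (last_event_at, last_session_at) if t is not None]
        units.append({
            "serial": serial,
            "total_events": total_events,
            "last_event_at": last_event_at,
            "total_sessions": total_sessions,
            "last_session_at": last_session_at,
            "last_seen": max(seen) if seen else None,
        })
    return units
```

Because the serial list comes from the union, a unit that only ever arrived through the importer still appears, with zero sessions.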

Verified on the user's actual prod DB after the
repair_unknown_serials run: /db/units now returns 24 serials instead
of 2.  All 3,257 watcher-forwarded events become visible in the
fleet overview without any further DB surgery.
2026-05-11 05:15:09 +00:00
serversdown 082e5946bc fix(import): resolve real serial from BW filename instead of bucketing to UNKNOWN
The /db/import/blastware_file endpoint was bucketing every
forwarded event into serial='UNKNOWN' in the DB.  WaveformStore
correctly decoded the serial from the BW filename and saved
files to <store>/<serial>/<filename> (e.g.
.../BE17353/S353L5KC.DR0H.h5), but the endpoint code called
db.insert_events(serial=_serial_from_event(ev)) — and
_serial_from_event was a stub that always returned None,
falling back to "UNKNOWN".

Effect on the user's prod server: 3,039 events forwarded across
24 distinct units, ALL inserted under serial='UNKNOWN'.  The
on-disk waveform store + sidecars + HDF5s were fine, but the
SFM webapp's /db/units only showed the two original manually-
uploaded serials because every forwarded row had its serial
column zeroed to UNKNOWN.

Fix:
  - WaveformStore.save_imported_bw() now surfaces the decoded
    serial on the returned `rec` dict (rec["serial"]).
  - The import endpoint uses rec["serial"] as the authoritative
    fallback when the operator hasn't supplied a serial_hint query
    parameter.  Order of precedence:
      query string `serial` → rec["serial"] → _serial_from_event(ev) → "UNKNOWN"
  - Response payload now includes `serial` per file so the watcher
    log lines (or any future caller) can see which unit each event
    was attributed to.
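
The precedence order above reduces to a first-non-empty lookup. A minimal sketch, with `resolve_serial` as a hypothetical name rather than a function in the codebase:

```python
from typing import Optional


def resolve_serial(query_serial: Optional[str],
                   rec_serial: Optional[str],
                   event_serial: Optional[str]) -> str:
    """Pick the first non-empty serial in precedence order, else "UNKNOWN"."""
    for candidate in (query_serial, rec_serial, event_serial):
        if candidate:  # skips both None and the empty string
            return candidate
    return "UNKNOWN"
```

With the filename-decoded serial in the second slot, a forwarded event only lands under "UNKNOWN" when the filename itself could not be decoded.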

Recovery for existing DB rows:
  scripts/repair_unknown_serials.py walks the events table looking
  for rows with serial='UNKNOWN' and re-attributes each one to the
  serial decoded from blastware_filename.  Updates the row in place
  unless the target (serial, timestamp) already has a row, in which
  case the UNKNOWN duplicate is deleted.  Idempotent.  Default
  dry-run; pass --apply to commit.

  Verified on the user's actual DB (dry-run):
    UNKNOWN rows scanned:       3039
    Updated to real serial:     2602
    Deleted (duplicate of an
     already-correct row):      437
    Unresolved (bad filename):  0

After running the repair, /db/units will show all 24 units
correctly populated.
2026-05-11 02:25:08 +00:00
serversdown a032fa5451 refactor(bw-report): parse user notes by POSITION, not by label
The four operator-supplied note fields in BW's Compliance Setup →
Notes tab (Project / Client / User Name / Seis Loc) have
USER-EDITABLE LABELS — an operator can rename them in BW's UI to
"Building:", "Site Address:", "Inspector:", or anything else, and
the ASCII export writes those literal labels verbatim.  The
previous label-normalisation map approach (just added in commit
6a7e8c6) was fragile: it could only match label spellings we'd
enumerated in advance.  An operator using "Site:" instead of
"Seis Loc:" would have their sensor location silently dropped.

What IS reliable: BW always writes the 4 user-notes lines
contiguously, in the same order, between the "Units :" line and
the "Geo Range :" line of the export.  So parse them by POSITION:

  position 1 → project
  position 2 → client
  position 3 → operator
  position 4 → sensor_location

The original labels BW wrote are preserved in a new
`BwAsciiReport.user_note_labels` dict (canonical slot → literal
label string) so terra-view can render them as the operator named
them.

Removes the `_OPERATOR_LABEL_MAP` / `_normalise_label_for_lookup`
helpers and the elif-by-normalised-label branch in `parse_report`.
Replaces with a small state machine that flips on the "Units" line
and flips off on the "Geo Range" line.
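
The state machine can be sketched like this. It assumes each export line has the form `label: value` and that blank lines inside the block do not consume a slot; both are simplifications for illustration, and `parse_user_notes` is a hypothetical name.

```python
SLOTS = ("project", "client", "operator", "sensor_location")


def parse_user_notes(lines):
    """Assign the lines between "Units" and "Geo Range" to slots by position."""
    values = dict.fromkeys(SLOTS)   # slot -> value, None when missing
    labels = {}                     # slot -> label exactly as the operator wrote it
    in_block = False
    position = 0
    for line in lines:
        key = line.split(":", 1)[0].strip()
        if key == "Units":
            in_block = True         # the notes block starts after this line
            continue
        if key == "Geo Range":
            in_block = False        # the notes block ends before this line
            continue
        if not in_block or not line.strip() or position >= len(SLOTS):
            continue                # outside the block, blank, or a 5th+ line
        label, _, value = line.partition(":")
        slot = SLOTS[position]
        values[slot] = value.strip() or None
        labels[slot] = label.strip()
        position += 1
    return values, labels
```

Nothing in the loop inspects the label text when choosing a slot, which is why renamed labels such as "Building:" still land in the right field.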

Tests:
  - Default-label fixtures (waveform + histogram) still populate
    correctly, with operator's labels captured.
  - Synthetic custom-labelled exports ("Building:" / "Site Address:" /
    etc.) populate the right slots by position.
  - Histogram-specific "Seis. Location:" works.
  - Lines outside the Units→Geo Range range are ignored even if
    they look like user notes (defensive against malformed exports).
  - Partial blocks (fewer than 4 lines) leave later slots None.
  - Extra lines beyond 4 are dropped (5th slot doesn't exist).

26 tests in test_bw_ascii_report.py (was 33; net drop reflects
parametrised label tests collapsed into 6 focused position tests).
Full SFM suite: 62 passed, 44 skipped.

Pairs with series3-watcher v1.5.0 which fixes the filename pairing
so the report reaches this parser in the first place.
2026-05-10 22:28:31 +00:00
serversdown 6a7e8c6e86 feat(bw-report): normalise operator-field label variants
Blastware writes the operator-supplied fields with different label
spellings across firmware versions and recording modes — most
notably "Seis. Location" on histogram exports vs "Seis Loc:" on
waveform exports.  Previous parser only matched the latter, so
every histogram event silently lost its sensor_location field.

Replace the four hardcoded `key.rstrip(":") == "X"` branches with
a single `_OPERATOR_LABEL_MAP` dispatch table keyed by normalised
label (lowercase, trailing colon/period stripped, internal
whitespace collapsed).  Adds these variants on day 1:

  project:         "Project:" / "Project"
  client:          "Client:"  / "Client"
  operator:        "User Name:" / "User Name"
  sensor_location: "Seis Loc:" / "Seis. Location" / "Seis Location"
                 / "Sensor Location" / "Seis Loc"

To absorb future BW label drift, add a one-line dict entry — no
new elif branch.
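
A small sketch of the normalise-then-dispatch idea. The map contents follow the variants listed above, but the names `normalise_label`, `LABEL_MAP`, and `field_for_label` are stand-ins, not the helpers from the commit.

```python
import re
from typing import Optional

# Normalised label -> canonical field name.
LABEL_MAP = {
    "project": "project",
    "client": "client",
    "user name": "operator",
    "seis loc": "sensor_location",
    "seis. location": "sensor_location",
    "seis location": "sensor_location",
    "sensor location": "sensor_location",
}


def normalise_label(label: str) -> str:
    """Lowercase, collapse internal whitespace, strip trailing colon/period."""
    collapsed = re.sub(r"\s+", " ", label.strip().lower())
    return collapsed.rstrip(":. ")


def field_for_label(label: str) -> Optional[str]:
    return LABEL_MAP.get(normalise_label(label))
```

Absorbing a new spelling is then a one-line addition to `LABEL_MAP`.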

14 new tests cover:
  - Each label variant routes to the correct field (parametrised)
  - Case-insensitive matching ("seis loc" / "SEIS LOC" / "SeIs LoC")
  - Whitespace-collapse ("Seis  Loc" with double-space)
  - End-to-end parse of a real histogram fixture from
    example-events/histogram/ — sensor_location ('Loc #1 - 2652 Hepner...')
    populates correctly even though the file uses "Seis. Location"

Total bw_ascii_report tests: 19 → 33.  Full SFM suite still green
(69 passed, 44 skipped — pre-existing skips for h5py-dep tests).

Pairs with series3-watcher v1.5.0 (which fixes the filename pairing
so histograms actually reach this parser in the first place).
2026-05-10 20:13:44 +00:00
serversdown cdfe4ad3c8 feat(import): parse paired BW ASCII reports on /db/import/blastware_file
Blastware's ACH writes a per-event ASCII report (.TXT) alongside each
event binary, containing the rich derived per-channel fields BW
computes (PPV, ZC Freq, Time of Peak, Peak Acceleration, Peak
Displacement, Peak Vector Sum + time, sensor self-check Pass/Fail,
monitor-log timestamps).  None of this lives in the BW binary itself.

When the watcher daemon forwards both files to /db/import/blastware_file
in one multipart POST, we now:

  - Pair binaries with their .TXT partners by filename match
  - Parse the report into a structured BwAsciiReport
  - Land the rich fields in a new top-level `bw_report` block of the
    sidecar JSON
  - Overlay the report's peaks/project_info/timestamp/sample_rate/
    record_time/total_samples/pretrig_samples onto the canonical
    sidecar fields (the report values are device-authoritative; the
    BW-binary STRT-derived values had bugs like reading the 0x46
    record-type marker as rectime)

This unblocks the monthly-summary review workflow — events become
sortable/filterable by peak, location, project, etc. — without
depending on the still-undecoded waveform body codec.
2026-05-08 23:56:43 +00:00
serversdown 510cec8395 add example events for decode reverse engineering. 2026-05-08 15:44:54 -04:00
serversdown 7e13c2020f Merge pull request 'doc(fix): retracts raw int16 LE sample set assumptions.' (#18) from sfm-waveform-store into main
Reviewed-on: #18
2026-05-08 15:27:26 -04:00
serversdown 8aea46b8a0 doc(fix): retracts raw int16 LE sample set assumptions. 2026-05-08 19:26:25 +00:00
serversdown 0f7630c10d Merge pull request 'doc: update readme to 0.15.0' (#17) from sfm-waveform-store into main
Reviewed-on: #17
2026-05-08 15:15:36 -04:00
serversdown 9123269b1f feat(protocol): implement v0.14.0 SUB 5A protocol rewrite with enhanced chunk handling and new helpers
test: add regression tests for v0.14.x SUB 5A protocol fixes
refactor(logging): change warning logs to debug for less verbosity in write_blastware_file
2026-05-08 19:11:55 +00:00
serversdown 9400f59167 doc: update readme to 0.15.0 2026-05-08 19:06:26 +00:00
serversdown e1a73b2c44 Merge pull request 'feat: add waveform store handling' (#16) from sfm-waveform-store into main
Reviewed-on: #16
2026-05-08 15:03:32 -04:00
serversdown bbed85f7e2 fix: update channel keys to include 'MicL' in device_event_waveform documentation 2026-05-08 18:48:06 +00:00
serversdown c641d5fc10 feat: v0.15.0
### Added

- **Layered event storage architecture.**  Each event now lands as four
  files in the per-serial waveform store, each with a clear role:

  - `<filename>` — the Blastware-readable binary (BW file).  Untouched.
  - `<filename>.a5.pkl` — the raw 5A frames (regenerative source).
  - `<filename>.h5` — clean per-channel waveform arrays in physical
    units (in/s for geo, psi for mic) plus event metadata (HDF5 with
    gzip compression).  This is the canonical format for downstream
    analysis tools.
  - `<filename>.sfm.json` — the modern review/metadata sidecar (peaks,
    project, source provenance, review state, extensions).

  SQLite (`seismo_relay.db`) is the searchable index over all four.

- **Plot-ready waveform JSON (`sfm.plot.v1`).**  The `/device/event/{idx}/waveform`
  and `/db/events/{id}/waveform.json` endpoints now return samples in
  physical units with explicit time-axis metadata, peak markers, and
  per-channel unit hints — no more guessing the ADC-to-velocity scale
  client-side.  The webapp waveform viewer was rewritten to consume
  this shape.

- **In-app waveform viewer accuracy fix.**  The standalone SFM webapp
  viewer was scaling geophone amplitudes by `geoAdcScale / 32767`
  (≈ 6.206 / 32767), where `geoAdcScale = 6.206053` is the device's
  *in/s per V* hardware constant — not the ADC-counts-to-velocity
  factor.  This silently scaled every plot ~38% too low for Normal-range
  geophones (the correct full-scale is 10.0 in/s, or 1.25 in/s for
  Sensitive).  Conversion is now done server-side using the geo_range
  from compliance config; the client just plots.

- New `sfm/event_hdf5.py` module: `write_event_hdf5()`,
  `read_event_hdf5()`, plus a plot-JSON helper.
- Backfill script extended to also emit `.h5` for existing events.
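
The ~38% figure in the viewer accuracy fix can be checked with a short calculation. This is only a sketch: it assumes a signed 16-bit count is mapped linearly onto the full-scale range with 32767 as the divisor, which is an assumption here and not confirmed by this message; `old_velocity` and `new_velocity` are hypothetical names.

```python
GEO_ADC_SCALE = 6.206053   # in/s per V hardware constant (from the note above)
FULL_SCALE_NORMAL = 10.0   # in/s full scale for Normal-range geophones
INT16_MAX = 32767          # assumed divisor for a signed 16-bit count


def old_velocity(count: int) -> float:
    """Old client-side scaling that misused the in/s-per-V constant."""
    return count * GEO_ADC_SCALE / INT16_MAX


def new_velocity(count: int, full_scale: float = FULL_SCALE_NORMAL) -> float:
    """Assumed corrected scaling: counts mapped onto the full-scale range."""
    return count * full_scale / INT16_MAX


# Ratio of old to new is independent of the count value.
ratio = old_velocity(INT16_MAX) / new_velocity(INT16_MAX)
shortfall_pct = (1 - ratio) * 100  # roughly 37.9, i.e. the "~38% too low"
```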

### Dependencies

- Added `h5py>=3.10` and `numpy>=1.24` for the HDF5 storage layer.
- Added `python-multipart>=0.0.7` (required by FastAPI for the
  `/db/import/blastware_file` endpoint introduced in this release).
2026-05-08 04:39:51 +00:00
serversdown 9afa3484f4 feat(cache): implement integrity checks for cached events and waveforms
- Added `waveform_key` and `event_timestamp` columns to `CachedEvent` and `CachedWaveform` for integrity verification.
- Implemented logic to flush the cache when a mismatch in (waveform_key, event_timestamp) is detected during event and waveform updates.
- Enhanced `set_events` and `set_waveform` methods to check for mismatches and trigger cache eviction as necessary.
- Introduced a new `LiveCache` class to manage in-memory caching of live device data, separating it from the server logic for better testability.
- Added tests to verify the correctness of cache invalidation logic, particularly for post-erase key reuse scenarios.
- Updated web application to include a "Force refresh" toggle, allowing users to bypass the cache and re-fetch data from the device.
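
One way the (waveform_key, event_timestamp) integrity check could look, as a rough sketch only. `IdentityCheckedCache` and its methods are hypothetical and do not reproduce the real `CachedEvent` / `LiveCache` code; the sketch assumes entries are stored per event index and that any identity mismatch flushes everything.

```python
class IdentityCheckedCache:
    """Toy cache that flushes when an index is reused by a different event."""

    def __init__(self):
        self._events = {}  # index -> ((waveform_key, event_timestamp), payload)

    def set_event(self, index, waveform_key, event_timestamp, payload) -> bool:
        """Store an event; return True if a mismatch forced a flush."""
        identity = (waveform_key, event_timestamp)
        cached = self._events.get(index)
        flushed = cached is not None and cached[0] != identity
        if flushed:
            # Same slot, different event: the device was likely erased and
            # the key reused, so nothing cached can be trusted.
            self._events.clear()
        self._events[index] = (identity, payload)
        return flushed

    def get_event(self, index):
        entry = self._events.get(index)
        return None if entry is None else entry[1]
```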
2026-05-07 04:42:00 +00:00
serversdown 0484680c89 fix(docs/comments): rename refs to 'event files' to reflect their timestamp extension names. 2026-05-06 19:08:38 +00:00
serversdown 3711b11bda feat: add waveform store handling 2026-05-06 19:03:38 +00:00
serversdown 429c6ac87a feat(protocol): implement v0.14.0 SUB 5A protocol rewrite with enhanced chunk handling and new helpers
test: add regression tests for v0.14.x SUB 5A protocol fixes
refactor(logging): change warning logs to debug for less verbosity in write_blastware_file
2026-05-06 14:18:31 -04:00
serversdown 52c6e7b618 Merge pull request 'v0.14.3 - Full waveform DL pipeline tested and working.' (#15) from protocol-fix into main
Reviewed-on: #15
2026-05-05 20:49:47 -04:00
serversdown 29ebc75656 doc: update readme v0.14.3 2026-05-05 20:48:58 -04:00
claude ebfe9877fa doc: update changelog to 0.14.3 2026-05-05 20:39:47 -04:00
claude c914a15e12 docs: update for v0.14.3 - Full continuous waveform download successful! 2026-05-05 20:37:52 -04:00
claude a27693242d fix(protocol): implement partial DLE stuffing for 0x10 bytes in params to prevent request corruption 2026-05-05 18:28:28 -04:00
claude eefec0bd64 fix(blastware_file): remove harmful "duplicate header+STRT" strip logic to preserve valid waveform data 2026-05-05 17:48:40 -04:00
claude 7444738883 debug(protocol): event-N probe is now at counter = start_offset instead of start_offset + 0x46 2026-05-05 16:46:35 -04:00
claude 6b76934a04 Merge branch 'main' into protocol-fix 2026-05-04 14:43:05 -04:00
claude 7b62c790a9 fix(seismo-lab): remove duplicate capture history list 2026-05-04 14:30:46 -04:00
claude b66cc9d075 fix(blastware_file): update TERM detection logic and strip duplicate header blocks for accurate file writing 2026-05-04 14:28:11 -04:00
serversdown 4ab604eff1 Merge pull request 'v0.12.6' (#10) from seismo-lab-new into main
Reviewed-on: #10
2026-05-04 13:22:54 -04:00
serversdown e15f1567ef Doc: Update docs for 0.12.6 2026-05-04 17:18:28 +00:00
serversdown bb33ad3837 doc: update to v0.12.5 2026-05-04 17:13:37 +00:00
claude 45e61fbcaf big refactor of waveform protocol. 2026-05-03 01:20:21 -04:00
claude d758825c67 fix(protocol): correct continuous-mode record header classification for accurate timestamp extraction 2026-05-01 20:28:55 -04:00
claude 0fbb39c21a Big event bugfix. see details:
## v0.13.0 — 2026-05-01

### Fixed

- **SUB 5A bulk waveform stream — over-read bug for events ≥ 2 sec.**
  `read_bulk_waveform_stream` was walking the chunk counter past the actual
  end of the event, picking up post-event circular-buffer garbage that
  corrupted reconstructed Blastware files for any waveform > ~1 sec.  The
  loop now extracts the event's `end_offset` from the STRT record at
  `data[23:27]` of the probe response and stops the chunk walk when the next
  counter would step past it.  Verified against three BW MITM captures
  (4-27-26 + 5-1-26): 2-sec event drops from 37 over-read chunks to 7
  bounded chunks; 3-sec drops to 9; non-zero-start "event 2" drops to 9.

### Added

- `framing.bulk_waveform_term_v2(key4, end_offset, last_chunk_counter)` —
  computes the corrected SUB 5A TERM frame's `(offset_word, params)` per the
  formula confirmed across all 3 BW captures.  Not yet wired into
  `read_bulk_waveform_stream` (the legacy TERM is still used to preserve the
  existing `blastware_file.write_blastware_file` frame-structure expectations);
  available for the next iteration that switches to BW's 0x0200 chunk step.
- `framing.parse_strt_end_offset(a5_data)` — extracts the event-end pointer
  from the STRT record in an A5 response payload.
2026-05-01 18:37:34 -04:00
claude 1ef55521b1 Fix: Removed duplicates from merge botch. Stable version of seismo_lab.py 2026-05-01 17:34:41 -04:00
claude 738b39f3cb Manually Merged seismo lab persistent connection branch into the new direct download branch, creating a new branch called seismo-lab-new 2026-05-01 15:13:50 -04:00
Claude 625b0a4dfc feat(seismo_lab): add Download tab that captures wire bytes during event download
Adds a new CapturingTransport wrapper in minimateplus.transport that mirrors
every TX/RX byte to two raw .bin files using the same on-wire format as
bridges/ach_mitm.py, so the resulting captures are byte-for-byte compatible
with the existing Blastware MITM captures and load directly in the Analyzer.

A new "Download" tab in seismo_lab.py lets the user connect to a device over
TCP or serial and run connect / list-keys / download-events while the wrapper
saves raw_bw_<ts>.bin (our TX) and raw_s3_<ts>.bin (device TX) into a
seismo_dl_<ts>[_<label>]/ session directory. On completion, the panel hands
both files to the Analyzer and switches tabs, mirroring the UX of the
existing Bridge capture flow.
2026-05-01 00:12:02 +00:00
Claude b14f31f3b0 Include capture label in TCP raw filename
Matches serial bridge naming: raw_bw_{ts}_{label}.bin / raw_s3_{ts}_{label}.bin

https://claude.ai/code/session_014NczSHUz9uTzCAf4cVASTJ
2026-04-27 20:48:10 +00:00
Claude b9ab368934 Fix TCP capture: write files only when capture is active
Previously every Blastware connection auto-created files.
Now TCP mode works the same as serial mode:
- Start Bridge: proxy listens and forwards silently, no files written
- New Capture: opens raw_bw/raw_s3 files; pipe threads write to them
- Stop Capture: flushes and closes files, fires Analyzer callback
- No connection = no file; multiple captures per bridge session work correctly

https://claude.ai/code/session_014NczSHUz9uTzCAf4cVASTJ
2026-04-27 20:26:31 +00:00
Claude 9004241846 Restore multi-capture Bridge design + TCP mode
Brings back the protocol-exp BridgePanel design:
- Single bridge session stays up; New Capture / Stop Capture create
  labelled raw-file segments on demand (no files created at bridge start)
- Capture history listbox shows all segments; double-click reloads in Analyzer
- On capture complete: Analyzer auto-populates and runs analysis

TCP mode integrated into same tab (Serial/TCP radio toggle):
- Each incoming Blastware connection is automatically a capture segment
- Session appears in history list; Analyzer wires up live on connect
- Stop Capture disconnects current TCP session

https://claude.ai/code/session_014NczSHUz9uTzCAf4cVASTJ
2026-04-27 20:20:43 +00:00
Claude 6861d9ed97 Merge TCP mode into Bridge tab (Serial/TCP radio toggle)
Removes the separate 'TCP Capture' tab and folds TCP MITM capture directly
into the existing Bridge tab.  A Serial/TCP radio selector at the top swaps
the connection fields (COM ports vs. listen port + device host:port) while
keeping the same Start Bridge / Stop Bridge / Add Mark buttons, capture
checkboxes, log dir, and live log — identical UX for both modes.

https://claude.ai/code/session_014NczSHUz9uTzCAf4cVASTJ
2026-04-26 23:01:45 +00:00
claude 5cd5652560 Merge branch 'seismo-lab' of https://github.com/serversdwn/seismo-relay into seismo-lab 2026-04-26 18:16:52 -04:00
Claude 897ac8a3f3 Add TCP MITM capture tab (TcpBridgePanel)
New 'TCP Capture' tab in seismo_lab.py: listens on a configurable local
port for an incoming Blastware connection, transparently forwards all
traffic to the real seismograph device, and saves both directions to
raw_bw_<ts>.bin / raw_s3_<ts>.bin in the same format the Analyzer already
understands.  Session start wires up Analyzer live mode automatically via
the same on_bridge_started callback as the COM-port bridge.

https://claude.ai/code/session_014NczSHUz9uTzCAf4cVASTJ
2026-04-26 22:10:48 +00:00
serversdown 310fc5986c Merge pull request 'seismo-lab2' (#7) from seismo-lab2 into seismo-lab
Reviewed-on: #7
2026-04-26 16:49:28 -04:00
Claude e1150b30aa fix(analyzer): name A5/5A frames; revert S3 checksum validation
Add 0x5A (BULK_WAVEFORM_STREAM) and 0xA5 (BULK_WAVEFORM_RESPONSE) to
SUB_TABLE so they display with real names instead of UNKNOWN_5A/A5.

Revert S3 checksum validation to checksum_valid=None (the original
intentional behavior). Large S3 frames (A5 bulk waveform, E5 compliance
config) embed inner DLE+ETX sub-frame delimiters; the trailing 0x03 of
the last inner delimiter can land where the parser expects the SUM8
checksum byte, causing false BAD CHK on every valid A5 frame.
protocol.py _validate_frame documents and ignores exactly this issue.

https://claude.ai/code/session_014NczSHUz9uTzCAf4cVASTJ
2026-04-26 20:40:45 +00:00
claude a7585cb5e0 fix(blastware_file, server): implement logic to skip extra chunks after metadata for accurate file writing 2026-04-26 16:32:32 -04:00
Claude 9bbecea70f fix(parser): correct S3 frame terminator — bare ETX, not DLE+ETX
parse_s3 had the S3 terminator logic inverted vs the real S3FrameParser
in framing.py. It was terminating on DLE+ETX and treating bare ETX as
payload, which caused every bare 0x03 to be swallowed — bundling multiple
real S3 frames into one giant body until a DLE+ETX sequence happened to
appear. Result: 583-byte POLL_RESPONSE 'frames' containing many real
frames concatenated, all showing BAD CHK.

Fix: mirror S3FrameParser exactly —
  - Bare ETX (0x03) = real frame terminator
  - DLE+ETX (0x10 0x03) = inner-frame literal data (A4/E5 sub-frames),
    appended to body and parsing continues
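
A minimal sketch of the terminator rule above. `split_s3_frames` is a hypothetical helper, not the real `S3FrameParser`; it assumes both bytes of a DLE+ETX pair are kept in the body, and it leaves out start-of-frame detection, any other DLE escapes, and checksum handling.

```python
DLE = 0x10
ETX = 0x03


def split_s3_frames(stream: bytes) -> list[bytes]:
    """Split a byte stream into frame bodies, terminating on bare ETX only."""
    frames = []
    body = bytearray()
    i = 0
    while i < len(stream):
        byte = stream[i]
        if byte == DLE and i + 1 < len(stream) and stream[i + 1] == ETX:
            # DLE+ETX is inner-frame literal data: keep it and keep parsing.
            body += bytes([DLE, ETX])
            i += 2
        elif byte == ETX:
            # Bare ETX is the real frame terminator.
            frames.append(bytes(body))
            body = bytearray()
            i += 1
        else:
            body.append(byte)
            i += 1
    # Bytes after the last bare ETX are an incomplete frame and are dropped.
    return frames
```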

https://claude.ai/code/session_014NczSHUz9uTzCAf4cVASTJ
2026-04-26 20:23:18 +00:00
claude ae30a02898 fix(blastware_file, server): enhance logging and correct chunk handling for accurate data processing 2026-04-26 16:03:07 -04:00
claude 2f084ed105 fix(protocol): update chunk counter formula to use max(key4[2:4], 0x0400) for accurate data streaming 2026-04-26 01:28:47 -04:00
claude 7976b544ed fix(blastware_file): never skip A5 frames based on classification at fi>0
Frame 0 is always the probe; frames 1+ are always data (waveform ADC
chunks, compliance config, compliance continuation).  Gating on
classify_frame() at fi>0 produces false positives: ADC binary data
can coincidentally contain b"STRT\xff\xfe", causing frames 1 and 5
to be silently dropped from the body (confirmed from live capture on
event key=01110000).  Remove all type-based filtering; include every
frame unconditionally with the standard index-based skip amounts.
2026-04-26 00:59:36 -04:00
claude 0415af19b4 fix(blastware_file): remove seen_metadata flag and adjust frame processing logic 2026-04-24 20:21:03 -04:00
claude 35c3f4f945 fix(protocol): correct A5 frame classification and chunk counter formula 2026-04-24 17:25:29 -04:00
claude 43c8158493 feat(blastware_file): classify A5 frames, only write waveform frames to body
Add classify_frame() which categorises each A5 frame by content:
  terminator    — page_key == 0x0000
  probe_or_strt — contains b"STRT"
  metadata      — contains compliance-config ASCII markers
                  (Project:, Client:, Standard Recording Setup, …)
  waveform      — binary-heavy (< 20% printable ASCII), i.e. raw ADC data
  unknown       — fallback

Update write_blastware_file() body loop: frame 0 (probe) is still
always processed; frames 1+ are only included when classify_frame
returns "waveform".  Metadata frames (compliance config block with
Project:/Client:/etc.) and any stray STRT-bearing frames are skipped
with a warning/debug log.  Terminator frame handling is unchanged.
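
The categories above could be sketched roughly as follows. `classify_frame_sketch` is a hypothetical stand-in, not the real `classify_frame()`: it takes `page_key` as an already-extracted integer, uses only the three markers named in this message (the full marker list is not shown), and the order of the checks is an assumption.

```python
# Only the markers named in the commit message; the real list is longer.
METADATA_MARKERS = (b"Project:", b"Client:", b"Standard Recording Setup")


def classify_frame_sketch(page_key: int, payload: bytes) -> str:
    """Classify an A5 frame by content, following the categories above."""
    if page_key == 0x0000:
        return "terminator"
    if b"STRT" in payload:
        return "probe_or_strt"
    if any(marker in payload for marker in METADATA_MARKERS):
        return "metadata"
    if payload:
        # Count bytes in the printable ASCII range 0x20..0x7E.
        printable = sum(0x20 <= b <= 0x7E for b in payload)
        if printable / len(payload) < 0.20:
            return "waveform"
    return "unknown"
```

Content-based checks like the `b"STRT"` test can misfire on binary ADC data that happens to contain those bytes, which is the false-positive case a later commit in this log describes.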

Adds temporary print() diagnostics so each frame's classification is
visible in the server log to aid debugging.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 15:48:37 -04:00
claude 242666f358 fix(protocol): correct chunk counter formula for accurate data streaming 2026-04-24 12:52:02 -04:00
claude 03540fdc00 fix: raise max_chunks to 128 for metadata-only 5A download
For 2-second events at 1024 sps the "Project:" metadata frame appears
beyond chunk 32 (the old default cap), causing the safety limit to be
hit and ~34 KB of waveform data to be downloaded instead of stopping
at the metadata frame.  Raising max_chunks to 128 ensures
stop_after_metadata=True can locate the metadata frame for record
times up to ~4 seconds.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 02:19:27 -04:00
claude f83fd880c0 fix(protocol): update device_event_blastware_file to include extra chunk for accurate data retrieval 2026-04-24 00:35:34 -04:00
claude ab2c11e9a9 fix(protocol): refine extra chunk fetching logic for accurate termination response 2026-04-23 20:30:07 -04:00
claude fa887b85d9 fix(protocol): update extra chunk fetching logic to stop at silence detection 2026-04-23 18:28:14 -04:00
claude ecd980d345 fix(protocol): enhance extra chunk fetching logic to ensure footer detection 2026-04-23 18:22:27 -04:00
claude bc9f16e503 fix(protocol): adjust extra_chunks calculation to use integer conversion of record_time 2026-04-23 17:39:28 -04:00
claude aa2b02535b fix(protocol): add record_time based chunk scaling for longer event record times 2026-04-23 17:33:16 -04:00
claude 2a2031c3a9 fix(protocol): fetch additional chunk after metadata to ensure valid termination response 2026-04-23 17:08:36 -04:00
claude 9e7e0bce2a fix(protocol): adjust full_waveform setting for event downloads to end when it should. 2026-04-23 16:43:59 -04:00
claude 5e2f3bf2a1 fix(protocol): enable full_waveform for continuous mode. 2026-04-23 16:24:39 -04:00
claude 39ebd4bdaa fix(protocol): revert endpoint back to stop_after_metadata=True 2026-04-23 15:11:56 -04:00
claude 84c87d0b57 fix(protocol): adjust waveform download to use full_waveform for accurate event streaming 2026-04-23 13:02:55 -04:00
claude ec6362cb8e fix(protocol): include terminator in waveform stream downloads 2026-04-23 12:45:59 -04:00
claude 3eeafd24aa fix(protocol): improve terminator frame detection in write_blastware_file.
fix: rename .n00 to just blastware file (.n00 was false positive)
2026-04-23 01:33:44 -04:00
claude 8cb8b86192 fix(server): add error logging for device event handling 2026-04-22 23:48:59 -04:00
claude 6dcca4da79 feat(protocol): fully decode Blastware filename encoding and update related documentation 2026-04-22 23:43:31 -04:00
claude c47e3a3af0 feat(protocol): update Blastware file format documentation and encoding details 2026-04-22 19:16:05 -04:00
claude dfbc9f29c5 feat: first try at building waveform binary files. 2026-04-21 22:57:53 -04:00
claude 4331215e23 feat(protocol): enhance raw capture functionality and documentation updates
- Update `s3_bridge.py` to default raw capture file paths to "auto" for timestamped naming.
- Modify `gui_bridge.py` to pre-check raw capture options and streamline path handling.
- Extend `ach_server.py` to save both incoming and outgoing raw bytes for analysis.
- Revise `CHANGELOG.md` and `instantel_protocol_reference.md` to reflect changes in recording mode handling and compliance data encoding.
2026-04-21 16:07:24 -04:00
claude b3dcfe7239 fix(client): correct recording_mode anchor position in compliance config encoding 2026-04-21 01:17:45 -04:00
claude 9b5cdfd857 feat(logging): add detailed logging for anchor position in compliance config encoding/decoding 2026-04-21 00:23:15 -04:00
serversdown 4a0c9b6da5 Merge pull request 'merge protocol-exp 0.12.3 to main' (#5) from protocol-exp into main
Reviewed-on: #5
2026-04-21 00:22:24 -04:00
claude 7129aae279 fix(client): update compliance data size handling (less strict now) 2026-04-21 00:09:30 -04:00
claude 2186bc238b fix: call home settings tab display 2026-04-20 21:15:16 -04:00
claude 3fb24e1895 feat(call-home): Implement Auto Call Home configuration management
- Added `CallHomeConfig` model to represent the Auto Call Home settings.
- Introduced methods in `MiniMateClient` for reading (`get_call_home_config`) and writing (`set_call_home_config`) the call home configuration.
- Updated `MiniMateProtocol` with new commands for call home operations (SUB 0x2C for read, SUB 0x7E for write, and SUB 0x7F for confirm).
- Created API endpoints for retrieving and updating call home settings in the server.
- Enhanced the web interface with a new "Call Home" tab for user interaction with call home settings.
- Implemented JavaScript functions for reading and writing call home configurations from the web app.
2026-04-20 18:23:48 -04:00
claude 7bdd7c92f2 Merge branch 'protocol-exp' of https://gitea.serversdown.net/serversdown/seismo-relay into protocol-exp 2026-04-20 17:04:00 -04:00
claude b6ffdcfa87 feat: implement geophone sensitivity and recording mode settings in compliance config 2026-04-20 17:03:58 -04:00
serversdown a7aec31915 Merge pull request 'fix(parser): resolve BAD CHK for BW frames caused by SESSION_RESET bytes' (#4) from seismo-lab into protocol-exp
Reviewed-on: #4
2026-04-20 17:01:34 -04:00
Claude 34df9ec5fa fix(parser): resolve BAD CHK for BW frames caused by SESSION_RESET bytes
SESSION_RESET (41 03) is sent before each POLL frame to wake monitoring
units. The ETX lookahead in parse_bw only checked for ACK+STX directly
after ETX, so when 41 03 followed a frame's ETX, the check failed and the
ETX was swallowed into the body as a payload byte — giving a 19-byte body
instead of 17 for POLL frames and failing checksum validation.

Fix: scan past any SESSION_RESET (41 03) sequences when looking for the
next frame start, so the real ACK+STX boundary is found correctly.

Also adds SUM8 checksum validation to parse_s3, which previously left
checksum_valid=None for all S3 frames.
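
A rough sketch of the lookahead fix described above. `is_frame_boundary` is a hypothetical helper, not the actual `parse_bw` code; the ACK+STX byte values (`41 02`) are an assumption, and treating end-of-buffer as a boundary is also an assumption.

```python
SESSION_RESET = b"\x41\x03"  # from the commit message
FRAME_START = b"\x41\x02"    # assumed ACK+STX byte values


def is_frame_boundary(buf: bytes, etx_pos: int) -> bool:
    """Decide whether the ETX at etx_pos really ends a frame.

    Skips any SESSION_RESET pairs after the ETX, then checks for the next
    frame start (or the end of the buffer).
    """
    i = etx_pos + 1
    while buf[i:i + 2] == SESSION_RESET:
        i += 2  # scan past each wake-up sequence
    return i >= len(buf) or buf[i:i + 2] == FRAME_START
```

Without the `while` loop, an ETX followed by `41 03` fails the boundary check and is swallowed into the body, which matches the 19-byte POLL body symptom described above.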

https://claude.ai/code/session_014NczSHUz9uTzCAf4cVASTJ
2026-04-20 20:47:35 +00:00
claude eec6c3dc6a feat: add histogram_interval setting and update UI with new field. 2026-04-20 16:25:56 -04:00
claude 702e06873e fix: add recording_mode option in html 2026-04-20 15:56:52 -04:00
claude 94767f5a9d feat: add recording_mode to config editor in sfm webapp 2026-04-20 15:54:08 -04:00
claude e04114fd6c feat: mapped record_mode protocol 2026-04-20 15:49:31 -04:00
claude f10c5c1b86 feat: add persistent bridge and streamlined capture pipeline to seismo_lab.py 2026-04-20 15:09:55 -04:00
claude aa28495a43 fix: rename max_geo_range to ADC scale, and make it not user-configurable.
fix: change max_geo_range_enum to geo_range with two options (normal and sensitive)
2026-04-19 18:15:23 -04:00
claude b23cf4bb50 fix: max_geo_range correctly identified as ADC Scale factor number. 2026-04-17 19:43:45 -04:00
serversdown 969010b983 chore: cleanup claude.md mess 2026-04-17 03:58:50 +00:00
serversdown 5fba9bcff8 doc: version bump to 0.12.1 2026-04-17 03:56:33 +00:00
serversdown ec7be4d784 Merge branch 'feature/intelligent-caching' 2026-04-17 03:46:22 +00:00
claude b8ed237363 docs: update to 0.12.1 2026-04-16 18:31:20 -04:00
claude 5866ecdb3e docs: update protocol doc to reflect unknown status of max_range_geo. 2026-04-16 18:17:16 -04:00
serversdown ea9c69b7c9 chore: add sqlalchemy to pyproject 2026-04-16 21:22:04 +00:00
claude 71bcf71cf7 fix: convert raw psi float32 into dB(L). 2026-04-16 21:22:04 +00:00
claude 3e7de848bc fix: update unique constraints in events and monitor_log tables to use timestamp and serial number. Can't use event keys because MiniMates reuse them after clearing memory. 2026-04-16 21:22:04 +00:00
claude 72a4209cfd fix: sfm_webapp.html remove display: flex from base class, now shows active tab 2026-04-16 21:22:04 +00:00
claude 2b5574511e feat: add waveform viewer endpoint and enhance UI with new tabs for history, units, monitor log, and sessions 2026-04-16 21:22:04 +00:00
claude ce2c859f11 fix: update event count retrieval logic in AchSession and MiniMateClient 2026-04-16 21:22:04 +00:00
claude 7f322f9ff9 feat: add option to restart monitoring after event download in AchSession 2026-04-16 21:22:04 +00:00
serversdown 42b7a88c3d chore: add python build artifacts to gitignore 2026-04-16 21:22:04 +00:00
claude c474db4f69 build: update build backend to setuptools.build_meta 2026-04-16 21:22:04 +00:00
claude 2765ee6ea7 build: add pyproject.toml for editable install 2026-04-16 21:22:04 +00:00
claude ef88240796 docs: update README to v0.12.0
Rewrites the v0.6.0 README to reflect current project state:
ACH server, SQLite DB, SFM REST API with caching, monitor/erase, updated roadmap.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 21:22:04 +00:00
claude 5591d345d9 feat: v0.12.0 — live device cache (_LiveCache) in sfm/server.py
Ports the intelligent-caching branch concept to a plain Python in-memory
implementation — no SQLAlchemy, no extra DB table, no new dependencies.

_LiveCache (threading.Lock + dicts) caches:
  - device info: indefinite, invalidated by POST /device/config
  - events: keyed by (conn_key, device_event_count); count-probe fast path
    (~2s poll+count_events) avoids full downloads when nothing is new
  - monitor status: 30-second TTL, invalidated by monitor start/stop
  - waveforms: permanent per (conn_key, event_index)

All four cached endpoints accept ?force=true to bypass the cache.
Removes sfm/cache.py (SQLAlchemy experiment, now superseded).
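
For illustration, the TTL part of such a cache (as used for monitor status) might look like the sketch below. `TtlCacheSketch` is a hypothetical class, not the real `_LiveCache`; the injectable `clock` parameter is added here only so expiry can be exercised without sleeping.

```python
import threading
import time


class TtlCacheSketch:
    """Lock-protected dict cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._lock = threading.Lock()
        self._entries = {}  # key -> (stored_at, value)

    def set(self, key, value) -> None:
        with self._lock:
            self._entries[key] = (self._clock(), value)

    def get(self, key):
        """Return the cached value, or None if missing or expired."""
        with self._lock:
            entry = self._entries.get(key)
            if entry is None:
                return None
            stored_at, value = entry
            if self._clock() - stored_at >= self._ttl:
                del self._entries[key]  # expired: drop and report a miss
                return None
            return value

    def invalidate(self, key) -> None:
        """Explicit invalidation, e.g. after a monitor start/stop."""
        with self._lock:
            self._entries.pop(key, None)
```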

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 21:22:04 +00:00
claude 7883a31aa7 v0.11.0 — SQLite persistence layer (SeismoDb)
sfm/database.py (new)
- SeismoDb class: three tables keyed by unit serial number
  - ach_sessions: one row per ACH call-home
  - events: one row per triggered event, deduped by (serial, waveform_key)
  - monitor_log: one row per monitoring interval, deduped by (serial, waveform_key)
- WAL mode, per-request connections, silent dedup via UNIQUE constraint
- Query helpers: query_events(), query_monitor_log(), get_sessions(), query_units()
- false_trigger flag on events for future review UI / report filtering
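
A minimal sketch of silent dedup through a UNIQUE constraint, using only the standard `sqlite3` module. The table here is cut down to two columns, and `open_db` / `insert_event` are hypothetical helpers, not the real `SeismoDb` API.

```python
import sqlite3


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open a connection and create a cut-down events table."""
    conn = sqlite3.connect(path)
    # WAL applies to file-backed databases; in-memory ones ignore the request.
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(
        """CREATE TABLE IF NOT EXISTS events (
               id INTEGER PRIMARY KEY,
               serial TEXT NOT NULL,
               waveform_key TEXT NOT NULL,
               UNIQUE (serial, waveform_key)
           )"""
    )
    return conn


def insert_event(conn: sqlite3.Connection, serial: str, waveform_key: str) -> bool:
    """Insert an event; return False when it was a silently skipped duplicate."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO events (serial, waveform_key) VALUES (?, ?)",
        (serial, waveform_key),
    )
    conn.commit()
    return cur.rowcount == 1
```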

bridges/ach_server.py
- Import SeismoDb; create shared instance at startup pointed at
  bridges/captures/seismo_relay.db
- After each call-home: insert_events() + insert_monitor_log() + insert_ach_session()
- DB failures logged as warnings, never abort the session

sfm/server.py
- Import SeismoDb; lazy singleton via _get_db()
- New DB read endpoints: GET /db/units, /db/events, /db/monitor_log, /db/sessions
- PATCH /db/events/{id}/false_trigger for manual review flagging

CLAUDE.md / CHANGELOG.md
- Document DB schema, SFM DB endpoints, architecture decision (unit-keyed only)
- Version bump to v0.11.0

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 21:19:47 +00:00
claude b241da970d v0.10.0 — monitor log entry support (SUB 0x0A partial records)
Add full decode pipeline for 0x2C partial records from the device's event
list, representing continuous monitoring intervals where no threshold was
crossed.  These records appear interleaved with full triggered events in the
browse walk and were previously ignored.

minimateplus/models.py
- Add MonitorLogEntry dataclass: key, start_time, stop_time, serial,
  geo_threshold_ips, raw_header, duration_seconds property

minimateplus/protocol.py
- read_waveform_header() now returns (data_rsp.data, length) — full payload
  including the record-type byte at position 0 — instead of the sliced header.
  Callers that need the old slice use raw_data[11:11+length] as before.

minimateplus/client.py
- Add _decode_0a_partial_header(): auto-detects 9-byte (sub_code=0x10) vs
  10-byte (sub_code=0x03) timestamp format, handles 1-byte inter-timestamp
  gap, extracts serial via BE anchor and geo threshold via Geo: anchor.
- Add get_monitor_log_entries(skip_keys=None): browse walk (1E → 0A → 1F),
  decodes partial records, skips full records and already-seen keys.

minimateplus/__init__.py
- Export MonitorLogEntry

bridges/ach_server.py
- After get_events(), call get_monitor_log_entries(skip_keys=seen_keys) and
  save new entries to monitor_log.json in the session directory.
- Add _monitor_log_entry_to_dict() helper.
- Include monitor log keys in downloaded_keys for state persistence.

CLAUDE.md / CHANGELOG.md
- Document 0x2C partial record layout (timestamp format, ASCII metadata
  region, 1-byte gap edge case) confirmed from 4-11-26 MITM capture.
- Version bump to v0.10.0; update What's next.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 21:14:58 +00:00
claude 6acb419ebd docs: update protocol reference with v0.9.0 erase-all protocol
Changelog section:
- 5 new entries (2026-04-11): erase-all confirmation, SUB 0x06 purpose
  resolved, §7.11 added, §14.6 ACH session lifecycle marked IMPLEMENTED

§5.1 Request Commands:
- SUB 0x06 description updated: "EVENT STORAGE RANGE READ" (not "CHANNEL
  CONFIG READ"), token=0xFE, last 8 bytes = first/last stored event keys
- SUB 0xA3 added: ERASE ALL BEGIN — standard build_bw_frame, token=0xFE, ack 0x5C
- SUB 0xA2 added: ERASE ALL CONFIRM — standard build_bw_frame, token=0xFE, ack 0x5D

§5.2 Response SUBs:
- 0x06→0xF9 marked CONFIRMED 2026-04-11
- 0xA3→0x5C and 0xA2→0x5D added with CONFIRMED status

§7.11 (new section): Erase-All Protocol
- Full wire sequence (6 request/response pairs)
- SUB 0x06 storage range payload layout (36 bytes, last 8 = first/last key)
- Post-erase key counter reset: device restarts from 0x01110000
- Implementation notes pointing to client.py and ach_server.py

§14.6 ACH Session Lifecycle:
- Removed "Future" label — fully implemented in bridges/ach_server.py
- Added step 6 (optional erase), step 8 (DCD/DTR auto-resume)
- Documents ach_server.py flags and ach_state.json schema
- Notes RV55 DCD/DTR issue as known open problem

Open Questions table:
- SUB 0x06 purpose RESOLVED
- Erase-all sequence RESOLVED
- ACH server RESOLVED
- Sensor Check byte: still open, added as formal question
- RV55 DCD/DTR: added as new open question

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 21:14:58 +00:00
claude f6a0846bab docs: update CHANGELOG and CLAUDE.md for v0.9.0
CHANGELOG.md:
- New v0.9.0 section covering erase-all protocol, browse helpers,
  delete_all_events(), ach_mitm.py, and ACH server overhaul
- Back-filled v0.8.0 section (write pipeline, monitoring, ACH server)
  that was missing from the previous release notes

CLAUDE.md:
- Bump version to v0.9.0
- Add erase-all protocol section with full wire sequence, SUB 0x06
  storage range response layout, and post-erase key counter reset notes
- Document ACH server state format (ach_state.json v0.9.0 schema with
  downloaded_keys + max_downloaded_key)
- Add RV55 DCD/DTR issue to What's next

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 21:14:58 +00:00
claude 3d9db8b662 feat: add ach_mitm.py — transparent TCP MITM proxy for ACH session capture
Listens for inbound unit connections, connects upstream to a real Blastware
ACH server, and forwards bytes bidirectionally while saving both directions to
raw_bw_<ts>.bin and raw_s3_<ts>.bin in the existing capture format.

Used to capture the 4-11-26 Blastware ACH session that confirmed the erase-all
protocol (SUBs 0xA3/0x1C/0x06/0xA2) and the event deletion wire sequence.

Usage:
  python bridges/ach_mitm.py --bw-host 127.0.0.1 --bw-port 9999 --listen-port 9998
  Point the unit's call-home destination at this machine:9998.
  Point this proxy's --bw-host/port at the upstream Blastware ACH server.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 21:14:58 +00:00
claude c7e7d177e6 feat: overhaul ACH server with key-based state, erase support, and reset detection
State format (ach_state.json):
- Replace event_count with downloaded_keys (set of hex strings) + max_downloaded_key
- Key-based tracking correctly handles delete-then-re-record: after device erase the
  count drops to 0, but new events have new (or recycled) keys

Browse pre-check:
- list_event_keys() walk before get_events() to bail early when nothing is new
- get_events() called with skip_waveform_for_keys= for already-seen keys, so repeat
  call-homes only download waveforms for genuinely new events

--clear-after-download flag:
- After saving new events, calls client.delete_all_events() (0xA3→0x1C→0x06→0xA2)
- On success: resets downloaded_keys=[] and max_downloaded_key="00000000" so the
  next session starts fresh (device counter resets to 0x01110000 after erase)

Post-erase key-reuse detection:
- Device counter resets to 0x01110000 after any erase; new events reuse old keys
- If max(device_keys) < max_downloaded_key, the device was wiped externally
  (Blastware, manual) — seen_keys is discarded and all device keys treated as new

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 21:14:58 +00:00
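Note on the post-erase key-reuse check described in the commit above: a minimal sketch of the decision, assuming keys are 8-character hex strings as in ach_state.json. The function name select_new_keys and its signature are hypothetical and are not taken from ach_server.py.

```python
def select_new_keys(device_keys, downloaded_keys, max_downloaded_key):
    """Return the device keys that still need downloading (hypothetical helper)."""
    if not device_keys:
        return []
    ordered = sorted(device_keys, key=lambda k: int(k, 16))
    device_max = int(ordered[-1], 16)
    if device_max < int(max_downloaded_key, 16):
        # The device counter went backwards, so the unit was erased by
        # something other than this server. Previously seen keys can no
        # longer be trusted: treat every key on the device as new.
        return ordered
    seen = set(downloaded_keys)
    return [k for k in ordered if k not in seen]
```

With downloaded_keys = ["01110000", "01110005"] and max_downloaded_key = "01110005", a device reporting only "01110000" is treated as wiped, so that key is downloaded again.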
claude a3b8d10fa8 feat: add erase-all protocol and browse helpers to protocol/client layer
protocol.py:
- SUB_ERASE_ALL_BEGIN = 0xA3, SUB_ERASE_ALL_CONFIRM = 0xA2 (confirmed 4-11-26 MITM)
- SUB_CHANNEL_CONFIG (0x06) data length = 0x24 (36 bytes) in DATA_LENGTHS
- begin_erase_all()              — single frame, token=0xFE, response 0x5C
- confirm_erase_all()            — single frame, token=0xFE, response 0x5D
- read_event_storage_range()     — two-step read (probe+data), token=0xFE
  Response last 8 bytes = first/last stored event key; both 0x01110000 after erase

client.py:
- list_event_keys()              — browse-mode 1E→0A→1F walk, no waveform download;
  returns list of hex key strings; used as fast pre-check before get_events()
- get_events(skip_waveform_for_keys=set())
  — for already-seen keys: only 0A+1F(browse), skips 1E-arm/0C/POLL×3/5A entirely
- delete_all_events()            — orchestrates the confirmed erase sequence:
  0xA3 → 0x1C → 0x06 → 0xA2; logs first/last key from storage range response

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 21:14:58 +00:00
claude 4921b0489a fix: correct Event and PeakValues field names in ach_server serialization
Event model uses peak_values (not peaks) and project_info (not direct fields).
PeakValues fields are tran/vert/long/micl/peak_vector_sum (not transverse etc).
ProjectInfo fields accessed via ev.project_info.project etc.

Also fix ev.timestamp serialization: use str() instead of .isoformat() since
Timestamp is a custom dataclass, not datetime.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 21:14:58 +00:00
claude 8688d815a0 fix: remove non-existent DeviceInfo fields from ach_server log and dict
calibration_date, aux_trigger, setup_name etc. don't exist directly on
DeviceInfo — they live in DeviceInfo.compliance_config (ComplianceConfig).
_device_info_to_dict now accesses them via cc = d.compliance_config.
Log line updated to show serial/firmware/model/event_count instead.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 21:14:58 +00:00
claude 9b50ec9133 fix: make Ctrl-C work on Windows by setting accept() timeout
socket.accept() on Windows blocks indefinitely and ignores KeyboardInterrupt.
Setting a 1-second timeout on the server socket causes the accept loop to wake
up every second and re-check, so Ctrl-C is handled within ~1 second.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 21:14:58 +00:00
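Note on the accept() timeout fix above: a minimal sketch of the pattern using only the standard socket module. The names accept_loop, handle and should_stop are illustrative assumptions, not the actual ach_server.py code.

```python
import socket


def accept_loop(srv, handle, should_stop, timeout=1.0):
    """Accept connections until should_stop() is true; return how many were handled."""
    # With a timeout set, accept() raises socket.timeout periodically instead
    # of blocking forever, so the interpreter gets a chance to deliver
    # KeyboardInterrupt (Ctrl-C) on Windows.
    srv.settimeout(timeout)
    handled = 0
    try:
        while not should_stop():
            try:
                conn, addr = srv.accept()
            except socket.timeout:
                continue  # nothing connected during this interval; re-check
            handle(conn, addr)
            handled += 1
    except KeyboardInterrupt:
        pass  # Ctrl-C: fall through and let the caller close the socket
    return handled
```

The caller owns the listening socket and closes it after the loop returns.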
claude cba8b1b401 feat: defer session dir creation and add --allow-ip allowlist
- Session directory and log file are now created ONLY after startup() succeeds.
  Internet scanners and dropped connections no longer litter the output folder.
  Raw bytes are buffered in memory until startup succeeds, then flushed to disk.

- Add --allow-ip IP flag (repeatable) to allowlist specific source IPs.
  Connections from un-listed IPs are rejected immediately (socket closed, no log).
  If no --allow-ip flags are given, all IPs are still accepted (original behavior).
  Usage: --allow-ip 63.43.212.232 --allow-ip 152.1.2.3

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 21:14:58 +00:00
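Note on the --allow-ip flag above: one way a repeatable allowlist flag can be wired with argparse. This is a sketch under assumptions; build_parser and is_allowed are hypothetical names, and only the flag name and the "empty list accepts everyone" behavior come from the commit message.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="allowlist sketch")
    # action="append" collects every occurrence of the flag into one list.
    parser.add_argument("--allow-ip", action="append", default=[], metavar="IP",
                        help="source IP to accept (repeatable)")
    return parser


def is_allowed(peer_ip, allow_ips):
    # An empty allowlist keeps the original behavior: accept every source IP.
    return not allow_ips or peer_ip in allow_ips
```

A rejected peer would then simply have its socket closed before any session directory or log file is created.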
claude 41a14ca468 fix: correct event count field offset and eliminate count_events() walk
_decode_event_count: read uint16 BE at offset 10 (confirmed 2026-04-10 from
live BE11529 event index — data[10:12]=0x0006=6, matches device LCD).
Previous uint32 at offset 3 always returned 1 regardless of event count.

ach_server.py: use device_info.event_count (already fetched during connect())
instead of calling count_events() separately. This saves 2*N round-trips and
avoids the 1F linked-list walk which was overcounting on some devices.
count_events() kept as fallback when connect() is skipped (--events-only).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 21:14:58 +00:00
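Note on the event count offset fix above: reading a big-endian uint16 at offset 10 can be sketched with the standard struct module. The function name below is hypothetical; the real decoder is _decode_event_count, whose surrounding payload handling is not shown on this page.

```python
import struct


def decode_event_count_sketch(data: bytes) -> int:
    """Read the stored-event count as a big-endian uint16 at data[10:12]."""
    if len(data) < 12:
        raise ValueError("payload too short to contain the event count field")
    (count,) = struct.unpack_from(">H", data, 10)
    return count
```

For a payload whose bytes 10 and 11 are 0x00 0x06 this yields 6, the value the commit reports seeing on the device LCD.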
claude 1bfc6e4258 fix: replace Unicode chars in log messages, fix DeviceInfo.serial, UTF-8 file log
- Replace all Unicode arrows/checkmarks (->  [OK]  [FAIL]) in ach_server.py
  and client.py log calls — Windows cp1252 console can't encode them
- Fix DeviceInfo attribute: serial_number -> serial
- Fix _device_info_to_dict key: serial_number -> serial
- Demote count_events 1E/1F per-key log lines from WARNING to DEBUG
  (they were flooding the console on devices with many stored events)
- FileHandler now opens with encoding='utf-8' so session log files
  can hold any characters without codec errors

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 21:14:58 +00:00
claude 574d40027f feat: enhance logging messages in ach_server.py and add experiments.py for protocol minimization 2026-04-16 21:14:58 +00:00
claude 0358acb51d feat: add high-water mark state tracking to ach_server + fix monitoring flag
ach_server.py:
- Add ach_state.json per-unit state tracking (keyed by serial number)
- count_events() before any download; skip session if no new events since last call-home
- Download only events beyond the previous high-water mark (all_events[last_count:])
- --max-events N safety cap for first-run units with many stored events
- state_path and max_events wired through AchSession constructor and serve()

client.py (_decode_monitor_status):
- Revert monitoring flag to section[1] == 0x10 (was incorrectly changed to section[6])
- Fix battery/memory offsets to section[-10:-8], [-8:-4], [-4:] (no trailing checksum byte)
- Both confirmed by full byte diff of all 144 0xE3 data frames in 4-8-26/2ndtry capture

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 21:14:58 +00:00
claude cf7d838bf4 feat: add SocketTransport and ach_server.py inbound ACH server
minimateplus/transport.py:
- Add SocketTransport(TcpTransport) — wraps an already-accepted inbound
  socket; connect() is a no-op; everything else inherited from TcpTransport.
  Enables the ACH server to reuse all existing protocol/client code without
  any changes.

bridges/ach_server.py:
- Minimal inbound ACH server — listens on port 12345, accepts call-home
  connections from MiniMate Plus units, runs the full BW protocol:
  startup handshake → get_device_info → get_events(full_waveform=True)
- Saves device_info.json + events.json + raw_rx_<ts>.bin + session log
  per connection to bridges/captures/ach_inbound_<ts>/
- raw_rx.bin is byte-compatible with existing Analyzer tooling
- Taps transport.read() to capture raw S3 bytes alongside parsed output
- Each connection runs in its own daemon thread
- Clearly distinguishes push vs pull protocol in the startup log

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 21:14:58 +00:00
claude 5e44cdc668 feat: add splitter mode to ach_bridge.py (--mirror HOST:PORT)
Adds a production-safe headphone-splitter mode:
- Device bytes tee'd to both --upstream (primary/prod) and --mirror (new server)
- Only primary server responses are returned to the device
- Mirror connect/write failures are non-fatal and logged; prod is unaffected
- New raw_mirror_<ts>.bin capture file alongside raw_client/raw_server

Three modes: standalone (capture only), bridge (one upstream), splitter (two).
Default listen port changed to 12345 to match project ACH setup.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 21:14:58 +00:00
claude 37d32077a4 feat: add ACH TCP bridge, serial tap tool, and Serial Watch tab
- bridges/ach_bridge.py: transparent TCP bridge that MITMs the MiniMate Plus
  call-home connection — forwards to real ACH server while logging all frames
  to raw_client/raw_server .bin files compatible with parse_capture.py;
  standalone capture mode for lab use without a real server

- bridges/serial_watch.py: RS-232 serial monitor with live S3 frame parsing;
  taps the line between MiniMate and modem (RV50/RV55); captures raw bytes,
  .log and .jsonl; --ack-ok mode auto-replies to AT commands; fixed fatal
  indentation bug in the original that silently prevented any data capture

- seismo_lab.py: new "Serial Watch" fourth tab (SerialWatchPanel) wrapping
  serial_watch.py functionality; COM port picker with refresh, baud config,
  ack-ok toggle, colour-coded live frame log (teal frames / yellow ctrl /
  blue AT), raw .bin capture auto-fed into Analyzer tab on stop

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 21:14:58 +00:00
claude b384ba66d1 fix: convert raw psi 32 float into db(L). 2026-04-14 01:13:21 -04:00
claude 27d9823cc1 fix: update unique constraints in events and monitor_log tables to use timestamp and serial number. Can't use event keys because minimates reuse them after clearing memory. 2026-04-13 22:45:58 -04:00
claude 70c9528611 fix: sfm_webapp.html remove display: flex from base class, now shows active tab 2026-04-13 22:40:40 -04:00
claude e8bef1ac7c feat: add waveform viewer endpoint and enhance UI with new tabs for history, units, monitor log, and sessions 2026-04-13 22:34:28 -04:00
claude 27db663579 fix: update event count retrieval logic in AchSession and MiniMateClient 2026-04-13 18:46:23 -04:00
claude e5ea17388a feat: add option to restart monitoring after event download in AchSession 2026-04-13 18:23:27 -04:00
serversdown c0a5131c7d chore: add python build artifacts to gitignore 2026-04-13 21:59:52 +00:00
claude 4ec2f33308 build: update build backend to setuptools.build_meta 2026-04-13 17:56:15 -04:00
claude 6282eacf8b build: add pyproject.toml for editable install 2026-04-13 17:34:58 -04:00
claude 034b3f044d docs: update README to v0.12.0
Rewrites the v0.6.0 README to reflect current project state:
ACH server, SQLite DB, SFM REST API with caching, monitor/erase, updated roadmap.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-13 16:12:07 -04:00
claude 48d7e94c02 feat: v0.12.0 — live device cache (_LiveCache) in sfm/server.py
Ports the intelligent-caching branch concept to a plain Python in-memory
implementation — no SQLAlchemy, no extra DB table, no new dependencies.

_LiveCache (threading.Lock + dicts) caches:
  - device info: indefinite, invalidated by POST /device/config
  - events: keyed by (conn_key, device_event_count); count-probe fast path
    (~2s poll+count_events) avoids full downloads when nothing is new
  - monitor status: 30-second TTL, invalidated by monitor start/stop
  - waveforms: permanent per (conn_key, event_index)

All four cached endpoints accept ?force=true to bypass the cache.
Removes sfm/cache.py (SQLAlchemy experiment, now superseded).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-13 15:57:02 -04:00
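Note on the _LiveCache design above: a minimal sketch of a lock-protected in-memory cache that supports both indefinite entries and TTL entries. The class name TtlCache and its methods are assumptions for illustration and do not reproduce the actual _LiveCache in sfm/server.py.

```python
import threading
import time


class TtlCache:
    def __init__(self, clock=time.monotonic):
        self._lock = threading.Lock()
        self._items = {}      # key -> (value, expires_at or None)
        self._clock = clock   # injectable so expiry can be tested without sleeping

    def put(self, key, value, ttl=None):
        # ttl=None means "keep until explicitly invalidated".
        expires_at = None if ttl is None else self._clock() + ttl
        with self._lock:
            self._items[key] = (value, expires_at)

    def get(self, key):
        with self._lock:
            entry = self._items.get(key)
            if entry is None:
                return None
            value, expires_at = entry
            if expires_at is not None and self._clock() >= expires_at:
                del self._items[key]  # expired: drop it and report a miss
                return None
            return value

    def invalidate(self, key):
        with self._lock:
            self._items.pop(key, None)
```

In this sketch, monitor status would be stored with ttl=30 and invalidated on monitor start/stop, while device info would be stored with no TTL and invalidated by a config write. A ?force=true request would skip get() and go straight to the device.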
claude 03d224ccc3 v0.11.0 — SQLite persistence layer (SeismoDb)
sfm/database.py (new)
- SeismoDb class: three tables keyed by unit serial number
  - ach_sessions: one row per ACH call-home
  - events: one row per triggered event, deduped by (serial, waveform_key)
  - monitor_log: one row per monitoring interval, deduped by (serial, waveform_key)
- WAL mode, per-request connections, silent dedup via UNIQUE constraint
- Query helpers: query_events(), query_monitor_log(), get_sessions(), query_units()
- false_trigger flag on events for future review UI / report filtering

bridges/ach_server.py
- Import SeismoDb; create shared instance at startup pointed at
  bridges/captures/seismo_relay.db
- After each call-home: insert_events() + insert_monitor_log() + insert_ach_session()
- DB failures logged as warnings, never abort the session

sfm/server.py
- Import SeismoDb; lazy singleton via _get_db()
- New DB read endpoints: GET /db/units, /db/events, /db/monitor_log, /db/sessions
- PATCH /db/events/{id}/false_trigger for manual review flagging

CLAUDE.md / CHANGELOG.md
- Document DB schema, SFM DB endpoints, architecture decision (unit-keyed only)
- Version bump to v0.11.0

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-13 00:45:38 -04:00
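Note on the silent dedup described above: a minimal sketch of WAL mode plus a UNIQUE constraint with INSERT OR IGNORE, using the standard sqlite3 module. The table layout and function names are simplified assumptions rather than the real SeismoDb schema, and a later commit on this page (27d9823cc1) moves the unique key to timestamp and serial number.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS events (
    id            INTEGER PRIMARY KEY,
    serial        TEXT NOT NULL,
    waveform_key  TEXT NOT NULL,
    false_trigger INTEGER NOT NULL DEFAULT 0,
    UNIQUE (serial, waveform_key)
)
"""


def open_db(path):
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")  # has no effect on ":memory:" databases
    conn.execute(SCHEMA)
    return conn


def insert_events(conn, serial, keys):
    """Insert events, silently skipping duplicates; return rows actually added."""
    before = conn.total_changes
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO events (serial, waveform_key) VALUES (?, ?)",
            [(serial, key) for key in keys],
        )
    return conn.total_changes - before
```

Rows that collide with the UNIQUE constraint are dropped without raising, which is what lets a repeat call-home re-submit events it has already stored.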
claude ef2c38e7db v0.10.0 — monitor log entry support (SUB 0x0A partial records)
Add full decode pipeline for 0x2C partial records from the device's event
list, representing continuous monitoring intervals where no threshold was
crossed.  These records appear interleaved with full triggered events in the
browse walk and were previously ignored.

minimateplus/models.py
- Add MonitorLogEntry dataclass: key, start_time, stop_time, serial,
  geo_threshold_ips, raw_header, duration_seconds property

minimateplus/protocol.py
- read_waveform_header() now returns (data_rsp.data, length) — full payload
  including the record-type byte at position 0 — instead of the sliced header.
  Callers that need the old slice use raw_data[11:11+length] as before.

minimateplus/client.py
- Add _decode_0a_partial_header(): auto-detects 9-byte (sub_code=0x10) vs
  10-byte (sub_code=0x03) timestamp format, handles 1-byte inter-timestamp
  gap, extracts serial via BE anchor and geo threshold via Geo: anchor.
- Add get_monitor_log_entries(skip_keys=None): browse walk (1E → 0A → 1F),
  decodes partial records, skips full records and already-seen keys.

minimateplus/__init__.py
- Export MonitorLogEntry

bridges/ach_server.py
- After get_events(), call get_monitor_log_entries(skip_keys=seen_keys) and
  save new entries to monitor_log.json in the session directory.
- Add _monitor_log_entry_to_dict() helper.
- Include monitor log keys in downloaded_keys for state persistence.

CLAUDE.md / CHANGELOG.md
- Document 0x2C partial record layout (timestamp format, ASCII metadata
  region, 1-byte gap edge case) confirmed from 4-11-26 MITM capture.
- Version bump to v0.10.0; update What's next.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-11 02:59:40 -04:00
claude b9a8e50b3c docs: update protocol reference with v0.9.0 erase-all protocol
Changelog section:
- 5 new entries (2026-04-11): erase-all confirmation, SUB 0x06 purpose
  resolved, §7.11 added, §14.6 ACH session lifecycle marked IMPLEMENTED

§5.1 Request Commands:
- SUB 0x06 description updated: "EVENT STORAGE RANGE READ" (not "CHANNEL
  CONFIG READ"), token=0xFE, last 8 bytes = first/last stored event keys
- SUB 0xA3 added: ERASE ALL BEGIN — standard build_bw_frame, token=0xFE, ack 0x5C
- SUB 0xA2 added: ERASE ALL CONFIRM — standard build_bw_frame, token=0xFE, ack 0x5D

§5.2 Response SUBs:
- 0x06→0xF9 marked CONFIRMED 2026-04-11
- 0xA3→0x5C and 0xA2→0x5D added with CONFIRMED status

§7.11 (new section): Erase-All Protocol
- Full wire sequence (6 request/response pairs)
- SUB 0x06 storage range payload layout (36 bytes, last 8 = first/last key)
- Post-erase key counter reset: device restarts from 0x01110000
- Implementation notes pointing to client.py and ach_server.py

§14.6 ACH Session Lifecycle:
- Removed "Future" label — fully implemented in bridges/ach_server.py
- Added step 6 (optional erase), step 8 (DCD/DTR auto-resume)
- Documents ach_server.py flags and ach_state.json schema
- Notes RV55 DCD/DTR issue as known open problem

Open Questions table:
- SUB 0x06 purpose RESOLVED
- Erase-all sequence RESOLVED
- ACH server RESOLVED
- Sensor Check byte: still open, added as formal question
- RV55 DCD/DTR: added as new open question

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-11 01:20:43 -04:00
claude 77d9c17680 docs: update CHANGELOG and CLAUDE.md for v0.9.0
CHANGELOG.md:
- New v0.9.0 section covering erase-all protocol, browse helpers,
  delete_all_events(), ach_mitm.py, and ACH server overhaul
- Back-filled v0.8.0 section (write pipeline, monitoring, ACH server)
  that was missing from the previous release notes

CLAUDE.md:
- Bump version to v0.9.0
- Add erase-all protocol section with full wire sequence, SUB 0x06
  storage range response layout, and post-erase key counter reset notes
- Document ACH server state format (ach_state.json v0.9.0 schema with
  downloaded_keys + max_downloaded_key)
- Add RV55 DCD/DTR issue to What's next

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-11 01:15:11 -04:00
claude 8a1bd34551 feat: add ach_mitm.py — transparent TCP MITM proxy for ACH session capture
Listens for inbound unit connections, connects upstream to a real Blastware
ACH server, and forwards bytes bidirectionally while saving both directions to
raw_bw_<ts>.bin and raw_s3_<ts>.bin in the existing capture format.

Used to capture the 4-11-26 Blastware ACH session that confirmed the erase-all
protocol (SUBs 0xA3/0x1C/0x06/0xA2) and the event deletion wire sequence.

Usage:
  python bridges/ach_mitm.py --bw-host 127.0.0.1 --bw-port 9999 --listen-port 9998
  Point the unit's call-home destination at this machine:9998.
  Point this proxy's --bw-host/port at the upstream Blastware ACH server.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-11 01:15:02 -04:00
claude 09788b931a feat: overhaul ACH server with key-based state, erase support, and reset detection
State format (ach_state.json):
- Replace event_count with downloaded_keys (set of hex strings) + max_downloaded_key
- Key-based tracking correctly handles delete-then-re-record: after device erase the
  count drops to 0, but new events have new (or recycled) keys

Browse pre-check:
- list_event_keys() walk before get_events() to bail early when nothing is new
- get_events() called with skip_waveform_for_keys= for already-seen keys, so repeat
  call-homes only download waveforms for genuinely new events

--clear-after-download flag:
- After saving new events, calls client.delete_all_events() (0xA3→0x1C→0x06→0xA2)
- On success: resets downloaded_keys=[] and max_downloaded_key="00000000" so the
  next session starts fresh (device counter resets to 0x01110000 after erase)

Post-erase key-reuse detection:
- Device counter resets to 0x01110000 after any erase; new events reuse old keys
- If max(device_keys) < max_downloaded_key, the device was wiped externally
  (Blastware, manual) — seen_keys is discarded and all device keys treated as new

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-11 01:14:50 -04:00
claude e712d68505 feat: add erase-all protocol and browse helpers to protocol/client layer
protocol.py:
- SUB_ERASE_ALL_BEGIN = 0xA3, SUB_ERASE_ALL_CONFIRM = 0xA2 (confirmed 4-11-26 MITM)
- SUB_CHANNEL_CONFIG (0x06) data length = 0x24 (36 bytes) in DATA_LENGTHS
- begin_erase_all()              — single frame, token=0xFE, response 0x5C
- confirm_erase_all()            — single frame, token=0xFE, response 0x5D
- read_event_storage_range()     — two-step read (probe+data), token=0xFE
  Response last 8 bytes = first/last stored event key; both 0x01110000 after erase

client.py:
- list_event_keys()              — browse-mode 1E→0A→1F walk, no waveform download;
  returns list of hex key strings; used as fast pre-check before get_events()
- get_events(skip_waveform_for_keys=set())
  — for already-seen keys: only 0A+1F(browse), skips 1E-arm/0C/POLL×3/5A entirely
- delete_all_events()            — orchestrates the confirmed erase sequence:
  0xA3 → 0x1C → 0x06 → 0xA2; logs first/last key from storage range response

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-11 01:14:37 -04:00
claude 8f5da918b5 fix: correct Event and PeakValues field names in ach_server serialization
Event model uses peak_values (not peaks) and project_info (not direct fields).
PeakValues fields are tran/vert/long/micl/peak_vector_sum (not transverse etc).
ProjectInfo fields accessed via ev.project_info.project etc.

Also fix ev.timestamp serialization: use str() instead of .isoformat() since
Timestamp is a custom dataclass, not datetime.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-10 02:09:57 -04:00
claude a03c77af09 fix: remove non-existent DeviceInfo fields from ach_server log and dict
calibration_date, aux_trigger, setup_name etc. don't exist directly on
DeviceInfo — they live in DeviceInfo.compliance_config (ComplianceConfig).
_device_info_to_dict now accesses them via cc = d.compliance_config.
Log line updated to show serial/firmware/model/event_count instead.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-10 01:43:02 -04:00
claude 87fa9c954f fix: make Ctrl-C work on Windows by setting accept() timeout
socket.accept() on Windows blocks indefinitely and ignores KeyboardInterrupt.
Setting a 1-second timeout on the server socket causes the accept loop to wake
up every second and re-check, so Ctrl-C is handled within ~1 second.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-10 01:19:36 -04:00
claude 3f7b5c07b5 feat: defer session dir creation and add --allow-ip allowlist
- Session directory and log file are now created ONLY after startup() succeeds.
  Internet scanners and dropped connections no longer litter the output folder.
  Raw bytes are buffered in memory until startup succeeds, then flushed to disk.

- Add --allow-ip IP flag (repeatable) to allowlist specific source IPs.
  Connections from un-listed IPs are rejected immediately (socket closed, no log).
  If no --allow-ip flags are given, all IPs are still accepted (original behavior).
  Usage: --allow-ip 63.43.212.232 --allow-ip 152.1.2.3

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-10 01:17:30 -04:00
claude 3d2ebfc057 fix: correct event count field offset and eliminate count_events() walk
_decode_event_count: read uint16 BE at offset 10 (confirmed 2026-04-10 from
live BE11529 event index — data[10:12]=0x0006=6, matches device LCD).
Previous uint32 at offset 3 always returned 1 regardless of event count.

ach_server.py: use device_info.event_count (already fetched during connect())
instead of calling count_events() separately. This saves 2*N round-trips and
avoids the 1F linked-list walk which was overcounting on some devices.
count_events() kept as fallback when connect() is skipped (--events-only).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-10 01:10:49 -04:00
claude 9d9c14af79 fix: replace Unicode chars in log messages, fix DeviceInfo.serial, UTF-8 file log
- Replace all Unicode arrows/checkmarks (->  [OK]  [FAIL]) in ach_server.py
  and client.py log calls — Windows cp1252 console can't encode them
- Fix DeviceInfo attribute: serial_number -> serial
- Fix _device_info_to_dict key: serial_number -> serial
- Demote count_events 1E/1F per-key log lines from WARNING to DEBUG
  (they were flooding the console on devices with many stored events)
- FileHandler now opens with encoding='utf-8' so session log files
  can hold any characters without codec errors

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-10 01:06:27 -04:00
claude ab14328c8b feat: enhance logging messages in ach_server.py and add experiments.py for protocol minimization 2026-04-10 00:58:54 -04:00
claude 0baf343bf5 feat: add high-water mark state tracking to ach_server + fix monitoring flag
ach_server.py:
- Add ach_state.json per-unit state tracking (keyed by serial number)
- count_events() before any download; skip session if no new events since last call-home
- Download only events beyond the previous high-water mark (all_events[last_count:])
- --max-events N safety cap for first-run units with many stored events
- state_path and max_events wired through AchSession constructor and serve()

client.py (_decode_monitor_status):
- Revert monitoring flag to section[1] == 0x10 (was incorrectly changed to section[6])
- Fix battery/memory offsets to section[-10:-8], [-8:-4], [-4:] (no trailing checksum byte)
- Both confirmed by full byte diff of all 144 0xE3 data frames in 4-8-26/2ndtry capture

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-09 14:38:44 -04:00
claude 05421764a5 feat: add SocketTransport and ach_server.py inbound ACH server
minimateplus/transport.py:
- Add SocketTransport(TcpTransport) — wraps an already-accepted inbound
  socket; connect() is a no-op; everything else inherited from TcpTransport.
  Enables the ACH server to reuse all existing protocol/client code without
  any changes.

bridges/ach_server.py:
- Minimal inbound ACH server — listens on port 12345, accepts call-home
  connections from MiniMate Plus units, runs the full BW protocol:
  startup handshake → get_device_info → get_events(full_waveform=True)
- Saves device_info.json + events.json + raw_rx_<ts>.bin + session log
  per connection to bridges/captures/ach_inbound_<ts>/
- raw_rx.bin is byte-compatible with existing Analyzer tooling
- Taps transport.read() to capture raw S3 bytes alongside parsed output
- Each connection runs in its own daemon thread
- Clearly distinguishes push vs pull protocol in the startup log

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-09 12:34:27 -04:00
claude 74233d7e31 feat: add splitter mode to ach_bridge.py (--mirror HOST:PORT)
Adds a production-safe headphone-splitter mode:
- Device bytes tee'd to both --upstream (primary/prod) and --mirror (new server)
- Only primary server responses are returned to the device
- Mirror connect/write failures are non-fatal and logged; prod is unaffected
- New raw_mirror_<ts>.bin capture file alongside raw_client/raw_server

Three modes: standalone (capture only), bridge (one upstream), splitter (two).
Default listen port changed to 12345 to match project ACH setup.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-09 12:17:57 -04:00
claude 46a86939b7 feat: add ACH TCP bridge, serial tap tool, and Serial Watch tab
- bridges/ach_bridge.py: transparent TCP bridge that MITMs the MiniMate Plus
  call-home connection — forwards to real ACH server while logging all frames
  to raw_client/raw_server .bin files compatible with parse_capture.py;
  standalone capture mode for lab use without a real server

- bridges/serial_watch.py: RS-232 serial monitor with live S3 frame parsing;
  taps the line between MiniMate and modem (RV50/RV55); captures raw bytes,
  .log and .jsonl; --ack-ok mode auto-replies to AT commands; fixed fatal
  indentation bug in the original that silently prevented any data capture

- seismo_lab.py: new "Serial Watch" fourth tab (SerialWatchPanel) wrapping
  serial_watch.py functionality; COM port picker with refresh, baud config,
  ack-ok toggle, colour-coded live frame log (teal frames / yellow ctrl /
  blue AT), raw .bin capture auto-fed into Analyzer tab on stop

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-09 12:10:52 -04:00
serversdown 2db565ff9c Add intelligent caching layer for SFM device data
Introduces sfm/cache.py — a SQLite-backed cache (via SQLAlchemy) that
sits between the SFM REST endpoints and the device, eliminating redundant
cellular downloads for data that doesn't change.

Cache behaviour by data type:
- Device info / compliance config: cached until a config write occurs;
  POST /device/config now calls mark_config_dirty() to force a fresh read
  on the next /device/info call.
- Event headers + peak values: cached permanently (append-only). On
  subsequent calls to /device/events, the server does a fast count_events()
  (~2s) instead of a full download (~10-30s); only new events are fetched
  from the device and merged into the cache.
- Full waveforms (raw ADC samples): cached permanently — immutable once
  recorded. Repeated requests for the same waveform return instantly with
  zero device contact.
- Monitor status (battery, memory, is_monitoring): 30-second TTL; auto-
  invalidated on start/stop monitoring commands.

All endpoints gain a ?force=true param to bypass the cache when needed.
New endpoints: GET /cache/stats, DELETE /cache/device.
Adds requirements.txt listing fastapi, uvicorn, sqlalchemy, pyserial.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-09 07:14:51 +00:00
143 changed files with 59868 additions and 1261 deletions
+28
@@ -0,0 +1,28 @@
.git
.gitignore
.venv
venv
env
__pycache__
*.pyc
*.pyo
*.pyd
.pytest_cache
.mypy_cache
.ruff_cache
*.db
*.db-wal
*.db-shm
*.sqlite
*.sqlite3
sfm/data
bridges/captures
example-events
captures
logs
.DS_Store
Thumbs.db
+33 -28
@@ -1,28 +1,33 @@
-/bridges/captures/
-/example-events/
-/manuals/
-# Python bytecode
-__pycache__/
-*.py[cod]
-# Virtual environments
-.venv/
-venv/
-env/
-# Editor / OS
-.vscode/
-*.swp
-.DS_Store
-Thumbs.db
-# Analyzer outputs
-*.report
-claude_export_*.md
-# Frame database
-*.db
-*.db-wal
-*.db-shm
+/bridges/captures/
+/example-events/
+/tests/fixtures/
+/manuals/
+# Python build artifacts
+*.egg-info/
+dist/
+build/
+# Python bytecode
+__pycache__/
+*.py[cod]
+# Virtual environments
+.venv/
+venv/
+env/
+# Editor / OS
+.vscode/
+*.swp
+.DS_Store
+Thumbs.db
+# Analyzer outputs
+*.report
+claude_export_*.md
+# Frame database
+*.db
+*.db-wal
+*.db-shm
+1048
File diff suppressed because it is too large
+994 -75
File diff suppressed because it is too large
+31
@@ -0,0 +1,31 @@
FROM python:3.11-slim
WORKDIR /app
# tzdata is required for the TZ env var to take effect (python:slim
# omits the timezone database). Without it, datetime.now() / logging
# / matplotlib all stay in UTC regardless of TZ. Default zone gets
# set further down via ENV; users override per-deployment via the
# `TZ` env var in docker-compose.
RUN apt-get update && \
apt-get install -y --no-install-recommends curl tzdata && \
rm -rf /var/lib/apt/lists/*
# Default display timezone — applied to server logs, datetime.now(),
# matplotlib rendered timestamps, and any naïve-vs-aware datetime
# conversions in the PDF renderer. Override via TZ env var in
# docker-compose; storage in the DB is always UTC regardless.
ENV TZ=America/New_York
COPY pyproject.toml requirements.txt ./
COPY minimateplus ./minimateplus
COPY micromate ./micromate
COPY sfm ./sfm
COPY bridges ./bridges
COPY scripts ./scripts
RUN pip install --no-cache-dir -e .
EXPOSE 8200
CMD ["python", "-m", "uvicorn", "sfm.server:app", "--host", "0.0.0.0", "--port", "8200"]
+495 -163
@@ -1,16 +1,60 @@
# seismo-relay `v0.6.0`
# seismo-relay `v0.21.0`
A ground-up replacement for **Blastware** — Instantel's aging Windows-only
software for managing MiniMate Plus seismographs.
software for managing seismographs. Supports both the **MiniMate Plus
(Series III)** and the **Micromate (Series IV / "Thor")** families:
Series III via the live RS-232 / TCP wire protocol *and* Blastware ACH file
ingest; Series IV currently via Thor TXT-paired IDF file ingest, with the
binary codec on the roadmap.
Built in Python. Runs on Windows. Connects to instruments over direct RS-232
or cellular modem (Sierra Wireless RV50 / RV55).
Built in Python. Runs on Windows, Linux, or macOS. Connects to instruments
over direct RS-232 or cellular modem (Sierra Wireless RV50 / RV55).
> **Status:** Active development. Full read pipeline working end-to-end:
> device info, compliance config (with geo thresholds), event download with
> true event-time metadata (project / client / operator / sensor location
> sourced from the device at record-time via SUB 5A). Write commands in progress.
> See [CHANGELOG.md](CHANGELOG.md) for version history.
> **Status:** Active development. Full read + write + erase + monitoring
> pipeline working end-to-end over TCP/cellular. ACH Auto Call Home server
> handles inbound unit connections, downloads events, and persists everything
> to a SQLite database. SFM REST API exposes device control and DB queries.
> **As of v0.14.3 (2026-05-05): SUB 5A bulk waveform protocol is verified
> byte-perfect against Blastware captures across 2-sec, 3-sec, and 10-sec
> events.** Generated `.G10` / `.AB0` files open cleanly in Blastware with
> full Event Reports, frequency analysis, and waveform plots.
> **v0.16.0 (2026-05-11)** adds BW ASCII report ingestion to
> `/db/import/blastware_file` — paired with **series3-watcher v1.5.0**,
> every Blastware ACH event lands in SeismoDb with device-authoritative
> peaks, project metadata, sensor self-check, and ZC/Time-of-Peak data,
> without depending on the still-undecoded waveform body codec.
> **v0.18.0 (2026-05-19)** adds Thor / Micromate Series IV ingest at
> `/db/import/idf_file` — paired with **thor-watcher v0.3.0**, every
> `.IDFH` / `.IDFW` event file (plus its `.txt` sidecar) lands in
> SeismoDb the same way BW events do. See
> [`docs/idf_protocol_reference.md`](docs/idf_protocol_reference.md) for
> the IDF format reference and reverse-engineering plan.
> **v0.19.0 (2026-05-20)** separates Series III and Series IV at the
> code level: new `micromate/` package alongside `minimateplus/`, new
> `events.device_family` DB column ("series3" / "series4") so the UI
> and storage layer dispatch deterministically instead of sniffing
> filenames. Self-applying migration backfills existing rows from the
> binary filename extension.
> **v0.20.0 (2026-05-28)** closes out the Event-Report PDF iteration
> started in v0.17.x: histogram layouts render correctly against BW
> reference PDFs, the ASCII parser handles real-world edge cases
> (`OORANGE`, `>100 Hz`, histogram timestamps), and per-channel ZC
> Freq is surfaced in both modals (event browser + main webapp).
> Adds a server-wide `TZ` env var so operator-visible timestamps
> render in local time instead of UTC. New
> `scripts/backfill_sidecars.py --reparse-txt` lets parser fixes be
> applied retroactively to existing events without re-forwarding,
> using the `.TXT` files preserved at ingest time.
> **v0.21.0 (2026-05-29)** is the Thor / Series IV decoder release —
> `micromate/idf_file.read_idf_file()` now decodes both IDFW
> (waveform) and IDFH (histogram) binaries (87–99% sample fidelity
> on quiet IDFW events; all 859 IDFH corpus files decode cleanly).
> A new `micromate/idf_to_bw_report.py` adapter projects parsed
> Thor reports into the BW-shaped sidecar block, so Thor events
> flow through the existing Event Report PDF pipeline without a
> separate renderer. Terra-View v0.13.0 ships in parallel and
> closes Phase 1 of the SFM integration — see its CHANGELOG.
> See [CHANGELOG.md](CHANGELOG.md) for full version history.
---
@@ -18,156 +62,153 @@ or cellular modem (Sierra Wireless RV50 / RV55).
```
seismo-relay/
├── seismo_lab.py ← Main GUI (Bridge + Analyzer + Console tabs)
├── seismo_lab.py ← Main GUI (Bridge + Analyzer + Download + Console tabs)
├── minimateplus/ ← MiniMate Plus client library
│ ├── transport.py ← SerialTransport and TcpTransport
│ ├── protocol.py ← DLE frame layer (read/write/parse)
│ ├── client.py ← High-level client (connect, get_config, etc.)
│ ├── framing.py ← Frame builder/parser primitives
│ └── models.py ← DeviceInfo, EventRecord, etc.
├── minimateplus/ ← Series III (MiniMate Plus) client library
│ ├── transport.py ← SerialTransport, TcpTransport, SocketTransport
│ ├── protocol.py ← DLE frame layer, SUB command dispatch
│ ├── client.py ← High-level client (connect, get_events, delete_all_events, push_config, get_call_home_config, …)
│ ├── framing.py ← Frame builders, DLE codec, S3FrameParser
│ ├── models.py ← DeviceInfo, Event, ComplianceConfig, MonitorLogEntry, CallHomeConfig, …
│ ├── bw_ascii_report.py ← Parse BW per-event ASCII reports (.TXT sidecars)
│ ├── event_file_io.py ← Read BW binaries, write .sfm.json sidecars
│ └── blastware_file.py ← Write events to Blastware-compatible .AB0 files
├── sfm/ ← SFM REST API server (FastAPI)
│ └── server.py ← /device/info, /device/events, /device/event
├── micromate/ ← Series IV (Micromate / Thor) client library (NEW v0.19)
│ ├── models.py ← IdfEvent, IdfReport, IdfPeaks, IdfProjectInfo, IdfSensorCheck (mic in native dB(L))
│ ├── idf_ascii_report.py ← Parse Thor .IDFW.txt / .IDFH.txt event sidecars
│ ├── idf_file.py ← Binary codec for .IDFW + .IDFH (v0.21.0+)
│ └── idf_to_bw_report.py ← Adapter projecting Thor IDF into the BW report shape (v0.21.0+)
├── sfm/ ← SFM REST API server (FastAPI, port 8200)
│ ├── server.py ← Live device endpoints + DB query + ingest endpoints + caching
│ ├── database.py ← SeismoDb — SQLite persistence (events, monitor_log, ach_sessions)
│ ├── waveform_store.py ← On-disk store for BW + IDF event binaries + .sfm.json sidecars
│ └── sfm_webapp.html ← Embedded web UI with Call Home config tab
├── bridges/
│ ├── s3-bridge/
│ │ └── s3_bridge.py ← RS-232 serial bridge (capture tool)
│ ├── ach_server.py ← Inbound ACH call-home server (main production server)
│ ├── ach_mitm.py ← Transparent MITM proxy for capturing BW sessions
│ ├── s3-bridge/ ← RS-232 serial bridge (capture tool)
│ ├── tcp_serial_bridge.py ← Local TCP↔serial bridge (bench testing)
│ ├── gui_bridge.py ← Standalone bridge GUI (legacy)
│ ├── gui_bridge.py ← Standalone bridge GUI with raw capture checkboxes
│ └── raw_capture.py ← Simple raw capture tool
├── parsers/
│ ├── s3_parser.py ← DLE frame extractor
│ ├── s3_analyzer.py ← Session parser, differ, Claude export
│ ├── gui_analyzer.py ← Standalone analyzer GUI (legacy)
│ ├── gui_analyzer.py ← Standalone analyzer GUI
│ └── frame_db.py ← SQLite frame database
└── docs/
    └── instantel_protocol_reference.md ← Reverse-engineered protocol spec
    ├── instantel_protocol_reference.md ← Series III protocol spec (the Rosetta Stone)
    └── idf_protocol_reference.md ← Series IV (Thor IDF) format reference + codec RE plan
```
---
## Quick start
### Seismo Lab (main GUI)
### ACH inbound server (production)
The all-in-one tool. Three tabs: **Bridge**, **Analyzer**, **Console**.
Listens for inbound unit call-homes, downloads all new events and monitor log
entries, and writes everything to `bridges/captures/seismo_relay.db`.
```bash
python bridges/ach_server.py --port 12345 --output bridges/captures/
```
python seismo_lab.py
Point the unit's ACEmanager **Remote Host** to this machine's IP and **Remote Port** to `12345`.
Options:
```
--port N Listen port (default 12345)
--output DIR Capture directory (default bridges/captures/)
--allow-ip IP Allowlist an IP (repeat for multiple; default: accept all)
--max-events N Safety cap for first run (default: unlimited)
--clear-after-download Erase device memory after successful download
--verbose Debug logging
```
### SFM REST server
Exposes MiniMate Plus commands as a REST API for integration with other systems.
Exposes device control and DB queries as a REST API. Proxied by terra-view.
```
cd sfm
uvicorn server:app --reload
```bash
python sfm/server.py # default: 0.0.0.0:8200
python -m uvicorn sfm.server:app --host 0.0.0.0 --port 8200 --reload
```
**Endpoints:**
Open `http://localhost:8200` for the embedded web UI, or `http://localhost:8200/docs`
for the interactive API docs.
### Seismo Lab GUI
```bash
python seismo_lab.py
```
---
## SFM REST API
### Live device endpoints
Each call dials the device, does its work, and closes the connection. TCP
connections are retried once on `ProtocolError` to handle cold-boot timing.
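The retry-once behaviour can be sketched as a small wrapper. This is an illustration only: `call_with_one_retry` is a hypothetical helper, and the `ProtocolError` class below is a local stand-in for the library's own exception.

```python
import time


class ProtocolError(Exception):
    """Local stand-in for the library's ProtocolError (assumption)."""


def call_with_one_retry(operation, delay_s=0.0):
    """Run operation(); on ProtocolError, wait briefly and try exactly once more."""
    try:
        return operation()
    except ProtocolError:
        time.sleep(delay_s)
        # A second failure is not caught and propagates to the caller.
        return operation()
```

Retrying only once keeps a genuinely dead unit from stalling the request for long, while still absorbing a single cold-boot miss.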
**In-memory caching** — frequently-polled endpoints avoid redundant TCP round-trips
via a thread-safe `_LiveCache` (plain Python dict + `threading.Lock`):
| Method | URL | Cache Strategy |
|--------|-----|---|
| `GET` | `/device/info` | Indefinite; invalidated by `POST /device/config` |
| `GET` | `/device/events` | Count-probe fast path (~2s); full download only when new events detected |
| `GET` | `/device/event/{idx}/waveform` | Permanent per event index |
| `GET` | `/device/monitor/status` | 30-second TTL; invalidated by monitor start/stop |
| `GET` | `/device/call_home` | Fresh read from device (not cached) |
| `POST` | `/device/connect` | — |
| `POST` | `/device/config` | Writes compliance config; invalidates info + events cache |
| `POST` | `/device/config/project` | Patches project/client/operator/sensor_location strings |
| `POST` | `/device/monitor/start` | Sends SUB 0x96; immediately evicts status cache |
| `POST` | `/device/monitor/stop` | Sends SUB 0x97; immediately evicts status cache |
| `POST` | `/device/call_home` | Reads, patches specified fields, writes back to device |
**Cache bypass** — All cached endpoints accept `?force=true` to skip the cache and
force a fresh read from the device.
**Cache stats** — `GET /cache/stats` returns hit/miss counts and TTL info; `DELETE /cache/device`
clears the device cache immediately.
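A minimal sketch of a dict-plus-lock cache with optional TTL, forced refresh, and hit/miss counters. It is not the actual `_LiveCache` code; the class name `LiveCacheSketch` and its method names are assumptions made for illustration.

```python
import threading
import time


class LiveCacheSketch:
    """Thread-safe cache: plain dict guarded by a lock (illustrative only)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._entries = {}  # key -> (value, expires_at or None for "indefinite")
        self.hits = 0
        self.misses = 0

    def get(self, key, loader, ttl_s=None, force=False):
        now = time.monotonic()
        with self._lock:
            entry = self._entries.get(key)
            if entry is not None and not force:
                value, expires_at = entry
                if expires_at is None or now < expires_at:
                    self.hits += 1
                    return value
            self.misses += 1
        value = loader()  # the device round-trip happens outside the lock
        with self._lock:
            expires_at = None if ttl_s is None else now + ttl_s
            self._entries[key] = (value, expires_at)
        return value

    def invalidate(self, key):
        with self._lock:
            self._entries.pop(key, None)
```

In this sketch, `ttl_s=30` would model the monitor-status entry, `ttl_s=None` the indefinitely cached device info, and `force=True` the `?force=true` bypass.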
Transport query params (supply one set):
```
Serial: ?port=COM5&baud=38400
TCP: ?host=1.2.3.4&tcp_port=12345
```
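As a rough illustration of the "supply one set" rule, the helper below builds a request URL from either the serial or the TCP parameters. `device_url` is a hypothetical client-side helper, not part of the SFM code; only the query parameter names come from the text above.

```python
from urllib.parse import urlencode


def device_url(base, path, *, port=None, baud=None, host=None, tcp_port=None, force=False):
    """Build a /device/* URL using exactly one transport parameter set."""
    if (port is None) == (host is None):
        raise ValueError("supply exactly one of port= (serial) or host= (TCP)")
    if port is not None:
        params = {"port": port}
        if baud is not None:
            params["baud"] = baud
    else:
        if tcp_port is None:
            raise ValueError("TCP transport needs tcp_port=")
        params = {"host": host, "tcp_port": tcp_port}
    if force:
        params["force"] = "true"  # cache bypass flag described above
    return f"{base}{path}?{urlencode(params)}"
```

For example, `device_url("http://localhost:8200", "/device/info", port="COM5", baud=38400)` yields a serial-transport URL.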
### DB read endpoints
Query the SQLite database written by `ach_server.py`. All read-only except
`PATCH /db/events/{id}/false_trigger`.
| Method | URL | Description |
|--------|-----|-------------|
| `GET` | `/device/info?port=COM5` | Device info via serial |
| `GET` | `/device/info?host=1.2.3.4&tcp_port=9034` | Device info via cellular modem |
| `GET` | `/device/events?port=COM5` | Event index |
| `GET` | `/device/event?port=COM5&index=0` | Single event record |
| `GET` | `/db/units` | All known serials with summary stats |
| `GET` | `/db/events` | Triggered events (filter by serial, date range, false_trigger). Response rows include `device_family` ("series3" / "series4") so clients dispatch on unit type without sniffing filenames. |
| `GET` | `/db/monitor_log` | Monitoring intervals |
| `GET` | `/db/sessions` | ACH call-home session history |
| `PATCH` | `/db/events/{id}/false_trigger?value=true` | Flag / unflag false triggers |
---
### File ingest endpoints
## Seismo Lab tabs
Used by watcher daemons to push field-collected event files into the SFM DB
+ waveform store. Both accept multipart uploads of binary event files
optionally paired with their ASCII sidecar reports; both dedup by
`(serial, timestamp)` and UPSERT device-authoritative fields on re-import.
### Bridge tab
Captures live RS-232 traffic between Blastware and the seismograph. Sits in
the middle as a transparent pass-through while logging everything to disk.
```
Blastware → COM4 (virtual) ↔ s3_bridge ↔ COM5 (physical) → MiniMate Plus
```
Set your COM ports and log directory, then hit **Start Bridge**. Use
**Add Mark** to annotate the capture at specific moments (e.g. "changed
trigger level"). When the bridge starts, the Analyzer tab automatically wires
up to the live files and starts updating in real time.
### Analyzer tab
Parses raw captures into DLE-framed protocol sessions, diffs consecutive
sessions to show exactly which bytes changed, and lets you query across all
historical captures via the built-in SQLite database.
- **Inventory** — all frames in a session, click to drill in
- **Hex Dump** — full payload hex dump with changed-byte annotations
- **Diff** — byte-level before/after diff between sessions
- **Full Report** — plain text session report
- **Query DB** — search across all captures by SUB, direction, or byte value
Use **Export for Claude** to generate a self-contained `.md` report for
AI-assisted field mapping.
### Console tab
Direct connection to a MiniMate Plus — no bridge, no Blastware. Useful for
diagnosing field units over cellular without a full capture session.
**Connection:** choose Serial (COM port + baud) or TCP (IP + port for
cellular modem).
**Commands:**
| Button | What it does |
|--------|-------------|
| POLL | Startup handshake — confirms unit is alive and identifies model |
| Serial # | Reads unit serial number |
| Full Config | Reads full 166-byte config block (firmware version, channel scales, etc.) |
| Event Index | Reads stored event list |
Output is colour-coded: TX in blue, raw RX bytes in teal, decoded fields in
green, errors in red. **Save Log** writes a timestamped `.log` file to
`bridges/captures/`. **Send to Analyzer** injects the captured bytes into the
Analyzer tab for deeper inspection.
---
## Connecting over cellular (RV50 / RV55 modems)
Field units connect via Sierra Wireless RV50 or RV55 cellular modems. Use
TCP mode in the Console or SFM:
```
# Console tab
Transport: TCP
Host: <modem public IP>
Port: 9034 ← Device Port in ACEmanager (call-up mode)
```
```python
# In code
from minimateplus.transport import TcpTransport
from minimateplus.client import MiniMateClient
client = MiniMateClient(transport=TcpTransport("1.2.3.4", 9034), timeout=30.0)
info = client.connect()
```
### Required ACEmanager settings (Serial tab)
These must match exactly — a single wrong setting causes the unit to beep
on connect but never respond:
| Setting | Value | Why |
|---------|-------|-----|
| Configure Serial Port | `38400,8N1` | Must match MiniMate baud rate |
| Flow Control | `None` | Hardware flow control blocks unit TX if pins unconnected |
| **Quiet Mode** | **Enable** | **Critical.** Disabled → modem injects `RING`/`CONNECT` onto serial line, corrupting the S3 handshake |
| Data Forwarding Timeout | `1` (= 0.1 s) | Lower latency; `5` works but is sluggish |
| TCP Connect Response Delay | `0` | Non-zero silently drops the first POLL frame |
| TCP Idle Timeout | `2` (minutes) | Prevents premature disconnect |
| DB9 Serial Echo | `Disable` | Echo corrupts the data stream |
| Method | URL | Description |
|--------|-----|-------------|
| `POST` | `/db/import/blastware_file` | Series III: `.AB0*` / `.N00` binaries + paired `_ASCII.TXT`. Source: `series3-watcher`. |
| `POST` | `/db/import/idf_file` | Series IV: `.IDFH` / `.IDFW` binaries + paired `.IDFW.txt` / `.IDFH.txt`. Source: `thor-watcher`. |
---
@@ -175,25 +216,150 @@ on connect but never respond:
```python
from minimateplus import MiniMateClient
from minimateplus.transport import SerialTransport, TcpTransport
from minimateplus.transport import TcpTransport
# Serial
client = MiniMateClient(port="COM5")
# TCP (cellular modem)
client = MiniMateClient(transport=TcpTransport("1.2.3.4", 9034), timeout=30.0)
client = MiniMateClient(transport=TcpTransport("1.2.3.4", 12345), timeout=30.0)
with client:
info = client.connect() # DeviceInfo — model, serial, firmware, compliance config
serial = client.get_serial() # Serial number string
config = client.get_config() # Full config block (bytes)
events = client.get_events() # List[EventRecord] with true event-time metadata
# Read
info = client.connect() # DeviceInfo — serial, firmware, compliance config
count = client.count_events() # Number of stored events
keys = client.list_event_keys() # Fast browse walk — event keys only, no download
events = client.get_events() # Full download: headers + peaks + metadata
monitor = client.get_monitor_status() # Battery, memory, is_monitoring flag
log = client.get_monitor_log_entries() # Monitoring intervals (partial 0x2C records)
ach_cfg = client.get_call_home_config() # Auto Call Home settings (SUB 0x2C)
# Write
client.apply_config(
sample_rate=1024,
recording_mode="Continuous", # Single Shot / Continuous / Histogram / Histogram+Continuous
histogram_interval_sec=15, # 2, 5, 15, 60, 300, 900
trigger_level_geo=0.5,
geo_range="Normal", # Normal (10.000 in/s) / Sensitive (1.25 in/s)
project="Bridge Inspection 2026",
client_name="City of Portland",
operator="B. Harrison",
)
client.set_call_home_config(
auto_call_home_enabled=True,
after_event_recorded=True,
at_specified_times=True,
time1_hour=18, time1_min=30, # 6:30 PM
time2_hour=6, time2_min=0, # 6:00 AM
)
# Control
client.start_monitoring() # SUB 0x96
client.stop_monitoring() # SUB 0x97
client.delete_all_events() # Erase all (SUB 0xA3 → 0x1C → 0x06 → 0xA2)
```
`get_events()` runs the full download sequence per event: `1E → 0A → 0C → 5A → 1F`.
The SUB 5A bulk waveform stream is used to retrieve `client`, `operator`, and
`sensor_location` as they existed at record time — not backfilled from the current
compliance config.
`get_events()` runs the full per-event sequence:
`1E → 0A → 1E(arm token=0xFE) → 0C → 1F(arm) → POLL×3 → 5A → 1F(browse)`.
SUB 5A bulk stream walks chunks bounded by the `end_offset` extracted from
the STRT record at byte 17 of the probe response — no over-reading, no
chunk-count cap. Project / client / operator / sensor location strings come
from the dedicated metadata pages at counter `0x1002` and `0x1004`,
read once per session (they reflect the compliance setup at session start,
not per individual event).
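The bounded chunk walk can be pictured as a loop that stops exactly at `end_offset`. The sketch below shows only the bounding logic; the chunk size, the offset units, and the name `chunk_ranges` are assumptions, and the real 5A chunk addressing is described in the protocol reference rather than here.

```python
def chunk_ranges(start_offset, end_offset, chunk_size):
    """Yield (offset, length) pairs covering [start_offset, end_offset) with no over-read."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    offset = start_offset
    while offset < end_offset:
        # The final chunk is shortened so the walk never passes end_offset.
        length = min(chunk_size, end_offset - offset)
        yield offset, length
        offset += length
```

Because the loop is bounded by the offset rather than by a chunk count, long events need no special cap.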
---
## micromate library
Series IV / Thor support, sibling to `minimateplus`. Currently scoped to
offline-file ingest from Thor's TXT exporter; live-device protocol is
deferred until the binary codec is cracked.
```python
from micromate import IdfEvent, parse_idf_report
# Parse a .IDFW.txt / .IDFH.txt sidecar (1014 example files round-trip cleanly)
text = open("UM11719_20231219162723.IDFW.txt").read()
report_dict = parse_idf_report(text) # permissive dict
# Wrap into a typed event using the device-native binary filename
event = IdfEvent.from_report(report_dict, "UM11719_20231219162723.IDFW")
event.serial # "UM11719"
event.kind # "Waveform" or "Histogram"
event.peaks.transverse_ips # 0.0251 (in/s, native unit)
event.peaks.mic_pspl_dbl # 99.4 (dB(L), Thor's native mic unit — NOT psi)
event.project_info.project # "UPMC Presby-Loc 3-Level1-1R Elevator Rm"
event.sensor_check.tran # True (passed self-check)
event.firmware_version # "Micromate ISEE 11.0AK"
event.calibration_text # "November 22, 2023 by Instantel"
# Bridge to the existing minimateplus.Event shape for the DB / sidecar paths
# (waveform_key is a 16-byte sha256 prefix when ingesting from a binary file)
bridged_event = event.to_minimateplus_event(waveform_key=b"\x00" * 16)
```
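The placeholder `b"\x00" * 16` above stands in for a real key. A minimal sketch of deriving a 16-byte SHA-256 prefix follows; it assumes the hash is taken over the full contents of the binary file, which the text does not spell out, and `waveform_key_from_bytes` is a hypothetical helper name.

```python
import hashlib


def waveform_key_from_bytes(data: bytes) -> bytes:
    """Return the first 16 bytes of the SHA-256 digest of the file contents."""
    return hashlib.sha256(data).digest()[:16]
```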
The binary codec (`.IDFW` / `.IDFH` event files themselves) landed in
v0.21.0 as `read_idf_file()` in `micromate/idf_file.py`; see
[`docs/idf_protocol_reference.md`](docs/idf_protocol_reference.md)
for the format reference, the two observed file signatures, and the
reverse-engineering notes.
---
## Database
`ach_server.py` and the file-ingest endpoints write to
`bridges/captures/seismo_relay.db` (SQLite, WAL mode) via the `SeismoDb`
persistence layer. Three tables, all unit-keyed by serial number:
| Table | Key | Contents |
|-------|-----|----------|
| `ach_sessions` | UUID | Per-call-home audit record: serial, timestamp, peer IP, events_downloaded, monitor_entries, duration_seconds |
| `events` | UUID, UNIQUE(serial, timestamp) | Triggered events: timestamp, Tran/Vert/Long/VectorSum/Mic PPV, project/client/operator/sensor_location strings, sample_rate, record_type, false_trigger flag, **`device_family`** ("series3" / "series4"), `blastware_filename` (binary at-rest in `waveforms/`), sidecar references |
| `monitor_log` | UUID, UNIQUE(serial, start_time) | Monitoring intervals: serial, waveform_key, start_time, stop_time, duration_seconds, geo_threshold_ips |
**Deduplication is by `(serial, timestamp)`** — the device clock is the
stable natural key. Repeat call-homes or re-runs UPSERT the row in place,
refreshing every device-authoritative field (peaks, project strings,
sample_rate, file references) so the latest writer wins. `false_trigger`
and `device_family` are preserved across UPSERTs. Earlier versions used
`(serial, waveform_key)` for dedup, but the device's event-key counter
resets to `0x01110000` after every erase, so timestamps are the correct
dedup field. Migration handles the transition transparently on first
startup.
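The dedup rule maps naturally onto SQLite's `ON CONFLICT ... DO UPDATE`. The sketch below uses a cut-down table with hypothetical column names (`peak_tran`, `project`); it is not the real `SeismoDb` schema. Columns left out of the `SET` clause, here `false_trigger` and `device_family`, keep their existing values on re-import.

```python
import sqlite3

SCHEMA = """
CREATE TABLE events (
    id TEXT PRIMARY KEY,
    serial TEXT NOT NULL,
    timestamp TEXT NOT NULL,
    peak_tran REAL,
    project TEXT,
    false_trigger INTEGER NOT NULL DEFAULT 0,
    device_family TEXT NOT NULL,
    UNIQUE(serial, timestamp)
)
"""

UPSERT = """
INSERT INTO events (id, serial, timestamp, peak_tran, project, device_family)
VALUES (:id, :serial, :timestamp, :peak_tran, :project, :device_family)
ON CONFLICT(serial, timestamp) DO UPDATE SET
    peak_tran = excluded.peak_tran,
    project = excluded.project
"""


def upsert_event(conn: sqlite3.Connection, row: dict) -> None:
    """Insert a new event, or refresh device-authoritative fields in place."""
    with conn:  # commits on success, rolls back on error
        conn.execute(UPSERT, row)
```

The upsert syntax needs SQLite 3.24 or newer, which current Python builds generally bundle.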
**`device_family` (added v0.19.0)** discriminates Series III from Series
IV at the SQL level. Set by every import path; the UI dispatches on it
to render mic units correctly (Series III: psi → dBL conversion; Series
IV: native dBL passthrough). Existing rows are backfilled at first
startup of v0.19.0+ by sniffing the binary filename extension.
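A sketch of the mic-unit dispatch, assuming the standard sound-pressure-level definition (20 µPa reference) for the psi to dB(L) conversion. The row key `mic_value` and both function names are hypothetical; the project's own conversion helper may differ in detail.

```python
import math

PA_PER_PSI = 6894.757293168   # pascals per psi
REF_PRESSURE_PA = 20e-6       # 20 µPa reference pressure for dB(L)


def psi_to_dbl(psi: float) -> float:
    """Convert a linear peak pressure in psi to dB(L)."""
    if psi <= 0:
        raise ValueError("pressure must be positive")
    return 20.0 * math.log10(psi * PA_PER_PSI / REF_PRESSURE_PA)


def mic_dbl_for_row(row: dict) -> float:
    """Return the mic peak in dB(L), dispatching on device_family."""
    family = row["device_family"]
    if family == "series3":
        return psi_to_dbl(row["mic_value"])   # stored as psi, converted for display
    if family == "series4":
        return row["mic_value"]               # already dB(L), passed through
    raise ValueError(f"unknown device_family: {family!r}")
```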
The on-disk waveform store lives at `bridges/captures/waveforms/<serial>/`
and holds the original event binaries (BW `.AB0*` / `.N00` for Series III,
`.IDFH` / `.IDFW` for Series IV) plus their `.sfm.json` review/metadata
sidecars. Series III events also produce `.a5.pkl` source-frame pickles
and `.h5` clean-waveform exports; Series IV doesn't yet (pending codec).
---
## Connecting over cellular (RV50 / RV55)
Field units connect via Sierra Wireless RV50 or RV55 cellular modems.
### Required ACEmanager settings
| Setting | Value | Why |
|---------|-------|-----|
| Configure Serial Port | `38400,8N1` | Must match MiniMate baud rate |
| Flow Control | `None` | Hardware FC blocks TX if pins unconnected |
| **Quiet Mode** | **Enable** | **Critical** — disabled injects `RING`/`CONNECT` onto serial, corrupting the S3 handshake |
| Data Forwarding Timeout | `1` (= 0.1 s) | Lower latency |
| TCP Connect Response Delay | `0` | Non-zero silently drops the first POLL frame |
| TCP Idle Timeout | `2` (minutes) | Prevents premature disconnect |
| DB9 Serial Echo | `Disable` | Echo corrupts the data stream |
---
@@ -204,56 +370,222 @@ compliance config.
| DLE | `0x10` | Data Link Escape |
| STX | `0x02` | Start of frame |
| ETX | `0x03` | End of frame |
| ACK | `0x41` (`'A'`) | Frame-start marker sent before every frame |
| ACK | `0x41` | Frame-start marker sent before every BW frame |
| DLE stuffing | `10 10` on wire | Literal `0x10` in payload |
**S3-side frame** (seismograph → Blastware): `ACK DLE+STX [payload] CHK DLE+ETX`
**De-stuffed payload header:**
```
[0] CMD 0x10 = BW request, 0x00 = S3 response
[1] ? unknown (0x00 BW / 0x10 S3)
[2] SUB Command/response identifier ← the key field
[3] PAGE_HI Page address high byte
[4] PAGE_LO Page address low byte
[5+] DATA Payload content
```
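The header layout above can be read with a few lines of Python. This is a simplified sketch: it assumes the input is the stuffed payload with framing and checksum already stripped, it treats PAGE_HI/PAGE_LO as one big-endian 16-bit value, and the function names are hypothetical. The real parser lives in `minimateplus/framing.py` and handles cases this sketch does not.

```python
DLE = 0x10


def unstuff(wire: bytes) -> bytes:
    """Collapse each stuffed 0x10 0x10 pair back into a single 0x10."""
    out = bytearray()
    i = 0
    while i < len(wire):
        out.append(wire[i])
        if wire[i] == DLE and i + 1 < len(wire) and wire[i + 1] == DLE:
            i += 2  # skip the duplicate DLE
        else:
            i += 1
    return bytes(out)


def parse_header(payload: bytes) -> dict:
    """Split a de-stuffed payload into the header fields listed above."""
    if len(payload) < 5:
        raise ValueError("payload shorter than the 5-byte header")
    return {
        "cmd": payload[0],
        "flag": payload[1],            # meaning unknown per the table
        "sub": payload[2],
        "page": (payload[3] << 8) | payload[4],
        "data": payload[5:],
    }
```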
**Response SUB rule:** `response_SUB = 0xFF - request_SUB`
Example: request SUB `0x08` (Event Index) → response SUB `0xF7`
**Response SUB rule:** `response_SUB = 0xFF - request_SUB` (no exceptions)
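The rule is a one-line computation; `response_sub` below is a hypothetical helper name used only to illustrate it.

```python
def response_sub(request_sub: int) -> int:
    """Return the SUB byte the device uses when answering request_sub."""
    if not 0 <= request_sub <= 0xFF:
        raise ValueError("SUB must fit in one byte")
    return 0xFF - request_sub
```

For instance, a request with SUB `0x5A` is answered with SUB `0xA5`.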
Full protocol documentation: [`docs/instantel_protocol_reference.md`](docs/instantel_protocol_reference.md)
---
## Compliance Config Features
The REST API and web UI expose full control over device compliance settings:
- **Recording Mode** (Single Shot / Continuous / Histogram / Histogram+Continuous)
- **Sample Rate** (1024 / 2048 / 4096 sps)
- **Record Time** (float, seconds)
- **Histogram Interval** (2s, 5s, 15s, 1m, 5m, 15m) — when recording mode includes histogram
- **Geo Trigger Levels** (float, in/s per channel)
- **Geo Maximum Range** (Normal 10.000 in/s / Sensitive 1.250 in/s per channel)
- **Project / Client / Operator / Sensor Location** (ASCII strings)
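A client could pre-check values against the options listed above before sending them. The sketch below is an illustration only: `validate_compliance` is a hypothetical helper, and the server's own validation may be stricter or shaped differently.

```python
RECORDING_MODES = {"Single Shot", "Continuous", "Histogram", "Histogram+Continuous"}
SAMPLE_RATES = {1024, 2048, 4096}
HISTOGRAM_INTERVALS_SEC = {2, 5, 15, 60, 300, 900}


def validate_compliance(recording_mode, sample_rate, histogram_interval_sec=None):
    """Return a list of human-readable problems; an empty list means OK."""
    errors = []
    if recording_mode not in RECORDING_MODES:
        errors.append(f"unknown recording mode: {recording_mode!r}")
    if sample_rate not in SAMPLE_RATES:
        errors.append(f"unsupported sample rate: {sample_rate}")
    # The interval only matters when the mode includes a histogram.
    if "Histogram" in str(recording_mode) and histogram_interval_sec not in HISTOGRAM_INTERVALS_SEC:
        errors.append(f"unsupported histogram interval: {histogram_interval_sec}")
    return errors
```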
Auto Call Home config:
- **Auto Call Home Enable** (bool)
- **Dial String** (read-only; 40-byte ASCII)
- **Trigger on Event** (bool)
- **Scheduled Call-Ins** (two time slots with HH:MM each)
- **Retry Settings** (count, delay, connection timeout, warm-up time)
---
## Requirements
```
```bash
pip install pyserial fastapi uvicorn
```
Python 3.10+. Tkinter is included with the standard Python installer on
Windows (make sure "tcl/tk and IDLE" is checked during install).
Windows (check "tcl/tk and IDLE" during install).
---
## Virtual COM ports (bridge capture)
The bridge needs two COM ports on the same PC — one that Blastware connects
to, and one wired to the seismograph. Use a virtual COM port pair
(**com0com** or **VSPD**) to give Blastware a port to talk to.
```
Blastware → COM4 (virtual) ↔ s3_bridge.py ↔ COM5 (physical) → MiniMate Plus
```
Use **com0com** or **VSPD** to create the virtual COM pair on Windows.
---
## Roadmap
## Key Features
- [x] Event download — pull waveform records from the unit (`1E → 0A → 0C → 5A → 1F`)
- [x] True event-time metadata — project / client / operator / sensor location from SUB 5A
- [ ] Write commands — push config changes to the unit (compliance setup, channel config, trigger settings)
- [ ] ACH inbound server — accept call-home connections from field units
- [ ] Modem manager — push standard configs to RV50/RV55 fleet via Sierra Wireless API
- [ ] Full Blastware parity — complete read/write/download cycle without Blastware
**Series III (MiniMate Plus) device support:**
- [x] Full read/write/erase pipelines over RS-232 or TCP/cellular
- [x] Compliance config (recording mode, sample rate, histogram interval, geo sensitivity, project strings)
- [x] Auto Call Home config (read/write ACH settings, dial string, time slots, retries)
- [x] Monitor control (start/stop, status polling, battery/memory)
- [x] Monitor log entries (continuous monitoring intervals without full waveform download)
- [x] Blastware file ingest at `/db/import/blastware_file` (paired with `series3-watcher`)
**Series IV (Micromate / Thor) device support:**
- [x] Thor IDF file ingest at `/db/import/idf_file` (paired with `thor-watcher`, v0.18.0+)
- [x] Native `IdfEvent` / `IdfReport` typed models — mic in dB(L), full title strings, sensor self-check, calibration, firmware version
- [x] Parser verified against 1,014 paired `.txt` sidecars in `thor-watcher/example-data/`
- [x] Binary `.IDFW` / `.IDFH` codec — ✅ v0.21.0. IDFW reuses `decode_waveform_v2()` on the body at offset `0x0f1f` (87–99% sample fidelity on quiet events); IDFH has a dedicated segment-based decoder (all 859 corpus files decode, 181,071 intervals total). See `micromate/idf_file.py` + `docs/idf_protocol_reference.md`.
- [ ] Live-device protocol — pending codec
**Data persistence:**
- [x] SQLite database (`seismo_relay.db`) with `events`, `monitor_log`, `ach_sessions` tables
- [x] Per-row `device_family` column ("series3" / "series4") for clean UI / unit-of-measurement dispatch (v0.19.0+)
- [x] Deduplication by `(serial, timestamp)` — natural key handles post-erase counter resets
- [x] UPSERT on re-import refreshes every device-authoritative field (peaks, project, sample_rate); preserves operator review state (`false_trigger`)
- [x] Post-erase key-reuse detection (tracks high-water mark in `ach_state.json`)
**REST API:**
- [x] Live device endpoints with in-memory caching (`_LiveCache`)
- [x] Cache statistics (`/cache/stats`) and manual invalidation (`/cache/device`)
- [x] DB query endpoints (units, events, monitor_log, sessions, false_trigger PATCH)
- [x] Call Home config read/write endpoints
- [x] Blastware file download endpoint (`/device/event/{index}/blastware_file`)
- [x] Import endpoints for both device families (`/db/import/blastware_file`, `/db/import/idf_file`)
**File output (v0.7+, byte-perfect as of v0.14.3):**
- [x] Blastware-compatible `.AB0` / `.G10` file generation (waveform + metadata)
- [x] Multi-channel waveform decode from SUB 5A bulk stream
- [x] Second-resolution timestamp encoding in Blastware filename
- [x] **Byte-perfect against BW reference captures** (verified across 2-sec / 3-sec / 10-sec event durations, both event 0 and event N continuation events)
- [x] STRT-bounded chunk walk + correct event-N probe counter + partial DLE stuffing of `0x10` in 5A params (the four fixes that landed in v0.14.0–v0.14.3)
**Capture tools:**
- [x] Serial-to-TCP bridge with raw BW/S3 capture (s3_bridge.py, defaults to auto-capture)
- [x] GUI bridge with raw capture checkboxes (gui_bridge.py)
- [x] ACH inbound server with bidirectional capture (ach_server.py saves raw_tx + raw_rx)
- [x] Transparent TCP MITM proxy for live BW session capture (ach_mitm.py)
**Analysis tools:**
- [x] s3_analyzer.py — session parser, frame differ, Claude export
- [x] gui_analyzer.py — standalone analyzer GUI
- [x] frame_db.py — SQLite frame database for capture analysis
**seismo_lab.py GUI:**
- [x] Bridge tab — Serial/TCP mode selector with raw capture options
- [x] Analyzer tab — BW/S3 capture playback and differencing
- [x] Download tab — Live wire-byte capture during event download
- [x] Console tab — Logging and diagnostics
## Roadmap (Future)
### Strategic direction — where this is going
seismo-relay is being built as a **suite of cooperating components**
that together replace and improve on Blastware's role. Three logical
tiers:
1. **SFM** (device-side) — owns the active connection to a physical
unit. Today: `minimateplus/`, `/device/*` HTTP endpoints,
`seismo_lab.py`. Future: live Thor / Micromate support.
2. **SDM** (data-side) — owns the database, waveform store, ingest
pipelines, and the read-API that Terra-View consumes. Today this
code lives under `sfm/` for historical reasons; the role has
migrated and the eventual rename is on the long-tail cleanup list.
3. **Codec library** — pure data-interpretation: `minimateplus/*_codec.py`,
`bw_ascii_report.py`, `micromate/idf_*.py`. Used by both SFM and
SDM, depends on neither.
Terra-View is downstream of SDM for fleet listings, event detail, etc.
The long-term vision adds a **second link** from Terra-View → SFM for
direct device interaction (see below).
The codec work in this repo isn't trying to replace BW's network
layer — BW's ACH file forwarding and Thor's IDF call-home are
battle-tested. The value is in the receiving and processing side: turn
the stream of binary+ASCII pairs into something users can search,
filter, alert on, and report from.
### Terra-View ↔ SFM device control (the long-term vision)
Today Terra-View only reads from SDM (event listings, dashboards,
project reports). When a unit goes missing — operator notices in the
Terra-View dashboard — there's no way to *do* anything from the UI.
The path of least resistance is to RDP into a Windows box and open
Blastware, which defeats the purpose of having Terra-View.
Target experience:
- Operator notices a unit in Terra-View dashboard hasn't called in.
- Clicks unit detail → "Connect to Device" button.
- Terra-View opens an embedded view (modal or side-panel) that talks
to SFM's `/device/*` endpoints over the network.
- Live view: device clock, battery, memory, current monitor status.
- Actions: start/stop monitoring, push compliance config changes, pull
fresh events, run a sensor self-check, change call-home settings.
- Audit log: every connect / action recorded in SDM for the unit
history.
Implementation steps (concrete):
- [ ] **SFM authentication & authorization layer.** Today `/device/*`
endpoints are unauthenticated — anyone on the network can call
them. Need at minimum a token-based auth, ideally with a "who
can connect to which units" mapping. Hard prerequisite for
letting Terra-View users into the control surface.
- [ ] **Terra-View "Connect to Device" entry point** on the unit
detail page. Renders only when unit has connection info on file
and the user has permission.
- [ ] **Embedded live-monitor view** in Terra-View — equivalent to
`seismo_lab.py`'s Bridge tab, but in the browser. Polls SFM's
`/device/monitor/status` on an interval; sends start/stop via
`/device/monitor/{start,stop}`.
- [ ] **Action history** — every connect / push / action call records
a row in `unit_history`, viewable on the unit detail page.
- [ ] **Series IV live-device support in SFM** — currently `/device/*`
only supports MiniMate Plus. Blocks "Connect to Device" for
Thor units until done. Depends on Thor wire-protocol capture
and a `micromate/` parallel of the `minimateplus/` modules.
### High-impact (unblocks product features)
- [ ] **Series III waveform body codec reverse-engineering.** The 5A bulk-stream body is some kind of compressed/encoded format (not raw int16 LE as previously assumed — see §7.6.1 retraction in `docs/instantel_protocol_reference.md`). Structural framing is ~50% decoded on branch `claude/codec-re-cBGNe` (tagged-block walker, segment counters); per-byte sample mapping is still open. Until this lands, the in-app waveform viewer renders garbage and BW-import peak values fall back to `_peaks_from_samples()` saturation noise. Workaround: pair every BW-imported event with its `_ASCII.TXT` so the device-authoritative peaks land in the DB regardless of codec.
- [x] **Series IV (Thor IDF) binary codec reverse-engineering.** ✅ v0.21.0 — `micromate/idf_file.read_idf_file()` decodes both IDFW (waveform body at offset `0x0f1f`, reusing `decode_waveform_v2()`; 87–99% sample fidelity on quiet events) and IDFH (dedicated segment-based decoder: all 859 corpus files decode, 181,071 intervals, peaks within ~1.8% of sidecar values). `WaveformStore.save_imported_idf` now also projects parsed Thor data into a `bw_report` block via `micromate/idf_to_bw_report.py` so Thor events render in the existing Event Report PDF pipeline without a separate renderer.
- [ ] **In-app waveform viewer accuracy.** Depends on Series III codec decode. Plot.v1 JSON pipeline + viewer skeleton already exist; will start showing real waveforms automatically once `_decode_a5_waveform` produces correct samples. Series IV waveforms come online when the IDF codec lands.
- [ ] **Series IV live-device support.** Once the IDF binary is decoded, extend `micromate/` with `transport.py` / `framing.py` / `protocol.py` / `client.py` mirroring the `minimateplus/` package layout — depends on capturing Thor's wire protocol (TCP / RS-232 captures TBD).
- [ ] **Terra-view integration** — seismo-relay router, unit detail page, VISON-style event listing.
- [ ] **Vibration summary reports** — highest legit PPV per project → Word doc (false-trigger filtering first).
### BW ASCII report parser enhancements (built in v0.16.0)
- [x] **PPV field misses on certain TXT formats.** ✅ v0.20.0 — root cause was the `OORANGE` (Out Of Range) saturation marker that BW writes when a channel exceeds its full-scale; `_parse_number()` returned None for the non-numeric value. Parser now substitutes `geo_range_ips` as a lower bound + sets `ppv_saturated` flag. All 5 prod events (T190LD5Q.LK0W, T438L713.RY0W, K557L3YM.OE0W, + 2 others) now parse cleanly.
- [x] **Histogram-specific structural fields.** ✅ v0.20.0 — `Histogram Start/Stop Time+Date`, `Number of Intervals`, `Interval Size`, per-channel `Peak Time` + `Peak Date`, and `Peak Vector Sum Date` all parse now. Land in the sidecar's `bw_report.histogram` block.
- [ ] **Histogram interval bin-table parsing.** Trailing 792-row table (per-interval Peak/Freq per channel + MicL) in histogram TXTs is unparsed. Probably too big for the sidecar JSON; may want a separate `.histogram.h5` companion file.
- [x] **`>100 Hz` value parsing.** ✅ v0.20.0 — parser now mirrors the OORANGE pattern: stores 100.0 on `zc_freq_hz` + sets `zc_freq_above_range` flag. PDF + both modals render `>100 Hz` instead of `—`.
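
The sentinel handling described in the `OORANGE` and `>100 Hz` items above can be sketched roughly as follows. This is a minimal illustration, not the shipped parser: the helper names `parse_ppv` and `parse_zc_freq` and the returned dict shape are hypothetical, and only `OORANGE`, `geo_range_ips`, `ppv_saturated`, `zc_freq_hz`, and `zc_freq_above_range` come from the notes above.

```python
import re

_NUM = re.compile(r"-?\d+(?:\.\d+)?")


def parse_ppv(raw: str, geo_range_ips: float) -> dict:
    """Parse a PPV field (hypothetical helper, not the real _parse_number)."""
    text = raw.strip()
    if text.upper().startswith("OORANGE"):
        # Channel exceeded full scale: the true peak is at least the geo range,
        # so store the range as a lower bound and flag the value as saturated.
        return {"ppv": geo_range_ips, "ppv_saturated": True}
    m = _NUM.search(text)
    return {"ppv": float(m.group(0)) if m else None, "ppv_saturated": False}


def parse_zc_freq(raw: str) -> dict:
    """Parse a zero-crossing frequency; '>100 Hz' is stored as 100.0 plus a flag."""
    text = raw.strip()
    m = _NUM.search(text)
    value = float(m.group(0)) if m else None
    return {"zc_freq_hz": value, "zc_freq_above_range": text.startswith(">")}
```

Keeping the numeric column populated and carrying the "this is a bound" fact in a separate flag lets renderers decide how to display the value (for example `>100 Hz`) without special-casing missing data.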
### Ingestion gaps
- [ ] **MLG forwarding.** `series3-watcher` forwards event binaries + their `_ASCII.TXT` reports, but skips `.MLG` per-unit monitor log files entirely. Adding a `POST /db/import/mlg_file` endpoint + watcher scan path would populate `monitor_log` for non-ACH-routed units (coverage queries, "was this unit monitoring on date X" lookups).
- [ ] **0C-record raw bytes persistence in the sidecar.** Currently on branch `claude/codec-re-cBGNe` as commit `a187124`; cherry-pick if useful as a standalone fix. Preserves the 210-byte 0C record under `extensions.raw_records.waveform_record_b64` so future field-offset analysis (Peak Acceleration / Time of Peak / etc. — the fields BW computes client-side from samples) can run offline.
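
As a rough sketch of the 0C-record persistence item above: the raw record can be stored as base64 text so the sidecar stays valid JSON. The key path `extensions.raw_records.waveform_record_b64` is taken from the note; the helper names and the plain-dict sidecar shape are assumptions made for illustration.

```python
import base64
from typing import Optional


def attach_raw_0c_record(sidecar: dict, record: bytes) -> dict:
    """Store the raw 0C record under extensions.raw_records (hypothetical helper)."""
    raw_records = sidecar.setdefault("extensions", {}).setdefault("raw_records", {})
    # base64 keeps the bytes JSON-safe; decode("ascii") turns them into a str.
    raw_records["waveform_record_b64"] = base64.b64encode(record).decode("ascii")
    return sidecar


def read_raw_0c_record(sidecar: dict) -> Optional[bytes]:
    """Return the stored record bytes, or None if the sidecar has none."""
    b64 = sidecar.get("extensions", {}).get("raw_records", {}).get("waveform_record_b64")
    return base64.b64decode(b64) if b64 is not None else None
```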
### Operational
- [ ] **`series3-watcher` file archive manager** — 90-day-old events moved to `<watch_folder>_archive/<year>/<month>/` subfolders. Plan drafted in `claude/codec-re-cBGNe`'s plan-mode session; awaiting a 5-minute test on whether Blastware UI walks subfolders before any code lands (determines layout: in-place subfolders vs sibling archive).
- [ ] **Compliance config encoder** — build raw write payloads from a `ComplianceConfig` object.
- [ ] **Modem manager** — push RV50/RV55 configs via Sierra Wireless API.
- [ ] **Call Home dial_string write support** (requires DLE escaping for embedded control characters).
- [ ] **Histogram mode recording support** (5A stream analysis for mode 0x03 — separate from histogram ASCII parsing above).
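
For the archive-manager item above, the target path could be computed along these lines. This sketch assumes the sibling-archive layout (the in-place alternative is still undecided per the note) and assumes file age is judged by a timestamp passed in by the caller; both function names are hypothetical.

```python
from datetime import datetime, timedelta
from pathlib import Path


def is_archivable(event_time: datetime, now: datetime, max_age_days: int = 90) -> bool:
    """True once the event is at least max_age_days old."""
    return now - event_time >= timedelta(days=max_age_days)


def archive_destination(watch_folder: Path, event_file: Path, event_time: datetime) -> Path:
    """Build <watch_folder>_archive/<year>/<month>/<filename> (sibling layout)."""
    archive_root = watch_folder.with_name(watch_folder.name + "_archive")
    return archive_root / f"{event_time.year:04d}" / f"{event_time.month:02d}" / event_file.name
```

Splitting the age check from the path computation keeps the layout decision in one small function, which is the part that may change after the Blastware subfolder test.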
### Test coverage
- [ ] Verify 30-sec event download — body may exceed `0xFFFF` and force the device into a different `end_key` encoding (none of the 2/3/10-sec test cases hit this boundary).
- [ ] Histogram mode (0x03) write via SFM — confirmed working for Single Shot / Continuous / Histogram+Continuous; Histogram (0x03) needs a live test from a non-Histogram starting state.
### Lower-priority cleanups
- [ ] Compliance write anchor-9 cleanup — when changing recording_mode via SFM, a spurious `0x10` may persist after Histogram→other mode transitions. Doesn't affect device operation but differs from BW's byte-perfect output.
- [ ] Locate "Sensor Check" byte in compliance config (need capture with Disabled vs Before-monitoring).
- [ ] Call Home — map time slots 3/4 offsets; confirm `modem_power_relay_enabled`.
- [ ] RV55 DCD/DTR — newer RV55 firmware doesn't assert DCD by default; units don't resume monitoring after call-home disconnect (`--restart-monitoring` flag deferred).
- [ ] **NULL-timestamp duplicate-row dedup.** A small handful of events (2 known on prod as of 2026-05-22) have `events.timestamp IS NULL` because the codec couldn't extract a timestamp from the binary footer. The `UNIQUE(serial, timestamp)` constraint doesn't fire on `NULL` (SQL semantics: `NULL ≠ NULL`), so every `--force` backfill INSERTs a new row instead of UPSERTing the existing one. Cleanup: a one-shot SQL query that keeps only the newest row per `(serial, blastware_filename)` and deletes the rest. Longer-term: extend the unique key to `(serial, COALESCE(timestamp, blastware_filename))` or reject inserts with NULL timestamp.
- [ ] **Histogram body sub-format with `byte[5] != 0`.** ~3 events on prod (`T190LD5Q.LD0H`, `O121L4L1.GU0H`) use a histogram body my walker doesn't recognize — the first block has `byte[5] = 0x01` or `0x07` instead of `0x00`, and the entire body lacks the `1e 0a 00 00` tail signature. Codec returns 0 valid blocks; their DB PVS comes from the bw_report ASCII overlay (which BW computed from the same binary, so the DB columns are correct). Only the `.h5` waveform plot is empty. Cracking the sub-format would unlock the plot. Needs binary+ASCII pairs from a few `byte[5]!=0` events; same RE approach as the K558 case.
- [ ] **Histogram body sub-format with `byte[5] == 0x00` but undecodable.** Observed 2026-05-28 on BE17353 (S353) events: `S353L4H2.FZ0H`, `S353L4H2.P00H`, `S353L4H3.7O0H`, `S353L4H3.E10H`. Body starts `00 00 00 01 0a 00 XX 00 ...` which LOOKS like a valid histogram block header (marker 0x000a at byte[4:6] ✓, byte[5]=0x00 normal-format ✓), but the walker finds zero data blocks across the whole body. Likely an extra header before the block stream OR a different tail signature than `1e 0a 00 00`. Smaller body lengths (1900-2100 bytes) suggest these may be short-recording histogram variants. Same operational impact as the byte[5]!=0 case: event ingests cleanly, DB peaks correct via bw_report overlay, only the chart is empty. Worth dumping a hex view of one body to diagnose.
- [ ] **Sensor-check waveform extraction from the BW binary.** BW's Event Report PDFs include a narrow panel on the right side of the waveform plot showing each channel's response to the sensor self-check signal (a damped sinusoid for geo, sawtooth-at-test-freq for mic). Our parser captures the test RESULTS (`test_freq_hz`, `test_ratio`, `test_amplitude_mv`, `test_results` pass/fail) and the PDF + modal display them as text — but BW's per-sample sensor-check waveform isn't accessible to us today. Two paths to add it: (a) RE the binary to find where the sensor-check samples are stored — could be a section before STRT, after the footer, or in a separate sub-record; protocol reference doesn't currently mention it. (b) If samples aren't in the binary, synthesize a representative waveform from the test parameters (damped sinusoid at `test_freq_hz` with damping from `test_ratio`). Path (a) is the honest answer; path (b) is decorative. Until either lands, the text-only sensor-check display in the report is fine.
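
The NULL-timestamp dedup item above can be illustrated with an in-memory SQLite table. This is a sketch under stated assumptions: the real schema is not shown here, so the `id` autoincrement column (used as the "newest row" criterion), the column types, and the sample values are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE events (
           id INTEGER PRIMARY KEY AUTOINCREMENT,  -- assumed; stands in for "newest"
           serial TEXT NOT NULL,
           timestamp TEXT,
           blastware_filename TEXT,
           UNIQUE(serial, timestamp)
       )"""
)

# Three forced backfills of the same NULL-timestamp event. UNIQUE never fires
# because NULL values are treated as distinct, so three rows accumulate.
for _ in range(3):
    conn.execute(
        "INSERT INTO events (serial, timestamp, blastware_filename) VALUES (?, NULL, ?)",
        ("SN0001", "EVENT_A.BIN"),
    )
# A row with a real timestamp must be left alone by the cleanup.
conn.execute(
    "INSERT INTO events (serial, timestamp, blastware_filename) VALUES (?, ?, ?)",
    ("SN0001", "2026-01-01T00:00:00", "EVENT_B.BIN"),
)

# One-shot cleanup: keep only the highest-id NULL-timestamp row per
# (serial, blastware_filename) and delete the older duplicates.
conn.execute(
    """DELETE FROM events
       WHERE timestamp IS NULL
         AND id NOT IN (
             SELECT MAX(id) FROM events
             WHERE timestamp IS NULL
             GROUP BY serial, blastware_filename
         )"""
)
remaining = conn.execute("SELECT id, blastware_filename FROM events ORDER BY id").fetchall()
```

Restricting the delete to `timestamp IS NULL` rows keeps the cleanup from touching events that the existing unique constraint already protects.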
@@ -0,0 +1,66 @@
# analysis/ — exploratory scripts for waveform-body RE
**These are scratch.** Run them, read them, copy them, but don't trust
them as documentation. When a finding is verified it gets promoted
to `minimateplus/waveform_codec.py` and `tests/test_waveform_codec.py`;
when it's wrong it stays here as a fossil.
Authoritative status lives in:
- `docs/waveform_codec_re_status.md` (current truth, working note)
- `minimateplus/waveform_codec.py` (verified implementation + docstring)
- `tests/test_waveform_codec.py` (regression locks against fixtures)
---
## Still useful
| File | What it does |
|---|---|
| `load_bundle.py` | Fixture loader. Parses BW binary + ASCII TXT into a `Bundle` dataclass with samples, metadata, body bytes. Used by most other scripts here. |
| `verify_tran.py` | Verifies `decode_tran_initial` against fixture ground truth across all events. Useful when you change the decoder and want a quick sanity check. |
| `inspect_5_11.py` | Inspects the 5-11-26 high-amplitude bundle's body structure, prints metadata, peaks, and block counts. |
| `walk_5_11.py` | Walks blocks for the 5-11-26 bundle and prints offset/tag/length/data. |
| `seg1_blocks.py` | Dumps all blocks in segment 1 of each event. The starting point for cracking multi-segment Tran continuation. |
| `full_tran.py` | Multi-segment Tran decoder attempt (broken — diverges at sample ~512). Useful as a starting scaffold for the next experiment. |
| `multi_segment.py` | Earlier multi-segment attempt with different segment-header consumption strategies. Records what didn't work. |
| `test_rle.py` | Tests `00 NN` interpretation as zero-RLE with different divisor values. Documents how the RLE rule was confirmed. |
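
For readers new to these scripts: several of them (for example `full_tran.py`) treat the payload of a `10 NN` block as packed signed 4-bit deltas and the payload of a `20 NN` block as signed 8-bit deltas, accumulated into a running sample value. The sketch below shows only that arithmetic. It mirrors the experimental scripts, not the verified codec, and the function name `accumulate_nibble_deltas` is made up for illustration.

```python
def s4(nibble: int) -> int:
    """Sign-extend a 4-bit value (0..15) to the range -8..7."""
    return nibble if nibble < 8 else nibble - 16


def i8(byte: int) -> int:
    """Sign-extend an 8-bit value (0..255) to the range -128..127."""
    return byte if byte < 128 else byte - 256


def accumulate_nibble_deltas(data: bytes, start: int) -> list:
    """Apply high-nibble-then-low-nibble signed deltas to a running value."""
    cur = start
    out = []
    for byte in data:
        for nib in ((byte >> 4) & 0xF, byte & 0xF):
            cur += s4(nib)
            out.append(cur)
    return out
```

For instance, the byte `0x1F` carries the deltas +1 and -1, so a run starting at 10 yields 11 and then 10 again.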
## Superseded — keep for archaeology
| File | Superseded by |
|---|---|
| `walk_v2.py` through `walk_v5.py` | `walk_v6.py` and ultimately `minimateplus/waveform_codec.walk_body`. Each version represents one round of refinement. Don't read in isolation — read the diff between them to see what was learned. |
| `walk_chunks.py` | `walk_v6.py` / production walker |
| `decode_v1.py` | First naive decoder attempt. Wrong but readable. |
## Pure exploration — read if curious
| File | What it explored |
|---|---|
| `inspect_body.py` | Byte-frequency stats per event. Established that bytes 0x00 / 0x10 dominate. |
| `find_blocks.py` | Searched for repeating 2-byte tag patterns. |
| `find_signal_runs.py` | Searched for stretches of bytes that "look like a smooth signal" (small inter-byte deltas). Found the `20 NN` literal blocks. |
| `dump_head.py`, `dump_trailer.py`, `dump_around.py` | Hex dumpers at various body positions. |
| `compare_cd.py` | Byte-diff between event-c and event-d (same length, similar signal). Used to identify structural vs data bytes. |
| `brute_force.py` | Tested 192 combinations (24 channel permutations × 2 nibble orders × 2 sign conventions × 2 init-from-header settings) on the quiet bundle. All failed because the quiet bundle had T[0]=T[1]=0, making the preamble undetectable. |
| `try_nibbles.py`, `try_layouts.py` | Earlier channel-interleaving hypotheses. All wrong. |
| `test_tran_continue.py` | Test of "Tran continues uninterrupted across `30 04` blocks" hypothesis. Disproven. |
---
## Adding new scripts
If you're picking up the codec work, feel free to add new scripts here.
Suggested conventions:
- Start the filename with what you're testing: `test_<hypothesis>.py`,
`verify_<piece>.py`, `inspect_<region>.py`.
- Print enough output that the reader can see exactly which events
match / diverge and where.
- When a finding is solid, move the verified logic to
`minimateplus/waveform_codec.py` and add a regression test in
`tests/test_waveform_codec.py` — don't leave the truth only in
this directory.
- If a script is fully superseded, leave it in place (don't delete) —
the fossil record is useful when re-evaluating hypotheses later.
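
As one possible starting point for the "print enough output" convention above, a new script could reuse a small comparison helper like the one below. It follows the match/first-divergence reporting that `full_tran.py` and `multi_segment.py` already print; the function names here are hypothetical and nothing in the repo depends on them.

```python
def compare_to_truth(decoded, truth):
    """Return (matches, compared, first_divergence_index), with -1 for no divergence."""
    n = min(len(decoded), len(truth))
    matches = sum(1 for i in range(n) if decoded[i] == truth[i])
    first_div = next((i for i in range(n) if decoded[i] != truth[i]), -1)
    return matches, n, first_div


def format_report(stem, decoded, truth, context=4):
    """Build a per-event report line plus a context window around the first divergence."""
    matches, n, div = compare_to_truth(decoded, truth)
    lines = [
        f"{stem}: decoded={len(decoded)}, truth={len(truth)}, "
        f"matches={matches}/{n}, first div={div}"
    ]
    if div >= 0:
        lo, hi = max(0, div - context), div + context
        lines.append(f"  truth[{lo}:{hi}] = {truth[lo:hi]}")
        lines.append(f"  pred [{lo}:{hi}] = {decoded[lo:hi]}")
    return "\n".join(lines)
```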
@@ -0,0 +1,93 @@
"""Brute-force test channel permutations / nibble orders on event-d (simplest signal)."""
import itertools
import sys

sys.path.insert(0, ".")
from analysis.load_bundle import load_bundle
from minimateplus.waveform_codec import walk_body


def s4(n):
    """Sign-extend a 4-bit nibble (0..15) to the range -8..7."""
    return n if n < 8 else n - 16


def decode(body, channel_perm, nibble_order, sign_mode, init_from_header):
    """Try one decoder configuration on event-d.

    Returns up to 16 cumulative samples per channel (sample 0 is the initial value).
    """
    blocks = walk_body(body)
    # Initial values from bytes [4:7] (read as int8) if init_from_header else 0
    if init_from_header:
        init = [body[4] if body[4] < 128 else body[4] - 256,
                body[5] if body[5] < 128 else body[5] - 256,
                body[6] if body[6] < 128 else body[6] - 256,
                0]
    else:
        init = [0, 0, 0, 0]
    cur = list(init)
    out = [[init[0]], [init[1]], [init[2]], [init[3]]]  # sample 0 = init
    nibble_idx = 0  # within delta stream; channel = channel_perm[nibble_idx % 4]
    # Walk only the 10 NN data blocks
    for blk in blocks:
        if blk.tag_hi != 0x10:
            continue
        for byte in blk.data:
            if nibble_order == 'high_first':
                nib1, nib2 = (byte >> 4) & 0xF, byte & 0xF
            else:
                nib1, nib2 = byte & 0xF, (byte >> 4) & 0xF
            for nib in (nib1, nib2):
                if sign_mode == 'signed':
                    delta = s4(nib)
                else:
                    delta = nib
                ch = channel_perm[nibble_idx % 4]
                cur[ch] += delta
                # Emit one sample set after every 4th nibble (one delta per channel)
                if (nibble_idx + 1) % 4 == 0:
                    out[0].append(cur[0])
                    out[1].append(cur[1])
                    out[2].append(cur[2])
                    out[3].append(cur[3])
                nibble_idx += 1
            if len(out[0]) >= 16:
                return out
    return out


def best_match(pred, truth, n=10):
    """Sum of squared differences in first n samples."""
    n = min(n, len(pred), len(truth))
    return sum((pred[i] - truth[i])**2 for i in range(n))


def main():
    b = load_bundle("event-d")
    # truth in 16-count units
    tr = {ch: [round(v * 200) for v in b.samples[ch]] for ch in ("Tran", "Vert", "Long")}
    print("Truth event-d first 10 samples:")
    for ch in ("Tran", "Vert", "Long"):
        print(f" {ch}: {tr[ch][:10]}")
    # Test 24 channel permutations x 2 nibble orders x 2 sign modes x 2 init modes
    # = 192 combinations
    best = []
    for perm in itertools.permutations([0, 1, 2, 3]):
        for nibble_order in ('high_first', 'low_first'):
            for sign in ('signed', 'unsigned'):
                for init_h in (False, True):
                    decoded = decode(b.body, perm, nibble_order, sign, init_h)
                    # Score as TVL channel-sum
                    score = sum(
                        best_match(decoded[i], tr[ch], n=10)
                        for i, ch in enumerate(("Tran", "Vert", "Long"))
                    )
                    label = f"perm={perm} nib={nibble_order[:1]} sign={sign[:3]} init={init_h}"
                    best.append((score, label, decoded))
    best.sort(key=lambda x: x[0])
    print("\nTop 10 configurations:")
    for s, lbl, dec in best[:10]:
        print(f" score={s:>5} {lbl} T={dec[0][:8]} V={dec[1][:8]} L={dec[2][:8]}")


if __name__ == "__main__":
    main()
@@ -0,0 +1,42 @@
"""Compare event-c and event-d (same N_samples) to find header vs data bytes."""
import sys

sys.path.insert(0, ".")
from analysis.load_bundle import load_bundle


def main():
    bc = load_bundle("event-c")
    bd = load_bundle("event-d")
    # Compare prefixes
    nc, nd = len(bc.body), len(bd.body)
    n = min(nc, nd)
    diffs = []
    for i in range(n):
        if bc.body[i] != bd.body[i]:
            diffs.append(i)
    print(f"event-c body={nc}, event-d body={nd}")
    print(f"Total diffs (first {n}): {len(diffs)}")
    # Show common prefix
    same_prefix = 0
    for i in range(n):
        if bc.body[i] == bd.body[i]:
            same_prefix += 1
        else:
            break
    print(f"Common prefix length: {same_prefix}")
    print(f"event-c prefix: {bc.body[:same_prefix].hex(' ')}")
    # Look for runs of common bytes
    print(f"\nFirst 32 diff positions: {diffs[:32]}")
    # Show the "diff fingerprint" of the first 100 bytes
    print("\n pos c d")
    # Bound the loop by event-c's length and check event-d's length before
    # indexing it, so a short body cannot raise IndexError.
    for i in range(min(100, nc)):
        if i < nd:
            marker = " " if bc.body[i] == bd.body[i] else "*"
            print(f" {i:>3} {bc.body[i]:02x}{marker} {bd.body[i]:02x}")
        else:
            print(f" {i:>3} {bc.body[i]:02x}*")


if __name__ == "__main__":
    main()
@@ -0,0 +1,99 @@
"""
Decoder v1: nibble-pair signed deltas in 10 NN blocks, 4-channel round-robin.
"""
import sys

sys.path.insert(0, ".")
from analysis.load_bundle import load_bundle


def s4(n):
    """Sign-extend a 4-bit nibble (0..15) to the range -8..7."""
    return n if n < 8 else n - 16


def walk_blocks(body, start):
    """Walk tagged blocks from `start`; stops at the first unrecognised tag."""
    i = start
    blocks = []
    while i + 1 < len(body):
        t0, t1 = body[i], body[i + 1]
        if t0 == 0x10 and t1 % 4 == 0 and 0 < t1 <= 0xFC:
            length = t1 // 2 + 2
            data = bytes(body[i + 2 : i + length])
            blocks.append(("10", t1, data))
            i += length
        elif t0 == 0x20 and t1 % 4 == 0 and 0 < t1 <= 0xFC:
            length = t1 + 2
            data = bytes(body[i + 2 : i + length])
            blocks.append(("20", t1, data))
            i += length
        elif t0 == 0x00 and t1 % 4 == 0:
            blocks.append(("00", t1, b""))
            i += 2
        elif t0 == 0x30 and t1 % 4 == 0 and 0 < t1 <= 0x10:
            length = t1 * 4
            data = bytes(body[i + 2 : i + length])
            blocks.append(("30", t1, data))
            i += length
        elif t0 == 0x40 and t1 == 0x02:
            length = 20
            data = bytes(body[i + 2 : i + length])
            blocks.append(("40", t1, data))
            i += length
        else:
            blocks.append(("??", t0, bytes(body[i:i+8])))
            break
    return blocks


def decode_v1(body, start, n_samples):
    """Decode by accumulating nibble-pair deltas from all 10 NN blocks."""
    blocks = walk_blocks(body, start)
    # 4 channels: T, V, L, M
    cur = [0, 0, 0, 0]
    out = [[], [], [], []]
    sample_index = 0  # how many per-channel values emitted so far
    for typ, NN, data in blocks:
        if typ == "10":
            # 2 nibbles per byte, round-robin TVLM
            for byte in data:
                for nib in ((byte >> 4) & 0xF, byte & 0xF):
                    ch = sample_index % 4
                    cur[ch] += s4(nib)
                    out[ch].append(cur[ch])
                    # The original advanced the index twice per nibble (an
                    # expression equal to sample_index + 1, then += 1), which
                    # skipped channels; a single step matches the round-robin intent.
                    sample_index += 1
            # We emit per-nibble, but the structure is unclear
        elif typ == "20":
            # int8 absolute or delta?
            for byte in data:
                v = byte if byte < 128 else byte - 256
                ch = sample_index % 4
                cur[ch] = v  # treat as absolute
                out[ch].append(cur[ch])
                sample_index += 1
    return out


def main():
    b = load_bundle("event-c")
    body = b.body
    truth_T = [round(v * 200) for v in b.samples["Tran"]]
    truth_V = [round(v * 200) for v in b.samples["Vert"]]
    truth_L = [round(v * 200) for v in b.samples["Long"]]
    # Find start
    for s in range(15):
        if body[s] == 0x10 and body[s+1] % 4 == 0 and 0 < body[s+1] <= 0xFC:
            start = s
            break
    else:
        # Without this, `start` would be unbound below.
        raise SystemExit("no 10 NN block found in the first 15 body bytes")
    blocks = walk_blocks(body, start)
    # Print block-by-block what's in each
    print(f"Total blocks: {len(blocks)}")
    for typ, NN, data in blocks[:30]:
        print(f" type={typ} NN=0x{NN:02x} data_len={len(data)} data_hex={data[:32].hex(' ')}{'...' if len(data) > 32 else ''}")


if __name__ == "__main__":
    main()
@@ -0,0 +1,27 @@
"""Dump body bytes around a specific offset."""
import sys

sys.path.insert(0, ".")
from analysis.load_bundle import load_bundle


def dump_around(name: str, center: int, radius: int = 96):
    b = load_bundle(name)
    body = b.body
    start = max(0, center - radius)
    end = min(len(body), center + radius)
    print(f"\n=== {name} body[{start}:{end}] (full body={len(body)}) ===")
    for i in range(start, end, 32):
        row = body[i:i+32]
        marker = " <-- center" if i <= center < i+32 else ""
        print(f" +{i:>5} {row.hex(' ')}{marker}")


def main():
    # Look at the trailer transitions
    trailer_starts = {"event-a": 7047, "event-b": 6475, "event-c": 4043, "event-d": 3941}
    for name, off in trailer_starts.items():
        dump_around(name, off, 96)


if __name__ == "__main__":
    main()
@@ -0,0 +1,18 @@
"""Dump the START of each body in 32-byte rows."""
import sys

sys.path.insert(0, ".")
from analysis.load_bundle import load_bundle


def main():
    for name in ("event-a", "event-c"):
        b = load_bundle(name)
        body = b.body
        print(f"\n=== {name} body[0:512] (full body={len(body)}, samples={len(b.samples['Tran'])}) ===")
        for i in range(0, min(512, len(body)), 32):
            row = body[i:i+32]
            print(f" +{i:>5} {row.hex(' ')}")


if __name__ == "__main__":
    main()
@@ -0,0 +1,24 @@
"""Dump body bytes split into 32-byte rows starting from `start_offset`."""
import sys

sys.path.insert(0, ".")
from analysis.load_bundle import load_bundle


def dump(body: bytes, name: str, start: int, n_rows: int = 30):
    print(f"\n=== {name} body[{start}:] (full body={len(body)}) ===")
    end = min(start + 32 * n_rows, len(body))
    for i in range(start, end, 32):
        row = body[i:i+32]
        print(f" +{i:>5} {row.hex(' ')}")


def main():
    for name in ("event-a", "event-b", "event-c", "event-d"):
        b = load_bundle(name)
        # Print the LAST 384 bytes (12 rows of 32) of the body to see the tail structure
        start = max(0, len(b.body) - 32 * 12)
        dump(b.body, name, start, 12)


if __name__ == "__main__":
    main()
@@ -0,0 +1,41 @@
"""Search for structural repetition in the body bytes."""
import sys

sys.path.insert(0, ".")
from analysis.load_bundle import load_bundle


def find_pattern_offsets(body: bytes, pattern: bytes, max_count=20):
    """Return up to max_count offsets at which `pattern` occurs (overlaps allowed)."""
    out = []
    i = 0
    while True:
        i = body.find(pattern, i)
        if i < 0:
            break
        out.append(i)
        i += 1
        if len(out) >= max_count:
            break
    return out


def main():
    for name in ("event-a", "event-b", "event-c", "event-d"):
        b = load_bundle(name)
        body = b.body
        print(f"\n=== {name} (body={len(body)}, N_samples={len(b.samples['Tran'])}) ===")
        # Try to find repeating substructures (2-byte tag candidates, mostly 0x10-prefixed)
        for prefix in [b"\x10\x10", b"\x10\x04", b"\x10\x08", b"\x10\x0c", b"\x10\x18",
                       b"\x10\x14", b"\x10\x20", b"\x10\x40", b"\x10\x80", b"\x10\x00",
                       b"\x10\x01", b"\x10\x03", b"\x10\xf0", b"\xf1\x10", b"\x00\x10",
                       b"\x40\x02", b"\x20\x04", b"\x30\x04", b"\x30\x08", b"\x00\x1a"]:
            # The count is capped at max_count, so x200 means "200 or more".
            offs = find_pattern_offsets(body, prefix, max_count=200)
            if 1 <= len(offs) <= 1000:
                # Print the first 6 and last 3 offsets
                first = offs[:6]
                last = offs[-3:]
                print(f" '{prefix.hex()}' x{len(offs):>4} first={first} last={last}")


if __name__ == "__main__":
    main()
@@ -0,0 +1,34 @@
"""Find body byte ranges that look like absolute int8 sample data (smooth waveform)."""
import sys

sys.path.insert(0, ".")
from analysis.load_bundle import load_bundle


def looks_like_smooth_int8(buf):
    """Read bytes as int8 and return the mean absolute delta between neighbours.

    Lower values mean a smoother, more waveform-like run.
    """
    if len(buf) < 8:
        # Too short to judge; rank it last rather than as perfectly smooth.
        return float("inf")
    vals = [b if b < 128 else b - 256 for b in buf]
    diffs = [abs(vals[i+1] - vals[i]) for i in range(len(vals)-1)]
    avg_diff = sum(diffs) / len(diffs)
    return avg_diff


def main():
    for name in ("event-a", "event-c"):
        b = load_bundle(name)
        body = b.body
        # Scan with sliding window of 64 bytes; find segments where the bytes look like a smooth wave
        win = 64
        scores = []
        for i in range(len(body) - win):
            scores.append((i, looks_like_smooth_int8(body[i:i+win])))
        # Lowest avg_diff means smoothest
        scores.sort(key=lambda x: x[1])
        print(f"\n=== {name} (body={len(body)}) — smoothest 10 windows ===")
        for off, s in scores[:10]:
            print(f" +{off:>5} avg_diff={s:.2f} bytes={body[off:off+24].hex(' ')}")


if __name__ == "__main__":
    main()
@@ -0,0 +1,76 @@
"""Full Tran decoder: continues across segment headers using T_delta from header bytes [0:2]."""
import sys

sys.path.insert(0, ".")
from analysis.load_bundle import _parse_txt
from minimateplus.waveform_codec import walk_body


def s4(n):
    """Sign-extend a 4-bit nibble (0..15) to the range -8..7."""
    return n if n < 8 else n - 16


def i8(b):
    """Sign-extend a byte (0..255) to the range -128..127."""
    return b if b < 128 else b - 256


def decode_full_tran(body):
    """Decode Tran samples, or return None if the body lacks the 00 02 00 preamble."""
    if len(body) < 7 or body[0:3] != b"\x00\x02\x00":
        return None
    T0 = int.from_bytes(body[3:5], "big", signed=True)
    T1 = int.from_bytes(body[5:7], "big", signed=True)
    # Skip ahead to the first byte that looks like a block tag
    i = 7
    while i + 1 < len(body) and body[i] not in (0x00, 0x10, 0x20, 0x30, 0x40):
        i += 1
    blocks = walk_body(body, i)
    T = [T0, T1]
    cur = T1
    for blk in blocks:
        if blk.tag_hi == 0x40:
            # Segment header carries 2 T deltas (int16 BE each) at bytes [0:2] and [2:4]
            if len(blk.data) >= 4:
                delta1 = int.from_bytes(blk.data[0:2], "big", signed=True)
                cur += delta1
                T.append(cur)
                delta2 = int.from_bytes(blk.data[2:4], "big", signed=True)
                cur += delta2
                T.append(cur)
        elif blk.tag_hi == 0x10:
            for byte in blk.data:
                for nib in ((byte >> 4) & 0xF, byte & 0xF):
                    cur += s4(nib)
                    T.append(cur)
        elif blk.tag_hi == 0x20:
            for byte in blk.data:
                cur += i8(byte)
                T.append(cur)
        elif blk.tag_hi == 0x00:
            for _ in range(blk.tag_lo):
                T.append(cur)
        # 30 NN: skip for now
    return T


def main():
    for stem in ("M529LL1L.V70", "M529LL1L.JQ0", "M529LL1A.SP0", "M529LL1A.SS0", "M529LL1A.SV0"):
        path = f"tests/fixtures/5-11-26/{stem}"
        with open(path, "rb") as f:
            body = f.read()[43:-26]
        _, samples = _parse_txt(path + ".TXT")
        truth_T = [round(v*200) for v in samples["Tran"]]
        n_truth = len(truth_T)
        decoded = decode_full_tran(body)
        if decoded is None:
            # decode_full_tran returns None when the preamble is missing
            print(f"{stem}: body does not start with 00 02 00, skipped")
            continue
        n = min(len(decoded), n_truth)
        matches = sum(1 for i in range(n) if decoded[i] == truth_T[i])
        div_at = -1
        for i in range(n):
            if decoded[i] != truth_T[i]:
                div_at = i
                break
        print(f"{stem}: decoded={len(decoded)}, truth={n_truth}, matches={matches}/{n}, first div={div_at}")


if __name__ == "__main__":
    main()
@@ -0,0 +1,50 @@
"""Quick inspection of the new high-amplitude events."""
import os
import sys

sys.path.insert(0, ".")
from analysis.load_bundle import _parse_txt
from minimateplus.waveform_codec import walk_body, find_data_start

ROOT = "tests/fixtures/5-11-26"


def main():
    for stem in ("M529LL1A.SP0", "M529LL1A.SS0", "M529LL1A.SV0"):
        bin_path = os.path.join(ROOT, stem)
        txt_path = bin_path + ".TXT"
        with open(bin_path, "rb") as f:
            raw = f.read()
        body = raw[43:-26]
        meta, samples = _parse_txt(txt_path)
        n = len(samples["Tran"])
        print(f"\n=== {stem} ===")
        print(f" file={len(raw)}, body={len(body)}, N_samples={n}")
        print(f" rectime={meta.get('Record Time')} pretrig={meta.get('Pre-trigger Length')}")
        print(f" PPV(T,V,L)={meta.get('Tran PPV')} / {meta.get('Vert PPV')} / {meta.get('Long PPV')}")
        # Show first few non-trivial samples
        print(" First 5 truth samples (in/s):")
        for i in range(min(5, n)):
            print(f" T={samples['Tran'][i]:8.3f} V={samples['Vert'][i]:8.3f} "
                  f"L={samples['Long'][i]:8.3f} M={samples['MicL'][i]:8.3f}")
        # Peak sample positions (time assumes a 1024 samples/sec rate)
        for ch in ("Tran", "Vert", "Long"):
            vals = samples[ch]
            peak_i = max(range(n), key=lambda i: abs(vals[i]))
            print(f" {ch}: peak {vals[peak_i]:.3f} at sample {peak_i} (t={peak_i/1024:.3f}s)")
        # Body structure
        start = find_data_start(body)
        blocks = walk_body(body, start)
        types = {}
        for b in blocks:
            types[b.tag_hi] = types.get(b.tag_hi, 0) + 1
        print(f" body start={start}, total blocks walked: {len(blocks)}")
        print(f" block tag counts: {types}")
        # How far the walker got
        if blocks:
            last = blocks[-1]
            walked = last.offset + last.length
            print(f" walker stopped at offset {walked}/{len(body)} ({100*walked/len(body):.0f}%)")


if __name__ == "__main__":
    main()
@@ -0,0 +1,23 @@
"""Print raw body hex + byte-distribution stats for each event."""
from collections import Counter
import sys

sys.path.insert(0, ".")
from analysis.load_bundle import load_bundle


def main():
    for name in ("event-a", "event-b", "event-c", "event-d"):
        b = load_bundle(name)
        body = b.body
        print(f"\n=== {name} ({len(body)} body bytes) ===")
        print(f" STRT: {b.strt.hex()}")
        print(f" body[0:64]: {body[:64].hex()}")
        print(f" body[64:128]: {body[64:128].hex()}")
        print(f" body[-32:]: {body[-32:].hex()}")
        cnt = Counter(body)
        print(f" top 16 bytes: {[(f'0x{k:02x}', f'{v/len(body):.2%}') for k,v in cnt.most_common(16)]}")


if __name__ == "__main__":
    main()
@@ -0,0 +1,144 @@
"""
load_bundle.py — extract body bytes from BW binary + parse sample columns from TXT.
Used by the codec reverse-engineering scripts in this directory.
"""
from __future__ import annotations

import os
import re
from dataclasses import dataclass

BUNDLE_ROOT = os.path.join(
    os.path.dirname(__file__), "..", "tests", "fixtures", "decode-re-5-8-26"
)


@dataclass
class Bundle:
    name: str
    bin_path: str
    txt_path: str
    bin: bytes
    body: bytes  # bytes between STRT (43) and footer (last 26)
    strt: bytes  # 21-byte STRT record
    samples: dict  # {"Tran": [...], "Vert": [...], "Long": [...], "MicL": [...]}
    sample_rate: int
    rectime_sec: float
    pretrig_sec: float
    geo_range_ips: float
    ppv: dict  # {"Tran": float, "Vert": float, "Long": float}
    mic_pspl: float
    serial: str


def _parse_txt(path: str) -> tuple[dict, dict]:
    """Return (meta, samples) parsed from a BW ASCII report."""
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        text = f.read()
    meta = {}
    samples = {"Tran": [], "Vert": [], "Long": [], "MicL": []}
    # Find header line that starts the columns ("Tran Vert Long MicL").
    # Then every line after is sample data (4 tab-separated floats).
    lines = text.splitlines()
    header_idx = None
    for i, line in enumerate(lines):
        if "Tran" in line and "Vert" in line and "Long" in line and "MicL" in line:
            # The columns header. Sample lines start a few lines later.
            header_idx = i
            break
    if header_idx is None:
        raise ValueError(f"no Tran/Vert/Long/MicL header in {path}")
    # Parse meta — quoted lines with "Field : value".
    # The key excludes ':' so values that contain colons (e.g. times) stay whole.
    for line in lines[:header_idx]:
        m = re.match(r'^"([^":]+?)\s*:\s*([^"]*)"', line.strip())
        if m:
            k, v = m.group(1).strip(), m.group(2).strip()
            meta[k] = v
    # Parse samples; non-numeric rows between the header and the data are skipped
    for line in lines[header_idx + 1 :]:
        line = line.strip()
        if not line:
            continue
        parts = re.split(r"\s+", line)
        if len(parts) < 4:
            continue
        try:
            t = float(parts[0])
            v = float(parts[1])
            l = float(parts[2])
            m = float(parts[3])
        except ValueError:
            continue
        samples["Tran"].append(t)
        samples["Vert"].append(v)
        samples["Long"].append(l)
        samples["MicL"].append(m)
    return meta, samples


def load_bundle(name: str) -> Bundle:
    folder = os.path.join(BUNDLE_ROOT, name)
    files = os.listdir(folder)
    bin_name = next(f for f in files if not f.endswith(".TXT"))
    txt_name = next(f for f in files if f.endswith(".TXT"))
    bin_path = os.path.join(folder, bin_name)
    txt_path = os.path.join(folder, txt_name)
    with open(bin_path, "rb") as f:
        binary = f.read()
    # Header is 22 bytes; STRT at [22:43]; footer at last 26 bytes.
    strt = binary[22:43]
    body = binary[43:-26]
    meta, samples = _parse_txt(txt_path)
    sample_rate = int(re.search(r"(\d+)", meta.get("Sample Rate", "1024")).group(1))
    rectime_sec = float(re.search(r"([\d.]+)", meta.get("Record Time", "3.0")).group(1))
    pretrig_sec = float(re.search(r"-?[\d.]+", meta.get("Pre-trigger Length", "0")).group(0))
    geo_range_ips = float(re.search(r"([\d.]+)", meta.get("Geo Range", "10.0")).group(1))
    serial = meta.get("Serial Number", "").strip()

    def _f(s):
        # First signed decimal number in the field, e.g. "0.125 in/s" -> 0.125
        return float(re.search(r"-?[\d.]+", s).group(0))

    ppv = {
        "Tran": _f(meta.get("Tran PPV", "0")),
        "Vert": _f(meta.get("Vert PPV", "0")),
        "Long": _f(meta.get("Long PPV", "0")),
    }
    mic_pspl = _f(meta.get("MicL PSPL", "0"))
    return Bundle(
        name=name,
        bin_path=bin_path,
        txt_path=txt_path,
        bin=binary,
        body=body,
        strt=strt,
        samples=samples,
        sample_rate=sample_rate,
        rectime_sec=rectime_sec,
        pretrig_sec=pretrig_sec,
        geo_range_ips=geo_range_ips,
        ppv=ppv,
        mic_pspl=mic_pspl,
        serial=serial,
    )


if __name__ == "__main__":
    for name in ("event-a", "event-b", "event-c", "event-d"):
        b = load_bundle(name)
        n = len(b.samples["Tran"])
        print(f"{name}: body={len(b.body):>6} N_samples={n} rate={b.sample_rate} "
              f"rectime={b.rectime_sec} pretrig={b.pretrig_sec} range={b.geo_range_ips} "
              f"PPV(T,V,L)={b.ppv['Tran']:.3f},{b.ppv['Vert']:.3f},{b.ppv['Long']:.3f} "
              f"MicL={b.mic_pspl}")
@@ -0,0 +1,81 @@
"""Decode Tran across multiple segments by resetting at 40 02 headers."""
import sys

sys.path.insert(0, ".")
from analysis.load_bundle import _parse_txt
from minimateplus.waveform_codec import walk_body


def s4(n):
    """Sign-extend a 4-bit nibble (0..15) to the range -8..7."""
    return n if n < 8 else n - 16


def i8(b):
    """Sign-extend a byte (0..255) to the range -128..127."""
    return b if b < 128 else b - 256


def decode_full_tran(body):
    """Decode all Tran samples in the body, walking through segments.

    Returns None if the body lacks the 00 02 00 preamble.
    """
    if len(body) < 7 or body[0:3] != b"\x00\x02\x00":
        return None
    T0 = int.from_bytes(body[3:5], "big", signed=True)
    T1 = int.from_bytes(body[5:7], "big", signed=True)
    # Locate first tag
    i = 7
    while i + 1 < len(body) and body[i] not in (0x00, 0x10, 0x20, 0x30, 0x40):
        i += 1
    blocks = walk_body(body, i)
    T = [T0, T1]
    cur = T1
    for blk in blocks:
        if blk.tag_hi == 0x40:
            # Segment header — try interpreting bytes [0:2] as new T anchor
            if len(blk.data) >= 2:
                new_anchor = int.from_bytes(blk.data[0:2], "big", signed=True)
                # The next sample IS this anchor value, NOT a delta from cur.
                T.append(new_anchor)
                cur = new_anchor
        elif blk.tag_hi == 0x10:
            for byte in blk.data:
                for nib in ((byte >> 4) & 0xF, byte & 0xF):
                    cur += s4(nib)
                    T.append(cur)
        elif blk.tag_hi == 0x20:
            for byte in blk.data:
                cur += i8(byte)
                T.append(cur)
        elif blk.tag_hi == 0x00:
            # RLE: append NN zero deltas
            for _ in range(blk.tag_lo):
                T.append(cur)
        # 30 NN: skip
    return T


def main():
    for stem in ("M529LL1L.V70", "M529LL1L.JQ0", "M529LL1A.SP0", "M529LL1A.SS0", "M529LL1A.SV0"):
        path = f"tests/fixtures/5-11-26/{stem}"
        with open(path, "rb") as f:
            body = f.read()[43:-26]
        _, samples = _parse_txt(path + ".TXT")
        truth_T = [round(v*200) for v in samples["Tran"]]
        n_truth = len(truth_T)
        decoded = decode_full_tran(body)
        if decoded is None:
            # decode_full_tran returns None when the preamble is missing
            print(f"{stem}: body does not start with 00 02 00, skipped")
            continue
        n = min(len(decoded), n_truth)
        matches = sum(1 for i in range(n) if decoded[i] == truth_T[i])
        # Find first divergence
        div_at = -1
        for i in range(n):
            if decoded[i] != truth_T[i]:
                div_at = i
                break
        print(f"{stem}: decoded={len(decoded)}, truth={n_truth}, matches={matches}/{n}, first div={div_at}")
        if div_at >= 0 and div_at < 30:
            print(f" truth around div [{max(0,div_at-3)}:{div_at+8}]: {truth_T[max(0,div_at-3):div_at+8]}")
            print(f" pred around div [{max(0,div_at-3)}:{div_at+8}]: {decoded[max(0,div_at-3):div_at+8]}")


if __name__ == "__main__":
    main()
@@ -0,0 +1,28 @@
"""Dump all blocks in segment 1 of each event with their data."""
import sys

sys.path.insert(0, ".")
from minimateplus.waveform_codec import walk_body, find_data_start


def main():
    for stem in ("M529LL1A.SP0", "M529LL1L.JQ0", "M529LL1L.V70"):
        path = f"tests/fixtures/5-11-26/{stem}"
        with open(path, "rb") as f:
            body = f.read()[43:-26]
        blocks = walk_body(body, find_data_start(body))
        # Find segment 1 (between first and second 40 02)
        seg40_indices = [i for i, b in enumerate(blocks) if b.tag_hi == 0x40]
        if len(seg40_indices) < 2:
            print(f"\n{stem}: only {len(seg40_indices)} segment headers found")
            seg1_blocks = blocks[seg40_indices[0]:] if seg40_indices else []
        else:
            seg1_blocks = blocks[seg40_indices[0]:seg40_indices[1]+1]
        print(f"\n=== {stem} segment 1 ({len(seg1_blocks)} blocks) ===")
        for b in seg1_blocks[:25]:
            tag = f"{b.tag_hi:02x}{b.tag_lo:02x}"
            print(f" off={b.offset:>5} {tag} NN=0x{b.tag_lo:02x}({b.tag_lo:>3}) len={b.length:>3} data={b.data[:16].hex(' ')}{'...' if len(b.data)>16 else ''}")


if __name__ == "__main__":
    main()
+195
@@ -0,0 +1,195 @@
"""Test 12-bit signed packed deltas hypothesis for 30 NN blocks across all loud events.
For each 30 NN block in each event, identify what samples it should cover
(based on the cumulative delta count up to that point) and compare the
truth deltas against various 12-bit packing schemes.
"""
import sys
sys.path.insert(0, ".")
from analysis.load_bundle import _parse_txt
from minimateplus.waveform_codec import walk_body, find_data_start
CHANNEL_ORDER = ["Vert", "Long", "MicL", "Tran"] # rotation after initial T
def s12(v):
"""Sign-extend a 12-bit unsigned value to signed int."""
return v if v < 0x800 else v - 0x1000
def unpack_12bit_be(data):
"""4 deltas in 6 bytes, BE order: byte[0:1.5], byte[1.5:3], byte[3:4.5], byte[4.5:6]."""
# bits 0..47 (MSB-first), split into 4 × 12-bit
val = int.from_bytes(data, "big")
out = []
for i in range(4):
d = (val >> (12 * (3 - i))) & 0xFFF
out.append(s12(d))
return out
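# Worked example for the contiguous big-endian layout above. This is a sketch:
# the six bytes are invented for illustration (not taken from a fixture) and
# `_example_unpack_12bit_be` is a hypothetical helper, not part of minimateplus.
#   00 5f fe 10 00 80 -> 0x005 0xFFE 0x100 0x080 -> 5, -2, 256, 128
def _example_unpack_12bit_be():
    data = bytes([0x00, 0x5F, 0xFE, 0x10, 0x00, 0x80])
    val = int.from_bytes(data, "big")  # 48 bits, MSB first
    deltas = []
    for i in range(4):
        d = (val >> (12 * (3 - i))) & 0xFFF  # i-th 12-bit field from the top
        deltas.append(d if d < 0x800 else d - 0x1000)  # sign-extend 12 bits
    return deltas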
def unpack_12bit_le(data):
"""4 deltas in 6 bytes, LE order: bytes packed as 2 × 24-bit groups."""
out = []
# First 3 bytes contain 2 deltas
b0, b1, b2 = data[0], data[1], data[2]
d0 = b0 | ((b1 & 0x0F) << 8)
d1 = (b1 >> 4) | (b2 << 4)
out.append(s12(d0))
out.append(s12(d1))
# Next 3 bytes contain 2 more deltas
b3, b4, b5 = data[3], data[4], data[5]
d2 = b3 | ((b4 & 0x0F) << 8)
d3 = (b4 >> 4) | (b5 << 4)
out.append(s12(d2))
out.append(s12(d3))
return out
def unpack_12bit_be_per_triplet(data):
"""4 deltas as 2 triplets of (high4, low8) BE within each 3-byte group."""
out = []
b0, b1, b2 = data[0], data[1], data[2]
d0 = (b0 << 4) | (b1 >> 4)
d1 = ((b1 & 0x0F) << 8) | b2
out.append(s12(d0))
out.append(s12(d1))
b3, b4, b5 = data[3], data[4], data[5]
d2 = (b3 << 4) | (b4 >> 4)
d3 = ((b4 & 0x0F) << 8) | b5
out.append(s12(d2))
out.append(s12(d3))
return out
def truth_deltas_for_block(blocks, block_idx, event_truth, channel):
"""For a 30 NN block at block_idx, determine which samples it covers and
return the truth deltas for those samples.
Walks through all blocks before block_idx (within the same segment) and
counts how many deltas have been emitted for *channel*, starting from the
segment's anchor pair.
"""
# Find the segment header that contains this block.
seg_header_idx = None
for j in range(block_idx, -1, -1):
if blocks[j].tag_hi == 0x40:
seg_header_idx = j
break
if seg_header_idx is None:
# block is in the initial T segment; samples count from sample 2.
first_sample_in_segment = 2
else:
# Anchor pair covers samples [N, N+1] for some N. Subsequent deltas
# are samples [N+2, N+2+1, ...]. We don't actually need to know N
# for this test — just the relative position within the segment.
first_sample_in_segment = 2 # anchor=0,1; deltas start at 2
# Count deltas from segment-data start to block_idx.
delta_count = 0
start_block = seg_header_idx + 1 if seg_header_idx is not None else 0
for j in range(start_block, block_idx):
blk = blocks[j]
if blk.tag_hi == 0x10:
delta_count += blk.tag_lo # NN nibbles = NN deltas
elif blk.tag_hi == 0x20:
delta_count += blk.tag_lo # NN int8 deltas
elif blk.tag_hi == 0x00:
delta_count += blk.tag_lo # RLE zero deltas
# Now the 30 NN block carries NN deltas.
nn = blocks[block_idx].tag_lo
# First sample affected: segment first_sample + delta_count.
# But we ALSO need to know which segment this is, since the segment maps
# to a specific channel and a specific starting absolute sample index.
return first_sample_in_segment + delta_count, nn
def main():
for stem in ("M529LL1A.SP0", "M529LL1L.JQ0", "M529LL1L.V70",
"M529LL1A.SS0", "M529LL1A.SV0"):
path = f"tests/fixtures/5-11-26/{stem}"
with open(path, "rb") as f:
body = f.read()[43:-26]
_, samples = _parse_txt(path + ".TXT")
blocks = walk_body(body, find_data_start(body))
seg_idx = [i for i, b in enumerate(blocks) if b.tag_hi == 0x40]
# Find all 30 NN blocks in DATA section (not trailer).
thirty_blocks = []
for bi, b in enumerate(blocks):
if b.tag_hi != 0x30:
continue
# Determine which segment this is in
seg_num = None
for k, hi in enumerate(seg_idx):
next_hi = seg_idx[k + 1] if k + 1 < len(seg_idx) else len(blocks)
if hi < bi < next_hi:
seg_num = k
break
if seg_num is None and seg_idx and bi < seg_idx[0]:
seg_num = -1 # initial T segment
thirty_blocks.append((bi, b, seg_num))
if not thirty_blocks:
continue
print(f"\n=== {stem} ===")
for bi, b, seg_num in thirty_blocks:
# Channel for this segment
if seg_num == -1:
channel = "Tran"
seg_label = "initial T"
else:
channel = CHANNEL_ORDER[seg_num % 4]
seg_label = f"seg {seg_num}"
# Count deltas before this block within the same segment.
seg_header_idx = seg_idx[seg_num] if seg_num >= 0 else -1
start_block = seg_header_idx + 1 if seg_header_idx >= 0 else 0
delta_count = 0
for j in range(start_block, bi):
blk = blocks[j]
if blk.tag_hi in (0x10, 0x20, 0x00):
delta_count += blk.tag_lo
# First sample this 30 NN block affects (within the segment)
# = anchor positions + delta_count + 2 (since anchor pair was samples 0,1)
# But the segment's first absolute sample index in the channel is
# (seg_num // 4) * 512 (approximately) if segment 0 is the first V seg.
cycle = (seg_num // 4) if seg_num >= 0 else 0
base = cycle * 512 + 2 # +2 for anchor pair
sample_idx = base + delta_count
truth_ch = [round(v * 200) for v in samples[channel]]
nn = b.tag_lo
if sample_idx + nn >= len(truth_ch):
print(f" block @ {b.offset} ({seg_label} {channel}): out of truth range")
continue
# Get the previous sample so we can compute truth deltas
if sample_idx == 0:
prev = 0
else:
prev = truth_ch[sample_idx - 1]
truth_deltas = []
for k in range(nn):
truth_deltas.append(truth_ch[sample_idx + k] - (prev if k == 0 else truth_ch[sample_idx + k - 1]))
# Try each packing
schemes = [
("12-bit BE contiguous", unpack_12bit_be(b.data)),
("12-bit LE per-triplet", unpack_12bit_le(b.data)),
("12-bit BE per-triplet", unpack_12bit_be_per_triplet(b.data)),
]
print(f" block @ {b.offset:>5} ({seg_label} {channel}, samples {sample_idx}..{sample_idx+nn-1}):")
print(f" data: {b.data.hex(' ')}")
print(f" truth: {truth_deltas}")
for name, pred in schemes:
match = "OK " if pred == truth_deltas else "   "
n_match = sum(1 for x, y in zip(pred, truth_deltas) if x == y)
print(f" {match}{n_match}/4 {name}: {pred}")
if __name__ == "__main__":
main()
+132
@@ -0,0 +1,132 @@
"""Test the '30 NN data = high-nibbles + int8 low-bytes' hypothesis.
Layout for `30 04` (6 data bytes, 4 deltas):
bytes [0:2] = 16 bits = 4 × 4-bit high-nibbles (MSB first)
bytes [2:6] = 4 × int8 low bytes
Each delta = 12-bit signed = sign-extend((high_nibble << 8) | low_byte)
"""
import sys
sys.path.insert(0, ".")
from analysis.load_bundle import _parse_txt
from minimateplus.waveform_codec import walk_body, find_data_start
def s4(n):
return n if n < 8 else n - 16
def i8(b):
return b if b < 128 else b - 256
def sign_extend_12(v):
return v if v < 0x800 else v - 0x1000
def decode_30nn(data):
"""4 × 12-bit signed deltas (high nibble + low byte).
bytes[0:2] hold the 4 high nibbles (MSB first); bytes[2:6] hold the low bytes.
"""
if len(data) < 6:
return []
# Read high nibbles from bytes 0-1 (4 nibbles MSB-first)
high_word = (data[0] << 8) | data[1]
high_nibbles = [
(high_word >> 12) & 0xF,
(high_word >> 8) & 0xF,
(high_word >> 4) & 0xF,
high_word & 0xF,
]
out = []
for i in range(4):
v = (high_nibbles[i] << 8) | data[2 + i]
out.append(sign_extend_12(v))
return out
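# Worked example of the "high nibbles + low bytes" layout from the module
# docstring. This is a sketch: the bytes are made up for illustration (not
# from a fixture) and `_example_30_04` is a hypothetical helper.
#   bytes[0:2] = 0f 10       -> high nibbles 0x0, 0xF, 0x1, 0x0
#   bytes[2:6] = 05 fe 00 80 -> low bytes
#   12-bit values 0x005, 0xFFE, 0x100, 0x080 -> deltas 5, -2, 256, 128
def _example_30_04():
    data = bytes([0x0F, 0x10, 0x05, 0xFE, 0x00, 0x80])
    high_word = (data[0] << 8) | data[1]
    deltas = []
    for i in range(4):
        high = (high_word >> (12 - 4 * i)) & 0xF  # i-th nibble, MSB first
        v = (high << 8) | data[2 + i]
        deltas.append(v if v < 0x800 else v - 0x1000)  # sign-extend 12 bits
    return deltas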
def simulate_up_to(blocks, target_block_idx, t_preamble):
"""Run decoder up to block_idx; return per-channel sample lists.
NOW with 30 NN decoded too."""
out = {"Tran": [], "Vert": [], "Long": [], "MicL": []}
out["Tran"].extend(t_preamble)
cur = {"Tran": t_preamble[-1], "Vert": None, "Long": None, "MicL": None}
rotation = ["Vert", "Long", "MicL", "Tran"]
current_channel = "Tran"
seg_counter = -1
for j in range(target_block_idx):
blk = blocks[j]
if blk.tag_hi == 0x40:
seg_counter += 1
prev = "Tran" if seg_counter == 0 else rotation[(seg_counter - 1) % 4]
new_ch = rotation[seg_counter % 4]
if cur[prev] is not None:
d0 = int.from_bytes(blk.data[0:2], "big", signed=True)
d1 = int.from_bytes(blk.data[2:4], "big", signed=True)
cur[prev] += d0; out[prev].append(cur[prev])
cur[prev] += d1; out[prev].append(cur[prev])
c0 = int.from_bytes(blk.data[14:16], "big", signed=True)
c1 = int.from_bytes(blk.data[16:18], "big", signed=True)
out[new_ch].extend([c0, c1])
cur[new_ch] = c1
current_channel = new_ch
elif blk.tag_hi == 0x10:
for byte in blk.data:
for nib in ((byte >> 4) & 0xF, byte & 0xF):
cur[current_channel] += s4(nib)
out[current_channel].append(cur[current_channel])
elif blk.tag_hi == 0x20:
for byte in blk.data:
cur[current_channel] += i8(byte)
out[current_channel].append(cur[current_channel])
elif blk.tag_hi == 0x00:
for _ in range(blk.tag_lo):
out[current_channel].append(cur[current_channel])
elif blk.tag_hi == 0x30:
# NEW: decode 30 NN
deltas = decode_30nn(blk.data)
for d in deltas:
cur[current_channel] += d
out[current_channel].append(cur[current_channel])
return out, current_channel
def main():
for stem in ("M529LL1A.SP0", "M529LL1L.JQ0", "M529LL1L.V70",
"M529LL1A.SS0", "M529LL1A.SV0"):
path = f"tests/fixtures/5-11-26/{stem}"
with open(path, "rb") as f:
body = f.read()[43:-26]
_, samples = _parse_txt(path + ".TXT")
blocks = walk_body(body, find_data_start(body))
t0 = int.from_bytes(body[3:5], "big", signed=True)
t1 = int.from_bytes(body[5:7], "big", signed=True)
thirty_blocks = [(j, b) for j, b in enumerate(blocks) if b.tag_hi == 0x30]
if not thirty_blocks:
continue
print(f"\n=== {stem} ===")
for j, blk in thirty_blocks:
pred, ch = simulate_up_to(blocks, j, [t0, t1])
cur_before = pred[ch][-1]
truth = [round(v * 200) for v in samples[ch]]
n_pred = len(pred[ch])
nn = blk.tag_lo
if n_pred + nn > len(truth):
continue
# Decode this 30 NN block with hypothesis
pred_deltas = decode_30nn(blk.data)
# Compute truth deltas relative to cur_before
truth_deltas = []
prev = cur_before
for k in range(nn):
truth_deltas.append(truth[n_pred + k] - prev)
prev = truth[n_pred + k]
n_match = sum(1 for a, b in zip(pred_deltas, truth_deltas) if a == b)
tag = "OK " if pred_deltas == truth_deltas else "   "
print(f" block @ {blk.offset:>5} (chan={ch}, NN={nn}):")
print(f" data: {blk.data.hex(' ')}")
print(f" truth: {truth_deltas}")
print(f" pred: {pred_deltas} {tag}{n_match}/{nn}")
if __name__ == "__main__":
main()
+141
@@ -0,0 +1,141 @@
"""Test 30 NN packing by running the real decoder up to each 30 NN block,
recording how many samples have been produced for each channel at that point,
then checking truth deltas immediately after."""
import sys
sys.path.insert(0, ".")
from analysis.load_bundle import _parse_txt
from minimateplus.waveform_codec import walk_body, find_data_start
def s4(n):
return n if n < 8 else n - 16
def i8(b):
return b if b < 128 else b - 256
def s12(v):
return v if v < 0x800 else v - 0x1000
def unpack_12bit_be_contiguous(data):
out = []
val = int.from_bytes(data, "big")
n = len(data) * 8 // 12
for i in range(n):
d = (val >> (12 * (n - 1 - i))) & 0xFFF
out.append(s12(d))
return out
def unpack_12bit_per_triplet_be(data):
out = []
for i in range(0, len(data), 3):
if i + 2 >= len(data):
break
b0, b1, b2 = data[i], data[i + 1], data[i + 2]
d0 = (b0 << 4) | (b1 >> 4)
d1 = ((b1 & 0x0F) << 8) | b2
out.append(s12(d0))
out.append(s12(d1))
return out
def simulate_up_to(blocks, target_block_idx, t_preamble):
"""Run the decoder up to block_idx; return per-channel sample lists."""
out = {"Tran": [], "Vert": [], "Long": [], "MicL": []}
out["Tran"].extend(t_preamble)
cur = {"Tran": t_preamble[-1], "Vert": None, "Long": None, "MicL": None}
rotation = ["Vert", "Long", "MicL", "Tran"]
seg_idx = [j for j, b in enumerate(blocks) if b.tag_hi == 0x40]
# Determine which channel we're CURRENTLY decoding into
current_channel = "Tran"
seg_counter = -1 # incremented at each 40 02
for j in range(target_block_idx):
blk = blocks[j]
if blk.tag_hi == 0x40:
# Switch: extend prev channel, set up new channel
seg_counter += 1
prev = "Tran" if seg_counter == 0 else rotation[(seg_counter - 1) % 4]
new_ch = rotation[seg_counter % 4]
if cur[prev] is not None:
d0 = int.from_bytes(blk.data[0:2], "big", signed=True)
d1 = int.from_bytes(blk.data[2:4], "big", signed=True)
cur[prev] += d0; out[prev].append(cur[prev])
cur[prev] += d1; out[prev].append(cur[prev])
c0 = int.from_bytes(blk.data[14:16], "big", signed=True)
c1 = int.from_bytes(blk.data[16:18], "big", signed=True)
out[new_ch].extend([c0, c1])
cur[new_ch] = c1
current_channel = new_ch
elif blk.tag_hi == 0x10:
for byte in blk.data:
for nib in ((byte >> 4) & 0xF, byte & 0xF):
cur[current_channel] += s4(nib)
out[current_channel].append(cur[current_channel])
elif blk.tag_hi == 0x20:
for byte in blk.data:
cur[current_channel] += i8(byte)
out[current_channel].append(cur[current_channel])
elif blk.tag_hi == 0x00:
for _ in range(blk.tag_lo):
out[current_channel].append(cur[current_channel])
elif blk.tag_hi == 0x30:
# Skip for now — we want to know what comes next
pass
return out, current_channel
def main():
for stem in ("M529LL1A.SP0", "M529LL1L.JQ0", "M529LL1L.V70",
"M529LL1A.SS0", "M529LL1A.SV0"):
path = f"tests/fixtures/5-11-26/{stem}"
with open(path, "rb") as f:
body = f.read()[43:-26]
_, samples = _parse_txt(path + ".TXT")
blocks = walk_body(body, find_data_start(body))
t0 = int.from_bytes(body[3:5], "big", signed=True)
t1 = int.from_bytes(body[5:7], "big", signed=True)
# Find all 30 NN blocks in data section
thirty_blocks = [(j, b) for j, b in enumerate(blocks) if b.tag_hi == 0x30]
if not thirty_blocks:
continue
print(f"\n=== {stem} ===")
for j, blk in thirty_blocks:
pred, ch = simulate_up_to(blocks, j, [t0, t1])
n_pred = len(pred[ch])
# The 30 NN block carries NN deltas for channel `ch` starting at sample n_pred
truth = [round(v * 200) for v in samples[ch]]
if n_pred >= len(truth):
continue
# Truth deltas: truth[n_pred] - cur, truth[n_pred+1] - truth[n_pred], ...
cur_val = pred[ch][-1]
nn = blk.tag_lo
truth_deltas = []
prev = cur_val
for k in range(min(nn, len(truth) - n_pred)):
truth_deltas.append(truth[n_pred + k] - prev)
prev = truth[n_pred + k]
print(f" block @ {blk.offset:>5} (chan={ch}, after sample {n_pred-1}, "
f"NN={nn}, last_val={cur_val}):")
print(f" data: {blk.data.hex(' ')}")
print(f" truth: {truth_deltas}")
schemes = [
("12-bit BE contiguous", unpack_12bit_be_contiguous(blk.data)),
("12-bit per-triplet BE", unpack_12bit_per_triplet_be(blk.data)),
]
for name, pred_deltas in schemes:
n_match = sum(1 for a, b in zip(pred_deltas, truth_deltas) if a == b)
tag = "OK " if pred_deltas == truth_deltas else "   "
print(f" {tag}{n_match}/{nn} {name}: {pred_deltas[:nn]}")
if __name__ == "__main__":
main()
+86
@@ -0,0 +1,86 @@
"""Test: 00 NN markers might be RLE for zero-deltas in current channel."""
import sys
sys.path.insert(0, ".")
from analysis.load_bundle import _parse_txt
from minimateplus.waveform_codec import walk_body, find_data_start
def s4(n):
return n if n < 8 else n - 16
def i8(b):
return b if b < 128 else b - 256
def decode_with_rle(body):
"""Decode Tran assuming:
- preamble[3:5], [5:7] = T[0], T[1]
- All 10 NN / 20 NN blocks until segment_header (40 02) are Tran deltas
- 00 NN markers are RLE: NN/4 zero T deltas (or NN, or NN/2 — try them)
"""
if len(body) < 9 or body[0:3] != b"\x00\x02\x00":
return None, None, None
T0 = int.from_bytes(body[3:5], "big", signed=True)
T1 = int.from_bytes(body[5:7], "big", signed=True)
# Find first tag (might be 00 NN, 10 NN, or 20 NN)
i = 7
while i + 1 < len(body):
if body[i] in (0x00, 0x10, 0x20):
break
i += 1
start = i
blocks = walk_body(body, start)
results = {}
for rle_div in (4, 2, 1): # try different RLE interpretations
T = [T0, T1]
cur = T1
for blk in blocks:
if blk.tag_hi == 0x40:
break
if blk.tag_hi == 0x10:
for byte in blk.data:
for nib in ((byte >> 4) & 0xF, byte & 0xF):
cur += s4(nib)
T.append(cur)
elif blk.tag_hi == 0x20:
for byte in blk.data:
cur += i8(byte)
T.append(cur)
elif blk.tag_hi == 0x00:
# RLE of zero deltas
n_zeros = blk.tag_lo // rle_div
for _ in range(n_zeros):
T.append(cur)
# 30 NN: skip for now
results[rle_div] = T
return results, T0, T1
def main():
for stem in ("M529LL1L.V70", "M529LL1L.JQ0", "M529LL1A.SP0", "M529LL1A.SS0", "M529LL1A.SV0"):
path = f"tests/fixtures/5-11-26/{stem}"
with open(path, "rb") as f:
body = f.read()[43:-26]
_, samples = _parse_txt(path + ".TXT")
truth_T = [round(v*200) for v in samples["Tran"]]
results, T0, T1 = decode_with_rle(body)
print(f"\n=== {stem} (T[0]={T0}, T[1]={T1}) ===")
for rle_div, T in results.items():
n = min(len(T), len(truth_T))
matches = sum(1 for i in range(n) if T[i] == truth_T[i])
# Find first divergence
div_at = -1
for i in range(n):
if T[i] != truth_T[i]:
div_at = i
break
print(f" rle_div={rle_div}: decoded {len(T)}, matches {matches}/{n}, first div at sample {div_at}")
if __name__ == "__main__":
main()
+71
@@ -0,0 +1,71 @@
"""Test: does the second '20 NN' block in SS0 continue Tran samples?"""
import sys
sys.path.insert(0, ".")
from analysis.load_bundle import _parse_txt
from minimateplus.waveform_codec import walk_body, find_data_start
def s4(n):
return n if n < 8 else n - 16
def i8(b):
return b if b < 128 else b - 256
def main():
stem = "M529LL1A.SS0"
path = f"tests/fixtures/5-11-26/{stem}"
with open(path, "rb") as f:
body = f.read()[43:-26]
_, samples = _parse_txt(path + ".TXT")
truth_T_16 = [round(v * 200) for v in samples["Tran"]]
# Preamble
T0 = int.from_bytes(body[3:5], "big", signed=True)
T1 = int.from_bytes(body[5:7], "big", signed=True)
# Walk blocks
start = find_data_start(body)
blocks = walk_body(body, start)
print(f"=== {stem} === T[0]={T0} T[1]={T1}")
# Hypothesis: Tran continues through ALL 10 NN and 20 NN blocks
# in order, until the next 40 02 segment header (which resets).
T = [T0, T1]
cur = T1
decoded_count = 2 # T[0], T[1] from preamble
for bi, blk in enumerate(blocks):
if blk.tag_hi == 0x10:
for byte in blk.data:
for nib in ((byte >> 4) & 0xF, byte & 0xF):
cur += s4(nib)
T.append(cur)
decoded_count += 1
elif blk.tag_hi == 0x20:
for byte in blk.data:
cur += i8(byte)
T.append(cur)
decoded_count += 1
elif blk.tag_hi == 0x40:
# Segment header — stop here for this test
break
# 00 and 30 NN don't contribute to Tran (in this hypothesis)
# Compare to truth
print(f" Decoded {len(T)} T samples up to first 40 02")
matches = sum(1 for i in range(min(len(T), len(truth_T_16))) if T[i] == truth_T_16[i])
print(f" Matches in first {min(len(T), len(truth_T_16))}: {matches}")
# Print first divergence
for i in range(min(len(T), len(truth_T_16))):
if T[i] != truth_T_16[i]:
print(f" First divergence: sample {i}: pred={T[i]}, truth={truth_T_16[i]}")
# Show context
print(f" pred [{max(0,i-3)}:{i+5}]: {T[max(0,i-3):i+5]}")
print(f" truth [{max(0,i-3)}:{i+5}]: {truth_T_16[max(0,i-3):i+5]}")
break
if __name__ == "__main__":
main()
+67
@@ -0,0 +1,67 @@
"""Try various nibble-level channel interleavings to find which one matches truth."""
import sys
sys.path.insert(0, ".")
from analysis.load_bundle import load_bundle
def s4(n):
return n if n < 8 else n - 16
def run_decoder(body, layout, skip, n_channels=4):
"""layout: function nibble_index -> channel_index. Returns list-of-lists per channel."""
out = [[] for _ in range(n_channels)]
cur = [0] * n_channels
nibbles = []
for byte in body[skip:]:
nibbles.append((byte >> 4) & 0xF)
nibbles.append(byte & 0xF)
for i, n in enumerate(nibbles):
ch = layout(i)
cur[ch] += s4(n)
out[ch].append(cur[ch])
return out
def cmp(pred, truth, n=24):
n = min(n, len(pred), len(truth))
return [(pred[i], truth[i]) for i in range(n)]
def main():
b = load_bundle("event-c")
truth_T = [round(v * 200) for v in b.samples["Tran"]]
truth_V = [round(v * 200) for v in b.samples["Vert"]]
truth_L = [round(v * 200) for v in b.samples["Long"]]
print(f"T truth[0:10]: {truth_T[:10]}")
print(f"V truth[0:10]: {truth_V[:10]}")
print(f"L truth[0:10]: {truth_L[:10]}")
# Try several nibble->channel layouts (4 channels)
layouts = {
"interleaved TVLM (0,1,2,3,0,1,2,3,...)": lambda i: i % 4,
"interleaved VLMT": lambda i: (i + 3) % 4,
"interleaved LMTV": lambda i: (i + 2) % 4,
"interleaved MTVL": lambda i: (i + 1) % 4,
"byte-based TV LM TV LM (high T low V byte0; high L low M byte1)": lambda i: i % 4,
# "chunks of 8 nibbles per channel": each channel gets 8 nibbles in a row
"chunks-8 TVLM": lambda i: (i // 8) % 4,
"chunks-16 TVLM": lambda i: (i // 16) % 4,
# planar (full channel sequential)
"planar T(0..N) V(N..2N) L(2N..3N) M(3N..4N)": None, # special
}
for label, layout_fn in layouts.items():
if layout_fn is None:
continue
for skip in (0, 4, 7, 8, 9, 11, 14):
out = run_decoder(b.body, layout_fn, skip)
# Check first 8 cumulative on each channel
print(f" skip={skip:2} {label}")
print(f" T_cum[0:10]: {out[0][:10]}")
print(f" V_cum[0:10]: {out[1][:10]}")
print(f" L_cum[0:10]: {out[2][:10]}")
if __name__ == "__main__":
main()
+73
@@ -0,0 +1,73 @@
"""Try decoding body as 4-bit signed nibble deltas, 4-channel round-robin."""
import sys
sys.path.insert(0, ".")
from analysis.load_bundle import load_bundle
CHANNELS = ("Tran", "Vert", "Long", "MicL")
def s4(n):
"""Sign-extend a 4-bit unsigned to int (0..7 → 0..7, 8..F → -8..-1)."""
return n if n < 8 else n - 16
def decode_nibbles(body: bytes, skip_bytes: int = 7, n_channels: int = 4):
"""Read body as 2 nibbles per byte; accumulate as deltas for n_channels round-robin."""
out = [[] for _ in range(n_channels)]
cur = [0] * n_channels
ch = 0
nibbles = []
for byte in body[skip_bytes:]:
nibbles.append((byte >> 4) & 0xF)
nibbles.append(byte & 0xF)
for n in nibbles:
cur[ch] += s4(n)
out[ch].append(cur[ch])
ch = (ch + 1) % n_channels
return out
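# Minimal single-channel illustration of the nibble-delta idea tested above.
# This is a sketch with made-up bytes; `_example_nibble_deltas` is a
# hypothetical helper and does not model the 4-channel round-robin.
#   0x1f -> nibbles 1, F -> deltas +1, -1
#   0x70 -> nibbles 7, 0 -> deltas +7,  0
def _example_nibble_deltas(start=10, data=b"\x1f\x70"):
    samples = []
    cur = start
    for byte in data:
        for nib in ((byte >> 4) & 0xF, byte & 0xF):  # high nibble first
            cur += nib if nib < 8 else nib - 16  # sign-extend 4 bits
            samples.append(cur)
    return samples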
def cmp_to_truth(pred, truth, scale=16):
"""Compare predicted ints (in 16-count units) to truth (in 16-count units = txt * 200).
Return (max_abs_err, mean_abs_err, n_compared).
"""
n = min(len(pred), len(truth))
errs = []
for i in range(n):
p = pred[i]
t = truth[i]
errs.append(abs(p - t))
if not errs:
return None
return (max(errs), sum(errs) / len(errs), n)
def main():
for name in ("event-a", "event-c"):
b = load_bundle(name)
# Convert TXT samples (in/s) to 16-count units (multiply by 200, since 0.005 in/s = 1)
# WAIT: 0.005 in/s = 16 ADC counts. 1 count = 0.000305 in/s.
# So in 1-count units: count = txt * (1/0.0003052) ≈ txt * 3276.7
# But TXT only has 0.005 resolution so equivalent to 16-count units = txt * 200.
truth_in_16 = {ch: [round(v * 200) for v in b.samples[ch]] for ch in CHANNELS[:3]}
# MicL is in dB, skip for now
# Try decoder with skip_bytes = 7
decoded = decode_nibbles(b.body, skip_bytes=7, n_channels=4)
print(f"\n=== {name} ===")
print(f" body={len(b.body)}, nibbles={2*(len(b.body)-7)}, samples_per_ch={len(decoded[0])}")
print(f" truth samples per ch: {len(truth_in_16['Tran'])}")
# Print first 24 of each
for i, chan in enumerate(CHANNELS):
pred_first = decoded[i][:24]
if chan in truth_in_16:
truth_first = truth_in_16[chan][:24]
print(f" {chan} pred: {pred_first}")
print(f" {chan} truth: {truth_first}")
else:
print(f" {chan} pred: {pred_first} (truth in dB, skipped)")
if __name__ == "__main__":
main()
+32
@@ -0,0 +1,32 @@
"""Verify decode_waveform_v2 against BW ASCII truth for all fixtures."""
import sys
sys.path.insert(0, ".")
from analysis.load_bundle import _parse_txt
from minimateplus.waveform_codec import decode_waveform_v2


def main():
    for stem in ("M529LL1A.SP0", "M529LL1A.SS0", "M529LL1A.SV0",
                 "M529LL1L.JQ0", "M529LL1L.V70"):
        path = f"tests/fixtures/5-11-26/{stem}"
        with open(path, "rb") as f:
            body = f.read()[43:-26]
        _, samples = _parse_txt(path + ".TXT")
        decoded = decode_waveform_v2(body)
        if decoded is None:
            print(f"{stem}: decoder returned None")
            continue
        print(f"\n=== {stem} ===")
        for ch in ("Tran", "Vert", "Long"):
            truth = [round(v * 200) for v in samples[ch]]
            pred = decoded[ch]
            n = min(len(pred), len(truth))
            matches = sum(1 for i in range(n) if pred[i] == truth[i])
            div = next((i for i in range(n) if pred[i] != truth[i]), -1)
            print(f" {ch}: decoded={len(pred):>5} truth={len(truth):>5} "
                  f"matches={matches:>5}/{n:<5} first div={div}")


if __name__ == "__main__":
    main()
+55
@@ -0,0 +1,55 @@
"""Run decode_waveform_v2 against the 5-8-26 quiet bundle to test the
'quiet events should decode fully' hypothesis."""
import os, sys
sys.path.insert(0, ".")
from minimateplus.waveform_codec import decode_waveform_v2, walk_body, find_data_start
from analysis.load_bundle import _parse_txt
def main():
base = "tests/fixtures/decode-re-5-8-26"
for evt in sorted(os.listdir(base)):
folder = os.path.join(base, evt)
if not os.path.isdir(folder):
continue
# Find the binary (not .TXT)
bin_name = next(
(f for f in os.listdir(folder) if not f.endswith(".TXT")),
None,
)
if not bin_name:
continue
bin_path = os.path.join(folder, bin_name)
txt_path = bin_path + ".TXT"
if not os.path.exists(txt_path):
# Sometimes the TXT name differs slightly
for f in os.listdir(folder):
if f.endswith(".TXT"):
txt_path = os.path.join(folder, f)
break
with open(bin_path, "rb") as f:
body = f.read()[43:-26]
decoded = decode_waveform_v2(body)
_, samples = _parse_txt(txt_path)
# Count 30 NN blocks
blocks = walk_body(body, find_data_start(body))
n_30 = sum(1 for b in blocks if b.tag_hi == 0x30)
n_40 = sum(1 for b in blocks if b.tag_hi == 0x40)
print(f"\n=== {evt} === body={len(body)} segments={n_40} '30 NN' blocks={n_30}")
if decoded is None:
print(" decoder returned None")
continue
for ch in ("Tran", "Vert", "Long"):
truth = [round(v * 200) for v in samples[ch]]
pred = decoded[ch]
n = min(len(pred), len(truth))
matches = sum(1 for i in range(n) if pred[i] == truth[i])
div = next((i for i in range(n) if pred[i] != truth[i]), -1)
print(f" {ch}: decoded={len(pred):>5} truth={len(truth):>5} "
f"matches={matches:>5}/{n:<5} first div={div}")
if __name__ == "__main__":
main()
+71
@@ -0,0 +1,71 @@
"""Verify: preamble[3:7] = Tran[0], Tran[1] as int16 BE in 16-count units.
And first 20/10 NN block = Tran deltas starting at sample 2.
"""
import os, sys
sys.path.insert(0, ".")
from analysis.load_bundle import _parse_txt
from minimateplus.waveform_codec import walk_body, find_data_start
def s4(n):
return n if n < 8 else n - 16
def i8(b):
return b if b < 128 else b - 256
def main():
for stem in ("M529LL1A.SP0", "M529LL1A.SS0", "M529LL1A.SV0"):
path = f"tests/fixtures/5-11-26/{stem}"
with open(path, "rb") as f:
raw = f.read()
body = raw[43:-26]
_, samples = _parse_txt(path + ".TXT")
truth_T_16 = [round(v * 200) for v in samples["Tran"]]
# Preamble parse
T0_pre = int.from_bytes(body[3:5], "big", signed=True)
T1_pre = int.from_bytes(body[5:7], "big", signed=True)
print(f"\n=== {stem} ===")
print(f" Preamble T[0]={T0_pre} (truth {truth_T_16[0]}) T[1]={T1_pre} (truth {truth_T_16[1]}) match={T0_pre==truth_T_16[0] and T1_pre==truth_T_16[1]}")
# First block
start = find_data_start(body)
blocks = walk_body(body, start)
if not blocks:
print(f" no blocks found")
continue
# Assume first block = Tran deltas from sample 2
first = blocks[0]
T = [T0_pre, T1_pre]
cur_T = T1_pre
if first.tag_hi == 0x10:
# Nibble pairs
for byte in first.data:
for nib in ((byte >> 4) & 0xF, byte & 0xF):
cur_T += s4(nib)
T.append(cur_T)
elif first.tag_hi == 0x20:
# int8 per byte
for byte in first.data:
cur_T += i8(byte)
T.append(cur_T)
# Compare against truth
n_check = min(len(T), len(truth_T_16))
match_count = sum(1 for i in range(n_check) if T[i] == truth_T_16[i])
print(f" First block type=0x{first.tag_hi:02x} NN=0x{first.tag_lo:02x} len={len(first.data)} -> {len(T)} T samples decoded")
print(f" Tran predicted[0:10]: {T[:10]}")
print(f" Tran truth [0:10]: {truth_T_16[:10]}")
print(f" Matches in first {n_check}: {match_count} / {n_check}")
# Show where it diverges
for i in range(n_check):
if T[i] != truth_T_16[i]:
print(f" First divergence: sample {i}: pred={T[i]}, truth={truth_T_16[i]}")
break
if __name__ == "__main__":
main()
+20
@@ -0,0 +1,20 @@
"""Walk blocks of the new 5-11-26 events and look at what comes after Tran block."""
import sys
sys.path.insert(0, ".")
from minimateplus.waveform_codec import walk_body, find_data_start


def main():
    for stem in ("M529LL1A.SP0", "M529LL1A.SS0", "M529LL1A.SV0"):
        with open(f"tests/fixtures/5-11-26/{stem}", "rb") as f:
            raw = f.read()
        body = raw[43:-26]
        start = find_data_start(body)
        blocks = walk_body(body, start)
        print(f"\n=== {stem} === body={len(body)} start={start} blocks walked={len(blocks)}")
        for i, b in enumerate(blocks[:20]):
            print(f" block[{i:>2}] @ {b.offset:>5} tag={b.tag_hi:02x} NN=0x{b.tag_lo:02x}({b.tag_lo}) len={b.length} data[:24]={b.data[:24].hex(' ')}")


if __name__ == "__main__":
    main()
+44
@@ -0,0 +1,44 @@
"""Walk the body assuming chunks delimited by 0x10 NN tags. Print each chunk's structure."""
import sys
sys.path.insert(0, ".")
from analysis.load_bundle import load_bundle
def walk(body: bytes, start_offset: int = 7, max_chunks: int = 30):
"""Find all positions where byte = 0x10 followed by a multiple-of-4 byte. Print chunks."""
chunks = []
i = start_offset
while i < len(body) - 1:
# Find next `10 NN` where NN is multiple of 4 (and not preceded by another 0x10 immediately, which would be data).
if body[i] == 0x10 and (body[i+1] % 4 == 0):
chunks.append(i)
i += 1
return chunks
def main():
for name in ("event-c", "event-d"):
b = load_bundle(name)
body = b.body
positions = []
i = 7 # skip 7-byte preamble
while i < len(body) - 1:
if body[i] == 0x10 and body[i+1] % 4 == 0 and body[i+1] > 0:
positions.append(i)
i += 2 # skip past tag
else:
i += 1
print(f"\n=== {name} === body={len(body)}, total `10 NN` (NN%4==0, NN>0) tags: {len(positions)}")
# Print first 20 chunks: show position, NN, gap to next tag
for k in range(min(30, len(positions))):
pos = positions[k]
NN = body[pos + 1]
next_pos = positions[k+1] if k+1 < len(positions) else len(body)
gap = next_pos - pos
data_bytes = body[pos+2 : next_pos]
print(f" chunk[{k:>3}] @ {pos:>5} NN=0x{NN:02x} ({NN:>3}, NN/2={NN//2}) gap={gap:>3} "
f"data={data_bytes[:24].hex(' ')}{'...' if len(data_bytes) > 24 else ''}")
if __name__ == "__main__":
main()
+50
@@ -0,0 +1,50 @@
"""Deterministic chunk walker: each chunk = [10 NN][NN/2 bytes data][2 bytes trailer]."""
import sys
sys.path.insert(0, ".")
from analysis.load_bundle import load_bundle
def walk_chunks(body: bytes, start: int = 7):
"""Yield (offset, NN, data_bytes, trailer_bytes) tuples."""
i = start
while i + 1 < len(body):
if body[i] != 0x10:
break
NN = body[i + 1]
if NN == 0 or NN > 0x80 or NN % 4 != 0:
break
chunk_len = NN // 2 + 4
if i + chunk_len > len(body):
break
data = bytes(body[i + 2 : i + 2 + NN // 2])
trailer = bytes(body[i + 2 + NN // 2 : i + chunk_len])
yield (i, NN, data, trailer)
i += chunk_len
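# Size arithmetic under this walker's hypothesis, as a sketch on a synthetic
# body (the bytes and `_example_chunk_split` are illustrative assumptions):
# a `10 08` chunk is 2 tag bytes + 8/2 = 4 data bytes + 2 trailer bytes = 8 bytes,
# and a `10 04` chunk is 2 + 2 + 2 = 6 bytes.
def _example_chunk_split(body=bytes.fromhex("1008 11223344 aabb 1004 5566 ccdd")):
    out = []
    i = 0
    while i + 1 < len(body) and body[i] == 0x10:
        nn = body[i + 1]
        chunk_len = nn // 2 + 4  # tag (2) + data (NN/2) + trailer (2)
        if nn == 0 or nn % 4 != 0 or i + chunk_len > len(body):
            break
        data_end = i + 2 + nn // 2
        out.append((i, nn, body[i + 2:data_end].hex(), body[data_end:i + chunk_len].hex()))
        i += chunk_len
    return out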
def main():
for name in ("event-c", "event-d", "event-a", "event-b"):
b = load_bundle(name)
body = b.body
chunks = list(walk_chunks(body))
print(f"\n=== {name} === body={len(body)} N_samples={len(b.samples['Tran'])}")
print(f" chunks parsed: {len(chunks)}")
if chunks:
last = chunks[-1]
end_of_walk = last[0] + last[1] // 2 + 4
print(f" walk ended at offset {end_of_walk} (= {len(body) - end_of_walk} bytes from end)")
# Stats
total_data_bytes = sum(len(c[2]) for c in chunks)
print(f" total data bytes: {total_data_bytes}, total nibbles: {2*total_data_bytes}")
if name in ("event-c", "event-d"):
ratio = (2 * total_data_bytes) / (len(b.samples['Tran']) * 4)
print(f" nibbles per (sample × channel): {ratio:.3f}")
# Sum of trailer second-byte
trailer_sums = [c[3][-1] if c[3] else None for c in chunks]
print(f" first 10 chunks: {[(c[0], c[1], c[3].hex()) for c in chunks[:10]]}")
# Print last 10 chunks (likely transition to trailer)
print(f" last 10 chunks: {[(c[0], c[1], c[3].hex()) for c in chunks[-10:]]}")
if __name__ == "__main__":
main()
+51
@@ -0,0 +1,51 @@
"""Walk chunks; auto-detect preamble length by finding first 10 NN."""
import sys
sys.path.insert(0, ".")
from analysis.load_bundle import load_bundle


def walk_chunks(body, start, max_NN=0x80):
    chunks = []
    i = start
    while i + 1 < len(body):
        if body[i] != 0x10:
            break
        NN = body[i + 1]
        if NN == 0 or NN > max_NN or NN % 4 != 0:
            break
        chunk_len = NN // 2 + 4
        if i + chunk_len > len(body):
            break
        data = bytes(body[i + 2 : i + 2 + NN // 2])
        trailer = bytes(body[i + 2 + NN // 2 : i + chunk_len])
        chunks.append((i, NN, data, trailer))
        i += chunk_len
    return chunks, i


def find_first_chunk_start(body):
    """Locate first byte that begins a `10 NN` chunk (NN ∈ multiples of 4, 4..0x7C)."""
    # Stop one byte early so body[i + 1] is always in range on short bodies
    for i in range(min(20, len(body) - 1)):
        if body[i] == 0x10 and body[i + 1] % 4 == 0 and 0 < body[i + 1] <= 0x7C:
            return i
    return -1


def main():
    for name in ("event-c", "event-d", "event-a", "event-b"):
        b = load_bundle(name)
        body = b.body
        start = find_first_chunk_start(body)
        if start < 0:
            # A negative start would make the walker index from the end of body
            print(f"\n=== {name} === body={len(body)} no `10 NN` tag found in first 20 bytes")
            continue
        chunks, end = walk_chunks(body, start)
        print(f"\n=== {name} === body={len(body)} N_samples={len(b.samples['Tran'])} start={start}")
        print(f" chunks parsed: {len(chunks)}, walk ended at {end}")
        if chunks:
            print(f" first 5 chunks: {[(c[0], c[1], c[3].hex()) for c in chunks[:5]]}")
            print(f" last 5 chunks: {[(c[0], c[1], c[3].hex()) for c in chunks[-5:]]}")
            print(f" bytes around end of walk: {body[end-4:end+12].hex(' ')}")
        else:
            print(f" bytes at start: {body[start:start+16].hex(' ')}")


if __name__ == "__main__":
    main()
+75
@@ -0,0 +1,75 @@
"""
Walker v4: alternate [10 NN] data chunks and [00 NN] (or other) marker tags.
Hypothesis:
- [10 NN]: data block, length NN/2 + 2 bytes (2-byte tag + NN/2 bytes data)
- [00 NN]: 2-byte marker block (no data)
- [20/30/40 NN]: special blocks with type-dependent length
"""
import sys
sys.path.insert(0, ".")
from analysis.load_bundle import load_bundle
def walk(body, start):
i = start
blocks = []
while i + 1 < len(body):
t0 = body[i]
t1 = body[i + 1]
if t0 == 0x10 and t1 % 4 == 0 and 0 < t1 <= 0x80:
# data chunk: length NN/2 + 2
length = t1 // 2 + 2
blocks.append((i, "10", t1, bytes(body[i + 2 : i + length]), length))
i += length
elif t0 == 0x00 and t1 % 4 == 0:
# 2-byte marker
blocks.append((i, "00", t1, b"", 2))
i += 2
elif t0 == 0x20 and t1 % 4 == 0:
# type 2 — try length 2+t1/2 (similar to 10) OR fixed
length = t1 // 2 + 2
blocks.append((i, "20", t1, bytes(body[i + 2 : i + length]), length))
i += length
elif t0 == 0x30 and t1 % 4 == 0:
length = t1 // 2 + 2
blocks.append((i, "30", t1, bytes(body[i + 2 : i + length]), length))
i += length
elif t0 == 0x40 and t1 == 0x02:
# Special "footer transition" block — try fixed 22 bytes
length = 22
blocks.append((i, "40", t1, bytes(body[i + 2 : i + length]), length))
i += length
else:
# Unknown tag — stop
blocks.append((i, "??", t0, bytes(body[i:i+8]), 0))
break
return blocks, i
def main():
for name in ("event-c", "event-d", "event-a", "event-b"):
b = load_bundle(name)
body = b.body
# Auto-detect start
for s in range(15):
if body[s] == 0x10 and body[s+1] % 4 == 0 and 0 < body[s+1] <= 0x80:
start = s
break
else:
start = 7
blocks, end = walk(body, start)
# Categorize
from collections import Counter
types = Counter(bb[1] for bb in blocks)  # `bb`, not `b`: avoid shadowing the loaded bundle
print(f"\n=== {name} === body={len(body)} N={len(b.samples['Tran'])} start={start}")
print(f" total blocks: {len(blocks)}, walk ended at {end}/{len(body)}")
print(f" type counts: {dict(types)}")
# Print last 5 blocks
print(f" last 5 blocks: {[(bb[0], bb[1], bb[2]) for bb in blocks[-5:]]}")
if end < len(body):
print(f" bytes at end: {body[end:end+24].hex(' ')}")
if __name__ == "__main__":
main()
+83
@@ -0,0 +1,83 @@
"""
Walker v5: flexible NN range and multiple block-type lengths.
Hypothesis:
- [10 NN]: 4-bit-delta data block, length = NN/2 + 2
- [20 NN]: 8-bit-literal data block, length = NN + 2
- [00 NN]: 2-byte marker (no payload)
- [30 NN]: trailer/summary block, length = NN*4
- [40 NN]: footer-marker block, fixed 22 bytes
"""
import sys
sys.path.insert(0, ".")
from analysis.load_bundle import load_bundle
from collections import Counter
def walk(body, start, max_blocks=10000):
i = start
blocks = []
while i + 1 < len(body) and len(blocks) < max_blocks:
t0 = body[i]
t1 = body[i + 1]
if t0 == 0x10 and t1 % 4 == 0 and 0 < t1 <= 0xFC:
length = t1 // 2 + 2
if i + length > len(body):
break
data = bytes(body[i + 2 : i + length])
blocks.append((i, "10", t1, data, length))
i += length
elif t0 == 0x20 and t1 % 4 == 0 and 0 < t1 <= 0xFC:
length = t1 + 2
if i + length > len(body):
break
data = bytes(body[i + 2 : i + length])
blocks.append((i, "20", t1, data, length))
i += length
elif t0 == 0x00 and t1 % 4 == 0:
# 2-byte marker
blocks.append((i, "00", t1, b"", 2))
i += 2
elif t0 == 0x30 and t1 % 4 == 0:
length = t1 * 4
if i + length > len(body):
break
data = bytes(body[i + 2 : i + length])
blocks.append((i, "30", t1, data, length))
i += length
elif t0 == 0x40 and t1 == 0x02:
length = 22
if i + length > len(body):
break
data = bytes(body[i + 2 : i + length])
blocks.append((i, "40", t1, data, length))
i += length
else:
blocks.append((i, "??", t0, bytes(body[i:i+8]), 0))
break
return blocks, i
def main():
for name in ("event-c", "event-d", "event-a", "event-b"):
b = load_bundle(name)
body = b.body
for s in range(15):
if body[s] == 0x10 and body[s+1] % 4 == 0 and 0 < body[s+1] <= 0xFC:
start = s; break
else:
start = 7
blocks, end = walk(body, start)
types = Counter(bb[1] for bb in blocks)
print(f"\n=== {name} === body={len(body)} N={len(b.samples['Tran'])} start={start}")
print(f" total blocks: {len(blocks)}, walk ended at {end}/{len(body)}")
print(f" type counts: {dict(types)}")
if blocks and blocks[-1][1] == "??":
print(f" stopped at byte: 0x{blocks[-1][2]:02x}, prev 5 blocks: {[(bb[0], bb[1], bb[2]) for bb in blocks[-6:-1]]}")
# Sum payload sizes by type
payload_sizes = {t: sum(len(bb[3]) for bb in blocks if bb[1] == t) for t in types}
print(f" payload bytes by type: {payload_sizes}")
if __name__ == "__main__":
main()
+68
@@ -0,0 +1,68 @@
"""
Walker v6: handle 40 02 blocks correctly (length 20).
Block formats:
- [10 NN]: 4-bit nibble delta data, length = NN/2 + 2
- [20 NN]: int8 literal data, length = NN + 2
- [00 NN]: 2-byte marker
- [30 NN]: trailer/summary block, length = NN*4
- [40 02]: segment header, fixed length 20
"""
import sys
sys.path.insert(0, ".")
from analysis.load_bundle import load_bundle
from collections import Counter
def walk(body, start, max_blocks=10000):
i = start
blocks = []
while i + 1 < len(body) and len(blocks) < max_blocks:
t0 = body[i]
t1 = body[i + 1]
if t0 == 0x10 and t1 % 4 == 0 and 0 < t1 <= 0xFC:
length = t1 // 2 + 2
elif t0 == 0x20 and t1 % 4 == 0 and 0 < t1 <= 0xFC:
length = t1 + 2
elif t0 == 0x00 and t1 % 4 == 0:
length = 2
elif t0 == 0x30 and t1 % 4 == 0 and 0 < t1 <= 0x10:
length = t1 * 4
elif t0 == 0x40 and t1 == 0x02:
length = 20
else:
blocks.append((i, "??", t0, bytes(body[i:i+8]), 0))
break
if i + length > len(body):
break
data = bytes(body[i + 2 : i + length])
blocks.append((i, f"{t0:02x}", t1, data, length))
i += length
return blocks, i
def main():
for name in ("event-c", "event-d", "event-a", "event-b"):
b = load_bundle(name)
body = b.body
for s in range(15):
if body[s] == 0x10 and body[s+1] % 4 == 0 and 0 < body[s+1] <= 0xFC:
start = s; break
else:
start = 7
blocks, end = walk(body, start)
types = Counter(bb[1] for bb in blocks)
print(f"\n=== {name} === body={len(body)} N={len(b.samples['Tran'])} start={start}")
print(f" total blocks: {len(blocks)}, walk ended at {end}/{len(body)}")
print(f" type counts: {dict(types)}")
if blocks and blocks[-1][1] == "??":
print(f" stopped at byte: 0x{blocks[-1][2]:02x} at offset {blocks[-1][0]}")
print(f" prev 5 blocks: {[(bb[0], bb[1], bb[2]) for bb in blocks[-6:-1]]}")
print(f" bytes around stop: {body[end-4:end+24].hex(' ')}")
# Sum
payload_sizes = {t: sum(len(bb[3]) for bb in blocks if bb[1] == t) for t in types}
print(f" payload bytes by type: {payload_sizes}")
if __name__ == "__main__":
main()
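The block-length rules listed in the Walker v6 docstring can be exercised on a hand-built buffer without any fixture files. What follows is a minimal sketch, assuming the v6 hypothesis holds as written; `block_length`, `walk_demo` and the demo bytes are hypothetical names and data made up for illustration, not part of the repository.

```python
from typing import List, Optional, Tuple


def block_length(t0: int, t1: int) -> Optional[int]:
    """Total block length in bytes (2-byte tag included) under the v6 hypothesis."""
    if t0 == 0x10 and t1 % 4 == 0 and 0 < t1 <= 0xFC:
        return t1 // 2 + 2   # NN nibble deltas occupy NN/2 bytes
    if t0 == 0x20 and t1 % 4 == 0 and 0 < t1 <= 0xFC:
        return t1 + 2        # NN int8 literals occupy NN bytes
    if t0 == 0x00 and t1 % 4 == 0:
        return 2             # bare 2-byte marker, no payload
    if t0 == 0x30 and t1 % 4 == 0 and 0 < t1 <= 0x10:
        return t1 * 4        # trailer/summary block
    if t0 == 0x40 and t1 == 0x02:
        return 20            # fixed-length segment header
    return None              # unknown tag: the walker stops here


def walk_demo(body: bytes, start: int = 0) -> Tuple[List[Tuple[int, int, int, int]], int]:
    """Return ([(offset, t0, t1, length), ...], end_offset)."""
    i = start
    blocks = []
    while i + 1 < len(body):
        length = block_length(body[i], body[i + 1])
        if length is None or i + length > len(body):
            break
        blocks.append((i, body[i], body[i + 1], length))
        i += length
    return blocks, i


# Synthetic body: [10 04][2 data bytes] [00 08] [20 04][4 data bytes]
demo = bytes([0x10, 0x04, 0xAB, 0xCD, 0x00, 0x08, 0x20, 0x04, 1, 2, 3, 4])
blocks, end = walk_demo(demo)
for off, t0, t1, length in blocks:
    print(f"@{off:2d} tag={t0:02x} {t1:02x} length={length}")
print(f"walk ended at {end}/{len(demo)}")
```

Keeping the length rule in one function makes it easy to swap in a competing hypothesis (for example the 22-byte `40 02` length tried in v4 and v5) and compare where each walk stops.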
+65
@@ -0,0 +1,65 @@
"""Run read_idf_file across the corpus and report per-channel accuracy vs sidecars."""
from __future__ import annotations
import sys
from pathlib import Path
REPO = Path(__file__).resolve().parents[1]
sys.path.insert(0, str(REPO))
from micromate.idf_file import read_idf_file
from analysis_idf.recon import load_sidecar_samples
def sidecar_path(idfw: Path) -> Path:
return idfw.parent / "TXT" / f"{idfw.name}.txt"
def main():
root = REPO / "tests/fixtures/THORDATA_example"
files = [f for f in root.rglob("*.IDFW") if not str(f).endswith(".CDB")]
files.sort()
GEO_LSB = 0.0003
n_ok = n_skip = 0
overall = {"Tran": [], "Vert": [], "Long": []}
for f in files:
try:
res = read_idf_file(f)
except Exception:
n_skip += 1
continue
sc_path = sidecar_path(f)
if not sc_path.exists():
n_skip += 1
continue
try:
sc = load_sidecar_samples(sc_path)
except Exception:
n_skip += 1
continue
per_file = {}
for ch in ("Tran", "Vert", "Long"):
sc_counts = [int(round(v / GEO_LSB)) for v in sc[ch]]
dec = res.samples.get(ch, [])
n = min(len(sc_counts), len(dec))
if n == 0:
per_file[ch] = 0.0
continue
exact = sum(1 for i in range(n) if sc_counts[i] == dec[i])
pct = 100.0 * exact / n
per_file[ch] = pct
overall[ch].append(pct)
n_ok += 1
print(f"Processed {n_ok} files (skipped {n_skip})")
print("Per-channel exact-match % (mean / min / max):")
for ch, vals in overall.items():
if vals:
avg = sum(vals) / len(vals)
print(f" {ch}: mean={avg:.2f}% min={min(vals):.2f}% max={max(vals):.2f}% n={len(vals)}")
if __name__ == "__main__":
main()
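The per-channel accuracy figure above comes down to converting sidecar values to ADC counts and counting exact matches. A minimal sketch of that comparison on made-up numbers follows; it assumes the same `GEO_LSB = 0.0003` step used in the script, and `exact_match_pct` is a hypothetical helper name.

```python
GEO_LSB = 0.0003  # in/s per count, as used by the accuracy script


def exact_match_pct(sidecar_ips, decoded_counts):
    """Percentage of overlapping samples whose counts match exactly."""
    sc_counts = [int(round(v / GEO_LSB)) for v in sidecar_ips]
    n = min(len(sc_counts), len(decoded_counts))
    if n == 0:
        return 0.0
    # zip stops at the shorter sequence, so only the overlap is compared
    exact = sum(1 for a, b in zip(sc_counts, decoded_counts) if a == b)
    return 100.0 * exact / n


# Made-up data: the last decoded sample is off by one count.
sidecar = [0.0, 0.0003, -0.0006, 0.0030]
decoded = [0, 1, -2, 9]
pct = exact_match_pct(sidecar, decoded)
print(f"exact match: {pct:.1f}%")
```

Rounding before the comparison matters: the sidecar stores decimal text, so `v / GEO_LSB` is rarely an exact integer in floating point.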
+49
@@ -0,0 +1,49 @@
"""Find where decoded-vs-sidecar diverges for each channel."""
from __future__ import annotations
import sys
from pathlib import Path

REPO = Path(__file__).resolve().parents[1]
sys.path.insert(0, str(REPO))

from minimateplus.waveform_codec import decode_waveform_v2
from analysis_idf.recon import TARGET, TXT, load_sidecar_samples


def main():
    buf = TARGET.read_bytes()
    sc = load_sidecar_samples(TXT)
    decoded = decode_waveform_v2(buf[0x0f1f:])
    GEO_LSB = 0.0003
    for ch in ("Tran", "Vert", "Long"):
        sc_counts = [int(round(v / GEO_LSB)) for v in sc[ch]]
        dec = decoded[ch]
        # Compare only the overlapping range so a length mismatch cannot raise IndexError
        n = min(len(dec), len(sc_counts))
        # Find the first index where decoded and sidecar counts disagree
        first_diff = next((i for i in range(n) if dec[i] != sc_counts[i]), None)
        if first_diff is None:
            print(f"{ch}: NO MISMATCHES")
            continue
        print(f"{ch}: first diff at idx {first_diff}")
        # Show 3 samples before and 7 after the first mismatch
        for i in range(max(0, first_diff - 3), min(n, first_diff + 8)):
            mark = " " if dec[i] == sc_counts[i] else "**"
            print(f" {mark} idx {i:4d}: sc={sc_counts[i]:6d} dec={dec[i]:6d} diff={dec[i]-sc_counts[i]:+d}")
        # Count mismatches and track the longest run of consecutive matches
        cum_match_run = 0
        max_match_run = 0
        diff_count = 0
        for i in range(n):
            if dec[i] == sc_counts[i]:
                cum_match_run += 1
                max_match_run = max(max_match_run, cum_match_run)
            else:
                cum_match_run = 0
                diff_count += 1
        print(f" total mismatches: {diff_count}/{n}, longest run of matches: {max_match_run}")
        print()


if __name__ == "__main__":
    main()
+48
@@ -0,0 +1,48 @@
"""End-to-end IDFH ingest verification."""
from __future__ import annotations
import sys
import tempfile
import json
from pathlib import Path
REPO = Path(__file__).resolve().parents[1]
sys.path.insert(0, str(REPO))
from sfm.waveform_store import WaveformStore
def main():
idfh = REPO / "tests/fixtures/THORDATA_example/THORDATA_example/UPMC Presby/UM13981/UM13981_20220805075441.IDFH"
txt = idfh.parent / "TXT" / f"{idfh.name}.txt"
with tempfile.TemporaryDirectory() as td:
store = WaveformStore(Path(td))
ev, rec = store.save_imported_idf(
idfh.read_bytes(),
idfh,
idf_report_text=txt.read_text(errors="replace"),
)
print("=== save_imported_idf (IDFH) ===")
print(f" serial: {rec['serial']}")
print(f" filename: {rec['filename']}")
print(f" filesize: {rec['filesize']}")
print(f" h5: {rec['hdf5_filename']}") # expect None for histogram
print(f" sidecar: {rec['sidecar_filename']}")
print()
print("=== Event ===")
print(f" timestamp: {ev.timestamp}")
print(f" record_type: {ev.record_type}")
print(f" sample_rate: {ev.sample_rate}")
print()
# Inspect sidecar to confirm intervals were stashed
sc_path = Path(td) / "UM13981" / f"{idfh.name}.sfm.json"
sc = json.loads(sc_path.read_text())
intervals = sc.get("extensions", {}).get("idf_intervals", [])
print(f" sidecar intervals: {len(intervals)}")
if intervals:
print(f" first interval: {intervals[0]}")
print(f" last interval: {intervals[-1]}")
if __name__ == "__main__":
main()
+40
@@ -0,0 +1,40 @@
"""Verify the had_report=False path: ingest IDFW with no .txt."""
from __future__ import annotations
import sys
from pathlib import Path
import tempfile
REPO = Path(__file__).resolve().parents[1]
sys.path.insert(0, str(REPO))
from sfm.waveform_store import WaveformStore
def main():
idfw = REPO / "tests/fixtures/THORDATA_example/THORDATA_example/UPMC Presby/UM11719/UM11719_20231219162723.IDFW"
with tempfile.TemporaryDirectory() as td:
store = WaveformStore(Path(td))
ev, rec = store.save_imported_idf(
idfw.read_bytes(),
idfw,
serial_hint=None,
idf_report_text=None, # ← no .txt!
)
print("=== IDFW without .txt ingest ===")
print(f" serial: {rec['serial']}")
print(f" timestamp: {ev.timestamp}")
print(f" sample_rate: {ev.sample_rate}")
print(f" record_type: {ev.record_type}")
print(f" rectime_sec: {ev.rectime_seconds}")
nT = len(ev.raw_samples.get('Tran', [])) if ev.raw_samples else 0
nV = len(ev.raw_samples.get('Vert', [])) if ev.raw_samples else 0
nL = len(ev.raw_samples.get('Long', [])) if ev.raw_samples else 0
nM = len(ev.raw_samples.get('MicL', [])) if ev.raw_samples else 0
print(f" raw_samples: Tran={nT} Vert={nV} Long={nL} MicL={nM}")
if ev.peak_values:
print(f" peak_values: tran={ev.peak_values.tran} vert={ev.peak_values.vert} long={ev.peak_values.long}")
print(f" h5 written: {rec['hdf5_filename']}")
if __name__ == "__main__":
main()
+102
@@ -0,0 +1,102 @@
"""End-to-end Thor report PDF rendering.
Ingests an IDFW + .txt via save_imported_idf, runs gather_report_data
(faking a minimal DB row), and renders the PDF to disk.
"""
from __future__ import annotations
import sys
import tempfile
import json
from pathlib import Path
REPO = Path(__file__).resolve().parents[1]
sys.path.insert(0, str(REPO))
from sfm.waveform_store import WaveformStore
from sfm import report_pdf
class FakeDb:
"""Stand-in for SeismoDb.get_event(); the renderer only needs a few cols."""
def __init__(self, event):
self.event = event
def get_event(self, _id):
return self.event
def main():
base = REPO / "tests/fixtures/THORDATA_example/THORDATA_example/UPMC Presby/UM11719"
idfw = base / "UM11719_20231219162723.IDFW"
txt = base / "TXT" / f"{idfw.name}.txt"
with tempfile.TemporaryDirectory() as td:
store = WaveformStore(Path(td))
ev, rec = store.save_imported_idf(
idfw.read_bytes(),
idfw,
idf_report_text=txt.read_text(errors="replace"),
)
print(f"save_imported_idf: h5={rec['hdf5_filename']}, sidecar={rec['sidecar_filename']}")
# Verify sidecar has bw_report block
sc_path = Path(td) / "UM11719" / f"{idfw.name}.sfm.json"
sc = json.loads(sc_path.read_text())
bw = sc.get("bw_report", {})
print(f" bw_report.available: {bw.get('available')}")
print(f" bw_report.peaks.tran.ppv_ips: {bw.get('peaks', {}).get('tran', {}).get('ppv_ips')}")
print(f" bw_report.mic.pspl_dbl: {bw.get('mic', {}).get('pspl_dbl')}")
print(f" bw_report.histogram.n_intervals: {bw.get('histogram', {}).get('n_intervals')}")
# Build a DB-row-shaped dict from the Event for gather_report_data
import datetime
ts = ev.timestamp
ts_iso = None
if ts is not None:
try:
ts_iso = datetime.datetime(ts.year, ts.month, ts.day, ts.hour, ts.minute, ts.second).isoformat()
except Exception:
pass
fake_row = {
"serial": "UM11719",
"blastware_filename": rec["filename"],
"record_type": "Waveform",
"timestamp": ts_iso,
"sample_rate": ev.sample_rate,
"project": ev.project_info.project if ev.project_info else None,
"client": ev.project_info.client if ev.project_info else None,
"operator": ev.project_info.operator if ev.project_info else None,
"sensor_location": ev.project_info.sensor_location if ev.project_info else None,
"created_at": None,
}
rd = report_pdf.gather_report_data(FakeDb(fake_row), store, event_id="test-1")
print()
print(f"=== ReportData ===")
print(f" event_id: {rd.event_id}")
print(f" serial: {rd.serial}")
print(f" record_type: {rd.record_type}")
print(f" event_datetime: {rd.event_datetime_str}")
print(f" trigger: {rd.trigger_source}")
print(f" geo_range: {rd.geo_range_str}")
print(f" sample_rate: {rd.sample_rate_str}")
print(f" firmware: {rd.firmware}")
print(f" calibration: {rd.calibration_date} by {rd.calibration_by}")
print(f" battery: {rd.battery_volts}")
print(f" PVS: {rd.peak_vector_sum_ips} in/s at {rd.peak_vector_sum_time_s} sec")
print(f" mic_pspl_dbl: {rd.mic_pspl_dbl}")
print(f" mic_zc_freq_hz: {rd.mic_zc_freq_hz}")
print(f" channel_stats: {len(rd.channel_stats)} rows")
for cs in rd.channel_stats:
print(f" {cs['name']}: PPV={cs['ppv_ips']} ZC={cs['zc_freq_hz']} ToP={cs['time_of_peak_s']} Acc={cs['peak_accel_g']} Disp={cs['peak_disp_in']} Test={cs['sensor_check']}")
# Render the PDF
out_path = REPO / "analysis_idf" / "thor_report.pdf"
pdf_bytes = report_pdf.render_event_report_pdf(rd)
out_path.write_bytes(pdf_bytes)
print()
print(f" PDF written: {out_path} ({len(pdf_bytes)} bytes)")
if __name__ == "__main__":
main()
+91
@@ -0,0 +1,91 @@
"""End-to-end Thor IDFH histogram report PDF rendering."""
from __future__ import annotations
import sys
import tempfile
import json
import datetime
from pathlib import Path
REPO = Path(__file__).resolve().parents[1]
sys.path.insert(0, str(REPO))
from sfm.waveform_store import WaveformStore
from sfm import report_pdf
class FakeDb:
def __init__(self, event):
self.event = event
def get_event(self, _id):
return self.event
def main():
# Use the multi-interval IDFH (81 + trigger row)
idfh = REPO / "tests/fixtures/THORDATA_example/THORDATA_example/UPMC Presby/UM13981/UM13981_20220805075441.IDFH"
txt = idfh.parent / "TXT" / f"{idfh.name}.txt"
with tempfile.TemporaryDirectory() as td:
store = WaveformStore(Path(td))
ev, rec = store.save_imported_idf(
idfh.read_bytes(),
idfh,
idf_report_text=txt.read_text(errors="replace"),
)
print(f"save_imported_idf: h5={rec['hdf5_filename']}, sidecar={rec['sidecar_filename']}")
sc_path = Path(td) / "UM13981" / f"{idfh.name}.sfm.json"
sc = json.loads(sc_path.read_text())
bw = sc.get("bw_report", {})
hist = bw.get("histogram", {})
print(f" bw_report.histogram.start: {hist.get('start')}")
print(f" bw_report.histogram.stop: {hist.get('stop')}")
print(f" bw_report.histogram.n_intervals: {hist.get('n_intervals')}")
print(f" bw_report.histogram.interval_size: {hist.get('interval_size')}")
print(f" bw_report.histogram.interval_size_s: {hist.get('interval_size_s')}")
print(f" bw_report.peaks.tran.ppv_ips: {bw.get('peaks', {}).get('tran', {}).get('ppv_ips')}")
ts = ev.timestamp
ts_iso = None
if ts is not None:
try:
ts_iso = datetime.datetime(ts.year, ts.month, ts.day, ts.hour, ts.minute, ts.second).isoformat()
except Exception:
pass
fake_row = {
"serial": "UM13981",
"blastware_filename": rec["filename"],
"record_type": "Histogram",
"timestamp": ts_iso,
"sample_rate": ev.sample_rate,
"project": ev.project_info.project if ev.project_info else None,
"client": ev.project_info.client if ev.project_info else None,
"operator": ev.project_info.operator if ev.project_info else None,
"sensor_location": ev.project_info.sensor_location if ev.project_info else None,
"created_at": None,
}
rd = report_pdf.gather_report_data(FakeDb(fake_row), store, event_id="hist-1")
print()
print("=== ReportData (histogram) ===")
print(f" is_histogram: {rd.is_histogram}")
print(f" histogram_start: {rd.histogram_start_str}")
print(f" histogram_stop: {rd.histogram_stop_str}")
print(f" histogram_n_intervals: {rd.histogram_n_intervals}")
print(f" histogram_interval_size:{rd.histogram_interval_size}")
print(f" histogram_interval_times[:3]: {rd.histogram_interval_times[:3]}")
print(f" histogram_interval_times[-2:]: {rd.histogram_interval_times[-2:]}")
print(f" channel_stats: {len(rd.channel_stats)} rows")
for cs in rd.channel_stats:
print(f" {cs['name']}: PPV={cs['ppv_ips']} ZC={cs['zc_freq_hz']} peak_date={cs['peak_date']} peak_time={cs['peak_time']}")
pdf_bytes = report_pdf.render_event_report_pdf(rd)
out_path = REPO / "analysis_idf" / "thor_report_idfh.pdf"
out_path.write_bytes(pdf_bytes)
print()
print(f" PDF written: {out_path} ({len(pdf_bytes)} bytes)")
if __name__ == "__main__":
main()
+52
@@ -0,0 +1,52 @@
"""End-to-end ingest test: feed an IDFW + .txt to save_imported_idf in a tmp store."""
from __future__ import annotations
import sys
from pathlib import Path
import tempfile
REPO = Path(__file__).resolve().parents[1]
sys.path.insert(0, str(REPO))
from sfm.waveform_store import WaveformStore
def main():
idfw = REPO / "tests/fixtures/THORDATA_example/THORDATA_example/UPMC Presby/UM11719/UM11719_20231219162723.IDFW"
txt = idfw.parent / "TXT" / f"{idfw.name}.txt"
with tempfile.TemporaryDirectory() as td:
store = WaveformStore(Path(td))
ev, rec = store.save_imported_idf(
idfw.read_bytes(),
idfw,
serial_hint=None,
idf_report_text=txt.read_text(errors="replace"),
)
print("=== Save result ===")
print(f" serial: {rec['serial']}")
print(f" filename: {rec['filename']}")
print(f" filesize: {rec['filesize']}")
print(f" h5: {rec['hdf5_filename']}")
print(f" sidecar: {rec['sidecar_filename']}")
print()
print("=== Event ===")
print(f" serial: {ev.serial if hasattr(ev,'serial') else '(n/a)'}")
print(f" timestamp: {ev.timestamp}")
print(f" sample_rate: {ev.sample_rate}")
print(f" record_type: {ev.record_type}")
print(f" rectime_sec: {ev.rectime_seconds}")
print(f" raw_samples: Tran={len(ev.raw_samples.get('Tran', [])) if ev.raw_samples else 0}, Vert={len(ev.raw_samples.get('Vert', [])) if ev.raw_samples else 0}, Long={len(ev.raw_samples.get('Long', [])) if ev.raw_samples else 0}, MicL={len(ev.raw_samples.get('MicL', [])) if ev.raw_samples else 0}")
if ev.peak_values:
print(f" peaks (txt): Tran={ev.peak_values.tran} Vert={ev.peak_values.vert} Long={ev.peak_values.long}")
print()
# Verify the h5 file actually got written
h5path = Path(td) / "UM11719" / f"{idfw.name}.h5"
print(f" h5 exists: {h5path.exists()} size={h5path.stat().st_size if h5path.exists() else 0}")
sidecar = Path(td) / "UM11719" / f"{idfw.name}.sfm.json"
print(f" sidecar exists:{sidecar.exists()} size={sidecar.stat().st_size if sidecar.exists() else 0}")
if __name__ == "__main__":
main()
+137
@@ -0,0 +1,137 @@
"""Decode IDFH histogram intervals + verify against sidecar."""
from __future__ import annotations
import sys
import struct
from pathlib import Path
REPO = Path(__file__).resolve().parents[1]
sys.path.insert(0, str(REPO))
SEGMENT_MAGIC = b"\x02\xda\x0a\x00\x00\x00"
SEGMENT_SIZE = 732 # = 10-byte header + 10 × 72-byte intervals + 2-byte tail
INTERVAL_SIZE = 72
CHANNELS = ("Tran", "Vert", "Long", "MicL")
def decode_interval(buf72: bytes) -> dict:
"""Decode one 72-byte interval into per-channel min/max/halfp."""
out = {}
for i, ch in enumerate(CHANNELS):
block = buf72[i*16 : (i+1)*16]
mn = struct.unpack_from(">h", block, 0)[0]
mx = struct.unpack_from(">h", block, 2)[0]
sb = struct.unpack_from(">h", block, 4)[0]
halfp = struct.unpack_from(">H", block, 6)[0]
f10 = struct.unpack_from(">H", block, 10)[0]
f14 = struct.unpack_from(">H", block, 14)[0]
peak_count = max(abs(mn), abs(mx))
out[ch] = {
"min": mn,
"max": mx,
"field4": sb,
"halfp": halfp,
"field10": f10,
"field14": f14,
"peak": peak_count,
"freq_hz": (512.0 / halfp) if halfp > 5 else None,
}
out["_tail"] = buf72[64:].hex(" ")
return out
def walk_idfh(buf: bytes) -> list:
"""Walk all interval records in an IDFH file."""
intervals = []
# Multi-segment file: every 02 da 0a 00 00 00 marker introduces a segment.
# Single-interval file: just one body header at 0xf96 of form ?? ?? 0a 00 00 00.
# Find them all.
i = 0
while True:
j = buf.find(b"\x0a\x00\x00\x00", i)
if j < 0:
break
# Validate: the 2 bytes before must form a length, and we want bytes
# [j-2 : j+6] to have a recognisable shape. Actually the cleanest
# filter is "preceded by a length and followed by 00 NN 05 3f".
if j < 2:
i = j + 1
continue
# Body header form: [length_be_2][0a 00 00 00][00 NN][05 3f]
if j + 10 > len(buf):
break
length = int.from_bytes(buf[j-2:j], "big")
# Verify the segment-marker shape: [length_be][0a 00 00 00][00 NN][05 3f]
if buf[j+4] != 0x00:
i = j + 1
continue
if buf[j+6:j+8] != b"\x05\x3f":
i = j + 1
continue
# Header layout (10 bytes): [length_be 2B][0a 00 00 00 4B][00 NN 2B][05 3f 2B]
# Followed by N interval records of 72 bytes each, then 2 tail bytes.
# length value = (N × 72) + 10 (counts the 10-byte header, length field included, plus the interval data; the 2-byte tail is excluded).
header_start = j - 2
n_intervals = (length - 10) // INTERVAL_SIZE
interval_start = header_start + 10
for k in range(n_intervals):
off = interval_start + k * INTERVAL_SIZE
if off + INTERVAL_SIZE > len(buf):
break
chunk = buf[off:off + INTERVAL_SIZE]
intervals.append({"offset": off, **decode_interval(chunk)})
i = header_start + length + 2
return intervals
def main():
# Test against multi-segment IDFH
target = REPO / "tests/fixtures/THORDATA_example/THORDATA_example/UPMC Presby/UM13981/UM13981_20220805075441.IDFH"
sc_path = target.parent / "TXT" / f"{target.name}.txt"
buf = target.read_bytes()
intervals = walk_idfh(buf)
print(f"=== {target.name} ===")
print(f" file size: {len(buf)}")
print(f" decoded intervals: {len(intervals)}")
# Gather the sidecar's dated rows, then show the first 2 and last 3 intervals (0, 1, 78, 79, 80)
sc_rows = []
for line in sc_path.read_text(errors="replace").splitlines():
if line.startswith("2022-") or line.startswith("2023-"):
sc_rows.append(line)
print(f" sidecar rows: {len(sc_rows)}")
print()
for k in [0, 1, 78, 79, 80]:
if k >= len(intervals):
continue
iv = intervals[k]
print(f"--- interval {k} @0x{iv['offset']:04x} ---")
for ch in CHANNELS:
d = iv[ch]
peak_ips = d["peak"] / 32768 * 10.0
print(f" {ch}: peak={d['peak']:5d} ({peak_ips:.4f} in/s) halfp={d['halfp']:5d} freq={d['freq_hz']}")
# sidecar row
if k < len(sc_rows):
print(f" SC: {sc_rows[k]}")
# Test single-interval IDFH
print()
target2 = REPO / "tests/fixtures/THORDATA_example/THORDATA_example/UPMC Presby/UM11719/UM11719_20231219162648.IDFH"
sc2 = target2.parent / "TXT" / f"{target2.name}.txt"
buf2 = target2.read_bytes()
intervals2 = walk_idfh(buf2)
print(f"=== {target2.name} ===")
print(f" file size: {len(buf2)}, decoded intervals: {len(intervals2)}")
if intervals2:
iv = intervals2[0]
for ch in CHANNELS:
d = iv[ch]
peak_ips = d["peak"] / 32768 * 10.0
print(f" {ch}: peak={d['peak']:5d} ({peak_ips:.4f} in/s) halfp={d['halfp']:5d} freq={d['freq_hz']}")
sc_rows2 = [l for l in sc2.read_text(errors='replace').splitlines() if l.startswith("2023-")]
if sc_rows2:
print(f" SC: {sc_rows2[0]}")
if __name__ == "__main__":
main()
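The per-channel layout that `decode_interval` assumes (big-endian int16 min and max at offsets 0 and 2, uint16 half-period at offset 6) can be checked against a synthetic 16-byte block. This is a sketch under the same working assumptions as the script: the 10 in/s full-scale factor and the `512 / halfp` frequency rule are taken from the code above and remain unverified here, and `decode_channel_block` is a hypothetical name.

```python
import struct


def decode_channel_block(block16: bytes) -> dict:
    """Decode one 16-byte per-channel block of a 72-byte interval record."""
    if len(block16) != 16:
        raise ValueError("expected exactly 16 bytes")
    mn, mx = struct.unpack_from(">hh", block16, 0)   # signed min / max counts
    halfp = struct.unpack_from(">H", block16, 6)[0]  # half-period field
    peak = max(abs(mn), abs(mx))
    return {
        "min": mn,
        "max": mx,
        "halfp": halfp,
        "peak": peak,
        "peak_ips": peak / 32768 * 10.0,             # assumed 10 in/s full scale
        "freq_hz": (512.0 / halfp) if halfp > 5 else None,
    }


# Synthetic block: min=-100, max=164, field4=0, halfp=32, remaining fields zero.
block = struct.pack(">hhhHHHHH", -100, 164, 0, 32, 0, 0, 0, 0)
d = decode_channel_block(block)
print(d)
```

Four such blocks (Tran, Vert, Long, MicL) fill the first 64 bytes of an interval record; the script keeps the remaining 8 bytes as an undecoded `_tail`.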
+41
@@ -0,0 +1,41 @@
"""Find IDFH interval period via auto-correlation of structural patterns."""
from __future__ import annotations
import sys
from pathlib import Path

REPO = Path(__file__).resolve().parents[1]
sys.path.insert(0, str(REPO))


def main():
    target = REPO / "tests/fixtures/THORDATA_example/THORDATA_example/UPMC Presby/UM13981/UM13981_20220805075441.IDFH"
    buf = target.read_bytes()
    body_start = 0xF96
    body_end = 0x270C
    body = buf[body_start:body_end]
    print(f"body size: {len(body)} bytes (file {len(buf)} bytes)")
    # For each candidate interval size, count how many bytes at fixed offsets within
    # each interval are zero (consistent column-zero pattern indicates correct size).
    print()
    print("=== zero-column score by interval size (higher = more likely) ===")
    best = []
    for sz in range(16, 100):
        n = len(body) // sz
        if n < 30:
            continue
        # For each column position within an interval, count how many of n intervals have zero
        score = 0
        for col in range(sz):
            zeros = sum(1 for i in range(n) if body[i*sz + col] == 0)
            # A column counts when at least 90% of the intervals hold zero there
            if zeros >= n * 0.9:
                score += 1
        best.append((score, sz, n))
    best.sort(reverse=True)
    for score, sz, n in best[:10]:
        print(f" size={sz:3d} n_intervals={n} consistently-zero-cols={score}")


if __name__ == "__main__":
    main()
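The zero-column heuristic used above is easy to sanity-check on data with a known record size. The sketch below builds a synthetic body of 6-byte records with two always-zero columns and scores three candidate sizes; `zero_column_score` and the test pattern are hypothetical, chosen only to illustrate the scoring.

```python
def zero_column_score(body: bytes, size: int, threshold: float = 0.9) -> int:
    """Count column positions that are zero in at least `threshold` of the records."""
    n = len(body) // size
    if n == 0:
        return 0
    score = 0
    for col in range(size):
        zeros = sum(1 for i in range(n) if body[i * size + col] == 0)
        if zeros >= n * threshold:
            score += 1
    return score


# 60 records of 6 bytes; columns 1 and 4 are always zero.
record = bytes([0xFF, 0x00, 0xFF, 0xFF, 0x00, 0xFF])
body = record * 60
scores = {size: zero_column_score(body, size) for size in (5, 6, 7)}
print(scores)
```

One caveat when reading the ranking: exact multiples of the true record size also score well (a size of 12 here would show four zero columns), so the smallest high-scoring size is usually the one to investigate first.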
+40
@@ -0,0 +1,40 @@
"""Per-file accuracy + sample-count details."""
from __future__ import annotations
import sys
from pathlib import Path

REPO = Path(__file__).resolve().parents[1]
sys.path.insert(0, str(REPO))

from micromate.idf_file import read_idf_file
from analysis_idf.recon import load_sidecar_samples


def main():
    root = REPO / "tests/fixtures/THORDATA_example"
    files = sorted([f for f in root.rglob("*.IDFW") if not str(f).endswith(".CDB")])
    GEO_LSB = 0.0003
    # Limit to first 20 successful files for detail.
    shown = 0
    for f in files:
        try:
            res = read_idf_file(f)
        except Exception:
            continue
        sc_path = f.parent / "TXT" / f"{f.name}.txt"
        if not sc_path.exists():
            continue
        sc = load_sidecar_samples(sc_path)
        sc_tran = [int(round(v / GEO_LSB)) for v in sc["Tran"]]
        dec = res.samples.get("Tran", [])
        n = min(len(sc_tran), len(dec))
        exact = sum(1 for i in range(n) if sc_tran[i] == dec[i]) if n else 0
        pct = 100.0 * exact / n if n else 0.0
        print(f"{f.name:40s} size={f.stat().st_size:6d} sc_n={len(sc_tran):4d} dec_n={len(dec):4d} exact={pct:.1f}%")
        shown += 1
        if shown >= 20:
            break


if __name__ == "__main__":
    main()
+64
@@ -0,0 +1,64 @@
"""Look at what's at the divergence boundary."""
from __future__ import annotations
import sys
from pathlib import Path
REPO = Path(__file__).resolve().parents[1]
sys.path.insert(0, str(REPO))
from minimateplus.waveform_codec import walk_body, find_data_start, parse_segment_header
from analysis_idf.recon import TARGET, TXT, load_sidecar_samples
def main():
buf = TARGET.read_bytes()
body = buf[0x0f1f:]
start = find_data_start(body)
print(f"data_start: {start} (= file offset 0x{0x0f1f + start:04x})")
blocks = walk_body(body, start)
print(f"{len(blocks)} blocks total")
print()
# First 30 blocks
print("=== first 30 blocks ===")
for i, b in enumerate(blocks[:30]):
body_off = 0x0f1f + b.offset
if b.tag_hi == 0x40:
hdr = parse_segment_header(b)
print(f" [{i:3d}] @0x{body_off:04x} {b.kind} (segment header) counter={hdr['counter'] if hdr else '?'} field2={hdr['field2'].hex() if hdr else '?'} anchor={hdr['anchor_bytes'].hex() if hdr else '?'} tail={hdr['tail'].hex() if hdr else '?'}")
else:
print(f" [{i:3d}] @0x{body_off:04x} {b.kind} len={b.length} data={b.data[:16].hex()}")
print()
# Cumulative sample counts per block to find which block contains sample 254
print("=== cumulative samples through blocks ===")
cur_ch = "Tran"
rotation = ["Vert", "Long", "MicL", "Tran"]
seg_count = 0
samples_in_curseg = 2 # preamble Tran[0], Tran[1]
for i, b in enumerate(blocks[:30]):
if b.tag_hi == 0x40:
seg_count += 1
prev_ch = cur_ch
cur_ch = rotation[(seg_count - 1) % 4]
print(f" [{i:3d}] 40 02 -> end of {prev_ch} segment, start {cur_ch} (segment {seg_count})")
samples_in_curseg = 2 # anchors
elif (b.tag_hi & 0xF0) == 0x10:
nn = ((b.tag_hi & 0x0F) << 8) | b.tag_lo
samples_in_curseg += nn
print(f" [{i:3d}] {b.kind} nibble: +{nn} samples, ch={cur_ch}, ch_total~{samples_in_curseg}")
elif (b.tag_hi & 0xF0) == 0x20:
nn = ((b.tag_hi & 0x0F) << 8) | b.tag_lo
samples_in_curseg += nn
print(f" [{i:3d}] {b.kind} int8: +{nn} samples, ch={cur_ch}, ch_total~{samples_in_curseg}")
elif b.tag_hi == 0x00:
samples_in_curseg += b.tag_lo
print(f" [{i:3d}] {b.kind} RLE: +{b.tag_lo}, ch={cur_ch}, ch_total~{samples_in_curseg}")
elif b.tag_hi == 0x30:
samples_in_curseg += b.tag_lo
print(f" [{i:3d}] {b.kind} packed12: +{b.tag_lo} samples, ch={cur_ch}, ch_total~{samples_in_curseg}")
if __name__ == "__main__":
main()
+89
@@ -0,0 +1,89 @@
"""Reconnaissance helpers for cracking the Thor IDFW binary."""
from __future__ import annotations
import sys
from pathlib import Path
REPO = Path(__file__).resolve().parents[1]
sys.path.insert(0, str(REPO))
TARGET = REPO / "tests/fixtures/THORDATA_example/THORDATA_example/UPMC Presby/UM11719/UM11719_20231219162723.IDFW"
TXT = REPO / "tests/fixtures/THORDATA_example/THORDATA_example/UPMC Presby/UM11719/TXT/UM11719_20231219162723.IDFW.txt"
def hex_at(buf: bytes, off: int, n: int = 32) -> str:
chunk = buf[off : off + n]
hexs = " ".join(f"{b:02x}" for b in chunk)
asc = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
return f"{off:04x}: {hexs} {asc}"
def find_all(buf: bytes, needle: bytes) -> list[int]:
out: list[int] = []
i = 0
while True:
j = buf.find(needle, i)
if j < 0:
break
out.append(j)
i = j + 1
return out
def load_sidecar_samples(path: Path) -> dict[str, list[float]]:
"""Parse the txt sample table — Tran/Vert/Long/MicL."""
out = {"Tran": [], "Vert": [], "Long": [], "MicL": []}
in_block = False
for line in path.read_text(errors="replace").splitlines():
if not in_block:
if line.strip() == "Waveform Data Channels":
in_block = True
continue
if line.startswith("Waveform Data USB Channels"):
break
parts = line.split("\t")
# First row is the header "\tTran\tVert\tLong\tMicL"
if len(parts) >= 5 and parts[1] == "Tran":
continue
if len(parts) < 5:
continue
try:
out["Tran"].append(float(parts[1]))
out["Vert"].append(float(parts[2]))
out["Long"].append(float(parts[3]))
out["MicL"].append(float(parts[4]))
except ValueError:
continue
return out
def main():
buf = TARGET.read_bytes()
samples = load_sidecar_samples(TXT)
print(f"file size: {len(buf)} bytes")
print(f"sample rows: Tran={len(samples['Tran'])} Vert={len(samples['Vert'])} Long={len(samples['Long'])} MicL={len(samples['MicL'])}")
print(f"first 6 Tran samples: {samples['Tran'][:6]}")
print(f"first 6 Vert samples: {samples['Vert'][:6]}")
print(f"first 6 Long samples: {samples['Long'][:6]}")
print(f"first 6 MicL samples: {samples['MicL'][:6]}")
print()
print("=== BW magic '00 02 00' positions ===")
hits = find_all(buf, b"\x00\x02\x00")
print(f"{len(hits)} hits")
for h in hits[:20]:
print(hex_at(buf, h, 24))
print()
print("=== '40 02' segment-header positions ===")
hits = find_all(buf, b"\x40\x02")
print(f"{len(hits)} hits")
for h in hits:
ctx_pre = buf[max(0, h - 4): h].hex()
ctx_post = buf[h: h + 20].hex()
# Show the 4 preceding bytes to help tell real headers from incidental matches
print(f" 0x{h:04x} pre={ctx_pre} post={ctx_post}")
if __name__ == "__main__":
main()
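One property of `find_all` in the script above is easy to miss: it advances by one byte after each hit rather than by the needle length, so overlapping matches are reported. The sketch below repeats the function on a made-up toy buffer (the bytes are illustrative, not taken from a real IDFW file).

```python
from __future__ import annotations


def find_all(buf: bytes, needle: bytes) -> list[int]:
    """Return every offset where needle occurs, including overlapping hits."""
    out: list[int] = []
    i = 0
    while True:
        j = buf.find(needle, i)
        if j < 0:
            break
        out.append(j)
        i = j + 1  # step one byte, not len(needle), so overlaps are kept
    return out


# Toy buffer: 00 02 00 02 00 ff 40 02
toy = bytes.fromhex("0002000200ff4002")
magic_hits = find_all(toy, b"\x00\x02\x00")
header_hits = find_all(toy, b"\x40\x02")
```

Here `magic_hits` contains two offsets even though the two matches share a byte, which is why the script prints context around each hit instead of trusting the count alone.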
+40
@@ -0,0 +1,40 @@
"""Find each segment boundary in the channel and check if errors reset there."""
from __future__ import annotations
import sys
from pathlib import Path
REPO = Path(__file__).resolve().parents[1]
sys.path.insert(0, str(REPO))
from minimateplus.waveform_codec import decode_waveform_v2
from analysis_idf.recon import TARGET, TXT, load_sidecar_samples
def main():
buf = TARGET.read_bytes()
sc = load_sidecar_samples(TXT)
decoded = decode_waveform_v2(buf[0x0f1f:])
GEO_LSB = 0.0003
for ch in ("Tran", "Vert", "Long"):
sc_counts = [int(round(v / GEO_LSB)) for v in sc[ch]]
dec = decoded[ch]
# Record every index where the decoded stream flips between matching
# and not matching the sidecar: DIVERGE when an error appears, RESYNC
# when the decoder returns to an exact match.
n = min(len(sc_counts), len(dec))
events = []
prev_match = True
for i in range(n):
match = sc_counts[i] == dec[i]
if match != prev_match:
kind = "RESYNC" if match else "DIVERGE"
events.append((i, kind, sc_counts[i], dec[i]))
prev_match = match
print(f"{ch}: {len(events)} transitions")
for i, kind, sc_v, dec_v in events[:20]:
print(f" idx {i:4d} {kind:8s} sc={sc_v:6d} dec={dec_v:6d} diff={dec_v-sc_v:+d}")
print()
if __name__ == "__main__":
main()
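The DIVERGE / RESYNC bookkeeping above can be tried on small lists without any fixture files. This is a minimal sketch; `find_transitions` is a hypothetical name and the input values are invented for illustration.

```python
def find_transitions(expected, decoded):
    """Return (index, kind, expected_value, decoded_value) at each flip.

    kind is "DIVERGE" when the streams stop matching and "RESYNC" when
    they start matching again. The streams are assumed to match before
    index 0, as in the script above.
    """
    events = []
    prev_match = True
    for i, (e, d) in enumerate(zip(expected, decoded)):
        match = e == d
        if match != prev_match:
            events.append((i, "RESYNC" if match else "DIVERGE", e, d))
        prev_match = match
    return events


expected = [1, 2, 3, 4, 5, 6]
decoded = [1, 2, 4, 5, 5, 6]
events = find_transitions(expected, decoded)
```

If RESYNC indices line up with segment-header positions, that would suggest the per-segment anchors reset accumulated delta error, which is the question the script is probing.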
+46
@@ -0,0 +1,46 @@
"""Smoke-test read_idf_file on IDFH across the corpus."""
from __future__ import annotations
import sys
from pathlib import Path
REPO = Path(__file__).resolve().parents[1]
sys.path.insert(0, str(REPO))
from micromate.idf_file import read_idf_file
def main():
target = REPO / "tests/fixtures/THORDATA_example/THORDATA_example/UPMC Presby/UM11719/UM11719_20231219162648.IDFH"
result = read_idf_file(target)
ev = result.event
print(f"=== {target.name} ===")
print(f" signature: {result.signature}")
print(f" serial: {ev.serial}")
print(f" timestamp: {ev.timestamp}")
print(f" sample_rate: {ev.sample_rate}")
print(f" kind: {ev.kind}")
print(f" intervals: {len(result.intervals or [])}")
print(f" peaks: T={ev.peaks.transverse_ips:.4f} V={ev.peaks.vertical_ips:.4f} L={ev.peaks.longitudinal_ips:.4f}")
print()
root = REPO / "tests/fixtures/THORDATA_example"
files = list(root.rglob("*.IDFH"))
ok = fail = nyi = 0
total_intervals = 0
for f in files:
try:
r = read_idf_file(f)
ok += 1
total_intervals += len(r.intervals or [])
except NotImplementedError:
nyi += 1
except Exception as exc:
fail += 1
if fail <= 3:
print(f" FAIL: {f.name}: {type(exc).__name__}: {exc}")
print(f"Corpus: {len(files)} IDFH files | ok={ok} fail={fail} nyi={nyi}")
print(f"Total intervals decoded: {total_intervals}")
if __name__ == "__main__":
main()
+48
@@ -0,0 +1,48 @@
"""Smoke-test read_idf_file across the sample corpus."""
from __future__ import annotations
import sys
from pathlib import Path
REPO = Path(__file__).resolve().parents[1]
sys.path.insert(0, str(REPO))
from micromate.idf_file import read_idf_file, geo_count_to_ips, mic_count_to_psi
def main():
target = REPO / "tests/fixtures/THORDATA_example/THORDATA_example/UPMC Presby/UM11719/UM11719_20231219162723.IDFW"
result = read_idf_file(target)
ev = result.event
print(f"=== {target.name} ===")
print(f" signature: {result.signature}")
print(f" serial: {ev.serial}")
print(f" timestamp: {ev.timestamp}")
print(f" sample_rate: {ev.sample_rate}")
print(f" record_time: {ev.record_time_sec}")
print(f" calibration: {result.binary_metadata.calibration_date}")
print(f" Tran samples: {len(result.samples['Tran'])}, peak_ips={ev.peaks.transverse_ips:.4f}")
print(f" Vert samples: {len(result.samples['Vert'])}, peak_ips={ev.peaks.vertical_ips:.4f}")
print(f" Long samples: {len(result.samples['Long'])}, peak_ips={ev.peaks.longitudinal_ips:.4f}")
print(f" MicL samples: {len(result.samples['MicL'])}")
print()
# Corpus sweep
root = REPO / "tests/fixtures/THORDATA_example"
files = [f for f in root.rglob("*.IDFW") if not str(f).endswith(".CDB")]
ok = fail = nyi = 0
for f in files:
try:
r = read_idf_file(f)
ok += 1
except NotImplementedError:
nyi += 1
except Exception as exc:
fail += 1
if fail <= 5:
print(f" FAIL: {f.name}: {type(exc).__name__}: {exc}")
print()
print(f"Corpus: {len(files)} IDFW files | ok={ok} fail={fail} not-implemented={nyi}")
if __name__ == "__main__":
main()
+47
@@ -0,0 +1,47 @@
"""Verify build_bw_report_from_idf against a known sidecar."""
from __future__ import annotations
import json
import sys
from pathlib import Path
REPO = Path(__file__).resolve().parents[1]
sys.path.insert(0, str(REPO))
from micromate.idf_ascii_report import parse_idf_report
from micromate.idf_to_bw_report import build_bw_report_from_idf
from micromate.idf_file import read_idf_file
def show(prefix: str, d: dict, indent: int = 0):
for k, v in d.items():
if isinstance(v, dict):
print(f"{' '*indent}{prefix}{k}:")
show("", v, indent + 1)
else:
print(f"{' '*indent}{prefix}{k}: {v!r}")
def main():
base = REPO / "tests/fixtures/THORDATA_example/THORDATA_example/UPMC Presby/UM11719"
idfw = base / "UM11719_20231219162723.IDFW"
txt = base / "TXT" / f"{idfw.name}.txt"
report_dict = parse_idf_report(txt.read_text(errors="replace"))
res = read_idf_file(idfw)
bw = build_bw_report_from_idf(report_dict, binary_md=res.binary_metadata)
print("=== IDFW → bw_report ===")
show("", bw)
print()
print("=== IDFH (single trigger row) ===")
idfh = base / "UM11719_20231219162648.IDFH"
txt_h = base / "TXT" / f"{idfh.name}.txt"
rh = parse_idf_report(txt_h.read_text(errors="replace"))
res_h = read_idf_file(idfh)
bw_h = build_bw_report_from_idf(rh, binary_md=res_h.binary_metadata, intervals=res_h.intervals)
show("", bw_h)
if __name__ == "__main__":
main()
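The recursive `show` helper above prints straight to stdout, which makes its output hard to compare in a test. A variant that returns the lines instead might look like the sketch below. `render` is a hypothetical name, and the two-space indent step is an assumption (the width in the original is not recoverable from this listing).

```python
def render(d: dict, indent: int = 0) -> list:
    """Return the lines that a nested-dict pretty printer would emit.

    Nested dicts get a "key:" line followed by their children, indented
    by two more spaces; leaf values are shown with repr().
    """
    pad = "  " * indent
    lines = []
    for k, v in d.items():
        if isinstance(v, dict):
            lines.append(f"{pad}{k}:")
            lines.extend(render(v, indent + 1))
        else:
            lines.append(f"{pad}{k}: {v!r}")
    return lines


# Invented example data, not a real bw_report.
report = {"serial": "UM0000", "peaks": {"tran": 0.01, "vert": 0.02}}
lines = render(report)
```

Joining the result with newlines reproduces the printed form, and two reports can be diffed line by line.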
Binary file not shown.
Binary file not shown.
+73
@@ -0,0 +1,73 @@
"""Trace Tran sample-by-sample to find exactly where the codec drifts."""
from __future__ import annotations
import sys
from pathlib import Path
REPO = Path(__file__).resolve().parents[1]
sys.path.insert(0, str(REPO))
from analysis_idf.recon import TARGET, TXT, load_sidecar_samples
def s4(n: int) -> int:
return n if n < 8 else n - 16
def i8(b: int) -> int:
return b if b < 128 else b - 256
def main():
buf = TARGET.read_bytes()
sc = load_sidecar_samples(TXT)
GEO_LSB = 0.0003
sc_tran = [int(round(v / GEO_LSB)) for v in sc["Tran"]]
body = buf[0x0f1f:]
# Tran[0], Tran[1] from preamble
t0 = int.from_bytes(body[3:5], "big", signed=True)
t1 = int.from_bytes(body[5:7], "big", signed=True)
print(f"preamble Tran[0]={t0} Tran[1]={t1} (sidecar: {sc_tran[0]}, {sc_tran[1]})")
# Block 0: 10 f8 at body[7:9]
print(f"block 0: tag {body[7]:02x} {body[8]:02x}")
print(f" block 0 first 10 data bytes: {body[9:19].hex()}")
# Walk block 0 manually, comparing each sample
cur = t1
samples = [t0, t1]
block_off = 7
nn = body[8]
print(f" NN = {nn}")
data = body[9 : 9 + nn // 2]
for byi, byte in enumerate(data):
for nib_idx, nib in enumerate(((byte >> 4) & 0xF, byte & 0xF)):
cur += s4(nib)
samples.append(cur)
idx = len(samples) - 1
if 0 <= idx < len(sc_tran):
sc_v = sc_tran[idx]
match = "ok" if sc_v == cur else "MISMATCH"
if idx < 12 or 240 <= idx <= 260:
print(f" idx {idx:3d}: nibble byte={byte:02x} nib={nib:x} delta={s4(nib):+d} cur={cur:+d} sc={sc_v:+d} {match}")
print(f"end of block 0: cur={cur}, len(samples)={len(samples)}, decoder expected 250 here")
# Block 1: tag 20 28 is expected at body offset 9 + 124 = 133
# (block 0 data starts at body[9] and is NN // 2 = 124 bytes long)
block1_off = 9 + nn // 2
print(f"block 1: tag {body[block1_off]:02x} {body[block1_off+1]:02x} (expecting 20 28)")
nn1 = body[block1_off + 1]
print(f" block 1 NN = {nn1}")
data1 = body[block1_off + 2 : block1_off + 2 + nn1]
for byi, byte in enumerate(data1):
cur += i8(byte)
samples.append(cur)
idx = len(samples) - 1
if idx < len(sc_tran):
sc_v = sc_tran[idx]
match = "ok" if sc_v == cur else "MISMATCH"
if 248 <= idx <= 295:
print(f" idx {idx:3d}: int8 byte={byte:02x} delta={i8(byte):+d} cur={cur:+d} sc={sc_v:+d} {match}")
if __name__ == "__main__":
main()
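The two delta encodings walked by hand in the trace script can be written as small functions. This is a minimal sketch under the same working assumptions as the script: nibble blocks hold signed 4-bit deltas with the high nibble first, and int8 blocks hold signed 8-bit deltas. `decode_nibble_deltas` and `decode_int8_deltas` are hypothetical names, and the layout is a reverse-engineering hypothesis rather than a documented format.

```python
def s4(n: int) -> int:
    """Interpret a 4-bit value as two's-complement signed (-8..7)."""
    return n if n < 8 else n - 16


def i8(b: int) -> int:
    """Interpret a byte as two's-complement signed (-128..127)."""
    return b if b < 128 else b - 256


def decode_nibble_deltas(start: int, data: bytes) -> list:
    """Accumulate signed 4-bit deltas, high nibble first, from `start`."""
    cur, out = start, []
    for byte in data:
        for nib in ((byte >> 4) & 0xF, byte & 0xF):
            cur += s4(nib)
            out.append(cur)
    return out


def decode_int8_deltas(start: int, data: bytes) -> list:
    """Accumulate signed 8-bit deltas from `start`."""
    cur, out = start, []
    for byte in data:
        cur += i8(byte)
        out.append(cur)
    return out


# Toy input: nibbles 1, F, 8, 0 decode to deltas +1, -1, -8, 0.
nibble_samples = decode_nibble_deltas(10, bytes([0x1F, 0x80]))
int8_samples = decode_int8_deltas(nibble_samples[-1], bytes([0x05, 0xFB]))
```

Because each sample depends on the previous one, a single wrong delta shifts every later sample in the block, which matches the drift the trace script is hunting for.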
+42
@@ -0,0 +1,42 @@
"""Feed candidate body offsets to the BW codec and compare with sidecar."""
from __future__ import annotations
import sys
from pathlib import Path
REPO = Path(__file__).resolve().parents[1]
sys.path.insert(0, str(REPO))
from minimateplus.waveform_codec import decode_waveform_v2, walk_body, find_data_start
from analysis_idf.recon import TARGET, TXT, load_sidecar_samples
def main():
buf = TARGET.read_bytes()
sc = load_sidecar_samples(TXT)
# Sidecar samples in 0.0003 counts (Thor geo LSB).
sc_tran = [int(round(v / 0.0003)) for v in sc["Tran"][:30]]
sc_vert = [int(round(v / 0.0003)) for v in sc["Vert"][:30]]
sc_long = [int(round(v / 0.0003)) for v in sc["Long"][:30]]
sc_micl = [int(round(v / 1e-6)) for v in sc["MicL"][:30]] # 1 µ unit for mic? Will iterate.
print(f"sidecar Tran (counts): {sc_tran}")
print(f"sidecar Vert (counts): {sc_vert}")
print(f"sidecar Long (counts): {sc_long}")
print(f"sidecar MicL (×1e-6): {sc_micl}")
print()
# Try candidate body start offsets.
for off in (0x0f1f, 0x1057, 0x11f1, 0x1333, 0x1bde, 0x0d30):
print(f"=== body @ 0x{off:04x} ===")
body = buf[off:]
decoded = decode_waveform_v2(body)
if not decoded:
print(" decode_waveform_v2 returned None")
continue
for ch in ("Tran", "Vert", "Long", "MicL"):
arr = decoded.get(ch, [])
print(f" {ch}[{len(arr)}]: {arr[:20]}")
print()
if __name__ == "__main__":
main()
+51
@@ -0,0 +1,51 @@
"""Verify decode_waveform_v2 against sidecar across all 2304 samples per channel."""
from __future__ import annotations
import sys
from pathlib import Path
REPO = Path(__file__).resolve().parents[1]
sys.path.insert(0, str(REPO))
from minimateplus.waveform_codec import decode_waveform_v2
from analysis_idf.recon import TARGET, TXT, load_sidecar_samples
def main():
buf = TARGET.read_bytes()
sc = load_sidecar_samples(TXT)
body = buf[0x0f1f:]
decoded = decode_waveform_v2(body)
print(f"Sidecar lengths: Tran={len(sc['Tran'])} Vert={len(sc['Vert'])} Long={len(sc['Long'])} MicL={len(sc['MicL'])}")
print(f"Decoded lengths: Tran={len(decoded['Tran'])} Vert={len(decoded['Vert'])} Long={len(decoded['Long'])} MicL={len(decoded['MicL'])}")
print()
GEO_LSB = 0.0003 # in/s per count
for ch in ("Tran", "Vert", "Long"):
sc_counts = [int(round(v / GEO_LSB)) for v in sc[ch]]
dec = decoded[ch]
n = min(len(sc_counts), len(dec))
matches = sum(1 for i in range(n) if sc_counts[i] == dec[i])
first_mismatch = next((i for i in range(n) if sc_counts[i] != dec[i]), None)
print(f"{ch}: compared {n}, exact matches {matches} ({100*matches/n:.2f}%)")
if first_mismatch is not None:
i = first_mismatch
print(f" first mismatch at idx {i}: sidecar={sc_counts[i]} ({sc[ch][i]}), decoded={dec[i]}")
print(f" context sidecar[{max(0, i-2)}..{i+4}]: {sc_counts[max(0,i-2):i+5]}")
print(f" context decoded[{max(0, i-2)}..{i+4}]: {dec[max(0,i-2):i+5]}")
# MicL: find the multiplicative factor that fits
print()
print("=== MicL scale analysis ===")
sc_micl = sc["MicL"]
dec_micl = decoded["MicL"]
# Skip zero values when computing ratio
ratios = [sc_micl[i] / dec_micl[i] for i in range(min(50, len(sc_micl), len(dec_micl))) if dec_micl[i] != 0]
if ratios:
avg = sum(ratios) / len(ratios)
print(f" avg ratio sidecar/decoded over first 50 nonzero: {avg:.4e} (n={len(ratios)})")
print(f" ratios sample: {[f'{r:.4e}' for r in ratios[:6]]}")
if __name__ == "__main__":
main()
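The comparison in the script above rests on two steps: converting sidecar velocities to integer ADC counts using the 0.0003 in/s per count geophone LSB, and counting exact matches. A minimal sketch of both follows; `to_counts` and `match_rate` are hypothetical names, the sample values are invented, and the empty-input guard is an addition that the original script does not have.

```python
GEO_LSB = 0.0003  # in/s per count, the value used by the scripts above


def to_counts(values_ips, lsb=GEO_LSB):
    """Quantize in/s readings to the nearest integer count."""
    return [int(round(v / lsb)) for v in values_ips]


def match_rate(expected, decoded):
    """Return (exact_matches, percent) over the overlapping prefix."""
    n = min(len(expected), len(decoded))
    if n == 0:
        return 0, 0.0  # avoid dividing by zero on empty input
    matches = sum(1 for i in range(n) if expected[i] == decoded[i])
    return matches, 100.0 * matches / n


sidecar_ips = [0.0003, -0.0006, 0.0015, 0.0]
sidecar_counts = to_counts(sidecar_ips)
matches, percent = match_rate(sidecar_counts, [1, -2, 4, 0])
```

Rounding before the integer conversion matters here: the sidecar text stores decimal values, and plain truncation would turn a quotient such as 4.9999999 into 4 and report a false mismatch.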
+627
@@ -0,0 +1,627 @@
#!/usr/bin/env python3
"""
ach_bridge.py — Transparent TCP bridge / splitter for Instantel MiniMate Plus
call-home (ACH) traffic.
Modes
-----
standalone Accept connection, capture frames, do NOT forward anywhere.
Good for initial discovery with a test unit.
bridge Forward to one upstream server while capturing.
Use this for the initial discovery phase with your test server.
splitter Forward to the PRIMARY upstream (production ACH server) AND
mirror a copy to a SECONDARY server simultaneously.
The device never knows — it talks to the primary the whole time.
If the mirror fails, the primary connection is unaffected.
Think of it like a headphone splitter: one input, two outputs.
Primary → authoritative responses back to device.
Mirror → gets all device bytes, its responses are discarded.
Usage
-----
# Standalone capture (test/discovery — no forwarding)
python bridges/ach_bridge.py --standalone [--port 12345]
# Bridge mode (forward to one server, e.g. your test server)
python bridges/ach_bridge.py --upstream HOST:PORT [--port 12345]
# Splitter mode (production: forward to prod + mirror to your server)
python bridges/ach_bridge.py --upstream PROD_HOST:PORT --mirror MY_HOST:PORT [--port 12345]
Setup for discovery (test server, don't touch prod)
----------------------------------------------------
1. Stand up your test ACH server, note its IP and port (e.g. 192.168.1.50:12345).
2. Take ONE test unit. In ACEmanager → Call Home, point it at:
<this machine's LAN IP> : <--port>
3. Run: python bridges/ach_bridge.py --upstream TEST_SERVER:12345 --port 12345
4. Trigger the unit. Raw frames are saved to bridges/captures/ach_<ts>/.
5. Revert the unit's ACEmanager setting when done.
Setup for production splitter (when you're ready)
-------------------------------------------------
This does NOT touch the units. Instead you re-route traffic at the network
layer so that call-home packets arrive at a machine running this script first.
Typical approach: update the DNS entry / host record your prod ACH server is
registered under to point at this machine. The units keep their existing
ACEmanager settings.
python bridges/ach_bridge.py \\
--upstream PROD_ACH_HOST:12345 \\
--mirror MY_NEW_SERVER:12345 \\
--port 12345
Output (each connection gets its own timestamped sub-directory)
------
bridges/captures/ach_<ts>/
raw_client_<ts>.bin — raw bytes from the device (S3 side)
raw_server_<ts>.bin — raw bytes from the primary upstream (BW side)
raw_mirror_<ts>.bin — raw bytes from the mirror upstream (splitter mode only)
session_<ts>.log — human-readable frame parse log
session_<ts>.jsonl — JSON-lines frame log
raw_client / raw_server are byte-for-byte compatible with parse_capture.py.
"""
from __future__ import annotations
import argparse
import asyncio
import datetime
import json
import logging
import os
import sys
from pathlib import Path
from typing import List, Optional
# Add project root to path
sys.path.insert(0, str(Path(__file__).parent.parent))
from minimateplus.framing import S3FrameParser, S3Frame
log = logging.getLogger("ach_bridge")
# ── Frame label helpers ──────────────────────────────────────────────────────
_KNOWN_RSP_SUBS = {
0xA4: "POLL_RSP",
0xA5: "BULK_WAVEFORM_RSP",
0xE0: "ADVANCE_EVENT_RSP",
0xE1: "EVENT_INDEX_FIRST_RSP",
0xE3: "MONITOR_STATUS_RSP",
0xEA: "SERIAL_NUM_RSP",
0xF3: "WAVEFORM_RECORD_RSP",
0xF5: "WAVEFORM_HEADER_RSP",
0xF7: "EVENT_INDEX_RSP",
0xF9: "UNK_06_RSP",
0xFE: "DEVICE_INFO_RSP",
# Write acks
0x97: "EVT_IDX_WRITE_ACK",
0x8C: "CONFIRM_B_ACK",
0x8E: "COMPLIANCE_WRITE_ACK",
0x8D: "CONFIRM_A_ACK",
0x7D: "TRIGGER_WRITE_ACK",
0x7C: "TRIGGER_CONFIRM_ACK",
0x96: "WAVEFORM_WRITE_ACK",
0x8B: "CONFIRM_C_ACK",
0x69: "START_MONITOR_ACK",
0x68: "STOP_MONITOR_ACK",
}
_KNOWN_REQ_SUBS = {
0x5B: "POLL",
0x5A: "BULK_WAVEFORM",
0x1F: "ADVANCE_EVENT",
0x1E: "EVENT_INDEX_FIRST",
0x1C: "MONITOR_STATUS",
0x15: "SERIAL_NUM",
0x0C: "WAVEFORM_RECORD",
0x0A: "WAVEFORM_HEADER",
0x08: "EVENT_INDEX",
0x06: "UNK_06",
0x01: "DEVICE_INFO",
# Write commands
0x68: "EVT_IDX_WRITE",
0x73: "CONFIRM_B",
0x71: "COMPLIANCE_WRITE",
0x72: "CONFIRM_A",
0x82: "TRIGGER_WRITE",
0x83: "TRIGGER_CONFIRM",
0x69: "WAVEFORM_WRITE",
0x74: "CONFIRM_C",
0x96: "START_MONITOR",
0x97: "STOP_MONITOR",
}
def _label_s3_frame(frame: S3Frame) -> str:
name = _KNOWN_RSP_SUBS.get(frame.sub, f"UNK_0x{frame.sub:02X}")
chk = "" if frame.checksum_valid else "✗CHK"
return (
f"S3→ SUB=0x{frame.sub:02X} ({name}) "
f"page=0x{frame.page_key:04X} data={len(frame.data)}B {chk}"
)
def _label_bw_frame(data: bytes, prefix: str = " →BW") -> str:
"""Best-effort label for a raw BW request frame (wire bytes)."""
# Wire layout: 41 02 10 10 00 sub ...
if len(data) < 6:
return f"{prefix} (short {len(data)}B)"
sub = data[5]
name = _KNOWN_REQ_SUBS.get(sub, f"UNK_0x{sub:02X}")
return f"{prefix} SUB=0x{sub:02X} ({name}) {len(data)}B"
# ── Per-session capture writer ─────────────────────────────────────────────────
class CaptureSession:
"""Writes raw bytes + parsed log for one TCP connection."""
def __init__(self, capture_dir: Path, peer: str, *, has_mirror: bool = False):
ts = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
self.dir = capture_dir / f"ach_{ts}"
self.dir.mkdir(parents=True, exist_ok=True)
self.peer = peer
self._raw_client = open(self.dir / f"raw_client_{ts}.bin", "wb")
self._raw_server = open(self.dir / f"raw_server_{ts}.bin", "wb")
self._raw_mirror = (
open(self.dir / f"raw_mirror_{ts}.bin", "wb") if has_mirror else None
)
self._log_fh = open(self.dir / f"session_{ts}.log", "w")
self._jsonl_fh = open(self.dir / f"session_{ts}.jsonl", "w")
self._s3_parser = S3FrameParser()
self._frame_count = 0
self._byte_count_client = 0
self._byte_count_server = 0
self._byte_count_mirror = 0
self._log(
f"# ACH capture — peer={peer} "
f"mirror={'yes' if has_mirror else 'no'} "
f"started={datetime.datetime.now().isoformat()}"
)
self._log(f"# Output dir: {self.dir}")
log.info("Capture session opened: %s (peer=%s)", self.dir, peer)
# ── public API ────────────────────────────────────────────────────────────
def feed_client(self, data: bytes) -> None:
"""Bytes FROM the device (S3 response frames)."""
self._raw_client.write(data)
self._raw_client.flush()
self._byte_count_client += len(data)
for byte in data:
frame = self._s3_parser.feed(bytes([byte]))
if frame:
frames = frame if isinstance(frame, list) else [frame]
for f in frames:
self._frame_count += 1
label = _label_s3_frame(f)
self._log(f"[{self._frame_count:04d}] {label}")
self._log(
f" hex: {f.data[:64].hex()}"
+ (" ..." if len(f.data) > 64 else "")
)
self._emit_json("s3", f)
def feed_server(self, data: bytes) -> None:
"""Bytes FROM the primary upstream server (BW request frames)."""
self._raw_server.write(data)
self._raw_server.flush()
self._byte_count_server += len(data)
label = _label_bw_frame(data, prefix=" →BW[primary]")
self._log(f" {label}")
def feed_mirror(self, data: bytes) -> None:
"""Bytes FROM the mirror server (logged, not forwarded to device)."""
if self._raw_mirror:
self._raw_mirror.write(data)
self._raw_mirror.flush()
self._byte_count_mirror += len(data)
label = _label_bw_frame(data, prefix=" →BW[mirror] ")
self._log(f" {label} [MIRROR — not sent to device]")
def close(self, reason: str = "connection closed") -> None:
self._log(f"# Session ended: {reason}")
self._log(
f"# Totals — client={self._byte_count_client}B "
f"server={self._byte_count_server}B "
f"mirror={self._byte_count_mirror}B "
f"s3_frames={self._frame_count}"
)
handles = [self._raw_client, self._raw_server, self._log_fh, self._jsonl_fh]
if self._raw_mirror:
handles.append(self._raw_mirror)
for fh in handles:
try:
fh.close()
except Exception:
pass
log.info(
"Session closed (%s): %dB client, %dB server, %dB mirror, %d S3 frames → %s",
reason,
self._byte_count_client, self._byte_count_server,
self._byte_count_mirror, self._frame_count,
self.dir,
)
# ── internals ─────────────────────────────────────────────────────────────
def _log(self, msg: str) -> None:
print(msg, file=self._log_fh, flush=True)
print(msg)
def _emit_json(self, direction: str, frame: S3Frame) -> None:
record = {
"dir": direction,
"sub": frame.sub,
"page_key": frame.page_key,
"data_len": len(frame.data),
"data_hex": frame.data.hex(),
"checksum_valid": frame.checksum_valid,
}
print(json.dumps(record), file=self._jsonl_fh, flush=True)
# ── Bridge / splitter connection handler ──────────────────────────────────────
class BridgeHandler:
"""
Handles inbound device connections.
Modes (determined by which upstreams are configured):
standalone — no upstream_host / no mirror_host
bridge — upstream_host set, no mirror_host
splitter — upstream_host AND mirror_host both set
"""
def __init__(
self,
capture_dir: Path,
upstream_host: Optional[str],
upstream_port: Optional[int],
mirror_host: Optional[str] = None,
mirror_port: Optional[int] = None,
):
self.capture_dir = capture_dir
self.upstream_host = upstream_host
self.upstream_port = upstream_port
self.mirror_host = mirror_host
self.mirror_port = mirror_port
async def handle(
self,
client_reader: asyncio.StreamReader,
client_writer: asyncio.StreamWriter,
) -> None:
peer = client_writer.get_extra_info("peername", ("?", 0))
peer_str = f"{peer[0]}:{peer[1]}"
log.info("Inbound connection from %s", peer_str)
has_mirror = bool(self.mirror_host)
session = CaptureSession(self.capture_dir, peer_str, has_mirror=has_mirror)
if not self.upstream_host:
# ── Standalone mode ──────────────────────────────────────────────
log.info("Standalone mode — recording inbound traffic only")
try:
while True:
data = await client_reader.read(4096)
if not data:
break
session.feed_client(data)
except asyncio.CancelledError:
pass
except Exception as exc:
log.warning("Standalone read error: %s", exc)
finally:
session.close("standalone capture ended")
try:
client_writer.close()
await client_writer.wait_closed()
except Exception:
pass
return
# ── Bridge / splitter mode ───────────────────────────────────────────
# Connect to primary upstream (required)
try:
up_reader, up_writer = await asyncio.open_connection(
self.upstream_host, self.upstream_port
)
log.info("Connected to primary %s:%s", self.upstream_host, self.upstream_port)
except Exception as exc:
log.error("Failed to connect to primary upstream: %s", exc)
session.close(f"primary connect failed: {exc}")
client_writer.close()
return
# Connect to mirror upstream (optional — failure is non-fatal)
mir_reader: Optional[asyncio.StreamReader] = None
mir_writer: Optional[asyncio.StreamWriter] = None
if self.mirror_host:
try:
mir_reader, mir_writer = await asyncio.open_connection(
self.mirror_host, self.mirror_port
)
log.info("Connected to mirror %s:%s", self.mirror_host, self.mirror_port)
except Exception as exc:
log.warning(
"Mirror connect failed — continuing without mirror: %s", exc
)
session._log(f"# WARNING: mirror connect failed: {exc}")
# Build relay tasks
#
# ┌──────────┐ device bytes ┌─────────────┐
# │ Device │ ─────────────► │ PRIMARY │ responses ──► device
# └──────────┘ └─────────────┘
# │
# │ device bytes (copy)
# ▼
# ┌─────────────┐
# │ MIRROR │ responses discarded (logged only)
# └─────────────┘
#
tasks = [
asyncio.create_task(
self._relay_device(client_reader, up_writer, mir_writer, session),
name="device→upstreams",
),
asyncio.create_task(
self._relay_simple(up_reader, client_writer, session, "server"),
name="primary→device",
),
]
if mir_reader is not None:
tasks.append(asyncio.create_task(
self._relay_drain(mir_reader, session),
name="mirror→drain",
))
try:
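# Wait for either primary-path relay (device→upstreams or
# primary→device) to exit, then cancel whatever is still running.
# The mirror drain task is kept out of the wait set so that a mirror
# disconnect cannot end the primary session.
done, _ = await asyncio.wait(
tasks[:2],
return_when=asyncio.FIRST_COMPLETED,
)
pending = [t for t in tasks if t not in done]
for t in pending: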
t.cancel()
try:
await t
except (asyncio.CancelledError, Exception):
pass
except Exception as exc:
log.warning("Bridge relay error: %s", exc)
finally:
session.close("relay ended")
for writer in filter(None, [client_writer, up_writer, mir_writer]):
try:
writer.close()
await writer.wait_closed()
except Exception:
pass
# ── Relay helpers ─────────────────────────────────────────────────────────
async def _relay_device(
self,
reader: asyncio.StreamReader,
primary_writer: asyncio.StreamWriter,
mirror_writer: Optional[asyncio.StreamWriter],
session: CaptureSession,
) -> None:
"""
Read bytes from the device, write to the primary server, and also
write a copy to the mirror server (if connected). Mirror write
failures are non-fatal — we log and continue.
"""
try:
while True:
data = await reader.read(4096)
if not data:
break
session.feed_client(data)
# Primary write — failure IS fatal (lose primary = lose prod)
primary_writer.write(data)
await primary_writer.drain()
# Mirror write — failure is non-fatal
if mirror_writer is not None:
try:
mirror_writer.write(data)
await mirror_writer.drain()
except Exception as exc:
log.warning("Mirror write failed (non-fatal): %s", exc)
session._log(f"# WARNING: mirror write failed: {exc}")
mirror_writer = None # stop trying
except (asyncio.IncompleteReadError, ConnectionResetError, BrokenPipeError):
pass
async def _relay_simple(
self,
reader: asyncio.StreamReader,
writer: asyncio.StreamWriter,
session: CaptureSession,
direction: str,
) -> None:
"""Standard single-pipe relay (primary→device or vice-versa)."""
try:
while True:
data = await reader.read(4096)
if not data:
break
if direction == "server":
session.feed_server(data)
else:
session.feed_client(data)
writer.write(data)
await writer.drain()
except (asyncio.IncompleteReadError, ConnectionResetError, BrokenPipeError):
pass
async def _relay_drain(
self,
reader: asyncio.StreamReader,
session: CaptureSession,
) -> None:
"""
Read mirror server responses, log them to session, do NOT forward to
device. The device only ever sees primary server responses.
"""
try:
while True:
data = await reader.read(4096)
if not data:
break
session.feed_mirror(data)
except (asyncio.IncompleteReadError, ConnectionResetError, BrokenPipeError):
pass
# ── Main ───────────────────────────────────────────────────────────────────────
async def main(args: argparse.Namespace) -> None:
capture_dir = Path(__file__).parent / "captures"
capture_dir.mkdir(parents=True, exist_ok=True)
upstream_host: Optional[str] = None
upstream_port: Optional[int] = None
mirror_host: Optional[str] = None
mirror_port: Optional[int] = None
if not args.standalone:
if not args.upstream:
print("ERROR: --upstream HOST:PORT is required unless --standalone is set.")
sys.exit(1)
parts = args.upstream.rsplit(":", 1)
if len(parts) != 2:
print("ERROR: --upstream must be HOST:PORT (e.g. 203.0.113.5:12345)")
sys.exit(1)
upstream_host = parts[0]
upstream_port = int(parts[1])
if args.mirror:
parts = args.mirror.rsplit(":", 1)
if len(parts) != 2:
print("ERROR: --mirror must be HOST:PORT (e.g. 192.168.1.50:12345)")
sys.exit(1)
mirror_host = parts[0]
mirror_port = int(parts[1])
handler = BridgeHandler(
capture_dir,
upstream_host, upstream_port,
mirror_host, mirror_port,
)
server = await asyncio.start_server(
handler.handle,
host="0.0.0.0",
port=args.port,
)
# ── Startup banner ────────────────────────────────────────────────────────
if args.standalone:
mode = "STANDALONE capture (no forwarding)"
elif mirror_host:
mode = f"SPLITTER primary={upstream_host}:{upstream_port} mirror={mirror_host}:{mirror_port}"
else:
mode = f"BRIDGE → {upstream_host}:{upstream_port}"
addrs = ", ".join(str(s.getsockname()) for s in server.sockets)
print(f"\n{'='*70}")
print(f" ACH bridge/splitter listening on {addrs}")
print(f" Mode: {mode}")
print(f" Captures: {capture_dir}/ach_<timestamp>/")
print(f"{'='*70}")
if upstream_host and not mirror_host:
print(f"\n DISCOVERY PHASE")
print(f" Point your TEST unit's ACEmanager call-home destination to:")
print(f" <this machine's LAN IP> : {args.port}")
print(f" All traffic will be forwarded to {upstream_host}:{upstream_port}")
elif mirror_host:
print(f"\n SPLITTER MODE — PRODUCTION SAFE")
print(f" Units connect as normal. Every byte is forwarded to:")
print(f" PRIMARY (authoritative): {upstream_host}:{upstream_port}")
print(f" MIRROR (your server): {mirror_host}:{mirror_port}")
print(f" Only PRIMARY responses reach the device.")
print(f" Mirror failures are logged and do not affect the device.")
else:
print(f"\n STANDALONE MODE — capture only, nothing forwarded")
print(f" Point a unit at <this machine's LAN IP> : {args.port}")
print(f"\n Waiting for inbound connections... (Ctrl-C to stop)\n")
async with server:
await server.serve_forever()
def parse_args() -> argparse.Namespace:
p = argparse.ArgumentParser(
description=(
"Transparent TCP bridge / splitter for Instantel MiniMate Plus "
"call-home (ACH) traffic."
),
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog=__doc__,
)
p.add_argument(
"--upstream", "-u",
metavar="HOST:PORT",
help=(
"Primary upstream ACH server to forward to "
"(e.g. 203.0.113.5:12345). "
"Omit with --standalone for capture-only mode."
),
)
p.add_argument(
"--mirror", "-m",
metavar="HOST:PORT",
help=(
"Mirror / secondary server to receive a copy of all device bytes "
"(splitter mode). Mirror responses are logged but NOT forwarded "
"to the device. Mirror failures are non-fatal."
),
)
p.add_argument(
"--port", "-p",
type=int,
default=12345,
help="Local port to listen on (default: 12345).",
)
p.add_argument(
"--standalone", "-s",
action="store_true",
help="Capture-only mode: accept connection, do not forward anywhere.",
)
p.add_argument(
"--verbose", "-v",
action="store_true",
help="Enable debug logging.",
)
return p.parse_args()
if __name__ == "__main__":
args = parse_args()
logging.basicConfig(
level=logging.DEBUG if args.verbose else logging.INFO,
format="%(asctime)s %(levelname)-7s %(name)s %(message)s",
)
try:
asyncio.run(main(args))
except KeyboardInterrupt:
print("\nStopped.")
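Every pair listed in `_KNOWN_REQ_SUBS` and `_KNOWN_RSP_SUBS` above satisfies one relationship: the response SUB byte equals 0xFF minus the request SUB byte. The sketch below checks that for a few of the listed pairs. `response_sub` is a hypothetical helper, and whether the rule holds for SUB codes not in those tables is an assumption.

```python
# A few request SUB codes copied from _KNOWN_REQ_SUBS above.
REQ_SUBS = {
    0x5B: "POLL",
    0x15: "SERIAL_NUM",
    0x01: "DEVICE_INFO",
    0x82: "TRIGGER_WRITE",
    0x96: "START_MONITOR",
}

# The matching response SUB codes copied from _KNOWN_RSP_SUBS above.
RSP_SUBS = {
    0xA4: "POLL_RSP",
    0xEA: "SERIAL_NUM_RSP",
    0xFE: "DEVICE_INFO_RSP",
    0x7D: "TRIGGER_WRITE_ACK",
    0x69: "START_MONITOR_ACK",
}


def response_sub(request_sub: int) -> int:
    """Return the response SUB byte implied by the tables (0xFF - request)."""
    return 0xFF - request_sub


# Pair each request name with the response name found via the complement.
paired = {name: RSP_SUBS.get(response_sub(sub)) for sub, name in REQ_SUBS.items()}
```

This also explains why 0x68, 0x69, 0x96 and 0x97 appear in both tables with different names: a byte that is a request code in one direction is the complement of another request code in the other, so the label depends on which side sent the frame.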
+177
@@ -0,0 +1,177 @@
#!/usr/bin/env python3
"""
ach_mitm.py — TCP man-in-the-middle proxy for capturing Blastware ACH sessions.
The unit calls home to THIS proxy instead of directly to Blastware. The proxy
forwards every byte in both directions to the real Blastware ACH server and saves
the traffic to separate raw capture files that the Analyzer can load directly.
Setup
-----
1. Start Blastware's ACH server on the BW PC as normal (it listens on its port).
2. Run this proxy on any machine the unit can reach:
python bridges/ach_mitm.py --bw-host 192.168.1.50 --bw-port 9999
3. Point the unit's ACEmanager call-home destination to THIS machine's IP and
the --listen-port (default 9999).
4. Trigger a call-home (or wait for the unit to call in).
5. The proxy transparently forwards everything and saves two files per session:
ach_mitm_<ts>/raw_bw_<ts>.bin -- bytes Blastware sent to unit (BW TX)
ach_mitm_<ts>/raw_s3_<ts>.bin -- bytes unit sent to Blastware (S3 TX)
Both files load directly in the Analyzer (File > Open Capture).
Each session ends cleanly when either side drops the connection; the proxy itself
keeps listening for further call-ins until stopped with Ctrl-C.
Use case: capturing Blastware operations we haven't reverse-engineered yet,
e.g. event deletion, factory reset, firmware update.
"""
from __future__ import annotations
import argparse
import datetime
import logging
import socket
import sys
import threading
from pathlib import Path
log = logging.getLogger("ach_mitm")
def _pipe(src: socket.socket, dst: socket.socket, label: str, outfile) -> None:
"""Forward bytes from src to dst, writing everything to outfile."""
try:
while True:
data = src.recv(4096)
if not data:
break
dst.sendall(data)
outfile.write(data)
outfile.flush()
log.debug("%s %d bytes", label, len(data))
except OSError:
pass
finally:
log.info("%s pipe closed", label)
# Signal the other direction to stop by shutting down our end.
try:
dst.shutdown(socket.SHUT_WR)
except OSError:
pass
def handle(unit_sock: socket.socket, peer: str, bw_host: str, bw_port: int,
output_dir: Path) -> None:
ts = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
session_dir = output_dir / f"ach_mitm_{ts}"
session_dir.mkdir(parents=True, exist_ok=True)
log.info("Session %s unit=%s forwarding to %s:%d", ts, peer, bw_host, bw_port)
# Connect upstream to Blastware.
bw_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
bw_sock.connect((bw_host, bw_port))
except OSError as exc:
log.error("Cannot reach Blastware at %s:%d: %s", bw_host, bw_port, exc)
unit_sock.close()
return
log.info("Connected to Blastware at %s:%d", bw_host, bw_port)
bw_path = session_dir / f"raw_bw_{ts}.bin" # Blastware → unit (BW TX)
s3_path = session_dir / f"raw_s3_{ts}.bin" # unit → Blastware (S3 TX)
with open(bw_path, "wb") as bw_fh, open(s3_path, "wb") as s3_fh:
# Two threads: one per direction.
t_bw = threading.Thread(
target=_pipe, args=(bw_sock, unit_sock, "BW->unit", bw_fh), daemon=True
)
t_s3 = threading.Thread(
target=_pipe, args=(unit_sock, bw_sock, "unit->BW", s3_fh), daemon=True
)
t_bw.start()
t_s3.start()
t_bw.join()
t_s3.join()
bw_bytes = bw_path.stat().st_size
s3_bytes = s3_path.stat().st_size
log.info(
"Session %s done BW->unit: %d bytes unit->BW: %d bytes -> %s",
ts, bw_bytes, s3_bytes, session_dir,
)
unit_sock.close()
bw_sock.close()
def serve(args: argparse.Namespace) -> None:
output_dir = Path(args.output)
output_dir.mkdir(parents=True, exist_ok=True)
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind(("0.0.0.0", args.listen_port))
server.listen(5)
server.settimeout(1.0)
print(f"\n{'='*60}")
print(f" ACH MITM proxy")
print(f" Listening on 0.0.0.0:{args.listen_port}")
print(f" Forwarding to {args.bw_host}:{args.bw_port}")
print(f" Captures in {output_dir.resolve()}/ach_mitm_<ts>/")
print(f"{'='*60}")
print(f"\n Point the unit's ACEmanager call-home to this machine on port {args.listen_port}")
print(f" Ctrl-C to stop\n")
try:
while True:
try:
client_sock, addr = server.accept()
except socket.timeout:
continue
peer = f"{addr[0]}:{addr[1]}"
log.info("Accepted connection from %s", peer)
t = threading.Thread(
target=handle,
args=(client_sock, peer, args.bw_host, args.bw_port, output_dir),
daemon=True,
)
t.start()
except KeyboardInterrupt:
print("\nStopping.")
finally:
server.close()
def main() -> None:
ap = argparse.ArgumentParser(description=__doc__,
formatter_class=argparse.RawDescriptionHelpFormatter)
ap.add_argument("--bw-host", required=True,
help="IP or hostname of the Blastware ACH server")
ap.add_argument("--bw-port", type=int, default=9999,
help="Port Blastware is listening on (default: 9999)")
ap.add_argument("--listen-port", type=int, default=9999,
help="Port this proxy listens on (default: 9999)")
ap.add_argument("--output", default="bridges/captures/mitm",
help="Directory for capture files")
ap.add_argument("--log-level", default="INFO",
choices=["DEBUG", "INFO", "WARNING", "ERROR"])
args = ap.parse_args()
logging.basicConfig(
level=getattr(logging, args.log_level),
format="%(asctime)s %(levelname)-7s %(name)s %(message)s",
stream=sys.stdout,
)
serve(args)
if __name__ == "__main__":
main()
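The forwarding pattern used by _pipe() in ach_mitm.py (copy until EOF, tee to a capture, then half-close the destination so the other direction also sees EOF) can be exercised without a real unit or Blastware server. The sketch below is an illustration under assumptions: it uses in-process socket pairs and in-memory buffers in place of the real sockets and capture files, and the names pipe and demo are not part of the source.

```python
import io
import socket
import threading
from typing import BinaryIO, Tuple


def pipe(src: socket.socket, dst: socket.socket, sink: BinaryIO) -> None:
    """Copy src -> dst until EOF, teeing each chunk into sink."""
    try:
        while True:
            data = src.recv(4096)
            if not data:  # peer closed its sending side
                break
            dst.sendall(data)
            sink.write(data)
    except OSError:
        pass
    finally:
        try:
            dst.shutdown(socket.SHUT_WR)  # propagate EOF downstream
        except OSError:
            pass


def demo() -> Tuple[bytes, bytes]:
    """Run one proxied exchange over in-process socket pairs."""
    unit, proxy_unit = socket.socketpair()   # stands in for unit <-> proxy
    proxy_bw, bw = socket.socketpair()       # stands in for proxy <-> Blastware
    s3_cap, bw_cap = io.BytesIO(), io.BytesIO()
    threads = [
        threading.Thread(target=pipe, args=(proxy_unit, proxy_bw, s3_cap)),
        threading.Thread(target=pipe, args=(proxy_bw, proxy_unit, bw_cap)),
    ]
    for t in threads:
        t.start()
    unit.sendall(b"event-data")
    bw.sendall(b"POLL")
    # Each endpoint half-closes; the pipes see EOF and finish.
    unit.shutdown(socket.SHUT_WR)
    bw.shutdown(socket.SHUT_WR)
    for t in threads:
        t.join()
    for s in (unit, proxy_unit, proxy_bw, bw):
        s.close()
    return s3_cap.getvalue(), bw_cap.getvalue()
```

Using shutdown(SHUT_WR) rather than close() lets bytes still in flight in the opposite direction drain before the session ends.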
+904
@@ -0,0 +1,904 @@
#!/usr/bin/env python3
"""
ach_server.py — Minimal inbound ACH (Auto Call Home) server for MiniMate Plus.
This IS your test server. Run it on any machine on the same network, point a
unit's ACEmanager call-home destination at it, and it will speak the full BW
protocol to the device: handshake, pull device info, download all events, save
everything as JSON.
The key thing this script tells you that no amount of packet sniffing can:
- Does the device speak first (push) or wait for us to send POLL (pull)?
If startup() completes normally → it's pull protocol, same as Blastware.
If startup() times out → the device may have sent something first. Raw bytes are
only written to disk after a successful startup, so no raw_rx_<ts>.bin exists for that case.
Usage
-----
python bridges/ach_server.py [--port 12345] [--output bridges/captures/]
Setup
-----
1. Run this script on a machine on your local network.
2. In ACEmanager → Application → ALEOS Application Framework (or equivalent)
find the Call Home / ACH settings. Set:
Remote Host: <this machine's LAN IP>
Remote Port: 12345
3. Trigger the unit (wait for a vibration event, or use the manual call-home
button if your firmware version has one).
4. The unit connects. This script handshakes, downloads all events,
and saves a timestamped session directory.
Output per session
------------------
bridges/captures/ach_inbound_<ts>/
device_info.json — serial number, firmware version, calibration date, etc.
events.json — all events: timestamp, PPV per channel, peaks, metadata
raw_rx_<ts>.bin — raw bytes from the device (S3 side) for Analyzer
raw_tx_<ts>.bin — raw bytes we sent to the device (BW side) for Analyzer
session_<ts>.log — detailed protocol log
What to look for
----------------
Push vs pull: Check session_<ts>.log. If the first line after "Connected"
shows bytes arriving BEFORE the POLL probe was sent, it's push. If POLL
gets a clean response, it's pull.
Frequency: Look at raw_rx_<ts>.bin in the Analyzer. SUB 5A (0xA5 responses) carry
bulk waveform data — if frequency is sent pre-computed there will be float32
values before the ADC sample blocks.
ACH-specific framing: Does the unit send anything extra before the DLE+STX
framing starts? raw_rx_<ts>.bin will show raw bytes including any preamble.
"""
from __future__ import annotations
import argparse
import datetime
import json
import logging
import socket
import sys
import threading
from pathlib import Path
from typing import Optional
sys.path.insert(0, str(Path(__file__).parent.parent))
from minimateplus.transport import SocketTransport
from minimateplus.client import MiniMateClient
from minimateplus.models import DeviceInfo, Event, MonitorLogEntry
from sfm.database import SeismoDb
from sfm.waveform_store import WaveformStore
log = logging.getLogger("ach_server")
# ── Per-unit state (downloaded events index) ──────────────────────────────────
# Persisted as <output_dir>/ach_state.json
# Format (current — v2):
# {
# "BE11529": {
# "downloaded_events": { # key_hex → ISO timestamp string
# "01110000": "2026-04-11T00:42:17",
# "0111245a": "2026-04-11T01:04:30"
# },
# "max_downloaded_key": "0111245a",
# "last_seen": "2026-04-11T01:04:36",
# "serial": "BE11529",
# "peer": "63.43.212.232:51920"
# }
# }
#
# Why (key, timestamp) and not key alone:
# The device's event-key counter resets to 0x01110000 after every memory
# erase (internal or external). A bare-key dedup (the v1 format) cannot
# distinguish a re-recorded event with the same key from one we already
# downloaded. The 0C waveform record's timestamp IS unique per physical
# event, so we pair (key, timestamp) and treat a key with a different
# timestamp as a new event regardless of `max_downloaded_key`.
#
# Legacy v1 format (`downloaded_keys: list[str]` only) is auto-migrated on
# read: the keys are kept under a sentinel of "" (empty string) timestamp so
# the (key, timestamp) compare always sees a mismatch and forces a one-time
# re-download. After that pass the state is rewritten in v2 form.
_state_lock = threading.Lock()
def _load_state(state_path: Path) -> dict:
"""
Load ach_state.json, transparently migrating any legacy
`downloaded_keys: list` entries into the v2 `downloaded_events: dict`
schema. Returns the migrated state.
"""
if not state_path.exists():
return {}
try:
with open(state_path) as f:
state = json.load(f)
except Exception:
return {}
# Per-unit migration: legacy list → dict-with-empty-timestamps
for unit_key, unit_state in list(state.items()):
if not isinstance(unit_state, dict):
continue
if "downloaded_events" in unit_state:
continue
legacy_keys = unit_state.get("downloaded_keys")
if isinstance(legacy_keys, list):
unit_state["downloaded_events"] = {k: "" for k in legacy_keys}
log.info(
"ach_state: migrated %s from v1 (downloaded_keys list) → v2 "
"(downloaded_events dict, %d keys with empty timestamps; "
"they will re-validate on next session)",
unit_key, len(legacy_keys),
)
else:
unit_state["downloaded_events"] = {}
# Drop the legacy field now that it has been migrated; the next save writes v2 only.
unit_state.pop("downloaded_keys", None)
return state
def _save_state(state_path: Path, state: dict) -> None:
with _state_lock:
with open(state_path, "w") as f:
json.dump(state, f, indent=2)
# ── Per-session handler ────────────────────────────────────────────────────────
class AchSession:
"""
Handles one inbound unit connection in its own thread.
Wraps the socket in a SocketTransport → MiniMateClient, then runs the
standard connect → get_device_info → get_events sequence.
State tracking (ach_state.json in output_dir):
On each successful download we record a (key → timestamp) map of the events
downloaded. On the next call-home an event is skipped only when both its key
and its 0C timestamp match a recorded entry. Anything else (including an event
re-recorded under a reused key after the device was wiped) is downloaded and saved.
"""
def __init__(
self,
sock: socket.socket,
peer: str,
output_dir: Path,
timeout: float,
events_only: bool,
max_events: Optional[int],
state_path: Path,
db: "SeismoDb",
store: "WaveformStore",
clear_after_download: bool = False,
restart_monitoring: bool = False,
force_redownload: bool = False,
) -> None:
self.sock = sock
self.peer = peer
self.output_dir = output_dir
self.timeout = timeout
self.events_only = events_only
self.max_events = max_events
self.state_path = state_path
self.db = db
self.store = store
self.clear_after_download = clear_after_download
self.restart_monitoring = restart_monitoring
# `force_redownload` tells this session to ignore ach_state and
# re-download every event currently on the device, regardless of any
# (key, timestamp) match. Useful as a manual override when state has
# become inconsistent with what's actually on disk / in the DB.
self.force_redownload = force_redownload
def run(self) -> None:
ts = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
# Session dir and file handler are created lazily — only after startup
# succeeds. This prevents internet scanners and dropped connections from
# littering the output directory with empty session folders.
try:
self._run_inner(ts)
except Exception as exc:
log.error("Session failed (%s): %s", self.peer, exc, exc_info=True)
finally:
try:
self.sock.close()
except Exception:
pass
def _run_inner(self, ts: str) -> None:
transport = SocketTransport(self.sock, peer=self.peer)
# Collect raw bytes in memory until startup succeeds, then flush to disk.
raw_rx_buf: list[bytes] = [] # device → us (S3 side)
raw_tx_buf: list[bytes] = [] # us → device (BW side)
_orig_read = transport.read
_orig_write = transport.write
def tapped_read(n: int) -> bytes:
data = _orig_read(n)
if data:
raw_rx_buf.append(data)
return data
def tapped_write(data: bytes) -> None:
_orig_write(data)
if data:
raw_tx_buf.append(data)
transport.read = tapped_read # type: ignore[method-assign]
transport.write = tapped_write # type: ignore[method-assign]
serial: Optional[str] = None
# ── Step 1: startup handshake ─────────────────────────────────────────
# Do this BEFORE creating the session directory so that scanner probes
# and dropped connections leave no trace on disk.
try:
from minimateplus.protocol import MiniMateProtocol
client = MiniMateClient(transport=transport, timeout=self.timeout)
client.open()
proto = MiniMateProtocol(transport, recv_timeout=self.timeout)
proto.startup()
except Exception as exc:
log.warning("Startup failed from %s: %s -- ignoring", self.peer, exc)
return # no session dir created
# Startup succeeded — this is a real unit. Create session dir now.
session_dir = self.output_dir / f"ach_inbound_{ts}"
session_dir.mkdir(parents=True, exist_ok=True)
log_path = session_dir / f"session_{ts}.log"
raw_rx_path = session_dir / f"raw_rx_{ts}.bin" # device → us (S3 side)
raw_tx_path = session_dir / f"raw_tx_{ts}.bin" # us → device (BW side)
# Flush buffered bytes to files and switch to direct file writes.
raw_rx_fh = open(raw_rx_path, "wb")
raw_tx_fh = open(raw_tx_path, "wb")
for chunk in raw_rx_buf:
raw_rx_fh.write(chunk)
for chunk in raw_tx_buf:
raw_tx_fh.write(chunk)
raw_rx_buf.clear()
raw_tx_buf.clear()
def tapped_read_file(n: int) -> bytes:
data = _orig_read(n)
if data:
raw_rx_fh.write(data)
raw_rx_fh.flush()
return data
def tapped_write_file(data: bytes) -> None:
_orig_write(data)
if data:
raw_tx_fh.write(data)
raw_tx_fh.flush()
transport.read = tapped_read_file # type: ignore[method-assign]
transport.write = tapped_write_file # type: ignore[method-assign]
# Wire up file handler now that the session dir exists.
fh = logging.FileHandler(log_path, encoding="utf-8")
fh.setFormatter(logging.Formatter("%(asctime)s %(levelname)-7s %(name)s %(message)s"))
root_logger = logging.getLogger()
root_logger.addHandler(fh)
try:
# ── Step 2: device info ───────────────────────────────────────────
device_info = None
if not self.events_only:
log.info("Step 2/3: reading device info")
try:
device_info = client.connect()
serial = device_info.serial
_save_json(session_dir / "device_info.json", _device_info_to_dict(device_info))
log.info(
" [OK] Device: serial=%s firmware=%s model=%s events=%d",
serial,
device_info.firmware_version,
device_info.model,
device_info.event_count or 0,
)
except Exception as exc:
log.error(" [FAIL] Device info failed: %s", exc)
else:
log.info("Step 2/3: skipping device info (--events-only)")
# ── Step 3: check for new events by comparing key sets ────────────
log.info("Step 3/3: checking for new events")
state = _load_state(self.state_path)
unit_key = serial or self.peer # fall back to peer (IP:port) if no serial
unit_state = state.get(unit_key, {})
# downloaded_events is the v2 (key_hex → timestamp_iso) dict.
# Empty-string timestamps are migrated v1 entries — they force a
# one-time re-download because the (key, timestamp) compare always
# mismatches against any non-empty timestamp from a fresh 0C read.
seen_events: dict[str, str] = dict(unit_state.get("downloaded_events", {}))
max_seen_key: str = unit_state.get("max_downloaded_key", "00000000")
if self.force_redownload:
log.info(" --force-redownload-all set — ignoring %d cached "
"(key, timestamp) entries for this session",
len(seen_events))
seen_events = {}
# Walk the event index (browse-mode, no 5A) to get the actual current
# key list. The SUB 08 event_count field is a lifetime "total events
# ever recorded" counter that does NOT decrement on erase — confirmed
# 2026-04-13. list_event_keys() via the 1E/1F chain is the only
# reliable way to know what is actually stored on the device right now.
log.info(" Checking device key list (browse walk, no waveform download)...")
try:
device_keys = client.list_event_keys()
except Exception as exc:
log.warning(" list_event_keys failed: %s -- falling back to full download", exc)
device_keys = None
current_count = len(device_keys) if device_keys is not None else 0
log.info(" Unit has %d stored event(s); %d (key, ts) entr(ies) previously downloaded",
current_count, len(seen_events))
if device_keys is not None and current_count == 0:
log.info(" [OK] No events on device -- nothing to download")
log.info("Session complete (no events) -> %s", session_dir)
return
if device_keys is not None:
# ── Post-erase detection (best-effort, key-only signal) ───────
# After erase the device's key counter resets to 01110000.
# If the device's current max key is below our high-water mark
# we know erase happened. This catches the cleanest case but
# does NOT catch erase-then-record-many-events (where the new
# max may climb past the old max). The (key, timestamp) check
# in get_events() is what handles those.
if device_keys and max_seen_key != "00000000":
max_device_key = max(device_keys)
if max_device_key < max_seen_key:
log.info(
" Post-erase reset detected: "
"device max key %s < historical max %s "
"-- discarding stale (key, ts) state for this session",
max_device_key, max_seen_key,
)
seen_events = {}
# Note: no early-exit "all already downloaded" short-circuit
# here. Without per-event timestamps we cannot tell whether
# device_keys ⊆ seen_events.keys() actually means we have
# those physical events. get_events() will read 0C on its
# skip path and decide per event.
# Apply max_events cap
# stop_idx: when we know the count from list_event_keys, use it as
# an upper bound. When list_event_keys failed (device_keys is None),
# pass None — get_events will run until the null sentinel naturally.
stop_idx: Optional[int] = (current_count - 1) if device_keys is not None else None
if self.max_events is not None:
cap = self.max_events - 1
stop_idx = cap if stop_idx is None else min(stop_idx, cap)
if device_keys is not None and self.max_events < current_count:
log.warning(
" max_events=%d cap: will download events 0-%d only "
"(unit has %d total)",
self.max_events, stop_idx, current_count,
)
try:
# Pass `seen_events` (key → ISO timestamp) so the client can
# read 0C on its skip path and only skip 5A when the per-event
# timestamp matches what we already have on disk. When force_-
# redownload is set, seen_events was already cleared above.
#
# Filter out empty-string timestamps (legacy v1 entries) — the
# client's 0C-on-skip-path only trusts entries with a
# populated timestamp; otherwise it falls through to a full
# 5A download.
skip_dict = {k: ts for k, ts in seen_events.items() if ts}
all_events = client.get_events(
full_waveform=True,
stop_after_index=stop_idx,
skip_waveform_for_events=skip_dict if skip_dict else None,
)
# New events are those that came back with _a5_frames populated
# (= 5A actually ran on this session). Skipped events have
# _a5_frames = None because the client matched (key, timestamp)
# against skip_dict and bypassed 5A.
new_events = [
e for e in all_events
if getattr(e, "_a5_frames", None)
]
skipped = len(all_events) - len(new_events)
log.info(" [OK] Walked %d event(s): %d downloaded, %d skipped (matched (key, ts) in state)",
len(all_events), len(new_events), skipped)
# ── Persist event file + A5 sidecar to the waveform store ──
# Saves ride alongside the existing JSON dump so the on-disk
# event file and events.json reference the same set of events.
waveform_records: dict[str, dict] = {}
for ev in new_events:
if not ev._a5_frames:
continue
try:
rec = self.store.save(
ev,
serial=serial or "UNKNOWN",
a5_frames=ev._a5_frames,
)
if ev._waveform_key is not None:
waveform_records[ev._waveform_key.hex()] = rec
log.info(
" [WAVE] saved %s (%d bytes)",
rec["filename"], rec["filesize"],
)
except Exception as exc:
key_hex = ev._waveform_key.hex() if ev._waveform_key else "????????"
log.warning(
" [WARN] Waveform store save failed for %s: %s",
key_hex, exc,
)
if new_events:
_save_json(
session_dir / "events.json",
[_event_to_dict(e, waveform_records) for e in new_events],
)
for ev in new_events:
pv = ev.peak_values
pi = ev.project_info
key_hex = ev._waveform_key.hex() if ev._waveform_key else "????????"
log.info(
" NEW [%s] %s Tran=%.4f Vert=%.4f Long=%.4f VS=%.4f project=%r",
key_hex,
str(ev.timestamp) if ev.timestamp else "?",
pv.tran if pv else 0,
pv.vert if pv else 0,
pv.long if pv else 0,
pv.peak_vector_sum if pv else 0,
pi.project if pi else "",
)
else:
log.info(" [OK] No new events since last call-home -- nothing to save")
# ── Monitor log entries (partial records / continuous monitoring) ──
# Browse walk (0A + 1F only) to collect monitor log entries for
# recording intervals where no threshold was crossed. This is a
# second 1E-based pass over the device's record list, separate from
# the get_events() download loop above.
log.info(" Collecting monitor log entries (browse walk)...")
new_monitor_entries: list[MonitorLogEntry] = []
try:
new_monitor_entries = client.get_monitor_log_entries(
skip_keys=set(seen_events) if seen_events else None,
)
if new_monitor_entries:
_save_json(
session_dir / "monitor_log.json",
[_monitor_log_entry_to_dict(e) for e in new_monitor_entries],
)
log.info(
" [OK] %d new monitor log entry(s) saved",
len(new_monitor_entries),
)
for ml in new_monitor_entries:
log.info(
" MONLOG [%s] %s -> %s (%s)",
ml.key,
ml.start_time.isoformat() if ml.start_time else "?",
ml.stop_time.isoformat() if ml.stop_time else "?",
f"{ml.duration_seconds:.0f}s" if ml.duration_seconds is not None else "?s",
)
else:
log.info(" [OK] No new monitor log entries")
except Exception as exc:
log.warning(
" [WARN] Monitor log collection failed: %s -- continuing",
exc,
)
# ── Persist to SQLite DB ─────────────────────────────────────
_session_start = datetime.datetime.now()
try:
_ev_ins, _ev_skip = self.db.insert_events(
new_events,
serial=serial or self.peer,
session_id=None,
waveform_records=waveform_records,
device_family="series3",
)
_ml_ins, _ml_skip = self.db.insert_monitor_log(
new_monitor_entries, session_id=None
)
_session_id = self.db.insert_ach_session(
serial=serial or self.peer,
peer=self.peer,
events_downloaded=_ev_ins,
monitor_entries=_ml_ins,
duration_seconds=(datetime.datetime.now() - _session_start).total_seconds(),
session_time=_session_start,
)
log.info(
" [DB] session=%s events +%d (skip %d) monitor +%d (skip %d)",
_session_id[:8], _ev_ins, _ev_skip, _ml_ins, _ml_skip,
)
except Exception as exc:
log.warning(" [WARN] DB write failed: %s -- continuing", exc)
# ── Optional: erase device memory after successful download ────
erased_successfully = False
if self.clear_after_download and new_events:
log.info(" Clearing device memory (--clear-after-download)...")
try:
client.delete_all_events()
log.info(" [OK] Device memory cleared")
erased_successfully = True
except Exception as exc:
log.error(
" [WARN] Event deletion failed: %s -- events NOT cleared",
exc,
)
# ── Update persistent state ───────────────────────────────────
# Build a fresh (key → ISO timestamp) map from THIS session's
# results. For each event currently on the device, prefer the
# timestamp we just observed (from 0C); fall back to whatever
# was already in seen_events for that key (so we don't lose an
# entry just because get_events skipped it on the (key, ts)
# match path).
def _ts_iso(ev) -> str:
ts = getattr(ev, "timestamp", None)
if ts is None:
return ""
try:
return datetime.datetime(
ts.year, ts.month, ts.day,
ts.hour or 0, ts.minute or 0, ts.second or 0,
).isoformat()
except Exception:
return str(ts)
current_events_map: dict[str, str] = {}
for ev in all_events:
if ev._waveform_key is None:
continue
key_hex = ev._waveform_key.hex()
ts_iso = _ts_iso(ev) or seen_events.get(key_hex, "")
current_events_map[key_hex] = ts_iso
# Monitor-log entries don't have a 0C-style timestamp, but
# they DO have a start_time; use that so the monitor-log keys
# are properly entered into the (key, ts) map.
for ml in new_monitor_entries:
key_hex = ml.key
ts = ml.start_time
ts_iso = ts.isoformat() if ts else seen_events.get(key_hex, "")
# If a triggered event already populated this key, keep
# whichever has a non-empty timestamp.
if key_hex not in current_events_map or not current_events_map[key_hex]:
current_events_map[key_hex] = ts_iso
if erased_successfully:
updated_events: dict[str, str] = {}
new_max_key = "00000000"
log.info(
" State reset after erase -- next session will download "
"from key 0 (device counter resets after erase)"
)
else:
# Merge: keep prior (key, ts) entries we still have evidence
# of (for survivors of any partial failure), plus this
# session's authoritative (key, ts) pairs.
updated_events = dict(seen_events)
updated_events.update(current_events_map)
new_max_key = (
max(updated_events.keys())
if updated_events else max_seen_key
)
state[unit_key] = {
"downloaded_events": updated_events,
"max_downloaded_key": new_max_key,
"last_seen": datetime.datetime.now().isoformat(),
"serial": serial,
"peer": self.peer,
}
_save_state(self.state_path, state)
except Exception as exc:
log.error(" [FAIL] Event download failed: %s", exc, exc_info=True)
# ── Optional: restart monitoring after successful download ─────────
if self.restart_monitoring:
log.info(" Restarting monitoring on device (--restart-monitoring)...")
try:
client.start_monitoring()
log.info(" [OK] Monitoring restarted")
except Exception as exc:
log.warning(" [WARN] Failed to restart monitoring: %s", exc)
finally:
raw_rx_fh.close()
raw_tx_fh.close()
client.close() # closes transport / socket cleanly
root_logger.removeHandler(fh)
fh.close()
log.info("Session complete -> %s", session_dir)
log.info("="*60)
# ── JSON helpers ───────────────────────────────────────────────────────────────
def _save_json(path: Path, obj: object) -> None:
with open(path, "w") as f:
json.dump(obj, f, indent=2, default=str)
log.debug("Saved %s", path)
def _device_info_to_dict(d: DeviceInfo) -> dict:
cc = d.compliance_config
return {
"serial": d.serial,
"firmware_version": d.firmware_version,
"dsp_version": d.dsp_version,
"model": d.model,
"event_count": d.event_count,
# compliance config fields (None if 1A read failed)
"setup_name": cc.setup_name if cc else None,
"sample_rate": cc.sample_rate if cc else None,
"record_time": cc.record_time if cc else None,
"trigger_level_geo": cc.trigger_level_geo if cc else None,
"alarm_level_geo": cc.alarm_level_geo if cc else None,
"geo_adc_scale": cc.geo_adc_scale if cc else None, # hw scale factor (in/s)/V
"geo_range": cc.geo_range if cc else None, # 0x01=Normal 10in/s, 0x00=Sensitive 1.25in/s (unconfirmed)
"project": cc.project if cc else None,
"client": cc.client if cc else None,
"operator": cc.operator if cc else None,
"sensor_location": cc.sensor_location if cc else None,
}
def _event_to_dict(
e: Event,
waveform_records: Optional[dict[str, dict]] = None,
) -> dict:
pv = e.peak_values
pi = e.project_info
peaks = {}
if pv:
peaks = {
"transverse": pv.tran,
"vertical": pv.vert,
"longitudinal": pv.long,
"vector_sum": pv.peak_vector_sum,
"mic": pv.micl,
}
samples = {}
if e.raw_samples:
samples = {
ch: vals[:20] # first 20 sample-sets to keep the file sane
for ch, vals in e.raw_samples.items()
}
samples["__note__"] = "first 20 sample-sets only; see raw_rx_<ts>.bin for full waveform"
rec: dict = {}
if waveform_records and e._waveform_key is not None:
rec = waveform_records.get(e._waveform_key.hex(), {}) or {}
return {
"timestamp": str(e.timestamp) if e.timestamp else None,
"project": pi.project if pi else None,
"client": pi.client if pi else None,
"operator": pi.operator if pi else None,
"sensor_location": pi.sensor_location if pi else None,
"peaks": peaks,
"raw_samples_preview": samples,
"blastware_filename": rec.get("filename"),
"blastware_filesize": rec.get("filesize"),
"a5_pickle_filename": rec.get("a5_pickle_filename"),
}
def _monitor_log_entry_to_dict(e: MonitorLogEntry) -> dict:
return {
"key": e.key,
"start_time": e.start_time.isoformat() if e.start_time else None,
"stop_time": e.stop_time.isoformat() if e.stop_time else None,
"duration_seconds": e.duration_seconds,
"serial": e.serial,
"geo_threshold_ips": e.geo_threshold_ips,
}
# ── Main server loop ───────────────────────────────────────────────────────────
def serve(args: argparse.Namespace) -> None:
output_dir = Path(args.output)
output_dir.mkdir(parents=True, exist_ok=True)
state_path = output_dir / "ach_state.json"
db = SeismoDb(output_dir / "seismo_relay.db")
store = WaveformStore(output_dir / "waveforms")
server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server_sock.bind(("0.0.0.0", args.port))
server_sock.listen(5)
# Wake up every second so Ctrl-C is handled promptly on Windows.
# Without this, accept() blocks indefinitely and ignores KeyboardInterrupt.
server_sock.settimeout(1.0)
max_ev = args.max_events
print(f"\n{'='*60}")
print(f" ACH inbound server listening on 0.0.0.0:{args.port}")
print(f" Output: {output_dir.resolve()}/ach_inbound_<timestamp>/")
print(f" State file: {state_path}")
print(f" Max events per session: {max_ev if max_ev else 'unlimited'}")
print(f" Clear device after download: {'YES' if args.clear_after_download else 'no'}")
print(f" Restart monitoring after download: {'YES' if args.restart_monitoring else 'no'}")
print(f" Force re-download all (ignore state): {'YES' if args.force_redownload_all else 'no'}")
print(f"{'='*60}")
print(f"\n Point your test unit's ACEmanager call-home settings to:")
print(f" Remote Host: <this machine's LAN IP>")
print(f" Remote Port: {args.port}")
print(f"\n Waiting for inbound connections... (Ctrl-C to stop)\n")
allow_ips = set(args.allow_ips)
if allow_ips:
print(f" Allowlist: {', '.join(sorted(allow_ips))}")
else:
print(" Allowlist: NONE -- accepting all IPs (add --allow-ip to restrict)")
try:
while True:
try:
client_sock, addr = server_sock.accept()
except socket.timeout:
continue # no connection this second; loop back and check for Ctrl-C
try:
peer_ip = addr[0]
peer = f"{addr[0]}:{addr[1]}"
if allow_ips and peer_ip not in allow_ips:
log.info("Rejected connection from %s (not in allowlist)", peer)
client_sock.close()
continue
log.info("Accepted connection from %s", peer)
session = AchSession(
sock=client_sock,
peer=peer,
output_dir=output_dir,
timeout=args.timeout,
events_only=args.events_only,
max_events=max_ev,
state_path=state_path,
db=db,
store=store,
clear_after_download=args.clear_after_download,
restart_monitoring=args.restart_monitoring,
force_redownload=args.force_redownload_all,
)
t = threading.Thread(target=session.run, daemon=True, name=f"ach-{peer}")
t.start()
except KeyboardInterrupt:
raise
except Exception as exc:
log.error("Accept error: %s", exc)
finally:
server_sock.close()
print("\nServer stopped.")
def parse_args() -> argparse.Namespace:
p = argparse.ArgumentParser(
description="Minimal inbound ACH server — speak BW protocol to calling MiniMate Plus units.",
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog=__doc__,
)
p.add_argument(
"--port", "-p",
type=int,
default=12345,
help="Port to listen on (default: 12345).",
)
p.add_argument(
"--output", "-o",
default=str(Path(__file__).parent / "captures"),
metavar="DIR",
help="Directory to write session captures (default: bridges/captures/).",
)
p.add_argument(
"--timeout", "-t",
type=float,
default=30.0,
help="Protocol receive timeout in seconds (default: 30.0).",
)
p.add_argument(
"--events-only",
action="store_true",
help="Skip the device-info step and go straight to event download.",
)
p.add_argument(
"--max-events",
type=int,
default=None,
metavar="N",
help=(
"Safety cap: download at most N events per session (default: unlimited). "
"Useful if a unit has many old events stored — prevents a very long first run."
),
)
p.add_argument(
"--allow-ip",
metavar="IP",
action="append",
dest="allow_ips",
default=[],
help=(
"Only accept connections from this IP address (repeat for multiple). "
"Example: --allow-ip 63.43.212.232 "
"If not specified, all IPs are accepted (not recommended for public servers)."
),
)
p.add_argument(
"--restart-monitoring",
action="store_true",
default=False,
help=(
"After downloading events, send SUB 0x96 (start monitoring) before "
"disconnecting. Required for RV55 units whose firmware does not assert "
"DCD on disconnect — without this the unit stays idle after a call-home."
),
)
p.add_argument(
"--clear-after-download",
action="store_true",
default=False,
help=(
"After successfully downloading new events, erase all events from the "
"device memory (SUB 0xA3 → 0x1C → 0x06 → 0xA2 sequence, confirmed from "
"4-11-26 MITM capture). Only fires when at least one new event was saved. "
"This mirrors the standard Blastware ACH workflow."
),
)
p.add_argument(
"--force-redownload-all",
action="store_true",
default=False,
help=(
"Manual override: ignore ach_state.json's downloaded_events map "
"for this session and re-download every event currently on the "
"device, regardless of (key, timestamp) match. Useful when state "
"has become inconsistent with the on-disk waveform store / DB."
),
)
p.add_argument(
"--verbose", "-v",
action="store_true",
help="Enable debug logging.",
)
return p.parse_args()
if __name__ == "__main__":
args = parse_args()
logging.basicConfig(
level=logging.DEBUG if args.verbose else logging.INFO,
format="%(asctime)s %(levelname)-7s %(name)s %(message)s",
)
try:
serve(args)
except KeyboardInterrupt:
print("\nStopped.")
+34 -29
@@ -58,16 +58,24 @@ class BridgeGUI(tk.Tk):
tk.Entry(self, textvariable=self.logdir_var, width=24).grid(row=1, column=3, sticky="we", **pad)
tk.Button(self, text="Browse", command=self._choose_dir).grid(row=1, column=4, sticky="w", **pad)
# Row 2: Raw taps
self.raw_bw_var = tk.StringVar(value="")
self.raw_s3_var = tk.StringVar(value="")
tk.Checkbutton(self, text="Save BW->S3 raw", command=self._toggle_raw_bw, onvalue="1", offvalue="").grid(row=2, column=0, sticky="w", **pad)
tk.Entry(self, textvariable=self.raw_bw_var, width=28).grid(row=2, column=1, columnspan=3, sticky="we", **pad)
tk.Button(self, text="...", command=lambda: self._choose_file(self.raw_bw_var, "bw")).grid(row=2, column=4, **pad)
# Row 2: Raw taps — ON by default; "auto" = timestamped name; blank checkbox = disabled
self.raw_bw_enabled = tk.IntVar(value=1)
self.raw_s3_enabled = tk.IntVar(value=1)
# Path fields: empty means "auto" (bridge picks a timestamped name)
self.raw_bw_path_var = tk.StringVar(value="")
self.raw_s3_path_var = tk.StringVar(value="")
tk.Checkbutton(self, text="Save S3->BW raw", command=self._toggle_raw_s3, onvalue="1", offvalue="").grid(row=3, column=0, sticky="w", **pad)
tk.Entry(self, textvariable=self.raw_s3_var, width=28).grid(row=3, column=1, columnspan=3, sticky="we", **pad)
tk.Button(self, text="...", command=lambda: self._choose_file(self.raw_s3_var, "s3")).grid(row=3, column=4, **pad)
tk.Checkbutton(self, text="BW→S3 raw (auto)", variable=self.raw_bw_enabled,
command=self._toggle_raw_bw).grid(row=2, column=0, sticky="w", **pad)
tk.Entry(self, textvariable=self.raw_bw_path_var, width=28,
fg="grey").grid(row=2, column=1, columnspan=3, sticky="we", **pad)
tk.Button(self, text="...", command=lambda: self._choose_file(self.raw_bw_path_var, "bw")).grid(row=2, column=4, **pad)
tk.Checkbutton(self, text="S3→BW raw (auto)", variable=self.raw_s3_enabled,
command=self._toggle_raw_s3).grid(row=3, column=0, sticky="w", **pad)
tk.Entry(self, textvariable=self.raw_s3_path_var, width=28,
fg="grey").grid(row=3, column=1, columnspan=3, sticky="we", **pad)
tk.Button(self, text="...", command=lambda: self._choose_file(self.raw_s3_path_var, "s3")).grid(row=3, column=4, **pad)
# Row 4: Status + buttons
self.status_var = tk.StringVar(value="Idle")
@@ -102,13 +110,11 @@ class BridgeGUI(tk.Tk):
var.set(filename)
def _toggle_raw_bw(self) -> None:
if not self.raw_bw_var.get():
# default name
self.raw_bw_var.set(os.path.join(self.logdir_var.get(), "raw_bw.bin"))
# Checkbox toggled — no path action needed; enabled state drives the flag.
pass
def _toggle_raw_s3(self) -> None:
if not self.raw_s3_var.get():
self.raw_s3_var.set(os.path.join(self.logdir_var.get(), "raw_s3.bin"))
pass
def start_bridge(self) -> None:
if self.process and self.process.poll() is None:
@@ -126,23 +132,22 @@ class BridgeGUI(tk.Tk):
args = [sys.executable, BRIDGE_PATH, "--bw", bw, "--s3", s3, "--baud", baud, "--logdir", logdir]
ts = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
# Raw tap flags.
# Checkbox on + empty path → pass "auto" (bridge generates timestamped name).
# Checkbox on + explicit path → pass that path.
# Checkbox off → pass "" to disable (overrides bridge's auto default).
raw_bw_explicit = self.raw_bw_path_var.get().strip()
raw_s3_explicit = self.raw_s3_path_var.get().strip()
raw_bw = self.raw_bw_var.get().strip()
raw_s3 = self.raw_s3_var.get().strip()
if self.raw_bw_enabled.get():
args += ["--raw-bw", raw_bw_explicit if raw_bw_explicit else "auto"]
else:
args += ["--raw-bw", ""] # explicit disable
# If the user left the default generic name, replace with a timestamped one
# so each session gets its own file.
if raw_bw:
if os.path.basename(raw_bw) in ("raw_bw.bin", "raw_bw"):
raw_bw = os.path.join(os.path.dirname(raw_bw) or logdir, f"raw_bw_{ts}.bin")
self.raw_bw_var.set(raw_bw)
args += ["--raw-bw", raw_bw]
if raw_s3:
if os.path.basename(raw_s3) in ("raw_s3.bin", "raw_s3"):
raw_s3 = os.path.join(os.path.dirname(raw_s3) or logdir, f"raw_s3_{ts}.bin")
self.raw_s3_var.set(raw_s3)
args += ["--raw-s3", raw_s3]
if self.raw_s3_enabled.get():
args += ["--raw-s3", raw_s3_explicit if raw_s3_explicit else "auto"]
else:
args += ["--raw-s3", ""] # explicit disable
try:
self.process = subprocess.Popen(
+88 -13
@@ -93,8 +93,11 @@ class SessionLogger:
self._bin_fh = open(bin_path, "ab", buffering=0)
self._lock = threading.Lock()
# Optional pure-byte taps (no headers). BW=Blastware tx, S3=device tx.
# These can be opened/closed on demand via start_raw_capture/stop_raw_capture.
self._raw_bw = open(raw_bw_path, "ab", buffering=0) if raw_bw_path else None
self._raw_s3 = open(raw_s3_path, "ab", buffering=0) if raw_s3_path else None
self._cap_bw_path: Optional[str] = raw_bw_path
self._cap_s3_path: Optional[str] = raw_s3_path
def log_line(self, line: str) -> None:
with self._lock:
@@ -124,6 +127,43 @@ class SessionLogger:
self.log_line(f"[{ts}] [INFO] {msg}")
self.bin_write_record(REC_INFO, msg.encode("utf-8", errors="replace"))
def start_raw_capture(self, label: str, logdir: str) -> tuple:
"""Open new raw tap files for a named capture. Returns (bw_path, s3_path)."""
ts = _dt.datetime.now().strftime("%Y%m%d_%H%M%S")
safe = "".join(c if c.isalnum() or c in "-_" else "_" for c in label)[:40] if label else ""
suffix = f"_{safe}" if safe else ""
bw_path = os.path.join(logdir, f"raw_bw_{ts}{suffix}.bin")
s3_path = os.path.join(logdir, f"raw_s3_{ts}{suffix}.bin")
with self._lock:
# Close any previously open taps first
if self._raw_bw:
self._raw_bw.close()
if self._raw_s3:
self._raw_s3.close()
self._raw_bw = open(bw_path, "ab", buffering=0)
self._raw_s3 = open(s3_path, "ab", buffering=0)
self._cap_bw_path = bw_path
self._cap_s3_path = s3_path
self.log_info(f"raw capture started: label={label!r} bw={bw_path} s3={s3_path}")
return bw_path, s3_path
def stop_raw_capture(self) -> tuple:
"""Close raw tap files. Returns (bw_path, s3_path) for the capture just closed."""
with self._lock:
bw = self._cap_bw_path
s3 = self._cap_s3_path
if self._raw_bw:
self._raw_bw.close()
self._raw_bw = None
if self._raw_s3:
self._raw_s3.close()
self._raw_s3 = None
self._cap_bw_path = None
self._cap_s3_path = None
if bw:
self.log_info(f"raw capture stopped: bw={bw} s3={s3}")
return bw, s3
def close(self) -> None:
with self._lock:
try:
@@ -291,8 +331,18 @@ def forward_loop(
time.sleep(0.002)
def annotation_loop(logger: SessionLogger, stop: threading.Event) -> None:
print("[MARK] Type 'm' + Enter to annotate the capture. Ctrl+C to stop.")
def annotation_loop(logger: SessionLogger, logdir: str, stop: threading.Event) -> None:
"""
Reads stdin commands while the bridge runs.
Commands:
m — prompt for a mark label (interactive)
CAP_START:<label> — begin a raw tap capture with the given label
CAP_STOP — stop the current raw tap capture
Responses (printed to stdout, parsed by the GUI):
[CAP_START] <bw_path>\\t<s3_path>
[CAP_STOP] <bw_path>\\t<s3_path>
"""
while not stop.is_set():
try:
line = input()
@@ -303,7 +353,21 @@ def annotation_loop(logger: SessionLogger, stop: threading.Event) -> None:
if not line:
continue
if line.lower() == "m":
if line.startswith("CAP_START:"):
label = line[10:].strip()
bw_path, s3_path = logger.start_raw_capture(label, logdir)
print(f"[CAP_START] {bw_path}\t{s3_path}")
sys.stdout.flush()
elif line == "CAP_STOP":
bw_path, s3_path = logger.stop_raw_capture()
if bw_path:
print(f"[CAP_STOP] {bw_path}\t{s3_path}")
else:
print("[CAP_STOP] no active capture")
sys.stdout.flush()
elif line.lower() == "m":
try:
sys.stdout.write(" Label: ")
sys.stdout.flush()
@@ -315,8 +379,9 @@ def annotation_loop(logger: SessionLogger, stop: threading.Event) -> None:
print(f" [MARK written] {label}")
else:
print(" (empty label — mark cancelled)")
else:
print(" (type 'm' + Enter to annotate)")
print(f" (unknown command: {line!r})")
def main() -> int:
@@ -325,8 +390,14 @@ def main() -> int:
ap.add_argument("--s3", default="COM5", help="S3-side COM port (default: COM5)")
ap.add_argument("--baud", type=int, default=38400, help="Baud rate (default: 38400)")
ap.add_argument("--logdir", default=".", help="Directory to write session logs into (default: .)")
ap.add_argument("--raw-bw", default=None, help="Optional file to append raw bytes sent from BW->S3 (no headers)")
ap.add_argument("--raw-s3", default=None, help="Optional file to append raw bytes sent from S3->BW (no headers)")
ap.add_argument("--raw-bw", default="auto",
help="File to append raw bytes sent from BW->S3 (no headers). "
"Default 'auto' generates a timestamped name in --logdir. "
"Pass an empty string to disable.")
ap.add_argument("--raw-s3", default="auto",
help="File to append raw bytes sent from S3->BW (no headers). "
"Default 'auto' generates a timestamped name in --logdir. "
"Pass an empty string to disable.")
ap.add_argument("--quiet", action="store_true", help="No console heartbeat output")
ap.add_argument("--status-every", type=float, default=0.0, help="Seconds between console heartbeat lines (default: 0 = off)")
args = ap.parse_args()
@@ -349,12 +420,16 @@ def main() -> int:
# If raw tap flags were passed without a path (bare --raw-bw / --raw-s3),
# or if the sentinel value "auto" is used, generate a timestamped name.
# If a specific path was provided, use it as-is (caller's responsibility).
raw_bw_path = args.raw_bw
raw_s3_path = args.raw_s3
if raw_bw_path in (None, "", "auto"):
raw_bw_path = os.path.join(args.logdir, f"raw_bw_{ts}.bin") if args.raw_bw is not None else None
if raw_s3_path in (None, "", "auto"):
raw_s3_path = os.path.join(args.logdir, f"raw_s3_{ts}.bin") if args.raw_s3 is not None else None
# Resolve raw tap paths.
# "auto" (default) → timestamped file in logdir (always captured).
# Explicit path → use verbatim.
# None or "" → disabled (pass --raw-bw "" to suppress capture).
raw_bw_path: Optional[str] = args.raw_bw if args.raw_bw else None
raw_s3_path: Optional[str] = args.raw_s3 if args.raw_s3 else None
if raw_bw_path == "auto":
raw_bw_path = os.path.join(args.logdir, f"raw_bw_{ts}.bin")
if raw_s3_path == "auto":
raw_s3_path = os.path.join(args.logdir, f"raw_s3_{ts}.bin")
logger = SessionLogger(log_path, bin_path, raw_bw_path=raw_bw_path, raw_s3_path=raw_s3_path)
@@ -391,7 +466,7 @@ def main() -> int:
t_ann = threading.Thread(
target=annotation_loop,
name="Annotator",
args=(logger, stop),
args=(logger, args.logdir, stop),
daemon=True,
)
+435
@@ -0,0 +1,435 @@
#!/usr/bin/env python3
"""
serial_watch.py — Instantel Series-3 serial monitor with S3 frame parsing.
Taps the RS-232 line between the MiniMate Plus and its modem (RV50/RV55).
Saves raw binary captures compatible with the rest of the analysis toolchain,
plus a human-readable frame log.
Usage
-----
python bridges/serial_watch.py # interactive COM picker
python bridges/serial_watch.py --port COM3 # specify port
python bridges/serial_watch.py --port COM3 --ack-ok # reply OK to AT commands
# (useful if modem is absent
# and you want the device to
# proceed past AT negotiation)
python bridges/serial_watch.py --list # list available ports
Output
------
bridges/captures/serial_<ISO-timestamp>/
raw_s3_<ts>.bin — raw bytes from device (feeds directly into S3FrameParser)
session_<ts>.log — human-readable frame + control-line log
session_<ts>.jsonl — JSON-lines frame log
The raw_s3_*.bin file is byte-for-byte compatible with the existing capture
format used by bridges/parse_capture.py and all analysis scripts.
What to look for in a call-home capture
----------------------------------------
1. Does the device talk first after CONNECT, or does it wait?
- If raw_s3_*.bin has bytes before any AT/POLL exchange → PUSH protocol
- If it stays silent → PULL protocol (same as Blastware manual download)
2. Look for "Operating System" ASCII at the start — the device sends this 16-byte
boot string on cold start before entering DLE-framed mode.
3. RING/CONNECT from the modem appear as ASCII before the DLE frames — the parser
handles these automatically (scans forward to DLE+STX).
"""
from __future__ import annotations
import argparse
import sys
import threading
import time
from datetime import datetime
from pathlib import Path
try:
import serial
from serial.tools import list_ports
except ModuleNotFoundError:
print(
"pyserial not found. Install with:\n python -m pip install pyserial",
file=sys.stderr,
)
sys.exit(1)
# Add project root so we can import the frame parser
sys.path.insert(0, str(Path(__file__).parent.parent))
from minimateplus.framing import S3FrameParser, S3Frame
import json
# ── Helpers ───────────────────────────────────────────────────────────────────
def _ts() -> str:
return datetime.now().strftime("%Y-%m-%d %H:%M:%S.%f")[:-3]
def _hexdump(b: bytes) -> str:
return " ".join(f"{x:02X}" for x in b)
def _printable(b: bytes) -> str:
return b.decode("latin1", errors="replace")
_KNOWN_SUBS = {
0xA4: "POLL_RSP", 0xA5: "BULK_WAVEFORM_RSP", 0xE0: "ADVANCE_EVENT_RSP",
0xE1: "EVENT_IDX_FIRST_RSP", 0xE3: "MONITOR_STATUS_RSP", 0xEA: "SERIAL_NUM_RSP",
0xF3: "WAVEFORM_RECORD_RSP", 0xF5: "WAVEFORM_HEADER_RSP", 0xF7: "EVENT_INDEX_RSP",
0xF9: "UNK_06_RSP", 0xFE: "DEVICE_INFO_RSP",
0x69: "START_MONITOR_ACK", 0x68: "STOP_MONITOR_ACK",
0x97: "EVT_IDX_WRITE_ACK", 0x8C: "CONFIRM_B_ACK", 0x8E: "COMPLIANCE_WRITE_ACK",
0x8D: "CONFIRM_A_ACK", 0x7D: "TRIGGER_WRITE_ACK", 0x7C: "TRIGGER_CONFIRM_ACK",
0x96: "WAVEFORM_WRITE_ACK", 0x8B: "CONFIRM_C_ACK",
}
def _label_frame(frame: S3Frame) -> str:
name = _KNOWN_SUBS.get(frame.sub, f"UNK_0x{frame.sub:02X}")
chk = "" if frame.checksum_valid else "✗ BAD_CHK"
peek = frame.data[:24].hex() + ("…" if len(frame.data) > 24 else "")
return (
f"S3 SUB=0x{frame.sub:02X} ({name:<22}) "
f"page=0x{frame.page_key:04X} data={len(frame.data):4d}B {chk} {peek}"
)
# ── Logger ────────────────────────────────────────────────────────────────────
class Logger:
def __init__(self, log_path: Path, jsonl_path: Path, raw_path: Path) -> None:
self._log = log_path.open("a", encoding="utf-8", newline="")
self._jl = jsonl_path.open("a", encoding="utf-8", newline="")
self._raw = raw_path.open("ab")
self._lock = threading.Lock()
self._frame_count = 0
def info(self, msg: str) -> None:
line = f"[{_ts()}] INFO | {msg}"
with self._lock:
print(line)
print(line, file=self._log, flush=True)
def ctrl(self, msg: str) -> None:
line = f"[{_ts()}] CTRL | {msg}"
with self._lock:
print(line)
print(line, file=self._log, flush=True)
def data_hex(self, msg: str) -> None:
line = f"[{_ts()}] HEX | {msg}"
with self._lock:
print(line)
print(line, file=self._log, flush=True)
def data_ascii(self, msg: str) -> None:
line = f"[{_ts()}] DATA | {msg}"
with self._lock:
print(line)
print(line, file=self._log, flush=True)
def frame(self, f: S3Frame) -> None:
with self._lock:
self._frame_count += 1
label = f"[{_ts()}] FRAME | #{self._frame_count:04d} {_label_frame(f)}"
print(label)
print(label, file=self._log, flush=True)
record = {
"frame": self._frame_count,
"sub": f.sub,
"page_key": f.page_key,
"data_len": len(f.data),
"data_hex": f.data.hex(),
"checksum_valid": f.checksum_valid,
}
print(json.dumps(record), file=self._jl, flush=True)
def write_raw(self, data: bytes) -> None:
with self._lock:
self._raw.write(data)
self._raw.flush()
def close(self) -> None:
with self._lock:
for fh in (self._log, self._jl, self._raw):
try:
fh.flush()
fh.close()
except Exception:
pass
# ── Control-line monitor thread ───────────────────────────────────────────────
def _monitor_control_lines(
ser: serial.Serial,
logger: Logger,
stop: threading.Event,
interval: float,
) -> None:
prev = dict(CTS=None, DSR=None, DCD=None, RI=None)
try:
prev.update(CTS=ser.cts, DSR=ser.dsr, DCD=ser.cd)
try:
prev["RI"] = ser.ri
except Exception:
pass
except Exception as exc:
logger.ctrl(f"Init error: {exc}")
return
logger.ctrl(
f"Initial: CTS={prev['CTS']} DSR={prev['DSR']} DCD={prev['DCD']} RI={prev['RI']}"
)
while not stop.is_set():
try:
cur = dict(CTS=ser.cts, DSR=ser.dsr, DCD=ser.cd, RI=None)
try:
cur["RI"] = ser.ri
except Exception:
pass
for name, val in cur.items():
if val != prev[name]:
logger.ctrl(f"{name}={val}")
prev[name] = val
except serial.SerialException as exc:
logger.ctrl(f"Poll error: {exc}")
break
stop.wait(interval)
# ── Serial open ───────────────────────────────────────────────────────────────
_PARITY = {
"N": serial.PARITY_NONE, "E": serial.PARITY_EVEN, "O": serial.PARITY_ODD,
"M": serial.PARITY_MARK, "S": serial.PARITY_SPACE,
}
_STOPBITS = {
1: serial.STOPBITS_ONE, 1.5: serial.STOPBITS_ONE_POINT_FIVE, 2: serial.STOPBITS_TWO,
}
def _open_serial(args: argparse.Namespace, logger: Logger) -> serial.Serial | None:
for attempt in range(1, args.open_retries + 2):
logger.info(
f"Opening {args.port} @ {args.baud},{args.bytesize}{args.parity}{args.stopbits} "
f"rtscts={args.rtscts} xonxoff={args.xonxoff} dsrdtr={args.dsrdtr} "
f"(attempt {attempt})"
)
try:
ser = serial.Serial(
port=args.port,
baudrate=args.baud,
bytesize=args.bytesize,
parity=_PARITY[args.parity],
stopbits=_STOPBITS[args.stopbits],
timeout=args.timeout,
xonxoff=args.xonxoff,
rtscts=args.rtscts,
dsrdtr=args.dsrdtr,
write_timeout=0,
)
try:
ser.dtr = args.dtr == "on"  # property form; setDTR()/setRTS() are deprecated in pyserial 3.x
ser.rts = args.rts == "on"
logger.ctrl(f"Set DTR={args.dtr} RTS={args.rts}")
except Exception as exc:
logger.ctrl(f"DTR/RTS set failed: {exc}")
if args.send_break > 0:
try:
ser.break_condition = True
time.sleep(args.send_break / 1000.0)
ser.break_condition = False
logger.ctrl(f"BREAK held {args.send_break} ms")
except Exception as exc:
logger.ctrl(f"BREAK failed: {exc}")
return ser
except serial.SerialException as exc:
logger.info(f"Open failed: {exc}")
if attempt <= args.open_retries:
time.sleep(args.open_retry_delay)
return None
# ── Port picker ───────────────────────────────────────────────────────────────
def _list_ports() -> list:
ports = list(list_ports.comports())
if not ports:
print("No serial ports found.")
return []
print("Available serial ports:")
for i, p in enumerate(ports, 1):
print(f" {i:2d}) {p.device:<12} {p.description or ''}")
return ports
def _pick_port() -> str:
ports = _list_ports()
if not ports:
sys.exit(1)
if len(ports) == 1:
print(f"Auto-selecting: {ports[0].device}")
return ports[0].device
while True:
sel = input("Select port (number or name, e.g. COM3): ").strip()
if sel.isdigit() and 1 <= int(sel) <= len(ports):
return ports[int(sel) - 1].device
for p in ports:
if p.device.upper() == sel.upper():
return p.device
print("Not recognised. Enter list number or exact port name.")
# ── Main loop ─────────────────────────────────────────────────────────────────
def main() -> None:
ap = argparse.ArgumentParser(
description="Monitor Instantel Series-3 serial traffic with S3 frame parsing."
)
ap.add_argument("--port", "-p",
help="COM port (e.g. COM3). Omit to be prompted.")
ap.add_argument("--baud", "-b", type=int, default=38400)
ap.add_argument("--bytesize", type=int, choices=[5, 6, 7, 8], default=8)
ap.add_argument("--parity", choices=["N", "E", "O", "M", "S"], default="N")
ap.add_argument("--stopbits", type=float, choices=[1, 1.5, 2], default=1)
ap.add_argument("--rtscts", action="store_true")
ap.add_argument("--xonxoff", action="store_true")
ap.add_argument("--dsrdtr", action="store_true")
ap.add_argument("--dtr", choices=["on", "off"], default="on")
ap.add_argument("--rts", choices=["on", "off"], default="on")
ap.add_argument("--send-break", type=int, default=0,
help="Hold BREAK for N ms after open.")
ap.add_argument("--show", choices=["ascii", "hex", "both", "frames"],
default="frames",
help="'frames' (default) shows only parsed S3 frames. "
"'ascii'/'hex'/'both' also show raw bytes.")
ap.add_argument("--encoding", default="latin1")
ap.add_argument("--read-chunk", type=int, default=4096)
ap.add_argument("--timeout", type=float, default=0.05)
ap.add_argument("--poll-lines-interval", type=float, default=0.2)
ap.add_argument("--open-retries", type=int, default=0)
ap.add_argument("--open-retry-delay", type=float, default=0.8)
ap.add_argument("--ack-ok", action="store_true",
help="Auto-reply OK to AT* commands (except ATDT). "
"Useful for testing without a real modem.")
ap.add_argument("--list", action="store_true",
help="List available serial ports and exit.")
args = ap.parse_args()
if args.list:
_list_ports()
return
args.port = args.port or _pick_port()
# Build output paths
ts_str = datetime.now().strftime("%Y%m%d_%H%M%S")
out_dir = Path(__file__).parent / "captures" / f"serial_{ts_str}"
out_dir.mkdir(parents=True, exist_ok=True)
log_path = out_dir / f"session_{ts_str}.log"
jsonl_path = out_dir / f"session_{ts_str}.jsonl"
raw_path = out_dir / f"raw_s3_{ts_str}.bin"
logger = Logger(log_path, jsonl_path, raw_path)
logger.info(f"Output directory: {out_dir}")
logger.info(f"raw_s3 → {raw_path.name} (compatible with parse_capture.py)")
ser = _open_serial(args, logger)
if ser is None:
logger.info("Could not open serial port. Exiting.")
logger.close()
sys.exit(1)
s3_parser = S3FrameParser()
rx_buf = bytearray()
stop_evt = threading.Event()
ctrl_thread = threading.Thread(
target=_monitor_control_lines,
args=(ser, logger, stop_evt, args.poll_lines_interval),
daemon=True,
)
ctrl_thread.start()
logger.info("Monitoring started. Waiting for call-home. Press Ctrl+C to stop.")
try:
while True:
try:
data = ser.read(args.read_chunk)
except serial.SerialException as exc:
logger.info(f"Read error: {exc}")
break
if not data:
continue
# 1. Save raw bytes
logger.write_raw(data)
# 2. Optional raw display
if args.show in ("ascii", "both"):
txt = _printable(data)
for line in txt.splitlines():
logger.data_ascii(line)
if args.show in ("hex", "both"):
logger.data_hex(_hexdump(data))
# 3. Parse S3 frames
for byte in data:
result = s3_parser.feed(bytes([byte]))
if result:
frames = result if isinstance(result, list) else [result]
for f in frames:
logger.frame(f)
            # 4. AT command handling for --ack-ok
            if args.ack_ok:
                rx_buf.extend(data)
                while True:
                    # Split at whichever terminator (CR or LF) comes first, so a
                    # buffer like b"AT\nATZ\r" yields two lines rather than one.
                    hits = [i for i in (rx_buf.find(b"\r"), rx_buf.find(b"\n")) if i != -1]
                    if not hits:
                        break
                    idx = min(hits)
                    line_bytes = bytes(rx_buf[:idx])
                    del rx_buf[:idx + 1]
                    line_str = line_bytes.decode("latin1", errors="ignore").strip().upper()
                    if line_str.startswith("AT") and not line_str.startswith("ATDT"):
                        try:
                            ser.write(b"\r\nOK\r\n")
                            ser.flush()
                            logger.info(f"AT ack: {line_str!r} → OK")
                        except Exception as exc:
                            logger.info(f"AT ack write failed: {exc}")
except KeyboardInterrupt:
logger.info("Ctrl+C — stopping.")
finally:
stop_evt.set()
try:
ser.close()
except Exception:
pass
ctrl_thread.join(timeout=1.0)
logger.info(f"Capture saved to: {out_dir}")
logger.close()
if __name__ == "__main__":
main()
+185
@@ -0,0 +1,185 @@
# Histogram body codec — FULLY DECODED (2026-05-20)
Clean working status doc for the MiniMate Plus histogram-mode event
body codec. Companion to `waveform_codec_re_status.md`. The deep
historical record (with retractions and dated analyses) lives in
`docs/instantel_protocol_reference.md §7.6.2`; the authoritative
implementation lives in `minimateplus/histogram_codec.py`.
## TL;DR
**The codec is fully decoded.** Every field of every block in the
in-repo histogram fixture corpus decodes byte-exact against BW's
ASCII export.
26 regression tests pass against ~3,500 blocks across 5 in-repo
fixtures, plus a synthetic regression block taken from a real
BE9558 prod event to lock in the uint8-peak interpretation.
**Important correction (2026-05-21):** the per-channel peak count
is `uint8` at byte[6]/[10]/[14]/[18], NOT `uint16 LE` at byte[6:8]
etc. The N844 fixture corpus the original RE was done against has
zero values in bytes [7]/[11]/[15]/[19] for every block, so the
two interpretations happened to be equivalent there. Cross-correlating
non-N844 events (BE9558 Tran-drift, BE18003 Histogram+Continuous)
against BW's per-interval ASCII export (4 channels × ~1400 blocks
per event × multiple events) is 100% byte-exact only when the peak
is read as uint8. Reading it as uint16 LE produced peaks up to 268
in/s per channel and 35× inflated PVS sums when first deployed to
prod (rolled back, root-caused, and fixed in commit 7183b95+1).
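
To make the failure mode concrete, here is a minimal sketch of the two readings side by side. The two byte values are invented for illustration (they are not taken from any fixture); the 0.005 in/s per count factor is the Normal-range scale used throughout this doc.

```python
import struct

# Hypothetical bytes [6:8] of one block: a peak byte followed by a
# non-zero annotation byte. Values are made up for illustration only.
pair = bytes([0x0C, 0x01])

peak_u8 = pair[0]                         # uint8 at byte[6] (current reading)
peak_u16 = struct.unpack("<H", pair)[0]   # uint16 LE at byte[6:8] (old reading)

IN_S_PER_COUNT = 0.005  # Normal-range scale, in/s per count

# The annotation byte lands in the high byte of the uint16 and inflates it.
print("uint8 :", peak_u8, "counts ->", peak_u8 * IN_S_PER_COUNT, "in/s")
print("uint16:", peak_u16, "counts ->", peak_u16 * IN_S_PER_COUNT, "in/s")
```

On a corpus where the annotation byte is always zero (as with N844) both readings agree, which is why the difference only showed up on other units.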
## Body format
```
body = [stream of 32-byte data blocks] + [small trailing remnant]
```
Each block represents one histogram interval. Block layout:
```
[0] 0x00 always-zero tag
[1] segment_id (uint8) 0x00..0x03 — 256 blocks per segment
[2:4] block_ctr (uint16 LE) resets each segment (0x0100, 0x0101, …)
[4:6] 0x000a (uint16 LE) constant marker (= 10)
[6] T_peak_count uint8 Tran peak (count × 0.005 → in/s at Normal,
max 1.275 in/s — fits in uint8)
[7] T_annotation uint8 empirically non-zero on intervals with sub-Hz
or unmeasurable freq; meaning not fully RE'd
[8:10] T_halfperiod uint16 LE Tran half-period in samples
(freq_Hz = 512 / halfp; ≤ 5 means ">100 Hz")
[10] V_peak_count uint8 Vert peak
[11] V_annotation uint8
[12:14] V_halfperiod uint16 LE Vert freq half-period
[14] L_peak_count uint8 Long peak
[15] L_annotation uint8
[16:18] L_halfperiod uint16 LE Long freq half-period
[18] M_peak_count uint8 MicL peak count
(dB via waveform_codec.mic_count_to_db)
[19] M_annotation uint8
[20:22] M_halfperiod uint16 LE MicL freq half-period
[22:24] 0x00 0x00 constant
[24:28] 4-byte variable purpose unknown — possibly CRC,
timestamp delta, or psi(L) numeric;
not needed for waveform reconstruction
[28:32] 0x1e 0x0a 0x00 0x00 constant block-end signature
```
Reliable block-identification anchor:
```python
block[22:24] == b"\x00\x00" and block[28:32] == b"\x1e\x0a\x00\x00"
```
(The `1e 0a 00 00` constant tail is the most distinctive signature.)
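
The layout and anchor above can be sketched as a small block walker. This is an illustrative sketch, not the code in `minimateplus/histogram_codec.py`: the function names and the returned dict shape are assumptions, and it assumes blocks start at body offset 0 and are contiguous, with the trailing remnant shorter than one block.

```python
import struct

BLOCK_SIZE = 32
TAIL = b"\x1e\x0a\x00\x00"  # constant block-end signature

# Byte offset of each channel's 4-byte group: peak, annotation, half-period.
CHANNEL_OFFSETS = (("Tran", 6), ("Vert", 10), ("Long", 14), ("MicL", 18))


def looks_like_block(block: bytes) -> bool:
    """Apply the block-identification anchor from the layout."""
    return (
        len(block) == BLOCK_SIZE
        and block[22:24] == b"\x00\x00"
        and block[28:32] == TAIL
    )


def parse_block(block: bytes) -> dict:
    """Split one 32-byte block into its documented fields (hypothetical shape)."""
    block_ctr, marker = struct.unpack_from("<HH", block, 2)
    channels = {}
    for name, off in CHANNEL_OFFSETS:
        peak, annotation, halfp = struct.unpack_from("<BBH", block, off)
        channels[name] = {
            "peak_count": peak,
            "annotation": annotation,
            "halfperiod": halfp,
        }
    return {
        "segment_id": block[1],
        "block_ctr": block_ctr,
        "marker": marker,
        "channels": channels,
    }


def iter_blocks(body: bytes):
    """Yield parsed blocks; anything that fails the anchor is skipped."""
    for off in range(0, len(body) - BLOCK_SIZE + 1, BLOCK_SIZE):
        block = body[off:off + BLOCK_SIZE]
        if looks_like_block(block):
            yield parse_block(block)
```

Reading each channel as `<BBH` keeps the peak as a single byte, which matches the uint8 correction in the TL;DR.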
## Per-channel encoding
| Channel | Peak encoding | Frequency encoding |
|---|---|---|
| Tran | count × 0.005 = in/s at Normal range | `freq_Hz = 512 / halfperiod` |
| Vert | same | same |
| Long | same | same |
| MicL | count → dB via `mic_count_to_db(count)` (same formula as waveform codec) | same |
**`>100 Hz` sentinel**: when halfperiod ≤ 5 (giving ≥100 Hz from the
512/halfp formula), BW displays `>100 Hz`. Codec's `half_period_to_hz`
returns `None` in this range.
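
A minimal sketch of that conversion, assuming nothing beyond the formula and sentinel rule stated above. The function name is hypothetical and the real `half_period_to_hz` may handle edge cases differently:

```python
from typing import Optional


def half_period_to_hz_sketch(halfp: int) -> Optional[float]:
    """Return 512 / halfp in Hz, or None for the '>100 Hz' sentinel range."""
    if halfp <= 5:
        # BW displays '>100 Hz' here; this also covers halfp == 0.
        return None
    return 512 / halfp


for halfp in (5, 9, 24):
    print(halfp, half_period_to_hz_sketch(halfp))
```

Callers that need BW-style display values would round the result to the nearest whole Hz.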
## Verified facts (cross-checked against fixture corpus)
Example: N844L6Z8.ZR0H block 130 → all 8 decoded fields byte-exact:
```
binary samples [10, 6, 24, 4, 18, 5, 21, 5, 9]
TXT row [0.030, 21, 0.020, 28, 0.025, 24, 0.040, 0.000, 95.92, 57]
slot[0] = 10 marker
slot[1] = 6 × 0.005 = 0.030 in/s ✓ T_peak
slot[2] = 24 → 512/24 = 21.3 → 21 Hz ✓ T_freq
slot[3] = 4 × 0.005 = 0.020 in/s ✓ V_peak
slot[4] = 18 → 512/18 = 28.4 → 28 Hz ✓ V_freq
slot[5] = 5 × 0.005 = 0.025 in/s ✓ L_peak
slot[6] = 21 → 512/21 = 24.4 → 24 Hz ✓ L_freq
slot[7] = 5 → 81.94 + 20·log10(5) = 95.92 dB ✓ M_peak
slot[8] = 9 → 512/9 = 56.9 → 57 Hz ✓ M_freq
```
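
The same arithmetic can be replayed in a few lines. This is a sketch for checking the worked example only: the helper names are made up here, and the mic formula is the one written in the example (`81.94 + 20·log10(count)`), which assumes a non-zero count.

```python
import math

# The nine raw samples of N844L6Z8.ZR0H block 130, as listed above.
samples = [10, 6, 24, 4, 18, 5, 21, 5, 9]
marker, t_pk, t_hp, v_pk, v_hp, l_pk, l_hp, m_pk, m_hp = samples


def geo_in_s(count: int) -> float:
    """Geo peak count to in/s at Normal range, rounded to TXT precision."""
    return round(count * 0.005, 3)


def mic_db(count: int) -> float:
    """Mic count to dB per the worked example; requires count > 0."""
    return round(81.94 + 20 * math.log10(count), 2)


def freq_hz(halfp: int) -> int:
    """Half-period to whole Hz, as BW displays it."""
    return round(512 / halfp)


decoded = [
    geo_in_s(t_pk), freq_hz(t_hp),
    geo_in_s(v_pk), freq_hz(v_hp),
    geo_in_s(l_pk), freq_hz(l_hp),
    mic_db(m_pk), freq_hz(m_hp),
]
print(decoded)
```

The eight values line up with the TXT row once its PVS and psi(L) columns (0.040 and 0.000) are set aside, since neither is stored in the nine samples.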
## Verified test coverage
`tests/test_histogram_codec.py` (24 tests):
- Block walking: yields one record per `.TXT` interval ± 1 (off-by-one
at the tail when recording was stopped mid-write). Segment-ID
groups of 256 blocks confirmed.
- Geo peaks: every block of N844L20G, N844L6Z8, N844L6XE, N844L23B
matches `.TXT` within the 0.0005 in/s quantization step.
- Geo freqs: every block of N844L6Z8 and N844L6XE matches `.TXT`
within 1 Hz (BW display rounds). `>100 Hz` sentinel handled correctly.
- Mic dB: every block of N844L6XE, N844L23B, N844L6Z8 matches `.TXT`
within 0.1 dB (BW display precision).
- Mic freq: matches `.TXT` within 1 Hz across active blocks.
## What's NOT yet decoded
- **Annotation bytes (`block[7]/[11]/[15]/[19]`)**. Empirically
non-zero on intervals where the per-channel ZC frequency comes
out as `N/A` or sub-Hz (`<1.0`, `1.X`). Hypothesis tested in the
RE session: byte != 0 ↔ sub-Hz freq. Only ~50% correlation
across the K558 corpus, so the relationship is more complex.
Possibilities: time-of-peak-within-interval, halfp extension for
very-long-period signals, or a debug/diagnostic field the firmware
writes opportunistically. Doesn't affect peak amplitudes or
waveform reconstruction. Captured as `record["annotations"]` for
future RE.
- **4-byte variable metadata field (bytes 24:28)**. Not needed for
waveform reconstruction. Speculation: per-block CRC, sub-second
timestamp offset, or a Mic psi(L) count not in the 9 samples.
Punt until something needs it.
- **Geo PVS (TXT col 7, e.g. "0.040 in/s")**. Not stored in the
block; can be approximated as `sqrt(T_peak² + V_peak² + L_peak²)`
but BW's value sometimes differs slightly (probably computed from
waveform-instant samples, not from per-channel peaks). Punt — the
`.h5` consumers don't need PVS as a sample channel.
- **Mic psi(L) value (TXT col 8)**. TXT shows it as a small psi value
derived from the dB measurement. Not in the 9 samples. Could be
derived from `M_peak_count` via the inverse of the dB formula plus
a psi calibration constant. Defer.
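
For reference, the PVS approximation mentioned in the list above looks like this. It is a sketch of the approximation only (the function name is hypothetical), not a reproduction of how BW computes the column:

```python
import math


def approx_pvs_in_s(t_peak: float, v_peak: float, l_peak: float) -> float:
    """Vector sum of the three per-channel peaks, in in/s."""
    return math.sqrt(t_peak ** 2 + v_peak ** 2 + l_peak ** 2)


# Per-channel peaks of N844L6Z8.ZR0H block 130 from the worked example.
pvs = approx_pvs_in_s(0.030, 0.020, 0.025)
print(round(pvs, 3))
```

For that block the approximation comes out near 0.044 in/s while the TXT shows 0.040, which illustrates why the per-channel peaks give an upper bound rather than BW's exact figure: the three peaks need not occur at the same instant.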
## Output shape
`decode_histogram_body` returns the standard 4-channel dict that
mirrors `waveform_codec.decode_waveform_v2`'s output:
```python
{
"Tran": [peak_count_per_interval, ...], # 16-count units (LSB = 0.005 in/s)
"Vert": [..., ...],
"Long": [..., ...],
"MicL": [..., ...], # raw ADC counts
}
```
Run through `waveform_codec.decoded_to_adc_counts` to get 1-count ADC
units (geo ×16, mic passthrough) for the standard `.h5` writer.
For the full per-interval record with frequencies + metadata, use
`decode_histogram_body_full()`.
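
As a rough illustration of the scaling step described above (geo ×16, mic passthrough), assuming the dict shape shown in the output example. The function below is a hypothetical stand-in, not the actual `waveform_codec.decoded_to_adc_counts`:

```python
def to_adc_counts_sketch(decoded: dict) -> dict:
    """Scale geo channels by 16; leave MicL untouched (already raw counts)."""
    out = {}
    for channel, values in decoded.items():
        scale = 1 if channel == "MicL" else 16
        out[channel] = [v * scale for v in values]
    return out


# Two made-up intervals, just to show the shape going in and out.
example = {"Tran": [6, 2], "Vert": [4, 1], "Long": [5, 0], "MicL": [5, 3]}
print(to_adc_counts_sketch(example))
```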
## Where it's wired
- `minimateplus/event_file_io.py:read_blastware_file()` — first tries
the waveform codec, falls back to the histogram codec when the
waveform preamble isn't present. Same output shape, same
downstream pipeline.
- `scripts/backfill_sidecars.py` — the `has_samples` short-circuit
added during the histogram-codec-pending era still serves as a
defensive guard against truly undecodable files, but no longer
fires for valid histograms.
## Companion reference
- `docs/waveform_codec_re_status.md` — sibling status doc for the
much-more-complex waveform-mode codec.
- `docs/instantel_protocol_reference.md §7.6.2` — historical
protocol-reference entry. Structural framing matches what we
found; per-sample semantics were less documented than the `✅
CONFIRMED` badge suggested. This doc supersedes §7.6.2 where they
conflict on confidence level.
+341
@@ -0,0 +1,341 @@
# IDF Protocol Reference — Thor / Micromate Series IV
Starting-point reference for reverse-engineering Instantel's Micromate
Series IV event-file format. Sibling to
[instantel_protocol_reference.md](instantel_protocol_reference.md) (the
Series III "Rosetta Stone") — this doc holds what we know so far and
the open questions still to crack.
**Status (2026-05-28):** ASCII text sidecar fully decoded (1,014
sample files round-trip). **Thor IDFW** binary now decodes via
`micromate.idf_file.read_idf_file()` — reuses the BW segment-rotated
block codec verbatim at fixed body offset `0x0f1f`; metadata (serial,
timestamp, sample_rate, record_time, calibration_date) extracted from
the binary header. Sample fidelity is 87–99% byte-exact on quiet
events; loud events hit the BW codec's known walker-stops-early
limitation. A residual ~3% drift remains on per-sample deltas (likely a
Thor-specific 12-bit delta refinement not yet modelled).
**Thor IDFH histograms also decoded.** Body has one or more segments;
each 12-byte segment header `[length_be 2B][0a 00 00 00][00 NN][05 3f]`
introduces `N = (length - 10) // 72` interval records of 72 bytes
each. Each interval = 4 × 16-byte per-channel records:
`[int16 min][int16 max][int16 ??][uint16 halfp][2B 00][uint16 ??][2B 00][uint16 ??]`.
Geo peak `= max(|min|, |max|) / 32768 × 10` in/s (matches the sidecar
to within ~1.8%); freq `= 512 / halfp` Hz (None for halfp ≤ 5 → ">100"
sentinel). Corpus: **all 859 Thor IDFH files decode, 181,071
intervals**. Wired through `read_idf_file()` →
`save_imported_idf()` → sidecar's `extensions.idf_intervals`.
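The interval arithmetic above can be sketched as follows. This is an illustration under stated assumptions, not the `read_idf_file()` implementation: the fields are assumed big-endian (only the segment length is documented as BE), the channel order inside an interval is assumed to be Tran, Vert, Long, MicL, and every function name here is hypothetical.

```python
import struct
from typing import Optional

RECORD_SIZE = 16    # one per-channel record
INTERVAL_SIZE = 72  # 4 channel records + 8-byte tail (tail not yet decoded)
CHANNEL_ORDER = ("Tran", "Vert", "Long", "MicL")  # ASSUMPTION: order not stated


def intervals_in_segment(length: int) -> int:
    """Interval count from the segment header's length field."""
    return (length - 10) // INTERVAL_SIZE


def decode_channel_record(rec: bytes) -> tuple:
    """Return (min, max, halfp); the remaining fields are still unknown."""
    # ASSUMPTION: big-endian int16 min, int16 max, int16 unknown, uint16 halfp.
    vmin, vmax, _field4, halfp = struct.unpack(">hhhH", rec[:8])
    return vmin, vmax, halfp


def geo_peak_ips(vmin: int, vmax: int) -> float:
    """Geo peak in in/s: full-scale int16 maps to 10 in/s."""
    return max(abs(vmin), abs(vmax)) / 32768 * 10


def freq_hz(halfp: int) -> Optional[float]:
    """Frequency in Hz; None stands for the '>100' sentinel."""
    return None if halfp <= 5 else 512 / halfp


def split_interval(interval: bytes) -> dict:
    """Slice one 72-byte interval into its four 16-byte channel records."""
    if len(interval) != INTERVAL_SIZE:
        raise ValueError("interval must be 72 bytes")
    return {
        name: interval[i * RECORD_SIZE:(i + 1) * RECORD_SIZE]
        for i, name in enumerate(CHANNEL_ORDER)
    }


# Synthetic record: min=-100, max=164, halfp=32, everything else zero.
rec = struct.pack(">hhhHHHHH", -100, 164, 0, 32, 0, 0, 0, 0)
vmin, vmax, halfp = decode_channel_record(rec)
```

The geo formula is applied only to geophone channels here, since the mic dB(L) conversion constant is still listed as open.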
**Note on the BE9439 outliers in the example corpus:** Two files
(`BE9439_20200713131747.IDFW` and `BE9439_20200713124251.IDFH`) are
**Series III Blastware** binaries, not Thor. Provenance: TMI tried
to use Thor to manage auto-call-homes for Series III units; the
experiment didn't work out, but it did leave a few BW event files
in Thor's per-serial directory structure with `.IDFW`/`.IDFH`
extensions — Thor's forwarder applied its own naming convention to
the BW bodies it was relaying. Their header `10 00 01 80 00 00
Instantel STRT ff fe <end_key> <start_key>` is the BW SUB 5A STRT
record, not a Thor body preamble. The reader detects them by
signature and raises `NotImplementedError` pointing callers at
`read_blastware_file()`, which extracts BW-format peaks from them.
**Still NYI for Thor IDFH:** per-channel `int16 field4` (possibly
time-of-peak); the two uint16 fields (probably PVS contributions);
8-byte interval tail (PVS data); mic dB(L) exact conversion constant.
### Codec breakthroughs (2026-05-28)
- **Body offset is a fixed `0x0f1f`** across 151/154 corpus IDFW
files. Preceded by a 4-byte record-type marker (`46 00 00 00`)
+ magic preamble `00 02 00 [Tran[0] BE] [Tran[1] BE]`.
- **Sample stream is BW's segment-rotated block codec verbatim.**
Thor reuses `10 NN` (nibble), `20 NN` (int8), `00 NN` (RLE),
`30 NN` (packed12), `40 02` (segment header) tags with the same
semantics. Channel rotation Tran→Vert→Long→MicL.
- **Geo LSB = 0.0003 in/s** (not BW's 0.005), because Thor's 16-bit
ADC range maps to 10 in/s without the 16-count BW quantization step.
- **Mic ≈ 2.14×10⁻⁶ psi/count** (rough scale; refine after channel
block calibration constants are decoded).
- **BW compliance anchor `\xbe\x80\x00\x00\x00\x00` reappears at
IDFW offset 0x952**, with sample_rate at anchor-6 (uint16 BE) and
record_time at anchor+6 (float32 BE), same layout as BW.
- **Event timestamp at offset 0x97A** — 8 bytes `[day][month]
[year_be][unk][hour][min][sec]`. Stop-time mirrors at 0x982.
- **Serial as null-terminated ASCII at 0x14E**.
- **Calibration date** at 0x194-0x197 (day, month, year_be).
- Per-sample residual drift of ~3% suggests Thor encodes int8/nibble
deltas with an extra refinement bit that BW doesn't carry —
unsolved; errors resync within a few samples so cumulative impact
is small.
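The fixed header offsets listed above lend themselves to a short extraction sketch. The offsets and field order come from the bullets; the function names are hypothetical, the year is read big-endian per the `year_be` label, and the buffer below is synthetic rather than a real corpus file.

```python
import struct
from datetime import date, datetime

SERIAL_OFFSET = 0x14E     # null-terminated ASCII
CAL_DATE_OFFSET = 0x194   # day, month, year (uint16 BE)
TIMESTAMP_OFFSET = 0x97A  # day, month, year (uint16 BE), unknown, hour, min, sec


def read_serial(buf: bytes) -> str:
    end = buf.index(b"\x00", SERIAL_OFFSET)
    return bytes(buf[SERIAL_OFFSET:end]).decode("ascii")


def read_timestamp(buf: bytes, offset: int = TIMESTAMP_OFFSET) -> datetime:
    # The fifth byte is the undocumented "unk" field and is ignored here.
    day, month, year, _unk, hour, minute, sec = struct.unpack_from(
        ">BBHBBBB", buf, offset
    )
    return datetime(year, month, day, hour, minute, sec)


def read_calibration_date(buf: bytes) -> date:
    day, month, year = struct.unpack_from(">BBH", buf, CAL_DATE_OFFSET)
    return date(year, month, day)


# Synthetic header with made-up values, just large enough for the offsets.
header = bytearray(0x990)
header[SERIAL_OFFSET:SERIAL_OFFSET + 7] = b"UM11719"
struct.pack_into(">BBH", header, CAL_DATE_OFFSET, 15, 3, 2023)
struct.pack_into(">BBHBBBB", header, TIMESTAMP_OFFSET, 19, 12, 2023, 0, 16, 27, 23)
```

Passing `offset=0x982` to `read_timestamp` would read the stop-time mirror, assuming it shares the same 8-byte layout.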
---
## File model
### Filename convention
```
<SERIAL>_<YYYYMMDDHHMMSS>.<KIND>
```
- **SERIAL** — literal device serial, two-letter prefix + numeric
suffix. Examples seen: `UM11719`, `UM13981`, `UM20147`, `BE9439`.
Unlike Series III BW filenames (`M529LK44.AB0`, base-36 stem),
Series IV filenames carry the serial in plain text.
- **YYYYMMDDHHMMSS** — 14-char ASCII timestamp in **device local
time** (no timezone marker).
- **KIND** — `IDFH` for histograms, `IDFW` for waveforms.
The `.IDFH.txt` / `.IDFW.txt` ASCII sidecar lives in a `TXT/`
**subfolder** of the unit's directory, not alongside the binary.
This pairing convention is encoded in
`event_forwarder.idf_report_path()`.
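A hedged sketch of the naming and pairing rules above. `parse_idf_name` and `sidecar_path` are hypothetical helpers written for illustration (the project's own pairing logic lives in `event_forwarder.idf_report_path()`, whose signature is not shown here), and the regex assumes the serial is exactly two uppercase letters followed by digits.

```python
import re
from datetime import datetime
from pathlib import PureWindowsPath

# ASSUMPTION: serial = two uppercase letters + digits, as in the examples seen.
_NAME_RE = re.compile(
    r"^(?P<serial>[A-Z]{2}\d+)_(?P<stamp>\d{14})\.(?P<kind>IDFH|IDFW)$"
)


def parse_idf_name(name: str) -> tuple:
    """Split '<SERIAL>_<YYYYMMDDHHMMSS>.<KIND>' into its three parts."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"not an IDF event filename: {name!r}")
    # The timestamp is device local time, so the result is a naive datetime.
    stamp = datetime.strptime(match["stamp"], "%Y%m%d%H%M%S")
    return match["serial"], stamp, match["kind"]


def sidecar_path(binary: PureWindowsPath) -> PureWindowsPath:
    """ASCII sidecar lives in a TXT subfolder next to the binary."""
    return binary.parent / "TXT" / (binary.name + ".txt")


serial, stamp, kind = parse_idf_name("UM11719_20231219162723.IDFW")
```

Because the pattern is anchored at the end, a name such as `UM12345_20260520100000.IDFW.CDB` is rejected rather than mistaken for an event file.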
### Directory layout
```
C:\THORDATA\
└── <Project>\
└── <UM####>\ ← unit serial dir
├── UM12345_20260520100000.MLG ← monitor log (not events)
├── UM12345_20260520100000.IDFH ← histogram event (binary)
├── UM12345_20260520100000.IDFW ← waveform event (binary)
├── UM12345_20260520100000.IDFW.CDB ← cache-DB variant (skip)
├── TXT\
│ ├── UM12345_20260520100000.IDFH.txt ← histogram ASCII sidecar
│ └── UM12345_20260520100000.IDFW.txt ← waveform ASCII sidecar
├── CSV\, HTML\, PDF\, XML\ ← operator-facing derived exports
└── ...
```
The `.IDFW.CDB` files share the binary's basename but appear to be a
separate cache/database variant. Their first 8 bytes match the
**old**-firmware Thor signature (see below) regardless of which
signature the paired `.IDFW` uses. Purpose unknown; sizes vary
wildly (observed 123 B → 40,491 B). Thor-watcher's forwarder
deliberately skips them.
### Sample corpus
The `thor-watcher/example-data/THORDATA_example/` tree carries
**1,014 paired .IDFW / .IDFH + .txt files** spanning 2020-2023
across nine units (UM11719, UM13981, UM20147, …, plus BE9439 from
2020). This is the reverse-engineering ground truth.
---
## ASCII sidecar (`.IDFW.txt` / `.IDFH.txt`) — fully decoded
Shape: plain text, one `"Key : Value"` line per metadata field,
followed for waveforms by a tab-separated sample table headed by
the literal line `Waveform Data Channels`. Parsed by
[`micromate/idf_ascii_report.py`](../micromate/idf_ascii_report.py).
See [`micromate/models.py`](../micromate/models.py) for the typed
`IdfReport` shape.
### Notable conventions
- **Units are native to Thor** — geophone in **in/s**, microphone in
**dB(L)** (not psi like Series III BW reports), frequency in Hz,
acceleration in g, displacement in in.
- **Below-threshold readings** appear as the literal string
`<0.005 in/s` (155 occurrences in the sample corpus) — the parser
strips the `<` and treats the numeric remainder as the value.
- **Out-of-range / not-measured** values appear as `N/A` — parser
drops the field rather than letting the string leak into a numeric
column.
- **Firmware string** observed: `Micromate ISEE 11.0AK`.
- **TitleString1..4** are operator-defined free-text slots; Thor's
default labels map them to Location / Client / Company / Notes,
which the parser surfaces as `project` / `client` / `operator` /
`notes`.
- **Histogram sidecars** use `HistogramStartDate` / `HistogramStartTime`
in place of waveform's `EventDate` / `EventTime`. Parser falls
through to either.
- **Histogram tabular block** lacks the `Waveform Data Channels`
marker; instead it's a multi-line column header followed by
per-interval rows (`<date> <time> <tran-ppv> <freq> ...`). Parser
silently ignores lines after the metadata block since they lack a
colon-separated `key : value` shape (the timestamps DO contain
colons but produce garbage keys that don't collide with any
recognised field).
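The conventions above can be sketched in a few lines. This is an illustration only, not the code in `micromate/idf_ascii_report.py`; the function names and the `TranPPV` / `VertFreq` keys in the sample are made up for the example.

```python
def parse_metadata(lines: list) -> dict:
    """Collect 'Key : Value' lines, dropping N/A fields and colon-less lines."""
    fields = {}
    for line in lines:
        # Split on the first colon only, so '16:27:23' survives as a value.
        key, sep, value = line.partition(":")
        if not sep:
            continue  # no colon at all, e.g. table rows without timestamps
        value = value.strip()
        if value == "N/A":
            continue  # not-measured: drop the field instead of keeping a string
        fields[key.strip()] = value
    return fields


def parse_reading(value: str) -> float:
    """Numeric part of a reading; a leading '<' (below threshold) is stripped."""
    return float(value.lstrip("<").split()[0])


sample = [
    "EventTime : 16:27:23",
    "TranPPV : <0.005 in/s",   # hypothetical key name
    "VertFreq : N/A",          # hypothetical key name
    "a line with no separator",
]
fields = parse_metadata(sample)
```

Numeric parsing is kept separate from metadata collection on purpose: applying it blindly would turn a free-text title such as "123 Main St" into a number.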
---
## Binary header signatures (observed)
Hex dump of the first 32 bytes across 1,014 sample files reveals
**two distinct file signatures**, both anchored by the literal
ASCII string `"\x00Instantel\x00"` at offset 6-16:
### Signature A — newer firmware (1,012 files, 99.8% of corpus)
```
00000000: 0012 0100 0000 496e 7374 616e 7465 6c00 ......Instantel.
00000010: 0000 a695 002e b500 4f70 6572 6174 6f72 ........Operator
^^^^^^^^^^^^^^^^
operator/title string starts at 0x18
```
Header bytes 0-5: `00 12 01 00 00 00`. Followed immediately by the
10-byte ASCII tag (`Instantel` plus its NUL terminator), then 8 unknown
bytes, then ASCII operator-supplied strings starting at 0x18
(Operator name, etc.) and on through the project / client /
title strings. No `STRT` record observed in this layout.
### Signature B — older firmware (2 files: BE9439 from 2020)
```
00000000: 1000 0180 0000 496e 7374 616e 7465 6c00 ......Instantel.
00000010: 072c 0012 0300 5354 5254 fffe 0111 2340 .,....STRT....#@
^^^^^^^^^ ^^^^^^^^^
STRT magic 4-byte end_key
00000020: 0111 0000 2e5f 00ac 4600 0000 0200 0000 ....._..F.......
^^^^^^^^^ ^^^
4-byte start_key 0x46 (BW WAVEHDR record-type marker)
```
Header bytes 0-5: `10 00 01 80 00 00`. The structure after the
`Instantel` magic is **byte-for-byte identical to a BW SUB 5A
probe-response STRT record** as documented in
[instantel_protocol_reference.md → "SUB 5A — STRT record encodes
end_offset"](instantel_protocol_reference.md). Specifically:
| Offset | Bytes | Meaning (per BW reference) |
|--------|---------------------|--------------------------------------|
| 0x16 | `53 54 52 54` | `STRT` magic |
| 0x1A | `ff fe` | STRT sentinel |
| 0x1C | `01 11 23 40` | `end_key` (4 bytes) |
| 0x20 | `01 11 00 00` | `start_key` (4 bytes) |
| 0x28 | `46` | `0x46` waveform-record type marker |
**Hypothesis:** Older Micromate firmware writes a wrapped BW-format
event into the `.IDFW` file — essentially the same on-disk shape as
a Series III device, with the new filename convention applied at
export time. Newer firmware (signature A) abandoned the
BW-compatible layout for an Instantel-specific format.
If that hypothesis holds, the 2 signature-B files can already be
parsed via `minimateplus/event_file_io.read_blastware_file()` — worth
testing. The 1,012 signature-A files are the real reverse-engineering
target.
### `.IDFW.CDB` cache files
Always carry signature B (`10 00 01 80 ...`), even when the paired
`.IDFW` carries signature A. Plausible explanation: the CDB is an
internal Thor cache-database export that retains the legacy BW-style
record layout regardless of the user-facing `.IDFW` format version.
Not currently consumed by the forwarder.
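A small sketch of telling the two signatures apart from the first 16 bytes, using only the byte values shown in the hex dumps above. `classify_header` and its return labels are hypothetical; this is not the detection code inside `read_idf_file()`.

```python
SIG_A = bytes.fromhex("001201000000")  # newer firmware, Thor-native layout
SIG_B = bytes.fromhex("100001800000")  # BW-style layout (BE9439 files, CDB caches)
MAGIC = b"Instantel\x00"               # occupies bytes 6..15 in both dumps


def classify_header(head: bytes) -> str:
    """Return 'signature_a', 'signature_b', or 'unknown' for a file prefix."""
    if head[6:16] != MAGIC:
        return "unknown"
    if head[:6] == SIG_A:
        return "signature_a"
    if head[:6] == SIG_B:
        return "signature_b"
    return "unknown"


# First 16 bytes copied from the two hex dumps in this section.
dump_a = bytes.fromhex("0012 0100 0000 496e 7374 616e 7465 6c00")
dump_b = bytes.fromhex("1000 0180 0000 496e 7374 616e 7465 6c00")
```

Since `.IDFW.CDB` files also carry signature B, a check like this cannot replace the filename filter that skips them.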
---
## File-size patterns (Signature A, the main target)
Survey of 1,012 signature-A files:
| Event type | Typical size | Source of variance |
|--------------|-------------------|----------------------------------------------|
| `.IDFW` 2-sec | 9,200-10,500 B | Operator-supplied strings (TitleString1..4) of varying length |
| `.IDFH` | 2,944-4,076 B | Histogram interval count (record duration / interval) |
**Naive arithmetic for 2-sec waveform:**
- 4 channels × 2 sec × 1024 sps = 8,192 samples
- At 2 bytes/sample (int16) = 16,384 sample bytes → file would be > 16 KB
- Observed: ~9-10 KB
- → samples are likely **1 byte each** (int8 quantised), **or** stored
with bit-packing / delta encoding, **or** only one channel's
full-rate samples are stored with the others reconstructed
arithmetically. Verifying this is the **first RE milestone**.
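The same arithmetic as a quick check. The 10,500-byte figure is the upper end of the observed `.IDFW` sizes in the table above; the per-sample budget is an upper bound because it ignores the header entirely.

```python
channels, seconds, sample_rate = 4, 2, 1024

samples = channels * seconds * sample_rate  # total samples in a 2-sec event
int16_bytes = samples * 2                   # size if every sample were int16

observed_max_bytes = 10_500                 # largest 2-sec .IDFW in the survey
budget = observed_max_bytes / samples       # bytes available per sample, at most

print(samples, int16_bytes, round(budget, 2))
```

Even the largest observed file leaves well under 2 bytes per sample, which is what rules out plain int16 storage.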
Project-string length variance (~1 KB across the corpus) is consistent
with the file carrying a single copy of each TitleString1..4 plus
operator + setup-name as null-padded ASCII regions.
---
## Open questions
The reverse-engineering targets, roughly in dependency order:
1. **Sample encoding (signature A)** — int8? int16 LE/BE? Bit-packed?
Delta-coded? Per-channel interleaved or sequential blocks?
2. **Header field layout (signature A)** — where do sample_rate,
record_time, channel count, and per-channel peaks live in the
binary? The ASCII sidecar gives the device-authoritative values,
so binary fields can be confirmed by diff.
3. **Operator-string offsets** — `Operator` at 0x18 is the first
visible string in signature-A files; the rest (project, client,
notes, setup) follow. Need to map exact offsets and null-padding
conventions.
4. **Signature-B → BW codec compatibility** — does
`minimateplus/event_file_io.read_blastware_file()` actually parse
the 2 BE9439 signature-B files as-is? If yes, the OLD-format
ingest is free.
5. **`.IDFW.CDB` purpose** — is it an internal Thor cache, a
ring-buffer dump, or something else? Worth a single small effort
to characterise so we know what we're skipping.
6. **Footer / checksum** — every BW event file has a footer; does
IDF? Where does the per-channel sample block end?
---
## Reverse-engineering playbook (when we start)
The Series III BW codec took ~2 months of MITM wire captures
because we didn't have ground-truth metadata. Thor's situation is
**substantially better**:
- **Ground truth is on disk.** Every binary in `example-data/`
has a paired `.IDFW.txt` carrying the full decoded sample table
(`Waveform Data Channels` block — see any sample file in
`thor-watcher/example-data/.../TXT/`). Aligning binary bytes
to the table's float-per-row values gives an immediate per-byte
hypothesis test.
- **Cross-event diffing.** 1,012 signature-A samples from 9 units
spanning 4 years means any field that varies between events is
immediately localisable. Fields that are constant across all
files (firmware ID, channel labels, format-version word) are also
immediately localisable by complementary search.
- **No protocol surface.** Files at rest, not a wire dialect. No
DLE stuffing, no inner-frame parsing, no probe/data two-step.
Suggested first session (2-4 hours): hand-decode `UM11719_20231219162723.IDFW`
(10,290 bytes) against its `TXT/UM11719_20231219162723.IDFW.txt`
sample table (the 2-sec waveform at 1024 sps × 4 channels = 8,192
sample rows). Find the first per-channel sample value (`0.0003` in
the Tran column at t=0) in the binary. Confirms sample encoding.
Everything else flows from there.
---
## Code seams ready to receive the codec
When the codec lands, it goes into
[`micromate/idf_file.py`](../micromate/idf_file.py) (currently a
stub raising `NotImplementedError`). Public API:
```python
from micromate import IdfEvent
from micromate.idf_file import read_idf_file
event: IdfEvent = read_idf_file(Path("UM11719_20231219163444.IDFW"))
# event.peaks.transverse_ips, event.timestamp, event.raw_samples, ...
```
The ingest pipeline (`WaveformStore.save_imported_idf`) currently
builds the `IdfEvent` from the `.txt` parser only. Once
`read_idf_file()` works, the binary becomes authoritative; the
`.txt` parser drops to fast-path metadata cross-check. Operators
who don't enable Thor's TXT exporter still get fully populated
events.
---
## See also
- [instantel_protocol_reference.md](instantel_protocol_reference.md) — Series III BW protocol reference (the Rosetta Stone). STRT record format, DLE framing, BW filename encoding.
- [`micromate/idf_ascii_report.py`](../micromate/idf_ascii_report.py) — `.txt` sidecar parser.
- [`micromate/models.py`](../micromate/models.py) — `IdfEvent`, `IdfReport` typed dataclasses.
- [`micromate/idf_file.py`](../micromate/idf_file.py) — placeholder for the binary codec.
- [`thor-watcher/example-data/THORDATA_example/`](../../thor-watcher/example-data/) — 1,014 paired binary + .txt files for codec validation.
@@ -0,0 +1,255 @@
# Runbook — Recovering a wedged unit stuck in a call-home loop
**Original incident:** BE9558H at `166.246.130.1:9034`, recovered 2026-05-17.
A field unit with a stuck-triggered geophone (or any hardware fault causing
constant event triggering) will record events back-to-back, and if Auto Call
Home is set to "After Event Recorded" the device will dial the office BW
ACH server in a tight loop. Combined with a Sierra Wireless modem in
bidirectional serial-TCP mode, this makes the unit effectively unreachable
from SFM — every TCP connection we open gets killed when the modem flips
from server-mode to client-mode to honor the device's next AT dial command.
This runbook describes how to break the loop and recover control.
---
## Symptoms
- Terra-View / SFM `/device/info` either hangs or fails on `count_events()`.
- `/device/monitor/status` and `/device/rescue` return 502 (protocol timeout
waiting for POLL response) or 503 (TCP connect refused).
- ACEmanager serial log shows repeating
`Connect to IP: <BW_IP> Port: <BW_PORT>` → `Shutdown TCP socket` cycles
every 30-60 seconds.
- Spam-mode endpoints (`/device/stop_monitoring_spam`) report many
`sent_ok` but the device's monitoring state never changes.
- `slow_drip` reports `[Errno 32] Broken pipe` after sending the preamble
but before completing the drip loop.
If you see *all* of these, the unit is in this exact failure mode.
---
## Quick reference — how to recover
You need **ACEmanager access** to the unit's modem.
### Step 1: stop the modem's mode-flipping
In ACEmanager → **Serial → Port Configuration**:
| Field | Set to |
|---|---|
| **Destination Address** | clear (blank) |
| **Destination Port** | `0` |
Click **Apply**. This removes the modem's auto-dial-out target. The device's
AT dial commands now error back at the modem instead of triggering a
mode-flip, so the modem stays in TCP-server mode permanently and our inbound
TCP sessions stay alive.
*(Optional belt-and-suspenders: also add the BW server's port to
**Security → Port Filtering - Outbound** as a blocked port, with
Outbound Port Filtering Mode = Blocked Ports.)*
### Step 2: stop monitoring on the device (slow drip)
From the SFM host:
```bash
/home/serversdown/seismo-relay/scripts/slow_drip.sh <DEVICE_IP> <PORT>
```
Defaults are 120s duration with a drip every 3s. Watch the response:
- `duration_s ≈ 120` and `drips_sent ≈ 40` → session held the full duration ✓
- `bytes_received > 0` → device is responding ✓ (this is the success signal)
If `duration_s` is small or the response carries `send_error: "Broken pipe"`, Step 1 didn't take
hold. Re-check ACEmanager; the modem may need a reboot after Apply.
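The pass/fail reading of the slow-drip response can be sketched as a small helper. The field names (`duration_s`, `bytes_received`, `send_error`) are the ones quoted above; the function itself and the 5-second tolerance are assumptions for illustration, and the exact JSON shape of the real response is not confirmed here.

```python
def drip_held(resp: dict, expected_s: float = 120.0, tolerance_s: float = 5.0) -> bool:
    """True when the session lasted the full duration and the device answered."""
    if resp.get("send_error"):
        return False  # e.g. "Broken pipe": the modem closed the session on us
    held = resp.get("duration_s", 0) >= expected_s - tolerance_s
    responded = resp.get("bytes_received", 0) > 0
    return held and responded


# Illustrative responses, shaped after the fields described in this step.
good = {"duration_s": 120.2, "drips_sent": 40, "bytes_received": 990}
bad = {"duration_s": 3.1, "drips_sent": 1, "send_error": "Broken pipe"}
```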
### Step 3: confirm monitoring stopped
```bash
curl 'http://localhost:8200/device/monitor/status?host=<DEVICE_IP>&tcp_port=<PORT>&force=true'
# expect: {"is_monitoring": false, ...}
```
### Step 4: disable ACH at the device level + erase corrupted events
Either fire the rescue endpoint:
```bash
/home/serversdown/seismo-relay/scripts/rescue_device.sh <DEVICE_IP> <PORT>
```
Or do the two steps manually:
```bash
# Disable ACH in the device's compliance config
curl -X POST 'http://localhost:8200/device/call_home?host=<DEVICE_IP>&tcp_port=<PORT>' \
-H 'Content-Type: application/json' \
-d '{"auto_call_home_enabled": false}'
# Erase corrupted event chain
curl -X POST 'http://localhost:8200/device/events/erase?host=<DEVICE_IP>&tcp_port=<PORT>'
```
You can also do this via the SFM standalone UI → **Call Home** tab → set
`Enable Auto Call Home` to `Disabled` → **Write to Device**.
### Step 5: restore modem config (housekeeping)
Once the device-side ACH is disabled, restore the modem's Destination
Address and Port to the original values (e.g. `50.197.32.92` / `12345`) in
ACEmanager. The modem will resume normal bidirectional behavior, but the
unit won't issue any dial commands until ACH is explicitly re-enabled on
the device.
### Step 6: do NOT re-enable ACH until the hardware fault is repaired
Do not re-enable ACH on this unit until the underlying hardware fault
is repaired. If you do, the call-home loop starts again immediately
and you'll be running this runbook a second time.
---
## Why this works — the failure mode explained
The Sierra Wireless RV50/RV55 serial port operates in one of two TCP modes
at any moment:
- **Server mode** — listens on `Device Port` (e.g. 9034), bridges inbound
TCP to the device's serial port. This is what we need to interact with
the device.
- **Client mode** — when the device sends an AT dial command on its serial
TX line, the modem opens an outbound TCP to `Destination Address:Port`
and bridges that to serial.
A serial port in this configuration is **bidirectional**: the modem flips
between server and client modes on demand. When the device's firmware is
healthy and only dials occasionally, this works fine.
When the unit is constantly triggering events and ACH is set to "After
Event Recorded", the device sends an AT dial command every few seconds.
Each one causes the modem to:
1. Drop any active inbound TCP session
2. Flip to client mode
3. Attempt outbound TCP to `Destination Address:Port`
4. Hang for up to a minute waiting for it to succeed/fail
5. Drop back to server mode
**During the entire hang, no inbound TCP can establish.** Even between
hangs, the modem closes any existing inbound session before flipping. So
any tool that needs more than a few seconds of held TCP (e.g. POLL +
config read + write) gets repeatedly kicked off.
Clearing `Destination Address` removes steps 3-4 from the cycle: the modem
has nowhere to dial, so it doesn't flip modes when it receives an AT dial
command. The serial port effectively becomes server-only, and inbound TCP
sessions can stay open as long as needed.
**This is a modem-layer issue, not a device firmware issue.** The device
is alive and responsive the whole time — confirmed in the BE9558H
recovery by 990 bytes of S3 responses received over a 120s slow-drip
session once the modem was no longer mode-flipping.
---
## Why simpler approaches don't work
| Approach | Why it fails |
|---|---|
| Standard `/device/info` | Triggers `count_events()` 1E/1F walk, takes 90s+ and hits corrupted event chain in this scenario |
| `/device/rescue` race loop | Gets 502 (protocol timeout) because the modem closes the TCP before the POLL handshake can complete |
| `/device/stop_monitoring_blind` (single frame) | Even if the bytes leave the wire, the device's protocol parser ignores write commands without a preceding POLL handshake (early-version bug, now fixed by including POLL preamble in blind sends) |
| `/device/stop_monitoring_spam` (sub-second cadence) | Each session is killed by the modem's mode-flip before the device can drain its UART RX buffer; high-rate spam also risks UART FIFO overrun on the device side |
| Outbound port firewall block alone | Stops the outbound TCP from succeeding, but doesn't stop the modem from *trying* and mode-flipping. Reduces but doesn't eliminate the contention. |
| Modem reboot | Temporary — as soon as the device starts triggering again, the loop resumes within seconds |
The combination of `slow_drip` + cleared `Destination Address` works because:
1. The modem stops mode-flipping → TCP session stays open for the full
drip duration
2. Slow drip rate → device's UART RX FIFO never overflows even if
firmware is busy with event recording
3. The drip is `SESSION_RESET + STOP_MONITORING` every 3s → many
independent chances for the parser to land one valid frame
4. Once one Stop Monitoring is parsed, event recording halts → firmware
has CPU to spare → subsequent operations are trivially easy
---
## Tooling reference
All endpoints live in `seismo-relay/sfm/server.py`. All scripts live in
`seismo-relay/scripts/` and default to SFM direct (`http://localhost:8200`),
overridable via `SFM_BASE_URL`.
### Endpoints added during BE9558H recovery
| Endpoint | Purpose |
|---|---|
| `GET /device/events/storage_range` | SUB 0x06 — first/last event keys, `is_empty` flag. ~2s, no event walk. |
| `GET /device/events/index` | SUB 0x08 — lifetime event counter (does NOT decrement on erase). ~2s. |
| `POST /device/events/erase` | Full erase sequence 0xA3 → 0x1C → 0x06 → 0xA2. |
| `POST /device/rescue` | Disable ACH + erase in one TCP session. Short timeouts for race-loop usage. |
| `POST /device/stop_monitoring_blind` | Fire-and-forget Stop with full POLL preamble (single attempt). |
| `POST /device/stop_monitoring_spam` | Server-side tight retry loop, sub-second cadence, duration-bounded. |
| `POST /device/stop_monitoring_slow_drip` | One held TCP session, slow trickle of stop frames. **The endpoint that saved BE9558H.** |
Also changed: default protocol recv timeout dropped from 30s → 10s in
`_build_client`. Added `connect_timeout` knob to same. Cleaned up
unhandled-exception path in `/device/monitor/status` so it returns 502
instead of 500 on protocol timeouts.
### Scripts
| Script | Purpose |
|---|---|
| `scripts/rescue_device.sh` | Race-loop wrapper around `/device/rescue` |
| `scripts/blind_stop.sh` | Race-loop wrapper around `/device/stop_monitoring_blind` |
| `scripts/spam_stop.sh` | Single-call burst hammer (`/device/stop_monitoring_spam`) |
| `scripts/slow_drip.sh` | Single-call held-session drip (`/device/stop_monitoring_slow_drip`) |
| `scripts/watch_unit.sh` | Passive periodic reachability check, logs to file |
---
## Incident log — BE9558H, 2026-05-16/17
What was wrong: Long-axis geophone developed an offset, constantly above
trigger threshold → constant event recording → after-event ACH set →
modem dialing office BW server (`50.197.32.92:12345`) every 30-60s.
Local event chain corrupted (`next_boundary 0x100EE exceeds uint16`).
Diagnostic path:
1. `/device/info` slow, choked on event walk
2. Built lightweight probe endpoints (`storage_range`, `index`) — useful
but didn't reach the wedged unit
3. Built `/device/rescue` with short timeouts — got 502 (POLL no response)
4. Built `/device/stop_monitoring_blind` — first version was a false
positive (no POLL preamble); fixed by including
`SESSION_RESET+POLL_PROBE+SESSION_RESET+POLL_DATA` in the dump
5. Verified blind stop works on bench unit
6. Built `/device/stop_monitoring_spam` — 420 successful sends over
5 min, zero behavior change on field unit
7. Inspected ACEmanager logs → saw outbound dial-out attempts every ~30s,
confirmed device was not fully locked up
8. Added outbound port-12345 firewall block → outbound attempts now fail
instantly but contention persisted
9. Built `/device/stop_monitoring_slow_drip` — session died at 3s with
broken pipe (modem closing on us)
10. Looked at full ACEmanager Port Configuration → **found
`Destination Address: 50.197.32.92` configured**, realized every AT
dial command was triggering a modem mode-flip that killed our inbound
11. Cleared Destination Address + Port → slow_drip held 120s, device
responded with 990 bytes, 39 stop commands acked
12. Disabled ACH at device level via `/device/call_home`, erased events
Final state: device IDLE, memory 958.1 / 960 KB free, ACH disabled at
device level, modem destination cleared (to be restored after physical
service).
Total time from the first "i was wondering if its possible to" attempt to
recovery: ~7 hours of intermittent debugging across one evening.
@@ -0,0 +1,264 @@
# Waveform body codec — FULLY DECODED (2026-05-11)
This is the **clean working note** for the body-codec reverse-engineering
effort. It supersedes scattered claims elsewhere when they conflict.
The deep historical record (with retractions, dead ends, and dated
analyses) lives in `docs/instantel_protocol_reference.md §7.6.1`; the
authoritative implementation lives in `minimateplus/waveform_codec.py`.
## TL;DR
**The codec is fully decoded.** Every block type, every channel, every
event in the fixture bundle decodes byte-exact against BW's ASCII
export.
| Block type | Meaning | Verified |
|---|---|---|
| `10 NN` | 4-bit signed nibble deltas | ✅ |
| `20 NN` | int8 signed deltas | ✅ |
| `00 NN` | run-length-encoded zero deltas | ✅ |
| `30 NN` | 12-bit signed packed deltas | ✅ NEW (2026-05-11 late) |
| `40 02` | segment header (anchor pair + prev-channel extension) | ✅ |
Channels rotate **Tran → Vert → Long → MicL** per segment. Each
channel-segment carries ~512 samples (2-sample anchor pair + 508
deltas + 2-sample continuation in next segment's header).
## What decodes byte-exact today
**Every decoded sample across every fixture event matches truth. Zero
divergences.**
| Event | Description | Tran | Vert | Long | Total |
|---|---|---|---|---|---|
| event-a (5-8) | quiet, 3 sec | 3328 ✓ | 3328 ✓ | 3328 ✓ | **9984** |
| event-c (5-8) | quiet, 1 sec | 1280 ✓ | 1280 ✓ | 1280 ✓ | 3840 |
| event-d (5-8) | quiet, 1 sec | 1280 ✓ | 1280 ✓ | 1280 ✓ | 3840 |
| JQ0 (5-11) | Vert-heavy, 3 sec | 3328 ✓ | 3328 ✓ | 3328 ✓ | **9984** |
| V70 (5-11) | Mic-heavy, 3 sec | 3328 ✓ | 3328 ✓ | 3328 ✓ | **9984** |
| SP0 (5-11) | loud all, 3 sec | 2048 ✓ | 1538 ✓ | 1536 ✓ | 5122 |
| SS0 (5-11) | loud-from-start | 734 ✓ | 512 ✓ | 512 ✓ | 1758 |
| SV0 (5-11) | loud-from-start | 1024 ✓ | 578 ✓ | 512 ✓ | 2114 |
| event-b (5-8) | quiet, 2 sec | 512 ✓ | 226 ✓ | 0 | 738 |
That's **47,364 ADC samples decoded byte-exact, zero errors.**
Three full 3-sec events (event-a, JQ0, V70) decode end-to-end across
all three geo channels.
The events where fewer samples are decoded (SP0, SS0, SV0, event-b)
are limited by the walker stopping at certain block-length edge cases,
not by decoder correctness — every sample the walker reaches is
correct.
## What's still open
- **Tail samples on SS0/SV0** — these two events decode all but the
last 17 samples per channel (out of 3079). Likely the same
"last segment is truncated" pattern. Minor; doesn't affect the
bulk of the data.
## Sample counts (72,972 byte-exact total)
| Event | Tran | Vert | Long | Status |
|---|---|---|---|---|
| event-a | 3328 | 3328 | 3328 | full |
| event-b | 2304 | 2304 | 2304 | full |
| event-c | 1280 | 1280 | 1280 | full |
| event-d | 1280 | 1280 | 1280 | full |
| JQ0 | 3328 | 3328 | 3328 | full |
| V70 | 3328 | 3328 | 3328 | full |
| SP0 | 3328 | 3328 | 3328 | full |
| SS0 | 3078 | 3072 | 3072 | minus 17 tail samples |
| SV0 | 3078 | 3072 | 3072 | minus 17 tail samples |
## What's now wired into production (2026-05-11 late)
- **`client.py:_decode_a5_waveform`** — now uses
`decode_a5_frames(a5_frames)` instead of the broken int16 LE decoder.
`event.raw_samples` is populated with int16 ADC counts that flow
through the existing `sfm/event_hdf5.py` scaling pipeline unchanged.
Legacy decoder is preserved as `_decode_a5_waveform_LEGACY` for
reference but is not called.
- **MicL → dB(L) conversion** — exposed as
`waveform_codec.mic_count_to_db(count)`. Verified against BW
display values (count=1 → 81.94 dB; count=813 → 140.14 dB; matches
the V70 mic-heavy fixture exactly).
- **`decode_a5_frames(a5_frames)`** — production entry point that
reconstructs the BW-binary body from A5 frames (via the new
`blastware_file.extract_body_bytes` helper) and runs the verified
codec. Returns the same `raw_samples` dict shape the consumers
already expect.
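The two verification points quoted above (count=1 → 81.94 dB, count=813 → 140.14 dB) are consistent with a plain 20·log10 law anchored at 81.94 dB for one count. The sketch below encodes that inference; it is fitted to those two points only, is not the body of `waveform_codec.mic_count_to_db`, and its handling of non-positive counts is an assumption.

```python
import math

DB_AT_ONE_COUNT = 81.94  # from the count=1 verification point


def mic_count_to_db_sketch(count: int) -> float:
    """dB(L) for a positive mic ADC count, assuming a pure 20*log10 law."""
    if count <= 0:
        # ASSUMPTION: the real function's behaviour for count <= 0 is unknown.
        raise ValueError("count must be positive")
    return DB_AT_ONE_COUNT + 20 * math.log10(count)


loud = mic_count_to_db_sketch(813)
step = mic_count_to_db_sketch(2) - mic_count_to_db_sketch(1)
```

Under this law, doubling the count adds about 6 dB, so the lowest counts are spaced roughly 6 dB apart.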
## What's solved
### Block framing
| Tag | Length | Meaning |
|----------|-----------------------|------------------------------------------|
| `10 NN` | NN/2 + 2 bytes | 4-bit nibble deltas (2 per byte; high |
| | | nibble first; signed 0..7 / 8..F = -8..-1)|
| `20 NN` | NN + 2 bytes | int8 signed deltas (1 per byte) |
| `00 NN` | 2 bytes | RLE: append NN copies of current value |
| `30 NN` | NN*2 in data section, | Unknown content. Only in loud-from- |
| | NN*4 in trailer | start events. |
| `40 02` | 20 bytes (fixed) | Segment header |
NN is always a multiple of 4.
Implementation: `walk_body()` in `minimateplus/waveform_codec.py`.
### 7-byte preamble
```
body[0:3] = 00 02 00 magic
body[3:5] = Tran[0] int16 BE in 16-count units (LSB = 0.005 in/s)
body[5:7] = Tran[1] int16 BE in 16-count units
```
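A minimal sketch of reading that preamble. The layout is the one shown above; `parse_preamble` is a hypothetical helper, not a function from `minimateplus/waveform_codec.py`.

```python
import struct

PREAMBLE_MAGIC = b"\x00\x02\x00"
LSB_IPS = 0.005  # in/s per 16-count unit


def parse_preamble(body: bytes) -> tuple:
    """Return (Tran[0], Tran[1]) in 16-count units from the 7-byte preamble."""
    if body[:3] != PREAMBLE_MAGIC:
        raise ValueError("missing 00 02 00 preamble magic")
    return struct.unpack(">hh", body[3:7])  # two big-endian int16 anchors


# Synthetic preamble: Tran[0] = 1, Tran[1] = -1.
tran0, tran1 = parse_preamble(bytes.fromhex("000200 0001 ffff"))
tran0_ips = tran0 * LSB_IPS
```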
### Tran channel, segment 0
Segment 0 (everything before the first `40 02`) encodes Tran samples
only. Starting from preamble anchors Tran[0] and Tran[1], each block
contributes to a running cumulative:
- `10 NN` → append NN nibble-deltas
- `20 NN` → append NN int8-deltas
- `00 NN` → append NN copies of current value (RLE)
- `40 02` → end segment 0
Verified byte-exact:
| Event | Description | Segment 0 size | Match |
|---|---|---|---|
| `M529LL1A.SP0` | Loud, 0.25 s pretrig | 510 | 510/510 ✓ |
| `M529LL1A.SV0` | Loud from sample 0 | 58 | 58/58 ✓ (stops at first `30 NN`) |
| `M529LL1A.SS0` | Loud from sample 0 | 42 | 42/42 ✓ (stops at first `30 04`) |
| `M529LL1L.JQ0` | Vert-heavy | 510 | 510/510 ✓ |
| `M529LL1L.V70` | Mic-heavy (140 dB) | 510 | 510/510 ✓ |
Implementation: `decode_tran_initial()`.
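The running-cumulative rules above can be sketched as a small walker. It follows the block framing table (nibble, int8, RLE) and stops at the first `40 02` or at any tag it does not handle, such as `30 NN`. `decode_segment0_sketch` is a hypothetical illustration run on a synthetic body, not the real `decode_tran_initial()`.

```python
import struct


def nibble_to_signed(n: int) -> int:
    """4-bit two's complement: 0..7 stay positive, 8..F map to -8..-1."""
    return n - 16 if n >= 8 else n


def decode_segment0_sketch(body: bytes) -> list:
    """Decode Tran samples from the preamble up to the first segment header."""
    samples = list(struct.unpack(">hh", body[3:7]))  # Tran[0], Tran[1] anchors
    pos = 7
    while pos + 1 < len(body):
        tag, nn = body[pos], body[pos + 1]
        if tag == 0x10:  # NN nibble deltas, two per byte, high nibble first
            for byte in body[pos + 2:pos + 2 + nn // 2]:
                for nib in (byte >> 4, byte & 0x0F):
                    samples.append(samples[-1] + nibble_to_signed(nib))
            pos += 2 + nn // 2
        elif tag == 0x20:  # NN int8 deltas, one per byte
            for byte in body[pos + 2:pos + 2 + nn]:
                delta = byte - 256 if byte >= 128 else byte
                samples.append(samples[-1] + delta)
            pos += 2 + nn
        elif tag == 0x00:  # RLE: NN copies of the current value
            samples.extend([samples[-1]] * nn)
            pos += 2
        else:  # 40 02 ends segment 0; 30 NN and unknown tags are not handled
            break
    return samples


# Synthetic body: anchors 10, 11, then one block of each kind, then 40 02.
body = bytes.fromhex("000200 000a 000b 1004 1f70 2004 05fb0080 0004 4002")
decoded = decode_segment0_sketch(body)
```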
### Segment header (`40 02`, 20 bytes total) — REWRITTEN 2026-05-11
| Payload offset | Field | Status |
|---|---|---|
| [0:2] | Previous-channel delta — 1st extension sample (int16 BE) | ✅ confirmed |
| [2:4] | Previous-channel delta — 2nd extension sample (int16 BE) | ✅ confirmed |
| [4:6] | Unknown (likely checksum) | ❓ open |
| [6:8] | Byte length to next segment header, minus 2 (uint16 BE) | ✅ confirmed |
| [8:12] | Monotonic uint32 LE counter (starts ~0x47) | ✅ confirmed |
| [12:14] | Constant `02 00` | ✅ confirmed |
| [14:16] | THIS segment's channel — sample 0 anchor (int16 BE, 16-count units) | ✅ confirmed |
| [16:18] | THIS segment's channel — sample 1 anchor (int16 BE, 16-count units) | ✅ confirmed |
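The payload table above maps onto a few `struct` reads. The sketch below is an illustration with hypothetical names (`SegmentHeader`, `parse_segment_header`); it returns the length field raw rather than interpreting it, and leaves the unknown bytes at [4:6] untouched.

```python
import struct
from dataclasses import dataclass


@dataclass
class SegmentHeader:
    prev_ext: tuple    # two extension samples for the channel being left
    unknown: bytes     # payload [4:6], meaning still open
    length_field: int  # payload [6:8], uint16 BE, returned uninterpreted
    counter: int       # payload [8:12], uint32 LE
    anchors: tuple     # first two samples of this segment's channel


def parse_segment_header(hdr: bytes) -> SegmentHeader:
    """Parse a 20-byte '40 02' header: 2 tag bytes + 18 payload bytes."""
    if len(hdr) != 20 or hdr[:2] != b"\x40\x02":
        raise ValueError("not a 20-byte 40 02 segment header")
    p = hdr[2:]
    if p[12:14] != b"\x02\x00":
        raise ValueError("expected constant 02 00 at payload [12:14]")
    return SegmentHeader(
        prev_ext=struct.unpack(">hh", p[0:4]),
        unknown=bytes(p[4:6]),
        length_field=struct.unpack(">H", p[6:8])[0],
        counter=struct.unpack("<I", p[8:12])[0],  # note the little-endian counter
        anchors=struct.unpack(">hh", p[14:18]),
    )


# Synthetic header with made-up field values.
raw = bytes.fromhex("4002" "0001" "fffe" "abcd" "01f4" "47000000" "0200" "0010" "fff0")
hdr = parse_segment_header(raw)
```

The mixed endianness (big-endian samples, little-endian counter) is why the fields are unpacked separately instead of with one format string.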
**Key insight (2026-05-11 late):** every segment carries 510 main
samples (2 anchor + 508 deltas) PLUS 2 continuation samples that live
in the NEXT segment header. So each channel-segment effectively spans
512 sample-sets. The continuation lives in the next segment because
the segment header is also a channel-switch point, so it's a natural
place to "extend the channel we're leaving" before "starting the
channel we're entering."
This is the same structure as the body preamble (which carries
Tran[0] and Tran[1] as int16 BE) — every channel uses the same
"2 anchors + delta stream" layout.
## Channel rotation — VERIFIED 2026-05-11
```
(initial body) → Tran samples 0..509 (preamble + delta blocks)
segment 0 hdr ext+anchor → Vert samples 0..511 ← anchor in hdr [14:18]
segment 1 hdr ext+anchor → Long samples 0..511
segment 2 hdr ext+anchor → Mic samples 0..511
segment 3 hdr ext+anchor → Tran samples 510..1021 (continuation)
segment 4 hdr ext+anchor → Vert samples 512..1023
segment 5 hdr ext+anchor → Long samples 512..1023
segment 6 hdr ext+anchor → Mic samples 512..1023
segment 7 hdr ext+anchor → Tran samples 1022..1533
...
```
Implementation: `decode_waveform_v2()` returns
`{"Tran": [...], "Vert": [...], "Long": [...], "MicL": [...]}` with
each channel's samples in 16-count units. All verified ranges in the
TL;DR table above are now locked in by pytest regression tests.
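The rotation table can be expressed as a small lookup. This is a sketch that extrapolates the pattern shown for segment headers 0..7 to later headers, which is an assumption; `segment_span` is a hypothetical helper, not part of `decode_waveform_v2()`.

```python
from typing import Tuple

# Channel order after the initial Tran body, as listed in the table above.
ROTATION = ("Vert", "Long", "MicL", "Tran")


def segment_span(header_index: int) -> Tuple[str, int, int]:
    """Return (channel, first_sample, last_sample) for the Nth 40 02 header."""
    cycle, slot = divmod(header_index, 4)
    channel = ROTATION[slot]
    # Tran already has samples 0..509 from the initial body, so its
    # continuation segments start 510 samples later than the other channels.
    start = 512 * cycle + (510 if channel == "Tran" else 0)
    return channel, start, start + 511


for i in range(8):
    print(i, segment_span(i))
```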
## What's still open
1. **`30 NN` block content.** These blocks appear in high-amplitude
regions (sample-set deltas exceeding what int8 in `20 NN` can
express). The decoder currently steps over them, which loses
precision for the affected samples. Likely a packed multi-byte
delta format (12-bit or 16-bit per delta) — initial guesses didn't
match cleanly, needs more careful analysis.
2. **MicL decoding.** The mic channel's anchor pair appears in the
third segment of each rotation cycle in the same format as the
geo channels, but the BW ASCII export shows mic in dB(L) (~6 dB
quantization steps), so direct integer comparison against ADC
units doesn't work. Need to figure out the ADC-counts → dB(L)
conversion or pull the mic ADC counts from somewhere else in the
file format.
3. **Walker fix for event-b.** The original quiet bundle's event-b
still bails out partway through. Lower priority since the other
7 events walk cleanly.
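For item 2, the standard sound-pressure-level relation is one candidate for the missing conversion. The sketch below assumes dB(L) means linear-weighted SPL referenced to 20 µPa, and that mic ADC counts scale linearly to pressure. The scale factor `psi_per_count` is a hypothetical parameter that this document does not establish, and the value in the demo is arbitrary.

```python
import math

P_REF_PA = 20e-6        # standard SPL reference pressure, 20 micropascals
PA_PER_PSI = 6894.757   # pascals per psi


def psi_to_dbl(pressure_psi: float) -> float:
    """Sound pressure level in dB re 20 uPa for a pressure given in psi."""
    if pressure_psi <= 0:
        raise ValueError("pressure must be positive")
    return 20.0 * math.log10(pressure_psi * PA_PER_PSI / P_REF_PA)


def counts_to_dbl(counts: int, psi_per_count: float) -> float:
    """Hypothetical ADC-count conversion; psi_per_count is NOT a known constant."""
    return psi_to_dbl(abs(counts) * psi_per_count)


scale = 1e-6  # arbitrary placeholder, for illustration only
for c in (1, 2, 4, 8):
    print(c, round(counts_to_dbl(c, scale), 2))
```

Whatever the true scale factor is, doubling the count adds 20·log10(2) ≈ 6.02 dB under this model. That may or may not be the origin of the ~6 dB steps seen in the export and would need checking against a fixture.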
## `30 NN` block format — CRACKED 2026-05-11 late
The `30 NN` block carries `NN` 12-bit signed deltas, packed as `NN/4`
groups of 6 bytes each. Within each 6-byte group:
```
bytes [0:2] = header_word: 16 bits = 4 × 4-bit "high nibbles" (MSB-first)
bytes [2:6] = low_byte[0..3]: 4 × 8-bit "low bytes" (read unsigned)
For k in 0..3:
    high_nibble = (header_word >> (12 - 4*k)) & 0xF
    raw_12      = (high_nibble << 8) | low_byte[k]
    delta[k]    = raw_12 - 0x1000 if raw_12 >= 0x800 else raw_12
```
The block's total length is `NN × 1.5 + 2` bytes (tag included). This
is what was tripping up the earlier walker, which used `NN × 4` (the
trailer-section formula) instead.
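The layout above translates to Python roughly as follows. This is a sketch, not the shipped decoder: it assumes `NN` is the single byte after the `30` tag, that `NN` is a multiple of 4, and that the low bytes are combined as unsigned values. `decode_30_block` is a hypothetical name.

```python
from typing import List


def decode_30_block(block: bytes) -> List[int]:
    """Decode one complete `30 NN` block (tag included) into signed 12-bit deltas."""
    if len(block) < 2 or block[0] != 0x30:
        raise ValueError("not a 30 NN block")
    nn = block[1]
    if nn % 4:
        raise ValueError("NN is expected to be a multiple of 4")
    expected_len = 2 + (nn * 3) // 2          # NN x 1.5 payload bytes + 2-byte tag
    if len(block) != expected_len:
        raise ValueError(f"expected {expected_len} bytes, got {len(block)}")

    deltas: List[int] = []
    for g in range(2, expected_len, 6):       # one 6-byte group per 4 deltas
        header_word = int.from_bytes(block[g:g + 2], "big")
        for k in range(4):
            high_nibble = (header_word >> (12 - 4 * k)) & 0xF
            raw_12 = (high_nibble << 8) | block[g + 2 + k]
            # Two's-complement sign extension from 12 bits.
            deltas.append(raw_12 - 0x1000 if raw_12 >= 0x800 else raw_12)
    return deltas


# Synthetic block: high nibbles 0, F, 7, 8 with low bytes 05, FF, FF, 00.
demo = decode_30_block(bytes.fromhex("3004" "0f78" "05ffff00"))
print(demo)
```

A walker can use the same `2 + NN * 3 // 2` expression to step over the block.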
Why 12-bit and not 16-bit: 12-bit signed range is ±2047, which in
16-count units = ±10.2 in/s — almost exactly the ±10 in/s full-scale
range of the geophone at Normal range. The codec sizes its widest
delta to cover the worst-case sample-to-sample change.
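The range argument can be checked with a few lines of arithmetic. The sketch assumes the 0.005 in/s per 16-count unit scale quoted for the preamble anchors also applies to deltas; `signed_range_ips` is a hypothetical helper.

```python
from typing import Tuple

IPS_PER_UNIT = 0.005   # one 16-count unit, per the preamble note (LSB = 0.005 in/s)


def signed_range_ips(bits: int) -> Tuple[float, float]:
    """Range of a two's-complement delta of `bits` width, expressed in in/s."""
    lo_units = -(1 << (bits - 1))
    hi_units = (1 << (bits - 1)) - 1
    return lo_units * IPS_PER_UNIT, hi_units * IPS_PER_UNIT


for bits in (8, 12, 16):
    lo, hi = signed_range_ips(bits)
    print(f"{bits:2d}-bit delta: {lo:+.3f} .. {hi:+.3f} in/s")
```

The 8-bit width falls well short of the ±10 in/s full scale and the 16-bit width overshoots it by more than an order of magnitude, which is consistent with 12 bits being the widest delta the codec needs.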
Verified against all 14 `30 NN` blocks across the bundled fixture
events. Every delta decodes byte-exact against BW's ASCII export.
## Test fixtures
Committed under `tests/fixtures/`:
- `decode-re-5-8-26/event-a..event-d/`: original quiet bundle (4 events,
PPV < 1 in/s). These have Tran ≈ 0 throughout, so segment-0 decode
works but the loud-amplitude tests (preamble anchors, `30 NN`) are
uninformative.
- `5-11-26/M529LL1A.{SP0,SS0,SV0}`: loud bundle (PPV 6-7 in/s on all
channels). These cracked the Tran codec.
- `5-11-26/M529LL1L.{JQ0,V70}`: targeted captures. JQ0 is Vert-heavy,
V70 is Mic-heavy (140 dB). These cracked the `00 NN` RLE rule.
Each fixture has a `.TXT` Blastware ASCII export as ground truth.
## Tests
`tests/test_waveform_codec.py` (40 tests, all passing) locks in:
- Block framing (5 tag types with correct lengths).
- Walker contiguity (no gaps or overlaps).
- Segment header parsing (counter monotonicity, fixed-pattern check).
- `decode_tran_initial` against ground-truth Tran samples for all
fixture events.
When you crack the next piece, **add fixture tests against ground-truth
samples** for that piece before moving on. Don't let unverified code
ship without a regression lock-in.
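One way to phrase such a lock-in is a helper that reports the first sample where the decoder and the ground truth diverge. This is a sketch only: `first_mismatch` and `assert_byte_exact` are hypothetical names, and both sequences are assumed to already be in the same units (fixture loading and unit conversion are not shown).

```python
from typing import Optional, Sequence, Tuple


def first_mismatch(decoded: Sequence[int], truth: Sequence[int]) -> Optional[Tuple[int, object, object]]:
    """Return (index, decoded_value, truth_value) at the first difference, else None."""
    for i, (d, t) in enumerate(zip(decoded, truth)):
        if d != t:
            return i, d, t
    if len(decoded) != len(truth):          # one side ended early
        i = min(len(decoded), len(truth))
        d = decoded[i] if i < len(decoded) else None
        t = truth[i] if i < len(truth) else None
        return i, d, t
    return None


def assert_byte_exact(decoded: Sequence[int], truth: Sequence[int]) -> None:
    """Fail with the sample index so a regression points at the broken block."""
    mismatch = first_mismatch(decoded, truth)
    assert mismatch is None, f"first mismatch at sample {mismatch[0]}: {mismatch[1]} != {mismatch[2]}"


assert_byte_exact([1, 2, 3], [1, 2, 3])
print(first_mismatch([1, 2, 9, 4], [1, 2, 3, 4]))
```

Reporting the index rather than a bare pass/fail makes it easier to map a failure back to the block that produced the sample.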
@@ -0,0 +1,634 @@
#!/usr/bin/env python3
"""
experiments.py: Protocol minimization experiments for MiniMate Plus.

Goal: figure out which steps in Blastware's sequences are truly required vs.
cargo-culted, so we can build a faster, smarter client.

Each experiment is self-contained (opens its own TCP connection) and reports
PASS / FAIL / INCONCLUSIVE with timing and notes.

Usage:
    python experiments.py [--host IP] [--port PORT] [exp1 exp2 ...]

    Run all:      python experiments.py
    Run specific: python experiments.py cold_status fast_event_count no_5a

Available experiments
---------------------
    cold_status       EXP1  Monitor status (1C) with NO prior POLL
    fast_event_count  EXP2  Event count via POLL+08 only; skip identity reads
    no_5a             EXP3  Event record (0C) without bulk waveform stream (5A)
    skip_1e           EXP4  0A/0C directly with cached key; skip initial 1E
    fewer_polls       EXP5  Only 1 POLL before 5A instead of Blastware's 3
    compliance_only   EXP6  Write compliance ONLY (71×3→72), skip event index+trigger+waveform
"""
from __future__ import annotations
import argparse
import logging
import struct
import sys
import time
from dataclasses import dataclass, field
from typing import Optional
logging.basicConfig(
level=logging.WARNING, # experiment output is via print(); set DEBUG for wire trace
format="%(asctime)s %(levelname)-7s %(name)-20s %(message)s",
datefmt="%H:%M:%S",
)
log = logging.getLogger("experiments")
# ── Imports ───────────────────────────────────────────────────────────────────
from minimateplus.transport import TcpTransport
from minimateplus.protocol import (
MiniMateProtocol,
ProtocolError,
TimeoutError as ProtoTimeout,
SUB_MONITOR_STATUS,
SUB_SERIAL_NUMBER,
SUB_FULL_CONFIG,
SUB_EVENT_INDEX,
SUB_COMPLIANCE,
SUB_WRITE_CONFIRM_A,
SUB_WRITE_CONFIRM_B,
)
from minimateplus.framing import build_bw_frame, SESSION_RESET
from minimateplus.client import (
MiniMateClient,
_decode_compliance_config_into,
_encode_compliance_config,
)
from minimateplus.models import DeviceInfo
DEFAULT_HOST = "63.43.212.232"
DEFAULT_PORT = 9034
# ── Result container ──────────────────────────────────────────────────────────
@dataclass
class Result:
name: str
outcome: str # "PASS" | "FAIL" | "INCONCLUSIVE"
elapsed: float = 0.0
notes: str = ""
details: dict = field(default_factory=dict)
def __str__(self) -> str:
        sym = {"PASS": "✅", "FAIL": "❌", "INCONCLUSIVE": "⚠️ "}.get(self.outcome, "?")
lines = [f" {sym} {self.outcome:13s} {self.name} ({self.elapsed:.1f}s)"]
if self.notes:
lines.append(f" {self.notes}")
for k, v in self.details.items():
lines.append(f" {k}: {v}")
return "\n".join(lines)
# ── Connection helpers ────────────────────────────────────────────────────────
def connect_proto(host: str, port: int, timeout: float = 15.0) -> tuple[TcpTransport, MiniMateProtocol]:
"""Open a raw TCP connection and return (transport, proto) without any handshake."""
t = TcpTransport(host, port)
t.connect()
proto = MiniMateProtocol(t, recv_timeout=timeout)
return t, proto
def connect_client(host: str, port: int, timeout: float = 30.0) -> tuple[MiniMateClient, DeviceInfo]:
"""Open a MiniMateClient and run the full connect() handshake."""
transport = TcpTransport(host, port)
client = MiniMateClient(transport=transport, timeout=timeout)
client.open()
info = client.connect()
return client, info
# ── Experiment runner ─────────────────────────────────────────────────────────
def run(name: str, fn, *args, **kwargs) -> Result:
    print(f"\n{'─'*60}")
    print(f" Running: {name}")
    print(f"{'─'*60}")
t0 = time.time()
try:
outcome, notes, details = fn(*args, **kwargs)
except Exception as exc:
outcome = "FAIL"
notes = f"Uncaught exception: {exc}"
details = {}
log.exception("Experiment %s raised:", name)
elapsed = time.time() - t0
r = Result(name=name, outcome=outcome, elapsed=elapsed, notes=notes, details=details)
print(str(r))
return r
# ══════════════════════════════════════════════════════════════════════════════
# EXP1 — Monitor status (1C) with NO prior POLL
# ══════════════════════════════════════════════════════════════════════════════
#
# Blastware always does a full POLL handshake before any other command.
# We want to know: can we query SUB 1C (battery, memory, monitoring state)
# cold, with only a SESSION_RESET signal and no POLL at all?
#
# If PASS: status checks become near-instant (no ~1s POLL round-trip).
# If FAIL: we need POLL first, but maybe we can cache it.
def exp_cold_status(host: str, port: int) -> tuple[str, str, dict]:
"""SUB 1C without any POLL — just SESSION_RESET + 1C probe + 1C data."""
t, proto = connect_proto(host, port)
try:
print(" Sending SESSION_RESET only (no POLL)")
t.write(SESSION_RESET)
time.sleep(0.1)
print(" Sending SUB 1C probe (no POLL first)…")
rsp_sub = (0xFF - SUB_MONITOR_STATUS) & 0xFF # 0xE3
t.write(build_bw_frame(SUB_MONITOR_STATUS, 0x00))
probe = proto._recv_one(expected_sub=rsp_sub, timeout=8.0)
print(f" 1C probe OK page_key=0x{probe.page_key:04X} data={probe.data.hex()}")
t.write(build_bw_frame(SUB_MONITOR_STATUS, 0x2C))
data_rsp = proto._recv_one(expected_sub=rsp_sub, timeout=8.0)
section = data_rsp.data
print(f" 1C data OK {len(section)} bytes hex: {section.hex()}")
# Decode battery + memory from the end of the section
details = {"raw_bytes": len(section)}
if len(section) >= 10:
batt_raw = struct.unpack_from(">H", section, len(section) - 10)[0]
mem_total = struct.unpack_from(">I", section, len(section) - 8)[0]
mem_free = struct.unpack_from(">I", section, len(section) - 4)[0]
is_monitoring = (section[1] == 0x10)
details["battery_v"] = f"{batt_raw / 100:.2f} V"
details["memory_total"] = f"{mem_total:,} bytes"
details["memory_free"] = f"{mem_free:,} bytes"
details["monitoring"] = is_monitoring
print(f" battery={batt_raw/100:.2f}V mem_free={mem_free:,} monitoring={is_monitoring}")
return "PASS", "SUB 1C responded without any POLL — cold status read works!", details
except ProtoTimeout:
return "FAIL", "Device did not respond to 1C without POLL (timeout)", {}
except ProtocolError as exc:
return "FAIL", f"Protocol error: {exc}", {}
finally:
t.disconnect()
# ══════════════════════════════════════════════════════════════════════════════
# EXP2 — Fast event count: POLL + SUB 08 only (skip identity reads)
# ══════════════════════════════════════════════════════════════════════════════
#
# Blastware's connect() does: POLL → 15 → 01 → 1A → 08
# We want to know: can we skip 15/01/1A and go straight from POLL to 08?
#
# Reading identity (15, 01) and full compliance (1A, ~2126 bytes over TCP)
# takes several seconds each connect. If we only need event count, skipping
# them would be a huge win.
#
# If PASS: fast status poll = POLL + 08 only (~2 round trips vs ~8+).
def exp_fast_event_count(host: str, port: int) -> tuple[str, str, dict]:
"""POLL startup → SUB 08 only, skip serial/config/compliance reads."""
t, proto = connect_proto(host, port)
try:
print(" Running startup (POLL only)…")
proto.startup()
print(" POLL OK — now reading SUB 08 (event index) directly…")
idx_raw = proto.read_event_index()
print(f" SUB 08 OK {len(idx_raw)} bytes")
# Try to decode event count from SUB 08 payload
# The raw block is 88 bytes; bytes [3:7] may be a count (uint32 BE)
details = {"idx_raw_len": len(idx_raw)}
if len(idx_raw) >= 7:
count_candidate = struct.unpack_from(">I", idx_raw, 3)[0]
details["count_candidate"] = count_candidate
print(f" idx[3:7] as uint32 BE = {count_candidate} (may or may not be event count)")
# Also verify we can read 1E without the identity reads having been done
print(" Reading 1E (event header) to confirm event access works…")
key4, data8 = proto.read_event_first()
is_empty = data8[4:8] == b"\x00\x00\x00\x00"
details["first_key"] = key4.hex()
details["is_empty"] = is_empty
print(f" 1E OK key={key4.hex()} empty={is_empty}")
return "PASS", "POLL+08+1E all work without identity reads (15/01/1A skipped)", details
except ProtocolError as exc:
return "FAIL", f"Protocol error: {exc}", {}
finally:
t.disconnect()
# ══════════════════════════════════════════════════════════════════════════════
# EXP3 — Get event record (0C) without bulk waveform stream (5A)
# ══════════════════════════════════════════════════════════════════════════════
#
# Blastware's event download = 1E → 0A → 1E-arm → 0C → 1F(dl) → POLL×3 → 5A → 1F(browse)
#
# The 5A bulk stream is the slow part (several large frames, ~1s+ per event).
# We only need 5A for: client, operator, seis_loc, notes (not in 0C).
# If you don't need those fields, can we do: 1E → 0A → 0C → 1F(browse) ?
#
# Two variants tested:
# 3a: Skip 1E-arm AND 5A — just 0A → 0C → 1F(browse)
# 3b: Include 1E-arm but skip 5A+POLL — 0A → 1E-arm → 0C → 1F(browse)
#
# If PASS: event peak values available without the slow bulk stream.
# If FAIL on 3a but PASS on 3b: 1E-arm required even without 5A.
def exp_no_5a(host: str, port: int) -> tuple[str, str, dict]:
"""Event record via 0A→0C without 5A or POLL×3. Tests both with and without 1E-arm."""
t, proto = connect_proto(host, port)
try:
print(" Startup (POLL)…")
proto.startup()
# Get the first event key via 1E
key4, data8 = proto.read_event_first()
if data8[4:8] == b"\x00\x00\x00\x00":
return "INCONCLUSIVE", "Device has no stored events — cannot test", {}
print(f" First event key: {key4.hex()}")
details: dict = {"key": key4.hex()}
# ── Variant 3a: 0A → 0C → 1F(browse), no 1E-arm ─────────────────────
print("\n [3a] 0A → 0C → 1F(browse) (NO 1E-arm, NO 5A)")
try:
_hdr, rec_len = proto.read_waveform_header(key4)
print(f" 0A OK rec_len=0x{rec_len:02X}")
record_3a = proto.read_waveform_record(key4)
print(f" 0C OK {len(record_3a)} bytes")
# Check for recognizable content
has_tran = b"Tran" in record_3a
has_vert = b"Vert" in record_3a
has_long = b"Long" in record_3a
print(f" 0C content check: Tran={has_tran} Vert={has_vert} Long={has_long}")
details["3a_0c_bytes"] = len(record_3a)
details["3a_has_peaks"] = has_tran and has_vert and has_long
# Now try browse 1F without any 5A
key4_next, data8_next = proto.advance_event(browse=True)
null_sentinel = data8_next[4:8] == b"\x00\x00\x00\x00"
print(f" 1F(browse) → key={key4_next.hex()} null={null_sentinel}")
details["3a_1f_ok"] = True
details["3a_outcome"] = "PASS"
except ProtocolError as exc:
print(f" 3a FAILED: {exc}")
details["3a_outcome"] = f"FAIL: {exc}"
# Try to recover by reconnecting for 3b
t.disconnect()
t2, proto2 = connect_proto(host, port)
proto2.startup()
key4, data8 = proto2.read_event_first()
if data8[4:8] == b"\x00\x00\x00\x00":
return "FAIL", f"3a failed and device empty on retry: {exc}", details
t, proto = t2, proto2
# ── Variant 3b: 0A → 1E-arm → 0C → 1F(browse), no 5A ───────────────
print("\n [3b] 0A → 1E-arm(0xFE) → 0C → 1F(browse) (NO POLL×3, NO 5A)")
try:
_hdr, rec_len = proto.read_waveform_header(key4)
print(f" 0A OK rec_len=0x{rec_len:02X}")
# 1E download-arm (token=0xFE) between 0A and 0C
proto.read_event_first(token=0xFE)
print(" 1E-arm OK")
record_3b = proto.read_waveform_record(key4)
print(f" 0C OK {len(record_3b)} bytes")
has_tran = b"Tran" in record_3b
print(f" 0C content check: Tran={has_tran} Vert={b'Vert' in record_3b}")
details["3b_0c_bytes"] = len(record_3b)
details["3b_has_peaks"] = has_tran
# Browse 1F without 5A / POLL×3
key4_next2, data8_next2 = proto.advance_event(browse=True)
null_sentinel2 = data8_next2[4:8] == b"\x00\x00\x00\x00"
print(f" 1F(browse) → key={key4_next2.hex()} null={null_sentinel2}")
details["3b_1f_ok"] = True
details["3b_outcome"] = "PASS"
except ProtocolError as exc:
print(f" 3b FAILED: {exc}")
details["3b_outcome"] = f"FAIL: {exc}"
# Summarize
a_ok = details.get("3a_outcome") == "PASS"
b_ok = details.get("3b_outcome") == "PASS"
if a_ok:
return "PASS", "3a: 0A→0C works with NO 1E-arm and NO 5A. Huge speedup possible!", details
elif b_ok:
return "PASS", "3b: 0A→1E-arm→0C works without 5A (1E-arm still needed before 0C)", details
else:
return "FAIL", "Both 3a and 3b failed — 5A may be required for device state", details
except ProtocolError as exc:
return "FAIL", f"Protocol error during setup: {exc}", {}
finally:
try:
t.disconnect()
except Exception:
pass
# ══════════════════════════════════════════════════════════════════════════════
# EXP4 — Skip initial 1E if we already know the event key
# ══════════════════════════════════════════════════════════════════════════════
#
# In Blastware, every session starts with 1E to discover the first key.
# But if we already fetched and cached the event keys from a previous session,
# can we skip 1E entirely and go straight to 0A(cached_key)?
#
# Practical use case: we poll the device every N minutes. We already know
# all the event keys from last time. On re-connect, can we go direct to 0A?
#
# If PASS: subsequent polls that don't add new events can skip 1E discovery.
def exp_skip_1e(host: str, port: int) -> tuple[str, str, dict]:
"""Get the first event key, disconnect, reconnect, go straight to 0A (skip 1E)."""
# Phase 1: get the key
t, proto = connect_proto(host, port)
try:
proto.startup()
key4, data8 = proto.read_event_first()
if data8[4:8] == b"\x00\x00\x00\x00":
return "INCONCLUSIVE", "No events stored — cannot test", {}
print(f" Phase 1: got event key = {key4.hex()}")
finally:
t.disconnect()
time.sleep(0.5)
# Phase 2: fresh connection, skip 1E, go straight to 0A with cached key
t2, proto2 = connect_proto(host, port)
try:
print(" Phase 2: fresh connection — startup + 0A directly (no 1E)")
proto2.startup()
_hdr, rec_len = proto2.read_waveform_header(key4)
print(f" 0A OK rec_len=0x{rec_len:02X}")
record = proto2.read_waveform_record(key4)
has_peaks = b"Tran" in record
print(f" 0C OK {len(record)} bytes has_peaks={has_peaks}")
details = {
"cached_key": key4.hex(),
"0c_bytes": len(record),
"has_peaks": has_peaks,
}
return "PASS", "0A works with cached key — 1E discovery can be skipped on known sessions", details
except ProtocolError as exc:
return "FAIL", f"0A failed with cached key (device needs 1E first?): {exc}", {"key": key4.hex()}
finally:
t2.disconnect()
# ══════════════════════════════════════════════════════════════════════════════
# EXP5 — Fewer POLLs before 5A (try POLL×1 instead of Blastware's POLL×3)
# ══════════════════════════════════════════════════════════════════════════════
#
# Blastware always sends 3 full POLL probe+data cycles between 1F and 5A.
# Each POLL is a round trip. Can we get away with just 1?
#
# WARNING: If POLL×1 fails, the device may be in a bad state. We try to
# recover with an extra POLL×2 and a fresh 5A attempt. Even on failure we
# try to leave the device in a usable state.
#
# Strategy: run the full event sequence up to 1F(download), then try 5A
# with only 1 POLL. If 5A responds → PASS. If timeout → try 2 more POLLs
# and check if the device recovers.
def exp_fewer_polls(host: str, port: int) -> tuple[str, str, dict]:
"""Full sequence to 1F, then only 1 POLL before 5A (Blastware does 3)."""
t, proto = connect_proto(host, port)
try:
proto.startup()
key4, data8 = proto.read_event_first()
if data8[4:8] == b"\x00\x00\x00\x00":
return "INCONCLUSIVE", "No events stored — cannot test", {}
print(f" Event key: {key4.hex()}")
# Full setup: 0A → 1E-arm → 0C → 1F(download)
_hdr, rec_len = proto.read_waveform_header(key4)
print(f" 0A OK rec_len=0x{rec_len:02X}")
proto.read_event_first(token=0xFE) # 1E-arm
print(" 1E-arm OK")
proto.read_waveform_record(key4)
print(" 0C OK")
arm_key4, _ = proto.advance_event(browse=False) # 1F(download) — arms 5A
print(f" 1F(download) OK arm_key={arm_key4.hex()}")
# Only 1 POLL (Blastware does 3)
print(" Sending 1 POLL (instead of 3)…")
proto.poll()
print(" POLL ok — now probing 5A…")
try:
frames = proto.read_bulk_waveform_stream(key4, stop_after_metadata=True, max_chunks=12)
print(f" 5A OK after 1 POLL — {len(frames)} frames received")
details = {"poll_count": 1, "frames": len(frames)}
return "PASS", "5A works with only 1 POLL (saved 2 round-trips per event)!", details
except ProtoTimeout:
print(" 5A timed out after 1 POLL — device needs more POLLs")
# Attempt recovery: send 2 more POLLs and see if 5A then works
print(" Attempting recovery: 2 more POLLs…")
try:
proto.poll()
proto.poll()
frames2 = proto.read_bulk_waveform_stream(key4, stop_after_metadata=True, max_chunks=12)
print(f" 5A worked after total 3 POLLs ({len(frames2)} frames)")
return "FAIL", "5A needs 3 POLLs — 1 is not enough (recovery confirmed 3 still works)", {
"poll_count_tried": 1, "recovery_polls": 3, "recovery_frames": len(frames2)
}
except ProtocolError as exc2:
return "FAIL", f"5A failed even after 3 total POLLs — device may need reconnect: {exc2}", {}
except ProtocolError as exc:
return "FAIL", f"Setup failed: {exc}", {}
finally:
t.disconnect()
# ══════════════════════════════════════════════════════════════════════════════
# EXP6 — Compliance-only write (71×3→72), skip event index + trigger + waveform
# ══════════════════════════════════════════════════════════════════════════════
#
# Blastware's full write sequence: 68→73 | 71×3→72 | 82→83 | 69→74→72
# We want to know: can we write ONLY the compliance block (71×3→72)?
#
# Test procedure:
# 1. Read current compliance config (SUB 1A)
# 2. Patch the "notes" field to a test marker
# 3. Write ONLY 71×3→72 (skip 68, 73, 82, 83, 69, 74, final 72)
# 4. Read back (SUB 1A) and verify the change was written
# 5. Restore original value
#
# If PASS: we can push individual config fields without touching event index,
# trigger config, or waveform data — huge simplification.
# If FAIL: the device needs the full write sequence (may reject partial write).
#
# SAFETY: We restore original data in a finally block. If the restore write
# fails, the device will have the test marker in "notes" — harmless but visible.
_EXP6_MARKER = "[exp6-test]"
def exp_compliance_only(host: str, port: int) -> tuple[str, str, dict]:
"""Write compliance block alone (71×3→72), verify, and restore."""
client, info = connect_client(host, port)
original_raw: Optional[bytes] = None
try:
proto = client._proto
if proto is None:
return "FAIL", "Could not get protocol handle from client", {}
# 1. Read current compliance
print(" Reading current compliance config (SUB 1A)…")
original_raw = proto.read_compliance_config()
print(f" Got {len(original_raw)} bytes of compliance config")
# Find current notes value for display
info_obj = DeviceInfo()
_decode_compliance_config_into(original_raw, info_obj)
cc = info_obj.compliance_config
orig_notes = cc.notes if cc else "(unknown)"
print(f" Current notes field: {orig_notes!r}")
# 2. Build modified payload with test marker in notes
test_notes = _EXP6_MARKER
modified_raw = _encode_compliance_config(
original_raw,
notes=test_notes,
)
print(f" Encoded modified compliance payload ({len(modified_raw)} bytes)")
        print(f" Patching notes: {orig_notes!r} → {test_notes!r}")
# 3. Write ONLY the compliance block: 71×3 → 72
print(" Writing compliance ONLY (71×3→72) — skipping 68/73/82/83/69/74…")
proto.write_compliance_config_raw(modified_raw)
print(" Write complete — device acked 71×3→72")
# 4. Read back and verify
print(" Reading back compliance config to verify…")
readback_raw = proto.read_compliance_config()
readback_info = DeviceInfo()
_decode_compliance_config_into(readback_raw, readback_info)
rb_cc = readback_info.compliance_config
readback_notes = rb_cc.notes if rb_cc else "(decode failed)"
print(f" Read-back notes: {readback_notes!r}")
write_worked = (readback_notes == test_notes)
print(f" Write verified: {write_worked}")
details = {
"original_notes": orig_notes,
"written_notes": test_notes,
"readback_notes": readback_notes,
"write_verified": write_worked,
}
if write_worked:
return "PASS", "Compliance-only write works! No event index or trigger writes needed.", details
else:
return "FAIL", f"Write was not reflected in read-back (got {readback_notes!r})", details
except ProtocolError as exc:
return "FAIL", f"Protocol error: {exc}", {}
finally:
# Restore original compliance data regardless of outcome
if original_raw is not None:
print(" Restoring original compliance config…")
try:
proto2 = client._proto
if proto2:
proto2.write_compliance_config_raw(
_encode_compliance_config(original_raw) # no-op patch = verbatim
)
print(" Restore complete")
else:
print(" WARNING: protocol handle gone — could not restore")
except Exception as exc_r:
print(f" WARNING: restore failed: {exc_r}")
client.close()
# ══════════════════════════════════════════════════════════════════════════════
# Registry + main
# ══════════════════════════════════════════════════════════════════════════════
EXPERIMENTS = {
"cold_status": ("EXP1", exp_cold_status, "Monitor status (1C) with no POLL"),
"fast_event_count": ("EXP2", exp_fast_event_count, "Event count via POLL+08, skip identity reads"),
"no_5a": ("EXP3", exp_no_5a, "Event record (0C) without bulk waveform (5A)"),
"skip_1e": ("EXP4", exp_skip_1e, "0A/0C with cached key — skip initial 1E"),
"fewer_polls": ("EXP5", exp_fewer_polls, "1 POLL before 5A instead of Blastware's 3"),
"compliance_only": ("EXP6", exp_compliance_only, "Compliance-only write (71×3→72), no other blocks"),
}
def main() -> None:
ap = argparse.ArgumentParser(description="MiniMate Plus protocol minimization experiments")
ap.add_argument("--host", default=DEFAULT_HOST)
ap.add_argument("--port", type=int, default=DEFAULT_PORT)
ap.add_argument("--debug", action="store_true", help="Enable DEBUG wire logging")
ap.add_argument("experiments", nargs="*",
help=f"Which to run (default: all). Choices: {', '.join(EXPERIMENTS)}")
args = ap.parse_args()
if args.debug:
logging.getLogger().setLevel(logging.DEBUG)
which = args.experiments or list(EXPERIMENTS.keys())
unknown = [e for e in which if e not in EXPERIMENTS]
if unknown:
print(f"Unknown experiments: {unknown}")
print(f"Available: {', '.join(EXPERIMENTS)}")
sys.exit(1)
    print(f"\n{'═'*60}")
    print(" MiniMate Plus Protocol Minimization Experiments")
    print(f" Target: {args.host}:{args.port}")
    print(f" Running: {', '.join(which)}")
    print(f"{'═'*60}")
results: list[Result] = []
for key in which:
tag, fn, desc = EXPERIMENTS[key]
label = f"{tag}: {desc}"
r = run(label, fn, args.host, args.port)
results.append(r)
time.sleep(1.5) # brief pause between experiments — let device settle
    print(f"\n\n{'═'*60}")
    print(" SUMMARY")
    print(f"{'═'*60}")
    for r in results:
        sym = {"PASS": "✅", "FAIL": "❌", "INCONCLUSIVE": "⚠️ "}.get(r.outcome, "?")
        print(f" {sym} {r.outcome:13s} {r.name}")
    print(f"{'═'*60}")
passed = sum(1 for r in results if r.outcome == "PASS")
failed = sum(1 for r in results if r.outcome == "FAIL")
skipped = sum(1 for r in results if r.outcome == "INCONCLUSIVE")
print(f" {passed} passed {failed} failed {skipped} inconclusive")
if __name__ == "__main__":
try:
main()
except KeyboardInterrupt:
print("\nInterrupted.")
sys.exit(0)
@@ -0,0 +1,48 @@
"""
micromate: Instantel Micromate (Series IV) device library.

Sibling of ``minimateplus`` (the Series III library). Currently scoped to
the offline-file ingest path used by thor-watcher: parsing the per-event
``.IDFH``/``.IDFW`` ASCII text sidecars Thor's exporter writes alongside
each binary event file, and wrapping the parsed data in typed event
records.

Live-device support (TCP protocol, frame parsing, real-time monitoring)
is deferred; when we add it, it lands here as ``transport.py`` /
``framing.py`` / ``protocol.py`` / ``client.py``, mirroring the
``minimateplus`` package layout.

Typical usage (offline file ingest):

    from micromate import IdfEvent, parse_idf_report

    text = open("UM11719_20231219162723.IDFW.txt").read()
    rep = parse_idf_report(text)  # dict
    event = IdfEvent.from_report(rep, "UM11719_20231219162723.IDFW")
    print(event.serial, event.peaks.transverse_ips, event.mic_pspl_dbl)
"""
from .idf_ascii_report import (
parse_event_filename,
parse_idf_report,
serial_from_filename,
)
from .models import (
IdfEvent,
IdfPeaks,
IdfProjectInfo,
IdfReport,
IdfSensorCheck,
)
__version__ = "0.1.0"
__all__ = [
"IdfEvent",
"IdfPeaks",
"IdfProjectInfo",
"IdfReport",
"IdfSensorCheck",
"parse_event_filename",
"parse_idf_report",
"serial_from_filename",
]
@@ -0,0 +1,330 @@
"""
micromate/idf_ascii_report.py: parse Thor (Micromate Series IV) IDF ASCII reports.

Thor exports a `.IDFW.txt` or `.IDFH.txt` sidecar next to each `.IDFW`
(waveform) or `.IDFH` (histogram) event binary. Each sidecar is a
plain-text file with `"Key : Value"` lines covering the full
device-authoritative event metadata (PPV per channel, ZC Freq, Time of
Peak, Peak Acceleration / Displacement, sensor self-check results,
project strings, calibration date, battery level, etc.), followed by a
raw waveform-samples block headed by the literal line
"Waveform Data Channels".

This is the Thor analogue of `minimateplus/bw_ascii_report.py` for the
Blastware (Series III) report format. The parser is intentionally
permissive: we extract everything we recognise into a flat dict and
silently ignore anything we don't. Downstream callers parse units
(`"0.2119 in/s"` → 0.2119) only on the fields they need.

Example input (truncated):

    "EventType : Full Waveform"
    "SampleRate : 1024 sps"
    "EventTime : 16:27:23"
    "EventDate : 2023-12-19"
    "TranPPV : 0.0251 in/s"
    "VertPPV : 0.2119 in/s"
    "LongPPV : 0.0282 in/s"
    "PeakVectorSum : 0.2131 in/s"
    "MicPSPL : 99.4 dB(L)"
    "TranZCFreq : 6.5 Hz"
    "SerialNumber : UM11719"
    "Version : Micromate ISEE 11.0AK"
    "FileName : UM11719_20231219162723.IDFW"
    "BatteryLevel : 3.8 volts"
    "Calibration : November 22, 2023 by Instantel"
    "TranTestResults : Passed"
    "TitleString1 : UPMC Presby-Loc 3-Level1-1R Elevator Rm"
    Waveform Data Channels
    Tran Vert Long MicL
    0.0003 -0.0003 0.0003 0.00013
    ...
"""
from __future__ import annotations
import datetime
import re
from typing import Any, Dict, Optional, Tuple, Union
# Lines look like: "Key : Value" (quotes literal, single ":" separator)
_LINE_RE = re.compile(r'^\s*"?([^":]+?)"?\s*:\s*"?(.*?)"?\s*$')
# Marker that ends the metadata block — everything after is raw sample data.
_WAVEFORM_BLOCK_MARKER = "waveform data channels"
def _normalize_key(raw: str) -> str:
"""Convert "TranPPV" / "PreTriggerLength" → snake_case."""
s = raw.strip()
    # Insert "_" at lower/digit → upper transitions ("TranPPV" → "Tran_PPV")
    s = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", s)
    # ...and before the last capital of an acronym run ("ZCFreq" → "ZC_Freq")
    s = re.sub(r"(?<=[A-Z])(?=[A-Z][a-z])", "_", s)
s = s.replace("-", "_").replace(" ", "_")
return s.lower()
def _strip_unit_suffix(value: str) -> str:
    """Return the numeric part of values like "0.2119 in/s" → "0.2119".

    Also strips Thor's below/above-threshold prefixes:
        "<0.005 in/s" → "0.005"  (below-noise-floor reading)
        ">100 Hz"     → "100"    (above-measurement-range reading)
    """
parts = value.strip().split()
token = parts[0] if parts else value.strip()
if token.startswith("<") or token.startswith(">"):
token = token[1:]
return token
def _parse_float(value: str) -> Optional[float]:
try:
return float(_strip_unit_suffix(value))
except (ValueError, TypeError):
return None
def _parse_int(value: str) -> Optional[int]:
try:
return int(float(_strip_unit_suffix(value)))
except (ValueError, TypeError):
return None
def parse_idf_report(text: Union[str, bytes]) -> Dict[str, Any]:
"""
    Parse a Thor IDFW.txt / IDFH.txt sidecar.

    Returns a flat dict with two kinds of entries:

    - **Raw fields**: every `Key : Value` line, keyed by snake_case
      of the original key, value as a string (unit suffix preserved).
      Lets callers grab any field we haven't explicitly normalised.

    - **Derived fields**: a curated set with parsed types:
        * `serial_number`: str
        * `event_type`: str ("Full Waveform" / "Full Histogram")
        * `event_datetime`: ISO-8601 string ("YYYY-MM-DDTHH:MM:SS") when
          both EventDate and EventTime are present
        * `sample_rate`: int (samples/sec)
        * `tran_ppv`, `vert_ppv`, `long_ppv`: float (in/s)
        * `mic_ppv`: float (dB or psi; same units as MicPSPL)
        * `peak_vector_sum`: float (in/s)
        * `tran_zc_freq`, `vert_zc_freq`, `long_zc_freq`: float (Hz)
        * `record_time_sec`: float (seconds)
        * `pre_trigger_sec`: float (seconds)
        * `project`: str (from TitleString1, Thor's location)
        * `client`: str (TitleString2)
        * `operator`: str (TitleString3, company/operator)
        * `notes`: str (TitleString4)
        * `setup`: str
        * `version`: str (firmware)
        * `battery_volts`: float
        * `calibration_text`: str (e.g. "November 22, 2023 by Instantel")
        * `tran_test_passed`, `vert_test_passed`, `long_test_passed`,
          `mic_test_passed`: bool ("Passed" → True; anything else → False)
        * `filename`: str (FileName line; useful sanity check)

    Stops parsing at the literal "Waveform Data Channels" line; the
    raw-samples block is left to whoever wants to decode the binary.

    Input may be `str` or `bytes` (`utf-8`/`latin-1` tolerant).
"""
if isinstance(text, bytes):
try:
text = text.decode("utf-8")
except UnicodeDecodeError:
text = text.decode("latin-1", errors="replace")
raw: Dict[str, str] = {}
for line in text.splitlines():
stripped = line.strip()
if not stripped:
continue
if stripped.lower().startswith(_WAVEFORM_BLOCK_MARKER):
break
m = _LINE_RE.match(stripped)
if not m:
continue
key = _normalize_key(m.group(1))
value = m.group(2).strip()
# Multi-value lines (Channel, Units, etc.) — coalesce by appending.
if key in raw:
raw[key] = raw[key] + "; " + value
else:
raw[key] = value
out: Dict[str, Any] = dict(raw) # keep all raw fields
# ── Derived fields ───────────────────────────────────────────────────────
def _take(*candidates: str) -> Optional[str]:
for c in candidates:
if c in raw:
return raw[c]
return None
# Event identity
if "serial_number" in raw:
out["serial_number"] = raw["serial_number"]
if "event_type" in raw:
out["event_type"] = raw["event_type"]
if "file_name" in raw:
out["filename"] = raw["file_name"]
# Combined date+time. Waveform sidecars use "EventDate" / "EventTime";
# histogram sidecars use "HistogramStartDate" / "HistogramStartTime".
# Prefer the event_* names when both are present.
ed = raw.get("event_date") or raw.get("histogram_start_date")
et = raw.get("event_time") or raw.get("histogram_start_time")
if ed and et:
try:
dt = datetime.datetime.strptime(f"{ed} {et}", "%Y-%m-%d %H:%M:%S")
out["event_datetime"] = dt.isoformat()
except ValueError:
pass
# Numeric scalars. For every field we typify here, we MUST drop the
# raw string copy from `out` when parsing fails — Thor writes things
# like "<0.005 in/s" (below threshold) and "N/A" (not measured) that
# would otherwise linger in `out` as strings, sneak into SQLite REAL
# columns via permissive type affinity, and then crash the JS
# frontend on `.toFixed(...)`.
int_fields = ("sample_rate",)
for key in int_fields:
v = raw.get(key)
if v is None:
continue
iv = _parse_int(v)
if iv is not None:
out[key] = iv
else:
out.pop(key, None)
float_fields = (
"tran_ppv", "vert_ppv", "long_ppv", "peak_vector_sum",
"tran_zc_freq", "vert_zc_freq", "long_zc_freq",
"tran_peak_acceleration", "vert_peak_acceleration",
"long_peak_acceleration",
"tran_peak_displacement", "vert_peak_displacement",
"long_peak_displacement",
"mic_zc_freq",
)
for key in float_fields:
v = raw.get(key)
if v is None:
continue
fv = _parse_float(v)
if fv is not None:
out[key] = fv
else:
out.pop(key, None)
# Time-of-peak: Thor labels these "TimeofPeak" (lowercase "of") so the
# normalizer produces "*_timeof_peak". Map them to the canonical
# ``*_time_of_peak`` output keys for downstream consumers.
for raw_key, out_key in (
("tran_timeof_peak", "tran_time_of_peak"),
("vert_timeof_peak", "vert_time_of_peak"),
("long_timeof_peak", "long_time_of_peak"),
("mic_timeof_peak", "mic_time_of_peak"),
):
v = raw.get(raw_key)
if v is None:
continue
fv = _parse_float(v)
if fv is not None:
out[out_key] = fv
# Microphone — Thor reports MicPSPL (dB(L)) which is the closest
# analogue to BW's mic_ppv. The raw "99.4 dB(L)" string stays in
# `out` under the original `mic_pspl` key for display; the parsed
# float goes in `mic_ppv`.
mic = raw.get("mic_pspl")
if mic is not None:
fv = _parse_float(mic)
if fv is not None:
out["mic_ppv"] = fv
# Record / pre-trigger duration — same drop-on-failure discipline.
rt = raw.get("record_time")
if rt is not None:
fv = _parse_float(rt)
if fv is not None:
out["record_time_sec"] = fv
pt = raw.get("pre_trigger_length")
if pt is not None:
fv = _parse_float(pt)
if fv is not None:
out["pre_trigger_sec"] = fv
# Project / client / operator / location strings. Thor's title
# strings are operator-defined; conventional mapping (per Thor's
# default TitleNote labels in the example data):
# TitleString1 = Location → project (sensor location identifier)
# TitleString2 = Client → client
# TitleString3 = Company → operator (the monitoring company)
# TitleString4 = Notes → notes
out["project"] = _take("title_string1")
out["client"] = _take("title_string2")
out["operator"] = _take("title_string3", "operator")
out["notes"] = _take("title_string4", "post_event_note")
if "setup" in raw:
out["setup"] = raw["setup"]
if "version" in raw:
out["version"] = raw["version"]
# Battery (e.g. "3.8 volts" → 3.8)
bl = raw.get("battery_level")
if bl is not None:
fv = _parse_float(bl)
if fv is not None:
out["battery_volts"] = fv
# Calibration line is free-form (e.g. "November 22, 2023 by Instantel").
if "calibration" in raw:
out["calibration_text"] = raw["calibration"]
# Sensor self-check results — bool flags
for key, out_key in (
("tran_test_results", "tran_test_passed"),
("vert_test_results", "vert_test_passed"),
("long_test_results", "long_test_passed"),
("mic_test_results", "mic_test_passed"),
):
v = raw.get(key)
if v is not None:
out[out_key] = v.strip().lower() == "passed"
return out
def serial_from_filename(name: str) -> Optional[str]:
"""Convenience: pull the serial prefix from a Thor event filename.
Thor uses the literal serial as the filename prefix:
UM11719_20231219163444.IDFW → "UM11719"
BE9439_20200713124251.IDFH → "BE9439"
"""
m = re.match(r"^([A-Z]{2}\d+)_\d{14}\.(IDFH|IDFW)(?:\.txt)?$",
name, re.IGNORECASE)
return m.group(1).upper() if m else None
def parse_event_filename(name: str) -> Optional[Tuple[str, datetime.datetime, str]]:
"""Parse `<SERIAL>_<YYYYMMDDHHMMSS>.<KIND>` → (serial, datetime, kind).
`kind` is "IDFH" or "IDFW" (upper-case). Returns None on no match.
"""
m = re.match(r"^([A-Z]{2}\d+)_(\d{14})\.(IDFH|IDFW)$",
name, re.IGNORECASE)
if not m:
return None
try:
ts = datetime.datetime.strptime(m.group(2), "%Y%m%d%H%M%S")
except ValueError:
return None
return m.group(1).upper(), ts, m.group(3).upper()
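# A short sketch of the filename convention parsed above. `split_event_name`
# is a hypothetical name; the regex and timestamp format are the ones shown.

```python
import datetime
import re
from typing import Optional, Tuple

_NAME_RE = re.compile(r"^([A-Z]{2}\d+)_(\d{14})\.(IDFH|IDFW)$", re.IGNORECASE)


def split_event_name(name: str) -> Optional[Tuple[str, datetime.datetime, str]]:
    m = _NAME_RE.match(name)
    if not m:
        return None
    try:
        # 14 digits pass the regex but may still be an impossible date.
        ts = datetime.datetime.strptime(m.group(2), "%Y%m%d%H%M%S")
    except ValueError:
        return None
    return m.group(1).upper(), ts, m.group(3).upper()


print(split_event_name("UM11719_20231219163444.IDFW"))
print(split_event_name("UM11719_20231319163444.IDFW"))  # month 13
```

# The `.txt` sidecar name is rejected here on purpose; only
# `serial_from_filename` accepts the optional `.txt` suffix.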
+530
@@ -0,0 +1,530 @@
"""
micromate/idf_file.py: Thor IDF binary codec.
Decodes the Instantel Micromate Series IV ``.IDFW`` (waveform) and
``.IDFH`` (histogram) binary on-disk format. Sister module to
``minimateplus/event_file_io.py``.
Status (2026-05-28):
- **Genuine Series IV / Thor binaries** are all signed
``00 12 01 00 00 00 Instantel\\0`` (sig-A in earlier notes). Two
Series III (Blastware) binaries appear in the example corpus
(``BE9439_*``); they share the ``.IDFW``/``.IDFH`` extension by
filing convention but carry a BW STRT header (``10 00 01 80 00 00
Instantel STRT...``) and are NOT Thor data. The reader detects
them by signature and raises NotImplementedError pointing callers
at ``minimateplus.event_file_io.read_blastware_file()``.
- **IDFW waveform body** reuses the BW segment-rotated block codec
verbatim. Body starts at file offset ``0x0f1f`` in roughly half of
production events; the rest vary, so the reader locates it with
``_find_waveform_body_offset``. Samples
decoded via ``minimateplus.waveform_codec.decode_waveform_v2``
with 87-99% byte-exact match against ``.IDFW.txt`` sidecar (quiet
events). Loud events hit the BW codec's known walker-stops-early
limit. Residual ~3% drift on per-sample deltas is likely a
Thor-specific 12-bit delta refinement that BW's codec doesn't
model. Geo LSB = 0.0003 in/s; mic factor ~2.14e-6 psi/count.
- **IDFH histogram body**: 10-byte segment header
``[len_be 2B] 0a 00 00 00 [00 NN_counter] 05 3f`` introduces a
segment of ``N`` 72-byte interval records (``N = (len - 10) // 72``).
Each record holds 4 × 16-byte per-channel min/max/halfp + 8-byte
tail. Geo peaks via ``max(|min|, |max|) / 32768 × 10`` in/s
(matches sidecar within ~1.8%), freq via ``512 / halfp`` Hz.
**All 859 Thor IDFH files in the corpus decode (181,071 intervals).**
- Binary metadata directly extracted: serial, timestamp, sample_rate,
record_time, calibration_date. Other fields fall back to the paired
``.IDFW.txt`` / ``.IDFH.txt`` sidecar (consumed by
``WaveformStore.save_imported_idf``).
The full reverse-engineering writeup lives in
``docs/idf_protocol_reference.md``.
"""
from __future__ import annotations
import datetime
import struct
from dataclasses import dataclass
from pathlib import Path
from typing import Optional, Union
from minimateplus.waveform_codec import decode_waveform_v2
from .models import IdfEvent, IdfPeaks, IdfReport
# Genuine Series IV / Thor IDF binary signature: 6 bytes, then ASCII "Instantel".
_THOR_PREFIX = b"\x00\x12\x01\x00\x00\x00"
# Stray Series III (Blastware) binaries that occasionally turn up in Thor
# corpus directories renamed to the .IDFW/.IDFH convention. Their header
# (`10 00 01 80 00 00 Instantel STRT ...`) is byte-for-byte a BW SUB 5A
# STRT record, not a Thor binary. Detected so we can refuse-and-route
# rather than mis-parse.
_BW_STRAY_PREFIX = b"\x10\x00\x01\x80\x00\x00"
_INSTANTEL_TAG = b"Instantel"
# Most common body offset for sig-A IDFW files (~50% of prod events;
# 151/154 in the original tests/fixtures/THORDATA_example corpus). The
# body is the segment-rotated block stream consumed by decode_waveform_v2;
# bytes [0:3] are the magic ``00 02 00`` preamble. Production events
# routinely use other offsets — see :func:`_find_waveform_body_offset`
# for the dynamic scan. This constant survives only as the priority hint.
_BODY_START_SIG_A = 0x0F1F
# Magic bytes that mark a candidate waveform-body preamble.
_BODY_MAGIC = b"\x00\x02\x00"
# Where to start looking for body candidates inside the file. Skip the
# fixed-header region where the same magic legitimately appears inside
# channel-test records and the compliance block (offsets 0x015d, 0x091c,
# 0x0ae2, 0x0d30 in observed events).
_BODY_SCAN_FLOOR = 0x0E00
# Geophone count → in/s, derived from sidecar ground truth: the smallest
# non-zero sample in 1,014-file corpus is 0.0003 in/s.
_GEO_LSB_IPS = 0.0003
# Microphone count → psi, derived from sidecar regression on 50 sample
# pairs from UM11719_20231219162723.IDFW (mic-heavy event).
_MIC_LSB_PSI = 2.14e-6
# IDFH histogram constants.
_IDFH_INTERVAL_SIZE = 72 # bytes per per-interval record
_IDFH_SEGMENT_HEADER = 10 # bytes: [len_be 2B][0a 00 00 00 4B][00 NN 2B][05 3f 2B]
_IDFH_SEGMENT_TAIL = 2 # bytes after the interval data block, before next marker
_IDFH_HALFP_FREQ_NUM = 512.0 # freq_hz = NUM / halfp; halfp ≤ 5 means ">100 Hz" sentinel
_IDFH_GEO_FULL_SCALE = 10.0 # in/s — Normal range
_IDFH_INT16_FS = 32768.0
_IDFH_CHANNELS = ("Tran", "Vert", "Long", "MicL")
# ─── Binary metadata extraction ─────────────────────────────────────────────
@dataclass
class IdfBinaryMetadata:
"""Fields recoverable from the sig-A binary header (no .txt needed)."""
serial: Optional[str] = None
event_datetime: Optional[datetime.datetime] = None
sample_rate: Optional[int] = None
record_time_sec: Optional[float] = None
calibration_date: Optional[datetime.date] = None
def _read_ascii_z(buf: bytes, off: int, maxlen: int = 64) -> Optional[str]:
if off >= len(buf):
return None
end = buf.find(b"\x00", off, off + maxlen)
if end < 0:
end = min(off + maxlen, len(buf))
s = buf[off:end].decode("ascii", errors="replace").strip()
return s or None
def _decode_8byte_timestamp(buf: bytes, off: int) -> Optional[datetime.datetime]:
"""Layout: ``[day][month][year_hi][year_lo][unknown][hour][min][sec]``."""
if off + 8 > len(buf):
return None
day, mon, yh, yl, _unk, hr, mn, sc = buf[off : off + 8]
year = (yh << 8) | yl
if not (2015 <= year <= 2050 and 1 <= mon <= 12 and 1 <= day <= 31
and 0 <= hr < 24 and 0 <= mn < 60 and 0 <= sc < 60):
return None
try:
return datetime.datetime(year, mon, day, hr, mn, sc)
except ValueError:
return None
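# To make the 8-byte layout concrete, the sketch below packs a known timestamp
# by hand and decodes it again. `decode_timestamp` is an illustrative copy of
# the decoder above; the fifth byte is left at zero because its meaning is
# unknown.

```python
import datetime
from typing import Optional


def decode_timestamp(buf: bytes, off: int = 0) -> Optional[datetime.datetime]:
    if off + 8 > len(buf):
        return None
    day, mon, yh, yl, _unk, hr, mn, sc = buf[off:off + 8]
    year = (yh << 8) | yl  # big-endian 16-bit year
    if not (2015 <= year <= 2050):
        return None
    try:
        return datetime.datetime(year, mon, day, hr, mn, sc)
    except ValueError:  # e.g. day 31 in February, or hour 24
        return None


# 2023 == 0x07E7, so the year bytes are 07 E7.
raw = bytes([19, 12, 0x07, 0xE7, 0x00, 16, 27, 23])
print(decode_timestamp(raw))
```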
def extract_binary_metadata(buf: bytes) -> IdfBinaryMetadata:
"""Pull serial/timestamp/sample_rate/record_time/calibration from the
sig-A binary header.
Field positions confirmed against UM11719_20231219162723.IDFW; stable
across the 151-file sig-A corpus.
"""
md = IdfBinaryMetadata()
# Serial: null-terminated ASCII at 0x14E.
md.serial = _read_ascii_z(buf, 0x14E, maxlen=16)
# Sample rate + record time live in a BW-compatible compliance block.
# Locate the 6-byte anchor `be 80 00 00 00 00` and read offsets relative
# to it: anchor-6 = sample_rate uint16 BE; anchor+6 = record_time float32 BE.
anchor = buf.find(b"\xbe\x80\x00\x00\x00\x00", 0x800, 0xA00)
if anchor > 0:
sr_bytes = buf[anchor - 6 : anchor - 4]
if len(sr_bytes) == 2:
sr = int.from_bytes(sr_bytes, "big")
if sr in (256, 512, 1024, 2048, 4096):
md.sample_rate = sr
rt_bytes = buf[anchor + 6 : anchor + 10]
if len(rt_bytes) == 4:
try:
rt = struct.unpack(">f", rt_bytes)[0]
if 0.1 <= rt <= 600.0:
md.record_time_sec = float(rt)
except struct.error:
pass
# Event timestamp: 8 bytes. Position differs between IDFW (0x97A) and
# IDFH (0x9F8); try both known offsets and accept the first valid decode.
for off in (0x97A, 0x9F8):
ts = _decode_8byte_timestamp(buf, off)
if ts is not None:
md.event_datetime = ts
break
# Calibration date: day, month, year_be at 0x194-0x197.
if len(buf) > 0x197:
day, mon = buf[0x194], buf[0x195]
year = int.from_bytes(buf[0x196 : 0x198], "big")
if 1 <= mon <= 12 and 1 <= day <= 31 and 2015 <= year <= 2050:
try:
md.calibration_date = datetime.date(year, mon, day)
except ValueError:
pass
return md
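# The anchor-relative reads above can be checked on a synthetic buffer. This
# is a sketch only: the buffer is fabricated for the demonstration, the anchor
# position 0x900 is arbitrary, and `read_compliance_fields` is a hypothetical
# name.

```python
import struct
from typing import Optional, Tuple

ANCHOR = b"\xbe\x80\x00\x00\x00\x00"


def read_compliance_fields(buf: bytes) -> Tuple[Optional[int], Optional[float]]:
    anchor = buf.find(ANCHOR, 0x800, 0xA00)
    if anchor < 6:  # not found (-1) or no room for the sample-rate field
        return None, None
    # anchor-6: sample rate, uint16 big-endian.
    sample_rate = int.from_bytes(buf[anchor - 6:anchor - 4], "big")
    # anchor+6: record time, float32 big-endian.
    rt_bytes = buf[anchor + 6:anchor + 10]
    record_time = struct.unpack(">f", rt_bytes)[0] if len(rt_bytes) == 4 else None
    return sample_rate, record_time


demo = bytearray(0xA00)
pos = 0x900
struct.pack_into(">H", demo, pos - 6, 1024)
demo[pos:pos + 6] = ANCHOR
struct.pack_into(">f", demo, pos + 6, 3.0)
print(read_compliance_fields(bytes(demo)))
```

# The real extractor additionally whitelists the sample rate and range-checks
# the record time before accepting either value.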
# ─── Sample decoder + unit conversion ───────────────────────────────────────
def _find_waveform_body_offset(buf: bytes) -> Optional[int]:
"""Pick the file offset of the waveform body by trial-decoding every
``00 02 00`` magic position past the fixed-header region.
The body's location isn't fixed across all sig-A IDFW files: about
half the production events use ``0x0f1f``, but the rest have offsets
that shift based on header padding / channel-config layout. We
auto-detect by:
1. Find every ``00 02 00`` occurrence past ``_BODY_SCAN_FLOOR``.
2. Try ``decode_waveform_v2()`` on each candidate.
3. Pick the offset whose decoded sample count is largest.
Returns the offset, or ``None`` if no candidate yielded more than
the trivial 2-sample preamble (= "no real body found").
Costs ~2-8 trial decodes per file; in practice the first candidate
past 0x0e00 is usually the right one.
"""
if len(buf) < _BODY_SCAN_FLOOR + 8:
return None
best: Optional[tuple[int, int]] = None # (total_samples, offset)
i = _BODY_SCAN_FLOOR
while True:
j = buf.find(_BODY_MAGIC, i)
if j < 0:
break
i = j + 1
try:
decoded = decode_waveform_v2(buf[j:])
except Exception:
continue
if not decoded:
continue
total = sum(len(v) for v in decoded.values())
# A "real" body has more than just the 2-sample preamble.
if total <= 2:
continue
if best is None or total > best[0]:
best = (total, j)
return best[1] if best else None
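# The trial-decode scan above can be sketched with a toy decoder standing in
# for `decode_waveform_v2`. Everything about `toy_decode` (a count byte after
# the magic, then that many signed bytes) is invented for illustration and is
# not the real body format.

```python
from typing import Callable, Dict, List, Optional

MAGIC = b"\x00\x02\x00"


def toy_decode(body: bytes) -> Dict[str, List[int]]:
    # Hypothetical format: MAGIC, one count byte N, then N signed bytes.
    if len(body) < 4:
        raise ValueError("too short")
    n = body[3]
    payload = body[4:4 + n]
    if len(payload) < n:
        raise ValueError("truncated")
    return {"Tran": [b - 256 if b > 127 else b for b in payload]}


def find_body_offset(buf: bytes, floor: int,
                     decode: Callable[[bytes], Dict[str, List[int]]]) -> Optional[int]:
    best = None  # (total_samples, offset)
    i = floor
    while True:
        j = buf.find(MAGIC, i)
        if j < 0:
            break
        i = j + 1
        try:
            decoded = decode(buf[j:])
        except Exception:
            continue
        total = sum(len(v) for v in decoded.values())
        if total <= 2:  # trivial decode: treat as a decoy
            continue
        if best is None or total > best[0]:
            best = (total, j)
    return best[1] if best else None


header = b"\xff" * 4 + MAGIC + b"\x09" + b"\xff" * 8   # decoy below the floor
decoy = MAGIC + b"\x01\x05"                            # decodes to 1 sample
real = MAGIC + b"\x04" + b"\x01\x02\x03\xfe"           # decodes to 4 samples
buf = header + decoy + real
print(find_body_offset(buf, floor=len(header), decode=toy_decode))
```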
def _decode_waveform_samples(buf: bytes) -> Optional[dict]:
"""Decode samples from the sig-A waveform body.
Returns the raw decoder counts dict: geo LSB = 0.0003 in/s, mic in
its own count unit (see :func:`mic_count_to_psi`). Returns None if
no usable body is found.
Uses :func:`_find_waveform_body_offset` to locate the body. The
file offset varies across events (~50% sit at the canonical
``0x0f1f`` but the rest don't), so the previous hardcoded constant
silently produced 2-sample preamble-only output for half the corpus.
"""
off = _find_waveform_body_offset(buf)
if off is None:
return None
return decode_waveform_v2(buf[off:])
def geo_count_to_ips(count: int) -> float:
"""Convert a Thor geo decoder count to in/s. LSB = 0.0003 in/s."""
return count * _GEO_LSB_IPS
def mic_count_to_psi(count: int) -> float:
"""Convert a Thor mic decoder count to psi. Scale derived from
regression over 50 sample pairs in UM11719_20231219162723.IDFW;
consistent to ~5%. Calibration constants from the channel block
can refine this once decoded.
"""
return count * _MIC_LSB_PSI
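# A small worked example of the two count conversions, applied to the peak of
# a sample array the way the waveform path does. The sample values are made
# up; only the two scale factors come from the module.

```python
from typing import Iterable

GEO_LSB_IPS = 0.0003   # in/s per geo count
MIC_LSB_PSI = 2.14e-6  # psi per mic count (approximate, per the docstring)


def peak_abs(samples: Iterable[int]) -> int:
    # Largest magnitude in the array; 0 for an empty channel.
    return max((abs(v) for v in samples), default=0)


def geo_count_to_ips(count: int) -> float:
    return count * GEO_LSB_IPS


def mic_count_to_psi(count: int) -> float:
    return count * MIC_LSB_PSI


tran = [3, -12, 500, -420]   # illustrative decoder counts
mic = [2, -13, 7]
print("Tran peak in/s:", geo_count_to_ips(peak_abs(tran)))
print("Mic peak psi:  ", mic_count_to_psi(peak_abs(mic)))
```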
# ─── IDFH histogram decoder ─────────────────────────────────────────────────
@dataclass
class IdfhInterval:
"""One decoded histogram interval (typically one minute of monitoring)."""
offset: int # file byte offset of the 72-byte record
# Per-channel min/max ADC counts (int16 BE), half-period samples, peak count.
# Peak = max(|min|, |max|). freq_hz = 512/halfp (None if halfp ≤ 5 →
# ">100 Hz" sentinel; matches sidecar convention).
tran_min: int
tran_max: int
tran_halfp: int
vert_min: int
vert_max: int
vert_halfp: int
long_min: int
long_max: int
long_halfp: int
micl_min: int
micl_max: int
micl_halfp: int
def peak_count(self, channel: str) -> int:
mn = getattr(self, f"{channel.lower()}_min")
mx = getattr(self, f"{channel.lower()}_max")
return max(abs(mn), abs(mx))
def peak_ips(self, channel: str) -> float:
"""Convert peak count to in/s (geo channels only)."""
return self.peak_count(channel) / _IDFH_INT16_FS * _IDFH_GEO_FULL_SCALE
def freq_hz(self, channel: str) -> Optional[float]:
halfp = getattr(self, f"{channel.lower()}_halfp")
if halfp <= 5:
return None
return _IDFH_HALFP_FREQ_NUM / halfp
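# The two per-interval formulas used by the class above, written as free
# functions so the arithmetic is easy to check by hand. The function names
# are illustrative; the constants are the module's.

```python
from typing import Optional

INT16_FS = 32768.0         # int16 full scale
GEO_FULL_SCALE_IPS = 10.0  # in/s at full scale (Normal range)
HALFP_FREQ_NUM = 512.0     # freq_hz = 512 / halfp


def peak_ips(mn: int, mx: int) -> float:
    # Peak magnitude is the larger of |min| and |max|, scaled to in/s.
    return max(abs(mn), abs(mx)) / INT16_FS * GEO_FULL_SCALE_IPS


def freq_hz(halfp: int) -> Optional[float]:
    # halfp <= 5 is the ">100 Hz" sentinel, reported as None.
    if halfp <= 5:
        return None
    return HALFP_FREQ_NUM / halfp


print(peak_ips(-16384, 100))   # half of full scale
print(freq_hz(16), freq_hz(5))
```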
def _decode_idfh_interval(buf72: bytes, offset: int) -> IdfhInterval:
"""Decode one 72-byte interval record into per-channel min/max/halfp."""
import struct
fields = []
for i in range(4):
block = buf72[i * 16 : (i + 1) * 16]
mn = struct.unpack_from(">h", block, 0)[0]
mx = struct.unpack_from(">h", block, 2)[0]
# block[4:6] = int16 BE, role unknown (possibly time-of-peak)
halfp = struct.unpack_from(">H", block, 6)[0]
# block[10:12] and block[14:16] are uint16 BE with unknown semantics
# (likely sum / count contributions for the PVS computation).
fields.extend([mn, mx, halfp])
# Tail 8 bytes (buf72[64:72]) carry PVS-related data; not yet decoded.
return IdfhInterval(
offset=offset,
tran_min=fields[0], tran_max=fields[1], tran_halfp=fields[2],
vert_min=fields[3], vert_max=fields[4], vert_halfp=fields[5],
long_min=fields[6], long_max=fields[7], long_halfp=fields[8],
micl_min=fields[9], micl_max=fields[10], micl_halfp=fields[11],
)
def decode_idfh_body(buf: bytes) -> list:
"""Walk an IDFH file and decode every interval record.
The body has one or more segments; each segment header is 10 bytes:
``[length_be 2B][0a 00 00 00][00 NN_counter][05 3f]`` where ``length``
is bytes from the magic through the end of the interval block
(= 10 + 72 × n_intervals). Segments are separated by a 2-byte tail
+ next-segment 2-byte prefix (the bytes before the next length field).
Confirmed against the 859-file corpus (181,071 intervals decoded; 1
failure is the sig-B BE9439 file).
"""
intervals: list = []
i = 0
while True:
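j = buf.find(b"\x0a\x00\x00\x00", i)
if j < 0:
break
# Need the 2-byte length before the magic and the 4 header bytes after
# it; skip candidates too close to either end of the buffer so the
# indexing below cannot raise IndexError.
if j < 2 or j + 8 > len(buf):
i = j + 1
continue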
# Validate: [length_be][0a 00 00 00][00 NN][05 3f]
if buf[j + 4] != 0x00 or buf[j + 6 : j + 8] != b"\x05\x3f":
i = j + 1
continue
length = int.from_bytes(buf[j - 2 : j], "big")
n = (length - _IDFH_SEGMENT_HEADER) // _IDFH_INTERVAL_SIZE
if n <= 0:
i = j + 1
continue
header_start = j - 2
interval_start = header_start + _IDFH_SEGMENT_HEADER
for k in range(n):
off = interval_start + k * _IDFH_INTERVAL_SIZE
if off + _IDFH_INTERVAL_SIZE > len(buf):
break
chunk = buf[off : off + _IDFH_INTERVAL_SIZE]
intervals.append(_decode_idfh_interval(chunk, off))
# Advance past this segment + the 2-byte tail.
i = header_start + length + _IDFH_SEGMENT_TAIL
return intervals
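# A round-trip sketch of the segment walk above: build two synthetic segments
# with the documented header layout, then walk them. The data is fabricated,
# only the Tran block of each record is filled in, and `make_interval`,
# `make_segment` and `walk_segments` are hypothetical helper names.

```python
import struct
from typing import List, Tuple

INTERVAL_SIZE = 72
SEGMENT_HEADER = 10   # [len 2B][0a 00 00 00][00 NN][05 3f]
SEGMENT_TAIL = 2
MAGIC = b"\x0a\x00\x00\x00"


def make_interval(mn: int, mx: int, halfp: int) -> bytes:
    rec = bytearray(INTERVAL_SIZE)
    struct.pack_into(">hh", rec, 0, mn, mx)   # Tran min/max, int16 BE
    struct.pack_into(">H", rec, 6, halfp)     # Tran half-period, uint16 BE
    return bytes(rec)


def make_segment(records: List[bytes], counter: int) -> bytes:
    length = SEGMENT_HEADER + INTERVAL_SIZE * len(records)
    header = struct.pack(">H", length) + MAGIC + bytes([0, counter]) + b"\x05\x3f"
    return header + b"".join(records) + bytes(SEGMENT_TAIL)


def walk_segments(buf: bytes) -> List[Tuple[int, int, int]]:
    out = []
    i = 0
    while True:
        j = buf.find(MAGIC, i)
        if j < 0:
            break
        if (j < 2 or j + 8 > len(buf)
                or buf[j + 4] != 0x00 or buf[j + 6:j + 8] != b"\x05\x3f"):
            i = j + 1
            continue
        length = int.from_bytes(buf[j - 2:j], "big")
        n = (length - SEGMENT_HEADER) // INTERVAL_SIZE
        if n <= 0:
            i = j + 1
            continue
        start = j - 2 + SEGMENT_HEADER
        for k in range(n):
            off = start + k * INTERVAL_SIZE
            if off + INTERVAL_SIZE > len(buf):
                break
            mn, mx = struct.unpack_from(">hh", buf, off)
            (halfp,) = struct.unpack_from(">H", buf, off + 6)
            out.append((mn, mx, halfp))
        i = j - 2 + length + SEGMENT_TAIL
    return out


seg1 = make_segment([make_interval(-100, 250, 16), make_interval(30, -40, 8)], 1)
seg2 = make_segment([make_interval(5, -5, 64)], 2)
buf = b"\xff" * 32 + seg1 + seg2
print(walk_segments(buf))
```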
# ─── Top-level reader ───────────────────────────────────────────────────────
@dataclass
class IdfReadResult:
"""Return type for :func:`read_idf_file`.
For waveforms (``.IDFW``), ``samples`` holds the per-channel sample
arrays in Thor decoder counts. For histograms (``.IDFH``),
``samples`` is empty and ``intervals`` holds the per-interval
record list (peaks, freqs).
"""
event: IdfEvent
samples: dict # {"Tran": [...], ...} for IDFW; empty for IDFH
binary_metadata: IdfBinaryMetadata
signature: str # always "thor" for now (sig-A genuine Thor)
intervals: Optional[list] = None # list[IdfhInterval] for IDFH; None for IDFW
def read_idf_file(
path: Union[str, Path],
*,
data: Optional[bytes] = None,
) -> IdfReadResult:
"""Parse a Thor ``.IDFW`` or ``.IDFH`` binary into an ``IdfEvent``.
Handles genuine Thor (sig-A) files of both kinds: waveforms return
decoded samples, histograms return decoded intervals. Files carrying
a Series III (Blastware) STRT header raise NotImplementedError and any
other signature raises ValueError; for those, use the paired
``.IDFW.txt`` / ``.IDFH.txt`` sidecar via ``parse_idf_report()``.
Returns an :class:`IdfReadResult`. The caller converts int sample
counts to physical units via :func:`geo_count_to_ips` /
:func:`mic_count_to_psi`.
``path`` is used for filename in error messages and ``.IDFH`` vs
``.IDFW`` suffix detection. When ``data`` is supplied the disk
read is skipped, which is useful for ingest paths that already have
the bytes in memory and where the file may not exist on disk yet.
"""
p = Path(path)
buf = data if data is not None else p.read_bytes()
if len(buf) < 16 or buf[6:16] != _INSTANTEL_TAG + b"\x00":
raise ValueError(f"{p.name}: not an IDF file (missing Instantel magic)")
sig_prefix = buf[:6]
if sig_prefix == _THOR_PREFIX:
signature = "thor"
elif sig_prefix == _BW_STRAY_PREFIX:
raise NotImplementedError(
f"{p.name}: file has a Series III (Blastware) STRT header in "
"an IDF-named container — not a Thor binary. Route through "
"minimateplus.event_file_io.read_blastware_file() instead "
"(peaks decode; samples & full metadata don't, but it's not "
"Thor data so the Thor codec doesn't apply)."
)
else:
raise ValueError(f"{p.name}: unknown IDF signature {sig_prefix.hex()}")
is_histogram = p.suffix.upper() == ".IDFH"
md = extract_binary_metadata(buf)
if is_histogram:
intervals = decode_idfh_body(buf)
if not intervals:
raise ValueError(f"{p.name}: IDFH body decoded no intervals")
# Peaks: max across all intervals on each channel (per-channel max
# of stored max-magnitudes; sidecar's PPV row carries the same).
peak_tran = max((iv.peak_ips("Tran") for iv in intervals), default=0.0)
peak_vert = max((iv.peak_ips("Vert") for iv in intervals), default=0.0)
peak_long = max((iv.peak_ips("Long") for iv in intervals), default=0.0)
# Mic peak in psi — Thor stores per-interval mic ADC counts in the
# binary; convert the max count to psi via the per-count factor.
mic_peak_count = max((iv.peak_count("MicL") for iv in intervals), default=0)
mic_peak_psi = mic_count_to_psi(mic_peak_count) if mic_peak_count else None
rep = IdfReport(
serial_number=md.serial,
event_type="Full Histogram",
event_datetime=md.event_datetime,
filename=p.name,
sample_rate=md.sample_rate,
record_time_sec=md.record_time_sec,
)
peaks = IdfPeaks(
transverse_ips=peak_tran,
vertical_ips=peak_vert,
longitudinal_ips=peak_long,
peak_vector_sum_ips=None,
mic_pspl_dbl=None, # IDFH binary doesn't carry the dB(L) value
mic_pspl_psi=mic_peak_psi,
)
event = IdfEvent(
serial=md.serial or "UNKNOWN",
timestamp=md.event_datetime or datetime.datetime(1970, 1, 1),
kind="Histogram",
filename=p.name,
sample_rate=md.sample_rate,
record_time_sec=md.record_time_sec,
peaks=peaks,
report=rep,
)
return IdfReadResult(
event=event,
samples={},
binary_metadata=md,
signature=signature,
intervals=intervals,
)
# Waveform path.
decoded = _decode_waveform_samples(buf)
if decoded is None:
raise ValueError(f"{p.name}: waveform body codec failed")
rep = IdfReport(
serial_number=md.serial,
event_type="Full Waveform",
event_datetime=md.event_datetime,
filename=p.name,
sample_rate=md.sample_rate,
record_time_sec=md.record_time_sec,
)
def _peak_ips(ch: str) -> float:
arr = decoded.get(ch, [])
return geo_count_to_ips(max((abs(v) for v in arr), default=0))
# Mic peak psi from binary: max absolute MicL ADC count × 2.14e-6 psi/count.
mic_arr = decoded.get("MicL", [])
mic_peak_count = max((abs(v) for v in mic_arr), default=0)
mic_peak_psi = mic_count_to_psi(mic_peak_count) if mic_peak_count else None
peaks = IdfPeaks(
transverse_ips=_peak_ips("Tran"),
vertical_ips=_peak_ips("Vert"),
longitudinal_ips=_peak_ips("Long"),
# PVS requires aligned per-sample √(T²+V²+L²); leave None — the
# sidecar carries it and the bridge picks it up if present.
peak_vector_sum_ips=None,
mic_pspl_dbl=None, # binary IDFW doesn't carry the dB(L) value;
# sidecar .txt fills it via IdfReport.from_dict
mic_pspl_psi=mic_peak_psi,
)
event = IdfEvent(
serial=md.serial or "UNKNOWN",
timestamp=md.event_datetime or datetime.datetime(1970, 1, 1),
kind="Waveform",
filename=p.name,
sample_rate=md.sample_rate,
record_time_sec=md.record_time_sec,
peaks=peaks,
report=rep,
)
return IdfReadResult(
event=event,
samples=decoded,
binary_metadata=md,
signature=signature,
)
+323
@@ -0,0 +1,323 @@
"""
micromate/idf_to_bw_report.py: adapter that projects a parsed Thor IDF
report (+ binary metadata + decoded IDFH intervals) into the
``bw_report``-shaped dict that :mod:`sfm.report_pdf.gather_report_data`
consumes.
Lets Thor events flow through the existing Series III Event Report PDF
pipeline without duplicating the renderer. Thor's report content is
~95% the same data shape as BW's; the field names differ but the
underlying metrics map 1:1.
Caveats
- **Mic units**: Thor records ``MicPSPL`` natively in dB(L). This
adapter sets ``bw_report.mic.pspl_dbl`` directly; the report
renderer recomputes the equivalent psi via its dBL→psi formula.
- **Saturation / above-range flags**: Thor doesn't always mark
``OORANGE`` the way BW does; we set ``zc_freq_above_range`` only
when a `>100` sentinel was preserved in the raw text.
- **Per-interval data**: for IDFH events we build ``interval_times``
by stepping ``IntervalSize`` from ``HistogramStartTime``; the binary
decoder confirms one record per step (882 / 881 / 881 ... across
the corpus).
- **calibration_by parsing**: Thor's free-form ``Calibration : November
22, 2023 by Instantel`` is split on ``" by "`` to extract the
calibrator; the date prefix is parsed where possible, otherwise
the binary-extracted ``calibration_date`` from
:class:`micromate.idf_file.IdfBinaryMetadata` wins.
"""
from __future__ import annotations
import datetime
import re
from typing import Any, Dict, List, Optional
# ─── Helpers ────────────────────────────────────────────────────────────────
_NUM_RE = re.compile(r"-?\d+(?:\.\d+)?")
def _parse_first_number(s: Optional[str]) -> Optional[float]:
"""Pull the first numeric token from a string like ``"0.1500 in/s"``."""
if s is None:
return None
m = _NUM_RE.search(str(s))
if not m:
return None
try:
return float(m.group(0))
except ValueError:
return None
def _parse_interval_size_s(s: Optional[str]) -> Optional[float]:
"""``"60 sec"`` → 60.0, ``"5 min"`` → 300.0, ``"1 hour"`` → 3600."""
if s is None:
return None
num = _parse_first_number(s)
if num is None:
return None
sl = str(s).lower()
if "hour" in sl or "hr" in sl:
return num * 3600.0
if "min" in sl:
return num * 60.0
return num # default to seconds
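# The unit handling above, as a standalone sketch. `interval_seconds` is an
# illustrative name; the unit keywords and the seconds default follow the
# helper above.

```python
import re
from typing import Optional

_NUM_RE = re.compile(r"-?\d+(?:\.\d+)?")


def interval_seconds(text: Optional[str]) -> Optional[float]:
    m = _NUM_RE.search(text or "")
    if not m:
        return None
    num = float(m.group(0))
    low = text.lower()
    if "hour" in low or "hr" in low:
        return num * 3600.0
    if "min" in low:
        return num * 60.0
    return num  # no recognised unit: assume seconds


for text in ("60 sec", "5 min", "1 hour", "n/a"):
    print(repr(text), "->", interval_seconds(text))
```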
def _parse_calibration(text: Optional[str]) -> tuple[Optional[str], Optional[str]]:
"""Split ``"November 22, 2023 by Instantel"`` → (ISO date, calibrator).
Returns ``(None, None)`` if neither half parses.
"""
if not text:
return None, None
parts = str(text).split(" by ", 1)
date_part = parts[0].strip() if parts else None
by_part = parts[1].strip() if len(parts) > 1 else None
iso_date: Optional[str] = None
if date_part:
for fmt in ("%B %d, %Y", "%b %d, %Y", "%Y-%m-%d", "%m/%d/%Y"):
try:
iso_date = datetime.datetime.strptime(date_part, fmt).date().isoformat()
break
except ValueError:
continue
return iso_date, by_part
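# A compact sketch of the calibration split, trying only the long-month
# format for brevity (the helper above tries four). `split_calibration` is a
# hypothetical name, and month-name parsing assumes an English locale.

```python
import datetime
from typing import Optional, Tuple


def split_calibration(text: str) -> Tuple[Optional[str], Optional[str]]:
    if not text:
        return None, None
    parts = text.split(" by ", 1)
    date_part = parts[0].strip()
    by_part = parts[1].strip() if len(parts) > 1 else None
    try:
        iso = datetime.datetime.strptime(date_part, "%B %d, %Y").date().isoformat()
    except ValueError:
        iso = None  # unparseable date: caller may fall back to the binary date
    return iso, by_part


print(split_calibration("November 22, 2023 by Instantel"))
print(split_calibration("sometime by Acme"))
```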
def _channel_peaks(idf: Dict[str, Any], ch_lc: str) -> Dict[str, Any]:
"""Map ``tran_ppv`` / ``tran_zc_freq`` / ... → bw_report.peaks.tran shape."""
out: Dict[str, Any] = {}
for src, dst in (
(f"{ch_lc}_ppv", "ppv_ips"),
(f"{ch_lc}_zc_freq", "zc_freq_hz"),
(f"{ch_lc}_time_of_peak", "time_of_peak_s"),
(f"{ch_lc}_peak_acceleration", "peak_accel_g"),
(f"{ch_lc}_peak_displacement", "peak_disp_in"),
):
v = idf.get(src)
if v is not None:
out[dst] = v
# ZC freq ">100" sentinel: flag it when the value under the un-typed key
# is still a raw string such as ``">100 Hz"``. As written,
# ``_strip_unit_suffix`` drops the ">" prefix, so a dict that came straight
# from ``parse_idf_report`` carries 100.0 here and will not take this branch.
raw_zc = idf.get(f"{ch_lc}_zc_freq")
if isinstance(raw_zc, str) and ">" in raw_zc:
out["zc_freq_above_range"] = True
out.pop("zc_freq_hz", None)
return out
def _sensor_check(idf: Dict[str, Any], ch_lc: str) -> Dict[str, Any]:
out: Dict[str, Any] = {}
fr = idf.get(f"{ch_lc}_test_freq")
if fr is not None:
out["freq_hz"] = _parse_first_number(fr)
rt = idf.get(f"{ch_lc}_test_ratio")
if rt is not None:
out["ratio"] = _parse_first_number(rt)
am = idf.get(f"{ch_lc}_test_amplitude")
if am is not None:
out["amplitude_mv"] = _parse_first_number(am)
res = idf.get(f"{ch_lc}_test_results")
if res is not None:
out["result"] = str(res).strip()
return {k: v for k, v in out.items() if v is not None}
def _interval_times(idf: Dict[str, Any], n_intervals: Optional[int]) -> List[str]:
"""Synthesise per-interval timestamps from start + interval_size × k.
Returns ``[]`` when start time or interval size is unknown.
"""
if not n_intervals:
return []
start_date = idf.get("histogram_start_date") or idf.get("event_date")
start_time = idf.get("histogram_start_time") or idf.get("event_time")
iv_str = idf.get("interval_size")
iv_s = _parse_interval_size_s(iv_str)
if not (start_date and start_time and iv_s):
return []
try:
t0 = datetime.datetime.strptime(f"{start_date} {start_time}", "%Y-%m-%d %H:%M:%S")
except ValueError:
return []
out = []
for k in range(int(n_intervals)):
t = t0 + datetime.timedelta(seconds=iv_s * (k + 1))
out.append(t.isoformat())
return out
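# The stepping rule above labels each interval with its end time
# (start + step × (k + 1)). A minimal sketch with made-up inputs;
# `interval_end_times` is an illustrative name.

```python
import datetime
from typing import List


def interval_end_times(start: datetime.datetime, step_s: float, n: int) -> List[str]:
    return [
        (start + datetime.timedelta(seconds=step_s * (k + 1))).isoformat()
        for k in range(n)
    ]


t0 = datetime.datetime(2025, 8, 4, 19, 0, 47)
for stamp in interval_end_times(t0, 60.0, 3):
    print(stamp)
```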
# ─── Top-level adapter ──────────────────────────────────────────────────────
def build_bw_report_from_idf(
idf_report: Dict[str, Any],
*,
binary_md=None,
intervals: Optional[list] = None,
is_histogram: Optional[bool] = None,
) -> Dict[str, Any]:
"""Project a parsed IDF report dict (and optional binary metadata +
decoded IDFH intervals) into the BW report sidecar shape.
The returned dict is structurally identical to what
``minimateplus.event_file_io._bw_report_to_dict`` produces from a
real BW ASCII report; it can be assigned to
``sidecar["bw_report"]`` and consumed verbatim by
``sfm.report_pdf.gather_report_data``.
``intervals`` is the list of :class:`micromate.idf_file.IdfhInterval`
objects from :func:`micromate.idf_file.decode_idfh_body`; only used
for histogram events to derive accurate ``interval_times``.
"""
if is_histogram is None:
et = str(idf_report.get("event_type", ""))
is_histogram = et.lower().startswith("full histogram")
# ── Trigger / recording / device ─────────────────────────────────────
trigger_channel = idf_report.get("trigger")
trigger_level = _parse_first_number(idf_report.get("geo_trigger_level"))
geo_range_ips = _parse_first_number(idf_report.get("geo_range"))
cal_iso, cal_by = _parse_calibration(idf_report.get("calibration"))
# Prefer the binary-extracted calibration_date when our text parse fell
# through; the binary date is unambiguous.
if cal_iso is None and binary_md is not None and binary_md.calibration_date:
cal_iso = binary_md.calibration_date.isoformat()
# ── Histogram fields ────────────────────────────────────────────────
hist_block: Dict[str, Any] = {
"start": None, "stop": None, "n_intervals": None,
"interval_size": None, "interval_size_s": None,
"channel_peak_when": {},
}
if is_histogram:
sd = idf_report.get("histogram_start_date")
st = idf_report.get("histogram_start_time")
if sd and st:
try:
hist_block["start"] = datetime.datetime.strptime(
f"{sd} {st}", "%Y-%m-%d %H:%M:%S"
).isoformat()
except ValueError:
pass
ed = idf_report.get("histogram_stop_date")
et_ = idf_report.get("histogram_stop_time")
if ed and et_:
try:
hist_block["stop"] = datetime.datetime.strptime(
f"{ed} {et_}", "%Y-%m-%d %H:%M:%S"
).isoformat()
except ValueError:
pass
n_raw = idf_report.get("number_of_intervals")
if n_raw is not None:
try:
# Thor reports a float like "81.04"; truncate to int (the BW
# report uses an int for the column).
hist_block["n_intervals"] = int(float(str(n_raw)))
except ValueError:
pass
# When the binary decoder gave us the actual interval count, prefer it.
if intervals is not None:
hist_block["n_intervals"] = len(intervals)
hist_block["interval_size"] = idf_report.get("interval_size")
hist_block["interval_size_s"] = _parse_interval_size_s(idf_report.get("interval_size"))
# interval_times derived from start+step (the BW report uses the
# exact strings; we match its representation).
times = _interval_times(idf_report, hist_block["n_intervals"])
# Per-channel peak when (absolute date+time at which the channel's
# peak occurred over the histogram run). Thor splits this into
# ``TranPeakDate`` / ``TranPeakTime`` etc.
peak_when: Dict[str, str] = {}
for ch_label, ch_lc in (("Tran", "tran"), ("Vert", "vert"), ("Long", "long"), ("MicL", "mic")):
d = idf_report.get(f"{ch_lc}_peak_date")
t = idf_report.get(f"{ch_lc}_peak_time")
if d and t:
try:
peak_when[ch_label] = datetime.datetime.strptime(
f"{d} {t}", "%Y-%m-%d %H:%M:%S"
).isoformat()
except ValueError:
continue
if peak_when:
hist_block["channel_peak_when"] = peak_when
# ── Mic block ────────────────────────────────────────────────────────
mic_block = {
"weighting": "L", # Thor mic is ISEE Linear
"pspl_dbl": idf_report.get("mic_ppv"), # the dB(L) float
"pspl_saturated": False,
"zc_freq_hz": idf_report.get("mic_zc_freq"),
"zc_freq_above_range": isinstance(idf_report.get("mic_zc_freq"), str)
and ">" in str(idf_report.get("mic_zc_freq")),
"time_of_peak_s": idf_report.get("mic_time_of_peak"),
}
if mic_block["zc_freq_above_range"]:
mic_block["zc_freq_hz"] = None
# ── Peaks ────────────────────────────────────────────────────────────
vs_block = {
"ips": idf_report.get("peak_vector_sum"),
"time_s": _parse_first_number(idf_report.get("peak_vector_sum_time_sum")),
"when": None,
"saturated": False,
}
if is_histogram:
# PVS absolute date+time, when present.
vs_d = idf_report.get("peak_vector_sum_date")
vs_t = idf_report.get("peak_vector_sum_time")
if vs_d and vs_t:
try:
vs_block["when"] = datetime.datetime.strptime(
f"{vs_d} {vs_t}", "%Y-%m-%d %H:%M:%S"
).isoformat()
except ValueError:
pass
return {
"available": True,
"event_type": idf_report.get("event_type"),
"version": idf_report.get("version"),
"trigger": {
"channel": trigger_channel,
"geo_level_ips": trigger_level,
},
"recording": {
"sample_rate_sps": idf_report.get("sample_rate"),
"record_time_s": idf_report.get("record_time_sec"),
"pretrig_s": idf_report.get("pre_trigger_sec"),
"stop_mode": idf_report.get("record_stop_mode"),
"geo_range_ips": geo_range_ips,
"units": idf_report.get("units"),
},
"device": {
"battery_volts": idf_report.get("battery_volts"),
"calibration_date": cal_iso,
"calibration_by": cal_by,
},
"peaks": {
"tran": _channel_peaks(idf_report, "tran"),
"vert": _channel_peaks(idf_report, "vert"),
"long": _channel_peaks(idf_report, "long"),
"vector_sum": vs_block,
},
"mic": mic_block,
"sensor_check": {
"tran": _sensor_check(idf_report, "tran"),
"vert": _sensor_check(idf_report, "vert"),
"long": _sensor_check(idf_report, "long"),
"mic": _sensor_check(idf_report, "mic"),
},
"histogram": hist_block,
"monitor_log": [],
"pc_sw_version": None,
}
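The histogram branch above repeats one pattern several times: join a `YYYY-MM-DD` date and an `HH:MM:SS` time, parse them, and emit an ISO-8601 string, treating malformed input as "not available". A minimal sketch of that pattern as a single helper follows. The name `combine_iso` is hypothetical and does not appear in the source.

```python
import datetime
from typing import Optional


def combine_iso(date_str: Optional[str], time_str: Optional[str]) -> Optional[str]:
    """Join 'YYYY-MM-DD' and 'HH:MM:SS' into an ISO-8601 string, or None.

    Hypothetical helper: mirrors the try/except blocks used for the
    histogram start/stop, per-channel peak and PVS timestamps.
    """
    if not date_str or not time_str:
        return None
    try:
        return datetime.datetime.strptime(
            f"{date_str} {time_str}", "%Y-%m-%d %H:%M:%S"
        ).isoformat()
    except ValueError:
        # Malformed date/time text is treated as missing, not as an error.
        return None


if __name__ == "__main__":
    print(combine_iso("2025-08-04", "19:00:47"))
    print(combine_iso("2025-13-04", "19:00:47"))
```

With a helper like this, each of the four call sites would reduce to one assignment, at the cost of hiding the per-site `continue` / `pass` distinction.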
+398
@@ -0,0 +1,398 @@
"""
Micromate (Series IV / Thor) native data models.
These are the right-shaped dataclasses for Thor data. Thor measures
the microphone in dB(L) directly, so this model carries
``mic_pspl_dbl`` rather than the pseudo-``psi`` shoehorn that
``minimateplus.PeakValues`` uses for Series III BW data.
The ingest pipeline today goes:
.IDFW.txt → parse_idf_report() → dict
dict → IdfEvent.from_report() → IdfEvent (typed)
IdfEvent → IdfEvent.to_minimateplus_event() → shape DB / sidecar
machinery expects
The ``to_minimateplus_event()`` bridge is a temporary boundary: when we
crack the binary IDF codec and have richer per-event data to store, the
DB schema will grow Series-IV-specific columns and the bridge will
shrink or disappear.
"""
from __future__ import annotations
import datetime
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Tuple
# ── IdfReport ─────────────────────────────────────────────────────────────────
@dataclass
class IdfReport:
"""Typed wrapper around the dict returned by ``parse_idf_report``.
All fields are optional: Thor's exporter is permissive and some IDF .txt
files (especially histograms) omit fields that waveform sidecars
include. Use ``.raw`` for any field this dataclass hasn't surfaced
yet (the parser keeps every recognised key in the raw dict).
"""
# Identity / kind
serial_number: Optional[str] = None
event_type: Optional[str] = None # "Full Waveform" | "Full Histogram"
event_datetime: Optional[datetime.datetime] = None
filename: Optional[str] = None # echoed by Thor's exporter
# Sampling / timing
sample_rate: Optional[int] = None # samples/sec
record_time_sec: Optional[float] = None
pre_trigger_sec: Optional[float] = None
# Geophone peaks (in/s)
tran_ppv: Optional[float] = None
vert_ppv: Optional[float] = None
long_ppv: Optional[float] = None
peak_vector_sum: Optional[float] = None
# Microphone — Thor's native unit is dB(L), NOT psi.
mic_pspl_dbl: Optional[float] = None
# Zero-crossing frequencies (Hz)
tran_zc_freq: Optional[float] = None
vert_zc_freq: Optional[float] = None
long_zc_freq: Optional[float] = None
mic_zc_freq: Optional[float] = None
# Per-channel time of peak (sec, since event start)
tran_time_of_peak: Optional[float] = None
vert_time_of_peak: Optional[float] = None
long_time_of_peak: Optional[float] = None
mic_time_of_peak: Optional[float] = None
# Derived per-channel motion
tran_peak_acceleration: Optional[float] = None # g
vert_peak_acceleration: Optional[float] = None
long_peak_acceleration: Optional[float] = None
tran_peak_displacement: Optional[float] = None # in
vert_peak_displacement: Optional[float] = None
long_peak_displacement: Optional[float] = None
# Operator-supplied strings (Thor's TitleString1..4 → semantic slots)
project: Optional[str] = None # TitleString1
client: Optional[str] = None # TitleString2
operator: Optional[str] = None # TitleString3
notes: Optional[str] = None # TitleString4 / PostEventNote
setup: Optional[str] = None # setup file name
# Sensor self-check results
tran_test_passed: Optional[bool] = None
vert_test_passed: Optional[bool] = None
long_test_passed: Optional[bool] = None
mic_test_passed: Optional[bool] = None
# Device-fixed metadata
firmware_version: Optional[str] = None
calibration_text: Optional[str] = None
battery_volts: Optional[float] = None
# Original parser dict — preserves every recognised key (including
# raw unit-suffixed strings) for forward-compatible field access.
raw: Dict[str, Any] = field(default_factory=dict, repr=False)
@classmethod
def from_dict(cls, d: Dict[str, Any]) -> "IdfReport":
"""Build an IdfReport from the dict returned by ``parse_idf_report``."""
ed = d.get("event_datetime")
if isinstance(ed, str):
try:
ed = datetime.datetime.fromisoformat(ed)
except ValueError:
ed = None
return cls(
serial_number = d.get("serial_number"),
event_type = d.get("event_type"),
event_datetime = ed if isinstance(ed, datetime.datetime) else None,
filename = d.get("filename"),
sample_rate = d.get("sample_rate"),
record_time_sec = d.get("record_time_sec"),
pre_trigger_sec = d.get("pre_trigger_sec"),
tran_ppv = d.get("tran_ppv"),
vert_ppv = d.get("vert_ppv"),
long_ppv = d.get("long_ppv"),
peak_vector_sum = d.get("peak_vector_sum"),
mic_pspl_dbl = d.get("mic_ppv"), # parser names it mic_ppv (legacy)
tran_zc_freq = d.get("tran_zc_freq"),
vert_zc_freq = d.get("vert_zc_freq"),
long_zc_freq = d.get("long_zc_freq"),
mic_zc_freq = d.get("mic_zc_freq"),
tran_time_of_peak = d.get("tran_time_of_peak"),
vert_time_of_peak = d.get("vert_time_of_peak"),
long_time_of_peak = d.get("long_time_of_peak"),
mic_time_of_peak = d.get("mic_time_of_peak"),
tran_peak_acceleration = d.get("tran_peak_acceleration"),
vert_peak_acceleration = d.get("vert_peak_acceleration"),
long_peak_acceleration = d.get("long_peak_acceleration"),
tran_peak_displacement = d.get("tran_peak_displacement"),
vert_peak_displacement = d.get("vert_peak_displacement"),
long_peak_displacement = d.get("long_peak_displacement"),
project = d.get("project"),
client = d.get("client"),
operator = d.get("operator"),
notes = d.get("notes"),
setup = d.get("setup"),
tran_test_passed = d.get("tran_test_passed"),
vert_test_passed = d.get("vert_test_passed"),
long_test_passed = d.get("long_test_passed"),
mic_test_passed = d.get("mic_test_passed"),
firmware_version = d.get("version"),
calibration_text = d.get("calibration_text"),
battery_volts = d.get("battery_volts"),
raw = d,
)
# ── IdfPeaks / IdfProjectInfo / IdfSensorCheck (narrow grouping types) ───────
@dataclass
class IdfPeaks:
"""Geophone + mic peak values for one Thor event. Native Thor units.
Thor stores the mic peak in two parallel forms. ``mic_pspl_dbl`` is
what the sidecar's top-level ``MicPSPL`` header field carries (dB(L)),
used in the report header. ``mic_pspl_psi`` is the psi value derived
either from the IDFW sample table / IDFH interval column 9, or from
the binary mic counts (~2.14e-6 psi/count). The psi form is needed
because the BW-shaped ``PeakValues.micl`` consumed by
``event_hdf5.write_event_hdf5`` expects psi; feeding it dB(L) makes the
h5 mic-chart scale factor blow up.
"""
transverse_ips: Optional[float] = None # in/s
vertical_ips: Optional[float] = None # in/s
longitudinal_ips: Optional[float] = None # in/s
peak_vector_sum_ips: Optional[float] = None # in/s
mic_pspl_dbl: Optional[float] = None # dB(L)
mic_pspl_psi: Optional[float] = None # psi
@dataclass
class IdfProjectInfo:
"""Operator-supplied strings from Thor's TitleString1..4."""
project: Optional[str] = None
client: Optional[str] = None
operator: Optional[str] = None
notes: Optional[str] = None
setup: Optional[str] = None
@dataclass
class IdfSensorCheck:
"""Per-channel pass/fail from Thor's self-test."""
tran: Optional[bool] = None
vert: Optional[bool] = None
long: Optional[bool] = None
mic: Optional[bool] = None
# ── IdfEvent ─────────────────────────────────────────────────────────────────
@dataclass
class IdfEvent:
"""A single Thor / Micromate Series IV event.
Built from a parsed .IDFW.txt or .IDFH.txt sidecar via
``IdfEvent.from_report()``. The filename determines the kind and
supplies the serial + timestamp when the report omits them; the .txt
provides device-authoritative peak values, frequencies, project
strings, sensor self-check, firmware, calibration.
"""
# Identity
serial: str
timestamp: datetime.datetime
kind: str # "Waveform" | "Histogram"
filename: str # device-native binary filename, e.g. "UM11719_20231219163444.IDFW"
# Sampling / timing
sample_rate: Optional[int] = None
record_time_sec: Optional[float] = None
pre_trigger_sec: Optional[float] = None
# Peaks
peaks: IdfPeaks = field(default_factory=IdfPeaks)
# Per-channel frequencies (Hz)
tran_zc_freq: Optional[float] = None
vert_zc_freq: Optional[float] = None
long_zc_freq: Optional[float] = None
mic_zc_freq: Optional[float] = None
# Project strings
project_info: IdfProjectInfo = field(default_factory=IdfProjectInfo)
# Sensor self-check
sensor_check: IdfSensorCheck = field(default_factory=IdfSensorCheck)
# Device-fixed
firmware_version: Optional[str] = None
calibration_text: Optional[str] = None
battery_volts: Optional[float] = None
# The full parsed report — preserves anything not surfaced as a typed field
report: IdfReport = field(default_factory=IdfReport)
@classmethod
def from_report(
cls,
report: Any,
filename: str,
) -> "IdfEvent":
"""Build an IdfEvent from a parsed report (dict or IdfReport) and
the device-native binary filename.
Thor's filenames are literal ``<SERIAL>_<YYYYMMDDHHMMSS>.<KIND>``,
so the filename decides the kind and provides a serial + timestamp.
When the report also carries a ``serial_number`` or an
``event_datetime``, the report's value wins (its timestamp has
finer-grained device-reported time-of-trigger semantics).
"""
from .idf_ascii_report import parse_event_filename
# Normalise input to IdfReport
if isinstance(report, IdfReport):
rep = report
elif isinstance(report, dict):
rep = IdfReport.from_dict(report)
else:
raise TypeError(
f"report must be IdfReport or dict; got {type(report).__name__}"
)
# Filename → (serial, timestamp, kind). Required — fall back to
# report-supplied values only if filename parsing fails.
parsed = parse_event_filename(filename)
if parsed is not None:
fn_serial, fn_ts, fn_kind = parsed
kind = "Histogram" if fn_kind == "IDFH" else "Waveform"
else:
fn_serial = rep.serial_number or "UNKNOWN"
fn_ts = rep.event_datetime or datetime.datetime(1970, 1, 1)
kind = "Waveform" if (rep.event_type or "").lower().startswith("full waveform") else "Histogram"
# Prefer report's event_datetime (device-authoritative) over the filename.
ts = rep.event_datetime or fn_ts
serial = rep.serial_number or fn_serial
return cls(
serial=serial,
timestamp=ts,
kind=kind,
filename=filename,
sample_rate=rep.sample_rate,
record_time_sec=rep.record_time_sec,
pre_trigger_sec=rep.pre_trigger_sec,
peaks=IdfPeaks(
transverse_ips = rep.tran_ppv,
vertical_ips = rep.vert_ppv,
longitudinal_ips = rep.long_ppv,
peak_vector_sum_ips = rep.peak_vector_sum,
mic_pspl_dbl = rep.mic_pspl_dbl,
),
tran_zc_freq=rep.tran_zc_freq,
vert_zc_freq=rep.vert_zc_freq,
long_zc_freq=rep.long_zc_freq,
mic_zc_freq=rep.mic_zc_freq,
project_info=IdfProjectInfo(
project=rep.project,
client=rep.client,
operator=rep.operator,
notes=rep.notes,
setup=rep.setup,
),
sensor_check=IdfSensorCheck(
tran=rep.tran_test_passed,
vert=rep.vert_test_passed,
long=rep.long_test_passed,
mic=rep.mic_test_passed,
),
firmware_version=rep.firmware_version,
calibration_text=rep.calibration_text,
battery_volts=rep.battery_volts,
report=rep,
)
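The docstring above describes Thor filenames as `<SERIAL>_<YYYYMMDDHHMMSS>.<KIND>`. The real `parse_event_filename` lives in `idf_ascii_report` and is not shown here, so the following is only a sketch of how such a name could be split, under the assumption that the serial is alphanumeric and the kind is `IDFW` or `IDFH`. The function name `parse_thor_filename` is hypothetical.

```python
import datetime
import re
from typing import Optional, Tuple

# Assumed shape: alphanumeric serial, 14-digit timestamp, IDFW/IDFH suffix.
_FILENAME_RE = re.compile(
    r"^(?P<serial>[A-Za-z0-9]+)_(?P<ts>\d{14})\.(?P<kind>IDFW|IDFH)$"
)


def parse_thor_filename(
    name: str,
) -> Optional[Tuple[str, datetime.datetime, str]]:
    """Return (serial, timestamp, kind), or None if the name does not match."""
    m = _FILENAME_RE.match(name)
    if m is None:
        return None
    try:
        ts = datetime.datetime.strptime(m.group("ts"), "%Y%m%d%H%M%S")
    except ValueError:
        # 14 digits that do not form a valid date/time (e.g. month 13).
        return None
    return m.group("serial"), ts, m.group("kind")


if __name__ == "__main__":
    print(parse_thor_filename("UM11719_20231219163444.IDFW"))
```

Returning `None` rather than raising matches how `from_report()` treats a failed parse: it falls back to report-supplied values.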
# ── Bridge to minimateplus shape (for the existing DB / sidecar paths) ──
def to_minimateplus_event(self, waveform_key: bytes) -> Any:
"""Project this Thor event into the shape ``minimateplus.Event``
carries, so it can flow through the existing
``SeismoDb.insert_events()`` and ``event_to_sidecar_dict()``
machinery without those code paths needing to know about Thor.
Caveats of the bridge:
- ``PeakValues.micl`` carries the mic peak in **psi** (matching
BW's convention) — set from :attr:`IdfPeaks.mic_pspl_psi`,
with a dB(L)→psi fallback when only the dB(L) value is
available. This is what the h5 writer's mic-scale-factor
logic needs. The dB(L) value still flows through
``bw_report.mic.pspl_dbl`` (set by the
``idf_to_bw_report`` adapter) and the renderer reads it
from there for the report header.
- Many Thor-specific fields (Peak Acceleration / Displacement,
sensor self-check, calibration) don't have a slot in
``Event``. The full IdfReport is preserved on the
``.sfm.json`` sidecar under ``extensions.idf_report`` via
``save_imported_idf``; that's the source of truth for them.
"""
from minimateplus.models import (
Event, PeakValues, ProjectInfo, Timestamp,
)
ts_obj = Timestamp(
raw=bytes(9),
flag=0,
year=self.timestamp.year,
unknown_byte=0,
month=self.timestamp.month,
day=self.timestamp.day,
hour=self.timestamp.hour,
minute=self.timestamp.minute,
second=self.timestamp.second,
)
# Resolve mic peak as psi. Priority: binary-derived mic_pspl_psi
# (set by read_idf_file) > dB(L)→psi fallback via standard formula
# (psi = 2.9e-9 × 10^(dBL/20)) > None.
mic_psi = self.peaks.mic_pspl_psi
if mic_psi is None and self.peaks.mic_pspl_dbl is not None:
mic_psi = 2.9e-9 * (10.0 ** (self.peaks.mic_pspl_dbl / 20.0))
pv = PeakValues(
tran=self.peaks.transverse_ips,
vert=self.peaks.vertical_ips,
long=self.peaks.longitudinal_ips,
micl=mic_psi, # psi, matching BW's convention (h5 scaling depends on this)
peak_vector_sum=self.peaks.peak_vector_sum_ips,
)
pi = ProjectInfo(
setup_name=self.project_info.setup,
project=self.project_info.project,
client=self.project_info.client,
operator=self.project_info.operator,
sensor_location=None, # Thor folds location into project string
notes=self.project_info.notes,
)
ev = Event(
index=0,
timestamp=ts_obj,
sample_rate=self.sample_rate,
peak_values=pv,
project_info=pi,
record_type=self.kind,
rectime_seconds=self.record_time_sec,
)
ev._waveform_key = waveform_key
return ev
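The mic-peak resolution in `to_minimateplus_event()` can be read in isolation: use the binary-derived psi value when present, otherwise convert dB(L) to psi with `psi = 2.9e-9 × 10^(dBL/20)`, otherwise `None`. The constant is the 20 µPa sound-pressure reference expressed in psi (about 2.9e-9). The sketch below restates that logic as free functions; the names `dbl_to_psi` and `resolve_mic_psi` are hypothetical and not part of the source.

```python
from typing import Optional

# 20 µPa reference pressure expressed in psi (approximate, as in the source).
P_REF_PSI = 2.9e-9


def dbl_to_psi(dbl: float) -> float:
    """Convert a linear-weighted sound pressure level in dB(L) to psi."""
    return P_REF_PSI * (10.0 ** (dbl / 20.0))


def resolve_mic_psi(
    mic_pspl_psi: Optional[float], mic_pspl_dbl: Optional[float]
) -> Optional[float]:
    """Priority: binary-derived psi > dB(L)-derived psi > None."""
    if mic_pspl_psi is not None:
        return mic_pspl_psi
    if mic_pspl_dbl is not None:
        return dbl_to_psi(mic_pspl_dbl)
    return None


if __name__ == "__main__":
    # Every +20 dB multiplies the pressure by 10.
    print(dbl_to_psi(100.0), dbl_to_psi(120.0))
```

Because the conversion is exponential, passing a dB(L) number where psi is expected overstates the pressure by many orders of magnitude, which is the failure the bridge guards against.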
+11 -3
@@ -20,8 +20,16 @@ Typical usage (TCP / modem):
"""
 from .client import MiniMateClient
-from .models import DeviceInfo, Event
-from .transport import SerialTransport, TcpTransport
+from .models import DeviceInfo, Event, MonitorLogEntry
+from .transport import CapturingTransport, SerialTransport, TcpTransport
 __version__ = "0.1.0"
-__all__ = ["MiniMateClient", "DeviceInfo", "Event", "SerialTransport", "TcpTransport"]
+__all__ = [
+    "MiniMateClient",
+    "DeviceInfo",
+    "Event",
+    "MonitorLogEntry",
+    "SerialTransport",
+    "TcpTransport",
+    "CapturingTransport",
+]
+738
@@ -0,0 +1,738 @@
"""
minimateplus/bw_ascii_report.py: parser for Blastware's per-event ASCII
report (the .TXT file BW writes alongside each saved event binary).
The ASCII export is the authoritative source for every "rich" per-event
field that BW computes from the waveform but never persists in the BW
binary itself:
- Per-channel PPV (Tran / Vert / Long / MicL)
- Peak Vector Sum + Peak Vector Sum Time
- Per-channel ZC Freq, Time of Peak, Peak Acceleration, Peak Displacement
- MicL PSPL, MicL Time of Peak, MicL ZC Freq
- Per-channel Sensor Self-Check (Test Freq / Test Ratio / Test Results)
- MicL Test Amplitude (mV)
- Battery, calibration date, monitor-log timestamps
Persisting these values into the SFM database lets the monthly-summary
review workflow ("show me events at Location X with PVS > 0.5") work
without depending on the (still-undecoded) waveform body codec.
Format (verified against decode-re/5-8-26 4-event bundle):
- One field per line, wrapped in double quotes: `"Field Name : Value"`
- Field/value separator: literal ` : ` (space-colon-space).
- Some field names contain an internal `:` already (e.g. `"Project:"`),
so we split on the FIRST ` : ` only.
- Some fields have unit suffixes: `"0.500 in/s"` / `"7.5 Hz"` / `"533 mv"`.
- A `"Monitor Log(s)"` marker line is followed by tab-separated rows
of `start_time<TAB>stop_time<TAB>description`.
- Final `"PC SW Version : ..."` line ends the metadata block.
- A blank line separates metadata from the sample table.
- Sample table starts with ` Tran <TAB> Vert <TAB>...`, then
one row per sample (tab-separated, right-padded numeric values).
- Geo channel values are in in/s; MicL in dB(L) (or 0.000 below threshold).
Because some metadata fields have whitespace quirks ("MicL Time of
Peak" has two spaces; the leading "Project:" value has its own colon),
we normalise whitespace in the key before lookup.
"""
from __future__ import annotations
import datetime
import re
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict, List, Optional, Tuple, Union
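The line format described in the module docstring (one double-quoted `"Field Name : Value"` per line, split on the first ` : ` only, with whitespace in the key normalised) can be sketched as a small standalone function. This is an illustration under those stated rules, not the parser's actual code path; `split_field` is a hypothetical name and the sample lines in the test are made up.

```python
import re
from typing import Optional, Tuple

_WHITESPACE_RE = re.compile(r"\s+")


def split_field(raw_line: str) -> Optional[Tuple[str, str]]:
    """Split one report line into (normalised_key, value).

    Returns None for lines without a ' : ' separator, such as the
    "Monitor Log(s)" marker.
    """
    line = raw_line.rstrip("\r\n")
    # Strip the surrounding double quotes when both are present.
    if len(line) >= 2 and line.startswith('"') and line.endswith('"'):
        line = line[1:-1]
    # partition() splits on the FIRST ' : ', so later colons stay in the value.
    key, sep, value = line.partition(" : ")
    if not sep:
        return None
    # Collapse runs of spaces/tabs so "MicL  Time of Peak" matches lookups.
    return _WHITESPACE_RE.sub(" ", key).strip(), value.strip()


if __name__ == "__main__":
    print(split_field('"Tran PPV : 0.500 in/s"'))
```

Using `str.partition` rather than `str.split(" : ")` keeps values that themselves contain ` : ` intact without a `maxsplit` argument.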
# ─────────────────────────────────────────────────────────────────────────────
# Output dataclasses
# ─────────────────────────────────────────────────────────────────────────────
@dataclass
class ChannelStats:
"""Per-channel derived stats, populated from an event report."""
ppv_ips: Optional[float] = None # in/s (geo channels only)
zc_freq_hz: Optional[float] = None # Hz
time_of_peak_s: Optional[float] = None # seconds (relative to trigger; can be negative)
peak_accel_g: Optional[float] = None # g (geo channels only)
peak_disp_in: Optional[float] = None # in (geo channels only)
# When BW writes "OORANGE" (Out Of Range — truncated) for a PPV
# value, the true peak exceeded the channel's full-scale range.
# We substitute the range max (e.g. 10.000 in/s for Normal range)
# as a lower bound, and flag here so downstream UI / alerts know
# to render "> 10 in/s" or "saturated" instead of trusting the
# value as an exact measurement.
ppv_saturated: bool = False
# Set when BW writes ">100 Hz" for ZC Freq — the zero-crossing
# algorithm's peak frequency exceeded the device's reporting
# ceiling (typically 100 Hz on V10.72). zc_freq_hz gets the
# threshold (100.0) as a lower bound; downstream UI renders ">100".
zc_freq_above_range: bool = False
@dataclass
class MicStats:
"""MicL-specific stats."""
weighting: Optional[str] = None # e.g. "Linear Weighting"
pspl_dbl: Optional[float] = None # dB(L)
zc_freq_hz: Optional[float] = None
time_of_peak_s: Optional[float] = None
# Set when BW writes "OORANGE" for PSPL, i.e. the mic exceeded its
# measurement range. pspl_dbl gets a conservative stand-in of
# 140 dBL (typical NL-43 max; some units cap at 148), which is a lower
# bound on the true peak. Consumers should render "> 140 dB(L)" or
# similar when this flag is set.
pspl_saturated: bool = False
# Same semantics as ChannelStats.zc_freq_above_range — mic ZC
# peak exceeded device reporting ceiling.
zc_freq_above_range: bool = False
@dataclass
class SensorCheck:
"""Per-channel sensor self-check result.
Geo channels report a frequency + ratio; MicL reports a frequency +
amplitude (mV). All channels also have a Pass/Fail string.
"""
test_freq_hz: Optional[float] = None
test_ratio: Optional[float] = None # geo channels only
test_amplitude_mv: Optional[float] = None # MicL only
test_results: Optional[str] = None # "Passed" / "Failed"
@dataclass
class MonitorLogEntry:
"""One row of the trailing Monitor Log(s) block."""
start_time: Optional[datetime.datetime] = None
stop_time: Optional[datetime.datetime] = None
description: Optional[str] = None
# BW saturation marker — appears in PPV / Peak Vector Sum / similar
# numeric fields when the underlying measurement exceeded the
# channel's full-scale range (e.g., a geophone reading > 10 in/s at
# Normal range, or a mic exceeding its sensitivity ceiling). Treated
# as "≥ range_max" + a saturated flag rather than discarded.
# Appears as: ``"Tran PPV : OORANGE in/s"``
_OORANGE_MARKERS = ("OORANGE", "OUT OF RANGE")
def _is_oorange(value: str) -> bool:
"""True when a BW numeric field is an Out-Of-Range saturation marker."""
s = value.strip().upper()
return any(m in s for m in _OORANGE_MARKERS)
def _parse_above_range(value: str) -> Optional[float]:
"""For BW "above-range" markers like ">100 Hz", return the threshold.
BW writes ZC Freq as ">100 Hz" when the zero-crossing algorithm sees
a peak too fast to count (device cuts off at 100 Hz). Returns the
numeric portion after the '>' (e.g. 100.0), or None if `value` is
not an above-range marker.
"""
s = value.strip()
if not s.startswith(">"):
return None
return _parse_number(s[1:])
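The two marker conventions above (`OORANGE` for a saturated peak, `>100 Hz` for a frequency above the reporting ceiling) both turn a numeric field into a `(value, flag)` pair. The sketch below shows one way to express that. The function names and the `range_max` parameter are hypothetical, and how the real parser wires these helpers into `ChannelStats` is not shown in this excerpt.

```python
import re
from typing import Optional, Tuple

_LEADING_NUMBER_RE = re.compile(r"^-?\d+(?:\.\d+)?")
_OORANGE_MARKERS = ("OORANGE", "OUT OF RANGE")


def leading_number(value: str) -> Optional[float]:
    """Pull the leading numeric portion out of a value like '0.500 in/s'."""
    m = _LEADING_NUMBER_RE.match(value.strip())
    return float(m.group(0)) if m else None


def parse_zc_freq(value: str) -> Tuple[Optional[float], bool]:
    """Return (frequency_hz, above_range) for a ZC Freq field."""
    s = value.strip()
    if s.startswith(">"):
        # ">100 Hz": keep the threshold as a lower bound and set the flag.
        return leading_number(s[1:]), True
    return leading_number(s), False


def parse_ppv(value: str, range_max: float) -> Tuple[Optional[float], bool]:
    """Return (ppv, saturated); substitute range_max when out of range."""
    upper = value.strip().upper()
    if any(marker in upper for marker in _OORANGE_MARKERS):
        return range_max, True
    return leading_number(value), False


if __name__ == "__main__":
    print(parse_zc_freq(">100 Hz"), parse_ppv("OORANGE in/s", 10.0))
```

Keeping the flag separate from the number lets downstream code sort and filter on the value while still rendering it as a bound rather than an exact reading.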
@dataclass
class BwAsciiReport:
"""Structured representation of one BW per-event ASCII export."""
# ── Identity ─────────────────────────────────────────────────────────────
event_type: Optional[str] = None # e.g. "Full Waveform"
serial: Optional[str] = None # e.g. "BE11529"
version: Optional[str] = None # firmware version line
file_name: Optional[str] = None # e.g. "M529LK44.AB0"
event_datetime: Optional[datetime.datetime] = None # parsed from Event Time + Event Date
# ── Trigger / recording config ──────────────────────────────────────────
trigger_channel: Optional[str] = None # e.g. "Vert" or "From Unit"
geo_trigger_level_ips: Optional[float] = None
pretrig_s: Optional[float] = None # negative seconds
record_time_s: Optional[float] = None
record_stop_mode: Optional[str] = None
sample_rate_sps: Optional[int] = None
battery_volts: Optional[float] = None
calibration_date: Optional[datetime.date] = None
calibration_by: Optional[str] = None # e.g. "Instantel"
units: Optional[str] = None # e.g. "in/s and dB(L)"
# ── Operator-supplied metadata ──────────────────────────────────────────
# Parsed by POSITION from the 4-line "User Notes" block BW writes
# between the `Units :` and `Geo Range :` lines. Position-based so
# the values populate correctly even when an operator renames the
# labels in Blastware's Compliance Setup → Notes tab (the 4 labels
# are user-editable, e.g. "Seis Loc:" → "Building:" → "Site Address:").
# The original labels BW wrote are preserved in `user_note_labels`
# so terra-view can render them as the operator named them.
project: Optional[str] = None # position 1 (BW default label "Project:")
client: Optional[str] = None # position 2 (BW default label "Client:")
operator: Optional[str] = None # position 3 (BW default label "User Name:")
sensor_location: Optional[str] = None # position 4 (BW default label "Seis Loc:")
# Maps canonical slot name → the literal label BW wrote in the ASCII
# export. Empty if the User Notes block wasn't present. Example
# when the operator renamed slot 4 to "Building:":
# {"project": "Project:", "client": "Client:",
# "operator": "User Name:", "sensor_location": "Building:"}
user_note_labels: Dict[str, str] = field(default_factory=dict)
# ── Geo channel scaling ─────────────────────────────────────────────────
geo_range_ips: Optional[float] = None # 10.000 / 1.250
# ── Per-channel derived stats (geo + mic) ───────────────────────────────
channels: Dict[str, ChannelStats] = field(default_factory=dict)
mic: MicStats = field(default_factory=MicStats)
# ── Vector sum ──────────────────────────────────────────────────────────
peak_vector_sum_ips: Optional[float] = None
peak_vector_sum_time_s: Optional[float] = None
# Saturation flag — set when BW writes "OORANGE" for the PVS. We
# then substitute sqrt(3) * geo_range_ips as a conservative upper
# bound (the theoretical maximum PVS when all 3 geo channels are
# simultaneously at full-scale). Consumers should display this as
# ">{value} in/s" or similar.
peak_vector_sum_saturated: bool = False
# Histograms additionally have an absolute date+time for the PVS
# (it occurred at a specific interval). Waveform reports show
# only the relative-time value above.
peak_vector_sum_when: Optional[datetime.datetime] = None
# ── Histogram-specific fields (populated only when Event Type starts
# with 'Histogram' / 'Full Histogram' / 'Histogram + Continuous') ──
histogram_start: Optional[datetime.datetime] = None
histogram_stop: Optional[datetime.datetime] = None
histogram_n_intervals: Optional[int] = None # e.g. 4, 1436
histogram_interval_size_str: Optional[str] = None # "1 minute" / "5 minutes" / "15 seconds"
histogram_interval_size_s: Optional[float] = None # parsed to seconds
# Per-channel absolute peak time+date (histogram-specific). For
# waveform events these are None — those reports use the channel's
# time_of_peak_s (relative to trigger) instead. Keyed by channel
# name ("Tran", "Vert", "Long", "MicL").
channel_peak_when: Dict[str, datetime.datetime] = field(default_factory=dict)
# ── Sensor self-check (per channel) ─────────────────────────────────────
sensor_check: Dict[str, SensorCheck] = field(default_factory=dict)
# ── Monitor log + tooling version ───────────────────────────────────────
monitor_log: List[MonitorLogEntry] = field(default_factory=list)
pc_sw_version: Optional[str] = None
# ── Sample table (optional; only parsed if requested) ───────────────────
# Each entry: (Tran, Vert, Long, MicL) in the report's units (geo
# channels in in/s, MicL in dB(L)). None when parse_samples=False.
samples: Optional[List[Tuple[float, float, float, float]]] = None
# ─────────────────────────────────────────────────────────────────────────────
# Helpers
# ─────────────────────────────────────────────────────────────────────────────
_KEY_NORMALISE_RE = re.compile(r"\s+")
_NUMERIC_RE = re.compile(r"^-?\d+(?:\.\d+)?")
def _normalise_key(k: str) -> str:
"""Collapse whitespace runs (incl. tabs) and strip — handles BW's
"MicL Time of Peak" double-space and leading-colon quirks."""
return _KEY_NORMALISE_RE.sub(" ", k).strip()
def _strip_quotes(line: str) -> str:
line = line.rstrip("\r\n")
if len(line) >= 2 and line.startswith('"') and line.endswith('"'):
return line[1:-1]
return line
def _parse_number(value: str) -> Optional[float]:
"""Pull the leading numeric portion out of a value like "0.500 in/s"."""
m = _NUMERIC_RE.match(value.strip())
if not m:
return None
try:
return float(m.group(0))
except ValueError:
return None
def _parse_int(value: str) -> Optional[int]:
n = _parse_number(value)
return None if n is None else int(round(n))
# Months exactly as BW writes them.
_MONTHS = {
"January": 1, "February": 2, "March": 3, "April": 4,
"May": 5, "June": 6, "July": 7, "August": 8,
"September": 9, "October": 10, "November": 11, "December": 12,
# Short forms used in monitor-log rows ("Apr 23 /26").
"Jan": 1, "Feb": 2, "Mar": 3, "Apr": 4, "Jun": 6, "Jul": 7,
"Aug": 8, "Sep": 9, "Oct": 10, "Nov": 11, "Dec": 12,
}
def _parse_event_date(s: str) -> Optional[datetime.date]:
"""Parse "April 23, 2026" or "May 8, 2026" → date."""
s = s.strip()
parts = s.replace(",", " ").split()
if len(parts) < 3:
return None
month_name, day_str, year_str = parts[0], parts[1], parts[2]
month = _MONTHS.get(month_name)
if month is None:
return None
try:
return datetime.date(int(year_str), month, int(day_str))
except ValueError:
return None
def _parse_iso_date(s: str) -> Optional[datetime.date]:
"""Parse "2026-05-16" → date. Histograms use ISO format for their
Start Date / Stop Date / Peak Date fields; waveforms use the
"May 8, 2026" long form which `_parse_event_date` handles."""
s = s.strip()
try:
return datetime.date.fromisoformat(s)
except ValueError:
return None
_INTERVAL_UNIT_SECONDS = {
"second": 1, "seconds": 1, "sec": 1, "secs": 1,
"minute": 60, "minutes": 60, "min": 60, "mins": 60,
"hour": 3600, "hours": 3600, "hr": 3600, "hrs": 3600,
}
def _parse_interval_size(s: str) -> Optional[float]:
"""Parse "1 minute" / "5 minutes" / "15 seconds" / "2 seconds" → seconds.
Handles the BW Compliance Setup Histogram Interval values verbatim
("2 seconds", "5 seconds", "15 seconds", "1 minute", "5 minutes",
"15 minutes") plus a few defensive variants.
"""
if not s:
return None
parts = s.strip().split()
if len(parts) < 2:
return None
try:
n = float(parts[0])
except ValueError:
return None
unit_per_s = _INTERVAL_UNIT_SECONDS.get(parts[1].lower())
if unit_per_s is None:
return None
return n * unit_per_s
def _parse_event_time(s: str) -> Optional[datetime.time]:
"""Parse "15:56:35" → time."""
s = s.strip()
try:
h, m, sec = s.split(":")
return datetime.time(int(h), int(m), int(sec))
except (ValueError, IndexError):
return None
def _parse_calibration(value: str) -> Tuple[Optional[datetime.date], Optional[str]]:
"""Parse "April 29, 2025 by Instantel" → (date, "Instantel")."""
parts = value.split(" by ", 1)
date = _parse_event_date(parts[0])
by = parts[1].strip() if len(parts) > 1 else None
return date, by
def _parse_monitor_row(line: str) -> Optional[MonitorLogEntry]:
"""Parse a tab-separated monitor log row.
Format: `<start>\t<stop>\t<desc>` where each timestamp is BW's
short form "Mon DD /YY HH:MM:SS" (e.g. "Apr 23 /26 15:46:16").
Year is encoded as a 2-digit suffix; we expand "/26" → 2026 (suffixes 80-99 map to 19xx).
"""
parts = line.split("\t")
if len(parts) < 2:
return None
start = _parse_monitor_ts(parts[0])
stop = _parse_monitor_ts(parts[1])
desc = parts[2].strip() if len(parts) > 2 else None
if start is None and stop is None and not desc:
return None
return MonitorLogEntry(start_time=start, stop_time=stop, description=desc)
def _parse_monitor_ts(s: str) -> Optional[datetime.datetime]:
"""Parse "Apr 23 /26 15:46:16" → datetime."""
s = s.strip()
parts = s.split()
if len(parts) < 4:
return None
month = _MONTHS.get(parts[0])
if month is None:
return None
try:
day = int(parts[1])
# parts[2] looks like "/26" → century-flip to 2026
yy = int(parts[2].lstrip("/"))
year = 2000 + yy if yy < 80 else 1900 + yy
h, m, sec = (int(x) for x in parts[3].split(":"))
return datetime.datetime(year, month, day, h, m, sec)
except (ValueError, IndexError):
return None
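The two-digit year handling in `_parse_monitor_ts` uses a fixed pivot: suffixes below 80 land in the 2000s, the rest in the 1900s. A standalone sketch of that rule and of the short-form timestamp parse follows. The names `expand_two_digit_year` and `parse_short_ts`, and the `pivot` parameter, are hypothetical.

```python
import datetime
from typing import Optional

# Month abbreviations as they appear in rows like "Apr 23 /26 15:46:16".
MONTH_ABBR = {
    "Jan": 1, "Feb": 2, "Mar": 3, "Apr": 4, "May": 5, "Jun": 6,
    "Jul": 7, "Aug": 8, "Sep": 9, "Oct": 10, "Nov": 11, "Dec": 12,
}


def expand_two_digit_year(token: str, pivot: int = 80) -> int:
    """Expand '/26' or '26' to a 4-digit year using a fixed pivot."""
    yy = int(token.lstrip("/"))
    return 2000 + yy if yy < pivot else 1900 + yy


def parse_short_ts(s: str) -> Optional[datetime.datetime]:
    """Parse 'Mon DD /YY HH:MM:SS'; return None on any malformed part."""
    parts = s.split()
    if len(parts) < 4 or parts[0] not in MONTH_ABBR:
        return None
    try:
        hour, minute, second = (int(x) for x in parts[3].split(":"))
        return datetime.datetime(
            expand_two_digit_year(parts[2]), MONTH_ABBR[parts[0]],
            int(parts[1]), hour, minute, second,
        )
    except ValueError:
        # Covers non-numeric tokens, wrong field counts and invalid dates.
        return None


if __name__ == "__main__":
    print(parse_short_ts("Apr 23 /26 15:46:16"))
```

An explicit pivot differs from `strptime`'s `%y`, which maps 69-99 to 1969-1999 and 0-68 to 2000-2068, and a lookup table avoids the locale dependence of `%b`.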
# ── User-notes positional slot map ──────────────────────────────────────────
#
# Blastware's Compliance Setup → Notes tab shows four operator-supplied
# fields whose LABELS the operator can rename (see screenshot in
# project archive). Defaults are "Project:" / "Client:" /
# "User Name:" / "Seis Loc:", but an operator using a different
# convention can rename them to anything ("Building:", "Site:",
# "Address:", etc.). The ASCII export reflects whatever the operator
# typed, so label-based matching is fragile.
#
# What IS reliable: BW always writes the 4 user-notes lines in the
# same order, contiguously between the `Units :` line and the
# `Geo Range :` line. We parse them by POSITION and preserve the
# operator's labels in `report.user_note_labels` so terra-view can
# render them as the operator intended.
_USER_NOTE_SLOTS = ("project", "client", "operator", "sensor_location")
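The position-based rule described above can be illustrated on its own: given the `(label, value)` pairs found between the `Units :` and `Geo Range :` lines, assign them to the four canonical slots in order and remember the label the operator actually used. This is a sketch of the idea rather than the parser's real loop; `assign_user_notes` is a hypothetical name and the sample values are invented.

```python
from typing import Dict, List, Tuple

SLOTS = ("project", "client", "operator", "sensor_location")


def assign_user_notes(
    pairs: List[Tuple[str, str]],
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Map up to four (label, value) pairs onto the canonical slots.

    Returns (values_by_slot, labels_by_slot). Extra pairs beyond the
    fourth are ignored; missing pairs leave their slots absent.
    """
    values: Dict[str, str] = {}
    labels: Dict[str, str] = {}
    # zip() stops at the shorter input, which gives both behaviours above.
    for slot, (label, value) in zip(SLOTS, pairs):
        values[slot] = value
        labels[slot] = label
    return values, labels


if __name__ == "__main__":
    notes = [
        ("Project:", "Main St"),
        ("Client:", "ACME"),
        ("User Name:", "J. Doe"),
        ("Building:", "Annex B"),  # operator renamed "Seis Loc:"
    ]
    print(assign_user_notes(notes))
```

The second return value corresponds to what the report keeps in `user_note_labels`, so a UI can show "Building:" while queries still use `sensor_location`.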
# ─────────────────────────────────────────────────────────────────────────────
# Top-level parser
# ─────────────────────────────────────────────────────────────────────────────
def parse_report(text: Union[str, bytes], *, parse_samples: bool = False) -> BwAsciiReport:
"""Parse a BW per-event ASCII export into a structured BwAsciiReport.
Set ``parse_samples=True`` to also populate ``report.samples`` with
the trailing sample table. Default False because the table is
huge and most callers only want metadata for indexing.
"""
if isinstance(text, bytes):
text = text.decode("ascii", errors="replace")
report = BwAsciiReport()
# Pre-create channel stat slots so callers can rely on them existing.
for ch in ("Tran", "Vert", "Long", "MicL"):
report.channels.setdefault(ch, ChannelStats())
report.sensor_check.setdefault(ch, SensorCheck())
lines = text.splitlines()
i = 0
n = len(lines)
in_monitor_log_section = False
event_time_str: Optional[str] = None
event_date: Optional[datetime.date] = None
# User-notes block detection. We enter the block after parsing
# the "Units :" line and exit on the "Geo Range :" line. Inside,
# the first 4 unmatched `<label> : <value>` lines are assigned to
# the 4 canonical operator-supplied slots by POSITION (project,
# client, operator, sensor_location) regardless of what the
# operator named the labels in BW's Compliance Setup → Notes tab.
in_user_notes_block = False
user_note_position = 0
# Histogram-field staging — BW writes <Channel> Peak Time and
# <Channel> Peak Date on separate lines (and similarly Histogram
# Start Time / Date). We stash the partial value when the time
# line arrives and combine it when the matching date line arrives.
_hist_start_time: Optional[datetime.time] = None
_hist_stop_time: Optional[datetime.time] = None
_pending_peak_time: Dict[str, Optional[datetime.time]] = {}
_pvs_time_raw: Optional[str] = None # last Peak Vector Sum Time value, raw
while i < n:
raw_line = lines[i]
i += 1
# Blank line marks the start of the sample table.
if raw_line.strip() == "":
break
line = _strip_quotes(raw_line)
# Monitor log section: "Monitor Log(s)" header followed by N rows
# (still inside double-quoted lines), terminated by a non-row line
# like "PC SW Version : ..." or a blank line.
if not in_monitor_log_section and line.strip() == "Monitor Log(s)":
in_monitor_log_section = True
continue
if in_monitor_log_section:
# Heuristic: monitor rows contain a tab; the next "Field : Value"
# line ends the section.
if "\t" in line:
entry = _parse_monitor_row(line)
if entry:
report.monitor_log.append(entry)
continue
# Falls through to the field parser below; clear the flag.
in_monitor_log_section = False
# "Field : Value" — split on FIRST occurrence of " : "
idx = line.find(" : ")
if idx < 0:
continue
key = _normalise_key(line[:idx])
value = line[idx + 3 :].strip()
# ── Identity / config ────────────────────────────────────────────────
if key == "Event Type": report.event_type = value
elif key == "Serial Number": report.serial = value
elif key == "Version": report.version = value
elif key == "File Name": report.file_name = value
elif key == "Event Time": event_time_str = value
elif key == "Event Date": event_date = _parse_event_date(value)
elif key == "Trigger": report.trigger_channel = value
elif key == "Geo Trigger Level": report.geo_trigger_level_ips = _parse_number(value)
elif key == "Pre-trigger Length": report.pretrig_s = _parse_number(value)
elif key == "Record Time": report.record_time_s = _parse_number(value)
elif key == "Record Stop Mode": report.record_stop_mode = value
elif key == "Sample Rate": report.sample_rate_sps = _parse_int(value)
elif key == "Battery Level": report.battery_volts = _parse_number(value)
elif key == "Calibration":
report.calibration_date, report.calibration_by = _parse_calibration(value)
elif key == "Units":
report.units = value
# Entering the user-notes block. Next ~4 lines until
# "Geo Range :" are the operator-supplied notes.
in_user_notes_block = True
user_note_position = 0
elif key == "Geo Range":
# Exiting the user-notes block.
in_user_notes_block = False
report.geo_range_ips = _parse_number(value)
# User-notes block: assign by position (operator may have
# renamed the labels, so we don't trust them). Preserve the
# original labels in `user_note_labels` for downstream UIs
# (terra-view) that want to display them as the operator
# named them.
elif in_user_notes_block and user_note_position < len(_USER_NOTE_SLOTS):
slot = _USER_NOTE_SLOTS[user_note_position]
setattr(report, slot, value)
report.user_note_labels[slot] = key
user_note_position += 1
# ── Per-channel stats ────────────────────────────────────────────────
# All match the pattern "{Channel} <stat-name>"
elif key in (
"Tran PPV", "Vert PPV", "Long PPV",
"Tran ZC Freq", "Vert ZC Freq", "Long ZC Freq",
"Tran Time of Peak", "Vert Time of Peak", "Long Time of Peak",
"Tran Peak Acceleration", "Vert Peak Acceleration", "Long Peak Acceleration",
"Tran Peak Displacement", "Vert Peak Displacement", "Long Peak Displacement",
):
ch_name, stat = key.split(" ", 1)
cs = report.channels.setdefault(ch_name, ChannelStats())
if stat == "PPV":
if _is_oorange(value):
# Channel saturated — substitute range max as lower
# bound; flag so downstream UI can render "> 10 in/s".
cs.ppv_ips = report.geo_range_ips
cs.ppv_saturated = True
else:
cs.ppv_ips = _parse_number(value)
elif stat == "ZC Freq":
# ">100 Hz" → store threshold + flag; numeric → parse normally
threshold = _parse_above_range(value)
if threshold is not None:
cs.zc_freq_hz = threshold
cs.zc_freq_above_range = True
else:
cs.zc_freq_hz = _parse_number(value)
else:
num = _parse_number(value)
if stat == "Time of Peak": cs.time_of_peak_s = num
elif stat == "Peak Acceleration": cs.peak_accel_g = num
elif stat == "Peak Displacement": cs.peak_disp_in = num
# ── Histogram-specific fields ────────────────────────────────────────
# Histograms have Start/Stop time+date pairs + an interval count
# and size, plus per-channel absolute Peak Time/Date instead of
# the waveform's relative Time of Peak.
elif key == "Histogram Start Time":
_hist_start_time = _parse_event_time(value)
elif key == "Histogram Start Date":
_d = _parse_iso_date(value)
if _d and _hist_start_time:
report.histogram_start = datetime.datetime.combine(_d, _hist_start_time)
elif key == "Histogram Stop Time":
_hist_stop_time = _parse_event_time(value)
elif key == "Histogram Stop Date":
_d = _parse_iso_date(value)
if _d and _hist_stop_time:
report.histogram_stop = datetime.datetime.combine(_d, _hist_stop_time)
elif key == "Number of Intervals":
try:
report.histogram_n_intervals = int(float(value.strip()))
except ValueError:
pass
elif key == "Interval Size":
report.histogram_interval_size_str = value.strip()
report.histogram_interval_size_s = _parse_interval_size(value)
# ── Per-channel histogram Peak Date / Peak Time ──
# Lines like "Tran Peak Time : 22:31:38" + "Tran Peak Date : 2026-05-16"
elif key in ("Tran Peak Time", "Vert Peak Time", "Long Peak Time", "MicL Time"):
ch_name = "MicL" if key == "MicL Time" else key.split(" ", 1)[0]
_pending_peak_time[ch_name] = _parse_event_time(value)
elif key in ("Tran Peak Date", "Vert Peak Date", "Long Peak Date", "MicL Date"):
ch_name = "MicL" if key == "MicL Date" else key.split(" ", 1)[0]
_d = _parse_iso_date(value)
_t = _pending_peak_time.get(ch_name)
if _d and _t:
report.channel_peak_when[ch_name] = datetime.datetime.combine(_d, _t)
# ── Vector Sum ───────────────────────────────────────────────────────
elif key == "Peak Vector Sum":
if _is_oorange(value):
# PVS saturated — conservative upper bound is
# sqrt(3) * geo_range_ips (all 3 channels at full-scale).
# Real PVS could be lower (channels rarely peak
# simultaneously) but never higher within the range.
if report.geo_range_ips is not None:
import math as _math
report.peak_vector_sum_ips = _math.sqrt(3) * report.geo_range_ips
report.peak_vector_sum_saturated = True
else:
report.peak_vector_sum_ips = _parse_number(value)
# BW writes the PVS-time label with a typo: "Peak Vector Sum TimeSum"
# (looks like Sum got appended twice). Accept both forms. Confirmed
# against actual BW output on 2026-05-27 — every PVS-time line in
# the field examples (T190, T438, K557) uses the typo'd label.
elif key in ("Peak Vector Sum Time", "Peak Vector Sum TimeSum"):
report.peak_vector_sum_time_s = _parse_number(value)
_pvs_time_raw = value
elif key == "Peak Vector Sum Date":
# Histogram-mode PVS gets paired with a date. We may have
# captured 'Peak Vector Sum Time' as either a relative
# seconds float (waveform) or an HH:MM:SS string we
# interpreted as a number. For histograms, BW writes
# "Peak Vector Sum Time : 22:33:52" which _parse_number
# parses as 22.0 (loses information). When Peak Vector Sum
# Date arrives, re-parse the previous PVS time line as a
# clock time and combine into an absolute datetime.
_d = _parse_iso_date(value)
if _d and _pvs_time_raw is not None:
_t = _parse_event_time(_pvs_time_raw)
if _t:
report.peak_vector_sum_when = datetime.datetime.combine(_d, _t)
# The earlier seconds parse was bogus for histograms;
# clear it so consumers don't think it's a real offset.
report.peak_vector_sum_time_s = None
# ── Microphone block ────────────────────────────────────────────────
elif key == "Microphone":
report.mic.weighting = value
elif key == "MicL PSPL":
if _is_oorange(value):
# Mic saturated — substitute conservative upper bound 140 dBL.
report.mic.pspl_dbl = 140.0
report.mic.pspl_saturated = True
else:
report.mic.pspl_dbl = _parse_number(value)
# Mirror onto the "MicL" entry in channels so callers querying
# `channels["MicL"].ppv_ips` see something — but it's dB(L), not
# in/s, so we store as-is in the MicStats and mark the channel.
elif key == "MicL Time of Peak":
report.mic.time_of_peak_s = _parse_number(value)
cs = report.channels.setdefault("MicL", ChannelStats())
cs.time_of_peak_s = report.mic.time_of_peak_s
elif key == "MicL ZC Freq":
threshold = _parse_above_range(value)
if threshold is not None:
report.mic.zc_freq_hz = threshold
report.mic.zc_freq_above_range = True
else:
report.mic.zc_freq_hz = _parse_number(value)
cs = report.channels.setdefault("MicL", ChannelStats())
cs.zc_freq_hz = report.mic.zc_freq_hz
cs.zc_freq_above_range = report.mic.zc_freq_above_range
# ── Sensor self-check ────────────────────────────────────────────────
elif key in (
"Tran Test Freq", "Vert Test Freq", "Long Test Freq", "MicL Test Freq",
"Tran Test Ratio", "Vert Test Ratio", "Long Test Ratio",
"MicL Test Amplitude",
"Tran Test Results", "Vert Test Results", "Long Test Results", "MicL Test Results",
):
ch_name, stat = key.split(" ", 1)
sc = report.sensor_check.setdefault(ch_name, SensorCheck())
if stat == "Test Freq": sc.test_freq_hz = _parse_number(value)
elif stat == "Test Ratio": sc.test_ratio = _parse_number(value)
elif stat == "Test Amplitude": sc.test_amplitude_mv = _parse_number(value)
elif stat == "Test Results": sc.test_results = value
# ── Trailer ─────────────────────────────────────────────────────────
elif key == "PC SW Version":
report.pc_sw_version = value
# Unknown keys are silently dropped — forward-compat for future
# BW versions that may add fields.
# Combine event date + time into a datetime
if event_date is not None and event_time_str is not None:
t = _parse_event_time(event_time_str)
if t is not None:
report.event_datetime = datetime.datetime.combine(event_date, t)
if parse_samples:
report.samples = _parse_sample_table(lines, i)
return report
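# Illustrative sketch (not part of this module): why sqrt(3) * geo_range
# is a safe upper bound for a saturated Peak Vector Sum. The function
# names are hypothetical. The vector sum of three axes is
# sqrt(t^2 + v^2 + l^2), which is largest when every axis sits at full
# scale simultaneously.

```python
import math


def vector_sum(tran: float, vert: float, long_: float) -> float:
    """Instantaneous vector sum of the three geophone axes (in/s)."""
    return math.sqrt(tran ** 2 + vert ** 2 + long_ ** 2)


def pvs_upper_bound(geo_range_ips: float) -> float:
    """Vector sum with all three axes pinned at the range maximum."""
    return math.sqrt(3) * geo_range_ips
```

# Any in-range combination, e.g. vector_sum(10, 3, 4), stays at or below
# pvs_upper_bound(10).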
def _parse_sample_table(
lines: List[str], start: int,
) -> List[Tuple[float, float, float, float]]:
"""Parse the trailing sample table.
The table starts with a header row (" Tran <TAB>...") and continues
until EOF. Each data row is a tab-separated quartet of numeric values.
"""
samples: List[Tuple[float, float, float, float]] = []
seen_header = False
for line in lines[start:]:
line = line.rstrip("\r\n")
if not line.strip():
continue
cols = [c.strip() for c in line.split("\t") if c.strip()]
if not seen_header:
# Header row contains channel names; numeric rows don't.
if any(c in ("Tran", "Vert", "Long", "MicL") for c in cols):
seen_header = True
continue
if len(cols) < 4:
continue
try:
samples.append((
float(cols[0]), float(cols[1]),
float(cols[2]), float(cols[3]),
))
except ValueError:
continue
return samples
def parse_report_file(
    path: Union[str, Path], *, parse_samples: bool = False,
) -> BwAsciiReport:
    """Convenience: read a .TXT file from disk and parse it."""
    return parse_report(Path(path).read_bytes(), parse_samples=parse_samples)
+1217 -241
File diff suppressed because it is too large
+927
@@ -0,0 +1,927 @@
"""
minimateplus/event_file_io.py: modern event-file (.sfm.json sidecar) IO.
This module is the single home for event-file conversion code that doesn't
fit cleanly inside `blastware_file.py` (which is the BW binary codec):
- sidecar JSON read/write (the modern per-event metadata file)
- read_blastware_file(): reverse of write_blastware_file, used by
  the BW-importer flow when SFM is ingesting files produced by
  Blastware's own ACH (where the source A5 frames aren't available).
Sidecar schema v1 layout: see docs in the project plan or the schema
declared in `event_to_sidecar_dict()`.
"""
from __future__ import annotations
import datetime
import hashlib
import json
import logging
import os
import struct
from pathlib import Path
from typing import Optional, Union
from .models import Event, PeakValues, ProjectInfo, Timestamp
from . import blastware_file as _bw # avoid circular reference at module load
from .bw_ascii_report import BwAsciiReport
from .waveform_codec import decode_waveform_v2, decoded_to_adc_counts
from .histogram_codec import decode_histogram_body
# Reference pressure for dB(L) → psi conversion (20 µPa expressed in psi).
# Same constant as sfm/sfm_webapp.html so server-side and browser-side
# conversions agree.
_DBL_REF_PSI = 2.9e-9
log = logging.getLogger(__name__)
# Schema version for the sidecar JSON. Bump when fields change shape.
# Older readers must reject anything > SCHEMA_VERSION; newer fields added
# inside `extensions` are forward-compatible without a bump.
SCHEMA_VERSION = 1
SIDECAR_KIND = "sfm.event"
# Default tool_version stamp; callers can override. Hard-coded here
# rather than read via importlib.metadata because the latter reflects the
# *installed* dist-info, which doesn't update when pyproject.toml is
# bumped without a `pip install` re-run — leading to confusing stale
# version stamps in sidecars. Bump this constant and CHANGELOG.md
# together at release time.
TOOL_VERSION = "0.21.1"
try:
    # Best-effort: prefer the installed metadata when it's NEWER than the
    # baked-in constant (e.g. a downstream packager bumped the wheel
    # without editing this file). Otherwise fall back to TOOL_VERSION.
    from importlib.metadata import version as _pkg_version
    _meta_v = _pkg_version("seismo-relay")
    def _vtuple(s):
        try:
            return tuple(int(p) for p in s.split(".")[:3])
        except Exception:
            return (0, 0, 0)
    _TOOL_VERSION_DEFAULT = (
        _meta_v if _vtuple(_meta_v) > _vtuple(TOOL_VERSION) else TOOL_VERSION
    )
except Exception:
    _TOOL_VERSION_DEFAULT = TOOL_VERSION
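# Illustrative sketch (not part of this module): why the version strings
# are compared as integer tuples rather than as text. `version_tuple` and
# `pick_newer` are hypothetical names that follow the same idea as the
# block above.

```python
def version_tuple(s: str) -> tuple:
    """'0.21.1' -> (0, 21, 1); anything unparseable sorts lowest."""
    try:
        return tuple(int(p) for p in s.split(".")[:3])
    except ValueError:
        return (0, 0, 0)


def pick_newer(installed: str, baked_in: str) -> str:
    """Prefer the installed version only when it is strictly newer."""
    if version_tuple(installed) > version_tuple(baked_in):
        return installed
    return baked_in
```

# A plain string comparison would rank "0.21.10" below "0.21.9"; the
# tuple comparison ranks it above. A non-numeric component such as "1.0rc1"
# falls back to (0, 0, 0), so the baked-in constant wins.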
# ── Sidecar dict construction ─────────────────────────────────────────────────
def _ts_iso(ts: Optional[Timestamp]) -> Optional[str]:
    if ts is None:
        return None
    try:
        return datetime.datetime(
            ts.year, ts.month, ts.day,
            ts.hour or 0, ts.minute or 0, ts.second or 0,
        ).isoformat()
    except Exception:
        return str(ts)
def _peak_values_to_dict(pv: Optional[PeakValues]) -> dict:
if pv is None:
return {
"transverse": None,
"vertical": None,
"longitudinal": None,
"vector_sum": None,
"mic_psi": None,
}
return {
"transverse": pv.tran,
"vertical": pv.vert,
"longitudinal": pv.long,
"vector_sum": pv.peak_vector_sum,
"mic_psi": pv.micl,
}
def _bw_report_to_dict(report: BwAsciiReport) -> dict:
"""Project a parsed BW ASCII report into the sidecar's `bw_report` block.
All fields are rendered as plain JSON-compatible types (no datetime
objects). Channels are uniformly lowercased for stable JSON keys.
"""
def _ch(ch_name: str) -> dict:
cs = report.channels.get(ch_name)
if cs is None:
return {}
out = {
"ppv_ips": cs.ppv_ips,
"zc_freq_hz": cs.zc_freq_hz,
"time_of_peak_s": cs.time_of_peak_s,
"peak_accel_g": cs.peak_accel_g,
"peak_disp_in": cs.peak_disp_in,
}
# Drop None-valued entries; keeps the JSON tidy for partial reports.
out = {k: v for k, v in out.items() if v is not None}
# Saturation flag (only present when True) — signals that ppv_ips
# is the channel range max (a lower bound), not an exact reading.
if getattr(cs, "ppv_saturated", False):
out["ppv_saturated"] = True
# ZC Freq above device reporting ceiling (BW ">100 Hz") — value
# in zc_freq_hz is the threshold, not an exact measurement.
if getattr(cs, "zc_freq_above_range", False):
out["zc_freq_above_range"] = True
return out
def _sc(ch_name: str) -> dict:
sc = report.sensor_check.get(ch_name)
if sc is None:
return {}
out = {
"freq_hz": sc.test_freq_hz,
"ratio": sc.test_ratio,
"amplitude_mv": sc.test_amplitude_mv,
"result": sc.test_results,
}
return {k: v for k, v in out.items() if v is not None}
monitor_log = []
for entry in report.monitor_log:
e = {
"start": entry.start_time.isoformat() if entry.start_time else None,
"stop": entry.stop_time.isoformat() if entry.stop_time else None,
"description": entry.description,
}
monitor_log.append({k: v for k, v in e.items() if v is not None})
return {
"available": True,
"event_type": report.event_type,
"version": report.version,
"trigger": {
"channel": report.trigger_channel,
"geo_level_ips": report.geo_trigger_level_ips,
},
"recording": {
"sample_rate_sps": report.sample_rate_sps,
"record_time_s": report.record_time_s,
"pretrig_s": report.pretrig_s,
"stop_mode": report.record_stop_mode,
"geo_range_ips": report.geo_range_ips,
"units": report.units,
},
"device": {
"battery_volts": report.battery_volts,
"calibration_date": report.calibration_date.isoformat() if report.calibration_date else None,
"calibration_by": report.calibration_by,
},
"peaks": {
"tran": _ch("Tran"),
"vert": _ch("Vert"),
"long": _ch("Long"),
"vector_sum": {
"ips": report.peak_vector_sum_ips,
"time_s": report.peak_vector_sum_time_s,
# Histogram events have an absolute date+time for the PVS
# (the interval at which it occurred); waveform events
# only have the time_s offset.
"when": report.peak_vector_sum_when.isoformat() if report.peak_vector_sum_when else None,
# Set when BW reported the PVS as OORANGE — value is the
# conservative upper bound sqrt(3) * geo_range_ips, not
# an exact peak.
"saturated": bool(getattr(report, "peak_vector_sum_saturated", False)),
},
},
"mic": {
"weighting": report.mic.weighting,
"pspl_dbl": report.mic.pspl_dbl,
"pspl_saturated": bool(getattr(report.mic, "pspl_saturated", False)),
"zc_freq_hz": report.mic.zc_freq_hz,
"zc_freq_above_range": bool(getattr(report.mic, "zc_freq_above_range", False)),
"time_of_peak_s": report.mic.time_of_peak_s,
},
"sensor_check": {
"tran": _sc("Tran"),
"vert": _sc("Vert"),
"long": _sc("Long"),
"mic": _sc("MicL"),
},
# Histogram-specific fields (None on waveform-mode events).
# Per-channel absolute peak time/date for histograms; for
# waveforms see peaks[ch]["time_of_peak_s"] instead.
"histogram": {
"start": report.histogram_start.isoformat() if report.histogram_start else None,
"stop": report.histogram_stop.isoformat() if report.histogram_stop else None,
"n_intervals": report.histogram_n_intervals,
"interval_size": report.histogram_interval_size_str,
"interval_size_s": report.histogram_interval_size_s,
"channel_peak_when": {ch: dt.isoformat() for ch, dt in report.channel_peak_when.items()},
},
"monitor_log": monitor_log,
"pc_sw_version": report.pc_sw_version,
}
def _dbl_to_psi(pspl_dbl: float) -> float:
    """Convert dB(L) sound pressure level back to psi. Uses the same
    20 µPa reference (= 2.9e-9 psi) as the webapp so server-side and
    browser-side conversions agree."""
    return _DBL_REF_PSI * (10.0 ** (pspl_dbl / 20.0))
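# Illustrative sketch (not part of this module): the dB(L) ↔ psi
# relationship behind `_dbl_to_psi`. `psi_to_dbl` is a hypothetical
# inverse added only to show the round trip; the 2.9e-9 psi reference is
# 20 µPa divided by roughly 6894.76 Pa per psi.

```python
import math

DBL_REF_PSI = 2.9e-9  # 20 µPa expressed in psi


def dbl_to_psi(pspl_dbl: float) -> float:
    """Sound pressure level in dB(L) -> peak pressure in psi."""
    return DBL_REF_PSI * 10.0 ** (pspl_dbl / 20.0)


def psi_to_dbl(psi: float) -> float:
    """Peak pressure in psi -> sound pressure level in dB(L)."""
    return 20.0 * math.log10(psi / DBL_REF_PSI)
```

# Every +20 dB multiplies the pressure by 10, so 0 dB(L) maps to the
# reference pressure itself and 20 dB(L) to ten times that.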
def apply_report_to_event(event: Event, report: BwAsciiReport) -> None:
"""Overlay device-authoritative fields from a parsed BW ASCII report
onto an in-memory Event, IN-PLACE.
Why this exists
`read_blastware_file()` parses the BW binary and fills `Event.peak_values`
via `_peaks_from_samples()`, which treats the (still-undecoded) BW body
as raw int16 LE and so produces ±32K-shaped noise on every channel.
Result: peak values land in the SeismoDb event row as ~10 in/s on every
event regardless of the actual signal.
When a paired BW ASCII report is available, the report carries the
device's own authoritative peak / project / sample-rate / record-time
values. This helper folds those onto the Event before it flows to
`SeismoDb.insert_events()`, so the DB columns reflect the report
rather than the broken-codec output.
Fields overlaid (only when the report supplies a non-None value):
- peak_values.tran / .vert / .long (from report.channels)
- peak_values.peak_vector_sum (from report.peak_vector_sum_ips)
- peak_values.micl (psi) (from report.mic.pspl_dbl → psi)
- project_info.project / .client / .operator / .sensor_location
- sample_rate (from report.sample_rate_sps)
- rectime_seconds (from report.record_time_s)
Fields NOT touched (operator-edit / parser-output preserved):
- timestamp, raw_samples, record_type, total_samples,
pretrig_samples, _waveform_key, _a5_frames, _raw_record
- false_trigger and review state (those live on the sidecar, not on Event)
"""
if event.peak_values is None:
event.peak_values = PeakValues()
pv = event.peak_values
ch = report.channels
if (t := ch.get("Tran")) and t.ppv_ips is not None: pv.tran = t.ppv_ips
if (v := ch.get("Vert")) and v.ppv_ips is not None: pv.vert = v.ppv_ips
if (l := ch.get("Long")) and l.ppv_ips is not None: pv.long = l.ppv_ips
if report.peak_vector_sum_ips is not None:
pv.peak_vector_sum = report.peak_vector_sum_ips
if report.mic.pspl_dbl is not None and report.mic.pspl_dbl > 0:
pv.micl = _dbl_to_psi(report.mic.pspl_dbl)
if event.project_info is None:
event.project_info = ProjectInfo()
pi = event.project_info
if report.project: pi.project = report.project
if report.client: pi.client = report.client
if report.operator: pi.operator = report.operator
if report.sensor_location: pi.sensor_location = report.sensor_location
if report.sample_rate_sps:
event.sample_rate = report.sample_rate_sps
if report.record_time_s is not None:
event.rectime_seconds = report.record_time_s
def apply_bw_report_dict_to_event(event: Event, bw_report: dict) -> None:
"""Mirror of ``apply_report_to_event`` for the projected sidecar
dict shape (as produced by ``_bw_report_to_dict``).
Why this exists
The ingest path holds a live ``BwAsciiReport`` parsed straight from
the ``_ASCII.TXT`` and uses ``apply_report_to_event`` to overlay
device-authoritative peaks onto the codec output before insert.
The backfill path doesn't have the original ``.TXT`` (it's not
retained in the waveform store), but it does have the preserved
``bw_report`` block from the sidecar which contains the same
projected fields. Re-overlaying those during a backfill keeps the
DB peak columns aligned with what BW reports rather than letting
the codec output (which may be incomplete for unhandled formats or
walker edge cases) win by default.
No-ops cleanly when ``bw_report`` is ``None``, empty, or missing
any particular sub-field; only fields with a concrete value get
written. Mirrors ``apply_report_to_event``'s "report wins where
present" semantics.
"""
if not bw_report:
return
if event.peak_values is None:
event.peak_values = PeakValues()
pv = event.peak_values
peaks = bw_report.get("peaks") or {}
tran = (peaks.get("tran") or {}).get("ppv_ips")
vert = (peaks.get("vert") or {}).get("ppv_ips")
long = (peaks.get("long") or {}).get("ppv_ips")
if tran is not None: pv.tran = tran
if vert is not None: pv.vert = vert
if long is not None: pv.long = long
vs_ips = (peaks.get("vector_sum") or {}).get("ips")
if vs_ips is not None:
pv.peak_vector_sum = vs_ips
mic = bw_report.get("mic") or {}
pspl = mic.get("pspl_dbl")
if pspl is not None and pspl > 0:
pv.micl = _dbl_to_psi(pspl)
rec = bw_report.get("recording") or {}
sr = rec.get("sample_rate_sps")
if sr:
event.sample_rate = sr
rt = rec.get("record_time_s")
if rt is not None:
event.rectime_seconds = rt
def _project_info_to_dict(pi: Optional[ProjectInfo]) -> dict:
if pi is None:
return {
"project": None,
"client": None,
"operator": None,
"sensor_location": None,
}
return {
"project": pi.project,
"client": pi.client,
"operator": pi.operator,
"sensor_location": pi.sensor_location,
}
def event_to_sidecar_dict(
event: Event,
*,
serial: str,
blastware_filename: str,
blastware_filesize: int,
blastware_sha256: str,
source_kind: str = "sfm-live",
txt_filename: Optional[str] = None,
a5_pickle_filename: Optional[str] = None,
tool_version: str = _TOOL_VERSION_DEFAULT,
captured_at: Optional[datetime.datetime] = None,
review: Optional[dict] = None,
extensions: Optional[dict] = None,
bw_report: Optional[BwAsciiReport] = None,
) -> dict:
"""
Build a v1 sidecar dict from an Event + the surrounding metadata.
Pure helper: no file I/O. Callers stitch the result into a sidecar
via `write_sidecar()` (or POST it back via the PATCH endpoint).
When *bw_report* is supplied (e.g. by the ACH-forwarded import path
where Blastware writes a per-event ASCII report alongside the binary),
its decoded fields are folded into the sidecar:
- A new top-level ``bw_report`` block carries the rich derived
per-channel stats (Peak Acceleration, Peak Displacement, ZC Freq,
Time of Peak), the Peak Vector Sum + time, the per-channel sensor
self-check results, and monitor-log timestamps.
- ``peak_values`` is overlaid from the report (the report's PPV/PVS
values are computed by the device firmware and are authoritative;
anything ``read_blastware_file()`` derived from samples is
approximate at best until the body codec is decoded).
- ``project_info`` is overlaid from the report when the report
supplies a non-empty value (the report mirrors the device's
compliance config, which is what BW shows in its event report).
- ``event.timestamp`` is overlaid from the report's Event Date +
Event Time (BW's report timestamps are second-resolution and
match the binary's footer; we prefer the report value because
the BW-binary footer timestamp can drift on some firmware).
"""
if source_kind not in {"sfm-live", "sfm-ach", "bw-import", "idf-import"}:
raise ValueError(f"unknown source_kind: {source_kind!r}")
captured_at = captured_at or datetime.datetime.utcnow()
# ── Overlay event fields from the report when present ───────────────────
timestamp_iso = _ts_iso(event.timestamp)
if bw_report and bw_report.event_datetime:
timestamp_iso = bw_report.event_datetime.isoformat()
# Build peak_values, optionally overlaid from the report. The report
# stores Mic peak as PSPL (dB(L)); we convert to psi to match the
# existing peak_values.mic_psi field.
peak_dict = _peak_values_to_dict(event.peak_values)
if bw_report:
ch = bw_report.channels
if (t := ch.get("Tran")) and t.ppv_ips is not None: peak_dict["transverse"] = t.ppv_ips
if (v := ch.get("Vert")) and v.ppv_ips is not None: peak_dict["vertical"] = v.ppv_ips
if (l := ch.get("Long")) and l.ppv_ips is not None: peak_dict["longitudinal"] = l.ppv_ips
if bw_report.peak_vector_sum_ips is not None:
peak_dict["vector_sum"] = bw_report.peak_vector_sum_ips
if bw_report.mic.pspl_dbl is not None and bw_report.mic.pspl_dbl > 0:
peak_dict["mic_psi"] = _dbl_to_psi(bw_report.mic.pspl_dbl)
# Project info: overlay from report (the report mirrors the
# session-start compliance config that BW renders in event reports).
proj_dict = _project_info_to_dict(event.project_info)
if bw_report:
if bw_report.project: proj_dict["project"] = bw_report.project
if bw_report.client: proj_dict["client"] = bw_report.client
if bw_report.operator: proj_dict["operator"] = bw_report.operator
if bw_report.sensor_location: proj_dict["sensor_location"] = bw_report.sensor_location
# Event-block fields: overlay from report where available.
event_block = {
"serial": serial,
"timestamp": timestamp_iso,
"waveform_key": event._waveform_key.hex() if event._waveform_key else None,
"record_type": event.record_type,
"sample_rate": event.sample_rate,
"rectime_seconds": event.rectime_seconds,
"total_samples": event.total_samples,
"pretrig_samples": event.pretrig_samples,
}
if bw_report:
# Report values are authoritative — they're the user-configured
# values BW reads back, not STRT-derived guesses. In particular
# `event.rectime_seconds` from `read_blastware_file()` reads
# STRT[18] which is actually the `0x46` record-type marker (= 70)
# rather than the user's Record Time setting. Always overwrite.
if bw_report.sample_rate_sps:
event_block["sample_rate"] = bw_report.sample_rate_sps
if bw_report.record_time_s is not None:
event_block["rectime_seconds"] = bw_report.record_time_s
# Derive total_samples + pretrig_samples per channel from the
# report's sample_rate × times. These match the row count of
# the report's sample table (verified: event-c reports 1024 sps
# × (1.0 + 0.25) = 1280 rows).
if (sr := bw_report.sample_rate_sps) and bw_report.record_time_s is not None:
pretrig_s = abs(bw_report.pretrig_s) if bw_report.pretrig_s is not None else 0.0
event_block["total_samples"] = int(round(sr * (bw_report.record_time_s + pretrig_s)))
event_block["pretrig_samples"] = int(round(sr * pretrig_s))
out = {
"schema_version": SCHEMA_VERSION,
"kind": SIDECAR_KIND,
"event": event_block,
"peak_values": peak_dict,
"project_info": proj_dict,
"blastware": {
"filename": blastware_filename,
"filesize": blastware_filesize,
"sha256": blastware_sha256,
"available": True,
},
"source": {
"kind": source_kind,
"captured_at": captured_at.isoformat() + "Z" if captured_at.tzinfo is None else captured_at.isoformat(),
"tool_version": tool_version,
"a5_pickle_filename": a5_pickle_filename,
"txt_filename": txt_filename,
},
"review": review or {
"false_trigger": False,
"reviewer": None,
"reviewed_at": None,
"notes": "",
},
"extensions": extensions or {},
}
if bw_report:
out["bw_report"] = _bw_report_to_dict(bw_report)
return out
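# Illustrative sketch (not part of this module): the sample-count
# derivation used in `event_to_sidecar_dict` when a report is present.
# `derive_sample_counts` is a hypothetical name. Taking abs() of the
# pre-trigger length follows the code above; that the report may carry
# it as a negative offset is an assumption.

```python
from typing import Tuple


def derive_sample_counts(
    sample_rate_sps: int, record_time_s: float, pretrig_s: float,
) -> Tuple[int, int]:
    """Return (total_samples, pretrig_samples) per channel."""
    pretrig = abs(pretrig_s)
    total = int(round(sample_rate_sps * (record_time_s + pretrig)))
    pre = int(round(sample_rate_sps * pretrig))
    return total, pre
```

# For the case cited in the comment above, 1024 sps with 1.0 s record time
# and 0.25 s pre-trigger gives 1024 × 1.25 = 1280 rows, 256 of them
# pre-trigger.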
# ── Sidecar IO ────────────────────────────────────────────────────────────────
def write_sidecar(path: Union[str, Path], data: dict) -> None:
    """
    Atomic write of a sidecar dict to <path>.
    Validates schema_version is supported before writing so we don't
    silently drop a future-format sidecar over the wire.
    """
    path = Path(path)
    sv = data.get("schema_version")
    if not isinstance(sv, int) or sv < 1 or sv > SCHEMA_VERSION:
        raise ValueError(
            f"write_sidecar: unsupported schema_version={sv!r} "
            f"(this build supports 1..{SCHEMA_VERSION})"
        )
    tmp = path.with_suffix(path.suffix + ".tmp")
    with tmp.open("w", encoding="utf-8") as f:
        json.dump(data, f, indent=2, sort_keys=False, default=str)
        f.write("\n")
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)
def read_sidecar(path: Union[str, Path]) -> dict:
    """
    Load a sidecar JSON file.
    Raises FileNotFoundError if missing, ValueError on bad shape /
    unsupported schema_version. Unknown keys at the top level are
    preserved in the returned dict (forward-compat).
    """
    path = Path(path)
    with path.open("r", encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, dict):
        raise ValueError(f"sidecar at {path}: top-level is not a JSON object")
    sv = data.get("schema_version")
    if not isinstance(sv, int) or sv < 1:
        raise ValueError(f"sidecar at {path}: missing or invalid schema_version")
    if sv > SCHEMA_VERSION:
        raise ValueError(
            f"sidecar at {path}: schema_version={sv} > supported {SCHEMA_VERSION}; "
            "upgrade seismo-relay to read this file"
        )
    if data.get("kind") != SIDECAR_KIND:
        raise ValueError(f"sidecar at {path}: unexpected kind={data.get('kind')!r}")
    return data
def patch_sidecar(
path: Union[str, Path],
*,
review: Optional[dict] = None,
extensions: Optional[dict] = None,
reviewer_now: bool = True,
) -> dict:
"""
Atomically apply a JSON-merge-patch to a sidecar file's `review`
and/or `extensions` blocks. Other top-level keys are untouched.
`review_now`: when True (default) and `review` is non-empty, stamps
`review.reviewed_at` with the current UTC time so the review-time is
auditable without the caller having to pass it.
Returns the new full sidecar dict.
"""
path = Path(path)
data = read_sidecar(path)
if review:
merged = dict(data.get("review") or {})
merged.update({k: v for k, v in review.items() if v is not None or k in merged})
if reviewer_now:
merged["reviewed_at"] = datetime.datetime.utcnow().isoformat() + "Z"
data["review"] = merged
if extensions:
merged_ext = dict(data.get("extensions") or {})
merged_ext.update(extensions)
data["extensions"] = merged_ext
write_sidecar(path, data)
return data
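The `review` merge rule above is slightly different from a strict RFC 7396 merge patch, where a null value deletes the key. A small sketch of the rule as written (the helper name `merge_review` and the sample keys are illustrative assumptions):

```python
def merge_review(existing, patch):
    """Mirror the dict-merge expression used for the `review` block."""
    merged = dict(existing or {})
    # None only overwrites keys that already exist; it never creates a new key.
    merged.update({k: v for k, v in patch.items() if v is not None or k in merged})
    return merged


before = {"status": "flagged", "note": "check mic"}
after = merge_review(before, {"note": None, "reviewer": None, "status": "ok"})
```

Here `note` is reset to `None` because it already existed, while `reviewer` is skipped because it did not.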
def sidecar_path_for(blastware_path: Union[str, Path]) -> Path:
"""Convention: <bw_path>.sfm.json sits next to the BW binary."""
p = Path(blastware_path)
return p.with_name(p.name + ".sfm.json")
def file_sha256(path: Union[str, Path], chunk_size: int = 65536) -> str:
"""Compute SHA-256 of a file as a hex string."""
h = hashlib.sha256()
with open(path, "rb") as f:
while True:
chunk = f.read(chunk_size)
if not chunk:
break
h.update(chunk)
return h.hexdigest()
# ── Blastware-file reader ─────────────────────────────────────────────────────
#
# Reverse of `blastware_file.write_blastware_file`. Used by the BW-import
# flow to ingest files produced by Blastware's own ACH (where the source
# A5 frames are not available).
#
# File structure (recap):
# [22B header] [21B STRT record] [body bytes] [26B footer]
#
# The body holds:
# - 6B preamble (00 00 ff ff ff ff) immediately after the STRT
# - 4-channel interleaved int16 LE samples
# - Embedded ASCII metadata strings (Project: / Client: / User Name: /
# Seis Loc: / Extended Notes) from the device's session-start config
#
# The 0C waveform record (per-event peaks, project name) is NOT in the
# BW file — those are computed by the device firmware and only carried
# in the live SUB 0C response. read_blastware_file() therefore computes
# peaks from the raw samples assuming Normal-range (10 in/s full-scale)
# geophone sensitivity. Imported events surface that assumption via the
# sidecar's `peak_values.computed_from_samples` flag.
# Geophone scale factor, in/s per ADC unit, for Normal range (10 in/s FS).
# Confirmed from CLAUDE.md (geo_hardware_constant = 6.206053 in/s per V,
# ADC full-scale = 1.61133 V → Normal range = 10.0 in/s peak; per-count
# resolution ≈ 10.0 / 32768).
_GEO_NORMAL_FS_INS = 10.0
_GEO_SENSITIVE_FS_INS = 1.250
_INT16_FS = 32768.0
# Microphone scale factor, psi per ADC count. Approximate — exact factor
# depends on the geophone-vs-mic ADC scaling and the firmware reference.
# We mark mic_psi as "computed approximate" in the sidecar.
_MIC_FS_PSI = 0.0125 / _INT16_FS # ~0.5 psi full-scale assumption
def _decode_strt(strt: bytes) -> dict:
"""
Decode the 21-byte STRT record from a BW file.
Returns dict with waveform_key (4B, as a hex string), total_samples,
pretrig_samples, rectime_seconds. Returns an empty dict if the record
is truncated or does not start with the STRT magic.
"""
if len(strt) < 21 or strt[0:4] != b"STRT":
return {}
return {
"waveform_key": strt[6:10].hex(),
"total_samples": struct.unpack_from(">H", strt, 8)[0],
"pretrig_samples": struct.unpack_from(">H", strt, 16)[0],
"rectime_seconds": strt[18],
}
def _find_first_string(buf: bytes, label: bytes, max_len: int = 256) -> Optional[str]:
"""
Search `buf` for `label` (e.g. b"Project:") and return the
null-terminated ASCII string that follows, stripped.
"""
pos = buf.find(label)
if pos < 0:
return None
start = pos + len(label)
end = buf.find(b"\x00", start, start + max_len)
if end < 0:
end = start + max_len
text = buf[start:end].decode("ascii", errors="replace").strip()
return text or None
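A short usage sketch of the label-anchored search, assuming a synthetic body buffer. The function is restated under a public name so the block runs on its own; the project strings are made up.

```python
def find_first_string(buf, label, max_len=256):
    """Return the NUL-terminated ASCII text that follows `label`, or None."""
    pos = buf.find(label)
    if pos < 0:
        return None
    start = pos + len(label)
    end = buf.find(b"\x00", start, start + max_len)
    if end < 0:
        end = start + max_len
    return buf[start:end].decode("ascii", errors="replace").strip() or None


# Synthetic body: binary noise, then two labelled fields, then more noise.
body = b"\x00\x01junk" + b"Project: Quarry East\x00" + b"Client:\x00" + b"\xff\xff"
project = find_first_string(body, b"Project:")
client = find_first_string(body, b"Client:")      # empty value collapses to None
missing = find_first_string(body, b"Seis Loc:")   # label absent
```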
def _decode_samples_4ch_int16_le(stream: bytes) -> dict[str, list[int]]:
"""
Decode a 4-channel interleaved int16 LE byte stream into per-channel
lists. Channels are [Tran, Vert, Long, Mic] = [ch0, ch1, ch2, ch3].
Truncates to a multiple of 8 bytes (one full sample-set).
"""
n_complete = (len(stream) // 8) * 8
if n_complete == 0:
return {"Tran": [], "Vert": [], "Long": [], "MicL": []}
fmt = "<" + "h" * (n_complete // 2)
flat = list(struct.unpack(fmt, stream[:n_complete]))
return {
"Tran": flat[0::4],
"Vert": flat[1::4],
"Long": flat[2::4],
"MicL": flat[3::4],
}
def _peaks_from_samples(samples: dict[str, list[int]]) -> PeakValues:
"""
Compute approximate peaks from raw int16 samples assuming Normal-range
geophone sensitivity. Used by the BW-importer when the 0C waveform
record (the device's authoritative peaks) is unavailable.
"""
def _peak_ins(ch: list[int]) -> float:
if not ch:
return 0.0
m = max(abs(int(v)) for v in ch)
return m / _INT16_FS * _GEO_NORMAL_FS_INS
tran = _peak_ins(samples.get("Tran", []))
vert = _peak_ins(samples.get("Vert", []))
long_ = _peak_ins(samples.get("Long", []))
# Mic in psi (approximate)
mic_ch = samples.get("MicL", []) or []
mic = max((abs(int(v)) for v in mic_ch), default=0) * _MIC_FS_PSI
# Peak vector sum: max over time of sqrt(T^2 + V^2 + L^2)
pvs = 0.0
n = min(len(samples.get("Tran", [])), len(samples.get("Vert", [])), len(samples.get("Long", [])))
if n:
scale = _GEO_NORMAL_FS_INS / _INT16_FS
T = samples["Tran"]; V = samples["Vert"]; L = samples["Long"]
for i in range(n):
t = T[i] * scale
v = V[i] * scale
l = L[i] * scale
mag = (t*t + v*v + l*l) ** 0.5
if mag > pvs:
pvs = mag
return PeakValues(
tran=tran, vert=vert, long=long_,
peak_vector_sum=pvs, micl=mic,
)
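A worked example of why the peak vector sum is taken per sample instead of being combined from the per-channel peaks. The sample values are invented, and the scale constants are copied from the Normal-range assumption above.

```python
import math

FS_INS = 10.0        # Normal-range full scale in in/s (assumed, as above)
INT16_FS = 32768.0


def peak_vector_sum(tran, vert, long_):
    """Max over time of sqrt(T^2 + V^2 + L^2), in in/s."""
    scale = FS_INS / INT16_FS
    return max(
        (math.sqrt((t * scale) ** 2 + (v * scale) ** 2 + (l * scale) ** 2)
         for t, v, l in zip(tran, vert, long_)),
        default=0.0,
    )


# Tran and Vert each peak at 5.0 in/s, but on different samples.
tran = [0, 16384, 0]
vert = [0, 0, 16384]
long_ = [0, 0, 0]
pvs = peak_vector_sum(tran, vert, long_)
naive = math.sqrt(5.0 ** 2 + 5.0 ** 2)   # combining channel peaks overstates PVS
```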
_RECORD_TYPE_BY_EXT_SUFFIX = {
'H': 'Histogram',
'W': 'Waveform',
'M': 'Manual',
'E': 'Event',
'C': 'Combo',
}
def derive_record_type_from_filename(filename, default: str = "Waveform") -> str:
"""Derive a BW Event's record_type from its filename's extension suffix.
V10.72+ MiniMate Plus firmware encodes the event type as the LAST
character of the extension (the `T` in BW's `AB0T` scheme):
``M529LKIQ.G10H`` → H → ``"Histogram"``
``T350L385.VY0W`` → W → ``"Waveform"``
``...M`` → M → ``"Manual"``
``...E`` → E → ``"Event"``
``...C`` → C → ``"Combo"``
Old S338 firmware uses 3-char extensions ending in ``0`` whose
encoding is not yet known; those fall through to ``default``.
Micromate Series 4 uses a different scheme entirely (observed:
``IDFH``, ``IDFW``) but the LAST-char convention (H / W) still holds
for the type code, so it works for both families.
Returns ``default`` if filename is empty, has no extension, or the
suffix char isn't a recognized type code.
"""
if not filename:
return default
try:
name = Path(filename).name
except (TypeError, ValueError):
return default
if '.' not in name:
return default
ext = name.rsplit('.', 1)[1]
if not ext:
return default
return _RECORD_TYPE_BY_EXT_SUFFIX.get(ext[-1].upper(), default)
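The last-character convention can be exercised with a compact stand-alone version. This is a sketch: `record_type` is a simplified restatement and `old.S30` is a hypothetical S338-style name.

```python
from pathlib import Path

TYPE_BY_SUFFIX = {"H": "Histogram", "W": "Waveform", "M": "Manual",
                  "E": "Event", "C": "Combo"}


def record_type(filename, default="Waveform"):
    """Map the last character of the extension to a record type."""
    name = Path(filename).name if filename else ""
    if "." not in name:
        return default
    ext = name.rsplit(".", 1)[1]
    return TYPE_BY_SUFFIX.get(ext[-1].upper(), default) if ext else default


minimate = record_type("M529LKIQ.G10H")
micromate = record_type("UM6047_20250804190047.IDFH")
legacy = record_type("old.S30", default="Unknown")   # trailing "0" is not a type code
no_ext = record_type("noext", default="Unknown")
```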
def read_blastware_file(path: Union[str, Path]) -> Event:
"""
Parse a Blastware waveform file into an Event.
Recovers:
- waveform_key, rectime_seconds, total_samples, pretrig_samples
(from the STRT record)
- timestamp (from the footer's start-time field)
- project_info (from ASCII labels embedded in the body)
- raw_samples (Tran/Vert/Long/MicL int16 lists)
- peak_values (computed from raw_samples; approximate, see notes
on _peaks_from_samples about Normal-range assumption)
Does NOT recover the source A5 frames (they aren't in the BW file).
The returned Event has `_a5_frames = None`, signalling that
byte-for-byte regeneration of the BW file from this Event alone is
not possible; the on-disk BW file IS the byte-for-byte source.
"""
path = Path(path)
raw = path.read_bytes()
if len(raw) < _bw._WAVEFORM_HEADER_SIZE + 21 + 26:
raise ValueError(f"{path}: file too short ({len(raw)} bytes) to be a BW event")
# Header: validate magic prefix.
header = raw[:_bw._WAVEFORM_HEADER_SIZE]
if not header.startswith(_bw._FILE_HEADER_PREFIX):
raise ValueError(f"{path}: not a Blastware file (bad header prefix)")
# STRT record: 21 bytes immediately after the header.
strt_raw = raw[_bw._WAVEFORM_HEADER_SIZE : _bw._WAVEFORM_HEADER_SIZE + 21]
strt_fields = _decode_strt(strt_raw)
if not strt_fields:
raise ValueError(f"{path}: STRT record missing or malformed")
# Footer: locate the 0e 08 marker, validating the year is in a sane range.
body_start = _bw._WAVEFORM_HEADER_SIZE + 21
footer_pos = -1
pos = body_start
while True:
pos = raw.find(b"\x0e\x08", pos)
if pos < 0 or pos + 26 > len(raw):
break
yr = (raw[pos + 4] << 8) | raw[pos + 5]
if 2015 <= yr <= 2050:
footer_pos = pos
break
pos += 1
if footer_pos < 0 and len(raw) >= 26:
footer_pos = len(raw) - 26
if footer_pos < body_start:
raise ValueError(f"{path}: footer not found")
body = raw[body_start : footer_pos]
footer = raw[footer_pos : footer_pos + 26]
# Footer layout:
# [0:2] 0e 08 marker
# [2:10] ts1 (start) BE 8B
# [10:18] ts2 (stop) BE 8B
# [18:24] 00 01 00 02 00 00
# [24:26] crc
ts1 = _bw._decode_ts_be(footer[2:10])
ts2 = _bw._decode_ts_be(footer[10:18])
# Body: decode via the verified body codecs. Two formats coexist:
#
# 1. Waveform-mode (.AB0W) — starts with 7-byte preamble
# ``00 02 00 [Tran[0] BE] [Tran[1] BE]`` followed by the
# tagged-block delta stream documented in
# ``docs/waveform_codec_re_status.md`` and §7.6.1 of the
# protocol reference. Decoded by ``waveform_codec.decode_waveform_v2``.
#
# 2. Histogram-mode (.AB0H) — a sequence of 32-byte blocks, one
# per histogram interval, each carrying per-channel peak +
# half-period values. Decoded by
# ``histogram_codec.decode_histogram_body``. Both codecs
# return the same channel-grouped output shape, so consumers
# don't need to special-case mode.
#
# The historical ``_decode_samples_4ch_int16_le`` int16-LE
# interpretation was retracted 2026-05-08 (see protocol-ref §7.6.1
# retraction box) — it produced ±32K noise on every event.
#
# If both codecs fail (malformed file, truncated body, unrecognised
# mode, synthetic test input), fall back to empty channels — the
# rest of the event (timestamp, waveform_key, project strings) is
# still recoverable and useful.
decoded = decode_waveform_v2(body)
if decoded is None:
decoded = decode_histogram_body(body)
if decoded is None:
log.warning(
"%s: body codec failed to decode (body starts %s) — "
"raw_samples will be empty", path, body[:8].hex(" "),
)
samples = {"Tran": [], "Vert": [], "Long": [], "MicL": []}
else:
samples = decoded_to_adc_counts(decoded)
# Metadata strings (label-anchored search across the body).
project = _find_first_string(body, b"Project:")
client = _find_first_string(body, b"Client:")
user = _find_first_string(body, b"User Name:")
seisloc = _find_first_string(body, b"Seis Loc:")
# Build the Event.
ev = Event(index=-1)
if strt_fields.get("waveform_key"):
ev._waveform_key = bytes.fromhex(strt_fields["waveform_key"])
# Derive record_type from the filename's extension suffix (H/W/M/E/C).
# When called from save_imported_bw the path here is a tmp file with a
# ".bw" suffix, so the derivation falls back to "Waveform" and the
# caller overrides ev.record_type using the original filename — see
# waveform_store.save_imported_bw.
ev.record_type = derive_record_type_from_filename(path.name)
ev.rectime_seconds = strt_fields.get("rectime_seconds")
ev.total_samples = strt_fields.get("total_samples")
ev.pretrig_samples = strt_fields.get("pretrig_samples")
if ts1 is not None:
ev.timestamp = Timestamp(
raw=footer[2:10],
flag=0x10,
year=ts1.year, unknown_byte=0, month=ts1.month, day=ts1.day,
hour=ts1.hour, minute=ts1.minute, second=ts1.second,
)
ev.project_info = ProjectInfo(
project=project, client=client, operator=user, sensor_location=seisloc,
)
ev.raw_samples = samples
# Only compute peaks from samples when we actually have samples.
# For events that neither body codec could decode, every channel list
# in ``samples`` is empty and ``_peaks_from_samples`` would return
# PeakValues(0, 0, 0, 0, 0).
# That would then OVERWRITE existing good DB peak values (e.g. from
# paired BW ASCII reports) during the backfill UPSERT path.
# Leaving peak_values=None signals "we don't know" to downstream
# consumers; the backfill script seeds from the DB row when it sees
# None, and ``apply_report_to_event`` overlays from a paired ASCII
# report when one is supplied.
has_samples = any(samples.get(ch) for ch in ("Tran", "Vert", "Long", "MicL"))
ev.peak_values = _peaks_from_samples(samples) if has_samples else None
ev._a5_frames = None # not recoverable from BW file
return ev
+191 -35
@@ -111,20 +111,24 @@ def build_5a_frame(offset_word: int, raw_params: bytes) -> bytes:
verified against this algorithm on 2026-04-02).
Args:
offset_word: 16-bit offset (0x1004 for probe/chunks, 0x005A for term).
raw_params: 10 or 11 params bytes (from bulk_waveform_params or
bulk_waveform_term_params). 0x10 bytes in params are
written RAW, NOT DLE-stuffed. Confirmed 2026-04-06 by
comparing wire bytes: BW sends bare `10 04` for chunk 1
(counter=0x1004), not stuffed `10 10 04`. Device reads
params at fixed byte positions; stuffing shifts the bytes
and corrupts the counter, causing device to ignore the frame.
offset_word: 16-bit offset. For probe/chunks/metadata pages this is
`0x1002`. For the proper TERM frame this is computed by
`bulk_waveform_term_v2()` from the STRT-derived
`end_offset`.
raw_params: 10, 11, or 12 params bytes (from `bulk_waveform_params`
for probes/samples, `bulk_waveform_term_v2` for TERM, or
a manually-built 12-byte block for the metadata pages
0x1002 / 0x1004). See gotcha #3 below — params region
uses partial DLE stuffing of 0x10 bytes.
Returns:
Complete frame bytes: [ACK][STX][stuffed_section][chk][ETX]
"""
if len(raw_params) not in (10, 11):
raise ValueError(f"raw_params must be 10 or 11 bytes, got {len(raw_params)}")
if len(raw_params) not in (10, 11, 12):
# 10 = termination params; 11 = regular probe / chunk params;
# 12 = metadata-page params (extra trailing 0x00 — BW byte-perfect quirk
# for the two fixed metadata reads at counter=0x1002 and 0x1004).
raise ValueError(f"raw_params must be 10/11/12 bytes, got {len(raw_params)}")
# Build stuffed section between STX and checksum
s = bytearray()
@@ -134,8 +138,40 @@ def build_5a_frame(offset_word: int, raw_params: bytes) -> bytes:
s += b"\x00" # field3
s += bytes([(offset_word >> 8) & 0xFF, # offset_hi — raw, NOT stuffed
offset_word & 0xFF]) # offset_lo
for b in raw_params: # params — NOT DLE-stuffed (raw bytes, match BW wire format)
# Params — partial DLE stuffing of 0x10 bytes (CONFIRMED 2026-05-05).
#
# The device's de-stuffing rule for params is:
# • `10 10` → de-stuffs to `10`
# • `10 02/03/04` → kept literal (these are inner-frame markers)
# • `10 X` other → de-stuffs to just `X` (drops the 0x10)
#
# So for any 0x10 byte in the *logical* params that is followed by a
# byte NOT in {0x02, 0x03, 0x04, 0x10}, we must double the 0x10 on the
# wire (`10 X` → `10 10 X`) so the device's de-stuffer reproduces the
# original `10 X` pair. Without this, counter values with `0x10` in
# the high byte (e.g. counter=0x1000 has params bytes `10 00`) are
# silently corrupted to `0x__00` on the device side, and the device
# responds for the wrong address — for counter=0x1000 it returns the
# probe response (counter=0x0000), which contains the file header +
# STRT. That STRT block then lands in the assembled file body and
# Blastware rejects the file as malformed.
#
# Confirmed against BW capture 5-1-26 / bwcap3sec frame 20: params
# logical bytes `00 01 11 10 00 00 00 00 00 00 00` (counter=0x1000)
# are encoded on the wire as `00 01 11 10 10 00 00 00 00 00 00 00`.
# BW frames 13/14 (meta @ 0x1002 / 0x1004) leave `10 02` and `10 04`
# raw — the device handles those literal pairs correctly.
i = 0
while i < len(raw_params):
b = raw_params[i]
s.append(b)
if (
b == 0x10
and i + 1 < len(raw_params)
and raw_params[i + 1] not in (0x02, 0x03, 0x04, 0x10)
):
s.append(0x10) # double the 0x10 so it survives device de-stuffing
i += 1
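The partial-stuffing rule in the loop above can be checked against the capture bytes quoted in the comment. The sketch below restates the loop as a stand-alone helper (`stuff_params` is a hypothetical name) and covers only the encoder side, not the device's de-stuffer.

```python
LITERAL_FOLLOWERS = (0x02, 0x03, 0x04, 0x10)


def stuff_params(raw_params: bytes) -> bytes:
    """Double a 0x10 unless the byte after it is 0x02/0x03/0x04/0x10."""
    out = bytearray()
    for i, b in enumerate(raw_params):
        out.append(b)
        nxt = raw_params[i + 1] if i + 1 < len(raw_params) else None
        if b == 0x10 and nxt is not None and nxt not in LITERAL_FOLLOWERS:
            out.append(0x10)
    return bytes(out)


# counter=0x1000: the `10 00` pair must go out as `10 10 00`.
logical = bytes.fromhex("00 01 11 10 00 00 00 00 00 00 00")
wire = stuff_params(logical)

# Metadata page at 0x1002: `10 02` stays raw on the wire.
meta_logical = bytes.fromhex("00 01 11 10 02 00 00 00 00 00 00 00")
meta_wire = stuff_params(meta_logical)
```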
# DLE-aware checksum: for 0x10 XX pairs count XX; for lone bytes count them
chk, i = 0, 0
@@ -398,28 +434,26 @@ def bulk_waveform_params(key4: bytes, counter: int, *, is_probe: bool = False) -
def bulk_waveform_term_params(key4: bytes, counter: int) -> bytes:
"""
Build the 10-byte params block for the SUB 5A termination request.
DEPRECATED: DO NOT USE IN NEW CODE.
The termination request uses offset=0x005A and a DIFFERENT params layout:
the leading 0x00 byte is dropped, key4[0:2] shifts to params[0:2], and the
counter high byte is at params[2]:
This is the v1 termination params helper, paired with the broken
`_BULK_TERM_OFFSET = 0x005A` magic offset_word. Together they produce a
~100-byte device-side terminator response that does NOT contain the
partial-last-chunk waveform tail or the 26-byte file footer. Files
reconstructed using this terminator are missing their last ~512 bytes of
waveform data and have a synthesized footer that disagrees with what BW
would have written.
params[0] = key4[0]
params[1] = key4[1]
params[2] = (counter >> 8) & 0xFF
params[3:] = zeros
**For new code, use `bulk_waveform_term_v2(key4, end_offset, last_chunk_counter)`**
which computes the correct offset_word + params from the STRT-derived
`end_offset`. v2 produces wire bytes that match BW exactly across all
tested events (4-27-26 / 5-1-26 / 5-4-26 captures).
Counter for the termination request = last_regular_counter + 0x0400.
Confirmed from 1-2-26 BW TX capture: final request (frame 83) uses
offset=0x005A, params[0:3] = key4[0:2] + term_counter_hi.
Args:
key4: 4-byte waveform key.
counter: Termination counter (= last regular counter + 0x0400).
Returns:
10-byte params block.
This function is retained ONLY for the defensive fallback path in
`read_bulk_waveform_stream()` that triggers when STRT parsing fails or no
chunks are fetched (= a malformed event or an unexpected device state).
The fallback already logs a WARNING when it activates; if you see that
warning, the bug is upstream: STRT should have been parseable.
"""
if len(key4) != 4:
raise ValueError(f"waveform key must be 4 bytes, got {len(key4)}")
@@ -430,6 +464,123 @@ def bulk_waveform_term_params(key4: bytes, counter: int) -> bytes:
return bytes(p)
def bulk_waveform_term_v2(
key4: bytes,
end_offset: int,
last_chunk_counter: int,
) -> tuple[int, bytes]:
"""
Compute the SUB 5A TERM frame's offset_word and 10-byte params block.
Confirmed across 3 events (4-27-26 + 5-1-26 captures):
next_boundary = last_chunk_counter + 0x0200
offset_word = end_offset - next_boundary (residual byte count)
params[0] = key4[0] (= 0x01 on every observed device)
params[1] = key4[1] (= 0x11)
params[2] = (next_boundary >> 8) & 0xFF
params[3] = next_boundary & 0xFF
params[4:10] = zeros
Verification:
| end_offset | last_chunk | next_boundary | offset_word | params[2:4] |
| 0x1ABE | 0x1800 | 0x1A00 | 0x00BE | 1A 00 |
| 0x21F2 | 0x1E00 | 0x2000 | 0x01F2 | 20 00 |
| 0x417E | 0x3E38 | 0x4038 | 0x0146 | 40 38 |
The device receives `requested_address = (params[2] << 8) | offset_word`
and replies with `(end_offset - next_boundary)` bytes of waveform tail
starting at `next_boundary`, including the 26-byte file footer.
Args:
key4: 4-byte waveform key for this event.
end_offset: Event-end pointer (= `(end_key[2] << 8) | end_key[3]`
from the STRT record at data[23:27] of A5[0]).
last_chunk_counter: Counter of the last full 0x0200-byte chunk fetched
(the chunk that covers [last_chunk_counter,
last_chunk_counter + 0x0200)).
Returns:
(offset_word, params10) tuple. Pass as
`build_5a_frame(offset_word, params)`.
Raises:
ValueError: on inconsistent inputs.
"""
if len(key4) != 4:
raise ValueError(f"waveform key must be 4 bytes, got {len(key4)}")
next_boundary = last_chunk_counter + 0x0200
if next_boundary > 0xFFFF:
raise ValueError(
f"next_boundary 0x{next_boundary:04X} exceeds uint16; check inputs"
)
if end_offset <= last_chunk_counter:
raise ValueError(
f"end_offset 0x{end_offset:04X} must be > "
f"last_chunk_counter 0x{last_chunk_counter:04X}"
)
offset_word = end_offset - next_boundary
if offset_word < 0:
# Last chunk overshot end_offset; caller should have stopped one chunk
# earlier. Treat as zero residual.
offset_word = 0
if offset_word > 0xFFFF:
raise ValueError(
f"offset_word 0x{offset_word:04X} exceeds uint16"
)
p = bytearray(10)
p[0] = key4[0]
p[1] = key4[1]
p[2] = (next_boundary >> 8) & 0xFF
p[3] = next_boundary & 0xFF
return offset_word, bytes(p)
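The verification table in the docstring can be reproduced with a few lines of arithmetic. This is a sketch that restates the formula without input validation; `term_frame_fields` is a hypothetical name and the key value is taken from the `01 11 ..` pattern noted above.

```python
def term_frame_fields(key4, end_offset, last_chunk_counter):
    """Return (offset_word, 10-byte params) for the TERM frame."""
    next_boundary = last_chunk_counter + 0x0200
    offset_word = max(end_offset - next_boundary, 0)   # residual byte count
    params = bytes([key4[0], key4[1],
                    (next_boundary >> 8) & 0xFF,
                    next_boundary & 0xFF]) + bytes(6)
    return offset_word, params


key = bytes.fromhex("01110000")
# (end_offset, last_chunk_counter) rows from the verification table.
rows = [(0x1ABE, 0x1800), (0x21F2, 0x1E00), (0x417E, 0x3E38)]
results = [term_frame_fields(key, end, last) for end, last in rows]
offset_words = [offset for offset, _ in results]
boundary_bytes = [params[2:4].hex() for _, params in results]
```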
# ── End-offset extraction from STRT record ────────────────────────────────────
STRT_MARKER = b"STRT"
def parse_strt_end_offset(a5_data: bytes) -> Optional[int]:
"""
Extract the event-end offset from the STRT record in an A5 response payload.
The first A5 response (the probe response, or the first chunk for events
with non-zero start_key[2:4]) contains a STRT record at byte offset 17 of
`data`. Layout:
data[17:21] "STRT"
data[21:23] ff fe sentinel
data[23:27] end_key 4-byte key of where this event ENDS
data[27:31] start_key
...
Returns `(end_key[2] << 8) | end_key[3]`, the absolute device-buffer
address where the event ends. Use this to bound the chunk loop and to
compute the TERM frame.
Verified end_offset values:
| event start_key | end_key | end_offset |
| 01110000 | 01111ABE | 0x1ABE |
| 01110000 | 011121F2 | 0x21F2 |
| 011121F2 | 0111417E | 0x417E |
Args:
a5_data: The `data` field of an A5 response frame (frame.data).
Returns:
The end_offset (uint16) if STRT is found, else None.
"""
pos = a5_data.find(STRT_MARKER)
if pos < 0 or pos + 10 > len(a5_data):
return None
# data[pos+4:pos+6] is "ff fe"; data[pos+6:pos+10] is end_key.
end_key = a5_data[pos + 6 : pos + 10]
if len(end_key) < 4:
return None
return (end_key[2] << 8) | end_key[3]
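A quick check of the STRT layout described in the docstring, using a synthetic A5 payload. The 17 leading bytes are zero-filled placeholders; only the STRT marker, sentinel, and keys follow the documented layout.

```python
def end_offset_from_a5(a5_data):
    """Find STRT and return the low 16 bits of end_key, or None."""
    pos = a5_data.find(b"STRT")
    if pos < 0 or pos + 10 > len(a5_data):
        return None
    end_key = a5_data[pos + 6 : pos + 10]   # skip "STRT" + ff fe sentinel
    return (end_key[2] << 8) | end_key[3]


payload = (bytes(17) + b"STRT" + b"\xff\xfe"
           + bytes.fromhex("01111ABE")       # end_key
           + bytes.fromhex("01110000"))      # start_key
end_offset = end_offset_from_a5(payload)
truncated = end_offset_from_a5(bytes(17) + b"STRT\xff\xfe\x01")
no_marker = end_offset_from_a5(bytes(40))
```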
# ── Pre-built POLL frames ─────────────────────────────────────────────────────
#
# POLL (SUB 0x5B) uses the same two-step pattern as all other reads — the
@@ -457,6 +608,11 @@ class S3Frame:
page_lo: int # PAGE_LO from header
data: bytes # payload data section (payload[5:], checksum already stripped)
checksum_valid: bool
chk_byte: int = 0 # actual checksum byte received from wire (body[-1])
# needed for waveform file reconstruction: when the last data byte
# is 0x10 and chk_byte ∈ {0x02, 0x03, 0x04}, the DLE+chk pair
# must be included in the DLE-strip operation to correctly
# reconstruct the Blastware binary body.
@property
def page_key(self) -> int:
@@ -465,7 +621,6 @@ class S3Frame:
# ── Streaming S3 frame parser ─────────────────────────────────────────────────
class S3FrameParser:
"""
Incremental byte-stream parser for S3BW response frames.
@@ -592,9 +747,10 @@ class S3FrameParser:
return None
return S3Frame(
sub = raw_payload[2],
page_hi = raw_payload[3],
page_lo = raw_payload[4],
data = raw_payload[5:],
sub = raw_payload[2],
page_hi = raw_payload[3],
page_lo = raw_payload[4],
data = raw_payload[5:],
checksum_valid = (chk_received == chk_computed),
chk_byte = chk_received,
)
+283
@@ -0,0 +1,283 @@
"""
histogram_codec.py: decoder for MiniMate Plus histogram-mode event bodies.
FULLY DECODED 2026-05-20. Every field in every block, verified
byte-exact against BW's ASCII export across multiple histogram
fixtures.
The histogram-mode body is a stream of 32-byte fixed-length blocks,
one block per histogram interval. Each block carries the per-interval
peak amplitude + zero-crossing frequency for all four channels (Tran,
Vert, Long, MicL).
Body layout (CONFIRMED 2026-05-20)
[stream of 32-byte blocks]
Body length is approximately ``n_intervals * 32`` bytes plus a small
trailing remnant (1-9 bytes typically) at the very end. Walker should
iterate 32-stride and stop before the tail.
32-byte block layout
[0] 0x00 always-zero tag
[1] segment_id (uint8) 0x00..0x03; 256 blocks per segment
[2:4] block_ctr (uint16 LE) resets each segment (0x0100, 0x0101, ...)
[4:6] 0x000a (uint16 LE) constant marker (= 10)
[6] T_peak_count uint8 Tran peak (count × 0.005 in/s, max 1.275 in/s)
[7] T_annotation uint8 empirically non-zero on intervals with sub-Hz
or unmeasurable Tran freq; meaning not fully RE'd
[8:10] T_halfperiod uint16 LE Tran half-period in samples (freq = 512 / halfp Hz)
[10] V_peak_count uint8
[11] V_annotation uint8
[12:14] V_halfperiod uint16 LE
[14] L_peak_count uint8
[15] L_annotation uint8
[16:18] L_halfperiod uint16 LE
[18] M_peak_count uint8 MicL peak (count → dB via mic_count_to_db)
[19] M_annotation uint8
[20:22] M_halfperiod uint16 LE MicL half-period in samples (freq = 512 / halfp Hz)
[22:24] 0x00 0x00 constant
[24:28] 4-byte variable purpose unknown (possibly CRC or timestamp delta)
[28:32] 0x1e 0x0a 0x00 0x00 constant block-end signature
NOTE on peak-count width: an earlier interpretation treated the peak
fields as uint16 LE spanning [6:8] / [10:12] / [14:16] / [18:20].
That happened to be byte-exact against the N844 fixture corpus only
because every annotation byte in those fixtures was zero, making
``uint16 LE == uint8``. Cross-correlating BE9558 (K558) Tran-drift
and BE18003 (T003) Histogram+Continuous events against the BW ASCII
export proved peak is uint8 alone; see test_histogram_codec.py
and docs/histogram_codec_re_status.md.
Block-identification anchor: ``block[22:24] == b"\\x00\\x00"`` AND
``block[28:32] == b"\\x1e\\x0a\\x00\\x00"``. This is the reliable
distinguisher from non-block content in the file.
Per-channel encoding
Geophone channels (Tran, Vert, Long):
- peak_count × 0.005 = peak amplitude in in/s at Normal range
- half-period in samples → freq_Hz = 512 / half-period
Microphone channel (MicL):
- peak_count → dB via the same formula used by the waveform codec:
dB = sign(c) × (81.94 + 20·log10(|c|)) for |c| ≥ 1
dB = 0 for c == 0
- half-period → freq_Hz = 512 / half-period (same as geo)
Frequency `>100 Hz` sentinel: the device emits half-period ≤ 5 when the
measured zero-crossing rate exceeds the geophone's measurement range
(since 512/5 ≈ 102 Hz; the BW display rounds anything > 100 to ">100").
Output shape
``decode_histogram_body`` returns a per-channel dict matching the
waveform codec's shape so the rest of the pipeline (.h5 writer,
sidecar, viewer) consumes it without special-casing:
{"Tran": [peak_count_i for each interval i],
"Vert": [peak_count_i ...],
"Long": [peak_count_i ...],
"MicL": [peak_count_i ...]}
Values are in **16-count units for geo** (LSB = 0.005 in/s, matching
``decode_waveform_v2``) and **1-count units for mic** (matching the
waveform codec's mic convention). Run through
``waveform_codec.decoded_to_adc_counts`` to scale geo to 1-count ADC.
Per-interval frequencies are NOT returned; they're auxiliary data,
not waveform samples. Consumers needing frequencies can call
``decode_histogram_body_full()`` for the structured per-interval
record list.
"""
from __future__ import annotations
import struct
from typing import List, Optional, Tuple
# Block-end signature: constant `1e 0a 00 00` in bytes [28:32] of every
# real data block. More distinctive than the byte-22 `00 00` (which
# matches many false positives), so we anchor on this.
_BLOCK_TAIL = b"\x1e\x0a\x00\x00"
_BLOCK_SIZE = 32
# Marker word (uint16 LE) at block[4:6] of every histogram data block. Used as
# additional validation that we're looking at a real block.
_BLOCK_MARKER = 10
# Geo peak scaling: stored as "count × 0.005 in/s" where 1 count = one
# 0.005 in/s display quantum. Equivalent to the waveform codec's
# 16-count-unit output (1 unit = 0.005 in/s = 16 ADC counts).
_GEO_LSB_INS = 0.005
# Frequency formula: freq_Hz = _FREQ_NUMERATOR / half_period_samples.
# Empirically determined to be 512 (= sample_rate / 2, where sample rate
# is 1024 sps for the standard MiniMate Plus configuration).
_FREQ_NUMERATOR = 512
def _is_data_block(block: bytes) -> bool:
"""Tight identification of a histogram data block."""
if len(block) < _BLOCK_SIZE:
return False
if block[28:32] != _BLOCK_TAIL:
return False
if block[22:24] != b"\x00\x00":
return False
if block[0] != 0x00:
return False
marker = block[4] | (block[5] << 8)
if marker != _BLOCK_MARKER:
return False
return True
def _decode_block(block: bytes) -> Optional[dict]:
"""Decode one 32-byte histogram block. Caller must have validated
with ``_is_data_block`` first.
Returns a record with per-channel peak counts (uint8) and
half-periods (uint16 LE).
"""
# Peak counts are uint8 at bytes [6] / [10] / [14] / [18]. The
# adjacent bytes [7] / [11] / [15] / [19] hold an annotation field
# whose meaning isn't fully understood (empirically non-zero in
# intervals with sub-Hz or unmeasurable geo frequencies, mostly
# zero otherwise — see test fixtures from BE9558/BE18003 corpora).
# Crucially, those annotation bytes are NOT the high byte of the
# peak count: cross-correlating against BW's per-interval ASCII
# export proves the peak is uint8 alone.
#
# Reading the peak as uint16 LE (the original interpretation) was
# accidentally correct only because every block in the N844 fixture
# corpus had a zero annotation byte; non-N844 events with non-zero
# annotation bytes decoded to physically impossible peaks (e.g.
# 268 in/s per channel) and produced 35× inflated PVS sums when
# first run against prod data. See histogram_codec_re_status.md.
t_peak = block[6]
v_peak = block[10]
l_peak = block[14]
m_peak = block[18]
t_halfp = block[8] | (block[9] << 8)
v_halfp = block[12] | (block[13] << 8)
l_halfp = block[16] | (block[17] << 8)
m_halfp = block[20] | (block[21] << 8)
segment_id = block[1]
block_ctr = block[2] | (block[3] << 8)
var_meta = bytes(block[24:28])
annotations = (block[7], block[11], block[15], block[19])
return {
"segment_id": segment_id,
"block_ctr": block_ctr,
"t_peak": t_peak,
"t_halfp": t_halfp,
"v_peak": v_peak,
"v_halfp": v_halfp,
"l_peak": l_peak,
"l_halfp": l_halfp,
"m_peak": m_peak,
"m_halfp": m_halfp,
"meta_var": var_meta,
"annotations": annotations,
}
def walk_body(body: bytes) -> List[dict]:
"""Walk the body and return one dict per histogram interval.
Iterates 32-byte strides from offset 0. Yields a decoded record
for every block that passes ``_is_data_block`` validation. Stops
when the remaining bytes are too short to form a complete block.
In Histogram+Continuous mode the body interleaves data blocks with
other 32-byte content (likely continuous-mode waveform blocks) that
fail the data-block validation; the walker naturally skips them
without losing 32-byte alignment. Use ``block_ctr`` from each
returned record to map back to the original interval index; the
record list is sparse when other block types are interleaved.
"""
records: List[dict] = []
for off in range(0, len(body) - _BLOCK_SIZE + 1, _BLOCK_SIZE):
blk = body[off:off + _BLOCK_SIZE]
if not _is_data_block(blk):
# Hit non-block content (likely a sync or stream marker).
# Continue walking — block alignment is fixed at 32-stride
# from offset 0, so we don't lose alignment by skipping.
continue
decoded = _decode_block(blk)
if decoded is None:
# Block validated as a histogram block but had peak fields
# outside the plausible range — undocumented extension.
# Skip rather than propagating bogus PVS contributions.
continue
records.append(decoded)
return records
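To see the 32-stride walk skip non-block content without losing alignment, one can pack synthetic blocks from the layout table in the module docstring. This is a sketch under that layout: `make_block` and `is_data_block` are illustrative helpers and the peak values are invented.

```python
import struct

BLOCK_TAIL = b"\x1e\x0a\x00\x00"


def make_block(ctr, peaks, halfps, segment=0):
    """Pack one 32-byte block: tag, segment, counter, marker, 4 channel groups."""
    blk = bytearray(32)
    blk[1] = segment
    struct.pack_into("<H", blk, 2, ctr)
    struct.pack_into("<H", blk, 4, 10)              # constant marker
    for n, (peak, halfp) in enumerate(zip(peaks, halfps)):
        base = 6 + 4 * n                            # T, V, L, M groups of 4 bytes
        blk[base] = peak                            # uint8 peak count
        struct.pack_into("<H", blk, base + 2, halfp)
    blk[28:32] = BLOCK_TAIL
    return bytes(blk)


def is_data_block(blk):
    return (len(blk) >= 32 and blk[0] == 0 and blk[4:6] == b"\x0a\x00"
            and blk[22:24] == b"\x00\x00" and blk[28:32] == BLOCK_TAIL)


# Two data blocks around 32 bytes of filler, plus a 3-byte trailing remnant.
body = (make_block(0x0100, (12, 3, 7, 40), (16, 32, 8, 5))
        + bytes(32)
        + make_block(0x0101, (1, 2, 3, 4), (64, 64, 64, 64))
        + b"\x00\x01\x02")
tran_peaks = [body[off + 6]
              for off in range(0, len(body) - 31, 32)
              if is_data_block(body[off:off + 32])]
```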
def decode_histogram_body(body: bytes) -> Optional[dict]:
"""Decode a histogram-mode body into per-channel peak-sample arrays.
Returns ``{"Tran": [...], "Vert": [...], "Long": [...], "MicL": [...]}``
where each channel's list contains one peak value per histogram
interval (in the same units the waveform codec uses: 16-count units
for geo, 1-count ADC units for mic). Returns ``None`` if the body
doesn't contain any valid histogram blocks.
To convert to physical units:
- Geo channels: ``count * 0.005`` = peak in in/s at Normal range
(or run through ``waveform_codec.decoded_to_adc_counts`` first
to get 1-count ADC values, then ``count / 32767 * 10.0`` for in/s)
- Mic channel: use ``waveform_codec.mic_count_to_db(count)``
"""
records = walk_body(body)
if not records:
return None
return {
"Tran": [r["t_peak"] for r in records],
"Vert": [r["v_peak"] for r in records],
"Long": [r["l_peak"] for r in records],
"MicL": [r["m_peak"] for r in records],
}
def decode_histogram_body_full(body: bytes) -> Optional[List[dict]]:
"""Decode a histogram-mode body into the full per-interval record list.
Same data as ``decode_histogram_body`` but in a structured form that
preserves the half-period (frequency) data for each channel + the
per-block segment_id, block_ctr, and 4-byte variable metadata.
Useful for diagnostic tools, sidecar enrichment, and future-codec
work.
Returns ``None`` if the body has no valid blocks.
"""
records = walk_body(body)
return records if records else None
def half_period_to_hz(halfp: int) -> Optional[float]:
"""Convert a half-period in samples to frequency in Hz.
Returns ``None`` for half-period <= 5; the device emits values in
that range when the measured zero-crossing rate exceeds 100 Hz
(the BW display reports `>100 Hz` for such cases). Callers can
treat ``None`` as the `>100 Hz` sentinel.
"""
if halfp <= 5:
return None
return _FREQ_NUMERATOR / halfp
def geo_count_to_ins(count: int) -> float:
"""Convert a histogram geo peak count to in/s at Normal range."""
return count * _GEO_LSB_INS
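The unit conversions described in the ``decode_histogram_body`` docstring can be sketched as below. This is a minimal illustration, not the module's code: ``GEO_LSB_INS = 0.005`` is taken from the docstring, while the 16x scaling that stands in for ``waveform_codec.decoded_to_adc_counts`` is an assumption based on the "16-count units" wording.

```python
# Hypothetical stand-ins for constants defined elsewhere in the real module.
GEO_LSB_INS = 0.005        # in/s per 16-count unit at Normal range (from the docstring)
ADC_FULL_SCALE = 32767     # 1-count ADC full scale (from the docstring)
NORMAL_RANGE_INS = 10.0    # Normal-range full scale in in/s (from the docstring)


def geo_count_to_ins(count: int) -> float:
    """Shortcut conversion: 16-count unit -> in/s."""
    return count * GEO_LSB_INS


def geo_count_to_ins_via_adc(count: int) -> float:
    """Two-step conversion: 16-count unit -> 1-count ADC value -> in/s.

    Assumption: converting to 1-count ADC units is a plain multiply by 16.
    """
    adc_counts = count * 16
    return adc_counts / ADC_FULL_SCALE * NORMAL_RANGE_INS


# Made-up per-interval peaks shaped like decode_histogram_body() output.
peaks = {"Tran": [12, 40], "Vert": [8, 100], "Long": [5, 33]}
peaks_ins = {ch: [geo_count_to_ins(c) for c in vals] for ch, vals in peaks.items()}
print(peaks_ins)
```

The two routes differ by a few percent because 0.005 is a rounded form of 16 / 32767 * 10.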
+213 -6
@@ -14,6 +14,7 @@ Notes on certainty:
from __future__ import annotations
import datetime
import struct
from dataclasses import dataclass, field
from typing import Optional
@@ -200,6 +201,58 @@ class Timestamp:
second=second,
)
@classmethod
def from_short_record(cls, data: bytes) -> "Timestamp":
"""
Decode an 8-byte timestamp header from a 210-byte waveform record.
Wire layout (✅ CONFIRMED 2026-05-01 against live SFM run on BE11529 in
Continuous mode, day-of-month = 1 May, raw: 01 05 07 ea 00 0d 15 25):
byte[0]: day (uint8)
byte[1]: month (uint8)
bytes[2-3]: year (big-endian uint16)
byte[4]: unknown (0x00 in observed sample)
byte[5]: hour (uint8)
byte[6]: minute (uint8)
byte[7]: second (uint8)
This is a third format observed in the wild, distinct from the 9-byte
(single-shot, sub_code=0x10 at [1]) and 10-byte (continuous, 0x10 at
[0] AND [2]) layouts. No marker bytes; disambiguated by where the
year lands when scanned at byte 2/3/4.
Args:
data: at least 8 bytes; only the first 8 are consumed.
Returns:
Decoded Timestamp.
Raises:
ValueError: if data is fewer than 8 bytes.
"""
if len(data) < 8:
raise ValueError(
f"Short record timestamp requires at least 8 bytes, got {len(data)}"
)
day = data[0]
month = data[1]
year = struct.unpack_from(">H", data, 2)[0]
unknown_byte = data[4]
hour = data[5]
minute = data[6]
second = data[7]
return cls(
raw=bytes(data[:8]),
flag=0,
year=year,
unknown_byte=unknown_byte,
month=month,
day=day,
hour=hour,
minute=minute,
second=second,
)
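As a worked example of the 8-byte layout, the sample bytes quoted in the docstring (``01 05 07 ea 00 0d 15 25``) can be decoded with a standalone function. This is a sketch only: ``decode_short_timestamp`` is a hypothetical helper that returns a plain dict instead of the ``Timestamp`` class.

```python
import struct


def decode_short_timestamp(data: bytes) -> dict:
    """Decode the 8-byte short-record timestamp layout into a dict."""
    if len(data) < 8:
        raise ValueError(f"need at least 8 bytes, got {len(data)}")
    return {
        "day": data[0],
        "month": data[1],
        "year": struct.unpack_from(">H", data, 2)[0],  # big-endian uint16
        "unknown_byte": data[4],
        "hour": data[5],
        "minute": data[6],
        "second": data[7],
    }


raw = bytes.fromhex("01 05 07 ea 00 0d 15 25")
fields = decode_short_timestamp(raw)
print(fields)
```

The year bytes ``07 ea`` decode to 2026, which matches the capture date given in the docstring.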
@property
def clock_set(self) -> bool:
"""False when year == 1995 (factory default / battery-lost state)."""
@@ -268,7 +321,7 @@ class ChannelConfig:
label: str # e.g. "Tran", "Vert", "Long", "MicL" ✅
trigger_level: float # in/s (geo) or psi (MicL) ✅
alarm_level: float # in/s (geo) or psi (MicL) ✅
max_range: float # full-scale calibration constant (e.g. 6.206) 🔶
max_range: float # hardware/firmware sensitivity constant (e.g. 6.206053) ✅ confirmed same on all units
unit_label: str # e.g. "in./s" or "psi" ✅
@@ -337,15 +390,34 @@ class ComplianceConfig:
raw: Optional[bytes] = None # full 2090-byte payload (for debugging)
# Recording parameters (✅ CONFIRMED from §7.6)
record_time: Optional[float] = None # seconds (7.0, 10.0, 13.0, etc.)
sample_rate: Optional[int] = None # sps (1024, 2048, 4096, etc.) — NOT YET FOUND ❓
recording_mode: Optional[int] = None # uint8: 0x00=Single Shot, 0x01=Continuous,
# 0x03=Histogram, 0x04=Histogram+Continuous ✅ confirmed 2026-04-20
# Read (E5): data[anchor_pos - 8] (6-byte anchor)
# Write (SUB 71): data[anchor_pos - 7]
sample_rate: Optional[int] = None # sps (1024, 2048, 4096)
histogram_interval_sec: Optional[int] = None # uint16 BE, seconds ✅ confirmed 2026-04-20
# anchor_pos - 4 (same offset in read & write)
# Valid values: 2, 5, 15, 60, 300, 900
# Mode-gated: only active in Histogram/Histogram+Continuous
record_time: Optional[float] = None # seconds (e.g. 3.0, 5.0, 8.0, 10.0)
# Trigger/alarm levels (✅ CONFIRMED per-channel at §7.6)
# For now we store the first geo channel (Transverse) as representatives;
# full per-channel data would require structured Channel objects.
trigger_level_geo: Optional[float] = None # in/s (first geo channel)
alarm_level_geo: Optional[float] = None # in/s (first geo channel)
max_range_geo: Optional[float] = None # in/s full-scale range
trigger_level_geo: Optional[float] = None # in/s (first geo channel)
alarm_level_geo: Optional[float] = None # in/s (first geo channel)
geo_adc_scale: Optional[float] = None # ADC-to-velocity scale factor (float32 at Tran+28) ✅
# = inverse sensitivity = 1/sensitivity (in/s per V)
# Formula (Interface Handbook §4.5): Range = 1.61133 V × scale_factor
# → 1.61133 × 6.206053 = 10.000 in/s (Normal range) ✅
# Firmware uses: PPV (in/s) = ADC_voltage (V) × 6.206053
# Identical on BE11529 and BE18189 — same Instantel geophone hardware.
# NOT a user-configurable setting. Must NOT be written.
geo_range: Optional[int] = None # range/sensitivity selector — CONFIRMED 2026-04-20
# 0x00 = Normal 10.000 in/s (standard gain)
# 0x01 = Sensitive 1.250 in/s (high gain)
# Offset: Tran+33 in both E5 read and SUB 71 write payloads
# (same 2126-byte buffer is round-tripped; applied to Tran/Vert/Long)
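The range formula quoted in the ``geo_adc_scale`` comment can be checked with a few lines of arithmetic. A minimal sketch; the constant and function names are hypothetical and only the two numeric values come from the comment.

```python
ADC_FULL_SCALE_VOLTS = 1.61133   # volts, from the Range formula in the comment
GEO_ADC_SCALE = 6.206053         # in/s per volt, the float32 stored at Tran+28


def full_scale_range_ins(scale: float = GEO_ADC_SCALE) -> float:
    """Range (in/s) = 1.61133 V x scale_factor."""
    return ADC_FULL_SCALE_VOLTS * scale


def ppv_from_voltage(adc_volts: float, scale: float = GEO_ADC_SCALE) -> float:
    """PPV (in/s) = ADC voltage (V) x scale_factor."""
    return adc_volts * scale


normal_range = full_scale_range_ins()
print(f"Normal range = {normal_range:.3f} in/s")
```

The product lands within a microunit of 10 in/s, which is consistent with the constant being a fixed hardware property and not a user setting.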
# Project/setup strings (sourced from E5 / SUB 71 write payload)
# These are the FULL project metadata from compliance config,
@@ -358,6 +430,78 @@ class ComplianceConfig:
notes: Optional[str] = None # extended notes / additional info
# ── Call Home Config ──────────────────────────────────────────────────────────
@dataclass
class CallHomeConfig:
"""
Auto Call Home (ACH) configuration from SUB 0x2C (response 0xD3).
Read with a standard two-step protocol (probe offset=0x00, data offset=0x7C).
Written via SUB 0x7E (write, 127-byte payload) + SUB 0x7F (confirm).
Confirmed from 4-20-26 call home settings captures (11 BW + S3 capture pairs).
Raw payload layout (data[11:] from S3 response, 125 bytes):
[0] 0x00 header byte
[1] 0x7C = 124 inner length (= offset for SUB 0x7E write - 2)
[2] 0xDC constant
[3:5] 0x00 0x00 padding
[5] auto_call_home_enabled (0x00=off, 0x01=on)
[6:46] dial_string 40-byte null-padded ASCII
[46:87] auto_answer_raw AT command strings (present, not decoded)
[87] after_event_recorded (0x01=on, 0x00=off)
[91] at_specified_times (0x01=on, 0x00=off)
[93] time1_enabled (0x01=on, 0x00=off)
[95] time2_enabled (0x01=on, 0x00=off)
[101] time1_hour uint8 decimal 0-23
[102] time1_min uint8 decimal 0-59
[105] time2_hour uint8 decimal 0-23
[106] time2_min uint8 decimal 0-59
[117] DLE prefix (0x10) for the DLE-escaped num_retries=3 (0x03)
[118] 0x03 (the device stores/returns 0x03 DLE-escaped)
[120] time_between_retries_sec uint8 (= 0x0F = 15 s default)
[122] wait_for_connection_sec uint8 (= 0x3C = 60 s default)
[124] warm_up_time_sec uint8 (= 0x3C = 60 s default)
Write payload = raw 125 bytes + b'\\x00\\x00' (2 trailing zeros) = 127 bytes.
Offset for SUB 0x7E: data[1] + 2 = 0x7C + 2 = 0x7E = 126.
Note on DLE-escaped 0x03: The device's S3 response DLE-escapes ETX (0x03)
bytes as \\x10\\x03. The S3FrameParser preserves both bytes in frame.data.
Subsequent fields after offset 117 are therefore at raw_offset = logical+1.
The raw payload must be round-tripped verbatim in write; do NOT reapply DLE
destuffing or stripping.
"""
raw: Optional[bytes] = None # raw 125-byte read payload (for round-trip write)
# ── Main enable ──────────────────────────────────────────────────────────
auto_call_home_enabled: Optional[bool] = None # raw[5] ✅
# ── Dial string ──────────────────────────────────────────────────────────
dial_string: Optional[str] = None # raw[6:46] 40-byte null-padded ASCII ✅
# ── When to call ─────────────────────────────────────────────────────────
after_event_recorded: Optional[bool] = None # raw[87] ✅
at_specified_times: Optional[bool] = None # raw[91] ✅
# ── Time slot 1 ──────────────────────────────────────────────────────────
time1_enabled: Optional[bool] = None # raw[93] ✅
time1_hour: Optional[int] = None # raw[101] 0-23 ✅
time1_min: Optional[int] = None # raw[102] 0-59 ✅
# ── Time slot 2 ──────────────────────────────────────────────────────────
time2_enabled: Optional[bool] = None # raw[95] ✅
time2_hour: Optional[int] = None # raw[105] 0-23 ✅
time2_min: Optional[int] = None # raw[106] 0-59 ✅
# ── Retry / timeout settings (read-only; not writable via set_call_home_config) ──
num_retries: Optional[int] = None # raw[117:119]=10 03 → value 3 ✅
time_between_retries_sec: Optional[int] = None # raw[120] (shifted +1 by DLE) ✅
wait_for_connection_sec: Optional[int] = None # raw[122] ✅
warm_up_time_sec: Optional[int] = None # raw[124] ✅
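The raw-offset table in the ``CallHomeConfig`` docstring maps onto a parser along these lines. This is a sketch under assumptions: ``parse_call_home`` and ``build_write_payload`` are hypothetical names, the sample payload is synthetic, and reporting ``num_retries`` only when the DLE prefix is present is a guess about how to treat the one documented case.

```python
def parse_call_home(raw: bytes) -> dict:
    """Pull the documented fields out of the 125-byte read payload."""
    if len(raw) < 125:
        raise ValueError(f"expected 125 bytes, got {len(raw)}")
    dial = raw[6:46].split(b"\x00", 1)[0].decode("ascii", errors="replace")
    return {
        "auto_call_home_enabled": raw[5] == 0x01,
        "dial_string": dial,
        "after_event_recorded": raw[87] == 0x01,
        "at_specified_times": raw[91] == 0x01,
        "time1_enabled": raw[93] == 0x01,
        "time2_enabled": raw[95] == 0x01,
        "time1": (raw[101], raw[102]),
        "time2": (raw[105], raw[106]),
        # Documented case: raw[117:119] == 10 03, i.e. DLE-escaped value 3.
        "num_retries": raw[118] if raw[117] == 0x10 else None,
        "time_between_retries_sec": raw[120],
        "wait_for_connection_sec": raw[122],
        "warm_up_time_sec": raw[124],
    }


def build_write_payload(raw: bytes) -> bytes:
    """Round-trip the read payload verbatim and append two zero bytes."""
    return bytes(raw[:125]) + b"\x00\x00"


# Synthetic payload built from the documented offsets (not a real capture).
sample = bytearray(125)
sample[1], sample[2], sample[5] = 0x7C, 0xDC, 0x01
sample[6:16] = b"RADIO RING"
sample[87] = sample[91] = sample[93] = 0x01
sample[101], sample[102] = 6, 30
sample[117], sample[118] = 0x10, 0x03
sample[120], sample[122], sample[124] = 0x0F, 0x3C, 0x3C

cfg = parse_call_home(bytes(sample))
payload = build_write_payload(bytes(sample))
print(cfg["dial_string"], len(payload))
```

Because the offsets after 117 are raw (already shifted by the DLE byte), the parser indexes the payload as received and never destuffs it.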
# ── Event ─────────────────────────────────────────────────────────────────────
@dataclass
@@ -401,6 +545,10 @@ class Event:
# Set by get_events(); required by download_waveform().
_waveform_key: Optional[bytes] = field(default=None, repr=False)
# Raw A5 frames from the full bulk waveform download (full_waveform=True).
# Populated by get_events() when full_waveform=True; used by write_blastware_file().
_a5_frames: Optional[list] = field(default=None, repr=False)
def __str__(self) -> str:
ts = str(self.timestamp) if self.timestamp else "no timestamp"
ppv = ""
@@ -419,6 +567,65 @@ class Event:
return f"Event#{self.index} {ts}{ppv}"
# ── MonitorLogEntry ───────────────────────────────────────────────────────────
@dataclass
class MonitorLogEntry:
"""
A monitor log entry decoded from a SUB 0x0A (WAVEFORM_HEADER) response
whose first byte is 0x2C (partial record, recording mode = continuous
monitoring without a triggered event).
These are the "partial bins" that Blastware stores between triggered events.
Each entry represents one monitoring interval: the span of time during
which the unit was actively monitoring but no threshold crossing occurred.
Confirmed from 4-11-26 MITM capture analysis (2026-04-11):
Header layout (full response data[0:]):
data[0] = 0x2C (partial record type / data length in probe response)
data[1:5] = 0x00 × 4
data[5:9] = event key (4 bytes, big-endian hex)
data[9:11] = 0x00 × 2
data[11:] = timestamp_start (9 or 10 bytes depending on recording mode)
+ timestamp_stop (same format)
+ separator (45 bytes, variable)
+ serial null-terminated (e.g. "BE11529\\0")
+ "Geo: X.XXX in/s\\0" (trigger threshold string)
Timestamp format detection:
data[11] == 0x10 → 10-byte sub_code=0x03 (continuous) format
data[12] == 0x10 → 9-byte sub_code=0x10 (single-shot) format
In contrast to Event (triggered records, type 0x46), MonitorLogEntry
records do NOT have a waveform record (SUB 0x0C) or bulk waveform stream
(SUB 5A). All available metadata is in the 0x0A header alone.
"""
index: int # 0-based position in device record list
key: str # 8-hex event key (e.g. "01114290") ✅
start_time: Optional[datetime.datetime] = None # monitoring session start ✅
stop_time: Optional[datetime.datetime] = None # monitoring session stop ✅
serial: Optional[str] = None # device serial (e.g. "BE11529") ✅
geo_threshold_ips: Optional[float] = None # trigger level from "Geo: X.XXX in/s" ✅
# Raw bytes for debugging / future decoding
raw_header: Optional[bytes] = field(default=None, repr=False)
@property
def duration_seconds(self) -> Optional[float]:
"""Duration of monitoring interval in seconds, or None if times unavailable."""
if self.start_time and self.stop_time:
return (self.stop_time - self.start_time).total_seconds()
return None
def __str__(self) -> str:
start = self.start_time.isoformat() if self.start_time else "?"
stop = self.stop_time.isoformat() if self.stop_time else "?"
dur = f" ({self.duration_seconds:.0f}s)" if self.duration_seconds is not None else ""
return f"MonitorLog#{self.index} key={self.key} {start}→{stop}{dur}"
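The timestamp format detection rule in the ``MonitorLogEntry`` docstring could be expressed as a small helper. A sketch only: ``timestamp_length`` is a hypothetical name, and checking ``data[11]`` before ``data[12]`` is an assumption about precedence that the docstring does not state.

```python
from typing import Optional


def timestamp_length(data: bytes) -> Optional[int]:
    """Return 10 or 9 depending on where the 0x10 marker sits, else None."""
    if len(data) > 11 and data[11] == 0x10:
        return 10   # continuous format: marker at the first timestamp byte
    if len(data) > 12 and data[12] == 0x10:
        return 9    # single-shot format: marker at the second timestamp byte
    return None     # neither marker found; caller decides how to proceed


# Synthetic headers: 11 bytes of preamble followed by the timestamp start.
continuous = bytes(11) + bytes([0x10, 0x03, 0x10])
single_shot = bytes(11) + bytes([0x00, 0x10])
unmarked = bytes(11) + bytes(3)
print(timestamp_length(continuous), timestamp_length(single_shot), timestamp_length(unmarked))
```

A marker-free layout, such as the 8-byte short-record timestamp, falls through to ``None`` here and would need separate handling.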
# ── MonitorStatus ─────────────────────────────────────────────────────────────
@dataclass
+431 -116
@@ -35,6 +35,8 @@ from .framing import (
token_params,
bulk_waveform_params,
bulk_waveform_term_params,
bulk_waveform_term_v2,
parse_strt_end_offset,
POLL_PROBE,
POLL_DATA,
SESSION_RESET,
@@ -57,7 +59,7 @@ SUB_POLL = 0x5B
SUB_SERIAL_NUMBER = 0x15
SUB_FULL_CONFIG = 0x01
SUB_EVENT_INDEX = 0x08
SUB_CHANNEL_CONFIG = 0x06
SUB_CHANNEL_CONFIG = 0x06 # Event storage range read (first/last key) ✅
SUB_MONITOR_STATUS = 0x1C # Monitoring status read (battery, memory, mode) ✅
SUB_EVENT_HEADER = 0x1E
SUB_EVENT_ADVANCE = 0x1F
@@ -65,6 +67,7 @@ SUB_WAVEFORM_HEADER = 0x0A
SUB_WAVEFORM_RECORD = 0x0C
SUB_BULK_WAVEFORM = 0x5A
SUB_COMPLIANCE = 0x1A
SUB_CALL_HOME = 0x2C # Call home config read → response 0xD3 ✅
SUB_UNKNOWN_2E = 0x2E
# Write command SUBs (= Read SUB + 0x60, confirmed from BW captures 3-11-26)
@@ -78,10 +81,20 @@ SUB_WRITE_CONFIRM_C = 0x74 # Confirm C — sent after 69 ✅
SUB_TRIGGER_CONFIG_WRITE = 0x82 # Write trigger config (0x22 + 0x60) ✅
SUB_TRIGGER_CONFIRM = 0x83 # Confirm trigger write ✅
# Call home write SUBs (confirmed from 4-20-26 call home settings captures)
SUB_CALL_HOME_WRITE = 0x7E # Write call home config → response 0x81 ✅
SUB_CALL_HOME_CONFIRM = 0x7F # Confirm call home write → response 0x80 ✅
# Monitoring control SUBs (confirmed from 4-8-26/2ndtry BW TX capture)
SUB_START_MONITORING = 0x96 # Start monitoring → response 0x69 ✅
SUB_STOP_MONITORING = 0x97 # Stop monitoring → response 0x68 ✅
# Erase-all SUBs (confirmed from 4-11-26 MITM capture)
# Both use token=0xFE at params[7] and return minimal 11-byte acks.
# Standard response formula applies: 0xFF - SUB.
SUB_ERASE_ALL_BEGIN = 0xA3 # Begin erase all events → response 0x5C ✅
SUB_ERASE_ALL_CONFIRM = 0xA2 # Confirm erase all events → response 0x5D ✅
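The "0xFF - SUB" response formula mentioned in the erase-all comment can be checked against the request/response pairs listed in these constants. A minimal sketch; ``expected_rsp_sub`` is a hypothetical stand-in for the module's ``_expected_rsp_sub``, whose body is not shown in this diff.

```python
def expected_rsp_sub(sub: int) -> int:
    """Response SUB is the one's complement of the request SUB (0xFF - SUB)."""
    if not 0 <= sub <= 0xFF:
        raise ValueError(f"SUB out of range: {sub!r}")
    return 0xFF - sub


# Request -> response pairs copied from the comments in the constants block.
DOCUMENTED_PAIRS = {
    0x2C: 0xD3,  # call home read
    0x5A: 0xA5,  # bulk waveform
    0x7E: 0x81,  # call home write
    0x7F: 0x80,  # call home confirm
    0x96: 0x69,  # start monitoring
    0x97: 0x68,  # stop monitoring
    0xA3: 0x5C,  # erase all begin
    0xA2: 0x5D,  # erase all confirm
}

mismatches = {
    sub: rsp for sub, rsp in DOCUMENTED_PAIRS.items() if expected_rsp_sub(sub) != rsp
}
print("mismatches:", mismatches)
```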
# Hardcoded data lengths for the two-step read protocol.
#
# The S3 probe response page_key is always 0x0000 — it does NOT carry the
@@ -96,12 +109,14 @@ DATA_LENGTHS: dict[int, int] = {
SUB_SERIAL_NUMBER: 0x0A, # 10-byte serial number block ✅
SUB_FULL_CONFIG: 0x98, # 152-byte full config block ✅
SUB_EVENT_INDEX: 0x58, # 88-byte event index ✅
SUB_CHANNEL_CONFIG: 0x24, # 36-byte event storage range (first/last key) ✅
SUB_MONITOR_STATUS: 0x2C, # 44-byte monitor status block (idle) ✅
SUB_EVENT_HEADER: 0x08, # 8-byte event header (waveform key + event data) ✅
SUB_EVENT_ADVANCE: 0x08, # 8-byte next-key response ✅
# SUB_WAVEFORM_HEADER (0x0A) is VARIABLE — length read from probe response
# data[4]. Do NOT add it here; use read_waveform_header() instead. ✅
SUB_WAVEFORM_RECORD: 0xD2, # 210-byte waveform/histogram record ✅
SUB_CALL_HOME: 0x7C, # 124-byte call home config ✅ (confirmed 4-20-26)
SUB_UNKNOWN_2E: 0x1A, # 26 bytes, purpose TBD 🔶
0x09: 0xCA, # 202 bytes, purpose TBD 🔶
# SUB_COMPLIANCE (0x1A) uses a multi-step sequence with a 2090-byte total;
@@ -109,14 +124,22 @@ DATA_LENGTHS: dict[int, int] = {
}
# SUB 5A (BULK_WAVEFORM_STREAM) protocol constants.
# Confirmed from 1-2-26 BW TX capture analysis (2026-04-02).
_BULK_CHUNK_OFFSET = 0x1004 # offset field for probe + all regular chunk requests ✅
_BULK_TERM_OFFSET = 0x005A # offset field for termination request ✅
_BULK_COUNTER_STEP = 0x0400 # chunk counter increment per chunk ✅
# Chunk counter formula: chunk_num * 0x0400 for ALL chunks including chunk 1.
# Earlier captures showed 0x1004 for chunk 1 — that was a Blastware artifact, not a
# protocol requirement. Confirmed 2026-04-06: 0x0400 for chunk 1 works; 0x1004
# causes a 120-second device timeout. Formula n * 0x0400 is used for all chunks.
#
# 2026-05-01 minimal-fix: the chunk-counter walk is now bounded by the event's
# `end_offset` extracted from the STRT record at data[23:27] of the probe
# response. Without this bound the loop kept asking for chunks past the event
# end and the device responded with post-event circular-buffer garbage,
# corrupting reconstructed Blastware files for events ≥ 2 sec.
#
# We keep the OLD 0x0400 chunk step here (BW actually uses 0x0200 — see §7.8.5
# of the protocol reference for the corrected understanding) because the
# existing blastware_file.py builder relies on the 0x0400-step frame structure
# to produce valid files. Switching to BW's 0x0200 step is a separate task
# that also requires updating the file builder.
# BW-exact protocol values (v0.14.0). Verified against 4-27-26 + 5-1-26 captures.
_BULK_CHUNK_OFFSET = 0x1002 # offset_word for probe + all chunk requests
_BULK_TERM_OFFSET = 0x005A # offset_word for the legacy terminator (fallback only)
_BULK_COUNTER_STEP = 0x0200 # chunk counter increment (matches chunk payload size)
# Default timeout values (seconds).
# MiniMate Plus is a slow device — keep these generous.
@@ -387,23 +410,32 @@ class MiniMateProtocol:
Send the SUB 0A (WAVEFORM_HEADER) two-step read for *key4*.
The data length for 0A is VARIABLE and must be read from the probe
response at data[4]. Two known values:
0x30 → full histogram bin (has a waveform record to follow)
0x26 → partial histogram bin (no waveform record)
response at data[4]. Two confirmed values:
0x46 (70) → full triggered event (has 0C waveform record to follow)
0x2C (44) → partial / monitor-log entry (no 0C record; 0A header only)
Args:
key4: 4-byte waveform record address from 1E or 1F.
Returns:
(header_bytes, record_length) where:
header_bytes: raw data section starting at data[11]
record_length: DATA_LENGTH read from probe (0x30 or 0x26)
(raw_data, record_length) where:
raw_data: complete data_rsp.data bytes (full response payload)
record_length: DATA_LENGTH read from probe (0x46 for full, 0x2C for partial)
The raw_data layout:
raw_data[0] = record type (0x46 = full triggered event, 0x2C = partial/monitor)
raw_data[1:5] = 0x00 × 4
raw_data[5:9] = event key (4 bytes)
raw_data[9:11] = 0x00 × 2
raw_data[11:] = timestamps + separator + serial + channel strings
(see MonitorLogEntry in models.py for full layout)
Raises:
ProtocolError: on timeout, bad checksum, or wrong response SUB.
Confirmed from 3-31-26 capture: 0A probe response data[4] carries
Confirmed from 4-11-26 MITM capture: 0A probe response data[4] carries
the variable length; data-request uses that length as the offset byte.
record_length == data[0] in virtually all cases (confirmed empirically).
"""
rsp_sub = _expected_rsp_sub(SUB_WAVEFORM_HEADER)
params = waveform_key_params(key4)
@@ -413,7 +445,7 @@ class MiniMateProtocol:
probe_rsp = self._recv_one(expected_sub=rsp_sub)
# Variable length — read from probe response data[4]
length = probe_rsp.data[4] if len(probe_rsp.data) > 4 else 0x30
length = probe_rsp.data[4] if len(probe_rsp.data) > 4 else 0x46
log.debug("read_waveform_header: 0A data request offset=0x%02X", length)
if length == 0:
@@ -422,12 +454,11 @@ class MiniMateProtocol:
self._send(build_bw_frame(SUB_WAVEFORM_HEADER, length, params))
data_rsp = self._recv_one(expected_sub=rsp_sub)
header_bytes = data_rsp.data[11:11 + length]
log.debug(
"read_waveform_header: key=%s length=0x%02X is_full=%s",
key4.hex(), length, length == 0x30,
key4.hex(), length, length >= 0x40,
)
return header_bytes, length
return data_rsp.data, length
def read_waveform_data_raw(self) -> bytes:
"""
@@ -503,142 +534,270 @@ class MiniMateProtocol:
self,
key4: bytes,
*,
stop_after_metadata: bool = True,
max_chunks: int = 32,
) -> list[bytes]:
stop_after_metadata: bool = True, # DEPRECATED — no-op under BW-exact walk
max_chunks: int = 256, # safety cap only; loop is bounded by end_offset
include_terminator: bool = False,
extra_chunks_after_metadata: int = 1, # DEPRECATED — no-op
) -> list[S3Frame]:
"""
Download the SUB 5A (BULK_WAVEFORM_STREAM) A5 frames for one event.
Download the SUB 5A (BULK_WAVEFORM_STREAM) A5 frames for one event using
Blastware's exact protocol. REWRITTEN 2026-05-02 (v0.14.0).
The bulk waveform stream carries both raw ADC samples (large) and
event-time metadata strings ("Project:", "Client:", "User Name:",
"Seis Loc:", "Extended Notes") embedded in one of the middle frames
(confirmed: A5[7] of 9 for 1-2-26 capture).
Algorithm (matches BW captures across 2-sec / 3-sec / event-2):
Protocol is request-per-chunk, NOT a continuous stream:
1. Probe (offset=_BULK_CHUNK_OFFSET, is_probe=True, counter=0x0000)
2. Chunks (offset=_BULK_CHUNK_OFFSET, is_probe=False, counter+=0x0400)
3. Loop until metadata found (stop_after_metadata=True) or max_chunks
4. Termination (offset=_BULK_TERM_OFFSET, counter=last+_BULK_COUNTER_STEP)
Device responds with a final A5 frame (page_key=0x0000).
1. Probe
- For events at start_key[2:4] = 0x0000 (first event after erase
/ wrap): probe at counter=0x0000 with full key in params.
- For continuation events (start_key[2:4] != 0): first chunk at
counter = start_key[2:4] (the off=0x46 key from 1F, which already
includes the +0x0046 offset); acts as both probe and
first sample chunk; response carries STRT.
The termination frame (page_key=0x0000) is NOT included in the returned list.
2. Parse end_offset from STRT record at data[23:27] of the probe response.
Args:
key4: 4-byte waveform key from EVENT_HEADER (1E).
stop_after_metadata: If True (default), send termination as soon as
b"Project:" is found in a frame's data — avoids
downloading the full ADC waveform payload (several
hundred KB). Set False to download everything.
max_chunks: Safety cap on the number of chunk requests sent
(default 32; a typical event uses 9 large frames).
3. Read two fixed metadata pages at counter=0x1002 and counter=0x1004:
global session metadata (Project / Client / User Name / Seis Loc
/ Extended Notes ASCII strings). Event 1 only; continuation
events skip these (BW caches them across the session).
4. Walk sample chunks at 0x0200 increments, starting from 0x0600 for
event 1 or `start_key[2:4] + 0x0200` for continuation events.
Stop when `next_chunk + 0x0200 > end_offset`.
5. Send TERM frame with offset_word and params computed by
`bulk_waveform_term_v2(key4, end_offset, last_chunk_counter)`.
The TERM response contains the partial last chunk (residual =
end_offset - next_boundary) including the 26-byte 0e 08 file
footer.
Returns:
List of raw data bytes from each A5 response frame (not including
the terminator frame). Frame indices match the request sequence:
index 0 = probe response, index 1 = first chunk, etc.
List of S3Frame objects from each A5 response (probe, metadata
pages, sample chunks, optional TERM response). Caller passes
`include_terminator=True` (e.g. write_blastware_file) to keep the
TERM response in the list; it's required to reconstruct the
file footer.
Deprecated kwargs:
stop_after_metadata: legacy "Project:"-string-based stop condition.
No-op under the BW-exact walk; the loop is
deterministically bounded by end_offset from
STRT. Accepted for backward compat.
extra_chunks_after_metadata: same.
Raises:
ProtocolError: on timeout, bad checksum, or unexpected SUB.
Confirmed from 1-2-26 BW TX/RX captures (2026-04-02):
- probe + 8 regular chunks + 1 termination = 10 TX frames
- 9 large A5 responses + 1 terminator A5 = 10 RX frames
- page_key=0x0010 on large frames; page_key=0x0000 on terminator
- "Project:" metadata at A5[7].data[626]
ProtocolError: on timeout / bad checksum / unexpected SUB.
"""
if len(key4) != 4:
raise ValueError(f"waveform key must be 4 bytes, got {len(key4)}")
rsp_sub = _expected_rsp_sub(SUB_BULK_WAVEFORM) # 0xFF - 0x5A = 0xA5
frames_data: list[bytes] = []
counter = 0
# Accept deprecated kwargs quietly; log at debug level when non-default.
if not stop_after_metadata:
log.debug("5A: stop_after_metadata=False is no-op under BW-exact walk")
if extra_chunks_after_metadata not in (0, 1):
log.debug("5A: extra_chunks_after_metadata=%d is no-op under BW-exact walk",
extra_chunks_after_metadata)
# ── Step 1: probe ────────────────────────────────────────────────────
log.debug("5A probe key=%s", key4.hex())
params = bulk_waveform_params(key4, 0, is_probe=True)
self._send(build_5a_frame(_BULK_CHUNK_OFFSET, params))
self._parser.reset() # reset bytes_fed counter before probe recv
rsp_sub = _expected_rsp_sub(SUB_BULK_WAVEFORM) # 0xA5
frames_data: list[S3Frame] = []
start_offset = (key4[2] << 8) | key4[3]
is_event_1 = (start_offset == 0)
# ── Step 1: probe / first chunk ──────────────────────────────────────
if is_event_1:
probe_counter = 0
probe_params = bulk_waveform_params(key4, 0, is_probe=True)
log.debug("5A probe (event-1) key=%s counter=0x0000", key4.hex())
else:
# Continuation events: first 5A request lands at counter = key[2:4]
# (i.e. the address of the off=0x46 WAVEHDR record returned by 1F).
# The probe response carries STRT at byte 17 with end_offset.
#
# Confirmed 2026-05-04 from 5-1-26 "copy 2nd address" capture
# (BW probes counter=0x2238 with key=01112238, STRT@17 end=0x417E)
# and 5-4-26 BW captures (2-sec event probes counter=0x2238).
#
# The earlier "+0x46" formula in the doc came from calling
# start_key the BOUNDARY (off=0x2C) key, but the iteration walk
# uses 1F's off=0x46 key as cur_key, which already incorporates
# the +0x46 offset relative to the boundary. Adding it again
# caused the probe to overshoot, miss STRT, and run uncapped.
probe_counter = start_offset
probe_params = bulk_waveform_params(key4, probe_counter)
log.debug(
"5A probe (event-N) key=%s counter=0x%04X",
key4.hex(), probe_counter,
)
self._send(build_5a_frame(_BULK_CHUNK_OFFSET, probe_params))
self._parser.reset()
try:
rsp = self._recv_one(expected_sub=rsp_sub, reset_parser=False)
except TimeoutError:
log.warning(
"5A probe TIMED OUT for key=%s: "
"%d raw bytes received (no complete A5 frame assembled)",
"5A probe TIMED OUT for key=%s: %d raw bytes received",
key4.hex(), self._parser.bytes_fed,
)
raise
frames_data.append(rsp.data)
log.debug("5A A5[0] page_key=0x%04X %d bytes", rsp.page_key, len(rsp.data))
# ── Step 2: chunk loop ───────────────────────────────────────────────
# Chunk counters are monotonic: chunk_num * 0x0400 for all chunks.
# The 4-2-26 BW TX capture showed 0x1004 for chunk 1, but this is a
# Blastware artifact — the device accepts any counter value and streams
# data regardless. Empirically confirmed 2026-04-06: 0x0400 for chunk 1
# works; 0x1004 causes the device to ignore the frame (timeout).
for chunk_num in range(1, max_chunks + 1):
counter = chunk_num * _BULK_COUNTER_STEP
params = bulk_waveform_params(key4, counter)
log.debug("5A chunk %d counter=0x%04X", chunk_num, counter)
frames_data.append(rsp)
log.debug("5A A5[0] (probe) page_key=0x%04X %d bytes",
rsp.page_key, len(rsp.data))
# ── Step 2: parse STRT end_offset from probe response ────────────────
end_offset = parse_strt_end_offset(rsp.data)
if end_offset is None:
log.warning(
"5A probe response did not contain a STRT record; "
"cannot bound chunk loop — falling back to max_chunks=%d cap",
max_chunks,
)
end_offset = 0xFFFF # impossible value → loop runs to max_chunks
else:
log.info(
"5A STRT start_offset=0x%04X end_offset=0x%04X size=0x%04X",
start_offset, end_offset, end_offset - start_offset,
)
# ── Step 3: metadata pages 0x1002 + 0x1004 (event 1 only) ────────────
# Confirmed from BW captures: BW reads these two fixed device-buffer
# pages immediately after the probe for events at start_key[2:4]=0.
# Continuation events skip them (BW caches across the session).
# Their content is global compliance-setup metadata: Project, Client,
# User Name, Seis Loc, Extended Notes.
if is_event_1:
for meta_counter in (0x1002, 0x1004):
# Metadata page params have an extra trailing 0x00 byte
# (12-byte params instead of 11) — empirical from BW captures.
# Checksum-neutral but matches BW byte-for-byte.
meta_params = bytes([
0x00,
key4[0], key4[1],
(meta_counter >> 8) & 0xFF,
meta_counter & 0xFF,
0, 0, 0, 0, 0, 0, 0,
])
log.debug("5A metadata page counter=0x%04X", meta_counter)
self._send(build_5a_frame(_BULK_CHUNK_OFFSET, meta_params))
self._parser.reset()
try:
meta_rsp = self._recv_one(
expected_sub=rsp_sub, reset_parser=False, timeout=10.0,
)
except TimeoutError:
log.warning(
"5A metadata page 0x%04X TIMED OUT — continuing",
meta_counter,
)
continue
frames_data.append(meta_rsp)
log.debug(
"5A meta@0x%04X page_key=0x%04X %d bytes",
meta_counter, meta_rsp.page_key, len(meta_rsp.data),
)
# ── Step 4: sample chunk loop, bounded by end_offset ─────────────────
# Sample chunks start at:
# event 1: counter = 0x0600
# event N (>0): counter = probe_counter + 0x0200
# (probe was the first sample chunk)
if is_event_1:
counter = 0x0600
else:
counter = probe_counter + _BULK_COUNTER_STEP
last_chunk_counter: Optional[int] = (
probe_counter if not is_event_1 else None
)
chunks_fetched = 0
while chunks_fetched < max_chunks:
# Stop when next chunk would straddle the event end.
if counter + _BULK_COUNTER_STEP > end_offset:
log.debug(
"5A chunk loop done at counter=0x%04X (end=0x%04X); "
"%d chunks fetched",
counter, end_offset, chunks_fetched,
)
break
params = bulk_waveform_params(key4, counter)
log.debug("5A chunk #%d counter=0x%04X", chunks_fetched + 1, counter)
self._send(build_5a_frame(_BULK_CHUNK_OFFSET, params))
self._parser.reset() # reset bytes_fed for accurate per-chunk count
self._parser.reset()
try:
rsp = self._recv_one(expected_sub=rsp_sub, reset_parser=False, timeout=10.0)
rsp = self._recv_one(
expected_sub=rsp_sub, reset_parser=False, timeout=10.0,
)
except TimeoutError:
raw = self._parser.bytes_fed
log.warning(
"5A TIMEOUT chunk=%d counter=0x%04X raw_bytes=%d",
chunk_num, counter, raw,
chunks_fetched + 1, counter, raw,
)
if raw > 0 and frames_data:
# Device sent a partial byte (likely a bare DLE/ETX end-of-stream
# signal) but never completed a full frame. Treat as graceful
# stream end and fall through to the termination step.
log.warning(
"5A end-of-stream detected at chunk=%d (raw_bytes=%d, "
"frames_collected=%d) — proceeding to termination",
chunk_num, raw, len(frames_data),
"5A unexpected end-of-stream — proceeding to TERM",
)
break
raise
log.warning(
"5A RX chunk=%d page_key=0x%04X data_len=%d contains_Project=%s",
chunk_num, rsp.page_key, len(rsp.data), b"Project:" in rsp.data,
log.debug(
"5A RX chunk=%d page_key=0x%04X data_len=%d",
chunks_fetched + 1, rsp.page_key, len(rsp.data),
)
if rsp.page_key == 0x0000:
# Device unexpectedly terminated mid-stream (no termination needed).
log.debug("5A A5[%d] page_key=0x0000 — device terminated early", chunk_num)
# Device terminated mid-stream unexpectedly.
log.warning(
"5A unexpected page_key=0x0000 mid-stream at counter=0x%04X",
counter,
)
if include_terminator:
frames_data.append(rsp)
return frames_data
frames_data.append(rsp.data)
if stop_after_metadata and b"Project:" in rsp.data:
log.debug("5A A5[%d] metadata found — stopping early", chunk_num)
break
frames_data.append(rsp)
last_chunk_counter = counter
counter += _BULK_COUNTER_STEP
chunks_fetched += 1
else:
log.warning(
"5A reached max_chunks=%d without end-of-stream; sending termination",
max_chunks,
"5A reached max_chunks=%d at counter=0x%04X (end=0x%04X)",
max_chunks, counter, end_offset,
)
# ── Step 3: termination ──────────────────────────────────────────────
term_counter = counter + _BULK_COUNTER_STEP
term_params = bulk_waveform_term_params(key4, term_counter)
log.debug(
"5A termination term_counter=0x%04X offset=0x%04X",
term_counter, _BULK_TERM_OFFSET,
)
self._send(build_5a_frame(_BULK_TERM_OFFSET, term_params))
try:
term_rsp = self._recv_one(expected_sub=rsp_sub)
# ── Step 5: TERM with proper end_offset-derived formula ──────────────
if last_chunk_counter is None or end_offset == 0xFFFF:
# No STRT or no chunks fetched — fall back to legacy TERM.
log.warning(
"5A using legacy TERM (offset_word=0x005A); "
"end_offset unavailable or no chunks fetched",
)
legacy_counter = (last_chunk_counter or probe_counter) + _BULK_COUNTER_STEP
term_offset_word = _BULK_TERM_OFFSET # 0x005A
term_params = bulk_waveform_term_params(key4, legacy_counter)
else:
term_offset_word, term_params = bulk_waveform_term_v2(
key4, end_offset, last_chunk_counter,
)
log.debug(
"5A termination response page_key=0x%04X %d bytes",
"5A TERM offset_word=0x%04X params[2:4]=%s end=0x%04X "
"last_chunk=0x%04X",
term_offset_word, term_params[2:4].hex(),
end_offset, last_chunk_counter,
)
self._send(build_5a_frame(term_offset_word, term_params))
try:
term_rsp = self._recv_one(expected_sub=rsp_sub, timeout=10.0)
log.info(
"5A TERM response page_key=0x%04X %d bytes",
term_rsp.page_key, len(term_rsp.data),
)
if include_terminator:
frames_data.append(term_rsp)
except TimeoutError:
log.debug("5A no termination response — device may have already closed")
log.warning("5A no TERM response (timeout)")
return frames_data
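The counter walk that this method performs can be illustrated without any device I/O. The sketch below only plans which counters would be requested, following the code path in the diff (probe at ``key4[2:4]``, metadata pages for event 1 only, 0x0200 steps bounded by ``end_offset``). ``plan_5a_counters`` is a hypothetical helper, and TERM handling is left out because ``bulk_waveform_term_v2`` is not shown here.

```python
from typing import List, Tuple

CHUNK_STEP = 0x0200             # mirrors _BULK_COUNTER_STEP
EVENT_1_FIRST_CHUNK = 0x0600    # first sample chunk when start offset is 0
METADATA_PAGES = (0x1002, 0x1004)


def plan_5a_counters(
    key4: bytes, end_offset: int, max_chunks: int = 256
) -> Tuple[int, List[int], List[int]]:
    """Return (probe_counter, metadata_counters, sample_chunk_counters)."""
    if len(key4) != 4:
        raise ValueError(f"waveform key must be 4 bytes, got {len(key4)}")
    start_offset = (key4[2] << 8) | key4[3]
    is_event_1 = start_offset == 0

    probe = 0 if is_event_1 else start_offset
    meta = list(METADATA_PAGES) if is_event_1 else []
    counter = EVENT_1_FIRST_CHUNK if is_event_1 else probe + CHUNK_STEP

    chunks: List[int] = []
    # Stop before a chunk that would straddle the event end.
    while len(chunks) < max_chunks and counter + CHUNK_STEP <= end_offset:
        chunks.append(counter)
        counter += CHUNK_STEP
    return probe, meta, chunks


# Values quoted in the code comments: key=01112238, STRT end=0x417E.
probe, meta, chunks = plan_5a_counters(bytes.fromhex("01112238"), 0x417E)
print(f"probe=0x{probe:04X} meta={meta} chunks={[f'0x{c:04X}' for c in chunks]}")
```

For that continuation event the plan contains no metadata pages, and the residual between the last full chunk and ``end_offset`` is what the TERM response is described as carrying.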
@@ -778,7 +937,7 @@ class MiniMateProtocol:
continue
chunk = data_rsp.data[11:]
log.warning(
log.debug(
"read_compliance_config: frame %s page=0x%04X data=%d cfg_chunk=%d running_total=%d",
step_name, data_rsp.page_key, len(data_rsp.data),
len(chunk), len(config) + len(chunk),
@@ -798,17 +957,18 @@ class MiniMateProtocol:
except TimeoutError:
pass
log.warning(
log.info(
"read_compliance_config: done — %d cfg bytes total",
len(config),
)
# Hex dump first 128 bytes for field mapping
for row in range(0, min(len(config), 128), 16):
row_bytes = bytes(config[row:row + 16])
hex_part = ' '.join(f'{b:02x}' for b in row_bytes)
asc_part = ''.join(chr(b) if 32 <= b < 127 else '.' for b in row_bytes)
log.warning(" cfg[%04x]: %-48s %s", row, hex_part, asc_part)
# Hex dump first 128 bytes — useful only for field-mapping work, not normal operation.
if log.isEnabledFor(logging.DEBUG):
for row in range(0, min(len(config), 128), 16):
row_bytes = bytes(config[row:row + 16])
hex_part = ' '.join(f'{b:02x}' for b in row_bytes)
asc_part = ''.join(chr(b) if 32 <= b < 127 else '.' for b in row_bytes)
log.debug(" cfg[%04x]: %-48s %s", row, hex_part, asc_part)
return bytes(config)
@@ -1072,6 +1232,89 @@ class MiniMateProtocol:
self._send(frame)
return self.recv_write_ack(expected_sub=rsp_sub)
# ── Call home config (SUBs 0x2C / 0x7E / 0x7F) ──────────────────────────
def read_call_home_config(self) -> bytes:
"""
Read the auto call home configuration (SUB 0x2C → response 0xD3).
Standard two-step read: probe (offset=0x00) then data (offset=0x7C=124).
Returns the raw 125-byte payload (data[11:] of the data response).
Confirmed from 4-20-26 call home settings capture:
- Probe response: data[4]=0x7C (confirms data length = 124)
- Data response: 136 bytes total (11-byte echo header + 125 bytes payload)
- Payload[0:3] = 0x00 0x7C 0xDC (header: zero, inner-length, constant)
- Payload[5] = auto_call_home_enabled
- Payload[6:46] = dial_string (40-byte null-padded ASCII "RADIO RING")
Returns:
Raw 125-byte call home config payload (data[11:]).
Suitable for round-trip write (append \\x00\\x00 → 127-byte write payload).
Raises:
ProtocolError: on timeout or wrong response SUB.
"""
rsp_sub = _expected_rsp_sub(SUB_CALL_HOME) # 0xFF - 0x2C = 0xD3
length = DATA_LENGTHS[SUB_CALL_HOME] # 0x7C = 124
log.debug("read_call_home_config: 0x2C probe")
self._send(build_bw_frame(SUB_CALL_HOME, 0))
self._recv_one(expected_sub=rsp_sub)
log.debug("read_call_home_config: 0x2C data request offset=0x%02X", length)
self._send(build_bw_frame(SUB_CALL_HOME, length))
data_rsp = self._recv_one(expected_sub=rsp_sub)
payload = data_rsp.data[11:]
log.debug("read_call_home_config: received %d payload bytes", len(payload))
return payload
def write_call_home_config(self, data: bytes) -> None:
"""
Write the auto call home configuration (SUB 0x7E → 0x7F confirm).
Write sequence (confirmed from 4-20-26 call home settings captures):
SUB 0x7E write 127-byte payload → device acks SUB 0x81
SUB 0x7F confirm (no data) → device acks SUB 0x80
The 127-byte write payload = 125-byte read payload + b'\\x00\\x00'.
The offset field = data[1] + 2 = 0x7C + 2 = 0x7E = 126.
Write frame format: build_bw_write_frame (minimal DLE stuffing: only
BW_CMD is doubled; all other bytes are RAW). The \\x10\\x03 sequence
within the payload is preserved as-is (device interprets DLE+ETX as the
literal value 0x03 per the inner-frame terminator convention).
Args:
data: 127-byte write payload (read payload + \\x00\\x00 footer).
Must start with [0x00][0x7C][...] (standard header).
Raises:
ValueError: if data is shorter than 2 bytes (data[1] is needed to compute
the offset). This method does not itself check the 127-byte length or header.
ProtocolError: on timeout or wrong response SUB.
"""
if len(data) < 2:
raise ValueError(f"call home write payload must be at least 2 bytes, got {len(data)}")
rsp_sub_write = _expected_rsp_sub(SUB_CALL_HOME_WRITE) # 0xFF - 0x7E = 0x81
rsp_sub_confirm = _expected_rsp_sub(SUB_CALL_HOME_CONFIRM) # 0xFF - 0x7F = 0x80
# Offset formula: data[1] + 2 (same pattern as other single-chunk writes)
offset = data[1] + 2 # 0x7C + 2 = 0x7E = 126
frame = build_bw_write_frame(SUB_CALL_HOME_WRITE, data, offset=offset)
log.debug(
"write_call_home_config: %d bytes data[1]=0x%02X offset=0x%04X",
len(data), data[1], offset,
)
self._send(frame)
self.recv_write_ack(expected_sub=rsp_sub_write)
log.debug("write_call_home_config: write acked; sending confirm 0x7F")
confirm_frame = build_bw_write_frame(SUB_CALL_HOME_CONFIRM, b"")
self._send(confirm_frame)
self.recv_write_ack(expected_sub=rsp_sub_confirm)
log.debug("write_call_home_config: confirm acked — done")
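The payload arithmetic described in the write_call_home_config docstring can be sketched in isolation. This is a minimal illustration, not the module's code: the helper names (make_write_payload, write_offset, dial_string) and the sample payload contents are assumptions, laid out to match the capture notes above.

```python
def make_write_payload(read_payload: bytes) -> bytes:
    """Append the two-byte zero footer described in the docstring."""
    return read_payload + b"\x00\x00"


def write_offset(write_payload: bytes) -> int:
    """Offset field = data[1] + 2, per the comment in write_call_home_config."""
    if len(write_payload) < 2:
        raise ValueError("need at least 2 bytes to read data[1]")
    return write_payload[1] + 2


def dial_string(read_payload: bytes) -> str:
    """Payload[6:46] holds a 40-byte null-padded ASCII dial string."""
    return read_payload[6:46].rstrip(b"\x00").decode("ascii")


# Hypothetical payload shaped like the capture notes: 00 7C DC header,
# enabled flag at [5], dial string at [6:46], zero padding up to 125 bytes.
_buf = bytearray(125)
_buf[0:3] = b"\x00\x7c\xdc"
_buf[5] = 1
_buf[6:16] = b"RADIO RING"
sample = bytes(_buf)

write_payload = make_write_payload(sample)
print(len(write_payload), hex(write_offset(write_payload)), dial_string(sample))
```

Deriving the offset from data[1] rather than hard-coding 0x7E keeps the sketch aligned with the "same pattern as other single-chunk writes" comment in the method body.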
# ── Monitoring ────────────────────────────────────────────────────────────
def read_monitor_status(self) -> S3Frame:
@@ -1137,6 +1380,78 @@ class MiniMateProtocol:
self._send(frame)
return self.recv_write_ack(expected_sub=rsp_sub)
def read_event_storage_range(self) -> S3Frame:
"""
Read event storage range (SUB 0x06 → response 0xF9).
Two-step read: probe (offset=0x00) then data (offset=0x24 = 36 bytes).
Uses token=0xFE at params[7], same as the erase sequence.
The 36-byte response ends with two 4-byte event keys (first and last
stored event key). After a successful erase, both keys are 0x01110000
(device-empty sentinel). Confirmed from 4-11-26 MITM capture.
Returns:
S3Frame with 36 bytes of storage range data.
Raises:
ProtocolError: on timeout or wrong response SUB.
"""
rsp_sub = _expected_rsp_sub(SUB_CHANNEL_CONFIG) # 0xFF - 0x06 = 0xF9
params = token_params(0xFE)
log.debug("read_event_storage_range: probe step rsp_sub=0x%02X", rsp_sub)
self._send(build_bw_frame(SUB_CHANNEL_CONFIG, offset=0x00, params=params))
self._recv_one(expected_sub=rsp_sub)
log.debug(
"read_event_storage_range: data step offset=0x%02X",
DATA_LENGTHS[SUB_CHANNEL_CONFIG],
)
self._send(build_bw_frame(SUB_CHANNEL_CONFIG,
offset=DATA_LENGTHS[SUB_CHANNEL_CONFIG],
params=params))
return self._recv_one(expected_sub=rsp_sub)
def begin_erase_all(self) -> S3Frame:
"""
Send Begin-Erase-All command (SUB 0xA3 → response 0x5C).
Single frame with token=0xFE at params[7]. The device acknowledges with
a minimal ack and begins the erase process. Follow up with
read_monitor_status() + read_event_storage_range() + confirm_erase_all()
to complete the sequence. Confirmed from 4-11-26 MITM capture.
Returns:
S3Frame ack from device (SUB 0x5C).
Raises:
ProtocolError: on timeout or wrong response SUB.
"""
rsp_sub = _expected_rsp_sub(SUB_ERASE_ALL_BEGIN) # 0xFF - 0xA3 = 0x5C
log.debug("begin_erase_all: rsp_sub=0x%02X", rsp_sub)
self._send(build_bw_frame(SUB_ERASE_ALL_BEGIN, params=token_params(0xFE)))
return self._recv_one(expected_sub=rsp_sub)
def confirm_erase_all(self) -> S3Frame:
"""
Send Confirm-Erase-All command (SUB 0xA2 → response 0x5D).
Single frame with token=0xFE at params[7]. Must be preceded by
begin_erase_all() + read_monitor_status() + read_event_storage_range().
After this call the device memory is cleared. Confirmed from 4-11-26
MITM capture.
Returns:
S3Frame ack from device (SUB 0x5D).
Raises:
ProtocolError: on timeout or wrong response SUB.
"""
rsp_sub = _expected_rsp_sub(SUB_ERASE_ALL_CONFIRM) # 0xFF - 0xA2 = 0x5D
log.debug("confirm_erase_all: rsp_sub=0x%02X", rsp_sub)
self._send(build_bw_frame(SUB_ERASE_ALL_CONFIRM, params=token_params(0xFE)))
return self._recv_one(expected_sub=rsp_sub)
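The inline comments on each rsp_sub line all follow the same rule: response SUB = 0xFF minus request SUB. A small sketch of that arithmetic, checked against the pairs quoted in this diff, is below. The function name expected_rsp_sub is a stand-in chosen for illustration; the real _expected_rsp_sub is not shown here and may differ.

```python
def expected_rsp_sub(request_sub: int) -> int:
    """Response SUB is 0xFF minus the request SUB (one's complement in a byte)."""
    if not 0 <= request_sub <= 0xFF:
        raise ValueError(f"SUB out of byte range: {request_sub!r}")
    return 0xFF - request_sub


# Request/response pairs quoted in the comments and SUB table of this diff.
DOCUMENTED_PAIRS = {
    0x2C: 0xD3,  # call home read
    0x7E: 0x81,  # call home write
    0x7F: 0x80,  # call home confirm
    0x06: 0xF9,  # event storage range
    0xA3: 0x5C,  # begin erase all
    0xA2: 0x5D,  # confirm erase all
    0x5A: 0xA5,  # bulk waveform
}

mismatches = {
    req: rsp for req, rsp in DOCUMENTED_PAIRS.items()
    if expected_rsp_sub(req) != rsp
}
print("mismatches:", mismatches)
```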
# ── Internal helpers ──────────────────────────────────────────────────────
def _send(self, frame: bytes) -> None:
+135
@@ -418,3 +418,138 @@ class TcpTransport(BaseTransport):
def __repr__(self) -> str:
state = "connected" if self.is_connected else "disconnected"
return f"TcpTransport({self.host!r}, port={self.port}, {state})"
# ── Inbound / accepted-socket transport ───────────────────────────────────────
class SocketTransport(TcpTransport):
"""
Like TcpTransport but wraps an already-accepted inbound socket.
Used by the ACH inbound server (bridges/ach_server.py): the device dials
IN to us, so by the time we create this transport the socket is already live.
connect() is a no-op; everything else (read, write, read_until_idle, ...) is
inherited unchanged from TcpTransport.
Args:
sock: An already-connected socket.socket returned by server_socket.accept().
peer: Human-readable peer label for repr / logging (e.g. "203.0.113.5:54321").
"""
def __init__(self, sock: socket.socket, peer: str = "inbound") -> None:
# Bypass TcpTransport.__init__ — we already have a live socket.
self.host = peer
self.port = 0
self.connect_timeout = 0.0
self._sock = sock
sock.settimeout(self._RECV_TIMEOUT)
def connect(self) -> None:
"""No-op — socket was already accepted inbound."""
pass # Already have a live socket; nothing to open.
@property
def is_connected(self) -> bool:
return self._sock is not None
def __repr__(self) -> str:
return f"SocketTransport(peer={self.host!r})"
# ── Capturing transport (MITM-style raw byte mirror) ──────────────────────────
class CapturingTransport(BaseTransport):
"""
Wraps another BaseTransport and mirrors every byte to two raw capture files:
raw_bw_<...>.bin: bytes WE wrote to the device (BW-side TX)
raw_s3_<...>.bin: bytes the device wrote back (S3-side TX)
The file naming and on-wire byte layout are identical to the captures
produced by `bridges/ach_mitm.py`, so the resulting `.bin` files can be
loaded directly by the Analyzer (File > Open Capture) and parsed by the
same tooling used for genuine Blastware MITM captures.
All BaseTransport methods are forwarded to the inner transport; the only
side-effect is that successful read/write byte streams are appended to the
two open binary files.
Args:
inner: An already-built BaseTransport (SerialTransport / TcpTransport).
bw_path: File path for the "BW TX" stream (bytes we send). Opened "wb".
s3_path: File path for the "S3 TX" stream (bytes the device sends).
Opened "wb".
Example:
with CapturingTransport(TcpTransport("1.2.3.4", 9034),
"raw_bw.bin", "raw_s3.bin") as t:
client = MiniMateClient(transport=t)
client.connect()
client.get_events()
# both .bin files now hold the full bidirectional capture.
"""
def __init__(self, inner: BaseTransport, bw_path: str, s3_path: str) -> None:
self._inner = inner
self._bw_path = bw_path
self._s3_path = s3_path
self._bw_fh = None
self._s3_fh = None
# Forward inner attrs so callers can introspect (e.g. .host, .port).
self.host = getattr(inner, "host", None)
self.port = getattr(inner, "port", None)
# ── BaseTransport interface ───────────────────────────────────────────────
def connect(self) -> None:
if self._bw_fh is None:
self._bw_fh = open(self._bw_path, "wb", buffering=0)
if self._s3_fh is None:
self._s3_fh = open(self._s3_path, "wb", buffering=0)
self._inner.connect()
def disconnect(self) -> None:
try:
self._inner.disconnect()
finally:
for fh_attr in ("_bw_fh", "_s3_fh"):
fh = getattr(self, fh_attr)
if fh is not None:
try:
fh.flush()
fh.close()
except Exception:
pass
setattr(self, fh_attr, None)
@property
def is_connected(self) -> bool:
return self._inner.is_connected
def write(self, data: bytes) -> None:
self._inner.write(data)
if data and self._bw_fh is not None:
try:
self._bw_fh.write(data)
except Exception:
pass
def read(self, n: int) -> bytes:
got = self._inner.read(n)
if got and self._s3_fh is not None:
try:
self._s3_fh.write(got)
except Exception:
pass
return got
@property
def bw_path(self) -> str:
return self._bw_path
@property
def s3_path(self) -> str:
return self._s3_path
def __repr__(self) -> str:
return f"CapturingTransport({self._inner!r}, bw={self._bw_path!r}, s3={self._s3_path!r})"
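The mirroring idea behind CapturingTransport can be tried without a device. The sketch below is a simplified, self-contained analogue rather than the class above: LoopbackTransport and MirrorTransport are hypothetical names, the inner transport is an in-memory fake, and error swallowing and connect/disconnect handling are left out.

```python
import os
import tempfile


class LoopbackTransport:
    """Hypothetical in-memory stand-in for a real transport."""

    def __init__(self, canned_reply: bytes) -> None:
        self._reply = bytearray(canned_reply)
        self.sent = bytearray()

    def write(self, data: bytes) -> None:
        self.sent += data

    def read(self, n: int) -> bytes:
        chunk = bytes(self._reply[:n])
        del self._reply[:n]
        return chunk


class MirrorTransport:
    """Forward read/write to `inner`; append each direction to its own file."""

    def __init__(self, inner, tx_path: str, rx_path: str) -> None:
        self._inner = inner
        self._tx = open(tx_path, "wb", buffering=0)
        self._rx = open(rx_path, "wb", buffering=0)

    def write(self, data: bytes) -> None:
        self._inner.write(data)
        if data:
            self._tx.write(data)

    def read(self, n: int) -> bytes:
        got = self._inner.read(n)
        if got:
            self._rx.write(got)
        return got

    def close(self) -> None:
        self._tx.close()
        self._rx.close()


def demo():
    """Run one write and two reads, then return both captured streams."""
    with tempfile.TemporaryDirectory() as tmp:
        tx_path = os.path.join(tmp, "raw_bw.bin")
        rx_path = os.path.join(tmp, "raw_s3.bin")
        inner = LoopbackTransport(b"\x10\x02reply\x10\x03")
        t = MirrorTransport(inner, tx_path, rx_path)
        t.write(b"\x41\x02request")
        t.read(4)
        t.read(64)
        t.close()
        with open(tx_path, "rb") as f:
            tx = f.read()
        with open(rx_path, "rb") as f:
            rx = f.read()
    return tx, rx


tx_bytes, rx_bytes = demo()
```

Opening the files unbuffered, as the original does with buffering=0, means a crash mid-session still leaves everything captured so far on disk.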
+578
@@ -0,0 +1,578 @@
"""
waveform_codec.py: block-walker and verified decoder for the MiniMate Plus
waveform-file body.
FULLY DECODED 2026-05-11. Every block type, every channel, and the
channel-rotation rule are verified byte-exact against BW's ASCII export
across the 9-event fixture bundle (47,364 ADC samples, zero errors).
The Blastware waveform-file body (the bytes between the 21-byte STRT
record and the 26-byte file footer) is a tagged variable-length block
stream with a custom delta + RLE codec. (Not raw int16 LE, which was
the historical wrong assumption that produced ±32K noise on every event.)
Current status:
- Block framing: solved (5 block types and lengths all confirmed)
- Per-channel decode: solved (Tran / Vert / Long / MicL all byte-exact)
- Channel rotation: Tran → Vert → Long → MicL per segment
- Segment header: fully decoded (anchor pair + prev-channel extension)
- 30 NN packed-delta block: NN × 12-bit signed deltas in NN/4 groups
- MicL dB(L) conversion: ``mic_count_to_db`` matches BW display
- Production wiring: ``client.py:_decode_a5_waveform`` uses the new
codec (via ``decode_a5_frames``). ``.h5`` sidecars now render
correctly.
Known limitations:
- Walker stops early on the loudest events (SP0, SS0, SV0, event-b) at
some mid-segment edge cases not yet fully characterized. Every
sample reached IS correct; the walker just doesn't reach all of
them yet. The cleanly-decoded subset is still ~5000-15000 samples
per loud event.
Body layout (CONFIRMED 2026-05-11 against 8 fixture events)
[7-byte preamble] [stream of tagged blocks] [trailer]
The preamble is always exactly 7 bytes:
body[0:3] = 00 02 00 magic
body[3:5] = Tran[0] int16 BE in 16-count units (LSB = 0.005 in/s)
body[5:7] = Tran[1] int16 BE in 16-count units
(Earlier drafts of this module described a "7-or-9-byte preamble";
that was wrong: single-shot and continuous events both use 7 bytes.
The "extra 2 bytes" on continuous events were the first ``00 NN`` RLE
marker, not part of the preamble.)
Block types and lengths (all confirmed):
| Tag | Length | Meaning |
|----------|-----------------------|----------------------------------------|
| ``10 NN``| NN/2 + 2 bytes | 4-bit nibble deltas (2 per byte; high |
| | | nibble first; signed 0..7 / 8..F = -8..-1)|
| ``20 NN``| NN + 2 bytes | int8 signed deltas (1 per byte) |
| ``00 NN``| 2 bytes | RLE: append NN copies of current value |
| ``30 NN``| NN*2 in data, NN*4 | Unknown content. Only in loud events. |
| | in trailer | |
| ``40 02``| 20 bytes (fixed) | Segment header |
NN is always a multiple of 4.
Tran channel, segment 0 (CONFIRMED 2026-05-11)
Segment 0 (everything before the first ``40 02`` segment header) encodes
Tran samples only. Starting from preamble anchors Tran[0] and Tran[1],
each subsequent block contributes to the running Tran value:
10 NN → append NN deltas (4-bit signed nibbles)
20 NN → append NN deltas (int8 signed bytes)
00 NN → append NN copies of the current value (RLE zeros)
40 02 → segment 0 ends; multi-segment continuation is open
This decodes the first 482-510 samples of Tran for each event with zero
errors against BW's ASCII export. The exact segment-0 sample count
varies per event (it's bounded by a fixed device-flash byte budget, not
a fixed sample count; quiet events fit more samples because zero
deltas pack into ``00 NN`` markers compactly).
Implementation: :func:`decode_tran_initial`.
Segment header (40 02, 20 bytes total)
The 18-byte payload of the ``40 02`` block:
| Offset | Field | Status |
|-----------|---------------------------------------------|-------------|
| [0:2] | T_delta at first sample of new segment | confirmed|
| | (int16 BE, in 16-count units) | |
| [2:4] | Likely T_delta at sample seg_start+1 | 🟡 likely |
| [4:6] | Unknown (varies; possibly checksum) | open |
| [6:8] | Byte length to next segment header 2 | confirmed|
| | (uint16 BE; useful for walker pre-scan) | |
| [8:12] | Monotonic uint32 LE counter | confirmed|
| | (starts ~0x47, increments by 1 per segment) | |
| [12:14] | Constant ``02 00`` | confirmed|
| [14:18] | Unknown 4-byte field | open |
What breaks the multi-segment decoder (the main open question)
After segment 0 ends and the segment header T_delta is consumed,
applying segment 1's blocks as Tran continuation produces values that
diverge from truth by sample ~512. The block structure inside segment
1 is IDENTICAL to segment 0 (same alternating 10 NN / 00 NN pattern),
and the delta budget matches the segment size exactly (V70 segment 1
has 264 nibble-deltas + 244 RLE zeros = 508 = the segment's sample
count). But the cumulative is wrong.
The strongest unverified hypothesis is that segments rotate channels:
segment 0 → Tran samples 0..509
segment 1 → Vert samples 0..507
segment 2 → Long samples 0..507
segment 3 → Mic samples 0..507
segment 4 → Tran samples 510..N (continuation)
...
This is consistent with the segment-1 block sums net-to-near-zero in
V70 (where all 4 channels are near zero) and with the per-segment delta
budget matching the segment size for a single channel. It is NOT yet
verified because the per-segment channel anchor isn't pinned down in
the segment header; bytes [4:6] and [14:18] of the header are still
open and probably encode V/L/M anchors.
See ``docs/waveform_codec_re_status.md`` for the current working notes
and the suggested next experiment ("segment-channel scoring analyzer").
"""
from __future__ import annotations
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple
@dataclass
class WaveformBlock:
"""One tagged block parsed out of a Blastware waveform-file body."""
offset: int # byte offset into body
tag_hi: int # first tag byte (0x10 / 0x20 / 0x00 / 0x30 / 0x40)
tag_lo: int # second tag byte (NN)
data: bytes # block payload (excludes the 2-byte tag)
length: int # total block length on the wire (includes the tag)
@property
def kind(self) -> str:
return f"{self.tag_hi:02x} {self.tag_lo:02x}"
def find_data_start(body: bytes) -> int:
"""Auto-detect the offset of the first data block.
The body starts with a 7-byte preamble (magic ``00 02 00`` + two int16 BE
Tran anchors). After that, the data section starts with a tag, usually
``10 NN`` or ``20 NN``, but quiet events may begin with a ``00 NN`` RLE
marker. We return the offset of the first recognized tag.
"""
# Try fixed offset 7 first (canonical preamble length).
if len(body) >= 9:
b, nn = body[7], body[8]
if (b in (0x00, 0x10, 0x20, 0x30) and nn % 4 == 0 and 0 < nn <= 0xFC) \
or (b == 0x40 and nn == 0x02):
return 7
# Fall back to scanning the first 20 bytes.
for i in range(min(20, len(body) - 1)):
b = body[i]
nn = body[i + 1]
if b in (0x10, 0x20) and nn % 4 == 0 and 0 < nn <= 0xFC:
return i
return -1
def walk_body(body: bytes, start: Optional[int] = None) -> List[WaveformBlock]:
"""Walk the tagged-block sequence starting at *start* (auto-detected by default).
Stops when an unrecognized tag is encountered or end of body is reached.
Returned blocks are in stream order.
"""
if start is None:
start = find_data_start(body)
if start < 0:
return []
blocks: List[WaveformBlock] = []
i = start
while i + 1 < len(body):
t0 = body[i]
t1 = body[i + 1]
if t0 == 0x10 and t1 % 4 == 0 and 0 < t1 <= 0xFC:
length = t1 // 2 + 2
elif (t0 & 0xF0) == 0x10 and (t0 & 0x0F) != 0 and t1 % 4 == 0:
# Wide-NN nibble block: ``1X NN`` where X is the high nibble of a
# 12-bit NN value. NN = ((t0 & 0x0F) << 8) | t1. Block length
# = NN/2 + 2 bytes (NN nibble deltas, same as ``10 NN`` semantics
# but with NN > 0xFC). Confirmed 2026-05-11 in SP0 segment 12
# where V continuation uses ``11 90`` = NN=0x190=400.
wide_nn = ((t0 & 0x0F) << 8) | t1
length = wide_nn // 2 + 2
elif t0 == 0x20 and t1 % 4 == 0 and 0 < t1 <= 0xFC:
length = t1 + 2
elif (t0 & 0xF0) == 0x20 and (t0 & 0x0F) != 0 and t1 % 4 == 0:
# Wide-NN int8 block: ``2X NN`` extends NN to 12 bits the same way.
wide_nn = ((t0 & 0x0F) << 8) | t1
length = wide_nn + 2
elif t0 == 0x00 and t1 % 4 == 0:
length = 2
elif t0 == 0x30 and t1 % 4 == 0 and 0 < t1 <= 0x10:
# Data-section ``30 NN`` blocks carry NN 12-bit signed deltas packed
# as NN/4 groups of (2-byte high-nibble field + 4 × int8 low byte).
# Length = NN/4 × 6 + 2 = NN × 1.5 + 2 (= 8 for NN=4, 14 for NN=8,
# 20 for NN=12, etc.). Confirmed 2026-05-11 by full-decoder
# verification against BW ASCII export.
#
# Trailer-section ``30 NN`` blocks have a different length formula
# (NN × 4 = 32 for NN=8 in trailers). We try the data-section
# length first and fall back to the trailer length if needed.
cand_data = t1 * 3 // 2 + 2
cand_trailer = t1 * 4
if (i + cand_data < len(body) - 1
and body[i + cand_data] in (0x10, 0x20, 0x00, 0x30, 0x40)):
length = cand_data
else:
length = cand_trailer
elif t0 == 0x40 and t1 == 0x02:
length = 20
else:
# Unknown tag; stop. Caller can inspect ``i`` to see where.
break
if i + length > len(body):
break
data = bytes(body[i + 2 : i + length])
blocks.append(WaveformBlock(offset=i, tag_hi=t0, tag_lo=t1, data=data, length=length))
i += length
return blocks
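The length rules that walk_body applies can be condensed into one function for quick checks. The sketch below is a simplified restatement, assuming only the data-section formulas documented above. The name block_length is hypothetical, and the trailer-section ``30 NN`` fallback (NN × 4) is deliberately left out.

```python
from typing import Optional


def block_length(t0: int, t1: int) -> Optional[int]:
    """Total on-wire length (tag included) for a data-section block, or None."""
    if t0 == 0x40 and t1 == 0x02:
        return 20                      # fixed-size segment header
    if t1 % 4 != 0:
        return None                    # NN is always a multiple of 4
    if t0 == 0x00:
        return 2                       # RLE marker carries no payload
    family = t0 & 0xF0
    nn = ((t0 & 0x0F) << 8) | t1       # 12-bit NN; plain ``10 NN`` has high bits 0
    if family == 0x10 and nn > 0:
        return nn // 2 + 2             # two nibble deltas per byte
    if family == 0x20 and nn > 0:
        return nn + 2                  # one int8 delta per byte
    if t0 == 0x30 and 0 < t1 <= 0x10:
        return t1 * 3 // 2 + 2         # NN/4 groups of 6 bytes
    return None


examples = {
    (0x10, 0x08): block_length(0x10, 0x08),
    (0x20, 0x08): block_length(0x20, 0x08),
    (0x00, 0x10): block_length(0x00, 0x10),
    (0x30, 0x08): block_length(0x30, 0x08),
    (0x11, 0x90): block_length(0x11, 0x90),
    (0x40, 0x02): block_length(0x40, 0x02),
}
print(examples)
```

Folding the wide ``1X NN`` and ``2X NN`` forms into the same 12-bit NN computation works because the plain forms are just the case where the high four bits are zero.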
def split_segments(blocks: List[WaveformBlock]) -> List[List[WaveformBlock]]:
"""Group consecutive blocks into segments separated by ``40 02`` headers.
The first segment is whatever runs before the first ``40 02`` header
(typically the "segment 0" preamble data after the body preamble).
Subsequent segments start with a ``40 02`` block, then have their
own data blocks until the next ``40 02``.
"""
segments: List[List[WaveformBlock]] = []
current: List[WaveformBlock] = []
for b in blocks:
if b.tag_hi == 0x40 and b.tag_lo == 0x02:
if current:
segments.append(current)
current = [b]
else:
current.append(b)
if current:
segments.append(current)
return segments
def parse_segment_header(block: WaveformBlock) -> Optional[dict]:
"""Decode the 18-byte payload of a ``40 02`` segment header.
Returns a dict with the labelled fields, or None if *block* is not
a ``40 02`` header.
"""
if not (block.tag_hi == 0x40 and block.tag_lo == 0x02):
return None
if len(block.data) < 18:
return None
p = block.data
counter = int.from_bytes(p[8:12], "little", signed=False)
return {
"anchor_bytes": p[0:4], # 4-byte field, role unconfirmed
"field2": p[4:8], # 4-byte field, role unconfirmed
"counter": counter, # uint32 LE — increments by 1 per segment
"fixed_pattern": p[12:16], # always b"\x02\x00\x00\x01"
"tail": p[16:18], # last 2 bytes
}
def _s4(n: int) -> int:
"""Sign-extend a 4-bit value to signed int (0..7 → 0..7; 8..F → -8..-1)."""
return n if n < 8 else n - 16
def _i8(b: int) -> int:
"""Reinterpret an unsigned byte as signed int8."""
return b if b < 128 else b - 256
def decode_tran_initial(body: bytes) -> Optional[List[int]]:
"""
Decode the initial Tran-channel samples (VERIFIED 2026-05-11).
Returns Tran samples in **16-count units** (LSB = 0.005 in/s at Normal
range, the same quantization BW uses for its ASCII export). Returns
``None`` if the body cannot be parsed.
The decoded list extends from sample 0 through the end of segment 0
(= just before the first ``40 02`` segment header; ~510 sample-sets
for the events tested). Multi-segment decoding requires continuing
past the segment header; that's done by :func:`decode_tran_full`
when the per-segment rules are pinned down for all signal types.
Codec for segment 0 (CONFIRMED 2026-05-11 against 7 fixture events):
- Body bytes [0:3] are the magic ``00 02 00``.
- Body bytes [3:5] = ``Tran[0]`` as int16 BE in 16-count units.
- Body bytes [5:7] = ``Tran[1]`` as int16 BE in 16-count units.
- Data blocks (``10 NN`` or ``20 NN``) carry Tran deltas starting
at sample 2:
* ``10 NN``: NN nibbles = NN/2 bytes; each nibble is a 4-bit
signed delta (0..7 → 0..+7; 8..F → -8..-1). High nibble of
each byte comes first.
* ``20 NN``: NN int8 signed deltas (one delta per byte).
- ``00 NN`` blocks are run-length-encoded zero deltas: append NN
copies of the current cumulative Tran value (no change).
- ``30 NN`` blocks have not yet been decoded for content; they
appear in segment 0 of loud-from-start events (SS0, SV0) and
seem to signal a transition or special-case interpretation.
The walker steps over them but their data is ignored.
The walk stops at the first ``40 02`` segment header.
"""
if len(body) < 7 or body[0:3] != b"\x00\x02\x00":
return None
t0 = int.from_bytes(body[3:5], "big", signed=True)
t1 = int.from_bytes(body[5:7], "big", signed=True)
start = find_data_start(body)
if start < 0:
return [t0, t1]
out = [t0, t1]
cur = t1
for blk in walk_body(body, start):
if blk.tag_hi == 0x40:
# Segment boundary — stop. Multi-segment decode is decode_tran_full.
break
if blk.tag_hi == 0x10:
for byte in blk.data:
for nib in ((byte >> 4) & 0xF, byte & 0xF):
cur += _s4(nib)
out.append(cur)
elif blk.tag_hi == 0x20:
for byte in blk.data:
cur += _i8(byte)
out.append(cur)
elif blk.tag_hi == 0x00:
# RLE zero deltas: append NN copies of current Tran value.
for _ in range(blk.tag_lo):
out.append(cur)
# 30 NN: unknown content; skip.
return out
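A hand-built body makes the segment-0 rules easy to verify by eye. The sketch below is a stripped-down analogue of decode_tran_initial, assuming only the narrow ``10 NN`` / ``20 NN`` / ``00 NN`` tags. The function decode_segment0 and the synthetic bytes are made up for illustration and do not come from a real capture.

```python
from typing import List


def s4(n: int) -> int:
    """Sign-extend a 4-bit nibble."""
    return n if n < 8 else n - 16


def i8(b: int) -> int:
    """Reinterpret an unsigned byte as int8."""
    return b if b < 128 else b - 256


def decode_segment0(body: bytes) -> List[int]:
    """Decode preamble anchors plus delta/RLE blocks up to the first other tag."""
    if len(body) < 7 or body[0:3] != b"\x00\x02\x00":
        raise ValueError("missing 00 02 00 magic")
    out = [
        int.from_bytes(body[3:5], "big", signed=True),
        int.from_bytes(body[5:7], "big", signed=True),
    ]
    cur = out[-1]
    i = 7
    while i + 1 < len(body):
        tag, nn = body[i], body[i + 1]
        if tag == 0x10:                       # NN nibble deltas in NN/2 bytes
            for byte in body[i + 2 : i + 2 + nn // 2]:
                for nib in (byte >> 4, byte & 0xF):
                    cur += s4(nib)
                    out.append(cur)
            i += 2 + nn // 2
        elif tag == 0x20:                     # NN int8 deltas
            for byte in body[i + 2 : i + 2 + nn]:
                cur += i8(byte)
                out.append(cur)
            i += 2 + nn
        elif tag == 0x00:                     # RLE: repeat current value NN times
            out.extend([cur] * nn)
            i += 2
        else:                                 # 40 02 or anything else: stop
            break
    return out


# Synthetic body: anchors 1 and 2, then nibble deltas +1 -1 0 +2,
# four RLE repeats, int8 deltas -1 +1 -128 +127, then a 40 02 tag.
body = (
    b"\x00\x02\x00" + b"\x00\x01\x00\x02"
    + b"\x10\x04\x1f\x02"
    + b"\x00\x04"
    + b"\x20\x04\xff\x01\x80\x7f"
    + b"\x40\x02"
)
samples = decode_segment0(body)
print(samples)
```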
def decode_waveform_v2(body: bytes) -> Optional[dict]:
"""
Decode the body into per-channel sample arrays.
Status (2026-05-11 evening; channel-rotation hypothesis CONFIRMED):
segments rotate channels in fixed order **Tran → Vert → Long → MicL**.
Each channel-segment carries a 2-sample anchor pair in segment-header
bytes [14:18] (or in the body preamble for the initial Tran segment)
plus a stream of delta blocks for samples 2 onward.
Returns ``{"Tran": [...], "Vert": [...], "Long": [...], "MicL": [...]}``
with each channel's decoded samples in 16-count units (LSB = 0.005
in/s at Normal range). Returns ``None`` if the body cannot be
parsed.
"""
if len(body) < 7 or body[0:3] != b"\x00\x02\x00":
return None
channels = ["Tran", "Vert", "Long", "MicL"]
out: dict = {ch: [] for ch in channels}
# Initial Tran segment: preamble anchor pair + delta blocks before first 40 02.
t0 = int.from_bytes(body[3:5], "big", signed=True)
t1 = int.from_bytes(body[5:7], "big", signed=True)
out["Tran"].extend([t0, t1])
start = find_data_start(body)
if start < 0:
return out
blocks = walk_body(body, start)
seg_idx = [i for i, b in enumerate(blocks) if b.tag_hi == 0x40]
def apply_blocks(channel: str, anchor: int,
block_start: int, block_end: int) -> int:
"""Apply delta blocks [block_start, block_end) to *channel*'s sample
list, starting from *anchor*. Returns the final cumulative value."""
cur = anchor
for bi in range(block_start, block_end):
blk = blocks[bi]
if (blk.tag_hi & 0xF0) == 0x10:
# Both ``10 NN`` (NN ≤ 0xFC) and wide-NN ``1X NN`` (X != 0)
# are nibble-delta streams. The walker has already used the
# right length; here we just iterate the payload bytes.
for byte in blk.data:
for nib in ((byte >> 4) & 0xF, byte & 0xF):
cur += _s4(nib)
out[channel].append(cur)
elif (blk.tag_hi & 0xF0) == 0x20:
# ``20 NN`` and wide ``2X NN`` both carry int8 deltas.
for byte in blk.data:
cur += _i8(byte)
out[channel].append(cur)
elif blk.tag_hi == 0x00:
for _ in range(blk.tag_lo):
out[channel].append(cur)
elif blk.tag_hi == 0x30:
# 12-bit signed deltas, packed as NN/4 groups of 6 bytes each:
# bytes [0:2] = 16 bits = 4 × 4-bit high nibbles (MSB first)
# bytes [2:6] = 4 × int8 low bytes
# Each delta = sign_extend_12((high_nibble << 8) | low_byte).
# Confirmed 2026-05-11 against all 14 ``30 NN`` blocks in the
# bundled fixtures.
n_groups = blk.tag_lo // 4
for g in range(n_groups):
grp = blk.data[g * 6 : (g + 1) * 6]
if len(grp) < 6:
break
high_word = (grp[0] << 8) | grp[1]
for k in range(4):
nib = (high_word >> (12 - 4 * k)) & 0xF
v = (nib << 8) | grp[2 + k]
if v >= 0x800:
v -= 0x1000
cur += v
out[channel].append(cur)
# 40 02: should not occur in segment data.
return cur
# Initial Tran segment: deltas from start of body up to first 40 02 (or end).
first_seg = seg_idx[0] if seg_idx else len(blocks)
last_tran_value = apply_blocks("Tran", t1, 0, first_seg)
# Subsequent segments rotate channels. Each segment header carries:
# bytes [0:2] and [2:4] = 2 deltas extending the PREVIOUS channel
# bytes [14:16] and [16:18] = anchor pair for THIS segment's channel
#
# Rotation: V, L, M, T, V, L, M, T, ... (initial Tran segment is the
# implicit T in the cycle.)
rotation = ["Vert", "Long", "MicL", "Tran"]
# Track each channel's "running cumulative value" so we can apply the
# previous-channel extension deltas at every segment boundary.
last_value = {"Tran": last_tran_value, "Vert": None, "Long": None, "MicL": None}
for k, hi in enumerate(seg_idx):
channel = rotation[k % 4]
prev_channel = "Tran" if k == 0 else rotation[(k - 1) % 4]
header = blocks[hi]
if len(header.data) < 18:
continue
# Validate: real segment headers have bytes [12:14] = `02 00`.
# Trailer/footer "40 02" markers contain ASCII serial bytes or other
# non-header data there and would otherwise be mis-interpreted as
# segment headers, adding spurious samples at the tail.
if header.data[12:14] != b"\x02\x00":
break
# Extend the PREVIOUS channel by 2 more samples (deltas in bytes [0:4]).
prev_d0 = int.from_bytes(header.data[0:2], "big", signed=True)
prev_d1 = int.from_bytes(header.data[2:4], "big", signed=True)
if last_value[prev_channel] is not None:
v = last_value[prev_channel] + prev_d0
out[prev_channel].append(v)
v += prev_d1
out[prev_channel].append(v)
last_value[prev_channel] = v
# Anchor pair for THIS segment's channel.
c0 = int.from_bytes(header.data[14:16], "big", signed=True)
c1 = int.from_bytes(header.data[16:18], "big", signed=True)
out[channel].extend([c0, c1])
# Apply delta blocks for this segment.
next_hi = seg_idx[k + 1] if k + 1 < len(seg_idx) else len(blocks)
last_value[channel] = apply_blocks(channel, c1, hi + 1, next_hi)
return out
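The ``30 NN`` packing is the least obvious part of apply_blocks, so a standalone round-trip may help. The sketch below follows the layout in the comment (two bytes of high nibbles, MSB first, then four low bytes). The names unpack_12bit_group and pack_12bit_group are hypothetical, and the packer exists only to check the unpacker, not because the device format is known to be produced this way.

```python
from typing import List, Sequence


def unpack_12bit_group(grp: bytes) -> List[int]:
    """Decode one 6-byte group into four signed 12-bit deltas."""
    if len(grp) != 6:
        raise ValueError("a packed group is exactly 6 bytes")
    high_word = (grp[0] << 8) | grp[1]
    deltas = []
    for k in range(4):
        nib = (high_word >> (12 - 4 * k)) & 0xF   # k-th high nibble, MSB first
        v = (nib << 8) | grp[2 + k]
        if v >= 0x800:                            # sign-extend from 12 bits
            v -= 0x1000
        deltas.append(v)
    return deltas


def pack_12bit_group(deltas: Sequence[int]) -> bytes:
    """Inverse of unpack_12bit_group, for round-trip checks only."""
    if len(deltas) != 4:
        raise ValueError("a group carries exactly 4 deltas")
    high_word = 0
    lows = []
    for k, d in enumerate(deltas):
        if not -2048 <= d <= 2047:
            raise ValueError(f"delta {d} does not fit in 12 signed bits")
        u = d & 0xFFF
        high_word |= (u >> 8) << (12 - 4 * k)
        lows.append(u & 0xFF)
    return bytes([high_word >> 8, high_word & 0xFF, *lows])


group = bytes([0x1F, 0x08, 0x23, 0xFF, 0x00, 0x00])
decoded = unpack_12bit_group(group)
print(decoded, pack_12bit_group(decoded) == group)
```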
# ── ADC-scale conversion helpers ────────────────────────────────────────────
# Scaling factor: decode_waveform_v2 produces geo-channel samples in the BW
# display quantization (16-count units, LSB = 0.005 in/s at Normal range).
# The legacy consumer pipeline (sfm/event_hdf5.py) expects raw_samples in
# 1-count ADC units (× full_scale / 32768 → physical). To plug the new
# decoder in without rewriting consumers, multiply geo values by 16.
#
# Mic samples are already in raw ADC counts (decoded value 1 = 1 mic ADC count
# = -81.94 dB on the BW display). Mic values pass through unchanged.
_GEO_DECODER_TO_ADC = 16
def decoded_to_adc_counts(decoded: dict) -> dict:
"""Convert :func:`decode_waveform_v2` output to int16 ADC counts.
Geo channels are scaled by ×16 (decoder produces 16-count units,
consumer expects 1-count ADC). Mic is passed through as raw counts.
"""
if not decoded:
return {}
return {
"Tran": [v * _GEO_DECODER_TO_ADC for v in decoded.get("Tran", [])],
"Vert": [v * _GEO_DECODER_TO_ADC for v in decoded.get("Vert", [])],
"Long": [v * _GEO_DECODER_TO_ADC for v in decoded.get("Long", [])],
"MicL": list(decoded.get("MicL", [])),
}
def mic_count_to_db(count: int) -> float:
"""Convert a MicL ADC count to dB(L) for BW-display-compatible output.
Empirical formula (confirmed 2026-05-11 against V70 fixture: count=813 →
140.1 dB; count=±1 → ±81.94 dB; count=±24 → ±109.5 dB):
dB = sign(count) × (81.94 + 20 × log10(|count|)) for |count| ≥ 1
dB = 0.0 for count == 0
The constant 81.94 corresponds to 10^(81.94/20) ≈ 12490 mic ADC counts
being the dB(L) reference level, almost certainly a calibration
constant from the device's mic.
"""
if count == 0:
return 0.0
sign = 1.0 if count > 0 else -1.0
return sign * (81.94 + 20.0 * math.log10(abs(count)))
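The three reference points quoted in the docstring can be re-checked numerically. The sketch below restates the formula in a self-contained form and measures how far it lands from the quoted display values. The constant name MIC_DB_AT_ONE_COUNT and the checks table are introduced here for illustration.

```python
import math

MIC_DB_AT_ONE_COUNT = 81.94  # constant taken from the formula above


def mic_count_to_db(count: int) -> float:
    """sign(count) * (81.94 + 20*log10(|count|)), with 0 mapped to 0.0."""
    if count == 0:
        return 0.0
    sign = 1.0 if count > 0 else -1.0
    return sign * (MIC_DB_AT_ONE_COUNT + 20.0 * math.log10(abs(count)))


# Display values quoted in the docstring (rounded to one decimal there).
checks = {813: 140.1, 1: 81.94, -1: -81.94, 24: 109.5, -24: -109.5}
worst = max(abs(mic_count_to_db(c) - db) for c, db in checks.items())

# Doubling the count should add 20*log10(2) dB regardless of level.
step = mic_count_to_db(2) - mic_count_to_db(1)
print(f"worst deviation {worst:.3f} dB, doubling step {step:.3f} dB")
```

The deviations stay within the rounding of the quoted values, which is consistent with the formula but is not an independent verification of the calibration constant.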
# ── A5-frame entry point ────────────────────────────────────────────────────
def decode_a5_frames(a5_frames) -> Optional[dict]:
"""Decode a list of A5 (BULK_WAVEFORM_STREAM) frames into per-channel
int16 ADC samples.
Returns ``{"Tran": [...], "Vert": [...], "Long": [...], "MicL": [...]}``
with each channel's samples in **1-count ADC units** (the legacy
``event.raw_samples`` convention; multiply by ``full_scale / 32768``
to convert to physical units; for mic, use :func:`mic_count_to_db` or
a per-count psi factor).
Returns ``None`` if the frames cannot be parsed.
This is the wired-up production entry point. It:
1. Reconstructs the BW-binary body bytes from the A5 frames
(``blastware_file.extract_body_bytes``).
2. Runs the verified codec (``decode_waveform_v2``) on the body.
3. Converts to int16 ADC counts via :func:`decoded_to_adc_counts`.
"""
# Local import to avoid a cycle: blastware_file imports models and
# ultimately client.py imports waveform_codec.
from .blastware_file import extract_body_bytes
if not a5_frames:
return None
_strt, body, _footer = extract_body_bytes(a5_frames)
if not body:
return None
decoded = decode_waveform_v2(body)
if decoded is None:
return None
return decoded_to_adc_counts(decoded)
+2
@@ -53,7 +53,9 @@ SUB_TABLE: dict[int, tuple[str, str, str]] = {
0x82: ("TRIGGER_CONFIG_WRITE", "BW→S3", "0x1C bytes; trigger config block; mirrors SUB 1C"),
0x83: ("TRIGGER_WRITE_CONFIRM", "BW→S3", "Short frame; commit step after 0x82"),
# S3→BW responses
0x5A: ("BULK_WAVEFORM_STREAM", "BW→S3", "Bulk waveform chunk request; response is A5 stream"),
0xA4: ("POLL_RESPONSE", "S3→BW", "Response to SUB 5B poll"),
0xA5: ("BULK_WAVEFORM_RESPONSE", "S3→BW", "Response to SUB 5A; waveform chunks + metadata"),
0xFE: ("FULL_CONFIG_RESPONSE", "S3→BW", "Response to SUB 01"),
0xF9: ("CHANNEL_CONFIG_RESPONSE", "S3→BW", "Response to SUB 06"),
0xF7: ("EVENT_INDEX_RESPONSE", "S3→BW", "Response to SUB 08; contains backlight/power-save"),
+36 -21
@@ -33,7 +33,7 @@ STX = 0x02
ETX = 0x03
ACK = 0x41
-__version__ = "0.2.2"
+__version__ = "0.2.5"
@dataclass
@@ -184,9 +184,9 @@ def validate_bw_body_auto(body: bytes) -> Optional[Tuple[bytes, bytes, str]]:
def parse_s3(blob: bytes, trailer_len: int) -> List[Frame]:
frames: List[Frame] = []
-IDLE = 0
-IN_FRAME = 1
-AFTER_DLE = 2
+IDLE = 0
+IN_FRAME = 1
+IN_FRAME_DLE = 2 # saw DLE inside frame; waiting for next byte
state = IDLE
body = bytearray()
@@ -206,28 +206,26 @@ def parse_s3(blob: bytes, trailer_len: int) -> List[Frame]:
state = IN_FRAME
i += 2
continue
# ACK bytes, boot strings, garbage — silently ignored
elif state == IN_FRAME:
if b == DLE:
state = AFTER_DLE
state = IN_FRAME_DLE
i += 1
continue
body.append(b)
else: # AFTER_DLE
if b == DLE:
body.append(DLE)
state = IN_FRAME
i += 1
continue
if b == ETX:
# Bare ETX = real S3 frame terminator (confirmed from S3FrameParser)
end_offset = i + 1
trailer_start = i + 1
trailer_end = trailer_start + trailer_len
trailer = blob[trailer_start:trailer_end]
# For S3 mode we don't assume checksum type here yet.
# S3 checksums are deliberately not validated here.
# Large S3 responses (A5 bulk waveform, E5 compliance) embed
# inner DLE+ETX sub-frame terminators whose trailing 0x03 byte
# lands where the parser would expect the SUM8 checksum, causing
# false failures. The live protocol (protocol.py _validate_frame)
# also skips S3 checksum enforcement for the same reason.
frames.append(Frame(
index=idx,
start_offset=start_offset,
@@ -244,13 +242,27 @@ def parse_s3(blob: bytes, trailer_len: int) -> List[Frame]:
state = IDLE
i = trailer_end
continue
body.append(b)
else: # IN_FRAME_DLE
if b == DLE:
# DLE DLE → literal 0x10 in payload
body.append(DLE)
state = IN_FRAME
i += 1
continue
if b == ETX:
# DLE+ETX inside a frame = inner-frame terminator (A4/E5 sub-frames).
# Treat as literal data, NOT the outer frame end.
body.append(DLE)
body.append(ETX)
state = IN_FRAME
i += 1
continue
# Unexpected DLE + byte → treat as literal data
body.append(DLE)
body.append(b)
state = IN_FRAME
i += 1
continue
i += 1
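
The three-state parser in the hunk above can be sketched on its own as follows. This is a simplified illustration, not the project's `parse_s3`: it assumes frames start with DLE+STX, it ignores the trailer bytes and `Frame` bookkeeping, and the function name `split_s3_frames` is hypothetical.

```python
DLE, STX, ETX = 0x10, 0x02, 0x03


def split_s3_frames(blob: bytes) -> list[bytes]:
    """Return the de-stuffed payload of each frame found in *blob*."""
    IDLE, IN_FRAME, IN_FRAME_DLE = 0, 1, 2
    frames, body, state = [], bytearray(), IDLE
    i, n = 0, len(blob)
    while i < n:
        b = blob[i]
        if state == IDLE:
            # Assumed frame start: DLE followed by STX.
            if b == DLE and i + 1 < n and blob[i + 1] == STX:
                body = bytearray()
                state = IN_FRAME
                i += 2
                continue
            # Anything else outside a frame is ignored.
        elif state == IN_FRAME:
            if b == DLE:
                state = IN_FRAME_DLE
            elif b == ETX:
                frames.append(bytes(body))  # bare ETX ends the frame
                state = IDLE
            else:
                body.append(b)
        else:  # IN_FRAME_DLE
            if b == DLE:
                body.append(DLE)            # DLE DLE -> literal 0x10
            else:
                body += bytes((DLE, b))     # DLE ETX / DLE x -> kept as data
            state = IN_FRAME
        i += 1
    return frames


demo = bytes([0x41, DLE, STX, 0x01, DLE, DLE, 0x02, DLE, ETX, 0x05, ETX, 0xFF])
payloads = split_s3_frames(demo)
```

The point of the third state is that a DLE+ETX pair inside a frame stays in the payload, so only a bare ETX closes the outer frame.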
@@ -298,10 +310,13 @@ def parse_bw(blob: bytes, trailer_len: int, validate_checksum: bool) -> List[Fra
if b == ETX:
# Candidate end-of-frame.
# Accept ETX if the next bytes look like a real next-frame start (ACK+STX),
# or we're at EOF. This prevents chopping on in-payload 0x03.
next_is_start = (i + 2 < n and blob[i + 1] == ACK and blob[i + 2] == STX)
at_eof = (i == n - 1)
# Skip any SESSION_RESET (41 03) sequences — sent before POLL to wake
# monitoring units — to find the real next frame start (ACK+STX).
j = i + 1
while j + 1 < n and blob[j] == ACK and blob[j + 1] == ETX:
j += 2
next_is_start = (j + 1 < n and blob[j] == ACK and blob[j + 1] == STX)
at_eof = (i == n - 1) or (j >= n)
if not (next_is_start or at_eof):
# Not a real boundary -> payload byte
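
The boundary test in the hunk above can be exercised in isolation. The sketch below restates the same index arithmetic in a standalone function; the name `is_frame_boundary` is hypothetical and the byte strings are made-up examples, not captured traffic.

```python
ACK, STX, ETX = 0x41, 0x02, 0x03


def is_frame_boundary(blob: bytes, i: int) -> bool:
    """True if the ETX at blob[i] looks like a real end-of-frame."""
    n = len(blob)
    j = i + 1
    # Skip any number of SESSION_RESET (41 03) pairs after the ETX.
    while j + 1 < n and blob[j] == ACK and blob[j + 1] == ETX:
        j += 2
    next_is_start = j + 1 < n and blob[j] == ACK and blob[j + 1] == STX
    at_eof = (i == n - 1) or (j >= n)
    return next_is_start or at_eof


with_resets = bytes([0xAA, ETX, ACK, ETX, ACK, ETX, ACK, STX, 0x01])
in_payload = bytes([0xAA, ETX, 0x55, 0x66])
trailing_resets = bytes([0xAA, ETX, ACK, ETX])
```

An ETX followed by resets and then ACK+STX counts as a boundary, as does an ETX followed only by resets up to end of data; an ETX followed by ordinary payload bytes does not.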
+24
@@ -0,0 +1,24 @@
[build-system]
requires = ["setuptools>=68", "wheel"]
build-backend = "setuptools.build_meta"
[project]
name = "seismo-relay"
version = "0.21.1"
description = "Python client and REST server for MiniMate Plus seismographs"
requires-python = ">=3.10"
dependencies = [
"fastapi>=0.104",
"uvicorn[standard]>=0.24",
"pyserial>=3.5",
"sqlalchemy>=2.0",
"python-multipart>=0.0.7",
"h5py>=3.10",
"numpy>=1.24",
"matplotlib>=3.8",
]
[tool.setuptools.packages.find]
# Auto-discovers minimateplus/, micromate/, sfm/, bridges/ as packages
where = ["."]
include = ["minimateplus*", "micromate*", "sfm*", "bridges*"]
+8
@@ -0,0 +1,8 @@
fastapi
uvicorn
sqlalchemy
pyserial
python-multipart
h5py
numpy
matplotlib
+360
@@ -0,0 +1,360 @@
"""
scratch/next_experiment_skeleton.py: segment-channel scoring analyzer.

This is the suggested NEXT EXPERIMENT for cracking the waveform body codec.
The goal is to figure out what segments 1+ contain, since segment 0 = Tran
is solved but multi-segment continuation diverges from truth at sample ~512.

The hypothesis to test

Segments rotate through channels:

    segment 0 → Tran samples 0..509
    segment 1 → Vert samples 0..507
    segment 2 → Long samples 0..507
    segment 3 → Mic samples 0..507
    segment 4 → Tran samples 510..N (continuation)
    ...

This would explain why segment 0 works perfectly (it's pure Tran) and why
applying segment 1's blocks as Tran continuation gives wrong values
(it's actually Vert).

What the analyzer should do

For each segment in each fixture event:

  1. Run the segment-0 block-walker + RLE decode (the same algorithm that
     ``decode_tran_initial`` uses) over the segment's blocks.  Start from
     some anchor value and produce a cumulative trajectory of length =
     number-of-deltas-in-segment.

  2. For each candidate channel C in {Tran, Vert, Long, MicL}:
       For each candidate anchor location in the segment-header payload
       (try [0:2], [2:4], [4:6], [14:16], [16:18] as int16 BE):
         Compare the decoded trajectory against truth[C] starting from
         the segment's first sample index.
         Score = number of matches (or sum of squared errors).

  3. Report the best (channel, anchor-location) combination per segment.

If the rotation hypothesis is correct, you'll see:

    segment 0 → best score for (Tran, preamble bytes [3:5]); already known
    segment 1 → best score for (Vert, <some-header-byte>)
    segment 2 → best score for (Long, <some-header-byte>)
    segment 3 → best score for (MicL, <some-header-byte>)
    segment 4 → best score for (Tran, continuing from segment 0's end)

If the rotation hypothesis is NOT correct, the scorer will at least narrow
down what segment 1 actually carries.  Maybe channels interleave at finer
granularity, or maybe segments alternate by something other than channel.

Why this is a scoring analyzer, not a hand-written decoder

Direct hand-coding ("assume segment 1 is Vert with anchor at byte X") gets
stuck when the assumption is wrong, because the failure mode is silent:
you get plausible-looking-but-wrong samples and have to manually diff
against truth to debug.

The scorer is brute-force but cheap: every fixture event × every segment ×
4 channels × 5 anchor-byte candidates is only ~hundreds of comparisons.
The winning combination jumps out by score.

Skeleton
"""
from __future__ import annotations
import os
import re
import sys
from dataclasses import dataclass
from typing import List, Optional, Tuple
sys.path.insert(0, os.path.join(os.path.dirname(__file__), ".."))
from minimateplus.waveform_codec import walk_body, find_data_start, WaveformBlock
# ── Reusable pieces ──────────────────────────────────────────────────────────
CHANNELS = ("Tran", "Vert", "Long", "MicL")
LSB_INV = 200 # 1 in/s / 0.005 in/s/LSB; multiply BW-export floats by this
# to get 16-count units (the body's native quantization).
@dataclass
class FixtureEvent:
    name: str                         # e.g. "M529LL1A.SP0"
    bin_path: str
    txt_path: str
    body: bytes
    truth: dict                       # {channel: list of int16-quantized samples}
    blocks: List[WaveformBlock]
    segment_starts: List[int]         # block indices of each 40 02 segment header
    segment_sample_starts: List[int]  # for each segment, the truth sample index it starts at


def s4(n: int) -> int:
    """4-bit signed nibble decode."""
    return n if n < 8 else n - 16


def i8(b: int) -> int:
    """int8 reinterpret of unsigned byte."""
    return b if b < 128 else b - 256
def load_fixture(name: str) -> FixtureEvent:
    """Load a fixture event with its truth values and parsed block stream."""
    # Find the fixture (search both subdirs of tests/fixtures/).
    base = os.path.join(os.path.dirname(__file__), "..", "tests", "fixtures")
    candidates = [
        os.path.join(base, "5-11-26", name),
        os.path.join(base, "decode-re-5-8-26", "event-a", name),  # not used directly
    ]
    bin_path = next((c for c in candidates if os.path.exists(c)), None)
    if bin_path is None:
        # Try a glob walk for the 5-8 fixtures (they're in subdirs).
        for root, _, files in os.walk(base):
            if name in files:
                bin_path = os.path.join(root, name)
                break
    if bin_path is None:
        raise FileNotFoundError(name)
    txt_path = bin_path + ".TXT"
    with open(bin_path, "rb") as f:
        raw = f.read()
    body = raw[43:-26]
    truth = _parse_txt(txt_path)
    blocks = walk_body(body, find_data_start(body))
    seg_idx = [i for i, b in enumerate(blocks) if b.tag_hi == 0x40]
    # Segment 0 starts at sample 0; subsequent segments start at the
    # cumulative sample count from previous segment(s).  Tran's segment 0
    # is N samples; if rotation hypothesis is correct, segment 1's data
    # starts at sample 0 for a *different* channel.  The analyzer should
    # try both "continues from previous segment" and "starts at sample 0
    # of a different channel."
    seg_sample_starts = _compute_segment_sample_starts(blocks, seg_idx)
    return FixtureEvent(
        name=name, bin_path=bin_path, txt_path=txt_path,
        body=body, truth=truth, blocks=blocks,
        segment_starts=seg_idx, segment_sample_starts=seg_sample_starts,
    )
def _parse_txt(path: str) -> dict:
    """Parse BW ASCII TXT export into {channel: [int_samples_in_16_count_units]}."""
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        lines = f.read().splitlines()
    header_idx = next(
        (i for i, l in enumerate(lines)
         if all(c in l for c in CHANNELS)),
        None,
    )
    if header_idx is None:
        return {ch: [] for ch in CHANNELS}
    out = {ch: [] for ch in CHANNELS}
    for line in lines[header_idx + 1:]:
        parts = re.split(r"\s+", line.strip())
        if len(parts) < 4:
            continue
        try:
            vals = [float(p) for p in parts[:4]]
        except ValueError:
            continue
        for ch, v in zip(CHANNELS, vals):
            # Multiply by LSB_INV; geo channels are in in/s, MicL is in dB(L)
            # (which doesn't quantize the same way; leaving raw for MicL is fine,
            # the scorer should treat MicL specially).
            out[ch].append(round(v * LSB_INV) if ch != "MicL" else v)
    return out
def _compute_segment_sample_starts(
    blocks: List[WaveformBlock], seg_idx: List[int]
) -> List[int]:
    """Cumulative sample-count up to each segment header (if all blocks treated
    as Tran continuation).  Useful as one candidate for segment-1-Tran tests.

    The scorer should ALSO try "segment 1 starts at sample 0 of a new channel"
    as the rotation hypothesis predicts.
    """
    starts = []
    cum = 2  # T[0] + T[1] from preamble
    for i, b in enumerate(blocks):
        if i in seg_idx:
            starts.append(cum)
        if b.tag_hi == 0x10:
            cum += b.tag_lo
        elif b.tag_hi == 0x20:
            cum += b.tag_lo
        elif b.tag_hi == 0x00:
            cum += b.tag_lo
        # 30 NN and 40 02 don't contribute samples (for this hypothesis)
    return starts
# ── The core algorithm: decode a segment's blocks as deltas ─────────────────
def decode_segment_as_channel(
    blocks: List[WaveformBlock],
    seg_start_block_idx: int,
    seg_end_block_idx: int,
    anchor: int,
) -> List[int]:
    """Apply the segment-0 codec rules to a range of blocks, starting from *anchor*.

    Returns a list of cumulative sample values (one per delta).  Does NOT include
    the anchor itself in the output: the first returned value is anchor + first_delta.
    """
    out = []
    cur = anchor
    for bi in range(seg_start_block_idx, seg_end_block_idx):
        blk = blocks[bi]
        if blk.tag_hi == 0x10:
            # Two signed 4-bit deltas per byte, high nibble first.
            for byte in blk.data:
                for nib in ((byte >> 4) & 0xF, byte & 0xF):
                    cur += s4(nib)
                    out.append(cur)
        elif blk.tag_hi == 0x20:
            # One signed 8-bit delta per byte.
            for byte in blk.data:
                cur += i8(byte)
                out.append(cur)
        elif blk.tag_hi == 0x00:
            # Run of tag_lo repeated samples (zero delta).
            for _ in range(blk.tag_lo):
                out.append(cur)
        # 30 NN: skip (content unknown)
        # 40 02: shouldn't appear in segment data (it's the segment header)
    return out
def score_against_truth(
    decoded: List[int],
    truth: List[int],
    truth_start: int,
) -> Tuple[int, int]:
    """Compare *decoded* to truth[truth_start : truth_start + len(decoded)].

    Returns (n_matches, n_compared).
    """
    n = min(len(decoded), len(truth) - truth_start)
    if n <= 0:
        return (0, 0)
    matches = sum(1 for i in range(n) if decoded[i] == truth[truth_start + i])
    return (matches, n)
# ── TODO for the next pass ──────────────────────────────────────────────────
def score_segment_against_all_channels(
    event: FixtureEvent,
    segment_index: int,
) -> List[Tuple[str, int, int, int]]:
    """For segment *segment_index* of *event*, find the best (channel, start_sample)
    fit.

    For each candidate channel C and each candidate starting truth-sample index s,
    we anchor the decoded trajectory at truth[C][s], then score the decoded
    values against truth[C][s+1 : s+N+1].

    Returns rows of (channel_name, start_sample, n_matches, n_compared)
    sorted by match-count descending.
    """
    # Block range of this segment: from the segment header (inclusive) up to
    # the next segment header (exclusive), or end-of-blocks.
    seg_header_idx = event.segment_starts[segment_index]
    next_header_idx = (
        event.segment_starts[segment_index + 1]
        if segment_index + 1 < len(event.segment_starts)
        else len(event.blocks)
    )
    # Decode the segment's data blocks (skip the segment-header block itself).
    # Use anchor=0; we re-anchor when scoring against each channel.
    deltas_trajectory = decode_segment_as_channel(
        event.blocks, seg_header_idx + 1, next_header_idx, anchor=0
    )
    if not deltas_trajectory:
        return []
    n = len(deltas_trajectory)
    results = []
    # MicL truth is in dB(L), not counts, so it is not scored here.
    for ch in ("Tran", "Vert", "Long"):
        truth = event.truth.get(ch)
        if not truth or len(truth) < n + 1:
            continue
        # For each candidate starting sample s in truth, check if applying
        # the deltas starting from truth[s] reproduces truth[s+1:s+n+1].
        best = (0, -1)
        for s in range(len(truth) - n):
            anchor = truth[s]
            # deltas_trajectory was computed from anchor=0, so the
            # re-anchored value at step i is anchor + deltas_trajectory[i].
            matches = 0
            for i in range(n):
                if truth[s + i + 1] == anchor + deltas_trajectory[i]:
                    matches += 1
            # Note: we could break early on first mismatch for "matches start",
            # but counting total matches gives a more robust score.
            if matches > best[0]:
                best = (matches, s)
        results.append((ch, best[1], best[0], n))
    results.sort(key=lambda r: -r[2])
    return results
# ── Driver ──────────────────────────────────────────────────────────────────
def main():
    """Run the analyzer on all loud-bundle events and print best scores."""
    events = ["M529LL1A.SP0", "M529LL1A.SS0", "M529LL1A.SV0",
              "M529LL1L.JQ0", "M529LL1L.V70"]
    for name in events:
        try:
            event = load_fixture(name)
        except FileNotFoundError:
            print(f"{name}: fixture not found")
            continue
        print(f"\n=== {name} ===")
        print(f" body bytes: {len(event.body)}")
        print(f" blocks: {len(event.blocks)}")
        print(f" segments: {len(event.segment_starts)}")
        print(" segment sample-starts (if all blocks are 1 channel):")
        for si, sample_start in enumerate(event.segment_sample_starts):
            print(f" seg {si}: sample {sample_start}")
        for si in range(len(event.segment_starts)):
            results = score_segment_against_all_channels(event, si)
            if not results:
                print(f" seg {si}: (no scorable data)")
                continue
            tag = "" if results[0][2] / max(results[0][3], 1) > 0.9 else " "
            top = results[0]
            print(f" seg {si}: best fit {tag} = {top[0]:<5} "
                  f"starting at sample {top[1]:>5}, {top[2]:>4}/{top[3]:<4} match"
                  + (f" (next: {results[1][0]} @{results[1][1]} {results[1][2]}/{results[1][3]})"
                     if len(results) > 1 else ""))


if __name__ == "__main__":
    main()
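
The scoring idea in this skeleton can be tried on synthetic data without any fixture files. The sketch below is a toy illustration under stated assumptions: `nibble_deltas` and `best_alignment` are hypothetical names, the input bytes and truth list are invented, and only the packed 4-bit delta rule is covered.

```python
def s4(n):
    """4-bit signed nibble decode."""
    return n if n < 8 else n - 16


def nibble_deltas(data: bytes) -> list[int]:
    """Cumulative trajectory (anchor 0) from packed signed 4-bit deltas, high nibble first."""
    cur, out = 0, []
    for byte in data:
        for nib in (byte >> 4, byte & 0xF):
            cur += s4(nib)
            out.append(cur)
    return out


def best_alignment(traj, truth):
    """Return (start_index, n_matches) of the best re-anchored fit of traj in truth."""
    n = len(traj)
    best = (-1, 0)
    for s in range(len(truth) - n):
        anchor = truth[s]
        matches = sum(1 for i in range(n) if truth[s + i + 1] == anchor + traj[i])
        if matches > best[1]:
            best = (s, matches)
    return best


traj = nibble_deltas(bytes([0x1F, 0x20]))   # deltas +1, -1, +2, 0
truth = [7, 7, 10, 11, 10, 12, 12, 3]       # invented samples
fit = best_alignment(traj, truth)
```

Because the trajectory is decoded once from anchor 0 and re-anchored per candidate start, each candidate costs only one pass over the deltas.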
+150
@@ -0,0 +1,150 @@
"""
scripts/backfill_record_type.py: fix `record_type` on legacy event
rows whose value was hardcoded to "Waveform" regardless of actual type.

Why this is needed

Pre-v0.16.1 the BW file importer (`event_file_io.read_blastware_file`)
hardcoded `ev.record_type = "Waveform"` for every imported event.  Fixed
in commit aac1c8e: new ingests now derive the type from the last character
of the Blastware filename's extension (H=Histogram, W=Waveform, M=Manual,
E=Event, C=Combo) per the V10.72+ MiniMate Plus AB0T filename scheme.

Effect on a server that imported events under the old code: every
events row has `record_type = "Waveform"`, even for histograms,
manuals, etc.  Visible in terra-view's event-detail modal under the
"Record Type" field.  Terra-view also has a client-side workaround
that derives the type from the filename for display purposes, so
operators see the correct type in the UI even before this backfill.
This script makes the DB column match what the UI is already showing,
which matters for reporting and any downstream consumer that reads
events.record_type directly.

This script

Walks the `events` table and updates each row's `record_type` to the
value derived from its `blastware_filename`.  Old S338 firmware files
(3-char extensions ending in `0`) and any unrecognized suffix resolve
to the --default value ("Waveform"), which is what legacy rows already
hold, so those rows are left unchanged.

Idempotent: re-running after a successful backfill finds zero rows
needing updates and exits cleanly (it always re-derives but only
writes when the value would change).

Usage

    # Dry-run (default): print what would change, don't touch the DB
    python -m scripts.backfill_record_type --db bridges/captures/seismo_relay.db

    # Apply the backfill
    python -m scripts.backfill_record_type --db bridges/captures/seismo_relay.db --apply
"""
from __future__ import annotations
import argparse
import sqlite3
import sys
from collections import Counter
from pathlib import Path
# Must stay in sync with minimateplus.event_file_io._RECORD_TYPE_BY_EXT_SUFFIX.
_TYPE_FROM_SUFFIX = {
"H": "Histogram",
"W": "Waveform",
"M": "Manual",
"E": "Event",
"C": "Combo",
}
def derive_record_type(filename: str | None, default: str = "Waveform") -> str:
    """Mirror of minimateplus.event_file_io.derive_record_type_from_filename.

    Vendored here so this script runs without needing the seismo-relay
    package on the Python path (useful on prod where you might be
    running it via `docker exec` against a container's DB volume).
    """
    if not filename:
        return default
    name = Path(filename).name
    if "." not in name:
        return default
    ext = name.rsplit(".", 1)[1]
    if not ext:
        return default
    return _TYPE_FROM_SUFFIX.get(ext[-1].upper(), default)
def main() -> int:
    ap = argparse.ArgumentParser(description=__doc__)
    ap.add_argument("--db", required=True, help="Path to seismo_relay.db")
    ap.add_argument("--apply", action="store_true",
                    help="Actually write changes (default is dry-run).")
    ap.add_argument("--default", default="Waveform",
                    help="Fallback record_type when filename doesn't encode one. "
                         "Default: Waveform (matches the pre-fix bug's behavior).")
    args = ap.parse_args()

    db_path = Path(args.db)
    if not db_path.exists():
        print(f"ERROR: database not found at {db_path}", file=sys.stderr)
        return 1

    conn = sqlite3.connect(str(db_path))
    conn.row_factory = sqlite3.Row
    cur = conn.cursor()
    cur.execute("""
        SELECT id, blastware_filename, record_type
        FROM events
        WHERE blastware_filename IS NOT NULL
          AND blastware_filename != ''
    """)
    rows = cur.fetchall()
    total = len(rows)
    print(f"Scanning {total:,} event rows…")
    print()

    # Tally proposed changes.  update_ids holds (row id, derived type) pairs.
    transitions: Counter[tuple[str, str]] = Counter()
    update_ids: list[tuple[str, str]] = []
    for row in rows:
        derived = derive_record_type(row["blastware_filename"], default=args.default)
        current = row["record_type"] or ""
        if derived == current:
            continue
        transitions[(current, derived)] += 1
        update_ids.append((row["id"], derived))

    if not update_ids:
        print("Nothing to update; all rows already match.")
        conn.close()
        return 0

    print(f"{len(update_ids):,} row(s) need updating:")
    for (old, new), count in sorted(transitions.items(), key=lambda x: -x[1]):
        print(f" {count:>6,} {old!r:14s} -> {new!r}")
    print()

    if not args.apply:
        print("(dry-run; re-run with --apply to write changes)")
        conn.close()
        return 0

    print("Applying changes…")
    cur.executemany(
        "UPDATE events SET record_type = ? WHERE id = ?",
        [(new, eid) for eid, new in update_ids],
    )
    conn.commit()
    print(f"Done. Updated {cur.rowcount:,} row(s).")
    conn.close()
    return 0


if __name__ == "__main__":
    sys.exit(main())
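
The idempotency claim in the docstring can be checked against a throwaway in-memory database. The sketch below is an illustration only: the three-column `events` table, the filenames, and the helper names `derive` and `pending_updates` are assumptions, and the real table has more columns.

```python
import sqlite3

SUFFIX_TYPES = {"H": "Histogram", "W": "Waveform", "M": "Manual",
                "E": "Event", "C": "Combo"}


def derive(filename, default="Waveform"):
    """Record type from the last character of the filename extension."""
    ext = filename.rsplit(".", 1)[1] if "." in filename else ""
    return SUFFIX_TYPES.get(ext[-1].upper(), default) if ext else default


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, "
    "blastware_filename TEXT, record_type TEXT)"
)
# Hypothetical filenames: one histogram, one waveform, one old-firmware file.
conn.executemany(
    "INSERT INTO events (blastware_filename, record_type) VALUES (?, ?)",
    [("A.AB0H", "Waveform"), ("B.AB0W", "Waveform"), ("C.SP0", "Waveform")],
)


def pending_updates(conn):
    """(derived_type, id) pairs for rows whose stored type would change."""
    rows = conn.execute(
        "SELECT id, blastware_filename, record_type FROM events"
    ).fetchall()
    return [(derive(fn), eid) for eid, fn, cur in rows if derive(fn) != (cur or "")]


first = pending_updates(conn)
conn.executemany("UPDATE events SET record_type = ? WHERE id = ?", first)
second = pending_updates(conn)
```

The first pass finds only the histogram row; the second pass finds nothing, which is the behavior the dry-run and `--apply` modes rely on.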
+466
@@ -0,0 +1,466 @@
"""
scripts/backfill_sidecars.py: generate .sfm.json sidecars AND .h5
clean-waveform files for existing events already in the waveform store
that predate those features.

Walks `<store_root>/<serial>/<filename>` and for each BW event file:

  Sidecar (.sfm.json):
    - Skip when an existing sidecar's blastware.sha256 matches the
      current BW file's sha256 and its source.tool_version is at least
      the current TOOL_VERSION.
    - Else regenerate: prefer .a5.pkl (full fidelity); fall back to
      parsing the BW binary directly (peaks computed from samples).

  Clean waveform (.h5):
    - Regenerated whenever the sidecar is regenerated (sha mismatch
      OR sidecar.source.tool_version < current TOOL_VERSION OR --force).
      The .h5 and the sidecar both come from the same decoder output,
      so if the sidecar is stale the .h5 is too.
    - Written when missing.
    - --skip-hdf5 turns off all .h5 writes.

Typical use after a decoder upgrade:

  1. Pull the new seismo-relay code (which bumped TOOL_VERSION).
  2. Run this script.  Every sidecar with an older tool_version
     stamp regenerates, and the associated .h5 cascade-regenerates.
  3. Operator review state (review.false_trigger, notes, reviewer)
     and the sidecar's extensions block are preserved across the
     regen.

Usage:
    python scripts/backfill_sidecars.py [--store-root PATH]
                                        [--db-path PATH]
                                        [--dry-run]
                                        [--skip-hdf5]
                                        [--force]
                                        [--reparse-txt]
                                        [-v]
"""
from __future__ import annotations
import argparse
import logging
import sys
from pathlib import Path
# Allow running from the repo root without installation.
sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
from minimateplus import event_file_io
from sfm import event_hdf5
from sfm.waveform_store import WaveformStore, _frame_to_dict, _dict_to_frame # noqa: F401
from sfm.database import SeismoDb
log = logging.getLogger("backfill_sidecars")
def _looks_like_event_file(path: Path) -> bool:
    """Same heuristic as the importer CLI.

    Filters to BW (Series III) event files only.  Thor (Series IV)
    `.IDFW` / `.IDFH` files share the store but have their own ingest
    path (`WaveformStore.save_imported_idf`) and are NOT decodable by
    `event_file_io.read_blastware_file`.  Their sidecars are populated
    at ingest from the paired `.IDFW.txt` ASCII report; nothing the
    backfill regenerates would improve on them, so we exclude them
    from scope.
    """
    if not path.is_file():
        return False
    if path.name.endswith((".a5.pkl", ".sfm.json", ".h5")):
        return False
    ext = path.suffix.lstrip(".")
    if not (3 <= len(ext) <= 4):
        return False
    # Thor IDF files share the .{W,H}-suffix shape but aren't BW.
    if ext.upper() in ("IDFW", "IDFH"):
        return False
    if not (ext[-1].upper() in {"W", "H"} or ext.endswith("0")):
        return False
    try:
        return path.stat().st_size >= 70
    except OSError:
        return False
def main(argv=None) -> int:
p = argparse.ArgumentParser(description=__doc__)
p.add_argument(
"--db-path",
default=str(Path(__file__).resolve().parent.parent / "bridges" / "captures" / "seismo_relay.db"),
)
p.add_argument("--store-root", default=None)
p.add_argument("--dry-run", action="store_true")
p.add_argument(
"--skip-hdf5", action="store_true",
help="Don't generate .h5 clean-waveform files (only sidecars).",
)
p.add_argument(
"--force", action="store_true",
help=(
"Regenerate sidecars + .h5 even when an existing sidecar's "
"blastware.sha256 matches the current BW file. Use this after "
"upgrading seismo-relay to pull in decoder bug fixes (e.g. the "
"STRT-rectime byte-offset fix in v0.15.x)."
),
)
p.add_argument(
"--reparse-txt", action="store_true",
help=(
"Re-parse the preserved <serial>/<filename>_ASCII.TXT with the "
"current bw_ascii_report parser and overwrite the sidecar's "
"bw_report block. Use this after upgrading the ASCII parser to "
"pull in new fields (e.g. zc_freq_above_range for BW '>100 Hz' "
"ZC peaks). No-op for events without a preserved .TXT; safely "
"idempotent when the parser hasn't changed."
),
)
p.add_argument("-v", "--verbose", action="store_true")
args = p.parse_args(argv)
logging.basicConfig(
level=logging.DEBUG if args.verbose else logging.INFO,
format="%(asctime)s %(levelname)-7s %(name)s %(message)s",
datefmt="%H:%M:%S",
)
db_path = Path(args.db_path).expanduser().resolve()
store_root = (
Path(args.store_root).expanduser().resolve()
if args.store_root else db_path.parent / "waveforms"
)
if not store_root.exists():
print(f"error: store root does not exist: {store_root}", file=sys.stderr)
return 2
store = WaveformStore(store_root)
db = SeismoDb(db_path)
written = skipped = errors = 0
for serial_dir in sorted(p for p in store_root.iterdir() if p.is_dir()):
serial = serial_dir.name
for path in sorted(serial_dir.iterdir()):
if not _looks_like_event_file(path):
continue
sidecar_path = store.sidecar_path_for(serial, path.name)
try:
bw_sha = event_file_io.file_sha256(path)
except Exception as exc:
log.error("sha256 failed for %s: %s", path, exc)
errors += 1
continue
# Skip when an up-to-date sidecar already exists.
#
# Two-part freshness check:
# 1. blastware.sha256 must match the current BW file (proves
# the sidecar describes THIS file).
# 2. source.tool_version must be ≥ current TOOL_VERSION (proves
# the sidecar was written by a build that includes any
# decoder fixes shipped since).
# Either part failing → regenerate. --force bypasses both.
#
# Tracks whether we're regenerating the sidecar this iteration
# so the .h5 logic below knows to refresh that too — staleness
# of the sidecar implies staleness of the derived .h5 (both
# come out of the same decoder).
sidecar_stale = True
if sidecar_path.exists() and not args.force and not args.reparse_txt:
try:
existing = event_file_io.read_sidecar(sidecar_path)
sha_ok = existing.get("blastware", {}).get("sha256") == bw_sha
src_ver = existing.get("source", {}).get("tool_version", "")
def _vt(s):
try:
return tuple(int(p) for p in str(s).split(".")[:3])
except Exception:
return (0, 0, 0)
ver_ok = _vt(src_ver) >= _vt(event_file_io.TOOL_VERSION)
if sha_ok and ver_ok:
skipped += 1
sidecar_stale = False
continue
if sha_ok and not ver_ok:
log.info(
"regenerating %s (sidecar tool_version=%s < current %s)",
sidecar_path.name, src_ver or "(none)",
event_file_io.TOOL_VERSION,
)
except Exception:
pass # fall through to rewrite
# Decide path: A5-based (high-fidelity) or BW-only.
a5_path = serial_dir / f"{path.name}.a5.pkl"
try:
if a5_path.exists():
frames = store.load_a5(serial, path.name)
if not frames:
raise RuntimeError("a5_pickle present but unreadable")
# Build an Event by replaying the A5 decoders. Note:
# the .a5.pkl alone CANNOT recover timestamp /
# record_type / waveform_key / per-channel peaks —
# those live in the 0C record, which isn't saved
# separately. We seed those from the DB row + the
# existing sidecar below so a re-backfill doesn't
# nuke fields the original save populated.
from minimateplus.client import (
_decode_a5_metadata_into,
_decode_a5_waveform,
)
from minimateplus.models import Event, PeakValues, ProjectInfo, Timestamp
ev = Event(index=-1)
_decode_a5_metadata_into(frames, ev)
_decode_a5_waveform(frames, ev)
source_kind = "sfm-live"
a5_filename = a5_path.name
else:
ev = event_file_io.read_blastware_file(path)
source_kind = "bw-import"
a5_filename = None
from minimateplus.models import Event, PeakValues, ProjectInfo, Timestamp
# ── Seed missing fields from the SeismoDb events row ──
# The DB row was populated at original save time with peaks,
# project info, timestamp, record_type, sample_rate, etc.
# All of those survive intact in SQLite; pull them onto the
# rebuilt Event so the regenerated sidecar matches what was
# there before the backfill ran.
db_row = None
try:
import sqlite3 as _sql
with _sql.connect(str(db.db_path)) as _conn:
_conn.row_factory = _sql.Row
db_row = _conn.execute(
"SELECT * FROM events "
"WHERE serial=? AND blastware_filename=? "
"LIMIT 1",
(serial, path.name),
).fetchone()
except Exception as exc:
log.debug("DB lookup failed for %s: %s", path.name, exc)
if db_row is not None:
if ev.sample_rate is None and db_row["sample_rate"]:
ev.sample_rate = int(db_row["sample_rate"])
if not ev.record_type and db_row["record_type"]:
ev.record_type = db_row["record_type"]
if ev._waveform_key is None and db_row["waveform_key"]:
try:
ev._waveform_key = bytes.fromhex(db_row["waveform_key"])
except Exception:
pass
# Timestamp from the ISO-8601 string in the DB row.
if ev.timestamp is None and db_row["timestamp"]:
try:
import datetime as _dt
_t = _dt.datetime.fromisoformat(db_row["timestamp"])
ev.timestamp = Timestamp(
raw=b"", flag=0x10,
year=_t.year, unknown_byte=0,
month=_t.month, day=_t.day,
hour=_t.hour, minute=_t.minute, second=_t.second,
)
except Exception:
pass
# Peaks from the DB row when the A5 decode didn't supply them.
if ev.peak_values is None:
ev.peak_values = PeakValues(
tran=db_row["tran_ppv"],
vert=db_row["vert_ppv"],
long=db_row["long_ppv"],
peak_vector_sum=db_row["peak_vector_sum"],
micl=db_row["mic_ppv"],
)
# Project info from the DB row when the A5 metadata-page
# decode didn't pick it up.
if ev.project_info is None or all(
v in (None, "")
for v in (
(ev.project_info.project if ev.project_info else None),
(ev.project_info.client if ev.project_info else None),
(ev.project_info.operator if ev.project_info else None),
(ev.project_info.sensor_location if ev.project_info else None),
)
):
ev.project_info = ProjectInfo(
project=db_row["project"],
client=db_row["client"],
operator=db_row["operator"],
sensor_location=db_row["sensor_location"],
)
# Derive total_samples when we have both rectime + sample_rate.
# The decoder's STRT-derived value can be a buffer offset
# rather than a sample count — drop it in that case.
if ev.sample_rate and ev.rectime_seconds:
derived = int(round(ev.sample_rate * ev.rectime_seconds))
if (ev.total_samples is None
or ev.total_samples > derived * 2
or ev.total_samples < derived // 4):
ev.total_samples = derived
# Preserve user-edited review state + extensions + the
# bw_report block from the existing sidecar so a backfill
# never wipes them out. The bw_report block originates
# from the paired .TXT ASCII report parsed at ORIGINAL
# import time (ach forward / direct upload); the .TXT
# file is not in the waveform store, so we can't re-derive
# it from disk. event_to_sidecar_dict takes a
# BwAsciiReport dataclass (not a dict), so for bw_report
# we overlay the existing block after regen instead of
# passing it as a kwarg.
preserved_review = None
preserved_ext = None
preserved_bw_report = None
preserved_txt_fn = None
if sidecar_path.exists():
try:
_existing = event_file_io.read_sidecar(sidecar_path)
preserved_review = _existing.get("review")
preserved_ext = _existing.get("extensions")
preserved_bw_report = _existing.get("bw_report")
# Preserve txt_filename so backfills don't blank out the
# pointer to the saved raw .TXT (events ingested after
# 2026-05-27 have this).
preserved_txt_fn = (_existing.get("source") or {}).get("txt_filename")
except Exception:
pass
# --reparse-txt: if a .TXT is preserved on disk, run the
# current parser against it and overwrite the bw_report
# block. Picks up post-ingest parser fixes (e.g. the
# 2026-05-28 zc_freq_above_range / ">100 Hz" addition).
if args.reparse_txt and preserved_txt_fn:
try:
from minimateplus import bw_ascii_report
txt_path = store.txt_path_for(serial, path.name)
if txt_path.exists():
refreshed = bw_ascii_report.parse_report_file(txt_path)
preserved_bw_report = event_file_io._bw_report_to_dict(refreshed)
log.debug("reparsed bw_report from %s", txt_path.name)
else:
log.debug("--reparse-txt: no .TXT at %s (sidecar says %r)",
txt_path, preserved_txt_fn)
except Exception as exc:
log.warning("--reparse-txt failed for %s: %s", path.name, exc)
# Overlay BW ASCII report fields onto the rebuilt Event
# BEFORE the sidecar + DB write. Mirrors what the ingest
# path does — BW's reported peaks (and sample_rate /
# record_time) win over codec output where present.
#
# Without this step, --force backfill silently overwrites
# the bw_report-overlaid DB columns with codec-derived
# values, which is wrong for events the codec doesn't
# fully decode (e.g. waveform walker edge cases on
# SP0/SS0/SV0-style events, or histogram sub-formats with
# byte[5]!=0 that aren't yet RE'd). Net effect was PVS=0
# on three top-10 events on 2026-05-22.
if preserved_bw_report:
event_file_io.apply_bw_report_dict_to_event(
ev, preserved_bw_report,
)
sidecar = event_file_io.event_to_sidecar_dict(
ev,
serial=serial,
blastware_filename=path.name,
blastware_filesize=path.stat().st_size,
blastware_sha256=bw_sha,
source_kind=source_kind,
a5_pickle_filename=a5_filename,
txt_filename=preserved_txt_fn,
review=preserved_review,
extensions=preserved_ext,
)
if preserved_bw_report is not None:
sidecar["bw_report"] = preserved_bw_report
# Also emit the .h5 clean-waveform file when:
# - it's missing, OR
# - --force was passed, OR
# - the sidecar is being regenerated this iteration
# (sha mismatch / tool_version too old). The .h5 and
# the sidecar are both derived from the same decoder
# output, so if the sidecar is stale, so is the .h5.
#
# Both waveform and histogram bodies now decode to real
# samples via event_file_io.read_blastware_file → either
# waveform_codec.decode_waveform_v2 or histogram_codec.
# decode_histogram_body. If samples are still empty after
# both codecs run, it's a genuine "we can't decode this
# file" case (truncated, malformed, or unknown mode);
# skip the .h5 write so we don't replace whatever's
# there with an empty placeholder.
has_samples = bool(
ev.raw_samples and any(
ev.raw_samples.get(ch) for ch in ("Tran", "Vert", "Long", "MicL")
)
)
hdf5_path = store.hdf5_path_for(serial, path.name)
hdf5_filename = hdf5_path.name if hdf5_path.exists() else None
hdf5_action = "kept"
need_h5 = (
not args.skip_hdf5
and (args.force or not hdf5_path.exists() or sidecar_stale)
and has_samples
)
if not has_samples and not args.skip_hdf5:
hdf5_action = "skipped-undecodable"
if need_h5:
if args.dry_run:
hdf5_action = "would (re)write"
else:
try:
event_hdf5.write_event_hdf5(
hdf5_path, ev,
serial=serial,
geo_range="normal",
source_kind=source_kind,
)
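# hdf5_filename is still None here if no .h5 existed before
# this write (it was set above from hdf5_path.exists()), so
# pick the label before overwriting it. Checking
# hdf5_path.exists() after the write would always say "rewrote".
hdf5_action = "rewrote" if hdf5_filename else "wrote"
hdf5_filename = hdf5_path.name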
except Exception as exc:
log.warning("HDF5 write failed for %s: %s", path.name, exc)
hdf5_action = "FAILED"
if args.dry_run:
print(f" [DRY ] would write {sidecar_path.name} "
f"+ .h5 ({hdf5_action}) source={source_kind}")
written += 1
continue
event_file_io.write_sidecar(sidecar_path, sidecar)
# Best-effort: keep the SQL row's sidecar_filename in sync
# by upserting via insert_events (it dedups on serial+ts).
try:
db.insert_events(
[ev], serial=serial,
waveform_records=(
{ev._waveform_key.hex(): {
"filename": path.name,
"filesize": path.stat().st_size,
"a5_pickle_filename": a5_filename,
"sidecar_filename": sidecar_path.name,
}}
if ev._waveform_key else None
),
device_family="series3",
)
except Exception as exc:
log.warning("DB upsert failed for %s: %s", path.name, exc)
print(f" [OK ] {path.name} → {sidecar_path.name} "
f"+ h5 ({hdf5_action}) source={source_kind}")
written += 1
except Exception as exc:
log.error("backfill failed for %s: %s", path, exc, exc_info=args.verbose)
errors += 1
print(f"\nDone. written={written} skipped(uptodate)={skipped} errors={errors}")
return 0 if errors == 0 else 1
if __name__ == "__main__":
sys.exit(main())
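The .h5 gating in the loop above can be read as a small pure function. The sketch below is not part of the script: `decide_hdf5_action` and its keyword names are hypothetical, chosen to mirror the flags and locals in the loop (`--skip-hdf5`, `--force`, `--dry-run`, `sidecar_stale`, `has_samples`). It assumes `existed` is sampled before any write happens, which is what lets the label distinguish "wrote" from "rewrote".

```python
def decide_hdf5_action(*, skip_hdf5, force, dry_run, existed,
                       sidecar_stale, has_samples):
    """Return the label the backfill loop would report for the .h5 file.

    `existed` must be captured BEFORE writing; after a successful write
    the file always exists, so a post-write check cannot tell the cases apart.
    """
    if skip_hdf5:
        return "kept"                      # --skip-hdf5: never touch the file
    if not has_samples:
        return "skipped-undecodable"       # don't replace data with a placeholder
    need_h5 = force or not existed or sidecar_stale
    if not need_h5:
        return "kept"
    if dry_run:
        return "would (re)write"
    return "rewrote" if existed else "wrote"


# A stale sidecar forces regeneration of an existing .h5.
label = decide_hdf5_action(skip_hdf5=False, force=False, dry_run=False,
                           existed=True, sidecar_stale=True, has_samples=True)
print(label)
```

Keeping the decision free of I/O makes the flag combinations easy to unit-test without touching the store.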
@@ -0,0 +1,331 @@
"""
scripts/backfill_thor_events.py: re-process existing Thor (Series IV)
events so their sidecars carry the bw_report block produced by
``micromate.idf_to_bw_report.build_bw_report_from_idf``, and regenerate
their .h5 clean-waveform files (IDFW waveforms and IDFH histograms).
Why this exists
Thor events ingested before v0.21.0 (or during the v0.21.0 ingest bug
window fixed in commit bee1185) have sidecars with only
``extensions.idf_report`` and no ``bw_report`` block. Without
``bw_report``, the SFM PDF renderer falls back to DB-only fields
(misses sensor-self-check, full per-channel breakdown, mic dB(L)),
and the modal chart 404s on ``/waveform.json`` for IDFW events
because no .h5 was written when the codec failed at ingest.
Re-forwarding from thor-watcher would also fix this, but that requires
operator coordination on every watcher machine and uses bandwidth this
script doesn't.
What this does
Walks ``<store>/<serial>/<filename>`` for ``.IDFW`` / ``.IDFH`` files
and, for each one:
1. Reads the existing sidecar (preserving review state + captured_at).
2. Re-runs ``micromate.idf_file.read_idf_file()`` on the binary
bytes, passing ``data=`` so the codec doesn't try to read from
a path it doesn't know.
3. Pulls ``extensions.idf_report`` (the raw parsed Thor dict the
v0.18.0+ ingest path already stashed) and runs the v0.21.0
``build_bw_report_from_idf`` adapter against it.
4. Writes the refreshed sidecar with the new ``bw_report``,
bumped ``source.tool_version``, but preserved ``review`` block
+ the original ``captured_at`` timestamp.
5. Regenerates the .h5 waveform file via the existing
``event_hdf5`` writer. For IDFW that's the decoded per-sample
stream; for IDFH it's a 1-sample-per-interval synthesised array
(peak ADC count per channel) so the renderer's bar-chart code
has data to group on. Mic peak psi from the binary is merged
onto the IdfEvent before the bridge so the h5 writer's per-count
mic scale factor lands on a sensible value (without this the
mic chart on Thor events plots dB(L)-as-pseudo-psi and shows
bomb-level numbers).
Idempotent. Re-running it after a parser/adapter change just
re-writes sidecars (and their .h5 files): no DB writes, no thor-watcher coordination.
Usage
python scripts/backfill_thor_events.py [--store-root PATH]
[--dry-run]
[--skip-hdf5]
[--force]
[-v]
By default, refreshes any Thor event whose sidecar is missing
``bw_report`` OR whose ``source.tool_version`` is older than the
current ``TOOL_VERSION``. ``--force`` refreshes every Thor event
regardless.
"""
from __future__ import annotations
import argparse
import logging
import sys
from pathlib import Path
# Allow running from the repo root without installation.
sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
from minimateplus import event_file_io
from sfm.waveform_store import WaveformStore
log = logging.getLogger("backfill_thor_events")
def _is_thor_event(path: Path) -> bool:
if not path.is_file():
return False
if path.name.endswith((".sfm.json", ".h5", "_ASCII.TXT")):
return False
return path.suffix.upper() in (".IDFW", ".IDFH")
def _vtuple(s: str) -> tuple:
try:
return tuple(int(p) for p in str(s).split(".")[:3])
except Exception:
return (0, 0, 0)
def main(argv=None) -> int:
p = argparse.ArgumentParser(description=__doc__)
p.add_argument(
"--db-path",
default=str(Path(__file__).resolve().parent.parent / "bridges" / "captures" / "seismo_relay.db"),
help="Used only to derive the default --store-root.",
)
p.add_argument("--store-root", default=None)
p.add_argument("--dry-run", action="store_true")
p.add_argument("--skip-hdf5", action="store_true",
help="Don't regenerate .h5 files (IDFW waveforms or IDFH histograms).")
p.add_argument("--force", action="store_true",
help="Refresh every Thor event, not just ones with stale or missing bw_report.")
p.add_argument("-v", "--verbose", action="store_true")
args = p.parse_args(argv)
logging.basicConfig(
level=logging.DEBUG if args.verbose else logging.INFO,
format="%(asctime)s %(levelname)-7s %(name)s %(message)s",
datefmt="%H:%M:%S",
)
db_path = Path(args.db_path).expanduser().resolve()
store_root = (
Path(args.store_root).expanduser().resolve()
if args.store_root else db_path.parent / "waveforms"
)
if not store_root.exists():
log.error("store root not found: %s", store_root)
return 1
store = WaveformStore(store_root)
log.info("store root: %s", store_root)
log.info("current TOOL_VERSION: %s", event_file_io.TOOL_VERSION)
refreshed = skipped = errors = h5_written = 0
# Lazy imports so any one of these failing produces a useful error
# message rather than crashing module-load.
from micromate.idf_file import read_idf_file
from micromate.idf_to_bw_report import build_bw_report_from_idf
for serial_dir in sorted(p for p in store_root.iterdir() if p.is_dir()):
serial = serial_dir.name
for path in sorted(serial_dir.iterdir()):
if not _is_thor_event(path):
continue
sidecar_path = store.sidecar_path_for(serial, path.name)
if not sidecar_path.exists():
log.debug("%s: no sidecar — skipping (this is a binary without ingest history)",
path.name)
skipped += 1
continue
try:
existing = event_file_io.read_sidecar(sidecar_path)
except Exception as exc:
log.warning("%s: failed to read sidecar — %s", path.name, exc)
errors += 1
continue
has_bw_report = bool(existing.get("bw_report"))
existing_version = (existing.get("source") or {}).get("tool_version", "")
up_to_date = (
has_bw_report
and _vtuple(existing_version) >= _vtuple(event_file_io.TOOL_VERSION)
)
if up_to_date and not args.force:
skipped += 1
continue
# Re-decode the binary. Catch + log; continue with .txt-only
# data if it fails (matches the live ingest path's behavior).
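# Reset per file so the `res is not None` check further down can
# never see a result left over from a previous iteration.
res = None
idf_samples = None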
idf_intervals = None
binary_md = None
is_histogram = path.suffix.upper() == ".IDFH"
try:
binary_bytes = path.read_bytes()
res = read_idf_file(path, data=binary_bytes)
idf_samples = res.samples or None
idf_intervals = res.intervals
binary_md = res.binary_metadata
is_histogram = res.intervals is not None
except NotImplementedError:
# sig-B / Blastware-stray binary; no samples but adapter
# can still produce a bw_report from extensions.idf_report.
log.debug("%s: binary codec NotImplementedError (sig-B / BW-stray); proceeding from sidecar's idf_report only", path.name)
except Exception as exc:
log.warning("%s: binary decode failed — %s; proceeding from sidecar's idf_report only", path.name, exc)
# Run the adapter. Pull report_dict from
# extensions.idf_report (the v0.18.0+ ingest preserved it).
report_dict = (existing.get("extensions") or {}).get("idf_report") or {}
if not report_dict and binary_md is None:
log.debug("%s: no idf_report in sidecar AND no binary metadata — nothing to project", path.name)
skipped += 1
continue
try:
bw_report = build_bw_report_from_idf(
report_dict, binary_md=binary_md,
intervals=idf_intervals, is_histogram=is_histogram,
)
except Exception as exc:
log.warning("%s: adapter failed — %s", path.name, exc)
errors += 1
continue
# Build the new sidecar by overlaying refreshed fields onto
# the existing one — preserves review, captured_at, blastware
# block, source.kind, etc.
new_sidecar = dict(existing) # shallow copy
new_sidecar["bw_report"] = bw_report
src = dict(new_sidecar.get("source") or {})
src["tool_version"] = event_file_io.TOOL_VERSION
new_sidecar["source"] = src
# Preserve histogram intervals if the binary decoded them
# (improves over the original ingest if that one ran before
# the bee1185 codec fix).
if idf_intervals is not None:
ext = dict(new_sidecar.get("extensions") or {})
ext["idf_intervals"] = [
{
"offset": iv.offset,
"tran_peak": iv.peak_count("Tran"),
"tran_halfp": iv.tran_halfp,
"tran_freq": iv.freq_hz("Tran"),
"vert_peak": iv.peak_count("Vert"),
"vert_halfp": iv.vert_halfp,
"vert_freq": iv.freq_hz("Vert"),
"long_peak": iv.peak_count("Long"),
"long_halfp": iv.long_halfp,
"long_freq": iv.freq_hz("Long"),
"mic_peak": iv.peak_count("MicL"),
"mic_halfp": iv.micl_halfp,
"mic_freq": iv.freq_hz("MicL"),
}
for iv in idf_intervals
]
new_sidecar["extensions"] = ext
if args.dry_run:
will_write_h5 = (idf_samples or idf_intervals) and not args.skip_hdf5
log.info("[DRY] %s/%s — would refresh sidecar (bw_report=%s, h5=%s)",
serial, path.name,
"added" if not has_bw_report else "refreshed",
"would write" if will_write_h5 else "skipped")
else:
event_file_io.write_sidecar(sidecar_path, new_sidecar)
log.info("%s/%s — sidecar refreshed (bw_report=%s, intervals=%d)",
serial, path.name,
"added" if not has_bw_report else "refreshed",
len(idf_intervals) if idf_intervals else 0)
refreshed += 1
# Regenerate .h5 by replaying the same IdfEvent → Event bridge
# save_imported_idf uses. For IDFW we write the decoded per-
# sample arrays. For IDFH we synthesise a 1-sample-per-interval
# array (peak ADC count per channel per interval) so the
# renderer's bar-chart code has something to group on.
# Pre-condition: either real samples (IDFW) or decoded intervals
# (IDFH). Skip otherwise.
have_data = bool(idf_samples) or bool(idf_intervals)
if have_data and not args.skip_hdf5:
from sfm import event_hdf5
hdf5_path = store.hdf5_path_for(serial, path.name)
if args.dry_run:
log.debug("[DRY] would write %s", hdf5_path.name)
else:
try:
from micromate import IdfEvent
from minimateplus.event_file_io import file_sha256
idf_event = IdfEvent.from_report(report_dict, path.name)
# Merge the binary-derived mic peak psi (only the
# binary path knows the proper psi value; the .txt
# carries dB(L)). Without this, the h5 writer's
# per-count mic factor is computed against the
# dB(L) value-as-pseudo-psi and the mic chart
# scales wildly.
if (binary_md is not None and res is not None
and res.event.peaks.mic_pspl_psi is not None):
idf_event.peaks.mic_pspl_psi = res.event.peaks.mic_pspl_psi
sha256 = file_sha256(path)
waveform_key = bytes.fromhex(sha256)[:16]
ev = idf_event.to_minimateplus_event(waveform_key)
if is_histogram and idf_intervals:
# 1 sample per interval per channel — same
# synthesis save_imported_idf uses. The h5
# writer's count×geo_fs/32768 conversion turns
# each peak-ADC-count into the bar's physical
# value.
ev.raw_samples = {
"Tran": [iv.peak_count("Tran") for iv in idf_intervals],
"Vert": [iv.peak_count("Vert") for iv in idf_intervals],
"Long": [iv.peak_count("Long") for iv in idf_intervals],
"MicL": [iv.peak_count("MicL") for iv in idf_intervals],
}
ev.total_samples = ev.total_samples or len(idf_intervals)
elif idf_samples:
ev.raw_samples = idf_samples
n_samp = max(
(len(idf_samples.get(ch, []))
for ch in ("Tran", "Vert", "Long", "MicL")),
default=0,
)
ev.total_samples = ev.total_samples or n_samp
event_hdf5.write_event_hdf5(
hdf5_path, ev,
serial=serial,
geo_range="normal",
source_kind="idf-import",
tool_version=event_file_io.TOOL_VERSION,
)
h5_written += 1
log.debug("%s/%s — .h5 written (%s)",
serial, path.name,
f"{len(idf_intervals)} intervals" if is_histogram
else f"{sum(len(v) for v in (idf_samples or {}).values())} samples")
except Exception as exc:
log.warning("%s/%s — .h5 write failed: %s",
serial, path.name, exc)
log.info("Done. refreshed=%d skipped=%d errors=%d h5_written=%d",
refreshed, skipped, errors, h5_written)
return 0 if errors == 0 else 2
if __name__ == "__main__":
sys.exit(main())
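The default refresh rule (missing ``bw_report`` OR an older ``source.tool_version``) depends on comparing versions as integer tuples, not strings. A minimal sketch of that gate follows. `vtuple` copies the script's `_vtuple` logic; `needs_refresh` is a hypothetical helper name that wraps the same `up_to_date` expression for illustration.

```python
def vtuple(s):
    """Parse 'X.Y.Z' into a tuple of ints; unparseable input sorts as oldest."""
    try:
        return tuple(int(p) for p in str(s).split(".")[:3])
    except Exception:
        return (0, 0, 0)


def needs_refresh(sidecar, tool_version, force=False):
    """True if the sidecar lacks bw_report or was written by an older tool."""
    has_bw_report = bool(sidecar.get("bw_report"))
    existing = (sidecar.get("source") or {}).get("tool_version", "")
    up_to_date = has_bw_report and vtuple(existing) >= vtuple(tool_version)
    return force or not up_to_date


# String comparison would get this wrong: "0.9.0" > "0.21.1" lexically.
assert vtuple("0.9.0") < vtuple("0.21.1")

old = {"bw_report": {"peaks": {}}, "source": {"tool_version": "0.20.0"}}
new = {"bw_report": {"peaks": {}}, "source": {"tool_version": "0.21.1"}}
print(needs_refresh(old, "0.21.1"), needs_refresh(new, "0.21.1"))
```

One property worth knowing: a missing or malformed version string parses to `(0, 0, 0)`, so such sidecars are always treated as stale and refreshed.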
@@ -0,0 +1,100 @@
#!/usr/bin/env bash
# Fire-and-forget Stop Monitoring loop — for wedged or constantly-triggering units.
#
# Hammers POST /device/stop_monitoring_blind in a tight loop. The endpoint
# opens TCP, dumps SESSION_RESET + a few copies of the SUB 0x97 frame, and
# closes — without ever reading an S3 response. Each TCP-won attempt is
# ~50ms of wire activity instead of the multi-frame handshake the regular
# rescue endpoint does, so windows that are too small for the full rescue
# can still land a stop-monitoring command.
#
# Usage:
# ./blind_stop.sh <host> [tcp_port]
#
# Env:
# SFM_BASE_URL Default: http://localhost:8200 (SFM direct).
# Set to http://localhost:8001/api/sfm to route through
# Terra-View's proxy.
# MAX_ATTEMPTS Default: 600
# SLEEP_S Default: 0 (no backoff — hammer it)
# MAX_TIME_S Default: 15
# CONNECT_TIMEOUT Default: 5
# REPEAT Frames per TCP session (default 3 — increases hit rate
# if the device is busy reading its own buffer).
# STOP_ON_OK Default: 1. Set to 0 to keep hammering indefinitely
# even after successful sends (every 503 means the device
# is in *another* session, every 200 means our bytes got
# through — but the device may not have processed them).
set -u
host="${1:-}"
tcp_port="${2:-9034}"
if [[ -z "$host" ]]; then
echo "usage: $0 <host> [tcp_port]" >&2
exit 2
fi
base="${SFM_BASE_URL:-http://localhost:8200}"
max_attempts="${MAX_ATTEMPTS:-600}"
sleep_s="${SLEEP_S:-0}"
max_time_s="${MAX_TIME_S:-15}"
connect_timeout="${CONNECT_TIMEOUT:-5}"
repeat="${REPEAT:-3}"
stop_on_ok="${STOP_ON_OK:-1}"
url="${base}/device/stop_monitoring_blind?host=${host}&tcp_port=${tcp_port}&connect_timeout=${connect_timeout}&repeat=${repeat}"
echo "blind_stop: target ${host}:${tcp_port} connect_timeout=${connect_timeout}s repeat=${repeat}"
echo "blind_stop: POST ${url}"
echo "blind_stop: up to ${max_attempts} attempts, ${sleep_s}s between, ${max_time_s}s per request"
echo "blind_stop: stop_on_ok=${stop_on_ok}"
echo
ok_count=0
busy_count=0
err_count=0
started=$(date +%s)
for ((i=1; i<=max_attempts; i++)); do
printf "[%4d] %s " "$i" "$(date +%H:%M:%S)"
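# curl still prints "000" via -w when the request fails, so appending
# another "000" on failure would yield "000000" and miss the 000) case.
# Swallow the exit status and default only when the output is empty.
http_code=$(curl -sS -o /tmp/blind_resp.$$ -w "%{http_code}" \
--max-time "$max_time_s" \
-X POST "$url" || true)
http_code="${http_code:-000}"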
body=$(cat /tmp/blind_resp.$$ 2>/dev/null || true)
rm -f /tmp/blind_resp.$$
case "$http_code" in
200|201)
ok_count=$((ok_count + 1))
echo "SENT $body"
if [[ "$stop_on_ok" == "1" ]]; then
elapsed=$(( $(date +%s) - started ))
echo
echo "blind_stop: success after ${i} attempts (${elapsed}s). ok=${ok_count} busy=${busy_count} err=${err_count}"
echo "blind_stop: NEXT — wait ~10s, then try the full rescue:"
echo " $(dirname "$0")/rescue_device.sh ${host} ${tcp_port}"
exit 0
fi
;;
503)
busy_count=$((busy_count + 1))
echo "busy (503)"
;;
000)
err_count=$((err_count + 1))
echo "curl error"
;;
*)
err_count=$((err_count + 1))
echo "HTTP $http_code $body" | head -c 400
echo
;;
esac
[[ "$sleep_s" != "0" ]] && sleep "$sleep_s"
done
elapsed=$(( $(date +%s) - started ))
echo
echo "blind_stop: gave up after ${max_attempts} attempts (${elapsed}s). ok=${ok_count} busy=${busy_count} err=${err_count}" >&2
exit 1
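The retry loop's bookkeeping (200/201 counts as sent, 503 as busy, anything else including curl's `000` as an error, with an optional early exit on the first success) can be sketched in Python with the HTTP call injected as a function. This is an illustration only: `classify`, `run_attempts`, and the `send` callable are hypothetical names, and no real request is made.

```python
from collections import Counter


def classify(http_code):
    """Map an HTTP status string to the script's three buckets."""
    if http_code in ("200", "201"):
        return "ok"
    if http_code == "503":
        return "busy"          # device is in another session
    return "err"               # includes curl's "000" (no response at all)


def run_attempts(send, max_attempts=600, stop_on_ok=True):
    """Call send() until it succeeds or attempts run out.

    Returns (attempt_number_of_first_stop, counts); the attempt number is
    None when the loop gave up or when stop_on_ok is disabled.
    """
    counts = Counter()
    for attempt in range(1, max_attempts + 1):
        outcome = classify(send())
        counts[outcome] += 1
        if outcome == "ok" and stop_on_ok:
            return attempt, counts
    return None, counts


# Simulated responses: busy, network failure, then a successful send.
codes = iter(["503", "000", "200", "200"])
attempt, counts = run_attempts(lambda: next(codes))
print(attempt, dict(counts))
```

Injecting `send` keeps the loop testable offline; a real caller would wrap whatever HTTP client it uses and return the status code as a string.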
@@ -0,0 +1,185 @@
"""
scripts/check_bw_report_preservation.py: verify that running backfill_sidecars
doesn't wipe the `bw_report` block from sidecars that already had one.
Two-step workflow:
# Before running backfill — capture a baseline snapshot:
python scripts/check_bw_report_preservation.py snapshot \
--store-root /path/to/waveforms \
--out before.json
# Run backfill:
python scripts/backfill_sidecars.py --store-root /path/to/waveforms --force
# After backfill — diff against the baseline:
python scripts/check_bw_report_preservation.py diff \
--store-root /path/to/waveforms \
--baseline before.json
The diff classifies every sidecar into one of:
PRESERVED      had bw_report before, has same hash now → GOOD
CHANGED        had bw_report before, has different hash now → suspicious
               (backfill should only ever copy the block verbatim)
WIPED          had bw_report before, doesn't now ← BUG: data loss
STILL_MISSING  didn't have bw_report before, still doesn't → expected
NEW            didn't have bw_report before, has one now
               (only possible if a re-ingest happened between snapshots;
               shouldn't happen during backfill)
REMOVED        sidecar existed in baseline, file is gone now
ADDED          sidecar didn't exist in baseline, exists now
Exit code is 0 if no WIPED or CHANGED entries are found, 1 otherwise.
"""
from __future__ import annotations
import argparse
import hashlib
import json
import sys
from pathlib import Path
from typing import Optional
# Allow running from the repo root without installation.
sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
from minimateplus import event_file_io
def _bw_report_hash(sidecar_data: dict) -> Optional[str]:
"""Canonical-JSON hash of the bw_report block, or None if absent."""
br = sidecar_data.get("bw_report")
if not br:
return None
# sort_keys for stable hashing across dict-ordering differences
blob = json.dumps(br, sort_keys=True, separators=(",", ":"))
return hashlib.sha256(blob.encode()).hexdigest()
def _scan_store(store_root: Path) -> dict:
"""Walk every <serial>/<file>.sfm.json and return {relpath: hash_or_None}.
Relpath is `<serial>/<filename>`, which is stable across machines/snapshots.
"""
out: dict[str, Optional[str]] = {}
for serial_dir in sorted(p for p in store_root.iterdir() if p.is_dir()):
for sidecar in sorted(serial_dir.glob("*.sfm.json")):
relpath = f"{serial_dir.name}/{sidecar.name}"
try:
data = event_file_io.read_sidecar(sidecar)
except Exception as exc:
print(f" WARN: failed to read {relpath}: {exc}", file=sys.stderr)
continue
out[relpath] = _bw_report_hash(data)
return out
def cmd_snapshot(args) -> int:
store_root = Path(args.store_root).expanduser().resolve()
if not store_root.exists():
print(f"error: store root does not exist: {store_root}", file=sys.stderr)
return 2
out_path = Path(args.out).expanduser().resolve()
print(f"Scanning {store_root}")
snapshot = _scan_store(store_root)
with_bw = sum(1 for v in snapshot.values() if v is not None)
without_bw = sum(1 for v in snapshot.values() if v is None)
print(f" total sidecars: {len(snapshot)}")
print(f" with bw_report: {with_bw}")
print(f" without bw_report: {without_bw}")
out_path.parent.mkdir(parents=True, exist_ok=True)
with open(out_path, "w") as f:
json.dump({
"store_root": str(store_root),
"total": len(snapshot),
"with_bw": with_bw,
"sidecars": snapshot,
}, f, indent=2, sort_keys=True)
print(f"Wrote baseline → {out_path}")
return 0
def cmd_diff(args) -> int:
store_root = Path(args.store_root).expanduser().resolve()
if not store_root.exists():
print(f"error: store root does not exist: {store_root}", file=sys.stderr)
return 2
baseline_path = Path(args.baseline).expanduser().resolve()
if not baseline_path.exists():
print(f"error: baseline file not found: {baseline_path}", file=sys.stderr)
return 2
with open(baseline_path) as f:
baseline = json.load(f)
before = baseline["sidecars"]
print(f"Scanning {store_root} for comparison against {baseline_path.name}")
after = _scan_store(store_root)
classes = {k: [] for k in (
"PRESERVED", "CHANGED", "WIPED", "STILL_MISSING", "NEW", "REMOVED", "ADDED",
)}
all_keys = set(before) | set(after)
for key in sorted(all_keys):
b = before.get(key, "__MISSING__")
a = after.get(key, "__MISSING__")
if b == "__MISSING__":
classes["ADDED"].append(key)
elif a == "__MISSING__":
classes["REMOVED"].append(key)
elif b is None and a is None:
classes["STILL_MISSING"].append(key)
elif b is None and a is not None:
classes["NEW"].append(key)
elif b is not None and a is None:
classes["WIPED"].append(key)
elif b == a:
classes["PRESERVED"].append(key)
else:
classes["CHANGED"].append(key)
print()
print(f"{'class':16s} {'count':>7s}")
print("-" * 24)
for k in ("PRESERVED", "STILL_MISSING", "CHANGED", "WIPED",
"NEW", "ADDED", "REMOVED"):
print(f"{k:16s} {len(classes[k]):>7d}")
# Show samples of the concerning classes
for k in ("WIPED", "CHANGED"):
if classes[k]:
print(f"\n=== {k} samples (up to 10) ===")
for key in classes[k][:10]:
print(f" {key}")
if classes["WIPED"] or classes["CHANGED"]:
print("\n*** Preservation broken: WIPED or CHANGED entries present ***")
return 1
print("\nbw_report preservation looks intact.")
return 0
def main(argv=None) -> int:
p = argparse.ArgumentParser(description=__doc__)
sub = p.add_subparsers(dest="cmd", required=True)
p_snap = sub.add_parser("snapshot", help="capture baseline bw_report hashes")
p_snap.add_argument("--store-root", required=True)
p_snap.add_argument("--out", required=True, help="output JSON path")
p_snap.set_defaults(func=cmd_snapshot)
p_diff = sub.add_parser("diff", help="diff current store against a baseline")
p_diff.add_argument("--store-root", required=True)
p_diff.add_argument("--baseline", required=True, help="JSON from `snapshot`")
p_diff.set_defaults(func=cmd_diff)
args = p.parse_args(argv)
return args.func(args)
if __name__ == "__main__":
sys.exit(main())
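The two ideas the checker rests on, a canonical-JSON hash that ignores dict ordering and a before/after classification, fit in a few lines. The sketch below restates them as standalone functions. The hashing mirrors `_bw_report_hash`; `classify` and the `MISSING` sentinel object are illustrative names (the script itself uses the string `"__MISSING__"`).

```python
import hashlib
import json

MISSING = object()  # sentinel: sidecar file absent from that snapshot


def bw_report_hash(sidecar):
    """Canonical-JSON SHA-256 of the bw_report block, or None if absent."""
    br = sidecar.get("bw_report")
    if not br:
        return None
    # sort_keys + compact separators give the same bytes for equal dicts.
    blob = json.dumps(br, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(blob.encode()).hexdigest()


def classify(before, after):
    """Classify one sidecar from its baseline hash and its current hash."""
    if before is MISSING:
        return "ADDED"
    if after is MISSING:
        return "REMOVED"
    if before is None:
        return "STILL_MISSING" if after is None else "NEW"
    if after is None:
        return "WIPED"
    return "PRESERVED" if before == after else "CHANGED"


h1 = bw_report_hash({"bw_report": {"a": 1, "b": 2}})
h2 = bw_report_hash({"bw_report": {"b": 2, "a": 1}})  # same data, other order
print(classify(h1, h2), classify(h1, None))
```

Because the hash is computed over sorted keys, a backfill that rewrites the block with a different key order still classifies as PRESERVED; only a change in values shows up as CHANGED.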
@@ -0,0 +1,151 @@
"""
scripts/repair_unknown_serials.py: re-attribute events stuck under
`serial = 'UNKNOWN'` to their correct serial by decoding the BW filename.
Why this is needed
The /db/import/blastware_file endpoint had a bug (fixed in commit a032fa5+1
on the ach-report-ingestion branch) where every forwarded event was inserted
with serial='UNKNOWN' because the endpoint's `_serial_from_event(ev)` stub
returned None and never consulted the BW-filename serial that
`WaveformStore.save_imported_bw()` had already decoded.
Effect on a server that ran a buggy version: every forwarded event's
SeismoDb row has `serial='UNKNOWN'`, even though the on-disk waveform
store has correctly bucketed the files into `BE<NNNN>/` folders. So
the BW binaries / sidecars / HDF5s are fine, but `/db/units` and
`/db/events?serial=...` queries don't surface the events.
This script
Walks the events table looking for rows with `serial='UNKNOWN'` and
re-attributes each one to the serial decoded from its
`blastware_filename` column. If the row's serial would collide with
an existing row (already-correct duplicate from a later re-forward),
the UNKNOWN row is deleted. Otherwise the row's `serial` column is
updated in-place.
Idempotent: re-running after a successful repair finds zero matching
rows and exits cleanly.
Usage
# Dry-run (default): print what would change, don't touch the DB
python -m scripts.repair_unknown_serials --db bridges/captures/seismo_relay.db
# Apply the repair
python -m scripts.repair_unknown_serials --db bridges/captures/seismo_relay.db --apply
"""
from __future__ import annotations
import argparse
import sqlite3
import sys
from pathlib import Path
# Reach into sfm.waveform_store for the serial decoder. This script
# is run from the repo root via `python -m scripts.repair_unknown_serials`.
sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
from sfm.waveform_store import _serial_from_bw_filename
def main(argv: list[str] | None = None) -> int:
p = argparse.ArgumentParser(
description="Re-attribute events stuck under serial='UNKNOWN'.",
)
p.add_argument(
"--db", required=True, type=Path,
help="Path to seismo_relay.db (e.g. bridges/captures/seismo_relay.db)",
)
p.add_argument(
"--apply", action="store_true",
help="Apply the repair. Without this flag the script runs in "
"dry-run mode and only reports what would change.",
)
args = p.parse_args(argv)
if not args.db.exists():
print(f"DB not found: {args.db}", file=sys.stderr)
return 2
conn = sqlite3.connect(str(args.db))
conn.row_factory = sqlite3.Row
rows = list(conn.execute(
"SELECT id, serial, timestamp, blastware_filename "
" FROM events "
" WHERE serial = 'UNKNOWN' "
" ORDER BY timestamp",
))
print(f"Found {len(rows)} UNKNOWN-serial rows in events table.")
if not rows:
return 0
updated = 0
deleted = 0
unresolved = 0
by_serial: dict[str, int] = {}
for row in rows:
rid = row["id"]
ts = row["timestamp"]
bw_name = row["blastware_filename"]
new_serial = _serial_from_bw_filename(bw_name) if bw_name else None
if not new_serial:
print(f" ⚠ id={rid[:8]} ts={ts} filename={bw_name!r}: "
f"cannot decode serial from filename; skipping")
unresolved += 1
continue
# Check for an existing row at the target (serial, timestamp).
existing = conn.execute(
"SELECT id FROM events WHERE serial = ? AND timestamp = ?",
(new_serial, ts),
).fetchone()
action: str
if existing is None:
# Safe to UPDATE in place.
if args.apply:
conn.execute(
"UPDATE events SET serial = ? WHERE id = ?",
(new_serial, rid),
)
action = "UPDATE"
updated += 1
else:
# A correctly-attributed row already exists. Drop the
# UNKNOWN duplicate.
if args.apply:
conn.execute("DELETE FROM events WHERE id = ?", (rid,))
action = "DELETE (dup)"
deleted += 1
by_serial[new_serial] = by_serial.get(new_serial, 0) + 1
print(f" {action:14s} id={rid[:8]} ts={ts} "
f"filename={bw_name} → {new_serial}")
if args.apply:
conn.commit()
conn.close()
print()
print("Summary:")
print(f" UNKNOWN rows scanned: {len(rows)}")
print(f" Updated to real serial: {updated}")
print(f" Deleted (duplicate of an ")
print(f" already-correct row): {deleted}")
print(f" Unresolved (bad filename): {unresolved}")
print()
if by_serial:
print("Per-serial breakdown of repaired rows:")
for serial, count in sorted(by_serial.items()):
print(f" {serial:12s} {count}")
if not args.apply:
print()
print("(dry-run — re-run with --apply to commit)")
return 0
if __name__ == "__main__":
sys.exit(main())
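The update-or-delete rule can be exercised against an in-memory SQLite database. The sketch below is a simplified stand-in, not the script itself: the four-column `events` schema is assumed from the script's SELECT, and `toy_decode` is a made-up decoder that takes the text before the first underscore (the real `_serial_from_bw_filename` logic is not shown in this diff).

```python
import sqlite3


def repair(conn, decode_serial):
    """Re-attribute UNKNOWN rows; returns (updated, deleted, unresolved)."""
    rows = conn.execute(
        "SELECT id, timestamp, blastware_filename FROM events "
        "WHERE serial = 'UNKNOWN' ORDER BY timestamp"
    ).fetchall()
    updated = deleted = unresolved = 0
    for rid, ts, name in rows:
        serial = decode_serial(name) if name else None
        if not serial:
            unresolved += 1
            continue
        dup = conn.execute(
            "SELECT 1 FROM events WHERE serial = ? AND timestamp = ?",
            (serial, ts),
        ).fetchone()
        if dup is None:
            # No correctly-attributed row yet: fix this one in place.
            conn.execute("UPDATE events SET serial = ? WHERE id = ?", (serial, rid))
            updated += 1
        else:
            # A correct row already exists: drop the UNKNOWN duplicate.
            conn.execute("DELETE FROM events WHERE id = ?", (rid,))
            deleted += 1
    conn.commit()
    return updated, deleted, unresolved


def toy_decode(name):
    """Hypothetical decoder for this demo only: 'BE1111_x.bin' -> 'BE1111'."""
    return name.split("_")[0] if "_" in name else None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id TEXT PRIMARY KEY, serial TEXT, "
             "timestamp TEXT, blastware_filename TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", [
    ("a", "UNKNOWN", "t1", "BE1111_x.bin"),   # will be updated
    ("b", "UNKNOWN", "t2", "BE2222_y.bin"),   # duplicate of row c: deleted
    ("c", "BE2222",  "t2", "BE2222_y.bin"),
    ("d", "UNKNOWN", "t3", None),             # no filename: unresolved
])
first = repair(conn, toy_decode)
second = repair(conn, toy_decode)  # idempotent: nothing left to repair
print(first, second)
```

The second pass touches nothing because the only remaining UNKNOWN row has no decodable filename, which matches the script's description of itself as idempotent.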
@@ -0,0 +1,99 @@
#!/usr/bin/env bash
# Rescue an uncooperative MiniMate that's busy with another ACH session.
#
# Hammers POST /device/rescue in a tight loop with a short timeout. When the
# device is in an ACH session our SYN either gets refused or silently dropped
# (5s connect timeout inside the endpoint) and we retry immediately. When the
# device is between sessions, our TCP wins, the endpoint disables Auto Call
# Home and erases events inside the same session, then returns success.
#
# Usage:
# ./rescue_device.sh <host> [tcp_port] [--no-erase] [--no-disable-ach]
#
# Examples:
# ./rescue_device.sh 166.246.130.1 9034
# ./rescue_device.sh 166.246.130.1 9034 --no-erase # just silence it
#
# Environment:
# SFM_BASE_URL Defaults to http://localhost:8200 (SFM direct).
# Set to http://localhost:8001/api/sfm to route through
# Terra-View's proxy. Direct mode avoids the proxy's
# 60s timeout, which matters for long-running endpoints.
# MAX_ATTEMPTS Cap on retries (default 600 ≈ 30+ min).
# SLEEP_S Backoff between attempts (default 1).
# MAX_TIME_S Per-request timeout (default 60).
# CONNECT_TIMEOUT TCP connect timeout (default 5).
# RECV_TIMEOUT Per-frame S3 recv timeout (default 5). If POLL or any
# subsequent frame doesn't respond within this window, the
# rescue endpoint bails and this script retries.
set -u
host="${1:-}"
tcp_port="${2:-9034}"
shift 2 2>/dev/null || shift $# 2>/dev/null
if [[ -z "$host" ]]; then
echo "usage: $0 <host> [tcp_port] [--no-erase] [--no-disable-ach]" >&2
exit 2
fi
disable_ach="true"
erase="true"
for arg in "$@"; do
case "$arg" in
--no-erase) erase="false" ;;
--no-disable-ach) disable_ach="false" ;;
*) echo "unknown flag: $arg" >&2; exit 2 ;;
esac
done
base="${SFM_BASE_URL:-http://localhost:8200}"
max_attempts="${MAX_ATTEMPTS:-600}"
sleep_s="${SLEEP_S:-1}"
max_time_s="${MAX_TIME_S:-60}"
connect_timeout="${CONNECT_TIMEOUT:-5}"
recv_timeout="${RECV_TIMEOUT:-5}"
url="${base}/device/rescue?host=${host}&tcp_port=${tcp_port}&disable_ach=${disable_ach}&erase=${erase}&connect_timeout=${connect_timeout}&recv_timeout=${recv_timeout}"
echo "rescue: target ${host}:${tcp_port} disable_ach=${disable_ach} erase=${erase}"
echo "rescue: connect_timeout=${connect_timeout}s recv_timeout=${recv_timeout}s"
echo "rescue: POST ${url}"
echo "rescue: up to ${max_attempts} attempts, ${sleep_s}s between, ${max_time_s}s per request"
echo
started=$(date +%s)
for ((i=1; i<=max_attempts; i++)); do
printf "[%3d] %s " "$i" "$(date +%H:%M:%S)"
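# curl still prints "000" via -w when the request fails, so appending
# another "000" on failure would yield "000000" and miss the 000) case.
# Swallow the exit status and default only when the output is empty.
http_code=$(curl -sS -o /tmp/rescue_resp.$$ -w "%{http_code}" \
--max-time "$max_time_s" \
-X POST "$url" || true)
http_code="${http_code:-000}"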
body=$(cat /tmp/rescue_resp.$$ 2>/dev/null || true)
rm -f /tmp/rescue_resp.$$
case "$http_code" in
200|201)
elapsed=$(( $(date +%s) - started ))
echo "OK (${elapsed}s total)"
echo "$body"
exit 0
;;
503)
# Connection refused / timeout — device busy in another session. Retry fast.
echo "busy (503)"
;;
000)
echo "curl error (network)"
;;
*)
echo "HTTP $http_code"
echo " $body" | head -c 400
echo
;;
esac
sleep "$sleep_s"
done
echo "rescue: gave up after ${max_attempts} attempts" >&2
exit 1
@@ -0,0 +1,44 @@
#!/usr/bin/env bash
# Hold a single TCP session open and drip stop-monitoring frames at a slow
# rate, so the device's UART RX FIFO has time to drain between sends.
#
# Use when high-rate spam isn't landing — typically because the device's
# firmware is too busy to drain its serial buffer fast enough and bytes
# are being lost to UART overrun.
#
# Usage:
# ./slow_drip.sh <host> [tcp_port] [duration_s]
#
# Env:
# DURATION Default: 120 (seconds; arg 3 overrides). Clamped 1..600.
# INTERVAL Seconds between drip sends (default 3). Lower = more
# aggressive, more risk of FIFO overrun. Higher = safer
# but fewer total drips per duration.
# CONNECT_TIMEOUT Default: 5
# SFM_BASE_URL Default: http://localhost:8200 (SFM direct).
set -u
host="${1:-}"
tcp_port="${2:-9034}"
duration="${3:-${DURATION:-120}}"
if [[ -z "$host" ]]; then
echo "usage: $0 <host> [tcp_port] [duration_s]" >&2
exit 2
fi
base="${SFM_BASE_URL:-http://localhost:8200}"
interval="${INTERVAL:-3}"
connect_timeout="${CONNECT_TIMEOUT:-5}"
url="${base}/device/stop_monitoring_slow_drip?host=${host}&tcp_port=${tcp_port}&duration_s=${duration}&interval_s=${interval}&connect_timeout=${connect_timeout}"
echo "slow_drip: target ${host}:${tcp_port} duration=${duration}s interval=${interval}s connect_timeout=${connect_timeout}s"
echo "slow_drip: POST ${url}"
echo
# Give curl enough slack to wait out the duration plus a buffer
max_time=$(awk -v d="$duration" 'BEGIN { printf "%d", d + 30 }')
curl -sS --max-time "$max_time" -X POST "$url"
echo
