feat(import): v0.16.0 - Fully implemented series 3 BW-ACH pipeline stabilized. #19
This branch has a fully working BW-ACH to SFM ingestion pipeline. It uses series3-watcher, which pushes events and their corresponding ASCII files to SFM. SFM then writes them to the SQLite DB and generates all the sidecar files needed for event standardization.
Blastware's ACH writes a per-event ASCII report (.TXT) alongside each event binary, containing the rich derived per-channel fields BW computes (PPV, ZC Freq, Time of Peak, Peak Acceleration, Peak Displacement, Peak Vector Sum + time, sensor self-check Pass/Fail, monitor-log timestamps). None of this lives in the BW binary itself.

When the watcher daemon forwards both files to /db/import/blastware_file in one multipart POST, we now:

- Pair binaries with their .TXT partners by filename match
- Parse the report into a structured BwAsciiReport
- Land the rich fields in a new top-level `bw_report` block of the sidecar JSON
- Overlay the report's peaks/project_info/timestamp/sample_rate/record_time/total_samples/pretrig_samples onto the canonical sidecar fields (the report values are device-authoritative; the BW-binary STRT-derived values had bugs like reading the 0x46 record-type marker as rectime)

This unblocks the monthly-summary review workflow (events become sortable/filterable by peak, location, project, etc.) without depending on the still-undecoded waveform body codec.

Blastware writes the operator-supplied fields with different label spellings across firmware versions and recording modes, most notably "Seis. Location" on histogram exports vs "Seis Loc:" on waveform exports. Previous parser only matched the latter, so every histogram event silently lost its sensor_location field.

Replace the four hardcoded `key.rstrip(":") == "X"` branches with a single `_OPERATOR_LABEL_MAP` dispatch table keyed by normalised label (lowercase, trailing colon/period stripped, internal whitespace collapsed). Adds these variants on day 1:

  project:         "Project:" / "Project"
  client:          "Client:" / "Client"
  operator:        "User Name:" / "User Name"
  sensor_location: "Seis Loc:" / "Seis. Location" / "Seis Location" / "Sensor Location" / "Seis Loc"

To absorb future BW label drift, add a one-line dict entry; no new elif branch is needed.
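The label dispatch described above could look roughly like the sketch below. It is an illustration only: the table contents are reconstructed from the variants listed in this message, and `normalise_label` / `field_for_label` are hypothetical helper names, not necessarily the ones used in the parser.

```python
import re

# Reconstructed from the variants listed in the commit message; the real
# table may contain more entries. Keys are already-normalised labels.
_OPERATOR_LABEL_MAP = {
    "project": "project",
    "client": "client",
    "user name": "operator",
    "seis loc": "sensor_location",
    "seis. location": "sensor_location",
    "seis location": "sensor_location",
    "sensor location": "sensor_location",
}


def normalise_label(raw):
    """Lowercase, collapse internal whitespace, strip trailing colon/period."""
    label = re.sub(r"\s+", " ", raw.strip()).lower()
    return label.rstrip(":.").strip()


def field_for_label(raw):
    """Return the Event field name for a report label, or None if unknown."""
    return _OPERATOR_LABEL_MAP.get(normalise_label(raw))
```

With this shape, supporting a new spelling means adding one key to the dict; unknown labels fall through to None instead of raising.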
14 new tests cover:

- Each label variant routes to the correct field (parametrised)
- Case-insensitive matching ("seis loc" / "SEIS LOC" / "SeIs LoC")
- Whitespace-collapse ("Seis  Loc" with double-space)
- End-to-end parse of a real histogram fixture from example-events/histogram/: sensor_location ('Loc #1 - 2652 Hepner...') populates correctly even though the file uses "Seis. Location"

Total bw_ascii_report tests: 19 → 33. Full SFM suite still green (69 passed, 44 skipped; the skips are pre-existing, for h5py-dep tests).

Pairs with series3-watcher v1.5.4 (which fixes the filename pairing so histograms actually reach this parser in the first place).

The /db/import/blastware_file endpoint was bucketing every forwarded event into serial='UNKNOWN' in the DB. WaveformStore correctly decoded the serial from the BW filename and saved files to <store>/<serial>/<filename> (e.g. .../BE17353/S353L5KC.DR0H.h5), but the endpoint code called db.insert_events(serial=_serial_from_event(ev)), and _serial_from_event was a stub that always returned None, falling back to "UNKNOWN".

Effect on the user's prod server: 3,039 events forwarded across 24 distinct units, ALL inserted under serial='UNKNOWN'. The on-disk waveform store + sidecars + HDF5s were fine, but the SFM webapp's /db/units only showed the two original manually-uploaded serials because every forwarded row had its serial column zeroed to UNKNOWN.

Fix:

- WaveformStore.save_imported_bw() now surfaces the decoded serial on the returned `rec` dict (rec["serial"]).
- The import endpoint uses rec["serial"] as the authoritative fallback when the operator hasn't supplied a serial_hint query parameter. Order of precedence: query string `serial` → rec["serial"] → _serial_from_event(ev) → "UNKNOWN"
- Response payload now includes `serial` per file so the watcher log lines (or any future caller) can see which unit each event was attributed to.
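The precedence order in the fix above amounts to a first-non-empty lookup. A minimal sketch, assuming a hypothetical `resolve_serial` helper whose third argument stands in for whatever `_serial_from_event(ev)` returns (the actual endpoint code may be structured differently):

```python
def resolve_serial(query_serial, rec, event_serial):
    """Pick the serial for an imported event.

    Precedence: explicit query-string serial, then the serial decoded by
    the waveform store (rec["serial"]), then the event-derived serial,
    then the "UNKNOWN" placeholder.
    """
    for candidate in (query_serial, rec.get("serial"), event_serial):
        if candidate:  # skips None and empty strings
            return candidate
    return "UNKNOWN"
```

Before the fix, only the last two steps of this chain effectively existed, and the event-derived step always yielded None.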
Recovery for existing DB rows: scripts/repair_unknown_serials.py walks the events table looking for rows with serial='UNKNOWN' and re-attributes each one to the serial decoded from blastware_filename. Updates the row in place unless the target (serial, timestamp) already has a row, in which case the UNKNOWN duplicate is deleted. Idempotent. Default dry-run; pass --apply to commit.

Verified on the user's actual DB (dry-run):

  UNKNOWN rows scanned: 3039
  Updated to real serial: 2602
  Deleted (duplicate of an already-correct row): 437
  Unresolved (bad filename): 0

After running the repair, /db/units will show all 24 units correctly populated.

Previous query_units() only joined on ach_sessions, which is created exclusively by the live ACH server. The BW-importer path (/db/import/blastware_file → WaveformStore.save_imported_bw → SeismoDb.insert_events) populates `events` but never creates an ach_sessions row.

Consequence: every serial whose events flowed in through the series3-watcher forwarder was invisible to /db/units (and therefore to the SFM webapp's fleet overview / units list), even though the events were correctly populated in the events table with proper serial attribution.

Rewrite query_units() to aggregate from BOTH tables and union the serials:

- total_events / last_event_at come from `events` (every ingest path)
- last_session_at / total_monitor_entries / total_sessions come from `ach_sessions` (ACH-only), 0 when no sessions exist for the serial
- last_seen = max(last_event_at, last_session_at)

Verified on the user's actual prod DB after the repair_unknown_serials run: /db/units now returns 24 serials instead of 2. All 3,257 watcher-forwarded events become visible in the fleet overview without any further DB surgery.

Two compounding bugs caused forwarded events to land in the DB with broken-codec peak values (~10 in/s saturation on every channel) and no project info, even when the watcher correctly paired a BW ASCII report with the binary.
Bug 1: save_imported_bw built the sidecar JSON with the report's authoritative peak / project values via event_to_sidecar_dict(bw_report=...), but never overlaid those onto the in-memory Event that flows to db.insert_events(). So the DB row got peak_values from read_blastware_file()._peaks_from_samples(), which runs the still-undecoded waveform body codec assuming raw int16 LE and produces ±32K-shaped noise (= ±10 in/s at Normal range) regardless of the actual signal. The sidecar JSON had the truth but the DB columns (which the webapp queries for fast filter/sort) lied.

Bug 2: insert_events' IntegrityError handler only refreshed the filename/filesize/a5_pickle/sidecar columns when a duplicate (serial, timestamp) was seen. Peak values, project info, sample_rate, record_type stayed locked in at whatever the FIRST insert wrote. So even after Bug 1 was fixed, the historical events in the DB (already inserted with broken-codec peaks) would never get their values corrected, because a re-forward would just hit IntegrityError and skip the field refresh.

Fix 1 (minimateplus/event_file_io.py + sfm/waveform_store.py):

- New apply_report_to_event(event, report) helper folds the BW report's device-authoritative fields onto the Event in-place: per-channel PPV, peak vector sum, mic PSPL→psi, project / client / operator / sensor_location, sample_rate, record_time.
- save_imported_bw() calls the helper right after parsing the report. The Event that flows to insert_events() now carries correct values.

Fix 2 (sfm/database.py):

- insert_events()'s IntegrityError UPDATE now refreshes every device-authoritative column from the new data: tran_ppv, vert_ppv, long_ppv, peak_vector_sum, mic_ppv, project, client, operator, sensor_location, sample_rate, record_type, plus the existing filename/filesize/a5_pickle/sidecar fields.
- Preserves: id, waveform_key, session_id, created_at (immutable / FK fields), and false_trigger (operator review state).
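The refresh-on-duplicate behaviour from Fix 2 can be illustrated with a small sqlite3 sketch. The table below is a cut-down, hypothetical schema with only a few of the columns named above, and `make_db` / `insert_or_refresh` are illustrative names; the real insert_events() in sfm/database.py handles more columns and may differ in detail.

```python
import sqlite3

# Subset of the device-authoritative columns, for illustration only.
REFRESHED = ("tran_ppv", "vert_ppv", "long_ppv", "sensor_location")


def make_db():
    """Create an in-memory DB with a simplified events table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE events (
               id INTEGER PRIMARY KEY,
               serial TEXT NOT NULL,
               timestamp TEXT NOT NULL,
               tran_ppv REAL, vert_ppv REAL, long_ppv REAL,
               sensor_location TEXT,
               false_trigger INTEGER NOT NULL DEFAULT 0,
               UNIQUE (serial, timestamp))"""
    )
    return conn


def insert_or_refresh(conn, row):
    """Insert a row; on a duplicate (serial, timestamp), refresh only the
    device-authoritative columns and leave false_trigger and id alone."""
    cols = ("serial", "timestamp") + REFRESHED
    placeholders = ", ".join("?" for _ in cols)
    try:
        conn.execute(
            f"INSERT INTO events ({', '.join(cols)}) VALUES ({placeholders})",
            [row.get(c) for c in cols],
        )
        return "inserted"
    except sqlite3.IntegrityError:
        assignments = ", ".join(f"{c} = ?" for c in REFRESHED)
        conn.execute(
            f"UPDATE events SET {assignments} WHERE serial = ? AND timestamp = ?",
            [row.get(c) for c in REFRESHED] + [row["serial"], row["timestamp"]],
        )
        return "refreshed"
```

Because false_trigger is never named in the UPDATE, an operator's review flag survives any number of re-imports, while stale peak values are overwritten by the newer data.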
End-to-end simulation verified:

- Step 1: import without report → DB has ±10 in/s peaks, no project
- Step 2: re-import WITH report → upsert path fires, DB now has device-authoritative 0.005 in/s peaks + sensor_location
- Step 3: operator sets false_trigger=1, re-import again → flag preserved, peaks remain correct

For the user's situation: deleting the watcher state file forces a re-forward of all events. Each re-forward now pairs with its _ASCII.TXT, applies the report onto the Event, and the upsert refreshes the DB row. No DB nuke needed.

Full SFM suite: 62 passed, 44 skipped.

The series3-watcher v1.5.0 fix taught the WATCHER to look for BW ACH's _ASCII.TXT report alongside each binary. But the SFM SERVER's import endpoint only knew about the legacy <binary>.TXT naming when building its TXT lookup table.

Effect: even though the watcher correctly shipped both files in the multipart POST (and logged "+ <name>_ASCII.TXT attached"), the server's reports dict was keyed on the wrong name, so report_bytes resolved to None for every event. Without the report, save_imported_bw fell back to broken-codec peak values and no project info: exactly the same symptom as before the watcher fix landed, just for a different reason.

Fix: when stripping the ".TXT" suffix, also recognise the "_ASCII" trailer and reconstruct the binary's filename by converting the last "_" back to ".". Register the report under BOTH possible binary names so the subsequent lookup matches whichever convention the operator's BW installation uses.

  ACH convention (Blastware ACH):
    binary T003L2G6.0E0H + report T003L2G6_0E0H_ASCII.TXT ✅
  Manual export (operator clicks Save As Text in BW):
    binary M529LK44.AB0 + report M529LK44.AB0.TXT ✅
  Both for same event (e.g. ACH + operator manual save):
    register under both names; binary lookup wins ✅

Smoke-tested against the four real fixture filenames in the project archive. Full SFM suite still 62 pass.
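The report-name handling described in the fix above might be sketched as follows. `binary_names_for_report` is a hypothetical helper, and treating "both possible binary names" as the plain .TXT-stripped name plus the reconstructed ACH name is an interpretation of the message, not confirmed against the endpoint code.

```python
def binary_names_for_report(report_name):
    """Return the binary filenames a report could belong to.

    Legacy convention:  <binary>.TXT          -> <binary>
    ACH convention:     <stem>_<ext>_ASCII.TXT -> <stem>.<ext>
    """
    if not report_name.upper().endswith(".TXT"):
        return []
    stem = report_name[:-len(".TXT")]
    names = [stem]  # legacy: the report name minus its .TXT suffix
    if stem.upper().endswith("_ASCII"):
        base = stem[:-len("_ASCII")]
        head, sep, tail = base.rpartition("_")
        if sep:
            # Convert the last "_" back to the "." of the binary's extension.
            names.append(f"{head}.{tail}")
    return names
```

A lookup table keyed on every returned name lets the server find the report whether the binary arrived as T003L2G6.0E0H or under the legacy naming.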
For the user's situation: pull, restart, and the NEXT re-forward pass (after deleting watcher state file again if needed) will hit this code path, parse the report correctly, apply the overlay onto the Event, and the upsert path will land authoritative peak values + project info in the DB.

Consolidates everything that was floating in chat-only "parking lot" status into the README's Roadmap (Future) section:

High-impact (unblocks product features):

- Waveform body codec reverse-engineering
- In-app waveform viewer accuracy (depends on codec)
- Terra-view integration
- Vibration summary reports

BW ASCII report parser enhancements:

- Histogram-specific structural fields
- Histogram interval bin-table parsing
- ">100 Hz" value parsing

Ingestion gaps:

- MLG forwarding (watcher + SFM endpoint)
- 0C-record raw bytes persistence in sidecar

Operational:

- series3-watcher file archive manager
- Existing operational items (compliance encoder, modem manager, Call Home dial_string write, histogram mode 5A stream)

Test coverage + lower-priority cleanups.

CLAUDE.md "What's next" section now points to the README as the canonical deferred-work list, and keeps its own low-level technical status log for byte-layout details that don't belong in the roadmap.