seismo-relay/docs/idf_protocol_reference.md
serversdown ecc935482b seismo-relay v0.19.0 — device-family separation + micromate/ package
Tighten the Series III / Series IV boundary so UI and storage dispatch
on a clean signal instead of sniffing filenames or applying magnitude
heuristics.

Phase 1 — events.device_family column ("series3" | "series4"):
  self-applying migration with filename-based backfill of existing rows
  (1,132 backfilled on prod 2026-05-20); plumbed through every import
  path (BW endpoint, IDF endpoint, ACH server, BW CLI, sidecar
  backfill); UPSERT preserves via COALESCE; UI dispatches on it.

Phase 2 — extract micromate/ package alongside minimateplus/:
  native IdfEvent / IdfReport / IdfPeaks / IdfProjectInfo /
  IdfSensorCheck (mic in dB(L), not pseudo-psi); moved
  idf_ascii_report.py from sfm/ to micromate/; refactored
  save_imported_idf to use IdfEvent and bridge to minimateplus.Event at
  the SQL-insert boundary; idf_file.py stub for the future binary codec.

Phase 3 prep — docs/idf_protocol_reference.md captures the two
observed Thor binary header signatures (1,012 newer-firmware files vs
2 old files whose layout is byte-for-byte BW-STRT-compatible), file-size
hints suggesting int8 sample encoding, open questions in dependency
order, and a concrete first-session plan for cracking the codec.

Also rolled in the v0.18.1 hotfixes that motivated this work:
  - idf_ascii_report parser now handles "<0.005 in/s" (below-threshold)
    and "N/A" markers without leaving raw strings in numeric DB columns.
  - sfm_webapp.html: defensive _ppvFmt / mic formatter so future
    data-shape drift can't kill the whole events table render.

All 1,014 example-data sidecars round-trip through the new package.
See CHANGELOG.md for full notes.
2026-05-20 15:19:49 +00:00


IDF Protocol Reference — Thor / Micromate Series IV

Starting-point reference for reverse-engineering Instantel's Micromate Series IV event-file format. Sibling to instantel_protocol_reference.md (the Series III "Rosetta Stone") — this doc holds what we know so far and the open questions still to crack.

Status (2026-05-20): ASCII text sidecar fully decoded (1,014 sample files round-trip). Binary .IDFH / .IDFW codec not yet implemented — binaries are stored opaquely by WaveformStore.save_imported_idf, with metadata sourced from the paired .txt sidecar.


File model

Filename convention

<SERIAL>_<YYYYMMDDHHMMSS>.<KIND>
  • SERIAL — literal device serial, two-letter prefix + numeric suffix. Examples seen: UM11719, UM13981, UM20147, BE9439. Unlike Series III BW filenames (M529LK44.AB0, base-36 stem), Series IV filenames carry the serial in plain text.
  • YYYYMMDDHHMMSS — 14-char ASCII timestamp in device local time (no timezone marker).
  • KIND: IDFH for histograms, IDFW for waveforms.

The .IDFH.txt / .IDFW.txt ASCII sidecar lives in a TXT/ subfolder of the unit's directory, not alongside the binary. This pairing convention is encoded in event_forwarder.idf_report_path().

Directory layout

C:\THORDATA\
└── <Project>\
    └── <UM####>\                  ← unit serial dir
        ├── UM12345_20260520100000.MLG     ← monitor log (not events)
        ├── UM12345_20260520100000.IDFH    ← histogram event (binary)
        ├── UM12345_20260520100000.IDFW    ← waveform event (binary)
        ├── UM12345_20260520100000.IDFW.CDB ← cache-DB variant (skip)
        ├── TXT\
        │   ├── UM12345_20260520100000.IDFH.txt    ← histogram ASCII sidecar
        │   └── UM12345_20260520100000.IDFW.txt    ← waveform  ASCII sidecar
        ├── CSV\, HTML\, PDF\, XML\        ← operator-facing derived exports
        └── ...

The .IDFW.CDB files share the binary's basename but appear to be a separate cache/database variant. Their first 8 bytes match the old-firmware Thor signature (see below) regardless of which signature the paired .IDFW uses. Purpose unknown; sizes vary wildly (observed 123 B → 40,491 B). Thor-watcher's forwarder deliberately skips them.

Sample corpus

The thor-watcher/example-data/THORDATA_example/ tree carries 1,014 paired .IDFW / .IDFH + .txt files spanning 2020–2023 across nine units (UM11719, UM13981, UM20147, …, plus BE9439 from 2020). This is the reverse-engineering ground truth.


ASCII sidecar (.IDFW.txt / .IDFH.txt) — fully decoded

Shape: plain text, one "Key : Value" line per metadata field, followed for waveforms by a tab-separated sample table headed by the literal line Waveform Data Channels. Parsed by micromate/idf_ascii_report.py. See micromate/models.py for the typed IdfReport shape.

Notable conventions

  • Units are native to Thor — geophone in in/s, microphone in dB(L) (not psi like Series III BW reports), frequency in Hz, acceleration in g, displacement in in.
  • Below-threshold readings appear as the literal string <0.005 in/s (155 occurrences in the sample corpus) — the parser strips the < and treats the numeric remainder as the value.
  • Out-of-range / not-measured values appear as N/A — parser drops the field rather than letting the string leak into a numeric column.
  • Firmware string observed: Micromate ISEE 11.0AK.
  • TitleString1..4 are operator-defined free-text slots; Thor's default labels map them to Location / Client / Company / Notes, which the parser surfaces as project / client / operator / notes.
  • Histogram sidecars use HistogramStartDate / HistogramStartTime in place of waveform's EventDate / EventTime. Parser falls through to either.
  • Histogram tabular block lacks the Waveform Data Channels marker; instead it's a multi-line column header followed by per-interval rows (<date> <time> <tran-ppv> <freq> ...). Parser silently ignores lines after the metadata block since they lack a colon-separated key : value shape (the timestamps DO contain colons but produce garbage keys that don't collide with any recognised field).
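The key/value, below-threshold and N/A conventions listed above might be handled along these lines. This is a hedged sketch and not the actual parser in micromate/idf_ascii_report.py; split_field and parse_reading are hypothetical names, and the assumption that every numeric value is a number optionally followed by a unit comes only from the examples in this document.

```python
from typing import Optional, Tuple


def split_field(line: str) -> Optional[Tuple[str, str]]:
    """Split a 'Key : Value' line on the FIRST colon only.

    Values such as times contain colons themselves, so everything after
    the first colon is kept as the value.
    """
    key, sep, value = line.partition(":")
    if not sep or not key.strip():
        return None  # not a metadata line
    return key.strip(), value.strip()


def parse_reading(raw: str) -> Optional[float]:
    """Return the numeric part of a sidecar value, or None for N/A."""
    text = raw.strip()
    if not text or text.upper() == "N/A":
        return None  # caller drops the field instead of storing a string
    parts = text.lstrip("<").split()  # strip '<', then separate number from unit
    if not parts:
        return None
    return float(parts[0])
```

Returning None for N/A lets the caller omit the field entirely, which matches the rule that no raw string should reach a numeric column.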

Binary header signatures (observed)

Hex dump of the first 32 bytes across 1,014 sample files reveals two distinct file signatures, both anchored by the literal ASCII string "Instantel" at offsets 6–14, with a NUL byte on either side (offsets 5 and 15):

Signature A — newer firmware (1,012 files, 99.8% of corpus)

00000000: 0012 0100 0000 496e 7374 616e 7465 6c00   ......Instantel.
00000010: 0000 a695 002e b500 4f70 6572 6174 6f72   ........Operator
                                ^^^^^^^^^^^^^^^^
                                operator/title string starts at 0x18

Header bytes 0–5: 00 12 01 00 00 00. These are followed immediately by the NUL-terminated ASCII tag Instantel (10 bytes including the terminator, offsets 0x06–0x0F), then 8 unknown bytes (0x10–0x17), then ASCII operator-supplied strings (Operator name, etc.) starting at 0x18 and continuing through the project / client / title strings. No STRT record observed in this layout.

Signature B — older firmware (2 files: BE9439 from 2020)

00000000: 1000 0180 0000 496e 7374 616e 7465 6c00   ......Instantel.
00000010: 072c 0012 0300 5354 5254 fffe 0111 2340   .,....STRT....#@
                          ^^^^^^^^^                ^^^^^^^^^
                          STRT magic               4-byte end_key
00000020: 0111 0000 2e5f 00ac 4600 0000 0200 0000   ....._..F.......
          ^^^^^^^^^             ^^^
          4-byte start_key      0x46 (BW WAVEHDR record-type marker)

Header bytes 0–5: 10 00 01 80 00 00. The structure after the Instantel magic is byte-for-byte identical to a BW SUB 5A probe-response STRT record as documented in instantel_protocol_reference.md → "SUB 5A — STRT record encodes end_offset". Specifically:

Offset  Bytes        Meaning (per BW reference)
0x16    53 54 52 54  STRT magic
0x1A    ff fe        STRT sentinel
0x1C    01 11 23 40  end_key (4 bytes)
0x20    01 11 00 00  start_key (4 bytes)
0x28    46           0x46 waveform-record type marker

Offsets are file offsets read directly from the hex dump above.
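As a quick sanity check, the STRT fields can be pulled out of the signature-B bytes shown in the dump. The sketch below is illustrative only: locate_strt_fields is a hypothetical helper, it finds the record by searching for the STRT magic rather than trusting an absolute offset, and it keeps the keys as raw bytes because their endianness is not established in this document.

```python
def locate_strt_fields(header: bytes) -> dict:
    """Slice the STRT record fields relative to the 'STRT' magic."""
    idx = header.find(b"STRT")
    if idx < 0:
        raise ValueError("no STRT magic in header")
    return {
        "strt_offset": idx,
        "sentinel": header[idx + 4 : idx + 6],     # expected ff fe
        "end_key": header[idx + 6 : idx + 10],     # raw bytes, endianness unknown
        "start_key": header[idx + 10 : idx + 14],  # raw bytes, endianness unknown
        "record_type": header[idx + 18],           # 0x46 in the sample
    }


# First 48 bytes of a signature-B file, copied from the hex dump above.
SIG_B_HEAD = bytes.fromhex(
    "1000 0180 0000 496e 7374 616e 7465 6c00"
    "072c 0012 0300 5354 5254 fffe 0111 2340"
    "0111 0000 2e5f 00ac 4600 0000 0200 0000"
)

fields = locate_strt_fields(SIG_B_HEAD)
print({k: (v.hex() if isinstance(v, bytes) else hex(v)) for k, v in fields.items()})
```

Searching for the magic keeps the helper usable even if the prefix before STRT turns out to vary between files.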

Hypothesis: Older Micromate firmware writes a wrapped BW-format event into the .IDFW file — essentially the same on-disk shape as a Series III device, with the new filename convention applied at export time. Newer firmware (signature A) abandoned the BW-compatible layout for an Instantel-specific format.

If that hypothesis holds, the 2 signature-B files can already be parsed via minimateplus/event_file_io.read_blastware_file() — worth testing. The 1,012 signature-A files are the real reverse-engineering target.

.IDFW.CDB cache files

Always carry signature B (10 00 01 80 ...), even when the paired .IDFW carries signature A. Plausible explanation: the CDB is an internal Thor cache-database export that retains the legacy BW-style record layout regardless of the user-facing .IDFW format version. Not currently consumed by the forwarder.
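A header classifier following the two signatures above could look like this. It is a sketch under the assumption that the six leading bytes plus the Instantel tag are sufficient to tell the layouts apart; classify_header is a hypothetical name, not an existing function in micromate/ or thor-watcher.

```python
SIG_A = bytes.fromhex("001201000000")  # newer firmware
SIG_B = bytes.fromhex("100001800000")  # older firmware and .IDFW.CDB files
MAGIC = b"Instantel\x00"               # occupies offsets 6..15


def classify_header(head: bytes) -> str:
    """Return 'A', 'B' or 'unknown' from the first 16 bytes of a file."""
    if head[6:16] != MAGIC:
        return "unknown"
    if head[:6] == SIG_A:
        return "A"
    if head[:6] == SIG_B:
        return "B"
    return "unknown"
```

Only the first 16 bytes are needed, so a corpus survey can read a small prefix of each file instead of loading it whole.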


File-size patterns (Signature A, the main target)

Survey of 1,012 signature-A files:

Event type    Typical size      Source of variance
.IDFW 2-sec   9,200–10,500 B    Operator-supplied strings (TitleString1..4) of varying length
.IDFH         2,944–4,076 B     Histogram interval count (record duration / interval)

Naive arithmetic for 2-sec waveform:

  • 4 channels × 2 sec × 1024 sps = 8,192 samples
  • At 2 bytes/sample (int16) = 16,384 sample bytes → file would be > 16 KB
  • Observed: ~9–10 KB
  • → samples are likely 1 byte each (int8 quantised), or stored with bit-packing / delta encoding, or only one channel's full-rate samples are stored with the others reconstructed arithmetically. Verifying this is the first RE milestone.

Project-string length variance (~1 KB across the corpus) is consistent with the file carrying a single copy of each TitleString1..4 plus operator + setup-name as null-padded ASCII regions.
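The size arithmetic above can be restated as a small calculation that shows how many bytes each candidate sample width would leave for the header and strings. The constants are the figures quoted in this section; header_budget is a hypothetical helper name, and the calculation assumes uncompressed, fixed-width samples.

```python
CHANNELS = 4
SECONDS = 2
SAMPLE_RATE = 1024                # samples per second per channel
OBSERVED_SIZES = (9_200, 10_500)  # typical .IDFW 2-sec range, in bytes


def header_budget(bytes_per_sample: int) -> tuple:
    """Bytes left over for header and strings at the low and high observed sizes."""
    payload = CHANNELS * SECONDS * SAMPLE_RATE * bytes_per_sample
    return tuple(size - payload for size in OBSERVED_SIZES)


for width in (1, 2):
    low, high = header_budget(width)
    print(f"{width} byte(s)/sample leaves {low} to {high} bytes for header and strings")
```

With 1 byte per sample the leftover is 1,008 to 2,308 bytes, a plausible header-plus-strings budget; with 2 bytes per sample it is negative, so plain int16 storage cannot fit.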


Open questions

The reverse-engineering targets, roughly in dependency order:

  1. Sample encoding (signature A) — int8? int16 LE/BE? Bit-packed? Delta-coded? Per-channel interleaved or sequential blocks?
  2. Header field layout (signature A) — where do sample_rate, record_time, channel count, and per-channel peaks live in the binary? The ASCII sidecar gives the device-authoritative values, so binary fields can be confirmed by diff.
  3. Operator-string offsets: Operator at 0x18 is the first visible string in signature-A files; the rest (project, client, notes, setup) follow. Need to map exact offsets and null-padding conventions.
  4. Signature-B → BW codec compatibility — does minimateplus/event_file_io.read_blastware_file() actually parse the 2 BE9439 signature-B files as-is? If yes, the OLD-format ingest is free.
  5. .IDFW.CDB purpose — is it an internal Thor cache, a ring-buffer dump, or something else? Worth a single small effort to characterise so we know what we're skipping.
  6. Footer / checksum — every BW event file has a footer; does IDF? Where does the per-channel sample block end?

Reverse-engineering playbook (when we start)

The Series III BW codec took ~2 months of MITM wire captures because we didn't have ground-truth metadata. Thor's situation is substantially better:

  • Ground truth is on disk. Every binary in example-data/ has a paired .IDFW.txt carrying the full decoded sample table (Waveform Data Channels block — see any sample file in thor-watcher/example-data/.../TXT/). Aligning binary bytes to the table's float-per-row values gives an immediate per-byte hypothesis test.
  • Cross-event diffing. 1,012 signature-A samples from 9 units spanning 4 years means any field that varies between events is immediately localisable. Fields that are constant across all files (firmware ID, channel labels, format-version word) are also immediately localisable by complementary search.
  • No protocol surface. Files at rest, not a wire dialect. No DLE stuffing, no inner-frame parsing, no probe/data two-step.
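Cross-event diffing as described above can be prototyped in a few lines. This is a hypothetical sketch (the names constant_offsets and varying_offsets are illustrative), and it assumes fields sit at fixed offsets across files, which variable-length operator strings may break beyond the first few dozen bytes.

```python
from typing import List, Sequence


def constant_offsets(blobs: Sequence[bytes]) -> List[int]:
    """Offsets whose byte value is identical in every file.

    Only the prefix common to all files (the shortest length) is compared.
    """
    if not blobs:
        return []
    limit = min(len(b) for b in blobs)
    first = blobs[0]
    return [i for i in range(limit) if all(b[i] == first[i] for b in blobs)]


def varying_offsets(blobs: Sequence[bytes]) -> List[int]:
    """Complement of constant_offsets over the common prefix."""
    if not blobs:
        return []
    limit = min(len(b) for b in blobs)
    fixed = set(constant_offsets(blobs))
    return [i for i in range(limit) if i not in fixed]
```

Constant offsets are candidates for format-version words and channel labels; varying offsets are candidates for timestamps, peaks and sample data, to be confirmed against the sidecar values.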

Suggested first session (2-4 hours): hand-decode UM11719_20231219162723.IDFW (10,290 bytes) against its TXT/UM11719_20231219162723.IDFW.txt sample table (the 2-sec waveform at 1024 sps × 4 channels = 8,192 sample values). Find the first per-channel sample value (0.0003 in the Tran column at t=0) in the binary. Locating it confirms the sample encoding; everything else flows from there.


Code seams ready to receive the codec

When the codec lands, it goes into micromate/idf_file.py (currently a stub raising NotImplementedError). Public API:

from pathlib import Path

from micromate import IdfEvent
from micromate.idf_file import read_idf_file

# read_idf_file is currently a stub that raises NotImplementedError.
event: IdfEvent = read_idf_file(Path("UM11719_20231219163444.IDFW"))
# event.peaks.transverse_ips, event.timestamp, event.raw_samples, ...

The ingest pipeline (WaveformStore.save_imported_idf) currently builds the IdfEvent from the .txt parser only. Once read_idf_file() works, the binary becomes authoritative; the .txt parser drops to fast-path metadata cross-check. Operators who don't enable Thor's TXT exporter still get fully populated events.


See also