Files
seismo-relay/docs/idf_protocol_reference.md

16 KiB
Raw Permalink Blame History

IDF Protocol Reference — Thor / Micromate Series IV

Starting-point reference for reverse-engineering Instantel's Micromate Series IV event-file format. Sibling to instantel_protocol_reference.md (the Series III "Rosetta Stone") — this doc holds what we know so far and the open questions still to crack.

Status (2026-05-28): ASCII text sidecar fully decoded (1,014 sample files round-trip). Thor IDFW binary now decodes via micromate.idf_file.read_idf_file() — reuses the BW segment-rotated block codec verbatim at fixed body offset 0x0f1f; metadata (serial, timestamp, sample_rate, record_time, calibration_date) extracted from the binary header. Sample fidelity is 8799% byte-exact on quiet events; loud events hit the BW codec's known walker-stops-early limitation. Residual ~3% drift on per-sample deltas (likely a Thor-specific 12-bit delta refinement not yet modelled).

Thor IDFH histograms also decoded. Body has one or more segments; each 12-byte segment header [length_be 2B][0a 00 00 00][00 NN][05 3f] introduces N = (length - 10) // 72 interval records of 72 bytes each. Each interval = 4 × 16-byte per-channel records: [int16 min][int16 max][int16 ??][uint16 halfp][2B 00][uint16 ??][2B 00][uint16 ??]. Geo peak = max(|min|, |max|) / 32768 × 10 in/s (matches sidecar ~1.8%); freq = 512 / halfp Hz (None for halfp ≤ 5 → ">100" sentinel). Corpus: all 859 Thor IDFH files decode, 181,071 intervals. Wired through read_idf_file()save_imported_idf() → sidecar's extensions.idf_intervals.

Note on the BE9439 outliers in the example corpus: Two files (BE9439_20200713131747.IDFW and BE9439_20200713124251.IDFH) are Series III Blastware binaries, not Thor. Provenance: TMI tried to use Thor to manage auto-call-homes for Series III units; the experiment didn't work out, but it did leave a few BW event files in Thor's per-serial directory structure with .IDFW/.IDFH extensions — Thor's forwarder applied its own naming convention to the BW bodies it was relaying. Their header 10 00 01 80 00 00 Instantel STRT ff fe <end_key> <start_key> is the BW SUB 5A STRT record, not a Thor body preamble. The reader detects them by signature and raises NotImplementedError pointing callers at read_blastware_file(), which extracts BW-format peaks from them.

Still NYI for Thor IDFH: per-channel int16 field4 (possibly time-of-peak); the two uint16 fields (probably PVS contributions); 8-byte interval tail (PVS data); mic dB(L) exact conversion constant.

Codec breakthroughs (2026-05-28)

  • Body offset is a fixed 0x0f1f across 151/154 corpus IDFW files. Preceded by a 4-byte record-type marker (46 00 00 00)
    • magic preamble 00 02 00 [Tran[0] BE] [Tran[1] BE].
  • Sample stream is BW's segment-rotated block codec verbatim. Thor reuses 10 NN (nibble), 20 NN (int8), 00 NN (RLE), 30 NN (packed12), 40 02 (segment header) tags with the same semantics. Channel rotation Tran→Vert→Long→MicL.
  • Geo LSB = 0.0003 in/s (not BW's 0.005), because Thor's 16-bit ADC range maps to 10 in/s without the 16-count BW quantization step.
  • Mic ≈ 2.14×10⁻⁶ psi/count (rough scale; refine after channel block calibration constants are decoded).
  • BW compliance anchor \xbe\x80\x00\x00\x00\x00 reappears at IDFW offset 0x952 — sample_rate at anchor6 (uint16 BE), record_time at anchor+6 (float32 BE), same layout as BW.
  • Event timestamp at offset 0x97A — 8 bytes [day][month] [year_be][unk][hour][min][sec]. Stop-time mirrors at 0x982.
  • Serial as null-terminated ASCII at 0x14E.
  • Calibration date at 0x1940x197 (day, month, year_be).
  • Per-sample residual drift of ~3% suggests Thor encodes int8/nibble deltas with an extra refinement bit that BW doesn't carry — unsolved; errors resync within a few samples so cumulative impact is small.

File model

Filename convention

<SERIAL>_<YYYYMMDDHHMMSS>.<KIND>
  • SERIAL — literal device serial, two-letter prefix + numeric suffix. Examples seen: UM11719, UM13981, UM20147, BE9439. Unlike Series III BW filenames (M529LK44.AB0, base-36 stem), Series IV filenames carry the serial in plain text.
  • YYYYMMDDHHMMSS — 14-char ASCII timestamp in device local time (no timezone marker).
  • KINDIDFH for histograms, IDFW for waveforms.

The .IDFH.txt / .IDFW.txt ASCII sidecar lives in a TXT/ subfolder of the unit's directory, not alongside the binary. This pairing convention is encoded in event_forwarder.idf_report_path().

Directory layout

C:\THORDATA\
└── <Project>\
    └── <UM####>\                  ← unit serial dir
        ├── UM12345_20260520100000.MLG     ← monitor log (not events)
        ├── UM12345_20260520100000.IDFH    ← histogram event (binary)
        ├── UM12345_20260520100000.IDFW    ← waveform event (binary)
        ├── UM12345_20260520100000.IDFW.CDB ← cache-DB variant (skip)
        ├── TXT\
        │   ├── UM12345_20260520100000.IDFH.txt    ← histogram ASCII sidecar
        │   └── UM12345_20260520100000.IDFW.txt    ← waveform  ASCII sidecar
        ├── CSV\, HTML\, PDF\, XML\        ← operator-facing derived exports
        └── ...

The .IDFW.CDB files share the binary's basename but appear to be a separate cache/database variant. Their first 8 bytes match the old-firmware Thor signature (see below) regardless of which signature the paired .IDFW uses. Purpose unknown; sizes vary wildly (observed 123 B → 40,491 B). Thor-watcher's forwarder deliberately skips them.

Sample corpus

The thor-watcher/example-data/THORDATA_example/ tree carries 1,014 paired .IDFW / .IDFH + .txt files spanning 20202023 across nine units (UM11719, UM13981, UM20147, …, plus BE9439 from 2020). This is the reverse-engineering ground truth.


ASCII sidecar (.IDFW.txt / .IDFH.txt) — fully decoded

Shape: plain text, one "Key : Value" line per metadata field, followed for waveforms by a tab-separated sample table headed by the literal line Waveform Data Channels. Parsed by micromate/idf_ascii_report.py. See micromate/models.py for the typed IdfReport shape.

Notable conventions

  • Units are native to Thor — geophone in in/s, microphone in dB(L) (not psi like Series III BW reports), frequency in Hz, acceleration in g, displacement in in.
  • Below-threshold readings appear as the literal string <0.005 in/s (155 occurrences in the sample corpus) — the parser strips the < and treats the numeric remainder as the value.
  • Out-of-range / not-measured values appear as N/A — parser drops the field rather than letting the string leak into a numeric column.
  • Firmware string observed: Micromate ISEE 11.0AK.
  • TitleString1..4 are operator-defined free-text slots; Thor's default labels map them to Location / Client / Company / Notes, which the parser surfaces as project / client / operator / notes.
  • Histogram sidecars use HistogramStartDate / HistogramStartTime in place of waveform's EventDate / EventTime. Parser falls through to either.
  • Histogram tabular block lacks the Waveform Data Channels marker; instead it's a multi-line column header followed by per-interval rows (<date> <time> <tran-ppv> <freq> ...). Parser silently ignores lines after the metadata block since they lack a colon-separated key : value shape (the timestamps DO contain colons but produce garbage keys that don't collide with any recognised field).

Binary header signatures (observed)

Hex dump of the first 32 bytes across 1,014 sample files reveals two distinct file signatures, both anchored by the literal ASCII string "\x00Instantel\x00" at offset 616:

Signature A — newer firmware (1,012 files, 99.8% of corpus)

00000000: 0012 0100 0000 496e 7374 616e 7465 6c00   ......Instantel.
00000010: 0000 a695 002e b500 4f70 6572 6174 6f72   ........Operator
                                ^^^^^^^^^^^^^^^^
                                operator/title string starts at 0x18

Header bytes 05: 00 12 01 00 00 00. Followed immediately by the 8-byte ASCII tag, then 6 unknown bytes, then ASCII operator-supplied strings (Operator name, etc.) and on through the project / client / title strings. No STRT record observed in this layout.

Signature B — older firmware (2 files: BE9439 from 2020)

00000000: 1000 0180 0000 496e 7374 616e 7465 6c00   ......Instantel.
00000010: 072c 0012 0300 5354 5254 fffe 0111 2340   .,....STRT....#@
                          ^^^^^^^^^                ^^^^^^^^^
                          STRT magic               4-byte end_key
00000020: 0111 0000 2e5f 00ac 4600 0000 0200 0000   ....._..F.......
          ^^^^^^^^^             ^^^
          4-byte start_key      0x46 (BW WAVEHDR record-type marker)

Header bytes 05: 10 00 01 80 00 00. The structure after the Instantel magic is byte-for-byte identical to a BW SUB 5A probe-response STRT record as documented in instantel_protocol_reference.md → "SUB 5A — STRT record encodes end_offset". Specifically:

Offset Bytes Meaning (per BW reference)
0x14 53 54 52 54 STRT magic
0x18 ff fe STRT sentinel
0x1A 01 11 23 40 end_key (4 bytes)
0x1E 01 11 00 00 start_key (4 bytes)
0x26 46 0x46 waveform-record type marker

Hypothesis: Older Micromate firmware writes a wrapped BW-format event into the .IDFW file — essentially the same on-disk shape as a Series III device, with the new filename convention applied at export time. Newer firmware (signature A) abandoned the BW-compatible layout for an Instantel-specific format.

If that hypothesis holds, the 2 signature-B files can already be parsed via minimateplus/event_file_io.read_blastware_file() — worth testing. The 1,012 signature-A files are the real reverse-engineering target.

.IDFW.CDB cache files

Always carry signature B (10 00 01 80 ...), even when the paired .IDFW carries signature A. Plausible explanation: the CDB is an internal Thor cache-database export that retains the legacy BW-style record layout regardless of the user-facing .IDFW format version. Not currently consumed by the forwarder.


File-size patterns (Signature A, the main target)

Survey of 1,012 signature-A files:

Event type Typical size Source of variance
.IDFW 2-sec 9,200 10,500 B Operator-supplied strings (TitleString1..4) of varying length
.IDFH 2,944 4,076 B Histogram interval count (record duration / interval)

Naive arithmetic for 2-sec waveform:

  • 4 channels × 2 sec × 1024 sps = 8,192 samples
  • At 2 bytes/sample (int16) = 16,384 sample bytes → file would be > 16 KB
  • Observed: ~910 KB
  • → samples are likely 1 byte each (int8 quantised), or stored with bit-packing / delta encoding, or only one channel's full-rate samples are stored with the others reconstructed arithmetically. Verifying this is the first RE milestone.

Project-stringlength variance (~1 KB across the corpus) is consistent with the file carrying a single copy of each TitleString1..4 plus operator + setup-name as null-padded ASCII regions.


Open questions

The reverse-engineering targets, roughly in dependency order:

  1. Sample encoding (signature A) — int8? int16 LE/BE? Bit-packed? Delta-coded? Per-channel interleaved or sequential blocks?
  2. Header field layout (signature A) — where do sample_rate, record_time, channel count, and per-channel peaks live in the binary? The ASCII sidecar gives the device-authoritative values, so binary fields can be confirmed by diff.
  3. Operator-string offsetsOperator at 0x18 is the first visible string in signature-A files; the rest (project, client, notes, setup) follow. Need to map exact offsets and null-padding conventions.
  4. Signature-B → BW codec compatibility — does minimateplus/event_file_io.read_blastware_file() actually parse the 2 BE9439 signature-B files as-is? If yes, the OLD-format ingest is free.
  5. .IDFW.CDB purpose — is it an internal Thor cache, a ring-buffer dump, or something else? Worth a single small effort to characterise so we know what we're skipping.
  6. Footer / checksum — every BW event file has a footer; does IDF? Where does the per-channel sample block end?

Reverse-engineering playbook (when we start)

The Series III BW codec took ~2 months of MITM wire captures because we didn't have ground-truth metadata. Thor's situation is substantially better:

  • Ground truth is on disk. Every binary in example-data/ has a paired .IDFW.txt carrying the full decoded sample table (Waveform Data Channels block — see any sample file in thor-watcher/example-data/.../TXT/). Aligning binary bytes to the table's float-per-row values gives an immediate per-byte hypothesis test.
  • Cross-event diffing. 1,012 signature-A samples from 9 units spanning 4 years means any field that varies between events is immediately localisable. Fields that are constant across all files (firmware ID, channel labels, format-version word) are also immediately localisable by complementary search.
  • No protocol surface. Files at rest, not a wire dialect. No DLE stuffing, no inner-frame parsing, no probe/data two-step.

Suggested first session (2-4 hours): hand-decode UM11719_20231219162723.IDFW (10,290 bytes) against its TXT/UM11719_20231219162723.IDFW.txt sample table (the 2-sec waveform at 1024 sps × 4 channels = 8,192 sample rows). Find the first per-channel sample value (0.0003 in the Tran column at t=0) in the binary. Confirms sample encoding. Everything else flows from there.


Code seams ready to receive the codec

When the codec lands, it goes into micromate/idf_file.py (currently a stub raising NotImplementedError). Public API:

from micromate import IdfEvent
from micromate.idf_file import read_idf_file

event: IdfEvent = read_idf_file(Path("UM11719_20231219163444.IDFW"))
# event.peaks.transverse_ips, event.timestamp, event.raw_samples, ...

The ingest pipeline (WaveformStore.save_imported_idf) currently builds the IdfEvent from the .txt parser only. Once read_idf_file() works, the binary becomes authoritative; the .txt parser drops to fast-path metadata cross-check. Operators who don't enable Thor's TXT exporter still get fully populated events.


See also