# IDF Protocol Reference — Thor / Micromate Series IV Starting-point reference for reverse-engineering Instantel's Micromate Series IV event-file format. Sibling to [instantel_protocol_reference.md](instantel_protocol_reference.md) (the Series III "Rosetta Stone") — this doc holds what we know so far and the open questions still to crack. **Status (2026-05-28):** ASCII text sidecar fully decoded (1,014 sample files round-trip). **Thor IDFW** binary now decodes via `micromate.idf_file.read_idf_file()` — reuses the BW segment-rotated block codec verbatim at fixed body offset `0x0f1f`; metadata (serial, timestamp, sample_rate, record_time, calibration_date) extracted from the binary header. Sample fidelity is 87–99% byte-exact on quiet events; loud events hit the BW codec's known walker-stops-early limitation. Residual ~3% drift on per-sample deltas (likely a Thor-specific 12-bit delta refinement not yet modelled). **Thor IDFH histograms also decoded.** Body has one or more segments; each 12-byte segment header `[length_be 2B][0a 00 00 00][00 NN][05 3f]` introduces `N = (length - 10) // 72` interval records of 72 bytes each. Each interval = 4 × 16-byte per-channel records: `[int16 min][int16 max][int16 ??][uint16 halfp][2B 00][uint16 ??][2B 00][uint16 ??]`. Geo peak `= max(|min|, |max|) / 32768 × 10` in/s (matches sidecar ~1.8%); freq `= 512 / halfp` Hz (None for halfp ≤ 5 → ">100" sentinel). Corpus: **all 859 Thor IDFH files decode, 181,071 intervals**. Wired through `read_idf_file()` → `save_imported_idf()` → sidecar's `extensions.idf_intervals`. **Note on the BE9439 outliers in the example corpus:** Two files (`BE9439_20200713131747.IDFW` and `BE9439_20200713124251.IDFH`) are **Series III Blastware** binaries, not Thor. Provenance: TMI tried to use Thor to manage auto-call-homes for Series III units; the experiment didn't work out, but it did leave a few BW event files in Thor's per-serial directory structure with `.IDFW`/`.IDFH` extensions — Thor's forwarder applied its own naming convention to the BW bodies it was relaying. Their header `10 00 01 80 00 00 Instantel STRT ff fe ` is the BW SUB 5A STRT record, not a Thor body preamble. The reader detects them by signature and raises `NotImplementedError` pointing callers at `read_blastware_file()`, which extracts BW-format peaks from them. **Still NYI for Thor IDFH:** per-channel `int16 field4` (possibly time-of-peak); the two uint16 fields (probably PVS contributions); 8-byte interval tail (PVS data); mic dB(L) exact conversion constant. ### Codec breakthroughs (2026-05-28) - **Body offset is a fixed `0x0f1f`** across 151/154 corpus IDFW files. Preceded by a 4-byte record-type marker (`46 00 00 00`) + magic preamble `00 02 00 [Tran[0] BE] [Tran[1] BE]`. - **Sample stream is BW's segment-rotated block codec verbatim.** Thor reuses `10 NN` (nibble), `20 NN` (int8), `00 NN` (RLE), `30 NN` (packed12), `40 02` (segment header) tags with the same semantics. Channel rotation Tran→Vert→Long→MicL. - **Geo LSB = 0.0003 in/s** (not BW's 0.005), because Thor's 16-bit ADC range maps to 10 in/s without the 16-count BW quantization step. - **Mic ≈ 2.14×10⁻⁶ psi/count** (rough scale; refine after channel block calibration constants are decoded). - **BW compliance anchor `\xbe\x80\x00\x00\x00\x00` reappears at IDFW offset 0x952** — sample_rate at anchor−6 (uint16 BE), record_time at anchor+6 (float32 BE), same layout as BW. - **Event timestamp at offset 0x97A** — 8 bytes `[day][month] [year_be][unk][hour][min][sec]`. Stop-time mirrors at 0x982. - **Serial as null-terminated ASCII at 0x14E**. - **Calibration date** at 0x194–0x197 (day, month, year_be). - Per-sample residual drift of ~3% suggests Thor encodes int8/nibble deltas with an extra refinement bit that BW doesn't carry — unsolved; errors resync within a few samples so cumulative impact is small. --- ## File model ### Filename convention ``` _. ``` - **SERIAL** — literal device serial, two-letter prefix + numeric suffix. Examples seen: `UM11719`, `UM13981`, `UM20147`, `BE9439`. Unlike Series III BW filenames (`M529LK44.AB0`, base-36 stem), Series IV filenames carry the serial in plain text. - **YYYYMMDDHHMMSS** — 14-char ASCII timestamp in **device local time** (no timezone marker). - **KIND** — `IDFH` for histograms, `IDFW` for waveforms. The `.IDFH.txt` / `.IDFW.txt` ASCII sidecar lives in a `TXT/` **subfolder** of the unit's directory, not alongside the binary. This pairing convention is encoded in `event_forwarder.idf_report_path()`. ### Directory layout ``` C:\THORDATA\ └── \ └── \ ← unit serial dir ├── UM12345_20260520100000.MLG ← monitor log (not events) ├── UM12345_20260520100000.IDFH ← histogram event (binary) ├── UM12345_20260520100000.IDFW ← waveform event (binary) ├── UM12345_20260520100000.IDFW.CDB ← cache-DB variant (skip) ├── TXT\ │ ├── UM12345_20260520100000.IDFH.txt ← histogram ASCII sidecar │ └── UM12345_20260520100000.IDFW.txt ← waveform ASCII sidecar ├── CSV\, HTML\, PDF\, XML\ ← operator-facing derived exports └── ... ``` The `.IDFW.CDB` files share the binary's basename but appear to be a separate cache/database variant. Their first 8 bytes match the **old**-firmware Thor signature (see below) regardless of which signature the paired `.IDFW` uses. Purpose unknown; sizes vary wildly (observed 123 B → 40,491 B). Thor-watcher's forwarder deliberately skips them. ### Sample corpus The `thor-watcher/example-data/THORDATA_example/` tree carries **1,014 paired .IDFW / .IDFH + .txt files** spanning 2020–2023 across nine units (UM11719, UM13981, UM20147, …, plus BE9439 from 2020). This is the reverse-engineering ground truth. --- ## ASCII sidecar (`.IDFW.txt` / `.IDFH.txt`) — fully decoded Shape: plain text, one `"Key : Value"` line per metadata field, followed for waveforms by a tab-separated sample table headed by the literal line `Waveform Data Channels`. Parsed by [`micromate/idf_ascii_report.py`](../micromate/idf_ascii_report.py). See [`micromate/models.py`](../micromate/models.py) for the typed `IdfReport` shape. ### Notable conventions - **Units are native to Thor** — geophone in **in/s**, microphone in **dB(L)** (not psi like Series III BW reports), frequency in Hz, acceleration in g, displacement in in. - **Below-threshold readings** appear as the literal string `<0.005 in/s` (155 occurrences in the sample corpus) — the parser strips the `<` and treats the numeric remainder as the value. - **Out-of-range / not-measured** values appear as `N/A` — parser drops the field rather than letting the string leak into a numeric column. - **Firmware string** observed: `Micromate ISEE 11.0AK`. - **TitleString1..4** are operator-defined free-text slots; Thor's default labels map them to Location / Client / Company / Notes, which the parser surfaces as `project` / `client` / `operator` / `notes`. - **Histogram sidecars** use `HistogramStartDate` / `HistogramStartTime` in place of waveform's `EventDate` / `EventTime`. Parser falls through to either. - **Histogram tabular block** lacks the `Waveform Data Channels` marker; instead it's a multi-line column header followed by per-interval rows (`