Tighten the Series III / Series IV boundary so UI and storage dispatch
on a clean signal instead of sniffing filenames or applying magnitude
heuristics.
Phase 1 — events.device_family column ("series3" | "series4"):
self-applying migration with filename-based backfill of existing rows
(1,132 backfilled on prod 2026-05-20); plumbed through every import
path (BW endpoint, IDF endpoint, ACH server, BW CLI, sidecar
backfill); UPSERT preserves via COALESCE; UI dispatches on it.
Phase 2 — extract micromate/ package alongside minimateplus/:
native IdfEvent / IdfReport / IdfPeaks / IdfProjectInfo /
IdfSensorCheck (mic in dB(L), not pseudo-psi); moved
idf_ascii_report.py from sfm/ to micromate/; refactored
save_imported_idf to use IdfEvent and bridge to minimateplus.Event at
the SQL-insert boundary; idf_file.py stub for the future binary codec.
Phase 3 prep — docs/idf_protocol_reference.md captures the two
observed Thor binary header signatures (1,012 newer-firmware files vs
2 old files whose layout is byte-for-byte BW-STRT-compatible), file-size
hints suggesting int8 sample encoding, open questions in dependency
order, and a concrete first-session plan for cracking the codec.
Also rolled in the v0.18.1 hotfixes that motivated this work:
- idf_ascii_report parser now handles "<0.005 in/s" (below-threshold)
and "N/A" markers without leaving raw strings in numeric DB columns.
- sfm_webapp.html: defensive _ppvFmt / mic formatter so future
data-shape drift can't kill the whole events table render.
All 1,014 example-data sidecars round-trip through the new package.
See CHANGELOG.md for full notes.
12 KiB
IDF Protocol Reference — Thor / Micromate Series IV
Starting-point reference for reverse-engineering Instantel's Micromate Series IV event-file format. Sibling to instantel_protocol_reference.md (the Series III "Rosetta Stone") — this doc holds what we know so far and the open questions still to crack.
Status (2026-05-20): ASCII text sidecar fully decoded (1,014
sample files round-trip). Binary .IDFH / .IDFW codec
not yet implemented — binaries are stored opaquely by
WaveformStore.save_imported_idf, with metadata sourced from the
paired .txt sidecar.
File model
Filename convention
<SERIAL>_<YYYYMMDDHHMMSS>.<KIND>
- SERIAL — literal device serial, two-letter prefix + numeric
suffix. Examples seen:
UM11719,UM13981,UM20147,BE9439. Unlike Series III BW filenames (M529LK44.AB0, base-36 stem), Series IV filenames carry the serial in plain text. - YYYYMMDDHHMMSS — 14-char ASCII timestamp in device local time (no timezone marker).
- KIND —
IDFHfor histograms,IDFWfor waveforms.
The .IDFH.txt / .IDFW.txt ASCII sidecar lives in a TXT/
subfolder of the unit's directory, not alongside the binary.
This pairing convention is encoded in
event_forwarder.idf_report_path().
Directory layout
C:\THORDATA\
└── <Project>\
└── <UM####>\ ← unit serial dir
├── UM12345_20260520100000.MLG ← monitor log (not events)
├── UM12345_20260520100000.IDFH ← histogram event (binary)
├── UM12345_20260520100000.IDFW ← waveform event (binary)
├── UM12345_20260520100000.IDFW.CDB ← cache-DB variant (skip)
├── TXT\
│ ├── UM12345_20260520100000.IDFH.txt ← histogram ASCII sidecar
│ └── UM12345_20260520100000.IDFW.txt ← waveform ASCII sidecar
├── CSV\, HTML\, PDF\, XML\ ← operator-facing derived exports
└── ...
The .IDFW.CDB files share the binary's basename but appear to be a
separate cache/database variant. Their first 8 bytes match the
old-firmware Thor signature (see below) regardless of which
signature the paired .IDFW uses. Purpose unknown; sizes vary
wildly (observed 123 B → 40,491 B). Thor-watcher's forwarder
deliberately skips them.
Sample corpus
The thor-watcher/example-data/THORDATA_example/ tree carries
1,014 paired .IDFW / .IDFH + .txt files spanning 2020–2023
across nine units (UM11719, UM13981, UM20147, …, plus BE9439 from
2020). This is the reverse-engineering ground truth.
ASCII sidecar (.IDFW.txt / .IDFH.txt) — fully decoded
Shape: plain text, one "Key : Value" line per metadata field,
followed for waveforms by a tab-separated sample table headed by
the literal line Waveform Data Channels. Parsed by
micromate/idf_ascii_report.py.
See micromate/models.py for the typed
IdfReport shape.
Notable conventions
- Units are native to Thor — geophone in in/s, microphone in dB(L) (not psi like Series III BW reports), frequency in Hz, acceleration in g, displacement in in.
- Below-threshold readings appear as the literal string
<0.005 in/s(155 occurrences in the sample corpus) — the parser strips the<and treats the numeric remainder as the value. - Out-of-range / not-measured values appear as
N/A— parser drops the field rather than letting the string leak into a numeric column. - Firmware string observed:
Micromate ISEE 11.0AK. - TitleString1..4 are operator-defined free-text slots; Thor's
default labels map them to Location / Client / Company / Notes,
which the parser surfaces as
project/client/operator/notes. - Histogram sidecars use
HistogramStartDate/HistogramStartTimein place of waveform'sEventDate/EventTime. Parser falls through to either. - Histogram tabular block lacks the
Waveform Data Channelsmarker; instead it's a multi-line column header followed by per-interval rows (<date> <time> <tran-ppv> <freq> ...). Parser silently ignores lines after the metadata block since they lack a colon-separatedkey : valueshape (the timestamps DO contain colons but produce garbage keys that don't collide with any recognised field).
Binary header signatures (observed)
Hex dump of the first 32 bytes across 1,014 sample files reveals
two distinct file signatures, both anchored by the literal
ASCII string "\x00Instantel\x00" at offset 6–16:
Signature A — newer firmware (1,012 files, 99.8% of corpus)
00000000: 0012 0100 0000 496e 7374 616e 7465 6c00 ......Instantel.
00000010: 0000 a695 002e b500 4f70 6572 6174 6f72 ........Operator
^^^^^^^^^^^^^^^^
operator/title string starts at 0x18
Header bytes 0–5: 00 12 01 00 00 00. Followed immediately by the
8-byte ASCII tag, then 6 unknown bytes, then ASCII operator-supplied
strings (Operator name, etc.) and on through the project / client /
title strings. No STRT record observed in this layout.
Signature B — older firmware (2 files: BE9439 from 2020)
00000000: 1000 0180 0000 496e 7374 616e 7465 6c00 ......Instantel.
00000010: 072c 0012 0300 5354 5254 fffe 0111 2340 .,....STRT....#@
^^^^^^^^^ ^^^^^^^^^
STRT magic 4-byte end_key
00000020: 0111 0000 2e5f 00ac 4600 0000 0200 0000 ....._..F.......
^^^^^^^^^ ^^^
4-byte start_key 0x46 (BW WAVEHDR record-type marker)
Header bytes 0–5: 10 00 01 80 00 00. The structure after the
Instantel magic is byte-for-byte identical to a BW SUB 5A
probe-response STRT record as documented in
instantel_protocol_reference.md → "SUB 5A — STRT record encodes
end_offset". Specifically:
| Offset | Bytes | Meaning (per BW reference) |
|---|---|---|
| 0x14 | 53 54 52 54 |
STRT magic |
| 0x18 | ff fe |
STRT sentinel |
| 0x1A | 01 11 23 40 |
end_key (4 bytes) |
| 0x1E | 01 11 00 00 |
start_key (4 bytes) |
| 0x26 | 46 |
0x46 waveform-record type marker |
Hypothesis: Older Micromate firmware writes a wrapped BW-format
event into the .IDFW file — essentially the same on-disk shape as
a Series III device, with the new filename convention applied at
export time. Newer firmware (signature A) abandoned the
BW-compatible layout for an Instantel-specific format.
If that hypothesis holds, the 2 signature-B files can already be
parsed via minimateplus/event_file_io.read_blastware_file() — worth
testing. The 1,012 signature-A files are the real reverse-engineering
target.
.IDFW.CDB cache files
Always carry signature B (10 00 01 80 ...), even when the paired
.IDFW carries signature A. Plausible explanation: the CDB is an
internal Thor cache-database export that retains the legacy BW-style
record layout regardless of the user-facing .IDFW format version.
Not currently consumed by the forwarder.
File-size patterns (Signature A, the main target)
Survey of 1,012 signature-A files:
| Event type | Typical size | Source of variance |
|---|---|---|
.IDFW 2-sec |
9,200 – 10,500 B | Operator-supplied strings (TitleString1..4) of varying length |
.IDFH |
2,944 – 4,076 B | Histogram interval count (record duration / interval) |
Naive arithmetic for 2-sec waveform:
- 4 channels × 2 sec × 1024 sps = 8,192 samples
- At 2 bytes/sample (int16) = 16,384 sample bytes → file would be > 16 KB
- Observed: ~9–10 KB
- → samples are likely 1 byte each (int8 quantised), or stored with bit-packing / delta encoding, or only one channel's full-rate samples are stored with the others reconstructed arithmetically. Verifying this is the first RE milestone.
Project-string–length variance (~1 KB across the corpus) is consistent with the file carrying a single copy of each TitleString1..4 plus operator + setup-name as null-padded ASCII regions.
Open questions
The reverse-engineering targets, roughly in dependency order:
- Sample encoding (signature A) — int8? int16 LE/BE? Bit-packed? Delta-coded? Per-channel interleaved or sequential blocks?
- Header field layout (signature A) — where do sample_rate, record_time, channel count, and per-channel peaks live in the binary? The ASCII sidecar gives the device-authoritative values, so binary fields can be confirmed by diff.
- Operator-string offsets —
Operatorat 0x18 is the first visible string in signature-A files; the rest (project, client, notes, setup) follow. Need to map exact offsets and null-padding conventions. - Signature-B → BW codec compatibility — does
minimateplus/event_file_io.read_blastware_file()actually parse the 2 BE9439 signature-B files as-is? If yes, the OLD-format ingest is free. .IDFW.CDBpurpose — is it an internal Thor cache, a ring-buffer dump, or something else? Worth a single small effort to characterise so we know what we're skipping.- Footer / checksum — every BW event file has a footer; does IDF? Where does the per-channel sample block end?
Reverse-engineering playbook (when we start)
The Series III BW codec took ~2 months of MITM wire captures because we didn't have ground-truth metadata. Thor's situation is substantially better:
- Ground truth is on disk. Every binary in
example-data/has a paired.IDFW.txtcarrying the full decoded sample table (Waveform Data Channelsblock — see any sample file inthor-watcher/example-data/.../TXT/). Aligning binary bytes to the table's float-per-row values gives an immediate per-byte hypothesis test. - Cross-event diffing. 1,012 signature-A samples from 9 units spanning 4 years means any field that varies between events is immediately localisable. Fields that are constant across all files (firmware ID, channel labels, format-version word) are also immediately localisable by complementary search.
- No protocol surface. Files at rest, not a wire dialect. No DLE stuffing, no inner-frame parsing, no probe/data two-step.
Suggested first session (2-4 hours): hand-decode UM11719_20231219162723.IDFW
(10,290 bytes) against its TXT/UM11719_20231219162723.IDFW.txt
sample table (the 2-sec waveform at 1024 sps × 4 channels = 8,192
sample rows). Find the first per-channel sample value (0.0003 in
the Tran column at t=0) in the binary. Confirms sample encoding.
Everything else flows from there.
Code seams ready to receive the codec
When the codec lands, it goes into
micromate/idf_file.py (currently a
stub raising NotImplementedError). Public API:
from micromate import IdfEvent
from micromate.idf_file import read_idf_file
event: IdfEvent = read_idf_file(Path("UM11719_20231219163444.IDFW"))
# event.peaks.transverse_ips, event.timestamp, event.raw_samples, ...
The ingest pipeline (WaveformStore.save_imported_idf) currently
builds the IdfEvent from the .txt parser only. Once
read_idf_file() works, the binary becomes authoritative; the
.txt parser drops to fast-path metadata cross-check. Operators
who don't enable Thor's TXT exporter still get fully populated
events.
See also
- instantel_protocol_reference.md — Series III BW protocol reference (the Rosetta Stone). STRT record format, DLE framing, BW filename encoding.
micromate/idf_ascii_report.py—.txtsidecar parser.micromate/models.py—IdfEvent,IdfReporttyped dataclasses.micromate/idf_file.py— placeholder for the binary codec.thor-watcher/example-data/THORDATA_example/— 1,014 paired binary + .txt files for codec validation.