ecc935482b
Tighten the Series III / Series IV boundary so UI and storage dispatch
on a clean signal instead of sniffing filenames or applying magnitude
heuristics.

Phase 1 - events.device_family column ("series3" | "series4"):
self-applying migration with filename-based backfill of existing rows
(1,132 backfilled on prod 2026-05-20); plumbed through every import
path (BW endpoint, IDF endpoint, ACH server, BW CLI, sidecar
backfill); UPSERT preserves via COALESCE; UI dispatches on it.

Phase 2 - extract micromate/ package alongside minimateplus/:
native IdfEvent / IdfReport / IdfPeaks / IdfProjectInfo /
IdfSensorCheck (mic in dB(L), not pseudo-psi); moved
idf_ascii_report.py from sfm/ to micromate/; refactored
save_imported_idf to use IdfEvent and bridge to minimateplus.Event at
the SQL-insert boundary; idf_file.py stub for the future binary codec.

Phase 3 prep - docs/idf_protocol_reference.md captures the two
observed Thor binary header signatures (1,012 newer-firmware files vs
2 old files whose layout is byte-for-byte BW-STRT-compatible), file-size
hints suggesting int8 sample encoding, open questions in dependency
order, and a concrete first-session plan for cracking the codec.

Also rolled in the v0.18.1 hotfixes that motivated this work:
- idf_ascii_report parser now handles "<0.005 in/s" (below-threshold)
  and "N/A" markers without leaving raw strings in numeric DB columns.
- sfm_webapp.html: defensive _ppvFmt / mic formatter so future
  data-shape drift can't kill the whole events table render.

All 1,014 example-data sidecars round-trip through the new package.
See CHANGELOG.md for full notes.

285 lines
12 KiB
Markdown

# IDF Protocol Reference: Thor / Micromate Series IV

Starting-point reference for reverse-engineering Instantel's Micromate
Series IV event-file format. Sibling to
[instantel_protocol_reference.md](instantel_protocol_reference.md) (the
Series III "Rosetta Stone"); this doc holds what we know so far and
the open questions still to crack.

**Status (2026-05-20):** ASCII text sidecar fully decoded (1,014
sample files round-trip). Binary `.IDFH` / `.IDFW` codec
**not yet implemented**: binaries are stored opaquely by
`WaveformStore.save_imported_idf`, with metadata sourced from the
paired `.txt` sidecar.

---

## File model

### Filename convention

```
<SERIAL>_<YYYYMMDDHHMMSS>.<KIND>
```

- **SERIAL**: literal device serial, two-letter prefix + numeric
  suffix. Examples seen: `UM11719`, `UM13981`, `UM20147`, `BE9439`.
  Unlike Series III BW filenames (`M529LK44.AB0`, base-36 stem),
  Series IV filenames carry the serial in plain text.
- **YYYYMMDDHHMMSS**: 14-char ASCII timestamp in **device local
  time** (no timezone marker).
- **KIND**: `IDFH` for histograms, `IDFW` for waveforms.

The `.IDFH.txt` / `.IDFW.txt` ASCII sidecar lives in a `TXT/`
**subfolder** of the unit's directory, not alongside the binary.
This pairing convention is encoded in
`event_forwarder.idf_report_path()`.

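
The naming and pairing rules above can be sketched in a few lines. This is an illustration only: `parse_idf_name`, `sidecar_path` and `IdfName` are hypothetical names, not the real `event_forwarder.idf_report_path()` implementation, and the regular expression assumes every serial looks like the four examples listed (two uppercase letters followed by digits).

```python
import re
from dataclasses import dataclass
from datetime import datetime
from pathlib import PurePath
from typing import Optional

# Assumed pattern: two-letter prefix + digits, 14-digit stamp, IDFH or IDFW.
_NAME_RE = re.compile(r"(?P<serial>[A-Z]{2}\d+)_(?P<stamp>\d{14})\.(?P<kind>IDFH|IDFW)")


@dataclass(frozen=True)
class IdfName:
    serial: str
    timestamp: datetime  # naive on purpose: device local time, no timezone marker
    kind: str            # "IDFH" (histogram) or "IDFW" (waveform)


def parse_idf_name(name: str) -> Optional[IdfName]:
    """Split a Series IV event filename; return None for anything else."""
    match = _NAME_RE.fullmatch(name)
    if match is None:
        return None
    try:
        stamp = datetime.strptime(match["stamp"], "%Y%m%d%H%M%S")
    except ValueError:  # 14 digits that do not form a real date/time
        return None
    return IdfName(match["serial"], stamp, match["kind"])


def sidecar_path(binary: PurePath) -> PurePath:
    """Expected location of the ASCII sidecar: <unit dir>/TXT/<binary name>.txt"""
    return binary.parent / "TXT" / (binary.name + ".txt")
```

Because `fullmatch` is used, `.IDFW.CDB` cache files and Series III names such as `M529LK44.AB0` are rejected rather than partially parsed.
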
### Directory layout

```
C:\THORDATA\
└── <Project>\
    └── <UM####>\                                  ← unit serial dir
        ├── UM12345_20260520100000.MLG             ← monitor log (not events)
        ├── UM12345_20260520100000.IDFH            ← histogram event (binary)
        ├── UM12345_20260520100000.IDFW            ← waveform event (binary)
        ├── UM12345_20260520100000.IDFW.CDB        ← cache-DB variant (skip)
        ├── TXT\
        │   ├── UM12345_20260520100000.IDFH.txt    ← histogram ASCII sidecar
        │   └── UM12345_20260520100000.IDFW.txt    ← waveform ASCII sidecar
        ├── CSV\, HTML\, PDF\, XML\                ← operator-facing derived exports
        └── ...
```

The `.IDFW.CDB` files share the binary's basename but appear to be a
separate cache/database variant. Their first 8 bytes match the
**old**-firmware Thor signature (see below) regardless of which
signature the paired `.IDFW` uses. Purpose unknown; sizes vary
wildly (observed 123 B → 40,491 B). Thor-watcher's forwarder
deliberately skips them.

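
As a rough sketch of the layout above, the following walks one unit directory and yields only the event binaries, leaving out the monitor log, the `.CDB` cache variant and the export subfolders. `iter_event_binaries` is a hypothetical helper written for this document; it is not the Thor-watcher forwarder's code.

```python
from pathlib import Path
from typing import Iterator

# Only these two suffixes are event binaries in the layout above.
EVENT_SUFFIXES = {".IDFH", ".IDFW"}


def iter_event_binaries(unit_dir: Path) -> Iterator[Path]:
    """Yield event binaries directly inside one unit directory, in name order.

    Path.suffix is the last extension only, so "X.IDFW.CDB" has suffix ".CDB"
    and is skipped, as is the ".MLG" monitor log. Subfolders (TXT, CSV, ...)
    are not descended into.
    """
    for entry in sorted(unit_dir.iterdir()):
        if entry.is_file() and entry.suffix.upper() in EVENT_SUFFIXES:
            yield entry
```
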
### Sample corpus

The `thor-watcher/example-data/THORDATA_example/` tree carries
**1,014 paired .IDFW / .IDFH + .txt files** spanning 2020–2023
across nine units (UM11719, UM13981, UM20147, …, plus BE9439 from
2020). This is the reverse-engineering ground truth.

---

## ASCII sidecar (`.IDFW.txt` / `.IDFH.txt`): fully decoded

Shape: plain text, one `"Key : Value"` line per metadata field,
followed for waveforms by a tab-separated sample table headed by
the literal line `Waveform Data Channels`. Parsed by
[`micromate/idf_ascii_report.py`](../micromate/idf_ascii_report.py).
See [`micromate/models.py`](../micromate/models.py) for the typed
`IdfReport` shape.

### Notable conventions

- **Units are native to Thor**: geophone in **in/s**, microphone in
  **dB(L)** (not psi like Series III BW reports), frequency in Hz,
  acceleration in g, displacement in in.
- **Below-threshold readings** appear as the literal string
  `<0.005 in/s` (155 occurrences in the sample corpus). The parser
  strips the `<` and treats the numeric remainder as the value.
- **Out-of-range / not-measured** values appear as `N/A`. The parser
  drops the field rather than letting the string leak into a numeric
  column.
- **Firmware string** observed: `Micromate ISEE 11.0AK`.
- **TitleString1..4** are operator-defined free-text slots; Thor's
  default labels map them to Location / Client / Company / Notes,
  which the parser surfaces as `project` / `client` / `operator` /
  `notes`.
- **Histogram sidecars** use `HistogramStartDate` / `HistogramStartTime`
  in place of waveform's `EventDate` / `EventTime`. The parser accepts
  either pair.
- **Histogram tabular block** lacks the `Waveform Data Channels`
  marker; instead it's a multi-line column header followed by
  per-interval rows (`<date> <time> <tran-ppv> <freq> ...`). The parser
  effectively ignores everything after the metadata block: most of
  these lines have no `key : value` shape, and the rows whose
  timestamps contain colons yield garbage keys that match no
  recognised field.

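
The marker handling described above might look roughly like this. It is a minimal sketch under stated assumptions, not the code in `micromate/idf_ascii_report.py`: `split_field` and `parse_reading` are hypothetical names, and the real parser may treat whitespace, units and signs differently.

```python
import re
from typing import Optional, Tuple

_NUMBER = re.compile(r"[-+]?\d+(?:\.\d+)?")


def split_field(line: str) -> Optional[Tuple[str, str]]:
    """Split a 'Key : Value' line on its first colon; None if there is no colon."""
    key, sep, value = line.partition(":")
    if not sep or not key.strip():
        return None
    return key.strip(), value.strip()


def parse_reading(raw: str) -> Optional[float]:
    """Turn a sidecar reading into a float, or None when it must be dropped.

    'N/A' and non-numeric text give None, so no raw string reaches a numeric
    column. A leading '<' (below-threshold marker) is stripped and the
    numeric remainder is kept, units ignored.
    """
    text = raw.strip()
    if not text or text.upper() == "N/A":
        return None
    match = _NUMBER.match(text.lstrip("<").strip())
    return float(match.group()) if match else None
```

Splitting on the first colon keeps time values such as `16:27:23` intact on the value side, and it is also why histogram rows only ever produce harmless garbage keys.
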

---

## Binary header signatures (observed)

Hex dump of the first 32 bytes across 1,014 sample files reveals
**two distinct file signatures**, both anchored by the literal
byte string `"\x00Instantel\x00"` at offsets 5–15 (the text
`Instantel` itself occupies offsets 6–14):

### Signature A: newer firmware (1,012 files, 99.8% of corpus)

```
00000000: 0012 0100 0000 496e 7374 616e 7465 6c00  ......Instantel.
00000010: 0000 a695 002e b500 4f70 6572 6174 6f72  ........Operator
                              ^^^^^^^^^^^^^^^^^^^
                              operator/title string starts at 0x18
```

Header bytes 0–5: `00 12 01 00 00 00`. Followed immediately by the
9-byte ASCII tag `Instantel` and its NUL terminator (offsets 6–15),
then 8 bytes of unknown meaning (`00 00 a6 95 00 2e b5 00` in the
dump above), then ASCII operator-supplied strings (Operator name,
etc.) and on through the project / client / title strings. No `STRT`
record observed in this layout.

### Signature B: older firmware (2 files: BE9439 from 2020)

```
00000000: 1000 0180 0000 496e 7374 616e 7465 6c00  ......Instantel.
00000010: 072c 0012 0300 5354 5254 fffe 0111 2340  .,....STRT....#@
                         ^^^^^^^^^      ^^^^^^^^^
                         STRT magic     4-byte end_key
00000020: 0111 0000 2e5f 00ac 4600 0000 0200 0000  ....._..F.......
          ^^^^^^^^^           ^^
          4-byte start_key    0x46 (BW WAVEHDR record-type marker)
```

Header bytes 0–5: `10 00 01 80 00 00`. The structure after the
`Instantel` magic is **byte-for-byte identical to a BW SUB 5A
probe-response STRT record** as documented in
[instantel_protocol_reference.md → "SUB 5A — STRT record encodes
end_offset"](instantel_protocol_reference.md). Specifically, with
offsets read off the dump above:

| Offset | Bytes               | Meaning (per BW reference)           |
|--------|---------------------|--------------------------------------|
| 0x16   | `53 54 52 54`       | `STRT` magic                         |
| 0x1A   | `ff fe`             | STRT sentinel                        |
| 0x1C   | `01 11 23 40`       | `end_key` (4 bytes)                  |
| 0x20   | `01 11 00 00`       | `start_key` (4 bytes)                |
| 0x28   | `46`                | `0x46` waveform-record type marker   |

**Hypothesis:** Older Micromate firmware writes a wrapped BW-format
event into the `.IDFW` file, essentially the same on-disk shape as
a Series III device, with the new filename convention applied at
export time. Newer firmware (signature A) abandoned the
BW-compatible layout for an Instantel-specific format.

If that hypothesis holds, the 2 signature-B files can already be
parsed via `minimateplus/event_file_io.read_blastware_file()`, which
is worth testing. The 1,012 signature-A files are the real
reverse-engineering target.

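
Before trying the full BW reader on a signature-B file, the STRT fields can be pulled out with plain slicing as a quick sanity check. The sketch below is not taken from `minimateplus`; `find_strt` and `StrtFields` are hypothetical names, and the field positions are expressed relative to the `STRT` magic as they appear in the signature-B dump, so no absolute offsets are hard-coded.

```python
from typing import NamedTuple, Optional


class StrtFields(NamedTuple):
    offset: int        # where the "STRT" magic was found
    sentinel: bytes    # 2 bytes right after the magic (ff fe in the dump)
    end_key: bytes     # 4 bytes
    start_key: bytes   # 4 bytes
    record_type: int   # single byte, 0x46 in the dump


def find_strt(blob: bytes) -> Optional[StrtFields]:
    """Locate the first STRT magic and slice the fields that follow it."""
    pos = blob.find(b"STRT")
    if pos < 0 or len(blob) < pos + 19:
        return None
    return StrtFields(
        offset=pos,
        sentinel=blob[pos + 4:pos + 6],
        end_key=blob[pos + 6:pos + 10],
        start_key=blob[pos + 10:pos + 14],
        record_type=blob[pos + 18],
    )


# First 48 bytes of the signature-B dump shown earlier in this section.
SIG_B_HEAD = bytes.fromhex(
    "100001800000496e7374616e74656c00"
    "072c0012030053545254fffe01112340"
    "011100002e5f00ac4600000002000000"
)
fields = find_strt(SIG_B_HEAD)
```

Running the same function over a signature-A header returns `None`, which matches the observation that no `STRT` record appears in that layout.
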
### `.IDFW.CDB` cache files

These always carry signature B (`10 00 01 80 ...`), even when the paired
`.IDFW` carries signature A. Plausible explanation: the CDB is an
internal Thor cache-database export that retains the legacy BW-style
record layout regardless of the user-facing `.IDFW` format version.
Not currently consumed by the forwarder.

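
Telling the two signatures apart needs only the first 16 bytes. The following is a minimal sketch using the byte values from the dumps in this section; `classify_signature` is a hypothetical helper, and the labels `"A"` / `"B"` are just this document's names for the two layouts.

```python
from typing import Optional

TAG = b"Instantel\x00"   # shared tag, starting at offset 6 in both dumps
TAG_OFFSET = 6

# Header bytes 0-5 as observed in the sample corpus.
SIGNATURES = {
    bytes.fromhex("001201000000"): "A",  # newer firmware
    bytes.fromhex("100001800000"): "B",  # older firmware, BW-style STRT layout
}


def classify_signature(head: bytes) -> Optional[str]:
    """Return "A", "B", or None for anything that is not a known IDF header."""
    if head[TAG_OFFSET:TAG_OFFSET + len(TAG)] != TAG:
        return None
    return SIGNATURES.get(head[:TAG_OFFSET])
```

Since `.IDFW.CDB` files also report `"B"`, the file extension still has to be checked separately before treating a signature-B file as an old-firmware event.
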

---

## File-size patterns (Signature A, the main target)

Survey of 1,012 signature-A files:

| Event type    | Typical size     | Source of variance                                            |
|---------------|------------------|---------------------------------------------------------------|
| `.IDFW` 2-sec | 9,200 – 10,500 B | Operator-supplied strings (TitleString1..4) of varying length |
| `.IDFH`       | 2,944 – 4,076 B  | Histogram interval count (record duration / interval)         |

**Naive arithmetic for 2-sec waveform:**

- 4 channels × 2 sec × 1024 sps = 8,192 samples
- At 2 bytes/sample (int16) = 16,384 sample bytes → file would be > 16 KB
- Observed: ~9–10 KB
- → samples are likely **1 byte each** (int8 quantised), **or** stored
  with bit-packing / delta encoding, **or** only one channel's
  full-rate samples are stored with the others reconstructed
  arithmetically. Verifying this is the **first RE milestone**.

Project-string-length variance (~1 KB across the corpus) is consistent
with the file carrying a single copy of each TitleString1..4 plus
operator + setup-name as null-padded ASCII regions.

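
The arithmetic above is small enough to check mechanically. This sketch only tests plain fixed-width encodings against the observed 2-sec `.IDFW` size range from the table; it says nothing about bit-packing or delta coding, and `payload_bytes` / `header_budget` are hypothetical helper names.

```python
from typing import Optional, Tuple

CHANNELS = 4
RECORD_SECONDS = 2
SAMPLE_RATE = 1024                          # samples per second per channel
OBSERVED_MIN, OBSERVED_MAX = 9_200, 10_500  # 2-sec .IDFW sizes from the table


def payload_bytes(bytes_per_sample: int) -> int:
    """Raw sample bytes for one 2-sec, 4-channel waveform at a fixed width."""
    return CHANNELS * RECORD_SECONDS * SAMPLE_RATE * bytes_per_sample


def header_budget(bytes_per_sample: int) -> Optional[Tuple[int, int]]:
    """Non-sample bytes left over at the smallest and largest observed sizes.

    Returns None when the samples alone would not fit in the smallest file.
    """
    payload = payload_bytes(bytes_per_sample)
    if payload > OBSERVED_MIN:
        return None
    return OBSERVED_MIN - payload, OBSERVED_MAX - payload


for label, width in (("int8", 1), ("int16", 2)):
    print(label, payload_bytes(width), header_budget(width))
```

Under the int8 assumption the samples take 8,192 bytes, leaving between 1,008 and 2,308 bytes for header, strings and any footer; a plain int16 layout does not fit at all.
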

---

## Open questions

The reverse-engineering targets, roughly in dependency order:

1. **Sample encoding (signature A)**: int8? int16 LE/BE? Bit-packed?
   Delta-coded? Per-channel interleaved or sequential blocks?
2. **Header field layout (signature A)**: where do sample_rate,
   record_time, channel count, and per-channel peaks live in the
   binary? The ASCII sidecar gives the device-authoritative values,
   so binary fields can be confirmed by diff.
3. **Operator-string offsets**: `Operator` at 0x18 is the first
   visible string in signature-A files; the rest (project, client,
   notes, setup) follow. Need to map exact offsets and null-padding
   conventions.
4. **Signature-B → BW codec compatibility**: does
   `minimateplus/event_file_io.read_blastware_file()` actually parse
   the 2 BE9439 signature-B files as-is? If yes, the OLD-format
   ingest is free.
5. **`.IDFW.CDB` purpose**: is it an internal Thor cache, a
   ring-buffer dump, or something else? Worth a single small effort
   to characterise so we know what we're skipping.
6. **Footer / checksum**: every BW event file has a footer; does
   IDF? Where does the per-channel sample block end?


---

## Reverse-engineering playbook (when we start)

The Series III BW codec took ~2 months of MITM wire captures
because we didn't have ground-truth metadata. Thor's situation is
**substantially better**:

- **Ground truth is on disk.** Every binary in `example-data/`
  has a paired `.IDFW.txt` carrying the full decoded sample table
  (`Waveform Data Channels` block; see any sample file in
  `thor-watcher/example-data/.../TXT/`). Aligning binary bytes
  to the table's float-per-row values gives an immediate per-byte
  hypothesis test.
- **Cross-event diffing.** 1,012 signature-A samples from 9 units
  spanning 4 years means any field that varies between events is
  immediately localisable. Fields that are constant across all
  files (firmware ID, channel labels, format-version word) are also
  immediately localisable by complementary search.
- **No protocol surface.** Files at rest, not a wire dialect. No
  DLE stuffing, no inner-frame parsing, no probe/data two-step.

Suggested first session (2-4 hours): hand-decode `UM11719_20231219162723.IDFW`
(10,290 bytes) against its `TXT/UM11719_20231219162723.IDFW.txt`
sample table (the 2-sec waveform: 2 sec × 1024 sps × 4 channels =
8,192 sample values). Find the first per-channel sample value
(`0.0003` in the Tran column at t=0) in the binary. That confirms
the sample encoding; everything else flows from there.

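
The cross-event diffing idea can be prototyped before any field is understood. The sketch below compares files byte by byte over their common prefix and reports which offsets never change and which do; `split_offsets` is a hypothetical helper, and the approach assumes fields sit at fixed absolute offsets, which stops being true once variable-length operator strings shift the rest of the file.

```python
from typing import Iterable, List, Tuple


def split_offsets(blobs: Iterable[bytes]) -> Tuple[List[int], List[int]]:
    """Partition byte offsets into (constant, varying) across all files.

    Only the prefix shared by every file (the shortest length) is compared.
    Constant offsets are candidates for magic numbers and format-version
    words; varying offsets are candidates for per-event fields.
    """
    files = list(blobs)
    if not files:
        return [], []
    common = min(len(blob) for blob in files)
    first = files[0]
    constant: List[int] = []
    varying: List[int] = []
    for i in range(common):
        if all(blob[i] == first[i] for blob in files):
            constant.append(i)
        else:
            varying.append(i)
    return constant, varying
```

A natural first use is to feed it the first 64 bytes or so of every signature-A file and compare the varying offsets against the sidecar fields that differ between the same events.
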

---

## Code seams ready to receive the codec

When the codec lands, it goes into
[`micromate/idf_file.py`](../micromate/idf_file.py) (currently a
stub raising `NotImplementedError`). Public API:

```python
from pathlib import Path

from micromate import IdfEvent
from micromate.idf_file import read_idf_file

# Raises NotImplementedError until the binary codec is written.
event: IdfEvent = read_idf_file(Path("UM11719_20231219163444.IDFW"))
# event.peaks.transverse_ips, event.timestamp, event.raw_samples, ...
```

The ingest pipeline (`WaveformStore.save_imported_idf`) currently
builds the `IdfEvent` from the `.txt` parser only. Once
`read_idf_file()` works, the binary becomes authoritative; the
`.txt` parser drops to fast-path metadata cross-check. Operators
who don't enable Thor's TXT exporter still get fully populated
events.

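
One way to keep ingest working across that transition is to try the binary reader first and fall back to the sidecar while the stub still raises. This is a sketch of the control flow only: `load_event` is a hypothetical helper, and the two callables stand in for `read_idf_file()` and the `.txt` parser, whose real signatures are not shown here.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def load_event(
    read_binary: Callable[[], T],
    read_sidecar: Callable[[], Optional[T]],
) -> Optional[T]:
    """Prefer the binary codec; use the sidecar while the codec is a stub.

    Only NotImplementedError triggers the fallback, so genuine decode errors
    from a finished codec still surface instead of being masked.
    """
    try:
        return read_binary()
    except NotImplementedError:
        return read_sidecar()
```
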

---

## See also

- [instantel_protocol_reference.md](instantel_protocol_reference.md): Series III BW protocol reference (the Rosetta Stone). STRT record format, DLE framing, BW filename encoding.
- [`micromate/idf_ascii_report.py`](../micromate/idf_ascii_report.py): `.txt` sidecar parser.
- [`micromate/models.py`](../micromate/models.py): `IdfEvent`, `IdfReport` typed dataclasses.
- [`micromate/idf_file.py`](../micromate/idf_file.py): placeholder for the binary codec.
- [`thor-watcher/example-data/THORDATA_example/`](../../thor-watcher/example-data/): 1,014 paired binary + .txt files for codec validation.