"""
waveform_codec.py — block-walker and partial decoder for the MiniMate Plus
waveform-file body.

PARTIAL REVERSE-ENGINEERING — last updated 2026-05-11.

The Blastware waveform-file body — the bytes between the 21-byte STRT
record and the 26-byte file footer — is NOT raw int16 LE samples (the
historical assumption that produced full-scale ±32K noise on every
event).  It is a tagged variable-length block stream with a custom
delta + RLE codec.

Current status:

- Block framing: ✅ solved (block types and lengths all confirmed)
- Tran channel, segment 0: ✅ solved (decode_tran_initial returns
  byte-exact values vs BW's ASCII export, across 5 of 5 loud-bundle
  events; first ~510 samples per event)
- Multi-segment Tran continuation: ❌ open (every hypothesis breaks
  at the segment-1 boundary around sample 512)
- Vert / Long / Mic channel decoders: ❌ open
- 30 NN block content: ❌ open (only appears in loud-from-start events)

Production code in client.py still uses the broken int16 LE decoder.
``decode_waveform_v2`` here returns per-channel sample lists under the
channel-rotation model described in its docstring, and ``None`` only
when the body is shorter than 7 bytes or lacks the ``00 02 00`` magic.
Callers that need sample arrays should treat the legacy decoder's
output as "unverified": the BW binary write path is the only
sample-bearing output that is currently trustworthy.

────────────────────────────────────────────────────────────────────────────
Body layout (CONFIRMED 2026-05-11 against 8 fixture events)
────────────────────────────────────────────────────────────────────────────

    [7-byte preamble] [stream of tagged blocks] [trailer]

The preamble is always exactly 7 bytes:

    body[0:3]  = 00 02 00              magic
    body[3:5]  = Tran[0]   int16 BE    in 16-count units (LSB = 0.005 in/s)
    body[5:7]  = Tran[1]   int16 BE    in 16-count units

(Earlier drafts of this module described a "7-or-9-byte preamble";
that was wrong — single-shot and continuous events both use 7 bytes.
The "extra 2 bytes" on continuous events were the first ``00 NN`` RLE
marker, not part of the preamble.)
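
As a minimal sketch of the layout above, the preamble can be unpacked
with ``struct``.  The helper name ``parse_preamble`` and the sample bytes
are illustrative assumptions; neither is part of this module nor taken
from a fixture.

```python
import struct
from typing import Optional, Tuple

PREAMBLE_MAGIC = bytes([0x00, 0x02, 0x00])  # body[0:3]


def parse_preamble(body: bytes) -> Optional[Tuple[int, int]]:
    # Hypothetical helper: return (Tran[0], Tran[1]) in 16-count units,
    # or None when the body is too short or the magic does not match.
    if len(body) < 7 or body[0:3] != PREAMBLE_MAGIC:
        return None
    # ">hh" reads two big-endian signed 16-bit integers from body[3:7].
    tran0, tran1 = struct.unpack_from(">hh", body, 3)
    return tran0, tran1


# Synthetic bytes (not from a fixture): Tran[0] = 3, Tran[1] = -2.
example = bytes([0x00, 0x02, 0x00, 0x00, 0x03, 0xFF, 0xFE])
print(parse_preamble(example))  # (3, -2)
```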

Block types and lengths (all confirmed):

| Tag      | Length                | Meaning                                |
|----------|-----------------------|----------------------------------------|
| ``10 NN``| NN/2 + 2 bytes        | 4-bit signed nibble deltas (2 per      |
|          |                       | byte, high nibble first; 0..7 = 0..+7, |
|          |                       | 8..F = -8..-1)                         |
| ``20 NN``| NN + 2 bytes          | int8 signed deltas (1 per byte)        |
| ``00 NN``| 2 bytes               | RLE: append NN copies of current value |
| ``30 NN``| NN*3/2 + 2 in data,   | 12-bit signed deltas in the data       |
|          | NN*4 in trailer       | section (see ``walk_body``).  Only in  |
|          |                       | loud events.                           |
| ``40 02``| 20 bytes (fixed)      | Segment header                         |

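For the ``00``, ``10``, ``20`` and ``30`` tags, NN is always a multiple
of 4; the segment header's second byte is the fixed value ``02``.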
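
The fixed-formula rows of the table can be restated as a small lookup.
This is a sketch only: ``block_length`` is a hypothetical name, and it
deliberately returns ``None`` for ``30 NN`` because that block's length
depends on whether it sits in the data section or the trailer.

```python
from typing import Optional


def block_length(tag_hi: int, nn: int) -> Optional[int]:
    # Hypothetical helper: total on-wire length (2-byte tag included)
    # for the 10 / 20 / 00 / 40 rows.  Anything else returns None.
    if tag_hi == 0x40:
        return 20 if nn == 0x02 else None
    if nn % 4 != 0:
        return None
    if tag_hi == 0x10:
        return nn // 2 + 2   # NN nibbles packed two per byte
    if tag_hi == 0x20:
        return nn + 2        # NN int8 deltas, one per byte
    if tag_hi == 0x00:
        return 2             # the RLE marker is the tag alone
    return None


print(block_length(0x10, 0x08))  # 6
print(block_length(0x20, 0x08))  # 10
```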

────────────────────────────────────────────────────────────────────────────
Tran channel, segment 0 (CONFIRMED 2026-05-11)
────────────────────────────────────────────────────────────────────────────

Segment 0 — everything before the first ``40 02`` segment header — encodes
Tran samples only.  Starting from preamble anchors Tran[0] and Tran[1],
each subsequent block contributes to the running Tran value:

    10 NN  →  append NN deltas (4-bit signed nibbles)
    20 NN  →  append NN deltas (int8 signed bytes)
    00 NN  →  append NN copies of the current value (RLE zeros)
    40 02  →  segment 0 ends; multi-segment continuation is open

This decodes the first 482–510 samples of Tran for each event with zero
errors against BW's ASCII export.  The exact segment-0 sample count
varies per event (it's bounded by a fixed device-flash byte budget, not
a fixed sample count — quiet events fit more samples because zero
deltas pack into ``00 NN`` markers compactly).

Implementation: :func:`decode_tran_initial`.
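
The rules above can be exercised on a hand-built body.  The sketch below
is a simplified, self-contained restatement rather than the module's
implementation: ``decode_segment0`` is a hypothetical name, the bytes are
synthetic (not from a fixture), and it assumes a well-formed body that
contains only ``10`` / ``20`` / ``00`` / ``40`` tags.

```python
from typing import List


def s4(n: int) -> int:
    # Sign-extend a 4-bit nibble: 0..7 stay as-is, 8..F map to -8..-1.
    return n if n < 8 else n - 16


def decode_segment0(body: bytes) -> List[int]:
    # Anchors come from the 7-byte preamble (int16 BE at [3:5] and [5:7]).
    t0 = int.from_bytes(body[3:5], "big", signed=True)
    t1 = int.from_bytes(body[5:7], "big", signed=True)
    out, cur, i = [t0, t1], t1, 7
    while i + 1 < len(body):
        tag, nn = body[i], body[i + 1]
        if tag == 0x10:
            payload = body[i + 2 : i + 2 + nn // 2]
            for byte in payload:
                for nib in (byte >> 4, byte & 0xF):  # high nibble first
                    cur += s4(nib)
                    out.append(cur)
        elif tag == 0x20:
            payload = body[i + 2 : i + 2 + nn]
            for byte in payload:
                cur += byte if byte < 128 else byte - 256
                out.append(cur)
        elif tag == 0x00:
            payload = bytes()
            out.extend([cur] * nn)  # RLE: the value does not change
        else:
            break  # 40 02 (or anything unrecognized) ends segment 0
        i += 2 + len(payload)
    return out


synthetic = bytes([
    0x00, 0x02, 0x00,              # magic
    0x00, 0x05, 0x00, 0x06,        # Tran[0] = 5, Tran[1] = 6
    0x10, 0x04, 0x1F, 0x20,        # nibble deltas +1, -1, +2, 0
    0x00, 0x04,                    # 4 repeats of the current value
    0x20, 0x04, 0xFE, 0x03, 0x00, 0xFF,  # int8 deltas -2, +3, 0, -1
    0x40, 0x02,                    # segment header tag
]) + bytes(18)                     # zeroed header payload

print(decode_segment0(synthetic))
# [5, 6, 7, 6, 8, 8, 8, 8, 8, 8, 6, 9, 9, 8]
```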

────────────────────────────────────────────────────────────────────────────
Segment header (40 02, 20 bytes total)
────────────────────────────────────────────────────────────────────────────

The 18-byte payload of the ``40 02`` block:

| Offset    | Field                                       | Status      |
|-----------|---------------------------------------------|-------------|
| [0:2]     | T_delta at first sample of new segment      | ✅ confirmed|
|           | (int16 BE, in 16-count units)               |             |
| [2:4]     | Likely T_delta at sample seg_start+1        | 🟡 likely   |
| [4:6]     | Unknown (varies; possibly checksum)         | ❓ open     |
| [6:8]     | Byte length to next segment header − 2      | ✅ confirmed|
|           | (uint16 BE; useful for walker pre-scan)     |             |
| [8:12]    | Monotonic uint32 LE counter                 | ✅ confirmed|
|           | (starts ~0x47, increments by 1 per segment) |             |
| [12:14]   | Constant ``02 00``                          | ✅ confirmed|
| [14:18]   | Unknown 4-byte field                        | ❓ open     |
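
A sketch of reading the table's fields with ``struct``, assuming the
offsets above.  ``parse_header_payload`` and its key names are invented
for illustration, the open fields are returned as raw bytes, and the
example payload is synthetic.

```python
import struct
from typing import Dict


def parse_header_payload(payload: bytes) -> Dict[str, object]:
    # Hypothetical helper following the table's offsets.
    if len(payload) != 18:
        raise ValueError("segment-header payload must be 18 bytes")
    t_delta0, t_delta1 = struct.unpack_from(">hh", payload, 0)
    (length_field,) = struct.unpack_from(">H", payload, 6)
    (counter,) = struct.unpack_from("<I", payload, 8)
    return {
        "t_delta0": t_delta0,              # [0:2]   confirmed
        "t_delta1": t_delta1,              # [2:4]   likely
        "unknown_4_6": payload[4:6],       # [4:6]   open
        "length_field": length_field,      # [6:8]   raw value (length - 2)
        "counter": counter,                # [8:12]  uint32 LE
        "const_ok": payload[12:14] == bytes([0x02, 0x00]),
        "unknown_14_18": payload[14:18],   # [14:18] open
    }


# Synthetic payload, not taken from a fixture.
example = bytes([
    0x00, 0x01,              # t_delta0 = 1
    0xFF, 0xFF,              # t_delta1 = -1
    0xAB, 0xCD,              # open field
    0x01, 0x00,              # length field = 256
    0x47, 0x00, 0x00, 0x00,  # counter = 0x47
    0x02, 0x00,              # constant
    0x00, 0x00, 0x00, 0x00,  # open field
])
fields = parse_header_payload(example)
print(fields["counter"], fields["length_field"])  # 71 256
```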

────────────────────────────────────────────────────────────────────────────
What breaks the multi-segment decoder (the main open question)
────────────────────────────────────────────────────────────────────────────

After segment 0 ends and the segment header T_delta is consumed,
applying segment 1's blocks as Tran continuation produces values that
diverge from truth by sample ~512.  The block structure inside segment
1 is IDENTICAL to segment 0 (same alternating 10 NN / 00 NN pattern),
and the delta budget matches the segment size exactly (V70 segment 1
has 264 nibble-deltas + 244 RLE zeros = 508 = the segment's sample
count).  But the cumulative sum is wrong.

The strongest unverified hypothesis is that segments rotate channels:

    segment 0  →  Tran samples 0..509
    segment 1  →  Vert samples 0..507
    segment 2  →  Long samples 0..507
    segment 3  →  Mic  samples 0..507
    segment 4  →  Tran samples 510..N (continuation)
    ...

This is consistent with the segment-1 deltas summing to nearly zero in
V70 (where all 4 channels are near zero) and with the per-segment delta
budget matching the segment size for a single channel.  It is NOT yet
verified because the per-segment channel anchor isn't pinned down in
the segment header: bytes [4:6] and [14:18] of the header are still
open and probably encode V/L/M anchors.
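
The bookkeeping implied by the rotation hypothesis can be sketched as
follows.  Both function names are hypothetical, and the per-segment
sample counts in the example are the illustrative figures from the
listing above, not measured values.

```python
from typing import Dict, List, Tuple

CHANNEL_ROTATION = ("Tran", "Vert", "Long", "Mic")


def channel_for_segment(segment_index: int) -> str:
    # Under the hypothesis, segment k carries channel k mod 4.
    return CHANNEL_ROTATION[segment_index % 4]


def segment_spans(sample_counts: List[int]) -> List[Tuple[str, int, int]]:
    # For each segment, return (channel, first_sample, last_sample),
    # where the indices count samples within that channel.
    next_index: Dict[str, int] = {name: 0 for name in CHANNEL_ROTATION}
    spans = []
    for k, count in enumerate(sample_counts):
        channel = channel_for_segment(k)
        first = next_index[channel]
        spans.append((channel, first, first + count - 1))
        next_index[channel] = first + count
    return spans


for span in segment_spans([510, 508, 508, 508, 508]):
    print(span)
# ('Tran', 0, 509)
# ('Vert', 0, 507)
# ('Long', 0, 507)
# ('Mic', 0, 507)
# ('Tran', 510, 1017)
```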

See ``docs/waveform_codec_re_status.md`` for the current working notes
and the suggested next experiment ("segment-channel scoring analyzer").
"""

from __future__ import annotations

from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class WaveformBlock:
    """One tagged block parsed out of a Blastware waveform-file body."""
    offset: int      # byte offset into body
    tag_hi: int      # first tag byte (0x10 / 0x20 / 0x00 / 0x30 / 0x40)
    tag_lo: int      # second tag byte (NN)
    data: bytes      # block payload (excludes the 2-byte tag)
    length: int      # total block length on the wire (includes the tag)

    @property
    def kind(self) -> str:
        return f"{self.tag_hi:02x} {self.tag_lo:02x}"


def find_data_start(body: bytes) -> int:
    """Auto-detect the offset of the first data block.

    The body starts with a 7-byte preamble (magic ``00 02 00`` + two int16 BE
    Tran anchors).  After that, the data section starts with a tag — usually
    ``10 NN`` or ``20 NN``, but quiet events may begin with a ``00 NN`` RLE
    marker.  We return the offset of the first recognized tag.
    """
    # Try fixed offset 7 first (canonical preamble length).
    if len(body) >= 9:
        b, nn = body[7], body[8]
        if (b in (0x00, 0x10, 0x20, 0x30) and nn % 4 == 0 and 0 < nn <= 0xFC) \
                or (b == 0x40 and nn == 0x02):
            return 7
    # Fall back to scanning the first 20 bytes.
    for i in range(min(20, len(body) - 1)):
        b = body[i]
        nn = body[i + 1]
        if b in (0x10, 0x20) and nn % 4 == 0 and 0 < nn <= 0xFC:
            return i
    return -1


def walk_body(body: bytes, start: Optional[int] = None) -> List[WaveformBlock]:
    """Walk the tagged-block sequence starting at *start* (auto-detected by default).

    Stops at the first unrecognized tag, at a block that would run past
    the end of the body, or when the end of the body is reached.
    Returned blocks are in stream order.
    """
    if start is None:
        start = find_data_start(body)
        if start < 0:
            return []

    blocks: List[WaveformBlock] = []
    i = start
    while i + 1 < len(body):
        t0 = body[i]
        t1 = body[i + 1]
        if t0 == 0x10 and t1 % 4 == 0 and 0 < t1 <= 0xFC:
            length = t1 // 2 + 2
        elif t0 == 0x20 and t1 % 4 == 0 and 0 < t1 <= 0xFC:
            length = t1 + 2
        elif t0 == 0x00 and t1 % 4 == 0:
            length = 2
        elif t0 == 0x30 and t1 % 4 == 0 and 0 < t1 <= 0x10:
            # Data-section ``30 NN`` blocks carry NN 12-bit signed deltas packed
            # as NN/4 groups of (2-byte high-nibble field + 4 × int8 low byte).
            # Length = NN/4 × 6 + 2 = NN × 1.5 + 2 (= 8 for NN=4, 14 for NN=8,
            # 20 for NN=12, etc.).  Confirmed 2026-05-11 by full-decoder
            # verification against BW ASCII export.
            #
            # Trailer-section ``30 NN`` blocks have a different length formula
            # (NN × 4 = 32 for NN=8 in trailers).  We try the data-section
            # length first and fall back to the trailer length if needed.
            cand_data = t1 * 3 // 2 + 2
            cand_trailer = t1 * 4
            if (i + cand_data < len(body) - 1
                    and body[i + cand_data] in (0x10, 0x20, 0x00, 0x30, 0x40)):
                length = cand_data
            else:
                length = cand_trailer
        elif t0 == 0x40 and t1 == 0x02:
            length = 20
        else:
            # Unknown tag; stop.  The stop position is the last returned
            # block's ``offset + length`` (or *start* if nothing parsed).
            break

        if i + length > len(body):
            break

        data = bytes(body[i + 2 : i + length])
        blocks.append(WaveformBlock(offset=i, tag_hi=t0, tag_lo=t1, data=data, length=length))
        i += length

    return blocks


def split_segments(blocks: List[WaveformBlock]) -> List[List[WaveformBlock]]:
    """Group consecutive blocks into segments separated by ``40 02`` headers.

    The first segment is whatever runs before the first ``40 02`` header
    (segment 0: the data blocks that follow the body preamble).  Each
    subsequent segment starts with its ``40 02`` block, followed by its
    data blocks up to the next ``40 02``.
    """
    segments: List[List[WaveformBlock]] = []
    current: List[WaveformBlock] = []
    for b in blocks:
        if b.tag_hi == 0x40 and b.tag_lo == 0x02:
            if current:
                segments.append(current)
            current = [b]
        else:
            current.append(b)
    if current:
        segments.append(current)
    return segments


def parse_segment_header(block: WaveformBlock) -> Optional[dict]:
    """Decode the 18-byte payload of a ``40 02`` segment header.

    Returns a dict with the labelled fields, or None if *block* is not
    a ``40 02`` header or its payload is shorter than 18 bytes.
    """
    if not (block.tag_hi == 0x40 and block.tag_lo == 0x02):
        return None
    if len(block.data) < 18:
        return None
    p = block.data
    counter = int.from_bytes(p[8:12], "little", signed=False)
    return {
        "anchor_bytes": p[0:4],          # two int16 BE values: [0:2] and [2:4]
        "field2": p[4:8],                # [4:6] role unconfirmed; [6:8] uint16 BE length field
        "counter": counter,              # uint32 LE; increments by 1 per segment
        "fixed_pattern": p[12:16],       # [12:14] is the constant 02 00; [14:16] is read
                                         # as an anchor by decode_waveform_v2
        "tail": p[16:18],                # last 2 bytes
    }


def _s4(n: int) -> int:
    """Sign-extend a 4-bit value to signed int (0..7 → 0..7; 8..F → -8..-1)."""
    return n if n < 8 else n - 16


def _i8(b: int) -> int:
    """Reinterpret an unsigned byte as signed int8."""
    return b if b < 128 else b - 256


def decode_tran_initial(body: bytes) -> Optional[List[int]]:
    """
    Decode the initial Tran-channel samples — VERIFIED 2026-05-11.

    Returns Tran samples in **16-count units** (LSB = 0.005 in/s at Normal
    range — the same quantization BW uses for its ASCII export).  Returns
    ``None`` if the body cannot be parsed.

    The decoded list extends from sample 0 through the end of segment 0
    (just before the first ``40 02`` segment header; ~510 samples for
    the events tested).  Multi-segment decoding requires continuing
    past the segment header; :func:`decode_waveform_v2` does that under
    the channel-rotation model.

    Codec for segment 0 (CONFIRMED 2026-05-11 against 7 fixture events):

    - Body bytes [0:3] are the magic ``00 02 00``.
    - Body bytes [3:5] = ``Tran[0]`` as int16 BE in 16-count units.
    - Body bytes [5:7] = ``Tran[1]`` as int16 BE in 16-count units.
    - Data blocks (``10 NN`` or ``20 NN``) carry Tran deltas starting
      at sample 2:

      * ``10 NN``: NN nibbles = NN/2 bytes; each nibble is a 4-bit
        signed delta (0..7 → 0..+7; 8..F → -8..-1).  High nibble of
        each byte comes first.
      * ``20 NN``: NN int8 signed deltas (one delta per byte).

    - ``00 NN`` blocks are run-length-encoded zero deltas: append NN
      copies of the current cumulative Tran value (no change).

    - ``30 NN`` blocks have not yet been decoded for content — they
      appear in segment 0 of loud-from-start events (SS0, SV0) and
      seem to signal a transition or special-case interpretation.
      The walker steps over them but their data is ignored.

    The walk stops at the first ``40 02`` segment header.
    """
    if len(body) < 7 or body[0:3] != b"\x00\x02\x00":
        return None
    t0 = int.from_bytes(body[3:5], "big", signed=True)
    t1 = int.from_bytes(body[5:7], "big", signed=True)

    start = find_data_start(body)
    if start < 0:
        return [t0, t1]

    out = [t0, t1]
    cur = t1
    for blk in walk_body(body, start):
        if blk.tag_hi == 0x40:
            # Segment boundary: stop.  Multi-segment decode lives in
            # decode_waveform_v2.
            break
        if blk.tag_hi == 0x10:
            for byte in blk.data:
                for nib in ((byte >> 4) & 0xF, byte & 0xF):
                    cur += _s4(nib)
                    out.append(cur)
        elif blk.tag_hi == 0x20:
            for byte in blk.data:
                cur += _i8(byte)
                out.append(cur)
        elif blk.tag_hi == 0x00:
            # RLE zero deltas: append NN copies of current Tran value.
            for _ in range(blk.tag_lo):
                out.append(cur)
        # 30 NN: unknown content; skip.
    return out


def decode_waveform_v2(body: bytes) -> Optional[dict]:
    """
    Decode the body into per-channel sample arrays.

    Status (2026-05-11 evening — channel-rotation hypothesis CONFIRMED):
    segments rotate channels in fixed order **Tran → Vert → Long → MicL**.
    Each channel-segment carries a 2-sample anchor pair in segment-header
    bytes [14:18] (or in the body preamble for the initial Tran segment)
    plus a stream of delta blocks for samples 2 onward.

    Returns ``{"Tran": [...], "Vert": [...], "Long": [...], "MicL": [...]}``
    with each channel's decoded samples in 16-count units (LSB = 0.005
    in/s at Normal range).  Returns ``None`` if the body cannot be
    parsed.
    """
    if len(body) < 7 or body[0:3] != b"\x00\x02\x00":
        return None

    channels = ["Tran", "Vert", "Long", "MicL"]
    out: dict = {ch: [] for ch in channels}

    # Initial Tran segment: preamble anchor pair + delta blocks before first 40 02.
    t0 = int.from_bytes(body[3:5], "big", signed=True)
    t1 = int.from_bytes(body[5:7], "big", signed=True)
    out["Tran"].extend([t0, t1])

    start = find_data_start(body)
    if start < 0:
        return out

    blocks = walk_body(body, start)
    seg_idx = [i for i, b in enumerate(blocks) if b.tag_hi == 0x40]

    def apply_blocks(channel: str, anchor: int,
                     block_start: int, block_end: int) -> int:
        """Apply delta blocks [block_start, block_end) to *channel*'s sample
        list, starting from *anchor*.  Returns the final cumulative value."""
        cur = anchor
        for bi in range(block_start, block_end):
            blk = blocks[bi]
            if blk.tag_hi == 0x10:
                for byte in blk.data:
                    for nib in ((byte >> 4) & 0xF, byte & 0xF):
                        cur += _s4(nib)
                        out[channel].append(cur)
            elif blk.tag_hi == 0x20:
                for byte in blk.data:
                    cur += _i8(byte)
                    out[channel].append(cur)
            elif blk.tag_hi == 0x00:
                for _ in range(blk.tag_lo):
                    out[channel].append(cur)
            elif blk.tag_hi == 0x30:
                # 12-bit signed deltas, packed as NN/4 groups of 6 bytes each:
                #   bytes [0:2] = 16 bits = 4 × 4-bit high nibbles (MSB first)
                #   bytes [2:6] = 4 × int8 low bytes
                # Each delta = sign_extend_12((high_nibble << 8) | low_byte).
                # Confirmed 2026-05-11 against all 14 ``30 NN`` blocks in the
                # bundled fixtures.
                n_groups = blk.tag_lo // 4
                for g in range(n_groups):
                    grp = blk.data[g * 6 : (g + 1) * 6]
                    if len(grp) < 6:
                        break
                    high_word = (grp[0] << 8) | grp[1]
                    for k in range(4):
                        nib = (high_word >> (12 - 4 * k)) & 0xF
                        v = (nib << 8) | grp[2 + k]
                        if v >= 0x800:
                            v -= 0x1000
                        cur += v
                        out[channel].append(cur)
            # 40 02: should not occur in segment data.
        return cur

    # Initial Tran segment: deltas from start of body up to first 40 02 (or end).
    first_seg = seg_idx[0] if seg_idx else len(blocks)
    last_tran_value = apply_blocks("Tran", t1, 0, first_seg)

    # Subsequent segments rotate channels.  Each segment header carries:
    #   bytes [0:2] and [2:4] = 2 deltas extending the PREVIOUS channel
    #   bytes [14:16] and [16:18] = anchor pair for THIS segment's channel
    #
    # Rotation: V, L, M, T, V, L, M, T, ...  (initial Tran segment is the
    # implicit T in the cycle.)
    rotation = ["Vert", "Long", "MicL", "Tran"]
    # Track each channel's "running cumulative value" so we can apply the
    # previous-channel extension deltas at every segment boundary.
    last_value = {"Tran": last_tran_value, "Vert": None, "Long": None, "MicL": None}

    for k, hi in enumerate(seg_idx):
        channel = rotation[k % 4]
        prev_channel = "Tran" if k == 0 else rotation[(k - 1) % 4]
        header = blocks[hi]
        if len(header.data) < 18:
            continue
        # Extend the PREVIOUS channel by 2 more samples (deltas in bytes [0:4]).
        prev_d0 = int.from_bytes(header.data[0:2], "big", signed=True)
        prev_d1 = int.from_bytes(header.data[2:4], "big", signed=True)
        if last_value[prev_channel] is not None:
            v = last_value[prev_channel] + prev_d0
            out[prev_channel].append(v)
            v += prev_d1
            out[prev_channel].append(v)
            last_value[prev_channel] = v
        # Anchor pair for THIS segment's channel.
        c0 = int.from_bytes(header.data[14:16], "big", signed=True)
        c1 = int.from_bytes(header.data[16:18], "big", signed=True)
        out[channel].extend([c0, c1])
        # Apply delta blocks for this segment.
        next_hi = seg_idx[k + 1] if k + 1 < len(seg_idx) else len(blocks)
        last_value[channel] = apply_blocks(channel, c1, hi + 1, next_hi)

    return out