project-lyra

serversdown/project-lyra


Commit Graph


Author

SHA1

Message

Date

serversdown

194e3e64b9

feat: import raw ChatGPT export (new sharded format)

OpenAI's export changed: conversations.json is now sharded into
conversations-000.json..NNN.json, each a JSON array of conversations with the
mapping tree and per-message create_time.

ingest now reads that format directly (supersedes the old convert/trim/split
scripts): walks each conversation's mapping ordered by create_time, keeps text
and multimodal_text (drops thoughts/reasoning_recap), captures real per-message
timestamps, and imports idempotently by conversation_id. `lyra-import <dir>`
auto-detects raw-export vs legacy {title,messages} dirs; optional limit arg.
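
The walk described above could be sketched roughly as follows. This is an illustration, not the code in lyra/ingest.py. The field names (`mapping`, `message`, `author.role`, `content.content_type`, `content.parts`, `create_time`, `conversation_id`) are assumptions based on the export layout named in this message. The helpers `detect_format`, `extract_messages`, `iter_conversations` and `import_export` are hypothetical, as is the plain dict standing in for Lyra's memory store. The user/assistant role filter is carried over from the earlier importer and may not match the actual behaviour.

```python
import json
from pathlib import Path

# Content types kept by the importer; thoughts / reasoning_recap are dropped.
KEEP_TYPES = {"text", "multimodal_text"}


def detect_format(export_dir):
    """Guess the directory type: 'raw' if sharded export files are present.

    Assumption: the real CLI may use a different detection rule.
    """
    has_shards = any(Path(export_dir).glob("conversations-*.json"))
    return "raw" if has_shards else "legacy"


def extract_messages(conversation):
    """Return (create_time, role, text) tuples ordered by create_time."""
    rows = []
    for node in conversation.get("mapping", {}).values():
        msg = node.get("message")
        if not msg:
            continue  # structural nodes carry no message
        content = msg.get("content") or {}
        if content.get("content_type") not in KEEP_TYPES:
            continue
        role = (msg.get("author") or {}).get("role")
        if role not in ("user", "assistant"):
            continue  # assumption: same role filter as the legacy importer
        # multimodal parts may be dicts (image assets); keep only the strings
        parts = [p for p in content.get("parts", []) if isinstance(p, str)]
        text = "\n".join(parts).strip()
        if text:
            rows.append((msg.get("create_time") or 0.0, role, text))
    rows.sort(key=lambda row: row[0])
    return rows


def iter_conversations(export_dir):
    """Yield conversations from every conversations-NNN.json shard, in order."""
    for shard in sorted(Path(export_dir).glob("conversations-*.json")):
        with shard.open(encoding="utf-8") as fh:
            yield from json.load(fh)  # each shard is a JSON array


def import_export(export_dir, store, limit=None):
    """Import into `store` (a dict keyed by conversation id); skip known ids."""
    imported = 0
    for conv in iter_conversations(export_dir):
        if limit is not None and imported >= limit:
            break
        conv_id = conv.get("conversation_id") or conv.get("id")
        if conv_id is None or conv_id in store:
            continue  # idempotent: a re-run adds nothing new
        store[conv_id] = extract_messages(conv)
        imported += 1
    return imported
```

Keying the store by conversation id is what makes a second run over the same directory a no-op in this sketch.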

Verified on 15 conversations: real dates, correct ordering, recall returns
dated poker history.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-06-16 02:40:32 +00:00

serversdown

f3037b7879

feat: ChatGPT chat-log importer

Import the parser's {title, messages} JSON into Lyra's memory so past
conversations seed recall (and, later, the era-rollup tier).

- lyra/ingest.py: one conversation -> one session, text messages -> exchanges;
  skips non-text (image asset) messages and non user/assistant roles; embeddings
  batched; idempotent by filename-derived session id; `lyra-import <dir>` CLI
- memory.add_exchanges_bulk: batched insert of pre-embedded rows
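
As a rough illustration of the flow in the list above (not the actual lyra/ingest.py), the sketch below loads one {title, messages} file, filters it, and embeds in batches. It assumes each message is a dict with `role` and a string `content`, and that each kept message becomes one row; the real message-to-exchange mapping may differ. `session_id_for`, `load_text_messages`, `batched`, `import_file`, the `chatgpt-` id prefix, and the `embed_batch` callable are all hypothetical. The returned rows stand in for whatever memory.add_exchanges_bulk accepts.

```python
import json
from pathlib import Path


def session_id_for(path):
    """Derive a stable session id from the filename (assumed scheme)."""
    return "chatgpt-" + Path(path).stem


def load_text_messages(path):
    """Return (role, text) pairs, skipping non-text and non user/assistant."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    kept = []
    for msg in data.get("messages", []):
        if msg.get("role") not in ("user", "assistant"):
            continue
        content = msg.get("content")
        # assumption: image-asset messages do not carry a plain string body
        if not isinstance(content, str) or not content.strip():
            continue
        kept.append((msg["role"], content.strip()))
    return kept


def batched(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def import_file(path, embed_batch, known_sessions, batch_size=64):
    """Build pre-embedded rows for one file; return [] if already imported."""
    session_id = session_id_for(path)
    if session_id in known_sessions:
        return []  # idempotent re-run
    rows = []
    for chunk in batched(load_text_messages(path), batch_size):
        # one embedding call per batch instead of one per message
        vectors = embed_batch([text for _, text in chunk])
        for (role, text), vector in zip(chunk, vectors):
            rows.append((session_id, role, text, vector))
    known_sessions.add(session_id)
    return rows
```

Passing `embed_batch` in as a callable keeps the sketch independent of any particular embedding backend.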

Format has no timestamps yet, so imports are stamped at import time; a future
dated export will let era memory group by real calendar time.

Verified on the 68-file lyra dev set: 7519 exchanges, idempotent re-run, recall
returns relevant history.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-06-16 00:51:45 +00:00

2 Commits