13 lines
3.0 KiB
JSON
13 lines
3.0 KiB
JSON
{
|
||
"title": "PokerBrain dataset structure",
|
||
"messages": [
|
||
{
|
||
"role": "user",
|
||
"content": "Let’s start the PokerBrain fine-tune project — stage 1: dataset structure.\n\nThis is an offshoot from our recent model work in the HI50 with mythomax. We found it reasons well, but it doesnt know anything about poker, and just sorta makes stuff up to sound confident. We need to possibly fine tune a model to use as a poker brain. "
|
||
},
|
||
{
|
||
"role": "assistant",
|
||
"content": "Perfect — this is a really strong next step. \n\nLet’s keep this in bite-sized, verifiable chunks. Stage 1 will focus purely on **dataset structure and format** (no training yet). \n\n---\n\n### 🧠 PokerBrain Fine-Tune — Stage 1 Goal\nDesign a **consistent dataset format** that can later be used to fine-tune or instruct-tune a smaller model (7B–13B) to reason accurately about poker hands, lines, and concepts.\n\n---\n\n### Step 1 — Define Dataset Purpose\nWe’re teaching the model to:\n1. **Reason about hands** (spot-by-spot logic, ranges, EV decisions). \n2. **Answer strategy questions** (“What should Hero do?” “Why is this a fold?”). \n3. **Recognize game types** (NLH, PLO, Stud, etc.) and adjust reasoning accordingly. \n4. **Explain like a coach**, not like a dictionary. \n\nSo our dataset needs examples of:\n- Annotated hands \n- Concept explanations \n- Mistake corrections \n- General Q&A \n\n---\n\n### Step 2 — Choose a Base Schema\nWe’ll use a structured **JSONL** format (one sample per line), like this:\n\n```json\n{\n \"id\": \"nlh_hand_0001\",\n \"game\": \"NLH\",\n \"stakes\": \"$1/$3\",\n \"scenario\": \"Hero opens UTG+1 to $15 with As Qc. Button calls, BB calls. Flop Ad Td 7c. Button bets $30 after Hero checks.\",\n \"question\": \"What should Hero do facing this bet?\",\n \"answer\": \"Hero should call. The board favors Hero’s range and calling keeps weaker Ax and draws in. Raising folds out bluffs and isolates better hands.\",\n \"tags\": [\"flop\", \"range advantage\", \"call vs raise\", \"single-raised pot\"],\n \"difficulty\": \"intermediate\",\n \"source\": \"manual\",\n \"format_version\": \"1.0\"\n}\n```\n\nLater, we can expand this to include **multi-turn reasoning chains** or **coach/student dialogue** examples.\n\n---\n\n### Step 3 — Sub-datasets (eventual categories)\nWe’ll likely split data into:\n- **Hands** — structured reasoning examples \n- **Concepts** — short lessons (e.g., “What is SPR?”) \n- **Corrections** — show bad reasoning → corrected explanation \n- **Quizzes** — question → correct answer \n\n---\n\nBefore moving to step 4 (template variants + data generation workflow), I want to confirm: \nWould you prefer this dataset to be in a **chat-style format** (like Alpaca or ChatML, e.g., system/user/assistant messages), or a **single-turn Q&A** format (question → answer only)? \n\nBoth can work — chat-style helps future dialogue use (e.g., coach mode), while single-turn is cleaner for base reasoning."
|
||
}
|
||
]
|
||
} |