# Training parameter recommendations

**Hardware:** Dual Intel Xeon E5-2678 v3 (24 physical cores / 48 threads) + NVIDIA GTX 1050 Ti (4 GB VRAM)

**Purpose:** recommended, ready-to-apply parameter sets for the repository's two training flows:

- Card Model (`card_model/train_card_model.py`) — see `card_model/config.py`
- MCCFR Trainer (`mccfr_trainer.py`)

This document does not modify any code; it lists the variables and suggested values in two profiles (Quick / Dev and Balanced / Production). Edit the constants in the referenced files when you are ready.

Files to adjust (examples):

- Card Model config: [card_model/config.py](card_model/config.py#L65-L72)
- MCCFR trainer: [mccfr_trainer.py](mccfr_trainer.py#L76-L97)

---

## Summary recommendation for your machine (short)

- If you want fast iterations: use the **Quick / Dev** profile below.
- If you want longer runs for better final performance and have the time: use the **Balanced / Production** profile.

---

## Card Model (histogram + equity) — variables in `card_model/config.py`

Two profiles: Quick / Dev (iterate fast) and Balanced / Production.

### Quick / Dev (recommended for iterating)

- `NUM_TRAIN_SAMPLES` = 200_000
- `NUM_VAL_SAMPLES` = 10_000
- `NUM_ROLLOUTS` = 200
- `BATCH_SIZE` = 1024
- `NUM_EPOCHS` = 32
- `LEARNING_RATE` = 1e-3
- `WEIGHT_DECAY` = 1e-4
- `LAMBDA_MSE` = 0.1
- `NUM_WORKERS` = 20 (used for dataset generation and the DataLoader in this codebase; 20 is a good balance on 24 physical cores)

Notes:

- `NUM_ROLLOUTS=200` reduces data-generation cost (fewer Monte Carlo rollouts), so samples are cheaper to produce. Increase it to 1000 for higher-quality labels if you have the time.
- `BATCH_SIZE=1024` is safe for a GTX 1050 Ti (4 GB VRAM). If you see OOM during CardModel training, reduce it to 512.

### Balanced / Production (longer training, better final quality)

- `NUM_TRAIN_SAMPLES` = 2_000_000
- `NUM_VAL_SAMPLES` = 100_000
- `NUM_ROLLOUTS` = 1000
- `BATCH_SIZE` = 4096
- `NUM_EPOCHS` = 64
- `LEARNING_RATE` = 5e-4
- `WEIGHT_DECAY` = 1e-4
- `LAMBDA_MSE` = 0.1
- `NUM_WORKERS` = 22

Notes:

- The Production profile expects long wall-clock time and sustained CPU usage. With `NUM_WORKERS=22` you still leave two physical cores for OS/driver tasks.
- If training the CardModel on the GPU causes OOM, fall back to the CPU (`device=torch.device('cpu')`) or reduce `BATCH_SIZE`.

---

## MCCFR Trainer (`mccfr_trainer.py`) — main self-play + network training

Two profiles: Quick / Dev and Balanced / Production.

### Quick / Dev (safe to test)

- `NUM_ITERATIONS` = 1_000
- `GAMES_PER_ITER` = 200
- `NUM_WORKERS` = 20 (worker processes for self-play traversals; use the physical core count minus a few)
- `BUFFER_MAX_SIZE` = 500_000
- `MIN_BUFFER_SIZE_FOR_TRAIN` = 10_000
- `TRAIN_BATCH_SIZE` = 4_096
- `TRAIN_STEPS_PER_ITER` = 20
- `LEARNING_RATE` = 1e-3
- `WEIGHT_DECAY` = 1e-4
- `CLIP_GRAD_NORM` = 1.0
- `CARD_MODEL_CHECKPOINT` = `card_model/data/best_card_model.pt` (use the existing checkpoint if available)

Why these values?

- `NUM_WORKERS=20` uses most physical cores while leaving a few for the main process and the OS.
- `TRAIN_BATCH_SIZE=4096` is a conservative batch size that should fit in 4 GB VRAM for the small CFR network while still training efficiently.
- Reduce `MIN_BUFFER_SIZE_FOR_TRAIN` for faster first training iterations during experiments.
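For concreteness, the Quick / Dev profile as a paste-ready constants block is sketched below. The variable names come from this document's quick-reference list; their exact placement inside `mccfr_trainer.py` (around lines 76-97) is an assumption, so match the block against the real file before pasting.

```python
# Quick / Dev constants for mccfr_trainer.py. This is a sketch, not the
# file's actual contents; verify names and placement against the real module.
NUM_ITERATIONS = 1_000
GAMES_PER_ITER = 200
NUM_WORKERS = 20                    # self-play worker processes
BUFFER_MAX_SIZE = 500_000
MIN_BUFFER_SIZE_FOR_TRAIN = 10_000  # lower this for faster first train steps
TRAIN_BATCH_SIZE = 4_096
TRAIN_STEPS_PER_ITER = 20
LEARNING_RATE = 1e-3
WEIGHT_DECAY = 1e-4
CLIP_GRAD_NORM = 1.0
CARD_MODEL_CHECKPOINT = "card_model/data/best_card_model.pt"
```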
### Balanced / Production (long-run)

- `NUM_ITERATIONS` = 50_000
- `GAMES_PER_ITER` = 500
- `NUM_WORKERS` = 20
- `BUFFER_MAX_SIZE` = 2_000_000
- `MIN_BUFFER_SIZE_FOR_TRAIN` = 100_000
- `TRAIN_BATCH_SIZE` = 8_192
- `TRAIN_STEPS_PER_ITER` = 50
- `LEARNING_RATE` = 5e-4
- `WEIGHT_DECAY` = 1e-4
- `CLIP_GRAD_NORM` = 1.0

Notes:

- The CFR network is compact; even with 4 GB VRAM you can try `TRAIN_BATCH_SIZE` up to 8k-16k depending on other GPU activity. Start with 8k and monitor GPU memory with `nvidia-smi`.
- `NUM_WORKERS=20` is still recommended; avoid setting `NUM_WORKERS` at or above the number of physical cores, to reduce scheduling/oversubscription overhead.

---

## Suggested practical workflow (apply before long runs)

1. For a first end-to-end test, use the **Quick / Dev** profile for both the Card Model and the MCCFR Trainer.
2. Generate the CardModel training data once:
   - Run `python train_card_model.py` (it will generate or load `card_model/data/train_data.npz`).
   - If generation is too slow, reduce `NUM_TRAIN_SAMPLES` or `NUM_ROLLOUTS` in the Quick profile.
3. Train the CardModel to obtain `card_model/data/best_card_model.pt`.
4. Use that checkpoint with `mccfr_trainer.py` (set `CARD_MODEL_CHECKPOINT` if you want it loaded) and start MCCFR with the Quick / Dev profile.
5. If both steps succeed and you want to scale up, switch to the Balanced / Production profile.

Example commands:

- Generate & train the CardModel (from the repo root):

  ```
  python train_card_model.py
  ```

- Start the MCCFR trainer (from the repo root):

  ```
  python mccfr_trainer.py
  ```

Monitor GPU memory while training with `nvidia-smi -l 2` and reduce `BATCH_SIZE` / `TRAIN_BATCH_SIZE` if you see OOM.

---

## Notes & cautions

- The repository hardcodes some constants in `card_model/config.py` and `mccfr_trainer.py`. This document lists the variables and recommended values; you must edit the constants in those files, or override them in a wrapper script, before running.
- For multi-process data generation and MCCFR traversal, the code uses the `spawn` start method to avoid CUDA forking issues. Keep that unchanged.
- If you plan to fully utilize all 24 cores for data generation, avoid launching heavy background tasks. Disk I/O during parallel generation can be significant; make sure you have enough temporary disk space for the intermediate `.npz` files.

---

## Quick reference: exact variables to set

- `card_model/config.py`:
  - `NUM_TRAIN_SAMPLES`, `NUM_VAL_SAMPLES`, `NUM_ROLLOUTS`, `BATCH_SIZE`, `NUM_EPOCHS`, `LEARNING_RATE`, `WEIGHT_DECAY`, `LAMBDA_MSE`, `NUM_WORKERS`.
- `mccfr_trainer.py`:
  - `NUM_ITERATIONS`, `GAMES_PER_ITER`, `NUM_WORKERS`, `BUFFER_MAX_SIZE`, `MIN_BUFFER_SIZE_FOR_TRAIN`, `TRAIN_BATCH_SIZE`, `TRAIN_STEPS_PER_ITER`, `LEARNING_RATE`, `WEIGHT_DECAY`, `CLIP_GRAD_NORM`, `CARD_MODEL_CHECKPOINT`.

---

If you want, I can now write a small wrapper script that launches CardModel data generation and training, then launches MCCFR with the chosen profile (no changes to the core files; the wrapper sets the values at runtime). Reply if you want that wrapper created.
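As a starting point, here is a minimal sketch of what such a wrapper could look like for the Card Model half. It assumes the profile values are plain module-level attributes of `card_model.config` (as the quick reference above suggests) and that `train_card_model.py` sits at the repo root and reads the config at import time; the script name `run_quick_dev.py` is hypothetical and nothing below is taken from the actual repository code.

```python
# run_quick_dev.py: hypothetical wrapper, not part of the repository.
# Patches the Quick / Dev profile onto card_model.config at runtime, then
# executes the training script as if it had been run from the command line.
import runpy

import card_model.config as cfg

# Quick / Dev profile (values from this document).
cfg.NUM_TRAIN_SAMPLES = 200_000
cfg.NUM_VAL_SAMPLES = 10_000
cfg.NUM_ROLLOUTS = 200
cfg.BATCH_SIZE = 1024
cfg.NUM_EPOCHS = 32
cfg.LEARNING_RATE = 1e-3
cfg.WEIGHT_DECAY = 1e-4
cfg.LAMBDA_MSE = 0.1
cfg.NUM_WORKERS = 20

if __name__ == "__main__":
    # The __main__ guard matters because the repo uses the `spawn` start
    # method. Caveat: spawned workers re-import modules from disk, so these
    # runtime patches may not reach worker processes; editing
    # card_model/config.py directly remains the most reliable option for
    # the data-generation phase.
    runpy.run_path("train_card_model.py", run_name="__main__")
```

Run it from the repo root with `python run_quick_dev.py`. Note that the same trick does not carry over cleanly to `mccfr_trainer.py`: its constants are defined in the script itself, so re-executing it would overwrite any runtime patches, and editing the file is the simpler route there.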