Training parameter recommendations
Hardware: Dual Intel Xeon E5-2678 v3 (24 physical cores / 48 threads) + NVIDIA GTX 1050 Ti (4 GB VRAM)
Purpose: recommended, ready-to-apply parameter sets for the repository's two training flows:
- Card Model (`card_model/train_card_model.py`), see `card_model/config.py`
- MCCFR Trainer (`mccfr_trainer.py`)
Do not modify code automatically; this document lists the variables and suggested values (two profiles: Quick/Dev and Balanced/Production). Edit the constants in the referenced files when you are ready.
Files to adjust (examples):
- Card Model config: `card_model/config.py`
- MCCFR trainer: `mccfr_trainer.py`
Summary recommendation for your machine (short)
- If you want fast iterations: use the Quick / Dev profile below.
- If you want longer runs for better final performance and have the time: use the Balanced / Production profile.
Card Model (histogram + equity) — variables in card_model/config.py
Two profiles: Quick / Dev (iterate fast) and Balanced / Production.
Quick / Dev (recommended to iterate)
```python
NUM_TRAIN_SAMPLES = 200_000
NUM_VAL_SAMPLES = 10_000
NUM_ROLLOUTS = 200
BATCH_SIZE = 1024
NUM_EPOCHS = 32
LEARNING_RATE = 1e-3
WEIGHT_DECAY = 1e-4
LAMBDA_MSE = 0.1
NUM_WORKERS = 20  # used for dataset generation and DataLoader in this codebase; a good balance on 24 cores
```
Notes:
- `NUM_ROLLOUTS = 200` reduces data-generation cost (fewer MC rollouts), so samples are cheaper to produce. Increase to 1000 for higher-quality labels if you have time.
- `BATCH_SIZE = 1024` is safe for a GTX 1050 Ti (4 GB VRAM). If you see OOM during CardModel training, reduce it to 512.
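The OOM fallback above can be automated with a simple halving search. A minimal sketch follows; `trial_fn` stands in for one hypothetical training step at a given batch size, and nothing in this snippet comes from the repository itself:

```python
def pick_batch_size(trial_fn, start=1024, floor=128):
    """Halve the batch size until trial_fn(batch_size) no longer raises.

    RuntimeError stands in for a CUDA out-of-memory error here; with
    PyTorch you would catch torch.cuda.OutOfMemoryError (a RuntimeError
    subclass) instead.
    """
    bs = start
    while bs >= floor:
        try:
            trial_fn(bs)  # hypothetical: run one training step at this size
            return bs
        except RuntimeError:
            bs //= 2  # halve and retry on OOM
    raise RuntimeError(f"no feasible batch size >= {floor}")
```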
Balanced / Production (longer training, better final quality)
```python
NUM_TRAIN_SAMPLES = 2_000_000
NUM_VAL_SAMPLES = 100_000
NUM_ROLLOUTS = 1000
BATCH_SIZE = 4096
NUM_EPOCHS = 64
LEARNING_RATE = 5e-4
WEIGHT_DECAY = 1e-4
LAMBDA_MSE = 0.1
NUM_WORKERS = 22
```
Notes:
- The Production profile expects long wall-clock time and sustained CPU usage. With `NUM_WORKERS = 22` you still leave 2 physical cores for OS/driver tasks.
- If training CardModel on the GPU causes OOM, fall back to CPU (`device = torch.device('cpu')`) or reduce `BATCH_SIZE`.
MCCFR Trainer (mccfr_trainer.py) — main self-play + network training
Two profiles: Quick / Dev and Balanced / Production.
Quick / Dev (safe to test)
```python
NUM_ITERATIONS = 1_000
GAMES_PER_ITER = 200
NUM_WORKERS = 20  # worker processes for self-play traversals (use physical cores minus a few)
BUFFER_MAX_SIZE = 500_000
MIN_BUFFER_SIZE_FOR_TRAIN = 10_000
TRAIN_BATCH_SIZE = 4_096
TRAIN_STEPS_PER_ITER = 20
LEARNING_RATE = 1e-3
WEIGHT_DECAY = 1e-4
CLIP_GRAD_NORM = 1.0
CARD_MODEL_CHECKPOINT = "card_model/data/best_card_model.pt"  # use the existing checkpoint if available
```
Why these values?
- `NUM_WORKERS = 20` uses most physical cores while leaving a few for the main process and OS.
- `TRAIN_BATCH_SIZE = 4096` is a conservative batch that should fit in 4 GB VRAM for the small CFR network while still training efficiently.
- Reduce `MIN_BUFFER_SIZE_FOR_TRAIN` for faster first training iterations during experiments.
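As a sanity check on `MIN_BUFFER_SIZE_FOR_TRAIN` against `GAMES_PER_ITER`, you can estimate how many self-play iterations pass before the first training step. The samples-per-game figure below is an illustrative assumption, not a number measured from this codebase:

```python
import math

def iters_until_training(min_buffer_size, games_per_iter, samples_per_game):
    """Self-play iterations needed before the buffer is large enough to train."""
    samples_per_iter = games_per_iter * samples_per_game
    return math.ceil(min_buffer_size / samples_per_iter)

# Quick / Dev profile, assuming ~25 samples per self-played game (illustrative):
# iters_until_training(10_000, 200, 25) == 2
```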
Balanced / Production (long-run)
```python
NUM_ITERATIONS = 50_000
GAMES_PER_ITER = 500
NUM_WORKERS = 20
BUFFER_MAX_SIZE = 2_000_000
MIN_BUFFER_SIZE_FOR_TRAIN = 100_000
TRAIN_BATCH_SIZE = 8_192
TRAIN_STEPS_PER_ITER = 50
LEARNING_RATE = 5e-4
WEIGHT_DECAY = 1e-4
CLIP_GRAD_NORM = 1.0
```
Notes:
- The CFR network is compact; even with 4 GB VRAM you can try `TRAIN_BATCH_SIZE` up to 8k-16k depending on other GPU activity. Start with 8k and monitor GPU memory with `nvidia-smi`.
- `NUM_WORKERS = 20` is still recommended; avoid setting `NUM_WORKERS` at or above the number of physical cores, to reduce scheduling/oversubscription overhead.
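Monitoring can also be scripted: `nvidia-smi --query-gpu=memory.used --format=csv,noheader` prints lines like `1234 MiB`, which a wrapper can parse to react to memory pressure. A sketch (not part of the repository):

```python
import subprocess

def gpu_memory_used_mib(smi_line: str) -> int:
    """Parse one line of nvidia-smi CSV output, e.g. '1234 MiB' -> 1234."""
    return int(smi_line.strip().split()[0])

def query_gpu_memory() -> int:
    """Query the first GPU's used memory in MiB (requires nvidia-smi on PATH)."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return gpu_memory_used_mib(out.splitlines()[0])
```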
Suggested practical workflow (apply these before long runs)
- For a first end-to-end test, use the Quick / Dev profile for both Card Model and MCCFR Trainer.
- Generate CardModel training data once:
  - Run `python train_card_model.py` (it will generate or load `card_model/data/train_data.npz`).
  - If generation is too slow, reduce `NUM_TRAIN_SAMPLES` or `NUM_ROLLOUTS` in the Quick profile.
- Train CardModel to obtain `card_model/data/best_card_model.pt`.
- Use that checkpoint with `mccfr_trainer.py` (set `CARD_MODEL_CHECKPOINT` if you want to load it) and start MCCFR with the Quick / Dev profile.
- If both steps succeed and you want to scale up, switch to the Balanced / Production profile.
Example commands:
- Generate & train CardModel (from repo root): `python train_card_model.py`
- Start MCCFR trainer (from repo root): `python mccfr_trainer.py`
Monitor GPU memory while training with `nvidia-smi -l 2` and reduce `BATCH_SIZE` / `TRAIN_BATCH_SIZE` if you see OOM.
Notes & cautions
- The repository hardcodes some constants in `card_model/config.py` and `mccfr_trainer.py`. This document only lists the variables and recommended values; you must edit the constants in those files, or override them in a wrapper script, before running.
- For multi-process data generation and MCCFR traversal, the code uses the `spawn` start method to avoid CUDA forking issues. Keep that unchanged.
- If you plan to fully utilize all 24 cores for data generation, avoid launching heavy background tasks. Disk I/O during parallel generation can be significant; make sure you have enough temporary disk space for intermediate `.npz` files.
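The disk-space caution can be checked programmatically before a long run. The 10 GiB threshold below is an arbitrary illustration, not a measured requirement of this codebase:

```python
import shutil

def enough_tmp_space(path=".", required_gib=10):
    """Return True if `path` has at least `required_gib` GiB free,
    e.g. for intermediate .npz files during parallel data generation."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gib * 1024**3
```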
Quick reference: exact variables to set
- `card_model/config.py`: `NUM_TRAIN_SAMPLES`, `NUM_VAL_SAMPLES`, `NUM_ROLLOUTS`, `BATCH_SIZE`, `NUM_EPOCHS`, `LEARNING_RATE`, `NUM_WORKERS`, `WEIGHT_DECAY`.
- `mccfr_trainer.py`: `NUM_ITERATIONS`, `GAMES_PER_ITER`, `NUM_WORKERS`, `BUFFER_MAX_SIZE`, `MIN_BUFFER_SIZE_FOR_TRAIN`, `TRAIN_BATCH_SIZE`, `TRAIN_STEPS_PER_ITER`, `LEARNING_RATE`, `WEIGHT_DECAY`, `CARD_MODEL_CHECKPOINT`.
A small wrapper script can launch CardModel data generation and training, then launch MCCFR with the chosen profile, setting the values at runtime without modifying the core files.
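Such a wrapper could override the module-level constants at import time before calling into the training code. A minimal sketch, assuming the modules expose these constants at top level (the module name matches the file above, but the `main()` entry point is a guess):

```python
import importlib

QUICK_DEV_MCCFR = {  # values taken from the Quick / Dev profile above
    "NUM_ITERATIONS": 1_000,
    "GAMES_PER_ITER": 200,
    "NUM_WORKERS": 20,
    "MIN_BUFFER_SIZE_FOR_TRAIN": 10_000,
}

def apply_profile(module_name, profile):
    """Import a module and overwrite its constants with profile values.

    Only names the module already defines are touched, so a typo in a
    profile cannot silently create a new, unused attribute.
    """
    mod = importlib.import_module(module_name)
    for name, value in profile.items():
        if hasattr(mod, name):
            setattr(mod, name, value)
    return mod

# Hypothetical usage:
# trainer = apply_profile("mccfr_trainer", QUICK_DEV_MCCFR)
# trainer.main()  # assumed entry point; adapt to the real script
```

Note this only works if the training code reads the constants after import; constants captured at definition time (e.g. as default arguments) would not be affected.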