
Training parameter recommendations

Hardware: Dual Intel Xeon E5-2678 v3 (24 physical cores / 48 threads) + NVIDIA GTX 1050 Ti (4 GB VRAM)

Purpose: recommended, ready-to-apply parameter sets for the repository's two training flows:

  • Card Model (card_model/train_card_model.py) — see card_model/config.py
  • MCCFR Trainer (mccfr_trainer.py)

This document does not modify any code; it lists the relevant variables and suggested values in two profiles (Quick/Dev and Balanced/Production). Edit the constants in the referenced files when you are ready.

Files to adjust:

  • card_model/config.py (Card Model hyperparameters)
  • mccfr_trainer.py (MCCFR self-play and training constants)

Summary recommendation for your machine (short)

  • If you want fast iterations: use the Quick / Dev profile below.
  • If you want longer runs for better final performance and have time: use the Balanced / Production profile.

Card Model (histogram + equity) — variables in card_model/config.py

Two profiles: Quick / Dev (iterate fast) and Balanced / Production.

Quick / Dev (iterate fast)

  • NUM_TRAIN_SAMPLES = 200_000
  • NUM_VAL_SAMPLES = 10_000
  • NUM_ROLLOUTS = 200
  • BATCH_SIZE = 1024
  • NUM_EPOCHS = 32
  • LEARNING_RATE = 1e-3
  • WEIGHT_DECAY = 1e-4
  • LAMBDA_MSE = 0.1
  • NUM_WORKERS = 20 # used for dataset generation and DataLoader in this codebase; 20 is a good balance on 24 cores

Notes:

  • NUM_ROLLOUTS=200 reduces data-generation cost (fewer MC rollouts) so samples are cheaper to produce. Increase to 1000 for higher-quality labels if you have time.
  • BATCH_SIZE=1024 is safe for GTX 1050 Ti (4 GB VRAM). If you see OOM during CardModel training, reduce to 512.
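The reduce-on-OOM advice above can also be automated with a simple halving loop. This is a hypothetical sketch, not code from the repository: `fake_train` stands in for the real training step, and the OOM signal is simulated with a plain RuntimeError (PyTorch's CUDA OOM error is a RuntimeError subclass whose message contains "out of memory").

```python
def find_workable_batch_size(train_one_epoch, start=1024, floor=64):
    """Halve the batch size until the training step stops raising OOM.

    train_one_epoch: callable taking a batch size; raises RuntimeError on OOM.
    Returns the first batch size that runs, or raises if even `floor` fails.
    """
    batch_size = start
    while batch_size >= floor:
        try:
            train_one_epoch(batch_size)
            return batch_size
        except RuntimeError as err:
            if "out of memory" not in str(err).lower():
                raise  # unrelated error: do not mask it
            batch_size //= 2
    raise RuntimeError(f"OOM even at batch size {floor}")


# Simulated training step: pretend anything above 512 overflows 4 GB VRAM.
def fake_train(batch_size):
    if batch_size > 512:
        raise RuntimeError("CUDA out of memory")


print(find_workable_batch_size(fake_train))  # -> 512 (fell back from 1024)
```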

Balanced / Production (longer training, better final quality)

  • NUM_TRAIN_SAMPLES = 2_000_000
  • NUM_VAL_SAMPLES = 100_000
  • NUM_ROLLOUTS = 1000
  • BATCH_SIZE = 4096
  • NUM_EPOCHS = 64
  • LEARNING_RATE = 5e-4
  • WEIGHT_DECAY = 1e-4
  • LAMBDA_MSE = 0.1
  • NUM_WORKERS = 22

Notes:

  • The Production profile expects long wall-clock times and sustained CPU usage. With NUM_WORKERS=22 you still leave two physical cores free for OS/driver tasks.
  • If training the CardModel on the GPU causes OOM, fall back to the CPU (device=torch.device('cpu')) or reduce BATCH_SIZE.
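Both Card Model profiles can be kept side by side as plain dictionaries and applied by name, for example from a small wrapper script. The keys mirror the constants in card_model/config.py; the `apply_profile` helper is a hypothetical sketch (here demonstrated against a SimpleNamespace stand-in rather than the real config module).

```python
import types

CARD_MODEL_PROFILES = {
    "quick": {  # fast iterations, cheaper labels
        "NUM_TRAIN_SAMPLES": 200_000, "NUM_VAL_SAMPLES": 10_000,
        "NUM_ROLLOUTS": 200, "BATCH_SIZE": 1024, "NUM_EPOCHS": 32,
        "LEARNING_RATE": 1e-3, "WEIGHT_DECAY": 1e-4,
        "LAMBDA_MSE": 0.1, "NUM_WORKERS": 20,
    },
    "production": {  # longer runs, higher-quality labels
        "NUM_TRAIN_SAMPLES": 2_000_000, "NUM_VAL_SAMPLES": 100_000,
        "NUM_ROLLOUTS": 1000, "BATCH_SIZE": 4096, "NUM_EPOCHS": 64,
        "LEARNING_RATE": 5e-4, "WEIGHT_DECAY": 1e-4,
        "LAMBDA_MSE": 0.1, "NUM_WORKERS": 22,
    },
}


def apply_profile(config_module, name):
    """Overwrite module-level constants with the chosen profile's values."""
    for key, value in CARD_MODEL_PROFILES[name].items():
        setattr(config_module, key, value)


# Demo: apply the quick profile to a stand-in for card_model.config.
cfg = types.SimpleNamespace()
apply_profile(cfg, "quick")
print(cfg.BATCH_SIZE)  # -> 1024
```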

MCCFR Trainer (mccfr_trainer.py) — main self-play + network training

Two profiles: Quick / Dev and Balanced / Production.

Quick / Dev (safe to test)

  • NUM_ITERATIONS = 1_000
  • GAMES_PER_ITER = 200
  • NUM_WORKERS = 20 # worker processes for self-play traversals (use physical cores minus a few)
  • BUFFER_MAX_SIZE = 500_000
  • MIN_BUFFER_SIZE_FOR_TRAIN = 10_000
  • TRAIN_BATCH_SIZE = 4_096
  • TRAIN_STEPS_PER_ITER = 20
  • LEARNING_RATE = 1e-3
  • WEIGHT_DECAY = 1e-4
  • CLIP_GRAD_NORM = 1.0
  • CARD_MODEL_CHECKPOINT = card_model/data/best_card_model.pt (use existing checkpoint if available)

Why these values?

  • NUM_WORKERS=20 uses most physical cores while leaving a few cores for the main process and OS.
  • TRAIN_BATCH_SIZE=4096 is a conservative batch that should fit in 4 GB VRAM for the small CFR network and allow efficient training.
  • Reduce MIN_BUFFER_SIZE_FOR_TRAIN for faster first training iterations during experiments.
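The buffer knobs interact in a simple way: self-play transitions accumulate in a bounded buffer, and network training only starts once MIN_BUFFER_SIZE_FOR_TRAIN samples are available. A minimal sketch of that gating, assuming the buffer evicts the oldest samples first (the actual eviction policy in mccfr_trainer.py may differ):

```python
import random
from collections import deque

BUFFER_MAX_SIZE = 500_000
MIN_BUFFER_SIZE_FOR_TRAIN = 10_000

# deque(maxlen=...) drops the oldest entries automatically when full.
buffer = deque(maxlen=BUFFER_MAX_SIZE)


def add_samples(samples):
    buffer.extend(samples)


def ready_to_train():
    return len(buffer) >= MIN_BUFFER_SIZE_FOR_TRAIN


def sample_batch(batch_size=4096):
    if not ready_to_train():
        return None  # keep collecting self-play data
    return random.sample(list(buffer), min(batch_size, len(buffer)))


add_samples(range(5_000))
print(ready_to_train())  # -> False: below the 10_000 threshold
```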

Balanced / Production (long-run)

  • NUM_ITERATIONS = 50_000
  • GAMES_PER_ITER = 500
  • NUM_WORKERS = 20
  • BUFFER_MAX_SIZE = 2_000_000
  • MIN_BUFFER_SIZE_FOR_TRAIN = 100_000
  • TRAIN_BATCH_SIZE = 8_192
  • TRAIN_STEPS_PER_ITER = 50
  • LEARNING_RATE = 5e-4
  • WEIGHT_DECAY = 1e-4
  • CLIP_GRAD_NORM = 1.0

Notes:

  • The CFR network is compact; even with 4 GB of VRAM you can try TRAIN_BATCH_SIZE up to 8k-16k depending on other GPU activity. Start with 8k and monitor GPU memory with nvidia-smi.
  • NUM_WORKERS=20 still recommended; avoid setting NUM_WORKERS >= number of physical cores to reduce scheduling/oversubscription overhead.
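The "physical cores minus a few" rule can be computed instead of hard-coded. Note that os.cpu_count() reports logical threads (48 on this machine with Hyper-Threading), so a sketch needs to halve it before subtracting headroom; `recommended_workers` is a hypothetical helper, not part of the repository:

```python
import os


def recommended_workers(logical_cpus=None, hyperthreading=True, reserve=4):
    """Physical cores minus a small reserve for the main process and OS."""
    if logical_cpus is None:
        logical_cpus = os.cpu_count() or 1
    physical = logical_cpus // 2 if hyperthreading else logical_cpus
    return max(1, physical - reserve)


print(recommended_workers(logical_cpus=48))  # -> 20 (24 physical cores - 4)
```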

Suggested practical workflow (apply these before long runs)

  1. For a first end-to-end test, use the Quick / Dev profile for both Card Model and MCCFR Trainer.
  2. Generate CardModel training data once:
    • Run python train_card_model.py (it will generate or load card_model/data/train_data.npz).
    • If generation is too slow, reduce NUM_TRAIN_SAMPLES or NUM_ROLLOUTS in the Quick profile.
  3. Train CardModel to obtain card_model/data/best_card_model.pt.
  4. Use that checkpoint with mccfr_trainer.py (set CARD_MODEL_CHECKPOINT if you want to load it) and start MCCFR with Quick/Dev profile.
  5. If both steps succeed and you want to scale up, switch to the Balanced/Production profile.
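The workflow above can be scripted end to end. A minimal sketch, assuming both entry points are runnable from the repo root under the names used in this document; `run_pipeline` is a hypothetical wrapper, and `check=True` makes it stop at the first failing step:

```python
import subprocess
import sys
from pathlib import Path


def pipeline_commands():
    """The two training steps, in order, as argv lists."""
    return [[sys.executable, "train_card_model.py"],
            [sys.executable, "mccfr_trainer.py"]]


def run_pipeline(repo_root="."):
    """Run CardModel training, then MCCFR; raise on the first failure."""
    for cmd in pipeline_commands():
        subprocess.run(cmd, cwd=repo_root, check=True)
    # After step 1 this checkpoint should exist for MCCFR to load.
    return Path(repo_root, "card_model", "data", "best_card_model.pt")
```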

Command examples:

  • Generate & train the CardModel (from the repo root):
    python train_card_model.py
  • Start the MCCFR trainer (from the repo root):
    python mccfr_trainer.py

Monitor GPU memory while training with nvidia-smi -l 2 and reduce BATCH_SIZE / TRAIN_BATCH_SIZE if you see OOM.
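If you want an automated guard rather than watching nvidia-smi by hand, the used-memory figure can be polled via nvidia-smi's CSV query mode (nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits). The parser below is demonstrated against a canned sample string so it runs without a GPU; the sample value is illustrative, not a real measurement.

```python
import subprocess


def used_vram_mib(raw=None):
    """Return used VRAM in MiB for GPU 0.

    With raw=None this shells out to nvidia-smi; pass a string to parse
    captured output instead (useful for testing without a GPU).
    """
    if raw is None:
        raw = subprocess.check_output(
            ["nvidia-smi", "--query-gpu=memory.used",
             "--format=csv,noheader,nounits"],
            text=True,
        )
    return int(raw.splitlines()[0].strip())


# Canned sample output (one line per GPU, value in MiB):
print(used_vram_mib("3512\n"))  # -> 3512, i.e. most of the 4096 MiB total
```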


Notes & cautions

  • The repository hardcodes some constants in card_model/config.py and mccfr_trainer.py. This document lists the variables and recommended values — you must edit the constants in those files or override them in a wrapper script before running.
  • For multi-process data generation and MCCFR traversal, the code uses the spawn start method to avoid CUDA forking issues. Keep that unchanged.
  • If you plan to fully utilize all 24 cores for data generation, avoid launching heavy background tasks. Disk I/O during parallel generation can be significant; make sure you have enough temporary disk space for intermediate .npz files.
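The disk-space caution above can be turned into a pre-flight check with shutil.disk_usage. This is a rough sketch: the per-sample byte size is an assumption (measure one of your actual generated .npz files and adjust), and the safety factor is meant to cover intermediate per-worker files.

```python
import shutil


def enough_disk_for_npz(path=".", num_samples=2_000_000,
                        bytes_per_sample=256, safety_factor=2.0):
    """Rough pre-flight disk check before parallel .npz generation.

    bytes_per_sample is an assumed average, not measured from the repo;
    safety_factor pads for intermediate per-worker files.
    """
    needed = int(num_samples * bytes_per_sample * safety_factor)
    free = shutil.disk_usage(path).free
    return free >= needed, needed, free


ok, needed, free = enough_disk_for_npz(num_samples=200_000)
print(needed)  # -> 102_400_000 bytes (~98 MiB) for the Quick profile
```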

Quick reference: exact variables to set

  • card_model/config.py:
    • NUM_TRAIN_SAMPLES, NUM_VAL_SAMPLES, NUM_ROLLOUTS, BATCH_SIZE, NUM_EPOCHS, LEARNING_RATE, NUM_WORKERS, WEIGHT_DECAY.
  • mccfr_trainer.py:
    • NUM_ITERATIONS, GAMES_PER_ITER, NUM_WORKERS, BUFFER_MAX_SIZE, MIN_BUFFER_SIZE_FOR_TRAIN, TRAIN_BATCH_SIZE, TRAIN_STEPS_PER_ITER, LEARNING_RATE, WEIGHT_DECAY, CARD_MODEL_CHECKPOINT.
