26 KiB
神经网络架构 & C++ 在线推理指南
1. 系统总览
整个推理流水线由 两个神经网络 串联组成:
游戏状态 (OpenSpiel / Botzone)
│
├─── 提取牌面 ──→ CardModel ──→ pred_histogram [50] ──┐
│ ├─ concat → [55]
└─── 提取局势 ──→ env_features [5] ──┘
│
CFRNetwork
│
┌────────────┴────────────┐
│ │
regret_head [5] policy_head [5]
│ │
Regret Matching Softmax(masked)
│ │
current_strategy [5] avg_strategy [5]
在线推理只需要 avg_strategy —— 它是经过 Softmax 归一化的动作概率分布,从中采样即可得到最终动作。
2. CardModel 架构
2.1 用途
将扑克手牌(2 张底牌 + 0~5 张公共牌)编码为一个 50 维胜率直方图,作为 CFRNetwork 的输入之一。
2.2 网络结构
输入:
x_hole: [batch, 2] int64 — 2 张底牌 ID (0-51)
x_board: [batch, 5] int64 — 5 张公共牌 ID (0-51, 不足用 52 填充)
Embedding: nn.Embedding(53, 64, padding_idx=52)
- 53 个 token: 52 张牌 + 1 个 PAD(52)
- padding_idx=52 表示 PAD token 的 embedding 恒为 0
编码:
hole_emb = Embedding(x_hole).sum(dim=1) → [batch, 64]
board_emb = Embedding(x_board).sum(dim=1) → [batch, 64]
combined = cat([hole_emb, board_emb]) → [batch, 128]
Backbone MLP (每层: Linear → ReLU → LayerNorm):
128 → 512 → ReLU → LayerNorm(512)
512 → 512 → ReLU → LayerNorm(512)
512 → 256 → ReLU → LayerNorm(256)
Equity Head (在线推理不需要):
256 → 32 → ReLU → 1 → Sigmoid → [batch, 1] 标量胜率
Histogram Head (在线推理只需要这个):
256 → 64 → ReLU → 50 → Softmax → [batch, 50] 胜率直方图
输出:
pred_equity: [batch, 1] 胜率标量 (0~1)
pred_histogram: [batch, 50] 胜率直方图 (和为1) ← 这是 CFRNetwork 的 card_features
2.3 权重参数明细
| 层名 (state_dict key) | 形状 | 参数量 |
|---|---|---|
embedding.weight |
[53, 64] | 3,392 |
backbone.0.weight (Linear 128→512) |
[512, 128] | 65,536 |
backbone.0.bias |
[512] | 512 |
backbone.2.weight (LayerNorm 512) |
[512] | 512 |
backbone.2.bias (LayerNorm 512) |
[512] | 512 |
backbone.3.weight (Linear 512→512) |
[512, 512] | 262,144 |
backbone.3.bias |
[512] | 512 |
backbone.5.weight (LayerNorm 512) |
[512] | 512 |
backbone.5.bias (LayerNorm 512) |
512 | |
backbone.6.weight (Linear 512→256) |
[256, 512] | 131,072 |
backbone.6.bias |
[256] | 256 |
backbone.8.weight (LayerNorm 256) |
[256] | 256 |
backbone.8.bias (LayerNorm 256) |
256 | |
equity_head.0.weight (Linear 256→32) |
[32, 256] | 8,192 |
equity_head.0.bias |
[32] | 32 |
equity_head.2.weight (Linear 32→1) |
[1, 32] | 32 |
equity_head.2.bias |
[1] | 1 |
histogram_head.0.weight (Linear 256→64) |
[64, 256] | 16,384 |
histogram_head.0.bias |
[64] | 64 |
histogram_head.2.weight (Linear 64→50) |
[50, 64] | 3,200 |
histogram_head.2.bias |
[50] | 50 |
| 合计 | ~426,803 |
2.4 牌面 ID 编码规则
card_id = rank * 4 + suit
rank: 0=2, 1=3, 2=4, 3=5, 4=6, 5=7, 6=8, 7=9, 8=T, 9=J, 10=Q, 11=K, 12=A
suit: 0=c(梅花), 1=d(方块), 2=h(红心), 3=s(黑桃)
示例: Ac = 12*4+0 = 48, Ks = 11*4+3 = 47, 2c = 0*4+0 = 0
PAD_TOKEN = 52 (embedding 恒为零向量)
3. CFRNetwork 架构
3.1 用途
接受牌面特征 + 局势特征,输出 5 个动作 的遗憾值和策略 logits,经 Regret Matching 和 Softmax 得到动作概率分布。
3.2 网络结构
输入:
card_features: [batch, 50] — CardModel 输出的胜率直方图
env_features: [batch, 5] — 归一化后的局势特征
拼接:
x = cat([card_features, env_features]) → [batch, 55]
Backbone MLP (每层: Linear → ReLU):
55 → 256 → ReLU
256 → 256 → ReLU
256 → 128 → ReLU
Regret Head (在线推理可跳过):
128 → 5 (无激活函数, regret 可为负数)
Policy Head:
128 → 5 (输出 logits, 后续过 Softmax)
输出:
regrets: [batch, 5] 遗憾值原始输出
policy_logits: [batch, 5] 策略 logits
3.3 权重参数明细
| 层名 (state_dict key) | 形状 | 参数量 |
|---|---|---|
backbone.0.weight (Linear 55→256) |
[256, 55] | 14,080 |
backbone.0.bias |
[256] | 256 |
backbone.2.weight (Linear 256→256) |
[256, 256] | 65,536 |
backbone.2.bias |
[256] | 256 |
backbone.4.weight (Linear 256→128) |
[128, 256] | 32,768 |
backbone.4.bias |
[128] | 128 |
regret_head.weight (Linear 128→5) |
[5, 128] | 640 |
regret_head.bias |
[5] | 5 |
policy_head.weight (Linear 128→5) |
[5, 128] | 640 |
policy_head.bias |
[5] | 5 |
| 合计 | ~114,314 |
注意:
backbone是nn.Sequential,索引 0,2,4 是 Linear 层,索引 1,3,5 是 ReLU(无参数)。
3.4 动作空间
| 索引 | 名称 | 含义 |
|---|---|---|
| 0 | FOLD | 弃牌 |
| 1 | CALL | 跟注/过牌 |
| 2 | HALF_POT | 加注 = 1/2 跟注后底池 |
| 3 | FULL_POT | 加注 = 1.0 跟注后底池 |
| 4 | ALL_IN | 全押 |
4. 推理流水线详解
4.1 输入特征构造
4.1.1 env_features [5] 构造
env_features = [
pot / 20000.0, // 底池归一化
p0_stack / 20000.0, // 玩家0剩余筹码归一化
p1_stack / 20000.0, // 玩家1剩余筹码归一化
street / 3.0, // 轮次归一化 (0=Preflop, 1=Flop, 2=Turn, 3=River)
position, // 当前行动玩家 (0.0 或 1.0)
]
4.1.2 card_features [50] 构造
1. 提取当前玩家的 2 张底牌 ID → hole_cards[2]
2. 提取公共牌 ID (0~5张) → board_cards[0..5]
3. 公共牌不足 5 张时用 PAD_TOKEN=52 填充 → x_board[5]
4. 送入 CardModel:
- hole_emb = Embedding(hole_cards).sum(行向量求和) → [64]
- board_emb = Embedding(x_board).sum(行向量求和) → [64]
- combined = cat([hole_emb, board_emb]) → [128]
- features = Backbone(combined) → [256]
- histogram = HistogramHead(features) → [50]
5. card_features = histogram (50维胜率直方图)
4.1.3 legal_mask [5] 构造
legal_mask = [fold_ok, call_ok, half_pot_ok, full_pot_ok, allin_ok]
每个元素为 0 或 1,标识该 CFR 动作是否合法。
构造规则:
FOLD(0): 引擎合法动作包含 action 0
CALL(1): 引擎合法动作包含 action 1
ALL_IN(4): 引擎存在 >1 的加注动作
HALF_POT(2)/FULL_POT(3): 需同时满足:
- 存在加注动作
- 当前 street 加注次数 < 2 (RAISE_CAP)
- 计算目标贡献额 >= 最小加注额
- 目标贡献额映射后不等于 ALL-IN 动作
4.2 从 policy_logits 到动作概率 (avg_strategy)
这是在线推理的核心逻辑,只需使用 policy_head 的输出:
1. logits = policy_head(backbone_output) → [5]
2. masked_logits = logits
对 legal_mask[i]==0 的位置, 令 masked_logits[i] = -1e9
3. avg_strategy = softmax(masked_logits) → [5]
所有合法动作概率 > 0, 非法动作概率 ≈ 0
4. 从 avg_strategy 的合法动作中采样
4.3 降噪 (可选但推荐)
训练代码中的实践经验:将概率低于 3% 的动作直接置零后重新归一化,防止神经网络底噪导致异常 All-in。
5. C++ 实现指南
5.1 推荐方案: 手写前向传播 + 加载权重
由于网络结构简单(纯 MLP,无卷积/注意力),不需要 LibTorch,直接用 Eigen 或手写矩阵乘法即可,部署体积小、推理快。
5.2 权重文件格式
PyTorch 的 .pt 文件本质是 Python pickle 序列化的 dict。C++ 直接读取比较麻烦,推荐两步转换:
Step 1: Python 导出为二进制
import torch
import struct
def export_weights_bin(state_dict, output_path):
"""将 state_dict 导出为 C++ 可直接读取的二进制文件。"""
with open(output_path, 'wb') as f:
# 写入张量数量
f.write(struct.pack('I', len(state_dict)))
for name, tensor in state_dict.items():
# 写入名字长度 + 名字
name_bytes = name.encode('utf-8')
f.write(struct.pack('I', len(name_bytes)))
f.write(name_bytes)
# 写入维度数量
shape = tensor.shape
f.write(struct.pack('I', len(shape)))
# 写入每个维度大小
for dim in shape:
f.write(struct.pack('I', dim))
# 写入数据 (float32)
data = tensor.float().numpy().tobytes()
f.write(data)
# 导出 CardModel 权重
card_sd = torch.load("card_model/data/best_card_model.pt", map_location="cpu", weights_only=False)
if "model_state_dict" in card_sd:
card_sd = card_sd["model_state_dict"]
export_weights_bin(card_sd, "weights/card_model.bin")
# 导出 CFRNetwork 权重
cfr_sd = torch.load("botzone_cfr_net.pt", map_location="cpu", weights_only=False)
if "model_state_dict" in cfr_sd:
cfr_sd = cfr_sd["model_state_dict"]
export_weights_bin(cfr_sd, "weights/cfr_net.bin")
Step 2: C++ 读取二进制权重
struct Tensor {
std::string name;
std::vector<int> shape;
std::vector<float> data;
};
std::unordered_map<std::string, Tensor> load_weights(const std::string& path) {
std::unordered_map<std::string, Tensor> weights;
std::ifstream f(path, std::ios::binary);
uint32_t num_tensors;
f.read((char*)&num_tensors, 4);
for (uint32_t i = 0; i < num_tensors; i++) {
Tensor t;
uint32_t name_len;
f.read((char*)&name_len, 4);
t.name.resize(name_len);
f.read(t.name.data(), name_len);
uint32_t ndim;
f.read((char*)&ndim, 4);
t.shape.resize(ndim);
for (uint32_t d = 0; d < ndim; d++)
f.read((char*)&t.shape[d], 4);
int total = 1;
for (int d : t.shape) total *= d;
t.data.resize(total);
f.read((char*)t.data.data(), total * sizeof(float));
weights[t.name] = std::move(t);
}
return weights;
}
5.3 C++ 前向传播实现
5.3.1 基础算子
// 矩阵乘法: y = W * x + b (W: [out, in], x: [in], b: [out])
void linear(const float* W, const float* b, const float* x,
float* y, int in_dim, int out_dim) {
for (int i = 0; i < out_dim; i++) {
float sum = b[i];
for (int j = 0; j < in_dim; j++) {
sum += W[i * in_dim + j] * x[j];
}
y[i] = sum;
}
}
// ReLU
void relu(float* x, int dim) {
for (int i = 0; i < dim; i++)
x[i] = std::max(0.0f, x[i]);
}
// LayerNorm: y = (x - mean) / sqrt(var + eps) * gamma + beta
void layer_norm(const float* gamma, const float* beta,
const float* x, float* y, int dim, float eps = 1e-5f) {
float mean = 0.0f;
for (int i = 0; i < dim; i++) mean += x[i];
mean /= dim;
float var = 0.0f;
for (int i = 0; i < dim; i++) var += (x[i] - mean) * (x[i] - mean);
var /= dim;
float inv_std = 1.0f / std::sqrt(var + eps);
for (int i = 0; i < dim; i++)
y[i] = gamma[i] * (x[i] - mean) * inv_std + beta[i];
}
// Sigmoid
void sigmoid(float* x, int dim) {
for (int i = 0; i < dim; i++)
x[i] = 1.0f / (1.0f + std::exp(-x[i]));
}
// Softmax
void softmax(float* x, int dim) {
float max_val = *std::max_element(x, x + dim);
float sum = 0.0f;
for (int i = 0; i < dim; i++) {
x[i] = std::exp(x[i] - max_val);
sum += x[i];
}
for (int i = 0; i < dim; i++) x[i] /= sum;
}
5.3.2 CardModel 前向传播
class CardModelInference {
public:
// 预分配缓冲区
float hole_emb[64]; // embedding sum
float board_emb[64]; // embedding sum
float combined[128]; // concat
float backbone_buf[3][512]; // 各隐藏层
float hist_fc1[64];
float hist_out[50];
// 权重引用 (从 load_weights 获取)
const float* emb_weight; // [53, 64]
float backbone_w[3], backbone_b[3]; // Linear 权重/偏置
float ln_gamma[3], ln_beta[3]; // LayerNorm 参数
float hist_w1, hist_b1; // 256→64
float hist_w2, hist_b2; // 64→50
void forward(const int* hole_cards, // [2] int, 0-51
const int* board_cards, // [5] int, 0-51 (不足填52)
float* histogram) { // 输出 [50]
// 1. Embedding lookup + sum
// hole_emb = emb_weight[hole_cards[0]] + emb_weight[hole_cards[1]]
memset(hole_emb, 0, 64 * sizeof(float));
for (int c = 0; c < 2; c++) {
const float* emb = emb_weight + hole_cards[c] * 64;
for (int i = 0; i < 64; i++)
hole_emb[i] += emb[i];
}
// board_emb = sum of emb_weight[board_cards[i]], PAD(52) 的 embedding 全为0
memset(board_emb, 0, 64 * sizeof(float));
for (int c = 0; c < 5; c++) {
if (board_cards[c] == 52) continue; // PAD, skip
const float* emb = emb_weight + board_cards[c] * 64;
for (int i = 0; i < 64; i++)
board_emb[i] += emb[i];
}
// 2. Concat [hole_emb | board_emb]
memcpy(combined, hole_emb, 64 * sizeof(float));
memcpy(combined + 64, board_emb, 64 * sizeof(float));
// 3. Backbone: Linear → ReLU → LayerNorm × 3
int in_dim = 128;
int hidden_dims[3] = {512, 512, 256};
const float* input = combined;
for (int layer = 0; layer < 3; layer++) {
linear(backbone_w[layer], backbone_b[layer],
input, backbone_buf[layer], in_dim, hidden_dims[layer]);
relu(backbone_buf[layer], hidden_dims[layer]);
layer_norm(ln_gamma[layer], ln_beta[layer],
backbone_buf[layer], backbone_buf[layer], hidden_dims[layer]);
in_dim = hidden_dims[layer];
input = backbone_buf[layer];
}
// 4. Histogram head: 256 → 64 (ReLU) → 50 (Softmax)
linear(hist_w1, hist_b1, backbone_buf[2], hist_fc1, 256, 64);
relu(hist_fc1, 64);
linear(hist_w2, hist_b2, hist_fc1, hist_out, 64, 50);
softmax(hist_out, 50);
memcpy(histogram, hist_out, 50 * sizeof(float));
}
};
5.3.3 CFRNetwork 前向传播 (只走 policy_head)
class CFRNetInference {
public:
float concat_buf[55]; // card_features[50] + env_features[5]
float backbone_buf[3]; // 三层隐藏层
float logits[5]; // policy_head 输出
float strategy[5]; // 最终动作概率
// 权重
const float* backbone_w[3]; const float* backbone_b[3]; // 三层 Linear
const float* policy_w; const float* policy_b; // 128→5
void forward(const float* card_features, // [50]
const float* env_features, // [5]
const int* legal_mask, // [5], 0或1
float* out_strategy) { // 输出 [5]
// 1. Concat
memcpy(concat_buf, card_features, 50 * sizeof(float));
memcpy(concat_buf + 50, env_features, 5 * sizeof(float));
// 2. Backbone: Linear → ReLU × 3
int in_dim = 55;
int hidden_dims[3] = {256, 256, 128};
const float* input = concat_buf;
float* output = nullptr;
// 需要为每层分配缓冲区, 这里简化表示
// layer 0: 55 → 256
float h0[256];
linear(backbone_w[0], backbone_b[0], input, h0, 55, 256);
relu(h0, 256);
// layer 1: 256 → 256
float h1[256];
linear(backbone_w[1], backbone_b[1], h0, h1, 256, 256);
relu(h1, 256);
// layer 2: 256 → 128
float h2[128];
linear(backbone_w[2], backbone_b[2], h1, h2, 256, 128);
relu(h2, 128);
// 3. Policy head: 128 → 5
linear(policy_w, policy_b, h2, logits, 128, 5);
// 4. Masked Softmax
for (int i = 0; i < 5; i++) {
if (legal_mask[i] == 0)
logits[i] = -1e9f; // 非法动作设大负数
}
softmax(logits, 5);
memcpy(out_strategy, logits, 5 * sizeof(float));
}
};
5.3.4 完整推理流程
// === Step 1: 从游戏状态提取信息 ===
int hole_cards[2] = { /* 底牌 ID 0-51 */ };
int board_cards[5] = { /* 公共牌 ID, 不足5张用52填充 */ };
float env_features[5] = {
pot / 20000.0f,
p0_stack / 20000.0f,
p1_stack / 20000.0f,
street / 3.0f,
(float)position
};
int legal_mask[5] = { /* 由 BetTranslator 逻辑计算 */ };
// === Step 2: CardModel 前向传播 ===
float card_features[50];
card_model.forward(hole_cards, board_cards, card_features);
// === Step 3: CFRNetwork 前向传播 ===
float strategy[5];
cfr_net.forward(card_features, env_features, legal_mask, strategy);
// === Step 4: 采样 (可选: 先降噪) ===
// 降噪: 将 <3% 的概率置零后重新归一化
float threshold = 0.03f;
float sum = 0.0f;
for (int i = 0; i < 5; i++) {
if (legal_mask[i] && strategy[i] < threshold)
strategy[i] = 0.0f;
sum += strategy[i];
}
if (sum > 0) for (int i = 0; i < 5; i++) strategy[i] /= sum;
// 从合法动作中按概率采样
int chosen = sample_from_distribution(strategy, legal_mask);
// === Step 5: CFR 动作 → 引擎动作 (BetTranslator) ===
int engine_action = cfr_to_engine(state, chosen);
5.4 权重加载映射表
C++ 加载权重时,需要按照 PyTorch state_dict 的 key 名映射到对应的层:
CardModel 权重 key → 用途
| state_dict key | 形状 | 用途 |
|---|---|---|
embedding.weight |
[53, 64] | Embedding 查找表 |
backbone.0.weight |
[512, 128] | Linear 128→512 |
backbone.0.bias |
[512] | |
backbone.2.weight |
[512, 512] | LayerNorm gamma (错! 见下方) |
backbone.2.bias |
[512] | LayerNorm beta |
backbone.3.weight |
[512, 512] | Linear 512→512 |
backbone.3.bias |
[512] | |
backbone.5.weight |
[512, 512] | LayerNorm gamma |
backbone.5.bias |
[512] | LayerNorm beta |
backbone.6.weight |
[256, 512] | Linear 512→256 |
backbone.6.bias |
[256] | |
backbone.8.weight |
[256] | LayerNorm gamma |
backbone.8.bias |
[256] | LayerNorm beta |
histogram_head.0.weight |
[64, 256] | Linear 256→64 |
histogram_head.0.bias |
[64] | |
histogram_head.2.weight |
[50, 64] | Linear 64→50 |
histogram_head.2.bias |
[50] | |
equity_head.* |
- | 在线推理不需要 |
重要: CardModel 的
backbone是nn.Sequential,层的索引对应关系为:
backbone.0= Linear(128, 512)backbone.1= ReLU (无参数)backbone.2= LayerNorm(512)backbone.3= Linear(512, 512)backbone.4= ReLU (无参数)backbone.5= LayerNorm(512)backbone.6= Linear(512, 256)backbone.7= ReLU (无参数)backbone.8= LayerNorm(256)
CFRNetwork 权重 key → 用途
| state_dict key | 形状 | 用途 |
|---|---|---|
backbone.0.weight |
[256, 55] | Linear 55→256 |
backbone.0.bias |
[256] | |
backbone.2.weight |
[256, 256] | Linear 256→256 |
backbone.2.bias |
[256] | |
backbone.4.weight |
[128, 256] | Linear 256→128 |
backbone.4.bias |
[128] | |
regret_head.weight |
[5, 128] | 在线推理不需要 |
regret_head.bias |
[5] | 在线推理不需要 |
policy_head.weight |
[5, 128] | Linear 128→5 |
policy_head.bias |
[5] |
重要: CFRNetwork 的
backbone索引:
backbone.0= Linear(55, 256)backbone.1= ReLU (无参数)backbone.2= Linear(256, 256)backbone.3= ReLU (无参数)backbone.4= Linear(256, 128)backbone.5= ReLU (无参数)
6. 可优化项
6.1 是否需要 CardModel?
需要。CardModel 是推理流水线的必要组成部分,它将离散的牌面信息编码为 50 维连续向量,CFRNetwork 依赖这个输入。如果去掉 CardModel,你需要另外设计牌面编码方式,且训练好的 CFRNetwork 权重将无法使用。
6.2 是否需要 Regret Head?
在线推理不需要。Regret Head 用于训练时的 Regret Matching,在线推理只使用 Policy Head + Softmax 得到 avg_strategy。可以不加载 regret_head 的权重以节省内存。
6.3 是否需要 Equity Head?
在线推理不需要。Equity Head 只输出标量胜率用于监控,推理时只需要 Histogram Head 的 50 维输出。
6.4 替代方案: LibTorch
如果不想手写前向传播,可以使用 LibTorch (PyTorch C++ API) 直接加载 .pt 权重并执行推理。优点是代码量少,缺点是依赖体积大 (~200MB+)。
6.5 替代方案: ONNX Runtime
可以将两个模型导出为 ONNX 格式,用 ONNX Runtime C++ API 推理。兼顾易用性和性能。
# 导出 CardModel 到 ONNX
card_model = CardModel()
card_model.load_state_dict(torch.load("card_model/data/best_card_model.pt", map_location="cpu", weights_only=False))
card_model.eval()
dummy_hole = torch.randint(0, 52, (1, 2))
dummy_board = torch.randint(0, 52, (1, 5))
torch.onnx.export(card_model, (dummy_hole, dummy_board), "card_model.onnx",
input_names=["x_hole", "x_board"],
output_names=["pred_equity", "pred_histogram"])
# 导出 CFRNetwork 到 ONNX
cfr_net = CFRNetwork()
cfr_net.load_state_dict(torch.load("botzone_cfr_net.pt", map_location="cpu", weights_only=False))
cfr_net.eval()
dummy_card = torch.randn(1, 50)
dummy_env = torch.randn(1, 5)
torch.onnx.export(cfr_net, (dummy_card, dummy_env), "cfr_net.onnx",
input_names=["card_features", "env_features"],
output_names=["regrets", "policy_logits"])
7. 数据流总结
┌─────────────────────────────────────────────────────────────────┐
│ 在线推理数据流 │
├─────────────────────────────────────────────────────────────────┤
│ │
│ 游戏状态 │
│ ├── hole_cards[2] (int, 0-51) │
│ ├── board_cards[5] (int, 0-51 or 52=PAD) │
│ ├── pot, stacks, street, position │
│ └── legal_actions (引擎原生) │
│ │ │
│ ▼ │
│ ┌─────────────────┐ ┌──────────────────┐ │
│ │ CardModel │ │ Feature Builder │ │
│ │ Embedding(53,64)│ │ pot/20000 │ │
│ │ sum → cat │ │ p0_stack/20000 │ │
│ │ MLP 128→512→ │ │ p1_stack/20000 │ │
│ │ 512→256 │ │ street/3.0 │ │
│ │ +LayerNorm │ │ position (0|1) │ │
│ │ HistHead 256→ │ └────────┬─────────┘ │
│ │ 64→50+Sofmax │ │ │
│ └────────┬────────┘ │ │
│ │ │ │
│ card_features[50] env_features[5] │
│ │ │ │
│ └──────────┬───────────┘ │
│ ▼ │
│ ┌─────────────────────┐ │
│ │ CFRNetwork │ │
│ │ cat → [55] │ │
│ │ MLP 55→256→256→128 │ │
│ │ PolicyHead 128→5 │ │
│ └────────┬────────────┘ │
│ │ │
│ policy_logits[5] │
│ │ │
│ masked_fill(非法→-1e9) │
│ │ │
│ Softmax → avg_strategy[5] │
│ │ │
│ 降噪 (<3%→0, 重归一化) │
│ │ │
│ 按概率采样 → chosen_cfr_idx │
│ │ │
│ BetTranslator → engine_action │
│ │ │
│ 执行引擎动作 │
│ │
└─────────────────────────────────────────────────────────────────┘