# Neural Network Architecture & C++ Online Inference Guide

## 1. System overview

The inference pipeline chains **two neural networks**:

```
Game state (OpenSpiel / Botzone)
    │
    ├── extract cards ──→ CardModel ──→ pred_histogram [50] ──┐
    │                                                         ├─ concat → [55]
    └── extract state ──→ env_features [5] ───────────────────┘
                                                                      │
                                                                 CFRNetwork
                                                                      │
                                                         ┌────────────┴────────────┐
                                                         │                         │
                                                  regret_head [5]           policy_head [5]
                                                         │                         │
                                                  Regret Matching           Softmax (masked)
                                                         │                         │
                                               current_strategy [5]        avg_strategy [5]
```

**Online inference only needs `avg_strategy`**: it is a Softmax-normalized probability distribution over actions, and sampling from it yields the final action.

---

## 2. CardModel architecture

### 2.1 Purpose

Encodes a poker hand (2 hole cards + 0 to 5 board cards) into a **50-dimensional equity histogram**, which is one of the inputs to CFRNetwork.

### 2.2 Network structure

```
Inputs:
  x_hole:  [batch, 2]  int64   2 hole-card IDs (0-51)
  x_board: [batch, 5]  int64   5 board-card IDs (0-51, padded with 52 when fewer than 5)

Embedding:
  nn.Embedding(53, 64, padding_idx=52)
  - 53 tokens: 52 cards + 1 PAD (52)
  - padding_idx=52 keeps the PAD token's embedding fixed at 0

Encoding:
  hole_emb  = Embedding(x_hole).sum(dim=1)   → [batch, 64]
  board_emb = Embedding(x_board).sum(dim=1)  → [batch, 64]
  combined  = cat([hole_emb, board_emb])     → [batch, 128]

Backbone MLP (each layer: Linear → ReLU → LayerNorm):
  128 → 512 → ReLU → LayerNorm(512)
  512 → 512 → ReLU → LayerNorm(512)
  512 → 256 → ReLU → LayerNorm(256)

Equity Head (not needed for online inference):
  256 → 32 → ReLU → 1 → Sigmoid   → [batch, 1]  scalar equity

Histogram Head (the only head online inference needs):
  256 → 64 → ReLU → 50 → Softmax  → [batch, 50] equity histogram

Outputs:
  pred_equity:    [batch, 1]   scalar equity (0-1)
  pred_histogram: [batch, 50]  equity histogram (sums to 1)  ← used as CFRNetwork's card_features
```

### 2.3 Weight parameters

| Layer (state_dict key) | Shape | Parameters |
|---|---|---|
| `embedding.weight` | [53, 64] | 3,392 |
| `backbone.0.weight` (Linear 128→512) | [512, 128] | 65,536 |
| `backbone.0.bias` | [512] | 512 |
| `backbone.2.weight` (LayerNorm 512) | [512] | 512 |
| `backbone.2.bias` (LayerNorm 512) | [512] | 512 |
| `backbone.3.weight` (Linear 512→512) | [512, 512] | 262,144 |
| `backbone.3.bias` | [512] | 512 |
| `backbone.5.weight` (LayerNorm 512) | [512] | 512 |
| `backbone.5.bias` (LayerNorm 512) | [512] | 512 |
| `backbone.6.weight` (Linear 512→256) | [256, 512] | 131,072 |
| `backbone.6.bias` | [256] | 256 |
| `backbone.8.weight` (LayerNorm 256) | [256] | 256 |
| `backbone.8.bias` (LayerNorm 256) | [256] | 256 |
| `equity_head.0.weight` (Linear 256→32) | [32, 256] | 8,192 |
| `equity_head.0.bias` | [32] | 32 |
| `equity_head.2.weight` (Linear 32→1) | [1, 32] | 32 |
| `equity_head.2.bias` | [1] | 1 |
| `histogram_head.0.weight` (Linear 256→64) | [64, 256] | 16,384 |
| `histogram_head.0.bias` | [64] | 64 |
| `histogram_head.2.weight` (Linear 64→50) | [50, 64] | 3,200 |
| `histogram_head.2.bias` | [50] | 50 |
| **Total** | | **493,939** |

### 2.4 Card ID encoding

```
card_id = rank * 4 + suit

rank: 0=2, 1=3, 2=4, 3=5, 4=6, 5=7, 6=8, 7=9, 8=T, 9=J, 10=Q, 11=K, 12=A
suit: 0=c (clubs), 1=d (diamonds), 2=h (hearts), 3=s (spades)

Examples: Ac = 12*4+0 = 48, Ks = 11*4+3 = 47, 2c = 0*4+0 = 0

PAD_TOKEN = 52 (its embedding is always the zero vector)
```

---

## 3. CFRNetwork architecture

### 3.1 Purpose

Takes the card features and the game-state features and outputs regret values and policy logits for **5 actions**. Regret Matching and Softmax then turn these into action probability distributions.

### 3.2 Network structure

```
Inputs:
  card_features: [batch, 50]   equity histogram produced by CardModel
  env_features:  [batch, 5]    normalized game-state features

Concatenation:
  x = cat([card_features, env_features])  → [batch, 55]

Backbone MLP (each layer: Linear → ReLU):
  55  → 256 → ReLU
  256 → 256 → ReLU
  256 → 128 → ReLU

Regret Head (can be skipped for online inference):
  128 → 5  (no activation; regrets may be negative)

Policy Head:
  128 → 5  (outputs logits, followed by Softmax)

Outputs:
  regrets:       [batch, 5]  raw regret values
  policy_logits: [batch, 5]  policy logits
```

### 3.3 Weight parameters

| Layer (state_dict key) | Shape | Parameters |
|---|---|---|
| `backbone.0.weight` (Linear 55→256) | [256, 55] | 14,080 |
| `backbone.0.bias` | [256] | 256 |
| `backbone.2.weight` (Linear 256→256) | [256, 256] | 65,536 |
| `backbone.2.bias` | [256] | 256 |
| `backbone.4.weight` (Linear 256→128) | [128, 256] | 32,768 |
| `backbone.4.bias` | [128] | 128 |
| `regret_head.weight` (Linear 128→5) | [5, 128] | 640 |
| `regret_head.bias` | [5] | 5 |
| `policy_head.weight` (Linear 128→5) | [5, 128] | 640 |
| `policy_head.bias` | [5] | 5 |
| **Total** | | **114,314** |

> Note: `backbone` is an `nn.Sequential`; indices 0, 2, 4 are Linear layers and indices 1, 3, 5 are ReLU (no parameters).

### 3.4 Action space

| Index | Name | Meaning |
|---|---|---|
| 0 | FOLD | Fold |
| 1 | CALL | Call / check |
| 2 | HALF_POT | Raise = 1/2 of the pot after calling |
| 3 | FULL_POT | Raise = 1.0 × the pot after calling |
| 4 | ALL_IN | All-in |

---

## 4. Inference pipeline in detail

### 4.1 Building the input features

#### 4.1.1 Building env_features [5]

```
env_features = [
    pot / 20000.0,       // normalized pot
    p0_stack / 20000.0,  // normalized remaining stack of player 0
    p1_stack / 20000.0,  // normalized remaining stack of player 1
    street / 3.0,        // normalized street (0=Preflop, 1=Flop, 2=Turn, 3=River)
    position,            // player to act (0.0 or 1.0)
]
```

#### 4.1.2 Building card_features [50]

```
1. Extract the acting player's 2 hole-card IDs           → hole_cards[2]
2. Extract the board-card IDs (0 to 5 cards)             → board_cards[0..5]
3. Pad the board with PAD_TOKEN=52 when fewer than 5     → x_board[5]
4. Run CardModel:
   - hole_emb  = Embedding(hole_cards).sum (sum of the embedding rows)  → [64]
   - board_emb = Embedding(x_board).sum (sum of the embedding rows)     → [64]
   - combined  = cat([hole_emb, board_emb])                             → [128]
   - features  = Backbone(combined)                                     → [256]
   - histogram = HistogramHead(features)                                → [50]
5. card_features = histogram (50-dimensional equity histogram)
```

#### 4.1.3 Building legal_mask [5]

```
legal_mask = [fold_ok, call_ok, half_pot_ok, full_pot_ok, allin_ok]

Each element is 0 or 1 and marks whether that CFR action is legal.

Rules:
  FOLD(0):   the engine's legal actions contain action 0
  CALL(1):   the engine's legal actions contain action 1
  ALL_IN(4): the engine's legal actions contain a raise (action ID > 1)
  HALF_POT(2) / FULL_POT(3): all of the following must hold:
    - a raise action exists
    - the number of raises on the current street is < 2 (RAISE_CAP)
    - the computed target contribution is >= the minimum raise
    - the target contribution does not map to the ALL-IN action
```

### 4.2 From policy_logits to action probabilities (avg_strategy)

This is the core of online inference, and it only uses the output of `policy_head`:

```
1. logits = policy_head(backbone_output)   → [5]
2. masked_logits = logits
   for every position with legal_mask[i] == 0, set masked_logits[i] = -1e9
3. avg_strategy = softmax(masked_logits)   → [5]
   every legal action has probability > 0, illegal actions have probability ≈ 0
4. Sample from the legal actions of avg_strategy
```

### 4.3 Denoising (optional but recommended)

A practice taken from the training code: set every action whose probability is below 3% to zero and renormalize. This keeps the network's noise floor from triggering a spurious all-in.

---

## 5. C++ implementation guide

### 5.1 Recommended approach: hand-written forward pass + loaded weights

Because the networks are simple (pure MLPs, no convolution or attention), **LibTorch is not required**. Eigen or hand-written matrix multiplication is enough, and it keeps the deployment small and the inference fast.

### 5.2 Weight file format

A PyTorch `.pt` file is essentially a pickle-based serialization of a Python dict, which is awkward to read directly from C++. A two-step conversion is recommended:

**Step 1: export to binary from Python**

```python
import struct

import torch


def export_weights_bin(state_dict, output_path):
    """Export a state_dict to a binary file that C++ can read directly."""
    with open(output_path, "wb") as f:
        # Number of tensors (all integers are little-endian uint32)
        f.write(struct.pack("<I", len(state_dict)))
        for name, tensor in state_dict.items():
            # Name length + UTF-8 name
            name_bytes = name.encode("utf-8")
            f.write(struct.pack("<I", len(name_bytes)))
            f.write(name_bytes)
            # Number of dimensions
            shape = tensor.shape
            f.write(struct.pack("<I", len(shape)))
            # Size of each dimension
            for dim in shape:
                f.write(struct.pack("<I", dim))
            # Data as row-major float32 (assumes a little-endian host)
            data = tensor.detach().cpu().float().contiguous().numpy().tobytes()
            f.write(data)


# Export the CardModel weights
card_sd = torch.load("card_model/data/best_card_model.pt", map_location="cpu", weights_only=False)
if "model_state_dict" in card_sd:
    card_sd = card_sd["model_state_dict"]
export_weights_bin(card_sd, "weights/card_model.bin")

# Export the CFRNetwork weights
cfr_sd = torch.load("botzone_cfr_net.pt", map_location="cpu", weights_only=False)
if "model_state_dict" in cfr_sd:
    cfr_sd = cfr_sd["model_state_dict"]
export_weights_bin(cfr_sd, "weights/cfr_net.bin")
```

**Step 2: read the binary weights in C++**

```cpp
#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

struct Tensor {
    std::string name;
    std::vector<uint32_t> shape;
    std::vector<float> data;
};

// Reads the format written by export_weights_bin (little-endian host assumed).
std::unordered_map<std::string, Tensor> load_weights(const std::string& path) {
    std::unordered_map<std::string, Tensor> weights;
    std::ifstream f(path, std::ios::binary);
    if (!f) throw std::runtime_error("cannot open weight file: " + path);

    uint32_t num_tensors = 0;
    f.read(reinterpret_cast<char*>(&num_tensors), 4);
    for (uint32_t i = 0; i < num_tensors; i++) {
        Tensor t;
        uint32_t name_len = 0;
        f.read(reinterpret_cast<char*>(&name_len), 4);
        t.name.resize(name_len);
        f.read(&t.name[0], name_len);

        uint32_t ndim = 0;
        f.read(reinterpret_cast<char*>(&ndim), 4);
        t.shape.resize(ndim);
        for (uint32_t d = 0; d < ndim; d++)
            f.read(reinterpret_cast<char*>(&t.shape[d]), 4);

        size_t total = 1;
        for (uint32_t d : t.shape) total *= d;
        t.data.resize(total);
        f.read(reinterpret_cast<char*>(t.data.data()), total * sizeof(float));
        if (!f) throw std::runtime_error("truncated weight file: " + path);

        weights[t.name] = std::move(t);
    }
    return weights;
}
```

### 5.3 C++ forward pass

#### 5.3.1 Basic operators

```cpp
#include <algorithm>
#include <cmath>

// Linear layer: y = W * x + b  (W: [out, in] row-major, x: [in], b: [out])
void linear(const float* W, const float* b, const float* x, float* y,
            int in_dim, int out_dim) {
    for (int i = 0; i < out_dim; i++) {
        float sum = b[i];
        for (int j = 0; j < in_dim; j++) {
            sum += W[i * in_dim + j] * x[j];
        }
        y[i] = sum;
    }
}

// ReLU (in place)
void relu(float* x, int dim) {
    for (int i = 0; i < dim; i++) x[i] = std::max(0.0f, x[i]);
}

// LayerNorm: y = (x - mean) / sqrt(var + eps) * gamma + beta
// Safe to call with y == x.
void layer_norm(const float* gamma, const float* beta, const float* x, float* y,
                int dim, float eps = 1e-5f) {
    float mean = 0.0f;
    for (int i = 0; i < dim; i++) mean += x[i];
    mean /= dim;
    float var = 0.0f;
    for (int i = 0; i < dim; i++) var += (x[i] - mean) * (x[i] - mean);
    var /= dim;
    float inv_std = 1.0f / std::sqrt(var + eps);
    for (int i = 0; i < dim; i++)
        y[i] = gamma[i] * (x[i] - mean) * inv_std + beta[i];
}

// Sigmoid (in place)
void sigmoid(float* x, int dim) {
    for (int i = 0; i < dim; i++) x[i] = 1.0f / (1.0f + std::exp(-x[i]));
}

// Softmax (in place, numerically stabilized by subtracting the max)
void softmax(float* x, int dim) {
    float max_val = *std::max_element(x, x + dim);
    float sum = 0.0f;
    for (int i = 0; i < dim; i++) {
        x[i] = std::exp(x[i] - max_val);
        sum += x[i];
    }
    for (int i = 0; i < dim; i++) x[i] /= sum;
}
```

#### 5.3.2 CardModel forward pass

```cpp
#include <cstring>  // memset, memcpy

class CardModelInference {
public:
    // Pre-allocated buffers
    float hole_emb[64];          // sum of hole-card embeddings
    float board_emb[64];         // sum of board-card embeddings
    float combined[128];         // concatenation
    float backbone_buf[3][512];  // hidden layers (the last one uses only 256)
    float hist_fc1[64];
    float hist_out[50];

    // Weight pointers (taken from the result of load_weights)
    const float* emb_weight;                                 // [53, 64]
    const float* backbone_w[3]; const float* backbone_b[3];  // Linear weights / biases
    const float* ln_gamma[3];   const float* ln_beta[3];     // LayerNorm parameters
    const float* hist_w1; const float* hist_b1;              // 256→64
    const float* hist_w2; const float* hist_b2;              // 64→50

    void forward(const int* hole_cards,   // [2] int, 0-51
                 const int* board_cards,  // [5] int, 0-51 (padded with 52)
                 float* histogram) {      // output [50]
        // 1. Embedding lookup + sum
        // hole_emb = emb_weight[hole_cards[0]] + emb_weight[hole_cards[1]]
        memset(hole_emb, 0, 64 * sizeof(float));
        for (int c = 0; c < 2; c++) {
            const float* emb = emb_weight + hole_cards[c] * 64;
            for (int i = 0; i < 64; i++) hole_emb[i] += emb[i];
        }
        // board_emb = sum of emb_weight[board_cards[i]]; the PAD(52) embedding is all zeros
        memset(board_emb, 0, 64 * sizeof(float));
        for (int c = 0; c < 5; c++) {
            if (board_cards[c] == 52) continue;  // PAD, skip
            const float* emb = emb_weight + board_cards[c] * 64;
            for (int i = 0; i < 64; i++) board_emb[i] += emb[i];
        }

        // 2. Concat [hole_emb | board_emb]
        memcpy(combined, hole_emb, 64 * sizeof(float));
        memcpy(combined + 64, board_emb, 64 * sizeof(float));

        // 3. Backbone: (Linear → ReLU → LayerNorm) × 3
        int in_dim = 128;
        int hidden_dims[3] = {512, 512, 256};
        const float* input = combined;
        for (int layer = 0; layer < 3; layer++) {
            linear(backbone_w[layer], backbone_b[layer], input,
                   backbone_buf[layer], in_dim, hidden_dims[layer]);
            relu(backbone_buf[layer], hidden_dims[layer]);
            layer_norm(ln_gamma[layer], ln_beta[layer], backbone_buf[layer],
                       backbone_buf[layer], hidden_dims[layer]);
            in_dim = hidden_dims[layer];
            input = backbone_buf[layer];
        }

        // 4. Histogram head: 256 → 64 (ReLU) → 50 (Softmax)
        linear(hist_w1, hist_b1, backbone_buf[2], hist_fc1, 256, 64);
        relu(hist_fc1, 64);
        linear(hist_w2, hist_b2, hist_fc1, hist_out, 64, 50);
        softmax(hist_out, 50);
        memcpy(histogram, hist_out, 50 * sizeof(float));
    }
};
```

#### 5.3.3 CFRNetwork forward pass (policy_head only)

```cpp
#include <cstring>  // memcpy

class CFRNetInference {
public:
    // Pre-allocated buffers
    float concat_buf[55];  // card_features[50] + env_features[5]
    float h0[256];         // hidden layer 0
    float h1[256];         // hidden layer 1
    float h2[128];         // hidden layer 2
    float logits[5];       // policy_head output

    // Weight pointers
    const float* backbone_w[3]; const float* backbone_b[3];  // three Linear layers
    const float* policy_w; const float* policy_b;            // 128→5

    void forward(const float* card_features,  // [50]
                 const float* env_features,   // [5]
                 const int* legal_mask,       // [5], 0 or 1
                 float* out_strategy) {       // output [5]
        // 1. Concat
        memcpy(concat_buf, card_features, 50 * sizeof(float));
        memcpy(concat_buf + 50, env_features, 5 * sizeof(float));

        // 2. Backbone: (Linear → ReLU) × 3
        linear(backbone_w[0], backbone_b[0], concat_buf, h0, 55, 256);  // 55 → 256
        relu(h0, 256);
        linear(backbone_w[1], backbone_b[1], h0, h1, 256, 256);         // 256 → 256
        relu(h1, 256);
        linear(backbone_w[2], backbone_b[2], h1, h2, 256, 128);         // 256 → 128
        relu(h2, 128);

        // 3. Policy head: 128 → 5
        linear(policy_w, policy_b, h2, logits, 128, 5);

        // 4. Masked Softmax
        for (int i = 0; i < 5; i++) {
            if (legal_mask[i] == 0) logits[i] = -1e9f;  // large negative for illegal actions
        }
        softmax(logits, 5);
        memcpy(out_strategy, logits, 5 * sizeof(float));
    }
};
```

#### 5.3.4 Full inference flow

```cpp
// === Step 1: extract information from the game state ===
int hole_cards[2] = { /* hole-card IDs, 0-51 */ };
int board_cards[5] = { /* board-card IDs, padded with 52 when fewer than 5 */ };
float env_features[5] = {
    pot / 20000.0f,
    p0_stack / 20000.0f,
    p1_stack / 20000.0f,
    street / 3.0f,
    (float)position
};
int legal_mask[5] = { /* computed by the BetTranslator logic */ };

// === Step 2: CardModel forward pass ===
float card_features[50];
card_model.forward(hole_cards, board_cards, card_features);

// === Step 3: CFRNetwork forward pass ===
float strategy[5];
cfr_net.forward(card_features, env_features, legal_mask, strategy);

// === Step 4: sampling (optionally denoise first) ===
// Denoise: zero out probabilities < 3%, then renormalize
float threshold = 0.03f;
float sum = 0.0f;
for (int i = 0; i < 5; i++) {
    if (legal_mask[i] && strategy[i] < threshold) strategy[i] = 0.0f;
    sum += strategy[i];
}
if (sum > 0) for (int i = 0; i < 5; i++) strategy[i] /= sum;

// Sample among the legal actions according to their probabilities
int chosen = sample_from_distribution(strategy, legal_mask);

// === Step 5: CFR action → engine action (BetTranslator) ===
int engine_action = cfr_to_engine(state, chosen);
```

### 5.4 Weight loading map

When loading the weights in C++, map each PyTorch `state_dict` key to its layer as follows.

#### CardModel: state_dict key → role

| state_dict key | Shape | Role |
|---|---|---|
| `embedding.weight` | [53, 64] | Embedding lookup table |
| `backbone.0.weight` | [512, 128] | Linear 128→512 |
| `backbone.0.bias` | [512] | Bias |
| `backbone.2.weight` | [512] | LayerNorm gamma |
| `backbone.2.bias` | [512] | LayerNorm beta |
| `backbone.3.weight` | [512, 512] | Linear 512→512 |
| `backbone.3.bias` | [512] | Bias |
| `backbone.5.weight` | [512] | LayerNorm gamma |
| `backbone.5.bias` | [512] | LayerNorm beta |
| `backbone.6.weight` | [256, 512] | Linear 512→256 |
| `backbone.6.bias` | [256] | Bias |
| `backbone.8.weight` | [256] | LayerNorm gamma |
| `backbone.8.bias` | [256] | LayerNorm beta |
| `histogram_head.0.weight` | [64, 256] | Linear 256→64 |
| `histogram_head.0.bias` | [64] | Bias |
| `histogram_head.2.weight` | [50, 64] | Linear 64→50 |
| `histogram_head.2.bias` | [50] | Bias |
| `equity_head.*` | - | Not needed for online inference |

> **Important**: CardModel's `backbone` is an `nn.Sequential`, with the following index mapping:
> - `backbone.0` = Linear(128, 512)
> - `backbone.1` = ReLU (no parameters)
> - `backbone.2` = LayerNorm(512)
> - `backbone.3` = Linear(512, 512)
> - `backbone.4` = ReLU (no parameters)
> - `backbone.5` = LayerNorm(512)
> - `backbone.6` = Linear(512, 256)
> - `backbone.7` = ReLU (no parameters)
> - `backbone.8` = LayerNorm(256)

#### CFRNetwork: state_dict key → role

| state_dict key | Shape | Role |
|---|---|---|
| `backbone.0.weight` | [256, 55] | Linear 55→256 |
| `backbone.0.bias` | [256] | Bias |
| `backbone.2.weight` | [256, 256] | Linear 256→256 |
| `backbone.2.bias` | [256] | Bias |
| `backbone.4.weight` | [128, 256] | Linear 256→128 |
| `backbone.4.bias` | [128] | Bias |
| `regret_head.weight` | [5, 128] | Not needed for online inference |
| `regret_head.bias` | [5] | Not needed for online inference |
| `policy_head.weight` | [5, 128] | Linear 128→5 |
| `policy_head.bias` | [5] | Bias |

> **Important**: CFRNetwork's `backbone` indices:
> - `backbone.0` = Linear(55, 256)
> - `backbone.1` = ReLU (no parameters)
> - `backbone.2` = Linear(256, 256)
> - `backbone.3` = ReLU (no parameters)
> - `backbone.4` = Linear(256, 128)
> - `backbone.5` = ReLU (no parameters)

---

## 6. Possible optimizations

### 6.1 Is CardModel needed?

**Yes.** CardModel is a required part of the inference pipeline: it encodes the discrete card information into a 50-dimensional continuous vector that CFRNetwork depends on. Without CardModel you would have to design another card encoding, and the trained CFRNetwork weights would no longer be usable.

### 6.2 Is the Regret Head needed?

**Not for online inference.** The Regret Head is used for Regret Matching during training; online inference only uses Policy Head + Softmax to obtain `avg_strategy`. You can skip loading the `regret_head` weights to save memory.

### 6.3 Is the Equity Head needed?

**Not for online inference.** The Equity Head only outputs a scalar equity for monitoring; inference only needs the 50-dimensional output of the Histogram Head.

### 6.4 Alternative: LibTorch

If you would rather not hand-write the forward pass, you can use LibTorch (the PyTorch C++ API) to load the model from a `.pt` file (in practice, usually a TorchScript export) and run inference. The upside is less code; the downside is a large dependency (~200MB+).

### 6.5 Alternative: ONNX Runtime

Both models can be exported to ONNX and run with the ONNX Runtime C++ API, which balances ease of use and performance.

```python
import torch

# CardModel and CFRNetwork are the project's own model classes and must be
# imported from wherever they are defined.


def load_sd(path):
    """Load a checkpoint; unwrap "model_state_dict" if present (as in section 5.2)."""
    sd = torch.load(path, map_location="cpu", weights_only=False)
    return sd["model_state_dict"] if "model_state_dict" in sd else sd


# Export CardModel to ONNX
card_model = CardModel()
card_model.load_state_dict(load_sd("card_model/data/best_card_model.pt"))
card_model.eval()
dummy_hole = torch.randint(0, 52, (1, 2))
dummy_board = torch.randint(0, 52, (1, 5))
torch.onnx.export(card_model, (dummy_hole, dummy_board), "card_model.onnx",
                  input_names=["x_hole", "x_board"],
                  output_names=["pred_equity", "pred_histogram"])

# Export CFRNetwork to ONNX
cfr_net = CFRNetwork()
cfr_net.load_state_dict(load_sd("botzone_cfr_net.pt"))
cfr_net.eval()
dummy_card = torch.randn(1, 50)
dummy_env = torch.randn(1, 5)
torch.onnx.export(cfr_net, (dummy_card, dummy_env), "cfr_net.onnx",
                  input_names=["card_features", "env_features"],
                  output_names=["regrets", "policy_logits"])
```

---

## 7. Data flow summary

```
┌────────────────────────────────────────────────────────────┐
│                 Online inference data flow                 │
├────────────────────────────────────────────────────────────┤
│                                                            │
│  Game state                                                │
│    ├── hole_cards[2]   (int, 0-51)                         │
│    ├── board_cards[5]  (int, 0-51, or 52 = PAD)            │
│    ├── pot, stacks, street, position                       │
│    └── legal_actions   (engine-native)                     │
│         │                                                  │
│         ▼                                                  │
│  ┌──────────────────┐    ┌──────────────────┐              │
│  │ CardModel        │    │ Feature builder  │              │
│  │ Embedding(53,64) │    │ pot/20000        │              │
│  │ sum → cat        │    │ p0_stack/20000   │              │
│  │ MLP 128→512→     │    │ p1_stack/20000   │              │
│  │     512→256      │    │ street/3.0       │              │
│  │ + LayerNorm      │    │ position (0|1)   │              │
│  │ HistHead 256→    │    └────────┬─────────┘              │
│  │ 64→50 + Softmax  │             │                        │
│  └────────┬─────────┘             │                        │
│           │                       │                        │
│   card_features[50]        env_features[5]                 │
│           │                       │                        │
│           └───────────┬───────────┘                        │
│                       ▼                                    │
│            ┌──────────────────────┐                        │
│            │ CFRNetwork           │                        │
│            │ cat → [55]           │                        │
│            │ MLP 55→256→256→128   │                        │
│            │ PolicyHead 128→5     │                        │
│            └──────────┬───────────┘                        │
│                       │                                    │
│               policy_logits[5]                             │
│                       │                                    │
│         masked_fill (illegal → -1e9)                       │
│                       │                                    │
│           Softmax → avg_strategy[5]                        │
│                       │                                    │
│        denoise (<3% → 0, renormalize)                      │
│                       │                                    │
│    sample by probability → chosen_cfr_idx                  │
│                       │                                    │
│         BetTranslator → engine_action                      │
│                       │                                    │
│           execute the engine action                        │
│                                                            │
└────────────────────────────────────────────────────────────┘
```
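
The last two steps of the flow above (denoising, then sampling a CFR action index) could be sketched as follows. This is a minimal sketch rather than the project's actual code: `denoise_strategy` and `sample_action` are hypothetical names standing in for the `sample_from_distribution` helper that section 5.3.4 calls but does not define, the 3% default comes from section 4.3, and the uniform fallback for an all-zero distribution is an added assumption.

```cpp
#include <cassert>
#include <random>

// Hypothetical helper: zero out illegal actions and legal actions whose
// probability is below `threshold`, then renormalize in place.
void denoise_strategy(float* p, const int* legal, int n, float threshold = 0.03f) {
    float sum = 0.0f;
    int n_legal = 0;
    for (int i = 0; i < n; ++i) {
        if (legal[i]) ++n_legal;
        if (!legal[i] || p[i] < threshold) p[i] = 0.0f;
        sum += p[i];
    }
    assert(n_legal > 0 && "at least one action must be legal");
    for (int i = 0; i < n; ++i) {
        if (sum > 0.0f) {
            p[i] /= sum;
        } else {
            // Assumed fallback (not from the original guide): uniform over legal actions.
            p[i] = legal[i] ? 1.0f / n_legal : 0.0f;
        }
    }
}

// Hypothetical helper: inverse-CDF sampling restricted to legal actions.
// Returns the chosen index, or -1 if no legal action has positive probability.
template <class Rng>
int sample_action(const float* p, const int* legal, int n, Rng& rng) {
    std::uniform_real_distribution<float> uniform(0.0f, 1.0f);
    const float r = uniform(rng);
    float cumulative = 0.0f;
    int last = -1;
    for (int i = 0; i < n; ++i) {
        if (!legal[i] || p[i] <= 0.0f) continue;
        last = i;
        cumulative += p[i];
        if (r < cumulative) return i;
    }
    return last;  // guards against rounding when r is very close to 1
}
```

With a generator such as `std::mt19937 rng(std::random_device{}());`, calling `denoise_strategy(strategy, legal_mask, 5)` and then `sample_action(strategy, legal_mask, 5, rng)` gives the CFR action index that is handed to BetTranslator. Passing the generator in explicitly keeps the sampling reproducible when a fixed seed is used for testing.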