# HUNL Poker AI: End-to-End Audit of the CFR→Engine Action Mapping

> Audit date: 2026-04-25
> Audit scope: `env_adapter.py`, `cfr_net.py`, `play_arena.py`, `mccfr_trainer.py`
> Core problem: the model outputs CALL with probability 80%+, but the executed action is Bet20000 (ALL-IN)

---

## 1. Problem Localization (Down to the Code Level)

### 🚨 BUG #1 (Critical): bet-size formula uses `current_pot` instead of `pot_after_call`

```
[Issue type] Bet-size calculation error (systematic; affects every raise action)
[File path]  /home/e2hang/kilo/codes/poker/env_adapter.py
[Function]   BetTranslator.cfr_to_engine_action
[Lines]      443, 456, 468
```

**Current code:**
```python
# Line 443 (THIRD_POT)
target = max_contribution + (1.0 / 3.0) * pot

# Line 456 (HALF_POT)
target = max_contribution + 0.5 * pot

# Line 468 (FULL_POT)
target = max_contribution + 1.0 * pot
```

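**How this relates to the ALL-IN bug:**

The formula uses `current_pot` (the pot as it stands), whereas standard poker convention and the OpenSpiel engine's `pot_size(multiple)` API use **`pot_after_call`** (the pot after the call has been added) as the base:

```
OpenSpiel formula:    bet_to = max_contribution + multiple × (pot + call_amount)
Current code formula: bet_to = max_contribution + multiple × pot
```

where `call_amount = max_contribution - my_contribution`.

**Empirical data** (P0 facing a bet, contribs=[100,300], pot=400, call=200):

| Action | Correct value (OpenSpiel) | Value from current code | Difference (correct - current) |
|--------|---------------------------|-------------------------|--------------------------------|
| THIRD_POT | 500 | 433 | +67 |
| HALF_POT | 600 | 500 | +100 |
| FULL_POT | 900 | 700 | +200 |

The gap equals `multiple × call_amount`, so in large pots (e.g. pot=20000) where the bet to call runs into the thousands, the `FULL_POT` target can be off by thousands. The current code's target is too low rather than too high, so this bug does not directly cause the ALL-IN, but it does cause:

1. **Inconsistent semantics between training and inference**: the "1/3 pot" the network learns does not match standard poker semantics
2. **Misalignment with OpenSpiel's internal logic**: OpenSpiel's `pot_size()` API is the correct reference
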
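The table above can be reproduced with a few lines of arithmetic. The sketch below is a standalone illustration, not code from `env_adapter.py`: `bet_to_target` and its parameters are hypothetical names, and the `int(round(...))` rounding is assumed to match the style used elsewhere in this report.

```python
def bet_to_target(max_contribution: int, my_contribution: int, pot: int,
                  multiple: float, use_pot_after_call: bool = True) -> int:
    """Return the bet-to amount for a pot-fraction raise.

    use_pot_after_call=True  -> max_contribution + multiple * (pot + call_amount)
    use_pot_after_call=False -> max_contribution + multiple * pot  (current code)
    """
    call_amount = max_contribution - my_contribution
    base = pot + call_amount if use_pot_after_call else pot
    return int(round(max_contribution + multiple * base))


# Scenario from the table: contribs=[100, 300], pot=400, P0 to act (call=200).
for name, multiple in [("THIRD_POT", 1 / 3), ("HALF_POT", 0.5), ("FULL_POT", 1.0)]:
    correct = bet_to_target(300, 100, 400, multiple)
    current = bet_to_target(300, 100, 400, multiple, use_pot_after_call=False)
    print(f"{name}: correct={correct} current={current} diff={correct - current}")
```

When there is nothing to call (`call_amount == 0`), both variants return the same value, which is why this bug only shows up when facing a bet.
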

---

### 🚨 BUG #2 (Critical): legal_mask ignores action-mapping collisions, so HALF_POT/FULL_POT map to ALL-IN yet stay marked legal

```
[Issue type] legal_mask semantic error + wrong closest-action mapping strategy
[File path]  /home/e2hang/kilo/codes/poker/env_adapter.py
[Function]   BetTranslator.get_cfr_legal_mask / cfr_to_engine_action
[Lines]      337-347, 442-474
```

**Current code (`get_cfr_legal_mask`, lines 337-343):**
```python
if has_any_raise and not raise_capped:
    mask[2] = 1  # THIRD_POT
    mask[3] = 1  # HALF_POT
    mask[4] = 1  # FULL_POT
```

**Why this causes the ALL-IN bug:**

`get_cfr_legal_mask` only checks whether the engine offers any raise action at all. It does not check whether a given CFR action, once mapped, still has a meaning distinct from the other actions.

**Empirical data** (contribs=[10000,10000], pot=20000, min_raise=10100, max=20000):

| CFR action | Computed target | Mapped engine action | Semantics |
|------------|-----------------|----------------------|-----------|
| CALL | - | 1 (Call) | ✅ Correct |
| THIRD_POT | 16667 | 16667 (Bet16667) | ✅ Correct |
| HALF_POT | 20000 | **20000 (ALL-IN)** | ❌ Becomes ALL-IN |
| FULL_POT | 30000 | **20000 (ALL-IN)** | ❌ Becomes ALL-IN |
| ALL_IN | - | 20000 (ALL-IN) | ✅ Correct |

**Consequences:**
- The network outputs `HALF_POT` (believing it is a half-pot raise), but `Bet20000` (ALL-IN) is executed
- The network outputs `FULL_POT` (believing it is a full-pot raise), but `Bet20000` (ALL-IN) is executed
- **Three different CFR actions (HALF_POT, FULL_POT, ALL_IN) map to the same engine action**
- If the network assigns 5% probability to HALF_POT, that 5% turns into ALL-IN!

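A collision of this kind can be detected mechanically by grouping CFR actions by the engine action they land on. The sketch below is illustrative only: `find_collisions` is a hypothetical helper, and it models nearest-action matching as a clamp into `[min_raise, max_raise]`, which assumes every integer in that range is a legal bet.

```python
from collections import defaultdict
from typing import Dict, List


def find_collisions(targets: Dict[str, int], min_raise: int,
                    max_raise: int) -> Dict[int, List[str]]:
    """Group raise-type CFR actions by the engine action they map to.

    targets maps a CFR action name to its computed bet-to target.
    Returns only the engine actions shared by more than one CFR action.
    """
    by_engine_action = defaultdict(list)
    for name, target in targets.items():
        # Clamping is what "nearest legal action" does on a dense integer range.
        engine_action = min(max(target, min_raise), max_raise)
        by_engine_action[engine_action].append(name)
    return {a: names for a, names in by_engine_action.items() if len(names) > 1}


# Numbers from the table above; ALL_IN is entered with its bet-to amount.
collisions = find_collisions(
    {"THIRD_POT": 16667, "HALF_POT": 20000, "FULL_POT": 30000, "ALL_IN": 20000},
    min_raise=10100,
    max_raise=20000,
)
print(collisions)
```

For the state in the table, the only shared engine action is 20000, reached by HALF_POT, FULL_POT, and ALL_IN.
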

---

### 🚨 BUG #3 (High): `target_int = max(target_int, bet_actions[0])` forces the target up to the minimum raise, so a small raise becomes a min-raise

```
[Issue type] Wrong closest-action mapping strategy
[File path]  /home/e2hang/kilo/codes/poker/env_adapter.py
[Function]   BetTranslator.cfr_to_engine_action
[Lines]      446, 458, 470
```

**Current code (line 446):**
```python
target_int = max(target_int, bet_actions[0])
```

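**Why this is a problem:**

When the target is below min_raise (e.g. preflop: max_contrib=100, pot=150, THIRD_POT target=150, but min_raise=200), the code forces the target up to min_raise.

As a result:
- The network believes it is making a "small 1/3-pot raise" (target=150)
- The engine actually executes a "minimum raise" (action=200, far more than 1/3 pot)
- The semantics break: a small proportional raise turns into a larger fixed raise

**Correct behavior:** if target < min_raise, that CFR action is **unavailable** in this state, and should either:
- be marked illegal in legal_mask, or
- fall back to CALL (because the target is closer to a call than to min_raise)
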
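To put a number on the semantic break, the executed action can be converted back into a pot fraction by inverting the current code's formula `target = max_contribution + fraction * pot`. This is a minimal sketch with a hypothetical helper name, using the preflop numbers from the example above.

```python
def implied_pot_fraction(engine_action: int, max_contribution: int, pot: int) -> float:
    """Invert target = max_contribution + fraction * pot to get the fraction actually bet."""
    return (engine_action - max_contribution) / pot


intended = 1 / 3  # what THIRD_POT is supposed to mean
executed = implied_pot_fraction(engine_action=200, max_contribution=100, pot=150)
print(f"intended={intended:.3f} executed={executed:.3f}")
```

Under these numbers the forced min-raise corresponds to roughly a 2/3-pot raise, about twice the size the network was trained to associate with THIRD_POT.
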

---

### 🚨 BUG #4 (High): FOLD falls back to CALL, so the network believes it folded while the engine calls

```
[Issue type] Fallback logic error
[File path]  /home/e2hang/kilo/codes/poker/env_adapter.py
[Function]   BetTranslator.cfr_to_engine_action
[Lines]      401-407
```

**Current code:**
```python
if cfr_action_idx == 0:
    if ENGINE_FOLD in legal:
        return ENGINE_FOLD
    # Fallback: if Fold is unavailable (should not happen in theory), Call
    if ENGINE_CALL in legal:
        return ENGINE_CALL
    return legal[0]
```

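**Empirical data** (contribs=[10000,10000], pot=20000, no bet to face, so a check is available):

```
CFR FOLD (idx=0) -> engine 1 (player=0 move=Call)  ❌ FOLD should not be allowed!
```

When there is no bet to call (the player can check), legal_actions does not contain 0 (Fold). If the network outputs FOLD in that spot, the code falls back to CALL.

**Consequence:** in some states the FOLD action the network has learned is actually executed as CALL, which leads to:
- Wrong regret computation during training (traversing the FOLD subtree actually yields the payoff of CALL)
- At inference time, the agent intends to fold but calls instead

**Root cause:** `get_cfr_legal_mask` correctly sets `mask[0]=0` when `has_fold=False`, but if the network still outputs FOLD (e.g. because it is undertrained, or because of noise), the fallback in `cfr_to_engine_action` turns it into CALL.
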
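One way to keep an illegal FOLD from ever reaching the translator is to zero out masked actions and renormalize before sampling. The report does not show whether `AIPlayer.choose()` already does this, so the following is only a sketch under that assumption; `apply_legal_mask` is a hypothetical helper, and the uniform fallback is an illustrative choice.

```python
from typing import List


def apply_legal_mask(strategy: List[float], legal_mask: List[int]) -> List[float]:
    """Zero out illegal actions and renormalize the remaining probabilities."""
    masked = [p if m else 0.0 for p, m in zip(strategy, legal_mask)]
    total = sum(masked)
    if total <= 0.0:
        # No legal action carries probability mass: use uniform over legal actions.
        n_legal = sum(legal_mask)
        if n_legal == 0:
            raise ValueError("legal_mask has no legal action")
        return [1.0 / n_legal if m else 0.0 for m in legal_mask]
    return [p / total for p in masked]


# Check spot: FOLD (index 0) is illegal, every other action is legal.
strategy = [0.10, 0.60, 0.10, 0.10, 0.05, 0.05]
mask = [0, 1, 1, 1, 1, 1]
masked = apply_legal_mask(strategy, mask)
print([round(p, 3) for p in masked])
```

With the mask applied first, the FOLD fallback inside `cfr_to_engine_action` becomes a last resort rather than a path that training traverses.
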

---

### ⚠️ BUG #5 (Medium): `_find_nearest_legal_action` uses absolute distance and can return an action far from the target

```
[Issue type] Wrong closest-action mapping strategy
[File path]  /home/e2hang/kilo/codes/poker/env_adapter.py
[Function]   _find_nearest_legal_action
[Lines]      200
```

**Current code:**
```python
best_action = min(bet_actions, key=lambda a: abs(a - target_contribution))
```

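**Why this can lead to ALL-IN:**

When the target exceeds the largest legal action, the nearest-action search picks `bet_actions[-1]` (ALL-IN).

Example: target=30000 and the highest legal action is 20000. `abs(20000-30000)=10000` is the smallest distance available, so ALL-IN is returned.

Although `target_int = max(target_int, bet_actions[0])` enforces a lower bound, there is **no upper-bound protection**. When the target is far above the largest legal action, the nearest action is simply ALL-IN.
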
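The missing upper bound can be illustrated by comparing the unbounded search with a range-checked variant. `nearest_in_range` below is a hypothetical helper, not the project's `_find_nearest_legal_action`; it assumes `bet_actions` is sorted in ascending order.

```python
from typing import List, Optional


def nearest_in_range(bet_actions: List[int], target: int) -> Optional[int]:
    """Return the closest legal bet-to amount, or None if target is out of range."""
    if not bet_actions:
        return None
    if target < bet_actions[0] or target > bet_actions[-1]:
        return None  # the caller decides: mask the action out or fall back to CALL
    return min(bet_actions, key=lambda a: abs(a - target))


bets = list(range(10100, 20001))  # dense bet-to range, as in fullgame mode

# Unbounded search: a 30000 target silently becomes the all-in amount.
unbounded = min(bets, key=lambda a: abs(a - 30000))
bounded = nearest_in_range(bets, 30000)
print(unbounded, bounded)
```

Returning `None` makes the out-of-range case explicit, so the caller cannot confuse "closest legal action" with "this raise size does not exist here".
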

---

## 2. OpenSpiel HUNL Fullgame Action Semantics (Authoritative Reference)

### Meaning of Action IDs

| Action ID | Meaning | Notes |
|-----------|---------|-------|
| 0 | Fold | Fixed; fold |
| 1 | Call/Check | Fixed; call or check |
| N (N>=2) | Bet to N | **Total contribution (bet-to)**, not an increment! |

**Key point:** `action=200` means "the current player's cumulative contribution becomes 200", not "raise by an additional 200".

### Mathematical Definition of Bet/Raise

OpenSpiel's `pot_size(multiple)` method returns:

```
pot_size(multiple) = round(maxSpent + multiple × pot_after_call)
```

where:
- `maxSpent` = max(player_contributions) = the largest contribution of any player (the opponent's or one's own)
- `pot_after_call` = current_pot + call_amount = the pot after calling
- `call_amount` = maxSpent - my_contribution

**What a standard poker "1/3 pot raise" means:**

> Raise amount = 1/3 × (pot after calling)
> Total contribution = max_contribution + 1/3 × pot_after_call

This is equivalent to OpenSpiel's `state.pot_size(1/3)`.

### Properties of Legal Actions

In fullgame mode:
- The action space is effectively continuous (every integer from min_raise to max_raise, step = 1)
- `min_raise` = the smallest legal bet-to amount; the raise increment must be at least the size of the previous raise (standard NLHE min-raise rule)
- `max_raise` = the player's remaining chips + chips already contributed = the total contribution when all-in
- When there is no bet to call (check is available), legal_actions = [1, min_bet, ..., max_bet] (0 is not included)

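The action-ID convention above can be captured in two small helpers. This is a sketch for illustration: `split_legal_actions` and `chips_added` are hypothetical names, and the constants simply restate the table (0 = Fold, 1 = Call/Check).

```python
from typing import List, Tuple

ENGINE_FOLD, ENGINE_CALL = 0, 1


def split_legal_actions(legal: List[int]) -> Tuple[bool, bool, List[int]]:
    """Split engine legal actions into (can_fold, can_call, sorted bet-to amounts)."""
    can_fold = ENGINE_FOLD in legal
    can_call = ENGINE_CALL in legal
    bet_actions = sorted(a for a in legal if a > ENGINE_CALL)
    return can_fold, can_call, bet_actions


def chips_added(bet_to: int, my_contribution: int) -> int:
    """Convert a bet-to action into the extra chips the player puts in."""
    return bet_to - my_contribution


# A spot with no bet to call: Fold (0) is absent from the legal actions.
legal = [1] + list(range(200, 20001))
can_fold, can_call, bets = split_legal_actions(legal)
print(can_fold, can_call, bets[0], bets[-1], chips_added(200, 100))
```

Here `action=200` with 100 already contributed adds only 100 chips, which is the bet-to versus increment distinction stated in the key point.
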

---

## 3. Fixes

### 3.1 Correct the bet-to formula (fixes BUG #1)

**File to modify:** `env_adapter.py`, lines 433-474

**Core change:** replace `pot` with `pot_after_call`

```python
# --- Before ---
pot, contributions = _get_pot_and_contributions(state)
max_contribution = max(contributions)

# THIRD_POT
target = max_contribution + (1.0 / 3.0) * pot

# --- After ---
pot, contributions = _get_pot_and_contributions(state)
max_contribution = max(contributions)
current_player = state.current_player()
my_contribution = contributions[current_player]
call_amount = max_contribution - my_contribution
pot_after_call = pot + call_amount  # the key fix

# THIRD_POT
target = max_contribution + (1.0 / 3.0) * pot_after_call
```

### 3.2 Fix legal_mask: exclude actions whose mapping collides (fixes BUG #2)

**File to modify:** `env_adapter.py`, `BetTranslator.get_cfr_legal_mask`

**Core idea:** after building the mask, actually run the mapping check and exclude every non-ALL_IN action that would map to ALL-IN.

```python
def get_cfr_legal_mask(self, state) -> List[int]:
    legal = state.legal_actions()
    has_fold = ENGINE_FOLD in legal
    has_call = ENGINE_CALL in legal
    bet_actions = _get_bet_actions(legal)
    has_any_raise = len(bet_actions) > 0

    raise_count = _count_raises_this_street(state)
    raise_capped = raise_count >= self.RAISE_CAP

    mask = [0] * NUM_CFR_ACTIONS
    mask[0] = 1 if has_fold else 0  # FOLD
    mask[1] = 1 if has_call else 0  # CALL

    # ALL_IN is always allowed (as long as the engine offers a raise)
    all_in_action = bet_actions[-1] if bet_actions else None
    mask[5] = 1 if has_any_raise else 0  # ALL_IN

    if has_any_raise and not raise_capped:
        # === Key fix: check that each pot-fraction raise keeps a distinct meaning after mapping ===
        pot, contributions = _get_pot_and_contributions(state)
        max_contribution = max(contributions)
        current_player = state.current_player()
        my_contribution = contributions[current_player]
        call_amount = max_contribution - my_contribution
        pot_after_call = pot + call_amount
        min_raise = bet_actions[0]

        for cfr_idx, multiplier in RAISE_MULTIPLIERS.items():
            target = max_contribution + multiplier * pot_after_call
            target_int = int(round(target))

            # Check 1: target below min_raise → the action is unavailable
            if target_int < min_raise:
                mask[cfr_idx] = 0
                continue

            # Check 2: does it map to ALL-IN? → semantic collision, mark illegal
            nearest = _find_nearest_legal_action(legal, target_int)
            if nearest is not None and all_in_action is not None and nearest >= all_in_action:
                mask[cfr_idx] = 0  # maps to ALL-IN; must not be used as a pot-fraction raise
                continue

            mask[cfr_idx] = 1
    else:
        mask[2] = 0
        mask[3] = 0
        mask[4] = 0

    # Safety net
    if sum(mask) == 0 and has_call:
        mask[1] = 1

    return mask
```

### 3.3 Fix cfr_to_engine_action: add bounds protection for target (fixes BUG #3, #5)

**File to modify:** `env_adapter.py`, `BetTranslator.cfr_to_engine_action`

**Core change:**

```python
def cfr_to_engine_action(self, state, cfr_action_idx: int) -> int:
    legal = state.legal_actions()
    bet_actions = _get_bet_actions(legal)

    # FOLD
    if cfr_action_idx == 0:
        if ENGINE_FOLD in legal:
            return ENGINE_FOLD
        # When Fold is unavailable, return Call (never a raise)
        if ENGINE_CALL in legal:
            return ENGINE_CALL
        return legal[0]

    # CALL
    if cfr_action_idx == 1:
        if ENGINE_CALL in legal:
            return ENGINE_CALL
        if bet_actions:
            return bet_actions[0]
        if ENGINE_FOLD in legal:
            return ENGINE_FOLD
        return legal[0]

    # Indices 2-5 below are all raise actions
    if not bet_actions:
        if ENGINE_CALL in legal:
            return ENGINE_CALL
        if ENGINE_FOLD in legal:
            return ENGINE_FOLD
        return legal[0]

    # ALL_IN
    if cfr_action_idx == 5:
        return bet_actions[-1]

    # Pot-fraction raises (2, 3, 4)
    pot, contributions = _get_pot_and_contributions(state)
    max_contribution = max(contributions)
    current_player = state.current_player()
    my_contribution = contributions[current_player]
    call_amount = max_contribution - my_contribution
    pot_after_call = pot + call_amount

    multiplier = RAISE_MULTIPLIERS[cfr_action_idx]
    target = max_contribution + multiplier * pot_after_call
    target_int = int(round(target))

    # === Key fix: lower- and upper-bound protection ===
    min_raise = bet_actions[0]
    max_raise = bet_actions[-1]  # the ALL-IN amount

    # Lower bound: target < min_raise → fall back to Call
    if target_int < min_raise:
        if ENGINE_CALL in legal:
            return ENGINE_CALL
        return bet_actions[0]

    # Upper bound: a target at or above max_raise would land on ALL-IN, which is
    # the ALL_IN action's job. The network believes it is making a pot-fraction
    # raise, so the action is unavailable here: fall back to Call, not ALL-IN.
    # (`>=` rather than `>` so that a target equal to max_raise is caught too.)
    if target_int >= max_raise:
        if ENGINE_CALL in legal:
            return ENGINE_CALL
        return bet_actions[0]

    # target is inside the legal range: pick the nearest legal action
    target_int = max(target_int, min_raise)
    nearest = _find_nearest_legal_action(legal, target_int)
    if nearest is not None:
        return nearest
    return bet_actions[0]
```

### 3.4 Fix the FOLD fallback (fixes BUG #4)

Handled in 3.3: when FOLD is unavailable, the fallback is CALL, never a raise. In addition, make sure `get_cfr_legal_mask` marks FOLD as illegal, so that the network should not output FOLD in the first place.


---

## 4. Defensive Design (Safety Layer)

### 4.1 Action Clipping Guard

Add a safety check before `cfr_to_engine_action` returns:

```python
def _safety_check(self, state, cfr_action_idx: int, engine_action: int) -> int:
    """
    Final safety check: make sure the mapped result never violates action semantics.

    Rules:
    1. CALL (idx=1) must never map to a raise action (>1)
    2. FOLD (idx=0) must never map to CALL/raise
       (falls back to legal[0] only when FOLD itself is illegal)
    3. Pot-fraction raises (idx=2,3,4) must not map to ALL-IN
    4. Monotonicity: FOLD < CALL < THIRD_POT ≤ HALF_POT ≤ FULL_POT ≤ ALL_IN
       (stated invariant; not checked inside this function)
    """
    legal = state.legal_actions()

    # Rule 1: CALL must not become a raise
    if cfr_action_idx == 1 and engine_action > 1:
        return ENGINE_CALL if ENGINE_CALL in legal else legal[0]

    # Rule 2: FOLD must not become CALL
    if cfr_action_idx == 0 and engine_action != 0:
        return ENGINE_FOLD if ENGINE_FOLD in legal else legal[0]

    # Rule 3: pot-fraction raises must not become ALL-IN
    bet_actions = _get_bet_actions(legal)
    if cfr_action_idx in (2, 3, 4) and bet_actions:
        all_in_action = bet_actions[-1]
        if engine_action >= all_in_action:
            # A pot-fraction raise must not result in ALL-IN; fall back to Call
            return ENGINE_CALL if ENGINE_CALL in legal else bet_actions[0]

    return engine_action
```

### 4.2 Monotonic Mapping Guard

Ensure the mapping is monotonic: if `cfr_to_engine_action(state, 2)` returns X, then `cfr_to_engine_action(state, 3)` must return a value >= X.

```python
def cfr_to_engine_action_safe(self, state, cfr_action_idx: int) -> int:
    """Mapping entry point with the safety check applied."""
    engine_action = self.cfr_to_engine_action(state, cfr_action_idx)
    return self._safety_check(state, cfr_action_idx, engine_action)
```

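The wrapper above delegates to `_safety_check` and does not itself compare the results for different CFR indices. A monotonicity check could be sketched as below; `is_monotonic` is a hypothetical helper, and restricting the comparison to actions whose mask bit is set is an assumption, made because masked raise actions fall back to CALL (engine action 1) and would otherwise look like violations.

```python
from typing import Dict, List

RAISE_INDICES = (2, 3, 4, 5)  # THIRD_POT, HALF_POT, FULL_POT, ALL_IN


def is_monotonic(mapped: Dict[int, int], legal_mask: List[int]) -> bool:
    """Check that engine actions are non-decreasing across the unmasked raise actions.

    mapped maps a CFR index to the engine action returned for it.
    """
    active = [mapped[i] for i in RAISE_INDICES if legal_mask[i]]
    return all(a <= b for a, b in zip(active, active[1:]))


# Large-pot spot after the fix: HALF_POT/FULL_POT are masked and fall back to CALL.
mapped = {2: 16667, 3: 1, 4: 1, 5: 20000}
mask = [0, 1, 1, 0, 0, 1]
ok = is_monotonic(mapped, mask)
print(ok)
```

Such a check fits best in tests or debug builds, since it needs the mapping for every raise index rather than only the sampled one.
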
### 4.3 Rule-Based Guard (Recommended)

Add a last line of defense in `AIPlayer.choose()`:

```python
# At the end of AIPlayer.choose() in play_arena.py
engine_action = self.translator.cfr_to_engine_action(state, chosen_cfr_idx)

# === Safety guard: CALL must not become ALL-IN ===
if chosen_cfr_idx == 1 and engine_action > 1:
    engine_action = 1  # force CALL

# === Safety guard: pot-fraction raises must not become ALL-IN ===
if chosen_cfr_idx in (2, 3, 4):
    bet_actions = [a for a in state.legal_actions() if a > 1]
    if bet_actions and engine_action >= bet_actions[-1]:
        engine_action = 1  # fall back to CALL

return engine_action
```


---

## 5. Enhanced Logging (Debug Log)

### 5.1 BetTranslator logging

Add structured logging in `cfr_to_engine_action`:

```python
import logging

logger = logging.getLogger("poker.bet_translator")

def cfr_to_engine_action(self, state, cfr_action_idx: int) -> int:
    # ... existing logic ...

    # Add logging in the pot-fraction raise branch
    if cfr_action_idx in RAISE_MULTIPLIERS:
        logger.debug(
            f"CFR→Engine: action={CFR_ACTIONS[cfr_action_idx]}, "
            f"max_contrib={max_contribution}, pot={pot}, "
            f"call_amount={call_amount}, pot_after_call={pot_after_call}, "
            f"target={target_int}, "
            f"min_raise={bet_actions[0]}, max_raise={bet_actions[-1]}, "
            f"nearest={nearest}, engine_action={result}"
        )

    return result
```

### 5.2 AIPlayer logging

Add mapping traces in `AIPlayer.choose()`:

```python
# After sampling, before mapping
logger.info(
    f"AI Decision: cfr_action={CFR_ACTIONS[chosen_cfr_idx]} ({chosen_cfr_idx}), "
    f"prob={strategy[chosen_cfr_idx]:.3f}, "
    f"legal_mask={legal_mask}"
)

# After mapping
logger.info(
    f"AI Action: cfr={CFR_ACTIONS[chosen_cfr_idx]} -> engine={engine_action} "
    f"({state.action_to_string(state.current_player(), engine_action)})"
)
```

### 5.3 Training-time logging

In the traverse function of `mccfr_trainer.py`, count mapping collisions:

```python
MAPPING_COLLISIONS = {"half_to_allin": 0, "full_to_allin": 0, "third_below_min": 0}

# Check after cfr_to_engine_action returns
if cfr_idx == 3 and engine_action == bet_actions[-1]:
    MAPPING_COLLISIONS["half_to_allin"] += 1
# ... and so on
```


---

## 6. Summary of Bug Impact

### Root-cause chain

```
BUG #1 (wrong pot base)
  → target too low → inconsistent training semantics → but does not directly cause ALL-IN

BUG #2 (legal_mask does not check collisions)  ★★★ primary root cause ★★★
  → HALF_POT/FULL_POT are marked legal
  → the network assigns HALF_POT 5-15% probability (believing it is a medium raise)
  → after mapping it becomes ALL-IN (20000)
  → symptom: "CALL probability is high but ALL-IN is executed"

BUG #3 (target forced up when target < min_raise)
  → a small raise becomes a larger raise → noisy training signal

BUG #4 (FOLD falls back to CALL)
  → the network outputs FOLD but CALL is executed → wrong regret computation

BUG #5 (no upper-bound protection)
  → when target > max_raise, the action maps straight to ALL-IN
```

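### Explaining the behavior the user observed

The user reported "CALL probability 80%+, but the executed action becomes ALL-IN". The most likely scenario:

1. Network output: CALL=80%, HALF_POT=8%, FULL_POT=5%, ALL_IN=3%, FOLD=2%, THIRD_POT=2%
2. Noise filtering (3% threshold): removes ALL_IN (3%), FOLD (2%), THIRD_POT (2%)
3. Remaining: CALL=80%, HALF_POT=8%, FULL_POT=5%; after renormalization: CALL=86%, HALF_POT=9%, FULL_POT=5%
4. Sampling: 86% chance of CALL, 14% chance of HALF_POT or FULL_POT
5. **If HALF_POT or FULL_POT is sampled and the pot is large enough, the mapped result is ALL-IN**
6. In such large-pot spots, roughly 1 decision in 7 becomes ALL-IN

This explains the observed pattern of "very high CALL probability, yet frequent ALL-INs".
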
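The arithmetic in steps 2 to 6 can be checked directly. The report does not show the actual filter, so `filter_and_normalize` below is a hypothetical stand-in; it assumes an action is dropped when its probability is at or below the threshold, which is what step 2 implies by listing ALL_IN (3%) among the removed actions.

```python
from typing import Dict


def filter_and_normalize(probs: Dict[str, float], threshold: float) -> Dict[str, float]:
    """Drop actions at or below the threshold, then renormalize the rest."""
    kept = {a: p for a, p in probs.items() if p > threshold}
    total = sum(kept.values())
    return {a: p / total for a, p in kept.items()}


raw = {"CALL": 0.80, "HALF_POT": 0.08, "FULL_POT": 0.05,
       "ALL_IN": 0.03, "FOLD": 0.02, "THIRD_POT": 0.02}
final = filter_and_normalize(raw, threshold=0.03)

# Probability of sampling an action that maps to ALL-IN in a large pot.
p_allin = final["HALF_POT"] + final["FULL_POT"]
print({a: round(p, 3) for a, p in final.items()}, round(p_allin, 3))
```

The combined HALF_POT and FULL_POT mass is 0.13 / 0.93, just under 14%, which matches the "roughly 1 in 7" estimate.
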

---

## 7. Fix Priority

| Priority | Bug | Fix difficulty | Impact |
|----------|-----|----------------|--------|
| P0 | #2 legal_mask does not check collisions | Medium | Directly causes the CALL→ALL-IN symptom |
| P0 | #5 no upper-bound protection | Low | Pot-fraction raises become ALL-IN |
| P1 | #1 pot formula | Low | Inconsistent training semantics |
| P1 | #3 target forced upward | Low | Wrong semantics for small raises |
| P2 | #4 FOLD fallback | Low | Edge case |


---

## 8. Verification Plan

After applying the fixes, run the following tests:

```python
# 1. Collision test: make sure HALF_POT/FULL_POT do not map to ALL-IN
#    In a large-pot scenario, verify that the CFR actions allowed by legal_mask
#    all map to distinct results
#    (except that FOLD/CALL may coincide when only a check is possible)

# 2. Monotonicity test: THIRD_POT ≤ HALF_POT ≤ FULL_POT ≤ ALL_IN

# 3. Boundary tests:
#    - target < min_raise → should fall back to CALL
#    - target > max_raise → should fall back to CALL (not ALL-IN)
#    - pot=0 (preflop) → the computation should not fail

# 4. Random self-play: run 1000 hands; the mapping-collision count should be 0
```

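The checklist above could be turned into executable tests along these lines. This is a sketch, not the project's test suite: `map_raise` is a hypothetical pure-Python stand-in for the pot-fraction branch of the fixed `cfr_to_engine_action`, it assumes a target that reaches the all-in amount is treated as unavailable, and the contribution vectors `[50, 100]` and the `min_raise=500` value are inferred for illustration.

```python
ENGINE_CALL = 1
RAISE_MULTIPLIERS = {2: 1 / 3, 3: 0.5, 4: 1.0}  # THIRD_POT, HALF_POT, FULL_POT


def map_raise(cfr_idx, max_contribution, my_contribution, pot, min_raise, max_raise):
    """Stand-in mapping: return the bet-to amount, or CALL when out of range."""
    pot_after_call = pot + (max_contribution - my_contribution)
    target = int(round(max_contribution + RAISE_MULTIPLIERS[cfr_idx] * pot_after_call))
    if target < min_raise or target >= max_raise:
        return ENGINE_CALL  # below min-raise, or indistinguishable from ALL-IN
    return target


def test_no_collision_with_all_in():
    # Large pot: contribs=[10000, 10000], pot=20000, min_raise=10100, max=20000.
    for idx in RAISE_MULTIPLIERS:
        assert map_raise(idx, 10000, 10000, 20000, 10100, 20000) != 20000


def test_below_min_raise_falls_back_to_call():
    # Preflop: max_contrib=100, pot=150, min_raise=200; THIRD_POT target is 167.
    assert map_raise(2, 100, 50, 150, 200, 20000) == ENGINE_CALL


def test_monotonic_when_in_range():
    # Facing a bet: contribs=[100, 300], pot=400; targets are 500, 600, 900.
    mapped = [map_raise(i, 300, 100, 400, 500, 20000) for i in (2, 3, 4)]
    assert mapped == sorted(mapped)


if __name__ == "__main__":
    test_no_collision_with_all_in()
    test_below_min_raise_falls_back_to_call()
    test_monotonic_when_in_range()
    print("all checks passed")
```

Running the same assertions against real OpenSpiel states, rather than against the stand-in, would additionally cover the random self-play item of the checklist.
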