e2hang c9b96885a8 ops

2026-04-25 00:33:33 +08:00

HUNL Poker AI: CFR→Engine Action Mapping, End-to-End Audit Report

Audit date: 2026-04-25. Scope: env_adapter.py, cfr_net.py, play_arena.py, mccfr_trainer.py. Core problem: the model assigns 80%+ probability to CALL, yet the executed action is Bet20000 (ALL-IN).

1. Problem Localization (Down to the Code)

🚨 BUG #1 (critical): wrong bet-size formula, using `current_pot` instead of `pot_after_call`

[Issue type] bet-size calculation error (systematic; affects every proportional raise)
[File]       /home/e2hang/kilo/codes/poker/env_adapter.py
[Function]   BetTranslator.cfr_to_engine_action
[Lines]      443, 456, 468

Current code:

```python
# Line 443 (THIRD_POT)
target = max_contribution + (1.0 / 3.0) * pot

# Line 456 (HALF_POT)
target = max_contribution + 0.5 * pot

# Line 468 (FULL_POT)
target = max_contribution + 1.0 * pot
```

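Why this is a bug:

The formula uses current_pot (the pot as it stands), but both standard poker convention and OpenSpiel's pot_size(multiple) API measure pot-sized bets against pot_after_call (the pot after the acting player calls):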

OpenSpiel 公式:  bet_to = max_contribution + multiple × (pot + call_amount)
当前代码公式:    bet_to = max_contribution + multiple × pot

where call_amount = max_contribution - my_contribution.

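Empirical data (P0 facing a bet: contribs=[100,300], pot=400, call=200):

| Action | OpenSpiel (correct) | Current code | Difference |
|---|---|---|---|
| THIRD_POT | 500 | 433 | +67 |
| HALF_POT | 600 | 500 | +100 |
| FULL_POT | 900 | 700 | +200 |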

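The gap equals multiple × call_amount, so it grows with the size of the bet being faced and vanishes when there is nothing to call; with call_amount=5000, for example, FULL_POT comes out 5000 chips short. Because the current target is too low rather than too high, this bug does not by itself cause ALL-IN, but it does cause:

Inconsistent training and inference semantics: what the network learns as "1/3 pot" differs from the standard poker meaning.
Misalignment with OpenSpiel's internal logic: OpenSpiel's pot_size() API is the correct reference.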
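The gap between the two formulas can be checked numerically. The sketch below is standalone and hypothetical: the function names are ours, `contributions` is a plain list standing in for what `_get_pot_and_contributions(state)` returns, and halves are rounded up on the assumption that this matches the engine's rounding (Python's built-in `round()` rounds halves to even instead).

```python
import math


def round_half_up(x: float) -> int:
    """Round a non-negative chip amount to an integer, halves rounded up.

    Assumption: this mirrors the engine's rounding for non-negative values.
    """
    return math.floor(x + 0.5)


def bet_to_pot_after_call(contributions, player, multiple):
    """Bet-to amount measured against the pot *after* calling (OpenSpiel-style)."""
    pot = sum(contributions)
    max_contribution = max(contributions)
    call_amount = max_contribution - contributions[player]
    return round_half_up(max_contribution + multiple * (pot + call_amount))


def bet_to_current_pot(contributions, player, multiple):
    """Bet-to amount measured against the current pot (the buggy formula)."""
    pot = sum(contributions)
    return round_half_up(max(contributions) + multiple * pot)


if __name__ == "__main__":
    contribs = [100, 300]  # P0 faces a bet: pot=400, call=200
    for name, m in [("THIRD_POT", 1 / 3), ("HALF_POT", 0.5), ("FULL_POT", 1.0)]:
        good = bet_to_pot_after_call(contribs, 0, m)
        bad = bet_to_current_pot(contribs, 0, m)
        print(name, good, bad, good - bad)
```

For the P0 example it reproduces the table above: 500/433, 600/500 and 900/700.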

🚨 BUG #2 (critical): legal_mask ignores mapping collisions, so HALF_POT/FULL_POT stay legal even when they map to ALL-IN

[Issue type] wrong legal_mask semantics + wrong closest-action mapping strategy
[File]       /home/e2hang/kilo/codes/poker/env_adapter.py
[Function]   BetTranslator.get_cfr_legal_mask / cfr_to_engine_action
[Lines]      337-347, 442-474

Current code (get_cfr_legal_mask, lines 337-343):

```python
if has_any_raise and not raise_capped:
    mask[2] = 1  # THIRD_POT
    mask[3] = 1  # HALF_POT
    mask[4] = 1  # FULL_POT
```

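How this causes the ALL-IN bug:

get_cfr_legal_mask only checks whether the engine offers any raise at all; it does not check whether each CFR action, once mapped, still lands on a distinct engine action.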

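Empirical data (contribs=[10000,10000], pot=20000, min_raise=10100, max=20000; call_amount=0 here, so both formulas give the same targets):

| CFR action | Computed target | Mapped engine action | Semantics |
|---|---|---|---|
| CALL | - | 1 (Call) | ✅ correct |
| THIRD_POT | 16667 | 16667 (Bet16667) | ✅ correct |
| HALF_POT | 20000 | 20000 (ALL-IN) | ❌ becomes ALL-IN |
| FULL_POT | 30000 | 20000 (ALL-IN) | ❌ becomes ALL-IN |
| ALL_IN | - | 20000 (ALL-IN) | ✅ correct |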

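Consequences:

The network outputs HALF_POT (intending a half-pot raise), and Bet20000 (ALL-IN) is executed.
The network outputs FULL_POT (intending a pot-sized raise), and Bet20000 (ALL-IN) is executed.
Three distinct CFR actions (HALF_POT, FULL_POT, ALL_IN) map to the same engine action.
If the network gives HALF_POT 5% probability, that 5% becomes ALL-IN!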
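The collision can be reproduced without OpenSpiel. This hypothetical sketch models the current behaviour as "clamp the target into [min_raise, max_raise]", which is what `max(target_int, bet_actions[0])` followed by nearest-by-absolute-distance amounts to when the legal bet range is continuous; string keys replace CFR indices for readability, and the function names are ours.

```python
RAISE_MULTIPLIERS = {"THIRD_POT": 1 / 3, "HALF_POT": 0.5, "FULL_POT": 1.0}


def current_mapping(max_contribution, pot_after_call, min_raise, max_raise):
    """Model of the current behaviour: clamp each target into the legal bet range.

    With a continuous legal range (step 1), "nearest legal action" is the same
    as clamping, so this stands in for max(...) + _find_nearest_legal_action.
    """
    result = {}
    for name, m in RAISE_MULTIPLIERS.items():
        target = int(round(max_contribution + m * pot_after_call))
        result[name] = min(max(target, min_raise), max_raise)
    result["ALL_IN"] = max_raise
    return result


def collisions_with_all_in(mapping):
    """Names of proportional raises that landed on the ALL-IN engine action."""
    all_in = mapping["ALL_IN"]
    return sorted(n for n, a in mapping.items() if n != "ALL_IN" and a == all_in)
```

With the state from the table (max contribution 10000, pot_after_call 20000, range 10100..20000), HALF_POT and FULL_POT both come back as 20000, the same engine action as ALL_IN.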

🚨 BUG #3 (high): `target_int = max(target_int, bet_actions[0])` forces the target upward, turning a small raise into a minimum raise

[Issue type] wrong closest-action mapping strategy
[File]       /home/e2hang/kilo/codes/poker/env_adapter.py
[Function]   BetTranslator.cfr_to_engine_action
[Lines]      446, 458, 470

Current code (line 446):

```python
target_int = max(target_int, bet_actions[0])
```

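Why this is a problem:

When the target is below min_raise (e.g. preflop with max_contrib=100, pot=150: the THIRD_POT target is 150, but min_raise=200), the code lifts the target up to min_raise.

As a result:

The network believes it is making a "1/3-pot small raise" (target=150, i.e. 50 chips above the current bet).
The engine actually executes a minimum raise (action=200, i.e. 100 chips above the current bet, double the intended increment).
The semantics break: a small proportional raise becomes a larger fixed raise.

Correct behavior: if target < min_raise, that CFR action is unavailable in this state, and the code should either:

mark it illegal in legal_mask, or
fall back to CALL, so that a small raise is never silently inflated into a larger one.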
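A minimal sketch of the two lower-bound policies, using the preflop numbers above. The function names are hypothetical; `ENGINE_CALL = 1` follows the action-ID table in section 2, and the branch for an unavailable Call mirrors the fallback to `bet_actions[0]` used in the proposed fix.

```python
ENGINE_CALL = 1  # OpenSpiel fullgame: action 1 is Call/Check


def map_small_raise_old(target, min_raise):
    """Current behaviour: silently lift the target up to the minimum raise."""
    return max(target, min_raise)


def map_small_raise_new(target, min_raise, call_legal=True):
    """Proposed behaviour: a raise below min_raise is unavailable, so Call instead.

    Only if Call is not legal does it fall back to the minimum raise.
    """
    if target < min_raise:
        return ENGINE_CALL if call_legal else min_raise
    return target
```

For target=150 and min_raise=200 the old rule returns 200 while the new rule returns 1 (Call).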

🚨 BUG #4 (high): FOLD falls back to CALL, so the network believes it folded but actually called

[Issue type] wrong fallback logic
[File]       /home/e2hang/kilo/codes/poker/env_adapter.py
[Function]   BetTranslator.cfr_to_engine_action
[Lines]      401-407

Current code:

```python
if cfr_action_idx == 0:
    if ENGINE_FOLD in legal:
        return ENGINE_FOLD
    # Fallback: if Fold is not legal (should not happen in theory), Call
    if ENGINE_CALL in legal:
        return ENGINE_CALL
    return legal[0]
```

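Empirical data (contribs=[10000,10000], pot=20000, no opponent bet, so a check is available):

CFR FOLD (idx=0) -> engine 1 (player=0 move=Call)  ❌ FOLD should not be allowed here!

When the player can check (there is no bet to call), legal_actions does not contain 0 (Fold). If the network nevertheless outputs FOLD, the code falls back to CALL (here, a check).

Consequence: in some states the learned FOLD action actually executes CALL, which means:

Regrets are computed incorrectly during training (traversing the FOLD subtree actually yields CALL's payoff).
At inference time the agent intends to fold but calls instead.

Root cause: get_cfr_legal_mask correctly sets mask[0]=0 when has_fold=False, but if the network still outputs FOLD (e.g. because it is undertrained or noisy), the fallback in cfr_to_engine_action turns it into CALL.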
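Since the mask is already right, the most direct guard is to apply it before sampling, so FOLD can never be drawn in a state where it is illegal. This is a sketch under assumptions: `sample_masked` is a hypothetical helper, `strategy` is the network's per-action probability vector and `mask` is the list returned by get_cfr_legal_mask. If the network puts all its mass on illegal actions, the sketch falls back to a uniform choice over the legal ones; that is our design choice, not something the report specifies.

```python
import random


def sample_masked(strategy, mask, rng=random):
    """Sample a CFR action index, never returning an index whose mask bit is 0.

    strategy: per-action probabilities from the network (non-negative values).
    mask: legal mask (1 = legal, 0 = illegal), same length as strategy.
    rng: anything with a .choices() method (the random module or random.Random).
    """
    weights = [p if m else 0.0 for p, m in zip(strategy, mask)]
    total = sum(weights)
    if total <= 0.0:
        # All probability mass sits on illegal actions: uniform over legal ones.
        weights = [float(m) for m in mask]
        total = sum(weights)
    if total <= 0.0:
        raise ValueError("mask has no legal actions")
    # random.choices renormalizes the weights internally.
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]
```

In the check-available state above (mask [0, 1, ...]), FOLD has weight 0 and is never returned, however much probability the network gave it.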

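⚠️ BUG #5 (medium): `_find_nearest_legal_action` uses absolute distance with no range check, so an out-of-range target silently maps to ALL-IN

[Issue type] wrong closest-action mapping strategy
[File]       /home/e2hang/kilo/codes/poker/env_adapter.py
[Function]   _find_nearest_legal_action
[Lines]      200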

Current code:

```python
best_action = min(bet_actions, key=lambda a: abs(a - target_contribution))
```

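How this can lead to ALL-IN:

When the target exceeds the largest legal action, the nearest action is always bet_actions[-1] (ALL-IN).

For example, with target=30000 and the highest legal action at 20000, abs(20000-30000)=10000 is the smallest distance available, so ALL-IN is returned.

target_int = max(target_int, bet_actions[0]) guarantees a lower bound, but there is no upper-bound check: whenever the target exceeds the largest legal action, "nearest" silently becomes ALL-IN.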
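The difference between the current nearest rule and a range-checked lookup shows up on a plain list of bet-to amounts. Both helpers are hypothetical. `find_proportional_bet` returns None when the target is below the minimum raise or at/above the all-in amount, so the caller can apply its own fallback instead of silently going ALL-IN; treating "target equals all-in" as unavailable matches the collision rule discussed for BUG #2.

```python
def find_nearest_abs(bet_actions, target):
    """Same rule as the current code: minimum absolute distance, no range check."""
    return min(bet_actions, key=lambda a: abs(a - target))


def find_proportional_bet(bet_actions, target):
    """Nearest legal bet-to for a proportional raise, or None if out of range.

    bet_actions: sorted legal bet-to amounts (engine actions > 1).
    None means: below min_raise, or at/above the ALL-IN amount.
    """
    if not bet_actions or target < bet_actions[0] or target >= bet_actions[-1]:
        return None
    return min(bet_actions, key=lambda a: abs(a - target))
```

With legal bets 10100..20000, `find_nearest_abs(bets, 30000)` returns 20000 (ALL-IN), while `find_proportional_bet(bets, 30000)` returns None.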

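2. OpenSpiel HUNL Fullgame Action Semantics (Reference)

Action ID meanings

| Action ID | Meaning | Notes |
|---|---|---|
| 0 | Fold | Fixed; fold |
| 1 | Call/Check | Fixed; call or check |
| N (N>=2) | Bet to N | Total contribution (bet-to), not an increment! |

Key point: action=200 means "the current player's total contribution becomes 200", not "raise by an additional 200".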
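A small hypothetical helper makes the bet-to convention concrete by translating an action ID into chip terms. The wording of the returned strings is ours, not the output of OpenSpiel's `action_to_string`.

```python
def describe_action(action, my_contribution, max_contribution):
    """Explain a fullgame action ID in chips.

    0 = Fold, 1 = Call/Check, N >= 2 = bet/raise *to* N (total contribution).
    """
    if action == 0:
        return "Fold"
    if action == 1:
        to_call = max_contribution - my_contribution
        return "Check" if to_call == 0 else f"Call {to_call}"
    added = action - my_contribution  # chips actually added by this action
    return f"Bet to {action} (adds {added} chips)"
```

Preflop, the small blind (50 in, facing 100) choosing action 200 adds 150 chips, not 200.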

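Mathematical definition of bet/raise

OpenSpiel's pot_size(multiple) method returns:

pot_size(multiple) = round(maxSpent + multiple × pot_after_call)

where:

maxSpent = max(player_contributions) = the largest contribution at the table (the opponent's or one's own)
pot_after_call = current_pot + call_amount = the pot after calling
call_amount = maxSpent - my_contribution

What a standard "1/3 pot raise" means:

raise increment above maxSpent = 1/3 × pot_after_call
total contribution (bet-to) = max_contribution + 1/3 × pot_after_call, where max_contribution is the same quantity as maxSpent

This is equivalent to OpenSpiel's state.pot_size(1/3).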

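Characteristics of legal actions

In fullgame mode:

The raise space is effectively continuous (every integer bet-to from min_raise to max_raise, step 1).
min_raise = the smallest legal bet-to: the current max contribution plus the size of the last bet or raise on this street (at least one big blind), per the standard NLHE min-raise rule. For example, 10100 on a fresh street with 10000 in, or 200 preflop facing the 100 big blind.
max_raise = remaining stack + amount already contributed = the total contribution when all-in.
When the player can only check (nothing to call), legal_actions = [1, min_bet, ..., max_bet] (0 is not included).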

3. Fixes

3.1 Correct the bet-to formula (fixes BUG #1)

File: env_adapter.py, lines 433-474

Core change: replace pot with pot_after_call

```python
# --- Before ---
pot, contributions = _get_pot_and_contributions(state)
max_contribution = max(contributions)

# THIRD_POT
target = max_contribution + (1.0 / 3.0) * pot

# --- After ---
pot, contributions = _get_pot_and_contributions(state)
max_contribution = max(contributions)
current_player = state.current_player()
my_contribution = contributions[current_player]
call_amount = max_contribution - my_contribution
pot_after_call = pot + call_amount  # the key fix

# THIRD_POT
target = max_contribution + (1.0 / 3.0) * pot_after_call
```

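3.2 Correct legal_mask: exclude actions that collide after mapping (fixes BUG #2)

File: env_adapter.py, BetTranslator.get_cfr_legal_mask

Core idea: after building the mask, actually run the mapping check and mark illegal any non-ALL_IN action that would map to ALL-IN (or whose target falls below min_raise).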

```python
def get_cfr_legal_mask(self, state) -> List[int]:
    legal = state.legal_actions()
    has_fold = ENGINE_FOLD in legal
    has_call = ENGINE_CALL in legal
    bet_actions = _get_bet_actions(legal)
    has_any_raise = len(bet_actions) > 0

    raise_count = _count_raises_this_street(state)
    raise_capped = raise_count >= self.RAISE_CAP

    mask = [0] * NUM_CFR_ACTIONS
    mask[0] = 1 if has_fold else 0       # FOLD
    mask[1] = 1 if has_call else 0       # CALL

    # ALL_IN is always allowed (as long as the engine allows any raise)
    all_in_action = bet_actions[-1] if bet_actions else None
    mask[5] = 1 if has_any_raise else 0   # ALL_IN

    if has_any_raise and not raise_capped:
        # === Key fix: check that each proportional raise is still a distinct action after mapping ===
        pot, contributions = _get_pot_and_contributions(state)
        max_contribution = max(contributions)
        current_player = state.current_player()
        my_contribution = contributions[current_player]
        call_amount = max_contribution - my_contribution
        pot_after_call = pot + call_amount
        min_raise = bet_actions[0]

        # RAISE_MULTIPLIERS: module constant mapping CFR index -> pot multiple
        for cfr_idx, multiplier in RAISE_MULTIPLIERS.items():
            target = max_contribution + multiplier * pot_after_call
            target_int = int(round(target))

            # Check 1: target below min_raise -> this action is unavailable
            if target_int < min_raise:
                mask[cfr_idx] = 0
                continue

            # Check 2: maps to ALL-IN -> semantic collision, mark illegal
            nearest = _find_nearest_legal_action(legal, target_int)
            if nearest is not None and all_in_action is not None and nearest >= all_in_action:
                mask[cfr_idx] = 0  # maps to ALL-IN; not usable as a proportional raise
                continue

            mask[cfr_idx] = 1
    else:
        mask[2] = 0
        mask[3] = 0
        mask[4] = 0

    # Safety net
    if sum(mask) == 0 and has_call:
        mask[1] = 1

    return mask
```
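For unit tests it helps to have the same rule as a pure function on plain numbers. This sketch assumes `RAISE_MULTIPLIERS = {2: 1/3, 3: 0.5, 4: 1.0}` (the report never shows its definition) and a continuous bet range, in which "nearest legal action >= all-in" reduces to "target >= max_raise"; the function name and parameters are hypothetical.

```python
NUM_CFR_ACTIONS = 6  # FOLD, CALL, THIRD_POT, HALF_POT, FULL_POT, ALL_IN
RAISE_MULTIPLIERS = {2: 1 / 3, 3: 0.5, 4: 1.0}  # assumed: CFR index -> pot multiple


def cfr_legal_mask(contributions, player, fold_legal, call_legal,
                   min_raise=None, max_raise=None, raise_capped=False):
    """Pure-function sketch of the corrected get_cfr_legal_mask.

    min_raise / max_raise: smallest and largest legal bet-to amounts,
    or None when no raise is possible. Assumes a continuous bet range.
    """
    mask = [0] * NUM_CFR_ACTIONS
    mask[0] = int(fold_legal)
    mask[1] = int(call_legal)
    has_raise = min_raise is not None and max_raise is not None
    mask[5] = int(has_raise)  # ALL_IN
    if has_raise and not raise_capped:
        max_c = max(contributions)
        pot_after_call = sum(contributions) + max_c - contributions[player]
        for idx, m in RAISE_MULTIPLIERS.items():
            target = int(round(max_c + m * pot_after_call))
            # Too small (< min_raise) or colliding with ALL-IN (>= max_raise) -> illegal.
            mask[idx] = int(min_raise <= target < max_raise)
    return mask
```

On the BUG #2 state it yields [0, 1, 1, 0, 0, 1]: THIRD_POT stays, HALF_POT and FULL_POT are masked out, and ALL_IN remains available.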

3.3 Correct cfr_to_engine_action: add lower- and upper-bound protection for the target (fixes BUG #3, #5)

File: env_adapter.py, BetTranslator.cfr_to_engine_action

Core change:

```python
def cfr_to_engine_action(self, state, cfr_action_idx: int) -> int:
    legal = state.legal_actions()
    bet_actions = _get_bet_actions(legal)

    # FOLD
    if cfr_action_idx == 0:
        if ENGINE_FOLD in legal:
            return ENGINE_FOLD
        # Fold is illegal (a check is available): fall back to Call/Check, never a
        # raise. If the legal mask is applied before sampling, this is unreachable.
        if ENGINE_CALL in legal:
            return ENGINE_CALL
        return legal[0]

    # CALL
    if cfr_action_idx == 1:
        if ENGINE_CALL in legal:
            return ENGINE_CALL
        if bet_actions:
            return bet_actions[0]
        if ENGINE_FOLD in legal:
            return ENGINE_FOLD
        return legal[0]

    # Indices 2-5 are all raise actions
    if not bet_actions:
        if ENGINE_CALL in legal:
            return ENGINE_CALL
        if ENGINE_FOLD in legal:
            return ENGINE_FOLD
        return legal[0]

    # ALL_IN
    if cfr_action_idx == 5:
        return bet_actions[-1]

    # Proportional raises (2, 3, 4)
    pot, contributions = _get_pot_and_contributions(state)
    max_contribution = max(contributions)
    current_player = state.current_player()
    my_contribution = contributions[current_player]
    call_amount = max_contribution - my_contribution
    pot_after_call = pot + call_amount

    multiplier = RAISE_MULTIPLIERS[cfr_action_idx]
    target = max_contribution + multiplier * pot_after_call
    target_int = int(round(target))

    # === Key fix: lower and upper bounds ===
    min_raise = bet_actions[0]
    max_raise = bet_actions[-1]  # the ALL-IN amount

    # Lower bound: target < min_raise -> the raise is unavailable, fall back to Call
    if target_int < min_raise:
        if ENGINE_CALL in legal:
            return ENGINE_CALL
        return bet_actions[0]

    # Upper bound: a target at or above max_raise would execute ALL-IN, which is the
    # ALL_IN action's job. The network intended a proportional raise, so fall back
    # to Call. Using >= also catches the exact collision (HALF_POT, target == 20000
    # in the BUG #2 example), which a plain > would let through.
    if target_int >= max_raise:
        if ENGINE_CALL in legal:
            return ENGINE_CALL
        return bet_actions[0]

    # min_raise <= target < max_raise: pick the nearest legal action
    # (no max() lift is needed; the lower bound was handled above)
    nearest = _find_nearest_legal_action(legal, target_int)
    if nearest is not None:
        return nearest
    return bet_actions[0]
```

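3.4 Correct the FOLD fallback (fixes BUG #4)

The FOLD branch in 3.3 keeps the Call/Check fallback (never a raise) as a last resort, so the mapper alone does not fix BUG #4. The actual fix is on the mask side: get_cfr_legal_mask already marks FOLD illegal when Fold is not in legal_actions, so the trainer and the sampler must apply the mask (zero out masked probabilities and renormalize) before choosing an action. The network's FOLD output then never reaches the mapper in such states.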

4. Defensive Design (Safety Layer)

4.1 Action Clipping Guard

Add a safety check just before cfr_to_engine_action returns:

```python
def _safety_check(self, state, cfr_action_idx: int, engine_action: int) -> int:
    """
    Final safety check: make sure the mapped action does not violate its intended meaning.

    Rules:
    1. CALL (idx=1) never maps to a raise action (>1).
    2. FOLD (idx=0) maps to Fold whenever Fold is legal; otherwise Call/Check is the only option.
    3. Proportional raises (idx=2,3,4) never map to ALL-IN.
    4. Monotonicity: FOLD < CALL < THIRD_POT ≤ HALF_POT ≤ FULL_POT ≤ ALL_IN
       (stated for reference; this function does not check it).
    """
    legal = state.legal_actions()

    # Rule 1: CALL must not become a raise
    if cfr_action_idx == 1 and engine_action > 1:
        return ENGINE_CALL if ENGINE_CALL in legal else legal[0]

    # Rule 2: FOLD must stay Fold when Fold is legal; otherwise legal[0] is Call/Check
    if cfr_action_idx == 0 and engine_action != 0:
        return ENGINE_FOLD if ENGINE_FOLD in legal else legal[0]

    # Rule 3: proportional raises must not become ALL-IN
    bet_actions = _get_bet_actions(legal)
    if cfr_action_idx in (2, 3, 4) and bet_actions:
        all_in_action = bet_actions[-1]
        if engine_action >= all_in_action:
            # A proportional raise must not cause ALL-IN; fall back to Call
            return ENGINE_CALL if ENGINE_CALL in legal else bet_actions[0]

    return engine_action
```

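4.2 Monotonic Mapping Guard

Goal: keep the mapping monotonic, i.e. if cfr_to_engine_action(state, 2) returns X, then cfr_to_engine_action(state, 3) must return >= X. The entry point below only applies _safety_check; monotonicity itself has to be verified separately, and only across actions the legal mask allows, because masked-out proportional raises fall back to Call.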

```python
def cfr_to_engine_action_safe(self, state, cfr_action_idx: int) -> int:
    """Mapping entry point with the safety check applied."""
    engine_action = self.cfr_to_engine_action(state, cfr_action_idx)
    return self._safety_check(state, cfr_action_idx, engine_action)
```
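The ordering property can be checked on its own, outside `_safety_check`. In this hypothetical sketch, `mapped` holds the engine action for each CFR index in one state, and only raise actions allowed by the mask are compared, because masked-out proportional raises fall back to Call (engine action 1) and would otherwise look like violations.

```python
CFR_RAISE_ORDER = [2, 3, 4, 5]  # THIRD_POT, HALF_POT, FULL_POT, ALL_IN


def check_monotonic(mapped, mask):
    """Return (ok, bad_pairs) for the raise part of one state's mapping.

    mapped: dict CFR index -> engine action produced in this state.
    mask: legal mask; only indices with mask[i] == 1 are compared.
    """
    legal_raises = [i for i in CFR_RAISE_ORDER if mask[i] and i in mapped]
    bad = [
        (a, b)
        for a, b in zip(legal_raises, legal_raises[1:])
        if mapped[a] > mapped[b]  # larger pot fraction must not map lower
    ]
    return (not bad, bad)
```

For the BUG #2 state after the fix, {2: 16667, 3: 1, 4: 1, 5: 20000} passes with mask [0, 1, 1, 0, 0, 1], but reports (2, 3) if every action is treated as legal.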

4.3 Rule-Based Guard (recommended)

Add a last line of defense in AIPlayer.choose():

```python
# At the end of AIPlayer.choose() in play_arena.py
engine_action = self.translator.cfr_to_engine_action(state, chosen_cfr_idx)

# === Guard: CALL must never become a raise (including ALL-IN) ===
if chosen_cfr_idx == 1 and engine_action > 1:
    engine_action = 1  # force CALL

# === Guard: proportional raises must never become ALL-IN ===
if chosen_cfr_idx in (2, 3, 4):
    bet_actions = [a for a in state.legal_actions() if a > 1]
    if bet_actions and engine_action >= bet_actions[-1]:
        engine_action = 1  # fall back to CALL

return engine_action
```

5. Logging Enhancements (Debug Log)

5.1 BetTranslator logging

Add structured logging to cfr_to_engine_action:

```python
import logging

logger = logging.getLogger("poker.bet_translator")

def cfr_to_engine_action(self, state, cfr_action_idx: int) -> int:
    # ... existing logic, which computes max_contribution, pot, call_amount,
    # pot_after_call, target_int, nearest and the final `result` ...

    # Log inside the proportional-raise branch (lazy %-formatting: the message
    # is only built when DEBUG is enabled)
    if cfr_action_idx in RAISE_MULTIPLIERS:
        logger.debug(
            "CFR->Engine: action=%s, max_contrib=%s, pot=%s, call_amount=%s, "
            "pot_after_call=%s, target=%s, min_raise=%s, max_raise=%s, "
            "nearest=%s, engine_action=%s",
            CFR_ACTIONS[cfr_action_idx], max_contribution, pot, call_amount,
            pot_after_call, target_int, bet_actions[0], bet_actions[-1],
            nearest, result,
        )

    return result
```

5.2 AIPlayer logging

Add mapping traces to AIPlayer.choose():

```python
# After sampling, before mapping
logger.info(
    "AI Decision: cfr_action=%s (%d), prob=%.3f, legal_mask=%s",
    CFR_ACTIONS[chosen_cfr_idx], chosen_cfr_idx,
    strategy[chosen_cfr_idx], legal_mask,
)

# After mapping
logger.info(
    "AI Action: cfr=%s -> engine=%s (%s)",
    CFR_ACTIONS[chosen_cfr_idx], engine_action,
    state.action_to_string(state.current_player(), engine_action),
)
```

5.3 Training-time logging

In the traverse function of mccfr_trainer.py, count mapping collisions:

```python
MAPPING_COLLISIONS = {"half_to_allin": 0, "full_to_allin": 0, "third_below_min": 0}

# Check right after cfr_to_engine_action returns
if cfr_idx == 3 and engine_action == bet_actions[-1]:
    MAPPING_COLLISIONS["half_to_allin"] += 1
# ... and likewise for the other counters
```
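A `collections.Counter` avoids pre-declaring every key and makes it easy to track all three proportional actions. The helper below is a hypothetical sketch: `record_mapping` is our name, `target` is the unclamped bet-to amount computed for the CFR action, and the key names extend the ones in MAPPING_COLLISIONS.

```python
from collections import Counter

PROPORTIONAL = {2: "third", 3: "half", 4: "full"}  # CFR index -> short name


def record_mapping(counter, cfr_idx, engine_action, bet_actions, target):
    """Count proportional raises that hit ALL-IN or fall below the min-raise.

    counter: a Counter shared across the traversal.
    bet_actions: sorted legal bet-to amounts (engine actions > 1) in this state.
    target: the unclamped bet-to amount the CFR action asked for.
    """
    name = PROPORTIONAL.get(cfr_idx)
    if name is None or not bet_actions:
        return  # FOLD/CALL/ALL_IN, or no raise possible: nothing to count
    if engine_action == bet_actions[-1]:
        counter[f"{name}_to_allin"] += 1
    if target < bet_actions[0]:
        counter[f"{name}_below_min"] += 1
```

After a training run, `counter.most_common()` lists the collision types; after the fixes, every `*_to_allin` entry should stay at zero.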

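6. Bug Impact Summary

Root-cause chain

BUG #1 (wrong pot formula)
  → target too low → inconsistent training semantics → but does not directly cause ALL-IN

BUG #2 (legal_mask ignores collisions) ★★★ primary root cause ★★★
  → HALF_POT/FULL_POT marked legal
  → network gives HALF_POT 5-15% probability (believing it is a medium raise)
  → mapped to ALL-IN (20000)
  → symptom: "high CALL probability, yet ALL-IN is executed"

BUG #3 (target < min_raise forced upward)
  → small raises become larger raises → noisy training signal

BUG #4 (FOLD falls back to CALL)
  → network outputs FOLD, CALL is executed → wrong regret computation

BUG #5 (no upper-bound protection)
  → target > max_raise maps straight to ALL-IN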

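How this explains what the user observed

The user reports "CALL probability 80%+, yet the executed action becomes ALL-IN". The most likely scenario:

Network output: CALL=80%, HALF_POT=8%, FULL_POT=5%, ALL_IN=3%, FOLD=2%, THIRD_POT=2%
Noise filter (drops probabilities at or below 3%): removes ALL_IN (3%), FOLD (2%), THIRD_POT (2%)
Remaining: CALL=80%, HALF_POT=8%, FULL_POT=5%; after renormalization: CALL≈86%, HALF_POT≈9%, FULL_POT≈5%
Sampling: CALL with ≈86% probability, HALF_POT or FULL_POT with ≈14%
If HALF_POT or FULL_POT is chosen and the pot is large enough (target at or above the all-in amount), the mapped action is ALL-IN
So roughly 1 decision in 7 becomes ALL-IN

This is consistent with the observed "CALL probability very high, yet ALL-IN appears frequently".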
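The arithmetic of this scenario can be reproduced in a few lines. The strategy values and the 3% threshold come from the hypothetical scenario above, not from measured data; the sketch drops probabilities at or below the threshold (the reading under which ALL_IN at exactly 3% is removed) and assumes that every HALF_POT or FULL_POT draw lands on ALL-IN.

```python
def filter_and_renormalize(strategy, threshold):
    """Drop actions whose probability is at or below `threshold`, renormalize the rest."""
    kept = {a: p for a, p in strategy.items() if p > threshold}
    total = sum(kept.values())
    return {a: p / total for a, p in kept.items()}


strategy = {"CALL": 0.80, "HALF_POT": 0.08, "FULL_POT": 0.05,
            "ALL_IN": 0.03, "FOLD": 0.02, "THIRD_POT": 0.02}
probs = filter_and_renormalize(strategy, 0.03)
# Probability that a decision ends up as ALL-IN via a colliding proportional raise
p_all_in = probs.get("HALF_POT", 0.0) + probs.get("FULL_POT", 0.0)
```

`p_all_in` comes out at 0.13/0.93, about 0.14, i.e. roughly one decision in seven.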

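7. Fix Priority

| Priority | Bug | Fix difficulty | Impact |
|---|---|---|---|
| P0 | #2 legal_mask ignores collisions | Medium | Directly causes CALL→ALL-IN |
| P0 | #5 no upper-bound protection | Low | Proportional raises become ALL-IN |
| P1 | #1 pot formula | Low | Inconsistent training semantics |
| P1 | #3 target forced upward | Low | Wrong small-raise semantics |
| P2 | #4 FOLD fallback | Low | Edge case |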

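8. Validation Plan

After the fixes, run the following tests:

```python
# 1. Collision test: HALF_POT/FULL_POT must not map to ALL-IN.
#    In large-pot states, check that all CFR actions allowed by the legal mask
#    map to distinct engine actions (FOLD/CALL may coincide when only a check is legal).

# 2. Monotonicity test (over mask-legal actions): THIRD_POT <= HALF_POT <= FULL_POT <= ALL_IN

# 3. Boundary tests:
#    - target < min_raise  -> should fall back to CALL
#    - target >= max_raise -> should fall back to CALL (not ALL-IN)
#    - pot=0 edge case (a real preflop pot already holds the blinds) -> no errors

# 4. Random self-play: run 1000 hands; the mapping-collision count should be 0
```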
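These checks can be exercised without OpenSpiel using a pure-function version of the 3.3 mapping and a random state generator. Everything here is a hypothetical sketch: `map_action`, `random_state` and `run_sweep` are our names, `RAISE_MULTIPLIERS = {2: 1/3, 3: 0.5, 4: 1.0}` is assumed, states are modeled with 20000 stacks, a 100 big blind and a continuous bet range (a simplification of the min-raise rule, not OpenSpiel code), and Call/Check is assumed always legal. A real test should drive `BetTranslator` with engine states instead.

```python
import bisect
import random

ENGINE_FOLD, ENGINE_CALL = 0, 1
RAISE_MULTIPLIERS = {2: 1 / 3, 3: 0.5, 4: 1.0}  # assumed: CFR index -> pot multiple


def map_action(cfr_idx, contributions, player, legal):
    """Pure-function sketch of the corrected mapping in section 3.3."""
    bets = sorted(a for a in legal if a > 1)
    if cfr_idx == 0:
        return ENGINE_FOLD if ENGINE_FOLD in legal else ENGINE_CALL
    if cfr_idx == 1 or not bets:
        return ENGINE_CALL
    if cfr_idx == 5:
        return bets[-1]  # ALL_IN
    max_c = max(contributions)
    pot_after_call = sum(contributions) + max_c - contributions[player]
    target = int(round(max_c + RAISE_MULTIPLIERS[cfr_idx] * pot_after_call))
    if target < bets[0] or target >= bets[-1]:
        return ENGINE_CALL  # proportional raise not available in this state
    i = bisect.bisect_left(bets, target)  # nearest legal bet via binary search
    return min(bets[max(i - 1, 0):i + 1], key=lambda a: abs(a - target))


def random_state(rng, stack=20000, big_blind=100):
    """Hypothetical two-player state (player 0 to act) with a continuous bet range."""
    mine = rng.randrange(big_blind, stack // 2)
    theirs = rng.randrange(mine, stack)
    min_raise = min(theirs + max(big_blind, theirs - mine), stack)
    legal = ([ENGINE_FOLD] if theirs > mine else []) + [ENGINE_CALL]
    return [mine, theirs], legal + list(range(min_raise, stack + 1))


def run_sweep(n=1000, seed=0):
    """Check legality, ordering and distinctness; return the ALL-IN collision count."""
    rng = random.Random(seed)
    collisions = 0
    for _ in range(n):
        contributions, legal = random_state(rng)
        legal_set = set(legal)
        all_in = legal[-1]
        mapped = [map_action(i, contributions, 0, legal) for i in range(6)]
        raises = [a for a in mapped[2:5] if a != ENGINE_CALL]  # raises actually taken
        collisions += sum(1 for a in raises if a == all_in)
        assert all(a in legal_set for a in mapped), "mapped to an illegal action"
        assert raises == sorted(raises), "THIRD <= HALF <= FULL violated"
        assert len(set(raises)) == len(raises), "two proportional raises collided"
    return collisions
```

The BUG #2 and BUG #3 states map to [1, 1, 16667, 1, 1, 20000] and [0, 1, 1, 200, 300, 20000] respectively; with the collision rule in place, `run_sweep` should return 0.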
