Detailed Bibliography
Title: 结合PPO和蒙特卡洛树搜索的斗地主博弈模型 (Chinese)
Alternate Title: The improved DouDiZhu game model combining PPO with Monte Carlo Tree Search (English)
Authors: 王世鹏, 王亚杰, 吴燕燕, 郭其龙, 赵甜宇
Source: Journal of Chongqing University of Technology (Natural Science), 2025, Vol. 39, Issue 8, pp. 126-133 (8 pages)
Subjects: Stochastic processes, strategy games, decision making, probability theory, algorithms
Abstract (English):
DouDiZhu is a typical imperfect-information game whose decision-making involves multiple players, a huge action space, and the coexistence of cooperation and competition, which makes Monte Carlo Tree Search (MCTS) used on its own inefficient. To improve both the strategy quality and the search efficiency of MCTS, a DouDiZhu game model combining the Proximal Policy Optimization (PPO) algorithm with MCTS is proposed. First, the PPO algorithm is used to learn game and strategy information and to train a policy model that outputs action probabilities for the current situation, providing strategic guidance for the selection and simulation stages of MCTS. Then, the selection formula is adjusted using the action probabilities output by the PPO policy model, guiding the search toward high-quality action nodes. Finally, PPO replaces the random simulation process, making simulations more consistent with the learned strategy and reducing the exploration of inefficient paths. Results show that MCTS optimized with PPO not only improves decision-making efficiency but also markedly increases the win rate, demonstrating its advantage in DouDiZhu decision-making. [ABSTRACT FROM AUTHOR]
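The abstract does not state the adjusted selection formula. As a hedged illustration only, the sketch below assumes an AlphaZero-style PUCT rule in which the PPO policy's probability P(s, a) scales the exploration bonus: score = Q(s, a) + c_puct · P(s, a) · sqrt(N(s)) / (1 + N(s, a)). The names `Node`, `puct_score`, and `select_action`, the constant `c_puct = 1.5`, and the action labels are hypothetical and are not taken from the paper.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, Hashable


@dataclass
class Node:
    prior: float            # P(s, a): probability an (assumed) PPO policy gives this move
    visits: int = 0         # N(s, a): how often the search has passed through this node
    value_sum: float = 0.0  # W(s, a): sum of values backed up through this node
    children: Dict[Hashable, "Node"] = field(default_factory=dict)

    def q(self) -> float:
        """Mean backed-up value Q(s, a); unvisited nodes count as 0."""
        return self.value_sum / self.visits if self.visits else 0.0


def puct_score(parent: Node, child: Node, c_puct: float = 1.5) -> float:
    """Exploitation term Q plus an exploration bonus scaled by the policy prior."""
    bonus = c_puct * child.prior * math.sqrt(parent.visits) / (1 + child.visits)
    return child.q() + bonus


def select_action(parent: Node, c_puct: float = 1.5) -> Hashable:
    """Return the child action with the highest PUCT score."""
    return max(parent.children, key=lambda a: puct_score(parent, parent.children[a], c_puct))


# Hypothetical root with three candidate plays and made-up statistics.
root = Node(prior=1.0, visits=3)
root.children = {
    "pass": Node(prior=0.1, visits=2, value_sum=1.0),
    "bomb": Node(prior=0.7, visits=1, value_sum=0.2),
    "single_3": Node(prior=0.2),
}
print(select_action(root))  # → bomb
```

Here "bomb" is chosen even though "pass" has the higher mean value so far, because the policy prior boosts its exploration bonus; this is the kind of guidance toward high-quality action nodes the abstract describes, though whether the paper uses this exact form cannot be confirmed from the abstract.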
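Abstract (Chinese, translated into English):
DouDiZhu is a typical imperfect-information game. Because its decision-making involves multiple players, a huge action space, and the coexistence of cooperation and competition, Monte Carlo Tree Search (MCTS) used on its own suffers from low efficiency. To improve the strategy quality and search efficiency of MCTS, a DouDiZhu game model that combines the proximal policy optimization (PPO) algorithm with MCTS is proposed. The PPO algorithm learns game-state and strategy information from DouDiZhu and trains a policy model that outputs action probabilities for the current position, providing strategic guidance for the selection and simulation stages of MCTS. In the selection stage, the action probabilities output by the PPO policy model are used to refine the selection formula, guiding the choice of high-quality action nodes. In the simulation stage, PPO replaces the random simulation process, making simulations more consistent with the learned strategy and reducing the exploration of inefficient paths. Experimental results show that MCTS optimized with PPO not only improves decision-making efficiency but also raises the model's win rate, demonstrating a strong decision-making advantage in DouDiZhu. [ABSTRACT FROM AUTHOR]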
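The simulation-stage change described in the abstract, replacing random playouts with policy-guided ones, can be sketched under stated assumptions. The `rollout` function below accesses the game only through hypothetical callables (`legal_actions`, `step`, `is_terminal`, `reward`) and an optional `policy` that stands in for a trained PPO actor; with no policy it falls back to the uniform random playout of plain MCTS. The toy game (shed one or two cards per turn, win only by emptying the hand in the minimum number of turns) is invented for illustration and is not DouDiZhu.

```python
import math
import random
from typing import Callable, Dict, Hashable, List, Optional

State = Hashable
Action = Hashable


def rollout(
    state: State,
    legal_actions: Callable[[State], List[Action]],
    step: Callable[[State, Action], State],
    is_terminal: Callable[[State], bool],
    reward: Callable[[State], float],
    policy: Optional[Callable[[State], Dict[Action, float]]] = None,
    rng: Optional[random.Random] = None,
    max_depth: int = 200,
) -> float:
    """Simulate one playout from `state` and return the reward of the final state.

    With policy=None every legal move is equally likely (plain MCTS playout);
    otherwise moves are sampled in proportion to the policy's probabilities.
    """
    rng = rng or random.Random()
    for _ in range(max_depth):
        if is_terminal(state):
            break
        actions = legal_actions(state)
        if policy is None:
            action = rng.choice(actions)
        else:
            probs = policy(state)
            weights = [probs.get(a, 0.0) for a in actions]
            if sum(weights) > 0:
                action = rng.choices(actions, weights=weights, k=1)[0]
            else:
                # Policy gave no mass to any legal move: fall back to uniform
                action = rng.choice(actions)
        state = step(state, action)
    return reward(state)


# Hypothetical toy game: state = (cards_left, turns_used, min_turns).
def legal(s):
    return [a for a in (1, 2) if a <= s[0]]


def play(s, a):
    return (s[0] - a, s[1] + 1, s[2])


def done(s):
    return s[0] == 0


def win(s):
    return 1.0 if s[0] == 0 and s[1] == s[2] else 0.0


def greedy_policy(s):
    # Stand-in for a PPO actor: always shed two cards when possible
    return {2: 1.0} if s[0] >= 2 else {1: 1.0}


start = (5, 0, math.ceil(5 / 2))
print(rollout(start, legal, play, done, win, policy=greedy_policy))  # → 1.0
```

From this start a uniform playout wins only about half the time, while the policy-guided one wins every time, which illustrates why a good policy can make each simulation more informative. The real model would of course learn its probabilities rather than hard-code them.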
Copyright of Journal of Chongqing University of Technology (Natural Science) is the property of Chongqing University of Technology and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.) |
Database: Complementary Index