Deep Reinforcement Learning from Self-Play in No-limit Texas Hold'em Poker

Bibliographic Details
Published in: Studia Universitatis Babes-Bolyai: Series Informatica, Vol. 66, no. 2, p. 51
Main Author: Pricope, T.-V.
Format: Journal Article
Language: English
Published: Babes-Bolyai University, Cluj-Napoca, 15.12.2021
ISSN: 1224-869X, 2065-9601
Description
Summary: Imperfect-information games describe many practical real-world applications, since the full information space is rarely available. This class of problems is challenging because of the random factor, which can cause even adaptive methods to model the problem incorrectly and miss the best solution. Neural Fictitious Self-Play (NFSP) is a powerful algorithm for learning an approximate Nash equilibrium of imperfect-information games from self-play. However, it uses only crude data as input, and its most successful experiment was on the limit version of Texas Hold’em Poker. In this paper, we develop a new variant of NFSP that combines the established fictitious self-play with neural gradient play in an attempt to improve performance on large-scale zero-sum imperfect-information games and to solve the more complex no-limit version of Texas Hold’em Poker, using powerful handcrafted metrics and heuristics alongside crude, raw data. When applied to no-limit Hold’em Poker, the agents trained through self-play outperformed those that used fictitious play with a normal-form, single-step approach to the game. Moreover, we showed that our algorithm converges close to a Nash equilibrium within a limited training process run on very limited hardware. Finally, our best self-play-based agent learnt a strategy that rivals expert human play.
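
As a rough sketch of the agent architecture the abstract refers to, the snippet below illustrates the standard two-network NFSP structure: a best-response network trained by reinforcement learning and an average-strategy network trained by supervised learning on the agent's own best-response actions, mixed through an anticipatory parameter. The class name, layer sizes, and the eta value are illustrative assumptions, not the paper's implementation.

import random
import torch
import torch.nn as nn

class NFSPAgent:
    """Minimal two-network NFSP-style agent (illustrative sketch only)."""
    def __init__(self, n_features, n_actions, eta=0.1):
        self.eta = eta  # anticipatory parameter: probability of playing the best response
        # Best-response network, normally trained with off-policy RL (e.g. a DQN-style update)
        self.q_net = nn.Sequential(nn.Linear(n_features, 64), nn.ReLU(),
                                   nn.Linear(64, n_actions))
        # Average-strategy network, trained by supervised learning on the agent's own
        # best-response actions; this is the approximate-Nash component
        self.avg_net = nn.Sequential(nn.Linear(n_features, 64), nn.ReLU(),
                                     nn.Linear(64, n_actions))
        self.rl_buffer = []  # transitions for the RL network
        self.sl_buffer = []  # (state, action) pairs for the supervised network

    def act(self, obs):
        x = torch.as_tensor(obs, dtype=torch.float32)
        if random.random() < self.eta:
            # Act greedily with the best-response network and record the choice
            # so the average strategy can later imitate it.
            action = int(self.q_net(x).argmax().item())
            self.sl_buffer.append((obs, action))
        else:
            # Otherwise sample from the average strategy.
            probs = torch.softmax(self.avg_net(x), dim=-1)
            action = int(torch.multinomial(probs, 1).item())
        return action

The handcrafted metrics and heuristics mentioned in the abstract would enter through the feature vector passed as obs; the raw-data baseline corresponds to feeding the crude game state directly.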
DOI: 10.24193/subbi.2021.2.04