Deep Reinforcement Learning from Self-Play in No-limit Texas Hold'em Poker



Detailed bibliography
Published in: Studia Universitatis Babes-Bolyai: Series Informatica, Volume 66, Issue 2, p. 51
Main author: Pricope, T.-V.
Format: Journal Article
Language: English
Published: Babes-Bolyai University, Cluj-Napoca, 15.12.2021
ISSN: 1224-869X, 2065-9601
Description
Summary: Imperfect-information games describe many practical applications found in the real world, as the information space is rarely fully available. This class of problems is challenging because the element of chance can cause even adaptive methods to model the problem incorrectly and miss the best solution. Neural Fictitious Self Play (NFSP) is a powerful algorithm for learning an approximate Nash equilibrium of imperfect-information games from self-play. However, it uses only raw data as input, and its most successful experiment was on the limit version of Texas Hold'em Poker. In this paper, we develop a new variant of NFSP that combines the established fictitious self-play with neural gradient play in an attempt to improve performance on large-scale zero-sum imperfect-information games and to solve the more complex no-limit version of Texas Hold'em Poker, using powerful handcrafted metrics and heuristics alongside raw data. When applied to no-limit Hold'em Poker, the agents trained through self-play outperformed the ones that used fictitious play with a normal-form single-step approach to the game. Moreover, we showed that our algorithm converges close to a Nash equilibrium within our agents' limited training process on very limited hardware. Finally, our best self-play-based agent learned a strategy that rivals expert-level human play.
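
For context, the paper's approach builds on the standard NFSP scheme of Heinrich and Silver, in which each agent mixes a reinforcement-learning best-response policy with a supervised average-policy network via an anticipatory parameter. The sketch below illustrates only that generic structure, with random stand-ins for the networks so it runs as-is; the names (NFSPAgent, ANTICIPATORY_PARAM, best_response_action, average_policy_action) are illustrative assumptions and do not reflect the paper's specific variant or its handcrafted poker features.

import random

# Minimal sketch of an NFSP-style agent (not the paper's implementation).
# A real agent would replace the random stand-ins with a reinforcement-learning
# network (best response) and a supervised network (average policy).

ANTICIPATORY_PARAM = 0.1  # probability of acting with the best-response policy

class NFSPAgent:
    def __init__(self, num_actions):
        self.num_actions = num_actions
        self.rl_buffer = []  # transitions for the best-response (RL) network
        self.sl_buffer = []  # (state, action) pairs for the average-policy network

    def best_response_action(self, state):
        # Placeholder for an action from the RL (e.g. Q-learning) network.
        return random.randrange(self.num_actions)

    def average_policy_action(self, state):
        # Placeholder for sampling from the supervised average-policy network.
        return random.randrange(self.num_actions)

    def act(self, state):
        # Mix the two policies with the anticipatory parameter; only actions
        # chosen by the best-response policy feed the supervised buffer.
        if random.random() < ANTICIPATORY_PARAM:
            action = self.best_response_action(state)
            self.sl_buffer.append((state, action))
        else:
            action = self.average_policy_action(state)
        return action

    def observe(self, transition):
        # Store experience; a full implementation would periodically sample
        # both buffers and update the two networks here.
        self.rl_buffer.append(transition)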
DOI: 10.24193/subbi.2021.2.04