Deep Reinforcement Learning from Self-Play in No-limit Texas Hold'em Poker

Published in: Studia Universitatis Babes-Bolyai: Series Informatica, Vol. 66, No. 2, p. 51
Main author: Pricope, T.-V.
Format: Journal Article
Language: English
Published: Babes-Bolyai University, Cluj-Napoca, 15 December 2021
Subjects: Artificial Intelligence, Computer Poker, Adaptive Learning, Fictitious Play, Self-Play, Deep Reinforcement Learning, Neural Networks
ISSN: 1224-869X; EISSN: 2065-9601
Online access: https://doaj.org/article/52282884eba44791a74b43c0919f7efb
Abstract: Imperfect information games describe many practical applications found in the real world, as the information space is rarely fully available. This set of problems is challenging because the random factor makes even adaptive methods fail to model the problem correctly and find the best solution. Neural Fictitious Self-Play (NFSP) is a powerful algorithm for learning an approximate Nash equilibrium of imperfect-information games from self-play. However, it uses only crude data as input, and its most successful experiment was on the limit version of Texas Hold'em Poker. In this paper, we develop a new variant of NFSP that combines the established fictitious self-play with neural gradient play in an attempt to improve performance on large-scale zero-sum imperfect-information games and to solve the more complex no-limit version of Texas Hold'em Poker, using powerful handcrafted metrics and heuristics alongside crude, raw data. When applied to no-limit Hold'em Poker, the agents trained through self-play outperformed those that used fictitious play with a normal-form, single-step approach to the game. Moreover, we showed that our algorithm converges close to a Nash equilibrium within the limited training process of our agents on very limited hardware. Finally, our best self-play-based agent learnt a strategy that rivals expert human level.

Received by the editors: 1 June 2021. 2010 Mathematics Subject Classification: 68T05. 1998 CR Categories and Descriptors: I.2.1 [Artificial Intelligence]: Learning – Applications and Expert Systems – Games.
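The core idea the abstract relies on — fictitious self-play, where each player repeatedly best-responds to the opponent's empirical average strategy, with the averages converging to a Nash equilibrium in zero-sum games — can be sketched in a minimal tabular form. This toy matrix-game example is illustrative only: the game, iteration count, and variable names are assumptions for the sketch, not the paper's neural-network implementation.

```python
import numpy as np

# Fictitious play on a two-player zero-sum matrix game (rock-paper-scissors).
# Each player best-responds to the opponent's empirical average strategy;
# in zero-sum games the average strategies converge to a Nash equilibrium
# (Robinson, 1951). NFSP replaces these tabular averages with neural
# networks; this sketch is not the paper's implementation.

# Row player's payoff matrix (zero-sum; the column player gets the negative).
A = np.array([[ 0, -1,  1],
              [ 1,  0, -1],
              [-1,  1,  0]])

counts1 = np.ones(3)  # empirical action counts, uniform prior
counts2 = np.ones(3)

for _ in range(20000):
    avg1 = counts1 / counts1.sum()      # player 1's average strategy
    avg2 = counts2 / counts2.sum()      # player 2's average strategy
    counts1[np.argmax(A @ avg2)] += 1   # player 1 best-responds (maximizes)
    counts2[np.argmin(avg1 @ A)] += 1   # player 2 best-responds (minimizes)

print(counts1 / counts1.sum())  # each entry is close to 1/3, the Nash strategy
```

The neural variant keeps the same two ingredients, a best-response policy and an average policy, but learns the best response with reinforcement learning and the average with supervised learning on the agent's own past actions.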
DOI: 10.24193/subbi.2021.2.04
License: http://creativecommons.org/licenses/by-nc-nd/4.0