Deep Reinforcement Learning from Self-Play in No-limit Texas Hold'em Poker
Saved in:
| Published in: | Studia Universitatis Babes-Bolyai: Series Informatica Vol. 66; No. 2; p. 51 |
|---|---|
| Main Author: | Pricope, T.-V. |
| Format: | Journal Article |
| Language: | English |
| Published: | Babes-Bolyai University, Cluj-Napoca, 15 December 2021 |
| ISSN: | 1224-869X, 2065-9601 |
| Online Access: | Full text |
| Abstract | Imperfect-information games describe many practical applications found in the real world, as the information space is rarely fully available. This particular set of problems is challenging due to the random factor that makes even adaptive methods fail to correctly model the problem and find the best solution. Neural Fictitious Self Play (NFSP) is a powerful algorithm for learning an approximate Nash equilibrium of imperfect-information games from self-play. However, it uses only crude data as input, and its most successful experiment was on the limit version of Texas Hold’em Poker. In this paper, we develop a new variant of NFSP that combines the established fictitious self-play with neural gradient play in an attempt to improve the performance on large-scale zero-sum imperfect-information games and to solve the more complex no-limit version of Texas Hold’em Poker using powerful handcrafted metrics and heuristics alongside crude, raw data. When applied to no-limit Hold’em Poker, the agents trained through self-play outperformed the ones that used fictitious play with a normal-form single-step approach to the game. Moreover, we showed that our algorithm converges close to a Nash equilibrium within the limited training process of our agents with very limited hardware. Finally, our best self-play-based agent learnt a strategy that rivals expert human level. |
|---|---|
| Notes | Received by the editors: 1 June 2021. 2010 Mathematics Subject Classification: 68T05. 1998 CR Categories and Descriptors: I.2.1 [Artificial Intelligence]: Learning – Applications and Expert Systems – Games. |
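The abstract describes an NFSP-style agent that mixes a best-response policy with an average policy learned from its own play. As a rough illustration of that structure, the sketch below uses a tabular stand-in for the two neural networks (simple Q-learning for the best response, action counts for the average policy); the class name, parameters, and all implementation details are illustrative assumptions, not the paper's actual method.

```python
import random
from collections import defaultdict

class NFSPAgent:
    """Minimal tabular sketch of the Neural Fictitious Self-Play structure.

    Two components per agent:
      - a best-response policy, approximated here with plain Q-learning
        (NFSP uses a DQN-style network);
      - an average policy, approximated here by counting the agent's own
        best-response actions (NFSP uses a second, supervised network).
    With probability eta (the "anticipatory" parameter) the agent acts
    from the best response, otherwise from the average policy.
    """

    def __init__(self, n_actions, eta=0.1, alpha=0.1, epsilon=0.1):
        self.n_actions = n_actions
        self.eta = eta          # anticipatory mixing parameter
        self.alpha = alpha      # Q-learning step size
        self.epsilon = epsilon  # exploration rate for the best response
        self.q = defaultdict(lambda: [0.0] * n_actions)     # best response
        self.counts = defaultdict(lambda: [0] * n_actions)  # average policy

    def act(self, state):
        if random.random() < self.eta:
            action = self._best_response(state)
            # Best-response actions are the supervised targets
            # for the average policy.
            self.counts[state][action] += 1
        else:
            action = self._average_policy(state)
        return action

    def _best_response(self, state):
        if random.random() < self.epsilon:
            return random.randrange(self.n_actions)
        qs = self.q[state]
        return qs.index(max(qs))

    def _average_policy(self, state):
        c = self.counts[state]
        total = sum(c)
        if total == 0:
            return random.randrange(self.n_actions)
        r = random.uniform(0, total)
        acc = 0.0
        for a, n in enumerate(c):
            acc += n
            if r <= acc:
                return a
        return self.n_actions - 1

    def learn(self, state, action, reward, next_state, done):
        # One-step Q-learning update toward reward plus bootstrapped value.
        target = reward
        if not done:
            target += max(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])
```

In self-play, two such agents would repeatedly call `act` against each other and feed the observed transitions to `learn`; the average policy is the component that, in the full algorithm, converges toward an approximate Nash equilibrium.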
| Author | Pricope, T.-V. |
| DOI | 10.24193/subbi.2021.2.04 |
| Discipline | Computer Science |
| EISSN | 2065-9601 |
| ISSN | 1224-869X |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 2 |
| Language | English |
| License | http://creativecommons.org/licenses/by-nc-nd/4.0 |
| OpenAccessLink | https://doaj.org/article/52282884eba44791a74b43c0919f7efb |
| PublicationDate | 2021-12-15 |
| PublicationTitle | Studia Universitatis Babes-Bolyai: Series Informatica |
| PublicationYear | 2021 |
| Publisher | Babes-Bolyai University, Cluj-Napoca |
| StartPage | 51 |
| SubjectTerms | Artificial Intelligence, Computer Poker, Adaptive Learning, Fictitious Play, Self-Play, Deep Reinforcement Learning, Neural Networks |
| Title | Deep Reinforcement Learning from Self-Play in No-limit Texas Hold'em Poker |
| URI | https://doaj.org/article/52282884eba44791a74b43c0919f7efb |
| Volume | 66 |