On the Convergence Rate of MCTS for the Optimal Value Estimation in Markov Decision Processes
| Published in: | IEEE Transactions on Automatic Control, vol. 70, no. 7, pp. 4788-4793 |
|---|---|
| Main Author: | Chang, Hyeong Soo |
| Format: | Journal Article |
| Language: | English |
| Published: | New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 1 July 2025 |
| ISSN: | 0018-9286, 1558-2523 |
| Abstract | A recent theoretical analysis of a Monte-Carlo tree search (MCTS) method, properly modified from the "upper confidence bound applied to trees" (UCT) algorithm, established a result that is surprising given the many empirical successes reported in the literature from heuristic use of UCT, with relevant adjustments, across various problem domains: its rate of convergence of the expected absolute error to zero is $O(1/\sqrt{n})$ in estimating the optimal value at an initial state in a finite-horizon Markov decision process (MDP), where $n$ is the number of simulations. We strengthen this dispiritingly slow convergence result by arguing, within a simpler algorithmic framework from the perspective of MDPs and apart from the usual MCTS description, that the simpler strategy called "upper confidence bound 1" (UCB1) for multiarmed bandit problems, when employed as an instance of MCTS by setting UCB1's arm set to be the policy set of the underlying MDP, has an asymptotically faster convergence rate of $O(\ln n / n)$. We also point out that UCT-based MCTS in general has time and space complexities that depend on the size of the state space in the worst case, which contradicts the original design spirit of MCTS. Unless used heuristically, UCT-based MCTS still lacks theoretical support for its applicability. |
|---|---|
| Author | Chang, Hyeong Soo |
| Author_xml | Chang, Hyeong Soo; ORCID: 0000-0003-3298-0018; email: hschang@sogang.ac.kr; organization: Department of Computer Science and Engineering, Sogang University, Seoul, South Korea |
| CODEN | IETAA9 |
| ContentType | Journal Article |
| Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2025 |
| DOI | 10.1109/TAC.2025.3538807 |
| Discipline | Engineering |
| EISSN | 1558-2523 |
| EndPage | 4793 |
| Genre | orig-research |
| ISICitedReferencesCount | 0 |
| ISSN | 0018-9286 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 7 |
| Language | English |
| License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037 |
| ORCID | 0000-0003-3298-0018 |
| PageCount | 6 |
| PublicationCentury | 2000 |
| PublicationDate | 2025-07-01 |
| PublicationDateYYYYMMDD | 2025-07-01 |
| PublicationDecade | 2020 |
| PublicationPlace | New York |
| PublicationTitle | IEEE transactions on automatic control |
| PublicationTitleAbbrev | TAC |
| PublicationYear | 2025 |
| Publisher | IEEE The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| StartPage | 4788 |
| SubjectTerms | Approximation algorithms; Artificial intelligence; Complexity theory; Computational modeling; Convergence; Data mining; Estimation; Markov decision process (MDP); Markov processes; Monte-Carlo tree search (MCTS); multiarmed bandit (MAB); Search problems; Training; Uncertainty; Upper bound; upper confidence bound 1 (UCB1); upper confidence bound applied to trees (UCT) |
| Title | On the Convergence Rate of MCTS for the Optimal Value Estimation in Markov Decision Processes |
| URI | https://ieeexplore.ieee.org/document/10870057 https://www.proquest.com/docview/3226500768 |
| Volume | 70 |
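
The abstract describes running UCB1 with its arm set equal to the policy set of a finite-horizon MDP. The following is a minimal sketch of that idea, not the paper's implementation: the toy MDP (two states, two actions, horizon 2), the helper names `step`, `all_policies`, `simulate`, and `ucb1_value_estimate`, and the choice to estimate the optimal value by the average of all sampled returns are assumptions made here for illustration. Returns are scaled to [0, 1], as the standard UCB1 index assumes.

```python
import itertools
import math
import random

# Hypothetical toy MDP (not from the paper): 2 states, 2 actions, horizon 2.
STATES, ACTIONS, HORIZON = (0, 1), (0, 1), 2
START_STATE = 0


def step(state, action, rng):
    """Sample (reward, next_state); rewards are 0 or 1."""
    p_reward = 0.8 if action == state else 0.3
    reward = 1.0 if rng.random() < p_reward else 0.0
    # The chosen action is the next state with probability 0.9.
    next_state = action if rng.random() < 0.9 else 1 - action
    return reward, next_state


def all_policies():
    """Enumerate every deterministic nonstationary policy.

    A policy is a tuple; entry t * len(STATES) + s is the action at time t in state s.
    """
    return list(itertools.product(ACTIONS, repeat=HORIZON * len(STATES)))


def simulate(policy, rng):
    """Run one episode from START_STATE; return the total reward scaled to [0, 1]."""
    state, total = START_STATE, 0.0
    for t in range(HORIZON):
        action = policy[t * len(STATES) + state]
        reward, state = step(state, action, rng)
        total += reward
    return total / HORIZON


def ucb1_value_estimate(n, seed=0):
    """Run UCB1 for n simulations with one arm per policy.

    Returns (average of all sampled returns, per-policy pull counts).
    """
    rng = random.Random(seed)
    arms = all_policies()
    counts = [0] * len(arms)
    sums = [0.0] * len(arms)
    for i in range(n):
        if i < len(arms):
            k = i  # initialization: play every arm once
        else:
            # Standard UCB1 index: empirical mean plus exploration bonus.
            k = max(
                range(len(arms)),
                key=lambda j: sums[j] / counts[j]
                + math.sqrt(2.0 * math.log(i) / counts[j]),
            )
        ret = simulate(arms[k], rng)
        counts[k] += 1
        sums[k] += ret
    return sum(sums) / n, counts


if __name__ == "__main__":
    estimate, counts = ucb1_value_estimate(5000, seed=1)
    print("estimated (scaled) optimal value:", estimate)
    print("most-played policy:", all_policies()[counts.index(max(counts))])
```

Note that the number of arms here is $|A|^{|S| \cdot H}$, so explicit enumeration is only practical for toy problems like this one. The sketch is meant to make the "policies as arms" construction concrete, not to suggest a scalable solver.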