Monte Carlo Tree Search to Compare Reward Functions for Reinforcement Learning
| Published in: | 2022 IEEE 16th International Symposium on Applied Computational Intelligence and Informatics (SACI) pp. 000123 - 000128 |
|---|---|
| Main Authors: | Kovari, Balint; Pelenczei, Balint; Becsi, Tamas |
| Format: | Conference Proceeding |
| Language: | English |
| Published: | IEEE, 25.05.2022 |
| Abstract | Reinforcement Learning has gained tremendous attention recently, thanks to its excellent solutions in several challenging domains. However, formulating the reward signal is always difficult and crucially important, since it is the only guidance the agent has for solving the given control task. Finding the proper reward is time-consuming because the model must be trained with every potential candidate, after which the results still have to be compared. This paper proposes using the Monte Carlo Tree Search algorithm to compare and rank different reward strategies. To verify that the search algorithm is suitable for this task, a Policy Gradient algorithm is trained to solve the Traffic Signal Control problem with different rewarding strategies from the literature. The results show that both methods produce the same ranking of the rewarding concepts by performance. Hence, the Monte Carlo Tree Search algorithm can find the best reward for training, which substantially reduces the resource intensity of the entire process. |
|---|---|
| Author | Becsi, Tamas; Kovari, Balint; Pelenczei, Balint |
| Author_xml | 1. Kovari, Balint (kovari.balint@kjk.bme.hu), Budapest Univ. of Technology and Economics, Department of Control for Transportation and Vehicle Systems, Budapest, Hungary; 2. Pelenczei, Balint (bpelenczei@edu.bme.hu), Budapest Univ. of Technology and Economics, Department of Control for Transportation and Vehicle Systems, Budapest, Hungary; 3. Becsi, Tamas (becsi.tamas@kjk.bme.hu), Budapest Univ. of Technology and Economics, Department of Control for Transportation and Vehicle Systems, Budapest, Hungary |
| ContentType | Conference Proceeding |
| DBID | 6IE 6IL CBEJK RIE RIL |
| DOI | 10.1109/SACI55618.2022.9919518 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume; IEEE Xplore All Conference Proceedings; IEEE Electronic Library (IEL); IEEE Proceedings Order Plans (POP All) 1998-Present |
| Database_xml | RIE: IEEE Electronic Library (IEL), https://ieeexplore.ieee.org/ (source type: Publisher) |
| DeliveryMethod | fulltext_linktorsrc |
| EISBN | 9781665481250 (ISBN-13); 1665481250 (ISBN-10) |
| EndPage | 000128 |
| ExternalDocumentID | 9919518 |
| Genre | orig-research |
| IEDL.DBID | RIE |
| IngestDate | Thu Jan 18 11:14:33 EST 2024 |
| IsPeerReviewed | false |
| IsScholarly | false |
| Language | English |
| LinkModel | DirectLink |
| PageCount | 6 |
| ParticipantIDs | ieee_primary_9919518 |
| PublicationCentury | 2000 |
| PublicationDate | 2022-May-25 |
| PublicationDateYYYYMMDD | 2022-05-25 |
| PublicationDate_xml | – month: 05 year: 2022 text: 2022-May-25 day: 25 |
| PublicationDecade | 2020 |
| PublicationTitle | 2022 IEEE 16th International Symposium on Applied Computational Intelligence and Informatics (SACI) |
| PublicationTitleAbbrev | SACI |
| PublicationYear | 2022 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 000123 |
| SubjectTerms | Intelligent Transportation Systems; Monte Carlo methods; Phase measurement; Reinforcement learning; Search problems; Task analysis; Traffic Control; Training; Transportation |
| Title | Monte Carlo Tree Search to Compare Reward Functions for Reinforcement Learning |
| URI | https://ieeexplore.ieee.org/document/9919518 |
| hasFullText | 1 |
| inHoldings | 1 |
| linkProvider | IEEE |
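
The abstract does not say how the search ranks the rewards, so the following Python sketch is only an illustration of one way the idea could look, under stated assumptions. A UCT-style Monte Carlo Tree Search planner is run once per candidate reward on a hypothetical toy two-approach intersection, and the resulting behaviour is scored on a single shared metric (summed queue length). The environment, its constants (`ARRIVALS`, `DISCHARGE`), the reward names and the helpers `step`, `mcts_action`, `evaluate` and `rank_rewards` are all assumptions made for this example; they are not the authors' implementation, simulator or reward set.

```python
import math
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical toy intersection (illustrative only, NOT the paper's simulator).
State = Tuple[int, int]      # (queue on NS approach, queue on EW approach)
ACTIONS = (0, 1)             # 0 = NS green, 1 = EW green
ARRIVALS = (2, 1)            # assumed vehicles arriving per step on each approach
DISCHARGE = 3                # assumed vehicles served per step on the green approach
RewardFn = Callable[[State, int, State, int], float]


def step(state: State, action: int) -> Tuple[State, int]:
    """Serve the green approach, then add arrivals; return (next_state, vehicles_served)."""
    queues = list(state)
    served = min(queues[action], DISCHARGE)
    queues[action] -= served
    return (queues[0] + ARRIVALS[0], queues[1] + ARRIVALS[1]), served


# Hypothetical candidate rewards r(s, a, s', served) standing in for rewards from the literature.
REWARDS: Dict[str, RewardFn] = {
    "neg_total_queue": lambda s, a, s2, served: -float(s2[0] + s2[1]),
    "neg_max_queue": lambda s, a, s2, served: -float(max(s2)),
    "throughput": lambda s, a, s2, served: float(served),
    "constant": lambda s, a, s2, served: 0.0,
}


class Node:
    """Search-tree node; `reward` is the reward earned on the edge into this node."""

    def __init__(self, state: State, reward: float = 0.0) -> None:
        self.state = state
        self.reward = reward
        self.children: Dict[int, "Node"] = {}
        self.visits = 0
        self.total = 0.0  # sum of returns backed up through this node


def uct_score(parent: Node, child: Node, c: float) -> float:
    """UCB1 applied to the tree: mean return plus an exploration bonus."""
    return child.total / child.visits + c * math.sqrt(math.log(parent.visits) / child.visits)


def mcts_action(root_state: State, reward_fn: RewardFn, horizon: int,
                iterations: int, rng: random.Random, c: float = 1.4) -> int:
    """Plan `horizon` steps ahead with UCT under `reward_fn`; return the most-visited root action."""
    root = Node(root_state)
    for _ in range(iterations):
        node, path, depth = root, [root], 0
        # Selection: descend through fully expanded nodes.
        while depth < horizon and len(node.children) == len(ACTIONS):
            parent = node
            node = max(parent.children.values(), key=lambda ch: uct_score(parent, ch, c))
            path.append(node)
            depth += 1
        # Expansion: add one untried action.
        if depth < horizon:
            action = rng.choice([a for a in ACTIONS if a not in node.children])
            next_state, served = step(node.state, action)
            child = Node(next_state, reward_fn(node.state, action, next_state, served))
            node.children[action] = child
            node = child
            path.append(node)
            depth += 1
        # Simulation: uniformly random rollout until the horizon.
        ret, state = 0.0, node.state
        for _ in range(horizon - depth):
            action = rng.choice(ACTIONS)
            next_state, served = step(state, action)
            ret += reward_fn(state, action, next_state, served)
            state = next_state
        # Backpropagation: add edge rewards while walking back to the root.
        for n in reversed(path):
            ret += n.reward
            n.visits += 1
            n.total += ret
    return max(root.children, key=lambda a: root.children[a].visits)


def evaluate(reward_fn: RewardFn, episode_len: int = 20, horizon: int = 6,
             iterations: int = 200, seed: int = 0) -> int:
    """Act with the MCTS planner; return the shared metric (summed queue length, lower is better)."""
    rng = random.Random(seed)
    state: State = (0, 0)
    total_queue = 0
    for _ in range(episode_len):
        state, _ = step(state, mcts_action(state, reward_fn, horizon, iterations, rng))
        total_queue += state[0] + state[1]
    return total_queue


def rank_rewards(rewards: Dict[str, RewardFn]) -> List[Tuple[str, int]]:
    """Score every candidate reward on the same metric and sort best-first."""
    scores = {name: evaluate(fn) for name, fn in rewards.items()}
    return sorted(scores.items(), key=lambda kv: kv[1])
```

Calling `rank_rewards(REWARDS)` returns `(name, summed_queue)` pairs sorted best-first. The evaluation metric is deliberately kept separate from the candidate rewards, so rewards with different scales are judged on the same footing. In the setting the abstract describes, such a search-based ranking would then be checked against the ranking obtained by fully training an agent (a Policy Gradient method in the paper) with each reward; that training step is not part of this sketch.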