Monte Carlo Tree Search to Compare Reward Functions for Reinforcement Learning

Bibliographic Details
Published in: 2022 IEEE 16th International Symposium on Applied Computational Intelligence and Informatics (SACI), pp. 000123-000128
Main Authors: Kovari, Balint; Pelenczei, Balint; Becsi, Tamas
Format: Conference Proceeding
Language: English
Published: IEEE 25.05.2022
Abstract: Reinforcement Learning has recently gained tremendous attention, thanks to its excellent solutions in several challenging domains. However, formulating the reward signal is difficult and crucially important, since it is the only guidance the agent has for solving the given control task. Finding the proper reward is time-consuming, because the model must be trained with every potential candidate and the results must then be compared. This paper proposes using the Monte-Carlo Tree Search algorithm to compare and rank the different reward strategies. To verify that the search algorithm can be used for such a task, a Policy Gradient algorithm is trained to solve the Traffic Signal Control problem with different rewarding strategies from the literature. The results show that both methods produce the same ranking of the rewarding concepts. Hence, the Monte-Carlo Tree Search algorithm can identify the best reward for training, which substantially reduces the resource demands of the entire process.
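The abstract describes the method only at a high level. The Python sketch below illustrates one plausible reading of it and is not the authors' implementation. Everything in it is an assumption: a hypothetical toy two-approach intersection (`step`, `ARRIVALS`, `SERVICE`, `HORIZON`) stands in for the paper's traffic simulation, the three candidate rewards (`neg_total_queue`, `neg_max_queue`, `throughput`) are illustrative rather than the rewards studied in the paper, and the common evaluation metric (total queued vehicles over an episode) is chosen for illustration. Each candidate reward drives a plain UCT Monte-Carlo Tree Search planner, and the rewards are ranked by how well the resulting control performs on the common metric, with no policy training involved.

```python
import math
import random

# Hypothetical toy intersection (assumption, not the paper's simulator):
# state = (time step, (queue_NS, queue_EW)); action 0 = green for N-S, 1 = green for E-W.
ARRIVALS = [(2, 1), (1, 2), (3, 0), (0, 3), (2, 2), (1, 1)]  # cyclic arrivals per step
SERVICE = 3   # vehicles served per step on the green approach
HORIZON = 12  # episode length in steps


def step(state, action):
    """Serve the green approach, then add arrivals; return (next_state, vehicles_served)."""
    t, queues = state
    q = list(queues)
    served = min(q[action], SERVICE)
    q[action] -= served
    arr = ARRIVALS[t % len(ARRIVALS)]
    q[0] += arr[0]
    q[1] += arr[1]
    return (t + 1, tuple(q)), served


# Hypothetical candidate reward functions: (next_state, served) -> reward.
REWARDS = {
    "neg_total_queue": lambda s, served: -sum(s[1]),
    "neg_max_queue": lambda s, served: -max(s[1]),
    "throughput": lambda s, served: float(served),
}


class Node:
    def __init__(self, state):
        self.state = state
        self.children = {}  # action -> Node
        self.visits = 0
        self.value = 0.0    # sum of returns backed up through this node


def ucb(parent, child, c):
    """UCB1 score; c is untuned and returns are not normalized."""
    return child.value / child.visits + c * math.sqrt(math.log(parent.visits) / child.visits)


def rollout(state, reward_fn, rng):
    """Random-policy simulation to the horizon; returns the accumulated reward."""
    total = 0.0
    while state[0] < HORIZON:
        state, served = step(state, rng.randrange(2))
        total += reward_fn(state, served)
    return total


def mcts_action(root_state, reward_fn, rng, iterations=200, c=1.4):
    """Plain UCT search maximizing reward_fn; returns the most-visited root action."""
    root = Node(root_state)
    for _ in range(iterations):
        node, path, ret = root, [root], 0.0
        while node.state[0] < HORIZON:
            untried = [a for a in (0, 1) if a not in node.children]
            if untried:  # expansion: try an action not yet in the tree
                a = rng.choice(untried)
            else:        # selection: follow the child with the best UCB1 score
                a = max(node.children, key=lambda act: ucb(node, node.children[act], c))
            nxt, served = step(node.state, a)
            ret += reward_fn(nxt, served)
            expanded = a not in node.children
            if expanded:
                node.children[a] = Node(nxt)
            node = node.children[a]
            path.append(node)
            if expanded:
                break
        ret += rollout(node.state, reward_fn, rng)  # simulation from the new leaf
        for n in path:                              # backpropagation
            n.visits += 1
            n.value += ret
    return max(root.children, key=lambda a: root.children[a].visits)


def run_episode(reward_fn, seed=0):
    """Control the toy intersection with MCTS driven by reward_fn; return total queued vehicles."""
    rng = random.Random(seed)
    state = (0, (0, 0))
    total_queue = 0
    while state[0] < HORIZON:
        state, _ = step(state, mcts_action(state, reward_fn, rng))
        total_queue += sum(state[1])
    return total_queue


def rank_rewards(seed=0):
    """Rank candidate rewards by the common metric (lower total queue is better)."""
    scores = {name: run_episode(fn, seed) for name, fn in REWARDS.items()}
    return sorted(scores.items(), key=lambda item: item[1])


if __name__ == "__main__":
    for name, total_queue in rank_rewards():
        print(f"{name}: total queued vehicles = {total_queue}")
```

`rank_rewards()` returns the candidate rewards ordered from best to worst on the shared metric. In the paper's setting, the corresponding check is whether this order matches the one obtained by training a Policy Gradient agent with each reward, which is what the abstract reports.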
Author affiliations:
1. Kovari, Balint (kovari.balint@kjk.bme.hu), Budapest Univ. of Technology and Economics, Department of Control for Transportation and Vehicle Systems, Budapest, Hungary
2. Pelenczei, Balint (bpelenczei@edu.bme.hu), Budapest Univ. of Technology and Economics, Department of Control for Transportation and Vehicle Systems, Budapest, Hungary
3. Becsi, Tamas (becsi.tamas@kjk.bme.hu), Budapest Univ. of Technology and Economics, Department of Control for Transportation and Vehicle Systems, Budapest, Hungary
DOI: 10.1109/SACI55618.2022.9919518
EISBN: 9781665481250 (ISBN-10: 1665481250)
Page count: 6
Subjects: Intelligent Transportation Systems; Monte Carlo methods; Phase measurement; Reinforcement learning; Search problems; Task analysis; Traffic Control; Training; Transportation
Online access: https://ieeexplore.ieee.org/document/9919518