Monte Carlo Tree Search to Compare Reward Functions for Reinforcement Learning


Detailed bibliography
Published in: 2022 IEEE 16th International Symposium on Applied Computational Intelligence and Informatics (SACI), pp. 000123-000128
Main authors: Kovari, Balint; Pelenczei, Balint; Becsi, Tamas
Format: Conference paper
Language: English
Published: IEEE, 25 May 2022
Description
Summary: Reinforcement Learning has gained tremendous attention recently, thanks to its excellent solutions in several challenging domains. However, the formulation of the reward signal is always difficult and crucially important, since it is the only guidance the agent has for solving the given control task. Finding the proper reward is time-consuming, since the model must be trained with every potential candidate before a comparison can be conducted. This paper proposes that the Monte Carlo Tree Search algorithm can be used to compare and rank different reward strategies. To show that the search algorithm can be used for such a task, a Policy Gradient algorithm is trained to solve the Traffic Signal Control problem with different rewarding strategies from the literature. The results show that both methods suggest the same ordering of the rewarding concepts by performance. Hence, the Monte Carlo Tree Search algorithm can find the best reward for training, which substantially decreases the resource intensity of the entire process.
DOI:10.1109/SACI55618.2022.9919518
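The core idea summarized above, scoring candidate reward functions with Monte Carlo Tree Search instead of running a full training for each, can be illustrated with a minimal sketch. Everything here is an assumption for illustration only: the toy chain environment (a stand-in for the traffic signal control task), the two candidate reward functions, and the ranking-by-root-value criterion are not taken from the paper.

```python
import math
import random

# Toy deterministic chain environment (illustrative stand-in for the
# traffic signal control task; the paper's environment is not reproduced).
# The agent starts at position 0 and tries to reach GOAL within HORIZON steps.
GOAL, HORIZON = 5, 8
ACTIONS = (-1, +1)

def step(pos, action):
    """Move along the chain, clamped to [0, GOAL]."""
    return max(0, min(GOAL, pos + action))

# Two hypothetical candidate reward strategies (illustrative assumptions).
REWARDS = {
    "sparse": lambda pos: 1.0 if pos == GOAL else 0.0,
    "distance": lambda pos: -abs(GOAL - pos) / GOAL,
}

class Node:
    def __init__(self, pos, depth):
        self.pos, self.depth = pos, depth
        self.children = {}  # action -> Node
        self.visits = 0
        self.value = 0.0

def rollout(pos, depth, reward):
    """Random playout from (pos, depth) to the horizon."""
    total = 0.0
    while depth < HORIZON:
        pos = step(pos, random.choice(ACTIONS))
        total += reward(pos)
        depth += 1
    return total

def mcts_root_value(reward, iterations=2000, c=1.4):
    """Average return observed at the root after UCT search under `reward`."""
    root = Node(0, 0)
    for _ in range(iterations):
        node, path, ret = root, [root], 0.0
        # Selection: descend via UCT while the node is fully expanded.
        while node.depth < HORIZON and len(node.children) == len(ACTIONS):
            def uct(a):
                child = node.children[a]
                return (child.value / child.visits
                        + c * math.sqrt(math.log(node.visits) / child.visits))
            node = node.children[max(ACTIONS, key=uct)]
            ret += reward(node.pos)
            path.append(node)
        # Expansion and simulation from one untried action.
        if node.depth < HORIZON:
            a = random.choice([a for a in ACTIONS if a not in node.children])
            child = Node(step(node.pos, a), node.depth + 1)
            node.children[a] = child
            ret += reward(child.pos) + rollout(child.pos, child.depth, reward)
            path.append(child)
        # Backpropagation along the visited path.
        for n in path:
            n.visits += 1
            n.value += ret
    return root.value / root.visits

random.seed(0)
scores = {name: mcts_root_value(r) for name, r in REWARDS.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # → ['sparse', 'distance']
```

Note that ranking by raw root value is only meaningful when the candidate rewards share a comparable scale; the paper's actual comparison metric, and its pairing of MCTS rankings with Policy Gradient training outcomes, are not reproduced in this sketch.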