Reward criteria impact on the performance of reinforcement learning agent for autonomous navigation
| Published in: | Applied Soft Computing, Vol. 126, p. 109241 |
|---|---|
| Main authors: | , , |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier B.V., 1 September 2022 |
| Subjects: | |
| ISSN: | 1568-4946, 1872-9681 |
| Online access: | Get full text |
| Summary: | In reinforcement learning, an agent takes an action at every time step (follows a policy) in an environment to maximize the expected cumulative reward. Therefore, the shaping of a reward function plays a crucial role in an agent’s learning. Designing an optimal reward function is not a trivial task. In this article, we propose a reward criterion from which we develop different reward functions. The criterion is based on the percentage of positive and negative rewards received by an agent. It gives rise to three classes: ‘Balanced Class,’ ‘Skewed Positive Class,’ and ‘Skewed Negative Class.’ We train a Deep Q-Network agent on a point-goal based navigation task using the different reward classes. We also compare the performance of the proposed classes with a benchmark class. In the experiments, the skewed negative class outperforms the benchmark class by achieving much lower variance. On the other hand, the benchmark class converges relatively faster than the skewed negative class.
• A reward criterion to assess the performance of an RL agent. • Various reward functions to train an RL agent. • The proportion of positive and negative rewards in a reward shaping function. • The reward criterion: ‘Balanced Class’, ‘Skewed Positive Class’ and ‘Skewed Negative Class’. • The performance of an RL agent in the case of autonomous navigation. |
|---|---|
| DOI: | 10.1016/j.asoc.2022.109241 |