Bibliographic Details
Title: Reward design and hyperparameter tuning for generalizable deep reinforcement learning agents in autonomous racing.
Authors: Kunda, Naga Sai Shreya; Kc, Pranave; Pandey, Mayank; Kumaar, A. A. Nippun
Source: Scientific Reports; 12/16/2025, Vol. 15 Issue 1, p1-15, 15p
Subject Terms: REINFORCEMENT learning, AUTOMOBILE racing, OPTIMIZATION algorithms, GENERALIZATION, REWARD (Psychology)
Abstract: Deep Reinforcement Learning (DRL) is transforming autonomous racing by enabling agents to make real-time, high-stakes decisions with minimal supervision. Yet strong generalization across multiple varied tracks remains a key bottleneck. This paper rigorously examines the relationship between reward system design and hyperparameter tuning for autonomous racing agents, using the AWS DeepRacer platform as a unified benchmark. Proximal Policy Optimization (PPO) and Soft Actor-Critic (SAC) are comprehensively compared on two vastly different reward structures, together with extensive tuning of batch size, learning rate, discount factor, and entropy. The results show that a well-engineered reward mechanism, under optimized hyperparameters (batch size 128, learning rate 0.0003, discount factor 0.99, entropy 0.01), allows PPO to outperform standard benchmarks with an average lap time of 12.464 s across 21 unseen tracks. These results demonstrate not only enhanced performance but also improved generalization, enabling the models to perform effectively on previously unseen tracks. Additionally, significant emphasis was placed on reward shaping and on analyzing hyperparameter sensitivity in large-scale DRL systems to ensure their practical applicability in autonomous scenarios. [ABSTRACT FROM AUTHOR]
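For context, the abstract's reward-design and hyperparameter claims map onto AWS DeepRacer's standard Python interface. The paper's two actual reward structures are not reproduced in this record, so the sketch below is a hypothetical centerline-plus-speed reward written in DeepRacer's documented reward_function(params) format, paired with the best-performing hyperparameter values reported in the abstract; the speed-bonus term and the configuration key names are illustrative assumptions, not the authors' exact design.

```python
# Hypothetical reward sketch in AWS DeepRacer's reward_function(params)
# format. The paper's two actual reward structures are not given in this
# record; this only illustrates the kind of shaped reward the study tunes.

def reward_function(params):
    """Reward centerline tracking plus speed; penalize leaving the track."""
    # Documented DeepRacer input keys
    all_wheels_on_track = params["all_wheels_on_track"]    # bool
    distance_from_center = params["distance_from_center"]  # metres
    track_width = params["track_width"]                    # metres
    speed = params["speed"]                                # metres/second

    if not all_wheels_on_track:
        return 1e-3  # near-zero reward once the car leaves the track

    # Graduated bands around the centerline
    if distance_from_center <= 0.1 * track_width:
        reward = 1.0
    elif distance_from_center <= 0.25 * track_width:
        reward = 0.5
    elif distance_from_center <= 0.5 * track_width:
        reward = 0.1
    else:
        reward = 1e-3

    # Assumed speed-shaping term (not from the paper)
    reward += 0.5 * speed

    return float(reward)


# Best-performing PPO hyperparameters reported in the abstract
# (dictionary key names are assumptions for illustration).
PPO_HYPERPARAMETERS = {
    "batch_size": 128,
    "learning_rate": 0.0003,
    "discount_factor": 0.99,
    "entropy": 0.01,
}
```

The banded centerline reward mirrors DeepRacer's documented example style; the abstract's contribution is comparing such shaped rewards against an alternative structure while sweeping the four hyperparameters above.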
Copyright of Scientific Reports is the property of Springer Nature and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Database: Complementary Index