Optimization of Automatic Driving Safety Strategy Based on Reinforcement learning Algorithm of PPO

Detailed bibliography
Title: Optimization of Automatic Driving Safety Strategy Based on Reinforcement learning Algorithm of PPO
Authors: Guanxu Bai
Source: Applied and Computational Engineering. 158:211-220
Publisher information: EWA Publishing, 2025.
Year of publication: 2025
Description: With the development of the automobile industry and artificial intelligence, autonomous driving has become an important research topic and a future development trend. However, autonomous driving still has shortcomings in its decision-making ability in the informed environment and in its ability to drive safely in complex environments. To address this problem, this study builds on the Proximal Policy Optimization (PPO) algorithm of reinforcement learning and proposes two novel algorithms, Soft-constrained PPO and Hard-constrained PPO, to optimize the policy in safe reinforcement learning. The soft constraint modifies the reward function by introducing new assessment criteria. The hard constraint forces unsafe training to stop by setting a maximum risk control threshold. After the algorithms are presented, a comparative experiment is carried out in which the three models are trained in the same environment (highway-V0). The newly proposed algorithms are found not only to improve performance but also to effectively control unsafe behaviors in the autonomous driving environment, such as lane deviation and collision.
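The two constraint styles named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not give the actual assessment criteria, risk measure, or threshold, so `StepInfo`, `soft_constrained_reward`, `HardConstraintMonitor`, and all default weights below are hypothetical names and values chosen only for illustration. The sketch assumes each environment step reports a collision flag and a lateral lane offset.

```python
from dataclasses import dataclass


@dataclass
class StepInfo:
    """Hypothetical per-step safety signals (not taken from the paper)."""
    collided: bool
    lane_offset: float  # assumed: lateral distance from the lane centre


def soft_constrained_reward(reward: float, info: StepInfo,
                            collision_penalty: float = 1.0,
                            lane_weight: float = 0.1) -> float:
    """Soft constraint: subtract safety penalties from the task reward.

    The penalty terms and weights are illustrative assumptions.
    """
    penalty = lane_weight * abs(info.lane_offset)
    if info.collided:
        penalty += collision_penalty
    return reward - penalty


class HardConstraintMonitor:
    """Hard constraint: signal a stop once accumulated risk passes a threshold.

    Risk here is the running sum of absolute lane offsets, an assumed proxy.
    """

    def __init__(self, max_risk: float = 1.0):
        self.max_risk = max_risk
        self.risk = 0.0

    def reset(self) -> None:
        """Clear the accumulated risk at the start of an episode."""
        self.risk = 0.0

    def update(self, info: StepInfo) -> bool:
        """Accumulate risk and return True if the episode should be stopped."""
        self.risk += abs(info.lane_offset)
        return info.collided or self.risk > self.max_risk
```

In a training loop, the shaped reward would replace the raw reward passed to the PPO update, while a `True` result from `update` would end the current episode early. Both choices are assumptions about how such constraints are typically wired in, not details confirmed by the record.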
Document type: Article
ISSN: 2755-273X
2755-2721
DOI: 10.54254/2755-2721/2025.tj23484
Accession number: edsair.doi...........9e8ed8ea10be34b55ef1be4983f6b869
Database: OpenAIRE