Optimization of Automatic Driving Safety Strategy Based on Reinforcement learning Algorithm of PPO
| Title: | Optimization of Automatic Driving Safety Strategy Based on Reinforcement learning Algorithm of PPO |
|---|---|
| Authors: | Guanxu Bai |
| Source: | Applied and Computational Engineering. 158:211-220 |
| Publisher information: | EWA Publishing, 2025. |
| Publication year: | 2025 |
| Description: | With the development of the automobile industry and artificial intelligence, autonomous driving has become an important research topic and a future development trend. However, there are still defects in the decision-making ability of autonomous driving in the informed environment and in its safe driving ability in complex environments. To address this problem, this study builds on the Proximal Policy Optimization (PPO) algorithm from reinforcement learning and proposes two novel algorithms, Soft-constrained PPO and Hard-constrained PPO, to optimize the policy in safe reinforcement learning. The soft constraint modifies the reward function by introducing new assessment criteria. The hard constraint forces unsafe training to stop by setting a maximum risk control threshold. A comparative experiment is then carried out in which the three models are trained in the same environment (highway-V0). The newly proposed algorithms are found not only to improve performance but also to effectively control unsafe behaviors in the autonomous driving environment, such as lane deviation and collision. |
| Document type: | Article |
| ISSN: | 2755-273X 2755-2721 |
| DOI: | 10.54254/2755-2721/2025.tj23484 |
| Accession number: | edsair.doi...........9e8ed8ea10be34b55ef1be4983f6b869 |
| Database: | OpenAIRE |
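
The two constraint types named in the description can be illustrated with a minimal sketch. It is not the paper's implementation: the record gives no formulas or parameter values, so `SafetyConfig`, `penalty_weight`, `risk_threshold`, the scalar per-step `risk` signal, and the choice to end the episode when the threshold is exceeded are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class SafetyConfig:
    # Hypothetical parameters; the record does not state any values.
    penalty_weight: float = 1.0  # weight of the risk penalty (soft constraint)
    risk_threshold: float = 0.5  # maximum tolerated risk (hard constraint)


def soft_constrained_reward(env_reward: float, risk: float, cfg: SafetyConfig) -> float:
    """Soft constraint: subtract a weighted risk penalty from the environment reward."""
    return env_reward - cfg.penalty_weight * risk


def hard_constraint_violated(risk: float, cfg: SafetyConfig) -> bool:
    """Hard constraint: report that the risk exceeds the maximum threshold."""
    return risk > cfg.risk_threshold


def shaped_episode_return(steps, cfg: SafetyConfig) -> float:
    """Sum shaped rewards over (env_reward, risk) pairs, stopping at the first violation.

    Ending the episode is an assumed reading of "force the unsafe training to stop".
    """
    total = 0.0
    for env_reward, risk in steps:
        if hard_constraint_violated(risk, cfg):
            break  # assumed behavior: cut the rollout short at the unsafe step
        total += soft_constrained_reward(env_reward, risk, cfg)
    return total
```

In a PPO training loop, the shaped reward would replace the raw environment reward in the rollout buffer, and the violation flag would be treated as an episode termination signal. How the risk value is derived from quantities such as lane deviation or collision is left open here, since the record does not specify it.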