Tracking interval control for urban rail trains based on safe reinforcement learning
| Published in: | Engineering Applications of Artificial Intelligence, Vol. 137, p. 109226 |
|---|---|
| Main authors: | , , |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier Ltd, 01.11.2024 |
| Subjects: | |
| ISSN: | 0952-1976 |
| Online access: | Get full text |
| Summary: | To ensure safe train operation and increase traffic density under the new train control system, train interval control is formulated as a decision-making process, and safe reinforcement learning is applied to control the tracking interval in real time within a section. First, using vehicle-to-vehicle communication, the tracking train obtains state information about its surroundings, and a constrained Markov decision process model is built that accounts for the dynamics of both the leading and the tracking trains. Second, safety and optimality are coupled by combining the minimum safety distance with the maximum operating-efficiency distance, and the safe reinforcement learning algorithm is designed using an augmented Lagrange multiplier method. To accelerate convergence, a dual-priority scheme classifies and samples experiences according to their importance in the replay buffer. Finally, simulations of several train-tracking scenarios show that, under the same conditions, the algorithm outperforms both the Lagrange-based and the fixed-lambda deep deterministic policy gradient algorithms, improving safety performance by 30% and 60% and optimality performance by 40% and 30%, respectively. Combined with safety-experience prioritized replay, it also converges faster than the enhanced baseline. Overall, the algorithm is well suited to train tracking interval control. |
|---|---|
| DOI: | 10.1016/j.engappai.2024.109226 |
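The augmented Lagrange multiplier method named in the abstract can be illustrated with a minimal numeric sketch. This is not the paper's implementation: the efficiency objective (preferring a 100 m headway) and the 120 m minimum safety distance are hypothetical stand-ins, and plain gradient descent replaces the paper's actor-critic policy updates.

```python
def augmented_lagrangian_minimize(f_grad, g, g_grad, x0, lam0=0.0, rho=10.0,
                                  outer_iters=20, inner_iters=200, lr=0.01):
    """Minimize f(x) subject to g(x) <= 0 with the (Powell-Hestenes-
    Rockafellar) augmented Lagrangian for a single inequality constraint."""
    x, lam = x0, lam0
    for _ in range(outer_iters):
        # Inner loop: gradient descent on the augmented Lagrangian in x.
        for _ in range(inner_iters):
            mult = max(0.0, lam + rho * g(x))   # active part of the penalty
            x -= lr * (f_grad(x) + mult * g_grad(x))
        # Outer loop: dual ascent on the Lagrange multiplier.
        lam = max(0.0, lam + rho * g(x))
    return x, lam


# Hypothetical train-interval instance: the efficiency objective prefers a
# 100 m headway, while safety requires at least 120 m (numbers invented).
D_OPT, D_MIN = 100.0, 120.0

def f_grad(d):            # d/dd of (d - D_OPT)**2  (efficiency cost)
    return 2.0 * (d - D_OPT)

def g(d):                 # safety constraint g(d) = D_MIN - d <= 0
    return D_MIN - d

def g_grad(d):
    return -1.0

d_star, lam_star = augmented_lagrangian_minimize(f_grad, g, g_grad, x0=D_OPT)
```

In this toy instance the constrained optimum sits on the safety boundary (d = 120 m) with multiplier lam = f'(120) = 40, which mirrors the safety/optimality coupling the abstract describes; in the paper, the inner minimization would be carried out by the DDPG-style policy update rather than explicit gradient descent.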