Tracking interval control for urban rail trains based on safe reinforcement learning



Published in: Engineering Applications of Artificial Intelligence, Volume 137, p. 109226
Main authors: Lin, Junting, Qiu, Xiaohui, Li, Maolin
Format: Journal Article
Language: English
Published: Elsevier Ltd, 01.11.2024
ISSN:0952-1976
Description
Summary: To control the interval between trains in the new train control system, which must ensure safe operation while increasing traffic density, the regulation of train speed is treated as a decision-making process, and Safe Reinforcement Learning is applied to control the train interval within a section in real time. First, using vehicle-to-vehicle communication, the train obtains state information about its surroundings, and a constrained Markov Decision Process model is built that accounts for the dynamics of both the leading and the tracking train. Second, by combining the minimum safe distance with the maximum operating-efficiency distance, safety and optimality are linked, and a safe reinforcement learning algorithm is designed and implemented using the augmented Lagrange multiplier method. To accelerate convergence, a dual-priority scheme classifies and samples experience according to its varying importance. Finally, simulations of various train-tracking scenarios show that, under the same conditions, the proposed algorithm outperforms both the Lagrange-based and the fixed-lambda deep deterministic policy gradient algorithms: safety performance improves by 30% and 60%, and optimality by 40% and 30%, respectively. When paired with safety-experience prioritized replay, the algorithm also converges faster than the enhanced version without it. Overall, the algorithm is well suited to train tracking interval control.
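The augmented Lagrange multiplier method mentioned in the summary can be illustrated with a minimal sketch. This is not the paper's implementation: it only shows, under assumed names (`limit` for the constraint threshold, `mu` for the penalty weight, `lam` for the multiplier), how a constrained objective max E[R] s.t. E[C] <= limit is relaxed into a penalized objective, and how the multiplier is updated by dual ascent on the constraint violation.

```python
def augmented_lagrangian_penalty(expected_cost, limit, lam, mu):
    """Penalty subtracted from the policy objective for the constraint
    E[C] <= limit: linear multiplier term plus quadratic augmentation."""
    violation = expected_cost - limit
    return lam * violation + 0.5 * mu * max(0.0, violation) ** 2

def update_multiplier(lam, expected_cost, limit, mu, lr=0.05):
    """Dual-ascent step on the multiplier; lam is kept non-negative."""
    violation = expected_cost - limit
    return max(0.0, lam + lr * (violation + mu * max(0.0, violation)))

# Toy run: while the (simulated) expected cost exceeds the limit, the
# multiplier grows and the penalty tightens; once the constraint is
# satisfied, the multiplier decays back toward zero.
lam = 0.0
for cost in [1.4, 1.3, 1.1, 0.9, 0.8]:  # simulated E[C] per iteration
    lam = update_multiplier(lam, cost, limit=1.0, mu=1.0)
```

In a full safe-RL agent this penalty would be folded into the critic's training target, so that the policy gradient trades off reward against the current estimate of constraint violation.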
DOI:10.1016/j.engappai.2024.109226