Tracking interval control for urban rail trains based on safe reinforcement learning

Bibliographic Details
Published in: Engineering Applications of Artificial Intelligence, Vol. 137, p. 109226
Main Authors: Lin, Junting; Qiu, Xiaohui; Li, Maolin
Format: Journal Article
Language: English
Published: Elsevier Ltd, 01.11.2024
ISSN: 0952-1976
Description
Summary: To control the interval between trains in the new train control system, with the goals of ensuring safe operation and increasing traffic density, train speed management is treated as a sequential decision-making process, and Safe Reinforcement Learning is applied to achieve real-time control of the train interval within a section. First, using vehicle-to-vehicle communication, the train obtains state information about its surroundings, and a constrained Markov Decision Process model is built that accounts for the dynamic characteristics of both the leading and the tracking train. Second, by combining the minimum safety distance with the maximum operating-efficiency distance, safety and optimality are linked, and an augmented Lagrange multiplier method is used to design and implement the safe reinforcement learning algorithm. To speed up convergence, a dual-priority scheme classifies and samples experiences according to their differing levels of importance. Finally, simulations of various train tracking scenarios were performed. The results show that, under the same scenarios, the proposed algorithm outperforms both the Lagrange-based deep deterministic policy gradient algorithm and the fixed-lambda deep deterministic policy gradient algorithm: safety performance improves by 30% and 60%, and optimality performance by 40% and 30%, respectively. Paired with safety-experience prioritized replay, the algorithm also converges faster than the enhanced version. Overall, the algorithm is well suited to train tracking interval control.
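The augmented Lagrange multiplier idea described in the abstract can be illustrated with a minimal 1-D sketch. This is a hypothetical stand-in, not the authors' algorithm: a scalar "gap" plays the role of the policy output, efficiency is modeled as minimizing the gap, and safety as keeping the gap at or above a minimum safety distance, with the multiplier updated by dual ascent on the constraint violation.

```python
def augmented_lagrangian_gap(d_min=1.0, rho=1.0, lr=0.1,
                             outer_iters=5, inner_steps=300):
    """Toy augmented-Lagrangian solve of:
        minimize   gap            (efficiency: track as closely as possible)
        subject to gap >= d_min   (safety: keep the minimum safety distance)

    All names and values here are illustrative assumptions, not the
    paper's actual model or hyperparameters.
    """
    theta, lam = 0.0, 0.0  # gap "policy" parameter and Lagrange multiplier
    for _ in range(outer_iters):
        # Inner loop: gradient descent on the augmented Lagrangian
        #   L(theta) = theta
        #            + (1/(2*rho)) * (max(0, lam + rho*(d_min - theta))**2 - lam**2)
        for _ in range(inner_steps):
            grad = 1.0 - max(0.0, lam + rho * (d_min - theta))
            theta -= lr * grad
        # Outer loop: dual ascent on the inequality-constraint multiplier
        lam = max(0.0, lam + rho * (d_min - theta))
    return theta, lam
```

At the fixed point the gap settles exactly on the minimum safety distance (the efficiency pressure and the safety penalty balance), which mirrors how the paper's method ties the minimum safety distance to the optimal tracking interval.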
DOI: 10.1016/j.engappai.2024.109226