Tracking interval control for urban rail trains based on safe reinforcement learning
| Published in: | Engineering Applications of Artificial Intelligence, Vol. 137, p. 109226 |
|---|---|
| Main Authors: | , , |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier Ltd, 01.11.2024 |
| ISSN: | 0952-1976 |
| Summary: | To control the tracking interval between trains in the new train control system, with the dual aims of ensuring safe operation and increasing traffic density, train speed regulation is formulated as a sequential decision-making process, and safe reinforcement learning is applied to achieve real-time control of the train interval within a section. First, using vehicle-to-vehicle communication, the train obtains state information about its surroundings, and a constrained Markov decision process model is built that accounts for the dynamics of both the leading and the tracking train. Second, safety and optimality are linked by combining the minimum safety distance with the maximum operating-efficiency distance, and the safe reinforcement learning algorithm is designed and implemented with an augmented Lagrange multiplier method. To speed up convergence, a dual-priority scheme classifies and samples stored experience according to its importance. Finally, simulations of various train tracking scenarios show that, under the same conditions, the proposed algorithm outperforms both the Lagrange-based and the fixed-lambda deep deterministic policy gradient algorithms: safety performance improves by 30% and 60%, and optimality performance by 40% and 30%, respectively. When paired with safety-experience prioritized replay, the algorithm also converges faster than the enhanced version. Overall, it is well suited to train tracking interval control. |
|---|---|
| DOI: | 10.1016/j.engappai.2024.109226 |
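Two ingredients from the abstract can be sketched in code: the augmented Lagrange multiplier update that penalizes violations of the safety-distance constraint, and a dual-priority replay buffer that over-samples safety-critical experience. This is a minimal illustration under assumed names (`update_multiplier`, `DualPriorityReplay`, the two-pool sampling split); it is not the paper's actual implementation.

```python
import random


def update_multiplier(lam, rho, cost, limit):
    """Dual-ascent step for an augmented Lagrange multiplier:
    lam <- max(0, lam + rho * (cost - limit)).
    The multiplier grows while the safety cost exceeds its limit,
    increasing the penalty on unsafe behavior in the policy objective."""
    return max(0.0, lam + rho * (cost - limit))


class DualPriorityReplay:
    """Two-tier replay buffer (illustrative): transitions that violated
    the safety constraint go into a separate pool and are over-sampled,
    so the agent revisits safety-critical experience more often."""

    def __init__(self, capacity, unsafe_fraction=0.5, seed=0):
        self.capacity = capacity              # per-pool capacity
        self.unsafe_fraction = unsafe_fraction
        self.safe, self.unsafe = [], []
        self.rng = random.Random(seed)

    def add(self, transition, violated_safety):
        pool = self.unsafe if violated_safety else self.safe
        pool.append(transition)
        if len(pool) > self.capacity:
            pool.pop(0)                       # drop oldest when full

    def sample(self, batch_size):
        # Draw a fixed share of the batch from the unsafe pool,
        # and fill the remainder from ordinary (safe) experience.
        n_unsafe = min(len(self.unsafe),
                       int(batch_size * self.unsafe_fraction))
        n_safe = min(len(self.safe), batch_size - n_unsafe)
        batch = (self.rng.sample(self.unsafe, n_unsafe)
                 + self.rng.sample(self.safe, n_safe))
        self.rng.shuffle(batch)
        return batch
```

In a training loop, the multiplier would be updated once per iteration from the measured episode safety cost, while the buffer supplies mixed batches to the policy update.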