FP-WDDQN: An improved deep reinforcement learning algorithm for adaptive traffic signal control



Detailed bibliography
Published in: IEEE ... International Conference on Data Mining Workshops, pp. 44-51
Main authors: Zhang, Xiao; Xu, Xiaolong
Format: Conference paper
Language: English
Published: IEEE, 04.12.2023
ISSN:2375-9259
Description
Summary: Current adaptive traffic signal control methods based on centralized deep reinforcement learning are not applicable in large-scale adaptive traffic control environments. The scalability problem can be overcome by assigning global control to local RL agents through multi-agent reinforcement learning, but the environment then becomes partially observable and non-stationary from the perspective of each local agent, due to limited communication between agents. In this paper, we propose a multi-agent framework called Forgetful Priority Weighted Double Deep Q-learning (FP-WDDQN). We first extend Weighted Double Deep Q-Learning (WDDQN) to the multi-agent domain, reducing the estimation error caused by the algorithm's underestimation of the target network and obtaining more accurate Q-values. We then propose a new algorithm based on a Forgetful Experience Mechanism (FEM) and a Priority Experience Replay Mechanism (PERM). This mechanism has WDDQN select experiences by Temporal Difference (TD) error and assess their importance with FEM, so that experiences with high TD error are preferentially sampled when training the network, which improves training efficiency and stabilizes the learning process. In this paper, we construct an urban traffic network with seven intersections in the Simulation of Urban MObility (SUMO) simulation platform, and the proposed FP-WDDQN is compared against Independent Double Deep Q-learning (IDDQN), WDDQN, and Multi-agent Advantage Actor Critic (MA2C) while simulating peak-hour traffic dynamics. Experiments show that FP-WDDQN outperforms the other algorithms in terms of vehicle speed, intersection delay, and intersection waiting queue length.
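The abstract describes combining TD-error-based prioritized experience replay (PERM) with a forgetting mechanism (FEM) that discounts the importance of older experiences. The exact FEM formulation is not given in the abstract, so the sketch below is an illustrative assumption: priorities are set from absolute TD error, and a multiplicative decay fades stale priorities each time the buffer is sampled. The class name, `alpha` exponent, and `decay` factor are all hypothetical.

```python
import random

class ForgetfulPrioritizedReplay:
    """Sketch of TD-error prioritized replay with a decay ("forgetting")
    factor. The multiplicative decay standing in for FEM is an assumption,
    not the paper's exact mechanism."""

    def __init__(self, capacity=10000, alpha=0.6, decay=0.99):
        self.capacity = capacity
        self.alpha = alpha    # how strongly TD error shapes priority
        self.decay = decay    # assumed forgetting rate for old priorities
        self.buffer = []      # transitions (s, a, r, s_next, done)
        self.priorities = []

    def add(self, transition, td_error):
        # Evict the oldest transition once the buffer is full.
        if len(self.buffer) >= self.capacity:
            self.buffer.pop(0)
            self.priorities.pop(0)
        self.buffer.append(transition)
        self.priorities.append((abs(td_error) + 1e-6) ** self.alpha)

    def sample(self, batch_size):
        # Fade all stored priorities so stale experience loses influence,
        # then draw transitions with probability proportional to priority.
        self.priorities = [p * self.decay for p in self.priorities]
        total = sum(self.priorities)
        probs = [p / total for p in self.priorities]
        idxs = random.choices(range(len(self.buffer)),
                              weights=probs, k=batch_size)
        return [self.buffer[i] for i in idxs], idxs

    def update(self, idxs, td_errors):
        # Refresh priorities for sampled transitions with their new TD errors.
        for i, err in zip(idxs, td_errors):
            self.priorities[i] = (abs(err) + 1e-6) ** self.alpha
```

In a training loop, each agent would push transitions with their current TD error via `add`, draw a minibatch with `sample`, and call `update` after recomputing TD errors, so high-error experiences keep being revisited while old ones gradually fade.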
DOI:10.1109/ICDMW60847.2023.00015