FP-WDDQN: An improved deep reinforcement learning algorithm for adaptive traffic signal control
| Published in: | IEEE ... International Conference on Data Mining Workshops, pp. 44-51 |
|---|---|
| Main Authors: | , |
| Format: | Conference Proceeding |
| Language: | English |
| Published: | IEEE, 04.12.2023 |
| Subjects: | |
| ISSN: | 2375-9259 |
| Summary: | Current adaptive traffic signal control methods based on centralized deep reinforcement learning do not scale to large traffic networks. The scalability problem can be overcome by distributing global control across local RL agents through multi-agent reinforcement learning, but the environment then becomes partially observable and non-stationary from the perspective of each local agent because communication between agents is limited. In this paper, we propose a multi-agent framework called Forgetful Priority Weighted Double Deep Q-learning (FP-WDDQN). We first extend Weighted Double Deep Q-Learning (WDDQN) to the multi-agent domain to reduce the error caused by the algorithm's underestimation of the target network and to obtain more accurate Q-values. We then propose a new algorithm that combines a Forgetful Experience Mechanism (FEM) with a Priority Experience Replay Mechanism (PERM): WDDQN selects experience according to Temporal Difference (TD) error and identifies the importance of each experience with FEM, so that experiences with high TD error are preferentially sampled when training the network. This improves training efficiency and stabilizes the learning process. We construct an urban traffic network with seven intersections on the Simulation of Urban MObility (SUMO) platform and compare FP-WDDQN against Independent Double Deep Q-learning (IDDQN), WDDQN, and Multi-agent Advantage Actor-Critic (MA2C) while simulating peak-hour traffic dynamics. Experiments show that FP-WDDQN outperforms the other algorithms in vehicle speed, intersection delay, and intersection waiting queue length. |
|---|---|
| ISSN: | 2375-9259 |
| DOI: | 10.1109/ICDMW60847.2023.00015 |
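The abstract's core idea, prioritized replay driven by TD error combined with a forgetting mechanism, can be illustrated with a minimal sketch. The paper's exact FEM and PERM formulations are not given in this record, so the priority exponent, the decay-based "forgetting", and the class name below are all assumptions for illustration, not the authors' implementation:

```python
import random


class ForgetfulPrioritizedBuffer:
    """Sketch of a replay buffer that samples transitions proportionally to
    |TD error| (priority replay) while decaying stale priorities over time
    (a hypothetical stand-in for the paper's Forgetful Experience Mechanism)."""

    def __init__(self, capacity=10000, alpha=0.6, decay=0.99):
        self.capacity = capacity
        self.alpha = alpha    # how strongly TD error shapes priority (assumed)
        self.decay = decay    # per-sample forgetting factor (assumed)
        self.data = []
        self.priorities = []

    def add(self, transition, td_error):
        # New experience enters with priority based on its TD error
        priority = (abs(td_error) + 1e-6) ** self.alpha
        if len(self.data) >= self.capacity:
            self.data.pop(0)        # evict the oldest transition
            self.priorities.pop(0)
        self.data.append(transition)
        self.priorities.append(priority)

    def sample(self, batch_size):
        # Decay all priorities so old experience is gradually "forgotten"
        self.priorities = [p * self.decay for p in self.priorities]
        total = sum(self.priorities)
        probs = [p / total for p in self.priorities]
        # High-TD-error transitions are preferentially selected
        idx = random.choices(range(len(self.data)), weights=probs, k=batch_size)
        return [self.data[i] for i in idx], idx

    def update(self, indices, td_errors):
        # Refresh priorities after the network recomputes TD errors
        for i, e in zip(indices, td_errors):
            self.priorities[i] = (abs(e) + 1e-6) ** self.alpha
```

Each agent's WDDQN would draw minibatches from such a buffer and call `update` with the recomputed TD errors after every gradient step, so surprising transitions stay likely to be replayed while long-unrefreshed ones fade.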