FP-WDDQN: An improved deep reinforcement learning algorithm for adaptive traffic signal control
| Published in: | IEEE ... International Conference on Data Mining Workshops, pp. 44-51 |
|---|---|
| Main Authors: | , |
| Format: | Conference Proceeding |
| Language: | English |
| Published: | IEEE, 04.12.2023 |
| Subjects: | |
| ISSN: | 2375-9259 |
| Summary: | Current adaptive traffic signal control methods based on centralized deep reinforcement learning do not scale to large traffic networks. The scalability problem can be overcome by distributing global control across local RL agents through multi-agent reinforcement learning, but the environment then becomes partially observable and non-stationary from the perspective of each local agent because communication between agents is limited. In this paper, we propose a multi-agent framework called Forgetful Priority Weighted Double Deep Q-learning (FP-WDDQN). We first extend Weighted Double Deep Q-Learning (WDDQN) to the multi-agent domain to reduce the error caused by the algorithm's underestimation of the target network and to obtain more accurate Q-values. We then propose a new algorithm that combines a Forgetful Experience Mechanism (FEM) with a Priority Experience Replay Mechanism (PERM): WDDQN selects experience according to Temporal Difference (TD) error and identifies the importance of each experience with FEM, so that experiences with high TD error are preferentially sampled when training the network. This improves training efficiency and stabilizes the learning process. We construct an urban traffic network with seven intersections on the Simulation of Urban MObility (SUMO) platform and compare FP-WDDQN against Independent Double Deep Q-learning (IDDQN), WDDQN, and Multi-agent Advantage Actor-Critic (MA2C) while simulating peak-hour traffic dynamics. Experiments show that FP-WDDQN outperforms the other algorithms in vehicle speed, intersection delay, and intersection waiting queue length. |
|---|---|
| ISSN: | 2375-9259 |
| DOI: | 10.1109/ICDMW60847.2023.00015 |
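The abstract's core idea, prioritized replay driven by TD error combined with a forgetting mechanism, can be illustrated with a minimal sketch. The paper's exact FEM and PERM formulations are not given in this record, so the priority exponent, the decay-based "forgetting", and the class name below are all assumptions for illustration, not the authors' implementation:

```python
import random


class ForgetfulPrioritizedBuffer:
    """Sketch of a replay buffer that samples transitions proportionally to
    |TD error| (priority replay) while decaying stale priorities over time
    (a hypothetical stand-in for the paper's Forgetful Experience Mechanism)."""

    def __init__(self, capacity=10000, alpha=0.6, decay=0.99):
        self.capacity = capacity
        self.alpha = alpha    # how strongly TD error shapes priority (assumed)
        self.decay = decay    # per-sample forgetting factor (assumed)
        self.data = []
        self.priorities = []

    def add(self, transition, td_error):
        # New experience enters with priority based on its TD error
        priority = (abs(td_error) + 1e-6) ** self.alpha
        if len(self.data) >= self.capacity:
            self.data.pop(0)        # evict the oldest transition
            self.priorities.pop(0)
        self.data.append(transition)
        self.priorities.append(priority)

    def sample(self, batch_size):
        # Decay all priorities so old experience is gradually "forgotten"
        self.priorities = [p * self.decay for p in self.priorities]
        total = sum(self.priorities)
        probs = [p / total for p in self.priorities]
        # High-TD-error transitions are preferentially selected
        idx = random.choices(range(len(self.data)), weights=probs, k=batch_size)
        return [self.data[i] for i in idx], idx

    def update(self, indices, td_errors):
        # Refresh priorities after the network recomputes TD errors
        for i, e in zip(indices, td_errors):
            self.priorities[i] = (abs(e) + 1e-6) ** self.alpha
```

Each agent's WDDQN would draw minibatches from such a buffer and call `update` with the recomputed TD errors after every gradient step, so surprising transitions stay likely to be replayed while long-unrefreshed ones fade.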