FP-WDDQN: An improved deep reinforcement learning algorithm for adaptive traffic signal control
Saved in:
| Published in: | IEEE ... International Conference on Data Mining Workshops, pp. 44-51 |
|---|---|
| Main authors: | , |
| Medium: | Conference paper |
| Language: | English |
| Published: | IEEE, 04.12.2023 |
| Topics: | |
| ISSN: | 2375-9259 |
| Online access: | Get full text |
| Summary: | Current adaptive traffic signal control methods based on centralized deep reinforcement learning are not applicable in large-scale adaptive traffic control environments. The scalability problem can be overcome by distributing global control across local RL agents through multi-agent reinforcement learning, but the environment then becomes partially observable and non-stationary from the perspective of each local agent due to limited communication between agents. In this paper, we propose a multi-agent framework called Forgetful Priority Weighted Double Deep Q-learning (FP-WDDQN). We first extend Weighted Double Deep Q-Learning (WDDQN) to the multi-agent domain to reduce the error caused by the algorithm's underestimation of the target network and obtain more accurate Q-values. We then propose a new algorithm combining a Forgetful Experience Mechanism (FEM) with a Priority Experience Replay Mechanism (PERM). This combined mechanism has WDDQN select experiences based on Temporal Difference (TD) error and identify the importance of each experience with FEM: during sampling and network training, experiences with high TD error are preferentially selected, which improves training efficiency and stabilizes the learning process. We construct an urban traffic network with seven intersections in the Simulation of Urban MObility (SUMO) platform and compare the proposed FP-WDDQN against Independent Double Deep Q-learning (IDDQN), WDDQN, and Multi-agent Advantage Actor Critic (MA2C) under simulated peak-hour traffic dynamics. Experiments show that FP-WDDQN outperforms the other algorithms in terms of vehicle speed, intersection delay, and intersection waiting queue length. |
|---|---|
| DOI: | 10.1109/ICDMW60847.2023.00015 |
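The replay scheme described in the summary — TD-error-based prioritized sampling combined with a forgetting mechanism that downweights stale experience — can be illustrated with a minimal sketch. This is a hypothetical simplification for intuition only, not the paper's actual FEM/PERM implementation; the class name, decay factor, and eviction rule are all assumptions.

```python
import random


class ForgetfulPrioritizedBuffer:
    """Illustrative replay buffer: samples transitions proportionally to
    TD error (prioritization) while geometrically decaying stored
    priorities over time (forgetting). Hypothetical sketch, not the
    FP-WDDQN implementation from the paper."""

    def __init__(self, capacity=1000, decay=0.99, eps=1e-3):
        self.capacity = capacity
        self.decay = decay  # forgetting factor applied to priorities at each sample step
        self.eps = eps      # keeps every priority strictly positive
        self.data = []      # list of (transition, priority) pairs

    def add(self, transition, td_error):
        if len(self.data) >= self.capacity:
            # evict the lowest-priority (most "forgotten") experience
            idx = min(range(len(self.data)), key=lambda i: self.data[i][1])
            self.data.pop(idx)
        self.data.append((transition, abs(td_error) + self.eps))

    def sample(self, batch_size):
        # decay all priorities so older experience is gradually forgotten
        self.data = [(t, p * self.decay) for t, p in self.data]
        total = sum(p for _, p in self.data)
        weights = [p / total for _, p in self.data]
        # sample with probability proportional to (decayed) TD-error priority
        return random.choices([t for t, _ in self.data],
                              weights=weights, k=batch_size)
```

In a training loop, the agent would push each transition with its current TD error and draw mini-batches from `sample`, so high-error transitions are revisited more often while long-unsampled ones fade in priority.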