Deep deterministic policy gradient algorithm for crowd-evacuation path planning

Bibliographic details
Published in:	Computers & industrial engineering Vol. 161; p. 107621
Main authors:	Li, Xinjin; Liu, Hong; Li, Junqing; Li, Yan
Format:	Journal Article
Language:	English
Published:	Elsevier Ltd, 1 November 2021
Subjects:	Crowd simulation for evacuation; Deep reinforcement learning; Multi-agent reinforcement learning; Path planning
ISSN:	0360-8352, 1879-0550

Description
Summary:	• We propose the E-MADDPG algorithm, which builds on MADDPG and achieves higher learning efficiency.
• We extract motion trajectories from pedestrian video to reduce the state space.
• We propose a hierarchical crowd-evacuation path planning method based on DRL.
In existing evacuation methods, large numbers of pedestrians and complex environments reduce evacuation efficiency. To address this, we propose a hierarchical evacuation method based on multi-agent deep reinforcement learning (MADRL). First, we use a two-level evacuation mechanism in which the crowd is divided into leaders and followers. Second, at the upper level, leaders perform path planning to guide the evacuation. To obtain the best evacuation path, we propose the efficient multi-agent deep deterministic policy gradient (E-MADDPG) algorithm for crowd-evacuation path planning. E-MADDPG uses learning curves to improve on the fixed experience pool of MADDPG and replaces its sampling strategy with a high-priority experience replay strategy. These changes increase the learning efficiency of the algorithm. We also extract pedestrian motion trajectories from real motion videos to reduce the state space of the algorithm. Third, at the bottom level, followers use the relative velocity obstacle (RVO) algorithm to avoid collisions and follow the leaders to evacuate. Finally, experimental results show that E-MADDPG improves path planning efficiency and that the proposed method improves the efficiency of crowd evacuation.
DOI:	10.1016/j.cie.2021.107621
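
The summary mentions a high-priority experience replay strategy but gives no details. As a rough illustration only, the sketch below shows generic proportional prioritized replay, in which transitions with larger priority are sampled more often. It is not the E-MADDPG sampling scheme from the article: the class name, the `alpha` exponent, the fixed capacity, and the use of the absolute TD error as priority are all assumptions made for this example, and the learning-curve-based adjustment of the experience pool is not modeled.

```python
import random
from collections import deque


class PrioritizedReplayBuffer:
    """Hypothetical fixed-capacity buffer with proportional prioritized sampling."""

    def __init__(self, capacity, alpha=0.6, seed=None):
        self.alpha = alpha  # assumed exponent: 0 gives uniform sampling
        self.transitions = deque(maxlen=capacity)
        self.priorities = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, transition, priority=None):
        # New transitions default to the current maximum priority
        # so that each one is likely to be sampled at least once.
        if priority is None:
            priority = max(self.priorities, default=1.0)
        # Both deques share maxlen, so the oldest entry is evicted from each.
        self.transitions.append(transition)
        self.priorities.append(float(priority))

    def sample(self, batch_size):
        # Sampling probability is proportional to priority ** alpha.
        weights = [p ** self.alpha for p in self.priorities]
        indices = self.rng.choices(
            range(len(self.transitions)), weights=weights, k=batch_size
        )
        return indices, [self.transitions[i] for i in indices]

    def update_priority(self, index, td_error, eps=1e-6):
        # Assumed priority rule: absolute TD error plus a small constant.
        self.priorities[index] = abs(td_error) + eps

    def __len__(self):
        return len(self.transitions)
```

In a training loop, one would typically call `sample`, compute TD errors for the returned batch, and pass them back through `update_priority` using the returned indices.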