Deep deterministic policy gradient algorithm for crowd-evacuation path planning

Bibliographic Details
Published in:	Computers & industrial engineering Vol. 161; p. 107621
Main Authors:	Li, Xinjin, Liu, Hong, Li, Junqing, Li, Yan
Format:	Journal Article
Language:	English
Published:	Elsevier Ltd 01.11.2021
Subjects:	Crowd simulation for evacuation; Deep reinforcement learning; Multi-agent reinforcement learning; Path planning
ISSN:	0360-8352, 1879-0550

Description
Summary:	• We propose the E-MADDPG algorithm, which builds on MADDPG and learns more efficiently.
• We extract motion trajectories from pedestrian video to reduce the state space.
• We propose a hierarchical crowd-evacuation path-planning method based on deep reinforcement learning (DRL).

In existing evacuation methods, large numbers of pedestrians and complex environments reduce evacuation efficiency. To address this problem, we propose a hierarchical evacuation method based on multi-agent deep reinforcement learning (MADRL). First, we use a two-level mechanism to guide the evacuation, dividing the crowd into leaders and followers. Second, at the upper level, leaders perform path planning to guide the evacuation. To obtain the best evacuation path, we propose the efficient multi-agent deep deterministic policy gradient (E-MADDPG) algorithm for crowd-evacuation path planning. E-MADDPG uses learning curves to improve on the fixed experience pool of MADDPG and adopts a high-priority experience replay strategy to improve sampling; together, these changes increase the learning efficiency of the algorithm. We also extract pedestrian motion trajectories from real motion videos to reduce the state space of the algorithm. Third, at the bottom level, followers use the relative velocity obstacle (RVO) algorithm to avoid collisions while following the leaders to evacuate. Finally, experimental results show that E-MADDPG can improve path-planning efficiency and that the proposed method can improve the efficiency of crowd evacuation.
DOI:	10.1016/j.cie.2021.107621
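
The summary mentions a high-priority experience replay strategy but gives no implementation details. As a rough illustration of the general idea only, the following is a minimal sketch of generic proportional prioritized replay, in which transitions with larger temporal-difference (TD) error are sampled more often. It is not the authors' E-MADDPG code: the class name, the `alpha` and `eps` parameters, and the use of absolute TD error as the priority are all assumptions made for this example, and the learning-curve-based adjustment of the experience pool described in the summary is not reproduced here.

```python
import random
from collections import deque


class PrioritizedReplayBuffer:
    """Generic proportional prioritized replay (all names are hypothetical)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-3, seed=None):
        # Both deques evict their oldest entry once capacity is reached,
        # so transitions and priorities stay aligned by index.
        self.transitions = deque(maxlen=capacity)
        self.priorities = deque(maxlen=capacity)
        self.alpha = alpha  # assumed knob: 0 = uniform, 1 = fully priority-driven
        self.eps = eps      # assumed offset: keeps every priority strictly positive
        self.rng = random.Random(seed)

    def add(self, transition, td_error):
        """Store a transition with priority (|td_error| + eps) ** alpha."""
        self.transitions.append(transition)
        self.priorities.append((abs(td_error) + self.eps) ** self.alpha)

    def sample(self, batch_size):
        """Draw transitions (with replacement) in proportion to priority."""
        indices = self.rng.choices(
            range(len(self.transitions)),
            weights=list(self.priorities),
            k=batch_size,
        )
        return [self.transitions[i] for i in indices]

    def __len__(self):
        return len(self.transitions)


# Usage: transitions here are placeholder strings; a real agent would store
# (state, action, reward, next_state, done) tuples for each leader.
buffer = PrioritizedReplayBuffer(capacity=1000, seed=0)
buffer.add("low-error transition", td_error=0.01)
buffer.add("high-error transition", td_error=5.0)
batch = buffer.sample(batch_size=4)
```

Sampling with replacement keeps the sketch short; practical implementations often use a sum-tree for efficient sampling and importance-sampling weights to correct the bias that prioritization introduces.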