Mean deep deterministic policy gradient algorithm for pursuit strategies in three-body confrontation

Bibliographic details
Published in:	Expert Systems with Applications, Vol. 287, p. 128139
Main authors:	Wang, Ziheng; Pu, Xiandong; Li, Yulin; Zhang, Jianlei; Zhang, Chunyan
Format:	Journal Article
Language:	English
Published:	Elsevier Ltd, 25 August 2025
Subjects:	Deterministic policy gradient; Ensemble reinforcement learning; Guidance; Pursuit-evasion games; Three-body confrontation
ISSN:	0957-4174

Description
Summary:
• The ensemble-based algorithm mean deep deterministic policy gradient is proposed.
• A Markov decision model for the three-body confrontation problem is established.
• An action-transform method is developed for efficient learning.
• Additional learning techniques are incorporated to improve performance.
• An ablation study and comparison experiments are conducted to demonstrate the performance.

Three-body confrontation is a challenging pursuit-evasion game with significant applications across various fields. Traditional methods based on differential game theory struggle to manage environmental complexity, imperfect information, and long-term decision-making. Leveraging the model-free nature and robust training capabilities of deep reinforcement learning, we propose an ensemble-based actor-critic algorithm named Augmented Mean Deep Deterministic Policy Gradient (AMDPG) to learn pursuit strategies in three-body confrontation. This method includes an ensemble reinforcement learning architecture and incorporates multiple learning techniques to enhance its performance. Furthermore, we introduce an action-transform method that provides two prior strategies as heuristic guidance to accelerate action space exploration during learning. The proposed algorithm is evaluated in various scenarios, demonstrating superior policy performance and convergence compared with certain state-of-the-art algorithms. The learned strategies succeed in most testing scenarios, achieving higher penetration rates than those of competing algorithms.
DOI:	10.1016/j.eswa.2025.128139