Mean deep deterministic policy gradient algorithm for pursuit strategies in three-body confrontation


Bibliographic details
Published in: Expert Systems with Applications, Vol. 287, Article 128139
Authors: Wang, Ziheng; Pu, Xiandong; Li, Yulin; Zhang, Jianlei; Zhang, Chunyan
Format: Journal Article
Language: English
Published: Elsevier Ltd, 25.08.2025
ISSN:0957-4174
Online access: Full text
Description
Abstract:
• The ensemble-based algorithm Mean Deep Deterministic Policy Gradient is proposed.
• A Markov decision model for the three-body confrontation problem is established.
• An action-transform method is developed for efficient learning.
• Several additional learning techniques are incorporated to improve performance.
• Ablation studies and comparison experiments are conducted to demonstrate the performance.

Three-body confrontation is a challenging pursuit-evasion game with significant applications across various fields. Traditional methods based on differential game theory struggle to manage environmental complexity, imperfect information, and long-term decision-making. Leveraging the model-free approach and robust training capabilities of deep reinforcement learning, we propose an ensemble-based actor-critic algorithm named Augmented Mean Deep Deterministic Policy Gradient (AMDPG) to learn pursuit strategies in three-body confrontation. The method comprises an ensemble reinforcement learning architecture and incorporates multiple learning techniques to enhance performance. Furthermore, we introduce an action-transform method that provides two prior strategies as heuristic guidance to accelerate action-space exploration during learning. The proposed algorithm is evaluated in various scenarios, demonstrating superior policy performance and convergence compared with several state-of-the-art algorithms. The learned strategies succeed in most testing scenarios, achieving higher penetration rates than their competitors.
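The abstract does not give implementation details, but the "mean" in a mean-ensemble DDPG variant is commonly understood as averaging the Q-estimates of several critics when forming the TD target, which damps the variance and overestimation of any single critic. The sketch below illustrates only that averaging idea with toy linear critics; all names, shapes, and the linear critics themselves are illustrative assumptions, not the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)

K = 5                      # number of critics in the ensemble (assumed)
state_dim, act_dim = 4, 2  # toy dimensions, not from the paper
gamma = 0.99               # discount factor

# Each toy critic is a random linear function Q_k(s, a) = w_k . concat(s, a).
critic_weights = [rng.normal(size=state_dim + act_dim) for _ in range(K)]

def q_value(w, state, action):
    """Linear critic: Q(s, a) = w . concat(s, a)."""
    return float(w @ np.concatenate([state, action]))

def mean_td_target(reward, next_state, next_action):
    """TD target using the MEAN over the critic ensemble:
       y = r + gamma * (1/K) * sum_k Q_k(s', a')."""
    q_next = np.mean([q_value(w, next_state, next_action)
                      for w in critic_weights])
    return reward + gamma * q_next

s_next = rng.normal(size=state_dim)
a_next = rng.normal(size=act_dim)
y = mean_td_target(reward=1.0, next_state=s_next, next_action=a_next)

# Sanity check: the mean target lies between the targets that any
# single critic would produce on its own.
singles = [1.0 + gamma * q_value(w, s_next, a_next) for w in critic_weights]
assert min(singles) <= y <= max(singles)
```

In a full actor-critic loop, each critic would regress toward this shared mean target while the actor ascends the (averaged) critic gradient; that loop is omitted here for brevity.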
DOI:10.1016/j.eswa.2025.128139