Mean deep deterministic policy gradient algorithm for pursuit strategies in three-body confrontation
•The ensemble-based algorithm mean deep deterministic policy gradient is proposed.•A Markov decision model for the three-body confrontation problem is established.•An action-transform method is developed for efficient learning.•Some additional learning techniques is equipped with to improve the perf...
Uložené v:
| Vydané v: | Expert systems with applications Ročník 287; s. 128139 |
|---|---|
| Hlavní autori: | , , , , |
| Médium: | Journal Article |
| Jazyk: | English |
| Vydavateľské údaje: |
Elsevier Ltd
25.08.2025
|
| Predmet: | |
| ISSN: | 0957-4174 |
| On-line prístup: | Získať plný text |
| Tagy: |
Pridať tag
Žiadne tagy, Buďte prvý, kto otaguje tento záznam!
|
| Shrnutí: | •The ensemble-based algorithm mean deep deterministic policy gradient is proposed.•A Markov decision model for the three-body confrontation problem is established.•An action-transform method is developed for efficient learning.•Some additional learning techniques is equipped with to improve the performance.•Ablation study and comparison experiments are conducted to prove the performance.
Three-body confrontation is a challenging pursuit-evasion game with significant applications across various fields. Traditional methods based on differential game theory struggle to manage environmental complexity, imperfect information, and long-term decision-making. Leveraging the model-free approach and robust training capabilities of deep reinforcement learning, we propose an ensemble-based actor-critic algorithm named Augmented Mean Deep Deterministic Policy Gradient (AMDPG) to learn pursuit strategies in Three-body confrontation. This method includes an ensemble reinforcement learning architecture and incorporates multiple learning techniques to enhance its performance. Furthermore, we introduce an action-transform method that provides two prior strategies as heuristic guidance to accelerate action space exploration during learning. The proposed algorithm is evaluated in various scenarios, demonstrating superior policy performance and convergence compared to certain state-of-the-art algorithms. The learned strategies succeed in most testing scenarios, achieving higher penetration rates than its competitors. |
|---|---|
| ISSN: | 0957-4174 |
| DOI: | 10.1016/j.eswa.2025.128139 |