Mean deep deterministic policy gradient algorithm for pursuit strategies in three-body confrontation


Bibliographic Details
Published in:Expert systems with applications Vol. 287; p. 128139
Main Authors: Wang, Ziheng, Pu, Xiandong, Li, Yulin, Zhang, Jianlei, Zhang, Chunyan
Format: Journal Article
Language:English
Published: Elsevier Ltd 25.08.2025
ISSN:0957-4174
Description
Summary:
•The ensemble-based algorithm mean deep deterministic policy gradient is proposed.
•A Markov decision model for the three-body confrontation problem is established.
•An action-transform method is developed for efficient learning.
•Additional learning techniques are incorporated to improve performance.
•An ablation study and comparison experiments are conducted to demonstrate the performance.

Three-body confrontation is a challenging pursuit-evasion game with significant applications across various fields. Traditional methods based on differential game theory struggle to manage environmental complexity, imperfect information, and long-term decision-making. Leveraging the model-free approach and robust training capabilities of deep reinforcement learning, we propose an ensemble-based actor-critic algorithm named Augmented Mean Deep Deterministic Policy Gradient (AMDPG) to learn pursuit strategies in three-body confrontation. The method combines an ensemble reinforcement learning architecture with multiple learning techniques that enhance its performance. Furthermore, we introduce an action-transform method that provides two prior strategies as heuristic guidance to accelerate action-space exploration during learning. The proposed algorithm is evaluated in various scenarios, demonstrating superior policy performance and convergence compared with several state-of-the-art algorithms. The learned strategies succeed in most testing scenarios, achieving higher penetration rates than those of their competitors.
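The abstract does not specify how the "mean" in AMDPG is computed; a common way to realize an ensemble-mean critic in DDPG-style learning is to average the bootstrap targets of several critics, which tempers the overestimation bias of a single critic. The following is a minimal, hypothetical sketch of that idea with toy linear critics (the names `q_value`, `mean_target`, and the ensemble size `K` are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy ensemble of K linear critics: Q_k(s, a) = w_k . [s, a].
# (Illustrative only -- the paper's critics are deep networks.)
K, state_dim, action_dim, gamma = 3, 4, 2, 0.99
weights = [rng.normal(size=state_dim + action_dim) for _ in range(K)]

def q_value(w, state, action):
    """Q-estimate of one critic for a state-action pair."""
    return w @ np.concatenate([state, action])

def mean_target(reward, next_state, next_action):
    """Bootstrap target using the ensemble mean of the critics'
    Q-values at the next state-action pair."""
    q_mean = np.mean([q_value(w, next_state, next_action) for w in weights])
    return reward + gamma * q_mean

# Each critic would then regress toward this shared mean target.
s = rng.normal(size=state_dim)
a = rng.normal(size=action_dim)
y = mean_target(reward=1.0, next_state=s, next_action=a)
```

Averaging targets across the ensemble, rather than taking a single critic's estimate, is one standard motivation for ensemble actor-critic methods; whether AMDPG averages targets, Q-values, or gradients is detailed in the full text.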
DOI:10.1016/j.eswa.2025.128139