Interception of a Single Intruding Unmanned Aerial Vehicle by Multiple Missiles Using the Novel EA-MADDPG Training Algorithm

This paper proposes an improved multi-agent deep deterministic policy gradient algorithm called the equal-reward and action-enhanced multi-agent deep deterministic policy gradient (EA-MADDPG) algorithm to solve the guidance problem of multiple missiles cooperating to intercept a single intruding UAV...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Drones (Basel) Jg. 8; H. 10; S. 524
Hauptverfasser: Cai, He, Li, Xingsheng, Zhang, Yibo, Gao, Huanli
Format: Journal Article
Sprache:Englisch
Veröffentlicht: Basel MDPI AG 01.10.2024
Schlagworte:
ISSN:2504-446X, 2504-446X
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:This paper proposes an improved multi-agent deep deterministic policy gradient algorithm called the equal-reward and action-enhanced multi-agent deep deterministic policy gradient (EA-MADDPG) algorithm to solve the guidance problem of multiple missiles cooperating to intercept a single intruding UAV in three-dimensional space. The key innovations of EA-MADDPG include the implementation of the action filter with additional reward functions, optimal replay buffer, and equal reward setting. The additional reward functions and the action filter are set to enhance the exploration performance of the missiles during training. The optimal replay buffer and the equal reward setting are implemented to improve the utilization efficiency of exploration experiences obtained through the action filter. In order to prevent over-learning from certain experiences, a special storage mechanism is established, where experiences obtained through the action filter are stored only in the optimal replay buffer, while normal experiences are stored in both the optimal replay buffer and normal replay buffer. Meanwhile, we gradually reduce the selection probability of the action filter and the sampling ratio of the optimal replay buffer. Finally, comparative experiments show that the algorithm enhances the agents’ exploration capabilities, allowing them to learn policies more quickly and stably, which enables multiple missiles to complete the interception task more rapidly and with a higher success rate.
Bibliographie:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
ISSN:2504-446X
2504-446X
DOI:10.3390/drones8100524