Orthogonal Gradient Multi-Agent Deep Reinforcement Learning-based Algorithm for Spectrum Resource Optimization in UAV Swarm

Bibliographic details
Published in: Physical Communication, vol. 72, p. 102770
Main authors: Lou, Dengke; Wan, Boyu; Zhang, Yu; Du, Yihang; Wan, Fayu; Chen, Yong
Format: Journal Article
Language: English
Published: Elsevier B.V., 1 October 2025
ISSN:1874-4907
Online access: Full text
Description
Abstract: In recent years, Unmanned Aerial Vehicle (UAV) swarm technology has made rapid progress, but communication problems remain a key factor limiting its widespread application. This study proposes an Orthogonal Gradient Multi-Agent Deep Reinforcement Learning-based Algorithm for Spectrum Resource Optimization in UAV Swarm (OG-MADRL) to solve this problem. Specifically, to address the gradient instability that may arise during spectrum resource optimization for UAV swarms, this study introduces an Orthogonal Gradient (OG) system, which combines orthogonal initialization with global gradient clipping. This approach alleviates both exploding and vanishing gradients, thereby enhancing training stability and accelerating convergence. To satisfy dynamic communication demands, a reward mechanism is designed to enable flexible resource allocation to each UAV. To overcome the poor coordination in distributed training frameworks, the proposed algorithm is implemented within a Centralized Training and Distributed Execution (CTDE) framework, integrating global information to enable rapid responses to environmental changes. Simulation results show that, compared to existing algorithms, the OG-MADRL algorithm not only improves cluster throughput and enhances training stability but also exhibits superior adaptability and robustness in interference environments with dynamic communication demands.
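The abstract names the two ingredients of the OG system but gives no implementation details. As a rough illustration only, the following NumPy sketch shows one common way to realize each ingredient: orthogonal weight initialization via a QR decomposition of a Gaussian matrix, and clipping by the global L2 norm taken over all gradient tensors jointly. The function names, the `gain` and `max_norm` parameters, and the small epsilon are assumptions made for this sketch; they are not taken from the paper and may differ from the authors' method.

```python
import numpy as np


def orthogonal_init(rows, cols, gain=1.0, rng=None):
    """Return a (rows, cols) matrix whose rows or columns are orthonormal.

    Hypothetical helper for illustration; not the paper's code.
    """
    rng = np.random.default_rng() if rng is None else rng
    # QR-decompose a tall Gaussian matrix; Q has orthonormal columns.
    a = rng.standard_normal((max(rows, cols), min(rows, cols)))
    q, r = np.linalg.qr(a)
    # Flip column signs using diag(R) so the factorization is unique.
    d = np.sign(np.diag(r))
    d[d == 0] = 1.0
    q = q * d
    # Transpose when the requested matrix is wide rather than tall.
    if rows < cols:
        q = q.T
    return gain * q


def clip_by_global_norm(grads, max_norm):
    """Rescale all gradients jointly so their combined L2 norm <= max_norm.

    Returns the clipped gradients and the norm measured before clipping.
    """
    global_norm = np.sqrt(sum(float(np.sum(g ** 2)) for g in grads))
    # Scale is 1.0 when the norm is already within the threshold.
    scale = min(1.0, max_norm / (global_norm + 1e-12))
    return [g * scale for g in grads], global_norm
```

In a training loop, a sketch like this would call `orthogonal_init` once per layer when the networks are built and `clip_by_global_norm` on the full list of gradients before each optimizer step. Because every gradient is scaled by the same factor, the update direction is preserved while its magnitude is bounded.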
DOI:10.1016/j.phycom.2025.102770