Orthogonal Gradient Multi-Agent Deep Reinforcement Learning-based Algorithm for Spectrum Resource Optimization in UAV Swarm


Bibliographic Details
Published in: Physical Communication Vol. 72; p. 102770
Main Authors: Lou, Dengke, Wan, Boyu, Zhang, Yu, Du, Yihang, Wan, Fayu, Chen, Yong
Format: Journal Article
Language: English
Published: Elsevier B.V. 01.10.2025
ISSN: 1874-4907
Description
Summary: In recent years, Unmanned Aerial Vehicle (UAV) swarm technology has made rapid progress, but communication problems remain a key factor limiting its widespread application. This study proposes an Orthogonal Gradient Multi-Agent Deep Reinforcement Learning-based Algorithm for Spectrum Resource Optimization in UAV Swarm (OG-MADRL) to solve this problem. Specifically, to address the gradient instability that may arise during spectrum resource optimization for a UAV swarm, this study introduces an Orthogonal Gradient system (OG), which combines orthogonal initialization with global gradient clipping. This approach alleviates both gradient explosion and gradient vanishing, thereby enhancing training stability and accelerating convergence. To satisfy dynamic communication demands, a reward mechanism is designed to enable flexible resource allocation to each UAV. To overcome the poor coordination in distributed training frameworks, the proposed algorithm is implemented within a Centralized Training and Distributed Execution (CTDE) framework, integrating global information to enable rapid responses to environmental changes. Simulation results show that, compared to existing algorithms, the OG-MADRL algorithm not only improves cluster throughput and enhances training stability but also exhibits superior adaptability and robustness in interference environments with dynamic communication demands.
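Illustrative note: the summary names two generic techniques, orthogonal initialization and global gradient clipping, without giving implementation details. The following is a minimal sketch of those two techniques in plain Python, not the authors' implementation. The function names `orthogonal_init` and `clip_by_global_norm`, the Gram-Schmidt construction, and the parameters `gain`, `seed`, and `max_norm` are assumptions introduced here for illustration only.

```python
import math
import random


def orthogonal_init(rows, cols, gain=1.0, seed=0):
    """Return a rows x cols weight matrix (list of lists) with orthonormal rows.

    Hypothetical helper: assumes rows <= cols and builds the rows by
    Gram-Schmidt orthogonalization of Gaussian random vectors.
    """
    if rows > cols:
        raise ValueError("this sketch assumes rows <= cols")
    rng = random.Random(seed)
    basis = []
    while len(basis) < rows:
        v = [rng.gauss(0.0, 1.0) for _ in range(cols)]
        # Remove the components along the rows accepted so far.
        for b in basis:
            dot = sum(x * y for x, y in zip(v, b))
            v = [x - dot * y for x, y in zip(v, b)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 1e-8:  # skip a (very unlikely) degenerate draw
            basis.append([x / norm for x in v])
    return [[gain * x for x in row] for row in basis]


def clip_by_global_norm(grads, max_norm):
    """Rescale all gradients jointly so their combined L2 norm is <= max_norm.

    grads: list of flat lists of floats, one list per parameter tensor.
    Returns (clipped_grads, original_global_norm).
    """
    global_norm = math.sqrt(sum(g * g for tensor in grads for g in tensor))
    # A single shared scale keeps the direction of the overall update.
    scale = min(1.0, max_norm / (global_norm + 1e-12))
    clipped = [[g * scale for g in tensor] for tensor in grads]
    return clipped, global_norm


if __name__ == "__main__":
    weights = orthogonal_init(3, 5)
    grads = [[3.0, 0.0], [0.0, 4.0]]  # combined L2 norm is 5.0
    clipped, norm_before = clip_by_global_norm(grads, max_norm=1.0)
    print(norm_before, clipped)
```

In this sketch the clipping uses one scale factor for all parameter tensors, which is what distinguishes global-norm clipping from clipping each tensor separately. Whether OG-MADRL applies it to actor networks, critic networks, or both is not stated in the summary.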
DOI: 10.1016/j.phycom.2025.102770