Energy-Aware MARL for Coordinated Data Collection in Multi-AUV Systems

Saved in:
Bibliographic Details
Title: Energy-Aware MARL for Coordinated Data Collection in Multi-AUV Systems
Authors: Arif Wibisono, Hyoung-Kyu Song, Byung Moo Lee
Source: IEEE Access, Vol. 13, pp. 155835-155854 (2025)
Publisher Information: IEEE, 2025.
Publication Year: 2025
Collection: LCC:Electrical engineering. Electronics. Nuclear engineering
Subjects: Autonomous underwater vehicle (AUV), multi-agent reinforcement learning (MARL), energy-efficient navigation, buffer overflow, flight eXceedance (FX), multi-agent deep deterministic policy gradient (MADDPG), Electrical engineering. Electronics. Nuclear engineering, TK1-9971
Description: As the demand for adaptive and autonomous smart ocean systems continues to grow, multi-agent control strategies based on reinforcement learning for Autonomous Underwater Vehicles (AUVs) play a vital role in supporting data collection in challenging deep-sea environments. Unlike previous surveys, this paper presents a comprehensive review of Multi-Agent Reinforcement Learning (MARL) approaches with a specific emphasis on energy efficiency and inter-AUV coordination. We examine various MARL algorithms and their applications in real-world scenarios such as buffer overflow prevention, avoidance of Flight eXceedance (FX) violations, and adaptive path planning. As a supporting illustration, we include a case study based on the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm to demonstrate how coordinated policies can be formed under energy-constrained and partially observable conditions. Additional experiments using MAPPO and MODDPG show that MODDPG excels in energy efficiency with low overflow, while MAPPO yields moderate rewards but lacks training stability. These results provide a conceptual foundation for validating energy-efficient reward strategies based on decentralized coordination. Scope note: the included case study is a limited validation that complements the survey, not a new algorithmic contribution; the experimental findings should therefore be read as supporting the narrative rather than as a claim to state-of-the-art performance. We also acknowledge that the simulation is limited to an idealized 2D grid environment and does not fully capture real ocean dynamics. We therefore plan to extend it to a 3D particle-based or Computational Fluid Dynamics (CFD) framework, along with integration of historical ocean environmental data from sources such as NOAA, JAMSTEC, and Copernicus Marine. Major challenges such as non-stationarity in agent interactions, the limitations of acoustic communication, and the simulation-to-reality gap are also discussed. Future research directions include Meta-Reinforcement Learning (Meta-RL), adaptive role assignment based on energy utility, large-scale decentralized MARL architectures, and training on realistic ocean scenarios. This review and the supporting experiments are intended to serve as a strategic foundation for the development of efficient, robust, and scalable multi-agent AUV systems for future marine missions.
Publication Type: article
File Description: electronic resource
Language: English
ISSN: 2169-3536
Relation: https://ieeexplore.ieee.org/document/11151265/; https://doaj.org/toc/2169-3536
DOI: 10.1109/ACCESS.2025.3606016
Access URL: https://doaj.org/article/1424f19da0a44f72bdd01f91070293e9
Document Code: edsdoj.1424f19da0a44f72bdd01f91070293e9
Database: Directory of Open Access Journals
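
Illustrative note: as a hedged sketch of the kind of energy-efficient reward shaping the abstract describes (data-collection gain traded off against energy use, with penalties for buffer overflow and FX violations), the minimal Python snippet below shows one plausible formulation. It is not the authors' implementation: the AUVState fields, the energy_aware_reward function, and all weights and thresholds are hypothetical placeholders chosen for illustration only.

    # Minimal sketch (not the paper's code) of an energy-aware per-agent
    # reward for a multi-AUV data-collection step. All names, weights, and
    # thresholds are illustrative assumptions, not values from the article.
    from dataclasses import dataclass

    @dataclass
    class AUVState:
        battery: float      # remaining energy, normalized to 0..1
        buffer_fill: float  # data-buffer occupancy, normalized to 0..1
        depth: float        # current depth in metres

    def energy_aware_reward(
        state: AUVState,
        data_collected: float,     # data gathered this step (normalized)
        energy_spent: float,       # energy consumed this step (normalized)
        max_depth: float = 100.0,  # assumed FX operating-envelope limit
        w_data: float = 1.0,
        w_energy: float = 0.5,
        w_overflow: float = 2.0,
        w_fx: float = 2.0,
    ) -> float:
        """Reward data throughput, charge for energy use, and penalize
        buffer overflow and FX (operating-envelope) violations."""
        reward = w_data * data_collected - w_energy * energy_spent
        if state.buffer_fill >= 1.0:  # buffer overflow: collected data is lost
            reward -= w_overflow
        if state.depth > max_depth:   # FX violation: outside the safe envelope
            reward -= w_fx
        return reward

    # Example: a vehicle that gathers data cheaply while staying in-envelope
    s = AUVState(battery=0.8, buffer_fill=0.4, depth=60.0)
    print(energy_aware_reward(s, data_collected=0.3, energy_spent=0.1))  # prints 0.25

Under a centralized-training, decentralized-execution scheme such as MADDPG, each agent would receive such a shaped reward locally while critics condition on joint observations during training; the weights here simply encode the abstract's trade-off between throughput, energy, and safety constraints.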