On the Convergence of a Reinforcement Learning Process to a Generalized Energy-Optimal Guidance Policy for Unmanned Underwater Vehicles

We demonstrate that an energy-minimizing guid-ance system for unmanned underwater vehicles-trained by deep reinforcement learning (RL) on ocean current profiles exhibiting time-stationary random variation in direction and magnitude as a function of depth-executes a distance-conditional explore-explo...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	Oceans (New York. Online) S. 1 - 10
Hauptverfasser:	Greeley, Brian, Brandman, Jeremy, Book, Jeffrey, Barron, Charlie, Landry, Blake, Olson, Colin
Format:	Tagungsbericht
Sprache:	Englisch
Veröffentlicht:	IEEE 23.09.2024
Schlagworte:	Dead reckoning Deep reinforcement learning guidance algorithms Meters neural networks Oceans proximal policy optimization reinforcement learning Sea measurements Time measurement Training Trajectory Uncertainty unmanned underwater vehicles Upper bound
ISSN:	2996-1882
Online-Zugang:	Volltext
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	We demonstrate that an energy-minimizing guid-ance system for unmanned underwater vehicles-trained by deep reinforcement learning (RL) on ocean current profiles exhibiting time-stationary random variation in direction and magnitude as a function of depth-executes a distance-conditional explore-exploit policy when tasked with transiting an unknown current field. In particular, a trained RL agent will perform an ex-ploratory dive when beginning its transit in order to determine the depth-dependent variation of the current field's magnitude and direction. Following this dive, the agent returns to the depth where the current is most favorable and exploits those currents for the remainder of its transit. But the maximum depth to which the agent is willing to explore is a function of the total distance between the vehicle and its goal. This learned strategy reflects the tradeoff between the vehicle's energy expenditure during its descent and the potential, but uncertain, accrual of a long-term energetic gain due to the presence of a favorable ocean current at depth. We present computational results and supportive analysis quantifying the maximum depth to which the agent is willing to explore as a function of the distance remaining in its transit and its uncertainty regarding the likelihood of finding more favorable currents.
ISSN:	2996-1882
DOI:	10.1109/OCEANS55160.2024.10754109