On the Convergence of a Reinforcement Learning Process to a Generalized Energy-Optimal Guidance Policy for Unmanned Underwater Vehicles

Detailed bibliography
Published in:	Oceans (New York. Online), pp. 1-10
Main authors:	Greeley, Brian, Brandman, Jeremy, Book, Jeffrey, Barron, Charlie, Landry, Blake, Olson, Colin
Format:	Conference paper
Language:	English
Published:	IEEE, 23 September 2024
Subjects:	Dead reckoning; Deep reinforcement learning; guidance algorithms; Meters; neural networks; Oceans; proximal policy optimization; reinforcement learning; Sea measurements; Time measurement; Training; Trajectory; Uncertainty; unmanned underwater vehicles; Upper bound
ISSN:	2996-1882

Description
Abstract:	We demonstrate that an energy-minimizing guidance system for unmanned underwater vehicles, trained by deep reinforcement learning (RL) on ocean current profiles exhibiting time-stationary random variation in direction and magnitude as a function of depth, executes a distance-conditional explore-exploit policy when tasked with transiting an unknown current field. In particular, a trained RL agent performs an exploratory dive at the start of its transit to determine how the current field's magnitude and direction vary with depth. Following this dive, the agent returns to the depth where the current is most favorable and exploits those currents for the remainder of its transit. However, the maximum depth to which the agent is willing to explore depends on the total distance between the vehicle and its goal. This learned strategy reflects the tradeoff between the energy the vehicle expends during its descent and the potential, but uncertain, long-term energetic gain from a favorable ocean current at depth. We present computational results and supporting analysis quantifying the maximum depth to which the agent is willing to explore as a function of the distance remaining in its transit and its uncertainty regarding the likelihood of finding more favorable currents.
DOI:	10.1109/OCEANS55160.2024.10754109
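The distance-dependent tradeoff described in the abstract can be illustrated with a deliberately simple expected-value model. This is a hypothetical sketch, not the authors' trained RL agent or their analysis: it assumes the water column is split into depth bins of fixed height, that each bin independently holds a favorable current with a fixed probability (`p_favorable`, standing in for the agent's uncertainty), that a favorable current saves a fixed amount of energy per metre of remaining transit (`gain_per_m`), and that vertical travel costs a fixed energy per metre (`cost_per_vertical_m`), with the dive charged conservatively as a full descent-and-ascent round trip. The dive depth is chosen once up front rather than adaptively. All names and parameter values are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyOcean:
    """Hypothetical parameters for a toy explore-exploit model (not from the paper)."""
    p_favorable: float = 0.05         # chance that any one depth bin holds a favorable current
    bin_height_m: float = 10.0        # vertical size of one depth bin, in metres
    gain_per_m: float = 0.2           # energy saved per metre of transit in a favorable current
    cost_per_vertical_m: float = 1.0  # energy spent per metre of vertical travel


def expected_net_gain(bins: int, distance_m: float, ocean: ToyOcean) -> float:
    """Expected energy saved minus energy spent when diving through `bins` depth bins."""
    # Probability that at least one explored bin holds a favorable current.
    p_found = 1.0 - (1.0 - ocean.p_favorable) ** bins
    # Expected saving from exploiting that current over the remaining transit.
    expected_saving = p_found * ocean.gain_per_m * distance_m
    # Dive cost, charged conservatively as a full descent-and-ascent round trip.
    dive_cost = 2.0 * bins * ocean.bin_height_m * ocean.cost_per_vertical_m
    return expected_saving - dive_cost


def best_dive_depth(distance_m: float, ocean: ToyOcean, max_bins: int = 500) -> float:
    """Dive depth (metres) that maximizes expected net gain, found by exhaustive search."""
    best_bins = max(
        range(max_bins + 1),
        key=lambda n: expected_net_gain(n, distance_m, ocean),
    )
    return best_bins * ocean.bin_height_m


if __name__ == "__main__":
    ocean = ToyOcean()
    for distance in (1_000, 10_000, 100_000):
        print(f"{distance} m to goal -> dive to {best_dive_depth(distance, ocean)} m")
    # 1000 m to goal -> dive to 0.0 m
    # 10000 m to goal -> dive to 320.0 m
    # 100000 m to goal -> dive to 770.0 m
```

Under these assumptions, exploring one more bin pays off only while its expected marginal saving, `p_favorable * (1 - p_favorable)**n * gain_per_m * distance_m`, exceeds the fixed per-bin dive cost. The preferred exploration depth therefore grows with the remaining distance (roughly logarithmically in this toy model), and short transits do not justify any dive at all.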