On the Convergence of a Reinforcement Learning Process to a Generalized Energy-Optimal Guidance Policy for Unmanned Underwater Vehicles


Bibliographic details
Published in: Oceans (New York. Online), pp. 1-10
Main authors: Greeley, Brian; Brandman, Jeremy; Book, Jeffrey; Barron, Charlie; Landry, Blake; Olson, Colin
Format: Conference paper
Language: English
Published: IEEE, 23 September 2024
ISSN:2996-1882
Description
Summary: We demonstrate that an energy-minimizing guidance system for unmanned underwater vehicles, trained by deep reinforcement learning (RL) on ocean current profiles exhibiting time-stationary random variation in direction and magnitude as a function of depth, executes a distance-conditional explore-exploit policy when tasked with transiting an unknown current field. In particular, a trained RL agent will perform an exploratory dive when beginning its transit in order to determine the depth-dependent variation of the current field's magnitude and direction. Following this dive, the agent returns to the depth where the current is most favorable and exploits those currents for the remainder of its transit. However, the maximum depth to which the agent is willing to explore is a function of the total distance between the vehicle and its goal. This learned strategy reflects the tradeoff between the vehicle's energy expenditure during its descent and the potential, but uncertain, accrual of a long-term energetic gain due to the presence of a favorable ocean current at depth. We present computational results and supportive analysis quantifying the maximum depth to which the agent is willing to explore as a function of the distance remaining in its transit and its uncertainty regarding the likelihood of finding more favorable currents.
DOI:10.1109/OCEANS55160.2024.10754109