On the Convergence of a Reinforcement Learning Process to a Generalized Energy-Optimal Guidance Policy for Unmanned Underwater Vehicles
| Published in: | Oceans (New York. Online), pp. 1-10 |
|---|---|
| Main authors: | , , , , , |
| Format: | Conference paper |
| Language: | English |
| Published: | IEEE, 23 September 2024 |
| ISSN: | 2996-1882 |
| Online access: | Full text |
| Abstract: | We demonstrate that an energy-minimizing guidance system for unmanned underwater vehicles, trained by deep reinforcement learning (RL) on ocean current profiles exhibiting time-stationary random variation in direction and magnitude as a function of depth, executes a distance-conditional explore-exploit policy when tasked with transiting an unknown current field. In particular, a trained RL agent will perform an exploratory dive when beginning its transit in order to determine the depth-dependent variation of the current field's magnitude and direction. Following this dive, the agent returns to the depth where the current is most favorable and exploits those currents for the remainder of its transit. However, the maximum depth to which the agent is willing to explore is a function of the total distance between the vehicle and its goal. This learned strategy reflects the tradeoff between the vehicle's energy expenditure during its descent and the potential, but uncertain, accrual of a long-term energetic gain due to the presence of a favorable ocean current at depth. We present computational results and supportive analysis quantifying the maximum depth to which the agent is willing to explore as a function of the distance remaining in its transit and its uncertainty regarding the likelihood of finding more favorable currents. |
|---|---|
| DOI: | 10.1109/OCEANS55160.2024.10754109 |