On the Convergence of a Reinforcement Learning Process to a Generalized Energy-Optimal Guidance Policy for Unmanned Underwater Vehicles

Published in: Oceans (New York. Online), pp. 1-10
Main Authors: Greeley, Brian, Brandman, Jeremy, Book, Jeffrey, Barron, Charlie, Landry, Blake, Olson, Colin
Format: Conference Paper
Language: English
Published: IEEE, 23.09.2024
ISSN: 2996-1882
Online Access: Get full text
Abstract We demonstrate that an energy-minimizing guidance system for unmanned underwater vehicles, trained by deep reinforcement learning (RL) on ocean current profiles exhibiting time-stationary random variation in direction and magnitude as a function of depth, executes a distance-conditional explore-exploit policy when tasked with transiting an unknown current field. In particular, a trained RL agent will perform an exploratory dive when beginning its transit in order to determine the depth-dependent variation of the current field's magnitude and direction. Following this dive, the agent returns to the depth where the current is most favorable and exploits those currents for the remainder of its transit. But the maximum depth to which the agent is willing to explore is a function of the total distance between the vehicle and its goal. This learned strategy reflects the tradeoff between the vehicle's energy expenditure during its descent and the potential, but uncertain, accrual of a long-term energetic gain due to the presence of a favorable ocean current at depth. We present computational results and supportive analysis quantifying the maximum depth to which the agent is willing to explore as a function of the distance remaining in its transit and its uncertainty regarding the likelihood of finding more favorable currents.
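The distance-conditional explore-exploit tradeoff summarized in the abstract can be illustrated with a simple expected-value calculation. The Python sketch below is hypothetical and not taken from the paper: it assumes a fixed energy cost per meter of descent and a fixed per-meter energy saving if a favorable current is found, then solves for the deepest dive whose expected payoff is non-negative. All function names and constants are illustrative.

# Hypothetical sketch (not the authors' implementation): a back-of-the-
# envelope model of the explore-exploit tradeoff described in the abstract.
# Assumptions: descending costs a fixed amount of energy per meter of
# depth, and a favorable current, if found (probability p_favorable),
# saves a fixed amount of energy per meter of remaining transit.

def max_exploration_depth(remaining_distance_m: float,
                          p_favorable: float,
                          gain_j_per_m: float,
                          descent_cost_j_per_m: float) -> float:
    """Deepest exploratory dive with non-negative expected energy payoff.

    Exploring to depth d costs roughly descent_cost_j_per_m * d. The
    expected benefit of the dive is p_favorable * gain_j_per_m *
    remaining_distance_m. Setting benefit >= cost and solving for d
    gives an upper bound on the depth worth exploring.
    """
    expected_benefit = p_favorable * gain_j_per_m * remaining_distance_m
    return expected_benefit / descent_cost_j_per_m

# Longer transits justify deeper exploratory dives, consistent with the
# distance-conditional behavior the paper reports.
for dist_m in (1_000.0, 10_000.0, 100_000.0):
    depth_m = max_exploration_depth(dist_m, p_favorable=0.5,
                                    gain_j_per_m=0.2,
                                    descent_cost_j_per_m=50.0)
    print(f"{dist_m:>9,.0f} m transit -> dive at most {depth_m:,.0f} m")

In the paper itself this tradeoff is not computed in closed form; it is learned implicitly by a deep RL agent (proximal policy optimization, per the subject terms below) trained on randomized current profiles.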
Author Book, Jeffrey
Brandman, Jeremy
Greeley, Brian
Barron, Charlie
Olson, Colin
Landry, Blake
Author_xml – sequence: 1
  givenname: Brian
  surname: Greeley
  fullname: Greeley, Brian
  email: bgreeley@dcscorp.com
  organization: DCS Corporation, Alexandria, VA, 22310
– sequence: 2
  givenname: Jeremy
  surname: Brandman
  fullname: Brandman, Jeremy
  email: jeremy.s.brandman.civ@us.navy.mil
  organization: U.S. Naval Research Laboratory, Washington, D.C., 20375
– sequence: 3
  givenname: Jeffrey
  surname: Book
  fullname: Book, Jeffrey
  email: jeffrey.w.book.civ@us.navy.mil
  organization: U.S. Naval Research Laboratory, Stennis Space Center, MS, 39529
– sequence: 4
  givenname: Charlie
  surname: Barron
  fullname: Barron, Charlie
  email: charlie.n.barron.civ@us.navy.mil
  organization: U.S. Naval Research Laboratory, Stennis Space Center, MS, 39529
– sequence: 5
  givenname: Blake
  surname: Landry
  fullname: Landry, Blake
  email: blake.j.landry4.civ@us.navy.mil
  organization: U.S. Naval Research Laboratory, Stennis Space Center, MS, 39529
– sequence: 6
  givenname: Colin
  surname: Olson
  fullname: Olson, Colin
  email: colin.c.olson.civ@us.navy.mil
  organization: U.S. Naval Research Laboratory, Washington, D.C., 20375
ContentType Conference Proceeding
DBID 6IE
6IH
CBEJK
RIE
RIO
DOI 10.1109/OCEANS55160.2024.10754109
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan (POP) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP) 1998-present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Engineering
EISBN 9798331540081
EISSN 2996-1882
EndPage 10
ExternalDocumentID 10754109
Genre orig-research
GroupedDBID 6IE
6IF
6IH
6IK
6IM
AAJGR
ALMA_UNASSIGNED_HOLDINGS
CBEJK
IPLJI
RIE
RIO
IEDL.DBID RIE
IngestDate Wed Aug 27 03:03:51 EDT 2025
IsPeerReviewed true
IsScholarly true
Language English
LinkModel DirectLink
PageCount 10
ParticipantIDs ieee_primary_10754109
PublicationCentury 2000
PublicationDate 2024-Sept.-23
PublicationDateYYYYMMDD 2024-09-23
PublicationDate_xml – month: 09
  year: 2024
  text: 2024-Sept.-23
  day: 23
PublicationDecade 2020
PublicationTitle Oceans (New York. Online)
PublicationTitleAbbrev OCEANS
PublicationYear 2024
Publisher IEEE
Publisher_xml – name: IEEE
SSID ssj0003177464
SourceID ieee
SourceType Publisher
StartPage 1
SubjectTerms Dead reckoning
Deep reinforcement learning
guidance algorithms
Meters
neural networks
Oceans
proximal policy optimization
reinforcement learning
Sea measurements
Time measurement
Training
Trajectory
Uncertainty
unmanned underwater vehicles
Upper bound
Title On the Convergence of a Reinforcement Learning Process to a Generalized Energy-Optimal Guidance Policy for Unmanned Underwater Vehicles
URI https://ieeexplore.ieee.org/document/10754109
hasFullText 1
inHoldings 1