On the Convergence of a Reinforcement Learning Process to a Generalized Energy-Optimal Guidance Policy for Unmanned Underwater Vehicles
We demonstrate that an energy-minimizing guidance system for unmanned underwater vehicles, trained by deep reinforcement learning (RL) on ocean current profiles exhibiting time-stationary random variation in direction and magnitude as a function of depth, executes a distance-conditional explore-explo...
| Published in: | Oceans (New York. Online), pp. 1-10 |
|---|---|
| Main authors: | Greeley, Brian; Brandman, Jeremy; Book, Jeffrey; Barron, Charlie; Landry, Blake; Olson, Colin |
| Format: | Conference paper |
| Language: | English |
| Published: | IEEE, 23 September 2024 |
| ISSN: | 2996-1882 |
| Abstract | We demonstrate that an energy-minimizing guidance system for unmanned underwater vehicles, trained by deep reinforcement learning (RL) on ocean current profiles exhibiting time-stationary random variation in direction and magnitude as a function of depth, executes a distance-conditional explore-exploit policy when tasked with transiting an unknown current field. In particular, a trained RL agent performs an exploratory dive at the start of its transit to determine how the current field's magnitude and direction vary with depth. Following this dive, the agent returns to the depth where the current is most favorable and exploits that current for the remainder of its transit. However, the maximum depth to which the agent is willing to explore is a function of the total distance between the vehicle and its goal. This learned strategy reflects the tradeoff between the vehicle's energy expenditure during its descent and the potential, but uncertain, accrual of a long-term energetic gain due to the presence of a favorable ocean current at depth. We present computational results and supporting analysis quantifying the maximum depth to which the agent is willing to explore as a function of the distance remaining in its transit and its uncertainty regarding the likelihood of finding more favorable currents. |
|---|---|
| Author | Book, Jeffrey; Brandman, Jeremy; Greeley, Brian; Barron, Charlie; Olson, Colin; Landry, Blake |
| Author_xml | 1. Greeley, Brian (bgreeley@dcscorp.com), DCS Corporation, Alexandria, VA 22310; 2. Brandman, Jeremy (jeremy.s.brandman.civ@us.navy.mil), U.S. Naval Research Laboratory, Washington, D.C. 20375; 3. Book, Jeffrey (jeffrey.w.book.civ@us.navy.mil), U.S. Naval Research Laboratory, Stennis Space Center, MS 39529; 4. Barron, Charlie (charlie.n.barron.civ@us.navy.mil), U.S. Naval Research Laboratory, Stennis Space Center, MS 39529; 5. Landry, Blake (blake.j.landry4.civ@us.navy.mil), U.S. Naval Research Laboratory, Stennis Space Center, MS 39529; 6. Olson, Colin (colin.c.olson.civ@us.navy.mil), U.S. Naval Research Laboratory, Washington, D.C. 20375 |
| ContentType | Conference Proceeding |
| DBID | 6IE 6IH CBEJK RIE RIO |
| DOI | 10.1109/OCEANS55160.2024.10754109 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Proceedings Order Plan (POP) 1998-present by volume; IEEE Xplore All Conference Proceedings; IEEE Electronic Library (IEL); IEEE Proceedings Order Plans (POP) 1998-present |
| DatabaseTitleList | |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Engineering |
| EISBN | 9798331540081 |
| EISSN | 2996-1882 |
| EndPage | 10 |
| ExternalDocumentID | 10754109 |
| Genre | orig-research |
| GroupedDBID | 6IE 6IF 6IH 6IK 6IM AAJGR ALMA_UNASSIGNED_HOLDINGS CBEJK IPLJI RIE RIO |
| ID | FETCH-LOGICAL-i91t-9c1eecaf839e4163d2607da860e41312bdb8ecf4c7f0d783d6efe3e7dda0eb0b3 |
| IEDL.DBID | RIE |
| IngestDate | Wed Aug 27 03:03:51 EDT 2025 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Language | English |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-i91t-9c1eecaf839e4163d2607da860e41312bdb8ecf4c7f0d783d6efe3e7dda0eb0b3 |
| PageCount | 10 |
| ParticipantIDs | ieee_primary_10754109 |
| PublicationCentury | 2000 |
| PublicationDate | 2024-Sept.-23 |
| PublicationDateYYYYMMDD | 2024-09-23 |
| PublicationDate_xml | – month: 09 year: 2024 text: 2024-Sept.-23 day: 23 |
| PublicationDecade | 2020 |
| PublicationTitle | Oceans (New York. Online) |
| PublicationTitleAbbrev | OCEANS |
| PublicationYear | 2024 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| SSID | ssj0003177464 |
| Score | 2.26881 |
| Snippet | We demonstrate that an energy-minimizing guidance system for unmanned underwater vehicles, trained by deep reinforcement learning (RL) on ocean current... |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 1 |
| SubjectTerms | Dead reckoning; Deep reinforcement learning; guidance algorithms; Meters; neural networks; Oceans; proximal policy optimization; reinforcement learning; Sea measurements; Time measurement; Training; Trajectory; Uncertainty; unmanned underwater vehicles; Upper bound |
| Title | On the Convergence of a Reinforcement Learning Process to a Generalized Energy-Optimal Guidance Policy for Unmanned Underwater Vehicles |
| URI | https://ieeexplore.ieee.org/document/10754109 |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| linkProvider | IEEE |
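
The tradeoff described in the abstract (a certain energy cost for descending versus an uncertain energy saving over the rest of the transit) can be illustrated with a toy calculation. The sketch below is not the authors' method or their RL agent. It assumes a hypothetical model in which the expected per-meter saving from the best current found down to depth `d` is `max_saving * (1 - exp(-d / depth_scale))` and the dive costs a fixed amount of energy per meter of depth. All function names, parameters, and default values are illustrative assumptions.

```python
import math


def expected_net_gain(distance_m, depth_m, dive_cost_j_per_m=50.0,
                      max_saving_j_per_m=2.0, depth_scale_m=100.0):
    """Expected energy saved over the transit minus the energy spent diving.

    Hypothetical model: exploring deeper raises the expected per-meter
    saving with diminishing returns, while dive cost grows linearly.
    """
    saving = max_saving_j_per_m * (1.0 - math.exp(-depth_m / depth_scale_m))
    return distance_m * saving - dive_cost_j_per_m * depth_m


def best_exploration_depth(distance_m, dive_cost_j_per_m=50.0,
                           max_saving_j_per_m=2.0, depth_scale_m=100.0):
    """Depth that maximizes expected_net_gain under the toy model.

    Setting the derivative with respect to depth to zero gives
    depth = depth_scale * ln(distance * max_saving / (depth_scale * dive_cost)).
    If the logarithm's argument is at most 1, diving never pays off.
    """
    ratio = (distance_m * max_saving_j_per_m) / (depth_scale_m * dive_cost_j_per_m)
    if ratio <= 1.0:
        return 0.0
    return depth_scale_m * math.log(ratio)


if __name__ == "__main__":
    for distance in (1_000, 10_000, 50_000):
        depth = best_exploration_depth(distance)
        print(f"transit {distance:>6} m -> explore to about {depth:6.1f} m")
```

Under these assumed numbers a short transit does not justify any dive, while longer transits justify progressively deeper exploration, which matches the qualitative distance-conditional behavior the abstract reports.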