Improving the Performance of Batch-Constrained Reinforcement Learning in Continuous Action Domains via Generative Adversarial Networks
| Published in: | 2022 30th Signal Processing and Communications Applications Conference (SIU), pp. 1-4 |
|---|---|
| Main authors: | Baturay Saglam, Onat Dalmaz, Kaan Gonc, Suleyman S. Kozat (Bilkent University, Ankara, Turkey) |
| Format: | Conference paper |
| Language: | English, Turkish |
| Publisher: | IEEE |
| Published: | 15 May 2022 |
| DOI: | 10.1109/SIU55565.2022.9864786 |
| EISBN: | 9781665450928 |
| Subjects: | Batch-constrained reinforcement learning; deep reinforcement learning; offline reinforcement learning; Q-learning; deep learning; extrapolation; aerospace electronics; employment; signal processing; signal processing algorithms |
| Online access: | https://ieeexplore.ieee.org/document/9864786 |
| Abstract: | The Batch-Constrained Q-learning (BCQ) algorithm has been shown to overcome the extrapolation error and enable deep reinforcement learning agents to learn from a previously collected fixed batch of transitions. However, because of the conditional Variational Autoencoder (VAE) used in its data generation module, BCQ optimizes a variational lower bound and hence does not generalize well to environments with large state and action spaces. In this paper, we show that the performance of BCQ can be further improved by employing one of the recent advances in deep learning, Generative Adversarial Networks (GANs). Our extensive set of experiments shows that the introduced approach significantly improves BCQ in all of the control tasks tested. Moreover, the introduced approach demonstrates robust generalizability to environments with large state and action spaces in the OpenAI Gym control suite. |
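The change the abstract describes is a substitution inside BCQ's generative model: the conditional VAE that proposes batch-like actions for a given state is replaced with a conditional GAN. Below is a minimal sketch of that idea in PyTorch; the network sizes, optimizer settings, and names (`Generator`, `Discriminator`, `gan_update`) are illustrative assumptions, not the architecture from the paper itself.

```python
import torch
import torch.nn as nn

# Illustrative dimensions; the paper does not fix these here.
STATE_DIM, ACTION_DIM, NOISE_DIM = 17, 6, 32

class Generator(nn.Module):
    """Maps (state, noise) -> action, playing the role of BCQ's VAE decoder."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + NOISE_DIM, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, ACTION_DIM), nn.Tanh(),  # actions scaled to [-1, 1]
        )

    def forward(self, state, noise):
        return self.net(torch.cat([state, noise], dim=1))

class Discriminator(nn.Module):
    """Scores (state, action) pairs: from the fixed batch vs. generated."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACTION_DIM, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 1),  # raw logit; the loss applies the sigmoid
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=1))

G, D = Generator(), Discriminator()
g_opt = torch.optim.Adam(G.parameters(), lr=3e-4)
d_opt = torch.optim.Adam(D.parameters(), lr=3e-4)
bce = nn.BCEWithLogitsLoss()

def gan_update(state, action):
    """One adversarial step on a (state, action) minibatch from the fixed batch."""
    n = state.shape[0]
    noise = torch.randn(n, NOISE_DIM)
    fake = G(state, noise)
    # Discriminator: push batch pairs toward 1, generated pairs toward 0.
    d_loss = (bce(D(state, action), torch.ones(n, 1))
              + bce(D(state, fake.detach()), torch.zeros(n, 1)))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()
    # Generator: produce actions the discriminator accepts as batch-like.
    g_loss = bce(D(state, G(state, noise)), torch.ones(n, 1))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
```

The rest of BCQ would be unchanged under this reading: at action-selection time the agent samples several candidate actions from `G` for the current state, optionally perturbs them, and executes the one with the highest Q-value, so the GAN only takes over the VAE's job of keeping candidates close to the batch's action distribution.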