Improving the Performance of Batch-Constrained Reinforcement Learning in Continuous Action Domains via Generative Adversarial Networks

Published in: 2022 30th Signal Processing and Communications Applications Conference (SIU), pp. 1–4
Main authors: Saglam, Baturay; Dalmaz, Onat; Gonc, Kaan; Kozat, Suleyman S.
Medium: Conference paper
Language: English, Turkish
Published: IEEE, 15 May 2022
Abstract The Batch-Constrained Q-learning (BCQ) algorithm is shown to overcome extrapolation error and to enable deep reinforcement learning agents to learn from a previously collected, fixed batch of transitions. However, because of the conditional Variational Autoencoder (VAE) used in its data generation module, the BCQ algorithm optimizes a variational lower bound and hence does not generalize to environments with large state and action spaces. In this paper, we show that the performance of the BCQ algorithm can be further improved by employing one of the recent advances in deep learning, Generative Adversarial Networks (GANs). Our extensive set of experiments shows that the introduced approach significantly improves BCQ in all of the control tasks tested. Moreover, the introduced approach demonstrates robust generalizability to environments with large state and action spaces in the OpenAI Gym control suite.
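The abstract describes replacing the conditional VAE in BCQ's action-generation module with a conditional GAN: a generator proposes actions similar to those in the fixed batch for a given state, and a discriminator judges whether a state–action pair looks like it came from the batch. Below is a minimal NumPy sketch of that idea, showing only the forward passes and the standard GAN losses (no parameter updates or perturbation network). All network sizes, names, and the toy data are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_init(sizes):
    # He-style initialization for a small fully connected network
    return [(rng.standard_normal((m, n)) * np.sqrt(2.0 / m), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp_forward(params, x, out_act=np.tanh):
    # ReLU hidden layers, configurable output activation
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        x = np.maximum(x, 0.0) if i < len(params) - 1 else out_act(x)
    return x

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

STATE_DIM, ACTION_DIM, NOISE_DIM = 3, 2, 4  # illustrative sizes

# Conditional generator G(s, z) -> action in (-1, 1), replacing BCQ's VAE decoder
G = mlp_init([STATE_DIM + NOISE_DIM, 32, ACTION_DIM])
# Conditional discriminator D(s, a) -> probability the pair came from the batch
D = mlp_init([STATE_DIM + ACTION_DIM, 32, 1])

def sample_actions(states):
    # Draw noise and generate batch-like candidate actions for each state
    z = rng.standard_normal((states.shape[0], NOISE_DIM))
    return mlp_forward(G, np.concatenate([states, z], axis=1))

# One mini-batch from the (toy) fixed dataset of transitions
states = rng.standard_normal((8, STATE_DIM))
batch_actions = np.clip(rng.standard_normal((8, ACTION_DIM)), -1.0, 1.0)
fake_actions = sample_actions(states)

# Standard non-saturating GAN losses on state-conditioned pairs
d_real = mlp_forward(D, np.concatenate([states, batch_actions], axis=1), sigmoid)
d_fake = mlp_forward(D, np.concatenate([states, fake_actions], axis=1), sigmoid)
d_loss = -np.mean(np.log(d_real + 1e-8) + np.log(1.0 - d_fake + 1e-8))
g_loss = -np.mean(np.log(d_fake + 1e-8))
```

Unlike the VAE, this objective involves no variational lower bound: the generator is trained directly against the discriminator's judgment, which is the property the paper credits for better scaling to large state and action spaces.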
Author Saglam, Baturay
Dalmaz, Onat
Gonc, Kaan
Kozat, Suleyman S.
Author_xml – sequence: 1
  fullname: Saglam, Baturay
  email: baturay@ee.bilkent.edu.tr
  organization: Bilkent University, Department of Electrical and Electronics Engineering, Ankara, Turkey
– sequence: 2
  fullname: Dalmaz, Onat
  email: onat@ee.bilkent.edu.tr
  organization: Bilkent University, Department of Electrical and Electronics Engineering, Ankara, Turkey
– sequence: 3
  fullname: Gonc, Kaan
  email: kaan.gonc@bilkent.edu.tr
  organization: Bilkent University, Department of Computer Engineering, Ankara, Turkey
– sequence: 4
  fullname: Kozat, Suleyman S.
  email: kozat@ee.bilkent.edu.tr
  organization: Bilkent University, Department of Electrical and Electronics Engineering, Ankara, Turkey
ContentType Conference Proceeding
DBID 6IE
6IL
CBEJK
RIE
RIL
DOI 10.1109/SIU55565.2022.9864786
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Xplore POP ALL
IEEE Xplore All Conference Proceedings
IEEE Xplore (IEEE/IET Electronic Library - IEL)
IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Xplore (IEEE/IET Electronic Library - IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
EISBN 1665450924
9781665450928
EndPage 4
ExternalDocumentID 9864786
Genre orig-research
IEDL.DBID RIE
ISICitedReferencesCount 0
IngestDate Thu Jun 29 18:38:00 EDT 2023
IsPeerReviewed false
IsScholarly false
Language English
Turkish
LinkModel DirectLink
PageCount 4
ParticipantIDs ieee_primary_9864786
PublicationCentury 2000
PublicationDate 2022-May-15
PublicationDateYYYYMMDD 2022-05-15
PublicationDate_xml – month: 05
  year: 2022
  text: 2022-May-15
  day: 15
PublicationDecade 2020
PublicationTitle 2022 30th Signal Processing and Communications Applications Conference (SIU)
PublicationTitleAbbrev SIU
PublicationYear 2022
Publisher IEEE
Publisher_xml – name: IEEE
SourceID ieee
SourceType Publisher
StartPage 1
SubjectTerms Aerospace electronics
batch-constrained reinforcement learning
Deep learning
deep reinforcement learning
Employment
Extrapolation
offline reinforcement learning
Q-learning
Signal processing
Signal processing algorithms
Title Improving the Performance of Batch-Constrained Reinforcement Learning in Continuous Action Domains via Generative Adversarial Networks
URI https://ieeexplore.ieee.org/document/9864786
linkProvider IEEE