Preferential Proximal Policy Optimization
| Published in: | Proceedings (IEEE International Conference on Emerging Technologies and Factory Automation), pp. 293 - 300 |
|---|---|
| Main authors: | Balasuntharam, Tamilselvan; Davoudi, Heidar; Ebrahimi, Mehran |
| Format: | Conference paper |
| Language: | English |
| Published: | IEEE, 15.12.2023 |
| Subjects: | Deep Learning; Deep Reinforcement Learning; Estimation; Linear programming; Machine Learning; Machine learning algorithms; Optimization; Reinforcement learning; Trajectory |
| ISSN: | 1946-0759 |
| EISBN: | 9798350345346 |
| DOI: | 10.1109/ICMLA58977.2023.00048 |
| Online access: | Get full text (https://ieeexplore.ieee.org/document/10459913) |
| Abstract: | Proximal Policy Optimization (PPO) is a policy gradient approach that provides state-of-the-art performance in many domains by optimizing a "surrogate" objective function with stochastic gradient ascent. While PPO is an appealing approach in reinforcement learning, it does not consider the importance of states (e.g., a state frequently seen in successful trajectories) in policy/value function updates. In this work, we introduce Preferential Proximal Policy Optimization (P3O), which incorporates the importance of these states into parameter updates. First, we determine the importance of each state based on the variance of the action probabilities given that state multiplied by the value function, normalized and smoothed using an Exponentially Weighted Moving Average. Then, we incorporate the state's importance into the surrogate objective function; that is, we redefine the value and advantage estimation objective functions in the PPO approach. Unlike other related approaches, we determine the importance of states automatically, so the method can be used with any algorithm utilizing a value function. Empirical evaluations across six Atari environments demonstrate that our approach significantly outperforms the baseline (vanilla PPO) across the tested environments, highlighting the value of the proposed method in learning complex environments. |
|---|---|
| Author details: | Tamilselvan Balasuntharam, Ontario Tech University, Oshawa, ON, Canada (tamilselvan.balasuntharam@ontariotechu.net); Heidar Davoudi, Ontario Tech University, Oshawa, ON, Canada (heidar.davoudi@ontariotechu.ca); Mehran Ebrahimi, Ontario Tech University, Oshawa, ON, Canada (mehran.ebrahimi@ontariotechu.ca) |
|---|---|
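The abstract describes how P3O derives a per-state importance signal: the variance of the action probabilities at a state is multiplied by the value estimate, then normalized and smoothed with an Exponentially Weighted Moving Average (EWMA) before being folded into the PPO surrogate and value objectives. The snippet below is a minimal, hypothetical illustration of that signal only; the function names, the min-max normalization, and the smoothing factor `alpha` are assumptions, not details taken from the paper.

```python
import numpy as np

def update_importance(action_probs, value, ema_prev, alpha=0.1):
    """Sketch of the per-state importance signal described in the abstract:
    variance of the policy's action probabilities at a state, scaled by the
    value estimate, then smoothed with an EWMA. Names are illustrative."""
    raw = np.var(action_probs) * value             # Var[pi(.|s)] * V(s)
    return alpha * raw + (1.0 - alpha) * ema_prev  # EWMA smoothing

def normalized_importance(scores):
    """Normalize smoothed importance scores across a batch so they can act as
    per-state weights in the surrogate/value objectives. Min-max scaling is an
    assumption; the paper's exact normalization is not reproduced here."""
    scores = np.asarray(scores, dtype=np.float64)
    span = scores.max() - scores.min()
    if span == 0.0:
        return np.ones_like(scores)
    return (scores - scores.min()) / span

# Illustrative usage: compute weights for two states in a rollout.
probs_batch = [np.array([0.7, 0.2, 0.1]), np.array([0.34, 0.33, 0.33])]
values = [1.5, 0.2]
ema, smoothed = 0.0, []
for p, v in zip(probs_batch, values):
    ema = update_importance(p, v, ema)
    smoothed.append(ema)
weights = normalized_importance(smoothed)
print(weights)  # higher weight for the state with a more decisive policy and higher value
```

In a full implementation these weights would scale the per-state terms of the clipped surrogate and value losses; the exact weighting scheme is defined in the paper itself.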