Preferential Proximal Policy Optimization

Published in: Proceedings (IEEE International Conference on Emerging Technologies and Factory Automation), pp. 293-300
Main authors: Balasuntharam, Tamilselvan; Davoudi, Heidar; Ebrahimi, Mehran
Format: Conference paper
Language: English
Publication details: IEEE, 15 Dec. 2023
Subject: Deep Learning; Deep Reinforcement Learning; Estimation; Linear programming; Machine Learning; Machine learning algorithms; Optimization; Reinforcement learning; Trajectory
ISSN: 1946-0759
Online access: Get full text
Abstract Proximal Policy Optimization (PPO) is a policy gradient approach that achieves state-of-the-art performance in many domains by optimizing a "surrogate" objective function with stochastic gradient ascent. While PPO is an appealing approach in reinforcement learning, it does not account for the importance of states (states seen frequently in successful trajectories) in policy/value function updates. In this work, we introduce Preferential Proximal Policy Optimization (P3O), which incorporates the importance of these states into parameter updates. First, we determine the importance of each state from the variance of the action probabilities given that state, multiplied by the value function, then normalized and smoothed using an Exponentially Weighted Moving Average. Then, we incorporate the state's importance into the surrogate objective function; that is, we redefine the value and advantage estimation objective functions of the PPO approach. Unlike other related approaches, we determine state importance automatically, and the method can be used with any algorithm that utilizes a value function. Empirical evaluations across six Atari environments demonstrate that our approach significantly outperforms the baseline (vanilla PPO) across the tested environments, highlighting the value of the proposed method in learning complex environments.
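The abstract sketches two ingredients: a per-state importance score (the variance of the action probabilities for that state, multiplied by the value estimate, normalized and smoothed with an exponentially weighted moving average) and a re-weighting of the PPO surrogate and value objectives by that score. The snippet below is a minimal illustrative sketch of one way this could look in PyTorch; the paper's exact formulas are not reproduced in this record, so the class and function names, the min-max normalization, the placement of the EWMA, and the smoothing factor alpha are assumptions for illustration, not the authors' implementation.

```python
import torch

class StateImportance:
    """Sketch of a P3O-style state-importance score (details assumed, see comments)."""

    def __init__(self, alpha: float = 0.1):
        self.alpha = alpha   # EWMA smoothing factor (assumed value)
        self.ewma = None     # running smoothed importance level across batches

    def __call__(self, probs: torch.Tensor, values: torch.Tensor) -> torch.Tensor:
        # probs:  (batch, n_actions) action probabilities pi(a|s)
        # values: (batch,) value estimates V(s)
        # Treat the score as a constant weight (no gradient), an assumption here.
        raw = probs.detach().var(dim=-1) * values.detach()
        # Min-max normalize within the batch (one plausible normalization choice).
        raw = (raw - raw.min()) / (raw.max() - raw.min() + 1e-8)
        # Exponentially weighted moving average across batches for smoothing.
        batch_mean = raw.mean()
        self.ewma = batch_mean if self.ewma is None else (
            self.alpha * batch_mean + (1.0 - self.alpha) * self.ewma
        )
        # Blend per-state scores toward the running average.
        return self.alpha * raw + (1.0 - self.alpha) * self.ewma


def weighted_ppo_losses(ratio, advantages, values, returns, importance, clip_eps=0.2):
    """PPO clipped surrogate and value losses, re-weighted per state."""
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    policy_loss = -(importance * torch.min(unclipped, clipped)).mean()
    value_loss = (importance * (values - returns).pow(2)).mean()
    return policy_loss, value_loss
```

Because the importance score multiplies both the clipped surrogate and the squared value error, states scored as more important contribute more to each gradient step, which is the behaviour the abstract describes.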
Author Ebrahimi, Mehran
Davoudi, Heidar
Balasuntharam, Tamilselvan
Author_xml – sequence: 1
  givenname: Tamilselvan
  surname: Balasuntharam
  fullname: Balasuntharam, Tamilselvan
  email: tamilselvan.balasuntharam@ontariotechu.net
  organization: Ontario Tech University,Oshawa,ON,Canada
– sequence: 2
  givenname: Heidar
  surname: Davoudi
  fullname: Davoudi, Heidar
  email: heidar.davoudi@ontariotechu.ca
  organization: Ontario Tech University,Oshawa,ON,Canada
– sequence: 3
  givenname: Mehran
  surname: Ebrahimi
  fullname: Ebrahimi, Mehran
  email: mehran.ebrahimi@ontariotechu.ca
  organization: Ontario Tech University,Oshawa,ON,Canada
CODEN IEEPAD
ContentType Conference Proceeding
DOI 10.1109/ICMLA58977.2023.00048
EISBN 9798350345346
EISSN 1946-0759
EndPage 300
ExternalDocumentID 10459913
Genre orig-research
PageCount 8
PublicationCentury 2000
PublicationDate 2023-Dec.-15
PublicationDateYYYYMMDD 2023-12-15
PublicationDate_xml – month: 12
  year: 2023
  text: 2023-Dec.-15
  day: 15
PublicationDecade 2020
PublicationTitle Proceedings (IEEE International Conference on Emerging Technologies and Factory Automation)
PublicationTitleAbbrev ICMLA
PublicationYear 2023
Publisher IEEE
Publisher_xml – name: IEEE
SourceID ieee
SourceType Publisher
StartPage 293
SubjectTerms Deep Learning
Deep Reinforcement Learning
Estimation
Linear programming
Machine Learning
Machine learning algorithms
Optimization
Reinforcement learning
Trajectory
Title Preferential Proximal Policy Optimization
URI https://ieeexplore.ieee.org/document/10459913