Authentic Boundary Proximal Policy Optimization



Detailed Bibliography
Published in: IEEE Transactions on Cybernetics, Vol. 52, No. 9, pp. 9428-9438
Main Authors: Cheng, Yuhu; Huang, Longyang; Wang, Xuesong
Format: Journal Article
Language: English
Published: United States, IEEE, 01.09.2022
Publisher: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
ISSN: 2168-2267 (print), 2168-2275 (electronic)
Online Access: Get full text
Abstract In recent years, the proximal policy optimization (PPO) algorithm has received considerable attention because of its excellent performance in many challenging tasks. However, the mechanism of PPO's horizontal clipping operation, a key means of improving its performance, still lacks a thorough theoretical explanation. In addition, although PPO is inspired by the learning theory of trust region policy optimization (TRPO), the theoretical connection between PPO's clipping operation and TRPO's trust-region constraint has not been well studied. In this article, we first analyze the effect of PPO's clipping operation on the objective function of conservative policy iteration and rigorously establish the theoretical relationship between PPO and TRPO. We then propose a novel first-order policy gradient algorithm, authentic boundary PPO (ABPPO), based on an authentic boundary setting rule. To better keep the difference between the new and old policies within the clipping range, we further propose two improved algorithms that build on ABPPO: rollback mechanism-based ABPPO (RMABPPO) and penalized point policy difference-based ABPPO (P3DABPPO), based on rollback clipping and the penalized point policy difference, respectively. Experiments on continuous robotic control tasks implemented in MuJoCo show that the proposed algorithms effectively improve learning stability and accelerate learning compared with the original PPO.
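For context on the clipping operation the abstract analyzes, the standard PPO clipped surrogate objective and TRPO's trust-region problem (Schulman et al.; refs. 25 and 22 in the reference list below) are reproduced here. These are the textbook formulations the abstract refers to, not the ABPPO, RMABPPO, or P3DABPPO objectives proposed in the article.

```latex
% Probability ratio between the new policy \pi_\theta and the old policy \pi_{\theta_{\mathrm{old}}}
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}

% PPO's clipped surrogate objective, with clip range [1-\epsilon,\, 1+\epsilon]
L^{\mathrm{CLIP}}(\theta) = \hat{\mathbb{E}}_t\!\left[ \min\!\left( r_t(\theta)\,\hat{A}_t,\;
  \mathrm{clip}\big(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\big)\,\hat{A}_t \right) \right]

% TRPO's trust-region-constrained surrogate maximization
\max_\theta \; \hat{\mathbb{E}}_t\!\left[ r_t(\theta)\,\hat{A}_t \right]
\quad \text{s.t.} \quad
\hat{\mathbb{E}}_t\!\left[ D_{\mathrm{KL}}\big(\pi_{\theta_{\mathrm{old}}}(\cdot \mid s_t) \,\|\, \pi_\theta(\cdot \mid s_t)\big) \right] \le \delta
```

A minimal numerical sketch of the clipping behaviour follows; the function name and example values are ours, chosen only to illustrate how the surrogate flattens outside the clip range.

```python
import numpy as np

def ppo_clip_surrogate(ratio, advantage, eps=0.2):
    """Standard PPO clipped surrogate for a single sample.

    Illustrative sketch only; not the ABPPO/RMABPPO/P3DABPPO objectives
    proposed in the article.
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # The pessimistic min keeps the update from exploiting large ratio changes.
    return np.minimum(unclipped, clipped)

# Once the ratio leaves [1 - eps, 1 + eps] on the improving side, the surrogate
# is flat in the ratio, so its gradient with respect to the policy vanishes there.
print(ppo_clip_surrogate(ratio=1.5, advantage=1.0))  # 1.2 (capped at (1 + eps) * A)
print(ppo_clip_surrogate(ratio=0.9, advantage=1.0))  # 0.9 (inside the clip range, unclipped)
```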
Author Cheng, Yuhu
Huang, Longyang
Wang, Xuesong
Author_xml – sequence: 1
  givenname: Yuhu
  orcidid: 0000-0003-2022-9999
  surname: Cheng
  fullname: Cheng, Yuhu
  email: chengyuhu@163.com
  organization: Engineering Research Center of Intelligent Control for Underground Space, Ministry of Education, China University of Mining and Technology, Xuzhou, China
– sequence: 2
  givenname: Longyang
  surname: Huang
  fullname: Huang, Longyang
  email: lyhuang789@163.com
  organization: Engineering Research Center of Intelligent Control for Underground Space, Ministry of Education, China University of Mining and Technology, Xuzhou, China
– sequence: 3
  givenname: Xuesong
  orcidid: 0000-0002-5327-1088
  surname: Wang
  fullname: Wang, Xuesong
  email: wangxuesongcumt@163.com
  organization: Engineering Research Center of Intelligent Control for Underground Space, Ministry of Education, China University of Mining and Technology, Xuzhou, China
BackLink https://www.ncbi.nlm.nih.gov/pubmed/33705327 (View this record in MEDLINE/PubMed)
CODEN ITCEB8
CitedBy_id crossref_primary_10_3390_rs16061055
crossref_primary_10_3390_math11224667
crossref_primary_10_1142_S2301385025500669
crossref_primary_10_1016_j_phycom_2025_102821
crossref_primary_10_1109_JAS_2024_124494
crossref_primary_10_1016_j_energy_2024_133705
crossref_primary_10_1016_j_actaastro_2024_07_048
crossref_primary_10_1016_j_oceaneng_2023_115021
crossref_primary_10_1016_j_aei_2024_102836
crossref_primary_10_1016_j_fss_2025_109273
crossref_primary_10_3390_app13010426
crossref_primary_10_1109_TVT_2024_3373175
crossref_primary_10_1016_j_ins_2024_121790
crossref_primary_10_1109_TNSE_2022_3211193
crossref_primary_10_1109_TAI_2024_3354694
crossref_primary_10_1109_TSMC_2024_3449332
crossref_primary_10_1088_1361_6501_ad96d3
crossref_primary_10_1016_j_comnet_2024_110865
crossref_primary_10_1109_TVT_2024_3406896
crossref_primary_10_1109_JIOT_2024_3507290
crossref_primary_10_1109_JIOT_2023_3343590
crossref_primary_10_1109_TVT_2023_3281367
crossref_primary_10_7717_peerj_cs_2708
crossref_primary_10_1109_TAC_2024_3454011
crossref_primary_10_1016_j_apenergy_2024_123348
crossref_primary_10_1109_TNNLS_2024_3481887
Cites_doi 10.1109/MLSP49062.2020.9231618
10.1038/nature14236
10.1609/aaai.v32i1.11796
10.1109/TII.2017.2783439
10.1609/aaai.v30i1.10295
10.1109/ROBOT.2004.1307456
10.1109/TSMC.2019.2931946
10.1109/TCYB.2019.2921057
10.1109/TCYB.2020.2977374
10.1109/TNNLS.2019.2927227
10.1109/TCYB.2018.2886735
10.1109/TCYB.2020.3023127
10.1109/TCYB.2015.2477810
10.1109/TCYB.2020.3023033
10.1109/ICTAI.2019.00206
10.1109/TPWRS.2018.2881359
10.1109/TNNLS.2018.2805379
10.1109/URAI.2018.8441797
10.1007/978-3-319-71682-4_5
10.1613/jair.301
ContentType Journal Article
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2022
Copyright_xml – notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2022
DBID 97E
RIA
RIE
AAYXX
CITATION
NPM
7SC
7SP
7TB
8FD
F28
FR3
H8D
JQ2
L7M
L~C
L~D
7X8
DOI 10.1109/TCYB.2021.3051456
DatabaseName IEEE All-Society Periodicals Package (ASPP) 2005–Present
IEEE All-Society Periodicals Package (ASPP) 1998–Present
IEEE Electronic Library (IEL)
CrossRef
PubMed
Computer and Information Systems Abstracts
Electronics & Communications Abstracts
Mechanical & Transportation Engineering Abstracts
Technology Research Database
ANTE: Abstracts in New Technology & Engineering
Engineering Research Database
Aerospace Database
ProQuest Computer Science Collection
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts – Academic
Computer and Information Systems Abstracts Professional
MEDLINE - Academic
DatabaseTitle CrossRef
PubMed
Aerospace Database
Technology Research Database
Computer and Information Systems Abstracts – Academic
Mechanical & Transportation Engineering Abstracts
Electronics & Communications Abstracts
ProQuest Computer Science Collection
Computer and Information Systems Abstracts
Engineering Research Database
Advanced Technologies Database with Aerospace
ANTE: Abstracts in New Technology & Engineering
Computer and Information Systems Abstracts Professional
MEDLINE - Academic
DatabaseTitleList MEDLINE - Academic
Aerospace Database
PubMed

Database_xml – sequence: 1
  dbid: NPM
  name: PubMed
  url: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed
  sourceTypes: Index Database
– sequence: 2
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
– sequence: 3
  dbid: 7X8
  name: MEDLINE - Academic
  url: https://search.proquest.com/medline
  sourceTypes: Aggregation Database
DeliveryMethod fulltext_linktorsrc
Discipline Sciences (General)
EISSN 2168-2275
EndPage 9438
ExternalDocumentID 33705327
10_1109_TCYB_2021_3051456
9376693
Genre orig-research
Journal Article
GrantInformation_xml – fundername: National Natural Science Foundation of China
  grantid: 61976215; 61772532
  funderid: 10.13039/501100001809
GroupedDBID 0R~
4.4
6IK
97E
AAJGR
AARMG
AASAJ
AAWTH
ABAZT
ABQJQ
ABVLG
ACIWK
AENEX
AGQYO
AGSQL
AHBIQ
AKJIK
AKQYR
ALMA_UNASSIGNED_HOLDINGS
ATWAV
BEFXN
BFFAM
BGNUA
BKEBE
BPEOZ
EBS
EJD
HZ~
IFIPE
IPLJI
JAVBF
M43
O9-
OCL
PQQKQ
RIA
RIE
RNS
AAYXX
CITATION
NPM
7SC
7SP
7TB
8FD
F28
FR3
H8D
JQ2
L7M
L~C
L~D
7X8
IEDL.DBID RIE
ISICitedReferencesCount 35
ISICitedReferencesURI http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000732135700001&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
ISSN 2168-2267
2168-2275
IngestDate Thu Oct 02 11:42:20 EDT 2025
Mon Jun 30 04:28:27 EDT 2025
Thu Jan 02 22:56:28 EST 2025
Sat Nov 29 02:02:33 EST 2025
Tue Nov 18 22:16:48 EST 2025
Wed Aug 27 02:22:58 EDT 2025
IsPeerReviewed true
IsScholarly true
Issue 9
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
https://doi.org/10.15223/policy-029
https://doi.org/10.15223/policy-037
LinkModel DirectLink
Notes ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
content type line 23
ORCID 0000-0003-2022-9999
0000-0002-5327-1088
PMID 33705327
PQID 2704098933
PQPubID 85422
PageCount 11
ParticipantIDs proquest_miscellaneous_2501260652
proquest_journals_2704098933
crossref_citationtrail_10_1109_TCYB_2021_3051456
ieee_primary_9376693
pubmed_primary_33705327
crossref_primary_10_1109_TCYB_2021_3051456
PublicationCentury 2000
PublicationDate 2022-09-01
PublicationDateYYYYMMDD 2022-09-01
PublicationDate_xml – month: 09
  year: 2022
  text: 2022-09-01
  day: 01
PublicationDecade 2020
PublicationPlace United States
PublicationPlace_xml – name: United States
– name: Piscataway
PublicationTitle IEEE transactions on cybernetics
PublicationTitleAbbrev TCYB
PublicationTitleAlternate IEEE Trans Cybern
PublicationYear 2022
Publisher IEEE
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Publisher_xml – name: IEEE
– name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
References ref13
ref12
ref15
ref14
Rothfuss (ref28) 2018
Tuan (ref33) 2018
Ilyas (ref34) 2018
ref11
ref10
ref32
ref2
Schulman (ref25) 2017
ref1
Wang (ref24) 2019
Kakade (ref37); 2
ref16
Ilyas (ref36) 2018
ref18
Haarnoja (ref19) 2018
Chu (ref27) 2018
Wang (ref31) 2019
Wang (ref17); 48
Sutton (ref21) 2000
ref26
ref20
Engstrom (ref35) 2020
Imagawa (ref30) 2019
Chen (ref23) 2018
Kullback (ref38) 1968
ref8
ref7
ref9
ref4
ref3
ref6
ref5
Stadie (ref29) 2018
Schulman (ref22)
References_xml – volume-title: Implementation matters in deep policy gradients: A case study on PPO and TRPO
  year: 2020
  ident: ref35
– ident: ref32
  doi: 10.1109/MLSP49062.2020.9231618
– ident: ref13
  doi: 10.1038/nature14236
– ident: ref18
  doi: 10.1609/aaai.v32i1.11796
– volume-title: Optimistic proximal policy optimization
  year: 2019
  ident: ref30
– ident: ref7
  doi: 10.1109/TII.2017.2783439
– volume-title: Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor
  year: 2018
  ident: ref19
– ident: ref16
  doi: 10.1609/aaai.v30i1.10295
– ident: ref20
  doi: 10.1109/ROBOT.2004.1307456
– volume: 48
  start-page: 1995
  volume-title: Proc. Int. Conf. Mach. Learn.
  ident: ref17
  article-title: Dueling network architectures for deep reinforcement learning
– ident: ref1
  doi: 10.1109/TSMC.2019.2931946
– ident: ref12
  doi: 10.1109/TCYB.2019.2921057
– year: 2018
  ident: ref23
  publication-title: An adaptive clipping approach for proximal policy optimization
– volume-title: Proximal policy optimization algorithms
  year: 2017
  ident: ref25
– volume-title: A closer look at deep policy gradients
  year: 2018
  ident: ref34
– start-page: 1057
  volume-title: Advances in Neural Information Processing Systems
  year: 2000
  ident: ref21
  article-title: Policy gradient methods for reinforcement learning with function approximation
– volume-title: Information Theory and Statistics
  year: 1968
  ident: ref38
– volume-title: A closer look at deep policy gradients
  year: 2018
  ident: ref36
– volume-title: Proximal policy optimization and its dynamic version for sequence generation
  year: 2018
  ident: ref33
– ident: ref3
  doi: 10.1109/TCYB.2020.2977374
– ident: ref14
  doi: 10.1109/TNNLS.2019.2927227
– ident: ref11
  doi: 10.1109/TCYB.2018.2886735
– ident: ref4
  doi: 10.1109/TCYB.2020.3023127
– volume-title: Policy optimization with penalized point probability distance: an alternative to proximal policy optimization
  year: 2018
  ident: ref27
– ident: ref10
  doi: 10.1109/TCYB.2015.2477810
– ident: ref5
  doi: 10.1109/TCYB.2020.3023033
– ident: ref26
  doi: 10.1109/ICTAI.2019.00206
– volume-title: Some considerations on learning to explore via meta-reinforcement learning
  year: 2018
  ident: ref29
– ident: ref9
  doi: 10.1109/TPWRS.2018.2881359
– ident: ref2
  doi: 10.1109/TNNLS.2018.2805379
– ident: ref6
  doi: 10.1109/URAI.2018.8441797
– volume-title: Truly proximal policy optimization
  year: 2019
  ident: ref24
– volume: 2
  start-page: 267
  volume-title: Proc. 19th Int. Conf. Mach. Learn.
  ident: ref37
  article-title: Approximately optimal approximate reinforcement learning
– start-page: 1889
  volume-title: Proc. Int. Conf. Mach. Learn.
  ident: ref22
  article-title: Trust region policy optimization
– volume-title: ProMP: Proximal meta-policy search
  year: 2018
  ident: ref28
– ident: ref8
  doi: 10.1007/978-3-319-71682-4_5
– ident: ref15
  doi: 10.1613/jair.301
– volume-title: Trust region-guided proximal policy optimization
  year: 2019
  ident: ref31
SSID ssj0000816898
Score 2.483906
SourceID proquest
pubmed
crossref
ieee
SourceType Aggregation Database
Index Database
Enrichment Source
Publisher
StartPage 9428
SubjectTerms Algorithms
Authentic boundary
Control tasks
Games
Learning theory
Linear programming
Machine learning
Neural networks
Optimization
penalized point policy difference
Performance enhancement
proximal policy optimization (PPO)
Reinforcement learning
reinforcement learning (RL)
Robot control
Robots
rollback clipping
Task analysis
Title Authentic Boundary Proximal Policy Optimization
URI https://ieeexplore.ieee.org/document/9376693
https://www.ncbi.nlm.nih.gov/pubmed/33705327
https://www.proquest.com/docview/2704098933
https://www.proquest.com/docview/2501260652
Volume 52
WOSCitedRecordID wos000732135700001
hasFullText 1
inHoldings 1
journalDatabaseRights – providerCode: PRVIEE
  databaseName: IEEE Electronic Library (IEL)
  customDbUrl:
  eissn: 2168-2275
  dateEnd: 99991231
  omitProxy: false
  ssIdentifier: ssj0000816898
  issn: 2168-2267
  databaseCode: RIE
  dateStart: 20130101
  isFulltext: true
  titleUrlDefault: https://ieeexplore.ieee.org/
  providerName: IEEE
linkProvider IEEE