Authentic Boundary Proximal Policy Optimization
| Published in: | IEEE Transactions on Cybernetics, Vol. 52, No. 9, pp. 9428–9438 |
|---|---|
| Main Authors: | Yuhu Cheng, Longyang Huang, Xuesong Wang |
| Format: | Journal Article |
| Language: | English |
| Published: | United States: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 01.09.2022 |
| Subjects: | Reinforcement learning; Proximal policy optimization (PPO); Authentic boundary; Rollback clipping; Penalized point policy difference; Machine learning; Neural networks; Optimization; Robot control |
| ISSN: | 2168-2267 (print); 2168-2275 (electronic) |
| DOI: | 10.1109/TCYB.2021.3051456 |
| PMID: | 33705327 |
| Online Access: | https://ieeexplore.ieee.org/document/9376693 |
| Copyright: | The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2022 |
| Abstract | In recent years, the proximal policy optimization (PPO) algorithm has received considerable attention because of its excellent performance in many challenging tasks. However, the mechanism of PPO's horizontal clipping operation, a key means of improving its performance, still lacks a thorough theoretical explanation. In addition, although PPO is inspired by the learning theory of trust region policy optimization (TRPO), the theoretical connection between PPO's clipping operation and TRPO's trust region constraint has not been well studied. In this article, we first analyze the effect of PPO's clipping operation on the objective function of conservative policy iteration and rigorously establish the theoretical relationship between PPO and TRPO. We then propose a novel first-order policy gradient algorithm, authentic boundary PPO (ABPPO), based on an authentic boundary setting rule. To better keep the difference between the new and old policies within the clipping range, we build on ABPPO and propose two further improved algorithms, rollback mechanism-based ABPPO (RMABPPO) and penalized point policy difference-based ABPPO (P3DABPPO), which rest on the ideas of rollback clipping and penalized point policy difference, respectively. Experiments on continuous robotic control tasks implemented in MuJoCo show that the proposed algorithms effectively improve learning stability and accelerate learning compared with the original PPO. |
|---|---|
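For context, below is a minimal sketch (not the authors' ABPPO/RMABPPO implementation) of the clipping mechanism the abstract discusses: the standard PPO clipped surrogate objective, plus a hedged illustration of a rollback-style variant in which the surrogate slopes downward outside the clipping range instead of staying flat, so the gradient keeps discouraging probability ratios that drift past the boundary. The function names and the slope coefficient `alpha` are assumptions made for illustration.

```python
# Illustrative sketch only; not the paper's exact method.
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Standard PPO clipped surrogate: min(r*A, clip(r, 1-eps, 1+eps)*A)."""
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return np.minimum(unclipped, clipped)

def rollback_clip_objective(ratio, advantage, eps=0.2, alpha=0.3):
    """Hypothetical rollback-style surrogate (alpha is an assumed slope).

    Outside [1-eps, 1+eps] the surrogate decreases linearly in the ratio,
    so the gradient actively pushes the policy ratio back toward the clip
    boundary instead of vanishing as in the flat region of standard PPO.
    """
    boundary = 1.0 + np.sign(ratio - 1.0) * eps               # nearest clip edge
    rolled_back = (boundary - alpha * (ratio - boundary)) * advantage
    inside = np.abs(ratio - 1.0) <= eps
    return np.where(inside, ratio * advantage,
                    np.minimum(ratio * advantage, rolled_back))

if __name__ == "__main__":
    r = np.linspace(0.5, 1.5, 5)        # probability ratios pi_new / pi_old
    adv = np.ones_like(r)               # a positive advantage for illustration
    print(ppo_clip_objective(r, adv))       # plateaus at (1+eps)*A beyond the range
    print(rollback_clip_objective(r, adv))  # slopes downward beyond the range
```

In practice, the mean of such a surrogate over a batch of sampled (ratio, advantage) pairs is maximized with a first-order optimizer; the paper's ABPPO-based variants differ in exactly how the behavior at and beyond the clipping boundary is defined.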
| Authors (detail): | Yuhu Cheng (chengyuhu@163.com; ORCID 0000-0003-2022-9999), Longyang Huang (lyhuang789@163.com), Xuesong Wang (wangxuesongcumt@163.com; ORCID 0000-0002-5327-1088); all with the Engineering Research Center of Intelligent Control for Underground Space, Ministry of Education, China University of Mining and Technology, Xuzhou, China |
|---|---|
| Funding: | National Natural Science Foundation of China, grants 61976215 and 61772532 |