Reward-Guided Synthesis of Intelligent Agents with Control Structures
| Published in: | Proceedings of the ACM on Programming Languages Vol. 8; no. PLDI; pp. 1730 - 1754 |
|---|---|
| Main Authors: | Cui, Guofeng; Wang, Yuning; Qiu, Wenjie; Zhu, He |
| Format: | Journal Article |
| Language: | English |
| Published: | New York, NY, USA: ACM, 20 June 2024 |
| Subjects: | Software and its engineering → Automatic programming |
| ISSN: | 2475-1421 |
| Online Access: | https://doi.org/10.1145/3656447 |
| Abstract | Deep reinforcement learning (RL) has led to encouraging successes in numerous challenging robotics applications. However, the lack of inductive biases to support logic deduction and generalization in the representation of a deep RL model makes it less effective in exploring complex long-horizon robot-control tasks with sparse reward signals. Existing program synthesis algorithms for RL problems inherit the same limitation, as they either adapt conventional RL algorithms to guide program search or synthesize robot-control programs to imitate an RL model. We propose ReGuS, a reward-guided synthesis paradigm, to unlock the potential of program synthesis to overcome the exploration challenges. We develop a novel hierarchical synthesis algorithm with a decomposed search space for loops, on-demand synthesis of conditional statements, and curriculum synthesis for procedure calls, to effectively compress the exploration space for long-horizon, multi-stage, and procedural robot-control tasks that are difficult to address by conventional RL techniques. Experimental results demonstrate that ReGuS significantly outperforms state-of-the-art RL algorithms and standard program synthesis baselines on challenging robot tasks including autonomous driving, locomotion control, and object manipulation. CCS Concepts: • Software and its engineering → Automatic programming. |
|---|---|
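The abstract describes reward-guided program search at a high level: enumerate candidate control programs over a small DSL with control structures (here, a bounded loop) and rank them by the reward they earn in the environment. The following is only a toy illustration of that idea; the miniature DSL, environment, and all function names are hypothetical and far simpler than ReGuS's actual synthesis algorithm.

```python
import itertools

# Toy 1-D environment: the agent starts at position 0; the goal is at 5.
# Reward is sparse (granted only on reaching the goal), minus a step cost.
GOAL = 5

def run(program, max_steps=20):
    """Execute a program (a list of DSL tokens) and return its total reward."""
    pos, steps = 0, 0
    for tok in program:
        if steps >= max_steps:
            break
        if tok == "move":            # a single primitive action
            pos += 1
            steps += 1
        elif tok == "loop_move":     # a bounded 'while not at goal: move' loop
            while pos < GOAL and steps < max_steps:
                pos += 1
                steps += 1
    reward = 1.0 if pos == GOAL else 0.0
    return reward - 0.01 * steps

def reward_guided_search(tokens=("move", "loop_move"), max_len=3):
    """Enumerate all programs up to max_len tokens; keep the highest-reward one."""
    best_prog, best_r = None, float("-inf")
    for n in range(1, max_len + 1):
        for prog in itertools.product(tokens, repeat=n):
            r = run(list(prog))
            if r > best_r:
                best_prog, best_r = list(prog), r
    return best_prog, best_r
```

In this sketch, the single-token program `["loop_move"]` wins: the loop reaches the goal in the fewest enumerated tokens, illustrating why searching over control structures compresses the exploration space compared with searching over flat action sequences.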
| ArticleNumber | 217 |
| Author | Cui, Guofeng (gc669@cs.rutgers.edu, ORCID 0000-0002-7994-915X); Wang, Yuning (yw895@rutgers.edu, ORCID 0009-0000-4317-9758); Qiu, Wenjie (wenjie.qiu@rutgers.edu, ORCID 0000-0002-2271-6443); Zhu, He (he.zhu.cs@rutgers.edu, ORCID 0000-0001-9606-150X); all Rutgers University, New Brunswick, USA |
| ContentType | Journal Article |
| Copyright | Owner/Author |
| DOI | 10.1145/3656447 |
| Discipline | Computer Science |
| EISSN | 2475-1421 |
| EndPage | 1754 |
| GrantInformation | National Science Foundation, grant CCF-2124155 (funder: https://doi.org/10.13039/100000001) |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | PLDI |
| Keywords | Sequential Decision Making; Program Synthesis |
| License | This work is licensed under a Creative Commons Attribution 4.0 International License. |
| ORCID | 0000-0002-7994-915X 0000-0001-9606-150X 0009-0000-4317-9758 0000-0002-2271-6443 |
| OpenAccessLink | https://dl.acm.org/doi/10.1145/3656447 |
| PageCount | 25 |
| PublicationDate | 2024-06-20 |
| PublicationPlace | New York, NY, USA |
| PublicationTitle | Proceedings of the ACM on Programming Languages |
| PublicationTitleAbbrev | Proc. ACM Program. Lang. (PACMPL) |
| PublicationYear | 2024 |
| Publisher | ACM |
| StartPage | 1730 |
| SubjectTerms | Automatic programming Software and its engineering |
| Title | Reward-Guided Synthesis of Intelligent Agents with Control Structures |
| URI | https://dl.acm.org/doi/10.1145/3656447 |
| Volume | 8 |