Reward-Guided Synthesis of Intelligent Agents with Control Structures

Bibliographic Details
Published in: Proceedings of the ACM on Programming Languages, Vol. 8, No. PLDI, pp. 1730–1754
Main Authors: Cui, Guofeng, Wang, Yuning, Qiu, Wenjie, Zhu, He
Format: Journal Article
Language:English
Published: New York, NY, USA: ACM, 20.06.2024
ISSN: 2475-1421
Abstract Deep reinforcement learning (RL) has led to encouraging successes in numerous challenging robotics applications. However, the lack of inductive biases to support logic deduction and generalization in the representation of a deep RL model makes it less effective in exploring complex long-horizon robot-control tasks with sparse reward signals. Existing program synthesis algorithms for RL problems inherit the same limitation, as they either adapt conventional RL algorithms to guide program search or synthesize robot-control programs to imitate an RL model. We propose ReGuS, a reward-guided synthesis paradigm, to unlock the potential of program synthesis to overcome the exploration challenges. We develop a novel hierarchical synthesis algorithm with a decomposed search space for loops, on-demand synthesis of conditional statements, and curriculum synthesis for procedure calls, to effectively compress the exploration space for long-horizon, multi-stage, and procedural robot-control tasks that are difficult to address by conventional RL techniques. Experimental results demonstrate that ReGuS significantly outperforms state-of-the-art RL algorithms and standard program synthesis baselines on challenging robot tasks including autonomous driving, locomotion control, and object manipulation. CCS Concepts: • Software and its engineering → Automatic programming.
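The reward-guided search idea described in the abstract can be sketched in miniature as follows. This is an illustrative toy, not the ReGuS algorithm: the two-action DSL, the 1-D environment, and the sparse reward below are invented for the example, and the actual system uses hierarchical sketch decomposition (loops, on-demand conditionals, curriculum over procedure calls) rather than flat enumeration.

```python
# Toy reward-guided program synthesis: enumerate straight-line programs
# over a tiny hypothetical DSL and keep the one with the highest reward.
from itertools import product

ACTIONS = ["move", "turn"]  # invented primitive actions for this sketch

def run(program, goal=3):
    """Execute a program in a 1-D world; sparse reward 1.0 on reaching goal."""
    pos, facing = 0, 1
    for op in program:
        if op == "move":
            pos += facing
        else:  # "turn": reverse direction in place
            facing = -facing
        if pos == goal:
            return 1.0  # sparse success signal, episode ends
    return 0.0

def synthesize(max_len=4):
    """Enumerate all programs up to max_len; return the best by reward."""
    best, best_r = [], float("-inf")
    for n in range(1, max_len + 1):
        for prog in product(ACTIONS, repeat=n):
            r = run(list(prog))
            if r > best_r:  # keep the first program achieving the best reward
                best, best_r = list(prog), r
    return best, best_r
```

Even this flat enumeration illustrates the exploration problem the paper targets: the search space grows exponentially with program length, which is why ReGuS decomposes it hierarchically instead of enumerating.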
ArticleNumber 217
Author Qiu, Wenjie
Wang, Yuning
Zhu, He
Cui, Guofeng
Author_xml – sequence: 1
  givenname: Guofeng
  orcidid: 0000-0002-7994-915X
  surname: Cui
  fullname: Cui, Guofeng
  email: gc669@cs.rutgers.edu
  organization: Rutgers University, New Brunswick, USA
– sequence: 2
  givenname: Yuning
  orcidid: 0009-0000-4317-9758
  surname: Wang
  fullname: Wang, Yuning
  email: yw895@rutgers.edu
  organization: Rutgers University, New Brunswick, USA
– sequence: 3
  givenname: Wenjie
  orcidid: 0000-0002-2271-6443
  surname: Qiu
  fullname: Qiu, Wenjie
  email: wenjie.qiu@rutgers.edu
  organization: Rutgers University, New Brunswick, USA
– sequence: 4
  givenname: He
  orcidid: 0000-0001-9606-150X
  surname: Zhu
  fullname: Zhu, He
  email: he.zhu.cs@rutgers.edu
  organization: Rutgers University, New Brunswick, USA
ContentType Journal Article
Copyright Owner/Author
DOI 10.1145/3656447
Discipline Computer Science
EISSN 2475-1421
EndPage 1754
GrantInformation_xml – fundername: National Science Foundation
  grantid: CCF-2124155
  funderid: https://doi.org/10.13039/100000001
ISSN 2475-1421
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue PLDI
Keywords Sequential Decision Making
Program Synthesis
Language English
License This work is licensed under a Creative Commons Attribution International 4.0 License.
ORCID 0000-0002-7994-915X
0000-0001-9606-150X
0009-0000-4317-9758
0000-0002-2271-6443
OpenAccessLink https://dl.acm.org/doi/10.1145/3656447
PageCount 25
PublicationCentury 2000
PublicationDate 2024-06-20
PublicationDecade 2020
PublicationPlace New York, NY, USA
PublicationPlace_xml – name: New York, NY, USA
PublicationTitle Proceedings of the ACM on Programming Languages
PublicationTitleAbbrev ACM PACMPL
PublicationYear 2024
Publisher ACM
StartPage 1730
SubjectTerms Automatic programming
Software and its engineering
SubjectTermsDisplay Software and its engineering -- Automatic programming
Title Reward-Guided Synthesis of Intelligent Agents with Control Structures
URI https://dl.acm.org/doi/10.1145/3656447
Volume 8