A synthesis of automated planning and reinforcement learning for efficient, robust decision-making
Saved in:

| Published in: | Artificial intelligence, Volume 241, pp. 103–130 |
|---|---|
| Main authors: | Leonetti, Matteo; Iocchi, Luca; Stone, Peter |
| Format: | Journal Article |
| Language: | English |
| Published: | Amsterdam: Elsevier B.V. / Elsevier Science Ltd, 01.12.2016 |
| ISSN: | 0004-3702, 1872-7921 |
| Online access: | Get full text |
| Abstract | Automated planning and reinforcement learning are characterized by complementary views on decision making: the former relies on previous knowledge and computation, while the latter on interaction with the world, and experience. Planning allows robots to carry out different tasks in the same domain, without the need to acquire knowledge about each one of them, but relies strongly on the accuracy of the model. Reinforcement learning, on the other hand, does not require previous knowledge, and allows robots to robustly adapt to the environment, but often necessitates an infeasible amount of experience. We present Domain Approximation for Reinforcement LearnING (DARLING), a method that takes advantage of planning to constrain the behavior of the agent to reasonable choices, and of reinforcement learning to adapt to the environment, and increase the reliability of the decision making process. We demonstrate the effectiveness of the proposed method on a service robot, carrying out a variety of tasks in an office building. We find that when the robot makes decisions by planning alone on a given model it often fails, and when it makes decisions by reinforcement learning alone it often cannot complete its tasks in a reasonable amount of time. When employing DARLING, even when seeded with the same model that was used for planning alone, however, the robot can quickly learn a behavior to carry out all the tasks, improves over time, and adapts to the environment as it changes. |
|---|---|
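The abstract's core idea, a planner that restricts the actions a reinforcement learner may explore, can be sketched in a few lines. The following is a minimal illustrative sketch, not the paper's DARLING implementation: the toy chain domain and the `allowed` action sets are assumptions invented for the example, standing in for the planner's output.

```python
# Sketch: Q-learning whose exploration is constrained to planner-approved
# actions. "allowed" plays the role of the planner: it lists, per state,
# the actions considered reasonable. Everything here is illustrative.
import random
from collections import defaultdict

def constrained_epsilon_greedy(q, state, allowed, epsilon=0.1):
    """Epsilon-greedy action selection, restricted to the allowed set."""
    actions = allowed[state]
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])

def q_update(q, state, action, reward, next_state, allowed,
             alpha=0.5, gamma=0.9):
    """Standard Q-learning backup, with the max taken over allowed actions."""
    best_next = max(q[(next_state, a)] for a in allowed[next_state])
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])

# Toy domain: a 4-state chain. The "planner" rules out everything except
# moving right (or staying at the goal), so learning happens only among
# reasonable choices.
allowed = {s: ["right"] if s < 3 else ["stay"] for s in range(4)}
q = defaultdict(float)
random.seed(0)
state = 0
for _ in range(50):
    a = constrained_epsilon_greedy(q, state, allowed)
    next_state = min(state + 1, 3) if a == "right" else state
    reward = 1.0 if next_state == 3 else 0.0
    q_update(q, state, a, reward, next_state, allowed)
    state = 0 if next_state == 3 else next_state  # restart episode at goal
```

In the full method the allowed sets come from plans computed on a symbolic domain model, and learning then compensates for the model's inaccuracies; the sketch only shows the constrained-exploration skeleton.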
| Author | Leonetti, Matteo; Iocchi, Luca; Stone, Peter |
| Author details | 1. Matteo Leonetti (m.leonetti@leeds.ac.uk), Department of Computer Science, The University of Texas at Austin, 2317 Speedway, Stop D9500, Austin, TX 78712, USA. 2. Luca Iocchi (iocchi@dis.uniroma1.it), Department of Computer, Control, and Management Engineering, Sapienza University of Rome, Via Ariosto 25, 00185 Rome, Italy. 3. Peter Stone (pstone@cs.utexas.edu), Department of Computer Science, The University of Texas at Austin, 2317 Speedway, Stop D9500, Austin, TX 78712, USA. |
| ContentType | Journal Article |
| Copyright | 2016 Elsevier B.V. Copyright Elsevier Science Ltd. Dec 2016 |
| DOI | 10.1016/j.artint.2016.07.004 |
| Discipline | Computer Science |
| EISSN | 1872-7921 |
| EndPage | 130 |
| ISICitedReferencesCount | 70 |
| ISSN | 0004-3702 |
| IsDoiOpenAccess | false |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Keywords | Autonomous robot; Robot learning; Automated planning; Answer set programming; Reinforcement learning |
| Language | English |
| OpenAccessLink | http://hdl.handle.net/11573/932159 |
| PageCount | 28 |
| PublicationDate | December 2016 (2016-12-01) |
| PublicationPlace | Amsterdam |
| PublicationTitle | Artificial intelligence |
| PublicationYear | 2016 |
| Publisher | Elsevier B.V Elsevier Science Ltd |
| StartPage | 103 |
| SubjectTerms | Answer set programming; Approximation; Automated planning; Automation; Autonomous robot; Decision making; Decisions; Expert systems; Learning; Machine learning; Model accuracy; Reinforcement; Reinforcement learning; Reliability; Robot learning; Robots; Tasks |
| Title | A synthesis of automated planning and reinforcement learning for efficient, robust decision-making |
| URI | https://dx.doi.org/10.1016/j.artint.2016.07.004 |
| Volume | 241 |