A synthesis of automated planning and reinforcement learning for efficient, robust decision-making


Detailed bibliography
Published in: Artificial intelligence, Vol. 241, pp. 103–130
Main authors: Leonetti, Matteo; Iocchi, Luca; Stone, Peter
Format: Journal Article
Language: English
Published: Amsterdam: Elsevier B.V., 01.12.2016; Elsevier Science Ltd
ISSN: 0004-3702, 1872-7921
Abstract Automated planning and reinforcement learning are characterized by complementary views on decision making: the former relies on previous knowledge and computation, while the latter on interaction with the world, and experience. Planning allows robots to carry out different tasks in the same domain, without the need to acquire knowledge about each one of them, but relies strongly on the accuracy of the model. Reinforcement learning, on the other hand, does not require previous knowledge, and allows robots to robustly adapt to the environment, but often necessitates an infeasible amount of experience. We present Domain Approximation for Reinforcement LearnING (DARLING), a method that takes advantage of planning to constrain the behavior of the agent to reasonable choices, and of reinforcement learning to adapt to the environment, and increase the reliability of the decision making process. We demonstrate the effectiveness of the proposed method on a service robot, carrying out a variety of tasks in an office building. We find that when the robot makes decisions by planning alone on a given model it often fails, and when it makes decisions by reinforcement learning alone it often cannot complete its tasks in a reasonable amount of time. When employing DARLING, even when seeded with the same model that was used for planning alone, however, the robot can quickly learn a behavior to carry out all the tasks, improves over time, and adapts to the environment as it changes.
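The abstract describes DARLING only at a high level. As a rough, hypothetical sketch of the general idea (not the authors' actual implementation), one can picture a planner producing several reasonable plans from an approximate model, with reinforcement learning then confined to the state-action pairs those plans mention. All names and the toy chain domain below are invented for illustration.

```python
# Hypothetical sketch: planner-constrained Q-learning in the spirit of DARLING.
# A set of plans from an (inaccurate) model restricts which actions RL may try
# in each state; Q-learning then adapts within that reduced space.
import random
from collections import defaultdict

def allowed_actions(plans):
    """Union of state-action pairs appearing in any planner-produced plan."""
    allowed = defaultdict(set)
    for plan in plans:
        for state, action in plan:
            allowed[state].add(action)
    return allowed

def constrained_q_learning(step, start, goal, allowed, episodes=500,
                           alpha=0.5, gamma=0.95, eps=0.1, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)
    for _ in range(episodes):
        s = start
        while s != goal:
            acts = sorted(allowed[s])  # only planner-sanctioned actions
            a = rng.choice(acts) if rng.random() < eps else \
                max(acts, key=lambda a: q[(s, a)])
            s2, r = step(s, a)
            best_next = 0.0 if s2 == goal else \
                max(q[(s2, b)] for b in sorted(allowed[s2]))
            q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
            s = s2
    return q

# Toy chain domain: states 0..4, goal 4; "fwd" always advances one state,
# "jump" skips two states but (unknown to the model) fails half the time.
def step(s, a, rng=random.Random(1)):
    if a == "fwd":
        return min(s + 1, 4), -1.0
    return (min(s + 2, 4), -1.0) if rng.random() < 0.5 else (s, -1.0)

plans = [[(0, "fwd"), (1, "fwd"), (2, "fwd"), (3, "fwd")],
         [(0, "jump"), (2, "jump")],
         [(0, "jump"), (2, "fwd"), (3, "fwd")]]
allowed = allowed_actions(plans)
q = constrained_q_learning(step, start=0, goal=4, allowed=allowed)
best_at_0 = max(allowed[0], key=lambda a: q[(0, a)])
```

The point of the constraint is that exploration never leaves the planner-sanctioned action set, so the agent avoids the huge space of unreasonable behaviors, while learning still corrects the model's wrong expectations about the unreliable "jump" action from experience.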
Author Leonetti, Matteo
Iocchi, Luca
Stone, Peter
Author_xml – sequence: 1
  givenname: Matteo
  surname: Leonetti
  fullname: Leonetti, Matteo
  email: m.leonetti@leeds.ac.uk
  organization: Department of Computer Science, The University of Texas at Austin, 2317 Speedway, Stop D9500, Austin, TX 78712, USA
– sequence: 2
  givenname: Luca
  surname: Iocchi
  fullname: Iocchi, Luca
  email: iocchi@dis.uniroma1.it
  organization: Department of Computer, Control, and Management Engineering, Sapienza University of Rome, Via Ariosto 25, 00185 Rome, Italy
– sequence: 3
  givenname: Peter
  surname: Stone
  fullname: Stone, Peter
  email: pstone@cs.utexas.edu
  organization: Department of Computer Science, The University of Texas at Austin, 2317 Speedway, Stop D9500, Austin, TX 78712, USA
ContentType Journal Article
Copyright 2016 Elsevier B.V.
Copyright Elsevier Science Ltd. Dec 2016
DOI 10.1016/j.artint.2016.07.004
DatabaseName CrossRef
Computer and Information Systems Abstracts
Technology Research Database
ProQuest Computer Science Collection
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts – Academic
Computer and Information Systems Abstracts Professional
ANTE: Abstracts in New Technology & Engineering
Engineering Research Database
Discipline Computer Science
EISSN 1872-7921
EndPage 130
ExternalDocumentID 10_1016_j_artint_2016_07_004
S0004370216300819
GroupedDBID --K
--M
--Z
-~X
.DC
.~1
0R~
1B1
1~.
1~5
23N
4.4
457
4G.
5GY
5VS
6I.
6J9
6TJ
7-5
71M
77K
8P~
9JN
AACTN
AAEDT
AAEDW
AAFTH
AAIAV
AAIKJ
AAKOC
AAKPC
AALRI
AAOAW
AAQFI
AAQXK
AAXUO
AAYFN
ABBOA
ABFNM
ABFRF
ABJNI
ABMAC
ABVKL
ABXDB
ABYKQ
ACDAQ
ACGFO
ACGFS
ACNCT
ACNNM
ACRLP
ACWUS
ACZNC
ADBBV
ADEZE
ADMUD
AEBSH
AECPX
AEFWE
AEKER
AENEX
AETEA
AEXQZ
AFKWA
AFTJW
AGHFR
AGUBO
AGYEJ
AHHHB
AHJVU
AHZHX
AIALX
AIEXJ
AIKHN
AITUG
AJBFU
AJOXV
ALMA_UNASSIGNED_HOLDINGS
AMFUW
AMRAJ
AOUOD
ASPBG
AVWKF
AXJTR
AZFZN
BJAXD
BKOJK
BLXMC
CS3
E3Z
EBS
EFJIC
EFLBG
EJD
EO8
EO9
EP2
EP3
F0J
F5P
FDB
FEDTE
FGOYB
FIRID
FNPLU
FYGXN
G-2
G-Q
G8K
GBLVA
GBOLZ
HLZ
HVGLF
HZ~
IHE
IXB
J1W
JJJVA
KOM
KQ8
LG9
LY7
M41
MO0
MVM
N9A
NCXOZ
O-L
O9-
OAUVE
OK1
OZT
P-8
P-9
P2P
PC.
PQQKQ
Q38
R2-
RIG
RNS
ROL
RPZ
SBC
SDF
SDG
SDP
SES
SET
SEW
SPC
SPCBC
SST
SSV
SSZ
T5K
TAE
TN5
TR2
TWZ
UPT
UQL
VQA
WH7
WUQ
XFK
XJE
XJT
XPP
XSW
ZMT
~02
~G-
77I
9DU
AATTM
AAXKI
AAYWO
AAYXX
ABDPE
ABWVN
ACLOT
ACRPL
ACVFH
ADCNI
ADNMO
ADVLN
AEIPS
AEUPX
AFJKZ
AFPUW
AGQPQ
AIGII
AIIUN
AKBMS
AKRWK
AKYEP
ANKPU
APXCP
CITATION
EFKBS
~HD
7SC
8FD
JQ2
L7M
L~C
L~D
F28
FR3
ID FETCH-LOGICAL-c413t-bae09349b0b2f85df028d4cd49b3fb63c54ddf74f87fb400f569f6716a0feb253
ISICitedReferencesCount 70
ISSN 0004-3702
IsDoiOpenAccess false
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Keywords Autonomous robot
Robot learning
Automated planning
Answer set programming
Reinforcement learning
Language English
OpenAccessLink http://hdl.handle.net/11573/932159
PageCount 28
PublicationDate December 2016
PublicationPlace Amsterdam
PublicationTitle Artificial intelligence
PublicationYear 2016
Publisher Elsevier B.V
Elsevier Science Ltd
References
– Abbeel, Quigley, Ng, Using inaccurate models in reinforcement learning, in: Proceedings of the 23rd International Conference on Machine Learning, vol. 148, 2006, pp. 1–8.
– Barto, Mahadevan, Recent advances in hierarchical reinforcement learning, Discrete Event Dyn. Syst. 13 (1) (2003) 41–77, doi: 10.1023/A:1022140919877.
– Box, Robustness in the strategy of scientific model building, in: Robustness in Statistics, vol. 1, 1979, pp. 201–236.
– Brafman, Tennenholtz, R-max – a general polynomial time algorithm for near-optimal reinforcement learning, J. Mach. Learn. Res. 3 (2) (2003) 213–231.
– Brenner, Nebel, Continual planning and acting in dynamic multiagent environments, Auton. Agents Multi-Agent Syst. 19 (3) (2009) 297–331, doi: 10.1007/s10458-009-9081-1.
– Dean, Kaelbling, Kirman, Nicholson, Planning under time constraints in stochastic domains, Artif. Intell. 76 (1) (1995) 35–74, doi: 10.1016/0004-3702(94)00086-G.
– Džeroski, De Raedt, Driessens, Relational reinforcement learning, Mach. Learn. 43 (1) (2001) 7–52, doi: 10.1023/A:1007694015589.
– Efthymiadis, Kudenko, Using plan-based reward shaping to learn strategies in StarCraft: Brood War, in: Proceedings of the IEEE Conference on Computational Intelligence in Games, 2013, pp. 1–8.
– Eiter, Erdem, Erdogan, Fink, Finding similar/diverse solutions in answer set programming, Theory Pract. Log. Program. 13 (5) (2013) 303–359, doi: 10.1017/S1471068411000548.
– Fern, Khardon, Tadepalli, The first learning track of the international planning competition, Mach. Learn. 84 (1) (2011) 81–107, doi: 10.1007/s10994-011-5234-y.
– Gebser, Kaufmann, Kaminski, Ostrowski, Schaub, Schneider, Potassco: the Potsdam answer set solving collection, AI Commun. 24 (2) (2011) 107–124, doi: 10.3233/AIC-2011-0491.
– Gelfond, Lifschitz, The stable model semantics for logic programming, in: ICLP/SLP, vol. 88, 1988, pp. 1070–1080.
– Ghallab, Nau, Traverso, The actor's view of automated planning and acting: a position paper, Artif. Intell. 208 (2014) 1–17, doi: 10.1016/j.artint.2013.11.002.
– Grzes, Kudenko, Plan-based reward shaping for reinforcement learning, in: Proceedings of the 4th International IEEE Conference on Intelligent Systems, vol. 2, 2008, pp. 10–22.
– Pack Kaelbling, Littman, Cassandra, Planning and acting in partially observable stochastic domains, Artif. Intell. 101 (1) (1998) 99–134, doi: 10.1016/S0004-3702(98)00023-X.
– Leonetti, Iocchi, Patrizi, Automatic generation and learning of finite-state controllers, in: Artificial Intelligence: Methodology, Systems, and Applications, 2012, pp. 135–144.
– Lifschitz, Answer set planning, in: Logic Programming and Nonmonotonic Reasoning, 1999, pp. 373–374.
– Lifschitz, What is answer set programming?, in: Proceedings of the 23rd National Conference on Artificial Intelligence, 2008, pp. 1594–1597.
– Mann, Choe, Scaling up reinforcement learning through targeted exploration, in: AAAI Conference on Artificial Intelligence, 2011, pp. 435–440.
– Newell, Simon, GPS, a Program that Simulates Human Thought, 1961.
– Ng, Harada, Russell, Policy invariance under reward transformations: theory and application to reward shaping, in: ICML, vol. 99, 1999, pp. 278–287.
– Nilsson, Teleo-reactive programs for agent control, J. Artif. Intell. Res. 1 (1994) 139–158, doi: 10.1613/jair.30.
– Parr, Russell, Reinforcement learning with hierarchies of machines, Adv. Neural Inf. Process. Syst., 1998, pp. 1043–1049.
– Pettersson, Execution monitoring in robotics: a survey, Robot. Auton. Syst. 53 (2) (2005) 73–88, doi: 10.1016/j.robot.2005.09.004.
– Pinto, Fern, Learning partial policies to speedup MDP tree search, in: Proceedings of the 30th Conference on Uncertainty in Artificial Intelligence, 2014, pp. 672–681.
– Ryan, Using abstract models of behaviours to automatically generate reinforcement learning hierarchies, in: Proceedings of the International Conference of Machine Learning, vol. 2, 2002, pp. 522–529.
– Ryan, Pendrith, RL-TOPs: an architecture for modularity and re-use in reinforcement learning, in: Proceedings of the International Conference of Machine Learning, 1998, pp. 481–487.
– Seijen, Sutton, True online TD(λ), in: Proceedings of the 31st International Conference on Machine Learning, 2014, pp. 692–700.
– Srivastava, Nguyen, Gerevini, Kambhampati, Do, Serina, Domain independent approaches for finding diverse plans, in: Proceedings of the Twentieth International Joint Conference on Artificial Intelligence, 2007, pp. 2016–2022.
– Sutton, Integrated architectures for learning, planning, and reacting based on approximating dynamic programming, in: Proceedings of the Seventh International Conference on Machine Learning, 1990, pp. 216–224.
– Sutton, Barto, Reinforcement Learning: An Introduction, 1998.
– van Otterlo, A survey of reinforcement learning in relational domains, 2005.
StartPage 103
SubjectTerms Answer set programming
Approximation
Automated planning
Automation
Autonomous robot
Decision making
Decisions
Expert systems
Learning
Machine learning
Model accuracy
Reinforcement
Reinforcement learning
Reliability
Robot learning
Robots
Tasks
Title A synthesis of automated planning and reinforcement learning for efficient, robust decision-making
URI https://dx.doi.org/10.1016/j.artint.2016.07.004
https://www.proquest.com/docview/1916343959
https://www.proquest.com/docview/1855369270
Volume 241