Reinforcement learning algorithms with function approximation: Recent advances and applications

Bibliographic Details
Published in: Information Sciences, Vol. 261, pp. 1-31
Authors: Xu, Xin (xuxin_mail@263.net, xinxu@nudt.edu.cn); Zuo, Lei; Huang, Zhenhua
Format: Journal Article
Language: English
Published: Elsevier Inc., 10 March 2014
DOI: 10.1016/j.ins.2013.08.037
ISSN: 0020-0255 (print); 1872-6291 (electronic)
Keywords: Approximate dynamic programming; Function approximation; Learning control; Generalization; Reinforcement learning
Copyright: 2013 Elsevier Inc.
Online access: Full text
Abstract: In recent years, research on reinforcement learning (RL) has focused on function approximation for learning prediction and learning control in Markov decision processes (MDPs). Function approximation techniques are essential for handling MDPs with large or continuous state and action spaces. This paper gives a comprehensive survey of recent developments in RL algorithms with function approximation. From a theoretical point of view, the convergence and feature representation of RL algorithms are analyzed; from an empirical point of view, the performance of different RL algorithms is evaluated and compared on several benchmark learning prediction and learning control tasks. Applications of RL with function approximation are also discussed, and directions for future work are suggested.
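As a concrete illustration of the learning-prediction setting the survey covers, the sketch below implements semi-gradient TD(0) policy evaluation with a linear value function on a small random-walk chain. This is a minimal, assumed example for orientation, not code from the paper; the environment, one-hot features, episode count, and step size are all illustrative choices.

```python
# Minimal sketch (illustrative, not from the paper): TD(0) learning
# prediction with a linear value function V(s) = phi(s)^T theta.
# Environment: 5-state random walk, reward 1 at the right terminal.
import numpy as np

rng = np.random.default_rng(0)
n_states = 5
phi = np.eye(n_states)        # one-hot features; any feature matrix fits here
theta = np.zeros(n_states)    # linear weights to be learned
alpha, gamma = 0.1, 1.0       # step size, discount factor

def value(s):
    return phi[s] @ theta     # linear function approximation of V(s)

for _ in range(3000):
    s = n_states // 2         # every episode starts in the middle state
    done = False
    while not done:
        s_next = s + (1 if rng.random() < 0.5 else -1)
        if s_next < 0:                    # left terminal: reward 0
            reward, target_v, done = 0.0, 0.0, True
        elif s_next >= n_states:          # right terminal: reward 1
            reward, target_v, done = 1.0, 0.0, True
        else:                             # bootstrap from current estimate
            reward, target_v = 0.0, value(s_next)
        # semi-gradient TD(0) update: theta <- theta + alpha * delta * phi(s)
        delta = reward + gamma * target_v - value(s)
        theta += alpha * delta * phi[s]
        s = s_next

print(np.round(theta, 2))     # approaches the true values 1/6 ... 5/6
```

With one-hot features this reduces to the tabular case; the convergence questions the survey analyzes arise when the features only span the value function approximately and when updates are made off-policy.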
– volume: 76
  start-page: 243
  year: 2009
  end-page: 256
  ident: b0315
  article-title: Hybrid least-squares algorithms for approximate policy evaluation
  publication-title: Machine Learning
– ident: 10.1016/j.ins.2013.08.037_b0670
– volume: vol. 21
  start-page: 1609
  year: 2009
  ident: 10.1016/j.ins.2013.08.037_b0575
  article-title: A convergent O(n) temporal-difference algorithm for off-policy learning with linear function approximation
– volume: 129
  start-page: 278
  issue: 3
  year: 2003
  ident: 10.1016/j.ins.2013.08.037_b0005
  article-title: Reinforcement learning for true adaptive traffic signal control
  publication-title: Journal of Transportation Engineering
  doi: 10.1061/(ASCE)0733-947X(2003)129:3(278)
– volume: 85
  start-page: 299
  issue: 3
  year: 2011
  ident: 10.1016/j.ins.2013.08.037_b0230
  article-title: Model selection in reinforcement learning
  publication-title: Machine Learning
  doi: 10.1007/s10994-011-5254-7
– volume: 27
  start-page: 55
  issue: 1
  year: 2009
  ident: 10.1016/j.ins.2013.08.037_b0480
  article-title: Reinforcement learning for robot soccer
  publication-title: Autonomous Robots
  doi: 10.1007/s10514-009-9120-4
– ident: 10.1016/j.ins.2013.08.037_b0590
– volume: vol. 19
  start-page: 441
  year: 2007
  ident: 10.1016/j.ins.2013.08.037_b0265
  article-title: iLSTD: eligibility traces and convergence analysis
– volume: vol. 16
  start-page: 751
  year: 2004
  ident: 10.1016/j.ins.2013.08.037_b0475
  article-title: Gaussian processes in reinforcement learning
– volume: 45
  start-page: 2471
  issue: 11
  year: 2009
  ident: 10.1016/j.ins.2013.08.037_b0095
  article-title: Natural actor–critic algorithms
  publication-title: Automatica
  doi: 10.1016/j.automatica.2009.07.008
– volume: 28
  start-page: 482
  issue: 3
  year: 1998
  ident: 10.1016/j.ins.2013.08.037_b0435
  article-title: Delayed reinforcement learning for adaptive image segmentation and feature extraction
  publication-title: IEEE Transactions on System Man and Cybernetics-Part C
  doi: 10.1109/5326.704593
– volume: 22
  start-page: 237
  issue: 3
  year: 2009
  ident: 10.1016/j.ins.2013.08.037_b0655
  article-title: Neural network approach to continuous-time direct adaptive optimal control for partially unknown nonlinear systems
  publication-title: Neural Networks
  doi: 10.1016/j.neunet.2009.03.008
– start-page: 1053
  year: 2007
  ident: 10.1016/j.ins.2013.08.037_b0520
  article-title: Reinforcement learning of local shape in the game of Go
  publication-title: Proceedings of the Twentieth International Joint Conference on Artificial Intelligence (IJCAI 2007)
– volume: 20
  start-page: 139
  issue: 2
  year: 1998
  ident: 10.1016/j.ins.2013.08.037_b0440
  article-title: Closed-loop object recognition using reinforcement learning
  publication-title: IEEE Transactions on Pattern Analysis and Machine Intelligence
  doi: 10.1109/34.659932
– start-page: 136
  year: 1998
  ident: 10.1016/j.ins.2013.08.037_b0775
  article-title: Relational reinforcement Learning
– ident: 10.1016/j.ins.2013.08.037_b0370
  doi: 10.1145/1102351.1102421
– start-page: 39
  year: 2009
  ident: 10.1016/j.ins.2013.08.037_b0665
  article-title: Adaptive dynamic programming: an introduction
  publication-title: IEEE Computational Intelligence Magazine
  doi: 10.1109/MCI.2009.932261
– volume: 3
  start-page: 9
  year: 1988
  ident: 10.1016/j.ins.2013.08.037_b0565
  article-title: Learning to predict by the method of temporal differences
  publication-title: Machine Learning
  doi: 10.1023/A:1022633531479
– year: 1996
  ident: 10.1016/j.ins.2013.08.037_b0090
– ident: 10.1016/j.ins.2013.08.037_b0125
– volume: 49
  start-page: 233
  issue: 2–3
  year: 2002
  ident: 10.1016/j.ins.2013.08.037_b0115
  article-title: Technical update: least-squares temporal difference learning
  publication-title: Machine Learning
  doi: 10.1023/A:1017936530646
– volume: 71
  start-page: 1180
  year: 2008
  ident: 10.1016/j.ins.2013.08.037_b0445
  article-title: Natural actor–critic
  publication-title: Neurocomputing
  doi: 10.1016/j.neucom.2007.11.026
– volume: 14
  start-page: 295
  year: 1994
  ident: 10.1016/j.ins.2013.08.037_b0175
  article-title: TD(λ) converges with probability 1
  publication-title: Machine Learning
  doi: 10.1023/A:1022657612745
– volume: 13
  start-page: 41
  issue: 1-2
  year: 2003
  ident: 10.1016/j.ins.2013.08.037_b0075
  article-title: Recent advances in hierarchical reinforcement learning
  publication-title: Discrete Event Dynamic Systems-Theory and Applications
  doi: 10.1023/A:1022140919877
– volume: 46
  start-page: 878
  issue: 5
  year: 2010
  ident: 10.1016/j.ins.2013.08.037_b0625
  article-title: Online actor–critic algorithm to solve the continuous-time infinite horizon optimal control problem
  publication-title: Automatica
  doi: 10.1016/j.automatica.2010.02.018
– ident: 10.1016/j.ins.2013.08.037_b0455
– volume: 13
  start-page: 227
  year: 2000
  ident: 10.1016/j.ins.2013.08.037_b0185
  article-title: Hierarchical reinforcement learning with the Max-Q value function decomposition
  publication-title: Journal of Artificial Intelligence Research
  doi: 10.1613/jair.639
– start-page: 298
  year: 1993
  ident: 10.1016/j.ins.2013.08.037_b0510
  article-title: A reinforcement learning method for maximizing undiscounted rewards
– volume: 22
  start-page: 1863
  issue: 12
  year: 2011
  ident: 10.1016/j.ins.2013.08.037_b0725
  article-title: Hierarchical approximate policy iteration with binary-tree state space decomposition
  publication-title: IEEE Transactions on Neural Networks
  doi: 10.1109/TNN.2011.2168422
– volume: 13
  start-page: 834
  issue: 5
  year: 1983
  ident: 10.1016/j.ins.2013.08.037_b0080
  article-title: Neuron-like adaptive elements that can solve difficult learning control problems
  publication-title: IEEE Transactions on Systems, Man, and Cybernetics
  doi: 10.1109/TSMC.1983.6313077
– ident: 10.1016/j.ins.2013.08.037_b0390
– ident: 10.1016/j.ins.2013.08.037_b0690
– volume: 155
  start-page: 654
  year: 2004
  ident: 10.1016/j.ins.2013.08.037_b0255
  article-title: Reinforcement learning for long-run average cost
  publication-title: European Journal of Operational Research
  doi: 10.1016/S0377-2217(02)00874-3
– volume: 27
  start-page: 1536
  issue: 10
  year: 2005
  ident: 10.1016/j.ins.2013.08.037_b0745
  article-title: Integrating relevance feedback techniques for image retrieval using reinforcement learning
  publication-title: IEEE Transactions on Pattern Analysis and Machine Intelligence
  doi: 10.1109/TPAMI.2005.201
– volume: 22
  start-page: 2226
  issue: 12
  year: 2011
  ident: 10.1016/j.ins.2013.08.037_b0755
  article-title: Data-driven robust approximate optimal tracking control for unknown general nonlinear systems using adaptive dynamic programming method
  publication-title: IEEE Transactions on Neural Networks
  doi: 10.1109/TNN.2011.2168538
– volume: 22
  start-page: 906
  issue: 6
  year: 2011
  ident: 10.1016/j.ins.2013.08.037_b0295
  article-title: Transformation invariant on-line target recognition
  publication-title: IEEE Transactions on Neural Networks
  doi: 10.1109/TNN.2011.2132737
– volume: 15
  start-page: 319
  year: 2001
  ident: 10.1016/j.ins.2013.08.037_b0085
  article-title: Infinite-horizon policy-gradient estimation
  publication-title: Journal of Artificial Intelligence Research
  doi: 10.1613/jair.806
– volume: 19
  start-page: 893
  issue: 4
  year: 1996
  ident: 10.1016/j.ins.2013.08.037_b0065
  article-title: Adaptive-critic-based neural networks for aircraft optimal control
  publication-title: Journal of Guidance, Control, Dynamics
  doi: 10.2514/3.21715
– start-page: 1019
  year: 2003
  ident: 10.1016/j.ins.2013.08.037_b0050
  article-title: Covariant policy search
– volume: 13
  start-page: 79
  issue: 1
  year: 2003
  ident: 10.1016/j.ins.2013.08.037_b0410
  article-title: Least squares policy evaluation algorithms with linear function approximation
  publication-title: Discrete Event Dynamic Systems
  doi: 10.1023/A:1022192903948
– volume: 30
  start-page: 416
  issue: 2
  year: 2005
  ident: 10.1016/j.ins.2013.08.037_b0150
  article-title: A behavior-based scheme using reinforcement learning for autonomous underwater vehicles
  publication-title: IEEE Journal of Oceanic Engineering
  doi: 10.1109/JOE.2004.835805
– issue: NIPS 2008
  year: 2008
  ident: 10.1016/j.ins.2013.08.037_b0325
  article-title: Policy search for motor primitives in robotics
  publication-title: Advances in Neural Information Processing Systems
– volume: 65
  start-page: 167
  year: 2006
  ident: 10.1016/j.ins.2013.08.037_b0260
  article-title: Adaptive stepsizes for recursive estimation with applications in approximate dynamic programming
  publication-title: Machine Learning
  doi: 10.1007/s10994-006-8365-9
– volume: 177
  start-page: 3764
  issue: 18
  year: 2007
  ident: 10.1016/j.ins.2013.08.037_b0660
  article-title: A fuzzy actor–critic reinforcement learning network
  publication-title: Information Sciences
  doi: 10.1016/j.ins.2007.03.012
– volume: 8
  start-page: 2629
  year: 2007
  ident: 10.1016/j.ins.2013.08.037_b0275
  article-title: Hierarchical average reward reinforcement learning
  publication-title: Journal of Machine Learning Research
– volume: 16
  start-page: 259
  year: 2002
  ident: 10.1016/j.ins.2013.08.037_b0705
  article-title: Efficient reinforcement learning using recursive least-squares methods
  publication-title: Journal of Artificial Intelligence Research
  doi: 10.1613/jair.946
– volume: 16
  start-page: 1219
  issue: 5
  year: 2005
  ident: 10.1016/j.ins.2013.08.037_b0355
  article-title: A self-learning call admission control scheme for CDMA cellular networks
  publication-title: IEEE Transactions on Neural Networks
  doi: 10.1109/TNN.2005.853408
– ident: 10.1016/j.ins.2013.08.037_b0620
– volume: 11
  start-page: 54
  issue: 9
  year: 2005
  ident: 10.1016/j.ins.2013.08.037_b0710
  article-title: Kernel least-squares temporal difference learning
  publication-title: International Journal of Information Technology
– start-page: 1043
  year: 1998
  ident: 10.1016/j.ins.2013.08.037_b0430
  article-title: Reinforcement learning with hierarchies of machines
– ident: 10.1016/j.ins.2013.08.037_b0740
  doi: 10.1109/ICCW.2010.5503970
– volume: 76
  start-page: 243
  year: 2009
  ident: 10.1016/j.ins.2013.08.037_b0315
  article-title: Hybrid least-squares algorithms for approximate policy evaluation
  publication-title: Machine Learning
  doi: 10.1007/s10994-009-5128-4
– year: 1998
  ident: 10.1016/j.ins.2013.08.037_b0550
– volume: 38
  issue: 4
  year: 2008
  ident: 10.1016/j.ins.2013.08.037_b0345
  article-title: Special issue on approximate dynamic programming and reinforcement learning for feedback control
  publication-title: IEEE Transactions on Systems, Man, and Cybernetics B
  doi: 10.1109/TSMCB.2008.925890
– volume: 9
  start-page: 32
  issue: 3
  year: 2009
  ident: 10.1016/j.ins.2013.08.037_b0350
  article-title: Reinforcement learning and adaptive dynamic programming for feedback control
  publication-title: IEEE Circuits and Systems Magazine
  doi: 10.1109/MCAS.2009.933854
– volume: 3
  start-page: 43
  issue: 1
  year: 2011
  ident: 10.1016/j.ins.2013.08.037_b0385
  article-title: Reinforcement learning in first person shooter games
  publication-title: IEEE Transactions on Computational Intelligence and AI in Games
  doi: 10.1109/TCIAIG.2010.2100395
– volume: 3
  start-page: 1
  year: 2002
  ident: 10.1016/j.ins.2013.08.037_b0040
  article-title: Kernel independent component analysis
  publication-title: Journal of Machine Learning Research
– ident: 10.1016/j.ins.2013.08.037_b0250
  doi: 10.1007/978-3-540-45167-9_11
– ident: 10.1016/j.ins.2013.08.037_b0280
– volume: 19
  start-page: 1225
  issue: 3
  year: 2004
  ident: 10.1016/j.ins.2013.08.037_b0645
  article-title: Reinforcement learning for reactive power control
  publication-title: IEEE Transactions on Power Systems
  doi: 10.1109/TPWRS.2004.831259
– ident: 10.1016/j.ins.2013.08.037_b0285
  doi: 10.1007/978-3-540-76928-6_8
– volume: 3
  start-page: 211
  year: 1959
  ident: 10.1016/j.ins.2013.08.037_b0495
  article-title: Some studies in machine learning using game of checkers
  publication-title: IBM Jounal on Research and Development
– volume: 8
  start-page: 341
  year: 1992
  ident: 10.1016/j.ins.2013.08.037_b0170
  article-title: The convergence of TD(λ) for general λ
  publication-title: Machine Learning
  doi: 10.1023/A:1022632907294
– volume: 4
  start-page: 128
  issue: 2
  year: 2010
  ident: 10.1016/j.ins.2013.08.037_b0035
  article-title: Reinforcement learning-based multi-agent system for network traffic signal control
  publication-title: IET Intelligent Transport Systems
  doi: 10.1049/iet-its.2009.0070
– year: 2009
  ident: 10.1016/j.ins.2013.08.037_b0145
– year: 2002
  ident: 10.1016/j.ins.2013.08.037_b0500
– ident: 10.1016/j.ins.2013.08.037_b0685
  doi: 10.1109/ADPRL.2007.368190
– start-page: 200
  year: 2009
  ident: 10.1016/j.ins.2013.08.037_b0680
  article-title: Intelligence in the brain: a theory of how it works and how to build it
  publication-title: Neural Networks
  doi: 10.1016/j.neunet.2009.03.012
– volume: 14
  start-page: 929
  issue: 4
  year: 2003
  ident: 10.1016/j.ins.2013.08.037_b0215
  article-title: Helicopter trimming and tracking control using direct neural dynamic programming
  publication-title: IEEE Transactions on Neural Networks
  doi: 10.1109/TNN.2003.813839
– ident: 10.1016/j.ins.2013.08.037_b0490
  doi: 10.7551/mitpress/7503.003.0151
– year: 2004
  ident: 10.1016/j.ins.2013.08.037_b0070
  article-title: Reinforcement learning and its relationship to supervised learning
– volume: 30
  start-page: 54
  issue: 1
  year: 2012
  ident: 10.1016/j.ins.2013.08.037_b0765
  article-title: Reinforcement learning for repeated power control game in cognitive radio networks
  publication-title: IEEE Journal on Selected Areas in Communications
  doi: 10.1109/JSAC.2012.120106
– ident: 10.1016/j.ins.2013.08.037_b0045
  doi: 10.1109/ROBOT.2001.932842
– volume: 42
  start-page: 674
  issue: 5
  year: 1997
  ident: 10.1016/j.ins.2013.08.037_b0615
  article-title: An analysis of temporal difference learning with function approximation
  publication-title: IEEE Transactions on Automatic Control
  doi: 10.1109/9.580874
– volume: 12
  start-page: 19
  issue: 2
  year: 1992
  ident: 10.1016/j.ins.2013.08.037_b0555
  article-title: Reinforcement learning is direct adaptive control
  publication-title: IEEE Control Systems
  doi: 10.1109/37.126844
– volume: vol. 2167
  start-page: 97
  year: 2001
  ident: 10.1016/j.ins.2013.08.037_b0200
  article-title: Speeding up relational reinforcement learning through the use of an incremental first order decision tree learner
– ident: 10.1016/j.ins.2013.08.037_b0025
  doi: 10.1109/ACC.2009.5160611
– year: 2010
  ident: 10.1016/j.ins.2013.08.037_b0140
– volume: 145
  start-page: 45
  year: 2002
  ident: 10.1016/j.ins.2013.08.037_b0770
  article-title: Robot learning with GA-based fuzzy reinforcement learning agents
  publication-title: Information Sciences
  doi: 10.1016/S0020-0255(02)00223-2
– volume: 49
  start-page: 161
  issue: 2-3
  year: 2002
  ident: 10.1016/j.ins.2013.08.037_b0425
  article-title: Kernel-based reinforcement learning
  publication-title: Machine Learning
  doi: 10.1023/A:1017928328829
– volume: 38
  start-page: 287
  year: 2000
  ident: 10.1016/j.ins.2013.08.037_b0530
  article-title: Convergence results for single-step on-policy reinforcement-learning algorithms
  publication-title: Machine Learning
  doi: 10.1023/A:1007678930559
– volume: 10
  start-page: 859
  issue: 3
  year: 2010
  ident: 10.1016/j.ins.2013.08.037_b0720
  article-title: Sequential anomaly detection based on temporal-difference learning: principles, models and case studies
  publication-title: Applied Soft Computing
  doi: 10.1016/j.asoc.2009.10.003
– volume: 12
  start-page: 412
  issue: 2
  year: 2011
  ident: 10.1016/j.ins.2013.08.037_b0465
  article-title: Reinforcement learning with function approximation for traffic signal control
  publication-title: IEEE Transactions on Intelligence Transportation Systems
  doi: 10.1109/TITS.2010.2091408
– volume: 18
  start-page: 973
  issue: 4
  year: 2007
  ident: 10.1016/j.ins.2013.08.037_b0715
  article-title: Kernel based least-squares policy iteration for reinforcement learning
  publication-title: IEEE Transactions on Neural Networks
  doi: 10.1109/TNN.2007.899161
– ident: 10.1016/j.ins.2013.08.037_b0210
– ident: 10.1016/j.ins.2013.08.037_b0380
  doi: 10.1109/ICASSP.2012.6288330
– volume: 45
  start-page: 477
  issue: 2
  year: 2009
  ident: 10.1016/j.ins.2013.08.037_b0650
  article-title: Adaptive optimal control for continuous-time linear systems based on policy iteration
  publication-title: Automatica
  doi: 10.1016/j.automatica.2008.08.017
– volume: 33
  start-page: 235
  issue: 2–3
  year: 1998
  ident: 10.1016/j.ins.2013.08.037_b0155
  article-title: Elevator group control using multiple reinforcement learning agents
  publication-title: Machine Learning
  doi: 10.1023/A:1007518724497
– start-page: 719
  year: 2010
  ident: 10.1016/j.ins.2013.08.037_b0365
  article-title: Toward off-policy learning control with function approximation
– volume: 13
  start-page: 165
  issue: 3
  year: 2005
  ident: 10.1016/j.ins.2013.08.037_b0545
  article-title: Reinforcement learning for RoboCup-soccer keepaway
  publication-title: Adaptive Behavior
  doi: 10.1177/105971230501300301
– volume: 27
  start-page: 135
  year: 2011
  ident: 10.1016/j.ins.2013.08.037_b0305
  article-title: Reinforcement based mobile robot navigation in dynamic environment
  publication-title: Robotics and Computer-Integrated Manufacturing
  doi: 10.1016/j.rcim.2010.06.019
– volume: 4
  start-page: 1107
  year: 2003
  ident: 10.1016/j.ins.2013.08.037_b0340
  article-title: Least-squares policy iteration
  publication-title: Journal of Machine Learning Research
– ident: 10.1016/j.ins.2013.08.037_b0485
– volume: 8
  start-page: 279
  year: 1992
  ident: 10.1016/j.ins.2013.08.037_b0675
  article-title: Q-Learning
  publication-title: Machine Learning
– ident: 10.1016/j.ins.2013.08.037_b0760
– ident: 10.1016/j.ins.2013.08.037_b0580
– volume: 5
  start-page: 1309
  issue: 10
  year: 2011
  ident: 10.1016/j.ins.2013.08.037_b0310
  article-title: Efficient exploration in reinforcement learning-based cognitive radio spectrum sharing
  publication-title: IET Communication
  doi: 10.1049/iet-com.2010.0258
– volume: vol. 14
  start-page: 1491
  year: 2002
  ident: 10.1016/j.ins.2013.08.037_b0180
  article-title: Batch value function approximation via support vectors
– volume: 22
  start-page: 85
  issue: 1
  year: 2007
  ident: 10.1016/j.ins.2013.08.037_b0405
  article-title: A reinforcement learning model to assess market power under auction-based energy pricingm
  publication-title: IEEE Transactions on Power Systems
  doi: 10.1109/TPWRS.2006.888977
– volume: 9
  start-page: 974
  issue: NIPS 1996
  year: 1997
  ident: 10.1016/j.ins.2013.08.037_b0525
  article-title: Reinforcement learning for dynamic channel allocation in cellular telephone systems
  publication-title: Advances in Neural Information Processsing Systems
– volume: 6
  start-page: 215
  year: 1994
  ident: 10.1016/j.ins.2013.08.037_b0600
  article-title: TD-Gammon, a self-teaching backgammon program, achieves master-level play
  publication-title: Neural Computation
  doi: 10.1162/neco.1994.6.2.215
– volume: 176
  start-page: 2121
  issue: 15
  year: 2006
  ident: 10.1016/j.ins.2013.08.037_b0420
  article-title: Adaptive stock trading with dynamic asset allocation using reinforcement learning
  publication-title: Information Sciences
  doi: 10.1016/j.ins.2005.10.009
– volume: SMC-3
  start-page: 455
  issue: 5
  year: 1973
  ident: 10.1016/j.ins.2013.08.037_b0695
  article-title: Punish/reward: Learning with a critic in adaptive threshold systems
  publication-title: IEEE Transactions on Systems, Man, and Cybernetics
  doi: 10.1109/TSMC.1973.4309272
– volume: 16
  issue: NIPS 2003
  year: 2004
  ident: 10.1016/j.ins.2013.08.037_b0415
  article-title: Autonomous helicopter flight via reinforcement learning
  publication-title: Advances in Neural Information Processing Systems
– year: 1998
  ident: 10.1016/j.ins.2013.08.037_b0635
– ident: 10.1016/j.ins.2013.08.037_b0560
– start-page: 123
  year: 2003
  ident: 10.1016/j.ins.2013.08.037_b0205
  article-title: Relational instance based regression for relational reinforcement learning
– volume: 6
  start-page: 13
  issue: 4
  year: 1999
  ident: 10.1016/j.ins.2013.08.037_b0395
  article-title: Cognitive radio: making software radios more personal
  publication-title: IEEE Personal Communications
  doi: 10.1109/98.788210
– year: 2008
  ident: 10.1016/j.ins.2013.08.037_b0290
– year: 1983
  ident: 10.1016/j.ins.2013.08.037_b0540
– ident: 10.1016/j.ins.2013.08.037_b0130
– volume: 6
  start-page: 503
  year: 2005
  ident: 10.1016/j.ins.2013.08.037_b0220
  article-title: Tree-based batch mode reinforcement learning
  publication-title: Journal of Machine Learning Research
– year: 2007
  ident: 10.1016/j.ins.2013.08.037_b0460
– volume: 4
  start-page: 177
  issue: 3
  year: 2010
  ident: 10.1016/j.ins.2013.08.037_b0060
  article-title: Urban traffic signal control using reinforcement learning agents
  publication-title: IET Intelligent Transport Systems
  doi: 10.1049/iet-its.2009.0096
– volume: 8
  start-page: 2169
  year: 2007
  ident: 10.1016/j.ins.2013.08.037_b0375
  article-title: Proto-value functions: a laplacian framework for learning representation and control in markov decision processes
  publication-title: Journal of Machine Learning Research
– year: 2010
  ident: 10.1016/j.ins.2013.08.037_b0595
– volume: 57
  start-page: 271
  year: 2004
  ident: 10.1016/j.ins.2013.08.037_b0195
  article-title: Integrating guidance into relational reinforcement learning
  publication-title: Machine Learning
  doi: 10.1023/B:MACH.0000039779.47329.3a
– ident: 10.1016/j.ins.2013.08.037_b0570
  doi: 10.1145/1553374.1553501
– volume: vol. 12
  year: 2000
  ident: 10.1016/j.ins.2013.08.037_b0335
  article-title: Actor–critic algorithms
– year: 2010
  ident: 10.1016/j.ins.2013.08.037_b0700
– volume: 10
  start-page: 251
  issue: 2
  year: 1998
  ident: 10.1016/j.ins.2013.08.037_b0020
  article-title: Natural gradient works efficiently in learning
  publication-title: Neural Computation
  doi: 10.1162/089976698300017746
– volume: 64
  start-page: 91
  issue: 1–3
  year: 2006
  ident: 10.1016/j.ins.2013.08.037_b0245
  article-title: Graph kernels and Gaussian Processes for relational reinforcement learning
  publication-title: Machine Learning
  doi: 10.1007/s10994-006-8258-y
– year: 2008
  ident: 10.1016/j.ins.2013.08.037_b0105
– ident: 10.1016/j.ins.2013.08.037_b0110
– volume: 19
  start-page: 427
  issue: 1
  year: 2004
  ident: 10.1016/j.ins.2013.08.037_b0225
  article-title: Power systems stability control: reinforcement learning framework
  publication-title: IEEE Transactions on Power Systems
  doi: 10.1109/TPWRS.2003.821457
– volume: 10
  start-page: 1000
  issue: 3
  year: 1999
  ident: 10.1016/j.ins.2013.08.037_b0505
  article-title: Input space vs feature space in kernel-based algorithms
  publication-title: IEEE Transactions on Neural Networks
  doi: 10.1109/72.788641
– volume: 16
  start-page: 227
  issue: 3
  year: 1994
  ident: 10.1016/j.ins.2013.08.037_b0535
  article-title: An upper bound on the loss from approximate optimal value functions
  publication-title: Machine Learning
  doi: 10.1023/A:1022693225949
– ident: 10.1016/j.ins.2013.08.037_b0270
  doi: 10.7551/mitpress/7503.003.0062
– ident: 10.1016/j.ins.2013.08.037_b0610
– volume: 47
  start-page: 1556
  issue: 8
  year: 2011
  ident: 10.1016/j.ins.2013.08.037_b0630
  article-title: Multi-player non-zero-sum games: online adaptive learning solution of coupled Hamilton–Jacobi equations
  publication-title: Automatica
  doi: 10.1016/j.automatica.2011.03.005
– ident: 10.1016/j.ins.2013.08.037_b0165
– volume: 6
  issue: NIPS 1994
  year: 1994
  ident: 10.1016/j.ins.2013.08.037_b0120
  article-title: Packet routing in dynamically changing networks: a reinforcement learning approach
  publication-title: Advances in neural information processing systems
– volume: 29
  start-page: 291
  issue: 5
  year: 1997
  ident: 10.1016/j.ins.2013.08.037_b0100
  article-title: Stochastic approximation with two time scales
  publication-title: Systems & Control Letters
  doi: 10.1016/S0167-6911(97)90015-3
– volume: vol. 22
  year: 2010
  ident: 10.1016/j.ins.2013.08.037_b0360
  article-title: Convergent temporal-difference learning with arbitrary smooth function approximation
– volume: 6
  start-page: 185
  issue: 6
  year: 1994
  ident: 10.1016/j.ins.2013.08.037_b0300
  article-title: On the convergence of stochastic iterative dynamic programming algorithms
  publication-title: Neural Computation
  doi: 10.1162/neco.1994.6.6.1185
– ident: 10.1016/j.ins.2013.08.037_b0605
– volume: 112
  start-page: 181
  year: 1999
  ident: 10.1016/j.ins.2013.08.037_b0585
  article-title: Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning
  publication-title: Artificial Intelligence
  doi: 10.1016/S0004-3702(99)00052-1
– volume: 15
  start-page: 1055
  issue: 6
  year: 2011
  ident: 10.1016/j.ins.2013.08.037_b0730
  article-title: Continuous-action reinforcement learning with fast policy search and adaptive basis function selection
  publication-title: Soft Computing – A Fusion of Foundations, Methodologies and Applications
– volume: 72
  start-page: 3447
  year: 2009
  ident: 10.1016/j.ins.2013.08.037_b0515
  article-title: Predicting investment behavior: an augmented reinforcement learning model
  publication-title: Neurocomputing
  doi: 10.1016/j.neucom.2008.11.031
– volume: 43
  start-page: 473
  year: 2007
  ident: 10.1016/j.ins.2013.08.037_b0010
  article-title: Model-free Q-learning designs for linear discrete-time zero-sum games with application to H-infinity control
  publication-title: Automatica
  doi: 10.1016/j.automatica.2006.09.019
– volume: 24
  start-page: 762
  issue: 5
  year: 2013
  ident: 10.1016/j.ins.2013.08.037_b0735
  article-title: Online learning control using adaptive critic designs with sparse kernel machines
  publication-title: IEEE Transactions on Neural Networks and Learning Systems
  doi: 10.1109/TNNLS.2012.2236354
– volume: 59
  start-page: 1823
  issue: 4
  year: 2010
  ident: 10.1016/j.ins.2013.08.037_b0240
  article-title: Distributed Q-Learning for aggregated interference control in cognitive radio networks
  publication-title: IEEE Transactions on Vehicular Technology
  doi: 10.1109/TVT.2010.2043124
– start-page: 441
  year: 2008
  ident: 10.1016/j.ins.2013.08.037_b0235
  article-title: Regularized policy iteration
  publication-title: NIPS
– ident: 10.1016/j.ins.2013.08.037_b0190
  doi: 10.1007/3-540-44914-0_2
– year: 2006
  ident: 10.1016/j.ins.2013.08.037_b0015
  article-title: Adaptive critic designs for discrete-time zero-sum games with application to H-Infinity control
  publication-title: IEEE Transactions on Systems Man Cybernetics-Part B
– volume: 8
  issue: NIPS 1995
  year: 1996
  ident: 10.1016/j.ins.2013.08.037_b0160
  article-title: Improving elevator performance using reinforcement learning
  publication-title: Advances in Neural Information Processing Systems
– start-page: 1531
  year: 2002
  ident: 10.1016/j.ins.2013.08.037_b0320
  article-title: A natural policy gradient
  publication-title: Advances in Neural Information Processing Systems
– volume: 22
  start-page: 33
  year: 1996
  ident: 10.1016/j.ins.2013.08.037_b0135
  article-title: Linear least-squares algorithms for temporal difference learning
  publication-title: Machine Learning
  doi: 10.1023/A:1018056104778
– volume: 21
  start-page: 1744
  issue: 4
  year: 2006
  ident: 10.1016/j.ins.2013.08.037_b0400
  article-title: Adaptive critic design based neuro-fuzzy controller for a static compensator in a multimachine power system
  publication-title: IEEE Transactions on Power Systems
  doi: 10.1109/TPWRS.2006.882467
– volume: 13
  start-page: 764
  issue: 3
  year: 2002
  ident: 10.1016/j.ins.2013.08.037_b0640
  article-title: Comparison of heuristic dynamic programming and dual heuristic programming adaptive critics for neurocontrol of a turbogenerator
  publication-title: IEEE Transactions on Neural Networks
  doi: 10.1109/TNN.2002.1000146
– ident: 10.1016/j.ins.2013.08.037_b0330
– volume: 8
  start-page: 997
  issue: 5
  year: 1997
  ident: 10.1016/j.ins.2013.08.037_b0470
  article-title: Adaptive critic designs
  publication-title: IEEE Transactions Neural Networks
  doi: 10.1109/72.623201
– ident: 10.1016/j.ins.2013.08.037_b0030
– start-page: 30
  year: 1995
  ident: 10.1016/j.ins.2013.08.037_b0055
  article-title: Residual algorithms: reinforcement learning with function approximation
– volume: 26
  start-page: 1272
  issue: 3
  year: 2011
  ident: 10.1016/j.ins.2013.08.037_b0750
  article-title: Stochastic optimal relaxed automatic generation control in non-Markov environment based on multi-step Q(λ) learning
  publication-title: IEEE Transactions on Power Systems
  doi: 10.1109/TPWRS.2010.2102372
– ident: 10.1016/j.ins.2013.08.037_b0450
  doi: 10.1109/IROS.2006.282564
StartPage 1
SubjectTerms Approximate dynamic programming
Function approximation
Generalization
Learning control
Reinforcement learning
Title Reinforcement learning algorithms with function approximation: Recent advances and applications
URI https://dx.doi.org/10.1016/j.ins.2013.08.037
Volume 261