Kernel-Based Reinforcement Learning

Bibliographic Details
Published in: Machine Learning, Vol. 49, no. 2-3, pp. 161-178
Main Authors: Ormoneit, Dirk; Sen, Śaunak
Format: Journal Article
Language: English
Published: Dordrecht, Springer Nature B.V., 2002-11-01
Subjects: Mathematical models; Studies
ISSN: 0885-6125, 1573-0565
Abstract: We present a kernel-based approach to reinforcement learning that overcomes the stability problems of temporal-difference learning in continuous state-spaces. First, our algorithm converges to a unique solution of an approximate Bellman's equation regardless of its initialization values. Second, the method is consistent in the sense that the resulting policy converges asymptotically to the optimal policy. Parametric value function estimates such as neural networks do not possess this property. Our kernel-based approach also allows us to show that the limiting distribution of the value function estimate is a Gaussian process. This information is useful in studying the bias-variance tradeoff in reinforcement learning. We find that all reinforcement learning approaches to estimating the value function, parametric or non-parametric, are subject to a bias. This bias is typically larger in reinforcement learning than in a comparable regression problem.
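Illustrative note (not part of the published record): the first claim in the abstract, convergence to a unique solution regardless of initialization, can be sketched on toy data. The code below is a minimal sketch and not the authors' implementation. It assumes a discounted setting, a 1-D state space, a normalized Gaussian kernel, and hypothetical names (`kernel_weights`, `kernel_value_iteration`, `TOY_SAMPLES`) chosen only for this example. Because the kernel weights are nonnegative and sum to one, the averaged backup is a contraction with modulus `gamma`, which is why different starting values should reach the same fixed point.

```python
import math


def kernel_weights(query, starts, bandwidth):
    """Normalized Gaussian weights of `query` against sampled start states."""
    raw = [math.exp(-((query - s) / bandwidth) ** 2) for s in starts]
    total = sum(raw)  # assumes some start state lies within a few bandwidths
    return [w / total for w in raw]


def kernel_value_iteration(samples, gamma=0.9, bandwidth=0.5, v0=0.0,
                           tol=1e-10, max_iter=10_000):
    """Iterate a kernel-averaged Bellman backup on sampled transitions.

    `samples` maps each action to a list of (start, reward, successor)
    triples. Returns {(action, index): value at that sampled successor}.
    """
    keys = [(a, i) for a, trans in samples.items() for i in range(len(trans))]
    succ = {(a, i): samples[a][i][2] for (a, i) in keys}
    # The weights depend only on the data, so they are computed once.
    weights = {
        (k, b): kernel_weights(succ[k], [t[0] for t in trans], bandwidth)
        for k in keys for b, trans in samples.items()
    }
    values = {k: v0 for k in keys}
    for _ in range(max_iter):
        new_values = {}
        for k in keys:
            backups = []
            for b, trans in samples.items():
                w = weights[(k, b)]
                # Weighted average of reward plus discounted successor value.
                backups.append(sum(
                    w[j] * (trans[j][1] + gamma * values[(b, j)])
                    for j in range(len(trans))
                ))
            new_values[k] = max(backups)  # greedy over actions
        change = max(abs(new_values[k] - values[k]) for k in keys)
        values = new_values
        if change < tol:
            break
    return values


# Hypothetical toy data: moving right from near 0.8 earns a reward of 1.
TOY_SAMPLES = {
    "left": [(0.2, 0.0, 0.1), (0.5, 0.0, 0.4), (0.8, 0.0, 0.7)],
    "right": [(0.2, 0.0, 0.3), (0.5, 0.0, 0.6), (0.8, 1.0, 0.9)],
}
```

Running `kernel_value_iteration(TOY_SAMPLES, v0=0.0)` and `kernel_value_iteration(TOY_SAMPLES, v0=100.0)` should give matching values up to the tolerance; the consistency and Gaussian-process results in the abstract are asymptotic statements that this sketch does not attempt to demonstrate.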
Copyright: Kluwer Academic Publishers, 2002
DOI: 10.1023/A:1017928328829
Discipline: Computer Science
Open Access Link: https://link.springer.com/content/pdf/10.1023/A:1017928328829.pdf