Kernel-Based Reinforcement Learning
| Published in: | Machine learning Vol. 49; no. 2-3; pp. 161 - 178 |
|---|---|
| Main Authors: | Ormoneit, Dirk; Sen, Śaunak |
| Format: | Journal Article |
| Language: | English |
| Published: | Dordrecht, Springer Nature B.V, 01.11.2002 |
| Subjects: | Mathematical models; Studies |
| ISSN: | 0885-6125, 1573-0565 |
| Abstract | We present a kernel-based approach to reinforcement learning that overcomes the stability problems of temporal-difference learning in continuous state spaces. First, our algorithm converges to a unique solution of an approximate Bellman equation regardless of its initialization values. Second, the method is consistent in the sense that the resulting policy converges asymptotically to the optimal policy. Parametric value function estimates such as neural networks do not possess this property. Our kernel-based approach also allows us to show that the limiting distribution of the value function estimate is a Gaussian process. This information is useful in studying the bias-variance tradeoff in reinforcement learning. We find that all reinforcement learning approaches to estimating the value function, parametric or non-parametric, are subject to a bias. This bias is typically larger in reinforcement learning than in a comparable regression problem. |
|---|---|
| Author | Ormoneit, Dirk; Sen, Śaunak |
| Author_xml | – sequence: 1 givenname: Dirk surname: Ormoneit fullname: Ormoneit, Dirk – sequence: 2 givenname: Śaunak surname: Sen fullname: Sen, Śaunak |
| Cites_doi | 10.1162/neco.1989.1.3.321 10.1016/0893-6080(90)90088-3 10.1111/0022-1082.00162 10.2307/2171751 10.21236/ADA280844 10.1007/978-1-4612-0711-5 10.1109/9.793723 10.1109/TAC.2002.803530 10.1002/9780470316887 10.1023/A:1006511328852 10.1016/0022-1236(75)90056-7 10.1214/aoms/1177729586 10.1214/aos/1176345969 |
| ContentType | Journal Article |
| Copyright | Kluwer Academic Publishers 2002 |
| Copyright_xml | – notice: Kluwer Academic Publishers 2002 |
| DOI | 10.1023/A:1017928328829 |
| DatabaseName | CrossRef ProQuest Central (Corporate) Computer and Information Systems Abstracts ProQuest Central (purchase pre-March 2016) Science Database (Alumni Edition) Computing Database (Alumni Edition) ProQuest Pharma Collection Technology Research Database ProQuest SciTech Collection ProQuest Technology Collection ProQuest Central (Alumni) (purchase pre-March 2016) ProQuest Central (Alumni) ProQuest Central UK/Ireland Advanced Technologies & Computer Science Collection ProQuest Central Essentials - QC ProQuest Central ProQuest Technology Collection ProQuest One ProQuest Central Korea ProQuest Central Student SciTech Premium Collection ProQuest Computer Science Collection Computer Science Database (ProQuest) Advanced Technologies Database with Aerospace Computer and Information Systems Abstracts Academic Computer and Information Systems Abstracts Professional Computing Database Science Database (ProQuest) Advanced Technologies & Aerospace Database ProQuest Advanced Technologies & Aerospace Collection ProQuest Central Premium ProQuest One Academic (New) ProQuest One Academic Middle East (New) ProQuest One Academic Eastern Edition (DO NOT USE) ProQuest One Applied & Life Sciences ProQuest One Academic (retired) ProQuest One Academic UKI Edition ProQuest Central China ProQuest Central Basic |
| DatabaseTitle | CrossRef Computer Science Database ProQuest Central Student Technology Collection Technology Research Database Computer and Information Systems Abstracts – Academic ProQuest One Academic Middle East (New) ProQuest Advanced Technologies & Aerospace Collection ProQuest Central Essentials ProQuest Computer Science Collection Computer and Information Systems Abstracts ProQuest Central (Alumni Edition) SciTech Premium Collection ProQuest One Community College ProQuest Pharma Collection ProQuest Central China ProQuest Central ProQuest One Applied & Life Sciences ProQuest Central Korea ProQuest Central (New) Advanced Technologies Database with Aerospace Advanced Technologies & Aerospace Collection ProQuest Computing ProQuest Science Journals (Alumni Edition) ProQuest Central Basic ProQuest Science Journals ProQuest Computing (Alumni Edition) ProQuest One Academic Eastern Edition ProQuest Technology Collection ProQuest SciTech Collection Computer and Information Systems Abstracts Professional Advanced Technologies & Aerospace Database ProQuest One Academic UKI Edition ProQuest One Academic ProQuest Central (Alumni) ProQuest One Academic (New) |
| DatabaseTitleList | Computer and Information Systems Abstracts Computer Science Database |
| Database_xml | – sequence: 1 dbid: BENPR name: ProQuest Central url: https://www.proquest.com/central sourceTypes: Aggregation Database |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Computer Science |
| EISSN | 1573-0565 |
| EndPage | 178 |
| ExternalDocumentID | 2157618351 10_1023_A_1017928328829 |
| Genre | Feature |
| ID | FETCH-LOGICAL-c338t-6fa7d36626f48e048c7d74a63f72bf7a47ede07de3a2a79e5557e6aa5c22050f3 |
| IEDL.DBID | M2P |
| ISICitedReferencesCount | 278 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000173841100004&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| ISSN | 0885-6125 |
| IngestDate | Sun Nov 09 11:42:22 EST 2025 Tue Nov 04 17:03:27 EST 2025 Tue Nov 18 22:17:25 EST 2025 Sat Nov 29 07:46:04 EST 2025 |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 2-3 |
| Language | English |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-c338t-6fa7d36626f48e048c7d74a63f72bf7a47ede07de3a2a79e5557e6aa5c22050f3 |
| OpenAccessLink | https://link.springer.com/content/pdf/10.1023/A:1017928328829.pdf |
| PQID | 757029715 |
| PQPubID | 54194 |
| PageCount | 18 |
| ParticipantIDs | proquest_miscellaneous_27166561 proquest_journals_757029715 crossref_primary_10_1023_A_1017928328829 crossref_citationtrail_10_1023_A_1017928328829 |
| PublicationCentury | 2000 |
| PublicationDate | 2002-11-01 |
| PublicationDateYYYYMMDD | 2002-11-01 |
| PublicationDate_xml | – month: 11 year: 2002 text: 2002-11-01 day: 01 |
| PublicationDecade | 2000 |
| PublicationPlace | Dordrecht |
| PublicationPlace_xml | – name: Dordrecht |
| PublicationTitle | Machine learning |
| PublicationYear | 2002 |
| Publisher | Springer Nature B.V |
| Publisher_xml | – name: Springer Nature B.V |
| SSID | ssj0002686 |
| Score | 2.2741778 |
| SourceID | proquest crossref |
| SourceType | Aggregation Database Enrichment Source Index Database |
| StartPage | 161 |
| SubjectTerms | Mathematical models Studies |
| Title | Kernel-Based Reinforcement Learning |
| URI | https://www.proquest.com/docview/757029715 https://www.proquest.com/docview/27166561 |
| Volume | 49 |
| WOSCitedRecordID | wos000173841100004&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| journalDatabaseRights | – providerCode: PRVPQU databaseName: Advanced Technologies & Aerospace Database customDbUrl: eissn: 1573-0565 dateEnd: 20171231 omitProxy: false ssIdentifier: ssj0002686 issn: 0885-6125 databaseCode: P5Z dateStart: 19970101 isFulltext: true titleUrlDefault: https://search.proquest.com/hightechjournals providerName: ProQuest – providerCode: PRVPQU databaseName: Computer Science Database customDbUrl: eissn: 1573-0565 dateEnd: 20171231 omitProxy: false ssIdentifier: ssj0002686 issn: 0885-6125 databaseCode: K7- dateStart: 19970101 isFulltext: true titleUrlDefault: http://search.proquest.com/compscijour providerName: ProQuest – providerCode: PRVPQU databaseName: ProQuest Central customDbUrl: eissn: 1573-0565 dateEnd: 20171231 omitProxy: false ssIdentifier: ssj0002686 issn: 0885-6125 databaseCode: BENPR dateStart: 19970101 isFulltext: true titleUrlDefault: https://www.proquest.com/central providerName: ProQuest – providerCode: PRVPQU databaseName: Science Database (ProQuest) customDbUrl: eissn: 1573-0565 dateEnd: 20171231 omitProxy: false ssIdentifier: ssj0002686 issn: 0885-6125 databaseCode: M2P dateStart: 19970101 isFulltext: true titleUrlDefault: https://search.proquest.com/sciencejournals providerName: ProQuest – providerCode: PRVAVX databaseName: SpringerLINK Contemporary 1997-Present customDbUrl: eissn: 1573-0565 dateEnd: 99991231 omitProxy: false ssIdentifier: ssj0002686 issn: 0885-6125 databaseCode: RSV dateStart: 19970101 isFulltext: true titleUrlDefault: https://link.springer.com/search?facet-content-type=%22Journal%22 providerName: Springer Nature |
| openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Kernel-Based+Reinforcement+Learning&rft.jtitle=Machine+learning&rft.au=Ormoneit%2C+Dirk&rft.au=Sen%2C+%C5%9Aaunak&rft.date=2002-11-01&rft.issn=0885-6125&rft.eissn=1573-0565&rft.volume=49&rft.issue=2-3&rft.spage=161&rft.epage=178&rft_id=info:doi/10.1023%2FA%3A1017928328829&rft.externalDBID=n%2Fa&rft.externalDocID=10_1023_A_1017928328829 |
| thumbnail_l | http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/lc.gif&issn=0885-6125&client=summon |
| thumbnail_m | http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/mc.gif&issn=0885-6125&client=summon |
| thumbnail_s | http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/sc.gif&issn=0885-6125&client=summon |
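
The abstract's first claim, convergence to a unique solution of an approximate Bellman equation regardless of initialization, can be illustrated with a small sketch. The code below is not taken from the article: the toy one-dimensional system, the reward, the Gaussian kernel, the bandwidth, the discount factor, and all function names are assumptions made for illustration. It replaces the expectation in the Bellman backup with a normalized kernel average over sampled transitions and iterates from two different starting points.

```python
import math
import random

GAMMA = 0.9          # discount factor (assumed for this sketch)
BANDWIDTH = 0.1      # Gaussian kernel bandwidth (assumed)
ACTIONS = (-1.0, 1.0)


def sample_transitions(n_per_action, rng):
    """Draw (state, reward, next_state) triples from a hypothetical 1-D system on [0, 1]."""
    data = {}
    for a in ACTIONS:
        triples = []
        for _ in range(n_per_action):
            x = rng.random()
            y = min(1.0, max(0.0, x + 0.1 * a + rng.gauss(0.0, 0.02)))
            r = -(y - 0.7) ** 2  # toy reward: stay close to 0.7
            triples.append((x, r, y))
        data[a] = triples
    return data


def kernel_weights(query, centers):
    """Normalized Gaussian weights: non-negative and summing to one."""
    raw = [math.exp(-((query - c) / BANDWIDTH) ** 2) for c in centers]
    total = sum(raw)
    return [w / total for w in raw]


def build_weights(data):
    """Precompute kernel weights at every sampled successor state."""
    successors = [y for a in ACTIONS for (_, _, y) in data[a]]
    weights = {
        a: [kernel_weights(q, [x for (x, _, _) in data[a]]) for q in successors]
        for a in ACTIONS
    }
    return successors, weights


def backup(values, data, weights):
    """One application of the kernel-averaged Bellman operator."""
    offsets, start = {}, 0
    for a in ACTIONS:
        offsets[a] = start
        start += len(data[a])
    new_values = []
    for j in range(len(values)):
        best = -math.inf
        for a in ACTIONS:
            total = 0.0
            for i, (_, r, _) in enumerate(data[a]):
                total += weights[a][j][i] * (r + GAMMA * values[offsets[a] + i])
            best = max(best, total)
        new_values.append(best)
    return new_values


def solve(initial, data, weights, tol=1e-9, max_iter=10_000):
    """Iterate the backup until successive iterates differ by less than tol."""
    values = list(initial)
    for _ in range(max_iter):
        updated = backup(values, data, weights)
        change = max(abs(u - v) for u, v in zip(updated, values))
        values = updated
        if change < tol:
            break
    return values


rng = random.Random(0)
data = sample_transitions(30, rng)
successors, weights = build_weights(data)
v_zero = solve([0.0] * len(successors), data, weights)
v_rand = solve([rng.uniform(-50.0, 50.0) for _ in successors], data, weights)
gap = max(abs(p - q) for p, q in zip(v_zero, v_rand))
print(f"largest difference between the two solutions: {gap:.2e}")
```

Because the kernel weights are non-negative and sum to one, the averaged backup is a contraction in the sup norm with modulus `GAMMA`, which is why both runs in this sketch settle on the same values. The consistency and bias results mentioned in the abstract are not demonstrated here.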