Reinforcement learning algorithms with function approximation: Recent advances and applications
Saved in:
| Published in: | Information Sciences, Vol. 261, pp. 1–31 |
|---|---|
| Main Authors: | Xu, Xin; Zuo, Lei; Huang, Zhenhua |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier Inc, 10 March 2014 |
| Keywords: | Approximate dynamic programming; Function approximation; Learning control; Generalization; Reinforcement learning |
| ISSN: | 0020-0255 (print); 1872-6291 (online) |
| DOI: | 10.1016/j.ins.2013.08.037 |
| Online Access: | Full text |
| Abstract | In recent years, research on reinforcement learning (RL) has focused on function approximation for learning prediction and learning control in Markov decision processes (MDPs). Function approximation techniques are essential for RL to cope with MDPs that have large or continuous state and action spaces. This paper gives a comprehensive survey of recent developments in RL algorithms with function approximation. From a theoretical point of view, the convergence and feature representation of RL algorithms are analyzed. From an empirical point of view, the performance of different RL algorithms is evaluated and compared on several benchmark learning prediction and learning control tasks. Applications of RL with function approximation are also discussed, and directions for future work are suggested. |
|---|---|
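To make the abstract's topic concrete, the following is a minimal sketch, not taken from the paper, of learning prediction with linear function approximation: semi-gradient TD(0) on the classic five-state random-walk MDP. The environment, feature map, step size, and episode count are all illustrative assumptions.

```python
# Sketch: semi-gradient TD(0) prediction with linear function approximation,
# V(s) = w . phi(s), on a toy 5-state random walk (illustrative assumptions,
# not the paper's setup).
import numpy as np

rng = np.random.default_rng(0)

N_STATES = 5   # non-terminal states 0..4; walks terminate off either end
GAMMA = 1.0    # undiscounted episodic task
ALPHA = 0.05   # step size

def features(s):
    """One-hot feature vector phi(s); any feature map could be plugged in."""
    phi = np.zeros(N_STATES)
    phi[s] = 1.0
    return phi

w = np.zeros(N_STATES)  # weight vector of the linear value approximator

for episode in range(2000):
    s = N_STATES // 2                          # start in the middle state
    while True:
        s_next = s + (1 if rng.random() < 0.5 else -1)
        if s_next < 0:                         # terminate left: reward 0
            r, done = 0.0, True
        elif s_next >= N_STATES:               # terminate right: reward +1
            r, done = 1.0, True
        else:
            r, done = 0.0, False
        v = w @ features(s)
        v_next = 0.0 if done else w @ features(s_next)
        td_error = r + GAMMA * v_next - v      # one-step TD error
        w += ALPHA * td_error * features(s)    # semi-gradient TD(0) update
        if done:
            break
        s = s_next

print("learned values:", np.round(w, 3))       # true values: 1/6, 2/6, ..., 5/6
```

With one-hot features this reduces to tabular TD(0); replacing `features` with tile coding, radial basis functions, or kernel features is what gives the generalization over large or continuous state spaces that the survey analyzes.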
Intelligence Research doi: 10.1613/jair.639 – start-page: 298 year: 1993 ident: 10.1016/j.ins.2013.08.037_b0510 article-title: A reinforcement learning method for maximizing undiscounted rewards – volume: 22 start-page: 1863 issue: 12 year: 2011 ident: 10.1016/j.ins.2013.08.037_b0725 article-title: Hierarchical approximate policy iteration with binary-tree state space decomposition publication-title: IEEE Transactions on Neural Networks doi: 10.1109/TNN.2011.2168422 – volume: 13 start-page: 834 issue: 5 year: 1983 ident: 10.1016/j.ins.2013.08.037_b0080 article-title: Neuron-like adaptive elements that can solve difficult learning control problems publication-title: IEEE Transactions on Systems, Man, and Cybernetics doi: 10.1109/TSMC.1983.6313077 – ident: 10.1016/j.ins.2013.08.037_b0390 – ident: 10.1016/j.ins.2013.08.037_b0690 – volume: 155 start-page: 654 year: 2004 ident: 10.1016/j.ins.2013.08.037_b0255 article-title: Reinforcement learning for long-run average cost publication-title: European Journal of Operational Research doi: 10.1016/S0377-2217(02)00874-3 – volume: 27 start-page: 1536 issue: 10 year: 2005 ident: 10.1016/j.ins.2013.08.037_b0745 article-title: Integrating relevance feedback techniques for image retrieval using reinforcement learning publication-title: IEEE Transactions on Pattern Analysis and Machine Intelligence doi: 10.1109/TPAMI.2005.201 – volume: 22 start-page: 2226 issue: 12 year: 2011 ident: 10.1016/j.ins.2013.08.037_b0755 article-title: Data-driven robust approximate optimal tracking control for unknown general nonlinear systems using adaptive dynamic programming method publication-title: IEEE Transactions on Neural Networks doi: 10.1109/TNN.2011.2168538 – volume: 22 start-page: 906 issue: 6 year: 2011 ident: 10.1016/j.ins.2013.08.037_b0295 article-title: Transformation invariant on-line target recognition publication-title: IEEE Transactions on Neural Networks doi: 10.1109/TNN.2011.2132737 – volume: 15 start-page: 319 year: 2001 ident: 10.1016/j.ins.2013.08.037_b0085 article-title: Infinite-horizon policy-gradient estimation publication-title: Journal of Artificial Intelligence Research doi: 10.1613/jair.806 – volume: 19 start-page: 893 issue: 4 year: 1996 ident: 10.1016/j.ins.2013.08.037_b0065 article-title: Adaptive-critic-based neural networks for aircraft optimal control publication-title: Journal of Guidance, Control, Dynamics doi: 10.2514/3.21715 – start-page: 1019 year: 2003 ident: 10.1016/j.ins.2013.08.037_b0050 article-title: Covariant policy search – volume: 13 start-page: 79 issue: 1 year: 2003 ident: 10.1016/j.ins.2013.08.037_b0410 article-title: Least squares policy evaluation algorithms with linear function approximation publication-title: Discrete Event Dynamic Systems doi: 10.1023/A:1022192903948 – volume: 30 start-page: 416 issue: 2 year: 2005 ident: 10.1016/j.ins.2013.08.037_b0150 article-title: A behavior-based scheme using reinforcement learning for autonomous underwater vehicles publication-title: IEEE Journal of Oceanic Engineering doi: 10.1109/JOE.2004.835805 – issue: NIPS 2008 year: 2008 ident: 10.1016/j.ins.2013.08.037_b0325 article-title: Policy search for motor primitives in robotics publication-title: Advances in Neural Information Processing Systems – volume: 65 start-page: 167 year: 2006 ident: 10.1016/j.ins.2013.08.037_b0260 article-title: Adaptive stepsizes for recursive estimation with applications in approximate dynamic programming publication-title: Machine Learning doi: 10.1007/s10994-006-8365-9 – volume: 177 start-page: 3764 
issue: 18 year: 2007 ident: 10.1016/j.ins.2013.08.037_b0660 article-title: A fuzzy actor–critic reinforcement learning network publication-title: Information Sciences doi: 10.1016/j.ins.2007.03.012 – volume: 8 start-page: 2629 year: 2007 ident: 10.1016/j.ins.2013.08.037_b0275 article-title: Hierarchical average reward reinforcement learning publication-title: Journal of Machine Learning Research – volume: 16 start-page: 259 year: 2002 ident: 10.1016/j.ins.2013.08.037_b0705 article-title: Efficient reinforcement learning using recursive least-squares methods publication-title: Journal of Artificial Intelligence Research doi: 10.1613/jair.946 – volume: 16 start-page: 1219 issue: 5 year: 2005 ident: 10.1016/j.ins.2013.08.037_b0355 article-title: A self-learning call admission control scheme for CDMA cellular networks publication-title: IEEE Transactions on Neural Networks doi: 10.1109/TNN.2005.853408 – ident: 10.1016/j.ins.2013.08.037_b0620 – volume: 11 start-page: 54 issue: 9 year: 2005 ident: 10.1016/j.ins.2013.08.037_b0710 article-title: Kernel least-squares temporal difference learning publication-title: International Journal of Information Technology – start-page: 1043 year: 1998 ident: 10.1016/j.ins.2013.08.037_b0430 article-title: Reinforcement learning with hierarchies of machines – ident: 10.1016/j.ins.2013.08.037_b0740 doi: 10.1109/ICCW.2010.5503970 – volume: 76 start-page: 243 year: 2009 ident: 10.1016/j.ins.2013.08.037_b0315 article-title: Hybrid least-squares algorithms for approximate policy evaluation publication-title: Machine Learning doi: 10.1007/s10994-009-5128-4 – year: 1998 ident: 10.1016/j.ins.2013.08.037_b0550 – volume: 38 issue: 4 year: 2008 ident: 10.1016/j.ins.2013.08.037_b0345 article-title: Special issue on approximate dynamic programming and reinforcement learning for feedback control publication-title: IEEE Transactions on Systems, Man, and Cybernetics B doi: 10.1109/TSMCB.2008.925890 – volume: 9 start-page: 32 issue: 3 year: 2009 ident: 10.1016/j.ins.2013.08.037_b0350 article-title: Reinforcement learning and adaptive dynamic programming for feedback control publication-title: IEEE Circuits and Systems Magazine doi: 10.1109/MCAS.2009.933854 – volume: 3 start-page: 43 issue: 1 year: 2011 ident: 10.1016/j.ins.2013.08.037_b0385 article-title: Reinforcement learning in first person shooter games publication-title: IEEE Transactions on Computational Intelligence and AI in Games doi: 10.1109/TCIAIG.2010.2100395 – volume: 3 start-page: 1 year: 2002 ident: 10.1016/j.ins.2013.08.037_b0040 article-title: Kernel independent component analysis publication-title: Journal of Machine Learning Research – ident: 10.1016/j.ins.2013.08.037_b0250 doi: 10.1007/978-3-540-45167-9_11 – ident: 10.1016/j.ins.2013.08.037_b0280 – volume: 19 start-page: 1225 issue: 3 year: 2004 ident: 10.1016/j.ins.2013.08.037_b0645 article-title: Reinforcement learning for reactive power control publication-title: IEEE Transactions on Power Systems doi: 10.1109/TPWRS.2004.831259 – ident: 10.1016/j.ins.2013.08.037_b0285 doi: 10.1007/978-3-540-76928-6_8 – volume: 3 start-page: 211 year: 1959 ident: 10.1016/j.ins.2013.08.037_b0495 article-title: Some studies in machine learning using game of checkers publication-title: IBM Jounal on Research and Development – volume: 8 start-page: 341 year: 1992 ident: 10.1016/j.ins.2013.08.037_b0170 article-title: The convergence of TD(λ) for general λ publication-title: Machine Learning doi: 10.1023/A:1022632907294 – volume: 4 start-page: 128 issue: 2 year: 2010 ident: 
10.1016/j.ins.2013.08.037_b0035 article-title: Reinforcement learning-based multi-agent system for network traffic signal control publication-title: IET Intelligent Transport Systems doi: 10.1049/iet-its.2009.0070 – year: 2009 ident: 10.1016/j.ins.2013.08.037_b0145 – year: 2002 ident: 10.1016/j.ins.2013.08.037_b0500 – ident: 10.1016/j.ins.2013.08.037_b0685 doi: 10.1109/ADPRL.2007.368190 – start-page: 200 year: 2009 ident: 10.1016/j.ins.2013.08.037_b0680 article-title: Intelligence in the brain: a theory of how it works and how to build it publication-title: Neural Networks doi: 10.1016/j.neunet.2009.03.012 – volume: 14 start-page: 929 issue: 4 year: 2003 ident: 10.1016/j.ins.2013.08.037_b0215 article-title: Helicopter trimming and tracking control using direct neural dynamic programming publication-title: IEEE Transactions on Neural Networks doi: 10.1109/TNN.2003.813839 – ident: 10.1016/j.ins.2013.08.037_b0490 doi: 10.7551/mitpress/7503.003.0151 – year: 2004 ident: 10.1016/j.ins.2013.08.037_b0070 article-title: Reinforcement learning and its relationship to supervised learning – volume: 30 start-page: 54 issue: 1 year: 2012 ident: 10.1016/j.ins.2013.08.037_b0765 article-title: Reinforcement learning for repeated power control game in cognitive radio networks publication-title: IEEE Journal on Selected Areas in Communications doi: 10.1109/JSAC.2012.120106 – ident: 10.1016/j.ins.2013.08.037_b0045 doi: 10.1109/ROBOT.2001.932842 – volume: 42 start-page: 674 issue: 5 year: 1997 ident: 10.1016/j.ins.2013.08.037_b0615 article-title: An analysis of temporal difference learning with function approximation publication-title: IEEE Transactions on Automatic Control doi: 10.1109/9.580874 – volume: 12 start-page: 19 issue: 2 year: 1992 ident: 10.1016/j.ins.2013.08.037_b0555 article-title: Reinforcement learning is direct adaptive control publication-title: IEEE Control Systems doi: 10.1109/37.126844 – volume: vol. 
2167 start-page: 97 year: 2001 ident: 10.1016/j.ins.2013.08.037_b0200 article-title: Speeding up relational reinforcement learning through the use of an incremental first order decision tree learner – ident: 10.1016/j.ins.2013.08.037_b0025 doi: 10.1109/ACC.2009.5160611 – year: 2010 ident: 10.1016/j.ins.2013.08.037_b0140 – volume: 145 start-page: 45 year: 2002 ident: 10.1016/j.ins.2013.08.037_b0770 article-title: Robot learning with GA-based fuzzy reinforcement learning agents publication-title: Information Sciences doi: 10.1016/S0020-0255(02)00223-2 – volume: 49 start-page: 161 issue: 2-3 year: 2002 ident: 10.1016/j.ins.2013.08.037_b0425 article-title: Kernel-based reinforcement learning publication-title: Machine Learning doi: 10.1023/A:1017928328829 – volume: 38 start-page: 287 year: 2000 ident: 10.1016/j.ins.2013.08.037_b0530 article-title: Convergence results for single-step on-policy reinforcement-learning algorithms publication-title: Machine Learning doi: 10.1023/A:1007678930559 – volume: 10 start-page: 859 issue: 3 year: 2010 ident: 10.1016/j.ins.2013.08.037_b0720 article-title: Sequential anomaly detection based on temporal-difference learning: principles, models and case studies publication-title: Applied Soft Computing doi: 10.1016/j.asoc.2009.10.003 – volume: 12 start-page: 412 issue: 2 year: 2011 ident: 10.1016/j.ins.2013.08.037_b0465 article-title: Reinforcement learning with function approximation for traffic signal control publication-title: IEEE Transactions on Intelligence Transportation Systems doi: 10.1109/TITS.2010.2091408 – volume: 18 start-page: 973 issue: 4 year: 2007 ident: 10.1016/j.ins.2013.08.037_b0715 article-title: Kernel based least-squares policy iteration for reinforcement learning publication-title: IEEE Transactions on Neural Networks doi: 10.1109/TNN.2007.899161 – ident: 10.1016/j.ins.2013.08.037_b0210 – ident: 10.1016/j.ins.2013.08.037_b0380 doi: 10.1109/ICASSP.2012.6288330 – volume: 45 start-page: 477 issue: 2 year: 2009 ident: 10.1016/j.ins.2013.08.037_b0650 article-title: Adaptive optimal control for continuous-time linear systems based on policy iteration publication-title: Automatica doi: 10.1016/j.automatica.2008.08.017 – volume: 33 start-page: 235 issue: 2–3 year: 1998 ident: 10.1016/j.ins.2013.08.037_b0155 article-title: Elevator group control using multiple reinforcement learning agents publication-title: Machine Learning doi: 10.1023/A:1007518724497 – start-page: 719 year: 2010 ident: 10.1016/j.ins.2013.08.037_b0365 article-title: Toward off-policy learning control with function approximation – volume: 13 start-page: 165 issue: 3 year: 2005 ident: 10.1016/j.ins.2013.08.037_b0545 article-title: Reinforcement learning for RoboCup-soccer keepaway publication-title: Adaptive Behavior doi: 10.1177/105971230501300301 – volume: 27 start-page: 135 year: 2011 ident: 10.1016/j.ins.2013.08.037_b0305 article-title: Reinforcement based mobile robot navigation in dynamic environment publication-title: Robotics and Computer-Integrated Manufacturing doi: 10.1016/j.rcim.2010.06.019 – volume: 4 start-page: 1107 year: 2003 ident: 10.1016/j.ins.2013.08.037_b0340 article-title: Least-squares policy iteration publication-title: Journal of Machine Learning Research – ident: 10.1016/j.ins.2013.08.037_b0485 – volume: 8 start-page: 279 year: 1992 ident: 10.1016/j.ins.2013.08.037_b0675 article-title: Q-Learning publication-title: Machine Learning – ident: 10.1016/j.ins.2013.08.037_b0760 – ident: 10.1016/j.ins.2013.08.037_b0580 – volume: 5 start-page: 1309 issue: 10 
year: 2011 ident: 10.1016/j.ins.2013.08.037_b0310 article-title: Efficient exploration in reinforcement learning-based cognitive radio spectrum sharing publication-title: IET Communication doi: 10.1049/iet-com.2010.0258 – volume: vol. 14 start-page: 1491 year: 2002 ident: 10.1016/j.ins.2013.08.037_b0180 article-title: Batch value function approximation via support vectors – volume: 22 start-page: 85 issue: 1 year: 2007 ident: 10.1016/j.ins.2013.08.037_b0405 article-title: A reinforcement learning model to assess market power under auction-based energy pricingm publication-title: IEEE Transactions on Power Systems doi: 10.1109/TPWRS.2006.888977 – volume: 9 start-page: 974 issue: NIPS 1996 year: 1997 ident: 10.1016/j.ins.2013.08.037_b0525 article-title: Reinforcement learning for dynamic channel allocation in cellular telephone systems publication-title: Advances in Neural Information Processsing Systems – volume: 6 start-page: 215 year: 1994 ident: 10.1016/j.ins.2013.08.037_b0600 article-title: TD-Gammon, a self-teaching backgammon program, achieves master-level play publication-title: Neural Computation doi: 10.1162/neco.1994.6.2.215 – volume: 176 start-page: 2121 issue: 15 year: 2006 ident: 10.1016/j.ins.2013.08.037_b0420 article-title: Adaptive stock trading with dynamic asset allocation using reinforcement learning publication-title: Information Sciences doi: 10.1016/j.ins.2005.10.009 – volume: SMC-3 start-page: 455 issue: 5 year: 1973 ident: 10.1016/j.ins.2013.08.037_b0695 article-title: Punish/reward: Learning with a critic in adaptive threshold systems publication-title: IEEE Transactions on Systems, Man, and Cybernetics doi: 10.1109/TSMC.1973.4309272 – volume: 16 issue: NIPS 2003 year: 2004 ident: 10.1016/j.ins.2013.08.037_b0415 article-title: Autonomous helicopter flight via reinforcement learning publication-title: Advances in Neural Information Processing Systems – year: 1998 ident: 10.1016/j.ins.2013.08.037_b0635 – ident: 10.1016/j.ins.2013.08.037_b0560 – start-page: 123 year: 2003 ident: 10.1016/j.ins.2013.08.037_b0205 article-title: Relational instance based regression for relational reinforcement learning – volume: 6 start-page: 13 issue: 4 year: 1999 ident: 10.1016/j.ins.2013.08.037_b0395 article-title: Cognitive radio: making software radios more personal publication-title: IEEE Personal Communications doi: 10.1109/98.788210 – year: 2008 ident: 10.1016/j.ins.2013.08.037_b0290 – year: 1983 ident: 10.1016/j.ins.2013.08.037_b0540 – ident: 10.1016/j.ins.2013.08.037_b0130 – volume: 6 start-page: 503 year: 2005 ident: 10.1016/j.ins.2013.08.037_b0220 article-title: Tree-based batch mode reinforcement learning publication-title: Journal of Machine Learning Research – year: 2007 ident: 10.1016/j.ins.2013.08.037_b0460 – volume: 4 start-page: 177 issue: 3 year: 2010 ident: 10.1016/j.ins.2013.08.037_b0060 article-title: Urban traffic signal control using reinforcement learning agents publication-title: IET Intelligent Transport Systems doi: 10.1049/iet-its.2009.0096 – volume: 8 start-page: 2169 year: 2007 ident: 10.1016/j.ins.2013.08.037_b0375 article-title: Proto-value functions: a laplacian framework for learning representation and control in markov decision processes publication-title: Journal of Machine Learning Research – year: 2010 ident: 10.1016/j.ins.2013.08.037_b0595 – volume: 57 start-page: 271 year: 2004 ident: 10.1016/j.ins.2013.08.037_b0195 article-title: Integrating guidance into relational reinforcement learning publication-title: Machine Learning doi: 
10.1023/B:MACH.0000039779.47329.3a – ident: 10.1016/j.ins.2013.08.037_b0570 doi: 10.1145/1553374.1553501 – volume: vol. 12 year: 2000 ident: 10.1016/j.ins.2013.08.037_b0335 article-title: Actor–critic algorithms – year: 2010 ident: 10.1016/j.ins.2013.08.037_b0700 – volume: 10 start-page: 251 issue: 2 year: 1998 ident: 10.1016/j.ins.2013.08.037_b0020 article-title: Natural gradient works efficiently in learning publication-title: Neural Computation doi: 10.1162/089976698300017746 – volume: 64 start-page: 91 issue: 1–3 year: 2006 ident: 10.1016/j.ins.2013.08.037_b0245 article-title: Graph kernels and Gaussian Processes for relational reinforcement learning publication-title: Machine Learning doi: 10.1007/s10994-006-8258-y – year: 2008 ident: 10.1016/j.ins.2013.08.037_b0105 – ident: 10.1016/j.ins.2013.08.037_b0110 – volume: 19 start-page: 427 issue: 1 year: 2004 ident: 10.1016/j.ins.2013.08.037_b0225 article-title: Power systems stability control: reinforcement learning framework publication-title: IEEE Transactions on Power Systems doi: 10.1109/TPWRS.2003.821457 – volume: 10 start-page: 1000 issue: 3 year: 1999 ident: 10.1016/j.ins.2013.08.037_b0505 article-title: Input space vs feature space in kernel-based algorithms publication-title: IEEE Transactions on Neural Networks doi: 10.1109/72.788641 – volume: 16 start-page: 227 issue: 3 year: 1994 ident: 10.1016/j.ins.2013.08.037_b0535 article-title: An upper bound on the loss from approximate optimal value functions publication-title: Machine Learning doi: 10.1023/A:1022693225949 – ident: 10.1016/j.ins.2013.08.037_b0270 doi: 10.7551/mitpress/7503.003.0062 – ident: 10.1016/j.ins.2013.08.037_b0610 – volume: 47 start-page: 1556 issue: 8 year: 2011 ident: 10.1016/j.ins.2013.08.037_b0630 article-title: Multi-player non-zero-sum games: online adaptive learning solution of coupled Hamilton–Jacobi equations publication-title: Automatica doi: 10.1016/j.automatica.2011.03.005 – ident: 10.1016/j.ins.2013.08.037_b0165 – volume: 6 issue: NIPS 1994 year: 1994 ident: 10.1016/j.ins.2013.08.037_b0120 article-title: Packet routing in dynamically changing networks: a reinforcement learning approach publication-title: Advances in neural information processing systems – volume: 29 start-page: 291 issue: 5 year: 1997 ident: 10.1016/j.ins.2013.08.037_b0100 article-title: Stochastic approximation with two time scales publication-title: Systems & Control Letters doi: 10.1016/S0167-6911(97)90015-3 – volume: vol. 
22 year: 2010 ident: 10.1016/j.ins.2013.08.037_b0360 article-title: Convergent temporal-difference learning with arbitrary smooth function approximation – volume: 6 start-page: 185 issue: 6 year: 1994 ident: 10.1016/j.ins.2013.08.037_b0300 article-title: On the convergence of stochastic iterative dynamic programming algorithms publication-title: Neural Computation doi: 10.1162/neco.1994.6.6.1185 – ident: 10.1016/j.ins.2013.08.037_b0605 – volume: 112 start-page: 181 year: 1999 ident: 10.1016/j.ins.2013.08.037_b0585 article-title: Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning publication-title: Artificial Intelligence doi: 10.1016/S0004-3702(99)00052-1 – volume: 15 start-page: 1055 issue: 6 year: 2011 ident: 10.1016/j.ins.2013.08.037_b0730 article-title: Continuous-action reinforcement learning with fast policy search and adaptive basis function selection publication-title: Soft Computing – A Fusion of Foundations, Methodologies and Applications – volume: 72 start-page: 3447 year: 2009 ident: 10.1016/j.ins.2013.08.037_b0515 article-title: Predicting investment behavior: an augmented reinforcement learning model publication-title: Neurocomputing doi: 10.1016/j.neucom.2008.11.031 – volume: 43 start-page: 473 year: 2007 ident: 10.1016/j.ins.2013.08.037_b0010 article-title: Model-free Q-learning designs for linear discrete-time zero-sum games with application to H-infinity control publication-title: Automatica doi: 10.1016/j.automatica.2006.09.019 – volume: 24 start-page: 762 issue: 5 year: 2013 ident: 10.1016/j.ins.2013.08.037_b0735 article-title: Online learning control using adaptive critic designs with sparse kernel machines publication-title: IEEE Transactions on Neural Networks and Learning Systems doi: 10.1109/TNNLS.2012.2236354 – volume: 59 start-page: 1823 issue: 4 year: 2010 ident: 10.1016/j.ins.2013.08.037_b0240 article-title: Distributed Q-Learning for aggregated interference control in cognitive radio networks publication-title: IEEE Transactions on Vehicular Technology doi: 10.1109/TVT.2010.2043124 – start-page: 441 year: 2008 ident: 10.1016/j.ins.2013.08.037_b0235 article-title: Regularized policy iteration publication-title: NIPS – ident: 10.1016/j.ins.2013.08.037_b0190 doi: 10.1007/3-540-44914-0_2 – year: 2006 ident: 10.1016/j.ins.2013.08.037_b0015 article-title: Adaptive critic designs for discrete-time zero-sum games with application to H-Infinity control publication-title: IEEE Transactions on Systems Man Cybernetics-Part B – volume: 8 issue: NIPS 1995 year: 1996 ident: 10.1016/j.ins.2013.08.037_b0160 article-title: Improving elevator performance using reinforcement learning publication-title: Advances in Neural Information Processing Systems – start-page: 1531 year: 2002 ident: 10.1016/j.ins.2013.08.037_b0320 article-title: A natural policy gradient publication-title: Advances in Neural Information Processing Systems – volume: 22 start-page: 33 year: 1996 ident: 10.1016/j.ins.2013.08.037_b0135 article-title: Linear least-squares algorithms for temporal difference learning publication-title: Machine Learning doi: 10.1023/A:1018056104778 – volume: 21 start-page: 1744 issue: 4 year: 2006 ident: 10.1016/j.ins.2013.08.037_b0400 article-title: Adaptive critic design based neuro-fuzzy controller for a static compensator in a multimachine power system publication-title: IEEE Transactions on Power Systems doi: 10.1109/TPWRS.2006.882467 – volume: 13 start-page: 764 issue: 3 year: 2002 ident: 10.1016/j.ins.2013.08.037_b0640 article-title: 
Comparison of heuristic dynamic programming and dual heuristic programming adaptive critics for neurocontrol of a turbogenerator publication-title: IEEE Transactions on Neural Networks doi: 10.1109/TNN.2002.1000146 – ident: 10.1016/j.ins.2013.08.037_b0330 – volume: 8 start-page: 997 issue: 5 year: 1997 ident: 10.1016/j.ins.2013.08.037_b0470 article-title: Adaptive critic designs publication-title: IEEE Transactions Neural Networks doi: 10.1109/72.623201 – ident: 10.1016/j.ins.2013.08.037_b0030 – start-page: 30 year: 1995 ident: 10.1016/j.ins.2013.08.037_b0055 article-title: Residual algorithms: reinforcement learning with function approximation – volume: 26 start-page: 1272 issue: 3 year: 2011 ident: 10.1016/j.ins.2013.08.037_b0750 article-title: Stochastic optimal relaxed automatic generation control in non-Markov environment based on multi-step Q(λ) learning publication-title: IEEE Transactions on Power Systems doi: 10.1109/TPWRS.2010.2102372 – ident: 10.1016/j.ins.2013.08.037_b0450 doi: 10.1109/IROS.2006.282564 |
| SubjectTerms | Approximate dynamic programming; Function approximation; Generalization; Learning control; Reinforcement learning
| URI | https://dx.doi.org/10.1016/j.ins.2013.08.037 |